# t-test
## theory
> Main goal: test whether the means of two classes differ. Null hypothesis: the two population means are equal.

## scipy implementation
```doc
ss.ttest_ind(
    a,
    b,
    axis=0,
    equal_var=True,
    nan_policy='propagate',
    permutations=None,
    random_state=None,
    alternative='two-sided',
    trim=0,
)
Docstring:
Calculate the T-test for the means of *two independent* samples of scores.

This is a two-sided test for the null hypothesis that 2 independent samples
have identical average (expected) values. This test assumes that the
populations have identical variances by default.

Parameters
----------
a, b : array_like
    The arrays must have the same shape, except in the dimension
    corresponding to `axis` (the first, by default).
axis : int or None, optional
    Axis along which to compute test. If None, compute over the whole
    arrays, `a`, and `b`.
equal_var : bool, optional
    If True (default), perform a standard independent 2 sample test
    that assumes equal population variances [1]_.
    If False, perform Welch's t-test, which does not assume equal
    population variance [2]_.

nan_policy : {'propagate', 'raise', 'omit'}, optional
    Defines how to handle when input contains nan.
    The following options are available (default is 'propagate'):

      * 'propagate': returns nan
      * 'raise': throws an error
      * 'omit': performs the calculations ignoring nan values

    The 'omit' option is not currently available for permutation tests or
    one-sided asymptotic tests.

permutations : non-negative int, np.inf, or None (default), optional
    If 0 or None (default), use the t-distribution to calculate p-values.
    Otherwise, `permutations` is the number of random permutations that
    will be used to estimate p-values using a permutation test. If
    `permutations` equals or exceeds the number of distinct partitions of
    the pooled data, an exact test is performed instead (i.e. each
    distinct partition is used exactly once). See Notes for details.


random_state : {None, int, `numpy.random.Generator`,
        `numpy.random.RandomState`}, optional

    If `seed` is None (or `np.random`), the `numpy.random.RandomState`
    singleton is used.
    If `seed` is an int, a new ``RandomState`` instance is used,
    seeded with `seed`.
    If `seed` is already a ``Generator`` or ``RandomState`` instance then
    that instance is used.

    Pseudorandom number generator state used to generate permutations
    (used only when `permutations` is not None).

alternative : {'two-sided', 'less', 'greater'}, optional
    Defines the alternative hypothesis.
    The following options are available (default is 'two-sided'):

      * 'two-sided'
      * 'less': one-sided
      * 'greater': one-sided

trim : float, optional
    If nonzero, performs a trimmed (Yuen's) t-test.
    Defines the fraction of elements to be trimmed from each end of the
    input samples. If 0 (default), no elements will be trimmed from either
    side. The number of trimmed elements from each tail is the floor of the
    trim times the number of elements. Valid range is [0, .5).

```
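
The parameters described above can be tried on a small synthetic example. This is only a sketch: the arrays `a` and `b`, their sizes, and the seed are made up for illustration, and the `alternative` argument requires a SciPy version that supports it (as the docstring above does).

```python
import numpy as np
import scipy.stats as ss

rng = np.random.default_rng(0)  # fixed seed so the run is repeatable

# Two synthetic samples with 3 features (columns) each; illustrative data only
a = rng.normal(loc=0.0, scale=1.0, size=(40, 3))
b = rng.normal(loc=0.5, scale=2.0, size=(60, 3))

# Student's t-test: assumes equal population variances (the default)
student = ss.ttest_ind(a, b, axis=0, equal_var=True)

# Welch's t-test: does not assume equal population variances
welch = ss.ttest_ind(a, b, axis=0, equal_var=False)

# One-sided alternative: the mean of `a` is less than the mean of `b`
less = ss.ttest_ind(a, b, axis=0, equal_var=False, alternative="less")

# One statistic and one p-value per column
print(student.statistic.shape, welch.pvalue.shape)
print(welch.pvalue, less.pvalue)
```

With `axis=0` each result holds one entry per column, and the one-sided p-value is half of the two-sided one whenever the statistic has the sign named by `alternative`.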

In [18]:
from sklearn.datasets import load_iris
import scipy.stats as ss
import numpy as np

data_iris = load_iris()
data_iris.keys()

dict_keys(['data', 'target', 'frame', 'target_names', 'DESCR', 'feature_names', 'filename', 'data_module'])

In [28]:
iris_f = data_iris["feature_names"]
iris_data = data_iris["data"][:,:3]
iris_target = data_iris.target
iris_f[:3]

['sepal length (cm)', 'sepal width (cm)', 'petal length (cm)']

In [29]:
from collections import Counter

c = Counter(iris_target)
c.most_common()

[(0, 50), (1, 50), (2, 50)]

In [31]:
# Split the samples by class label with boolean masks
class0 = iris_data[iris_target == 0].astype(np.float64)
class1 = iris_data[iris_target == 1].astype(np.float64)
class2 = iris_data[iris_target == 2].astype(np.float64)

In [33]:
# Welch's t-test (equal_var=False) on the per-feature means of class 0 vs. class 1
ss.ttest_ind(class0,class1,axis=0,equal_var=False)

Ttest_indResult(statistic=array([-10.52098627,   9.45497585, -39.49271939]), pvalue=array([3.74674261e-17, 2.48422790e-15, 9.93443296e-46]))

In [34]:
# Welch's t-test (equal_var=False) on the per-feature means of class 1 vs. class 2
ss.ttest_ind(class1,class2,axis=0,equal_var=False)

Ttest_indResult(statistic=array([ -5.62916526,  -3.20576075, -12.60377944]), pvalue=array([1.86614439e-07, 1.81948348e-03, 4.90028753e-22]))

In [35]:
# Welch's t-test (equal_var=False) on the per-feature means of class 0 vs. class 2
ss.ttest_ind(class0,class2,axis=0,equal_var=False)

Ttest_indResult(statistic=array([-15.38619582,   6.45034909, -49.98618626]), pvalue=array([3.96686727e-25, 4.57077142e-09, 9.26962759e-50]))
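
The calls above use `equal_var=False`, i.e. Welch's t-test. As a rough sketch of what that computes for a single feature, the statistic is the difference of the sample means divided by the combined standard error, with degrees of freedom from the Welch-Satterthwaite approximation. The helper `welch_t` and the synthetic samples `x` and `y` below are illustrative names, not part of SciPy or of the notebook.

```python
import numpy as np
import scipy.stats as ss

def welch_t(x, y):
    """Welch's t statistic and degrees of freedom for two 1-D samples."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    vx = x.var(ddof=1) / x.size  # squared standard error of mean(x)
    vy = y.var(ddof=1) / y.size  # squared standard error of mean(y)
    t = (x.mean() - y.mean()) / np.sqrt(vx + vy)
    # Welch-Satterthwaite approximation of the degrees of freedom
    df = (vx + vy) ** 2 / (vx ** 2 / (x.size - 1) + vy ** 2 / (y.size - 1))
    return t, df

rng = np.random.default_rng(1)
x = rng.normal(5.0, 0.5, size=50)  # synthetic one-feature samples
y = rng.normal(5.3, 0.5, size=50)

t, df = welch_t(x, y)
p = 2 * ss.t.sf(abs(t), df)  # two-sided p-value from the t-distribution

ref = ss.ttest_ind(x, y, equal_var=False)
print(t, ref.statistic)
print(p, ref.pvalue)
```

The hand-computed values should agree with `ss.ttest_ind(..., equal_var=False)` up to floating-point error.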

## notes
> 1. The t-test is a univariate test: it compares one feature at a time. With `axis=0`, `ttest_ind` runs a separate test on each column, so it returns one statistic and one p-value per feature.  <br>
> 2. Testing each feature separately ignores any correlation between the features. In effect, the features are treated as if they were independent, so that the joint pdf is the product of the marginal pdfs, instead of modeling all features with a single joint pdf.
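
To illustrate the first note, the sketch below checks that one `ttest_ind` call with `axis=0` gives the same numbers as looping over the columns and testing each feature on its own. The arrays `a` and `b` are synthetic stand-ins, not the iris data.

```python
import numpy as np
import scipy.stats as ss

rng = np.random.default_rng(2)
a = rng.normal(0.0, 1.0, size=(30, 3))  # 30 samples, 3 features
b = rng.normal(0.3, 1.0, size=(30, 3))

# One call, tested column by column along axis 0
all_at_once = ss.ttest_ind(a, b, axis=0, equal_var=False)

# Same thing written as an explicit loop over features
one_by_one = [
    ss.ttest_ind(a[:, j], b[:, j], equal_var=False) for j in range(a.shape[1])
]
stats_1d = np.array([r.statistic for r in one_by_one])
pvals_1d = np.array([r.pvalue for r in one_by_one])

print(np.allclose(all_at_once.statistic, stats_1d))
print(np.allclose(all_at_once.pvalue, pvals_1d))
```

Neither form uses any information about how the features vary together, which is the point of the second note.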

# scatter matrix and FDR
> These definitions use the variance (second-order central moment) and the mean (first-order raw moment).
<br>
> within-class scatter matrix
$$
    \mathbf{S_{\omega}} = \displaystyle\sum_{i=1}^{M}{p_i\Sigma_i}
$$
where
$p_i = \frac{n_i}{N}$ is the fraction of all $N$ samples that belong to class $i$, $M$ is the total number of classes, <br>
and $\Sigma_i = E[(x-\mu_i){(x-\mu_i)}^T]$ is the covariance matrix of class $i$, with $\mu_i$ its mean vector

> between-class scatter matrix 
$$
   \mathbf{S_b} = \displaystyle\sum_{i=1}^{M}{p_i(\mu_i-\mu_0){(\mu_i-\mu_0)}^T}
$$
where
$\mu_0 = \displaystyle\sum_{i=1}^{M}{p_i\mu_i}$ is the global mean vector <br>

> mixture scatter matrix
$$
    \mathbf{S_m} = E[(x-\mu_0){(x-\mu_0)}^T]
$$

> It is straightforward to show that
$$
    \mathbf{S_m} =  \mathbf{S_{\omega}} +  \mathbf{S_b}
$$
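
The identity can also be checked numerically. The sketch below assumes synthetic data (three made-up classes with two features), estimates $p_i$ as $n_i/N$, and uses the biased covariance (`bias=True`, divide by the sample count) so that the empirical moments match the definitions above exactly.

```python
import numpy as np

rng = np.random.default_rng(3)

# Three synthetic classes with different sizes, 2 features each
means = [(0.0, 0.0), (2.0, 1.0), (-1.0, 3.0)]
sizes = [40, 60, 50]
X = np.vstack([rng.normal(m, 1.0, size=(n, 2)) for m, n in zip(means, sizes)])
y = np.repeat(np.arange(3), sizes)

N, d = X.shape
mu0 = X.mean(axis=0)  # global mean, equal to sum_i p_i * mu_i

S_w = np.zeros((d, d))
S_b = np.zeros((d, d))
for c in np.unique(y):
    Xc = X[y == c]
    p = len(Xc) / N                                  # class proportion p_i
    mu = Xc.mean(axis=0)                             # class mean mu_i
    S_w += p * np.cov(Xc, rowvar=False, bias=True)   # p_i * Sigma_i
    diff = (mu - mu0)[:, None]
    S_b += p * (diff @ diff.T)                       # p_i * (mu_i - mu_0)(mu_i - mu_0)^T

# Mixture scatter: covariance of all samples about the global mean
S_m = np.cov(X, rowvar=False, bias=True)

print(np.allclose(S_m, S_w + S_b))
```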

> For a two-class problem with a single feature and equiprobable classes, $| \mathbf{S_{\omega}}|$ is proportional to ${\sigma_1}^2 + {\sigma_2}^2$ and $| \mathbf{S_{b}}|$ is proportional to ${(\mu_1 - \mu_2)}^2$ <br>
> This motivates Fisher's discriminant ratio (FDR), which for $M$ classes sums the same ratio over all pairs of classes
$$
    FDR = \displaystyle\sum_{i}^{M}\displaystyle\sum_{j \neq i}^{M}\frac{{(\mu_i - \mu_j)}^2}{{\sigma_i}^2 + {\sigma_j}^2}
$$

which is easier to remember as
$$
    FDR \sim \frac{|\mathbf{S_b}|}{|\mathbf{S_{\omega}}|}
$$
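
As a small illustration, the sketch below computes the FDR of a single feature as a sum over ordered class pairs $i \neq j$, and compares it with the scalar ratio $S_b / S_{\omega}$ for two equiprobable classes. The function name `fdr` and the synthetic samples are assumptions made for this example; variances are population variances (`ddof=0`) throughout.

```python
import numpy as np
from itertools import permutations

def fdr(groups):
    """Sum of (mu_i - mu_j)^2 / (sigma_i^2 + sigma_j^2) over ordered pairs i != j."""
    means = [np.mean(g) for g in groups]
    variances = [np.var(g) for g in groups]  # population variance (ddof=0)
    return sum(
        (means[i] - means[j]) ** 2 / (variances[i] + variances[j])
        for i, j in permutations(range(len(groups)), 2)
    )

rng = np.random.default_rng(4)
x1 = rng.normal(0.0, 1.0, size=100)  # one feature, class 1
x2 = rng.normal(2.0, 1.5, size=100)  # one feature, class 2

value = fdr([x1, x2])

# Scalar scatter values for two equiprobable classes (p_1 = p_2 = 0.5)
mu0 = 0.5 * (x1.mean() + x2.mean())
s_w = 0.5 * x1.var() + 0.5 * x2.var()
s_b = 0.5 * (x1.mean() - mu0) ** 2 + 0.5 * (x2.mean() - mu0) ** 2

print(value, 4 * s_b / s_w)
```

In this one-feature, two-class, equal-prior setting the two quantities differ only by a constant factor (4 here, because the double sum counts each pair twice), which is what the proportionality above expresses.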