### <i>Exercise: Take the code from the Examples section of the scipy stats documentation for independent samples t-tests, add it to your own notebook and explain how it works using Markdown cells and code comments. Improve it in any way you think it could be improved.</i>

Initialize a random number Generator and assign it to the variable rng. Generator is NumPy's newer replacement for the legacy RandomState class:

In [1]:
from scipy import stats
import numpy as np
# Unseeded Generator: every run of the notebook draws different samples
rng = np.random.default_rng()
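Because the generator above is unseeded, the statistics and p-values in this notebook change on every run. A minimal sketch of how a seed makes the draws repeatable is shown here; the seed value 12345 and the names rng_a, rng_b, sample_a and sample_b are arbitrary choices for illustration, not part of the SciPy example:

```python
import numpy as np

# Two generators built from the same (arbitrary) seed
rng_a = np.random.default_rng(12345)
rng_b = np.random.default_rng(12345)

# Draw five values from a normal distribution with mean 5 and sd 10
sample_a = rng_a.normal(loc=5, scale=10, size=5)
sample_b = rng_b.normal(loc=5, scale=10, size=5)

# Identical seeds produce identical draws
print(np.array_equal(sample_a, sample_b))
```

Passing a seed to default_rng in the first cell would make every later result in the notebook reproducible.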

Test two samples with identical means. 'loc' specifies the mean, 'scale' the standard deviation, and 'size' the number of elements in the sample. 'random_state' takes the 'rng' Generator created above, so all draws come from the same generator. 'norm' is an instance of a subclass of 'rv_continuous' and inherits its rvs (random variates) method, which returns a random sample drawn from the distribution:

In [None]:
rvs1 = stats.norm.rvs(loc=5, scale=10, size=500, random_state=rng)
rvs2 = stats.norm.rvs(loc=5, scale=10, size=500, random_state=rng)
# 'ttest_ind' calculates the t-test for the means of two independent samples:
stats.ttest_ind(rvs1, rvs2)
# Example output (the exact statistic and p-value change on every run because rng is unseeded):
# Ttest_indResult(statistic=-0.4390847099199348, pvalue=0.6606952038870015)
# Setting 'equal_var' to False performs Welch's t-test, which does not assume equal population variances:
stats.ttest_ind(rvs1, rvs2, equal_var=False)
# Ttest_indResult(statistic=-0.4390847099199348, pvalue=0.6606952553131064)

In [5]:
rvs1 = stats.norm.rvs(loc=5, scale=10, size=500, random_state=rng)
rvs2 = stats.norm.rvs(loc=5, scale=10, size=500, random_state=rng)
# 'ttest_ind' calculates the t-test for the means of two independent samples:
stats.ttest_ind(rvs1, rvs2)
# print(stats.ttest_ind(rvs1, rvs2))

Ttest_indResult(statistic=-1.0327189472767258, pvalue=0.3019856331191846)

In [4]:
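# Setting 'equal_var' to False performs Welch's t-test, which does not assume equal population variances.
# Note: this cell (In [4]) ran before the cell above (In [5]) regenerated the samples, so its output
# refers to different data. On the same equal-sized samples the two statistics would be identical.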
stats.ttest_ind(rvs1, rvs2, equal_var=False)

Ttest_indResult(statistic=-0.35195133067929774, pvalue=0.7249490628113271)
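To make clearer what the default ttest_ind call computes, here is a minimal sketch that rebuilds the pooled-variance (Student) t-statistic and its two-sided p-value by hand and compares them with SciPy's result. The seed and the names x, y, sp2, t_manual and p_manual are illustrative assumptions:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)  # seed chosen only for illustration
x = stats.norm.rvs(loc=5, scale=10, size=500, random_state=rng)
y = stats.norm.rvs(loc=5, scale=10, size=500, random_state=rng)

n1, n2 = len(x), len(y)
v1, v2 = np.var(x, ddof=1), np.var(y, ddof=1)  # unbiased sample variances

# Pooled variance: weighted average of the two sample variances
sp2 = ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)

# t-statistic: difference in means divided by its standard error
t_manual = (x.mean() - y.mean()) / np.sqrt(sp2 * (1 / n1 + 1 / n2))

# Two-sided p-value from the t distribution with n1 + n2 - 2 degrees of freedom
df = n1 + n2 - 2
p_manual = 2 * stats.t.sf(abs(t_manual), df)

result = stats.ttest_ind(x, y)
print(np.isclose(t_manual, result.statistic), np.isclose(p_manual, result.pvalue))
```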

With its default equal_var=True, ttest_ind underestimates p when the population variances are unequal:

In [6]:
rvs3 = stats.norm.rvs(loc=5, scale=20, size=500, random_state=rng)
stats.ttest_ind(rvs1, rvs3)

Ttest_indResult(statistic=-1.4210208294460442, pvalue=0.15562307389186666)

In [7]:
stats.ttest_ind(rvs1, rvs3, equal_var=False)

Ttest_indResult(statistic=-1.4210208294460442, pvalue=0.15572275531999222)
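The two outputs above share the same statistic and differ only slightly in p-value. A sketch of why: with equal sample sizes the pooled and Welch standard errors are algebraically the same, and only the degrees of freedom change. The helper welch_t below is a hypothetical name written for this illustration, and the seed is arbitrary:

```python
import numpy as np
from scipy import stats

def welch_t(x, y):
    """Welch t-statistic and Welch-Satterthwaite degrees of freedom."""
    se1 = np.var(x, ddof=1) / len(x)  # squared standard error of mean(x)
    se2 = np.var(y, ddof=1) / len(y)  # squared standard error of mean(y)
    t = (np.mean(x) - np.mean(y)) / np.sqrt(se1 + se2)
    df = (se1 + se2) ** 2 / (se1 ** 2 / (len(x) - 1) + se2 ** 2 / (len(y) - 1))
    return t, df

rng = np.random.default_rng(1)  # arbitrary seed for illustration
x = rng.normal(loc=5, scale=10, size=500)
y = rng.normal(loc=5, scale=20, size=500)  # same n, larger variance

t_w, df_w = welch_t(x, y)
p_w = 2 * stats.t.sf(abs(t_w), df_w)

pooled = stats.ttest_ind(x, y)
welch = stats.ttest_ind(x, y, equal_var=False)

# Same statistic for equal n; Welch uses fewer degrees of freedom
print(np.isclose(pooled.statistic, welch.statistic), df_w < len(x) + len(y) - 2)
```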

When n1 != n2, the equal variance t-statistic is no longer equal to the unequal variance t-statistic:

In [8]:
rvs4 = stats.norm.rvs(loc=5, scale=20, size=100, random_state=rng)
stats.ttest_ind(rvs1, rvs4)

Ttest_indResult(statistic=-0.12319188161357855, pvalue=0.9019965460015809)

In [9]:
# The t-statistic and p-value differ from the previous cell because Welch's test no longer
# assumes that the variances of the two samples are equal (and here they are not):
stats.ttest_ind(rvs1, rvs4, equal_var=False)

Ttest_indResult(statistic=-0.08074320428617175, pvalue=0.9357939921908117)
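A single pair of samples cannot show which test is better calibrated. The simulation below is a rough sketch under stated assumptions: both populations have the same mean, so every rejection is a false positive, and the sample sizes and standard deviations mirror rvs1 and rvs4. The seed, the number of simulations and the 0.05 threshold are arbitrary choices:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2024)  # arbitrary seed for illustration
n_sims = 2000

# Each row is one simulated experiment: equal means, unequal variances and sizes
a = rng.normal(loc=5, scale=10, size=(n_sims, 500))
b = rng.normal(loc=5, scale=20, size=(n_sims, 100))

# Run both tests row by row
p_pooled = stats.ttest_ind(a, b, axis=1).pvalue
p_welch = stats.ttest_ind(a, b, axis=1, equal_var=False).pvalue

# Fraction of experiments that falsely reject at the 5% level
rate_pooled = np.mean(p_pooled < 0.05)
rate_welch = np.mean(p_welch < 0.05)
print(f"pooled: {rate_pooled:.3f}, Welch: {rate_welch:.3f}")
```

In this setup the pooled test should reject well above the nominal 5%, because the smaller sample has the larger variance, while Welch's test should stay close to 5%.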

T-test with different means, variances, and n:

In [10]:
rvs5 = stats.norm.rvs(loc=8, scale=20, size=100, random_state=rng)
stats.ttest_ind(rvs1, rvs5)

Ttest_indResult(statistic=-3.23377349216541, pvalue=0.0012889681820145585)

In [11]:
stats.ttest_ind(rvs1, rvs5, equal_var=False)

Ttest_indResult(statistic=-2.2710070530485007, pvalue=0.025053731765705214)

When performing a permutation test, more permutations typically yield more accurate results. Pass a np.random.Generator as random_state to ensure reproducibility:

In [12]:
stats.ttest_ind(rvs1, rvs5, permutations=2000, random_state=rng)

TypeError: ttest_ind() got an unexpected keyword argument 'permutations'
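According to the SciPy documentation, the permutations keyword was introduced in SciPy 1.7.0, so the TypeError above suggests that an older SciPy is installed in this environment. As a version-independent sketch of the idea, the function below shuffles the pooled observations and counts how often the shuffled t-statistic is at least as extreme as the observed one. The name permutation_ttest, its parameters and the add-one p-value convention are assumptions made for illustration and may differ from SciPy's own implementation:

```python
import numpy as np
from scipy import stats

def permutation_ttest(x, y, n_resamples=1000, rng=None):
    """Two-sided permutation test based on the pooled t-statistic (illustrative)."""
    rng = np.random.default_rng(rng)  # accepts a seed, a Generator, or None
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    observed = stats.ttest_ind(x, y).statistic
    pooled = np.concatenate([x, y])
    count = 0
    for _ in range(n_resamples):
        # Randomly reassign the observations to the two groups
        perm = rng.permutation(pooled)
        t = stats.ttest_ind(perm[:len(x)], perm[len(x):]).statistic
        if abs(t) >= abs(observed):
            count += 1
    # Add-one correction keeps the estimated p-value strictly positive
    return observed, (count + 1) / (n_resamples + 1)

rng = np.random.default_rng(7)  # arbitrary seed for illustration
s1 = rng.normal(loc=5, scale=10, size=500)
s2 = rng.normal(loc=8, scale=20, size=100)
statistic, pvalue = permutation_ttest(s1, s2, n_resamples=1000, rng=rng)
print(statistic, pvalue)
```

More resamples reduce the Monte Carlo noise in the estimated p-value at the cost of run time.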

In [None]:
stats.ttest_ind(rvs1, rvs5, permutations=10000,
                random_state=rng)
# Example output:
# Ttest_indResult(statistic=-2.8415950600298774, pvalue=0.0052)

Take these two samples, one of which has an extreme tail.

In [None]:
a = (56, 128.6, 12, 123.8, 64.34, 78, 763.3)
b = (1.1, 2.9, 4.2)

Use the trim keyword to perform a trimmed (Yuen) t-test. For example, using 20% trimming, trim=.2, the test will reduce the impact of one (np.floor(trim*len(a))) element from each tail of sample a. It will have no effect on sample b because np.floor(trim*len(b)) is 0.

In [None]:
stats.ttest_ind(a, b, trim=.2)
# Example output:
# Ttest_indResult(statistic=3.4463884028073513,
#                 pvalue=0.01369338726499547)
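The trimming counts described above can be checked directly. This sketch only shows how many observations are affected in each tail and which values remain in the middle; it does not reproduce the full Yuen statistic, which also relies on winsorized variances. The names trimmed_counts and kept are illustrative:

```python
import numpy as np

a = (56, 128.6, 12, 123.8, 64.34, 78, 763.3)
b = (1.1, 2.9, 4.2)
trim = 0.2

trimmed_counts = {}
kept = {}
for name, sample in (("a", a), ("b", b)):
    # Number of observations affected in each tail
    g = int(np.floor(trim * len(sample)))
    ordered = np.sort(sample)
    trimmed_counts[name] = g
    # Values left after dropping g observations from each end
    kept[name] = ordered[g:len(ordered) - g]

print(trimmed_counts)     # {'a': 1, 'b': 0}
print(kept["a"].tolist())  # [56.0, 64.34, 78.0, 123.8, 128.6]
```

The extreme value 763.3 in a falls in the trimmed upper tail, which is why the trimmed test is less sensitive to it.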

In [None]:
# 'trim' must satisfy 0 <= trim < 0.5, so trim=.5 is rejected with a ValueError
# instead of returning a result:
stats.ttest_ind(a, b, trim=.5)