Initialize a random number Generator and assign it to the variable rng. It serves the same purpose as the legacy RandomState interface:

In [None]:
import numpy as np
from scipy import stats
rng = np.random.default_rng()
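
Called without arguments, default_rng() draws fresh entropy on every run, so the samples below will differ between sessions. The following is a minimal sketch of how a fixed seed makes the draws repeatable. The seed value 12345 and the names rng_a and rng_b are arbitrary choices for illustration and are not part of the original notebook.

```python
import numpy as np
from scipy import stats

seed = 12345  # arbitrary illustrative seed (assumption, not from the notebook)

# Two independent Generators built from the same seed
rng_a = np.random.default_rng(seed)
rng_b = np.random.default_rng(seed)

# Draw the same kind of sample from each Generator
draws_a = stats.norm.rvs(loc=5, scale=10, size=3, random_state=rng_a)
draws_b = stats.norm.rvs(loc=5, scale=10, size=3, random_state=rng_b)

# Identical seeds should give identical draws
same = np.array_equal(draws_a, draws_b)
print(same)
```

Note that a Generator advances its internal state on every draw, so two consecutive calls that share one Generator return different samples.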

Test two samples with identical means. 'loc' specifies the mean and 'scale' the standard deviation; 'size' specifies the number of elements in the sample. 'random_state' receives the 'rng' Generator. 'norm' is an instance of a subclass of 'rv_continuous' and inherits the rvs (random variates) method, which draws a random sample from the distribution:

In [4]:
rvs1 = stats.norm.rvs(loc=5, scale=10, size=500, random_state=rng)
rvs2 = stats.norm.rvs(loc=5, scale=10, size=500, random_state=rng)
# 'ttest_ind' calculates the t-test for the means of two independent samples:
stats.ttest_ind(rvs1, rvs2)
# The resulting statistic and p-value:
Ttest_indResult(statistic=-0.4390847099199348, pvalue=0.6606952038870015)
# Setting 'equal_var' to False performs Welch's t-test, which does not assume equal population variances:
stats.ttest_ind(rvs1, rvs2, equal_var=False)
Ttest_indResult(statistic=-0.4390847099199348, pvalue=0.6606952553131064)


With the default equal_var=True, ttest_ind underestimates p when the population variances are unequal:

In [None]:
rvs3 = stats.norm.rvs(loc=5, scale=20, size=500, random_state=rng)
stats.ttest_ind(rvs1, rvs3)
Ttest_indResult(statistic=-1.6370984482905417, pvalue=0.1019251574705033)
stats.ttest_ind(rvs1, rvs3, equal_var=False)
Ttest_indResult(statistic=-1.637098448290542, pvalue=0.10202110497954867)
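
The two calls above return the same statistic but slightly different p-values. A sketch of why, assuming the textbook formulas for the pooled (Student) and Welch standard errors: with equal sample sizes the two standard errors coincide algebraically, while the Welch-Satterthwaite degrees of freedom are smaller when the sample variances differ. The samples x and y and the seed are illustrative, not the notebook's rvs1 and rvs3.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)  # seed chosen only for illustration
x = stats.norm.rvs(loc=5, scale=10, size=500, random_state=rng)
y = stats.norm.rvs(loc=5, scale=20, size=500, random_state=rng)

n1, n2 = len(x), len(y)
v1, v2 = np.var(x, ddof=1), np.var(y, ddof=1)  # unbiased sample variances

# Pooled (equal-variance) standard error of the difference in means
sp2 = ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)
se_pooled = np.sqrt(sp2 * (1 / n1 + 1 / n2))

# Welch (unequal-variance) standard error
se_welch = np.sqrt(v1 / n1 + v2 / n2)

# Degrees of freedom used to turn the statistic into a p-value
df_pooled = n1 + n2 - 2
df_welch = (v1 / n1 + v2 / n2) ** 2 / (
    (v1 / n1) ** 2 / (n1 - 1) + (v2 / n2) ** 2 / (n2 - 1)
)

print(se_pooled, se_welch)
print(df_pooled, df_welch)
```

Because the standard errors match, the statistics match; only the reference t-distribution changes, which is where the small difference in p comes from.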

When n1 != n2, the equal-variance t-statistic no longer equals the unequal-variance t-statistic:

In [None]:
rvs4 = stats.norm.rvs(loc=5, scale=20, size=100, random_state=rng)
stats.ttest_ind(rvs1, rvs4)
Ttest_indResult(statistic=-1.9481646859513422, pvalue=0.05186270935842703)
stats.ttest_ind(rvs1, rvs4, equal_var=False)
# The t-statistic and p-value differ because the test no longer assumes equal variances, and the two population variances are in fact unequal:
Ttest_indResult(statistic=-1.3146566100751664, pvalue=0.1913495266513811)
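
To make the unequal-variance result less of a black box, here is a minimal sketch that recomputes Welch's test by hand and compares it with ttest_ind. The helper name welch_by_hand, the samples and the seed are hypothetical and only serve the illustration; the formulas are the standard Welch statistic and Welch-Satterthwaite degrees of freedom.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)  # seed chosen only for illustration
x = stats.norm.rvs(loc=5, scale=10, size=500, random_state=rng)
y = stats.norm.rvs(loc=5, scale=20, size=100, random_state=rng)


def welch_by_hand(x, y):
    """Two-sided Welch t-test computed from the textbook formulas."""
    n1, n2 = len(x), len(y)
    a = np.var(x, ddof=1) / n1  # squared standard error of mean(x)
    b = np.var(y, ddof=1) / n2  # squared standard error of mean(y)
    t = (np.mean(x) - np.mean(y)) / np.sqrt(a + b)
    # Welch-Satterthwaite approximation to the degrees of freedom
    df = (a + b) ** 2 / (a ** 2 / (n1 - 1) + b ** 2 / (n2 - 1))
    p = 2 * stats.t.sf(abs(t), df)  # two-sided p-value
    return t, p


t_hand, p_hand = welch_by_hand(x, y)
welch = stats.ttest_ind(x, y, equal_var=False)
student = stats.ttest_ind(x, y)

print(t_hand, welch.statistic)
print(p_hand, welch.pvalue)
print(student.statistic)
```

With unequal sample sizes the pooled variance weights the larger sample more heavily, so the Student statistic printed last should differ from the Welch statistic.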

T-test with different means, variances, and sample sizes:

In [None]:
rvs5 = stats.norm.rvs(loc=8, scale=20, size=100, random_state=rng)
stats.ttest_ind(rvs1, rvs5)
Ttest_indResult(statistic=-2.8415950600298774, pvalue=0.0046418707568707885)
stats.ttest_ind(rvs1, rvs5, equal_var=False)
Ttest_indResult(statistic=-1.8686598649188084, pvalue=0.06434714193919686)

When performing a permutation test, more permutations typically yield more accurate results. Pass a np.random.Generator as random_state; seeding it makes the result reproducible:

In [None]:
stats.ttest_ind(rvs1, rvs5, permutations=2000,
                random_state=rng)
Ttest_indResult(statistic=-2.8415950600298774, pvalue=0.0052)

In [None]:
stats.ttest_ind(rvs1, rvs5, permutations=10000,
                random_state=rng)
Ttest_indResult(statistic=-2.8415950600298774, pvalue=0.0052)
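
The idea behind the permutation p-value can be sketched in a few lines: pool the observations, reshuffle the group labels many times, and count how often the reshuffled statistic is at least as extreme as the observed one. This is an illustrative simplification, not SciPy's implementation, which may differ in details such as how the estimate is adjusted. The helper names student_t and permutation_pvalue, the samples and the seeds are all hypothetical.

```python
import numpy as np


def student_t(x, y):
    """Equal-variance (pooled) t-statistic for two 1-D samples."""
    n1, n2 = len(x), len(y)
    sp2 = ((n1 - 1) * np.var(x, ddof=1) + (n2 - 1) * np.var(y, ddof=1)) / (n1 + n2 - 2)
    return (np.mean(x) - np.mean(y)) / np.sqrt(sp2 * (1 / n1 + 1 / n2))


def permutation_pvalue(x, y, n_resamples, rng):
    """Two-sided p-value estimated from random relabelings of the pooled data."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    observed = abs(student_t(x, y))
    pooled = np.concatenate([x, y])
    hits = 0
    for _ in range(n_resamples):
        shuffled = rng.permutation(pooled)  # random reassignment to groups
        if abs(student_t(shuffled[:len(x)], shuffled[len(x):])) >= observed:
            hits += 1
    return hits / n_resamples


data_rng = np.random.default_rng(2)  # illustrative seed for the data
x = data_rng.normal(loc=5, scale=10, size=50)
y = data_rng.normal(loc=8, scale=20, size=30)

# Same seed for the shuffling Generator, so the estimate repeats exactly
p_first = permutation_pvalue(x, y, 1000, np.random.default_rng(42))
p_again = permutation_pvalue(x, y, 1000, np.random.default_rng(42))
print(p_first, p_again)
```

The estimate is a proportion out of n_resamples, so its resolution is 1/n_resamples; this is one reason more permutations tend to give a more precise p-value.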

Take these two samples, one of which has an extreme tail.

In [None]:
a = (56, 128.6, 12, 123.8, 64.34, 78, 763.3)
b = (1.1, 2.9, 4.2)

Use the trim keyword to perform a trimmed (Yuen) t-test. For example, using 20% trimming, trim=.2, the test will reduce the impact of one (np.floor(trim*len(a))) element from each tail of sample a. It will have no effect on sample b because np.floor(trim*len(b)) is 0.

In [None]:
stats.ttest_ind(a, b, trim=.2)
Ttest_indResult(statistic=3.4463884028073513,
                pvalue=0.01369338726499547)
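
As a rough check of the trimming arithmetic described above, the sketch below counts how many elements are cut from each tail and compares a hand-computed trimmed mean with scipy.stats.trim_mean. The helper trimmed_mean_by_hand is hypothetical. It covers only the trimmed means; Yuen's test also relies on winsorized variances, which are not reproduced here.

```python
import numpy as np
from scipy import stats

a = (56, 128.6, 12, 123.8, 64.34, 78, 763.3)
b = (1.1, 2.9, 4.2)
trim = 0.2


def trimmed_mean_by_hand(sample, trim):
    """Drop floor(trim * n) values from each tail, then average the rest."""
    s = np.sort(np.asarray(sample, dtype=float))
    g = int(np.floor(trim * len(s)))  # elements removed per tail
    return s[g:len(s) - g].mean(), g


mean_a, g_a = trimmed_mean_by_hand(a, trim)
mean_b, g_b = trimmed_mean_by_hand(b, trim)

print(g_a, mean_a, stats.trim_mean(a, trim))
print(g_b, mean_b, np.mean(b))
```

For sample a, the extreme value 763.3 and the smallest value 12 fall outside the trimmed range, which is why the outlier has much less influence on the test.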