<h1 style="color: #001a79;">Task</h1>

Take the code from the <a href="https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.ttest_ind.html" style="color: #ff791e">Examples section of the scipy stats documentation for independent samples t-tests</a>, add it to your own notebook, and explain how it works using Markdown cells and code comments. Improve it in any way you think it could be improved.
<hr style="border-top: 1px solid #001a79;" />

Step 1: Import the necessary packages

In [2]:
#import numpy package
import numpy as np
#import stats package from scipy
from scipy import stats

Step 2: Create a random number generator, which the functions below will use to draw samples from a chosen distribution

In [3]:
#Create a variable called "rng" holding a NumPy random Generator object (not an array); it produces random numbers on request
rng = np.random.default_rng()
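
Because `default_rng()` is called without a seed, the numbers printed in this notebook will differ every time it is run. A minimal sketch of how a seed makes the samples reproducible, assuming an arbitrary seed value of 12345 (the names `SEED`, `rng_a` and `rng_b` are illustrative and not part of the original notebook):

```python
import numpy as np
from scipy import stats

SEED = 12345  # arbitrary illustrative seed, not from the original notebook

# Two generators created with the same seed produce the same stream of numbers
rng_a = np.random.default_rng(SEED)
rng_b = np.random.default_rng(SEED)

sample_a = stats.norm.rvs(loc=5, scale=10, size=500, random_state=rng_a)
sample_b = stats.norm.rvs(loc=5, scale=10, size=500, random_state=rng_b)

# True when both samples are element-for-element identical
same = np.array_equal(sample_a, sample_b)
print(same)
```

Seeding is optional; it is mainly useful when you want the outputs in the notebook to match the text that describes them.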

Step 3: Create two random samples drawn from normal distributions with identical means and standard deviations, then run an independent-samples t-test on them.

In [4]:
#Create 2 random samples, called "rvs1" and "rvs2", using the "stats.norm.rvs" function
#The parameters: loc= mean, scale= st dev, size= sample size, random_state= the generator (rng) used to draw the numbers
rvs1 = stats.norm.rvs(loc=5, scale=10, size=500, random_state=rng)
rvs2 = stats.norm.rvs(loc=5, scale=10, size=500, random_state=rng)
#Use stats.ttest_ind function to run a t-test on our 2 samples
stats.ttest_ind(rvs1, rvs2)

Ttest_indResult(statistic=0.15996693144197505, pvalue=0.8729394934882427)

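Results: The t statistic is 0.15996693 and the corresponding p-value is 0.872939. The p-value is far above 0.05, so there is no evidence that the two population means differ, which is what we expect for two samples drawn from the same distribution.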
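
To make the result less of a black box, the sketch below unpacks the returned object and recomputes the statistic and p-value by hand using the pooled-variance (Student's) formula. It assumes an illustrative seed of 0 and the hypothetical names `x` and `y`, so its numbers will not match the output above.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)  # illustrative seed, not from the original notebook
x = stats.norm.rvs(loc=5, scale=10, size=500, random_state=rng)
y = stats.norm.rvs(loc=5, scale=10, size=500, random_state=rng)

# The result object exposes the statistic and the p-value as attributes
result = stats.ttest_ind(x, y)
t_stat, p_value = result.statistic, result.pvalue

# Manual Student's t-test: pooled variance, then t = mean difference / standard error
n1, n2 = len(x), len(y)
pooled_var = ((n1 - 1) * x.var(ddof=1) + (n2 - 1) * y.var(ddof=1)) / (n1 + n2 - 2)
manual_t = (x.mean() - y.mean()) / np.sqrt(pooled_var * (1 / n1 + 1 / n2))

# Two-sided p-value from the t-distribution with n1 + n2 - 2 degrees of freedom
manual_p = 2 * stats.t.sf(abs(manual_t), df=n1 + n2 - 2)

# Compare against a conventional 5% significance level
alpha = 0.05
decision = "reject H0" if p_value < alpha else "fail to reject H0"
print(t_stat, manual_t)
print(p_value, manual_p)
print(decision)
```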
<hr style="border-top: 1px solid #001a79;" />
<h4 style="color: #001a79;">Experiment with parameters</h4>

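By default (equal_var=True), ttest_ind performs the standard Student's t-test, which assumes equal population variances. When the variances are actually unequal, this test often underestimates the p-value. Setting equal_var=False runs Welch's t-test instead, which does not assume equal variances.

Below, equal_var is set to False. The statistic is unchanged and the p-value differs by less than 1e-7, because both samples have the same size and were drawn with the same standard deviation, so their variances are close to equal.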

In [16]:
#Setting equal_var=False runs Welch's t-test, which does not assume equal population variances; here it makes almost no difference
stats.ttest_ind(rvs1, rvs2, equal_var=False)

Ttest_indResult(statistic=0.15996693144197505, pvalue=0.8729395105519876)
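
For reference, here is a sketch of what `equal_var=False` computes: Welch's statistic uses each sample's own variance, and the degrees of freedom come from the Welch-Satterthwaite approximation. The seed and the sample names `x` and `y` are illustrative assumptions, and the samples deliberately have different spreads and sizes so the effect is visible.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)  # illustrative seed, not from the original notebook
x = stats.norm.rvs(loc=5, scale=10, size=500, random_state=rng)
y = stats.norm.rvs(loc=5, scale=20, size=100, random_state=rng)

# Squared standard error of each sample mean, using that sample's own variance
se2_x = x.var(ddof=1) / len(x)
se2_y = y.var(ddof=1) / len(y)

# Welch's t statistic: no pooled variance
welch_t = (x.mean() - y.mean()) / np.sqrt(se2_x + se2_y)

# Welch-Satterthwaite approximation of the degrees of freedom
welch_df = (se2_x + se2_y) ** 2 / (
    se2_x ** 2 / (len(x) - 1) + se2_y ** 2 / (len(y) - 1)
)

# Two-sided p-value from the t-distribution with the approximate df
welch_p = 2 * stats.t.sf(abs(welch_t), df=welch_df)

scipy_result = stats.ttest_ind(x, y, equal_var=False)
print(welch_t, scipy_result.statistic)
print(welch_p, scipy_result.pvalue)
```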

Compare the results of changing the equal_var parameter using samples with different standard deviations:

In [6]:
#Create a third random sample with the same mean (loc=5) and size as before but a larger st dev (scale=20)
#equal_var=True by default
rvs3 = stats.norm.rvs(loc=5, scale=20, size=500, random_state=rng)
stats.ttest_ind(rvs1, rvs3)

Ttest_indResult(statistic=-0.07701869525176737, pvalue=0.9386241097868785)

In [7]:
#Run the test again with equal_var=False
stats.ttest_ind(rvs1, rvs3, equal_var=False)

Ttest_indResult(statistic=-0.07701869525176736, pvalue=0.9386301706607502)

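Results: This time the p-value increased slightly (from 0.938624 to 0.938630) when the variances were treated as unequal. The t statistic did not change, because the two samples have the same size.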
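
The unchanged statistic is not a coincidence: when both samples have the same size, the pooled and Welch standard errors are algebraically identical, so only the degrees of freedom (and therefore the p-value) differ. A small sketch, assuming an illustrative seed and the hypothetical sample names `x` and `y`:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)  # illustrative seed, not from the original notebook

# Same size (500 each), same mean, different standard deviations
x = stats.norm.rvs(loc=5, scale=10, size=500, random_state=rng)
y = stats.norm.rvs(loc=5, scale=20, size=500, random_state=rng)

student = stats.ttest_ind(x, y)                   # assumes equal variances
welch = stats.ttest_ind(x, y, equal_var=False)    # does not

# With equal sample sizes the two statistics coincide
same_statistic = np.isclose(student.statistic, welch.statistic)

# Welch uses fewer degrees of freedom, so its p-value is never smaller here
print(same_statistic)
print(student.pvalue, welch.pvalue)
```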

Next, try samples of different sizes (and different standard deviations) that still have equal means:

In [8]:
#Same mean as rvs1 (loc=5) but st dev of 20 and a smaller sample size of 100
rvs4 = stats.norm.rvs(loc=5, scale=20, size=100, random_state=rng)
stats.ttest_ind(rvs1, rvs4)

Ttest_indResult(statistic=0.4486055483094085, pvalue=0.6538787361196783)

In [9]:
#Run again with equal_var=False
stats.ttest_ind(rvs1, rvs4, equal_var=False)

Ttest_indResult(statistic=0.30774192053708693, pvalue=0.7588546534832463)

Results: When the two samples differ in size and variance but share the same mean, both the t statistic (0.4486 vs. 0.3077) and the p-value (0.654 vs. 0.759) change when switching from the equal-variance test to the unequal-variance test.
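
If you are unsure whether the equal-variance assumption is reasonable for your data, one possible check is Levene's test via `scipy.stats.levene`. This is an addition to the original SciPy example and only a sketch: the seed and sample names are illustrative, and many analysts simply default to `equal_var=False` rather than pre-testing.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)  # illustrative seed, not from the original notebook
x = stats.norm.rvs(loc=5, scale=10, size=500, random_state=rng)
y = stats.norm.rvs(loc=5, scale=20, size=100, random_state=rng)

# Levene's test: the null hypothesis is that the samples have equal variances
levene_result = stats.levene(x, y)
variances_look_equal = levene_result.pvalue >= 0.05

# Use Welch's t-test whenever the variances do not look equal
t_result = stats.ttest_ind(x, y, equal_var=variances_look_equal)

print(levene_result.pvalue)
print(variances_look_equal)
print(t_result.pvalue)
```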

Lastly, compare the means of two samples that differ in mean, st dev, and size:

In [10]:
#Create a new sample whose mean, st dev and size all differ from rvs1
rvs5 = stats.norm.rvs(loc=8, scale=20, size=100, random_state=rng)
stats.ttest_ind(rvs1, rvs5)

Ttest_indResult(statistic=-2.009960191604169, pvalue=0.044884034032089744)

In [11]:
#Run again with equal_var=False
stats.ttest_ind(rvs1, rvs5, equal_var=False)

Ttest_indResult(statistic=-1.3860962253433327, pvalue=0.16848607453524933)

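Results: The gap between the two tests is even larger here. Assuming equal variances gives a p-value of 0.045 (significant at the 5% level), while allowing unequal variances gives 0.168 (not significant), so the choice of equal_var changes the conclusion.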
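
The claim that the equal-variance test underestimates the p-value can be checked by simulation. The sketch below draws many pairs of samples with the same mean, so every "significant" result is a false positive, and counts how often each test rejects at the 5% level. The seed, the number of simulations and the variable names are illustrative assumptions.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)  # illustrative seed, not from the original notebook
n_sims = 2000
alpha = 0.05

# Each row is one simulated experiment; both populations have mean 5
big = rng.normal(loc=5, scale=10, size=(n_sims, 500))    # large sample, small spread
small = rng.normal(loc=5, scale=20, size=(n_sims, 100))  # small sample, large spread

# axis=1 runs one t-test per row
student_p = stats.ttest_ind(big, small, axis=1).pvalue
welch_p = stats.ttest_ind(big, small, axis=1, equal_var=False).pvalue

# Fraction of experiments wrongly declared significant
student_rate = np.mean(student_p < alpha)
welch_rate = np.mean(welch_p < alpha)

print(student_rate)  # noticeably above alpha in this setup
print(welch_rate)    # close to alpha
```

In this setup the smaller sample has the larger variance, which is the situation where the pooled test is most misleading.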
<hr style="border-top: 1px solid #001a79;" />

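Note: Passing a permutations parameter makes ttest_ind estimate the p-value with a permutation test instead of the t-distribution: the pooled observations are randomly reassigned to the two groups that many times. More permutations give a more precise estimate of the p-value.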

In [12]:
#Use 10000 random permutations to estimate the p-value; random_state=rng controls the shuffling
stats.ttest_ind(rvs1, rvs5, permutations=10000, random_state=rng)

Ttest_indResult(statistic=-2.009960191604169, pvalue=0.0416)
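
To show the idea behind a permutation test, here is a hand-written sketch. It is not SciPy's implementation: it uses the absolute difference in means as the test statistic (SciPy permutes the t statistic), and the function name `permutation_p_value`, the seed and the samples are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(7)  # illustrative seed, not from the original notebook
x = rng.normal(loc=0, scale=1, size=50)
y = rng.normal(loc=2, scale=1, size=50)  # clearly shifted mean

def permutation_p_value(x, y, n_resamples, rng):
    """Estimate a two-sided p-value by shuffling the group labels."""
    pooled = np.concatenate([x, y])
    observed = abs(x.mean() - y.mean())
    count = 0
    for _ in range(n_resamples):
        # Randomly reassign the pooled observations to two groups
        shuffled = rng.permutation(pooled)
        diff = abs(shuffled[:len(x)].mean() - shuffled[len(x):].mean())
        if diff >= observed:
            count += 1
    # The +1 terms count the observed arrangement itself, so p is never 0
    return (count + 1) / (n_resamples + 1)

p_perm = permutation_p_value(x, y, n_resamples=2000, rng=rng)
print(p_perm)
```

Because the estimate is a proportion of shuffles, its resolution is limited by the number of resamples, which is why a larger permutations value gives a finer-grained p-value.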

<hr style="border-top: 1px solid #001a79;" />
<h4 style="color: #001a79;">Example</h4>

In [13]:
#Create 2 samples a and b as tuples of fixed values; note that a contains an extreme value (763.3)
a = (56, 128.6, 12, 123.8, 64.34, 78, 763.3)
b = (1.1, 2.9, 4.2)

In [14]:
#Run the ttest_ind function on the 2 samples
#A nonzero trim performs a trimmed (Yuen's) t-test; trim=.2 trims 20% of the elements from each end of each sample
stats.ttest_ind(a, b, trim=.2)

Ttest_indResult(statistic=3.4463884028073513, pvalue=0.01369338726499547)
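
To see what the trimming does to these samples, the sketch below uses `scipy.stats.trim_mean` to compare ordinary and trimmed means. It illustrates only the trimming step, not the full Yuen's t statistic; the variable names other than `a` and `b` are illustrative.

```python
import numpy as np
from scipy import stats

a = (56, 128.6, 12, 123.8, 64.34, 78, 763.3)
b = (1.1, 2.9, 4.2)

# Ordinary mean of a is pulled upward by the extreme value 763.3
plain_mean_a = np.mean(a)

# 20% of 7 elements rounds down to 1, so the smallest (12) and largest (763.3) are dropped
trimmed_mean_a = stats.trim_mean(a, proportiontocut=0.2)

# 20% of 3 elements rounds down to 0, so nothing is trimmed from b
trimmed_mean_b = stats.trim_mean(b, proportiontocut=0.2)

print(plain_mean_a)
print(trimmed_mean_a)
print(trimmed_mean_b)
```

Trimming makes the comparison less sensitive to outliers such as 763.3, which is the motivation for using `trim` with heavy-tailed or contaminated data.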