# One Sample T test - ttest_1samp(sample, pop_mean)

The one-sample t-test is a statistical hypothesis test used to determine whether an unknown population mean is different from a specific value.

Define Hypotheses:

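Null Hypothesis (h0): the mean of the population from which the sample was drawn equals pop_mean, i.e. there is no significant difference between the sample mean and the population mean

Alternate Hypothesis (h1): the mean of the population from which the sample was drawn differs from pop_mean, i.e. there is a significant difference between the sample mean and the population mean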

In [1]:
import numpy as np

Create a population

In [2]:
# 100 random integer scores in [0, 100) (upper bound is exclusive), stored as floats
scores = np.random.randint(0, 100, size=100).astype(float)
scores


array([43., 92., 71., 18., 15., 18., 90., 86., 41., 67., 24., 57., 64.,
       46., 25., 86., 94., 30., 49., 27., 79., 82., 22., 57.,  7., 39.,
       47., 43., 10., 22., 25., 81., 16., 23., 57., 86., 11.,  8., 69.,
       72., 83., 10., 19., 32., 28., 70., 50., 78., 67., 72., 13., 56.,
       83., 81.,  8., 93., 25., 34., 10., 35., 59., 25., 99., 23., 53.,
       92., 20., 94., 67., 76., 55., 77.,  6., 68., 80., 34., 91., 94.,
       29., 38.,  4., 79., 56., 94., 38., 96., 57., 66., 64., 12.,  3.,
       84., 87., 79., 90., 56., 23.,  2., 61., 49.])

In [3]:
pop_mean = np.mean(scores)
pop_mean

51.26

Generate a random sample from population

In [4]:
sample_size = 30
# np.random.choice samples with replacement by default
sample_scores = np.random.choice(scores, sample_size)
sample_scores

array([38., 32., 34., 56., 22., 71., 49.,  4., 20., 43., 34., 50., 18.,
       28., 39., 68.,  2., 94., 38., 56., 71., 71., 56., 77., 20.,  6.,
       67., 94., 90., 25.])

In [5]:
sample_mean = np.mean(sample_scores)
sample_mean

45.766666666666666

Perform the t-test

In [6]:
from scipy.stats import ttest_1samp

In [7]:
def one_sample_ttest(sample, pop_mean):
    # ttest_1samp returns (t statistic, two-sided p-value); only the p-value is used here
    _, p = ttest_1samp(sample, pop_mean)
    print("p value:", np.round(p, decimals=3))
    if p < 0.05:  # 5% significance level
        print("Null Hypothesis (h0) can be rejected!")  # significant difference between sample and population means
    else:
        print("There is not enough evidence to reject the Null Hypothesis (h0)!")  # no significant difference between sample and population means

In [8]:
one_sample_ttest(sample_scores, pop_mean)

p value: 0.262
There is not enough evidence to reject the Null Hypothesis (h0)!


A p-value is the probability of getting the observed (or a more extreme) result of a test if the null hypothesis (h0) is true.
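To make the p-value definition concrete, the sketch below computes the t statistic and the two-sided p-value by hand and compares them with `ttest_1samp`. The helper name `manual_one_sample_ttest`, the seed, the demo sample, and the hypothesized mean of 50 are all assumptions made for illustration; they are not part of the notebook above.

```python
import numpy as np
from scipy import stats


def manual_one_sample_ttest(sample, pop_mean):
    """Two-sided one-sample t-test computed step by step (illustrative helper)."""
    sample = np.asarray(sample, dtype=float)
    n = sample.size
    # Standard error of the mean, using the sample standard deviation (ddof=1)
    se = sample.std(ddof=1) / np.sqrt(n)
    # How many standard errors the sample mean lies from the hypothesized mean
    t_stat = (sample.mean() - pop_mean) / se
    # Two-sided p-value: P(|T| >= |t_stat|) under h0, with n - 1 degrees of freedom
    p_value = 2 * stats.t.sf(abs(t_stat), df=n - 1)
    return t_stat, p_value


rng = np.random.default_rng(0)  # arbitrary seed, only for reproducibility
demo_sample = rng.integers(0, 100, size=30).astype(float)  # hypothetical sample

t_manual, p_manual = manual_one_sample_ttest(demo_sample, 50.0)
t_scipy, p_scipy = stats.ttest_1samp(demo_sample, 50.0)

print("manual:", t_manual, p_manual)
print("scipy: ", t_scipy, p_scipy)
```

Both lines should print matching values, which suggests that the p-value reported by `ttest_1samp` is the two-sided tail probability of the t distribution with n - 1 degrees of freedom.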

# 2 Sample T test (Independent) - ttest_ind(sample1, sample2)

The two-sample t-test (independent samples t-test) is a method used to test whether the means of two independent groups are equal.

Define Hypotheses:

Null Hypothesis (h0): there is no significant difference between the means of the independent samples

Alternate Hypothesis (h1): there is a significant difference between the sample means

In [9]:
from scipy.stats import ttest_ind

In [10]:
sample_scores_2=np.random.choice(scores,sample_size)
sample_scores_2

array([57., 96., 25., 43., 94., 23., 24., 67.,  4., 57.,  4., 13., 83.,
       93., 86., 56., 22., 92., 67.,  3., 25., 22., 28., 20., 78., 34.,
       96.,  8., 57., 13.])

In [11]:
sample_2_mean = sample_scores_2.mean()
sample_2_mean

46.333333333333336

In [12]:
def two_sample_ttest(sample1, sample2):
    # ttest_ind returns (t statistic, two-sided p-value); only the p-value is used here
    _, p = ttest_ind(sample1, sample2)
    print("p value:", np.round(p, decimals=3))
    if p < 0.05:  # 5% significance level
        print("Null Hypothesis (h0) can be rejected!")  # significant difference between the means of both samples
    else:
        print("There is not enough evidence to reject the Null Hypothesis (h0)!")  # no significant difference

In [13]:
two_sample_ttest(sample_scores, sample_scores_2)

p value: 0.941
There is not enough evidence to reject the Null Hypothesis (h0)!


Note: Both samples (sample_scores and sample_scores_2) are independent of one another because each was drawn by a separate random selection from scores.
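As a rough illustration of what `ttest_ind` does with its default settings, the sketch below computes the pooled-variance t statistic by hand and compares it with SciPy. It also calls `ttest_ind` with `equal_var=False` (Welch's t-test), which does not assume equal variances in the two groups. The helper name `manual_pooled_ttest`, the seed, and the demo groups are assumptions made for this example only.

```python
import numpy as np
from scipy import stats


def manual_pooled_ttest(sample1, sample2):
    """Two-sided independent t-test with pooled variance (illustrative helper)."""
    a = np.asarray(sample1, dtype=float)
    b = np.asarray(sample2, dtype=float)
    n1, n2 = a.size, b.size
    # Pooled variance: the two sample variances weighted by their degrees of freedom
    pooled_var = ((n1 - 1) * a.var(ddof=1) + (n2 - 1) * b.var(ddof=1)) / (n1 + n2 - 2)
    # Standard error of the difference between the two sample means
    se = np.sqrt(pooled_var * (1 / n1 + 1 / n2))
    t_stat = (a.mean() - b.mean()) / se
    # Two-sided p-value with n1 + n2 - 2 degrees of freedom
    p_value = 2 * stats.t.sf(abs(t_stat), df=n1 + n2 - 2)
    return t_stat, p_value


rng = np.random.default_rng(0)  # arbitrary seed, only for reproducibility
group_a = rng.integers(0, 100, size=30).astype(float)  # hypothetical group 1
group_b = rng.integers(0, 100, size=30).astype(float)  # hypothetical group 2

t_pooled, p_pooled = manual_pooled_ttest(group_a, group_b)
t_ind, p_ind = stats.ttest_ind(group_a, group_b)  # default: equal_var=True
t_welch, p_welch = stats.ttest_ind(group_a, group_b, equal_var=False)

print("manual pooled:", t_pooled, p_pooled)
print("ttest_ind:    ", t_ind, p_ind)
print("Welch:        ", t_welch, p_welch)
```

The first two lines should agree. The Welch result can differ slightly; it is often preferred when the two groups may have different variances.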

# Paired T test - ttest_rel(sample1, sample2)

The paired t-test compares the means of two measurements taken from the same individual, object, or related units.

These paired measurements can represent, for example, a measurement taken at two different times (e.g., pre-test and post-test scores).

Define Hypotheses:

Null Hypothesis (h0): there is no significant difference between the means of the paired samples

Alternate Hypothesis (h1): there is a significant difference between the means

In [14]:
import scipy.stats as stats

In [15]:
# create two paired samples, e.g. scores and scores_after_revision
np.random.seed(12345678)  # fix the random seed to get the same numbers on every run

s1 = stats.norm.rvs(loc=5,scale=10,size=100)
s2 = (stats.norm.rvs(loc=5,scale=10,size=100) + stats.norm.rvs(scale=0.2,size=100))

print("s1")
print(s1)
print("mean:", s1.mean())
print("----"*20)
print("s2")
print(s2)
print("mean:", s2.mean())

s1
[ 10.53708189  -9.59631993  -7.94585139 -10.09673945  20.71874901
  -4.75696191   9.80698788  11.25614307  12.22353018  14.10326442
   5.52467471  -9.00124273   3.35660182  -1.69418264  12.11979301
  13.5532108    6.76463381  15.69086258  10.06566129  12.1353775
  11.7155515   15.48822943   4.62792502   5.55703342  24.20794611
  15.71813886  13.5764859   11.78236866  -6.83241     -4.28932255
   2.65451189  -4.8131382   17.78519151  16.19017138  15.63490301
  15.64924909  18.12775051  11.94768725  22.17620133   8.781742
  -0.34058822   7.09047263  13.4541488   19.36486129  12.59518422
  15.45008524  22.94387481   1.01324157  11.5851746    1.10843976
  20.96495218   6.34231086  -3.9810508    4.68886498  16.57386087
   0.82668595   0.88591178   2.739668     8.03099184 -14.57948805
   3.61770438  10.0268816    7.02587537   3.59878701  11.67459141
  -4.24033068   0.12263636  -7.7101517   -9.98352859  12.0897582
  -2.83944367  -3.66835354  15.76232063  15.58171556  -1.21777709
  -6.803945

DataFrame containing both samples and their difference

In [16]:
import pandas as pd
df = pd.DataFrame({
    "s1": s1,
    "s2": s2,
    "s1 - s2": s1 - s2
})

df.head()

          s1         s2    s1 - s2
0  10.537082  15.882354  -5.345272
1  -9.596320   4.106453 -13.702773
2  -7.945851  -6.785630  -1.160222
3 -10.096739  -7.117530  -2.979209
4  20.718749  -2.250749  22.969498


In [17]:
def paired_ttest(sample1, sample2):
    # ttest_rel returns (t statistic, two-sided p-value); only the p-value is used here
    _, p = stats.ttest_rel(sample1, sample2)
    print("p value:", np.round(p, decimals=3))
    if p < 0.05:  # 5% significance level
        print("Null Hypothesis (h0) can be rejected!")  # significant difference between the means
    else:
        print("There is not enough evidence to reject the Null Hypothesis (h0)!")  # no significant difference

In [18]:
paired_ttest(s1, s2)

p value: 0.016
Null Hypothesis (h0) can be rejected!
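One way to read the paired test is as a one-sample t-test on the per-pair differences, tested against a mean of 0. The sketch below checks that equivalence on made-up data; the names `before` and `after`, the seed, and the distribution parameters are assumptions for illustration and are not the s1 and s2 arrays used above.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)  # arbitrary seed, only for reproducibility

# Hypothetical pre-test scores for 25 individuals
before = rng.normal(loc=60, scale=10, size=25)
# Hypothetical post-test scores for the same individuals: each value is
# built from the matching pre-test score, so the pairs are linked
after = before + rng.normal(loc=2, scale=5, size=25)

# Paired t-test on the two related samples
t_rel, p_rel = stats.ttest_rel(before, after)

# One-sample t-test on the differences, against a hypothesized mean of 0
t_one, p_one = stats.ttest_1samp(before - after, 0.0)

print("ttest_rel:           ", t_rel, p_rel)
print("ttest_1samp on diffs:", t_one, p_one)
```

The two results should match, which is why the pairing matters: the test depends only on the differences within each pair, so the two arrays must be ordered so that element i of one corresponds to element i of the other.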
