We want to test whether our **sample mean** differs significantly from the **population mean** of 120. We also know that our **sample** contains 100 individuals. The test statistic is:

$Z = \frac{(\bar{X}-\mu_{0})}{\sigma/\sqrt{n}}$

where:

* $\bar{X}$ is the **sample mean**
* $\mu_{0}$ is the **population mean** under the null hypothesis
* ${\sigma}$ is the **population's standard deviation** (the code below estimates it with the sample's standard deviation)
* $n$ is the number of measurements in our **sample**

In [1]:
import math

sample_mean = 130.1
pop_mean = 120
sample_std = 21.21
n = 100
statistic = (sample_mean - pop_mean)/(sample_std/math.sqrt(n))
print("Statistic is: ", statistic)

Statistic is:  4.761904761904759
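
To judge whether a statistic of this size is surprising under the null hypothesis, it can be converted into a two-sided p-value. The following is a minimal sketch, assuming the statistic is compared against a standard normal distribution; the name `p_value` is introduced here for illustration only.

```python
import math
from scipy import stats

sample_mean = 130.1
pop_mean = 120
sample_std = 21.21
n = 100

statistic = (sample_mean - pop_mean) / (sample_std / math.sqrt(n))

# Two-sided p-value: probability, under Ho, of a statistic at least this far from 0.
# Assumption: the reference distribution is the standard normal.
p_value = 2 * stats.norm.sf(abs(statistic))
print("Two-sided p-value is: ", p_value)
```

A p-value below the chosen significance level (0.05 later in this notebook) would lead us to reject the null hypothesis.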


How does this statistic change if we take other samples?
Let's use the `normal` function to generate fake samples of the same size $n = 100$, drawn from a normal distribution with mean $130.1$ and standard deviation $21.21$.

In [3]:
from scipy import stats
from numpy.random import normal
import numpy as np


samples = {}

for i in range(10):
    sample_name = "sample_" + str(i)
    # Draw a fake sample using our sample's blood pressure mean, its standard deviation and the number of individuals
    samples[sample_name] = normal(loc = 130.1, scale = 21.21, size = n)
    # Use separate key names so the numeric sample_mean and sample_std defined earlier are not overwritten
    mean_key = sample_name + "_mean"
    samples[mean_key] = np.mean(samples[sample_name])
    std_key = sample_name + "_std"
    # ddof=1 gives the sample standard deviation (n - 1 in the denominator)
    samples[std_key] = np.std(samples[sample_name], ddof=1)
    stat_key = sample_name + "_t-statistic"
    samples[stat_key] = (samples[mean_key] - pop_mean)/(samples[std_key]/math.sqrt(n))
    print("The t-statistic for the sample {} is: {}".format(i, samples[stat_key]))

The t-statistic for the sample 0 is: 3.2750312867837077
The t-statistic for the sample 1 is: 4.622336374098095
The t-statistic for the sample 2 is: 3.964949272147459
The t-statistic for the sample 3 is: 4.848680582450736
The t-statistic for the sample 4 is: 4.578633812929876
The t-statistic for the sample 5 is: 3.2542210630019093
The t-statistic for the sample 6 is: 4.7668471278408555
The t-statistic for the sample 7 is: 4.116877032780341
The t-statistic for the sample 8 is: 3.808985226636057
The t-statistic for the sample 9 is: 4.228532164324372
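
The hand-computed statistic should agree with the one SciPy reports. Here is a minimal sketch, assuming a freshly generated sample; the seed `0` and the names `rng`, `sample`, `manual_t` and `scipy_t` are illustrative and not part of the original notebook.

```python
import math
import numpy as np
from scipy import stats

pop_mean = 120
n = 100

# Hypothetical seeded generator so the sketch is reproducible
rng = np.random.default_rng(0)
sample = rng.normal(loc=130.1, scale=21.21, size=n)

# Manual t-statistic, using the sample standard deviation (ddof=1)
manual_t = (np.mean(sample) - pop_mean) / (np.std(sample, ddof=1) / math.sqrt(n))

# SciPy's one-sample t-test: index 0 is the statistic, index 1 the p-value
result = stats.ttest_1samp(sample, pop_mean)
scipy_t = result[0]
p_value = result[1]

print("Manual t-statistic:", manual_t)
print("SciPy t-statistic: ", scipy_t)
```

The p-value returned by `ttest_1samp` is two-sided and based on a t distribution with $n - 1$ degrees of freedom.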


In [5]:
print(samples[sample_name])


[132.21123169 121.45873628 147.40960197 116.90445037 141.43870411
 161.79185953 127.00334717 117.53388438 147.05285817 175.36143005
 103.03708381 118.54401447 144.99253034 127.71532468 108.21648299
 145.68791094  96.49751449  59.79836385  75.70085549 140.05909637
 136.29620672 119.35359169 140.11941326 138.61000793 118.67714832
 147.75822887 120.08811895 113.45724733 115.82396207 149.52265942
 147.52380119 121.95830384 153.70761398 139.21281502 103.18605563
  91.75096603 151.15596724 153.56679594 153.57852668  96.899623
 131.31216776 113.48433325 124.94033621 107.53728222 149.06508415
 149.02242382 133.76488076 129.67014949 137.08732685 147.8350795
 114.45503208 165.04683031 117.17989868 119.70530325  87.80158123
 132.9497362  162.68894735 150.09084076 122.64049619 144.69945847
 143.05378608 150.35619246 109.96846735 140.76636952 172.16182275
 122.21358789 135.63480131 131.14802143 155.05310817 117.84804136
 117.33907598 135.89050777 122.08500087 117.34206876 117.33326439
 120.54010259

In [6]:
# Sanity check: the inspected sample should contain 100 values
len(samples[sample_name])

100

In [4]:
print("Assuming a significance level of 0.05")
print()

for i in range(10):
    sample_name = "sample_" + str(i)
    # pop_mean (120) is the population's mean under Ho; index 1 of the result is the p-value.
    p_value = stats.ttest_1samp(samples[sample_name], pop_mean)[1]
    print("The p-value of sample {} is: {:-5.3}".format(i, p_value))
    print("The values in the sample are: ")
    print(samples[sample_name])
    sample_mean = "sample_" + str(i) + "_mean"
    print(samples[sample_mean])
    print()
    if p_value < 0.05:
        print("Therefore we reject the null hypothesis Ho, as a sample as extreme as sample {} is very unlikely given Ho.".format(i))
    else:
        print("We fail to reject the null hypothesis Ho, as sample {} is consistent with Ho.".format(i))
    print()

Assuming a significance level of 0.05

The p-value of sample 0 is: 0.00146
The values in the sample are: 
[112.68480647 137.85906621 101.04686139 144.57056105 131.6364257
 149.54139438 131.87438288 137.23908788 163.68096022 122.3512249
 117.02873126 168.54455769 117.46895739 158.42978693 133.72286627
 158.39954818 125.9364854  113.61771312 108.39450549 152.48029922
 107.49162647 125.00696906  87.0845045  143.58953223 126.07763773
 107.4848427  100.42733046  97.04690698 105.19325266 135.09315708
 103.760194   134.92549375 122.45208439 131.60828618 124.85640731
 102.38710074 100.25934171 156.08315588 118.84806746 162.1157776
 121.06775963 119.43409205 109.85478464 155.62919779 114.68818762
 105.67889688 119.58114012 126.4963607  137.72677887 121.20210668
 150.66637648 118.49763459 150.33012616 148.71123415 141.30324537
 122.14046245 172.14532485 140.41109274 119.563647   142.70752936
 135.77368558  89.19638586 133.18810501 117.23000128 103.89738079
 132.07893939  79.60473129 120.60847102