### One Sample Test
A one sample test is used to test whether or not the mean of a population is equal to a hypothesized value.

For example, suppose we want to know whether or not the mean weight of a certain species of turtle is equal to 310 pounds.

To test this, we go out and collect a simple random sample of turtles with the following weights:

In [2]:
import scipy.stats as stats

#define data
data = [300, 315, 320, 311, 314, 309, 300, 308, 305, 303, 305, 301, 303]

#perform one sample t-test
stats.ttest_1samp(a=data, popmean=310)

(-1.5848116313861254, 0.1389944275158753)

In [6]:
# for z-test
from statsmodels.stats.weightstats import ztest
# value is the mean under the null hypothesis (or the mean difference in the two sample case)
ztest(data, value=310)

(-1.5848116313861254, 0.11300913889451164)

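Because the p-value of the t-test (0.1389) is greater than alpha = 0.05, we fail to reject the null hypothesis of the test. The z-test produces the same test statistic with a smaller p-value (0.1130), which is also greater than 0.05 and leads to the same conclusion.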

We do not have sufficient evidence to say that the mean weight for this particular species of turtle is different from 310 pounds.
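To see why the t-test and the z-test report the same statistic but different p-values, the calculation can be reproduced by hand. The following is a minimal sketch, assuming a two-sided test and the sample standard deviation (n - 1 denominator); the variable names are illustrative and do not come from either library.

```python
import math
import statistics
from scipy import stats

data = [300, 315, 320, 311, 314, 309, 300, 308, 305, 303, 305, 301, 303]
popmean = 310  # mean under the null hypothesis

n = len(data)
sample_mean = statistics.mean(data)
sample_sd = statistics.stdev(data)  # sample standard deviation (n - 1 denominator)
standard_error = sample_sd / math.sqrt(n)

# the test statistic is the same for the t-test and the z-test
statistic = (sample_mean - popmean) / standard_error

# two-sided p-values: t distribution with n - 1 degrees of freedom vs. standard normal
p_t = 2 * stats.t.sf(abs(statistic), df=n - 1)
p_z = 2 * stats.norm.sf(abs(statistic))

print(statistic, p_t, p_z)
```

The t distribution has heavier tails than the normal distribution, so for a small sample such as this one the t-test p-value is the larger of the two.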

### Two Sample Test

A two sample test is used to test whether or not the means of two populations are equal.

For example, suppose we want to know whether or not the mean weights of two different species of turtles are equal.

In [4]:
import scipy.stats as stats 

#define array of turtle weights for each sample
sample1 = [300, 315, 320, 311, 314, 309, 300, 308, 305, 303, 305, 301, 303]
sample2 = [335, 329, 322, 321, 324, 319, 304, 308, 305, 311, 307, 300, 305]

#perform two sample t-test
stats.ttest_ind(a=sample1, b=sample2)

Ttest_indResult(statistic=-2.10090292575557, pvalue=0.04633501389516513)

In [5]:
# for z-test
ztest(sample1, sample2, value = 0)

(-2.1009029257555696, 0.03564948854888866)

Since the p-value of the t-test (0.0463) is less than 0.05, we reject the null hypothesis. The z-test p-value (0.0356) is also less than 0.05 and leads to the same conclusion.

This means we have sufficient evidence to say that the mean weights of the two species are not equal.
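By default, `stats.ttest_ind` assumes that both populations have the same variance and pools the two sample variances. The sketch below reproduces that pooled calculation by hand and, as an alternative not used in the original analysis, runs Welch's t-test through the `equal_var=False` argument, which drops the equal-variance assumption. Variable names are illustrative.

```python
import numpy as np
from scipy import stats

sample1 = np.array([300, 315, 320, 311, 314, 309, 300, 308, 305, 303, 305, 301, 303])
sample2 = np.array([335, 329, 322, 321, 324, 319, 304, 308, 305, 311, 307, 300, 305])
n1, n2 = len(sample1), len(sample2)

# pooled variance: weighted average of the two sample variances
pooled_var = ((n1 - 1) * sample1.var(ddof=1) + (n2 - 1) * sample2.var(ddof=1)) / (n1 + n2 - 2)

# t statistic and two-sided p-value with n1 + n2 - 2 degrees of freedom
t_stat = (sample1.mean() - sample2.mean()) / np.sqrt(pooled_var * (1 / n1 + 1 / n2))
p_value = 2 * stats.t.sf(abs(t_stat), df=n1 + n2 - 2)
print(t_stat, p_value)

# Welch's t-test: does not assume equal population variances
welch = stats.ttest_ind(sample1, sample2, equal_var=False)
print(welch)
```

Because the two samples have the same size here, the pooled and Welch statistics coincide; only the degrees of freedom, and therefore the p-value, differ.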

### Paired Samples Test

A paired samples test is used to compare the means of two samples when each observation in one sample can be paired with an observation in the other sample.

For example, suppose we want to know whether or not a certain training program is able to increase the max vertical jump (in inches) of basketball players.

To test this, we may recruit a simple random sample of 12 college basketball players and measure each of their max vertical jumps. Then, we may have each player use the training program for one month and then measure their max vertical jump again at the end of the month.

The following data shows the max jump height (in inches) before and after using the training program for each player:

In [13]:
import scipy.stats as stats  

#define before and after max jump heights
before = [22, 24, 20, 19, 19, 20, 22, 25, 24, 23, 22, 21]
after = [23, 25, 20, 24, 18, 22, 23, 28, 24, 25, 24, 20]

#perform paired samples t-test
stats.ttest_rel(a=before, b=after)

Ttest_relResult(statistic=-2.5289026942943655, pvalue=0.02802807458682508)

In [23]:
# for z-test
# When the hypothesized mean difference is zero (as stated in the null hypothesis),
# the paired z-test reduces to a one sample z-test on the paired differences:
# we test whether the mean of the differences is different from 0.
import numpy as np  # plain Python lists do not support element-wise subtraction

diff = np.array(before) - np.array(after)  # element-wise differences
ztest(diff, value=0)

(-2.5289026942943655, 0.011441974423335222)

Since the p-value of the t-test (0.0280) is less than 0.05, we reject the null hypothesis. The z-test p-value (0.0114) is also less than 0.05 and leads to the same conclusion.

This means we have sufficient evidence to say that the mean jump heights before and after using the training program are not equal.
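The same reduction applies to the t-test: a paired samples t-test is equivalent to a one sample t-test on the paired differences against a mean of 0. A short sketch that checks this on the data above (the name `diff` is illustrative):

```python
import numpy as np
from scipy import stats

before = np.array([22, 24, 20, 19, 19, 20, 22, 25, 24, 23, 22, 21])
after = np.array([23, 25, 20, 24, 18, 22, 23, 28, 24, 25, 24, 20])

# element-wise differences for each player
diff = before - after

# paired t-test on the two samples
paired = stats.ttest_rel(before, after)

# one sample t-test on the differences, null hypothesis: mean difference = 0
one_sample = stats.ttest_1samp(diff, popmean=0)

print(paired)
print(one_sample)
```

Both calls should report the same statistic and p-value, since they carry out the same computation.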

### Chi-Square Test of Independence


A Chi-Square Test of Independence is used to determine whether or not there is a significant association between two categorical variables.

Suppose we want to know whether or not gender is associated with political party preference. We take a simple random sample of 500 voters and survey them on their political party preference. The following table shows the results of the survey:

| | Republican | Democrat | Independent | Total |
|---|---|---|---|---|
| Male | 120 | 90 | 40 | 250 |
| Female | 110 | 95 | 45 | 250 |
| Total | 230 | 185 | 85 | 500 |

In [24]:
# contingency table: rows are genders (male, female),
# columns are party preferences (Republican, Democrat, Independent)
data = [[120, 90, 40],
        [110, 95, 45]]

stats.chi2_contingency(data)


(0.8640353908896108,
 0.6491978887380976,
 2,
 array([[115. ,  92.5,  42.5],
        [115. ,  92.5,  42.5]]))

The way to interpret the output is as follows:

- Chi-Square test statistic: 0.864
- p-value: 0.649
- Degrees of freedom: 2, calculated as (#rows - 1) * (#columns - 1)
- Array: the expected values for each cell in the contingency table

Recall that the Chi-Square Test of Independence uses the following null and alternative hypotheses:

- H0 (null hypothesis): The two variables are independent.
- H1 (alternative hypothesis): The two variables are not independent.

Since the p-value of the test (0.649) is not less than 0.05, we fail to reject the null hypothesis. This means we do not have sufficient evidence to say that there is an association between the two variables.

In other words, the data are consistent with gender and political party preference being independent (no detectable relationship between a voter's gender and the political party they support).
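The expected counts, test statistic, degrees of freedom, and p-value reported by `chi2_contingency` can be reproduced by hand. The following is a minimal sketch, assuming no continuity correction (which applies here, since the table has 2 degrees of freedom); the variable names are illustrative.

```python
import numpy as np
from scipy import stats

# observed counts: rows are genders, columns are party preferences
observed = np.array([[120, 90, 40],
                     [110, 95, 45]])

row_totals = observed.sum(axis=1, keepdims=True)  # shape (2, 1)
col_totals = observed.sum(axis=0, keepdims=True)  # shape (1, 3)
total = observed.sum()

# expected count under independence: row total * column total / grand total
expected = row_totals * col_totals / total

# chi-square statistic: sum of (observed - expected)^2 / expected over all cells
chi2_stat = ((observed - expected) ** 2 / expected).sum()

# degrees of freedom: (#rows - 1) * (#columns - 1)
dof = (observed.shape[0] - 1) * (observed.shape[1] - 1)

# p-value: upper tail of the chi-square distribution
p_value = stats.chi2.sf(chi2_stat, dof)

print(expected)
print(chi2_stat, dof, p_value)
```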