# HYPOTHESIS TESTING 

#### The scripts in this file were produced for the METU IS709 course, Assignment 2.
##### The assignment: work on statistical hypothesis testing, try different statistical tests, and comment on the findings.


**Import the libraries you need.**  

In [2]:
import pandas as pd
import statistics
import numpy as np
import scipy 
from scipy import stats
from scipy.stats import chisquare
import math
%matplotlib inline
import statsmodels.api as sa
import statsmodels.formula.api as sfa
import scikit_posthocs as sp

  I have followed the **steps in a statistical test** mentioned on the 5th lecture slides (slide no. 30):

**1.** Statement of the question to be answered by the study   
**2.** Formulation of the null and alternative hypotheses  
**3.** Decision for a suitable statistical test  
**4.** Specification of the level of significance (for example, 0.05)  
**5.** Performance of the statistical test analysis (e.g., calculation of the p-value)  
**6.** Statistical decision: for example – p<0.05 leads to rejection of the null hypothesis and acceptance of the alternative hypothesis – p≥0.05 leads to retention of the null hypothesis  
**7.** Interpretation of the test result  
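
Step 6 can be sketched as a small helper. The function name `statistical_decision` and its return strings are illustrative assumptions, not part of the lecture material:

```python
def statistical_decision(p_value, alpha=0.05):
    """Return the step-6 decision for a p-value at significance level alpha."""
    if not 0.0 <= p_value <= 1.0:
        raise ValueError("p_value must lie in [0, 1]")
    # p < alpha: reject H0; p >= alpha: retain H0
    if p_value < alpha:
        return "reject H0"
    return "fail to reject H0"


print(statistical_decision(0.0008))
print(statistical_decision(0.7869))
```

The same rule is applied by hand in each question below.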

# Hypothesis Testing

## Q1.
A botanist is curious about the productivity/efficiency of her newly developed hybrid tomato seed. Therefore, she recorded the amount of product (in kg) obtained from the seedlings that grew from hybrid seeds and regular seeds.

**Conduct the hypothesis testing to check whether there is a difference between the average obtained product of these two seeds by using a 0.05 significance level to evaluate the null and alternative hypotheses. Before doing hypothesis testing, check the related assumptions. Comment on the results.**

regular_seed=[24.88, 20.96, 19.97, 14.29, 18.89, 18.67, 19.46, 17.88, 19.57,
       18.31, 15.88, 22.26, 22.25, 24.65, 19.84, 18.52, 18.11, 15.21,
       22.54, 16.5 , 16.26, 19.1 , 24.  , 20.38, 16.73, 17.63, 21.51,
       19.23, 17.47]
       
hybrid_seed=[22.28, 25.3 , 29.12, 19.14, 21.05, 20.5 , 15.5 , 20.13, 19.82,
       26.48, 22.59, 17.96, 25.  , 21.89, 17.59, 21.15, 21.17, 20.29,
       23.09, 16.03, 22.16, 26.14, 25.64, 26.43, 26.47, 27.61, 19.53,
       25.62, 17.23, 21.13, 17.06, 26.24, 27.13, 22.38]


1. Statement of the Question:  
Is there a **difference** between the average obtained product of these two seeds?

2. Hypotheses:  
$H_{0}$: $μ_{ regular}$ = $μ_{hybrid}$   
$H_{a}$: $μ_{ regular}$ $\neq$ $μ_{hybrid}$  

    We are interested in whether there is a difference, so this is a two-tailed test.

3. Deciding on the Statistical Test:  
Here we have the amount in kilograms as floating-point values, so the scale of measurement is continuous.
Before choosing a test statistic, we should check whether the data are normally distributed.   

    The list lengths show that we have a small number of observations (29 and 34). Therefore, we can use the Shapiro-Wilk W test to check normality. With alpha = 0.05, if the calculated p-value > alpha we fail to reject the hypothesis that the data are normally distributed.    

In [4]:
regular_seed=[24.88, 20.96, 19.97, 14.29, 18.89, 18.67, 19.46, 17.88, 19.57, 18.31, 15.88, 22.26, 22.25, 24.65, 19.84, 18.52, 18.11, 15.21, 22.54, 16.5 , 16.26, 19.1 , 24. , 20.38, 16.73, 17.63, 21.51, 19.23, 17.47]
hybrid_seed=[22.28, 25.3 , 29.12, 19.14, 21.05, 20.5 , 15.5 , 20.13, 19.82, 26.48, 22.59, 17.96, 25. , 21.89, 17.59, 21.15, 21.17, 20.29, 23.09, 16.03, 22.16, 26.14, 25.64, 26.43, 26.47, 27.61, 19.53, 25.62, 17.23, 21.13, 17.06, 26.24, 27.13, 22.38]
#print("Number of observations for regular seed:", len(regular_seed))
#print("Number of observations for hybrid seed:", len(hybrid_seed))

In [5]:
test_stat_shapiro_regular, p_value_shapiro_regular=stats.shapiro(regular_seed)
test_stat_shapiro_hybrid, p_value_shapiro_hybrid=stats.shapiro(hybrid_seed)
print("For regular seeds p value:%.6f" % p_value_shapiro_regular)
print("For hybrid seeds p value:%.6f" % p_value_shapiro_hybrid)

For regular seeds p value:0.702764
For hybrid seeds p value:0.276011


In [6]:
# cdf='norm' without parameters compares against the standard normal N(0, 1)
resultKS_regular = stats.kstest(regular_seed, cdf='norm', alternative="two-sided")
resultKS_hybrid = stats.kstest(hybrid_seed, cdf='norm', alternative="two-sided")
print("kstest: ", resultKS_regular)
print("kstest: ", resultKS_hybrid)

kstest:  KstestResult(statistic=1.0, pvalue=0.0)
kstest:  KstestResult(statistic=1.0, pvalue=0.0)
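
A statistic of 1.0 arises because `kstest` with `cdf='norm'` and no parameters compares the raw kilogram values against a standard normal distribution. A minimal sketch of one alternative is to pass the sample mean and standard deviation through `args`. The helper name `ks_against_fitted_normal` is hypothetical, and because the parameters are estimated from the same sample, the resulting p-value is only approximate:

```python
import numpy as np
from scipy import stats

regular_seed = [24.88, 20.96, 19.97, 14.29, 18.89, 18.67, 19.46, 17.88, 19.57,
                18.31, 15.88, 22.26, 22.25, 24.65, 19.84, 18.52, 18.11, 15.21,
                22.54, 16.5, 16.26, 19.1, 24.0, 20.38, 16.73, 17.63, 21.51,
                19.23, 17.47]
hybrid_seed = [22.28, 25.3, 29.12, 19.14, 21.05, 20.5, 15.5, 20.13, 19.82,
               26.48, 22.59, 17.96, 25.0, 21.89, 17.59, 21.15, 21.17, 20.29,
               23.09, 16.03, 22.16, 26.14, 25.64, 26.43, 26.47, 27.61, 19.53,
               25.62, 17.23, 21.13, 17.06, 26.24, 27.13, 22.38]


def ks_against_fitted_normal(sample):
    """KS test against a normal with the sample's own mean and std (approximate)."""
    sample = np.asarray(sample, dtype=float)
    mu = sample.mean()
    sigma = sample.std(ddof=1)  # sample standard deviation
    return stats.kstest(sample, "norm", args=(mu, sigma))


ks_regular = ks_against_fitted_normal(regular_seed)
ks_hybrid = ks_against_fitted_normal(hybrid_seed)
print("regular:", ks_regular)
print("hybrid:", ks_hybrid)
```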


3. ...  
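    For both samples the Shapiro-Wilk p-value is greater than alpha, so we fail to reject normality and treat the data as normally distributed. (The Kolmogorov-Smirnov calls above compare the raw values against a standard normal distribution with mean 0 and standard deviation 1, so their results are not informative here.)  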
    
    We are interested in the sample means, and the samples come from two groups that are independent of each other, so the study design is unpaired. Therefore we will use the **unpaired t-test**.
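
The classical unpaired t-test also assumes equal variances in the two groups, which the notebook does not check. A minimal sketch, assuming Levene's test is an acceptable check for this course, with Welch's t-test as the fallback when variances differ:

```python
from scipy import stats

regular_seed = [24.88, 20.96, 19.97, 14.29, 18.89, 18.67, 19.46, 17.88, 19.57,
                18.31, 15.88, 22.26, 22.25, 24.65, 19.84, 18.52, 18.11, 15.21,
                22.54, 16.5, 16.26, 19.1, 24.0, 20.38, 16.73, 17.63, 21.51,
                19.23, 17.47]
hybrid_seed = [22.28, 25.3, 29.12, 19.14, 21.05, 20.5, 15.5, 20.13, 19.82,
               26.48, 22.59, 17.96, 25.0, 21.89, 17.59, 21.15, 21.17, 20.29,
               23.09, 16.03, 22.16, 26.14, 25.64, 26.43, 26.47, 27.61, 19.53,
               25.62, 17.23, 21.13, 17.06, 26.24, 27.13, 22.38]

# H0 of Levene's test: the two groups have equal variances
levene_stat, levene_p = stats.levene(regular_seed, hybrid_seed)
print("Levene p value: %.6f" % levene_p)

# Welch's t-test does not assume equal variances
welch = stats.ttest_ind(regular_seed, hybrid_seed, equal_var=False)
print("Welch t = %.4f, p = %.6f" % (welch.statistic, welch.pvalue))
```

If Levene's p-value is above alpha, the pooled-variance test used in this notebook is reasonable; otherwise the Welch result is the safer one to report.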

4. Specification of Level of Significance  
alpha = 0.05  as stated in the question

5. Performance of the Test Statistics:

In [7]:
regular_data = pd.DataFrame({'regular seed': regular_seed})
#print(regular_data.head())
print(regular_data.describe()) 

       regular seed
count     29.000000
mean      19.343103
std        2.728040
min       14.290000
25%       17.630000
50%       19.100000
75%       20.960000
max       24.880000


In [8]:
hybrid_data = pd.DataFrame({'hybrid seed': hybrid_seed})
#print(hybrid_data.head())
print(hybrid_data.describe()) 

       hybrid seed
count    34.000000
mean     22.260294
std       3.653996
min      15.500000
25%      19.897500
50%      22.025000
75%      25.635000
max      29.120000


5. ...  
From the descriptive statistics we see that the mean amounts of product differ. The mean of the regular seed is smaller than the mean of the hybrid seed, and the same holds for the standard deviations. The minimum amount of product obtained from the hybrid seed is also greater than the minimum for the regular seed.

In [9]:
stats.ttest_ind(a=regular_seed, b=hybrid_seed)

Ttest_indResult(statistic=-3.5381790855069073, pvalue=0.0007771046298238749)

In [10]:
#compute the pooled-variance t-score and a z-score by hand
n_regular = float(regular_data.count())
std_regular = float(regular_data.std())
mean_regular = float(regular_data.mean())

n_hybrid = float(hybrid_data.count())
std_hybrid = float(hybrid_data.std())
mean_hybrid = float(hybrid_data.mean())

observed_difference = mean_regular - mean_hybrid
#standard error of the difference: square root (not half) of the summed variances of the means
std_error = (((std_regular**2)/n_regular)+((std_hybrid**2)/n_hybrid))**(1/2)
#pooled standard deviation times sqrt(1/n_regular + 1/n_hybrid)
denominator = (((((std_regular**2)*(n_regular-1))+((std_hybrid**2)*(n_hybrid-1)))/(n_hybrid+n_regular-2))**(1/2))*(((1/n_regular)+(1/n_hybrid))**(1/2))
t = observed_difference / denominator
print("t-score =", t)
z_seeds = (observed_difference - 0)/std_error  #0 comes from the null hypothesis: mu's are equal
critical_value = 1.96 #two-tailed, alpha = 0.05, from the standard normal table
print("z-score = {:.3f}, z-critical = {}".format(z_seeds, critical_value))

t-score = -3.538179085506903
z-score = -3.620, z-critical = 1.96


6. Statistical Decision:  
    The p-value is smaller than the alpha value of 0.05, so there is strong evidence against the null hypothesis. Therefore, we reject $H_{0}$ in favor of $H_{a}$. 
    
    Degrees of freedom = n_regular + n_hybrid - 2 = 61  
    From the t table, for 61 degrees of freedom and a two-tailed alpha of 0.05 we find t_critical = +/- 1.999624  
    The t-score lies beyond the critical value, in the rejection region, as well.
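
Instead of a printed table, the critical value can be obtained from the t distribution directly. A minimal sketch, where `t_observed` is copied from the `ttest_ind` output above:

```python
from scipy import stats

alpha = 0.05
dof = 29 + 34 - 2  # n_regular + n_hybrid - 2
# two-tailed test: alpha is split between both tails
t_critical = stats.t.ppf(1 - alpha / 2, dof)
t_observed = -3.5381790855069073  # from the ttest_ind result

print("t-critical = +/- %.6f" % t_critical)
print("in rejection region:", abs(t_observed) > t_critical)
```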
    

7. Interpretation of the Results:  
    We reject the claim that there is no difference between the average amount of product obtained from hybrid seeds and from regular seeds of tomatoes. In other words, the difference between the products obtained from regular and hybrid seeds is statistically significant.
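
A confidence interval for the mean difference expresses the same conclusion on the kilogram scale. The following is a sketch under the pooled-variance assumption used by the t-test above. The variable names are illustrative:

```python
import numpy as np
from scipy import stats

regular_seed = [24.88, 20.96, 19.97, 14.29, 18.89, 18.67, 19.46, 17.88, 19.57,
                18.31, 15.88, 22.26, 22.25, 24.65, 19.84, 18.52, 18.11, 15.21,
                22.54, 16.5, 16.26, 19.1, 24.0, 20.38, 16.73, 17.63, 21.51,
                19.23, 17.47]
hybrid_seed = [22.28, 25.3, 29.12, 19.14, 21.05, 20.5, 15.5, 20.13, 19.82,
               26.48, 22.59, 17.96, 25.0, 21.89, 17.59, 21.15, 21.17, 20.29,
               23.09, 16.03, 22.16, 26.14, 25.64, 26.43, 26.47, 27.61, 19.53,
               25.62, 17.23, 21.13, 17.06, 26.24, 27.13, 22.38]

n1, n2 = len(regular_seed), len(hybrid_seed)
diff = np.mean(regular_seed) - np.mean(hybrid_seed)

# pooled variance and standard error of the difference in means
pooled_var = ((n1 - 1) * np.var(regular_seed, ddof=1)
              + (n2 - 1) * np.var(hybrid_seed, ddof=1)) / (n1 + n2 - 2)
std_err = np.sqrt(pooled_var * (1 / n1 + 1 / n2))

# 95% two-sided interval
t_crit = stats.t.ppf(0.975, n1 + n2 - 2)
ci_low, ci_high = diff - t_crit * std_err, diff + t_crit * std_err
print("difference = %.3f, 95%% CI = (%.3f, %.3f)" % (diff, ci_low, ci_high))
```

An interval that excludes 0 agrees with rejecting $H_{0}$ at the 0.05 level.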

## Q2.

Fifteen students were diagnosed with iron deficiency anemia by the METU Health Center in the previous semester. Healthcare personnel told these patients about the dangers of anemia and prescribed an iron supply. The patients came for control two months later, and their iron levels were reexamined.

**According to this information, conduct the hypothesis testing to check whether there is an increase in the iron levels of the patients after the iron supply by using a 0.05 significance level. Before doing hypothesis testing, check the related assumptions. Comment on the results.**

test_results_before_supply=[12.68, 10.65, 10.14,  7.2 ,  9.58,  9.46,  9.87,  9.05,  9.93, 9.28,  8.02, 11.32, 11.32, 12.56, 10.07]  
test_results_after_supply=[12.55, 12.4 , 11.29, 14.08, 11.78, 11.69, 12.77, 14.63, 13.26, 11.87, 12.21, 13.68, 12.82, 12.15, 12.74]

1. Statement of the Question:  
Is there an **increase** in the iron levels of the patients after the iron supply?

2. Hypotheses:  
$H_{0}$: There is no difference in the iron levels of patients after the iron supply (equal, =)  
$H_{a}$: There is an increase in the iron levels of patients after the iron supply  (greater, >)  

    We are interested in whether the level is greater afterwards, so this is a one-tailed test.

3. Deciding on the Statistical Test:  
Here we have the iron levels as real numbers, so the scale of measurement is continuous.
Before choosing a test statistic, we should check whether the data are normally distributed.   

    We have 15 observations, which is quite a small sample size. Therefore, we can use the Shapiro-Wilk W test to check normality. With alpha = 0.05, if the calculated p-value > alpha we fail to reject the hypothesis that the data are normally distributed.   

In [11]:
test_results_before_supply=[12.68, 10.65, 10.14, 7.2 , 9.58, 9.46, 9.87, 9.05, 9.93, 9.28, 8.02, 11.32, 11.32, 12.56, 10.07]
test_results_after_supply=[12.55, 12.4 , 11.29, 14.08, 11.78, 11.69, 12.77, 14.63, 13.26, 11.87, 12.21, 13.68, 12.82, 12.15, 12.74]

In [12]:
test_stat_shapiro_before, p_value_shapiro_before=stats.shapiro(test_results_before_supply)
test_stat_shapiro_after, p_value_shapiro_after=stats.shapiro(test_results_after_supply)
print("Test_results_before_supply p value:%.6f" % p_value_shapiro_before)
print("Test_results_after_supply p value:%.6f" % p_value_shapiro_after)

Test_results_before_supply p value:0.780257
Test_results_after_supply p value:0.611893


In [13]:
resultKS_before = stats.kstest(test_results_before_supply, cdf='norm',alternative="two-sided")
resultKS_after = stats.kstest(test_results_after_supply, cdf='norm',alternative="two-sided")
print("kstest: ", resultKS_before)
print("kstest: ", resultKS_after)

kstest:  KstestResult(statistic=0.9999999999996989, pvalue=3.030600710579958e-188)
kstest:  KstestResult(statistic=1.0, pvalue=0.0)


3. ...  
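    For both arrays the Shapiro-Wilk p-value is greater than alpha, so we fail to reject normality and treat the data as normally distributed. (The Kolmogorov-Smirnov calls above compare the raw values against a standard normal distribution, so their results are not informative here.)  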
    
    We have the observations of the same patients in two conditions. The study design is paired. Therefore, we will use **paired t-test**.
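
For a paired design, the normality assumption applies to the per-patient differences rather than to each array separately. A minimal sketch of that check (the name `differences` is illustrative):

```python
import numpy as np
from scipy import stats

test_results_before_supply = [12.68, 10.65, 10.14, 7.2, 9.58, 9.46, 9.87, 9.05,
                              9.93, 9.28, 8.02, 11.32, 11.32, 12.56, 10.07]
test_results_after_supply = [12.55, 12.4, 11.29, 14.08, 11.78, 11.69, 12.77,
                             14.63, 13.26, 11.87, 12.21, 13.68, 12.82, 12.15,
                             12.74]

# one difference per patient: after minus before
differences = (np.array(test_results_after_supply)
               - np.array(test_results_before_supply))
w_stat, p_value_diff = stats.shapiro(differences)
print("Differences p value:%.6f" % p_value_diff)
```

A p-value above alpha here supports using the paired t-test.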

4. Specification of Level of Significance  
alpha = 0.05 with the confidence of 95% as stated in the question

5. Performance of the Test Statistics:

In [14]:
before_supply_data = pd.DataFrame({'test results before supply': test_results_before_supply, 'test results after supply': test_results_after_supply})
#print(before_supply_data.head())
print(before_supply_data.describe()) 

       test results before supply  test results after supply
count                   15.000000                  15.000000
mean                    10.075333                  12.661333
std                      1.494126                   0.929684
min                      7.200000                  11.290000
25%                      9.370000                  12.010000
50%                      9.930000                  12.550000
75%                     10.985000                  13.040000
max                     12.680000                  14.630000


5. ...  
From the descriptive statistics we see that the means differ. The mean iron level before the supply is smaller than the mean after the supply. After the supply, the standard deviation gets smaller as well. The minimum iron level observed after the supply is also greater than the minimum observed before.

In [15]:
stats.ttest_rel(a=test_results_after_supply, b=test_results_before_supply)

Ttest_relResult(statistic=5.234470546314352, pvalue=0.0001263744085492977)

6. Statistical Decision:  
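    The reported p-value is two-sided. Because the t-score is positive (in the direction of $H_{a}$), the one-tailed p-value is half of it, about 0.000063, which is smaller than the alpha value of 0.05. Therefore, we reject $H_{0}$ in favor of $H_{a}$. 
    
    Degrees of freedom for a paired test = n_pairs - 1 = 14  
    From the t table, for 14 degrees of freedom and a one-tailed alpha of 0.05 we find t_critical = 1.761310  
    The t-score (5.234) lies beyond the critical value, in the rejection region, as well.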
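
The one-tailed version of the paired test can be reproduced by hand from the per-patient differences. This is a sketch that avoids relying on any particular SciPy version's `alternative` argument. The variable names are illustrative:

```python
import numpy as np
from scipy import stats

test_results_before_supply = [12.68, 10.65, 10.14, 7.2, 9.58, 9.46, 9.87, 9.05,
                              9.93, 9.28, 8.02, 11.32, 11.32, 12.56, 10.07]
test_results_after_supply = [12.55, 12.4, 11.29, 14.08, 11.78, 11.69, 12.77,
                             14.63, 13.26, 11.87, 12.21, 13.68, 12.82, 12.15,
                             12.74]

differences = (np.array(test_results_after_supply)
               - np.array(test_results_before_supply))
n_pairs = len(differences)
dof = n_pairs - 1  # paired test: number of pairs minus one

# t = mean difference divided by its standard error
t_score = differences.mean() / (differences.std(ddof=1) / np.sqrt(n_pairs))
# one-tailed p-value for H_a: after > before (upper tail)
p_one_tailed = stats.t.sf(t_score, dof)
t_critical = stats.t.ppf(1 - 0.05, dof)

print("t-score = %.6f, one-tailed p = %.8f" % (t_score, p_one_tailed))
print("t-critical (one tailed) = %.6f" % t_critical)
```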
    

7. Interpretation of the Results:  
    We reject the claim that there is no difference in the iron levels of patients after the iron supply, and we accept that iron levels after the iron supply are greater than before. Although we cannot directly state that the supply caused the increase, because we do not have all the necessary information, we can say that the patients' iron levels increased after using the iron supply.

## Q3. 

A pediatrician wants to see the effect of formula consumption on the average monthly weight gain (in gr) of babies. For this reason, she collected data from three different groups. The first group is exclusively breastfed children (receiving only breast milk), the second group is children fed only formula, and the last group is children who are both formula-fed and breastfed. The data are given below.


only_breast=[794.1, 716.9, 993. , 724.7, 760.9, 908.2, 659.3 , 690.8, 768.7,
       717.3 , 630.7, 729.5, 714.1, 810.3, 583.5, 679.9, 865.1]      
   
only_formula=[ 898.8,  881.2,  940.2,  966.2,  957.5, 1061.7, 1046.2,  980.4,
        895.6,  919.7, 1074.1,  952.5,  796.3,  859.6,  871.1 , 1047.5,
        919.1 , 1160.5,  996.9]     
        
both=[976.4, 656.4, 861.2, 706.8, 718.5, 717.1, 759.8, 894.6, 867.6,
       805.6, 765.4, 800.3, 789.9, 875.3, 740. , 799.4, 790.3, 795.2 ,
       823.6, 818.7, 926.8, 791.7, 948.3]  
**According to this information, conduct the hypothesis testing to check whether there is a difference between the average monthly gain of these three groups by using a 0.05 significance level. If there is a significant difference, perform further analysis to find what caused the difference.  Before doing hypothesis testing, check the related assumptions. Comment on the results.**


1. Statement of the Question:  
Is there a **difference** between the average monthly gain of these three groups?  
If there is a significant difference, which groups caused it?

2. Hypotheses:  
$H_{0}$: $μ_{only  breast}$ = $μ_{only  formula}$ = $μ_{both}$  
$H_{a}$: At least one of the means is different from the others.   
(alpha = 0.05)


3. Deciding on the Statistical Test:  
Here we have the weight gain values as real numbers (floating-point values), so the scale of measurement is continuous.
Before choosing a test statistic, we should check whether the data are normally distributed.   

    The sample sizes are quite small. Therefore, we can use the Shapiro-Wilk W test to check normality. With alpha = 0.05, if the calculated p-value > alpha we fail to reject the hypothesis that the data are normally distributed.   

In [16]:
only_breast=[794.1, 716.9, 993. , 724.7, 760.9, 908.2, 659.3 , 690.8, 768.7, 717.3 , 630.7, 729.5, 714.1, 810.3, 583.5, 679.9, 865.1]
only_formula=[ 898.8, 881.2, 940.2, 966.2, 957.5, 1061.7, 1046.2, 980.4, 895.6, 919.7, 1074.1, 952.5, 796.3, 859.6, 871.1 , 1047.5, 919.1 , 1160.5, 996.9]
both=[976.4, 656.4, 861.2, 706.8, 718.5, 717.1, 759.8, 894.6, 867.6, 805.6, 765.4, 800.3, 789.9, 875.3, 740. , 799.4, 790.3, 795.2 , 823.6, 818.7, 926.8, 791.7, 948.3]
print("Number of observations for only breast:",len(only_breast))
print("Number of observations for only formula:",len(only_formula))
print("Number of observations for both:",len(both))

Number of observations for only breast: 17
Number of observations for only formula: 19
Number of observations for both: 23


In [17]:
test_stat_shapiro_a1, p_value_shapiro_a1=stats.shapiro(only_breast)
test_stat_shapiro_a2, p_value_shapiro_a2=stats.shapiro(only_formula)
test_stat_shapiro_a3, p_value_shapiro_a3=stats.shapiro(both)
print("only_breast p value:%.6f" % p_value_shapiro_a1)
print("only_formula p value:%.6f" % p_value_shapiro_a2)
print("both p value:%.6f" % p_value_shapiro_a3)

only_breast p value:0.469420
only_formula p value:0.887907
both p value:0.797288


In [18]:
resultKS_breast = stats.kstest(only_breast, cdf='norm',alternative="two-sided")
resultKS_formula = stats.kstest(only_formula, cdf='norm',alternative="two-sided")
resultKS_both = stats.kstest(both, cdf='norm',alternative="two-sided")
print("kstest: ", resultKS_breast)
print("kstest: ", resultKS_formula)
print("kstest: ", resultKS_both)

kstest:  KstestResult(statistic=1.0, pvalue=0.0)
kstest:  KstestResult(statistic=1.0, pvalue=0.0)
kstest:  KstestResult(statistic=1.0, pvalue=0.0)


3. ...  
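    For each group the Shapiro-Wilk p-value is greater than alpha, so we fail to reject normality and treat the data as normally distributed. (The Kolmogorov-Smirnov calls above compare the raw values against a standard normal distribution, so their results are not informative here.)  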
    
    There are three different groups in the experiment, and the study design is unpaired. There is only one independent variable, the feeding type (formula consumption). Therefore, we will use **One-Way ANOVA**.
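
One-way ANOVA also assumes that the groups have similar variances, which is not checked above. A minimal sketch, assuming Levene's test is an acceptable check for this purpose:

```python
from scipy import stats

only_breast = [794.1, 716.9, 993.0, 724.7, 760.9, 908.2, 659.3, 690.8, 768.7,
               717.3, 630.7, 729.5, 714.1, 810.3, 583.5, 679.9, 865.1]
only_formula = [898.8, 881.2, 940.2, 966.2, 957.5, 1061.7, 1046.2, 980.4,
                895.6, 919.7, 1074.1, 952.5, 796.3, 859.6, 871.1, 1047.5,
                919.1, 1160.5, 996.9]
both = [976.4, 656.4, 861.2, 706.8, 718.5, 717.1, 759.8, 894.6, 867.6,
        805.6, 765.4, 800.3, 789.9, 875.3, 740.0, 799.4, 790.3, 795.2,
        823.6, 818.7, 926.8, 791.7, 948.3]

# H0 of Levene's test: all three groups have equal variances
levene_stat, levene_p = stats.levene(only_breast, only_formula, both)
print("Levene statistic = %.4f, p value = %.6f" % (levene_stat, levene_p))
```

A p-value above alpha means there is no evidence against the equal-variance assumption.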

4. Specification of Level of Significance  
alpha = 0.05 with the confidence of 95% as stated in the question

5. Performance of the Test Statistics:

In [19]:
df_only_breast = pd.DataFrame({'formula_consumption': only_breast, 'type': list(map(lambda x: 'only breast', range(len(only_breast)))) })
print(df_only_breast.head())
print(df_only_breast.describe()) 

   formula_consumption         type
0                794.1  only breast
1                716.9  only breast
2                993.0  only breast
3                724.7  only breast
4                760.9  only breast
       formula_consumption
count            17.000000
mean            749.823529
std             102.007062
min             583.500000
25%             690.800000
50%             724.700000
75%             794.100000
max             993.000000


In [20]:
df_only_formula = pd.DataFrame({'formula_consumption': only_formula, 'type': list(map(lambda x: 'only formula', range(len(only_formula))))})
print(df_only_formula.head())
print(df_only_formula.describe()) 

   formula_consumption          type
0                898.8  only formula
1                881.2  only formula
2                940.2  only formula
3                966.2  only formula
4                957.5  only formula
       formula_consumption
count            19.000000
mean            959.215789
std              89.159004
min             796.300000
25%             897.200000
50%             952.500000
75%            1021.550000
max            1160.500000


In [21]:
df_both = pd.DataFrame({'formula_consumption': both, 'type': list(map(lambda x: 'both', range(len(both))))})
print(df_both.head())
print(df_both.describe()) 

   formula_consumption  type
0                976.4  both
1                656.4  both
2                861.2  both
3                706.8  both
4                718.5  both
       formula_consumption
count            23.000000
mean            809.952174
std              79.859546
min             656.400000
25%             762.600000
50%             799.400000
75%             864.400000
max             976.400000


In [22]:
alpha = 0.05
fscore, pvalue = stats.f_oneway(only_breast, only_formula, both)

print("f-score = {}, p-value = {}".format(fscore, pvalue))

f-score = 26.701251857537958, p-value = 7.18623550288582e-09


In [23]:
f_crit = scipy.stats.f.ppf(q=1-0.05, dfn=2, dfd=56)
print("f-critical = {}".format(f_crit))

f-critical = 3.161861164913022


6. Statistical Decision:  
    The p-value is smaller than the alpha value of 0.05. Therefore, we reject $H_{0}$.
    
    Also, the further the F-score is from 1, the more likely it is that the group means differ significantly.
    
    When we check f-critical from the table with the following values;
    Degrees of freedom: numerator	2  
    Degrees of freedom: denominator	56  
    Probability level:	0.05  
    Critical value of F:	3.16  
    
    We see that f-score > f-critical. Calculated f-score is in the rejection area as well.


7. Interpretation of the Results:  
    We reject the claim that there is no difference in the means of the three groups. The average monthly weight gain of babies who are only breastfed, only formula-fed, or both breastfed and formula-fed is not the same. In other words, at least one group's average monthly weight gain is different, but we do not yet know which group differs. Therefore, we will carry out a post hoc test following our parametric ANOVA test.

In [24]:
df_concat = pd.concat([df_only_breast, df_only_formula, df_both]) 
#df_concat

In [25]:
#anova calculations once again:
lm = sfa.ols('formula_consumption ~ C(type)', data=df_concat).fit()
anova = sa.stats.anova_lm(lm)
print(anova)

            df         sum_sq        mean_sq          F        PR(>F)
C(type)    2.0  429013.775232  214506.887616  26.701252  7.186236e-09
Residual  56.0  449880.993243    8033.589165        NaN           NaN


In [26]:
#posthoc test
sp.posthoc_ttest(df_concat, val_col='formula_consumption', group_col='type', p_adjust='holm')

               only breast  only formula          both
only breast   1.000000e+00  4.702207e-07  4.315100e-02
only formula  4.702207e-07  1.000000e+00  2.000000e-06
both          4.315143e-02  2.338495e-06  1.000000e+00


7. ...  
When Student's t-test is applied pairwise with the Holm adjustment, we can compare the adjusted p-values of the pairs. Every adjusted p-value is below 0.05, so each pair of means differs significantly: $μ_{only  breast}$ $\neq$ $μ_{only  formula}$, $μ_{only  formula}$ $\neq$ $μ_{both}$, and $μ_{only  breast}$ $\neq$ $μ_{both}$. The average weight gain of the babies therefore differs across all three feeding types.
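
To see where the adjusted p-values come from, the post hoc step can be approximated by hand. This sketch assumes it amounts to pairwise equal-variance t-tests followed by a Holm step-down correction. The names `raw_p` and `adjusted` are illustrative:

```python
from itertools import combinations
from scipy import stats

groups = {
    "only breast": [794.1, 716.9, 993.0, 724.7, 760.9, 908.2, 659.3, 690.8,
                    768.7, 717.3, 630.7, 729.5, 714.1, 810.3, 583.5, 679.9,
                    865.1],
    "only formula": [898.8, 881.2, 940.2, 966.2, 957.5, 1061.7, 1046.2, 980.4,
                     895.6, 919.7, 1074.1, 952.5, 796.3, 859.6, 871.1, 1047.5,
                     919.1, 1160.5, 996.9],
    "both": [976.4, 656.4, 861.2, 706.8, 718.5, 717.1, 759.8, 894.6, 867.6,
             805.6, 765.4, 800.3, 789.9, 875.3, 740.0, 799.4, 790.3, 795.2,
             823.6, 818.7, 926.8, 791.7, 948.3],
}

pairs = list(combinations(groups, 2))
raw_p = [stats.ttest_ind(groups[a], groups[b]).pvalue for a, b in pairs]

# Holm step-down: the k-th smallest p-value (k = 0, 1, ...) is multiplied
# by (m - k), then values are forced to be non-decreasing and capped at 1
m = len(raw_p)
order = sorted(range(m), key=lambda i: raw_p[i])
adjusted = [0.0] * m
running_max = 0.0
for rank, idx in enumerate(order):
    running_max = max(running_max, min(1.0, (m - rank) * raw_p[idx]))
    adjusted[idx] = running_max

for (a, b), p in zip(pairs, adjusted):
    print("%s vs %s: adjusted p = %.6g" % (a, b, p))
```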

## Q4.

A medical company wants to test whether its new glucose meter kit is measuring correctly. For this reason, an experiment was organized and blood samples were taken from 12 participants. Using these samples, the glucose levels of the participants were measured with both the newly developed glucose meter kit and standard lab methods.

standard_lab_method=[102.4, 123.7, 106.2, 149.2, 158.1, 159.3, 150.7, 151. , 140.8, 143.3, 112.7,  99.5, 158.8]    
new_meter=[105.2,  110.1, 120. , 142.2, 160.4, 138. ,  147.7, 137.8, 153.5, 160.8,  90.4, 104.6, 170. ]

**According to this information, conduct hypothesis testing to check whether there is a difference between measurements of two methods by using a 0.05 significance level. Before doing hypothesis testing, check the related assumptions. Comment on the results.**

1. Statement of the Question:  
Is there a **difference** between measurements of two methods?

2. Hypotheses:  
$H_{0}$: There is no difference between the measurements. (Measured glucose levels are equal)   
$H_{a}$: There is a difference between the measurements. (Measured glucose levels are not equal) 


3. Deciding on the Statistical Test:  
Here we have the glucose levels as real numbers (floating-point values), so the scale of measurement is continuous.
Before choosing a test statistic, we should check whether the data are normally distributed.   

    The sample sizes are quite small. Therefore, we can use the Shapiro-Wilk W test to check normality. With alpha = 0.05, if the calculated p-value > alpha we fail to reject the hypothesis that the data are normally distributed.   

In [27]:
standard_lab_method=[102.4, 123.7, 106.2, 149.2, 158.1, 159.3, 150.7, 151. , 140.8, 143.3, 112.7, 99.5, 158.8]
new_meter=[105.2, 110.1, 120. , 142.2, 160.4, 138. , 147.7, 137.8, 153.5, 160.8, 90.4, 104.6, 170. ]
#print("Number of observations for standard lab method:",len(standard_lab_method))
#print("Number of observations for new meter:",len(new_meter))

In [28]:
test_stat_shapiro_std, p_value_shapiro_std=stats.shapiro(standard_lab_method)
test_stat_shapiro_new, p_value_shapiro_new=stats.shapiro(new_meter)
print("standard_lab_method p value:%.6f" % p_value_shapiro_std)
print("new_meter p value:%.6f" % p_value_shapiro_new)

standard_lab_method p value:0.033830
new_meter p value:0.472728


3. ...  
    For the standard lab method observations the p-value is smaller than alpha, so we reject normality for that sample. The data are at least ordinally scaled, so we will use a non-parametric test.
    
    We want to know whether there is a difference between the measurements of the same blood samples under two different methods. The design of the study is paired. Therefore, we will use the **Wilcoxon signed rank test**.

4. Specification of Level of Significance  
alpha = 0.05 with the confidence of 95% as stated in the question

5. Performance of the Test Statistics:

In [29]:
standard_lab_method_data = pd.DataFrame({'standard_lab_method': standard_lab_method, 'new meter': new_meter})
#print(standard_lab_method_data.head())
print(standard_lab_method_data.describe()) 

       standard_lab_method   new meter
count            13.000000   13.000000
mean            135.053846  133.900000
std              22.866628   25.394652
min              99.500000   90.400000
25%             112.700000  110.100000
50%             143.300000  138.000000
75%             151.000000  153.500000
max             159.300000  170.000000


5. ...  
From the descriptive statistics we see that the means differ slightly. The new meter's standard deviation is larger than the standard lab method's. The minimum glucose level observed with the new meter is smaller than with the standard method, while the maximum level observed is greater. Therefore, the new meter has a greater range.

In [30]:
stats.wilcoxon(new_meter, standard_lab_method)

WilcoxonResult(statistic=41.0, pvalue=0.786865234375)

6. Statistical Decision:  
    The p-value is greater than the alpha value of 0.05. Therefore, we fail to reject the null hypothesis. 

7. Interpretation of the Results:  
    We cannot reject the claim that the glucose levels measured with the two methods are equal, as no evidence of a difference was observed. However, that does not mean that the null hypothesis is true. No difference was found in this sample, but we do not know whether an actual difference exists.
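
The Wilcoxon statistic reported above can be rebuilt from the signed ranks of the paired differences. This is a minimal sketch of the calculation, not of SciPy's internal implementation. The variable names are illustrative:

```python
import numpy as np
from scipy.stats import rankdata

standard_lab_method = [102.4, 123.7, 106.2, 149.2, 158.1, 159.3, 150.7, 151.0,
                       140.8, 143.3, 112.7, 99.5, 158.8]
new_meter = [105.2, 110.1, 120.0, 142.2, 160.4, 138.0, 147.7, 137.8, 153.5,
             160.8, 90.4, 104.6, 170.0]

differences = np.array(new_meter) - np.array(standard_lab_method)
differences = differences[differences != 0]  # zero differences carry no sign

# rank the absolute differences, then sum the ranks by sign
ranks = rankdata(np.abs(differences))
w_plus = ranks[differences > 0].sum()
w_minus = ranks[differences < 0].sum()
statistic = min(w_plus, w_minus)

print("W+ = {}, W- = {}, statistic = {}".format(w_plus, w_minus, statistic))
```

The two rank sums are close to each other, which is why the test finds no evidence of a systematic difference between the methods.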

## Q5.
A new Computer Science graduate is interested in the entry-level data analyst salaries at defense industry firms and game firms. To investigate whether there is a difference between them, she randomly selected 13 job advertisements from Linkedin, Glassdoor, and Monster Jobs in each of the two industries and recorded their annual salaries (in K USD). The data are given below.

game=[80., 72., 70., 58., 68., 67., 69., 66., 69., 67., 62., 75., 95.]    
defense=[72., 79., 73., 76., 74., 78., 74., 72., 76., 79., 73., 72., 94.]

**According to this information, conduct the hypothesis testing to check whether there is a difference between the average entry-level data analyst salary in these two industries by using a 0.05 significance level. Before doing hypothesis testing, check the related assumptions. Comment on the results.**

1. Statement of the Question:  
Is there a **difference** between the average entry-level data analyst salary in these two industries?  

2. Hypotheses:  
$H_{0}$: $μ_{salary-game}$ = $μ_{salary-defence}$   
There is no difference between salary means (Average salaries are equal)  
$H_{a}$: $μ_{salary-game}$ $\neq$ $μ_{salary-defence}$   
There is a difference between salary means (Average salaries are not equal)

    We are interested in whether there is a difference, so this is a two-tailed test.

3. Deciding on the Statistical Test:  
Here the salaries are whole numbers, so the scale of measurement can be treated as discrete. 

    The sample sizes are quite small. Therefore, we can use the Shapiro-Wilk W test to check normality. With alpha = 0.05, if the calculated p-value > alpha we fail to reject the hypothesis that the data are normally distributed.   

In [31]:
game=[80., 72., 70., 58., 68., 67., 69., 66., 69., 67., 62., 75., 95.]
defense=[72., 79., 73., 76., 74., 78., 74., 72., 76., 79., 73., 72., 94.]

In [32]:
test_stat_shapiro_game, p_value_shapiro_game=stats.shapiro(game)
test_stat_shapiro_def, p_value_shapiro_def=stats.shapiro(defense)
print("game p value:%.6f" % p_value_shapiro_game)
print("defense p value:%.6f" % p_value_shapiro_def)

game p value:0.034390
defense p value:0.000528


3. ...  
    For both groups the p-value is smaller than alpha, so we reject normality. Therefore we will look for a non-parametric test.
    
    We have two groups and they are independent of each other, so the study design is unpaired. Therefore, we will use the **Mann-Whitney U Test**.

4. Specification of Level of Significance  
alpha = 0.05 with the confidence of 95% as stated in the question

5. Performance of the Test Statistics:

In [33]:
job_data = pd.DataFrame({'game': game, 'defence': defense})
#print(job_data.head())
print(job_data.describe()) 

            game    defence
count  13.000000  13.000000
mean   70.615385  76.307692
std     9.115358   5.907405
min    58.000000  72.000000
25%    67.000000  73.000000
50%    69.000000  74.000000
75%    72.000000  78.000000
max    95.000000  94.000000


5. ...  
From the descriptive statistics we see that the means differ. The mean salary in the defense industry is greater than in the game industry. The minimum salary observed in the defense sample is also greater than the minimum in the game sample, and the standard deviation in the game industry is greater.

In [34]:
stats.mannwhitneyu(x=game, y=defense)

MannwhitneyuResult(statistic=33.5, pvalue=0.004704070006313247)

6. Statistical Decision:  
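    The p-value is much smaller than the alpha value of 0.05. Therefore, we reject the null hypothesis. If this p-value is one-sided, which is the default in older SciPy versions when no `alternative` argument is given, the two-sided value is about 0.0094 and still below 0.05. 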

7. Interpretation of the Results:  
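    We reject the claim that entry-level salaries in the game and defense industries are the same. The observed job advertisements suggest that the two industries pay different salaries, with the defense industry tending to pay more. Strictly speaking, the Mann-Whitney U test compares the rank distributions of the two samples rather than their means.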
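
Because the hypotheses are two-tailed, it is safer to state the alternative explicitly rather than rely on the SciPy default, which has differed between versions. A minimal sketch:

```python
from scipy import stats

game = [80., 72., 70., 58., 68., 67., 69., 66., 69., 67., 62., 75., 95.]
defense = [72., 79., 73., 76., 74., 78., 74., 72., 76., 79., 73., 72., 94.]

# request the two-sided test explicitly to match H_a: salaries differ
result = stats.mannwhitneyu(x=game, y=defense, alternative="two-sided")
print("U = {}, two-sided p = {:.6f}".format(result.statistic, result.pvalue))
print("reject H0:", result.pvalue < 0.05)
```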