# Questions for Lesson 5: Hypothesis Testing

In these questions we will use a dataset from the American Potato Journal, which rates Oregon-grown Russet potatoes on texture, flavour and moistness for potatoes of two sizes, from two growing areas, with two holding temperatures, four holding periods and five cooking methods. Each score is an average of 20 scores.[1]

The code below imports the needed libraries and splits the data into five *DataFrames*, one per cooking method. Each *DataFrame* contains all the texture, flavour and moistness scores for that cooking method.

In [10]:
import pandas as pd
from scipy import stats

In [20]:
potato = pd.read_csv("https://raw.githubusercontent.com/ThomasJewson/datasets/master/PotatoesQuality/potatoes_full.csv")
# Drop the columns we do not need (the keyword form works on current pandas versions)
potato = potato.drop(columns=["Size", "Growth Area", "2 Week Holding Temp", "Storage Period"])

# One DataFrame per cooking method, with the method column removed
boil = potato[potato["Cooking method"] == 1].drop(columns=["Cooking method"])     # Boiled Potatoes
steam = potato[potato["Cooking method"] == 2].drop(columns=["Cooking method"])    # Steamed Potatoes
mash = potato[potato["Cooking method"] == 3].drop(columns=["Cooking method"])     # Mashed Potatoes
bake180 = potato[potato["Cooking method"] == 4].drop(columns=["Cooking method"])  # Baked Potatoes at 180C
bake230 = potato[potato["Cooking method"] == 5].drop(columns=["Cooking method"])  # Baked Potatoes at 230C
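
As an aside, the same split can be sketched with `groupby` instead of five separate filters. The small frame `toy` below is made up purely for illustration (it only borrows two column names from the potato data), and the names `toy`, `by_method` and `boil_toy` are hypothetical.

```python
import pandas as pd

# Tiny made-up frame with column names borrowed from the potato data.
# The values are illustrative only and are NOT taken from the dataset.
toy = pd.DataFrame({
    "Cooking method": [1, 1, 2, 3, 4, 5],
    "Flavour Score": [3.2, 3.0, 2.9, 3.1, 3.3, 3.4],
})

# Build one DataFrame per cooking method, keyed by the method code.
by_method = {
    code: group.drop(columns="Cooking method")
    for code, group in toy.groupby("Cooking method")
}

boil_toy = by_method[1]  # rows where "Cooking method" == 1
print(boil_toy)
```

A dictionary keyed by method code avoids repeating the filter expression, at the cost of less descriptive variable names than `boil`, `steam` and so on.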

Below, we output the `boil` *DataFrame* so you can see the structure of the data.

In [21]:
boil

Unnamed: 0,Texture score,Flavour Score,Moistness score
0,2.9,3.2,3.0
5,1.8,3.0,1.7
10,1.8,2.6,1.5
15,2.6,3.1,2.4
20,3.1,3.0,2.8
25,1.8,2.6,1.8
30,1.9,3.0,1.8
35,1.5,2.6,1.3
40,2.8,2.6,3.0
45,2.3,2.9,1.9


**Question 1:**

We want to know whether baking potatoes at 180C or 230C makes a difference to the flavour. 

*i. Write a null hypothesis (H0) and an alternative hypothesis (H1) for the T-test we are going to perform on this dataset*

*ii. Produce a variable which contains the mean of the `bake230["Flavour Score"]` and another variable which contains the population of `bake180["Flavour Score"]`*

In [73]:
bake180pop = bake180["Flavour Score"]

In [72]:
bake230mean = bake230["Flavour Score"].mean()

H0 : The mean flavour scores of potatoes baked at 180C and at 230C are the same.

H1 : The mean flavour scores of potatoes baked at 180C and at 230C are not the same.

**Question 2:**

*i. Conduct a T-test on your mean of `bake230["Flavour Score"]` and the population of `bake180["Flavour Score"]` saving the p-value into a variable called `pval`*

*ii. Compare your `pval` to ${0.05}$ and write a conclusion to our T-test.*

In [74]:
pval = stats.ttest_1samp(bake180pop,bake230mean).pvalue
pval

1.0095849692780624e-05

In [75]:
pval < 0.05

True

In [76]:
pval > 0.05

False

Since `pval` is below 0.05, we reject our null hypothesis (H0) at the 5% significance level, as there is evidence for our alternative hypothesis (H1). Therefore, we conclude that the mean flavour scores are not equal.
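
To see where such a p-value comes from, the one-sample T statistic can be computed by hand as t = (sample mean - reference mean) / (sample standard deviation / sqrt(n)) and compared with `stats.ttest_1samp`. The sketch below uses the flavour scores from the ten boiled-potato rows displayed earlier, with a reference mean `mu0 = 3.0` that is a hypothetical value chosen only for illustration.

```python
import numpy as np
from scipy import stats

# Flavour scores from the ten boiled-potato rows displayed earlier (illustration only).
scores = np.array([3.2, 3.0, 2.6, 3.1, 3.0, 2.6, 3.0, 2.6, 2.6, 2.9])
mu0 = 3.0  # hypothetical reference mean, assumed only for this sketch

n = scores.size
# T statistic: distance between the sample mean and mu0 in standard-error units.
t_manual = (scores.mean() - mu0) / (scores.std(ddof=1) / np.sqrt(n))
# Two-sided p-value from the t distribution with n - 1 degrees of freedom.
p_manual = 2 * stats.t.sf(abs(t_manual), df=n - 1)

# The same test using SciPy directly.
result = stats.ttest_1samp(scores, mu0)
print(t_manual, result.statistic)
print(p_manual, result.pvalue)
```

The two printed pairs should agree up to floating-point rounding, which shows that `ttest_1samp` treats its second argument as a fixed, known value.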

**Question 3:**

*Produce an `if` and `else` statement that uses the `print()` function to output a conclusion to a T-test by comparing your `pval` variable to ${0.05}$.*

In [78]:
if pval < 0.05:
    print("We reject our null hypothesis (H0) as there is evidence for our alternative hypothesis (H1).")
    print("Therefore, the means are not equal.")
else:
    print("We fail to reject our null hypothesis (H0) as there is insufficient evidence for our alternative hypothesis (H1).")
    print("Therefore, we cannot conclude that the means differ.")

We reject our null hypothesis (H0) as there is evidence for our alternative hypothesis (H1).
Therefore, the means are not equal.


**Question 4:**

*Define your own function using `def` that will automatically output the conclusion to a T-test.*

In [79]:
def ttest(data, mean):
    """Print the conclusion of a one-sample T-test at the 5% significance level.

    More precisely, this function runs the stats.ttest_1samp() function to
    obtain a p-value, which it compares to a significance level of 0.05.
    Then it prints whether H0 is rejected or not.
    """
    pval = stats.ttest_1samp(data, mean).pvalue

    if pval < 0.05:
        print("We reject our null hypothesis (H0) as there is evidence for our alternative hypothesis (H1).")
        print("Therefore, the means are not equal.")
    else:
        print("We fail to reject our null hypothesis (H0) as there is insufficient evidence for our alternative hypothesis (H1).")
        print("Therefore, we cannot conclude that the means differ.")
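
One possible variation, sketched below, makes the significance level a parameter and returns the decision instead of only printing it, so the result can be reused in later code. The function name `ttest_conclusion`, the `alpha` parameter and the reference value `3.0` are hypothetical choices for this sketch, and the scores are those of the ten boiled-potato rows displayed earlier.

```python
from scipy import stats


def ttest_conclusion(data, mean, alpha=0.05):
    """Return (reject_h0, message) for a one-sample T-test at level alpha."""
    pval = stats.ttest_1samp(data, mean).pvalue
    reject = bool(pval < alpha)
    if reject:
        message = f"p = {pval:.3g} < {alpha}: reject H0; the data suggest the means differ."
    else:
        message = f"p = {pval:.3g} >= {alpha}: fail to reject H0; no evidence that the means differ."
    return reject, message


# Flavour scores from the ten boiled-potato rows displayed earlier (illustration only).
scores = [3.2, 3.0, 2.6, 3.1, 3.0, 2.6, 3.0, 2.6, 2.6, 2.9]
reject, message = ttest_conclusion(scores, 3.0)  # 3.0 is a hypothetical reference mean
print(message)
```

Returning a boolean alongside the message keeps the printed wording separate from the decision itself, which makes the function easier to check.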

**Question 5:**

*Use the function you made in question 4 to reason whether or not steaming and boiling potatoes leads to the same moistness score.*

In [81]:
boilmean = boil["Moistness score"].mean()
steampop = steam["Moistness score"]

In [83]:
ttest(steampop, boilmean)

We reject our null hypothesis (H0) as there is evidence for our alternative hypothesis (H1).
Therefore, the means are not equal.
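
Note that the one-sample test used in these questions treats one group's mean as a fixed, known value. When both groups are samples, a two-sample T-test such as `stats.ttest_ind` is a common alternative. The sketch below is not part of the original exercise: it uses synthetic, randomly generated scores (not the potato data), and the names `group_a` and `group_b` are hypothetical.

```python
import numpy as np
from scipy import stats

# Synthetic scores, generated only to make this sketch runnable.
# They are NOT taken from the potato dataset.
rng = np.random.default_rng(0)
group_a = rng.normal(loc=2.5, scale=0.5, size=30)
group_b = rng.normal(loc=2.2, scale=0.5, size=30)

# Two-sample (independent) T-test: both groups are treated as samples.
# By default, ttest_ind assumes the two groups have equal variances.
result = stats.ttest_ind(group_a, group_b)
print(result.statistic, result.pvalue)

alpha = 0.05
if result.pvalue < alpha:
    print("Reject H0: the data suggest the group means differ.")
else:
    print("Fail to reject H0: no evidence that the group means differ.")
```

With the real data, the two arguments would be the two score columns being compared, for example the steamed and boiled moistness scores.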


Sources:

[1] A. Mackey and J. Stockman (1958). "Cooking Quality of Oregon-Grown Russet Potatoes", American Potato Journal, Vol. 35, pp. 395-407.