### Importing the necessary libraries

In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import scipy.stats as stats

# 1. One-sample t-test 

The masses of a sample of N = 20 acorns from a forest subjected to acid rain from a coal power plant are m = [8.8, 6.6, 9.5, 11.2, 10.2, 7.4, 8.0, 9.6, 9.9, 9.0, 7.6, 7.4, 10.4, 11.1, 8.5, 10.0, 11.6, 10.7, 10.3, 7.0] g.

Does this sample provide enough evidence (alpha = 0.05) to say that the average mass of all acorns is different from 10 g?

**a) Formulate the null and alternate hypothesis**

$H_0$: $\mu = 10$, The average mass of the acorns is 10 g

$H_a$: $\mu \neq 10$, The average mass of the acorns is different from 10 g

**b) Calculate the test-statistic and based on the p-value provide a conclusion.**

In [2]:
x = [8.8, 6.6, 9.5, 11.2, 10.2, 7.4, 8.0, 9.6, 9.9, 9.0,
     7.6, 7.4, 10.4, 11.1, 8.5, 10.0, 11.6, 10.7, 10.3, 7.0]

mu = 10

t, p = stats.ttest_1samp(x, popmean = mu)
print("tstat = ", t, ", p-value = ", p)

tstat =  -2.2491611580763973 , p-value =  0.03655562279112415


Since the p-value (≈ 0.037) is less than alpha (0.05), we reject the null hypothesis and conclude that the average mass of the acorns is different from 10 g. The negative t-statistic indicates that the sample mean lies below 10 g.
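To make the test statistic less of a black box, the cell below recomputes it by hand from the standard one-sample formula $t = (\bar{x} - \mu) / (s / \sqrt{n})$ with $n - 1$ degrees of freedom. This is only a sketch for checking the result; the names `se`, `t_manual`, and `p_manual` are illustrative and not part of the original notebook.

```python
import numpy as np
import scipy.stats as stats

x = np.array([8.8, 6.6, 9.5, 11.2, 10.2, 7.4, 8.0, 9.6, 9.9, 9.0,
              7.6, 7.4, 10.4, 11.1, 8.5, 10.0, 11.6, 10.7, 10.3, 7.0])
mu = 10

n = x.size
# standard error of the mean, using the sample standard deviation (ddof=1)
se = x.std(ddof=1) / np.sqrt(n)

# t-statistic: distance of the sample mean from mu, in standard errors
t_manual = (x.mean() - mu) / se

# two-sided p-value from Student's t distribution with n - 1 degrees of freedom
p_manual = 2 * stats.t.sf(abs(t_manual), df=n - 1)

print("t (manual) = ", t_manual, ", p (manual) = ", p_manual)
```

The values should agree with those returned by `stats.ttest_1samp` up to floating-point rounding.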

# 2. Independent (unpaired) two-sample t-test

The masses of N<sub>1</sub> = 20 acorns from oak trees upwind of a coal power plant and N<sub>2</sub> = 30 acorns from oak trees downwind of the same power plant are measured. Is the mean mass of acorns from the downwind trees different from that of the upwind trees at a significance level of 0.05? The sample sizes are not equal, but we will assume that the two populations have equal variances.

#### sample upwind:
x1 = [10.8, 10.0, 8.2, 9.9, 11.6, 10.1, 11.3, 10.3, 10.7, 9.7, 
      7.8, 9.6, 9.7, 11.6, 10.3, 9.8, 12.3, 11.0, 10.4, 10.4]

#### sample downwind:
x2 = [7.8, 7.5, 9.5, 11.7, 8.1, 8.8, 8.8, 7.7, 9.7, 7.0, 
      9.0, 9.7, 11.3, 8.7, 8.8, 10.9, 10.3, 9.6, 8.4, 6.6,
      7.2, 7.6, 11.5, 6.6, 8.6, 10.5, 8.4, 8.5, 10.2, 9.2]


**a) Formulate the null and alternate hypothesis.**

$H_0$: $\mu_1 = \mu_2$, There is no difference between the average mass of acorns from downwind and upwind trees

$H_a$: $\mu_1 \neq \mu_2$, There is a difference between the average mass of acorns from downwind and upwind trees


**b) Calculate the test-statistic and based on the p-value provide a conclusion.**

In [3]:
# sample upwind
x1 = [10.8, 10.0, 8.2, 9.9, 11.6, 10.1, 11.3, 10.3, 10.7, 9.7, 
      7.8, 9.6, 9.7, 11.6, 10.3, 9.8, 12.3, 11.0, 10.4, 10.4]

# sample downwind
x2 = [7.8, 7.5, 9.5, 11.7, 8.1, 8.8, 8.8, 7.7, 9.7, 7.0, 
      9.0, 9.7, 11.3, 8.7, 8.8, 10.9, 10.3, 9.6, 8.4, 6.6,
      7.2, 7.6, 11.5, 6.6, 8.6, 10.5, 8.4, 8.5, 10.2, 9.2]


t, p_value = stats.ttest_ind(x1, x2)
print("tstat = ",t, ", p_value = ", p_value)

tstat =  3.5981947686898033 , p_value =  0.0007560337478801464


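Since the p-value (≈ 0.0008) is less than alpha (0.05), we reject the null hypothesis and conclude that the mean masses of acorns from upwind and downwind trees are different. The positive t-statistic indicates that the upwind sample (x1) has the larger mean.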
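Because the test above relies on the equal-variance assumption, it can help to see where that assumption enters. The sketch below recomputes the statistic from the pooled variance and, as a sanity check that is not part of the original exercise, also runs Welch's test (`equal_var=False`), which drops the assumption. The names `sp2`, `t_pooled`, `p_pooled`, `t_welch`, and `p_welch` are illustrative.

```python
import numpy as np
import scipy.stats as stats

# sample upwind
x1 = np.array([10.8, 10.0, 8.2, 9.9, 11.6, 10.1, 11.3, 10.3, 10.7, 9.7,
               7.8, 9.6, 9.7, 11.6, 10.3, 9.8, 12.3, 11.0, 10.4, 10.4])
# sample downwind
x2 = np.array([7.8, 7.5, 9.5, 11.7, 8.1, 8.8, 8.8, 7.7, 9.7, 7.0,
               9.0, 9.7, 11.3, 8.7, 8.8, 10.9, 10.3, 9.6, 8.4, 6.6,
               7.2, 7.6, 11.5, 6.6, 8.6, 10.5, 8.4, 8.5, 10.2, 9.2])

n1, n2 = x1.size, x2.size
df = n1 + n2 - 2

# pooled variance: sample variances weighted by their degrees of freedom
sp2 = ((n1 - 1) * x1.var(ddof=1) + (n2 - 1) * x2.var(ddof=1)) / df

# t-statistic for the difference in means under the equal-variance assumption
t_pooled = (x1.mean() - x2.mean()) / np.sqrt(sp2 * (1 / n1 + 1 / n2))
p_pooled = 2 * stats.t.sf(abs(t_pooled), df=df)
print("t (pooled) = ", t_pooled, ", p (pooled) = ", p_pooled)

# Welch's t-test does not assume equal population variances
t_welch, p_welch = stats.ttest_ind(x1, x2, equal_var=False)
print("t (Welch) = ", t_welch, ", p (Welch) = ", p_welch)
```

If the pooled and Welch results lead to the same decision, the conclusion does not hinge on the equal-variance assumption.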

# 3. Paired samples t-test

The average mass of acorns from the same N = 30 trees downwind of a power plant is measured before (x<sub>1</sub>) and after (x<sub>2</sub>) the power plant converts from burning coal to burning natural gas. Does the mass of the acorns change after the conversion from coal to natural gas at a significance level of 0.05? 

### sample before conversion to natural gas
x1 = np.array([10.8, 6.4, 8.3, 7.6, 11.4, 9.9, 10.6, 8.7, 8.1, 10.9,
      11.0, 11.8, 7.3, 9.6, 9.3, 9.9, 9.0, 9.5, 10.6, 10.3,
      8.8, 12.3, 8.9, 10.5, 11.6, 7.6, 8.9, 10.4, 10.2, 8.8])

### sample after conversion to natural gas
x2 = np.array([10.1, 6.9, 8.6, 8.8, 12.1, 11.3, 12.4, 9.3, 9.3, 10.8,
      12.4, 11.5, 7.4, 10.0, 11.1, 10.6, 9.4, 9.5, 10.0, 10.0,
      9.7, 13.5, 9.6, 11.6, 11.7, 7.9, 8.6, 10.8, 9.5, 9.6])



**a) Formulate the null and alternate hypothesis.**

* H<sub>0</sub>: $\mu_1 = \mu_2$ (The average mass of the acorns before and after conversion is not different)
* H<sub>a</sub>: $\mu_1 \neq \mu_2$ (The average mass of the acorns before and after conversion is different)

**b) Calculate the test-statistic and based on the p-value provide a conclusion.**

In [4]:
# sample before conversion to natural gas
x1 = np.array([10.8, 6.4, 8.3, 7.6, 11.4, 9.9, 10.6, 8.7, 8.1, 10.9,
      11.0, 11.8, 7.3, 9.6, 9.3, 9.9, 9.0, 9.5, 10.6, 10.3,
      8.8, 12.3, 8.9, 10.5, 11.6, 7.6, 8.9, 10.4, 10.2, 8.8])
# sample after conversion to natural gas
x2 = np.array([10.1, 6.9, 8.6, 8.8, 12.1, 11.3, 12.4, 9.3, 9.3, 10.8,
      12.4, 11.5, 7.4, 10.0, 11.1, 10.6, 9.4, 9.5, 10.0, 10.0,
      9.7, 13.5, 9.6, 11.6, 11.7, 7.9, 8.6, 10.8, 9.5, 9.6])


t, p = stats.ttest_rel(x1, x2)
print("tstat = ", t, ", p-value = ", p) 

tstat =  -3.905439081326491 , p-value =  0.0005168689824684378


Since the p-value (≈ 0.0005) is less than alpha (0.05), we reject the null hypothesis and conclude that the mean mass of the acorns differs significantly before and after the power plant converts from burning coal to burning natural gas. The negative t-statistic indicates that the mean mass is higher after the conversion (x2) than before (x1).
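A paired t-test is equivalent to a one-sample t-test on the per-tree differences against a mean of zero. The short sketch below illustrates this with the same data; the names `d`, `t_diff`, `p_diff`, `t_rel`, and `p_rel` are illustrative.

```python
import numpy as np
import scipy.stats as stats

# sample before conversion to natural gas
x1 = np.array([10.8, 6.4, 8.3, 7.6, 11.4, 9.9, 10.6, 8.7, 8.1, 10.9,
               11.0, 11.8, 7.3, 9.6, 9.3, 9.9, 9.0, 9.5, 10.6, 10.3,
               8.8, 12.3, 8.9, 10.5, 11.6, 7.6, 8.9, 10.4, 10.2, 8.8])
# sample after conversion to natural gas
x2 = np.array([10.1, 6.9, 8.6, 8.8, 12.1, 11.3, 12.4, 9.3, 9.3, 10.8,
               12.4, 11.5, 7.4, 10.0, 11.1, 10.6, 9.4, 9.5, 10.0, 10.0,
               9.7, 13.5, 9.6, 11.6, 11.7, 7.9, 8.6, 10.8, 9.5, 9.6])

# per-tree difference (before minus after)
d = x1 - x2

# one-sample t-test of the differences against zero
t_diff, p_diff = stats.ttest_1samp(d, popmean=0)

# paired t-test on the original samples, for comparison
t_rel, p_rel = stats.ttest_rel(x1, x2)

print("mean difference = ", d.mean())
print("one-sample on differences: tstat = ", t_diff, ", p-value = ", p_diff)
print("paired test:               tstat = ", t_rel, ", p-value = ", p_rel)
```

This also shows why pairing matters: the test works on the within-tree changes, so tree-to-tree variation in acorn mass does not inflate the standard error.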


# 4. ANOVA test

The marks obtained by 5 randomly picked students in a Mathematics exam from each of three sections A, B, and C are as follows:

### Marks of 5 randomly picked students from Section A
A = [51, 45, 33, 45, 67]

### Marks of 5 randomly picked students from Section B
B = [23, 43, 23, 43, 45]

### Marks of 5 randomly picked students from Section C
C = [56, 76, 74, 87, 56]

Does the sample provide enough evidence (alpha = 0.05) to say that the mean marks of students in the three sections are different?

**a) Formulate the null and alternate hypothesis.**

$H_0$: The mean marks of students in the three sections are equal

$H_a$: The mean marks of at least one section differ from the others.

**b) Calculate the test-statistic and based on the p-value provide a conclusion.**

In [5]:
# marks of students in section A
A = np.array([51, 45, 33, 45, 67])

# marks of students in section B
B = np.array([23, 43, 23, 43, 45])

# marks of students in section C
C = np.array([56, 76, 74, 87, 56])

# performing a one-way ANOVA test (f_oneway returns the F-statistic, not a t-statistic)
f, p = stats.f_oneway(A, B, C)
print("fstat = ", f, ", p-value = ", p)

fstat =  9.747205503009463 , p-value =  0.0030597541434430556


Since the p-value (≈ 0.003) is less than alpha (0.05), we reject the null hypothesis and conclude that the mean marks of at least one section differ from the others.
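The statistic returned by `stats.f_oneway` is an F-ratio: the variance between section means divided by the variance within sections. The sketch below rebuilds it from the sums of squares so the pieces are visible; the names `ss_between`, `ss_within`, `f_manual`, and `p_manual` are illustrative.

```python
import numpy as np
import scipy.stats as stats

A = np.array([51, 45, 33, 45, 67])
B = np.array([23, 43, 23, 43, 45])
C = np.array([56, 76, 74, 87, 56])

groups = [A, B, C]
all_marks = np.concatenate(groups)
grand_mean = all_marks.mean()

k = len(groups)            # number of sections
n_total = all_marks.size   # total number of students

# between-group sum of squares: spread of section means around the grand mean
ss_between = sum(g.size * (g.mean() - grand_mean) ** 2 for g in groups)

# within-group sum of squares: spread of marks around their own section mean
ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)

# F = mean square between / mean square within
f_manual = (ss_between / (k - 1)) / (ss_within / (n_total - k))

# p-value from the F distribution with (k - 1, n_total - k) degrees of freedom
p_manual = stats.f.sf(f_manual, k - 1, n_total - k)

print("F (manual) = ", f_manual, ", p (manual) = ", p_manual)
```

Note that a significant F-ratio only says that the section means are not all equal; it does not identify which sections differ.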