### Importing the necessary libraries

In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import scipy.stats as stats

# One-sample z-test

### Q1. Samy, Product Manager of K2 Jeans, wants to launch a product line into a new market area. A survey of a random sample of 400 households in that market showed a mean income of 30000 rupees per household. The standard deviation based on an earlier pilot study is 8000 rupees. Samy strongly believes that the product line will be adequately profitable only in markets where the mean household income is greater than 29000 rupees. Samy wants your help in deciding whether the product line should be introduced in the new market. Perform statistical analysis with a significance level of 0.05 and conclude.

Null Hypothesis: The mean household income is less than or equal to 29000 rupees.

$ H_0: \mu \leq 29000 $

Alternate Hypothesis: The mean household income is greater than 29000 rupees.

$ H_a: \mu > 29000 $

x̅ = 30000 

μ = 29000

σ = 8000

n = 400

$ Z = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}} = \frac{30000 - 29000}{8000 / \sqrt{400}} = 2.5 $

In [2]:
# Calculating the p-value for Z-stat=2.5
1 - stats.norm.cdf(2.5)

0.006209665325776159

**Insight**

Since the p-value (0.006) is less than the significance level of 0.05, we reject the null hypothesis and conclude that the mean household income in this market is greater than 29000 rupees. The data therefore support introducing the product line in the new market.
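The hand calculation of the Z statistic can also be reproduced in code. The following is a minimal sketch using the figures quoted in the question; the variable names (`x_bar`, `mu_0`, `sigma`, `n`, `alpha`) are illustrative and not part of the original notebook. It additionally computes the one-sided critical value, which is an alternative route to the same decision.

```python
import numpy as np
import scipy.stats as stats

x_bar = 30000   # sample mean income (rupees)
mu_0 = 29000    # hypothesized mean under H0
sigma = 8000    # population standard deviation from the pilot study
n = 400         # number of households surveyed
alpha = 0.05    # significance level

# Z statistic for a one-sample z-test with known sigma
z = (x_bar - mu_0) / (sigma / np.sqrt(n))

# Upper-tail p-value, because the alternative is mu > 29000
p_value = stats.norm.sf(z)

# Critical value for a right-tailed test at level alpha
z_crit = stats.norm.ppf(1 - alpha)

print("z =", z, ", p-value =", p_value, ", critical value =", z_crit)
print("Reject H0:", p_value < alpha)
```

Comparing `z` with `z_crit` and comparing `p_value` with `alpha` are equivalent decision rules; both lead to rejecting the null hypothesis here.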

# One-sample t-test

### Q2. The average mass of all acorns is 10 g. The mass of 20 acorns collected from a forest, subjected to acid rain from a coal power plant, are m = 8.8, 6.6, 9.5, 11.2, 10.2, 7.4, 8.0, 9.6, 9.9, 9.0, 7.6, 7.4, 10.4, 11.1, 8.5, 10.0, 11.6, 10.7, 10.3, and 7.0 g. Is there enough statistical evidence to conclude that the average mass of this sample is different from the average mass of acorns with a significance level of 0.05?

**a) Formulate the null and alternate hypotheses**

* H<sub>0</sub>: &mu; = 10, the mean mass of acorns from this forest is equal to 10 g.
* H<sub>a</sub>: &mu; &ne; 10, the mean mass of acorns from this forest is different from 10 g.

**b) Calculate the test-statistics and based on the p-value provide a conclusion.**

In [3]:
x = [8.8, 6.6, 9.5, 11.2, 10.2, 7.4, 8.0, 9.6, 9.9, 9.0,
     7.6, 7.4, 10.4, 11.1, 8.5, 10.0, 11.6, 10.7, 10.3, 7.0]

mu = 10

t, p = stats.ttest_1samp(x, mu)
print("tstats = ", t, ", p-value = ", p)

tstats =  -2.2491611580763973 , p-value =  0.03655562279112415


**Insight**

Since the p-value (0.037) < 0.05 (alpha), we reject the null hypothesis and conclude that the mean mass of acorns from this forest is different from the overall average of &mu; = 10 g. The negative t-statistic indicates that the sample mean lies below 10 g.
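To show what `stats.ttest_1samp` computes, the statistic can be rebuilt from its definition, $ t = (\bar{x} - \mu) / (s / \sqrt{n}) $, where $ s $ is the sample standard deviation with $ n - 1 $ degrees of freedom. The sketch below is illustrative only; the names `t_manual` and `p_manual` are not part of the original notebook.

```python
import numpy as np
import scipy.stats as stats

x = np.array([8.8, 6.6, 9.5, 11.2, 10.2, 7.4, 8.0, 9.6, 9.9, 9.0,
              7.6, 7.4, 10.4, 11.1, 8.5, 10.0, 11.6, 10.7, 10.3, 7.0])
mu = 10
n = len(x)

# Sample standard deviation (ddof=1 gives the unbiased variance estimate)
s = x.std(ddof=1)

# t statistic from its definition
t_manual = (x.mean() - mu) / (s / np.sqrt(n))

# Two-sided p-value from the t distribution with n - 1 degrees of freedom
p_manual = 2 * stats.t.sf(abs(t_manual), df=n - 1)

print("sample mean =", x.mean())
print("tstats = ", t_manual, ", p-value = ", p_manual)
```

The values should agree with the output of `stats.ttest_1samp(x, mu)` up to floating-point rounding.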

# Independent (unpaired) two-sample t-test

### Q3. The mass of N<sub>1</sub>=20 acorns from oak trees upwind from a coal power plant and N<sub>2</sub>=30 acorns from oak trees downwind from the same coal power plant are measured. Is the mass of acorns from trees downwind different from the ones from upwind? 

**Note:** 
- The sample sizes are not equal, but we will assume that the two populations from which the samples are drawn have equal variances, as the pooled two-sample t-test requires.
- Since the significance level is not provided, we assume it to be 0.05.

#### sample upwind:
x1 = [10.8, 10.0, 8.2, 9.9, 11.6, 10.1, 11.3, 10.3, 10.7, 9.7, 
      7.8, 9.6, 9.7, 11.6, 10.3, 9.8, 12.3, 11.0, 10.4, 10.4]

#### sample downwind:
x2 = [7.8, 7.5, 9.5, 11.7, 8.1, 8.8, 8.8, 7.7, 9.7, 7.0, 
      9.0, 9.7, 11.3, 8.7, 8.8, 10.9, 10.3, 9.6, 8.4, 6.6,
      7.2, 7.6, 11.5, 6.6, 8.6, 10.5, 8.4, 8.5, 10.2, 9.2]

**a) Formulate null and alternate hypotheses.**

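* H<sub>0</sub>: &mu;<sub>1</sub> = &mu;<sub>2</sub>, i.e., &mu;<sub>1</sub> - &mu;<sub>2</sub> = 0. There is no difference between the population means.
* H<sub>a</sub>: &mu;<sub>1</sub> &ne; &mu;<sub>2</sub>, i.e., &mu;<sub>1</sub> - &mu;<sub>2</sub> &ne; 0. There is a difference between the population means.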


**b) Calculate the test-statistic and based on p-value provide a conclusion.**

In [4]:
# sample upwind
x1 = [10.8, 10.0, 8.2, 9.9, 11.6, 10.1, 11.3, 10.3, 10.7, 9.7, 
      7.8, 9.6, 9.7, 11.6, 10.3, 9.8, 12.3, 11.0, 10.4, 10.4]

# sample downwind
x2 = [7.8, 7.5, 9.5, 11.7, 8.1, 8.8, 8.8, 7.7, 9.7, 7.0, 
      9.0, 9.7, 11.3, 8.7, 8.8, 10.9, 10.3, 9.6, 8.4, 6.6,
      7.2, 7.6, 11.5, 6.6, 8.6, 10.5, 8.4, 8.5, 10.2, 9.2]


t, p_value = stats.ttest_ind(x2, x1)
print("tstats = ",t, ", p_value = ", p_value)

tstats =  -3.5981947686898033 , p_value =  0.0007560337478801464


**Insight**

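No significance level is given, so we assume 0.05.
Since the p-value (0.0008) < 0.05 (alpha), we reject the null hypothesis and conclude that the mean mass of acorns from downwind trees differs from that of upwind trees. Because the statistic was computed as downwind versus upwind, its negative sign indicates that the downwind sample mean is the lower of the two.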
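The equal-variance assumption stated in the note can be made explicit by computing the pooled-variance statistic by hand. The sketch below is illustrative (the names `sp2`, `t_pooled`, `t_welch`, and so on are not part of the original notebook). It also runs Welch's t-test through `equal_var=False`, which drops the equal-variance assumption and serves as a robustness check rather than a replacement for the analysis above.

```python
import numpy as np
import scipy.stats as stats

# sample upwind
x1 = np.array([10.8, 10.0, 8.2, 9.9, 11.6, 10.1, 11.3, 10.3, 10.7, 9.7,
               7.8, 9.6, 9.7, 11.6, 10.3, 9.8, 12.3, 11.0, 10.4, 10.4])
# sample downwind
x2 = np.array([7.8, 7.5, 9.5, 11.7, 8.1, 8.8, 8.8, 7.7, 9.7, 7.0,
               9.0, 9.7, 11.3, 8.7, 8.8, 10.9, 10.3, 9.6, 8.4, 6.6,
               7.2, 7.6, 11.5, 6.6, 8.6, 10.5, 8.4, 8.5, 10.2, 9.2])

n1, n2 = len(x1), len(x2)
df = n1 + n2 - 2

# Pooled variance: weighted average of the two sample variances
sp2 = ((n1 - 1) * x1.var(ddof=1) + (n2 - 1) * x2.var(ddof=1)) / df

# Pooled t statistic (downwind minus upwind, matching ttest_ind(x2, x1))
t_pooled = (x2.mean() - x1.mean()) / np.sqrt(sp2 * (1 / n1 + 1 / n2))
p_pooled = 2 * stats.t.sf(abs(t_pooled), df=df)
print("pooled: tstats = ", t_pooled, ", p_value = ", p_pooled)

# Welch's t-test does not assume equal population variances
t_welch, p_welch = stats.ttest_ind(x2, x1, equal_var=False)
print("Welch:  tstats = ", t_welch, ", p_value = ", p_welch)
```

If the pooled and Welch results lead to the same decision, the conclusion does not hinge on the equal-variance assumption.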

# Paired samples t-test

### Q4. The average mass of acorns from the same N=30 trees downwind of a power plant is measured before (x<sub>1</sub>) and after (x<sub>2</sub>) the power plant converts burning coal to burning natural gas. Does the mass of the acorns change after the conversion from coal to natural gas? 

**Note**: Since the significance level is not provided, we assume it to be 0.05.

### sample before conversion to natural gas
x1 = np.array([10.8, 6.4, 8.3, 7.6, 11.4, 9.9, 10.6, 8.7, 8.1, 10.9,
      11.0, 11.8, 7.3, 9.6, 9.3, 9.9, 9.0, 9.5, 10.6, 10.3,
      8.8, 12.3, 8.9, 10.5, 11.6, 7.6, 8.9, 10.4, 10.2, 8.8])

### sample after conversion to natural gas
x2 = np.array([10.1, 6.9, 8.6, 8.8, 12.1, 11.3, 12.4, 9.3, 9.3, 10.8,
      12.4, 11.5, 7.4, 10.0, 11.1, 10.6, 9.4, 9.5, 10.0, 10.0,
      9.7, 13.5, 9.6, 11.6, 11.7, 7.9, 8.6, 10.8, 9.5, 9.6])

**a) Formulate null and alternate hypotheses.**

* H<sub>0</sub>: &mu;<sub>d</sub> = 0, where &mu;<sub>d</sub> is the population mean of the paired differences x2 - x1. The mean difference is equal to zero.
* H<sub>a</sub>: &mu;<sub>d</sub> &ne; 0. The mean difference is not equal to zero.

**b) Calculate the test-statistic and based on p-value provide a conclusion.**

In [5]:
# sample before conversion to natural gas
x1 = np.array([10.8, 6.4, 8.3, 7.6, 11.4, 9.9, 10.6, 8.7, 8.1, 10.9,
      11.0, 11.8, 7.3, 9.6, 9.3, 9.9, 9.0, 9.5, 10.6, 10.3,
      8.8, 12.3, 8.9, 10.5, 11.6, 7.6, 8.9, 10.4, 10.2, 8.8])
# sample after conversion to natural gas
x2 = np.array([10.1, 6.9, 8.6, 8.8, 12.1, 11.3, 12.4, 9.3, 9.3, 10.8,
      12.4, 11.5, 7.4, 10.0, 11.1, 10.6, 9.4, 9.5, 10.0, 10.0,
      9.7, 13.5, 9.6, 11.6, 11.7, 7.9, 8.6, 10.8, 9.5, 9.6])


t, p = stats.ttest_1samp(x2 - x1, 0)
print("tstats = ", t, ", p-value = ", p) 

tstats =  3.905439081326491 , p-value =  0.0005168689824684378


**Insight**

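Since the p-value (0.0005) < 0.05 (alpha), we reject the null hypothesis and conclude that the mean of the paired differences is not equal to zero, i.e., the mean mass of the acorns changed after the conversion from coal to natural gas. The positive t-statistic, computed on x2 - x1, indicates that the mass was higher on average after the conversion.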
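A paired t-test is a one-sample t-test on the differences, which is why `stats.ttest_1samp(x2 - x1, 0)` was used above. SciPy also provides `stats.ttest_rel` for paired samples; the sketch below (with illustrative variable names) shows that the two calls give the same result on this data.

```python
import numpy as np
import scipy.stats as stats

# sample before conversion to natural gas
x1 = np.array([10.8, 6.4, 8.3, 7.6, 11.4, 9.9, 10.6, 8.7, 8.1, 10.9,
               11.0, 11.8, 7.3, 9.6, 9.3, 9.9, 9.0, 9.5, 10.6, 10.3,
               8.8, 12.3, 8.9, 10.5, 11.6, 7.6, 8.9, 10.4, 10.2, 8.8])
# sample after conversion to natural gas
x2 = np.array([10.1, 6.9, 8.6, 8.8, 12.1, 11.3, 12.4, 9.3, 9.3, 10.8,
               12.4, 11.5, 7.4, 10.0, 11.1, 10.6, 9.4, 9.5, 10.0, 10.0,
               9.7, 13.5, 9.6, 11.6, 11.7, 7.9, 8.6, 10.8, 9.5, 9.6])

# Paired t-test on the two related samples (after vs. before)
t_rel, p_rel = stats.ttest_rel(x2, x1)

# Equivalent one-sample t-test on the paired differences
t_one, p_one = stats.ttest_1samp(x2 - x1, 0)

print("ttest_rel:   tstats = ", t_rel, ", p-value = ", p_rel)
print("ttest_1samp: tstats = ", t_one, ", p-value = ", p_one)
print("mean difference (after - before) =", (x2 - x1).mean())
```

Using `ttest_rel` makes the paired design explicit in the code, while the one-sample form makes the underlying hypothesis about the mean difference explicit; either is a reasonable choice.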