<a href="https://colab.research.google.com/github/Nurlyssultan/ML-DS-Cheat-Sheet/blob/main/Hypothesis_Testing.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Libraries and data

In [None]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


In [None]:
%cd /content/drive/MyDrive/Statistics with Python 2/Inferential Statistics/Hypothesis Testing


In [None]:
# Libraries
import pandas as pd
import numpy as np
import scipy.stats as st
import statsmodels.stats.weightstats as sm

# Hypothesis Testing


Hypothesis testing is a statistical method used to make inferences about a population based on a sample. It involves testing a hypothesis about a population parameter (such as the mean or proportion) based on the information contained in a sample.

The steps involved in hypothesis testing typically include:

1. **Formulating the null and alternative hypotheses:**
    - The null hypothesis (H0) is the default claim being tested. It usually states that there is no difference or no effect (for example, that the population mean equals a stated value).
    - The alternative hypothesis (H1) is the claim you are looking for evidence in favor of. It usually states that there is a difference or an effect.

2. **Collecting and analyzing sample data:**
    - A sample is randomly selected from the population of interest.
    - Statistical methods are used to analyze the sample data and calculate a test statistic.

3. **Comparing the test statistic to a critical value:**
    - A critical value is determined from the chosen significance level (alpha) and, for t-tests, the degrees of freedom.
    - If the test statistic falls within the critical region (rejection region), the null hypothesis is rejected.
    - If the test statistic falls outside the critical region, the null hypothesis is not rejected.

4. **Interpreting the results:**
    - If the null hypothesis is rejected, it means that there is sufficient evidence to support the alternative hypothesis.
    - If the null hypothesis is not rejected, it means that there is not enough evidence to support the alternative hypothesis.

Hypothesis testing allows researchers to make informed decisions about population parameters based on limited sample information. It provides a framework for evaluating the strength of evidence against the null hypothesis and making inferences about the population.
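The critical-value rule in step 3 and the p-value rule used throughout this notebook always lead to the same decision. A minimal sketch for a two-sided z-test, using only Python's standard library (`statistics.NormalDist`) instead of SciPy; the two helper functions are hypothetical names chosen for illustration:

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal distribution: mean 0, sd 1

def reject_by_critical_value(z, alpha):
    """Two-sided rule: reject H0 if |z| exceeds the critical value z_{alpha/2}."""
    z_crit = STD_NORMAL.inv_cdf(1 - alpha / 2)
    return abs(z) > z_crit

def reject_by_p_value(z, alpha):
    """Two-sided rule: reject H0 if the p-value is below alpha."""
    p_value = 2 * (1 - STD_NORMAL.cdf(abs(z)))
    return p_value < alpha

alpha = 0.05
print(round(STD_NORMAL.inv_cdf(1 - alpha / 2), 2))  # critical value for alpha = 0.05
for z in [-3.0, -2.36, -1.5, 0.0, 1.2, 1.95, 1.97, 4.1]:
    print(z, reject_by_critical_value(z, alpha), reject_by_p_value(z, alpha))
```

Values just either side of 1.96 (1.95 and 1.97) show that the two rules switch at the same point.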


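Hypothesis testing (in my own words) - make inferences about a population from a sample.
p-value - the probability of getting a result at least as extreme as the one observed, assuming the null hypothesis is true. It is not the probability that the null hypothesis is true.
A low p-value means the observed data would be unlikely if the null hypothesis were true.
So, if the p-value is less than the significance level, called alpha (commonly 0.05), we reject the null hypothesis.
If the p-value is higher than alpha, the data are not strong enough to reject it, so we fail to reject the null hypothesis. This does not prove that the null hypothesis is true.

Type I error - false positive - the null hypothesis is true, but you reject it. Its probability is alpha.
Type II error - false negative - the null hypothesis is false, but you fail to reject it.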
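The claim that the Type I error rate equals alpha can be checked by simulation: when H0 is actually true, a test at alpha = 0.05 should wrongly reject it in roughly 5% of samples. A minimal sketch using only the standard library, assuming normally distributed data with a known sigma of 1 (so a z-test applies); the sample size and number of simulations are arbitrary choices:

```python
import math
import random
from statistics import NormalDist, fmean

random.seed(0)               # fixed seed so the run is reproducible
STD_NORMAL = NormalDist()
alpha = 0.05
n = 30                       # observations per simulated sample
n_simulations = 10_000

false_positives = 0
for _ in range(n_simulations):
    # H0 is true by construction: data come from N(0, 1) and we test mu = 0 with sigma = 1 known
    sample = [random.gauss(0, 1) for _ in range(n)]
    z = (fmean(sample) - 0) / (1 / math.sqrt(n))
    p_value = 2 * (1 - STD_NORMAL.cdf(abs(z)))
    if p_value < alpha:
        false_positives += 1   # rejecting a true H0 is a Type I error

type_i_rate = false_positives / n_simulations
print(f"Observed Type I error rate: {type_i_rate:.3f}")
```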


In [None]:
import numpy as np
from scipy import stats

# Sample data: Test scores from 25 students
sample_scores = [68, 75, 72, 70, 73, 69, 65, 74, 76, 72, 70, 71, 75, 73, 68, 66, 71, 70, 73, 75, 74, 72, 73, 71, 69]

# Claimed population mean (null hypothesis)
claimed_mean = 70

# Perform a one-sample t-test using scipy.stats.ttest_1samp
t_statistic, p_value = stats.ttest_1samp(sample_scores, claimed_mean)

# Significance level
alpha = 0.05

# Print the results
print(f"Test statistic (t-value): {t_statistic}")
print(f"P-value: {p_value}")

# Decision based on the p-value
if p_value <= alpha:
    print(f"Reject the null hypothesis at significance level {alpha}")
    print(f"Conclusion: There is evidence that the average test score is different from {claimed_mean}.")
else:
    print(f"Fail to reject the null hypothesis at significance level {alpha}")
    print(f"Conclusion: There is not enough evidence that the average test score is different from {claimed_mean}.")

Test statistic (t-value): 2.449489742783188
P-value: 0.021982997044101837
Reject the null hypothesis at significance level 0.05
Conclusion: There is evidence that the average test score is different from 70.
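The t-value printed above can be reproduced by hand with the one-sample formula $t = (\bar{x} - \mu_0) / (s/\sqrt{n})$, where $s$ is the sample standard deviation with $n - 1$ in the denominator. A standard-library sketch; for these 25 scores the result works out to exactly $\sqrt{6} \approx 2.449$:

```python
import math
from statistics import mean, stdev

sample_scores = [68, 75, 72, 70, 73, 69, 65, 74, 76, 72, 70, 71, 75, 73, 68,
                 66, 71, 70, 73, 75, 74, 72, 73, 71, 69]
claimed_mean = 70

n = len(sample_scores)
x_bar = mean(sample_scores)          # sample mean (71.4 for these scores)
s = stdev(sample_scores)             # sample SD, n - 1 in the denominator
standard_error = s / math.sqrt(n)
t_manual = (x_bar - claimed_mean) / standard_error
print(f"t = {t_manual:.6f}, degrees of freedom = {n - 1}")
```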



# Z test formula

$$Z = \frac{\bar{x} - \mu}{\frac{s}{\sqrt{n}}}$$

where:

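* $Z$ is the test statistic (z-score)
* $\bar{x}$ is the sample mean
* $\mu$ is the population mean under the null hypothesis
* $s$ is the sample standard deviation; a z-test strictly uses the known population standard deviation $\sigma$, and substituting $s$ is a large-sample approximation (commonly $n \ge 30$)
* $n$ is the sample size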

**Example:**

A company claims that the average lifespan of their product is 10 years. A random sample of 50 products has an average lifespan of 9.5 years and a standard deviation of 1.5 years. Test the claim at a significance level of 0.05.

**Step 1:**

State the null and alternative hypotheses:

* Null hypothesis: $H_0: \mu = 10$
* Alternative hypothesis: $H_1: \mu \neq 10$

**Step 2:**

Calculate the z-score:

$$Z = \frac{9.5 - 10}{\frac{1.5}{\sqrt{50}}} = -2.36$$

**Step 3:**

Find the critical values for a two-tailed test at $\alpha = 0.05$:

$$\pm z_{\alpha/2} = \pm z_{0.025} = \pm 1.96$$

**Step 4:**

Make a decision:

Since the z-score (-2.36) is less than the lower critical value (-1.96), it falls in the rejection region, so we reject the null hypothesis.

**Conclusion:**

There is sufficient evidence to conclude that the average lifespan of the company's product is different from 10 years.
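The arithmetic in steps 2 to 4 can be checked in a few lines, using `statistics.NormalDist` from the standard library in place of a z-table. As in the example, the sample SD $s$ stands in for $\sigma$ (large-sample approximation with $n = 50$):

```python
import math
from statistics import NormalDist

x_bar, mu_0, s, n = 9.5, 10, 1.5, 50     # figures from the lifespan example
alpha = 0.05

z = (x_bar - mu_0) / (s / math.sqrt(n))
z_crit = NormalDist().inv_cdf(1 - alpha / 2)     # two-tailed critical value, about 1.96
p_value = 2 * (1 - NormalDist().cdf(abs(z)))      # two-sided p-value

print(f"z = {z:.2f}, critical value = ±{z_crit:.2f}, p-value = {p_value:.4f}")
print("Reject H0" if abs(z) > z_crit else "Fail to reject H0")
```

The p-value (about 0.018) is below 0.05, which agrees with the critical-value decision.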

# Useful functions

In [None]:
# Create a function to read the p-value
def p_value_reader(p_value, alpha):
  if p_value < alpha:
    print("Reject the Null Hypothesis")
  else:
    print("Fail to reject the Null Hypothesis")

# 2-Tailed tests with known variance

You have invested thousands of dollars per employee to improve their satisfaction and productivity. Your goal is to improve on the average of 54 cars produced so far, with a corresponding population standard deviation of 2.
Bruno believes the opposite: that the benefits, together with other factors such as the constant rain, are hurting production because of frequent sickness. You both agree on a 95% confidence level.

**Null Hypothesis**: The average number of cars produced is 54

**Alternative Hypothesis**: The average number of cars produced is not 54

In [None]:
# Load data
df_main = pd.read_csv("/content/drive/MyDrive/Statistics with Python 2/Inferential Statistics/Hypothesis Testing/tesla_main.csv")
df_main.head()

Unnamed: 0,Production Date,Defects Found,Cars Produced,Weather Condition,Workers on Shift
0,2023-01-01,3,55,Rainy,20
1,2023-01-02,2,57,Rainy,19
2,2023-01-03,1,54,Rainy,21
3,2023-01-04,0,56,Rainy,22
4,2023-01-05,2,59,Rainy,20


In [None]:
# Info
mean_pop = 54
sd_pop = 2
confidence = 0.95
alpha = 1- confidence
mean_sample = df_main['Cars Produced'].mean()
print(f"The sample mean is {mean_sample}")
sample_size = df_main['Cars Produced'].count()
print(f"The sample size is {sample_size}")

The sample mean is 55.10909090909091
The sample size is 55


In [None]:
# Z-test formula: (sample mean - pop mean) / (pop sd / sqrt(sample size))
z_score = (mean_sample - mean_pop) / (sd_pop / np.sqrt(sample_size))
print(f"The Z-score is {z_score}")

The Z-score is 4.112619161025777


In [None]:
# Calculate the p_value from the z-score (two tails)
tails = 2
p_value = st.norm.sf(abs(z_score)) * tails
print(f"The p-value is {p_value}")

The p-value is 3.9119543361101206e-05


In [None]:
# Interpret the p_value
if p_value < alpha:
  print("Reject the Null Hypothesis")
else:
  print("Fail to reject the Null Hypothesis")
p_value_reader(p_value, alpha)

Reject the Null Hypothesis
Reject the Null Hypothesis


In [None]:
# Build a function to compute the z-test
def ztest(mean_pop, mean_sample, sample_size, sd_pop, alpha, tails):
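  # Z-test formula: (sample mean - pop mean) / (pop sd / sqrt(sample size))
  z_score = (mean_sample - mean_pop) / (sd_pop / np.sqrt(sample_size))
  print(f"The Z-score is {z_score}")

  # Calculate the p_value from the z-score (tails = 2 for two-tailed, 1 for one-tailed).
  # abs() means a one-tailed p-value is taken on whichever side the sample mean falls.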
  p_value = st.norm.sf(abs(z_score)) * tails
  print(f"The p-value is {p_value}")
  p_value_reader(p_value, alpha)

In [None]:
# Apply the function
ztest(mean_pop, mean_sample, sample_size, sd_pop, alpha, tails)

The Z-score is 4.112619161025777
The p-value is 3.9119543361101206e-05
Reject the Null Hypothesis


# 2-Tailed tests with unknown variance

Social media has been all over Tesla. The engines of a couple of cars started to smoke. Even worse, the cars belonged to high-profile customers. You talk to your employees, who tell you that the number of defects is within the normal average of 2.2. Bruno asked you to investigate the situation yourself. Since car production has undergone many changes in the past few months, there is no data about the population, so the population variance is unknown.

**Null Hypothesis**: The average number of defects is 2.2

**Alternative hypothesis**: The average number of defects is not 2.2




In [None]:
# Data
df_main.head()

Unnamed: 0,Production Date,Defects Found,Cars Produced,Weather Condition,Workers on Shift
0,2023-01-01,3,55,Rainy,20
1,2023-01-02,2,57,Rainy,19
2,2023-01-03,1,54,Rainy,21
3,2023-01-04,0,56,Rainy,22
4,2023-01-05,2,59,Rainy,20


In [None]:
# Information
target_mean = 2.2
mean_sample = df_main['Defects Found'].mean()
print(f"The sample mean is {mean_sample}")
sample_size = df_main['Defects Found'].count()
print(f"The sample size is {sample_size}")
confidence = 0.95
alpha = 1 -confidence
sample_sd = df_main['Defects Found'].std()
print(f"The SD is {sample_sd}")

The sample mean is 2.3636363636363638
The sample size is 55
The SD is 1.0777829844714388


In [None]:
# Calculate the t-score
t_score = (mean_sample - target_mean) / (sample_sd / np.sqrt(sample_size))
print(f"The T-score is {t_score}")

The T-score is 1.1259778359082033


In [None]:
#Calculate the p_value
tails = 2
p_value = st.t.sf(abs(t_score), df = (sample_size - 1)) * tails
print(f"The p-value is {p_value}")

The p-value is 0.26515424936297255


In [None]:
#Interpret the p_value
p_value_reader(p_value, alpha)

Fail to reject the Null Hypothesis
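The same decision can be reached with the critical-value approach. With unknown variance the cutoff comes from the t-distribution with n - 1 degrees of freedom, so it is slightly wider than the z cutoff of 1.96. A short sketch reusing the numbers printed above (it assumes SciPy is available, as in the rest of the notebook):

```python
import scipy.stats as st

# Figures printed in the cells above
t_score = 1.1259778359082033
sample_size = 55
alpha = 0.05

dof = sample_size - 1
t_crit = st.t.ppf(1 - alpha / 2, df=dof)   # two-sided critical value
print(f"Critical value: ±{t_crit:.3f}")
print("Reject H0" if abs(t_score) > t_crit else "Fail to reject H0")
```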


In [None]:
# How to do the 2-tailed test with unknown pop variance
t_score, p_value = st.ttest_1samp(a = df_main['Defects Found'],
                                  popmean = target_mean,
                                  alternative = 'two-sided')
print(f"T-score: {t_score}")
print(f"p-value: {p_value}")
p_value_reader(p_value, alpha)

T-score: 1.1259778359082033
p-value: 0.26515424936297255
Fail to reject the Null Hypothesis


# 2-Tailed Paired T-Test

The Sales department has been very critical of Tesla recently, saying that they have been getting many complaints from customers who claim that their cars are taking longer than expected. They have even voiced their concerns to Bruno!

Your department, on the other hand, says that production has been stable over time and on plan, and fires back that the Sales department has been selling too much. You decide to take the initiative and check whether production has been stable over the last 2 months.







**Null hypothesis**: mean daily production in month 1 = mean daily production in month 2

**Alternative Hypothesis**: mean daily production in month 1 is different from month 2

In [None]:
#Data
df_paired = pd.read_csv("/content/drive/MyDrive/Statistics with Python 2/Inferential Statistics/Hypothesis Testing/tesla_paired.csv")
df_paired.head()

Unnamed: 0,Day,Month 1,Month 2
0,1,58,54
1,2,54,56
2,3,57,55
3,4,55,53
4,5,55,52


**A paired test needs matched observations (here, the same day in each month), so both samples must have the same number of observations**

In [None]:
# Data
differences = df_paired['Month 2'] - df_paired['Month 1']
mean_difference = differences.mean()
sd_difference = differences.std()
sample_size = differences.count()
print(f"The mean difference is {mean_difference}")

The mean difference is -1.1


In [None]:
# Info of the test
dof = sample_size - 1
tails = 2
confidence = 0.95
alpha = 1- confidence

In [None]:
# Computing the t-score: mean_diff / (SD_diff / sqrt(sample size))
# differences are Month 2 - Month 1, so a negative t means month 2 produced less
t_score = mean_difference / (sd_difference / np.sqrt(sample_size))
print(f"The T-score is {t_score}")

The T-score is -3.1708738954340316


In [None]:
# Compute the p_value
p_value = st.t.sf(abs(t_score), df = dof) * tails
print(f"The p-value is {p_value}")
p_value_reader(p_value, alpha)

The p-value is 0.0035743342552951936
Reject the Null Hypothesis


In [None]:
# Perform a paired t-test with 2 tails
t_score, p_value = st.ttest_rel(df_paired['Month 1'],
                                df_paired['Month 2'],
                                alternative='two-sided')

print(f"T-score: {t_score}")
print(f"p-value: {p_value}")
p_value_reader(p_value, alpha)

T-score: 3.170873895434031
p-value: 0.0035743342552951992
Reject the Null Hypothesis
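A paired t-test is a one-sample t-test on the differences against 0, and the sign of t depends only on the subtraction order. That is why the manual computation above (Month 2 - Month 1) gave -3.17 while `ttest_rel(Month 1, Month 2)` gave +3.17, with the same p-value. A sketch that checks this on the first five days from `df_paired.head()` (used here only as illustrative data, not the full sample):

```python
import numpy as np
import scipy.stats as st

# Illustrative data: only the first five days shown in df_paired.head()
month_1 = np.array([58, 54, 57, 55, 55])
month_2 = np.array([54, 56, 55, 53, 52])

paired = st.ttest_rel(month_1, month_2)
one_sample = st.ttest_1samp(month_1 - month_2, popmean=0)   # same test on the differences
reversed_order = st.ttest_rel(month_2, month_1)              # swapped arguments flip the sign

print(paired.statistic, one_sample.statistic, reversed_order.statistic)
print(paired.pvalue, one_sample.pvalue, reversed_order.pvalue)
```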


# 2-Tailed Two-Sample T-Test

The Tesla factory you manage runs two shifts per day. You are present during shift 2, but not shift 1. Your second in command tells you that shift 1 is doing great.

You know that to convince Bruno of how the two shifts compare in productivity, you need to show him numbers. And nothing is better than hypothesis testing.

**Null Hypothesis**: There is no difference in mean production between the two shifts

**Alternative Hypothesis**: There is a difference in mean production between the two shifts

In [None]:
# Data
df_2sample = pd.read_csv("/content/drive/MyDrive/Statistics with Python 2/Inferential Statistics/Hypothesis Testing/tesla_2sample.csv")
df_2sample.head()

Unnamed: 0,Day,Shift 1,Shift 2
0,1,53,49.0
1,2,61,57.0
2,3,72,68.0
3,4,59,47.0
4,5,62,60.0


In [None]:
# Summary statistics
df_2sample.describe()

Unnamed: 0,Day,Shift 1,Shift 2
count,30.0,30.0,29.0
mean,15.5,61.166667,55.0
std,8.803408,6.664799,8.647873
min,1.0,51.0,42.0
25%,8.25,55.25,48.0
50%,15.5,61.0,57.0
75%,22.75,66.75,62.0
max,30.0,72.0,72.0


In [None]:
# Isolate the samples
sample1 = df_2sample['Shift 1'].dropna()
sample2 = df_2sample['Shift 2'].dropna()

In [None]:
# Perform Levene's test
stat, p_value = st.levene(sample1, sample2)
print(f"The p-value is {p_value}")

The p-value is 0.044682721966871876


In [None]:
# Interpret the p-value
alpha = 0.05
if p_value < alpha:
  print("Reject the Null Hypothesis. Variances are unequal. Perform Welch's Test")
else:
  print("Fail to reject the Null Hypothesis. No evidence of unequal variances. Perform 2-sample T-test")

Reject the Null Hypothesis. Variances are unequal. Perform Welch's Test


In [None]:
# Performing Welch's test
t_statist, p_value = st.ttest_ind(sample1,
                                  sample2,
                                  equal_var = False,
                                  alternative = 'two-sided')
print(f"The p-value is {p_value}")
p_value_reader(p_value, alpha)

The p-value is 0.0034724013986656174
Reject the Null Hypothesis


In [None]:
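# Student's 2-sample t-test (independent samples, pooled variance), shown for comparison only:
# Levene's test above suggests Welch's test is the appropriate choice here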
t_statist, p_value = st.ttest_ind(sample1,
                                  sample2,
                                  equal_var = True,
                                  alternative = 'two-sided')
print(f"The p-value is {p_value}")
p_value_reader(p_value, alpha)

The p-value is 0.003237334319433138
Reject the Null Hypothesis
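Welch's test, chosen above because Levene's test flagged unequal variances, keeps each sample's own variance and approximates the degrees of freedom with the Welch-Satterthwaite formula. A sketch that computes it from the `describe()` figures and checks it against `scipy.stats.ttest_ind_from_stats`; since those inputs are rounded, the p-value matches the raw-data result only approximately:

```python
import math
import scipy.stats as st

# Summary statistics from df_2sample.describe() (rounded to 6 decimals)
mean1, sd1, n1 = 61.166667, 6.664799, 30   # Shift 1
mean2, sd2, n2 = 55.0, 8.647873, 29        # Shift 2 (one missing day dropped)

# Welch's t statistic: each sample keeps its own variance
v1, v2 = sd1**2 / n1, sd2**2 / n2
t_welch = (mean1 - mean2) / math.sqrt(v1 + v2)

# Welch-Satterthwaite approximation of the degrees of freedom
df_welch = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))
p_manual = 2 * st.t.sf(abs(t_welch), df=df_welch)

result = st.ttest_ind_from_stats(mean1, sd1, n1, mean2, sd2, n2, equal_var=False)
print(f"manual: t = {t_welch:.4f}, df = {df_welch:.2f}, p = {p_manual:.5f}")
print(f"scipy:  t = {result.statistic:.4f}, p = {result.pvalue:.5f}")
```

The degrees of freedom come out non-integer and below n1 + n2 - 2 = 57, which is how Welch's test pays for not pooling the variances.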


In [None]:
# Exercise:
# Build a function that performs a 2-sample t-test,
# choosing Welch's or Student's version based on the outcome of Levene's test

def test_2sample(sample1, sample2, alpha, alternative):
  #levene's test
  stat, p_value = st.levene(sample1, sample2)
  #interpret the test
  if p_value < alpha:
    equal_var = False
    print("Reject the Null Hypothesis. Variances are unequal. Perform Welch's Test")
  else:
    equal_var = True
    print("Fail to reject the Null Hypothesis. No evidence of unequal variances. Perform 2-sample T-test")
  # 2 sample test
  t_statist, p_value = st.ttest_ind(sample1,
                                    sample2,
                                    equal_var = equal_var,
                                    alternative = alternative)
  print(f"The p-value is {p_value}")
  p_value_reader(p_value, alpha)

In [None]:
test_2sample(sample1, sample2, 0.05, 'two-sided')

Reject the Null Hypothesis. Variances are unequal. Perform Welch's Test
The p-value is 0.0034724013986656174
Reject the Null Hypothesis


# 1-Tailed Tests with Known Variance

You have invested thousands of dollars per employee to improve their satisfaction and productivity. Your goal is to improve on the average of 54.5 cars produced so far, with a corresponding population standard deviation of 2.

Bruno does not believe it and asks for proof. Statistical proof of course :)

**Null Hypothesis**: There is no improvement in productivity (the population mean is at most 54.5)

**Alternative Hypothesis**: The population mean production is greater than 54.5 (right-tailed test)

In [None]:
#Data
df_main.head()

Unnamed: 0,Production Date,Defects Found,Cars Produced,Weather Condition,Workers on Shift
0,2023-01-01,3,55,Rainy,20
1,2023-01-02,2,57,Rainy,19
2,2023-01-03,1,54,Rainy,21
3,2023-01-04,0,56,Rainy,22
4,2023-01-05,2,59,Rainy,20


In [None]:
# Data
mean_pop = 54.5
sd_pop = 2
confidence = 0.95
alpha = 1- confidence
mean_sample = df_main['Cars Produced'].mean()
print(f"The sample mean is {mean_sample}")
sample_size = df_main['Cars Produced'].count()
print(f"The sample size is {sample_size}")

The sample mean is 55.10909090909091
The sample size is 55


In [None]:
# Apply the ztest function that we created (tails = 1 for a one-tailed test)
ztest(mean_pop, mean_sample, sample_size, sd_pop, alpha, 1)

The Z-score is 2.258569539251862
The p-value is 0.011955087194577932
Reject the Null Hypothesis
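The `ztest` function takes `abs(z_score)` before computing the tail area, so with `tails = 1` it returns the tail on whichever side the sample mean fell. Here that is harmless because the sample mean (about 55.11) is above 54.5, but if it were below 54.5 the same call would still report a small p-value for a right-tailed test. A direction-aware sketch follows; `ztest_directional` is a hypothetical helper, not part of any library, and it uses the standard-library identity $P(Z \ge z) = \tfrac{1}{2}\,\mathrm{erfc}(z/\sqrt{2})$ instead of SciPy:

```python
import math

def ztest_directional(mean_pop, mean_sample, sample_size, sd_pop, alternative="two-sided"):
    """Hypothetical helper: z-test with known population SD.

    alternative: 'two-sided', 'greater' (H1: mu > mean_pop) or 'less' (H1: mu < mean_pop).
    Returns (z_score, p_value).
    """
    z_score = (mean_sample - mean_pop) / (sd_pop / math.sqrt(sample_size))
    upper_tail = 0.5 * math.erfc(z_score / math.sqrt(2))    # P(Z >= z)
    lower_tail = 0.5 * math.erfc(-z_score / math.sqrt(2))   # P(Z <= z)
    if alternative == "greater":
        p_value = upper_tail
    elif alternative == "less":
        p_value = lower_tail
    elif alternative == "two-sided":
        p_value = 2 * min(upper_tail, lower_tail)
    else:
        raise ValueError("alternative must be 'two-sided', 'greater' or 'less'")
    return z_score, p_value

z, p = ztest_directional(54.5, 55.10909090909091, 55, 2, alternative="greater")
print(f"z = {z:.4f}, right-tailed p = {p:.5f}")
```

With `alternative="less"` the same data give a p-value close to 1, as a left-tailed test should.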


# 1-Tailed Test with Unknown Variance

Social media has been all over Tesla. People say that more and more customers are complaining about defects and that improvements are urgently needed. You talk to your employees, who tell you that the number of defects is within the normal average of 2.4, maybe even better than that.

You decide to investigate the situation yourself. Since car production has undergone many changes in the past few months, there is no data about the population, so the population variance is unknown.

**Null Hypothesis**: The average number of defects is less than or equal to 2.4

**Alternative Hypothesis**: The average number of defects is greater than 2.4 -> right-tailed test

In [None]:
# Data
df_main.head()

Unnamed: 0,Production Date,Defects Found,Cars Produced,Weather Condition,Workers on Shift
0,2023-01-01,3,55,Rainy,20
1,2023-01-02,2,57,Rainy,19
2,2023-01-03,1,54,Rainy,21
3,2023-01-04,0,56,Rainy,22
4,2023-01-05,2,59,Rainy,20


In [None]:
#Summary statistics
df_main.describe()

Unnamed: 0,Defects Found,Cars Produced,Workers on Shift
count,55.0,55.0,55.0
mean,2.363636,55.109091,20.854545
std,1.077783,2.424205,0.970265
min,0.0,48.0,18.0
25%,2.0,54.0,20.0
50%,2.0,55.0,21.0
75%,3.0,57.0,22.0
max,5.0,59.0,22.0


In [None]:
# How to do the 1-tailed test with unknown pop variance
target_mean = 2.4  # reset: target_mean still held 2.2 from the two-tailed test
t_score, p_value = st.ttest_1samp(a = df_main['Defects Found'],
                                  popmean = target_mean,
                                  alternative = 'greater')
print(f"T-score: {t_score}")
print(f"p-value: {p_value}")
p_value_reader(p_value, alpha)
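A right-tailed test is decided by the upper tail only. With $\mu_0 = 2.4$ from the hypotheses above, the sample mean (about 2.36) lies below the hypothesised value, so the t-statistic is negative and the right-tailed p-value is above 0.5: H0 cannot be rejected. A sketch from the summary statistics printed earlier (it assumes SciPy, as in the rest of the notebook):

```python
import math
import scipy.stats as st

# Summary figures for 'Defects Found' printed earlier in the notebook
mean_sample = 2.3636363636363638
sample_sd = 1.0777829844714388
sample_size = 55
mu_0 = 2.4                          # hypothesised mean from the hypotheses above

t_score = (mean_sample - mu_0) / (sample_sd / math.sqrt(sample_size))
dof = sample_size - 1
p_greater = st.t.sf(t_score, df=dof)              # right-tailed: P(T >= t)
p_two_sided = 2 * st.t.sf(abs(t_score), df=dof)

print(f"t = {t_score:.4f}, right-tailed p = {p_greater:.4f}, two-sided p = {p_two_sided:.4f}")
```

When t is negative the right-tailed p-value equals 1 minus half the two-sided p-value; when t is positive it equals half of it.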


# 1-Tailed Paired T-Test

The Sales department has been very critical of Tesla recently, saying that they have been getting many complaints from customers who claim that their cars are taking longer than expected and that production slowed down last month. They have even voiced their concerns to Bruno.

Your department, on the other hand, says that production is at least as good as before. The historical pattern is that productivity increases over time.

**Null Hypothesis**: Mean daily production in month 2 is at least as high as in month 1

**Alternative Hypothesis**: Mean daily production in month 1 is higher than in month 2 (production slowed down)

In [None]:
# Data
df_paired.describe()

Unnamed: 0,Day,Month 1,Month 2
count,30.0,30.0,30.0
mean,15.5,55.6,54.5
std,8.803408,1.588754,1.196259
min,1.0,54.0,52.0
25%,8.25,54.0,54.0
50%,15.5,55.0,55.0
75%,22.75,57.0,55.0
max,30.0,58.0,56.0


In [None]:
# Perform a paired t-test with 1 tail
t_score, p_value = st.ttest_rel(df_paired['Month 1'],
                                df_paired['Month 2'],
                                alternative='greater')

print(f"T-score: {t_score}")
print(f"p-value: {p_value}")
p_value_reader(p_value, alpha)

T-score: 3.170873895434031
p-value: 0.0017871671276475996
Reject the Null Hypothesis


# 1-Tailed Two-Sample T-Test

The Tesla factory you manage runs two shifts per day. You are present during shift 2, but not shift 1. Your second in command tells you that shift 1 is doing great. In fact, her recent incentives led to higher efficiency. Let's see if that is true.

**Null Hypothesis**: Mean production in shift 2 >= mean production in shift 1

**Alternative Hypothesis**: Mean production in shift 2 < mean production in shift 1

In [None]:
#Data
df_2sample.describe()

Unnamed: 0,Day,Shift 1,Shift 2
count,30.0,30.0,29.0
mean,15.5,61.166667,55.0
std,8.803408,6.664799,8.647873
min,1.0,51.0,42.0
25%,8.25,55.25,48.0
50%,15.5,61.0,57.0
75%,22.75,66.75,62.0
max,30.0,72.0,72.0


In [None]:
# Isolate the samples
sample1 = df_2sample['Shift 1'].dropna()
sample2 = df_2sample['Shift 2'].dropna()

In [None]:
# Apply the function
test_2sample(sample2, sample1, 0.05, 'less')

Reject the Null Hypothesis. Variances are unequal. Perform Welch's Test
The p-value is 0.0017362006993328087
Reject the Null Hypothesis


# Chi-Square Test

You are a car production manager in charge of two factories, Factory A and Factory B. Both factories have recently implemented different quality control measures to reduce the number of defective cars in each of the three car categories: sedan, SUV, and truck. You want to determine whether the breakdown of the counts across the three categories differs significantly between the two factories.

**Null Hypothesis**: The distribution of counts across car categories is the same in both factories (category is independent of factory)

**Alternative Hypothesis**: The distribution of counts across car categories differs between the factories

In [None]:
#Load the data
df_chisquare = pd.read_csv("/content/drive/MyDrive/Statistics with Python 2/Inferential Statistics/Hypothesis Testing/tesla_chisquare.csv")
df_chisquare.head()

Unnamed: 0.1,Unnamed: 0,Day,Factory,Category,Count
0,0,1,Factory A,Sedan,48
1,1,2,Factory A,Sedan,38
2,2,3,Factory A,Sedan,24
3,3,4,Factory A,Sedan,17
4,4,5,Factory A,Sedan,30


In [None]:
# Actual frequency
contingency_table = pd.crosstab(index = df_chisquare['Factory'],
                                 columns = df_chisquare['Category'],
                                 values= df_chisquare['Count'],
                                 aggfunc = 'sum')

In [None]:
# Chi Square Test
stat, p_value, dof, expected_freq = st.chi2_contingency(observed = contingency_table)

In [None]:
# Print and interpret the p_value
print(f"The p-value is {p_value}")
p_value_reader(p_value, 0.05)

The p-value is 0.33710767193470104
Fail to reject the Null Hypothesis
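`chi2_contingency` compares the observed table with the counts expected if category were independent of factory, each equal to (row total × column total) / grand total, and uses (rows - 1)(columns - 1) degrees of freedom. A sketch on a hypothetical 2 × 3 table (made-up counts, not the Tesla data) that recomputes the statistic by hand. SciPy applies Yates' continuity correction only when there is 1 degree of freedom, so for this table the two statistics should agree:

```python
import numpy as np
import scipy.stats as st

# Hypothetical 2 x 3 table of counts: rows = factories, columns = car categories
observed = np.array([[120, 90, 60],
                     [110, 100, 70]])

stat, p_value, dof, expected = st.chi2_contingency(observed)

# Expected count under independence: row total * column total / grand total
row_totals = observed.sum(axis=1, keepdims=True)
col_totals = observed.sum(axis=0, keepdims=True)
expected_manual = row_totals * col_totals / observed.sum()
stat_manual = ((observed - expected_manual) ** 2 / expected_manual).sum()

print(expected_manual.round(2))
print(f"chi2 = {stat:.4f} (manual {stat_manual:.4f}), dof = {dof}, p = {p_value:.4f}")
```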
