# **Hypothesis Testing for Data Science**
## In this notebook, I have covered topics for hypothesis testing in data science.
### **1. Chi Square Testing**
### **2. Z-test**
### **3. T-test**
### **4. ANOVA**
### **5. A/B Testing**
### **6. Errors in Hypothesis Testing**
### **7. References**

# Chi Square Testing

In [1]:
import numpy as np
import pandas as pd

In [2]:
genders = ['Male', 'Female']
occupations = ['Data Scientist', 'Web Developer', 'Cloud Engineer', 'Web Services Specialist', 'Full Stack Engineer', 'Software Engineer']
organizations = ["Google","Mocrosoft","Amazon","Facebook","Adobe","Netflix","Nvidia","Apple"]

In [3]:
np.random.seed(42)
data = { "Gender" : np.random.choice(genders,size=10000),
        "Occupation" : np.random.choice(occupations,size=10000),
       "Organization" : np.random.choice(organizations,size=10000)}

data

{'Gender': array(['Male', 'Female', 'Male', ..., 'Female', 'Female', 'Male'],
       dtype='<U6'),
 'Occupation': array(['Cloud Engineer', 'Cloud Engineer', 'Software Engineer', ...,
        'Web Developer', 'Data Scientist', 'Full Stack Engineer'],
       dtype='<U23'),
 'Organization': array(['Apple', 'Adobe', 'Facebook', ..., 'Google', 'Mocrosoft',
        'Mocrosoft'], dtype='<U9')}

In [4]:
data = pd.DataFrame(data)

In [5]:
data

,Gender,Occupation,Organization
0,Male,Cloud Engineer,Apple
1,Female,Cloud Engineer,Adobe
2,Male,Software Engineer,Facebook
3,Male,Web Services Specialist,Apple
4,Male,Cloud Engineer,Mocrosoft
...,...,...,...
9995,Female,Software Engineer,Netflix
9996,Male,Web Services Specialist,Apple
9997,Female,Web Developer,Google
9998,Female,Data Scientist,Mocrosoft


In [6]:
crosstab = pd.crosstab([data['Gender'], data['Occupation']], data['Organization'])
crosstab

Gender,Occupation,Adobe,Amazon,Apple,Facebook,Google,Mocrosoft,Netflix,Nvidia
Female,Cloud Engineer,96,87,103,99,105,104,93,97
Female,Data Scientist,113,92,103,77,107,112,119,105
Female,Full Stack Engineer,93,116,83,118,113,108,117,104
Female,Software Engineer,105,106,117,80,111,113,82,116
Female,Web Developer,104,106,105,115,117,109,108,111
Female,Web Services Specialist,94,99,117,109,86,103,105,105
Male,Cloud Engineer,94,104,95,89,104,111,111,101
Male,Data Scientist,113,82,102,114,102,97,118,109
Male,Full Stack Engineer,104,115,123,87,97,113,95,119
Male,Software Engineer,97,96,100,118,96,96,107,90


In [7]:
data.describe()

,Gender,Occupation,Organization
count,10000,10000,10000
unique,2,6,8
top,Male,Web Developer,Nvidia
freq,5013,1738,1287


In [8]:
from scipy.stats import chi2_contingency

In [9]:
Ho="there is no significant relationship between the attributes"
Ha="there is a significant relationship between the attributes"
# Rows of the crosstab are (Gender, Occupation) pairs; columns are organizations
chi2, p, dof, expected = chi2_contingency(crosstab)
print(f"the chi square value is: {chi2}")
print(f"the p value is: {p}")
print(f"the degree of freedom is: {dof}")
print(f"the expected value: {expected}")
alpha = 0.05
if p < alpha:
    print("There is strong evidence of a significant relationship between the (gender, occupation) groups and the organization. We reject the null hypothesis (Ho).")
else:
    print("There is no strong evidence of a significant relationship between the (gender, occupation) groups and the organization. We fail to reject the null hypothesis (Ho).")

the chi square value is: 82.92755957391171
the p value is: 0.30176146267556553
the degree of freedom is: 77
the expected value: [[ 94.7856  94.7072  99.568   95.0208  99.8032 100.8224  98.392  100.9008]
 [100.1052 100.0224 105.156  100.3536 105.4044 106.4808 103.914  106.5636]
 [103.0068 102.9216 108.204  103.2624 108.4596 109.5672 106.926  109.6524]
 [100.347  100.264  105.41   100.596  105.659  106.738  104.165  106.821 ]
 [105.7875 105.7    111.125  106.05   111.3875 112.525  109.8125 112.6125]
 [ 98.8962  98.8144 103.886   99.1416 104.1314 105.1948 102.659  105.2766]
 [ 97.8081  97.7272 102.743   98.0508 102.9857 104.0374 101.5295 104.1183]
 [101.1933 101.1096 106.299  101.4444 106.5501 107.6382 105.0435 107.7219]
 [103.1277 103.0424 108.331  103.3836 108.5869 109.6958 107.0515 109.7811]
 [ 96.72    96.64   101.6     96.96   101.84   102.88   100.4    102.96  ]
 [104.3367 104.2504 109.601  104.5956 109.8599 110.9818 108.3065 111.0681]
 [102.8859 102.8008 108.077  103.1412 108.3323 

### For further reference, see the chi-square p-value table image linked below:
https://www.bing.com/images/search?view=detailV2&ccid=oQb3xpss&id=43D07B64351F36113790CED0125A46F1FF182601&thid=OIP.oQb3xpssKu7Mb4thd66zKAHaGq&mediaurl=https%3A%2F%2Fmicrobenotes.com%2Fwp-content%2Fuploads%2F2021%2F01%2Fp-value-table-from-chi-square-values.png&exph=723&expw=804&q=Chi-Square+P-Value+Chart&simid=607991675825362004&FORM=IRPRST&ck=5D53A4227FF4FA649EB0AA6F17696BEE&selectedIndex=0&itb=0&cw=1375&ch=707&ajaxhist=0&ajaxserp=0
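
To make the numbers returned by `chi2_contingency` less opaque, here is a minimal sketch that rebuilds the expected counts, the chi-square statistic, the degrees of freedom and the p-value by hand. The 2x3 table `observed` is made-up illustrative data, not the notebook's crosstab, and the `*_manual` names are only for this example.

```python
import numpy as np
from scipy import stats

# Hypothetical 2x3 table of observed counts (illustrative data only).
observed = np.array([[30, 20, 50],
                     [20, 30, 50]])

row_totals = observed.sum(axis=1, keepdims=True)   # shape (2, 1)
col_totals = observed.sum(axis=0, keepdims=True)   # shape (1, 3)
grand_total = observed.sum()

# Expected count under independence: row total * column total / grand total.
expected_manual = row_totals * col_totals / grand_total

# Chi-square statistic: sum over cells of (observed - expected)^2 / expected.
chi2_manual = ((observed - expected_manual) ** 2 / expected_manual).sum()

# Degrees of freedom: (rows - 1) * (columns - 1).
dof_manual = (observed.shape[0] - 1) * (observed.shape[1] - 1)

# Upper-tail probability of the chi-square distribution.
p_manual = stats.chi2.sf(chi2_manual, dof_manual)

# Cross-check against SciPy's implementation.
chi2_scipy, p_scipy, dof_scipy, expected_scipy = stats.chi2_contingency(observed)

print(f"manual: chi2={chi2_manual:.4f}, dof={dof_manual}, p={p_manual:.4f}")
print(f"scipy : chi2={chi2_scipy:.4f}, dof={dof_scipy}, p={p_scipy:.4f}")
```

The same formulas give dof = (12 - 1) * (8 - 1) = 77 for the 12 x 8 crosstab above, which matches the value printed by the notebook.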

In [10]:
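# Caution: crosstab.sum() returns per-column totals (a Series), so this cell yields one
# value per organization rather than a single Cramér's V; the next cell uses the grand total.
n=crosstab.sum()
r,k =crosstab.shape
cramers_v = np.sqrt(chi2/(n*(min(k-1,r-1))))
cramers_v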

Organization
Adobe        0.098989
Amazon       0.099030
Apple        0.096583
Facebook     0.098866
Google       0.096469
Mocrosoft    0.095980
Netflix      0.097158
Nvidia       0.095943
dtype: float64

In [11]:
n=crosstab.sum().sum()
r,k =crosstab.shape
cramers_v = np.sqrt(chi2/(n*(min(k-1,r-1))))
cramers_v

0.03441917230969468
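
Cramér's V rescales the chi-square statistic to the range 0 to 1, so a value of about 0.034 indicates a very weak association. The sketch below wraps the same formula in a small helper and checks it on two made-up tables where the answer is known in advance. `cramers_v_from_table` is a hypothetical name introduced for this example, not a SciPy function.

```python
import numpy as np
from scipy.stats import chi2_contingency

def cramers_v_from_table(table):
    """Cramér's V for a 2-D contingency table (hypothetical helper)."""
    table = np.asarray(table)
    # correction=False: skip Yates' continuity correction so the formula is exact.
    chi2_stat = chi2_contingency(table, correction=False)[0]
    n = table.sum()          # grand total (a scalar), not per-column totals
    r, k = table.shape
    return np.sqrt(chi2_stat / (n * min(r - 1, k - 1)))

# Perfect association: each row category maps to exactly one column category.
perfect = np.diag([10, 10, 10])
v_perfect = cramers_v_from_table(perfect)

# No association: every cell has the same count.
uniform = np.full((3, 3), 10)
v_uniform = cramers_v_from_table(uniform)

print(f"perfect association: V = {v_perfect:.3f}")
print(f"no association     : V = {v_uniform:.3f}")
```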

# Z-Test

## 1. A Z-test is appropriate when the sample size is large (generally n > 30).
## 2. The data follow a normal distribution.
## 3. The population variance is known.
## Q) Suppose a factory produces light bulbs with an average lifespan of 1000 hours and a known standard deviation of 50 hours. To test a new manufacturing process, you take a sample of 40 light bulbs and find an average lifespan of 1015 hours. You want to know if this increase is statistically significant.

## 1. Null Hypothesis (Ho): The average lifespan with the new process is 1000 hours, i.e. there is no change or improvement with the new process.
## 2. Alternative Hypothesis (H1): The average lifespan is greater than 1000 hours, i.e. the new manufacturing process is an improvement.

In [12]:
import math
sample_mean=1015
sample_size=40
population_mean=1000
population_std=50
z_score = (sample_mean - population_mean)/(population_std/math.sqrt(sample_size))
z_score

1.8973665961010275

In [13]:
from scipy.stats import norm
# One-sided (upper-tail) p-value, because H1 states the mean is greater than 1000
p_value = 1 - norm.cdf(z_score)
alpha = 0.05
print(f"The p-value is: {p_value}")
if p_value < alpha:
    print("We can reject the null hypothesis")
    print(f"i.e {round(p_value,4)} < {alpha} ") 
    print("There is significant improvement in our new manufacturing process!")
else:
    print(f"i.e {round(p_value,4)} >= {alpha} ")
    print("We fail to reject the null hypothesis")
    print("There is no significant improvement in our new manufacturing process!")

The p-value is: 0.028889785561798664
We can reject the null hypothesis
i.e 0.0289 < 0.05 
There is significant improvement in our new manufacturing process!
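
The conclusion above depends on the test being one-sided. The following sketch, using a hypothetical helper `one_sample_z_test`, repeats the light-bulb calculation for both a one-sided and a two-sided alternative and also shows the equivalent critical-value comparison.

```python
import math
from scipy.stats import norm

def one_sample_z_test(sample_mean, population_mean, population_std, n,
                      alternative="greater"):
    """One-sample z-test with known population std (hypothetical helper)."""
    z = (sample_mean - population_mean) / (population_std / math.sqrt(n))
    if alternative == "greater":
        p = norm.sf(z)               # upper tail
    elif alternative == "less":
        p = norm.cdf(z)              # lower tail
    elif alternative == "two-sided":
        p = 2 * norm.sf(abs(z))      # both tails
    else:
        raise ValueError("alternative must be 'greater', 'less' or 'two-sided'")
    return z, p

z_greater, p_greater = one_sample_z_test(1015, 1000, 50, 40, "greater")
z_two, p_two = one_sample_z_test(1015, 1000, 50, 40, "two-sided")

# Equivalent decision rule: compare z with the one-sided critical value.
z_critical = norm.ppf(1 - 0.05)

print(f"z = {z_greater:.4f}, one-sided critical value = {z_critical:.4f}")
print(f"one-sided p = {p_greater:.4f}, two-sided p = {p_two:.4f}")
```

With the same data, the two-sided p-value is twice the one-sided one and ends up slightly above 0.05, so the choice of alternative hypothesis should be made before looking at the data.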


# T-Test
## 1. A T-test is used when we don't know the population standard deviation.
## 2. It is also used when the sample size is small (usually n < 30).
## There are three kinds of T-tests. They are:
### 1. One sample T-test
### 2. Independent two sample T-test
### 3. Paired T-test

## One sample T-test
### **Example:** 
### 1. Suppose students score an average of 75 on a test.
### 2. Now, a sample of 15 students who were taught with a new teaching method is taken; their average score is about 77 (the 15 scores listed below average 76.67).
### 3. We need to know if the difference under the new teaching method is significant or likely just due to chance.


In [14]:
new_sample_scores = [72,74,78,76,80,79,82,77,75,76,73,74,78,77,79]


#### **Null Hypothesis (H0)**: The new teaching method is not significantly different from the old one (i.e. it has no effect), so any difference in the mean score is due to chance.
#### **Alternative Hypothesis (H1)**: The new teaching method is significantly different from the old one (i.e. it has an effect), so the mean score is different.
#### **Note**: In a one-sample t-test, when we don't have the population standard deviation, we use the sample's standard deviation as an estimate of variability. The sample standard deviation helps us understand the spread of the scores around the sample mean.
#### ddof=1 applies Bessel's correction, which divides by n - 1 instead of n so that the sample variance is an unbiased estimate of the population variance (as we're estimating from a sample).

In [15]:
population_mean=75
sample_mean=77  # rounded; the scores in new_sample_scores average 1150/15, about 76.67
sample_standard_deviation = np.std(new_sample_scores,ddof=1)
sample_size=len(new_sample_scores)

In [16]:
sample_standard_deviation

2.768874620972692

In [17]:
import math
t_statistic=(sample_mean-population_mean)/(sample_standard_deviation/math.sqrt(sample_size))

In [18]:
t_statistic

2.7975144247209407

#### Now, compare the t_statistic value with the critical t value.
#### **Decision making:**
#### 1. If |t_statistic| > t_critical, reject the null hypothesis, i.e. the new teaching method has a significant effect on students.
#### 2. If |t_statistic| <= t_critical, we have no evidence to say the new teaching method made a significant difference.

In [19]:
# Calculate the critical t value.
from scipy.stats import t
alpha = 0.05
df = sample_size - 1
# Two-sided test: split alpha across both tails
t_critical = t.ppf(1-alpha/2,df)
print(f"the t critical value is: {t_critical}")
if abs(t_statistic) > t_critical :
    print(f"{round(t_statistic,4)} > {round(t_critical,4)}, we reject the null hypothesis, i.e. the new teaching method has a significant effect on students.")
else:
    print(f"{round(t_statistic,4)} <= {round(t_critical,4)}, we have no evidence to say the new teaching method made a significant difference.")

the t critical value is: 2.1447866879169273
2.7975 > 2.1448, we reject the null hypothesis, i.e. the new teaching method has a significant effect on students.
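
As a cross-check, SciPy can run the whole one-sample test directly from the raw scores. This is a sketch rather than a replacement for the cells above. Note that `ttest_1samp` uses the mean computed from the data (about 76.67) rather than the hard-coded value of 77, so its t-statistic is somewhat smaller than the one printed above. The variable names ending in `_1samp` are introduced only for this example.

```python
import numpy as np
from scipy.stats import ttest_1samp

new_sample_scores = [72, 74, 78, 76, 80, 79, 82, 77, 75, 76, 73, 74, 78, 77, 79]

# Two-sided test of H0: population mean = 75.
t_stat_1samp, p_value_1samp = ttest_1samp(new_sample_scores, popmean=75)

print(f"sample mean = {np.mean(new_sample_scores):.4f}")
print(f"t = {t_stat_1samp:.4f}, two-sided p = {p_value_1samp:.4f}")

alpha = 0.05
if p_value_1samp < alpha:
    print("Reject H0: the mean score differs from 75.")
else:
    print("Fail to reject H0.")
```

Comparing the p-value with alpha is equivalent to comparing |t| with the critical value, and it avoids looking up the critical value separately.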


## Independent Two sample T-test
#### The independent two sample T-test is used to determine if there is a statistically significant difference between the means of two independent groups.
#### It is especially useful for comparing two different populations or treatments.

### **Example:**
### Suppose you are studying the effectiveness of two different diets. You randomly assign 20 people to each diet and measure the weight loss after 3 months.
#### **Data**:
#### **Diet A (Group 1)**: Average weight loss = 6.5 kg, standard deviation = 1.5 kg, sample size (A) = 20
#### **Diet B (Group 2)**: Average weight loss = 5.2 kg, standard deviation = 1.3 kg, sample size (B) = 20
#### Null Hypothesis (H0): There is no difference in weight loss between Diet A and Diet B (i.e. mean1 = mean2).
#### Alternative Hypothesis (H1): There is a difference in weight loss (mean1 != mean2).

In [20]:
# Calculate the t-statistic from the summary statistics
mean1=6.5
mean2=5.2
std1=1.5
std2=1.3
sample_size1 = 20
sample_size2 = 20
alpha = 0.05
var1 = std1*std1
var2 = std2*std2
# Standard error of the difference in means (with equal group sizes,
# the pooled and unpooled forms give the same value)
t_statistic = (mean1 - mean2)/math.sqrt((var1/sample_size1) + (var2/sample_size2))
dof = sample_size1 + sample_size2 -2
print(f"T-statistic value is: {t_statistic}")
# Compare with the two-sided critical t value, using this test's own
# degrees of freedom (dof = 38), not df from the one-sample cell
t_critical = t.ppf(1-alpha/2,dof)
print(f"the t critical value is: {t_critical:.4f}")
if abs(t_statistic) > t_critical :
    print(f"{round(t_statistic,4)} > {round(t_critical,4)}, we reject the null hypothesis, i.e. there is a difference in the results of diet A and diet B.")
else:
    print(f"{round(t_statistic,4)} <= {round(t_critical,4)}, we have no evidence to say the diets are different from each other.")

T-statistic value is: 2.9289384088856636
the t critical value is: 2.0244
2.9289 > 2.0244, we reject the null hypothesis, i.e. there is a difference in the results of diet A and diet B.
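
When only summary statistics are available, as in this diet example, `scipy.stats.ttest_ind_from_stats` can serve as a cross-check. The sketch below assumes the reported standard deviations are sample standard deviations (ddof=1), and runs both the pooled-variance (Student) and the unequal-variance (Welch) versions. The names `pooled` and `welch` are introduced only for this example.

```python
from scipy.stats import ttest_ind_from_stats

# Summary statistics from the diet example (stds assumed to be sample stds).
pooled = ttest_ind_from_stats(mean1=6.5, std1=1.5, nobs1=20,
                              mean2=5.2, std2=1.3, nobs2=20,
                              equal_var=True)    # Student's t-test, dof = 38
welch = ttest_ind_from_stats(mean1=6.5, std1=1.5, nobs1=20,
                             mean2=5.2, std2=1.3, nobs2=20,
                             equal_var=False)   # Welch's t-test

print(f"pooled: t = {pooled.statistic:.4f}, p = {pooled.pvalue:.4f}")
print(f"welch : t = {welch.statistic:.4f}, p = {welch.pvalue:.4f}")
```

Because both groups have 20 observations, the two versions produce the same t-statistic and differ only slightly in their degrees of freedom, so the decision is the same either way.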


## **Paired T-test**
#### A paired T-test is used to compare the same subjects in before and after scenarios.
#### Here, the same subjects/attributes are measured under two conditions.
### **Example:**
### Suppose you want to determine if a training program improves test scores. You have the test scores of 10 students before and after the training.
### **Data:**
#### Before : [70,65,80,72,68,74,73,77,66,71]
#### After : [75,70,85,78,74,78,76,82,70,75]
### **Hypothesis:**
### 1. **Null Hypothesis (H0)**: There is no significant difference in test scores before and after training (mean(before) = mean(after)).
### 2. **Alternative Hypothesis (H1)**: There is a significant difference in test scores before and after training (mean(before) != mean(after)).

In [21]:
from scipy.stats import t
import numpy as np
before = np.array([70, 65, 80, 72, 68, 74, 73, 77, 66, 71])
after = np.array([75, 70, 85, 78, 74, 78, 76, 82, 70, 75])
differences = after - before

# Calculate the mean and standard deviation of differences
mean_diff = np.mean(differences)
std_diff = np.std(differences, ddof=1)  # ddof=1 for sample standard deviation
n = len(differences)  # Number of paired observations
alpha = 0.05
t_statistic = mean_diff / (std_diff / np.sqrt(n))
dof = n - 1
t_critical = t.ppf(1 - alpha / 2, dof)
print(f"T-statistic value is: {t_statistic}")
print(f"The t critical value is: {t_critical}")
print(f"Degrees of freedom: {dof}")
if abs(t_statistic) > t_critical:
    print(f"{round(t_statistic, 4)} > {round(t_critical, 4)}, we reject the null hypothesis. There is a significant difference in test scores before and after training.")
else:
    print(f"{round(t_statistic, 4)} <= {round(t_critical, 4)}, we fail to reject the null hypothesis. There is no significant difference in test scores before and after training.")


T-statistic value is: 15.666666666666668
The t critical value is: 2.2621571628540993
Degrees of freedom: 9
15.6667 > 2.2622, we reject the null hypothesis. There is a significant difference in test scores before and after training.
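
The manual calculation above can be cross-checked with `scipy.stats.ttest_rel`, which performs a paired t-test on the two arrays directly. This is a sketch using the same data; the names `t_stat_paired` and `p_value_paired` are introduced only for this example.

```python
import numpy as np
from scipy.stats import ttest_rel

before = np.array([70, 65, 80, 72, 68, 74, 73, 77, 66, 71])
after = np.array([75, 70, 85, 78, 74, 78, 76, 82, 70, 75])

# Paired (dependent-samples) t-test; argument order sets the sign of t.
t_stat_paired, p_value_paired = ttest_rel(after, before)

print(f"mean difference = {np.mean(after - before):.2f}")
print(f"t = {t_stat_paired:.4f}, two-sided p = {p_value_paired:.2e}")
```

A paired test works on the per-student differences, which removes the variation between students; running an independent two-sample test on the same arrays would ignore that pairing and be far less sensitive.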


# ANOVA
## Analysis of Variance
## ANOVA is a statistical method used to compare the means of three or more independent groups to see if there is a statistically significant difference between them.
## **Note**: In a T-test, we generally compare two groups. ANOVA, however, can handle multiple groups/attributes.

### **Example:** Suppose we have three different groups (Diet A, Diet B, Diet C) with the following weight loss data.
### Diet A = [5,7,6,8,7]
### Diet B = [6,9,7,10,9]
### Diet C = [4,5,6,5,6]
## **Goal:** The goal is to determine if there is a significant difference in mean weight loss between the three diets.

## **Hypothesis:**
## **Null Hypothesis (H0)**: All group means are equal (no significant difference).
## **Alternative Hypothesis (H1)**: At least one group mean is different.

In [22]:
Diet_A = np.array([5,7,6,8,7])
Diet_B = np.array([6,9,7,10,9])
Diet_C = np.array([4,5,6,5,6])
meanA = np.mean(Diet_A)
meanB = np.mean(Diet_B)
meanC = np.mean(Diet_C)


In [23]:
import numpy as np
from scipy.stats import f
Diet_A = np.array([5, 7, 6, 8, 7])
Diet_B = np.array([6, 9, 7, 10, 9])
Diet_C = np.array([4, 5, 6, 5, 6])
overall_mean = np.mean(np.concatenate((Diet_A, Diet_B, Diet_C)))
meanA = np.mean(Diet_A)
meanB = np.mean(Diet_B)
meanC = np.mean(Diet_C)
SSB = len(Diet_A) * (meanA - overall_mean)**2 + len(Diet_B) * (meanB - overall_mean)**2 + len(Diet_C) * (meanC - overall_mean)**2
all_data = np.concatenate((Diet_A, Diet_B, Diet_C))
SST = np.sum((all_data - overall_mean)**2)
SSW = SST - SSB
df_between = 3 - 1  # Number of groups - 1
df_within = len(all_data) - 3  # Total sample size - Number of groups
MSB = SSB / df_between
MSW = SSW / df_within
F_statistic = MSB / MSW
alpha = 0.05
F_critical = f.ppf(1 - alpha, df_between, df_within)
print(f"SST (Total Sum of Squares): {SST}")
print(f"SSB (Sum of Squares Between): {SSB}")
print(f"SSW (Sum of Squares Within): {SSW}")
print(f"Degrees of Freedom (Between): {df_between}")
print(f"Degrees of Freedom (Within): {df_within}")
print(f"Mean Square Between (MSB): {MSB}")
print(f"Mean Square Within (MSW): {MSW}")
print(f"F-statistic: {F_statistic}")
print(f"F-critical (alpha = {alpha}): {F_critical}")

if F_statistic > F_critical:
    print(f"Since F-statistic ({F_statistic}) > F-critical ({F_critical}), we reject the null hypothesis.")
    print("This suggests there is a significant difference between the means of the diets.")
else:
    print(f"Since F-statistic ({F_statistic}) <= F-critical ({F_critical}), we fail to reject the null hypothesis.")
    print("This suggests there is no significant difference between the means of the diets.")


SST (Total Sum of Squares): 41.33333333333333
SSB (Sum of Squares Between): 22.533333333333317
SSW (Sum of Squares Within): 18.80000000000001
Degrees of Freedom (Between): 2
Degrees of Freedom (Within): 12
Mean Square Between (MSB): 11.266666666666659
Mean Square Within (MSW): 1.5666666666666675
F-statistic: 7.191489361702119
F-critical (alpha = 0.05): 3.8852938346523946
Since F-statistic (7.191489361702119) > F-critical (3.8852938346523946), we reject the null hypothesis.
This suggests there is a significant difference between the means of the diets.
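
The sums of squares computed by hand above can be cross-checked with `scipy.stats.f_oneway`, which performs a one-way ANOVA and also returns a p-value. This is a sketch on the same three diets; the names `f_stat` and `p_value_anova` are introduced only for this example.

```python
import numpy as np
from scipy.stats import f_oneway

Diet_A = np.array([5, 7, 6, 8, 7])
Diet_B = np.array([6, 9, 7, 10, 9])
Diet_C = np.array([4, 5, 6, 5, 6])

# One-way ANOVA: H0 is that all three group means are equal.
f_stat, p_value_anova = f_oneway(Diet_A, Diet_B, Diet_C)

print(f"F = {f_stat:.4f}, p = {p_value_anova:.4f}")
```

A significant F-test only says that at least one mean differs; it does not identify which diets differ from each other, which would need a follow-up (post hoc) comparison.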


# A/B Testing

### 1. A/B Testing is also known as split testing.
### 2. A/B Testing is a method of testing and comparing two versions of a variable/attribute.
### 3. The variable/attribute can be a web page, an advertisement, or a product feature.
### 4. Through A/B Testing, we can determine which version performs better.
### 5. A/B Testing is used in data-driven decision-making to optimize user experience and drive business results.

## Q) How does A/B Testing work?
## **Example:** "Changing the color of the signup button to green will increase conversions."
### **Split the audience:** Randomly divide your audience into two groups.
### **Group A** (the control group) sees the original version.
### **Group B** (the test group) sees the modified version.
### **Measure the performance:** Define a metric (conversion rate, click-through rate, time spent, etc.) to track the performance of each group.
### **Analyze results:** After gathering sufficient data, compare the metrics of the two groups to determine if there is a statistically significant difference. Based on the difference, select the version to use.
### Let's work on the conversion rate of a website.
### **Conversion rate:** A customer conversion rate is the percentage of potential customers who take a desired action on a website or landing page.
### **Formula:** conversion rate = (conversions/total audience)*100

In [24]:
import numpy as np
from statsmodels.stats.proportion import proportions_ztest
np.random.seed(0)  # For reproducibility
n_users = 1000
conversion_rate_a = 0.10  # 10% conversion
conversions_a = np.random.binomial(1, conversion_rate_a, n_users)
conversion_rate_b = 0.15  # 15% conversion
conversions_b = np.random.binomial(1, conversion_rate_b, n_users)
cr_a = conversions_a.mean()
cr_b = conversions_b.mean()

print(f"Conversion Rate of Group A (Control): {cr_a:.2%}")
print(f"Conversion Rate of Group B (Test): {cr_b:.2%}")




Conversion Rate of Group A (Control): 10.80%
Conversion Rate of Group B (Test): 16.80%


#### We use np.random.binomial to simulate 1000 users per group, each converting with a fixed probability (e.g. conversion_rate_a = 0.10). The function returns a NumPy array of 0s and 1s, where each 1 represents a conversion and each 0 represents no conversion.


In [25]:
conversions_a_sum = conversions_a.sum()
conversions_b_sum = conversions_b.sum()

# Perform the Z-test
stat, p_value = proportions_ztest([conversions_a_sum, conversions_b_sum], [n_users, n_users])

print(f"Z-test statistic: {stat:.2f}")
print(f"P-value: {p_value:.4f}")

# Interpretation
alpha = 0.05
if p_value < alpha:
    print("There is a statistically significant difference in conversion rates. The change is effective.")
else:
    print("There is no statistically significant difference in conversion rates.")

Z-test statistic: -3.89
P-value: 0.0001
There is a statistically significant difference in conversion rates. The change is effective.
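
To show what `proportions_ztest` computes, here is a minimal sketch of the pooled two-proportion z-test written out by hand. The counts 108 and 168 are taken from the conversion rates printed above (10.80% and 16.80% of 1000 users); the variable names are introduced only for this example.

```python
import math
from scipy.stats import norm

# Conversions and group sizes, taken from the rates printed above.
conv_a, n_a = 108, 1000   # control
conv_b, n_b = 168, 1000   # test

rate_a = conv_a / n_a
rate_b = conv_b / n_b

# Pooled conversion rate under H0 (both groups share one true rate).
pooled_rate = (conv_a + conv_b) / (n_a + n_b)

# Standard error of the difference in proportions under H0.
std_err = math.sqrt(pooled_rate * (1 - pooled_rate) * (1 / n_a + 1 / n_b))

z_manual = (rate_a - rate_b) / std_err
p_manual_ab = 2 * norm.sf(abs(z_manual))   # two-sided p-value

print(f"z = {z_manual:.2f}, p = {p_manual_ab:.4f}")
```

The z-statistic is negative simply because group A is listed first and has the lower rate. The two-sided test establishes that the rates differ, and the direction (B higher than A) is read off the observed rates.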


# Errors in Hypothesis Testing
### Errors in hypothesis testing are mistakes that can occur when we draw conclusions from statistical tests.
### There are two basic types of errors that can occur. They are:
### 1. Type I Error
### 2. Type II Error

## **Type I Error**
### 1. A Type I error occurs when we reject the null hypothesis when it is actually true.
### 2. It is also called a False Positive.
### 3. Let's go through an example for better understanding.
### **Example:** Imagine a medical test for diabetes.
### Null hypothesis **(𝐻0)** : patient does not have the disease.
### Alternative hypothesis **(𝐻1)** : patient does have the disease.

#### . If the test result shows that the patient has diabetes (we reject H0).
#### . But, in reality, the patient is healthy (the null hypothesis is true).
#### . We've made a Type I error.
#### . In practice, the probability of making a Type I error is called the significance level, denoted by α, often set to 0.05.
#### . This means we are willing to accept a 5% chance of incorrectly rejecting the null hypothesis.


## **Type II Error**
### 1. A Type II error occurs when we fail to reject the null hypothesis when it is actually false.
### 2. This is like a "missed detection" or "False Negative."
### **Example:** Take the same example as above.
#### . If the test result shows that the patient is healthy (we fail to reject H0).
#### . But, in reality, the patient has the disease (the null hypothesis is false).
#### . We've made a Type II error.
#### . The probability of making a Type II error is denoted by β. The power of the test, 1 - β, is the probability of correctly rejecting a false null hypothesis.
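
Both error rates can be estimated by simulation. The sketch below reuses the light-bulb setting from the Z-test section (mean 1000, standard deviation 50, sample size 40) and assumes normally distributed lifespans. The number of simulations, the seed and the helper `rejection_rate` are arbitrary choices made for this illustration.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)      # fixed seed for reproducibility
n_sims, n = 5000, 40                # simulated experiments, bulbs per experiment
mu0, sigma, alpha = 1000, 50, 0.05  # H0 mean, known std, significance level
z_crit = norm.ppf(1 - alpha)        # one-sided critical value

def rejection_rate(true_mean):
    """Fraction of simulated one-sided z-tests that reject H0: mean = mu0."""
    samples = rng.normal(true_mean, sigma, size=(n_sims, n))
    z = (samples.mean(axis=1) - mu0) / (sigma / np.sqrt(n))
    return np.mean(z > z_crit)

# H0 true: every rejection is a Type I error, so the rate should be near alpha.
type1_rate = rejection_rate(mu0)

# H0 false (true mean 1015): every non-rejection is a Type II error.
power = rejection_rate(1015)
type2_rate = 1 - power

print(f"Estimated Type I error rate : {type1_rate:.3f}")
print(f"Estimated Type II error rate: {type2_rate:.3f}")
print(f"Estimated power             : {power:.3f}")
```

For a fixed sample size, lowering α makes Type I errors rarer but raises β; collecting a larger sample is the usual way to reduce β without loosening α.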

## References : https://www.geeksforgeeks.org/understanding-hypothesis-testing/