In [1]:
# Hypothesis Testing and AB Testing with Python for data science

# Contents:
# 1. Hypothesis Testing
# 2. AB Testing
# 3. AB Testing with Python


In [2]:
# 1. Hypothesis Testing
# Hypothesis testing is a statistical method for making decisions from experimental data.
# It asks whether a sample provides enough evidence to conclude that a statement about a population parameter is likely to be true.
# This makes it a core tool of inferential statistics: it determines what the data say about the population from which the sample was drawn.

# Hypothesis Testing Steps:
# 1. State the Hypotheses
# 2. Formulate an Analysis Plan
# 3. Analyze Sample Data
# 4. Interpret the Results

# Null Hypothesis (H0): The null hypothesis is a statement that there is no effect or no difference. It is the default assumption, retained unless the data provide sufficient evidence against it.
# Alternative Hypothesis (H1): The alternative hypothesis is a statement that there is an effect or difference.

# Types of Hypothesis Testing:
# 1. One Sample T-Test
# 2. Two Sample T-Test
# 3. Paired T-Test
# 4. Z-Test
# 5. Chi-Square Test
# 6. ANOVA Test
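
As a rough orientation, the test types listed above map onto functions in `scipy.stats` as sketched below. The arrays `a`, `b`, `c`, the known standard deviation of 5 used for the z-test, and the 2x2 count table are made-up illustrative inputs, not data from this notebook. The z-test is computed by hand from the normal distribution rather than with a dedicated function.

```python
import numpy as np
from scipy import stats

# Synthetic illustrative samples (assumption: not real measurements)
rng = np.random.default_rng(0)  # fixed seed so the sketch is repeatable
a = rng.normal(100, 5, size=30)
b = rng.normal(102, 5, size=30)
c = rng.normal(101, 5, size=30)

# 1. One-sample t-test: is the mean of a different from 100?
one_sample = stats.ttest_1samp(a, popmean=100)

# 2. Two-sample t-test: do a and b have different means?
two_sample = stats.ttest_ind(a, b)

# 3. Paired t-test: treats a[i] and b[i] as measurements on the same unit
paired = stats.ttest_rel(a, b)

# 4. Z-test, computed by hand assuming the population std is known to be 5
z = (a.mean() - 100) / (5 / np.sqrt(len(a)))
z_p = 2 * stats.norm.sf(abs(z))  # two-sided p-value

# 5. Chi-square test of independence on a 2x2 table of made-up counts
chi2, chi2_p, dof, expected = stats.chi2_contingency([[30, 20], [25, 25]])

# 6. One-way ANOVA across three groups
anova = stats.f_oneway(a, b, c)

print("One-sample t p-value:", one_sample.pvalue)
print("Two-sample t p-value:", two_sample.pvalue)
print("Paired t p-value:", paired.pvalue)
print("Z-test p-value:", z_p)
print("Chi-square p-value:", chi2_p)
print("ANOVA p-value:", anova.pvalue)
```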




In [3]:
# 1. One Sample T-Test
# A one sample t-test is used to determine whether the mean of a population is significantly different from a specific value.
# The test is used when the population variance is unknown.
# The null hypothesis is that the population mean is equal to the specific value.

# Example:
# A company produces a product and claims that the mean weight of the product is 100 grams.
# A sample of 10 products is selected and the weights are recorded.
# The sample mean and sample standard deviation are computed from the recorded weights below.
# Is there enough evidence to reject the company's claim?

import numpy as np
from scipy import stats

# Sample data
data = [98, 95, 100, 100, 102, 96, 97, 98, 99, 100]

# Population mean
pop_mean = 100

# Calculate the sample mean
sample_mean = np.mean(data)

# Calculate the sample standard deviation (ddof=1 divides by n-1, as the t-test requires)
sample_std = np.std(data, ddof=1)

# Calculate the t-statistic
t_statistic = (sample_mean - pop_mean) / (sample_std / np.sqrt(len(data)))

# Calculate the two-sided p-value, since H1 is that the mean differs from 100 in either direction
p_value = 2 * stats.t.sf(np.abs(t_statistic), len(data)-1)

print("Sample Mean:", sample_mean)
print("Sample Standard Deviation:", round(sample_std, 4))
print("T-Statistic:", round(t_statistic, 4))
print("P-Value:", round(p_value, 4))

# Interpret the results
alpha = 0.05
if p_value < alpha:
    print("Reject the null hypothesis")
else:
    print("Fail to reject the null hypothesis")

# The p-value is slightly greater than the significance level of 0.05, so we fail to reject the null hypothesis.

Sample Mean: 98.5
Sample Standard Deviation: 2.1213
T-Statistic: -2.2361
P-Value: 0.0522
Fail to reject the null hypothesis
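
The manual calculation can be cross-checked against `scipy.stats.ttest_1samp`, which uses the n-1 (sample) standard deviation and reports a two-sided p-value by default. This is a minimal sketch that repeats the `data` list from the example so that it runs on its own.

```python
from scipy import stats

# Weights from the one-sample example
data = [98, 95, 100, 100, 102, 96, 97, 98, 99, 100]

# Test H0: population mean == 100 against the two-sided alternative
result = stats.ttest_1samp(data, popmean=100)

print("T-Statistic:", result.statistic)
print("P-Value:", result.pvalue)
```

If only one direction matters (for example, that the product is underweight), a one-sided test is the appropriate choice, and that decision should be made before looking at the data.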


In [4]:
# 2. Two Sample T-Test
# A two sample t-test is used to determine whether the means of two independent samples are significantly different from each other.
# The test is used when the population variances are unknown.
# The null hypothesis is that the means of the two populations are equal.

# Example:
# A company produces two different products and claims that the mean weight of product A is equal to the mean weight of product B.
# Two samples of 10 products each are selected and the weights are recorded.
# The sample mean and standard deviation of each product are computed from the recorded weights below.
# Is there enough evidence to reject the company's claim?

# Sample data
data_A = [98, 95, 100, 100, 102, 96, 97, 98, 99, 100]
data_B = [102, 98, 105, 100, 104, 101, 99, 102, 103, 100]

# Calculate the sample mean and standard deviation for product A (ddof=1 divides by n-1)
sample_mean_A = np.mean(data_A)
sample_std_A = np.std(data_A, ddof=1)

# Calculate the sample mean and standard deviation for product B (ddof=1 divides by n-1)
sample_mean_B = np.mean(data_B)
sample_std_B = np.std(data_B, ddof=1)

# Calculate the t-statistic
t_statistic = (sample_mean_A - sample_mean_B) / np.sqrt((sample_std_A**2/len(data_A)) + (sample_std_B**2/len(data_B)))

# Calculate the two-sided p-value, since H1 is that the means differ in either direction
# With equal sample sizes this statistic matches the pooled-variance t-test, so df = n_A + n_B - 2
p_value = 2 * stats.t.sf(np.abs(t_statistic), len(data_A)+len(data_B)-2)

print("Sample Mean A:", sample_mean_A)
print("Sample Standard Deviation A:", round(sample_std_A, 4))
print("Sample Mean B:", sample_mean_B)
print("Sample Standard Deviation B:", round(sample_std_B, 4))
print("T-Statistic:", round(t_statistic, 4))
print("P-Value:", round(p_value, 4))

# Interpret the results
alpha = 0.05
if p_value < alpha:
    print("Reject the null hypothesis")
else:
    print("Fail to reject the null hypothesis")

# The p-value is less than the significance level of 0.05, so we reject the null hypothesis.

Sample Mean A: 98.5
Sample Standard Deviation A: 2.1213
Sample Mean B: 101.4
Sample Standard Deviation B: 2.2211
T-Statistic: -2.9858
P-Value: 0.0079
Reject the null hypothesis
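
The two-sample calculation can be cross-checked against `scipy.stats.ttest_ind`, which reports a two-sided p-value by default. The sketch below repeats the two data lists so that it runs on its own, and shows both the pooled-variance test (`equal_var=True`, the default) and Welch's test (`equal_var=False`), which does not assume equal population variances.

```python
from scipy import stats

# Weights from the two-sample example
data_A = [98, 95, 100, 100, 102, 96, 97, 98, 99, 100]
data_B = [102, 98, 105, 100, 104, 101, 99, 102, 103, 100]

# Pooled-variance (Student) t-test: assumes equal population variances
pooled = stats.ttest_ind(data_A, data_B)

# Welch's t-test: drops the equal-variance assumption
welch = stats.ttest_ind(data_A, data_B, equal_var=False)

print("Pooled T-Statistic:", pooled.statistic, "P-Value:", pooled.pvalue)
print("Welch  T-Statistic:", welch.statistic, "P-Value:", welch.pvalue)
```

Because both samples have the same size, the two statistics coincide here and only the degrees of freedom differ. With unequal sample sizes or clearly different variances, Welch's test is generally the safer default.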
