# Hypothesis Testing

## 1. Introduction to Hypothesis Testing
Hypothesis testing is a fundamental statistical method used in data science to make inferences or draw conclusions about a population based on sample data. It helps us determine whether an observed effect in a dataset is statistically significant or if it could have occurred by random chance.

In data science, hypothesis testing is used in a variety of applications, including A/B testing, regression analysis, and machine learning model evaluation.




## 2. Key Concepts in Hypothesis Testing
Null Hypothesis (H₀): The null hypothesis is a statement that there is no effect or no difference. It serves as the default or starting assumption. For example, in an A/B test, the null hypothesis might state that two groups have the same mean performance.

Alternative Hypothesis (H₁): The alternative hypothesis is the claim that competes with the null hypothesis: that an effect or difference exists. It can be two-sided (the groups differ) or one-sided (for example, group A has a higher mean than group B).

Test Statistic: A test statistic is a standardized value calculated from sample data, which is used to assess the strength of the evidence against the null hypothesis. Common test statistics include t-scores and z-scores.

P-value: The p-value is the probability of obtaining a test statistic at least as extreme as the one observed, assuming the null hypothesis is true. A small p-value (conventionally below 0.05) means the observed data would be unlikely if the null hypothesis were true, and is taken as evidence against it. It is not the probability that the null hypothesis is true.

Significance Level (α): The significance level (often denoted as α) is a threshold chosen by the analyst to decide whether to reject the null hypothesis. Commonly, α is set to 0.05, meaning there is a 5% chance of rejecting the null hypothesis when it is actually true (Type I error).

Type I and Type II Errors:

- Type I Error: Incorrectly rejecting the null hypothesis when it is true (false positive).
- Type II Error: Failing to reject the null hypothesis when it is false (false negative).

Power of the Test: The power of a test is the probability of correctly rejecting the null hypothesis when it is false. A high power means there is a lower risk of a Type II error.
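
The error rates and power defined above can be estimated by simulation. The sketch below is illustrative only: the sample size, the effect size of 0.5, the number of trials, and the helper name `rejection_rate` are assumptions chosen for the demonstration, not values from this document.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)  # fixed seed so the run is repeatable
alpha = 0.05
n_trials, n = 2000, 30          # assumed simulation size and sample size


def rejection_rate(true_mean):
    """Fraction of simulated samples for which H0: mean = 0 is rejected."""
    samples = rng.normal(loc=true_mean, scale=1.0, size=(n_trials, n))
    # One two-sided t-test per row (axis=1 tests each simulated sample)
    _, p_values = stats.ttest_1samp(samples, popmean=0.0, axis=1)
    return np.mean(p_values <= alpha)


# H0 is true here, so every rejection is a Type I error
type_i_rate = rejection_rate(true_mean=0.0)
# H0 is false here, so the rejection rate estimates the power
power = rejection_rate(true_mean=0.5)

print(f"Estimated Type I error rate: {type_i_rate:.3f}")
print(f"Estimated power at true mean 0.5: {power:.3f}")
```

The first estimate should land close to α, and one minus the second estimate approximates the Type II error rate for that particular effect size.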




## 3. Steps in Hypothesis Testing
1. State the Hypotheses: Formulate the null hypothesis (H₀) and the alternative hypothesis (H₁).

   Example:

   - H₀: There is no difference between the mean performance of Group A and Group B.
   - H₁: The mean performance of Group A is greater than that of Group B.

2. Choose a Significance Level (α): Select a significance level (commonly 0.05) to control for the probability of making a Type I error.

3. Collect Data: Gather the data from your sample or experiment.

4. Select and Compute a Test Statistic: Depending on the type of data and the hypotheses, choose an appropriate statistical test (e.g., t-test, z-test, chi-square test) and compute the test statistic.

5. Calculate the P-value: Use the test statistic to calculate the p-value. This can be done manually, using statistical tables, or using software (e.g., Python, R, or statistical tools like SPSS).

6. Compare P-value to α: If the p-value is less than or equal to the significance level α, reject the null hypothesis; otherwise, fail to reject it.

7. Make a Conclusion: Based on the comparison, draw a conclusion and interpret the results in the context of your problem.
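
The procedure above can be sketched end to end for the Group A versus Group B example. The scores below are invented for illustration, and the choice of Welch's t-test (`equal_var=False`) is an assumption rather than something the text prescribes. The `alternative` argument requires a reasonably recent SciPy (1.6 or later).

```python
from scipy import stats

# Hypothetical scores, invented purely for illustration
group_a = [85, 88, 90, 92, 87, 91, 89, 93]
group_b = [80, 83, 85, 84, 82, 86, 81, 85]

# Choose a significance level
alpha = 0.05

# Compute the test statistic and p-value.
# H0: mean(A) == mean(B); H1: mean(A) > mean(B), hence alternative="greater".
t_stat, p_value = stats.ttest_ind(
    group_a, group_b, equal_var=False, alternative="greater"
)

# Compare the p-value to alpha and state the conclusion
reject_h0 = p_value <= alpha
print(f"t = {t_stat:.3f}, p = {p_value:.5f}")
print("Reject H0" if reject_h0 else "Fail to reject H0")
```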




## 4. Common Hypothesis Tests in Data Science
Z-test: Used when the sample size is large (n > 30) and the population standard deviation is known. It compares a sample mean to a hypothesized population mean, or the means of two samples to each other.

Example: Testing whether a new drug reduces blood pressure more effectively than an old drug.

T-test: Used when the population standard deviation is unknown and has to be estimated from the sample, which matters most when the sample is small (n ≤ 30). There are two common types:

- One-sample t-test: Tests if the sample mean is different from a known value.
- Two-sample t-test: Tests if the means of two independent samples are different.

Chi-Square Test: Used to test relationships between categorical variables. It is commonly used in contingency tables to assess whether observed frequencies differ from expected frequencies.

Example: Testing if two marketing campaigns result in different customer purchase rates.

ANOVA (Analysis of Variance): Used to compare means among three or more groups to see if at least one group mean is significantly different from the others.

Example: Testing the effectiveness of different advertising methods on sales.

A/B Testing: This is a special case of hypothesis testing widely used in web analytics and product development. It involves comparing two versions (A and B) of a product or webpage to determine which performs better.
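
Of the tests listed above, the z-test is simple enough to compute directly from the normal distribution. The following is a minimal sketch of a one-sample, two-sided z-test; the hypothesized mean, known standard deviation, sample size, and sample mean are all made-up numbers for illustration.

```python
import math
from scipy import stats

# Hypothetical summary values, invented for illustration
mu_0 = 50.0         # population mean under H0
sigma = 8.0         # population standard deviation, assumed known
n = 64              # sample size
sample_mean = 52.3  # observed sample mean

# z = (sample mean - hypothesized mean) / standard error of the mean
standard_error = sigma / math.sqrt(n)
z_stat = (sample_mean - mu_0) / standard_error

# Two-sided p-value: probability of a |z| at least this large under H0
p_value = 2 * stats.norm.sf(abs(z_stat))

print(f"z = {z_stat:.3f}, p = {p_value:.4f}")
```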




## 5. Hypothesis Testing in Machine Learning
In machine learning, hypothesis testing plays a role in model evaluation. For example, when comparing two models (e.g., a decision tree and a random forest), you can use hypothesis testing to assess whether the difference in performance (e.g., accuracy, precision) is statistically significant.

A common scenario is using cross-validation to train and evaluate both models on the same folds and then performing a statistical test on the per-fold scores (e.g., a paired t-test) to determine whether one model consistently outperforms the other.
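
A paired comparison of this kind might look like the sketch below. The per-fold accuracies are hypothetical numbers standing in for real cross-validation output, and both models are assumed to have been scored on the same five folds. Because cross-validation folds share training data, the scores are not fully independent, so the resulting p-value is best read as a rough guide.

```python
from scipy import stats

# Hypothetical per-fold accuracies (same 5 folds for both models)
tree_scores = [0.81, 0.79, 0.84, 0.80, 0.82]
forest_scores = [0.85, 0.83, 0.86, 0.84, 0.85]

# Paired t-test on the fold-by-fold differences.
# H0: the mean difference in accuracy between the two models is zero.
t_stat, p_value = stats.ttest_rel(forest_scores, tree_scores)

print(f"t = {t_stat:.3f}, p = {p_value:.4f}")
```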




## 6. Practical Example: A/B Testing with Hypothesis Testing
Imagine a scenario where an e-commerce company wants to test whether a new website design (B) results in more purchases than the current design (A). The company conducts an A/B test and collects data on the number of purchases made by users visiting each version of the site.

Steps:

1. Null Hypothesis (H₀): The new design does not affect the purchase rate (i.e., no difference between A and B).
2. Alternative Hypothesis (H₁): The new design increases the purchase rate.
3. Significance Level: Set α = 0.05.
4. Collect Data: Suppose you collect the following data:
   - Version A: 500 visitors, 50 purchases
   - Version B: 600 visitors, 80 purchases
5. Perform Test: Use a z-test for proportions to compare the purchase rates between the two groups.
6. Conclusion: Based on the p-value, either reject or fail to reject the null hypothesis.
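
The test for this example can be carried out with the visitor and purchase counts given above. The sketch below computes a two-proportion z-test by hand using a pooled standard error, which is one common formulation; the variable names are illustrative.

```python
import math
from scipy import stats

# Data from the example
visitors_a, purchases_a = 500, 50
visitors_b, purchases_b = 600, 80
alpha = 0.05

rate_a = purchases_a / visitors_a
rate_b = purchases_b / visitors_b

# Under H0 both versions share one purchase rate, so pool the counts
pooled_rate = (purchases_a + purchases_b) / (visitors_a + visitors_b)
standard_error = math.sqrt(
    pooled_rate * (1 - pooled_rate) * (1 / visitors_a + 1 / visitors_b)
)

z_stat = (rate_b - rate_a) / standard_error

# One-sided p-value because H1 says B has the higher rate
p_value = stats.norm.sf(z_stat)

print(f"Rate A = {rate_a:.3f}, Rate B = {rate_b:.3f}")
print(f"z = {z_stat:.3f}, one-sided p = {p_value:.4f}")
print("Reject H0" if p_value <= alpha else "Fail to reject H0")
```

With these counts the one-sided p-value comes out at roughly 0.044, so H₀ is rejected at α = 0.05, though only narrowly. A two-sided test on the same data would double the p-value and fail to reject.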

## Example: one-sample t-test

In [3]:
# import packages 
import scipy.stats as stats 
import numpy as np
  
data = [182, 180, 190, 184]
  
# Perform a two-sided one-sample t-test of H0: the population mean is 183
t_statistic, p_value = stats.ttest_1samp(a=data, popmean=183)
print(t_statistic, p_value)

# data_average = [0, 1, 0, 1, 0, 1, 0, 1, 0, 1]
# t_statistic, p_value = stats.ttest_1samp(a=data_average, popmean=0.5) 
# print(t_statistic , p_value)

0.46291004988627577 0.6749411569291093
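
To see where these two numbers come from, the same test can be reproduced by hand. This is a sketch of the standard one-sample t formula, t = (mean − popmean) / (s / √n), with a two-sided p-value taken from the t distribution with n − 1 degrees of freedom; the `_manual` names are illustrative.

```python
import numpy as np
from scipy import stats

data = np.array([182, 180, 190, 184])
popmean = 183
n = data.size

# Standard error of the mean, using the sample standard deviation (ddof=1)
standard_error = data.std(ddof=1) / np.sqrt(n)
t_manual = (data.mean() - popmean) / standard_error

# Two-sided p-value from the t distribution with n - 1 degrees of freedom
p_manual = 2 * stats.t.sf(abs(t_manual), df=n - 1)

print(t_manual, p_manual)
```

Since the p-value is far above 0.05, the data give no reason to reject the hypothesis that the population mean is 183.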


## One-way ANOVA

In [7]:
# Sample data from three independent groups
group1 = [2.9, 3.0, 2.5, 2.6, 3.2, 2.8]
group2 = [3.8, 2.7, 4.0, 3.9, 3.2, 3.1]
group3 = [3.1, 3.4, 3.7, 3.0, 3.1, 3.6]

# Perform one-way ANOVA
f_statistic, p_value = stats.f_oneway(group1, group2, group3)

# Output the results
print(f"F-statistic: {f_statistic}")
print(f"P-value: {p_value}")

F-statistic: 4.434477379095165
P-value: 0.03068555150519455
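
The F-statistic above is the ratio of between-group to within-group variance. As a sketch, it can be recomputed from sums of squares on the same three groups; the `_manual` names and intermediate variables are illustrative.

```python
import numpy as np
from scipy import stats

groups = [
    np.array([2.9, 3.0, 2.5, 2.6, 3.2, 2.8]),
    np.array([3.8, 2.7, 4.0, 3.9, 3.2, 3.1]),
    np.array([3.1, 3.4, 3.7, 3.0, 3.1, 3.6]),
]

all_values = np.concatenate(groups)
grand_mean = all_values.mean()
k = len(groups)              # number of groups
n_total = all_values.size    # total number of observations

# Variation of the group means around the grand mean
ss_between = sum(g.size * (g.mean() - grand_mean) ** 2 for g in groups)
# Variation of observations around their own group mean
ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)

f_manual = (ss_between / (k - 1)) / (ss_within / (n_total - k))
p_manual = stats.f.sf(f_manual, k - 1, n_total - k)

print(f"F-statistic: {f_manual}")
print(f"P-value: {p_manual}")
```

At α = 0.05 this p-value leads to rejecting the hypothesis that all three group means are equal. ANOVA alone does not say which groups differ.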


## One-sample Chi-Square Test (Goodness of Fit)

In [4]:
# Observed frequencies (from your sample)
observed = np.array([30, 10, 5, 5, 5, 5])

# Under H0 every category is equally likely, so each expected count is total / 6
total = observed.sum()
expected = np.full(observed.size, total / observed.size)

# Perform the one-sample Chi-Square goodness of fit test
chi2_stat, p_value = stats.chisquare(f_obs=observed, f_exp=expected)

# Output the results
print(f"Chi-Square Statistic: {chi2_stat}")
print(f"P-value: {p_value}")

Chi-Square Statistic: 50.0
P-value: 1.3857973367009573e-09
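
The statistic reported above is the sum of (observed − expected)² / expected over the categories. A minimal sketch that recomputes it and its p-value by hand, using the same observed counts and a uniform expectation:

```python
import numpy as np
from scipy import stats

observed = np.array([30, 10, 5, 5, 5, 5])
# Uniform expectation: every category gets the same share of the total
expected = np.full(observed.size, observed.sum() / observed.size)

# Sum of squared deviations, each scaled by its expected count
chi2_manual = ((observed - expected) ** 2 / expected).sum()

# Degrees of freedom = number of categories - 1
p_manual = stats.chi2.sf(chi2_manual, df=observed.size - 1)

print(f"Chi-Square Statistic: {chi2_manual}")
print(f"P-value: {p_manual}")
```

Most of the statistic comes from the first category, where 30 observations were seen against an expected 10.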


## Chi-Square Test of Independence

In [6]:
# Sample data: A contingency table of observed frequencies
# For example, this could be data on customer preferences between two brands across age groups.
observed = np.array([[1, 2, 2, 5],
                     [1, 3, 2, 1]])

# Perform the Chi-Square Test of Independence
chi2_stat, p_value, dof, expected = stats.chi2_contingency(observed)

# Output the results
print(f"Chi-Square Statistic: {chi2_stat}")
print(f"P-value: {p_value}")
print(f"Degrees of Freedom: {dof}")
print(f"Expected Frequencies:\n{expected}")

Chi-Square Statistic: 2.412380952380953
P-value: 0.49133407367391724
Degrees of Freedom: 3
Expected Frequencies:
[[1.17647059 2.94117647 2.35294118 3.52941176]
 [0.82352941 2.05882353 1.64705882 2.47058824]]
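
The expected frequencies printed above follow from the row and column totals: each cell is (row total × column total) / grand total. The sketch below recomputes them and counts the cells with an expected frequency below 5. That threshold is a common rule of thumb rather than a strict requirement, and the variable names are illustrative.

```python
import numpy as np

observed = np.array([[1, 2, 2, 5],
                     [1, 3, 2, 1]])

row_totals = observed.sum(axis=1)
col_totals = observed.sum(axis=0)

# Expected count per cell if rows and columns were independent
expected = np.outer(row_totals, col_totals) / observed.sum()

# Degrees of freedom = (rows - 1) * (columns - 1)
dof = (observed.shape[0] - 1) * (observed.shape[1] - 1)

# Rule-of-thumb check: the chi-square approximation is usually
# considered reliable only when expected counts are about 5 or more
small_cells = int((expected < 5).sum())

print(expected)
print(f"Degrees of Freedom: {dof}")
print(f"Cells with expected count below 5: {small_cells} of {expected.size}")
```

Every expected count in this table is below 5, so the p-value reported above should be treated with caution. A larger sample would make the approximation more trustworthy.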
