# Hypothesis Testing in Python

### Introduction
Hypothesis testing is a fundamental concept in statistics used to make inferences about a population based on sample data. In this tutorial, we'll learn how to perform hypothesis testing in Python using the scipy.stats module.

### Table of Contents

1. Background

2. One-Sample t-test

3. Two-Sample t-test

4. Paired-Sample t-test

### Background
**What is Hypothesis Testing?**

Hypothesis testing is a statistical method used to make inferences about a population parameter based on sample data. It involves two competing hypotheses: the null hypothesis (H0) and the alternative hypothesis (H1). The goal is to determine whether the observed data provide enough evidence to reject the null hypothesis in favor of the alternative hypothesis.

**Key Concepts**
- **Null Hypothesis (H0)**: The default assumption that there is no significant difference or effect.
- **Alternative Hypothesis (H1)**: The hypothesis that contradicts the null hypothesis and suggests a significant difference or effect.
- **Test Statistic**: A numerical summary of the sample data used to assess the likelihood of observing the data under the null hypothesis.
- **P-value**: The probability of observing a test statistic at least as extreme as the one computed from the sample, assuming the null hypothesis is true. A small p-value indicates strong evidence against the null hypothesis.
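
To make the link between the test statistic and the p-value concrete, the sketch below converts a t-statistic into a two-sided p-value using the t distribution and compares it with a significance level. The values `t_stat = 2.0`, `df = 9`, and `alpha = 0.05` are illustrative assumptions, not results from the data used later in this tutorial.

```python
from scipy import stats

# Hypothetical values chosen for illustration only
t_stat = 2.0   # observed test statistic
df = 9         # degrees of freedom (sample size minus 1 for a one-sample test)
alpha = 0.05   # chosen significance level

# Two-sided p-value: probability, under H0, of a statistic at least as
# extreme as |t_stat| in either tail of the t distribution
p_two_sided = 2 * stats.t.sf(abs(t_stat), df)

# Decision rule: reject H0 only when the p-value falls below alpha
reject = p_two_sided < alpha

print("two-sided p-value:", p_two_sided)
print("reject H0:", reject)
```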

### One-Sample t-test
The one-sample t-test is used to compare the mean of a single sample to a known population mean or a hypothesized value.

Suppose we have a sample of exam scores and want to test whether the average score is significantly different from 75.

Null Hypothesis (H0): The average exam score is equal to 75.

Alternative Hypothesis (H1): The average exam score is not equal to 75.

In [1]:
import numpy as np
from scipy import stats

# Sample data
scores = np.array([70, 85, 78, 90, 82, 88, 75, 80, 72, 79])

# Hypothesized mean
hypothesized_mean = 75

# Perform one-sample t-test
t_statistic, p_value = stats.ttest_1samp(scores, hypothesized_mean)

# Print results
print("t-statistic:", t_statistic)
print("p-value:", p_value)

t-statistic: 2.363295183657119
p-value: 0.042371967997997174


Because the p-value is less than the chosen significance level (e.g., 0.05), we reject the null hypothesis and conclude that there is sufficient evidence to suggest that the average exam score is significantly different from 75.
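
As a check on what `ttest_1samp` computes, the following sketch reproduces the statistic by hand from the usual formula, t = (sample mean - hypothesized mean) / (s / sqrt(n)), with n - 1 degrees of freedom. The names `t_manual` and `p_manual` are introduced here only for illustration.

```python
import numpy as np
from scipy import stats

scores = np.array([70, 85, 78, 90, 82, 88, 75, 80, 72, 79])
hypothesized_mean = 75

n = scores.size
sample_mean = scores.mean()
sample_std = scores.std(ddof=1)            # sample standard deviation (n - 1 denominator)
standard_error = sample_std / np.sqrt(n)   # standard error of the mean

# t-statistic: distance of the sample mean from the hypothesized mean,
# measured in standard errors
t_manual = (sample_mean - hypothesized_mean) / standard_error

# Two-sided p-value from the t distribution with n - 1 degrees of freedom
p_manual = 2 * stats.t.sf(abs(t_manual), df=n - 1)

print("manual t-statistic:", t_manual)
print("manual p-value:", p_manual)
```

Both values should agree with the output of `stats.ttest_1samp` above up to floating-point rounding.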

To test whether the average score is significantly greater than 75, we need a one-tailed test.

Null Hypothesis (H0): The average exam score is equal to 75.

Alternative Hypothesis (H1): The average exam score is greater than 75.

In [2]:
# Perform one-sample one-tailed t-test (greater than)
t_statistic, p_value = stats.ttest_1samp(scores, hypothesized_mean, alternative='greater')

# Print results
print("t-statistic:", t_statistic)
print("p-value:", p_value)

t-statistic: 2.363295183657119
p-value: 0.021185983998998587


Because the p-value is less than the chosen significance level (e.g., 0.05), we reject the null hypothesis and conclude that there is sufficient evidence to suggest that the average exam score is significantly greater than 75.
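
The one-tailed p-value above is exactly half of the two-tailed one. That relationship holds because the t-statistic is positive, meaning the sample mean lies on the side predicted by H1. A small sketch, reusing the same data, compares all three settings of the `alternative` argument:

```python
import numpy as np
from scipy import stats

scores = np.array([70, 85, 78, 90, 82, 88, 75, 80, 72, 79])

two_sided = stats.ttest_1samp(scores, 75)
greater = stats.ttest_1samp(scores, 75, alternative='greater')
less = stats.ttest_1samp(scores, 75, alternative='less')

# With a positive t-statistic, 'greater' takes half of the two-sided
# p-value and 'less' takes the remaining probability mass
greater_is_half = np.isclose(greater.pvalue, two_sided.pvalue / 2)
less_is_complement = np.isclose(less.pvalue, 1 - two_sided.pvalue / 2)

print("greater == two-sided / 2:", greater_is_half)
print("less == 1 - two-sided / 2:", less_is_complement)
```

If the t-statistic were negative, the two relationships would swap, which is why passing `alternative` directly is less error-prone than adjusting the two-sided p-value by hand.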

### Two-Sample t-test
The two-sample t-test is used to compare the means of two independent samples.

Suppose we have two groups of students (Group A and Group B) and want to test whether there is a significant difference in their exam scores.

Null Hypothesis (H0): The means of the two groups are equal.

Alternative Hypothesis (H1): The means of the two groups are not equal.

In [3]:
# Sample data
scores_group_a = np.array([70, 85, 78, 90, 82])
scores_group_b = np.array([72, 79, 75, 80, 88])

# Perform two-sample t-test
t_statistic, p_value = stats.ttest_ind(scores_group_a, scores_group_b)

# Print results
print("t-statistic:", t_statistic)
print("p-value:", p_value)

t-statistic: 0.5082037759680839
p-value: 0.6250241578486049


Because the p-value is not less than the chosen significance level (e.g., 0.05), we fail to reject the null hypothesis: the data do not provide sufficient evidence that the means of the two groups differ.
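
By default, `ttest_ind` assumes the two groups share a common variance (`equal_var=True`) and uses a pooled variance estimate. The sketch below reproduces that calculation by hand; the names `pooled_var`, `t_manual`, and `p_manual` are introduced here for illustration.

```python
import numpy as np
from scipy import stats

scores_group_a = np.array([70, 85, 78, 90, 82])
scores_group_b = np.array([72, 79, 75, 80, 88])

n_a, n_b = scores_group_a.size, scores_group_b.size
df = n_a + n_b - 2  # degrees of freedom for the pooled test

# Pooled variance: sample variances weighted by their degrees of freedom
pooled_var = (
    (n_a - 1) * scores_group_a.var(ddof=1)
    + (n_b - 1) * scores_group_b.var(ddof=1)
) / df

# Standard error of the difference between the two sample means
standard_error = np.sqrt(pooled_var * (1 / n_a + 1 / n_b))

t_manual = (scores_group_a.mean() - scores_group_b.mean()) / standard_error
p_manual = 2 * stats.t.sf(abs(t_manual), df=df)

print("manual t-statistic:", t_manual)
print("manual p-value:", p_manual)
```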

To test whether the mean exam score of Group A is significantly greater than that of Group B, we need a one-tailed test.

Null Hypothesis (H0): The mean of Group A is equal to the mean of Group B.

Alternative Hypothesis (H1): The mean of Group A is greater than the mean of Group B.

In [4]:
import numpy as np
from scipy import stats

# Sample data for Group A and Group B
scores_group_a = np.array([70, 85, 78, 90, 82])
scores_group_b = np.array([72, 79, 75, 80, 88])

# Perform one-tailed two-sample t-test (Greater Than)
t_statistic, p_value = stats.ttest_ind(scores_group_a, scores_group_b, alternative='greater')

# Print results
print("t-statistic:", t_statistic)
print("p-value:", p_value)

t-statistic: 0.5082037759680839
p-value: 0.31251207892430244


Because the p-value is not less than the chosen significance level (e.g., 0.05), we fail to reject the null hypothesis: the data do not provide sufficient evidence that the mean of Group A is greater than the mean of Group B.

### Paired-Sample t-test


The paired sample t-test is a statistical test used to determine whether the mean difference between two related groups is significantly different from zero. It is specifically designed for situations where the same subjects or items are measured or observed under two different conditions, treatments, or time points.

Suppose you are conducting a study to evaluate the effectiveness of a new treatment for lowering blood pressure. You measure the blood pressure of the same group of individuals before and after receiving the treatment. You want to determine whether there is a significant difference in blood pressure before and after the treatment.

Null Hypothesis (H0): The mean blood pressure before treatment is equal to the mean blood pressure after treatment.

Alternative Hypothesis (H1): The mean blood pressure before treatment is different from the mean blood pressure after treatment.

In [5]:
import numpy as np
from scipy import stats

# Sample data for paired observations (blood pressure before and after treatment)
before = np.array([120, 130, 125, 140, 135])  # Blood pressure before treatment (mmHg)
after = np.array([110, 125, 115, 132, 124])   # Blood pressure after treatment (mmHg)

# Perform paired sample t-test
t_statistic, p_value = stats.ttest_rel(before, after)

# Print results
print("Paired Sample t-test Results:")
print("t-statistic:", t_statistic)
print("p-value:", p_value)

Paired Sample t-test Results:
t-statistic: 8.241955141918908
p-value: 0.0011818615271342995


Because the p-value is less than the chosen significance level (e.g., 0.05), we reject the null hypothesis and conclude that there is a significant difference in blood pressure before and after the treatment.
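
A paired t-test is equivalent to a one-sample t-test on the per-subject differences, tested against zero. The sketch below illustrates this with the same blood pressure data; the variable `differences` is introduced here for illustration.

```python
import numpy as np
from scipy import stats

before = np.array([120, 130, 125, 140, 135])
after = np.array([110, 125, 115, 132, 124])

# Per-subject change in blood pressure (mmHg)
differences = before - after

paired = stats.ttest_rel(before, after)
one_sample = stats.ttest_1samp(differences, 0)

# Both approaches test whether the mean difference is zero
same_t = np.isclose(paired.statistic, one_sample.statistic)
same_p = np.isclose(paired.pvalue, one_sample.pvalue)

print("mean difference:", differences.mean())
print("same t-statistic:", same_t)
print("same p-value:", same_p)
```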

To test if the mean blood pressure after treatment is significantly lower than the mean blood pressure before treatment, we need a one-tailed test.

Null Hypothesis (H0): The mean blood pressure before treatment is equal to the mean blood pressure after treatment.

Alternative Hypothesis (H1): The mean blood pressure after treatment is lower than the mean blood pressure before treatment.

In [6]:
import numpy as np
from scipy import stats

# Sample data for paired observations (blood pressure before and after treatment)
before = np.array([120, 130, 125, 140, 135])  # Blood pressure before treatment (mmHg)
after = np.array([110, 125, 115, 132, 124])   # Blood pressure after treatment (mmHg)

# Perform the paired t-test (two-sided); the one-tailed p-value is derived below
t_statistic, p_value = stats.ttest_rel(before, after)

# Print results
print("One-Tailed Paired Sample t-test Results (Lower Than):")
print("t-statistic:", t_statistic)
# Halving is valid only because t > 0, i.e. the observed difference (before - after) points in the direction of H1
print("p-value:", p_value/2)

One-Tailed Paired Sample t-test Results (Lower Than):
t-statistic: 8.241955141918908
p-value: 0.0005909307635671498


Because the p-value is less than the chosen significance level, we reject the null hypothesis and conclude that the mean blood pressure after treatment is lower than the mean blood pressure before treatment.
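
Halving the two-sided p-value by hand is only correct when the sign of the t-statistic matches the direction of H1. As an alternative sketch, `ttest_rel` accepts the same `alternative` argument used with `ttest_1samp` and `ttest_ind` earlier (this assumes a SciPy version that supports it for `ttest_rel`). Since the test works on `before - after`, "after is lower than before" corresponds to `alternative='greater'`.

```python
import numpy as np
from scipy import stats

before = np.array([120, 130, 125, 140, 135])
after = np.array([110, 125, 115, 132, 124])

# H1: mean(before - after) > 0, i.e. blood pressure is lower after treatment
result_lower = stats.ttest_rel(before, after, alternative='greater')

print("t-statistic:", result_lower.statistic)
print("p-value:", result_lower.pvalue)
```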

To test if the mean blood pressure after treatment is significantly higher than the mean blood pressure before treatment, we need another one-tailed test.

Null Hypothesis (H0): The mean blood pressure before treatment is equal to the mean blood pressure after treatment.

Alternative Hypothesis (H1): The mean blood pressure after treatment is higher than the mean blood pressure before treatment.

In [7]:
import numpy as np
from scipy import stats

# Sample data for paired observations (blood pressure before and after treatment)
before = np.array([120, 130, 125, 140, 135])  # Blood pressure before treatment (mmHg)
after = np.array([110, 125, 115, 132, 124])   # Blood pressure after treatment (mmHg)

# Perform the paired t-test (two-sided); the one-tailed p-value is derived below
t_statistic, p_value = stats.ttest_rel(before, after)

# Print results
print("One-Tailed Paired Sample t-test Results (Higher Than):")
print("t-statistic:", t_statistic)
# Because t > 0 points away from H1 (after > before), the one-tailed p-value is 1 - p_value/2
print("p-value:", 1 - p_value/2)

One-Tailed Paired Sample t-test Results (Higher Than):
t-statistic: 8.241955141918908
p-value: 0.9994090692364328


Because the p-value is not less than the chosen significance level (e.g., 0.05), we cannot reject the null hypothesis and cannot conclude that the mean blood pressure after treatment is higher than the mean blood pressure before treatment.
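
The same result can be sketched without manual arithmetic by passing `alternative='less'` to `ttest_rel` (again assuming a SciPy version that supports this argument). "After is higher than before" means the mean of `before - after` is negative, which is the `'less'` alternative.

```python
import numpy as np
from scipy import stats

before = np.array([120, 130, 125, 140, 135])
after = np.array([110, 125, 115, 132, 124])
alpha = 0.05  # chosen significance level

# H1: mean(before - after) < 0, i.e. blood pressure is higher after treatment
result_higher = stats.ttest_rel(before, after, alternative='less')
reject = result_higher.pvalue < alpha

print("t-statistic:", result_higher.statistic)
print("p-value:", result_higher.pvalue)
print("reject H0:", reject)
```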