# Statistical tests

Statistical tests are procedures for drawing inferences about a population from sample data. They help determine whether an observed difference or relationship is statistically significant or could plausibly have arisen by chance. Each test suits particular types of data and research questions. Here is an overview of some common statistical tests:

## T-test
![T-test.webp](attachment:T-test.webp)

Compares means to determine whether a difference is statistically significant: between a sample and a reference value, between two independent groups, or between paired measurements.

**One-sample T-test**: Checks if the mean of a single sample differs significantly from a known or hypothesized population mean.

**Two-sample T-test**: Compares the means of two independent groups to assess if they are significantly different.

**Paired T-test**: Determines whether there is a significant difference between paired observations (e.g., before and after treatment in the same group).

**Assumptions**: The data are approximately normally distributed (for the paired T-test, this applies to the paired differences). The two-sample T-test, in its standard form, also assumes that the two groups have equal variances.

#### One-sample T-test

In [2]:
from scipy import stats
import numpy as np

# Sample data (no random seed is set, so the output below changes on every run)
data = np.random.normal(loc=5, scale=2, size=100)

# One-sample t-test
t_stat, p_value = stats.ttest_1samp(data, 4)  # Null hypothesis: Population mean is 4

print("One-sample T-test:")
print(f"T-statistic: {t_stat}")
print(f"P-value: {p_value}")

One-sample T-test:
T-statistic: 7.784740393283619
P-value: 6.9191872195502245e-12
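
The statistic reported by `ttest_1samp` is t = (sample mean − hypothesized mean) / (s / √n), with n − 1 degrees of freedom. The following is a minimal sketch that recomputes it by hand and compares the result with SciPy. The fixed seed is an assumption added for reproducibility, so the numbers will not match the output above.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)  # assumed seed, not part of the original cell
data = rng.normal(loc=5, scale=2, size=100)
mu0 = 4  # hypothesized population mean

n = data.size
sample_mean = data.mean()
sample_sd = data.std(ddof=1)  # sample standard deviation (n - 1 in the denominator)

# t = (x_bar - mu0) / (s / sqrt(n))
t_manual = (sample_mean - mu0) / (sample_sd / np.sqrt(n))
# Two-sided p-value from the t distribution with n - 1 degrees of freedom
p_manual = 2 * stats.t.sf(abs(t_manual), df=n - 1)

t_scipy, p_scipy = stats.ttest_1samp(data, mu0)

print(f"Manual: t = {t_manual:.6f}, p = {p_manual:.3e}")
print(f"SciPy:  t = {t_scipy:.6f}, p = {p_scipy:.3e}")
```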


#### Two-sample T-test

In [3]:
from scipy import stats
import numpy as np

# Generating two sets of sample data
np.random.seed(42)
sample1 = np.random.normal(loc=50, scale=10, size=30)  # Sample 1 with mean 50
sample2 = np.random.normal(loc=55, scale=10, size=30)  # Sample 2 with mean 55

# Performing two-sample t-test
t_stat, p_value = stats.ttest_ind(sample1, sample2)

# Display results
print("Two-sample T-test:")
print(f"T-statistic: {t_stat}")
print(f"P-value: {p_value}")

Two-sample T-test:
T-statistic: -2.3981151520102415
P-value: 0.019717941865758228
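
By default, `stats.ttest_ind` pools the two sample variances, which matches the equal-variance assumption. When that assumption is doubtful, passing `equal_var=False` runs Welch's T-test instead. The sketch below uses made-up data in which the second group is smaller and more spread out (the sizes and scales are assumptions chosen for illustration).

```python
import numpy as np
from scipy import stats

np.random.seed(42)
# Hypothetical groups with different sizes and different spreads
sample1 = np.random.normal(loc=50, scale=10, size=30)
sample2 = np.random.normal(loc=55, scale=25, size=12)

# Student's T-test: assumes equal variances (the default)
t_student, p_student = stats.ttest_ind(sample1, sample2)

# Welch's T-test: does not assume equal variances
t_welch, p_welch = stats.ttest_ind(sample1, sample2, equal_var=False)

print(f"Student: t = {t_student:.4f}, p = {p_student:.4f}")
print(f"Welch:   t = {t_welch:.4f}, p = {p_welch:.4f}")
```

With equal group sizes the two variants produce the same T-statistic and differ only in their degrees of freedom, which is why this sketch uses unequal sizes.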


#### Paired T-test

In [4]:
from scipy import stats
import numpy as np

# Generating paired data (before and after treatment)
np.random.seed(42)
before_treatment = np.random.normal(loc=50, scale=10, size=30)  # Measurements before treatment
after_treatment = before_treatment + np.random.normal(loc=5, scale=3, size=30)  # Measurements after treatment

# Performing paired T-test
t_stat, p_value = stats.ttest_rel(before_treatment, after_treatment)

# Display results
print("Paired T-test:")
print(f"T-statistic: {t_stat}")
print(f"P-value: {p_value}")

Paired T-test:
T-statistic: -9.091456480816795
P-value: 5.471059881900939e-10
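
A paired T-test is equivalent to a one-sample T-test on the per-pair differences, with a null hypothesis that the mean difference is 0. A short sketch that checks this on the same simulated data (the shortened variable names are mine):

```python
import numpy as np
from scipy import stats

np.random.seed(42)
before = np.random.normal(loc=50, scale=10, size=30)
after = before + np.random.normal(loc=5, scale=3, size=30)

# Paired T-test on the two related samples
t_paired, p_paired = stats.ttest_rel(before, after)

# One-sample T-test on the differences against a mean of 0
t_diff, p_diff = stats.ttest_1samp(before - after, 0)

print(f"Paired:      t = {t_paired:.6f}, p = {p_paired:.3e}")
print(f"Differences: t = {t_diff:.6f}, p = {p_diff:.3e}")
```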


## Z-test

![z-test.png](attachment:z-test.png)

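Like the T-test, the Z-test compares means, but it uses the normal distribution instead of the t distribution. It is used when the population standard deviation is known, or when the sample is large enough that the sample standard deviation is a reliable estimate of it.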

In [5]:
from statsmodels.stats.weightstats import ztest

# Example data
data = [28, 32, 35, 37, 29, 32, 35, 31, 34, 30]

# Z-test comparing the sample mean to a hypothesized value; statsmodels estimates the standard deviation from the sample
z_stat, p_value = ztest(data, value=30)  # Null hypothesis: Population mean is 30

print("Z-test:")
print(f"Z-statistic: {z_stat}")
print(f"P-value: {p_value}")

Z-test:
Z-statistic: 2.501248045900724
P-value: 0.012375646600810976
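
When the population standard deviation really is known, the statistic can be computed directly as z = (sample mean − hypothesized mean) / (σ / √n). A minimal sketch on the same data, assuming a hypothetical known σ of 3 (this value is not from the original example):

```python
import numpy as np
from scipy import stats

data = np.array([28, 32, 35, 37, 29, 32, 35, 31, 34, 30])
mu0 = 30     # hypothesized population mean
sigma = 3.0  # hypothetical known population standard deviation (assumption)

# z = (x_bar - mu0) / (sigma / sqrt(n))
z_stat = (data.mean() - mu0) / (sigma / np.sqrt(data.size))
# Two-sided p-value from the standard normal distribution
p_value = 2 * stats.norm.sf(abs(z_stat))

print(f"Z-statistic: {z_stat:.4f}")
print(f"P-value: {p_value:.4f}")
```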


## ANOVA (Analysis of Variance)

Compares means between three or more groups to determine if at least one group is different from the others.
![ANOVA.png](attachment:ANOVA.png)

**One-way ANOVA**: Compares means across different groups for a single independent variable.

**Two-way ANOVA**: Examines the effect of two independent variables on the dependent variable.

In [6]:
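# Three groups with true means 20, 30, and 25 (np and stats come from the earlier cells; no seed is set)
group1 = np.random.normal(20, 5, 100)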
group2 = np.random.normal(30, 5, 100)
group3 = np.random.normal(25, 5, 100)

# Performing one-way ANOVA
f_stat, p_value = stats.f_oneway(group1, group2, group3)

print("One-way ANOVA:")
print(f"F-statistic: {f_stat}")
print(f"P-value: {p_value}")

One-way ANOVA:
F-statistic: 125.52790844157205
P-value: 3.082131582810124e-40
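
The F-statistic is the ratio of between-group variability to within-group variability. The sketch below recomputes it from the sums of squares and compares it with `stats.f_oneway`. The seed is an assumption added for reproducibility, so the values will differ from the output above.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)  # assumed seed
groups = [
    rng.normal(20, 5, 100),
    rng.normal(30, 5, 100),
    rng.normal(25, 5, 100),
]

all_values = np.concatenate(groups)
grand_mean = all_values.mean()
k = len(groups)             # number of groups
n_total = all_values.size   # total number of observations

# Between-group sum of squares: group means around the grand mean
ss_between = sum(g.size * (g.mean() - grand_mean) ** 2 for g in groups)
# Within-group sum of squares: observations around their own group mean
ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)

# F = mean square between / mean square within
f_manual = (ss_between / (k - 1)) / (ss_within / (n_total - k))
p_manual = stats.f.sf(f_manual, k - 1, n_total - k)

f_scipy, p_scipy = stats.f_oneway(*groups)

print(f"Manual: F = {f_manual:.6f}, p = {p_manual:.3e}")
print(f"SciPy:  F = {f_scipy:.6f}, p = {p_scipy:.3e}")
```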


## Chi-square Test

Analyzes the association between categorical variables to determine if they are independent or related.
![Chi-square.png](attachment:Chi-square.png)

**Chi-square Test of Independence**: Examines the relationship between two categorical variables in a contingency table.

**Chi-square Goodness-of-Fit Test**: Checks if observed categorical frequencies match expected frequencies.

In [7]:
# Creating a contingency table
observed = np.array([[10, 20, 30], [15, 25, 35]])

# Performing Chi-square test of independence
chi2_stat, p_value, dof, expected = stats.chi2_contingency(observed)

print("Chi-square Test:")
print(f"Chi-square statistic: {chi2_stat}")
print(f"P-value: {p_value}")

Chi-square Test:
Chi-square statistic: 0.27692307692307694
P-value: 0.870696738961232
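
The goodness-of-fit variant can be sketched with `stats.chisquare`. The counts below are hypothetical results of 120 rolls of a die, tested against the expectation that every face is equally likely.

```python
import numpy as np
from scipy import stats

# Hypothetical observed counts for faces 1 to 6 (120 rolls in total)
observed = np.array([18, 22, 16, 25, 20, 19])

# Expected counts under a fair die: the same total spread evenly over 6 faces
expected = np.full(6, observed.sum() / 6)

# Chi-square goodness-of-fit test (degrees of freedom = 6 - 1 = 5)
chi2_stat, p_value = stats.chisquare(f_obs=observed, f_exp=expected)

print(f"Chi-square statistic: {chi2_stat}")
print(f"P-value: {p_value}")
```

Here the statistic is (4 + 4 + 16 + 25 + 0 + 1) / 20 = 2.5, which is small for 5 degrees of freedom, so the test gives no evidence against a fair die.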
