# Hypothesis Testing

The $ p $-value is a fundamental concept in statistics, particularly in hypothesis testing. Here's a breakdown:

1. **Hypothesis Testing**: When you're trying to make a decision based on statistical evidence, you often set up two opposing hypotheses:
   - **Null Hypothesis ($ H_0 $)**: This is a statement that there is no effect or no difference. It's what you're trying to test against.
   - **Alternative Hypothesis ($ H_a $ or $ H_1 $)**: This is the claim you are seeking evidence for, e.g., that there is a difference or an effect.

2. **Test Statistic**: You then collect data and compute a test statistic, which gives you a measure of how far your observed data is from what the null hypothesis would predict.

3. **$ p $-value**: The $ p $-value is the probability of observing a test statistic as extreme as, or more extreme than, the statistic computed from the data, assuming that the null hypothesis is true. 

   - If the $ p $-value is small (typically ≤ 0.05), it suggests that the observed data is inconsistent with the null hypothesis, so you might reject the null hypothesis in favor of the alternative hypothesis.
   - If the $ p $-value is large, you don't have enough statistical evidence to reject the null hypothesis.

4. **Interpretation**:
   - A small $ p $-value (e.g., ≤ 0.05) indicates that the observed data would be unlikely if the null hypothesis were true. This is often interpreted as evidence against the null hypothesis.
   - A large $ p $-value suggests that the observed data is consistent with what we would expect under the null hypothesis.
   - It's important to note that a $ p $-value does not provide the probability that either hypothesis is true. It only tells us how surprising our data would be if the null hypothesis were true.

5. **Caveats**:
   - **Not proof**: A small $ p $-value is not proof that the alternative hypothesis is true. It just indicates that the data observed would be unlikely under the null hypothesis.
   - **Arbitrary threshold**: The 0.05 threshold is conventional, but it's arbitrary. Some fields or situations might use stricter thresholds (like 0.01) or more lenient ones.
   - **Multiple comparisons**: If you're testing many hypotheses simultaneously, the chance of at least one false positive grows with the number of tests, so you need to adjust your $ p $-values or use techniques designed for multiple comparisons.
   - **Data dredging/p-hacking**: This refers to the practice of repeatedly searching data for patterns and conducting many statistical tests until a significant result is found. This can lead to misleading results.

6. **Context Matters**: Always interpret $ p $-values in the context of the study, the data, and the specific hypotheses being tested. A $ p $-value by itself doesn't tell the whole story.
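To make the definition in item 3 concrete, here is a minimal sketch that computes an exact two-sided $ p $-value directly from that definition. The coin-flip scenario (60 heads in 100 flips, with $ H_0 $: the coin is fair) and the function name `binomial_two_sided_p` are illustrative assumptions, not part of the discussion above.

```python
from math import comb

def binomial_two_sided_p(heads, flips):
    """Exact two-sided p-value for H0: the coin is fair (illustrative scenario)."""
    expected = flips / 2
    observed_distance = abs(heads - expected)
    # Count every outcome at least as far from the expected number of heads
    # as the observed one; under H0 each sequence of flips is equally likely
    extreme = sum(
        comb(flips, k)
        for k in range(flips + 1)
        if abs(k - expected) >= observed_distance
    )
    return extreme / 2 ** flips

p = binomial_two_sided_p(heads=60, flips=100)
print(f"p-value: {p:.4f}")
```

Here the test statistic is the distance of the head count from 50, and "as extreme or more extreme" means 40 or fewer heads, or 60 or more. The resulting $ p $-value comes out just above 0.05, so this outcome would narrowly miss the conventional threshold.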

In summary, the $ p $-value is a tool that helps researchers determine the statistical significance of their results. However, it's essential to understand its limitations and use it appropriately in the context of comprehensive data analysis.
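The multiple-comparisons caveat can be illustrated with a short sketch. It uses the Bonferroni correction, which is one common adjustment and not one the text above names specifically. The $ p $-values in the list are made-up numbers for illustration, and the helper names are hypothetical.

```python
def familywise_error_rate(num_tests, alpha=0.05):
    """Chance of at least one false positive across independent tests
    when every null hypothesis is true."""
    return 1 - (1 - alpha) ** num_tests

def bonferroni_adjust(p_values):
    """Bonferroni adjustment: multiply each p-value by the number of tests, capped at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]

alpha = 0.05
print(f"Chance of a false positive in 20 tests: {familywise_error_rate(20, alpha):.2f}")

# Hypothetical p-values from four separate tests
raw_p_values = [0.004, 0.020, 0.030, 0.400]
for raw, adjusted in zip(raw_p_values, bonferroni_adjust(raw_p_values)):
    verdict = "reject H0" if adjusted < alpha else "fail to reject H0"
    print(f"raw p = {raw:.3f}, adjusted p = {adjusted:.3f}: {verdict}")
```

Without the adjustment, three of the four hypothetical tests would fall below 0.05; after it, only the first one does.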

## Example

Let's work through a practical example.

**Scenario**: Imagine you're a botanist studying the effects of a new fertilizer on plant growth. You have two groups of plants:

1. **Control Group**: Plants not given the fertilizer.
2. **Treatment Group**: Plants given the fertilizer.

You want to know if the fertilizer has a significant effect on plant growth. To do this, you measure the height of the plants after a fixed period.

**Hypotheses**:
- $ H_0 $: The fertilizer has no effect on plant growth. (Mean height of Control Group = Mean height of Treatment Group)
- $ H_a $: The fertilizer has an effect on plant growth. (Mean height of Control Group ≠ Mean height of Treatment Group)

**Data**:
Let's assume you measured the height (in cm) of 10 plants from each group:

- Control Group: [15, 17, 16, 14, 15, 16, 17, 15, 16, 17]
- Treatment Group: [18, 19, 20, 19, 18, 21, 19, 20, 19, 18]

We'll use a two-sample t-test to determine if there's a significant difference in the means of the two groups.

```python
import scipy.stats as stats

# Data: plant heights in cm
control_group = [15, 17, 16, 14, 15, 16, 17, 15, 16, 17]
treatment_group = [18, 19, 20, 19, 18, 21, 19, 20, 19, 18]

# Two-sample t-test (by default SciPy assumes equal variances,
# i.e. it runs the pooled-variance Student's t-test)
t_stat, p_value = stats.ttest_ind(control_group, treatment_group)

print(f"t-statistic: {t_stat:.3f}")
print(f"p-value: {p_value:.3f}")

# Decision at the 5% significance level
alpha = 0.05
if p_value < alpha:
    print("Reject the null hypothesis: There's a significant difference in plant heights.")
else:
    print("Fail to reject the null hypothesis: No significant difference in plant heights.")
```

```
t-statistic: -7.279
p-value: 0.000
Reject the null hypothesis: There's a significant difference in plant heights.
```


Running the code above prints the t-statistic and the p-value for the test. Because the p-value is less than 0.05 (the significance level chosen here), we reject the null hypothesis, which suggests that the fertilizer has a significant effect on plant growth. Had the p-value been larger than the significance level, we would not have had enough evidence to say the fertilizer has an effect. Note that the printed `0.000` comes from rounding to three decimal places: the actual p-value is very small, but it is not zero.

The two-sample t-test for independent samples can be computed from scratch using the following formula:

$ t = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}} $

Where:
- $ \bar{X}_1 $ and $ \bar{X}_2 $ are the sample means of the two groups.
- $ s_1^2 $ and $ s_2^2 $ are the sample variances of the two groups.
- $ n_1 $ and $ n_2 $ are the sample sizes of the two groups.

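The degrees of freedom for this test are:

$ df = n_1 + n_2 - 2 $

These are the degrees of freedom of the pooled-variance (Student's) t-test, which is what `stats.ttest_ind` computes by default. The formula for $ t $ above equals the pooled statistic only when $ n_1 = n_2 $, as in this example. For unequal group sizes, the pooled statistic is:

$ t = \frac{\bar{X}_1 - \bar{X}_2}{s_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}, \quad s_p^2 = \frac{(n_1 - 1) s_1^2 + (n_2 - 1) s_2^2}{n_1 + n_2 - 2} $

whereas the unpooled formula above is Welch's statistic, which is paired with the Welch-Satterthwaite approximation for the degrees of freedom rather than $ n_1 + n_2 - 2 $.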

Once we have the t-statistic and degrees of freedom, we can find the p-value from the t-distribution. For simplicity, we'll use Python's `math` module for the square root and SciPy's t-distribution (`scipy.stats.t`) for the tail probability, rather than implementing the distribution by hand.



```python
import math
from scipy.stats import t as t_dist

def mean(data):
    """Arithmetic mean."""
    return sum(data) / len(data)

def variance(data):
    """Sample variance (divides by n - 1)."""
    m = mean(data)
    return sum((xi - m) ** 2 for xi in data) / (len(data) - 1)

def t_statistic(group1, group2):
    """Two-sample t-statistic: difference in means over its standard error."""
    mean1, mean2 = mean(group1), mean(group2)
    var1, var2 = variance(group1), variance(group2)
    n1, n2 = len(group1), len(group2)

    numerator = mean1 - mean2
    denominator = math.sqrt(var1 / n1 + var2 / n2)

    return numerator / denominator

def degrees_of_freedom(group1, group2):
    """Degrees of freedom of the pooled-variance test: n1 + n2 - 2."""
    return len(group1) + len(group2) - 2

# Data: plant heights in cm
control_group = [15, 17, 16, 14, 15, 16, 17, 15, 16, 17]
treatment_group = [18, 19, 20, 19, 18, 21, 19, 20, 19, 18]

# Compute t-statistic and degrees of freedom
t = t_statistic(control_group, treatment_group)
df = degrees_of_freedom(control_group, treatment_group)

# Compute p-value (two-tailed); sf is the survival function, 1 - CDF,
# which stays accurate for very small tail probabilities
p_value = 2 * t_dist.sf(abs(t), df)

print(f"t-statistic: {t:.3f}")
print(f"p-value: {p_value:.3f}")

# Decision at the 5% significance level
alpha = 0.05
if p_value < alpha:
    print("Reject the null hypothesis: There's a significant difference in plant heights.")
else:
    print("Fail to reject the null hypothesis: No significant difference in plant heights.")
```

```
t-statistic: -7.279
p-value: 0.000
Reject the null hypothesis: There's a significant difference in plant heights.
```
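
The from-scratch statistic above divides by $ \sqrt{s_1^2/n_1 + s_2^2/n_2} $, which is also the form used by Welch's t-test, a variant that does not assume equal variances. As a hedged sketch, the snippet below compares SciPy's default pooled test with `equal_var=False` on the same data. The Welch-Satterthwaite degrees-of-freedom calculation is written out by hand purely as an illustration; it is not something the example above computes.

```python
from statistics import variance
import scipy.stats as stats

control_group = [15, 17, 16, 14, 15, 16, 17, 15, 16, 17]
treatment_group = [18, 19, 20, 19, 18, 21, 19, 20, 19, 18]
n1, n2 = len(control_group), len(treatment_group)

# Pooled-variance test (SciPy's default) versus Welch's test
pooled = stats.ttest_ind(control_group, treatment_group)
welch = stats.ttest_ind(control_group, treatment_group, equal_var=False)

# Welch-Satterthwaite approximation to the degrees of freedom
a = variance(control_group) / n1
b = variance(treatment_group) / n2
welch_df = (a + b) ** 2 / (a ** 2 / (n1 - 1) + b ** 2 / (n2 - 1))

print(f"pooled: t = {pooled.statistic:.3f}, df = {n1 + n2 - 2}")
print(f"Welch:  t = {welch.statistic:.3f}, df = {welch_df:.2f}")
print(f"p-values: pooled = {pooled.pvalue:.2e}, Welch = {welch.pvalue:.2e}")
```

Because both groups contain 10 plants, the two t-statistics coincide. Only the degrees of freedom, and therefore the p-values, differ slightly, and the conclusion is the same either way.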
