# Hypotheses Validation

## Objectives

- Test whether pixel‐intensity **variance** differs between healthy and mildew leaves.  
- Test whether pixel‐intensity **mean** differs between healthy and mildew leaves.  
- Test whether the model’s **recall** on mildew images is significantly above 50 %.

## Inputs

- `outputs/v1/image_stats.csv` — per-image mean & variance values  
- `outputs/v1/y_true.npy` — true labels for each test image (0=healthy, 1=mildew)  
- `outputs/v1/y_pred.npy` — model’s predicted labels for each test image  
- `outputs/v1/metrics.json` — overall recall on the mildew class  

## Outputs

- `outputs/v1/hypothesis_tests.json` — t-statistics and p-values for each hypothesis  
- Notebook print-outs of each t-test result with a clear “Reject/Fail to reject” conclusion  

---

**Note:** Hypothesis 4 (Learning-Rate & EarlyStopping) was evaluated in Notebook 03: Modelling and Evaluating via comparative training runs (Run 1 vs Run 2) and detailed learning-curve analysis. The results (smoother convergence and higher validation accuracy with a lower learning rate and EarlyStopping) satisfy that hypothesis, so this notebook focuses on the remaining statistical hypotheses (1–3).

---

## Imports and Setup

In [4]:
import os, sys
from pathlib import Path

cwd = Path.cwd()
if cwd.name == "jupyter_notebooks":
    os.chdir(cwd.parent)
sys.path.insert(0, str(Path.cwd() / "src"))

import json
import pandas as pd
import numpy as np
from scipy import stats
from tensorflow.keras.models import load_model
from tensorflow.keras.preprocessing.image import ImageDataGenerator

print("Working directory:", Path.cwd())




Working directory: /workspaces/PP5-MildewDetection


---

## Load Image Statistics for Mean/Variance Tests

In [6]:
df_stats = pd.read_csv(Path("outputs") / "v1" / "image_stats.csv")
healthy = df_stats[df_stats["class"] == "healthy"]
mildew  = df_stats[df_stats["class"] == "powdery_mildew"]

---

### Hypothesis 2: Variance Difference

**H₀:** μ_var(healthy) = μ_var(mildew)  
**H₁:** μ_var(healthy) ≠ μ_var(mildew)  
*Test:* Welch’s two-sample t-test  
*α:* 0.05
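As a rough illustration of what Welch's test computes, the sketch below derives the t statistic and the Welch-Satterthwaite degrees of freedom by hand and compares them with `scipy.stats.ttest_ind(..., equal_var=False)`. The helper name `welch_t` and the synthetic arrays are assumptions made for this example; they are not the project's image statistics.

```python
import numpy as np
from scipy import stats


def welch_t(a, b):
    """Return Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    se2_a = a.var(ddof=1) / a.size  # squared standard error of mean(a)
    se2_b = b.var(ddof=1) / b.size  # squared standard error of mean(b)
    t = (a.mean() - b.mean()) / np.sqrt(se2_a + se2_b)
    df = (se2_a + se2_b) ** 2 / (
        se2_a ** 2 / (a.size - 1) + se2_b ** 2 / (b.size - 1)
    )
    return t, df


# Synthetic stand-in data (hypothetical, for illustration only)
rng = np.random.default_rng(0)
group_a = rng.normal(0.033, 0.010, size=200)
group_b = rng.normal(0.030, 0.006, size=180)

t_manual, df_manual = welch_t(group_a, group_b)
p_manual = 2 * stats.t.sf(abs(t_manual), df_manual)  # two-sided p-value

t_scipy, p_scipy = stats.ttest_ind(group_a, group_b, equal_var=False)
print(f"manual: t = {t_manual:.4f}, df = {df_manual:.1f}, p = {p_manual:.3e}")
print(f"scipy : t = {t_scipy:.4f}, p = {p_scipy:.3e}")
```

Unlike Student's t-test, this form does not assume the two groups share a common variance, which is why the notebook passes `equal_var=False`.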

### Two-sample t-test on variance

In [8]:
t_var, p_var = stats.ttest_ind(
    healthy["variance"],
    mildew["variance"],
    equal_var=False
)
print(f"Variance test: t = {t_var:.4f}, p = {p_var:.4f}")
print("Mean variance (healthy):", healthy["variance"].mean())
print("Mean variance (mildew) :", mildew["variance"].mean())

Variance test: t = 45.7180, p = 0.0000
Mean variance (healthy): 0.033343815199180414
Mean variance (mildew) : 0.021813304403305776


**Conclusion:**
We reject H₀ at α = 0.05 (p < 0.001): pixel‐intensity variance differs significantly between the classes, with healthy leaves showing higher mean variance (≈ 0.0333) than powdery-mildew leaves (≈ 0.0218).
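The printed `p = 0.0000` is a side effect of the `.4f` format: the p-value is smaller than the four displayed decimals, not literally zero. One possible way to report this more faithfully is a small formatting helper; `format_p` and its `floor` parameter are hypothetical names introduced for this sketch.

```python
def format_p(p, floor=0.001):
    """Report very small p-values as an upper bound instead of a rounded zero."""
    if p < floor:
        return f"p < {floor:g}"
    return f"p = {p:.4f}"


# Illustrative values only, not results from this notebook
print(format_p(3.2e-40))
print(format_p(0.0312))
```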

---

### Hypothesis 3: Mean Intensity Difference

**H₀:** μ_mean(healthy) = μ_mean(mildew)  
**H₁:** μ_mean(healthy) ≠ μ_mean(mildew)  
*Test:* Welch’s two-sample t-test  
*α:* 0.05

### Two-sample t-test on mean

In [9]:
t_mean, p_mean = stats.ttest_ind(
    healthy["mean"],
    mildew["mean"],
    equal_var=False
)
print(f"Mean-intensity test: t = {t_mean:.4f}, p = {p_mean:.4f}")

Mean-intensity test: t = 32.4529, p = 0.0000


**Conclusion:**
We reject H₀ at α = 0.05 (p < 0.001): mean pixel intensity differs significantly between the classes. The positive t-statistic indicates that healthy leaves have the higher mean intensity.
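With samples this large, a very small p-value says that a difference exists but not how big it is. A standardised effect size such as Cohen's d can complement the t-test. The sketch below uses the pooled-standard-deviation form; the function name `cohens_d` and the toy inputs are assumptions for illustration and are not part of the original analysis.

```python
import numpy as np


def cohens_d(a, b):
    """Cohen's d using the pooled sample standard deviation."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    pooled_var = (
        (a.size - 1) * a.var(ddof=1) + (b.size - 1) * b.var(ddof=1)
    ) / (a.size + b.size - 2)
    return (a.mean() - b.mean()) / np.sqrt(pooled_var)


# Toy inputs (hypothetical): means 4 and 3, both sample variances equal 4
d_demo = cohens_d([2, 4, 6], [1, 3, 5])
print(f"Cohen's d = {d_demo:.2f}")
```

In the notebook this could be applied to `healthy["mean"]` and `mildew["mean"]` (or the `variance` columns) to express the class difference in standard-deviation units.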

---

### Hypothesis 1: Model Recall vs. Random Baseline

**H₀:** Recall = 0.50  
**H₁:** Recall > 0.50  
*Test:* One-sample t-test of per-image correctness on mildew test images (1 = correctly predicted, 0 = missed) against 0.5  
*α:* 0.05

In [11]:
# Load model and compute recall array
model = load_model(Path("models") / "run2_model.h5")

test_datagen = ImageDataGenerator(rescale=1/255.0)
test_iter = test_datagen.flow_from_directory(
    Path("input/datasets/cherry_leaf_dataset/cherry-leaves") / "test",
    target_size=(256,256),
    color_mode="rgb",
    batch_size=32,
    class_mode="binary",
    shuffle=False
)

# Predictions and true labels
probs = model.predict(test_iter, verbose=0).flatten()
y_true = test_iter.classes
y_pred = (probs >= 0.5).astype(int)

# Extract mildew recall samples
mildew_idx = test_iter.class_indices['powdery_mildew']
mask = (y_true == mildew_idx)
correct = (y_true[mask] == y_pred[mask]).astype(int)

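# One-sample t-test vs 0.5, one-sided to match H₁: Recall > 0.50
# (`alternative` requires SciPy >= 1.6; `stats` is imported in the setup cell)
t_rec, p_rec = stats.ttest_1samp(correct, popmean=0.5, alternative="greater")
print(f"Recall test: t = {t_rec:.4f}, p = {p_rec:.4f}")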



Found 844 images belonging to 2 classes.




Recall test: t = 147.9611, p = 0.0000


**Conclusion:**  
We reject H₀ at α = 0.05 (p < 0.001): recall on the mildew class is significantly above the 0.50 random baseline.
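Because each mildew image is either classified correctly or not, the per-image outcomes are binary, and an exact binomial test is a natural cross-check on the t-test result. The sketch below uses `scipy.stats.binomtest` (available in SciPy 1.7 and later) on a made-up outcome array; `correct_demo` and its counts are hypothetical and do not reproduce the model's actual predictions.

```python
import numpy as np
from scipy.stats import binomtest

# Hypothetical outcomes: 95 of 100 mildew images classified correctly
correct_demo = np.array([1] * 95 + [0] * 5)

k = int(correct_demo.sum())  # number of correctly detected mildew images
n = int(correct_demo.size)   # total mildew images

# One-sided exact test of H0: recall = 0.5 against H1: recall > 0.5
res = binomtest(k, n, p=0.5, alternative="greater")
print(f"recall = {k / n:.3f}, p = {res.pvalue:.3e}")
```

In the notebook, the same call could be made with `k = int(correct.sum())` and `n = int(mask.sum())`.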

---

### Save Results to JSON

In [12]:
results = {
    "t_var": t_var,  "p_var": p_var,
    "t_mean": t_mean, "p_mean": p_mean,
    "t_rec": t_rec,  "p_rec": p_rec,
    "n_mildew": int(mask.sum())
}
with open(Path("outputs")/"v1"/"hypothesis_tests.json", 'w') as f:
    json.dump(results, f, indent=4)
print("Saved hypothesis_tests.json")

Saved hypothesis_tests.json
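The saved p-values can be turned into the "Reject/Fail to reject" wording listed under Outputs with a small helper. This is a minimal sketch: `decide` is a hypothetical function, and the dictionary holds illustrative values under the same keys as `hypothesis_tests.json` rather than this notebook's results.

```python
ALPHA = 0.05  # significance level used throughout the notebook


def decide(p_value, alpha=ALPHA):
    """Map a p-value to a plain-language test decision."""
    return "Reject H0" if p_value < alpha else "Fail to reject H0"


# Illustrative values only; in practice load them with json.load(...)
demo_results = {"p_var": 1e-12, "p_mean": 0.20, "p_rec": 0.0004}

for key, p in demo_results.items():
    print(f"{key}: p = {p:.3e} -> {decide(p)}")
```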
