# Task 3 – A/B Hypothesis Testing

This notebook tests key hypotheses about risk drivers. A null hypothesis is rejected when its p-value falls below 0.05.

## Hypotheses to test

1. **H₀**: There are no risk differences across provinces
2. **H₀**: There are no risk differences between zip codes
3. **H₀**: There is no significant margin (profit) difference between zip codes
4. **H₀**: There is no significant risk difference between Women and Men

## Metrics

- **Claim Frequency**: proportion of policies with at least one claim (`TotalClaims > 0`)
- **Claim Severity**: average claim amount, given a claim occurred
- **Margin**: `TotalPremium - TotalClaims`
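
As a minimal sketch of how these three metrics can be computed with pandas: the toy values below are made up for illustration, and the `has_claim` and `margin` column names mirror what the notebook prints later. The actual `add_claim_flag` and `add_margin` helpers live in `src.hypothesis_tests` and may differ in detail.

```python
import pandas as pd

# Hypothetical toy policies; only the column names follow the notebook.
policies = pd.DataFrame({
    "TotalPremium": [100.0, 120.0, 80.0, 150.0],
    "TotalClaims": [0.0, 300.0, 0.0, 50.0],
})

# Claim flag: True when the policy has at least one claim.
policies["has_claim"] = policies["TotalClaims"] > 0
# Margin: premium minus claims, per policy.
policies["margin"] = policies["TotalPremium"] - policies["TotalClaims"]

# Claim frequency: share of policies with a claim.
claim_frequency = policies["has_claim"].mean()
# Claim severity: average claim amount among policies that claimed.
claim_severity = policies.loc[policies["has_claim"], "TotalClaims"].mean()
# Average margin across all policies.
average_margin = policies["margin"].mean()

print(f"Claim frequency: {claim_frequency:.2%}")
print(f"Claim severity: {claim_severity:,.2f}")
print(f"Average margin: {average_margin:,.2f}")
```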

In [1]:
# Setup: imports and data loading
import sys
from pathlib import Path

# Add project root to sys.path
project_root = Path.cwd().resolve()
if not (project_root / "src").exists():
    project_root = project_root.parent
if str(project_root) not in sys.path:
    sys.path.insert(0, str(project_root))

from src.data_loader import DataLoader
from src.hypothesis_tests import (
    add_claim_flag,
    add_margin,
    chi_squared_test,
    ttest_two_groups,
    anova_test,
)

# Load data
loader = DataLoader.from_config()
df = loader.load_machine_learning_rating()

# Add derived columns
df = add_claim_flag(df)
df = add_margin(df)

print(f"Loaded {len(df):,} rows")
print(f"Claim frequency: {df['has_claim'].mean():.2%}")
print(f"Average margin: {df['margin'].mean():,.2f}")

Loaded 1,000,098 rows
Claim frequency: 0.28%
Average margin: -2.96


---
## Hypothesis 1: No risk differences across provinces

We test whether **claim frequency** differs significantly across provinces using a chi-squared test.

In [2]:
# H1: Risk differences across provinces (claim frequency)
result_province = chi_squared_test(df, "Province", "has_claim")

print(f"Chi-squared statistic: {result_province.statistic:.2f}")
print(f"P-value: {result_province.p_value:.2e}")
print(f"Reject null hypothesis: {result_province.reject_null}")

Chi-squared statistic: 104.19
P-value: 5.93e-19
Reject null hypothesis: True


### Interpretation – Province risk differences

- **Decision rule**: we reject the null hypothesis when the p-value is below 0.05.
- **Result**: the p-value of 5.93 × 10⁻¹⁹ is far below 0.05, so we **reject** H₀ ("There are no risk differences across provinces"). The pattern of `has_claim` across provinces is very unlikely to be due to random chance.
- **What this tells us**: claim frequency differs across provinces. Some provinces have a higher proportion of policies with at least one claim, and others are lower risk. The test alone does not say which provinces differ or by how much, so the per-province claim frequencies should be inspected before acting.
- **Business implication**: the result gives strong statistical support for:
  - Using province as a rating factor in pricing.
  - Applying higher base premiums or stricter underwriting rules in high-risk provinces.
  - Potentially offering more competitive pricing or marketing focus in low-risk provinces.
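
The `chi_squared_test` helper is imported from `src.hypothesis_tests` and its implementation is not shown in this notebook. As a rough sketch of what such a test usually involves, the example below assumes it builds a contingency table and passes it to `scipy.stats.chi2_contingency`. The three provinces and their claim counts are made-up toy data.

```python
import pandas as pd
from scipy.stats import chi2_contingency

# Hypothetical toy data: three provinces with 100 policies each.
toy = pd.DataFrame({
    "Province": ["A"] * 100 + ["B"] * 100 + ["C"] * 100,
    "has_claim": (
        [True] * 30 + [False] * 70      # province A: 30% claim frequency
        + [True] * 10 + [False] * 90    # province B: 10%
        + [True] * 20 + [False] * 80    # province C: 20%
    ),
})

# Claim frequency per province, useful for seeing which groups drive the result.
frequency_by_province = toy.groupby("Province")["has_claim"].mean()

# Contingency table: rows are provinces, columns are claim / no claim.
table = pd.crosstab(toy["Province"], toy["has_claim"])

# Chi-squared test of independence between province and claim occurrence.
chi2, p_value, dof, expected = chi2_contingency(table)
reject_null = p_value < 0.05

print(frequency_by_province)
print(f"chi2={chi2:.2f}, dof={dof}, p={p_value:.4f}, reject={reject_null}")
```

Printing the per-group frequencies next to the test result makes it easier to see the direction of any difference, which the chi-squared statistic by itself does not convey.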

---
## Hypothesis 2: No risk differences between zip codes

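We test whether **claim frequency** differs across postal codes using a chi-squared test. Because there are many zip codes and claims are rare (0.28% of policies), some zip codes may have very low expected claim counts, which can weaken the chi-squared approximation.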

In [8]:
# H2: Risk differences between zip codes (claim frequency)
result_zipcode = chi_squared_test(df, "PostalCode", "has_claim")

print(f"Chi-squared statistic: {result_zipcode.statistic:.2f}")
print(f"P-value: {result_zipcode.p_value:.2e}")
print(f"Reject null hypothesis: {result_zipcode.reject_null}")

Chi-squared statistic: 1454.47
P-value: 3.15e-30
Reject null hypothesis: True


### Interpretation – ZipCode risk differences

- **Null hypothesis (H₀)**: claim risk is the same across all zip codes.
- **Decision rule**: we reject the null hypothesis when the p-value is below 0.05.
- **Result**: the p-value of 3.15 × 10⁻³⁰ is essentially zero and far below 0.05, so we **reject** H₀.
- **What this tells us**: there are statistically significant risk differences across zip codes. Claim frequency is not uniform; some postal codes are higher risk and some are lower.
- **Business implication**: fine-grained geographic pricing (beyond province level) is supported by the data:
  - High-risk zip codes can have higher premiums and/or stricter underwriting, and can be identified for targeted action.
  - Low-risk zip codes can be targeted for competitive offers or growth.
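
With many postal codes and rare claims, some cells of the contingency table can have small expected counts. A common rule of thumb is that the chi-squared approximation is most reliable when expected counts are at least 5. The sketch below, on made-up toy data, shows one way to list the postal codes that fall below that threshold. The threshold and the toy counts are assumptions for illustration, not values taken from this dataset.

```python
import pandas as pd
from scipy.stats import chi2_contingency

# Hypothetical toy data: two large postal codes and one very small one.
toy = pd.DataFrame({
    "PostalCode": ["1000"] * 500 + ["2000"] * 500 + ["3000"] * 20,
    "has_claim": (
        [True] * 10 + [False] * 490
        + [True] * 20 + [False] * 480
        + [True] * 1 + [False] * 19
    ),
})

table = pd.crosstab(toy["PostalCode"], toy["has_claim"])

# chi2_contingency returns the expected counts under independence.
_, _, _, expected_counts = chi2_contingency(table)
expected = pd.DataFrame(expected_counts, index=table.index, columns=table.columns)

# Postal codes where any expected count is below the rule-of-thumb threshold.
min_expected = 5
sparse_codes = expected.index[(expected < min_expected).any(axis=1)].tolist()
share_sparse = len(sparse_codes) / len(expected)

print(f"Sparse postal codes: {sparse_codes} ({share_sparse:.0%} of all codes)")
```

If a large share of postal codes turns out to be sparse, grouping small codes together before testing is one option to consider.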

---
## Hypothesis 3: No significant margin difference between zip codes

We test whether **margin** (TotalPremium - TotalClaims) differs across postal codes using ANOVA.

In [4]:
# H3: Margin differences between zip codes
result_margin_zip = anova_test(df, "PostalCode", "margin")

print(f"F-statistic: {result_margin_zip.statistic:.2f}")
print(f"P-value: {result_margin_zip.p_value:.2e}")
print(f"Reject null hypothesis: {result_margin_zip.reject_null}")

F-statistic: 0.87
P-value: 9.98e-01
Reject null hypothesis: False


### Interpretation – ZipCode margin differences

- **Null hypothesis (H₀)**: average margin (`TotalPremium - TotalClaims`) is the same across zip codes.
- **Decision rule**: we reject the null hypothesis when the p-value is below 0.05.
- **Result**: the p-value of about 0.998 is much larger than 0.05, so we **do not reject** H₀.
- **What this tells us**: we do not find evidence that average profitability (margin) differs across zip codes. Claim risk differs a lot by zip code (H2), yet the margin of premiums over claims looks broadly similar across areas. Failing to reject H₀ is not proof that margins are identical; it only means this test did not detect a difference.
- **Business implication**: on average, profitability per policy appears similar across postal codes even though risk levels differ. That is consistent with:
  - Geographic pricing already compensating for risk reasonably well.
  - Little evidence (from this test alone) that some zip codes are systematically under- or over-priced in terms of margin.
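
The `anova_test` helper is not shown in this notebook. As a hedged sketch, a one-way ANOVA on margin by postal code can be run with `scipy.stats.f_oneway`, passing one array of margins per group. The toy margins below are invented for illustration, and it is an assumption that the helper works along these lines.

```python
import pandas as pd
from scipy.stats import f_oneway

# Hypothetical toy margins for three postal codes with similar averages.
toy = pd.DataFrame({
    "PostalCode": ["1000"] * 4 + ["2000"] * 4 + ["3000"] * 4,
    "margin": [
        10.0, 12.0, 9.0, 11.0,
        10.5, 11.5, 9.5, 10.5,
        11.0, 10.0, 12.0, 10.0,
    ],
})

# One array of margins per postal code.
groups = [group["margin"].to_numpy() for _, group in toy.groupby("PostalCode")]

# One-way ANOVA: do the group means differ more than within-group noise suggests?
f_stat, p_value = f_oneway(*groups)
reject_null = p_value < 0.05

print(f"F={f_stat:.2f}, p={p_value:.3f}, reject={reject_null}")
```

An F-statistic below 1, as in this toy case, means the variation between group means is smaller than the variation within groups.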

---
## Hypothesis 4: No significant risk difference between Women and Men

We test whether **claim frequency** differs between genders using a chi-squared test.

In [5]:
# H4: Risk differences between Women and Men (claim frequency)
# Filter to only Male and Female (exclude unknown/missing)
df_gender = df[df["Gender"].isin(["Male", "Female"])].copy()

result_gender = chi_squared_test(df_gender, "Gender", "has_claim")

print(f"Chi-squared statistic: {result_gender.statistic:.2f}")
print(f"P-value: {result_gender.p_value:.2e}")
print(f"Reject null hypothesis: {result_gender.reject_null}")

Chi-squared statistic: 0.00
P-value: 9.51e-01
Reject null hypothesis: False


### Interpretation – Gender risk differences

- **Null hypothesis (H₀)**: claim frequency is the same for men and women.
- **Decision rule**: we reject the null hypothesis when the p-value is below 0.05.
- **Result**: the p-value of about 0.951 is much larger than 0.05, so we **do not reject** H₀.
- **What this tells us**: we do not find a statistically significant difference in claim risk between men and women. In this dataset, male and female policyholders have very similar claim frequencies.
- **Business implication**: gender does not appear to be a strong risk driver here. Using gender as a pricing factor is not supported by this analysis, and given regulatory and ethical constraints, it is safer to avoid gender-based pricing.
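
To judge practical as well as statistical significance, it can help to look at the size of the difference in claim frequency directly. The sketch below computes the difference between two groups and an approximate 95% confidence interval using the normal (Wald) approximation. The counts are made-up toy values, not figures from this dataset.

```python
import math

import pandas as pd

# Hypothetical toy data: 1,000 policies per gender with similar claim rates.
toy = pd.DataFrame({
    "Gender": ["Male"] * 1000 + ["Female"] * 1000,
    "has_claim": (
        [True] * 30 + [False] * 970
        + [True] * 28 + [False] * 972
    ),
})

# Claim frequency and group size per gender.
rates = toy.groupby("Gender")["has_claim"].agg(["mean", "size"])
p_male, n_male = rates.loc["Male", "mean"], rates.loc["Male", "size"]
p_female, n_female = rates.loc["Female", "mean"], rates.loc["Female", "size"]

# Difference in claim frequency and its standard error (Wald approximation).
diff = p_male - p_female
std_err = math.sqrt(
    p_male * (1 - p_male) / n_male + p_female * (1 - p_female) / n_female
)
ci_low, ci_high = diff - 1.96 * std_err, diff + 1.96 * std_err

print(f"Difference: {diff:.4f} (95% CI {ci_low:.4f} to {ci_high:.4f})")
```

An interval that comfortably includes zero, as in this toy case, agrees with a test that does not reject the null hypothesis.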

---
## Additional analysis: t-test for margin between two provinces

As an example of a two-group comparison, we compare margin between the two largest provinces.

In [6]:
# Find the two provinces with the most policies
top_provinces = df["Province"].value_counts().head(2).index.tolist()
print(f"Comparing provinces: {top_provinces[0]} vs {top_provinces[1]}")

result_province_margin = ttest_two_groups(
    df, "Province", "margin", top_provinces[0], top_provinces[1]
)

print(f"T-statistic: {result_province_margin.statistic:.2f}")
print(f"P-value: {result_province_margin.p_value:.2e}")
print(f"Reject null hypothesis: {result_province_margin.reject_null}")

Comparing provinces: Gauteng vs Western Cape
T-statistic: -1.39
P-value: 1.64e-01
Reject null hypothesis: False


### Interpretation – Province margin comparison

- This t-test compares the **mean margin** between the two largest provinces, Gauteng and Western Cape.
- With a p-value of 0.164, above 0.05, we **do not reject** the null hypothesis. There is no evidence that one province is significantly more profitable than the other.
- **Business implication**: this comparison gives no basis for shifting marketing focus or adjusting pricing between these two provinces on margin grounds.
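
The `ttest_two_groups` helper is not shown here, so it is unclear whether it assumes equal variances. As a sketch under that caveat, the example below runs Welch's t-test (which does not assume equal variances) with `scipy.stats.ttest_ind(..., equal_var=False)`. The margins are invented toy values.

```python
import pandas as pd
from scipy.stats import ttest_ind

# Hypothetical toy margins for the two provinces compared above.
toy = pd.DataFrame({
    "Province": ["Gauteng"] * 5 + ["Western Cape"] * 5,
    "margin": [10.0, 12.0, 9.0, 11.0, 8.0, 11.0, 9.0, 12.0, 10.0, 9.0],
})

margin_a = toy.loc[toy["Province"] == "Gauteng", "margin"]
margin_b = toy.loc[toy["Province"] == "Western Cape", "margin"]

# Welch's t-test: compares the two means without assuming equal variances.
t_stat, p_value = ttest_ind(margin_a, margin_b, equal_var=False)
reject_null = p_value < 0.05

print(f"t={t_stat:.2f}, p={p_value:.3f}, reject={reject_null}")
```

A negative t-statistic means the first group's mean margin is lower than the second's, which matches the sign convention of the result printed above.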

---
## Summary of hypothesis test results

In [7]:
import pandas as pd

summary = pd.DataFrame([
    {
        "Hypothesis": "H1: No risk diff across provinces",
        "Test": "Chi-squared",
        "Statistic": result_province.statistic,
        "P-value": result_province.p_value,
        "Reject H0": result_province.reject_null,
    },
    {
        "Hypothesis": "H2: No risk diff between zip codes",
        "Test": "Chi-squared",
        "Statistic": result_zipcode.statistic,
        "P-value": result_zipcode.p_value,
        "Reject H0": result_zipcode.reject_null,
    },
    {
        "Hypothesis": "H3: No margin diff between zip codes",
        "Test": "ANOVA",
        "Statistic": result_margin_zip.statistic,
        "P-value": result_margin_zip.p_value,
        "Reject H0": result_margin_zip.reject_null,
    },
    {
        "Hypothesis": "H4: No risk diff between genders",
        "Test": "Chi-squared",
        "Statistic": result_gender.statistic,
        "P-value": result_gender.p_value,
        "Reject H0": result_gender.reject_null,
    },
])

summary

| | Hypothesis | Test | Statistic | P-value | Reject H0 |
|---|---|---|---|---|---|
| 0 | H1: No risk diff across provinces | Chi-squared | 104.190881 | 5.925511e-19 | True |
| 1 | H2: No risk diff between zip codes | Chi-squared | 1454.46761 | 3.152172e-30 | True |
| 2 | H3: No margin diff between zip codes | ANOVA | 0.870747 | 0.997686 | False |
| 3 | H4: No risk diff between genders | Chi-squared | 0.003705 | 0.9514645 | False |


---
## Task 3 – Conclusions and Business Recommendations

Based on the hypothesis tests above, at a significance level of 0.05:

1. **Province risk differences**: H1 is rejected (p = 5.93 × 10⁻¹⁹). Claim frequency differs strongly across provinces, so regional pricing by province is justified. Provinces with higher claim frequencies should have higher base rates.

2. **ZipCode risk differences**: H2 is rejected (p = 3.15 × 10⁻³⁰). Claim frequency differs strongly across postal codes, which supports fine-grained geographic rating factors. High-risk zip codes can be flagged for underwriting review.

3. **ZipCode margin differences**: H3 is not rejected (p ≈ 0.998). Risk varies by area, but average margin is similar across zip codes, which is consistent with current pricing roughly compensating for that risk. This test gives no basis for targeting marketing or adjusting pricing by zip-code margin.

4. **Gender risk differences**: H4 is not rejected (p ≈ 0.951). We do not see meaningful risk differences between men and women, so gender is not a useful or necessary rating factor in this portfolio. Regulatory and ethical constraints point the same way.

These findings provide a **data-driven foundation** for segmentation strategy and pricing decisions in subsequent modeling (Task 4).