# A/B Hypothesis Testing

This notebook runs statistical hypothesis tests to decide, for each null hypothesis about insurance risk drivers listed below, whether the data give enough evidence to reject it.

## Hypotheses to Test

1. **H₀**: There are no risk differences across provinces
2. **H₀**: There are no risk differences between zip codes
3. **H₀**: There is no significant margin (profit) difference between zip codes
4. **H₀**: There is no significant risk difference between Women and Men

## Metrics

- **Claim Frequency**: Proportion of policies with at least one claim
- **Claim Severity**: Average amount of a claim, given a claim occurred
- **Margin**: `TotalPremium - TotalClaims`
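
The three metrics above can be sketched in pandas as follows. This is a minimal illustration, not the code in `src.ab_testing.hypothesis_tests`: it assumes one row per policy, treats `TotalClaims > 0` as "at least one claim", and uses a hypothetical helper name `claim_metrics` and made-up toy data.

```python
import pandas as pd


def claim_metrics(df: pd.DataFrame) -> dict:
    """Hypothetical helper: compute the three metrics on a frame that has
    TotalPremium and TotalClaims columns (assumed: one row per policy)."""
    has_claim = df["TotalClaims"] > 0  # assumption: a positive amount means a claim occurred
    return {
        # Share of rows with at least one claim
        "claim_frequency": has_claim.mean(),
        # Average claim amount, restricted to rows where a claim occurred
        "claim_severity": df.loc[has_claim, "TotalClaims"].mean(),
        # Average per-row margin (premium minus claims)
        "mean_margin": (df["TotalPremium"] - df["TotalClaims"]).mean(),
    }


# Toy data for illustration only
toy = pd.DataFrame({
    "TotalPremium": [100.0, 120.0, 90.0, 110.0],
    "TotalClaims": [0.0, 300.0, 0.0, 100.0],
})
metrics = claim_metrics(toy)
print(metrics)
```

Note that severity is conditional on a claim having occurred, so rows without claims are excluded from its average but still count toward frequency and margin.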


In [1]:
# Import libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from pathlib import Path
import sys

# Add the project root (parent of the current working directory) to sys.path so `src` is importable
sys.path.append(str(Path().resolve().parent))

from src.data.load_data import load_insurance_data
from src.ab_testing.hypothesis_tests import (
    calculate_claim_frequency,
    calculate_claim_severity,
    test_province_risk_differences,
    test_zipcode_risk_differences,
    test_zipcode_margin_differences,
    test_gender_risk_differences,
    format_test_results
)

# Set style
sns.set_style("whitegrid")
plt.rcParams['figure.figsize'] = (12, 6)
%matplotlib inline


In [2]:
# Load data
df = load_insurance_data()
print(f"Dataset loaded: {len(df):,} rows, {len(df.columns)} columns")


Loading data from: /Users/danielmituku/Documents/10Academy/week3/End-to-End-Insurance-Risk-Analytics-Predictive-Modeling/data/raw/MachineLearningRating_v3.txt
Loaded 1000098 rows and 52 columns
Dataset loaded: 1,000,098 rows, 52 columns


## Hypothesis 1: Risk Differences Across Provinces


In [3]:
# Test province risk differences
province_results = test_province_risk_differences(df, alpha=0.05)
print(format_test_results(province_results, "H₀: No risk differences across provinces"))



HYPOTHESIS: H₀: No risk differences across provinces

FREQUENCY TEST:
  test: chi2_contingency
  statistic: 104.19088107029361
  p_value: 5.925510718204677e-19
  degrees_of_freedom: 8
  reject_null: True
  interpretation: Reject H₀

SEVERITY TEST:
  test: ANOVA
  statistic: 4.830165899976358
  p_value: 6.304916760425209e-06
  reject_null: True
  interpretation: Reject H₀

OVERALL RESULT: Risk differences exist across provinces
Reject H₀: True
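
The frequency test above reports `chi2_contingency` with 8 degrees of freedom, which is consistent with a 9-by-2 table of province against claim/no-claim, since the degrees of freedom are (rows - 1) × (columns - 1). A minimal sketch of that kind of test, assuming a `Province` column name and toy data (neither is confirmed by the output above), might look like this:

```python
import pandas as pd
from scipy.stats import chi2_contingency

# Toy data for illustration only; "Province" is an assumed column name
toy = pd.DataFrame({
    "Province": ["A"] * 4 + ["B"] * 4 + ["C"] * 4,
    "TotalClaims": [0, 0, 0, 50, 0, 20, 30, 0, 10, 20, 30, 0],
})
toy["HasClaim"] = toy["TotalClaims"] > 0

# Rows: provinces, columns: no claim / claim
table = pd.crosstab(toy["Province"], toy["HasClaim"])

# Chi-squared test of independence between province and claim occurrence
chi2, p_value, dof, expected = chi2_contingency(table.to_numpy())
reject_null = bool(p_value < 0.05)
print(f"chi2={chi2:.3f}, p={p_value:.3f}, dof={dof}, reject H0: {reject_null}")
```

With three toy provinces and two outcome columns, `dof` is (3 - 1) × (2 - 1) = 2.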



## Hypothesis 2: Risk Differences Between Zip Codes


In [4]:
# Test zipcode risk differences
zipcode_results = test_zipcode_risk_differences(df, alpha=0.05, top_n=10)
print(format_test_results(zipcode_results, "H₀: No risk differences between zip codes"))



HYPOTHESIS: H₀: No risk differences between zip codes

FREQUENCY TEST:
  test: chi2_contingency
  statistic: 72.64941061601782
  p_value: 4.5932778849314244e-12
  degrees_of_freedom: 9
  reject_null: True
  interpretation: Reject H₀

SEVERITY TEST:
  test: ANOVA
  statistic: 5.235823024055851
  p_value: 5.474878546964162e-07
  reject_null: True
  interpretation: Reject H₀

OVERALL RESULT: Risk differences exist between zip codes
Reject H₀: True
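
The call above passes `top_n=10`, and the 9 degrees of freedom in the frequency test match a 10-by-2 table. How `test_zipcode_risk_differences` picks those zip codes is not shown here. One plausible reading, sketched below with a hypothetical helper `top_n_groups` and an assumed `PostalCode` column, is to keep the zip codes with the most rows:

```python
import pandas as pd


def top_n_groups(df: pd.DataFrame, column: str, n: int) -> pd.DataFrame:
    """Hypothetical helper: keep only rows whose `column` value is among
    the n most frequent values of that column."""
    top_values = df[column].value_counts().nlargest(n).index
    return df[df[column].isin(top_values)]


# Toy data for illustration only; "PostalCode" is an assumed column name
toy = pd.DataFrame({
    "PostalCode": [1000] * 5 + [2000] * 3 + [3000] * 2 + [4000],
})
subset = top_n_groups(toy, "PostalCode", n=2)
print(sorted(subset["PostalCode"].unique()))
```

Restricting to well-populated groups keeps expected cell counts large enough for the chi-squared approximation, at the cost of saying nothing about the remaining zip codes.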



## Hypothesis 3: Margin Differences Between Zip Codes


In [5]:
# Test zipcode margin differences
margin_results = test_zipcode_margin_differences(df, alpha=0.05, top_n=10)
print(format_test_results(margin_results, "H₀: No significant margin difference between zip codes"))



HYPOTHESIS: H₀: No significant margin difference between zip codes

TEST:
  test: ANOVA
  statistic: 1.050575214633885
  p_value: 0.3963641045635653
  reject_null: False
  interpretation: Fail to reject H₀
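
The output above names ANOVA as the margin test. A minimal sketch of a one-way ANOVA on per-row margin with `scipy.stats.f_oneway` follows. The `PostalCode` column name and the toy values are assumptions for illustration, and the project's own function may prepare the groups differently.

```python
import pandas as pd
from scipy.stats import f_oneway

# Toy data for illustration only; "PostalCode" is an assumed column name
toy = pd.DataFrame({
    "PostalCode": [1000] * 4 + [2000] * 4 + [3000] * 4,
    "TotalPremium": [100.0] * 12,
    "TotalClaims": [0.0, 90.0, 0.0, 10.0,
                    10.0, 0.0, 90.0, 0.0,
                    0.0, 0.0, 10.0, 90.0],
})
toy["Margin"] = toy["TotalPremium"] - toy["TotalClaims"]

# One sample of margins per zip code
samples = [group["Margin"].to_numpy() for _, group in toy.groupby("PostalCode")]

# One-way ANOVA: H0 is that all group means are equal
f_stat, p_value = f_oneway(*samples)
reject_null = bool(p_value < 0.05)
print(f"F={f_stat:.3f}, p={p_value:.3f}, reject H0: {reject_null}")
```

In this toy frame every zip code has the same set of margins, so the group means coincide and the test does not reject H₀.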




## Hypothesis 4: Risk Differences Between Women and Men


In [6]:
# Test gender risk differences
gender_results = test_gender_risk_differences(df, alpha=0.05)
print(format_test_results(gender_results, "H₀: No significant risk difference between Women and Men"))



HYPOTHESIS: H₀: No significant risk difference between Women and Men

FREQUENCY TEST:
  test: chi2_contingency
  statistic: 0.003704891861036439
  p_value: 0.9514644755420456
  degrees_of_freedom: 1
  reject_null: False
  interpretation: Fail to reject H₀

SEVERITY TEST:
  test: t-test
  statistic: -0.4190662866061044
  p_value: 0.6760156776445874
  reject_null: False
  interpretation: Fail to reject H₀
  male_mean: 14858.552293766337
  female_mean: 17874.72130325815

OVERALL RESULT: No significant risk differences between genders
Reject H₀: False
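
The severity test above is a two-sample t-test. The female mean is about 3,000 higher than the male mean, yet the p-value is 0.68, which suggests that claim amounts vary widely relative to that gap. A minimal sketch with `scipy.stats.ttest_ind` on synthetic skewed data is shown below. Using Welch's variant (`equal_var=False`) is an assumption here, since the output does not say which variant the project uses.

```python
import numpy as np
from scipy.stats import ttest_ind

# Synthetic, heavy-tailed claim amounts for illustration only
rng = np.random.default_rng(0)
male_claims = rng.lognormal(mean=9.0, sigma=1.0, size=200)
female_claims = rng.lognormal(mean=9.0, sigma=1.0, size=200)

# Welch's t-test (assumed variant): does not require equal variances
t_stat, p_value = ttest_ind(male_claims, female_claims, equal_var=False)
reject_null = bool(p_value < 0.05)

print(f"male mean={male_claims.mean():.1f}, female mean={female_claims.mean():.1f}")
print(f"t={t_stat:.3f}, p={p_value:.3f}, reject H0: {reject_null}")
```

Because claim amounts are typically right-skewed, a rank-based test such as `scipy.stats.mannwhitneyu` could serve as a robustness check alongside the t-test.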



## Summary and Business Recommendations


In [7]:
# Create summary of all tests
summary = {
    'Province Risk Differences': province_results.get('overall_interpretation', {}).get('reject_null', False),
    'Zipcode Risk Differences': zipcode_results.get('overall_interpretation', {}).get('reject_null', False),
    'Zipcode Margin Differences': margin_results.get('test', {}).get('reject_null', False),
    'Gender Risk Differences': gender_results.get('overall_interpretation', {}).get('reject_null', False)
}

print("\n" + "="*80)
print("SUMMARY OF HYPOTHESIS TESTS")
print("="*80)
for hypothesis, rejected in summary.items():
    status = "REJECT H₀" if rejected else "FAIL TO REJECT H₀"
    print(f"{hypothesis}: {status}")



SUMMARY OF HYPOTHESIS TESTS
Province Risk Differences: REJECT H₀
Zipcode Risk Differences: REJECT H₀
Zipcode Margin Differences: FAIL TO REJECT H₀
Gender Risk Differences: FAIL TO REJECT H₀
