# A/B Testing Simulation for Insurance Analytics

This notebook tests whether insurance risk (mean `TotalClaims`) and margin (`TotalPremium` minus `TotalClaims`) differ across provinces, zip codes, and genders. Group differences are tested with one-way ANOVA and a two-sample t-test at the 5% significance level.


In [1]:
import pandas as pd
import numpy as np
from scipy import stats
import warnings
warnings.filterwarnings('ignore')

In [2]:
# notebooks/example_notebook.ipynb

import sys
import os
sys.path.append(os.path.abspath('../Scripts'))

In [3]:
from utils import save_data_to_csv, gender_analysis

In [4]:
df = pd.read_csv('../data/insurance_text_data.csv')

### Risk differences across provinces

In [5]:
# Group by 'Province' and compute the mean 'TotalClaims'
provinces = df['Province'].unique()
risk_by_province = df.groupby('Province')['TotalClaims'].mean()

# Perform one-way ANOVA
f_statistic_province, p_value_province = stats.f_oneway(*[df[df['Province'] == province]['TotalClaims'] for province in provinces])

# Print results
print(f"Risk differences between provinces:\nF-statistic: {f_statistic_province},\np-value: {p_value_province}")
if p_value_province < 0.05:
    print("Reject the null hypothesis: There are significant risk differences across provinces.")
else:
    print("Fail to reject the null hypothesis: No significant risk differences across provinces were detected.")

Risk differences between provinces:
F-statistic: 5.849413762407606,
p-value: 1.6782057588675903e-07
Reject the null hypothesis: There are significant risk differences across provinces.
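The per-province filtering above can also be written with `groupby`. The sketch below is a hypothetical helper (`anova_by_group` is not part of this notebook or of `utils`) run on synthetic data rather than the real dataset. It drops missing values and groups with fewer than two observations before calling `stats.f_oneway`, and it also reports eta squared as a rough effect size, since a small p-value on a large dataset says little about how large the differences are.

```python
import numpy as np
import pandas as pd
from scipy import stats


def anova_by_group(data, group_col, value_col, min_size=2):
    """One-way ANOVA of value_col across the levels of group_col.

    Hypothetical helper: drops NaNs and groups smaller than min_size.
    Returns (F statistic, p-value, eta squared).
    """
    clean = data[[group_col, value_col]].dropna()
    groups = [
        g.to_numpy()
        for _, g in clean.groupby(group_col)[value_col]
        if len(g) >= min_size
    ]
    if len(groups) < 2:
        raise ValueError("Need at least two groups with enough observations.")

    f_stat, p_value = stats.f_oneway(*groups)

    # Eta squared = between-group sum of squares / total sum of squares
    all_values = np.concatenate(groups)
    grand_mean = all_values.mean()
    ss_between = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)
    ss_total = ((all_values - grand_mean) ** 2).sum()
    return f_stat, p_value, ss_between / ss_total


# Synthetic stand-in for the insurance data (all values are made up)
rng = np.random.default_rng(0)
demo = pd.DataFrame({
    "Province": np.repeat(["A", "B", "C"], 200),
    "TotalClaims": np.concatenate([
        rng.gamma(2.0, 50.0, 200),
        rng.gamma(2.0, 50.0, 200),
        rng.gamma(2.0, 80.0, 200),  # province C has a higher mean
    ]),
})
demo.loc[demo.index[::50], "TotalClaims"] = np.nan  # simulate missing values

f_stat, p_value, eta_sq = anova_by_group(demo, "Province", "TotalClaims")
print(f"F = {f_stat:.3f}, p = {p_value:.3g}, eta^2 = {eta_sq:.3f}")
```

On the real data, the call would presumably be `anova_by_group(df, 'Province', 'TotalClaims')`.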


### Risk differences between zip codes

In [6]:
# Group by 'PostalCode' and compute the mean 'TotalClaims'
zip_codes = df['PostalCode'].unique()
risk_by_zipcode = df.groupby('PostalCode')['TotalClaims'].mean()

f_statistic_zip, p_value_zip = stats.f_oneway(*[df[df['PostalCode'] == zipcode]['TotalClaims'] for zipcode in zip_codes])

print("Risk differences between zip codes:")
print(f"F-statistic: {f_statistic_zip},\np-value: {p_value_zip}")
if p_value_zip < 0.05:
    print("Reject the null hypothesis: There are significant risk differences between zip codes.")
else:
    print("Fail to reject the null hypothesis: No significant risk differences between zip codes were detected.")

Risk differences between zip codes:
F-statistic: 0.9419762214391849,
p-value: 0.8906511279164051
Fail to reject the null hypothesis: No significant risk differences between zip codes were detected.
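One-way ANOVA assumes roughly normal residuals and similar variances across groups. Claim amounts often violate both, because most policies have no claim and a few have very large ones. As a robustness check, a rank-based Kruskal-Wallis test can be run on the same groups. The sketch below uses synthetic data with made-up parameters, not the notebook's dataset, and only illustrates how the two tests are called side by side.

```python
import numpy as np
import pandas as pd
from scipy import stats

# Synthetic stand-in: 10 hypothetical postal codes, mostly zero claims
rng = np.random.default_rng(42)
n = 3000
demo = pd.DataFrame({
    "PostalCode": rng.integers(1000, 1010, size=n),
    # About 5% of policies have a claim; amounts are lognormal (made up)
    "TotalClaims": np.where(rng.random(n) < 0.05,
                            rng.lognormal(8.0, 1.0, n),
                            0.0),
})

# One array of claim amounts per postal code
groups = [g.to_numpy() for _, g in demo.groupby("PostalCode")["TotalClaims"]]

f_stat, p_anova = stats.f_oneway(*groups)    # parametric, compares means
h_stat, p_kruskal = stats.kruskal(*groups)   # rank-based, tie-corrected

print(f"ANOVA:          F = {f_stat:.3f}, p = {p_anova:.3f}")
print(f"Kruskal-Wallis: H = {h_stat:.3f}, p = {p_kruskal:.3f}")
```

If the two tests disagree on the real data, that is a hint that a few extreme claims are driving the ANOVA result.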


### Margin (profit) difference between zip codes

In [7]:
# Compute margin as absolute profit per policy: premium minus claims
df['ProfitMargin'] = df['TotalPremium'] - df['TotalClaims']

# Perform one-way ANOVA for margin differences between zip codes
f_statistic_margin, p_value_margin = stats.f_oneway(*[df[df['PostalCode'] == zipcode]['ProfitMargin'] for zipcode in zip_codes])

# Print results
print(f"Margin differences between zip codes:\nF-statistic: {f_statistic_margin},\np-value: {p_value_margin}")
if p_value_margin < 0.05:
    print("Reject the null hypothesis: There are significant margin differences between zip codes.")
else:
    print("Fail to reject the null hypothesis: No significant margin differences between zip codes were detected.")

Margin differences between zip codes:
F-statistic: 0.8707474893589263,
p-value: 0.9976859758015036
Fail to reject the null hypothesis: No significant margin differences between zip codes were detected.
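`ProfitMargin` as defined above is an absolute amount (premium minus claims), so zip codes with larger premiums naturally show larger values. A relative view, added here as an assumption and not part of the original analysis, is the loss ratio per zip code: total claims divided by total premium. The sketch below uses a tiny made-up table to show the aggregation.

```python
import pandas as pd

# Tiny made-up example; column names mirror the notebook's dataset
demo = pd.DataFrame({
    "PostalCode":   [1000, 1000, 2000, 2000, 2000],
    "TotalPremium": [100.0, 200.0, 150.0, 50.0, 100.0],
    "TotalClaims":  [0.0, 150.0, 0.0, 0.0, 60.0],
})

# Aggregate premiums and claims per postal code
by_zip = demo.groupby("PostalCode")[["TotalPremium", "TotalClaims"]].sum()

# Absolute profit and relative loss ratio (claims / premium)
by_zip["Profit"] = by_zip["TotalPremium"] - by_zip["TotalClaims"]
by_zip["LossRatio"] = by_zip["TotalClaims"] / by_zip["TotalPremium"]

print(by_zip)
```

In this toy table both postal codes collect the same premium (300), but their loss ratios are 0.5 and 0.2, which the absolute profit alone would not put on a common scale for zip codes of different size.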


### Risk difference between Women and Men

In [8]:
# Perform a two-sample t-test for claims between genders
male_claims = df[df['Gender'] == 'Male']['TotalClaims']
female_claims = df[df['Gender'] == 'Female']['TotalClaims']

# Conduct t-test
t_statistic_gender, p_value_gender = stats.ttest_ind(male_claims, female_claims, nan_policy='omit')

# Print results
print(f"Risk differences between Men and Women:\nT-statistic: {t_statistic_gender},\np-value: {p_value_gender}")
if p_value_gender < 0.05:
    print("Reject the null hypothesis: There are significant risk differences between Men and Women.")
else:
    print("Fail to reject the null hypothesis: No significant risk differences between Men and Women were detected.")

Risk differences between Men and Women:
T-statistic: -0.24803623812388725,
p-value: 0.8041073961270343
Fail to reject the null hypothesis: No significant risk differences between Men and Women were detected.
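By default, `stats.ttest_ind` assumes equal variances in both groups (`equal_var=True`). If the gender groups differ a lot in size or spread, Welch's variant (`equal_var=False`) is a common alternative. The sketch below compares the two on synthetic samples with made-up sizes and also computes Cohen's d as an effect size. None of the numbers come from the notebook's dataset.

```python
import numpy as np
from scipy import stats

# Synthetic, unbalanced samples (sizes and distribution are made up)
rng = np.random.default_rng(1)
male = rng.gamma(2.0, 60.0, 500)
female = rng.gamma(2.0, 60.0, 80)

# Student's t-test (equal variances) vs. Welch's t-test (unequal variances)
t_student, p_student = stats.ttest_ind(male, female)
t_welch, p_welch = stats.ttest_ind(male, female, equal_var=False)

# Cohen's d: mean difference divided by the pooled standard deviation
n_m, n_f = len(male), len(female)
pooled_var = ((n_m - 1) * male.var(ddof=1)
              + (n_f - 1) * female.var(ddof=1)) / (n_m + n_f - 2)
cohens_d = (male.mean() - female.mean()) / np.sqrt(pooled_var)

print(f"Student: t = {t_student:.3f}, p = {p_student:.3f}")
print(f"Welch:   t = {t_welch:.3f}, p = {p_welch:.3f}")
print(f"Cohen's d = {cohens_d:.3f}")
```

On the real data, `male_claims` and `female_claims` would take the place of the synthetic arrays, with `nan_policy='omit'` kept as in the cell above.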
