# A/B Hypothesis Testing

For each of the following null hypotheses, decide whether to reject it or fail to reject it:
1. There are no risk differences across provinces
2. There are no risk differences between zip codes
3. There is no significant margin (profit) difference between zip codes
4. There is no significant risk difference between women and men


In [1]:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np

import os
import sys
sys.path.append(os.path.abspath(os.path.join('..','src')))
from eda import EDA

from scipy.stats import chi2_contingency, ttest_ind, fisher_exact

import warnings
warnings.filterwarnings('ignore')

In [2]:
# get the CSV file
df_insurance = pd.read_csv("df_insurance.csv")
df_insurance.head()

# instantiate the class
eda = EDA(df_insurance)

# change the datatype to appropriate type
eda.change_dtype()


<class 'pandas.core.frame.DataFrame'>
RangeIndex: 990010 entries, 0 to 990009
Data columns (total 16 columns):
 #   Column                    Non-Null Count   Dtype         
---  ------                    --------------   -----         
 0   Gender                    990010 non-null  category      
 1   Province                  990010 non-null  category      
 2   PostalCode                990010 non-null  category      
 3   TransactionMonth          990010 non-null  datetime64[ns]
 4   VehicleType               990010 non-null  category      
 5   RegistrationYear          990010 non-null  category      
 6   SumInsured                990010 non-null  float64       
 7   TermFrequency             990010 non-null  category      
 8   TotalPremium              990010 non-null  float64       
 9   Product                   990010 non-null  category      
 10  CoverType                 990010 non-null  category      
 11  TotalClaims               990010 non-null  float64       
 12  St

> 1. There are no risk differences across provinces 

In [3]:
'''
Features: Province
Risk KPI Features: TotalClaims or SumInsured

Risk Score = TotalClaims / Number of PolicyID

'''

# Split the data into group A and group B using the risk metric
group_a, group_b = eda.group_AB_risk('Province')

print("Group A (Low Risk Provinces):")
print(group_a.head(3))

print("Group B (High Risk Provinces):")
print(group_b.head(3))


Group A (Low Risk Provinces):
               Gender    Province PostalCode TransactionMonth  \
529610  Not specified  Mpumalanga       1064       2015-06-01   
941654           Male  North West       2530       2014-03-01   
674033  Not specified  North West        407       2015-04-01   

              VehicleType RegistrationYear  SumInsured TermFrequency  \
529610  Passenger Vehicle             2010     7500.00       Monthly   
941654  Passenger Vehicle             2013     7000.00       Monthly   
674033  Passenger Vehicle             2012        0.01       Monthly   

        TotalPremium                             Product            CoverType  \
529610     78.947368  Mobility Commercial Cover: Monthly  Basic Excess Waiver   
941654      0.000000  Mobility Commercial Cover: Monthly     Income Protector   
674033     21.929825  Mobility Commercial Cover: Monthly           Windscreen   

        TotalClaims StatutoryRiskType PolicyID                   Section  \
529610          0.0
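
The `group_AB_risk` method lives in `src/eda.py` and is not shown in this notebook. The following is only a minimal sketch of how such a split could work, assuming the risk score is total claims divided by the number of distinct policies per category, and that categories are split at the median score. The function name `split_by_risk`, the median threshold, and the toy data are all assumptions made for illustration.

```python
import pandas as pd

def split_by_risk(df, feature):
    """Hypothetical stand-in for EDA.group_AB_risk: split rows at the median risk score."""
    grouped = df.groupby(feature, observed=True)
    # Risk score per category: total claims divided by the number of distinct policies
    risk = grouped["TotalClaims"].sum() / grouped["PolicyID"].nunique()
    threshold = risk.median()  # assumed cut-off; the real method may use another rule
    low = risk[risk <= threshold].index
    high = risk[risk > threshold].index
    group_a = df[df[feature].isin(low)]    # rows from low-risk categories
    group_b = df[df[feature].isin(high)]   # rows from high-risk categories
    return group_a, group_b

# Toy data, invented for this example only
toy = pd.DataFrame({
    "Province": ["Gauteng", "Gauteng", "Limpopo", "Limpopo", "Free State", "Free State"],
    "PolicyID": [1, 2, 3, 3, 4, 5],
    "TotalClaims": [0.0, 500.0, 0.0, 100.0, 0.0, 0.0],
})
group_a, group_b = split_by_risk(toy, "Province")
print(sorted(group_a["Province"].unique()), sorted(group_b["Province"].unique()))
```

Both groups keep the original rows, so the same categorical and numerical tests can be run on them afterwards.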

In [4]:
# for categorical
columns = ['Gender','PostalCode','VehicleType','RegistrationYear','TermFrequency','Product','CoverType','StatutoryRiskType','Section']

for col in columns:
    p_value, effect_size = eda.chi2_test('Province', col, group_a, group_b)
    
    print(f"{col}: P Values: {p_value}")
    print(f"{col}: Cramér's V(Effective Size): {effect_size}\n")


Gender: P Values: 0.0
Gender: Cramér's V(Effective Size): 0.05612680799425666

PostalCode: P Values: 0.0
PostalCode: Cramér's V(Effective Size): 0.9991520325190562

VehicleType: P Values: 0.0
VehicleType: Cramér's V(Effective Size): 0.09180161597024714

RegistrationYear: P Values: 0.0
RegistrationYear: Cramér's V(Effective Size): 0.19816071267607627

TermFrequency: P Values: 0.15521777830758787
TermFrequency: Cramér's V(Effective Size): 0.0020721804182716234

Product: P Values: 0.0
Product: Cramér's V(Effective Size): 0.1704846409862012

CoverType: P Values: 0.0
CoverType: Cramér's V(Effective Size): 0.08256464850288567

Skipping StatutoryRiskType due to insufficient data.
StatutoryRiskType: P Values: None
StatutoryRiskType: Cramér's V(Effective Size): None

Section: P Values: 9.613888848479854e-131
Section: Cramér's V(Effective Size): 0.03601347709687241



- The p-value is below 0.05 for Gender, VehicleType, CoverType and Section, so the null hypothesis is rejected for these features. Their Cramér's V values are all below 0.1, however, which indicates a negligible association and a relatively small impact on the risk score across provinces.

- The p-value is also below 0.05 for RegistrationYear and Product, with Cramér's V values of about 0.20 and 0.17 that indicate a weak association, so their impact on the risk score across provinces is small.

- PostalCode has a p-value below 0.05 and a Cramér's V of about 0.999. This near-perfect association most likely reflects the fact that each postal code belongs to a single province, so it says little about risk itself.

- The p-value of TermFrequency is greater than 0.05, so we fail to reject the null hypothesis for it.

- StatutoryRiskType is skipped because it only has one value.
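
The implementation of `eda.chi2_test` is not shown here. As a rough sketch of the kind of calculation behind the numbers above, the following computes a chi-squared p-value and Cramér's V from a contingency table with `scipy.stats.chi2_contingency`. The helper name `cramers_v` and the toy series are assumptions; the real method may build its table differently.

```python
import numpy as np
import pandas as pd
from scipy.stats import chi2_contingency

def cramers_v(x, y):
    """Return (p_value, Cramér's V) for two categorical series (illustrative helper)."""
    table = pd.crosstab(x, y)
    # correction=False disables Yates' continuity correction for 2x2 tables
    chi2, p_value, dof, expected = chi2_contingency(table, correction=False)
    n = table.to_numpy().sum()
    k = min(table.shape) - 1  # smaller table dimension minus one
    return p_value, float(np.sqrt(chi2 / (n * k)))

# Toy data, invented for this example only
province = pd.Series(["A", "A", "A", "A", "B", "B", "B", "B"], name="Province")
cover = pd.Series(["x", "x", "x", "y", "x", "y", "y", "y"], name="CoverType")
p_value, v = cramers_v(province, cover)
print(f"p-value: {p_value:.4f}, Cramér's V: {v:.2f}")
```

Cramér's V ranges from 0 (no association) to 1 (perfect association), which is why it is reported next to the p-value.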

In [5]:
# for numerical
columns = ['SumInsured','TotalPremium','TotalClaims','CalculatedPremiumPerTerm']
for col in columns:
    p_value, effect_size = eda.t_test_numerical('Province',col,group_a,group_b)

    print(f"{col}: P Values: {p_value}")
    print(f"{col}: Cohen's d(Effective Size): {effect_size}\n")

SumInsured: P Values: 0.16502999832588486
SumInsured: Cohen's d(Effective Size): -0.003278360325563921

TotalPremium: P Values: 1.3364977193935756e-96
TotalPremium: Cohen's d(Effective Size): 0.049254025481234726

TotalClaims: P Values: 1.1661539671082533e-08
TotalClaims: Cohen's d(Effective Size): -0.01347056779857492

CalculatedPremiumPerTerm: P Values: 0.0
CalculatedPremiumPerTerm: Cohen's d(Effective Size): 0.11354677664749972



- TotalPremium, TotalClaims and CalculatedPremiumPerTerm have p-values below 0.05, so the null hypothesis is rejected for them. The effect sizes are small in every case, though: the absolute value of Cohen's d is below 0.05 for TotalPremium and TotalClaims, and only CalculatedPremiumPerTerm (d ≈ 0.11) shows a somewhat more pronounced difference.

- SumInsured has a p-value greater than 0.05, so we fail to reject the null hypothesis for it.
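
For reference, here is a minimal sketch of how a p-value and Cohen's d can be obtained for two numerical samples. It assumes Welch's t-test and the pooled-standard-deviation form of Cohen's d; `eda.t_test_numerical` is not shown in this notebook and may differ in both respects. The function name and the toy numbers are hypothetical.

```python
import numpy as np
from scipy.stats import ttest_ind

def t_test_with_cohens_d(a, b):
    """Return (p_value, Cohen's d) for two samples (illustrative helper)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    # Welch's t-test: does not assume equal variances in the two groups
    t_stat, p_value = ttest_ind(a, b, equal_var=False)
    # Cohen's d: mean difference divided by the pooled standard deviation
    pooled_var = ((len(a) - 1) * a.var(ddof=1) + (len(b) - 1) * b.var(ddof=1)) / (len(a) + len(b) - 2)
    d = (a.mean() - b.mean()) / np.sqrt(pooled_var)
    return p_value, d

# Toy samples, invented for this example only
p_value, d = t_test_with_cohens_d([2.0, 4.0, 6.0], [1.0, 3.0, 5.0])
print(f"p-value: {p_value:.3f}, Cohen's d: {d:.2f}")
```

With this sign convention, a negative d means the first group has the lower mean.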

Final Conclusion:

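After running chi-squared tests and t-tests on the features above, the majority show significant p-values (p < 0.05), so we conclude that there are statistically significant risk differences across provinces. Most of the corresponding effect sizes are small, so the practical differences are limited.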
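
The reading of each result combines two numbers: the p-value decides whether the null hypothesis is rejected, and the effect size says how strong the association is. A small helper can make that rule explicit. The thresholds below (0.05 for significance, 0.1 and 0.3 for Cramér's V) are assumptions chosen to match the wording used in this notebook, and the function name is hypothetical.

```python
def interpret(p_value, effect_size, alpha=0.05, negligible=0.1, weak=0.3):
    """Turn a (p-value, Cramér's V) pair into a short verdict (illustrative only)."""
    if p_value is None or effect_size is None:
        return "skipped"
    if p_value >= alpha:
        return "fail to reject"
    size = abs(effect_size)
    if size < negligible:
        strength = "negligible"
    elif size < weak:
        strength = "weak"
    else:
        strength = "moderate or strong"
    return f"reject ({strength} effect)"

# Values copied from the province results above
print("TermFrequency:", interpret(0.15521777830758787, 0.0020721804182716234))
print("RegistrationYear:", interpret(0.0, 0.19816071267607627))
print("StatutoryRiskType:", interpret(None, None))
```

Cohen's d is usually read against different cut-offs, so the thresholds would need adjusting before applying the same helper to the t-test results.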

> 2. There are no risk differences between zip codes

In [6]:
'''
Features: PostalCode
Risk KPI Features: TotalClaims or SumInsured

Risk Score = TotalClaims / Number of PolicyID

'''

# Split the data into group A and group B using the risk metric
group_a, group_b = eda.group_AB_risk('PostalCode')

print("Group A (Low Risk PostalCode):")
print(group_a.head(3))

print("Group B (High Risk PostalCode):")
print(group_b.head(3))


Group A (Low Risk PostalCode):
               Gender       Province PostalCode TransactionMonth  \
405513  Not specified   Western Cape       7612       2015-04-01   
418966  Not specified  KwaZulu-Natal       3276       2015-04-01   
730738  Not specified  KwaZulu-Natal       3370       2015-06-01   

              VehicleType RegistrationYear  SumInsured TermFrequency  \
405513  Passenger Vehicle             2009     7500.00       Monthly   
418966  Passenger Vehicle             2015        0.01       Monthly   
730738  Passenger Vehicle             2014        0.01       Monthly   

        TotalPremium                             Product  \
405513      5.240263  Mobility Commercial Cover: Monthly   
418966     21.929825  Mobility Commercial Cover: Monthly   
730738     21.929825  Mobility Commercial Cover: Monthly   

                                      CoverType  TotalClaims  \
405513  Cleaning and Removal of Accident Debris          0.0   
418966                               W

In [7]:
# for categorical
columns = ['Gender','Province','VehicleType','RegistrationYear','TermFrequency','Product','CoverType','StatutoryRiskType','Section']

for col in columns:
    p_value, effect_size = eda.chi2_test('PostalCode', col, group_a, group_b)
    
    print(f"{col}: P Values: {p_value}")
    print(f"{col}: Cramér's V(Effective Size): {effect_size}\n")


Gender: P Values: 9.934195139949184e-35
Gender: Cramér's V(Effective Size): 0.0285266644326923

Province: P Values: 0.0
Province: Cramér's V(Effective Size): 0.2843520135564286

VehicleType: P Values: 3.4782141827202377e-37
VehicleType: Cramér's V(Effective Size): 0.030319109638687766

RegistrationYear: P Values: 0.0
RegistrationYear: Cramér's V(Effective Size): 0.207080491542473

TermFrequency: P Values: 2.3697500355570458e-24
TermFrequency: Cramér's V(Effective Size): 0.02321303557969367

Product: P Values: 7.289020891813446e-209
Product: Cramér's V(Effective Size): 0.07081387651523657

CoverType: P Values: 2.04788369872631e-80
CoverType: Cramér's V(Effective Size): 0.04791305150626715

Skipping StatutoryRiskType due to insufficient data.
StatutoryRiskType: P Values: None
StatutoryRiskType: Cramér's V(Effective Size): None

Section: P Values: 5.8882814368123365e-77
Section: Cramér's V(Effective Size): 0.0433409896425719



- The p-value is below 0.05 for Gender, VehicleType, CoverType, Section, TermFrequency and Product, so the null hypothesis is rejected for these features. Their Cramér's V values are all below 0.1, however, which indicates a negligible association and a relatively small impact on the risk score between postal codes.

- The p-value is also below 0.05 for RegistrationYear and Province, with Cramér's V values of about 0.21 and 0.28 that indicate a weak association, so their impact on the risk score between postal codes is small.

- StatutoryRiskType is skipped because it only has one value.

In [8]:
# for numerical
columns = ['SumInsured','TotalPremium','TotalClaims','CalculatedPremiumPerTerm']
for col in columns:
    p_value, effect_size = eda.t_test_numerical('PostalCode',col,group_a,group_b)

    print(f"{col}: P Values: {p_value}")
    print(f"{col}: Cohen's d(Effective Size): {effect_size}\n")

SumInsured: P Values: 0.863033132741762
SumInsured: Cohen's d(Effective Size): 0.000585343733358674

TotalPremium: P Values: 0.014769392682000147
TotalPremium: Cohen's d(Effective Size): -0.008272127991260108

TotalClaims: P Values: 7.782869676022052e-19
TotalClaims: Cohen's d(Effective Size): -0.030073171042036696

CalculatedPremiumPerTerm: P Values: 0.002655686711457238
CalculatedPremiumPerTerm: Cohen's d(Effective Size): -0.010196076193768343



- TotalPremium, TotalClaims and CalculatedPremiumPerTerm have p-values below 0.05, so the null hypothesis is rejected for them. The effect sizes are very small in all three cases (the absolute value of Cohen's d is at most about 0.03), so the practical difference between the groups is minimal.

- SumInsured has a p-value greater than 0.05, so we fail to reject the null hypothesis for it.
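
A very small effect can still produce a very small p-value when the samples are large, which is the pattern seen in the results above. The simulation below illustrates this with synthetic data. The sample size, the mean shift of 0.03 standard deviations, and the random seed are arbitrary assumptions, and none of the numbers come from the insurance dataset.

```python
import numpy as np
from scipy.stats import ttest_ind

rng = np.random.default_rng(0)
n = 500_000  # rows per group, roughly the scale of this dataset

# Two synthetic groups whose true means differ by only 0.03 standard deviations
a = rng.normal(loc=0.00, scale=1.0, size=n)
b = rng.normal(loc=0.03, scale=1.0, size=n)

t_stat, p_value = ttest_ind(a, b, equal_var=False)
# Cohen's d with equal group sizes: mean difference over the averaged standard deviation
d = (a.mean() - b.mean()) / np.sqrt((a.var(ddof=1) + b.var(ddof=1)) / 2)

print(f"p-value: {p_value:.3g}, Cohen's d: {d:.4f}")
```

This is why the effect size is reported alongside every p-value in this notebook.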

> 3. There is no significant margin (profit) difference between zip codes

In [9]:
'''
Profit margin = Total Premium - Total Claims

'''

# Split the data into group A and group B using the margin metric
group_a, group_b = eda.group_AB_margin('PostalCode')

print("Group A (Low Profit PostalCode):")
print(group_a.head(3))

print("Group B (High Profit PostalCode):")
print(group_b.head(3))


Group A (Low Profit PostalCode):
               Gender      Province PostalCode TransactionMonth  \
854833  Not specified       Gauteng       2037       2015-05-01   
225551  Not specified       Gauteng        122       2015-02-01   
413482  Not specified  Western Cape       7100       2015-08-01   

              VehicleType RegistrationYear  SumInsured TermFrequency  \
854833  Passenger Vehicle             2009    125000.0       Monthly   
225551  Passenger Vehicle             2009      3500.0       Monthly   
413482  Passenger Vehicle             2008   5000000.0       Monthly   

        TotalPremium                             Product  \
854833         0.000  Mobility Commercial Cover: Monthly   
225551         0.000  Mobility Commercial Cover: Monthly   
413482         1.645  Mobility Commercial Cover: Monthly   

                        CoverType  TotalClaims StatutoryRiskType PolicyID  \
854833                 Own Damage          0.0     IFRS Constant     3700   
225551  Signag
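
The margin definition used here (total premium minus total claims) can be sketched per postal code as follows. This is not the code behind `eda.group_AB_margin`, which is not shown. The helper name `margin_by_group`, the median split, and the toy values are assumptions made for illustration.

```python
import pandas as pd

def margin_by_group(df, feature):
    """Hypothetical helper: profit margin (premium minus claims) summed per category."""
    totals = df.groupby(feature, observed=True)[["TotalPremium", "TotalClaims"]].sum()
    return totals["TotalPremium"] - totals["TotalClaims"]

# Toy data, invented for this example only
toy = pd.DataFrame({
    "PostalCode": [2037, 2037, 122, 122, 7100],
    "TotalPremium": [100.0, 50.0, 20.0, 30.0, 10.0],
    "TotalClaims": [0.0, 200.0, 0.0, 0.0, 5.0],
})
margin = margin_by_group(toy, "PostalCode")

# Assumed rule: postal codes at or below the median margin form the low-profit group
low_profit = margin[margin <= margin.median()].index
high_profit = margin[margin > margin.median()].index
print(margin)
```

A negative margin means that claims exceeded premiums for that postal code.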

In [10]:
# for categorical
columns = ['Gender','Province','VehicleType','RegistrationYear','TermFrequency','Product','CoverType','StatutoryRiskType','Section']

for col in columns:
    p_value, effect_size = eda.chi2_test('PostalCode', col, group_a, group_b)
    
    print(f"{col}: P Values: {p_value}")
    print(f"{col}: Cramér's V(Effective Size): {effect_size}\n")


Gender: P Values: 8.610309064385105e-05
Gender: Cramér's V(Effective Size): 0.004348507482237027

Province: P Values: 0.0
Province: Cramér's V(Effective Size): 0.09358715217465577

VehicleType: P Values: 1.7120650786839722e-19
VehicleType: Cramér's V(Effective Size): 0.009753073622993508

RegistrationYear: P Values: 0.0
RegistrationYear: Cramér's V(Effective Size): 0.05893267753167059

TermFrequency: P Values: 0.0
TermFrequency: Cramér's V(Effective Size): 0.05153511742323045

Product: P Values: 0.0
Product: Cramér's V(Effective Size): 0.053791332413476944

CoverType: P Values: 0.0
CoverType: Cramér's V(Effective Size): 0.05561118391735652

Skipping StatutoryRiskType due to insufficient data.
StatutoryRiskType: P Values: None
StatutoryRiskType: Cramér's V(Effective Size): None

Section: P Values: 0.0
Section: Cramér's V(Effective Size): 0.05468437226202706



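- All of the tested categorical features reject the null hypothesis (p < 0.05), but every Cramér's V value is below 0.1, which indicates only a small impact on profit. StatutoryRiskType is skipped because it only has one value.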

In [11]:
# for numerical
columns = ['SumInsured','TotalPremium','TotalClaims','CalculatedPremiumPerTerm']
for col in columns:
    p_value, effect_size = eda.t_test_numerical('PostalCode',col,group_a,group_b)

    print(f"{col}: P Values: {p_value}")
    print(f"{col}: Cohen's d(Effective Size): {effect_size}\n")

SumInsured: P Values: 0.9096830627819994
SumInsured: Cohen's d(Effective Size): -0.00016147084710887592

TotalPremium: P Values: 0.07981789915792203
TotalPremium: Cohen's d(Effective Size): -0.0024934752671677343

TotalClaims: P Values: 0.9806280201733657
TotalClaims: Cohen's d(Effective Size): -3.456296310120932e-05

CalculatedPremiumPerTerm: P Values: 0.9895205637473221
CalculatedPremiumPerTerm: Cohen's d(Effective Size): -1.8695827481145636e-05



- All of the numerical features have p-values greater than 0.05, so we fail to reject the null hypothesis for them.

> 4. There is no significant risk difference between women and men

In [15]:
'''
Profit margin = Total Premium - Total Claims

'''

# Split the data into group A and group B using the margin metric
group_a, group_b = eda.group_AB_margin('Gender')

print("Group A (Low Profit Gender):")
group_a = group_a[group_a['Gender'].isin(['Male','Female'])]           # remove "Not specified"
print(group_a.head(3))

print("Group B (High Profit Gender):")
group_b = group_b[group_b['Gender'].isin(['Male','Female'])]           # remove "Not specified"
print(group_b.head(3))


Group A (Low Profit Gender):
       Gender      Province PostalCode TransactionMonth        VehicleType  \
921275   Male       Gauteng        183       2014-05-01  Passenger Vehicle   
886470   Male       Gauteng       1496       2014-09-01  Passenger Vehicle   
850277   Male  Western Cape       7784       2014-06-01  Passenger Vehicle   

       RegistrationYear  SumInsured TermFrequency  TotalPremium  \
921275             2011      7500.0       Monthly           0.0   
886470             2012   5000000.0       Monthly           0.0   
850277             2014      7500.0       Monthly           0.0   

                                   Product  \
921275  Mobility Commercial Cover: Monthly   
886470  Mobility Commercial Cover: Monthly   
850277  Mobility Commercial Cover: Monthly   

                                      CoverType  TotalClaims  \
921275                        Emergency Charges          0.0   
886470                      Passenger Liability          0.0   
850277  Clea

In [18]:
# for categorical
columns = ['Province','PostalCode','VehicleType','RegistrationYear','TermFrequency','Product','CoverType','StatutoryRiskType','Section']

for col in columns:
    p_value, effect_size = eda.chi2_test('PostalCode', col, group_a, group_b)
    
    print(f"{col}: P Values: {p_value}")
    print(f"{col}: Cramér's V(Effective Size): {effect_size}\n")


Province: P Values: 2.951569660071686e-220
Province: Cramér's V(Effective Size): 0.14446745275993253

PostalCode: P Values: 0.0
PostalCode: Cramér's V(Effective Size): 1.0

VehicleType: P Values: 0.03383511710834468
VehicleType: Cramér's V(Effective Size): 0.011688561188416007

RegistrationYear: P Values: 8.395499329509113e-254
RegistrationYear: Cramér's V(Effective Size): 0.1583426665481803

TermFrequency: P Values: 1.1928278363546642e-269
TermFrequency: Cramér's V(Effective Size): 0.15757675604240834

Product: P Values: 7.529958625180555e-301
Product: Cramér's V(Effective Size): 0.1669778218978567

CoverType: P Values: 4.056001029689304e-295
CoverType: Cramér's V(Effective Size): 0.1698878933127173

Skipping StatutoryRiskType due to insufficient data.
StatutoryRiskType: P Values: None
StatutoryRiskType: Cramér's V(Effective Size): None

Section: P Values: 3.3612185841037533e-305
Section: Cramér's V(Effective Size): 0.16859099447818746



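- Almost all of the categorical features reject the null hypothesis (p < 0.05), with Cramér's V values between roughly 0.14 and 0.17 that indicate a weak association and a small impact on profit. VehicleType is also significant (p ≈ 0.034), but its Cramér's V of about 0.01 is negligible.
- PostalCode has a p-value of 0.0 (below 0.05) and an effect size of exactly 1. This does not indicate an association between PostalCode and Gender, which would be highly unlikely. It most likely arises because eda.chi2_test is called with 'PostalCode' as its first argument in this cell, so PostalCode is compared with itself.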
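
An effect size of exactly 1 is what Cramér's V gives whenever a categorical variable is cross-tabulated with itself, because every row category maps to exactly one column category. The sketch below shows this on a few made-up postal codes. It assumes the test is built on a plain contingency table, which may not match the internals of `eda.chi2_test`.

```python
import numpy as np
import pandas as pd
from scipy.stats import chi2_contingency

# Made-up postal codes, for illustration only
codes = pd.Series(["183", "183", "1496", "1496", "1496", "7784", "7784"])

# Cross-tabulating a variable with itself gives a purely diagonal table
table = pd.crosstab(codes.rename("row"), codes.rename("col"))
chi2, p_value, dof, expected = chi2_contingency(table, correction=False)

n = table.to_numpy().sum()
v = np.sqrt(chi2 / (n * (min(table.shape) - 1)))
print(f"Cramér's V of PostalCode with itself: {v:.3f}")
```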

In [19]:
# for numerical
columns = ['SumInsured','TotalPremium','TotalClaims','CalculatedPremiumPerTerm']
for col in columns:
    p_value, effect_size = eda.t_test_numerical('PostalCode',col,group_a,group_b)

    print(f"{col}: P Values: {p_value}")
    print(f"{col}: Cohen's d(Effective Size): {effect_size}\n")

SumInsured: P Values: 0.8181206665981109
SumInsured: Cohen's d(Effective Size): 0.0004983295977082155

TotalPremium: P Values: 0.970611822586027
TotalPremium: Cohen's d(Effective Size): -7.983436736958942e-05

TotalClaims: P Values: 0.9455079920754902
TotalClaims: Cohen's d(Effective Size): 0.00014811186392176554

CalculatedPremiumPerTerm: P Values: 0.23579173343396534
CalculatedPremiumPerTerm: Cohen's d(Effective Size): 0.002569134467166921



- All of the numerical features have p-values greater than 0.05, so we fail to reject the null hypothesis for them.