# Exploratory Data Analysis for AlphaCare Insurance Solutions (ACIS)

This notebook contains the exploratory data analysis of car insurance data from AlphaCare Insurance Solutions (ACIS). The analysis aims to help optimize the marketing strategy and to identify "low-risk" customer segments whose premiums could be reduced.

In [None]:
import sys
sys.path.append('..')

import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from scipy import stats  # required by stats.ttest_ind in Section 7
from src.data_loader import DataLoader
from src.eda import EDA
from src.statistical_analysis import StatisticalAnalysis

## 1. Data Loading and Initial Exploration

In [None]:
data_loader = DataLoader('../resources/Data/machineLearning.txt')
data = data_loader.load_data()
print(data.head())
data.info()  # info() prints its summary directly and returns None

## 2. Data Quality Assessment

In [None]:
missing_values = data_loader.check_missing_values()
print("Missing values:")
print(missing_values)

# Check data types
print("\nData types:")
print(data.dtypes)
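
The output of `check_missing_values()` depends on how `DataLoader` is implemented, which is not shown in this notebook. As a rough sketch of one way to rank columns by how incomplete they are, the helper below (`summarize_missing` is a hypothetical name, and the toy frame is made-up data standing in for `data`) reports the count and percentage of missing values per column:

```python
import numpy as np
import pandas as pd


def summarize_missing(df: pd.DataFrame) -> pd.DataFrame:
    """Return count and percentage of missing values per column, worst first."""
    counts = df.isna().sum()
    summary = pd.DataFrame({
        "missing_count": counts,
        "missing_pct": (counts / len(df) * 100).round(2),
    })
    # Keep only the columns that actually contain gaps
    return summary[summary["missing_count"] > 0].sort_values(
        "missing_pct", ascending=False
    )


# Toy frame with invented values; column names mirror those used in this notebook
toy = pd.DataFrame({
    "TotalPremium": [100.0, np.nan, 250.0, 80.0],
    "Gender": ["Male", None, None, "Female"],
    "Province": ["Gauteng", "Gauteng", "Western Cape", "Limpopo"],
})
missing_summary = summarize_missing(toy)
print(missing_summary)
```

On the real dataset the same call would be `summarize_missing(data)`; columns with a very high missing percentage are candidates for dropping rather than imputing.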

## 3. Univariate Analysis

In [None]:
eda = EDA(data)

# Numerical columns
numerical_cols = ['TotalPremium', 'TotalClaims', 'SumInsured', 'CalculatedPremiumPerTerm']
eda.plot_histograms(numerical_cols, 'output')

# Categorical columns
categorical_cols = ['Citizenship', 'LegalType', 'MaritalStatus', 'Gender', 'ItemType', 'VehicleType']
for col in categorical_cols:
    plt.figure(figsize=(10, 6))
    data[col].value_counts().plot(kind='bar')
    plt.title(f'Distribution of {col}')
    plt.tight_layout()
    plt.show()

## 4. Bivariate and Multivariate Analysis

In [None]:
# Correlation matrix
eda.plot_correlation_matrix('output')

# Scatter plot of TotalPremium vs TotalClaims
plt.figure(figsize=(10, 6))
sns.scatterplot(x='TotalPremium', y='TotalClaims', data=data)
plt.title('TotalPremium vs TotalClaims')
plt.show()

# Box plot of TotalPremium by VehicleType
plt.figure(figsize=(12, 6))
sns.boxplot(x='VehicleType', y='TotalPremium', data=data)
plt.title('TotalPremium by VehicleType')
plt.xticks(rotation=45)
plt.show()

## 5. Outlier Detection

In [None]:
eda.plot_boxplots(numerical_cols, 'output')
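
Box plots show outliers visually but do not count them. A minimal sketch of the usual 1.5 × IQR rule (the same whisker rule box plots draw by default) is given below. `iqr_outlier_mask` is a hypothetical helper, not part of `src.eda`, and the series holds invented values rather than ACIS data:

```python
import pandas as pd


def iqr_outlier_mask(series: pd.Series, k: float = 1.5) -> pd.Series:
    """Flag values lying more than k * IQR outside the first or third quartile."""
    q1 = series.quantile(0.25)
    q3 = series.quantile(0.75)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    # Missing values compare as False, so they are never flagged
    return (series < lower) | (series > upper)


# Invented claim amounts with one obviously extreme value
toy_claims = pd.Series([0, 0, 10, 12, 15, 11, 13, 500], name="TotalClaims")
mask = iqr_outlier_mask(toy_claims)
n_outliers = int(mask.sum())
print(f"{n_outliers} outlier(s): {toy_claims[mask].tolist()}")
```

Applied to the notebook's data, this could be looped over `numerical_cols`, for example `iqr_outlier_mask(data['TotalClaims']).sum()`. Claim amounts are typically heavily right-skewed, so flagged values may be genuine large claims rather than data errors.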

## 6. Trends Over Geography

In [None]:
# Average premium by Province
avg_premium_by_province = data.groupby('Province')['TotalPremium'].mean().sort_values(ascending=False)
plt.figure(figsize=(12, 6))
avg_premium_by_province.plot(kind='bar')
plt.title('Average Premium by Province')
plt.ylabel('Average Premium')
plt.xticks(rotation=45)
plt.show()

# Distribution of CoverType by Province
cover_type_by_province = pd.crosstab(data['Province'], data['CoverType'], normalize='index')
cover_type_by_province.plot(kind='bar', stacked=True, figsize=(12, 6))
plt.title('Distribution of Cover Type by Province')
plt.ylabel('Proportion')
plt.legend(title='Cover Type', bbox_to_anchor=(1.05, 1), loc='upper left')
plt.tight_layout()
plt.show()

## 7. Statistical Analysis

In [None]:
stats_analysis = StatisticalAnalysis(data)

# Confidence interval for TotalPremium
ci_premium = stats_analysis.calculate_confidence_interval('TotalPremium')
print(f"95% Confidence Interval for TotalPremium: {ci_premium}")

# T-test between TotalPremium for different genders
male_premiums = data[data['Gender'] == 'Male']['TotalPremium']
female_premiums = data[data['Gender'] == 'Female']['TotalPremium']
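# nan_policy='omit' skips missing premiums instead of returning NaN for the whole test
t_stat, p_value = stats.ttest_ind(male_premiums, female_premiums, nan_policy='omit')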
print(f"T-test results for TotalPremium between genders: t-statistic = {t_stat}, p-value = {p_value}")

# Correlation between TotalPremium and SumInsured
correlation = stats_analysis.calculate_correlation('TotalPremium', 'SumInsured')
print(f"Correlation between TotalPremium and SumInsured: {correlation}")
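
How `calculate_confidence_interval` derives its interval is not shown in this notebook. For reference, one common approach is a normal-approximation interval around the sample mean, sketched below. `mean_confidence_interval` is a hypothetical helper, the premiums are invented values, and the method in `src.statistical_analysis` may well differ (for instance by using a t-distribution):

```python
import math
from statistics import NormalDist

import pandas as pd


def mean_confidence_interval(series: pd.Series, confidence: float = 0.95):
    """Normal-approximation confidence interval for the mean of a series."""
    clean = series.dropna()
    n = len(clean)
    mean = clean.mean()
    # Standard error of the mean, using the sample standard deviation
    sem = clean.std(ddof=1) / math.sqrt(n)
    # Two-sided critical value, about 1.96 for 95% confidence
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return mean - z * sem, mean + z * sem


# Invented premiums, including one missing value that is dropped
toy_premiums = pd.Series([100.0, 120.0, 80.0, 110.0, 90.0, None])
low, high = mean_confidence_interval(toy_premiums)
print(f"95% CI for the mean: ({low:.2f}, {high:.2f})")
```

The normal approximation is reasonable for large samples such as a full policy dataset; for a handful of observations like the toy series, a t-based interval would be wider and more appropriate.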

## 8. Key Insights and Recommendations

Based on the exploratory data analysis, here are some key insights and recommendations:

1. [Insert key insight 1]
2. [Insert key insight 2]
3. [Insert key insight 3]

Recommendations for optimizing marketing strategy:
1. [Insert recommendation 1]
2. [Insert recommendation 2]
3. [Insert recommendation 3]

Potential "low-risk" targets for reduced premiums:
1. [Insert target group 1]
2. [Insert target group 2]
3. [Insert target group 3]