## 🧱 0. Setup & Imports

In [2]:
%load_ext autoreload
%autoreload 2

In [3]:
import os
import pandas as pd
import numpy as np
import warnings
from pathlib import Path
import matplotlib.pyplot as plt
import seaborn as sns
from dotenv import load_dotenv
import sweetviz
from sklearn.model_selection import train_test_split, GridSearchCV, cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import StandardScaler, OneHotEncoder
from sklearn.impute import SimpleImputer
from sklearn.metrics import mean_squared_error, r2_score, accuracy_score, classification_report


In [4]:
# Suppress warnings
warnings.filterwarnings("ignore")
# Load environment variables from the .env file
load_dotenv()

True

In [5]:
# Walk up from the current directory to the project root (the first folder containing README.md)
project_root = Path.cwd()
while not (project_root / "README.md").exists() and project_root != project_root.parent:
    project_root = project_root.parent

os.chdir(project_root)
print("Project root set to:", project_root)


Project root set to: /home/teshager/Documents/10Academy/repositories/projects/insurance-risk-modeling


## 🔁 Step 1: Look at the Big Picture

### Framing the Problem, Business Objective, Current Solution, Machine Learning Tasks & Assumptions:

| Element                | Description                                                               |
| ---------------------- | ------------------------------------------------------------------------- |
| **Business Objective** | Predict risk & set optimal premiums                                       |
| **Current Solution**   | Flat-rate pricing based on limited heuristics                             |
| **ML Tasks**           | Regression (severity), classification (probability), regression (premium) |
| **Assumptions**        | Data is representative, features correlate with claim behavior, etc.      |
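
The three ML tasks in the table need three different targets. The sketch below shows one way they could be derived, assuming a claim is any row with `TotalClaims > 0`. The toy frame and its values are made up for illustration; only the column names `TotalPremium` and `TotalClaims` come from the dataset, and `HadClaim` is the name used later in this notebook for the classification target.

```python
import pandas as pd

# Toy stand-in for the policy data (values are illustrative, not real records)
toy = pd.DataFrame({
    "TotalPremium": [21.93, 512.85, 0.0, 100.0],
    "TotalClaims": [0.0, 2500.0, 0.0, 0.0],
})

# Classification target (claim probability): did the row record any claim?
# Assumption: a claim is any row with TotalClaims > 0.
toy["HadClaim"] = (toy["TotalClaims"] > 0).astype(int)

# Regression target (claim severity): claim amount, restricted to rows with a claim
severity_rows = toy[toy["HadClaim"] == 1]
severity_target = severity_rows["TotalClaims"]

# Regression target (premium): uses every row
premium_target = toy["TotalPremium"]
```

Restricting the severity model to rows with a claim keeps the many zero-claim rows from dominating the regression.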


## 📦 Step 2: Load the Data

In [6]:
# Load the cleaned insurance data; PROCESSED_DATA must be set in the environment (.env)
processed_data_dir = os.getenv("PROCESSED_DATA")
file_path = os.path.join(processed_data_dir, "cleaned_insurance_data.parquet")
df = pd.read_parquet(file_path)
df.head()


| Unnamed: 0 | UnderwrittenCoverID | PolicyID | TransactionMonth | IsVATRegistered | Citizenship | LegalType | Title | Bank | AccountType | MaritalStatus | ... | CalculatedPremiumPerTerm | ExcessSelected | CoverCategory | CoverType | CoverGroup | Section | Product | TotalPremium | TotalClaims | Gender_Inferred |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | 145249 | 12827 | 2015-03-01 00:00:00 | True |  | Close Corporation | Mr | First National Bank | Current account | Not specified | ... | 25.0 | Mobility - Windscreen | Windscreen | Windscreen | Comprehensive - Taxi | Motor Comprehensive | Mobility Metered Taxis: Monthly | 21.929825 | 0.0 | Male |
| 1 | 145249 | 12827 | 2015-05-01 00:00:00 | True |  | Close Corporation | Mr | First National Bank | Current account | Not specified | ... | 25.0 | Mobility - Windscreen | Windscreen | Windscreen | Comprehensive - Taxi | Motor Comprehensive | Mobility Metered Taxis: Monthly | 21.929825 | 0.0 | Male |
| 2 | 145249 | 12827 | 2015-07-01 00:00:00 | True |  | Close Corporation | Mr | First National Bank | Current account | Not specified | ... | 25.0 | Mobility - Windscreen | Windscreen | Windscreen | Comprehensive - Taxi | Motor Comprehensive | Mobility Metered Taxis: Monthly | 0.0 | 0.0 | Male |
| 3 | 145255 | 12827 | 2015-05-01 00:00:00 | True |  | Close Corporation | Mr | First National Bank | Current account | Not specified | ... | 584.6468 | Mobility - Metered Taxis - R2000 | Own damage | Own Damage | Comprehensive - Taxi | Motor Comprehensive | Mobility Metered Taxis: Monthly | 512.84807 | 0.0 | Male |
| 4 | 145255 | 12827 | 2015-07-01 00:00:00 | True |  | Close Corporation | Mr | First National Bank | Current account | Not specified | ... | 584.6468 | Mobility - Metered Taxis - R2000 | Own damage | Own Damage | Comprehensive - Taxi | Motor Comprehensive | Mobility Metered Taxis: Monthly | 0.0 | 0.0 | Male |


## 📊 Step 3: Explore the Data

### Discover and Visualize the Data to Gain Insights:

In [7]:
report = sweetviz.analyze(df)
report.show_html("reports/insurance_sweetviz.html")


Done! Use 'show' commands to display/save.   |██████████| [100%]   00:02 -> (00:00 left)

Report reports/insurance_sweetviz.html was generated! NOTEBOOK/COLAB USERS: the web browser MAY not pop up, regardless, the report IS saved in your notebook/colab files.





#### Sweetviz

### Quick Look at the Data Structure:

### Visualize the Data:

In [8]:
# 📊 Run AutoViz analysis
from autoviz.AutoViz_Class import AutoViz_Class  # this import was missing, causing the NameError

AV = AutoViz_Class()

# Specify the target column (change to "HadClaim" for classification)
dft = AV.AutoViz(
    filename="",
    dfte=df,
    depVar="TotalClaims",  # or "HadClaim"
    verbose=1,
)

### Look for Correlations:
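
A minimal sketch of a correlation check against the target, assuming `TotalClaims` is the target and that only numeric columns are of interest. The toy data is synthetic and the `Noise` column is hypothetical; on the real `df` the same chain of calls would apply.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in: TotalClaims is built to depend on TotalPremium
rng = np.random.default_rng(0)
premium = rng.uniform(0, 600, size=200)
toy = pd.DataFrame({
    "TotalPremium": premium,
    "TotalClaims": 2.0 * premium + rng.normal(0, 50, size=200),
    "Noise": rng.normal(size=200),        # hypothetical unrelated feature
    "Gender_Inferred": ["Male"] * 200,    # non-numeric, excluded below
})

# Pearson correlation of each numeric feature with the target, strongest first
corr_with_target = (
    toy.select_dtypes(include="number")
    .corr()["TotalClaims"]
    .drop("TotalClaims")
    .sort_values(ascending=False)
)
print(corr_with_target)
```

Pearson correlation only captures linear relationships, so a low value here does not rule out a useful feature.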

## 🧹 Step 4: Prepare the Data for ML Algorithms

### Separate Features and Labels: 

### Handle Missing Values:

### Handle Text and Categorical Attributes:

### Feature Scaling:

### Custom Transformers: 

### Transformation Pipelines:
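
A sketch of how the preprocessing steps above (imputation, categorical encoding, scaling) could be combined with the `Pipeline` and `ColumnTransformer` classes imported in the setup cell. The choice of columns and imputation strategies is an assumption for illustration, and the toy rows are made up.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Toy frame with one numeric and one categorical column, each with a missing value
toy = pd.DataFrame({
    "CalculatedPremiumPerTerm": [25.0, np.nan, 584.6468, 25.0],
    "CoverType": ["Windscreen", "Own Damage", np.nan, "Windscreen"],
})
numeric_cols = ["CalculatedPremiumPerTerm"]
categorical_cols = ["CoverType"]

# Numeric columns: fill gaps with the median, then standardize
numeric_pipeline = Pipeline([
    ("impute", SimpleImputer(strategy="median")),
    ("scale", StandardScaler()),
])

# Categorical columns: fill gaps with the most frequent value, then one-hot encode
categorical_pipeline = Pipeline([
    ("impute", SimpleImputer(strategy="most_frequent")),
    ("encode", OneHotEncoder(handle_unknown="ignore")),
])

preprocessor = ColumnTransformer([
    ("num", numeric_pipeline, numeric_cols),
    ("cat", categorical_pipeline, categorical_cols),
])

X_prepared = preprocessor.fit_transform(toy)
print(X_prepared.shape)
```

Fitting the preprocessor on the training split only, then calling `transform` on the test split, keeps test-set statistics out of the imputer and scaler.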

## 🔧 Step 5: Select a Model and Train

### Baseline models:

### Training on a Small Set (for initial sanity check):

### Evaluate on the Training Set:

### Better Evaluation Using Cross-Validation:
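
A minimal sketch of k-fold evaluation with `cross_val_score`, which is imported in the setup cell. The linear model, the synthetic data, and the choice of 5 folds are assumptions for illustration, not the notebook's final configuration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

# Synthetic regression problem standing in for the prepared features and target
rng = np.random.default_rng(42)
X = rng.normal(size=(200, 3))
y = X @ np.array([3.0, -2.0, 0.5]) + rng.normal(scale=0.5, size=200)

# scikit-learn maximizes scores, so RMSE is exposed as a negated value
scores = cross_val_score(
    LinearRegression(), X, y, cv=5, scoring="neg_root_mean_squared_error"
)
rmse_per_fold = -scores

print(f"RMSE: {rmse_per_fold.mean():.3f} +/- {rmse_per_fold.std():.3f}")
```

The spread across folds gives a rough sense of how stable the estimate is, which a single training-set score cannot provide.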

## Step 6: Fine-Tune Your Model

### Optuna

### Grid Search:
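
A sketch of hyperparameter search with `GridSearchCV`, also imported in the setup cell. The `Ridge` model, the `alpha` grid, and the synthetic data are assumptions chosen to keep the example small.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic regression problem standing in for the prepared training data
rng = np.random.default_rng(42)
X = rng.normal(size=(200, 3))
y = X @ np.array([3.0, -2.0, 0.5]) + rng.normal(scale=0.5, size=200)

pipe = Pipeline([("scale", StandardScaler()), ("model", Ridge())])

# Step parameters are addressed as <step name>__<parameter>
param_grid = {"model__alpha": [0.01, 0.1, 1.0, 10.0]}

search = GridSearchCV(pipe, param_grid, cv=5, scoring="neg_root_mean_squared_error")
search.fit(X, y)

print("Best params:", search.best_params_)
print("Best CV RMSE:", -search.best_score_)
```

Searching over the whole pipeline, rather than the bare model, means the scaler is refit inside each fold.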

### Analyze the Best Models and Their Errors:

### Ensemble Methods: 

## 📈 Step 7: Evaluate System on the Test Set

### Final Evaluation:

### Confidence Intervals:
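
One way to attach an interval to a test-set RMSE is a percentile bootstrap over the squared errors. The sketch below assumes that approach; the predictions are synthetic, and the 1000 resamples and 95% level are illustrative choices.

```python
import numpy as np

# Synthetic test-set targets and predictions
rng = np.random.default_rng(0)
y_true = rng.normal(loc=100.0, scale=20.0, size=500)
y_pred = y_true + rng.normal(scale=5.0, size=500)

squared_errors = (y_true - y_pred) ** 2
n = len(squared_errors)
point_rmse = np.sqrt(squared_errors.mean())

# Resample the squared errors with replacement and recompute RMSE each time
boot_rmse = np.array([
    np.sqrt(rng.choice(squared_errors, size=n, replace=True).mean())
    for _ in range(1000)
])

# 95% percentile interval
lower, upper = np.percentile(boot_rmse, [2.5, 97.5])
print(f"RMSE = {point_rmse:.2f}, 95% CI = [{lower:.2f}, {upper:.2f}]")
```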

## 🧠 Step 8: Present Your Solution

### Model Explainability: SHAP

## 🚀 Step 9: Launch, Monitor, and Maintain System