Model Training and Evaluation

With the data preprocessed and saved to a CSV file, the next step is to train and evaluate several candidate models. This section outlines the approach to model training, evaluation, and feature importance analysis used to select the best-performing model for predicting insurance claims.

The objective of this report is to document the complete process of training, evaluating, and interpreting machine learning models for predicting car insurance claims. The preprocessed dataset (created in a previous step) was used to train various models, and each model was evaluated based on relevant metrics to identify the best-performing one. Additionally, SHAP analysis was used to explain feature importance and model predictions.
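To make the evaluation step concrete, here is a minimal sketch of how two regression metrics could be computed with the `mean_squared_error` and `r2_score` functions that this notebook imports. The `y_true` and `y_pred` arrays are made-up values for illustration only, not results from the insurance dataset, and the choice of RMSE and R² as the "relevant metrics" is an assumption based on those imports.

```python
import numpy as np
from sklearn.metrics import mean_squared_error, r2_score

# Hypothetical actual and predicted values (illustrative only)
y_true = np.array([100.0, 200.0, 300.0, 400.0])
y_pred = np.array([110.0, 190.0, 310.0, 390.0])

# Mean squared error: average of the squared residuals
mse = mean_squared_error(y_true, y_pred)

# Root mean squared error: same units as the target
rmse = float(np.sqrt(mse))

# R^2: share of the target's variance explained by the predictions
r2 = r2_score(y_true, y_pred)

print(f"MSE:  {mse:.3f}")
print(f"RMSE: {rmse:.3f}")
print(f"R^2:  {r2:.3f}")
```

Lower RMSE and higher R² indicate a better fit. Reporting both is useful because RMSE is expressed in the target's units while R² is scale-free.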

In [2]:
# Import libraries
import os
import sys
import warnings

import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

# Add the 'scripts' directory to the Python path for module imports
sys.path.append(os.path.abspath(os.path.join('..', 'scripts')))

# Set the maximum number of columns and rows to display
pd.set_option('display.max_columns', 200)
pd.set_option('display.max_rows', 200)

# Suppress all warnings to keep the notebook output readable
warnings.filterwarnings("ignore")

Train-Test Split

Implementation Steps:

Load the Data: Import the preprocessed dataset from the CSV file.

Divide the Data: Use train_test_split from scikit-learn to split the data into training and test sets.

In [3]:
# Load the preprocessed data
df = pd.read_csv('../data/preprocessed_data.csv')

# Define features and target variable
X = df.drop(columns=['TotalPremium'])
y = df['TotalPremium']

# Split data into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
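
As a quick sanity check on the split above, the sketch below uses a small synthetic DataFrame to show what `test_size=0.2` and a fixed `random_state` do in practice. Only the `TotalPremium` column name comes from this notebook. The `feature_a` and `feature_b` columns, the `demo_df` frame, and the generated values are hypothetical stand-ins for the real preprocessed data.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the preprocessed data (hypothetical feature columns)
rng = np.random.default_rng(0)
demo_df = pd.DataFrame({
    "feature_a": rng.normal(size=100),
    "feature_b": rng.normal(size=100),
    "TotalPremium": rng.normal(loc=500, scale=50, size=100),
})

# Separate features from the target, as in the cell above
X_demo = demo_df.drop(columns=["TotalPremium"])
y_demo = demo_df["TotalPremium"]

# Hold out 20% of the rows for testing
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42
)

# Repeat the split with the same seed to check reproducibility
X_tr2, X_te2, _, _ = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42
)

print("Train shape:", X_tr.shape)
print("Test shape: ", X_te.shape)
print("Same test rows on rerun:", X_te.index.equals(X_te2.index))
print("No overlap between train and test:", set(X_tr.index).isdisjoint(X_te.index))
```

Fixing `random_state` keeps the held-out rows identical across runs, so metric differences between models reflect the models rather than a different split.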