<a href="https://colab.research.google.com/github/anshupandey/MSA-analytics/blob/main/Model_Monitoring/Lab6_BeeRelevant_Model_Monitoring.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Bee-Relevant Hull Insurance Dataset - Model Monitoring & Retraining Labs
This notebook covers:
- Drift detection using PSI, Kolmogorov–Smirnov Test, and Jensen–Shannon Divergence
- Model training, evaluation, and retraining strategy
- Building a monitoring dashboard

Dataset: Bee-Relevant Ocean Hull Insurance (Anonymized)

In [None]:
import pandas as pd

# Load the dataset (pandas needs the openpyxl package installed to read .xlsx files)
df = pd.read_excel("Ocean Hull Data for Bee-Relevant - Anonymised.xlsx")
df.head()

## Lab 1: Exploratory Data Analysis
Let's explore the data to identify key features and target variables.

In [None]:
# Summary and null counts
df.info()
df.isnull().sum().sort_values(ascending=False)

## Lab 2: Drift Detection - PSI, KS Test, and Jensen–Shannon Divergence
We'll compare the distribution of selected features between a historical training slice and a recent/current slice.
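
For reference, the quantity that `calculate_psi` computes in this lab can be written as follows, where $e_i$ and $a_i$ (symbols introduced here for illustration) are the fractions of the training slice and the current slice that fall into bin $i$, and $B$ is the number of bins:

$$
\mathrm{PSI} = \sum_{i=1}^{B} \left(e_i - a_i\right) \ln\frac{e_i}{a_i}
$$

Every term is non-negative, so PSI is zero only when the two binned distributions match. A commonly quoted rule of thumb (not derived from this dataset) reads PSI below 0.1 as stable, 0.1 to 0.25 as moderate drift, and above 0.25 as significant drift.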

In [None]:
from sklearn.model_selection import train_test_split
import numpy as np
from scipy.stats import ks_2samp, entropy
import matplotlib.pyplot as plt

# Drop rows with missing values; .copy() avoids SettingWithCopyWarning when a column is reassigned below
df_clean = df.dropna().copy()

# Split at the median 'Policy Inception Date': older half = training slice, newer half = current slice
df_clean['Policy Inception Date'] = pd.to_datetime(df_clean['Policy Inception Date'])
cutoff_date = df_clean['Policy Inception Date'].quantile(0.5)

train_slice = df_clean[df_clean['Policy Inception Date'] <= cutoff_date]
current_slice = df_clean[df_clean['Policy Inception Date'] > cutoff_date]

# Pick numeric features for analysis
features = ['Gross Premium', 'Sum Insured', 'Vessel Age']
psi_values = {}

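def calculate_psi(expected, actual, buckets=10, eps=1e-6):
    """Population Stability Index of `actual` relative to `expected`."""
    # Percentile breakpoints come from the expected (training) distribution; drop duplicate edges
    breakpoints = np.unique(np.percentile(expected, np.linspace(0, 100, buckets + 1)))
    # Open the outer bins so current values outside the training range are still counted
    breakpoints[0], breakpoints[-1] = -np.inf, np.inf

    expected_percents = np.histogram(expected, bins=breakpoints)[0] / len(expected)
    actual_percents = np.histogram(actual, bins=breakpoints)[0] / len(actual)

    # Clip before dividing so an empty bin cannot cause division by zero or log(0)
    expected_percents = np.clip(expected_percents, eps, None)
    actual_percents = np.clip(actual_percents, eps, None)

    psi = np.sum((expected_percents - actual_percents) * np.log(expected_percents / actual_percents))
    return psi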

for feature in features:
    psi = calculate_psi(train_slice[feature], current_slice[feature])
    psi_values[feature] = psi

psi_values
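
The lab title also names the Kolmogorov–Smirnov test and Jensen–Shannon divergence. A minimal sketch of both is given here, under stated assumptions: the helper name `ks_and_js` is hypothetical, the data are synthetic stand-ins for `train_slice[feature]` and `current_slice[feature]`, and the JS divergence is computed with `scipy.spatial.distance.jensenshannon` rather than with the `entropy` function imported above.

```python
import numpy as np
from scipy.stats import ks_2samp
from scipy.spatial.distance import jensenshannon


def ks_and_js(expected, actual, buckets=10):
    """Return (KS statistic, KS p-value, JS divergence in bits) for two 1-D samples."""
    expected = np.asarray(expected, dtype=float)
    actual = np.asarray(actual, dtype=float)

    # The two-sample KS test works directly on the raw values
    ks_result = ks_2samp(expected, actual)

    # JS divergence needs discrete distributions, so bin both samples on shared edges
    edges = np.histogram_bin_edges(np.concatenate([expected, actual]), bins=buckets)
    p = np.histogram(expected, bins=edges)[0] / len(expected)
    q = np.histogram(actual, bins=edges)[0] / len(actual)

    # SciPy returns the JS *distance* (square root of the divergence), so square it
    js_divergence = jensenshannon(p, q, base=2) ** 2
    return ks_result.statistic, ks_result.pvalue, js_divergence


# Synthetic stand-ins (assumption): a reference sample and a mean-shifted sample
rng = np.random.default_rng(0)
reference = rng.normal(loc=0.0, scale=1.0, size=2000)
shifted = rng.normal(loc=0.5, scale=1.0, size=2000)

print(ks_and_js(reference, reference))  # identical samples: no drift
print(ks_and_js(reference, shifted))    # mean shift: both measures rise
```

With base 2, the JS divergence lies between 0 and 1, which makes it easier to compare across features than PSI. The KS p-value shrinks quickly as the sample size grows, so on large slices the KS statistic itself is often the more useful number to track.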