
# Lab 4: Model Retraining Strategies
**Objective**: Learn strategies for retraining models in response to detected drift or performance degradation.

This lab will help us understand when and how to retrain models, build retraining pipelines, and assess improvements.

## Step 1: Criteria for Model Retraining
- Population Stability Index (PSI) > 0.2 for one or more features (a common rule-of-thumb threshold for significant drift)
- Drop in model performance (e.g., F1-Score < threshold)
- Time-based schedule (e.g., monthly retrain)
- Significant increase in prediction errors
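
To make the PSI criterion above concrete, here is a minimal sketch of how PSI could be computed for a single numeric feature. The function name `population_stability_index`, the choice of 10 quantile bins, and the small `eps` floor are assumptions made for illustration; the data is synthetic and is not taken from the lab's dataset.

```python
import numpy as np

def population_stability_index(expected, actual, n_bins=10, eps=1e-6):
    """PSI between a baseline sample (expected) and a recent sample (actual)."""
    expected = np.asarray(expected, dtype=float)
    actual = np.asarray(actual, dtype=float)

    # Interior bin edges come from quantiles of the baseline sample;
    # np.unique drops repeated edges when the feature has few distinct values.
    quantiles = np.quantile(expected, np.linspace(0, 1, n_bins + 1))
    interior = np.unique(quantiles[1:-1])

    # Assign every value to a bin; values beyond the baseline range
    # fall into the first or last bin.
    n_buckets = interior.size + 1
    expected_counts = np.bincount(
        np.searchsorted(interior, expected, side="right"), minlength=n_buckets
    )
    actual_counts = np.bincount(
        np.searchsorted(interior, actual, side="right"), minlength=n_buckets
    )

    # Floor the proportions so empty bins do not produce log(0)
    expected_pct = np.clip(expected_counts / expected.size, eps, None)
    actual_pct = np.clip(actual_counts / actual.size, eps, None)

    return float(np.sum((actual_pct - expected_pct) * np.log(actual_pct / expected_pct)))

# Synthetic illustration: one sample from the same distribution, one shifted
rng = np.random.default_rng(0)
baseline = rng.normal(0.0, 1.0, 5000)
psi_same = population_stability_index(baseline, rng.normal(0.0, 1.0, 5000))
psi_shifted = population_stability_index(baseline, rng.normal(1.0, 1.0, 5000))
print("PSI (no drift):", round(psi_same, 4))
print("PSI (mean shifted by 1):", round(psi_shifted, 4))
```

In practice the function would be applied per feature, comparing the training data against a recent batch, and any feature above 0.2 would count toward the first criterion.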


In [None]:
import pandas as pd

# Load dataset
url = "https://raw.githubusercontent.com/anshupandey/MSA-analytics/refs/heads/main/datasets/Ocean_Hull_Insurance_datasetv2.csv"
df = pd.read_csv(url)
df.head()

In [None]:
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import StandardScaler, OneHotEncoder
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report

X = df.drop('Claim_Occurred', axis=1)
y = df['Claim_Occurred']

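# Hold out 30% for evaluation; stratify so both splits keep the same claim rate
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y
)

# Route columns to the right transformer based on dtype
numeric_features = X.select_dtypes(include=['int64', 'float64']).columns.tolist()
categorical_features = X.select_dtypes(include=['object']).columns.tolist()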

preprocessor = ColumnTransformer([
    ('num', StandardScaler(), numeric_features),
    ('cat', OneHotEncoder(handle_unknown='ignore'), categorical_features)
])

pipeline = Pipeline([
    ('preprocessor', preprocessor),
    ('classifier', LogisticRegression(max_iter=1000))
])

pipeline.fit(X_train, y_train)

In [None]:
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

y_pred = pipeline.predict(X_test)
print("Before Retraining:")
print("Accuracy:", accuracy_score(y_test, y_pred))
print("Precision:", precision_score(y_test, y_pred))
print("Recall:", recall_score(y_test, y_pred))
print("F1-Score:", f1_score(y_test, y_pred))
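
Once baseline metrics like these are available, the criteria from Step 1 can be combined into a single check. The sketch below is one possible way to do that. The function name `retraining_reasons`, the F1 threshold of 0.70, and the example dates are hypothetical; only the PSI threshold of 0.2 and the monthly schedule come from the list in Step 1.

```python
from datetime import date

def retraining_reasons(f1, max_psi, last_trained, today,
                       f1_threshold=0.70, psi_threshold=0.2, max_age_days=30):
    """Return the list of Step 1 criteria that currently call for retraining."""
    reasons = []
    if max_psi > psi_threshold:
        reasons.append(f"drift: max PSI {max_psi:.2f} > {psi_threshold}")
    if f1 < f1_threshold:
        reasons.append(f"performance: F1 {f1:.2f} < {f1_threshold}")
    if (today - last_trained).days >= max_age_days:
        reasons.append(f"schedule: model is at least {max_age_days} days old")
    return reasons

# Hypothetical monitoring snapshot
reasons = retraining_reasons(
    f1=0.62, max_psi=0.27,
    last_trained=date(2024, 1, 1), today=date(2024, 1, 15),
)
print("Retrain" if reasons else "Keep current model", reasons)
```

Returning the reasons rather than a bare boolean makes it easier to log why a retraining run was triggered.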

## Step 2: Retraining with Updated Dataset
We simulate a batch of new data arriving over time and use it for retraining. In this lab the test set stands in for the newly arrived batch.

In [None]:
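# Simulate new data (using the test set here); hold half of it out for evaluation
X_new, X_holdout, y_new, y_holdout = train_test_split(
    X_test, y_test, test_size=0.5, random_state=42
)

# Baseline: score the current model on the hold-out set before retraining
print("Current model F1-Score on hold-out:", f1_score(y_holdout, pipeline.predict(X_holdout)))

# Retrain the model on the original training data plus the new batch
pipeline.fit(pd.concat([X_train, X_new]), pd.concat([y_train, y_new]))

# Evaluate after retraining on rows the retrained model has never seen.
# (Scoring on the retraining rows themselves would overstate performance.)
y_pred_new = pipeline.predict(X_holdout)
print("After Retraining:")
print("Accuracy:", accuracy_score(y_holdout, y_pred_new))
print("Precision:", precision_score(y_holdout, y_pred_new))
print("Recall:", recall_score(y_holdout, y_pred_new))
print("F1-Score:", f1_score(y_holdout, y_pred_new))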
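
To assess whether retraining actually improved the model, one option is a champion/challenger comparison: keep the current model untouched, fit a fresh copy on the updated data, and score both on the same hold-out set. The sketch below illustrates the idea on synthetic data from `make_classification`, not on the insurance dataset. The names `champion`, `challenger`, and `min_gain`, and the promotion rule itself, are assumptions made for illustration.

```python
import numpy as np
from sklearn.base import clone
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in data: an "old" batch and a "recent" batch
X, y = make_classification(n_samples=2000, n_features=10, random_state=42)
X_old, X_recent, y_old, y_recent = train_test_split(
    X, y, test_size=0.5, random_state=42
)
# Part of the recent batch is used for retraining, the rest is held out
X_retrain, X_holdout, y_retrain, y_holdout = train_test_split(
    X_recent, y_recent, test_size=0.4, random_state=42
)

# Champion: the model currently in use, trained on old data only
champion = LogisticRegression(max_iter=1000).fit(X_old, y_old)

# Challenger: an unfitted copy with the same hyperparameters,
# trained on old plus recent data. clone() leaves the champion untouched.
challenger = clone(champion).fit(
    np.vstack([X_old, X_retrain]), np.concatenate([y_old, y_retrain])
)

# Score both models on the same unseen hold-out rows
champion_f1 = f1_score(y_holdout, champion.predict(X_holdout))
challenger_f1 = f1_score(y_holdout, challenger.predict(X_holdout))

min_gain = 0.0  # hypothetical: minimum F1 improvement required to promote
promote = bool(challenger_f1 >= champion_f1 + min_gain)

print("Champion F1:", round(champion_f1, 4))
print("Challenger F1:", round(challenger_f1, 4))
print("Promote challenger:", promote)
```

The same pattern should carry over to the lab's `pipeline` object, since `clone` also works on a scikit-learn `Pipeline`.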