# Customer Modeling and Prediction

This notebook builds simple predictive models on the customer features produced
by the clustering step. The goal is not deep learning but interpretable,
business-usable predictions that can be shown clearly on a dashboard.


In [1]:
import pandas as pd
import numpy as np

from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LinearRegression, LogisticRegression
from sklearn.metrics import mean_absolute_error, accuracy_score

## Objective

The objective of this notebook is to:
- Use clustered customer data
- Build simple prediction models
- Generate confidence scores
- Save trained models for the dashboard

In [3]:
print("Starting customer model building")

Starting customer model building


## Load Customer Feature Dataset

This dataset was generated in the clustering notebook and contains
customer-level features along with cluster labels.

In [5]:
data_path = "../data/customer_features.csv"
df = pd.read_csv(data_path)
df.head()

| Unnamed: 0 | customer_id | total_orders | total_quantity | total_spend | avg_order_value | active_days | cluster |
|---|---|---|---|---|---|---|---|
| 0 | 12346.0 | 1 | 74215 | 77183.6 | 77183.6 | 0 | 1 |
| 1 | 12347.0 | 7 | 2458 | 4310.0 | 23.681319 | 365 | 0 |
| 2 | 12348.0 | 4 | 2341 | 1797.24 | 57.975484 | 282 | 0 |
| 3 | 12349.0 | 1 | 631 | 1757.55 | 24.076027 | 0 | 0 |
| 4 | 12350.0 | 1 | 197 | 334.4 | 19.670588 | 0 | 0 |
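
The `Unnamed: 0` column above is what usually appears when a CSV was written with its index and then read back without `index_col`. The following is a minimal sketch, assuming that is the cause here. The inline sample stands in for `../data/customer_features.csv` and reuses the first two rows shown above; `sample_csv` and `sample_df` are illustrative names.

```python
import io

import pandas as pd

# Inline stand-in for the CSV file; the blank first header cell is the saved index.
sample_csv = io.StringIO(
    ",customer_id,total_orders,total_quantity,total_spend,avg_order_value,active_days,cluster\n"
    "0,12346.0,1,74215,77183.6,77183.6,0,1\n"
    "1,12347.0,7,2458,4310.0,23.681319,365,0\n"
)

# index_col=0 treats the first column as the row index instead of a data column.
sample_df = pd.read_csv(sample_csv, index_col=0)

# customer_id was stored as a float; cast it to an integer identifier.
sample_df["customer_id"] = sample_df["customer_id"].astype(int)

print(sample_df.columns.tolist())
```

Reading the file this way keeps a stray index column from being picked up as a feature later on.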


## Select Features for Modeling

Only numerical, interpretable features are selected; `customer_id` and the
`cluster` label are left out. The same feature matrix `X` is reused by both
models below.

In [6]:
feature_cols = [
    "total_orders",
    "total_quantity",
    "total_spend",
    "avg_order_value",
    "active_days"
]

X = df[feature_cols]

## Model 1: Customer Value Prediction

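- This model predicts customer value, measured as `total_spend`, from historical behavior.
- A simple linear regression is used for transparency and ease of explanation.
- `total_spend` is also one of the columns in `feature_cols`, so the model sees its own target as an input and the resulting error is close to zero by construction.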

In [7]:
y_value = df["total_spend"]

X_train, X_test, y_train, y_test = train_test_split(
    X, y_value, test_size=0.2, random_state=42
)

scaler_value = StandardScaler()
X_train_scaled = scaler_value.fit_transform(X_train)
X_test_scaled = scaler_value.transform(X_test)

## Train Value Prediction Model

In [8]:
value_model = LinearRegression()
value_model.fit(X_train_scaled, y_train)

y_pred = value_model.predict(X_test_scaled)
mae = mean_absolute_error(y_test, y_pred)

mae

2.156150322289648e-12
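
An MAE on the order of 1e-12 is what a linear model produces when the target column is also one of its inputs. The sketch below shows one way to get a more informative error estimate by keeping the target out of the feature list. It runs on synthetic stand-in data, not the real customer file, and the names `demo`, `value_feature_cols` and `demo_mae` are hypothetical. Dropping `avg_order_value` as well is an assumption that it is derived from spend.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (hypothetical); the notebook itself would use df.
rng = np.random.default_rng(0)
n = 500
demo = pd.DataFrame({
    "total_orders": rng.integers(1, 20, size=n),
    "total_quantity": rng.integers(10, 3000, size=n),
    "active_days": rng.integers(0, 366, size=n),
})
# Spend depends on the inputs plus noise, so it cannot be predicted exactly.
demo["total_spend"] = (
    50.0 * demo["total_orders"]
    + 1.5 * demo["total_quantity"]
    + rng.normal(0.0, 200.0, size=n)
)

target_col = "total_spend"
# The target itself is excluded from the inputs; avg_order_value is also
# left out on the assumption that it is computed from spend.
value_feature_cols = ["total_orders", "total_quantity", "active_days"]

X_tr, X_te, y_tr, y_te = train_test_split(
    demo[value_feature_cols], demo[target_col], test_size=0.2, random_state=42
)

# Fit the scaler on the training split only, then reuse it for the test split.
scaler = StandardScaler()
model = LinearRegression().fit(scaler.fit_transform(X_tr), y_tr)
demo_mae = mean_absolute_error(y_te, model.predict(scaler.transform(X_te)))

print(f"MAE without the target among the features: {demo_mae:.2f}")
```

With the target removed, the error reflects how well spend can be explained by the other behavior columns rather than how well the model can copy an input.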

## Value Prediction Confidence

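Confidence is a heuristic score computed as 1 - MAE / mean(`total_spend`).
Lower error gives a score closer to 1. The score is not a probability, and it
becomes negative if the MAE exceeds the mean spend.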

In [9]:
value_confidence = 1 - (mae / y_value.mean())
value_confidence

np.float64(0.999999999999999)
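
Since the raw score can fall below 0 when the error is large, a dashboard may prefer a bounded version. The helper below is a hedged sketch and not part of the original notebook: `relative_error_confidence` is a hypothetical name, and using the mean absolute target as the scale is an assumption.

```python
import numpy as np


def relative_error_confidence(mae: float, y_true) -> float:
    """Return 1 - MAE / mean(|y_true|), clipped to the range [0, 1]."""
    scale = float(np.mean(np.abs(y_true)))
    if scale == 0.0:
        # No meaningful scale to compare the error against.
        return 0.0
    return float(np.clip(1.0 - mae / scale, 0.0, 1.0))


# MAE is 5% of the mean target value.
small_error = relative_error_confidence(10.0, [100.0, 300.0])
# MAE exceeds the mean target value, so the score is clipped at 0.
large_error = relative_error_confidence(500.0, [100.0, 300.0])

print(small_error, large_error)
```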

## Model 2: Customer Stability Classification

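This model predicts whether a customer is stable (1) or unstable (0), where
stable means `active_days` is above the dataset median.
A binary classification approach is used for simplicity. Because `active_days`
is also an input feature, the label can be recovered almost directly from the
inputs, so the near-perfect accuracy reported below mostly reflects the label
definition.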

In [10]:
df["stable_customer"] = np.where(df["active_days"] > df["active_days"].median(), 1, 0)

y_stability = df["stable_customer"]

X_train, X_test, y_train, y_test = train_test_split(
    X, y_stability, test_size=0.2, random_state=42
)

scaler_stability = StandardScaler()
X_train_scaled = scaler_stability.fit_transform(X_train)
X_test_scaled = scaler_stability.transform(X_test)

## Train Stability Classification Model

In [15]:
stability_model = LogisticRegression()
stability_model.fit(X_train_scaled, y_train)

y_pred_class = stability_model.predict(X_test_scaled)
y_pred_class_train = stability_model.predict(X_train_scaled)
accuracy = accuracy_score(y_test, y_pred_class)
accuracy_train = accuracy_score(y_train, y_pred_class_train)
accuracy, accuracy_train

(0.9988479262672811, 0.9985590778097982)
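
Accuracy this close to 1 on both splits is expected when the label is a threshold on a column the model also receives. The sketch below illustrates the effect on synthetic data by comparing hold-out accuracy with and without `active_days` among the inputs. The data-generating process, the `demo` frame and the `holdout_accuracy` helper are all hypothetical and only meant to show the pattern.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (hypothetical): order counts loosely follow active_days.
rng = np.random.default_rng(1)
n = 1000
active_days = rng.integers(0, 366, size=n)
total_orders = 1 + rng.poisson(active_days / 60.0)
demo = pd.DataFrame({"total_orders": total_orders, "active_days": active_days})

# Same labeling rule as the notebook: above-median active_days means stable.
demo["stable_customer"] = (
    demo["active_days"] > demo["active_days"].median()
).astype(int)


def holdout_accuracy(cols):
    """Fit scaler + logistic regression on the given columns; return test accuracy."""
    X_tr, X_te, y_tr, y_te = train_test_split(
        demo[cols], demo["stable_customer"], test_size=0.2, random_state=42
    )
    clf = make_pipeline(StandardScaler(), LogisticRegression())
    clf.fit(X_tr, y_tr)
    return accuracy_score(y_te, clf.predict(X_te))


acc_with = holdout_accuracy(["total_orders", "active_days"])
acc_without = holdout_accuracy(["total_orders"])

print(f"with active_days: {acc_with:.3f}, without: {acc_without:.3f}")
```

The gap between the two numbers gives a rough sense of how much of the score comes from the label definition rather than from the other behavior features.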

## Stability Risk Probability

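Class probabilities are extracted so the dashboard can show graded risk levels
instead of hard labels. Column 1 of `predict_proba` is the probability that a
customer is stable (class 1), so the risk of instability is 1 minus that value.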

In [16]:
stability_probabilities = stability_model.predict_proba(X_test_scaled)[:, 1]
stability_probabilities[:5]

array([6.30654354e-04, 9.30349976e-01, 3.36673431e-03, 6.96487101e-01,
       6.60182505e-04])
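
One possible way to turn these probabilities into dashboard-friendly risk levels is sketched below. It reuses the five values printed above. The cut-offs 0.33 and 0.66 and the labels `low`, `medium` and `high` are hypothetical choices, not something the notebook defines.

```python
import numpy as np
import pandas as pd

# First five class-1 ("stable") probabilities from the output above.
p_stable = np.array([6.30654354e-04, 9.30349976e-01, 3.36673431e-03,
                     6.96487101e-01, 6.60182505e-04])

# Risk of instability is the complement of the class-1 probability.
p_unstable = 1.0 - p_stable

# Hypothetical cut-offs: up to 0.33 is low, up to 0.66 is medium, above is high.
risk_level = pd.cut(
    p_unstable,
    bins=[0.0, 0.33, 0.66, 1.0],
    labels=["low", "medium", "high"],
    include_lowest=True,
)

print(list(risk_level))
```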

## Save Trained Models

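The trained models are saved with joblib and later loaded by the Streamlit
dashboard for real-time predictions. Both models were fitted on standardized
inputs, so the dashboard must apply the same fitted scalers (`scaler_value`,
`scaler_stability`) before calling `predict`; this cell saves only the models.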

In [17]:
import joblib

joblib.dump(value_model, "../models/value_model.pkl")
joblib.dump(stability_model, "../models/stability_model.pkl")

print("Models saved successfully")

Models saved successfully
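
Because the models expect standardized inputs, one option is to bundle each scaler and model into a single scikit-learn pipeline and save that object instead. The sketch below shows the idea on a tiny synthetic dataset. The file name `stability_pipeline.pkl`, the temporary directory and the training data are hypothetical, and the notebook itself writes to `../models/`.

```python
import tempfile
from pathlib import Path

import joblib
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Tiny synthetic training set (hypothetical), used only to get a fitted pipeline.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(100, 5))
y_demo = (X_demo[:, 4] > 0).astype(int)

# The scaler and the classifier are fitted and saved as one object.
stability_pipeline = make_pipeline(StandardScaler(), LogisticRegression())
stability_pipeline.fit(X_demo, y_demo)

with tempfile.TemporaryDirectory() as tmp_dir:
    # Hypothetical file name inside a throwaway directory.
    model_path = Path(tmp_dir) / "stability_pipeline.pkl"
    joblib.dump(stability_pipeline, model_path)
    restored = joblib.load(model_path)

# The restored pipeline accepts raw (unscaled) rows and scales them internally.
same_output = np.allclose(
    stability_pipeline.predict_proba(X_demo[:3]),
    restored.predict_proba(X_demo[:3]),
)
print(same_output)
```

With this layout the dashboard only has to load one file per model and can pass raw feature values straight to `predict` or `predict_proba`.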


## Output Summary

This notebook produced:
- A customer value prediction model
- A customer stability classification model
- Confidence and probability metrics

These models will be used directly in the dashboard to display predictions
together with their confidence scores.