# Predictive Customer Lifetime Value & AI-Driven Retention Strategy

## 1. Business Objective
E-commerce companies often struggle to identify which customers will generate the highest long-term revenue.

The goal of this project is to:
- Predict Customer Lifetime Value (CLV)
- Identify high-value customers
- Enable targeted retention and upsell strategies
- Support data-driven marketing decisions

## 2. Data Preparation 
The dataset was cleaned and engineered to prepare features relevant for CLV prediction.

Key preprocessing steps:
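- Converted monetary and behavioral columns (`Avg_Order_Value`, `Total_Orders`, `Loyalty_Score`, `Email_Open_Rate`) to numeric
- Filled missing or unparseable values with 0
- Removed irrelevant columns (`Customer_Since`)
- Engineered historical CLV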

In [16]:
import pandas as pd

In [17]:
df = pd.read_csv(r"C:\Users\shrey\Downloads\e-commerce_churn_dataset_cleaned.csv")

## 3. Feature Engineering
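Historical Customer Lifetime Value (CLV) was calculated as average order value times total orders, then scaled by the Loyalty Score (treated as a percentage) to reflect the impact of long-term engagement:

**CLV_Historical = Avg_Order_Value x Total_Orders x (1 + Loyalty_Score / 100)**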

In [18]:
df['Avg_Order_Value'] = pd.to_numeric(df['Avg_Order_Value'], errors='coerce').fillna(0)
df['Total_Orders'] = pd.to_numeric(df['Total_Orders'], errors='coerce').fillna(0)
df['Loyalty_Score'] = pd.to_numeric(df['Loyalty_Score'], errors='coerce').fillna(0)
df['Email_Open_Rate'] = pd.to_numeric(df['Email_Open_Rate'], errors='coerce').fillna(0)

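# Drop Customer_Since, which is not used as a model feature
df.drop(columns=['Customer_Since'], inplace=True)

# Base CLV: average order value times number of orders
df['CLV_Historical'] = df['Avg_Order_Value'] * df['Total_Orders']
# Scale by the loyalty score, treated as a percentage uplift
df['CLV_Historical'] = df['CLV_Historical'] * (1 + df['Loyalty_Score']/100)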
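
To make the formula concrete, here is a minimal worked example on a toy frame. The three rows and their values are hypothetical and are not taken from the project dataset; only the column names and the calculation follow the cell above.

```python
import pandas as pd

# Toy data (hypothetical values, not from the project dataset)
toy = pd.DataFrame({
    "Avg_Order_Value": ["50", "120.5", "n/a"],  # strings, one of them unparseable
    "Total_Orders": [4, 2, 7],
    "Loyalty_Score": [20, None, 80],            # one missing value
})

# Same cleaning as above: coerce to numeric, then fill gaps with 0
for col in ["Avg_Order_Value", "Total_Orders", "Loyalty_Score"]:
    toy[col] = pd.to_numeric(toy[col], errors="coerce").fillna(0)

# CLV = Avg_Order_Value x Total_Orders x (1 + Loyalty_Score / 100)
toy["CLV_Historical"] = (
    toy["Avg_Order_Value"] * toy["Total_Orders"] * (1 + toy["Loyalty_Score"] / 100)
)

print(toy)
```

The first row gives 50 x 4 x 1.2 = 240. The third row shows a side effect of filling with 0: an unparseable `Avg_Order_Value` produces a CLV of 0 regardless of order count, so it may be worth checking how many rows are affected before modeling.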

## 4. Predictive Modeling
A Random Forest Regressor was trained to predict CLV, using `CLV_Historical` as the target, based on:
- Demographics
- Purchase behavior
- Loyalty metrics
- Engagement data

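The dataset was split into training and testing sets (80/20). `Avg_Order_Value` and `Total_Orders` were excluded from the features, since the target is computed directly from them.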

In [4]:
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestRegressor

df_encoded = pd.get_dummies(df, columns=['Gender', 'Country', 'Preferred_Category', 'Churn_Risk'], drop_first=True)
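
As a small illustration of what `pd.get_dummies(..., drop_first=True)` produces, the sketch below encodes a toy frame. The category levels (`F`/`M`, `High`/`Low`/`Medium`) are assumptions for the example; the actual levels in the dataset are not shown in this notebook.

```python
import pandas as pd

# Toy frame with hypothetical category levels
toy = pd.DataFrame({
    "Customer_Id": ["C1", "C2", "C3"],
    "Gender": ["F", "M", "F"],
    "Churn_Risk": ["High", "Low", "Medium"],
})

# One indicator column per level, minus the first (alphabetical) level of each column
encoded = pd.get_dummies(toy, columns=["Gender", "Churn_Risk"], drop_first=True)

print(encoded.columns.tolist())
# ['Customer_Id', 'Gender_M', 'Churn_Risk_Low', 'Churn_Risk_Medium']
```

Dropping the first level avoids a redundant column per feature. A tree-based model does not strictly need this, so it is a design choice rather than a requirement here.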


In [5]:
X = df_encoded.drop(['Customer_Id', 'CLV_Historical', 'Last_Purchase', 'Avg_Order_Value', 'Total_Orders'], axis=1)
y = df_encoded['CLV_Historical']
customer_ids = df_encoded['Customer_Id']

In [6]:
X_train, X_test, y_train, y_test, ids_train, ids_test = train_test_split(
    X, y, customer_ids, test_size=0.2, random_state=42
)
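
Passing `customer_ids` as a third array keeps the IDs aligned with the feature and target rows after shuffling. A minimal sketch on synthetic data (all values below are made up for illustration) shows that the three test splits share the same index:

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Synthetic stand-ins for X, y, and customer_ids
X_demo = pd.DataFrame({"feature": range(10)})
y_demo = pd.Series(range(100, 110), name="target")
ids_demo = pd.Series([f"CUST_{i}" for i in range(10)], name="Customer_Id")

X_tr, X_te, y_tr, y_te, ids_tr, ids_te = train_test_split(
    X_demo, y_demo, ids_demo, test_size=0.2, random_state=42
)

# All arrays are shuffled with the same permutation, so indices match row for row
print(X_te.index.equals(y_te.index), X_te.index.equals(ids_te.index))
print(len(X_tr), len(X_te))
```

This alignment is what allows predictions to be paired with `ids_test` later without an explicit join.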

## 5. Model Training & CLV Prediction
The model was trained and used to predict CLV for unseen customers.
Predicted CLV values were then merged back to the original dataset to identify high-value customers.

In [7]:
# Train Random Forest
rf = RandomForestRegressor(n_estimators=100, random_state=42)
rf.fit(X_train, y_train)

# Predict
y_pred = rf.predict(X_test)

# Combine predictions with Customer_Id
df_test_preds = pd.DataFrame({
    'Customer_Id': ids_test,
    'Predicted_CLV': y_pred
})


In [8]:
df_test_preds.head()

| | Customer_Id | Predicted_CLV |
|---|---|---|
| 1501 | CUST_2927 | 1775.08324 |
| 2586 | CUST_8980 | 938.47325 |
| 2653 | CUST_1910 | 1517.402453 |
| 1055 | CUST_3106 | 1401.594174 |
| 705 | CUST_7677 | 1396.220933 |
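
The notebook does not report accuracy metrics for the hold-out set. One possible way to check the predictions is sketched below using mean absolute error and R². The data here is synthetic and the relationship between features and target is invented purely so the example runs on its own; in the notebook, the same two metric calls could be applied to `y_test` and `y_pred`.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error, r2_score
from sklearn.model_selection import train_test_split

# Synthetic data (assumption): target depends on two features plus noise
rng = np.random.default_rng(42)
n = 500
X_syn = pd.DataFrame({
    "Loyalty_Score": rng.uniform(0, 100, n),
    "Email_Open_Rate": rng.uniform(0, 1, n),
})
y_syn = 10 * X_syn["Loyalty_Score"] + 200 * X_syn["Email_Open_Rate"] + rng.normal(0, 20, n)

X_train_s, X_test_s, y_train_s, y_test_s = train_test_split(
    X_syn, y_syn, test_size=0.2, random_state=42
)

model = RandomForestRegressor(n_estimators=100, random_state=42)
model.fit(X_train_s, y_train_s)
pred_s = model.predict(X_test_s)

# Hold-out error: MAE is in the same units as CLV, R^2 is unitless
mae = mean_absolute_error(y_test_s, pred_s)
r2 = r2_score(y_test_s, pred_s)
print(f"MAE: {mae:.2f}  R^2: {r2:.3f}")
```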


## 6. Business Application: Identifying Top 20 High-Value Customers
Customers were ranked based on predicted CLV.

The top 20 customers were selected to:
- Prioritize retention strategies
- Design personalized campaigns
- Reduce churn risk
- Focus retention spend on the customers expected to generate the most revenue

## 7. Deployment: AI-Powered Insight Simulator
The top 20 predicted high-value customers were exported to CSV and deployed via a Streamlit web application.

The application: 
- Displays high-value customers
- Generates automated AI-style strategic insights
- Simulates marketing decision recommendations

Live Demo: https://shrey0561-beyond-the-first-click-custome-genai-simulator-ryznlg.streamlit.app/

In [9]:
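# df_test_preds (Customer_Id, Predicted_CLV) was built in the training cell above.
# Left-merge it into the original df to get full customer info. Customers outside
# the test set have no prediction (NaN) and sort last in the ranking below.
df_full = df.merge(df_test_preds, on='Customer_Id', how='left')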

# Select top 20 predicted CLV customers
top_customers = df_full.sort_values('Predicted_CLV', ascending=False).head(20)

top_customers = top_customers[[
    'Customer_Id', 'Age', 'Gender', 'Country', 'Preferred_Category', 'Email_Open_Rate', 'Loyalty_Score', 'Churn_Risk', 'Predicted_CLV'
]]

top_customers.to_csv("top_customers.csv", index=False)
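
Because predictions exist only for the test split, the left merge leaves `Predicted_CLV` empty for every other customer. The sketch below, on a hypothetical four-customer frame, shows that behavior and an equivalent ranking that drops unscored rows explicitly before using `nlargest`.

```python
import pandas as pd

# Hypothetical customers; only two of them have a prediction
customers = pd.DataFrame({
    "Customer_Id": ["C1", "C2", "C3", "C4"],
    "Country": ["IN", "US", "UK", "DE"],
})
preds = pd.DataFrame({
    "Customer_Id": ["C2", "C4"],
    "Predicted_CLV": [900.0, 1500.0],
})

# Left merge keeps every customer; unscored ones get NaN
merged = customers.merge(preds, on="Customer_Id", how="left")
print(merged["Predicted_CLV"].isna().sum())  # 2

# Rank only the scored customers
top = merged.dropna(subset=["Predicted_CLV"]).nlargest(2, "Predicted_CLV")
print(top["Customer_Id"].tolist())  # ['C4', 'C2']
```

With the default `sort_values` settings the NaN rows already sort last, so the notebook's result is the same as long as the test set holds at least 20 customers; dropping them first simply makes that assumption explicit.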

## 8. Business Impact
This project demonstrates how predictive analytics can: 
- Increase retention efficiency by targeting high-value customers
- Improve marketing ROI through personalized segmentation
- Support revenue forecasting
- Enable AI-driven decision support tools

Instead of reacting to churn, the model enables proactive strategy.