## Customer Churn Prediction Using RFM Analysis

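This notebook identifies customers at risk of churn using Recency, Frequency, and Monetary (RFM) features computed per customer from invoice data.
A customer is labeled as churned when their most recent purchase is more than 90 days before the snapshot date, and a logistic regression model is trained to predict that label from the RFM features.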

In [1]:
import pandas as pd

In [3]:
df = pd.read_csv("../data/processed/cleaned_retail_data.csv")
df["InvoiceDate"] = pd.to_datetime(df["InvoiceDate"])

In [5]:
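# Reference date: one day after the last invoice, so Recency is at least 1 for every customer.
snapshot_date = df["InvoiceDate"].max() + pd.Timedelta(days=1)

rfm = (
    df.groupby("CustomerID")
    .agg({
        "InvoiceDate": lambda x: (snapshot_date - x.max()).days,  # Recency: days since last purchase
        "InvoiceID": "nunique",  # Frequency: number of distinct invoices
        "Revenue": "sum",  # Monetary: total revenue
    })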
    .reset_index()
)

rfm.columns = ["CustomerID", "Recency", "Frequency", "Monetary"]
rfm.head()

   CustomerID  Recency  Frequency  Monetary
0     12346.0      165         11    372.86
1     12347.0        3          2   1323.32
2     12348.0       74          1    222.16
3     12349.0       43          3   2671.14
4     12351.0       11          1    300.93
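
To make the aggregation above easier to check by hand, here is a minimal sketch on a toy transaction table. The data, the `toy_*` names, and the `LastPurchase` helper column are assumptions made for illustration; only the column names `CustomerID`, `InvoiceID`, `InvoiceDate`, and `Revenue` come from the notebook. It computes Recency from the per-customer maximum date instead of a lambda, which should give the same values.

```python
import pandas as pd

# Hypothetical toy transactions; column names mirror the notebook's data.
toy = pd.DataFrame({
    "CustomerID": [1, 1, 2, 2, 2, 3],
    "InvoiceID": ["A1", "A2", "B1", "B1", "B2", "C1"],
    "InvoiceDate": pd.to_datetime([
        "2024-01-05", "2024-03-01", "2024-02-10",
        "2024-02-10", "2024-03-09", "2023-11-20",
    ]),
    "Revenue": [20.0, 35.0, 10.0, 5.0, 50.0, 80.0],
})

# Same convention as the notebook: one day after the latest invoice.
toy_snapshot = toy["InvoiceDate"].max() + pd.Timedelta(days=1)

toy_rfm = (
    toy.groupby("CustomerID")
    .agg(
        LastPurchase=("InvoiceDate", "max"),  # most recent purchase date
        Frequency=("InvoiceID", "nunique"),   # distinct invoices, not rows
        Monetary=("Revenue", "sum"),          # total revenue
    )
    .reset_index()
)
# Recency: whole days between the snapshot and the last purchase.
toy_rfm["Recency"] = (toy_snapshot - toy_rfm["LastPurchase"]).dt.days

print(toy_rfm[["CustomerID", "Recency", "Frequency", "Monetary"]])
```

Customer 2 has three rows but only two distinct invoices, which is why `nunique` rather than a row count is used for Frequency.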


In [7]:
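# Label a customer as churned (1) if their last purchase was more than 90 days before the snapshot date.
rfm["Churn"] = (rfm["Recency"] > 90).astype(int)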

In [12]:
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report

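# Caution: Churn is defined as Recency > 90, so keeping Recency as a feature
# leaks the label. The scores printed below reflect that leakage, not predictive skill.
X = rfm[["Recency", "Frequency", "Monetary"]]
y = rfm["Churn"]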

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

churn_model = LogisticRegression(max_iter=1000)
churn_model.fit(X_train, y_train)

y_pred = churn_model.predict(X_test)
print(classification_report(y_test, y_pred))

              precision    recall  f1-score   support

           0       1.00      1.00      1.00       571
           1       1.00      1.00      1.00       292

    accuracy                           1.00       863
   macro avg       1.00      1.00      1.00       863
weighted avg       1.00      1.00      1.00       863
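
The perfect scores above are what one would expect when the label is a deterministic function of a feature: `Churn` is `Recency > 90`, and `Recency` is in `X`, so the model only has to learn a threshold. One common way to avoid this, sketched below as an assumption and not as the notebook's method, is to build the features from purchases before a cutoff date and the label from the period after it. The function name `build_churn_frame`, its parameters, and the toy data are hypothetical.

```python
import pandas as pd

def build_churn_frame(tx, cutoff, horizon_days=90):
    """RFM features from purchases before `cutoff`; churn label from the window after it."""
    cutoff = pd.Timestamp(cutoff)
    horizon_end = cutoff + pd.Timedelta(days=horizon_days)

    history = tx[tx["InvoiceDate"] < cutoff]
    future = tx[(tx["InvoiceDate"] >= cutoff) & (tx["InvoiceDate"] < horizon_end)]

    feats = (
        history.groupby("CustomerID")
        .agg(
            LastPurchase=("InvoiceDate", "max"),
            Frequency=("InvoiceID", "nunique"),
            Monetary=("Revenue", "sum"),
        )
        .reset_index()
    )
    feats["Recency"] = (cutoff - feats["LastPurchase"]).dt.days

    # Churn = 1 when the customer makes no purchase inside the horizon window.
    returned = set(future["CustomerID"])
    feats["Churn"] = (~feats["CustomerID"].isin(returned)).astype(int)
    return feats[["CustomerID", "Recency", "Frequency", "Monetary", "Churn"]]

# Hypothetical toy data: customer 1 returns after the cutoff, customer 2 does not,
# and customer 3 has no history before the cutoff, so it gets no feature row.
toy_tx = pd.DataFrame({
    "CustomerID": [1, 1, 2, 3],
    "InvoiceID": ["A1", "A2", "B1", "C1"],
    "InvoiceDate": pd.to_datetime(["2023-12-01", "2024-02-01", "2023-10-01", "2024-02-15"]),
    "Revenue": [40.0, 25.0, 60.0, 15.0],
})

frame = build_churn_frame(toy_tx, cutoff="2024-01-01")
print(frame)
```

With this layout, Recency describes behavior before the cutoff while the label describes behavior after it, so the label is no longer computed from any of the features. The resulting frame could be passed to the same `train_test_split` and `LogisticRegression` steps, though the scores would need to be re-evaluated on real data.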

