# **Customer Lifetime Value (CLV) & Churn Prediction**

## ***Exploratory Data Analysis***
This EDA covers the analyses needed to understand business performance and customer behaviour:
1. **Core Business Questions**
    * Which customer segments bring in the most revenue?
    * Are repeat customers more profitable?
    * Which product categories drive CLV?
    * Which behaviors signal churn?
2. **Key Visualizations**
    * Revenue trend over time
    * Cohort Analysis (Customer Retention by Month)
    * RFM Distribution
    * Seasonality & Spending patterns
3. **Feature Insights for ML**
    * Which customers to retain
    * Which customers to target with discounts or loyalty programs
    * Top 10 at-risk customers and the revenue at risk
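
The RFM distribution listed above needs one row per customer with recency, frequency, and monetary values. The following is a minimal sketch of that aggregation, not the notebook's actual implementation. It assumes recency is counted from one day after the last purchase in the data, frequency is the number of distinct invoices, and monetary is the sum of `total_price`. The toy data and the helper name `rfm_table` are made up for illustration; only the column names come from the dataset.

```python
import pandas as pd

# Toy transactions using the dataset's column names (all values are made up).
toy = pd.DataFrame({
    "customer_id": [1, 1, 2, 2, 2, 3],
    "invoice": [100, 101, 102, 102, 103, 104],
    "invoice_date": pd.to_datetime([
        "2010-01-05", "2010-03-01", "2010-01-10",
        "2010-01-10", "2010-01-20", "2010-02-15",
    ]),
    "total_price": [20.0, 30.0, 5.0, 15.0, 10.0, 50.0],
})


def rfm_table(tx, snapshot_date=None):
    """Return recency (days), frequency (distinct invoices) and monetary (sum) per customer."""
    if snapshot_date is None:
        # Assumption: measure recency from the day after the last recorded purchase.
        snapshot_date = tx["invoice_date"].max() + pd.Timedelta(days=1)
    grouped = tx.groupby("customer_id")
    return pd.DataFrame({
        "recency_days": (snapshot_date - grouped["invoice_date"].max()).dt.days,
        "frequency": grouped["invoice"].nunique(),
        "monetary": grouped["total_price"].sum(),
    })


rfm = rfm_table(toy)
print(rfm)
```

On the real data the same function could be called as `rfm_table(df)` once `invoice_date` has been parsed to a datetime column.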

In [28]:
# Importing necessary libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

In [29]:
# Load the cleaned dataset produced in the previous step
df = pd.read_csv('cleaned_transactions.csv')
df.head()

Unnamed: 0,invoice,stockcode,description,quantity,price,customer_id,country,invoice_date,total_price
0,489434,85048,15CM CHRISTMAS GLASS BALL 20 LIGHTS,12,6.95,13085.0,United Kingdom,2009-12-01 07:45:00,83.4
1,489434,79323P,PINK CHERRY LIGHTS,12,6.75,13085.0,United Kingdom,2009-12-01 07:45:00,81.0
2,489434,79323W,WHITE CHERRY LIGHTS,12,6.75,13085.0,United Kingdom,2009-12-01 07:45:00,81.0
3,489434,22041,"RECORD FRAME 7"" SINGLE SIZE",48,2.1,13085.0,United Kingdom,2009-12-01 07:45:00,100.8
4,489434,21232,STRAWBERRY CERAMIC TRINKET BOX,24,1.25,13085.0,United Kingdom,2009-12-01 07:45:00,30.0


* Data Overview

In [30]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 779495 entries, 0 to 779494
Data columns (total 9 columns):
 #   Column        Non-Null Count   Dtype  
---  ------        --------------   -----  
 0   invoice       779495 non-null  int64  
 1   stockcode     779495 non-null  object 
 2   description   779495 non-null  object 
 3   quantity      779495 non-null  int64  
 4   price         779495 non-null  float64
 5   customer_id   779495 non-null  float64
 6   country       779495 non-null  object 
 7   invoice_date  779495 non-null  object 
 8   total_price   779495 non-null  float64
dtypes: float64(3), int64(2), object(4)
memory usage: 53.5+ MB
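
The `df.info()` output shows `invoice_date` stored as `object` and `customer_id` as `float64`. Time-based analyses such as revenue trends and cohorts need a real datetime column, and an integer ID is easier to read. The sketch below shows one possible conversion on a two-row stand-in frame whose values are copied from `df.head()`. The helper name `fix_dtypes` is hypothetical, and the integer cast assumes `customer_id` has no missing values, which the non-null counts above suggest.

```python
import pandas as pd

# Stand-in frame with the same dtypes that df.info() reports for these two columns.
sample = pd.DataFrame({
    "customer_id": [13085.0, 13085.0],
    "invoice_date": ["2009-12-01 07:45:00", "2009-12-01 07:45:00"],
})


def fix_dtypes(frame):
    """Return a copy with invoice_date parsed as datetime and customer_id cast to int64."""
    out = frame.copy()
    out["invoice_date"] = pd.to_datetime(out["invoice_date"])
    # Assumption: no missing customer IDs, otherwise this cast raises an error.
    out["customer_id"] = out["customer_id"].astype("int64")
    return out


sample = fix_dtypes(sample)
print(sample.dtypes)
```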


In [31]:
df.describe().T

Unnamed: 0,count,mean,std,min,25%,50%,75%,max
invoice,779495.0,537427.005391,26901.96111,489434.0,514483.0,536754.0,562002.0,581587.0
quantity,779495.0,13.507085,146.540284,1.0,2.0,6.0,12.0,80995.0
price,779495.0,3.218199,29.674823,0.0,1.25,1.95,3.75,10953.5
customer_id,779495.0,15320.262918,1695.722988,12346.0,13971.0,15246.0,16794.0,18287.0
total_price,779495.0,22.289821,227.416962,0.0,4.95,12.48,19.8,168469.6


In [32]:
df.nunique()

invoice         36975
stockcode        4631
description      5283
quantity          438
price             666
customer_id      5881
country            41
invoice_date    34591
total_price      3735
dtype: int64

* The dataset contains 779,495 rows, 9 columns, and 5,881 unique customers across 36,975 invoices.
* Average revenue per row is about 22.29. Each row is a single invoice line item, so this is not the average per invoice.
* No column has missing values after cleaning.
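
The summary figures above can be rechecked in a few lines. The sketch below is one way to do it, assuming that an invoice's value is the sum of `total_price` over its rows. It separates the per-row mean from the per-invoice mean, since the two differ whenever an invoice has more than one line item. The toy frame and the helper name `summarize` are illustrative only; its second invoice and customer are made up.

```python
import pandas as pd

# Toy frame: two line items on one invoice, one line item on another (illustrative values).
toy = pd.DataFrame({
    "invoice": [489434, 489434, 489435],
    "customer_id": [13085, 13085, 13078],
    "total_price": [83.4, 81.0, 30.0],
})


def summarize(tx):
    """Return row, customer and invoice counts plus per-row and per-invoice mean revenue."""
    per_invoice = tx.groupby("invoice")["total_price"].sum()
    return {
        "rows": len(tx),
        "customers": tx["customer_id"].nunique(),
        "invoices": tx["invoice"].nunique(),
        "mean_per_row": tx["total_price"].mean(),
        "mean_per_invoice": per_invoice.mean(),
        "missing_values": int(tx.isna().sum().sum()),
    }


summary = summarize(toy)
print(summary)
```

Calling `summarize(df)` on the full dataset should reproduce the row, customer, and invoice counts reported by `df.info()` and `df.nunique()`.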