
In [None]:
import pandas as pd
import kagglehub
from kagglehub import KaggleDatasetAdapter

## What question(s) are you trying to answer?

- Do certain age groups, genders, countries, or cities have higher or lower churn rates?
- How do membership duration and login frequency correlate with customer churn? Are long-term members less likely to churn?
- Do metrics like average session duration, pages per session, cart abandonment rate, and wishlist items indicate a higher propensity to churn?
- Are email open rates, customer service calls, product reviews written, and social media engagement scores related to churn?
- How do total purchases, average order value, days since last purchase, discount usage, returns rate, payment method diversity, lifetime value, and credit balance influence churn?
- Does mobile app usage play a role in customer retention or churn?
- Can we build a predictive model to identify customers at high risk of churning based on their behavioral and demographic data?
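The segment questions above reduce to comparing the mean of the 0/1 `Churned` flag across groups. A minimal sketch, assuming `Churned` is coded 0/1 as in the dataset; the helper `churn_rate_by`, the toy rows and the age-band edges are hypothetical and only illustrate the pattern. On the real data, pass `ecommerce_df` instead of `toy`. Rows with a missing `Age` fall out of the age-band grouping, because `groupby` drops NaN keys by default.

```python
import pandas as pd


def churn_rate_by(df: pd.DataFrame, column: str) -> pd.DataFrame:
    """Churn rate (mean of the 0/1 `Churned` flag) and customer count per value of `column`."""
    return (
        df.groupby(column, observed=True)["Churned"]
        .agg(churn_rate="mean", customers="count")
        .sort_values("churn_rate", ascending=False)
    )


# Hypothetical toy rows that reuse the real column names; NOT actual dataset values.
toy = pd.DataFrame({
    "Age": [22, 35, 41, 58, 63, 29],
    "Gender": ["Male", "Female", "Female", "Male", "Female", "Male"],
    "Churned": [1, 0, 0, 1, 1, 0],
})

# Bin ages into illustrative bands (edges are an assumption, not from the source).
toy["Age_Group"] = pd.cut(toy["Age"], bins=[0, 30, 45, 60, 120],
                          labels=["<=30", "31-45", "46-60", "61+"])

print(churn_rate_by(toy, "Gender"))
print(churn_rate_by(toy, "Age_Group"))
```

The same helper covers `Country`, `City` or a banded `Membership_Years` column, which keeps every segment comparison on one consistent definition of churn rate.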

## What are the expected outcomes?

From this dataset, we can anticipate several key outcomes:

- Identification of Key Churn Drivers: We expect to pinpoint specific demographic, behavioral, and transactional factors that strongly correlate with customer churn. This includes understanding how age, gender, country, login frequency, session duration, purchase history, and engagement metrics influence whether a customer churns.
- Customer Segmentation: We should be able to identify distinct customer segments with varying churn probabilities. For instance, certain age groups or customers from specific regions might exhibit different churn behaviors.
- Behavioral Patterns Leading to Churn: Insights into the specific actions, or the absence of them, that precede churn.
- Impact of Customer Service and Engagement: We can analyze the role of customer service interactions, product reviews, and social media engagement in customer retention.
- Actionable Recommendations: Ultimately, the analysis should lead to concrete, data-driven recommendations for improving customer retention, such as targeted marketing campaigns, product improvements, or enhanced customer support.
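A first pass at identifying churn drivers is to correlate each numeric column with the 0/1 `Churned` flag; for a binary target, the Pearson correlation computed by pandas' `DataFrame.corrwith` is the point-biserial correlation. A minimal sketch; `rank_churn_correlations` is a hypothetical helper and the toy values are made up. Treat the ranking as a screen only: it misses non-linear effects and interactions and says nothing about causation.

```python
import pandas as pd


def rank_churn_correlations(df: pd.DataFrame, target: str = "Churned") -> pd.Series:
    """Correlation of each numeric column with the 0/1 target, strongest (by absolute value) first.
    pandas drops NaNs pairwise when computing each correlation."""
    numeric = df.select_dtypes(include="number").drop(columns=[target])
    corr = numeric.corrwith(df[target])
    return corr.reindex(corr.abs().sort_values(ascending=False).index)


# Hypothetical toy data using real column names; the values are invented for illustration.
toy = pd.DataFrame({
    "Login_Frequency": [25, 3, 18, 2, 22, 5],
    "Days_Since_Last_Purchase": [4, 60, 10, 75, 7, 50],
    "Customer_Service_Calls": [1, 3, 2, 2, 1, 3],
    "Churned": [0, 1, 0, 1, 0, 1],
})

print(rank_churn_correlations(toy))
```

The sign tells the direction: here fewer logins and a longer gap since the last purchase go with churn. On the real data, `rank_churn_correlations(ecommerce_df)` gives a starting list of candidate drivers to examine more carefully.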

## Dataset Overview

The dataset used in this project is the "E-commerce Customer Behavior Dataset" from Kaggle. It contains attributes describing customer characteristics and interactions on an e-commerce platform. The goal is to identify patterns, understand customer churn, and derive actionable insights.

### Key Features:
- **Demographic Information:** Age, Gender, Country, City.
- **Membership Details:** Membership Years, Login Frequency.
- **Website Interaction:** Session Duration (Avg), Pages Per Session, Cart Abandonment Rate, Wishlist Items.
- **Engagement & Communication:** Email Open Rate, Customer Service Calls, Product Reviews Written, Social Media Engagement Score.
- **Transactional Information:** Total Purchases, Average Order Value, Days Since Last Purchase, Payment Method Diversity, Lifetime Value, Credit Balance.
- **Mobile App Usage:** Mobile App Usage.
- **Churn Status:** A binary indicator (`Churned`) representing whether a customer has churned.
- **Signup Details:** Signup Quarter.

The dataset provides a rich foundation for exploring customer segmentation, predicting churn, and optimizing marketing strategies.
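The feature groups listed above can be kept as a mapping from group name to column names. This keeps later group-wise analysis consistent and doubles as a schema check after loading. A minimal sketch: `FEATURE_GROUPS` and `missing_columns` are hypothetical names, and the mapping contains only the column names visible in the DataFrame preview and `info()` output. Columns hidden by the truncated output are left out.

```python
import pandas as pd

# Hypothetical mapping of the documented feature groups to column names seen in the
# loaded DataFrame; columns not visible in the truncated output are omitted.
FEATURE_GROUPS = {
    "demographic": ["Age", "Gender", "Country", "City"],
    "membership": ["Membership_Years", "Login_Frequency"],
    "website_interaction": ["Session_Duration_Avg", "Pages_Per_Session",
                            "Cart_Abandonment_Rate", "Wishlist_Items"],
    "engagement": ["Email_Open_Rate", "Customer_Service_Calls",
                   "Product_Reviews_Written", "Social_Media_Engagement_Score"],
    "transactional": ["Total_Purchases", "Average_Order_Value", "Days_Since_Last_Purchase",
                      "Payment_Method_Diversity", "Lifetime_Value", "Credit_Balance"],
    "mobile": ["Mobile_App_Usage"],
    "target": ["Churned"],
    "signup": ["Signup_Quarter"],
}


def missing_columns(df: pd.DataFrame) -> dict[str, list[str]]:
    """Return, for each group with gaps, the expected columns absent from `df`."""
    present = set(df.columns)
    return {
        group: [c for c in cols if c not in present]
        for group, cols in FEATURE_GROUPS.items()
        if any(c not in present for c in cols)
    }


# Usage on a tiny hypothetical frame: only "demographic" and "target" are complete.
toy = pd.DataFrame(columns=["Age", "Gender", "Country", "City", "Churned"])
print(missing_columns(toy))
```

On the real data, `missing_columns(ecommerce_df)` should return an empty dict if these names match the CSV. A non-empty result points to a renamed or dropped column.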

In [None]:
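# Load the churn CSV from the Kaggle dataset directly into a pandas DataFrame
ecommerce_df = kagglehub.load_dataset(
  KaggleDatasetAdapter.PANDAS,
  "dhairyajeetsingh/ecommerce-customer-behavior-dataset",  # Kaggle dataset handle
  "ecommerce_customer_churn_dataset.csv"                   # file inside the dataset
)
ecommerce_df.head(5)  # preview the first five rows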


Downloading from https://www.kaggle.com/api/v1/datasets/download/dhairyajeetsingh/ecommerce-customer-behavior-dataset?dataset_version_number=1&file_name=ecommerce_customer_churn_dataset.csv...


100%|██████████| 5.83M/5.83M [00:01<00:00, 3.71MB/s]


| | Age | Gender | Country | City | Membership_Years | Login_Frequency | Session_Duration_Avg | Pages_Per_Session | Cart_Abandonment_Rate | Wishlist_Items | ... | Email_Open_Rate | Customer_Service_Calls | Product_Reviews_Written | Social_Media_Engagement_Score | Mobile_App_Usage | Payment_Method_Diversity | Lifetime_Value | Credit_Balance | Churned | Signup_Quarter |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 43.0 | Male | France | Marseille | 2.9 | 14.0 | 27.4 | 6.0 | 50.6 | 3.0 | ... | 17.9 | 9.0 | 4.0 | 16.3 | 20.8 | 1.0 | 953.33 | 2278.0 | 0 | Q1 |
| 1 | 36.0 | Male | UK | Manchester | 1.6 | 15.0 | 42.7 | 10.3 | 37.7 | 1.0 | ... | 42.8 | 7.0 | 3.0 | NaN | 23.3 | 3.0 | 1067.47 | 3028.0 | 0 | Q4 |
| 2 | 45.0 | Female | Canada | Vancouver | 2.9 | 10.0 | 24.8 | 1.6 | 70.9 | 1.0 | ... | 0.0 | 4.0 | 1.0 | NaN | 8.8 | NaN | 1289.75 | 2317.0 | 0 | Q4 |
| 3 | 56.0 | Female | USA | New York | 2.6 | 10.0 | 38.4 | 14.8 | 41.7 | 9.0 | ... | 41.4 | 2.0 | 5.0 | 85.9 | 31.0 | 3.0 | 2340.92 | 2674.0 | 0 | Q1 |
| 4 | 35.0 | Male | India | Delhi | 3.1 | 29.0 | 51.4 | NaN | 19.1 | 9.0 | ... | 37.9 | 1.0 | 11.0 | 83.0 | 50.4 | 4.0 | 3041.29 | 5354.0 | 0 | Q4 |
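The preview already contains missing cells (for example in `Social_Media_Engagement_Score` and `Payment_Method_Diversity`), so it is worth quantifying missingness per column before any modelling. A minimal sketch: `missing_summary` is a hypothetical helper, and the toy frame copies only a few columns from the five preview rows, not the full dataset. On the real data, call `missing_summary(ecommerce_df)`.

```python
import numpy as np
import pandas as pd


def missing_summary(df: pd.DataFrame) -> pd.DataFrame:
    """Count and percentage of missing values per column, most-missing first.
    Columns with no missing values are dropped from the result."""
    counts = df.isna().sum()
    summary = pd.DataFrame({
        "missing": counts,
        "missing_pct": (counts * 100 / len(df)).round(1),
    })
    return summary[summary["missing"] > 0].sort_values("missing", ascending=False)


# Values copied from the five preview rows (a tiny subset, not the whole dataset).
toy = pd.DataFrame({
    "Pages_Per_Session": [6.0, 10.3, 1.6, 14.8, np.nan],
    "Social_Media_Engagement_Score": [16.3, np.nan, np.nan, 85.9, 83.0],
    "Payment_Method_Diversity": [1.0, 3.0, np.nan, 3.0, 4.0],
    "Churned": [0, 0, 0, 0, 0],
})
print(missing_summary(toy))
```

Whether to impute, flag or drop each gap is a per-column decision. Keeping the percentages in one table makes that decision explicit.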


In [None]:
ecommerce_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 50000 entries, 0 to 49999
Data columns (total 25 columns):
 #   Column                         Non-Null Count  Dtype  
---  ------                         --------------  -----  
 0   Age                            47505 non-null  float64
 1   Gender                         50000 non-null  object 
 2   Country                        50000 non-null  object 
 3   City                           50000 non-null  object 
 4   Membership_Years               50000 non-null  float64
 5   Login_Frequency                50000 non-null  float64
 6   Session_Duration_Avg           46601 non-null  float64
 7   Pages_Per_Session              47000 non-null  float64
 8   Cart_Abandonment_Rate          50000 non-null  float64
 9   Wishlist_Items                 46000 non-null  float64
 10  Total_Purchases                50000 non-null  float64
 11  Average_Order_Value            50000 non-null  float64
 12  Days_Since_Last_Purchase       47000 non-null 