# Customer Behaviour Analysis

Customer Behaviour Analysis is the process of examining how customers interact with a business, product, or service. It helps organizations make informed decisions, tailor their strategies, and improve customer experiences.

Here is a process we can follow for Customer Behaviour Analysis:

- Collect data related to customer interactions. It can include purchase history, website visits, social media engagement, customer feedback, and more.
- Identify and address data inconsistencies, missing values, and outliers to ensure the data’s quality and accuracy.
- Calculate basic statistics like mean, median, and standard deviation to summarize data.
- Create visualizations such as histograms, scatter plots, and bar charts to explore trends, patterns, and anomalies in the data.
- Use techniques like clustering to group customers based on common behaviours or characteristics.
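As a rough illustration of the cleaning step in the list above, here is a minimal pandas sketch. It assumes numeric behaviour columns like the ones used later in this notebook; the helper name `clean_customer_data`, the median fill, and the 1.5 × IQR outlier rule are illustrative choices, not part of the original analysis.

```python
import pandas as pd


def clean_customer_data(df: pd.DataFrame, numeric_cols: list[str]) -> pd.DataFrame:
    """Hypothetical helper: drop duplicate rows, fill missing numbers, flag outliers."""
    cleaned = df.drop_duplicates().copy()

    for col in numeric_cols:
        # Fill missing numeric values with the column median
        cleaned[col] = cleaned[col].fillna(cleaned[col].median())

        # Flag (but keep) values outside 1.5 * IQR beyond the quartiles
        q1, q3 = cleaned[col].quantile([0.25, 0.75])
        iqr = q3 - q1
        lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
        cleaned[f"{col}_outlier"] = ~cleaned[col].between(lower, upper)

    return cleaned


# Toy data for illustration only (not the dataset used below)
toy = pd.DataFrame({
    "User_ID": [1, 2, 2, 3, 4, 5],
    "Total_Pages_Viewed": [10, 12, 12, None, 11, 300],
})
print(clean_customer_data(toy, ["Total_Pages_Viewed"]))
```

Flagging rather than deleting outliers keeps the decision visible; whether an extreme value is an error or a genuinely heavy user depends on the business context.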

Now let's get started by importing the necessary libraries:

```python
import pandas as pd
import plotly.express as px
import plotly.graph_objects as go
```


```python
# Read the data
data = pd.read_csv("ecommerce_customer_data.csv")
```

```python
data.head()
```

|   | User_ID | Gender | Age | Location | Device_Type | Product_Browsing_Time | Total_Pages_Viewed | Items_Added_to_Cart | Total_Purchases |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | Female | 23 | Ahmedabad | Mobile | 60 | 30 | 1 | 0 |
| 1 | 2 | Male | 25 | Kolkata | Tablet | 30 | 38 | 9 | 4 |
| 2 | 3 | Male | 32 | Bangalore | Desktop | 37 | 13 | 5 | 0 |
| 3 | 4 | Male | 35 | Delhi | Mobile | 7 | 20 | 10 | 3 |
| 4 | 5 | Male | 27 | Bangalore | Tablet | 35 | 20 | 8 | 2 |


```python
data.info()
```

```
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 500 entries, 0 to 499
Data columns (total 9 columns):
 #   Column                 Non-Null Count  Dtype 
---  ------                 --------------  ----- 
 0   User_ID                500 non-null    int64 
 1   Gender                 500 non-null    object
 2   Age                    500 non-null    int64 
 3   Location               500 non-null    object
 4   Device_Type            500 non-null    object
 5   Product_Browsing_Time  500 non-null    int64 
 6   Total_Pages_Viewed     500 non-null    int64 
 7   Items_Added_to_Cart    500 non-null    int64 
 8   Total_Purchases        500 non-null    int64 
dtypes: int64(6), object(3)
memory usage: 35.3+ KB
```


Before moving forward, let's look at the summary statistics for both the numerical and categorical columns in the dataset:

```python
# Summary statistics for numerical columns
numeric_summary = data.describe()
numeric_summary
```

|   | User_ID | Age | Product_Browsing_Time | Total_Pages_Viewed | Items_Added_to_Cart | Total_Purchases |
|---|---|---|---|---|---|---|
| count | 500.0 | 500.0 | 500.0 | 500.0 | 500.0 | 500.0 |
| mean | 250.5 | 26.276 | 30.74 | 27.182 | 5.15 | 2.464 |
| std | 144.481833 | 5.114699 | 15.934246 | 13.071596 | 3.203127 | 1.740909 |
| min | 1.0 | 18.0 | 5.0 | 5.0 | 0.0 | 0.0 |
| 25% | 125.75 | 22.0 | 16.0 | 16.0 | 2.0 | 1.0 |
| 50% | 250.5 | 26.0 | 31.0 | 27.0 | 5.0 | 2.0 |
| 75% | 375.25 | 31.0 | 44.0 | 38.0 | 8.0 | 4.0 |
| max | 500.0 | 35.0 | 60.0 | 50.0 | 10.0 | 5.0 |


```python
# Summary statistics for categorical columns
categorical_summary = data.describe(include='object')
categorical_summary
```

|   | Gender | Location | Device_Type |
|---|---|---|---|
| count | 500 | 500 | 500 |
| unique | 2 | 8 | 3 |
| top | Male | Kolkata | Mobile |
| freq | 261 | 71 | 178 |


Let's visualize the distribution of Age in the dataset using a histogram:


```python
fig = px.histogram(data, x='Age', title='Distribution of Age')
fig.show()
```

Now let's see the gender distribution


```python
fig1 = px.histogram(data, x='Gender', title='Gender Distribution')
fig1.show()
```

The same counts can also be computed explicitly and plotted as a bar chart:

```python
# Count customers per gender
gender_count = data['Gender'].value_counts().reset_index()
gender_count.columns = ['Gender', 'Count']

fig2 = px.bar(gender_count, x='Gender', y='Count',
              title='Gender Distribution')
fig2.show()
```

Now let's explore the relationship between product browsing time and total pages viewed:

```python
# trendline='ols' fits an ordinary least-squares line (requires the statsmodels package)
fig = px.scatter(data, x='Product_Browsing_Time',
                 y='Total_Pages_Viewed',
                 title='Product Browsing Time vs. Total Pages Viewed',
                 trendline='ols')
fig.show()
```

The scatter plot above shows no consistent pattern or strong association between the time spent browsing products and the total number of pages viewed. Customers who spend more time on the website do not necessarily explore more pages, which might be due to factors such as website design, content relevance, or individual user preferences.
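To put a number on this visual impression, one option is Pearson's correlation coefficient together with the slope of a least-squares line (the kind of line `trendline='ols'` draws). A minimal sketch, assuming two numeric pandas Series; the helper name `association_summary` is hypothetical and the toy values are made up. On the real data you would pass `data['Product_Browsing_Time']` and `data['Total_Pages_Viewed']`.

```python
import numpy as np
import pandas as pd


def association_summary(x: pd.Series, y: pd.Series) -> dict:
    """Pearson r, R^2, and least-squares slope/intercept for y ~ x."""
    r = x.corr(y)  # Pearson correlation (the pandas default method)
    slope, intercept = np.polyfit(x, y, deg=1)  # degree-1 least-squares fit
    # For a simple linear regression with one predictor, R^2 equals r^2
    return {"pearson_r": r, "r_squared": r ** 2, "slope": slope, "intercept": intercept}


# Toy data for illustration only: y is exactly 2 * x + 1
x = pd.Series([1, 2, 3, 4, 5])
y = 2 * x + 1
print(association_summary(x, y))
```

An r close to 0 (and therefore a small R²) would support the reading that browsing time says little about pages viewed; a value near ±1 would contradict it.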

Now, let's have a look at the average total pages viewed by gender:

```python
# Grouped analysis: average total pages viewed per gender
pages_by_gender = data.groupby('Gender')['Total_Pages_Viewed'].mean().reset_index()
pages_by_gender.columns = ['Gender', 'Average_Total_Pages_Viewed']
pages_by_gender
```

|   | Gender | Average_Total_Pages_Viewed |
|---|---|---|
| 0 | Female | 27.577406 |
| 1 | Male | 26.819923 |


```python
fig = px.bar(pages_by_gender, x='Gender', y='Average_Total_Pages_Viewed',
             title='Average Total Pages Viewed by Gender')
fig.show()
```

Now, let’s have a look at the average total pages viewed by devices:

```python
# Average total pages viewed per device type
devices_grouped = data.groupby('Device_Type')['Total_Pages_Viewed'].mean().reset_index()
devices_grouped.columns = ['Device_Type', 'Average_Total_Pages_Viewed']
fig = px.bar(devices_grouped, x='Device_Type', y='Average_Total_Pages_Viewed',
             title='Average Total Pages Viewed by Devices')
fig.show()
```
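The gender and device breakdowns follow the same pattern, so they can come from one helper that also reports the median and group size; the size helps judge whether a small difference in means is worth reading into. A sketch using pandas named aggregation; the helper name `pages_summary` is hypothetical and the toy frame is illustrative only.

```python
import pandas as pd


def pages_summary(df: pd.DataFrame, by: str) -> pd.DataFrame:
    """Mean, median, and count of Total_Pages_Viewed for each value of column `by`."""
    return (
        df.groupby(by)["Total_Pages_Viewed"]
        .agg(Average_Total_Pages_Viewed="mean",
             Median_Total_Pages_Viewed="median",
             Customers="count")
        .reset_index()
    )


# Toy data for illustration only
toy = pd.DataFrame({
    "Gender": ["Female", "Male", "Female", "Male"],
    "Device_Type": ["Mobile", "Mobile", "Tablet", "Desktop"],
    "Total_Pages_Viewed": [30, 38, 20, 13],
})
print(pages_summary(toy, "Gender"))
print(pages_summary(toy, "Device_Type"))
```

On the notebook's data, `pages_summary(data, 'Gender')` and `pages_summary(data, 'Device_Type')` would replace the two separate groupby cells.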

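Now, let's calculate a customer lifetime value (CLV) score and use it to segment customers. Note that the formula below is a simple engagement-based proxy (purchases × pages viewed ÷ age), not a conventional CLV, which estimates the revenue a customer brings in over their relationship with the business.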

```python
# CLV proxy: purchases weighted by pages viewed, divided by age
data['CLV'] = (data['Total_Purchases'] * data['Total_Pages_Viewed']) / data['Age']
data['CLV']
```

```
0      0.000000
1      6.080000
2      0.000000
3      1.714286
4      1.481481
         ...   
495    0.000000
496    7.083333
497    6.473684
498    4.571429
499    1.290323
Name: CLV, Length: 500, dtype: float64
```
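Written out, with $P_i$ = `Total_Purchases`, $V_i$ = `Total_Pages_Viewed`, and $A_i$ = `Age` for customer $i$, the score computed above is shown below, followed by a check against the customer at index 1 (4 purchases, 38 pages viewed, age 25), which matches the output. Any customer with zero purchases scores exactly 0.

$$
\mathrm{CLV}_i = \frac{P_i \, V_i}{A_i},
\qquad
\mathrm{CLV}_1 = \frac{4 \times 38}{25} = \frac{152}{25} = 6.08
$$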

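```python
# Bin the CLV score into three segments. pd.cut intervals are open on the left,
# so a CLV of 1 or less falls outside every bin and becomes NaN.
data['Segment'] = pd.cut(data['CLV'], bins=[1, 2.5, 5, float('inf')],
                         labels=['Low value', 'Medium value', 'High value'])

segment_counts = data['Segment'].value_counts().reset_index()
segment_counts.columns = ['Segment', 'Count']
segment_counts
```

|   | Segment | Count |
|---|---|---|
| 0 | Low value | 131 |
| 1 | Medium value | 116 |
| 2 | High value | 91 |

The three segments add up to 338 customers, so 162 of the 500 customers (those with a CLV of 1 or less) are not assigned to any segment.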
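With `bins=[1, 2.5, 5, inf]`, `pd.cut` assigns NaN to every CLV of 1 or less, including everyone with zero purchases, and `value_counts()` drops those NaNs by default. If these customers should form their own segment, one option is to start the bins at 0 and pass `include_lowest=True`. The extra `'Very low value'` label and the toy scores below are illustrative assumptions, not part of the original analysis.

```python
import pandas as pd

# Toy CLV scores for illustration only
clv = pd.Series([0.0, 0.8, 1.0, 1.7, 3.2, 6.08])

# Original bins: (1, 2.5], (2.5, 5], (5, inf), so scores <= 1 become NaN
original = pd.cut(clv, bins=[1, 2.5, 5, float('inf')],
                  labels=['Low value', 'Medium value', 'High value'])

# Alternative bins starting at 0; include_lowest=True makes the first bin include 0
extended = pd.cut(clv, bins=[0, 1, 2.5, 5, float('inf')],
                  labels=['Very low value', 'Low value', 'Medium value', 'High value'],
                  include_lowest=True)

print(pd.DataFrame({'CLV': clv, 'original': original, 'extended': extended}))
print(original.isna().sum(), "scores left unassigned by the original bins")
```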


```python
# Create a bar chart to visualize the customer segments
fig = px.bar(segment_counts, x='Segment', y='Count',
             title='Customer Segmentation by CLV')
fig.update_xaxes(title='Segment')
fig.update_yaxes(title='Number of Customers')
fig.show()
```
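These segments come from fixed CLV thresholds. The process list at the start also mentions clustering as a way to group customers by behaviour; below is a minimal k-means (Lloyd's algorithm) sketch in NumPy. Standardizing the columns first and the choice of k are illustrative assumptions; in practice a library implementation such as scikit-learn's `KMeans` would usually be preferred.

```python
import numpy as np


def standardize(X: np.ndarray) -> np.ndarray:
    """Scale each column to zero mean and unit variance (columns must not be constant)."""
    return (X - X.mean(axis=0)) / X.std(axis=0)


def kmeans(X: np.ndarray, k: int, n_iter: int = 100, seed: int = 0):
    """Plain Lloyd's algorithm; returns (labels, centroids)."""
    rng = np.random.default_rng(seed)
    # Initialise with k distinct rows chosen at random
    centroids = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    for _ in range(n_iter):
        # Assign each point to its nearest centroid (squared Euclidean distance)
        dists = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        # Move each centroid to the mean of its points; keep it if its cluster is empty
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids


# Toy behaviour matrix for illustration only: two clearly separated groups
X = np.array([[1.0, 1.0], [1.2, 0.9], [0.9, 1.1],
              [8.0, 8.0], [8.1, 7.9], [7.9, 8.2]])
labels, centroids = kmeans(standardize(X), k=2)
print(labels)
```

On the notebook's data, `X` could be `data[['Product_Browsing_Time', 'Total_Pages_Viewed', 'Items_Added_to_Cart', 'Total_Purchases']].to_numpy(dtype=float)`; both the column choice and k are assumptions worth revisiting, for example by comparing within-cluster variance across several values of k.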

Now, let’s have a look at the conversion funnel of the customers:

```python
# Funnel analysis: total purchases for each (browsing time, items added to cart) pair
funnel_data = data[['Product_Browsing_Time', 'Items_Added_to_Cart', 'Total_Purchases']]
funnel_data = funnel_data.groupby(['Product_Browsing_Time', 'Items_Added_to_Cart']).sum().reset_index()
funnel_data
```

|     | Product_Browsing_Time | Items_Added_to_Cart | Total_Purchases |
|-----|---|---|---|
| 0 | 5 | 2 | 3 |
| 1 | 5 | 3 | 5 |
| 2 | 5 | 6 | 0 |
| 3 | 5 | 7 | 1 |
| 4 | 5 | 8 | 3 |
| ... | ... | ... | ... |
| 337 | 60 | 1 | 0 |
| 338 | 60 | 6 | 0 |
| 339 | 60 | 7 | 5 |
| 340 | 60 | 8 | 10 |


```python
fig = px.funnel(funnel_data, x='Product_Browsing_Time', y='Items_Added_to_Cart',
                title='Conversion Funnel')
fig.show()
```
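The chart above plots browsing time against items added to cart, so its bars are not customer counts per stage. A conventional conversion funnel counts how many customers reach each stage. Below is a minimal sketch under these assumptions: every customer in the data browsed, "added to cart" means `Items_Added_to_Cart > 0`, and "purchased" means a purchase by a customer who also added to cart, which keeps the stages nested. The helper name `funnel_counts` and the stage names are hypothetical.

```python
import pandas as pd


def funnel_counts(df: pd.DataFrame) -> pd.DataFrame:
    """Customers reaching each funnel stage, plus conversion from the previous stage."""
    added = df["Items_Added_to_Cart"] > 0
    browsed = len(df)  # assumption: every customer in the data browsed products
    carted = int(added.sum())
    # Count purchasers only among cart users so each stage is a subset of the previous one
    purchased = int((added & (df["Total_Purchases"] > 0)).sum())

    funnel = pd.DataFrame({
        "Stage": ["Browsed", "Added to cart", "Purchased"],
        "Customers": [browsed, carted, purchased],
    })
    # Share of the previous stage that reached this stage (NaN for the first stage)
    funnel["Conversion_From_Previous"] = funnel["Customers"] / funnel["Customers"].shift(1)
    return funnel


# Toy data for illustration only
toy = pd.DataFrame({
    "Items_Added_to_Cart": [1, 9, 0, 10, 8],
    "Total_Purchases": [0, 4, 0, 3, 2],
})
print(funnel_counts(toy))
```

The result can be drawn with `px.funnel(funnel_counts(data), x='Customers', y='Stage')`.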

Now, let’s have a look at the churn rate of the customers:

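```python
# Calculate churn rate: here a customer counts as "churned" if they made no purchases
data['Churned'] = data['Total_Purchases'] == 0

# The mean of a boolean column is the share of True values, so it lies between 0 and 1
churn_rate = data['Churned'].mean()
print(churn_rate)
```

```
2.464
```

The recorded output, 2.464, cannot be a churn rate computed this way, because a share of customers must lie between 0 and 1. It equals the mean of `Total_Purchases` in the summary statistics above, so the printed value appears to come from a different expression; running this cell as written gives the fraction of customers with zero purchases.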
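Because a zero-purchase flag is boolean, the same mean can be taken per group to see where non-purchasing customers concentrate. A sketch assuming the zero-purchase definition of churn used above; the helper name `churn_by`, the choice of `Device_Type` as the grouping column, and the toy frame are illustrative.

```python
import pandas as pd


def churn_by(df: pd.DataFrame, by: str) -> pd.DataFrame:
    """Churn rate (share of customers with zero purchases) and customer count per group."""
    return (
        df.assign(Churned=df["Total_Purchases"] == 0)
        .groupby(by)["Churned"]
        .agg(Churn_Rate="mean", Customers="count")
        .reset_index()
    )


# Toy data for illustration only
toy = pd.DataFrame({
    "Device_Type": ["Mobile", "Tablet", "Mobile", "Desktop"],
    "Total_Purchases": [0, 4, 0, 3],
})
print(churn_by(toy, "Device_Type"))
```

Groups with only a handful of customers can show extreme rates by chance, which is why the count is reported alongside the rate.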
