# User Drop-off & Funnel Analysis
### E-commerce User Behavior Case Study

# Introduction

Getting users to sign up is only half the battle... Getting them to actually *do something* is where most products struggle.

In this project, we analyze event-level user behavior from an e-commerce platform to understand how users move through the purchase funnel and, more importantly, where they drop off along the way. The goal is not just to count conversions, but to identify friction points in the user journey and understand what might be stopping users from moving forward.

We construct a clear conversion funnel, measure drop-offs at each stage, and analyze how long users take to progress between actions. The emphasis is on turning raw behavioral data into insights that a product or business team could realistically act on.

Given the size of the dataset (millions of events), the analysis is intentionally scoped to a manageable sample. This keeps the focus on patterns and decision-making rather than on processing the full volume of data.

## LET'S DO IT!!!!
![Funny gif](https://media.giphy.com/media/v1.Y2lkPTc5MGI3NjExN2NyZGluMDJlYzdkeWQ4YmFjdjE4bmZrZ2R1OHVsZXRhdmtxNnhmNSZlcD12MV9zdGlja2Vyc19zZWFyY2gmY3Q9cw/hiJ9ypGI5tIKdwKoK2/giphy.gif)

### Data Loading and Preparation

- The dataset used in this analysis is large and comes split across multiple files. Loading everything at once would be unnecessary, so the data loading process is scoped deliberately.

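- A single month of data (October 2019) is selected. Only the columns required for funnel analysis (`user_id`, `event_type`, `event_time`) are loaded, and a row limit (`nrows=500_000`) keeps the dataset lightweight while still large enough to capture meaningful patterns. Because `nrows` reads rows from the top of the file, and the file appears to be ordered by time, the sample covers the earliest events of the month rather than a random draw from across October.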
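The effect of `usecols` and `nrows` can be checked on a tiny in-memory CSV before touching the real file. This sketch uses made-up rows and an extra `product_id` column purely for illustration; it shows that unlisted columns are dropped at parse time, that the result keeps the file's column order rather than the order given in `usecols`, and that `nrows` keeps rows from the top of the file.

```python
import io

import pandas as pd

# Toy CSV standing in for a monthly events file (hypothetical rows, not the real dataset)
csv_text = """event_time,event_type,product_id,user_id
2019-10-01 00:00:00 UTC,view,1,101
2019-10-01 00:00:05 UTC,view,2,102
2019-10-01 00:01:00 UTC,cart,1,101
2019-10-01 00:02:00 UTC,purchase,1,101
2019-10-01 00:03:00 UTC,view,3,103
"""

# usecols drops unlisted columns while parsing; nrows stops after the first N data rows
sample = pd.read_csv(
    io.StringIO(csv_text),
    usecols=["user_id", "event_type", "event_time"],
    nrows=3,
)

print(sample.columns.tolist())  # columns come back in file order
print(sample["event_type"].tolist())  # only the first three rows are read
```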


In [3]:
# Importing dependencies
import kagglehub
from kagglehub import KaggleDatasetAdapter

file_path = "2019-Oct.csv"

# Loading dataset with pandas kwargs
df = kagglehub.load_dataset(
    KaggleDatasetAdapter.PANDAS,
    "mkechinov/ecommerce-behavior-data-from-multi-category-store",
    file_path,
    pandas_kwargs={
        "usecols": ["user_id", "event_type", "event_time"],
        "nrows": 500_000   # read only the first 500,000 rows to keep the sample lightweight
    }
)



In [4]:
print(df.head())
print(df.shape)

                event_time event_type    user_id
0  2019-10-01 00:00:00 UTC       view  541312140
1  2019-10-01 00:00:00 UTC       view  554748717
2  2019-10-01 00:00:01 UTC       view  519107250
3  2019-10-01 00:00:01 UTC       view  550050854
4  2019-10-01 00:00:04 UTC       view  535871217
(500000, 3)


### Filtering Funnel Events

At this stage, the data may still contain user actions that are not directly relevant to conversion analysis. Since the goal is to understand how users move through the purchase funnel, the data is filtered to retain only the core funnel events: product views, cart additions, and purchases. In this 500,000-row sample, those three event types already account for every row (the event counts below sum to exactly 500,000), so the filter mainly acts as a safeguard.

Event timestamps are then converted into a datetime format to allow for proper chronological ordering. This ensures that user actions can be accurately sequenced and that delays between funnel stages can be measured.

In [7]:
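import pandas as pd
import numpy as np

# Keep only funnel-related events; .copy() avoids pandas' SettingWithCopyWarning on the next assignment
df = df[df["event_type"].isin(["view", "cart", "purchase"])].copy()
df["event_time"] = pd.to_datetime(df["event_time"])  # Convert timestamp strings to datetimes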

df.head()


|   | event_time | event_type | user_id |
|---|---|---|---|
| 0 | 2019-10-01 00:00:00+00:00 | view | 541312140 |
| 1 | 2019-10-01 00:00:00+00:00 | view | 554748717 |
| 2 | 2019-10-01 00:00:01+00:00 | view | 519107250 |
| 3 | 2019-10-01 00:00:01+00:00 | view | 550050854 |
| 4 | 2019-10-01 00:00:04+00:00 | view | 535871217 |


### Event Distribution

Before constructing the funnel, it is important to understand the overall distribution of events in the dataset. This helps validate assumptions about user behavior and provides early signals about where major drop-offs might occur.

In [8]:
# Distribution of funnel events
event_counts = df["event_type"].value_counts().reset_index()
event_counts.columns = ["event_type", "count"]
event_counts

|   | event_type | count |
|---|---|---|
| 0 | view | 481833 |
| 1 | purchase | 9758 |
| 2 | cart | 8409 |


In [9]:
event_counts["percentage"] = (event_counts["count"] / event_counts["count"].sum() * 100)
event_counts

|   | event_type | count | percentage |
|---|---|---|---|
| 0 | view | 481833 | 96.3666 |
| 1 | purchase | 9758 | 1.9516 |
| 2 | cart | 8409 | 1.6818 |


## Funnel Construction Approach 

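Event-level counts alone do not represent how users progress through the funnel. In this sample, purchase events (9,758) even outnumber cart events (8,409), so raw event counts cannot be read as stage-to-stage progression.  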
To accurately measure drop-offs, the funnel must be constructed at the **user level**, ensuring that each user contributes at most once to each funnel stage.

For this analysis:
- Only the first occurrence of each funnel event per user is considered
- Events are ordered chronologically
- Funnel progression is evaluated in sequence: View → Cart → Purchase

This approach prevents inflated counts and reflects true user movement through the funnel.
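The difference between event-level and user-level counting is easiest to see on a toy log. In this hypothetical example, user 1 views three times and adds to cart twice, while user 2 only views; keeping the first occurrence per user and event type (the same `sort_values` plus `drop_duplicates` pattern used in this notebook) collapses repeats to one count per user per stage.

```python
import pandas as pd

# Hypothetical toy log: user 1 views three times, carts twice and purchases once; user 2 views twice
events = pd.DataFrame({
    "user_id": [1, 1, 1, 1, 1, 1, 2, 2],
    "event_type": ["view", "view", "cart", "view", "cart", "purchase", "view", "view"],
    "event_time": pd.to_datetime([
        "2019-10-01 00:00", "2019-10-01 00:01", "2019-10-01 00:02",
        "2019-10-01 00:03", "2019-10-01 00:04", "2019-10-01 00:05",
        "2019-10-01 00:00", "2019-10-01 00:06",
    ]),
})

# Event-level: every row counts, so repeated actions inflate each stage
event_level = events["event_type"].value_counts()

# User-level: keep only each user's first occurrence of each event type
first_events = (
    events.sort_values(["user_id", "event_time"])
    .drop_duplicates(subset=["user_id", "event_type"], keep="first")
)
user_level = first_events["event_type"].value_counts()

print(event_level.to_dict())
print(user_level.to_dict())
```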


In [10]:
df_sorted = df.sort_values(["user_id", "event_time"]) # Sorting the events by user and time
# Keep first occurrence of each event per user
df_first_events = (df_sorted.drop_duplicates(subset=["user_id", "event_type"], keep="first"))
df_first_events.head()

|   | event_time | event_type | user_id |
|---|---|---|---|
| 387798 | 2019-10-01 08:47:35+00:00 | view | 244951053 |
| 1150 | 2019-10-01 01:32:09+00:00 | view | 306441847 |
| 187167 | 2019-10-01 05:55:19+00:00 | view | 321655812 |
| 459944 | 2019-10-01 09:46:41+00:00 | view | 330585300 |
| 337529 | 2019-10-01 08:06:33+00:00 | view | 332550649 |


### Funnel Stages

The funnel is defined using the following stages:

1. Product View  
2. Add to Cart  
3. Purchase  

Users are considered to have progressed through a stage only if they have completed all previous stages in the defined order.
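Writing $U_i$ for the number of users counted at stage $i$ (1 = View, 2 = Add to Cart, 3 = Purchase), the `drop_off` and `drop_off_percentage` columns computed in this notebook correspond to the expressions below; the symbols $U_i$, $D_i$ and $P_i$ are introduced here only for illustration.

$$
D_i = U_{i-1} - U_i, \qquad P_i = \frac{U_{i-1} - U_i}{U_{i-1}} \times 100, \qquad i = 2, 3
$$

Stage 1 has no predecessor, which is why its drop-off cells have no value (NaN). When stages are counted strictly in sequence, every user at stage $i$ is also counted at stage $i-1$, so $U_i \le U_{i-1}$ and a negative $D_i$ signals that stages were counted independently. For the sequential view-to-cart step, $P_2 = (89{,}108 - 4{,}438) / 89{,}108 \times 100 \approx 95.02$.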


In [11]:
# Count unique users at each funnel stage
funnel_counts = {
    "view": df_first_events[df_first_events["event_type"] == "view"]["user_id"].nunique(),
    "cart": df_first_events[df_first_events["event_type"] == "cart"]["user_id"].nunique(),
    "purchase": df_first_events[df_first_events["event_type"] == "purchase"]["user_id"].nunique()
}

funnel_df = (pd.DataFrame.from_dict(funnel_counts, orient="index", columns=["users"]).reset_index().rename(columns={"index": "funnel_stage"}))

# Calculate drop-offs
funnel_df["drop_off"] = funnel_df["users"].shift(1) - funnel_df["users"]
funnel_df["drop_off_percentage"] = funnel_df["drop_off"] / funnel_df["users"].shift(1) * 100

funnel_df



|   | funnel_stage | users | drop_off | drop_off_percentage |
|---|---|---|---|---|
| 0 | view | 89108 | NaN | NaN |
| 1 | cart | 4441 | 84667.0 | 95.01616 |
| 2 | purchase | 7362 | -2921.0 | -65.773474 |


### Enforcing Funnel Order

A valid funnel requires users to complete each stage in sequence.  
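Simply counting users who performed each event independently can lead to misleading results, especially when events occur across different sessions or when some steps are skipped in the logs. The independent counts above illustrate this: 7,362 users purchased while only 4,441 added to cart, yielding a negative drop-off of about -65.8%.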

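To address this, funnel progression is enforced sequentially:
- Users must have a recorded product view to be counted as cart users
- Users must have a recorded cart addition to be counted as purchasers

This ensures that each funnel stage represents progression through the earlier stages rather than isolated event participation. The check implemented below is on presence (each earlier stage has a first-event timestamp); chronological order between stages is examined separately in the time-to-event analysis.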
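A check that only requires each earlier stage to have a timestamp still counts a user whose first cart precedes their first view. A stricter variant also requires each stage's first timestamp to be no earlier than the previous one. The sketch below assumes a hypothetical user-level table shaped like a pivot of first-event timestamps (one row per user, one column per stage, NaT where the event is missing); `ordered_funnel` is an illustrative helper, not part of pandas.

```python
import pandas as pd

# Hypothetical user-level table: first timestamp per funnel stage, NaT where missing
user_events = pd.DataFrame(
    {
        "user_id": [1, 2, 3, 4],
        "view": pd.to_datetime(["2019-10-01 00:00", "2019-10-01 00:10", "2019-10-01 00:00", None]),
        "cart": pd.to_datetime(["2019-10-01 00:02", "2019-10-01 00:05", None, "2019-10-01 00:01"]),
        "purchase": pd.to_datetime(["2019-10-01 00:04", None, None, "2019-10-01 00:03"]),
    }
).set_index("user_id")


def ordered_funnel(table, stages):
    """Count users who reached each stage no earlier than the previous one."""
    reached = table[stages[0]].notna()
    counts = {stages[0]: int(reached.sum())}
    for prev, curr in zip(stages, stages[1:]):
        # Comparisons involving NaT evaluate to False, so missing stages drop out automatically
        reached = reached & (table[curr] >= table[prev])
        counts[curr] = int(reached.sum())
    return counts


# Presence-only check for comparison: view and cart both recorded, in any order
presence_cart = int((user_events["view"].notna() & user_events["cart"].notna()).sum())

print(ordered_funnel(user_events, ["view", "cart", "purchase"]))  # {'view': 3, 'cart': 1, 'purchase': 1}
print(presence_cart)  # 2
```

User 2 carted before their first view, so the presence check counts them at the cart stage while the ordered check does not. Ties (`>=`) are kept because events logged at one-second resolution can share a timestamp.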


In [13]:
# Create user-level event presence table
user_events = (
    df_first_events
    .pivot(index="user_id", columns="event_type", values="event_time")
)

user_events.head()


| user_id | cart | purchase | view |
|---|---|---|---|
| 244951053 | NaT | NaT | 2019-10-01 08:47:35+00:00 |
| 306441847 | NaT | NaT | 2019-10-01 01:32:09+00:00 |
| 321655812 | NaT | NaT | 2019-10-01 05:55:19+00:00 |
| 330585300 | NaT | NaT | 2019-10-01 09:46:41+00:00 |
| 332550649 | NaT | NaT | 2019-10-01 08:06:33+00:00 |


In [14]:
# Users who viewed
view_users = user_events[user_events["view"].notna()]

# Users who viewed AND carted
cart_users = view_users[view_users["cart"].notna()]

# Users who viewed, carted AND purchased
purchase_users = cart_users[cart_users["purchase"].notna()]


In [15]:
funnel_fixed = pd.DataFrame({
    "funnel_stage": ["view", "cart", "purchase"],
    "users": [
        view_users.shape[0],
        cart_users.shape[0],
        purchase_users.shape[0]
    ]
})

funnel_fixed["drop_off"] = funnel_fixed["users"].shift(1) - funnel_fixed["users"]
funnel_fixed["drop_off_percentage"] = (
    funnel_fixed["drop_off"] / funnel_fixed["users"].shift(1) * 100
)

funnel_fixed



|   | funnel_stage | users | drop_off | drop_off_percentage |
|---|---|---|---|---|
| 0 | view | 89108 | NaN | NaN |
| 1 | cart | 4438 | 84670.0 | 95.019527 |
| 2 | purchase | 2753 | 1685.0 | 37.967553 |


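Counting each stage independently overstated progression at the purchase stage (7,362 purchasers against 4,441 cart users). Enforcing sequential funnel logic leaves the view-to-cart drop-off essentially unchanged at about 95%, but corrects the purchase stage to 2,753 users, a 38.0% drop-off from cart.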

## Funnel Insights

The funnel analysis reveals a significant drop-off at the earliest stage of the user journey. While a large number of users view products (89,108), only about 5% of them (4,438) go on to add an item to their cart.

Once users add a product to the cart, the likelihood of completing a purchase increases substantially: about 62% of cart users (2,753 of 4,438) go on to purchase. This suggests that the primary challenge lies in converting initial interest into purchase intent, rather than in the checkout process itself.

In other words, the funnel leaks heavily at the top, but performs relatively well once users show strong intent.
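The step and overall conversion rates behind these statements can be recomputed directly from the user counts in the sequential funnel table above. This is a plain-Python sketch; `funnel_users` is a hypothetical name holding those three counts.

```python
# User counts per stage, taken from the sequential funnel table
funnel_users = {"view": 89_108, "cart": 4_438, "purchase": 2_753}

stages = list(funnel_users)

# Step conversion: share of the previous stage's users who reached this stage
step_conversion = {
    curr: funnel_users[curr] / funnel_users[prev] * 100
    for prev, curr in zip(stages, stages[1:])
}

# Overall conversion: share of top-of-funnel users who reached the final stage
overall_conversion = funnel_users["purchase"] / funnel_users["view"] * 100

for stage, pct in step_conversion.items():
    print(f"{stage}: {pct:.2f}%")  # cart: 4.98%, purchase: 62.03%
print(f"overall: {overall_conversion:.2f}%")  # overall: 3.09%
```

Because every purchaser is also a cart user and a viewer, the overall rate equals the product of the step rates (roughly 4.98% × 62.03% ≈ 3.09%), which makes it easy to see that the top step dominates the overall loss.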


### Time-to-Event Analysis
To better understand the friction observed between product views and cart additions, we analyze the time taken by users to progress between these stages. Longer delays may indicate hesitation, uncertainty, or lack of sufficient information at the product view stage.

In [16]:
# Users who viewed and carted
view_cart_users = cart_users.copy()

# Time taken from view to cart
view_cart_users["view_to_cart_time"] = (
    view_cart_users["cart"] - view_cart_users["view"]
).dt.total_seconds() / 60  # minutes

view_cart_users["view_to_cart_time"].describe()


count    4438.000000
mean       20.495186
std        55.772174
min       -47.966667
25%         0.583333
50%         1.800000
75%         8.445833
max       588.150000
Name: view_to_cart_time, dtype: float64

In [17]:
# Remove negative or zero time differences
view_cart_users = view_cart_users[
    view_cart_users["view_to_cart_time"] > 0
]

view_cart_users["view_to_cart_time"].describe()


count    4429.000000
mean       20.549048
std        55.811994
min         0.050000
25%         0.583333
50%         1.800000
75%         8.516667
max       588.150000
Name: view_to_cart_time, dtype: float64

## Time-to-Event Insights

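- The majority of users who add items to their cart do so quickly after viewing a product. The median time from a user's first product view to their first cart addition is 1.8 minutes, suggesting that users who convert often make fast, confident decisions. (Only each user's first view and first cart event are compared, and product identifiers are not among the loaded columns, so the two events need not involve the same product.)

- However, the distribution shows a long tail: the mean (about 20.5 minutes) sits far above the median, the 75th percentile is about 8.5 minutes, and the maximum is 588 minutes, nearly 10 hours. This indicates hesitation or delayed decision-making, potentially due to comparison behavior, uncertainty about pricing, or lack of sufficient product information.

- A small number of zero or negative time differences were observed (the minimum was about -48 minutes), likely caused by events occurring across different sessions or outside the sampled time window. These 9 cases (4,438 users before filtering, 4,429 after) were excluded to ensure only valid funnel progressions were analyzed.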
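One way a negative difference can arise under first-occurrence logic, sketched on hypothetical data: if the view that led to a cart happened before the sample starts, the user's first *in-sample* view can come after their first cart. The event times, `window_start`, and the helper `view_to_cart_minutes` are all illustrative assumptions.

```python
import pandas as pd

# Hypothetical events for one user: the view that led to the cart happened at 08:20,
# and the user viewed again at 09:00
events = pd.DataFrame(
    {
        "user_id": [7, 7, 7],
        "event_type": ["view", "cart", "view"],
        "event_time": pd.to_datetime(["2019-10-01 08:20", "2019-10-01 08:30", "2019-10-01 09:00"]),
    }
)


def view_to_cart_minutes(ev):
    """Minutes from each user's first view to their first cart event."""
    first = (
        ev.sort_values("event_time")
        .drop_duplicates(subset=["user_id", "event_type"], keep="first")
        .pivot(index="user_id", columns="event_type", values="event_time")
    )
    return (first["cart"] - first["view"]).dt.total_seconds() / 60


# Hypothetical sample start that falls between the original view and the cart
window_start = pd.Timestamp("2019-10-01 08:25")

full_delta = view_to_cart_minutes(events).loc[7]
window_delta = view_to_cart_minutes(events[events["event_time"] >= window_start]).loc[7]

print(full_delta)    # 10.0
print(window_delta)  # -30.0
```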


In [18]:
# Create time-based segments
view_cart_users["speed_segment"] = pd.cut(
    view_cart_users["view_to_cart_time"],
    bins=[0, 2, 10, np.inf],
    labels=["fast", "medium", "slow"]
)

view_cart_users["speed_segment"].value_counts(normalize=True) * 100


speed_segment
fast      51.907880
medium    25.107248
slow      22.984872
Name: proportion, dtype: float64

## User Speed Segmentation Insights

- Users who add items to their cart can be grouped by how quickly they move from viewing a product to taking action. Just over half (51.9%) add an item to their cart within two minutes of their first view, indicating strong or pre-existing purchase intent.

- However, a substantial portion take longer: 25.1% fall in the medium segment (2 to 10 minutes) and 23.0% in the slow segment (more than 10 minutes). These segments likely represent users who are comparing options, seeking reassurance, or waiting for additional information before committing.

- This suggests that while fast converters require minimal intervention, there is a meaningful opportunity to influence hesitant users through better product information, trust signals, or contextual nudges at the product view stage.
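The segment boundaries follow `pd.cut`'s default `right=True` behavior, so each bin includes its upper edge. This sketch feeds a few hypothetical boundary values through the same bins and labels used for the segmentation to show where exact values of 2 and 10 minutes land; a value of exactly 0 falls outside every bin, which is consistent with removing non-positive times beforehand.

```python
import numpy as np
import pandas as pd

# Hypothetical view-to-cart times in minutes, chosen to sit on and around the bin edges
minutes = pd.Series([0.05, 2.0, 2.01, 10.0, 10.5, 588.15])

# Same bins and labels as the segmentation step; right=True (the default)
# makes the intervals (0, 2], (2, 10] and (10, inf]
segments = pd.cut(minutes, bins=[0, 2, 10, np.inf], labels=["fast", "medium", "slow"])

# A time of exactly 0 lies outside (0, 2], so pd.cut returns NaN for it
zero_segment = pd.cut(pd.Series([0.0]), bins=[0, 2, 10, np.inf], labels=["fast", "medium", "slow"])

for value, label in zip(minutes, segments):
    print(f"{value:>7} min -> {label}")
print("0.0 min ->", zero_segment.iloc[0])
```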


## Product Recommendations

Based on the funnel and time-based analysis, the following product interventions are recommended:

1. **Strengthen the product view experience**  
   Improve clarity around pricing, key features, and value propositions to help users form intent faster.

2. **Introduce intent nudges for hesitant users**  
   Use lightweight nudges such as social proof, limited-time messaging, or comparison highlights for users who spend longer on product views.

3. **Preserve frictionless checkout for high-intent users**  
   Since cart-to-purchase conversion is relatively strong, the checkout experience should remain streamlined with minimal additional steps.

These recommendations focus on reducing early-stage friction while maintaining efficiency for users who already intend to convert.


## Executive Summary

This project analyzed user behavior data from an e-commerce platform to identify where users drop off in the purchase funnel. The analysis revealed a significant loss of users between product views and cart additions (about 95% of viewers in the sample never added an item to their cart), indicating that the primary challenge lies in converting initial interest into purchase intent. By contrast, roughly 62% of cart users went on to purchase.

Time-to-event analysis showed that users who add to cart tend to do so quickly (over half within two minutes of their first view), while a substantial segment hesitates before committing. This suggests that improving the product view experience and providing better decision support could meaningfully improve conversion rates.

By focusing on early-stage intent formation rather than checkout optimization, the platform can target the most impactful area for improving overall performance.
