# Project - Page Visits Funnel

Cool T-Shirts Inc. has asked you to analyze data on visits to their website. Your job is to build a funnel, which is a description of how many people continue to the next step of a multi-step process.

In this case, our funnel is going to describe the following process:

- A user visits CoolTShirts.com
- A user adds a t-shirt to their cart
- A user clicks “checkout”
- A user actually purchases a t-shirt
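
As a rough illustration of what a funnel reports, the sketch below computes the share of people who continue from each step to the next. The counts in `funnel_counts` are invented for this example and are not taken from the Cool T-Shirts data.

```python
# Hypothetical step counts, made up purely for illustration.
funnel_counts = {"visit": 1000, "cart": 200, "checkout": 150, "purchase": 120}

steps = list(funnel_counts)
continuation = {}
# Pair each step with the one that follows it.
for prev_step, next_step in zip(steps, steps[1:]):
    rate = 100 * funnel_counts[next_step] / funnel_counts[prev_step]
    continuation[f"{prev_step} -> {next_step}"] = rate
    print(f"{prev_step} -> {next_step}: {rate:.1f}% continue")
```

The rest of this notebook measures the complement of these rates: the share of rows that do not reach the next step.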

In [1]:
import numpy as np
import pandas as pd

In [2]:
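# Display several DataFrames side by side in the notebook
from IPython.display import display_html

def display_side_by_side(*args):
    # Concatenate the HTML rendering of each DataFrame
    html_str = ''.join(df.to_html() for df in args)
    # Style only the opening <table> tags so the tables flow inline
    display_html(html_str.replace('<table', '<table style="display:inline"'), raw=True)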

In [3]:
visits = pd.read_csv('visits.csv',
                     parse_dates=[1])
cart = pd.read_csv('cart.csv',
                   parse_dates=[1])
checkout = pd.read_csv('checkout.csv',
                       parse_dates=[1])
purchase = pd.read_csv('purchase.csv',
                       parse_dates=[1])

display_side_by_side(visits.head(), cart.head(), 
                     checkout.head(), purchase.head())

Unnamed: 0,user_id,visit_time
0,943647ef-3682-4750-a2e1-918ba6f16188,2017-04-07 15:14:00
1,0c3a3dd0-fb64-4eac-bf84-ba069ce409f2,2017-01-26 14:24:00
2,6e0b2d60-4027-4d9a-babd-0e7d40859fb1,2017-08-20 08:23:00
3,6879527e-c5a6-4d14-b2da-50b85212b0ab,2017-11-04 18:15:00
4,a84327ff-5daa-4ba1-b789-d5b4caf81e96,2017-02-27 11:25:00

Unnamed: 0,user_id,cart_time
0,2be90e7c-9cca-44e0-bcc5-124b945ff168,2017-11-07 20:45:00
1,4397f73f-1da3-4ab3-91af-762792e25973,2017-05-27 01:35:00
2,a9db3d4b-0a0a-4398-a55a-ebb2c7adf663,2017-03-04 10:38:00
3,b594862a-36c5-47d5-b818-6e9512b939b3,2017-09-27 08:22:00
4,a68a16e2-94f0-4ce8-8ce3-784af0bbb974,2017-07-26 15:48:00

Unnamed: 0,user_id,checkout_time
0,d33bdc47-4afa-45bc-b4e4-dbe948e34c0d,2017-06-25 09:29:00
1,4ac186f0-9954-4fea-8a27-c081e428e34e,2017-04-07 20:11:00
2,3c9c78a7-124a-4b77-8d2e-e1926e011e7d,2017-07-13 11:38:00
3,89fe330a-8966-4756-8f7c-3bdbcd47279a,2017-04-20 16:15:00
4,3ccdaf69-2d30-40de-b083-51372881aedd,2017-01-08 20:52:00

Unnamed: 0,user_id,purchase_time
0,4b44ace4-2721-47a0-b24b-15fbfa2abf85,2017-05-11 04:25:00
1,02e684ae-a448-408f-a9ff-dcb4a5c99aac,2017-09-05 08:45:00
2,4b4bc391-749e-4b90-ab8f-4f6e3c84d6dc,2017-11-20 20:49:00
3,a5dbb25f-3c36-4103-9030-9f7c6241cd8d,2017-01-22 15:18:00
4,46a3186d-7f5a-4ab9-87af-84d05bfd4867,2017-06-11 11:32:00


In [13]:
# combine visit and cart using left merge
visits_cart_left = pd.merge(visits, cart, how='left')
display(visits_cart_left.head())
print('Number of rows: ', len(visits_cart_left))

Unnamed: 0,user_id,visit_time,cart_time
0,943647ef-3682-4750-a2e1-918ba6f16188,2017-04-07 15:14:00,NaT
1,0c3a3dd0-fb64-4eac-bf84-ba069ce409f2,2017-01-26 14:24:00,2017-01-26 14:44:00
2,6e0b2d60-4027-4d9a-babd-0e7d40859fb1,2017-08-20 08:23:00,2017-08-20 08:31:00
3,6e0b2d60-4027-4d9a-babd-0e7d40859fb1,2017-08-20 08:23:00,2017-08-20 08:49:00
4,6879527e-c5a6-4d14-b2da-50b85212b0ab,2017-11-04 18:15:00,NaT


Number of rows:  2052
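
In the preview above, user `6e0b2d60-...` appears twice because one visit matched two cart rows. A left merge keeps every row of the left table and repeats it once per match on the right, so the merged row count can exceed the number of visits. The sketch below reproduces this on toy data; `toy_visits` and `toy_cart` are invented for illustration. The `indicator=True` option of `pd.merge` adds a `_merge` column showing which rows found a match.

```python
import pandas as pd

# Toy data, invented for illustration: user "b" has two cart events.
toy_visits = pd.DataFrame({
    "user_id": ["a", "b", "c"],
    "visit_time": pd.to_datetime(
        ["2017-01-01 10:00", "2017-01-01 11:00", "2017-01-01 12:00"]
    ),
})
toy_cart = pd.DataFrame({
    "user_id": ["b", "b"],
    "cart_time": pd.to_datetime(["2017-01-01 11:05", "2017-01-01 11:20"]),
})

# Left merge: every visit is kept; "b" is repeated once per cart event.
merged = pd.merge(toy_visits, toy_cart, how="left", on="user_id", indicator=True)
print(merged)
print("Rows in visits:", len(toy_visits), "| rows after merge:", len(merged))
```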


In [30]:
# How many of the timestamps are null for the column cart_time?
nnull = len(visits_cart_left[visits_cart_left.cart_time.isnull()])
print("Number of null timestamps: ", nnull)

Number of null timestamps:  1652


In [31]:
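# What percent of users who visited Cool T-Shirts Inc. ended up NOT placing a t-shirt in their cart?
# Note: this is computed over merged rows, so a user with several cart events is counted more than once.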
visit_not_cart = float(nnull) / len(visits_cart_left) * 100
print(visit_not_cart)

80.50682261208577
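
Because the figure above divides by merged rows, it can differ from a per-user percentage when some users have several cart events. A possible per-user variant is sketched below on invented toy data, assuming `user_id` identifies one person. It is an alternative calculation, not the one used in this notebook.

```python
import pandas as pd

# Toy data, invented for illustration: user "b" added two items to the cart.
toy_visits = pd.DataFrame({"user_id": ["a", "b", "c", "d"]})
toy_cart = pd.DataFrame({"user_id": ["b", "b"]})

# Per-user: count each visitor once, then check whether they appear in cart.
visitors = toy_visits["user_id"].drop_duplicates()
reached_cart = visitors.isin(toy_cart["user_id"])
pct_not_cart_users = 100 * (~reached_cart).sum() / len(visitors)

# Per-row: same approach as the notebook, where "b" contributes two rows.
merged = toy_visits.merge(toy_cart, how="left", on="user_id", indicator=True)
pct_not_cart_rows = 100 * (merged["_merge"] == "left_only").sum() / len(merged)

print("Per user:", pct_not_cart_users)
print("Per row: ", pct_not_cart_rows)
```

In this toy case the two figures differ (3 of 4 users versus 3 of 5 rows), which shows why the choice of denominator matters.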


In [34]:
# What percentage of users put items in their cart, but did not proceed to checkout?
cart_checkout_left = pd.merge(cart, checkout, how='left')
nnull = len(cart_checkout_left[cart_checkout_left.checkout_time.isnull()])
cart_not_checkout = float(nnull) / len(cart_checkout_left) * 100
print(cart_not_checkout)

20.930232558139537


In [24]:
# merge all data
all_data = visits\
    .merge(cart, how="left")\
    .merge(checkout, how="left")\
    .merge(purchase, how="left")
all_data.head()

Unnamed: 0,user_id,visit_time,cart_time,checkout_time,purchase_time
0,943647ef-3682-4750-a2e1-918ba6f16188,2017-04-07 15:14:00,NaT,NaT,NaT
1,0c3a3dd0-fb64-4eac-bf84-ba069ce409f2,2017-01-26 14:24:00,2017-01-26 14:44:00,2017-01-26 14:54:00,2017-01-26 15:08:00
2,6e0b2d60-4027-4d9a-babd-0e7d40859fb1,2017-08-20 08:23:00,2017-08-20 08:31:00,NaT,NaT
3,6e0b2d60-4027-4d9a-babd-0e7d40859fb1,2017-08-20 08:23:00,2017-08-20 08:49:00,NaT,NaT
4,6879527e-c5a6-4d14-b2da-50b85212b0ab,2017-11-04 18:15:00,NaT,NaT,NaT


In [35]:
# What percentage of users proceeded to checkout, but did not purchase a t-shirt?
checkout_purchase_left = pd.merge(checkout, purchase, how='left')
nnull = len(checkout_purchase_left[checkout_purchase_left.purchase_time.isnull()])
checkout_not_purchase = float(nnull) / len(checkout_purchase_left) * 100
print(checkout_not_purchase)

16.88963210702341


Which step of the funnel is weakest (i.e., has the highest percentage of users not completing it)? How might Cool T-Shirts Inc. change their website to fix this problem?

In [36]:
print(visit_not_cart)
print(cart_not_checkout)
print(checkout_not_purchase)

80.50682261208577
20.930232558139537
16.88963210702341


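The weakest step is the first one, from visit to cart: roughly 80.5% of the merged visit rows have no cart timestamp, compared with about 20.9% dropping off between cart and checkout and 16.9% between checkout and purchase. Cool T-Shirts Inc. should focus on the path from visiting the site to adding a t-shirt to the cart.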
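
The three drop-off figures all come from the same merge-and-count pattern, so they could be produced by one helper. The sketch below is one way to do that. The function name `drop_off_pct` and the toy frames are hypothetical and not part of the original notebook, and the calculation is row-based like the cells above.

```python
import pandas as pd

def drop_off_pct(step_df, next_df, next_time_col, key="user_id"):
    """Percent of merged rows from step_df with no match in next_df."""
    merged = pd.merge(step_df, next_df, how="left", on=key)
    return 100 * merged[next_time_col].isnull().sum() / len(merged)

# Toy data, invented for illustration only.
toy_cart = pd.DataFrame({
    "user_id": list("abcd"),
    "cart_time": pd.to_datetime(["2017-01-01 10:00"] * 4),
})
toy_checkout = pd.DataFrame({
    "user_id": list("bcd"),
    "checkout_time": pd.to_datetime(["2017-01-01 10:10"] * 3),
})
toy_purchase = pd.DataFrame({
    "user_id": list("bc"),
    "purchase_time": pd.to_datetime(["2017-01-01 10:20"] * 2),
})

drop_offs = {
    "cart -> checkout": drop_off_pct(toy_cart, toy_checkout, "checkout_time"),
    "checkout -> purchase": drop_off_pct(toy_checkout, toy_purchase, "purchase_time"),
}
# The weakest step is the one with the largest drop-off percentage.
weakest = max(drop_offs, key=drop_offs.get)
print(drop_offs)
print("Weakest step:", weakest)
```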

In [37]:
all_data['time_to_purchase'] = \
    all_data.purchase_time - \
    all_data.visit_time
all_data.head()

Unnamed: 0,user_id,visit_time,cart_time,checkout_time,purchase_time,time_to_purchase
0,943647ef-3682-4750-a2e1-918ba6f16188,2017-04-07 15:14:00,NaT,NaT,NaT,NaT
1,0c3a3dd0-fb64-4eac-bf84-ba069ce409f2,2017-01-26 14:24:00,2017-01-26 14:44:00,2017-01-26 14:54:00,2017-01-26 15:08:00,00:44:00
2,6e0b2d60-4027-4d9a-babd-0e7d40859fb1,2017-08-20 08:23:00,2017-08-20 08:31:00,NaT,NaT,NaT
3,6e0b2d60-4027-4d9a-babd-0e7d40859fb1,2017-08-20 08:23:00,2017-08-20 08:49:00,NaT,NaT,NaT
4,6879527e-c5a6-4d14-b2da-50b85212b0ab,2017-11-04 18:15:00,NaT,NaT,NaT,NaT


In [38]:
# Average time from visit to purchase; mean() skips rows where time_to_purchase is NaT
print(all_data.time_to_purchase.mean())

0 days 00:44:02.672413
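
Since `mean()` ignores `NaT`, the average above covers only rows that ended in a purchase. The sketch below shows this on invented toy data. It also converts the durations to minutes, which makes it easy to look at other summaries such as the median. The frame `toy` and its values are assumptions made for illustration.

```python
import pandas as pd

# Toy data, invented for illustration: the second visit never led to a purchase.
toy = pd.DataFrame({
    "visit_time": pd.to_datetime([
        "2017-01-01 10:00", "2017-01-01 11:00",
        "2017-01-01 12:00", "2017-01-01 13:00",
    ]),
    "purchase_time": pd.to_datetime([
        "2017-01-01 10:40", None,
        "2017-01-01 12:50", "2017-01-01 13:45",
    ]),
})
toy["time_to_purchase"] = toy.purchase_time - toy.visit_time

# count() and mean() both skip the NaT row.
n_purchases = toy.time_to_purchase.count()
mean_time = toy.time_to_purchase.mean()

# Durations in minutes, convenient for further summaries.
minutes = toy.time_to_purchase.dt.total_seconds() / 60
median_minutes = minutes.median()

print("Purchases:", n_purchases)
print("Mean time to purchase:", mean_time)
print("Median minutes to purchase:", median_minutes)
```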
