# Page Visits Funnel

Cool T-Shirts Inc. has asked you to analyze data on visits to their website. 

Your job is to build a funnel, which is a description of how many people continue to the next step of a multi-step process.

In this case, our funnel is going to describe the following process:

1. A user visits CoolTShirts.com
2. A user adds a t-shirt to their cart
3. A user clicks “checkout”
4. A user actually purchases a t-shirt

In [1]:
import pandas as pd

In [6]:
visits = pd.read_csv(r'D:\GIT_Repositories\pandas\Page_Visits_Funnel_Project\visits.csv',
                     parse_dates=[1])
cart = pd.read_csv(r'D:\GIT_Repositories\pandas\Page_Visits_Funnel_Project\cart.csv',
                   parse_dates=[1])
                   
checkout = pd.read_csv(r'D:\GIT_Repositories\pandas\Page_Visits_Funnel_Project\checkout.csv',
                       parse_dates=[1])
purchase = pd.read_csv(r'D:\GIT_Repositories\pandas\Page_Visits_Funnel_Project\purchase.csv',
                       parse_dates=[1])
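`parse_dates=[1]` asks pandas to parse the second column of each file as timestamps. As a minimal sketch, the same effect can be shown on an in-memory CSV by naming the column instead of giving its position. The two-row sample and the names `sample_csv` and `sample_visits` are made up for illustration and are not part of the project data.

```python
import io

import pandas as pd

# Hypothetical two-row sample standing in for visits.csv
sample_csv = io.StringIO(
    "user_id,visit_time\n"
    "u1,2017-04-07 15:14:00\n"
    "u2,2017-01-26 14:24:00\n"
)

# Naming the column parses the same data as parse_dates=[1] would here,
# but keeps working if the column order in the file ever changes.
sample_visits = pd.read_csv(sample_csv, parse_dates=["visit_time"])

# The visit_time column should now have a datetime dtype rather than object
print(sample_visits.dtypes)
```

Checking `dtypes` right after loading is a cheap way to confirm that later timestamp subtraction will work.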

## Step 1

Inspect the DataFrames using `print` and `head`.

In [11]:
print(visits.head(5))
print(cart.head(5))
print(checkout.head(5))
print(purchase.head(5))

                                user_id          visit_time
0  943647ef-3682-4750-a2e1-918ba6f16188 2017-04-07 15:14:00
1  0c3a3dd0-fb64-4eac-bf84-ba069ce409f2 2017-01-26 14:24:00
2  6e0b2d60-4027-4d9a-babd-0e7d40859fb1 2017-08-20 08:23:00
3  6879527e-c5a6-4d14-b2da-50b85212b0ab 2017-11-04 18:15:00
4  a84327ff-5daa-4ba1-b789-d5b4caf81e96 2017-02-27 11:25:00
                                user_id           cart_time
0  2be90e7c-9cca-44e0-bcc5-124b945ff168 2017-11-07 20:45:00
1  4397f73f-1da3-4ab3-91af-762792e25973 2017-05-27 01:35:00
2  a9db3d4b-0a0a-4398-a55a-ebb2c7adf663 2017-03-04 10:38:00
3  b594862a-36c5-47d5-b818-6e9512b939b3 2017-09-27 08:22:00
4  a68a16e2-94f0-4ce8-8ce3-784af0bbb974 2017-07-26 15:48:00
                                user_id       checkout_time
0  d33bdc47-4afa-45bc-b4e4-dbe948e34c0d 2017-06-25 09:29:00
1  4ac186f0-9954-4fea-8a27-c081e428e34e 2017-04-07 20:11:00
2  3c9c78a7-124a-4b77-8d2e-e1926e011e7d 2017-07-13 11:38:00
3  89fe330a-8966-4756-8f7c-3bdbcd47279a 

## Step 2

Combine `visits` and `cart` using a left merge.

In [13]:
visits_cart = pd.merge(visits, cart, how='left')

In [15]:
visits_cart.head()

| | user_id | visit_time | cart_time |
|---|---|---|---|
| 0 | 943647ef-3682-4750-a2e1-918ba6f16188 | 2017-04-07 15:14:00 | NaT |
| 1 | 0c3a3dd0-fb64-4eac-bf84-ba069ce409f2 | 2017-01-26 14:24:00 | 2017-01-26 14:44:00 |
| 2 | 6e0b2d60-4027-4d9a-babd-0e7d40859fb1 | 2017-08-20 08:23:00 | 2017-08-20 08:31:00 |
| 3 | 6879527e-c5a6-4d14-b2da-50b85212b0ab | 2017-11-04 18:15:00 | NaT |
| 4 | a84327ff-5daa-4ba1-b789-d5b4caf81e96 | 2017-02-27 11:25:00 | NaT |
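To see why unmatched rows come back as `NaT`, here is a minimal sketch of a left merge on toy data. The frames `toy_visits` and `toy_cart` are invented for illustration. The explicit `on="user_id"` is an addition; the notebook relies on pandas picking the shared column automatically.

```python
import pandas as pd

# Toy stand-ins for the visits and cart tables (hypothetical data)
toy_visits = pd.DataFrame({
    "user_id": ["a", "b", "c"],
    "visit_time": pd.to_datetime(
        ["2017-01-01 10:00", "2017-01-01 11:00", "2017-01-01 12:00"]
    ),
})
toy_cart = pd.DataFrame({
    "user_id": ["b"],
    "cart_time": pd.to_datetime(["2017-01-01 11:20"]),
})

# A left merge keeps every row of the left table; users with no match
# in the right table get NaT in the right table's columns.
toy_visits_cart = pd.merge(toy_visits, toy_cart, how="left", on="user_id")
print(toy_visits_cart)
```

Only user `b` has a `cart_time`; users `a` and `c` visited without adding anything to the cart.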


## Step 3

How long is your merged DataFrame?

In [16]:
total_visits = len(visits_cart)
total_visits

2000

## Step 4

How many of the timestamps are null for the column cart_time?

What do these null rows mean?

In [19]:
null_cart_times = len(visits_cart[visits_cart['cart_time'].isnull()])
null_cart_times

1652

#### `cart_time` has 1652 null values. This tells us that 1652 of the 2000 people who visited the site never added a t-shirt to their cart.
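Counting null timestamps can also be written without building a filtered DataFrame first. This is a sketch on made-up data (the frame `toy` is hypothetical) comparing the notebook's `len(...)` approach with `isnull().sum()`.

```python
import pandas as pd

# Hypothetical merged table: two of the four users never reached the cart
toy = pd.DataFrame({
    "user_id": ["a", "b", "c", "d"],
    "cart_time": pd.to_datetime(
        ["2017-01-01 11:20", None, None, "2017-01-02 09:00"]
    ),
})

# Approach used in the notebook: filter, then take the length
null_count_via_len = len(toy[toy["cart_time"].isnull()])

# Equivalent shortcut: True counts as 1 when summing a boolean Series
null_count_via_sum = toy["cart_time"].isnull().sum()

print(null_count_via_len, null_count_via_sum)
```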

## Step 5


What percent of users who visited Cool T-Shirts Inc. ended up not placing a t-shirt in their cart?

Note: In Python 3, `/` always performs true division, so wrapping the numerator or denominator in `float()` is optional. The conversion is only required in Python 2, where dividing two integers with `/` truncates the result to an integer.
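How division behaves depends on the Python version and the operator. As a small sketch using the counts from the previous steps, this is what `/` and `//` return in Python 3 (the variable names are illustrative only):

```python
# Counts taken from Steps 3 and 4
null_rows = 1652
all_rows = 2000

# In Python 3, / is true division and returns a float
true_division = null_rows / all_rows

# // is floor division and discards the fractional part
floor_division = null_rows // all_rows

print(true_division)            # 0.826
print(floor_division)           # 0
print(f"{true_division:.1%}")   # 82.6%
```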


In [53]:
prcnt_visited_not_cart =  float(null_cart_times) / float(total_visits)
prcnt_visited_not_cart

0.826

#### About 82.6% of users who visited Cool T-Shirts Inc. never placed a t-shirt in their cart.

## Step 6

Repeat the left merge for `cart` and `checkout` and count null values.

What percentage of users put items in their cart, but did not proceed to checkout?

In [26]:
cart_checkout = pd.merge(cart, checkout, how='left')
cart_checkout.head()

| | user_id | cart_time | checkout_time |
|---|---|---|---|
| 0 | 2be90e7c-9cca-44e0-bcc5-124b945ff168 | 2017-11-07 20:45:00 | 2017-11-07 21:14:00 |
| 1 | 4397f73f-1da3-4ab3-91af-762792e25973 | 2017-05-27 01:35:00 | NaT |
| 2 | a9db3d4b-0a0a-4398-a55a-ebb2c7adf663 | 2017-03-04 10:38:00 | 2017-03-04 11:04:00 |
| 3 | b594862a-36c5-47d5-b818-6e9512b939b3 | 2017-09-27 08:22:00 | 2017-09-27 08:26:00 |
| 4 | a68a16e2-94f0-4ce8-8ce3-784af0bbb974 | 2017-07-26 15:48:00 | NaT |


In [30]:
total_carts = len(cart_checkout)
total_carts

348

In [28]:
cart_not_checkout = cart_checkout[cart_checkout['checkout_time'].isnull()]
cart_not_checkout.head()

| | user_id | cart_time | checkout_time |
|---|---|---|---|
| 1 | 4397f73f-1da3-4ab3-91af-762792e25973 | 2017-05-27 01:35:00 | NaT |
| 4 | a68a16e2-94f0-4ce8-8ce3-784af0bbb974 | 2017-07-26 15:48:00 | NaT |
| 10 | fd80ce93-ae6e-4c0b-9ea4-561f84152026 | 2017-06-07 01:18:00 | NaT |
| 19 | 48a23075-694b-417d-8449-9df921ad95aa | 2017-07-09 15:28:00 | NaT |
| 21 | 5d7d121a-817c-4b84-b4d6-5388092b9aec | 2017-06-10 14:54:00 | NaT |


In [29]:
null_checkout_times = len(cart_not_checkout)
null_checkout_times

122

In [32]:
prcnt_visit_cart_not_checkout = float(null_checkout_times) / float(total_carts)
prcnt_visit_cart_not_checkout

0.3505747126436782

#### About 35% of users who put a t-shirt in their cart did not proceed to checkout.
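Steps 5 and 6 repeat the same calculation: the share of rows whose later-step timestamp is null. One possible way to avoid the repetition is a small helper. The function `drop_off_rate` and the toy frame below are hypothetical and are not part of the original notebook.

```python
import pandas as pd


def drop_off_rate(merged, later_step_col):
    """Return the fraction of rows in `merged` with no timestamp in `later_step_col`."""
    # The mean of a boolean Series is the share of True values
    return merged[later_step_col].isnull().mean()


# Hypothetical cart/checkout merge: two of four carts never reached checkout
toy_cart_checkout = pd.DataFrame({
    "user_id": ["a", "b", "c", "d"],
    "cart_time": pd.to_datetime(
        ["2017-03-04 10:38", "2017-05-27 01:35",
         "2017-09-27 08:22", "2017-07-26 15:48"]
    ),
    "checkout_time": pd.to_datetime(
        ["2017-03-04 11:04", None, "2017-09-27 08:26", None]
    ),
})

rate = drop_off_rate(toy_cart_checkout, "checkout_time")
print(f"{rate:.1%} of carts did not reach checkout")
```

Applied to the notebook's own frames, the calls would look like `drop_off_rate(visits_cart, 'cart_time')` and `drop_off_rate(cart_checkout, 'checkout_time')`.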

## Step 7

Merge all four steps of the funnel, in order, using a series of left merges. Save the results to the variable `all_data`.

Examine the result using `print` and `head`.

In [33]:
all_data = (
    visits_cart
    .merge(cart_checkout, how='left')
    .merge(purchase, how='left')
)

In [34]:
all_data.head()

| | user_id | visit_time | cart_time | checkout_time | purchase_time |
|---|---|---|---|---|---|
| 0 | 943647ef-3682-4750-a2e1-918ba6f16188 | 2017-04-07 15:14:00 | NaT | NaT | NaT |
| 1 | 0c3a3dd0-fb64-4eac-bf84-ba069ce409f2 | 2017-01-26 14:24:00 | 2017-01-26 14:44:00 | 2017-01-26 14:54:00 | 2017-01-26 15:08:00 |
| 2 | 6e0b2d60-4027-4d9a-babd-0e7d40859fb1 | 2017-08-20 08:23:00 | 2017-08-20 08:31:00 | NaT | NaT |
| 3 | 6879527e-c5a6-4d14-b2da-50b85212b0ab | 2017-11-04 18:15:00 | NaT | NaT | NaT |
| 4 | a84327ff-5daa-4ba1-b789-d5b4caf81e96 | 2017-02-27 11:25:00 | NaT | NaT | NaT |
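A left merge can return more rows than the left table when a key appears more than once on the right. The merged output later in this notebook hints at this: its row index runs past 2000, and the same `user_id` appears on consecutive rows with different `purchase_time` values. The sketch below reproduces the effect on invented data and shows the `validate` argument of `merge` as one way to detect it. The frames and variable names are hypothetical.

```python
import pandas as pd

# Hypothetical data: user "a" has two purchase records
toy_checkout = pd.DataFrame({
    "user_id": ["a", "b"],
    "checkout_time": pd.to_datetime(["2017-01-08 20:52", "2017-01-09 10:00"]),
})
toy_purchase = pd.DataFrame({
    "user_id": ["a", "a"],
    "purchase_time": pd.to_datetime(["2017-01-08 21:02", "2017-01-08 21:21"]),
})

# The left table has 2 rows, but the merge returns 3 because "a" matches twice
merged = toy_checkout.merge(toy_purchase, how="left", on="user_id")
print(len(toy_checkout), len(merged))

# validate="one_to_one" makes pandas raise if either side has repeated keys
try:
    toy_checkout.merge(
        toy_purchase, how="left", on="user_id", validate="one_to_one"
    )
    validation_failed = False
except pd.errors.MergeError:
    validation_failed = True

print("duplicate keys detected:", validation_failed)
```

Whether repeated rows should be kept or collapsed depends on the question being asked; row counts and user counts differ once duplicates are present.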


## Step 8

What percentage of users proceeded to checkout, but did not purchase a t-shirt?

In [48]:
reached_checkout = all_data[~all_data['checkout_time'].isnull()]
len(reached_checkout)

334

In [49]:
checkout_not_purchase = all_data[(all_data['purchase_time'].isnull()) & (~all_data['checkout_time'].isnull())]
len(checkout_not_purchase)

82

In [54]:
prcnt_checkout_not_purchase_percent = float(len(checkout_not_purchase)) / float(len(reached_checkout))

In [56]:
print("Fraction of users who got to checkout but did not purchase:", prcnt_checkout_not_purchase_percent)

Fraction of users who got to checkout but did not purchase: 0.24550898203592814


#### About 25% of users who proceeded to checkout did not purchase a t-shirt.
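The counts in this step are row counts. If a user can appear on several rows of the merged table, counting distinct users may give a different figure. The following is a sketch of that alternative on made-up data; it is not the calculation used in this notebook, and the frame `toy_all` is hypothetical.

```python
import pandas as pd

# Hypothetical merged table: user "a" appears twice with two purchase records
toy_all = pd.DataFrame({
    "user_id": ["a", "a", "b", "c"],
    "checkout_time": pd.to_datetime(
        ["2017-01-08 20:52", "2017-01-08 20:52", "2017-01-09 10:00", None]
    ),
    "purchase_time": pd.to_datetime(
        ["2017-01-08 21:02", "2017-01-08 21:21", None, None]
    ),
})

reached = toy_all[toy_all["checkout_time"].notnull()]

# Row-based count, as in the notebook
rows_reached = len(reached)

# User-based counts: each user_id is counted once
users_reached = reached["user_id"].nunique()
users_purchased = toy_all.loc[
    toy_all["purchase_time"].notnull(), "user_id"
].nunique()

share_not_purchasing = 1 - users_purchased / users_reached
print(rows_reached, users_reached, share_not_purchasing)
```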

## Step 9

Which step of the funnel is weakest (i.e., has the highest percentage of users not completing it)?

How might Cool T-Shirts Inc. change their website to fix this problem?

To compare each part of the funnel, let's print all three drop-off rates again.

In [57]:
print("{} percent of users who visited the page did not add a t-shirt to their cart".format(round(prcnt_visited_not_cart*100, 2)))
print("{} percent of users who added a t-shirt to their cart did not checkout".format(round(prcnt_visit_cart_not_checkout*100, 2)))
print("{} percent of users who made it to checkout  did not purchase a shirt".format(round( prcnt_checkout_not_purchase_percent*100, 2)))


82.6 percent of users who visited the page did not add a t-shirt to their cart
35.06 percent of users who added a t-shirt to their cart did not checkout
24.55 percent of users who made it to checkout  did not purchase a shirt


#### Observation

The weakest part of the funnel is getting a visitor to add a t-shirt to their cart: 82.6% of visitors never do.

Once a user has added a t-shirt to their cart, it is fairly likely that they end up purchasing it.

A suggestion could be to make the add-to-cart button more prominent on the front page.
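The comparison can also be made programmatically. As a small sketch, the three drop-off percentages printed above are placed in a pandas Series and the largest one is selected with `idxmax`. The step labels are illustrative names chosen here.

```python
import pandas as pd

# Drop-off percentages reported in Step 9
drop_off = pd.Series({
    "visit -> cart": 82.6,
    "cart -> checkout": 35.06,
    "checkout -> purchase": 24.55,
})

# idxmax returns the label of the largest value
weakest_step = drop_off.idxmax()
print(f"Weakest step: {weakest_step} ({drop_off[weakest_step]}% drop-off)")
```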

## Average Time to Purchase

Using the giant merged DataFrame all_data that you created, let’s calculate the average time from initial visit to final purchase. 

Add a column that is the difference between purchase_time and visit_time.

In [58]:
all_data.head()

| | user_id | visit_time | cart_time | checkout_time | purchase_time |
|---|---|---|---|---|---|
| 0 | 943647ef-3682-4750-a2e1-918ba6f16188 | 2017-04-07 15:14:00 | NaT | NaT | NaT |
| 1 | 0c3a3dd0-fb64-4eac-bf84-ba069ce409f2 | 2017-01-26 14:24:00 | 2017-01-26 14:44:00 | 2017-01-26 14:54:00 | 2017-01-26 15:08:00 |
| 2 | 6e0b2d60-4027-4d9a-babd-0e7d40859fb1 | 2017-08-20 08:23:00 | 2017-08-20 08:31:00 | NaT | NaT |
| 3 | 6879527e-c5a6-4d14-b2da-50b85212b0ab | 2017-11-04 18:15:00 | NaT | NaT | NaT |
| 4 | a84327ff-5daa-4ba1-b789-d5b4caf81e96 | 2017-02-27 11:25:00 | NaT | NaT | NaT |


In [59]:
visit_purchase = all_data[
    all_data['visit_time'].notnull()
    & all_data['cart_time'].notnull()
    & all_data['checkout_time'].notnull()
    & all_data['purchase_time'].notnull()
]

In [60]:
visit_purchase

| | user_id | visit_time | cart_time | checkout_time | purchase_time |
|---|---|---|---|---|---|
| 1 | 0c3a3dd0-fb64-4eac-bf84-ba069ce409f2 | 2017-01-26 14:24:00 | 2017-01-26 14:44:00 | 2017-01-26 14:54:00 | 2017-01-26 15:08:00 |
| 14 | 486480e2-98c3-4d51-8f4b-b1c07228ce84 | 2017-01-27 16:34:00 | 2017-01-27 16:44:00 | 2017-01-27 17:10:00 | 2017-01-27 17:12:00 |
| 48 | 3ccdaf69-2d30-40de-b083-51372881aedd | 2017-01-08 20:21:00 | 2017-01-08 20:38:00 | 2017-01-08 20:52:00 | 2017-01-08 21:02:00 |
| 49 | 3ccdaf69-2d30-40de-b083-51372881aedd | 2017-01-08 20:21:00 | 2017-01-08 20:38:00 | 2017-01-08 20:52:00 | 2017-01-08 21:21:00 |
| 65 | ab0125fc-9493-4f59-ad70-24ad264a3a0c | 2017-11-18 03:21:00 | 2017-11-18 03:33:00 | 2017-11-18 03:57:00 | 2017-11-18 04:21:00 |
| ... | ... | ... | ... | ... | ... |
| 2083 | d2cb350b-2201-4290-b2e0-84a8bf0d6883 | 2017-08-08 16:05:00 | 2017-08-08 16:07:00 | 2017-08-08 16:34:00 | 2017-08-08 16:34:00 |
| 2093 | f46c88d0-2441-40a8-97fe-6841ff6f050d | 2017-09-06 08:42:00 | 2017-09-06 09:02:00 | 2017-09-06 09:22:00 | 2017-09-06 09:28:00 |
| 2097 | f783c680-1d9a-437d-9f45-7827299b78fa | 2017-06-25 08:07:00 | 2017-06-25 08:08:00 | 2017-06-25 08:28:00 | 2017-06-25 08:39:00 |
| 2098 | f783c680-1d9a-437d-9f45-7827299b78fa | 2017-06-25 08:07:00 | 2017-06-25 08:08:00 | 2017-06-25 08:28:00 | 2017-06-25 08:35:00 |


In [68]:
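# Assignment aligns on the row index, so rows of all_data that are not in
# visit_purchase (users who never purchased) get NaT in the new column.
all_data['time_to_purchase'] = visit_purchase['purchase_time'] - visit_purchase['visit_time']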

In [69]:
all_data.head()

| | user_id | visit_time | cart_time | checkout_time | purchase_time | time_to_purchase |
|---|---|---|---|---|---|---|
| 0 | 943647ef-3682-4750-a2e1-918ba6f16188 | 2017-04-07 15:14:00 | NaT | NaT | NaT | NaT |
| 1 | 0c3a3dd0-fb64-4eac-bf84-ba069ce409f2 | 2017-01-26 14:24:00 | 2017-01-26 14:44:00 | 2017-01-26 14:54:00 | 2017-01-26 15:08:00 | 0 days 00:44:00 |
| 2 | 6e0b2d60-4027-4d9a-babd-0e7d40859fb1 | 2017-08-20 08:23:00 | 2017-08-20 08:31:00 | NaT | NaT | NaT |
| 3 | 6879527e-c5a6-4d14-b2da-50b85212b0ab | 2017-11-04 18:15:00 | NaT | NaT | NaT | NaT |
| 4 | a84327ff-5daa-4ba1-b789-d5b4caf81e96 | 2017-02-27 11:25:00 | NaT | NaT | NaT | NaT |


Average time to purchase

In [70]:
print(all_data['time_to_purchase'].mean())

0 days 00:43:12.380952380
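Subtracting two datetime columns yields a timedelta column, and `mean()` skips the `NaT` entries of users who never purchased. The sketch below illustrates this on three rows: the first two reuse the visit and purchase times of rows 1 and 14 shown above, and the third is a visitor with no purchase. The frame name `toy` and the conversion to minutes are additions for illustration.

```python
import pandas as pd

toy = pd.DataFrame({
    "visit_time": pd.to_datetime(
        ["2017-01-26 14:24", "2017-01-27 16:34", "2017-04-07 15:14"]
    ),
    "purchase_time": pd.to_datetime(
        ["2017-01-26 15:08", "2017-01-27 17:12", None]
    ),
})

# Datetime subtraction gives Timedelta values; a missing purchase gives NaT
toy["time_to_purchase"] = toy["purchase_time"] - toy["visit_time"]

# mean() ignores NaT, so only the two purchasers (44 and 38 minutes) count
mean_time = toy["time_to_purchase"].mean()

# Convert the Timedelta to plain minutes for reporting
mean_minutes = mean_time.total_seconds() / 60
print(mean_time, mean_minutes)
```

Because `NaT` rows are skipped, subtracting the full columns directly also works, without filtering to purchasers first.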
