# Page Funnel Visits project
Project for Codecademy's ***Data Scientist: Analytics*** Bootcamp

## Overview
Cool T-Shirts Inc. has asked you to analyze data on visits to their website. Your job is to build a funnel, which is a description of how many people continue to the next step of a multi-step process.

In this case, our funnel is going to describe the following process:
- A user visits CoolTShirts.com
- A user adds a t-shirt to their cart
- A user clicks “checkout”
- A user actually purchases a t-shirt
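
The funnel idea above can be sketched with plain numbers before touching any data. The step counts below are hypothetical, chosen only for illustration, and `drop_off_rates` is a helper name introduced here rather than something from the project.

```python
# Hypothetical counts of users reaching each funnel step (illustrative only).
funnel_steps = [
    ("visit", 1000),
    ("cart", 200),
    ("checkout", 120),
    ("purchase", 90),
]


def drop_off_rates(steps):
    """Return (from_step, to_step, percent_lost) for each consecutive pair."""
    rates = []
    for (name_a, count_a), (name_b, count_b) in zip(steps, steps[1:]):
        lost = (count_a - count_b) / count_a * 100
        rates.append((name_a, name_b, round(lost, 2)))
    return rates


for from_step, to_step, lost in drop_off_rates(funnel_steps):
    print(f"{from_step} -> {to_step}: {lost}% did not continue")
```

Each percentage is relative to the step before it, not to the total number of visitors, which is how the later steps of this project report their results.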

In [3]:
import pandas as pd

Load the four CSV files, one per funnel step

In [5]:
# Column 1 of each file holds the timestamp, so parse it as a datetime.
visits = pd.read_csv('visits.csv',
                     parse_dates=[1])
cart = pd.read_csv('cart.csv',
                   parse_dates=[1])
checkout = pd.read_csv('checkout.csv',
                       parse_dates=[1])
purchase = pd.read_csv('purchase.csv',
                       parse_dates=[1])

**Step 1:** Inspect the DataFrames

In [7]:
print(visits.head())
print(cart.head())
print(checkout.head())
print(purchase.head())

                                user_id          visit_time
0  943647ef-3682-4750-a2e1-918ba6f16188 2017-04-07 15:14:00
1  0c3a3dd0-fb64-4eac-bf84-ba069ce409f2 2017-01-26 14:24:00
2  6e0b2d60-4027-4d9a-babd-0e7d40859fb1 2017-08-20 08:23:00
3  6879527e-c5a6-4d14-b2da-50b85212b0ab 2017-11-04 18:15:00
4  a84327ff-5daa-4ba1-b789-d5b4caf81e96 2017-02-27 11:25:00
                                user_id           cart_time
0  2be90e7c-9cca-44e0-bcc5-124b945ff168 2017-11-07 20:45:00
1  4397f73f-1da3-4ab3-91af-762792e25973 2017-05-27 01:35:00
2  a9db3d4b-0a0a-4398-a55a-ebb2c7adf663 2017-03-04 10:38:00
3  b594862a-36c5-47d5-b818-6e9512b939b3 2017-09-27 08:22:00
4  a68a16e2-94f0-4ce8-8ce3-784af0bbb974 2017-07-26 15:48:00
                                user_id       checkout_time
0  d33bdc47-4afa-45bc-b4e4-dbe948e34c0d 2017-06-25 09:29:00
1  4ac186f0-9954-4fea-8a27-c081e428e34e 2017-04-07 20:11:00
2  3c9c78a7-124a-4b77-8d2e-e1926e011e7d 2017-07-13 11:38:00
3  89fe330a-8966-4756-8f7c-3bdbcd47279a 

**Step 2:** Combine `visits` and `cart`

In [9]:
visits_cart = pd.merge(visits, cart, how = 'left')
visits_cart.head()

Unnamed: 0,user_id,visit_time,cart_time
0,943647ef-3682-4750-a2e1-918ba6f16188,2017-04-07 15:14:00,NaT
1,0c3a3dd0-fb64-4eac-bf84-ba069ce409f2,2017-01-26 14:24:00,2017-01-26 14:44:00
2,6e0b2d60-4027-4d9a-babd-0e7d40859fb1,2017-08-20 08:23:00,2017-08-20 08:31:00
3,6879527e-c5a6-4d14-b2da-50b85212b0ab,2017-11-04 18:15:00,NaT
4,a84327ff-5daa-4ba1-b789-d5b4caf81e96,2017-02-27 11:25:00,NaT
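
The left merge keeps every row of `visits` and fills `cart_time` with `NaT` where no matching cart row exists. A minimal sketch on toy data (the IDs and times below are made up) shows the same behavior with the join key named explicitly through `on` and the match status exposed through `indicator=True`:

```python
import pandas as pd

# Toy stand-ins for the visits and cart tables (made-up IDs and times).
visits_demo = pd.DataFrame({
    "user_id": ["a", "b", "c"],
    "visit_time": pd.to_datetime(
        ["2017-01-01 10:00", "2017-01-02 11:00", "2017-01-03 12:00"]
    ),
})
cart_demo = pd.DataFrame({
    "user_id": ["b"],
    "cart_time": pd.to_datetime(["2017-01-02 11:20"]),
})

# on="user_id" names the join key instead of relying on shared column names;
# indicator=True adds a _merge column marking each row left_only or both.
merged_demo = pd.merge(
    visits_demo, cart_demo, on="user_id", how="left", indicator=True
)
print(merged_demo)
```

Rows marked `left_only` are visitors with no cart event, which is the same group that Step 4 counts through the null `cart_time` values.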


**Step 3:** How long is your merged DataFrame?

In [11]:
total_visits = len(visits_cart)
print(total_visits)

2000


**Step 4:** How many of the timestamps are null for the column `cart_time`?

In [13]:
null_carts = len(visits_cart[visits_cart.cart_time.isnull()])

print(null_carts)

1652


- Of the 2000 visitor records available, 1652 never added a t-shirt to their cart.

**Step 5:** What percentage of users who visited Cool T-Shirts Inc. ended up *not* placing a t-shirt in their cart?

In [16]:
pc_visit_not_cart = (null_carts / len(visits_cart)) * 100
print('{}% of the people in our records were only visitors.'.format(pc_visit_not_cart))

82.6% of the people in our records were only visitors.
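
Steps 4 and 5 can also be written more compactly, because a boolean mask sums to a count and averages to a fraction. This is an alternative sketch on a made-up series, not the approach used above:

```python
import pandas as pd

# Made-up cart timestamps; None becomes NaT after parsing.
cart_time_demo = pd.Series(
    pd.to_datetime(
        ["2017-01-26 14:44", None, None, "2017-08-20 08:31", None]
    )
)

# True values count as 1, so sum() counts the nulls and mean() gives their share.
null_count = int(cart_time_demo.isna().sum())
null_pct = round(float(cart_time_demo.isna().mean()) * 100, 2)

print(f"{null_count} of {len(cart_time_demo)} rows have no cart_time ({null_pct}%)")
```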


**Step 6:** What percentage of users put items in their cart, but did not proceed to checkout?

In [18]:
cart_checkout = pd.merge(cart, checkout, how = 'left')
cart_checkout.head()

Unnamed: 0,user_id,cart_time,checkout_time
0,2be90e7c-9cca-44e0-bcc5-124b945ff168,2017-11-07 20:45:00,2017-11-07 21:14:00
1,4397f73f-1da3-4ab3-91af-762792e25973,2017-05-27 01:35:00,NaT
2,a9db3d4b-0a0a-4398-a55a-ebb2c7adf663,2017-03-04 10:38:00,2017-03-04 11:04:00
3,b594862a-36c5-47d5-b818-6e9512b939b3,2017-09-27 08:22:00,2017-09-27 08:26:00
4,a68a16e2-94f0-4ce8-8ce3-784af0bbb974,2017-07-26 15:48:00,NaT


In [19]:
total_carts = len(cart_checkout)
null_checkout = len(cart_checkout[cart_checkout.checkout_time.isnull()])

pc_cart_no_checkout = round((null_checkout / total_carts) * 100, 2)

print('Of the users who added something to their cart, {}% did not proceed to checkout.'.format(pc_cart_no_checkout))

Of the users who added something to their cart, 35.06% did not proceed to checkout.


**Step 7:** Merge all four steps of the funnel, in order. Save the results to `all_data`.

In [21]:
all_data = visits_cart.merge(checkout, how = 'left').merge(purchase, how = 'left')
all_data.head()

Unnamed: 0,user_id,visit_time,cart_time,checkout_time,purchase_time
0,943647ef-3682-4750-a2e1-918ba6f16188,2017-04-07 15:14:00,NaT,NaT,NaT
1,0c3a3dd0-fb64-4eac-bf84-ba069ce409f2,2017-01-26 14:24:00,2017-01-26 14:44:00,2017-01-26 14:54:00,2017-01-26 15:08:00
2,6e0b2d60-4027-4d9a-babd-0e7d40859fb1,2017-08-20 08:23:00,2017-08-20 08:31:00,NaT,NaT
3,6879527e-c5a6-4d14-b2da-50b85212b0ab,2017-11-04 18:15:00,NaT,NaT,NaT
4,a84327ff-5daa-4ba1-b789-d5b4caf81e96,2017-02-27 11:25:00,NaT,NaT,NaT
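
Once all four timestamps sit in one table, the number of rows that reached each step can be read off by counting non-null values per column. A small sketch on a made-up table with the same column names:

```python
import pandas as pd

# Made-up merged table with the same columns as all_data.
funnel_demo = pd.DataFrame({
    "user_id": ["a", "b", "c", "d"],
    "visit_time": pd.to_datetime(
        ["2017-01-01 10:00", "2017-01-02 11:00",
         "2017-01-03 12:00", "2017-01-04 13:00"]
    ),
    "cart_time": pd.to_datetime(
        ["2017-01-01 10:10", "2017-01-02 11:20", None, None]
    ),
    "checkout_time": pd.to_datetime(["2017-01-01 10:15", None, None, None]),
    "purchase_time": pd.to_datetime(["2017-01-01 10:30", None, None, None]),
})

time_cols = ["visit_time", "cart_time", "checkout_time", "purchase_time"]

# notna() marks rows that have a timestamp; sum() counts them per column.
reached = funnel_demo[time_cols].notna().sum()
print(reached)
```

These are row counts, so they match user counts only if each user appears once in the merged table.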


**Step 8:** What percentage of users proceeded to checkout, but did not purchase a t-shirt?

In [23]:
checkout_purchase = pd.merge(checkout, purchase, how = 'left')
total_checkout = len(checkout_purchase)
null_purchase = len(checkout_purchase[checkout_purchase.purchase_time.isnull()])

pc_check_no_purchase = round((null_purchase / total_checkout) * 100, 2)
print('Of the users who clicked on *checkout*, {}% did not complete their purchase.'.format(pc_check_no_purchase))

Of the users who clicked on *checkout*, 24.55% did not complete their purchase.


**Step 9:** Which step of the funnel is the weakest (i.e., has the highest percentage of users not completing it)?

How might Cool T-Shirts Inc. change their website to fix this problem?

In [25]:
print('{}% of users who visited did not add anything to their cart.'.format(pc_visit_not_cart))
print('{}% of users who added something to their cart did not click on checkout.'.format(pc_cart_no_checkout))
print('{}% of users who clicked on checkout did not complete their purchase.'.format(pc_check_no_purchase))

82.6% of users who visited did not add anything to their cart.
35.06% of users who added something to their cart did not click on checkout.
24.55% of users who clicked on checkout did not complete their purchase.


*The weakest part of the funnel is clearly getting a person who visited the site to add a t-shirt to their cart. Once they've added a t-shirt to their cart it is fairly likely they end up purchasing it. A suggestion could be to make the add-to-cart button more prominent on the front page.*


**Step 10:** Compute each user's time from initial visit to purchase

In [27]:
all_data['time_to_purchase'] = all_data.purchase_time - all_data.visit_time

**Step 11:** Examine the results

In [29]:
all_data

Unnamed: 0,user_id,visit_time,cart_time,checkout_time,purchase_time,time_to_purchase
0,943647ef-3682-4750-a2e1-918ba6f16188,2017-04-07 15:14:00,NaT,NaT,NaT,NaT
1,0c3a3dd0-fb64-4eac-bf84-ba069ce409f2,2017-01-26 14:24:00,2017-01-26 14:44:00,2017-01-26 14:54:00,2017-01-26 15:08:00,0 days 00:44:00
2,6e0b2d60-4027-4d9a-babd-0e7d40859fb1,2017-08-20 08:23:00,2017-08-20 08:31:00,NaT,NaT,NaT
3,6879527e-c5a6-4d14-b2da-50b85212b0ab,2017-11-04 18:15:00,NaT,NaT,NaT,NaT
4,a84327ff-5daa-4ba1-b789-d5b4caf81e96,2017-02-27 11:25:00,NaT,NaT,NaT,NaT
...,...,...,...,...,...,...
2103,33913ac2-03da-45ae-8fc3-fea39df827c6,2017-03-25 03:29:00,NaT,NaT,NaT,NaT
2104,4f850132-b99d-4623-80e6-6e61d003577e,2017-01-08 09:57:00,NaT,NaT,NaT,NaT
2105,f0830b9b-1f5c-4e74-b63d-3f847cc6ce70,2017-09-07 12:56:00,NaT,NaT,NaT,NaT
2106,b01bffa7-63ba-4cd3-9d93-eb1477c23831,2017-07-20 04:37:00,NaT,NaT,NaT,NaT
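
The index in the output above reaches 2106, while Step 3 reported 2000 rows for `visits_cart`. One possible explanation, which the source does not confirm, is that `checkout` or `purchase` holds more than one row for some `user_id`: a left merge then repeats the left row once per match. The sketch below reproduces that effect on made-up data and shows one way to detect and guard against it.

```python
import pandas as pd

# Made-up data: user "a" has two checkout rows.
left_demo = pd.DataFrame({
    "user_id": ["a", "b"],
    "cart_time": pd.to_datetime(["2017-01-01 10:00", "2017-01-02 11:00"]),
})
right_demo = pd.DataFrame({
    "user_id": ["a", "a"],
    "checkout_time": pd.to_datetime(["2017-01-01 10:05", "2017-01-01 10:09"]),
})

# The left merge yields 3 rows: "a" twice (one per match), "b" once with NaT.
merged = left_demo.merge(right_demo, on="user_id", how="left")

# Count repeated keys on the right-hand side.
dup_keys = int(right_demo["user_id"].duplicated().sum())

# Assumption for this sketch: keeping the first row per user is acceptable.
deduped = right_demo.drop_duplicates(subset="user_id", keep="first")

# validate raises MergeError if either side still has repeated keys.
merged_unique = left_demo.merge(
    deduped, on="user_id", how="left", validate="one_to_one"
)

print(len(merged), dup_keys, len(merged_unique))
```

If duplicates are present in the real files, percentages based on `len()` of a merged table count rows rather than distinct users, so it may be worth comparing against `user_id.nunique()`.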


**Step 12:** Calculate the average time to purchase

In [31]:
avg_purchase_time = all_data.time_to_purchase.mean()
print('The average time a user takes to go from initial visit to purchase is:', avg_purchase_time)

The average time a user takes to go from initial visit to purchase is: 0 days 00:43:12.380952380
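
`mean()` skips `NaT` values by default, so the average above covers only rows with a purchase. A short sketch on made-up timestamps shows this, and converts the resulting `Timedelta` to minutes for easier reading:

```python
import pandas as pd

# Made-up visits and purchases; the second user never purchased.
demo = pd.DataFrame({
    "visit_time": pd.to_datetime(
        ["2017-01-26 14:24", "2017-04-07 15:14", "2017-08-20 08:00"]
    ),
    "purchase_time": pd.to_datetime(
        ["2017-01-26 15:08", None, "2017-08-20 08:40"]
    ),
})

demo["time_to_purchase"] = demo.purchase_time - demo.visit_time

# NaT rows are ignored, so this averages 44 and 40 minutes.
avg = demo.time_to_purchase.mean()

# Timedelta.total_seconds() gives a plain float that is easy to format.
avg_minutes = avg.total_seconds() / 60

print(f"Average time to purchase: {avg_minutes:.1f} minutes")
```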
