# Page Visits Funnel

## This is a project for the Data Science career path on Codecademy.

#### Codecademy provided the dataset and step-by-step instructions as guidance for this project, but I wrote all of the code myself.

Cool T-Shirts Inc. has asked you to analyze data on visits to their website. Your job is to build a funnel, which is a description of how many people continue to the next step of a multi-step process.

In this case, our funnel is going to describe the following process:

- A user visits CoolTShirts.com
- A user adds a t-shirt to their cart
- A user clicks “checkout”
- A user actually purchases a t-shirt


#### 1. Inspect the DataFrames using print and head:

- `visits` lists all of the users who have visited the website.
- `cart` lists all of the users who have added a t-shirt to their cart.
- `checkout` lists all of the users who have started the checkout.
- `purchase` lists all of the users who have purchased a t-shirt.




In [55]:
import pandas as pd
import numpy as np

visits = pd.read_csv('visits.csv',
                     parse_dates=[1])
cart = pd.read_csv('cart.csv',
                   parse_dates=[1])
checkout = pd.read_csv('checkout.csv',
                       parse_dates=[1])
purchase = pd.read_csv('purchase.csv',
                       parse_dates=[1])


print(visits.head())
print(cart.head())
print(checkout.head())
print(purchase.head())
print(len(visits))
print(len(cart))
print(len(checkout))
print(len(purchase))

                                user_id          visit_time
0  943647ef-3682-4750-a2e1-918ba6f16188 2017-04-07 15:14:00
1  0c3a3dd0-fb64-4eac-bf84-ba069ce409f2 2017-01-26 14:24:00
2  6e0b2d60-4027-4d9a-babd-0e7d40859fb1 2017-08-20 08:23:00
3  6879527e-c5a6-4d14-b2da-50b85212b0ab 2017-11-04 18:15:00
4  a84327ff-5daa-4ba1-b789-d5b4caf81e96 2017-02-27 11:25:00
                                user_id           cart_time
0  2be90e7c-9cca-44e0-bcc5-124b945ff168 2017-11-07 20:45:00
1  4397f73f-1da3-4ab3-91af-762792e25973 2017-05-27 01:35:00
2  a9db3d4b-0a0a-4398-a55a-ebb2c7adf663 2017-03-04 10:38:00
3  b594862a-36c5-47d5-b818-6e9512b939b3 2017-09-27 08:22:00
4  a68a16e2-94f0-4ce8-8ce3-784af0bbb974 2017-07-26 15:48:00
                                user_id       checkout_time
0  d33bdc47-4afa-45bc-b4e4-dbe948e34c0d 2017-06-25 09:29:00
1  4ac186f0-9954-4fea-8a27-c081e428e34e 2017-04-07 20:11:00
2  3c9c78a7-124a-4b77-8d2e-e1926e011e7d 2017-07-13 11:38:00
3  89fe330a-8966-4756-8f7c-3bdbcd47279a 

#### 2. Combine visits and cart using a left merge.

In [56]:
v_and_c = pd.merge(visits, cart, how = 'left')
print(v_and_c)

                                   user_id          visit_time  \
0     943647ef-3682-4750-a2e1-918ba6f16188 2017-04-07 15:14:00   
1     0c3a3dd0-fb64-4eac-bf84-ba069ce409f2 2017-01-26 14:24:00   
2     6e0b2d60-4027-4d9a-babd-0e7d40859fb1 2017-08-20 08:23:00   
3     6e0b2d60-4027-4d9a-babd-0e7d40859fb1 2017-08-20 08:23:00   
4     6879527e-c5a6-4d14-b2da-50b85212b0ab 2017-11-04 18:15:00   
...                                    ...                 ...   
2047  33913ac2-03da-45ae-8fc3-fea39df827c6 2017-03-25 03:29:00   
2048  4f850132-b99d-4623-80e6-6e61d003577e 2017-01-08 09:57:00   
2049  f0830b9b-1f5c-4e74-b63d-3f847cc6ce70 2017-09-07 12:56:00   
2050  b01bffa7-63ba-4cd3-9d93-eb1477c23831 2017-07-20 04:37:00   
2051  0336ca81-8d68-443f-9248-ac0b8ad147d5 2017-11-15 10:11:00   

               cart_time  
0                    NaT  
1    2017-01-26 14:44:00  
2    2017-08-20 08:31:00  
3    2017-08-20 08:49:00  
4                    NaT  
...                  ...  
2047              

#### 3. How long is your merged DataFrame?

In [57]:
n_visits = float(len(v_and_c.user_id))
print(n_visits)

2052.0
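
The merged row count can be larger than the number of distinct visitors, because a left merge emits one row for every matching cart entry (rows 2 and 3 in the output of step 2 share a `user_id`). Here is a minimal sketch on made-up data; `toy_visits` and `toy_cart` are hypothetical stand-ins, not the contents of the real CSV files:

```python
import pandas as pd

# Hypothetical toy data, not taken from visits.csv or cart.csv.
toy_visits = pd.DataFrame({
    "user_id": ["a", "b", "c"],
    "visit_time": pd.to_datetime(
        ["2017-01-01 10:00", "2017-01-01 11:00", "2017-01-01 12:00"]
    ),
})
# User "b" added two t-shirts, so "b" appears twice in the cart table.
toy_cart = pd.DataFrame({
    "user_id": ["b", "b"],
    "cart_time": pd.to_datetime(["2017-01-01 11:05", "2017-01-01 11:20"]),
})

# With no `on` argument, merge joins on the shared column (user_id).
toy_merged = pd.merge(toy_visits, toy_cart, how="left")

n_rows = len(toy_merged)                   # one row per visit/cart pair
n_users = toy_merged["user_id"].nunique()  # distinct visitors

print(n_rows, n_users)
```

In this toy case the merge yields four rows for three visitors, so `len()` of a merged frame counts visit/cart pairs rather than people.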


#### 4. How many of the timestamps are null for the column cart_time? What do these null rows mean?


In [58]:
visit_not_cart = float(len(v_and_c[v_and_c.cart_time.isnull()]))
print(visit_not_cart)


1652.0


In [59]:
# The 1652 rows with a null cart_time are visits where the user did not add an item to the shopping cart.

#### 5. What percent of users who visited Cool T-Shirts Inc. ended up not placing a t-shirt in their cart?


In [60]:
percentage_visit_not_cart = visit_not_cart / n_visits
print(percentage_visit_not_cart)

0.8050682261208577


In [61]:
# 80.51% of the visitors do not place a t-shirt in their cart.
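
The same share can be computed in one step with `isnull().mean()`, since the mean of a boolean column is the fraction of `True` values. The sketch below uses a small hypothetical merged frame (`toy_merged`), and also shows a per-user variant as one possible way to avoid counting repeat cart entries more than once:

```python
import pandas as pd

# Hypothetical merged frame: user "b" has two cart entries, "a" and "c" have none.
toy_merged = pd.DataFrame({
    "user_id": ["a", "b", "b", "c"],
    "cart_time": pd.to_datetime(
        [None, "2017-01-01 11:05", "2017-01-01 11:20", None]
    ),
})

# Row-level share: fraction of merged rows with no cart_time.
row_share = toy_merged["cart_time"].isnull().mean()

# User-level share: count() ignores NaT, so 0 means the user never carted.
cart_entries_per_user = toy_merged.groupby("user_id")["cart_time"].count()
user_share = (cart_entries_per_user == 0).mean()

print(row_share, user_share)
```

The two numbers differ whenever some users appear on several rows; which one is appropriate depends on whether the funnel is meant to describe visits or people.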

#### 6. Repeat the left merge for cart and checkout and count null values. What percentage of users put items in their cart, but did not proceed to checkout?

In [62]:
c_and_ch = pd.merge(cart, checkout, how = 'left')
print(c_and_ch)

n_cart = float(len(c_and_ch.user_id))
cart_not_checkout = float(len(c_and_ch[c_and_ch.checkout_time.isnull()]))
percentage_cart_not_checkouts = cart_not_checkout / n_cart
print(percentage_cart_not_checkouts)

                                  user_id           cart_time  \
0    2be90e7c-9cca-44e0-bcc5-124b945ff168 2017-11-07 20:45:00   
1    2be90e7c-9cca-44e0-bcc5-124b945ff168 2017-11-07 20:45:00   
2    2be90e7c-9cca-44e0-bcc5-124b945ff168 2017-11-07 20:45:00   
3    4397f73f-1da3-4ab3-91af-762792e25973 2017-05-27 01:35:00   
4    a9db3d4b-0a0a-4398-a55a-ebb2c7adf663 2017-03-04 10:38:00   
..                                    ...                 ...   
597  0ea4cc68-dae4-4e35-b3e0-f0889932e1b5 2017-05-12 08:53:00   
598  20da6a89-e211-4ea9-99bb-e2e62f03d213 2017-10-12 17:34:00   
599  20da6a89-e211-4ea9-99bb-e2e62f03d213 2017-10-12 17:34:00   
600  20da6a89-e211-4ea9-99bb-e2e62f03d213 2017-10-12 17:34:00   
601  05b44764-bb83-4b08-b3ff-c6b31d4e31d3 2017-03-19 19:52:00   

          checkout_time  
0   2017-11-07 21:14:00  
1   2017-11-07 20:50:00  
2   2017-11-07 21:11:00  
3                   NaT  
4   2017-03-04 11:04:00  
..                  ...  
597 2017-05-12 09:20:00  
598 2017-10

In [63]:
# Almost 21% of the users who put an item in their cart do not proceed to checkout.
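
Steps 2 through 6 repeat one pattern: left-merge a step with the next one, then measure how many timestamps of the next step are null. A small helper can capture that pattern. `drop_off_rate` and the toy frames below are hypothetical names and data introduced for illustration only:

```python
import pandas as pd


def drop_off_rate(step, next_step, next_time_col):
    """Share of rows in `step` with no match in `next_step`.

    Left-merges on the shared columns (here user_id) and returns the
    fraction of merged rows whose `next_time_col` is null.
    """
    merged = pd.merge(step, next_step, how="left")
    return merged[next_time_col].isnull().mean()


# Hypothetical toy data: three users carted, two of them checked out.
toy_cart = pd.DataFrame({
    "user_id": ["b", "c", "d"],
    "cart_time": pd.to_datetime(
        ["2017-01-01 11:05", "2017-01-01 12:05", "2017-01-01 13:05"]
    ),
})
toy_checkout = pd.DataFrame({
    "user_id": ["b", "c"],
    "checkout_time": pd.to_datetime(["2017-01-01 11:15", "2017-01-01 12:15"]),
})

rate = drop_off_rate(toy_cart, toy_checkout, "checkout_time")
print(rate)
```

With the real data, the equivalent call would be `drop_off_rate(cart, checkout, 'checkout_time')`, assuming the column names shown in step 1.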

#### 7. Merge all four steps of the funnel, in order, using a series of left merges. Save the results to the variable all_data. Examine the result using print and head.


In [64]:
all_data = visits.merge(cart, how = 'left').merge(checkout, how = 'left').merge(purchase, how = 'left')
print(all_data.head())

                                user_id          visit_time  \
0  943647ef-3682-4750-a2e1-918ba6f16188 2017-04-07 15:14:00   
1  0c3a3dd0-fb64-4eac-bf84-ba069ce409f2 2017-01-26 14:24:00   
2  6e0b2d60-4027-4d9a-babd-0e7d40859fb1 2017-08-20 08:23:00   
3  6e0b2d60-4027-4d9a-babd-0e7d40859fb1 2017-08-20 08:23:00   
4  6879527e-c5a6-4d14-b2da-50b85212b0ab 2017-11-04 18:15:00   

            cart_time       checkout_time       purchase_time  
0                 NaT                 NaT                 NaT  
1 2017-01-26 14:44:00 2017-01-26 14:54:00 2017-01-26 15:08:00  
2 2017-08-20 08:31:00                 NaT                 NaT  
3 2017-08-20 08:49:00                 NaT                 NaT  
4                 NaT                 NaT                 NaT  


####  8. What percentage of users proceeded to checkout, but did not purchase a t-shirt?

In [65]:
n_checkouts = float(len(all_data.checkout_time))
checkout_not_purchase = float(len(all_data[all_data.purchase_time.isnull()]))
percentage_checkout_not_purchases = checkout_not_purchase / n_checkouts
print(percentage_checkout_not_purchases)



0.7316885119506553


In [66]:
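# Note: the 0.73 printed above is the share of all rows in all_data with a null purchase_time.
# n_checkouts counts every row, including users who never reached checkout, so this is not yet
# the checkout-to-purchase drop-off; both counts need to be limited to rows with a checkout_time.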
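
One way to limit the step 8 calculation to users who actually reached checkout is to filter on a non-null `checkout_time` before counting. The sketch below uses a hypothetical four-row frame, `toy_all`, with the same column names as `all_data`; it is an illustration on made-up data, not a re-run of the notebook:

```python
import pandas as pd

# Hypothetical stand-in for all_data: one row per user for simplicity.
toy_all = pd.DataFrame({
    "user_id": ["a", "b", "c", "d"],
    "cart_time": pd.to_datetime(
        [None, "2017-01-01 11:05", "2017-01-01 12:05", "2017-01-01 13:05"]
    ),
    "checkout_time": pd.to_datetime(
        [None, "2017-01-01 11:15", "2017-01-01 12:15", None]
    ),
    "purchase_time": pd.to_datetime([None, "2017-01-01 11:30", None, None]),
})

# Denominator over every row: mixes in users who never reached checkout.
naive_rate = toy_all["purchase_time"].isnull().mean()

# Denominator limited to rows that have a checkout_time.
reached_checkout = toy_all[toy_all["checkout_time"].notnull()]
checkout_not_purchase_rate = reached_checkout["purchase_time"].isnull().mean()

print(naive_rate, checkout_not_purchase_rate)
```

In the toy frame the unfiltered figure is 0.75, while only half of the users who reached checkout failed to purchase, which shows how much the choice of denominator matters.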

#### 9. Which step of the funnel is weakest (i.e., has the highest percentage of users not completing it)? 


In [67]:
print(percentage_visit_not_cart)
print(percentage_cart_not_checkouts)
print(percentage_checkout_not_purchases)


0.8050682261208577
0.20930232558139536
0.7316885119506553


In [31]:
# The weakest step of the funnel is getting visitors to add a t-shirt to the cart. Once a visitor decides
# to make a purchase, the chances of completing it are high.



## Average Time to Purchase
####  10. Using the giant merged DataFrame all_data that you created, let’s calculate the average time from initial visit to final purchase. 

In [32]:
all_data['time_to_purchase'] = all_data.purchase_time - all_data.visit_time

#### 11. Examine the results.

In [33]:
print(all_data.time_to_purchase)


0           NaT
1      00:44:00
2      00:26:00
3      00:26:00
4           NaT
         ...   
3268   00:00:00
3269   00:00:00
3270   00:00:00
3271   00:00:00
3272   00:00:00
Name: time_to_purchase, Length: 3273, dtype: timedelta64[ns]


#### 12. Calculate the average time to purchase.

In [42]:
print(all_data.time_to_purchase.mean())






0 days 00:13:51.171605
3273


I see that some of the values are zero, which means the purchase was not actually made, so they should not be taken into account when
calculating the mean. I'm replacing those with NaT:

0 days 00:13:51.171605
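
As a rough sketch of that replacement: `Series.mean()` already skips `NaT`, so the only extra step is to turn zero-length durations into `NaT`, for example with `Series.mask`. The frame `toy` below is hypothetical, and treating zeros as "no purchase" is the assumption stated above, not something the data itself confirms:

```python
import pandas as pd

# Hypothetical data: two real purchases, one missing, one zero-length duration.
toy = pd.DataFrame({
    "visit_time": pd.to_datetime([
        "2017-01-01 10:00", "2017-01-01 11:00",
        "2017-01-01 12:00", "2017-01-01 13:00",
    ]),
    "purchase_time": pd.to_datetime([
        "2017-01-01 10:44", "2017-01-01 11:26",
        None, "2017-01-01 13:00",
    ]),
})
toy["time_to_purchase"] = toy.purchase_time - toy.visit_time

# mean() ignores NaT but still counts the zero-length duration.
mean_all = toy.time_to_purchase.mean()

# Replace zero durations with NaT, then average what is left.
is_zero = toy.time_to_purchase == pd.Timedelta(0)
mean_nonzero = toy.time_to_purchase.mask(is_zero).mean()

print(mean_all)
print(mean_nonzero)
```

Here the zero row pulls the first mean down to 23 minutes 20 seconds, while the second mean, over the two real purchases only, is 35 minutes.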


In [68]:
print(all_data)

                                   user_id          visit_time  \
0     943647ef-3682-4750-a2e1-918ba6f16188 2017-04-07 15:14:00   
1     0c3a3dd0-fb64-4eac-bf84-ba069ce409f2 2017-01-26 14:24:00   
2     6e0b2d60-4027-4d9a-babd-0e7d40859fb1 2017-08-20 08:23:00   
3     6e0b2d60-4027-4d9a-babd-0e7d40859fb1 2017-08-20 08:23:00   
4     6879527e-c5a6-4d14-b2da-50b85212b0ab 2017-11-04 18:15:00   
...                                    ...                 ...   
2589  33913ac2-03da-45ae-8fc3-fea39df827c6 2017-03-25 03:29:00   
2590  4f850132-b99d-4623-80e6-6e61d003577e 2017-01-08 09:57:00   
2591  f0830b9b-1f5c-4e74-b63d-3f847cc6ce70 2017-09-07 12:56:00   
2592  b01bffa7-63ba-4cd3-9d93-eb1477c23831 2017-07-20 04:37:00   
2593  0336ca81-8d68-443f-9248-ac0b8ad147d5 2017-11-15 10:11:00   

               cart_time       checkout_time       purchase_time  
0                    NaT                 NaT                 NaT  
1    2017-01-26 14:44:00 2017-01-26 14:54:00 2017-01-26 15:08:00  
2    2