## Cool T-Shirts Page Visits Funnel
 
In this project, we act as the data analyst at Cool T-Shirts Inc. and analyze data on visits to their website. We will build a funnel, which describes how many people continue from each step of a multi-step process to the next.
 
In this case, our funnel is going to describe the following process:

1. A user visits CoolTShirts.com

2. A user adds a t-shirt to their cart

3. A user clicks "checkout"

4. A user actually purchases a t-shirt

In [1]:
# Let's import pandas first
import pandas as pd

In [2]:
# Load the four funnel tables, parsing the timestamp column (position 1) as dates.
visits = pd.read_csv( r"C:\Users\amanp\OneDrive\Desktop\visits.csv", parse_dates=[1])

cart = pd.read_csv( r"C:\Users\amanp\OneDrive\Desktop\cart.csv", parse_dates=[1])

checkout = pd.read_csv( r"C:\Users\amanp\OneDrive\Desktop\checkout.csv", parse_dates=[1])

purchase = pd.read_csv( r"C:\Users\amanp\OneDrive\Desktop\purchase.csv", parse_dates=[1])

In [3]:
visits.head()

,user_id,visit_time
0,943647ef-3682-4750-a2e1-918ba6f16188,2017-04-07 15:14:00
1,0c3a3dd0-fb64-4eac-bf84-ba069ce409f2,2017-01-26 14:24:00
2,6e0b2d60-4027-4d9a-babd-0e7d40859fb1,2017-08-20 08:23:00
3,6879527e-c5a6-4d14-b2da-50b85212b0ab,2017-11-04 18:15:00
4,a84327ff-5daa-4ba1-b789-d5b4caf81e96,2017-02-27 11:25:00


In [4]:
cart.head()

,user_id,cart_time
0,2be90e7c-9cca-44e0-bcc5-124b945ff168,2017-11-07 20:45:00
1,4397f73f-1da3-4ab3-91af-762792e25973,2017-05-27 01:35:00
2,a9db3d4b-0a0a-4398-a55a-ebb2c7adf663,2017-03-04 10:38:00
3,b594862a-36c5-47d5-b818-6e9512b939b3,2017-09-27 08:22:00
4,a68a16e2-94f0-4ce8-8ce3-784af0bbb974,2017-07-26 15:48:00


In [5]:
checkout.head()

,user_id,checkout_time
0,d33bdc47-4afa-45bc-b4e4-dbe948e34c0d,2017-06-25 09:29:00
1,4ac186f0-9954-4fea-8a27-c081e428e34e,2017-04-07 20:11:00
2,3c9c78a7-124a-4b77-8d2e-e1926e011e7d,2017-07-13 11:38:00
3,89fe330a-8966-4756-8f7c-3bdbcd47279a,2017-04-20 16:15:00
4,3ccdaf69-2d30-40de-b083-51372881aedd,2017-01-08 20:52:00


In [6]:
purchase.head()

,user_id,purchase_time
0,4b44ace4-2721-47a0-b24b-15fbfa2abf85,2017-05-11 04:25:00
1,02e684ae-a448-408f-a9ff-dcb4a5c99aac,2017-09-05 08:45:00
2,4b4bc391-749e-4b90-ab8f-4f6e3c84d6dc,2017-11-20 20:49:00
3,a5dbb25f-3c36-4103-9030-9f7c6241cd8d,2017-01-22 15:18:00
4,46a3186d-7f5a-4ab9-87af-84d05bfd4867,2017-06-11 11:32:00
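
The loading step above selects the timestamp column by position (`parse_dates=[1]`). As a minimal sketch, the same column can be selected by name instead, which keeps working if the column order changes. The in-memory CSV below is an assumption for illustration: it reuses two rows from the `visits` preview and assumes the file has a header row with `user_id` and `visit_time`; `sample_csv` and `visits_sample` are hypothetical names.

```python
import io

import pandas as pd

# Hypothetical stand-in for visits.csv (assumed header: user_id,visit_time).
sample_csv = io.StringIO(
    "user_id,visit_time\n"
    "943647ef-3682-4750-a2e1-918ba6f16188,2017-04-07 15:14:00\n"
    "0c3a3dd0-fb64-4eac-bf84-ba069ce409f2,2017-01-26 14:24:00\n"
)

# Name the date column explicitly rather than relying on its position.
visits_sample = pd.read_csv(sample_csv, parse_dates=["visit_time"])

# visit_time should now be a datetime column, user_id a plain string column.
print(visits_sample.dtypes)
```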


#### The percentage of users who visited Cool T-Shirts Inc. but did not place a t-shirt in their cart.

In [29]:
# Combine visits and cart using a left merge, so every visit is kept even if there is no matching cart row.

visit_cart = pd.merge(visits, cart, how='left')

visit_cart.head()

,user_id,visit_time,cart_time
0,943647ef-3682-4750-a2e1-918ba6f16188,2017-04-07 15:14:00,NaT
1,0c3a3dd0-fb64-4eac-bf84-ba069ce409f2,2017-01-26 14:24:00,2017-01-26 14:44:00
2,6e0b2d60-4027-4d9a-babd-0e7d40859fb1,2017-08-20 08:23:00,2017-08-20 08:31:00
3,6879527e-c5a6-4d14-b2da-50b85212b0ab,2017-11-04 18:15:00,NaT
4,a84327ff-5daa-4ba1-b789-d5b4caf81e96,2017-02-27 11:25:00,NaT


In [30]:
# let's check the length of our new dataframe

print(len(visit_cart))


2000


In [33]:
# Count the rows whose cart_time is null (visits with no matching cart row).
cart_null = visit_cart[visit_cart.cart_time.isnull()]
print(len(cart_null))

1652


In [34]:
# Calculate the percentage of visitors who did not place a t-shirt in their cart.
# A null cart_time in the merged DataFrame means the user visited the website but did not add a t-shirt to their cart.
# In Python 3 the / operator always performs true division, so no float() conversion is needed.

print(100 * len(cart_null) / len(visit_cart))

82.6


So, out of the total number of users who visited the Cool T-Shirts website, 82.6 percent did not place a t-shirt in their cart.
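
The same share can be computed in one step, because the mean of a boolean mask is the fraction of `True` values. The sketch below uses small made-up tables (`visits_demo`, `cart_demo`), not the project data, so its result is illustrative only.

```python
import pandas as pd

# Made-up data: four visitors, only one of whom added a t-shirt to the cart.
visits_demo = pd.DataFrame({
    "user_id": ["a", "b", "c", "d"],
    "visit_time": pd.to_datetime([
        "2017-01-01 10:00", "2017-01-02 11:00",
        "2017-01-03 12:00", "2017-01-04 13:00",
    ]),
})
cart_demo = pd.DataFrame({
    "user_id": ["b"],
    "cart_time": pd.to_datetime(["2017-01-02 11:20"]),
})

# Left merge keeps every visit; visits without a cart row get NaT in cart_time.
visit_cart_demo = pd.merge(visits_demo, cart_demo, how="left", on="user_id")

# isnull() gives a True/False mask; its mean is the fraction of null rows.
pct_no_cart = 100 * visit_cart_demo["cart_time"].isnull().mean()
print(pct_no_cart)  # 75.0 for this made-up data
```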

#### The percentage of users who put items in their cart, but did not proceed to checkout.

In [7]:
# Left-merge cart and checkout, then count the null checkout_time rows to calculate the percentage of
# cart rows that did not proceed to checkout.

cart_checkout = pd.merge(cart, checkout, how='left')
cart_checkout.head()

,user_id,cart_time,checkout_time
0,2be90e7c-9cca-44e0-bcc5-124b945ff168,2017-11-07 20:45:00,2017-11-07 21:14:00
1,2be90e7c-9cca-44e0-bcc5-124b945ff168,2017-11-07 20:45:00,2017-11-07 20:50:00
2,2be90e7c-9cca-44e0-bcc5-124b945ff168,2017-11-07 20:45:00,2017-11-07 21:11:00
3,4397f73f-1da3-4ab3-91af-762792e25973,2017-05-27 01:35:00,NaT
4,a9db3d4b-0a0a-4398-a55a-ebb2c7adf663,2017-03-04 10:38:00,2017-03-04 11:04:00


In [39]:
checkout_null = cart_checkout[cart_checkout.checkout_time.isnull()]
print(len(checkout_null))
print(len(cart_checkout))

122
482


In [13]:
print(100 * len(checkout_null) / len(cart_checkout))

25.311203319502074


So, of those users who put items in their carts, 25.3 percent did not proceed to checkout.
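
Note that the merged preview above repeats one `user_id` three times, once per matching checkout row, so the figure is a share of merged rows rather than of distinct users. As a hedged sketch on made-up data (`cart_demo` and `checkout_demo` are hypothetical), the two views can be compared like this; which one is appropriate depends on how the funnel step is defined.

```python
import pandas as pd

# Made-up data: user "a" has three checkout events, "b" has none, "c" has one.
cart_demo = pd.DataFrame({
    "user_id": ["a", "b", "c"],
    "cart_time": pd.to_datetime([
        "2017-01-01 10:00", "2017-01-02 11:00", "2017-01-03 12:00",
    ]),
})
checkout_demo = pd.DataFrame({
    "user_id": ["a", "a", "a", "c"],
    "checkout_time": pd.to_datetime([
        "2017-01-01 10:05", "2017-01-01 10:10",
        "2017-01-01 10:15", "2017-01-03 12:30",
    ]),
})

# The left merge produces one row per (cart row, matching checkout row) pair.
merged = pd.merge(cart_demo, checkout_demo, how="left", on="user_id")

# Row-based share: null checkout rows over all merged rows (1 of 5 here).
row_pct = 100 * merged["checkout_time"].isnull().mean()

# User-based share: a user counts as checked out if any of their rows has a
# checkout_time; count() ignores nulls, so 0 means "never checked out".
checked_out = merged.groupby("user_id")["checkout_time"].count() > 0
user_pct = 100 * (~checked_out).mean()

print(len(merged), row_pct, round(user_pct, 2))
```

With repeated checkout rows the row-based figure is lower than the user-based one in this toy example, because the extra rows inflate the denominator.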

#### The percentage of users who proceeded to checkout, but did not purchase a t-shirt.

In [8]:
# Build one combined table, all_data, by merging all four steps of the funnel in order with a series of
# left merges, then print the first five rows.

all_data = visits.merge(cart, how='left').merge(checkout, how='left').merge(purchase, how='left')
all_data.head()

,user_id,visit_time,cart_time,checkout_time,purchase_time
0,943647ef-3682-4750-a2e1-918ba6f16188,2017-04-07 15:14:00,NaT,NaT,NaT
1,0c3a3dd0-fb64-4eac-bf84-ba069ce409f2,2017-01-26 14:24:00,2017-01-26 14:44:00,2017-01-26 14:54:00,2017-01-26 15:08:00
2,6e0b2d60-4027-4d9a-babd-0e7d40859fb1,2017-08-20 08:23:00,2017-08-20 08:31:00,NaT,NaT
3,6879527e-c5a6-4d14-b2da-50b85212b0ab,2017-11-04 18:15:00,NaT,NaT,NaT
4,a84327ff-5daa-4ba1-b789-d5b4caf81e96,2017-02-27 11:25:00,NaT,NaT,NaT


In [9]:
# Calculate the number of rows that reached checkout but have no purchase.
# Count the null values in checkout_time and in purchase_time, then subtract the first from the second.
# This works provided every row with a null checkout_time also has a null purchase_time.

purchase_null = all_data[all_data.purchase_time.isnull()]
checkout_null = all_data[all_data.checkout_time.isnull()]
checkout_but_not_purchase = len(purchase_null) - len(checkout_null)
print(checkout_but_not_purchase)

101


In [10]:
# Calculate the percentage, using the rows that reached checkout as the denominator.
total = len(all_data) - len(checkout_null)
percentage_no_purchase = 100 * checkout_but_not_purchase / total
print(percentage_no_purchase)

16.889632107023413


So, the percentage of users who proceeded to checkout but did not purchase a t-shirt is 16.89 percent.
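
An alternative to subtracting two null counts is to combine boolean masks directly, which states the condition "reached checkout and did not purchase" explicitly. This is a sketch on made-up data (`all_demo` is hypothetical), not a replacement for the calculation above.

```python
import pandas as pd

# Made-up data: "a" purchased, "b" and "d" checked out without purchasing,
# "c" never reached checkout.
all_demo = pd.DataFrame({
    "user_id": ["a", "b", "c", "d"],
    "checkout_time": pd.to_datetime([
        "2017-01-01 10:30", "2017-01-02 11:30", None, "2017-01-04 13:30",
    ]),
    "purchase_time": pd.to_datetime([
        "2017-01-01 10:45", None, None, None,
    ]),
})

reached_checkout = all_demo["checkout_time"].notnull()
no_purchase = all_demo["purchase_time"].isnull()

# Rows that satisfy both conditions at once.
checkout_but_no_purchase = (reached_checkout & no_purchase).sum()

# Share of checkout rows that did not end in a purchase.
pct_no_purchase = 100 * checkout_but_no_purchase / reached_checkout.sum()
print(checkout_but_no_purchase, round(pct_no_purchase, 2))
```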

In [11]:
# Using the giant merged DataFrame all_data, let’s calculate the average time from initial visit to final purchase.
all_data['time_to_purchase'] = all_data.purchase_time - all_data.visit_time
all_data.time_to_purchase

0                  NaT
1      0 days 00:44:00
2                  NaT
3                  NaT
4                  NaT
             ...      
2367               NaT
2368               NaT
2369               NaT
2370               NaT
2371               NaT
Name: time_to_purchase, Length: 2372, dtype: timedelta64[ns]

In [22]:
print(all_data.time_to_purchase.mean())

0 days 00:43:53.360160965


So, for users who completed a purchase, the average time from initial visit to final purchase is just under 44 minutes (43 minutes 53 seconds).
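
The mean above is a `Timedelta`, and rows without a purchase (`NaT`) are skipped when it is computed. As a small sketch on made-up data (`demo` is hypothetical), the result can be converted to minutes with `total_seconds()`:

```python
import pandas as pd

# Made-up data: two completed purchases (40 and 50 minutes) and one visit
# with no purchase.
demo = pd.DataFrame({
    "visit_time": pd.to_datetime([
        "2017-01-01 10:00", "2017-01-02 11:00", "2017-01-03 12:00",
    ]),
    "purchase_time": pd.to_datetime([
        "2017-01-01 10:40", None, "2017-01-03 12:50",
    ]),
})

# Subtracting datetime columns yields timedeltas; missing purchases give NaT.
time_to_purchase = demo["purchase_time"] - demo["visit_time"]

# mean() ignores NaT, so only completed purchases contribute.
mean_minutes = time_to_purchase.mean().total_seconds() / 60
print(mean_minutes)  # 45.0 for this made-up data
```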

#### Conclusion:
From the above calculations, we have the following funnel:
1. About 82.6% of users who visited Cool T-Shirts Inc. did not place a t-shirt in their cart.
2. About 25.3% of users who put items in their cart did not proceed to checkout.
3. About 16.89% of users who proceeded to checkout did not purchase a t-shirt.
4. For users who completed a purchase, the average time from initial visit to final purchase is just under 44 minutes.
