# Cool T-Shirts

This is a fictitious **CodeCademy.com** project. 

Cool T-Shirts Inc. has asked you to analyze data on visits to their website. Your job is to build a funnel, which is a description of how many people continue to the next step of a multi-step process.

In this case, our funnel is going to describe the following process:

1. A user visits CoolTShirts.com
2. A user adds a t-shirt to their cart
3. A user clicks “checkout”
4. A user actually purchases a t-shirt
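
To make the idea of a funnel concrete, the sketch below counts how many users remain at each step. The step names mirror the list above, but the user IDs are made up for illustration and `funnel_counts` is a hypothetical helper, not part of the project.

```python
# Toy funnel: each step maps to the set of (made-up) user IDs that reached it.
steps = {
    "visit": {"u1", "u2", "u3", "u4", "u5"},
    "cart": {"u1", "u2", "u3"},
    "checkout": {"u1", "u2"},
    "purchase": {"u1"},
}


def funnel_counts(steps):
    """Return (step name, users remaining, share of previous step) per step."""
    rows = []
    remaining = None
    for name, users in steps.items():
        if remaining is None:
            # First step: everyone who arrived is in the funnel.
            current, share = users, 1.0
        else:
            # A user only counts if they also reached every earlier step.
            current = users & remaining
            share = len(current) / len(remaining) if remaining else 0.0
        rows.append((name, len(current), share))
        remaining = current
    return rows


for name, count, share in funnel_counts(steps):
    print(f"{name:<9} {count:>3}  {share:.0%} of previous step")
```

The rest of this notebook computes the same kind of step-to-step figures from the real CSV files with pandas merges.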

## 1. Importing all the files and exploring the data

In [2]:
import pandas as pd

In [3]:
visits = pd.read_csv('visits.csv', parse_dates=[1])
cart = pd.read_csv('cart.csv', parse_dates=[1])
checkout = pd.read_csv('checkout.csv', parse_dates=[1])
purchase = pd.read_csv('purchase.csv', parse_dates=[1])
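
`parse_dates=[1]` asks pandas to parse the column at position 1 (the second column) as datetimes. A minimal sketch with an in-memory CSV; the two rows are invented and only mimic the layout of `visits.csv`:

```python
import io

import pandas as pd

# Invented sample that mimics the two-column layout of visits.csv.
sample_csv = io.StringIO(
    "user_id,visit_time\n"
    "user-a,2017-04-07 15:14:00\n"
    "user-b,2017-01-26 14:24:00\n"
)

# parse_dates=[1] selects the column by position, so visit_time is parsed as datetimes.
sample = pd.read_csv(sample_csv, parse_dates=[1])
print(sample.dtypes)
```

Passing the column name instead, as in `parse_dates=['visit_time']`, should give the same result and may read more clearly.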

Let's look at the first rows of each of the four dataframes.

In [4]:
print(visits.head())
print(cart.head())
print(checkout.head())
print(purchase.head())

                                user_id          visit_time
0  943647ef-3682-4750-a2e1-918ba6f16188 2017-04-07 15:14:00
1  0c3a3dd0-fb64-4eac-bf84-ba069ce409f2 2017-01-26 14:24:00
2  6e0b2d60-4027-4d9a-babd-0e7d40859fb1 2017-08-20 08:23:00
3  6879527e-c5a6-4d14-b2da-50b85212b0ab 2017-11-04 18:15:00
4  a84327ff-5daa-4ba1-b789-d5b4caf81e96 2017-02-27 11:25:00
                                user_id           cart_time
0  2be90e7c-9cca-44e0-bcc5-124b945ff168 2017-11-07 20:45:00
1  4397f73f-1da3-4ab3-91af-762792e25973 2017-05-27 01:35:00
2  a9db3d4b-0a0a-4398-a55a-ebb2c7adf663 2017-03-04 10:38:00
3  b594862a-36c5-47d5-b818-6e9512b939b3 2017-09-27 08:22:00
4  a68a16e2-94f0-4ce8-8ce3-784af0bbb974 2017-07-26 15:48:00
                                user_id       checkout_time
0  d33bdc47-4afa-45bc-b4e4-dbe948e34c0d 2017-06-25 09:29:00
1  4ac186f0-9954-4fea-8a27-c081e428e34e 2017-04-07 20:11:00
2  3c9c78a7-124a-4b77-8d2e-e1926e011e7d 2017-07-13 11:38:00
3  89fe330a-8966-4756-8f7c-3bdbcd47279a 

## 2. From visits to cart

In [5]:
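# Left merge keeps every visit row; cart_time is NaT where the user never added to cart.
visits_to_cart = pd.merge(visits, cart, how='left')

# Number of distinct visitors.
visits_count = visits_to_cart.user_id.nunique()
print('Number of visits = ' + str(visits_count))

# count() skips missing values, so this counts rows that have a cart_time.
cart_time_count = visits_to_cart.cart_time.count()
print('Number of users adding to the cart = ' + str(cart_time_count))
print('Number of users only visiting = ' + str(visits_count - cart_time_count))

# Stored as a fraction between 0 and 1; it is scaled to a percentage in the conclusions.
percentage_visit_not_cart = (visits_count - cart_time_count) / float(visits_count)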
print('Percentage of users only visiting = ' + str(percentage_visit_not_cart))

Number of visits = 2000
Number of users adding to the cart = 348
Number of users only visiting = 1652
Percentage of users only visiting = 0.826
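
The counts above rely on two pandas behaviours: a left merge keeps every row of the left table and fills the right table's columns with missing values where there is no match, and `count()` ignores those missing values. A small sketch with made-up data (the `toy_*` names are illustrative only):

```python
import pandas as pd

# Made-up data: three visitors, only one of whom added something to the cart.
toy_visits = pd.DataFrame({
    "user_id": ["a", "b", "c"],
    "visit_time": pd.to_datetime(
        ["2017-01-01 10:00", "2017-01-01 11:00", "2017-01-01 12:00"]
    ),
})
toy_cart = pd.DataFrame({
    "user_id": ["b"],
    "cart_time": pd.to_datetime(["2017-01-01 11:05"]),
})

# With no `on=` argument, merge joins on the columns both frames share (user_id).
toy_merged = pd.merge(toy_visits, toy_cart, how="left")

# count() ignores NaT, so it counts only visitors with a cart_time.
added_to_cart = toy_merged.cart_time.count()
only_visited = toy_merged.cart_time.isnull().sum()

print(toy_merged)
print(added_to_cart, only_visited)
```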


## 3. From cart to checkout

In [8]:
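# Left merge keeps every cart row; checkout_time is NaT where the user never reached checkout.
cart_to_checkout = pd.merge(cart, checkout, how='left')

# Rows with no checkout_time, as a share of the cart count computed in the previous step.
checkout_time_null = len(cart_to_checkout[cart_to_checkout.checkout_time.isnull()])
percentage_cart_not_checkout = float(checkout_time_null) / cart_time_count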

print('Checkout null values = ' + str(checkout_time_null))
print('Percentage of users that put items in their cart but did not proceed to checkout = ' + str(percentage_cart_not_checkout))

Checkout null values = 122
Percentage of users that put items in their cart but did not proceed to checkout = 0.3505747126436782
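
One caveat with this approach: a left merge emits one row per matching pair, so a user who appears more than once in the right-hand table is counted more than once when rows are counted. Whether the project's CSV files contain such repeats is not shown here; the sketch below uses made-up data to illustrate the effect and one possible guard, counting unique `user_id` values instead of rows.

```python
import pandas as pd

toy_cart = pd.DataFrame({
    "user_id": ["a", "b"],
    "cart_time": pd.to_datetime(["2017-01-01 10:00", "2017-01-01 11:00"]),
})
# Made-up case: user "a" has two checkout events, user "b" has none.
toy_checkout = pd.DataFrame({
    "user_id": ["a", "a"],
    "checkout_time": pd.to_datetime(["2017-01-01 10:10", "2017-01-01 10:12"]),
})

toy_merged = pd.merge(toy_cart, toy_checkout, how="left")

# Counting rows counts user "a" twice.
rows_with_checkout = toy_merged.checkout_time.count()

# Counting distinct user IDs counts each user once.
has_checkout = toy_merged.checkout_time.notnull()
users_with_checkout = toy_merged.loc[has_checkout, "user_id"].nunique()

print(len(toy_merged), rows_with_checkout, users_with_checkout)
```

If repeats do occur, calling `drop_duplicates(subset='user_id')` on the right-hand table before merging is another option, at the cost of keeping only one event per user.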


## 4. All data

In [9]:
# Chain left merges so each row carries all four timestamps.
all_data = visits.merge(cart, how='left').merge(checkout, how='left').merge(purchase, how='left')
print(all_data.head())

# Rows that reached checkout, and the subset of those with no purchase.
reached_checkout = all_data[all_data.checkout_time.notnull()]
checkout_not_purchase = reached_checkout[reached_checkout.purchase_time.isnull()]
percentage_checkout_not_purchase = float(len(checkout_not_purchase)) / len(reached_checkout)

print('Length reached_checkout = ' + str(len(reached_checkout)))
print('checkout_not_purchase = ' + str(len(checkout_not_purchase)))
print('Percentage of users that reached checkout without purchasing = ' + str(percentage_checkout_not_purchase))

                                user_id          visit_time  \
0  943647ef-3682-4750-a2e1-918ba6f16188 2017-04-07 15:14:00   
1  0c3a3dd0-fb64-4eac-bf84-ba069ce409f2 2017-01-26 14:24:00   
2  6e0b2d60-4027-4d9a-babd-0e7d40859fb1 2017-08-20 08:23:00   
3  6879527e-c5a6-4d14-b2da-50b85212b0ab 2017-11-04 18:15:00   
4  a84327ff-5daa-4ba1-b789-d5b4caf81e96 2017-02-27 11:25:00   

            cart_time       checkout_time       purchase_time  
0                 NaT                 NaT                 NaT  
1 2017-01-26 14:44:00 2017-01-26 14:54:00 2017-01-26 15:08:00  
2 2017-08-20 08:31:00                 NaT                 NaT  
3                 NaT                 NaT                 NaT  
4                 NaT                 NaT                 NaT  
Length reached_checkout = 334
checkout_not_purchase = 82
Percentage of users that reached checkout without purchasing = 0.24550898203592814
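
The same pattern (keep the rows that reached one step, then count those missing the next) applies to every pair of adjacent steps. A minimal sketch of a reusable helper; `drop_off` is a hypothetical name and the data frame is made up, following the column layout of `all_data`:

```python
import pandas as pd


def drop_off(df, reached_col, next_col):
    """Share of rows with a value in reached_col that have none in next_col."""
    reached = df[df[reached_col].notnull()]
    if len(reached) == 0:
        return float("nan")
    return reached[next_col].isnull().sum() / len(reached)


# Made-up rows in the same shape as all_data (only two of its columns).
toy = pd.DataFrame({
    "checkout_time": pd.to_datetime(
        ["2017-01-01 10:00", "2017-01-01 11:00", None, "2017-01-01 12:00"]
    ),
    "purchase_time": pd.to_datetime(["2017-01-01 10:05", None, None, None]),
})

# Three rows reached checkout and two of them have no purchase.
print(drop_off(toy, "checkout_time", "purchase_time"))
```

On the real data, a call such as `drop_off(all_data, 'checkout_time', 'purchase_time')` would be expected to reproduce the fraction computed in this section.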


## 5. Average time to purchase

In [13]:
# Column-wise subtraction; rows without a purchase get NaT and are skipped by mean().
all_data['time_from_visit_to_purchase'] = all_data['purchase_time'] - all_data['visit_time']

avg_time_spent_purchasing = all_data.time_from_visit_to_purchase.mean()

print('The average time from visit to purchase is ' + str(avg_time_spent_purchasing))

The average time from visit to purchase is 0 days 00:43:12.380952380
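
Subtracting two datetime columns yields a timedelta column, and `mean()` skips rows where either timestamp is missing, so the average covers only users who actually purchased. A small sketch with made-up timestamps:

```python
import pandas as pd

toy = pd.DataFrame({
    "visit_time": pd.to_datetime(
        ["2017-01-01 10:00", "2017-01-01 11:00", "2017-01-01 12:00"]
    ),
    "purchase_time": pd.to_datetime(
        ["2017-01-01 10:30", None, "2017-01-01 12:50"]
    ),
})

# Column-wise subtraction; the visitor without a purchase gets NaT.
elapsed = toy["purchase_time"] - toy["visit_time"]

# NaT is skipped, so both statistics use only the 30- and 50-minute values.
print(elapsed.mean())
print(elapsed.median())
```

The median is included because a few very slow purchases can pull the mean upward; which statistic suits the real data better is a judgment call.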


## 6. Conclusions

In [17]:
print("{} percent of users who visited the page did not add a t-shirt to their cart".format(round(percentage_visit_not_cart * 100, 2)))
print("{} percent of users who added a t-shirt to their cart did not check out".format(round(percentage_cart_not_checkout * 100, 2)))
print("{} percent of users who made it to checkout did not purchase a shirt".format(round(percentage_checkout_not_purchase * 100, 2)))

82.6 percent of users who visited the page did not add a t-shirt to their cart
35.06 percent of users who added a t-shirt to their cart did not check out
24.55 percent of users who made it to checkout did not purchase a shirt
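
The drop-off rates printed above can be turned into step-to-step conversion rates by taking the complement. The sketch below reuses the three rounded percentages from the output; the dictionary and its keys are chosen for illustration.

```python
# Drop-off percentages copied from the printed conclusions.
drop_off_pct = {
    "visit -> cart": 82.6,
    "cart -> checkout": 35.06,
    "checkout -> purchase": 24.55,
}

# Conversion is the complement of drop-off at each step.
conversion_pct = {
    step: round(100 - pct, 2) for step, pct in drop_off_pct.items()
}

for step, pct in conversion_pct.items():
    print(f"{step}: {pct} percent continued")
```

The three rates are left separate rather than multiplied together, because each one was computed against its own merged table.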


*The weakest part of the funnel is the first step: 82.6 percent of visitors leave without adding a t-shirt to their cart. Of the users who do add a t-shirt, about 65 percent proceed to checkout, and about 75 percent of those who reach checkout complete the purchase. One suggestion is to make the add-to-cart button more prominent on the front page.*