# Page Visits Funnel
Cool T-Shirts Inc. has asked you to analyze data on visits to their website. Your job is to build a funnel, which is a description of how many people continue to the next step of a multi-step process.

In this case, our funnel is going to describe the following process:

1. A user visits CoolTShirts.com
2. A user adds a t-shirt to their cart
3. A user clicks “checkout”
4. A user actually purchases a t-shirt

## Funnel for Cool T-Shirts Inc.

### Task 1
Inspect the DataFrames using `print` and `head`:

* `visits` lists all of the users who have visited the website
* `cart` lists all of the users who have added a t-shirt to their cart
* `checkout` lists all of the users who have started the checkout
* `purchase` lists all of the users who have purchased a t-shirt

In [13]:
import pandas as pd

visits = pd.read_csv('visits.csv',
                     parse_dates=[1])
cart = pd.read_csv('cart.csv',
                   parse_dates=[1])
checkout = pd.read_csv('checkout.csv',
                       parse_dates=[1])
purchase = pd.read_csv('purchase.csv',
                       parse_dates=[1])

print(f'{visits.head(1)}\n{cart.head(1)}\n{checkout.head(1)}\n{purchase.head(1)}')

                                user_id          visit_time
0  943647ef-3682-4750-a2e1-918ba6f16188 2017-04-07 15:14:00
                                user_id           cart_time
0  2be90e7c-9cca-44e0-bcc5-124b945ff168 2017-11-07 20:45:00
                                user_id       checkout_time
0  d33bdc47-4afa-45bc-b4e4-dbe948e34c0d 2017-06-25 09:29:00
                                user_id       purchase_time
0  4b44ace4-2721-47a0-b24b-15fbfa2abf85 2017-05-11 04:25:00
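
Printing one row per table confirms the column names, but not the dtypes or whether a `user_id` repeats. The sketch below runs a slightly fuller inspection on a tiny in-memory stand-in for `visits.csv`. The sample rows and the names `sample_csv` and `sample_visits` are invented for illustration and are not part of the exercise data.

```python
import io
import pandas as pd

# Hypothetical sample standing in for visits.csv (not the real data).
sample_csv = io.StringIO(
    "user_id,visit_time\n"
    "a1,2017-04-07 15:14:00\n"
    "b2,2017-01-26 14:24:00\n"
    "b2,2017-01-26 14:24:00\n"
)

# Naming the column selects the same column as parse_dates=[1] here,
# but it is easier to read.
sample_visits = pd.read_csv(sample_csv, parse_dates=["visit_time"])

print(sample_visits.head())
print(sample_visits.dtypes)                      # visit_time should be a datetime64 dtype
print(sample_visits.user_id.duplicated().sum())  # number of repeated user_ids
```

Checking for repeated ids early is useful because duplicates change how many rows a merge returns.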


### Task 2
Combine `visits` and `cart` using a _left merge_.

In [17]:
visits_cart = pd.merge(visits, cart, how='left')
print(visits_cart.head(1))

                                user_id          visit_time cart_time
0  943647ef-3682-4750-a2e1-918ba6f16188 2017-04-07 15:14:00       NaT
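
With no `on` argument, `pd.merge` joins on every column the two frames share, which here is only `user_id`. A minimal sketch with invented toy frames (`toy_visits`, `toy_cart`) makes the join key explicit and shows how a visitor without a cart row ends up with `NaT`:

```python
import pandas as pd

# Toy frames for illustration only; the ids and times are invented.
toy_visits = pd.DataFrame({
    "user_id": ["a1", "b2", "c3"],
    "visit_time": pd.to_datetime(
        ["2017-01-01 10:00", "2017-01-01 11:00", "2017-01-01 12:00"]
    ),
})
toy_cart = pd.DataFrame({
    "user_id": ["b2"],
    "cart_time": pd.to_datetime(["2017-01-01 11:05"]),
})

# how='left' keeps every row of toy_visits; rows with no match in
# toy_cart get NaT in cart_time.
toy_visits_cart = pd.merge(toy_visits, toy_cart, on="user_id", how="left")
print(toy_visits_cart)
```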


### Task 3
How long is your merged DataFrame?

In [20]:
print(len(visits_cart))
# The merged DataFrame has the same length as the visits DataFrame.

2000


### Task 4
How many of the timestamps are `null` for the column `cart_time`?

What do these null rows mean?

In [28]:
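# A null cart_time marks a visit by a user with no matching row in cart,
# i.e. a visitor who never added a t-shirt to their cart.
visits_cart_null = visits_cart.cart_time.isnull().sum()
print(visits_cart_null)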

1652
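
One way to make the meaning of the null rows explicit is the `indicator=True` option of `pd.merge`, which adds a `_merge` column saying where each row came from. The sketch below uses invented toy frames; a `left_only` row is a visitor with no entry in the cart table.

```python
import pandas as pd

# Toy frames for illustration only; the ids and time are invented.
toy_visits = pd.DataFrame({"user_id": ["a1", "b2", "c3"]})
toy_cart = pd.DataFrame({
    "user_id": ["b2"],
    "cart_time": pd.to_datetime(["2017-01-01 11:05"]),
})

# indicator=True labels each row 'both' or 'left_only' in a _merge column.
labelled = pd.merge(toy_visits, toy_cart, on="user_id", how="left", indicator=True)

# Rows marked 'left_only' are exactly the rows whose cart_time is null.
never_carted = (labelled["_merge"] == "left_only").sum()
print(labelled)
print(never_carted)
```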


### Task 5
What percent of users who visited Cool T-Shirts Inc. ended up not placing a t-shirt in their cart?

**Note:** In Python 3, the `/` operator always performs true division, so `1652 / 2000` already gives `0.826`. Converting the numerator or the denominator with `float()` is only required in Python 2, where `/` on two integers performs integer division and discards the fractional part.

In [32]:
print(f'\nThe percentage of users who did not place a t-shirt in their cart after visiting the site is {visits_cart_null*100/float(len(visits))}%')


The percentage of users who did not place a t-shirt in their cart after visiting the site is 82.6%
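
As an alternative to counting nulls and dividing by the length, the mean of a boolean mask gives the fraction directly, because `True` counts as 1 and `False` as 0. A minimal sketch on an invented four-row frame (`toy` is a made-up name):

```python
import pandas as pd

# Invented data: one user with a cart_time, three without.
toy = pd.DataFrame({
    "user_id": ["a1", "b2", "c3", "d4"],
    "cart_time": pd.to_datetime(["2017-01-01 11:05", None, None, None]),
})

# Mean of the boolean mask = share of rows where cart_time is null.
share_without_cart = toy.cart_time.isnull().mean()

# The .1% format spec multiplies by 100 and rounds to one decimal place.
print(f"{share_without_cart:.1%}")
```

The format spec also avoids long floating-point tails in the printed percentage.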


### Task 6
Repeat the left merge for `cart` and `checkout` and count `null` values. What percentage of users put items in their cart, but did not proceed to checkout?

In [36]:
cart_checkout = pd.merge(cart, checkout, how='left')
cart_checkout_null = cart_checkout.checkout_time.isnull().sum()
print(f'\nThe percentage of users who did not check out after placing items in their cart is {cart_checkout_null*100/float(len(cart))}%\n')


The percentage of users who did not check out after placing items in their cart is 35.05747126436781%



### Task 7
Merge all four steps of the funnel, in order, using a series of _left merges_. Save the results to the variable `all_data`.

Examine the result using `print` and `head`.

In [39]:
all_data = (visits.merge(cart, how='left')
                  .merge(checkout, how='left')
                  .merge(purchase, how='left'))
print(all_data.head(1))

                                user_id          visit_time cart_time  \
0  943647ef-3682-4750-a2e1-918ba6f16188 2017-04-07 15:14:00       NaT   

  checkout_time purchase_time  
0           NaT           NaT  
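
A left merge keeps every left-hand row, but it can also add rows: if a `user_id` appears more than once in the right-hand table, the matching left row is repeated. The full printout in Task 11 runs to index 2107, more than the 2000 rows of `visits_cart`, which suggests this happens somewhere in the chain, although the source does not say where. The sketch below reproduces the effect with invented data:

```python
import pandas as pd

# Invented data: user a1 has two checkout rows.
toy_cart = pd.DataFrame({"user_id": ["a1", "b2"]})
toy_checkout = pd.DataFrame({
    "user_id": ["a1", "a1"],
    "checkout_time": pd.to_datetime(["2017-01-01 10:00", "2017-01-01 10:02"]),
})

merged = toy_cart.merge(toy_checkout, on="user_id", how="left")

print(len(toy_cart), len(merged))  # the merged frame has more rows than toy_cart
print(merged.user_id.nunique())    # the number of distinct users is unchanged
```

This is why counting distinct `user_id` values, rather than rows, is the safer basis for funnel percentages on the merged data.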


### Task 8
What percentage of users proceeded to checkout, but did not purchase a t-shirt?

In [44]:
# Count distinct users at each step so repeated rows are not double-counted.
checkout_notnull = all_data.loc[all_data.checkout_time.notnull()]
checkout_nunique_id = checkout_notnull.user_id.nunique()
purchase_notnull = all_data.loc[all_data.purchase_time.notnull()]
purchase_nunique_id = purchase_notnull.user_id.nunique()
print(f'\nThe percentage of users who proceeded to checkout but did not purchase a t-shirt is {(checkout_nunique_id-purchase_nunique_id)*100/float(checkout_nunique_id)}%')


The percentage of users who proceeded to checkout but did not purchase a t-shirt is 36.283185840707965%
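
The same distinct-user counting can be applied to every step at once. The sketch below is one possible way to do it, run on an invented four-user frame (`toy_all`) rather than the real `all_data`; the column names match the exercise, everything else is made up.

```python
import pandas as pd

# Invented funnel: a1 purchases, b2 stops at checkout,
# c3 stops at the cart, d4 only visits.
toy_all = pd.DataFrame({
    "user_id": ["a1", "b2", "c3", "d4"],
    "visit_time": pd.to_datetime(["2017-01-01 10:00"] * 4),
    "cart_time": pd.to_datetime(
        ["2017-01-01 10:05", "2017-01-01 10:06", "2017-01-01 10:07", None]
    ),
    "checkout_time": pd.to_datetime(
        ["2017-01-01 10:10", "2017-01-01 10:11", None, None]
    ),
    "purchase_time": pd.to_datetime(["2017-01-01 10:15", None, None, None]),
})

steps = ["visit_time", "cart_time", "checkout_time", "purchase_time"]

# Distinct users with a non-null timestamp at each step.
users_per_step = [
    toy_all.loc[toy_all[col].notnull(), "user_id"].nunique() for col in steps
]

# Compare each step with the next one to get the drop-off rate.
for prev_col, prev_n, next_n in zip(steps, users_per_step, users_per_step[1:]):
    drop = (prev_n - next_n) / prev_n
    print(f"after {prev_col}: {drop:.1%} did not continue")
```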


### Task 9
Which step of the funnel is weakest (i.e., has the highest percentage of users not completing it)?

How might Cool T-Shirts Inc. change their website to fix this problem?

_The first step, from visiting the site to adding a t-shirt to the cart, is the weakest: 82.6% of visitors never add an item, compared with about 35.1% dropping off between cart and checkout and about 36.3% between checkout and purchase. Cool T-Shirts Inc. could address this by refreshing the site design, making the cart easy to find, keeping the layout and colours visually engaging, and analysing which items sell best so that those are the first products visitors see._

## Average Time to Purchase

### Task 10
Using the giant merged DataFrame `all_data` that you created, let’s calculate the average time from initial visit to final purchase. Add a column that is the difference between `purchase_time` and `visit_time`.

In [50]:
all_data['visit_to_purchase_time'] = all_data.purchase_time - all_data.visit_time

### Task 11
Examine the results by printing the new column to the screen.

In [53]:
print(all_data)

                                   user_id          visit_time  \
0     943647ef-3682-4750-a2e1-918ba6f16188 2017-04-07 15:14:00   
1     0c3a3dd0-fb64-4eac-bf84-ba069ce409f2 2017-01-26 14:24:00   
2     6e0b2d60-4027-4d9a-babd-0e7d40859fb1 2017-08-20 08:23:00   
3     6879527e-c5a6-4d14-b2da-50b85212b0ab 2017-11-04 18:15:00   
4     a84327ff-5daa-4ba1-b789-d5b4caf81e96 2017-02-27 11:25:00   
...                                    ...                 ...   
2103  33913ac2-03da-45ae-8fc3-fea39df827c6 2017-03-25 03:29:00   
2104  4f850132-b99d-4623-80e6-6e61d003577e 2017-01-08 09:57:00   
2105  f0830b9b-1f5c-4e74-b63d-3f847cc6ce70 2017-09-07 12:56:00   
2106  b01bffa7-63ba-4cd3-9d93-eb1477c23831 2017-07-20 04:37:00   
2107  0336ca81-8d68-443f-9248-ac0b8ad147d5 2017-11-15 10:11:00   

               cart_time       checkout_time       purchase_time  \
0                    NaT                 NaT                 NaT   
1    2017-01-26 14:44:00 2017-01-26 14:54:00 2017-01-26 15:08:00   
2  
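
The task asks for the new column specifically, and printing the whole DataFrame makes it hard to find. A minimal sketch of selecting just that column, using a two-row toy frame (`toy_all` is a made-up name; the first row reuses the times shown for row 1 above):

```python
import pandas as pd

toy_all = pd.DataFrame({
    "visit_time": pd.to_datetime(["2017-01-26 14:24", "2017-04-07 15:14"]),
    "purchase_time": pd.to_datetime(["2017-01-26 15:08", None]),
})
toy_all["visit_to_purchase_time"] = toy_all.purchase_time - toy_all.visit_time

# Print only the new column; users who never purchased show NaT.
print(toy_all.visit_to_purchase_time)

# dropna() hides the NaT rows when only completed purchases matter.
print(toy_all.visit_to_purchase_time.dropna())
```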

### Task 12
Calculate the average time to purchase by applying the `.mean()` function to your new column.

In [56]:
print(f'Average time to make a purchase is {all_data.visit_to_purchase_time.mean()}')

Average time to make a purchase is 0 days 00:43:12.380952380
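
`.mean()` on a timedelta column skips `NaT` by default, so visitors who never purchased do not affect the average. The sketch below shows this on three invented durations and converts the result to minutes, which is often easier to report than the raw `Timedelta` string.

```python
import pandas as pd

# Invented durations: two completed purchases and one missing value.
durations = pd.Series(
    [pd.Timedelta(minutes=40), pd.Timedelta(minutes=50), pd.NaT]
)

mean_duration = durations.mean()  # NaT is ignored by default (skipna=True)
mean_minutes = mean_duration.total_seconds() / 60

print(mean_duration)
print(mean_minutes)
```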
