### Step 0
Import pandas and load the four CSV files.

In [1]:
import pandas as pd

In [2]:
visits = pd.read_csv('visits.csv', parse_dates=[1])
cart = pd.read_csv('cart.csv', parse_dates=[1])
checkout = pd.read_csv('checkout.csv', parse_dates=[1])
purchase = pd.read_csv('purchase.csv', parse_dates=[1])
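`parse_dates=[1]` asks `read_csv` to parse the column at position 1 (the timestamp column in each file) as datetimes, which is what makes the subtraction in Step 10 possible. The sketch below checks this on a made-up in-memory CSV, with `io.StringIO` standing in for `visits.csv`; `visits_sample` is a hypothetical name. Passing the column name instead (for example `parse_dates=['visit_time']`) should behave the same and is less fragile if the column order ever changes.

```python
import io

import pandas as pd

# Made-up sample in the same shape as visits.csv (user_id, visit_time)
sample_csv = io.StringIO(
    "user_id,visit_time\n"
    "u1,2017-04-07 15:14:00\n"
    "u2,2017-01-26 14:24:00\n"
)

# parse_dates=[1] converts the column at position 1 to a datetime64 dtype
visits_sample = pd.read_csv(sample_csv, parse_dates=[1])
print(visits_sample.dtypes)
```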

### Step 1
Inspect the DataFrames using `print` and `head`:

- `visits` lists all of the users who have visited the website
- `cart` lists all of the users who have added a t-shirt to their cart
- `checkout` lists all of the users who have started the checkout
- `purchase` lists all of the users who have purchased a t-shirt

In [3]:
print(visits.head(5))
print(cart.head(5))
print(checkout.head(5))
print(purchase.head(5))

                                user_id          visit_time
0  943647ef-3682-4750-a2e1-918ba6f16188 2017-04-07 15:14:00
1  0c3a3dd0-fb64-4eac-bf84-ba069ce409f2 2017-01-26 14:24:00
2  6e0b2d60-4027-4d9a-babd-0e7d40859fb1 2017-08-20 08:23:00
3  6879527e-c5a6-4d14-b2da-50b85212b0ab 2017-11-04 18:15:00
4  a84327ff-5daa-4ba1-b789-d5b4caf81e96 2017-02-27 11:25:00
                                user_id           cart_time
0  2be90e7c-9cca-44e0-bcc5-124b945ff168 2017-11-07 20:45:00
1  4397f73f-1da3-4ab3-91af-762792e25973 2017-05-27 01:35:00
2  a9db3d4b-0a0a-4398-a55a-ebb2c7adf663 2017-03-04 10:38:00
3  b594862a-36c5-47d5-b818-6e9512b939b3 2017-09-27 08:22:00
4  a68a16e2-94f0-4ce8-8ce3-784af0bbb974 2017-07-26 15:48:00
                                user_id       checkout_time
0  d33bdc47-4afa-45bc-b4e4-dbe948e34c0d 2017-06-25 09:29:00
1  4ac186f0-9954-4fea-8a27-c081e428e34e 2017-04-07 20:11:00
2  3c9c78a7-124a-4b77-8d2e-e1926e011e7d 2017-07-13 11:38:00
3  89fe330a-8966-4756-8f7c-3bdbcd47279a 

### Step 2
Combine `visits` and `cart` using a *left merge*.

In [33]:
visits_cart = visits.merge(cart, how='left', on='user_id')
visits_cart.head()

|   | user_id | visit_time | cart_time |
|---|---|---|---|
| 0 | 943647ef-3682-4750-a2e1-918ba6f16188 | 2017-04-07 15:14:00 | NaT |
| 1 | 0c3a3dd0-fb64-4eac-bf84-ba069ce409f2 | 2017-01-26 14:24:00 | 2017-01-26 14:44:00 |
| 2 | 6e0b2d60-4027-4d9a-babd-0e7d40859fb1 | 2017-08-20 08:23:00 | 2017-08-20 08:31:00 |
| 3 | 6879527e-c5a6-4d14-b2da-50b85212b0ab | 2017-11-04 18:15:00 | NaT |
| 4 | a84327ff-5daa-4ba1-b789-d5b4caf81e96 | 2017-02-27 11:25:00 | NaT |
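A left merge keeps every visit and fills `cart_time` with `NaT` when no cart row matches. To see which rows matched without inspecting the timestamps, `merge` accepts `indicator=True`, which adds a `_merge` column. The sketch below uses small made-up frames rather than the real CSV files.

```python
import pandas as pd

# Made-up funnel data: three visitors, only "b" added a t-shirt to the cart
visits = pd.DataFrame({
    "user_id": ["a", "b", "c"],
    "visit_time": pd.to_datetime(["2017-01-01 10:00", "2017-01-02 11:00", "2017-01-03 12:00"]),
})
cart = pd.DataFrame({
    "user_id": ["b"],
    "cart_time": pd.to_datetime(["2017-01-02 11:10"]),
})

# indicator=True adds a "_merge" column:
# "both" = user found in cart, "left_only" = visit with no cart entry
visits_cart = visits.merge(cart, how="left", on="user_id", indicator=True)
print(visits_cart[["user_id", "cart_time", "_merge"]])
```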


### Step 3
How long is your merged DataFrame?

In [34]:
total_visits = len(visits_cart)
print("The merged DataFrame 'visits_cart' contains {} records.".format(total_visits))
print('Because visits (2000 rows) was left-merged with cart (348 rows), most rows with a "visit_time" have no matching "cart_time".')

The merged DataFrame 'visits_cart' contains 2000 records.
Because visits (2000 rows) was left-merged with cart (348 rows), most rows with a "visit_time" have no matching "cart_time".
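The merged frame has exactly as many rows as `visits`, but that holds only while each `user_id` appears at most once in the right-hand table: a left merge emits one row per matching pair, so duplicated keys add rows. `all_data` in Step 7 has 2,108 rows against 2,000 visits, which suggests duplicates somewhere further down the funnel. A sketch on made-up data, also showing `validate="many_to_one"` as one way to catch this early:

```python
import pandas as pd

visits = pd.DataFrame({"user_id": ["a", "b", "c"]})
# Made-up right-hand table in which user "b" appears twice
cart = pd.DataFrame({
    "user_id": ["b", "b"],
    "cart_time": pd.to_datetime(["2017-01-02 11:10", "2017-01-02 11:20"]),
})

merged = visits.merge(cart, how="left", on="user_id")
print(len(visits), len(merged))  # 3 4: the duplicated key adds a row

# validate="many_to_one" asks pandas to check that the right-hand keys are unique
raised = False
try:
    visits.merge(cart, how="left", on="user_id", validate="many_to_one")
except pd.errors.MergeError as err:
    raised = True
    print("MergeError:", err)
```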


### Step 4
How many of the timestamps are `null` for the column `cart_time`?

What do these null rows mean?

In [35]:
visits_cart['visited_cart'] = visits_cart['cart_time'].notna()
cart_visit_count = visits_cart.groupby('visited_cart').user_id.count().reset_index()
print(cart_visit_count)
null_carts = visits_cart['cart_time'].isna().sum()
print("{} of {} visits have a null cart_time: those users visited the site but never added a t-shirt to their cart.".format(null_carts, len(visits_cart)))

   visited_cart  user_id
0         False     1652
1          True      348
1652 of 2000 visits have a null cart_time: those users visited the site but never added a t-shirt to their cart.
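Step 6 asks for `null` counts again after the next merge. Rather than grouping column by column, `DataFrame.isna().sum()` counts missing values in every column at once. A sketch on a made-up frame:

```python
import pandas as pd

# Made-up merged funnel; NaT marks a stage the user never reached
funnel = pd.DataFrame({
    "user_id": ["a", "b", "c", "d"],
    "cart_time": pd.to_datetime([None, "2017-01-02 11:10", "2017-01-03 09:00", None]),
    "checkout_time": pd.to_datetime([None, "2017-01-02 11:20", None, None]),
})

# isna().sum() counts nulls in every column at once
null_counts = funnel.isna().sum()
print(null_counts)
```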


### Step 5
What percent of users who visited Cool T-Shirts Inc. ended up not placing a t-shirt in their cart?

**Note:** In Python 2, dividing one integer by another with `/` performs integer division, which truncates the decimal part, so you would convert either the numerator or the denominator with `float()`. In Python 3, `/` always returns a float, so no conversion is needed.

In [7]:
cart_visit_count['percent'] = cart_visit_count['user_id'] / len(visits_cart) * 100
print(cart_visit_count)
no_cart_pct = cart_visit_count.loc[~cart_visit_count['visited_cart'], 'percent'].iloc[0]
print("{:.1f}% of users who visited Cool T-Shirts Inc. did not place a t-shirt in their cart.".format(no_cart_pct))

   visited_cart  user_id  percent
0         False     1652     82.6
1          True      348     17.4
82.6% of users who visited Cool T-Shirts Inc. did not place a t-shirt in their cart.
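Since `/` already returns a float in Python 3, the percentages above need no `float()` conversion. `value_counts(normalize=True)` gives the same shares directly from a boolean column; the frame below is made up for illustration.

```python
import pandas as pd

# Made-up data: one of four visitors added a t-shirt to the cart
visits_cart = pd.DataFrame({
    "user_id": ["a", "b", "c", "d"],
    "cart_time": pd.to_datetime([None, "2017-01-02 11:10", None, None]),
})

# normalize=True returns proportions instead of counts; scale to percent
shares = visits_cart["cart_time"].notna().value_counts(normalize=True) * 100
print(shares)  # False -> 75.0, True -> 25.0
```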


### Step 6
Repeat the left merge for `cart` and `checkout` and count `null` values. What percentage of users put items in their cart, but did not proceed to checkout?

In [38]:
visits_cart_checkout = visits_cart.merge(checkout, how='left', on='user_id')
visits_cart_checkout['checked_out'] = visits_cart_checkout['checkout_time'].notna()
visits_cart_checkout.head()

|   | user_id | visit_time | cart_time | visited_cart | checkout_time | checked_out |
|---|---|---|---|---|---|---|
| 0 | 943647ef-3682-4750-a2e1-918ba6f16188 | 2017-04-07 15:14:00 | NaT | False | NaT | False |
| 1 | 0c3a3dd0-fb64-4eac-bf84-ba069ce409f2 | 2017-01-26 14:24:00 | 2017-01-26 14:44:00 | True | 2017-01-26 14:54:00 | True |
| 2 | 6e0b2d60-4027-4d9a-babd-0e7d40859fb1 | 2017-08-20 08:23:00 | 2017-08-20 08:31:00 | True | NaT | False |
| 3 | 6879527e-c5a6-4d14-b2da-50b85212b0ab | 2017-11-04 18:15:00 | NaT | False | NaT | False |
| 4 | a84327ff-5daa-4ba1-b789-d5b4caf81e96 | 2017-02-27 11:25:00 | NaT | False | NaT | False |


In [39]:
in_cart = visits_cart_checkout[visits_cart_checkout['visited_cart']]
cart_checkout_count = in_cart.groupby(['visited_cart', 'checked_out']).user_id.count().reset_index()
cart_checkout_count['percent'] = cart_checkout_count['user_id'] / len(in_cart) * 100
print(cart_checkout_count)
no_checkout_pct = cart_checkout_count.loc[~cart_checkout_count['checked_out'], 'percent'].iloc[0]
print("{:.1f}% of users put a t-shirt in their cart but did not proceed to checkout.".format(no_checkout_pct))

   visited_cart  checked_out  user_id    percent
0          True        False      122  35.057471
1          True         True      226  64.942529
35.1% of users put a t-shirt in their cart but did not proceed to checkout.
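The same drop-off can be computed without `groupby`: filter to the users who reached the cart, then take the mean of a boolean mask, since `True` counts as 1 and `False` as 0. A sketch on made-up data:

```python
import pandas as pd

# Made-up data: three users reached the cart, one of them checked out
df = pd.DataFrame({
    "user_id": ["a", "b", "c", "d"],
    "cart_time": pd.to_datetime([None, "2017-01-02 11:10", "2017-01-03 09:00", "2017-01-04 08:00"]),
    "checkout_time": pd.to_datetime([None, "2017-01-02 11:20", None, None]),
})

# Keep only users who reached the cart, then average the "no checkout" mask
in_cart = df[df["cart_time"].notna()]
pct_cart_no_checkout = in_cart["checkout_time"].isna().mean() * 100
print("{:.1f}%".format(pct_cart_no_checkout))  # 66.7%
```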


### Step 7
Merge all four steps of the funnel, in order, using a series of *left merges*. Save the results to the variable `all_data`.

Examine the result using `print` and `head`.

In [48]:
all_data = visits_cart_checkout.merge(purchase, how='left', on='user_id')
all_data['visited_cart'] = all_data['cart_time'].notna()
all_data['checked_out'] = all_data['checkout_time'].notna()
all_data['purchased'] = all_data['purchase_time'].notna()
all_data

|   | user_id | visit_time | cart_time | visited_cart | checkout_time | checked_out | purchase_time | purchased |
|---|---|---|---|---|---|---|---|---|
| 0 | 943647ef-3682-4750-a2e1-918ba6f16188 | 2017-04-07 15:14:00 | NaT | False | NaT | False | NaT | False |
| 1 | 0c3a3dd0-fb64-4eac-bf84-ba069ce409f2 | 2017-01-26 14:24:00 | 2017-01-26 14:44:00 | True | 2017-01-26 14:54:00 | True | 2017-01-26 15:08:00 | True |
| 2 | 6e0b2d60-4027-4d9a-babd-0e7d40859fb1 | 2017-08-20 08:23:00 | 2017-08-20 08:31:00 | True | NaT | False | NaT | False |
| 3 | 6879527e-c5a6-4d14-b2da-50b85212b0ab | 2017-11-04 18:15:00 | NaT | False | NaT | False | NaT | False |
| 4 | a84327ff-5daa-4ba1-b789-d5b4caf81e96 | 2017-02-27 11:25:00 | NaT | False | NaT | False | NaT | False |
| ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 2103 | 33913ac2-03da-45ae-8fc3-fea39df827c6 | 2017-03-25 03:29:00 | NaT | False | NaT | False | NaT | False |
| 2104 | 4f850132-b99d-4623-80e6-6e61d003577e | 2017-01-08 09:57:00 | NaT | False | NaT | False | NaT | False |
| 2105 | f0830b9b-1f5c-4e74-b63d-3f847cc6ce70 | 2017-09-07 12:56:00 | NaT | False | NaT | False | NaT | False |
| 2106 | b01bffa7-63ba-4cd3-9d93-eb1477c23831 | 2017-07-20 04:37:00 | NaT | False | NaT | False | NaT | False |
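With more stages, the chain of left merges can be written as a fold with `functools.reduce`. This sketch uses tiny made-up tables and passes `on="user_id"` explicitly; the original cells let pandas join on all shared column names, which gives the same result as long as `user_id` is the only shared column.

```python
from functools import reduce

import pandas as pd

# Made-up stage tables, one timestamp column each
visits = pd.DataFrame({"user_id": ["a", "b", "c"],
                       "visit_time": pd.to_datetime(["2017-01-01 10:00"] * 3)})
cart = pd.DataFrame({"user_id": ["b", "c"],
                     "cart_time": pd.to_datetime(["2017-01-01 10:05"] * 2)})
checkout = pd.DataFrame({"user_id": ["b"],
                         "checkout_time": pd.to_datetime(["2017-01-01 10:10"])})
purchase = pd.DataFrame({"user_id": ["b"],
                         "purchase_time": pd.to_datetime(["2017-01-01 10:20"])})

# Left-merge the stages in funnel order, always keyed on user_id
stages = [visits, cart, checkout, purchase]
all_data = reduce(lambda left, right: left.merge(right, how="left", on="user_id"), stages)
print(all_data)
```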


### Step 8
What percentage of users proceeded to checkout, but did not purchase a t-shirt?

In [50]:
at_checkout = all_data[all_data['checked_out']]
checkout_purchase_count = at_checkout.groupby(['checked_out', 'purchased']).user_id.nunique().reset_index()
checkout_purchase_count['percent'] = checkout_purchase_count['user_id'] / checkout_purchase_count['user_id'].sum() * 100
print(checkout_purchase_count)
no_purchase_pct = checkout_purchase_count.loc[~checkout_purchase_count['purchased'], 'percent'].iloc[0]
print("{:.1f}% of users proceeded to checkout but did not purchase a t-shirt.".format(no_purchase_pct))

   checked_out  purchased  user_id    percent
0         True      False       82  36.283186
1         True       True      144  63.716814
36.3% of users proceeded to checkout but did not purchase a t-shirt.
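Step 6 counts rows while this step counts unique `user_id`s. Because the merges can duplicate rows (`all_data` has 2,108 rows), counting unique users is the safer choice at every stage. A sketch on made-up data with one duplicated checkout row, where counting rows would report 33.3% instead of 50% of checkouts ending without a purchase:

```python
import pandas as pd

# Made-up data with a duplicated checkout row for user "b"
all_data = pd.DataFrame({
    "user_id": ["a", "b", "b", "c"],
    "checkout_time": pd.to_datetime(["2017-01-01 10:10", "2017-01-02 09:00", "2017-01-02 09:05", None]),
    "purchase_time": pd.to_datetime([None, "2017-01-02 09:30", "2017-01-02 09:30", None]),
})

at_checkout = all_data[all_data["checkout_time"].notna()]
n_checkout_users = at_checkout["user_id"].nunique()  # a, b
n_purchase_users = at_checkout.loc[at_checkout["purchase_time"].notna(), "user_id"].nunique()  # b
pct_no_purchase = (n_checkout_users - n_purchase_users) / n_checkout_users * 100
print("{:.1f}% of users who checked out did not purchase".format(pct_no_purchase))  # 50.0%
```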


### Step 9
Which step of the funnel is weakest (i.e., has the highest percentage of users not completing it)?

How might Cool T-Shirts Inc. change their website to fix this problem?

In [51]:
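print('The weakest part of the sales funnel is converting visits into cart additions: 82.6% of visitors never add a t-shirt to their cart, versus 35.1% drop-off from cart to checkout and 36.3% from checkout to purchase. To improve this step, try making the "Add to cart" button more prominent, or reducing the number of options to select on each item.')

The weakest part of the sales funnel is converting visits into cart additions: 82.6% of visitors never add a t-shirt to their cart, versus 35.1% drop-off from cart to checkout and 36.3% from checkout to purchase. To improve this step, try making the "Add to cart" button more prominent, or reducing the number of options to select on each item.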
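The comparison above can also be done programmatically: count the unique users who reached each stage and compute the share lost at each transition. A sketch on made-up data; `stages`, `reached`, and `drop_off` are illustrative names, and the timestamp columns follow `all_data`.

```python
import pandas as pd

# Made-up merged data; NaT means the user never reached that stage
all_data = pd.DataFrame({
    "user_id": ["a", "b", "c", "d", "e"],
    "visit_time": pd.to_datetime(["2017-01-01"] * 5),
    "cart_time": pd.to_datetime([None, "2017-01-01", "2017-01-01", None, None]),
    "checkout_time": pd.to_datetime([None, "2017-01-01", "2017-01-01", None, None]),
    "purchase_time": pd.to_datetime([None, "2017-01-01", None, None, None]),
})

stages = ["visit_time", "cart_time", "checkout_time", "purchase_time"]
# Number of unique users with a timestamp at each stage
reached = pd.Series({col: all_data.loc[all_data[col].notna(), "user_id"].nunique() for col in stages})
# Percentage of users lost between each stage and the one before it
drop_off = ((1 - reached / reached.shift(1)) * 100).dropna()
print(drop_off)
print("Weakest step: reaching", drop_off.idxmax())
```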


### Step 10
Using the giant merged DataFrame `all_data` that you created, let’s calculate the average time from initial visit to final purchase. Add a column that is the difference between `purchase_time` and `visit_time`.

In [53]:
all_data['time_diff'] = all_data.purchase_time - all_data.visit_time
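Subtracting two `datetime64` columns yields a `timedelta64` column, and rows with no purchase get `NaT`. If a plain number is more convenient (for plotting or binning), `.dt.total_seconds()` converts it; the sketch below uses two made-up rows, the first matching the 44-minute example in Step 11. `minutes_to_purchase` is a hypothetical column name.

```python
import pandas as pd

all_data = pd.DataFrame({
    "visit_time": pd.to_datetime(["2017-01-26 14:24", "2017-02-27 11:25"]),
    "purchase_time": pd.to_datetime(["2017-01-26 15:08", None]),
})

# datetime64 - datetime64 -> timedelta64; NaT propagates for users who never bought
all_data["time_diff"] = all_data["purchase_time"] - all_data["visit_time"]
# .dt.total_seconds() returns floats (NaN where time_diff is NaT)
all_data["minutes_to_purchase"] = all_data["time_diff"].dt.total_seconds() / 60
print(all_data)
```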

### Step 11
Examine the results by printing the new column to the screen.

In [55]:
print(all_data.time_diff)

0                  NaT
1      0 days 00:44:00
2                  NaT
3                  NaT
4                  NaT
             ...      
2103               NaT
2104               NaT
2105               NaT
2106               NaT
2107               NaT
Name: time_diff, Length: 2108, dtype: timedelta64[ns]


### Step 12
Calculate the average time to purchase by applying the `.mean()` method to your new column.

In [56]:
print(all_data.time_diff.mean())

0 days 00:43:12.380952380
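`.mean()` skips `NaT` by default (`skipna=True`), so the average covers only rows with a purchase. If a few buyers take much longer than the rest, the median may describe a typical user better than the mean; also note that duplicated rows in `all_data` would count some users more than once. A sketch on made-up durations:

```python
import pandas as pd

# Made-up purchase durations; NaT = no purchase
time_diff = pd.Series([
    pd.Timedelta(minutes=10),
    pd.Timedelta(minutes=20),
    pd.Timedelta(minutes=90),
    pd.NaT,
])

print(time_diff.mean())    # NaT skipped: (10 + 20 + 90) / 3 minutes
print(time_diff.median())  # middle of the three non-null values
print(time_diff.count(), len(time_diff))  # non-null rows vs all rows
```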
