## Superstore Challenge

The superstore dataset is a collection of ~10,000 anonymized transactions from an e-commerce store that occurred between the years 2014 and 2017.  

Your task here is to make a 'meta' dataset that aggregates information about what individual customers have done over their lifetime of interactions with the store.  

When you're done you'll have a list of each unique customer in the store (going by their name), with information like their lifetime customer value, number of orders, and their ordering behavior measured over different lags of time.  

This will be a helpful exercise to reinforce some of the concepts discussed in class 6, like grouping and date offsets.

In [2]:
import pandas as pd
import numpy as np
%matplotlib inline
import matplotlib.pyplot as plt

In [10]:
df1 = pd.read_excel('/Users/devonbancroft/Desktop/Devon-GA-DAT-10-14/data/superstore.xls')

In [11]:
df1.head(1)

Unnamed: 0,Row ID,Order ID,Order Date,Ship Date,Ship Mode,Customer ID,Customer Name,Segment,Country,City,...,Postal Code,Region,Product ID,Category,Sub-Category,Product Name,Sales,Quantity,Discount,Profit
0,1,CA-2016-152156,2016-11-08,2016-11-11,Second Class,CG-12520,Claire Gute,Consumer,United States,Henderson,...,42420,South,FUR-BO-10001798,Furniture,Bookcases,Bush Somerset Collection Bookcase,261.96,2,0.0,41.9136


**Column 1:** Create a column that lists every customer's lifetime customer value.

In [12]:
df = df1.groupby('Customer Name').Sales.sum().to_frame().rename({'Sales':'LTV'}, axis=1)

**Column 2:** Create a column that lists the length of time each customer has been with the store.  This is defined as the number of days between when they made their first purchase and today.

In [17]:
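# First purchase date for each customer
earliest_time = df1.groupby('Customer Name')['Order Date'].min()
# Whole days elapsed between the first purchase and today
length_of_time = (pd.Timestamp.now().normalize() - earliest_time).dt.days

# Attach the result to the customer-level frame as 'Experience'
df = df.merge(length_of_time.to_frame().rename({'Order Date':'Experience'}, axis=1), on='Customer Name', how='left')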

**Column 3:** Create a column that lists the total number of purchases for each customer.
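
One possible approach, shown on a small made-up table rather than the real file: count distinct `Order ID` values per customer. This assumes a "purchase" means one order, which may span several rows; the `toy` frame and the `Num Orders` name are illustrative only.

```python
import pandas as pd

# Hypothetical toy data standing in for the superstore transactions
toy = pd.DataFrame({
    'Customer Name': ['Ann', 'Ann', 'Ann', 'Bob'],
    'Order ID': ['A-1', 'A-1', 'A-2', 'B-1'],
})

# Count distinct orders per customer (one order can cover several rows)
n_orders = (toy.groupby('Customer Name')['Order ID']
               .nunique()
               .rename('Num Orders'))
```

If a purchase should instead mean one line item, `.size()` on the same grouping would count rows.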

**Column 4:** Create a column that assigns customers to a cohort.  

A customer's cohort is determined by when they made their first purchase, and every year in your dataset has two cohorts:  the first half of the year and the second half.  

For example, if someone made their first purchase in March of 2017, their cohort would be 2017-1, or something similar.  Someone who purchased in September of 2017 would be 2017-2, and so on.
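
A minimal sketch of the cohort rule, assuming months 1 through 6 count as the first half of the year. The `toy` frame and the `Cohort` name are made up for illustration.

```python
import pandas as pd

# Hypothetical toy data: Ann first buys in March 2017, Bob in September 2016
toy = pd.DataFrame({
    'Customer Name': ['Ann', 'Ann', 'Bob'],
    'Order Date': pd.to_datetime(['2017-03-15', '2017-09-01', '2016-09-20']),
})

# First purchase date per customer
first = toy.groupby('Customer Name')['Order Date'].min()

# 1 for January-June, 2 for July-December
half = (first.dt.month > 6).astype(int) + 1

# Combine year and half into a label such as '2017-1'
cohort = (first.dt.year.astype(str) + '-' + half.astype(str)).rename('Cohort')
```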

**Column 5:** Create a column that lists whether or not they're a repeat customer.  This means they've made more than one order.
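
A short sketch of one way to flag repeat customers, again on invented data: compare the number of distinct orders to 1. The `Repeat Customer` name is an assumption.

```python
import pandas as pd

# Hypothetical toy data standing in for the superstore transactions
toy = pd.DataFrame({
    'Customer Name': ['Ann', 'Ann', 'Ann', 'Bob'],
    'Order ID': ['A-1', 'A-1', 'A-2', 'B-1'],
})

# Distinct orders per customer
n_orders = toy.groupby('Customer Name')['Order ID'].nunique()

# True when the customer has placed more than one order
repeat = (n_orders > 1).rename('Repeat Customer')
```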

**Column 6:** We want to find out what type of customer each person is.  To do this, we want to find which value of 'Segment' occurred most frequently for every single customer, i.e., the modal value of the 'Segment' column for each customer.
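
One way this might be done is to aggregate each customer's rows with `Series.mode`. Since `mode` can return several values when there is a tie, this sketch takes the first one, which is an arbitrary tie-breaking choice. The data is made up.

```python
import pandas as pd

# Hypothetical toy data standing in for the superstore transactions
toy = pd.DataFrame({
    'Customer Name': ['Ann', 'Ann', 'Ann', 'Bob'],
    'Segment': ['Consumer', 'Corporate', 'Consumer', 'Home Office'],
})

# Most frequent segment per customer; ties resolve to the first sorted value
top_segment = (toy.groupby('Customer Name')['Segment']
                  .agg(lambda s: s.mode().iloc[0]))
```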

**Column 7:** Create a column that is the date *3 months after they made their first purchase*.

**Column 8:** Make a column that represents one year after they made their first purchase.  Add this to your initial dataframe as well.
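
Both of the date columns above can be sketched with `pd.DateOffset`, which shifts by calendar months or years rather than a fixed number of days. The toy data and the column names below are illustrative assumptions.

```python
import pandas as pd

# Hypothetical toy data standing in for the superstore transactions
toy = pd.DataFrame({
    'Customer Name': ['Ann', 'Ann', 'Bob'],
    'Order Date': pd.to_datetime(['2017-03-15', '2017-09-01', '2016-09-20']),
})

# First purchase date per customer, as a one-column frame
dates = (toy.groupby('Customer Name')['Order Date']
            .min()
            .to_frame('First Purchase'))

# Calendar-aware shifts: 3 months and 1 year after the first purchase
dates['Plus 3 Months'] = dates['First Purchase'] + pd.DateOffset(months=3)
dates['Plus 1 Year'] = dates['First Purchase'] + pd.DateOffset(years=1)
```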

**Column 9:** Make a column that determines whether or not a customer made a second purchase within 90 days of their first purchase.
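
A sketch of one possible approach: reduce the data to one row per order, rank each customer's orders by date, and compare the first and second order dates. It assumes a "second purchase" is a second distinct `Order ID`; the data and the `Order Rank` helper column are made up.

```python
import pandas as pd

# Hypothetical toy data: Ann reorders after 41 days, Bob after 214, Cy never
toy = pd.DataFrame({
    'Customer Name': ['Ann', 'Ann', 'Ann', 'Bob', 'Bob', 'Cy'],
    'Order ID': ['A-1', 'A-1', 'A-2', 'B-1', 'B-2', 'C-1'],
    'Order Date': pd.to_datetime(['2017-01-10', '2017-01-10', '2017-02-20',
                                  '2016-05-01', '2016-12-01', '2015-07-04']),
})

# One row per order, sorted so each customer's orders are in date order
orders = (toy.drop_duplicates(['Customer Name', 'Order ID'])
             .sort_values(['Customer Name', 'Order Date']))

# Helper column: 0 for a customer's first order, 1 for the second, ...
orders['Order Rank'] = orders.groupby('Customer Name').cumcount()

first = orders.loc[orders['Order Rank'] == 0].set_index('Customer Name')['Order Date']
second = orders.loc[orders['Order Rank'] == 1].set_index('Customer Name')['Order Date']

# Customers with a single order get NaT, which compares as False below
second = second.reindex(first.index)
within_90 = (second - first) <= pd.Timedelta(days=90)
```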

**Column 10:** Make a column that counts how many items each customer purchased within their first year.

**Column 11:** Make a column that sums up the total value of sales for each customer within one year of their first purchase.
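
The two first-year columns can be sketched together using a helper column, in the spirit of the hint below. This assumes "items" means the sum of `Quantity` and that the first year runs up to, but not including, the one-year anniversary. The data and output names are invented.

```python
import pandas as pd

# Hypothetical toy data standing in for the superstore transactions
toy = pd.DataFrame({
    'Customer Name': ['Ann', 'Ann', 'Ann', 'Bob'],
    'Order Date': pd.to_datetime(['2015-01-10', '2015-06-01',
                                  '2016-03-01', '2016-05-01']),
    'Quantity': [2, 3, 4, 1],
    'Sales': [20.0, 30.0, 40.0, 10.0],
})

# Helper column: the customer's first purchase date, repeated on every row
toy['First Purchase'] = toy.groupby('Customer Name')['Order Date'].transform('min')

# Keep only rows that fall inside the customer's first year
in_first_year = toy['Order Date'] < toy['First Purchase'] + pd.DateOffset(years=1)

# Sum quantity and sales over those rows
first_year = (toy[in_first_year]
              .groupby('Customer Name')[['Quantity', 'Sales']]
              .sum()
              .rename(columns={'Quantity': 'Items Year 1',
                               'Sales': 'Sales Year 1'}))
```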

**Hint:** For some of these columns, you'll need to break them down into a few steps.  It's okay to make helper columns on your original dataset that make it easier for you to calculate the final result.

### Questions:

Now that you've made these columns, try and answer the following questions.

**What percentage of customers make a second purchase within 3 months after their first one?  How does this differ by customer segment?**
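
Because the mean of a boolean column is the share of `True` values, a percentage like this can be sketched with `mean()` and a `groupby`. The `customers` frame below is a made-up stand-in for the customer-level table, and its column names are assumptions.

```python
import pandas as pd

# Hypothetical customer-level table with a repurchase flag and a segment
customers = pd.DataFrame({
    'Segment': ['Consumer', 'Consumer', 'Corporate', 'Corporate'],
    'Repurchased Within 90 Days': [True, False, True, True],
}, index=pd.Index(['Ann', 'Bob', 'Cy', 'Di'], name='Customer Name'))

# Share of all customers who repurchased, as a percentage
overall_pct = customers['Repurchased Within 90 Days'].mean() * 100

# The same share computed separately for each segment
by_segment_pct = (customers.groupby('Segment')['Repurchased Within 90 Days']
                           .mean() * 100)
```

Grouping by a cohort column instead of `Segment` would give the same rate per cohort.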

**How has the 3-month repurchasing rate been changing across cohorts?**

**What impact does a second order within 3 months of the first purchase have on lifetime customer value?  Does this effect hold true for each customer segment?**
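
A first pass at this comparison might group by the repurchase flag and average `LTV`, then repeat the split within each segment. The table below is invented, and a difference in group means is only descriptive: it does not by itself show that early repurchasing causes higher lifetime value.

```python
import pandas as pd

# Hypothetical customer-level table; values are made up
customers = pd.DataFrame({
    'Segment': ['Consumer', 'Consumer', 'Corporate', 'Corporate'],
    'Repurchased Within 90 Days': [True, False, True, False],
    'LTV': [300.0, 100.0, 500.0, 400.0],
}, index=pd.Index(['Ann', 'Bob', 'Cy', 'Di'], name='Customer Name'))

# Average LTV for customers who did and did not repurchase early
ltv_by_flag = customers.groupby('Repurchased Within 90 Days')['LTV'].mean()

# Same comparison within each segment: rows are segments, columns are the flag
ltv_by_segment = (customers
                  .groupby(['Segment', 'Repurchased Within 90 Days'])['LTV']
                  .mean()
                  .unstack())
```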

**What's the average expected sales value for a customer one year after their first purchase? How has this changed across cohorts?**

**How much does lifetime customer value differ across the different customer segments?**