## Superstore Challenge

The superstore dataset is a collection of ~10,000 anonymized transactions from an e-commerce store that occurred between the years 2014 and 2017.  

Your task here is to make a 'meta' dataset that aggregates information about what individual customers have done over their lifetime of interactions with the store.  

When you're done you'll have a list of each unique customer in the store (going by their name), with information like their lifetime customer value, number of orders, and their ordering behavior measured over different lags of time.  

This will be a helpful exercise to re-inforce some of the concepts discussed from class 6, like grouping and date offsets.

In [284]:
import pandas as pd
import numpy as np

initial=pd.read_excel("/Users/bianca/Documents/GitHub/DAT-10-14/class material/Unit2/data/superstore.xls")

**Column 1:** Create a column that lists every customers lifetime customer value.

In [285]:
# create ltv series with customer id as the index label 
store = initial.groupby(['Customer Name','Customer ID'])['Sales'].sum().to_frame().rename({'Sales':'Lifetime Value'}, axis=1)


**Column 2:** Create a column that lists the length of time each customer has been with the store.  This is defined as the number of days between when they made their first purchase and today.

In [286]:
first_purchase = initial.groupby('Customer Name')['Order Date'].min().to_frame().rename({'Order Date':'First Order Date'},axis=1)

today = np.datetime64('now')

store = store.merge(first_purchase, on='Customer Name', how='left')

initial=initial.merge(first_purchase, on='Customer Name', how='left')

In [287]:
store['Lifetime'] = today - store['First Order Date'] 

**Column 3:** Create a column that lists the total number of purchases for each customer.

In [288]:
total_purchases = initial.groupby('Customer Name')['Order ID'].count().to_frame().rename({'Order ID': 'Total Purchases'}, axis=1)

store = store.merge(total_purchases, on='Customer Name', how='left')


**Column 4:** Create a column that assigns customers to a cohort.  

A customers cohort is determined by when they made their first purchase, and every year in your dataset has two cohorts:  the first half of the year and the second half.  

For example, if someone made their first purchase in March of 2017, their cohort would be 2017-1, or something similar.  Someone who purchased in September of 2017 would be 2017-2, and so on.

In [289]:
store['Purchase Cohort'] = store['First Order Date'].map(lambda x: 100*x.year + x.month)


**Column 5:** Create a column that lists whether or not they're a repeat customer.  This means they've made more than one order.

In [290]:
store['Repeat Customer'] = np.where(store['Total Purchases']>1, True, False)


**Column 6:** We want to find out what type of customer each person is.  To do this, we want to find which value of 'Segment' occurred most frequently for every single customer.  Ie, the modal value of the 'Segment' column for each customer.

In [291]:
segment_type = initial.groupby('Customer Name')['Segment'].agg(pd.Series.mode).to_frame()

store = store.merge(segment_type, on ='Customer Name')

**Column 7:** Create a column that is the date *3 months after they made their first purchase*.

In [292]:
three_mo_offset = pd.DateOffset(months=3)

#what I originally had
#store['Three Months Later'] = store['First Order Date']+three_mo_offset

initial['Three Months Later']= initial['Order Date'].min()+three_mo_offset

**Column 8:** Make a column that represents one year after they made their first purchase.  Add this to your initial dataframe as well.

In [293]:
one_year_offset = pd.DateOffset(months=12)

#store['One Year Later'] = store['First Order Date']+one_year_offset

initial['One Year Later']= initial['Order Date'].min()+one_year_offset

**Column 9:** Make a column that determines whether or not a customer made a second purchase within 90 days of their first purchase.

In [294]:
initial['Purchase Three Month'] = initial['Order Date'].between(initial['First Order Date'], initial['Three Months Later'],inclusive=False)

In [295]:
three_month_repeat = initial.groupby('Customer Name')['Purchase Three Month'].any()

store['Repeat Three Month'] = three_month_repeat

**Column 10:** Make a column that counts how many items each customer made with in their first year of purchase.

In [296]:
initial['First Year Orders']=initial['Order Date'].between(initial['First Order Date'], initial['One Year Later'])

In [297]:
first_year_orders = (initial.groupby(['Customer Name', 'Order ID'])['First Year Orders'].sum()>0).sum(level='Customer Name').to_frame().rename({'First Year Orders':'First Year Purchases'},axis=1)

store = store.merge(first_year_orders, on='Customer Name', how='left')

**Column 11:** Make a column that sums up the total value of sales for each customer one year after they made their purchase.

In [298]:
store['Repeat Three Month'].value_counts()

False    782
True      11
Name: Repeat Three Month, dtype: int64

**Hint:** For some of these columns, you'll need to break them down into a few steps.  It's okay to make  helper columns on your original dataset that make it easier for you to calculate the final result.

### Questions:

Now that you've made these columns, try and answer the following questions.

**What percentage of customers make a second purchase within 3 months after their first one?  How does this differ by customer segment?**

In [299]:
store['Repeat Three Month'].mean()

0.013871374527112233

In [273]:
store.groupby('Segment')['Repeat Three Month'].mean()

Segment
Consumer       0.019560
Corporate      0.008475
Home Office    0.006757
Name: Repeat Three Month, dtype: float64

**How Has the 3-Month Repurchasing Rate Been Changing Across Cohorts?**

**What impact does a second order within 3 months of 1st purchase have on lifetime customer value?  Does this effect hold true for each customer segment?**

**What's the average expected sales value for a customer one year after their first purchase? How has this changed across cohorts?**

**How much does lifetime customer value differ across the different customer segments?**