Let's do some model development.

We're trying to prod people to spend money over the short term. We assume away long-term effects such as retention rates, customer annoyance, long-term habit building, and customer satisfaction.

Business scenarios:

- Assuming no transaction history is built into the model:
    - New customer, no demographic info: what do we offer them?
        - Basically no info at all. Offer the aggregate best, or use a model based solely on customer length.
    - New customer, demographic info: what do we offer them?
        - Use a model based on age, gender, and income, possibly together with customer length.
    - Existing customer, no demographic info: what do we offer them?
        - Use a model based on customer length, possibly binned by year.
    - Existing customer, demographic info: what do we offer them?
        - Use a model based on age, gender, income, and customer length.
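
The four scenarios above amount to choosing a feature set from what is known about the customer. Here is a minimal sketch of that choice. The helper `select_features` and the column name `membership_days` are hypothetical and not part of this notebook. It also ignores the "possibly with customer length" variants for new customers.

```python
# Hypothetical helper: maps the business scenarios to a list of model features.
# age/gender/income are columns of the profile data; "membership_days" is an
# assumed name for customer length (it would be derived from became_member_on).
DEMOGRAPHIC_FEATURES = ["age", "gender", "income"]
TENURE_FEATURE = "membership_days"


def select_features(has_demographics, is_existing_customer):
    """Return the features available for a given customer scenario."""
    features = []
    if has_demographics:
        features.extend(DEMOGRAPHIC_FEATURES)
    if is_existing_customer:
        features.append(TENURE_FEATURE)
    # An empty list means there is nothing to model on, so the fallback
    # is the offer that performs best in aggregate.
    return features


for has_demo in (False, True):
    for existing in (False, True):
        print(has_demo, existing, select_features(has_demo, existing))
```

An empty feature list corresponds to the "offer the aggregate best" case.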
        
So we're looking for a way to pick out which offer to give a customer.

In our data, each customer is exposed to at most 6 offers, with a median of 4 unique offers.


In [27]:
# Imports
import numpy as np
import pandas as pd

# Sklearn
from sklearn.model_selection import train_test_split

# Sometimes use display instead of print
from IPython.display import display

# debugging
from IPython.core.debugger import set_trace

In [3]:
# Read the cleaned data
portfolio = pd.read_csv('./data/portfolio_clean.csv')
profile = pd.read_csv('./data/profile_clean.csv')
transcript = pd.read_csv('./data/transcript_clean.csv')

In [4]:
display(portfolio.head())
display(profile.head())
display(transcript.head())

Unnamed: 0,offer_id,web,email,mobile,social,offer_type,duration,difficulty,reward
0,1,0,1,1,1,bogo,7,10,10
1,2,1,1,1,1,bogo,5,10,10
2,3,1,1,1,0,informational,4,0,0
3,4,1,1,1,0,bogo,7,5,5
4,5,1,1,0,0,discount,10,20,5


Unnamed: 0,customer_id,gender,age,income,became_member_on
0,1,,,,2017-02-12
1,2,F,55.0,112000.0,2017-07-15
2,3,,,,2018-07-12
3,4,F,75.0,100000.0,2017-05-09
4,5,,,,2017-08-04


Unnamed: 0,customer_id,time,event,amount,reward,offer_id
0,4,0,offer_received,,,4.0
1,5,0,offer_received,,,5.0
2,6,0,offer_received,,,10.0
3,7,0,offer_received,,,7.0
4,8,0,offer_received,,,2.0
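
The exposure figures quoted earlier (a maximum of 6 offers, a median of 4 unique offers) could be checked with a group-by along these lines. This is a sketch on a toy stand-in for `transcript` with made-up values. It assumes that "exposed" means an `offer_received` event, which the notebook does not state explicitly.

```python
import pandas as pd

# Toy stand-in for the transcript table (illustrative values only).
toy_transcript = pd.DataFrame({
    "customer_id": [1, 1, 1, 2, 2, 2, 2],
    "event": ["offer_received", "offer_received", "transaction",
              "offer_received", "offer_received", "offer_received",
              "offer_viewed"],
    "offer_id": [4.0, 4.0, None, 2.0, 5.0, 7.0, 2.0],
})

# Keep only the rows where an offer was sent to the customer.
received = toy_transcript[toy_transcript["event"] == "offer_received"]

# Per customer: total offers received and number of distinct offers.
per_customer = received.groupby("customer_id")["offer_id"].agg(["count", "nunique"])
print(per_customer)

print("max offers received:", per_customer["count"].max())
print("median unique offers:", per_customer["nunique"].median())
```

Running the same aggregation on the real `transcript` would show whether the maximum refers to offers received or to unique offers.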


How do we define success?

What is the baseline behaviour?

Split train/test by customer? Yes.

In [5]:
# Merge everything first.
df = transcript.merge(profile, how='left', on='customer_id').merge(portfolio, how='left', on='offer_id')

In [23]:
df = df.rename(columns={'reward_x':'reward_transaction', 'reward_y':'offer_reward'})

In [24]:
df.head()

Unnamed: 0,customer_id,time,event,amount,reward_transaction,offer_id,gender,age,income,became_member_on,web,email,mobile,social,offer_type,duration,difficulty,offer_reward
0,4,0,offer_received,,,4.0,F,75.0,100000.0,2017-05-09,1.0,1.0,1.0,0.0,bogo,7.0,5.0,5.0
1,5,0,offer_received,,,5.0,,,,2017-08-04,1.0,1.0,0.0,0.0,discount,10.0,20.0,5.0
2,6,0,offer_received,,,10.0,M,68.0,70000.0,2018-04-26,1.0,1.0,1.0,0.0,discount,7.0,10.0,2.0
3,7,0,offer_received,,,7.0,,,,2017-09-25,1.0,1.0,1.0,1.0,discount,10.0,10.0,2.0
4,8,0,offer_received,,,2.0,,,,2017-10-02,1.0,1.0,1.0,1.0,bogo,5.0,10.0,10.0
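
The `reward_x` and `reward_y` names in the rename above come from pandas' default merge suffixes. Both `transcript` and `portfolio` have a `reward` column, so the merge keeps both and appends `_x` (left) and `_y` (right). A small sketch on toy frames with made-up values:

```python
import pandas as pd

# Toy stand-ins for transcript and portfolio (illustrative values only).
toy_transcript = pd.DataFrame({
    "customer_id": [4, 4],
    "event": ["offer_received", "offer_completed"],
    "reward": [None, 5.0],      # reward paid out on completion
    "offer_id": [4.0, 4.0],
})
toy_portfolio = pd.DataFrame({
    "offer_id": [4.0],
    "offer_type": ["bogo"],
    "reward": [5],              # reward advertised by the offer
})

# Both frames have a "reward" column, so pandas suffixes them _x and _y.
merged = toy_transcript.merge(toy_portfolio, how="left", on="offer_id")
print(list(merged.columns))

renamed = merged.rename(columns={"reward_x": "reward_transaction",
                                 "reward_y": "offer_reward"})
print(list(renamed.columns))
```

Passing `suffixes=('_transaction', '_offer')` to `merge` would be another way to get descriptive names without a separate rename step.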


In [38]:
# Group by customer id to get one df per customer, then split that list into train and test.
customer_dfs = [customer_df for _, customer_df in df.groupby('customer_id')]
train_customers, test_customers = train_test_split(customer_dfs, test_size=0.3, random_state=7)

In [43]:
display(len(train_customers))
display(len(test_customers))

11900

5100
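
Splitting by customer means every event of a given customer lands on the same side of the split. One alternative way to get the same property, sketched here on toy data, is to split the unique customer ids and then filter the rows. The names `toy_df`, `train_ids`, and `test_ids` are made up for this illustration, and this is not the approach used in the cell above.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Toy event table: five customers with two events each (illustrative only).
toy_df = pd.DataFrame({
    "customer_id": [1, 1, 2, 2, 3, 3, 4, 4, 5, 5],
    "event": ["offer_received", "transaction"] * 5,
})

# Split the customer ids, not the rows, so no customer straddles the split.
customer_ids = toy_df["customer_id"].unique()
train_ids, test_ids = train_test_split(customer_ids, test_size=0.3, random_state=7)

train_rows = toy_df[toy_df["customer_id"].isin(train_ids)]
test_rows = toy_df[toy_df["customer_id"].isin(test_ids)]

# Sanity checks: the id sets are disjoint and every row is assigned once.
print(set(train_ids).isdisjoint(test_ids))
print(len(train_rows) + len(test_rows) == len(toy_df))
```

The same disjointness check can be applied to `train_customers` and `test_customers` by collecting the `customer_id` values from each list.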

In [54]:
def split_transactions_and_offers(customer_list_of_df, transaction_key='transaction'):
    """
    Splits each customer's dataframe into transaction events and offer events.
    
    Input:
        customer_list_of_df - list of dataframes, one per customer
        transaction_key     - str, value of the event column that marks a transaction
    
    Returns:
        List of (transactions_df, offers_df) tuples, one per customer.
    """
    output = []
    # Iterate through the list and split
    for customer in customer_list_of_df:
        # Boolean mask that is True for transaction rows
        select = customer.event == transaction_key
        # Transactions go first, all other (offer) events second
        output.append((customer[select], customer[~select]))
    
    return output
    

In [55]:
train_event_split = split_transactions_and_offers(train_customers)

In [61]:
display(train_event_split[1][0])
print('\n'*4)
display(train_event_split[1][1])

Unnamed: 0,customer_id,time,event,amount,reward_transaction,offer_id,gender,age,income,became_member_on,web,email,mobile,social,offer_type,duration,difficulty,offer_reward
30713,12595,48,transaction,21.44,,,M,41.0,69000.0,2015-11-14,,,,,,,,
37876,12595,78,transaction,10.76,,,M,41.0,69000.0,2015-11-14,,,,,,,,
52027,12595,156,transaction,11.06,,,M,41.0,69000.0,2015-11-14,,,,,,,,
52927,12595,162,transaction,18.29,,,M,41.0,69000.0,2015-11-14,,,,,,,,
84814,12595,210,transaction,13.78,,,M,41.0,69000.0,2015-11-14,,,,,,,,
109659,12595,324,transaction,6.68,,,M,41.0,69000.0,2015-11-14,,,,,,,,
141015,12595,372,transaction,33.06,,,M,41.0,69000.0,2015-11-14,,,,,,,,
148467,12595,396,transaction,16.33,,,M,41.0,69000.0,2015-11-14,,,,,,,,
194570,12595,474,transaction,26.64,,,M,41.0,69000.0,2015-11-14,,,,,,,,
217389,12595,504,transaction,13.18,,,M,41.0,69000.0,2015-11-14,,,,,,,,









Unnamed: 0,customer_id,time,event,amount,reward_transaction,offer_id,gender,age,income,became_member_on,web,email,mobile,social,offer_type,duration,difficulty,offer_reward
9594,12595,0,offer_received,,,10.0,M,41.0,69000.0,2015-11-14,1.0,1.0,1.0,0.0,discount,7.0,10.0,2.0
17451,12595,6,offer_viewed,,,10.0,M,41.0,69000.0,2015-11-14,1.0,1.0,1.0,0.0,discount,7.0,10.0,2.0
30714,12595,48,offer_completed,,2.0,10.0,M,41.0,69000.0,2015-11-14,1.0,1.0,1.0,0.0,discount,7.0,10.0,2.0
62800,12595,168,offer_received,,,6.0,M,41.0,69000.0,2015-11-14,1.0,1.0,1.0,1.0,discount,7.0,7.0,3.0
74126,12595,180,offer_viewed,,,6.0,M,41.0,69000.0,2015-11-14,1.0,1.0,1.0,1.0,discount,7.0,7.0,3.0
84815,12595,210,offer_completed,,3.0,6.0,M,41.0,69000.0,2015-11-14,1.0,1.0,1.0,1.0,discount,7.0,7.0,3.0
116489,12595,336,offer_received,,,8.0,M,41.0,69000.0,2015-11-14,0.0,1.0,1.0,1.0,informational,3.0,0.0,0.0
129356,12595,342,offer_viewed,,,8.0,M,41.0,69000.0,2015-11-14,0.0,1.0,1.0,1.0,informational,3.0,0.0,0.0
160258,12595,408,offer_received,,,5.0,M,41.0,69000.0,2015-11-14,1.0,1.0,0.0,0.0,discount,10.0,20.0,5.0
173541,12595,420,offer_viewed,,,5.0,M,41.0,69000.0,2015-11-14,1.0,1.0,0.0,0.0,discount,10.0,20.0,5.0
