Ok, let's draft some user stories and implement Reinforcement Learning for the Reco Engine

## What are the ingredients?
1. Data on which emails were opened, which items were shopped and which offers were responded to. Note: this feedback is the key ingredient for the learning.
2. Data on every user and the relevance of each group of items (FGC) to that user.

## What is already being done?
Collaborative filtering and XGBoost regression are used to get the initial relevance of each group to the user.

## What is to be done?
Adjust the relevance values according to how users respond, in order to attain a better purchase rate.

**The metric to optimise for:** Purchases/Relevancy score


**Similar problem:** Multi-armed bandit

## Concept:
A gambler wants to find which slot machine will give him the best reward (slot machines are nicknamed "one-armed bandits", hence the name of the problem). So he keeps pulling levers, and every time he wins on a machine, choosing that machine becomes more probable!

### Algorithm:
```
Bandit has 10 hands (10 offers to be sent)
There are k actions (k is number of items)

Get the current array of Q values (relevances * 100)

let alpha = 0.1

For every user:
  Choose 10 actions such that,
    either action is a popular item from a Family group
    (with high Q value)
    or
    An overall popular item (for exploration)
  
  Send the 10 actions to users

  # Wait for some weeks
  Get the [opened, shopped, response] for products
  
  For each feedback for item:
    Get Old Q Value for the Family Group (relevance)
    
    if all [opened, shopped, response] are True:
      Reward = 5
    elif all [opened, shopped, response] are False:
      Reward = -2
    elif shopped:
      Reward = 3
    elif responded and didn't shop:
      Reward = 2
    elif opened email:
      Reward = 1
    else:
      Reward = 0

    # More Reward for pricier item
    if shopped:
      Reward = Reward * (0.01 * price)

    New Q Value = Old Q value + alpha * (Reward - Old Q Value)

    Update the old Q value for the [customer > Family Group]
```
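The selection step in the pseudocode above can be sketched in Python. This is a minimal epsilon-greedy sketch, not the notebook's implementation: `choose_offers`, `epsilon`, and the toy Q values and popularity list are hypothetical names introduced for illustration. Each of the 10 slots is filled from the overall popular items with probability `epsilon` (exploration), and otherwise with the user's next highest-Q item (exploitation).

```python
import random

def choose_offers(q_values, popular_items, n_offers=10, epsilon=0.2, rng=None):
    """Pick n_offers distinct items for one user (hypothetical helper).

    q_values: dict mapping item -> current Q value for this user.
    popular_items: list of overall popular items, used for exploration.
    """
    rng = rng or random.Random()
    # Items ranked by Q value, best first (exploitation candidates).
    ranked = sorted(q_values, key=q_values.get, reverse=True)
    chosen = []
    while len(chosen) < n_offers:
        explore_pool = [i for i in popular_items if i not in chosen]
        exploit_pool = [i for i in ranked if i not in chosen]
        if not explore_pool and not exploit_pool:
            break  # fewer than n_offers distinct items are available
        if explore_pool and (not exploit_pool or rng.random() < epsilon):
            chosen.append(rng.choice(explore_pool))
        else:
            chosen.append(exploit_pool[0])
    return chosen

# Toy data (made up): 15 items with decreasing Q values, 5 popular items.
toy_q = {item: 100 - item for item in range(15)}
toy_popular = [20, 21, 22, 23, 24]

offers = choose_offers(toy_q, toy_popular, rng=random.Random(0))
greedy_offers = choose_offers(toy_q, toy_popular, epsilon=0.0)
print(offers)
print(greedy_offers)
```

With `epsilon=0.0` the sketch is purely greedy and returns the ten highest-Q items; raising `epsilon` trades some of those slots for exploration.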

### Some extra explanations:

`New Q Value = Old Q value + alpha * (Reward - Old Q Value)`
This equation is the constant step-size form of the bandit update. It suits a non-stationary problem: as the users change, the Q values adapt to them. Every new update shrinks the weight of all earlier rewards by a factor of `(1 - alpha)`, so if a person bought toothpaste some months ago, that reward fades away and more recent purchases have more effect.

Alpha is the learning step size: a greater value makes the Q value move faster toward the most recent reward.
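
To make the fading effect concrete, here is a small sketch of the update rule in plain Python. The function name `update_q` and the reward sequence are illustrative assumptions. One rewarded interaction is followed by nine interactions with zero reward, and the contribution of the first reward shrinks by a factor of `(1 - alpha)` at every step.

```python
def update_q(q_old, reward, alpha=0.1):
    """Constant step-size update: move Q a fraction alpha toward the reward."""
    return q_old + alpha * (reward - q_old)

alpha = 0.1
q = 0.0
# One purchase (reward 5) followed by nine interactions with zero reward.
rewards = [5] + [0] * 9
history = []
for r in rewards:
    q = update_q(q, r, alpha)
    history.append(q)

# After n steps the first reward contributes alpha * (1 - alpha)**(n - 1) * 5.
closed_form = alpha * (1 - alpha) ** (len(rewards) - 1) * 5
print(history[0], history[-1], closed_form)
```

The last two printed numbers agree up to floating-point rounding, which is the recency weighting described above.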

In [1]:
# Import libraries I love
import pandas as pd
import numpy as np
from datetime import datetime

In [2]:
start = datetime.now()

In [3]:
MAIN_DATA_PATH = "Data/Data-RS.csv"
RS_DATA_PATH = "Data/Data-RS.csv"
TRX_DATA_PATH = "Data/SEP15-OCT01TRX.csv"

In [4]:
rs_data = pd.read_csv(RS_DATA_PATH)

print(rs_data.info())
rs_data.head(3)

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1000000 entries, 0 to 999999
Data columns (total 3 columns):
collector_key    1000000 non-null int64
fgc              1000000 non-null int64
rs               1000000 non-null float64
dtypes: float64(1), int64(2)
memory usage: 22.9 MB
None


Unnamed: 0,collector_key,fgc,rs
0,42,9604,0.508986
1,42,15457,0.399305
2,42,83714,0.361613


In [5]:
trx_data = pd.read_csv(TRX_DATA_PATH)
# Gives warning for test data, we should fix the format a little bit

# Data Cleaning

# Remove string records from the data,
trx_data.loc[trx_data["OPENED_email"] == "SHOPPED", "OPENED_email"] = 1
trx_data.loc[trx_data["SHOPPED"] == "RESPOND", "SHOPPED"] = 1
trx_data.loc[trx_data["RESPOND"] == "RESPOND", "RESPOND"] = 1

# Cast the flag columns to integers for better performance
trx_data["OPENED_email"] = trx_data["OPENED_email"].astype(np.int64)
trx_data["SHOPPED"] = trx_data["SHOPPED"].astype(np.int64)
trx_data["RESPOND"] = trx_data["RESPOND"].astype(np.int64)

# Rename the relevance columns to upper case so they match trx_data
rs_data.columns = ["COLLECTOR_KEY", "FGC", "RS"]

print(trx_data.info())
trx_data.head(3)



<class 'pandas.core.frame.DataFrame'>
RangeIndex: 108394 entries, 0 to 108393
Data columns (total 17 columns):
COLLECTOR_KEY          108394 non-null int64
DATE_ID                108394 non-null int64
STORE_ID               108394 non-null int64
TRANS_KEY              108394 non-null float64
CAT                    108394 non-null object
SUBCAT                 108394 non-null int64
UPC                    108394 non-null int64
AM_ELIGIBLE            108394 non-null int64
SALES                  108394 non-null float64
MARGIN                 108394 non-null float64
UNITS                  108394 non-null int64
UPC2                   108394 non-null int64
RXAL_DW_SUBCAT_DESC    95775 non-null object
FGC                    108394 non-null int64
OPENED_email           108394 non-null int64
SHOPPED                108394 non-null int64
RESPOND                108394 non-null int64
dtypes: float64(3), int64(12), object(2)
memory usage: 14.1+ MB
None


Unnamed: 0,COLLECTOR_KEY,DATE_ID,STORE_ID,TRANS_KEY,CAT,SUBCAT,UPC,AM_ELIGIBLE,SALES,MARGIN,UNITS,UPC2,RXAL_DW_SUBCAT_DESC,FGC,OPENED_email,SHOPPED,RESPOND
0,55,20170920,42,8.88897e+17,ANL,60100,77105810206,1,15.99,7.17,1,77105810206,IBUPROFEN,33020,1,0,0
1,59,20170923,6999,7.75793e+17,CAC,65250,6009326290,1,14.24,5.26,1,6009326290,COUGH/COLD REMEDIES LIQUID/PWD,33442,0,1,1
2,59,20170923,6999,7.75793e+17,BHB,50250,6260006290,1,3.99,0.1,1,6260006290,BABY BATH (NOT KIDS ADDITIVES),21736,1,1,1


In [6]:
merged_data = trx_data.merge(rs_data,on=["COLLECTOR_KEY", "FGC"], how="inner")

# Multiply SALES by UNITS to get one sales value per row
merged_data["SALES"] = merged_data["SALES"] * merged_data["UNITS"]

# Keep only the columns needed for the update
selected_data = merged_data[
    ["COLLECTOR_KEY", "FGC", "RS", "SALES", "OPENED_email", "SHOPPED", "RESPOND"]
]

print(selected_data.info())
selected_data.head(3)

<class 'pandas.core.frame.DataFrame'>
Int64Index: 1808 entries, 0 to 1807
Data columns (total 7 columns):
COLLECTOR_KEY    1808 non-null int64
FGC              1808 non-null int64
RS               1808 non-null float64
SALES            1808 non-null float64
OPENED_email     1808 non-null int64
SHOPPED          1808 non-null int64
RESPOND          1808 non-null int64
dtypes: float64(2), int64(5)
memory usage: 113.0 KB
None


Unnamed: 0,COLLECTOR_KEY,FGC,RS,SALES,OPENED_email,SHOPPED,RESPOND
0,55,33020,0.255548,15.99,1,0,0
1,154,15418,0.914965,3.99,1,0,0
2,190,31796,0.450203,6.99,1,1,1
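
The inner merge above silently drops every transaction whose `(COLLECTOR_KEY, FGC)` pair has no relevance score. A minimal sketch of how one might count those rows, using the `indicator=True` option of `DataFrame.merge`; the toy frames `toy_trx` and `toy_rs` are made-up stand-ins, not the notebook's data.

```python
import pandas as pd

# Toy stand-ins for trx_data and rs_data (values are made up).
toy_trx = pd.DataFrame({
    "COLLECTOR_KEY": [1, 1, 2, 3],
    "FGC": [10, 11, 10, 12],
    "SALES": [3.99, 5.00, 2.50, 7.25],
})
toy_rs = pd.DataFrame({
    "COLLECTOR_KEY": [1, 2],
    "FGC": [10, 10],
    "RS": [0.4, 0.6],
})

# A left merge with indicator=True labels each transaction row as
# "both" (relevance found) or "left_only" (no relevance score).
checked = toy_trx.merge(toy_rs, on=["COLLECTOR_KEY", "FGC"],
                        how="left", indicator=True)
match_counts = checked["_merge"].value_counts()
print(match_counts)

# Keeping only the "both" rows gives the same rows as how="inner".
matched = checked[checked["_merge"] == "both"].drop(columns="_merge")
inner = toy_trx.merge(toy_rs, on=["COLLECTOR_KEY", "FGC"], how="inner")
```

Checking the `left_only` count before filtering makes it visible how much feedback never reaches the update step.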


In [7]:
# Now let's group by the collector
user_feedback = selected_data.groupby(by="COLLECTOR_KEY")

### Update Algorithm:
```
let alpha = 0.1

For every user:
  For each feedback for item:
    Get Old Q Value for the Family Group (relevance)
    
    if all [opened, shopped, response] are True:
      Reward = 5
    elif all [opened, shopped, response] are False:
      Reward = -2
    elif shopped:
      Reward = 3
    elif responded and didn't shop:
      Reward = 2
    elif opened email:
      Reward = 1
    else:
      Reward = 0

    # More Reward for pricier item
    if shopped:
      Reward = Reward * (0.01 * price)

    New Q Value = Old Q value + alpha * (Reward - Old Q Value)

    Update the old Q value for the [customer > Family Group]
```

In [8]:
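# And let the fun begin!
# Note: these values differ from the pseudocode (alpha = 0.1, price factor 0.01)
alpha = 0.01         # learning step size
price_factor = 0.2   # scales the reward of a purchase by its sales value

rs_updates = []
for user, feedbacks in user_feedback:
    for index, feedback_row in feedbacks.iterrows():
        # Column order: COLLECTOR_KEY, FGC, RS, SALES, OPENED_email, SHOPPED, RESPOND
        _, fgc, rs, sales, opened, shopped, responded = tuple(feedback_row)
        
        # Every row starts from the RS in the merged data, so repeated
        # (user, FGC) rows do not compound on each other
        Q_old = rs
        
        Reward = 0
        if opened and shopped and responded and sales > 0:
            Reward = 5 * price_factor * sales
        elif not (opened or shopped or responded):
            Reward = -2
        elif shopped and sales > 0:
            Reward = 3 * price_factor * sales
        elif responded and not shopped:
            Reward = 2
        elif opened:
            Reward = 1
        
        # Constant step-size update: move Q a fraction alpha toward the reward
        Q_new = Q_old + alpha * (Reward - Q_old)
        
        # iterrows() yields each row as a new float Series, so this leaves
        # selected_data untouched and every column comes out as float64
        feedback_row["RS"] = Q_new
        
        rs_updates.append(list(feedback_row))

rs_updates = pd.DataFrame(rs_updates, columns=selected_data.columns)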

print(rs_updates.info())

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1808 entries, 0 to 1807
Data columns (total 7 columns):
COLLECTOR_KEY    1808 non-null float64
FGC              1808 non-null float64
RS               1808 non-null float64
SALES            1808 non-null float64
OPENED_email     1808 non-null float64
SHOPPED          1808 non-null float64
RESPOND          1808 non-null float64
dtypes: float64(7)
memory usage: 99.0 KB
None
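
The row-by-row loop above can also be expressed without `iterrows()`. The following is a sketch of one possible vectorised version, not the notebook's method: `apply_feedback` is a hypothetical helper, and `np.select` is used because it takes the first matching condition, like an `if`/`elif` chain. The three sample rows are copied from the `selected_data.head(3)` output above.

```python
import numpy as np
import pandas as pd

def apply_feedback(df, alpha=0.01, price_factor=0.2):
    """Return a copy of df with RS updated by the same reward rules as the loop."""
    opened = df["OPENED_email"].to_numpy().astype(bool)
    shopped = df["SHOPPED"].to_numpy().astype(bool)
    responded = df["RESPOND"].to_numpy().astype(bool)
    sales = df["SALES"].to_numpy()
    bought = sales > 0

    # Conditions are checked in order; the first one that holds wins.
    conditions = [
        opened & shopped & responded & bought,
        ~(opened | shopped | responded),
        shopped & bought,
        responded & ~shopped,
        opened,
    ]
    choices = [
        5 * price_factor * sales,
        -2.0,
        3 * price_factor * sales,
        2.0,
        1.0,
    ]
    reward = np.select(conditions, choices, default=0.0)

    out = df.copy()
    out["RS"] = df["RS"] + alpha * (reward - df["RS"])
    return out

# First three rows of selected_data, as printed earlier in the notebook.
sample = pd.DataFrame({
    "COLLECTOR_KEY": [55, 154, 190],
    "FGC": [33020, 15418, 31796],
    "RS": [0.255548, 0.914965, 0.450203],
    "SALES": [15.99, 3.99, 6.99],
    "OPENED_email": [1, 1, 1],
    "SHOPPED": [0, 0, 1],
    "RESPOND": [0, 0, 1],
})
updated = apply_feedback(sample)
print(updated[["COLLECTOR_KEY", "FGC", "RS"]])
```

Because no row is converted to a float Series, the key columns keep their integer dtype in this version.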


In [9]:
updated_data = rs_updates.drop(["OPENED_email", "SHOPPED", "RESPOND", "SALES"], axis=1)

updated_data.head(10)

Unnamed: 0,COLLECTOR_KEY,FGC,RS
0,55.0,33020.0,0.262992
1,154.0,15418.0,0.915815
2,190.0,31796.0,0.515601
3,190.0,31796.0,0.515601
4,190.0,32903.0,0.367509
5,223.0,81777.0,0.667781
6,289.0,15485.0,0.792886
7,447.0,9604.0,0.504591
8,474.0,71486.0,0.36623
9,520.0,31977.0,0.44725


In [10]:
main_data = pd.read_csv(MAIN_DATA_PATH)
main_data.columns = main_data.columns.str.upper()

cols = ['COLLECTOR_KEY', 'FGC']

# Append the updates after the original rows; keep='last' makes the updated RS win
new_data = (
    pd.concat([main_data, updated_data], axis=0)
    .drop_duplicates(cols, keep='last')
    .sort_values(by="COLLECTOR_KEY")
)

print(new_data.info())

<class 'pandas.core.frame.DataFrame'>
Int64Index: 1000000 entries, 0 to 999989
Data columns (total 3 columns):
COLLECTOR_KEY    1000000 non-null float64
FGC              1000000 non-null float64
RS               1000000 non-null float64
dtypes: float64(3)
memory usage: 30.5 MB
None
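
The concat-then-deduplicate step acts as an "upsert": rows that received feedback replace their originals and all other rows pass through unchanged. A small sketch on made-up data shows the effect; `toy_main`, `toy_updates` and `keys` are hypothetical names.

```python
import pandas as pd

# Toy stand-ins for main_data and updated_data (values are made up).
toy_main = pd.DataFrame({
    "COLLECTOR_KEY": [1, 1, 2],
    "FGC": [10, 11, 10],
    "RS": [0.50, 0.30, 0.20],
})
toy_updates = pd.DataFrame({
    "COLLECTOR_KEY": [1],
    "FGC": [11],
    "RS": [0.35],
})

keys = ["COLLECTOR_KEY", "FGC"]
# Rows from toy_updates come last, so keep="last" lets them win on key clashes.
toy_new = (
    pd.concat([toy_main, toy_updates], axis=0)
    .drop_duplicates(subset=keys, keep="last")
    .sort_values(by=keys)
    .reset_index(drop=True)
)
print(toy_new)
```

The row count stays at three, which mirrors the notebook's result of 1000000 entries before and after the merge of updates.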


In [11]:
# Normalise RS so that each user's relevance scores sum to 1
new_data["RS"] /= new_data.groupby(by="COLLECTOR_KEY")["RS"].transform("sum")
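
The per-user normalisation can be checked on a toy frame. This is a sketch under assumed data: the frame `toy` and the column `RS_NORM` are made up for illustration, and `groupby(...).transform("sum")` is one way to get each user's total aligned with the original rows.

```python
import pandas as pd

toy = pd.DataFrame({
    "COLLECTOR_KEY": [1, 1, 2, 2, 2],
    "FGC": [10, 11, 10, 12, 13],
    "RS": [0.6, 0.2, 0.5, 0.3, 0.2],
})

# transform("sum") returns, for every row, the RS total of that row's user.
user_totals = toy.groupby("COLLECTOR_KEY")["RS"].transform("sum")
toy["RS_NORM"] = toy["RS"] / user_totals

# After normalising, every user's scores should add up to 1.
per_user_sum = toy.groupby("COLLECTOR_KEY")["RS_NORM"].sum()
print(per_user_sum)
```

Note that a reward of -2 can push a score below zero, in which case dividing by the user's sum no longer yields a proper distribution; that case is not handled here.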

In [12]:
elapsed = datetime.now() - start
print("Execution completed in {0} seconds".format(elapsed.total_seconds()))

new_data.head(6)

Execution completed in 18.685549 seconds


Unnamed: 0,COLLECTOR_KEY,FGC,RS
0,42.0,9604.0,0.050056
27,42.0,85307.0,0.017013
28,42.0,112567.0,0.016027
29,42.0,15418.0,0.015038
30,42.0,32900.0,0.01477
31,42.0,27502.0,0.014613


In [13]:
new_data.to_csv("result.csv", index=False)
print("Done! Have a nice day!")

Done! Have a nice day!
