# Starbucks Capstone Challenge

### Introduction

This data set contains simulated data that mimics customer behavior on the Starbucks rewards mobile app. Once every few days, Starbucks sends out an offer to users of the mobile app. An offer can be merely an advertisement for a drink or an actual offer such as a discount or BOGO (buy one get one free). Some users might not receive any offer during certain weeks. 

Not all users receive the same offer, and that is the challenge to solve with this data set.

Your task is to combine transaction, demographic and offer data to determine which demographic groups respond best to which offer type. This data set is a simplified version of the real Starbucks app because the underlying simulator only has one product whereas Starbucks actually sells dozens of products.

Every offer has a validity period before the offer expires. As an example, a BOGO offer might be valid for only 5 days. You'll see in the data set that informational offers have a validity period even though these ads are merely providing information about a product; for example, if an informational offer has 7 days of validity, you can assume the customer is feeling the influence of the offer for 7 days after receiving the advertisement.

You'll be given transactional data showing user purchases made on the app including the timestamp of purchase and the amount of money spent on a purchase. This transactional data also has a record for each offer that a user receives as well as a record for when a user actually views the offer. There are also records for when a user completes an offer. 

Keep in mind as well that someone using the app might make a purchase through the app without having received an offer or seen an offer.

### Example

To give an example, a user could receive a discount offer "buy 10 dollars, get 2 dollars off" on Monday. The offer is valid for 10 days from receipt. If the customer accumulates at least 10 dollars in purchases during the validity period, the customer completes the offer.
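The completion rule in this example can be written as a small function. This is a minimal sketch: `completes_offer` is a hypothetical helper, times are assumed to be in hours since the start of the test (as in transcript.json) while durations are in days, and the validity window is assumed to include both endpoints.

```python
def completes_offer(transactions, time_received, duration_days, difficulty):
    """Return True if spending inside the validity window reaches the offer's difficulty.

    transactions: list of (time_in_hours, amount) pairs for one customer.
    Assumption: the window [time_received, time_received + duration] is inclusive.
    """
    window_end = time_received + duration_days * 24  # duration is in days, times are in hours
    spent = sum(amount for t, amount in transactions if time_received <= t <= window_end)
    return spent >= difficulty


# "Buy 10 dollars, get 2 off", valid for 10 days, received at hour 0
print(completes_offer([(24, 6.0), (100, 9.0)], 0, 10, 10))  # True: 15 dollars inside the window
print(completes_offer([(24, 6.0), (300, 9.0)], 0, 10, 10))  # False: the 9-dollar purchase is after hour 240
```

The function never looks at whether the offer was viewed, which is exactly why a completion record on its own cannot show that the offer influenced the customer.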

However, there are a few things to watch out for in this data set. Customers do not opt into the offers that they receive; in other words, a user can receive an offer, never actually view the offer, and still complete the offer. For example, a user might receive the "buy 10 dollars, get 2 dollars off" offer but never open it during the 10-day validity period. If the customer spends 15 dollars during those ten days, there will be an offer completion record in the data set, even though the customer was not influenced by the offer because the customer never viewed it.

### Cleaning

This makes data cleaning especially important and tricky.

You'll also want to take into account that some demographic groups will make purchases even if they don't receive an offer. From a business perspective, if a customer is going to make a 10 dollar purchase without an offer anyway, you wouldn't want to send a buy 10 dollars get 2 dollars off offer. You'll want to try to assess what a certain demographic group will buy when not receiving any offers.
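One way to estimate what a customer buys without an offer is to keep only the transactions that fall outside every offer validity window that customer received. The sketch below uses hypothetical toy frames with the same kind of fields as transcript.json and portfolio.json (times in hours, duration in days) and treats each window as inclusive; it is an illustration, not the method used later in this notebook.

```python
import pandas as pd

# Hypothetical toy data: offers received (time in hours, duration in days) and transactions
received = pd.DataFrame({'person': ['a', 'b'], 'time_rec': [0, 0], 'duration': [7, 7]})
trans = pd.DataFrame({'person': ['a', 'a', 'b', 'c'],
                      'time_trans': [10, 300, 500, 50],
                      'amount': [5.0, 8.0, 12.0, 3.0]})

# Pair every transaction with every offer its customer received ('index' identifies the transaction)
pairs = trans.reset_index().merge(received, on='person', how='left')
inside = (pairs.time_trans >= pairs.time_rec) & \
         (pairs.time_trans <= pairs.time_rec + pairs.duration * 24)

# A transaction is offer-free if it lies inside none of its customer's offer windows
covered = inside.groupby(pairs['index']).any()
trans['offer_free'] = ~covered.reindex(trans.index).to_numpy()

# Offer-free spending per customer: a rough baseline of what they buy anyway
baseline = trans[trans.offer_free].groupby('person').amount.sum()
print(baseline)
```

Customer `c` received no offers, so all of its spending counts toward the baseline; averaging such baselines per demographic group would give the "what would they buy anyway" comparison described above.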

### Final Advice

Because this is a capstone project, you are free to analyze the data any way you see fit. For example, you could build a machine learning model that predicts how much someone will spend based on demographics and offer type. Or you could build a model that predicts whether or not someone will respond to an offer. Or, you don't need to build a machine learning model at all. You could develop a set of heuristics that determine what offer you should send to each customer (e.g., 75 percent of women customers who were 35 years old responded to offer A vs 40 percent from the same demographic to offer B, so send offer A).
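The heuristic approach in the example above amounts to computing a response rate for each demographic segment and offer, then picking the offer with the highest rate in each segment. A minimal pandas sketch on hypothetical toy data (the segment labels, offer names and the `responded` flag are invented for illustration):

```python
import pandas as pd

# Hypothetical toy log: one row per offer sent, with whether the customer responded
log = pd.DataFrame({
    'segment':   ['F35'] * 6 + ['M50'] * 6,
    'offer':     ['A', 'A', 'A', 'B', 'B', 'B'] * 2,
    'responded': [1, 1, 0, 1, 0, 0,
                  0, 1, 0, 1, 1, 1],
})

# Response rate for every (segment, offer) pair, one column per offer
rates = log.groupby(['segment', 'offer']).responded.mean().unstack('offer')

# Heuristic: send each segment the offer with the highest response rate
best_offer = rates.idxmax(axis=1)
print(best_offer)
```

Here segment `F35` responds to offer A two times out of three versus one out of three for B, so A is chosen; segment `M50` gets offer B.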

# Data Sets

The data is contained in three files:

* portfolio.json - containing offer ids and metadata about each offer (duration, type, etc.)
* profile.json - demographic data for each customer
* transcript.json - records for transactions, offers received, offers viewed, and offers completed

Here is the schema and explanation of each variable in the files:

**portfolio.json**
* id (string) - offer id
* offer_type (string) - type of offer, i.e. BOGO, discount, informational
* difficulty (int) - minimum required spend to complete an offer
* reward (int) - reward given for completing an offer
* duration (int) - time for offer to be open, in days
* channels (list of strings) - channels through which the offer is sent (web, email, mobile, social)

**profile.json**
* age (int) - age of the customer 
* became_member_on (int) - date when customer created an app account, stored as an integer in YYYYMMDD format (e.g. 20170212)
* gender (str) - gender of the customer (note some entries contain 'O' for other rather than M or F)
* id (str) - customer id
* income (float) - customer's income

**transcript.json**
* event (str) - record description (e.g. transaction, offer received, offer viewed, etc.)
* person (str) - customer id
* time (int) - time in hours since start of test. The data begins at time t=0
* value (dict) - either an offer id or a transaction amount, depending on the record (offer completed records also include the reward)
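The keys inside `value` differ by event type: 'offer id' for received and viewed offers, 'offer_id' plus 'reward' for completed offers, and 'amount' for transactions, which is what the preprocessing cells below rely on. A tolerant accessor can flatten the column in one pass; this is a sketch on hypothetical toy rows, and `get_offer_id` is an illustrative helper.

```python
import pandas as pd

# Hypothetical toy rows shaped like transcript.json
transcript = pd.DataFrame({
    'event': ['offer received', 'offer viewed', 'offer completed', 'transaction'],
    'value': [{'offer id': 'x1'}, {'offer id': 'x1'},
              {'offer_id': 'x1', 'reward': 5}, {'amount': 12.5}],
})

def get_offer_id(v):
    # Received/viewed records use the key 'offer id'; completed records use 'offer_id'
    return v.get('offer id', v.get('offer_id'))

# dict.get returns None for missing keys, so every event type can be processed together
transcript['offer_id'] = transcript['value'].apply(get_offer_id)
transcript['amount'] = transcript['value'].apply(lambda v: v.get('amount'))
transcript['reward'] = transcript['value'].apply(lambda v: v.get('reward'))
print(transcript.drop(columns='value'))
```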

**Note:** If you are using the workspace, you will need to go to the terminal and run the command `conda update pandas` before reading in the files. This is because the version of pandas in the workspace cannot read in the transcript.json file correctly, but the newest version of pandas can. You can access the terminal from the orange icon in the top left of this notebook.

You can see how to access the terminal and how the install works using the two images below. First you need to access the terminal:

<img src="pic1.png"/>

Then you will want to run the above command:

<img src="pic2.png"/>

Finally, when you enter back into the notebook (use the jupyter icon again), you should be able to run the below cell without any errors.

## The Goal of the Project

In this notebook, we answer the question of which demographic groups of users respond best to which types of offers. We also develop two data science tools, a classification model and a recommendation system, that could help increase user purchases. We approach the problem in three steps:

1. Compare the demographic features of four groups of users and examine how these features relate to customer behavior:

   * Group 1: users who completed the offer without viewing it first.
   * Group 2: users who viewed the offer and then completed it.
   * Group 3: users who viewed the offer but did not complete it.
   * Group 4: users who neither viewed nor completed the offer.

2. Build a classification model to predict whether a user who receives an offer will complete it.

3. Build a user-user collaborative filtering recommendation system for new and existing users.

## Table of Contents

I. [Exploratory Data Analysis On User Groups](#Exploratory)<br>
II. [Classification to Predict Completion](#Classification)<br>
III. [User-User Based Collaborative Filtering Recommendation](#Recommendation)<br>

In [1]:
import pandas as pd
import numpy as np
import math
import json
%matplotlib inline

# read in the json files
portfolio = pd.read_json('data/portfolio.json', orient='records', lines=True)
profile = pd.read_json('data/profile.json', orient='records', lines=True)
transcript = pd.read_json('data/transcript.json', orient='records', lines=True)

In [2]:
portfolio.head()

|   | channels | difficulty | duration | id | offer_type | reward |
|---|---|---|---|---|---|---|
| 0 | [email, mobile, social] | 10 | 7 | ae264e3637204a6fb9bb56bc8210ddfd | bogo | 10 |
| 1 | [web, email, mobile, social] | 10 | 5 | 4d5c57ea9a6940dd891ad53e9dbe8da0 | bogo | 10 |
| 2 | [web, email, mobile] | 0 | 4 | 3f207df678b143eea3cee63160fa8bed | informational | 0 |
| 3 | [web, email, mobile] | 5 | 7 | 9b98b8c7a33c4b65b9aebfe6a799e6d9 | bogo | 5 |
| 4 | [web, email] | 20 | 10 | 0b1e1539f2cc45b7b9fa7c272da2e1d7 | discount | 5 |


In [3]:
portfolio.shape

(10, 6)

In [4]:
print ('There are {} offers in the portfolio with the columns {}'.format(portfolio.id.nunique(),portfolio.columns))

There are 10 offers in the portfolio with the columns Index(['channels', 'difficulty', 'duration', 'id', 'offer_type', 'reward'], dtype='object')


In [5]:
profile.head()

|   | age | became_member_on | gender | id | income |
|---|---|---|---|---|---|
| 0 | 118 | 20170212 |  | 68be06ca386d4c31939f3a4f0e3dd783 |  |
| 1 | 55 | 20170715 | F | 0610b486422d4921ae7d2bf64640c50b | 112000.0 |
| 2 | 118 | 20180712 |  | 38fe809add3b4fcf9315a9694bb96ff5 |  |
| 3 | 75 | 20170509 | F | 78afa995795e4d85b5d9ceeca43f5fef | 100000.0 |
| 4 | 118 | 20170804 |  | a03223e636434f42ac4c3df47e8bac43 |  |


In [6]:
profile.shape

(17000, 5)

In [7]:
print ('There are {} users in the profile with the columns {}'.format(profile.id.nunique(),profile.columns))

There are 17000 users in the profile with the columns Index(['age', 'became_member_on', 'gender', 'id', 'income'], dtype='object')


In [8]:
transcript.head()

|   | event | person | time | value |
|---|---|---|---|---|
| 0 | offer received | 78afa995795e4d85b5d9ceeca43f5fef | 0 | {'offer id': '9b98b8c7a33c4b65b9aebfe6a799e6d9'} |
| 1 | offer received | a03223e636434f42ac4c3df47e8bac43 | 0 | {'offer id': '0b1e1539f2cc45b7b9fa7c272da2e1d7'} |
| 2 | offer received | e2127556f4f64592b11af22de27a7932 | 0 | {'offer id': '2906b810c7d4411798c6938adc9daaa5'} |
| 3 | offer received | 8ec6ce2a7e7949b1bf142def7d0e0586 | 0 | {'offer id': 'fafdcd668e3743c1bb461111dcafc2a4'} |
| 4 | offer received | 68617ca6246f4fbc85e91a2a49552598 | 0 | {'offer id': '4d5c57ea9a6940dd891ad53e9dbe8da0'} |


In [9]:
transcript.shape

(306534, 4)

In [10]:
print ('There are {} rows of transaction event'.format(transcript.loc[transcript.event=='transaction'].shape[0]))
print ('There are {} rows of offer received event'.format(transcript.loc[transcript.event=='offer received'].shape[0]))
print ('There are {} rows of offer viewed event'.format(transcript.loc[transcript.event=='offer viewed'].shape[0]))
print ('There are {} rows of offer completed event'.format(transcript.loc[transcript.event=='offer completed'].shape[0]))

There are 138953 rows of transaction event
There are 76277 rows of offer received event
There are 57725 rows of offer viewed event
There are 33579 rows of offer completed event


### <a class="anchor" id="Exploratory">Part I : Exploratory Data Analysis On User Groups</a>

In this section, we explored the difference of users demographic features in 4 exclusive groups based on users behaviors:

Group1: Users completed the offer without viewed.

Group2: Users viewed the offer and completed it.

Group3: Users viewed the offer but not completed it.

Group4: Users did not view nor complete the offers.


#### Data Preprocessing

In [11]:
transcript_trans = transcript.loc[transcript.event=='transaction'].copy()
transcript_offerrec = transcript.loc[transcript.event=='offer received'].copy()
transcript_offervie = transcript.loc[transcript.event=='offer viewed'].copy()
transcript_offercom =  transcript.loc[transcript.event=='offer completed'].copy()

In [12]:
transcript_trans.reset_index(inplace=True)
transcript_offerrec.reset_index(inplace=True)
transcript_offervie.reset_index(inplace=True)
transcript_offercom.reset_index(inplace=True)

In [13]:
transcript_trans['amount'] = transcript_trans.value.apply(lambda x:x['amount'])
transcript_offerrec['offer_id'] = transcript_offerrec.value.apply(lambda x:x['offer id'])
transcript_offervie['offer_id'] = transcript_offervie.value.apply(lambda x:x['offer id'])
transcript_offercom['offer_id'] = transcript_offercom.value.apply(lambda x:x['offer_id'])
transcript_offercom['reward'] = transcript_offercom.value.apply(lambda x:x['reward'])

In [14]:
transcript_trans.drop(['value','event','index'],axis=1,inplace=True)
transcript_offerrec.drop(['value','event','index'],axis=1,inplace=True)
transcript_offervie.drop(['value','event','index'],axis=1,inplace=True)
transcript_offercom.drop(['value','event','index'],axis=1,inplace=True)

In [15]:
transcript_trans.rename(columns={'person':'person_trans','time':'time_trans'},inplace=True)
transcript_offerrec.rename(columns={'person':'person_rec','time':'time_rec','offer_id':'offer_id_rec'},inplace=True)
transcript_offervie.rename(columns={'person':'person_vie','time':'time_vie','offer_id':'offer_id_vie'},inplace=True)
transcript_offercom.rename(columns={'person':'person_com','time':'time_com','offer_id':'offer_id_com','reward':'reward_com'},inplace=True)

In [16]:
transcript_trans.head()

|   | person_trans | time_trans | amount |
|---|---|---|---|
| 0 | 02c083884c7d45b39cc68e1314fec56c | 0 | 0.83 |
| 1 | 9fa9ae8f57894cc9a3b8a9bbe0fc1b2f | 0 | 34.56 |
| 2 | 54890f68699049c2a04d415abc25e717 | 0 | 13.23 |
| 3 | b2f1cd155b864803ad8334cdf13c4bd2 | 0 | 19.51 |
| 4 | fe97aa22dd3e48c8b143116a8403dd52 | 0 | 18.97 |


In [17]:
transcript_offerrec.head()

|   | person_rec | time_rec | offer_id_rec |
|---|---|---|---|
| 0 | 78afa995795e4d85b5d9ceeca43f5fef | 0 | 9b98b8c7a33c4b65b9aebfe6a799e6d9 |
| 1 | a03223e636434f42ac4c3df47e8bac43 | 0 | 0b1e1539f2cc45b7b9fa7c272da2e1d7 |
| 2 | e2127556f4f64592b11af22de27a7932 | 0 | 2906b810c7d4411798c6938adc9daaa5 |
| 3 | 8ec6ce2a7e7949b1bf142def7d0e0586 | 0 | fafdcd668e3743c1bb461111dcafc2a4 |
| 4 | 68617ca6246f4fbc85e91a2a49552598 | 0 | 4d5c57ea9a6940dd891ad53e9dbe8da0 |


In [18]:
transcript_offervie.head()

|   | person_vie | time_vie | offer_id_vie |
|---|---|---|---|
| 0 | 389bc3fa690240e798340f5a15918d5c | 0 | f19421c1d4aa40978ebb69ca19b0e20d |
| 1 | d1ede868e29245ea91818a903fec04c6 | 0 | 5a8bc65990b245e5a138643cd4eb9837 |
| 2 | 102e9454054946fda62242d2e176fdce | 0 | 4d5c57ea9a6940dd891ad53e9dbe8da0 |
| 3 | 02c083884c7d45b39cc68e1314fec56c | 0 | ae264e3637204a6fb9bb56bc8210ddfd |
| 4 | be8a5d1981a2458d90b255ddc7e0d174 | 0 | 5a8bc65990b245e5a138643cd4eb9837 |


In [19]:
transcript_offercom.head()

|   | person_com | time_com | offer_id_com | reward_com |
|---|---|---|---|---|
| 0 | 9fa9ae8f57894cc9a3b8a9bbe0fc1b2f | 0 | 2906b810c7d4411798c6938adc9daaa5 | 2 |
| 1 | fe97aa22dd3e48c8b143116a8403dd52 | 0 | fafdcd668e3743c1bb461111dcafc2a4 | 2 |
| 2 | 629fc02d56414d91bca360decdfa9288 | 0 | 9b98b8c7a33c4b65b9aebfe6a799e6d9 | 5 |
| 3 | 676506bad68e4161b9bbaffeb039626b | 0 | ae264e3637204a6fb9bb56bc8210ddfd | 10 |
| 4 | 8f7dd3b2afe14c078eb4f6e6fe4ba97d | 0 | 4d5c57ea9a6940dd891ad53e9dbe8da0 | 10 |


Next, we look at the timing of events during the test.

In [20]:
# Offers were sent at six distinct times during the test (hours since the start)
transcript_offerrec.time_rec.value_counts()

408    12778
576    12765
336    12711
504    12704
168    12669
0      12650
Name: time_rec, dtype: int64

In [21]:
print ('The max time of offerview is {} hours, and the max time of offercomplete is {} hours'.format(transcript_offervie.time_vie.max(),transcript_offercom.time_com.max()))

The max time of offerview is 714 hours, and the max time of offercomplete is 714 hours


In [22]:
# Durations of the offers sent at the last send time (hour 576)
transcript_offerrec.loc[transcript_offerrec.time_rec==576].merge(portfolio,how='inner',left_on=['offer_id_rec'],right_on=['id']).duration.unique()

array([10,  3,  5,  7,  4], dtype=int64)

In [23]:
print('The max duration of an offer is {} days, which is {} hours'.format(portfolio.duration.max(), portfolio.duration.max()*24))
print('The latest offer send time is hour {}'.format(transcript_offerrec.time_rec.max()))
print('Some of the latest offers probably had not expired when the test ended')

The max duration of an offer is 10 days, which is 240 hours
The latest offer send time is hour 576
Some of the latest offers probably had not expired when the test ended
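The reasoning in this cell can be checked by listing which combinations of send time and duration would end after the last observed event. This sketch assumes that hour 714, the latest view or completion time seen above, approximates the end of the test, and that any of the five durations observed for the hour-576 offers could be sent at any send time.

```python
# Values taken from the outputs above (hours since the start of the test)
send_times = [0, 168, 336, 408, 504, 576]
durations_days = [3, 4, 5, 7, 10]
last_event_hour = 714  # assumption: used as a proxy for the end of the test

# (send time, duration) pairs whose validity window would extend past the end of the test
censored = [(t, d) for t in send_times for d in durations_days
            if t + d * 24 > last_event_hour]
print(censored)  # [(504, 10), (576, 7), (576, 10)]
```

Offers in these combinations may look "not completed" only because the test stopped before they expired.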


In [24]:
# Left-join offer views onto offer receipts by person and offer id
offer_rec_vie = transcript_offerrec.merge(transcript_offervie,how='left',left_on=['person_rec','offer_id_rec'],right_on=['person_vie','offer_id_vie'])
offer_rec_vie.shape

(95321, 6)

In [25]:
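# Remove invalid rows produced by the merge, where the offer was received after it was viewed
# (rows with no view have a NaN time_vie, so the comparison is False and they are kept)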
offer_rec_vie_v = offer_rec_vie[~(offer_rec_vie.time_rec>offer_rec_vie.time_vie)].copy()
offer_rec_vie_v.shape

(84396, 6)

In [26]:
# Left-join offer completions onto the received/viewed records by person and offer id
offer_rec_vie_v_com = offer_rec_vie_v.merge(transcript_offercom,how='left',left_on=['person_rec','offer_id_rec'],right_on=['person_com','offer_id_com'])
offer_rec_vie_v_com.shape

(99057, 10)

In [27]:
# Remove invalid rows produced by the merge, where the offer was received after it was completed
offer_rec_vie_v_com_v = offer_rec_vie_v_com[~(offer_rec_vie_v_com.time_rec>offer_rec_vie_v_com.time_com)].copy()
offer_rec_vie_v_com_v.shape

(93639, 10)
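Merging on person and offer id alone is many-to-many when a user receives the same offer more than once, which is why the row count grows from 76277 receipts to 95321 merged rows. The time filter removes views that precede a receipt, but a later view can still be paired with an earlier receipt. The sketch below shows this on hypothetical toy data and, as an alternative that this notebook does not use, matches each view to the most recent receipt at or before it with `pandas.merge_asof`.

```python
import pandas as pd

# Hypothetical toy events: one person receives the same offer twice (times in hours)
rec = pd.DataFrame({'person': ['a', 'a'], 'offer_id': ['x', 'x'], 'time_rec': [0, 168]})
vie = pd.DataFrame({'person': ['a', 'a'], 'offer_id': ['x', 'x'], 'time_vie': [6, 200]})

# Approach used above: merge on person + offer, then drop views that precede the receipt
naive = rec.merge(vie, on=['person', 'offer_id'], how='left')
naive = naive[~(naive.time_rec > naive.time_vie)]
print(naive)  # 3 rows; (time_rec=0, time_vie=200) pairs the second view with the first receipt

# Alternative: each view is matched to the latest receipt with time_rec <= time_vie
matched = pd.merge_asof(vie.sort_values('time_vie'), rec.sort_values('time_rec'),
                        left_on='time_vie', right_on='time_rec',
                        by=['person', 'offer_id'], direction='backward')
print(matched)  # 2 rows: (0, 6) and (168, 200)
```

`merge_asof` keeps one row per view, so receipts that were never viewed would have to be added back (for example by left-merging the receipts onto `matched` on person, offer id and `time_rec`) before building the four groups.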

In [28]:
# Create a 'usercategory' column that assigns each received offer to one of the 4 user groups
offer = offer_rec_vie_v_com_v
offer['usercategory'] = None
viewed, completed = offer.time_vie.notna(), offer.time_com.notna()
offer.loc[offer.time_com >= offer.time_vie, 'usercategory'] = 'viewandcomplete'
offer.loc[~viewed & completed, 'usercategory'] = 'completewithoutview'
offer.loc[offer.time_vie > offer.time_com, 'usercategory'] = 'completewithoutview'  # viewed only after completing
offer.loc[viewed & ~completed, 'usercategory'] = 'viewnotcomplete'
offer.loc[~viewed & ~completed, 'usercategory'] = 'noviewnocomplete'


In [29]:
offer.head()

Unnamed: 0,person_rec,time_rec,offer_id_rec,person_vie,time_vie,offer_id_vie,person_com,time_com,offer_id_com,reward_com,usercategory
0,78afa995795e4d85b5d9ceeca43f5fef,0,9b98b8c7a33c4b65b9aebfe6a799e6d9,78afa995795e4d85b5d9ceeca43f5fef,6.0,9b98b8c7a33c4b65b9aebfe6a799e6d9,78afa995795e4d85b5d9ceeca43f5fef,132.0,9b98b8c7a33c4b65b9aebfe6a799e6d9,5.0,viewandcomplete
1,a03223e636434f42ac4c3df47e8bac43,0,0b1e1539f2cc45b7b9fa7c272da2e1d7,a03223e636434f42ac4c3df47e8bac43,6.0,0b1e1539f2cc45b7b9fa7c272da2e1d7,,,,,viewnotcomplete
2,a03223e636434f42ac4c3df47e8bac43,0,0b1e1539f2cc45b7b9fa7c272da2e1d7,a03223e636434f42ac4c3df47e8bac43,624.0,0b1e1539f2cc45b7b9fa7c272da2e1d7,,,,,viewnotcomplete
3,e2127556f4f64592b11af22de27a7932,0,2906b810c7d4411798c6938adc9daaa5,e2127556f4f64592b11af22de27a7932,18.0,2906b810c7d4411798c6938adc9daaa5,,,,,viewnotcomplete
4,8ec6ce2a7e7949b1bf142def7d0e0586,0,fafdcd668e3743c1bb461111dcafc2a4,8ec6ce2a7e7949b1bf142def7d0e0586,12.0,fafdcd668e3743c1bb461111dcafc2a4,,,,,viewnotcomplete


In [30]:
offer.usercategory.value_counts()

viewandcomplete        34227
viewnotcomplete        32795
completewithoutview    15837
noviewnocomplete       10780
Name: usercategory, dtype: int64

In [31]:
# 'memberdays': number of days since the user became a member, counted up to the day the notebook is run
# (so the values shift from run to run)
import datetime
profile['became_member_on_corr']=profile.became_member_on.apply(lambda x:str(x))
profile['became_member_on_corr']=profile.became_member_on_corr.apply(lambda x :datetime.date(int(x[0:4]),int(x[4:6]),int(x[6:])))
today = datetime.date.today()
profile['memberdays']=profile.became_member_on_corr.apply(lambda x:(today-x).days)
profile.head()

|   | age | became_member_on | gender | id | income | became_member_on_corr | memberdays |
|---|---|---|---|---|---|---|---|
| 0 | 118 | 20170212 |  | 68be06ca386d4c31939f3a4f0e3dd783 |  | 2017-02-12 | 764 |
| 1 | 55 | 20170715 | F | 0610b486422d4921ae7d2bf64640c50b | 112000.0 | 2017-07-15 | 611 |
| 2 | 118 | 20180712 |  | 38fe809add3b4fcf9315a9694bb96ff5 |  | 2018-07-12 | 249 |
| 3 | 75 | 20170509 | F | 78afa995795e4d85b5d9ceeca43f5fef | 100000.0 | 2017-05-09 | 678 |
| 4 | 118 | 20170804 |  | a03223e636434f42ac4c3df47e8bac43 |  | 2017-08-04 | 591 |
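Because `memberdays` is measured against `datetime.date.today()`, its values change every time the notebook runs. A reproducible variant parses `became_member_on` with `pd.to_datetime` and measures against a fixed reference date; using the newest sign-up date as that reference is an assumption made for this sketch. Changing the reference only shifts every value by the same constant, so the group comparisons that follow are unaffected.

```python
import pandas as pd

# First three became_member_on values from profile.head()
profile = pd.DataFrame({'became_member_on': [20170212, 20170715, 20180712]})

# Parse the YYYYMMDD integers into datetimes
joined = pd.to_datetime(profile.became_member_on.astype(str), format='%Y%m%d')
reference = joined.max()  # assumption: the newest sign-up date serves as the fixed reference
profile['memberdays'] = (reference - joined).dt.days
print(profile)
```

The resulting gaps of 515 and 362 days match the differences between the notebook's values above (764 - 249 and 611 - 249).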


In [32]:
profile.drop(['became_member_on','became_member_on_corr'],axis=1,inplace=True)
profile.reset_index(inplace=True)
# Drop rows where gender is null; in the head() output above these rows also lack income and show age 118
profile.dropna(subset=['gender'],axis=0,inplace=True)
# Confirm that no null values remain
profile.isnull().sum()

index         0
age           0
gender        0
id            0
income        0
memberdays    0
dtype: int64

In [33]:
#Merge offer_receive/view/complete dataset with profile
offer_p=offer.merge(profile,how='inner',left_on=['person_rec'],right_on=['id'])

We now compare the demographic features of users in the four groups defined above.

In [34]:
offer_p.groupby(['usercategory'])[['age','income','memberdays']].median()

| usercategory | age | income | memberdays |
|---|---|---|---|
| completewithoutview | 57 | 70000.0 | 755 |
| noviewnocomplete | 53 | 56000.0 | 532 |
| viewandcomplete | 56 | 68000.0 | 762 |
| viewnotcomplete | 54 | 59000.0 | 523 |


Comparing the median age, income and membership length across the four groups (note that each row is a received offer, so the same user can appear in more than one group), we see:

Users who completed offers without viewing them are the oldest (median age 57), followed by users who viewed and completed offers (median 56). Users who viewed offers but did not complete them (median 54) and users who neither viewed nor completed them (median 53) are younger. This suggests that older users are more willing to buy.

Users who completed offers without viewing them have the highest income (median 70000), followed by users who viewed and completed offers (median 68000). Users who viewed but did not complete offers (median 59000) and users who neither viewed nor completed them (median 56000) earn less. This suggests that higher-income users are less dependent on offers and more willing to pay.

Users who completed offers without viewing them (median 755 member days) and users who viewed and completed offers (median 762) have been members much longer than the other two groups, whose medians are only about 500 days (523 and 532). This suggests that long-standing members are more loyal and more willing to pay, which makes them good candidates for offers.
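Comparing medians across groups hints at an age effect; a more direct check is the share of received offers that were viewed and completed within each age band. A sketch with `pd.cut` on hypothetical toy records (the bin edges are an arbitrary choice); the same steps could be applied to `offer_p`.

```python
import pandas as pd

# Hypothetical toy records: one row per received offer
df = pd.DataFrame({
    'age': [25, 34, 38, 47, 52, 58, 61, 66, 72, 80],
    'usercategory': ['viewnotcomplete', 'noviewnocomplete', 'viewandcomplete',
                     'viewnotcomplete', 'viewandcomplete', 'completewithoutview',
                     'viewandcomplete', 'viewandcomplete', 'viewnotcomplete', 'viewandcomplete'],
})

# Bin ages into right-inclusive bands, e.g. (17, 35] -> '18-35'
df['age_band'] = pd.cut(df.age, bins=[17, 35, 50, 65, 101],
                        labels=['18-35', '36-50', '51-65', '66+'])
df['viewandcomplete'] = (df.usercategory == 'viewandcomplete').astype(int)

# Share of offers in each band that were viewed and then completed
rate = df.groupby('age_band', observed=True).viewandcomplete.mean()
print(rate)
```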

In [35]:
offer_p.groupby(['gender','usercategory'])[['age','income','memberdays']].median()

| gender | usercategory | age | income | memberdays |
|---|---|---|---|---|
| F | completewithoutview | 59.0 | 74000.0 | 753.0 |
| F | noviewnocomplete | 57.0 | 67000.0 | 566.0 |
| F | viewandcomplete | 58.0 | 72000.0 | 754.0 |
| F | viewnotcomplete | 57.0 | 66000.0 | 534.0 |
| M | completewithoutview | 55.0 | 65000.0 | 757.0 |
| M | noviewnocomplete | 51.0 | 52000.0 | 515.0 |
| M | viewandcomplete | 54.0 | 64000.0 | 771.0 |
| M | viewnotcomplete | 52.0 | 56000.0 | 517.5 |
| O | completewithoutview | 57.0 | 66500.0 | 761.0 |
| O | noviewnocomplete | 51.0 | 48000.0 | 539.0 |


Within every group, female users have a higher median income and a higher median age than male users.

In [36]:
def create_channels(portfolio):
    """Add a 0/1 indicator column for each channel an offer is sent through."""
    for c in ['web', 'email', 'mobile', 'social']:
        portfolio['channels_' + c] = portfolio.channels.apply(lambda chans: int(c in chans))
    return portfolio
portfolio = create_channels(portfolio)

In [37]:
portfolio.drop(['channels'],axis=1,inplace=True)
portfolio.head()

|   | difficulty | duration | id | offer_type | reward | channels_web | channels_email | channels_mobile | channels_social |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 10 | 7 | ae264e3637204a6fb9bb56bc8210ddfd | bogo | 10 | 0 | 1 | 1 | 1 |
| 1 | 10 | 5 | 4d5c57ea9a6940dd891ad53e9dbe8da0 | bogo | 10 | 1 | 1 | 1 | 1 |
| 2 | 0 | 4 | 3f207df678b143eea3cee63160fa8bed | informational | 0 | 1 | 1 | 1 | 0 |
| 3 | 5 | 7 | 9b98b8c7a33c4b65b9aebfe6a799e6d9 | bogo | 5 | 1 | 1 | 1 | 0 |
| 4 | 20 | 10 | 0b1e1539f2cc45b7b9fa7c272da2e1d7 | discount | 5 | 1 | 1 | 0 | 0 |


In [38]:
# Merge the offer receive/view/complete + profile dataset with the portfolio
offer_p_f = offer_p.merge(portfolio,how='inner',left_on=['offer_id_rec'],right_on=['id'])
offer_p_f.head()

Unnamed: 0,person_rec,time_rec,offer_id_rec,person_vie,time_vie,offer_id_vie,person_com,time_com,offer_id_com,reward_com,...,memberdays,difficulty,duration,id_y,offer_type,reward,channels_web,channels_email,channels_mobile,channels_social
0,78afa995795e4d85b5d9ceeca43f5fef,0,9b98b8c7a33c4b65b9aebfe6a799e6d9,78afa995795e4d85b5d9ceeca43f5fef,6.0,9b98b8c7a33c4b65b9aebfe6a799e6d9,78afa995795e4d85b5d9ceeca43f5fef,132.0,9b98b8c7a33c4b65b9aebfe6a799e6d9,5.0,...,678,5,7,9b98b8c7a33c4b65b9aebfe6a799e6d9,bogo,5,1,1,1,0
1,e2127556f4f64592b11af22de27a7932,408,9b98b8c7a33c4b65b9aebfe6a799e6d9,e2127556f4f64592b11af22de27a7932,420.0,9b98b8c7a33c4b65b9aebfe6a799e6d9,e2127556f4f64592b11af22de27a7932,522.0,9b98b8c7a33c4b65b9aebfe6a799e6d9,5.0,...,326,5,7,9b98b8c7a33c4b65b9aebfe6a799e6d9,bogo,5,1,1,1,0
2,389bc3fa690240e798340f5a15918d5c,168,9b98b8c7a33c4b65b9aebfe6a799e6d9,389bc3fa690240e798340f5a15918d5c,192.0,9b98b8c7a33c4b65b9aebfe6a799e6d9,389bc3fa690240e798340f5a15918d5c,498.0,9b98b8c7a33c4b65b9aebfe6a799e6d9,5.0,...,402,5,7,9b98b8c7a33c4b65b9aebfe6a799e6d9,bogo,5,1,1,1,0
3,389bc3fa690240e798340f5a15918d5c,168,9b98b8c7a33c4b65b9aebfe6a799e6d9,389bc3fa690240e798340f5a15918d5c,438.0,9b98b8c7a33c4b65b9aebfe6a799e6d9,389bc3fa690240e798340f5a15918d5c,498.0,9b98b8c7a33c4b65b9aebfe6a799e6d9,5.0,...,402,5,7,9b98b8c7a33c4b65b9aebfe6a799e6d9,bogo,5,1,1,1,0
4,389bc3fa690240e798340f5a15918d5c,408,9b98b8c7a33c4b65b9aebfe6a799e6d9,389bc3fa690240e798340f5a15918d5c,438.0,9b98b8c7a33c4b65b9aebfe6a799e6d9,389bc3fa690240e798340f5a15918d5c,498.0,9b98b8c7a33c4b65b9aebfe6a799e6d9,5.0,...,402,5,7,9b98b8c7a33c4b65b9aebfe6a799e6d9,bogo,5,1,1,1,0


We then compare user demographics within each offer type.

In [39]:
offer_p_f.groupby(['usercategory','offer_type'])[['age','income','memberdays']].median()

| usercategory | offer_type | age | income | memberdays |
|---|---|---|---|---|
| completewithoutview | bogo | 57 | 71000.0 | 751 |
| completewithoutview | discount | 57 | 70000.0 | 759 |
| noviewnocomplete | bogo | 53 | 54000.0 | 518 |
| noviewnocomplete | discount | 51 | 53000.0 | 512 |
| noviewnocomplete | informational | 56 | 61000.0 | 568 |
| viewandcomplete | bogo | 57 | 70000.0 | 756 |
| viewandcomplete | discount | 56 | 67000.0 | 771 |
| viewnotcomplete | bogo | 52 | 55000.0 | 497 |
| viewnotcomplete | discount | 54 | 58000.0 | 443 |
| viewnotcomplete | informational | 55 | 64000.0 | 607 |


Users who completed BOGO and discount offers have longer memberships (medians above 750 days) than users who did not complete them (medians of roughly 440 to 520 days). They are also older (median age 56 or above, versus 51 to 54) and have higher incomes (medians of 67000 or above, versus 53000 to 58000). Informational offers have no completion events, so they need a different treatment.

Next, we examine whether receiving an informational offer influences users' purchases.

In [40]:
# Keep only informational-offer records, in offer_p_f_1
offer_p_f_1 = offer_p_f.loc[(offer_p_f.offer_type=='informational')].copy()
offer_p_f_1.shape

(14464, 26)

In [41]:
# Left-join each informational-offer record to all of that person's transactions
offer_p_f_1=offer_p_f_1.merge(transcript_trans,how='left',left_on=['person_rec'],right_on=['person_trans'])

In [42]:
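# Keep transactions made inside the offer's validity window (time_rec to time_rec + duration*24 hours),
# plus the rows of users who made no transactions at all
in_window = (offer_p_f_1.time_trans >= offer_p_f_1.time_rec) & (offer_p_f_1.time_trans <= offer_p_f_1.time_rec + offer_p_f_1.duration*24)
offer_p_f_1_s = offer_p_f_1.loc[offer_p_f_1.time_trans.isna() | in_window].copy()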
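A caveat on this filter: the left merge creates one row per (offer, transaction) pair, and only users with no transactions at all keep a row with a null `time_trans`. An offer whose recipient bought something, but only outside the validity window, therefore loses all its rows and drops out of the analysis instead of being counted as "not transacted". The sketch below, on hypothetical toy data (`offer_row` is an illustrative offer identifier), flags in-window purchases per offer with `groupby(...).any()` so that every offer is kept.

```python
import pandas as pd

# Hypothetical toy data: two informational offers (duration in days, times in hours)
offers = pd.DataFrame({'offer_row': [0, 1], 'person': ['a', 'b'],
                       'time_rec': [0, 168], 'duration': [3, 4]})
trans = pd.DataFrame({'person': ['a', 'b'], 'time_trans': [30, 600]})  # 'b' buys long after expiry

pairs = offers.merge(trans, on='person', how='left')
in_window = (pairs.time_trans >= pairs.time_rec) & \
            (pairs.time_trans <= pairs.time_rec + pairs.duration * 24)

# Row filtering (as in the cell above) loses offer 1 entirely
naive = pairs[pairs.time_trans.isna() | in_window]
print(len(naive))  # 1

# Aggregating per offer keeps every offer and records whether any in-window purchase happened
offers['trans_in_window'] = in_window.groupby(pairs.offer_row).any().reindex(offers.offer_row).to_numpy()
print(offers)
```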

In [43]:
# Categorize informational-offer records by whether the user viewed the offer before transacting
viewed, transacted = offer_p_f_1_s.time_vie.notna(), offer_p_f_1_s.time_trans.notna()
offer_p_f_1_s.loc[offer_p_f_1_s.time_vie > offer_p_f_1_s.time_trans, 'usercategory'] = 'information_trans_not_view'
offer_p_f_1_s.loc[offer_p_f_1_s.time_vie <= offer_p_f_1_s.time_trans, 'usercategory'] = 'information_trans_and_view'
offer_p_f_1_s.loc[~viewed & transacted, 'usercategory'] = 'information_trans_not_view'
offer_p_f_1_s.loc[~viewed & ~transacted, 'usercategory'] = 'information_not_view_not_trans'
offer_p_f_1_s.loc[viewed & ~transacted, 'usercategory'] = 'information_view_not_trans'

In [44]:
offer_p_f_1_s.usercategory.isnull().sum()

0

In [45]:
offer_p_f_1_s.groupby(['offer_type','usercategory'])[['age','income','memberdays']].median()

| offer_type | usercategory | age | income | memberdays |
|---|---|---|---|---|
| informational | information_not_view_not_trans | 59.0 | 65000.0 | 477.0 |
| informational | information_trans_and_view | 54.0 | 59000.0 | 813.0 |
| informational | information_trans_not_view | 54.0 | 58500.0 | 760.5 |
| informational | information_view_not_trans | 55.0 | 76000.0 | 441.0 |


Among users who received informational offers, those who made a transaction within the offer's validity period (whether or not they had viewed the offer) have longer memberships (medians of at least 760 days) and lower incomes (medians around 59000) than those who made no transaction (median memberships of 441 and 477 days, median incomes of 65000 and above).

### <a class="anchor" id="Classification">Part II: Classification to Predict Completion</a>

In this section we build a classifier that predicts, from user demographics and offer attributes, whether a user who receives an offer will view and then complete it. Users who complete offers without viewing them are not a target group: they are not influenced by offers, so there is no need to send offers to them.

In [46]:
# Informational offers have no completion events, so they are removed before predicting 'viewandcomplete'
print('Informational offers have no completion events: True or False?')
print(offer_p_f.loc[offer_p_f.offer_type=='informational'].shape[0]==offer_p_f.loc[offer_p_f.offer_type=='informational'].person_com.isnull().sum())
# offer_p_f_2 contains only BOGO and discount offers; the model is built on offer_p_f_2
offer_p_f_2 = offer_p_f.loc[(offer_p_f.offer_type!='informational')].copy()
offer_p_f_2.shape

Informational offers have no completion events: True or False?
True


(68057, 26)

In [47]:
offer_p_f_2.columns

Index(['person_rec', 'time_rec', 'offer_id_rec', 'person_vie', 'time_vie',
       'offer_id_vie', 'person_com', 'time_com', 'offer_id_com', 'reward_com',
       'usercategory', 'index', 'age', 'gender', 'id_x', 'income',
       'memberdays', 'difficulty', 'duration', 'id_y', 'offer_type', 'reward',
       'channels_web', 'channels_email', 'channels_mobile', 'channels_social'],
      dtype='object')

In [48]:
offer_p_f_2['viewandcomplete']=0
offer_p_f_2.loc[offer_p_f_2.usercategory=='viewandcomplete','viewandcomplete']=1

In [49]:
# Select the features and the target, then check the features for null values
X= offer_p_f_2[['age', 'gender','income','memberdays', 'difficulty', 'duration','offer_type', 'reward','channels_web', 'channels_email', 'channels_mobile', 'channels_social']].copy()
Y= offer_p_f_2[['viewandcomplete']].copy()
Y = np.array(Y)
Y = Y.reshape(len(Y))
X.isnull().sum()

age                0
gender             0
income             0
memberdays         0
difficulty         0
duration           0
offer_type         0
reward             0
channels_web       0
channels_email     0
channels_mobile    0
channels_social    0
dtype: int64

In [50]:
var_cat = ['offer_type','gender']
for c in var_cat:
    print (c)
    X = pd.concat([X.drop(c,axis=1),pd.get_dummies(X[c],prefix=c,prefix_sep='_',drop_first=True)],axis=1)
X.head()

offer_type
gender


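|   | age | income | memberdays | difficulty | duration | reward | channels_web | channels_email | channels_mobile | channels_social | offer_type_discount | gender_M | gender_O |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 75 | 100000.0 | 678 | 5 | 7 | 5 | 1 | 1 | 1 | 0 | 0 | 0 | 0 |
| 1 | 68 | 70000.0 | 326 | 5 | 7 | 5 | 1 | 1 | 1 | 0 | 0 | 1 | 0 |
| 2 | 65 | 53000.0 | 402 | 5 | 7 | 5 | 1 | 1 | 1 | 0 | 0 | 1 | 0 |
| 3 | 65 | 53000.0 | 402 | 5 | 7 | 5 | 1 | 1 | 1 | 0 | 0 | 1 | 0 |
| 4 | 65 | 53000.0 | 402 | 5 | 7 | 5 | 1 | 1 | 1 | 0 | 0 | 1 | 0 |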
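With `drop_first=True`, `pd.get_dummies` omits the first level of each category in sorted order, so BOGO and female (`F`) become the reference levels: a row with `offer_type_discount = 0`, `gender_M = 0` and `gender_O = 0`, like row 0 above, is a BOGO offer sent to a female user. The toy frame below uses made-up values and only reproduces the encoding loop on three rows.

```python
import pandas as pd

# Made-up rows with the same two categorical columns as X
toy = pd.DataFrame({"offer_type": ["bogo", "discount", "bogo"],
                    "gender": ["F", "M", "O"]})

for c in ["offer_type", "gender"]:
    # drop_first=True removes the first sorted level ("bogo", "F"), which becomes the reference
    dummies = pd.get_dummies(toy[c], prefix=c, prefix_sep="_", drop_first=True)
    toy = pd.concat([toy.drop(c, axis=1), dummies], axis=1)

print(list(toy.columns))  # ['offer_type_discount', 'gender_M', 'gender_O']
```

The first toy row (BOGO, female) is encoded as all zeros, which is why one level per variable can be dropped without losing information.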


In [51]:
# The two classes are roughly balanced (about 48% positive), so accuracy is a reasonable evaluation metric
print ('The dataset has {} rows, {} of them viewed and completed; the ratio of completed is {}'.format(X.shape[0],Y.sum(),Y.sum()/X.shape[0]))

The dataset has 68057 rows, 32772 of them viewed and completed; the ratio of completed is 0.48153753471354893
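Because about 48.2% of the rows are positive, a trivial classifier that always predicts 0 (not viewed and completed) would already be right about 51.8% of the time. This majority-class baseline, computed from the counts printed above, is the floor against which the model accuracies that follow should be read.

```python
# Counts printed by the cell above
n_rows = 68057
n_positive = 32772

positive_ratio = n_positive / n_rows
# Accuracy of always predicting the majority class (0 = not viewed and completed)
majority_baseline_accuracy = 1 - positive_ratio

print(round(positive_ratio, 4), round(majority_baseline_accuracy, 4))  # 0.4815 0.5185
```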


In [52]:
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score,f1_score,precision_score,recall_score

In [53]:
X_train,X_test,y_train,y_test = train_test_split(X,Y,test_size=0.3,random_state=24)

In [54]:
clf = DecisionTreeClassifier(random_state=24)
clf.fit(X_train,y_train)

DecisionTreeClassifier(class_weight=None, criterion='gini', max_depth=None,
            max_features=None, max_leaf_nodes=None,
            min_impurity_decrease=0.0, min_impurity_split=None,
            min_samples_leaf=1, min_samples_split=2,
            min_weight_fraction_leaf=0.0, presort=False, random_state=24,
            splitter='best')

In [55]:
y_pred = clf.predict(X_test)
print ('The accuracy on the test set is {}'.format(accuracy_score(y_test,y_pred)))
print ('The precision score is {}, the recall score is {}, the f1_score is {}'.format(precision_score(y_test,y_pred),recall_score(y_test,y_pred),f1_score(y_test,y_pred)))

The accuracy on the test set is 0.6594181604466647
The precision score is 0.6457960644007156, the recall score is 0.6571601941747572, the f1_score is 0.6514285714285714


The decision tree classifier reaches an accuracy of 65.9% on the test set. Its precision is 64.6%: of the users predicted to view and complete an offer, 64.6% actually do. Its recall is 65.7%: of the users who actually view and complete an offer, the model identifies 65.7%.
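Both metrics come from the confusion matrix: precision = TP / (TP + FP) and recall = TP / (TP + FN). The labels below are made up and only serve to show that these definitions match scikit-learn's `precision_score` and `recall_score`.

```python
from sklearn.metrics import confusion_matrix, precision_score, recall_score

# Made-up labels: 4 real positives, 5 predicted positives
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_hat = [1, 1, 1, 0, 1, 1, 0, 0, 0, 0]

# For binary labels, ravel() returns the counts in the order tn, fp, fn, tp
tn, fp, fn, tp = confusion_matrix(y_true, y_hat).ravel()
precision = tp / (tp + fp)  # share of predicted positives that are real positives
recall = tp / (tp + fn)     # share of real positives that the model finds

print(precision, recall)  # 0.6 0.75
```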

In [56]:
# Tune the number of trees (n_estimators) of a random forest with a 5-fold cross-validated grid search
from sklearn.model_selection import GridSearchCV
clf_rf = RandomForestClassifier(random_state=24)
parameters = {'n_estimators':[20,50,80]}
clf_g = GridSearchCV(clf_rf,parameters,cv=5)
clf_g.fit(X_train,y_train)
print ('The best estimator is {}'.format(clf_g.best_estimator_))
y_pred = clf_g.predict(X_test)

The best estimator is RandomForestClassifier(bootstrap=True, class_weight=None, criterion='gini',
            max_depth=None, max_features='auto', max_leaf_nodes=None,
            min_impurity_decrease=0.0, min_impurity_split=None,
            min_samples_leaf=1, min_samples_split=2,
            min_weight_fraction_leaf=0.0, n_estimators=80, n_jobs=1,
            oob_score=False, random_state=24, verbose=0, warm_start=False)
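`GridSearchCV` fits one model per candidate value on each of the 5 folds, ranks the candidates by their mean fold score, and, with its default `refit=True`, refits the best one on the whole training set. The sketch below spells out the selection step with `cross_val_score`; the data from `make_classification` is an assumed synthetic stand-in for `X_train`/`y_train`. With an integer `cv` and a classifier, both functions use the same stratified folds, so they should pick the same `n_estimators`.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, cross_val_score

# Synthetic stand-in for X_train / y_train (an assumption; the real data is not reproduced here)
X_demo, y_demo = make_classification(n_samples=300, n_features=8, random_state=24)

mean_scores = {}
for n in [20, 50, 80]:
    rf = RandomForestClassifier(n_estimators=n, random_state=24)
    # Mean accuracy over 5 stratified folds, the criterion GridSearchCV(cv=5) ranks by
    mean_scores[n] = cross_val_score(rf, X_demo, y_demo, cv=5).mean()

# First candidate with the highest mean score, matching GridSearchCV's tie-breaking
best_n = max(mean_scores, key=mean_scores.get)

# Cross-check against GridSearchCV on the same data
gs = GridSearchCV(RandomForestClassifier(random_state=24), {"n_estimators": [20, 50, 80]}, cv=5)
gs.fit(X_demo, y_demo)
print(best_n, gs.best_params_["n_estimators"])  # the two values should agree
```

In the notebook the selected value, 80, is the largest one in the grid, so a wider grid (or tuning other parameters such as `max_depth`) might still improve the model.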


In [57]:
print ('The accuracy of the RandomForest classifier on the test set is {}'.format(accuracy_score(y_test,y_pred)))
print ('The precision score is {}, the recall score is {}, the f1_score is {}'.format(precision_score(y_test,y_pred),recall_score(y_test,y_pred),f1_score(y_test,y_pred)))

The accuracy of the RandomForest classifier on the test set is 0.6898814771280243
The precision score is 0.673734610123119, the recall score is 0.6973098705501618, the f1_score is 0.6853195507404831


The tuned random forest predicts whether a user will view and complete an offer with an accuracy of 68.99% on the test set, about three percentage points better than the single decision tree. Its precision is 67.37%: of the users predicted to view and complete an offer, 67.37% actually do. Its recall is 69.73%: of the users who actually view and complete an offer, the model identifies 69.73%.

### <a class="anchor" id="Recommendation">Part III: User-User Based Collaborative Filtering Recommendation</a>

This section builds a recommendation system with a user-user collaborative filtering algorithm. The idea is to find the existing users most similar to a new user and recommend the offers that earned those similar users their largest cumulative rewards. Offers that did not influence the target user, i.e. offers the user received but did not both view and complete, are removed from the recommendation. The same procedure also works for existing users.

In [58]:
profile.head()

|    | index | age | gender | id | income | memberdays |
|---|---|---|---|---|---|---|
| 1 | 1 | 55 | F | 0610b486422d4921ae7d2bf64640c50b | 112000.0 | 611 |
| 3 | 3 | 75 | F | 78afa995795e4d85b5d9ceeca43f5fef | 100000.0 | 678 |
| 5 | 5 | 68 | M | e2127556f4f64592b11af22de27a7932 | 70000.0 | 326 |
| 8 | 8 | 65 | M | 389bc3fa690240e798340f5a15918d5c | 53000.0 | 402 |
| 12 | 12 | 58 | M | 2eeac8d8feae4a8cad5a6af0499a211d | 51000.0 | 492 |


In [59]:
# One-hot encode gender (keeping all three levels) and index the users by id
user_df = profile[['age','gender','id','income','memberdays']].copy()
user_df.reset_index(inplace=True)
user_df.drop('index',axis=1,inplace=True)
user_df = pd.concat([user_df.drop(['gender'],axis=1),pd.get_dummies(user_df['gender'],prefix_sep='_',prefix='gender',drop_first=False)],axis=1)
user_df = user_df.set_index(['id'])
user_df.head()

| id | age | income | memberdays | gender_F | gender_M | gender_O |
|---|---|---|---|---|---|---|
| 0610b486422d4921ae7d2bf64640c50b | 55 | 112000.0 | 611 | 1 | 0 | 0 |
| 78afa995795e4d85b5d9ceeca43f5fef | 75 | 100000.0 | 678 | 1 | 0 | 0 |
| e2127556f4f64592b11af22de27a7932 | 68 | 70000.0 | 326 | 0 | 1 | 0 |
| 389bc3fa690240e798340f5a15918d5c | 65 | 53000.0 | 402 | 0 | 1 | 0 |
| 2eeac8d8feae4a8cad5a6af0499a211d | 58 | 51000.0 | 492 | 0 | 1 | 0 |


In [60]:
# Scale every profile feature to the [0, 1] range with MinMaxScaler
from sklearn.preprocessing import MinMaxScaler
scaler = MinMaxScaler()
scaler.fit(user_df)
user_sc = scaler.transform(user_df)
user_sc

array([[0.44578313, 0.91111111, 0.20625343, 1.        , 0.        ,
        0.        ],
       [0.68674699, 0.77777778, 0.24300603, 1.        , 0.        ,
        0.        ],
       [0.60240964, 0.44444444, 0.04991772, 0.        , 1.        ,
        0.        ],
       ...,
       [0.37349398, 0.47777778, 0.29950631, 0.        , 1.        ,
        0.        ],
       [0.78313253, 0.22222222, 0.47778387, 1.        , 0.        ,
        0.        ],
       [0.53012048, 0.57777778, 0.2024136 , 1.        , 0.        ,
        0.        ]])
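`MinMaxScaler` rescales each column independently as (x - min) / (max - min), so every feature lands in [0, 1] and age, income and membership days carry comparable weight in the Euclidean distance used below. The check below compares the scaler with the formula written out by hand on made-up rows.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Made-up profile rows: age, income, memberdays
toy = np.array([[55, 112000.0, 611],
                [75, 100000.0, 678],
                [18, 30000.0, 0],
                [101, 120000.0, 1823]])

scaled = MinMaxScaler().fit_transform(toy)
# The same transformation written out: (x - column min) / (column max - column min)
manual = (toy - toy.min(axis=0)) / (toy.max(axis=0) - toy.min(axis=0))

print(np.allclose(scaled, manual))  # True
```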

In [61]:
user_df_sc = pd.DataFrame(user_sc,index=user_df.index,columns=['age','income','memberdays','gender_F','gender_M','gender_O'])
user_df_sc.head()

| id | age | income | memberdays | gender_F | gender_M | gender_O |
|---|---|---|---|---|---|---|
| 0610b486422d4921ae7d2bf64640c50b | 0.445783 | 0.911111 | 0.206253 | 1.0 | 0.0 | 0.0 |
| 78afa995795e4d85b5d9ceeca43f5fef | 0.686747 | 0.777778 | 0.243006 | 1.0 | 0.0 | 0.0 |
| e2127556f4f64592b11af22de27a7932 | 0.60241 | 0.444444 | 0.049918 | 0.0 | 1.0 | 0.0 |
| 389bc3fa690240e798340f5a15918d5c | 0.566265 | 0.255556 | 0.091607 | 0.0 | 1.0 | 0.0 |
| 2eeac8d8feae4a8cad5a6af0499a211d | 0.481928 | 0.233333 | 0.140976 | 0.0 | 1.0 | 0.0 |


In [62]:
def compute_euclidean_dist(user1, user2):
    '''
    INPUT
    user1 - str, user id (an index label of user_df_sc)
    user2 - str, user id (an index label of user_df_sc)
    OUTPUT
    the Euclidean distance between the scaled profiles of user1 and user2
    '''
    
    dist = np.linalg.norm(user_df_sc.loc[user1] - user_df_sc.loc[user2])
    
    return dist
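As a worked example, the distance between the first two users of `user_df_sc.head()` can be computed by hand. The vectors are copied from that output, rounded to six decimals, so the result is approximate. Both users are female, so only age, income and membership days contribute to the distance.

```python
import numpy as np

# Scaled profiles copied (rounded) from user_df_sc.head()
u1 = np.array([0.445783, 0.911111, 0.206253, 1.0, 0.0, 0.0])  # 0610b486422d4921ae7d2bf64640c50b
u2 = np.array([0.686747, 0.777778, 0.243006, 1.0, 0.0, 0.0])  # 78afa995795e4d85b5d9ceeca43f5fef

# Euclidean distance: square root of the sum of squared per-feature differences
manual = np.sqrt(np.sum((u1 - u2) ** 2))

print(round(float(manual), 3))  # 0.278
```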

In [63]:
def find_similar_users(user,n_user=5,user_df=user_df):
    '''
    INPUT
    user - str, id of the user for whom we look for similar users
    n_user - int, number of closest rows to keep, counting the user itself
    OUTPUT
    list of the n_user - 1 users closest to user (Euclidean distance on user_df_sc), nearest first
    '''
    user_dist=[]
    topn_similar_user=[]
    user_df1=user_df.copy()
    # Distance from every user to the target user (the target itself gets distance 0)
    for u in user_df.index:
        user_dist.append(compute_euclidean_dist(u,user))
    user_df1[user]=user_dist
    # Keep the n_user closest rows, then drop the target user itself
    topn_similar_user.extend(user_df1.sort_values(by=[user])[0:n_user].index)
    topn_similar_user.remove(user)
    return topn_similar_user
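The loop in `find_similar_users` calls `compute_euclidean_dist` once per user, with two `.loc` lookups each, which is slow on the full profile. The function below, `find_similar_users_vectorized`, is a hypothetical variant that is not part of the original notebook: it computes all distances with a single `np.linalg.norm(..., axis=1)` call and, assuming there are no ties in distance, should return the same users as the loop version.

```python
import numpy as np
import pandas as pd


def find_similar_users_vectorized(user, user_df_sc, n_user=5):
    """Hypothetical vectorized variant of find_similar_users.

    Returns the n_user - 1 users closest to `user` (Euclidean distance on the
    scaled features), nearest first, excluding `user` itself.
    """
    target = user_df_sc.loc[user].to_numpy()
    # One call computes the distance from every row to the target row
    dists = pd.Series(
        np.linalg.norm(user_df_sc.to_numpy() - target, axis=1),
        index=user_df_sc.index,
    )
    # Drop the target itself, then keep the n_user - 1 smallest distances
    return list(dists.drop(user).nsmallest(n_user - 1).index)


# Usage on a made-up scaled frame
toy = pd.DataFrame(
    [[0.10, 0.20], [0.10, 0.25], [0.90, 0.90], [0.13, 0.20]],
    index=["a", "b", "c", "d"],
    columns=["age", "income"],
)
print(find_similar_users_vectorized("a", toy, n_user=3))  # ['d', 'b']
```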
    

In [64]:
# Attach the offer attributes from portfolio to every offer-completion event
reward_df=transcript_offercom.merge(portfolio,how='inner',left_on='offer_id_com',right_on='id')
reward_df.drop(['id'],axis=1,inplace=True)

In [65]:
portfolio

|   | difficulty | duration | id | offer_type | reward | channels_web | channels_email | channels_mobile | channels_social |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 10 | 7 | ae264e3637204a6fb9bb56bc8210ddfd | bogo | 10 | 0 | 1 | 1 | 1 |
| 1 | 10 | 5 | 4d5c57ea9a6940dd891ad53e9dbe8da0 | bogo | 10 | 1 | 1 | 1 | 1 |
| 2 | 0 | 4 | 3f207df678b143eea3cee63160fa8bed | informational | 0 | 1 | 1 | 1 | 0 |
| 3 | 5 | 7 | 9b98b8c7a33c4b65b9aebfe6a799e6d9 | bogo | 5 | 1 | 1 | 1 | 0 |
| 4 | 20 | 10 | 0b1e1539f2cc45b7b9fa7c272da2e1d7 | discount | 5 | 1 | 1 | 0 | 0 |
| 5 | 7 | 7 | 2298d6c36e964ae4a3e7e9706d1fb8c2 | discount | 3 | 1 | 1 | 1 | 1 |
| 6 | 10 | 10 | fafdcd668e3743c1bb461111dcafc2a4 | discount | 2 | 1 | 1 | 1 | 1 |
| 7 | 0 | 3 | 5a8bc65990b245e5a138643cd4eb9837 | informational | 0 | 0 | 1 | 1 | 1 |
| 8 | 5 | 5 | f19421c1d4aa40978ebb69ca19b0e20d | bogo | 5 | 1 | 1 | 1 | 1 |
| 9 | 10 | 7 | 2906b810c7d4411798c6938adc9daaa5 | discount | 2 | 1 | 1 | 1 | 0 |


In [66]:
reward_df.head()
reward_df_g = reward_df.groupby(['person_com','offer_id_com']).reward_com.sum().unstack()
reward_df_g.tail()

| person_com | 0b1e1539f2cc45b7b9fa7c272da2e1d7 | 2298d6c36e964ae4a3e7e9706d1fb8c2 | 2906b810c7d4411798c6938adc9daaa5 | 4d5c57ea9a6940dd891ad53e9dbe8da0 | 9b98b8c7a33c4b65b9aebfe6a799e6d9 | ae264e3637204a6fb9bb56bc8210ddfd | f19421c1d4aa40978ebb69ca19b0e20d | fafdcd668e3743c1bb461111dcafc2a4 |
|---|---|---|---|---|---|---|---|---|
| fff29fb549084123bd046dbc5ceb4faa |  |  |  | 20.0 |  | 20.0 | 5.0 | 2.0 |
| fff3ba4757bd42088c044ca26d73817a |  |  | 2.0 |  | 5.0 |  |  | 2.0 |
| fff7576017104bcc8677a8d63322b5e1 |  |  |  |  | 5.0 |  |  | 4.0 |
| fffad4f4828548d1b5583907f2e9906b |  |  |  |  | 5.0 |  | 10.0 |  |
| ffff82501cea40309d5fdd7edcca4a07 | 5.0 |  | 6.0 |  | 5.0 |  |  | 2.0 |
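`groupby(...).sum().unstack()` turns the long list of completion events into a person × offer matrix of cumulative rewards, with NaN where a user never completed an offer. For example, user `ffff82501cea40309d5fdd7edcca4a07` has 6.0 under `2906b810c7d4411798c6938adc9daaa5`, an offer whose reward is 2 in `portfolio`, which is consistent with three completions. The toy events below are made up to show the reshaping.

```python
import pandas as pd

# Made-up completion events: one row per completed offer, as in reward_df
events = pd.DataFrame({
    "person_com": ["u1", "u1", "u1", "u2"],
    "offer_id_com": ["A", "A", "B", "B"],
    "reward_com": [2.0, 2.0, 5.0, 5.0],
})

# Sum rewards per (person, offer), then pivot the offers into columns;
# (person, offer) pairs with no completion become NaN
matrix = events.groupby(["person_com", "offer_id_com"]).reward_com.sum().unstack()
print(matrix)
```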


In [67]:
# Recommend the offers that earned the similar users their maximum cumulative reward; the number of recommendations can range from 1 to about n_recom
def recommendation(user,n_recom=5):
    '''
    INPUT
    user - str, id of the user we make recommendations for
    n_recom - int, collection stops once at least n_recom candidate offers are found
    OUTPUT
    set of recommended offer ids for the input user
    '''
    similar_users=find_similar_users(user,n_user=5)
    
    offer_recom_set=set()
    for u in similar_users:
        # Skip similar users who never completed an offer: they have no row in reward_df_g
        if u not in reward_df_g.index:
            continue
        user_rewards = reward_df_g.loc[u]
        reward_max = user_rewards.max()
        print('The Similar user {}, has rewardmax {}'.format(u,reward_max))
        # All offers tied at this user's maximum cumulative reward
        best_offers = user_rewards[user_rewards==reward_max].index
        print ('The offer completed with the maximum reward \n',best_offers.values)
        
        offer_recom_set.update(best_offers)
        if len(offer_recom_set)>=n_recom:
            break
    # Remove the offers the user received but did not view and complete: they did not influence this user
    remove_recom = set(offer_p_f.loc[(offer_p_f.person_rec==user) & (offer_p_f.usercategory!='viewandcomplete')].offer_id_rec.unique())
    offer_recom_set=offer_recom_set-remove_recom
    return offer_recom_set
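The selection rule inside `recommendation` can be separated from the notebook's globals so it runs on a small matrix. `recommend_from_neighbors` below is a hypothetical helper, not part of the original code; it assumes the neighbour list and the set of offers to exclude are computed beforehand (for example with `find_similar_users` and the `offer_p_f` filter). The toy matrix shows how a tie at a neighbour's maximum reward contributes several offers at once.

```python
import pandas as pd


def recommend_from_neighbors(reward_matrix, neighbors, exclude, n_recom=5):
    """Hypothetical helper mirroring the selection logic of recommendation().

    reward_matrix: persons x offers, cumulative rewards (NaN where never completed)
    neighbors: similar users, nearest first
    exclude: offers the target user received but did not view and complete
    """
    picked = set()
    for u in neighbors:
        if u not in reward_matrix.index:  # this neighbour never completed an offer
            continue
        rewards = reward_matrix.loc[u]
        # Every offer tied at this neighbour's maximum cumulative reward (NaN never matches)
        picked.update(rewards[rewards == rewards.max()].index)
        if len(picked) >= n_recom:
            break
    return picked - set(exclude)


toy = pd.DataFrame({"A": [10.0, None], "B": [10.0, 5.0], "C": [None, 9.0]},
                   index=["n1", "n2"])
print(sorted(recommend_from_neighbors(toy, ["n1", "n2", "n3"], exclude={"B"})))  # ['A', 'C']
```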


In [68]:
userid='78afa995795e4d85b5d9ceeca43f5fef'

In [69]:
recommendation(userid,5)

The Similar user eb5db7f1468847288b3caa2600d33eeb, has rewardmax 10.0
The offer completed with the maximum reward 
 ['9b98b8c7a33c4b65b9aebfe6a799e6d9']
The Similar user dbb778dbe0924f56a54aa714406bbcd9, has rewardmax 10.0
The offer completed with the maximum reward 
 ['ae264e3637204a6fb9bb56bc8210ddfd' 'f19421c1d4aa40978ebb69ca19b0e20d']
The Similar user d60929ada6ad44058b9be1359e7d2c25, has rewardmax 9.0
The offer completed with the maximum reward 
 ['2298d6c36e964ae4a3e7e9706d1fb8c2']
The Similar user b4e842928968481f988f869561cc0257, has rewardmax 9.0
The offer completed with the maximum reward 
 ['2298d6c36e964ae4a3e7e9706d1fb8c2']


{'2298d6c36e964ae4a3e7e9706d1fb8c2',
 '9b98b8c7a33c4b65b9aebfe6a799e6d9',
 'ae264e3637204a6fb9bb56bc8210ddfd'}

In [70]:
recommended = recommendation(userid,5)
print ('\n The offer recommendation for user {} is \n {}'.format(userid,portfolio.loc[portfolio.id.isin(recommended)]))


The Similar user eb5db7f1468847288b3caa2600d33eeb, has rewardmax 10.0
The offer completed with the maximum reward 
 ['9b98b8c7a33c4b65b9aebfe6a799e6d9']
The Similar user dbb778dbe0924f56a54aa714406bbcd9, has rewardmax 10.0
The offer completed with the maximum reward 
 ['ae264e3637204a6fb9bb56bc8210ddfd' 'f19421c1d4aa40978ebb69ca19b0e20d']
The Similar user d60929ada6ad44058b9be1359e7d2c25, has rewardmax 9.0
The offer completed with the maximum reward 
 ['2298d6c36e964ae4a3e7e9706d1fb8c2']
The Similar user b4e842928968481f988f869561cc0257, has rewardmax 9.0
The offer completed with the maximum reward 
 ['2298d6c36e964ae4a3e7e9706d1fb8c2']

 The offer recommendation for user 78afa995795e4d85b5d9ceeca43f5fef is 
    difficulty  duration                                id offer_type  reward  \
0          10         7  ae264e3637204a6fb9bb56bc8210ddfd       bogo      10   
3           5         7  9b98b8c7a33c4b65b9aebfe6a799e6d9       bogo       5   
5           7         7  2298d6c36e964ae4