## E-COMMERCE RECOMMENDER SYSTEM

## Part 4: Hybrid Model using LightFM

The purpose of this notebook is to solve the 'cold start' issue that our collaborative filtering and content based models have only partialy addressed. 
We will attempt to build a hybrid recommender system using both collaborative and content based approaches. We will leverage the python library LightFM as well as the resource linked below.
https://towardsdatascience.com/how-i-would-explain-building-lightfm-hybrid-recommenders-to-a-5-year-old-b6ee18571309

### Imports

In [14]:
import pandas as pd
import numpy as np
import scipy.sparse as sparse
from lightfm.data import Dataset
from sklearn.preprocessing import LabelEncoder
from lightfm.data import Dataset

In [33]:
from lightfm import LightFM
from lightfm.evaluation import precision_at_k, recall_at_k, auc_score
from lightfm.cross_validation import random_train_test_split

### Loading data

In [3]:
ratings = pd.read_csv('/Users/judith/Data_science_projects/Springboard_AssignmentsJY/capstone_three/data/processed/ratings.csv')

In [4]:
features = pd.read_csv('/Users/judith/Data_science_projects/Springboard_AssignmentsJY/capstone_three/data/processed/features.csv')

In [5]:
ratings.head()

Unnamed: 0,item_id,user_id,rating
0,7443,Alex,4
1,7443,carolyn.agan,3
2,7443,Robyn,4
3,7443,De,4
4,7443,tasha,4


In [6]:
# To ease computation, we will encode user_id from string to integer using LabelEncoder
user_encoder= LabelEncoder()
ratings['user_id'] = user_encoder.fit_transform(ratings['user_id'])

In [7]:
features.head()

Unnamed: 0,size,fit,user_attr,model_attr,category,year,split
0,2.0,Just right,Small,Small,Dresses,2012,0
1,2.0,Just right,Small,Small,Dresses,2012,0
2,2.0,Just right,Small,Small,Dresses,2012,0
3,2.0,Just right,Small,Small,Dresses,2012,0
4,2.0,Just right,Small,Small,Dresses,2012,0


In [8]:
# We are simply adding others information in other to progress in the rest of the notebook
# with a single dataframe instead of 2
features['item_id'] = ratings['item_id']
features['user_id'] = ratings['user_id']
features['rating'] = ratings['rating']

In [9]:
# Checking the final look of the dataframe
features.head()

Unnamed: 0,size,fit,user_attr,model_attr,category,year,split,item_id,user_id,rating
0,2.0,Just right,Small,Small,Dresses,2012,0,7443,309,4
1,2.0,Just right,Small,Small,Dresses,2012,0,7443,13009,3
2,2.0,Just right,Small,Small,Dresses,2012,0,7443,5534,4
3,2.0,Just right,Small,Small,Dresses,2012,0,7443,1716,4
4,2.0,Just right,Small,Small,Dresses,2012,0,7443,42071,4


### Pre-processing

#### Creating the interactions matrix

In [10]:
# Code below create a list containing user, item interactions
records = features[['user_id', 'item_id', 'rating']] .to_records(index = False)
interactions = list(records)

In [11]:
# checking the shape of a sample interaction value
print(interactions[2])

(5534, 7443, 4)


In [12]:
# Creating the unique list of users
users = list(features.user_id.unique())

In [13]:
# Creating the unique list of items
items = list(features.item_id.unique())

#### Creating items features

In [15]:
# Creating features labels
uf = []
col = ['fit']*len(features.fit.unique()) + ['model_attr']*len(features.model_attr.unique()) +['category']*len(features.category.unique())

In [16]:
# creating features list
unique_f1 = list(features.fit.unique()) + list(features.model_attr.unique()) + list(features.category.unique())

In [17]:
# putting all together
for x,y in zip(col, unique_f1):
    res= str(x) + ':' +str(y)
    uf.append(res)

In [18]:
uf

['fit:Just right',
 'fit:Slightly small',
 'fit:Very small',
 'fit:Slightly large',
 'fit:Very large',
 'model_attr:Small',
 'model_attr:Small&Large',
 'category:Dresses',
 'category:Outerwear',
 'category:Bottoms',
 'category:Tops']

In [19]:
# Creating a dataset
dataset = Dataset()
dataset.fit(
    users=users,
    items=items,
    item_features=uf)

In [20]:
# Creating interactions and weights from our interactions table
(interactions, weights) = dataset.build_interactions(interactions)

We will be using only 3 of the features available for the next steps: 
Fit, model_attr and category
These are the features that seems the most meaningful for our model

In [23]:
# function that takes the user features and converts them into 'features:value' format

def feature_colon_value(my_list):
    result = []
    ll =  ['fit:', 'model_attr:', 'category:']
    aa = my_list
    for x, y in zip(ll, aa):
        res = str(x) + '' + str(y)
        result.append(res)
    return result

In [24]:
# applying the feature-colon_value function to our features list
ad_subset = features[['fit', 'model_attr', 'category']]
ad_list = [list(x) for x in ad_subset.values]
feature_list = []
for item in ad_list:
    feature_list.append(feature_colon_value(item))

In [25]:
# creating tuples of item and features
item_tuple = list(zip(features.item_id.unique(), feature_list))

In [26]:
# Extracting item_features
item_features = dataset.build_item_features(item_tuple, normalize = False)

In [30]:
# Splitting interactions into train and test data for evaluation
train, test = random_train_test_split(interactions, test_percentage = 0.2, random_state = None)

### Modelling

In [32]:
model = LightFM(loss = 'warp')
model.fit(interactions = train,
          item_features = item_features,
         epochs = 10)

<lightfm.lightfm.LightFM at 0x1332572d0>

### Evaluation

In [35]:
train_auc = auc_score(model, train, item_features = item_features).mean()
test_auc = auc_score(model, test, item_features = item_features).mean()
train_auc, test_auc

(0.9794243, 0.8554151)

In [54]:
recall_10 = recall_at_k(model, test, train, k= 100, item_features = item_features).mean()

In [55]:
precision_10 = precision_at_k(model, test, train, k=100, item_features = item_features).mean()

In [56]:
recall_10, precision_10

(0.6129159726205952, 0.00910091)

### Conclusions

We were finally able to implement a model combining benefits of both collaborative filtering and content base
AUC score looks great, however, precision and recall are not as performing as our suprise model
Our final recommendation would be to implement both the surprize SVD model along with the lightFM model targeting different used depending if they are new or returning