# Welcome! Let's build a recommendation system together [WIP]👩‍🍳

![Food](https://images.unsplash.com/photo-1505714197102-6ae95091ed70?ixlib=rb-0.3.5&ixid=eyJhcHBfaWQiOjEyMDd9&s=e60943a98136dce6e62c7256bfeca5f8&auto=format&fit=crop&w=1350&q=80)

The goal of this notebook is to apply collaborative filtering to a [restaurant dataset](https://www.kaggle.com/uciml/restaurant-data-with-consumer-ratings) with customer ratings. Collaborative filtering lets us build recommendation systems from the actions users have taken, such as the ratings they give. Recommendation systems are all around us on our favourite services, like Netflix and Amazon.

In this case we will use the Fast.ai library, which implements probabilistic matrix factorization. The two factorized matrices are treated as embedding matrices, which can be modeled by adding embedding layers to the neural net.

*[This is still a work in progress]*




# Table of Contents

* Import dependencies and datasets
* EDA 
* Cleaning
* Create Model
* Analysis of results
* Create a filtering module from scratch

## Import dependencies and datasets

In [None]:

%reload_ext autoreload
%autoreload 2
%matplotlib inline

from fastai.learner import *
from fastai.column_data import *
from sklearn.decomposition import PCA
from plotnine import *
import seaborn as sns
import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)
from plotly.offline import init_notebook_mode, iplot
init_notebook_mode(connected=True)

import os
print(os.listdir("../input"))



Let's make our paths:

In [None]:
path='../input/'
tmp_path='/kaggle/working/tmp/'
models_path='/kaggle/working/models/'

Let's load our datasets:

In [None]:
ratings = pd.read_csv(path+'rating_final.csv')
ratings

In [None]:
ratings.info()

In [None]:
places = pd.read_csv(path+'geoplaces2.csv')
places

In [None]:
# Count the missing ratings (len() would only return the number of rows)
ratings['rating'].isnull().sum()

## EDA 


In [None]:
df = ratings['rating']
sns.countplot(df)
plt.title('Count of ratings given')

The ratings only go as high as 2, which seems odd for a rating scale.

In [None]:
import plotly.graph_objs as go
df = places['country'].value_counts()

iplot([go.Choropleth(
locationmode='country names',
locations=df.index.values,
text=df.index,
z=df.values
)])

From this plot it seems that 100% of the restaurants in the dataset are in Mexico.

In [None]:
sns.countplot(places['country'])
plt.title('Count of countries')

The `places` dataframe needs some cleaning, as both "mexico country" and "?" are listed as countries.

In [None]:
sns.set()
columns = ['rating', 'food_rating','service_rating']
sns.pairplot(ratings[columns],height=5,kind='scatter')
plt.show()

From my observations, the three variables `rating`, `food_rating` and `service_rating` have an odd relationship. If `rating` is 0, the user gave a `service_rating` of 0 too, and the same holds for the other pairs of variables.

In [None]:
fig = (
   ratings.loc[:,['rating', 'food_rating','service_rating']]
).corr()

sns.heatmap(fig, annot=True)

It appears that all the variables in the `ratings` dataframe are strongly positively correlated with each other.
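To make the numbers in the heatmap less of a black box, here is a small sketch of the Pearson correlation that `.corr()` computes by default. The helper name `pearson` and the two rating lists are made up for illustration and are not taken from the dataset.

```python
import numpy as np

def pearson(x, y):
    """Pearson correlation: covariance divided by the product of the spreads."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xc = x - x.mean()  # center both variables
    yc = y - y.mean()
    return (xc * yc).sum() / np.sqrt((xc ** 2).sum() * (yc ** 2).sum())

# Made-up ratings on the 0-2 scale, purely for illustration.
overall = [0, 1, 2, 2, 1, 0, 2]
food = [0, 1, 2, 1, 1, 0, 2]

r = pearson(overall, food)
```

A value close to 1 means the two ratings rise and fall together, which is what the heatmap suggests for our three rating columns.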

In [None]:
len(ratings['placeID'].unique())

We have about 130 different restaurants in the dataset.

In [None]:
len(ratings['userID'].unique())

There appear to be 138 different users who gave reviews.

In [None]:


ratings['userID'].value_counts().head(10).plot.bar(title='Users with the most reviews')


In [None]:
ratings['placeID'].value_counts().head(10).plot.bar(title='Places with most reviews')


The top restaurant has about 35 reviews.

In [None]:
mean = ratings['placeID'].value_counts().mean()
mean

Each restaurant has an average of about 8 ratings.

In [None]:

sns.boxplot(
    x='placeID',
    y='rating',
    data=ratings.head(5)
)

## Cleaning


In [None]:
ratings.isnull().any()

In [None]:
places.isnull().any()

In [None]:
places['country'] = places.country.apply(lambda x: x.replace('?','Mexico'))

In [None]:
places['country'] = places.country.apply(lambda x: x.replace('mexico country','Mexico'))

In [None]:
places['country'] = places.country.apply(lambda x: x.replace('mexico','Mexico'))
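The three replacements above are applied one after the other, so the result depends on their order. As an alternative, here is a sketch of a single normalizing function. The name `normalize_country` and the sample labels are assumptions for illustration, and treating "?" as Mexico follows the choice made in the cells above.

```python
def normalize_country(raw, default="Mexico"):
    """Map messy country labels onto one canonical spelling."""
    cleaned = raw.strip().lower()
    if cleaned in {"?", ""}:
        # Assumption carried over from the cells above: unknown means Mexico.
        return default
    if cleaned.startswith("mexico"):
        return "Mexico"
    return raw.strip()  # leave any other country untouched

# Hypothetical labels, modelled on the messy values seen in the count plot.
samples = ["Mexico", "mexico", "mexico country", "?"]
cleaned = [normalize_country(s) for s in samples]
```

It could be applied with `places.country.apply(normalize_country)` in place of the three separate cells.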

## Create Model

In [None]:
val_idxs = get_cv_idxs(len(ratings))
wd=2e-4
n_factors=50
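`get_cv_idxs` picks the rows that are held out for validation. As a rough idea of what a random hold-out split like this does, here is a sketch in plain NumPy. The function name `holdout_indices`, the 20% fraction and the seed are assumptions for illustration, not Fast.ai's implementation.

```python
import numpy as np

def holdout_indices(n_rows, val_pct=0.2, seed=42):
    """Return a random sample of row positions to hold out for validation."""
    rng = np.random.default_rng(seed)
    n_val = int(n_rows * val_pct)
    # Shuffle all row positions and keep the first n_val of them.
    return rng.permutation(n_rows)[:n_val]

val = holdout_indices(1000)
train = np.setdiff1d(np.arange(1000), val)  # everything not held out
```

Fixing the seed keeps the validation set the same between runs, so losses from different models stay comparable.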

In [None]:
cf = CollabFilterDataset.from_csv(path, 'rating_final.csv','userID','placeID','rating')
learn = cf.get_learner(n_factors, val_idxs, 64, opt_fn=optim.Adam,
                       tmp_name=tmp_path,models_name=models_path)


In [None]:
learn.fit(1e-2,2,wds=wd, cycle_len=1,cycle_mult=2)

In [None]:
math.sqrt(0.536)

In [None]:
learn.fit(1e-2,5,wds=wd, cycle_len=1,cycle_mult=2)

In [None]:
math.sqrt(0.516)

In an attempt to improve the model by training it longer, I caused it to overfit.
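The `math.sqrt` calls above turn a loss into an error in rating units. Assuming 0.536 and 0.516 are mean squared errors read off the training output, the conversion to root mean squared error is just a square root. The helper name `rmse_from_mse` is made up for this sketch.

```python
import math

def rmse_from_mse(mse):
    """Root mean squared error, expressed in the same units as the ratings."""
    return math.sqrt(mse)

first_run = rmse_from_mse(0.536)   # roughly 0.73
second_run = rmse_from_mse(0.516)  # roughly 0.72
```

On a rating scale that tops out at 2, an error of about 0.7 is still large, which is worth keeping in mind when comparing models.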

## Analysis of results

In [None]:
restaurant_names = places.set_index('placeID')['name'].to_dict()
g=ratings.groupby('placeID')['rating'].count()
topRestaurants = g.sort_values(ascending=False).index.values[:3000]
topRestIdx = np.array([cf.item2idx[o] for o in topRestaurants])

In [None]:
m=learn.model; m.cuda()

Let's take a look at the bias for the restaurants:

In [None]:
restaurant_bias = to_np(m.ib(V(topRestIdx)))

In [None]:
restaurant_bias

In [None]:
restaurant_ratings = [(b[0], restaurant_names[i] ) for i,b in zip(topRestaurants,restaurant_bias)]

Worst rated restaurants:

In [None]:
sorted(restaurant_ratings, key=lambda o: o[0])[:15]

Top rated restaurants:

In [None]:
sorted(restaurant_ratings, key=lambda o: o[0], reverse=True)[:15]

### Principal Component Analysis & Embeddings

In [None]:
rest_emb = to_np(m.i(V(topRestIdx)))
rest_emb.shape

In [None]:
pca = PCA(n_components=3)
rest_pca = pca.fit(rest_emb.T).components_

In [None]:
rest_pca.shape
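To see why fitting on the transposed embeddings gives one value per restaurant in each component, here is a sketch of PCA through the singular value decomposition. The toy sizes and random embeddings are assumptions for illustration, and the signs of the components may differ from what scikit-learn returns.

```python
import numpy as np

rng = np.random.default_rng(0)
n_items, n_factors, n_components = 20, 50, 3  # toy sizes for illustration
emb = rng.normal(size=(n_items, n_factors))   # stand-in for the embeddings

# Transposing makes each factor a sample and each restaurant a feature.
data = emb.T                                  # shape (n_factors, n_items)
centered = data - data.mean(axis=0)           # PCA centers every feature

# The rows of vt are the principal directions, ordered by explained variance.
_, _, vt = np.linalg.svd(centered, full_matrices=False)
components = vt[:n_components]                # shape (n_components, n_items)
```

Each row of `components` then holds one number per restaurant, which is what lets us sort restaurants along a component.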

In [None]:
fac0 = rest_pca[0]
rest_comp = [(f,restaurant_names[i]) for f,i in zip(fac0, topRestaurants)]

First component:

In [None]:
sorted(rest_comp, key=itemgetter(0), reverse=True)[:10]

In [None]:
sorted(rest_comp, key=itemgetter(0))[:10]

In [None]:
fac1 = rest_pca[1]
rest_comp= [(f,restaurant_names[i]) for f,i in zip(fac1, topRestaurants)]

Second component:

In [None]:
sorted(rest_comp, key=itemgetter(0), reverse=True)[:10]

In [None]:
sorted(rest_comp, key=itemgetter(0))[:10]

Let's map out our components:

In [None]:
idxs = np.random.choice(len(topRestaurants), 50, replace=False)
X = fac0[idxs]
Y = fac1[idxs]
plt.figure(figsize=(15,15))
plt.scatter(X,Y)
for i, x, y in zip(topRestaurants[idxs], X, Y):
    plt.text(x,y,restaurant_names[i],color=np.random.rand(3)*0.7, fontsize=11)
plt.show()

## Create a filtering module from scratch

**Dot product example**

In [None]:
# Declare two tensors
a = T([[1.,2],
      [3,4]])
b = T([[2.,2],
      [10,10]])

In [None]:
a,b

Let's perform element-wise multiplication:

In [None]:
a*b

In [None]:
# move both tensors to the GPU so the multiplication runs there
a.cuda()*b.cuda()

We need the dot product of matching rows, which is achieved by element-wise multiplication followed by a sum across the columns.


In [None]:
(a*b).sum(1)
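The same computation can be checked on the CPU with NumPy, using the values of `a` and `b` from above. The `einsum` line is an extra cross-check and is not part of the original notebook.

```python
import numpy as np

a = np.array([[1., 2.],
              [3., 4.]])
b = np.array([[2., 2.],
              [10., 10.]])

# Multiply element-wise, then sum each row: one dot product per row.
row_dots = (a * b).sum(axis=1)  # → [6., 70.]

# einsum states the same thing explicitly: sum over j for every row i.
same = np.einsum("ij,ij->i", a, b)
```

The first row gives 1*2 + 2*2 = 6 and the second gives 3*10 + 4*10 = 70.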

Here we are going to build our own neural net module to process inputs and compute activations. A PyTorch module is a class derived from `nn.Module` that defines a function called `forward` to compute the forward pass.

In [None]:
class DotProduct(nn.Module):
    def forward(self, u, m): return (u*m).sum(1)

In [None]:
model=DotProduct()

In [None]:
model(a,b)

We need to remap the user and place IDs to sequential, contiguous integers so they can be used as row indices into the embedding matrices.

In [None]:
unique_users = ratings.userID.unique()
user_to_idx = {o:i for i,o in enumerate(unique_users)}
ratings.userID = ratings.userID.apply(lambda x:user_to_idx[x])

In [None]:
unique_places = ratings.placeID.unique()
place_to_idx = {o:i for i,o in enumerate(unique_places)}
ratings.placeID = ratings.placeID.apply(lambda x:place_to_idx[x])

In [None]:
n_users=int(ratings.userID.nunique())
n_places=int(ratings.placeID.nunique())
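Embedding rows are addressed by position, so every ID has to become an integer from 0 to n-1. Here is a small sketch of the remapping on a handful of made-up IDs, which also keeps an inverse mapping (an addition for illustration) so predictions can be translated back to the original IDs.

```python
# Made-up raw IDs, purely for illustration.
raw_ids = ["U1077", "U1068", "U1077", "U1103", "U1068"]

# Unique IDs in order of first appearance.
unique_ids = list(dict.fromkeys(raw_ids))

id_to_idx = {raw: i for i, raw in enumerate(unique_ids)}
idx_to_id = {i: raw for raw, i in id_to_idx.items()}  # inverse lookup

encoded = [id_to_idx[r] for r in raw_ids]  # → [0, 1, 0, 2, 1]
```

The number of distinct indices is then exactly the number of rows the embedding matrix needs.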

### Creating the module

We will create a module that looks up the factors for the users and places from the embedding matrix and then take the dot product.

In `EmbeddingDot` we create embedding matrices for users and restaurants and then initialize them. The forward pass receives the categorical and continuous variables, although only the categorical ones are used.

In [None]:
class EmbeddingDot(nn.Module):
    def __init__(self, n_users, n_places):
        super().__init__()
        self.u = nn.Embedding(n_users, n_factors)
        self.m = nn.Embedding(n_places, n_factors)
        self.u.weight.data.uniform_(0,0.05)
        self.m.weight.data.uniform_(0,0.05)
        
    def forward(self, cats, conts):
        users,places = cats[:,0],cats[:,1]
        u,m = self.u(users),self.m(places)
        return (u*m).sum(1).view(-1,1)
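An embedding lookup is nothing more than selecting rows from a matrix. Here is a NumPy sketch of what the forward pass computes for a small batch. The sizes and the index pairs are assumptions for illustration, and nothing is trained here.

```python
import numpy as np

rng = np.random.default_rng(0)
n_users, n_places, n_factors = 5, 4, 3  # toy sizes for illustration

# Same initialization range as in the module above.
user_emb = rng.uniform(0, 0.05, size=(n_users, n_factors))
place_emb = rng.uniform(0, 0.05, size=(n_places, n_factors))

# Each row is one (user index, place index) pair, like `cats` in forward().
cats = np.array([[0, 3], [4, 1], [2, 2]])
users, places = cats[:, 0], cats[:, 1]

u, m = user_emb[users], place_emb[places]   # row selection = embedding lookup
preds = (u * m).sum(axis=1).reshape(-1, 1)  # one prediction per pair
```

The final reshape mirrors `.view(-1,1)`, so the predictions line up with a column of target ratings.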

We set up our inputs, where `x` is everything besides the rating and `y` is the rating.



In [None]:
x = ratings.drop(['rating'],axis=1)
y = ratings['rating'].astype(np.float32)

In [None]:
ratings['rating'] = ratings['rating'].astype(float)


We can start setting up our model:

In [None]:
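# userID was already mapped to integer indices above, so convert to str before stripping any 'U' prefix
ratings['userID'] = ratings.userID.apply(lambda x: int(str(x).replace('U','')))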

In [None]:
data = ColumnarModelData.from_data_frame(path,val_idxs, x, y, ['userID','placeID'], 64)

In [None]:
#initialize optimization function
wd=1e-5
model = EmbeddingDot(n_users, n_places).cuda()
opt = optim.SGD(model.parameters(), 1e-1,weight_decay=wd,momentum=0.9)

In [None]:
fit(model, data, 3, opt, F.mse_loss)

We will do learning rate annealing to try and reduce the loss.

In [None]:
set_lrs(opt, 0.01)

In [None]:
fit(model, data, 3, opt, F.mse_loss)

In [None]:
set_lrs(opt, 0.0001)

In [None]:
fit(model, data, 5, opt, F.mse_loss)
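To get a feel for what lowering the learning rate between runs does, here is a toy sketch: plain gradient descent on f(w) = w², using the same rates and epoch counts as the cells above. The function name `minimize` and the starting point are made up, and momentum and weight decay are left out to keep it short.

```python
def minimize(schedule, start=5.0):
    """Gradient descent on f(w) = w**2 with a piecewise-constant learning rate."""
    w = start
    for lr, epochs in schedule:
        for _ in range(epochs):
            grad = 2 * w  # derivative of w**2
            w -= lr * grad
    return w

# (learning rate, epochs) pairs matching the three fit() calls above.
schedule = [(1e-1, 3), (1e-2, 3), (1e-4, 5)]
final_w = minimize(schedule)
```

The early, large steps cover most of the distance, while the later, smaller ones only make fine adjustments. That is the idea behind annealing.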

### Bias
We need a bias term for cases such as a user who consistently gives low scores to restaurants. We will create a new model that takes the bias into account. It also differs in that it uses a convenience function to create the embeddings and rescales the scores returned from the forward pass to the rating range.

In [None]:
min_rating, max_rating =ratings.rating.min(), ratings.rating.max()
min_rating, max_rating

What is going on here?

1. `get_emb` creates an embedding matrix with the given number of rows and factors (columns) and initializes its weights
2. The embedding matrices and bias vectors are created with this helper.
3. We apply a dot product, add our bias vectors and rescale the results to the rating range

In [None]:
#1
def get_emb(ni,nf):
    e = nn.Embedding(ni,nf)
    e.weight.data.uniform_(-0.01,0.01)
    return e

class EmbeddingDotBias(nn.Module):
    def __init__(self,n_users, n_places):
        super().__init__()
        #2
        (self.u, self.m, self.ub, self.mb) = [get_emb(*o) for o in [
            (n_users, n_factors),(n_places, n_factors), (n_users,1),(n_places,1)
        ]]
        
    #3
    def forward(self, cats, conts):
        users,places = cats[:,0],cats[:,1]
        um = (self.u(users)*self.m(places)).sum(1)
        res = um + self.ub(users).squeeze() + self.mb(places).squeeze()
        res= torch.sigmoid(res) * (max_rating-min_rating) + min_rating
        return res.view(-1,1)
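The last step of the forward pass squashes the raw score into the rating range. Here is a NumPy sketch of that rescaling, assuming ratings run from 0 to 2 as seen in the EDA. The helper `scale_to_range` and its `margin` argument are made up for illustration.

```python
import numpy as np

def scale_to_range(raw, lo, hi, margin=0.0):
    """Squash raw scores with a sigmoid, then stretch them to [lo-margin, hi+margin]."""
    sig = 1.0 / (1.0 + np.exp(-raw))
    return sig * (hi - lo + 2 * margin) + lo - margin

raw_scores = np.array([-10.0, 0.0, 10.0])

# Plain rescaling: outputs approach 0 and 2 but never reach them.
scaled = scale_to_range(raw_scores, 0.0, 2.0)

# With a margin of 0.5 the extremes of the scale become reachable.
widened = scale_to_range(raw_scores, 0.0, 2.0, margin=0.5)
```

Because a sigmoid never outputs exactly 0 or 1, the plain version can only approach the lowest and highest rating. The `EmbeddingNet` model further down widens the range by 0.5 on each side, which corresponds to `margin=0.5` here.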

In [None]:
wd=2e-4
model = EmbeddingDotBias(n_users, n_places).cuda()
opt = optim.SGD(model.parameters(), 1e-1,weight_decay=wd,momentum=0.9)


In [None]:
fit(model, data, 3, opt, F.mse_loss)


Our validation loss is dramatically better!

### Mini Neural Net
We are going to feed the embedding values into a linear layer.

In [None]:
class EmbeddingNet(nn.Module):
    def __init__(self, n_users, n_places, nh=10, p1=0.05,p2=0.5):
        super().__init__()
        (self.u, self.m) = [get_emb(*o) for o in [
            (n_users,n_factors), (n_places,n_factors)
        ]]
        self.lin1 = nn.Linear(n_factors*2, nh)
        self.lin2 = nn.Linear(nh,1)
        self.drop1 = nn.Dropout(p1)
        self.drop2 = nn.Dropout(p2)
        
    def forward(self, cats, conts):
        users,places = cats[:,0],cats[:,1]
        x = self.drop1(torch.cat([self.u(users), self.m(places)],dim=1))
        x = self.drop2(F.relu(self.lin1(x)))
        return torch.sigmoid(self.lin2(x)) * (max_rating-min_rating+1) + min_rating-0.5
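To follow the shapes through this network, here is a NumPy sketch of the forward pass with random weights. The batch size and weight scales are assumptions for illustration. Dropout is left out, as it would be at inference time, and the final sigmoid rescaling is omitted.

```python
import numpy as np

rng = np.random.default_rng(0)
batch, n_factors, nh = 4, 50, 10  # toy batch; n_factors and nh as above

# Stand-ins for the looked-up user and place embeddings.
u = rng.normal(size=(batch, n_factors))
m = rng.normal(size=(batch, n_factors))

# Random weights playing the role of lin1 and lin2.
w1, b1 = rng.normal(size=(2 * n_factors, nh)) * 0.1, np.zeros(nh)
w2, b2 = rng.normal(size=(nh, 1)) * 0.1, np.zeros(1)

x = np.concatenate([u, m], axis=1)  # (batch, 2 * n_factors)
h = np.maximum(x @ w1 + b1, 0)      # linear layer + ReLU -> (batch, nh)
out = h @ w2 + b2                   # second linear layer -> (batch, 1)
```

Instead of forcing the interaction to be a dot product, the two embeddings are concatenated and the linear layers learn how to combine them.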
    

In [None]:
wd=1e-5
model=EmbeddingNet(n_users,n_places).cuda()
opt=optim.Adam(model.parameters(), 1e-3,weight_decay=wd)

In [None]:
fit(model, data,3, opt, F.mse_loss)

#### Moving forward

* How do I serve such a model for inference? 
