In [1]:
import numpy as np
import pandas as pd
from sklearn.metrics import pairwise_distances
from sklearn.metrics import mean_squared_error
from sklearn.neighbors import NearestNeighbors
from contextlib import contextmanager
from math import sqrt

# Boceto
* ¿Preferimos user-based porque suponemos que nuestro modelo de negocios es cantidad de usuarios > cantidad de eventos?
    * Si queremos cambiar el enfoque: https://towardsdatascience.com/collaborative-filtering-based-recommendation-systems-exemplified-ecbffe1c20b1
* Lambda --> https://medium.com/@patrickmichelberger/how-to-deploy-a-serverless-machine-learning-microservice-with-aws-lambda-aws-api-gateway-and-d5b8cbead846
* Otro dataset: http://www2.informatik.uni-freiburg.de/~cziegler/BX/
* No tenemos rating sobre eventos. Nuestro modelo puede ser:
    * El valor para Usuario --> Evento deberíamos elegir entre:
        * 1 (participó/colaboró) / 0 (nada)
        * 2 (participó y colaboró) / 1 (participó/colaboró) / 0 (nada)
        * x (cada función que participa o recurso que colabora suma 1) <--> 0 (nada)

##  Collaborative Filtering based Recommendation Systems
Given a user-item ratings matrix M below, where 6 users (rows) have rated 6 items (columns).

In [2]:
#M is user-item ratings matrix where ratings are integers from 1-10
M = np.asarray([[3,7,4,9,9,7], 
                [7,0,5,3,8,8],
               [7,5,5,0,8,4],
               [5,6,8,5,9,8],
               [5,8,8,8,10,9],
               [7,7,0,4,7,8]])
M=pd.DataFrame(M)
M

Unnamed: 0,0,1,2,3,4,5
0,3,7,4,9,9,7
1,7,0,5,3,8,8
2,7,5,5,0,8,4
3,5,6,8,5,9,8
4,5,8,8,8,10,9
5,7,7,0,4,7,8


In [3]:
k=4
metric='cosine' #can be changed to 'correlation' for Pearson correlation similaries

### User-based Recommendation Systems
Firstly, we will have to predict the rating that user 3 (not index) will give to item 4. In user-based CF, we will find say k=3 users who are most similar to user 3. 

Commonly used similarity measures are cosine, Pearson, Euclidean etc. We will use **cosine similarity**:

![alt text](https://cdn-images-1.medium.com/max/1200/1*FjjcEChVVgb8fvUCVUL2mQ.png "Cosine similarity")

And **pearson correlation:**

![alt text](https://cdn-images-1.medium.com/max/1200/1*qCdw27XS0Q9shX4-0pJ96w.png "Pearson correlation")

In terms of models, we will use [NearestNeighbors](http://scikit-learn.org/stable/modules/generated/sklearn.neighbors.NearestNeighbors.html#sklearn.neighbors.NearestNeighbors).

In [4]:
#get cosine similarities for ratings matrix M; pairwise_distances returns the distances between ratings and hence
#similarities are obtained by subtracting distances from 1
cosine_sim = 1-pairwise_distances(M, metric="cosine")

In [5]:
#Cosine similarity matrix
pd.DataFrame(cosine_sim)

Unnamed: 0,0,1,2,3,4,5
0,1.0,0.799268,0.779227,0.934622,0.97389,0.8846
1,0.799268,1.0,0.874744,0.90585,0.866146,0.827036
2,0.779227,0.874744,1.0,0.909513,0.865454,0.853275
3,0.934622,0.90585,0.909513,1.0,0.989344,0.865614
4,0.97389,0.866146,0.865454,0.989344,1.0,0.88164
5,0.8846,0.827036,0.853275,0.865614,0.88164,1.0


In [6]:
#This function finds k similar users given the user_id and ratings matrix M
#Note that the similarities are same as obtained via using pairwise_distances
def findksimilarusers(user_id, ratings, metric = metric, k=k):
    similarities=[]
    indices=[]
    model_knn = NearestNeighbors(metric = metric, algorithm = 'brute') 
    model_knn.fit(ratings)

    distances, indices = model_knn.kneighbors(ratings.iloc[user_id-1, :].values.reshape(1, -1), n_neighbors = k+1)
    similarities = 1-distances.flatten()
    print ('{0} most similar users for User {1}:\n'.format(k,user_id))
    for i in range(0, len(indices.flatten())):
        if indices.flatten()[i]+1 == user_id:
            continue;

        else:
            print ('{0}: User {1}, with similarity of {2}'.format(i, indices.flatten()[i]+1, similarities.flatten()[i]))
            
    return similarities,indices

In [7]:
similarities,indices = findksimilarusers(1,M, metric='cosine')

4 most similar users for User 1:

1: User 5, with similarity of 0.9738899354018394
2: User 4, with similarity of 0.934621684178377
3: User 6, with similarity of 0.8846004572297814
4: User 2, with similarity of 0.7992679780524186


In [8]:
similarities,indices = findksimilarusers(1,M, metric='correlation')

4 most similar users for User 1:

1: User 5, with similarity of 0.7619047619047619
2: User 6, with similarity of 0.2773500981126146
3: User 4, with similarity of 0.20817945092665135
4: User 2, with similarity of -0.13744632051327743


In [9]:
#This function predicts rating for specified user-item combination based on user-based approach
def predict_userbased(user_id, item_id, ratings, metric = metric, k=k):
    prediction=0
    similarities, indices=findksimilarusers(user_id, ratings,metric, k) #similar users based on cosine similarity
    mean_rating = ratings.loc[user_id-1,:].mean() #to adjust for zero based indexing
    sum_wt = np.sum(similarities)-1
    product=1
    wtd_sum = 0 
    
    for i in range(0, len(indices.flatten())):
        if indices.flatten()[i]+1 == user_id:
            continue;
        else: 
            ratings_diff = ratings.iloc[indices.flatten()[i],item_id-1]-np.mean(ratings.iloc[indices.flatten()[i],:])
            product = ratings_diff * (similarities[i])
            wtd_sum = wtd_sum + product
    
    prediction = int(round(mean_rating + (wtd_sum/sum_wt)))
    print ('\nPredicted rating for user {0} -> item {1}: {2}'.format(user_id,item_id,prediction))

    return prediction

In [10]:
predict_userbased(3,4,M)

4 most similar users for User 3:

1: User 4, with similarity of 0.9095126893401909
2: User 2, with similarity of 0.8747444148494656
3: User 5, with similarity of 0.8654538781497916
4: User 6, with similarity of 0.853274963343837

Predicted rating for user 3 -> item 4: 3


3

## Evaluating our model
There are many evaluation metrics for evaluating recommendation systems. However, the most popular and most commonly used is RMSE (root mean squared error). The function evaluateRS uses mean_squared_error function from sklearn to calculate RMSE between predicted ratings and actual ratings, and displays RMSE values for selected approach.

In [11]:
#This is a quick way to temporarily suppress stdout in particular code section
@contextmanager
def suppress_stdout():
    with open(os.devnull, "w") as devnull:
        old_stdout = sys.stdout
        sys.stdout = devnull
        try:  
            yield
        finally:
            sys.stdout = old_stdout

In [12]:
#This is final function to evaluate the performance of selected recommendation approach and the metric used here is RMSE
#suppress_stdout function is used to suppress the print outputs of all the functions inside this function. It will only print 
#RMSE values
def evaluateRS(ratings, approach):
    n_users = ratings.shape[0]
    n_items = ratings.shape[1]
    prediction = np.zeros((n_users, n_items))
    prediction= pd.DataFrame(prediction)           
    if (approach == 0): #  'User-based CF (cosine)'
        metric = 'cosine'
        for i in range(n_users):
            for j in range(n_items):
                prediction[i][j] = predict_userbased(i+1, j+1, ratings, metric)
    elif (approach == 1): # 'User-based CF (correlation)'
        metric = 'correlation'               
        for i in range(n_users):
            for j in range(n_items):
                prediction[i][j] = predict_userbased(i+1, j+1, ratings, metric)
    MSE = mean_squared_error(prediction, ratings)
    RMSE = round(sqrt(MSE),3)
    print ("RMSE using {0} approach is: {1}".format(approach,RMSE))

In [13]:
evaluateRS(M, 0)

4 most similar users for User 1:

1: User 5, with similarity of 0.9738899354018394
2: User 4, with similarity of 0.934621684178377
3: User 6, with similarity of 0.8846004572297814
4: User 2, with similarity of 0.7992679780524186

Predicted rating for user 1 -> item 1: 6
4 most similar users for User 1:

1: User 5, with similarity of 0.9738899354018394
2: User 4, with similarity of 0.934621684178377
3: User 6, with similarity of 0.8846004572297814
4: User 2, with similarity of 0.7992679780524186

Predicted rating for user 1 -> item 2: 6
4 most similar users for User 1:

1: User 5, with similarity of 0.9738899354018394
2: User 4, with similarity of 0.934621684178377
3: User 6, with similarity of 0.8846004572297814
4: User 2, with similarity of 0.7992679780524186

Predicted rating for user 1 -> item 3: 5
4 most similar users for User 1:

1: User 5, with similarity of 0.9738899354018394
2: User 4, with similarity of 0.934621684178377
3: User 6, with similarity of 0.8846004572297814
4: User

Predicted rating for user 6 -> item 4: 4
4 most similar users for User 6:

1: User 1, with similarity of 0.8846004572297814
2: User 5, with similarity of 0.8816402505314891
3: User 4, with similarity of 0.8656136210453812
4: User 3, with similarity of 0.853274963343837

Predicted rating for user 6 -> item 5: 8
4 most similar users for User 6:

1: User 1, with similarity of 0.8846004572297814
2: User 5, with similarity of 0.8816402505314891
3: User 4, with similarity of 0.8656136210453812
4: User 3, with similarity of 0.853274963343837

Predicted rating for user 6 -> item 6: 6
RMSE using 0 approach is: 2.804


In [14]:
evaluateRS(M, 1)

4 most similar users for User 1:

1: User 5, with similarity of 0.7619047619047619
2: User 6, with similarity of 0.2773500981126146
3: User 4, with similarity of 0.20817945092665135
4: User 2, with similarity of -0.13744632051327743

Predicted rating for user 1 -> item 1: 4
4 most similar users for User 1:

1: User 5, with similarity of 0.7619047619047619
2: User 6, with similarity of 0.2773500981126146
3: User 4, with similarity of 0.20817945092665135
4: User 2, with similarity of -0.13744632051327743

Predicted rating for user 1 -> item 2: 7
4 most similar users for User 1:

1: User 5, with similarity of 0.7619047619047619
2: User 6, with similarity of 0.2773500981126146
3: User 4, with similarity of 0.20817945092665135
4: User 2, with similarity of -0.13744632051327743

Predicted rating for user 1 -> item 3: 5
4 most similar users for User 1:

1: User 5, with similarity of 0.7619047619047619
2: User 6, with similarity of 0.2773500981126146
3: User 4, with similarity of 0.20817945092