# Exercise - Collaborative filtering

Implement user-based collaborative filtering to find recommendations for a new user.

In [1]:
import scipy.stats
import numpy

The code below reads the file "movies.csv", which contains ratings in the form (userId, itemId, rating), and processes it. As a result, the variable ratings holds a matrix with users as rows and items as columns. Each cell contains a rating, or 0 if the user has not rated the item.

In [2]:
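def readRatings(path):
    # Each non-empty line has the form "userId,itemId,rating".
    # The with-block closes the file once it has been read.
    with open(path, "r") as file:
        lines = file.read().split("\n")
    return([[int(x) for x in line.split(",")] for line in lines if line != ""])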

def processRatings(path):
    ratings = readRatings(path)
    maxUser = max([item[0] for item in ratings])
    maxItem = max([item[1] for item in ratings])
    ratMatrix = numpy.zeros((maxUser, maxItem))
    for rat in ratings:
        ratMatrix[rat[0]-1, rat[1]-1] = rat[2]
    return(ratMatrix)
ratings = processRatings("movies.csv")
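
To make the matrix layout concrete, here is a minimal sketch that applies the same construction to a few made-up triples instead of "movies.csv". The names toy_triples and toy_matrix are hypothetical and exist only for this illustration.

```python
import numpy

# Hypothetical (userId, itemId, rating) triples; IDs start at 1, as in the input file.
toy_triples = [(1, 1, 5), (1, 3, 2), (2, 2, 4), (3, 1, 1)]

max_user = max(t[0] for t in toy_triples)
max_item = max(t[1] for t in toy_triples)

# Rows are users, columns are items; 0 means "not rated".
toy_matrix = numpy.zeros((max_user, max_item))
for user_id, item_id, rating in toy_triples:
    toy_matrix[user_id - 1, item_id - 1] = rating

print(toy_matrix)
```

User 1 ends up in row 0 with ratings 5 and 2 in columns 0 and 2. Every cell that no triple mentions stays at 0.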

**TODO** implement similarity for a pair of vectors (user ratings). Use the Pearson correlation (scipy.stats.pearsonr).

In [3]:
def similarity(item1, item2):
    return scipy.stats.pearsonr(item1, item2)[0]

In [4]:
similarity(ratings[0],ratings[1])

0.10632192973557714
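
For reference, the Pearson coefficient can also be written out by hand. The sketch below is only an illustration: pearson_manual and the vectors a and b are hypothetical, and the result is cross-checked against numpy.corrcoef. Note that, as in the similarity function above, the whole vectors are used, so a 0 (unrated item) counts as an actual rating of 0.

```python
import numpy

def pearson_manual(x, y):
    # Covariance of the centered vectors divided by the product of their norms.
    x = numpy.asarray(x, dtype=float)
    y = numpy.asarray(y, dtype=float)
    xc = x - x.mean()
    yc = y - y.mean()
    return float((xc * yc).sum() / numpy.sqrt((xc ** 2).sum() * (yc ** 2).sum()))

# Hypothetical rating vectors of two users over five items (0 = not rated).
a = [5, 3, 0, 1, 4]
b = [4, 0, 0, 1, 5]

manual = pearson_manual(a, b)
reference = float(numpy.corrcoef(a, b)[0, 1])
print(manual, reference)
```

Both values should agree up to floating-point error. One possible refinement, not used in this exercise, is to compute the correlation only over items that both users have rated.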

**TODO** implement the weighted average. The ratingsCol parameter contains a column from the ratings matrix (the ratings of all users for one movie). The weights parameter is the array of similarities of users to the current user (non-zero for the k nearest neighbors, zero for the others).

In [5]:
def weightedMean(ratingsCol, weights):
    my_sum = 0
    for r, w in zip(ratingsCol, weights):
        my_sum += r*w
    return my_sum/sum(weights)

weightedMean([1,2,3], [5,2,4])

1.9090909090909092
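
The same computation can be expressed without an explicit loop. The sketch below is one possible vectorized variant (weighted_mean_np is a hypothetical name). It computes the dot product of ratings and weights divided by the sum of the weights, and compares the result with numpy.average.

```python
import numpy

def weighted_mean_np(ratings_col, weights):
    # sum(r_i * w_i) / sum(w_i), the same formula as the loop version.
    return float(numpy.dot(ratings_col, weights) / numpy.sum(weights))

result = weighted_mean_np([1, 2, 3], [5, 2, 4])
builtin = float(numpy.average([1, 2, 3], weights=[5, 2, 4]))
print(result, builtin)
```

Here (1*5 + 2*2 + 3*4) / (5 + 2 + 4) = 21/11, which matches the value printed above. As with the loop version, the result is undefined when the weights sum to zero.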

**TODO** implement user-based collaborative filtering. Use the following steps:
    1. find the similarities of all users to the given user (parameter userId). Remember not to take this user into consideration. The easiest way is to set the similarity of the user to themself to -1.
    2. sort the similarities in descending order
    3. find the weights vector - similarity for the k nearest users, 0 for the others
    4. find predicted ratings for all items that were not already rated by this user
        4.1. call the weightedMean method for all columns with zeros for the given user, using the weights vector computed in step 3
        4.2. sort the predicted values in descending order
    5. return the results as a list of tuples (itemId, predicted rating), sorted in descending order
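
Steps 2 and 3 can be sketched on a toy similarity array. The names toy_sims, toy_k and toy_weights are hypothetical and used only here. The sketch assumes that the similarity of the user to themself has already been set to -1, as in step 1.

```python
import numpy

# Hypothetical similarities of five users to user 0; -1 marks the user themself.
toy_sims = numpy.array([-1.0, 0.2, 0.9, 0.5, 0.1])
toy_k = 2

# argsort returns indices in ascending order, so reverse it for descending order.
order_desc = numpy.argsort(toy_sims)[::-1]
nearest = order_desc[:toy_k]

# Keep the similarity for the k nearest users, 0 for everyone else.
toy_weights = numpy.zeros_like(toy_sims)
toy_weights[nearest] = toy_sims[nearest]

print(nearest, toy_weights)
```

With toy_k = 2 the nearest users are those at indices 2 and 3, so the weights vector is 0 everywhere except 0.9 and 0.5 at those positions.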

In [6]:
from operator import itemgetter
k=10 #number of closest users used for recommendation

def findRecommendationsUserBased(userId, ratingsMatrix):
    
    # array of similarity [1]
    
    sim_array = [None] * len(ratingsMatrix)
    
    for other in range(len(ratingsMatrix)):
        if userId == other:
            sim_array[other] = -1
        else:
            sim_array[other] = similarity(ratingsMatrix[userId], ratingsMatrix[other])
    
    # sort sim_array, receive indexes [2]
    
    sort_index = numpy.argsort(sim_array) # indexes in ascending order of similarity, so the k nearest users are at the end
    
    # find weights vector (numbers only for k-nearest) [3]
    
    weight_vect = [0] * len(ratingsMatrix)
    
    num = sort_index[-k:]
    
    for i in range(len(num)):
        main_index = num[i]
        weight_vect[main_index] = sim_array[main_index]
    
    # find predicted ratings [4]
    
    arr_pred_ratings = []
    
    # for each item
    for i in range(len(ratingsMatrix[userId])):
        if ratingsMatrix[userId][i] == 0.0:
            arr_pred_ratings.append((i, weightedMean(ratingsMatrix[:, i], weight_vect)))
    
    arr_pred_ratings = sorted(arr_pred_ratings, key=lambda tup: tup[1], reverse=True)
    
    
    return arr_pred_ratings

#findRecommendationsUserBased(1, ratings)

The following code fragment prints 10 recommended movies for each of the first 5 users. Notice that the user and movie IDs correspond to the ones from the input file, not to the matrix indices: matrix row/column index = user/movie ID - 1.

In [7]:
usersCount = ratings.shape[0]
for user in range(5):
    recommendations = findRecommendationsUserBased(user, ratings)
    for i in range(10):
        print("User: " + str(user + 1) + ", Item: " + str(recommendations[i][0] + 1) + ", predicted rating: " + str(round(recommendations[i][1], 2)))
    print("")

User: 1, Item: 474, predicted rating: 4.2
User: 1, Item: 655, predicted rating: 3.8
User: 1, Item: 423, predicted rating: 3.7
User: 1, Item: 732, predicted rating: 3.69
User: 1, Item: 318, predicted rating: 3.52
User: 1, Item: 433, predicted rating: 3.38
User: 1, Item: 568, predicted rating: 3.31
User: 1, Item: 403, predicted rating: 3.08
User: 1, Item: 385, predicted rating: 2.99
User: 1, Item: 651, predicted rating: 2.99

User: 2, Item: 9, predicted rating: 2.88
User: 2, Item: 181, predicted rating: 2.69
User: 2, Item: 124, predicted rating: 2.68
User: 2, Item: 744, predicted rating: 2.55
User: 2, Item: 515, predicted rating: 2.38
User: 2, Item: 333, predicted rating: 2.23
User: 2, Item: 750, predicted rating: 2.13
User: 2, Item: 690, predicted rating: 2.05
User: 2, Item: 245, predicted rating: 1.97
User: 2, Item: 628, predicted rating: 1.79

User: 3, Item: 313, predicted rating: 4.41
User: 3, Item: 269, predicted rating: 3.1
User: 3, Item: 315, predicted rating: 2.93
User: 3, Item: 