# Exercice - Collaborative filtering

Implement user-based collaborative filtering to find recommendations for new user.

In [7]:
import scipy.stats
import numpy
import math

The code below reads file "movies.csv" with ratings in the following form (userId, itemId, rating) and processes it. As a result in the variable ratings there is a matrix with users as rows and items as colunms. In cells there are ratings.

In [2]:
def readRatings(path):
    file=open(path, "r")
    lines = file.read().split("\n")
    return([[int(x) for x in line.split(",")] for line in lines if line != ""])

def processRatings(path):
    ratings = readRatings(path)
    maxUser = max([item[0] for item in ratings])
    maxItem = max([item[1] for item in ratings])
    ratMatrix = numpy.zeros((maxUser, maxItem))
    for rat in ratings:
        ratMatrix[rat[0]-1, rat[1]-1] = rat[2]
    return(ratMatrix)
ratings = processRatings("movies.csv")

**TODO** implement similarity for pair of vectors (user ratings). Use Pearson correlation (scipy.stats.pearsonr).

In [8]:
def similarity(item1, item2):
    lPearson = scipy.stats.pearsonr(item1,item2)
    if(math.isnan(lPearson[0])):
        return 0
    return lPearson[0]

In [9]:
similarity(ratings[0],ratings[1])

0.10632192973557746

**TODO** implement weighted average. RatingsCol parameter contains a column from ratings matrix (ratings of all users for one movie). Weights parameter is the array of similarities of users to current user (non-zero for k nearest neighbors, zeros for others).

In [10]:
def weightedMean(ratingsCol, weights):
    lWeightedMean = 0
    lSumWeights = sum(weights)
    if(lSumWeights != 0):
        lWeightedMean = sum(r * w for r, w in zip(ratingsCol, weights)) / lSumWeights
        
    return lWeightedMean

weightedMean([1,2,3], [5,2,4])

1.9090909090909092

**TODO** implement user-based collaborative filtering. Use the following steps:
    1. find similarities for all users fo given user (parameter userId). Remember not to take into consideration this user himself. The easiest form is to set value -1 for similarity of user to himself. 
    2. sort similarities descending
    3. find weights vector - similarity for k nearest users, 0 for others
    4. find predicted ratings for all items, which werent already rated by this user
        4.1. call weightedMean method for all columns with zeros for given user and computed in step 2 weights vector)
        4.2. sort predicted values descending
    5. return results in the form of sorted descending list of tuples (itemId, predicted rating)

In [11]:
k=10 #number of closest users used for recommendation
def findRecommendationsUserBased(userId, raitingsMatrix):
    lSimilarity=[]
        
    # find similarities - setting -1 should make current user last - ignoring for k < allUsers    
    safeK = min(k, raitingsMatrix.shape[0] - 1)
    for i in range(raitingsMatrix.shape[0]):
        if i != userId:
            lSimilarity.append(similarity(raitingsMatrix[userId], raitingsMatrix[i]))
        else:
            lSimilarity.append( -1)
            
    # sort similarities descending
    lSortedIndexes = sorted(range(len(lSimilarity)), key=lambda k: lSimilarity[k], reverse=True)
    lSortedIndexes[safeK:] = [0] * (len(lSortedIndexes) - safeK)    
    
    lWeightsVector = [0] * (len(lSortedIndexes))
    for lIndex in lSortedIndexes:
        if lIndex > 0:
            lWeightsVector[lIndex] = lSimilarity[lIndex]
        else:
            break;
                             
    lPredictedRatings = []
    for i in range(raitingsMatrix.shape[1]):        
        lPredictedRatings.append((i, weightedMean([row[i] for row in raitingsMatrix], lWeightsVector)))
    lPredictedRatings.sort(key=lambda tup: tup[1], reverse=True)   
    
    return lPredictedRatings

The following code fragment prints 10 recommended movies for 10 first users. Notice that the user and movie IDs corespond the ones from input file, not the matrix indices. The matrix row/column index = user/movie ID - 1

In [12]:
usersCount = ratings.shape[0]
for user in range(5):
    recommendations = findRecommendationsUserBased(user, ratings)
    for i in range(10):
        print("User: " + str(user + 1) + ", Item: " + str(recommendations[i][0] + 1) + ", predicted rating: " + str(round(recommendations[i][1], 2)))
    print("")

User: 1, Item: 50, predicted rating: 4.9
User: 1, Item: 174, predicted rating: 4.9
User: 1, Item: 176, predicted rating: 4.6
User: 1, Item: 172, predicted rating: 4.51
User: 1, Item: 173, predicted rating: 4.5
User: 1, Item: 96, predicted rating: 4.49
User: 1, Item: 98, predicted rating: 4.31
User: 1, Item: 195, predicted rating: 4.29
User: 1, Item: 474, predicted rating: 4.2
User: 1, Item: 7, predicted rating: 4.2

User: 2, Item: 100, predicted rating: 3.86
User: 2, Item: 269, predicted rating: 3.79
User: 2, Item: 50, predicted rating: 3.67
User: 2, Item: 127, predicted rating: 3.63
User: 2, Item: 286, predicted rating: 3.56
User: 2, Item: 275, predicted rating: 3.54
User: 2, Item: 313, predicted rating: 3.15
User: 2, Item: 237, predicted rating: 3.1
User: 2, Item: 300, predicted rating: 3.07
User: 2, Item: 9, predicted rating: 2.88

User: 3, Item: 313, predicted rating: 4.41
User: 3, Item: 300, predicted rating: 4.11
User: 3, Item: 272, predicted rating: 3.81
User: 3, Item: 331, pred