## Collaborative filtering exercise

In this activity, we use the cosine similarity in order to find similar users that we can recommend products.

Note about software quality: in this notebook we will use a global variable "perons_ratings" which will hold all the data and allow us to communicate between parts of the program.
Normally it is a bad idea to have a 'global' variable lika that, because it makes things un-testable, creates leaks/interferances and basically bad/unsustainable software. But we will do it here anyway, because it dramatically simplifies the example :) There, you were warned and you will have to forgive me (and yourself).

In [5]:
from sklearn.metrics.pairwise import cosine_similarity
import pandas as pd
import numpy as np
from scipy.spatial.distance import cosine

Finish the function to find similar users. Note that the function for cosine similarity is given above.

In [13]:
def findSimilarUsers(person_number):
    # list for similar users
    similar_users = []
    
    # for all other users
    for other_person in range(0,len(perons_ratings)-1):
        if person_number!= other_person:
            # calculate similarity
            cosine_sim = cosine(perons_ratings[person_number],perons_ratings[other_person])

            # retain other user if similarity threshold is met
            if cosine_sim>minCos:
                similar_users.append(perons_ratings[other_person])
    print("#similar users: "+str(len(similar_users)))
    return similar_users

Finish the function to find new products.

In [14]:
def findNewProducts(similar_users,person_number):
    if len(similar_users)>0:
        # celli stands for the column number of the perons_ratings matrix, i.e., a movie
        for movie_number in range(len(perons_ratings[person_number])-1):
            # if there is no rating for our current user, calculate new score
            if perons_ratings[person_number,movie_number]==0:
                other_scores = 0
                
                # add scores of similar users
                for other in similar_users:
                    other_scores += other[movie_number]
                    
                                    
                # store average score 
                average_score = other_scores/len(similar_users)
#                 print(f"average_score: {average_score} \t {other_scores} / {len(similar_users)} ")

                
                # if the score is greater than a threshold, e.g. 1.3 (on scale from 0 to 5)
                # (it's so low, because most people did not rate most movies)
                if average_score>1.3:
                    print(f'Recommendation for user {person_number} is a movie {movie_number}, score {average_score}')
#     

First, we load the data.

In [15]:
# load data
ratings = pd.read_csv('ratings.csv')

# sample dataset
# be careful, large dataset!
ratings = ratings[:10000]

print(ratings.head())

# print some information
noMovies = len(ratings['movieId'].unique())
noUsers = len(ratings['userId'].unique())
print(str(noMovies)+" from "+str(noUsers)+' users')

   userId  movieId  rating   timestamp
0       1        2     3.5  1112486027
1       1       29     3.5  1112484676
2       1       32     3.5  1112484819
3       1       47     3.5  1112484727
4       1       50     3.5  1112484580
2889 from 91 users


Create an empty perons_ratings matrix

In [16]:
perons_ratings = np.zeros(shape=(noUsers,noMovies))
perons_ratings

array([[0., 0., 0., ..., 0., 0., 0.],
       [0., 0., 0., ..., 0., 0., 0.],
       [0., 0., 0., ..., 0., 0., 0.],
       ...,
       [0., 0., 0., ..., 0., 0., 0.],
       [0., 0., 0., ..., 0., 0., 0.],
       [0., 0., 0., ..., 0., 0., 0.]])

Store movieIds as indices to use in perons_ratings matrix as the current indices don't match the sequential indices that a matrix uses.

In [17]:
movieIds = {}
midi = 0
for value in ratings['movieId'].unique():
    movieIds[value]=midi
    midi = midi + 1

Populate the perons_ratings matrix by looping all the rows in the ratings dataframe

In [18]:
for index, line in ratings.iterrows():
    uid = int(line['userId'])-1
    mid = movieIds[line['movieId']]
    rating = line['rating']
    # store the rating in the perons_ratings matrix at row user id - uid and column movie - mid
    perons_ratings[uid,mid]=rating
    
perons_ratings

array([[3.5, 3.5, 3.5, ..., 0. , 0. , 0. ],
       [0. , 0. , 0. , ..., 0. , 0. , 0. ],
       [0. , 0. , 4. , ..., 0. , 0. , 0. ],
       ...,
       [0. , 0. , 0. , ..., 0. , 0. , 0. ],
       [0. , 0. , 0. , ..., 0. , 0. , 0. ],
       [3.5, 0. , 4. , ..., 4. , 4. , 3. ]])

In [19]:
# minimum cosine similarity
minCos = 0.8

for row_number in range(0,len(perons_ratings)-1):
    print("\nFinding recommendations for user "+str(row_number))
    simmilarUsers = findSimilarUsers(row_number)
    findNewProducts(simmilarUsers,row_number)


Finding recommendations for user 0
#similar users: 79
Recommendation for user 0 is a movie 178, score 1.6708860759493671
Recommendation for user 0 is a movie 182, score 1.740506329113924
Recommendation for user 0 is a movie 369, score 1.9177215189873418
Recommendation for user 0 is a movie 408, score 1.3417721518987342
Recommendation for user 0 is a movie 652, score 1.3987341772151898

Finding recommendations for user 1
#similar users: 88
Recommendation for user 1 is a movie 4, score 1.3636363636363635
Recommendation for user 1 is a movie 11, score 2.085227272727273
Recommendation for user 1 is a movie 12, score 2.1079545454545454
Recommendation for user 1 is a movie 17, score 1.9147727272727273
Recommendation for user 1 is a movie 261, score 1.3238636363636365
Recommendation for user 1 is a movie 369, score 2.028409090909091
Recommendation for user 1 is a movie 406, score 1.3977272727272727
Recommendation for user 1 is a movie 408, score 1.5340909090909092
Recommendation for user 1 i


Finding recommendations for user 13
#similar users: 87
Recommendation for user 13 is a movie 4, score 1.3793103448275863
Recommendation for user 13 is a movie 11, score 2.0689655172413794
Recommendation for user 13 is a movie 12, score 2.028735632183908
Recommendation for user 13 is a movie 16, score 1.4252873563218391
Recommendation for user 13 is a movie 17, score 1.8620689655172413
Recommendation for user 13 is a movie 178, score 1.660919540229885
Recommendation for user 13 is a movie 205, score 1.4655172413793103
Recommendation for user 13 is a movie 406, score 1.3045977011494252
Recommendation for user 13 is a movie 408, score 1.4770114942528736
Recommendation for user 13 is a movie 652, score 1.4827586206896552

Finding recommendations for user 14
#similar users: 66
Recommendation for user 14 is a movie 9, score 1.6590909090909092
Recommendation for user 14 is a movie 30, score 1.5151515151515151
Recommendation for user 14 is a movie 31, score 1.3484848484848484
Recommendation f


Finding recommendations for user 29
#similar users: 89
Recommendation for user 29 is a movie 4, score 1.348314606741573
Recommendation for user 29 is a movie 9, score 1.5730337078651686
Recommendation for user 29 is a movie 11, score 2.061797752808989
Recommendation for user 29 is a movie 12, score 2.0842696629213484
Recommendation for user 29 is a movie 17, score 1.8932584269662922
Recommendation for user 29 is a movie 30, score 1.3651685393258426
Recommendation for user 29 is a movie 178, score 1.702247191011236
Recommendation for user 29 is a movie 182, score 1.8258426966292134
Recommendation for user 29 is a movie 205, score 1.4887640449438202
Recommendation for user 29 is a movie 261, score 1.3089887640449438
Recommendation for user 29 is a movie 369, score 2.00561797752809
Recommendation for user 29 is a movie 406, score 1.3820224719101124
Recommendation for user 29 is a movie 408, score 1.5168539325842696
Recommendation for user 29 is a movie 652, score 1.449438202247191

Findi


Finding recommendations for user 41
#similar users: 85
Recommendation for user 41 is a movie 12, score 2.070588235294118
Recommendation for user 41 is a movie 16, score 1.4941176470588236
Recommendation for user 41 is a movie 178, score 1.723529411764706
Recommendation for user 41 is a movie 182, score 1.8647058823529412
Recommendation for user 41 is a movie 186, score 1.311764705882353
Recommendation for user 41 is a movie 369, score 2.0470588235294116
Recommendation for user 41 is a movie 406, score 1.3941176470588235
Recommendation for user 41 is a movie 408, score 1.4764705882352942
Recommendation for user 41 is a movie 652, score 1.3647058823529412

Finding recommendations for user 42
#similar users: 84
Recommendation for user 42 is a movie 4, score 1.3273809523809523
Recommendation for user 42 is a movie 16, score 1.380952380952381
Recommendation for user 42 is a movie 17, score 1.75
Recommendation for user 42 is a movie 30, score 1.3452380952380953
Recommendation for user 42 is


Finding recommendations for user 56
#similar users: 89
Recommendation for user 56 is a movie 4, score 1.348314606741573
Recommendation for user 56 is a movie 9, score 1.5730337078651686
Recommendation for user 56 is a movie 11, score 2.061797752808989
Recommendation for user 56 is a movie 12, score 2.0842696629213484
Recommendation for user 56 is a movie 16, score 1.4831460674157304
Recommendation for user 56 is a movie 17, score 1.8932584269662922
Recommendation for user 56 is a movie 30, score 1.3651685393258426
Recommendation for user 56 is a movie 178, score 1.702247191011236
Recommendation for user 56 is a movie 205, score 1.4887640449438202
Recommendation for user 56 is a movie 261, score 1.3089887640449438
Recommendation for user 56 is a movie 369, score 2.00561797752809
Recommendation for user 56 is a movie 408, score 1.5168539325842696
Recommendation for user 56 is a movie 652, score 1.449438202247191

Finding recommendations for user 57
#similar users: 61

Finding recommenda


Finding recommendations for user 69
#similar users: 77
Recommendation for user 69 is a movie 9, score 1.4675324675324675
Recommendation for user 69 is a movie 369, score 1.844155844155844

Finding recommendations for user 70
#similar users: 89
Recommendation for user 70 is a movie 4, score 1.348314606741573
Recommendation for user 70 is a movie 9, score 1.5730337078651686
Recommendation for user 70 is a movie 11, score 2.061797752808989
Recommendation for user 70 is a movie 16, score 1.4831460674157304
Recommendation for user 70 is a movie 17, score 1.8932584269662922
Recommendation for user 70 is a movie 30, score 1.3651685393258426
Recommendation for user 70 is a movie 178, score 1.702247191011236
Recommendation for user 70 is a movie 182, score 1.8258426966292134
Recommendation for user 70 is a movie 205, score 1.4887640449438202
Recommendation for user 70 is a movie 261, score 1.3089887640449438
Recommendation for user 70 is a movie 406, score 1.3820224719101124
Recommendation for


Finding recommendations for user 83
#similar users: 69
Recommendation for user 83 is a movie 4, score 1.3043478260869565
Recommendation for user 83 is a movie 9, score 1.7173913043478262
Recommendation for user 83 is a movie 11, score 1.7826086956521738
Recommendation for user 83 is a movie 12, score 1.963768115942029
Recommendation for user 83 is a movie 17, score 1.7536231884057971
Recommendation for user 83 is a movie 30, score 1.565217391304348
Recommendation for user 83 is a movie 31, score 1.3985507246376812
Recommendation for user 83 is a movie 178, score 1.5434782608695652
Recommendation for user 83 is a movie 182, score 1.3623188405797102
Recommendation for user 83 is a movie 186, score 1.4130434782608696
Recommendation for user 83 is a movie 205, score 1.8478260869565217
Recommendation for user 83 is a movie 244, score 1.4492753623188406
Recommendation for user 83 is a movie 261, score 1.644927536231884
Recommendation for user 83 is a movie 302, score 1.3840579710144927
Reco