# Collaborative filtering

Our last recommendation engine is based on collaborative filtering. The assumption is that if two people gave similar ratings to a particular set of movies, their preferences for new, unseen movies will be similar too. We first find similar users, based on their ratings of movies they have already seen. In a second step we recommend movies seen by those similar users.

In a previous notebook we already explained an algorithm that can be used for this type of recommendation engine. In this notebook we will use a comparable algorithm. You don't have to know the implementation of the algorithm, but if you are interested, you can of course take a look at the code in the Python file.

## 1. Import the code library

Let us first import the code library with the functions that implement our collaborative filtering algorithms. The first two functions calculate the similarity score between two users (based on their ratings of previously seen movies). The similarity score tells us how similar two users are. The last two functions implement the recommendation engine itself.

- `euclidean_score` uses the Euclidean distance between two data points to compute the score. The score ranges from 0 to 1. A low score indicates that the users are not similar.
- `pearson_score` uses the Pearson correlation coefficient, a measure of correlation between two objects. The score ranges from -1 to +1. A score of +1 indicates that the objects are very similar, whereas a score of -1 indicates that they are very dissimilar.


- `find_similar_users` will - for a given user - find similar users (based on the pearson_score)
- `get_recommendations` will - for a given user - come up with recommendations (based on movies seen by similar users)
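
To make the two similarity scores more concrete, here is a minimal sketch of how they could be computed. It is not the code from `resources/collaborative_filtering.py`, which may differ (for example in how it treats users with no movies in common). The function names `euclidean_score_sketch` and `pearson_score_sketch` and the mapping `1 / (1 + distance)` are assumptions made for illustration; with these choices the sketch reproduces the values printed in section 2 for David Smith and Bill Duffy.

```python
from math import sqrt


def common_movies(data, user1, user2):
    """Return the movies that both users have rated."""
    return [movie for movie in data[user1] if movie in data[user2]]


def euclidean_score_sketch(data, user1, user2):
    """Map the Euclidean distance over shared movies to a score in (0, 1]."""
    shared = common_movies(data, user1, user2)
    if not shared:
        return 0.0  # assumption: no overlap means "not similar"
    distance = sqrt(sum((data[user1][m] - data[user2][m]) ** 2 for m in shared))
    return 1 / (1 + distance)


def pearson_score_sketch(data, user1, user2):
    """Pearson correlation of the two users' ratings over shared movies."""
    shared = common_movies(data, user1, user2)
    n = len(shared)
    if n == 0:
        return 0.0
    x = [data[user1][m] for m in shared]
    y = [data[user2][m] for m in shared]
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx * syy == 0:
        return 0.0  # assumption: constant ratings carry no correlation signal
    return sxy / sqrt(sxx * syy)


# Two users copied from resources/ratings.json
ratings = {
    "David Smith": {"Vertigo": 4, "Scarface": 4.5, "Raging Bull": 3.0,
                    "Goodfellas": 4.5, "The Apartment": 1.0},
    "Bill Duffy": {"Vertigo": 4.5, "Scarface": 5.0, "Goodfellas": 4.5,
                   "The Apartment": 1.0},
}

print(round(euclidean_score_sketch(ratings, "David Smith", "Bill Duffy"), 3))  # 0.586
print(round(pearson_score_sketch(ratings, "David Smith", "Bill Duffy"), 3))    # 0.991
```

Only movies rated by both users enter either score, so a movie that just one of them has seen (here *Raging Bull*) has no influence.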

In [1]:
import sys
# sys.path entries must be directories, not files; add the notebook's folder so the `resources` package can be found
sys.path.append('.')

from resources.collaborative_filtering import euclidean_score, pearson_score, find_similar_users, get_recommendations

## 2. Computing similarity scores between users

Our movie recommendation system will be based on the data provided in the file *resources/ratings.json*. This file contains a set of people and their ratings for various movies.

```json
{ 
    "David Smith": 
    {
        "Vertigo": 4,
        "Scarface": 4.5,
        "Raging Bull": 3.0,
        "Goodfellas": 4.5,
        "The Apartment": 1.0
    },
    "Brenda Peterson": 
    {
        "Vertigo": 3.0,
        "Scarface": 1.5,
        "Raging Bull": 1.0,
        "Goodfellas": 2.0,
        "The Apartment": 5.0,
        "Roman Holiday": 4.5 
    },
    ...    
}
```

We will read the JSON file and calculate the similarity score between two users using the two different methods. Compare David Smith with Bill Duffy and see what happens. You can check in the file whether the results make sense. Afterwards, try other pairs, such as David Smith and Julie Hammel.
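
One way to check the scores by hand is to list the ratings of two users side by side for the movies they both rated. The helper below is a small sketch for that purpose; the name `side_by_side` is hypothetical and not part of the code library. The ratings are copied from the file.

```python
def side_by_side(data, user1, user2):
    """Return (movie, rating1, rating2) for every movie both users rated."""
    return [
        (movie, data[user1][movie], data[user2][movie])
        for movie in sorted(data[user1])
        if movie in data[user2]
    ]


# Three users copied from resources/ratings.json
ratings = {
    "David Smith": {"Vertigo": 4, "Scarface": 4.5, "Raging Bull": 3.0,
                    "Goodfellas": 4.5, "The Apartment": 1.0},
    "Bill Duffy": {"Vertigo": 4.5, "Scarface": 5.0, "Goodfellas": 4.5,
                   "The Apartment": 1.0},
    "Julie Hammel": {"Scarface": 2.5, "Roman Holiday": 4.5, "Goodfellas": 3.0},
}

for other in ("Bill Duffy", "Julie Hammel"):
    print(f"\nDavid Smith vs {other}")
    for movie, r1, r2 in side_by_side(ratings, "David Smith", other):
        print(f"{movie:15} {r1:>4} {r2:>4}")
```

David Smith and Bill Duffy share four movies with ratings that differ by at most half a point, which is consistent with a high similarity score. David Smith and Julie Hammel share only two movies, so any score for that pair rests on very little evidence.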

In [2]:
ratings_file = 'resources/ratings.json'

import json

with open(ratings_file) as json_file:
    data = json.load(json_file)
print(data)
    
user1 = "David Smith"
user2 = "Bill Duffy"

print("\nEuclidean score:")
print(euclidean_score(data, user1, user2))
    
print("\nPearson score:")
print(pearson_score(data, user1, user2))

{'David Smith': {'Vertigo': 4, 'Scarface': 4.5, 'Raging Bull': 3.0, 'Goodfellas': 4.5, 'The Apartment': 1.0}, 'Brenda Peterson': {'Vertigo': 3.0, 'Scarface': 1.5, 'Raging Bull': 1.0, 'Goodfellas': 2.0, 'The Apartment': 5.0, 'Roman Holiday': 4.5}, 'Bill Duffy': {'Vertigo': 4.5, 'Scarface': 5.0, 'Goodfellas': 4.5, 'The Apartment': 1.0}, 'Samuel Miller': {'Scarface': 3.5, 'Raging Bull': 5.0, 'The Apartment': 1.0, 'Goodfellas': 5.0, 'Roman Holiday': 1.0}, 'Julie Hammel': {'Scarface': 2.5, 'Roman Holiday': 4.5, 'Goodfellas': 3.0}, 'Clarissa Jackson': {'Vertigo': 5.0, 'Scarface': 4.5, 'Raging Bull': 4.0, 'Goodfellas': 2.5, 'The Apartment': 1.0, 'Roman Holiday': 1.5}, 'Adam Cohen': {'Vertigo': 3.5, 'Scarface': 3.0, 'The Apartment': 1.0, 'Goodfellas': 4.5, 'Roman Holiday': 3.0}, 'Chris Duncan': {'The Apartment': 1.5, 'Raging Bull': 4.5}, 'Bill Gates': {'Vertigo': 1.5, 'Scarface': 1.0, 'Goodfellas': 4.5, 'The Apartment': 5.0}}

Euclidean score:
0.585786437626905

Pearson score:
0.99099243041032

## 3. Finding similar users

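The Euclidean and Pearson scores are on different scales, so the two numbers above cannot be compared directly. We will use the *Pearson score* to find the users most similar to a given user, because it measures whether two users' ratings rise and fall together, even when one of them rates consistently higher than the other.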
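
Finding similar users then comes down to scoring every other user against the given one and keeping the highest scores. The sketch below illustrates that idea under the assumption that the Pearson score is computed over shared movies only; `pearson` and `top_similar_users` are hypothetical names, and the real `find_similar_users` may be implemented differently.

```python
from math import sqrt


def pearson(data, user1, user2):
    """Pearson correlation over the movies both users rated (0.0 if undefined)."""
    shared = [m for m in data[user1] if m in data[user2]]
    n = len(shared)
    if n == 0:
        return 0.0
    x = [data[user1][m] for m in shared]
    y = [data[user2][m] for m in shared]
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy) if sxx * syy else 0.0


def top_similar_users(data, user, k):
    """Return the k (name, score) pairs with the highest Pearson score."""
    scores = [(other, pearson(data, user, other)) for other in data if other != user]
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:k]


# Four users copied from resources/ratings.json
ratings = {
    "Adam Cohen": {"Vertigo": 3.5, "Scarface": 3.0, "The Apartment": 1.0,
                   "Goodfellas": 4.5, "Roman Holiday": 3.0},
    "David Smith": {"Vertigo": 4, "Scarface": 4.5, "Raging Bull": 3.0,
                    "Goodfellas": 4.5, "The Apartment": 1.0},
    "Bill Duffy": {"Vertigo": 4.5, "Scarface": 5.0, "Goodfellas": 4.5,
                   "The Apartment": 1.0},
    "Bill Gates": {"Vertigo": 1.5, "Scarface": 1.0, "Goodfellas": 4.5,
                   "The Apartment": 5.0},
}

for name, score in top_similar_users(ratings, "Adam Cohen", 2):
    print(name, round(score, 2))
# David Smith 0.91
# Bill Duffy 0.86
```

Bill Gates ends up last in this subset because his ratings are negatively correlated with Adam Cohen's.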

In [3]:
ratings_file = 'resources/ratings.json'

import json

with open(ratings_file) as json_file:
    data = json.load(json_file)

user = "Adam Cohen"

print('\nUsers similar to ' + user + ':\n')
similar_users = find_similar_users(data, user, 3) 
print('User\t\t\tPearson similarity score')
print('-'*48)
for item in similar_users:
    print(item[0], '\t\t\t', round(float(item[1]), 2))


Users similar to Adam Cohen:

User			Pearson similarity score
------------------------------------------------
David Smith 			 0.91
Bill Duffy 			 0.86
Samuel Miller 			 0.8


## 4. Building a movie recommendation system

Now that we have all the building blocks in place, it's time to build our movie recommendation system. To find movie recommendations for a given user, we find similar users in the dataset and then derive recommendations from the movies those users have rated.
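
A common way to turn similar users into recommendations is a similarity-weighted average: every movie the user has not rated gets the average of the other users' ratings, weighted by how similar each of them is. The sketch below assumes this approach; it is not necessarily what `get_recommendations` does. The name `recommend_sketch` is hypothetical, and the similarity scores are the rounded values printed in section 3 for Adam Cohen.

```python
def recommend_sketch(data, user, similarity):
    """Rank movies `user` has not rated by similarity-weighted average rating.

    `similarity` maps other users' names to their similarity score with `user`.
    """
    weighted_sum = {}
    weight_total = {}
    for other, score in similarity.items():
        if other == user or score <= 0:
            continue  # skip the user themselves and users who are not similar
        for movie, rating in data[other].items():
            if movie in data[user]:
                continue  # only recommend movies the user has not rated yet
            weighted_sum[movie] = weighted_sum.get(movie, 0.0) + score * rating
            weight_total[movie] = weight_total.get(movie, 0.0) + score
    ranked = [(movie, weighted_sum[movie] / weight_total[movie]) for movie in weighted_sum]
    ranked.sort(key=lambda pair: pair[1], reverse=True)
    return ranked


# Four users copied from resources/ratings.json
ratings = {
    "Adam Cohen": {"Vertigo": 3.5, "Scarface": 3.0, "The Apartment": 1.0,
                   "Goodfellas": 4.5, "Roman Holiday": 3.0},
    "David Smith": {"Vertigo": 4, "Scarface": 4.5, "Raging Bull": 3.0,
                    "Goodfellas": 4.5, "The Apartment": 1.0},
    "Bill Duffy": {"Vertigo": 4.5, "Scarface": 5.0, "Goodfellas": 4.5,
                   "The Apartment": 1.0},
    "Samuel Miller": {"Scarface": 3.5, "Raging Bull": 5.0, "The Apartment": 1.0,
                      "Goodfellas": 5.0, "Roman Holiday": 1.0},
}
# Rounded Pearson scores from the output of section 3
scores = {"David Smith": 0.91, "Bill Duffy": 0.86, "Samuel Miller": 0.8}

for movie, predicted in recommend_sketch(ratings, "Adam Cohen", scores):
    print(movie, round(predicted, 2))
# Raging Bull 3.94
```

Adam Cohen has rated every movie in this subset except *Raging Bull*, so that is the only candidate. Its predicted rating lies between David Smith's 3.0 and Samuel Miller's 5.0, slightly closer to David Smith's because his similarity score is higher.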

In [4]:
ratings_file = 'resources/ratings.json'

import json

with open(ratings_file) as json_file:
    data = json.load(json_file)

user = "Chris Duncan"

print("\nMovie recommendations for " + user + ":")
movies = get_recommendations(data, user) 
for i, movie in enumerate(movies):
    print(str(i+1) + '. ' + movie)


Movie recommendations for Chris Duncan:
1. Vertigo
2. Scarface
3. Goodfellas
4. Roman Holiday


Now it's time to build your own recommendation system. Have a look at the task of module 2.