# Recommendation Systems

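Examples: Netflix recommends titles based on what you have watched and on what other users who liked similar movies also liked; Amazon recommends items based on what is in your cart and what you have viewed, including items that go with what you added to your cart.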

### Types of recommendation systems
1. Content-based recommendation system
2. Knowledge-based recommendation system
3. Collaborative filtering
    - User-based collaborative filtering
    - Item-based collaborative filtering
4. Hybrid: collaborative + content-based

## User-Based Nearest Neighbors
Recommended products are those liked by users whose interests are similar to those of the user receiving the recommendation.

1. Pearson correlation: measures how similar two users are, i.e. how strong the linear relationship between two variables (here, the two users' ratings) is. Values range from -1 to 1.
2. Cosine-based similarity: produces better results in item-to-item filtering.
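Both measures can be sketched in plain Python for two users. This is a minimal illustration that assumes each user's ratings are stored in a dict mapping a (hypothetical) item id to a rating. Both similarities are computed only over the items the two users have rated in common.

```python
import math

def co_rated(a, b):
    """Return the item ids rated by both users."""
    return sorted(set(a) & set(b))

def pearson(a, b):
    """Pearson correlation of two users over their co-rated items (range -1 to 1)."""
    items = co_rated(a, b)
    if len(items) < 2:
        return 0.0  # not enough overlap to measure a correlation
    xa = [a[i] for i in items]
    xb = [b[i] for i in items]
    mean_a = sum(xa) / len(xa)
    mean_b = sum(xb) / len(xb)
    num = sum((x - mean_a) * (y - mean_b) for x, y in zip(xa, xb))
    den = (math.sqrt(sum((x - mean_a) ** 2 for x in xa))
           * math.sqrt(sum((y - mean_b) ** 2 for y in xb)))
    return num / den if den else 0.0

def cosine(a, b):
    """Cosine similarity of two users over their co-rated items."""
    items = co_rated(a, b)
    dot = sum(a[i] * b[i] for i in items)
    norm_a = math.sqrt(sum(a[i] ** 2 for i in items))
    norm_b = math.sqrt(sum(b[i] ** 2 for i in items))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

# hypothetical users: item id -> rating
alice = {1: 5, 2: 3, 3: 4, 4: 4}
bob = {1: 3, 2: 1, 3: 2, 4: 3, 5: 3}

print(round(pearson(alice, bob), 3))
print(round(cosine(alice, bob), 3))
```

Here `bob` rates every movie lower than `alice` but orders them in almost the same way. Pearson subtracts each user's mean rating first, so its value (about 0.85) reflects agreement in relative preferences. Plain cosine (about 0.98) is inflated by the fact that all ratings are positive.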


## Item-Based Nearest Neighbors
Items are treated as similar when multiple users have previously given them similar ratings.
This approach uses the similarity between items (not users) to make predictions and recommendations.
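The notes do not give a formula for turning item similarities into a prediction. One common choice, assumed here, is a similarity-weighted average of the user's own ratings on the k rated items most similar to the target item. The NumPy sketch below uses hypothetical toy data. The function name `predict_item_based`, the 0-means-unrated convention and the hand-written similarity values are all illustrative.

```python
import numpy as np

def predict_item_based(ratings, sim, user, item, k=2):
    """Predict ratings[user, item] from the user's ratings on the k rated items most similar to item.

    ratings: users x items array, 0 means "not rated" (assumed convention)
    sim:     items x items similarity array
    """
    rated = np.flatnonzero(ratings[user])        # items this user has rated
    rated = rated[rated != item]                 # never use the target item itself
    if rated.size == 0:
        return 0.0
    # keep the k rated items with the highest similarity to the target item
    top = rated[np.argsort(sim[item, rated])[::-1][:k]]
    weights = sim[item, top]
    if weights.sum() <= 0:
        return float(ratings[user, rated].mean())  # fall back to the user's mean rating
    return float(weights @ ratings[user, top] / weights.sum())

# hypothetical toy data: 3 users x 4 items
ratings = np.array([
    [5, 4, 0, 1],
    [4, 5, 4, 1],
    [1, 1, 2, 5],
], dtype=float)
sim = np.array([
    [1.0, 0.9, 0.7, 0.2],
    [0.9, 1.0, 0.8, 0.3],
    [0.7, 0.8, 1.0, 0.4],
    [0.2, 0.3, 0.4, 1.0],
])

print(predict_item_based(ratings, sim, user=0, item=2))
```

User 0 has not rated item 2. The two rated items most similar to it are item 1 (similarity 0.8, rated 4) and item 0 (similarity 0.7, rated 5). The prediction is therefore (0.8 * 4 + 0.7 * 5) / 1.5, roughly 4.47.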

### Similarity measures
1. Cosine-based similarity: produces better results in item-to-item filtering.
2. Adjusted cosine similarity: takes each user's average rating into account by subtracting it before computing the cosine.
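The NumPy sketch below contrasts plain and adjusted cosine between two item columns. It assumes a small hypothetical users × items array in which every user has rated every item, so no missing-value handling is needed. Each user's mean is taken over all items in the array.

```python
import numpy as np

def cosine(x, y):
    """Plain cosine similarity between two rating vectors."""
    return float(x @ y / (np.linalg.norm(x) * np.linalg.norm(y)))

def adjusted_cosine(R, i, j):
    """Adjusted cosine between item columns i and j of a users x items matrix R.

    Each user's mean rating (over all items in R) is subtracted first, which
    removes differences in how harshly or generously users rate.
    """
    centered = R - R.mean(axis=1, keepdims=True)
    return cosine(centered[:, i], centered[:, j])

# hypothetical 3 users x 3 items, every cell rated (simplifying assumption)
R = np.array([
    [5, 3, 4],
    [4, 2, 3],
    [2, 4, 3],
], dtype=float)

print(round(cosine(R[:, 0], R[:, 1]), 3))    # plain cosine of items 0 and 1
print(round(adjusted_cosine(R, 0, 1), 3))    # adjusted cosine of items 0 and 1
```

With these numbers, plain cosine reports items 0 and 1 as fairly similar (about 0.86) simply because every rating is positive. Adjusted cosine returns -1, because users who rated item 0 above their own average rated item 1 below it, and vice versa.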

If you have more rating information about items than about users, use item-based nearest neighbors.

## 1. Import data

In [1]:
```python
import pandas as pd
import numpy as np
```

In [2]:
```python
# MovieLens 100K files: u.user (users), u.data (ratings, tab-separated), u.item (movies, latin-1 encoded)
users = pd.read_csv('ml-100k/u.user', sep='|', names=['user_id','age','gender','occupation','zip_code'])
ratings = pd.read_csv('ml-100k/u.data', sep='\t', names=['user_id','item_id','rating','timestamp'])
movies = pd.read_csv('ml-100k/u.item', sep='|', names=['movie_id','movie_title','release_date','video_release_date','IMDb URL','unknown','Action','Adventure','Animation','Children','Comedy','Crime','Documentary','Drama','Fantasy','Film-Noir','Horror','Musical','Mystery','Romance','Sci-Fi','Thriller','War','Western'], encoding='latin-1')
# keep only the movie_id and movie_title columns
movies = movies.iloc[:,[0,1]]
```

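In [3]:
```python
users.head()
```

| | user_id | age | gender | occupation | zip_code |
|---|---|---|---|---|---|
| 0 | 1 | 24 | M | technician | 85711 |
| 1 | 2 | 53 | F | other | 94043 |
| 2 | 3 | 23 | M | writer | 32067 |
| 3 | 4 | 24 | M | technician | 43537 |
| 4 | 5 | 33 | F | other | 15213 |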


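In [4]:
```python
# each row links a user_id to an item_id with the rating that user gave it
ratings.head()
```

| | user_id | item_id | rating | timestamp |
|---|---|---|---|---|
| 0 | 196 | 242 | 3 | 881250949 |
| 1 | 186 | 302 | 3 | 891717742 |
| 2 | 22 | 377 | 1 | 878887116 |
| 3 | 244 | 51 | 2 | 880606923 |
| 4 | 166 | 346 | 1 | 886397596 |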


In [5]:
```python
movies.head()
```

| | movie_id | movie_title |
|---|---|---|
| 0 | 1 | Toy Story (1995) |
| 1 | 2 | GoldenEye (1995) |
| 2 | 3 | Four Rooms (1995) |
| 3 | 4 | Get Shorty (1995) |
| 4 | 5 | Copycat (1995) |


## 2. Build an association matrix from the ratings DataFrame (it looks similar to the table in the PowerPoint slides)

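In [6]:
```python
# pivot so each row is an item (movie) and each column is a user;
# reset_index(drop=True) replaces the item_id index with a 0-based row position,
# which lines up with the row positions of the movies DataFrame
learningMatrix = ratings.pivot_table(index=['item_id'], columns=['user_id'], values=['rating']).reset_index(drop=True)
learningMatrix
# the index is now the 0-based row position; the columns are a MultiIndex of ('rating', user_id)
```

| | rating | rating | rating | rating | rating | rating | rating | rating | rating | rating | rating | rating | rating | rating | rating | rating | rating | rating | rating | rating | rating |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| user_id | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | ... | 934 | 935 | 936 | 937 | 938 | 939 | 940 | 941 | 942 | 943 |
| 0 | 5.0 | 4.0 | NaN | NaN | 4.0 | 4.0 | NaN | NaN | NaN | 4.0 | ... | 2.0 | 3.0 | 4.0 | NaN | 4.0 | NaN | NaN | 5.0 | NaN | NaN |
| 1 | 3.0 | NaN | NaN | NaN | 3.0 | NaN | NaN | NaN | NaN | NaN | ... | 4.0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 5.0 |
| 2 | 4.0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | NaN | NaN | 4.0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 3 | 3.0 | NaN | NaN | NaN | NaN | NaN | 5.0 | NaN | NaN | 4.0 | ... | 5.0 | NaN | NaN | NaN | NaN | NaN | 2.0 | NaN | NaN | NaN |
| 4 | 3.0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 1677 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 1678 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 1679 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 1680 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |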


### Note: when building an item-based model, item_id is the index; when building a user-based model, user_id should be the index
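The difference in layout can be seen on a tiny hypothetical ratings table that uses the same column names as `ratings`. Pivoting with `index='user_id'` gives the user-based layout, with one row per user. Its transpose gives the item-based layout used in this notebook. Cosine similarity between rows then compares users instead of movies. Passing `values='rating'` as a string, rather than the list `['rating']` used above, keeps the columns as a single level.

```python
import numpy as np
import pandas as pd
from sklearn.metrics import pairwise_distances

# hypothetical ratings with the same column names as the MovieLens ratings DataFrame
ratings = pd.DataFrame({
    'user_id': [1, 1, 2, 2],
    'item_id': [10, 20, 10, 30],
    'rating':  [5, 3, 4, 2],
})

# user-based model: users are the index (rows), items are the columns
user_matrix = ratings.pivot_table(index='user_id', columns='item_id', values='rating').fillna(0)

# item-based model: items are the index, i.e. the transpose of the user-based matrix
item_matrix = user_matrix.T

# similarity between users (rows of user_matrix)
user_similarity = 1 - pairwise_distances(user_matrix, metric='cosine')

print(user_matrix.shape, item_matrix.shape)   # (2, 3) (3, 2)
print(round(user_similarity[0, 1], 3))
```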

The matrix has 1682 rows (movies) and 943 columns (users). Most cells are null because each user has watched and rated only a small fraction of the movies, so the null values are expected.
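One way to quantify this sparsity is the fraction of empty cells in the pivoted matrix before `fillna` is applied. The sketch below uses a small hypothetical ratings table with the same columns as `ratings`. Running the same calculation on `learningMatrix` before step 3 would give the figure for this dataset.

```python
import pandas as pd

# hypothetical ratings table with the same columns as the MovieLens ratings DataFrame
ratings = pd.DataFrame({
    'user_id': [1, 1, 2, 3, 3, 3],
    'item_id': [1, 2, 2, 1, 3, 4],
    'rating':  [5, 3, 4, 2, 4, 1],
})

# item-based layout: one row per item, one column per user, NaN where no rating exists
matrix = ratings.pivot_table(index='item_id', columns='user_id', values='rating')

n_cells = matrix.size                        # items x users
n_missing = int(matrix.isna().sum().sum())   # cells with no rating
print(f"{matrix.shape[0]} items x {matrix.shape[1]} users, "
      f"{n_missing / n_cells:.0%} of cells are empty")
```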

## 3. Treat the null values in the association matrix

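In [7]:
```python
# unrated cells become 0 so the distance functions can run; here 0 means "not rated", not a low rating
learningMatrix.fillna(0, inplace=True)
learningMatrix.head(20)
```

| | rating | rating | rating | rating | rating | rating | rating | rating | rating | rating | rating | rating | rating | rating | rating | rating | rating | rating | rating | rating | rating |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| user_id | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | ... | 934 | 935 | 936 | 937 | 938 | 939 | 940 | 941 | 942 | 943 |
| 0 | 5.0 | 4.0 | 0.0 | 0.0 | 4.0 | 4.0 | 0.0 | 0.0 | 0.0 | 4.0 | ... | 2.0 | 3.0 | 4.0 | 0.0 | 4.0 | 0.0 | 0.0 | 5.0 | 0.0 | 0.0 |
| 1 | 3.0 | 0.0 | 0.0 | 0.0 | 3.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 4.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 5.0 |
| 2 | 4.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 4.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 3 | 3.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 5.0 | 0.0 | 0.0 | 4.0 | ... | 5.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 2.0 | 0.0 | 0.0 | 0.0 |
| 4 | 3.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 5 | 5.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 5.0 | 0.0 | ... | 0.0 | 0.0 | 5.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 6 | 4.0 | 0.0 | 0.0 | 0.0 | 0.0 | 2.0 | 5.0 | 3.0 | 4.0 | 4.0 | ... | 0.0 | 0.0 | 4.0 | 0.0 | 4.0 | 0.0 | 4.0 | 4.0 | 0.0 | 0.0 |
| 7 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 4.0 | 5.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 5.0 | 0.0 | 0.0 | 0.0 |
| 8 | 5.0 | 0.0 | 0.0 | 0.0 | 0.0 | 4.0 | 5.0 | 0.0 | 0.0 | 4.0 | ... | 0.0 | 1.0 | 4.0 | 5.0 | 3.0 | 5.0 | 3.0 | 0.0 | 0.0 | 3.0 |
| 9 | 3.0 | 2.0 | 0.0 | 0.0 | 0.0 | 0.0 | 4.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |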


## 4. Find the similarity between the items and build a movie similarity matrix

In [8]:
```python
from sklearn.metrics import pairwise_distances

# pairwise_distances computes the distance between every pair of rows (here, every pair of movies).
# A small distance means the two movies are similar; a large distance means they are not.
```

In [9]:
```python
# cosine distance = 1 - cosine similarity, so 1 - distance turns it back into a similarity;
# because all ratings are non-negative, every value lies between 0 and 1
movie_similarity = 1 - pairwise_distances(learningMatrix, metric='cosine')

# set the diagonal (each movie's similarity to itself, always 1) to 0
# so a movie is not reported as its own best match
np.fill_diagonal(movie_similarity, 0)
```
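To check what `1 - pairwise_distances(..., metric='cosine')` returns, the sketch below compares it with scikit-learn's `cosine_similarity` and with a manual NumPy computation. It uses a small hypothetical item × user matrix in which 0 means "not rated", as after the `fillna(0)` step.

```python
import numpy as np
from sklearn.metrics import pairwise_distances
from sklearn.metrics.pairwise import cosine_similarity

# hypothetical 3 items x 4 users rating matrix, 0 = not rated
M = np.array([
    [5, 4, 0, 1],
    [4, 5, 0, 0],
    [0, 1, 5, 4],
], dtype=float)

sim_from_distance = 1 - pairwise_distances(M, metric='cosine')
sim_direct = cosine_similarity(M)

# manual cosine: scale each row to unit length, then take all pairwise dot products
unit = M / np.linalg.norm(M, axis=1, keepdims=True)
sim_manual = unit @ unit.T

print(np.allclose(sim_from_distance, sim_direct), np.allclose(sim_direct, sim_manual))  # True True

# zero the diagonal of a copy, as the notebook does, so an item is never its own best match
no_self = sim_direct.copy()
np.fill_diagonal(no_self, 0)
print(no_self.argmax(axis=1))   # most similar other item for each item: [1 0 0]
```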


## 5. Determine the similarity matrix

In [10]:
```python
# wrap the similarity array in a DataFrame; despite its name, rating_matrix holds
# movie-to-movie similarities (rows and columns follow the row order of learningMatrix)
rating_matrix = pd.DataFrame(movie_similarity)
rating_matrix.head(10)
```

| | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | ... | 1672 | 1673 | 1674 | 1675 | 1676 | 1677 | 1678 | 1679 | 1680 | 1681 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.0 | 0.402382 | 0.330245 | 0.454938 | 0.286714 | 0.116344 | 0.620979 | 0.481114 | 0.496288 | 0.273935 | ... | 0.035387 | 0.0 | 0.0 | 0.0 | 0.035387 | 0.0 | 0.0 | 0.0 | 0.047183 | 0.047183 |
| 1 | 0.402382 | 0.0 | 0.273069 | 0.502571 | 0.318836 | 0.083563 | 0.383403 | 0.337002 | 0.255252 | 0.171082 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.078299 | 0.078299 |
| 2 | 0.330245 | 0.273069 | 0.0 | 0.324866 | 0.212957 | 0.106722 | 0.372921 | 0.200794 | 0.273669 | 0.158104 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.032292 | 0.0 | 0.0 | 0.0 | 0.0 | 0.096875 |
| 3 | 0.454938 | 0.502571 | 0.324866 | 0.0 | 0.334239 | 0.090308 | 0.489283 | 0.490236 | 0.419044 | 0.252561 | ... | 0.0 | 0.0 | 0.094022 | 0.094022 | 0.037609 | 0.0 | 0.0 | 0.0 | 0.056413 | 0.075218 |
| 4 | 0.286714 | 0.318836 | 0.212957 | 0.334239 | 0.0 | 0.037299 | 0.334769 | 0.259161 | 0.272448 | 0.055453 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.094211 |
| 5 | 0.116344 | 0.083563 | 0.106722 | 0.090308 | 0.037299 | 0.0 | 0.139617 | 0.083876 | 0.151064 | 0.203097 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 6 | 0.620979 | 0.383403 | 0.372921 | 0.489283 | 0.334769 | 0.139617 | 0.0 | 0.423515 | 0.527462 | 0.318623 | ... | 0.0 | 0.051498 | 0.0 | 0.0 | 0.051498 | 0.0 | 0.0 | 0.0 | 0.051498 | 0.051498 |
| 7 | 0.481114 | 0.337002 | 0.200794 | 0.490236 | 0.259161 | 0.083876 | 0.423515 | 0.0 | 0.424429 | 0.267764 | ... | 0.0 | 0.082033 | 0.065627 | 0.065627 | 0.082033 | 0.0 | 0.0 | 0.0 | 0.082033 | 0.0 |
| 8 | 0.496288 | 0.255252 | 0.273669 | 0.419044 | 0.272448 | 0.151064 | 0.527462 | 0.424429 | 0.0 | 0.288514 | ... | 0.0 | 0.0 | 0.05736 | 0.05736 | 0.0717 | 0.0 | 0.0 | 0.0 | 0.05736 | 0.0717 |
| 9 | 0.273935 | 0.171082 | 0.158104 | 0.252561 | 0.055453 | 0.203097 | 0.318623 | 0.267764 | 0.288514 | 0.0 | ... | 0.0 | 0.0 | 0.080264 | 0.080264 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |


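In [14]:
```python
try:
    user_inp = input('Enter the reference movie title based on which recommendations are to be made: ')
    # row position of the chosen title in movies (empty list if the title is not found)
    inp = movies[movies['movie_title'] == user_inp].index.tolist()
    inp = inp[0]  # raises IndexError when the title does not exist

    # row inp of the similarity matrix holds this movie's similarity to every other movie
    movies['similarity'] = rating_matrix.iloc[inp]
    # note: the diagonal was set to 0, so the reference movie sorts to the bottom and
    # [1:10] skips the single most similar movie; .head(10) would keep it
    print("Recommended movies based on your choice of ", user_inp, ": \n", movies.sort_values(["similarity"], ascending=False)[1:10])

except IndexError:
    print("The movie name you have entered does not exist in the list; below are the first 10 movies in the catalogue instead")
    print(movies.head(10))
```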

```
Enter the reference movie title based on which recommendations are to be made: Get Shorty (1995)
Recommended movies based on your choice of  Get Shorty (1995) : 
      movie_id                        movie_title  similarity
203       204          Back to the Future (1985)    0.628946
173       174     Raiders of the Lost Ark (1981)    0.628720
201       202               Groundhog Day (1993)    0.620055
95         96  Terminator 2: Judgment Day (1991)    0.617312
194       195             Terminator, The (1984)    0.604652
171       172    Empire Strikes Back, The (1980)    0.602747
215       216     When Harry Met Sally... (1989)    0.601488
78         79               Fugitive, The (1993)    0.601319
384       385                   True Lies (1994)    0.599989
```
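The interactive cell above can also be written as a plain function, which is easier to call in a loop, for example over a test set. This is a minimal sketch that assumes row i of the similarity array describes the movie in row i of `movies`, as it does in this notebook. The function name `recommend` and the toy titles are hypothetical.

```python
import numpy as np
import pandas as pd

def recommend(title, movies, similarity, n=3):
    """Return the n movies most similar to `title`.

    Assumes `similarity` is a square array whose row i refers to row i of `movies`.
    """
    matches = np.flatnonzero(movies['movie_title'].to_numpy() == title)
    if matches.size == 0:
        raise KeyError(f"unknown movie title: {title!r}")
    pos = matches[0]
    scored = movies.assign(similarity=similarity[pos])
    # drop the reference movie itself, then keep the n highest similarity scores
    return scored.drop(index=scored.index[pos]).nlargest(n, 'similarity')

# hypothetical toy catalogue and similarity matrix
movies = pd.DataFrame({'movie_id': [1, 2, 3, 4],
                       'movie_title': ['Movie A', 'Movie B', 'Movie C', 'Movie D']})
similarity = np.array([
    [0.0, 0.9, 0.2, 0.5],
    [0.9, 0.0, 0.4, 0.1],
    [0.2, 0.4, 0.0, 0.7],
    [0.5, 0.1, 0.7, 0.0],
])

print(recommend('Movie A', movies, similarity, n=2))
```

In this notebook the same function could be called as `recommend('Get Shorty (1995)', movies, movie_similarity, n=10)`. The reference movie is dropped by position rather than by slicing off the first sorted row. As a result, the top match is kept whether or not the diagonal was zeroed.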


Project 3 tip: pass your test data as the input to make predictions/recommendations.