## The goal is to build a movie recommendation system based on user ratings.
### Suppose we input a movie "A", and many users who rated "A" highly also rated "B" highly. Then we should recommend movie "B".

### The data used can be found here: https://grouplens.org/datasets/movielens/100k/
### This notebook draws on this Lynda course: https://www.lynda.com/Python-tutorials/Introduction-Python-Recommendation-Systems-Machine-Learning/563080-2.html

In [3]:
import sys
import numpy as np
import sklearn
import pandas as pd
import matplotlib as mpl
import matplotlib.pyplot as plt
import seaborn as sns

from sklearn.decomposition import TruncatedSVD

%matplotlib inline

In [9]:
user_rates = pd.read_csv('u.data', sep='\t', names = ['userID', 'itemID', 'rating', 'timestamp'])
print(user_rates.shape)
user_rates.head()

(100000, 4)


,userID,itemID,rating,timestamp
0,196,242,3,881250949
1,186,302,3,891717742
2,22,377,1,878887116
3,244,51,2,880606923
4,166,346,1,886397596


In [26]:
column_names = ['itemID','movieTitle','releaseDate','videoReleaseDate','IMDB_URL','Unknown','Action',
             'Adventure','Animation','Children','Comedy','Crime','Documentary','Drama','Fantasy','Film-Noir',
             'Horror','Musical','Mystery','Romance','Sci-Fi','Thriller','War','Western']
movies = pd.read_csv('u.item', sep='|', encoding='latin-1', names = column_names)
movie_names = movies[['itemID', 'movieTitle']]
print(movie_names.shape)
movie_names.head()

(1682, 2)


,itemID,movieTitle
0,1,Toy Story (1995)
1,2,GoldenEye (1995)
2,3,Four Rooms (1995)
3,4,Get Shorty (1995)
4,5,Copycat (1995)


### Merge the data, so that itemID and movieTitle are in the same DataFrame

In [28]:
movies_data = pd.merge(user_rates, movie_names, on='itemID')
print(movies_data.shape)
movies_data.head()

(100000, 5)


,userID,itemID,rating,timestamp,movieTitle
0,196,242,3,881250949,Kolya (1996)
1,63,242,3,875747190,Kolya (1996)
2,226,242,5,883888671,Kolya (1996)
3,154,242,3,879138235,Kolya (1996)
4,306,242,5,876503793,Kolya (1996)
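
A minimal sketch of how the merge could be sanity-checked, using two hypothetical miniature tables (`toy_ratings`, `toy_titles`) rather than the real files. It assumes each itemID appears only once in the titles table, which `validate='many_to_one'` enforces, and it counts how many ratings an inner join would drop.

```python
import pandas as pd

# Hypothetical miniature versions of the ratings and titles tables.
toy_ratings = pd.DataFrame({
    'userID': [1, 1, 2, 3],
    'itemID': [10, 20, 10, 30],
    'rating': [5, 3, 4, 2],
})
toy_titles = pd.DataFrame({
    'itemID': [10, 20, 30],
    'movieTitle': ['Movie A', 'Movie B', 'Movie C'],
})

# validate='many_to_one' raises a MergeError if an itemID is duplicated in toy_titles.
toy_merged = pd.merge(toy_ratings, toy_titles, on='itemID',
                      how='inner', validate='many_to_one')

# An inner join keeps only ratings whose itemID has a title; here none are lost.
rows_lost = len(toy_ratings) - len(toy_merged)
print(toy_merged.shape, rows_lost)
```

If `rows_lost` were non-zero on the real data, some ratings would refer to itemIDs that are missing from `u.item`.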


### Print the 10 most-rated movies:

In [125]:
movies_data.groupby('movieTitle')['rating'].count().sort_values(ascending=False).head(10)

movieTitle
Star Wars (1977)                 583
Contact (1997)                   509
Fargo (1996)                     508
Return of the Jedi (1983)        507
Liar Liar (1997)                 485
English Patient, The (1996)      481
Scream (1996)                    478
Toy Story (1995)                 452
Air Force One (1997)             431
Independence Day (ID4) (1996)    429
Name: rating, dtype: int64
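
The count above says how often a movie was rated, not how well. As a hedged sketch on hypothetical toy data (the `toy` frame and its titles are made up for illustration), the same groupby can report both the number of ratings and the mean rating per title.

```python
import pandas as pd

# Hypothetical ratings for three made-up titles.
toy = pd.DataFrame({
    'movieTitle': ['Movie A', 'Movie A', 'Movie A', 'Movie B', 'Movie B', 'Movie C'],
    'rating': [5, 4, 3, 2, 4, 5],
})

# One row per title, with the number of ratings and their mean.
stats = toy.groupby('movieTitle')['rating'].agg(['count', 'mean'])

# Most-rated titles first, as in the cell above.
stats = stats.sort_values('count', ascending=False)
print(stats)
```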

## Utility Matrix

In [39]:
rating_matrix = movies_data.pivot_table(values='rating', index='userID', columns='movieTitle')
rating_matrix.head()

movieTitle,'Til There Was You (1997),1-900 (1994),101 Dalmatians (1996),12 Angry Men (1957),187 (1997),2 Days in the Valley (1996),"20,000 Leagues Under the Sea (1954)",2001: A Space Odyssey (1968),3 Ninjas: High Noon At Mega Mountain (1998),"39 Steps, The (1935)",...,Yankee Zulu (1994),Year of the Horse (1997),You So Crazy (1994),Young Frankenstein (1974),Young Guns (1988),Young Guns II (1990),"Young Poisoner's Handbook, The (1995)",Zeus and Roxanne (1997),unknown,Á köldum klaka (Cold Fever) (1994)
userID,,,,,,,,,,,,,,,,,,,,,
1,,,2.0,5.0,,,3.0,4.0,,,...,,,,5.0,3.0,,,,4.0,
2,,,,,,,,,1.0,,...,,,,,,,,,,
3,,,,,2.0,,,,,,...,,,,,,,,,,
4,,,,,,,,,,,...,,,,,,,,,,
5,,,2.0,,,,,4.0,,,...,,,,4.0,,,,,4.0,


#### Fill NaN (movies the user has not rated) with 0

In [43]:
rating_matrix = rating_matrix.fillna(0)
print(rating_matrix.shape)
rating_matrix.head()

(943, 1664)


movieTitle,'Til There Was You (1997),1-900 (1994),101 Dalmatians (1996),12 Angry Men (1957),187 (1997),2 Days in the Valley (1996),"20,000 Leagues Under the Sea (1954)",2001: A Space Odyssey (1968),3 Ninjas: High Noon At Mega Mountain (1998),"39 Steps, The (1935)",...,Yankee Zulu (1994),Year of the Horse (1997),You So Crazy (1994),Young Frankenstein (1974),Young Guns (1988),Young Guns II (1990),"Young Poisoner's Handbook, The (1995)",Zeus and Roxanne (1997),unknown,Á köldum klaka (Cold Fever) (1994)
userID,,,,,,,,,,,,,,,,,,,,,
1,0.0,0.0,2.0,5.0,0.0,0.0,3.0,4.0,0.0,0.0,...,0.0,0.0,0.0,5.0,3.0,0.0,0.0,0.0,4.0,0.0
2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,0.0,0.0,0.0,0.0,2.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
5,0.0,0.0,2.0,0.0,0.0,0.0,0.0,4.0,0.0,0.0,...,0.0,0.0,0.0,4.0,0.0,0.0,0.0,0.0,4.0,0.0
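
Two properties of this step are easy to overlook, so here is a small sketch on hypothetical toy data. First, `pivot_table` averages duplicate (user, title) pairs by default, which is presumably why the matrix has 1664 columns although `u.item` lists 1682 items: movies sharing a title collapse into one column. Second, most cells are empty before the fill, and filling with 0 treats "not rated" the same as a rating of 0.

```python
import pandas as pd

# Hypothetical ratings; user 1 has two entries for 'Movie A' (5 and 3).
toy = pd.DataFrame({
    'userID': [1, 1, 1, 2, 3, 3],
    'movieTitle': ['Movie A', 'Movie A', 'Movie B', 'Movie A', 'Movie B', 'Movie C'],
    'rating': [5, 3, 3, 4, 2, 1],
})

# Default aggfunc is the mean, so the duplicate pair becomes a single 4.0.
toy_matrix = toy.pivot_table(values='rating', index='userID', columns='movieTitle')

# Fraction of user/movie cells with no rating at all.
sparsity = toy_matrix.isna().values.mean()

# After the fill, missing ratings are indistinguishable from a rating of 0.
toy_filled = toy_matrix.fillna(0)
print(toy_filled)
print(round(sparsity, 3))
```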


### We will use SVD to decompose the matrix. To do that, the rows must be movies and the columns users. Then we compress the columns from 943 to 12.

In [131]:
X = rating_matrix.T
X.shape

(1664, 943)

In [132]:
svd = TruncatedSVD(n_components=12, random_state=3)
X_svd = svd.fit_transform(X)

In [133]:
X_svd.shape

(1664, 12)
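
To illustrate what the 12 columns of `X_svd` contain, the sketch below uses NumPy's exact SVD on a small hypothetical matrix instead of scikit-learn's `TruncatedSVD`. The assumption is that the two agree up to sign and numerical tolerance: each movie is represented by its coordinates along the top `k` singular directions. The names `X_toy`, `k`, and `energy_kept` are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy matrix: 6 "movies" (rows) by 5 "users" (columns).
X_toy = rng.integers(0, 6, size=(6, 5)).astype(float)

k = 2  # number of components to keep (the notebook keeps 12)
U, s, Vt = np.linalg.svd(X_toy, full_matrices=False)

# Reduced representation: one k-dimensional vector per movie.
X_reduced = U[:, :k] * s[:k]

# The same vectors, written as a projection onto the top-k right singular vectors.
X_projected = X_toy @ Vt[:k].T

# Share of the squared singular values retained by the k components.
energy_kept = (s[:k] ** 2).sum() / (s ** 2).sum()
print(X_reduced.shape, round(energy_kept, 3))
```

A ratio like `energy_kept` is one possible way to judge whether a chosen number of components keeps enough of the matrix's structure.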

### Correlation Matrix (Pearson's r)

In [134]:
corr = np.corrcoef(X_svd)
corr.shape

(1664, 1664)
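
`np.corrcoef` treats each row as one variable, which is why a (1664, 12) input gives a (1664, 1664) result: one coefficient per pair of movies, computed across the 12 components. A small sketch with hypothetical reduced vectors `Z` shows the shape, the unit diagonal, and how similar and opposite patterns score.

```python
import numpy as np

# Hypothetical reduced vectors: 3 movies (rows) in a 4-dimensional latent space.
Z = np.array([
    [1.0, 2.0, 3.0, 4.0],   # movie 0
    [2.0, 4.0, 6.0, 8.5],   # movie 1: nearly proportional to movie 0
    [4.0, 3.0, 2.0, 1.0],   # movie 2: reversed pattern of movie 0
])

# Rows are variables by default, so the result is movies x movies.
C = np.corrcoef(Z)
print(C.round(3))
```

Movie 1 correlates strongly and positively with movie 0, while movie 2 has a correlation of -1 with it.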

### `reccomend(movie)` prints the 10 movies with the highest correlation to the input movie

In [136]:
def reccomend(movie):
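    # Row of the correlation matrix that belongs to the input movie
    # (rows of X follow the column order of rating_matrix).
    index = list(rating_matrix.columns).index(movie)
    df = pd.DataFrame(corr[index], rating_matrix.columns)
    # Sort by correlation and skip the first row, which is the input movie itself (r = 1).
    print(df.sort_values(by=0, ascending=False)[1:11])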

In [137]:
reccomend('Star Wars (1977)')

                                                  0
movieTitle                                         
Return of the Jedi (1983)                  0.988323
Terminator 2: Judgment Day (1991)          0.942533
Toy Story (1995)                           0.933861
Terminator, The (1984)                     0.930761
Raiders of the Lost Ark (1981)             0.924706
Empire Strikes Back, The (1980)            0.924232
Fugitive, The (1993)                       0.916512
Die Hard (1988)                            0.906686
Aliens (1986)                              0.892319
Indiana Jones and the Last Crusade (1989)  0.886393
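
As a possible variation, the sketch below returns the result instead of printing it and removes the input movie by label rather than by position, so a tie at the top cannot hide a real recommendation. The helper `top_correlated` and the toy inputs are hypothetical and not part of the notebook above.

```python
import numpy as np
import pandas as pd

def top_correlated(corr_matrix, titles, movie, n=10):
    """Return the n titles most correlated with `movie`, excluding `movie` itself."""
    titles = pd.Index(titles)
    # get_loc raises KeyError if the title is unknown.
    row = pd.Series(corr_matrix[titles.get_loc(movie)], index=titles)
    return row.drop(movie).sort_values(ascending=False).head(n)

# Hypothetical 3-movie correlation matrix.
toy_titles = ['Movie A', 'Movie B', 'Movie C']
toy_corr = np.array([
    [1.0, 0.9, -0.2],
    [0.9, 1.0, 0.1],
    [-0.2, 0.1, 1.0],
])

result = top_correlated(toy_corr, toy_titles, 'Movie A', n=2)
print(result)
```

With the notebook's objects, the call would presumably look like `top_correlated(corr, rating_matrix.columns, 'Star Wars (1977)')`.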
