# MOVIE RECOMMENDER SYSTEMS
## CONTENT-BASED AND COLLABORATIVE FILTERING-BASED METHODS

Recommendation systems are used to make product recommendations in many applications, such as online shopping, suggesting interesting websites, or helping people find music and movies. Many of them look for people who share similar tastes and automatically recommend the items that those similar people liked.
Recommendation systems are used not only for movies but also for many other products and services, for example Amazon (books, other items), Pandora/Spotify (music), Google (news, search) and YouTube (videos).

Content-based recommenders rely on the similarity between the items being recommended (here, the movies' genres), whereas collaborative filtering produces recommendations from users' recorded preferences for items (here, their ratings).
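To make the distinction concrete, here is a minimal sketch on made-up numbers. The three movies, their genre flags, the ratings and the helper `cos_1d` are all hypothetical, not MovieLens data. A content-based system compares items through their own attributes, while a collaborative system compares them through the ratings users gave them.

```python
import numpy as np

# Hypothetical genre flags for 3 movies: columns are [Action, Comedy, Drama]
item_features = np.array([
    [1, 0, 0],   # movie A: Action
    [1, 0, 1],   # movie B: Action|Drama
    [0, 1, 0],   # movie C: Comedy
], dtype=float)

# Hypothetical ratings: rows are users, columns are movies A, B, C (0 = not rated)
ratings = np.array([
    [5, 4, 0],
    [4, 5, 1],
    [0, 1, 5],
], dtype=float)

def cos_1d(u, v):
    """Cosine similarity between two 1-D vectors."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

# Content-based: compare movies A and B by their genre vectors
content_sim_ab = cos_1d(item_features[0], item_features[1])
# Collaborative: compare movies by the columns of the rating matrix
collab_sim_ab = cos_1d(ratings[:, 0], ratings[:, 1])
collab_sim_ac = cos_1d(ratings[:, 0], ratings[:, 2])

print(round(content_sim_ab, 3), round(collab_sim_ab, 3), round(collab_sim_ac, 3))
```

Both approaches produce an item-to-item similarity; they differ only in what the item vectors are built from.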

## Data Sets Used

We use two versions of the MovieLens dataset to build models that recommend movies to users. The data was collected by the GroupLens Research Project at the University of Minnesota.


For the content-based method, we used:

- 1,000,000 ratings (1-5) from 6,040 users on 3,900 movies, from the year 2000
- Demographic information about the users (age, gender, occupation, etc.)



For the collaborative filtering method, we used:

- 100,000 ratings (1-5) from 943 users on 1,682 movies, from the year 1998
- Demographic information about the users (age, gender, occupation, etc.)


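We use the smaller data set for collaborative filtering because this method builds dense user-item and similarity matrices whose size grows quickly with the number of users and movies, which made the larger data set too demanding for the desktop/laptop setup used here.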
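As a rough illustration of how the cost grows, the sketch below estimates the raw storage of the dense float64 matrices used later (user-item ratings, user-user and item-item similarities), using the user and movie counts listed above. It assumes 8 bytes per entry and ignores pandas overhead and the temporary copies made while computing similarities, so real peak memory will be higher. The helper `dense_mb` is hypothetical.

```python
# Assumption: dense float64 storage, 8 bytes per entry, no overhead or temporaries counted
BYTES_PER_FLOAT64 = 8

def dense_mb(rows, cols):
    """Megabytes (10**6 bytes) needed for a dense rows x cols float64 matrix."""
    return rows * cols * BYTES_PER_FLOAT64 / 1e6

# (label, number of users, number of movies) taken from the data set descriptions above
for name, n_u, n_m in [("MovieLens 100K", 943, 1682), ("MovieLens 1M", 6040, 3900)]:
    print(f"{name}: user-item {dense_mb(n_u, n_m):.1f} MB, "
          f"user-user {dense_mb(n_u, n_u):.1f} MB, "
          f"item-item {dense_mb(n_m, n_m):.1f} MB")
```

The user-user matrix grows with the square of the number of users, so it is the part that grows fastest (roughly 40x between the two data sets).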

In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import matplotlib
from sklearn.model_selection import train_test_split

In [2]:
#Reading users file:
u_cols = ['user_id', 'age', 'sex', 'occupation', 'zip_code']
users = pd.read_csv('u.user', sep='|', names=u_cols,encoding='latin-1')
users.head()

| | user_id | age | sex | occupation | zip_code |
|---|---|---|---|---|---|
| 0 | 1 | 24 | M | technician | 85711 |
| 1 | 2 | 53 | F | other | 94043 |
| 2 | 3 | 23 | M | writer | 32067 |
| 3 | 4 | 24 | M | technician | 43537 |
| 4 | 5 | 33 | F | other | 15213 |


In [3]:
#Reading ratings file:
r_cols = ['user_id', 'movie_id', 'rating', 'unix_timestamp']
ratings = pd.read_csv('u.data', sep='\t', names=r_cols,encoding='latin-1')
ratings = ratings.iloc[:,:3]
ratings.head()

| | user_id | movie_id | rating |
|---|---|---|---|
| 0 | 196 | 242 | 3 |
| 1 | 186 | 302 | 3 |
| 2 | 22 | 377 | 1 |
| 3 | 244 | 51 | 2 |
| 4 | 166 | 346 | 1 |


In [4]:
#Reading items file:
i_cols = ['movie_id', 'movie_title' ,'release date','video release date', 'IMDb URL', 'unknown', 'Action', 'Adventure',
'Animation', 'Children\'s', 'Comedy', 'Crime', 'Documentary', 'Drama', 'Fantasy',
'Film-Noir', 'Horror', 'Musical', 'Mystery', 'Romance', 'Sci-Fi', 'Thriller', 'War', 'Western']
movies = pd.read_csv('u.item', sep='|', names=i_cols, encoding='latin-1')
# .copy() so that later column assignments modify movies_df, not a view of movies
movies_df = movies.iloc[:, :2].copy()
movies_df.head()

| | movie_id | movie_title |
|---|---|---|
| 0 | 1 | Toy Story (1995) |
| 1 | 2 | GoldenEye (1995) |
| 2 | 3 | Four Rooms (1995) |
| 3 | 4 | Get Shorty (1995) |
| 4 | 5 | Copycat (1995) |


Checking the data sets for missing values

In [5]:
movies.isnull().sum()

movie_id                 0
movie_title              0
release date             1
video release date    1682
IMDb URL                 3
unknown                  0
Action                   0
Adventure                0
Animation                0
Children's               0
Comedy                   0
Crime                    0
Documentary              0
Drama                    0
Fantasy                  0
Film-Noir                0
Horror                   0
Musical                  0
Mystery                  0
Romance                  0
Sci-Fi                   0
Thriller                 0
War                      0
Western                  0
dtype: int64

In [6]:
movies.drop(['video release date'], axis=1, inplace=True)

In [7]:
movies.isnull().sum()

movie_id        0
movie_title     0
release date    1
IMDb URL        3
unknown         0
Action          0
Adventure       0
Animation       0
Children's      0
Comedy          0
Crime           0
Documentary     0
Drama           0
Fantasy         0
Film-Noir       0
Horror          0
Musical         0
Mystery         0
Romance         0
Sci-Fi          0
Thriller        0
War             0
Western         0
dtype: int64

In [8]:
ratings.isnull().sum()

user_id     0
movie_id    0
rating      0
dtype: int64

In [9]:
users.isnull().sum()

user_id       0
age           0
sex           0
occupation    0
zip_code      0
dtype: int64

In [10]:
# Convert the MovieLens 1M movies.dat file ('::'-separated) to a tab-separated CSV for easier reuse
movie = pd.read_csv('movies.dat', 
                    sep='::', 
                    engine='python', 
                    encoding='latin-1',
                    names=['movie_id', 'title', 'genres'])

movie.to_csv('movies.csv', 
              sep='\t', 
              header=True, 
              columns=['movie_id', 'title', 'genres'])

We use the MovieLens 1M data set (1 million ratings) for this purpose.

In [11]:
movie = pd.read_csv('movies.csv', sep='\t', encoding='latin-1', usecols=['movie_id', 'title', 'genres'])
movie.head()

| | movie_id | title | genres |
|---|---|---|---|
| 0 | 1 | Toy Story (1995) | Animation\|Children's\|Comedy |
| 1 | 2 | Jumanji (1995) | Adventure\|Children's\|Fantasy |
| 2 | 3 | Grumpier Old Men (1995) | Comedy\|Romance |
| 3 | 4 | Waiting to Exhale (1995) | Comedy\|Drama |
| 4 | 5 | Father of the Bride Part II (1995) | Comedy |


In [12]:
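# Split the '|'-separated genres into a list, then store the list's string form
# (e.g. "['Comedy', 'Romance']") so that TfidfVectorizer can tokenize it as text
movie['genres'] = movie['genres'].str.split('|')
movie['genres'] = movie['genres'].fillna("").astype('str')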
movie.head()

| | movie_id | title | genres |
|---|---|---|---|
| 0 | 1 | Toy Story (1995) | ['Animation', "Children's", 'Comedy'] |
| 1 | 2 | Jumanji (1995) | ['Adventure', "Children's", 'Fantasy'] |
| 2 | 3 | Grumpier Old Men (1995) | ['Comedy', 'Romance'] |
| 3 | 4 | Waiting to Exhale (1995) | ['Comedy', 'Drama'] |
| 4 | 5 | Father of the Bride Part II (1995) | ['Comedy'] |


## Content-Based Recommendation by Movie Genres


We use scikit-learn's TfidfVectorizer to turn the text in the genres column into TF-IDF feature vectors.
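As a small illustration of what TfidfVectorizer produces, the sketch below vectorizes three hypothetical genre strings with the default settings (word unigrams, lowercasing, L2-normalized rows). A genre shared by several movies (here Comedy and Children) gets a lower inverse document frequency, and therefore a lower weight, than a genre that appears in only one movie (Animation).

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical toy genre strings, one per movie
docs = ["Animation Children Comedy", "Adventure Children Fantasy", "Comedy Romance"]

tf = TfidfVectorizer()          # defaults: word unigrams, lowercase, norm='l2'
X = tf.fit_transform(docs)      # sparse matrix of shape (n_movies, n_terms)

print(sorted(tf.vocabulary_))   # the learned (lowercased) genre vocabulary
print(X.shape)                  # (3, 6)

row0 = X[0].toarray().ravel()   # TF-IDF weights of the first movie
# 'animation' occurs in one movie and 'comedy' in two, so 'animation' is weighted higher
print(row0[tf.vocabulary_["animation"]] > row0[tf.vocabulary_["comedy"]])  # True
```

The notebook's own settings differ (bigrams and English stop words), which is why its vocabulary has 127 terms rather than one per genre.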

In [13]:
from sklearn.feature_extraction.text import TfidfVectorizer
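# min_df=1 (the default) keeps every term; recent scikit-learn versions reject an integer min_df of 0
tf = TfidfVectorizer(analyzer='word', ngram_range=(1, 2), min_df=1, stop_words='english')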
tfidf_matrix = tf.fit_transform(movie['genres'])
tfidf_matrix.shape

(3883, 127)

We use cosine similarity to measure how similar two movies are. TfidfVectorizer L2-normalizes each row by default, so the dot product of two TF-IDF vectors is already their cosine similarity; we can therefore use scikit-learn's faster linear_kernel instead of cosine_similarity.
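The equivalence can be checked directly on a toy example (the genre strings are hypothetical): because every row has unit length, `linear_kernel` (a plain dot product) and `cosine_similarity` return the same matrix.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity, linear_kernel

# Hypothetical genre strings for three movies
docs = ["Action Thriller", "Action Adventure Thriller", "Comedy Romance"]
X = TfidfVectorizer().fit_transform(docs)   # each row is L2-normalized (norm='l2' default)

dot = linear_kernel(X, X)        # plain dot products between rows
cos = cosine_similarity(X, X)    # dot products divided by the product of row norms

print(np.allclose(dot, cos))     # True: with unit-length rows the two coincide
print(dot[0, 2])                 # 0.0: the two movies share no genre
```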

In [14]:
from sklearn.metrics.pairwise import linear_kernel
cosine_sim = linear_kernel(tfidf_matrix, tfidf_matrix)
cosine_sim[:4, :4]

array([[1.        , 0.14193614, 0.09010857, 0.1056164 ],
       [0.14193614, 1.        , 0.        , 0.        ],
       [0.09010857, 0.        , 1.        , 0.1719888 ],
       [0.1056164 , 0.        , 0.1719888 , 1.        ]])

In [15]:
# Titles by row position, and a Series mapping each title to its row position
titles = movie['title']
indices = pd.Series(movie.index, index=movie['title'])

In [16]:
indices.head()

title
Toy Story (1995)                      0
Jumanji (1995)                        1
Grumpier Old Men (1995)               2
Waiting to Exhale (1995)              3
Father of the Bride Part II (1995)    4
dtype: int64

We will now write a function that returns the movies whose genres are most similar to a given title, ranked by the cosine similarity of their genre vectors.
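One detail to watch when ranking: many movies have exactly the same genres, so several scores can tie at 1.0 and the query movie is not guaranteed to be the first entry after sorting. The hypothetical helper `top_n_similar` below sketches a NumPy alternative that excludes the query row explicitly and uses a stable `argsort`; it accepts any square similarity matrix, such as `cosine_sim`.

```python
import numpy as np

def top_n_similar(sim_matrix, idx, n=5):
    """Hypothetical helper: positions of the n items most similar to item `idx`, excluding `idx`."""
    scores = np.asarray(sim_matrix[idx], dtype=float).copy()  # copy so the input is untouched
    scores[idx] = -np.inf                    # never return the query item itself
    # Sort by descending score; kind="stable" keeps tied items in their original order
    return np.argsort(-scores, kind="stable")[:n]

# Toy 4x4 similarity matrix: items 0 and 1 have identical features (similarity 1.0)
sim = np.array([[1.0, 1.0, 0.2, 0.5],
                [1.0, 1.0, 0.3, 0.1],
                [0.2, 0.3, 1.0, 0.0],
                [0.5, 0.1, 0.0, 1.0]])

print(top_n_similar(sim, 1, n=2))   # [0 2]: item 1 itself is excluded despite the tie
```

With Python's stable `sorted(..., reverse=True)` on row 1, item 0 (tied at 1.0) would come before item 1 itself, so slicing off the first entry would drop item 0 and keep the query movie.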

In [17]:
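def genre_recommendations(title, top_n=20):
    # Row position of the query movie
    idx = indices[title]
    # (position, similarity) pairs for all other movies; the query movie is removed explicitly
    # because many movies share identical genres (similarity 1.0), so it is not always ranked first
    sim_scores = [(i, s) for i, s in enumerate(cosine_sim[idx]) if i != idx]
    sim_scores = sorted(sim_scores, key=lambda x: x[1], reverse=True)[:top_n]
    movie_indices = [i for i, _ in sim_scores]
    return titles.iloc[movie_indices]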

### Movie Predictions

In [18]:
#Getting the top 5 Recommendations by Genres
#Drama movies
genre_recommendations('Casino (1995)').head()

60     Eye for an Eye (1996)
78         Juror, The (1996)
90        Mary Reilly (1996)
98          City Hall (1996)
109       Taxi Driver (1976)
Name: title, dtype: object

In [19]:
#Comedy-Horror movies
genre_recommendations('Dracula: Dead and Loving It (1995)').head(5)

326                     Tales from the Hood (1995)
726     Cemetery Man (Dellamorte Dellamore) (1994)
789                        Frighteners, The (1996)
1221                              Braindead (1992)
1235                              Bad Taste (1987)
Name: title, dtype: object

In [20]:
#Children's movies
genre_recommendations('Balto (1995)').head(5)

241      Gumby: The Movie (1995)
310    Swan Princess, The (1994)
592             Pinocchio (1940)
612       Aristocats, The (1970)
700      Oliver & Company (1988)
Name: title, dtype: object

In [21]:
#Action movies
genre_recommendations('GoldenEye (1995)').head(5)

345    Clear and Present Danger (1994)
543          Surviving the Game (1994)
724                   Rock, The (1996)
788                    Daylight (1996)
825              Chain Reaction (1996)
Name: title, dtype: object

## Collaborative Filtering Recommendation

In [22]:
from sklearn.metrics import pairwise_distances
from scipy.spatial.distance import cosine, correlation

In [23]:
n_users = ratings.user_id.unique().shape[0]
n_items = ratings.movie_id.unique().shape[0]
print('Number of users = ' + str(n_users) + ' | Number of movies = ' + str(n_items))

Number of users = 943 | Number of movies = 1682


### Splitting Data into Train and Test

In [24]:
#Splitting the data into train and test
from sklearn.model_selection import train_test_split as cv
train_data, test_data = cv(ratings, test_size=0.2)

### Calculation of user-user and item-item similarity

In [25]:
#Create two user-item matrices, one for training and another for testing
train_data_matrix = np.zeros((n_users, n_items))
for line in train_data.itertuples():
    train_data_matrix[line[1]-1, line[2]-1] = line[3]

test_data_matrix = np.zeros((n_users, n_items))
for line in test_data.itertuples():
    test_data_matrix[line[1]-1, line[2]-1] = line[3]

In [26]:
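from sklearn.metrics.pairwise import pairwise_distances
# Note: metric='cosine' returns cosine DISTANCE (1 - cosine similarity), so these matrices
# have 0 on the diagonal; they are what predict() below receives as the weights
user_similarity = pairwise_distances(train_data_matrix, metric='cosine')
item_similarity = pairwise_distances(train_data_matrix.T, metric='cosine')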
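`pairwise_distances(..., metric='cosine')` returns cosine *distance*, i.e. 1 minus cosine similarity, which is why the matrices printed below have zeros on the diagonal; the later user-user and item-item cells convert back with `1 - pairwise_distances(...)`. The check below, on a made-up 3 x 4 rating matrix, shows the relationship using scikit-learn's `cosine_similarity`.

```python
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity, pairwise_distances

# Hypothetical 3-user x 4-movie rating matrix (0 = not rated)
R = np.array([[5, 3, 0, 1],
              [4, 0, 0, 1],
              [1, 1, 0, 5]], dtype=float)

dist = pairwise_distances(R, metric="cosine")   # cosine DISTANCE: 0 on the diagonal
sim = cosine_similarity(R)                       # cosine SIMILARITY: 1 on the diagonal

print(np.allclose(dist, 1 - sim))   # True
```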

In [27]:
user_similarity

array([[0.        , 0.86304655, 0.94553874, ..., 0.8786238 , 0.85672235,
        0.66321845],
       [0.86304655, 0.        , 0.90987993, ..., 0.84036572, 0.84585512,
        0.90070085],
       [0.94553874, 0.90987993, 0.        , ..., 0.90603564, 0.8884318 ,
        0.96829197],
       ...,
       [0.8786238 , 0.84036572, 0.90603564, ..., 0.        , 0.88463855,
        0.95686042],
       [0.85672235, 0.84585512, 0.8884318 , ..., 0.88463855, 0.        ,
        0.87882432],
       [0.66321845, 0.90070085, 0.96829197, ..., 0.95686042, 0.87882432,
        0.        ]])

In [28]:
item_similarity

array([[0.        , 0.67621405, 0.75489237, ..., 1.        , 0.94793326,
        1.        ],
       [0.67621405, 0.        , 0.78833255, ..., 1.        , 0.9095466 ,
        1.        ],
       [0.75489237, 0.78833255, 0.        , ..., 1.        , 1.        ,
        1.        ],
       ...,
       [1.        , 1.        , 1.        , ..., 0.        , 1.        ,
        1.        ],
       [0.94793326, 0.9095466 , 1.        , ..., 1.        , 0.        ,
        1.        ],
       [1.        , 1.        , 1.        , ..., 1.        , 1.        ,
        0.        ]])

In [29]:
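#Function to calculate predictions
def predict(ratings, similarity, type='user'):
    if type == 'user':
        # Each user's mean over the whole row (unrated entries are 0 and count toward the mean)
        mean_user_rating = ratings.mean(axis=1)
        # np.newaxis makes the means a column vector so they broadcast across each user's row
        ratings_diff = (ratings - mean_user_rating[:, np.newaxis])
        # User mean + weighted sum of mean-centered ratings, normalized by the total |weight|
        pred = mean_user_rating[:, np.newaxis] + similarity.dot(ratings_diff) / np.array([np.abs(similarity).sum(axis=1)]).T
    elif type == 'item':
        # Weighted sum of the user's ratings, normalized by each item's total |weight|
        pred = ratings.dot(similarity) / np.array([np.abs(similarity).sum(axis=1)])
    else:
        raise ValueError("type must be 'user' or 'item'")
    return pred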
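For the user-based branch, the prediction for user u and movie i is the user's mean rating plus a weighted average of all users' mean-centered ratings of i: pred(u, i) = mean(u) + Σ_v w(u, v)·(r(v, i) − mean(v)) / Σ_v |w(u, v)|, where w is the matrix passed as `similarity`. The sketch below, with hypothetical helper names and toy numbers, checks that the vectorized expression used in `predict` matches this per-entry sum.

```python
import numpy as np

def predict_user_based(ratings, similarity):
    """Vectorized user-based prediction (same expression as predict(..., type='user'))."""
    mean = ratings.mean(axis=1)
    diff = ratings - mean[:, np.newaxis]
    return mean[:, np.newaxis] + similarity.dot(diff) / np.abs(similarity).sum(axis=1)[:, np.newaxis]

def predict_one(ratings, similarity, u, i):
    """The same prediction for one (user u, movie i) pair, written as an explicit sum."""
    mean = ratings.mean(axis=1)
    users = range(ratings.shape[0])
    num = sum(similarity[u, v] * (ratings[v, i] - mean[v]) for v in users)
    den = sum(abs(similarity[u, v]) for v in users)
    return mean[u] + num / den

R = np.array([[5, 3, 0], [4, 0, 2], [1, 1, 5]], dtype=float)       # hypothetical ratings
S = np.array([[1.0, 0.8, 0.1], [0.8, 1.0, 0.2], [0.1, 0.2, 1.0]])  # hypothetical weights

P = predict_user_based(R, S)
print(np.isclose(P[0, 2], predict_one(R, S, 0, 2)))  # True
```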

In [30]:
item_prediction = predict(train_data_matrix, item_similarity, type='item')
user_prediction = predict(train_data_matrix, user_similarity, type='user')

### Evaluation of the model

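We will use the mean_squared_error (MSE) function from sklearn to evaluate the model and report the RMSE, which is the square root of the MSE. Only positions that hold an actual rating (the non-zero entries of the ground-truth matrix) are compared.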
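On a toy example (hypothetical numbers), the masked RMSE works like this: only positions where the test matrix has a rating are compared, so the two observed ratings below contribute errors of 1 and 2, giving MSE = (1² + 2²) / 2 = 2.5 and RMSE = √2.5 ≈ 1.58.

```python
import numpy as np
from math import sqrt
from sklearn.metrics import mean_squared_error

# Hypothetical predictions and test ratings for 2 users x 2 movies (0 = no test rating)
prediction = np.array([[4.0, 2.0],
                       [3.0, 5.0]])
ground_truth = np.array([[5.0, 0.0],
                         [0.0, 3.0]])

mask = ground_truth.nonzero()        # (row indices, column indices) of real ratings
mse = mean_squared_error(ground_truth[mask], prediction[mask])
rmse_value = sqrt(mse)
print(mse, rmse_value)               # 2.5 and about 1.58
```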

In [31]:
from sklearn.metrics import mean_squared_error
from math import sqrt
def rmse(prediction, ground_truth):
    prediction = prediction[ground_truth.nonzero()].flatten()
    ground_truth = ground_truth[ground_truth.nonzero()].flatten()
    return sqrt(mean_squared_error(prediction, ground_truth))

In [32]:
print('User-based CF RMSE: ' + str(rmse(user_prediction, test_data_matrix)))
print('Item-based CF RMSE: ' + str(rmse(item_prediction, test_data_matrix)))

User-based CF RMSE: 3.0911318859496077
Item-based CF RMSE: 3.440351699169376


In [33]:
# The same metric on the training ratings, for comparison with the test RMSE above
print('User-based CF RMSE: ' + str(rmse(user_prediction, train_data_matrix)))
print('Item-based CF RMSE: ' + str(rmse(item_prediction, train_data_matrix)))

User-based CF RMSE: 3.0949994154659195
Item-based CF RMSE: 3.442813726446624


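From the above, user-based collaborative filtering gives a slightly lower (better) RMSE than item-based filtering: about 3.09 vs 3.44 on the test set, with almost identical values on the training set. Both errors are large relative to the 1-5 rating scale, likely in part because unrated entries are stored as 0 and are averaged in as if they were ratings.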
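To illustrate the averaging issue on a hypothetical two-user matrix: `ratings.mean(axis=1)`, as used in `predict`, divides by the total number of movies, so a user who rated two movies 5 and 4 gets a mean of 2.25. Averaging only over rated entries, as sketched below, gives 4.5. This is a possible refinement, not what the notebook does.

```python
import numpy as np

# Hypothetical user-item matrix: 0 means "not rated"
R = np.array([[5, 0, 0, 4],
              [0, 3, 0, 0]], dtype=float)

mean_with_zeros = R.mean(axis=1)            # as in predict(): zeros count as ratings

rated = R > 0                               # mask of observed ratings
# Divide by the number of rated movies; np.maximum avoids dividing by 0 for users with no ratings
mean_rated_only = R.sum(axis=1) / np.maximum(rated.sum(axis=1), 1)

print(mean_with_zeros)    # values 2.25 and 0.75
print(mean_rated_only)    # values 4.5 and 3.0
```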

## Functions for checking movie recommendations

In [34]:
# Create pivot tables (user x movie and movie x user) from the entire data set
user_movies_df = ratings.pivot(index='user_id', columns='movie_id', values = "rating" ).reset_index(drop=True)
movies_user_df = ratings.pivot( index='movie_id', columns='user_id', values = "rating" ).reset_index(drop=True)

In [35]:
# Fill ratings not given by users with 0
user_movies_df.fillna(0, inplace = True)
movies_user_df.fillna(0, inplace = True)
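The pivot-and-fill step can be seen on a tiny hypothetical long-format table: `DataFrame.pivot` spreads the ratings into one row per user and one column per movie, and `fillna(0)` marks the missing (user, movie) pairs as 0.

```python
import pandas as pd

# Hypothetical long-format ratings: one row per (user, movie) rating
ratings = pd.DataFrame({"user_id": [1, 1, 2, 3],
                        "movie_id": [10, 20, 10, 30],
                        "rating": [4, 5, 3, 2]})

# One row per user, one column per movie; missing ratings become NaN, then 0
user_movies = ratings.pivot(index="user_id", columns="movie_id", values="rating").fillna(0)
print(user_movies)
```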

In [36]:
print(user_movies_df.shape)
print(movies_user_df.shape)

(943, 1682)
(1682, 943)


In [37]:
#Sample of the Pivot table
user_movies_df.iloc[10:15, 20:25]

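| movie_id | 21 | 22 | 23 | 24 | 25 |
|---|---|---|---|---|---|
| 10 | 0.0 | 4.0 | 0.0 | 3.0 | 3.0 |
| 11 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 12 | 3.0 | 4.0 | 5.0 | 1.0 | 1.0 |
| 13 | 0.0 | 3.0 | 5.0 | 0.0 | 2.0 |
| 14 | 0.0 | 0.0 | 0.0 | 0.0 | 3.0 |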


In [38]:
#Sample of the Pivot table
movies_user_df.iloc[10:15, 20:25]

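| user_id | 21 | 22 | 23 | 24 | 25 |
|---|---|---|---|---|---|
| 10 | 0.0 | 0.0 | 0.0 | 5.0 | 0.0 |
| 11 | 0.0 | 0.0 | 0.0 | 5.0 | 0.0 |
| 12 | 0.0 | 0.0 | 4.0 | 0.0 | 4.0 |
| 13 | 0.0 | 0.0 | 4.0 | 0.0 | 0.0 |
| 14 | 4.0 | 0.0 | 0.0 | 0.0 | 0.0 |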


### Checking movies from user-user similarity

In [39]:
#Calculation of user-user similarity (1 - cosine distance = cosine similarity)
user_sim = 1 - pairwise_distances(user_movies_df.to_numpy(), metric="cosine")

In [40]:
user_sim_df = pd.DataFrame(user_sim)

In [41]:
user_sim_df.shape

(943, 943)

In [42]:
user_sim_df[0:5]

| | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | ... | 933 | 934 | 935 | 936 | 937 | 938 | 939 | 940 | 941 | 942 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1.0 | 0.166931 | 0.04746 | 0.064358 | 0.378475 | 0.430239 | 0.440367 | 0.319072 | 0.078138 | 0.376544 | ... | 0.369527 | 0.119482 | 0.274876 | 0.189705 | 0.197326 | 0.118095 | 0.314072 | 0.148617 | 0.179508 | 0.398175 |
| 1 | 0.166931 | 1.0 | 0.110591 | 0.178121 | 0.072979 | 0.245843 | 0.107328 | 0.103344 | 0.161048 | 0.159862 | ... | 0.156986 | 0.307942 | 0.358789 | 0.424046 | 0.319889 | 0.228583 | 0.22679 | 0.161485 | 0.172268 | 0.105798 |
| 2 | 0.04746 | 0.110591 | 1.0 | 0.344151 | 0.021245 | 0.072415 | 0.066137 | 0.08306 | 0.06104 | 0.065151 | ... | 0.031875 | 0.042753 | 0.163829 | 0.069038 | 0.124245 | 0.026271 | 0.16189 | 0.101243 | 0.133416 | 0.026556 |
| 3 | 0.064358 | 0.178121 | 0.344151 | 1.0 | 0.031804 | 0.068044 | 0.09123 | 0.18806 | 0.101284 | 0.060859 | ... | 0.052107 | 0.036784 | 0.133115 | 0.193471 | 0.146058 | 0.030138 | 0.196858 | 0.152041 | 0.170086 | 0.058752 |
| 4 | 0.378475 | 0.072979 | 0.021245 | 0.031804 | 1.0 | 0.237286 | 0.3736 | 0.24893 | 0.056847 | 0.201427 | ... | 0.338794 | 0.08058 | 0.094924 | 0.079779 | 0.148607 | 0.071459 | 0.239955 | 0.139595 | 0.152497 | 0.313941 |


In [43]:
# Finding users with highest similarities
user_sim_df.idxmax(axis=1)[0:5]

0    0
1    1
2    2
3    3
4    4
dtype: int64

In [44]:
# Every user is most similar to themselves (similarity 1.0 on the diagonal).
# Setting each user's similarity with themselves to 0 avoids this.

np.fill_diagonal(user_sim, 0)
user_sim_df = pd.DataFrame(user_sim)
user_sim_df[0:5]

| | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | ... | 933 | 934 | 935 | 936 | 937 | 938 | 939 | 940 | 941 | 942 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.0 | 0.166931 | 0.04746 | 0.064358 | 0.378475 | 0.430239 | 0.440367 | 0.319072 | 0.078138 | 0.376544 | ... | 0.369527 | 0.119482 | 0.274876 | 0.189705 | 0.197326 | 0.118095 | 0.314072 | 0.148617 | 0.179508 | 0.398175 |
| 1 | 0.166931 | 0.0 | 0.110591 | 0.178121 | 0.072979 | 0.245843 | 0.107328 | 0.103344 | 0.161048 | 0.159862 | ... | 0.156986 | 0.307942 | 0.358789 | 0.424046 | 0.319889 | 0.228583 | 0.22679 | 0.161485 | 0.172268 | 0.105798 |
| 2 | 0.04746 | 0.110591 | 0.0 | 0.344151 | 0.021245 | 0.072415 | 0.066137 | 0.08306 | 0.06104 | 0.065151 | ... | 0.031875 | 0.042753 | 0.163829 | 0.069038 | 0.124245 | 0.026271 | 0.16189 | 0.101243 | 0.133416 | 0.026556 |
| 3 | 0.064358 | 0.178121 | 0.344151 | 0.0 | 0.031804 | 0.068044 | 0.09123 | 0.18806 | 0.101284 | 0.060859 | ... | 0.052107 | 0.036784 | 0.133115 | 0.193471 | 0.146058 | 0.030138 | 0.196858 | 0.152041 | 0.170086 | 0.058752 |
| 4 | 0.378475 | 0.072979 | 0.021245 | 0.031804 | 0.0 | 0.237286 | 0.3736 | 0.24893 | 0.056847 | 0.201427 | ... | 0.338794 | 0.08058 | 0.094924 | 0.079779 | 0.148607 | 0.071459 | 0.239955 | 0.139595 | 0.152497 | 0.313941 |
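The effect of zeroing the diagonal is easy to see on a hypothetical 3 x 3 similarity matrix: before the change `idxmax` returns each user's own position; afterwards it returns the most similar *other* user. Note that `np.fill_diagonal` modifies the array in place and returns `None`.

```python
import numpy as np
import pandas as pd

# Hypothetical 3x3 user-user similarity matrix (1.0 on the diagonal)
sim = np.array([[1.0, 0.2, 0.7],
                [0.2, 1.0, 0.4],
                [0.7, 0.4, 1.0]])

before = pd.DataFrame(sim).idxmax(axis=1).tolist()
print(before)                                # [0, 1, 2]: each user matches itself

np.fill_diagonal(sim, 0)                     # in place: ignore self-similarity
most_similar = pd.DataFrame(sim).idxmax(axis=1)
print(most_similar.tolist())                 # [2, 2, 0]
```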


In [45]:
# For a random sample of 10 users, show the most similar other user
# (labels are 0-based row positions, i.e. user_id - 1)
user_sim_df.idxmax(axis=1).sample(10, random_state = 10)

544    756
309    246
448    893
628    537
284    413
572    693
225    866
567    311
75     176
726    496
dtype: int64

In [46]:
#Function to list the movies rated by both users (user IDs as in the ratings table)
def get_user_similar_movies(user1, user2):
    common_movies = ratings[ratings.user_id == user1].merge(ratings[ratings.user_id == user2], on="movie_id", how="inner")
    return common_movies.merge(movies_df, on='movie_id')
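Here is the same pair of merges on hypothetical data: the inner join on `movie_id` keeps only the movies rated by both users, and pandas adds its default `_x`/`_y` suffixes to the overlapping `user_id` and `rating` columns, which is where the column names in the following output tables come from.

```python
import pandas as pd

# Hypothetical ratings and titles
ratings = pd.DataFrame({"user_id": [1, 1, 1, 2, 2],
                        "movie_id": [10, 20, 30, 20, 30],
                        "rating": [4, 5, 2, 3, 2]})
titles = pd.DataFrame({"movie_id": [10, 20, 30], "movie_title": ["A", "B", "C"]})

# Inner join on movie_id: only movies rated by both user 1 (_x) and user 2 (_y) remain
common = (ratings[ratings.user_id == 1]
          .merge(ratings[ratings.user_id == 2], on="movie_id", how="inner")
          .merge(titles, on="movie_id"))
print(common)
```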

### Movie Predictions

In [47]:
get_user_similar_movies(572, 693)

| | user_id_x | movie_id | rating_x | user_id_y | rating_y | movie_title |
|---|---|---|---|---|---|---|
| 0 | 572 | 300 | 4 | 693 | 2 | Air Force One (1997) |
| 1 | 572 | 289 | 3 | 693 | 3 | Evita (1996) |
| 2 | 572 | 121 | 2 | 693 | 2 | Independence Day (ID4) (1996) |
| 3 | 572 | 9 | 5 | 693 | 3 | Dead Man Walking (1995) |
| 4 | 572 | 222 | 2 | 693 | 2 | Star Trek: First Contact (1996) |


In [48]:
get_user_similar_movies(75, 176)

| | user_id_x | movie_id | rating_x | user_id_y | rating_y | movie_title |
|---|---|---|---|---|---|---|
| 0 | 75 | 240 | 1 | 176 | 4 | Beavis and Butt-head Do America (1996) |
| 1 | 75 | 237 | 2 | 176 | 3 | Jerry Maguire (1996) |
| 2 | 75 | 475 | 5 | 176 | 5 | Trainspotting (1996) |
| 3 | 75 | 952 | 5 | 176 | 2 | Blue in the Face (1995) |
| 4 | 75 | 100 | 5 | 176 | 5 | Fargo (1996) |
| 5 | 75 | 151 | 5 | 176 | 4 | Willy Wonka and the Chocolate Factory (1971) |
| 6 | 75 | 273 | 5 | 176 | 4 | Heat (1995) |
| 7 | 75 | 294 | 3 | 176 | 2 | Liar Liar (1997) |
| 8 | 75 | 111 | 4 | 176 | 4 | Truth About Cats & Dogs, The (1996) |
| 9 | 75 | 405 | 4 | 176 | 2 | Mission: Impossible (1996) |


### Checking movies from item-item similarity

In [49]:
movie_sim = 1 - pairwise_distances(movies_user_df.to_numpy(), metric="correlation")


In [50]:
movie_sim.shape

(1682, 1682)
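With `metric='correlation'`, `pairwise_distances` uses SciPy's correlation distance, which is 1 minus the Pearson correlation, so `1 - pairwise_distances(...)` recovers the Pearson correlation between movies' rating vectors (zeros included). The check below uses a made-up 3 x 4 movie-user matrix and compares against `np.corrcoef`.

```python
import numpy as np
from sklearn.metrics import pairwise_distances

# Hypothetical movie x user matrix (rows are movies, 0 = not rated)
M = np.array([[5, 4, 0, 1],
              [4, 5, 1, 0],
              [1, 0, 5, 4]], dtype=float)

# Correlation distance = 1 - Pearson correlation, so this gives Pearson correlations
corr_sim = 1 - pairwise_distances(M, metric="correlation")

print(np.allclose(corr_sim, np.corrcoef(M)))   # True
```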

In [51]:
movie_sim_df = pd.DataFrame( movie_sim )

In [52]:
movie_sim_df.head(10)

| | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | ... | 1672 | 1673 | 1674 | 1675 | 1676 | 1677 | 1678 | 1679 | 1680 | 1681 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1.0 | 0.234595 | 0.193362 | 0.226213 | 0.12884 | 0.015113 | 0.347354 | 0.25449 | 0.209502 | 0.104655 | ... | 0.018215 | -0.029676 | -0.029676 | -0.029676 | 0.018215 | -0.029676 | -0.029676 | -0.029676 | 0.034179 | 0.034179 |
| 1 | 0.234595 | 1.0 | 0.190649 | 0.409044 | 0.240712 | 0.030062 | 0.220022 | 0.20602 | 0.077894 | 0.072906 | ... | -0.012451 | -0.012451 | -0.012451 | -0.012451 | -0.012451 | -0.012451 | -0.012451 | -0.012451 | 0.071415 | 0.071415 |
| 2 | 0.193362 | 0.190649 | 1.0 | 0.227849 | 0.141368 | 0.065347 | 0.258855 | 0.078636 | 0.146181 | 0.079608 | ... | -0.009764 | -0.009764 | -0.009764 | -0.009764 | 0.023964 | -0.009764 | -0.009764 | -0.009764 | -0.009764 | 0.091421 |
| 3 | 0.226213 | 0.409044 | 0.227849 | 1.0 | 0.237298 | 0.021878 | 0.295489 | 0.3528 | 0.229922 | 0.13822 | ... | -0.016619 | -0.016619 | 0.088984 | 0.088984 | 0.025622 | -0.016619 | -0.016619 | -0.016619 | 0.046743 | 0.067863 |
| 4 | 0.12884 | 0.240712 | 0.141368 | 0.237298 | 1.0 | -0.008594 | 0.205289 | 0.145866 | 0.142541 | -0.033746 | ... | -0.009889 | -0.009889 | -0.009889 | -0.009889 | -0.009889 | -0.009889 | -0.009889 | -0.009889 | -0.009889 | 0.088618 |
| 5 | 0.015113 | 0.030062 | 0.065347 | 0.021878 | -0.008594 | 1.0 | 0.054415 | 0.01233 | 0.079619 | 0.166084 | ... | -0.005159 | -0.005159 | -0.005159 | -0.005159 | -0.005159 | -0.005159 | -0.005159 | -0.005159 | -0.005159 | -0.005159 |
| 6 | 0.347354 | 0.220022 | 0.258855 | 0.295489 | 0.205289 | 0.054415 | 1.0 | 0.19067 | 0.286572 | 0.178505 | ... | -0.026036 | 0.03992 | -0.026036 | -0.026036 | 0.03992 | -0.026036 | -0.026036 | -0.026036 | 0.03992 | 0.03992 |
| 7 | 0.25449 | 0.20602 | 0.078636 | 0.3528 | 0.145866 | 0.01233 | 0.19067 | 1.0 | 0.229331 | 0.152679 | ... | -0.01723 | 0.075617 | 0.057047 | 0.057047 | 0.075617 | -0.01723 | -0.01723 | -0.01723 | 0.075617 | -0.01723 |
| 8 | 0.209502 | 0.077894 | 0.146181 | 0.229922 | 0.142541 | 0.079619 | 0.286572 | 0.229331 | 1.0 | 0.158373 | ... | -0.021125 | -0.021125 | 0.047273 | 0.047273 | 0.064372 | -0.021125 | -0.021125 | -0.021125 | 0.047273 | 0.064372 |
| 9 | 0.104655 | 0.072906 | 0.079608 | 0.13822 | -0.033746 | 0.166084 | 0.178505 | 0.152679 | 0.158373 | 1.0 | ... | -0.010138 | -0.010138 | 0.073967 | 0.073967 | -0.010138 | -0.010138 | -0.010138 | -0.010138 | -0.010138 | -0.010138 |


In [53]:
# Add each movie's correlation with movie 0 (Toy Story) as a 'similarity' column of movies_df
movies_df['similarity'] = movie_sim_df.iloc[0]
movies_df.columns = ['movieid', 'title', 'similarity']
movies_df.head()

| | movieid | title | similarity |
|---|---|---|---|
| 0 | 1 | Toy Story (1995) | 1.0 |
| 1 | 2 | GoldenEye (1995) | 0.234595 |
| 2 | 3 | Four Rooms (1995) | 0.193362 |
| 3 | 4 | Get Shorty (1995) | 0.226213 |
| 4 | 5 | Copycat (1995) | 0.12884 |


In [54]:
movies_df.sort_values(["similarity"], ascending = False)[1:5]

| | movieid | title | similarity |
|---|---|---|---|
| 49 | 50 | Star Wars (1977) | 0.457677 |
| 120 | 121 | Independence Day (ID4) (1996) | 0.454544 |
| 116 | 117 | Rock, The (1996) | 0.431789 |
| 150 | 151 | Willy Wonka and the Chocolate Factory (1971) | 0.423975 |


In [67]:
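#Function to find the topN movies most similar to a given movie ID
def get_similar_movies(movieid, topN=5):
    # Rows of movie_sim_df and movies_df are both in movie ID order (position = movieid - 1)
    movies_df['similarity'] = movie_sim_df.iloc[movieid - 1]
    # Drop the query movie itself, then keep the topN highest similarities
    top_n = movies_df.drop(index=movieid - 1).sort_values(["similarity"], ascending=False)[:topN]
    print("Similar Movies to", movies_df.title[movieid - 1])
    return top_n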

### Movie Predictions

In [68]:
#Children's movies
get_similar_movies(1066)

Similar Movies to Balto (1995)


| | movieid | title | similarity |
|---|---|---|---|
| 1077 | 1078 | Oliver & Company (1988) | 0.361455 |
| 541 | 542 | Pocahontas (1995) | 0.360779 |
| 945 | 946 | Fox and the Hound, The (1981) | 0.320334 |
| 101 | 102 | Aristocats, The (1970) | 0.31116 |
| 1535 | 1536 | Aiqing wansui (1994) | 0.308954 |


In [69]:
#Action movies
get_similar_movies(2)

Similar Movies to GoldenEye (1995)


| | movieid | title | similarity |
|---|---|---|---|
| 232 | 233 | Under Siege (1992) | 0.611494 |
| 575 | 576 | Cliffhanger (1993) | 0.555861 |
| 160 | 161 | Top Gun (1986) | 0.553483 |
| 61 | 62 | Stargate (1994) | 0.548701 |
| 384 | 385 | True Lies (1994) | 0.547434 |


In [70]:
#Drama movies
get_similar_movies(693)

Similar Movies to Casino (1995)


| | movieid | title | similarity |
|---|---|---|---|
| 181 | 182 | GoodFellas (1990) | 0.510615 |
| 217 | 218 | Cape Fear (1991) | 0.436671 |
| 10 | 11 | Seven (Se7en) (1995) | 0.433354 |
| 941 | 942 | What's Love Got to Do with It (1993) | 0.409532 |
| 55 | 56 | Pulp Fiction (1994) | 0.391973 |


In [71]:
#Romance movies
get_similar_movies(278)

Similar Movies to Bed of Roses (1996)


| | movieid | title | similarity |
|---|---|---|---|
| 279 | 280 | Up Close and Personal (1996) | 0.394083 |
| 820 | 821 | Mrs. Winterbourne (1996) | 0.383507 |
| 1052 | 1053 | Now and Then (1995) | 0.38129 |
| 814 | 815 | One Fine Day (1996) | 0.35041 |
| 933 | 934 | Preacher's Wife, The (1996) | 0.341928 |


## Conclusion

Pros of Memory-based Collaborative Filtering:
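- It is much easier to implement than many other methods.
- The recommendations it produces look sensible (for example, GoodFellas for Casino and Under Siege for GoldenEye), even though the raw rating predictions above have a high RMSE.
- Unlike simpler algorithms, the user-based variant corrects for each user's average rating by mean-centering the ratings before weighting them.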


Cons of Memory-based Collaborative Filtering:
- It tends to recommend popular items, because they have the most ratings.
- It cannot make accurate predictions for new users or new items that have few or no ratings (the cold-start problem).
