<a href="https://www.kaggle.com/code/rebeccapringle/simple-movie-recommender-system?scriptVersionId=131828151" target="_blank"><img align="left" alt="Kaggle" title="Open in Kaggle" src="https://kaggle.com/static/images/open-in-kaggle.svg"></a>

<div style="padding:20px; 
            color:#FFFFFF;
            margin:10px;
            font-size:220%;
            text-align:center;
            display:fill;
            border-radius:20px;
            border-width: 5px;
            border-style: solid;
            border-color: #CD5C5C;
            background-color:#CD5C5C;
            overflow:hidden;
            font-weight:500">Simple Movie Recommender System</div>

This notebook builds a simple recommender system. It uses item-based collaborative filtering with k-nearest neighbours (KNN) to recommend movies similar to ones the user has previously enjoyed.

The code does the following:
1. Automatically finds recommendations based on previously liked movies
2. Removes movies the user has already rated from the final predictions to avoid repetition

The code could be further improved by:
1. Creating genre-specific recommendations using the genre tags
2. Basing recommendations on the user's most recently watched 5-star films, instead of a random selection of 5-star films
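Both improvements can be sketched in a few lines of pandas. This is a hypothetical illustration, not part of the notebook. It assumes the `genres` and `timestamp` columns are kept instead of dropped. It uses small invented `toy_movies`/`toy_ratings` tables, and the helper names `recent_five_star_ids` and `filter_by_genre` are made up for this example.

```python
import pandas as pd

# Toy tables (invented values). Unlike section 2 of the notebook, the genres
# and timestamp columns are kept because these helpers need them.
toy_movies = pd.DataFrame({
    "movieId": [1, 2, 3, 4],
    "title": ["A (1995)", "B (1995)", "C (1996)", "D (1997)"],
    "genres": ["Comedy", "Drama", "Comedy|Romance", "Action"],
})
toy_ratings = pd.DataFrame({
    "userId": [1, 1, 1, 1],
    "movieId": [1, 2, 3, 4],
    "rating": [5.0, 5.0, 3.0, 5.0],
    "timestamp": [100, 300, 200, 400],
})

def recent_five_star_ids(ratings, user_id, k):
    """Hypothetical helper: movieIds of the user's k most recent 5-star ratings, newest first."""
    five_star = ratings[(ratings["userId"] == user_id) & (ratings["rating"] == 5.0)]
    return five_star.sort_values("timestamp", ascending=False)["movieId"].head(k).tolist()

def filter_by_genre(titles, movies, genre):
    """Hypothetical helper: keep only titles whose pipe-separated genres include `genre`."""
    has_genre = movies["genres"].str.split("|").apply(lambda genres: genre in genres)
    allowed = set(movies.loc[has_genre, "title"])
    return [t for t in titles if t in allowed]

print(recent_five_star_ids(toy_ratings, 1, 2))  # [4, 2]
print(filter_by_genre(["A (1995)", "B (1995)", "C (1996)"], toy_movies, "Comedy"))  # ['A (1995)', 'C (1996)']
```

In the notebook, the IDs from `recent_five_star_ids` could replace the random sample in `user_recommendations`. `filter_by_genre` could then be applied to the final recommendation lists.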

# 1. Import basic packages

In [1]:
import numpy as np
import pandas as pd

# 2. Prepare the datasets

Import the datasets, keeping only the ID columns, movie titles and ratings.

In [2]:
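# keep movieId and title; genres is not used by the recommender
movies = pd.read_csv("../input/movie-lens-small-latest-dataset/movies.csv").drop(columns='genres')
# keep userId, movieId and rating; timestamp is not used
ratings = pd.read_csv("../input/movie-lens-small-latest-dataset/ratings.csv").drop(columns='timestamp')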
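`read_csv` can also skip unwanted columns at load time with `usecols`, so they never need to be dropped afterwards. The inline CSV below is invented and only mimics the column layout of movies.csv. The sketch shows that the keyword form `drop(columns=...)` and `usecols` give the same DataFrame.

```python
import io
import pandas as pd

# A short inline CSV with the same column layout as movies.csv (rows invented)
csv_text = """movieId,title,genres
1,Movie A (1995),Comedy
2,Movie B (1996),Drama|Romance
"""

# Option 1: load every column, then drop the unwanted one by name
dropped = pd.read_csv(io.StringIO(csv_text)).drop(columns="genres")

# Option 2: tell read_csv to load only the columns that are needed
selected = pd.read_csv(io.StringIO(csv_text), usecols=["movieId", "title"])

print(list(selected.columns))  # ['movieId', 'title']
print(dropped.equals(selected))  # True
```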



In [3]:
movies.head()

|   | movieId | title |
|---|---|---|
| 0 | 1 | Toy Story (1995) |
| 1 | 2 | Jumanji (1995) |
| 2 | 3 | Grumpier Old Men (1995) |
| 3 | 4 | Waiting to Exhale (1995) |
| 4 | 5 | Father of the Bride Part II (1995) |


In [4]:
ratings.head()

|   | userId | movieId | rating |
|---|---|---|---|
| 0 | 1 | 1 | 4.0 |
| 1 | 1 | 3 | 4.0 |
| 2 | 1 | 6 | 4.0 |
| 3 | 1 | 47 | 5.0 |
| 4 | 1 | 50 | 5.0 |


Build a rating matrix in which each row is a movie and each column is a user. Each cell holds that user's rating of the movie, or 0 where the user has not rated it.

In [5]:
rating_matrix = ratings.pivot(index = 'movieId', columns = 'userId', values = 'rating').fillna(0)
display(rating_matrix.head())

| movieId \ userId | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | ... | 601 | 602 | 603 | 604 | 605 | 606 | 607 | 608 | 609 | 610 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 4.0 | 0.0 | 0.0 | 0.0 | 4.0 | 0.0 | 4.5 | 0.0 | 0.0 | 0.0 | ... | 4.0 | 0.0 | 4.0 | 3.0 | 4.0 | 2.5 | 4.0 | 2.5 | 3.0 | 5.0 |
| 2 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 4.0 | 0.0 | 4.0 | 0.0 | 0.0 | ... | 0.0 | 4.0 | 0.0 | 5.0 | 3.5 | 0.0 | 0.0 | 2.0 | 0.0 | 0.0 |
| 3 | 4.0 | 0.0 | 0.0 | 0.0 | 0.0 | 5.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 2.0 | 0.0 | 0.0 |
| 4 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 3.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 5 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 5.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 3.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
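Here is the same `pivot` + `fillna(0)` step on a tiny invented ratings table, to make the reshaping concrete. `toy_ratings` and `toy_matrix` are illustrative names. Each (movieId, userId) pair becomes one cell, and pairs without a rating become 0.

```python
import pandas as pd

# Toy ratings in the notebook's (userId, movieId, rating) layout; values are invented
toy_ratings = pd.DataFrame({
    "userId":  [1, 1, 2, 3],
    "movieId": [10, 20, 20, 30],
    "rating":  [4.0, 5.0, 3.0, 2.5],
})

# One row per movie, one column per user; a missing (movie, user) pair becomes 0
toy_matrix = toy_ratings.pivot(index="movieId", columns="userId", values="rating").fillna(0)
print(toy_matrix)
```

`pivot` raises an error if the same (movieId, userId) pair appears twice. In that case, `pivot_table` with an aggregation function would be the alternative.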


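Use `scipy.sparse` to store the matrix in compressed sparse row (CSR) format. The matrix stays just as sparse, since most cells are 0 (unrated). CSR simply stores only the non-zero ratings and their positions, which saves memory and makes the similarity computations cheaper.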

In [6]:
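from scipy.sparse import csr_matrix

# store only the non-zero ratings; the model is fitted on this sparse copy
rating_matrix2 = csr_matrix(rating_matrix.values)

# print the dense DataFrame for reference
print(rating_matrix)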

userId   1    2    3    4    5    6    7    8    9    10   ...  601  602  603  \
movieId                                                    ...                  
1        4.0  0.0  0.0  0.0  4.0  0.0  4.5  0.0  0.0  0.0  ...  4.0  0.0  4.0   
2        0.0  0.0  0.0  0.0  0.0  4.0  0.0  4.0  0.0  0.0  ...  0.0  4.0  0.0   
3        4.0  0.0  0.0  0.0  0.0  5.0  0.0  0.0  0.0  0.0  ...  0.0  0.0  0.0   
4        0.0  0.0  0.0  0.0  0.0  3.0  0.0  0.0  0.0  0.0  ...  0.0  0.0  0.0   
5        0.0  0.0  0.0  0.0  0.0  5.0  0.0  0.0  0.0  0.0  ...  0.0  0.0  0.0   
...      ...  ...  ...  ...  ...  ...  ...  ...  ...  ...  ...  ...  ...  ...   
193581   0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  ...  0.0  0.0  0.0   
193583   0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  ...  0.0  0.0  0.0   
193585   0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  ...  0.0  0.0  0.0   
193587   0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  ...  0.0  0.0  0.0   
193609   0.0  0.0  0.0  0.0 
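A 3×3 example shows what CSR actually keeps. The `dense` array is invented for illustration. Slicing one row of a CSR matrix, as `rating_matrix2[id_of_movie]` does in section 4, returns a 1×n sparse matrix.

```python
import numpy as np
from scipy.sparse import csr_matrix

# A small dense rating matrix: rows are movies, columns are users, 0 means "not rated"
dense = np.array([
    [4.0, 0.0, 0.0],
    [0.0, 0.0, 5.0],
    [3.0, 4.0, 0.0],
])
sparse = csr_matrix(dense)

print(sparse.nnz)      # 4 stored (non-zero) ratings instead of 9 cells
print(sparse.data)     # the non-zero values, row by row
print(sparse.indices)  # the column (user) index of each stored value
print(sparse.indptr)   # row i's values live in data[indptr[i]:indptr[i + 1]]
print(sparse[2].shape) # a single row stays a 1 x 3 sparse matrix
```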

# 3. Define the model

In [7]:
from sklearn.neighbors import NearestNeighbors

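# cosine distance between movies' rating vectors; 'brute' compares the query with every movie
# (n_neighbors=20 is only a default; the functions below pass their own value)
model = NearestNeighbors(metric='cosine', algorithm='brute', n_neighbors=20)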
model.fit(rating_matrix2)
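With `metric='cosine'`, the distance between two movies is one minus the cosine similarity of their rating vectors. Movies rated highly by the same users therefore end up close together. The toy matrix below uses invented values. It also shows why the next function drops the first neighbour: a query movie is its own nearest neighbour, at distance 0.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

# Toy item vectors (rows = movies, columns = users); values invented for illustration
items = np.array([
    [5.0, 4.0, 0.0, 0.0],   # movie 0
    [4.0, 5.0, 0.0, 0.0],   # movie 1: rated by the same users as movie 0
    [0.0, 0.0, 5.0, 4.0],   # movie 2: rated by a disjoint set of users
])

nn = NearestNeighbors(metric="cosine", algorithm="brute")
nn.fit(items)
distances, indices = nn.kneighbors(items[[0]], n_neighbors=3)

# cosine distance = 1 - cosine similarity, so the query itself comes first at distance 0
print(indices)    # neighbour row positions, closest first
print(distances)
```

Movie 1's cosine similarity to movie 0 is 40/41, giving a distance of 1/41 ≈ 0.024. Movie 2 shares no raters with movie 0, so its distance is 1.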

# 4. Create a function that takes a movie title and the number of recommendations wanted, and returns the movies with the most similar ratings

In [8]:
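from fuzzywuzzy import process  # fuzzy string matching, used to find the closest title in `movies`

def KNN_item_recommender(movie_title, no_of_recommendations, model):
    # row position of the best-matching title in `movies`
    # (extractOne returns a (title, score, index) tuple when the choices are a Series)
    id_of_movie = process.extractOne(movie_title, movies['title'])[2]

    # get the closest neighbours of that row of the rating matrix;
    # note: this treats row i of `movies` as row i of `rating_matrix2`, which only holds
    # if that movie and every movie before it in movies.csv has at least one rating
    distances, indices = model.kneighbors(rating_matrix2[id_of_movie], n_neighbors=no_of_recommendations)

    # drop the first neighbour, normally the movie itself (distance 0),
    # so the function returns no_of_recommendations - 1 titles
    indices = np.delete(indices, 0)

    # look up the titles of the neighbouring movies (same row-position assumption as above)
    return [movies['title'][i] for i in indices]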

In [9]:
KNN_item_recommender('Toy Story', 10, model)

["'night Mother (1986)",
 'Jurassic Park (1993)',
 'Independence Day (a.k.a. ID4) (1996)',
 'Star Wars: Episode IV - A New Hope (1977)',
 'Forrest Gump (1994)',
 'Lion King, The (1994)',
 "Once Upon a Time in the West (C'era una volta il West) (1968)",
 'Mission: Impossible (1996)',
 'Diva (1981)']
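`KNN_item_recommender` uses the row position returned by `process.extractOne` both as a row of `movies` and as a row of `rating_matrix2`. The rating matrix only contains movies with at least one rating. So if any movie in movies.csv has no ratings, the two orderings drift apart after it. A hypothetical variant can avoid this by routing every lookup through `movieId`. This sketch makes several assumptions:

- It uses `difflib.get_close_matches` from the standard library in place of fuzzywuzzy.
- It runs on invented toy data in which movieId 3 has no ratings.
- Its function name and parameters are made up.
- It asks for one extra neighbour, so it returns the full number of recommendations.

```python
import difflib
import pandas as pd
from sklearn.neighbors import NearestNeighbors

# Toy data (invented): movieId 3 has no ratings, so it is missing from the rating matrix
toy_movies = pd.DataFrame({
    "movieId": [1, 2, 3, 4, 5],
    "title": ["Alpha (1990)", "Beta (1991)", "Gamma (1992)", "Delta (1993)", "Epsilon (1994)"],
})
toy_ratings = pd.DataFrame({
    "userId":  [1, 1, 2, 2, 2, 3, 3, 3],
    "movieId": [1, 2, 1, 2, 5, 1, 4, 5],
    "rating":  [5.0, 4.0, 4.0, 5.0, 2.0, 1.0, 5.0, 4.0],
})
toy_matrix = toy_ratings.pivot(index="movieId", columns="userId", values="rating").fillna(0)

toy_model = NearestNeighbors(metric="cosine", algorithm="brute").fit(toy_matrix.values)
title_of = toy_movies.set_index("movieId")["title"]  # movieId -> title

def knn_item_recommender_by_id(movie_title, n_recommendations, model, matrix, titles):
    """Hypothetical variant: map title -> movieId -> matrix row, and back again."""
    # difflib (standard library) stands in for fuzzywuzzy here
    best = difflib.get_close_matches(movie_title, list(titles), n=1, cutoff=0.0)[0]
    movie_id = titles.index[titles == best][0]
    row = matrix.index.get_loc(movie_id)  # row position in the rating matrix
    _, idx = model.kneighbors(matrix.values[[row]], n_neighbors=n_recommendations + 1)
    # skip the query row itself, then translate matrix rows back to movieIds
    neighbour_ids = [matrix.index[i] for i in idx[0] if i != row][:n_recommendations]
    return [titles.loc[m] for m in neighbour_ids]

print(knn_item_recommender_by_id("Alpha", 2, toy_model, toy_matrix, title_of))
# ['Beta (1991)', 'Epsilon (1994)']
```

In this toy data, "Delta (1993)" sits at row 3 of `toy_movies` but at row 2 of `toy_matrix`. A purely positional lookup would pick the wrong rating vector.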

# 5. Create a function that recommends movies based on a random sample of the user's 5-star ratings

In [10]:
import random

def user_recommendations(user_id, model):
    # this user's rating of every movie (0 = not rated), indexed by movieId
    user_ratings = rating_matrix[user_id]

    # movieIds the user rated 5 stars
    top_user_ratings = user_ratings[user_ratings > 4.9]

    # movieIds of every movie the user has rated
    all_user_ratings = user_ratings[user_ratings > 0]
    a = list(all_user_ratings.index)

    # randomly sample up to 10 of the user's 5-star movies
    # (random.sample raises ValueError if asked for more items than exist)
    b = list(top_user_ratings.index)
    b = random.sample(b, min(10, len(b)))

    # caution: `i` below is a movieId, but `iat` selects by row position, so in general
    # these lookups do not return the title of movie `i` (and fail past the last row)
    # titles of the sampled 5-star movies
    titles = [str(movies.iat[i, 1]) for i in b]

    # titles of all movies the user has rated
    titles_all = [str(movies.iat[i, 1]) for i in a]

    # neighbours of each sampled movie, minus any the user has already rated
    recommendations = []
    for i in titles:
        c = list(KNN_item_recommender(i, 5, model))
        d = [x for x in c if x not in titles_all]
        recommendations.append(d)

    return recommendations
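Two steps of `user_recommendations` can be tightened. First, titles should be looked up by movieId (the index of `rating_matrix`), not by row position. Second, the sample size should not exceed the number of 5-star movies. The helpers below sketch those two steps under stated assumptions. The names `pick_five_star_titles` and `drop_already_rated` and the toy data are illustrative inventions. The optional `seed` makes the random sample reproducible.

```python
import random
import pandas as pd

# Toy inputs (invented): a movies table and a movie x user rating matrix with 0 = unrated
toy_movies = pd.DataFrame({
    "movieId": [10, 20, 30, 40],
    "title": ["W (2001)", "X (2002)", "Y (2003)", "Z (2004)"],
})
toy_matrix = pd.DataFrame(
    {1: [5.0, 5.0, 3.0, 0.0], 2: [0.0, 4.0, 5.0, 5.0]},
    index=pd.Index([10, 20, 30, 40], name="movieId"),
)

def pick_five_star_titles(user_id, matrix, movies, k, seed=None):
    """Hypothetical helper: sample up to k of the user's 5-star titles, plus all rated titles."""
    title_of = movies.set_index("movieId")["title"]  # look titles up by movieId, not row position
    user_ratings = matrix[user_id]
    five_star_ids = list(user_ratings[user_ratings == 5.0].index)
    rated_titles = set(title_of.loc[user_ratings[user_ratings > 0].index])
    rng = random.Random(seed)
    # min() avoids the ValueError random.sample raises when k exceeds the population
    sampled = rng.sample(five_star_ids, min(k, len(five_star_ids)))
    return [title_of.loc[m] for m in sampled], rated_titles

def drop_already_rated(candidates, rated_titles):
    """Remove titles the user has already rated, keeping the original order."""
    return [t for t in candidates if t not in rated_titles]

seeds, rated = pick_five_star_titles(1, toy_matrix, toy_movies, k=10, seed=0)
print(sorted(seeds))                                       # ['W (2001)', 'X (2002)']
print(drop_already_rated(["X (2002)", "Z (2004)"], rated))  # ['Z (2004)']
```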


In [11]:
display(user_recommendations(1, model))

[['Syriana (2005)',
  'Night and Fog (Nuit et brouillard) (1955)',
  "Children's Hour, The (1961)"],
 ["Long Night's Journey Into Day (2000)",
  'Indiana Jones and the Temple of Doom (1984)',
  'Art of War, The (2000)',
  'Forbidden Games (Jeux interdits) (1952)'],
 ['Dutch (1991)',
  'Return of the Living Dead, The (1985)',
  'Pet Sematary (1989)',
  'Waiting for Guffman (1996)'],
 ['Training Day (2001)',
  'Hustler, The (1961)',
  'Lamerica (1994)',
  'My Man Godfrey (1936)'],
 ['Taps (1981)',
  'Last King of Scotland, The (2006)',
  'American Astronaut, The (2001)'],
 ['Navigator, The (1924)',
  'Moscow Does Not Believe in Tears (Moskva slezam ne verit) (1979)',
  'Strangers on a Train (1951)',
  'Sweet Dreams (1985)'],
 ['Denise Calls Up (1995)',
  'Guns of Navarone, The (1961)',
  'Puppet Master (1989)'],
 ['Lost in Translation (2003)',
  'Fantasticks, The (1995)',
  'Tuesdays with Morrie (1999)',
  'Tale of Two Cities, A (1935)'],
 ["Cheech and Chong's Up in Smoke (1978)",
  'Bra