## What Are Recommendation Systems ?
Recommender systems are the systems that are designed to recommend things to the user based on many different factors. These systems predict the most likely product that the users are most likely to purchase and are of interest to. Companies like Netflix, Amazon, etc. use recommender systems to help their users to identify the correct product or movies for them.

In this project I will develop a Movie Recommendation Engine on a popular MovieLens Dataset to recommend Ideal Movies to its Ideal viewers. Lets start this project with importing datasets.

### IMPORTING DATASET

In [1]:
import os
import pandas as pd
import numpy as np
from sklearn.preprocessing import LabelEncoder
from sklearn.neighbors import NearestNeighbors

In [2]:
users = pd.read_csv("./data/users.csv", sep=";")

movies = pd.read_csv('./data/movies.csv',sep=';',encoding='latin-1').drop('Unnamed: 3',axis=1)

ratings = pd.read_csv("./data/ratings.csv", sep=";")

In [3]:
print("User contains Nan")
print(users.isna().sum())
print()

print("Movies contains Nan")
print(movies.isna().sum())
print()

print("Ratings contains Nan")
print(ratings.isna().sum())
print()

User contains Nan
userId        0
gender        0
age           0
occupation    0
zip-code      0
dtype: int64

Movies contains Nan
movieId    0
title      0
genres     0
dtype: int64

Ratings contains Nan
userId       0
movieId      0
rating       0
timestamp    0
dtype: int64



In [4]:
ratings.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1000209 entries, 0 to 1000208
Data columns (total 4 columns):
 #   Column     Non-Null Count    Dtype
---  ------     --------------    -----
 0   userId     1000209 non-null  int64
 1   movieId    1000209 non-null  int64
 2   rating     1000209 non-null  int64
 3   timestamp  1000209 non-null  int64
dtypes: int64(4)
memory usage: 30.5 MB


In [5]:
ratings.head(6)

Unnamed: 0,userId,movieId,rating,timestamp
0,1,1193,5,978300760
1,1,661,3,978302109
2,1,914,3,978301968
3,1,3408,4,978300275
4,1,2355,5,978824291
5,1,1197,3,978302268


In [6]:
movies.head()

Unnamed: 0,movieId,title,genres
0,1,Toy Story (1995),Animation|Children's|Comedy
1,2,Jumanji (1995),Adventure|Children's|Fantasy
2,3,Grumpier Old Men (1995),Comedy|Romance
3,4,Waiting to Exhale (1995),Comedy|Drama
4,5,Father of the Bride Part II (1995),Comedy


In [8]:
ratings

Unnamed: 0,userId,movieId,rating,timestamp
0,1,1193,5,978300760
1,1,661,3,978302109
2,1,914,3,978301968
3,1,3408,4,978300275
4,1,2355,5,978824291
...,...,...,...,...
1000204,6040,1091,1,956716541
1000205,6040,1094,5,956704887
1000206,6040,562,5,956704746
1000207,6040,1096,4,956715648


### IMPLEMENTING RECOMMENDER BASED ON USER RATING ALONE 

These are the users whose movie preferences are most similar to those of the target user.
What we are doing here is simple, we specify a target user, and get his/her ratings of previous movies (even if the user hasnt rated them before)
we find users that have simlar preferences and recommend those preferences to the target user

In [111]:
user_movie_rating_matrix = ratings.pivot(index='userId', columns='movieId', values='rating').fillna(0)

nn_model = NearestNeighbors(metric="cosine")
nn_model.fit(user_movie_rating_matrix.values)

In [112]:
user_movie_rating_matrix.head()

movieId,1,2,3,4,5,6,7,8,9,10,...,3943,3944,3945,3946,3947,3948,3949,3950,3951,3952
userId,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
1,5.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
5,0.0,0.0,0.0,0.0,0.0,2.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


In [160]:
class Recommender():
    
    def __init__(self, target_user_id:int):
    
        self.target_user_id = target_user_id
        
    def recomemend_movie_ids_based_on_rating_alone(self):
        target_user_movie_ratings = user_movie_rating_matrix.loc[self.target_user_id].values.reshape(1, -1)
        
        #This returns the indices of the simlar users
        distances, neighbors = nn_model.kneighbors(target_user_movie_ratings, n_neighbors=6)

        # GEt the User ids that have the same movie prefreces as the target user
        nearest_user_ids = user_movie_rating_matrix.iloc[neighbors[0][1:]].index.tolist()
        
        # Generate movie recommendations based on nearest neighbors' preferences
        #This will return a series where each movie recommended will take the average ratings
        recommended_movies = user_movie_rating_matrix.loc[nearest_user_ids].mean(axis=0)
        
        #GEt all recommended movieids whose average rating is greater than or equal 3
        recommended_movies = recommended_movies[recommended_movies == 0].index.tolist()
        
        return recommended_movies
    

    
def get_users_most_rated_movies(user_id:int):
    
    user_most_rated = ratings.loc[ratings["userId"] == user_id]
    
    user_most_rated = user_most_rated.loc[user_most_rated["rating"] > 4]
    
    
    result = pd.merge(user_most_rated, movies, on='movieId', how='inner')
    
    return result.drop("timestamp", axis=1)


def get_recommened_movies_from_id(movie_ids):
    
    return movies[movies['movieId'].isin(movie_ids)]
        
        
user_id = 1

recommender = Recommender(target_user_id=user_id)


get_recommened_movies_from_id(recommender.recomemend_movie_ids_based_on_rating_alone()).iloc[0:15]

Unnamed: 0,movieId,title,genres
2,3,Grumpier Old Men (1995),Comedy|Romance
3,4,Waiting to Exhale (1995),Comedy|Drama
4,5,Father of the Bride Part II (1995),Comedy
5,6,Heat (1995),Action|Crime|Thriller
7,8,Tom and Huck (1995),Adventure|Children's
8,9,Sudden Death (1995),Action
9,10,GoldenEye (1995),Action|Adventure|Thriller
11,12,Dracula,Dead and Loving It (1995)
12,13,Balto (1995),Animation|Children's
13,14,Nixon (1995),Drama


### IMPLEMENTING RECOMMENDER BASED ON MOVIE RATINGS AND GENRE

We recommend movies based on the ratings and or the genre of the target movie a user specifies, we get the target movie and it retuen similar movies base on the genre and ratings

In [10]:
movies.loc[movies['movieId'] == 4, 'genres']

3    Comedy|Drama
Name: genres, dtype: object

In [11]:

# Combine the ratings dataframe with the movies dataframe
ratings_with_genre = pd.merge(ratings, movies, on='movieId', how='left')

ratings_with_genre = ratings_with_genre.drop(["timestamp", "userId"], axis=1)




In [12]:
ratings_with_genre

Unnamed: 0,movieId,rating,title,genres
0,1193,5,One Flew Over the Cuckoo's Nest (1975),Drama
1,661,3,James and the Giant Peach (1996),Animation|Children's|Musical
2,914,3,My Fair Lady (1964),Musical|Romance
3,3408,4,Erin Brockovich (2000),Drama
4,2355,5,"Bug's Life, A (1998)",Animation|Children's|Comedy
...,...,...,...,...
1000204,1091,1,Weekend at Bernie's (1989),Comedy
1000205,1094,5,"Crying Game, The (1992)",Drama|Romance|War
1000206,562,5,Welcome to the Dollhouse (1995),Comedy|Drama
1000207,1096,4,Sophie's Choice (1982),Drama


In [14]:
# Here We are cleaning the Genre column, only allow rows with a valid genre


# List of valid genres (you can modify this list based on your actual data)
valid_genres = ['Action', 'Adventure', 'Animation', 'Children', 'Comedy', 'Crime', 'Documentary', 'Drama',
                'Fantasy', 'Film-Noir', 'Horror', 'Musical', 'Mystery', 'Romance', 'Sci-Fi', 'Thriller', 'War', 'Western']

# Create a boolean mask to identify rows with valid genres
valid_mask = ratings_with_genre['genres'].apply(lambda x: all(genre in valid_genres for genre in x.split('|')))

# Filter the DataFrame using the boolean mask
ratings_with_genre = ratings_with_genre[valid_mask]


In [16]:
ratings_with_genre["genres"].unique()

array(['Drama', 'Musical|Romance', 'Action|Adventure|Comedy|Romance',
       'Action|Adventure|Drama', 'Comedy|Drama', 'Musical', 'Comedy',
       'Comedy|Fantasy', 'Comedy|Sci-Fi', 'Drama|War', 'Romance',
       'Drama|Romance', 'Animation|Comedy|Thriller', 'Thriller',
       'Action|Crime|Romance', 'Action|Drama|War', 'Crime|Drama|Thriller',
       'Action|Crime|Drama', 'Crime|Drama',
       'Action|Adventure|Sci-Fi|Thriller', 'Action|Thriller',
       'Action|Drama', 'Comedy|Drama|Western', 'Action|Adventure|Crime',
       'Action|Crime|Mystery|Thriller', 'Comedy|Drama|Romance',
       'Comedy|Drama|War', 'Drama|Sci-Fi', 'Action|Drama|Thriller',
       'Action|Comedy|Western', 'Adventure|Comedy|Drama',
       'Drama|Thriller', 'Comedy|Romance',
       'Action|Drama|Romance|Thriller', 'Action|Crime|Thriller',
       'Action|Sci-Fi', 'Action|Romance|War',
       'Adventure|Drama|Romance|Sci-Fi', 'Action|Adventure|Sci-Fi',
       'Drama|Romance|War', 'Action|Drama|Romance',
       'Cri

### LETS MAKE A COPY OF THE RATINGS WITH GENRE DATAFRAME, AND DO TWO SPECIAL CASES

* One where the all the movies in the dataframe re only allowed on genre
* Another case is where we just consider the dataframe as it is, but this way might hinder performamce, i.e if the movie has
a genre of Comedy, Drama and Action our model will only recommend movies of that genre

##### FIRST CASE

In [99]:
first_case_ratings_with_genre = ratings_with_genre.copy

In [100]:
from sklearn.preprocessing import LabelEncoder
first_case_ratings_with_genre = ratings_with_genre.copy()

label_encoder = LabelEncoder()


genre_labels = first_case_ratings_with_genre['genres'].str.split('|', expand=True).stack().unique()

# Create a dictionary mapping genre labels to integer labels
genre_label_mapping = {genre: idx for idx, genre in enumerate(genre_labels)}

# Apply the mapping to the DataFrame
first_case_ratings_with_genre['encoded_genres'] = first_case_ratings_with_genre['genres'].apply(lambda x: [genre_label_mapping[genre] for genre in x.split('|')])



In [101]:
first_case_ratings_with_genre

Unnamed: 0,movieId,rating,title,genres,encoded_genres
0,1193,5,One Flew Over the Cuckoo's Nest (1975),Drama,[0]
2,914,3,My Fair Lady (1964),Musical|Romance,"[1, 2]"
3,3408,4,Erin Brockovich (2000),Drama,[0]
5,1197,3,"Princess Bride, The (1987)",Action|Adventure|Comedy|Romance,"[3, 4, 5, 2]"
6,1287,5,Ben-Hur (1959),Action|Adventure|Drama,"[3, 4, 0]"
...,...,...,...,...,...
1000203,1090,3,Platoon (1986),Drama|War,"[0, 8]"
1000204,1091,1,Weekend at Bernie's (1989),Comedy,[5]
1000205,1094,5,"Crying Game, The (1992)",Drama|Romance|War,"[0, 2, 8]"
1000206,562,5,Welcome to the Dollhouse (1995),Comedy|Drama,"[5, 0]"


In [103]:
# Get the unique genre labels
unique_genres = sorted(set(genre for sublist in first_case_ratings_with_genre['encoded_genres'] for genre in sublist))

# Create binary columns for each genre
for genre in unique_genres:
    first_case_ratings_with_genre[f'genre_{genre}'] = first_case_ratings_with_genre['encoded_genres'].apply(lambda x: int(genre in x))

In [105]:
# We drop the previous "encoded_genres"

first_case_ratings_with_genre.drop("encoded_genres", axis=1, inplace=True)

In [106]:
first_case_ratings_with_genre

Unnamed: 0,movieId,rating,title,genres,genre_0,genre_1,genre_2,genre_3,genre_4,genre_5,...,genre_7,genre_8,genre_9,genre_10,genre_11,genre_12,genre_13,genre_14,genre_15,genre_16
0,1193,5,One Flew Over the Cuckoo's Nest (1975),Drama,1,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,914,3,My Fair Lady (1964),Musical|Romance,0,1,1,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,3408,4,Erin Brockovich (2000),Drama,1,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
5,1197,3,"Princess Bride, The (1987)",Action|Adventure|Comedy|Romance,0,0,1,1,1,1,...,0,0,0,0,0,0,0,0,0,0
6,1287,5,Ben-Hur (1959),Action|Adventure|Drama,1,0,0,1,1,0,...,0,0,0,0,0,0,0,0,0,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1000203,1090,3,Platoon (1986),Drama|War,1,0,0,0,0,0,...,0,1,0,0,0,0,0,0,0,0
1000204,1091,1,Weekend at Bernie's (1989),Comedy,0,0,0,0,0,1,...,0,0,0,0,0,0,0,0,0,0
1000205,1094,5,"Crying Game, The (1992)",Drama|Romance|War,1,0,1,0,0,0,...,0,1,0,0,0,0,0,0,0,0
1000206,562,5,Welcome to the Dollhouse (1995),Comedy|Drama,1,0,0,0,0,1,...,0,0,0,0,0,0,0,0,0,0


In [108]:
rating_wth_genre_X = first_case_ratings_with_genre.drop(["title", "genres"], axis=1)

rating_wth_genre_X.set_index("movieId", inplace=True)


rating_wth_genre_X

Unnamed: 0_level_0,rating,genre_0,genre_1,genre_2,genre_3,genre_4,genre_5,genre_6,genre_7,genre_8,genre_9,genre_10,genre_11,genre_12,genre_13,genre_14,genre_15,genre_16
movieId,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1
1193,5,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
914,3,0,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0
3408,4,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1197,3,0,0,1,1,1,1,0,0,0,0,0,0,0,0,0,0,0
1287,5,1,0,0,1,1,0,0,0,0,0,0,0,0,0,0,0,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1090,3,1,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0
1091,1,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0
1094,5,1,0,1,0,0,0,0,0,1,0,0,0,0,0,0,0,0
562,5,1,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0


#### NOW WE CAN BUILD OUR RECOMMENDER