<font size='5'>In this project, we build a recommendation system. Recommendation systems are a collection of algorithms that recommend items to users based on information gathered from the user. These systems have become ubiquitous and are commonly seen in online stores, movie databases and job finders. In this notebook, we will explore content-based recommendation systems and implement a simple version of one using Python and the Pandas library.</font>

#### Import the needed libraries

In [84]:
import pandas as pd

#### Read the data into Pandas dataframes

In [85]:
movies_df = pd.read_csv('movies.csv')
ratings_df = pd.read_csv('ratings.csv')
movies_df.head()

Unnamed: 0,movieId,title,genres
0,1,Toy Story (1995),Adventure|Animation|Children|Comedy|Fantasy
1,2,Jumanji (1995),Adventure|Children|Fantasy
2,3,Grumpier Old Men (1995),Comedy|Romance
3,4,Waiting to Exhale (1995),Comedy|Drama|Romance
4,5,Father of the Bride Part II (1995),Comedy


#### Let's also extract the year from the **title** column into a new **year** column using pandas string functions, then remove it from the title

In [86]:
# First capture the parenthesized year, e.g. "(1995)", then keep only its digits
movies_df['year'] = movies_df.title.str.extract(r'(\(\d\d\d\d\))', expand=False)
movies_df['year'] = movies_df.year.str.extract(r'(\d\d\d\d)', expand=False)
movies_df['title'] = movies_df.title.str.replace(r'\s*\(\d{4}\)\s*$' , '' , regex=True).str.strip()
movies_df.head()

Unnamed: 0,movieId,title,genres,year
0,1,Toy Story,Adventure|Animation|Children|Comedy|Fantasy,1995
1,2,Jumanji,Adventure|Children|Fantasy,1995
2,3,Grumpier Old Men,Comedy|Romance,1995
3,4,Waiting to Exhale,Comedy|Drama|Romance,1995
4,5,Father of the Bride Part II,Comedy,1995
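
As a small sanity check of the year extraction, the sketch below runs this kind of pattern on a few made-up titles (the `sample` frame is hypothetical and not taken from `movies.csv`). It suggests why matching the parenthesized year at the end of the title may be safer than taking the first four digits found anywhere in it. The single anchored pattern shown here is one option, not necessarily the notebook's method.

```python
import pandas as pd

# Hypothetical titles for illustration only
sample = pd.DataFrame({'title': ['Toy Story (1995)',
                                 '2001: A Space Odyssey (1968)',
                                 'Untitled Movie']})

# Unanchored: picks up the first run of four digits anywhere in the title
first_digits = sample['title'].str.extract(r'(\d{4})', expand=False)

# Anchored: only a four-digit year in parentheses at the end of the title
year = sample['title'].str.extract(r'\((\d{4})\)\s*$', expand=False)

# Strip the trailing "(YYYY)" from the title
clean_title = (sample['title']
               .str.replace(r'\s*\(\d{4}\)\s*$', '', regex=True)
               .str.strip())

print(first_digits.tolist())
print(year.tolist())
print(clean_title.tolist())
```

Titles without a parenthesized year come back as missing values in `year`, so they can be inspected or handled separately.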


#### Split the values in the **genres** column into a **list of genres** to simplify later use

In [87]:
movies_df['genres'] = movies_df.genres.str.split('|')
movies_df.head()

Unnamed: 0,movieId,title,genres,year
0,1,Toy Story,"[Adventure, Animation, Children, Comedy, Fantasy]",1995
1,2,Jumanji,"[Adventure, Children, Fantasy]",1995
2,3,Grumpier Old Men,"[Comedy, Romance]",1995
3,4,Waiting to Exhale,"[Comedy, Drama, Romance]",1995
4,5,Father of the Bride Part II,[Comedy],1995


#### For every row in the dataframe, iterate through the list of genres and place a **1** in the corresponding column, then fill the **NaN** values with **0** to show that a movie doesn't have that column's genre

In [88]:
moviesWithGenres_df = movies_df.copy()

for index,row in movies_df.iterrows():
    for genre in row['genres']:
        moviesWithGenres_df.at[index,genre] = 1

moviesWithGenres_df = moviesWithGenres_df.fillna(0)
moviesWithGenres_df.head()

Unnamed: 0,movieId,title,genres,year,Adventure,Animation,Children,Comedy,Fantasy,Romance,...,Horror,Mystery,Sci-Fi,War,Musical,Documentary,IMAX,Western,Film-Noir,(no genres listed)
0,1,Toy Story,"[Adventure, Animation, Children, Comedy, Fantasy]",1995,1.0,1.0,1.0,1.0,1.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,2,Jumanji,"[Adventure, Children, Fantasy]",1995,1.0,0.0,1.0,0.0,1.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,3,Grumpier Old Men,"[Comedy, Romance]",1995,0.0,0.0,0.0,1.0,0.0,1.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,4,Waiting to Exhale,"[Comedy, Drama, Romance]",1995,0.0,0.0,0.0,1.0,0.0,1.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,5,Father of the Bride Part II,[Comedy],1995,0.0,0.0,0.0,1.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
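
The row-by-row loop above can be slow on larger catalogues. One possible vectorized alternative, sketched here on a tiny hypothetical frame (`toy_df` is not part of the dataset), joins each genre list back into a `|`-separated string and lets `Series.str.get_dummies` build the indicator columns. Note that this variant yields 0/1 indicator columns in alphabetical order rather than the floats and first-seen column order produced by the loop.

```python
import pandas as pd

# Hypothetical miniature of movies_df after the genres were split into lists
toy_df = pd.DataFrame({
    'movieId': [1, 2, 3],
    'title': ['Toy Story', 'Jumanji', 'Grumpier Old Men'],
    'genres': [['Adventure', 'Animation', 'Children', 'Comedy', 'Fantasy'],
               ['Adventure', 'Children', 'Fantasy'],
               ['Comedy', 'Romance']],
})

# Join each list into "A|B|C", then expand into one 0/1 column per genre
genre_flags = toy_df['genres'].str.join('|').str.get_dummies(sep='|')

# Attach the indicator columns to the original frame
toy_with_genres = pd.concat([toy_df, genre_flags], axis=1)
print(toy_with_genres.drop(columns='genres'))
```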


In [89]:
ratings_df.head()

Unnamed: 0,userId,movieId,rating,timestamp
0,1,1,4.0,964982703
1,1,3,4.0,964981247
2,1,6,4.0,964982224
3,1,47,5.0,964983815
4,1,50,5.0,964982931


In [90]:
ratings_df.drop(columns=['timestamp'],inplace=True)
ratings_df.head()

Unnamed: 0,userId,movieId,rating
0,1,1,4.0
1,1,3,4.0
2,1,6,4.0
3,1,47,5.0
4,1,50,5.0


<font size='6'>Content-Based recommendation system</font>



Let's begin by creating an input user to recommend movies to:

In [91]:
userInput = [
    {'title': 'Maze Runner: Scorch Trials', 'rating': 5},
    {'title': 'Pixels', 'rating': 4.5},
    {'title': 'Jumanji', 'rating': 5},
    {'title': 'Train to Busan', 'rating': 5},
    {'title': 'Wrong Turn', 'rating': 5}
]

input_movies = pd.DataFrame(userInput)
input_movies

Unnamed: 0,title,rating
0,Maze Runner: Scorch Trials,5.0
1,Pixels,4.5
2,Jumanji,5.0
3,Train to Busan,5.0
4,Wrong Turn,5.0


#### Add the movieId to the input user's movies

In [92]:
inputId = movies_df[movies_df['title'].isin(input_movies['title'].to_list())]
# Merge on the shared 'title' column to attach each movieId to its rating
input_movies = pd.merge(inputId, input_movies, on='title')
input_movies.drop(columns=['genres', 'year'], inplace=True)
input_movies

Unnamed: 0,movieId,title,rating
0,2,Jumanji,5.0
1,6379,Wrong Turn,5.0
2,117895,Maze Runner: Scorch Trials,5.0
3,135137,Pixels,4.5
4,162082,Train to Busan,5.0
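
Titles typed by hand may not match the catalogue exactly, and an inner merge drops such rows silently. A hedged sketch of how one might spot the misses: a left merge with `indicator=True` on small hypothetical frames (`catalog` and `wanted` are illustrative names, and 'Not A Real Movie' is deliberately absent from the catalogue).

```python
import pandas as pd

# Hypothetical stand-ins for movies_df and the user's rated titles
catalog = pd.DataFrame({'movieId': [2, 6379],
                        'title': ['Jumanji', 'Wrong Turn']})
wanted = pd.DataFrame({'title': ['Jumanji', 'Wrong Turn', 'Not A Real Movie'],
                       'rating': [5.0, 5.0, 3.0]})

# Keep every requested title; _merge records whether the catalogue had a match
merged = wanted.merge(catalog, on='title', how='left', indicator=True)

matched = merged[merged['_merge'] == 'both'].drop(columns='_merge')
missing = merged.loc[merged['_merge'] == 'left_only', 'title'].tolist()

print(matched)
print('No match for:', missing)
```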


#### We start by learning the input user's preferences, so let's get the subset of movies that the user has watched from the dataframe containing genres encoded as binary values

In [93]:
userMovies = moviesWithGenres_df[moviesWithGenres_df['movieId'].isin(input_movies['movieId'].to_list())]
userMovies

Unnamed: 0,movieId,title,genres,year,Adventure,Animation,Children,Comedy,Fantasy,Romance,...,Horror,Mystery,Sci-Fi,War,Musical,Documentary,IMAX,Western,Film-Noir,(no genres listed)
1,2,Jumanji,"[Adventure, Children, Fantasy]",1995,1.0,0.0,1.0,0.0,1.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4362,6379,Wrong Turn,"[Horror, Thriller]",2003,0.0,0.0,0.0,0.0,0.0,0.0,...,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
8604,117895,Maze Runner: Scorch Trials,"[Action, Thriller]",2015,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
8905,135137,Pixels,"[Action, Comedy, Sci-Fi]",2015,0.0,0.0,0.0,1.0,0.0,0.0,...,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
9364,162082,Train to Busan,"[Action, Thriller]",2016,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


#### Clean up: reset the index and drop the columns that are not genre indicators

In [94]:
userMovies = userMovies.reset_index(drop=True)
userGenreTable = userMovies.copy()
userGenreTable.drop(columns=['movieId' , 'title' , 'genres' , 'year'] , inplace=True)
userGenreTable

Unnamed: 0,Adventure,Animation,Children,Comedy,Fantasy,Romance,Drama,Action,Crime,Thriller,Horror,Mystery,Sci-Fi,War,Musical,Documentary,IMAX,Western,Film-Noir,(no genres listed)
0,1.0,0.0,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,0.0,0.0,0.0,1.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


<font size='6'>Creating user profile</font>

Now we're ready to start learning the input's preferences!

In [95]:
# Both tables are ordered by movieId and indexed 0..4, so each rating lines up with its movie
userProfile = userGenreTable.transpose().dot(input_movies['rating'])
userProfile

Adventure              5.0
Animation              0.0
Children               5.0
Comedy                 4.5
Fantasy                5.0
Romance                0.0
Drama                  0.0
Action                14.5
Crime                  0.0
Thriller              15.0
Horror                 5.0
Mystery                0.0
Sci-Fi                 4.5
War                    0.0
Musical                0.0
Documentary            0.0
IMAX                   0.0
Western                0.0
Film-Noir              0.0
(no genres listed)     0.0
dtype: float64
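
To make the dot product concrete, the profile can be recomputed by hand: each genre's weight is the sum of the ratings of the input movies tagged with that genre. The plain-Python sketch below uses the genres and ratings shown in the tables above; the variable names are illustrative.

```python
from collections import defaultdict

# (genres, rating) for each input movie, copied from the tables above
rated = [
    (['Adventure', 'Children', 'Fantasy'], 5.0),  # Jumanji
    (['Horror', 'Thriller'], 5.0),                # Wrong Turn
    (['Action', 'Thriller'], 5.0),                # Maze Runner: Scorch Trials
    (['Action', 'Comedy', 'Sci-Fi'], 4.5),        # Pixels
    (['Action', 'Thriller'], 5.0),                # Train to Busan
]

# Add each movie's rating to every genre it belongs to
profile = defaultdict(float)
for genres, rating in rated:
    for genre in genres:
        profile[genre] += rating

for genre, weight in sorted(profile.items(), key=lambda kv: -kv[1]):
    print(f'{genre:<10} {weight}')
```

Thriller and Action dominate because three of the five rated movies carry each of those tags.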

#### Extract the genre table from the original dataframe

In [96]:
genreTable = moviesWithGenres_df.set_index(moviesWithGenres_df['movieId'])
genreTable.drop(columns=['movieId' , 'title' , 'genres' , 'year'] , inplace=True)
genreTable.head()

movieId,Adventure,Animation,Children,Comedy,Fantasy,Romance,Drama,Action,Crime,Thriller,Horror,Mystery,Sci-Fi,War,Musical,Documentary,IMAX,Western,Film-Noir,(no genres listed)
1,1.0,1.0,1.0,1.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,1.0,0.0,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,0.0,0.0,0.0,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,0.0,0.0,0.0,1.0,0.0,1.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
5,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


#### We take the weighted average of every movie based on the input profile and recommend the top twenty movies that best match it

In [97]:
recommendationTable_df = ((genreTable*userProfile).sum(axis=1)/(userProfile.sum()))
recommendationTable_df.head()

movieId
1    0.333333
2    0.256410
3    0.076923
4    0.076923
5    0.076923
dtype: float64
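
Each score is the share of the profile's total weight that a movie's genres account for. As a check on the first rows of the output, this sketch rebuilds the profile from the values printed earlier (zero-weight genres are omitted for brevity) and scores three movies. `toy_genres` is a hand-built stand-in for `genreTable`.

```python
import pandas as pd

# Non-zero profile weights copied from the userProfile output
user_profile = pd.Series({'Adventure': 5.0, 'Children': 5.0, 'Comedy': 4.5,
                          'Fantasy': 5.0, 'Action': 14.5, 'Thriller': 15.0,
                          'Horror': 5.0, 'Sci-Fi': 4.5})

# Genre indicators for movieId 1, 2 and 3 (genres with zero weight left out)
toy_genres = pd.DataFrame(
    [{'Adventure': 1, 'Children': 1, 'Comedy': 1, 'Fantasy': 1},  # Toy Story
     {'Adventure': 1, 'Children': 1, 'Fantasy': 1},               # Jumanji
     {'Comedy': 1}],                                              # Grumpier Old Men
    index=pd.Index([1, 2, 3], name='movieId'),
).reindex(columns=user_profile.index).fillna(0.0)

# Weight each genre flag by the profile, sum per movie, normalise by total weight
scores = (toy_genres * user_profile).sum(axis=1) / user_profile.sum()
print(scores.round(6))
```

For Toy Story this works out to (5 + 5 + 4.5 + 5) / 58.5 = 19.5 / 58.5, which agrees with the 0.333333 shown in the output.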

In [98]:
recommendationTable_df = recommendationTable_df.sort_values(ascending=False)
recommendationTable_df.head()

movieId
2617      0.837607
72165     0.837607
164226    0.829060
170827    0.760684
2414      0.760684
dtype: float64

#### The recommendation table!

In [99]:
movies_df.loc[movies_df['movieId'].isin(recommendationTable_df.head(20).keys())]

Unnamed: 0,movieId,title,genres,year
1814,2414,Young Sherlock Holmes,"[Action, Adventure, Children, Fantasy, Mystery...",1985
1828,2429,Mighty Joe Young,"[Action, Adventure, Drama, Fantasy, Thriller]",1998
1972,2617,"Mummy, The","[Action, Adventure, Comedy, Fantasy, Horror, T...",1999
2869,3837,Phantasm II,"[Action, Fantasy, Horror, Sci-Fi, Thriller]",1988
5392,8985,Blade: Trinity,"[Action, Fantasy, Horror, Thriller]",2004
5612,27032,Who Am I? (Wo shi shei),"[Action, Adventure, Comedy, Sci-Fi, Thriller]",1998
5673,27683,Tremors 4: The Legend Begins,"[Action, Comedy, Horror, Sci-Fi, Thriller, Wes...",2004
5802,31804,Night Watch (Nochnoy dozor),"[Action, Fantasy, Horror, Mystery, Sci-Fi, Thr...",2004
5980,36509,"Cave, The","[Action, Adventure, Horror, Mystery, Sci-Fi, T...",2005
6076,41569,King Kong,"[Action, Adventure, Drama, Fantasy, Thriller]",2005
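
Filtering with `isin` returns the matches in catalogue order, so the ranking by score is not visible in the table. One way to keep it, sketched on hypothetical toy data (`toy_movies` and `toy_scores` are stand-ins for `movies_df` and `recommendationTable_df`), is to turn the top scores into a named column and merge the movie details onto them.

```python
import pandas as pd

# Hypothetical stand-ins for the movie table and the score Series
toy_movies = pd.DataFrame({'movieId': [10, 20, 30, 40],
                           'title': ['A', 'B', 'C', 'D']})
toy_scores = pd.Series([0.2, 0.9, 0.5, 0.7],
                       index=pd.Index([10, 20, 30, 40], name='movieId'))

top_n = 3

# Take the highest scores and expose them as a 'score' column next to movieId
top_scores = toy_scores.nlargest(top_n).rename('score').reset_index()

# A left merge keeps the row order of top_scores, i.e. best match first
ranked = top_scores.merge(toy_movies, on='movieId', how='left')
print(ranked)
```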
