# Building Recommender Systems

Return to the [castle](https://github.com/Nkluge-correa/teeny-tiny_castle).

**A recommender system, or a recommendation system (sometimes replacing 'system' with a synonym such as a platform or an engine), is a subclass of information filtering systems that provide suggestions for items that are most pertinent to a particular user.**

**Recommender systems are used in a variety of areas, with commonly recognized examples taking the form of playlist generators for video and music services, product recommenders for online stores, or content recommenders for social media platforms and open web content recommenders.**

![recommend](https://miro.medium.com/max/1200/1*E8c4PEwsogQQWJPErGda2A.gif)

**There are three types of recommender systems:**

-   **Demographic Filtering: _This system offers generalized recommendations to every user, based on movie popularity and/or genre. The basic idea is that movies that are more popular and critically acclaimed have a higher probability of being liked by the average audience._**

-   **Content-Based Filtering: _This system suggests similar items based on a particular item. The general idea is that if a person liked a particular item, they will also like an item that is similar to it._**

-   **Collaborative Filtering: _This system matches persons with similar interests and provides recommendations based on this matching._**

**Let's start with *Demographic Filtering*.**

**For this notebook, we are using a couple of datasets located in the `data` folder. The first is [The Movie Database (TMDb)](https://www.kaggle.com/datasets/tmdb/themoviedb.org).**

In [2]:
import pandas as pd

movies = pd.read_csv('data/tmdb_5000_movies.csv')
movie_credits = pd.read_csv('data/tmdb_5000_credits.csv')

movie_credits.columns = ['id','tittle','cast','crew']

data= movies.merge(movie_credits, on='id').drop('tittle', axis = 1)

display(data.head(2))

Unnamed: 0,budget,genres,homepage,id,keywords,original_language,original_title,overview,popularity,production_companies,...,revenue,runtime,spoken_languages,status,tagline,title,vote_average,vote_count,cast,crew
0,237000000,"[{""id"": 28, ""name"": ""Action""}, {""id"": 12, ""nam...",http://www.avatarmovie.com/,19995,"[{""id"": 1463, ""name"": ""culture clash""}, {""id"":...",en,Avatar,"In the 22nd century, a paraplegic Marine is di...",150.437577,"[{""name"": ""Ingenious Film Partners"", ""id"": 289...",...,2787965087,162.0,"[{""iso_639_1"": ""en"", ""name"": ""English""}, {""iso...",Released,Enter the World of Pandora.,Avatar,7.2,11800,"[{""cast_id"": 242, ""character"": ""Jake Sully"", ""...","[{""credit_id"": ""52fe48009251416c750aca23"", ""de..."
1,300000000,"[{""id"": 12, ""name"": ""Adventure""}, {""id"": 14, ""...",http://disney.go.com/disneypictures/pirates/,285,"[{""id"": 270, ""name"": ""ocean""}, {""id"": 726, ""na...",en,Pirates of the Caribbean: At World's End,"Captain Barbossa, long believed to be dead, ha...",139.082615,"[{""name"": ""Walt Disney Pictures"", ""id"": 2}, {""...",...,961000000,169.0,"[{""iso_639_1"": ""en"", ""name"": ""English""}]",Released,"At the end of the world, the adventure begins.",Pirates of the Caribbean: At World's End,6.9,4500,"[{""cast_id"": 4, ""character"": ""Captain Jack Spa...","[{""credit_id"": ""52fe4232c3a36847f800b579"", ""de..."


**First of all, we need to:**

-   **Define a metric to score movies.**
-   **Calculate the score for every movie.**
-   **Sort the scores and recommend the best-rated movie.**

**We could use each movie's average rating as the score, but this would not be fair: a movie with an 8.9 average rating from only three votes cannot be considered better than a movie with a 7.8 average rating from 40 votes.**

**To bypass this issue, we can use the following formula (the `IMDB's weighted rating`):**

$$Weighted\;Rating\;(WR) = (\frac{v}{v+m} \times R) + (\frac{m}{v+m} \times C)$$

**where:**

-   **$v$ is the number of votes for the movie.**
-   **$m$ is the minimum votes required to be listed.**
-   **$R$ is the average rating of the movie.**
-   **$C$ is the mean vote across the whole report.**
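
**As a quick sanity check of the formula, the sketch below applies it to the two hypothetical movies mentioned earlier (8.9 with three votes versus 7.8 with 40 votes). The values `m = 25` and `C = 6.0` are assumptions chosen only for illustration, not values taken from the dataset.**

```python
def weighted_rating(v, R, m, C):
    """Blend the movie's own average R with the global mean C, weighted by vote count."""
    return (v / (v + m)) * R + (m / (v + m)) * C

# Assumed values, for illustration only
m = 25    # minimum number of votes required to be listed
C = 6.0   # mean vote across all movies

few_votes = weighted_rating(v=3, R=8.9, m=m, C=C)
many_votes = weighted_rating(v=40, R=7.8, m=m, C=C)

print(f"8.9 average,  3 votes: {few_votes:.2f}")
print(f"7.8 average, 40 votes: {many_votes:.2f}")
```

**With these assumed values, the movie with only three votes is pulled toward the global mean and ends up ranked below the movie with 40 votes.**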

In [4]:
C = data['vote_average'].mean()

# For a movie to feature in the charts, 
# it must have more votes than at least 
# 90% of the movies in the list

M = data['vote_count'].quantile(0.9)

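# Filter first, then copy, so that adding the 'score' column below works on an independent DataFrame
recommender_list = data.loc[data['vote_count'] >= M].copy()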

def IMDB_weighted_rating(x, M=M, C=C):
    v = x['vote_count']
    R = x['vote_average']

    # Calculation based on the IMDB formula
    return (v/(v+M) * R) + (M/(M+v) * C)

recommender_list['score'] = recommender_list.apply(IMDB_weighted_rating, axis=1)
recommender_list = recommender_list.sort_values('score', ascending=False)

display('Most Popular',recommender_list[['title', 'vote_count', 'vote_average', 'score']])

recommender_list_trending = recommender_list.sort_values('popularity', ascending=False)

display('Trending Now',recommender_list_trending[['title', 'vote_count', 'vote_average', 'popularity']])

'Most Popular'

Unnamed: 0,title,vote_count,vote_average,score
1881,The Shawshank Redemption,8205,8.5,8.059258
662,Fight Club,9413,8.3,7.939256
65,The Dark Knight,12002,8.2,7.920020
3232,Pulp Fiction,8428,8.3,7.904645
96,Inception,13752,8.1,7.863239
...,...,...,...,...
41,Green Lantern,2487,5.1,5.521697
337,A Good Day to Die Hard,3493,5.2,5.507643
193,After Earth,2532,5.0,5.459420
91,Independence Day: Resurgence,2491,4.9,5.406234


'Trending Now'

Unnamed: 0,title,vote_count,vote_average,popularity
546,Minions,4571,6.4,875.581305
95,Interstellar,10867,8.1,724.247784
788,Deadpool,10995,7.4,514.569956
94,Guardians of the Galaxy,9742,7.9,481.098624
127,Mad Max: Fury Road,9427,7.2,434.278564
...,...,...,...,...
101,X-Men: First Class,5181,7.1,3.195174
203,X2,3506,6.8,2.871739
508,The Lost World: Jurassic Park,2487,6.2,2.502487
2511,Home Alone,2414,7.1,2.186927


**Now we have a recommendation system that indicates the best movies "_overall_", and using the "_popularity_" feature, we can list the most recommended movies "_for a given time frame_".**

**However, demographic recommender systems _are not sensitive to the interests and tastes of a particular user_. For this, we need _Content-Based Filtering_.**

**In this recommender system, the content of the movie (_overview, cast, crew, keyword, tagline_, etc.) is used to find its similarity with other movies. Then, the movies that are most likely to be similar are recommended.**

**We can achieve this by assessing the similarity of, for example, the synopses of different movies, and then recommending movies whose synopsis is similar to those the user decides to watch.**

In [5]:
for i in range(3):
    print(f'''{data['original_title'][i]}\n{data['overview'][i]}\n''')

Avatar
In the 22nd century, a paraplegic Marine is dispatched to the moon Pandora on a unique mission, but becomes torn between following orders and protecting an alien civilization.

Pirates of the Caribbean: At World's End
Captain Barbossa, long believed to be dead, has come back to life and is headed to the edge of the Earth with Will Turner and Elizabeth Swann. But nothing is quite as it seems.

Spectre
A cryptic message from Bond’s past sends him on a trail to uncover a sinister organization. While M battles political forces to keep the secret service alive, Bond peels back the layers of deceit to reveal the terrible truth behind SPECTRE.



**For this, we can compute the *"Term Frequency - Inverse Document Frequency"* (TF-IDF) vectors for each synopsis.**

- **Term Frequency (`TF`) is the relative frequency of a word in a document $(\frac{term\;instances}{total\;instances})$.**
- **Inverse Document Frequency (`IDF`) measures how rare a term is across the collection: it is the logarithm of the ratio between the total number of documents and the number of documents containing the term, $\log(\frac{number\;of\;documents}{documents\;with\;term})$.**
- **The overall importance of each word to the document in which it appears is equal to $TF \times IDF$.**

**This will give you a matrix where each column represents a word in the vocabulary and each row represents a movie in the dataset.**
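
**To make these definitions concrete, here is a minimal sketch that computes TF-IDF by hand for three made-up one-line synopses, using only the plain formulas above. The toy corpus and helper names are assumptions for illustration; `scikit-learn`'s `TfidfVectorizer` applies a smoothed IDF and L2-normalizes each row by default, so its numbers will differ from this sketch.**

```python
import math
from collections import Counter

# Made-up synopses, for illustration only
docs = [
    "a marine is sent to a distant moon",
    "a pirate sails to the edge of the earth",
    "a spy uncovers a sinister organization",
]
tokenized = [doc.split() for doc in docs]
vocab = sorted({word for doc in tokenized for word in doc})
n_docs = len(tokenized)

# Number of documents in which each word appears
doc_freq = {word: sum(word in doc for doc in tokenized) for word in vocab}

def tfidf_row(doc):
    """One row of the matrix: TF * IDF for every word in the vocabulary."""
    counts = Counter(doc)
    return [
        (counts[word] / len(doc)) * math.log(n_docs / doc_freq[word])
        for word in vocab
    ]

# Rows are documents, columns are vocabulary words
tfidf_toy = [tfidf_row(doc) for doc in tokenized]

for word in ("a", "to", "moon"):
    col = vocab.index(word)
    print(word, [round(row[col], 3) for row in tfidf_toy])
```

**Note how a word that appears in every document (here, "a") gets a weight of zero, while a word unique to one synopsis (here, "moon") gets the largest weight in that row's document.**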

**Fortunately, `scikit-learn` can do all this heavy lifting for you.**

In [6]:
from sklearn.feature_extraction.text import TfidfVectorizer

tfidf = TfidfVectorizer(stop_words='english')

# Replace NaN with an empty string
data['overview'] = data['overview'].fillna('')

tfidf_matrix = tfidf.fit_transform(data['overview'])

# Output the shape of tfidf_matrix
print(f'Number of movies: {tfidf_matrix.shape[0]}.')
print(f'Size of the vocabulary used to describe them: {tfidf_matrix.shape[1]} words.')

Number of movies: 4803.
Size of the vocabulary used to describe them: 20978 words.


**With the TF-IDF matrix in hand, we can now compute a similarity score. There are several candidates for this, such as:**

- **[Euclidean distance](https://en.wikipedia.org/wiki/Euclidean_distance).**
- **[Manhattan Distance](https://en.wikipedia.org/wiki/Taxicab_geometry).**
- **[Cosine Similarity](https://en.wikipedia.org/wiki/Cosine_similarity).**
- **[Jaccard Similarity](https://en.wikipedia.org/wiki/Jaccard_index).**
- **[Pearson correlation coefficient](https://en.wikipedia.org/wiki/Pearson_correlation_coefficient).**
- **_and many others..._**

**In this notebook, we will be using the cosine similarity score.**
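
**For intuition about how some of the other candidates behave, the sketch below computes three of them on toy inputs using only the standard library. The vectors and token sets are made up for illustration.**

```python
import math

# Two toy feature vectors (assumed values)
a = [1.0, 0.0, 2.0]
b = [2.0, 0.0, 4.0]

# Euclidean distance: straight-line distance between the two points
euclidean = math.dist(a, b)

# Manhattan distance: sum of absolute coordinate differences
manhattan = sum(abs(x - y) for x, y in zip(a, b))

# Jaccard similarity works on sets: shared items divided by all distinct items
tokens_a = {"moon", "mission", "marine"}
tokens_b = {"moon", "mission", "astronaut"}
jaccard = len(tokens_a & tokens_b) / len(tokens_a | tokens_b)

print(f"Euclidean: {euclidean:.3f} | Manhattan: {manhattan:.1f} | Jaccard: {jaccard:.2f}")
```

**Keep in mind that Euclidean and Manhattan are distances (lower means more similar), while Jaccard is a similarity (higher means more similar).**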


$$\text{similarity} = \cos(\theta) = \frac{\textbf{A} \cdot \textbf{B}}{\lVert \textbf{A} \rVert \lVert \textbf{B} \rVert} = \frac{\sum\limits_{i=1}^{n} A_i B_i}{\sqrt{\sum\limits_{i=1}^{n} A_i^2} \sqrt{\sum\limits_{i=1}^{n} B_i^2}}$$

**where:** 

- **$\textbf{A}$ and $\textbf{B}$ are two vectors.** 
- **$\theta$ is the angle between them.** 

**This formula calculates the cosine of the angle between the two vectors, which measures how similar they are in direction. In general, the similarity score ranges from $-1$ to $1$, with values closer to $1$ indicating higher similarity. Because TF-IDF vectors have no negative entries, the scores in this notebook fall between $0$ and $1$. Cosine similarity is commonly used in natural language processing and information retrieval to measure the similarity between two documents or two sets of features.**
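
**The formula can be checked by hand on small vectors. The following sketch implements it directly in plain Python; the function name and the example vectors are assumptions for illustration.**

```python
import math

def cosine(a, b):
    """Cosine of the angle between vectors a and b."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

a = [1.0, 2.0, 0.0]

same_direction = cosine(a, [2.0, 4.0, 0.0])   # b is a scaled copy of a
orthogonal = cosine(a, [0.0, 0.0, 3.0])       # no shared non-zero coordinates
opposite = cosine(a, [-1.0, -2.0, 0.0])       # b points the other way

print(round(same_direction, 6), round(orthogonal, 6), round(opposite, 6))
```

**Only the direction matters: scaling a vector does not change its cosine similarity with another vector, which is convenient when comparing synopses of different lengths.**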


In [7]:
from sklearn.metrics.pairwise import cosine_similarity

# Compute the cosine similarity matrix
cosine_sim = cosine_similarity(tfidf_matrix, tfidf_matrix)

cosine_sim.shape

(4803, 4803)

**What we now have is a similarity matrix, with a $movies \times movies$ format, that tells us the similarity of the synopsis of all the movies with all the others.**

**Now we can define a function that takes in a movie title as an input and outputs a list of the most similar movies.**

In [10]:
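# Map each title to its row index; when a title appears more than once, keep the first entry
indices = pd.Series(data.index, index=data['title'])
indices = indices[~indices.index.duplicated(keep='first')]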

def get_recommendations(title, cosine_sim=cosine_sim):
    idx = indices[title]

    sim_scores = list(enumerate(cosine_sim[idx]))

    sim_scores = sorted(sim_scores, key=lambda x: x[1], reverse=True)

    # Skip position 0 (the movie itself) and keep the next five matches
    sim_scores = sim_scores[1:6]

    movie_indices = [x[0] for x in sim_scores]

    print(f'Recommendations for "{title}"\n{"_" * 100}\n')
    for movie in movie_indices:
        a = data['title'].iloc[movie]
        b = data['overview'].iloc[movie]
        print(f'Title: {a}.\nSynopsis: {b}\n')
    
    

get_recommendations('Avatar')

Recommendations for "Avatar"
____________________________________________________________________________________________________

Title: Apollo 18.
Synopsis: Officially, Apollo 17 was the last manned mission to the moon. But a year later in 1973, three American astronauts were sent on a secret mission to the moon funded by the US Department of Defense. What you are about to see is the actual footage which the astronauts captured on that mission. While NASA denies it's authenticity, others say it's the real reason we've never gone back to the moon.

Title: The American.
Synopsis: Dispatched to a small Italian town to await further orders, assassin Jack embarks on a double life that may be more relaxing than is good for him.

Title: The Matrix.
Synopsis: Set in the 22nd century, The Matrix tells the story of a computer hacker who joins a group of underground insurgents fighting the vast and powerful computers who now rule the earth.

Title: The Inhabited Island.
Synopsis: On the thresho

**Let's use the same technique, only now using different metadata (`['cast', 'crew', 'keywords', 'genres']`). Let us now create a `DataFrame` with only these features.**

In [11]:
from ast import literal_eval
import numpy as np

features = ['cast', 'crew', 'keywords', 'genres']
for feature in features:
    data[feature] = data[feature].apply(literal_eval)

def get_director(x):
    for i in x:
        if i['job'] == 'Director':
            return i['name']
    return np.nan

def get_list(x):
    if isinstance(x, list):
        names = [i['name'] for i in x]
        if len(names) > 3:
            names = names[:3]
        return names
    return []

data['director'] = data['crew'].apply(get_director)

features = ['cast', 'keywords', 'genres']
for feature in features:
    data[feature] = data[feature].apply(get_list)

display(data[['title', 'cast', 'director', 'keywords', 'genres']].head())

Unnamed: 0,title,cast,director,keywords,genres
0,Avatar,"[Sam Worthington, Zoe Saldana, Sigourney Weaver]",James Cameron,"[culture clash, future, space war]","[Action, Adventure, Fantasy]"
1,Pirates of the Caribbean: At World's End,"[Johnny Depp, Orlando Bloom, Keira Knightley]",Gore Verbinski,"[ocean, drug abuse, exotic island]","[Adventure, Fantasy, Action]"
2,Spectre,"[Daniel Craig, Christoph Waltz, Léa Seydoux]",Sam Mendes,"[spy, based on novel, secret agent]","[Action, Adventure, Crime]"
3,The Dark Knight Rises,"[Christian Bale, Michael Caine, Gary Oldman]",Christopher Nolan,"[dc comics, crime fighter, terrorist]","[Action, Crime, Drama]"
4,John Carter,"[Taylor Kitsch, Lynn Collins, Samantha Morton]",Andrew Stanton,"[based on novel, mars, medallion]","[Action, Adventure, Science Fiction]"


**Almost every time we work with text data, some cleaning is necessary. Below, we remove the spaces inside each name and lowercase all strings.**

In [12]:
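def clean_data(x):
    # Lists of names: strip spaces and lowercase every entry
    if isinstance(x, list):
        return [i.replace(" ", "").lower() for i in x]
    # Single string (e.g., the director): same treatment
    if isinstance(x, str):
        return x.replace(" ", "").lower()
    # Missing values (NaN) become an empty string
    return ''

features = ['cast', 'keywords', 'director', 'genres']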

for feature in features:
    data[feature] = data[feature].apply(clean_data)

display(data[['title', 'cast', 'director', 'keywords', 'genres']].head())

Unnamed: 0,title,cast,director,keywords,genres
0,Avatar,"[samworthington, zoesaldana, sigourneyweaver]",jamescameron,"[cultureclash, future, spacewar]","[action, adventure, fantasy]"
1,Pirates of the Caribbean: At World's End,"[johnnydepp, orlandobloom, keiraknightley]",goreverbinski,"[ocean, drugabuse, exoticisland]","[adventure, fantasy, action]"
2,Spectre,"[danielcraig, christophwaltz, léaseydoux]",sammendes,"[spy, basedonnovel, secretagent]","[action, adventure, crime]"
3,The Dark Knight Rises,"[christianbale, michaelcaine, garyoldman]",christophernolan,"[dccomics, crimefighter, terrorist]","[action, crime, drama]"
4,John Carter,"[taylorkitsch, lynncollins, samanthamorton]",andrewstanton,"[basedonnovel, mars, medallion]","[action, adventure, sciencefiction]"


**Now, we will group all these features into one big "_soup feature_", i.e., a giant string containing all information about these separated features.**

In [13]:
def create_soup_feature(x):
    # Join keywords, cast, director, and genres into one space-separated string
    return ' '.join(x['keywords']) + ' ' + ' '.join(x['cast']) + ' ' + x['director'] + ' ' + ' '.join(x['genres'])

data['soup'] = data.apply(create_soup_feature, axis=1)

display(data['soup'][0])

'cultureclash future spacewar samworthington zoesaldana sigourneyweaver jamescameron action adventure fantasy'

**Now we can use the same function we wrote before, changing the similarity matrix, and get recommendations that take into account the director, genre, actors, etc.**

In [14]:
from sklearn.feature_extraction.text import CountVectorizer

count = CountVectorizer(stop_words='english')
count_matrix = count.fit_transform(data['soup'])
cosine_sim2 = cosine_similarity(count_matrix, count_matrix)

get_recommendations('Avatar', cosine_sim2)

Recommendations for "Avatar"
____________________________________________________________________________________________________

Title: Clash of the Titans.
Synopsis: Born of a god but raised as a man, Perseus is helpless to save his family from Hades, vengeful god of the underworld. With nothing to lose, Perseus volunteers to lead a dangerous mission to defeat Hades before he can seize power from Zeus and unleash hell on earth. Battling unholy demons and fearsome beasts, Perseus and his warriors will only survive if Perseus accepts his power as a god, defies fate and creates his own destiny.

Title: The Mummy: Tomb of the Dragon Emperor.
Synopsis: Archaeologist Rick O'Connell travels to China, pitting him against an emperor from the 2,000-year-old Han dynasty who's returned from the dead to pursue a quest for world domination. This time, O'Connell enlists the help of his wife and son to quash the so-called 'Dragon Emperor' and his abuse of supernatural power.

Title: The Monkey King

**The above system is only capable of suggesting movies that are close to a certain movie. That is, it is not capable of capturing tastes and providing recommendations across genres. Also, the system doesn't capture the personal tastes and biases of a user. However, with *Collaborative Filtering* we can address this flaw.**

**In this section, we demonstrate [collaborative filtering](https://en.wikipedia.org/wiki/Collaborative_filtering) using the [MovieLens dataset](https://www.kaggle.com/c/movielens-100k).**

In [18]:
user_data = pd.read_csv('data/movielens_user_ratings.csv')

display(user_data)

Unnamed: 0,userId,movieId,rating,timestamp
0,1,1,4.0,964982703
1,1,3,4.0,964981247
2,1,6,4.0,964982224
3,1,47,5.0,964983815
4,1,50,5.0,964982931
...,...,...,...,...
100831,610,166534,4.0,1493848402
100832,610,168248,5.0,1493850091
100833,610,168250,5.0,1494273047
100834,610,168252,5.0,1493846352


**The steps to build a recommender system with this approach will be:**

- **Map user ID to a "_user vector_".**
- **Map movie ID to a "_movie vector_".**
- **Compute the dot product between the user vector and movie vector (use embedding to predict rating).**
- **Train the embeddings via gradient descent using all known user-movie pairs.**

> **Note: In machine learning, `embeddings` refer to a type of representation for data that maps high-dimensional data points to low-dimensional vectors. These vectors capture the key features or attributes of the data points in a compressed form. For example, in `NLP`, words or syllables can be represented as embeddings that capture their semantic meaning.**
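
**The four steps above can be sketched without any deep learning library. The toy example below is an illustration under stated assumptions, not the model used later in this notebook: it invents six ratings for three users and three movies, uses 2-dimensional vectors, omits biases and the sigmoid, and minimizes squared error with plain stochastic gradient descent.**

```python
import random

random.seed(0)

# Hypothetical toy data: (user index, movie index, rating normalized to [0, 1])
ratings = [(0, 0, 1.0), (0, 1, 0.8), (1, 0, 0.9),
           (1, 2, 0.1), (2, 1, 0.2), (2, 2, 1.0)]
n_users, n_movies, dim = 3, 3, 2

# Steps 1 and 2: every user ID and movie ID maps to a small random vector
user_vecs = [[random.uniform(-0.1, 0.1) for _ in range(dim)] for _ in range(n_users)]
movie_vecs = [[random.uniform(-0.1, 0.1) for _ in range(dim)] for _ in range(n_movies)]

def predict(u, m):
    # Step 3: the predicted rating is the dot product of the two vectors
    return sum(a * b for a, b in zip(user_vecs[u], movie_vecs[m]))

def mse():
    return sum((predict(u, m) - r) ** 2 for u, m, r in ratings) / len(ratings)

loss_before = mse()

# Step 4: gradient descent over all known (user, movie) pairs
lr = 0.1
for _ in range(2000):
    for u, m, r in ratings:
        err = predict(u, m) - r
        for k in range(dim):
            # Compute both gradients before updating either vector
            grad_user = err * movie_vecs[m][k]
            grad_movie = err * user_vecs[u][k]
            user_vecs[u][k] -= lr * grad_user
            movie_vecs[m][k] -= lr * grad_movie

loss_after = mse()
print(f"MSE before training: {loss_before:.4f}")
print(f"MSE after training:  {loss_after:.4f}")
```

**Once trained, `predict(u, m)` can also be called for pairs that never appeared in `ratings`, which is what makes the learned vectors useful for recommendation.**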

In [19]:
user_ids = user_data["userId"].unique().tolist()

user2user_encoded = {x: i for i, x in enumerate(user_ids)}
userencoded2user = {i: x for i, x in enumerate(user_ids)}


movie_ids = user_data["movieId"].unique().tolist()

movie2movie_encoded = {x: i for i, x in enumerate(movie_ids)}
movie_encoded2movie = {i: x for i, x in enumerate(movie_ids)}

user_data["user"] = user_data["userId"].map(user2user_encoded)
user_data["movie"] = user_data["movieId"].map(movie2movie_encoded)
user_data["rating"] = user_data["rating"].values.astype(np.float32)

min_rating = min(user_data["rating"])
max_rating = max(user_data["rating"])

display(user_data)
print(f"Number of users: {len(user2user_encoded)}.")
print(f"Number of Movies: {len(movie_encoded2movie)}.")


Unnamed: 0,userId,movieId,rating,timestamp,user,movie
0,1,1,4.0,964982703,0,0
1,1,3,4.0,964981247,0,1
2,1,6,4.0,964982224,0,2
3,1,47,5.0,964983815,0,3
4,1,50,5.0,964982931,0,4
...,...,...,...,...,...,...
100831,610,166534,4.0,1493848402,609,3120
100832,610,168248,5.0,1493850091,609,2035
100833,610,168250,5.0,1494273047,609,3121
100834,610,168252,5.0,1493846352,609,1392


Number of users: 610.
Number of Movies: 9724.


**Now we shuffle our dataset, separate features and labels, and split the data into a training set and a validation set (90% and 10%, respectively).**

In [20]:
user_data = user_data.sample(frac=1, random_state=42)

x = user_data[["user", "movie"]].values

# Normalize the targets between 0 and 1. 
y = user_data["rating"].apply(lambda x: (x - min_rating) / (max_rating - min_rating)).values

train_indices = int(0.9 * user_data.shape[0])

x_train, x_val, y_train, y_val = (
    x[:train_indices],
    x[train_indices:],
    y[:train_indices],
    y[train_indices:],
)

print('Training set: ', x_train.shape)
print('Validation set: ', x_val.shape)

Training set:  (90752, 2)
Validation set:  (10084, 2)
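
**Because the targets are min-max normalized, a model output in $[0, 1]$ has to be mapped back to the original scale before it can be read as a star rating. A minimal sketch, assuming the ratings range from 0.5 to 5.0 (in the notebook these bounds come from `min_rating` and `max_rating`); the helper names are hypothetical.**

```python
# Assumed bounds, for illustration; the notebook reads them from the data
MIN_RATING, MAX_RATING = 0.5, 5.0

def normalize(rating):
    """Map a star rating to the [0, 1] interval."""
    return (rating - MIN_RATING) / (MAX_RATING - MIN_RATING)

def denormalize(score):
    """Map a score in [0, 1] back to the star-rating scale."""
    return score * (MAX_RATING - MIN_RATING) + MIN_RATING

for stars in (0.5, 4.0, 5.0):
    score = normalize(stars)
    print(f"{stars} stars -> {score:.3f} -> {denormalize(score):.1f} stars")
```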


**We now embed both users and movies into 50-dimensional vectors. The model computes a match score between user and movie embeddings via a dot product and adds a per-movie and per-user bias.**

**Below, we implement a Keras model as a class (by subclassing `keras.Model`), just like in our PyTorch tutorial. The final output of the model is a score between 0 and 1 indicating how likely the user is to "like" a given movie, based on their "_movie history_".**
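
**To see what a single prediction looks like, the sketch below scores one (user, movie) pair by hand: a dot product, plus both biases, squashed by a sigmoid, and then compared with a normalized target using binary cross-entropy. All vectors, biases, and function names are made-up illustrations rather than values from the trained model.**

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def match_score(user_vec, movie_vec, user_bias, movie_bias):
    """Dot product of the two embeddings plus both biases, mapped to (0, 1)."""
    dot = sum(u * m for u, m in zip(user_vec, movie_vec))
    return sigmoid(dot + user_bias + movie_bias)

def binary_crossentropy(y_true, y_pred, eps=1e-7):
    """Cross-entropy between a target in [0, 1] and a predicted score."""
    y_pred = min(max(y_pred, eps), 1 - eps)  # avoid log(0)
    return -(y_true * math.log(y_pred) + (1 - y_true) * math.log(1 - y_pred))

user = [0.5, 1.0]  # hypothetical 2-dimensional user embedding

aligned = match_score(user, [0.4, 0.9], user_bias=0.1, movie_bias=0.2)
opposed = match_score(user, [-0.4, -0.9], user_bias=0.1, movie_bias=0.2)

target = 0.9  # a high normalized rating
print(f"aligned movie: score={aligned:.3f}, loss={binary_crossentropy(target, aligned):.3f}")
print(f"opposed movie: score={opposed:.3f}, loss={binary_crossentropy(target, opposed):.3f}")
```

**A movie whose vector points in the same direction as the user's vector gets a score above 0.5 and a lower loss against a high target, which is the signal gradient descent uses to pull matching embeddings together.**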

In [23]:
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers


class RecommenderNet(keras.Model):
    def __init__(self, num_users, num_movies, embedding_size, **kwargs):
        super(RecommenderNet, self).__init__(**kwargs)
        self.num_users = num_users
        self.num_movies = num_movies
        self.embedding_size = embedding_size
        self.user_embedding = layers.Embedding(
            num_users,
            embedding_size,
            embeddings_initializer="he_normal",
            embeddings_regularizer=keras.regularizers.l2(1e-6),
        )
        self.user_bias = layers.Embedding(num_users, 1)
        self.movie_embedding = layers.Embedding(
            num_movies,
            embedding_size,
            embeddings_initializer="he_normal",
            embeddings_regularizer=keras.regularizers.l2(1e-6),
        )
        self.movie_bias = layers.Embedding(num_movies, 1)

    def call(self, inputs):
        user_vector = self.user_embedding(inputs[:, 0])
        user_bias = self.user_bias(inputs[:, 0])
        movie_vector = self.movie_embedding(inputs[:, 1])
        movie_bias = self.movie_bias(inputs[:, 1])
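        # Per-pair dot product: one match score for each (user, movie) row in the batch.
        # (tf.tensordot(..., 2) would contract over the batch axis too, yielding a single scalar.)
        dot_user_movie = tf.reduce_sum(user_vector * movie_vector, axis=1, keepdims=True)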
        x = dot_user_movie + user_bias + movie_bias
        return tf.nn.sigmoid(x)


model = RecommenderNet(len(user2user_encoded), len(movie_encoded2movie), 50)

model.compile(
    loss=tf.keras.losses.BinaryCrossentropy(),
    optimizer=keras.optimizers.Adam(learning_rate=0.001))

print("Version: ", tf.__version__)
print("Eager mode: ", tf.executing_eagerly())
print("GPU is", "available" if tf.config.list_physical_devices('GPU') else "NOT AVAILABLE")


history = model.fit(
    x=x_train,
    y=y_train,
    batch_size=64,
    epochs=10,
    verbose=1,
    validation_data=(x_val, y_val),
)

# model.save_weights('models/recommender.h5')

Version:  2.10.1
Eager mode:  True
GPU is available
Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


**Now that we have a trained recommendation system, let's see a list of movies to be recommended.**

In [26]:
movies_df = pd.read_csv('data/movielens_movies.csv')

display(movies_df)

Unnamed: 0,movieId,title,genres
0,1,Toy Story (1995),Adventure|Animation|Children|Comedy|Fantasy
1,2,Jumanji (1995),Adventure|Children|Fantasy
2,3,Grumpier Old Men (1995),Comedy|Romance
3,4,Waiting to Exhale (1995),Comedy|Drama|Romance
4,5,Father of the Bride Part II (1995),Comedy
...,...,...,...
9737,193581,Black Butler: Book of the Atlantic (2017),Action|Animation|Comedy|Fantasy
9738,193583,No Game No Life: Zero (2017),Animation|Comedy|Fantasy
9739,193585,Flint (2017),Drama
9740,193587,Bungo Stray Dogs: Dead Apple (2018),Action|Animation


**Let's pick user `42` and see the top recommendations our model produces for them.**

In [28]:
userID = 42

movies_watched_by_user_42 = user_data[user_data.userId == userID]

movies_watched = movies_df[movies_df["movieId"].isin(movies_watched_by_user_42.movieId.values)]["movieId"]

movies_not_watched = movies_df[~movies_df["movieId"].isin(movies_watched_by_user_42.movieId.values)]["movieId"]
movies_not_watched = list(set(movies_not_watched).intersection(set(movie2movie_encoded.keys())))
movies_not_watched = [[movie2movie_encoded.get(x)] for x in movies_not_watched]

user_encoder = user2user_encoded.get(userID)

user_movie_array = np.hstack(([[user_encoder]] * len(movies_not_watched), movies_not_watched))

ratings = model.predict(user_movie_array).flatten()

top_ratings_indices = ratings.argsort()[-10:][::-1]

recommended_movie_ids = [movie_encoded2movie.get(movies_not_watched[x][0]) for x in top_ratings_indices]

top_movies_user = (movies_watched_by_user_42.sort_values(by="rating", ascending=False).head(5).movieId.values)

movie_df_rows = movies_df[movies_df["movieId"].isin(top_movies_user)]

recommended_movies = movies_df[movies_df["movieId"].isin(recommended_movie_ids)]

print(f'''
Recommendations for user: {userID}
{"_" * 100}
''')
for row in movie_df_rows.itertuples():
    print(f"{row.title} | {row.genres}.")

print("\nTop 10 movie recommendations")
print(f'{"_" * 100}\n')
for row in recommended_movies.itertuples():
    print(f"{row.title} | {row.genres}.")
print(f'\n{"_" * 100}')


Recommendations for user: 42
____________________________________________________________________________________________________

American President, The (1995) | Comedy|Drama|Romance.
Right Stuff, The (1983) | Drama.
Gattaca (1997) | Drama|Sci-Fi|Thriller.
Airplane! (1980) | Comedy.
City Slickers (1991) | Comedy|Western.

Top 10 movie recommendations
____________________________________________________________________________________________________

Dr. Strangelove or: How I Learned to Stop Worrying and Love the Bomb (1964) | Comedy|War.
Rear Window (1954) | Mystery|Thriller.
Sunset Blvd. (a.k.a. Sunset Boulevard) (1950) | Drama|Film-Noir|Romance.
Lawrence of Arabia (1962) | Adventure|Drama|War.
Apocalypse Now (1979) | Action|Drama|War.
Shining, The (1980) | Horror.
Cool Hand Luke (1967) | Drama.
For a Few Dollars More (Per qualche dollaro in più) (1965) | Action|Drama|Thriller|Western.
Road Warrior, The (Mad Max 2) (1981) | Action|Adventure|Sci-Fi|Thriller.
Departed, The (2006) | 

**We explored three of the main methodologies for creating recommender systems:**

- **Demographic Filtering.**
- **Content-Based Filtering.**
- **Collaborative Filtering.**

**Good recommenders usually use all of these techniques combined, but this is a good start if you want to master recommendation systems through ML. 🙃**
