# Content-Based Recommender


This notebook builds a personalized content-based movie recommender.

User preferences are captured through embeddings: dense vectors, one per user, learned from that user's ratings. Each movie is described by its genres, rating count, and average rating, and a predicted rating is the dot product of the user's embedding with the movie's feature vector. Recommendations therefore depend on both movie content and individual user taste.

### Download Dataset

In [2]:
!wget http://files.grouplens.org/datasets/movielens/ml-latest-small.zip

--2024-05-03 21:14:33--  http://files.grouplens.org/datasets/movielens/ml-latest-small.zip
Resolving files.grouplens.org (files.grouplens.org)... 128.101.65.152
Connecting to files.grouplens.org (files.grouplens.org)|128.101.65.152|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 978202 (955K) [application/zip]
Saving to: ‘ml-latest-small.zip’


2024-05-03 21:14:34 (2.64 MB/s) - ‘ml-latest-small.zip’ saved [978202/978202]



In [3]:
!unzip -q ml-latest-small.zip

In [4]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
import keras
import tensorflow as tf
from keras.models import Model
from keras.layers import Embedding, Input, Dot, Flatten, Dropout, Dense, Concatenate

In [5]:
label_path = '/content/ml-latest-small/ratings.csv'
dataset_path = '/content/ml-latest-small/movies.csv'

## Load Ratings

In [6]:
df_label = pd.read_csv(label_path)
df_label = df_label.drop_duplicates()  # drop_duplicates returns a new frame, so keep the result
df_label = df_label.drop(columns=['timestamp'])
df_label

Unnamed: 0,userId,movieId,rating
0,1,1,4.0
1,1,3,4.0
2,1,6,4.0
3,1,47,5.0
4,1,50,5.0
...,...,...,...
100831,610,166534,4.0
100832,610,168248,5.0
100833,610,168250,5.0
100834,610,168252,5.0


## Load Movie Dataset

In [7]:

df_data = pd.read_csv(dataset_path)
df_data = df_data.drop_duplicates()  # drop_duplicates returns a new frame, so keep the result
df_data

Unnamed: 0,movieId,title,genres
0,1,Toy Story (1995),Adventure|Animation|Children|Comedy|Fantasy
1,2,Jumanji (1995),Adventure|Children|Fantasy
2,3,Grumpier Old Men (1995),Comedy|Romance
3,4,Waiting to Exhale (1995),Comedy|Drama|Romance
4,5,Father of the Bride Part II (1995),Comedy
...,...,...,...
9737,193581,Black Butler: Book of the Atlantic (2017),Action|Animation|Comedy|Fantasy
9738,193583,No Game No Life: Zero (2017),Animation|Comedy|Fantasy
9739,193585,Flint (2017),Drama
9740,193587,Bungo Stray Dogs: Dead Apple (2018),Action|Animation


## Extracting Features

1. Extracting genres per movie.

2. Extracting count of ratings per movie.

3. Extracting average rating per movie.
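The three steps above can also be sketched with vectorized pandas calls instead of row-by-row loops. This is a minimal illustration on a toy frame, not the notebook's own code: the `movies` and `ratings` toy tables and the names `genre_flags`, `stats`, and `movie_features` are assumptions made for the example.

```python
import pandas as pd

# Toy stand-ins for movies.csv and ratings.csv (hypothetical values)
movies = pd.DataFrame({
    "movieId": [1, 2, 3],
    "genres": ["Adventure|Comedy", "Comedy|Romance", "Drama"],
})
ratings = pd.DataFrame({
    "userId": [1, 2, 1, 3],
    "movieId": [1, 1, 2, 3],
    "rating": [4.0, 5.0, 3.0, 2.0],
})

# 1. One 0/1 column per genre, split on the '|' separator
genre_flags = movies["genres"].str.get_dummies(sep="|")

# 2. and 3. Rating count and average rating per movie
stats = (
    ratings.groupby("movieId")["rating"]
    .agg(count="count", avg_rating="mean")
    .reset_index()
)

# Attach the per-movie features to every (user, movie, rating) row
movie_features = pd.concat([movies[["movieId"]], genre_flags], axis=1).merge(stats, on="movieId")
dataset = ratings.merge(movie_features, on="movieId")
print(dataset)
```

On the full dataset this kind of approach avoids the per-row `.loc` assignments, which tend to be slow on roughly 100,000 rows.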

In [8]:
# Extracting genres per movie: collect every distinct genre name in order of first appearance.
genre = []
for g in df_data['genres']:
    for name in g.split('|'):
        if name not in genre:
            genre.append(name)

In [9]:
df_dataset = df_data.merge(df_label, on='movieId')

df_dataset

Unnamed: 0,movieId,title,genres,userId,rating
0,1,Toy Story (1995),Adventure|Animation|Children|Comedy|Fantasy,1,4.0
1,1,Toy Story (1995),Adventure|Animation|Children|Comedy|Fantasy,5,4.0
2,1,Toy Story (1995),Adventure|Animation|Children|Comedy|Fantasy,7,4.5
3,1,Toy Story (1995),Adventure|Animation|Children|Comedy|Fantasy,15,2.5
4,1,Toy Story (1995),Adventure|Animation|Children|Comedy|Fantasy,17,4.5
...,...,...,...,...,...
100831,193581,Black Butler: Book of the Atlantic (2017),Action|Animation|Comedy|Fantasy,184,4.0
100832,193583,No Game No Life: Zero (2017),Animation|Comedy|Fantasy,184,3.5
100833,193585,Flint (2017),Drama,184,3.5
100834,193587,Bungo Stray Dogs: Dead Apple (2018),Action|Animation,184,3.5


In [10]:
for i in range(len(genre)):
    df_dataset[genre[i]] = 0

df_dataset

Unnamed: 0,movieId,title,genres,userId,rating,Adventure,Animation,Children,Comedy,Fantasy,...,Horror,Mystery,Sci-Fi,War,Musical,Documentary,IMAX,Western,Film-Noir,(no genres listed)
0,1,Toy Story (1995),Adventure|Animation|Children|Comedy|Fantasy,1,4.0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,1,Toy Story (1995),Adventure|Animation|Children|Comedy|Fantasy,5,4.0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,1,Toy Story (1995),Adventure|Animation|Children|Comedy|Fantasy,7,4.5,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,1,Toy Story (1995),Adventure|Animation|Children|Comedy|Fantasy,15,2.5,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,1,Toy Story (1995),Adventure|Animation|Children|Comedy|Fantasy,17,4.5,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
100831,193581,Black Butler: Book of the Atlantic (2017),Action|Animation|Comedy|Fantasy,184,4.0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
100832,193583,No Game No Life: Zero (2017),Animation|Comedy|Fantasy,184,3.5,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
100833,193585,Flint (2017),Drama,184,3.5,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
100834,193587,Bungo Stray Dogs: Dead Apple (2018),Action|Animation,184,3.5,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [11]:
for i in range(len(df_dataset)):

    lst = df_dataset['genres'][i].split('|')
    for l in lst:
      df_dataset.loc[i, l] = 1

df_dataset

Unnamed: 0,movieId,title,genres,userId,rating,Adventure,Animation,Children,Comedy,Fantasy,...,Horror,Mystery,Sci-Fi,War,Musical,Documentary,IMAX,Western,Film-Noir,(no genres listed)
0,1,Toy Story (1995),Adventure|Animation|Children|Comedy|Fantasy,1,4.0,1,1,1,1,1,...,0,0,0,0,0,0,0,0,0,0
1,1,Toy Story (1995),Adventure|Animation|Children|Comedy|Fantasy,5,4.0,1,1,1,1,1,...,0,0,0,0,0,0,0,0,0,0
2,1,Toy Story (1995),Adventure|Animation|Children|Comedy|Fantasy,7,4.5,1,1,1,1,1,...,0,0,0,0,0,0,0,0,0,0
3,1,Toy Story (1995),Adventure|Animation|Children|Comedy|Fantasy,15,2.5,1,1,1,1,1,...,0,0,0,0,0,0,0,0,0,0
4,1,Toy Story (1995),Adventure|Animation|Children|Comedy|Fantasy,17,4.5,1,1,1,1,1,...,0,0,0,0,0,0,0,0,0,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
100831,193581,Black Butler: Book of the Atlantic (2017),Action|Animation|Comedy|Fantasy,184,4.0,0,1,0,1,1,...,0,0,0,0,0,0,0,0,0,0
100832,193583,No Game No Life: Zero (2017),Animation|Comedy|Fantasy,184,3.5,0,1,0,1,1,...,0,0,0,0,0,0,0,0,0,0
100833,193585,Flint (2017),Drama,184,3.5,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
100834,193587,Bungo Stray Dogs: Dead Apple (2018),Action|Animation,184,3.5,0,1,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [12]:
# Extracting count of ratings per movie.
movie_counts = df_dataset['movieId'].value_counts()
sorted_movie_counts = movie_counts.sort_index()
sorted_movie_counts


movieId
1         215
2         110
3          52
4           7
5          49
         ... 
193581      1
193583      1
193585      1
193587      1
193609      1
Name: count, Length: 9724, dtype: int64

In [13]:
sorted_movie_counts_df = sorted_movie_counts.reset_index()
sorted_movie_counts_df.columns = ['movieId', 'count']

df_dataset = df_dataset.merge(sorted_movie_counts_df, on='movieId')

df_dataset

Unnamed: 0,movieId,title,genres,userId,rating,Adventure,Animation,Children,Comedy,Fantasy,...,Mystery,Sci-Fi,War,Musical,Documentary,IMAX,Western,Film-Noir,(no genres listed),count
0,1,Toy Story (1995),Adventure|Animation|Children|Comedy|Fantasy,1,4.0,1,1,1,1,1,...,0,0,0,0,0,0,0,0,0,215
1,1,Toy Story (1995),Adventure|Animation|Children|Comedy|Fantasy,5,4.0,1,1,1,1,1,...,0,0,0,0,0,0,0,0,0,215
2,1,Toy Story (1995),Adventure|Animation|Children|Comedy|Fantasy,7,4.5,1,1,1,1,1,...,0,0,0,0,0,0,0,0,0,215
3,1,Toy Story (1995),Adventure|Animation|Children|Comedy|Fantasy,15,2.5,1,1,1,1,1,...,0,0,0,0,0,0,0,0,0,215
4,1,Toy Story (1995),Adventure|Animation|Children|Comedy|Fantasy,17,4.5,1,1,1,1,1,...,0,0,0,0,0,0,0,0,0,215
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
100831,193581,Black Butler: Book of the Atlantic (2017),Action|Animation|Comedy|Fantasy,184,4.0,0,1,0,1,1,...,0,0,0,0,0,0,0,0,0,1
100832,193583,No Game No Life: Zero (2017),Animation|Comedy|Fantasy,184,3.5,0,1,0,1,1,...,0,0,0,0,0,0,0,0,0,1
100833,193585,Flint (2017),Drama,184,3.5,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,1
100834,193587,Bungo Stray Dogs: Dead Apple (2018),Action|Animation,184,3.5,0,1,0,0,0,...,0,0,0,0,0,0,0,0,0,1


In [14]:
# Extracting average rating per movie.
avg_rating_per_movie = df_dataset.groupby('movieId')['rating'].mean()
avg_rating_per_movie_df = avg_rating_per_movie.reset_index()
avg_rating_per_movie_df.columns = ['movieId', 'avg_rating']

avg_rating_per_movie_df

Unnamed: 0,movieId,avg_rating
0,1,3.920930
1,2,3.431818
2,3,3.259615
3,4,2.357143
4,5,3.071429
...,...,...
9719,193581,4.000000
9720,193583,3.500000
9721,193585,3.500000
9722,193587,3.500000


In [15]:
df_dataset = df_dataset.merge(avg_rating_per_movie_df, on='movieId')
df_dataset = df_dataset.drop(columns='genres')
df_dataset

Unnamed: 0,movieId,title,userId,rating,Adventure,Animation,Children,Comedy,Fantasy,Romance,...,Sci-Fi,War,Musical,Documentary,IMAX,Western,Film-Noir,(no genres listed),count,avg_rating
0,1,Toy Story (1995),1,4.0,1,1,1,1,1,0,...,0,0,0,0,0,0,0,0,215,3.92093
1,1,Toy Story (1995),5,4.0,1,1,1,1,1,0,...,0,0,0,0,0,0,0,0,215,3.92093
2,1,Toy Story (1995),7,4.5,1,1,1,1,1,0,...,0,0,0,0,0,0,0,0,215,3.92093
3,1,Toy Story (1995),15,2.5,1,1,1,1,1,0,...,0,0,0,0,0,0,0,0,215,3.92093
4,1,Toy Story (1995),17,4.5,1,1,1,1,1,0,...,0,0,0,0,0,0,0,0,215,3.92093
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
100831,193581,Black Butler: Book of the Atlantic (2017),184,4.0,0,1,0,1,1,0,...,0,0,0,0,0,0,0,0,1,4.00000
100832,193583,No Game No Life: Zero (2017),184,3.5,0,1,0,1,1,0,...,0,0,0,0,0,0,0,0,1,3.50000
100833,193585,Flint (2017),184,3.5,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,1,3.50000
100834,193587,Bungo Stray Dogs: Dead Apple (2018),184,3.5,0,1,0,0,0,0,...,0,0,0,0,0,0,0,0,1,3.50000


## Create Dataset

In [16]:
movies_feature = df_dataset.iloc[:, 4:].to_numpy()
ratings = df_dataset.iloc[:, 3].to_numpy()
users = df_dataset.iloc[:, 2].to_numpy()

print("Shape of movies_feature:", movies_feature.shape)
print("Shape of ratings:", ratings.shape)
print("Shape of users:", users.shape)

Shape of movies_feature: (100836, 22)
Shape of ratings: (100836,)
Shape of users: (100836,)


In [17]:
from sklearn.model_selection import train_test_split

# Hold out 20% of the data, then halve the held-out part into validation and test sets (80/10/10)
movies_feature_train, movies_feature_test, ratings_train, ratings_test, users_train, users_test = train_test_split(
    movies_feature, ratings, users, test_size=0.2, random_state=42)

movies_feature_val, movies_feature_test, ratings_val, ratings_test, users_val, users_test = train_test_split(
    movies_feature_test, ratings_test, users_test, test_size=0.5, random_state=42)

# Display the shapes of the training and testing sets
print("Shape of movies_feature_train:", movies_feature_train.shape)
print("Shape of movies_feature_test:", movies_feature_test.shape)
print("Shape of movies_feature_val:", movies_feature_val.shape)

print("Shape of ratings_train:", ratings_train.shape)
print("Shape of ratings_test:", ratings_test.shape)
print("Shape of ratings_val:", ratings_val.shape)

print("Shape of users_train:", users_train.shape)
print("Shape of users_test:", users_test.shape)
print("Shape of users_val:", users_val.shape)

Shape of movies_feature_train: (80668, 22)
Shape of movies_feature_test: (10084, 22)
Shape of movies_feature_val: (10084, 22)
Shape of ratings_train: (80668,)
Shape of ratings_test: (10084,)
Shape of ratings_val: (10084,)
Shape of users_train: (80668,)
Shape of users_test: (10084,)
Shape of users_val: (10084,)
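The shapes above follow from applying a 20% hold-out and then a 50% split to 100,836 rows. The arithmetic can be checked with a small sketch; `split_sizes` is a hypothetical helper, and rounding the test count up is an assumption chosen because it reproduces the printed shapes.

```python
import math

def split_sizes(n_samples, test_size):
    """Return (n_first, n_second) row counts for a fractional test_size."""
    n_second = math.ceil(test_size * n_samples)  # assumption: held-out count is rounded up
    return n_samples - n_second, n_second

n_total = 100836
n_train, n_holdout = split_sizes(n_total, 0.2)   # first split: train vs. held-out
n_val, n_test = split_sizes(n_holdout, 0.5)      # second split: validation vs. test
print(n_train, n_val, n_test)
```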


## Normalize Features

In [18]:
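def normalize(x, mu=None, sigma=None):
    # With no statistics supplied, compute them from x.
    # Note: mean() and std() without an axis give one global value for the whole array,
    # so every feature column is shifted and scaled by the same numbers.
    if mu is None and sigma is None:
        mu = x.mean()
        sigma = x.std()
    return (x - mu) / sigma, mu, sigma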


In [19]:
movies_feature_train, mu, sigma = normalize(movies_feature_train)
movies_feature_test, _ , _ = normalize(movies_feature_test, mu, sigma)
movies_feature_val, _ , _ = normalize(movies_feature_val, mu, sigma)
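The `normalize` function above uses a single mean and standard deviation for the whole feature matrix, so the 0/1 genre flags, the rating count, and the average rating all share one scale. A common alternative is to standardize each column separately. The sketch below is an illustration of that variant, not the notebook's method; `normalize_per_feature`, the `eps` guard, and the toy array are assumptions.

```python
import numpy as np

def normalize_per_feature(x, mu=None, sigma=None, eps=1e-8):
    """Standardize each column of x; reuse mu and sigma when they are supplied."""
    x = np.asarray(x, dtype=np.float64)
    if mu is None or sigma is None:
        mu = x.mean(axis=0)       # one mean per feature column
        sigma = x.std(axis=0)     # one standard deviation per feature column
    # eps (an assumed safeguard) avoids division by zero for constant columns
    return (x - mu) / (sigma + eps), mu, sigma

# Toy matrix: a genre flag, a rating count, and an average rating (hypothetical values)
train = np.array([[0.0, 10.0, 3.0],
                  [1.0, 200.0, 4.0],
                  [1.0, 50.0, 5.0],
                  [0.0, 20.0, 2.0]])
train_scaled, mu, sigma = normalize_per_feature(train)

# Held-out data is scaled with the training statistics, as in the cell above
holdout_scaled, _, _ = normalize_per_feature(np.array([[1.0, 70.0, 3.5]]), mu, sigma)
print(train_scaled.mean(axis=0), train_scaled.std(axis=0))
```

Switching to per-column statistics would change the normalized values printed later in this notebook, so the outputs shown here correspond to the global version.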

## Create Pipeline

In [20]:
AUTOTUNE = tf.data.experimental.AUTOTUNE
BUFFER = 1000
BATCH_SIZE = 1024

training = tf.data.Dataset.from_tensor_slices(((movies_feature_train, users_train), ratings_train))
training = training.shuffle(BUFFER)
training = training.batch(BATCH_SIZE, num_parallel_calls=AUTOTUNE, drop_remainder=True)
training = training.prefetch(AUTOTUNE)

test = tf.data.Dataset.from_tensor_slices(((movies_feature_test, users_test), ratings_test))
test = test.batch(BATCH_SIZE, num_parallel_calls=AUTOTUNE, drop_remainder=True)
test = test.prefetch(AUTOTUNE)

validation = tf.data.Dataset.from_tensor_slices(((movies_feature_val, users_val), ratings_val))
validation = validation.batch(BATCH_SIZE, num_parallel_calls=AUTOTUNE, drop_remainder=True)
validation = validation.prefetch(AUTOTUNE)
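Because every dataset above is batched with `drop_remainder=True`, rows that do not fill a complete batch are discarded, including in the validation and test sets. A short sketch of the resulting batch counts, using the split sizes printed earlier; `batches_with_drop_remainder` is a hypothetical helper.

```python
BATCH_SIZE = 1024
split_rows = {"training": 80668, "validation": 10084, "test": 10084}

def batches_with_drop_remainder(n_rows, batch_size):
    """Return (full batches kept, rows discarded from the final partial batch)."""
    full_batches = n_rows // batch_size
    dropped_rows = n_rows - full_batches * batch_size
    return full_batches, dropped_rows

summary = {name: batches_with_drop_remainder(n, BATCH_SIZE) for name, n in split_rows.items()}
for name, (full, dropped) in summary.items():
    print(f"{name}: {full} full batches, {dropped} rows dropped")
```

For evaluation it may be preferable to keep the final partial batch so that every test row is scored; that is a design choice rather than something this notebook does.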

In [21]:
for (feature, user), rate in training.take(1):
  print(feature, user, rate)

tf.Tensor(
[[-0.10856499 -0.16424701 -0.16424701 ... -0.16424701  5.51531918
   0.03609908]
 [-0.16424701 -0.10856499 -0.16424701 ... -0.16424701  0.94939342
   0.05291287]
 [-0.16424701 -0.16424701 -0.16424701 ... -0.16424701  2.84258215
   0.00743922]
 ...
 [-0.16424701 -0.16424701 -0.16424701 ... -0.16424701 -0.10856499
   0.05848107]
 [-0.16424701 -0.16424701 -0.16424701 ... -0.16424701  0.55961926
  -0.00790903]
 [-0.16424701 -0.16424701 -0.16424701 ... -0.16424701  1.9516698
   0.007195  ]], shape=(1024, 22), dtype=float64) tf.Tensor([593 483 590 ... 238 368 429], shape=(1024,), dtype=int64) tf.Tensor([2.  4.5 3.  ... 4.  2.  4. ], shape=(1024,), dtype=float64)


## Model Architecture

In [44]:
class MyModel(Model):
    def __init__(self, features, n_user):
        super(MyModel, self).__init__()

        self.user_embedding = Embedding(n_user + 1, features, name="User-Embedding")
        self.user_flatten = Flatten()
        self.dotter = Dot(axes=1)

    def build(self, user_input_shape, movie_input_shape):
        user_inputs = Input(shape=user_input_shape)
        movie_inputs = Input(shape=movie_input_shape)
        self.call([movie_inputs, user_inputs], training=False)
        self.built = True

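    def call(self, inputs, training=False):
        movies, users = inputs
        # Look up one embedding vector per user ID
        user_x = self.user_embedding(users)
        user_x = self.user_flatten(user_x)
        # Predicted rating = dot product of movie features and user embedding
        x = self.dotter([movies, user_x])
        return x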
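The forward pass of the model above amounts to an embedding lookup followed by a row-wise dot product. The NumPy sketch below mirrors that computation with a random table standing in for the learned weights; `user_table`, `predict_ratings`, and the toy sizes are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n_users, n_features = 5, 4

# One row per user ID; the extra row exists because user IDs start at 1
user_table = rng.normal(size=(n_users + 1, n_features))

def predict_ratings(movie_features, user_ids, table):
    """Embedding lookup followed by a per-row dot product."""
    user_vectors = table[user_ids]                        # shape (batch, n_features)
    return np.sum(movie_features * user_vectors, axis=1, keepdims=True)  # shape (batch, 1)

movie_features = rng.normal(size=(3, n_features))
user_ids = np.array([1, 3, 5])
preds = predict_ratings(movie_features, user_ids, user_table)
print(preds.shape)
```

Since the embedding has the same width as the movie feature vector, each embedding entry acts as that user's weight for one movie feature.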

In [74]:
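features = movies_feature_train.shape[-1]
# Note: this is the number of training rows, not the number of distinct users,
# so the embedding table gets far more rows than there are user IDs (IDs only go up to 610).
n_user = users_train.shape[0]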
user_input_shape = (1,)
movie_input_shape = (movies_feature_train.shape[1],)
model = MyModel(features, n_user)

model.build(user_input_shape=user_input_shape, movie_input_shape=movie_input_shape)

model.summary()

Model: "my_model_4"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 User-Embedding (Embedding)  (None, 1, 22)             1774718   
                                                                 
 flatten_4 (Flatten)         (None, 22)                0         
                                                                 
 dot_4 (Dot)                 (None, 1)                 0         
                                                                 
Total params: 1774718 (6.77 MB)
Trainable params: 1774718 (6.77 MB)
Non-trainable params: 0 (0.00 Byte)
_________________________________________________________________
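The parameter count in the summary is the number of embedding rows times the embedding width. The sketch below reproduces it and, for comparison, computes the size of a table with one row per user ID. The value 610 for the largest user ID is taken from the ratings table shown earlier and is treated here as an assumption; `embedding_params` is a hypothetical helper.

```python
def embedding_params(n_rows, dim):
    """Number of weights in an embedding table with n_rows rows of width dim."""
    return n_rows * dim

dim = 22                                        # width of the movie feature vector
current = embedding_params(80668 + 1, dim)      # rows = training samples + 1, as built above
compact = embedding_params(610 + 1, dim)        # assumption: rows = largest user ID + 1
size_mb = current * 4 / 1024**2                 # float32 weights, 4 bytes each

print(current, compact, round(size_mb, 2))
```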


In [76]:
model.compile(loss=tf.losses.MeanSquaredError(),
              optimizer=tf.optimizers.Adam())

In [77]:
from tensorflow.keras.callbacks import LearningRateScheduler
def get_scheduler(initial_learning_rate, min_learning_rate=1e-5, weight=0.95):
    def scheduler(epoch, lr):
        return max(initial_learning_rate * weight ** (epoch // 10), min_learning_rate)
    return scheduler

# Define your scheduler function
scheduler = get_scheduler(initial_learning_rate=0.01)

# Create a LearningRateScheduler callback using the scheduler function
lr_scheduler_callback = LearningRateScheduler(scheduler)
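To see what the schedule defined above produces, the same function can be evaluated at a few epochs without Keras. This sketch copies `get_scheduler` from the cell above; the choice of sample epochs is arbitrary.

```python
def get_scheduler(initial_learning_rate, min_learning_rate=1e-5, weight=0.95):
    def scheduler(epoch, lr):
        # Multiply the rate by `weight` once every 10 epochs, never going below the minimum
        return max(initial_learning_rate * weight ** (epoch // 10), min_learning_rate)
    return scheduler

scheduler = get_scheduler(initial_learning_rate=0.01)

# The incoming `lr` argument is ignored by this schedule, so None is passed here
for epoch in (0, 9, 10, 50, 149):
    print(epoch, scheduler(epoch, None))
```

The rate stays at 0.01 for the first ten epochs and then drops by 5% at every tenth epoch, so over 150 epochs it decays by a factor of 0.95 ** 14.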

In [78]:

model.fit(training,
          validation_data=validation,
          epochs=150,
          callbacks=[lr_scheduler_callback])

Epoch 1/150
Epoch 2/150
Epoch 3/150
...
Epoch 77/150
Epoch 78

<keras.src.callbacks.History at 0x78b2c039be50>

## Evaluate Model

In [79]:
model.evaluate(test)



0.6127814054489136
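The number above is the mean squared error, since that is the loss the model was compiled with. Taking the square root expresses it in rating units (stars), which is often easier to interpret. A minimal sketch, with the loss value copied from the output above:

```python
import math

test_mse = 0.6127814054489136     # value returned by model.evaluate(test) above
test_rmse = math.sqrt(test_mse)   # root mean squared error, in rating units

print(f"RMSE: {test_rmse:.4f}")
```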

In [80]:
# Initialize empty lists to store features, users, and ratings
features_list = []
users_list = []
ratings_list = []

# Iterate through the test dataset and collect each batch.
# The loop variables are named batch_* so they do not overwrite the earlier
# `features`, `users`, and `ratings` variables.
for (batch_features, batch_users), batch_ratings in test.as_numpy_iterator():
    features_list.append(batch_features)
    users_list.append(batch_users)
    ratings_list.append(batch_ratings)

# Concatenate along the batch axis; this also works if the last batch is smaller
features_np = np.concatenate(features_list)
users_np = np.concatenate(users_list)
ratings_np = np.concatenate(ratings_list).reshape(-1, 1)

preds = model.predict((features_np, users_np))

# Compare the first 20 predictions with the true ratings
for i in range(20):
    print(preds[i], ratings_np[i])

[4.2136927] [4.5]
[3.3561928] [3.]
[4.038139] [5.]
[3.0721807] [3.]
[4.5235214] [5.]
[2.9502478] [2.]
[3.2118654] [2.5]
[3.1666276] [1.]
[4.0667787] [5.]
[1.5112844] [1.]
[3.1083703] [4.]
[2.7864099] [2.]
[3.8413253] [3.5]
[3.8591833] [5.]
[2.999925] [3.]
[3.970624] [3.5]
[3.4186351] [3.]
[2.7953155] [2.5]
[3.193067] [3.5]
[4.086236] [4.]


### Suggest 10 movies to user 249

In [81]:
df_unique = df_dataset.drop(columns='userId')
df_unique = df_unique.drop(columns='rating')
df_unique = df_unique.drop_duplicates()
df_unique

Unnamed: 0,movieId,title,Adventure,Animation,Children,Comedy,Fantasy,Romance,Drama,Action,...,Sci-Fi,War,Musical,Documentary,IMAX,Western,Film-Noir,(no genres listed),count,avg_rating
0,1,Toy Story (1995),1,1,1,1,1,0,0,0,...,0,0,0,0,0,0,0,0,215,3.920930
215,2,Jumanji (1995),1,0,1,0,1,0,0,0,...,0,0,0,0,0,0,0,0,110,3.431818
325,3,Grumpier Old Men (1995),0,0,0,1,0,1,0,0,...,0,0,0,0,0,0,0,0,52,3.259615
377,4,Waiting to Exhale (1995),0,0,0,1,0,1,1,0,...,0,0,0,0,0,0,0,0,7,2.357143
384,5,Father of the Bride Part II (1995),0,0,0,1,0,0,0,0,...,0,0,0,0,0,0,0,0,49,3.071429
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
100831,193581,Black Butler: Book of the Atlantic (2017),0,1,0,1,1,0,0,1,...,0,0,0,0,0,0,0,0,1,4.000000
100832,193583,No Game No Life: Zero (2017),0,1,0,1,1,0,0,0,...,0,0,0,0,0,0,0,0,1,3.500000
100833,193585,Flint (2017),0,0,0,0,0,0,1,0,...,0,0,0,0,0,0,0,0,1,3.500000
100834,193587,Bungo Stray Dogs: Dead Apple (2018),0,1,0,0,0,0,0,1,...,0,0,0,0,0,0,0,0,1,3.500000


In [82]:
movies_feature_unique = df_unique.iloc[:, 2:].to_numpy()
movies_id =  df_unique.iloc[:, 0].to_numpy()
movies_feature_unique, _ , _ = normalize(movies_feature_unique, mu, sigma)
movies_feature_unique.shape

(9724, 22)

In [83]:
user_id = np.full((movies_feature_unique.shape[0],), 249)  # integer user ID repeated once per movie
prediction = model.predict((movies_feature_unique, user_id), verbose=False)

In [84]:
prediction

array([[4.0704036],
       [3.7416682],
       [3.5739763],
       ...,
       [3.911741 ],
       [3.913497 ],
       [4.239333 ]], dtype=float32)

In [85]:
predictions_with_movie_id = list(zip(movies_id, prediction))
sorted_predictions_with_movie_id = sorted(predictions_with_movie_id, key=lambda x: x[1], reverse=True)

print(sorted_predictions_with_movie_id[:10])

[(4180, array([5.250108], dtype=float32)), (26401, array([5.250108], dtype=float32)), (2196, array([5.2454596], dtype=float32)), (5244, array([5.196497], dtype=float32)), (115727, array([5.196497], dtype=float32)), (26169, array([5.1962423], dtype=float32)), (80124, array([5.1962423], dtype=float32)), (82744, array([5.1962423], dtype=float32)), (78836, array([5.18624], dtype=float32)), (148, array([5.185632], dtype=float32))]
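The same ranking can be sketched with `np.argsort`, which avoids sorting tuples that contain arrays and returns plain Python numbers. The helper `top_k` and the toy IDs and scores below are assumptions for illustration.

```python
import numpy as np

def top_k(movie_ids, scores, k):
    """Return the k (movie_id, score) pairs with the highest scores."""
    scores = np.asarray(scores).reshape(-1)         # flatten (n, 1) predictions to (n,)
    order = np.argsort(-scores, kind="stable")[:k]  # indices of the largest scores first
    return [(int(movie_ids[i]), float(scores[i])) for i in order]

# Toy inputs shaped like movies_id and prediction (hypothetical values)
toy_ids = np.array([10, 20, 30, 40])
toy_scores = np.array([[3.5], [4.8], [2.1], [4.1]])

best = top_k(toy_ids, toy_scores, k=2)
print(best)
```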


In [86]:
# Assuming df_unique contains columns 'movieId' and 'title'
def get_movie_name(movie_id):
    movie_name = df_unique[df_unique['movieId'] == movie_id]['title'].values
    return movie_name[0] if len(movie_name) > 0 else None


movie_id = 1
movie_name = get_movie_name(movie_id)
print("Movie Name:", movie_name)

Movie Name: Toy Story (1995)


In [87]:
for i in range(10):
    name = get_movie_name(sorted_predictions_with_movie_id[i][0])
    rate = sorted_predictions_with_movie_id[i][1]
    print(f'Recommended movie for you is {name}: rating {rate}')

Recommended movie for you is Reform School Girls (1986): rating [5.250108]
Recommended movie for you is Last Hurrah for Chivalry (Hao xia) (1979): rating [5.250108]
Recommended movie for you is Knock Off (1998): rating [5.2454596]
Recommended movie for you is Shogun Assassin (1980): rating [5.196497]
Recommended movie for you is Crippled Avengers (Can que) (Return of the 5 Deadly Venoms) (1981): rating [5.196497]
Recommended movie for you is Branded to Kill (Koroshi no rakuin) (1967): rating [5.1962423]
Recommended movie for you is Sisters (Syostry) (2001): rating [5.1962423]
Recommended movie for you is Faster (2010): rating [5.1962423]
Recommended movie for you is Enter the Void (2009): rating [5.18624]
Recommended movie for you is Awfully Big Adventure, An (1995): rating [5.185632]
