# **LightFM Model**

LightFM uses ordinary user-item interactions to make predictions for known users. For new users, it can still make predictions if it is given additional information about them. This information could be features such as gender, age, or ethnicity, and it must be supplied to the algorithm during training.
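As a rough illustration of why features help with new users: in LightFM, a user's latent vector is the sum of the embeddings of that user's features, so a user with no rating history can still be scored from demographic features alone. The sketch below uses NumPy with random embeddings and hypothetical feature names, and it omits bias terms; it is not LightFM's internal code.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical feature vocabulary with randomly initialised embeddings (illustration only).
feature_names = ["gender_F", "gender_M", "age_18-24", "age_25-34"]
feature_index = {name: i for i, name in enumerate(feature_names)}
feature_embeddings = rng.normal(size=(len(feature_names), 4))  # 4 latent components

def user_vector(active_features):
    """Sum the embeddings of the features that describe a user."""
    rows = [feature_index[name] for name in active_features]
    return feature_embeddings[rows].sum(axis=0)

# A brand-new user with no ratings can still be represented from demographics.
new_user = user_vector(["gender_F", "age_25-34"])

# Score against a made-up item vector: higher means a stronger predicted preference.
item_vector = rng.normal(size=4)
score = float(new_user @ item_vector)
```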

### Reviewers Demographic Dataset

This dataset contains movie reviews together with characteristics of each reviewer, such as gender, age, occupation and area.

- **reviewer_id:** Unique ID of the reviewer
- **reviewer_gender:** Gender of the reviewer, stored as a string
  - "F"
  - "M"
- **reviewer_age:** Age of the reviewer in categorical age ranges
  - 1: "Under 18"  
  - 2: "18-24"
  - 3: "25-34"
  - 4: "35-44"
  - 5: "45-49"
  - 6: "50-55"
  - 7: "56+"
- **reviewer_occupation:** Occupation of the reviewer encoded by an integer
  - 0:  "other" or not specified
  - 1:  "academic/educator"
  - 2:  "artist"
  - 3:  "clerical/admin"
  - 4:  "college/grad student"
  - 5:  "customer service"
  - 6:  "doctor/health care"
  - 7:  "executive/managerial"
  - 8:  "farmer"
  - 9:  "homemaker"
  - 10:  "K-12 student"
  - 11:  "lawyer"
  - 12:  "programmer"
  - 13:  "retired"
  - 14:  "sales/marketing"
  - 15:  "scientist"
  - 16:  "self-employed"
  - 17:  "technician/engineer"
  - 18:  "tradesman/craftsman"
  - 19:  "unemployed"
  - 20:  "writer"
- **reviewer_area:** Location of the reviewer grouped by the first digit of the reviewer's zipcode.
- **reviewer_rating:** Movie rating by reviewer from a range of 1-5
- **movie_id:** Unique ID of movie reviewed
- **movie_title:** Title of movie reviewed
- **movie_genre:** Genre of movie reviewed
	* Action
	* Adventure
	* Animation
	* Children's
	* Comedy
	* Crime
	* Documentary
	* Drama
	* Fantasy
	* Film-Noir
	* Horror
	* Musical
	* Mystery
	* Romance
	* Sci-Fi
	* Thriller
	* War
	* Western
- **movie_year_of_release:** Year of release of movie reviewed

In [None]:
from google.colab import drive
drive.mount('/content/drive/')

Mounted at /content/drive/


## **1. Installation and Imports**

In [None]:
import sys
!{sys.executable} -m pip install lightfm
!pip install git+https://github.com/microsoft/recommenders.git
!pip install scikit-learn

In [None]:
import os
import pandas as pd
import gdown
import numpy as np
import warnings
from lightfm import LightFM
from lightfm.data import Dataset
from lightfm.evaluation import precision_at_k, recall_at_k, auc_score
from recommenders.models.lightfm.lightfm_utils import track_model_metrics, similar_items
from sklearn.preprocessing import OneHotEncoder
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split, ParameterGrid

In [None]:
pd.set_option('display.max_rows', 100)
pd.set_option('display.max_columns', 500)

warnings.simplefilter(action='ignore', category=FutureWarning)

## **2. Feature Engineering**



*   Load Dataset
*   Create new columns for encoded genre
*   Create new columns for each unique age, gender and occupation
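Because a movie can belong to several genres, its genre column is encoded as a multi-hot vector: one slot per genre, set to 1 for each genre present. A minimal NumPy sketch of the idea, using a shortened, illustrative genre vocabulary:

```python
import numpy as np

# Shortened genre vocabulary for illustration; the dataset lists 18 genres.
genres = ["Animation", "Children's", "Comedy", "Drama", "Musical"]
genre_to_col = {g: i for i, g in enumerate(genres)}

def multi_hot(genre_list):
    """Return a vector with 1.0 in the slot of every genre in genre_list."""
    vec = np.zeros(len(genres))
    for g in genre_list:
        vec[genre_to_col[g]] = 1.0
    return vec

encoded = multi_hot(["Animation", "Children's", "Musical"])
# encoded -> array([1., 1., 0., 0., 1.])
```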



In [99]:
df = pd.read_csv('/content/drive/MyDrive/BT4222 Project/processed_datasets/reviewer_demographic.csv', encoding='latin-1') # Or 'cp1252', 'ISO-8859-1'
print(df.shape)
df.head()

(1000209, 10)


Unnamed: 0,reviewer_id,reviewer_gender,reviewer_age,reviewer_occupation,reviewer_area,reviewer_rating,movie_id,movie_title,movie_genre,movie_year_of_release
0,1,F,1,10,4,5,1193,one flew over the cuckoo's nest (1975),['Drama'],1975
1,1,F,1,10,4,3,661,james and the giant peach (1996),"['Animation', ""Children's"", 'Musical']",1996
2,1,F,1,10,4,3,914,my fair lady (1964),"['Musical', 'Romance']",1964
3,1,F,1,10,4,4,3408,erin brockovich (2000),['Drama'],2000
4,1,F,1,10,4,5,2355,"bug's life, a (1998)","['Animation', ""Children's"", 'Comedy']",1998


In [100]:
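# Preprocess movie genres: parse the stringified lists (e.g. "['Drama']") into Python lists.
# ast.literal_eval only accepts literals, so it is safer than eval on file contents.
import ast
df['movie_genre'] = df['movie_genre'].apply(ast.literal_eval)
genres = df['movie_genre'].explode().unique()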

# OneHotEncoder setup
genre_encoder = OneHotEncoder(sparse_output=False)
genre_encoder.fit(np.array(genres).reshape(-1, 1))

# Function to encode genres
def encode_genres(genre_list):
    genre_array = np.array(genre_list).reshape(-1, 1)
    return genre_encoder.transform(genre_array).sum(axis=0)

# Apply encoding
df['encoded_genres'] = df['movie_genre'].apply(encode_genres)

# Collect the encoder's output feature names (one per genre), used later as item features
genre_feature_names = genre_encoder.get_feature_names_out()

In [101]:
# Instantiate the OneHotEncoder
encoder = OneHotEncoder(sparse_output=False)

# Encode gender
encoded_gender = encoder.fit_transform(df[['reviewer_gender']])
gender_feature_names = encoder.get_feature_names_out(['reviewer_gender'])
df[gender_feature_names] = encoded_gender

# Encode occupation
encoded_occupation = encoder.fit_transform(df[['reviewer_occupation']])
occupation_feature_names = encoder.get_feature_names_out(['reviewer_occupation'])
df[occupation_feature_names] = encoded_occupation

# Encode age
encoded_age = encoder.fit_transform(df[['reviewer_age']])
age_feature_names = encoder.get_feature_names_out(['reviewer_age'])
df[age_feature_names] = encoded_age

df

Unnamed: 0,reviewer_id,reviewer_gender,reviewer_age,reviewer_occupation,reviewer_area,reviewer_rating,movie_id,movie_title,movie_genre,movie_year_of_release,encoded_genres,reviewer_gender_F,reviewer_gender_M,reviewer_occupation_0,reviewer_occupation_1,reviewer_occupation_2,reviewer_occupation_3,reviewer_occupation_4,reviewer_occupation_5,reviewer_occupation_6,reviewer_occupation_7,reviewer_occupation_8,reviewer_occupation_9,reviewer_occupation_10,reviewer_occupation_11,reviewer_occupation_12,reviewer_occupation_13,reviewer_occupation_14,reviewer_occupation_15,reviewer_occupation_16,reviewer_occupation_17,reviewer_occupation_18,reviewer_occupation_19,reviewer_occupation_20,reviewer_age_1,reviewer_age_2,reviewer_age_3,reviewer_age_4,reviewer_age_5,reviewer_age_6,reviewer_age_7
0,1,F,1,10,4,5,1193,one flew over the cuckoo's nest (1975),[Drama],1975,"[0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, ...",1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0
1,1,F,1,10,4,3,661,james and the giant peach (1996),"[Animation, Children's, Musical]",1996,"[0.0, 0.0, 1.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, ...",1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0
2,1,F,1,10,4,3,914,my fair lady (1964),"[Musical, Romance]",1964,"[0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ...",1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0
3,1,F,1,10,4,4,3408,erin brockovich (2000),[Drama],2000,"[0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, ...",1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0
4,1,F,1,10,4,5,2355,"bug's life, a (1998)","[Animation, Children's, Comedy]",1998,"[0.0, 0.0, 1.0, 1.0, 1.0, 0.0, 0.0, 0.0, 0.0, ...",1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1000204,6040,M,3,6,1,1,1091,weekend at bernie's (1989),[Comedy],1989,"[0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, ...",0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0
1000205,6040,M,3,6,1,5,1094,"crying game, the (1992)","[Drama, Romance, War]",1992,"[0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, ...",0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0
1000206,6040,M,3,6,1,5,562,welcome to the dollhouse (1995),"[Comedy, Drama]",1995,"[0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 1.0, 0.0, ...",0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0
1000207,6040,M,3,6,1,4,1096,sophie's choice (1982),[Drama],1982,"[0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, ...",0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0


## **4. Create dataset for LightFM model input**

`Dataset()` is used to store and manage user-item interactions and additional features for building recommendation models.

- Training: LightFM uses the user-item matrix to understand which users have interacted with which items and uses these interactions to learn latent representations of users and items.

- Item Features: The item feature matrix can be passed to the `fit()` function when training the model. The model will use this side information to better understand item similarities and user preferences, helping make more accurate recommendations.

- Hybrid Recommendation: By combining the user-item matrix with the item feature matrix, LightFM can create hybrid recommendation systems that leverage both collaborative filtering (based on interactions) and content-based filtering (based on item features).


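**`build_interactions()` builds a sparse interaction matrix in which each entry corresponds to an interaction between a user and an item (movie). It returns two matrices: `interactions`, which marks each observed user-movie pair, and `weights`, which holds the corresponding ratings. Only the interactions matrix is passed to `fit()` below, so the ratings themselves do not weight training.**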
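To make the interaction matrix concrete, the sketch below maps raw user and movie IDs to consecutive row and column indices and records the ratings as weights. It is a toy, dense NumPy version built from a handful of (reviewer_id, movie_id, rating) triples in the sample rows above; LightFM's `Dataset` keeps the equivalent data in SciPy sparse matrices.

```python
import numpy as np

# (reviewer_id, movie_id, rating) triples from the sample rows shown above.
ratings = [(1, 1193, 5), (1, 661, 3), (6040, 1091, 1), (6040, 562, 5)]

# Map raw IDs to consecutive indices in order of first appearance.
user_index, item_index = {}, {}
for user, item, _ in ratings:
    user_index.setdefault(user, len(user_index))
    item_index.setdefault(item, len(item_index))

interactions = np.zeros((len(user_index), len(item_index)))  # 1 where a rating exists
weights = np.zeros_like(interactions)                         # the rating itself

for user, item, rating in ratings:
    u, i = user_index[user], item_index[item]
    interactions[u, i] = 1.0
    weights[u, i] = rating
```

Rows are users and columns are movies, so the full dataset gives a 6040 x 3706 matrix that is almost entirely empty, which is why a sparse format is used in practice.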

In [104]:
# Step 2: Fit the dataset with users, items, user features, and item features
user_feature_names = list(df.columns[df.columns.str.startswith('reviewer_gender_')]) + \
                     list(df.columns[df.columns.str.startswith('reviewer_occupation_')]) + \
                     list(df.columns[df.columns.str.startswith('reviewer_age_')])

# Create a LightFM dataset
dataset = Dataset()
dataset.fit(
    users=df['reviewer_id'].unique(),
    items=df['movie_id'].unique(),
    user_features=user_feature_names,
    item_features=genre_feature_names
)

# Get the number of users and items
num_users, num_items = dataset.interactions_shape()

# Print the shape
print(f"Dataset shape: ({num_users}, {num_items})")

Dataset shape: (6040, 3706)


In [105]:
# Step 3: Build interactions and features
(interactions, weights) = dataset.build_interactions(
    (row['reviewer_id'], row['movie_id'], row['reviewer_rating'])
    for _, row in df.iterrows()
)

# Build user features
user_features = dataset.build_user_features(
    (row['reviewer_id'], dict(zip(user_feature_names, row[user_feature_names])))
    for _, row in df.iterrows()
)

# Build item features
item_features = dataset.build_item_features(
    (row['movie_id'], dict(zip(genre_feature_names, row['encoded_genres'])))
    for _, row in df.iterrows()
)

# Print final results
print(f"Interactions shape: {interactions.shape}")
print(f"User features shape: {user_features.shape}")
print(f"Item features shape: {item_features.shape}")

Interactions shape: (6040, 3706)
User features shape: (6040, 6070)
Item features shape: (3706, 3724)
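The feature matrices are wider than the number of demographic or genre columns because `Dataset()` by default also adds one identity feature per user and per item. The printed shapes can be reconciled with a short calculation, using the counts from the encodings listed earlier:

```python
n_users, n_items = 6040, 3706

# One-hot demographic columns: 2 genders + 21 occupations (0-20) + 7 age bands.
n_user_side_features = 2 + 21 + 7
# One column per genre.
n_genres = 18

# Default identity features add one column per user and one per item.
user_feature_cols = n_users + n_user_side_features
item_feature_cols = n_items + n_genres

print(user_feature_cols, item_feature_cols)  # 6070 3724
```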


## **5. Split into train and test interactions**

For a LightFM model, the train and test interaction matrices are expected to have the same dimensions (all users by all items), so a conventional train-test split that produces differently shaped matrices will not work directly. We therefore use a chronological train-test split with a train:test ratio of 70:30.

**Chronological Train-Test Split:**

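When you split the dataset chronologically, you choose interactions that happened before a specific cut-off (for training) and those that happened at or after it (for testing). In the code below, the cut-off is the 70th percentile of `movie_year_of_release`, so the split is made on the movie's release year rather than on the date of the review.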

  - For example, if you're building a recommendation system for movies, you could use reviews up to a certain point in time for training and reviews that came after that for testing.
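A quantile-based split of this kind can be sketched with NumPy alone. The years below are made up; the cut-off is the 70th percentile of the time key, and everything at or after the cut-off goes to the test set.

```python
import numpy as np

# Made-up time keys (years), one per interaction.
years = np.array([1975, 1996, 1964, 2000, 1998, 1989, 1992, 1995, 1982, 1999])

test_fraction = 0.3
cutoff = np.quantile(years, 1 - test_fraction)  # 70th percentile of the time key

train_mask = years < cutoff
test_mask = ~train_mask  # years >= cutoff

train_years, test_years = years[train_mask], years[test_mask]
```

When many interactions share the cut-off value, as happens with whole years, the resulting split may not be exactly 70:30.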

**Validation in LightFM:**

No separate validation set is held out in this notebook; the model is evaluated on the test set after training.

In [117]:
def chronological_split(df, test_percentage=0.3):
  split_year = df['movie_year_of_release'].quantile(1 - test_percentage)
  train_df = df[df['movie_year_of_release'] < split_year]
  test_df = df[df['movie_year_of_release'] >= split_year]
  return train_df, test_df

train_df, test_df = chronological_split(df)

train_interactions, _ = dataset.build_interactions(
    (row['reviewer_id'], row['movie_id'], row['reviewer_rating'])
    for _, row in train_df.iterrows()
)

test_interactions, _ = dataset.build_interactions(
    (row['reviewer_id'], row['movie_id'], row['reviewer_rating'])
    for _, row in test_df.iterrows()
)

In [118]:
# Print the shape of the training set
train_shape = train_interactions.shape
print(f"Training set shape: {train_shape} (Users, Items)")

# Print the shape of the test set
test_shape = test_interactions.shape
print(f"Test set shape: {test_shape} (Users, Items)")


Training set shape: (6040, 3706) (Users, Items)
Test set shape: (6040, 3706) (Users, Items)


## **6. Fit the LightFM model**

### **6.1 Loss function for training: Weighted Approximate-Rank Pairwise (WARP) Loss**

- The Weighted Approximate-Rank Pairwise (WARP) loss function is well-suited to ranking-based tasks such as recommendation.

- Leveraging WARP helps the model learn to rank relevant items higher than irrelevant ones, focusing on pairwise ranking of items to provide personalised recommendations.
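The sampling idea behind WARP can be sketched as follows: for a (user, positive item) pair, negatives are drawn at random until one is found that scores within a margin of the positive. The fewer draws this takes, the lower the positive is assumed to be ranked and the larger the update weight. The function below is a simplified illustration with made-up scores and a logarithmic weight on the estimated rank; it is not LightFM's implementation, and the helper name `warp_sample` is hypothetical.

```python
import math
import random

def warp_sample(scores, positive, max_trials=10, margin=1.0, seed=0):
    """Return (violating_negative, weight), or (None, 0.0) if none is found.

    scores: predicted scores, one per item (illustrative values).
    positive: index of an item the user interacted with.
    """
    rng = random.Random(seed)
    n_items = len(scores)
    negatives = [i for i in range(n_items) if i != positive]
    for trial in range(1, max_trials + 1):
        neg = rng.choice(negatives)
        # Violation: the negative scores within `margin` of the positive.
        if scores[neg] > scores[positive] - margin:
            # Few trials needed -> positive is probably ranked low -> larger weight.
            rank_estimate = max(1, (n_items - 1) // trial)
            return neg, math.log(rank_estimate)
    return None, 0.0

# Every negative outscores the positive, so the first draw is a violation.
neg, weight = warp_sample([0.0, 5.0, 5.0], positive=0)

# The positive beats every negative by more than the margin: no update.
none_found = warp_sample([10.0, 0.0, 0.0], positive=0)
```

In a real training loop the returned weight would scale the gradient step that pushes the positive item's score up and the violating negative's score down.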

`no_components:` The number of latent factors. More components can capture more complex patterns but may lead to overfitting.

`learning_rate:` Affects the convergence speed of the model. Too high can make it unstable, while too low can make training slow.

`item_alpha/user_alpha (regularization terms):` Help to prevent overfitting by adding an L2 penalty on item features and user features, respectively.

The `random_state` sets the initial state of the random number generator. If you don’t set random_state, the model will use a different random seed each time you run it, potentially leading to slightly different outcomes due to randomness in processes like weight initialization or data shuffling.

In [107]:
# Initiate LightFM model with Weighted Approximate-Rank Pairwise Loss
model = LightFM(loss='warp', no_components=20, learning_rate=0.001, item_alpha=1e-8, user_alpha=1e-8, random_state=42)
model.fit(train_interactions, epochs=10, item_features=item_features, user_features=user_features, verbose=True)

Epoch: 100%|██████████| 10/10 [00:58<00:00,  5.82s/it]


<lightfm.lightfm.LightFM at 0x7889cdb93940>

### **6.2 Model Evaluation**

In [108]:
precision = precision_at_k(model, test_interactions, train_interactions, item_features=item_features, user_features=user_features, k=5).mean()
recall = recall_at_k(model, test_interactions, train_interactions, item_features=item_features, user_features=user_features, k=5).mean()
auc = auc_score(model, test_interactions, train_interactions, item_features=item_features, user_features=user_features).mean()
f_score = (2*precision*recall) / (recall+precision)

precision_3sf = "{:.3f}".format(precision)
recall_3sf = "{:.3f}".format(recall)
auc_score_3sf = "{:.3f}".format(auc)
f_score_3sf = "{:.3f}".format(f_score)

# Create a table to report model performance on the test set
performance_table = pd.DataFrame({
    "Metric": ["Average Precision@K", "Recall@K", "AUC Score", "F-score@K"],
    "Overall": [precision_3sf, recall_3sf, auc_score_3sf, f_score_3sf]
})

print(performance_table)

                Metric Overall
0  Average Precision@K   0.026
1             Recall@K   0.002
2            AUC Score   0.569
3            F-score@K   0.004


## **7. Hyperparameter Tuning**

### **7.1 Tune 1**
- Increase epochs (10 to 15)
- Reduce learning rate (0.001 to 0.0001)
- Increase item_alpha and user_alpha (1e-8 to 1e-6)

In [119]:
# Initiate LightFM model with Weighted Approximate-Rank Pairwise Loss
model2 = LightFM(loss='warp', no_components=20, learning_rate=0.0001, item_alpha=1e-6, user_alpha=1e-6, random_state=42)
model2.fit(train_interactions, epochs=15, item_features=item_features, user_features=user_features, verbose=True)

Epoch: 100%|██████████| 15/15 [01:49<00:00,  7.27s/it]


<lightfm.lightfm.LightFM at 0x788a74404bb0>

In [120]:
precision = precision_at_k(model2, test_interactions, train_interactions, item_features=item_features, user_features=user_features, k=5).mean()
recall = recall_at_k(model2, test_interactions, train_interactions, item_features=item_features, user_features=user_features, k=5).mean()
auc = auc_score(model2, test_interactions, train_interactions, item_features=item_features, user_features=user_features).mean()
f_score = (2*precision*recall) / (recall+precision)

precision_3sf = "{:.3f}".format(precision)
recall_3sf = "{:.3f}".format(recall)
auc_score_3sf = "{:.3f}".format(auc)
f_score_3sf = "{:.3f}".format(f_score)

# Create a table to report model performance on the test set
performance_table = pd.DataFrame({
    "Metric": ["Average Precision@K", "Recall@K", "AUC Score", "F-score@K"],
    "Overall": [precision_3sf, recall_3sf, auc_score_3sf, f_score_3sf]
})

print(performance_table)

                Metric Overall
0  Average Precision@K   0.072
1             Recall@K   0.007
2            AUC Score   0.580
3            F-score@K   0.013


### **7.2 Tune 2**
- Increase Epochs

In [121]:
# Initiate LightFM model with Weighted Approximate-Rank Pairwise Loss
model3 = LightFM(loss='warp', no_components=20, learning_rate=0.0001, item_alpha=1e-6, user_alpha=1e-6, random_state=42)
model3.fit(train_interactions, epochs=20, item_features=item_features, user_features=user_features, verbose=True)

Epoch: 100%|██████████| 20/20 [02:25<00:00,  7.30s/it]


<lightfm.lightfm.LightFM at 0x788aabc43670>

In [122]:
precision = precision_at_k(model3, test_interactions, train_interactions, item_features=item_features, user_features=user_features, k=5).mean()
recall = recall_at_k(model3, test_interactions, train_interactions, item_features=item_features, user_features=user_features, k=5).mean()
auc = auc_score(model3, test_interactions, train_interactions, item_features=item_features, user_features=user_features).mean()
f_score = (2*precision*recall) / (recall+precision)

precision_3sf = "{:.3f}".format(precision)
recall_3sf = "{:.3f}".format(recall)
auc_score_3sf = "{:.3f}".format(auc)
f_score_3sf = "{:.3f}".format(f_score)

# Create a table to report model performance on the test set
performance_table = pd.DataFrame({
    "Metric": ["Average Precision@K", "Recall@K", "AUC Score", "F-score@K"],
    "Overall": [precision_3sf, recall_3sf, auc_score_3sf, f_score_3sf]
})

print(performance_table)

                Metric Overall
0  Average Precision@K   0.110
1             Recall@K   0.011
2            AUC Score   0.581
3            F-score@K   0.021
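The tuning runs above change one or two settings at a time by hand. The same settings can also be enumerated systematically. The sketch below only builds the list of combinations with the standard library (scikit-learn's `ParameterGrid`, imported earlier, serves the same purpose); training and scoring each candidate is left out, and the grid values are simply the ones tried in this notebook.

```python
from itertools import product

# Candidate values taken from the settings tried in this notebook.
grid = {
    "learning_rate": [0.001, 0.0001],
    "item_alpha": [1e-8, 1e-6],
    "epochs": [10, 15, 20],
}

# Every combination as a dict, ready to be unpacked into model and fit arguments.
names = list(grid)
candidates = [dict(zip(names, values)) for values in product(*grid.values())]

print(len(candidates))  # 12
print(candidates[0])    # {'learning_rate': 0.001, 'item_alpha': 1e-08, 'epochs': 10}
```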


**Explanation of Metrics:**

- Precision@K measures how many of the top K items recommended to a user were relevant (i.e., items the user actually interacted with in the test set).
A value of 0.11 means that, on average, around 11% of the top K recommendations were relevant across all users. This value is relatively low, indicating that only a small fraction of the top K items recommended were correct matches.

- Recall@K measures how many of the relevant items for each user (from the test set) were included in the top K recommendations.
A value of 0.011 means that only 1.1% of the items a user interacted with in the test set were found in the top K recommendations. This suggests that the model has a limited ability to capture all relevant items for users.

- AUC (Area Under the Curve) measures the probability that a randomly chosen positive item (an item the user interacted with) is ranked higher than a randomly chosen negative item (an item the user did not interact with).
A value of 0.581 indicates that the model is only slightly better than random chance (0.5). An ideal AUC score is closer to 1.0, so a score of 0.581 implies that the ranking quality needs significant improvement.

- F-score is the harmonic mean of Precision and Recall, balancing the two metrics.
A value of 0.021 means the overall balance between precision and recall is very low, indicating the model struggles to provide a meaningful trade-off between correctly predicted items and capturing all relevant items.
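To make the definitions concrete, here is a small pure-Python sketch of Precision@K, Recall@K and the F-score for a single user. The recommendation list and relevant items are made up; the last line plugs in the rounded means reported above.

```python
def precision_recall_at_k(ranked_items, relevant_items, k=5):
    """Precision@k and recall@k for one user."""
    top_k = ranked_items[:k]
    hits = sum(1 for item in top_k if item in relevant_items)
    precision = hits / k
    recall = hits / len(relevant_items) if relevant_items else 0.0
    return precision, recall

def f_score(precision, recall):
    """Harmonic mean of precision and recall (0 when both are 0)."""
    total = precision + recall
    return 2 * precision * recall / total if total else 0.0

# Made-up example: 5 recommendations, 4 relevant test items, 1 hit.
ranked = [10, 20, 30, 40, 50]
relevant = {20, 60, 70, 80}
p, r = precision_recall_at_k(ranked, relevant, k=5)  # p = 0.2, r = 0.25

# F-score from the rounded means reported for the final model.
f = f_score(0.110, 0.011)  # approximately 0.02
```

The result from the rounded inputs is about 0.020, slightly below the 0.021 in the table, because the notebook computes the F-score from the unrounded precision and recall.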

**Potential Reasons for Low Performance:**
1. Limited Feature Richness: Using only age, gender, occupation as user features may oversimplify user preferences, failing to account for more nuanced interests or viewing history. Additionally, the item features, which are limited to genre, help categorize movies but do not capture the subtleties of movie content, such as themes, sub-genres, or director styles. This lack of detailed features on both the user and item side can lead to less personalized recommendations, which in turn impacts metrics like precision, recall, and F-score.

2. Chronological Data Split: Because the split is made on `movie_year_of_release`, every movie in the test set was released at or after the cut-off year and therefore has no interactions in the training set. The model has to rank these unseen movies from their genre features rather than from learned interaction history, which depresses metrics like precision and recall.

3. Cold Start Challenges: Although LightFM helps mitigate the cold start problem by incorporating user and item features, the sparse or limited diversity in these features can restrict its effectiveness.

**Potential for Improved Performance:**
1. Adding more specific features (like director and cast) could potentially improve the model's performance, but this would require more computational resources than are currently available.

2. Experimenting with different training and test split strategies, such as using a stratified or randomized split, might help the model generalize better and improve its performance on unseen users and items.


## **8. Making predictions with model**

### **8.1 Top 5 Recommended Movies for Sample User**

For a user (say, User 1), the model computes scores for each movie they haven't interacted with yet based on their preferences (latent factors) and the movie features.

For example, the model might predict that User 1 would give Movie 10 a high score based on the fact that it’s an action movie (Action feature) and User 1 has shown a preference for action movies in the past.

In [128]:
# Movies watched by user 1
unique_movie_ids_for_reviewer = df[df['reviewer_id'] == 1]['movie_id'].unique()
print(unique_movie_ids_for_reviewer)

[1193  661  914 3408 2355 1197 1287 2804  594  919  595  938 2398 2918
 1035 2791 2687 2018 3105 2797 2321  720 1270  527 2340   48 1097 1721
 1545  745 2294 3186 1566  588 1907  783 1836 1022 2762  150    1 1961
 1962 2692  260 1028 1029 1207 2028  531 3114  608 1246]


In [135]:
# Make recommendations for a specific user
def get_recommendations(user_id, model, dataset, df, n=5):
    n_users, n_items = dataset.interactions_shape()
    scores = model.predict(user_id, np.arange(n_items), user_features=user_features, item_features=item_features)
    top_items = np.argsort(-scores)[:n]

    recommendations = df[df['movie_id'].isin(dataset.mapping()[2].keys())].iloc[top_items]
    return recommendations[['movie_id', 'movie_title', 'movie_genre']]

In [136]:
# Example: Get recommendations for user with ID 1
user_id = 1
recommendations = get_recommendations(user_id, model3, dataset, df)
print(f"\nTop 5 movie recommendations for user {user_id}:")
print(recommendations)


Top 5 movie recommendations for user 1:
      movie_id           movie_title                  movie_genre
994        926  all about eve (1950)                      [Drama]
1241      3105     awakenings (1990)                      [Drama]
860       2640       superman (1978)  [Action, Adventure, Sci-Fi]
237       1036       die hard (1988)           [Action, Thriller]
155        457  fugitive, the (1993)           [Action, Thriller]
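Two details in `get_recommendations` are worth checking. `model.predict` expects LightFM's internal user and item indices rather than raw IDs, and `top_items` holds internal item indices, so using them with `.iloc` selects rows of the interaction table rather than movies. This is likely why movie 3105, which user 1 has already rated, appears in the list. The sketch below shows one way to handle the indices on toy data with NumPy. The mappings and scores are made up; in the notebook the two mappings would come from `dataset.mapping()`.

```python
import numpy as np

# Made-up stand-ins for dataset.mapping(): raw ID -> internal index.
user_id_map = {1: 0, 2: 1}
item_id_map = {1193: 0, 661: 1, 926: 2, 2640: 3}
titles = {
    1193: "one flew over the cuckoo's nest (1975)",
    661: "james and the giant peach (1996)",
    926: "all about eve (1950)",
    2640: "superman (1978)",
}

# Made-up scores for every internal item index, keyed by internal user index.
scores_by_user = {0: np.array([0.9, 0.1, 0.7, 0.8])}
seen_by_user = {1: {1193, 661}}  # raw movie IDs the user has already rated

def recommend(raw_user_id, n=2):
    """Return the top-n unseen (movie_id, title) pairs for a raw user ID."""
    internal_user = user_id_map[raw_user_id]  # raw ID -> internal index
    scores = scores_by_user[internal_user]
    index_to_movie = {idx: mid for mid, idx in item_id_map.items()}  # invert mapping
    ranked = [index_to_movie[i] for i in np.argsort(-scores).tolist()]
    seen = seen_by_user.get(raw_user_id, set())
    unseen = [m for m in ranked if m not in seen]
    return [(m, titles[m]) for m in unseen[:n]]

top = recommend(1)
# top -> [(2640, 'superman (1978)'), (926, 'all about eve (1950)')]
```

Looking titles up by movie ID, rather than by row position, keeps the result independent of how the interaction table happens to be ordered.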


*Note: These recommendations cannot be checked, as there is no ground truth for what the user will watch next.*
