# Personalized Movie Recommendations
This notebook shows how to build a personalized movie recommendation model with ThirdAI's Universal Deep Transformer (UDT), our all-purpose classifier for tabular datasets. In this demo, we train and evaluate the model on the MovieLens 1M dataset, but you can easily substitute your own dataset.

You can immediately run a version of this notebook in your browser on Google Colab at the following link:

https://githubtocolab.com/ThirdAILabs/Demos/blob/main/universal_deep_transformer/personlization_and_recommendation/PersonalizedMovieRecommendations.ipynb

This notebook uses an activation key that will only work with this demo. If you want to try us out on your own dataset, you can obtain a free trial license at the following link: https://www.thirdai.com/try-bolt/

In [None]:
!pip3 install thirdai --upgrade

import thirdai
thirdai.licensing.activate("Y9MT-TV7T-4JTP-L4XH-PWYC-4KEF-VX93-3HV7")

# Dataset Download
We will use the demos module in the thirdai package to download the MovieLens 1M dataset. You can replace this step and the next one with a download method and a UDT initialization step specific to your own dataset.

In [None]:
from thirdai.demos import download_movielens

train_filename, test_filename, inference_batch, index_batch = download_movielens()
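
If you bring your own data, the column types declared in the next section suggest a CSV file whose header contains `userId`, `movieTitle`, and `timestamp`. The sketch below writes such files with Python's standard library and splits the rows chronologically, so that the test file holds only the latest interactions. The sample rows, the file names, the 75% split, and the `write_interactions` helper are all hypothetical; check the input format your version of thirdai expects before relying on it.

```python
import csv
import os
import tempfile

# Hypothetical interaction log: (userId, movieTitle, timestamp as YYYY-MM-DD).
rows = [
    ("1", "Toy Story (1995)", "2000-01-03"),
    ("2", "Rain Man (1988)", "2000-01-01"),
    ("1", "Casablanca (1942)", "2000-01-05"),
    ("2", "Toy Story (1995)", "2000-01-04"),
]


def write_interactions(path, interactions):
    """Write rows to a CSV file using the column names from this notebook."""
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["userId", "movieTitle", "timestamp"])
        writer.writerows(interactions)


# ISO-formatted dates sort correctly as plain strings.
rows_sorted = sorted(rows, key=lambda r: r[2])
split = int(0.75 * len(rows_sorted))  # arbitrary train/test ratio

out_dir = tempfile.mkdtemp()
my_train_filename = os.path.join(out_dir, "my_train.csv")
my_test_filename = os.path.join(out_dir, "my_test.csv")
write_interactions(my_train_filename, rows_sorted[:split])
write_interactions(my_test_filename, rows_sorted[split:])
```

Splitting by time rather than at random keeps later interactions out of the training file, which matters when the model relies on each user's history.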

# UDT Initialization
We can now create a UDT model by passing in the types of each column in the dataset and the target column we want to be able to predict.

For this demo, we additionally want to use "temporal context" to make predictions. Adding temporal context requires a single bolt.types.date() column, which UDT uses to track the timestamp of each training row. We also pass in a dictionary called temporal_tracking_relationships that tells UDT to track movies over time for each user. This allows UDT to make better predictions for the target column by creating temporal features that take into account the historical relationship between users and movies.

In [None]:
from thirdai import bolt

model = bolt.UniversalDeepTransformer(
    data_types={
        "userId": bolt.types.categorical(),
        "movieTitle": bolt.types.categorical(n_classes=3706),
        "timestamp": bolt.types.date(),
    },
    temporal_tracking_relationships={"userId": ["movieTitle"]},
    target="movieTitle",
)
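
To build intuition for what tracking movies over time for each user means, here is a toy sketch in plain Python. It is not how UDT is implemented; the `ToyTemporalTracker` class, its methods, and the sample rows are hypothetical. It only illustrates the idea that a user's earlier movies can serve as extra input features for the next prediction.

```python
from collections import defaultdict, deque


class ToyTemporalTracker:
    """Hypothetical illustration: remembers each user's most recent movies."""

    def __init__(self, history_length=3):
        # Maps userId -> bounded queue of recently watched movie titles.
        self.history = defaultdict(lambda: deque(maxlen=history_length))

    def features(self, user_id):
        """Return the titles remembered for this user, oldest first."""
        if user_id not in self.history:
            return []
        return list(self.history[user_id])

    def record(self, user_id, movie_title):
        """Add one observation to the user's history."""
        self.history[user_id].append(movie_title)

    def reset(self):
        """Forget every user's history."""
        self.history.clear()


tracker = ToyTemporalTracker(history_length=2)

# Rows are processed in timestamp order so that features only look backward.
training_rows = [
    ("1", "Toy Story (1995)", "2000-01-01"),
    ("1", "Casablanca (1942)", "2000-01-02"),
    ("1", "Rain Man (1988)", "2000-01-03"),
]

examples = []
for user_id, movie_title, _timestamp in training_rows:
    # Compute features before recording the row so the target never leaks in.
    examples.append((tracker.features(user_id), movie_title))
    tracker.record(user_id, movie_title)
```

In this toy version, each training example pairs the movies a user watched earlier with the movie they watched next, which is the kind of historical signal the paragraph above describes.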

# Training
We can now train our UDT model with just one line! Feel free to customize the number of epochs and the learning rate; we have chosen values that give good convergence.

In [None]:
model.train(train_filename, epochs=3, learning_rate=0.001, metrics=["recall@10"]);

# Evaluation
Evaluating the performance of the UDT model is also just one line! We measure the model's ability to predict the movie that a user chooses to watch out of 3706 options.

In [None]:
model.evaluate(test_filename, metrics=["recall@1", "recall@10", "recall@100"]);

validate | epoch 3 | train_steps 1320 | val_recall@10=0.129263 val_recall@100=0.425501 val_recall@1=0.0220054  | val_batches 49 | time 17.180s
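
To make the metric names concrete: assuming each test row has exactly one correct movie, recall@k reduces to the fraction of rows whose true movie appears among the k highest-scoring predictions. Under that reading, a recall@10 of about 0.129 means the watched movie was in the top 10 for roughly 13% of test rows. The function below is a plain-Python sketch of that definition, not thirdai's implementation; `recall_at_k` and the toy scores are hypothetical.

```python
def recall_at_k(scores, true_labels, k):
    """Fraction of rows whose true label is among the k highest scores."""
    hits = 0
    for row_scores, label in zip(scores, true_labels):
        # Class indices sorted from highest score to lowest.
        ranked = sorted(
            range(len(row_scores)), key=lambda i: row_scores[i], reverse=True
        )
        if label in ranked[:k]:
            hits += 1
    return hits / len(true_labels)


# Three toy rows scored over four classes (scores are made up).
scores = [
    [0.10, 0.60, 0.30, 0.00],  # true class 1 is ranked first
    [0.40, 0.10, 0.30, 0.20],  # true class 2 is ranked second
    [0.05, 0.15, 0.10, 0.70],  # true class 0 is ranked last
]
true_labels = [1, 2, 0]

recall_1 = recall_at_k(scores, true_labels, k=1)
recall_2 = recall_at_k(scores, true_labels, k=2)
recall_4 = recall_at_k(scores, true_labels, k=4)
```

Because a larger k can only add hits, recall@k never decreases as k grows, which matches the ordering of recall@1, recall@10, and recall@100 in the output above.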



# Saving and Loading
Saving a trained UDT model to disk and loading it back is also straightforward.

In [None]:
save_location = "personalized_movie_recommendation.model"

# Saving
model.save(save_location)

In [None]:
# Loading
model = bolt.UniversalDeepTransformer.load(save_location)

# Testing Predictions
The evaluation method is great for testing, but it requires labels, which don't exist in a production setting. We also have a predict method that can take in an in-memory batch of rows or a single row (without the target column), allowing easy integration into production pipelines.

In the following cell, let's say we have a new user with an ID "6040".

In [None]:
## Resetting the temporal trackers erases every user's watch history.
## The model weights remain intact, but all subsequent predictions are made as if it were each user's first visit.
## We reset the temporal trackers here to introduce a new user "6040" for demo purposes. Use this API carefully in production.

model.reset_temporal_trackers()

sample_input = {
    "userId": "6040",
    "timestamp": "2000-04-25",
}

In [None]:
predictions, _ = model.predict(sample_input, top_k=5)
print([model.class_name(p) for p in predictions])

['Star Wars: Episode VI - Return of the Jedi (1983)', 'Rain Man (1988)', 'Fatal Attraction (1987)', "One Flew Over the Cuckoo's Nest (1975)", 'Back to the Future (1985)']


You can see that for a new user "6040", the model recommends the most popular movies in the dataset, such as "Star Wars: Episode VI - Return of the Jedi (1983)" and "Back to the Future (1985)".
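
A user with no recorded history gives the model nothing personal to condition on, so a fallback toward broadly popular titles is unsurprising. For comparison, a plain popularity baseline can be sketched in a few lines of Python. The `most_popular` helper and the sample rows are hypothetical, and this is unrelated to how UDT ranks movies internally.

```python
from collections import Counter


def most_popular(interactions, top_k):
    """Return the top_k most frequently watched titles, most watched first."""
    counts = Counter(title for _user, title, _timestamp in interactions)
    return [title for title, _count in counts.most_common(top_k)]


# Hypothetical interaction log: (userId, movieTitle, timestamp).
interactions = [
    ("1", "Back to the Future (1985)", "2000-01-01"),
    ("2", "Back to the Future (1985)", "2000-01-02"),
    ("3", "Back to the Future (1985)", "2000-01-03"),
    ("1", "Rain Man (1988)", "2000-01-04"),
    ("2", "Rain Man (1988)", "2000-01-05"),
    ("3", "Casablanca (1942)", "2000-01-06"),
]

baseline = most_popular(interactions, top_k=2)
```

A baseline like this returns the same list for every user, which is a useful reference point when judging how much personalization the trained model adds.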
 
Now, let's say user "6040" went ahead and watched "Godfather The (1972)" instead of one of the recommended movies. We can incrementally update the model's temporal trackers with this observation by using the model.index() API, as shown below.

In [None]:
new_observation = {
    'userId': '6040',
    'movieTitle': 'Godfather The (1972)',
    'timestamp': '2000-04-25'
}

model.index(new_observation)

And now, let's make a recommendation again.

In [None]:
sample_input = {
    "userId":"6040",
    "timestamp": "2000-04-25"
}

predictions, _ = model.predict(sample_input, top_k=5)
print([model.class_name(p) for p in predictions])

['Raiders of the Lost Ark (1981)', 'Godfather: Part II The (1974)', 'Godfather: Part III The (1990)', 'Casablanca (1942)', 'Star Wars: Episode IV - A New Hope (1977)']


Voila! You can see that the model now predicts "Godfather: Part II The (1974)" and "Godfather: Part III The (1990)", which were not in the previous recommendations.

Until we update UDT's temporal trackers with new observations for a user, UDT keeps recommending the same movies to that user. If we "index" new observations as they arrive, UDT takes advantage of the new information to make better predictions. When you run the following cell, notice how the prediction changes in response to new data.

In [None]:
# Returns the same prediction
print("Before indexing new observation")
predictions, _ = model.predict(sample_input, top_k=5)
print([model.class_name(p) for p in predictions])

# Index a new observation
new_observation = {
    'userId': '6040',
    'movieTitle': 'Godfather: Part II The (1974)',
    'timestamp': '2000-04-25'
}
model.index(new_observation)

# Returns a different prediction
print("After indexing new observation")
predictions, _ = model.predict(sample_input, top_k=5)
print([model.class_name(p) for p in predictions])


Before indexing new observation
['Raiders of the Lost Ark (1981)', 'Godfather: Part II The (1974)', 'Godfather: Part III The (1990)', 'Casablanca (1942)', 'Star Wars: Episode IV - A New Hope (1977)']
After indexing new observation
['Godfather: Part III The (1990)', 'Star Wars: Episode IV - A New Hope (1977)', 'Raiders of the Lost Ark (1981)', "One Flew Over the Cuckoo's Nest (1975)", 'Star Wars: Episode V - The Empire Strikes Back (1980)']
