# MovieLens recommendations in Keras

This notebook demonstrates a Keras model for recommending movies. It's an example of one of the things I like about deep learning -- you can use the same model, trained on the same data, for multiple different tasks. In this case, those tasks are personalized and non-personalized recommendations.

It's derived originally from [an example by Maciej Kula](https://github.com/maciejkula/triplet_recommendations_keras). In this fork I did a few additional things that aren't in the original:

* Incorporated genre metadata into the model
* Slightly modified the loss function to match the [BPR paper](https://arxiv.org/abs/1205.2618)
* Added a non-linear transformation into the model
* Wrote log data and metadata out in TensorBoard format to enable visualization of embeddings
* Provided a function to actually query the recommendations for a given user
* Provided a function to query the model by movie, and return similar movies

It's also had a few tweaks to let it run without warnings in Keras 2.

## Network architecture

From a high-level perspective, this network is a **pairwise personalized ranking model**. Given a user and two movies from the training data, it learns to give a higher score to the one which the user preferred. (However, as we'll see below, you can also use it for non-personalized, similarity-based ranking.)
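
To make the pairwise objective concrete, here's a minimal NumPy sketch of the loss from the BPR paper, `-log(sigmoid(score_pos - score_neg))`, with the regularization term left out. The real `BprLoss` layer lives in `net_helpers.py` and isn't reproduced here, so treat the function name `bpr_loss` and the toy vectors as illustrative assumptions rather than the notebook's actual code.

```python
import numpy as np

def bpr_loss(user_vec, pos_vec, neg_vec):
    """Per-triplet BPR loss: -log(sigmoid(u.p - u.n)). Hypothetical helper."""
    pos_score = np.sum(user_vec * pos_vec, axis=-1)
    neg_score = np.sum(user_vec * neg_vec, axis=-1)
    diff = pos_score - neg_score
    # -log(sigmoid(x)) == log(1 + exp(-x)); logaddexp is the stable way to compute it
    return np.logaddexp(0.0, -diff)

# Toy data: the 'positive' item points the same way as the user,
# the 'negative' item points the opposite way
user = np.array([[1.0, 2.0], [0.5, -1.0]])
pos = user.copy()
neg = -user

losses = bpr_loss(user, pos, neg)    # correctly ordered pairs
swapped = bpr_loss(user, neg, pos)   # incorrectly ordered pairs
print(losses, swapped)
```

The loss is small when the positive item outscores the negative one and grows when the order is wrong, which is all the training model needs in order to learn a ranking.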

This architecture is sometimes called a **triplet network**, although that terminology [originally meant](https://arxiv.org/abs/1412.6622) a model that takes three inputs of the _same_ type (e.g. three movies).

Each user, movie, and genre tag is represented by an embedding -- a vector of floats, i.e. some coordinates in a high dimensional space -- which is initialized randomly and updated as the model trains. So, over time:

* Users with similar tastes will be pushed closer to each other in user embedding space
* Movies which appeal to similar people will be pushed closer to each other in movie embedding space
* Tags which appear in similar contexts will be pushed closer together in tag embedding space

Some movies have multiple genres, e.g. _Alien_ is tagged 'Action', 'Horror', 'Sci-Fi', 'Thriller'. In these cases, the tag embeddings are simply averaged together. The final representation for a movie is its own embedding concatenated with the average of its tag embeddings.
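
As a rough illustration of that averaging step, the sketch below computes a padding-aware mean over tag embeddings and concatenates it onto a movie embedding. It assumes tag ID 0 is the zero-padding slot; the actual `mask_aware_mean` in `net_helpers.py` isn't shown in this notebook, so the helper `masked_mean` and all the numbers here are made up for the example.

```python
import numpy as np

def masked_mean(tag_vectors, tag_ids, pad_id=0):
    """Average tag embeddings over each movie's real (non-padding) tags."""
    # mask has shape (batch, max_tags, 1): 1.0 for real tags, 0.0 for padding
    mask = (tag_ids != pad_id).astype(tag_vectors.dtype)[..., None]
    totals = (tag_vectors * mask).sum(axis=1)
    counts = np.maximum(mask.sum(axis=1), 1.0)  # avoid dividing by zero
    return totals / counts

# Toy tag embedding table; row 0 is the padding slot and should never contribute
tag_table = np.array([[9.0, 9.0],
                      [1.0, 0.0],
                      [0.0, 1.0],
                      [2.0, 2.0]])

tag_ids = np.array([[1, 2, 0],    # a movie with two tags
                    [3, 0, 0]])   # a movie with one tag

mean_tags = masked_mean(tag_table[tag_ids], tag_ids)

# Final movie representation: own embedding concatenated with mean tag embedding
movie_vecs = np.array([[0.5, -0.5, 0.1],
                       [0.2, 0.3, 0.4]])
movie_repr = np.concatenate([movie_vecs, mean_tags], axis=-1)
print(movie_repr)
```

A plain mean over all `max_tags` slots would drag movies with few tags towards the padding vector, which is why the mask matters.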

The movie and tag embedding tables are shared between the positive and negative examples. This means we learn a single embedding for each movie and each tag, not different embeddings for positive and negative context. This architecture is sometimes called a [Siamese network](https://www.quora.com/What-are-Siamese-neural-networks-what-applications-are-they-good-for-and-why).

To make predictions, we use a slightly modified version of the network, with _just one_ movie input (and a user input). This outputs a personalized ranking score for that movie, which can be used to sort that user's recommendations.

We also return a version of the network with _two_ movie inputs and _no_ user input, for ranking movies by similarity.

In [1]:
from __future__ import print_function

import numpy as np

from keras.models import Model
from keras.layers import Embedding, Flatten, Input, Lambda
from keras.layers.merge import concatenate, dot
from keras.layers.advanced_activations import ELU
from keras.callbacks import TensorBoard
from keras.optimizers import Adam

import data
import metrics
from net_helpers import *


def build_model(num_users, num_items, num_tags, max_tags,
                item_latent_dim, tag_latent_dim):
    
    """
    Build a model for training, plus submodels for personal ranking
    and movie-movie similarity.
    
    You need to supply the dimensionality of the movie and genre tag
    embedding spaces; the user embedding is sized automatically to be
    equal to the sum of these, e.g. 75d movies and 25d tags will give
    you a 100d user embedding.
    
    Args:
        num_users (int): the number of users in the dataset
        num_items (int): the number of movies in the dataset
        num_tags (int): the number of distinct genre tags in the dataset
        max_tags (int): the maximum number of tags associated with a single movie
        item_latent_dim (int): the size of the movie embedding to learn
        tag_latent_dim (int): the size of the tag embedding to learn
        
    Returns:
        The training model, the prediction model, the similarity model
    """

    # These inputs take movie IDs for the positive and negative items
    positive_item_input = Input((1, ), name='positive_item_input')
    negative_item_input = Input((1, ), name='negative_item_input')
    
    # These take zero-padded tag ID vectors, for the positive and negative items
    positive_tags = Input((max_tags, ), name='positive_tags')
    negative_tags = Input((max_tags, ), name='negative_tags')

    # Shared embedding layer for positive and negative items
    item_embedding_layer = Embedding(
        num_items, item_latent_dim, name='item_embedding', input_length=1)
    
    # Shared embedding layer for positive and negative items' tags
    tag_embedding_layer = Embedding(
        num_tags, tag_latent_dim, name='tag_embedding', input_length=max_tags)

    # Get the embeddings corresponding to the positive and negative movie IDs;
    # Flatten just turns a matrix with one row into a vector
    
    positive_item_embedding = Flatten()(item_embedding_layer(positive_item_input))
    
    negative_item_embedding = Flatten()(item_embedding_layer(negative_item_input))
    
    # Get the embeddings for the positive and negative tags, and average them
    # together via mask_aware_mean -- see comments in net_helpers.py
    
    positive_tags_embedding = Lambda(mask_aware_mean, mask_aware_mean_output_shape,
                                     name='pos_mean')(tag_embedding_layer(positive_tags))
    
    negative_tags_embedding = Lambda(mask_aware_mean, mask_aware_mean_output_shape,
                                     name='neg_mean')(tag_embedding_layer(negative_tags))
    
    # Concatenate the movie embeddings and mean tag embeddings for the two
    # inputs, then add an Exponential Linear Unit to give the network a
    # little more flexibility
    
    positive_vec = ELU()(concatenate([positive_item_embedding, positive_tags_embedding]))
    
    negative_vec = ELU()(concatenate([negative_item_embedding, negative_tags_embedding]))
    
    # Input for the user ID
    user_input = Input((1, ), name='user_input')

    # User embedding has to have dimensionality equal to item plus tag embeddings,
    # as they need to align element-wise when calculating the scores
    user_latent_dim = item_latent_dim + tag_latent_dim
    
    # Retrieve the user embedding and add an ELU as above
    user_embedding = ELU()(Flatten()(Embedding(
        num_users, user_latent_dim, name='user_embedding', input_length=1)(
            user_input)))
    
    # Now, the final representation for each movie (positive_vec/negative_vec)
    # is an ELU-transformed concatenation of the movie embedding and the mean
    # tag embedding.
    
    # The final representation for the user is just her ELU-transformed embedding.

    # Bayesian Personalized Ranking loss (see net_helpers.py)
    loss = BprLoss(name='bpr_loss')([positive_vec, negative_vec, user_embedding])

    # Construct and compile the main model for training, with positive and
    # negative movies plus user
    
    model = Model(
        inputs=[positive_item_input, positive_tags, negative_item_input, negative_tags, user_input],
        outputs=loss)
    
    model.compile(loss=identity_loss, optimizer=Adam())
    
    # Now define a separate model for prediction, with only one movie plus user,
    # which just calculates a score by taking the dot product of the user
    # representation and the movie representation
    
    # This doesn't have any additional weights of its own, it's just a subgraph
    # of the main model but with a different output layer
    
    user_dot_item = dot(
        [positive_vec, user_embedding], axes=-1, name='user_dot_item')
    
    pred_model = Model(
        inputs=[positive_item_input, positive_tags, user_input],
        outputs=user_dot_item)
    
    # Likewise, we can define a separate model for movie-movie similarity,
    # using both movie inputs and no user input
    
    # Note that they're still called 'positive' and 'negative' because we're
    # reusing previously-defined components, although when we use this model
    # for similarity, the input will just be two movies with no implied order
    
    item_item_cos_sim = dot(
        [positive_vec, negative_vec], axes=-1,
        normalize=True, # i.e. cosine similarity instead of inner product
        name='item_item_cos_sim')
    
    sim_model = Model(
        inputs=[positive_item_input, positive_tags, negative_item_input, negative_tags],
        outputs=item_item_cos_sim)

    return model, pred_model, sim_model

Using TensorFlow backend.


## Load and transform data
We're going to load the MovieLens 100k dataset and create triplets of (user, known positive item, randomly sampled negative item).
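
For a feel of what triplet sampling involves, here's a small sketch on a dense toy interaction matrix. It assumes negatives are drawn uniformly at random from all items, without checking whether the user actually liked them; `data.get_triplets` isn't shown here and may well differ, and `sample_triplets` is a hypothetical name.

```python
import numpy as np

def sample_triplets(interactions, rng):
    """Return (user, positive item, random negative item) index arrays."""
    # Every non-zero entry is one known positive interaction
    uid, pid = np.nonzero(interactions)
    # Assumption: negatives are sampled uniformly, so a few may be false negatives
    nid = rng.integers(0, interactions.shape[1], size=len(uid))
    return uid, pid, nid

# 3 users x 4 items, 1 = the user liked the item
interactions = np.array([[1, 0, 0, 1],
                         [0, 1, 0, 0],
                         [0, 0, 1, 1]])

rng = np.random.default_rng(42)
uid, pid, nid = sample_triplets(interactions, rng)
print(list(zip(uid, pid, nid)))
```

Resampling the negatives on every epoch, as the training loop in this notebook does by calling `data.get_triplets(train)` each time, means the model sees a different set of contrasts each pass.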

The success metric is AUC: in this case, the probability that a randomly chosen known positive item from the test set is ranked higher for a given user than a randomly chosen negative item.
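
That definition translates fairly directly into code. The sketch below computes it for a single user by comparing every positive score with every negative score; `metrics.full_auc` isn't shown in this notebook (presumably it aggregates something like this across users), so `user_auc` and the toy scores are assumptions for illustration only.

```python
import numpy as np

def user_auc(scores, positive_ids):
    """Fraction of (positive, negative) pairs where the positive scores higher."""
    is_pos = np.zeros(len(scores), dtype=bool)
    is_pos[positive_ids] = True
    pos_scores = scores[is_pos]
    neg_scores = scores[~is_pos]
    # Compare every positive with every negative; ties count as half a win
    diff = pos_scores[:, None] - neg_scores[None, :]
    return (np.sum(diff > 0) + 0.5 * np.sum(diff == 0)) / diff.size

scores = np.array([0.9, 0.1, 0.8, 0.3, 0.5])

perfect = user_auc(scores, positive_ids=[0, 2])  # both positives outrank every negative
mixed = user_auc(scores, positive_ids=[0, 3])    # one positive is ranked poorly
print(perfect, mixed)
```

A model that scores items at random wins about half of these comparisons, which is why an untrained model should come out at roughly 0.5.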

In [2]:
item_latent_dim = 125
tag_latent_dim = 25

# Read data
train, test = data.get_movielens_data()
num_users, num_items = train.shape

item_features = data.get_movielens_item_metadata(use_item_ids=False)

max_tags = item_features.shape[1]
num_tags = item_features.max() + 1

# Prepare the test triplets
test_uid, test_pid, test_nid = data.get_triplets(test)

# Generate the simplified metadata files for TensorBoard
log_dir = '/tmp/tfboard/triplet_keras/'
items_metadata, tags_metadata = data.extract_tensorboard_metadata(log_dir)

## Build and inspect the models

Now we can build both the main training model, and the simpler prediction and similarity models, and verify their structure.

In [3]:
model, pred_model, sim_model = build_model(
    num_users, num_items, num_tags, max_tags,
    item_latent_dim, tag_latent_dim)

# Print the model structure; summary() prints directly and returns None,
# so there's no need to wrap it in print()
print('Model for training:')
model.summary()
print()
print('Model for inference:')
pred_model.summary()
print()
print('Model for movie similarity:')
sim_model.summary()
print()

# Sanity check, should be around 0.5
print('AUC before training %s' % metrics.full_auc(pred_model, test, item_features))

Model for training:
____________________________________________________________________________________________________
Layer (type)                     Output Shape          Param #     Connected to                     
positive_item_input (InputLayer) (None, 1)             0                                            
____________________________________________________________________________________________________
positive_tags (InputLayer)       (None, 6)             0                                            
____________________________________________________________________________________________________
negative_item_input (InputLayer) (None, 1)             0                                            
____________________________________________________________________________________________________
negative_tags (InputLayer)       (None, 6)             0                                            
_______________________________________________________________________

## TODO model graphs

## Train the model

Run for a few epochs, checking the AUC after every epoch.

We'll get Keras to write out some training stats and the model weights to `log_dir` so we can explore them in [TensorBoard](https://www.tensorflow.org/get_started/summaries_and_tensorboard) afterwards.

In [4]:
tensorboard = TensorBoard(
    log_dir=log_dir,
    embeddings_freq=1,
    embeddings_layer_names=['item_embedding', 'tag_embedding', 'user_embedding'],
    embeddings_metadata={'item_embedding': items_metadata, 'tag_embedding': tags_metadata})

num_epochs = 10
checkpoint_every = 1

for epoch in range(num_epochs):

    print('Epoch %s' % epoch)

    # Sample triplets from the training data
    uid, pid, nid = data.get_triplets(train)
    ptags = item_features[pid]
    ntags = item_features[nid]

    X = {
        'user_input': uid,
        'positive_item_input': pid,
        'negative_item_input': nid,
        'positive_tags': ptags,
        'negative_tags': ntags
    }
    
    checkpoint = ((epoch + 1) % checkpoint_every == 0)
    
    if checkpoint:
        callbacks=[tensorboard]
    else:
        callbacks=[]

    model.fit(X,
              np.ones(len(uid)),
              batch_size=64,
              epochs=1,
              verbose=1,
              shuffle=True,
              callbacks=callbacks)

    if checkpoint:
        print('AUC %s' % metrics.full_auc(pred_model, test, item_features))

Epoch 0
Epoch 1/1
AUC 0.830446625728
Epoch 1
Epoch 1/1
AUC 0.864003825028
Epoch 2
Epoch 1/1
AUC 0.883190332454
Epoch 3
Epoch 1/1
AUC 0.894669984616
Epoch 4
Epoch 1/1
AUC 0.902753512825
Epoch 5
Epoch 1/1
AUC 0.907492437565
Epoch 6
Epoch 1/1
AUC 0.9096222936
Epoch 7
Epoch 1/1
AUC 0.91189617685
Epoch 8
Epoch 1/1
AUC 0.913232889884
Epoch 9
Epoch 1/1
AUC 0.914111003423


## Results

After 10 epochs, you should see an AUC of at least 0.91 above.

Now go to a terminal and run:

`tensorboard --logdir=/tmp/tfboard/triplet_keras/`

and then open this link:

http://localhost:6006/ (seems to work in Chrome only... thanks Google)

That'll let you visualize the learning curve during training, and explore the embeddings graphically.

Note that the movie embeddings in TensorBoard are somewhat misleading, as they _don't_ take into account the genre tags for each movie. We'll return to this shortly.

## Querying the model by user

For demo purposes, let's write a function that takes a user ID, and shows you what items they've liked already, and what the model's best-bet recommendations are.

We do this by preparing a batch of user/movie/genre inputs, one for each of the _n_ movies that the user hasn't already liked in the training data. Really we're making _n_ separate predictions, then finding the top-scoring ones and returning those. We're just querying one user at a time, so the user ID is repeated _n_ times in the batch.

This uses the secondary model `pred_model` which only needs inputs for _one_ movie and a user.

In [5]:
import heapq

all_item_ids = np.arange(num_items, dtype=np.uint16)
names = np.array(data.get_movie_names())

def query_for_user(user_id, limit=5):
    
    # First, find out what they've already liked, so we can exclude these
    prev_liked = train.getrow(user_id).nonzero()[1]
    print('User %d has previously liked:\n' % user_id)
    print('\n'.join(np.sort(names[prev_liked])))
    
    # Build the query batch
    candidate_ids = np.setdiff1d(all_item_ids, prev_liked, assume_unique=True)
    candidate_tags = item_features[candidate_ids]
    num_candidates = len(candidate_ids)
    user_id_array = np.repeat(user_id, num_candidates)
    
    # Query the model
    predictions = pred_model.predict({
        'positive_item_input': candidate_ids,
        'positive_tags': candidate_tags,
        'user_input': user_id_array},
        batch_size=num_candidates)
    
    # Get the indices (in the candidates array) of the highest-valued outputs
    top_hits = heapq.nlargest(
        limit, range(num_candidates), predictions.take)
    
    # Now display them
    print('\nTop recommendations:\n')
    for hit_idx in top_hits:
        # take() returns a scalar, which is what the %f format expects
        score = predictions.take(hit_idx)
        movie_id = candidate_ids[hit_idx]
        print('%0.2f\t%s' % (score, names[movie_id]))

In [6]:
query_for_user(200)

User 200 has previously liked:

101 Dalmatians (1996)
20,000 Leagues Under the Sea (1954)
2001: A Space Odyssey (1968)
Aladdin (1992)
Alice in Wonderland (1951)
Alien (1979)
Alien 3 (1992)
Aliens (1986)
Amadeus (1984)
Andre (1994)
Apollo 13 (1995)
Assassins (1995)
Babe (1995)
Back to the Future (1985)
Batman Forever (1995)
Batman Returns (1992)
Birdcage, The (1996)
Birds, The (1963)
Blade Runner (1982)
Boot, Das (1981)
Braveheart (1995)
Cape Fear (1991)
Carrie (1976)
Casablanca (1942)
Casper (1995)
Cat People (1982)
Cliffhanger (1993)
Clockwork Orange, A (1971)
Conan the Barbarian (1981)
Contact (1997)
Cool Hand Luke (1967)
Crash (1996)
Crow, The (1994)
Day the Earth Stood Still, The (1951)
Dead Man Walking (1995)
Dead Poets Society (1989)
Demolition Man (1993)
Desperado (1995)
Die Hard 2 (1990)
Dragonheart (1996)
Dumbo (1941)
E.T. the Extra-Terrestrial (1982)
Empire Strikes Back, The (1980)
English Patient, The (1996)
Eraser (1996)
Escape from L.A. (1996)
Executive Decision (1996)
Fan

Notice that these scores are unbounded, because they're calculated from the [dot product](https://en.wikipedia.org/wiki/Dot_product) of the user and movie representations, which isn't a normalized quantity. This is useful as it can give us some comparative measure of quality between different result sets. If a user's top hits score very low compared to most other users' top hits, that can be a sign that we don't have enough data about that user or those movies to be confident about the recommendations.

## Querying the model by movie

We can also use the model to rank candidate movies by similarity to a query movie, using the other secondary model, `sim_model`. This requires _two_ movies -- fed into the model's 'positive' and 'negative' inputs, although this isn't actually important here -- and _no user_. As above, we just retrieve all the similarity scores for a query movie in a single batch, then find the top hits.

In [7]:
def query_for_item(item_id, limit=5):
    
    print('Showing similar items to %s:\n' % names[item_id])
    
    # Replicate the query ID and its tags across the whole batch
    query_ids = np.repeat(item_id, num_items)
    query_tags = np.tile(item_features[item_id], [num_items, 1])

    # 'positive' is the query item; the choice is arbitrary
    predictions = sim_model.predict({
        'positive_item_input': query_ids,
        'positive_tags': query_tags,
        'negative_item_input': all_item_ids,
        'negative_tags': item_features},
        batch_size=num_items)

    # Get the indices of the highest-valued outputs
    top_hit_ids = heapq.nlargest(
        limit, range(num_items), predictions.take)
    
    # Now display them
    for hit_id in top_hit_ids:
        print('%0.2f\t%s' % (predictions.take(hit_id), names[hit_id]))

In [8]:
query_for_item(183)

Showing similar items to Alien (1979):

1.00	Alien (1979)
0.89	Aliens (1986)
0.87	Terminator, The (1984)
0.83	Terminator 2: Judgment Day (1991)
0.81	Jurassic Park (1993)


Notice that these scores are bounded: `sim_model` uses the (l2-normalized) [cosine similarity](https://en.wikipedia.org/wiki/Cosine_similarity) instead of the (unnormalized) dot product, so they can never exceed 1 (in general, cosine similarity ranges from -1 to 1). This is useful for similarity scoring as it ensures that the most similar item to any query item is itself, with a score of 1.0! In production, you'd want to filter this out, but when testing, it's a useful sanity check. If something else appears in the top slot, you know there's a bug.

Likewise, you'd also want to filter out the dummy placeholder movie 0, which is at some random location in the embedding space. (This is also true for the personalized recommendations above.)
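
Here's one way that filtering might look, sketched in plain NumPy on a toy matrix of movie representations rather than through `sim_model`. The helper name `top_similar`, its `exclude` parameter and the toy vectors are all hypothetical; the only assumptions carried over from the notebook are that the score is a cosine similarity and that ID 0 is the placeholder movie.

```python
import numpy as np

def top_similar(item_vecs, query_id, limit=5, exclude=(0,)):
    """Rank items by cosine similarity to the query, skipping unwanted IDs."""
    norms = np.linalg.norm(item_vecs, axis=1, keepdims=True)
    unit = item_vecs / np.maximum(norms, 1e-12)  # l2-normalize each row
    sims = unit @ unit[query_id]                 # cosine similarity, in [-1, 1]

    # Mask out the query itself and any excluded IDs before ranking
    masked = sims.copy()
    masked[[query_id, *exclude]] = -np.inf
    top = np.argsort(-masked)[:limit]
    return top, sims[top], sims[query_id]

item_vecs = np.array([[5.0, 5.0],    # ID 0: placeholder at an arbitrary location
                      [1.0, 0.0],    # ID 1: the query
                      [2.0, 0.1],    # ID 2: nearly the same direction as the query
                      [0.0, 3.0],    # ID 3: orthogonal to the query
                      [-1.0, 0.0]])  # ID 4: opposite direction

top, top_scores, self_sim = top_similar(item_vecs, query_id=1, limit=2)
print(top, top_scores, self_sim)
```

Returning the query's self-similarity separately keeps the sanity check (it should be 1.0) without letting the query take up a slot in the results.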