Copyright 2021 The TensorFlow Authors.

In [1]:
#@title Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# Recommending movies: retrieval using a sequential model

<table class="tfo-notebook-buttons" align="left">
  <td>
    <a target="_blank" href="https://www.tensorflow.org/recommenders/examples/sequential_retrieval"><img src="https://www.tensorflow.org/images/tf_logo_32px.png" />View on TensorFlow.org</a>
  </td>
  <td>
    <a target="_blank" href="https://colab.research.google.com/github/tensorflow/recommenders/blob/main/docs/examples/sequential_retrieval.ipynb"><img src="https://www.tensorflow.org/images/colab_logo_32px.png" />Run in Google Colab</a>
  </td>
  <td>
    <a target="_blank" href="https://github.com/tensorflow/recommenders/blob/main/docs/examples/sequential_retrieval.ipynb"><img src="https://www.tensorflow.org/images/GitHub-Mark-32px.png" />View source on GitHub</a>
  </td>
  <td>
    <a href="https://storage.googleapis.com/tensorflow_docs/recommenders/docs/examples/sequential_retrieval.ipynb"><img src="https://www.tensorflow.org/images/download_logo_32px.png" />Download notebook</a>
  </td>
</table>

In this tutorial, we are going to build a sequential retrieval model. Sequential recommendation is a popular approach that looks at the sequence of items a user has previously interacted with and then predicts the next item. Because the order of the items within each sequence matters, we use a recurrent neural network to model the sequential relationship. For more details, please refer to the [GRU4Rec paper](https://arxiv.org/abs/1511.06939).



## Imports

First let's get our dependencies and imports out of the way.

In [1]:
%pip install -q tensorflow-recommenders
%pip install -q --upgrade tensorflow-datasets
%pip install wget

Note: you may need to restart the kernel to use updated packages.
Note: you may need to restart the kernel to use updated packages.
Note: you may need to restart the kernel to use updated packages.


In [2]:
%pip install -q scann

Note: you may need to restart the kernel to use updated packages.


ERROR: Could not find a version that satisfies the requirement scann (from versions: none)
ERROR: No matching distribution found for scann


In [3]:
import os
import pprint
import tempfile

from typing import Dict, Text

import numpy as np
import tensorflow as tf
import tensorflow_datasets as tfds
import tensorflow_recommenders as tfrs



## Preparing the dataset

Next, we need to prepare our dataset. We are going to leverage the [data generation utility](https://github.com/tensorflow/examples/blob/master/lite/examples/recommendation/ml/data/example_generation_movielens.py) in this [TensorFlow Lite On-device Recommendation reference app](https://www.tensorflow.org/lite/examples/recommendation/overview).

MovieLens 1M data contains ratings.dat (*columns: UserID, MovieID, Rating, Timestamp*) and movies.dat (*columns: MovieID, Title, Genres*). The example generation script downloads the 1M dataset, reads both files, keeps only sufficiently high ratings (the threshold is set with `--min_rating=2` below), forms a timeline of movie interactions for each user, and samples activities as labels, using up to 10 previous user activities as the context for prediction.
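To make the timeline-to-example step concrete, here is a minimal pure-Python sketch of the idea. `build_examples` is a hypothetical helper, not the real script: it assumes a `>= min_rating` filter, uses every position in a timeline as a label instead of sampling, and pads short contexts at the end with `0`. All three are simplifying assumptions rather than a description of `example_generation_movielens.py`.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# One row of ratings.dat: (user_id, movie_id, rating, timestamp).
Rating = Tuple[int, int, float, int]


def build_examples(
    ratings: List[Rating],
    min_rating: float = 2.0,
    min_timeline_length: int = 3,
    max_context_length: int = 10,
    pad_id: int = 0,
) -> List[Dict[str, List[int]]]:
    """Hypothetical, simplified version of the example generation step."""
    # Drop low ratings (this sketch assumes a ">= min_rating" threshold).
    kept = [row for row in ratings if row[2] >= min_rating]

    # Build one timeline per user, ordered by timestamp.
    timelines: Dict[int, List[Tuple[int, int]]] = defaultdict(list)
    for user_id, movie_id, _, timestamp in kept:
        timelines[user_id].append((timestamp, movie_id))

    examples = []
    for user_id in sorted(timelines):
        movies = [movie_id for _, movie_id in sorted(timelines[user_id])]
        if len(movies) < min_timeline_length:
            continue  # timeline too short to produce useful examples
        # Each movie after the first is a label; up to max_context_length
        # movies watched before it form the context, padded to a fixed length.
        for i in range(1, len(movies)):
            context = movies[max(0, i - max_context_length):i]
            context += [pad_id] * (max_context_length - len(context))
            examples.append({"context_movie_id": context, "label_movie_id": [movies[i]]})
    return examples


rows = [
    (1, 10, 4.0, 100), (1, 20, 1.0, 200), (1, 30, 5.0, 300), (1, 40, 3.0, 400),
    (2, 50, 5.0, 100),  # user 2 has only one kept rating, so no examples
]
for example in build_examples(rows, max_context_length=3):
    print(example)
# {'context_movie_id': [10, 0, 0], 'label_movie_id': [30]}
# {'context_movie_id': [10, 30, 0], 'label_movie_id': [40]}
```

The fixed context length is what lets the parsing code later on declare the context features with shape `[10]`.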

In [4]:
import wget
f = wget.download("https://raw.githubusercontent.com/tensorflow/examples/master/lite/examples/recommendation/ml/data/example_generation_movielens.py")

# %wget -nc https://raw.githubusercontent.com/tensorflow/examples/master/lite/examples/recommendation/ml/data/example_generation_movielens.py
!python -m example_generation_movielens  --data_dir="data/raw"  --output_dir="data/examples"  --min_timeline_length=3  --max_context_length=10  --max_context_movie_genre_length=10  --min_rating=2  --train_data_fraction=0.9  --build_vocabs=False

Downloading data from https://files.grouplens.org/datasets/movielens/ml-1m.zip


I0603 09:16:33.274965 23248 example_generation_movielens.py:460] Downloading and extracting data.
I0603 09

Here is a sample of the generated dataset.

```
0 : {
  features: {
    feature: {
      key  : "context_movie_id"
      value: { int64_list: { value: [ 1124, 2240, 3251, ..., 1268 ] } }
    }
    feature: {
      key  : "context_movie_rating"
      value: { float_list: {value: [ 3.0, 3.0, 4.0, ..., 3.0 ] } }
    }
    feature: {
      key  : "context_movie_year"
      value: { int64_list: { value: [ 1981, 1980, 1985, ..., 1990 ] } }
    }
    feature: {
      key  : "context_movie_genre"
      value: { bytes_list: { value: [ "Drama", "Drama", "Mystery", ..., "UNK" ] } }
    }
    feature: {
      key  : "label_movie_id"
      value: { int64_list: { value: [ 3252 ] }  }
    }
  }
}
```
You can see that it includes a sequence of context movie IDs and a label movie ID (the next movie), plus context features such as movie year, rating and genre.

In our case we will only be using the sequence of context movie IDs and the label movie ID. You can refer to the [Leveraging context features tutorial](https://www.tensorflow.org/recommenders/examples/context_features) to learn more about adding context features.

In [5]:
train_filename = "data/examples/train_movielens_1m.tfrecord"
train = tf.data.TFRecordDataset(train_filename)

test_filename = "data/examples/test_movielens_1m.tfrecord"
test = tf.data.TFRecordDataset(test_filename)

feature_description = {
    'context_movie_id': tf.io.FixedLenFeature([10], tf.int64, default_value=np.repeat(0, 10)),
    'context_movie_rating': tf.io.FixedLenFeature([10], tf.float32, default_value=np.repeat(0, 10)),
    'context_movie_year': tf.io.FixedLenFeature([10], tf.int64, default_value=np.repeat(1980, 10)),
    'context_movie_genre': tf.io.FixedLenFeature([10], tf.string, default_value=np.repeat("Drama", 10)),
    'label_movie_id': tf.io.FixedLenFeature([1], tf.int64, default_value=0),
}

def _parse_function(example_proto):
  return tf.io.parse_single_example(example_proto, feature_description)

train_ds = train.map(_parse_function).map(lambda x: {
    "context_movie_id": tf.strings.as_string(x["context_movie_id"]),
    "label_movie_id": tf.strings.as_string(x["label_movie_id"])
})

test_ds = test.map(_parse_function).map(lambda x: {
    "context_movie_id": tf.strings.as_string(x["context_movie_id"]),
    "label_movie_id": tf.strings.as_string(x["label_movie_id"])
})

for x in train_ds.take(1).as_numpy_iterator():
  pprint.pprint(x)

{'context_movie_id': array([b'570', b'2395', b'34', b'1449', b'232', b'2321', b'21', b'223',
       b'1885', b'2424'], dtype=object),
 'label_movie_id': array([b'1845'], dtype=object)}


Now our train/test datasets include only a sequence of historical movie IDs and the label (next) movie ID. Note that we use `[10]` as the shape of the context features during `tf.Example` parsing because we set 10 as the length of the context features in the example generation step (`--max_context_length=10`).

We need one more thing before we can start building the model: the vocabulary for our movie IDs.
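The vocabulary is consumed by the `tf.keras.layers.StringLookup` layers in both towers. As we understand Keras's defaults, a `StringLookup` with `mask_token=None` reserves one out-of-vocabulary (OOV) index, 0, and numbers the known IDs from 1. That is why the embedding tables later have `len(unique_movie_ids) + 1` rows. The toy `make_lookup` below is a hypothetical stand-in that only illustrates this mapping; it is not the Keras implementation.

```python
from typing import Callable, List, Sequence


def make_lookup(vocabulary: Sequence[str]) -> Callable[[Sequence[str]], List[int]]:
    """Toy stand-in for StringLookup(vocabulary=..., mask_token=None)."""
    # Index 0 is the single out-of-vocabulary (OOV) bucket, so known IDs
    # are numbered from 1 upward in vocabulary order.
    table = {token: index + 1 for index, token in enumerate(vocabulary)}

    def lookup(tokens: Sequence[str]) -> List[int]:
        return [table.get(token, 0) for token in tokens]

    return lookup


vocab = ["1", "10", "100", "1000"]  # hypothetical slice of unique_movie_ids
lookup = make_lookup(vocab)
print(lookup(["10", "1", "0", ""]))  # [2, 1, 0, 0]

# The embedding table needs one row per vocabulary entry plus the OOV row.
num_embedding_rows = len(vocab) + 1
print(num_embedding_rows)  # 5
```

Any token outside the vocabulary, such as an empty-string placeholder, shares the single OOV index and therefore the same embedding row.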

In [6]:
movies = tfds.load("movielens/1m-movies", split='train')
movies = movies.map(lambda x: x["movie_id"])
movie_ids = movies.batch(1_000)
unique_movie_ids = np.unique(np.concatenate(list(movie_ids)))

0.1.0
Using ~\tensorflow_datasets\movielens\1m-movies\0.1.1 instead.


Downloading and preparing dataset Unknown size (download: Unknown size, generated: Unknown size, total: Unknown size) to ~\tensorflow_datasets\movielens\1m-movies\0.1.1...



Dataset movielens downloaded and prepared to ~\tensorflow_datasets\movielens\1m-movies\0.1.1. Subsequent calls will reuse this data.


## Implementing a sequential model

In our [basic retrieval tutorial](https://www.tensorflow.org/recommenders/examples/basic_retrieval), we use one query tower for the user and a candidate tower for the candidate movie. However, the two-tower architecture is general and not limited to user-item pairs. You can also use it for item-to-item recommendation, as we note in the [basic retrieval tutorial](https://www.tensorflow.org/recommenders/examples/basic_retrieval#item-to-item_recommendation).

Here we still use the two-tower architecture. Specifically, the query tower uses a [Gated Recurrent Unit (GRU) layer](https://www.tensorflow.org/api_docs/python/tf/keras/layers/GRU) to encode the sequence of historical movies, while the candidate tower for the candidate movie stays the same as before.
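To see why a GRU (unlike, say, averaging embeddings) is sensitive to the order of the watch history, here is a pure-Python sketch of a single-layer GRU that returns its final hidden state, which is what `tf.keras.layers.GRU(units)` returns by default. `TinyGRU` is hypothetical and deliberately simplified: it has no biases and applies the reset gate before the recurrent matrix multiplication. Keras's GRU adds biases and, by default (`reset_after=True`), applies the reset gate after that multiplication, so its numbers would differ.

```python
import math
import random
from typing import List, Sequence

Vector = List[float]
Matrix = List[List[float]]


def matvec(matrix: Matrix, vector: Sequence[float]) -> Vector:
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


class TinyGRU:
    """Bias-free GRU that returns the final hidden state, like GRU(units)."""

    def __init__(self, input_dim: int, hidden_dim: int, seed: int = 0):
        rng = random.Random(seed)

        def init(rows: int, cols: int) -> Matrix:
            return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

        self.hidden_dim = hidden_dim
        # Input (w_*) and recurrent (u_*) weights for update, reset and candidate.
        self.w_z, self.u_z = init(hidden_dim, input_dim), init(hidden_dim, hidden_dim)
        self.w_r, self.u_r = init(hidden_dim, input_dim), init(hidden_dim, hidden_dim)
        self.w_h, self.u_h = init(hidden_dim, input_dim), init(hidden_dim, hidden_dim)

    def __call__(self, sequence: Sequence[Vector]) -> Vector:
        h = [0.0] * self.hidden_dim  # initial state is all zeros
        for x in sequence:
            z = [sigmoid(a + b) for a, b in zip(matvec(self.w_z, x), matvec(self.u_z, h))]
            r = [sigmoid(a + b) for a, b in zip(matvec(self.w_r, x), matvec(self.u_r, h))]
            reset_h = [ri * hi for ri, hi in zip(r, h)]
            candidate = [
                math.tanh(a + b)
                for a, b in zip(matvec(self.w_h, x), matvec(self.u_h, reset_h))
            ]
            # The update gate blends the previous state with the candidate state.
            h = [zi * hi + (1.0 - zi) * ci for zi, hi, ci in zip(z, h, candidate)]
        return h


# Hypothetical 2-d embeddings for three movies.
emb = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [0.5, -0.5]}
gru = TinyGRU(input_dim=2, hidden_dim=3)
forward = gru([emb[m] for m in ["a", "b", "c"]])
backward = gru([emb[m] for m in ["c", "b", "a"]])
print(forward != backward)  # order-sensitive: the two final states differ
```

In the real query tower the inputs are the 32-dimensional movie embeddings and the final GRU state is the 32-dimensional query embedding.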

In [7]:
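embedding_dimension = 32

# Query tower: map each movie ID in the history to an index, embed it,
# and let a GRU summarize the ordered sequence into a single vector.
query_model = tf.keras.Sequential([
    tf.keras.layers.StringLookup(
        vocabulary=unique_movie_ids, mask_token=None),
    # One extra row for the out-of-vocabulary index reserved by StringLookup.
    tf.keras.layers.Embedding(len(unique_movie_ids) + 1, embedding_dimension),
    tf.keras.layers.GRU(embedding_dimension),
])

# Candidate tower: embed the label movie ID.
candidate_model = tf.keras.Sequential([
    tf.keras.layers.StringLookup(
        vocabulary=unique_movie_ids, mask_token=None),
    tf.keras.layers.Embedding(len(unique_movie_ids) + 1, embedding_dimension)
])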

The metrics, task and full model are defined similarly to the basic retrieval model.
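As we understand it, `tfrs.tasks.Retrieval` by default treats the other candidates in each batch as negatives and computes a softmax cross-entropy that is summed (not averaged) over the batch. If so, that would be consistent with the loss values in the thousands reported during evaluation. The sketch below reproduces that idea in pure Python. `in_batch_softmax_loss` is a hypothetical helper, not the TFRS code.

```python
import math
from typing import List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def in_batch_softmax_loss(queries: List[Vector], candidates: List[Vector]) -> float:
    """Summed softmax cross-entropy where queries[i] should match candidates[i]
    and every other candidate in the batch serves as a negative."""
    total = 0.0
    for i, query in enumerate(queries):
        logits = [dot(query, candidate) for candidate in candidates]
        top = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_sum_exp = top + math.log(sum(math.exp(logit - top) for logit in logits))
        total += log_sum_exp - logits[i]  # equals -log(softmax(logits)[i])
    return total


# With uninformative (all-zero) embeddings every candidate is equally likely,
# so each of the 4 rows contributes log(4).
zeros = [[0.0, 0.0]] * 4
print(in_batch_softmax_loss(zeros, zeros))  # about 5.545, i.e. 4 * log(4)
```

Training pushes each query embedding towards its own next movie and away from the other next movies in the same batch.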

In [8]:
metrics = tfrs.metrics.FactorizedTopK(
  candidates=movies.batch(128).map(candidate_model)
)

task = tfrs.tasks.Retrieval(
  metrics=metrics
)

class Model(tfrs.Model):

    def __init__(self, query_model, candidate_model):
        super().__init__()
        self.query_model = query_model
        self.candidate_model = candidate_model

        self._task = task

    def compute_loss(self, features, training=False):
        watch_history = features["context_movie_id"]
        watch_next_label = features["label_movie_id"]

        query_embedding = self.query_model(watch_history)       
        candidate_embedding = self.candidate_model(watch_next_label)
        
        return self._task(query_embedding, candidate_embedding, compute_metrics=not training)

## Fitting and evaluating

We can now compile, train and evaluate our sequential retrieval model.

In [9]:
model = Model(query_model, candidate_model)
model.compile(optimizer=tf.keras.optimizers.Adagrad(learning_rate=0.1))

In [10]:
cached_train = train_ds.shuffle(10_000).batch(12800).cache()
cached_test = test_ds.batch(2560).cache()

In [11]:
for i in cached_test.take(1):
    print(i)

{'context_movie_id': <tf.Tensor: shape=(2560, 10), dtype=string, numpy=
array([[b'736', b'1676', b'2617', ..., b'3638', b'2002', b'3623'],
       [b'3510', b'1965', b'266', ..., b'3', b'3409', b'3450'],
       [b'2393', b'2949', b'780', ..., b'1129', b'10', b'3633'],
       ...,
       [b'1394', b'898', b'1294', ..., b'260', b'2174', b'2872'],
       [b'2912', b'2959', b'2976', ..., b'3543', b'3578', b'3550'],
       [b'3461', b'50', b'1247', ..., b'1240', b'2366', b'2820']],
      dtype=object)>, 'label_movie_id': <tf.Tensor: shape=(2560, 1), dtype=string, numpy=
array([[b'2094'],
       [b'2792'],
       [b'1527'],
       ...,
       [b'1097'],
       [b'3576'],
       [b'26']], dtype=object)>}


In [12]:
model.fit(cached_train, epochs=3)

Epoch 1/3
Epoch 2/3
Epoch 3/3


<keras.callbacks.History at 0x1e773a4b430>

In [13]:
model.evaluate(cached_test, return_dict=True)



{'factorized_top_k/top_1_categorical_accuracy': 0.014008678495883942,
 'factorized_top_k/top_5_categorical_accuracy': 0.0769517794251442,
 'factorized_top_k/top_10_categorical_accuracy': 0.13377541303634644,
 'factorized_top_k/top_50_categorical_accuracy': 0.3664751350879669,
 'factorized_top_k/top_100_categorical_accuracy': 0.5024787187576294,
 'loss': 9356.974609375,
 'regularization_loss': 0,
 'total_loss': 9356.974609375}
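The `factorized_top_k/top_K_categorical_accuracy` values can be read as: for what fraction of test examples did the true next movie rank among the K highest-scoring movies, when scored against all movies? For example, the top-100 accuracy of about 0.50 above means this happened for roughly half of the test examples. The sketch below shows what the metric measures. `top_k_accuracy` is hypothetical; the real `FactorizedTopK` works batch by batch and may handle ties differently.

```python
from typing import Dict, List, Sequence

Vector = List[float]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def top_k_accuracy(
    queries: Sequence[Vector],
    labels: Sequence[str],
    candidates: Dict[str, Vector],
    k: int,
) -> float:
    """Fraction of queries whose true label ranks in the top k when scored
    against every candidate in the corpus (ties counted in the label's favor)."""
    hits = 0
    for query, label in zip(queries, labels):
        label_score = dot(query, candidates[label])
        # Rank = number of candidates that score strictly higher than the label.
        rank = sum(1 for emb in candidates.values() if dot(query, emb) > label_score)
        hits += rank < k
    return hits / len(queries)


corpus = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [0.7, 0.7]}
queries = [[1.0, 0.0], [0.0, 1.0]]
labels = ["c", "b"]  # "c" ranks 2nd for the first query, "b" 1st for the second
print(top_k_accuracy(queries, labels, corpus, k=1))  # 0.5
print(top_k_accuracy(queries, labels, corpus, k=2))  # 1.0
```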

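This concludes the sequential retrieval tutorial. The remaining cells experiment with generating recommendations from the trained model and converting it to TensorFlow Lite.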

In [14]:
# Create a model that takes in raw query features (a watch history of
# 10 movie IDs) and recommends movies out of the entire movies dataset.
index = tfrs.layers.factorized_top_k.BruteForce(model.query_model)
# Index (movie ID, candidate embedding) pairs for every movie.
index.index_from_dataset(
  tf.data.Dataset.zip((movies.batch(100), movies.batch(100).map(model.candidate_model)))
)

# Get recommendations for a history of movies "1" and "2";
# empty strings pad the history to 10 entries.
_, recommended_ids = index(tf.constant(np.array([["1","2","","","","","","","",""]])))
print(f"Recommendations for history ['1', '2']: {recommended_ids[0, :3]}")

Recommendations for history ['1', '2']: [b'1538' b'2623' b'1322']
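`BruteForce` scores the query embedding against every indexed candidate embedding and keeps the best ones, returning scores and identifiers together, which is why the cells unpack two values and discard the first. Here is a minimal pure-Python sketch of that brute-force search. `brute_force_top_k` and the toy embeddings are hypothetical; this is not the TFRS implementation.

```python
import heapq
from typing import Dict, List, Sequence, Tuple

Vector = List[float]


def brute_force_top_k(
    query: Sequence[float], candidates: Dict[str, Vector], k: int = 3
) -> Tuple[List[float], List[str]]:
    """Score every candidate by dot product and return the k best as
    (scores, identifiers), highest score first."""
    scored = (
        (sum(q * c for q, c in zip(query, emb)), movie_id)
        for movie_id, emb in candidates.items()
    )
    best = heapq.nlargest(k, scored)
    return [score for score, _ in best], [movie_id for _, movie_id in best]


# Hypothetical candidate embeddings keyed by movie ID.
candidates = {"a": [1.0, 0.0], "b": [0.8, 0.1], "c": [0.0, 1.0], "d": [-1.0, 0.0]}
scores, ids = brute_force_top_k([1.0, 0.0], candidates, k=3)
print(ids)  # ['a', 'b', 'c']
```

Brute force costs one dot product per movie for every query. That is cheap for the roughly 3,900 MovieLens 1M movies, but approximate-nearest-neighbour indexes become attractive for much larger catalogs.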


In [15]:
# Get recommendations for a history containing only movie "1".
_, recommended_ids = index(tf.constant(np.array([["1","","","","","","","","",""]])))
print(f"Recommendations for history ['1']: {recommended_ids[0, :3]}")

Recommendations for history ['1']: [b'1210' b'1538' b'1961']


In [16]:
# Get recommendations for a history of movies "1", "2" and "100".
_, recommended_ids = index(tf.constant(np.array([["1","2","100","","","","","","",""]])))
print(f"Recommendations for history ['1', '2', '100']: {recommended_ids[0, :3]}")

Recommendations for history ['1', '2', '100']: [b'1538' b'2623' b'1322']


In [17]:
["3","2","9","","","","","","",""]

['3', '2', '9', '', '', '', '', '', '', '']

In [22]:
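# This call fails: the GRU query tower expects a batch of movie-ID
# histories with shape (batch_size, 10), not a single scalar user ID.
_, titles = index(np.array("42"))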

ValueError: Exception encountered when calling layer "sequential" (type Sequential).

Input 0 of layer "gru" is incompatible with the layer: expected ndim=3, found ndim=1. Full shape received: (32,)

Call arguments received:
  • inputs=tf.Tensor(shape=(), dtype=string)
  • training=None
  • mask=None

In [20]:
# Convert the model.
converter = tf.lite.TFLiteConverter.from_keras_model(index)
tflite_model = converter.convert()

# Save the model.
with open('model.tflite', 'wb') as f:
  f.write(tflite_model)



INFO:tensorflow:Assets written to: C:\Users\MUHAMM~1\AppData\Local\Temp\tmprn14xe5w\assets




ConverterError: c:\Users\Muhammad Nur Ilmi\AppData\Local\Programs\Python\Python310\lib\site-packages\keras\engine\base_layer.py:1096:0: error: 'tf.TensorListReserve' op requires element_shape to be static during TF Lite transformation pass
<unknown>:0: note: loc(fused["StatefulPartitionedCall:", "StatefulPartitionedCall_1"]): called from
c:\Users\Muhammad Nur Ilmi\AppData\Local\Programs\Python\Python310\lib\site-packages\keras\engine\base_layer.py:1096:0: error: failed to legalize operation 'tf.TensorListReserve' that was explicitly marked illegal
<unknown>:0: note: loc(fused["StatefulPartitionedCall:", "StatefulPartitionedCall_1"]): called from
<unknown>:0: error: Lowering tensor list ops is failed. Please consider using Select TF ops and disabling `_experimental_lower_tensor_list_ops` flag in the TFLite converter object. For example, converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS, tf.lite.OpsSet.SELECT_TF_OPS]\n converter._experimental_lower_tensor_list_ops = False


In [23]:
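# Note: from_saved_model() expects the path to a SavedModel directory (for
# example one written by tf.saved_model.save), not a Keras model object,
# which is why this raises the TypeError below.
converter = tf.lite.TFLiteConverter.from_saved_model(model)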
converter.target_spec.supported_ops = [
  tf.lite.OpsSet.TFLITE_BUILTINS, # enable TensorFlow Lite ops.
  tf.lite.OpsSet.SELECT_TF_OPS # enable TensorFlow ops.
]
tflite_model = converter.convert()

TypeError: Expected binary or unicode string, got <__main__.Model object at 0x000001E7739A5AB0>