In [None]:
# Copyright 2022 NVIDIA Corporation. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================

<img src="http://developer.download.nvidia.com/compute/machine-learning/frameworks/nvidia_logo.png" style="width: 90px; float: right;">

# Getting Started with Merlin dataloader and TensorFlow

This notebook was created using the latest stable [merlin-tensorflow](https://catalog.ngc.nvidia.com/orgs/nvidia/teams/merlin/containers/merlin-tensorflow) container.

## Overview

[Merlin dataloader](https://github.com/NVIDIA-Merlin/dataloader) is a library for constructing highly optimized dataloaders to accelerate training pipelines in TensorFlow (Keras) and PyTorch. In this example, we build a simple pipeline that trains a `MatrixFactorization` model in TensorFlow with Merlin dataloader, using the MovieLens dataset.

The core features of Merlin dataloader:

- Accelerates pipelines by up to 10x compared to other dataloaders
- Handles larger-than-memory datasets by streaming data from disk
- Supports common data formats: CSV, Parquet, Avro
- Supports distributed training
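
To make the streaming idea concrete, here is a minimal sketch in plain Python that reads a CSV in fixed-size batches without ever holding the whole file in memory. It only illustrates the concept and is not how Merlin dataloader is implemented; `iter_batches` is a hypothetical helper name, and an in-memory string stands in for a large file on disk.

```python
import csv
import io
from itertools import islice


def iter_batches(file_obj, batch_size):
    """Yield lists of at most `batch_size` rows (as dicts) from a CSV file object.

    Hypothetical helper for illustration only; not part of Merlin dataloader.
    """
    reader = csv.DictReader(file_obj)
    while True:
        # islice pulls only the next `batch_size` rows from the reader.
        batch = list(islice(reader, batch_size))
        if not batch:
            return
        yield batch


# An in-memory stand-in for a large CSV file on disk.
csv_text = (
    "userId,movieId,rating\n"
    "1,296,5.0\n"
    "1,306,3.5\n"
    "1,307,5.0\n"
    "1,665,5.0\n"
    "1,899,3.5\n"
)
batch_sizes = [len(b) for b in iter_batches(io.StringIO(csv_text), batch_size=2)]
print(batch_sizes)
```

Only one batch of rows is materialized at a time, which is what allows a dataset larger than memory to be consumed.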

### Learning objectives

- Use Merlin dataloader to train a TensorFlow Keras model

# Downloading and preparing the dataset

We will base our example on the [MovieLens25M](https://grouplens.org/datasets/movielens/25m/) dataset.

In [7]:
from merlin.core.utils import download_file
from merlin.core.dispatch import get_lib

from merlin.io import Dataset
from merlin.loader.tensorflow import Loader

import tensorflow as tf

2022-11-18 05:39:17.859371: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:991] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-11-18 05:39:17.859783: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:991] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-11-18 05:39:17.859948: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:991] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero


In [8]:
DATA_PATH = '/workspace'

In [9]:
download_file("http://files.grouplens.org/datasets/movielens/ml-25m.zip", DATA_PATH + "/ml-25m.zip")

downloading ml-25m.zip: 262MB [01:19, 3.29MB/s]
unzipping files: 100%|██████████| 8/8 [00:03<00:00,  2.21files/s]


# Training a TensorFlow Keras Model with Merlin dataloader

In [10]:
ratings = get_lib().read_csv(DATA_PATH + '/ml-25m/ratings.csv')
ratings.head()

| | userId | movieId | rating | timestamp |
|---|---|---|---|---|
| 0 | 1 | 296 | 5.0 | 1147880044 |
| 1 | 1 | 306 | 3.5 | 1147868817 |
| 2 | 1 | 307 | 5.0 | 1147868828 |
| 3 | 1 | 665 | 5.0 | 1147878820 |
| 4 | 1 | 899 | 3.5 | 1147868510 |


The `ratings.csv` file stores the ratings that users have given to movies. Let's load the data directly from disk into a Merlin `Dataset` and train a simple `MatrixFactorization` model that we will construct in TensorFlow.

In [12]:
dataset = Dataset(DATA_PATH + '/ml-25m/ratings.csv')

Let us now instantiate the `dataloader`.

In [13]:
loader = Loader(dataset, batch_size=65536)
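
As a rough sanity check on the batch size, the number of batches per epoch is the row count divided by the batch size, rounded up. The sketch below assumes roughly 25 million rows (a round figure taken from the dataset's name, not measured here) and assumes the final partial batch is kept; `batches_per_epoch` is a hypothetical helper.

```python
import math


def batches_per_epoch(n_rows, batch_size):
    """Number of batches needed to cover n_rows, keeping the last partial batch."""
    return math.ceil(n_rows / batch_size)


approx_rows = 25_000_000  # assumption: round figure from the dataset name
n_batches = batches_per_epoch(approx_rows, 65536)
print(n_batches)
```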

As is, the `loader` outputs each batch as a tuple of two elements: a dictionary of tensors keyed by column name, and `None`, because no label has been specified yet.

In [14]:
batch = loader.peek()
batch

2022-11-18 05:46:39.147325: I tensorflow/core/platform/cpu_feature_guard.cc:194] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  SSE3 SSE4.1 SSE4.2 AVX
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2022-11-18 05:46:39.148655: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:991] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-11-18 05:46:39.148918: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:991] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-11-18 05:46:39.149054: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:991] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning

({'userId': <tf.Tensor: shape=(65536, 1), dtype=int64, numpy=
  array([[ 10196],
         [ 13689],
         [ 27496],
         ...,
         [ 57498],
         [ 44895],
         [105338]])>,
  'movieId': <tf.Tensor: shape=(65536, 1), dtype=int64, numpy=
  array([[ 3055],
         [ 2722],
         [ 4803],
         ...,
         [74458],
         [ 2796],
         [ 2406]])>,
  'timestamp': <tf.Tensor: shape=(65536, 1), dtype=int64, numpy=
  array([[1035948134],
         [ 992556783],
         [1013740337],
         ...,
         [1338838680],
         [ 947102082],
         [1236404231]])>,
  'rating': <tf.Tensor: shape=(65536, 1), dtype=float64, numpy=
  array([[3. ],
         [1. ],
         [3. ],
         ...,
         [4.5],
         [4. ],
         [3. ]])>},
 None)

The Keras `.fit` method expects to receive the data as a tuple `(x, y)`, with `x` being the input features and `y` the label. The dataloader does not yet know which column is the label, so we supply a custom function, `process_batch`, that converts each batch into such a tuple.

In [15]:
label_column = 'rating'


def process_batch(data, _):
    x = {col: data[col] for col in data.keys() if col != label_column}
    y = data[label_column]
    return (x, y)


loader._map_fns = [process_batch]

We now have the data in the structure that TensorFlow expects.

In [16]:
batch = next(loader)
loader.stop()
batch

({'userId': <tf.Tensor: shape=(65536, 1), dtype=int64, numpy=
  array([[128502],
         [ 29762],
         [123712],
         ...,
         [160242],
         [122260],
         [125514]])>,
  'movieId': <tf.Tensor: shape=(65536, 1), dtype=int64, numpy=
  array([[ 93363],
         [  1385],
         [  1215],
         ...,
         [  1221],
         [  6281],
         [140110]])>,
  'timestamp': <tf.Tensor: shape=(65536, 1), dtype=int64, numpy=
  array([[1470169763],
         [1290341594],
         [1009922461],
         ...,
         [ 855153028],
         [1064351710],
         [1476478555]])>},
 <tf.Tensor: shape=(65536, 1), dtype=float64, numpy=
 array([[4. ],
        [2. ],
        [5. ],
        ...,
        [5. ],
        [4. ],
        [0.5]])>)
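
The same split can be checked in isolation on plain Python data. In this minimal sketch, lists stand in for tensors, and the values are the first two rows of `ratings.csv` shown earlier; the function body matches the notebook cell above.

```python
label_column = "rating"


def process_batch(data, _):
    # Split the batch into features (every column except the label) and the label.
    x = {col: data[col] for col in data.keys() if col != label_column}
    y = data[label_column]
    return (x, y)


# Plain lists stand in for tensors in this toy batch.
toy_batch = {
    "userId": [1, 1],
    "movieId": [296, 306],
    "timestamp": [1147880044, 1147868817],
    "rating": [5.0, 3.5],
}
x, y = process_batch(toy_batch, None)
print(sorted(x))
print(y)
```

The label column is removed from the feature dictionary and returned separately, which is the `(x, y)` layout that `.fit` consumes.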

Let us now construct a simple `MatrixFactorization` model.

In [17]:
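class MatrixFactorization(tf.keras.Model):
    def __init__(self, n_factors):
        super().__init__()
        # Raw IDs are used directly as embedding indices, so each table needs max ID + 1 rows.
        self.user_embeddings = tf.keras.layers.Embedding(ratings['userId'].max() + 1, n_factors)
        self.movie_embeddings = tf.keras.layers.Embedding(ratings['movieId'].max() + 1, n_factors)

    def call(self, batch, training=False):
        # Inputs have shape (batch, 1), so the embeddings have shape (batch, 1, n_factors).
        user_embs = self.user_embeddings(batch['userId'])
        movie_embs = self.movie_embeddings(batch['movieId'])

        # Squeeze only axis 1 so that a batch of size 1 keeps its batch dimension.
        tensor = tf.squeeze(user_embs, axis=1) * tf.squeeze(movie_embs, axis=1)
        # The dot product of the user and movie vectors is the predicted rating.
        return tf.reduce_sum(tensor, 1)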
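
For each (user, movie) pair, the `call` method multiplies the two embedding vectors element-wise and sums the result, which is a dot product. Below is a minimal sketch of that computation in plain Python. The 2-dimensional embedding values are hand-picked for illustration only, and `predict_rating` is a hypothetical helper, not part of the model above.

```python
def predict_rating(user_vec, movie_vec):
    """Dot product of a user embedding and a movie embedding."""
    return sum(u * m for u, m in zip(user_vec, movie_vec))


# Illustrative embeddings; a trained model would learn these values.
user_embeddings = {1: [0.5, 1.0], 2: [2.0, 0.0]}
movie_embeddings = {296: [2.0, 1.0], 306: [1.5, 3.0]}

# A batch of (userId, movieId) pairs.
batch = [(1, 296), (2, 306)]
preds = [predict_rating(user_embeddings[u], movie_embeddings[m]) for u, m in batch]
print(preds)
```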

In [18]:
model = MatrixFactorization(64)
model.compile(optimizer=tf.keras.optimizers.Adam(1e-2), loss=tf.keras.losses.MeanSquaredError())

Let us now train for a single epoch.

In [19]:
model.fit(loader, epochs=1)

2022-11-18 05:46:43.718243: W tensorflow/core/common_runtime/forward_type_inference.cc:231] Type inference failed. This indicates an invalid graph that escaped type checking. Error message: INVALID_ARGUMENT: expected compatible input types, but input 1:
type_id: TFT_OPTIONAL
args {
  type_id: TFT_PRODUCT
  args {
    type_id: TFT_TENSOR
    args {
      type_id: TFT_BOOL
    }
  }
}
 is neither a subtype nor a supertype of the combined inputs preceding it:
type_id: TFT_OPTIONAL
args {
  type_id: TFT_PRODUCT
  args {
    type_id: TFT_TENSOR
    args {
      type_id: TFT_LEGACY_VARIANT
    }
  }
}

	while inferring type of node 'mean_squared_error/cond/output/_11'




<keras.callbacks.History at 0x7f0ecc9d0e80>

In [20]:
model.evaluate(loader)



0.6691824197769165
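
Because the model was compiled with `MeanSquaredError` as its loss and no additional metrics, the value returned by `evaluate` is the mean squared error. Taking the square root puts the error on the same scale as the ratings. Note that this evaluation uses the same data the model was trained on, so it should not be read as an estimate of performance on unseen ratings.

```python
import math

mse = 0.6691824197769165  # value printed by model.evaluate(loader) above
rmse = math.sqrt(mse)
print(round(rmse, 3))
```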

## Conclusion

We demonstrated how to train a TensorFlow Keras model with Merlin dataloader. Merlin dataloader can accelerate existing TensorFlow pipelines with minimal code changes. 

# Next Steps

Merlin dataloader is part of NVIDIA Merlin, an open-source framework for recommender systems. In this example, we looked at only one specific use case: accelerating existing training pipelines. We provide more libraries to make recommender system pipelines easier:

* [NVTabular](https://github.com/NVIDIA-Merlin/NVTabular) is a library to accelerate and scale feature engineering
* [Merlin Models](https://github.com/NVIDIA-Merlin/models) is a library with high-quality implementations of popular recommender systems architectures

The libraries are designed to work closely together. We recommend checking out our examples:

* [Getting Started with NVTabular: Process Tabular Data On GPU](https://github.com/NVIDIA-Merlin/NVTabular/blob/main/examples/01-Getting-started.ipynb)
* [Getting Started with Merlin Models: Develop a Model for MovieLens](https://github.com/NVIDIA-Merlin/models/blob/main/examples/01-Getting-started.ipynb)

In the example [From ETL to Training RecSys models - NVTabular and Merlin Models integrated example](https://github.com/NVIDIA-Merlin/models/blob/main/examples/02-Merlin-Models-and-NVTabular-integration.ipynb), we explain how the libraries work together.