This tutorial demonstrates how to build and train retrieval models with the Merlin framework and TensorFlow. It builds, trains, and evaluates two retrieval models: Matrix Factorization and a Two-Tower architecture.
You can run this tutorial on Google Colab or locally with Docker.
If you don't have the necessary environment, you can run this notebook on Google Colab, which provides a T4 GPU for free.
- Linux
- Docker
- GPU (T4 or better; 16 GB of GPU memory recommended).
- Run the following Docker command, which pulls and runs the Merlin TensorFlow container (23.06 release).
Set the host paths to the folder where you cloned the evalRS-KDD-2023
repo from GitHub and to the folder where the dataset will be downloaded and preprocessed.
EVALRS_KDD_REPO=/PATH/TO/evalRS-KDD-2023
DATASET_WKSP_PATH=/PATH/TO/DATASET/WORKSPACE
docker run --runtime=nvidia --rm -it --ipc=host --cap-add SYS_NICE -v $EVALRS_KDD_REPO:/evalRS-KDD-2023 -v $DATASET_WKSP_PATH:/data -p 8888:8888 nvcr.io/nvidia/merlin/merlin-tensorflow:23.06 /bin/bash
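Before starting the container, it can help to confirm that both host folders exist, because Docker's -v option creates a missing host path as an empty directory instead of failing. The following is a minimal sketch; check_host_dir is a hypothetical helper name and is not part of the tutorial.

```shell
# Hypothetical helper: report whether a host directory exists before it is mounted.
check_host_dir() {
    if [ -d "$1" ]; then
        echo "ok: $1"
    else
        echo "missing: $1" >&2
        return 1
    fi
}

# Example usage with the variables set above (replace the placeholder paths first):
# check_host_dir "$EVALRS_KDD_REPO" && check_host_dir "$DATASET_WKSP_PATH"
```

If either check reports a missing folder, fix the path before running the docker command.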
- Inside the Docker container, pull the RecList library and install it. RecList will be used for evaluation.
mkdir -p /workspace/ && cd /workspace/
git clone https://github.com/Reclist/reclist/
cd reclist
# Remove the "pyarrow==12.0.1" pin from the requirements, as it conflicts with cudf, which uses "pyarrow-10.0.1"
grep -vwE "pyarrow" requirements.txt > requirements_tmp.txt && mv requirements_tmp.txt requirements.txt
pip install -e .
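To see what the pyarrow filter does without touching the real file, the same grep expression can be tried on a throwaway copy. This is only a sketch: the file contents below are hypothetical, and only the pyarrow pin comes from the step above.

```shell
# Work in a temporary directory so the real requirements.txt is untouched.
demo_dir=$(mktemp -d)

# Hypothetical requirements file; only the pyarrow pin is taken from the tutorial.
printf '%s\n' 'numpy' 'pyarrow==12.0.1' 'pandas' > "$demo_dir/requirements.txt"

# Keep every line that does not contain the whole word "pyarrow",
# then replace the original file only if grep succeeded.
grep -vwE "pyarrow" "$demo_dir/requirements.txt" > "$demo_dir/requirements_tmp.txt" \
    && mv "$demo_dir/requirements_tmp.txt" "$demo_dir/requirements.txt"

# Show the filtered file.
cat "$demo_dir/requirements.txt"
```

Chaining the two commands with && ensures the move runs only after grep has finished writing the temporary file.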
- Also inside the Docker container, update the Merlin Models library to the evalrs_2023 branch, which has some fixes necessary for pre-trained embeddings support.
cd /models
git config remote.origin.fetch "+refs/heads/*:refs/remotes/origin/*" && git fetch origin evalrs_2023 && git checkout evalrs_2023
pip install . --no-deps
- Start Jupyter notebook inside the container
cd /evalRS-KDD-2023/notebooks/merlin_tf_tutorial
jupyter notebook --no-browser --ip 0.0.0.0 --allow-root
- Load the Jupyter notebook UI in a web browser. Look for the URL containing the token in the console output. It looks something like
http://127.0.0.1:8888/?token=5ac3f9ca...
- Run the Merlin tutorial notebook.
- This notebook saves Parquet files with the top-100 recommendations for test users, for both the MF and Two-Tower models, in
$DATASET_WKSP_PATH/evalrs_2023_dataset_preproc/model_predictions
For your convenience, we have already saved the model prediction Parquet files and made them available to you.
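Once the notebook has finished, a quick way to confirm that the predictions were written is to list the Parquet files in that folder. The following is a minimal sketch; list_prediction_files is a hypothetical helper name, and the actual file names depend on the notebook.

```shell
# Hypothetical helper: list the Parquet files found under a predictions folder.
list_prediction_files() {
    find "$1" -type f -name '*.parquet' | sort
}

# Example usage on the host, assuming DATASET_WKSP_PATH is set as in the setup steps:
# list_prediction_files "$DATASET_WKSP_PATH/evalrs_2023_dataset_preproc/model_predictions"
```

An empty listing suggests the notebook did not reach the step that saves predictions, or that a different workspace path was mounted.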
- Download the model prediction Parquet files, which we generated by running the above notebook on a V100 GPU.
- Run this Jupyter notebook to evaluate, with RecList, the predictions generated by the trained Merlin retrieval models.