Tutorial for the EvalRS 2023 - RecSys evaluation hackathon

Retrieval models with Merlin and TensorFlow on LastFM/EvalRS dataset

This tutorial demonstrates how to build and train retrieval models with the Merlin framework and TensorFlow. It builds, trains, and evaluates two retrieval models: Matrix Factorization (MF) and a Two-Tower architecture.

You can run this tutorial on Google Colab or locally with Docker.

Running on Google Colab

If you don't have the necessary environment, you can run this notebook on Google Colab, which provides a T4 GPU for free. Open In Colab

Running locally with Docker

Requirements

  • Linux
  • Docker
  • GPU (T4 or better; 16 GB of GPU memory recommended).
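The requirements above can be checked before going further. The following is a minimal sketch, not part of the tutorial: the helper names `have_cmd`, `report`, and `check_requirements` are hypothetical, and it assumes that `nvidia-smi` being on the PATH is a reasonable proxy for a usable NVIDIA GPU and driver.

```shell
# Pre-flight check for the requirements listed above (illustrative sketch).

# Returns 0 if the given command is available on PATH.
have_cmd() {
  command -v "$1" >/dev/null 2>&1
}

# Prints a requirement name (padded to 8 characters) and its status.
report() {
  printf '%-8s %s\n' "$1" "$2"
}

check_requirements() {
  if [ "$(uname -s)" = "Linux" ]; then report "Linux" "ok"; else report "Linux" "missing"; fi
  if have_cmd docker; then report "Docker" "ok"; else report "Docker" "missing"; fi
  if have_cmd nvidia-smi; then
    report "GPU" "ok"
    # Show the GPU model and total memory so they can be compared with the recommendation.
    nvidia-smi --query-gpu=name,memory.total --format=csv,noheader || true
  else
    report "GPU" "missing"
  fi
}

check_requirements
```

The script only reports; it does not stop on a missing requirement, so it is safe to run on any machine.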
  1. Run the following Docker command, which pulls and runs the Merlin TensorFlow container (23.06 release).
    Set EVALRS_KDD_REPO to the host folder where you cloned the evalRS-KDD-2023 repo from GitHub, and DATASET_WKSP_PATH to the host folder where the dataset will be downloaded and preprocessed.
EVALRS_KDD_REPO=/PATH/TO/evalRS-KDD-2023
DATASET_WKSP_PATH=/PATH/TO/DATASET/WORKSPACE
docker run --runtime=nvidia --rm -it --ipc=host --cap-add SYS_NICE -v $EVALRS_KDD_REPO:/evalRS-KDD-2023 -v $DATASET_WKSP_PATH:/data -p 8888:8888 nvcr.io/nvidia/merlin/merlin-tensorflow:23.06 /bin/bash
  1. Inside the Docker container, clone the RecList library and install it. RecList will be used for evaluation.
mkdir -p /workspace/ && cd /workspace/
git clone https://github.com/Reclist/reclist/
cd reclist
# Remove "pyarrow==12.0.1" from the requirements, as it conflicts with cudf, which uses "pyarrow-10.0.1"
grep -vwE "pyarrow" requirements.txt > requirements_tmp.txt && mv requirements_tmp.txt requirements.txt
pip install -e . 
  1. Also inside the Docker container, update the Models library to the evalrs_2023 branch, which has some fixes necessary for pre-trained embeddings support.
cd /models
git config remote.origin.fetch "+refs/heads/*:refs/remotes/origin/*" && git fetch origin evalrs_2023 && git checkout evalrs_2023
pip install . --no-deps
  1. Start Jupyter notebook inside the container
cd /evalRS-KDD-2023/notebooks/merlin_tf_tutorial
jupyter notebook --no-browser --ip 0.0.0.0 --allow-root
  1. Load the Jupyter notebook UI in a web browser. Look for the URL printed in the console, which contains the token. It looks something like http://127.0.0.1:8888/?token=5ac3f9ca...

  2. Run the Merlin tutorial notebook

  3. The notebook saves parquet files with the top-100 recommendations for test users, for both the MF and Two-Tower models, in $DATASET_WKSP_PATH/evalrs_2023_dataset_preproc/model_predictions
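Once the notebook has finished, it can be useful to confirm from the host that the prediction files were written. The sketch below is an illustration only: the function name `count_parquet` is hypothetical, the individual file names are not assumed, and `DATASET_WKSP_PATH` falls back to the placeholder from the first step if it is not set in the current shell. Inside the container, the same folder is mounted under /data.

```shell
# Count .parquet files anywhere under a directory; prints 0 if the directory is missing.
count_parquet() {
  if [ -d "$1" ]; then
    find "$1" -type f -name '*.parquet' | wc -l | tr -d ' '
  else
    echo 0
  fi
}

# Host-side location of the predictions, as stated in the steps above.
PRED_DIR="${DATASET_WKSP_PATH:-/PATH/TO/DATASET/WORKSPACE}/evalrs_2023_dataset_preproc/model_predictions"

n=$(count_parquet "$PRED_DIR")
if [ "$n" -gt 0 ]; then
  echo "Found $n parquet file(s) in $PRED_DIR"
else
  echo "No parquet files found in $PRED_DIR; has the notebook finished running?"
fi
```

An empty result usually means either that the notebook has not reached its final cells or that the variable points at a different folder than the one mounted into the container.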

Evaluation based on saved recommendations

For your convenience, we have already saved the model prediction parquet files and made them available to you.

  1. Download the model prediction parquet files, which we generated by running the above notebook on a V100 GPU.

  2. Run this Jupyter notebook to evaluate, with RecList, the predictions generated by the trained Merlin retrieval models.