This repository provides examples and best practices for building recommendation systems, offered as Jupyter notebooks. The examples detail our learnings on five key tasks:
- Prepare Data: Preparing and loading data for each recommender algorithm
- Model: Building models using various recommender algorithms such as Alternating Least Squares (ALS), Singular Value Decomposition (SVD), etc.
- Evaluate: Evaluating algorithms with offline metrics
- Model Select and Optimize: Tuning and optimizing hyperparameters for recommender models
- Operationalize: Operationalizing models in a production environment on Azure
Several utilities are provided in reco_utils to support common tasks such as loading datasets in the format expected by different algorithms, evaluating model outputs, and splitting train/test data. Implementations of several state-of-the-art algorithms are provided for self-study and customization in your own applications.
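For example, loading a dataset and creating a train/test split with these utilities looks roughly like the sketch below. The module paths and function signatures shown are assumptions based on the quick-start notebooks, not a guaranteed stable API; check the notebooks for the current usage.

```python
# Minimal sketch of the reco_utils helpers for loading and splitting data.
# NOTE: module paths and signatures below are assumptions -- consult the
# quick-start notebooks for the exact, current API.
from reco_utils.dataset import movielens
from reco_utils.dataset.python_splitters import python_random_split

# Load MovieLens 100k ratings as a pandas DataFrame.
df = movielens.load_pandas_df(
    size="100k",
    header=["userID", "itemID", "rating", "timestamp"],
)

# Random 75/25 train/test split (the same ratio used in the comparison notebook).
train, test = python_random_split(df, ratio=0.75)
```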
To set up on your local machine:
- Install Anaconda with Python >= 3.6. Miniconda is a quick way to get started.
- Clone the repository
git clone https://github.com/Microsoft/Recommenders
- Run the generate conda file script and create a conda environment:
cd Recommenders
./scripts/generate_conda_file.sh
conda env create -n reco -f conda_bare.yaml
- Activate the conda environment and register it with Jupyter:
conda activate reco
python -m ipykernel install --user --name reco --display-name "Python (reco)"
- Start the Jupyter notebook server
cd notebooks
jupyter notebook
- Run the SAR Python CPU Movielens notebook under the 00_quick_start folder. Make sure to change the kernel to "Python (reco)".
We provide several notebooks to show how recommendation algorithms can be designed, evaluated and operationalized.
The Data Preparation Notebook shows how to prepare and split data properly for recommendation systems.
The Modeling Notebooks provide a deep dive into implementations of different recommender algorithms.
The Evaluation Notebooks show how to evaluate recommender algorithms using different ranking and rating metrics.
The Model Select and Optimize Notebooks show how to tune and optimize hyperparameters for recommender models; a minimal tuning sketch follows these notebook descriptions.
The Operationalize Notebook demonstrates how to deploy models in production systems.
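As a concrete illustration of the tuning step, the sketch below runs a small grid search over SVD hyperparameters with the Surprise library (the library behind the SVD entry in the table below). This is an illustrative example only, not code taken from the notebooks.

```python
# Small grid search over SVD hyperparameters using the Surprise library.
# Illustrative only; the notebooks may tune different parameters and ranges.
from surprise import SVD, Dataset
from surprise.model_selection import GridSearchCV

# Built-in MovieLens 100k dataset (downloaded on first use).
data = Dataset.load_builtin("ml-100k")

param_grid = {"n_factors": [50, 100], "reg_all": [0.02, 0.1]}
gs = GridSearchCV(SVD, param_grid, measures=["rmse", "mae"], cv=3)
gs.fit(data)

print("Best RMSE:", gs.best_score["rmse"])
print("Best parameters:", gs.best_params["rmse"])
```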
The Quick-Start and Modeling notebooks showcase how to utilize the following algorithms to build a recommender system; a brief usage sketch follows the table below.
The table below lists recommender algorithms available in the repository at the moment.
| Algorithm | Environment | Type | Description |
| --- | --- | --- | --- |
| Surprise/Singular Value Decomposition (SVD) | Python | Collaborative Filtering | General purpose algorithm for smaller datasets |
| Alternating Least Squares (ALS) | Spark | Collaborative Filtering | General purpose algorithm for larger datasets, optimized with Spark |
| Smart Adaptive Recommendations (SAR) | Python / Spark | Collaborative Filtering | Generalized algorithm utilizing item similarities that can easily adapt to new users |
| Vowpal Wabbit Family (VW) | Python / Online | Collaborative, Content-based Filtering | Fast online learning algorithms, great for scenarios where user features / context are constantly changing, like real-time bidding |
| eXtreme Deep Factorization Machine (xDeepFM) | Python / GPU | Hybrid | Deep learning model combining implicit and explicit features |
| Deep Knowledge-Aware Network (DKN) | Python / GPU | Content-based Filtering | Deep learning model incorporating a knowledge graph and article embeddings to provide powerful news or article recommendations |
| **Deep Learning Recommenders** | | | |
| Neural Collaborative Filtering (NCF) | Python / GPU | Collaborative Filtering | General algorithm built using a multi-layer perceptron |
| Restricted Boltzmann Machines (RBM) | Python / GPU | Collaborative Filtering | Generative neural network algorithm built to learn the underlying probability distribution for user/item affinity |
| FastAI Embedding Dot Bias (FAST) | Python / GPU | Collaborative Filtering | General purpose algorithm embedding dot biases for users and items |
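To give a feel for how these algorithms are used, here is a rough single-node SAR sketch. The class path, constructor arguments, and method names are assumptions; the SAR Python CPU MovieLens quick-start notebook referenced above is the authoritative source.

```python
# Hypothetical SAR usage sketch -- class path and parameters are assumptions;
# see the SAR Python CPU MovieLens quick-start notebook for the exact API.
from reco_utils.recommender.sar.sar_singlenode import SARSingleNode

model = SARSingleNode(
    col_user="userID",
    col_item="itemID",
    col_rating="rating",
    col_timestamp="timestamp",
    similarity_type="jaccard",
)

# `train` and `test` are the DataFrames from the split shown earlier.
model.fit(train)

# Top-10 recommendations per test user, excluding items already seen in training.
top_k = model.recommend_k_items(test, top_k=10, remove_seen=True)
```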
In addition, we provide a comparison notebook to illustrate how different algorithms can be evaluated and compared. In this notebook, the MovieLens 1M dataset is randomly split into train/test sets at a 75/25 ratio, and a recommendation model is trained with each of the collaborative filtering algorithms. We use empirical parameter values reported in the literature, and for ranking metrics we use k = 10 (top 10 results). The comparison is run on a Standard NC6s_v2 Azure DSVM (6 vCPUs, 112 GB memory, and 1 P100 GPU); Spark ALS is run in local standalone mode.
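The ranking evaluation in that comparison can be sketched as follows. The metric function names and signatures are assumptions based on reco_utils' evaluation module; refer to the evaluation notebooks for the exact API.

```python
# Ranking metrics at k = 10, mirroring the comparison notebook's setup.
# Function names/signatures are assumptions; see the evaluation notebooks.
from reco_utils.evaluation.python_evaluation import (
    map_at_k,
    ndcg_at_k,
    precision_at_k,
    recall_at_k,
)

# `test` holds the held-out ratings; `top_k` holds the model's scored recommendations.
eval_args = dict(
    col_user="userID",
    col_item="itemID",
    col_rating="rating",
    col_prediction="prediction",
    k=10,
)

print("MAP@10:      ", map_at_k(test, top_k, **eval_args))
print("NDCG@10:     ", ndcg_at_k(test, top_k, **eval_args))
print("Precision@10:", precision_at_k(test, top_k, **eval_args))
print("Recall@10:   ", recall_at_k(test, top_k, **eval_args))
```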
This project welcomes contributions and suggestions. Before contributing, please see our contribution guidelines.