Recommenders

This repository provides examples and best practices for building recommendation systems, provided as Jupyter notebooks. The examples detail our learnings on five key tasks:

  • Prepare Data: Preparing and loading data for each recommender algorithm
  • Model: Building models using various classical and deep learning recommender algorithms such as Alternating Least Squares (ALS) or eXtreme Deep Factorization Machines (xDeepFM)
  • Evaluate: Evaluating algorithms with offline metrics
  • Model Select and Optimize: Tuning and optimizing hyperparameters for recommender models
  • Operationalize: Operationalizing models in a production environment on Azure

Several utilities are provided in reco_utils to support common tasks such as loading datasets in the format expected by different algorithms, evaluating model outputs, and splitting training/test data. Implementations of several state-of-the-art algorithms are provided for self-study and customization in your own applications.
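To give a flavor of the data-splitting utilities, a random train/test split can be sketched in plain pandas. This is a simplified illustration only, not the actual reco_utils implementation; the column names, the 75/25 ratio, and the function name `random_split` are illustrative:

```python
import pandas as pd

def random_split(data, ratio=0.75, seed=42):
    """Randomly split a ratings DataFrame into train and test sets.

    A simplified sketch of the kind of splitter provided in reco_utils;
    the real utilities offer more options (stratified, chronological, ...).
    """
    shuffled = data.sample(frac=1, random_state=seed).reset_index(drop=True)
    cutoff = int(ratio * len(shuffled))
    return shuffled.iloc[:cutoff], shuffled.iloc[cutoff:]

# Toy ratings data in the (user, item, rating) format most algorithms expect.
ratings = pd.DataFrame({
    "userID": [1, 1, 2, 2, 3, 3, 4, 4],
    "itemID": [10, 20, 10, 30, 20, 30, 10, 40],
    "rating": [4.0, 5.0, 3.0, 2.0, 5.0, 4.0, 1.0, 3.0],
})
train, test = random_split(ratings)
print(len(train), len(test))  # 6 2
```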

Getting Started

Please see the setup guide for more details on setting up your machine locally, on Spark, or on Azure Databricks.

To set up on your local machine:

  1. Install Anaconda with Python >= 3.6. Miniconda is a quick way to get started.
  2. Clone the repository
    git clone https://github.com/Microsoft/Recommenders
    
  3. Run the generate conda file script to create a conda environment (this creates a basic Python environment; see SETUP.md for PySpark and GPU environment setup):
    cd Recommenders
    python scripts/generate_conda_file.py
    conda env create -f reco_base.yaml  
    
  4. Activate the conda environment and register it with Jupyter:
    conda activate reco_base
    python -m ipykernel install --user --name reco_base --display-name "Python (reco)"
    
  5. Start the Jupyter notebook server
    cd notebooks
    jupyter notebook
    
  6. Run the SAR Python CPU Movielens notebook under the 00_quick_start folder. Make sure to change the kernel to "Python (reco)".

NOTE - The Alternating Least Squares (ALS) notebooks require a PySpark environment to run. Please follow the steps in the setup guide to run these notebooks in a PySpark environment.

Algorithms

The table below lists the recommender algorithms currently available in the repository.

| Algorithm | Environment | Type | Description |
|---|---|---|---|
| Smart Adaptive Recommendations (SAR)* | Python CPU | Collaborative Filtering | Similarity-based algorithm for implicit feedback datasets |
| Surprise/Singular Value Decomposition (SVD) | Python CPU | Collaborative Filtering | Matrix factorization algorithm for predicting explicit rating feedback in datasets that are not very large |
| Vowpal Wabbit Family (VW)* | Python CPU (train online) | Collaborative, Content-Based Filtering | Fast online learning algorithms, great for scenarios where user features / context are constantly changing |
| Extreme Deep Factorization Machine (xDeepFM)* | Python CPU / Python GPU | Hybrid | Deep learning based algorithm for implicit and explicit feedback with user/item features |
| Deep Knowledge-Aware Network (DKN)* | Python CPU / Python GPU | Content-Based Filtering | Deep learning algorithm incorporating a knowledge graph and article embeddings to provide powerful news or article recommendations |
| Neural Collaborative Filtering (NCF) | Python CPU / Python GPU | Collaborative Filtering | Deep learning algorithm with enhanced performance for implicit feedback |
| Restricted Boltzmann Machines (RBM) | Python CPU / Python GPU | Collaborative Filtering | Neural network based algorithm for learning the underlying probability distribution for explicit or implicit feedback |
| FastAI Embedding Dot Bias (FAST) | Python CPU / Python GPU | Collaborative Filtering | General purpose algorithm with embeddings and biases for users and items |
| Wide and Deep | Python CPU / Python GPU | Hybrid | Deep learning algorithm that can memorize feature interactions and generalize user features |
| Alternating Least Squares (ALS) | PySpark | Collaborative Filtering | Matrix factorization algorithm for explicit or implicit feedback in large datasets, optimized by Spark MLLib for scalability and distributed computing capability |

NOTE - * indicates algorithms invented/contributed to by Microsoft.
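To give a flavor of the similarity-based approach behind algorithms like SAR, here is a toy item co-occurrence recommender in NumPy. This is a minimal sketch of the general idea only, not the repository's SAR implementation (which adds time decay, similarity normalization, and more):

```python
import numpy as np

# Binary user-item interaction matrix (rows: users, columns: items).
interactions = np.array([
    [1, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 1, 1],
], dtype=float)

# Item-item co-occurrence: how often two items appear in the same user's history.
cooccurrence = interactions.T @ interactions

# Score each item for each user: affinity vector times item similarity.
scores = interactions @ cooccurrence

# Mask items the user has already interacted with ("remove seen" step).
scores[interactions > 0] = -np.inf

# Top recommendation for user 0, who has seen items 0 and 1.
print(int(np.argmax(scores[0])))  # → 2
```

Masking previously seen items matters in practice: without it, the highest-scoring items are usually the ones the user has already consumed.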

Preliminary Comparison

We provide a comparison notebook to illustrate how different algorithms can be evaluated and compared. In this notebook, the MovieLens 1M dataset is randomly split into training and test sets at a 75/25 ratio, and a recommendation model is trained with each of the collaborative filtering algorithms below, using empirical parameter values reported in the literature. For ranking metrics we use k = 10 (top 10 recommended items). The comparison is run on a Standard NC6s_v2 Azure DSVM (6 vCPUs, 112 GB memory and 1 P100 GPU); Spark ALS is run in local standalone mode.

| Algo | MAP | nDCG@k | Precision@k | Recall@k | RMSE | MAE | R2 | Explained Variance |
|---|---|---|---|---|---|---|---|---|
| ALS | 0.002020 | 0.024313 | 0.030677 | 0.009649 | 0.860502 | 0.680608 | 0.406014 | 0.411603 |
| SVD | 0.010915 | 0.102398 | 0.092996 | 0.025362 | 0.888991 | 0.696781 | 0.364178 | 0.364178 |
| FastAI | 0.023022 | 0.168714 | 0.154761 | 0.050153 | 0.887224 | 0.705609 | 0.371552 | 0.374281 |
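For reference, the ranking metrics precision@k and recall@k can be sketched as follows. This is a minimal illustration with made-up data, not the evaluation code used to produce the table above:

```python
def precision_at_k(recommended, relevant, k=10):
    """Fraction of the top-k recommendations that are relevant."""
    hits = len(set(recommended[:k]) & set(relevant))
    return hits / k

def recall_at_k(recommended, relevant, k=10):
    """Fraction of the relevant items captured in the top-k recommendations."""
    hits = len(set(recommended[:k]) & set(relevant))
    return hits / len(relevant) if relevant else 0.0

recommended = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]  # model's ranked list
relevant = [20, 40, 99]                                   # user's test-set items

print(precision_at_k(recommended, relevant))  # 0.2
print(recall_at_k(recommended, relevant))     # 0.666...
```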

Contributing

This project welcomes contributions and suggestions. Before contributing, please see our contribution guidelines.

Build Status

| Build Type | Branch | Status | Branch | Status |
|---|---|---|---|---|
| Linux CPU | master | Status | staging | Status |
| Linux GPU | master | Status | staging | Status |
| Linux Spark | master | Status | staging | Status |

NOTE - these tests are the nightly builds, which run the smoke and integration tests. The master branch is our main branch and staging is our development branch. We use pytest for testing Python utilities in reco_utils and papermill for the notebooks. For more information about the testing pipelines, please see the test documentation.
