This Python package (webpage) implements our ICML'16 paper:
Arthur Mensch, Julien Mairal, Bertrand Thirion, Gaël Varoquaux. Dictionary Learning for Massive Matrix Factorization. International Conference on Machine Learning, Jun 2016, New York, United States. 2016
It performs sparse or dense matrix factorization on fully observed or partially missing data very efficiently, by combining random sampling with online learning.
The reference paper is available on HAL and arXiv. This package makes it possible to reproduce the experiments and figures from the paper.
More importantly, it provides scikit-learn (https://github.com/scikit-learn/scikit-learn) compatible estimators that fully implement the proposed algorithms.
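To give a feel for what "online learning from random samples" means for matrix factorization with missing data, here is a deliberately simplified sketch. It is not the algorithm from the paper and does not use the modl API: it swaps in plain stochastic gradient descent over the observed entries, visited in random order. The function names (`sgd_factorize`, `rmse`), the toy data, and every hyperparameter are assumptions made for illustration.

```python
import random


def sgd_factorize(entries, n_rows, n_cols, rank=2, lr=0.05, reg=0.01,
                  n_epochs=200, seed=0):
    """Approximate a partially observed matrix as U V^T with plain SGD.

    `entries` is a list of (row, col, value) triples for the observed cells.
    Generic illustration only; this is NOT the method proposed in the paper.
    """
    rng = random.Random(seed)
    U = [[rng.gauss(0.0, 0.1) for _ in range(rank)] for _ in range(n_rows)]
    V = [[rng.gauss(0.0, 0.1) for _ in range(rank)] for _ in range(n_cols)]
    entries = list(entries)
    for _ in range(n_epochs):
        rng.shuffle(entries)  # visit the observed cells in random order
        for i, j, x in entries:
            err = x - sum(u * v for u, v in zip(U[i], V[j]))
            for k in range(rank):
                u, v = U[i][k], V[j][k]
                # One gradient step on the squared error with L2 penalty
                U[i][k] += lr * (err * v - reg * u)
                V[j][k] += lr * (err * u - reg * v)
    return U, V


def rmse(entries, U, V):
    """Root mean squared error of U V^T on the given cells."""
    sq = [(x - sum(u * v for u, v in zip(U[i], V[j]))) ** 2
          for i, j, x in entries]
    return (sum(sq) / len(sq)) ** 0.5


# Toy rank-1 matrix with two cells treated as missing.
rows, cols = [1.0, 2.0, 1.5], [1.0, 0.5, 2.0]
missing = {(0, 2), (2, 0)}
observed = [(i, j, rows[i] * cols[j])
            for i in range(3) for j in range(3) if (i, j) not in missing]
U, V = sgd_factorize(observed, n_rows=3, n_cols=3)
print("training RMSE:", rmse(observed, U, V))
```

Each update touches only one row of `U` and one row of `V`, so the cost of a step does not depend on the size of the full matrix. That is the general property that makes sampling-based online methods attractive for very large data.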
Installation from source is simple. In a command prompt:
git clone https://github.com/arthurmensch/modl.git
cd modl
pip install -r requirements.txt
pip install .
cd $HOME
py.test --pyargs modl
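After the steps above, a lighter check than the full test suite is to confirm that the package and its main dependencies can be found by the interpreter. The snippet below is a minimal sketch: the import name `modl` is assumed from the `--pyargs modl` invocation, `sklearn` is the import name of scikit-learn, and the helper `is_installed` is hypothetical, not part of modl.

```python
import importlib.util


def is_installed(package_name):
    """Return True if a top-level package can be located for import."""
    return importlib.util.find_spec(package_name) is not None


# Report the packages this README mentions (names as imported, not as on PyPI).
for name in ("modl", "sklearn", "nilearn"):
    print(f"{name}: {'found' if is_installed(name) else 'MISSING'}")
```

Running it from a directory other than the cloned repository, as the `cd $HOME` step does for the tests, makes sure the installed copy is found rather than the source tree.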
Two simple examples run out of the box. They are a good basis for understanding the API of the modl estimators.
- ADHD (rfMRI) sparse decomposition, relying on nilearn
python examples/adhd_decompose.py
- Movielens (User/Movie ratings) prediction
python examples/recsys_predict.py
For the Movielens example, you will first need to download the dataset from the spira repository:
make download-movielens10m
The recommender system experiments can be reproduced by running the following command from the root of the repository:
python examples/experimental/recsys/recsys_compare.py
You will need to download datasets beforehand:
make download-movielens1m
make download-movielens10m
make download-netflix
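Since the datasets must be present before the comparison script runs, the order of operations can be made explicit in a small driver. This is only a sketch: the helper names `recsys_steps` and `run_steps` are hypothetical, the commands are copied verbatim from this README, and the script is assumed to be launched from the root of the repository.

```python
import subprocess


def recsys_steps():
    """Return the recsys commands in dependency order: downloads first."""
    targets = ["download-movielens1m", "download-movielens10m",
               "download-netflix"]
    steps = [["make", target] for target in targets]
    steps.append(["python", "examples/experimental/recsys/recsys_compare.py"])
    return steps


def run_steps(steps, dry_run=True):
    """Print each command; also execute it unless dry_run is True."""
    for cmd in steps:
        print(" ".join(cmd))
        if not dry_run:
            # check=True raises CalledProcessError and stops at the first failure
            subprocess.run(cmd, check=True)


run_steps(recsys_steps())  # dry run: prints the commands without running them
```

Calling `run_steps(recsys_steps(), dry_run=False)` would actually execute the commands, stopping if a download fails.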
You will first need to obtain the S500 release of the HCP dataset. You may use the public S3 bucket, order pre-filled hard drives, or download it directly.
Edit $HCPLOCATION in the Makefile, then run
make hcp
to create symlinks and download a useful mask.
The HCP experiment can be reproduced as follows:
# unmask data
python examples/experiment/fmri/hcp_prepare.py
# compare methods
python examples/experiment/fmri/hcp_compare.py
# analyse convergence
python examples/experiment/fmri/hcp_analysis.py
# plot results
python examples/experiment/fmri/hcp_plot.py
By default, results will be available in the $HOME/output/modl directory.
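To see what an experiment produced, the output directory can be walked from Python. The sketch below assumes nothing about the layout or file names inside the directory, which this README does not specify; `list_results` is a hypothetical helper, not part of modl.

```python
from pathlib import Path


def list_results(root):
    """Return all files under `root`, sorted; empty list if it does not exist."""
    root = Path(root)
    if not root.is_dir():
        return []
    return sorted(p for p in root.rglob("*") if p.is_file())


# Default location stated in this README: $HOME/output/modl
default_root = Path.home() / "output" / "modl"
for path in list_results(default_root):
    print(path.relative_to(default_root))
```

If no experiment has been run yet, the loop simply prints nothing.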
Please feel free to report any issues and propose improvements on GitHub.
Related projects:
- spira is a Python library for collaborative filtering based on coordinate descent. It serves as the baseline for the recsys experiments; a copy is included directly in this package for simplicity.
- scikit-learn is a Python library for machine learning. It serves as the basis of this project.
- nilearn is a neuroimaging library that we wrap in our fMRI-related estimators.
Licensed under simplified BSD.
Arthur Mensch, 2015 - present