A comparison of LightFM's pure collaborative filtering and hybrid model performance with several tuned baseline algorithms on the MovieLens dataset.
See our Medium article here.
- Amanda Shu
- Sarat Sreepathy
The data used is the Movielens 100k dataset.
See here for a detailed description of the data. We use the files:
ua.base: training dataua.test: testing datau.item: item features data
The config folder contains several json files:
data-local-params.json: contains data parameters (when running locally) that are passed into theget_datafunction inetl.pydata-test-params.json: contains data parameters for testing data when running thetesttargetreport-params.json: contains paramaters for building thereport.html- all other files in the format
<algorithm name>-params.jsoncontain the parameters passed into their respective functions
The src folder contains subfolders data, analysis, models, and utils.
In the src/data folder:
etl.py: contains the functionget_datathat reads in the MovieLens100k data and outputs training/validation/testing user-item interaction matrices and item features data
In the src/analysis folder:
analysis.py: contains the functionrun_analysisto run the results of lightfm and baseline algorithms- All other files starting with
analysis_contain a function that runs their respective algorithm
The src/models folder contains baseline algorithms and evaluation code, taken from this repository by Dacrema, an author of "Are we Making Much Progress? A Worrying Analysis of Recent Neural Recommendation Approaches".
In the src/utils folder:
report.py: contains functionreportto save metric figures inresults/and outputsreport.htmlinreport/. This file contains code modified from hereclean.py: contains functionremove_resultsthat implements the standard targetclean
The notebook folder contains a Jupyter Notebook file that is run when building the report.
The test/testdata folder contains the testing data that is utilized for the standard target test. This data is only used to check correctness of the pipeline.
Run this command to run all the algorithms and get the results (this assumes there is a folder called data, which contains required data files):
python run.py data-local all-algosBesides all-algos, you can specify which algorithms to run if you only want to get the results of certain ones. Possible targets include toppop, itemknncf, userknncf, p3alpha, rp3beta, lightfm, and lightfm-hybrid. The code below runs the p3alpha baseline algorithm:
python run.py data-local p3alphaStandard targets are also implemented. all will run all the algorithms. clean will delete the folders that are outputted after running. test will run all the algorithms, using the testing data.
Running any baseline algorithm will create a folder result_experiments, which holds several files related to the tuning of the baseline algorithms (outputted by Dacrema's baseline implementations).
Running any number of algorithms will create a folder results which contains:
Metrics.csv: contains the metrics at each cutoff for each of the algorithmsMetrics.tex: same as Metrics.csv but the table is in latex formatprecision.png/recall.png: line plots comparing algorithms by their metrics over each cutoff
Additionally, a folder report will contain report.html, which is the html version of report.ipynb that lies in the notebook folder.
- Maurizio Ferrari Dacrema, Paolo Cremonesi, and Dietmar Jannach. 2019. Are We Really Making Much Progress? A Worrying Analysis of Recent Neural Recommendation Approaches. In Thirteenth ACM Conference on Recommender Systems (RecSys ’19), September 16–20, 2019, Copenhagen, Denmark.ACM, New York, NY, USA, 10 pages. https://doi.org/10.1145/3298689.334705
- F. Maxwell Harper and Joseph A. Konstan. 2015. The MovieLens Datasets: History and Context. ACM Transactions on Interactive Intelligent Systems (TiiS) 5, 4, Article 19 (December 2015), 19 pages. DOI=http://dx.doi.org/10.1145/2827872
- Maciej Kula. 2015. Metadata Embeddings for User and Item Cold-start Recommendations. In Proceedings of the 2nd Workshop on New Trends on Content-Based Recommender Systems co-located with 9th ACM Conference on Recommender Systems (RecSys 2015), Vienna, Austria, September 16-20, 2015. (pp. 14–21). CEUR-WS.org.