Repository which includes a reproducibility experiment and two full e2e experiments considering the VBPR architecture described in the original paper by Prof. Julian McAuley of 2016

Silleellie/VBPR-Reproducibility

VBPR Reproducibility: comparison and end-to-end experiments with ClayRS can see

This repository includes everything related to the paper "Reproducibility Analysis of Recommender Systems relying on Visual Features: traps, pitfalls, and countermeasures".

The following experiments can be reproduced using this repository:

  • Experiment 1: comparing VBPR results
    • Comparing the implementation of the VBPR algorithm between the modified version of ClayRS and Cornac
  • Experiment 2: Testing ClayRS Can See functionalities to include images as side information
    • Performing an end-to-end experiment using the modified version of ClayRS with the pre-trained caffe reference model on different pre-processing configurations
  • Experiment 3: Testing state-of-the-art models for extracting features from images
    • Performing an end-to-end experiment using the modified version of ClayRS with the pre-trained vgg19 and resnet50 models

Check the 'Experiment pipeline' section for an overview of the operations carried out by the three different experiments

All the experiments provided in this repository are compliant with the proposed checklist:

| Stage | Check | Value |
|---|---|---|
| Dataset Collection | ✅ Link to a downloadable version of the dataset collection | Tradesy raw feedback, Image features binary file, Tradesy Images from DVBPR dataset |
| | ✅ Any pre-filtering process performed on data | $\forall$ experiment, duplicate interactions are removed and users with fewer than five interactions are not considered, script. For Experiment 2 and Experiment 3, images from the Tradesy Images DVBPR dataset were removed in order to re-create the VBPR dataset (since the original dataset is not accessible), script |
| | ✅ Relevant dataset statistics | $\forall$ experiment, lines 18-27 of terminal output |
| | ✅ Preprocessing operations performed on side information | Experiment 1: no preprocessing performed, visual features provided by the original authors were used. Experiment 2: lines 23-24, 42-47 of yaml report, lines 71-73, 83-86 of script. Experiment 3: lines 21-34, 50-63 of yaml report, lines 64-67, 74-77 of script |
| | ✅ Pre-trained models adopted to represent side information | bvlc_reference_caffenet, resnet50, vgg19 |
| Data Splitting | ✅ Protocol used for data partitioning and random seed to reproduce random splits | Holdout $\forall$ user with test set size of one instance with random seed set at 42, script |
| | ⬜ Link to a downloadable version of the training/test/validation sets | Train and test sets are not provided, but can be easily reproduced by running the main data pipeline with the random state set to 42 |
| Recommendation | ✅ Name and version of the framework containing the recommendation algorithm | ClayRS can see (modified version of ClayRS v0.4), Cornac v1.14.2 |
| | ✅ Source code of the recommendation algorithm and setting of parameters | Source code of the recommendation algorithm: ClayRS can see VBPR, Cornac VBPR. Parameter settings: ClayRS can see: lines 61-70 of script, Cornac: lines 102-121 of script |
| | ⬜ Method to select the best hyperparameters | No hyperparameter tuning was carried out |
| | ✅ Any random seed necessary to reproduce random processes | All random processes were set to random seed 42 |
| Candidate Item Filtering | ✅ Set of target items to generate a ranking | All items of the system were taken into account |
| | ✅ Strategy (TestRatings, TestItems, TrainingItems, AllItems, One-Plus-Random) | AllItems |
| Evaluation | ✅ Name and version of the framework used to compute metrics | Cornac framework for evaluating Cornac models, custom AUC implementation to evaluate the ClayRS model (lines 64-118 of script) |
| | ✅ List of metrics adopted and cutoff for recommendation lists | The only metric used was AUC, and all ranked items were taken into account to compute it |
| | ⬜ Normalization strategy adopted | No normalization strategy was applied for the metric chosen (AUC) |
| | ✅ Averaging strategy adopted (e.g. micro or macro-average) | System results were generated by performing macro-average over the user results, line 115 of script |
| | ✅ List of results in a standard format (per fold and overall) | Experiment 1 AUC results path: reports/exp1, Experiment 2 AUC results path: reports/exp2, Experiment 3 AUC results path: reports/exp3 |
| Statistical testing | ✅ Data on which the test is performed | Experiment 1: AUC results between ClayRS and Cornac for each epoch located at reports/exp1. Experiment 2: AUC results between caffe and caffe_center_crop trained recommender for each epoch located at reports/exp2. Experiment 3: AUC results between vgg19 and resnet50 trained recommender for each epoch located at reports/exp3 |
| | ✅ Type of test and p-value | ttest statistical test was used: Experiment 1 p-value results path: reports/ttest_results/exp1, Experiment 2 p-value results path: reports/ttest_results/exp2, Experiment 3 p-value results path: reports/ttest_results/exp3 |
| | ⬜ Corrections for multiple comparisons | No correction was applied |

How to Use

Simply execute pip install -r requirements.txt in a freshly created virtual environment.

The source code has been tested and results have been produced with python 3.9 and CUDA V11.6. Please note that CUDA must be installed to run the experiments.

To perform the exp1 experiment, which is the comparison of the VBPR implementation between ClayRS and Cornac, run via command line:

python pipeline.py -epo 5 10 20 50 -exp exp1

In this way, raw data will first be downloaded and processed, and then the actual experiment will be run using the default parameters.

  • By default, the experiment is run for $5$, $10$, $20$ and $50$ epochs. Default parameters can be easily changed by passing them as command line arguments

To perform the exp2 experiment, which is the end-to-end experiment in which ClayRS can see is tested to include images as side information (using bvlc_reference_caffenet with two different pre-processing configurations), run via command line:

python pipeline.py -epo 10 20 -exp exp2
  • The experiment was performed with 10 and 20 epochs, set through the -epo parameter; however, any number of epochs can be specified

To perform the exp3 experiment, which is the end-to-end experiment in which ClayRS can see is tested using state-of-the-art models (vgg19 and resnet50) for extracting features from images, run via command line:

python pipeline.py -epo 10 20 -exp exp3

You can inspect all the parameters that can be set by simply running python pipeline.py -h. The following is what you would obtain:

$ python pipeline.py -h

usage: pipeline.py [-h] [-epo 5 [5 ...]] [-bs 128] [-gd 20] [-td 20] [-lr 0.005] [-seed 42] [-nt_ca 4] [-exp exp1]

Main script to reproduce the VBPR experiment

optional arguments:
  -h, --help            show this help message and exit
  -epo 5 [5 ...], --epochs 5 [5 ...]
                        Number of epochs for which the VBPR network will be trained
  -bs 128, --batch_size 128
                        Batch size that will be used for the torch dataloaders during training
  -gd 20, --gamma_dim 20
                        Dimension of the gamma parameter of the VBPR network
  -td 20, --theta_dim 20
                        Dimension of the theta parameter of the VBPR network
  -lr 0.005, --learning_rate 0.005
                        Learning rate for the VBPR network
  -seed 42, --random_seed 42
                        random seed
  -nt_ca 4, --num_threads_ca 4
                        Number of threads that will be used in ClayRS during Content Analyzer serialization phase
  -exp exp1, --experiment exp1
                        exp1 to perform the comparison experiment with Cornac,
                        exp2 to perform end to end experiment using caffe via ClayRS can see,
                        exp3 to perform end to end experiment using vgg19 and resnet50 via Clayrs can see
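
For readers who want to script around this interface, the following is a minimal sketch of an argparse parser that would produce a comparable command line. It is reconstructed only from the help text above and is not the repository's actual pipeline.py; the function name build_parser and the default epoch list (taken from the "5, 10, 20 and 50" default mentioned earlier) are assumptions.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Sketch of the CLI described by the help text (not the repository's code)."""
    parser = argparse.ArgumentParser(
        prog="pipeline.py",
        description="Main script to reproduce the VBPR experiment",
    )
    # Assumption: the default epoch list follows the README text (5, 10, 20, 50).
    parser.add_argument("-epo", "--epochs", type=int, nargs="+", default=[5, 10, 20, 50],
                        help="Number of epochs for which the VBPR network will be trained")
    parser.add_argument("-bs", "--batch_size", type=int, default=128,
                        help="Batch size used for the torch dataloaders during training")
    parser.add_argument("-gd", "--gamma_dim", type=int, default=20,
                        help="Dimension of the gamma parameter of the VBPR network")
    parser.add_argument("-td", "--theta_dim", type=int, default=20,
                        help="Dimension of the theta parameter of the VBPR network")
    parser.add_argument("-lr", "--learning_rate", type=float, default=0.005,
                        help="Learning rate for the VBPR network")
    parser.add_argument("-seed", "--random_seed", type=int, default=42,
                        help="random seed")
    parser.add_argument("-nt_ca", "--num_threads_ca", type=int, default=4,
                        help="Number of threads used during Content Analyzer serialization")
    # Restricting the value to the three documented experiments is an assumption.
    parser.add_argument("-exp", "--experiment", choices=["exp1", "exp2", "exp3"],
                        default="exp1", help="Which experiment to run")
    return parser


# Parse an explicit argument list instead of sys.argv so the sketch runs anywhere.
args = build_parser().parse_args(["-epo", "10", "20", "-exp", "exp2"])
print(args.epochs, args.experiment, args.batch_size)
```

Parsing an explicit list, as in the last lines, is a convenient way to check which defaults apply before launching a long run.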

Experiment pipeline

The following is a description of the operations carried out by the pipeline depending on the experiment type (exp1, exp2, exp3), which is selected with the -exp parameter.

-exp exp1

Data:

  • Download raw tradesy feedback from here
  • Download binary file containing features of images from here
  • Filter raw interactions following original VBPR paper instructions and remove duplicate interactions
  • Extract into a npy matrix features from the binary file for items which appear in the filtered interactions
  • Build item map (following the order in which each item appears in the binary file)
  • Build train and test set with leave-one-out using -seed parameter as random state
  • Build user map (following the order in which each user appears in the filtered interactions)
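
The filtering, splitting and user-map steps above can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the code in src/data: interactions are modelled as hypothetical (user, item) tuples, duplicates are dropped before the five-interaction threshold is applied, and the function names are invented for this example.

```python
import random
from collections import Counter, defaultdict


def filter_interactions(interactions, min_interactions=5):
    """Drop duplicate (user, item) pairs, then users with too few interactions."""
    unique = list(dict.fromkeys(interactions))  # keeps first-appearance order
    counts = Counter(user for user, _ in unique)
    return [(user, item) for user, item in unique if counts[user] >= min_interactions]


def leave_one_out(interactions, seed=42):
    """Hold out one randomly chosen interaction per user as the test set."""
    rng = random.Random(seed)  # local generator, so the split is reproducible
    items_by_user = defaultdict(list)
    for user, item in interactions:
        items_by_user[user].append(item)
    train, test = [], []
    for user, items in items_by_user.items():
        held_out = rng.randrange(len(items))
        for position, item in enumerate(items):
            (test if position == held_out else train).append((user, item))
    return train, test


def build_user_map(interactions):
    """Map each user to an integer index in order of first appearance."""
    users = dict.fromkeys(user for user, _ in interactions)
    return {user: index for index, user in enumerate(users)}


# Hypothetical toy data: u1 has five distinct items plus one duplicate, u2 has two.
raw = [("u1", f"i{k}") for k in range(5)] + [("u1", "i0"), ("u2", "i0"), ("u2", "i1")]
filtered = filter_interactions(raw)
train, test = leave_one_out(filtered, seed=42)
user_map = build_user_map(filtered)
```

Using a dedicated random.Random instance rather than the global generator keeps the split independent of any other random calls made by the surrounding program.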

Experiment and evaluation:

  • Fit VBPR algorithm via ClayRS can see and Cornac using command line arguments when invoking pipeline.py (-epo, -bs, -gd, etc.)
  • Compute AUC of each user and the average AUC for both ClayRS and Cornac
  • Perform ttest statistical test between ClayRS user results and Cornac user results
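
As an illustration of the evaluation step, the sketch below computes a per-user AUC and its macro-average in the pairwise form used by the VBPR paper: for each held-out item, count how many unseen items it outranks. It is not the repository's compute_auc.py; the function names, the dictionary-based score format and the choice to count ties as misses are assumptions.

```python
def user_auc(scores, test_items, train_items):
    """Fraction of (test item, unseen item) pairs ranked in the right order.

    scores maps every item id to the predicted preference of one user.
    Items seen in training are excluded from the comparison set.
    """
    negatives = [item for item in scores
                 if item not in test_items and item not in train_items]
    pairs = 0
    correct = 0
    for positive in test_items:
        for negative in negatives:
            pairs += 1
            # Assumption: a tie counts as an incorrectly ordered pair.
            if scores[positive] > scores[negative]:
                correct += 1
    return correct / pairs


def macro_auc(per_user_auc):
    """Average the per-user values, giving every user the same weight."""
    return sum(per_user_auc.values()) / len(per_user_auc)


# Hypothetical scores for one user over four items.
scores = {"a": 0.9, "b": 0.1, "c": 0.5, "d": 0.7}
auc_u1 = user_auc(scores, test_items={"c"}, train_items={"a"})
system_auc = macro_auc({"u1": auc_u1, "u2": 1.0})
```

Because each user contributes exactly one held-out item under the leave-one-out protocol, the per-user value reduces to the share of unseen items ranked below that item.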

-exp exp2

Data:

  • Download raw tradesy feedback from here
  • Download npy file containing tradesy images from here
  • Download caffe model and all of its necessary files:
    • bvlc_reference_caffenet model from here
    • deploy.prototxt for bvlc_reference_caffenet from here
    • ilsvrc_2012_mean.npy file containing mean pixel value from here
  • Filter raw interactions following original VBPR paper instructions and remove duplicate interactions
  • Download binary file containing features of images from here
  • Extract into a npy matrix features from the binary file for items which appear in the filtered interactions
  • Build item map (following the order in which each item appears in the binary file)
  • Extract from the npy file into a folder the images of the items which appear in the filtered interactions
  • Build a .csv file associating each item to the path of its image in said directory
  • Build train and test set with leave-one-out using -seed parameter as random state
  • Build user map (following the order in which each user appears in the filtered interactions)

Experiment and evaluation:

  • From the images dataset, create processed contents using the Content Analyzer. Each serialized content (corresponding to an item) will have two different representations:
    • caffe: same model as the one used in the VBPR paper (and pre-processing operations suggested for the model by the Caffe framework from here)
    • caffe_center_crop: same configuration, but only center crop to 227x227 dimensions is applied as pre-processing operation
  • Fit a different VBPR algorithm for the two representations via ClayRS can see using command line arguments when invoking pipeline.py (-epo, -bs, -gd, etc.)
  • Compute AUC of each user and the average AUC for ClayRS for each VBPR algorithm instance
  • Perform ttest statistical test between the two configurations
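
To make the caffe_center_crop configuration concrete, the following sketch computes the pixel box of a centred 227x227 crop. It only shows the geometry; it is not how ClayRS implements the operation, and both the function name and the decision to reject images smaller than the crop are assumptions.

```python
def center_crop_box(width, height, size=227):
    """Return (left, top, right, bottom) of a centred size x size crop."""
    if width < size or height < size:
        # Assumption: smaller images are rejected rather than padded or resized.
        raise ValueError("image is smaller than the requested crop")
    left = (width - size) // 2
    top = (height - size) // 2
    return left, top, left + size, top + size


# Hypothetical 256x256 input image.
box = center_crop_box(256, 256)
```

The returned tuple follows the (left, upper, right, lower) convention, so it could be passed to an image library's crop call that expects that order.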

-exp exp3

Data:

  • Download raw tradesy feedback from here
  • Download npy file containing tradesy images from here
  • Filter raw interactions following original VBPR paper instructions and remove duplicate interactions
  • Download binary file containing features of images from here
  • Extract into a npy matrix features from the binary file for items which appear in the filtered interactions
  • Build item map (following the order in which each item appears in the binary file)
  • Extract from the npy file into a folder the images of the items which appear in the filtered interactions
  • Build a .csv file associating each item to the path of its image in said directory
  • Build train and test set with leave-one-out using -seed parameter as random state
  • Build user map (following the order in which each user appears in the filtered interactions)

Experiment and evaluation:

  • From the images dataset, create processed contents using the Content Analyzer. Each serialized content (corresponding to an item) will have two different representations:
    • resnet50: features are extracted from the pool5 layer of the ResNet50 architecture
    • vgg19: features are extracted from the last convolution layer before the fully-connected ones of the vgg19 architecture and global max-pooling is applied to them
  • Fit a different VBPR algorithm for the two representations via ClayRS can see using command line arguments when invoking pipeline.py (-epo, -bs, -gd, etc.)
  • Compute AUC of each user and the average AUC for ClayRS for each VBPR algorithm instance
  • Perform ttest statistical test between the two configurations
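
The t-test step can be illustrated with the statistic itself. The README does not state whether the test is paired or independent; the sketch below assumes a paired test, which is one natural choice because both configurations are evaluated on the same users. The function name is hypothetical, and the p-value (not computed here) would come from a Student's t distribution with the returned degrees of freedom, for example via a statistics library.

```python
import math
import statistics


def paired_t_statistic(results_a, results_b):
    """Return (t, degrees_of_freedom) for paired per-user results.

    Assumption: both lists are aligned, i.e. position k refers to the same user.
    Raises ZeroDivisionError if all paired differences are identical.
    """
    if len(results_a) != len(results_b):
        raise ValueError("both result lists must have the same length")
    differences = [a - b for a, b in zip(results_a, results_b)]
    n = len(differences)
    mean_diff = statistics.fmean(differences)
    std_diff = statistics.stdev(differences)  # sample standard deviation
    return mean_diff / (std_diff / math.sqrt(n)), n - 1


# Hypothetical per-user AUC values for two configurations.
t_value, dof = paired_t_statistic([0.8, 0.7, 0.9, 0.6], [0.7, 0.6, 0.7, 0.6])
```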

Project Organization

├── 📁 data                          <- Directory containing all data generated/used by the three experiments
│   ├── 📁 interim                       <- Intermediate data that has been transformed
│   ├── 📁 processed                     <- The final, canonical data sets used for training
│   └── 📁 raw                           <- The original, immutable data dump
│
├── 📁 models                        <- Trained and serialized models at different epochs for the three experiments
│   ├── 📁 exp1                          <- Models which are output of the experiment 1
│   │   ├── 📁 vbpr_clayrs                   <- ClayRS models which are output of the experiment 1
│   │   └── 📁 vbpr_cornac                   <- Cornac models which are output of the experiment 1
│   │
│   ├── 📁 exp2                          <- Models which are output of the experiment 2
│   └── 📁 exp3                          <- Models which are output of the experiment 3
│
├── 📁 reports                       <- Generated metrics and reports by the three different experiments
│   ├── 📁 exp1                          <- System-wise and per-user AUC results output of the experiment 1
│   │   ├── 📁 vbpr_clayrs                   <- ClayRS AUC results which are output of the experiment 1
│   │   └── 📁 vbpr_cornac                   <- Cornac AUC results which are output of the experiment 1
│   │
│   ├── 📁 exp2                          <- System-wise and per-user AUC results output of the experiment 2
│   ├── 📁 exp3                          <- System-wise and per-user AUC results output of the experiment 3
│   ├── 📁 ttest_results                 <- Results of the ttest statistic for each epoch for all three experiments
│   │   ├── 📁 exp1                          <- ttest results output of the experiment 1
│   │   ├── 📁 exp2                          <- ttest results output of the experiment 2
│   │   └── 📁 exp3                          <- ttest results output of the experiment 3
│   │
│   ├── 📁 yaml_clayrs                   <- Reports generated by the Report class in ClayRS to document all techniques and parameters used in the experiments
│   │   ├── 📁 exp1_rs_report                <- Reports generated for each Recommender System configuration in the experiment 1
│   │   ├── 📁 exp2_rs_report                <- Reports generated for each Recommender System configuration in the experiment 2
│   │   ├── 📁 exp3_rs_report                <- Reports generated for each Recommender System configuration in the experiment 3
│   │   ├── 📄 exp1_ca_report.yml            <- Report generated for the Content Analyzer module in the experiment 1
│   │   ├── 📄 exp2_ca_report.yml            <- Report generated for the Content Analyzer module in the experiment 2
│   │   └── 📄 exp3_ca_report.yml            <- Report generated for the Content Analyzer module in the experiment 3
│   │
│   ├── 📄 exp1_terminal_output.txt      <- Output of the terminal which generated committed results for experiment 1
│   ├── 📄 exp2_terminal_output.txt      <- Output of the terminal which generated committed results for experiment 2
│   └── 📄 exp3_terminal_output.txt      <- Output of the terminal which generated committed results for experiment 3
│
├── 📁 src                           <- Source code of the project
│   ├── 📁 data                          <- Scripts to download and generate data
│   │   ├── 📄 create_interaction_csv.py
│   │   ├── 📄 create_tradesy_images_dataset.py
│   │   ├── 📄 dl_raw_sources.py
│   │   ├── 📄 extract_features_from_source.py
│   │   └── 📄 train_test_split.py
│   │
│   ├── 📁 evaluation                <- Scripts to evaluate models and compute ttest
│   │   ├── 📄 compute_auc.py
│   │   └── 📄 ttest.py
│   │
│   ├── 📁 model                     <- Scripts to train models
│   │   ├── 📄 exp1_clayrs_experiment.py
│   │   ├── 📄 exp1_cornac_experiment.py
│   │   ├── 📄 exp2_caffe.py
│   │   ├── 📄 exp3_vgg19_resnet.py
│   │   ├── 📄 clayrs_experiment.py
│   │   └── 📄 cornac_experiment.py
│   │
│   ├── 📄 __init__.py                   <- Makes src a Python module
│   └── 📄 utils.py                      <- Contains utility functions for the project
│
├── 📄 LICENSE                       <- MIT License
├── 📄 pipeline.py                   <- Script that can be used to reproduce or customize the experiment pipeline
├── 📄 README.md                     <- The top-level README for developers using this project
└── 📄 requirements.txt              <- The requirements file for reproducing the analysis environment (src package)

Project based on the cookiecutter data science project template. #cookiecutterdatascience
