This repository accompanies the publication: Katharina Hoedt, Arthur Flexer, and Gerhard Widmer, 'Defending a Music Recommender Against Hubness-Based Adversarial Attacks', published in the SMC-22 Scientific Paper Track. For our supplementary material, including listening examples, please visit our GitHub page.
This research was supported by the Austrian Science Fund (FWF, P 31988). GW’s work is supported by the European Research Council (ERC) under the EU’s Horizon 2020 research and innovation programme, grant agreement No 101019375 (Whither Music?).
If you encounter any issues or bugs within this repo, do not hesitate to contact us!
All required libraries are listed in the file environment.yml. If you have conda available, you can create an environment for this project and activate it as follows:
conda env create -f environment.yml
conda activate mp
The system we perform our experiments on is the Fm4 Soundpark [3], where various songs can be found and downloaded. We cannot provide the exact data used in this work; nevertheless, the code should provide a good starting point for experiments. Note that our experiments assume every song has a duration of at least two minutes. For more information, please contact us directly.
[3] Martin Gasser and Arthur Flexer (2009). Fm4 soundpark: Audio-based music recommendation in everyday use. In Proc. of the 6th Sound and Music Computing Conference, SMC, pages 23–25.
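Since the experiments assume at least two minutes of audio per song, shorter files should be filtered out beforehand. Below is a minimal sketch of such a check; it assumes WAV input and is not a helper shipped with this repository:

```python
import wave

MIN_DURATION_S = 120.0  # our experiments assume songs of at least 2 minutes

def is_long_enough(path, min_duration=MIN_DURATION_S):
    """Return True if the WAV file at `path` lasts at least `min_duration` seconds."""
    with wave.open(path, "rb") as wav:
        duration = wav.getnframes() / wav.getframerate()
    return duration >= min_duration
```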
To improve runtime efficiency, we use cached audio files and pre-compute the Gaussians and KL divergences. Before running any subsequent experiments, therefore run, e.g.,
python -m mp.utils.preprocess --data_path <your-data-path> --cache_file <your-path/cache_file.hdf5> --save_path <your-save-path>
replacing data_path, cache_file, and save_path accordingly.
The first line in Table 1 of the paper shows the result of attacking the still-undefended music recommender. This can be done by calling, e.g.,
python -m mp.attack.soundpark_attack --cache_file <your-path/cache_file.h5py> --save_path <your-save-path>
In case you want to change, e.g., the attack parameters, check the configuration file (attack/config.vars) and the available arguments in attack/soundpark_attack.py. Make sure to set cache_file and save_path correctly (as defined during data preprocessing).
During the attack, various values are stored in log files; these can be used to compute everything shown in Table 1 (i.e., SNR, k-occurrence, ...) by calling, e.g.,
python -m mp.utils.log_processing --log_files <log_file1>
Note that multiple log files can be given.
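For intuition, the SNR reported in Table 1 compares the energy of the original signal to that of the adversarial perturbation. A minimal sketch of such a computation (the exact definition used by log_processing may differ in detail):

```python
import numpy as np

def snr_db(clean, adversarial):
    """Signal-to-noise ratio in dB between a clean signal and its adversarial version;
    higher values mean a less perceptible perturbation."""
    noise = adversarial - clean
    return 20.0 * np.log10(np.linalg.norm(clean) / np.linalg.norm(noise))
```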
To find how many hubs / anti-hubs are in your recommendation graph before and after applying Mutual Proximity, you can call
python -m mp.utils.hub_analysis --save_path <your-save-path> --hub_size 5
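As background, the k-occurrence N_k(x) counts how often item x appears among the k nearest neighbours of all other items; items with unusually high N_k are hubs, and items that never appear are anti-hubs. A rough sketch on a distance matrix follows; the hub threshold here is illustrative and not necessarily the definition used by hub_analysis:

```python
import numpy as np

def k_occurrence(dist, k):
    """N_k(x): how often each item appears in the k-nearest-neighbour lists of others."""
    n = dist.shape[0]
    d = dist.copy().astype(float)
    np.fill_diagonal(d, np.inf)          # an item is never its own neighbour
    knn = np.argsort(d, axis=1)[:, :k]   # k nearest neighbours per row
    return np.bincount(knn.ravel(), minlength=n)

def hubs_and_antihubs(dist, k, hub_factor=2):
    """Illustrative split: hubs appear 'too often' (N_k > hub_factor * k),
    anti-hubs never appear at all (N_k == 0)."""
    nk = k_occurrence(dist, k)
    return np.where(nk > hub_factor * k)[0], np.where(nk == 0)[0]
```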
To compute the retrieval accuracy as stated in the paper, call
python -m mp.utils.knn_classification --save_path <your-save-path> --label_file <path-to/labels.csv>
Note that you need a file containing annotations for the data (here: genres). Each line of the file is assumed to have the format {index}, followed by zero, one, or multiple annotations.
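To illustrate the expected label-file format, here is a small parser sketch; the file name and the comma-delimiter handling are assumptions, so adapt it to your own annotation file:

```python
import csv

def read_labels(path):
    """Parse a label file whose lines look like: <index>,<annotation>,<annotation>,...
    Annotations may be absent; returns {index: [annotations]}."""
    labels = {}
    with open(path, newline="") as f:
        for row in csv.reader(f):
            if not row:
                continue
            labels[int(row[0])] = [a for a in row[1:] if a]
    return labels
```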
In Table 1, we compare three different settings of an attack on the defended system. The first still tries to minimise the KL divergence, while the other two consider Mutual Proximity in the attack. The last version of the attack disregards the norm of the perturbation in order to find more successful adversaries (this can be done by setting alpha=inf in our code). To perform these three attacks, you can run
python -m mp.mp_attack.soundpark_mp_attack --cache_file <your-path/cache_file.h5py> --save_path <your-save-path> --adv-vars mp/mp_attack/config.vars
python -m mp.mp_attack.soundpark_mp_attack --cache_file <your-path/cache_file.h5py> --save_path <your-save-path> --adv-vars mp/mp_attack/mp_config.vars
python -m mp.mp_attack.soundpark_mp_attack --cache_file <your-path/cache_file.h5py> --save_path <your-save-path> --adv-vars mp/mp_attack/mp_nonorm_config.vars
Again, you can summarise the log files by running
python -m mp.utils.log_processing --log_files <log_file1>
with the corresponding log-files as arguments.
To apply the MP defence after an attack on the undefended system has already been carried out, you can use the following commands:
python -m mp.post_hoc.mp_defence --adv_path <adversarial-folder> --save_path <your-save-path>
python -m mp.post_hoc.mp_log_processing --log_files <log-file>
This applies the defence and summarises the results of the corresponding log files. Make sure to set the paths and log files correctly again!
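As background, Mutual Proximity rescales a distance d(x, y) into the probability that a randomly drawn item z is farther from both x and y than they are from each other; this symmetrisation is what reduces hubness. Below is a naive sketch of the empirical variant over a full distance matrix — the repository's implementation may instead use a parametric (e.g., Gaussian) model:

```python
import numpy as np

def mutual_proximity(dist):
    """Empirical Mutual Proximity: mp[i, j] is the fraction of other items z with
    d(i, z) > d(i, j) and d(j, z) > d(i, j). Returned as a similarity in [0, 1]."""
    n = dist.shape[0]
    mp = np.zeros_like(dist, dtype=float)
    for i in range(n):
        for j in range(n):
            if i == j:
                mp[i, j] = 1.0
                continue
            mask = np.ones(n, dtype=bool)
            mask[[i, j]] = False         # exclude the pair itself
            farther = (dist[i, mask] > dist[i, j]) & (dist[j, mask] > dist[i, j])
            mp[i, j] = farther.mean() if mask.any() else 0.0
    return mp  # 1 - mp can serve as a hubness-reduced distance
```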
To summarise, the hyper-parameters we used for Table 1 are as follows:
| Adaptation | Parameters |
|---|---|
| original | ε = 0.1, η = 0.001, α = 25.0 |
| C&WKLmod | ε = 1.0, η = 0.001, α = 25.0 |
| C&WMPmod | ε = 1.0, η = 0.0005, α = 100.0 |
| C&WMPmod (w/o norm) | ε = 1.0, η = 0.0005, α = -- |
Here, ε is the clipping factor, η the learning rate determining the step size during optimisation, and α the factor that balances speed of convergence against perceptibility.
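To make the roles of these hyper-parameters concrete, one simplified optimisation step of such a C&W-style attack could look as follows. The loss shape and its gradient are toy stand-ins, not the repository's actual objective; note how α = inf makes the norm term drop out, matching the 'w/o norm' setting:

```python
import numpy as np

def attack_step(delta, grad_adv, eps=0.1, eta=0.001, alpha=25.0):
    """One gradient step on adv_loss + ||delta||^2 / alpha, clipped to [-eps, eps].
    eta is the step size; alpha = np.inf disregards the perturbation norm entirely."""
    norm_grad = 0.0 if np.isinf(alpha) else 2.0 * delta / alpha
    delta = delta - eta * (grad_adv + norm_grad)
    return np.clip(delta, -eps, eps)  # eps caps the perceptibility of the perturbation
```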