Merge pull request #3 from CPJKU/update-readme
Refactoring of
karapostK committed Nov 10, 2020
2 parents 947193e + 5e5e714 commit 76e9465
Showing 3 changed files with 102 additions and 92 deletions.
194 changes: 102 additions & 92 deletions
@@ -1,28 +1,60 @@
# LEMONS: Listenable Explanations for Music recOmmeNder Systems
**LEMONS** addresses the issue of explaining why a track has been recommended to a user by providing listenable explanations based on the track itself.

We consider the following users/personas, distinguished by music preference:
- **Marko**: Favourite genre is reggae. He also prefers more niche tracks and loud music.
- **Matteo**: Favourite genres are trance, blues, and progressive.
- **Johnny**: Listens to a bit of everything.
- **Elizabeth**: Her top 3 genres are rock, alternative metal, and heavy metal.
- **Nina**: Favourite genres are rock, emo, and post-hardcore.
- **Paige**: Mostly listens to popular music.
- **Sandra**: Favourite genres are hip-hop and rap, especially dirty south rap.
## Overview

## Install the conda environment
**LEMONS** consists of the following 2 parts:

1. Music Recommender System. The recommender system (RS) takes audio tracks as input and outputs their relevance for the user.
2. Listenable Explanations. Explanations are computed post-hoc using [audioLIME](, an extension of [LIME]( for audio data.

The functionality is demonstrated using a [streamlit]( app. A screenshot of the **LEMONS** app can be seen below.


You can check out the [video of our demo]( (~9 minutes).

Below you can find the details of the recommender system, how to set up and run the same experiments, and how to run the `streamlit` app to explore the explanations.

## Audio-based Recommender System - Model and Training details
### Input
For training on the Million Song Dataset, we use snippets from 7digital. Snippet durations range from 30s to 60s.
Audio is downsampled to 16kHz and converted to decibel-scaled mel-spectrograms. We use 256 mel bins with a hop size of 512. During training only, we use a randomly selected 1s excerpt of each snippet, which gives an input shape of 256x63.

### Model
The structure of the audio-based recommender system is shown below.

| Layers |
| --- |
| BatchNorm2d |
| Conv2d(1,64), BatchNorm2d, ReLU, MaxPool2d |
| Conv2d(64,128), BatchNorm2d, ReLU, MaxPool2d |
| Conv2d(128,128), BatchNorm2d, ReLU, MaxPool2d |
| Conv2d(128,128), BatchNorm2d, ReLU, MaxPool2d |
| Conv2d(128,64), BatchNorm2d, ReLU, MaxPool2d |
| Cat(AdaptiveAvgPool2d + AdaptiveMaxPool2d) |
| Dropout(0.5) |
| Linear(128,1) |

Convolutions use a 3x3 kernel, and each max-pooling layer halves both spatial dimensions.
In the last layers we concatenate global average pooling and global max pooling, apply dropout, and feed it to a linear layer.
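
The layer table above can be sketched in PyTorch roughly as follows. This is a minimal illustration, not the repository's actual code: the class name `AudioRecSketch`, the helper `conv_block`, and the choice of `padding=1` are assumptions (the README does not state the padding), chosen so that a 256x63 input survives five halvings.

```python
import torch
import torch.nn as nn


def conv_block(in_ch: int, out_ch: int) -> nn.Sequential:
    # 3x3 convolution -> batch norm -> ReLU -> 2x2 max pooling (halves both dims).
    # padding=1 is an assumption; the README does not specify it.
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1),
        nn.BatchNorm2d(out_ch),
        nn.ReLU(),
        nn.MaxPool2d(2),
    )


class AudioRecSketch(nn.Module):  # hypothetical name, for illustration only
    def __init__(self) -> None:
        super().__init__()
        self.input_norm = nn.BatchNorm2d(1)
        channels = [1, 64, 128, 128, 128, 64]
        self.convs = nn.Sequential(
            *[conv_block(i, o) for i, o in zip(channels[:-1], channels[1:])]
        )
        self.avg_pool = nn.AdaptiveAvgPool2d(1)
        self.max_pool = nn.AdaptiveMaxPool2d(1)
        self.dropout = nn.Dropout(0.5)
        self.fc = nn.Linear(128, 1)  # 64 (avg) + 64 (max) channels -> 1 logit

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x has shape (batch, 1, mel_bins, frames)
        x = self.convs(self.input_norm(x))
        pooled = torch.cat([self.avg_pool(x), self.max_pool(x)], dim=1)
        return self.fc(self.dropout(pooled.flatten(1)))  # (batch, 1)


model = AudioRecSketch().eval()
with torch.no_grad():
    train_sized = model(torch.randn(2, 1, 256, 63))   # 1s training excerpt
    longer = model(torch.randn(2, 1, 256, 500))       # longer input, e.g. a whole track
print(train_sized.shape, longer.shape)
```

Because the pooling before the linear layer is global, the same weights accept inputs of different lengths, which is what allows training on short excerpts and evaluating on whole tracks.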

### Training
We use a batch size of 20 and train for 1000 epochs with a learning rate of 1e-3, weight decay of 1e-4, and Adam optimizer.
We train a total of 7 models, one for each user.
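
A single optimisation step with these hyperparameters might look like the sketch below. Several parts are assumptions made so the example runs on its own: the placeholder `model`, the hypothetical helper `random_crop`, the random tensors standing in for mel-spectrograms, and the use of `BCEWithLogitsLoss` (the README does not name the loss).

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Placeholder network so the sketch is self-contained; substitute the real model.
model = nn.Sequential(nn.Flatten(), nn.Linear(256 * 63, 1))

# Hyperparameters as stated in the README: Adam, lr=1e-3, weight decay=1e-4.
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3, weight_decay=1e-4)
criterion = nn.BCEWithLogitsLoss()  # assumption: binary relevance labels


def random_crop(spec: torch.Tensor, n_frames: int = 63) -> torch.Tensor:
    """Select a random window of n_frames along the time (last) axis."""
    start = torch.randint(0, spec.shape[-1] - n_frames + 1, (1,)).item()
    return spec[..., start:start + n_frames]


# One batch of 20 fake snippets; random values stand in for spectrograms.
snippets = [torch.randn(256, 400) for _ in range(20)]
batch = torch.stack([random_crop(s) for s in snippets]).unsqueeze(1)  # (20, 1, 256, 63)
labels = torch.randint(0, 2, (20, 1)).float()

optimizer.zero_grad()
loss = criterion(model(batch), labels)
loss.backward()
optimizer.step()
```

In the setup described here this loop would be repeated once per user, since each of the 7 users gets a separate model.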

### Validation and Testing
For evaluation, we use the whole track as input.

## Setup

### Create an environment with all dependencies

```shell script
conda env create -f ajures.yml
conda activate ajures
```
Furthermore, audioLIME needs to be installed manually (while ajures is active):
```shell script
cd explanations
pip install -e git://
```
```shell script
conda env create -f lemons.yml
conda activate lemons
```

### Setup
### Install `lemons`
In the root directory, run the following:
```shell script
python3 develop
```
@@ -32,10 +64,16 @@ or, if it doesn't work
```shell script
pip install -e .
```

### Config

Some paths need to be set, e.g. to the location of your data.
Copy `` to `` and set your paths there. `` is in `.gitignore`
such that each user has their own config without overwriting the others.

## Training
Before training, you may need to adjust the following parameters.

In local_conf in, change the following:
We use [`sacred`]( to log all experiments. In `local_conf` in `recsys/`, change the following:
- mongodb_db_name
local_conf = {
@@ -45,41 +83,35 @@ local_conf = {

In experiment_config() in, you can change the following parameters (commented):
In `experiment_config()` in `recsys/`, you can change the following parameters:
def experiment_config():
# --Logging Parameters-- #

uid = generate_uid()
model_save_path = '../experiments/{}/'.format(uid)
use_tensorboard = 0 # if also tensorboard (together with sacred) should be used
log_step = 100 # how many batches have to pass before logging the batch loss (NB. this is not for avg_loss)

# --Training Parameters-- #

training_seed = 1930289 # seed used for training (independent of the data seed)
model_type = 'base' # which model to train (check to see the ones available)
input_length = get_model_input_length(model_type)
model_load_path = '' # if load pre-trained model
freeze = 1 # if freeze the weights of a pre-trained model
batch_size = 20 # batch size
n_epochs = 1000 # epochs for training
lr = 1e-3 # learning rate
wd = 1e-4 # weight decay
num_workers = 10 # number of workers
device = 'cuda:0' # which device to use

# --Data Parameters-- #

data_path = '' # path to the npys
meta_path = '../data/' # path to the meta data
user_name = 'marko' # users (check utils, get_user_id)
user_id = get_user_id(user_name)
# --Logging Parameters-- #

use_tensorboard = 0 # if also tensorboard (together with sacred) should be used
log_step = 100 # how many batches have to pass before logging the batch loss (NB. this is not for avg_loss)

# --Training Parameters-- #

training_seed = 1930289 # seed used for training (independent of the data seed)
model_load_path = '' # if load pre-trained model
freeze = 1 # if freeze the weights of a pre-trained model
batch_size = 20 # batch size
n_epochs = 1000 # epochs for training
lr = 1e-3 # learning rate
wd = 1e-4 # weight decay
num_workers = 10 # number of workers
device = 'cuda:0' if torch.cuda.is_available() else 'cpu' # which device to use

# --Data Parameters-- #

data_path = '' # path to the npys
meta_path = '../data/' # path to the meta data
user_name = 'marko' # users (check utils, get_user_id)
Then training can be run with:
```shell script
cd training
python3 with seed=1057386
cd recsys
```
The best model will be saved by default in the directory /experiments/<date>.

@@ -89,68 +121,46 @@ In local_conf in, change the following:
- mongodb_db_name (similar to above):
In experiment_config() in, you can change the following parameters (commented):
def experiment_config():
# --Logging Parameters-- #
# --Logging Parameters-- #

use_tensorboard = 1 # if also tensorboard (together with sacred) should be used
use_tensorboard = 1 # if also tensorboard (together with sacred) should be used

# --Evaluation Parameters-- #
# --Evaluation Parameters-- #

model_type = 'base' # which model to train (check to see the ones available)
input_length = get_model_input_length(model_type)
model_load_path = 'best_model.pth' # path to the trained model
results_path = os.path.dirname(model_load_path) + "/results.pkl" # TODO: not used for now
batch_size = 20 # batch size
num_workers = 10 # number of workers
device = 'cuda:0' # which device to use
model_load_path = 'best_model.pth' # path to the trained model
batch_size = 20 # batch size
num_workers = 10 # number of workers
device = 'cuda:0' if torch.cuda.is_available() else 'cpu' # which device to use

# --Data Parameters-- #
# --Data Parameters-- #

data_path = '' # path to the npys
meta_path = '../data/' # path to the meta data
user_name = 'marko' # users (check utils, get_user_id)
user_id = get_user_id(user_name)
data_path = '' # path to the npys
meta_path = '../data/' # path to the meta data
user_name = 'marko' # users (check utils, get_user_id)
Then the evaluation can be run with:
```shell script
cd training/
python3 with seed=1057386
```
The results will be saved in the same directory as `model_load_path`.

### Config

Some paths need to be set, e.g. to the location where you want to store the spleeter model which
is used for separating the sources (it will be downloaded when used the first time).
Copy `` to `` and set your paths there. `` is in `.gitignore`
such that each user has their own config without overwriting the others.

### Demo
## Demo

You can look at a demonstration using the `streamlit` app.
It has to be run from the `ajures` directory.
It has to be run from the `lemons` root directory.

streamlit run explanations/
streamlit run explanations/

## Audio-based Recommender System - Model and Training details
### Input
For training on the Million Song Dataset, we use snippets from 7digital. Snippet durations range from 30s to 60s.
Audio is downsampled to 16kHz and converted to decibel-scaled mel-spectrograms. We use 256 mel bins with a hop size of 512. During training only, we use a randomly selected 1s excerpt of each snippet, which gives an input shape of 256x63.
## Experiments & Results

### Model
At the beginning of our model, we carry out batch normalization.
Afterwards, the inputs go through 5 layers of convolutions. Each convolution is followed by another batch normalization, ReLU, and max pooling.
The numbers of channels for the convolutions are: 1 -> 64 -> 128 -> 128 -> 128 -> 64. Each max pooling halves the width and height.
In the last layers, we perform global average pooling and global max pooling. The two outputs are then combined and passed through dropout and a fully connected layer, which outputs the logit relevance for the track.
We split the tracks into train, validation, and test sets in an 80-10-10 fashion and select the model that achieves the best results in terms of AUC and MAP on the validation set. The results on the test set, averaged across the users, are 0.734±0.130 MAP and 0.758±0.113 AUC.
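
For readers unfamiliar with the two metrics, the sketch below shows one common way to compute them per user and then average across users. It is a plain-Python illustration under stated assumptions, not the evaluation code used for the reported numbers: the function names and the toy labels and scores are made up, and ties in AUC are counted as 0.5.

```python
from statistics import mean


def average_precision(labels, scores):
    """AP for one user: mean of precision@rank taken at the rank of each relevant track."""
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits, precisions = 0, []
    for rank, (_, label) in enumerate(ranked, start=1):
        if label:
            hits += 1
            precisions.append(hits / rank)
    return mean(precisions) if precisions else 0.0


def roc_auc(labels, scores):
    """AUC as the fraction of (relevant, non-relevant) pairs ranked correctly."""
    pos = [s for s, l in zip(scores, labels) if l]
    neg = [s for s, l in zip(scores, labels) if not l]
    if not pos or not neg:
        raise ValueError("AUC needs at least one relevant and one non-relevant track")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data, invented for illustration: (labels, scores) per user model.
per_user = {
    "user_a": ([1, 0, 1, 0], [0.9, 0.8, 0.7, 0.1]),
    "user_b": ([0, 1, 1, 0], [0.2, 0.6, 0.9, 0.4]),
}
map_score = mean(average_precision(l, s) for l, s in per_user.values())
mean_auc = mean(roc_auc(l, s) for l, s in per_user.values())
print(f"MAP={map_score:.3f}  AUC={mean_auc:.3f}")
```

Averaging per-user values, rather than pooling all predictions, matches the one-model-per-user setup, where scores from different models are not directly comparable.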

### Training
We use a batch size of 20 and train for 1000 epochs with a learning rate of 1e-3, weight decay of 1e-4, and Adam optimizer.
We train a total of 7 models, one for each user.
### Stability of explanations
We select the number of samples in the neighborhood N_s to get stable explanations by following the procedure described in [[Mishra 2020] Reliable Local Explanations for Machine Listening]( In this experiment, the computation of the explanations is repeated 5 times, and each time the top k=3 interpretable components are recorded. With increasing number of samples N_s the number of unique components U_n should approach k (in our case: 3). We found that a number of N_s=2^11=2048 suffices to compute stable explanations in a reasonable amount of time.

### Validation and Testing
For evaluation, we use the whole track as input.

## audioLIME
We select the number of samples N_s to get stable explanations by following the procedure described in [Mishra 2020]. Preliminary experiments on a subset of test examples (50 per user) showed that N_s=2^11 suffices.
Each violin represents the results for one user model for a subset of the test set (50 examples). Each data point in a violin shows how many unique components U_n (shown on the x-axis) were selected when repeating computation of the explanation for a test sample 5 times. The y-axis shows the number of neighborhood examples N_s that was used for training the explainer in each case. The figure shows that increasing N_s decreases U_n, on average. This means that for example for Sandra (purple), using `N_s=2048` and repeatedly computing an explanation consisting of 3 components for the same track will result in the same 3 components being picked (for a majority of the test songs).
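
The stability measure described above (repeat the explanation 5 times, keep the top k=3 components each time, count the distinct components U_n) can be sketched as follows. The toy explainer is purely hypothetical and does not call audioLIME; it only mimics an estimator whose noise shrinks as the number of neighborhood samples N_s grows.

```python
import random


def unique_component_count(explain_top_k, n_repeats=5):
    """Run the explainer n_repeats times and count distinct selected components (U_n)."""
    selected = set()
    for _ in range(n_repeats):
        selected.update(explain_top_k())
    return len(selected)


def make_toy_explainer(n_samples, k=3, n_components=8, seed=0):
    """Hypothetical stand-in for an explainer; noise decreases with n_samples."""
    rng = random.Random(seed)
    true_weights = [n_components - i for i in range(n_components)]

    def explain_top_k():
        noise = 5.0 / n_samples ** 0.5
        estimates = [w + rng.gauss(0.0, noise) for w in true_weights]
        order = sorted(range(n_components), key=lambda i: estimates[i], reverse=True)
        return order[:k]

    return explain_top_k


for n_samples in (2 ** 5, 2 ** 8, 2 ** 11):
    u_n = unique_component_count(make_toy_explainer(n_samples))
    print(f"N_s={n_samples:5d}  U_n={u_n}")
```

A fully stable explainer returns the same k components on every run, so U_n equals k; any value above k means that repeated runs disagreed on at least one component.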
Binary file added imgs/landing_page.png
Binary file added imgs/stability.png