Learning to Listen, Read, and Follow:
Score Following as a Reinforcement Learning Game
This repository contains the corresponding code for our paper:
Dorfer M., Henkel F., and Widmer G.
"Learning to Listen, Read, and Follow: Score Following as a Reinforcement Learning Game".
In Proceedings of the 19th International Society for Music Information Retrieval Conference, 2018
Score following is the process of tracking a musical performance (audio) with respect to a known symbolic representation (a score). The figure below shows a sketch of this task.
In our paper we formulate score following as a multimodal Markov Decision Process (MDP), the mathematical foundation for sequential decision making. Given this formal definition, we can address the score following task with state-of-the-art reinforcement learning (RL) algorithms. In particular, the goal is to design multimodal RL agents that simultaneously learn to listen to music, read the scores from images of sheet music, and follow the audio along in the sheet, in an end-to-end fashion. All this behavior should be learned entirely from scratch, based on a weak and potentially delayed reward signal that indicates to the agent how close it is to the correct position in the score.
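To make the MDP formulation a bit more concrete, here is a minimal, purely illustrative Python sketch. All names, shapes, and constants (the action set, the reward normalization) are our own assumptions for exposition, not the repository's actual implementation.

class ScoreFollowingMDP:
    """State: (audio spectrogram excerpt, sheet image window around the tracker)."""

    ACTIONS = (-0.5, 0.0, +0.5)  # assumed: decrease / keep / increase pixel speed

    def __init__(self, spectrogram, sheet_image, true_positions):
        self.spectrogram = spectrogram        # audio modality (frames x bins)
        self.sheet_image = sheet_image        # score modality (pixels)
        self.true_positions = true_positions  # ground-truth x-position per frame
        self.t, self.x, self.speed = 0, float(true_positions[0]), 0.0

    def step(self, action):
        self.speed += self.ACTIONS[action]
        self.x += self.speed                  # move the tracker along the sheet
        self.t += 1
        # weak reward signal: highest at the true position, decaying with distance
        error = abs(self.x - self.true_positions[self.t])
        reward = max(0.0, 1.0 - error / 100.0)  # assumed normalization constant
        observation = (self.spectrogram[self.t], self.sheet_image)  # multimodal
        done = self.t == len(self.true_positions) - 1 or reward == 0.0
        return observation, reward, done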
We would also like to emphasise a few aspects of the score following game that make it, besides a nice MIR application, also an interesting machine learning problem:
- It is a reinforcement learning problem with a training, validation and test set. This allows us to investigate the generalization abilities of reinforcement learning algorithms and network architectures. The ultimate goal is score following agents that generalize to completely unseen musical pieces (scores) as well as unseen audio conditions.
- It is a multimodal representation learning problem. This could open the door to exploring concepts such as Deep Canonical Correlation Analysis (DCCA), which have already worked well for other multimodal applications.
- It involves interaction with humans. Ideally we end up with agents that are general enough to follow human performers.
Before we can start working with the score following game, we first need to set up a few things:
Setup and Requirements
For a list of required Python packages see requirements.txt, or just install them all at once using pip:
pip install -r requirements.txt
If you are an anaconda user, we also provide an anaconda environment file which can be installed as follows:
conda env create -f environment.yml
Next, clone the project from github.
git clone git@github.com:CPJKU/score_following_game.git
To install the score_following_game package in develop mode, run
python setup.py develop --user
in the root folder of the package.
This is what we recommend, especially if you want to try out new ideas.
Software Synthesizer - FluidSynth
Make sure that you have fluidsynth available on your system. We need it to synthesise the audio from MIDI. Synthesising the audio and computing the spectrograms will take a while, but this is done only once, when training an agent for the first time.
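To verify that fluidsynth is installed, a small check along these lines can help (the soundfont and file names below are placeholders; the repository's data preparation performs the actual synthesis for you):

import shutil
import subprocess

if shutil.which("fluidsynth") is None:
    raise RuntimeError("fluidsynth not found -- please install it first.")

# render a MIDI file to audio from the command line (placeholder paths)
subprocess.run(["fluidsynth", "-ni", "soundfont.sf2", "piece.mid",
                "-F", "piece.wav", "-r", "22050"], check=True)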
Check if the Score Following Game works
Once you have installed all packages and downloaded the data, everything should be ready to train the models. To check if the game works properly on your system, you can run the following script and play the game on your own.
python environment_test.py --data_set test_sample --agent_type human --game_config game_configs/nottingham.yaml
With the key "w" you can increase the agent's pixel progression speed; with "s" you can decrease it. Remember, the goal is to collect as much reward as possible.
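Conceptually, the human agent maps these keys to the same kind of speed adjustments an RL agent would take as actions. A hypothetical sketch (the step size and function name are ours, not the repo's API):

PIXEL_SPEED_DELTA = 0.5  # assumed step size, for illustration only

def on_key(key, speed):
    """Map keyboard input to a change of the pixel progression speed."""
    if key == "w":
        return speed + PIXEL_SPEED_DELTA  # progress faster through the sheet
    if key == "s":
        return speed - PIXEL_SPEED_DELTA  # slow down
    return speed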
Important: If you are a Mac user, you have to add your terminal application (e.g. iTerm) to the list of accessibility input devices (System Preferences -> Security & Privacy -> Accessibility).
If you don't want to play on your own, you can also run an optimal agent.
python environment_test.py --data_set test_sample --agent_type optimal --game_config game_configs/nottingham_continuous.yaml
Important: This won't work on a server without an X window system, so ideally you check it on a desktop machine. The same holds if you want to visually inspect the performance of your trained agents; in that case you will also need a CUDA-capable GPU in your desktop system.
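If you are unsure whether your machine can render the game window, a quick check like the following (Linux/X11 only; a simplification) can save you a failed run:

import os

if not os.environ.get("DISPLAY"):
    print("No X display found -- run the visual checks on a desktop machine.")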
Data Preparation for Training an Agent
Our experiments and the score following game are based on the Nottingham database and the MSMD dataset. For the score following game we won't need the entire content delivered with these datasets, so we provide a preprocessed version of both, ready for download.
The easiest way to get started is to simply run the following script which will automatically download and prepare the data set for you.
python prepare_game_data.py --destination <PATH-TO-DATA-ROOT>
If automatically downloading and preparing the data fails for any reason, just download it manually from here and extract it to your desired data path.
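Judging from the paths used in the training and evaluation commands below, the extracted data should end up in a layout like this:

<PATH-TO-DATA-ROOT>/score_following_game_data/
    nottingham/
        nottingham_train/
        nottingham_valid/
        nottingham_test/
    msmd_all/
        msmd_all_train/
        msmd_all_valid/
        msmd_all_test/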
Note: For simplicity we only use one soundfont for synthesizing the audios in this repository.
Training an Agent
To train a model on a specific data set with a given learning algorithm and network architecture, you can start with our suggested commands below.
Note: Our sf_experiment.py training script has a very verbose command line, and we rely on the default parametrization for the following example calls. If you want to try different agent configurations, please run
python sf_experiment.py --help
to learn more.
Train A2C Agent on Nottingham (monophonic):
python sf_experiment.py --net ScoreFollowingNet --train_set <PATH-TO-DATA-ROOT>/score_following_game_data/nottingham/nottingham_train --eval_set <PATH-TO-DATA-ROOT>/score_following_game_data/nottingham/nottingham_valid --game_config game_configs/nottingham.yaml --log_root <LOG-ROOT>/runs_ISMIR18 --param_root <LOG-ROOT>/params_ISMIR18
Train A2C Agent on MSMD (polyphonic):
python sf_experiment.py --net ScoreFollowingNetMSMDLCHSDeepDo --train_set <PATH-TO-DATA-ROOT>/score_following_game_data/msmd_all/msmd_all_train --eval_set <PATH-TO-DATA-ROOT>/score_following_game_data/msmd_all/msmd_all_valid --game_config game_configs/mutopia_lchs1.yaml --log_root <LOG-ROOT>/runs_ISMIR18 --param_root <LOG-ROOT>/params_ISMIR18
We use a TensorBoard port for PyTorch to watch our training progress. If you have all packages installed, you can start it with:
tensorboard --logdir <LOG-ROOT>/runs_ISMIR18
Once tensorboard is running you should be able to view it in your browser at http://localhost:6006.
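If you want to log additional quantities from your own experiments, a minimal sketch using tensorboardX (one such TensorBoard port for PyTorch; whether the repository uses exactly this package is our assumption) looks like:

from tensorboardX import SummaryWriter

writer = SummaryWriter(log_dir="<LOG-ROOT>/runs_ISMIR18/my_experiment")
for step in range(100):
    writer.add_scalar("train/episode_reward", step * 0.01, step)  # dummy values
writer.close()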
Evaluating an Agent
To investigate the performance of your trained agents you have the following two options:
An Audio-Visual Quality Check
To get an intuition of how well your agent works, you can visualize its performance on a particular piece. Just run the following command and you will get a rendering similar to the one provided in the YouTube video above.
python test_score_following.py --params <LOG-ROOT>/params_ISMIR18/<run_id>/best_model.pt --data_set <PATH-TO-DATA-ROOT>/score_following_game_data/nottingham/nottingham_test
If you do not specify a particular piece with the --piece parameter, a random piece is selected from the provided split folder.
Computing the Numbers
To compute the performance measures over the entire training, validation or test set you can run the following command.
python eval_score_following.py --trials 1 --params <LOG-ROOT>/params_ISMIR18/<run_id>/best_model.pt --data_set <PATH-TO-DATA-ROOT>/score_following_game_data/nottingham/nottingham_test
This should produce output like the following, reporting the ratio of correctly tracked onsets for each test piece:
reels_simple_chords_53 tracking ratio: 0.57
reels_simple_chords_203 tracking ratio: 1.00 +
jigs_simple_chords_29 tracking ratio: 1.00 +
...
Recall that the agents follow a stochastic policy, so we recommend increasing the number of evaluation trials (for example to 10, as in our paper) to get more robust estimates.
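The variation between trials comes from the agent sampling its actions rather than acting greedily. A toy sketch of why averaging helps (all numbers here are made up for illustration):

import numpy as np

rng = np.random.default_rng()
action_probs = np.array([0.1, 0.6, 0.3])                # hypothetical policy output
action = rng.choice(len(action_probs), p=action_probs)  # differs per trial

# averaging per-trial tracking ratios yields a more robust estimate
trial_ratios = [0.74, 0.77, 0.75]                       # made-up numbers
print(f"mean tracking ratio: {np.mean(trial_ratios):.2f}")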
Trying out a Pre-trained Agent
If you would like to try out a pre-trained agent, here is a recipe for how to do it. We assume here that the data is already set up as explained above.
- download the pre-trained parameters from here.
- run the command below.
- optional: change the number of evaluation trials in the command below to 1 for a quick check. Run the full 10 evaluations to see how stable the model is.
python eval_score_following.py --trials 10 --params <PATH-TO-PRETRAINEDMODEL>.pt --data_set <PATH-TO-DATA-ROOT>/score_following_game_data/msmd_all/msmd_all_test --game_config game_configs/mutopia_lchs1.yaml --net ScoreFollowingNetMSMDLCHSDeepDoLight
You should get the following output:
& 0.75 & 0.74 & 19.32 & 23.45 \\
& 0.73 & 0.73 & 18.89 & 23.17 \\
& 0.74 & 0.75 & 19.27 & 23.62 \\
& 0.77 & 0.77 & 18.86 & 22.67 \\
& 0.77 & 0.76 & 19.27 & 23.46 \\
& 0.75 & 0.76 & 19.88 & 24.79 \\
& 0.77 & 0.74 & 19.06 & 22.64 \\
& 0.76 & 0.77 & 18.73 & 22.63 \\
& 0.76 & 0.73 & 19.56 & 24.25 \\
& 0.75 & 0.75 & 19.21 & 23.43 \\
--------------------------------
& 0.76 & 0.75 & 19.21 & 23.41 \\
The last row is the average over all 10 evaluation trials.
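For example, averaging the first column gives (0.75 + 0.73 + 0.74 + 0.77 + 0.77 + 0.75 + 0.77 + 0.76 + 0.76 + 0.75) / 10 = 0.755, which matches the reported 0.76 up to rounding.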
If you want to see how the pre-trained model performs on a single piece run the following command (this will show a video and requires a graphical output):
python test_score_following.py --params <PATH-TO-PRETRAINEDMODEL>.pt --data_set <PATH-TO-DATA-ROOT>/score_following_game_data/msmd_all/msmd_all_test --game_config game_configs/mutopia_lchs1.yaml --net ScoreFollowingNetMSMDLCHSDeepDoLight --piece BachJS__BWV117a__BWV-117a
Note: This is one of the pieces where the model fails from time to time. We selected this example to show the stochastic behaviour of a policy gradient agent.