iamusica_demo

(demo screenshot)

This repository hosts a highly configurable and extensible desktop graphical application to perform real-time detection of piano note onsets and velocities, developed in the context of the IAMúsica project. Specifically, this document covers:

  • Installation of required software dependencies
  • Configuring and running the demo
  • Main components
  • Usage examples

For the detections, a pre-trained deep learning model is used (provided here). See the companion repository as well as our paper for details about training and evaluating the model, and consider citing our work if this demo is useful to you:

@inproceedings{onsvel,
      title={{Onsets and Velocities}: Affordable Real-Time Piano Transcription Using Convolutional Neural Networks},
      author={Andres Fernandez},
      year={2023},
      booktitle={{EUSIPCO} Proceedings},
}

⚠️ Note that the detector has been trained on modern piano recordings, and its performance may suffer for other instruments and in contaminated/noisy environments. To help maintain detection quality, the demo exposes a series of tunable parameters (more details below).

(IAMúsica logo, IEB logo)

IAMúsica was supported by research grant 389062, INV-23/2021 from the Institut d'Estudis Baleàrics, and is composed of:

This is Free/Libre and Open Source Software; see the LICENSE for more details! In particular, we welcome improvement suggestions, bug reports, and any other kind of feedback and contributions through the issues.


Software dependencies

We use PyTorch for the backend and PySide2 (a Python wrapper for Qt 5.12+) for the frontend. The following instructions should allow you to create a working environment from scratch, with all required dependencies (tested on Ubuntu 20.04 with conda 4.13.0):

# create and activate conda venv
conda create -n iamusica_demo python==3.9
conda activate iamusica_demo

# conda dependencies
conda install pytorch==1.11.0 torchvision==0.12.0 torchaudio==0.11.0 -c pytorch
conda install pyside2==5.13.2 -c conda-forge
conda install pyaudio==0.2.11 -c anaconda
conda install gstreamer==1.18.5 gst-plugins-base==1.18.5 gst-plugins-good==1.18.5 gst-plugins-good-pulseaudio==1.18.4 -c conda-forge -c dtk-forge
conda install omegaconf==2.1.2 -c conda-forge
conda install pandas==1.4.2 -c anaconda
conda install h5py==3.6.0 -c anaconda
conda install matplotlib==3.4.3 -c conda-forge
conda install scipy==1.8.0 -c conda-forge

# pip dependencies
pip install qdarkstyle==3.1
pip install pytz==2021.3
pip install coloredlogs==15.0.1

See the full requirements for a comprehensive list.

Note: after less than a year, some dependencies have already become heavily inconsistent with each other. Until a more consistent installation process is available, here are some workarounds that can be tried:

  • Postpone the pyside2 line until the very end of the conda installs, and run it with the --no-deps flag, i.e. conda install --no-deps pyside2==5.13.2 -c conda-forge
  • Install matplotlib via pip.
  • If omegaconf gives you trouble, try adding pip install antlr4-python3-runtime==4.8
  • If you have issues playing audio, check this post
    • Also: sudo apt-get install libqt5multimedia5-plugins may help

Running the demo

The demo stores and loads sessions in the form of HDF5 files inside a workspace. To start the demo, simply issue the following command:

python -m iamusica_demo
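Since sessions are plain HDF5 files, they can also be inspected outside of the app using h5py (installed above). The following is a minimal sketch; the file path and dataset layout are hypothetical, since the actual session structure is defined by the app:

# Minimal sketch: walk a session HDF5 file and print its datasets.
# The path below is a hypothetical example.
import h5py

with h5py.File("my_workspace/my_session.h5", "r") as f:
    def show(name, obj):
        if isinstance(obj, h5py.Dataset):
            print(name, obj.shape, obj.dtype)
    f.visititems(show)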

The following detection settings yield good results both when loading fairly modern recordings and when running live from a laptop, using the built-in 16kHz microphone at about 2m distance from the piano:

name        value            description
OVIF        (100, 30, 100)   onset+velocity inference frame sizes (left pad, detection, right pad)
PTHRESH     0.75             probability threshold for onset detection
MEL_OFFSET  -12              audio intensity offset, in decibels

They can be set at startup as follows:

python -m iamusica_demo OVIF="[100, 30, 100]" PTHRESH=0.75 MEL_OFFSET=-12

The currently available configurations can be inspected as follows:

python -c "from iamusica_demo.__main__ import ConfDef; print(ConfDef.__doc__)"

Main components

(demo frontend description)

number  name                      description
1       Toolbar                   Menus to manage sessions and get help
2       Logmel-spectrogram area   Displays the processed audio (x=time, y=frequency)
3       Piano roll area           Displays the detections (aligned with the spectrogram)
4       View settings             Change the spectrogram and piano roll appearance
5       Detection parameters      Adjust how the detection input/output is processed
6       Audio player              Plays the audio shown in the spectrogram
7       Mouse selection           Click-and-drag on the piano roll to select notes
8       Vertical grid             Left double-click on the piano roll sets the grid, right double-click removes it
9       Velocity histogram        Velocity histogram for the current note selection
10      Selection statistics      Statistics for the current note selection
  • See Help->Keyboard shortcuts for convenience hotkeys to control e.g. the audio player and view settings.
  • Threshold: Set this to a higher number if the detection contains false positives, and to a lower one if it misses notes (false negatives).
  • Left pad, detection chunks, right pad: The detector is a "sliding window" whose total size in time frames is the sum of these 3 values. The actual detections are performed in consecutive blocks of size detection_chunks, but context on both the left and right sides is important to avoid boundary artifacts; this is provided by the left and right pads. Smaller window sizes lead to lower latency, but potentially worse detection. A configuration of (100, 30, 100) delivers good results with reasonably low latency (about 5.5 seconds, at ~42 frames per second); see the sketch after this list.
  • dB gain: This can be calibrated to ensure that the velocity detection is within the desired range. The detector seems to be fairly robust to this parameter.
  • Tuning shift: "Vertical shift" of the spectrogram, in number of frequency bins. I.e. a positive number will shift the spectrogram upwards, causing the detector to perceive the input as higher-pitched. This parameter is experimental.
  • Statistics: If piano grid is active when notes are selected, the statistics will also show grid alignment.
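To make the windowing concrete, here is a schematic (not the actual implementation) of how such a padded sliding window could step over a log-mel spectrogram. The model call and spectrogram layout are placeholders; only the windowing arithmetic follows the description above:

# Schematic sliding-window inference over spectrogram time frames.
import numpy as np

FPS = 42                            # approx. spectrogram frames per second
LPAD, CHUNK, RPAD = 100, 30, 100    # OVIF = (left pad, detection, right pad)
print("window latency: %.1f s" % ((LPAD + CHUNK + RPAD) / FPS))  # ~5.5s

def sliding_detect(logmel, model):
    """Detect on consecutive chunks of CHUNK frames, padded with context."""
    num_frames = logmel.shape[1]    # logmel has shape (mel_bins, time_frames)
    results = []
    for beg in range(LPAD, num_frames - CHUNK - RPAD + 1, CHUNK):
        window = logmel[:, beg - LPAD : beg + CHUNK + RPAD]
        out = model(window)                          # onset+velocity maps
        results.append(out[:, LPAD : LPAD + CHUNK])  # keep only the center
    return np.concatenate(results, axis=1) if results else None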

Usage examples

1. Creating a new session and loading+exploring a file

  1. Start the app, e.g. via python -m iamusica_demo OVIF="[100, 30, 100]" PTHRESH=0.75 MEL_OFFSET=-12.
  2. In the menu bar, select Edit->Create session to open a file browser, and provide a workspace location where the newly created session will be stored.
  3. At the bottom right, click on Append Audiofile and browse to select a file with a piano recording. Upon confirmation, the detector should run on the file with the currently active parameters, and the spectrogram and piano roll will be populated with the file contents.
  4. To explore the file, the spectrogram and piano roll windows can be moved horizontally and stretched vertically. Horizontal zoom is possible through the Display width parameter. The audio player will also now be able to play the loaded file.
  5. Once done, close the window (session changes are saved automatically).

2. Opening a previous session and extending it with recordings

  1. Start the app as before, and open the previously created session via Edit->Open session (select the specific session folder inside the workspace). The previously loaded and processed file should reappear on the display.
  2. At the bottom right, click on Record Audio. Shortly after, the spectrogram and piano roll should start updating with the audio being recorded in real time. Detection parameters can also be changed during recording, which is useful to tune the detector for a specific setting. When done, click on Stop Recording.

3. Analyzing recorded fragments

The piano roll displays the detected note intensities by color, but we can get a more precise reading by selecting notes via click-and-drag. The velocities will show up as a histogram in the bottom left corner. Furthermore, velocity statistics for the selection will also be shown to the right of the histogram.

The user can also create a vertical grid via left mouse double-click. This will create a thick vertical line at the clicked location, and a grid extending to both sides with the width specified by Grid width in the view settings panel. The width can be changed at any point, and the grid can be removed via right mouse double-click.

If notes are selected when the grid is present, some alignment-related statistics will also be shown at the right of the histogram.

The velocity analysis tool can be used e.g. to compare the velocity profiles of different performances of the same fragment. The grid analysis can be used e.g. to quantify rubato or assess timing precision.
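As a rough illustration of the statistics involved, the sketch below computes a velocity histogram and summary statistics for a hypothetical selection of notes (the demo computes these internally; the data here is a placeholder):

# Sketch: histogram + summary statistics for a selection of velocities.
import numpy as np

velocities = np.array([55, 60, 63, 70, 72, 75, 80])  # placeholder selection
hist, edges = np.histogram(velocities, bins=16, range=(0, 127))
print("histogram:", hist)
print("mean=%.1f std=%.1f min=%d max=%d" % (
    velocities.mean(), velocities.std(), velocities.min(), velocities.max()))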
