
Repository for "Natural Language Processing Methods for Symbolic Music Generation and Information Retrieval Systems: a Survey"


dinhviettoanle/survey-music-nlp


Natural Language Processing Methods for Symbolic Music Generation and Information Retrieval Systems: a Survey

arXiv preprint

This repository contains the tables accompanying the paper "Natural Language Processing Methods for Symbolic Music Generation and Information Retrieval Systems: a Survey" by Dinh-Viet-Toan Le, Louis Bigo, Mikaela Keller and Dorien Herremans.

Representations

Event-based tokenization

Elementary tokens

| Tokenization | Score-based / Performance-based | Alphabet | Grouping | Vocab. size | Data |
|---|---|---|---|---|---|
| ABC notation | Score | Text alphabet | Bar patching | N/A | Monophonic |
| MIDI-like (2018) | Performance | <Note-ON> (MIDI value); <Note-OFF> (MIDI value); <Time-shift> (absolute time); <Velocity> (integer) | BPE, BPE Unigram | 388 | Piano |
| LakhNES (2019) | Performance | <Note-ON-[Trk]> (MIDI value); <Note-OFF-[Trk]> (MIDI value); <Time-shift> (absolute time) | - | 630 | Multi-track |
| REMI (2020) | Score | <Pitch> (MIDI value); <Duration> (music time); <Velocity> (integer); <Bar>; <Position> (music time); <Chord> (class) | BPE, BPE, BPE Unigram | 332 | Piano |
| REMI+ (2022) | Score | REMI alphabet + features: <Instrument> (class); <Time-Signature> (class); <Tempo> (integer) | - | N/A | Multi-track |
| Lee et al. (2022) (ComMU) | Score | REMI alphabet + metadata: <BPM> (integer); <Key> (class); <Instrument> (class); <Time-Signature> (class); <Pitch-range> (class); <Number-of-measures> (number); <Min-velocity> (integer); <Max-velocity> (integer); <Rhythm> (class) | - | 728 | Multi-track |
| MusIAC (2022) | Score | REMI alphabet + control info: <Tensile-strain> (class); <Cloud-diameter> (class); <Density> (class); <Polyphony> (class); <Occupation> (class) | - | 360 | Multi-track |
| Gover et al. (2022) | Score | <Pitch> (MIDI value); <Duration> (music time); <Position> (music time); <Bar>; <Hand> (class) | - | N/A | Piano |
| Wu & Yang (2023) (MuseMorphose) | Score | <Pitch-[Trk]> (MIDI value); <Duration-[Trk]> (music time); <Velocity-[Trk]> (integer); <Bar>; <Position> (music time); <Tempo> (integer) | - | 3440 | Multi-track |
| MultiTrack (2020) (MMM) | Performance | <Start-piece>; <Start-track>/<End-track>; <Start-bar>/<End-bar>; <Start-fill>/<End-fill>; <Note-ON> (MIDI value); <Note-OFF> (MIDI value); <Time-shift> (absolute time); <Instrument> (class); <Density-level> (integer) | - | 440 | Multi-track |
| MMR (2022) (SymphonyNet) | Score | <Start-score>/<End-score>; <Start-bar>/<End-bar>; <Chord> (class); <Change-track>; <Position> (integer); <Pitch> (MIDI value); <Duration> (music time) | BPE | N/A | Multi-track |
| TSD (2023) | Performance | <Pitch> (MIDI value); <Velocity> (integer); <Duration> (absolute time); <Time-shift> (absolute time); <Rest> (absolute time); <Program> (class) | BPE | 249 | Multi-track |
| Structured (2021) | Performance | <Pitch> (MIDI value); <Velocity> (integer); <Duration> (absolute time); <Time-shift> (absolute time) | - | 428 | Piano |
| Chen et al. (2020) | Score (tabs) | <Pitch> (MIDI value); <Duration> (music time); <Velocity> (integer); <Position> (music time); <Bar> (integer); <String> (integer); <Fret> (integer); <Technique> (class); <Grooving> (class) | - | 231 | Guitar |
| Li et al. (2023) | Score | <Pitch-class> (class); <Octave> (integer); <Duration> (music time); <Bar> (integer); <Position> (music time); <Velocity> (integer) | - | N/A | Monophonic |
| DadaGP (2021) | Score (tabs) | <Start>/<End>; <Instrument:note> (class); <String> (integer); <Fret> (integer); <Drums:note> (MIDI value); <Effect> (class); <Wait> (integer) | BPE Unigram | 2140 | Guitar |
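
The contrast between performance-based and score-based alphabets is easiest to see on a concrete input. The sketch below encodes the same three notes with a simplified MIDI-like alphabet and a simplified REMI alphabet. It illustrates the two schemes under stated assumptions and is not the reference implementation of either one. The `Note` type, the tick grid (4 ticks per beat, 16 positions per 4/4 bar), the token spellings and the order of tokens within a note are hypothetical choices. For simplicity, both encodings use the same ticks, whereas the original MIDI-like scheme uses absolute time steps. Real tokenizers also quantize velocities into bins and split long time-shifts.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Note:
    pitch: int     # MIDI value (0-127)
    start: int     # onset in ticks (assumption: 4 ticks per beat)
    end: int       # offset in ticks
    velocity: int  # MIDI velocity (1-127)


def midi_like(notes):
    """Performance-style tokens: Note-ON/Note-OFF events separated by Time-shifts."""
    # (time, is_on, pitch, velocity); Note-OFF (is_on=0) sorts before Note-ON at the same time
    events = [(n.start, 1, n.pitch, n.velocity) for n in notes]
    events += [(n.end, 0, n.pitch, None) for n in notes]
    events.sort(key=lambda e: e[:3])
    tokens, now, velocity = [], 0, None
    for time, is_on, pitch, vel in events:
        if time > now:
            tokens.append(f"Time-shift_{time - now}")
            now = time
        if is_on:
            if vel != velocity:  # a Velocity token applies to the following Note-ONs
                tokens.append(f"Velocity_{vel}")
                velocity = vel
            tokens.append(f"Note-ON_{pitch}")
        else:
            tokens.append(f"Note-OFF_{pitch}")
    return tokens


def remi(notes, positions_per_bar=16):
    """Score-style tokens: Bar/Position markers, then Pitch, Duration, Velocity per note."""
    tokens, bar, position = [], -1, None
    for n in sorted(notes, key=lambda n: (n.start, n.pitch)):
        note_bar, note_pos = divmod(n.start, positions_per_bar)
        while bar < note_bar:  # one Bar token per bar reached, empty bars included
            tokens.append("Bar")
            bar, position = bar + 1, None
        if note_pos != position:
            tokens.append(f"Position_{note_pos}")
            position = note_pos
        tokens += [f"Pitch_{n.pitch}", f"Duration_{n.end - n.start}", f"Velocity_{n.velocity}"]
    return tokens


# Two notes of a chord in bar 1, then one note at the start of bar 2.
notes = [Note(60, 0, 4, 80), Note(64, 0, 8, 80), Note(67, 16, 20, 90)]
print(midi_like(notes))
print(remi(notes))
```

The MIDI-like sequence records when keys are pressed and released. Simultaneity and duration are therefore only implicit in the Time-shift arithmetic. The REMI sequence places every note on an explicit metrical grid (Bar, Position) and stores its Duration directly, which corresponds to the "music time" values in the table.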

Composite tokens

| Tokenization | Musical features | Embedded object | Data |
|---|---|---|---|
| Luo et al. (2020) (MG-VAE) | <Pitch> (class); <Interval> (number); <Rhythm> (class) | 3-long vector | Monophonic |
| Zhang (2020) | <Program> (class); <Pitch> (integer); <Velocity> (integer) | 3-long vector + Time-shift | Multi-track |
| PiRhDy (2020) | <Chroma> (class); <Octave> (integer); <Inter-onset-interval> (music time); <Note-state> (class); <Velocity> (integer) | 5-long vector | Multi-track |
| Zixun et al. (2021) | <Pitch> (one-hot); <Duration> (one-hot); <Current-chord> (one-hot); <Next-chord> (one-hot); <Bar> (one-hot) | 246-long vector | Lead sheet |
| Octuple (2021) (MusicBERT) | <Time-signature> (class); <Tempo> (integer); <Bar> (integer); <Position> (music time); <Instrument> (class); <Pitch> (MIDI value); <Duration> (music time); <Velocity> (integer) | 8-long vector | Multi-track |
| Dong et al. (2023) (MMT) | <Type> (class); <Beat> (integer); <Position> (music time); <Pitch> (MIDI value); <Duration> (music time); <Instrument> (class) | 6-long vector | Multi-track |
| Dalmazzo et al. (2023) (Chordinator) | <Chord-root> (class); <Chord-nature> (class); <Chord-extensions> (class); <MIDI-array> (multi-hot); <Slash-chord> (boolean) | 8-long vector | Chord sequences |
| Wang et al. (2021) (MuseBERT) | <Onset> (music time); <Pitch> (MIDI value); <Duration> (music time); + factorized properties | Matrices of factorized attributes and relations | Multi-track |
| MuMIDI (2020) | <Bar>; <Position> (music time); <Tempo> (integer); <Track> (class); <Chord> (class); <Pitch> (MIDI value); <Drum> (MIDI value); <Velocity> (integer); <Duration> (music time) | Note / Event grouping | Multi-track |
| Compound Word (2021) | <Family> (class); <Time-signature> (class); <Bar> (integer); <Beat> (music time); <Chord> (class); <Tempo> (integer); <Pitch> (MIDI value); <Duration> (music time); <Velocity> (integer) | Note / Event grouping | Piano |
| Di et al. (2021) | <Type> (class); <Bar/beat> (integer); <Density> (class); <Strength> (class); <Instrument> (integer); <Pitch> (MIDI value); <Duration> (music time) | Note / Event grouping | Multi-track |
| Makris et al. (2022) | Encoder input: <Onset> (number), <Group> (class), <Type> (class), <Duration> (music time or none), <Value> (any, depends on type); Decoder output: <Onset> (number), <Drums> (integer) | Note / Event grouping | Encoder: Multi-track; Decoder: Drums |
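
Composite tokens bundle several musical features into a single sequence element. The sketch below follows the eight fields listed for Octuple. Every note becomes one 8-field tuple, each field gets its own vocabulary and embedding table, and the per-field embeddings are concatenated into one vector per note. The helper names, the random toy embedding tables and the embedding size are assumptions made for illustration. A trained model learns these tables and typically projects the concatenated vector to its hidden size; that step is not shown.

```python
import random
from typing import NamedTuple


class OctupleToken(NamedTuple):
    """One composite token per note, with the eight Octuple fields."""
    time_signature: str
    tempo: int
    bar: int
    position: int
    instrument: int
    pitch: int
    duration: int
    velocity: int


FIELDS = OctupleToken._fields


def build_vocabularies(tokens):
    """Hypothetical helper: a separate vocabulary (value -> index) for each field."""
    vocabs = {field: {} for field in FIELDS}
    for token in tokens:
        for field, value in zip(FIELDS, token):
            vocabs[field].setdefault(value, len(vocabs[field]))
    return vocabs


def random_tables(vocabs, dim, seed=0):
    """Toy stand-in for learned embedding tables: one dim-sized vector per vocabulary entry."""
    rng = random.Random(seed)
    return {f: [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in v] for f, v in vocabs.items()}


def embed(token, vocabs, tables):
    """Look up every field in its own table and concatenate the eight vectors."""
    vector = []
    for field, value in zip(FIELDS, token):
        vector.extend(tables[field][vocabs[field][value]])
    return vector


# Two notes of a chord played by instrument 0 at the first position of bar 0.
sequence = [
    OctupleToken("4/4", 120, 0, 0, 0, 60, 8, 80),
    OctupleToken("4/4", 120, 0, 0, 0, 64, 8, 80),
]
vocabs = build_vocabularies(sequence)
tables = random_tables(vocabs, dim=4)
print(len(sequence), len(embed(sequence[0], vocabs, tables)))  # 2 32 (8 fields x 4 dims)
```

Each note costs one sequence position instead of several, so sequences are several times shorter than with the elementary alphabets. The trade-off is that a generative model must predict several fields at every step.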

Models

Recurrent models

RNN

| Model | Code | Recurrent unit | Architecture | Data | Representation | Tasks |
|---|---|---|---|---|---|---|
| RNN-RBM (2012) | - | Vanilla RNN | RBM + RNN | Multi-track | Time-slice (piano roll) | Free generation |
| RNN-DBN (2014) | - | Vanilla RNN | RBM + DBN + RNN | Multi-track | Time-slice (piano roll) | Free generation |
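
Many of the recurrent models listed here use a time-slice (piano roll) representation rather than event tokens. Time is cut into fixed steps, and each step is a binary vector over the 128 MIDI pitches. Below is a minimal sketch. It assumes the notes are already quantized to integer steps and are given as hypothetical (pitch, start, end) triples with an exclusive end.

```python
def piano_roll(notes, num_steps, num_pitches=128):
    """Binary time-slice representation: roll[t][p] == 1 if pitch p sounds during step t."""
    roll = [[0] * num_pitches for _ in range(num_steps)]
    for pitch, start, end in notes:
        for t in range(start, min(end, num_steps)):
            roll[t][pitch] = 1
    return roll


# (MIDI pitch, start step, end step), end exclusive
notes = [(60, 0, 2), (64, 0, 4), (67, 2, 4)]
roll = piano_roll(notes, num_steps=4)
for t, frame in enumerate(roll):
    print(t, [p for p, on in enumerate(frame) if on])
# 0 [60, 64]
# 1 [60, 64]
# 2 [64, 67]
# 3 [64, 67]
```

With this plain binary encoding, a note re-struck on the next step cannot be told apart from a held note. Some models therefore add a separate onset channel.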

LSTM

| Model | Code | Recurrent unit | Architecture | Data | Representation | Tasks |
|---|---|---|---|---|---|---|
| Folk-RNN (2016) | Yes | LSTM | LSTM | Monophonic | ABC notation | Free generation |
| C-RNN-GAN (2016) | Yes | LSTM | GAN + Bi-LSTM | Multi-track | Pitch + duration + time-shift + velocity (composite tokens) | Free generation |
| Song from Pi (2016) | - | LSTM | Hierarchical + LSTM | Multi-track | Custom features (composite tokens) | Free generation (melody, chord, drum generation) |
| Melody / Attention-RNN (2016) | Yes | LSTM | LSTM (+ Attention) | Monophonic | Note-ON / Note-OFF | Priming |
| DeepBach (2017) | Yes | LSTM | Bi-LSTM | 4-part chorales | Time-slice-based | Harmonization; Free generation |
| Anticipation-RNN (2017) | Yes | LSTM | LSTM | Monophonic | Pitch + duration (time-slice-based) | Infilling |
| JamBot (2017) | Yes | LSTM | LSTM | Multi-track | Time-slice (piano roll) | Chord generation; Chord-conditioned generation |
| Note-RNN / RL Tuner (2017) | Yes | LSTM | LSTM (+ Reinforcement Learning) | Monophonic | Note-ON / Note-OFF | Free generation |
| PerformanceRNN (2018) | Yes | LSTM | LSTM | Piano | MIDI-like | Expressive performance generation |
| Chen et al. (2018) | Yes | LSTM | Bi-LSTM | Piano | Time-slice (piano roll) | Roman numeral analysis |
| StructureNet (2018) | - | LSTM | LSTM | Monophonic | Custom features (composite tokens) | Free generation |
| Music-VAE (2018) | Yes | LSTM | VAE + LSTM | Monophonic | MIDI-like | Samples interpolation; Free generation |
| JazzGAN (2018) | - | LSTM | GAN + LSTM | Lead sheet | Pitch + duration + chord (event-based) | Chord-conditioned generation |
| DeepJ (2018) | Yes | LSTM | Biaxial LSTM | Piano | Time-slice (piano roll) | Free generation; Style embedding analysis |
| Chen et al. (2019) | - | LSTM | Bi-LSTM | Lead sheet | Time-slice (piano roll) | Chord-conditioned generation |
| Makris et al. (2019) | - | LSTM | LSTM (Drums) / Feed-forward (Context) | Multi-track | Drums: event-based; Context: time-slice | Drums accompaniment generation |
| MahlerNet (2019) | Yes | LSTM | VAE + Bi-LSTM | Multi-track | Event-based | Samples interpolation |
| GrooVAE (2019) | Yes | LSTM | VAE + Bi-LSTM | Drums | Time-slice (drumroll) | Drum infilling; Tap2Drum; Humanization |
| Wu et al. (2019) | - | LSTM | Hierarchical + Bi-LSTM | Monophonic | Note-ON / Note-OFF | Structure-conditioned generation |
| VirtuosoNet (2019) | Yes | LSTM | Hierarchical + VAE + Bi-LSTM + Attention | Piano | Custom features (composite tokens) | Expressive performance generation |
| Amadeus (2019) | - | LSTM | Hierarchical + Reinforcement Learning | Piano | Pitch + duration (event-based) | Free generation |
| MuseAE (2020) | Yes | LSTM | Adversarial Auto-encoder + LSTM | Multi-track | Time-slice (piano roll) | Samples interpolation; Embedding analysis |
| Jin et al. (2020) | - | LSTM | LSTM + Reinforcement Learning | Multi-track | Time-slice (piano roll) | Free generation |
| GGA-MG (2020) | Yes | LSTM | Bi-LSTM + Genetic Algorithm | Monophonic | ABC notation | Free generation |
| Yu et al. (2021) | Yes | LSTM | GAN + LSTM | Monophonic | Pitch + duration (event-based) | Lyrics-conditioned generation |
| CM-HRNN (2021) | Yes | LSTM | Hierarchical + LSTM | Lead sheet | Pitch + duration + chord + bar (composite tokens) | Chord-conditioned generation |
| Keerti et al. (2022) | - | LSTM | Bi-LSTM + Attention | Monophonic | Pitch + duration (event-based) | Sequence reconstruction |
| LStoM (2022) | Yes | LSTM | Bi-LSTM | Multi-track | Custom features (event-based) | Melody extraction |
| Turker et al. (2022) | - | LSTM | VAE + LSTM | Piano | Note-ON / Note-OFF | Sequence reconstruction; Latent space analysis |
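
Several VAE-based models in these tables list samples interpolation as a task. Two sequences are encoded into latent vectors, and intermediate latent points are decoded back into music. The sketch below shows linear interpolation and spherical linear interpolation (slerp) between two latent vectors. The encoder and decoder are not shown, and the two-dimensional latents are toy values. Slerp is one common choice because it keeps intermediate points at a norm similar to that of the endpoints, whereas linear interpolation passes closer to the origin.

```python
import math


def lerp(z0, z1, t):
    """Linear interpolation between two latent vectors."""
    return [(1 - t) * a + t * b for a, b in zip(z0, z1)]


def slerp(z0, z1, t, eps=1e-7):
    """Spherical linear interpolation between two latent vectors."""
    dot = sum(a * b for a, b in zip(z0, z1))
    norm = math.sqrt(sum(a * a for a in z0)) * math.sqrt(sum(b * b for b in z1))
    if norm == 0:  # a zero vector has no direction: fall back to linear interpolation
        return lerp(z0, z1, t)
    omega = math.acos(max(-1.0, min(1.0, dot / norm)))  # angle between the two vectors
    if abs(math.sin(omega)) < eps:  # (anti)parallel vectors: fall back to linear interpolation
        return lerp(z0, z1, t)
    w0 = math.sin((1 - t) * omega) / math.sin(omega)
    w1 = math.sin(t * omega) / math.sin(omega)
    return [w0 * a + w1 * b for a, b in zip(z0, z1)]


z_a, z_b = [1.0, 0.0], [0.0, 1.0]
path = [slerp(z_a, z_b, i / 4) for i in range(5)]
# Each point would then be passed to the VAE decoder (not shown) to obtain a sequence.
print([round(math.hypot(*z), 6) for z in path])  # [1.0, 1.0, 1.0, 1.0, 1.0]
print(round(math.hypot(*lerp(z_a, z_b, 0.5)), 6))  # 0.707107
```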

GRU

| Model | Code | Recurrent unit | Architecture | Data | Representation | Tasks |
|---|---|---|---|---|---|---|
| MIDI-VAE (2018) | Yes | GRU | VAE + GRU | Multi-track | Time-slice (piano roll) | Style transfer; Samples interpolation |
| XiaoIce Band (2018) | - | GRU | GRU + Attention | Multi-track | Pitch + duration + chord (event-based) | Chord-conditioned generation; Arrangement generation |
| Songwriter (2019) | - | GRU | GRU + Attention | Monophonic | Pitch + duration (event-based) | Lyrics-conditioned generation |
| Yang et al. (2019) | Yes | GRU | VAE + Bi-GRU | Lead sheet | Time-slice (piano roll) + chords (chromagram) | Melody contour-conditioned generation; Chord-conditioned generation |
| BUTTER (2020) | Yes | GRU | VAE + GRU | Monophonic | Time-slice (piano roll) | Text-based query; Music captioning; Text-conditioned generation |
| Kong et al. (2020) | Yes | GRU | Bi-GRU | Piano | Time-slice (piano roll) | Composer classification |
| MG-VAE (2020) | - | GRU | VAE + Bi-GRU | Monophonic | Pitch + interval + duration (event-based) | Free generation |
| PianoTree-VAE (2020) | Yes | GRU | VAE + Bi-GRU | Piano / Multi-track | Time-slice (piano roll / MIDI-like) | Samples interpolation; Free generation; Embedding analysis |
| Su et al. (2022) | - | GRU | Bi-GRU + CNN + Attention | Monophonic | Pitch + duration (time-slice-based) | Free generation |

Attention-based models

End-to-end models

Transformer decoder-only architecture

| Model | Code | Base model | MIR mechanism | Data | Representation | Tasks |
|---|---|---|---|---|---|---|
| Music Transformer (2018) | Yes | Transformer decoder | Relative attention | Piano / Choral | MIDI-like | Priming; Harmonization |
| Chen et al. (2020) | - | Transformer-XL | - | Guitar tabs | REMI-derived (tabs) | Free tabs generation |
| Pop Music Transformer (2020) | Yes | Transformer-XL | - | Piano | REMI | Priming; Free generation |
| Jazz Transformer (2020) | Yes | Transformer-XL | - | Lead sheet | REMI-derived (chords) | Free generation |
| PopMAG (2020) | - | Transformer-XL | - | Multi-track | MuMIDI | Accompaniment generation |
| Wu et al. (2020) | - | Transformer-XL | - | Piano | MIDI-like-derived (composite tokens) | Free generation |
| Di et al. (2020) | Yes | Transformer decoder | - | Multi-track | Compound-word-derived (rhythm family) | Video-to-music |
| Chang et al. (2021) | Yes | XLNet | Relative bar encoding | Piano | Compound Word | Infilling |
| Compound Word Transformer (2021) | Yes | Linear Transformer decoder | - | Piano | Compound Word | Priming; Free generation |
| Sarmento et al. (2021) | Yes | Transformer-XL | - | Guitar tabs + multi-track | DadaGP | Metadata-conditioned generation |
| Sulun et al. (2022) | Yes | Music Transformer | - | Multi-track | MIDI-like | Emotion-conditioned generation |
| ComMU (2022) | Yes | Transformer-XL | - | Multi-track | REMI + metadata | Metadata-conditioned generation; Multi-track combination |
| SymphonyNet (2022) | Yes | Linear Transformer | 3-D positional encoding | Orchestral | MMR | Chord-conditioned generation; Priming; Free generation |
| Li et al. (2023) | - | Transformer-XL | - | Lead sheet | REMI-derived (pitch class) | Free generation |
| Multitrack Music Transformer (2023) | Yes | Transformer decoder | - | Orchestral | MMT | Free generation; Instrument-conditioned generation; Priming |
| GTR-CTRL (2023) | - | Transformer-XL | - | Guitar tabs + multi-track | DadaGP | Instrument-conditioned generation; Genre-conditioned generation |
| ShredGP (2023) | - | Transformer-XL | - | Guitar tabs | DadaGP | Style-conditioned generation |
| Choir Transformer (2023) | Yes | Transformer decoder | Relative attention | 4-part chorales | Chord + pitch (event-based) | Harmonization |
| Guo et al. (2023) | Yes | Transformer decoder with custom attention | Fundamental music embedding; RIPO attention | Monophonic | FME | Priming |
| Compose & Embellish (2023) | Yes | Transformer decoder | - | Multi-track | REMI | Lead sheet priming; Accompaniment refinement |
| RHEPP-Transformer (2023) | Yes | Transformer decoder | - | Piano | Octuple | Expressive performance generation |
| Angioni et al. (2023) | Yes | Transformer decoder | - | Multi-track | TSD-like | Style classification |
| Chordinator (2023) | Yes | minGPT (no pre-training) | - | Chords | Custom chord features (+ MIDI array) | Chord generation |
| MMT-I/-G/-GI (2023) | - | Transformer decoder | - | Multi-track | REMI+ (+ genre, instrument) | Genre-conditioned generation; Instrument-conditioned generation |
| Agarwal et al. (2024) | - | Transformer decoder | Structure-informed positional encoding | Multi-track | Piano roll time-slices | Free generation; Accompaniment generation |

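
Music Transformer's relative attention adds a term to the usual query-key logits that depends on the distance between positions. It computes this term with a "skewing" reshape instead of materializing an L x L x d tensor of relative embeddings. Below is a single-head, unbatched NumPy sketch of that procedure for causal attention. The relative embedding matrix `e_r` (one row per distance from -(L-1) to 0) would be learned in practice and is random here. The function names are hypothetical.

```python
import numpy as np


def skew(qe):
    """Skewing trick for relative logits.

    qe[i, r] pairs query i with the relative embedding in column r, where column r
    encodes the distance r - (L - 1), so the last column is distance 0. Returns s with
    s[i, j] = qe[i, j - i + L - 1] for every j <= i. Entries above the diagonal are
    meaningless and are removed by the causal mask.
    """
    L = qe.shape[0]
    padded = np.pad(qe, ((0, 0), (1, 0)))  # one zero column on the left -> (L, L + 1)
    reshaped = padded.reshape(L + 1, L)    # reinterpret row-major memory -> (L + 1, L)
    return reshaped[1:]                     # drop the first row -> (L, L)


def relative_attention(q, k, v, e_r):
    """Single-head causal self-attention with relative logits (illustration only)."""
    L, d = q.shape
    logits = (q @ k.T + skew(q @ e_r.T)) / np.sqrt(d)
    future = np.triu(np.ones((L, L), dtype=bool), k=1)  # True above the diagonal
    logits = np.where(future, -np.inf, logits)
    weights = np.exp(logits - logits.max(axis=1, keepdims=True))  # stable softmax
    weights /= weights.sum(axis=1, keepdims=True)
    return weights @ v


rng = np.random.default_rng(0)
L, d = 5, 3
q, k, v, e_r = (rng.normal(size=(L, d)) for _ in range(4))
print(relative_attention(q, k, v, e_r).shape)  # (5, 3)
```

The pad-reshape-slice sequence costs only an extra column of memory. Indexing `e_r` once per (query, key) pair would instead require the quadratic-in-L intermediate tensor mentioned above.
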
Transformer encoder-only architecture

| Model | Code | Base model | MIR mechanism | Data | Representation | Tasks |
|---|---|---|---|---|---|---|
| MTBert (2023) | - | BERT (no pre-training) | - | 4-part chorales | Interval + duration (event-based) | Fugue form analysis |

Transformer encoder-decoder architecture

| Model | Code | Base model | MIR mechanism | Data | Representation | Tasks |
|---|---|---|---|---|---|---|
| Transformer-VAE (2020) | - | Transformer encoder-decoder | - | Monophonic | Pitch + duration (time-slice-based) | Priming |
| Harmony Transformer (2021) | Yes | Transformer encoder-decoder | - | Piano | Piano roll time-slices | Roman numeral analysis |
| Makris et al. (2021) | Yes | Transformer encoder-decoder | - | Lead sheet | Encoder: bar features; Decoder: chord + pitch + duration | Emotion-conditioned generation |
| Liutkus et al. (2021) | Yes | Performer | Stochastic positional encoding | Multi-track | REMI / MIDI-like-derived (multi-track) | Free generation; Groove continuation |
| Gover et al. (2022) | - | BART | - | Piano | REMI-derived (hands token) | Arrangement generation |
| Museformer (2022) | Yes | Transformer encoder-decoder with custom attention | Fine-/coarse-grained attention; Bar selection | Multi-track | REMI | Free generation |
| Theme Transformer (2022) | Yes | Transformer encoder-decoder | Theme-aligned positional encoding | Multi-track | REMI-derived (theme tokens) | Theme-conditioned generation |
| FIGARO (2022) | Yes | Transformer encoder-decoder | - | Multi-track | REMI+ | Controllable generation |
| MuseMorphose (2023) | Yes | Transformer encoder + Transformer-XL | In-attention conditioning | Piano | REMI-derived (multi-track) | Style transfer; Controllable generation |
| AccoMontage 3 (2023) | Yes | Transformer encoder-decoder | Instrument embedding | Multi-track | Piano roll time-slices | Accompaniment generation |
| TeleMelody (2023) | Yes | Transformer encoder-decoder | - | Monophonic | Bar + position + pitch + duration (event-based) | Lyrics-to-melody |
| MuseCoco (2023) | Yes | Text2Attr: BERT; Attr2Music: Linear Transformer | - | Multi-track | REMI | Text-to-MIDI |

Model combinations

| Model | Code | Base model | MIR mechanism | Data | Representation | Tasks |
|---|---|---|---|---|---|---|
| Zhang (2020) | - | Generator: Transformer decoder; Discriminator: Transformer encoder | - | Multi-track | MIDI-like-derived (composite tokens) | Free generation |
| Transformer-GAN (2021) | Yes | Generator: Transformer-XL; Discriminator: BERT | - | Piano | MIDI-like | Free generation |
| Dai et al. (2021) | - | Generator: Transformer encoder; Discriminator: LSTM | - | Multi-track | Pitch + rhythm (event-based) | Structure-conditioned generation; Chord-conditioned generation |
| Choi et al. (2021) | Yes | Chord encoder: Bi-LSTM; Rhythm decoder: Transformer decoder; Pitch decoder: Transformer decoder | - | Lead sheet | Pitch + rhythm + chord (time-slice-based) | Chord-conditioned generation |
| Makris et al. (2022) | Yes | Bi-LSTM encoder / Transformer encoder | - | Multi-track | Compound-word-derived | Drums accompaniment generation |
| Neves et al. (2022) | Yes | Generator: Linear Transformer; Discriminator: Linear Transformer | Local prediction map | Piano | REMI | Emotion-conditioned generation |
| Q&A (2023) | Yes | PianoTree-VAE; Transformer decoder | Instrument embedding | Multi-track | Piano roll time-slices | Accompaniment generation |
| Duan et al. (2023) | - | Generator: Transformer encoder; Discriminator: LSTM | - | Monophonic | Pitch + duration + rest (event-based) | Lyrics-to-melody |
| Video2Music (2023) | Yes | GRU + Transformer encoder-decoder | - | Multi-track | MIDI-like | Video-to-music |

Pre-trained models

Transformer encoder-only architecture

| Model | Code | Base model | MIR mechanism | Data | Representation | Tasks |
|---|---|---|---|---|---|---|
| MuseBERT (2021) | Yes | BERT | Generalized relative positional encoding | Multi-track | MuseBERT representation | Controllable generation; Chord analysis; Accompaniment refinement |
| MidiBERT-Piano (2021) | Yes | BERT | - | Piano | REMI / Compound Word | Melody extraction; Velocity prediction; Composer classification; Emotion classification |
| MusicBERT (2021) | Yes | RoBERTa | Bar-level masking | Multi-track | Octuple | Melody completion; Accompaniment suggestion; Genre classification; Style classification |
| DBTMPE (2021) | - | Transformer encoder | - | Multi-track | Pitch combinations + durations (event-based) | Style classification |
| MRBERT (2023) | - | BERT | Melody/rhythm cross-attention | Lead sheet | Pitch + duration (event-based) | Free generation; Infilling; Chord analysis |
| SoloGPBERT (2023) | - | BERT | - | Guitar tabs | DadaGP | Guitar player classification |
| Shen et al. (2023) | - | MidiBERT-Piano | Pre-training tasks (quad-attribute masking / key prediction) | Multi-track | Compound Word (simplified) | Melody extraction; Velocity prediction; Composer classification; Emotion classification |
| CLaMP (2023) | Yes | Text encoder: DistilRoBERTa; Music encoder: BERT | - | Lead sheet | ABC notation-derived | Text-based semantic music search; Music recommendation; Music classification |

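
Bar-level masking is listed as MusicBERT's key mechanism. As summarized here, the idea is to mask a given attribute (for example, pitch) in all notes of a bar at once, so that the model cannot simply copy it from a neighboring note in the same bar. The sketch below applies this idea to Octuple-like 8-field tuples. The masking probability, the `[MASK]` symbol and the helper name are assumptions. The replacement strategies used in real BERT-style pre-training (random or unchanged tokens for part of the targets) are omitted.

```python
import random

FIELDS = ("time_signature", "tempo", "bar", "position", "instrument", "pitch", "duration", "velocity")
MASK = "[MASK]"


def bar_level_mask(notes, mask_prob=0.15, seed=0):
    """Simplified bar-level masking over Octuple-like notes (tuples of 8 fields).

    Each (bar, field) pair is selected with probability mask_prob; a selected field
    is masked in every note of that bar. Returns the masked sequence and the
    (note index, field index) pairs the model would have to predict.
    """
    rng = random.Random(seed)
    bar_idx = FIELDS.index("bar")
    bars = sorted({note[bar_idx] for note in notes})
    chosen = {(b, f) for b in bars for f in range(len(FIELDS)) if rng.random() < mask_prob}
    masked, targets = [], []
    for i, note in enumerate(notes):
        new_note = list(note)
        for f in range(len(FIELDS)):
            if (note[bar_idx], f) in chosen:  # bar is read from the unmasked original note
                new_note[f] = MASK
                targets.append((i, f))
        masked.append(tuple(new_note))
    return masked, targets


notes = [
    ("4/4", 120, 0, 0, 0, 60, 8, 80),
    ("4/4", 120, 0, 4, 0, 62, 8, 80),
    ("4/4", 120, 1, 0, 0, 64, 8, 80),
]
masked, targets = bar_level_mask(notes, mask_prob=0.5, seed=1)
for row in masked:
    print(row)
```
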
Transformer decoder-only architecture

| Model | Code | Base model | MIR mechanism | Data | Representation | Tasks |
|---|---|---|---|---|---|---|
| LakhNES (2019) | Yes | Transformer-XL | - | Multi-track | MIDI-like | Free generation |
| MuseNet (2019) | - | GPT-2 | Timing embedding / Structural embedding | Multi-track | MIDI-like | Priming |
| MMM (2020) | Yes | GPT-2 | - | Multi-track | MultiTrack representation | Free generation; Priming; Inpainting; Controllable generation |
| Angioni et al. (2023) | Yes | GPT-2 | - | Multi-track | TSD-like | Priming |
| Zhang et al. (2023) | Yes | GPT-3 | - | Drums | Drumroll time-slices | Priming |
| Bubeck et al. (2023) | - | GPT-4 | - | Text / Mono-track | ABC notation | Text-to-ABC |
| ComposerX (2024) | Yes | GPT-4 | - | Text / Multi-track | ABC notation | Text-to-ABC |
| MuPT (2023) | - | Transformer decoder | - | Multi-track | ABC notation-derived | Free generation |

Transformer encoder-decoder architecture

| Model | Code | Base model | MIR mechanism | Data | Representation | Tasks |
|---|---|---|---|---|---|---|
| MusIAC (2022) | Yes | Transformer encoder-decoder | - | Multi-track | REMI | Infilling; Controllable generation |
| Li et al. (2023) | - | Transformer encoder-decoder | - | Lead sheet | Pitch + duration (event-based) | Harmony analysis; Chord generation |
| Fu et al. (2023) | - | MusicBERT + Music Transformer | - | Multi-track | Octuple | Melody completion; Accompaniment suggestion; Genre classification; Style classification |
| Multi-MMLG (2023) | - | XLNet + MuseBERT | - | Multi-track | Compound-word-derived | Melody extraction |

Comparative studies

| Model | Code | Base model | MIR mechanism | Data | Representation | Tasks |
|---|---|---|---|---|---|---|
| Ferreira et al. (2023) | Yes | GRU / Performance-RNN / GPT-2 / Music Transformer / MuseNet | - | Piano | MIDI-like | Free generation |
| Wu et al. (2023) | Yes | BERT / GPT-2 / BART | - | Lead sheet | ABC notation | Text-to-ABC |

Cite

If you find this useful, please cite our paper.

@misc{le2024surveymirnlp,
    title={Natural Language Processing Methods for Symbolic Music Generation and Information Retrieval: a Survey}, 
    author={Dinh-Viet-Toan Le and Louis Bigo and Mikaela Keller and Dorien Herremans},
    year={2024},
    eprint={2402.17467},
    archivePrefix={arXiv},
    primaryClass={cs.IR}
}
