"Pop Music Transformer: Generating Music with Rhythm and Harmony", arXiv 2020


Authors: Yu-Siang Huang, Yi-Hsuan Yang

Paper (arXiv) | Blog | Audio demo (Google Drive)

REMI, which stands for REvamped MIDI-derived events, is a new event representation we propose for converting MIDI scores into text-like discrete tokens. Compared with the MIDI-like event representation adopted by existing Transformer-based music composition models, REMI provides sequence models with a metrical context for modeling the rhythmic patterns of music. Using REMI as the event representation, we train a Transformer-XL model to generate minute-long Pop piano music that is expressive and has a coherent, clear structure of rhythm and harmony, without needing any post-processing to refine the result. The model also offers control over local tempo changes and the chord progression.
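As a toy illustration of the idea (not the repository's actual vocabulary — the event names and value ranges below are simplified assumptions), one bar can be tokenized into explicit Bar and Position events that carry the metrical context, followed by note events:

```python
# Toy REMI-style tokenization of one bar. A "Bar" event marks every bar
# boundary, and "Position" events place each note on a 16th-note grid --
# the metrical context that plain MIDI-like time-shift events lack.

def remi_tokens_for_bar(notes, positions_per_bar=16):
    """notes: list of (position, pitch, duration) tuples within one bar."""
    tokens = ["Bar"]
    for position, pitch, duration in notes:
        tokens.append(f"Position_{position}/{positions_per_bar}")
        tokens.append(f"Note-On_{pitch}")
        tokens.append(f"Note-Duration_{duration}")
    return tokens

# e.g. a C major arpeggio on beats 1, 2, and 3:
tokens = remi_tokens_for_bar([(1, 60, 4), (5, 64, 4), (9, 67, 8)])
# first events: "Bar", "Position_1/16", "Note-On_60", "Note-Duration_4", ...
```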


Citation

@article{huang2020pop,
  title={Pop music transformer: Generating music with rhythm and harmony},
  author={Huang, Yu-Siang and Yang, Yi-Hsuan},
  journal={arXiv preprint arXiv:2002.00212},
  year={2020}
}

Getting Started

Install Dependencies

  • python 3.6 (recommend using Anaconda)
  • tensorflow-gpu 1.14.0 (pip install tensorflow-gpu==1.14.0)
  • miditoolkit (pip install miditoolkit)
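The dependency list above can be collected into a fresh environment, for example (the environment name is arbitrary):

```shell
# Sketch of an Anaconda-based setup for the dependencies listed above.
conda create -n remi python=3.6
conda activate remi
pip install tensorflow-gpu==1.14.0
pip install miditoolkit
```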

Download Pre-trained Checkpoints

We provide two pre-trained checkpoints for generating samples.

Obtain the MIDI Data

We provide the MIDI files, including local tempo changes and estimated chords (5 MB).

  • data/train: 775 files used for training models
  • data/evaluation: 100 files (prompts) used for the continuation experiments

Generate Samples

See the following as an example:

from model import PopMusicTransformer
import os
os.environ['CUDA_VISIBLE_DEVICES'] = '0'

def main():
    # declare model
    model = PopMusicTransformer(
        checkpoint='REMI-tempo-checkpoint',  # path to a downloaded checkpoint
        is_training=False)
    # generate from scratch
    model.generate(n_target_bar=16, temperature=1.2, topk=5,
                   output_path='./result/from_scratch.midi', prompt=None)
    # generate continuation from an evaluation prompt
    model.generate(n_target_bar=16, temperature=1.2, topk=5,
                   output_path='./result/continuation.midi',
                   prompt='./data/evaluation/000.midi')
    # close model
    model.close()

if __name__ == '__main__':
    main()
Convert MIDI to REMI

You can find out how to convert MIDI messages into REMI events in midi2remi.ipynb.
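A minimal sketch of one core step in that conversion — quantizing a note onset, given in MIDI ticks, to a bar index and a 16th-note Position slot — assuming 480 ticks per beat and 4/4 time (the notebook's actual code may differ):

```python
# Quantize a MIDI tick onset to (bar, position) on a 16th-note grid.
# Assumes 480 ticks per beat and 4 beats per bar; real MIDI files carry
# their own resolution and time signature, which should be read instead.

def tick_to_bar_position(tick, ticks_per_beat=480, beats_per_bar=4,
                         positions_per_bar=16):
    ticks_per_bar = ticks_per_beat * beats_per_bar
    bar = tick // ticks_per_bar
    tick_in_bar = tick % ticks_per_bar
    # round to the nearest of 16 evenly spaced slots (1-indexed)
    slot = round(tick_in_bar / ticks_per_bar * positions_per_bar)
    if slot == positions_per_bar:
        # rounding pushed the onset past the bar line: it belongs to the next bar
        bar += 1
    position = (slot % positions_per_bar) + 1
    return bar, position

# beat 2 of the first bar lands on Position 5/16:
# tick_to_bar_position(480) == (0, 5)
```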


FAQ

1. How to synthesize the audio files (e.g., mp3)?

We strongly recommend using a DAW (e.g., Logic Pro) to open/play the generated MIDI files. Alternatively, you can use FluidSynth with a SoundFont; however, it may not handle the tempo changes correctly (see fluidsynth/issues/141).
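For reference, rendering a generated file offline with FluidSynth looks like this (the SoundFont filename is a placeholder — use any General MIDI SoundFont you have):

```shell
# -n: no MIDI input, -i: no interactive shell, -F: render to an audio file
fluidsynth -ni GeneralUser.sf2 result/from_scratch.midi -F output.wav -r 44100
```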

2. What is the function of the inputs "temperature" and "topk"?

They control the temperature-controlled stochastic sampling method used to generate tokens from the trained language model: temperature rescales the predicted distribution before sampling (lower values make it sharper), and topk restricts sampling to the k most probable tokens. You can find more details in the reference paper CTRL, Section 4.1 (Sampling).

It is worth noting that the sampling method used for generation is critical to the quality of the output, and it remains a research topic worthy of further exploration.
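A from-scratch sketch of what such temperature-controlled top-k sampling looks like (the model's own implementation may differ in detail):

```python
import math
import random

# Sample one token index from raw logits, keeping only the top-k candidates
# and rescaling by temperature before drawing.

def sample_with_temperature_topk(logits, temperature=1.2, topk=5, rng=random):
    # keep the k highest-scoring candidate indices
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:topk]
    # temperature < 1 sharpens the distribution, > 1 flattens it
    scaled = [logits[i] / temperature for i in top]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    # draw one candidate according to the renormalized probabilities
    r, acc = rng.random(), 0.0
    for idx, p in zip(top, probs):
        acc += p
        if r <= acc:
            return idx
    return top[-1]

# with topk=1 this degenerates to greedy (argmax) decoding:
# sample_with_temperature_topk([0.1, 5.0, 0.2, 3.0], topk=1) == 1
```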

3. How to finetune with my personal MIDI data?

Please see the issue "Training on custom MIDI corpus".


Part of the code comes from the kimiyoung/transformer-xl repository.
