# Pop Music Transformer: Beat-based Modeling and Generation of Expressive Pop Piano Compositions
---

## Resources

- 📃 [Paper](https://arxiv.org/abs/2002.00212)
- 📚 [Project Page](https://ailabs.tw/human-interaction/pop-music-transformer/)
- 🎬 [Examples](https://drive.google.com/open?id=1LzPBjHPip4S0CBOLquk5CNapvXSfys54)
- 💻 [Code](https://github.com/YatingMusic/remi)

## Abstract

[Abstract](https://arxiv.org/pdf/2002.00212.pdf): *A great number of deep learning based models have been recently proposed for automatic music composition. Among these models, the Transformer stands out as a prominent approach for generating expressive classical piano performance with a coherent structure of up to one minute. The model is powerful in that it learns abstractions of data on its own, without much human-imposed domain knowledge or constraints. In contrast with this general approach, this paper shows that Transformers can do even better for music modeling, when we improve the way a musical score is converted into the data fed to a Transformer model. In particular, we seek to impose a metrical structure in the input data, so that Transformers can be more easily aware of the beat-bar-phrase hierarchical structure in music. The new data representation maintains the flexibility of local tempo changes, and provides handles to control the rhythmic and harmonic structure of music. With this approach, we build a Pop Music Transformer that composes Pop piano music with better rhythmic structure than existing Transformer models.*
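
To make the beat-based representation more concrete, here is a minimal sketch of how notes could be turned into bar- and position-anchored events. The event names (Bar, Position, Note Velocity, Note On, Note Duration) follow the vocabulary described in the paper. The `Note` tuple, the `to_beat_based_events` helper, the fixed grid of 16 positions per bar, and the omission of Tempo and Chord events are simplifying assumptions of this sketch, not the authors' implementation.

```python
from collections import namedtuple

# Hypothetical note record: start and duration are counted in sixteenth-note
# steps from the beginning of the piece (an assumption of this sketch).
Note = namedtuple("Note", ["start", "duration", "pitch", "velocity"])

POSITIONS_PER_BAR = 16  # assumes 4/4 time with a sixteenth-note grid


def to_beat_based_events(notes):
    """Convert notes into a flat list of (name, value) events.

    A Bar event opens every bar, and a Position event places each note
    on the grid inside its bar.
    """
    events = []
    current_bar = -1
    for note in sorted(notes, key=lambda n: (n.start, n.pitch)):
        bar, position = divmod(note.start, POSITIONS_PER_BAR)
        # Emit one Bar event per bar, including empty bars that were skipped.
        while current_bar < bar:
            events.append(("Bar", None))
            current_bar += 1
        events.append(("Position", f"{position + 1}/{POSITIONS_PER_BAR}"))
        events.append(("Note Velocity", note.velocity))
        events.append(("Note On", note.pitch))
        events.append(("Note Duration", note.duration))
    return events


example_notes = [
    Note(start=0, duration=4, pitch=60, velocity=80),
    Note(start=4, duration=4, pitch=64, velocity=80),
    Note(start=16, duration=8, pitch=67, velocity=90),
]
example_events = to_beat_based_events(example_notes)
for name, value in example_events:
    print(name, value)
```

Because every note is addressed relative to the bar it falls in, the bar boundaries are explicit in the token sequence instead of having to be inferred from accumulated time shifts.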


## Authors

Yu-Siang Huang<sup>1</sup>,
Yi-Hsuan Yang<sup>1</sup>
<br>
<sup>1</sup>*Taiwan AI Labs & Academia Sinica*<br>

## Citation

### Plain Text


```
Yu-Siang Huang and Yi-Hsuan Yang. 2020. Pop Music Transformer: Beat-based Modeling and Generation of Expressive Pop Piano Compositions. In 28th ACM International Conference on Multimedia (MM ’20), October 12–16, 2020, Seattle, WA, USA. ACM, New York, NY, USA, 9 pages. https://doi.org/10.1145/3394171.3413671
```



### BibTeX

```
@article{DBLP:journals/corr/abs-2002-00212,
  author    = {Yu{-}Siang Huang and
               Yi{-}Hsuan Yang},
  title     = {Pop Music Transformer: Generating Music with Rhythm and Harmony},
  journal   = {CoRR},
  volume    = {abs/2002.00212},
  year      = {2020},
  url       = {https://arxiv.org/abs/2002.00212},
  eprinttype = {arXiv},
  eprint    = {2002.00212},
  timestamp = {Mon, 10 Feb 2020 15:12:57 +0100},
  biburl    = {https://dblp.org/rec/journals/corr/abs-2002-00212.bib},
  bibsource = {dblp computer science bibliography, https://dblp.org}
}
```



# Set up the notebook

## Install [MidiTok](https://github.com/Natooz/MidiTok)

In [None]:
!pip install miditok

## Download the [REMI](https://github.com/YatingMusic/remi) dataset

In [None]:
!gdown --fuzzy https://drive.google.com/file/d/1JUDHGrVYGyHtjkfI2vgR1xb2oU8unlI3/view

In [None]:
!unzip data.zip

## Install [fluidsynth](https://github.com/FluidSynth/fluidsynth)

In [None]:
!apt-get install -y fluidsynth

### Test fluidsynth with a sample from the REMI dataset

#### Use the default soundfont

In [None]:
midi_file = "/content/data/train/000.midi"
sound_font = "/usr/share/sounds/sf2/FluidR3_GM.sf2"
out_filename = "output"
out_wav = f"{out_filename}.wav"
out_mp3 = f"{out_filename}.mp3"

!fluidsynth $sound_font $midi_file -F $out_wav
!ffmpeg -i $out_wav -acodec mp3 $out_mp3 -y

from IPython.display import Audio
Audio(out_mp3)

#### Use a different soundfont

In [None]:
!wget http://ftp.osuosl.org/pub/musescore/soundfont/Sonatina_Symphonic_Orchestra_SF2.zip
!unzip Sonatina_Symphonic_Orchestra_SF2.zip

In [None]:
midi_file = "/content/data/train/000.midi"
sound_font = "Sonatina_Symphonic_Orchestra.sf2"
out_filename = "output"
out_wav = f"{out_filename}.wav"
out_mp3 = f"{out_filename}.mp3"

!fluidsynth $sound_font $midi_file -F $out_wav
!ffmpeg -i $out_wav -acodec mp3 $out_mp3 -y

from IPython.display import Audio
Audio(out_mp3)
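
The two rendering cells above repeat the same fluidsynth and ffmpeg steps with a different soundfont. As a rough sketch, the command construction could be wrapped in a small helper. The `build_render_commands` name and its `out_dir` parameter are hypothetical, and the helper names the output files after the MIDI file rather than using the fixed `output` name from the cells. The flags are the ones already used in this notebook.

```python
import shlex
from pathlib import Path


def build_render_commands(midi_file, sound_font, out_dir="."):
    """Return the fluidsynth and ffmpeg commands as argument lists."""
    stem = Path(midi_file).stem
    out_wav = str(Path(out_dir) / f"{stem}.wav")
    out_mp3 = str(Path(out_dir) / f"{stem}.mp3")
    # Same flags as the notebook cells: -F renders to a file instead of playing.
    synth_cmd = ["fluidsynth", str(sound_font), str(midi_file), "-F", out_wav]
    # -y overwrites an existing output file without asking.
    encode_cmd = ["ffmpeg", "-i", out_wav, "-acodec", "mp3", out_mp3, "-y"]
    return synth_cmd, encode_cmd, out_mp3


synth_cmd, encode_cmd, out_mp3 = build_render_commands(
    "/content/data/train/000.midi",
    "/usr/share/sounds/sf2/FluidR3_GM.sf2",
)
print(shlex.join(synth_cmd))
print(shlex.join(encode_cmd))
```

To actually render audio, each list can be passed to `subprocess.run(cmd, check=True)`, which raises an error if either tool fails instead of silently producing no file.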

## Set up the `REMI` repository

In [None]:
!git clone https://github.com/1ucky40nc3/remi.git
%cd remi

### Install dependencies

In [None]:
!pip install tensorflow==2.1
!pip install -U numpy==1.18.5

### Download pretrained checkpoints

In [None]:
!gdown --fuzzy https://drive.google.com/file/d/1gxuTSkF51NP04JZgTE46Pg4KQsbHQKGo/view
!gdown --fuzzy https://drive.google.com/file/d/1nAKjaeahlzpVAX0F9wjQEG_hL4UosSbo/view

!mkdir -p pretrained
!unzip REMI-tempo-checkpoint.zip -d pretrained
!unzip REMI-tempo-chord-checkpoint.zip -d pretrained

# Inference

In [None]:
# @markdown Generate a .midi file
%cd /content/remi
!mkdir -p ./results

checkpoint = "./pretrained/REMI-tempo-checkpoint" # @param {type: "string"}
output_path = "./results" # @param {type: "string"}
prompt = None # @param {type: "string"}
n_target_bar = 16 # @param {type: "number"}
temperature = 1.2 # @param {type: "number"}
topk = 5 # @param {type: "number"}
seed = 42 # @param {type: "number"}

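# Only checkpoint, output_path and seed are passed to main.py in this cell;
# the other parameters defined above are not forwarded.
!python main.py \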
    --checkpoint $checkpoint \
    --output_path $output_path \
    --seed $seed
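
The `temperature` and `topk` parameters in the cell above suggest temperature-scaled top-k sampling. The sketch below illustrates that general technique in plain Python. The `sample_top_k` function and the toy logits are hypothetical, and whether the repository's `main.py` samples in exactly this way is an assumption.

```python
import math
import random


def sample_top_k(logits, temperature=1.2, topk=5, rng=random):
    """Sample an index from logits after temperature scaling and top-k filtering."""
    # Dividing by the temperature flattens (>1) or sharpens (<1) the distribution.
    scaled = [x / temperature for x in logits]
    # Softmax numerators, shifted by the maximum for numerical stability.
    peak = max(scaled)
    weights = [math.exp(x - peak) for x in scaled]
    # Keep only the k most likely candidates.
    order = sorted(range(len(weights)), key=lambda i: weights[i], reverse=True)
    candidates = order[:topk]
    kept = [weights[i] for i in candidates]
    # choices() normalizes the relative weights internally.
    return rng.choices(candidates, weights=kept, k=1)[0]


rng = random.Random(42)
toy_logits = [2.0, 1.0, 0.5, 0.1, -1.0, -3.0, -5.0]
samples = [sample_top_k(toy_logits, temperature=1.2, topk=5, rng=rng) for _ in range(1000)]
print(sorted(set(samples)))
```

With `topk=5`, only the five highest-scoring indices can ever be drawn, so the two least likely tokens are excluded regardless of the temperature. Setting `topk=1` reduces the procedure to greedy decoding.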

In [None]:
# @title Play the latest generated audio
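import os
import glob

if os.path.isfile(output_path):
    midi_file = output_path
else:
    # Collect both .midi and .mid files from the output directory
    midi_files = glob.glob(os.path.join(output_path, "*.midi"))
    midi_files.extend(glob.glob(os.path.join(output_path, "*.mid")))
    if not midi_files:
        raise FileNotFoundError(f"No MIDI files found in {output_path}")
    # Pick the most recently modified file rather than the last one by name
    midi_file = max(midi_files, key=os.path.getmtime)

sound_font = "/content/Sonatina_Symphonic_Orchestra.sf2"
# Strip the directory and extension to name the rendered audio files
out_filename = os.path.splitext(os.path.basename(midi_file))[0]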
out_wav = f"{out_filename}.wav"
out_mp3 = f"{out_filename}.mp3"

!fluidsynth $sound_font $midi_file -F $out_wav
!ffmpeg -i $out_wav -acodec mp3 $out_mp3 -y

from IPython.display import Audio
Audio(out_mp3)

# Training

In [None]:
%cd /content/remi

checkpoint = "./pretrained/REMI-tempo-checkpoint" # @param {type: "string"}
data_dir = "/content/data/train" # @param {type: "string"}
output_dir = "./outputs" # @param {type: "string"}
num_epochs = 200 # @param {type: "number"}
seed = 42 # @param {type: "number"}

# output_dir is defined above but is not passed to finetune.py in this cell.
!python finetune.py \
    --checkpoint $checkpoint \
    --data_dir $data_dir \
    --num_epochs $num_epochs \
    --seed $seed