# MusicVAE for Drum

This example trains on the Groove MIDI Dataset and generates 4-bar drum performance samples.  
This notebook was written in a Colab environment.  

## DATASET
[Groove MIDI Dataset](https://magenta.tensorflow.org/datasets/groove)  
- The dataset is available in TensorFlow Datasets, so it is loaded through the TFDS module.  

## Model
The model is built on the MUSIC_VAE code in [Magenta](https://github.com/magenta/magenta/tree/master/magenta/models/music_vae),  
which implements the model from the [MusicVAE](https://arxiv.org/pdf/1803.05428.pdf) paper.  

Training a large model on Colab seemed impractical, so a small model is used here, but the config is written so that the sizes can be changed as needed.  

### Encoder
As in the paper, the encoder uses **2** **BidirectionalLSTM** layers, here with a size of **512** each.  

The latent dimension is set to **256**.  

For comparison, the paper uses 2 BidirectionalLSTM layers of size 2048 and a 512-dimensional latent vector.  
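
To get a feel for how much smaller the reduced encoder is, the sketch below counts the weights of a stacked bidirectional LSTM with the standard LSTM formula `4 * ((input + hidden) * hidden + hidden)`. The helper names and the input depth of 10 are illustrative assumptions, not values read from Magenta, and the count ignores the projection to the latent vector.

```python
from typing import Sequence


def lstm_params(input_size: int, hidden_size: int) -> int:
    """Weights and biases of one standard LSTM layer (4 gates)."""
    return 4 * ((input_size + hidden_size) * hidden_size + hidden_size)


def bidirectional_stack_params(input_size: int, layer_sizes: Sequence[int]) -> int:
    """Rough parameter count for stacked bidirectional LSTM layers.

    Assumes every layer after the first receives the concatenated
    forward and backward outputs of the previous layer.
    """
    total = 0
    for hidden in layer_sizes:
        total += 2 * lstm_params(input_size, hidden)  # forward + backward cell
        input_size = 2 * hidden
    return total


INPUT_DEPTH = 10  # hypothetical per-step input width, for illustration only

small = bidirectional_stack_params(INPUT_DEPTH, [512, 512])
paper = bidirectional_stack_params(INPUT_DEPTH, [2048, 2048])
print("this notebook: {:,} parameters".format(small))
print("paper-sized:   {:,} parameters".format(paper))
print("ratio:         {:.1f}x".format(paper / small))
```

Because the recurrent weights grow with the square of the hidden size, a 4x smaller hidden size shrinks the encoder by much more than 4x.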

### Decoder
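I take the core idea of the paper to be the use of a **hierarchical decoder** to build a VAE for long sequences, so although this model only generates 4-bar samples, it is implemented with a hierarchical LSTM decoder.  

The decoder also uses **2** LSTM layers, wrapped in a **CategoricalLstmDecoder**, each of size **256**.  

The paper uses a decoder size of 1024.  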
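
The hierarchical structure can be pictured as a two-stage split: a conductor produces one embedding per segment, and the bottom-level decoder expands each embedding into a few output steps. The sketch below only computes which time steps each segment would cover, assuming 16 segments of 4 steps (the `level_lengths=[16, 4]` used in this notebook's config). The function name is hypothetical and no Magenta code is involved.

```python
from typing import List


def segment_steps(level_lengths: List[int]) -> List[range]:
    """Return the time-step range covered by each conductor segment.

    Handles the two-level case only: level_lengths[0] segments,
    each expanded into level_lengths[1] output steps.
    """
    num_segments, steps_per_segment = level_lengths
    return [
        range(i * steps_per_segment, (i + 1) * steps_per_segment)
        for i in range(num_segments)
    ]


segments = segment_steps([16, 4])
print(len(segments), "segments")
print("first:", list(segments[0]))
print("last: ", list(segments[-1]))
```

At 4 steps per quarter note, each 4-step segment spans one beat, so under this config the conductor would emit one embedding per beat across the 4 bars.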

### Others
`DrumsConverter` is used as the data converter.  

The other parameters follow the other models defined in `music_vae.configs`.  
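
The MusicVAE paper represents each time step of a drum pattern as one of 512 categorical tokens, one per combination of 9 drum classes. A minimal sketch of that idea, using a bitmask, is shown below. The bit order and the function names are illustrative assumptions and may not match the encoding `DrumsConverter` uses internally.

```python
from typing import Set

NUM_DRUM_CLASSES = 9  # 2**9 = 512 possible hit combinations per step


def hits_to_token(hits: Set[int]) -> int:
    """Pack a set of active drum classes into one integer token."""
    token = 0
    for drum_class in hits:
        if not 0 <= drum_class < NUM_DRUM_CLASSES:
            raise ValueError("drum class out of range: %d" % drum_class)
        token |= 1 << drum_class
    return token


def token_to_hits(token: int) -> Set[int]:
    """Unpack an integer token back into the set of active drum classes."""
    return {c for c in range(NUM_DRUM_CLASSES) if token & (1 << c)}


token = hits_to_token({0, 2})  # e.g. two drums struck on the same step
print(token, sorted(token_to_hits(token)))
```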



## Environment Setup
Install the magenta library needed for the model and training, along with the modules needed for audio playback.  


In [1]:
!apt-get update -qq && apt-get install -qq libfluidsynth1 fluid-soundfont-gm build-essential libasound2-dev libjack-dev
!pip install -q pyfluidsynth
!git clone https://github.com/tensorflow/magenta.git


Selecting previously unselected package fluid-soundfont-gm.
(Reading database ... 155632 files and directories currently installed.)
Preparing to unpack .../fluid-soundfont-gm_3.1-5.1_all.deb ...
Unpacking fluid-soundfont-gm (3.1-5.1) ...
Selecting previously unselected package libfluidsynth1:amd64.
Preparing to unpack .../libfluidsynth1_1.1.9-1_amd64.deb ...
Unpacking libfluidsynth1:amd64 (1.1.9-1) ...
Setting up fluid-soundfont-gm (3.1-5.1) ...
Setting up libfluidsynth1:amd64 (1.1.9-1) ...
Processing triggers for libc-bin (2.27-3ubuntu1.3) ...
/sbin/ldconfig.real: /usr/local/lib/python3.7/dist-packages/ideep4py/lib/libmkldnn.so.0 is not a symbolic link

Cloning into 'magenta'...
remote: Enumerating objects: 15848, done.
remote: Total 15848 (delta 0), reused 0 (delta 0), pack-reused 15848
Receiving objects: 100% (15848/15848), 36.37 MiB | 25.49 MiB/s, done.
Resolving deltas: 100% (12054/12054), done.
Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wh

In [2]:
cd /content/magenta

/content/magenta


In [3]:
!pip install -e .

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Obtaining file:///content/magenta
Collecting dm-sonnet
  Downloading dm_sonnet-2.0.0-py3-none-any.whl (254 kB)
Collecting librosa<0.8.0,>=0.6.2
  Downloading librosa-0.7.2.tar.gz (1.6 MB)
Collecting mido==1.2.6
  Downloading mido-1.2.6-py2.py3-none-any.whl (69 kB)
Collecting mir_eval>=0.4
  Downloading mir_eval-0.7.tar.gz (90 kB)
Collecting note-seq
  Downloading note_seq-0.0.3-py3-none-any.whl (210 kB)
Collecting numba<0.50
  Downloading numba-0.49.1-cp37-cp37m-manylinux2014_x86_64.whl (3.6 MB)
Collecting pygtrie>=

In [4]:
from google.colab import files
import os
import warnings
import magenta.music as mm
from magenta.models.music_vae import configs
from magenta.models.music_vae.trained_model import TrainedModel

warnings.filterwarnings("ignore", category=DeprecationWarning)

Import requested from: 'numba.decorators', please update to use 'numba.core.decorators' or pin to Numba version 0.48.0. This alias will not be present in Numba version 0.50.0.
  from numba.decorators import jit as optional_jit
Import of 'jit' requested from: 'numba.decorators', please update to use 'numba.core.decorators' or pin to Numba version 0.48.0. This alias will not be present in Numba version 0.50.0.
  from numba.decorators import jit as optional_jit


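## Model Creation
Magenta recommends training from the console, where the model definition is looked up in `music_vae.configs`.  
However, I could not find a function for registering a new config, so the config is appended directly to the `configs.py` file.  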
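
The numbers in the config added in the next cell have to agree with each other: 4 bars at `steps_per_quarter=4` give 64 steps in 4/4 time, and the product of `level_lengths` has to cover the same 64 steps as `max_seq_len`. The check below is a plain-Python sketch of that arithmetic. The 4/4 assumption and the variable `quarters_per_bar` are introduced here for illustration and are not read from Magenta.

```python
from functools import reduce
from operator import mul

# Values copied from the config in this notebook.
slice_bars = 4
steps_per_quarter = 4
level_lengths = [16, 4]
max_seq_len = 64

quarters_per_bar = 4  # assumption: 4/4 time

steps_from_bars = slice_bars * quarters_per_bar * steps_per_quarter
steps_from_levels = reduce(mul, level_lengths, 1)

# Both ways of counting steps should match the sequence length.
assert steps_from_bars == max_seq_len
assert steps_from_levels == max_seq_len
print("steps per sample:", steps_from_bars)
```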

In [5]:
config_to_add = """
CONFIG_MAP['hierdec-drums_4bar'] = Config(
    model=MusicVAE(
        lstm_models.BidirectionalLstmEncoder(),
        lstm_models.HierarchicalLstmDecoder(
            lstm_models.CategoricalLstmDecoder(),
            level_lengths=[16, 4],
            disable_autoregression=True)),
    hparams=merge_hparams(
        lstm_models.get_default_hparams(),
        HParams(
            batch_size=512,
            max_seq_len=64,
            z_size=256,
            enc_rnn_size=[512, 512],
            dec_rnn_size=[256, 256],
            free_bits=48,
            max_beta=0.2,
            sampling_schedule='inverse_sigmoid',
            sampling_rate=1000
        )),
    note_sequence_augmenter=None,
    data_converter=data.DrumsConverter(
        max_bars=100,
        slice_bars=4,
        steps_per_quarter=4,
        roll_input=True,
    ),
    train_examples_path=None,
    eval_examples_path=None,
)
"""

In [6]:
config_file = '/content/magenta/magenta/models/music_vae/configs.py'

with open(config_file, 'a') as f:
    f.write(config_to_add)
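
Because the cell above opens the file in append mode, re-running it adds a duplicate copy of the config. One way to guard against that, sketched here on a temporary file rather than the real `configs.py`, is to append only when the config key is not already present. The helper name `append_once` and the placeholder snippet are hypothetical.

```python
import os
import tempfile


def append_once(path: str, text: str, marker: str) -> bool:
    """Append text to path unless marker already occurs in the file.

    Returns True if the text was appended, False if it was skipped.
    """
    with open(path, "r", encoding="utf-8") as f:
        if marker in f.read():
            return False
    with open(path, "a", encoding="utf-8") as f:
        f.write(text)
    return True


# Demonstration on a throwaway file standing in for configs.py.
with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = os.path.join(tmp_dir, "configs_demo.py")
    with open(demo_path, "w", encoding="utf-8") as f:
        f.write("CONFIG_MAP = {}\n")

    snippet = "\nCONFIG_MAP['hierdec-drums_4bar'] = 'placeholder'\n"
    marker = "CONFIG_MAP['hierdec-drums_4bar']"
    first = append_once(demo_path, snippet, marker)   # appended
    second = append_once(demo_path, snippet, marker)  # skipped
    with open(demo_path, encoding="utf-8") as f:
        occurrences = f.read().count(marker)

print(first, second, occurrences)
```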

## Training
Training is run through the training script.  
Since this is a practice environment, training runs for only 10 steps.  

In [7]:
!python3 /content/magenta/magenta/models/music_vae/music_vae_train.py \
 --config=hierdec-drums_4bar \
 --run_dir=/content/checkpoints/drum_4bar \
 --num_steps=10 \
 --mode=train \
 --tfds_name=groove/full-midionly

Import requested from: 'numba.decorators', please update to use 'numba.core.decorators' or pin to Numba version 0.48.0. This alias will not be present in Numba version 0.50.0.
  from numba.decorators import jit as optional_jit
Import of 'jit' requested from: 'numba.decorators', please update to use 'numba.core.decorators' or pin to Numba version 0.48.0. This alias will not be present in Numba version 0.50.0.
  from numba.decorators import jit as optional_jit
Instructions for updating:
non-resource variables are not supported in the long term
2022-06-05 13:37:22.522250: E tensorflow/stream_executor/cuda/cuda_driver.cc:271] failed call to cuInit: CUDA_ERROR_NO_DEVICE: no CUDA-capable device is detected
INFO:tensorflow:Building MusicVAE model with BidirectionalLstmEncoder, HierarchicalLstmDecoder, and hparams:
{'max_seq_len': 64, 'z_size': 256, 'free_bits': 48, 'max_beta': 0.2, 'beta_rate': 0.0, 'batch_size': 512, 'grad_clip': 1.0, 'clip_mode': 'global_norm', 'grad_norm_clip_to_zero': 100
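
The checkpoint loaded in the next section sits in the `train` subdirectory of `--run_dir` and carries the step count in its name. A small sketch of how that path can be derived from the training flags, assuming the `model.ckpt-<step>` naming seen in this notebook (the helper name is hypothetical):

```python
import posixpath


def checkpoint_path(run_dir: str, num_steps: int) -> str:
    """Build the checkpoint path for a finished run (Colab uses POSIX paths)."""
    return posixpath.join(run_dir, "train", "model.ckpt-%d" % num_steps)


print(checkpoint_path("/content/checkpoints/drum_4bar", 10))
```

Deriving the path this way keeps it in sync if `--num_steps` or `--run_dir` is changed.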

## Generating and Playing Samples
Use the trained model to generate 4-bar drum samples, then play and download them.  

### Loading the Model

In [8]:
os.chdir('/content/magenta/magenta/models/music_vae/')

In [9]:
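# Import configs.py from the current directory so that CONFIG_MAP is built
# from the edited file and includes 'hierdec-drums_4bar'.
import configs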
def play(note_sequence):
  mm.play_sequence(note_sequence, synth=mm.fluidsynth)

def download(note_sequence, filename):
  mm.sequence_proto_to_midi_file(note_sequence, filename)
  files.download(filename)

print("Initializing Music VAE...")

config = 'hierdec-drums_4bar'
model_path = '/content/checkpoints/drum_4bar/train/model.ckpt-10'
num_music = 4

music_vae = TrainedModel(
      configs.CONFIG_MAP[config], 
      batch_size=num_music, 
      checkpoint_dir_or_path=model_path)

print('🎉 Done!')

Initializing Music VAE...
INFO:tensorflow:Building MusicVAE model with BidirectionalLstmEncoder, HierarchicalLstmDecoder, and hparams:
{'max_seq_len': 64, 'z_size': 256, 'free_bits': 48, 'max_beta': 0.2, 'beta_rate': 0.0, 'batch_size': 4, 'grad_clip': 1.0, 'clip_mode': 'global_norm', 'grad_norm_clip_to_zero': 10000, 'learning_rate': 0.001, 'decay_rate': 0.9999, 'min_learning_rate': 1e-05, 'conditional': True, 'dec_rnn_size': [256, 256], 'enc_rnn_size': [512, 512], 'dropout_keep_prob': 1.0, 'sampling_schedule': 'inverse_sigmoid', 'sampling_rate': 1000, 'use_cudnn': False, 'residual_encoder': False, 'residual_decoder': False, 'control_preprocessing_rnn_size': [256]}
INFO:tensorflow:
Encoder Cells (bidirectional):
  units: [512, 512]

INFO:tensorflow:
Hierarchical Decoder:
  input length: 64
  level output lengths: [16, 4]

INFO:tensorflow:
Decoder Cells:
  units: [256, 256]

Instructions for updating:
Use `tf.cast` instead.




Instructions for updating:
Please use `keras.layers.Bidirectional(keras.layers.RNN(cell))`, which is equivalent to this API
Instructions for updating:
Please use `keras.layers.RNN(cell)`, which is equivalent to this API
Instructions for updating:
Do not call `graph_parents`.
INFO:tensorflow:Restoring parameters from /content/checkpoints/drum_4bar/train/model.ckpt-10




🎉 Done!


### Playback
Generate samples and play them.  
A higher temperature produces more random performances.  
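
The effect of temperature can be illustrated with a temperature-scaled softmax: dividing the logits by a larger temperature flattens the distribution, so less likely tokens are sampled more often. This is the generic technique, shown on made-up logits. It is not Magenta's sampling code, and the function names are hypothetical.

```python
import math
from typing import List


def softmax_with_temperature(logits: List[float], temperature: float) -> List[float]:
    """Convert logits to probabilities after dividing by the temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    max_scaled = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - max_scaled) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(probs: List[float]) -> float:
    """Shannon entropy in nats; higher means a more uniform distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


logits = [2.0, 1.0, 0.1]  # made-up scores for three tokens
for t in (0.5, 1.0, 1.5):
    probs = softmax_with_temperature(logits, t)
    print(t, [round(p, 3) for p in probs], round(entropy(probs), 3))
```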

In [12]:
temperature = 0.5 #@param {type:"slider", min:0.1, max:1.5, step:0.1}
drums_samples = music_vae.sample(n=4, length=64, temperature=temperature)
for ns in drums_samples:
  play(ns)

### Save to File

In [11]:
for i, ns in enumerate(drums_samples):
  # Use the config name (a string) in the filename, not the model object.
  download(ns, '%s_sample_%d.mid' % (config, i))
