# Environment Setup

- Set up by following the MusicVAE [Colab Notebook](https://colab.research.google.com/github/magenta/magenta-demos/blob/master/colab-notebooks/MusicVAE.ipynb#scrollTo=0x8YTRDwv8Gk).

In [1]:
import glob

BASE_DIR = "gs://download.magenta.tensorflow.org/models/music_vae/colab2"

print('Installing dependencies...')
!apt-get update -qq && apt-get install -qq libfluidsynth1 fluid-soundfont-gm build-essential libasound2-dev libjack-dev
!pip install -q pyfluidsynth
!pip install -qU magenta

# Hack to allow python to pick up the newly-installed fluidsynth lib.
# This is only needed for the hosted Colab environment.
import ctypes.util
orig_ctypes_util_find_library = ctypes.util.find_library
def proxy_find_library(lib):
  if lib == 'fluidsynth':
    return 'libfluidsynth.so.1'
  else:
    return orig_ctypes_util_find_library(lib)
ctypes.util.find_library = proxy_find_library


print('Importing libraries and defining some helper functions...')
from google.colab import files
import magenta.music as mm
from magenta.models.music_vae import configs
from magenta.models.music_vae.trained_model import TrainedModel
import numpy as np
import os
import tensorflow.compat.v1 as tf

tf.disable_v2_behavior()
tf.enable_eager_execution()

# Necessary until pyfluidsynth is updated (>1.2.5).
import warnings
warnings.filterwarnings("ignore", category=DeprecationWarning)

def play(note_sequence):
  mm.play_sequence(note_sequence, synth=mm.fluidsynth)

def interpolate(model, start_seq, end_seq, num_steps, max_length=32,
                assert_same_length=True, temperature=0.5,
                individual_duration=4.0):
  """Interpolates between a start and end sequence."""
  note_sequences = model.interpolate(
      start_seq, end_seq, num_steps=num_steps, length=max_length,
      temperature=temperature,
      assert_same_length=assert_same_length)

  print('Start Seq Reconstruction')
  play(note_sequences[0])
  print('End Seq Reconstruction')
  play(note_sequences[-1])
  print('Mean Sequence')
  play(note_sequences[num_steps // 2])
  print('Start -> End Interpolation')
  interp_seq = mm.sequences_lib.concatenate_sequences(
      note_sequences, [individual_duration] * len(note_sequences))
  play(interp_seq)
  mm.plot_sequence(interp_seq)
  return interp_seq if num_steps > 3 else note_sequences[num_steps // 2]

def download(note_sequence, filename):
  mm.sequence_proto_to_midi_file(note_sequence, filename)
  files.download(filename)

print('Done')

Installing dependencies...
Selecting previously unselected package fluid-soundfont-gm.
(Reading database ... 155632 files and directories currently installed.)
Preparing to unpack .../fluid-soundfont-gm_3.1-5.1_all.deb ...
Unpacking fluid-soundfont-gm (3.1-5.1) ...
Selecting previously unselected package libfluidsynth1:amd64.
Preparing to unpack .../libfluidsynth1_1.1.9-1_amd64.deb ...
Unpacking libfluidsynth1:amd64 (1.1.9-1) ...
Setting up fluid-soundfont-gm (3.1-5.1) ...
Setting up libfluidsynth1:amd64 (1.1.9-1) ...
Processing triggers for libc-bin (2.27-3ubuntu1.3) ...
/sbin/ldconfig.real: /usr/local/lib/python3.7/dist-packages/ideep4py/lib/libmkldnn.so.0 is not a symbolic link


Import requested from: 'numba.decorators', please update to use 'numba.core.decorators' or pin to Numba version 0.48.0. This alias will not be present in Numba version 0.50.0.
  from numba.decorators import jit as optional_jit
Import of 'jit' requested from: 'numba.decorators', please update to use 'numba.core.decorators' or pin to Numba version 0.48.0. This alias will not be present in Numba version 0.50.0.
  from numba.decorators import jit as optional_jit


Instructions for updating:
non-resource variables are not supported in the long term
Done


# Preprocess MIDI to TFRecord

In [2]:
import hashlib
import os

from note_seq import abc_parser
from note_seq import midi_io
from note_seq import musicxml_reader
import tensorflow.compat.v1 as tf

## Step 1. Convert MIDI to a NoteSequence Proto

In [None]:
# Read one MIDI file and convert it to a NoteSequence proto.
midi_path = '/content/drive/MyDrive/Colab Notebooks/magenta/magenta/scripts/INPUT_DIRECTORY/drummer1/session2/100_funk-rock_92_fill_4-4.mid'
with tf.gfile.GFile(midi_path, 'rb') as f:
  sequence = midi_io.midi_to_sequence_proto(f.read())
play(sequence)
sequence

## Step 2. Convert the Sequence to TFRecord

In [None]:
with tf.io.TFRecordWriter('test_tfrecord') as writer:
    writer.write(sequence.SerializeToString())

## Step 3. Convert All MIDI Files in INPUT_DIRECTORY to SEQUENCES_TFRECORD

- The following script applies the two preprocessing steps above to every MIDI file under the input directory.
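
As a rough illustration of what a directory conversion has to do before any record is written, the sketch below walks an input directory, collects MIDI files, and builds an id for each one. The id layout (`/id/midi/<collection>/<sha1 hex>`) is inferred from the record printed further down, and hashing the relative path with SHA-1 is an assumption. The helper names `find_midi_files` and `make_sequence_id` are hypothetical and are not part of Magenta.

```python
import hashlib
import os


def find_midi_files(root_dir, recursive=True):
    """Return sorted paths (relative to root_dir) of .mid/.midi files."""
    found = []
    for dirpath, dirnames, filenames in os.walk(root_dir):
        for name in filenames:
            if name.lower().endswith(('.mid', '.midi')):
                full_path = os.path.join(dirpath, name)
                found.append(os.path.relpath(full_path, root_dir))
        if not recursive:
            # Emptying dirnames in place stops os.walk from descending further.
            dirnames[:] = []
    return sorted(found)


def make_sequence_id(relative_path, collection_name, source_type='midi'):
    """Build an id shaped like the one in the record output (assumed scheme)."""
    fingerprint = hashlib.sha1(relative_path.encode('utf-8')).hexdigest()
    return '/id/%s/%s/%s' % (source_type, collection_name, fingerprint)
```

Calling `find_midi_files('INPUT_DIRECTORY')` before running the script is a quick way to confirm that the files sit where a converter would look. A top-level directory that only contains subfolders yields nothing unless the walk is recursive.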

In [10]:
cd /content/drive/MyDrive/Colab Notebooks/magenta/magenta/scripts

/content/drive/MyDrive/Colab Notebooks/magenta/magenta/scripts


In [11]:
!python convert_dir_to_note_sequences.py \
  --input_dir=INPUT_DIRECTORY \
  --output_file=SEQUENCES_TFRECORD \
  --recursive

2022-06-04 11:56:56.704682: E tensorflow/stream_executor/cuda/cuda_driver.cc:271] failed call to cuInit: CUDA_ERROR_NO_DEVICE: no CUDA-capable device is detected
INFO:tensorflow:Converting files in 'INPUT_DIRECTORY/'.
I0604 11:56:56.707351 139935290070912 convert_dir_to_note_sequences.py:82] Converting files in 'INPUT_DIRECTORY/'.
INFO:tensorflow:0 files converted.
I0604 11:56:56.710596 139935290070912 convert_dir_to_note_sequences.py:88] 0 files c

In [12]:
raw_dataset = tf.data.TFRecordDataset('SEQUENCES_TFRECORD')
for raw_record in raw_dataset.take(5):
  print(repr(raw_record))

<tf.Tensor: shape=(), dtype=string, numpy=b'\nA/id/midi/INPUT_DIRECTORY/fc6d3d6112b4cd2662fc6cd4b136e6e34ad6dc1e\x12.drummer1/session3/1_rock-prog_125_beat_4-4.mid\x1a\x0fINPUT_DIRECTORY \xe0\x03*\x04\x10\x04\x18\x042\x00:\t\x11\x00\x00\x00\x00\x00@_@B\x18\x08,\x10H\x19\xfc\xa9\xf1\xd2Mbp?!\xe1z\x14\xaeG\xe1\xba?H\x01B\x18\x08,\x10P\x19\x11X9\xb4\xc8v\xde?!\x10X9\xb4\xc8v\xe2?H\x01B\x18\x08,\x10K\x19\xbbI\x0c\x02+\x87\xee?!L7\x89A`\xe5\xf0?H\x01B\x18\x08(\x10n\x197\x89A`\xe5\xd0\xf4?!<\xdfO\x8d\x97n\xf6?H\x01B\x18\x08,\x10H\x19P\x8d\x97n\x12\x83\xf6?!T\xe3\xa5\x9b\xc4 \xf8?H\x01B\x18\x08(\x10\x7f\x197\x89A`\xe5\xd0\xf6?!<\xdfO\x8d\x97n\xf8?H\x01B\x18\x08&\x10#\x19k\xbct\x93\x18\x04\xf8?!\xd9\xce\xf7S\xe3\xa5\xf9?H\x01B\x18\x08,\x10P\x19\xea&1\x08\xac\x1c\xfe?!X9\xb4\xc8v\xbe\xff?H\x01B\x18\x08$\x10u\x19{\x14\xaeG\xe1z\xfe?!@5^\xbaI\x0c\x00@H\x01B\x18\x080\x10@\x19y\xe9&1\x08\xac\xfe?!\xbf\x9f\x1a/\xdd$\x00@H\x01B\x18\x08&\x10\x18\x19\xa0\x1a/\xdd$\x06\x00@!\xd7\xa3p=\n\xd7\x00@H\x01B\x

# Train

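- Hyperparameters are left at their default values
- Trained for 50k epochs, the lower end of the 50k to 100k range mentioned in the paper (the epoch counts in this notebook match the step numbers in the checkpoint names)
- In practice, the loss showed little change from about 33k epochs onward
- Training takes about 5 hours on a Colab GPU runtime
- Checkpoints under the default tmp path are lost when the runtime resets <BR>
  -> Changed the path to `weights/modelname` on Google Drive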

In [6]:
cd /content/drive/MyDrive/Colab Notebooks/magenta/magenta/models/music_vae

/content/drive/MyDrive/Colab Notebooks/magenta/magenta/models/music_vae


# Solution 1: Modify configs.py

- Modify `configs.py` <BR>
  Add `CONFIG_MAP['cat-drums_4bar_big']` (based on `cat-drums_2bar_big`) <BR>
  Set `max_seq_len=64` (4 bars w/ 16 steps per bar), <BR>
  Set `slice_bars=4` <BR>
  Based on the assignment requirement to extract `4-bar samples`
<BR><BR>
- `train option`: use `--config=cat-drums_4bar_big` <BR>
- `train option`: use `--tfds_name=groove/4bar-midionly` <BR>
  Based on the assignment requirement to use the `Groove MIDI Dataset`
<BR><BR>
- Training with the modified `configs.py` was attempted but failed <BR>
  `ValueError: Invalid config: cat-drums_4bar_big` was raised
- Because of time constraints, moved on to a different approach, `groovae_4bar`
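
The failure above is consistent with a name lookup that does not see the new entry. The sketch below uses a plain dictionary as a stand-in for `CONFIG_MAP` to show how a 4-bar entry could be derived from a 2-bar one, and how an unregistered name produces this kind of error. `DrumConfig`, `lookup_config`, and the values of the 2-bar entry are hypothetical; this is not Magenta's code. One cause worth checking (an assumption, not confirmed by the log) is whether the training script imports `configs` from the pip-installed package rather than from the edited file.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class DrumConfig:
    """Hypothetical stand-in holding only the two fields changed in the text."""
    max_seq_len: int
    slice_bars: int


STEPS_PER_BAR = 16

# Stand-in registry; the 2-bar values are illustrative.
CONFIG_MAP = {
    'cat-drums_2bar_big': DrumConfig(max_seq_len=2 * STEPS_PER_BAR, slice_bars=2),
}


def lookup_config(name, config_map):
    """Return the named config, or raise the error seen in the training log."""
    if name not in config_map:
        raise ValueError('Invalid config: %s' % name)
    return config_map[name]


# Derive the 4-bar entry from the 2-bar one and register it.
CONFIG_MAP['cat-drums_4bar_big'] = replace(
    CONFIG_MAP['cat-drums_2bar_big'],
    max_seq_len=4 * STEPS_PER_BAR,
    slice_bars=4,
)
```

Printing the module path of whatever object actually holds the registry (for example `configs.__file__` in the notebook) would show which copy of the file is being read.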

In [7]:
!python music_vae_train.py \
--config=cat-drums_4bar_big \
--run_dir=./weights/drums4bar \
--mode=train \
--tfds_name=groove/4bar-midionly \
--checkpoints_to_keep=100

Traceback (most recent call last):
  File "music_vae_train.py", line 339, in <module>
    console_entry_point()
  File "music_vae_train.py", line 335, in console_entry_point
    tf.app.run(main)
  File "/usr/local/lib/python3.7/dist-packages/tensorflow/python/platform/app.py", line 36, in run
    _run(main=main, argv=argv, flags_parser=_parse_flags_tolerate_undef)
  File "/usr/local/lib/python3.7/dist-packages/absl/app.py", line 312, in run
    _ru

# Solution 2: Modify preprocess_tfrecord.py

- Modify `preprocess_tfrecord.py` <BR>
  Change the `flags.DEFINE_bool` default for `'is_drum'` to `True` <BR>
  Change the `flags.DEFINE_bool` default for `'drums_only'` to `True` <BR>
  Based on the assignment requirement for `drum samples`
<BR><BR>
- `train option`: use `--config=groovae_4bar` <BR>
  Based on the assignment requirement to extract `4-bar samples`
- `train option`: use `--tfds_name=groove/4bar-midionly` <BR>
  Based on the assignment requirement to use the `Groove MIDI Dataset`


In [9]:
!python music_vae_train.py \
--config=groovae_4bar \
--run_dir=./weights/groovae4bar \
--mode=train \
--tfds_name=groove/4bar-midionly \
--checkpoints_to_keep=100

2022-06-05 04:11:33.052619: W tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:39] Overriding allow_growth setting because the TF_FORCE_GPU_ALLOW_GROWTH environment variable is set. Original config value was 0.
INFO:tensorflow:Building MusicVAE model with BidirectionalLstmEncoder, GrooveLstmDecoder, and hparams:
{'max_seq_len': 64, 'z_size': 256, 'free_bits': 48, 'max_beta': 0.2, 'beta_rate': 0.0, 'batch_size': 512, 'grad_clip': 1.0, 'clip_m

# Load Trained Model

- The MusicVAE notebook describes the models loaded by the code below as one-hot encoded

## Load Pretrained Model

- MusicVAE pretrained model `groovae_4bar.tar` [download](https://storage.googleapis.com/magentadata/models/music_vae/checkpoints/groovae_4bar.tar)
- `groovae_4bar.tar` contains `model.ckpt-2721`, a checkpoint trained for 2721 epochs
- Extract the archive into `weights/groovae4bar/train`
In [32]:
# Load One-hot Encoded Pretrained Model
drums_models = {}
drums_config = configs.CONFIG_MAP['groovae_4bar']
drums_models['groovae_4bar'] = TrainedModel(drums_config, batch_size=4, checkpoint_dir_or_path='./weights/groovae4bar/train/model.ckpt-2721')

INFO:tensorflow:Building MusicVAE model with BidirectionalLstmEncoder, GrooveLstmDecoder, and hparams:
{'max_seq_len': 64, 'z_size': 256, 'free_bits': 48, 'max_beta': 0.2, 'beta_rate': 0.0, 'batch_size': 4, 'grad_clip': 1.0, 'clip_mode': 'global_norm', 'grad_norm_clip_to_zero': 10000, 'learning_rate': 0.001, 'decay_rate': 0.9999, 'min_learning_rate': 1e-05, 'conditional': True, 'dec_rnn_size': [256, 256], 'enc_rnn_size': [512], 'dropout_keep_prob': 0.3, 'sampling_schedule': 'constant', 'sampling_rate': 0.0, 'use_cudnn': False, 'residual_encoder': False, 'residual_decoder': False, 'control_preprocessing_rnn_size': [256]}
INFO:tensorflow:
Encoder Cells (bidirectional):
  units: [512]

INFO:tensorflow:
Decoder Cells:
  units: [256, 256]





INFO:tensorflow:Restoring parameters from ./weights/groovae4bar/train/model.ckpt-2721




## Load Trained Model

- Load and try the model trained for 35k epochs.
- Load and try the model trained for 40k epochs.
- Load and try the model trained for 45k epochs.
- Load and try the model trained for 50k epochs.
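
The checkpoint to try can also be picked programmatically rather than by hand. The sketch below scans a directory for files named like `model.ckpt-<step>.index` (the usual TensorFlow checkpoint layout, assumed here) and returns the checkpoint prefix closest to a requested step. `closest_checkpoint` is a hypothetical helper.

```python
import os
import re

_CKPT_INDEX = re.compile(r'^(model\.ckpt-(\d+))\.index$')


def closest_checkpoint(train_dir, target_step):
    """Return the checkpoint prefix in train_dir whose step is nearest target_step."""
    candidates = []
    for name in os.listdir(train_dir):
        match = _CKPT_INDEX.match(name)
        if match:
            candidates.append((int(match.group(2)), match.group(1)))
    if not candidates:
        raise FileNotFoundError('No checkpoints found in %s' % train_dir)
    step, prefix = min(candidates, key=lambda c: abs(c[0] - target_step))
    return os.path.join(train_dir, prefix)
```

For example, `closest_checkpoint('./weights/groovae4bar/train', 35000)` would return a path that can be passed as `checkpoint_dir_or_path`.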

In [26]:
# Load One-hot Encoded Trained Model
drums_models = {}
drums_config = configs.CONFIG_MAP['groovae_4bar']
# drums_models['groovae_4bar'] = TrainedModel(drums_config, batch_size=4, checkpoint_dir_or_path='./weights/groovae4bar/train/model.ckpt-35026')
# drums_models['groovae_4bar'] = TrainedModel(drums_config, batch_size=4, checkpoint_dir_or_path='./weights/groovae4bar/train/model.ckpt-40106')
# drums_models['groovae_4bar'] = TrainedModel(drums_config, batch_size=4, checkpoint_dir_or_path='./weights/groovae4bar/train/model.ckpt-45182')
drums_models['groovae_4bar'] = TrainedModel(drums_config, batch_size=4, checkpoint_dir_or_path='./weights/groovae4bar/train/model.ckpt-50058')

INFO:tensorflow:Building MusicVAE model with BidirectionalLstmEncoder, GrooveLstmDecoder, and hparams:
{'max_seq_len': 64, 'z_size': 256, 'free_bits': 48, 'max_beta': 0.2, 'beta_rate': 0.0, 'batch_size': 4, 'grad_clip': 1.0, 'clip_mode': 'global_norm', 'grad_norm_clip_to_zero': 10000, 'learning_rate': 0.001, 'decay_rate': 0.9999, 'min_learning_rate': 1e-05, 'conditional': True, 'dec_rnn_size': [256, 256], 'enc_rnn_size': [512], 'dropout_keep_prob': 0.3, 'sampling_schedule': 'constant', 'sampling_rate': 0.0, 'use_cudnn': False, 'residual_encoder': False, 'residual_decoder': False, 'control_preprocessing_rnn_size': [256]}
INFO:tensorflow:
Encoder Cells (bidirectional):
  units: [512]

INFO:tensorflow:
Decoder Cells:
  units: [256, 256]





INFO:tensorflow:Restoring parameters from ./weights/groovae4bar/train/model.ckpt-45182




# Generate Samples

## Generate 4 Samples (`drums_samples`) from the Model Loaded Above

- 1 bar = 16 steps
- 4 bars = 4 * 16 steps (`length` = 64)
- Generate 4 samples (`n` = 4)
- Generated samples are saved in `generated_sample`
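
The arithmetic behind `length=64` can be checked directly. The sketch below assumes 4/4 time, so 16 steps per bar corresponds to 4 steps per quarter note, and uses the 120 qpm tempo that appears in the sample output to estimate how long one generated sample lasts. The function names are hypothetical.

```python
STEPS_PER_BAR = 16
BEATS_PER_BAR = 4  # assumes 4/4 time


def sequence_length_steps(num_bars, steps_per_bar=STEPS_PER_BAR):
    """Number of model steps needed to cover num_bars bars."""
    return num_bars * steps_per_bar


def duration_seconds(num_steps, qpm, steps_per_bar=STEPS_PER_BAR,
                     beats_per_bar=BEATS_PER_BAR):
    """Length in seconds of num_steps steps at the given tempo."""
    steps_per_quarter = steps_per_bar / beats_per_bar
    quarters = num_steps / steps_per_quarter
    return quarters * 60.0 / qpm


length = sequence_length_steps(4)
print(length, duration_seconds(length, qpm=120.0))  # prints: 64 8.0
```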

In [33]:
temperature = 0.5
drums_sample_model = 'groovae_4bar'
drums_samples = drums_models[drums_sample_model].sample(n=4, length=64, temperature=temperature)
for ns in drums_samples:
  play(ns)

In [34]:
# Generated Sample Data Sequence
drums_samples[0]

tempos {
  qpm: 120.0
}
notes {
  pitch: 51
  velocity: 62
  start_time: 0.2523536297958344
  end_time: 0.3773536297958344
  instrument: 9
  is_drum: true
}
notes {
  pitch: 36
  velocity: 44
  start_time: 0.36968966154381633
  end_time: 0.49468966154381633
  instrument: 9
  is_drum: true
}
notes {
  pitch: 38
  velocity: 100
  start_time: 0.49327755346894264
  end_time: 0.6182775534689426
  instrument: 9
  is_drum: true
}
notes {
  pitch: 51
  velocity: 107
  start_time: 0.49771563499234617
  end_time: 0.6227156349923462
  instrument: 9
  is_drum: true
}
notes {
  pitch: 51
  velocity: 84
  start_time: 0.7533439798280597
  end_time: 0.8783439798280597
  instrument: 9
  is_drum: true
}
notes {
  pitch: 38
  velocity: 95
  start_time: 0.8773430874571204
  end_time: 1.0023430874571204
  instrument: 9
  is_drum: true
}
notes {
  pitch: 51
  velocity: 80
  start_time: 1.00253982283175
  end_time: 1.12753982283175
  instrument: 9
  is_drum: true
}
notes {
  pitch: 38
  velocity: 92
  start_

## Optional: Download Generated MIDI Samples

- Save the generated samples as MIDI files.
- File name format: `{model name}_sample_{index}.mid`

In [35]:
# Optionally Download Generated MIDI Samples.
for idx, ns in enumerate(drums_samples):
  download(ns, f'{drums_sample_model}_sample_{idx}.mid')


# Conclusion

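- Compared samples generated with the trained model
- Compared samples generated with the pretrained model
- Both the pretrained and the trained model successfully generated drum-only output
- The more epochs the model was trained for, the more complex and varied the generated drum patterns
- Subjectively, the trained model generates more complex and varied drum patterns than the pretrained model <BR>
  -> Presumably because it was trained for more epochs, as the paper suggests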
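
The comparison above is based on listening. One rough way to put numbers on "more complex and varied" is to count notes and distinct drum pitches per sample. The sketch below works on plain `(pitch, start_time)` tuples as a stand-in for the `notes` field of a generated sequence. `pattern_stats` and the chosen metrics are hypothetical and were not used in this notebook.

```python
def pattern_stats(notes, total_seconds):
    """Summarise a drum pattern given (pitch, start_time) tuples."""
    if total_seconds <= 0:
        raise ValueError('total_seconds must be positive')
    pitches = {pitch for pitch, _ in notes}
    return {
        'note_count': len(notes),
        'distinct_pitches': len(pitches),
        'notes_per_second': len(notes) / total_seconds,
    }


# First three notes of the sample shown earlier, start times rounded.
example = [(51, 0.252), (36, 0.370), (38, 0.493)]
print(pattern_stats(example, total_seconds=8.0))
```

For real sequences the tuples could be built with `[(n.pitch, n.start_time) for n in ns.notes]`, and the same statistics compared across checkpoints.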