# Audio2Midi Transcription for Piano
***

ROLI has been developing a modular digital keyboard with RGB illumination that guides users as they learn and play their favorite music. The LUMI APP provides extensive tutorial videos and, more generally, song content with musical metadata that can be displayed in real time on the light-up keyboard. This gives visual guidance and gamification elements that engage learners at various experience levels, since learning is always more fun with a favorite song.

<img src = "./imgs/lumi.jpg" style = "width: 800px"/>

As an important element in realizing the desired user experience through LUMI and LUMI APP, machine-learning-assisted music information retrieval (ML-MIR) is an active research area for me and my team at ROLI. This notebook showcases some of our basic work on piano audio transcription. Based on state-of-the-art work ([Reference 1 - An End-to-End Neural Network for Polyphonic Piano Music Transcription](https://arxiv.org/abs/1508.01774), [Reference 2 - Onsets and Frames: Dual-Objective Piano Transcription](https://arxiv.org/abs/1710.11153)), we've explored ...

In [2]:
import tensorflow as tf
import librosa
import numpy as np

from magenta.common import tf_utils
from magenta.music import audio_io
import magenta.music as mm
from magenta.models.onsets_frames_transcription import audio_label_data_utils
from magenta.models.onsets_frames_transcription import configs
from magenta.models.onsets_frames_transcription import constants
from magenta.models.onsets_frames_transcription import data
from magenta.models.onsets_frames_transcription import infer_util
from magenta.models.onsets_frames_transcription import train_util
from magenta.music import midi_io
from magenta.protobuf import music_pb2
from magenta.music import sequences_lib

### Dataset

The piano onset detector is built around the MAPS (Disklavier) dataset, which consists of 17.9 hours of piano playing and is used as the test set. The training set is artificially generated with ROLI's Equator synthesizer.

### The Model

<img src='./imgs/CRNN-Onset Stack.jpg' style = 'width:800px' />


### Training

+ Adam optimizer for 20,000 epochs
+ Learning-rate warm-up for 500 epochs
+ 32 hours on a single local GPU
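
The learning-rate warm-up listed above can be sketched as a simple schedule function. This is a minimal illustration, assuming a linear ramp followed by a constant rate; the ramp shape, the `base_lr` value, and the function name `warmup_learning_rate` are assumptions, not the settings used for the training run described here.

```python
def warmup_learning_rate(step, base_lr=6e-4, warmup_steps=500):
    """Return the learning rate to use at a given training step.

    Assumption: the rate ramps linearly from near zero up to `base_lr`
    over `warmup_steps` updates and then stays constant. `base_lr` is a
    hypothetical value chosen only for illustration.
    """
    if warmup_steps <= 0:
        return base_lr
    # Fraction of the warm-up completed, capped at 1.0 once it is over.
    scale = min(1.0, (step + 1) / warmup_steps)
    return base_lr * scale


# Inspect the schedule at a few points of the warm-up.
for step in (0, 249, 499, 5000):
    print(step, warmup_learning_rate(step))
```

Starting with a small rate keeps the first Adam updates small while its moment estimates are still unreliable, which is the usual motivation for a warm-up phase.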

In [3]:
# Assumption: CHECKPOINT_DIR was never defined in the original notebook.
# Point it at the directory that holds the trained checkpoint.
CHECKPOINT_DIR = 'path/to/checkpoint'  # hypothetical placeholder

# Load the onsets-and-frames configuration and adapt it for CPU inference.
config = configs.CONFIG_MAP['onsets_frames']
hparams = config.hparams
hparams.use_cudnn = False
hparams.batch_size = 1

# Serialized examples are fed in at run time through this placeholder.
examples = tf.placeholder(tf.string, [None])

dataset = data.provide_batch(
    examples=examples,
    preprocess_examples=True,
    params=hparams,
    is_training=False,
    shuffle_examples=False,
    skip_n_initial_records=0)

estimator = train_util.create_estimator(
    config.model_fn, CHECKPOINT_DIR, hparams)

iterator = dataset.make_initializable_iterator()
next_record = iterator.get_next()

In [4]:
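# NOTE: `input_fn` is not defined in this notebook. It must be a callable
# that returns a tf.data.Dataset for the audio to transcribe (assumption:
# built from `next_record` in the previous cell).
prediction_list = list(
    estimator.predict(
        input_fn,
        yield_single_examples=False))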
assert len(prediction_list) == 1

frame_predictions = prediction_list[0]['frame_predictions'][0]
onset_predictions = prediction_list[0]['onset_predictions'][0]
velocity_values = prediction_list[0]['velocity_values'][0]

sequence_prediction = sequences_lib.pianoroll_to_note_sequence(
    frame_predictions,
    frames_per_second=data.hparams_frames_per_second(hparams),
    min_duration_ms=0,
    min_midi_pitch=constants.MIN_MIDI_PITCH,
    onset_predictions=onset_predictions,
    velocity_values=velocity_values)

# Ignore warnings caused by pyfluidsynth
import warnings
warnings.filterwarnings("ignore", category=DeprecationWarning) 

mm.plot_sequence(sequence_prediction)
mm.play_sequence(sequence_prediction, mm.midi_synth.fluidsynth,
                 colab_ephemeral=False)

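
To make the conversion step above more concrete, here is a simplified sketch of how frame and onset predictions could be decoded into notes. It is not Magenta's implementation of `pianoroll_to_note_sequence`. It assumes that a note may only start where an onset is predicted, that it lasts while the frame stays active, and that a new onset on the same pitch retriggers the note. The function name `decode_notes` and the default `min_midi_pitch=21` (A0, the lowest piano key) are assumptions made for illustration.

```python
import numpy as np


def decode_notes(frames, onsets, frames_per_second, min_midi_pitch=21):
    """Decode boolean piano rolls into (midi_pitch, start_sec, end_sec) tuples.

    frames, onsets: arrays of shape [num_frames, num_pitches].
    Simplified rule (an assumption, not Magenta's exact logic): a note
    starts only at a predicted onset and ends when the pitch goes inactive
    or a new onset retriggers it.
    """
    frames = np.asarray(frames, dtype=bool)
    onsets = np.asarray(onsets, dtype=bool)
    num_frames, num_pitches = frames.shape
    notes = []
    for p in range(num_pitches):
        pitch = p + min_midi_pitch
        start = None  # frame index at which the current note began
        for t in range(num_frames):
            active = frames[t, p] or onsets[t, p]
            new_onset = onsets[t, p] and (t == 0 or not onsets[t - 1, p])
            if new_onset:
                # Close any note still sounding, then start a new one.
                if start is not None:
                    notes.append((pitch, start / frames_per_second,
                                  t / frames_per_second))
                start = t
            elif start is not None and not active:
                notes.append((pitch, start / frames_per_second,
                              t / frames_per_second))
                start = None
        if start is not None:
            # The note is still sounding at the end of the roll.
            notes.append((pitch, start / frames_per_second,
                          num_frames / frames_per_second))
    return notes


# Toy example: pitch 0 has an onset; pitch 1 is active but has no onset.
frames = np.array([[1, 0], [1, 1], [1, 1], [0, 1], [0, 1]])
onsets = np.array([[1, 0], [0, 0], [0, 0], [0, 0], [0, 0]])
print(decode_notes(frames, onsets, frames_per_second=2.0))
```

Requiring an onset before a note can start suppresses spurious notes from frame activations that have no matching attack, which is the idea behind passing `onset_predictions` to the conversion.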