<a href="https://colab.research.google.com/github/carlosholivan/ColabNotebooksforAudio/blob/master/1.Librosa_wav_to_MIDI_colab.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

<img src="https://www.unizar.es/sites/default/files/identidadCorporativa/imagen/logoUZ.png"  width="480">

# <a name="top"></a>WAV TO MIDI ISOLATED VOICE (07/05/2020)


Authors: José Ramón Beltrán and Carlos Hernández

Department of Electronic Engineering and Communications, Universidad de Zaragoza, Calle María de Luna 3, 50018 Zaragoza

This notebook converts a mono WAV audio file into a MIDI file, using only the Librosa library for the audio analysis. Most of the code is based on Steve Tjoa's notebooks for Music Information Retrieval: [GitHub](https://github.com/stevetjoa/musicinformationretrieval.com)

## Table of Contents

- [1. Librosa Onsets Detection and Pitch Estimation](#librosa)
    - [1.1. Librosa Onsets Detection](#librosa-onsets)
    - [1.2. Librosa Pitch Estimation](#librosa-pitch)
    - [1.3. MIDI Writing](#librosa-midi)
    - [1.4. Results](#librosa-results)
    - [1.5. Conclusions and Future Work](#librosa-conclusions)


## INSTALLING DEPENDENCIES

In [0]:
#@title Install MIDIUtil
!pip install MIDIUtil

In [0]:
#@title Install Librosa

!pip install librosa

In [0]:
#@title Install Google Magenta

#@test {"output": "ignore"}
%tensorflow_version 1.x

print('Installing dependencies...')
!apt-get update -qq && apt-get install -qq libfluidsynth1 fluid-soundfont-gm build-essential libasound2-dev libjack-dev
!pip install -qU pyfluidsynth pretty_midi

!pip install -qU magenta

# Hack to allow python to pick up the newly-installed fluidsynth lib. 
# This is only needed for the hosted Colab environment.
import ctypes.util
orig_ctypes_util_find_library = ctypes.util.find_library
def proxy_find_library(lib):
  if lib == 'fluidsynth':
    return 'libfluidsynth.so.1'
  else:
    return orig_ctypes_util_find_library(lib)
ctypes.util.find_library = proxy_find_library

print('Importing libraries and defining some helper functions...')
from google.colab import files

import magenta.music as mm
import magenta
import tensorflow

print('🎉 Done!')
print(magenta.__version__)
print(tensorflow.__version__)

<img src="https://4.bp.blogspot.com/-WELZsAfX1U0/Vl7UxvJNHdI/AAAAAAAAF34/9Kl1x1y0Uv4/s1600/separador.png" style="width:500px;"/>

In [0]:
#@title Upload Audio File (wav)
import os

from google.colab import files
uploaded = files.upload()

for name, data in uploaded.items():
  with open(name, 'wb') as f:
    f.write(data)
    #os.rename(f.name, 'file.wav')
    song = f.name[:-4]

# <a name="librosa"></a>1. LIBROSA ONSETS DETECTION AND PITCH ESTIMATION

In [0]:
import seaborn
import numpy, scipy, IPython.display as ipd, matplotlib.pyplot as plt
import librosa, librosa.display
from midiutil import MIDIFile

## <a name="librosa-onsets"></a>1.1. LIBROSA ONSETS DETECTION

In [0]:
wav = song + '.wav'
x, sr = librosa.load(wav)

bins_per_octave = 36

hop_length = 250
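# Pass the signal as a keyword argument: recent librosa releases no longer accept it positionally.
onset_env = librosa.onset.onset_strength(y=x, sr=sr, hop_length=hop_length)

onset_samples = librosa.onset.onset_detect(y=x,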
                                           sr=sr, units='samples', 
                                           hop_length=hop_length, 
                                           backtrack=False,
                                           pre_max=20,
                                           post_max=20,
                                           pre_avg=100,
                                           post_avg=100,
                                           delta=0.2,
                                           wait=0)
onset_boundaries = numpy.concatenate([[0], onset_samples, [len(x)]])
onset_times = librosa.samples_to_time(onset_boundaries, sr=sr)
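
The boundary construction in the cell above can be illustrated on its own. The following is a minimal sketch using only NumPy; the signal length and onset positions are made-up values, and the division by `sr` is assumed to be equivalent to `librosa.samples_to_time`.

```python
import numpy as np

sr = 22050                      # default sampling rate of librosa.load
n_samples = 3 * sr              # hypothetical 3-second signal
onset_samples = np.array([11025, 33075, 44100])  # made-up onset positions

# Same construction as above: add the start and the end of the signal.
onset_boundaries = np.concatenate([[0], onset_samples, [n_samples]])

# Assumed equivalent to librosa.samples_to_time(onset_boundaries, sr=sr).
onset_times = onset_boundaries / sr

# Each pair of consecutive boundaries delimits one segment to transcribe.
segments = list(zip(onset_boundaries[:-1], onset_boundaries[1:]))
print(onset_times)
print(len(segments))
```

With three detected onsets there are five boundaries and therefore four segments, which is why the later loops run over `len(onset_boundaries)-1` items.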

## <a name="librosa-pitch"></a>1.2. LIBROSA PITCH ESTIMATION


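The pitch of each segment is estimated from its autocorrelation, computed with `librosa.autocorrelate`. The fundamental period is taken as the lag of the largest autocorrelation value between `sr/fmax` and `sr/fmin` samples: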

In [0]:
def estimate_freq(segment, sr, fmin=50.0, fmax=2000.0):
    
    # Compute autocorrelation of input segment.
    r = librosa.autocorrelate(segment)
    
    # Define lower and upper limits for the autocorrelation argmax.
    i_min = sr/fmax
    i_max = sr/fmin
    r[:int(i_min)] = 0
    r[int(i_max):] = 0
    
    # Find the location of the maximum autocorrelation.
    i = r.argmax()
    f0 = float(sr)/i
    return f0
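
To check the idea behind `estimate_freq`, the sketch below applies the same lag search to a synthetic tone. It is not the notebook's function: `estimate_freq_np` is a hypothetical name, `numpy.correlate` stands in for `librosa.autocorrelate`, and the guard for a zero lag is an added assumption.

```python
import numpy as np

def estimate_freq_np(segment, sr, fmin=50.0, fmax=2000.0):
    # Autocorrelation for non-negative lags only.
    r = np.correlate(segment, segment, mode="full")[len(segment) - 1:]
    # Ignore lags outside [sr/fmax, sr/fmin], as estimate_freq does.
    r[:int(sr / fmax)] = 0
    r[int(sr / fmin):] = 0
    lag = int(r.argmax())
    if lag == 0:
        return None  # no usable peak, e.g. a silent segment
    return sr / lag

sr = 22050
t = np.arange(2205) / sr                 # 100 ms of signal
tone = np.sin(2 * np.pi * 441.0 * t)     # 441 Hz: 50 samples per period
f0 = estimate_freq_np(tone, sr)
print(f0)                                # expected to be close to 441 Hz
```

Because the lag is an integer number of samples, the estimate is quantized to `sr / lag`; a tone whose period is not a whole number of samples is only approximated.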

In [0]:
def estimate_pitch(x, onset_samples, i, sr):
    n0 = onset_samples[i]
    n1 = onset_samples[i+1]
    f0 = estimate_freq(x[n0:n1], sr) 
    m = 12*(numpy.log2(f0) - numpy.log2(440.0)) + 69 #MIDI number
    return m
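
The frequency-to-MIDI conversion used in `estimate_pitch` can be tried in isolation. In this small sketch the helper name `hz_to_midi_number` is hypothetical; `librosa.hz_to_midi` offers the same conversion.

```python
import numpy as np

def hz_to_midi_number(f0):
    # Same formula as in estimate_pitch: A4 (440 Hz) maps to MIDI note 69,
    # and each octave (doubling of frequency) adds 12 semitones.
    return 12 * (np.log2(f0) - np.log2(440.0)) + 69

a4 = hz_to_midi_number(440.0)
a5 = hz_to_midi_number(880.0)
c4 = int(np.round(hz_to_midi_number(261.63)))  # rounded as in the MIDI-writing cell
print(a4, a5, c4)
```

The result is a fractional note number; it only becomes a valid MIDI pitch after rounding to the nearest integer.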

In [0]:
signal = []
for i in range(len(onset_boundaries)-1):
    y = estimate_pitch(x, onset_boundaries, i, sr=sr)
    signal.append(y)

## <a name="librosa-midi"></a>1.3. MIDI WRITING

In [0]:
# ---------------------------write MIDI file------------------------------
# create your MIDI object
mf = MIDIFile(numTracks = 1)     # only 1 track
track = 0   # the only track

time_lib = 0    # start at the beginning
bpm = 110
mf.addTrackName(track, time_lib, "Sample Track")
mf.addTempo(track, time_lib, bpm)
ms_in_1beat = 60000/bpm

# add some notes
channel = 0
volume = 100

for i in range(len(onset_times)-1):
    pitch = int(numpy.round(signal[i]))     # nearest integer MIDI note number
    time_lib = onset_times[i]*1000 / ms_in_1beat          # note start, in beats
    duration = (onset_times[i+1]-onset_times[i])*1000 / ms_in_1beat        # note length, in beats
    mf.addNote(track, channel, pitch, time_lib, duration, volume)
    
# write it to disk
with open('Librosa_' + song + '.mid', 'wb') as outf:
    mf.writeFile(outf)
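
MIDIUtil expects note start times and durations in beats by default, so the cell above converts seconds to beats through `ms_in_1beat`. The sketch below shows that this is the same as multiplying by `bpm / 60`; the helper `seconds_to_beats` and the 1.5 s onset time are illustrative assumptions.

```python
def seconds_to_beats(seconds, bpm):
    # One beat lasts 60 / bpm seconds, so beats = seconds * bpm / 60.
    return seconds * bpm / 60.0

bpm = 110
ms_in_1beat = 60000 / bpm

t = 1.5                             # hypothetical onset time in seconds
via_ms = t * 1000 / ms_in_1beat     # form used in the MIDI-writing cell
direct = seconds_to_beats(t, bpm)   # same value without the millisecond step
print(via_ms, direct)
```

The choice of `bpm` only changes how the beats are labeled: because the tempo written to the file is the same value used for the conversion, the notes should play back at their original times in seconds.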

## <a name="librosa-results"></a>1.4. LIBROSA RESULTS

In [0]:
# General function to plot the piano-roll representation and compare the MIDI with the wav
import bokeh.plotting  # import the submodule explicitly, since it is used below
import magenta.music as mm

def midi_pianoroll(file):
    print(file)

    note_seq = mm.midi_file_to_sequence_proto(file)

    # This is a colab utility method that visualizes a NoteSequence.
    fig = mm.plot_sequence(note_seq, show_figure=False)
    bokeh.plotting.output_notebook()
    bokeh.plotting.show(fig)

    # This is a colab utility method that plays a NoteSequence.
    mm.play_sequence(note_seq,synth=mm.fluidsynth)
    return

In [0]:
# Original song
print('Original song wav')
ipd.Audio(song + '.wav', rate=sr)

In [0]:
# -------------------MIDI-------------------------
# MIDI file from librosa
file = 'Librosa_' + song + '.mid'
midi_pianoroll(file)

In [0]:
#@title Download Librosa MIDI File (optional)

files.download('Librosa_' + song + '.mid')

## <a name="librosa-conclusions"></a>1.5. CONCLUSIONS AND FUTURE WORK

* Silences are not removed, so every segment between onsets is written as a note -> remove silences before pitch estimation
* Onset detection is not accurate with some voice effects -> try other libraries
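
As a starting point for the silence issue, one possible approach (not part of this notebook) is to flag low-energy segments before writing notes. The sketch below is only an assumption of how that could look: the helper `is_silent` and the -40 dB threshold are hypothetical and would need tuning on real recordings.

```python
import numpy as np

def is_silent(segment, threshold_db=-40.0):
    # RMS level in dB relative to full scale (amplitude 1.0).
    rms = np.sqrt(np.mean(np.square(segment))) if len(segment) else 0.0
    level_db = 20 * np.log10(max(rms, 1e-10))  # floor avoids log10(0)
    return bool(level_db < threshold_db)

sr = 22050
t = np.arange(sr // 10) / sr
tone = 0.5 * np.sin(2 * np.pi * 440.0 * t)                          # clearly audible
quiet = 0.001 * np.random.default_rng(0).standard_normal(sr // 10)  # faint noise
print(is_silent(tone), is_silent(quiet))
```

In the MIDI-writing loop, a segment `x[n0:n1]` flagged this way could be skipped instead of being passed to `mf.addNote`.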

## <a name="librosa-references"></a>REFERENCES

https://musicinformationretrieval.com/pitch_transcription_exercise.html


<img src="https://4.bp.blogspot.com/-WELZsAfX1U0/Vl7UxvJNHdI/AAAAAAAAF34/9Kl1x1y0Uv4/s1600/separador.png" style="width:500px;"/>