Copyright 2020 Carlos Hernández Oliván.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at

https://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/calosholivan/AIMusicGeneration/colab_notebooks/SourcesSeparationComparation_and_WavToMidi_v2.ipynb)

![picture](https://www.clubexcelencia.org/sites/default/files/fotos/images/noticias/visionarios/universidadzaragoza01.jpg)

# WAV TO MIDI


Author: Carlos Hernández [Github](https://github.com/carlosholivan)

Department of Electronic Engineering and Communications, Universidad de Zaragoza, Calle María de Luna 3, 50018 Zaragoza

This notebook compares two source separation libraries: Demucs by Facebook and Spleeter by Deezer.

WARNING: The uploaded file **must** be an .mp3 file. Later cells change the file extension to .wav, but the audio itself stays MP3-encoded, so every output file has .mp3 quality.
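
Because only the extension changes, a file named `.wav` can still contain MP3 data. The following is a minimal sketch, assuming a hypothetical helper `sniff_audio_format` (not part of this notebook), that looks at the first bytes of a file to guess its real format. It is a heuristic that only recognises WAV and MP3.

```python
def sniff_audio_format(header: bytes) -> str:
    """Guess the audio format from the first bytes of a file.

    Hypothetical helper for illustration; it only recognises WAV and MP3.
    """
    # WAV files are RIFF containers: b'RIFF' + 4 size bytes + b'WAVE'.
    if len(header) >= 12 and header[:4] == b'RIFF' and header[8:12] == b'WAVE':
        return 'wav'
    # MP3 files often start with an ID3v2 tag ...
    if header[:3] == b'ID3':
        return 'mp3'
    # ... or directly with an MPEG frame sync (eleven set bits).
    if len(header) >= 2 and header[0] == 0xFF and (header[1] & 0xE0) == 0xE0:
        return 'mp3'
    return 'unknown'


def sniff_file(path: str) -> str:
    """Read the first 12 bytes of a file and guess its format."""
    with open(path, 'rb') as f:
        return sniff_audio_format(f.read(12))
```

For example, `sniff_file('/content/file.mp3')` could be called after the upload step to confirm that the input really is an MP3.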

INSTRUCTIONS:

Run every cell in the notebook **in order**. Cells marked "(optional)" can be skipped.


![picture](https://drive.google.com/uc?id=17qBD02Te8vg04v7z4QWZydZchTQ8S8vr)

## Table of contents

1. [Sources Separation](#sources-separation)
  * [DEMUCS by Facebook](#demucs-by-facebook)
  * [SPLEETER by Deezer](#spleeter-by-deezer)
  * [SPLEETER vs DEMUCS](#spleeter-vs-demucs)
2. [Onset Detection](#onset-detection)  
  * [MAGENTA ONSETS AND FRAMES - Polyphonic piano music transcription](#magenta-onsets)

In [0]:
#@title Upload Audio File (mp3)

from google.colab import files
uploaded = files.upload()


Saving Clouds_20By_20Nagra_20Beats.mp3 to Clouds_20By_20Nagra_20Beats.mp3


In [0]:
#@title Rewrite file name
import os

for name, data in uploaded.items():
  with open(name, 'wb') as f:
    f.write(data)
  # Rename only after the file has been closed, so all data is flushed to disk.
  os.rename(name, 'file.mp3')
  print('saved file with name:', name)

saved file with name: Clouds_20By_20Nagra_20Beats.mp3


## <a name="sources-separation"></a>SOURCES SEPARATION 

![picture](https://www.researchgate.net/profile/Thanh_Duong11/publication/335339440/figure/fig2/AS:794976699047936@1566548623211/Audio-source-separation.ppm)

### <a name="demucs-by-facebook"></a>DEMUCS by Facebook

It separates 4 sources: DRUMS, BASS, VOCALS and "OTHERS"

Dataset : MusDB [[Website]](https://sigsep.github.io/datasets/musdb.html#musdb18-compressed-stems)

Architecture: an encoder/decoder composed of a convolutional encoder, a bidirectional LSTM and a convolutional decoder. The encoder and decoder are linked with U-Net skip connections.
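
To illustrate how such an encoder/decoder fits together, the sketch below traces only the length of the time axis through strided convolutions, a length-preserving recurrent layer, and transposed convolutions, and pairs up the levels that a U-Net skip connection would join. The kernel size, stride and depth are illustrative assumptions rather than values taken from this notebook, and the function names are hypothetical.

```python
def conv_out_len(length, kernel, stride):
    """Output length of a 1-D convolution without padding."""
    return (length - kernel) // stride + 1


def tconv_out_len(length, kernel, stride):
    """Output length of a 1-D transposed convolution without padding."""
    return (length - 1) * stride + kernel


def trace_unet_lengths(length, depth=3, kernel=8, stride=4):
    """Trace the time-axis length through an encoder/LSTM/decoder stack."""
    # Encoder: each strided convolution shortens the sequence.
    enc = [length]
    for _ in range(depth):
        enc.append(conv_out_len(enc[-1], kernel, stride))

    # The bidirectional LSTM keeps the sequence length unchanged.
    dec = [enc[-1]]

    # Decoder: each transposed convolution lengthens the sequence again.
    for _ in range(depth):
        dec.append(tconv_out_len(dec[-1], kernel, stride))

    # Skip connections join the input of decoder layer i with the encoder
    # output at the same resolution, so the paired lengths must match.
    skips = [(enc[depth - i], dec[i]) for i in range(depth)]
    return enc, dec, skips


enc, dec, skips = trace_unet_lengths(340)
print(enc)    # lengths after each encoder layer
print(dec)    # lengths after each decoder layer
print(skips)  # (encoder length, decoder length) pairs joined by skips
```

With an input of 340 samples the lengths mirror each other exactly; for arbitrary input lengths the two sides can differ by a few samples, which is why real implementations pad or crop before combining them.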

![picture](https://techdroy.com/wp-content/uploads/2019/12/demucs-inteligencia-artificial-separa-canciones-pistas-scaled.png.webp)

[[Github]](https://github.com/facebookresearch/demucs) [[Paper]](https://hal.archives-ouvertes.fr/hal-02379796/document)

In [0]:
#@title Anaconda Download

!wget -c https://repo.continuum.io/archive/Anaconda3-5.1.0-Linux-x86_64.sh

In [0]:
#@title Permissions

!chmod +x Anaconda3-5.1.0-Linux-x86_64.sh

In [0]:
#@title Anaconda Path

!bash ./Anaconda3-5.1.0-Linux-x86_64.sh -b -f -p /usr/local

In [0]:
#@title Demucs Download
import sys
sys.path.append('/usr/local/lib/python3.7/site-packages/')

!git clone https://github.com/facebookresearch/demucs

In [0]:
#@title Go to Demucs Path

os.chdir('demucs')

In [0]:
#@title Conda Update

!conda update -n base conda

In [0]:
#@title Demucs Model Download
!conda env update -f environment-cpu.yml

In [0]:
#@title Conda init

!conda init bash

In [0]:
#@title Demucs Activation
!conda activate demucs


In [0]:
#@title Install Pytorch
!pip install torch numpy scipy

In [0]:
#@title Copy File in Demucs Directory
!cp '/content/file.mp3' demucs/

In [0]:
#@title Go to Demucs Directory
!cd demucs

In [0]:
#@title Demucs Separation
!pip install tqdm
!python3 -m pip install -U lameenc # necessary to export as .mp3
!python3 -m demucs.separate --dl -n demucs --mp3 -d cpu '/content/file.mp3' 

In [0]:
#@title Create Directories
!mkdir /output/
!mkdir /output/demucs/

In [0]:
#@title Move Files
!mv /content/demucs/separated/demucs/ /output/demucs/

### <a name="spleeter-by-deezer"></a>SPLEETER by Deezer


![picture](https://lh3.googleusercontent.com/proxy/CNqC--BTKG0sl83gZTbWPk3xNs_JfgVkUn8uKX87n11gppZA6Fk00Ki5nvPxxnxvx9kvNaQ5P84hmxOcKAx1UhbAfxRKmmZmynocYwCSUXl_BXZ0SXUVLPnj_DRqCINEzqZ8zmlbsS_rwl2vtWQ)

Pre-trained models:
* Vocals (singing voice) / accompaniment separation (2 stems)
* Vocals / drums / bass / other separation (4 stems)
* Vocals / drums / bass / piano / other separation (5 stems)
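
As a small sketch of what each pre-trained model produces, the helper below maps a model name to the stem files one would expect on disk. The stem names follow the list above; the `<output>/<track>/<stem>.<codec>` layout matches the paths used later in this notebook, and the helper name `expected_stem_files` is hypothetical.

```python
import posixpath

# Stems produced by each pre-trained Spleeter model, as listed above.
SPLEETER_STEMS = {
    'spleeter:2stems': ['vocals', 'accompaniment'],
    'spleeter:4stems': ['vocals', 'drums', 'bass', 'other'],
    'spleeter:5stems': ['vocals', 'drums', 'bass', 'piano', 'other'],
}


def expected_stem_files(model, track_name, output_dir, codec='mp3'):
    """Return the stem paths expected after separating `track_name`.

    Assumes the <output_dir>/<track_name>/<stem>.<codec> layout.
    """
    return [
        posixpath.join(output_dir, track_name, f'{stem}.{codec}')
        for stem in SPLEETER_STEMS[model]
    ]


for path in expected_stem_files('spleeter:4stems', 'file', '/output/spleeter'):
    print(path)
```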

[[Github]](https://github.com/deezer/spleeter) [[GoogleColab]](https://colab.research.google.com/github/deezer/spleeter/blob/master/spleeter.ipynb)

Papers: 

* Andreas Jansson, Eric J. Humphrey, Nicola Montecchio, Rachel Bittner, Aparna Kumar, and Tillman Weyde. Singing voice separation with deep U-Net convolutional networks. In Proceedings of the International Society for Music Information Retrieval Conference (ISMIR), pages 323–332, 2017. [[Link]](https://openaccess.city.ac.uk/id/eprint/19289/1/)

![picture](https://d3i71xaburhd42.cloudfront.net/83ea11b45cba0fc7ee5d60f608edae9c1443861d/3-Figure1-1.png)

Other references to see: 

* Stylianos I. Mimilakis, Konstantinos Drossos, and Gerald Schuller. Unsupervised interpretable representation learning for singing voice separation. arXiv preprint arXiv:2003.01567, 2020. [[Link]](https://arxiv.org/pdf/2003.01567.pdf)

In [0]:
#@title ffmpeg installation

!apt install ffmpeg

In [0]:
#@title Spleeter Installation

!pip install spleeter

In [0]:
#@title Spleeter options (optional)
!spleeter separate -h

In [0]:
#@title Create Spleeter Directory

!mkdir /output/spleeter

In [0]:
#@title Spleeter Separation
!spleeter separate -i '/content/file.mp3' -p spleeter:4stems -o '/output/spleeter/' -c mp3

Once the audio file has been separated, the following cells play each stem so the two libraries can be compared by ear.
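
Before playing anything back, it can help to confirm that both libraries actually wrote all four stems. The following is a minimal sketch with hypothetical helper names (`stem_paths`, `missing_files`); the directory layout is the one used by the path settings in this notebook.

```python
import os

STEMS = ['drums', 'bass', 'vocals', 'other']


def stem_paths(root, stems=STEMS, ext='mp3'):
    """Map each stem name to its expected file path under `root`."""
    return {stem: os.path.join(root, f'{stem}.{ext}') for stem in stems}


def missing_files(paths):
    """Return, in sorted order, the paths that do not exist on disk."""
    return sorted(p for p in paths if not os.path.isfile(p))
```

For example, `missing_files(stem_paths('/output/spleeter/file').values())` returns an empty list when the Spleeter separation completed, and the same check can be run against `/output/demucs/demucs/file`.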

In [0]:
#@title Path settings

demucs_drums = '/output/demucs/demucs/file/drums.mp3'
spleeter_drums = '/output/spleeter/file/drums.mp3'
drums = [demucs_drums, spleeter_drums]

demucs_bass = '/output/demucs/demucs/file/bass.mp3'
spleeter_bass = '/output/spleeter/file/bass.mp3'
bass = [demucs_bass, spleeter_bass]

demucs_vocals = '/output/demucs/demucs/file/vocals.mp3'
spleeter_vocals = '/output/spleeter/file/vocals.mp3'
vocals = [demucs_vocals, spleeter_vocals]

demucs_other = '/output/demucs/demucs/file/other.mp3'
spleeter_other = '/output/spleeter/file/other.mp3'
other = [demucs_other, spleeter_other]

In [0]:
#@title Play Song

import IPython.display as ipd

ipd.display(ipd.Audio('/content/file.mp3'))


In [0]:
#@title Drums Audios
 
print('demucs drums')
ipd.display(ipd.Audio(demucs_drums))
print('spleeter drums')
ipd.display(ipd.Audio(spleeter_drums))


In [0]:
#@title Bass Audios
print('demucs bass')
ipd.display(ipd.Audio(demucs_bass))
print('spleeter bass')
ipd.display(ipd.Audio(spleeter_bass))

In [0]:
#@title Vocals Audios

print('demucs vocals')
ipd.display(ipd.Audio(demucs_vocals))
print('spleeter vocals')
ipd.display(ipd.Audio(spleeter_vocals))

In [0]:
#@title Other Audios

print('demucs other')
ipd.display(ipd.Audio(demucs_other))
print('spleeter other')
ipd.display(ipd.Audio(spleeter_other))

In [0]:
#@title Download Files - zip (optional)

!zip -r output.zip /output/

from google.colab import files
files.download("output.zip")

### REFERENCES

* Audio Source Separation (Signals and Communication Technology) (English Edition) 1st ed. 2018 Edition

![+picture](https://webstockreview.net/images/decorative-clipart-end-line-1.png)

Copyright 2020 Google LLC.

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.

## <a name="onset-detection"></a>ONSET DETECTION

Now that the sources are separated, we obtain a MIDI file for each one.

Note: Source separation is not perfect, so the stems contain residual noise that makes the resulting MIDI files less accurate.
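
To make the idea of onset detection concrete, here is a minimal energy-based sketch in pure Python. It is not the method used later in this notebook (Onsets and Frames is a neural network); it simply flags the frames where the mean signal energy jumps. The function names, frame size and threshold are illustrative assumptions.

```python
import math


def frame_energies(samples, frame_size):
    """Mean energy of consecutive, non-overlapping frames."""
    return [
        sum(x * x for x in samples[i:i + frame_size]) / frame_size
        for i in range(0, len(samples) - frame_size + 1, frame_size)
    ]


def detect_onsets(samples, sample_rate, frame_size=256, min_rise=0.1):
    """Return onset times in seconds where frame energy rises by more than min_rise."""
    energies = frame_energies(samples, frame_size)
    onsets = []
    for i in range(1, len(energies)):
        if energies[i] - energies[i - 1] > min_rise:
            onsets.append(i * frame_size / sample_rate)
    return onsets


# Synthetic example: 1024 samples of silence followed by a 440 Hz tone.
sample_rate = 8000
silence = [0.0] * 1024
tone = [math.sin(2 * math.pi * 440 * n / sample_rate) for n in range(1024)]
onsets = detect_onsets(silence + tone, sample_rate)
print(onsets)
```

On this synthetic signal the detector reports a single onset at 0.128 s, the moment the tone starts. On real separated stems, residual noise from the other sources produces spurious energy jumps, which is one reason the transcriptions are less accurate.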

![+picture](https://www.researchgate.net/profile/Gilberto_Bernardes2/publication/299446066/figure/fig1/AS:613999402033152@1523400273114/Amplitude-and-pitch-detection-functions-for-audio-onset-detection-Vertical-lines.png)

### <a name="magenta-onsets"></a>MAGENTA ONSETS AND FRAMES - Polyphonic piano music transcription

Convert raw recordings of solo piano performances into MIDI files.

[[Website]](https://magenta.tensorflow.org/onsets-frames) [[GoogleColab]](https://colab.research.google.com/notebooks/magenta/onsets_frames_transcription/onsets_frames_transcription.ipynb) [[Github]](https://github.com/tensorflow/magenta/tree/master/magenta/models/onsets_frames_transcription)

![picture](https://magenta.tensorflow.org/assets/onsets_frames/networkstack9.png)

Onsets and Frames from Magenta works with .wav files, so the next cell gives the .mp3 stems a .wav extension. This changes only the file name; the audio data is still MP3-encoded.
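
If a real WAV file is needed rather than a renamed MP3, one option is to decode the audio with ffmpeg, which an earlier cell installs. The sketch below is an assumption about how that could be done, not what this notebook does; the helper names are hypothetical, and the optional resampling rate is left to the caller.

```python
import subprocess


def build_ffmpeg_command(src, dst, sample_rate=None):
    """Build an ffmpeg command that decodes `src` and writes `dst`.

    The output format is inferred by ffmpeg from the extension of `dst`.
    """
    cmd = ['ffmpeg', '-y', '-i', src]  # -y: overwrite dst if it exists
    if sample_rate is not None:
        cmd += ['-ar', str(sample_rate)]  # -ar: output sample rate in Hz
    cmd.append(dst)
    return cmd


def convert_to_wav(src, dst, sample_rate=None):
    """Run ffmpeg and raise CalledProcessError if the conversion fails."""
    subprocess.run(build_ffmpeg_command(src, dst, sample_rate), check=True)
```

For example, `convert_to_wav('/output/spleeter/file/drums.mp3', '/output/spleeter/file/drums.wav')` would write PCM audio next to the original stem.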

In [0]:
#@title Change file .mp3 to .wav

import shutil

# Copy instead of renaming so the .mp3 paths defined earlier stay valid for playback.
# Only the extension changes; the audio data is still MP3-encoded.
for stem_dir in ('/output/demucs/demucs/file/', '/output/spleeter/file/'):  # Demucs, Spleeter
  for stem in ('drums', 'bass', 'vocals', 'other'):
    shutil.copyfile(stem_dir + stem + '.mp3', stem_dir + stem + '.wav')

Each separated source is transcribed to its own MIDI file, written next to the audio with a .midi suffix, so we end up with drums.wav, drums.wav.midi, and so on.
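
The naming convention can be sketched as follows. The helper names are hypothetical; the `.wav.midi` suffix is the one the comparison function in this notebook reads back.

```python
import posixpath

STEMS = ('drums', 'bass', 'vocals', 'other')


def midi_path_for(wav_path):
    """The transcription is written next to the audio, with '.midi' appended."""
    return wav_path + '.midi'


def expected_midis(stem_dir, stems=STEMS):
    """Map each stem's .wav path to the MIDI path expected after transcription."""
    wavs = [posixpath.join(stem_dir, stem + '.wav') for stem in stems]
    return {wav: midi_path_for(wav) for wav in wavs}


for wav, midi in expected_midis('/output/spleeter/file').items():
    print(wav, '->', midi)
```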

In [0]:
#@title Setup Environment

from __future__ import absolute_import
from __future__ import division
from __future__ import print_function

import glob

print('Copying checkpoint from GCS...')
!rm -r /content/onsets-frames
!mkdir /content/onsets-frames
!gsutil -q -m cp -R gs://magentadata/models/onsets_frames_transcription/* /content/onsets-frames/
!unzip -o /content/onsets-frames/checkpoint.zip -d /content/onsets-frames
CHECKPOINT_DIR = '/content/onsets-frames/train'
  
print('Installing dependencies...')
!apt-get update -qq && apt-get install -qq libfluidsynth1 fluid-soundfont-gm build-essential libasound2-dev libjack-dev ffmpeg  
!pip install pyfluidsynth pretty_midi

if glob.glob('/content/onsets-frames/magenta*.whl'):
  !pip install -q /content/onsets-frames/magenta*.whl
else:
  !pip install -q magenta

# Hack to allow python to pick up the newly-installed fluidsynth lib.
import ctypes.util

orig_find_library = ctypes.util.find_library
def proxy_find_library(lib):
  if lib == 'fluidsynth':
    return 'libfluidsynth.so.1'
  else:
    return orig_find_library(lib)

ctypes.util.find_library = proxy_find_library

In [0]:
#@title Initialize Model
import tensorflow.compat.v1 as tf
import librosa
import numpy as np

from google.colab import files

from magenta.common import tf_utils
from magenta.music import audio_io
import magenta.music as mm
from magenta.models.onsets_frames_transcription import audio_label_data_utils
from magenta.models.onsets_frames_transcription import configs
from magenta.models.onsets_frames_transcription import constants
from magenta.models.onsets_frames_transcription import data
from magenta.models.onsets_frames_transcription import infer_util
from magenta.models.onsets_frames_transcription import train_util
from magenta.music import midi_io
from magenta.music.protobuf import music_pb2
from magenta.music import sequences_lib

In [0]:
#@title Convert SPLEETER DRUMS to midi with E-GMD checkpoint (this model is only for drums transcription)

# CHECKPOINT_DIR is the Python variable set in "Setup Environment";
# {CHECKPOINT_DIR} is expanded by the notebook before the shell runs.
!onsets_frames_transcription_transcribe \
  --model_dir="{CHECKPOINT_DIR}" \
  --config="drums" \
  /output/spleeter/file/drums.wav

In [0]:
#@title Convert SPLEETER BASS to midi with MAESTRO checkpoint

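# A shell variable set with "!" does not persist to the next command, so use a Python variable.
# The path is the one used by the vocals and other cells of this notebook.
MODEL_DIR = "/content/onsets_frames_transcription/checkpoints/MAESTRO"
!onsets_frames_transcription_transcribe \
  --model_dir="{MODEL_DIR}" \
  --hparams=use_cudnn=false \
  /output/spleeter/file/bass.wav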

In [0]:
#@title Convert SPLEETER VOCALS to midi with MAESTRO checkpoint

# A shell variable set with "!" does not persist to the next command, so use a Python variable.
MODEL_DIR = "/content/onsets_frames_transcription/checkpoints/MAESTRO"
!onsets_frames_transcription_transcribe \
  --model_dir="{MODEL_DIR}" \
  --hparams=use_cudnn=false \
  /output/spleeter/file/vocals.wav

In [0]:
#@title Convert SPLEETER OTHER to midi with MAESTRO checkpoint

# A shell variable set with "!" does not persist to the next command, so use a Python variable.
MODEL_DIR = "/content/onsets_frames_transcription/checkpoints/MAESTRO"
!onsets_frames_transcription_transcribe \
  --model_dir="{MODEL_DIR}" \
  --hparams=use_cudnn=false \
  /output/spleeter/file/other.wav

In [0]:
#@title General function to plot piano-roll representation + midi and wav comparison
import bokeh.plotting
import IPython.display as ipd
import magenta.music as mm

def compare_wav_and_midi(separated_source, spleeter_wav, demucs_wav):
    # Each entry: (library name, directory holding the transcribed MIDI, audio stem path).
    # The Demucs MIDI exists only if the Demucs stems were transcribed as well.
    libraries = [
        ('spleeter', '/output/spleeter/file/', spleeter_wav),
        ('demucs', '/output/demucs/demucs/file/', demucs_wav),
    ]

    for library, midi_dir, wav_path in libraries:
        print('=' * 32, library.upper(), '=' * 32)
        midi_file = midi_dir + separated_source + '.wav.midi'
        note_seq = mm.midi_file_to_sequence_proto(midi_file)

        # This is a colab utility method that visualizes a NoteSequence.
        fig = mm.plot_sequence(note_seq, show_figure=False)
        fig.plot_width = 8000
        fig.plot_height = 200
        bokeh.plotting.output_notebook()
        bokeh.plotting.show(fig)

        # This is a colab utility method that plays a NoteSequence.
        print(library, separated_source, 'midi')
        mm.play_sequence(note_seq, synth=mm.fluidsynth)

        print(library, separated_source, 'wav')
        ipd.display(ipd.Audio(wav_path))

In [0]:
#@title "Drums" midi vs audio | spleeter vs demucs

separated_source = "drums"
spleeter_wav = spleeter_drums
demucs_wav = demucs_drums
compare_wav_and_midi(separated_source, spleeter_wav, demucs_wav)

In [0]:
#@title "Bass" midi vs audio | spleeter vs demucs

separated_source = "bass"
spleeter_wav = spleeter_bass
demucs_wav = demucs_bass
compare_wav_and_midi(separated_source, spleeter_wav, demucs_wav)

In [0]:
#@title "Vocals" midi vs audio | spleeter vs demucs

separated_source = "vocals"
spleeter_wav = spleeter_vocals
demucs_wav = demucs_vocals
compare_wav_and_midi(separated_source, spleeter_wav, demucs_wav)

In [0]:
#@title "Other" midi vs audio | spleeter vs demucs

separated_source = "other"
spleeter_wav = spleeter_other
demucs_wav = demucs_other
compare_wav_and_midi(separated_source, spleeter_wav, demucs_wav)
