# Semi-supervised CNMF from scratch

In this repo, we provide the templates and activations that we computed on MAPS using the proposed semi-supervised CNMF technique, available for download. However, you may want to retrain the templates to double-check them, or to perform training and testing on your own data. To that end, we provide here the complete toolchain to perform both steps.

# Training the templates

First, you need to find the path to your training data.

In [7]:
# This variable is not actually used by the scripts below; it only shows what the path looks like.
#path_data = "C:/Users/amarmore/Desktop/Audio samples/MAPS" # an example
path_data = "D:/Travail/Travail/Toolbox/Data/Audio_Prozik/Maps/MAPS" # an example

We have set up a script for learning templates from isolated notes, called ``launch_W_learning.py``, whose main purpose is to parse its arguments and call the actual CNMF function ``learning_W_and_persist()`` located in ``script_igrida_learn.py``.

To use with the **MAPS dataset**, you only need to call ``launch_W_learning.py`` with a few arguments:
- Piano name, e.g. "ENSTDkCl"
- The convolution size $\tau$. Setting "all" will run for $\tau=5,10,20$.
- The training data path

Feel free to tweak this script to change the inputs so as to control other parameters such as the number of iterations.
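The argument handling described above can be sketched as follows. This is only an illustration and not the actual content of ``launch_W_learning.py``: the helper names ``parse_tau`` and ``parse_args`` are hypothetical, and only the three arguments and the "all" convention are taken from the description above.

```python
# Hypothetical sketch, not the code of launch_W_learning.py.
# The values used for "all" come from the description above.
ALL_TAUS = [5, 10, 20]


def parse_tau(arg):
    """Return the list of convolution sizes encoded by one command-line argument."""
    if arg == "all":
        return list(ALL_TAUS)
    tau = int(arg)  # raises ValueError if the argument is not an integer
    if tau <= 0:
        raise ValueError(f"tau must be a positive integer, got {tau}")
    return [tau]


def parse_args(argv):
    """Split [piano_name, tau, data_path] into typed values."""
    if len(argv) != 3:
        raise SystemExit("usage: launch_W_learning.py PIANO TAU DATA_PATH")
    piano_name, tau_arg, data_path = argv
    return piano_name, parse_tau(tau_arg), data_path
```

A launcher built this way would loop over the returned list of $\tau$ values and call ``learning_W_and_persist()`` once per value; the signature of that function is not shown in this notebook, so the call itself is left out of the sketch.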

By default, ``learning_W_and_persist()`` will write its output in ../data_persisted/xxx where xxx is formatted based on the name of the piano, the number of iterations, the STFT type, the note intensity and the values of $\beta$ (divergence) and $\tau$.

To use with a **different dataset** than MAPS, you should probably write your own parser to call ``learning_W_and_persist()`` properly, similar to our ``launch_W_learning.py``. You should not change anything in ``learning_W_and_persist()`` unless you know what you are doing. Also, you should put the MIDI code of each isolated note in its name (last series of digits in the name) for the parser to find it.
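As an illustration of the naming convention above, here is a minimal sketch of how the MIDI code could be read from a file name. The function name ``midi_from_filename`` and the example file name are hypothetical; the actual parser in this repo may proceed differently.

```python
import re
from pathlib import Path


def midi_from_filename(filename):
    """Return the last series of digits in the file name (extension excluded) as an int."""
    stem = Path(filename).stem
    digit_runs = re.findall(r"\d+", stem)
    if not digit_runs:
        raise ValueError(f"no MIDI code found in {filename!r}")
    return int(digit_runs[-1])


# Hypothetical file name: the note C4 recorded as MIDI code 60.
midi_code = midi_from_filename("my_piano_C4_60.wav")
```

Note that under this convention any digits placed after the MIDI code in the name would be picked up instead, so the MIDI code has to come last.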

*Note:* If you have an error regarding the folder ../data_persisted/tmp_W not existing, you probably need to create it manually.
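Instead of creating the folder by hand, you can create it from Python. The sketch below uses only the standard library; the helper name ``ensure_dir`` is hypothetical.

```python
from pathlib import Path


def ensure_dir(path):
    """Create the folder (and any missing parent folders) if it does not exist yet."""
    folder = Path(path)
    folder.mkdir(parents=True, exist_ok=True)
    return folder
```

From the ./code/ directory, calling ``ensure_dir("../data_persisted/tmp_W")`` before launching the training should then be enough.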

In [9]:
# Make sure you are in the right directory (./code/)
# For instance, to run training with piano AkPnCGdD with \tau=5. For now we have to copy-paste the path from path_data --> how to solve this?
%run launch_W_learning.py "AkPnCGdD" 5 "D:/Travail/Travail/Toolbox/Data/Audio_Prozik/Maps/MAPS"

D:/Travail/Travail/Toolbox/Data/Audio_Prozik/Maps/MAPS
MIDI:  100
time:37.1307430267334
MIDI:  103
time:29.668962478637695
MIDI:  104
time:45.54955339431763
MIDI:  105
time:33.28725004196167
MIDI:  106
time:38.11964225769043
MIDI:  21
time:57.23000884056091
MIDI:  22
time:50.316720485687256
MIDI:  24
time:48.983516216278076
MIDI:  25
time:53.39903473854065
MIDI:  26
time:64.10714221000671
MIDI:  27


KeyboardInterrupt: 

# Performing Transcription with the templates (testing)

Thanks to the training scripts, you should have computed the templates for all your individual notes. There should be 88 of these, although any number of templates works in practice.

Now we perform transcription. Again, you should have the path to your test data ready.

In [10]:
# Again, this variable is not actually used by the scripts below.
path_test = "D:/Travail/Travail/Toolbox/Data/Audio_Prozik/Maps/MAPS" # an example

Again, if you are using the MAPS dataset, we have already set up a parser you can use off the shelf, ``launch_H_learning.py``. It calls the function ``semi_supervised_transcribe_cnmf()`` located in ``script_igrida_transcribe.py``, which does the heavy lifting of the transcription. You can call ``launch_H_learning.py`` with 4 arguments:
- The name of the piano you use templates from
- The name of the piano you transcribe
- The convolution size $\tau$ (default is 10)
- The path to the MAPS database (or any set of songs to transcribe)

By default, the activations (H matrices) are written in a local folder ./data_persisted/activations/xxx where xxx follows the same conventions as for the learning stage.

To transcribe your own songs, you should write a small script that calls ``semi_supervised_transcribe_cnmf()``, similar to ours.

*Note*: activations that have already been computed will not be recomputed. Therefore, to redo all computations, you should delete the previous ones or move them to a location other than ./data_persisted/activations/.

*Note 2*: if you are using Windows, you may encounter an error because our file names are too long. A fix is described [here](https://helpdeskgeek.com/how-to/how-to-fix-filename-is-too-long-issue-in-windows/).
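One way to set the previous activations aside before recomputing everything is sketched below. The helper ``archive_activations`` and the backup folder are hypothetical and not part of this repo; the backup location is assumed to lie outside the activations folder.

```python
import shutil
import time
from pathlib import Path


def archive_activations(activations_dir, backup_root):
    """Move existing activations to a timestamped backup folder.

    An empty activations folder is recreated afterwards. Returns the backup
    path, or None if there was nothing to archive.
    """
    src = Path(activations_dir)
    if not src.is_dir() or not any(src.iterdir()):
        return None  # folder missing or empty: nothing to do
    backup_root = Path(backup_root)
    backup_root.mkdir(parents=True, exist_ok=True)
    dest = backup_root / f"activations_{time.strftime('%Y%m%d_%H%M%S')}"
    shutil.move(str(src), str(dest))
    src.mkdir(parents=True)  # leave an empty folder behind for the next run
    return dest
```

For example, ``archive_activations("./data_persisted/activations", "./data_backup")`` would move the old results aside, so that the next run starts from an empty folder.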

In [11]:
# Make sure you are in the right directory (./code/)
# For instance, to transcribe the songs of piano AkPnCGdD with templates learned on AkPnCGdD and \tau=10. For now we have to copy-paste the path from path_test --> how to solve this?
%run launch_H_learning.py "AkPnCGdD" "AkPnCGdD" 10 "D:/Travail/Travail/Toolbox/Data/Audio_Prozik/Maps/MAPS"

Piano templates learned on: AkPnCGdD
processing piano song: MAPS_MUS-alb_esp2_AkPnCGdD
Found in loads.
processing piano song: MAPS_MUS-alb_esp3_AkPnCGdD
Found in loads.
processing piano song: MAPS_MUS-bk_xmas5_AkPnCGdD
Found in loads.
processing piano song: MAPS_MUS-bor_ps1_AkPnCGdD
Found in loads.
processing piano song: MAPS_MUS-br_im2_AkPnCGdD
Found in loads.
processing piano song: MAPS_MUS-br_im5_AkPnCGdD
Found in loads.
processing piano song: MAPS_MUS-chpn-p11-format0_AkPnCGdD
Found in loads.
processing piano song: MAPS_MUS-chpn-p11_AkPnCGdD
Found in loads.
processing piano song: MAPS_MUS-chpn-p13_AkPnCGdD
Found in loads.
processing piano song: MAPS_MUS-chpn-p14_AkPnCGdD
Found in loads.
processing piano song: MAPS_MUS-chpn-p15_AkPnCGdD
Found in loads.
processing piano song: MAPS_MUS-chpn-p20_AkPnCGdD
Found in loads.
processing piano song: MAPS_MUS-chpn-p7_AkPnCGdD
Found in loads.
processing piano song: MAPS_MUS-chpn_op10_e01_AkPnCGdD
Found in loads.
processing piano song: MAPS_MUS-

To then perform transcription, please refer to the notebook "Supplementary material 1".