# IJCNN 2021 Demo Page
This demo page accompanies the paper __Revisiting the Onsets and Frames Model with Additive Attention__. High-resolution figures and audio samples of the transcription results can be found here. Source code for the paper is available at https://github.com/KinWaiCheuk/IJCNN2021.github.io

In [3]:
from IPython.display import HTML
# Each CSS rule is a separate block; a stray ";" between rules would invalidate the td rule
table = "<style>audio {width:200px} td {vertical-align: middle}</style>"
HTML(table)

## Model Architecture
Left: Onsets and Frames model with an additive attention mechanism<br/>
Right: Linear model with an additive attention mechanism

For the Onsets and Frames model, the attention mechanism attends to only one of three features: $\boldsymbol{x_{\text{spec}}}$, $\boldsymbol{\hat{y}_{\text{onset}}}$, or $\boldsymbol{\hat{y}_{\text{feat}}}$.
![](demo/model.png)
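
The exact attention formulation is given in the paper and is not reproduced on this page. As a rough illustration only, the sketch below implements generic additive (Bahdanau-style) attention over a local window of frames. It assumes that $D$ is the number of neighbouring frames on each side of the current frame, which is an assumption and not something this page states. The function `local_additive_attention` and the weights `W_q`, `W_k`, and `v` are hypothetical names, and the weights are random rather than trained.

```python
import math
import random
from typing import List, Tuple

Vector = List[float]
Matrix = List[Vector]


def matvec(m: Matrix, x: Vector) -> Vector:
    """Multiply matrix m (rows x cols) by vector x (cols)."""
    return [sum(a * b for a, b in zip(row, x)) for row in m]


def softmax(scores: Vector) -> Vector:
    """Numerically stable softmax."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def local_additive_attention(
    frames: Matrix, t: int, D: int, W_q: Matrix, W_k: Matrix, v: Vector
) -> Tuple[Vector, Vector]:
    """Attend from frame t to frames t-D .. t+D (clipped at the edges).

    Additive score: v . tanh(W_q q + W_k k), followed by a softmax
    over the window. Returns (attention weights, context vector).
    """
    start, stop = max(0, t - D), min(len(frames), t + D + 1)
    window = frames[start:stop]
    q_proj = matvec(W_q, frames[t])

    scores = []
    for key in window:
        k_proj = matvec(W_k, key)
        hidden = [math.tanh(q + k) for q, k in zip(q_proj, k_proj)]
        scores.append(sum(a * b for a, b in zip(v, hidden)))

    weights = softmax(scores)
    n_features = len(frames[0])
    # Context vector: attention-weighted sum of the frames in the window.
    context = [
        sum(w * frame[j] for w, frame in zip(weights, window))
        for j in range(n_features)
    ]
    return weights, context


# Toy example: 8 frames of 4 features, hidden size 3 (all values random).
random.seed(0)
T, N, H = 8, 4, 3
frames = [[random.random() for _ in range(N)] for _ in range(T)]
W_q = [[random.uniform(-1, 1) for _ in range(N)] for _ in range(H)]
W_k = [[random.uniform(-1, 1) for _ in range(N)] for _ in range(H)]
v = [random.uniform(-1, 1) for _ in range(H)]

weights, context = local_additive_attention(frames, t=4, D=2, W_q=W_q, W_k=W_k, v=v)
print(len(weights), round(sum(weights), 6))
```

Under this assumption, setting `D=0` reduces the window to the current frame, so the single attention weight is 1 and the context vector equals the input frame.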

## Transcription Results
The transcription results corresponding to the four sample spectrograms above are shown here. The piano rolls generated by each model are converted to MIDI files, and the WAV files are rendered from those MIDI files using the Concert D Grand Piano from [Garritan Personal Orchestra](https://www.garritan.com/products/personal-orchestra-5/).
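
The piano-roll-to-MIDI step can be sketched as follows. This is a minimal illustration and not the code used for the paper: it assumes a $T\times 88$ piano roll of frame activations in $[0,1]$, a hypothetical threshold of 0.5, and that column 0 corresponds to A0 (MIDI pitch 21), the lowest key of an 88-key piano. The function name `piano_roll_to_notes` is also hypothetical.

```python
from typing import List, Tuple

Note = Tuple[int, int, int]  # (MIDI pitch, onset frame, offset frame)


def piano_roll_to_notes(
    roll: List[List[float]], threshold: float = 0.5, lowest_pitch: int = 21
) -> List[Note]:
    """Turn a T x K piano roll into a list of notes.

    A note starts at the first frame where a key's activation reaches
    the threshold (an assumed value) and ends at the first frame where
    it drops below it again. The offset frame is exclusive.
    """
    notes: List[Note] = []
    if not roll:
        return notes

    n_frames, n_keys = len(roll), len(roll[0])
    for key in range(n_keys):
        onset = None
        for t in range(n_frames):
            active = roll[t][key] >= threshold
            if active and onset is None:
                onset = t  # note begins
            elif not active and onset is not None:
                notes.append((lowest_pitch + key, onset, t))  # note ends
                onset = None
        if onset is not None:
            # Key still held when the roll ends: close the note at the last frame.
            notes.append((lowest_pitch + key, onset, n_frames))

    # Sort by onset frame, then by pitch.
    return sorted(notes, key=lambda n: (n[1], n[0]))


# Toy roll with 4 frames: middle C (pitch 60) held for frames 1-2.
roll = [[0.0] * 88 for _ in range(4)]
roll[1][39] = 0.9
roll[2][39] = 0.8
print(piano_roll_to_notes(roll))  # [(60, 1, 3)]
```

Frame indices can then be converted to seconds using the hop size and sample rate of the spectrogram, and the resulting notes written to a MIDI file with a library such as `mido` or `pretty_midi`.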

### Original Audio

Ground Truth 1: <audio src="demo/Audio/label1.wav" controls>alternative text</audio><br/>
Ground Truth 2: <audio src="demo/Audio/label2.wav" controls>alternative text</audio><br/>
Ground Truth 3: <audio src="demo/Audio/label3.wav" controls>alternative text</audio><br/>
Ground Truth 4: <audio src="demo/Audio/label4.wav" controls>alternative text</audio>

### Onsets & Frames Model
<table border="0">
 <tr>
    <td style="text-align: left"><b style="font-size:14px">w/ Everything</b></td>
    <td style="text-align: left"><b style="font-size:14px">w/o BiLSTM</b></td> 
    <td style="text-align: left"><b style="font-size:14px">w/o Inference</b></td>
    <td style="text-align: left"><b style="font-size:14px">w/o $F_{\text{onset}}$</b></td>
 </tr>
 <tr>
    <td>Spec1: <audio src="demo/Original_model/1_piano.wav" controls>alternative text</audio><br/>
Spec2: <audio src="demo/Original_model/2_piano.wav" controls>alternative text</audio><br/>
Spec3: <audio src="demo/Original_model/3_piano.wav" controls>alternative text</audio><br/>
Spec4: <audio src="demo/Original_model/4_piano.wav" controls>alternative text</audio>
    </td>
    <td><audio src="demo/No_LSTM/1_piano.wav" controls>alternative text</audio><br/>
        <audio src="demo/No_LSTM/2_piano.wav" controls>alternative text</audio><br/>
        <audio src="demo/No_LSTM/3_piano.wav" controls>alternative text</audio><br/>
        <audio src="demo/No_LSTM/4_piano.wav" controls>alternative text</audio>
    </td>     
    <td><audio src="demo/Original_model_no_inference/1_piano.wav" controls>alternative text</audio><br/>
        <audio src="demo/Original_model_no_inference/2_piano.wav" controls>alternative text</audio><br/>
        <audio src="demo/Original_model_no_inference/3_piano.wav" controls>alternative text</audio><br/>
        <audio src="demo/Original_model_no_inference/4_piano.wav" controls>alternative text</audio>
     </td>
    <td><audio src="demo/No_Onset_stack/1_piano.wav" controls>alternative text</audio><br/>
        <audio src="demo/No_Onset_stack/2_piano.wav" controls>alternative text</audio><br/>
        <audio src="demo/No_Onset_stack/3_piano.wav" controls>alternative text</audio><br/>
        <audio src="demo/No_Onset_stack/4_piano.wav" controls>alternative text</audio>
     </td>     
 </tr>
</table>

### Onsets & Frames Model with Additive Attention
<table border="0">
 <tr>
    <td style="text-align: left"><b style="font-size:14px">w/ Everything</b></td>
    <td style="text-align: left"><b style="font-size:14px">w/o BiLSTM</b></td> 
    <td style="text-align: left"><b style="font-size:14px">w/o Inference</b></td>
    <td style="text-align: left"><b style="font-size:14px">w/o $F_{\text{onset}}$</b></td>
 </tr>
 <tr>
    <td>Spec1: <audio src="demo/OnsetsFrames_attn/1_piano.wav" controls>alternative text</audio><br/>
Spec2: <audio src="demo/OnsetsFrames_attn/2_piano.wav" controls>alternative text</audio><br/>
Spec3: <audio src="demo/OnsetsFrames_attn/3_piano.wav" controls>alternative text</audio><br/>
Spec4: <audio src="demo/OnsetsFrames_attn/4_piano.wav" controls>alternative text</audio>
    </td>
    <td><audio src="demo/No_LSTM_attn/1_piano.wav" controls>alternative text</audio><br/>
        <audio src="demo/No_LSTM_attn/2_piano.wav" controls>alternative text</audio><br/>
        <audio src="demo/No_LSTM_attn/3_piano.wav" controls>alternative text</audio><br/>
        <audio src="demo/No_LSTM_attn/4_piano.wav" controls>alternative text</audio>
    </td>     
    <td><audio src="demo/OnsetsFrames_attn_no_inference/1_piano.wav" controls>alternative text</audio><br/>
        <audio src="demo/OnsetsFrames_attn_no_inference/2_piano.wav" controls>alternative text</audio><br/>
        <audio src="demo/OnsetsFrames_attn_no_inference/3_piano.wav" controls>alternative text</audio><br/>
        <audio src="demo/OnsetsFrames_attn_no_inference/4_piano.wav" controls>alternative text</audio>
     </td>
    <td><audio src="demo/No_Onset_stack_attn/1_piano.wav" controls>alternative text</audio><br/>
    <audio src="demo/No_Onset_stack_attn/2_piano.wav" controls>alternative text</audio><br/>
    <audio src="demo/No_Onset_stack_attn/3_piano.wav" controls>alternative text</audio><br/>
    <audio src="demo/No_Onset_stack_attn/4_piano.wav" controls>alternative text</audio>
     </td>     
 </tr>
</table>

### Linear Model
<table border="0">
 <tr>
    <td style="text-align: left"><b style="font-size:14px">$D=5$ w/ inference</b></td>
    <td style="text-align: left"><b style="font-size:14px">$D=5$ w/o inference</b></td> 
    <td style="text-align: left"><b style="font-size:14px">$D=0$ w/ inference</b></td>
    <td style="text-align: left"><b style="font-size:14px">$D=0$ w/o inference</b></td>
 </tr>
 <tr>
    <td>Spec1: <audio src="demo/Linear_attn_5/1_piano.wav" controls>alternative text</audio><br/>
Spec2: <audio src="demo/Linear_attn_5/2_piano.wav" controls>alternative text</audio><br/>
Spec3: <audio src="demo/Linear_attn_5/3_piano.wav" controls>alternative text</audio><br/>
Spec4: <audio src="demo/Linear_attn_5/4_piano.wav" controls>alternative text</audio>
    </td>
    <td><audio src="demo/Linear_attn_5_no_inference/1_piano.wav" controls>alternative text</audio><br/>
        <audio src="demo/Linear_attn_5_no_inference/2_piano.wav" controls>alternative text</audio><br/>
        <audio src="demo/Linear_attn_5_no_inference/3_piano.wav" controls>alternative text</audio><br/>
        <audio src="demo/Linear_attn_5_no_inference/4_piano.wav" controls>alternative text</audio>
    </td>     
    <td><audio src="demo/Linear_D0/1_piano.wav" controls>alternative text</audio><br/>
        <audio src="demo/Linear_D0/2_piano.wav" controls>alternative text</audio><br/>
        <audio src="demo/Linear_D0/3_piano.wav" controls>alternative text</audio><br/>
        <audio src="demo/Linear_D0/4_piano.wav" controls>alternative text</audio>
     </td>
    <td><audio src="demo/Linear_D0_no_inference/1_piano.wav" controls>alternative text</audio><br/>
    <audio src="demo/Linear_D0_no_inference/2_piano.wav" controls>alternative text</audio><br/>
    <audio src="demo/Linear_D0_no_inference/3_piano.wav" controls>alternative text</audio><br/>
    <audio src="demo/Linear_D0_no_inference/4_piano.wav" controls>alternative text</audio>
     </td>     
 </tr>
</table>

## Attention Maps

### Onsets and Frames Model with Attention D=30

This is Figure 2 in the paper. Right-click an image and open it in a new tab to view it at full resolution.

![](demo/OnsetsFrames_attn/spec_map.png)
![](demo/OnsetsFrames_attn/onset_map.png)
![](demo/OnsetsFrames_attn/feat_map.png)

Row 1: Attending on $\boldsymbol{x_{\text{spec}}} \in [0,1]^{T\times N}$<br/>
Row 2: Attending on $\boldsymbol{\hat{y}_{\text{onset}}} \in [0,1]^{T\times 88}$<br/>
Row 3: Attending on $\boldsymbol{\hat{y}_{\text{feat}}} \in [0,1]^{T\times 88}$

### Onsets and Frames Model with Varying Attention Size
From top row to bottom row: <b> D=60, D=30, D=20, D=5 </b>
![](demo/OnsetsFrames_attn/spec_map_D60.png) 
![](demo/OnsetsFrames_attn/spec_map.png)
![](demo/OnsetsFrames_attn/spec_map_D20.png)
![](demo/OnsetsFrames_attn/spec_map_D05.png) 

### Linear Model with Varying Attention Size
From top row to bottom row: <b> D=30, D=20, D=10, D=5 </b>
![](demo/Linear/D30.png) 
![](demo/Linear/D20.png)
![](demo/Linear/D10.png)
![](demo/Linear/D05.png)