# Jointist Demo
This page contains the audio samples for the paper __Jointist: Simultaneous Improvement of Multi-instrument Transcription and Music Source Separation via Joint Training__. 

**Subjective evaluation:** https://forms.gle/bXEazqjNwAgKfGch9

**Source code:** To be released upon acceptance

## Table of contents
1. [Jointist v.s. MT3](#Jointist-v.s.-MT3)
1. [Jointist Transcription Demo](#Jointist-Transcription-Examples)
    1. [Mozart K550](#A.-Mozart-Symphony-No.-40-in-G-minor-K.-550-%28Classical-Music%29)
    1. [In Bloom](#B.-In-Bloom---Nirvana-%28Rock%29)
    1. [突然好想你](#C.-突然好想你---Mayday-%28Chinese-pop%29)
    1. [Psycho](#D.-Psycho---Red-Velvet-%28K-pop%29)
    1. [Lemon](#E.-Lemon---Yonezu-Kenshi-%28J-pop%29)
    1. [夜に駆ける](#F.-夜に駆ける---YOASOBI-%28J-pop%29)
1. [Jointist Source Separation Demo](#Jointst-Source-Separation-Examples)
    1. [Track01873](#0.-Track01873-%28Slakh-Test-Set%29)
    1. [Mozart K550](#1.-Mozart-Symphony-No.-40-in-G-minor-K.-550-%28Classical-Music%29)
    1. [In Bloom](#2.-In-Bloom---Nirvana-%28Rock%29)
    1. [突然好想你](#3.-突然好想你---Mayday-%28Chinese-pop%29)
    1. [Psycho](#4.-Psycho---Red-Velvet-%28K-pop%29)
    1. [Lemon](#5.-Lemon---Yonezu-Kenshi-%28J-pop%29)
    1. [夜に駆ける](#6.-夜に駆ける---YOASOBI-%28J-pop%29)




from IPython.display import HTML

# Page-wide CSS for the audio players in this notebook.
# The ::-webkit-media-controls-* pseudo-elements only affect WebKit/Blink
# browsers (Chrome, Edge, Safari); other browsers keep their default controls.
style = """
<style>
audio { width: 100px; }
td { vertical-align: middle; }

/* `controlslist` is an HTML attribute, not a CSS property, so it cannot be set
   here; add controlslist="noplaybackrate" on each <audio> tag instead. */

/* Removes the timeline */
audio::-webkit-media-controls-timeline {
  display: none !important;
}
audio::-webkit-media-controls-timeline-container {
  display: none !important;
}

/* Removes the time stamps */
audio::-webkit-media-controls-current-time-display {
  display: none;
}
audio::-webkit-media-controls-time-remaining-display {
  display: none;
}

/* Removes the mute button */
audio::-webkit-media-controls-mute-button {
  display: none !important;
}

/* Removes the volume slider */
audio::-webkit-media-controls-volume-slider {
  display: none !important;
}
audio::-webkit-media-controls-volume-slider-container {
  display: none !important;
}
</style>
"""

HTML(style)

## Jointist v.s. MT3

The full audio and MIDI files are available at: https://drive.google.com/file/d/1-F2wAALel9UUwMWZdHYZhQb3kIAfAQqL/view?usp=sharing

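In general, Jointist is more robust than MT3 to musical instruments unseen during training. Each column below is a 10-second excerpt; the **MT3** and **Jointist** rows play each model's transcription rendered back to audio.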

<table border="0">
    
 <tr>
    <td style="text-align: left"><b style="font-size:14px">Audio Name</b></td>
    <td style="text-align: center"><b style="font-size:14px">Beatles<br>Let_It_Be</b></td>
    <td style="text-align: center"><b style="font-size:14px">JayChou<br>chaorenbuhuifei</b></td>
    <td style="text-align: center"><b style="font-size:14px">MichaelJackson<br>BlackOrWhite</b></td>
    <td style="text-align: center"><b style="font-size:14px">Queen<br>IWantToBreakFree</b></td>
    <td style="text-align: center"><b style="font-size:14px">Radiohead<br>Karma_Police</b></td>
    <td style="text-align: center"><b style="font-size:14px">RWC<br>RM-P083</b></td>       
 </tr>
    
 <tr>
    <td style="text-align: left"><b style="font-size:14px">Audio</b></td>
    <td style="text-align: center"><audio class="myaudio" src="audio/jointist_mt3/Beatles_Let_It_Be_audio_10s.mp3" controls>alternative text</audio><br/></td>
    <td style="text-align: center"><audio class="myaudio" src="audio/jointist_mt3/JayChou_chaorenbuhuifei_audio_10s.mp3" controls>alternative text</audio><br/></td>
    <td style="text-align: center"><audio class="myaudio" src="audio/jointist_mt3/MichaelJackson_BlackOrWhite_audio_10s.mp3" controls>alternative text</audio><br/></td>
    <td style="text-align: center"><audio class="myaudio" src="audio/jointist_mt3/Queen_IWantToBreakFree_audio_10s.mp3" controls>alternative text</audio><br/></td>
    <td style="text-align: center"><audio class="myaudio" src="audio/jointist_mt3/Radiohead_Karma_Police_audio_10s.mp3" controls>alternative text</audio><br/></td>
    <td style="text-align: center"><audio class="myaudio" src="audio/jointist_mt3/RWC_RM-P083_audio_10s.mp3" controls>alternative text</audio><br/></td>
 </tr>
    
 <tr>
    <td style="text-align: left"><b style="font-size:14px">MT3</b></td>
    <td style="text-align: center"><audio class="myaudio" src="audio/jointist_mt3/Beatles_Let_It_Be_mt3_10s.mp3" controls>alternative text</audio><br/></td>
    <td style="text-align: center"><audio class="myaudio" src="audio/jointist_mt3/JayChou_chaorenbuhuifei_mt3_10s.mp3" controls>alternative text</audio><br/></td>
    <td style="text-align: center"><audio class="myaudio" src="audio/jointist_mt3/MichaelJackson_BlackOrWhite_mt3_10s.mp3" controls>alternative text</audio><br/></td>
    <td style="text-align: center"><audio class="myaudio" src="audio/jointist_mt3/Queen_IWantToBreakFree_mt3_10s.mp3" controls>alternative text</audio><br/></td>
    <td style="text-align: center"><audio class="myaudio" src="audio/jointist_mt3/Radiohead_Karma_Police_mt3_10s.mp3" controls>alternative text</audio><br/></td>
    <td style="text-align: center"><audio class="myaudio" src="audio/jointist_mt3/RWC_RM-P083_mt3_10s.mp3" controls>alternative text</audio><br/></td>
 </tr>
    
 <tr>
    <td style="text-align: left"><b style="font-size:14px">Jointist</b></td>
    <td style="text-align: center"><audio class="myaudio" src="audio/jointist_mt3/Beatles_Let_It_Be_jointist_10s.mp3" controls>alternative text</audio><br/></td>
    <td style="text-align: center"><audio class="myaudio" src="audio/jointist_mt3/JayChou_chaorenbuhuifei_jointist_10s.mp3" controls>alternative text</audio><br/></td>
    <td style="text-align: center"><audio class="myaudio" src="audio/jointist_mt3/MichaelJackson_BlackOrWhite_jointist_10s.mp3" controls>alternative text</audio><br/></td>
    <td style="text-align: center"><audio class="myaudio" src="audio/jointist_mt3/Queen_IWantToBreakFree_jointist_10s.mp3" controls>alternative text</audio><br/></td>
    <td style="text-align: center"><audio class="myaudio" src="audio/jointist_mt3/Radiohead_Karma_Police_jointist_10s.mp3" controls>alternative text</audio><br/></td>
    <td style="text-align: center"><audio class="myaudio" src="audio/jointist_mt3/RWC_RM-P083_jointist_10s.mp3" controls>alternative text</audio><br/></td>
 </tr>
    
</table>

## Jointist Transcription Examples
Here, we test the music transcription feature of Jointist on real audio clips from a variety of music genres. $f_\text{IR}$ first determines which musical instruments appear in an audio clip, and $f_\text{T}$ then uses this instrument condition to perform transcription.

### A. Mozart Symphony No. 40 in G minor K. 550 (Classical Music)
<table border="0">
    
 <tr>
    <td style="text-align: left"><b style="font-size:14px">Input</b></td>
    <td style="text-align: left"><b style="font-size:14px">Output</b></td> 
 </tr>
    
 <tr>
    <td><audio src="audio/T/RM-C002/audio_RM-C002.mp3" controls controlslist="noplaybackrate">alternative text</audio><br/>
    </td>
    <td><audio class="myaudio" src="audio/T/RM-C002/jointist_RM-C002.mp3" controls>alternative text</audio><br/>
    </td>  
 </tr>
</table>

### B. In Bloom - Nirvana (Rock)
<table border="0">
    
 <tr>
    <td style="text-align: left"><b style="font-size:14px">Input</b></td>
    <td style="text-align: left"><b style="font-size:14px">Output</b></td> 
 </tr>
    
 <tr>
    <td><audio src="audio/T/Nirvana_In_Bloom/In_Bloom-ground.mp3" controls>alternative text</audio><br/>
    </td>
    <td><audio src="audio/T/Nirvana_In_Bloom/jointist.mp3" controls>alternative text</audio><br/>
    </td>  
 </tr>
</table>

### C. 突然好想你 - Mayday (Chinese pop)
<table border="0">
    
 <tr>
    <td style="text-align: left"><b style="font-size:14px">Input</b></td>
    <td style="text-align: left"><b style="font-size:14px">Output</b></td> 
 </tr>
    
 <tr>
    <td><audio src="audio/T/Chinese_pop/突然好想你_ground.mp3" controls>alternative text</audio><br/>
    </td>
    <td><audio src="audio/T/Chinese_pop/突然好想你_pred1000.mp3" controls>alternative text</audio><br/>
    </td>  
 </tr>
</table>

### D. Psycho - Red Velvet (K-pop)
<table border="0">
    
 <tr>
    <td style="text-align: left"><b style="font-size:14px">Input</b></td>
    <td style="text-align: left"><b style="font-size:14px">Output</b></td> 
 </tr>
    
 <tr>
    <td><audio src="audio/T/K-pop/psycho-ground.mp3" controls>alternative text</audio><br/>
    </td>
    <td><audio src="audio/T/K-pop/psycho-jointist.mp3" controls>alternative text</audio><br/>
    </td>  
 </tr>
</table>

### E. Lemon - Yonezu Kenshi (J-pop)
<table border="0">
    
 <tr>
    <td style="text-align: left"><b style="font-size:14px">Input</b></td>
    <td style="text-align: left"><b style="font-size:14px">Output</b></td> 
 </tr>
    
 <tr>
    <td><audio src="audio/T/lemon/lemon_original.mp3" controls>alternative text</audio><br/>
    </td>
    <td><audio src="audio/T/lemon/lemon_1000_pred.mp3" controls>alternative text</audio><br/>
    </td>  
 </tr>
</table>

### F. 夜に駆ける - YOASOBI (J-pop)
<table border="0">
    
 <tr>
    <td style="text-align: left"><b style="font-size:14px">Input</b></td>
    <td style="text-align: left"><b style="font-size:14px">Output</b></td> 
 </tr>
    
 <tr>
    <td><audio src="audio/T/YOASOBI/YOASOBI - 夜に駆ける_ground.mp3" controls>alternative text</audio><br/>
    </td>
    <td><audio src="audio/T/YOASOBI/YOASOBI - 夜に駆ける_pred.mp3" controls>alternative text</audio><br/>
    </td>  
 </tr>
</table>

<a href="#Table-of-contents">Back to TOC</a>

<a id="Jointst-Source-Separation-Examples"></a>
## Jointist Source Separation Examples
Here, we test the source separation feature of Jointist on real audio clips from a variety of genres. $f_\text{IR}$ first determines which musical instruments appear in an audio clip, and $f_\text{MSS}$ then uses both the transcription output and the instrument condition to separate the sources. Because separation runs once per detected instrument, Jointist can handle a varying number of instruments.

Because Slakh2100 has no __vocal tracks__ (its __voice__ tracks are merely a sound effect that uses voice), Jointist is weak at separating sung vocals. Most of the time, it assigns them to a Synth class.

We expect Jointist to perform even better with a better-defined instrument taxonomy and more training data.
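Each separation table below ends with a **Re-mix** row, and the number of separated stems varies from song to song (7 for Track01873, 14 for 突然好想你). The sketch below assumes, since this page does not say so, that the re-mix is the sample-wise sum of the separated stems. `remix` and `residual_db` are hypothetical helpers, and the sinusoidal stems are toy data rather than Jointist outputs. Comparing the re-mix against the input mix is a quick sanity check: the closer the sum is to the mixture, the more negative the residual in dB.

```python
import numpy as np


def remix(stems):
    """Sum any number of separated stems (name -> 1-D waveform) into one signal.

    Assumes every stem shares the same sample rate and length.
    """
    if not stems:
        raise ValueError("need at least one stem")
    shapes = {np.shape(s) for s in stems.values()}
    if len(shapes) != 1:
        raise ValueError(f"stems must all have the same shape, got {shapes}")
    return np.sum([np.asarray(s, dtype=np.float64) for s in stems.values()], axis=0)


def residual_db(mixture, estimate, eps=1e-12):
    """Energy of (mixture - estimate) relative to the mixture, in dB (lower is better)."""
    mixture = np.asarray(mixture, dtype=np.float64)
    residual = mixture - np.asarray(estimate, dtype=np.float64)
    return 10.0 * np.log10((np.sum(residual**2) + eps) / (np.sum(mixture**2) + eps))


# Toy data: three sinusoidal "stems" whose sum is, by construction, the mixture.
sr = 16000
t = np.arange(sr) / sr
stems = {
    "Bass": 0.5 * np.sin(2 * np.pi * 55 * t),
    "Piano": 0.3 * np.sin(2 * np.pi * 440 * t),
    "Synth Pad": 0.2 * np.sin(2 * np.pi * 880 * t),
}
mixture = sum(stems.values())
print(f"residual: {residual_db(mixture, remix(stems)):.1f} dB")
```

`remix` accepts any number of stems, so the same helper works whether $f_\text{IR}$ detects three instruments or fourteen.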

### 0. Track01873 (Slakh Test Set)
<table border="0">
    
 <tr>
    <td style="text-align: left"><b style="font-size:14px"></b></td>     
    <td style="text-align: left"><b style="font-size:14px">Input</b></td>
     <td style="text-align: left"><b style="font-size:14px"></b></td>     
    <td style="text-align: left"><b style="font-size:14px">Output</b></td> 
 </tr>
    
 <tr>
    <td>Mix</td>
    <td><audio src="audio/SS/Track01873/segments/mix_seg.mp3" controls>alternative text</audio><br/></td>
    <td>Bass</td>
    <td><audio src="audio/SS/Track01873/segments/bass.mp3" controls>alternative text</audio><br/></td>
 </tr>
    
    
 <tr>
    <td></td>
    <td></td>
    <td>Drums</td>
    <td><audio src="audio/SS/Track01873/segments/drums.mp3" controls>alternative text</audio><br/></td>
 </tr>    
    
 <tr>
    <td></td>
    <td></td>
    <td>Electric Guitar</td>
    <td><audio src="audio/SS/Track01873/segments/eguitar.mp3" controls>alternative text</audio><br/></td>
 </tr>        
    
 <tr>
    <td></td>
    <td></td>
    <td>Electric Piano</td>
    <td><audio src="audio/SS/Track01873/segments/epiano.mp3" controls>alternative text</audio><br/></td>
 </tr>       
    
    
 <tr>
    <td></td>
    <td></td>
    <td>String</td>
    <td><audio src="audio/SS/Track01873/segments/string.mp3" controls>alternative text</audio><br/></td>
 </tr>       
    
    
 <tr>
    <td></td>
    <td></td>
    <td>Synth Pad</td>
    <td><audio src="audio/SS/Track01873/segments/synth pad.mp3" controls>alternative text</audio><br/></td>
 </tr>       
    
 <tr>
    <td></td>
    <td></td>
    <td>Voice</td>
    <td><audio src="audio/SS/Track01873/segments/voice.mp3" controls>alternative text</audio><br/></td>
 </tr>           

 <tr>
    <td></td>
    <td></td>
    <td>Re-mix</td>
    <td><audio src="audio/SS/Track01873/segments/mix_seg.mp3" controls>alternative text</audio><br/></td>
 </tr>              
</table>

<a href="#Table-of-contents">Back to TOC</a>

### 1. Mozart Symphony No. 40 in G minor K. 550 (Classical Music)
<table border="0">
    
 <tr>
    <td style="text-align: left"><b style="font-size:14px"></b></td>     
    <td style="text-align: left"><b style="font-size:14px">Input</b></td>
     <td style="text-align: left"><b style="font-size:14px"></b></td>     
    <td style="text-align: left"><b style="font-size:14px">Output</b></td> 
 </tr>
    
 <tr>
    <td>Mix</td>
    <td><audio src="audio/T/RM-C002/audio_RM-C002.mp3" controls>alternative text</audio><br/></td>
    <td>Bass</td>
    <td><audio src="audio/SS/Mozart_K550/Bass.mp3" controls>alternative text</audio><br/></td>
 </tr>
    
    
 <tr>
    <td></td>
    <td></td>
    <td>Chromatic Percussion</td>
    <td><audio src="audio/SS/Mozart_K550/Chromatic Percussion.mp3" controls>alternative text</audio><br/></td>
 </tr>

 <tr>
    <td></td>
    <td></td>
    <td>Drums</td>
    <td><audio src="audio/SS/Mozart_K550/Drums.mp3" controls>alternative text</audio><br/></td>
 </tr>    

 <tr>
    <td></td>
    <td></td>
    <td>Electric Guitar</td>
    <td><audio src="audio/SS/Mozart_K550/Electric Guitar.mp3" controls>alternative text</audio><br/></td>
 </tr>    

 <tr>
    <td></td>
    <td></td>
    <td>Oboe</td>
    <td><audio src="audio/SS/Mozart_K550/Oboe.mp3" controls>alternative text</audio><br/></td>
 </tr>    


 <tr>
    <td></td>
    <td></td>
    <td>Piano</td>
    <td><audio src="audio/SS/Mozart_K550/Piano.mp3" controls>alternative text</audio><br/></td>
 </tr>    


 <tr>
    <td></td>
    <td></td>
    <td>Strings</td>
    <td><audio src="audio/SS/Mozart_K550/Strings.mp3" controls>alternative text</audio><br/></td>
 </tr>    


 <tr>
    <td></td>
    <td></td>
    <td>Synth Pad</td>
    <td><audio src="audio/SS/Mozart_K550/Synth Pad.mp3" controls>alternative text</audio><br/></td>
 </tr>    


 <tr>
    <td></td>
    <td></td>
    <td>Voice</td>
    <td><audio src="audio/SS/Mozart_K550/Voice.mp3" controls>alternative text</audio><br/></td>
 </tr>
    
 <tr>
    <td></td>
    <td></td>
    <td>Re-mix</td>
    <td><audio src="audio/SS/Mozart_K550/remix.mp3" controls>alternative text</audio><br/></td>
 </tr>        

</table>

<a href="#Table-of-contents">Back to TOC</a>

### 2. In Bloom - Nirvana (Rock)
<table border="0">
    
 <tr>
    <td style="text-align: left"><b style="font-size:14px"></b></td>     
    <td style="text-align: left"><b style="font-size:14px">Input</b></td>
     <td style="text-align: left"><b style="font-size:14px"></b></td>     
    <td style="text-align: left"><b style="font-size:14px">Output</b></td> 
 </tr>
    
 <tr>
    <td>Mix</td>
    <td><audio src="audio/T/Nirvana_In_Bloom/In_Bloom-ground.mp3" controls>alternative text</audio><br/></td>
    <td>Acoustic Guitar</td>
    <td><audio src="audio/SS/Nirvana_In_Bloom/seg/Acoustic Guitar.mp3" controls>alternative text</audio><br/></td>
 </tr>
    
    
 <tr>
    <td></td>
    <td></td>
    <td>Bass</td>
    <td><audio src="audio/SS/Nirvana_In_Bloom/seg/Bass.mp3" controls>alternative text</audio><br/></td>
 </tr>    
    
 <tr>
    <td></td>
    <td></td>
    <td>Drums</td>
    <td><audio src="audio/SS/Nirvana_In_Bloom/seg/Drums.mp3" controls>alternative text</audio><br/></td>
 </tr>    

 <tr>
    <td></td>
    <td></td>
    <td>Electric Guitar</td>
    <td><audio src="audio/SS/Nirvana_In_Bloom/seg/Electric Guitar.mp3" controls>alternative text</audio><br/></td>
 </tr>    


 <tr>
    <td></td>
    <td></td>
    <td>Piano</td>
    <td><audio src="audio/SS/Nirvana_In_Bloom/seg/Piano.mp3" controls>alternative text</audio><br/></td>
 </tr>    


 <tr>
    <td></td>
    <td></td>
    <td>Synth Lead</td>
    <td><audio src="audio/SS/Nirvana_In_Bloom/seg/Synth Lead.mp3" controls>alternative text</audio><br/></td>
 </tr>    
 


 <tr>
    <td></td>
    <td></td>
    <td>Voice</td>
    <td><audio src="audio/SS/Nirvana_In_Bloom/seg/Voice.mp3" controls>alternative text</audio><br/></td>
 </tr>
    
    
 <tr>
    <td></td>
    <td></td>
    <td>Re-mix</td>
    <td><audio src="audio/SS/Nirvana_In_Bloom/seg/remix.mp3" controls>alternative text</audio><br/></td>
 </tr>        
    
</table>

<a href="#Table-of-contents">Back to TOC</a>

### 3. 突然好想你 - Mayday (Chinese pop)
<table border="0">
    
 <tr>
    <td style="text-align: left"><b style="font-size:14px"></b></td>     
    <td style="text-align: left"><b style="font-size:14px">Input</b></td>
     <td style="text-align: left"><b style="font-size:14px"></b></td>     
    <td style="text-align: left"><b style="font-size:14px">Output</b></td> 
 </tr>
    
 <tr>
    <td>Mix</td>
    <td><audio src="audio/T/Chinese_pop/突然好想你_ground.mp3" controls>alternative text</audio><br/></td>
    <td>Acoustic Guitar</td>
    <td><audio src="audio/SS/Chinese_pop/Acoustic Guitar.mp3" controls>alternative text</audio><br/></td>
 </tr>
    
    
 <tr>
    <td></td>
    <td></td>
    <td>Bass</td>
    <td><audio src="audio/SS/Chinese_pop/Bass.mp3" controls>alternative text</audio><br/></td>
 </tr>
    
 <tr>
    <td></td>
    <td></td>
    <td>Brass</td>
    <td><audio src="audio/SS/Chinese_pop/Brass.mp3" controls>alternative text</audio><br/></td>
 </tr>        
    
 <tr>
    <td></td>
    <td></td>
    <td>Drums</td>
    <td><audio src="audio/SS/Chinese_pop/Drums.mp3" controls>alternative text</audio><br/></td>
 </tr>    

 <tr>
    <td></td>
    <td></td>
    <td>Electric Guitar</td>
    <td><audio src="audio/SS/Chinese_pop/Electric Guitar.mp3" controls>alternative text</audio><br/></td>
 </tr>    
    
 <tr>
    <td></td>
    <td></td>
    <td>Electric Piano</td>
    <td><audio src="audio/SS/Chinese_pop/Electric Piano.mp3" controls>alternative text</audio><br/></td>
 </tr>    


 <tr>
    <td></td>
    <td></td>
    <td>Piano</td>
    <td><audio src="audio/SS/Chinese_pop/Piano.mp3" controls>alternative text</audio><br/></td>
 </tr>    

 <tr>
    <td></td>
    <td></td>
    <td>Saxophone</td>
    <td><audio src="audio/SS/Chinese_pop/Saxophone.mp3" controls>alternative text</audio><br/></td>
 </tr>    

 <tr>
    <td></td>
    <td></td>
    <td>Strings</td>
    <td><audio src="audio/SS/Chinese_pop/Strings.mp3" controls>alternative text</audio><br/></td>
 </tr>        

 <tr>
    <td></td>
    <td></td>
    <td>Synth Lead</td>
    <td><audio src="audio/SS/Chinese_pop/Synth Lead.mp3" controls>alternative text</audio><br/></td>
 </tr>    
 
 <tr>
    <td></td>
    <td></td>
    <td>Synth Pad</td>
    <td><audio src="audio/SS/Chinese_pop/Synth Pad.mp3" controls>alternative text</audio><br/></td>
 </tr>
    
 <tr>
    <td></td>
    <td></td>
    <td>Trumpet</td>
    <td><audio src="audio/SS/Chinese_pop/Trumpet.mp3" controls>alternative text</audio><br/></td>
 </tr>    
    
 <tr>
    <td></td>
    <td></td>
    <td>Violin</td>
    <td><audio src="audio/SS/Chinese_pop/Violin.mp3" controls>alternative text</audio><br/></td>
 </tr>         
 

 <tr>
    <td></td>
    <td></td>
    <td>Voice</td>
    <td><audio src="audio/SS/Chinese_pop/Voice.mp3" controls>alternative text</audio><br/></td>
 </tr>
    
    
 <tr>
    <td></td>
    <td></td>
    <td>Re-mix</td>
    <td><audio src="audio/SS/Chinese_pop/remix.mp3" controls>alternative text</audio><br/></td>
 </tr>        

</table>

<a href="#Table-of-contents">Back to TOC</a>

### 4. Psycho - Red Velvet (K-pop)
<table border="0">
    
 <tr>
    <td style="text-align: left"><b style="font-size:14px"></b></td>     
    <td style="text-align: left"><b style="font-size:14px">Input</b></td>
     <td style="text-align: left"><b style="font-size:14px"></b></td>     
    <td style="text-align: left"><b style="font-size:14px">Output</b></td> 
 </tr>
    
 <tr>
    <td>Mix</td>
    <td><audio src="audio/T/K-pop/psycho-ground.mp3" controls>alternative text</audio><br/></td>
    <td>Acoustic Guitar</td>
    <td><audio src="audio/SS/psycho/seg/Acoustic Guitar.mp3" controls>alternative text</audio><br/></td>
 </tr>
    
    
 <tr>
    <td></td>
    <td></td>
    <td>Bass</td>
    <td><audio src="audio/SS/psycho/seg/Bass.mp3" controls>alternative text</audio><br/></td>
 </tr>    

 <tr>
    <td></td>
    <td></td>
    <td>Brass</td>
    <td><audio src="audio/SS/psycho/seg/Brass.mp3" controls>alternative text</audio><br/></td>
 </tr>
    
 <tr>
    <td></td>
    <td></td>
    <td>Chromatic Percussion</td>
    <td><audio src="audio/SS/psycho/seg/Chromatic Percussion.mp3" controls>alternative text</audio><br/></td>
 </tr>        
    
 <tr>
    <td></td>
    <td></td>
    <td>Drums</td>
    <td><audio src="audio/SS/psycho/seg/Drums.mp3" controls>alternative text</audio><br/></td>
 </tr>    

 <tr>
    <td></td>
    <td></td>
    <td>Electric Guitar</td>
    <td><audio src="audio/SS/psycho/seg/Electric Guitar.mp3" controls>alternative text</audio><br/></td>
 </tr>    

 <tr>
    <td></td>
    <td></td>
    <td>Electric Piano</td>
    <td><audio src="audio/SS/psycho/seg/Electric Piano.mp3" controls>alternative text</audio><br/></td>
 </tr>    
    
 <tr>
    <td></td>
    <td></td>
    <td>Piano</td>
    <td><audio src="audio/SS/psycho/seg/Piano.mp3" controls>alternative text</audio><br/></td>
 </tr>    


 <tr>
    <td></td>
    <td></td>
    <td>Synth Lead</td>
    <td><audio src="audio/SS/psycho/seg/Synth Lead.mp3" controls>alternative text</audio><br/></td>
 </tr>    
 
 <tr>
    <td></td>
    <td></td>
    <td>Synth Pad</td>
    <td><audio src="audio/SS/psycho/seg/Synth Pad.mp3" controls>alternative text</audio><br/></td>
 </tr>
    

 <tr>
    <td></td>
    <td></td>
    <td>Voice</td>
    <td><audio src="audio/SS/psycho/seg/Voice.mp3" controls>alternative text</audio><br/></td>
 </tr>
    
    
 <tr>
    <td></td>
    <td></td>
    <td>Re-mix</td>
    <td><audio src="audio/SS/psycho/seg/remix.mp3" controls>alternative text</audio><br/></td>
 </tr>        

</table>

<a href="#Table-of-contents">Back to TOC</a>

### 5. Lemon - Yonezu Kenshi (J-pop)
<table border="0">
    
 <tr>
    <td style="text-align: left"><b style="font-size:14px"></b></td>     
    <td style="text-align: left"><b style="font-size:14px">Input</b></td>
     <td style="text-align: left"><b style="font-size:14px"></b></td>     
    <td style="text-align: left"><b style="font-size:14px">Output</b></td> 
 </tr>
    
 <tr>
    <td>Mix</td>
    <td><audio src="audio/T/lemon/lemon_original.mp3" controls>alternative text</audio><br/></td>
    <td>Bass</td>
    <td><audio src="audio/SS/lemon/seg/Bass.mp3" controls>alternative text</audio><br/></td>
 </tr>
    
    
 <tr>
    <td></td>
    <td></td>
    <td>Brass</td>
    <td><audio src="audio/SS/lemon/seg/Brass.mp3" controls>alternative text</audio><br/></td>
 </tr>    
    
 <tr>
    <td></td>
    <td></td>
    <td>Chromatic Percussion</td>
    <td><audio src="audio/SS/lemon/seg/Chromatic Percussion.mp3" controls>alternative text</audio><br/></td>
 </tr>    

 <tr>
    <td></td>
    <td></td>
    <td>Electric Guitar</td>
    <td><audio src="audio/SS/lemon/seg/Electric Guitar.mp3" controls>alternative text</audio><br/></td>
 </tr>    


 <tr>
    <td></td>
    <td></td>
    <td>Electric Piano</td>
    <td><audio src="audio/SS/lemon/seg/Electric Piano.mp3" controls>alternative text</audio><br/></td>
 </tr>    


 <tr>
    <td></td>
    <td></td>
    <td>Organ</td>
    <td><audio src="audio/SS/lemon/seg/Organ.mp3" controls>alternative text</audio><br/></td>
 </tr>    


 <tr>
    <td></td>
    <td></td>
    <td>Pipe</td>
    <td><audio src="audio/SS/lemon/seg/Pipe.mp3" controls>alternative text</audio><br/></td>
 </tr>    


 <tr>
    <td></td>
    <td></td>
    <td>Strings</td>
    <td><audio src="audio/SS/lemon/seg/Strings.mp3" controls>alternative text</audio><br/></td>
 </tr>
    
 <tr>
    <td></td>
    <td></td>
    <td>Synth Lead</td>
    <td><audio src="audio/SS/lemon/seg/Synth Lead.mp3" controls>alternative text</audio><br/></td>
 </tr>
    
 <tr>
    <td></td>
    <td></td>
    <td>Synth Pad</td>
    <td><audio src="audio/SS/lemon/seg/Synth Pad.mp3" controls>alternative text</audio><br/></td>
 </tr>    

 <tr>
    <td></td>
    <td></td>
    <td>Voice</td>
    <td><audio src="audio/SS/lemon/seg/Voice.mp3" controls>alternative text</audio><br/></td>
 </tr>
    
 <tr>
    <td></td>
    <td></td>
    <td>Re-mix</td>
    <td><audio src="audio/SS/lemon/seg/remix.mp3" controls>alternative text</audio><br/></td>
 </tr>        
    
</table>

<a href="#Table-of-contents">Back to TOC</a>

### 6. 夜に駆ける - YOASOBI (J-pop)
<table border="0">
    
 <tr>
    <td style="text-align: left"><b style="font-size:14px"></b></td>     
    <td style="text-align: left"><b style="font-size:14px">Input</b></td>
     <td style="text-align: left"><b style="font-size:14px"></b></td>     
    <td style="text-align: left"><b style="font-size:14px">Output</b></td> 
 </tr>
    
 <tr>
    <td>Mix</td>
    <td><audio src="audio/SS/YOASOBI/mix.mp3" controls>alternative text</audio><br/></td>
    <td>Acoustic Guitar</td>
    <td><audio src="audio/SS/YOASOBI/Acoustic Guitar.mp3" controls>alternative text</audio><br/></td>
 </tr>
    
    
 <tr>
    <td></td>
    <td></td>
    <td>Bass</td>
    <td><audio src="audio/SS/YOASOBI/Bass.mp3" controls>alternative text</audio><br/></td>
 </tr>    
    
 <tr>
    <td></td>
    <td></td>
    <td>Brass</td>
    <td><audio src="audio/SS/YOASOBI/Brass.mp3" controls>alternative text</audio><br/></td>
 </tr>        
    
 <tr>
    <td></td>
    <td></td>
    <td>Cello</td>
    <td><audio src="audio/SS/YOASOBI/Cello.mp3" controls>alternative text</audio><br/></td>
 </tr>       
    
    
 <tr>
    <td></td>
    <td></td>
    <td>Drums</td>
    <td><audio src="audio/SS/YOASOBI/Drums.mp3" controls>alternative text</audio><br/></td>
 </tr>       
    
    
 <tr>
    <td></td>
    <td></td>
    <td>Electric Guitar</td>
    <td><audio src="audio/SS/YOASOBI/Electric Guitar.mp3" controls>alternative text</audio><br/></td>
 </tr>       
    
 <tr>
    <td></td>
    <td></td>
    <td>Piano</td>
    <td><audio src="audio/SS/YOASOBI/Piano.mp3" controls>alternative text</audio><br/></td>
 </tr>           

 <tr>
    <td></td>
    <td></td>
    <td>Synth Lead</td>
    <td><audio src="audio/SS/YOASOBI/Synth Lead.mp3" controls>alternative text</audio><br/></td>
 </tr>
    
 <tr>
    <td></td>
    <td></td>
    <td>Synth Pad</td>
    <td><audio src="audio/SS/YOASOBI/Synth Pad.mp3" controls>alternative text</audio><br/></td>
 </tr>           
    
 <tr>
    <td></td>
    <td></td>
    <td>Violin</td>
    <td><audio src="audio/SS/YOASOBI/Violin.mp3" controls>alternative text</audio><br/></td>
 </tr>
    
 <tr>
    <td></td>
    <td></td>
    <td>Re-mix</td>
    <td><audio src="audio/SS/YOASOBI/remix.mp3" controls>alternative text</audio><br/></td>
 </tr>    
  
</table>

<a href="#Table-of-contents">Back to TOC</a>

## Jointist Architecture

The figure below shows the detailed model architecture of Jointist. As an example, we assume that the audio clip $X_\text{wav}$ contains three different instruments: piano, bass, and guitar.

The instrument recognition module $f_\text{IR}$ generates the instrument condition $\hat{Y}_\text{cond}$, which contains the three detected instruments: piano, bass, and guitar. Three one-hot vectors, $I^\text{pn}_\text{cond}$, $I^\text{bass}_\text{cond}$, and $I^\text{gtr}_\text{cond}$, are then extracted from $\hat{Y}_\text{cond}$, representing piano, bass, and guitar respectively.
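The step from $\hat{Y}_\text{cond}$ to the per-instrument one-hot vectors can be sketched in a few lines. This is an illustration under stated assumptions, not the released code: the four-class `INSTRUMENTS` list is hypothetical (the demo tables above use many more classes), and treating the output of $f_\text{IR}$ as independent per-class probabilities binarized at 0.5 is an assumption.

```python
import numpy as np

# Hypothetical four-class taxonomy, for illustration only.
INSTRUMENTS = ["Piano", "Bass", "Guitar", "Drums"]


def instrument_condition(probs, threshold=0.5):
    """Binarize per-instrument probabilities from f_IR into a multi-hot Y_cond.

    The 0.5 threshold is an assumed default.
    """
    probs = np.asarray(probs, dtype=np.float32)
    if probs.shape != (len(INSTRUMENTS),):
        raise ValueError("expected one probability per instrument class")
    return (probs >= threshold).astype(np.float32)


def one_hot_conditions(y_cond):
    """Split a multi-hot Y_cond into one one-hot vector I_cond per active instrument."""
    eye = np.eye(len(y_cond), dtype=np.float32)
    return {INSTRUMENTS[i]: eye[i] for i in np.flatnonzero(y_cond)}


# Mirrors the example in the text: piano, bass and guitar present, drums absent.
y_cond = instrument_condition([0.93, 0.81, 0.67, 0.12])
conditions = one_hot_conditions(y_cond)
print(list(conditions))  # ['Piano', 'Bass', 'Guitar']
```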

The transcription module $f_\text{T}$ then takes these one-hot vectors one at a time and iteratively produces an individual piano roll for each instrument.
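Each per-instrument piano roll can be turned into note events, which is how a transcription becomes a MIDI file like those linked above. A minimal sketch, assuming a binary 88-key roll at a hypothetical 100 frames per second whose lowest row is MIDI pitch 21 (A0); neither value is stated on this page, and `roll_to_notes` is an illustrative name.

```python
import numpy as np

FRAMES_PER_SECOND = 100  # assumed frame rate of the piano roll
LOWEST_MIDI_PITCH = 21   # assumes row 0 of an 88-key roll is A0 (MIDI 21)


def roll_to_notes(roll):
    """Convert one instrument's binary piano roll (frames, pitches) into note events.

    Returns (midi_pitch, onset_sec, offset_sec) tuples sorted by onset, then pitch.
    A note starts where a pitch row switches 0 -> 1 and ends where it switches 1 -> 0.
    """
    roll = np.asarray(roll, dtype=bool)
    # Pad one silent frame at both ends so every note has a visible start and end.
    padded = np.pad(roll, ((1, 1), (0, 0))).astype(np.int8)
    edges = np.diff(padded, axis=0)  # +1 at onsets, -1 at offsets (exclusive end frame)
    notes = []
    for pitch in range(roll.shape[1]):
        onsets = np.flatnonzero(edges[:, pitch] == 1)
        offsets = np.flatnonzero(edges[:, pitch] == -1)
        for on, off in zip(onsets, offsets):
            notes.append((pitch + LOWEST_MIDI_PITCH,
                          float(on) / FRAMES_PER_SECOND,
                          float(off) / FRAMES_PER_SECOND))
    return sorted(notes, key=lambda note: (note[1], note[0]))


# Toy roll: middle C (MIDI 60) for frames 2-4, E4 (MIDI 64) for frames 6-9.
roll = np.zeros((10, 88), dtype=bool)
roll[2:5, 60 - LOWEST_MIDI_PITCH] = True
roll[6:10, 64 - LOWEST_MIDI_PITCH] = True
print(roll_to_notes(roll))  # [(60, 0.02, 0.05), (64, 0.06, 0.1)]
```

Padding before `np.diff` means a note that is still sounding in the last frame still gets an offset, so every onset pairs with exactly one offset.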

Similarly, the source separation module $f_\text{MSS}$ processes the one-hot vectors iteratively. In addition to the instrument condition $I_\text{cond}$, $f_\text{MSS}$ takes the transcription result $\hat{Y}_\text{T}$ as an extra condition.
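The per-instrument conditioning described above can be sketched as a loop. This is a minimal sketch, not the authors' code: `run_conditioned_modules`, `toy_f_T`, `toy_f_MSS` and `HOP` are hypothetical names, the callable signatures are assumptions, and the toy modules are stand-ins (the toy separator simply gates the mixture by the frames its transcription marks as active) used only so the loop runs end to end.

```python
import numpy as np

HOP = 160  # samples per piano-roll frame; an assumed value for the toy stand-ins


def run_conditioned_modules(mixture, conditions, f_T, f_MSS):
    """Call f_T and then f_MSS once per detected instrument.

    mixture:    1-D waveform X_wav.
    conditions: {instrument name: one-hot vector I_cond} extracted from Y_cond.
    f_T:        callable (mixture, one_hot) -> piano roll of shape (frames, pitches).
    f_MSS:      callable (mixture, one_hot, piano_roll) -> separated waveform.
    These callable signatures are assumptions; the real modules may differ.
    """
    results = {}
    for name, one_hot in conditions.items():
        roll = f_T(mixture, one_hot)            # transcription for this instrument
        source = f_MSS(mixture, one_hot, roll)  # separation sees I_cond and the transcription
        results[name] = {"piano_roll": roll, "source": source}
    return results


# --- Toy stand-ins so the loop runs end to end; they are NOT the Jointist models ---
def toy_f_T(mixture, one_hot):
    frames = len(mixture) // HOP
    k, n = int(np.argmax(one_hot)), len(one_hot)
    roll = np.zeros((frames, 88), dtype=bool)
    roll[k::n, 40 + k] = True  # instrument k "plays" on every n-th frame, starting at frame k
    return roll


def toy_f_MSS(mixture, one_hot, roll):
    # Keep the mixture only in frames where the transcription marks this instrument active.
    active = np.repeat(roll.any(axis=1), HOP)
    return mixture[: active.size] * active


names = ["Piano", "Bass", "Guitar"]
eye = np.eye(len(names), dtype=np.float32)
conditions = {name: eye[i] for i, name in enumerate(names)}
mixture = np.random.default_rng(0).standard_normal(10 * HOP)
out = run_conditioned_modules(mixture, conditions, toy_f_T, toy_f_MSS)
print({name: int(r["piano_roll"].sum()) for name, r in out.items()})  # {'Piano': 4, 'Bass': 3, 'Guitar': 3}
```

Because the toy activity masks are complementary, the three toy sources add back up to the mixture; real model outputs carry no such guarantee.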

<p style="text-align: center;" width="100%">
    <img src="./fig/jointist.png"> 
</p>
<a href="#Table-of-contents">Back to TOC</a>

### Transcription module

The architecture of the transcription module $f_\text{T}$ is shown below.

<p style="text-align: center;" width="100%">
    <img src="./fig/T.png"> 
</p>
<a href="#Table-of-contents">Back to TOC</a>

### Music source separation module
The architecture of the music source separation module $f_\text{MSS}$ is shown below.

<p style="text-align: center;" width="100%">
    <img src="./fig/MSS.png"> 
</p>
<a href="#Table-of-contents">Back to TOC</a>