# Recipe for training JukeDrummer on a custom drum-covers dataset
This fork mostly reuses the source code from [JukeDrummer](https://github.com/legoodmanner/jukedrummer), adapting its Python and shell scripts where needed.

This notebook is mostly written as a report of the steps used, although most steps were actually run as scripts outside of the notebook.

There was a lot of manual moving of files, so if this notebook or any script fails, please open an issue!

In summary, these are the steps we will follow:

0. Environment setup
1. Baseline: Generate ~24s long accompanying drums using pre-trained checkpoint weights
2. This work: Retrain JukeDrummer for a new dataset of drum covers:
    - Generate a new dataset of paired drumless and drums audio files
    - Preprocess this new dataset (segment, mel, beat extraction, language model tokens)
    - Train the JukeDrummer models: VQ-VAEs and language model
3. Demo: Run inference again, using the newly trained weights

# Environment 
We combine two public models:
1. Demucs [ISMIR'21][ICASSP'23]: for creating a dataset of drumless and drums-only audio files from a personal collection of drum covers
2. JukeDrummer [ISMIR'22]: for training a drum-accompaniment generation model.

## JukeDrummer Fork
The ISMIR'22 repo is here: https://github.com/legoodmanner/jukedrummer

This repo had minor issues with package versions, and changes in some model keys and function arguments.

A fork was used to make changes to the source repo:
https://github.com/3x10e8/jukedrummer

Refer to the [readme.md](./readme.md) file, which now points to the updated [requirements.txt](./requirements.txt).

## Demucs Clone
To create a new dataset of audio files split into two stems, drums (target) and drumless (others), we will clone demucs:

https://github.com/adefossez/demucs

I went with the option of creating a conda environment using the provided [environment-cuda.yml](https://github.com/adefossez/demucs/blob/main/environment-cuda.yml).

```
conda env update -f environment-cuda.yml
conda activate demucs
pip install -e .
```

`pip` threw warnings about making an editable install.

Furthermore, testing demucs on the included [test.mp3](https://github.com/adefossez/demucs/blob/main/test.mp3) failed with:
```
torch.from_numpy(wav) RuntimeError: Numpy is not available
```
This seemed to originate from a newer numpy version being installed, as [environment-cuda.yml](https://github.com/adefossez/demucs/blob/main/environment-cuda.yml) does not pin a numpy version.

Downgrading resolved the issue:

`pip install --upgrade numpy==1.24.1`
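
To catch this before launching a long separation job, a quick version check can help. The sketch below is only a guess at the cause: it assumes the failure comes from a NumPy 2.x install paired with a torch build that expects NumPy 1.x, which the notes above do not confirm. The helper name `numpy_looks_compatible` is hypothetical.

```python
from importlib.metadata import PackageNotFoundError, version


def numpy_looks_compatible(version_string: str) -> bool:
    """Return True if the NumPy major version is below 2.

    Assumption (not verified in this notebook): the
    "Numpy is not available" error is triggered by NumPy 2.x.
    """
    major = int(version_string.split(".")[0])
    return major < 2


if __name__ == "__main__":
    try:
        installed = version("numpy")
    except PackageNotFoundError:
        print("numpy is not installed in this environment")
    else:
        status = "ok" if numpy_looks_compatible(installed) else "consider pinning numpy==1.24.1"
        print(f"numpy {installed}: {status}")
```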

# Drum Covers Dataset

I ran demucs on a dataset of drum covers previously recorded as WAV files.

Parameter | Value
:-- | :--
Number of WAV files (total drum covers) | 347
Number of drummers | 1*
Total duration** | 18 hours, 44 minutes, 59 seconds
Drum covers were mostly performed for this playlist | [YouTube](https://www.youtube.com/playlist?list=PLVeAUqPtEmT4D5u7DbwJRtJ0nu_00Pb53)

*occasionally more if the backing track also had drums (not excluded), but this should be only a small subset

**extracted using: [get_wav_duration.py](https://gist.github.com/Wazzabeee/5dc05b11b8529457cde7b3fea0c0a45e)
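
For reference, the total duration can also be computed with the standard library alone. This is a minimal sketch, not the linked gist: it assumes the recordings are plain PCM WAV files that Python's `wave` module can open, and the function names are made up for illustration.

```python
import wave


def wav_duration_seconds(path):
    """Duration of one PCM WAV file: frame count divided by sample rate."""
    with wave.open(str(path), "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def total_duration(paths):
    """Sum the durations (in seconds) of all given WAV files."""
    return sum(wav_duration_seconds(p) for p in paths)


def format_hms(seconds):
    """Format a duration in seconds as 'H hours, M minutes, S seconds'."""
    total = int(round(seconds))
    hours, remainder = divmod(total, 3600)
    minutes, secs = divmod(remainder, 60)
    return f"{hours} hours, {minutes} minutes, {secs} seconds"
```

Usage would look like `format_hms(total_duration(Path(".").glob("*.WAV")))` after `from pathlib import Path`, run from the directory holding the covers.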

Some notes on this drum cover dataset:
1. Recordings were made on a Roland TD-25: an electronic drumkit that allows playing along to music (as a backing track).
2. Each drum cover was saved as one WAV file, combining the drum cover and the backing music (often drumless) into one file.
    - Some files probably had original drums in the backing tracks, resulting in two drum tracks being overlaid in the covers. These files were not identified nor excluded.
    - Similarly, some files would have consisted of just the drums, without any backing tracks. These were also not excluded.
3. Some (if not all) songs were covered multiple times. Repeated covers could get split between train and validation sets (this was not controlled for).
4. Not all drum covers used the same drum sounds, nor would the volumes have matched across recordings.
5. The drum covers were not perfect: there are timing errors!
6. The trained model aims to generate drum tracks based on how I play the drums; it is not intended to generate music from the source playlist.
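
The train/validation leak mentioned in the notes was not controlled for in this work. If one wanted to control for it, a song-grouped split is one option. The sketch below is hypothetical and was not used here: `grouped_split` and the `song_of` callback are illustrative names, and how a song is recovered from a filename depends entirely on your own naming scheme.

```python
import random
from collections import defaultdict


def grouped_split(filenames, song_of, val_fraction=0.1, seed=0):
    """Split files so that all covers of one song land in the same set.

    song_of: caller-supplied function mapping a filename to a song key
             (hypothetical; depends on how the recordings are named).
    """
    groups = defaultdict(list)
    for name in filenames:
        groups[song_of(name)].append(name)

    songs = sorted(groups)                 # deterministic order before shuffling
    random.Random(seed).shuffle(songs)

    n_val = max(1, round(len(songs) * val_fraction))
    val = [f for song in songs[:n_val] for f in groups[song]]
    train = [f for song in songs[n_val:] for f in groups[song]]
    return train, val
```

Splitting by song rather than by file keeps the validation set free of songs the model has already seen in another take.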

### Running demucs
WAVs were copied into one directory, and then demucs was run using:

`demucs -o . --filename "{stem}/{track}.{ext}" -j 20 --two-stems drums *.WAV`
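
With `--two-stems drums` and the `{stem}/{track}.{ext}` filename pattern, each input should end up as a `drums/<track>.wav` and a `no_drums/<track>.wav` file inside a folder named after the demucs model. The sketch below pairs them up; the `htdemucs` folder name in the usage note is an assumption about the default model, so adjust it to whatever directory demucs actually created.

```python
from pathlib import Path


def pair_stems(model_dir):
    """Return (drumless_path, drums_path) pairs found under a demucs output folder.

    Assumes the layout <model_dir>/drums/<track>.wav and
    <model_dir>/no_drums/<track>.wav. Tracks missing either stem are skipped.
    """
    model_dir = Path(model_dir)
    drums_dir = model_dir / "drums"
    drumless_dir = model_dir / "no_drums"

    pairs = []
    for drums_path in sorted(drums_dir.glob("*.wav")):
        drumless_path = drumless_dir / drums_path.name
        if drumless_path.exists():
            pairs.append((drumless_path, drums_path))
    return pairs
```

For example, `pair_stems("htdemucs")` would list the pairs produced by the command above, assuming that is the output folder name.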

# JukeDrummer: Inference with pre-trained weights

First, we copy over a drumless WAV and truncate it to 23.78 seconds (a length that seems to work with the network's layer sizes):

In [None]:
!python input_drumless/truncate_to_24s/sounds/truncate.py
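
The contents of `truncate.py` are not reproduced in this notebook. As a rough idea of what such a step involves, here is a minimal standard-library sketch; it is not the actual script, it assumes PCM WAV input, and `truncate_wav` is a hypothetical name.

```python
import wave


def truncate_wav(src, dst, max_seconds=23.78):
    """Copy at most max_seconds of audio from src to dst.

    Returns the number of frames written. Files shorter than
    max_seconds are copied unchanged.
    """
    with wave.open(str(src), "rb") as reader:
        params = reader.getparams()
        max_frames = int(max_seconds * reader.getframerate())
        n_frames = min(max_frames, reader.getnframes())
        frames = reader.readframes(n_frames)

    with wave.open(str(dst), "wb") as writer:
        writer.setparams(params)
        # The header frame count is corrected automatically when the file closes.
        writer.writeframes(frames)
    return n_frames
```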

In [None]:
!python inference.py --exp_idx 1 --cuda 0 --ckpt_dir ckpt/ --input_dir input_drumless --output_dir output_with_drums/

# Preprocessing new dataset

In [None]:
! bash script/preprocessing.sh

# Training JukeDrummer on new dataset

In [None]:
!python3 train_vqvae.py --vq_idx 1 --data_type target --cuda 0

In [None]:
!python3 train_vqvae.py --vq_idx 1 --data_type others --cuda 0

In [None]:
!python3 token_extract.py --cuda 0 --vq_idx 1 --data_type target --ckpt_dir ckpt_3x10e8/ --mel_dir data/mel --output_dir data/token
!python3 token_extract.py --cuda 0 --vq_idx 1 --data_type others --ckpt_dir ckpt_3x10e8/ --mel_dir data/mel --output_dir data/token
!python3 train_lm.py --cuda 0 --exp_idx 1 #--wandb

# Inference with new weights

In [None]:
for ckpt in [
    'ckpt/',                # pretrained
    'ckpt_lm_tr86_epo300/', # pretrained vq, train LM on smaller train set
    'ckpt_lm_tr124_epo40/', # pretrained vq, train LM on larger train set
    'ckpt_lm_tr124_epo60/', # pretrained vq, train LM on larger train set for more epochs
    'ckpt_vq_tr62_epo69/',  # train vq, reuse trained LM from last run
    'ckpt_vq,lm_tr124_epo350/', # final model
]:
    !python inference.py --exp_idx 1 --cuda 0 --ckpt_dir {ckpt} --input_dir input_drumless --output_dir output_with_drums/
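
Since most steps were run as scripts outside the notebook, the same sweep can be written without the `!` cell magic. This is a sketch that mirrors the loop above using `subprocess`; the function names and the `dry_run` switch are my own additions, and only the `inference.py` arguments come from the cell above.

```python
import subprocess
import sys

CHECKPOINT_DIRS = [
    "ckpt/",                     # pretrained
    "ckpt_lm_tr86_epo300/",
    "ckpt_lm_tr124_epo40/",
    "ckpt_lm_tr124_epo60/",
    "ckpt_vq_tr62_epo69/",
    "ckpt_vq,lm_tr124_epo350/",  # final model
]


def build_inference_command(ckpt_dir, input_dir="input_drumless",
                            output_dir="output_with_drums/"):
    """Build the argv list for one inference run (same flags as the notebook cell)."""
    return [
        sys.executable, "inference.py",
        "--exp_idx", "1",
        "--cuda", "0",
        "--ckpt_dir", ckpt_dir,
        "--input_dir", input_dir,
        "--output_dir", output_dir,
    ]


def run_sweep(ckpt_dirs, dry_run=True):
    """Run inference once per checkpoint directory; stop on the first failure."""
    for ckpt in ckpt_dirs:
        cmd = build_inference_command(ckpt)
        if dry_run:
            print(" ".join(cmd))             # only show what would be run
        else:
            subprocess.run(cmd, check=True)  # raises CalledProcessError on failure
```

Passing the arguments as a list avoids shell quoting problems with directory names such as `ckpt_vq,lm_tr124_epo350/`.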

# Appendix
Training only the LM weights for 60 epochs on the new dataset, with tokens extracted using the pre-trained (ISMIR'22) VQ-VAE weights:

In [None]:
!bash script/train.sh