<a href="https://colab.research.google.com/github/buganart/descriptor-transformer/blob/main/descriptor_model_train.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [None]:
#@markdown Before starting please save the notebook in your drive by clicking on `File -> Save a copy in drive`

In [None]:
#@markdown Check GPU, should be a Tesla V100
!nvidia-smi -L
import os
print(f"We have {os.cpu_count()} CPU cores.")

In [None]:
#@markdown Mount google drive
from google.colab import drive
from google.colab import output
drive.mount('/content/drive')

from pathlib import Path
if not Path("/content/drive/My Drive/IRCMS_GAN_collaborative_database").exists():
    raise RuntimeError(
        "Shortcut to our shared drive folder doesn't exist.\n\n"
        "\t1. Go to the google drive web UI\n"
        "\t2. Right click shared folder IRCMS_GAN_collaborative_database and click \"Add shortcut to Drive\""
    )

def clear_on_success(msg="Ok!"):
    if _exit_code == 0:
        output.clear()
        print(msg)

In [None]:
#@markdown Install wandb and log in
%pip install wandb
output.clear()
import wandb
from pathlib import Path
wandb_drive_netrc_path = Path("drive/My Drive/colab/.netrc")
wandb_local_netrc_path = Path("/root/.netrc")
if wandb_drive_netrc_path.exists():
    import shutil

    print("Wandb .netrc file found, will use that to log in.")
    shutil.copy(wandb_drive_netrc_path, wandb_local_netrc_path)
else:
    print(
        f"Wandb config not found at {wandb_drive_netrc_path}.\n"
        f"Using manual login.\n\n"
        f"To use auto login in the future, finish the manual login first and then run:\n\n"
        f"\t!mkdir -p '{wandb_drive_netrc_path.parent}'\n"
        f"\t!cp {wandb_local_netrc_path} '{wandb_drive_netrc_path}'\n\n"
        f"Then that file will be used to login next time.\n"
    )

!wandb login
output.clear()
print("ok!")

# Description

This notebook trains a descriptor model and logs the results to the wandb project "demiurge/descriptor_model". It is based on the code from [buganart/descriptor-transformer](https://github.com/buganart/descriptor-transformer).

To start training the descriptor model, the user needs to:

1. Specify **audio_db_dir** to point to a music folder or a descriptor folder in the mounted Google Drive. If the folder is a music folder, the code converts all the music files (.wav) into descriptor files (.json). **hop_length** and **sr** (sampling rate) are the parameters used to convert music files into descriptors. The extracted descriptors describe a music segment with librosa features such as *spectral_centroid*, *spectral_flatness*, *spectral_rolloff*, and *rms*.

2. Set **selected_model** to one of "LSTM", "LSTMEncoderDecoderModel", "TransformerEncoderOnlyModel", or "TransformerModel". The models take processed descriptors of sequence length **window_size** as input and try to predict the subsequent descriptors. For "LSTMEncoderDecoderModel" and "TransformerModel", the encoder reads the input of sequence length **window_size**, and the decoder generates output of sequence length **forecast_size**. "LSTM" and "TransformerEncoderOnlyModel", however, predict only the single next descriptor.

3. Run the code and record the wandb run id. The run id is needed to resume the run and to generate descriptors in the [prediction notebook](https://github.com/buganart/descriptor-transformer/blob/main/descriptor_model_predict.ipynb).
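
The input/target split described in step 2 can be sketched as follows. This is an illustration only, not the code in the `desc` package: `split_segment` is a hypothetical helper, and plain integers stand in for descriptor frames.

```python
# Hypothetical helper for illustration; not part of the desc package.
ENCODER_DECODER_MODELS = {"LSTMEncoderDecoderModel", "TransformerModel"}


def split_segment(segment, selected_model, window_size, forecast_size):
    """Split one descriptor segment into (model input, prediction target)."""
    # Encoder-decoder models forecast `forecast_size` steps; the others forecast 1.
    target_len = forecast_size if selected_model in ENCODER_DECODER_MODELS else 1
    if len(segment) < window_size + target_len:
        raise ValueError("segment is shorter than window_size + target length")
    model_input = segment[:window_size]
    target = segment[window_size:window_size + target_len]
    return model_input, target


# Integers stand in for descriptor frames.
segment = list(range(10))
lstm_input, lstm_target = split_segment(
    segment, "LSTM", window_size=4, forecast_size=3
)
seq_input, seq_target = split_segment(
    segment, "TransformerModel", window_size=4, forecast_size=3
)
print(lstm_input, lstm_target)  # [0, 1, 2, 3] [4]
print(seq_input, seq_target)    # [0, 1, 2, 3] [4, 5, 6]
```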



---

## Models

* ["LSTM"](https://pytorch.org/tutorials/beginner/nlp/sequence_models_tutorial.html#example-an-lstm-for-part-of-speech-tagging)
    * a simple time-series next-step prediction model built on the nn.LSTM module.
    * this model predicts only the next descriptor, given the descriptors of the whole input sequence.
    * the input is processed by the nn.LSTM module, and the resulting hidden vector is then processed by an nn.Linear module to generate the output descriptors.

* ["LSTMEncoderDecoderModel"](https://arxiv.org/pdf/1406.1078.pdf)
    * an encoder-decoder model implemented with the nn.LSTM module.
    * the encoder nn.LSTM module processes the whole input descriptor sequence into a hidden vector, which the decoder nn.LSTM module then decodes into descriptors of sequence length **forecast_size**.

* ["TransformerEncoderOnlyModel"](https://pytorch.org/tutorials/beginner/transformer_tutorial.html)
    * based on the PyTorch Transformer implementation, using only nn.TransformerEncoder.
    * this model predicts only the next descriptor after the nn.TransformerEncoder module processes the input descriptors.

* ["TransformerModel"](https://pytorch.org/docs/stable/generated/torch.nn.Transformer.html)
    * based on the PyTorch Transformer implementation.
    * with the input descriptors as the source (*src*) vector and a zero vector of sequence length **forecast_size** as the target (*tgt*) vector, the model tries to predict output descriptors of sequence length **forecast_size**.


---

## Training Parameters

The training parameters for the selected model. Depending on **selected_model**, some training parameters may have no effect on the resulting model (for example, **hidden_size** has no effect on Transformer-type models). If the description of a parameter mentions no model, that parameter applies to all models.

**experiment_dir**: 

the path where the data generated by the descriptor model training process is saved.

**resume_run_id**: 

If a run was stopped and the user wants to resume it, specify its wandb run id in **resume_run_id**.

---

**remove_outliers**: 

The mean and standard deviation are calculated over all the descriptors extracted from **audio_db_dir** for training. If **remove_outliers** is set to True, descriptors whose values are far from the mean are removed.
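
The notebook does not state how far from the mean counts as "far", so the sketch below assumes a cutoff of three standard deviations per feature. Both the helper name `filter_outliers` and the `max_std` parameter are hypothetical, and the actual rule in the `desc` package may differ.

```python
from statistics import mean, pstdev


def filter_outliers(frames, max_std=3.0):
    """Keep frames whose every feature is within max_std std of that feature's mean.

    `frames` is a list of equal-length feature lists. The 3-sigma default is an
    assumption made for this sketch only.
    """
    columns = list(zip(*frames))
    means = [mean(col) for col in columns]
    stds = [pstdev(col) for col in columns]
    kept = []
    for frame in frames:
        within = all(
            std == 0 or abs(value - mu) <= max_std * std
            for value, mu, std in zip(frame, means, stds)
        )
        if within:
            kept.append(frame)
    return kept


# 20 ordinary frames plus one frame with an extreme second feature.
frames = [[1.0, 10.0] for _ in range(20)] + [[1.0, 500.0]]
cleaned = filter_outliers(frames)
print(len(frames), len(cleaned))  # 21 20
```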

**process_on_the_fly**: 

decides whether descriptors are arranged into batches before training or during training.

*   If *True*, the extracted descriptors for each music sample file (.wav) are stored in the dataset list separately. When the training process needs a data batch, the sample index (which sample to draw data from) and the window index (which segment of the sample) are determined, and the corresponding segment is returned.
*   If *False*, the extracted descriptors for each music sample file (.wav) are batchified: every segment of size **window_size** + **forecast_size** is copied and saved in the dataset list. During training, random batches are drawn from the dataset list. This approach dramatically increases the dataset storage size.
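
The storage trade-off between the two modes can be seen in a small sketch. The helper names are hypothetical, integers stand in for descriptor frames, and a stride of one frame between consecutive segments is an assumption; the `desc` package may organise this differently.

```python
def batchify(samples, segment_len):
    """Eager mode: copy every segment of length segment_len into one list."""
    return [
        sample[start:start + segment_len]
        for sample in samples
        for start in range(len(sample) - segment_len + 1)
    ]


def get_on_the_fly(samples, segment_len, sample_idx, window_idx):
    """Lazy mode: slice a single segment only when it is requested."""
    sample = samples[sample_idx]
    return sample[window_idx:window_idx + segment_len]


samples = [list(range(6)), list(range(100, 105))]
segment_len = 4  # stands in for window_size + forecast_size

eager = batchify(samples, segment_len)
stored_frames = sum(len(segment) for segment in eager)
original_frames = sum(len(sample) for sample in samples)
print(len(eager), stored_frames, original_frames)  # 5 20 11

lazy_segment = get_on_the_fly(samples, segment_len, sample_idx=1, window_idx=1)
print(lazy_segment)  # [101, 102, 103, 104]
```

Even in this toy case the eager list holds 20 frames for 11 original ones; with segments thousands of frames long the overlap grows accordingly.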



---

**window_size**: 

the sequence length of the descriptor model input 

**forecast_size**: 

the sequence length of the descriptor model output
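
Both sizes count descriptor frames, not seconds. Assuming one descriptor frame per **hop_length** audio samples (the usual layout of librosa's frame-based features), a rough conversion looks like this; `frames_to_seconds` is a hypothetical helper, and its defaults are the form values used later in this notebook.

```python
def frames_to_seconds(num_frames, hop_length=1024, sr=44100):
    """Approximate audio duration covered by num_frames descriptor frames."""
    return num_frames * hop_length / sr


window_seconds = frames_to_seconds(2000)
print(round(window_seconds, 2))  # 46.44
```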

---

**add_positional_embedding**: 

Sinusoidal Positional Encoding described in the transformer model paper [Attention Is All You Need](https://arxiv.org/pdf/1706.03762.pdf). The Positional Encoding is optional for "LSTMEncoderDecoderModel" and "TransformerModel", but necessary for "TransformerEncoderOnlyModel".
* If *True*, position embedding of size *dim_pos_encoding* will be added to the feature dimension of the data.
* If *False*, the value of *dim_pos_encoding* has no effect on the model.
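
A minimal pure-Python sketch of the sinusoidal encoding from the paper is shown below. Concatenating the encoding to each descriptor frame is an assumption, suggested by the feature total *dim_pos_encoding* + *feature_size* mentioned in this notebook; the function names are hypothetical and the `desc` implementation may differ.

```python
import math


def sinusoidal_encoding(seq_len, dim):
    """Return seq_len rows of the sinusoidal positional encoding."""
    table = []
    for pos in range(seq_len):
        row = []
        for i in range(dim):
            # Each sin/cos pair (2k, 2k + 1) shares the frequency 10000^(-2k/dim).
            angle = pos / (10000 ** ((2 * (i // 2)) / dim))
            row.append(math.sin(angle) if i % 2 == 0 else math.cos(angle))
        table.append(row)
    return table


def append_positions(frames, dim_pos_encoding):
    """Concatenate the encoding to each frame (assumed combination rule)."""
    encoding = sinusoidal_encoding(len(frames), dim_pos_encoding)
    return [list(frame) + pos for frame, pos in zip(frames, encoding)]


frames = [[0.5] * 39 for _ in range(3)]
augmented = append_positions(frames, dim_pos_encoding=20)
print(len(augmented[0]))  # 59
```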

**dim_pos_encoding**: 

the number of dimensions for the Sinusoidal Positional Encoding. 
**add_positional_embedding** must be *True* to add position embedding of size *dim_pos_encoding* to the data.

* For "TransformerEncoderOnlyModel" and "TransformerModel", the number of MFCC features *feature_size* is 39, and the total number of features (*dim_pos_encoding* + *feature_size*) must be divisible by *nhead*.

**num_layers**: 

* "LSTM": The *num_layers* parameter for the nn.LSTM module

* "LSTMEncoderDecoderModel": The *num_layers* parameter for the nn.LSTM encoder and nn.LSTM decoder.

* "TransformerEncoderOnlyModel": The *num_layers* parameter for the nn.TransformerEncoder module.

* "TransformerModel": The *num_encoder_layers* and *num_decoder_layers* parameter for the nn.Transformer module.

**hidden_size**:

only for "LSTM" and "LSTMEncoderDecoderModel". 

* The *hidden_size* of the nn.LSTM module in the "LSTM" and "LSTMEncoderDecoderModel".

**nhead**, **dropout**, **dim_feedforward**:

only for "TransformerEncoderOnlyModel" and "TransformerModel". 

* The *nhead*, *dropout*, *dim_feedforward* of the nn.Transformer module in the "TransformerModel".

* The *nhead*, *dropout*, *dim_feedforward* of the nn.TransformerEncoder module in the "TransformerEncoderOnlyModel".

* the number of MFCC features *feature_size* is 39, and the total number of features (*dim_pos_encoding* + *feature_size*) must be divisible by *nhead*.
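
The divisibility constraint is easy to check before launching a run. The helper below is hypothetical and simply lists the *nhead* values that divide the total feature count, taking *feature_size* = 39 as stated above.

```python
def valid_nheads(dim_pos_encoding, feature_size=39, add_positional_embedding=True):
    """List the nhead values that evenly divide the total feature count."""
    total = feature_size
    if add_positional_embedding:
        total += dim_pos_encoding
    return [n for n in range(1, total + 1) if total % n == 0]


print(valid_nheads(20))  # total 59: [1, 59]
print(valid_nheads(21))  # total 60: [1, 2, 3, 4, 5, 6, 10, 12, 15, 20, 30, 60]
```

If the constraint applies as described, a *dim_pos_encoding* of 20 gives a prime total of 59, so *nhead* = 5 would not be accepted by a Transformer-type model, while a *dim_pos_encoding* of 21 would allow it.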


---

**learning_rate**: 

the learning rate to train the model

**batch_size**:

the batch size of the data batch drawn from the dataset to train the model per training step

**epochs**:

the maximum number of epochs to train the model

**save_interval**:

the interval, in epochs, at which the pytorch lightning Trainer saves a checkpoint file.





In [None]:
#@title Configuration
#@markdown Directories can be found via file explorer on the left by navigating into `drive` to the desired folders. 
#@markdown Then right-click and *`Copy path`*.

#@markdown ### #dataset directory / train save directory
#@markdown - the path to save experiment / model data
experiment_dir = "/content/drive/My Drive/IRCMS_GAN_collaborative_database/Experiments/colab-violingan/descriptor-model" #@param {type:"string"}

#@markdown - the path to the new non-GAN generated dataset
#@markdown - This is the dataset to train the descriptor model. The data will be split into segments of size
#@markdown  - window_size + 1 for [LSTM, TransformerEncoderOnlyModel]
#@markdown  - window_size + forecast_size for [LSTMEncoderDecoderModel, TransformerModel]
#@markdown - The model will predict next 1 or next forecast_size descriptors given window_size descriptors
#@markdown - the files in the folder can be music (.wav) or extracted descriptors (.json)
# audio_db_dir = "/content/drive/My Drive/AUDIO DATABASE/MUSIC TRANSFORMER/barber corpus" #@param {type:"string"}
audio_db_dir = "/content/drive/My Drive/AUDIO DATABASE/TESTING" #@param {type:"string"}
#@markdown - wav parameters to process music (.wav) to extracted descriptors (.json)
hop_length = 1024 #@param {type:"integer"}
sr = 44100 #@param {type:"integer"} 
descriptor_size = 39



#@markdown ### #Resumption of previous runs
#@markdown Optional resumption arguments below, leaving it empty will start a new run from scratch. 

#@markdown Note that when resuming a run, the config parameters will NOT be changed even if the train arguments are different.
#@markdown - The ID can be found on wandb. 
#@markdown - It's 8 characters long and may contain a-z letters and digits (for example `1t212ycn`).

#@markdown Resume a previous run 
resume_run_id = "" #@param {type:"string"}

#@markdown ### #train argument
selected_model = "LSTM" #@param ["LSTM", "LSTMEncoderDecoderModel", "TransformerEncoderOnlyModel", "TransformerModel"]
#@markdown - remove descriptors whose values are far from the mean
remove_outliers=True#@param {type: "boolean"}
#@markdown - if True, fixed number of samples will be drawn per epoch from the dataset regardless of the dataset size.
#@markdown - if False, all sliced data will be packed and shuffled per epoch
process_on_the_fly=True#@param {type: "boolean"}

#@markdown - the input sequence length to train the model
window_size = 2000 #@param {type: "integer"}
#@markdown - positional_embedding optional for "LSTMEncoderDecoderModel", "TransformerModel", necessary for "TransformerEncoderOnlyModel"
add_positional_embedding=True#@param {type: "boolean"}
dim_pos_encoding=20     #@param {type: "integer"}
#@markdown - num_layer for each model (including num_encoder_layer/num_decoder_layer)
num_layers = 3 #@param {type: "integer"}

learning_rate = 1e-4 #@param {type: "number"}
batch_size = 64 #@param {type: "integer"}
epochs = 3000 #@param {type: "integer"}

# log_interval = 10 #@param {type: "integer"}
#@markdown - number of epochs between model checkpoint saves
save_interval = 10 #@param {type: "integer"}
# n_test_samples = 8 #@param {type: "integer"}



notes = "" #@param {type: "string"}
#@markdown model specific argument
#@markdown - LSTMEncoderDecoderModel, TransformerModel (forecast_size)
#@markdown - the output sequence length of the model based on the input of size "window_size" 
forecast_size=2000 #@param {type: "integer"}
#@markdown - LSTM
#@markdown - the hidden dim for LSTM
hidden_size=100 #@param {type: "integer"}
#@markdown - TransformerEncoderOnlyModel, TransformerModel
nhead=5     #@param {type: "integer"}
dropout=0.1     #@param {type: "number"}
dim_feedforward=128     #@param {type: "integer"}

import re
from pathlib import Path
from argparse import Namespace

audio_db_dir = Path(audio_db_dir)
experiment_dir = Path(experiment_dir)


for path in [experiment_dir]:
    path.mkdir(parents=True, exist_ok=True)

if not audio_db_dir.exists():
    raise RuntimeError(f"audio_db_dir {audio_db_dir} does not exist.")

def check_wandb_id(run_id):
    if run_id and not re.match(r"^[\da-z]{8}$", run_id):
        raise RuntimeError(
            "Run ID needs to be 8 characters long and contain only letters a-z and digits.\n"
            f"Got \"{run_id}\""
        )

check_wandb_id(resume_run_id)

colab_config = {
    "audio_db_dir": audio_db_dir,
    "hop_length": hop_length,
    "sr": sr,
    "experiment_dir": experiment_dir,
    "resume_run_id": resume_run_id,
    "remove_outliers": remove_outliers,
    "descriptor_size": descriptor_size,
    "window_size": window_size,
    "forecast_size": forecast_size,
    "learning_rate": learning_rate,
    "batch_size": batch_size,
    "epochs": epochs,
    "save_interval": save_interval,
    "selected_model": selected_model,
    "notes": notes,
    "hidden_size": hidden_size,
    "num_layers": num_layers,
    "dim_pos_encoding": dim_pos_encoding,
    "nhead": nhead,
    "dropout": dropout,
    "dim_feedforward": dim_feedforward,
}

for k, v in colab_config.items():
    print(f"=> {k:20}: {v}")

config = Namespace(**colab_config)
config.seed = 1234

if config.selected_model not in ["LSTMEncoderDecoderModel", "TransformerModel"]:
    config.forecast_size = 0
config.window_size = config.window_size + config.forecast_size

In [None]:
#@markdown Install dependency
%pip install --upgrade git+https://github.com/buganart/descriptor-transformer.git#egg=desc
import torch
from desc.train_function import (
    save_model_args,
    get_resume_run_config,
    init_wandb_run,
    setup_datamodule,
    setup_model,
    train,
)
clear_on_success()

# Train

In [None]:
run = init_wandb_run(config, run_dir=experiment_dir)#, mode="offline")
datamodule = setup_datamodule(config, run, isTrain=True, process_on_the_fly=process_on_the_fly)
model, extra_trainer_args = setup_model(config, run)
if torch.cuda.is_available():
    extra_trainer_args["gpus"] = -1
train(config, run, model, datamodule, extra_trainer_args)