# Voicebox Training
Before starting this tutorial, please ensure you have training datasets prepared according to [tutorials/tts/Voicebox_MFA.ipynb](Voicebox_MFA.ipynb).

The configuration files are organized hierarchically, so the final settings can be hard to read off directly. We therefore provide logged model configurations at [examples/tts/conf/voicebox/mel_spec-LibriHeavy.yaml](/examples/tts/conf/voicebox/mel_spec-LibriHeavy.yaml) and [examples/tts/conf/voicebox/DAC-GigaSpeech.yaml](/examples/tts/conf/voicebox/DAC-GigaSpeech.yaml) that show how the default arguments are overridden.
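
To make the override syntax used in the commands in this tutorial more concrete, here is a minimal sketch of how `++key=value` and `~key` overrides change a nested configuration. It is a simplified stand-in written for illustration, not Hydra's actual implementation: `apply_override`, `parse_value`, and the `defaults` dictionary are hypothetical, and the sketch ignores the optional value check that Hydra performs on `~key=value`.

```python
from copy import deepcopy


def parse_value(text):
    """Convert an override value string into bool, int, or float, else keep it as str."""
    lowered = text.lower()
    if lowered in ("true", "false"):
        return lowered == "true"
    for cast in (int, float):
        try:
            return cast(text)
        except ValueError:
            pass
    return text


def apply_override(config, override):
    """Apply one Hydra-style override to a nested dict, in place.

    `++a.b=v` sets a.b and creates missing parents, `a.b=v` requires a.b to
    exist, and `~a.b` (or `~a.b=v`) deletes a.b.
    """
    delete = override.startswith("~")
    force = override.startswith("++")
    path, _, raw = override.lstrip("~+").partition("=")
    *parents, leaf = path.split(".")
    node = config
    for key in parents:
        node = node.setdefault(key, {}) if force else node[key]
    if delete:
        del node[leaf]
    elif force or leaf in node:
        node[leaf] = parse_value(raw)
    else:
        raise KeyError(f"{path} is not in the config; use ++ to add it")


# Hypothetical defaults, standing in for the composed experiment files.
defaults = {
    "model": {
        "voicebox": {"pytorch_mha": False},
        "freeze_updates": {"modules": {"duration_predictor": -1}},
    }
}

config = deepcopy(defaults)
for item in [
    "++model.voicebox.pytorch_mha=True",
    "~model.freeze_updates.modules.duration_predictor=-1",
    "++model.freeze_updates.modules.voicebox=-1",
]:
    apply_override(config, item)

print(config["model"]["freeze_updates"]["modules"])  # {'voicebox': -1}
```

The flattened YAML files linked above play the role of `config` here: they record the result after all defaults and overrides have been combined.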

The sections below show how to train with several settings.

## DAC feature + GigaSpeech
The following is an excerpt from the Slurm script used for training. To trace the original configuration files, check out [experiment/a100-GS-DAC/base.yaml](/examples/tts/conf/voicebox/experiment/a100-GS-DAC/base.yaml) and [experiment/tricks/adam_warmup_cos_anneal.yaml](/examples/tts/conf/voicebox/experiment/tricks/adam_warmup_cos_anneal.yaml). Also see [examples/tts/conf/voicebox/DAC-GigaSpeech.yaml](/examples/tts/conf/voicebox/DAC-GigaSpeech.yaml) for a flattened model config (which uses a smaller model size for debugging purposes).

```shell
WANDB_API_KEY=YOUR_API_KEY PYTHONPATH=. python3 examples/tts/voicebox.py \
    experiment=\[a100-GS-DAC/base,tricks/adam_warmup_cos_anneal\] \
    ++model.voicebox.use_unet_skip_connection=True \
    ++model.voicebox.pytorch_mha=True \
    ++exp_manager.create_wandb_logger=True \
    ++exp_manager.create_tensorboard_logger=False \
    ++exp_manager.resume_if_exists=True \
    ++exp_manager.resume_ignore_no_checkpoint=True \
    ++exp_manager.max_time_per_run="00:03:50:00" \
    ++exp_manager.wandb_logger_kwargs.project=${PROJECT_NAME} \
    ++exp_manager.wandb_logger_kwargs.name=${EXP_NAME} \
    ++exp_manager.checkpoint_callback_params.every_n_train_steps=${EVERY_N_TRAIN_STEPS} \
    ++exp_manager.checkpoint_callback_params.every_n_epochs=${EVERY_N_EPOCHS} \
    ++exp_manager.checkpoint_callback_params.always_save_nemo=${ALWAYS_SAVE_NEMO} \
    ++exp_manager.checkpoint_callback_params.save_nemo_on_train_end=${SAVE_NEMO_ON_TRAIN_END} \
    ++exp_manager.checkpoint_callback_params.filename="'vb-{val_loss/vb:.4f}-{epoch}-{step}'" \
    trainer.devices=-1 \
    trainer.num_nodes=$SLURM_JOB_NUM_NODES
```
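
The value `"00:03:50:00"` passed to `exp_manager.max_time_per_run` can be easy to misread. The sketch below assumes it follows the `DD:HH:MM:SS` layout, and compares it with a Slurm time limit. The helper `parse_max_time` and the 4-hour `slurm_limit` are hypothetical and only illustrate why a run budget slightly below the job limit leaves time to save a checkpoint before the job is killed.

```python
from datetime import timedelta


def parse_max_time(text):
    """Parse a 'DD:HH:MM:SS' string into a timedelta."""
    parts = text.split(":")
    if len(parts) != 4:
        raise ValueError(f"expected DD:HH:MM:SS, got {text!r}")
    days, hours, minutes, seconds = (int(p) for p in parts)
    return timedelta(days=days, hours=hours, minutes=minutes, seconds=seconds)


budget = parse_max_time("00:03:50:00")
slurm_limit = timedelta(hours=4)  # hypothetical Slurm --time limit
margin = slurm_limit - budget

print(budget)  # 3:50:00
print(margin)  # 0:10:00
```

Together with `resume_if_exists=True`, this lets a chain of time-limited jobs continue the same experiment from the latest checkpoint.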

## Mel-Spectrogram + LibriHeavy
The following is an excerpt from the Slurm script used for training. To trace the original configuration files, check out [experiment/a100-GS-DAC/mel-LH.yaml](/examples/tts/conf/voicebox/experiment/a100-GS-DAC/mel-LH.yaml) and [experiment/tricks/adam_warmup_cos_anneal.yaml](/examples/tts/conf/voicebox/experiment/tricks/adam_warmup_cos_anneal.yaml). Also see [examples/tts/conf/voicebox/mel_spec-LibriHeavy.yaml](/examples/tts/conf/voicebox/mel_spec-LibriHeavy.yaml) for a flattened model config (which uses a smaller model size for debugging purposes).

```shell
WANDB_API_KEY=YOUR_API_KEY PYTHONPATH=. python3 examples/tts/voicebox.py \
    experiment=\[a100-GS-DAC/mel-LH,tricks/adam_warmup_cos_anneal\] \
    ++model.voicebox.use_unet_skip_connection=True \
    ++model.voicebox.pytorch_mha=True \
    ++model.validation_ds.max_duration=20 \
    ++exp_manager.create_wandb_logger=True \
    ++exp_manager.create_tensorboard_logger=False \
    ++exp_manager.resume_if_exists=True \
    ++exp_manager.resume_ignore_no_checkpoint=True \
    ++exp_manager.max_time_per_run="00:03:50:00" \
    ++exp_manager.wandb_logger_kwargs.project=${PROJECT_NAME} \
    ++exp_manager.wandb_logger_kwargs.name=${EXP_NAME} \
    ++exp_manager.checkpoint_callback_params.every_n_train_steps=${EVERY_N_TRAIN_STEPS} \
    ++exp_manager.checkpoint_callback_params.every_n_epochs=${EVERY_N_EPOCHS} \
    ++exp_manager.checkpoint_callback_params.always_save_nemo=${ALWAYS_SAVE_NEMO} \
    ++exp_manager.checkpoint_callback_params.save_nemo_on_train_end=${SAVE_NEMO_ON_TRAIN_END} \
    ++exp_manager.checkpoint_callback_params.filename="'vb-{val_loss/vb:.4f}-{epoch}-{step}'" \
    trainer.devices=-1 \
    trainer.num_nodes=$SLURM_JOB_NUM_NODES
```
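
The command above adds `++model.validation_ds.max_duration=20`. As a rough illustration of its likely effect, the sketch below assumes the value is a per-utterance limit in seconds and that longer validation utterances are dropped. The function `filter_by_max_duration` and the example utterances are hypothetical, and the real data loader may apply the limit differently.

```python
def filter_by_max_duration(utterances, max_duration):
    """Keep utterances whose duration in seconds does not exceed max_duration."""
    return [u for u in utterances if u["duration"] <= max_duration]


# Hypothetical validation utterances with durations in seconds.
utterances = [
    {"id": "utt-a", "duration": 4.2},
    {"id": "utt-b", "duration": 19.8},
    {"id": "utt-c", "duration": 31.5},
]

kept = filter_by_max_duration(utterances, max_duration=20)
print([u["id"] for u in kept])  # ['utt-a', 'utt-b']
```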

## Duration Predictor Training
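Add the following overrides to your Voicebox training command. They remove `duration_predictor` from the frozen modules (`~` is Hydra's prefix for deleting a key) and freeze `voicebox` instead, so that only the duration predictor is trained: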
```shell
    ++model.validation_ds.max_duration=20 \
    ~model.freeze_updates.modules.duration_predictor=-1 \
    ++model.freeze_updates.modules.voicebox=-1 \
```
Note that `model.duration_predictor.audio_enc_dec` must be identical to `model.voicebox.audio_enc_dec`, so that the predicted phoneme lengths match what the Voicebox audio model requires.
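
One way to verify this requirement before launching a run is to compare the two sub-configs key by key. The sketch below does this on plain nested dictionaries. The helper `config_diff` and the example field names (`name`, `sampling_rate`) are hypothetical and do not reflect the actual `audio_enc_dec` schema.

```python
_MISSING = object()  # marks a key that is absent on one side


def config_diff(left, right, prefix=""):
    """Return the dotted paths whose values differ between two nested dicts."""
    diffs = []
    for key in sorted(set(left) | set(right)):
        path = f"{prefix}{key}"
        a = left.get(key, _MISSING)
        b = right.get(key, _MISSING)
        if isinstance(a, dict) and isinstance(b, dict):
            diffs.extend(config_diff(a, b, prefix=f"{path}."))
        elif a != b:
            diffs.append(path)
    return diffs


# Hypothetical model config with a mismatch between the two sub-configs.
model = {
    "voicebox": {"audio_enc_dec": {"name": "dac", "sampling_rate": 16000}},
    "duration_predictor": {"audio_enc_dec": {"name": "dac", "sampling_rate": 24000}},
}

mismatches = config_diff(
    model["voicebox"]["audio_enc_dec"],
    model["duration_predictor"]["audio_enc_dec"],
)
print(mismatches)  # ['sampling_rate']
```

An empty list means the two sub-configs agree. Any reported path should be fixed before training the duration predictor.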

```shell
WANDB_API_KEY=YOUR_API_KEY PYTHONPATH=. python3 examples/tts/voicebox.py \
    experiment=\[a100-GS-DAC/base,tricks/adam_warmup_cos_anneal\] \
    ++model.voicebox.use_unet_skip_connection=True \
    ++model.voicebox.pytorch_mha=True \
    ++model.validation_ds.max_duration=20 \
    ~model.freeze_updates.modules.duration_predictor=-1 \
    ++model.freeze_updates.modules.voicebox=-1 \
    ++exp_manager.checkpoint_callback_params.monitor=val_loss/dp_no_sil_spn \
    ++exp_manager.create_wandb_logger=True \
    ++exp_manager.create_tensorboard_logger=False \
    ++exp_manager.resume_if_exists=True \
    ++exp_manager.resume_ignore_no_checkpoint=True \
    ++exp_manager.max_time_per_run="00:03:50:00" \
    ++exp_manager.wandb_logger_kwargs.project=${PROJECT_NAME} \
    ++exp_manager.wandb_logger_kwargs.name=${EXP_NAME} \
    ++exp_manager.checkpoint_callback_params.every_n_train_steps=${EVERY_N_TRAIN_STEPS} \
    ++exp_manager.checkpoint_callback_params.every_n_epochs=${EVERY_N_EPOCHS} \
    ++exp_manager.checkpoint_callback_params.always_save_nemo=${ALWAYS_SAVE_NEMO} \
    ++exp_manager.checkpoint_callback_params.save_nemo_on_train_end=${SAVE_NEMO_ON_TRAIN_END} \
    ++exp_manager.checkpoint_callback_params.filename="'vb-{val_loss/vb:.4f}-{epoch}-{step}'" \
    trainer.devices=-1 \
    trainer.num_nodes=$SLURM_JOB_NUM_NODES
```