## Getting Started with Mayavoz

#### Contents:
- [How to do inference using pretrained model](#inference)
- [How to train your custom model](#basictrain)

### Install Mayavoz

In [None]:
! pip install -q mayavoz 

<div id="inference"></div>

###  Pretrained Model

To start using pretrained model,select any of the available recipes from [here](). 
For this exercice I am selecting [mayavoz/waveunet]()

- Mayavoz supports multiple input and output format. Input for inference can be in any of the below format
    - audio file path
    - numpy audio data
    - torch tensor audio data
    
It auto-detects the input format and does inference for you.
    
At the moment mayavoz only accepts single audio input

**Load model**

In [3]:

from mayavoz.models import Mayamodel
model = Mayamodel.from_pretrained("shahules786/mayavoz-dccrn-valentini-28spk")


**Inference using file path**

In [7]:
audio = model.enhance("my_voice.wav")
audio.shape

torch.Size([1, 1, 36414])

**Inference using numpy ndarray**


In [8]:
import torch
from librosa import load
my_voice,sr = load("my_voice.wav",sr=16000)
my_voice.shape

(36414,)

In [9]:
audio = model.enhance(my_voice,sampling_rate=sr)
audio.shape

(1, 1, 36414)

**Inference using torch tensor**


In [10]:
my_voice = torch.from_numpy(my_voice)
audio = model.enhance(my_voice,sampling_rate=sr)
audio.shape

torch.Size([1, 1, 36414])

- if you want to save the output, just pass `save_output=True`

In [11]:
audio = model.enhance("my_voice.wav",save_output=True)

In [12]:
from IPython.display import Audio
SAMPLING_RATE = 16000
Audio("cleaned_my_voice.wav",rate=SAMPLING_RATE)

<div id="basictrain"></div>


## Training your own custom Model

There are two ways of doing this

* [Using mayavoz framework ](#code)
* [Using mayavoz command line tool ](#cli)




<div id="code"></div>

**Using Mayavoz framwork** [Basic]
- Prepapare dataloader
- import preferred model
- Train

Files is dataclass that helps your to organise your train/test file paths

In [8]:
from mayavoz.utils import Files

name = "valentini"
root_dir = "/Users/shahules/Myprojects/enhancer/datasets/vctk"
files = Files(train_clean="clean_testset_wav",
         train_noisy="clean_testset_wav",
        test_clean="noisy_testset_wav",
         test_noisy="noisy_testset_wav")
duration = 4.0 
stride = None
sampling_rate = 16000

Now there are two types of `matching_function`
- `one_to_one` : In this one clean file will only have one corresponding noisy file. For example Valentini datasets
- `one_to_many` : In this one clean file will only have one corresponding noisy file. For example MS-SNSD dataset.

In [9]:
mapping_function = "one_to_one"


In [10]:
from mayavoz.data import MayaDataset
dataset = MayaDataset(
            name=name,
            root_dir=root_dir,
            files=files,
            duration=duration,
            stride=stride,
            sampling_rate=sampling_rate,
            min_valid_minutes = 5.0,
        )


In [11]:
from mayavoz.models import Demucs
model = Demucs(dataset=dataset, loss="mae")


In [12]:
import pytorch_lightning as pl

In [13]:
trainer = pl.Trainer(max_epochs=1)
trainer.fit(model)

GPU available: False, used: False
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs


Selected fp257 for valid



  | Name    | Type        | Params
----------------------------------------
0 | _loss   | LossWrapper | 0     
1 | encoder | ModuleList  | 4.7 M 
2 | decoder | ModuleList  | 4.7 M 
3 | de_lstm | DemucsLSTM  | 24.8 M
----------------------------------------
34.2 M    Trainable params
0         Non-trainable params
34.2 M    Total params
136.866   Total estimated model params size (MB)


Total train duration 27.4 minutes
Total validation duration 29.733333333333334 minutes
Total test duration 57.2 minutes
Epoch 0:  48%|▍| 13/27 [15:18<16:29, 70.66s/it, loss=0.0265, v_num=2, train_loss
Validation: 0it [00:00, ?it/s][A
Validation:   0%|                                        | 0/14 [00:00<?, ?it/s][A
Validation DataLoader 0:   0%|                           | 0/14 [00:00<?, ?it/s][A
Epoch 0:  52%|▌| 14/27 [19:55<18:29, 85.37s/it, loss=0.0265, v_num=2, train_loss[A
Epoch 0:  56%|▌| 15/27 [42:05<33:40, 168.34s/it, loss=0.0265, v_num=2, train_los[A
Epoch 0:  59%|▌| 16/27 [1:14:42<51:21, 280.16s/it, loss=0.0265, v_num=2, train_l[A
Epoch 0:  63%|▋| 17/27 [1:28:59<52:20, 314.08s/it, loss=0.0265, v_num=2, train_l[A
Epoch 0:  67%|▋| 18/27 [1:29:11<44:35, 297.31s/it, loss=0.0265, v_num=2, train_l[A
Epoch 0:  70%|▋| 19/27 [1:44:31<44:00, 330.08s/it, loss=0.0265, v_num=2, train_l[A
Epoch 0:  74%|▋| 20/27 [1:53:03<39:34, 339.17s/it, loss=0.0265, v_num=2, train_l[A
Epoch 0: 

`Trainer.fit` stopped: `max_epochs=1` reached.


Epoch 0: 100%|█| 27/27 [2:31:21<00:00, 336.37s/it, loss=0.0265, v_num=2, train_l


**mayavoz model and dataset are highly customazibale**, see [here]() for advanced usage

<div id="cli"></div>


## Mayavoz CLI

In [None]:
! pip install mayavoz[cli]

### TL;DR
Calling the following command would train mayavoz Demucs model on MS-SNSD dataset.

```bash
mayavoz-train \
    model=Demucs \
    Demucs.sampling_rate=16000 \
    dataset=VCTK dataset.root_dir = "your_root_directory" \
    trainer=fastrun_dev

```

This is more or less equaivalent to below code

In [None]:
from mayavoz.data import MayaDataset
from mayavoz.models import Demucs

dataset = MayaDataset(
                    name='vctk'
                    root_dir="your_root_directory",
                    )
model = Demucs(dataset=dataset, sampling_rate=16000)
trainer = Trainer()
trainer.fit(model)

For example, if you want to add/change `stride` of dataset

```bash
mayavoz-train \
    model=Demucs \
    Demucs.sampling_rate=16000 \
    dataset=VCTK dataset.root_dir = "your_root_directory" dataset.stride=1\

```

#### Hydra-based configuration
mayavoz-train relies on Hydra to configure the training process. Adding --cfg job option to the previous command will let you know about the actual configuration used for training:

```bash
mayavoz-train --cfg job \
    model=Demucs \
    Demucs.sampling_rate=16000 \
    dataset=MS-SNSD

```

```yaml
_target_: mayavoz.models.demucs.Demucs
num_channels: 1
resample: 4
sampling_rate : 16000

encoder_decoder:
  depth: 4
  initial_output_channels: 64
  
[...]
```

To change the sampling_rate, you can 

```bash
mayavoz-train \
    model=Demucs model.sampling_rate=16000 \
    dataset=MS-SNSD

```