### TTS Finetune Workflow using TAO

Transfer learning is the process of transferring learned features from one application to another. It is a commonly used training technique in which a model trained on one task is re-trained for a different task. Train Adapt Optimize (TAO) Toolkit is a simple, easy-to-use Python-based AI toolkit for taking purpose-built AI models and customizing them with your own data.

![image](https://developer.nvidia.com/sites/default/files/akamai/TAO/tlt-tao-toolkit-bring-your-own-model-diagram.png)


### The workflow in a nutshell

- Get sample datasets (or bring your own)
- Create source and target datasets
- Upload the speech datasets to the service
- Create a spectrogram generator model experiment
- Get a PTM from NGC
- Actions
    - Dataset convert
    - Pitch stats to compute fmin, fmax, pitch_avg, pitch_std
    - Finetune model
    - Infer to produce data for HiFiGAN
- Download inferred data and process it to be compatible with HiFiGAN
- Create a vocoder model experiment
- Upload dataset to service
- Get a PTM for vocoder
- Finetune vocoder
- Run inference on sample sentences using both fast_pitch and hifigan
   
**Other TAO Actions**

- Export  
- Train (from scratch)

**Note**

- The first dataset in train_datasets is assumed to be the source and the second the target. The API does not enforce this.

### Table of contents

1. [Create source and target datasets for fast_pitch](#head-1)
1. [List the created datasets](#head-2)
1. [Create model](#head-3)
1. [List models](#head-4)
1. [Assign train, eval datasets](#head-5)
1. [Assign PTM](#head-6)
1. [Actions](#head-7)
1. [Dataset convert](#head-8)
1. [Pitch Stats](#head-9)
1. [Finetune](#head-10)
1. [Infer](#head-11)
1. [Convert inference output to a mel_spectrogram dataset](#head-12)
1. [Vocoder](#head-13)
1. [Create dataset and upload mel data](#head-14)
1. [Create model, add train and eval datasets, select and add ptm](#head-15)
1. [Vocoder finetune](#head-16)
1. [Inference from raw sentences](#head-17)
1. [Create a raw dataset to perform vocoder inference on](#head-18)
1. [Vocoder inference on raw data](#head-19)
1. [Vocoder inference on raw data from spectro_gen](#head-20)
1. [Delete dataset sample](#head-21)
1. [Delete model sample](#head-22)

### Requirements
Please find the server requirements [here](https://docs.nvidia.com/tao/tao-toolkit/text/tao_toolkit_api/api_setup.html#)

In [None]:
import json
import os
import requests
import uuid
import time
from IPython.display import clear_output

### FIXME

1. Assign a workdir in FIXME 1
2. Assign the ip_address and port_number in FIXME 2 ([info](https://docs.nvidia.com/tao/tao-toolkit/text/tao_toolkit_api/api_rest_api.html))
3. Assign the ngc_api_key variable in FIXME 3

In [None]:
# Define workspaces and other variables
workdir = "workdir_tts" # FIXME1
host_url = "http://<ip_address>:<port_number>" # FIXME2 example: https://10.137.149.22:32334
# In host machine, node ip_address and port number can be obtained as follows,
# ip_address: hostname -i
# port_number: kubectl get service ingress-nginx-controller -o jsonpath='{.spec.ports[0].nodePort}'
ngc_api_key = "<ngc_api_key>" # FIXME3 example: zZYtczM5amdtdDcwNjk0cnA2bGU2bXQ3bnQ6NmQ4NjNhMDItMTdmZS00Y2QxLWI2ZjktNmE5M2YxZTc0OGyM

In [None]:
# Exchange NGC_API_KEY for JWT
response = requests.get(f"{host_url}/api/v1/login/{ngc_api_key}")
user_id = response.json()["user_id"]
print("User ID",user_id)
token = response.json()["token"]
print("JWT",token)

# Set base URL
base_url = f"{host_url}/api/v1/user/{user_id}"
print("API Calls will be forwarded to",base_url)

headers = {"Authorization": f"Bearer {token}"}

In [None]:
# Creating workdir
if not os.path.isdir(workdir):
    os.makedirs(workdir)

### Create source and target datasets for fast_pitch <a class="anchor" id="head-1"></a>

For the rest of this notebook, it is assumed that you have:

 - Pretrained FastPitch and HiFiGAN models that were trained on `LJSpeech` sampled at 22kHz
 
If your TTS model was not trained on LJSpeech at 22kHz, make sure you have the original training data, including the wav files and a .json manifest file. If your model uses a sampling rate other than 22kHz, also make sure you set the correct sampling rate and FFT parameters.
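
One way to confirm the sampling rate of your audio before uploading is to read it from the WAV headers. The sketch below uses only Python's standard `wave` module, which handles uncompressed PCM WAV files. The helper names and the constant are introduced here for illustration, and treating "22kHz" as exactly 22,050 Hz (the LJSpeech rate) is an assumption.

```python
import wave

# Assumption: "22kHz" in this notebook means 22,050 Hz, the LJSpeech sample rate.
EXPECTED_SAMPLE_RATE = 22050


def wav_sample_rate(path):
    """Return the sample rate (Hz) stored in a PCM WAV file header."""
    with wave.open(path, "rb") as wav_file:
        return wav_file.getframerate()


def find_mismatched_wavs(paths, expected=EXPECTED_SAMPLE_RATE):
    """Return (path, rate) pairs whose sample rate differs from `expected`."""
    mismatched = []
    for path in paths:
        rate = wav_sample_rate(path)
        if rate != expected:
            mismatched.append((path, rate))
    return mismatched
```

An empty list from `find_mismatched_wavs` means every file it checked matches the expected rate.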

For the rest of the notebook, we use a toy dataset consisting of 5 minutes of audio. This dataset is for demo purposes only. For a good-quality model, we recommend at least 30 minutes of audio. We recommend using the [NVIDIA Custom Voice Recorder](https://developer.nvidia.com/riva-voice-recorder-early-access) tool to generate a good dataset for finetuning.

First we download the original LJSpeech dataset, then the toy dataset. Using the API, we then create these datasets and upload them to the service in the required format. Note that for the LJSpeech source data, we need to run the convert action to create manifest files from `metadata.csv`.

The first step downloads audio-to-text file lists from NVIDIA for LJSpeech and generates the manifest files. If you use your own dataset, you have to generate three files yourself, `ljs_audio_text_train_filelist.txt`, `ljs_audio_text_val_filelist.txt`, and `ljs_audio_text_test_filelist.txt`, and place them inside the ljspeech directory created below. These files correspond to your train / val / test split. In each text file, the number of rows should equal the number of samples in that split, and each row should look like:

```
DUMMY/<file_name>.wav|<text_of_the_audio>
```

An example row is:

```
DUMMY/LJ045-0096.wav|Mrs. De Mohrenschildt thought that Oswald,
```
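
If you bring your own data, the three file lists could be generated along these lines. This is a minimal sketch: `format_filelist_row` and `write_filelist` are hypothetical helper names, and rejecting `|` inside a transcript is an assumption based on `|` being the field separator.

```python
def format_filelist_row(wav_name, text):
    """Build one `DUMMY/<file_name>.wav|<text_of_the_audio>` row."""
    # Assumption: the separator and line breaks must not appear in a transcript.
    if "|" in text or "\n" in text:
        raise ValueError("transcripts must not contain '|' or newlines")
    return f"DUMMY/{wav_name}|{text}"


def write_filelist(samples, out_path):
    """Write one row per (wav_name, text) pair and return the number of rows."""
    count = 0
    with open(out_path, "w", encoding="utf-8") as f:
        for wav_name, text in samples:
            f.write(format_filelist_row(wav_name, text) + "\n")
            count += 1
    return count
```

Calling `write_filelist` once per split with that split's `(wav_name, text)` pairs produces the train, val, and test files.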

In [None]:
# Download source ljspeech dataset
! wget -O ljspeech.tar.bz2 https://data.keithito.com/data/speech/LJSpeech-1.1.tar.bz2

In [None]:
# Extracting and moving the data to the correct directories.
! tar -xvf ljspeech.tar.bz2
! rm -rf ljspeech
! mv LJSpeech-1.1 ljspeech

In [None]:
# Create "ljspeech" format dataset
! rm ljspeech.tar.bz2
! tar -czvf ljspeech.tar.gz ljspeech

In [None]:
# Download target dataset
!wget https://nemo-public.s3.us-east-2.amazonaws.com/6097_5_mins.tar.gz  # Contains 10MB of data
!tar -xzf 6097_5_mins.tar.gz
!sed -i "s@\"audio_filepath\": \"audio/@\"audio_filepath\": \"audio/@g" 6097_5_mins/manifest.json 

In [None]:
# Download auxiliary files for training.
!wget https://github.com/NVIDIA/NeMo/raw/v1.9.0/scripts/tts_dataset_files/cmudict-0.7b_nv22.01
!wget https://github.com/NVIDIA/NeMo/raw/v1.9.0/scripts/tts_dataset_files/heteronyms-030921
!wget https://github.com/NVIDIA/NeMo/raw/v1.9.0//nemo_text_processing/text_normalization/en/data/whitelist/lj_speech.tsv
!tar -czvf ljspeech_auxillary.tar.gz cmudict-0.7b_nv22.01 heteronyms-030921 lj_speech.tsv

In [None]:
source_data_path = "ljspeech.tar.gz" # FIX if using own source dataset
target_data_path = "6097_5_mins.tar.gz" # FIX if using own target dataset
auxillary_data_path = "ljspeech_auxillary.tar.gz" # FIX if using own auxiliary dataset

In [None]:
# Create
ds_type = "speech"
ds_format = "ljspeech"
data = json.dumps({"type":ds_type,"format":ds_format})

endpoint = f"{base_url}/dataset"

response = requests.post(endpoint,data=data,headers=headers)

print(response)
print(response.json())
source_dataset_id = response.json()["id"]

In [None]:
# Upload
files = [("file",open(source_data_path,"rb"))]

endpoint = f"{base_url}/dataset/{source_dataset_id}/upload"

response = requests.post(endpoint, files=files, headers=headers)

print(response)
print(response.json())

In [None]:
# Create
ds_type = "speech"
ds_format = "custom"
data = json.dumps({"type":ds_type,"format":ds_format})

endpoint = f"{base_url}/dataset"

response = requests.post(endpoint,data=data,headers=headers)

print(response)
print(response.json())
target_dataset_id = response.json()["id"]

In [None]:
# Upload
files = [("file",open(target_data_path,"rb"))]

endpoint = f"{base_url}/dataset/{target_dataset_id}/upload"

response = requests.post(endpoint, files=files, headers=headers)

print(response)
print(response.json())

In [None]:
# Create
ds_type = "speech"
ds_format = "auxillary"
data = json.dumps({"type":ds_type,"format":ds_format})

endpoint = f"{base_url}/dataset"

response = requests.post(endpoint,data=data,headers=headers)

print(response)
print(response.json())
auxillary_dataset_id = response.json()["id"]

In [None]:
# Upload
files = [("file",open(auxillary_data_path,"rb"))]

endpoint = f"{base_url}/dataset/{auxillary_dataset_id}/upload"

response = requests.post(endpoint, files=files, headers=headers)

print(response)
print(response.json())

### List the created datasets <a class="anchor" id="head-2"></a>

In [None]:
endpoint = f"{base_url}/dataset"

response = requests.get(endpoint, headers=headers)

print(response)
# print(response.json()) ## Uncomment for verbose list output
for rsp in response.json():
    print(rsp["id"],rsp["type"],rsp["format"])

### Create model  <a class="anchor" id="head-3"></a>

In [None]:
network_arch = "spectro_gen"
encode_key = "tlt_encode"
data = json.dumps({"network_arch":network_arch,"encryption_key":encode_key})

endpoint = f"{base_url}/model"

response = requests.post(endpoint,data=data,headers=headers)

print(response)
print(response.json())
fast_pitch_id = response.json()["id"]

### List models <a class="anchor" id="head-4"></a>

In [None]:
endpoint = f"{base_url}/model"

response = requests.get(endpoint, headers=headers)

print(response)
# print(response.json()) ## Uncomment for verbose list output
for rsp in response.json():
    print(rsp["id"],rsp["network_arch"])

### Assign train, eval datasets <a class="anchor" id="head-5"></a>

- Note: make sure the order for train_datasets is [source ID, target ID]
- eval_dataset is kept same as target for demo purposes
- inference_dataset is kept as target for chaining with hifigan finetune
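
The ordering convention above can be captured in a small helper so that the source and target IDs are not swapped or duplicated by accident. This is only a sketch: `build_dataset_assignment` is a hypothetical name, and the keys mirror the request body used in the next cell.

```python
def build_dataset_assignment(source_id, target_id):
    """Return the dataset fields for a spectro_gen model, source first."""
    if not source_id or not target_id:
        raise ValueError("both a source and a target dataset ID are required")
    if source_id == target_id:
        raise ValueError("source and target must be different datasets")
    return {
        "train_datasets": [source_id, target_id],  # order: [source ID, target ID]
        "eval_dataset": target_id,                 # same as target, for demo purposes
        "inference_dataset": target_id,            # chained into HiFiGAN finetuning
    }
```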

In [None]:
dataset_information = {"train_datasets":[source_dataset_id,target_dataset_id],
                       "eval_dataset":target_dataset_id,
                       "inference_dataset":target_dataset_id}
data = json.dumps(dataset_information)

endpoint = f"{base_url}/model/{fast_pitch_id}"

response = requests.patch(endpoint, data=data, headers=headers)

print(response)
print(response.json())

### Assign PTM <a class="anchor" id="head-6"></a>

- Search for fastpitch on NGC
- Assign it to the model

In [None]:
# Get pretrained model for fastpitch
model_list = f"{base_url}/model"
response = requests.get(model_list, headers=headers)

response_json = response.json()

# Search for ptm with given ngc path
ptm_id = None
for rsp in response_json:
    if "fastpitch:1.8.1" in rsp["ngc_path"]:
        ptm_id = rsp["id"]
        print("Metadata for model with requested NGC Path")
        print(rsp)
        break
fast_pitch_ptm = ptm_id

In [None]:
ptm_information = {"ptm":fast_pitch_ptm}
data = json.dumps(ptm_information)

endpoint = f"{base_url}/model/{fast_pitch_id}"

response = requests.patch(endpoint, data=data, headers=headers)

print(response)
print(response.json())

### Actions <a class="anchor" id="head-7"></a>

For all actions:
1. Get default spec schema and derive the default values
2. Modify defaults if needed
3. Post spec dictionary to the service
4. Run model action
5. Monitor job using retrieve
6. Download results using the job download endpoint (if needed). Download only after the job status reaches "Done"; otherwise the request returns HTTP 404
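
Steps 5 and 6 amount to polling until the job reaches a terminal status. A minimal sketch of that loop as a reusable helper follows. `wait_for_job`, `fetch_status`, `max_polls`, and `sleep` are names introduced here for illustration and are not part of the TAO API; only the "Done" and "Error" statuses and the 15-second interval come from this notebook.

```python
import time

TERMINAL_STATES = ("Done", "Error")


def wait_for_job(fetch_status, poll_seconds=15, max_polls=None, sleep=time.sleep):
    """Call `fetch_status()` until it returns a terminal status, then return it.

    `fetch_status` is a zero-argument callable supplied by the caller that
    returns the job's current status string.
    """
    polls = 0
    while True:
        status = fetch_status()
        polls += 1
        if status in TERMINAL_STATES:
            return status
        # Optional guard so a stuck job does not block the notebook forever.
        if max_polls is not None and polls >= max_polls:
            raise TimeoutError(f"job still {status!r} after {polls} polls")
        sleep(poll_seconds)
```

With this notebook's variables, `fetch_status` could be `lambda: requests.get(endpoint, headers=headers).json().get("status")`.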

In [None]:
job_map = {}

### Dataset convert <a class="anchor" id="head-8"></a>

- First, we generate a manifest for the "source" LJSpeech dataset by running the convert action on the ljspeech-format dataset
- Then we merge the LJSpeech data with the target dataset by running the model's dataset_convert action
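
To give a feel for what building a manifest from `metadata.csv` involves, here is a rough sketch. LJSpeech rows are pipe-separated (`<id>|<transcript>|<normalized transcript>`) and its audio lives in a `wavs` directory. The `audio_filepath` key matches the target manifest used earlier, but the `text` key, the choice of the normalized transcript, and the function names are assumptions; the output of the service's convert action may differ (for example, it may add durations).

```python
import json


def metadata_row_to_manifest_entry(row, wav_dir="wavs"):
    """Convert one LJSpeech `metadata.csv` row into a manifest dict."""
    fields = row.rstrip("\n").split("|")
    if len(fields) < 2:
        raise ValueError(f"expected at least '<id>|<transcript>', got: {row!r}")
    file_id = fields[0]
    # Assumption: use the last field, i.e. the normalized transcript if present.
    text = fields[-1]
    return {"audio_filepath": f"{wav_dir}/{file_id}.wav", "text": text}


def metadata_to_manifest_lines(rows, wav_dir="wavs"):
    """Return one JSON string per non-empty metadata row."""
    return [
        json.dumps(metadata_row_to_manifest_entry(row, wav_dir))
        for row in rows
        if row.strip()
    ]
```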

In [None]:
# Get default spec schema
endpoint = f"{base_url}/dataset/{source_dataset_id}/specs/convert/schema"

response = requests.get(endpoint, headers=headers)

print(response)
#print(response.json()) ## Uncomment for verbose schema
specs = response.json()["default"]
print(specs)

In [None]:
## No changes to spec

In [None]:
# Post spec
data = json.dumps(specs)

endpoint = f"{base_url}/dataset/{source_dataset_id}/specs/convert"

response = requests.post(endpoint,data=data,headers=headers)

print(response)
print(response.json())

In [None]:
# Run action
parent = None
actions = ["convert"]
data = json.dumps({"job":parent,"actions":actions})

endpoint = f"{base_url}/dataset/{source_dataset_id}/job"

response = requests.post(endpoint, data=data, headers=headers)

print(response)
print(response.json())

lj_job_id = response.json()[0]

In [None]:
# Poll the job status every 15 seconds until it reaches "Done" or "Error" (or the request fails); please be patient, as this may take a while
job_id = lj_job_id
endpoint = f"{base_url}/dataset/{source_dataset_id}/job/{job_id}"

while True:
    clear_output(wait=True)
    response = requests.get(endpoint, headers=headers)
    print(response)
    print(response.json())
    if response.json().get("status") in ["Done","Error"] or response.status_code not in (200,201):
        break
    time.sleep(15)

### Merging manifest files

In [None]:
# Get default spec schema
endpoint = f"{base_url}/model/{fast_pitch_id}/specs/dataset_convert/schema"

response = requests.get(endpoint, headers=headers)

print(response)
#print(response.json()) ## Uncomment for verbose schema
specs = response.json()["default"]
print(specs)

In [None]:
# Apply changes
# NONE FOR DATASET_CONVERT

In [None]:
# Post spec
data = json.dumps(specs)

endpoint = f"{base_url}/model/{fast_pitch_id}/specs/dataset_convert"

response = requests.post(endpoint,data=data,headers=headers)

print(response)
print(response.json())

In [None]:
# Run action
parent = None
actions = ["dataset_convert"]
data = json.dumps({"job":parent,"actions":actions})

endpoint = f"{base_url}/model/{fast_pitch_id}/job"

response = requests.post(endpoint, data=data, headers=headers)

print(response)
print(response.json())

job_map["dataset_convert"] = response.json()[0]
print(job_map)

In [None]:
# Poll the job status every 15 seconds until it reaches "Done" or "Error" (or the request fails)
job_id = job_map['dataset_convert']
endpoint = f"{base_url}/model/{fast_pitch_id}/job/{job_id}"

while True:
    clear_output(wait=True)
    response = requests.get(endpoint, headers=headers)
    print(response)
    print(response.json())
    if response.json().get("status") in ["Done","Error"] or response.status_code not in (200,201):
        break
    time.sleep(15)

### Pitch Stats <a class="anchor" id="head-9"></a>
- Run this on the target dataset to check visually if the pitch frequencies are good
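
To make the meaning of `pitch_avg` and `pitch_std` concrete, here is a small sketch that computes them from a list of per-frame fundamental-frequency values. It is an illustration, not the implementation behind the pitch_stats action: the function name `pitch_stats`, the convention that unvoiced frames are 0, and the use of the population standard deviation are all assumptions. The 65 and 2094 Hz bounds match the spec values set below.

```python
import math


def pitch_stats(f0_values, fmin=65, fmax=2094):
    """Return (mean, std) of the frames whose pitch lies within [fmin, fmax]."""
    # Frames outside the range (including unvoiced frames stored as 0) are skipped.
    voiced = [f for f in f0_values if fmin <= f <= fmax]
    if not voiced:
        raise ValueError("no frames fall inside the [fmin, fmax] range")
    mean = sum(voiced) / len(voiced)
    variance = sum((f - mean) ** 2 for f in voiced) / len(voiced)
    return mean, math.sqrt(variance)
```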

In [None]:
# pitch_stats
# Get default spec schema
endpoint = f"{base_url}/dataset/{target_dataset_id}/specs/pitch_stats/schema"

response = requests.get(endpoint, headers=headers)

print(response)
#print(response.json()) ## Uncomment for verbose schema
specs = response.json()["default"]
print(specs)

In [None]:
# Apply changes
specs["pitch_fmin"] = 65
specs["pitch_fmax"] = 2094

In [None]:
# Post spec
data = json.dumps(specs)

endpoint = f"{base_url}/dataset/{target_dataset_id}/specs/pitch_stats"

response = requests.post(endpoint,data=data,headers=headers)

print(response)
print(response.json())

In [None]:
# Run action
parent = None
actions = ["pitch_stats"]
data = json.dumps({"job":parent,"actions":actions})

endpoint = f"{base_url}/dataset/{target_dataset_id}/job"

response = requests.post(endpoint, data=data, headers=headers)

print(response)
print(response.json())

job_map["pitch_stats"] = response.json()[0]
print(job_map)

In [None]:
# Poll the job status every 15 seconds until it reaches "Done" or "Error" (or the request fails)
job_id = job_map['pitch_stats']
endpoint = f"{base_url}/dataset/{target_dataset_id}/job/{job_id}"

while True:
    clear_output(wait=True)
    response = requests.get(endpoint, headers=headers)
    print(response)
    print(response.json())
    if response.json().get("status") in ["Done","Error"] or response.status_code not in (200,201):
        break
    time.sleep(15)

In [None]:
# Download pitch stats output
job_id = job_map["pitch_stats"]
endpoint = f'{base_url}/dataset/{target_dataset_id}/job/{job_id}/download'

response = requests.get(endpoint, headers=headers)

print(response)
# Save
temptar = f'{job_id}.tar.gz'
with open(temptar, 'wb') as f:
    f.write(response.content)
print("Untarring")
# Untar to destination
tar_command = f'tar -xvf {temptar} -C {workdir}/'
os.system(tar_command)
os.remove(temptar)
print(f"Results at {workdir}/{job_id}")
saved_dir = f"{workdir}/{job_id}"


In [None]:
# Visualize pitch stats output
!pip3 install matplotlib==3.3.3
import matplotlib.pyplot as plt
%matplotlib inline
import os
from math import ceil
from IPython.display import Image

valid_image_ext = ['.jpg', '.png', '.jpeg', '.ppm']

def visualize_images(image_dir, num_cols=2, num_images=10):
    """Visualize images in the notebook.
    
    Args:
        image_dir (str): Path to the directory containing images.
        num_cols (int): Number of columns.
        num_images (int): Number of images.

    """
    output_path = os.path.join(image_dir)
    num_rows = int(ceil(float(num_images) / float(num_cols)))
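    # squeeze=False keeps axarr two-dimensional even when there is a single row or column
    f, axarr = plt.subplots(num_rows, num_cols, figsize=[240,90], squeeze=False)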
    f.tight_layout()
    a = [os.path.join(output_path, image) for image in os.listdir(output_path) 
         if os.path.splitext(image)[1].lower() in valid_image_ext]
    for idx, img_path in enumerate(a[:num_images]):
        col_id = idx % num_cols
        row_id = idx // num_cols
        img = plt.imread(img_path)
        axarr[row_id, col_id].imshow(img)
visualize_images(saved_dir, num_cols=5, num_images=10)

### Finetune <a class="anchor" id="head-10"></a>

- Set ```specs["trainer"]["max_epochs"]``` to the number of epochs you want to train for. The default is 100.
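
Note that assigning a whole new dict to `specs["trainer"]`, as the "Apply changes" cell below does, replaces any other keys under `trainer`. Whether the default spec carries other trainer keys is not shown here, so the following is just a sketch of an update that changes one key and leaves the rest alone. `with_trainer_override` is a hypothetical helper name.

```python
import copy


def with_trainer_override(specs, key, value):
    """Return a copy of `specs` with specs["trainer"][key] set to `value`.

    Other keys under "trainer" are preserved, and the input dict is not modified.
    """
    updated = copy.deepcopy(specs)
    updated.setdefault("trainer", {})[key] = value
    return updated
```

For example, `specs = with_trainer_override(specs, "max_epochs", 11)` would change only the epoch count.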

In [None]:
# Get default spec schema
endpoint = f"{base_url}/model/{fast_pitch_id}/specs/finetune/schema"

response = requests.get(endpoint, headers=headers)

print(response)
#print(response.json()) ## Uncomment for verbose schema
specs = response.json()["default"]
print(specs)

In [None]:
# Apply changes from pitch_stats job
specs["n_speakers"] = 2
specs["pitch_fmin"] = 65
specs["pitch_fmax"] = 2094
specs["pitch_avg"] = 117.27540199742586
specs["pitch_std"] = 22.1851002822779
specs["trainer"] = {"max_epochs":11}

In [None]:
# Post spec
data = json.dumps(specs)

endpoint = f"{base_url}/model/{fast_pitch_id}/specs/finetune"

response = requests.post(endpoint,data=data,headers=headers)

print(response)
print(response.json())

In [None]:
# Run action
parent = None
actions = ["finetune"]
data = json.dumps({"job":parent,"actions":actions})

endpoint = f"{base_url}/model/{fast_pitch_id}/job"

response = requests.post(endpoint, data=data, headers=headers)

print(response)
print(response.json())

job_map["spectro_gen_finetune"] = response.json()[0]
print(job_map)

In [None]:
# Poll the job status every 15 seconds until it reaches "Done" or "Error" (or the request fails)
job_id = job_map['spectro_gen_finetune']
endpoint = f"{base_url}/model/{fast_pitch_id}/job/{job_id}"

while True:
    clear_output(wait=True)
    response = requests.get(endpoint, headers=headers)
    print(response)
    print(response.json())
    if response.json().get("status") in ["Done","Error"] or response.status_code not in (200,201):
        break
    time.sleep(15)

### Infer <a class="anchor" id="head-11"></a>

- Infer runs inference on previously set inference_dataset

In [None]:
# Get default spec schema
endpoint = f"{base_url}/model/{fast_pitch_id}/specs/infer/schema"

response = requests.get(endpoint, headers=headers)

print(response)
#print(response.json()) ## Uncomment for verbose schema
specs = response.json()["default"]
print(response.json()["required"])
print(specs)

In [None]:
#Enter your inference text
sentences = ["by the end of no such thing the audience , like beatrice , has a watchful affection for the monster .",
             "director rob marshall went out gunning to make a great one .",
             "uneasy mishmash of styles and genres ."   
            ]

In [None]:
# Apply changes
specs["input_batch"] = sentences
specs["mode"] = "infer_hifigan_ft"
specs["speaker"] = 1

In [None]:
# Post spec
data = json.dumps(specs)

endpoint = f"{base_url}/model/{fast_pitch_id}/specs/infer"

response = requests.post(endpoint,data=data,headers=headers)

print(response)
print(response.json())

In [None]:
# Run action
parent = job_map["spectro_gen_finetune"]
actions = ["infer"]
data = json.dumps({"job":parent,"actions":actions})

endpoint = f"{base_url}/model/{fast_pitch_id}/job"

response = requests.post(endpoint, data=data, headers=headers)

print(response)
print(response.json())

job_map["spectro_gen_infer"] = response.json()[0]
print(job_map)

In [None]:
# Poll the job status every 15 seconds until it reaches "Done" or "Error" (or the request fails)
job_id = job_map['spectro_gen_infer']
endpoint = f"{base_url}/model/{fast_pitch_id}/job/{job_id}"

while True:
    clear_output(wait=True)
    response = requests.get(endpoint, headers=headers)
    print(response)
    print(response.json())
    if response.json().get("status") in ["Done","Error"] or response.status_code not in (200,201):
        break
    time.sleep(15)

In [None]:
# Download infer output for chaining with vocoder
# Download job contents once the above job shows "Done" status
job_id = job_map["spectro_gen_infer"]
endpoint = f'{base_url}/model/{fast_pitch_id}/job/{job_id}/download'

# Save
temptar = f'{job_id}.tar.gz'
with requests.get(endpoint, headers=headers, stream=True) as r:
    r.raise_for_status()
    with open(temptar, 'wb') as f:
        for chunk in r.iter_content(chunk_size=8192):
            f.write(chunk)

print("Untarring")
# Untar to destination
tar_command = f'tar -xvf {temptar} -C {workdir}/'
os.system(tar_command)
os.remove(temptar)
print(f"Results at {workdir}/{job_id}")
infer_out_path = f"{workdir}/{job_id}"

os.remove(infer_out_path+"/status.json")
os.remove(infer_out_path+"/infer.log")
os.remove(infer_out_path+"/logs_from_toolkit.txt")

### Convert inference output to a mel_spectrogram dataset <a class="anchor" id="head-12"></a>

In [None]:
# Copy target data to workdir
!cp $target_data_path $workdir

# Untar the target data inside workdir
target_tar_name = target_data_path.split("/")[-1]
tar_command = f'tar -xvf {workdir}/{target_tar_name} -C {workdir}/'
os.system(tar_command)
os.remove(f"{workdir}/{target_tar_name}")

# get the name of this untarred target dataset folder into $original_target_folder
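# Remove the ".tar.gz" suffix (str.rstrip strips a set of characters, not a suffix)
target_name = target_tar_name[:-len(".tar.gz")] if target_tar_name.endswith(".tar.gz") else target_tar_name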
original_target_folder = f"{workdir}/{target_name}".replace("//","/")

In [None]:
# move it to the target folder
!mv $infer_out_path $original_target_folder/mel_spectrogram

In [None]:
manifest = os.path.join(original_target_folder,"manifest.json")
# add a mel_filepath entry to each line of the manifest
with open(manifest,"r") as f:
    lines = f.readlines()
print("Number of lines in target data: ",len(lines))

os.remove(manifest)

with open(manifest,"w") as f:
    cnt = 0
    for line in lines:
        line_dict = json.loads(line.strip("\n"))
        line_dict["mel_filepath"] = f"mel_spectrogram/{cnt}.npy"
        f.write(json.dumps(line_dict)+"\n")
        cnt += 1

In [None]:
# Tar the updated mel-spectrogram appended dataset
tarfilename = original_target_folder.split("/")[-1]+".tar.gz"
print(tarfilename)

output_save = f'{workdir}/{tarfilename}'.replace("//","/")
tar_command = f'tar -czvf {output_save} {original_target_folder}'
os.system(tar_command)

In [None]:
mel_dataset_path = f"{workdir}/{tarfilename}".replace("//","/")
mel_dataset_path

### Vocoder <a class="anchor" id="head-13"></a>

### Create dataset and upload mel data <a class="anchor" id="head-14"></a>

In [None]:
# Create
ds_type = "mel_spectrogram"
ds_format = "hifigan"
data = json.dumps({"type":ds_type,"format":ds_format})

endpoint = f"{base_url}/dataset"

response = requests.post(endpoint,data=data,headers=headers)

print(response)
print(response.json())
mel_dataset_id = response.json()["id"]

In [None]:
# Upload
files = [("file",open(mel_dataset_path,"rb"))]

endpoint = f"{base_url}/dataset/{mel_dataset_id}/upload"

response = requests.post(endpoint, files=files, headers=headers)

print(response)
print(response.json())

In [None]:
endpoint = f"{base_url}/dataset"

response = requests.get(endpoint, headers=headers)

print(response)
# print(response.json()) ## Uncomment for verbose list output
for rsp in response.json():
    print(rsp["id"],rsp["type"],rsp["format"])

### Create model, add train and eval datasets, select and add ptm <a class="anchor" id="head-15"></a>

In [None]:
network_arch = "vocoder"
encode_key = "tlt_encode"
data = json.dumps({"network_arch":network_arch,"encryption_key":encode_key})

endpoint = f"{base_url}/model"

response = requests.post(endpoint,data=data,headers=headers)

print(response)
print(response.json())
vocoder_model_id = response.json()["id"]

In [None]:
dataset_information = {"train_datasets":[mel_dataset_id],
                       "eval_dataset":mel_dataset_id}
data = json.dumps(dataset_information)

endpoint = f"{base_url}/model/{vocoder_model_id}"

response = requests.patch(endpoint, data=data, headers=headers)

print(response)
print(response.json())

In [None]:
# Get pretrained model for hifigan
model_list = f"{base_url}/model"
response = requests.get(model_list, headers=headers)

response_json = response.json()

# Search for ptm with given ngc path
ptm_id = None
for rsp in response_json:
    if "hifigan:1.0.0rc1" in rsp["ngc_path"]:
        ptm_id = rsp["id"]
        print("Metadata for model with requested NGC Path")
        print(rsp)
        break
vocoder_ptm = ptm_id

In [None]:
ptm_information = {"ptm":vocoder_ptm}
data = json.dumps(ptm_information)

endpoint = f"{base_url}/model/{vocoder_model_id}"

response = requests.patch(endpoint, data=data, headers=headers)

print(response)
print(response.json())

### Vocoder finetune <a class="anchor" id="head-16"></a>

- Set ```specs["trainer"]["max_steps"]``` to the number of steps you want to train for. The default is 1000.

In [None]:
# Get default spec schema
endpoint = f"{base_url}/model/{vocoder_model_id}/specs/finetune/schema"

response = requests.get(endpoint, headers=headers)

print(response)
#print(response.json()) ## Uncomment for verbose schema
specs = response.json()["default"]
print(specs)

In [None]:
# Apply changes
specs["trainer"] = {"max_steps":100}
specs["training_ds"]["dataloader_params"]["batch_size"] = 8

In [None]:
# Post spec
data = json.dumps(specs)

endpoint = f"{base_url}/model/{vocoder_model_id}/specs/finetune"

response = requests.post(endpoint,data=data,headers=headers)

print(response)
print(response.json())

In [None]:
# Run action
parent = None
actions = ["finetune"]
data = json.dumps({"job":parent,"actions":actions})

endpoint = f"{base_url}/model/{vocoder_model_id}/job"

response = requests.post(endpoint, data=data, headers=headers)

print(response)
print(response.json())

job_map["vocoder_finetune"] = response.json()[0]
print(job_map)

In [None]:
# Poll the job status every 15 seconds until it reaches "Done" or "Error" (or the request fails)
job_id = job_map['vocoder_finetune']
endpoint = f"{base_url}/model/{vocoder_model_id}/job/{job_id}"

while True:
    clear_output(wait=True)
    response = requests.get(endpoint, headers=headers)
    print(response)
    print(response.json())
    if response.json().get("status") in ["Done","Error"] or response.status_code not in (200,201):
        break
    time.sleep(15)

### Inference from raw sentences <a class="anchor" id="head-17"></a>

- Take some sentences and run spectro_gen inference
- Then use the output of this to generate vocoder inference

In [None]:
sentences = ["by the end of no such thing the audience , like beatrice , has a watchful affection for the monster .",
             "director rob marshall went out gunning to make a great one .",
             "uneasy mishmash of styles and genres ."   
            ]

In [None]:
# Get default spec schema
endpoint = f"{base_url}/model/{fast_pitch_id}/specs/infer/schema"

response = requests.get(endpoint, headers=headers)

print(response)
#print(response.json()) ## Uncomment for verbose schema
specs = response.json()["default"]
print(specs)

In [None]:
# Apply changes
specs["mode"] = "infer"
specs["input_batch"] = sentences

In [None]:
# Post spec
data = json.dumps(specs)

endpoint = f"{base_url}/model/{fast_pitch_id}/specs/infer"

response = requests.post(endpoint,data=data,headers=headers)

print(response)
print(response.json())

In [None]:
# Run action
parent = job_map["spectro_gen_finetune"]
actions = ["infer"]
data = json.dumps({"job":parent,"actions":actions})

endpoint = f"{base_url}/model/{fast_pitch_id}/job"

response = requests.post(endpoint, data=data, headers=headers)

print(response)
print(response.json())

job_map["spectro_gen_infer_raw"] = response.json()[0]
print(job_map)

In [None]:
# Poll the job status every 15 seconds until it reaches "Done" or "Error" (or the request fails)
job_id = job_map['spectro_gen_infer_raw']
endpoint = f"{base_url}/model/{fast_pitch_id}/job/{job_id}"

while True:
    clear_output(wait=True)
    response = requests.get(endpoint, headers=headers)
    print(response)
    print(response.json())
    if response.json().get("status") in ["Done","Error"] or response.status_code not in (200,201):
        break
    time.sleep(15)

In [None]:
# Download infer output for chaining with vocoder
# Download job contents once the above job shows "Done" status
job_id = job_map["spectro_gen_infer_raw"]
endpoint = f'{base_url}/model/{fast_pitch_id}/job/{job_id}/download'

# Save
temptar = f'{job_id}.tar.gz'
with requests.get(endpoint, headers=headers, stream=True) as r:
    r.raise_for_status()
    with open(temptar, 'wb') as f:
        for chunk in r.iter_content(chunk_size=8192):
            f.write(chunk)

print("Untarring")
# Untar to destination
tar_command = f'tar -xvf {temptar} -C {workdir}/'
os.system(tar_command)
os.remove(temptar)
print(f"Results at {workdir}/{job_id}")
raw_infer_out_path = f"{workdir}/{job_id}"

os.remove(raw_infer_out_path+"/status.json")
os.remove(raw_infer_out_path+"/infer.log")
os.remove(raw_infer_out_path+"/logs_from_toolkit.txt")

In [None]:
# Tar it so it can be uploaded
foldername = job_map["spectro_gen_infer_raw"]

tar_command = f"cd {workdir}; \
                mkdir raw; \
                mv {foldername} raw/mel_spectrogram ; \
                tar -czvf raw_melspectrograms.tar.gz raw; \
                cd -"
print(os.system(tar_command))
raw_tarfile = f'{workdir}/raw_melspectrograms.tar.gz'.replace("//","/")

### Create a raw dataset to perform vocoder inference on <a class="anchor" id="head-18"></a>

In [None]:
# Create
ds_type = "mel_spectrogram"
ds_format = "raw"
data = json.dumps({"type":ds_type,"format":ds_format})

endpoint = f"{base_url}/dataset"

response = requests.post(endpoint,data=data,headers=headers)

print(response)
print(response.json())
raw_dataset_id = response.json()["id"]

In [None]:
# Upload
files = [("file",open(raw_tarfile,"rb"))]

endpoint = f"{base_url}/dataset/{raw_dataset_id}/upload"

response = requests.post(endpoint, files=files, headers=headers)

print(response)
print(response.json())

### Vocoder inference on raw data <a class="anchor" id="head-19"></a>

In [None]:
# Add this inference dataset to vocoder model
dataset_information = {"inference_dataset":raw_dataset_id}
data = json.dumps(dataset_information)

endpoint = f"{base_url}/model/{vocoder_model_id}"

response = requests.patch(endpoint, data=data, headers=headers)

print(response)
print(response.json())

### Vocoder inference on raw data from spectro_gen <a class="anchor" id="head-20"></a>

In [None]:
# Get default spec schema
endpoint = f"{base_url}/model/{vocoder_model_id}/specs/infer/schema"

response = requests.get(endpoint, headers=headers)

print(response)
#print(response.json()) ## Uncomment for verbose schema
specs = response.json()["default"]
print(specs)

In [None]:
# Apply changes

In [None]:
# Post spec
data = json.dumps(specs)

endpoint = f"{base_url}/model/{vocoder_model_id}/specs/infer"

response = requests.post(endpoint,data=data,headers=headers)

print(response)
print(response.json())

In [None]:
# Run action
parent = job_map["vocoder_finetune"]
actions = ["infer"]
data = json.dumps({"job":parent,"actions":actions})

endpoint = f"{base_url}/model/{vocoder_model_id}/job"

response = requests.post(endpoint, data=data, headers=headers)

print(response)
print(response.json())

job_map["vocoder_infer_raw"] = response.json()[0]
print(job_map)

In [None]:
# Poll the job status every 15 seconds until it reaches "Done" or "Error" (or the request fails)
job_id = job_map['vocoder_infer_raw']
endpoint = f"{base_url}/model/{vocoder_model_id}/job/{job_id}"

while True:
    clear_output(wait=True)
    response = requests.get(endpoint, headers=headers)
    print(response)
    print(response.json())
    if response.json().get("status") in ["Done","Error"] or response.status_code not in (200,201):
        break
    time.sleep(15)

In [None]:
# Download infer output of vocoder
# Download job contents once the above job shows "Done" status
job_id = job_map["vocoder_infer_raw"]
endpoint = f'{base_url}/model/{vocoder_model_id}/job/{job_id}/download'

# Save
temptar = f'{job_id}.tar.gz'
with requests.get(endpoint, headers=headers, stream=True) as r:
    r.raise_for_status()
    with open(temptar, 'wb') as f:
        for chunk in r.iter_content(chunk_size=8192):
            f.write(chunk)

print("Untarring")
# Untar to destination
tar_command = f'tar -xvf {temptar} -C {workdir}/'
os.system(tar_command)
os.remove(temptar)
print(f"Results at {workdir}/{job_id}")
raw_infer_wav_path = f"{workdir}/{job_id}"

os.remove(raw_infer_wav_path+"/status.json")
os.remove(raw_infer_wav_path+"/infer.log")
os.remove(raw_infer_wav_path+"/logs_from_toolkit.txt")

In [None]:
!ls $raw_infer_wav_path

In [None]:
import os
import IPython.display as ipd
# change path of the file here
ipd.Audio(f'{raw_infer_wav_path}/0.wav')

### Delete dataset sample <a class="anchor" id="head-21"></a>

In [None]:
# Create
ds_type = "mel_spectrogram"
ds_format = "hifigan"
data = json.dumps({"type":ds_type,"format":ds_format})

endpoint = f"{base_url}/dataset"

response = requests.post(endpoint,data=data,headers=headers)

print(response)
print(response.json())
random_ds = response.json()["id"]

In [None]:
endpoint = f"{base_url}/dataset"

response = requests.get(endpoint, headers=headers)

print(response)
# print(response.json()) ## Uncomment for verbose list output
for rsp in response.json():
    print(rsp["id"],rsp["type"],rsp["format"])

In [None]:
endpoint = f"{base_url}/dataset/{random_ds}"
response = requests.delete(endpoint, headers=headers)

print(response)
print(response.json())

In [None]:
endpoint = f"{base_url}/dataset"

response = requests.get(endpoint, headers=headers)

print(response)
# print(response.json()) ## Uncomment for verbose list output
for rsp in response.json():
    print(rsp["id"],rsp["type"],rsp["format"])

### Delete model sample <a class="anchor" id="head-22"></a>

In [None]:
network_arch = "vocoder"
encode_key = "tlt_encode"
data = json.dumps({"network_arch":network_arch,"encryption_key":encode_key})

endpoint = f"{base_url}/model"

response = requests.post(endpoint,data=data, headers=headers)

print(response)
print(response.json())
random_mdl = response.json()["id"]

In [None]:
endpoint = f"{base_url}/model"

response = requests.get(endpoint, headers=headers)

print(response)
# print(response.json()) ## Uncomment for verbose list output
for rsp in response.json():
    print(rsp["id"],rsp["network_arch"])

In [None]:
endpoint = f"{base_url}/model/{random_mdl}"
response = requests.delete(endpoint, headers=headers)

print(response)
print(response.json())

In [None]:
endpoint = f"{base_url}/model"

response = requests.get(endpoint, headers=headers)

print(response)
# print(response.json()) ## Uncomment for verbose list output
for rsp in response.json():
    print(rsp["id"],rsp["network_arch"])