## Notebook 2: fine-tuning and inference

**This notebook is a demo for my bachelor's thesis, a TTS system with the following capabilities:**

1) Voice cloning, based on audio recorded by the user within this notebook

2) Voice anonymization, where the textual information from the recording is kept, but the speaker's identity is not

3) Classical TTS



---

## Setup - run the following instructions

### Install necessary libraries

In [None]:
# !apt-get update
# !apt-get install tensorrt
# !apt-get install python3-libnvinfer-dev

In [None]:
!pip install python-dotenv

Collecting python-dotenv
  Downloading python_dotenv-1.0.1-py3-none-any.whl (19 kB)
Installing collected packages: python-dotenv
Successfully installed python-dotenv-1.0.1


In [None]:
!pip install git+https://github.com/NVIDIA/dllogger@v0.1.0#egg=dllogger

Collecting dllogger
  Cloning https://github.com/NVIDIA/dllogger (to revision v0.1.0) to /tmp/pip-install-0_14kau6/dllogger_5a88cb85c4384e4ab4ce0951572f945b
  Running command git clone --filter=blob:none --quiet https://github.com/NVIDIA/dllogger /tmp/pip-install-0_14kau6/dllogger_5a88cb85c4384e4ab4ce0951572f945b
  Running command git checkout -q 26a0f8f1958de2c0c460925ff6102a4d2486d6cc
  Resolved https://github.com/NVIDIA/dllogger to commit 26a0f8f1958de2c0c460925ff6102a4d2486d6cc
  Preparing metadata (setup.py) ... done
Building wheels for collected packages: dllogger
  Building wheel for dllogger (setup.py) ... done
  Created wheel for dllogger: filename=DLLogger-0.1.0-py3-none-any.whl size=5615 sha256=53661d74d0b49168a02dffbb0f3e5a1129252f49ffcb32cf617dd59a33b986bd
  Stored in directory: /tmp/pip-ephem-wheel-cache-ham7_zm2/wheels/3e/37/1a/76f3f71919b1e99f2d778d1da0dbb35b67878d7fa8f4cbf60c
Successfully built dllogger
Installing collected packages: dllogger
Su

In [None]:
# !pip install tensorrt



---



### Mount Google Drive and change spk_id and TTS_speaker

In [None]:
import os, sys, re, shutil
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


In [None]:
root_path = '/content/drive/MyDrive/demo_licenta/' # or any other path to the shortcut
home_path = os.path.join(root_path, 'FastPitch_new')
if os.path.exists(home_path):
  os.chdir(home_path)

### Set your speaker ID and the desired TTS speaker ID, and use the zip address provided in notebook t1

In [None]:
spk_id = "RT_slow"
TTS_speaker = 2
zip_address = "/content/drive/MyDrive/demo_licenta/licenta/denoised/denoised_RT_slow/RT_slow_zip"
os.path.exists(zip_address)

True
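
Before copying anything, it can help to confirm that the folder produced by the first notebook holds the entries the copy cell refers to, including the ones that are currently commented out. The sketch below is only an illustration: `missing_inputs` is a hypothetical helper, not part of the project, and the demo folder is a throwaway temporary directory.

```python
import os
import tempfile

def missing_inputs(folder, spk_id):
    """Return the expected entries that are absent from `folder`."""
    # Names taken from the copy cell of this notebook.
    expected = [
        f"meta_4_pitch_mels_{spk_id}.txt",
        f"{spk_id}_metadata.txt",
        f"{spk_id}_metadata_train.txt",
        f"{spk_id}_metadata_eval.txt",
        f"TTS_text_{spk_id}.txt",
        "wavs22",
        f"{spk_id}_18x384.npy",
    ]
    return [name for name in expected
            if not os.path.exists(os.path.join(folder, name))]

# Demo on a temporary folder that holds only the TTS text file.
demo_dir = tempfile.mkdtemp()
open(os.path.join(demo_dir, "TTS_text_demo.txt"), "w").close()
missing = missing_inputs(demo_dir, "demo")
print(missing)
```

In this notebook the call would be `missing_inputs(zip_address, spk_id)`; an empty list means every expected entry is present.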

### Run to create your new speaker folder and copy the necessary files

In [None]:
# Create the speaker's folder (and any missing parent folders) if it does not exist yet.
os.makedirs(os.path.join(home_path, f"new_speakers/{spk_id}"), exist_ok=True)

In [None]:
# shutil.copy(os.path.join(zip_address, f"meta_4_pitch_mels_{spk_id}.txt"), os.path.join(home_path, f'new_speakers/{spk_id}/meta_4_pitch_mels_{spk_id}.txt'))
# shutil.copy(os.path.join(zip_address, f"{spk_id}_metadata.txt"), os.path.join(home_path, f'new_speakers/{spk_id}/{spk_id}_metadata.txt'))
# shutil.copy(os.path.join(zip_address, f"{spk_id}_metadata_train.txt"), os.path.join(home_path, f"new_speakers/{spk_id}/{spk_id}_metadata_train.txt"))
# shutil.copy(os.path.join(zip_address, f"{spk_id}_metadata_eval.txt"), os.path.join(home_path, f"new_speakers/{spk_id}/{spk_id}_metadata_eval.txt"))
shutil.copy(os.path.join(zip_address, f"TTS_text_{spk_id}.txt"), os.path.join(home_path, 'phrases', f"TTS_text_{spk_id}.txt"))
# shutil.copytree(os.path.join(zip_address, "wavs22"), os.path.join(home_path, f"new_speakers/{spk_id}/wavs22"))
# shutil.copy(os.path.join(zip_address, f"{spk_id}_18x384.npy"), os.path.join(home_path, f"embs/{spk_id}_18x384.npy"))

'/content/drive/MyDrive/demo_licenta/FastPitch_new/phrases/TTS_text_RT_slow.txt'

## Modify Setup files

### TTS & anonymous speaker

In [None]:
def setup_TTS(setup_file, spk_id, new_spk, new_anonym=1, new_config="pred", new_phrases=""):
  setup_custom = setup_file.replace(".sh", f"_{spk_id}.sh")

  search_SPEAKERS = re.compile(r'^(SPEAKERS=\().*(\))$')
  search_anonym = re.compile(r'^(export ANONYM_FACTOR=).*$')
  search_config = re.compile(r'^(export FASTPITCH_config=).*$')
  search_phrases = re.compile(r'^(PHRASES=).*$')

  with open(setup_file, 'r') as infile, open(setup_custom, 'w') as outfile:
      for line in infile:
          if search_SPEAKERS.match(line.strip()) and new_spk != "":
              line = f'SPEAKERS=({new_spk})\n'
          elif search_anonym.match(line.strip()):
              line = f'export ANONYM_FACTOR={new_anonym}\n'
          elif search_config.match(line.strip()):
              line = f'export FASTPITCH_config="{new_config}"\n'
          elif search_phrases.match(line.strip()) and new_phrases != "":
              line = f'PHRASES={new_phrases}\n'
          outfile.write(line)
  print(f"Your setup file is ready. You can find it at: {setup_custom}")
  return setup_custom

In [None]:
setup_file = "./scripts/setup_TTS.sh"
run_setup_TTS = setup_TTS(setup_file, spk_id, new_spk=TTS_speaker, new_anonym=1) # 1 == no anonymization

Your setup file is ready. You can find it at: ./scripts/setup_TTS_RT_slow.sh
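
To see what `setup_TTS` changes, the same anchored patterns can be tried on a few lines held in memory. This is a minimal sketch: the sample lines are invented for illustration (only the phrases path is taken from the log in this notebook), and the real `setup_TTS.sh` may look different.

```python
import re

# Invented sample lines; the real script may differ.
sample_lines = [
    'SPEAKERS=(0)',
    'export ANONYM_FACTOR=0.5',
    'export FASTPITCH_config="enc"',
    'PHRASES=./phrases/anonym_test.txt',
    'echo "unchanged"',
]

# Same patterns as in setup_TTS, each paired with the line that replaces a match.
rules = [
    (re.compile(r'^(SPEAKERS=\().*(\))$'), 'SPEAKERS=(2)'),
    (re.compile(r'^(export ANONYM_FACTOR=).*$'), 'export ANONYM_FACTOR=1'),
    (re.compile(r'^(export FASTPITCH_config=).*$'), 'export FASTPITCH_config="pred"'),
]

def rewrite(line):
    """Return the replacement for the first matching rule, else the line itself."""
    for pattern, replacement in rules:
        if pattern.match(line.strip()):
            return replacement
    return line

rewritten = [rewrite(line) for line in sample_lines]
for line in rewritten:
    print(line)
```

Lines that match no rule, such as the `PHRASES=` line when no new phrases file is given, pass through unchanged.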


### Speaker Adaptation

**Steps for new speakers**:

- prepare dataset
- setup voice cloner

In [None]:
def setup_dataprep(setup_file, spk_id):
  setup_custom = setup_file.replace(".sh", f"_{spk_id}.sh")

  search_SPEAKERS = re.compile(r"current_speaker\s*=\s*\"(.*?)\"")

  with open(setup_file, 'r') as infile, open(setup_custom, 'w') as outfile:
      for line in infile:
          if search_SPEAKERS.match(line.strip()):
              line = f'current_speaker="{spk_id}"\n'
          outfile.write(line)
  print(f"Your setup file is ready. You can find it at: {setup_custom}")
  return setup_custom

In [None]:
run_setup_prep_dataset = setup_dataprep('./scripts/prepare_dataset.sh', spk_id)

Your setup file is ready. You can find it at: ./scripts/prepare_dataset_RT_slow.sh


In [None]:
def setup_VC(setup_file, spk_id, new_phrases):
  setup_custom = setup_file.replace(".sh", f"_{spk_id}.sh")

  search_SPEAKERS = re.compile(r"SPEAKER\s*=\s*\"(.*?)\"")
  search_phrases = re.compile(r"PHRASES\s*=\s*\"(.*?)\"")

  with open(setup_file, 'r') as infile, open(setup_custom, 'w') as outfile:
      for line in infile:
          if search_SPEAKERS.match(line.strip()) and spk_id != "":
              line = f'SPEAKER="{spk_id}"\n'
          elif search_phrases.match(line.strip()) and new_phrases != "":
              line = f'PHRASES="{new_phrases}"\n'
          outfile.write(line)
  print(f"Your setup file is ready. You can find it at: {setup_custom}")
  return setup_custom

In [None]:
run_setup_VC = setup_VC("./scripts/setup_VC.sh", spk_id, new_phrases=f"phrases/TTS_text_{spk_id}.txt")

Your setup file is ready. You can find it at: ./scripts/setup_VC_RT_slow.sh
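
The three setup functions share one pattern: copy a script line by line and swap out the lines that match a regular expression. A possible generalization is sketched below. `customize_script` is a hypothetical helper, and the demo script contents are invented, so treat it as an illustration and not as the project's code.

```python
import os
import re
import tempfile

def customize_script(src, dst, replacements):
    """Copy `src` to `dst`, replacing each line whose stripped text matches a pattern.

    `replacements` maps a regex string to the full replacement line (without newline).
    """
    compiled = [(re.compile(p), new) for p, new in replacements.items()]
    with open(src) as infile, open(dst, "w") as outfile:
        for line in infile:
            for pattern, new_line in compiled:
                if pattern.match(line.strip()):
                    line = new_line + "\n"
                    break
            outfile.write(line)
    return dst

# Demo on a throwaway script with invented contents.
workdir = tempfile.mkdtemp()
src = os.path.join(workdir, "setup_VC.sh")
with open(src, "w") as f:
    f.write('SPEAKER="old"\nPHRASES="phrases/old.txt"\necho "training"\n')

dst = customize_script(
    src,
    os.path.join(workdir, "setup_VC_RT_slow.sh"),
    {
        r'SPEAKER\s*=\s*"(.*?)"': 'SPEAKER="RT_slow"',
        r'PHRASES\s*=\s*"(.*?)"': 'PHRASES="phrases/TTS_text_RT_slow.txt"',
    },
)
with open(dst) as f:
    result = f.read()
print(result)
```

Keeping the patterns in a dictionary makes it easy to see at a glance which variables of a script a given call overrides.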


## Running the Setup files

### TTS & anonymous speaker



In [None]:
!bash $run_setup_TTS


AMP=false, batch_size=1

SPEAKER: 2
FASTPITCH CONFIGURATION: pred
ANONYM FLAG: 1
PHRASES: ./phrases/anonym_test.txt
OUTPUT_DIR: out_synth/TTS/pred/2
From models.py: Importing model: model_pred
####Structure in use: PREDICTORS conditioned only#########
<class 'model_pred.FastPitch'>
2024-07-16 21:50:46.532367: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:9261] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2024-07-16 21:50:46.532410: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:607] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2024-07-16 21:50:46.533768: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1515] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2024-07-16 21:50:46.541053: I tensorflow/core/platform/cpu_feature_guard.cc:182] 

### Listening to the generated files

**One of the generated files:**

In [None]:
from IPython.display import Audio

OUTPUT_DIR = os.path.join(home_path, f"out_synth/TTS/pred/{TTS_speaker}/audio_001.wav")
Audio(OUTPUT_DIR, autoplay=False)

**One of the original files:**

In [None]:
original_ref = os.path.join(home_path, f"original_audios/TTS/{TTS_speaker}.wav")
Audio(original_ref, autoplay=False)
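
To browse every generated file instead of a single hard-coded one, the output folder can be listed first. The sketch assumes the files follow the `audio_*.wav` naming seen in this notebook; `list_generated` is a hypothetical helper and the demo folder is a temporary directory with empty placeholder files.

```python
import glob
import os
import tempfile

def list_generated(folder, pattern="audio_*.wav"):
    """Return the matching files in `folder`, sorted by name."""
    return sorted(glob.glob(os.path.join(folder, pattern)))

# Demo with empty placeholder files in a temporary folder.
demo_dir = tempfile.mkdtemp()
for name in ("audio_002.wav", "audio_001.wav", "notes.txt"):
    open(os.path.join(demo_dir, name), "w").close()

found = [os.path.basename(p) for p in list_generated(demo_dir)]
print(found)
```

In the notebook, the folder would be `os.path.join(home_path, f"out_synth/TTS/pred/{TTS_speaker}")`, and each returned path can be passed to `Audio(...)` in its own cell.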

### Speaker adaptation

In [None]:
# only if needed:
!bash $run_setup_prep_dataset

DLL 2024-07-17 00:51:28.612285 - PARAMETER dataset_path : ./new_speakers/RT_slow/wavs22 
DLL 2024-07-17 00:51:28.612911 - PARAMETER wav_text_filelists : ['./new_speakers/RT_slow/meta_4_pitch_mels_RT_slow.txt'] 
DLL 2024-07-17 00:51:28.613060 - PARAMETER extract_mels : True 
DLL 2024-07-17 00:51:28.613122 - PARAMETER extract_pitch : True 
DLL 2024-07-17 00:51:28.613168 - PARAMETER save_alignment_priors : False 
DLL 2024-07-17 00:51:28.613211 - PARAMETER log_file : preproc_log.json 
DLL 2024-07-17 00:51:28.613249 - PARAMETER n_speakers : 1 
DLL 2024-07-17 00:51:28.613289 - PARAMETER max_wav_value : 32768.0 
DLL 2024-07-17 00:51:28.613327 - PARAMETER sampling_rate : 22050 
DLL 2024-07-17 00:51:28.613364 - PARAMETER filter_length : 1024 
DLL 2024-07-17 00:51:28.613401 - PARAMETER hop_length : 256 
DLL 2024-07-17 00:51:28.613439 - PARAMETER win_length : 1024 
DLL 2024-07-17 00:51:28.613475 - PARAMETER mel_fmin : 0.0 
DLL 2024-07-17 00:51:28.613512 - PARAMETER mel_fmax : 8000.0 
DLL 2024-07-

In [None]:
!bash $run_setup_VC



AMP=false, 1x1x1 (global batch size 1)

SPEAKER: RT_slow
FASTPITCH CONFIGURATION: pred
DATASET: ./new_speakers/RT_slow/wavs22/
TRAIN_SET: ./new_speakers/RT_slow/RT_slow_metadata_train.txt
VAL_SET: ./new_speakers/RT_slow/RT_slow_metadata_eval.txt
OUTPUT_DIR: ./OUTPUT_MODELS/frozen/RT_slow_frdur_pred/
2024-07-17 00:57:58.394382: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:9261] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2024-07-17 00:57:58.394437: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:607] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2024-07-17 00:57:58.395837: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1515] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2024-07-17 00:57:58.403294: I tensorflow/core/platform/cpu_fea

### Listening to the generated files

**One of the generated files:**

In [None]:
from IPython.display import Audio

OUTPUT_DIR = os.path.join(home_path, f"out_synth/frozen/{spk_id}/pred/conf_frdur/audio_002.wav")
Audio(OUTPUT_DIR, autoplay=False)

**One of the original files:**

In [None]:
original_ref = os.path.join(home_path, f"new_speakers/{spk_id}/wavs22/{spk_id}_001_22.wav")
Audio(original_ref, autoplay=False)

### **"dol" speaker**:

In [None]:
# modifying the setup to allow only inference commands:
inference_only = "yes"
run_setup_VC = setup_VC("./scripts/setup_VC.sh", "dol", new_phrases="")

Your setup file is ready. You can find it at: ./scripts/setup_VC_dol.sh


In [None]:
input_file = "./scripts/train_freeze.sh"

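if inference_only == "yes":
  # Write a copy of the training script with every python3 call commented out.
  with open(input_file, 'r') as infile, open(input_file.replace(".sh", "_temp.sh"), 'w') as outfile:
      for line in infile:
          if "python3" in line:
              # Replace on the unstripped line so the trailing newline is kept;
              # stripping it would glue the next line onto this comment.
              line = line.replace("python3", "#python3")
          outfile.write(line)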
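
If a `python3` command in the script spans several lines through trailing backslashes, commenting out only its first line is not enough: in bash a comment ends at the newline, so the continuation lines would then run as separate commands. The sketch below shows one way to handle that case. It is a variation on the cell above, not the project's code: `comment_out_python` is a hypothetical helper, it only looks at lines that start with `python3`, and the sample script is invented.

```python
def comment_out_python(script_text):
    """Prefix every python3 command, and its backslash continuations, with '#'."""
    out = []
    in_continuation = False
    for line in script_text.splitlines(keepends=True):
        stripped = line.strip()
        if in_continuation or stripped.startswith("python3"):
            out.append("#" + line)
            # The command continues while the line ends with a backslash.
            in_continuation = stripped.endswith("\\")
        else:
            out.append(line)
    return "".join(out)

# Invented sample script with a two-line python3 command.
sample = 'echo "start"\npython3 train.py \\\n  --epochs 10\necho "done"\n'
commented = comment_out_python(sample)
print(commented)
```

Because the lines are never stripped before being written back, every newline of the original script is preserved.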

In [None]:
!bash $run_setup_VC



AMP=false, 1x1x1 (global batch size 1)

SPEAKER: dol
FASTPITCH CONFIGURATION: pred
DATASET: /mnt/student-share/data4teodora/SWARA_mels_pitch_meta/
TRAIN_SET: ./new_speakers/dol/dol_metadata_train.txt
VAL_SET: ./new_speakers/dol/dol_metadata_eval.txt
OUTPUT_DIR: ./OUTPUT_MODELS/frozen/dol_frdur_pred/

#######################################################
The model was successfully adapted to speaker dol
#######################################################


AMP=false, batch_size=1

Current speaker: dol
Phrases file: phrases/VC_test.txt
Model in use: ./OUTPUT_MODELS/frozen/dol_frdur_pred/FastPitch_checkpoint_560.pt
Audio files saved to: out_synth/frozen/dol/pred/conf_frdur
From models.py: Importing model: model_pred
####Structure in use: PREDICTORS conditioned only#########
<class 'model_pred.FastPitch'>
2024-07-17 01:19:35.043897: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:9261] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when 

#### Listening to the generated files

**One of the generated files:**

In [None]:
from IPython.display import Audio

OUTPUT_DIR = os.path.join(home_path, "out_synth/frozen/dol/pred/conf_frdur/audio_002.wav")
Audio(OUTPUT_DIR, autoplay=False)

**One of the original files:**

In [None]:
original_ref = os.path.join(home_path, "original_audios/VC/dol.wav")
Audio(original_ref, autoplay=False)