# Learn OpenAI Whisper - Chapter 9
## Notebook 4: Synthesizing speech using a fine-tuned voice model

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1TxsyV259Aru5EWX9ebrREaW3IgpUT3eG)

This notebook complements the book [Learn OpenAI Whisper](https://a.co/d/1p5k4Tg).


This notebook is based on the [TorToiSe-TTS-Fast](https://github.com/152334H/tortoise-tts-fast) project, which drastically boosts the performance of [TorToiSe](https://github.com/neonbjb/tortoise-tts) without modifying the base models.

After creating a fine-tuned voice model in [Notebook 3 of this chapter](https://colab.research.google.com/drive/1qKflIgjPFVDW3qLaL08CV-quth5MwcRd), we load the fine-tuned autoregressive model using the `--ar_checkpoint` parameter, synthesize speech with it, and play the generated audio.

```
./scripts/tortoise_tts.py --preset very_fast --ar_checkpoint /path/to/checkpoint.pth #...
```

## 1. Checking NVIDIA GPU:
We start by checking whether an NVIDIA GPU is available using the `nvidia-smi` command. The cell prints the GPU information if a GPU is connected; otherwise, it reports that no GPU is connected.

In [None]:
gpu_info = !nvidia-smi
gpu_info = '\n'.join(gpu_info)
if gpu_info.find('failed') >= 0:
  print('Not connected to a GPU')
else:
  print(gpu_info)

Thu Apr 11 23:50:31 2024       
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.104.05             Driver Version: 535.104.05   CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|   0  Tesla T4                       Off | 00000000:00:04.0 Off |                    0 |
| N/A   41C    P8               9W /  70W |      0MiB / 15360MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
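
The `!nvidia-smi` syntax above only works inside IPython or Colab. As a rough sketch, the same check can be written in plain Python with the standard library. The helper name `gpu_summary` is hypothetical and not part of the notebook; the `'failed'` test simply mirrors the cell above.

```python
import shutil
import subprocess


def gpu_summary() -> str:
    """Return the nvidia-smi report, or a short message when no GPU is reachable."""
    # nvidia-smi is missing entirely on machines without the NVIDIA driver.
    if shutil.which("nvidia-smi") is None:
        return "Not connected to a GPU"
    result = subprocess.run(["nvidia-smi"], capture_output=True, text=True)
    # Mirror the notebook's check: a failing call or a 'failed' message means no GPU.
    if result.returncode != 0 or "failed" in result.stdout:
        return "Not connected to a GPU"
    return result.stdout


print(gpu_summary())
```

Checking for the executable first avoids a `FileNotFoundError` on CPU-only runtimes.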
                                                                    

## 2. Checking Virtual Memory:
Next, we check the runtime's RAM using the `psutil` library. The cell reads the total installed memory (`virtual_memory().total`), prints it in gigabytes (10^9 bytes), and reports a high-RAM runtime when the total is 20 GB or more.

In [None]:
from psutil import virtual_memory
ram_gb = virtual_memory().total / 1e9
print('Your runtime has {:.1f} gigabytes of available RAM\n'.format(ram_gb))

if ram_gb < 20:
  print('Not using a high-RAM runtime')
else:
  print('You are using a high-RAM runtime!')

Your runtime has 13.6 gigabytes of available RAM

Not using a high-RAM runtime
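
The cell above reports total memory, not the memory that is currently free. The sketch below separates the two figures; `classify_runtime` is a hypothetical helper, and the 13.6 GB fallback is only the sample value from the output above, used when `psutil` is not installed.

```python
def classify_runtime(total_bytes: int, threshold_gb: float = 20.0) -> str:
    """Describe a runtime from its total RAM, using the notebook's 20 GB threshold."""
    total_gb = total_bytes / 1e9
    kind = "high-RAM" if total_gb >= threshold_gb else "standard"
    return f"{total_gb:.1f} GB total RAM ({kind} runtime)"


try:
    from psutil import virtual_memory

    mem = virtual_memory()
    print(classify_runtime(mem.total))
    # 'available' is what new processes can claim right now; it is below 'total'.
    print(f"{mem.available / 1e9:.1f} GB currently available")
except ImportError:
    # Assumption: fall back to the sample figure shown in the notebook output.
    print(classify_runtime(13_600_000_000))
```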


## 3. Cloning and Installing tortoise-tts-fast:
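This cell clones the tortoise-tts-fast repository from GitHub and installs it with `pip3`: first the packages listed in `requirements.txt` (with `--no-deps`, so pip does not pull in their own dependencies), then the project itself in editable mode (`-e .`).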

In [None]:
!git clone https://github.com/152334H/tortoise-tts-fast
%cd tortoise-tts-fast
!pip3 install -r requirements.txt --no-deps
!pip3 install -e .

Cloning into 'tortoise-tts-fast'...
remote: Enumerating objects: 2314, done.
remote: Total 2314 (delta 0), reused 0 (delta 0), pack-reused 2314
Receiving objects: 100% (2314/2314), 105.07 MiB | 30.09 MiB/s, done.
Resolving deltas: 100% (1070/1070), done.
Updating files: 100% (697/697), done.
/content/tortoise-tts-fast
Ignoring backports-zoneinfo: markers 'python_version >= "3.8" and python_version < "3.9"' don't match your environment
Collecting bigvgan@ git+https://github.com/152334H/BigVGAN.git@HEAD (from -r requirements.txt (line 38))
ERROR: Can't verify hashes for these requirements because we don't have a way to hash version control repositories:
    bigvgan@ git+https://github.com/152334H/BigVGAN.git@HEAD from git+https://github.com/152334H/BigVGAN.git@HEAD (from -r requirements.txt (line 38))
Obtaining file:///content/tortoise-tts-fast
  Installing build dependencies ... done
  Checking if build backend supports build_editable ...

## Restart session
WARNING: The following packages were previously imported in this runtime:
  [numpy]
You must restart the runtime in order to use newly installed versions.

In Google Colab, from the top menu, select `Runtime`, then `Restart session`.
<img src="https://github.com/PacktPublishing/Learn-OpenAI-Whisper/raw/main/Chapter09/images/Restart_the_runtime_600x102.png" width=600>

## 4. Installing Additional Supporting Libraries:
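Next, we install the additional libraries `transformers`, `voicefixer`, and `BigVGAN` using `pip3`. The cell pins `transformers` to version 4.29.2 and replaces the existing `voicefixer` installation with version 0.1.2; when the uninstall step asks `Proceed (Y/n)?`, answer `Y`.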

In [None]:
import locale
locale.getpreferredencoding = lambda: "UTF-8"
!pip3 install transformers==4.29.2
!pip3 uninstall voicefixer
!pip3 install voicefixer==0.1.2
%cd tortoise-tts-fast
!pwd
!pip3 install git+https://github.com/152334H/BigVGAN.git

Collecting transformers==4.29.2
  Downloading transformers-4.29.2-py3-none-any.whl (7.1 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 7.1/7.1 MB 20.6 MB/s eta 0:00:00
Installing collected packages: transformers
  Attempting uninstall: transformers
    Found existing installation: transformers 4.33.3
    Uninstalling transformers-4.33.3:
      Successfully uninstalled transformers-4.33.3
Successfully installed transformers-4.29.2
Found existing installation: voicefixer 0.1.3
Uninstalling voicefixer-0.1.3:
  Would remove:
    /usr/local/bin/voicefixer
    /usr/local/bin/voicefixer.cmd
    /usr/local/lib/python3.10/dist-packages/voicefixer-0.1.3.dist-info/*
    /usr/local/lib/python3.10/dist-packages/voicefixer/*
Proceed (Y/n)? Y
  Successfully uninstalled voicefixer-0.1.3
Collecting voicefixer==0.1.2
  Downloading voicefixer-0.1.2-py3-none-any.whl (52 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 52.2/52.2 kB 816.
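
After restarting the session, it can be useful to confirm that the pinned versions are the ones actually installed. The following is a minimal sketch using the standard library's `importlib.metadata`; the `check_pins` helper and the `PINNED` dictionary are hypothetical names, with the version numbers taken from the install cell above.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions pinned by the install cell above.
PINNED = {"transformers": "4.29.2", "voicefixer": "0.1.2"}


def check_pins(pins):
    """Map each package name to (installed_version_or_None, matches_pin)."""
    report = {}
    for name, wanted in pins.items():
        try:
            found = version(name)
        except PackageNotFoundError:
            found = None  # package is not installed in this environment
        report[name] = (found, found == wanted)
    return report


for name, (found, ok) in check_pins(PINNED).items():
    status = "OK" if ok else "MISMATCH"
    print(f"{name}: installed={found} pinned={PINNED[name]} -> {status}")
```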

## 5. Mounting Google Drive:
This cell mounts Google Drive to load the fine-tuned voice model.

In [None]:
from google.colab import drive
drive.mount('/content/gdrive')

Mounted at /content/gdrive


## 6. Loading a Fine-tuned Autoregressive Voice Model:
Next, we set the path to the fine-tuned autoregressive voice model (`gpt_path`) and the text to be synthesized (`text`). Replace the path in the cell with the location of your own checkpoint in Google Drive.

<img src="https://github.com/PacktPublishing/Learn-OpenAI-Whisper/raw/main/Chapter09/images/ch09_4_Google_drive_DLAS_checkpoint_directory.JPG" width=600>



In [8]:
%cd /content/tortoise-tts-fast/scripts
%pwd

gpt_path = '/content/gdrive/MyDrive/Learn_OAI_Whisper_20240411_JRB/models/60_gpt.pth'
text = "Benny, bring me everyone. EVERYONE!!!"

/content/tortoise-tts-fast/scripts
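
A mistyped checkpoint path only surfaces once the synthesis script starts loading models, so a quick check beforehand can save time. This is a sketch under the assumption that the checkpoint is a single `.pth` file, as in the cell above; `checkpoint_problem` is a hypothetical helper, not part of tortoise-tts-fast.

```python
from pathlib import Path


def checkpoint_problem(path):
    """Return a description of what is wrong with the path, or None if it looks usable."""
    p = Path(path)
    if p.suffix != ".pth":
        return f"expected a .pth file, got '{p.name}'"
    if not p.is_file():
        return f"file not found: {p}"
    return None


# Path copied from the cell above; substitute your own Google Drive location.
gpt_path = "/content/gdrive/MyDrive/Learn_OAI_Whisper_20240411_JRB/models/60_gpt.pth"
print(checkpoint_problem(gpt_path) or "Checkpoint found")
```

If the file is reported missing, confirm that Google Drive is mounted at `/content/gdrive` before checking the path itself.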


## 7. Running tortoise_tts.py:
This cell runs the `tortoise_tts.py` script with the `--preset` option for inference speed, the `--ar_checkpoint` option for the fine-tuned model path, the `-o` option for the output file name, and the text to be synthesized. The `$` prefix makes IPython substitute the Python variables `gpt_path` and `text` into the shell command.

In [9]:
!python tortoise_tts.py --preset fast --ar_checkpoint $gpt_path -o "152.wav" $text

Loading tts...
Removing weight norm...
Rendering random_00 (1 of 1)...
  Benny, bring me everyone. EVERYONE!!!
Generating autoregressive samples..
100% 6/6 [00:22<00:00,  3.71s/it]
Computing best candidates using CLVP
100% 6/6 [00:04<00:00,  1.43it/s]
Transforming autoregressive outputs into audio..
100% 20/20 [00:01<00:00, 10.34it/s]
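
When the text contains spaces or shell-sensitive punctuation, building the command as an argument list keeps it intact. The sketch below only assembles and prints the command, using the flags shown in the cell above. It assumes the script accepts the text as a single positional argument, and `build_tts_command` is a hypothetical helper.

```python
import shlex


def build_tts_command(text, checkpoint, output="152.wav", preset="fast"):
    """Assemble the argument list for tortoise_tts.py with the flags used in this notebook."""
    return [
        "python", "tortoise_tts.py",
        "--preset", preset,
        "--ar_checkpoint", checkpoint,
        "-o", output,
        text,  # kept as one argument, so spaces and '!' need no manual escaping
    ]


cmd = build_tts_command("Benny, bring me everyone. EVERYONE!!!", "/path/to/checkpoint.pth")
# shlex.join quotes each argument so the printed line is safe to paste into a shell.
print(shlex.join(cmd))
```

The list form can be passed directly to `subprocess.run(cmd, check=True)` from the `scripts` directory, which bypasses shell parsing altogether.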


## 8. Playing the Synthesized Audio:
Finally, open the `Files` panel in Google Colab and browse to the `tortoise-tts-fast/scripts/results/` directory, which contains the audio generated by the voice cloning model. We use `IPython` to display and play the synthesized audio files.

<img src="https://github.com/PacktPublishing/Learn-OpenAI-Whisper/raw/main/Chapter09/images/ch09_4_Google_Colab_tortoise-tts-fast_audio_from_cloned_voice.JPG" width=400>

In [10]:
import IPython.display  # import the submodule explicitly so IPython.display.Audio always resolves
IPython.display.Audio('/content/tortoise-tts-fast/scripts/results/random_00_00.wav')

In [11]:
IPython.display.Audio('/content/tortoise-tts-fast/scripts/results/random_combined.wav')
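
The file names above depend on the run, so it can be easier to list whatever the script produced and play each file in turn. This is a minimal sketch, assuming the results directory used in this notebook; `list_wavs` is a hypothetical helper.

```python
from pathlib import Path


def list_wavs(results_dir):
    """Return the .wav files in results_dir sorted by name, or [] if the directory is missing."""
    root = Path(results_dir)
    if not root.is_dir():
        return []
    return sorted(root.glob("*.wav"))


wav_files = list_wavs("/content/tortoise-tts-fast/scripts/results")
print(f"Found {len(wav_files)} audio file(s)")

try:
    from IPython.display import Audio, display

    for wav in wav_files:
        print(wav.name)
        display(Audio(str(wav)))  # renders an inline player in a notebook
except ImportError:
    # Outside a notebook there is no inline player; just list the paths.
    for wav in wav_files:
        print(wav)
```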