# Learn OpenAI Whisper - Chapter 8
## Notebook 4: Synthesizing speech using a fine-tuned voice model

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1TxsyV259Aru5EWX9ebrREaW3IgpUT3eG)

This notebook complements the book [Learn OpenAI Whisper](https://a.co/d/1p5k4Tg).


This notebook is based on the [TorToiSe-TTS-Fast](https://github.com/152334H/tortoise-tts-fast) project, which drastically boosts the performance of [TorToiSe](https://github.com/neonbjb/tortoise-tts) without modifying the base models.

After creating a fine-tuned voice model using [Notebook 3 of this chapter](https://colab.research.google.com/drive/1qKflIgjPFVDW3qLaL08CV-quth5MwcRd), we load the fine-tuned autoregressive model using the `--ar-checkpoint` parameter, synthesize speech with it, and play the generated audio.

```
./scripts/tortoise_tts.py --preset very_fast --ar-checkpoint /path/to/checkpoint.pth #...
```

## 1. Checking NVIDIA GPU:
We start by checking whether an NVIDIA GPU is available using the `nvidia-smi` command. The cell prints the GPU information if one is connected; otherwise, it reports that no GPU is connected.

In [None]:
gpu_info = !nvidia-smi
gpu_info = '\n'.join(gpu_info)
if gpu_info.find('failed') >= 0:
  print('Not connected to a GPU')
else:
  print(gpu_info)
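
The cell above relies on IPython's `!` syntax, so it only works inside a notebook. As a minimal sketch of the same check in plain Python, the block below uses the standard library only. The function name `describe_gpu` is hypothetical, and treating a non-zero exit code as "no GPU" is an assumption added here, not something the original cell does.

```python
import shutil
import subprocess


def describe_gpu() -> str:
    """Return the nvidia-smi report, or a short message when no GPU is usable."""
    # nvidia-smi is absent on CPU-only machines, so look it up first.
    if shutil.which("nvidia-smi") is None:
        return "Not connected to a GPU"
    result = subprocess.run(["nvidia-smi"], capture_output=True, text=True)
    # Assumption: a non-zero exit code or the word "failed" means no GPU is reachable.
    if result.returncode != 0 or "failed" in result.stdout:
        return "Not connected to a GPU"
    return result.stdout


print(describe_gpu())
```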

## 2. Checking Virtual Memory:
Next, we check the runtime's RAM using the `psutil` library. The cell prints the total RAM in gigabytes (read from `virtual_memory().total`) and indicates whether a high-RAM runtime is being used.

In [None]:
from psutil import virtual_memory
ram_gb = virtual_memory().total / 1e9
print('Your runtime has {:.1f} gigabytes of total RAM\n'.format(ram_gb))

if ram_gb < 20:
  print('Not using a high-RAM runtime')
else:
  print('You are using a high-RAM runtime!')
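
`psutil.virtual_memory()` reports both a `total` and an `available` field, and the cell above reads `total`. The sketch below formats both values and applies the same 20 GB threshold. The function name `summarize_memory` and the sample byte counts are made up for illustration, and the values are passed in as arguments so that the block runs without `psutil` installed.

```python
def summarize_memory(total_bytes: int, available_bytes: int, high_ram_gb: float = 20.0) -> str:
    """Format total and available RAM and flag a high-RAM runtime."""
    total_gb = total_bytes / 1e9  # decimal gigabytes, as in the cell above
    available_gb = available_bytes / 1e9
    kind = "high-RAM" if total_gb >= high_ram_gb else "standard"
    return f"{total_gb:.1f} GB total, {available_gb:.1f} GB available ({kind} runtime)"


# Made-up byte counts; in the notebook, pass
# virtual_memory().total and virtual_memory().available instead.
print(summarize_memory(13_600_000_000, 11_200_000_000))
```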

## 3. Cloning and Installing tortoise-tts-fast:
This cell clones the tortoise-tts-fast repository from GitHub and installs the required dependencies using `pip3`.

In [None]:
!git clone https://github.com/152334H/tortoise-tts-fast
%cd tortoise-tts-fast
!pip3 install -r requirements.txt --no-deps
!pip3 install -e .

## 4. Installing Additional Supporting Libraries:
Next we install additional libraries such as `transformers`, `voicefixer`, and `BigVGAN` using `pip3`.

In [None]:
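import locale
# Force UTF-8 so that Colab's `!` shell commands keep working
locale.getpreferredencoding = lambda: "UTF-8"
!pip3 install transformers==4.29.2
# -y skips the confirmation prompt so the cell does not wait for input
!pip3 uninstall -y voicefixer
!pip3 install voicefixer==0.1.2
# Use an absolute path: an earlier cell already changed into the repository
%cd /content/tortoise-tts-fast
!pwd
!pip3 install git+https://github.com/152334H/BigVGAN.git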
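
Because the install cell pins specific versions, it can help to confirm what actually ended up in the environment. The following is a minimal sketch using the standard library's `importlib.metadata`. The helper name `installed_version` and the `EXPECTED` dictionary are illustrative; the pinned versions are copied from the cell above.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional

# Pins copied from the install cell above.
EXPECTED = {"transformers": "4.29.2", "voicefixer": "0.1.2"}


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of a package, or None if it is not installed."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


for name, wanted in EXPECTED.items():
    found = installed_version(name)
    status = "ok" if found == wanted else "mismatch"
    print(f"{name}: expected {wanted}, found {found} ({status})")
```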

## 5. Mounting Google Drive:
This cell mounts Google Drive to load the fine-tuned voice model.

In [None]:
from google.colab import drive
drive.mount('/content/gdrive')

## 6. Loading a Fine-tuned Autoregressive Voice Model:
Next, we set the path to the fine-tuned autoregressive voice model (`gpt_path`) and the text to be synthesized (`text`).

In [None]:
%cd /content/tortoise-tts-fast/scripts
%pwd

gpt_path = '/content/gdrive/MyDrive/Generative_AI/Deep_Fakes_Voice/tortoise/WaWF-JRB-audio_20230607.mp3_2023_06_08-13_20/models/120_gpt.pth'
text = "Benny, bring me everyone. EVERYONE!"
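
A mistyped Drive path only surfaces once the synthesis script starts, so a quick check beforehand can save time. The sketch below validates a checkpoint path with `pathlib`. The function name `check_checkpoint` is hypothetical, and requiring a `.pth` suffix is an assumption based on the file name used in this notebook.

```python
from pathlib import Path


def check_checkpoint(path: str) -> Path:
    """Return the checkpoint as a Path, or raise if it looks wrong or is missing."""
    checkpoint = Path(path)
    # Assumption: fine-tuned checkpoints use the .pth suffix, as in this notebook.
    if checkpoint.suffix != ".pth":
        raise ValueError(f"Expected a .pth checkpoint, got: {checkpoint.name}")
    if not checkpoint.is_file():
        raise FileNotFoundError(f"Checkpoint not found: {checkpoint}")
    return checkpoint
```

In the notebook, calling `check_checkpoint(gpt_path)` after mounting Google Drive would raise immediately if the file is not where the path says it is.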

## 7. Running tortoise_tts.py:
The code runs the `tortoise_tts.py` script with the specified arguments, including the `--preset` option for inference speed, the `--ar_checkpoint` option for the fine-tuned model path, the `-o` option for output file name, and the text to be synthesized.

In [None]:
# Quote the variables so that paths and text containing spaces reach the script intact
!python tortoise_tts.py --preset fast --ar_checkpoint "$gpt_path" -o "152.wav" "$text"
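
When the same command is assembled outside the notebook, quoting matters: text containing spaces or punctuation should reach the script as a single argument. As a minimal sketch, the block below builds the command line with `shlex.join` from the standard library. The function name `build_command` is hypothetical, the flags are the ones used in the cell above, and the checkpoint path is a placeholder.

```python
import shlex


def build_command(checkpoint: str, text: str, output: str = "152.wav", preset: str = "fast") -> str:
    """Build a shell-safe command line for tortoise_tts.py."""
    args = [
        "python", "tortoise_tts.py",
        "--preset", preset,
        "--ar_checkpoint", checkpoint,
        "-o", output,
        text,  # kept as one argument; shlex.join adds the quoting it needs
    ]
    return shlex.join(args)


command = build_command("/path/to/checkpoint.pth", "Benny, bring me everyone. EVERYONE!")
print(command)
```

Splitting the result again with `shlex.split` returns the original argument list, which is a quick way to confirm that the text survived as one piece.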

## 8. Playing the Synthesized Audio:
Finally, the code uses IPython to display and play the synthesized audio file.

In [None]:
import IPython
IPython.display.Audio('/content/tortoise-tts-fast/scripts/results/random_00_00.wav')
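
The exact output file name depends on how the script names its results, so a hard-coded path may not match a given run. A hedged alternative is to pick the most recently modified `.wav` file in the results directory. The function name `newest_wav` is hypothetical, and the directory is the one used in the cell above.

```python
from pathlib import Path
from typing import Optional


def newest_wav(results_dir: str) -> Optional[Path]:
    """Return the most recently modified .wav file in results_dir, or None."""
    directory = Path(results_dir)
    if not directory.is_dir():
        return None
    wavs = list(directory.glob("*.wav"))
    if not wavs:
        return None
    # The file with the largest modification time is the latest result.
    return max(wavs, key=lambda p: p.stat().st_mtime)


latest = newest_wav("/content/tortoise-tts-fast/scripts/results")
print(latest)
```

In the notebook, `IPython.display.Audio(str(latest))` would then play that file, provided `latest` is not `None`.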