# Easy Inferencing with 🐸 TTS ⚡

#### Want to quickly synthesize speech using a Coqui 🐸 TTS model?

💡: Grab a pre-trained model and use it to synthesize speech using any speaker voice, including yours! ⚡

🐸 TTS comes with a list of pretrained models and speaker voices. You can even start a local demo server and open it in your favorite web browser to 🗣️.

In this notebook, we will: 
```
1. List available pre-trained 🐸 TTS models
2. Run a 🐸 TTS model
3. Listen to the synthesized wave 📣
4. Run multispeaker 🐸 TTS model 
```
So, let's jump right in!


## Install 🐸 TTS ⬇️

In [8]:
! pip install -U pip
! pip install TTS



## ✅ List available pre-trained 🐸 TTS models

Coqui 🐸TTS comes with a list of pretrained models covering different model types (e.g., TTS, vocoder), languages, training datasets, and architectures.

You can either use your own model or one of the released models under 🐸TTS.

Use `tts --list_models` to list the available models.



In [9]:
! tts --list_models


 Name format: type/language/dataset/model
 1: tts_models/multilingual/multi-dataset/xtts_v2
 2: tts_models/multilingual/multi-dataset/xtts_v1.1
 3: tts_models/multilingual/multi-dataset/your_tts
 4: tts_models/multilingual/multi-dataset/bark
 5: tts_models/bg/cv/vits
 6: tts_models/cs/cv/vits
 7: tts_models/da/cv/vits
 8: tts_models/et/cv/vits
 9: tts_models/ga/cv/vits
 10: tts_models/en/ek1/tacotron2
 11: tts_models/en/ljspeech/tacotron2-DDC
 12: tts_models/en/ljspeech/tacotron2-DDC_ph
 13: tts_models/en/ljspeech/glow-tts [already downloaded]
 14: tts_models/en/ljspeech/speedy-speech
 15: tts_models/en/ljspeech/tacotron2-DCA
 16: tts_models/en/ljspeech/vits
 17: tts_models/en/ljspeech/vits--neon
 18: tts_models/en/ljspeech/fast_pitch
 19: tts_models/en/ljspeech/overflow
 20: tts_models/en/ljspeech/neural_hmm
 21: tts_models/en/vctk/vits [already downloaded]
 22: tts_models/en/vctk/fast_pitch
 23: tts_models/en/sam/tacotron-DDC
 24: tts_models/en/blizzard2013/capacitron-t2-c50
 25: tt
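
The `Name format: type/language/dataset/model` line in the listing above means that every model name splits into four slash-separated fields. As a small illustration, the sketch below splits a name into those fields. `parse_model_name` and `ModelName` are hypothetical names made up for this example; they are not part of 🐸TTS.

```python
from typing import NamedTuple


class ModelName(NamedTuple):
    """The four fields of a model name: type/language/dataset/model."""
    model_type: str
    language: str
    dataset: str
    model: str


def parse_model_name(full_name: str) -> ModelName:
    """Split 'type/language/dataset/model' into its four parts."""
    parts = full_name.strip().split("/")
    if len(parts) != 4:
        raise ValueError(f"expected type/language/dataset/model, got {full_name!r}")
    return ModelName(*parts)


# Example with a name taken from the listing above.
name = parse_model_name("tts_models/en/ljspeech/glow-tts")
print(name.language, name.dataset, name.model)
```

This can help when you want to filter the listing, for example to keep only the `en` models.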

## ✅ Run a 🐸 TTS model

#### **First things first**: Using a released model and the default vocoder:

You can simply copy the full model name from the list above and pass it to `--model_name`:


In [17]:
!tts --text "I KNOCKED AT THE DOOR ON THE ANCIENT SIDE OF THE BUILDING" \
--model_name "tts_models/en/ljspeech/glow-tts" \
--out_path output.wav


 > tts_models/en/ljspeech/glow-tts is already downloaded.
 > vocoder_models/en/ljspeech/multiband-melgan is already downloaded.
 > Using model: glow_tts
 > Setting up Audio Processor...
 | > sample_rate:22050
 | > resample:False
 | > num_mels:80
 | > log_func:np.log10
 | > min_level_db:-100
 | > frame_shift_ms:None
 | > frame_length_ms:None
 | > ref_level_db:0
 | > fft_size:1024
 | > power:1.1
 | > preemphasis:0.0
 | > griffin_lim_iters:60
 | > signal_norm:False
 | > symmetric_norm:True
 | > mel_fmin:50.0
 | > mel_fmax:7600.0
 | > pitch_fmin:1.0
 | > pitch_fmax:640.0
 | > spec_gain:1.0
 | > stft_pad_mode:reflect
 | > max_norm:1.0
 | > clip_norm:True
 | > do_trim_silence:True
 | > trim_db:60
 | > do_sound_norm:False
 | > do_amp_to_db_linear:True
 | > do_amp_to_db_mel:True
 | > do_rms_norm:False
 | > db_level:None
 | > stats_path:None
 | > base:10
 | > hop_length:256
 | > win_length:1024
 > Vocoder Model: multiband_melgan
 > Setting up Audio Processor...
 | > sample_rate:22050
 | > resam
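
If you would rather launch the same command from plain Python than from a `!` shell cell, one option is to build the argument list yourself. The sketch below uses only the flags shown above (`--text`, `--model_name`, `--out_path`). `build_tts_command` is a hypothetical helper, and the actual run is left commented out because it assumes the `tts` executable is on your `PATH`.

```python
import shlex
from typing import List


def build_tts_command(text: str, model_name: str, out_path: str) -> List[str]:
    """Build the argument list for the `tts` CLI call shown above."""
    return [
        "tts",
        "--text", text,
        "--model_name", model_name,
        "--out_path", out_path,
    ]


cmd = build_tts_command(
    "Hello from Python.",
    "tts_models/en/ljspeech/glow-tts",
    "output.wav",
)

# Print a shell-safe version of the command for inspection.
print(shlex.join(cmd))

# To actually run it (assumes `tts` is installed and on PATH):
# import subprocess
# subprocess.run(cmd, check=True)
```

Passing the arguments as a list avoids shell quoting problems when the text contains quotes or other special characters.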

## 📣 Listen to the synthesized wave 📣

In [11]:
import IPython
IPython.display.Audio("output.wav")
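
Before playing the file, it can be handy to confirm what was actually written. Here is a minimal sketch using Python's standard `wave` module, assuming `output.wav` is an uncompressed PCM WAV file. `wav_info` is a hypothetical helper name. The log above reports `sample_rate:22050`, so that is the rate you would expect to see.

```python
import os
import wave


def wav_info(path):
    """Return (sample_rate_hz, duration_seconds) for a PCM WAV file."""
    with wave.open(path, "rb") as wav_file:
        sample_rate = wav_file.getframerate()
        duration = wav_file.getnframes() / sample_rate
    return sample_rate, duration


# Only inspect the file if the synthesis step produced it.
if os.path.exists("output.wav"):
    rate, seconds = wav_info("output.wav")
    print(f"{rate} Hz, {seconds:.2f} s")
```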

### **Second things second**:

🔶 A TTS model can be trained on either a single speaker's voice or multiple speakers' voices. This training choice is directly reflected in what the model can do at inference time and in which speaker voices are available for synthesis.

🔶 If you want to run a multispeaker model from the released models list, first check the available speaker IDs with the `--list_speaker_idxs` flag, then pass one of them to synthesize speech in that voice.

In [12]:
# list the possible speaker IDs.
!tts --model_name "tts_models/en/vctk/vits" \
--list_speaker_idxs 


 > tts_models/en/vctk/vits is already downloaded.
Traceback (most recent call last):
  File "/Users/coldbrew/miniconda3/envs/fluent/bin/tts", line 8, in <module>
    sys.exit(main())
  File "/Users/coldbrew/miniconda3/envs/fluent/lib/python3.10/site-packages/TTS/bin/synthesize.py", line 377, in main
    model_path, config_path, model_item = manager.download_model(args.model_name)
  File "/Users/coldbrew/miniconda3/envs/fluent/lib/python3.10/site-packages/TTS/utils/manage.py", line 419, in download_model
    output_model_path, output_config_path = self._find_files(output_path)
  File "/Users/coldbrew/miniconda3/envs/fluent/lib/python3.10/site-packages/TTS/utils/manage.py", line 442, in _find_files
    raise ValueError(" [!] Model file not found in the output path")
ValueError:  [!] Model file not found in the output path
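
The traceback above shows 🐸TTS reporting the model as already downloaded and then failing to find the model file, which suggests the cached model folder is incomplete (for example, after an interrupted download). One way to check is to look at what is actually in that folder. The sketch below is an illustration under loud assumptions: the file names it looks for (`model_file.pth`, `model.pth`, `config.json`) and the helper name `missing_model_files` are guesses rather than confirmed 🐸TTS behavior, and you have to supply the cache folder path yourself.

```python
import os

# Assumed file names; adjust to what your 🐸TTS version actually expects.
ASSUMED_MODEL_FILES = ("model_file.pth", "model.pth")
ASSUMED_CONFIG_FILE = "config.json"


def missing_model_files(model_dir):
    """Return labels for the expected files that are absent from model_dir."""
    present = set(os.listdir(model_dir)) if os.path.isdir(model_dir) else set()
    missing = []
    if not any(name in present for name in ASSUMED_MODEL_FILES):
        missing.append("model checkpoint")
    if ASSUMED_CONFIG_FILE not in present:
        missing.append("config")
    return missing
```

If something turns out to be missing, deleting that folder so the model is downloaded again is a reasonable thing to try.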

## 💬 Synthesize speech using speaker ID 💬

In [13]:
!tts --text "Trying out specific speaker voice" \
--out_path spkr-out.wav --model_name "tts_models/en/vctk/vits" \
--speaker_idx "p341"

 > tts_models/en/vctk/vits is already downloaded.
Traceback (most recent call last):
  File "/Users/coldbrew/miniconda3/envs/fluent/bin/tts", line 8, in <module>
    sys.exit(main())
  File "/Users/coldbrew/miniconda3/envs/fluent/lib/python3.10/site-packages/TTS/bin/synthesize.py", line 377, in main
    model_path, config_path, model_item = manager.download_model(args.model_name)
  File "/Users/coldbrew/miniconda3/envs/fluent/lib/python3.10/site-packages/TTS/utils/manage.py", line 419, in download_model
    output_model_path, output_config_path = self._find_files(output_path)
  File "/Users/coldbrew/miniconda3/envs/fluent/lib/python3.10/site-packages/TTS/utils/manage.py", line 442, in _find_files
    raise ValueError(" [!] Model file not found in the output path")
ValueError:  [!] Model file not found in the output path

## 📣 Listen to the synthesized speaker-specific wave 📣

In [16]:
import IPython
IPython.display.Audio("spkr-out.wav")

ValueError: rate must be specified when data is a numpy array or list of audio samples.
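
The `ValueError` above is most likely a side effect of the failed synthesis: `spkr-out.wav` was never written, and `IPython.display.Audio` appears to treat a string that is not an existing file as raw audio data, which requires `rate`. A minimal guard, assuming that diagnosis is right, fails with a clearer message instead. `require_audio_file` is a hypothetical helper name.

```python
from pathlib import Path


def require_audio_file(path):
    """Return the path as a string if the file exists and is non-empty, else raise."""
    audio_path = Path(path)
    if not audio_path.is_file() or audio_path.stat().st_size == 0:
        raise FileNotFoundError(
            f"{audio_path} is missing or empty; re-run the synthesis cell first."
        )
    return str(audio_path)


# In a notebook cell, assuming IPython is available:
# from IPython.display import Audio
# Audio(require_audio_file("spkr-out.wav"))
```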

🔶 If you want to use an external speaker to synthesize speech, you need to supply the `--speaker_wav` flag along with an external speaker encoder path and config file, as follows:

First, we need to get the speaker encoder model, its config, and a reference `speaker_wav`:

In [None]:
!wget https://github.com/coqui-ai/TTS/releases/download/speaker_encoder_model/config_se.json
!wget https://github.com/coqui-ai/TTS/releases/download/speaker_encoder_model/model_se.pth.tar
!wget https://github.com/coqui-ai/TTS/raw/speaker_encoder_model/tests/data/ljspeech/wavs/LJ001-0001.wav

In [None]:
!tts --model_name tts_models/multilingual/multi-dataset/your_tts \
--encoder_path model_se.pth.tar \
--encoder_config config_se.json \
--speaker_wav LJ001-0001.wav \
--text "Are we not allowed to dim the lights so people can see that a bit better?" \
--out_path spkr-out.wav \
--language_idx "en"

## 📣 Listen to the synthesized speaker-specific wave 📣

In [None]:
import IPython
IPython.display.Audio("spkr-out.wav")

## 🎉 Congratulations! 🎉 You now know how to use a TTS model to synthesize speech!
Follow up with the next tutorials to learn more advanced material.