# Portuguese Text to Speech With Professional Quality

This notebook runs TTS experiments with Tacotron2, FastSpeech2 and Multiband-MelGAN models trained on the "g Neutral Speech Male" dataset. The code is adapted from the https://github.com/TensorSpeech/TensorFlowTTS and https://github.com/kan-bayashi/ParallelWaveGAN repositories to work with our data.

An internet connection is needed to run this code directly on Kaggle. To start, open the notebook settings (on the right side of the screen), verify your phone number and then enable internet access for this notebook.


You can download the code and pretrained models from https://www.kaggle.com/datasets/pedrohlopes/tensorflowttscustomkaggle.

For licensing information, citation instructions and more, please visit https://www.smt.ufrj.br/gpa/propor2022 and https://www.kaggle.com/datasets/mediatechlab/gneutralspeech. The voice presented here is licensed; by running this notebook you accept the [terms of use](https://www.smt.ufrj.br/~gpa/terms_of_use.pdf).

# Initial Setup

Install the libraries and set up the paths. (This may take a while.)

In [None]:
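import IPython.display as ipd
import sys
import os
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '3'  # silence TensorFlow C++ log messages
# Copy ParallelWaveGAN into the current directory (/kaggle/working, per the paths below)
!cp -r /kaggle/input/tensorflowttscustomkaggle/tacotron2-kaggle/ParallelWaveGAN/ ./
# Install the requirements file shipped with the dataset, then pyopenjtalk
!pip3 install -r /kaggle/input/tensorflowttscustomkaggle/tacotron2-kaggle/requirements-docker.txt --ignore-installed --quiet --no-deps --user --no-warn-script-location
!pip3 install pyopenjtalk --quiet --user --no-warn-script-location
# Make the TTS code and ParallelWaveGAN importable
sys.path.append('/kaggle/input/tensorflowttscustomkaggle/tacotron2-kaggle/')
sys.path.append('/kaggle/working/ParallelWaveGAN/')
# The relative config paths used in later cells are resolved from this directory
%cd /kaggle/input/tensorflowttscustomkaggle/tacotron2-kaggle/
# Install the copied ParallelWaveGAN in editable mode
!pip3 install -e /kaggle/working/ParallelWaveGAN --quiet --user --no-warn-script-location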
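
If the setup cell fails, one common cause is that the dataset is not attached to the notebook. The sketch below is an optional pre-flight check, not part of the original code: it only tests whether the two input paths used by the setup cell exist. The helper name `find_missing` is hypothetical.

```python
from pathlib import Path

# Paths copied from the setup cell; adjust them if the dataset is mounted elsewhere.
KAGGLE_ROOT = Path("/kaggle/input/tensorflowttscustomkaggle/tacotron2-kaggle")
REQUIRED = [
    KAGGLE_ROOT / "ParallelWaveGAN",
    KAGGLE_ROOT / "requirements-docker.txt",
]


def find_missing(paths):
    """Return the subset of `paths` that do not exist on disk (hypothetical helper)."""
    return [Path(p) for p in paths if not Path(p).exists()]


missing = find_missing(REQUIRED)
if missing:
    print("Missing inputs (is the dataset attached to the notebook?):")
    for path in missing:
        print("  ", path)
else:
    print("All expected inputs found.")
```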

# Tacotron2 + Multiband-MelGAN inference

In [None]:
from TTS_controller import TTS_controller
current_model = 'gneutralspeech_upsampling'
tts_controller = TTS_controller(f'configs/{current_model}_config.json') # create TTS instance

In [4]:
raw_text = input('Input text:')
speech_settings = {}  # empty dict: no extra settings are passed
# inference applies a custom dictionary to handle common mistakes, acronyms and foreign words
audio = tts_controller.inference(raw_text, speech_settings)
ipd.Audio(audio, rate=44100)  # play the result at 44.1 kHz

Input text: Digite qualquer coisa aqui em português que vou falar direitinho.


Texto para falar: Digite qualquer coisa aqui em português que vou falar direitinho.
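
To keep the synthesized speech after the session ends, the waveform can be written to a WAV file. The following is a minimal sketch using only the Python standard library. It assumes that `audio` is a one-dimensional sequence of float samples roughly in the range [-1.0, 1.0] at 44100 Hz; the notebook does not state the exact type returned by `tts_controller.inference`, so check this before relying on it. The helper `save_wav` is hypothetical, and a synthetic tone stands in for the model output so the sketch runs on its own.

```python
import math
import os
import struct
import tempfile
import wave


def save_wav(path, audio, rate=44100):
    """Write a mono waveform to a 16-bit PCM WAV file (hypothetical helper).

    Assumes `audio` is a 1-D iterable of floats roughly in [-1.0, 1.0].
    Returns the number of samples written.
    """
    # Clip to [-1, 1] and scale to the signed 16-bit range.
    pcm = [int(max(-1.0, min(1.0, float(s))) * 32767) for s in audio]
    with wave.open(str(path), "wb") as wav_file:
        wav_file.setnchannels(1)   # mono
        wav_file.setsampwidth(2)   # 2 bytes = 16 bits per sample
        wav_file.setframerate(rate)
        wav_file.writeframes(struct.pack(f"<{len(pcm)}h", *pcm))
    return len(pcm)


# Stand-in signal so the sketch runs without the model: 0.5 s of a 440 Hz tone.
RATE = 44100
demo = [0.3 * math.sin(2 * math.pi * 440.0 * n / RATE) for n in range(RATE // 2)]
demo_path = os.path.join(tempfile.gettempdir(), "tts_demo.wav")
n_written = save_wav(demo_path, demo, rate=RATE)
```

In the notebook, something like `save_wav('/kaggle/working/output.wav', audio)` should work if the assumption about `audio` holds. The working directory itself is under `/kaggle/input/` after the setup cell, so an absolute output path is the safer choice.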


# FastSpeech2 + Multiband-MelGAN inference

In [5]:
current_model = 'gneutralspeech_fs_up'
tts_controller.update_model(f'configs/{current_model}_config.json') # switch the TTS model (may take a little while)

In [9]:
# optional FastSpeech2 controls
speech_settings['speed'] = 1.0
speech_settings['pitch'] = 1.0
speech_settings['energy'] = 1.0
raw_text = input('Input text:')
audio = tts_controller.inference(raw_text, speech_settings)
ipd.Audio(audio, rate=44100)

Input text: Digite qualquer coisa aqui em português que vou falar direitinho.


Texto para falar: Digite qualquer coisa aqui em português que vou falar direitinho.
Speed_ratios: 1.0, f0_ratios: 1.0, Energy ratios: 1.0
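
When experimenting with the FastSpeech2 controls, it can help to build the settings dictionary in one place and reject obviously invalid values. The sketch below uses the three keys from the cell above (`speed`, `pitch`, `energy`). Treating them as positive multiplicative ratios, with 1.0 meaning "unchanged", is an assumption based on the printed `Speed_ratios`, `f0_ratios` and `Energy ratios` line; the helper `make_speech_settings` is hypothetical and not part of `TTS_controller`.

```python
def make_speech_settings(speed=1.0, pitch=1.0, energy=1.0):
    """Build a settings dict with the keys used in the FastSpeech2 cell.

    Assumption: each value is a positive ratio and 1.0 leaves the output unchanged.
    """
    settings = {"speed": speed, "pitch": pitch, "energy": energy}
    for name, value in settings.items():
        is_number = isinstance(value, (int, float)) and not isinstance(value, bool)
        if not is_number or value <= 0:
            raise ValueError(f"{name} must be a positive number, got {value!r}")
    return {name: float(value) for name, value in settings.items()}


# Example: change only the pitch ratio and keep the other two at 1.0.
speech_settings = make_speech_settings(pitch=1.1)
```

The resulting dictionary can be passed as the second argument of `tts_controller.inference`. Whether a larger `speed` value makes the speech faster or slower is not stated in this notebook, so it is worth checking by ear with one short sentence.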
