# Afrikaans Text-to-Speech Demo

A demonstration notebook for two Afrikaans TTS systems, NaiveTTS and G2PxTTS, from https://github.com/JulianHerreilers/afrTTS.

## References

Univoc: https://github.com/bshall/UniversalVocoding

Tacotron: https://github.com/bshall/Tacotron/


Install the necessary packages:

In [None]:
!pip install -q omegaconf
!pip install -q librosa==0.8.0
!pip install -q univoc
!pip install -q tacotron
!pip install -q torch

# Also imported below; some further installs may still be missing
!pip install -q soundfile matplotlib tqdm

In [None]:
import torch
import soundfile as sf
from univoc import Vocoder
from tacotron import text_to_id, Tacotron
import matplotlib.pyplot as plt
from IPython.display import Audio
from tqdm import tqdm


def load_afrdict(file_name):
    """Load an Afrikaans pronouncing dictionary as a {word: phoneme string} dict."""
    afrdict = {}
    with open(file_name, "r") as dict_file:
        for line in dict_file:
            fields = line.strip().split()
            if not fields:  # skip blank lines, which would otherwise raise IndexError
                continue
            # The first field is the word; the remaining fields are its phonemes.
            afrdict[fields[0]] = " ".join(fields[1:])
    return afrdict
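
The loader above expects one entry per line: a word followed by its whitespace-separated phonemes. The following is a minimal sketch of that format using an in-memory file. The helper name `parse_pronunciation_lines` and the sample entries are hypothetical, and the phoneme symbols are placeholders rather than entries from the real dictionaries.

```python
import io


def parse_pronunciation_lines(lines):
    """Map each headword to its space-joined phoneme string."""
    pronunciations = {}
    for line in lines:
        fields = line.strip().split()
        if not fields:  # ignore blank lines
            continue
        word, phonemes = fields[0], fields[1:]
        pronunciations[word] = " ".join(phonemes)
    return pronunciations


# Hypothetical entries; the phoneme symbols are placeholders.
sample_file = io.StringIO("boom b u@ m\n\nkat k a t\n")
sample_dict = parse_pronunciation_lines(sample_file)
print(sample_dict)
```

Each value is a single string, so a lookup such as `sample_dict["boom"]` returns the phonemes joined by spaces.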

G2P Imports

In [None]:
import torch
import torch.nn as nn
from torch import optim
import torch.nn.functional as F
from torch.utils import data
from torch.nn.utils.rnn import pack_padded_sequence
from g2pmodel import g2p_model_init as g2p_init
from demo_utils import process_text_input

In [None]:
afrdict_rcrl = load_afrdict("rcrl_apd.1.4.1.txt")
afrdict_afr_za = load_afrdict("afr_za_dict.txt")
afrdict_afr_za["boom"]

In [None]:
model_name = "G2P/best_models/G2p-e256h256n2d0.1.pt"
g2p_model = g2p_init(model_name)
g2p_model.load_state_dict(torch.load(model_name))
g2p_model.to("cuda:0")

In [None]:
text = "142 drie"
process_text_input(g2p_model, afrdict_afr_za, text)

Download the pretrained vocoder weights and move the model to the GPU:

In [None]:
vocoder = Vocoder.from_pretrained(
    "https://github.com/bshall/UniversalVocoding/releases/download/v0.2/univoc-ljspeech-7mtpaq.pt"
).cuda()

Download the pretrained Tacotron weights for NaiveTTS and G2PxTTS:

In [None]:
tacotron_naive = Tacotron.from_pretrained(
    "https://github.com/JulianHerreilers/pantoffel_tacotron_models_storage/releases/download/v0.190k-210k-230k-beta/model-230000.pt"
).cuda()

tacotron_G2P = Tacotron.from_pretrained(
    "https://github.com/JulianHerreilers/pantoffel_tacotron_models_storage/releases/download/v1.120epoch/model-300000.pt"
).cuda()

The pronunciation dictionaries were loaded above: `afrdict_rcrl` is used with NaiveTTS and `afrdict_afr_za` with G2PxTTS.

The text to be synthesized:

In [None]:
# Input for G2PxTTS, which generates pronunciations for words missing from the dictionary
text = "league of legends is great."
# Input for NaiveTTS, which has to drop words missing from the dictionary
texta = "wys jou resultate in die tabel."
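
The two systems differ in how they treat words that are missing from the pronunciation dictionary. Below is a rough sketch of both strategies in plain Python. It is not the code inside `process_text_input` or `text_to_id`; every helper name, the toy dictionary, and the letter-by-letter `spell_out` stand-in for the G2P model are assumptions made for illustration.

```python
PUNCTUATION = ".,!?;:"


def split_words(text):
    """Lowercase the text and strip surrounding punctuation from each word."""
    stripped = (w.strip(PUNCTUATION) for w in text.lower().split())
    return [w for w in stripped if w]


def drop_unknown_words(text, pron_dict):
    """NaiveTTS-style handling: keep only words that have a dictionary entry."""
    return [w for w in split_words(text) if w in pron_dict]


def fill_unknown_words(text, pron_dict, predict_pronunciation):
    """G2PxTTS-style handling: predict a pronunciation for each missing word."""
    filled = dict(pron_dict)  # copy so the original dictionary is untouched
    for w in split_words(text):
        if w not in filled:
            filled[w] = predict_pronunciation(w)
    return filled


def spell_out(word):
    """Hypothetical stand-in for a trained G2P model: one symbol per letter."""
    return " ".join(word)


# Toy dictionary with placeholder phoneme symbols.
toy_dict = {"wys": "v @i s", "jou": "j @u"}
kept = drop_unknown_words("wys jou tabel.", toy_dict)
filled = fill_unknown_words("wys jou tabel.", toy_dict, spell_out)
print(kept)
print(filled["tabel"])
```

With the first strategy the unknown word disappears from the synthesized sentence; with the second it is kept, at the cost of relying on a predicted pronunciation.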

Synthesize the audio!

In [None]:
# NaiveTTS: convert the text to symbol IDs using the RCRL dictionary
x = torch.LongTensor(text_to_id(texta, afrdict_rcrl)).unsqueeze(0).cuda()
with torch.no_grad():
    mel_spec, _ = tacotron_naive.generate(x)
    wave, sr = vocoder.generate(mel_spec.transpose(1, 2))
Audio(wave, rate=sr)

In [None]:
plt.plot(wave)

In [None]:
# G2PxTTS: process the input with the G2P model first, then synthesize
processed_text = process_text_input(g2p_model, afrdict_afr_za, text)
x = torch.LongTensor(text_to_id(processed_text, afrdict_afr_za)).unsqueeze(0).cuda()
with torch.no_grad():
    mel_spec, _ = tacotron_G2P.generate(x)
    wave, sr = vocoder.generate(mel_spec.transpose(1, 2))
Audio(wave, rate=sr)
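
To keep the synthesized audio after the notebook session ends, the samples can be written to a WAV file. The sketch below uses only Python's standard library so that it runs without extra dependencies, and it assumes the samples are floats in the range [-1, 1]; the `save_wav` helper, the file name, and the test tone are hypothetical. Inside this notebook, `sf.write("output.wav", wave, sr)` with the already imported `soundfile` would be the more direct route.

```python
import math
import struct
import wave as wave_module  # aliased: the notebook uses `wave` for the audio samples


def save_wav(path, samples, sample_rate):
    """Write float samples in [-1, 1] to a 16-bit mono PCM WAV file."""
    frames = bytearray()
    for sample in samples:
        clipped = max(-1.0, min(1.0, float(sample)))  # guard against overflow
        frames += struct.pack("<h", int(round(clipped * 32767)))
    with wave_module.open(path, "wb") as wav_file:
        wav_file.setnchannels(1)   # mono
        wav_file.setsampwidth(2)   # 2 bytes per sample = 16-bit
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(bytes(frames))


# One second of a 440 Hz tone stands in for the vocoder output.
demo_rate = 16000
demo_samples = [
    0.5 * math.sin(2 * math.pi * 440 * n / demo_rate) for n in range(demo_rate)
]
save_wav("demo_tone.wav", demo_samples, demo_rate)
```

In the notebook the call would be `save_wav("output.wav", wave, sr)`, using the sample rate returned by the vocoder.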