<a href="https://colab.research.google.com/github/Tomiinek/Multilingual_Text_to_Speech/blob/master/notebooks/code_switching_demo.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Multilingual Text-to-Speech Demo

This notebook demonstrates multilingual code-switching text-to-speech using:

- Tacotron-based spectrogram generation: https://github.com/Tomiinek/Multilingual_Text_to_Speech
- WaveRNN vocoder: https://github.com/Tomiinek/WaveRNN, forked from fatchord/WaveRNN


**Estimated time to complete**: 5 minutes



In [None]:
import sys
import os
import IPython
from IPython.display import Audio

## Clone repositories

In [None]:
os.chdir(os.path.expanduser("~"))
    
tacotron_dir = "Multilingual_Text_to_Speech"
if not os.path.exists(tacotron_dir):
  ! git clone https://github.com/Tomiinek/$tacotron_dir

wavernn_dir = "WaveRNN"
if not os.path.exists(wavernn_dir):
  ! git clone https://github.com/Tomiinek/$wavernn_dir

## Download pre-trained models

In [None]:
! mkdir -p checkpoints
os.chdir(os.path.join(os.path.expanduser("~"), "checkpoints"))

tacotron_chpt = "generated_switching.pyt"
if not os.path.exists(os.path.join(os.path.expanduser("~"), "checkpoints", tacotron_chpt)):
  ! curl -O -L "https://github.com/Tomiinek/Multilingual_Text_to_Speech/releases/download/v1.0/$tacotron_chpt" 

wavernn_chpt = "wavernn_weight.pyt"
if not os.path.exists(os.path.join(os.path.expanduser("~"), "checkpoints", wavernn_chpt)):
  ! curl -O -L "https://github.com/Tomiinek/Multilingual_Text_to_Speech/releases/download/v1.0/$wavernn_chpt"     

os.chdir(os.path.expanduser("~"))

## Install dependencies

In [None]:
! pip install -q -U soundfile
! pip install -q -U phonemizer
! pip install -q -U epitran

## Input texts to be synthesized

Inputs consist of **three parts delimited** by `|`:
  - **Input utterance** - Only basic normalization is applied to input utterances, so **avoid unusual characters and punctuation**. The examples below are formatted properly.
  - **Speaker ID** - More speaker IDs are available, but **you should use only one of** `00-fr`, `00-de`, `00-nl`, `09-ru`, and `00-zh` because the WaveRNN vocoder was trained only on these voices.
  - **Per-character language** specification - Provide a **comma-delimited list of language codes** (each one of `de`, `fr`, `nl`, `ru`, `zh`), **each followed by the number of characters it covers**, e.g., `l1-n1,l2-n2,l3` means that language `l1` covers the first `n1` characters, language `l2` covers the next `n2` characters, and language `l3` covers all remaining characters to the end. **You can blend several languages** to control the accent by replacing a language code (such as `l1`) with `l1*w1:l2*w2`, which gives language `l1` the weight `w1` and `l2` the weight `w2`. For example, `de*0.75:fr*0.25` combines German and French with more emphasis on German.
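
Counting characters by hand is error-prone, so the following is a minimal sketch of a helper that assembles an input line from text segments. The name `build_input` is hypothetical and not part of the repository, and the sketch assumes that characters are counted as Python code points (`len`), which is consistent with the counts used in the examples below.

```python
def build_input(segments, speaker):
    """Assemble 'utterance|speaker|language-spec' from (text, language) pairs.

    `segments` lists the pieces of the utterance in reading order. Every piece
    except the last gets an explicit character count; the last language code
    covers all remaining characters.
    """
    utterance = "".join(text for text, _ in segments)
    spec = []
    for i, (text, language) in enumerate(segments):
        is_last = i == len(segments) - 1
        spec.append(language if is_last else f"{language}-{len(text)}")
    return f"{utterance}|{speaker}|{','.join(spec)}"


line = build_input(
    [("De naam van ", "nl"), ("De Gaulle", "fr"), (" leeft voort.", "nl")],
    speaker="00-nl",
)
print(line)  # De naam van De Gaulle leeft voort.|00-nl|nl-12,fr-9,nl
```

A language entry may also be a blend such as `de*0.75:fr*0.25`, since the helper passes the language string through unchanged.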

Feel free to modify the examples below.






**Run this to demonstrate code switching:**

In [None]:
inputs = [
    "Cette requête s'explique par les relations peu conventionnelles que Schrödinger entretient avec les femmes:|00-fr|fr-68,de-11,fr",
    "Ces quartiers, parmi lesquels figurent De Pijp, le Kinkerbuurt et le Dapperbuurt, sont principalement financés par des banquiers et|00-fr|fr-39,nl-7,fr-5,nl-11,fr-7,nl-11,fr",
    "Les romans de Фёдор Михайлович Достоевский sont parfois qualifiés de métaphysiques,|00-fr|fr-14,ru-28,fr",
    "Le yǒngdìnghé est une rivière du nord de la Chine. Elle est l'un des tributaires du fleuve hǎihé.|00-fr|fr-3,zh-10,fr-78,zh-5,fr",
    "François Hollande ist ein französischer Politiker der Sozialistischen Partei und war Staatspräsident der Französischen Republik.|00-de|fr-17,de",
    "Sie liegt zwischen dem Ijsselmeer, der Ijssel und den Hügeln der Veluwe.|00-de|de-23,nl-10,de-6,nl-6,de-20,nl",
    "Ключевская сопка erreicht ihre außerordentliche Höhe,|00-de|ru-16,de",
    "Der tiānān ménguǎngcháng ist ein Platz im Zentrum von Peking, der Hauptstadt der Volksrepublik China.|00-de|de-4,zh-20,de",
    "Als men langs deze laan loopt van de Brandenburger Tor tot aan de Alexanderplatz over de Schloßbrücke vanaf welke|00-nl|nl-37,de-17,nl-12,de-14,nl-9,de-12,nl",
    "De naam van De Gaulle leeft voort in het grootste vliegveld van Frankrijk, Aéroport Charles De Gaulle.|00-nl|nl-12,fr-9,nl-54,fr-26,nl",
    "Nog steeds wordt Александр Сергеевич Пушкин in de Russische wereld en daarbuiten vereerd en gelezen.|00-nl|nl-17,ru-26,nl",
    "De chángjiāng stroomt vervolgens door zhòngqìng, de grootste stad van sìchuān.|00-nl|nl-3,zh-10,nl-25,zh-9,nl-23,zh-7,nl",
    "При нём трудами Pöppelmanna и других придворных мастеров центр Dresdenа приобрёл знакомый облик в стиле барокко.|09-ru|ru-16,de-10,ru-37,de-7,ru",
    "Как считают современные археологи, на месте Notre-Dame de Paris находились четыре различных храма:|09-ru|ru-44,fr-19,ru",
    "Johannes Vermeer van Delft был известным экспертом по вопросам искусства.|09-ru|nl-26,ru",
    "На протяжении истории běijīng был известен в Китае под разными именами.|09-ru|ru-22,zh-7,ru",
    "tā de fùqīn Rudolf Schrödinger è shì shēngchǎn yóubù hé fángshǔibù de gōngchǎng zhǔ tóngshí yě shì yīmíng yuányìjiā。|00-zh|zh-12,de-18,zh",
    "tāmen tóupiào juédìng jiànzào Sacré-Coeur， érqiě dìngyìtā wèi dùi bālígōngshè shèyuán suǒ fànxià de zùixíng de bǔcháng|00-zh|zh-30,fr-11,zh",
    "Vincent van Gogh de wǔwèi bóbó shūshū men， yǒu sānwèi shì xiāngdāng chénggōng de yìshùpǐn jiāoyìshāng。|00-zh|nl-16,zh",
    "yóuyú Александр shì dùi Кутузов huáiyǒu ègǎn， tā zài jūndùi de lǐngdǎozhíwù bèi zàicì chèxiāo。|00-zh|zh-6,ru-9,zh-9,ru-7,zh"
]

**Run this to demonstrate smooth pronunciation control:**

In [None]:
inputs = [
    "Jean-Paul Marat fait deux voyages en Angleterre au temps de la Révolution.|00-fr|fr",
    "Jean-Paul Marat fait deux voyages en Angleterre au temps de la Révolution.|00-fr|de*0.1:fr*0.9",
    "Jean-Paul Marat fait deux voyages en Angleterre au temps de la Révolution.|00-fr|de*0.2:fr*0.8",
    "Jean-Paul Marat fait deux voyages en Angleterre au temps de la Révolution.|00-fr|de*0.3:fr*0.7",
    "Jean-Paul Marat fait deux voyages en Angleterre au temps de la Révolution.|00-fr|de*0.4:fr*0.6",
    "Jean-Paul Marat fait deux voyages en Angleterre au temps de la Révolution.|00-fr|de*0.5:fr*0.5",
    "Jean-Paul Marat fait deux voyages en Angleterre au temps de la Révolution.|00-fr|de*0.6:fr*0.4",
    "Jean-Paul Marat fait deux voyages en Angleterre au temps de la Révolution.|00-fr|de*0.7:fr*0.3",
    "Jean-Paul Marat fait deux voyages en Angleterre au temps de la Révolution.|00-fr|de*0.8:fr*0.2",
    "Jean-Paul Marat fait deux voyages en Angleterre au temps de la Révolution.|00-fr|de*0.9:fr*0.1",
    "Jean-Paul Marat fait deux voyages en Angleterre au temps de la Révolution.|00-fr|de",
]
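
The list above can also be generated instead of typed out. This is a sketch under the assumption that the weights are evenly spaced between the two languages; `blend_specs` and `blended_inputs` are hypothetical names introduced only for illustration.

```python
def blend_specs(source, target, steps=10):
    """Return language specs that move gradually from `source` to `target`."""
    specs = []
    for k in range(steps + 1):
        weight = k / steps
        if k == 0:
            specs.append(source)          # pure source language
        elif k == steps:
            specs.append(target)          # pure target language
        else:
            # Round to hide floating-point noise such as 0.30000000000000004.
            specs.append(
                f"{target}*{round(weight, 2):g}:{source}*{round(1 - weight, 2):g}"
            )
    return specs


sentence = "Jean-Paul Marat fait deux voyages en Angleterre au temps de la Révolution."
blended_inputs = [f"{sentence}|00-fr|{spec}" for spec in blend_specs("fr", "de")]
```

Assigning `blended_inputs` to `inputs` would then feed the same eleven utterances to the synthesis cells.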

**Run this to demonstrate voice cloning:**

In [None]:
inputs = [
    "Der Distrikt liegt in den Kafueauen und ist von Landwirtschaft geprägt.|00-fr|de",
    "Der Distrikt liegt in den Kafueauen und ist von Landwirtschaft geprägt.|00-de|de",
    "Le texte complet de l'initiative peut être consulté sur le site de la Chancellerie fédérale.|00-fr|fr",
    "Le texte complet de l'initiative peut être consulté sur le site de la Chancellerie fédérale.|00-de|fr",
    "Dit wordt de start van Van Oostzanens carrière als zelfstandig kunstschilder.|00-fr|nl",
    "Dit wordt de start van Van Oostzanens carrière als zelfstandig kunstschilder.|00-de|nl",
    "Название штата произошло благодаря серии картографических ошибок и неточностей.|00-fr|ru",
    "Название штата произошло благодаря серии картографических ошибок и неточностей.|00-de|ru",
    "jìsuànjī dàxué zhǔyào xuékē shì kēxué hé jìzhúbù， xuéshēng kěyǐ huòqǔ jìsuànjīkēxué hé jìzhú de běnkē xuéwèi|00-fr|zh",
    "jìsuànjī dàxué zhǔyào xuékē shì kēxué hé jìzhúbù， xuéshēng kěyǐ huòqǔ jìsuànjīkēxué hé jìzhú de běnkē xuéwèi|00-de|zh"
]
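
Before running synthesis, it can be worth sanity-checking hand-edited inputs. The sketch below is not part of the repository: `check_input` is a hypothetical helper, and the rules it enforces are taken only from the description of the input format above, so the model's own parser may be stricter or more lenient.

```python
ALLOWED_SPEAKERS = {"00-fr", "00-de", "00-nl", "09-ru", "00-zh"}
ALLOWED_LANGUAGES = {"de", "fr", "nl", "ru", "zh"}


def check_input(line):
    """Return a list of problems found in one input line (empty if none)."""
    parts = line.split("|")
    if len(parts) != 3:
        return [f"expected 3 parts delimited by '|', found {len(parts)}"]
    text, speaker, spec = parts
    problems = []
    if speaker not in ALLOWED_SPEAKERS:
        problems.append(f"speaker {speaker!r} is not one of the recommended IDs")

    segments = spec.split(",")
    covered = 0  # characters claimed by segments with an explicit count
    for i, segment in enumerate(segments):
        languages, dash, count = segment.partition("-")
        for item in languages.split(":"):
            code, star, weight = item.partition("*")
            if code not in ALLOWED_LANGUAGES:
                problems.append(f"unknown language code {code!r}")
            if star:
                try:
                    float(weight)
                except ValueError:
                    problems.append(f"weight {weight!r} is not a number")
        if dash:
            if count.isdigit():
                covered += int(count)
            else:
                problems.append(f"character count {count!r} is not an integer")
        elif i < len(segments) - 1:
            problems.append(f"segment {segment!r} needs a character count")
    if covered > len(text):
        problems.append(f"counts cover {covered} characters, text has {len(text)}")
    return problems
```

In this notebook it could be applied as `[p for line in inputs for p in check_input(line)]`, which should be empty for well-formed inputs.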

## Synthesis

### Spectrogram generation

In [None]:
os.chdir(os.path.join(os.path.expanduser("~"), tacotron_dir))
# Drop any cached `utils` module so the import below resolves against this directory.
if "utils" in sys.modules: del sys.modules["utils"]

from synthesize import synthesize
from utils import build_model

model = build_model(os.path.join(os.path.expanduser("~"), "checkpoints", tacotron_chpt))
model.eval()

spectrograms = [synthesize(model, "|" + i) for i in inputs]

### Waveform generation

In [None]:
os.chdir(os.path.join(os.path.expanduser("~"), wavernn_dir))
# Drop the `utils` module cached by the previous cell before importing from this repository.
if "utils" in sys.modules: del sys.modules["utils"]

from wavernn.models.fatchord_version import WaveRNN
from wavernn.utils import hparams as hp
from scripts.gen_wavernn import generate
import torch

hp.configure('hparams.py')
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model = WaveRNN(
    rnn_dims=hp.voc_rnn_dims, fc_dims=hp.voc_fc_dims, bits=hp.bits, pad=hp.voc_pad,
    upsample_factors=hp.voc_upsample_factors, feat_dims=hp.num_mels,
    compute_dims=hp.voc_compute_dims, res_out_dims=hp.voc_res_out_dims,
    res_blocks=hp.voc_res_blocks, hop_length=hp.hop_length,
    sample_rate=hp.sample_rate, mode=hp.voc_mode,
).to(device)
model.load(os.path.join(os.path.expanduser("~"), "checkpoints", wavernn_chpt))

waveforms = [generate(model, s, hp.voc_gen_batched, hp.voc_target, hp.voc_overlap) for s in spectrograms]

## Resulting audio



In [None]:
for text, w in zip(inputs, waveforms):
  print(text)
  IPython.display.display(Audio(data=w, rate=hp.sample_rate))
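
To keep the results after the session ends, the waveforms could also be written to disk. The following sketch uses only the Python standard library and rests on an assumption that is not confirmed by the notebook: that each waveform is a one-dimensional sequence of floats in the range [-1, 1]. The name `save_wav` is hypothetical.

```python
import struct
import wave


def save_wav(path, samples, sample_rate):
    """Write mono float samples (assumed to lie in [-1, 1]) as 16-bit PCM."""
    pcm = bytearray()
    for sample in samples:
        clipped = max(-1.0, min(1.0, float(sample)))  # guard against overshoot
        pcm += struct.pack("<h", int(round(clipped * 32767)))
    with wave.open(path, "wb") as wav_file:
        wav_file.setnchannels(1)      # mono
        wav_file.setsampwidth(2)      # 2 bytes = 16 bits per sample
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(bytes(pcm))


# Usage in this notebook (assumes `waveforms` and `hp` from the cells above):
# for idx, w in enumerate(waveforms):
#     save_wav(f"output_{idx:02d}.wav", w, hp.sample_rate)
```

The `soundfile` package installed earlier offers a shorter route for NumPy arrays; the standard-library version is shown here only because it has no extra dependencies.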