# Text to speech
A text-to-speech (TTS) system transforms written text into a waveform of the corresponding spoken text.

Several commercial products exist for this task, e.g. ElevenLabs, https://beta.elevenlabs.io/ (last visited in June 2023).

One of the simplest ways to use text to speech in Python is the module pyttsx3:

https://pypi.org/project/pyttsx3/

This module is a wrapper for the text-to-speech system built into the operating system. At least on Windows, the results are satisfactory.

In [1]:
import pyttsx3
import time
import os
os.chdir('../Python')
import WaveInterface

Filename = 'output.wav'

# initialize the speech synthesizer
engine = pyttsx3.init()

rate = engine.getProperty('rate')   # getting details of current speaking rate
engine.setProperty('rate', 125)     # setting up new voice rate

# https://ichi.pro/de/eine-einfuhrung-in-pyttsx3-ein-text-zu-sprache-konverter-fur-python-81905511310787
voices = engine.getProperty('voices')       # list of the voices installed on this system
for voice in voices:
    print("Voice: %s" % voice.name)
    print(" - ID: %s" % voice.id)
    print(" - Languages: %s" % voice.languages)
    print(" - Gender: %s" % voice.gender)
    print(" - Age: %s" % voice.age)
    print("\n")

engine.setProperty('voice', voices[0].id)  # the index selects the voice; on this machine, 0 is the German female voice
#engine.setProperty('voice', voices[1].id)   # on this machine, 1 is the English female voice

# select the language
engine.setProperty('voice', 'german')

# select the text
sentence = "Lassen Sie die Ente zu Wasser."

Voice: Microsoft Hedda Desktop - German
 - ID: HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Speech\Voices\Tokens\TTS_MS_DE-DE_HEDDA_11.0
 - Languages: []
 - Gender: None
 - Age: None


Voice: Microsoft Zira Desktop - English (United States)
 - ID: HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Speech\Voices\Tokens\TTS_MS_EN-US_ZIRA_11.0
 - Languages: []
 - Gender: None
 - Age: None




In [2]:
StartTime = time.time()
# use the synthesizer to say the sentence
#engine.say(sentence)
# use the synthesizer to save the wave form to a file
engine.save_to_file(sentence, Filename)

# wait until everything is finished
engine.runAndWait()

StopTime = time.time()

## Latency and real-time capability
To assess the real-time capability of the algorithm on the given hardware, the duration of the output waveform is compared with the time needed to generate it.

If the output waveform is longer than the time needed to generate it, the algorithm is real-time capable.

In [3]:
x, Fs, bits = WaveInterface.ReadWave(Filename)

print(str(x.shape[0] / Fs), ' seconds of audio generated in ', str(StopTime - StartTime), ' seconds')

3.075691609977324  seconds of audio generated in  0.2465212345123291  seconds
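
The comparison above can be condensed into a real-time factor, i.e. the processing time divided by the duration of the generated audio; values below 1 indicate real-time capability. The following is a minimal sketch that reads the duration with Python's standard `wave` module instead of the course module `WaveInterface`, assuming the file is an uncompressed PCM wave file. The function names `wave_duration_seconds` and `real_time_factor` are hypothetical and not part of the original notebook.

```python
import wave


def wave_duration_seconds(filename):
    """Return the duration of a PCM wave file in seconds."""
    with wave.open(filename, 'rb') as wave_file:
        return wave_file.getnframes() / wave_file.getframerate()


def real_time_factor(processing_seconds, audio_seconds):
    """Processing time per second of generated audio (< 1 means real-time capable)."""
    return processing_seconds / audio_seconds


# values taken from the measurement printed above
rtf = real_time_factor(0.2465212345123291, 3.075691609977324)
print('real-time factor:', round(rtf, 3), '| real-time capable:', rtf < 1.0)
```

In the notebook, `wave_duration_seconds(Filename)` could take the place of `x.shape[0] / Fs`, and `StopTime - StartTime` would be the processing time.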


## Programming exercise

Implement the following procedures, which use TTS and dataset augmentation to generate a second set of data in the folder TTSCommands:

GetCommandList() uses TrainingsDataInterface.CTrainingsDataInterface to generate a list of all possible commands in the folder ../Python/Commands.

CommandToFilename(command) gets a single command as input, e.g. 'Stopp'. It should return a string consisting of the following parts: Foldername/command/command_UUID.wav. Foldername corresponds to TTSCommands, command corresponds to the input argument of the procedure, and UUID is a unique hexadecimal string. Search the web for how to create a UUID in Python.

CreateWaveFile(command) uses the engine of pyttsx3 to create a wave file of the command and stores it under the filename created by CommandToFilename(command). The filename should be returned.

DoResampling(x, OriginalSamplingRate) takes input samples $x$ at a given original sampling rate and uses the package librosa to resample them to TargetSamplingRate$=48000$ Hz. The resampled vector is the output.

ApplyCorrectSamplingRate(Filename) loads the wave file located at Filename, applies the procedure DoResampling to the samples and overwrites the file with the resampled data.
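
To make the effect of the resampling step concrete: the duration of the signal stays the same, so the number of samples scales with TargetSamplingRate / OriginalSamplingRate. The sketch below illustrates this with plain linear interpolation in pure Python. This is a deliberately simplified stand-in and not the method used by librosa, which performs band-limited resampling; the function name `resample_linear` is hypothetical.

```python
def resample_linear(x, original_sampling_rate, target_sampling_rate=48000):
    """Resample a list of samples by linear interpolation (illustration only)."""
    if len(x) < 2 or original_sampling_rate == target_sampling_rate:
        return list(x)
    number_of_output_samples = round(len(x) * target_sampling_rate / original_sampling_rate)
    y = []
    for m in range(number_of_output_samples):
        # position of output sample m on the input time axis, clamped to the last input sample
        position = min(m * original_sampling_rate / target_sampling_rate, len(x) - 1)
        left = min(int(position), len(x) - 2)
        fraction = position - left
        y.append((1.0 - fraction) * x[left] + fraction * x[left + 1])
    return y


# one second of a ramp signal at 16 kHz stays one second long at 48 kHz
x = [n / 16000 for n in range(16000)]
y = resample_linear(x, 16000)
print(len(x), 'input samples ->', len(y), 'output samples')
```

Linear interpolation does not suppress aliasing or imaging, which is one reason to prefer a dedicated resampling routine such as the one requested in the exercise.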

In [7]:
Foldername = 'TTSCommands'
TargetSamplingRate = 48000

### solution begins
import TrainingsDataInterface
import uuid
import os
import DatasetAugmentation
import WaveInterface
import librosa
import numpy as np  # needed by DoResampling
### solution ends

def GetCommandList():
    CommandList = []
    ### solution begins
    ATrainingsDataInterface = TrainingsDataInterface.CTrainingsDataInterface()
    for CommandIndex in range(ATrainingsDataInterface.GetNumberOfCommands()):
        command = ATrainingsDataInterface.GetCommandString(CommandIndex)
        CommandList.append(command)
    ### solution ends
    return CommandList

def CommandToFilename(command):
    ### solution begins
    Filename = Foldername + '/' + command + '/' + command + '_' + str(uuid.uuid4()) + '.wav'
    ### solution ends
    return Filename

def CreateWaveFile(command):
    ### solution begins        
    Filename = CommandToFilename(command)
    engine.save_to_file(command, Filename)
    engine.runAndWait()     
    ### solution ends   
    return Filename

def DoResampling(x, OriginalSamplingRate):
    ### solution begins
    if np.abs(TargetSamplingRate - OriginalSamplingRate) > 1:
        y = librosa.resample(x, orig_sr = OriginalSamplingRate, target_sr = TargetSamplingRate)
        y *= (np.max(np.abs(x)) / np.max(np.abs(y)))
        return y
    else:
        return x
    ### solution ends

def ApplyCorrectSamplingRate(Filename):
    ### solution begins
    x, Fs, bits = WaveInterface.ReadWave(Filename)
    y = DoResampling(x, Fs)
    WaveInterface.WriteWave(y, TargetSamplingRate, bits, Filename)
    ### solution ends

def CreateDatasetAugmentation(Filename, command):    
    x, Fs, bits = WaveInterface.ReadWave(Filename)
    AAudioDatasetAugmentation = DatasetAugmentation.CAudioDatasetAugmentation(x, Fs)
    ListOfResults = AAudioDatasetAugmentation.Generate()
    ListOfFilenames = []
    for n in range(len(ListOfResults)):        
        ListOfFilenames.append(Filename)
        WaveInterface.WriteWave(ListOfResults[n], Fs, bits, Filename) 
        Filename = CommandToFilename(command)
    return ListOfFilenames

import unittest
import numpy as np
from os.path import exists
import shutil

if exists(Foldername):
    shutil.rmtree(Foldername)
os.makedirs(Foldername)
CommandList = GetCommandList()
for command in CommandList:       
    os.makedirs(Foldername + '/' + command)
    Filename = CreateWaveFile(command)
    ApplyCorrectSamplingRate(Filename)
    assert exists(Filename), 'error in file creation: file not existing'
    x, Fs, bits = WaveInterface.ReadWave(Filename)
    assert Fs == TargetSamplingRate, 'output file has wrong sampling rate'

    ListOfFilenames = CreateDatasetAugmentation(Filename, command) 
    for Filename in ListOfFilenames:
        assert exists(Filename), 'error in file creation: file not existing'
        x, Fs, bits = WaveInterface.ReadWave(Filename)
        assert Fs == TargetSamplingRate, 'output file has wrong sampling rate'

class TestProgrammingExercise(unittest.TestCase):

    def test_GetCommandList1(self):
        CommandList = GetCommandList()
        self.assertEqual(len(CommandList), 47)

    def test_GetCommandList2(self):
        CommandList = GetCommandList()
        output = []
        for x in CommandList:
            if x not in output:
                output.append(x)
        self.assertEqual(len(CommandList), len(output))

    def test_GetCommandList3(self):
        CommandList = GetCommandList()
        for x in CommandList:
            self.assertTrue(isinstance(x, str))

    def test_CommandToFilename1(self):
        s = 'random' + str(np.random.randint(10))
        Filename = CommandToFilename(s)
        self.assertEqual(Filename[-4:], '.wav')

    def test_CommandToFilename2(self):
        s = 'random' + str(np.random.randint(10))
        Filename = CommandToFilename(s)
        self.assertEqual(Filename[:len(Foldername)], Foldername)

    def test_CommandToFilename3(self):
        s = 'random' + str(np.random.randint(10))
        Filename = CommandToFilename(s)
        self.assertTrue(isinstance(Filename, str))

    def test_CommandToFilename4(self):
        from uuid import UUID
        s = 'random' + str(np.random.randint(10))
        Filename = CommandToFilename(s)
        try:
            uuid_obj = UUID(Filename[-40:-4])
            IsUUID = True
        except ValueError:
            IsUUID = False
        self.assertTrue(IsUUID)

    def test_CommandToFilename5(self):
        s = 'random' + str(np.random.randint(10))
        Filename = CommandToFilename(s)
        self.assertEqual(Filename[-41], '_')
    
    def test_Result1(self):
        ATrainingsDataInterface = TrainingsDataInterface.CTrainingsDataInterface()
        BTrainingsDataInterface = TrainingsDataInterface.CTrainingsDataInterface(Foldername)
        self.assertEqual(ATrainingsDataInterface.GetNumberOfCommands(), BTrainingsDataInterface.GetNumberOfCommands())

    def test_Result2(self):
        ATrainingsDataInterface = TrainingsDataInterface.CTrainingsDataInterface()
        BTrainingsDataInterface = TrainingsDataInterface.CTrainingsDataInterface(Foldername)
        for commandindex in range(ATrainingsDataInterface.GetNumberOfCommands()):
            self.assertEqual(ATrainingsDataInterface.GetCommandString(commandindex), BTrainingsDataInterface.GetCommandString(commandindex))

    def test_Result3(self):
        BTrainingsDataInterface = TrainingsDataInterface.CTrainingsDataInterface(Foldername)
        x = np.random.randn(1000)        
        AAudioDatasetAugmentation = DatasetAugmentation.CAudioDatasetAugmentation(x, TargetSamplingRate)
        for commandindex in range(BTrainingsDataInterface.GetNumberOfCommands()):
            self.assertEqual(BTrainingsDataInterface.GetNumberOfCommandInstances(commandindex), AAudioDatasetAugmentation.GetNumberOfResults())        
            
unittest.main(argv=[''], verbosity=2, exit=False)

test_CommandToFilename1 (__main__.TestProgrammingExercise.test_CommandToFilename1) ... ok
test_CommandToFilename2 (__main__.TestProgrammingExercise.test_CommandToFilename2) ... ok
test_CommandToFilename3 (__main__.TestProgrammingExercise.test_CommandToFilename3) ... ok
test_CommandToFilename4 (__main__.TestProgrammingExercise.test_CommandToFilename4) ... ok
test_CommandToFilename5 (__main__.TestProgrammingExercise.test_CommandToFilename5) ... ok
test_GetCommandList1 (__main__.TestProgrammingExercise.test_GetCommandList1) ... ok
test_GetCommandList2 (__main__.TestProgrammingExercise.test_GetCommandList2) ... ok
test_GetCommandList3 (__main__.TestProgrammingExercise.test_GetCommandList3) ... ok
test_Result1 (__main__.TestProgrammingExercise.test_Result1) ... ok
test_Result2 (__main__.TestProgrammingExercise.test_Result2) ... ok
test_Result3 (__main__.TestProgrammingExercise.test_Result3) ... ok

----------------------------------------------------------------------
Ran 11 tests in 0.848s

<unittest.main.TestProgram at 0x1be7fe44f50>
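
The filename tests above depend on the fixed layout of a UUID string: `str(uuid.uuid4())` always has 36 characters (32 hexadecimal digits and 4 hyphens). The following minimal sketch shows why the slices `[-40:-4]` and `[-41]` pick out the UUID and the underscore. The folder and command names are example values, and `example_filename` is a hypothetical variable name.

```python
import uuid

# example values, chosen only for illustration
folder_name = 'TTSCommands'
command = 'Stopp'

unique_id = str(uuid.uuid4())  # random UUID in 8-4-4-4-12 hexadecimal groups
example_filename = folder_name + '/' + command + '/' + command + '_' + unique_id + '.wav'

print(len(unique_id))                                # number of characters in the UUID string
print(example_filename[-40:-4] == unique_id)         # the 36 characters in front of '.wav'
print(example_filename[-41])                         # the separator in front of the UUID
print(uuid.UUID(example_filename[-40:-4]).version)   # parsing succeeds for a valid UUID
```

Because the UUID has a fixed length, counting from the end of the string works for commands of any length.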

## Exam preparation

1) A simple text-to-speech (TTS) system can be implemented by recording all words of a language spoken by a single person. A TTS system can then be built by concatenating the recordings of the target words. What problems arise with this implementation?

2) The probability of words in a language can be estimated by Zipf's law: If you consider the $N$ words with highest probability, the probability of the word at position $n$ can be approximated by $p(n)\approx\frac{1}{n\cdot\ln\left(1.78\cdot N\right)}$. Evaluate the probabilities of the five words with highest probability, 'der', 'die', 'und', 'in' and 'den' according to 'Häufigkeitswörterbuch von F. W. Kaeding, 1897', assuming a set of $N=10^5$ words. How many words do you need in order to represent $50$ % of the words used in a language according to Zipf's law?

In [7]:
### solution
import numpy as np

TargetProbability = 0.5
NumberOfWordsInALanguage = 1e5
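# the sum of 1/n over the first n words has to reach TargetProbability * ln(1.78 * N)
Threshold = np.log(1.78 * NumberOfWordsInALanguage) * TargetProbability
CumulativeSum = 0.0
n = 0
while CumulativeSum < Threshold:
    n += 1
    CumulativeSum += 1 / n
print(n, 'words have a cumulative probability of at least', TargetProbability)

237 words have a cumulative probability of at least 0.5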
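
For the first part of question 2, the formula can be evaluated directly for the positions 1 to 5. A minimal sketch, assuming the five words are ranked in the order given in the question; the function name `zipf_probability` is hypothetical.

```python
import math


def zipf_probability(n, number_of_words):
    """Approximate probability of the word at position n among number_of_words words."""
    return 1.0 / (n * math.log(1.78 * number_of_words))


words = ['der', 'die', 'und', 'in', 'den']
for position, word in enumerate(words, start=1):
    print(word, round(zipf_probability(position, 1e5), 4))
```

Since $p(n)$ is proportional to $1/n$, the word at position 2 is half as probable as the word at position 1, the word at position 3 a third as probable, and so on.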
