# Text to speech
A text-to-speech (TTS) system transforms written text into a waveform of the corresponding spoken utterance.

Several commercial products exist for this task, e.g. ElevenLabs, https://beta.elevenlabs.io/ (last visited in June 2023).

One of the simplest ways to use text to speech in Python is the module pyttsx3:

https://pypi.org/project/pyttsx3/

This module is a wrapper for the text-to-speech system built into the operating system. At least on Windows, the results are satisfactory.

In [1]:
import pyttsx3
import time
import os
os.chdir('../Python')
import WaveInterface

Filename = 'output.wav'

# initialize the speech synthesizer
engine = pyttsx3.init()

rate = engine.getProperty('rate')   # current speaking rate in words per minute
engine.setProperty('rate', 125)     # set a new speaking rate

# https://ichi.pro/de/eine-einfuhrung-in-pyttsx3-ein-text-zu-sprache-konverter-fur-python-81905511310787
voices = engine.getProperty('voices')       # list of the voices installed on this system
for voice in voices:
    print("Voice: %s" % voice.name)
    print(" - ID: %s" % voice.id)
    print(" - Languages: %s" % voice.languages)
    print(" - Gender: %s" % voice.gender)
    print(" - Age: %s" % voice.age)
    print("\n")

engine.setProperty('voice', voices[0].id)    # index 0: German female voice on this machine
#engine.setProperty('voice', voices[1].id)   # index 1: English female voice on this machine

# select the voice by name; this only takes effect if the driver offers a voice with the ID 'german'
engine.setProperty('voice', 'german')

# select the text
sentence = "Lassen Sie die Ente zu Wasser."

Voice: Microsoft Hedda Desktop - German
 - ID: HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Speech\Voices\Tokens\TTS_MS_DE-DE_HEDDA_11.0
 - Languages: []
 - Gender: None
 - Age: None


Voice: Microsoft Zira Desktop - English (United States)
 - ID: HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Speech\Voices\Tokens\TTS_MS_EN-US_ZIRA_11.0
 - Languages: []
 - Gender: None
 - Age: None




In [2]:
StartTime = time.time()
# use the synthesizer to say the sentence
#engine.say(sentence)
# use the synthesizer to save the wave form to a file
engine.save_to_file(sentence, Filename)

# wait until everything is finished
engine.runAndWait()

StopTime = time.time()

## Latency and real-time capability
To assess the real-time capability of the algorithm on the given hardware, the duration of the output waveform is compared with the time needed to generate it.

If the output waveform is longer than the time needed to generate it, the algorithm is real-time capable.

In [3]:
x, Fs, bits = WaveInterface.ReadWave(Filename)

print(str(x.shape[0] / Fs), ' seconds of audio generated in ', str(StopTime - StartTime), ' seconds')

3.075691609977324  seconds of audio generated in  0.21684479713439941  seconds
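
The comparison above is often summarized as a single number, the real-time factor: processing time divided by audio duration, where a value below 1 indicates real-time capability. The following is a minimal sketch and not part of the original notebook. The helper names `wave_duration_seconds` and `real_time_factor` are hypothetical, and reading the duration with the standard `wave` module assumes that the file is an uncompressed PCM wave file.

```python
import wave

def wave_duration_seconds(path):
    """Return the duration of an uncompressed PCM wave file in seconds."""
    with wave.open(path, 'rb') as wav_file:
        return wav_file.getnframes() / wav_file.getframerate()

def real_time_factor(processing_seconds, audio_seconds):
    """Processing time per second of generated audio (below 1: real-time capable)."""
    return processing_seconds / audio_seconds

# the two numbers are copied from the measurement printed above
rtf = real_time_factor(0.21684479713439941, 3.075691609977324)
print('real-time factor: %.3f' % rtf)
print('real-time capable:', rtf < 1.0)
```

With the measured values, the real-time factor is roughly 0.07, so the synthesis runs far faster than real time on this machine. Note that the measurement includes writing the file to disk.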


## Programming exercise

Use the given text-to-speech system to create one wave file for each command listed in the folder ../Python/Commands. The wave files should be named command_UUID.wav, where command is the current command string and UUID is a unique hexadecimal string. Search the web for how to create a UUID in Python.
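
As a hint for the naming convention, the standard library module `uuid` can generate such a string. The sketch below shows one possible approach, not the reference solution. The helper name `make_wave_filename` is hypothetical, and the choice of a random version 4 UUID in its 32-character hexadecimal form is an assumption.

```python
import uuid

def make_wave_filename(command):
    """Build a name of the form command_UUID.wav (hypothetical helper)."""
    # uuid4() draws a random UUID; .hex is its hexadecimal form without hyphens
    unique_part = uuid.uuid4().hex
    return command + '_' + unique_part + '.wav'

print(make_wave_filename('Start'))
```

Each call returns a different name, so repeated runs do not overwrite earlier wave files.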

In [18]:


def GetCommandList():
    CommandList = []
    ### solution begins

    ### solution ends
    return CommandList

def CommandToFilename(command):
    ### solution begins

    ### solution ends
    return Filename

def CreateWaveFile(command):
    ### solution begins

    ### solution ends
    return Filename

import unittest
import numpy as np
from os.path import exists

CommandList = GetCommandList()
for command in CommandList:
    Filename = CreateWaveFile(command)
    assert exists(Filename), 'error in file creation: file not existing'

class TestProgrammingExercise(unittest.TestCase):

    def test_1(self):
        CommandList = GetCommandList()
        self.assertEqual(len(CommandList), 47)

    def test_2(self):
        CommandList = GetCommandList()
        output = []
        for x in CommandList:
            if x not in output:
                output.append(x)
        self.assertEqual(len(CommandList), len(output))

    def test_3(self):
        CommandList = GetCommandList()
        for x in CommandList:
            self.assertTrue(isinstance(x, str))

    def test_4(self):
        s = 'random' + str(np.random.randint(10))
        Filename = CommandToFilename(s)
        self.assertEqual(Filename[-4:], '.wav')

    def test_5(self):
        s = 'random' + str(np.random.randint(10))
        Filename = CommandToFilename(s)
        self.assertEqual(Filename[:len(s)], s)

    def test_6(self):
        s = 'random' + str(np.random.randint(10))
        Filename = CommandToFilename(s)
        self.assertEqual(Filename[len(s)], '_')

    def test_7(self):
        s = 'random' + str(np.random.randint(10))
        Filename = CommandToFilename(s)
        self.assertTrue(isinstance(Filename, str))

    def test_8(self):
        from uuid import UUID
        s = 'random' + str(np.random.randint(10))
        Filename = CommandToFilename(s)
        try:
            uuid_obj = UUID(Filename[len(s)+1:-4])
            IsUUID = True
        except ValueError:
            IsUUID = False
        self.assertTrue(IsUUID)

unittest.main(argv=[''], verbosity=2, exit=False)

0 created
1 created
10 created
2 created
3 created
4 created
5 created
6 created
7 created
8 created
9 created
Aktion created
Attacke created
Hallo created
MOPS created
Schau created
Start created
Stopp created
Wende created
an created
auf created
aus created
blau created
drehe created
fest created
gehe created
gruen created
halt created
hinauf created
hinunter created
hoch created
ja created
langsam created
lauf created
links created
los created
nein created
rechts created
rot created
rueckwaerts created
runter created
schnell created
vor created


test_1 (__main__.TestProgrammingExercise.test_1) ... 

vorwaerts created
weiss created
zu created
zurueck created


ok
test_2 (__main__.TestProgrammingExercise.test_2) ... ok
test_3 (__main__.TestProgrammingExercise.test_3) ... ok
test_4 (__main__.TestProgrammingExercise.test_4) ... ok
test_5 (__main__.TestProgrammingExercise.test_5) ... ok
test_6 (__main__.TestProgrammingExercise.test_6) ... ok
test_7 (__main__.TestProgrammingExercise.test_7) ... ok
test_8 (__main__.TestProgrammingExercise.test_8) ... ok

----------------------------------------------------------------------
Ran 8 tests in 0.525s

OK


<unittest.main.TestProgram at 0x231d4398bd0>

## Exam preparation

1) A simple text-to-speech (TTS) system can be implemented by recording every word of the vocabulary, spoken by a single person. An utterance is then synthesized by concatenating the recordings of the target words. What problems arise with this implementation?

2) The probability of words in a language can be estimated with Zipf's law: if you consider the $N$ words with the highest probability, the probability of the word at rank $n$ can be approximated by $p(n)\approx\frac{1}{n\cdot\ln\left(1.78\cdot N\right)}$. Evaluate the probabilities of the five most probable words 'der', 'die', 'und', 'in' and 'den' according to 'Häufigkeitswörterbuch von F. W. Kaeding, 1897', assuming a vocabulary of $N=10^5$ words. How many words do you need in order to cover $50$ % of the words used in a language according to Zipf's law?

In [7]:
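### solution
import numpy as np

TargetProbability = 0.5
NumberOfWordsInALanguage = 1e5
# the cumulative probability of the first n words reaches the target as soon as
# the harmonic sum 1/1 + ... + 1/n reaches ln(1.78 * N) * TargetProbability
Threshold = np.log(1.78 * NumberOfWordsInALanguage) * TargetProbability
HarmonicSum = 0.0
n = 0
while HarmonicSum < Threshold:
    n += 1                  # include the word at rank n
    HarmonicSum += 1 / n
print(n, ' words have a probability of ', TargetProbability)

237  words have a probability of  0.5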
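
The first part of question 2 can be sketched in the same way by evaluating $p(n)$ for the ranks $n=1,\dots,5$. This is an illustrative sketch: the function name `zipf_probability` is hypothetical, and the assignment of ranks 1 to 5 follows the order in which the question lists the words.

```python
import math

N = 1e5  # vocabulary size assumed in the question
words = ['der', 'die', 'und', 'in', 'den']  # ranks 1 to 5, in the order given

def zipf_probability(n, vocabulary_size):
    """Approximate probability of the word at rank n according to Zipf's law."""
    return 1.0 / (n * math.log(1.78 * vocabulary_size))

for rank, word in enumerate(words, start=1):
    print('%s (rank %d): %.4f' % (word, rank, zipf_probability(rank, N)))
```

The most frequent word 'der' comes out at about 8.3 %, and each following probability is the first one divided by the rank.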
