# Introduction

In this demo, we will show how dubbing works.

The main data classes are located in `kaia.avatar.dub.core.structures`.

* `Dub` is an abstract class representing the connection of some value (of arbitrary type) to a string and a voice line.
* `SetDub` is a class that binds values of a finite set. `DictDub` and `EnumDub` are its descendants for dictionaries and enums.
* `SequenceDub` is a sequence of constants and some other dubs.
* `UnionDub` contains several sequences. It takes a value, converts it to a dict, finds a sequence that can process the resulting fields, and uses that sequence to create the string representation of the value.
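
The selection step described for `UnionDub` can be illustrated with a toy model. This is a minimal sketch that does not use the kaia classes: the list-based sequences, `fields_of` and `to_str` are hypothetical names invented for the illustration, and the real implementation may choose sequences differently.

```python
# Toy model, NOT the kaia API: a sequence is a list whose items are either
# constants (str) or (field_name, mapping) tuples acting as tiny set-style dubs.

def fields_of(sequence):
    """Return the set of field names that a toy sequence consumes."""
    return {item[0] for item in sequence if isinstance(item, tuple)}

def to_str(sequences, value):
    """Render `value` with the first sequence whose fields match its keys."""
    for sequence in sequences:
        if fields_of(sequence) == set(value):
            parts = []
            for item in sequence:
                if isinstance(item, tuple):
                    field, mapping = item
                    parts.append(mapping[value[field]])  # value -> string
                else:
                    parts.append(item)                   # constant text
            return ''.join(parts)
    raise ValueError(f'No sequence handles fields {sorted(value)}')

numbers = {1: 'one', 2: 'two', 3: 'three'}
union = [
    ['timer number ', ('index', numbers)],
    ['timer number ', ('index', numbers), ' for ', ('minutes', numbers), ' minutes'],
]

short_form = to_str(union, {'index': 2})
long_form = to_str(union, {'index': 1, 'minutes': 3})
```

Here `short_form` is rendered by the first sequence and `long_form` by the second, because the keys of the value decide which sequence applies.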

Other dubs are language-specific and are located in `kaia.avatar.dub.languages.en`. Examples are `CardinalDub` and `OrdinalDub`, which inherit from `SetDub` and represent numbers, and `DateDub`, which extends `UnionDub` and processes `datetime.date` objects.

To define the intents and replies of the assistant, the `Template` class is used; it contains a `UnionDub` as a field. `Template` also provides methods for parsing, to-string conversion and other operations. These methods are shortcuts for algorithms located in `kaia.avatar.dub.core.algorithms`. The algorithms are implementations of depth-first search over `UnionDub`, and you don't need to import them directly.

To represent a particular sentence that is a combination of `Template` and the associated value, `Utterance` is used.

## Templates

We will now create a template of moderate complexity to demonstrate how dubbing works.

In [1]:
from kaia.avatar.dub.languages.en import Template, CardinalDub, PluralAgreement

template = Template(
    'It is {hours} {hours_word} and {minutes} {minutes_word}',
    hours = CardinalDub(0, 24),
    hours_word = PluralAgreement('hours', 'hour', 'hours'),
    minutes = CardinalDub(0, 60),
    minutes_word = PluralAgreement('minutes', 'minute', 'minutes')
)

In [2]:
value = dict(hours=11, minutes=1)
string = template.to_str(value)
string

'It is eleven hours and one minute'

Notice the words "hours" and "minute". The form is chosen by `PluralAgreement` in accordance with the value of the corresponding field.
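
The agreement rule itself is simple for English and can be sketched without kaia. The helpers `choose_form` and `render_time` below are hypothetical names for this illustration, and the sketch prints digits rather than number words to stay short.

```python
def choose_form(count: int, singular: str, plural: str) -> str:
    """Return the singular form for exactly one item, the plural form otherwise."""
    return singular if count == 1 else plural

def render_time(hours: int, minutes: int) -> str:
    """Build the sentence body with word forms that agree with the numbers."""
    hours_word = choose_form(hours, 'hour', 'hours')
    minutes_word = choose_form(minutes, 'minute', 'minutes')
    return f'{hours} {hours_word} and {minutes} {minutes_word}'

rendered = render_time(11, 1)
```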

A template can also parse strings:

In [3]:
template.parse(string)

{'minutes': 1, 'hours': 11}
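
To give an intuition for the reverse direction, here is a minimal sketch of parsing the same sentence. It swaps in a regular expression with named groups, which is not how kaia does it (the library searches over `UnionDub`), and the word table and `parse` function are assumptions made for the example.

```python
import re

# Hypothetical word table; a real dub would cover a whole numeric range.
WORD_TO_NUMBER = {'zero': 0, 'one': 1, 'two': 2, 'eleven': 11}

number = '|'.join(WORD_TO_NUMBER)
PATTERN = re.compile(
    rf'It is (?P<hours>{number}) hours? and (?P<minutes>{number}) minutes?'
)

def parse(text: str) -> dict:
    """Map a rendered sentence back to the field values it was built from."""
    match = PATTERN.fullmatch(text)
    if match is None:
        raise ValueError(f'Cannot parse: {text!r}')
    # Each named group holds a number word; convert it back to an int.
    return {name: WORD_TO_NUMBER[word] for name, word in match.groupdict().items()}

parsed = parse('It is eleven hours and one minute')
```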

Templates can be defined within `TemplatesCollection` subclasses for convenience:

In [4]:
from kaia.avatar.dub.languages.en import TemplatesCollection

class MyTemplates(TemplatesCollection):
    hello = Template("Hello")
    how_are_you = Template("How are you?")

MyTemplates.hello.name

'__main__.MyTemplates.hello'

This assigns each template a name derived from the field it is stored in.
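
One way such automatic naming can be achieved in plain Python is the `__set_name__` hook. The sketch below is an illustration only: `ToyTemplate` and `ToyTemplates` are hypothetical classes, and `TemplatesCollection` may well use a different mechanism.

```python
class ToyTemplate:
    """Hypothetical stand-in for a template that learns its own name."""

    def __init__(self, text: str):
        self.text = text
        self.name = None

    def __set_name__(self, owner, attribute):
        # Python calls this hook when the class body that holds the attribute
        # is created, passing the owning class and the attribute name.
        self.name = f'{owner.__module__}.{owner.__qualname__}.{attribute}'

class ToyTemplates:
    hello = ToyTemplate('Hello')
    how_are_you = ToyTemplate('How are you?')

hello_name = ToyTemplates.hello.name
```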

We have defined a big class with lots of intents for testing purposes:

In [5]:
from kaia.avatar.dub.sandbox import Intents

[i.name for i in Intents.get_templates()]

['kaia.avatar.dub.sandbox.intents.Intents.yes',
 'kaia.avatar.dub.sandbox.intents.Intents.no',
 'kaia.avatar.dub.sandbox.intents.Intents.time',
 'kaia.avatar.dub.sandbox.intents.Intents.date',
 'kaia.avatar.dub.sandbox.intents.Intents.weather',
 'kaia.avatar.dub.sandbox.intents.Intents.transport',
 'kaia.avatar.dub.sandbox.intents.Intents.timer_create',
 'kaia.avatar.dub.sandbox.intents.Intents.timer_how_much_time',
 'kaia.avatar.dub.sandbox.intents.Intents.timer_how_many_timers',
 'kaia.avatar.dub.sandbox.intents.Intents.timer_cancel',
 'kaia.avatar.dub.sandbox.intents.Intents.spotify',
 'kaia.avatar.dub.sandbox.intents.Intents.cook']

## Voiceover

Voiceover is performed by BrainBox. Currently, there are two solutions for this:

* [OpenTTS](https://github.com/synesthesiam/opentts): a ready-made container that performs text-to-speech for many languages using various techniques. It works out of the box but does not provide voice cloning.
* A CoquiTTS container that hosts the available models (YourTTS, VITS and XTTS) as well as the fine-tuning results.

While TortoiseTTS also does text-to-speech, it is impractical for this purpose because it is slow and requires a GPU. OpenTTS and CoquiTTS do not require a GPU and use VITS-based models that work very fast.

Other models are certainly possible. Voiceover should be implemented as a BrainBoxDecider that takes text and voice, and returns the list of names (probably only one) of the audio files created. The file can then be downloaded and used for voiceover.
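
The contract described above (text and voice in, a list of file names out) can be sketched with a stand-in that writes silence instead of speech. This is not a BrainBox decider: `silent_decider` and its `folder` argument are hypothetical, and only the standard library is used.

```python
import tempfile
import uuid
import wave
from pathlib import Path
from typing import List

def silent_decider(text: str, voice: str, folder: Path) -> List[str]:
    """Write one second of silence for the request and return the file names created."""
    # `text` is unused here: a real decider would synthesize speech from it.
    name = f'{voice}-{uuid.uuid4().hex}.wav'
    with wave.open(str(folder / name), 'wb') as stream:
        stream.setnchannels(1)        # mono
        stream.setsampwidth(2)        # 16-bit samples
        stream.setframerate(22050)
        stream.writeframes(b'\x00\x00' * 22050)
    return [name]

folder = Path(tempfile.mkdtemp())
names = silent_decider('Hello world', 'p225', folder)
# The caller "downloads" the file by reading it from the shared folder.
audio_bytes = (folder / names[0]).read_bytes()
```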

The following code demonstrates the protocol. For the remainder of the demo, the following requirements apply:

* Docker must be enabled and configured.
* OpenTTS demo from `brainbox_deciders` should be completed.

To run voiceover, we need a function that converts the `text` and `voice` variables into a `BrainBoxTaskPack`. In our case, it is a single request to the OpenTTS decider.

The result of the following cell should appear within a few seconds, even without a GPU. If it does not, check the BrainBox output console: OpenTTS probably wasn't run in the `brainbox_deciders` section, and the container is being pulled.

In [8]:
from kaia.brainbox import BrainBox, BrainBoxTask, BrainBoxTaskPack, DownloadingPostprocessor
from kaia.infra import FileIO
from ipywidgets import Audio

def task_generator(text, voice):
    return BrainBoxTaskPack(
        BrainBoxTask(
            id = BrainBoxTask.safe_id(), 
            decider='OpenTTS', 
            arguments=dict(voice='coqui-tts:en_vctk', lang='en', speakerId=voice, text=text)
            ),
        (),
        DownloadingPostprocessor(take_element_before_downloading=0, opener=FileIO.read_bytes)
    )
            
pack = task_generator('Hello world', 'p225')
with BrainBox().create_test_api() as api:
    result = api.execute(pack)
Audio(value=result, autoplay = False)

Audio(value=b'RIFFD\x0c\x01\x00WAVEfmt \x10\x00\x00\x00\x01\x00\x01\x00"V\x00\x00D\xac\x00\x00\x02\x00\x10\x00…

`DubbingService` does much the same: it executes the tasks, and it also handles `Utterance` objects and caching, so a line that was already spoken is not generated a second time. The cache can also be used for long outputs: split the output into several strings with the `preview` method.
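
The caching idea can be shown in isolation. The following is a minimal sketch, assuming the cache is keyed by the text and the voice; `CachingDubber` and `fake_synthesize` are hypothetical and do not reflect how the service stores its cache.

```python
from typing import Callable, Dict, List, Tuple

class CachingDubber:
    """Hypothetical wrapper that synthesizes each (text, voice) pair only once."""

    def __init__(self, synthesize: Callable[[str, str], bytes]):
        self._synthesize = synthesize
        self._cache: Dict[Tuple[str, str], bytes] = {}

    def dub(self, text: str, voice: str) -> bytes:
        key = (text, voice)
        if key not in self._cache:
            # Cache miss: run the (expensive) synthesis and remember the result.
            self._cache[key] = self._synthesize(text, voice)
        return self._cache[key]

calls: List[Tuple[str, str]] = []

def fake_synthesize(text: str, voice: str) -> bytes:
    """Stand-in for a text-to-speech call; records how often it is invoked."""
    calls.append((text, voice))
    return f'{voice}:{text}'.encode()

dubber = CachingDubber(fake_synthesize)
first = dubber.dub('Hello world', 'p225')
second = dubber.dub('Hello world', 'p225')  # served from the cache
```

After both calls, `calls` contains a single entry, since the second request never reaches the synthesizer.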

In [9]:
from kaia.avatar.server.dubbing_service import BrainBoxDubbingService

with BrainBox().create_test_api() as api:
    service = BrainBoxDubbingService(task_generator, api)
    audio = service.dub_string('Hello world via Dubbing Service', 'p225')
Audio(value = audio.data, autoplay = False)

Audio(value=b'RIFFD\xd0\x01\x00WAVEfmt \x10\x00\x00\x00\x01\x00\x01\x00"V\x00\x00D\xac\x00\x00\x02\x00\x10\x00…

`dub_string` returns a `kaia.eaglesong.Audio` object, which is handy to use in a voice assistant.