# TTS

This folder shows an end-to-end AI example with the [Coqui AI TTS](https://github.com/coqui-ai/TTS/) text-to-speech library. The demo also shows how to run a photon with multimedia outputs (in this case, a WAV response).

With this demo, you will be able to run a text-to-speech model and get results like the following:


<audio src="assets/thequickbrownfox.mp3" controls></audio>

First, let's install the necessities.

In [1]:
!pip install -r requirements.txt > /dev/null

DEPRECATION: torchsde 0.2.5 has a non-standard dependency specifier numpy>=1.19.*; python_version >= "3.7". pip 23.3 will enforce this behaviour change. A possible replacement is to upgrade to a newer version of torchsde or contact the author to suggest that they release a version with a conforming dependency specifiers. Discussion can be found at https://github.com/pypa/pip/issues/12063

## Running the code locally

Note: if you do not have a local GPU, skip to <a href=#remote>the next section</a>.

The code, `tts.py`, lives in the same folder as this IPython notebook. Feel free to check it out. We will move on to running it. Let's first see if we have a GPU.

In [2]:
import torch
if torch.cuda.is_available():
    print("Great, we have a GPU.")
else:
    print("Actually, running without a GPU is quite slow and not recommended.")

Great, we have a GPU.


Now, let's run the photon. Since we are in an IPython notebook, we will use the subprocess module to spawn the local deployment. If you are running it manually, feel free to just run `python tts.py`.

In [3]:
from subprocess import Popen
process = Popen(['python', 'tts.py'])

2023-08-15 11:55:27.512 | INFO     | __main__:init:28 - Loading the model...
2023-08-15 11:55:27.512 | INFO     | __main__:_load_model:45 - Loading model tts_models/en/vctk/vits
2023-08-15 11:55:27.519 | INFO     | __main__:_load_model:47 - Using GPU
2023-08-15 11:55:28.736 | INFO     | __main__:_load_model:52 - Loaded model tts_models/en/vctk/vits
2023-08-15 11:55:28.736 | INFO     | __main__:_load_model:54 - Model has languages []
2023-08-15 11:55:28.737 | INFO     | __main__:_load_model:55 - Model has speakers ['ED\n', 'p225', 'p226', 'p227', 'p228', 'p229', 'p230', 'p231', 'p232', 'p233', 'p234', 'p236', 'p237', 'p238', 'p239', 'p240', 'p241', 'p243', 'p244', 'p245', 'p246', 'p247', 'p248', 'p249', 'p250', 'p251', 'p252', 'p253', 'p254', 'p255', 'p256', 'p257', 'p258', 'p259', 'p260', 'p261', 'p262', 'p263', 'p264', 'p265', 'p266', 'p267', 'p268', 'p269', 'p270', 'p271', 'p272', 'p273', 'p274', 'p275', 'p276', 'p277', 'p278', 'p279', 'p280', 'p281', 'p282', 'p283', 'p284', 'p285', 

Wait for the above process to start. Because it loads the checkpoint and initializes the model, startup can take quite some time, especially if the checkpoints need to be downloaded first. Towards the end, you will see "Uvicorn running on http://0.0.0.0:8080" (or another port); this means the service is running successfully.
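
Instead of watching the log by eye, you can poll the port until it accepts connections. The snippet below is a minimal sketch using only the standard library; `wait_for_port` is a hypothetical helper (not part of the Lepton SDK), and a port that accepts TCP connections is only a rough signal that the service is ready.

```python
import socket
import time


def wait_for_port(host: str = "127.0.0.1", port: int = 8080, timeout: float = 300.0) -> bool:
    """Return True once a TCP connection to host:port succeeds, False on timeout.

    Hypothetical helper for illustration; 8080 is the default port mentioned above.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            # A successful connection means something is listening on the port.
            with socket.create_connection((host, port), timeout=1.0):
                return True
        except OSError:
            # Nothing is listening yet; wait a moment and retry.
            time.sleep(1.0)
    return False
```

In the notebook, calling `wait_for_port(port=8080)` before creating the client would block until the server is reachable or the timeout expires.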

Now, let's use the Lepton SDK client to communicate with the service.

In [4]:
from leptonai.client import Client, local
# Note: if the port above is not 8080 (the default), specify the port with local(port=xxxx).
c = Client(local())
print("Possible paths are:")
print(c.paths())

2023-08-15 11:55:34,930 - INFO:     127.0.0.1:51098 - "GET /openapi.json HTTP/1.1" 200 OK
Possible paths are:
dict_keys(['/list_languages', '/list_speakers', '/get_model_name', '/tts'])


In [5]:
# The example exposes 4 different paths. Let's look at
# the documentation of each path, which is automatically
# generated by the SDK.
help(c.get_model_name)
help(c.list_languages)
help(c.list_speakers)
help(c.tts)

Help on function get_model_name in module leptonai.client:

get_model_name(*args, **kwargs)
    Returns the name of the current model.
    
    Automatically inferred parameters from openapi:
    
    Input Schema: None
    
    Output Schema:
      output: str

Help on function list_languages in module leptonai.client:

list_languages(*args, **kwargs)
    Returns a list of languages supported by the current model. Empty list
    if no model is loaded, or the model does not support multiple languages.
    
    Automatically inferred parameters from openapi:
    
    Input Schema: None
    
    Output Schema:
      output: array[str]

Help on function list_speakers in module leptonai.client:

list_speakers(*args, **kwargs)
    Returns a list of speakers supported by the current model. Empty list
    if no model is loaded, or the model does not support multiple speakers.
    
    Automatically inferred parameters from openapi:
    
    Input Schema: None
    
    Output Schema:
      out

In [6]:
# Let's inspect the current model.
print(f"Model name is: {c.get_model_name()}")
print(f"Supported languages are: {c.list_languages()}")
print(f"Supported speakers are: {c.list_speakers()}")

2023-08-15 11:55:42,458 - INFO:     127.0.0.1:54566 - "POST /get_model_name HTTP/1.1" 200 OK
Model name is: tts_models/en/vctk/vits
2023-08-15 11:55:42,461 - INFO:     127.0.0.1:54566 - "POST /list_languages HTTP/1.1" 200 OK
Supported languages are: []
2023-08-15 11:55:42,463 - INFO:     127.0.0.1:54566 - "POST /list_speakers HTTP/1.1" 200 OK
Supported speakers are: ['ED\n', 'p225', 'p226', 'p227', 'p228', 'p229', 'p230', 'p231', 'p232', 'p233', 'p234', 'p236', 'p237', 'p238', 'p239', 'p240', 'p241', 'p243', 'p244', 'p245', 'p246', 'p247', 'p248', 'p249', 'p250', 'p251', 'p252', 'p253', 'p254', 'p255', 'p256', 'p257', 'p258', 'p259', 'p260', 'p261', 'p262', 'p263', 'p264', 'p265', 'p266', 'p267', 'p268', 'p269', 'p270', 'p271', 'p272', 'p273', 'p274', 'p275', 'p276', 'p277', 'p278', 'p279', 'p280', 'p281', 'p282', 'p283', 'p284', 'p285', 'p286', 'p287', 'p288', 'p292', 'p293', 'p294', 'p295', 'p297', 'p298', 'p299', 'p300', 'p301', 'p302', 'p303', 'p304', 'p305', 'p306', 'p307', 'p308'
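
Before calling `tts`, it can help to check that the speaker you want is actually in the list returned by `list_speakers`. The snippet below is a small sketch of such a guard; `choose_speaker` is a hypothetical helper, and whether the service validates the speaker name on its own is not something this notebook shows.

```python
def choose_speaker(speakers, preferred="p225"):
    """Return `preferred` if it is listed, None if the model lists no speakers.

    Hypothetical helper for illustration. Names are compared exactly as the
    service returns them (the list above contains entries such as 'ED\\n').
    """
    if not speakers:
        # An empty list means no model is loaded or the model is single-speaker.
        return None
    if preferred in speakers:
        return preferred
    raise ValueError(
        f"Speaker {preferred!r} is not available; first few options: {speakers[:5]}"
    )
```

With the client above, `choose_speaker(c.list_speakers(), "p225")` would return `"p225"`, which can then be passed as the `speaker` argument.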

In [7]:
# Let's actually run a tts example.
text = """
It was the best of times, it was the worst of times,
it was the age of wisdom, it was the age of foolishness,
it was the epoch of belief, it was the epoch of incredulity,
it was the season of light, it was the season of darkness,
it was the spring of hope, it was the winter of despair.
"""
audio = c.tts(text=text, speaker="p225")
import IPython
IPython.display.Audio(audio)

2023-08-15 11:55:44.964 | INFO     | __main__:_tts:98 - Synthesizing '
It was the best of times, it was the worst of times,
it was the age of wisdom, it was the age of foolishness,
it was the epoch of belief, it was the epoch of incredulity,
it was the season of light, it was the season of darkness,
it was the spring of hope, it was the winter of despair.
' with language 'None' and speaker 'p225'
2023-08-15 11:55:44.964 | INFO     | __main__:_tts:111 - Synthesizing '
It was the best of times, it was the worst of times,
it was the age of wisdom, it was the age of foolishness,
it was the epoch of belief, it was the epoch of incredulity,
it was the season of light, it was the season of darkness,
it was the spring of hope, it was the winter of despair.
' with language 'None' and speaker 'p225'


 > Text splitted to sentences.
['It was the best of times, it was the worst of times,', 'it was the age of wisdom, it was the age of foolishness,', 'it was the epoch of belief, it was the epoch of incredulity,', 'it was the season of light, it was the season of darkness,', 'it was the spring of hope, it was the winter of despair.']
 > Processing time: 0.5445094108581543
 > Real-time factor: 0.02821378470650426
2023-08-15 11:55:45,555 - INFO:     127.0.0.1:54566 - "POST /tts HTTP/1.1" 200 OK
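
To keep the result around, you can write the response to disk and inspect it. The snippet below is a sketch that assumes `audio` holds the raw bytes of a PCM WAV file, which is an assumption about the client's return value; `describe_wav` and `save_wav` are hypothetical helpers. The standard-library `wave` module only reads PCM-encoded WAV data, so `describe_wav` will raise an error for other encodings.

```python
import io
import wave


def describe_wav(data: bytes) -> dict:
    """Read basic properties from in-memory WAV bytes (PCM only)."""
    with wave.open(io.BytesIO(data), "rb") as wav:
        frames = wav.getnframes()
        rate = wav.getframerate()
        return {
            "channels": wav.getnchannels(),
            "sample_rate": rate,
            "duration_s": frames / rate,
        }


def save_wav(data: bytes, path: str) -> None:
    """Write the WAV bytes to `path` unchanged."""
    with open(path, "wb") as f:
        f.write(data)
```

Under that assumption, `save_wav(audio, "output.wav")` stores the clip next to the notebook, and `describe_wav(audio)` reports its channel count, sample rate, and duration.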


Voilà! Feel free to play with it some more. When we are done, let's clean up the local execution.

In [8]:
# Once we are done, let's close up the local process.
process.terminate()

2023-08-15 11:55:55,785 - INFO:     Shutting down
2023-08-15 11:55:55,886 - INFO:     Waiting for application shutdown.
2023-08-15 11:55:55,886 - INFO:     Application shutdown complete.
2023-08-15 11:55:55,886 - INFO:     Finished server process [46397]
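
`process.terminate()` only requests a shutdown; it does not wait for the process to exit. If you want to be sure the server is gone (for example, before starting it again on the same port), a sketch like the following waits for the exit and falls back to a forced kill. `stop_process` is a hypothetical helper, and the 10-second grace period is an arbitrary choice.

```python
import subprocess


def stop_process(process: subprocess.Popen, grace_seconds: float = 10.0) -> int:
    """Ask the process to exit, wait for it, and force-kill it if it hangs.

    Returns the process's exit code.
    """
    process.terminate()  # polite request (SIGTERM on POSIX)
    try:
        return process.wait(timeout=grace_seconds)
    except subprocess.TimeoutExpired:
        process.kill()  # forced stop after the grace period
        return process.wait()
```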


## Running remotely <a name="remote" />

Let's now run the photon remotely by creating, pushing, and running it in your workspace. First, let's log in.

Go to [https://dashboard.lepton.ai/credentials](https://dashboard.lepton.ai/credentials), log in, and copy your workspace's credentials into the line below, replacing `INSERT_YOUR_CREDENTIAL_HERE`. The credential looks like `jazwwwt0:dsfsdweldhifdsfdsfd`.

In [9]:
!lep login -c INSERT_YOUR_CREDENTIAL_HERE

    _     _____ ____ _____ ___  _   _       _    ___     
   | |   | ____|  _ \_   _/ _ \| \ | |     / \  |_ _|    
   | |   |  _| | |_) || || | | |  \| |    / _ \  | |     
   | |___| |___|  __/ | || |_| | |\  |   / ___ \ | |     
   |_____|_____|_|    |_| \___/|_| \_|  /_/   \_\___|    

Logged in to your workspace jazwwwt0.
        build time: 2023-08-15_17-48-15
           version: 6873aa1


Cool, let's run it.

In [10]:
!lep photon create -n tts -m tts.py
!lep photon push -n tts
!lep photon run -n tts --deployment-name tts --resource-shape gpu.t4

Photon [32mtts[0m created.
Photon [32mtts[0m pushed to workspace.
Running the most recent version of [32mtts[0m: tts-9p34ausg
Photon launched as [32mtts[0m. Use `lep deployment status -n tts` to check the status.


In [12]:
# Let's see if the photon is running. If it hasn't finished starting yet, wait a bit and re-check.
!lep deployment status -n tts

Created at: 2023-08-15 11:56:19
Photon ID:  tts-9p34ausg
State:      Running
Endpoint:   https://jazwwwt0-tts.cloud.lepton.ai
Is Public:  No
Replicas List:
┏━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━┳━━━━━━━━━┓
┃ replica id           ┃ status ┃ message ┃
┡━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━╇━━━━━━━━━┩
│ tts-67d6b5958f-g5lsd │ Ready  │ (empty) │
└──────────────────────┴────────┴─────────┘
1 out of 1 replicas ready.


In [13]:
# These are helper commands for us to get current workspace's id and token
# so we can login via the client.
!echo Workspace id is `lep workspace id`
!echo Workspace token is `lep workspace token`

Workspace id is jazwwwt0
Workspace token is mg***[redacted]***2a


Let's create the client. Once we have it, we can run the code exactly as if we were accessing the local server above:

In [14]:
from leptonai.client import Client
# Note: copy the id and token above to this line.
c = Client("jazwwwt0", "tts", token="mg***[redacted]***2a")
print("Possible paths are:")
print(c.paths())

Possible paths are:
dict_keys(['/list_languages', '/list_speakers', '/get_model_name', '/tts'])
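
Pasting the token into a notebook cell makes it easy to leak when the notebook is shared. One alternative is to read the workspace id and token from environment variables. The snippet below is a minimal sketch; the variable names `LEPTON_WORKSPACE_ID` and `LEPTON_WORKSPACE_TOKEN` are made up for this example and are not names the Lepton tools are known to read.

```python
import os


def load_workspace_credentials(env=None):
    """Return (workspace_id, token) from environment variables.

    The variable names are hypothetical, chosen for this example only.
    """
    env = os.environ if env is None else env
    workspace_id = env.get("LEPTON_WORKSPACE_ID")
    token = env.get("LEPTON_WORKSPACE_TOKEN")
    missing = [
        name
        for name, value in (
            ("LEPTON_WORKSPACE_ID", workspace_id),
            ("LEPTON_WORKSPACE_TOKEN", token),
        )
        if not value
    ]
    if missing:
        raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")
    return workspace_id, token
```

The returned values can then be passed to the client in the same way as above, as in `Client(workspace_id, "tts", token=token)`.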


In [15]:
# The example exposes 4 different paths. Let's look at
# the documentation of each path, which is automatically
# generated by the SDK.
help(c.get_model_name)
help(c.list_languages)
help(c.list_speakers)
help(c.tts)

Help on function get_model_name in module leptonai.client:

get_model_name(*args, **kwargs)
    Returns the name of the current model.
    
    Automatically inferred parameters from openapi:
    
    Input Schema: None
    
    Output Schema:
      output: str

Help on function list_languages in module leptonai.client:

list_languages(*args, **kwargs)
    Returns a list of languages supported by the current model. Empty list
    if no model is loaded, or the model does not support multiple languages.
    
    Automatically inferred parameters from openapi:
    
    Input Schema: None
    
    Output Schema:
      output: array[str]

Help on function list_speakers in module leptonai.client:

list_speakers(*args, **kwargs)
    Returns a list of speakers supported by the current model. Empty list
    if no model is loaded, or the model does not support multiple speakers.
    
    Automatically inferred parameters from openapi:
    
    Input Schema: None
    
    Output Schema:
      out

In [16]:
# Let's actually run a tts example.
text = """
It was the best of times, it was the worst of times,
it was the age of wisdom, it was the age of foolishness,
it was the epoch of belief, it was the epoch of incredulity,
it was the season of light, it was the season of darkness,
it was the spring of hope, it was the winter of despair.
"""
audio = c.tts(text=text, speaker="p225")
import IPython
IPython.display.Audio(audio)

Great! Once we are done, let's clean up the deployment.

In [17]:
# Once we are done, let's shut down the remote service.
!lep deployment remove -n tts

Deployment [32mtts[0m removed.


## Conclusion

That's it! You can find more resources at:
- [the Lepton AI example repo](https://github.com/leptonai/examples)
- [the Lepton AI documentation](https://lepton.ai/docs)

And you are more than welcome to [email us](mailto:info@lepton.ai).