# API Tutorial - Text to Speech

<a target="_blank" href="https://colab.research.google.com/github/ai-amplified/models/blob/main/tutorials/Text%20to%20Speech.ipynb">
  <img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/>
</a>

## Short Description
We have three models for Text to Speech based on:
- **Autoregressive Diffusion (Premium Quality)**
- **Style Diffusion with Adversarial Learning (High Quality)**
- **Variational Inference with Adversarial Learning (Standard Quality)**

### TTS in English, Autoregressive Diffusion (Premium Quality)
This emotion-sensitive TTS model delivers highly accurate and expressive speech from input text, at up to **10x cheaper** than top-tier competitors. Built on Autoregressive Diffusion, it combines autoregressive transformers and diffusion models to produce smooth, coherent speech. The model allows a dynamic tradeoff between speed and performance, and supports various applications with diverse US and UK male and female voices.

### TTS in English, Style Diffusion with Adversarial Learning (High Quality)
This model meets the rising demand for expressive, natural-sounding speech synthesis by using style diffusion and adversarial training. These techniques improve the model's ability to produce authentic-sounding speech, appealing to discerning listeners. With a diverse range of US male and female voices, the model is well suited to immersive storytelling, personalized virtual assistants, and interactive gaming dialogues.

### Variational Inference with Adversarial Learning (Standard Quality)
This TTS model, up to **7x cheaper** than competitors, delivers exceptional value without compromising performance. Built on the VITS architecture, it offers ultra-fast inference and affordability, leveraging a conditional variational autoencoder with adversarial learning for simplified, high-performance TTS. Supporting multiple languages, this model is ideal for creating natural-sounding speech for diverse applications, enabling businesses and developers to captivate audiences and craft immersive experiences with ease and authenticity.


## Tutorial
This tutorial will guide you through using the Text to Speech API. By following the steps below, you'll be able to convert text to speech using the API. The main steps involved are:

1. Creating an access token
2. Installing the aimped library
3. Running the API with your credentials and payload

## Step 1: Create Access Token

To use the API, you need an access token. Follow these steps to create one:

1. Go to the [API Access Token Creation Page](https://aimped.ai/a3m/#/tokens). You will land here:
![Token Creation Page](images/token_11.png)

2. Select scopes and click "Create Token".
3. After clicking this button, a pop-up appears from which you can copy the User Key and User Secret.

![Token Creation Page2](images/token_22.png)

4. Copy the generated User Key and User Secret and keep them safe. You'll need them for the next steps.

## Step 2: Install aimped Library
To interact with the API, you need to install the aimped Python library. In a notebook cell, run the following command (in a terminal or command prompt, drop the leading `!`):

In [2]:
!pip install aimped==0.2.2

This command will install the necessary library to communicate with the API.

## Step 3: Run the API
Now that you have your User Key and User Secret and the library installed, you can call the Text to Speech API. Follow these steps:

### Set up your credentials:

In [3]:
user_key = "YOUR_USER_KEY"
user_secret = "YOUR_USER_SECRET"
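Hard-coding the User Key and User Secret in a notebook makes them easy to leak. As a minimal sketch, the credentials could instead be read from environment variables. The variable names `AIMPED_USER_KEY` and `AIMPED_USER_SECRET` and the helper `load_credentials` are hypothetical, chosen only for this illustration; they are not part of the aimped library.

```python
import os


def load_credentials(env=os.environ):
    """Read the User Key and User Secret from environment variables.

    AIMPED_USER_KEY and AIMPED_USER_SECRET are hypothetical names used
    only in this sketch; pick whatever names suit your setup.
    """
    try:
        return env["AIMPED_USER_KEY"], env["AIMPED_USER_SECRET"]
    except KeyError as missing:
        # missing.args[0] is the name of the variable that was not found
        raise RuntimeError(
            f"Set the {missing.args[0]} environment variable first"
        ) from None
```

With both variables set, `user_key, user_secret = load_credentials()` would replace the two assignments above.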

### Import the AimpedAPI class and set the base URL and model ID:
To use a different model or language, you only need to change the **Model ID**. The Model ID can be found under "API Information" in the "API Details" tab on each model card.

In [15]:
from aimped.services.api import AimpedAPI

BASE_URL = 'https://aimped.ai'
model_id = "134" # the Model ID can be found under "API Information" in the "API Details" tab on each model card.

### Initialize the API service:

In [16]:
api_service = AimpedAPI(user_key, user_secret, {"base_url": BASE_URL})

### Define your payload:
Define the payload according to your input data type (inline text or an uploaded text file).

### Model - Autoregressive Diffusion (Premium Quality)

#### For Text Input

In [20]:
payload = {
  "data_type": "data_json",
  "extra_fields": {
      "voice_id": "Ava (USA-Female)", # Lucy (USA-Female), Chloe (USA-Female), Amelia (UK-Female), John (USA-Male), Paul (USA-Male), Jackson (UK-Male)
      "format": "wav",     # mp3, flac, opus, ogg, aac, mulaw
      "temperature": 0.86, # min: 0.01, max: 2
      "iterations": 82,    # min: 5, max: 150
  },
  "data_json": {
    "text": [
      "I am so relaxed and very happy to hear from you. It's amazing. I am so excited to give you this good news."
    ]
  }
}


#### For File Input

In [4]:
path_uri_obj = api_service.file_upload(
    model_id,
    '/Users/John/Downloads/sample.txt'  # sample file path to upload
    )
path_uri = path_uri_obj['url']

payload = {
  "data_type": "data_txt",
  "extra_fields": {
      "voice_id": "Ava (USA-Female)", # Lucy (USA-Female), Chloe (USA-Female), Amelia (UK-Female), John (USA-Male), Paul (USA-Male), Jackson (UK-Male)
      "format": "wav",     # mp3, flac, opus, ogg, aac, mulaw
      "temperature": 0.86, # min: 0.01, max: 2
      "iterations": 82,    # min: 5, max: 150
  },
  "data_txt": [
      path_uri # Path of your text file
  ]
}
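The two payloads above share one shape: `data_type` names the key that carries the input, which is `data_json` for inline text and `data_txt` for uploaded file URIs. A minimal sketch of a helper that builds either form is shown here. `build_payload` is a hypothetical name introduced for illustration, not a function of the aimped library, and the structure is inferred only from the examples in this tutorial.

```python
def build_payload(extra_fields, texts=None, file_uris=None):
    """Build a TTS payload for inline text or for uploaded text files.

    Hypothetical helper: the layout mirrors the tutorial's examples.
    """
    # Exactly one input kind must be supplied.
    if (texts is None) == (file_uris is None):
        raise ValueError("Pass exactly one of texts or file_uris")
    if texts is not None:
        return {
            "data_type": "data_json",
            "extra_fields": dict(extra_fields),
            "data_json": {"text": list(texts)},
        }
    return {
        "data_type": "data_txt",
        "extra_fields": dict(extra_fields),
        "data_txt": list(file_uris),
    }
```

For example, `build_payload({"voice_id": "Ava (USA-Female)", "format": "wav"}, texts=["Hello there."])` yields a text payload, while passing `file_uris=[path_uri]` yields the file form.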

### Model - Style Diffusion with Adversarial Learning (High Quality)

#### For Text Input

In [None]:
payload = {
  "data_type": "data_json",
  "extra_fields": {
      "voice_id": "Andre (USA-Male)", # Finn (USA-Male), Jack (USA-Male), Emily (USA-Female), Ivy (USA-Female)
      "format": "wav",      # mp3, flac, opus, ogg, aac, mulaw
      "timbre": 0.7,        # min: 0.1, max: 1
      "prosody": 0.9,       # min: 0.1, max: 1
      "diff_step": 11,      # min: 5, max: 20 
      "emotion_scale": 3.6, # min: 0, max: 5
  },
  "data_json": {
    "text": [
      "I am so relaxed and very happy to hear from you. It's amazing. I am so excited to give you this good news."
    ]
  }
}

#### For File Input

In [None]:
path_uri_obj = api_service.file_upload(
    model_id,
    '/Users/John/Downloads/sample.txt'  # sample file path to upload
    )
path_uri = path_uri_obj['url']

payload = {
  "data_type": "data_txt",
  "extra_fields": {
      "voice_id": "Andre (USA-Male)", # Finn (USA-Male), Jack (USA-Male), Emily (USA-Female), Ivy (USA-Female)
      "format": "wav",      # mp3, flac, opus, ogg, aac, mulaw
      "timbre": 0.7,        # min: 0.1, max: 1
      "prosody": 0.9,       # min: 0.1, max: 1
      "diff_step": 11,      # min: 5, max: 20 
      "emotion_scale": 3.6, # min: 0, max: 5
  },
  "data_txt": [
      path_uri # Path of your text file
  ]
}

### Model - Variational Inference with Adversarial Learning (Standard Quality)
The VITS model offers TTS in a wide range of languages. Make sure to choose the corresponding Model ID and Voice ID.

#### For Text Input

In [None]:
payload = {
  "data_type": "data_json",
  "extra_fields": {
      "voice_id": "Benjamin (Male)", # Elijah (Male), Daniel (Male), Evelyn (Female), Avery (Female)
      "format": "wav", # mp3, flac, opus, ogg, aac, mulaw
      "speed": 1,      # min: 0.1, max: 10
  },
  "data_json": {
    "text": [
      "I am so relaxed and very happy to hear from you. It's amazing. I am so excited to give you this good news."
    ]
  }
}

#### For File Input

In [None]:
path_uri_obj = api_service.file_upload(
    model_id,
    '/Users/John/Downloads/sample.txt'  # sample file path to upload
    )
path_uri = path_uri_obj['url']

payload = {
  "data_type": "data_txt",
  "extra_fields": {
      "voice_id": "Benjamin (Male)", # Elijah (Male), Daniel (Male), Evelyn (Female), Avery (Female)
      "format": "wav", # mp3, flac, opus, ogg, aac, mulaw
      "speed": 1,      # min: 0.1, max: 10
  },
  "data_txt": [
      path_uri # Path of your text file
  ]
}
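Each model's numeric `extra_fields` have a minimum and maximum, listed in the comments of the payloads above. A small local check before calling the API can catch out-of-range values early. The sketch below copies those ranges from the comments. It assumes the bounds are inclusive, which the tutorial does not state, and `check_extra_fields` is a hypothetical helper, not part of the aimped library.

```python
# Ranges copied from the comments in the payload examples.
# Treating both bounds as inclusive is an assumption of this sketch.
PARAM_RANGES = {
    "temperature": (0.01, 2),    # Autoregressive Diffusion
    "iterations": (5, 150),      # Autoregressive Diffusion
    "timbre": (0.1, 1),          # Style Diffusion
    "prosody": (0.1, 1),         # Style Diffusion
    "diff_step": (5, 20),        # Style Diffusion
    "emotion_scale": (0, 5),     # Style Diffusion
    "speed": (0.1, 10),          # VITS
}


def check_extra_fields(extra_fields):
    """Return a list of problems; an empty list means all known fields are in range."""
    problems = []
    for name, value in extra_fields.items():
        if name not in PARAM_RANGES:
            continue  # voice_id, format, etc. are not range-checked here
        low, high = PARAM_RANGES[name]
        if not low <= value <= high:
            problems.append(f"{name}={value} is outside [{low}, {high}]")
    return problems
```

Calling `check_extra_fields(payload["extra_fields"])` before running the model returns an empty list when every numeric setting is within its documented range.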

### Run the model:

In [21]:
result = api_service.run_model(model_id, payload)

If you're running this model for the first time or after a long time, you might see the following message:

In [19]:
print(result)

{'message': 'We will notify you via email when the instance is ready.'}


Wait for the email notification indicating that the instance is ready. You will also be notified on the [Aimped](https://aimped.ai/) platform.
![Notification Page](images/notif_1.png)

Once the instance is ready, you will see this notification:
![Notification Page2](images/tts_notif.png)

Once you receive the email or the notification on Aimped, run the model again:

In [44]:
result = api_service.run_model(model_id, payload)

In [2]:
result

{'status': True,
 'used_credits': 0.38,
 'data_type': ['data_audio'],
 'output': {'data_audio': ['output/audio/model_554/user_3229/003ae05d_test.wav']}}
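The wait-and-rerun cycle described above could also be scripted. The sketch below retries a call until the response contains a `status` key, as in the successful result shown here, and otherwise treats the response as the "We will notify you" message. This is only an illustration: the tutorial recommends waiting for the notification, and whether repeated calls are appropriate for the service is an assumption. `run_when_ready` and its parameters are hypothetical names.

```python
import time


def run_when_ready(run, max_attempts=10, wait_seconds=60, sleep=time.sleep):
    """Call run() until it returns a result with a 'status' key.

    run is any zero-argument callable returning the API response dict.
    Raises TimeoutError if no such result arrives within max_attempts calls.
    """
    for attempt in range(max_attempts):
        result = run()
        if "status" in result:
            return result
        # Otherwise assume the instance is still starting up.
        if attempt < max_attempts - 1:
            sleep(wait_seconds)
    raise TimeoutError(f"Instance not ready after {max_attempts} attempts")
```

A possible use is `result = run_when_ready(lambda: api_service.run_model(model_id, payload))`.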

In [None]:
# Download and save the audio file
source_url = 'output/audio/model_554/user_3229/003ae05d_test.wav'  # path returned in result['output']['data_audio']
target_path = '/Users/John/Downloads/audio.wav'
api_service.file_download_and_save(source_url, target_path)
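Rather than copying the audio path by hand, it can be read from the response. In the sample result above, `result['output']['data_audio']` is a list of paths. The sketch below assumes every successful response has that shape; `audio_paths` is a hypothetical helper, not part of the aimped library.

```python
def audio_paths(result):
    """Return the list of generated audio paths from a model response.

    Assumes the response layout shown in the tutorial's sample result.
    """
    if not result.get("status"):
        raise ValueError(f"Model run did not succeed: {result}")
    return result["output"]["data_audio"]


# Sample response copied from the tutorial output.
sample_result = {
    "status": True,
    "used_credits": 0.38,
    "data_type": ["data_audio"],
    "output": {"data_audio": ["output/audio/model_554/user_3229/003ae05d_test.wav"]},
}
first_path = audio_paths(sample_result)[0]
```

The value in `first_path` can then be passed as the first argument to `api_service.file_download_and_save`.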