<a href="https://colab.research.google.com/github/Conv-AI/Convai_Documentation/blob/main/Convai_STT_Tutorial.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Convai STT API Tutorial

This interactive Python notebook contains code snippets that illustrate the API endpoints for the speech-to-text services provided by Convai.

This notebook covers offline transcription only, where you generate a transcript by uploading an audio file.


Note:
1. Before starting this tutorial, get your API key from the profile section.
2. Upload a sample audio file to the Colab session so that you have something to transcribe.
3. Convai supports transcript generation for **wav** and **mp3** formats only.

For more details, refer to the [documentation](https://docs.convai.com/api-docs/reference/api-reference/speech-to-text-api).

### Imports

In [1]:
import requests
import json
import os

### Set up API key

In [11]:
os.environ["CONVAI_API_KEY"] = "your-api-key"  # Replace with the key from your profile section.
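
If the placeholder value is left unchanged, the requests below fail only once they reach the server. A minimal sketch of a local check that catches this earlier; the helper name `get_api_key` is hypothetical and not part of the Convai API:

```python
import os


def get_api_key(env_var: str = "CONVAI_API_KEY") -> str:
    """Return the API key from the environment, or raise if it is missing.

    Hypothetical helper: it only inspects the environment variable and
    does not contact Convai to verify that the key is valid.
    """
    key = os.environ.get(env_var, "").strip()
    # Treat the tutorial's placeholder value as "not set".
    if not key or key == "your-api-key":
        raise RuntimeError(f"Set {env_var} to your Convai API key before sending requests.")
    return key
```

The headers in the later cells could then be built with `{"CONVAI-API-KEY": get_api_key()}` instead of calling `os.getenv` directly.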

### Offline STT Call

This is a simple request to the endpoint that returns the transcript.

You can get back only the transcript of the audio, or additional details with timestamps for individual sections of the transcript, by setting the **enableTimestamps** attribute in the request.

In [9]:
STT_URL = "https://api.convai.com/stt/"
TIMESTAMP_REQUIRED_FLAG = True   # Set to False to get back only the transcript.

payload = {}
if TIMESTAMP_REQUIRED_FLAG:
  # Sent as a form field. No need to set it when False (the default).
  payload["enableTimestamps"] = "True"

headers = {
  "CONVAI-API-KEY": os.getenv("CONVAI_API_KEY"),
}

# Open the audio file in a context manager so that it is closed after the upload.
with open('path-to-audio', 'rb') as audio_file:
  files = {"file": audio_file}
  response = requests.post(STT_URL, headers=headers, data=payload, files=files)

data = response.json()

if response.status_code == 200:
  print("Transcript generated successfully")
  print("Complete Transcript: ", data["result"])

  if TIMESTAMP_REQUIRED_FLAG:   # Details are returned only when timestamps were requested
    print(data["details"])

else:
  print("Failed to generate transcript")
  print("Error: ", data["ERROR"])

Transcript generated successfully
Complete Transcript:  Ayon asked me to say puzzle, so I told Fuzz. 
[{'id': '1', 'start-time': '0:00:00,720', 'end-time': '0:00:03,839', 'text': 'Ayon asked me to say puzzle, so I told Fuzz. '}]
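
The `start-time` and `end-time` values in the output above are strings. A minimal sketch of converting them to seconds, assuming every timestamp follows the `H:MM:SS,mmm` pattern seen in this sample output; the helper name `parse_timestamp` is hypothetical and not part of the Convai API:

```python
def parse_timestamp(ts: str) -> float:
    """Convert a timestamp such as '0:00:03,839' to seconds.

    Assumes the 'H:MM:SS,mmm' layout from the sample output above;
    other layouts are not handled.
    """
    clock, _, millis = ts.partition(",")
    hours, minutes, seconds = (int(part) for part in clock.split(":"))
    return hours * 3600 + minutes * 60 + seconds + int(millis or 0) / 1000


# The "details" list copied from the sample output above.
details = [{'id': '1', 'start-time': '0:00:00,720', 'end-time': '0:00:03,839',
            'text': 'Ayon asked me to say puzzle, so I told Fuzz. '}]

for segment in details:
    start = parse_timestamp(segment["start-time"])
    end = parse_timestamp(segment["end-time"])
    print(f"[{start:.3f}s - {end:.3f}s] {segment['text'].strip()}")
```

With the values in seconds, segment durations can be computed or the segments aligned with other media.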


The endpoint currently supports audio files in **wav** and **mp3** formats only. We will extend support to other formats soon. If you have any doubts or a particular requirement, please reach out to our [Support Team](mailto:support@convai.com).
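
Because only two formats are accepted, it may help to check the file name before uploading. A small sketch of such a check; the helper name `is_supported_audio` is hypothetical, and it looks only at the file extension, not at the actual audio content:

```python
from pathlib import Path

# Formats listed as supported in this tutorial.
SUPPORTED_EXTENSIONS = {".wav", ".mp3"}


def is_supported_audio(path: str) -> bool:
    """Return True if the file name ends in a supported extension (case-insensitive)."""
    return Path(path).suffix.lower() in SUPPORTED_EXTENSIONS


for name in ["sample.wav", "interview.MP3", "clip.flac"]:
    print(name, is_supported_audio(name))
```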

### Add New Words

You can also add new words to the vocabulary whenever you want the service to pay attention to how those words are pronounced.

The following code adds a new word. Say we want to add the word **"convai"** to our list.

In [10]:
STT_ADD_WORD_URL = "https://api.convai.com/stt/add-words"

payload = json.dumps({
  "word": "convai"
})
headers = {
  "CONVAI-API-KEY": os.getenv("CONVAI_API_KEY"),
  "Content-Type": "application/json"
}

response = requests.request("POST", STT_ADD_WORD_URL, headers=headers, data=payload)

if response.status_code == 200:
  print("New word added")

  data = response.json()
  print("Status: ", data["STATUS"])


else:
  print("Failed to add words")

  data = response.json()
  print("Error: ", data["ERROR"])

New word added
Status:  0
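
To add several words, one option is to prepare one request body per word and send them in a loop. The sketch below only builds the bodies. It assumes the endpoint takes a single `"word"` per request, as in the example above, and that duplicates can be dropped case-insensitively; both are assumptions, and the helper name `build_add_word_payloads` is hypothetical:

```python
import json


def build_add_word_payloads(words):
    """Return one JSON body per unique, non-empty word, preserving input order.

    Assumption: words that differ only by case or surrounding
    whitespace are treated as duplicates.
    """
    seen = set()
    payloads = []
    for word in words:
        cleaned = word.strip()
        if not cleaned or cleaned.lower() in seen:
            continue
        seen.add(cleaned.lower())
        payloads.append(json.dumps({"word": cleaned}))
    return payloads


payloads = build_add_word_payloads(["convai", "Convai ", "", "metaverse"])
print(payloads)
```

Each element of `payloads` can then be passed as the `data` argument of the request shown in the cell above.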


And that wraps up the basic tutorial on the Speech-To-Text service provided by Convai.