![https://pieriantraining.com/](../PTCenteredPurple.png)

In this notebook, we will walk through how to use boto3 to interact with [AWS Polly](https://aws.amazon.com/polly/) to convert text to speech,
[AWS Transcribe](https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/transcribe.html) to convert speech into text, and [AWS Translate](https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/translate.html) to translate text from one language to another.


### Synthesize Speech
AWS Polly converts text to speech.
The core function we need to use is *[polly_client.synthesize_speech](https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/polly/client/synthesize_speech.html)*, to which we can pass:
- Engine='standard'|'neural': What model you want to use
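- LanguageCode: Only necessary for bilingual voices (such as Aditi, which speaks Indian English and Hindi), where it selects the language to use
- OutputFormat: Desired output format (e.g. 'mp3', 'ogg_vorbis' or 'pcm')
- OutputS3BucketName: Only used by the asynchronous *start_speech_synthesis_task*, which writes the audio to this S3 bucket; *synthesize_speech* returns the audio directly in its response
- Text: The text you want to synthesize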
- VoiceId: The voice id you want to use

See [here](https://docs.aws.amazon.com/polly/latest/dg/SupportedLanguage.html) for the supported language codes and [here](https://docs.aws.amazon.com/polly/latest/dg/voicelist.html) for the available voices.
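The parameters above can be collected into a keyword-argument dictionary before calling the API. The sketch below uses a hypothetical helper of our own, `build_polly_request` (not part of boto3). It rejects unknown engines, where the accepted set is an assumption since Polly has added engines over time, and it only includes `LanguageCode` when one is given.

```python
from typing import Optional

# Engines accepted by this sketch; Polly may support more (assumption, check the docs).
KNOWN_ENGINES = {"standard", "neural"}


def build_polly_request(text: str, voice_id: str, engine: str = "neural",
                        output_format: str = "mp3",
                        language_code: Optional[str] = None) -> dict:
    """Return keyword arguments for polly_client.synthesize_speech (hypothetical helper)."""
    if not text.strip():
        raise ValueError("text must not be empty")
    if engine not in KNOWN_ENGINES:
        raise ValueError(f"unknown engine: {engine!r}")
    request = {
        "Engine": engine,
        "Text": text,
        "OutputFormat": output_format,
        "VoiceId": voice_id,
    }
    # LanguageCode only matters for bilingual voices, so leave it out otherwise.
    if language_code is not None:
        request["LanguageCode"] = language_code
    return request
```

With it, the call in the next cell could be written as `polly_client.synthesize_speech(**build_polly_request(text, "Matthew"))`.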


Let's try it out!

In [2]:
import boto3

In [9]:
polly_client = boto3.client('polly', region_name="us-east-1")

text = "Hello everyone, I hope you are enjoying the course so far and are learning many cool things. Let's see how well AWS can transcribe this text"

response = polly_client.synthesize_speech(
    Engine="neural",  # Neural usually sounds better
    Text=text, 
    OutputFormat='mp3', 
    VoiceId='Matthew'
)

# Write the returned audio stream to a local MP3 file
with open('speech.mp3', 'wb') as f:
    f.write(response['AudioStream'].read())


You can now open speech.mp3 in the notebook's working directory and listen to the generated speech!
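Reading the whole `AudioStream` with a single `.read()` is fine for a short clip like this one. For longer audio, one option is to copy the stream to disk in chunks. The sketch below assumes only that the stream object has a `read(n)` method, which botocore's streaming body provides; `save_audio_stream` is a hypothetical helper, not part of boto3.

```python
import shutil
from typing import BinaryIO


def save_audio_stream(stream: BinaryIO, path: str, chunk_size: int = 64 * 1024) -> None:
    """Copy a readable binary stream (e.g. response['AudioStream']) to a file in chunks."""
    with open(path, "wb") as f:
        # copyfileobj calls stream.read(chunk_size) repeatedly until it returns b"".
        shutil.copyfileobj(stream, f, chunk_size)
```

Usage: `save_audio_stream(response['AudioStream'], 'speech.mp3')`. Note that the stream can only be read once, so call this instead of (not after) the `.read()` above.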

In the next step, we will check whether AWS Transcribe can turn this file back into text.

### Transcribing an audio file

AWS Transcribe converts audio files containing speech into text. Let's transcribe an audio file.
The main function we need is *[client.start_transcription_job](https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/transcribe/client/start_transcription_job.html)*, to which we pass the following arguments:

- TranscriptionJobName: The name of your job
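- LanguageCode: The language code of the spoken language. Alternatively, set IdentifyLanguage=True to let Transcribe detect the language
- Media: S3 location of the file you want to transcribe, passed as {'MediaFileUri': 's3://bucket/key'}
- OutputBucketName: [Not required] Name of the bucket to write the transcript to. If you omit it, AWS stores the transcript in a service-managed bucket and returns a temporary URL, so you do not need to change any access rights
- MediaFormat: Format of the input file (e.g. mp3)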
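The argument list above can also be assembled programmatically. The sketch below uses a hypothetical helper, `build_transcription_request` (not part of boto3). It appends a random suffix to the job name because job names must be unique within an account, and it falls back to IdentifyLanguage=True when no language code is given.

```python
import uuid
from typing import Optional


def build_transcription_request(bucket: str, key: str, media_format: str = "mp3",
                                language_code: Optional[str] = None,
                                output_bucket: Optional[str] = None,
                                job_prefix: str = "TranscribeJob") -> dict:
    """Return keyword arguments for transcribe_client.start_transcription_job (hypothetical helper)."""
    request = {
        # Job names must be unique in the account, so add a random suffix.
        "TranscriptionJobName": f"{job_prefix}-{uuid.uuid4().hex[:8]}",
        "Media": {"MediaFileUri": f"s3://{bucket}/{key}"},
        "MediaFormat": media_format,
    }
    if language_code is not None:
        request["LanguageCode"] = language_code
    else:
        # Without a language code, ask Transcribe to detect the language itself.
        request["IdentifyLanguage"] = True
    if output_bucket is not None:
        request["OutputBucketName"] = output_bucket
    return request
```

Usage: `transcribe_client.start_transcription_job(**build_transcription_request("demo-bucket-transcription", "speech.mp3", language_code="en-US"))`. Keep the returned job name, since you need it to poll and delete the job later.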

After starting the job, we need to wait until it has finished before we can download the results.

So let's create a new bucket, upload our mp3 file to it and obtain its text content!

In [1]:
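# Create the bucket in us-east-1, the same region as our other clients
# (other regions require a CreateBucketConfiguration with a LocationConstraint).
# Bucket names are globally unique, so you may need to choose a different name.
client = boto3.client('s3', region_name="us-east-1")
client.create_bucket(Bucket="demo-bucket-transcription")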

{'ResponseMetadata': {'RequestId': '1B8N6Z6FP4H1PZJ6',
  'HostId': 'JauxpPmSvWvkh71L1j9EkWFHwvg8WVVydF7wpoyqWdgPETfGQ7qJMy+zxvQdaTMguwuOadexmX4N3z+hUekeQQ==',
  'HTTPStatusCode': 200,
  'HTTPHeaders': {'x-amz-id-2': 'JauxpPmSvWvkh71L1j9EkWFHwvg8WVVydF7wpoyqWdgPETfGQ7qJMy+zxvQdaTMguwuOadexmX4N3z+hUekeQQ==',
   'x-amz-request-id': '1B8N6Z6FP4H1PZJ6',
   'date': 'Wed, 13 Sep 2023 16:31:53 GMT',
   'location': '/demo-bucket-transcription',
   'server': 'AmazonS3',
   'content-length': '0'},
  'RetryAttempts': 0},
 'Location': '/demo-bucket-transcription'}

In [12]:
client.upload_file(Filename="speech.mp3", Bucket="demo-bucket-transcription", Key="speech.mp3")

Now let's start the transcription job.

In [14]:
transcribe_client = boto3.client('transcribe', region_name="us-east-1")
job_name = "TranscribeJob1"
transcribe_client.start_transcription_job(
    TranscriptionJobName=job_name,
    Media={'MediaFileUri': "s3://demo-bucket-transcription/speech.mp3"},
    MediaFormat="mp3",
    LanguageCode="en-US" # https://docs.aws.amazon.com/transcribe/latest/dg/supported-languages.html
)


{'TranscriptionJob': {'TranscriptionJobName': 'TranscribeJob1',
  'TranscriptionJobStatus': 'IN_PROGRESS',
  'LanguageCode': 'en-US',
  'MediaFormat': 'mp3',
  'Media': {'MediaFileUri': 's3://demo-bucket-transcription/speech.mp3'},
  'StartTime': datetime.datetime(2023, 9, 15, 10, 15, 49, 232000, tzinfo=tzlocal()),
  'CreationTime': datetime.datetime(2023, 9, 15, 10, 15, 49, 205000, tzinfo=tzlocal())},
 'ResponseMetadata': {'RequestId': '9347a5e6-2336-41ee-a959-e3972bca7d31',
  'HTTPStatusCode': 200,
  'HTTPHeaders': {'x-amzn-requestid': '9347a5e6-2336-41ee-a959-e3972bca7d31',
   'content-type': 'application/x-amz-json-1.1',
   'content-length': '274',
   'date': 'Fri, 15 Sep 2023 08:15:49 GMT'},
  'RetryAttempts': 0}}

We can poll *get_transcription_job(TranscriptionJobName)* in a loop until the job status is either COMPLETED or FAILED.

In [15]:
import time

# Poll the job every 10 seconds until it has either completed or failed
while True:
    status = transcribe_client.get_transcription_job(TranscriptionJobName=job_name)
    if status['TranscriptionJob']['TranscriptionJobStatus'] in ['COMPLETED', 'FAILED']:
        break
    print("Not ready yet...")
    time.sleep(10)
print(status)

Not ready yet...
{'TranscriptionJob': {'TranscriptionJobName': 'TranscribeJob1', 'TranscriptionJobStatus': 'COMPLETED', 'LanguageCode': 'en-US', 'MediaSampleRateHertz': 24000, 'MediaFormat': 'mp3', 'Media': {'MediaFileUri': 's3://demo-bucket-transcription/speech.mp3'}, 'Transcript': {'TranscriptFileUri': 'https://s3.us-east-1.amazonaws.com/aws-transcribe-us-east-1-prod/472948420345/TranscribeJob1/7e46537c-e157-4558-a6f9-79153c506060/asrOutput.json?X-Amz-Security-Token=IQoJb3JpZ2luX2VjEHAaCXVzLWVhc3QtMSJHMEUCIEF%2FqXXnqQuUDmNG9b6UCxR7hLlyvf%2FmAt7alEhiXuIqAiEA22aoLLUfj2Af6Yel90I5MQ5LiN%2FE%2FEA8yJ0XxquaAkwqsgUIWBAEGgwyNzY2NTY0MzMxNTMiDBxyonLbop3GLzzzkyqPBXQyssGV2Qxf%2Bwv8IsBk9RGJulSm%2FmW3h%2FuvVHHUI2r9FC37IJi8zUWuUTttIkxtc7OXF8svwaBs6K02mulO69XO6BnODBz5BmEovTMfpqj0gdHpdSNo82QrkrMUPKWwp1X1ZrJNspms3SMsfG2fIUAe3ywdLpwSPsiqw3b3uCED0Z5VMRp4tTfmk7ZcXtNSH1WOb5HEBG55YeaEUJEGnBneKTQqXVkJcSIptuK8fJQYIpCQC3kU6b9EfNaE8MifOELAjbXpKtDMLJS3Iijp%2FWCipTAHDsGJeXcozrcmd%2FxuAaD8mT5ZUjFCcQtOWQ2EEozbBKCkH
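The loop above polls every 10 seconds with no upper bound. A slightly more defensive variant adds a timeout and doubles the wait between calls. This is a sketch: `wait_for_job` is a hypothetical helper, and taking the status getter (and the sleep function) as callables keeps it independent of boto3, so it can be tried without AWS credentials.

```python
import time
from typing import Callable


def wait_for_job(get_status: Callable[[], str], timeout: float = 600.0,
                 initial_delay: float = 2.0, max_delay: float = 30.0,
                 sleep: Callable[[float], None] = time.sleep) -> str:
    """Poll get_status() until it returns COMPLETED or FAILED, backing off between calls."""
    deadline = time.monotonic() + timeout
    delay = initial_delay
    while True:
        status = get_status()
        if status in ("COMPLETED", "FAILED"):
            return status
        # Give up if the next wait would take us past the deadline.
        if time.monotonic() + delay > deadline:
            raise TimeoutError(f"job still {status} after {timeout} seconds")
        sleep(delay)
        delay = min(delay * 2, max_delay)  # exponential backoff, capped at max_delay
```

With the real client, the getter could be `lambda: transcribe_client.get_transcription_job(TranscriptionJobName=job_name)['TranscriptionJob']['TranscriptionJobStatus']`.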

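Once the transcription job is complete, the job description contains a presigned S3 URL (TranscriptFileUri) from which you can download the results as a JSON file. Since we did not specify an OutputBucketName, this URL is only valid for a limited time, so download the file soon after the job finishes. We can do this from Python with the requests library.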

In [16]:
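import requests

# Download the transcript JSON from the presigned URL returned by Transcribe
response = requests.get(status['TranscriptionJob']['Transcript']['TranscriptFileUri'])
response.raise_for_status()  # fail loudly if the URL has expired or the download failed
with open("transcribed.json", 'wb') as file:
    file.write(response.content)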


The JSON file output by AWS Transcribe provides detailed information about the transcribed content. Here's a brief overview of its structure:

- **jobName:** 
  - The name of the transcription job.
- **accountId:** 
  - Your AWS account ID.
- **results:** 
  - Contains the main transcription data.
  - **transcripts:** 
    - An array with the transcribed text.
  - **items:** 
    - An array containing details about each transcribed word or punctuation.
    - Each item can have:
      - **start_time:** 
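        - Start time of the word in seconds (absent for punctuation items).
      - **end_time:** 
        - End time of the word in seconds (absent for punctuation items).
      - **alternatives:** 
        - A list of candidate transcriptions for the item, each with a `content` and a `confidence` value.
      - **type:** 
        - `pronunciation` for a spoken word or `punctuation` for a punctuation mark.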
- **status:** 
  - The status of the transcription job (e.g., "COMPLETED").
- **jobCreationDate:** 
  - When the transcription job was created.
- **jobCompletionDate:** 
  - When the transcription job was completed.
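Given that structure, the full text and per-word timings can be pulled out with plain dictionary access. The two hypothetical helpers below are a sketch based on the fields visible in the output further down (transcripts, items, type, alternatives). They take the first alternative for each item and convert the string-valued times and confidences to floats.

```python
from typing import List, Tuple


def transcript_text(data: dict) -> str:
    """Join all transcript strings in a Transcribe result."""
    return " ".join(t["transcript"] for t in data["results"]["transcripts"])


def word_timings(data: dict) -> List[Tuple[str, float, float, float]]:
    """Return (word, start, end, confidence) for each spoken item, skipping punctuation."""
    words = []
    for item in data["results"]["items"]:
        if item["type"] != "pronunciation":
            continue  # punctuation items carry no timestamps
        best = item["alternatives"][0]  # first (most likely) alternative
        words.append((best["content"], float(item["start_time"]),
                      float(item["end_time"]), float(best["confidence"])))
    return words
```

After loading the file below, `word_timings(transcribed_data)[:3]` would list the first three spoken words with their timings.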


Let's load our downloaded file.

In [17]:
import json
with open("transcribed.json", "r") as f:
    transcribed_data = json.load(f)
print(transcribed_data["results"])

{'transcripts': [{'transcript': "Hello, everyone. I hope you are enjoying the course so far and are learning many cool things. Let's see how well Aws can transcribe this text."}], 'items': [{'start_time': '0.009', 'end_time': '0.3', 'alternatives': [{'confidence': '0.999', 'content': 'Hello'}], 'type': 'pronunciation'}, {'alternatives': [{'confidence': '0.0', 'content': ','}], 'type': 'punctuation'}, {'start_time': '0.31', 'end_time': '0.93', 'alternatives': [{'confidence': '0.999', 'content': 'everyone'}], 'type': 'pronunciation'}, {'alternatives': [{'confidence': '0.0', 'content': '.'}], 'type': 'punctuation'}, {'start_time': '0.939', 'end_time': '0.98', 'alternatives': [{'confidence': '0.999', 'content': 'I'}], 'type': 'pronunciation'}, {'start_time': '0.99', 'end_time': '1.299', 'alternatives': [{'confidence': '0.999', 'content': 'hope'}], 'type': 'pronunciation'}, {'start_time': '1.309', 'end_time': '1.379', 'alternatives': [{'confidence': '0.999', 'content': 'you'}], 'type': 'pro

In [18]:
print(transcribed_data["results"]["transcripts"][0]["transcript"])

Hello, everyone. I hope you are enjoying the course so far and are learning many cool things. Let's see how well Aws can transcribe this text.


Great! Apart from capitalization ("Aws" instead of "AWS") and punctuation, the transcript matches the text we synthesized.
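Rather than comparing by eye, a common way to score a transcript is the word error rate (WER): the word-level edit distance (substitutions, insertions and deletions) between the reference text and the transcript, divided by the number of reference words. The sketch below is our own minimal implementation, not an AWS feature. It lowercases and strips punctuation first, so "AWS" versus "Aws" and the extra commas do not count as errors.

```python
import re
from typing import List


def normalize(text: str) -> List[str]:
    """Lowercase, replace everything except letters, digits and apostrophes with spaces, split."""
    return re.sub(r"[^a-z0-9' ]+", " ", text.lower()).split()


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance between reference and hypothesis, divided by reference length."""
    ref, hyp = normalize(reference), normalize(hypothesis)
    # prev holds the previous row of the edit-distance table:
    # prev[j] = distance between the reference words seen so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(prev[j] + 1,             # deletion
                          curr[j - 1] + 1,         # insertion
                          prev[j - 1] + (r != h))  # substitution (0 if the words match)
        prev = curr
    return prev[-1] / max(len(ref), 1)
```

For the synthesized `text` and the transcript above, `word_error_rate(text, transcribed_data["results"]["transcripts"][0]["transcript"])` returns 0.0.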

### Delete the job
To delete the job from your history, you can use *transcribe_client.delete_transcription_job(TranscriptionJobName)*.


In [20]:
transcribe_client.delete_transcription_job(TranscriptionJobName="TranscribeJob1")

{'ResponseMetadata': {'RequestId': 'ccb4735c-8a64-4a45-a2f6-20b439c14853',
  'HTTPStatusCode': 200,
  'HTTPHeaders': {'x-amzn-requestid': 'ccb4735c-8a64-4a45-a2f6-20b439c14853',
   'content-type': 'application/x-amz-json-1.1',
   'content-length': '0',
   'date': 'Fri, 15 Sep 2023 08:18:47 GMT'},
  'RetryAttempts': 0}}
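If you have run the notebook several times with different job names, deleting jobs one by one gets tedious. The sketch below lists jobs with *list_transcription_jobs* (filtering with its JobNameContains parameter and following NextToken across pages) and deletes those whose name starts with a given prefix. `delete_jobs_with_prefix` is a hypothetical helper; it takes the client as an argument, so it can be exercised with a stand-in object before pointing it at your account.

```python
from typing import Any, List


def delete_jobs_with_prefix(client: Any, prefix: str) -> List[str]:
    """Delete every transcription job whose name starts with prefix and return the deleted names."""
    names = []
    kwargs = {"JobNameContains": prefix}
    # Collect all matching names first, following NextToken until the last page.
    while True:
        page = client.list_transcription_jobs(**kwargs)
        for summary in page.get("TranscriptionJobSummaries", []):
            name = summary["TranscriptionJobName"]
            # JobNameContains matches anywhere in the name, so re-check the prefix here.
            if name.startswith(prefix):
                names.append(name)
        token = page.get("NextToken")
        if not token:
            break
        kwargs["NextToken"] = token
    # Delete only after listing, so that deleting jobs does not interfere with pagination.
    for name in names:
        client.delete_transcription_job(TranscriptionJobName=name)
    return names
```

Usage: `delete_jobs_with_prefix(transcribe_client, "TranscribeJob")`. This really deletes jobs, so double-check the prefix first.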

### Translating the transcribed text

With [AWS Translate](https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/translate.html), we can convert the transcribed text (or, of course, any other text) into another language.

You can use *client.translate_text*, to which you pass:

- Text: The text to translate
- SourceLanguageCode: The source language
- TargetLanguageCode: The language you want to translate to

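Note that you can also pass a [custom terminology](https://docs.aws.amazon.com/translate/latest/dg/how-custom-terminology.html) (via the TerminologyNames parameter) to control how specific terms, such as brand names, are translated, for example to make sure they are left untranslated.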

As an alternative, you can translate entire [documents](https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/translate/client/translate_document.html) in one call.

Let's translate the transcribed text above into German.
You can find the corresponding language codes [here](https://docs.aws.amazon.com/translate/latest/dg/what-is-languages.html).

In [19]:
translate = boto3.client('translate', region_name="us-east-1")
response = translate.translate_text(
    Text=transcribed_data["results"]["transcripts"][0]["transcript"],
    SourceLanguageCode="en",
    TargetLanguageCode="de"
)
print(response['TranslatedText'])


Hallo, alle zusammen. Ich hoffe, dir gefällt der Kurs bis jetzt und du lernst viele coole Dinge. Mal sehen, wie gut Aws diesen Text transkribieren kann.
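*translate_text* limits how much text a single request may contain. The limit is measured in UTF-8 bytes; check the Translate documentation for the current value, as the 5000-byte default below is only a conservative assumption. For longer transcripts, one approach is to split the text at sentence boundaries and translate each chunk separately. Both functions below are hypothetical helpers, and `translate_chunk` is any callable that translates one string, which keeps the splitting logic testable without AWS.

```python
import re
from typing import Callable, List


def split_into_chunks(text: str, max_bytes: int = 5000) -> List[str]:
    """Split text at sentence boundaries into chunks of at most max_bytes UTF-8 bytes."""
    # Naive sentence split after '.', '!' or '?' followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        if len(sentence.encode("utf-8")) > max_bytes:
            raise ValueError("a single sentence exceeds max_bytes")
        candidate = f"{current} {sentence}" if current else sentence
        if len(candidate.encode("utf-8")) <= max_bytes:
            current = candidate  # sentence still fits into the current chunk
        else:
            chunks.append(current)  # close the current chunk, start a new one
            current = sentence
    if current:
        chunks.append(current)
    return chunks


def translate_long_text(text: str, translate_chunk: Callable[[str], str],
                        max_bytes: int = 5000) -> str:
    """Translate each chunk with translate_chunk and join the results with spaces."""
    return " ".join(translate_chunk(c) for c in split_into_chunks(text, max_bytes))
```

With the real client, `translate_chunk` could be `lambda chunk: translate.translate_text(Text=chunk, SourceLanguageCode="en", TargetLanguageCode="de")["TranslatedText"]`. Splitting at sentence boundaries keeps each request self-contained, at the cost of losing context across chunks; the regex split is also naive about abbreviations such as "e.g.".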
