# Amazon Polly Demo

### Convert text to speech with Amazon Polly

***
Copyright [2017]-[2018] Amazon.com, Inc. or its affiliates. All Rights Reserved.

Licensed under the Apache License, Version 2.0 (the "License"). You may not use this file except in compliance with the License. A copy of the License is located at

http://aws.amazon.com/apache2.0/

or in the "license" file accompanying this file. This file is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.
***

#### Prerequisites:

The user or role that executes the commands must have permissions in AWS Identity and Access Management (IAM) to perform those actions. AWS provides a set of managed policies that help you get started quickly. For our example, you need to apply the following minimum managed policies to your user or role:

* `AmazonPollyFullAccess`
* `AmazonTranscribeFullAccess`

We recommend that you follow AWS IAM best practices for production implementations; these are out of scope for this workshop.

### Examples

Generate speech from a number of example texts.

In [None]:
import boto3
import IPython
from pprint import pprint

polly = boto3.client('polly')

In [None]:
response = polly.synthesize_speech(
    Text="It is great to see you today!",
    TextType="text",
    OutputFormat="mp3",                                           
    VoiceId="Emma")

pprint(response)

outfile = "pollyresponse.mp3"
data = response['AudioStream'].read()

with open(outfile, 'wb') as f:
    f.write(data)
IPython.display.Audio(outfile)
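
The read-and-write steps in the cell above recur in every example that follows. As a minimal sketch, they could be wrapped in a small helper. The name `save_audio` and the in-memory `fake_response` are assumptions for illustration only; the fake stands in for a real `synthesize_speech` response so the snippet runs without AWS credentials.

```python
import io
import os
import tempfile
from contextlib import closing


def save_audio(response, path):
    """Write the response's audio stream to `path` and return the byte count."""
    # closing() releases the underlying stream once it has been read
    with closing(response["AudioStream"]) as stream:
        data = stream.read()
    with open(path, "wb") as f:
        f.write(data)
    return len(data)


# Stand-in for a synthesize_speech response (hypothetical data, not real audio)
fake_response = {"AudioStream": io.BytesIO(b"ID3 fake audio bytes")}
demo_path = os.path.join(tempfile.mkdtemp(), "pollyresponse.mp3")
written = save_audio(fake_response, demo_path)
print(written)
```

With a real response, the call would be `save_audio(response, outfile)` followed by `IPython.display.Audio(outfile)`.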

### SSML

Use [SSML](https://docs.aws.amazon.com/polly/latest/dg/supported-ssml.html) tags to add breaks and emphasis, change the speaking rate (`prosody`), or control the pronunciation of words with `phoneme`.

In [None]:
response = polly.synthesize_speech(
    Text='<speak>I am fine,<break/> thank you.<break strength="x-strong"/> \
          <prosody rate="+20%">What can I do for you?</prosody></speak>',
    TextType="ssml",
    OutputFormat="mp3",                                           
    VoiceId="Emma")
     
outfile = "pollyresponse.mp3"
data = response['AudioStream'].read()

with open(outfile, 'wb') as f:
    f.write(data)
IPython.display.Audio(outfile)
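
Hand-written SSML breaks easily when the spoken text contains characters such as `&` or `<`. The sketch below assembles the same SSML string as the cell above from small helpers that escape the text. The helper names (`ssml_break`, `ssml_prosody`, `ssml_speak`) are hypothetical and not part of boto3 or Polly.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape, quoteattr


def ssml_break(strength=None):
    """Return a <break/> tag, optionally with a strength attribute."""
    if strength is None:
        return "<break/>"
    return "<break strength={}/>".format(quoteattr(strength))


def ssml_prosody(text, rate):
    """Wrap escaped text in a <prosody> tag that changes the speaking rate."""
    return "<prosody rate={}>{}</prosody>".format(quoteattr(rate), escape(text))


def ssml_speak(*parts):
    """Join already-built SSML fragments inside a <speak> root element."""
    return "<speak>{}</speak>".format("".join(parts))


ssml = ssml_speak(
    escape("I am fine,"),
    ssml_break(),
    escape(" thank you."),
    ssml_break("x-strong"),
    " ",
    ssml_prosody("What can I do for you?", "+20%"),
)

# Parsing confirms the result is well-formed XML before it is sent to Polly
ET.fromstring(ssml)
print(ssml)
```

The resulting string can be passed as `Text=ssml` together with `TextType="ssml"`.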

In [None]:
response = polly.synthesize_speech(
    Text='''<speak>
     You say, <phoneme alphabet="ipa" ph="pɪˈkɑːn">pecan</phoneme>. 
     I say, <phoneme alphabet="ipa" ph="ˈpi.kæn">pecan</phoneme>.
    </speak>''',
    TextType="ssml",
    OutputFormat="mp3",                                           
    VoiceId="Brian"
    )
     
outfile = "pollyresponse.mp3"
data = response['AudioStream'].read()

with open(outfile, 'wb') as f:
    f.write(data)
IPython.display.Audio(outfile)

In [None]:
response = polly.synthesize_speech(
    Text="<speak><phoneme alphabet='ipa' ph='bəːɱ ˈzɛksɪʃ bəˈziːʃən dˈɛ wˈeːʃːəːn dˈɛ haʁdˈn'>Beim sächsisch besiegen die weichen die harten.</phoneme></speak>",
    TextType="ssml",
    OutputFormat="mp3",                                           
    VoiceId="Hans"
    )
     
outfile = "pollyresponse.mp3"
data = response['AudioStream'].read()

with open(outfile, 'wb') as f:
    f.write(data)
IPython.display.Audio(outfile)

### Substitutes

We can write abbreviations that are replaced by a substitute word when spoken, as with these chemical element symbols.

In [None]:
response = polly.synthesize_speech(
    Text='<speak>My favorite chemical element is <sub alias="aluminium">Al</sub>, \
    but Al prefers <sub alias="magnesium">Mg</sub>.</speak>',
    TextType="ssml",
    OutputFormat="mp3",                                           
    VoiceId="Brian")
     
outfile = "pollyresponse.mp3"
data = response['AudioStream'].read()

with open(outfile, 'wb') as f:
    f.write(data)
IPython.display.Audio(outfile)
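
Tagging each abbreviation by hand becomes tedious for longer texts. As a rough sketch, the `<sub>` tags can be generated from a dictionary of aliases. The function name `add_substitutions` is an assumption for illustration.

```python
import re
from xml.sax.saxutils import escape, quoteattr


def add_substitutions(text, aliases):
    """Wrap every whole-word occurrence of an alias key in a <sub> tag."""
    if not aliases:
        return "<speak>{}</speak>".format(escape(text))
    # Longest keys first so a longer symbol wins over a shorter prefix
    keys = sorted(aliases, key=len, reverse=True)
    pattern = re.compile(r"\b(?:{})\b".format("|".join(re.escape(k) for k in keys)))
    parts = []
    pos = 0
    for match in pattern.finditer(text):
        # Escape the plain text between matches, then emit the tagged symbol
        parts.append(escape(text[pos:match.start()]))
        parts.append("<sub alias={}>{}</sub>".format(
            quoteattr(aliases[match.group()]), escape(match.group())))
        pos = match.end()
    parts.append(escape(text[pos:]))
    return "<speak>{}</speak>".format("".join(parts))


ssml = add_substitutions("Al and Mg", {"Al": "aluminium", "Mg": "magnesium"})
print(ssml)
```

Unlike the hand-written cell above, this sketch replaces every occurrence, so a person named Al would also be read as "aluminium". Matching is case-sensitive and limited to whole words.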

We can also register a custom lexicon that applies these substitutions automatically. See also [Managing Lexicons](https://docs.aws.amazon.com/polly/latest/dg/managing-lexicons.html).

In [None]:
!aws polly put-lexicon --name PollyPSE --content file://PollyPSE.xml
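
The content of `PollyPSE.xml` is not shown in this notebook. As a rough sketch, a lexicon in the W3C Pronunciation Lexicon Specification (PLS) format that Polly accepts might be built as below. The two entries, the `en-GB` language (chosen here to match the `Brian` voice), and the helper name `build_lexicon` are assumptions for illustration, not the workshop's actual file.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

PLS_NS = "http://www.w3.org/2005/01/pronunciation-lexicon"


def build_lexicon(aliases, lang="en-GB"):
    """Return a PLS lexicon document mapping each grapheme to a spoken alias."""
    lexemes = "".join(
        "  <lexeme>\n"
        "    <grapheme>{}</grapheme>\n"
        "    <alias>{}</alias>\n"
        "  </lexeme>\n".format(escape(grapheme), escape(alias))
        for grapheme, alias in aliases.items()
    )
    return (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        '<lexicon version="1.0"\n'
        '      xmlns="{ns}"\n'
        '      xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"\n'
        '      xsi:schemaLocation="{ns}\n'
        '        http://www.w3.org/TR/2007/CR-pronunciation-lexicon-20071212/pls.xsd"\n'
        '      alphabet="ipa" xml:lang="{lang}">\n'
        "{lexemes}"
        "</lexicon>\n"
    ).format(ns=PLS_NS, lang=lang, lexemes=lexemes)


# Hypothetical entries; the real PollyPSE.xml may contain different ones
lexicon_xml = build_lexicon({"Al": "aluminium", "Mg": "magnesium"})

# Parse the result to confirm it is well-formed and list its graphemes
root = ET.fromstring(lexicon_xml.encode("utf-8"))
graphemes = [g.text for g in root.iter("{%s}grapheme" % PLS_NS)]
print(graphemes)
```

The string could be written to `PollyPSE.xml` for the CLI command above, or uploaded directly with `polly.put_lexicon(Name="PollyPSE", Content=lexicon_xml)`.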

In [None]:
response = polly.get_lexicon(
    Name="PollyPSE")

xmlret = response['Lexicon']['Content']
   
print (xmlret)

In [None]:
response = polly.synthesize_speech(
    Text='My favorite chemical element is Mg',
    TextType="text",
    OutputFormat="mp3",                                           
    VoiceId="Brian",
    LexiconNames=["PollyPSE"]
    )
     
outfile = "pollyresponse.mp3"
data = response['AudioStream'].read()

with open(outfile, 'wb') as f:
    f.write(data)
IPython.display.Audio(outfile)

## Transcribe

Let's take the audio output of the last Polly response and transcribe it back to text.

In [None]:
import boto3
import json
import time

transcribe = boto3.client('transcribe')
s3 = boto3.resource('s3')
sts = boto3.client('sts')

In [None]:
# Build the name of the default SageMaker bucket (assumes the us-east-1 region)
account_id = sts.get_caller_identity().get('Account')
bucket_name = 'sagemaker-us-east-1-{}'.format(account_id)
bucket_name
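
The cell above hard-codes the `us-east-1` region into the bucket name. A minimal sketch that makes the region a parameter follows. The helper names `default_bucket_name` and `s3_uri` are hypothetical, and the account ID is a placeholder.

```python
def default_bucket_name(account_id, region="us-east-1"):
    """Return the bucket name in the sagemaker-<region>-<account> pattern."""
    return "sagemaker-{}-{}".format(region, account_id)


def s3_uri(bucket, key):
    """Return the s3:// URI for an object key in a bucket."""
    return "s3://{}/{}".format(bucket, key)


# Placeholder account ID for illustration only
example_bucket = default_bucket_name("123456789012", region="eu-west-1")
example_uri = s3_uri(example_bucket, "pollyresponse.mp3")
print(example_uri)
```

In the notebook, the region could be taken from `boto3.session.Session().region_name` instead of being typed in.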

Upload the audio file to the S3 bucket.

In [None]:
outfile = "pollyresponse.mp3"

with open(outfile, 'rb') as data:
    response = s3.Bucket(bucket_name).put_object(Key=outfile, Body=data)
    print(response)

### Start Job

Start a transcription job with the given job name.

In [None]:
job_name = 'job{}'.format(int(time.time()))
job_name

In [None]:
response = transcribe.start_transcription_job(
    TranscriptionJobName=job_name,
    LanguageCode='en-US',
    MediaFormat='mp3',
    Media={
        'MediaFileUri': 's3://{}/{}'.format(bucket_name, 'pollyresponse.mp3')
    },
    OutputBucketName=bucket_name,
)

### Wait for Job
 
Wait for the transcription job to finish; this can take up to a minute.

In [None]:
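%%time

status = 'IN_PROGRESS'

# QUEUED and IN_PROGRESS are the non-terminal states; COMPLETED or FAILED ends the loop
while status in ('QUEUED', 'IN_PROGRESS'):
    response = transcribe.get_transcription_job(TranscriptionJobName=job_name)
    status = response['TranscriptionJob']['TranscriptionJobStatus']
    print(status)
    if status in ('QUEUED', 'IN_PROGRESS'):
        time.sleep(5)

# The Transcript entry is only present once the job has produced output
response['TranscriptionJob'].get('Transcript')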
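
A polling loop without an upper bound can hang if a job never leaves its non-terminal state. The sketch below adds a timeout and takes the status lookup as a callable, so it can be tried without AWS access. The name `wait_for_job` and its parameters are assumptions for illustration.

```python
import time

TERMINAL_STATES = ("COMPLETED", "FAILED")


def wait_for_job(get_status, poll_seconds=5.0, timeout_seconds=300.0):
    """Call get_status() until it returns a terminal state or time runs out."""
    deadline = time.monotonic() + timeout_seconds
    while True:
        status = get_status()
        if status in TERMINAL_STATES:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError("job still {} after {} seconds".format(
                status, timeout_seconds))
        time.sleep(poll_seconds)


# Simulated status sequence standing in for real get_transcription_job calls
statuses = iter(["QUEUED", "IN_PROGRESS", "COMPLETED"])
final = wait_for_job(lambda: next(statuses), poll_seconds=0)
print(final)
```

With the real client, `get_status` could be `lambda: transcribe.get_transcription_job(TranscriptionJobName=job_name)['TranscriptionJob']['TranscriptionJobStatus']`.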

### Get Transcription

Download the transcription and inspect the results

In [None]:
transcribe_key = '{}.json'.format(job_name)
out_file = 'transcribe.json'

with open(out_file, 'wb') as data:
    s3.Bucket(bucket_name).download_fileobj(transcribe_key, data)

In [None]:
with open(out_file, 'rb') as data:
    obj = json.load(data)
    
obj['results']['transcripts']
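
Beyond the `transcripts` list, the result file also carries per-word entries under `results.items`. The sketch below pulls out the plain text and an average word confidence. The `sample` dictionary is made-up data that mirrors the general shape of a Transcribe result, and the helper names are hypothetical.

```python
# Made-up sample in the general shape of a Transcribe result file
sample = {
    "jobName": "job-example",
    "status": "COMPLETED",
    "results": {
        "transcripts": [{"transcript": "Hello world."}],
        "items": [
            {"type": "pronunciation", "start_time": "0.0", "end_time": "0.4",
             "alternatives": [{"confidence": "1.0", "content": "Hello"}]},
            {"type": "pronunciation", "start_time": "0.4", "end_time": "0.9",
             "alternatives": [{"confidence": "0.5", "content": "world"}]},
            {"type": "punctuation",
             "alternatives": [{"confidence": "0.0", "content": "."}]},
        ],
    },
}


def transcript_text(obj):
    """Join all transcript strings into one piece of text."""
    return " ".join(t["transcript"] for t in obj["results"]["transcripts"])


def mean_confidence(obj):
    """Average the confidence of spoken words, skipping punctuation items."""
    scores = [
        float(item["alternatives"][0]["confidence"])
        for item in obj["results"]["items"]
        if item["type"] == "pronunciation"
    ]
    return sum(scores) / len(scores) if scores else None


print(transcript_text(sample))
print(mean_confidence(sample))
```

Applied to the notebook, both helpers would take the `obj` loaded from `transcribe.json`.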