## Speaking to humans with **Polly** ##

Humans are more likely to respond to you if you use the same form of communication that they normally use: speech.

Your speech also has to be of a high quality.  Humans are bad at many things, but one thing they are good at is spotting a fake!

You can use SSML (Speech Synthesis Markup Language) to create human-like speech and increase your chances of a successful interaction.

![title](img/3.jpg)


In [None]:
from boto3 import Session
from botocore.exceptions import BotoCoreError, ClientError
import pygame.mixer
from contextlib import closing
import os

# Create a session object from the stored AWS config file
session = Session(profile_name="default")
polly = session.client("polly")

In [None]:
def pollySays(value, useSSML=False, useLexicon=False):
    """Make a call to Polly, store the response as "polly-speech.ogg" and play it.

    :param value: Text - The actual text that will be converted to speech.
    :param useSSML: Boolean - Tells the function whether value contains SSML.
    :param useLexicon: Boolean - Tells the function whether to apply the "aihacklex" lexicon.

    If useSSML is True, TextType="ssml" is passed to synthesize_speech,
    and the value param must contain properly formatted SSML.
    """
    err = False
    output = "polly-speech.ogg"

    # Build the request, adding the optional arguments only when they are needed
    request = {"Text": value, "OutputFormat": "ogg_vorbis", "VoiceId": "Emma"}
    if useSSML:
        request["TextType"] = "ssml"
    if useLexicon:
        # LexiconNames must be a list, even for a single lexicon
        request["LexiconNames"] = ["aihacklex"]

    try:
        response = polly.synthesize_speech(**request)

        # Parse the response object and look for the keyword "AudioStream"
        if "AudioStream" in response:

            # Write the data stream to an audio file on disk that can be re-used later
            with closing(response["AudioStream"]) as stream:
                try:
                    with open(output, "wb") as file:
                        file.write(stream.read())
                except IOError as ioe:
                    err = True
                    print("IOError: {}".format(ioe))

            if not err:
                # Use the pygame libraries to play the audio file we have just created
                pygame.init()
                pygame.mixer.music.load(output)
                pygame.mixer.music.play()
                # Wait for playback to finish, sleeping between checks
                while pygame.mixer.music.get_busy():
                    pygame.time.wait(100)
                pygame.quit()
        else:
            err = True
            print("No audio stream in response! ")
    except (BotoCoreError, ClientError) as e:
        err = True
        print("Error: {}".format(e))

    if not err:
        print("No errors reported!  You should have heard your converted text. "
              "If you did not hear anything, please make sure the speaker is "
              "plugged in and turned on. ")
    

### With our "pollySays" function created (by convention, a Python function name describes an action) we can make a call to the Polly service!  Now we can just focus on writing the text that we want to convert to speech so the humans can understand.

In [None]:
# Create a variable to hold the text we wish to convert to speech
text = "Hello, I am using speech suitable for a human to hear and understand."

# Call our function - and listen for the result
pollySays(text)

### SSML Examples

Now that we are converting our text to speech, let's add some depth to our speech by incorporating SSML elements.

Some humans are quite clever and can discern if they are speaking to a bot.

Like other markup languages, SSML is expressed via XML tags.  The first, required tag is `<speak>`. We wrap our entire text variable inside this tag, like so:

`<speak>This is the text to be converted to speech</speak>`

The `<speak>` tag by itself does <u>not</u> modify our text.  It simply tells Polly that we are going to be using SSML.
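
Because SSML is XML, characters such as `&` and `<` in ordinary text need to be escaped before they are wrapped. The sketch below shows one possible way to do that; `to_ssml` is a hypothetical helper name that is not part of this lab, and it only uses `escape` from the Python standard library.

```python
from xml.sax.saxutils import escape


def to_ssml(plain_text):
    """Wrap plain text in <speak> tags, escaping XML special characters.

    Hypothetical helper for illustration only; it is not part of the lab code.
    """
    return "<speak>{}</speak>".format(escape(plain_text))


# The result can be passed to pollySays with useSSML set to True
text = to_ssml("Tom & Jerry say hello")
print(text)
```

Only pass plain text to a helper like this. Text that already contains SSML tags such as `<break>` would have those tags escaped too.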

Don't forget to comment out the call to the **pollySays(text)** function in the cell above by placing a # (hash) in front of the pollySays command.  We will then uncomment the "pollySays" call in the **cell below** by removing the # (hash).

In [None]:
# Add a pause
# To add a pause to our speech, in addition to the natural pauses provided by punctuation, we use the <break> tag.
# Within the <break> tag we indicate how long a pause we would like with the "time" attribute: <break time="100ms"/>.

# Try this text WITHOUT a pause
text = "<speak>Hello human! Can you please explain to me what is meant by May the 4th be with you?</speak>"

# optionally delete the existing speech file (pollySays overwrites it anyway)
#os.remove('polly-speech.ogg')

# make a call to pollySays and pass True to the useSSML parameter
pollySays(text, True)

In [None]:
# We will now add a pause and listen for a change
text = '<speak>Hello human! Can you please explain to me what is meant by <break time="100ms"/> May the 4th be with you?</speak>'

pollySays(text, True)

In [None]:
# Let's keep building on that example and add more volume to just a portion of the text

text = '<speak>Hello human! Can you please explain to me what is meant by <break time="100ms"/> \
    <prosody volume="+4dB"> May the 4th </prosody> be with you?</speak>'
pollySays(text, True)

In [None]:
# Let's slow down the word "please" to get the Plllleeeeeaase effect
text = '<speak>Hello human! Can you <prosody rate="x-slow">please</prosody> explain to me what is meant by <break time="100ms"/> \
    <prosody volume="+4dB"> May the 4th </prosody> be with you?</speak>'
pollySays(text, True)
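
As the SSML strings above grow, it becomes easy to leave a tag unclosed, and Polly is expected to reject SSML that is not well-formed. A quick local check before calling `pollySays` can catch that earlier. This is only a sketch: `is_well_formed_ssml` is a hypothetical helper name, and it checks XML well-formedness and the root tag only, not whether Polly supports every tag or attribute used.

```python
import xml.etree.ElementTree as ET


def is_well_formed_ssml(ssml):
    """Return True if ssml parses as XML and its root element is <speak>.

    Hypothetical helper; it does not validate tags against what Polly supports.
    """
    try:
        root = ET.fromstring(ssml)
    except ET.ParseError:
        return False
    return root.tag == "speak"


good = '<speak>Hello human! <break time="100ms"/> May the 4th be with you?</speak>'
bad = '<speak>Hello human! <break time="100ms"> May the 4th be with you?</speak>'

print(is_well_formed_ssml(good))
print(is_well_formed_ssml(bad))
```

The second string fails because `<break>` is never closed, which is the kind of mistake that is hard to spot by eye in a long string.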

### Lexicon Examples

A lexicon acts like a translation look-up guide, allowing a matching word to be pronounced differently.
Here is an example of a simple, single entry lexicon file:

```xml
<lexicon
  version="1.0"
  alphabet="x-sampa"
  xml:lang="en"
  xmlns="http://www.w3.org/2005/01/pronunciation-lexicon">
   <lexeme>
      <grapheme>WYSIWYG</grapheme>
      <phoneme>Wiz-E-Wig</phoneme>
   </lexeme>
</lexicon>
```

When Polly encounters WYSIWYG in text, it will be spoken as Wiz-E-Wig.

Let's give it a try. This time we have to let Polly know we want to use a lexicon file, and we need to provide the name of that lexicon file.
*Note:* the LexiconNames parameter requires a list object, not a string.
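
The `pollySays` function refers to a lexicon named `aihacklex`, which has to be stored in your AWS account before it can be used. The lab does not show that step, so the following is only a sketch of how it might look with boto3's `put_lexicon` call. The helper names `build_lexicon` and `upload_lexicon` are hypothetical, and the sketch uses the PLS `<alias>` element (a plain-text substitution) instead of the `<phoneme>` element in the example above; the `alphabet` and `xml:lang` values are copied from that example.

```python
from xml.sax.saxutils import escape

# Single-entry lexicon template; attributes mirror the example lexicon above
LEXICON_TEMPLATE = """<lexicon
  version="1.0"
  alphabet="x-sampa"
  xml:lang="{lang}"
  xmlns="http://www.w3.org/2005/01/pronunciation-lexicon">
   <lexeme>
      <grapheme>{grapheme}</grapheme>
      <alias>{alias}</alias>
   </lexeme>
</lexicon>"""


def build_lexicon(grapheme, alias, lang="en"):
    """Return a single-entry lexicon document as a string (hypothetical helper)."""
    return LEXICON_TEMPLATE.format(
        lang=lang, grapheme=escape(grapheme), alias=escape(alias)
    )


def upload_lexicon(polly_client, name, content):
    """Store the lexicon under name and return a list suitable for LexiconNames."""
    polly_client.put_lexicon(Name=name, Content=content)
    return [name]  # a list, not a string


content = build_lexicon("W3C", "World Wide Web Consortium")
# With the polly client created earlier, the upload could look like this:
# lexicon_names = upload_lexicon(polly, "aihacklex", content)
```

Returning the name already wrapped in a list keeps the caller from passing a bare string to `LexiconNames` by mistake.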


In [None]:
text = '<speak>I can use a lexicon to change the characters <break time="100ms"/>\
<say-as interpret-as="character">W Y S I W Y G</say-as><break time="100ms"/>into the phrase WYSIWYG</speak>'

pollySays(text, True, True)

![title](img/stophere.jpg)

When the instructor indicates it is time to continue - click [here](Lex-100.ipynb) to go to the next lab