## Amazon Rekognition Demo

Copy the `DemoWebsite` URL from the Outputs section of the *Media Analysis Solution* CloudFormation stack. In the next cell, replace the text `[DemoWebsite_URL]` with the URL that you copied.

In [None]:
%%HTML
<h3>Media Analysis Solution</h3>
<br>
<object type="text/html" data="[DemoWebsite_URL]" width="1000" height="600"> <embed src="[DemoWebsite_URL]"></embed></object>


---
Update the following values:
* `region` - Select a region where all the AWS AI services used in this notebook are available.
* `bucket_name` - Enter the name of an existing bucket in the same region.
* `bucket_prefix` - Enter a prefix if needed; otherwise leave it blank. If you enter a prefix, end it with a slash (/).

In [1]:
region = 'eu-west-1' # Update it to your region of choice
bucket_name = 'aws-mikasino-sagemaker' # Update it to the S3 bucket name
bucket_prefix = '' # Make sure you include trailing slash (/) if you are adding a prefix
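
The trailing-slash rule for `bucket_prefix` can also be enforced in code rather than by hand. The following is a minimal sketch; `normalize_prefix` is a hypothetical helper introduced here for illustration and is not part of the original notebook.

```python
def normalize_prefix(prefix):
    """Return the prefix without a leading slash and with exactly one
    trailing slash. An empty or slash-only prefix becomes ''."""
    cleaned = prefix.strip().strip("/")
    return cleaned + "/" if cleaned else ""


# Examples of inputs a user might type for bucket_prefix
print(repr(normalize_prefix("")))
print(repr(normalize_prefix("demo")))
print(repr(normalize_prefix("/demo/media/")))
```

With this in place, `bucket_prefix = normalize_prefix(bucket_prefix)` would make the later S3 key construction tolerant of a missing slash.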

---
Here we are loading the needed libraries and uploading the video file to S3 for later use.
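
A minimal sketch of such a setup step follows. The helper names `make_clients` and `upload_video` are assumptions made for illustration; the later cells only require that `boto3`, `json`, `time`, `urllib.request`, `datetime`, `PrettyTable` (from the `prettytable` package) and a client named `polly` are available.

```python
# Standard-library modules used by the later cells of the notebook
import json
import os
import time
import urllib.request
from datetime import datetime


def make_clients(region_name):
    """Create the AWS service clients that the later cells rely on.

    boto3 is imported inside the function so that upload_video below can
    be tried with a stand-in client on a machine without AWS access.
    """
    import boto3

    return {
        name: boto3.client(name, region_name=region_name)
        for name in ("s3", "polly", "transcribe")
    }


def upload_video(s3_client, local_path, bucket, prefix=""):
    """Upload local_path to s3://<bucket>/<prefix><file name>; return the key."""
    key = prefix + os.path.basename(local_path)
    s3_client.upload_file(local_path, bucket, key)
    return key
```

In the notebook this could be used as `clients = make_clients(region)`, `polly = clients['polly']`, and `upload_video(clients['s3'], 'jeff.mp4', bucket_name, bucket_prefix)`, so that the key matches the one the Transcribe cell expects.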

---
## Amazon Polly Demo

Here we make an API call to convert text to speech. The text to be converted is passed in the `Text` parameter as an SSML string. We use the `VoiceId` `Ruben` because the text is Dutch. You can update the text and select a different `VoiceId` to match the language of your preference. For the complete list of voice IDs, check [the documentation](https://docs.aws.amazon.com/polly/latest/dg/voicelist.html).

The response is stored in a local file named *pollyresponse.mp3*.

In [None]:
response = polly.synthesize_speech(
    Text="<speak> Wanneer komt er weer eens een Elfstedentocht?<amazon:breath duration='medium' volume='x-loud'/> \
    Helaas was de laatste elfstedentocht<emphasis level='reduced'> in 1997</emphasis>, \
   maar we hebben weer goede hoop voor dit jaar!</speak>",
    TextType="ssml",
    OutputFormat="mp3",                                           
    VoiceId="Ruben")
     
outfile = "pollyresponse.mp3"
data = response['AudioStream'].read()

with open(outfile,'wb') as f:
     f.write(data)

# print (response)
print('Converted text of %s characters to voice and stored it locally as file named %s' % (str(response['RequestCharacters']), outfile))
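
The cell above reads the whole audio stream into memory before writing it. For longer texts, the stream can instead be copied in chunks and closed explicitly. This is a minimal sketch; `save_audio_stream` is a hypothetical helper, and it assumes only that the stream object offers `read(n)` and `close()`, as the `AudioStream` body returned by boto3 does.

```python
from contextlib import closing


def save_audio_stream(audio_stream, path, chunk_size=8192):
    """Copy a file-like audio stream to path in chunks.

    Returns the number of bytes written. The stream is closed when done.
    """
    written = 0
    with closing(audio_stream) as stream, open(path, "wb") as out:
        # read() returns b"" at the end of the stream, which stops the loop
        for chunk in iter(lambda: stream.read(chunk_size), b""):
            out.write(chunk)
            written += len(chunk)
    return written
```

In the notebook this would be called as `save_audio_stream(response['AudioStream'], outfile)`.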

---
You can play the response by clicking the play button.

<audio width="360" height="270" controls src="pollyresponse.mp3" />

---
### Polly Custom Lexicon

Amazon Polly supports custom lexicons, which let you customize the pronunciation of words. For additional details, refer to [the documentation](https://docs.aws.amazon.com/polly/latest/dg/managing-lexicons.html).

Here we create a custom lexicon to properly convert internet slang to speech.

In [None]:
internet_slag_lexicon = '''<?xml version="1.0" encoding="UTF-8"?>
<lexicon version="1.0" 
      xmlns="http://www.w3.org/2005/01/pronunciation-lexicon"
      xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
      xsi:schemaLocation="http://www.w3.org/2005/01/pronunciation-lexicon 
        http://www.w3.org/TR/2007/CR-pronunciation-lexicon-20071212/pls.xsd"
      alphabet="ipa" xml:lang="en-US">
  <lexeme>
    <grapheme>La vita &#x00E8; bella</grapheme>
    <phoneme>ˈlɑ ˈviːɾə ˈʔeɪ ˈbɛlə</phoneme>
  </lexeme>
  <lexeme>
    <grapheme>Roberto</grapheme>
    <phoneme>ɹəˈbɛːɹɾoʊ</phoneme>
  </lexeme>
  <lexeme>
    <grapheme>Benigni</grapheme>
    <phoneme>bɛˈniːnji</phoneme>
  </lexeme>
  <lexeme>
    <grapheme>IHAC</grapheme>
    <alias>I have a customer</alias>
  </lexeme>
  <lexeme>
    <grapheme>2day</grapheme>
    <alias>Today</alias>
  </lexeme>
  <lexeme>
    <grapheme>2moro</grapheme>
    <alias>Tomorrow</alias>
  </lexeme>
  <lexeme>
    <grapheme>2nite</grapheme>
    <alias>Tonight</alias>
  </lexeme>
  <lexeme>
    <grapheme>ASAP</grapheme>
    <alias>As soon as possible</alias>
  </lexeme>
  <lexeme>
    <grapheme>IIRC</grapheme>
    <alias>If I remember correctly</alias>
  </lexeme>
  <lexeme>
    <grapheme>POV</grapheme>
    <alias>Point of View</alias>
  </lexeme>
  <lexeme>
    <grapheme>TTYL</grapheme>
    <alias>Talk to you later</alias>
  </lexeme>
  <lexeme>
    <grapheme>THX</grapheme>
    <alias>Thanks</alias>
  </lexeme>
  <lexeme>
    <grapheme>YW</grapheme>
    <alias>You are Welcome</alias>
  </lexeme>
</lexicon>'''

lexicon = polly.put_lexicon(
    Name='custom-lexicon-demo',
    Content=internet_slag_lexicon
)
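
Before uploading a lexicon, it can help to check locally that the XML is well formed and to list which graphemes it maps. The sketch below uses only the Python standard library; `lexicon_entries` is a hypothetical helper, and the two-entry sample is a shortened copy of the lexicon above.

```python
import xml.etree.ElementTree as ET

# Namespace of the W3C Pronunciation Lexicon Specification (PLS)
PLS_NS = {"pls": "http://www.w3.org/2005/01/pronunciation-lexicon"}


def lexicon_entries(lexicon_xml):
    """Parse a PLS lexicon and return {grapheme: replacement}.

    The replacement is the <alias> text if present, otherwise the
    <phoneme> text. Malformed XML raises xml.etree.ElementTree.ParseError.
    """
    # Encode to bytes so the parser honours the encoding declaration
    root = ET.fromstring(lexicon_xml.encode("utf-8"))
    entries = {}
    for lexeme in root.findall("pls:lexeme", PLS_NS):
        grapheme = lexeme.findtext("pls:grapheme", namespaces=PLS_NS)
        replacement = lexeme.findtext("pls:alias", namespaces=PLS_NS)
        if replacement is None:
            replacement = lexeme.findtext("pls:phoneme", namespaces=PLS_NS)
        entries[grapheme] = replacement
    return entries


sample = '''<?xml version="1.0" encoding="UTF-8"?>
<lexicon version="1.0"
      xmlns="http://www.w3.org/2005/01/pronunciation-lexicon"
      alphabet="ipa" xml:lang="en-US">
  <lexeme>
    <grapheme>THX</grapheme>
    <alias>Thanks</alias>
  </lexeme>
  <lexeme>
    <grapheme>Roberto</grapheme>
    <phoneme>ɹəˈbɛːɹɾoʊ</phoneme>
  </lexeme>
</lexicon>'''

entries = lexicon_entries(sample)
for grapheme, replacement in entries.items():
    print(grapheme, "->", replacement)
```

Calling `lexicon_entries(internet_slag_lexicon)` in the notebook would list every mapping before `put_lexicon` is invoked.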

---
Use Polly to synthesize the same text twice: first without the custom lexicon, then with it.

In [None]:
text_to_convert='''IHAC looking for a way to convert text based chat conversations to speech.
IIRC that is possible through custom lexicon in polly. I want to know your POV on this.
I have to respond back to the customer 2nite, hence can you please let me know ASAP, THX.'''

no_lex_res = polly.synthesize_speech(
    Engine='neural',
    Text='<speak>' + text_to_convert +'</speak>',
    TextType="ssml",
    OutputFormat="mp3",                                           
    VoiceId="Joanna")
     
of_nolex = "neural_nolexicon.mp3"
data_nolex = no_lex_res['AudioStream'].read()

with open(of_nolex,'wb') as f:
     f.write(data_nolex)

lex_res = polly.synthesize_speech(
    Engine='neural',
    Text='<speak>' + text_to_convert +'</speak>',
    LexiconNames=['custom-lexicon-demo'],
    TextType="ssml",
    OutputFormat="mp3",                                           
    VoiceId="Joanna")
     
of_lex = "neural_lexicon.mp3"
data_lex = lex_res['AudioStream'].read()

with open(of_lex,'wb') as f:
     f.write(data_lex)

# print (response)
print('Converted text of %s characters to voice and stored it locally as file named %s' % (str(lex_res['RequestCharacters']), of_lex))
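
The cell above wraps plain text in `<speak>` tags by string concatenation. That works for this sample, but chat text containing characters such as `&` or `<` would produce invalid SSML. A minimal sketch of a safer wrapper follows; `to_ssml` is a hypothetical helper name.

```python
from xml.sax.saxutils import escape


def to_ssml(text):
    """Wrap plain text in <speak> tags, escaping XML-reserved characters."""
    # escape() handles &, < and >; quotes are passed as extra entities
    escaped = escape(text, {'"': "&quot;", "'": "&apos;"})
    return "<speak>" + escaped + "</speak>"


print(to_ssml('R&D asked: "THX, TTYL"'))
```

The call `Text=to_ssml(text_to_convert)` could then replace the manual concatenation in both `synthesize_speech` calls.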

---
Listen to the synthesized speech without lexicon and with lexicon to hear the difference.

### Without Lexicon

<audio width="360" height="270" controls src="./neural_nolexicon.mp3" />

### With Lexicon

<audio width="360" height="270" controls src="./neural_lexicon.mp3" />

---
## Amazon Transcribe Demo

Using Amazon Transcribe, we are going to generate text from the video file. Transcribe provides a signed S3 URL that points to the transcribed text in JSON format. The output contains speaker identification labels, the timestamp at which each word was heard, and more.

Click the arrow below to expand the video.

*Before playing the video, start the transcription job by executing the next cell, since the job takes a few seconds to complete.*

---

<details>
    <summary>Video to be transcribed</summary>
      <video width="640" height="480" controls src="./jeff.mp4" />
</details>

In [None]:
# Converting mp4 to text
transcribe = boto3.client('transcribe', region_name=region)
timestamp = datetime.now().strftime('%Y-%m-%d-%H%M%S')
job_name = "TranscribeDemo-" + timestamp
# S3 URI of the uploaded video (s3://bucket/key), which Transcribe reads directly
job_uri = 's3://{}/{}jeff.mp4'.format(bucket_name, bucket_prefix)
transcribe.start_transcription_job(
    TranscriptionJobName=job_name,
    Media={'MediaFileUri': job_uri},
    MediaFormat='mp4',
    LanguageCode='en-US',
    MediaSampleRateHertz=44100,
    Settings={'MaxSpeakerLabels': 2,'ShowSpeakerLabels': True }    
)
print('Transcribing the video is in progress ', end='')
while True:
    status = transcribe.get_transcription_job(TranscriptionJobName=job_name)
    if status['TranscriptionJob']['TranscriptionJobStatus'] in ['COMPLETED', 'FAILED']:
        print('')
        print('Transcribing the video job completed with status %s\n' % status['TranscriptionJob']['TranscriptionJobStatus'])
        break
    print('.', end='')
    time.sleep(5)
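# pprint(status)
# a failed job has no transcript, so stop here with the reason reported by Transcribe
if status['TranscriptionJob']['TranscriptionJobStatus'] == 'FAILED':
    raise RuntimeError('Transcription job failed: %s' % status['TranscriptionJob'].get('FailureReason'))
url = status['TranscriptionJob']['Transcript']['TranscriptFileUri']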
print('Download the text output of transcribe job from the following URL:\n%s ' % url)
transcript='transcript_{}.json'.format(job_name)
urllib.request.urlretrieve(url,transcript)
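
The polling loop above runs until the job reaches a final state, with no upper bound on the wait. A variant with a timeout can be sketched as follows. `wait_for_job` is a hypothetical helper; it takes a callable that returns the current status string, so it can be tried without an AWS account.

```python
import time


def wait_for_job(get_status, poll_seconds=5, timeout_seconds=600, sleep=time.sleep):
    """Poll get_status() until it returns COMPLETED or FAILED.

    Raises TimeoutError if the job is still running after timeout_seconds.
    The sleep function is a parameter so that tests can skip real waiting.
    """
    waited = 0
    while True:
        state = get_status()
        if state in ("COMPLETED", "FAILED"):
            return state
        if waited >= timeout_seconds:
            raise TimeoutError("job still %s after %d seconds" % (state, waited))
        sleep(poll_seconds)
        waited += poll_seconds


# Stand-in for the Transcribe call: the job finishes on the third poll
states = iter(["QUEUED", "IN_PROGRESS", "COMPLETED"])
final_state = wait_for_job(lambda: next(states), sleep=lambda seconds: None)
print(final_state)
```

In the notebook, the callable would be `lambda: transcribe.get_transcription_job(TranscriptionJobName=job_name)['TranscriptionJob']['TranscriptionJobStatus']`.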

---
Read the output of the transcribe job and print the text generated from the video.

In [None]:
with open(transcript) as f:
    result = json.load(f)
transcript_text = result['results']['transcripts'][0]['transcript']
print(transcript_text)

---

Now we label the text using the speaker labels, so that we can display what each speaker said.

We also use Amazon Comprehend to identify the sentiment of the text for both speakers.

In [None]:
# For debugging, a previously saved transcript can be read from disk instead:
# with open('asrOutput.json') as f:
#     data = json.load(f)

with open(transcript) as f:
    data = json.load(f)

# start and end times of each speaker's segments, in hundredths of a second
spk_0 = []
spk_1 = []

# iterate over the speaker segments from the json
for x in data['results']['speaker_labels']['segments']:
    # integer timestamps let us compare ranges exactly later on
    start = round(float(x['start_time']) * 100)
    end = round(float(x['end_time']) * 100)
    if x['speaker_label'] == 'spk_0':
        spk_0.append([start, end])
    elif x['speaker_label'] == 'spk_1':
        spk_1.append([start, end])

speaker0 = []
speaker1 = []
curr_speaker = ''

for x in data['results']['items']:
    txt = x['alternatives'][0]['content']
    # items without a start_time are punctuation; they stay with the current speaker
    if 'start_time' in x:
        start = round(float(x['start_time']) * 100)
        end = round(float(x['end_time']) * 100)
        # a word belongs to a speaker when it lies fully inside one of their segments
        if any(y[0] <= start and end <= y[1] for y in spk_0):
            curr_speaker = 'spk_0'
        if any(y[0] <= start and end <= y[1] for y in spk_1):
            curr_speaker = 'spk_1'
    if curr_speaker == 'spk_0':
        words = speaker0
    elif curr_speaker == 'spk_1':
        words = speaker1
    else:
        continue
    if x['type'] == 'punctuation':
        # a comma stays on the line; any other punctuation ends the line
        words.append(txt if txt == ',' else txt + '\n')
    elif not words or words[-1].endswith('\n'):
        # first word of a line needs no leading space
        words.append(txt)
    else:
        words.append(' ' + txt)

# check sentiment of both speakers
def check_sentiment(text):
    # return the sentiment scores and the overall sentiment as one tab-separated string
    c = boto3.client(service_name='comprehend', region_name=region)
    res = c.detect_sentiment(Text=text, LanguageCode='en')
    scores = res['SentimentScore']
    out = ' Mixed : ' + str(scores['Mixed'])
    out += '\t Positive : ' + str(scores['Positive'])
    out += '\t Negative : ' + str(scores['Negative'])
    out += '\t Neutral : ' + str(scores['Neutral'])
    out += '\t Sentiment : ' + str(res['Sentiment'])
    return out


# print full text for both speakers
# (Speaker1 is the Transcribe label spk_1, Speaker2 is spk_0)
print('Speaker1:')
print(''.join(speaker1))
print('Sentiment of speaker 1 : ' + check_sentiment(''.join(speaker1)))
print('\n')
print('Speaker2:')
print(''.join(speaker0))
print('Sentiment of speaker 2 : ' + check_sentiment(''.join(speaker0)))
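
To see the shape of the data this step works on, the attribution logic can be tried on a small hand-made fragment laid out like the Transcribe output. The sample values below are invented for illustration, and `words_by_speaker` is a hypothetical helper that compares the timestamps as floats rather than as scaled integers.

```python
def words_by_speaker(results):
    """Group the spoken words of a Transcribe-style results dict by speaker."""
    segments = [
        (float(s["start_time"]), float(s["end_time"]), s["speaker_label"])
        for s in results["speaker_labels"]["segments"]
    ]
    grouped = {}
    current = None
    for item in results["items"]:
        # Punctuation items carry no timestamps and keep the current speaker
        if "start_time" in item:
            start, end = float(item["start_time"]), float(item["end_time"])
            for seg_start, seg_end, label in segments:
                if seg_start <= start and end <= seg_end:
                    current = label
        if current is not None and item["type"] == "pronunciation":
            grouped.setdefault(current, []).append(item["alternatives"][0]["content"])
    return grouped


# Invented sample: two segments, three words and one punctuation mark
sample = {
    "speaker_labels": {"segments": [
        {"start_time": "0.0", "end_time": "1.2", "speaker_label": "spk_0"},
        {"start_time": "1.2", "end_time": "2.5", "speaker_label": "spk_1"},
    ]},
    "items": [
        {"type": "pronunciation", "start_time": "0.1", "end_time": "0.5",
         "alternatives": [{"content": "Hello"}]},
        {"type": "punctuation", "alternatives": [{"content": "."}]},
        {"type": "pronunciation", "start_time": "1.3", "end_time": "1.8",
         "alternatives": [{"content": "Hi"}]},
        {"type": "pronunciation", "start_time": "1.9", "end_time": "2.5",
         "alternatives": [{"content": "there"}]},
    ],
}

grouped = words_by_speaker(sample)
for label, words in grouped.items():
    print(label, words)
```

Returning a dictionary keyed by speaker label avoids one list per speaker and would also cover jobs configured with more than two speakers.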


---
## Amazon Translate Demo

Translate the transcribed text into German using Amazon Translate.

In [None]:
# -*- coding: utf-8 -*-
translate = boto3.client('translate', region_name=region)

message = transcript_text

result=translate.translate_text(
    Text=message,
    SourceLanguageCode='en',
    TargetLanguageCode='de'
)

# print(json.dumps(result, sort_keys=True, indent=4, default=str))
print(result['TranslatedText'])
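
The transcript of a short video fits in a single `translate_text` request, but the API limits the size of the input text (documented as 10,000 bytes at the time of writing; treat the default below as an assumption and check the current Amazon Translate documentation). For longer transcripts, the text can be split at sentence boundaries first. `split_by_bytes` is a hypothetical helper sketched for that purpose.

```python
import re


def split_by_bytes(text, max_bytes=10000):
    """Split text at sentence boundaries into chunks that fit max_bytes.

    Sizes are measured in UTF-8 bytes. A single sentence longer than
    max_bytes is returned as its own oversized chunk and is not cut.
    """
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks = []
    current = ""
    for sentence in sentences:
        candidate = current + " " + sentence if current else sentence
        if len(candidate.encode("utf-8")) <= max_bytes:
            current = candidate
        else:
            if current:
                chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks


# Tiny limit chosen only to make the splitting visible
chunks = split_by_bytes("One. Two. Three.", max_bytes=10)
print(chunks)
```

Each chunk could then be passed to `translate.translate_text` in turn and the `TranslatedText` values joined with spaces.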

---
## Amazon Comprehend Demo

Now, using Amazon Comprehend, detect the language of the translated text above. Then, with the detected language as input, detect the sentiment, entities, and key phrases in the text.

In [None]:
comprehend = boto3.client('comprehend', region_name=region)

text = result['TranslatedText']

language_detected = comprehend.detect_dominant_language(Text=text)['Languages'][0]['LanguageCode']

entity_res = comprehend.detect_entities(Text=text, LanguageCode=language_detected)
senti_res = comprehend.detect_sentiment(Text=text, LanguageCode=language_detected)
key_res = comprehend.detect_key_phrases(Text=text, LanguageCode=language_detected)

In [None]:
print('Language detected is %s \n' % language_detected)

print('Sentiment of the text has been identified as %s with the score of %s \n' % (senti_res['Sentiment'], senti_res['SentimentScore'][senti_res['Sentiment'].title()]))

keyphrases = [[], [], []]
for k in key_res['KeyPhrases']:
    if k['Score'] > .99:
        keyphrases[0].append(k['Text'] + '\n')
    elif k['Score'] > .98:
        keyphrases[1].append(k['Text'] + '\n')
    elif k['Score'] > .97:
        keyphrases[2].append(k['Text'] + '\n')
           
print('Key Phrases identified from the text:')

keytable = PrettyTable(['Score', 'Key Phrases'])
if keyphrases[0]:
    keytable.add_row(['.99', ''.join(keyphrases[0])])
    keytable.add_row(['--', '--------------------'])
if keyphrases[1]:
    keytable.add_row(['.98', ''.join(keyphrases[1])])
    keytable.add_row(['--', '--------------------'])
if keyphrases[2]:
    keytable.add_row(['.97', ''.join(keyphrases[2])])
print(keytable)
print('\n')

entity_threshold = 0.80

# keep entities above the threshold in a list rather than a dict keyed by score,
# so that entities with identical scores are not overwritten
topentity = [e for e in entity_res['Entities'] if e['Score'] > entity_threshold]

top10 = sorted(topentity, key=lambda e: e['Score'], reverse=True)[:10]

table = PrettyTable(['Text', 'Type', 'Score'])
for e in top10:
    table.add_row([e['Text'], e['Type'], e['Score']])

print('Top 10 entities identified:')
print(table)

---
## Real-time Audio Transcription using Amazon Transcribe Websockets

Earlier we saw how to transcribe an existing video file stored in S3. Now let's look at an example of real-time audio transcription using the Amazon Transcribe WebSockets API.

*You have to update the text `[AMPLIFY_URL]` with the actual Amplify Console URL created as part of the prerequisites. You also need to key in the access key and secret key of the user that you created as part of the prerequisites.*

In [None]:
%%HTML
<h3>Real-time Audio Transcription</h3>
<br>
<object type="text/html" data="[AMPLIFY_URL]" width="1000" height="600"> <embed src="[AMPLIFY_URL]"></embed></object>