### Update IAM to add permissions
Before running the code below, make sure the SageMaker Studio execution role has permission to call the AI services used in this notebook. Follow the instructions in the sub-section **Setting up IAM permissions** of the section **Adding sensory cognition to your applications** in Chapter 10 of the book, then continue with the remaining cells.

### Import libraries

In [None]:
# Import the libraries we need
import boto3
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import io
import os
import sys
import time
import json
import warnings
from IPython.display import display
from time import strftime, gmtime
from sagemaker import get_execution_role
from datetime import datetime, timezone

warnings.filterwarnings('ignore')

# declare the boto3 handles for each AI service
transcribe = boto3.client("transcribe")
rekognition = boto3.client("rekognition")
translate = boto3.client("translate")
polly = boto3.client("polly")
comprehend = boto3.client("comprehend")
# Amazon S3 (S3) client
s3 = boto3.client('s3')

## Using Amazon Transcribe for automatic speech recognition

In [None]:
bucket = 'your-S3-bucket'
prefix = 'aiml-book/chapter10'

In [None]:
# First let us walk through our audio files and upload them to the S3 bucket
# we will use the example audio file we provided with the repo
# note: uri keeps the location of the last file uploaded; the transcription job below uses it
audio_dir = 'input/audio-recordings'
for sdir, drs, fls in os.walk(audio_dir):
    for file in fls:
        s3.upload_file(os.path.join(sdir, file), bucket, prefix+'/transcribe/'+ os.path.join(sdir, file))
        uri = "s3://" + bucket + '/'+prefix+'/transcribe/' + os.path.join(sdir, file)
        print("Uploaded to: " + uri)
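
The upload loop above builds the S3 key by concatenating the prefix with the local path. As a minimal sketch, the same key can be built with a small helper that always uses forward slashes, which matters if the notebook is ever run where the local path separator is a backslash. The helper names `s3_key_for` and `s3_uri_for` and the file name `sample.wav` are hypothetical and only illustrate the idea.

```python
from pathlib import PurePath


def s3_key_for(prefix: str, service: str, local_path: str) -> str:
    """Build an S3 object key that always uses forward slashes."""
    # as_posix() renders the path with forward slashes, whatever separator
    # the local operating system uses.
    return "/".join([prefix.strip("/"), service, PurePath(local_path).as_posix()])


def s3_uri_for(bucket: str, key: str) -> str:
    """Return the s3:// URI for a bucket and key."""
    return f"s3://{bucket}/{key}"


# Hypothetical file name, used only for illustration.
key = s3_key_for("aiml-book/chapter10", "transcribe", "input/audio-recordings/sample.wav")
print(s3_uri_for("your-S3-bucket", key))
```

In the loop, the key would be passed to `s3.upload_file` and the URI kept for the transcription job.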

In [None]:
# get the current time
now = datetime.now()
time_now = now.strftime("%H.%M.%S")
job = 'transcribe-test-'+time_now
# start the transcription job
try:
    transcribe.start_transcription_job(
            TranscriptionJobName=job,
            LanguageCode='en-US',
            Media={"MediaFileUri": uri},
            Settings={'MaxSpeakerLabels': 2, 'ShowSpeakerLabels': True}
            )
        
    time.sleep(2)    
    print(transcribe.get_transcription_job(TranscriptionJobName=job)['TranscriptionJob']['TranscriptionJobStatus'])
except Exception as e:
    print(e)

### Get transcription results
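Our job will take about 2 to 3 minutes to complete, so wait for it to finish before running the cells below. You can also look up the status of the job in the Amazon Transcribe console at https://us-east-1.console.aws.amazon.com/transcribe/home?region=us-east-1#jobs (adjust the region in the URL if you are not using us-east-1).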
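
Rather than watching the console, the notebook can poll the job status until it reaches a terminal state. The following is a minimal sketch, not part of the original notebook: `wait_for_status` is a hypothetical helper that takes any function returning a status string, so it can be demonstrated here with a fake status sequence instead of a real Transcribe call.

```python
import time
from typing import Callable


def wait_for_status(get_status: Callable[[], str],
                    poll_seconds: float = 10.0,
                    timeout_seconds: float = 600.0) -> str:
    """Poll get_status() until it returns COMPLETED or FAILED, or the timeout elapses."""
    terminal = {"COMPLETED", "FAILED"}
    deadline = time.monotonic() + timeout_seconds
    while True:
        status = get_status()
        if status in terminal:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"Job still {status} after {timeout_seconds} seconds")
        time.sleep(poll_seconds)


# Offline demonstration with a fake status source (hypothetical sequence).
fake_statuses = iter(["QUEUED", "IN_PROGRESS", "IN_PROGRESS", "COMPLETED"])
final_status = wait_for_status(lambda: next(fake_statuses), poll_seconds=0.01)
print(final_status)
```

In this notebook, `get_status` could be `lambda: transcribe.get_transcription_job(TranscriptionJobName=job)['TranscriptionJob']['TranscriptionJobStatus']`, which is the same lookup used when the job was started.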

In [None]:
# Create an output transcripts directory
dr = os.getcwd()+'/output-transcripts'
if not os.path.exists(dr):
    os.makedirs(dr)

In [None]:
# Our transcript is in a presigned URL in Transcribe's S3 bucket, let us download it and get the text we need
import urllib.request
response = transcribe.get_transcription_job(
    TranscriptionJobName=job 
)
out_url = response['TranscriptionJob']['Transcript']['TranscriptFileUri']
infile = job+'-output.json'
urllib.request.urlretrieve(out_url, infile)
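# declare an output file to store the transcripts
outfile = 'output-transcripts/'+job+'.txt'
with open(infile, 'rb') as t_in:
    full = json.load(t_in)
# "transcripts" is a list of dicts; the text itself is under the "transcript" key
entire_transcript = full["results"]["transcripts"][0]["transcript"]
# split the transcript into lines at sentence boundaries
lines = entire_transcript.split('. ')
for i, line in enumerate(lines, start=1):
    print("Line "+str(i)+": " + line)
# write the list of lines to an output file (it is read back in the Comprehend section)
with open(outfile, 'w') as out:
    out.write(str(lines))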
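
To make the shape of the transcript file easier to see, here is a small offline sketch. The JSON below is a hand-written, heavily shortened stand-in for a Transcribe output file (the job name and sentences are made up), and `transcript_sentences` is a hypothetical helper. It splits on sentence-ending punctuation with a regular expression, which is a slightly different choice from the plain `'. '` split used in the cell above.

```python
import json
import re

# Hand-written stand-in for a Transcribe output file; real files also
# contain per-word items and speaker labels.
sample_output = json.loads("""
{
  "jobName": "transcribe-test-example",
  "results": {
    "transcripts": [
      {"transcript": "Hello, thanks for calling. How can I help you today? My car will not start."}
    ],
    "items": []
  },
  "status": "COMPLETED"
}
""")


def transcript_sentences(output: dict) -> list:
    """Return the transcript text split into sentences, keeping the punctuation."""
    # "transcripts" is a list of dicts; the text is under the "transcript" key.
    text = output["results"]["transcripts"][0]["transcript"]
    # Split on whitespace that follows '.', '?' or '!'.
    return [s for s in re.split(r"(?<=[.?!])\s+", text) if s]


for number, sentence in enumerate(transcript_sentences(sample_output), start=1):
    print(f"Line {number}: {sentence}")
```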

## Using Amazon Rekognition for computer vision

In [None]:
# Let us use Pillow, the Python image processing library
from PIL import Image
img = Image.open('./input/images/puppy-image.jpg')
display(img)

We will now use the Amazon Rekognition APIs to perform some computer vision tasks without any ML training whatsoever. For a full list of the Python APIs for Rekognition, refer to https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/rekognition.html

### Detect Labels

In [None]:
# First upload image to S3 bucket
input_dir = 'input/images'
prefix_uris = []
for sdir, drs, fls in os.walk(input_dir):
    for file in fls:
        s3.upload_file(os.path.join(sdir, file), bucket, prefix+'/rekognition/'+ os.path.join(sdir, file))
        uri = "s3://" + bucket + '/'+prefix+'/rekognition/' + os.path.join(sdir, file)
        prefix_uri = prefix+'/rekognition/' + os.path.join(sdir, file)
        prefix_uris.append(prefix_uri)
        print("Uploaded to: " + uri)

In [None]:
# Detect Labels
for prefix_uri in prefix_uris:
    response = rekognition.detect_labels(
        Image={
            'S3Object': {
                'Bucket': bucket,
                'Name': prefix_uri
            }
        },
        MaxLabels=5,
    )
    for label in response['Labels']:
        print("Amazon Rekognition is "+str(round(label['Confidence'],0))+" confident that this picture is of a "+label['Name'])

This is one API we tried for detecting labels. Rekognition has a host of other features and APIs you can use. Please refer to https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/rekognition.html
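
The `detect_labels` response is a dictionary whose `Labels` list holds a `Name` and a `Confidence` (a percentage) for each label. As a minimal sketch, the labels can be filtered and ranked on the client side. The helper `confident_labels` and the sample response are hypothetical; the sample values are made up and are not output from the puppy image.

```python
def confident_labels(response: dict, min_confidence: float = 90.0) -> list:
    """Return (name, confidence) pairs at or above min_confidence, highest first."""
    pairs = [(label["Name"], label["Confidence"]) for label in response.get("Labels", [])]
    kept = [pair for pair in pairs if pair[1] >= min_confidence]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)


# Made-up response with the same shape as a detect_labels result.
sample_labels_response = {
    "Labels": [
        {"Name": "Puppy", "Confidence": 97.5},
        {"Name": "Dog", "Confidence": 99.2},
        {"Name": "Grass", "Confidence": 71.0},
    ]
}

for name, confidence in confident_labels(sample_labels_response):
    print(f"{name}: {confidence:.1f}%")
```

`detect_labels` also accepts a `MinConfidence` parameter, so the same threshold can be applied by the service instead of in the notebook.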

## Using Amazon Translate for machine translation

In [None]:
# Create an output translations directory
dr = os.getcwd()+'/output-translations'
if not os.path.exists(dr):
    os.makedirs(dr)

In [None]:
# Please make sure you have executed the Amazon Transcribe section above before you come here
# Let us use the transcribed list we created before
x = 0
translate_out = 'output-translations/translations.txt'
t_list = []
# the lines list here was created when we executed the Transcribe code sample earlier in this notebook. 
# It contains lines of transcribed text
for line in lines:
    x += 1
    if (x % 2) == 0:
        result = translate.translate_text(Text=line, SourceLanguageCode='auto', TargetLanguageCode='hi')
    else:
        result = translate.translate_text(Text=line, SourceLanguageCode='auto', TargetLanguageCode='fr')
    t_list.append("Line "+str(x)+": "+result['TranslatedText'])
with open(translate_out, 'w') as t_out:
    t_out.write(str(t_list))
# print the translation results
for l in t_list:
    print(l)

In [None]:
# view the contents of our output file
!cat output-translations/translations.txt
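
The loop above sends one sentence per `translate_text` call. For longer inputs, Amazon Translate limits the size of each request, so it can help to group sentences into chunks that stay under a byte budget. This is a minimal sketch under stated assumptions: `chunk_by_bytes` is a hypothetical helper, and the default `max_bytes` is a placeholder, so check the Amazon Translate service quotas for the actual limit.

```python
def chunk_by_bytes(sentences: list, max_bytes: int = 5000, joiner: str = " ") -> list:
    """Group sentences into chunks whose UTF-8 size does not exceed max_bytes."""
    chunks = []
    current = ""
    for sentence in sentences:
        candidate = sentence if not current else current + joiner + sentence
        if len(candidate.encode("utf-8")) <= max_bytes:
            current = candidate
        else:
            if current:
                chunks.append(current)
            # A single sentence larger than max_bytes is kept whole in its own chunk.
            current = sentence
    if current:
        chunks.append(current)
    return chunks


# Tiny budget so the grouping is visible in the output.
print(chunk_by_bytes(["aaaa", "bbbb", "cc"], max_bytes=9))
```

Each chunk could then be passed as the `Text` argument of `translate.translate_text` in place of a single line.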

## Using Amazon Polly for text to speech

In [None]:
input_text = "I think AI and ML are the most popular skills right now, and I am glad I bought this book. It helps me learn how to build real-world and large scale AI and ML applications on Amazon Web Services. I loved the breadth and depth of coverage on the ML workflow, on using the various features of Amazon SageMaker, and the AI services that made powerful ML models available behind simple API calls. Overall this book is a very good learning resource"

In [None]:
# Create an output directory
dr = os.getcwd()+'/output-audio'
if not os.path.exists(dr):
    os.makedirs(dr)

In [None]:
response = polly.synthesize_speech(VoiceId='Kajal',
                OutputFormat='mp3',
                Text=input_text,
                Engine='neural')

# write the returned audio stream to an mp3 file; the with block closes the file for us
with open('./output-audio/chapter10-polly-test.mp3', 'wb') as mp3_file:
    mp3_file.write(response['AudioStream'].read())

In [None]:
!ls output-audio

### Play the audio

In [None]:
from IPython.display import Audio
Audio('./output-audio/chapter10-polly-test.mp3', autoplay=True)

## Using Amazon Comprehend for deriving insights

In [None]:
# For Comprehend we will take the Transcript output and see what insights we can get from this text
transcript = 'output-transcripts/'+job+'.txt'
print(transcript)

In [None]:
# the transcript file holds the printed form of a Python list of sentences,
# so parse it back into a list instead of splitting on commas
import ast
with open(transcript, 'r') as comp_in:
    in_text = ast.literal_eval(comp_in.read())
# let us reconstruct the full text from the list of sentences
full_text = '. '.join(in_text)

#### Detect Entities

In [None]:
comp_res = comprehend.detect_entities(Text=full_text, LanguageCode='en')
for entity in comp_res['Entities']:
    print("Comprehend is "+str(round(entity['Score']*100,0))+"% confident that "+entity['Text']+" is an entity of type "+entity['Type']+" ")

#### Detect Keyphrases
Key phrases are important groups of words within the text that, when read together, provide a relevant summary of the text.

In [None]:
# Read and print the key phrases
comp_res = comprehend.detect_key_phrases(Text=full_text, LanguageCode='en')
for phrase in comp_res['KeyPhrases']:
    print("Comprehend is "+str(round(phrase['Score']*100,0))+"% confident that "+phrase['Text']+" is a key phrase")
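
Transcripts often repeat the same phrase, so the printed list can be long. As a minimal sketch, the key phrases can be de-duplicated (ignoring case) and ranked by score. The helper `top_key_phrases` and the sample response are hypothetical; the sample values are made up and only mirror the shape of a `detect_key_phrases` result.

```python
def top_key_phrases(response: dict, limit: int = 5) -> list:
    """Return up to `limit` distinct (text, score) pairs, highest score first."""
    best = {}
    for phrase in response.get("KeyPhrases", []):
        text = phrase["Text"].strip()
        key = text.lower()
        # Keep the highest score seen for each distinct phrase.
        if key not in best or phrase["Score"] > best[key][1]:
            best[key] = (text, phrase["Score"])
    ranked = sorted(best.values(), key=lambda item: item[1], reverse=True)
    return ranked[:limit]


# Made-up response with the same shape as a detect_key_phrases result.
sample_phrases_response = {
    "KeyPhrases": [
        {"Text": "a rental car", "Score": 0.99},
        {"Text": "the two days", "Score": 0.97},
        {"Text": "A rental car", "Score": 0.95},
        {"Text": "no charge", "Score": 0.98},
    ]
}

for text, score in top_key_phrases(sample_phrases_response, limit=2):
    print(f"{text}: {score:.2f}")
```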

#### Detect Sentiment

In [None]:
sent_text = 'Also if you wanted to wait for the two days we could also have a rental car available for you at no charge in case you wanted that in case it takes a little bit longer to fix that will be an option that we can plan out for you as well but again you are also welcome to come in any time before then and we will get you in as soon as we can'
comp_res = comprehend.detect_sentiment(Text=sent_text, LanguageCode='en')
print(comp_res['Sentiment'])
print(comp_res['SentimentScore'])
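
`SentimentScore` is a dictionary with one score each for `Positive`, `Negative`, `Neutral` and `Mixed`. As a minimal sketch, the strongest of the four can be picked out and reported next to its score. The helper `dominant_sentiment` is hypothetical and the sample scores are made up, not the result for the sentence above.

```python
def dominant_sentiment(sentiment_score: dict) -> tuple:
    """Return the (label, score) pair with the highest score, label in upper case."""
    label = max(sentiment_score, key=sentiment_score.get)
    return label.upper(), sentiment_score[label]


# Made-up scores with the same keys as a detect_sentiment SentimentScore.
sample_scores = {"Positive": 0.62, "Negative": 0.03, "Neutral": 0.33, "Mixed": 0.02}

label, score = dominant_sentiment(sample_scores)
print(f"{label} ({score:.0%})")
```

In the notebook, the same helper could be called as `dominant_sentiment(comp_res['SentimentScore'])`.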

#### Detect Syntax

In [None]:
synt_text = 'Also if you wanted to wait for the two days we could get a rental car'
comp_res = comprehend.detect_syntax(Text=synt_text, LanguageCode='en')
for token in comp_res['SyntaxTokens']:
    print("The Part of Speech for word: "+token['Text']+" :is: "+token['PartOfSpeech']['Tag'])


For a full list of the Comprehend APIs, please refer to https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/comprehend.html#comprehend

### END OF NOTEBOOK
Please refer to Chapter 10 in the book for further instructions.