# Amazon Transcribe & Comprehend Demo


# Introduction

In this example, we demonstrate how to build a sample application that does the following:

1. Transcribes a video using Amazon Transcribe.
2. Generates captions from the transcript and the video using a Node.js application.
3. Processes the transcript with Amazon Comprehend to extract entities.
4. Renders the video, captions, and entities in a single HTML page.


# Prerequisites

To complete this example, you will need:

1. An AWS account and the required IAM permissions.
2. The AWS CLI utilities installed (see [account setup](./../../../setup/account-setup.ipynb)).
3. [Node.js](https://nodejs.org/en/download/) downloaded and installed.

# The Task

Let's look at the video we want to transcribe, caption, and analyze.

In [14]:
%%bash

# Copy the video file locally
# Uncomment to execute

# aws s3 cp s3://aws-ai/transcribe/data/bbc-one-minute-201804012144CT.mp4 .


In [2]:
# IPython.display can render audio, video, html, etc.
from IPython.display import HTML, Video

In [None]:
Video('bbc-one-minute-201804012144CT.mp4')

# Begin


## 1. Submit a transcription job

```bash

aws transcribe --region us-east-1 start-transcription-job --cli-input-json '{
    "TranscriptionJobName": "lab-02",
    "LanguageCode": "en-US",
    "MediaFormat": "mp4",
    "Media": {
        "MediaFileUri": "s3://aws-ai/transcribe/data/bbc-one-minute-201804012144CT.mp4"
    }
}'

```
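The JSON passed to `--cli-input-json` has to be valid JSON (quoted keys and string values) and has to reach the CLI as a single shell argument. The following is a minimal sketch, not part of the original lab, that builds the same command from Python so the quoting is handled by `json.dumps` and `shlex.quote`. The helper name `build_start_job_command` is hypothetical.

```python
import json
import shlex


def build_start_job_command(job_name, media_uri, language_code="en-US",
                            media_format="mp4", region="us-east-1"):
    """Return a shell-safe `aws transcribe start-transcription-job` command.

    Hypothetical helper for illustration; the field names mirror the
    request shown in the bash example above.
    """
    request = {
        "TranscriptionJobName": job_name,
        "LanguageCode": language_code,
        "MediaFormat": media_format,
        "Media": {"MediaFileUri": media_uri},
    }
    # json.dumps guarantees valid JSON; shlex.quote keeps it one shell argument.
    payload = json.dumps(request)
    return (
        f"aws transcribe --region {shlex.quote(region)} "
        f"start-transcription-job --cli-input-json {shlex.quote(payload)}"
    )


command = build_start_job_command(
    "lab-02",
    "s3://aws-ai/transcribe/data/bbc-one-minute-201804012144CT.mp4",
)
print(command)
```

The printed command can be pasted into a terminal or a `%%bash` cell. Nothing is submitted to AWS by this snippet itself.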

In [15]:
%%bash

# Uncomment to execute

# aws transcribe --region us-east-1 start-transcription-job --cli-input-json '{
#     "TranscriptionJobName": "lab-02",
#     "LanguageCode": "en-US",
#     "MediaFormat": "mp4",
#     "Media": {
#         "MediaFileUri": "s3://aws-ai/transcribe/data/bbc-one-minute-201804012144CT.mp4"
#     }
# }'

## 2. View running jobs

```bash

aws transcribe --region us-east-1 list-transcription-jobs --status IN_PROGRESS
```
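If you capture the JSON that `list-transcription-jobs` prints, you can check from the notebook whether a particular job is still running. The sketch below assumes the response contains a `TranscriptionJobSummaries` list whose entries carry `TranscriptionJobName` and `TranscriptionJobStatus`. The helper name and the sample response are illustrative, not real output from this lab.

```python
import json


def job_names_with_status(list_jobs_output, status):
    """Return the names of jobs in `status` from list-transcription-jobs JSON.

    Hypothetical helper; `list_jobs_output` is the raw JSON text printed by
    the CLI command above.
    """
    response = json.loads(list_jobs_output)
    return [
        summary["TranscriptionJobName"]
        for summary in response.get("TranscriptionJobSummaries", [])
        if summary.get("TranscriptionJobStatus") == status
    ]


# Illustrative response only; a real response contains more fields.
sample_output = json.dumps({
    "TranscriptionJobSummaries": [
        {"TranscriptionJobName": "lab-02", "TranscriptionJobStatus": "IN_PROGRESS"},
        {"TranscriptionJobName": "lab-01", "TranscriptionJobStatus": "COMPLETED"},
    ]
})

running = job_names_with_status(sample_output, "IN_PROGRESS")
print(running)
```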

## 3. Get utility code

```bash
wget https://s3.amazonaws.com/mast-mast-3/public/langws/2018/transcribe-utils-node.zip

unzip transcribe-utils-node.zip

npm install

```

## 4. Get the transcript

``` bash
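aws transcribe --region us-east-1 get-transcription-job --transcription-job-name lab-02

# The transcript URL is in the Transcript.TranscriptFileUri field of the response.
# It is pre-signed and expires quickly, so download it using the URL from your own response.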


wget https://s3.amazonaws.com/aws-transcribe-us-east-1-prod/903447430181/lab-02/asrOutput.json\?X-Amz-Security-Token\=FQoDYXdzEHgaDDUMHfhBHKCViIOtfyK3A6REMibWk04gdMO3ZzwjKwsUef6sOj4pow8tFktaQN1yMKbRj1833o1D3%2B23aJqJGRwIUFHC7ABVN%2FYP1oR4uOEJI65kSi2oKedl3JqznLSUFRoumUfxGntGgeNo%2Blimvir55Rwtdrr9Z3KLYnCa0Uny%2BChjbTEv%2BZUs%2FDjvmTqPlPJpGiNMM9q%2FBrtdhLHKni07rGMDBncRWmo6JP8Jr59EWY5GhdrJxjrv69Gatk8sEbJmh9llzckEXXar39ueBsYAK2m0xJomAynIs1OcAkeuM0lXJiHd%2FjKaMrM4cjeXCaCHKOOiBQM7D2a4Gu2XvzoBx8JGQOSj4RYs6%2BDpt3lSPQRJqKxQdXo92K%2BnEPEtelxOCcvfRudAys%2BIx3vJmJbfyZXybOvEvIsKiQph9MvrJ%2FetHFdL1y%2B0GhSV0gz18pcyQcBd3ZoVcdTIGHl2HycPPYGS2PDGv9oL2CE%2FpyHs1QZhYmuCpsd2RT4w7bHXfdhc71Vu6ZRI%2BW4PYW1Feh%2FRAhhzs3xxUG%2BM2%2F2AWCQWHvF6U0Am7Of9w5fSnuYaT0b14n7YgI06lxnL0BNzU5Uas7p%2FwH8op8Sg2AU%3D\&X-Amz-Algorithm\=AWS4-HMAC-SHA256\&X-Amz-Date\=20180525T154353Z\&X-Amz-SignedHeaders\=host\&X-Amz-Expires\=899\&X-Amz-Credential\=ASIAJWI55L4H4JG3VOMQ%2F20180525%2Fus-east-1%2Fs3%2Faws4_request\&X-Amz-Signature\=8fb790a0233d11e887dd3269046ccadda3a18ee3e12923f50b1324b2474910e9



# Rename the file to JOB_NAME.json


mv asrOutput* lab-02.json

```
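Copying the long pre-signed URL by hand is error-prone. As a rough sketch, the URL can instead be read out of the `get-transcription-job` response, assuming the job has finished and the response has the `TranscriptionJob.Transcript.TranscriptFileUri` field. The helper name and the sample response (including its `example.com` URL) are placeholders for illustration.

```python
import json


def transcript_uri(get_job_output):
    """Extract the transcript download URL from get-transcription-job JSON.

    Hypothetical helper; raises ValueError if the job is not COMPLETED,
    because the transcript URL is only usable once the job has finished.
    """
    job = json.loads(get_job_output)["TranscriptionJob"]
    status = job["TranscriptionJobStatus"]
    if status != "COMPLETED":
        raise ValueError(
            f"job {job['TranscriptionJobName']} is {status}, not COMPLETED"
        )
    return job["Transcript"]["TranscriptFileUri"]


# Placeholder response for illustration; the URL is not a real transcript.
sample_output = json.dumps({
    "TranscriptionJob": {
        "TranscriptionJobName": "lab-02",
        "TranscriptionJobStatus": "COMPLETED",
        "Transcript": {
            "TranscriptFileUri": "https://example.com/lab-02/asrOutput.json?X-Amz-Expires=899"
        },
    }
})

uri = transcript_uri(sample_output)
print(uri)

# To download with a real URL (uncomment to execute):
# import urllib.request
# urllib.request.urlretrieve(uri, "lab-02.json")
```

Saving straight to `lab-02.json` also makes the rename step unnecessary.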

## 5. Generate the VTT

```bash

node transcript-to-multi-vtt.js lab-02.json > lab-02.vtt

csplit -ks -f lab-02 lab-02.vtt '/^WEBVTT/'
mv -v lab-0200 videoFile00.mp4_en.vtt
mv -v lab-0201 videoFile01.mp4_en.vtt
```
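To make the transcript-to-captions step less of a black box, here is a minimal Python sketch of the general idea. It is not the `transcript-to-multi-vtt.js` utility and does not reproduce its output. It assumes the transcript's `results.items` entries have string `start_time` and `end_time` values, an `alternatives[0].content` field, and a `type` of `pronunciation` or `punctuation`. The 10-second cue length is an assumption, and the sample items are made up.

```python
def vtt_timestamp(seconds):
    """Format seconds (str or float) as a WebVTT timestamp HH:MM:SS.mmm."""
    millis = round(float(seconds) * 1000)
    hours, rem = divmod(millis, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{ms:03d}"


def items_to_vtt(items, max_cue_seconds=10.0):
    """Group transcript items into cues and return WebVTT text.

    A new cue starts once a word begins `max_cue_seconds` or more after
    the start of the current cue (an assumed, simplistic rule).
    """
    cues = []
    words, start, end = [], None, None
    for item in items:
        text = item["alternatives"][0]["content"]
        if item["type"] == "punctuation":
            # Punctuation has no timestamps; attach it to the previous word.
            if words:
                words[-1] += text
            continue
        item_start = float(item["start_time"])
        if words and item_start - start >= max_cue_seconds:
            cues.append((start, end, " ".join(words)))
            words, start = [], None
        if start is None:
            start = item_start
        words.append(text)
        end = float(item["end_time"])
    if words:
        cues.append((start, end, " ".join(words)))

    lines = ["WEBVTT", ""]
    for cue_start, cue_end, text in cues:
        lines.append(f"{vtt_timestamp(cue_start)} --> {vtt_timestamp(cue_end)}")
        lines.append(text)
        lines.append("")
    return "\n".join(lines)


# Made-up items for illustration; real ones come from lab-02.json["results"]["items"].
sample_items = [
    {"type": "pronunciation", "start_time": "3.44", "end_time": "3.9",
     "alternatives": [{"confidence": "0.97", "content": "Hello"}]},
    {"type": "punctuation",
     "alternatives": [{"confidence": "0.0", "content": ","}]},
    {"type": "pronunciation", "start_time": "4.0", "end_time": "4.5",
     "alternatives": [{"confidence": "0.99", "content": "world"}]},
    {"type": "pronunciation", "start_time": "14.0", "end_time": "14.6",
     "alternatives": [{"confidence": "0.98", "content": "again"}]},
]

vtt = items_to_vtt(sample_items)
print(vtt)
```

WebVTT timestamps use a period before the milliseconds, unlike the comma used by SRT.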

## 6. Create the HTML Wrappers

```bash
node transcript-to-html.js lab-02.json bbc-one-minute-201804012144CT.mp4 "video/mp4" > "videoFile.mp4.html"

```

## 7. Read the HTML File in Jupyter

```python

HTML('videoFile.mp4.html')

```

In [3]:
HTML('videoFile.mp4.html')

| Entity | Type | Max Confidence | Count |
|---|---|---|---|
| Last month | DATE | 0.99 | 1 |
| this month | DATE | 0.94 | 1 |
| Beijing | LOCATION | 0.57 | 1 |
| Korea | LOCATION | 0.67 | 1 |
| US | LOCATION | 0.9 | 2 |
| BBC | ORGANIZATION | 0.78 | 1 |
| China | ORGANIZATION | 0.81 | 2 |
| South Korean | OTHER | 0.75 | 1 |
| kim jong un | PERSON | 0.86 | 1 |
| trump | PERSON | 0.97 | 1 |
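
An entity summary like the one above (entity, type, maximum confidence, count) can be derived from an Amazon Comprehend entity-detection response. The sketch below is one plausible way to do that aggregation, not the code used by `transcript-to-html.js`. It assumes each entity has `Text`, `Type`, and `Score` fields. The helper name and the sample entities are made up for illustration.

```python
from collections import defaultdict


def summarize_entities(entities):
    """Aggregate entities into (text, type, max score, count) rows.

    Hypothetical helper; `entities` is assumed to be the "Entities" list
    of a Comprehend response, e.g. from boto3's
    comprehend.detect_entities(Text=..., LanguageCode="en").
    """
    grouped = defaultdict(list)
    for entity in entities:
        grouped[(entity["Text"], entity["Type"])].append(entity["Score"])
    rows = [
        (text, entity_type, round(max(scores), 2), len(scores))
        for (text, entity_type), scores in grouped.items()
    ]
    # Sort by type, then case-insensitively by entity text.
    return sorted(rows, key=lambda row: (row[1], row[0].lower()))


# Made-up entities for illustration, not the real response for this video.
sample_entities = [
    {"Text": "US", "Type": "LOCATION", "Score": 0.85},
    {"Text": "trump", "Type": "PERSON", "Score": 0.97},
    {"Text": "US", "Type": "LOCATION", "Score": 0.9},
]

rows = summarize_entities(sample_entities)
for row in rows:
    print(row)
```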

| Index | Avg. Confidence | Terms | Unsure |
|---|---|---|---|
| 00:00:03,440 | 0.97 | 20 | 0 |
| 00:00:10,100 | 0.99 | 28 | 0 |
| 00:00:20,090 | 0.97 | 33 | 0 |
| 00:00:30,340 | 0.98 | 22 | 0 |
| 00:00:40,190 | 0.99 | 35 | 0 |
| 00:00:50,840 | 1 | 3 | 0 |
| | 0% unsure | 141 | 0 |
