# Techniques for analysing, tagging & detecting breaks in social media video

This notebook contains sample code that will analyse a video file and determine appropriate tags and breaks in content. It utilises Amazon Transcribe, Amazon Rekognition and Amazon Bedrock.

Running this notebook will incur costs from the deployment and use of AWS services. Please make sure you follow the steps in the 'Clean Up' section when you have finished.

### Prerequisites


#### IAM Role
The IAM role that you are using in this notebook will require access to the following services: 
- Amazon Transcribe 
- Amazon Rekognition 
- Amazon Bedrock
- An S3 bucket

Additionally, you will need to ensure that Anthropic Claude is enabled in your Amazon Bedrock console under "Model Access".

The S3 bucket is where you will store your video file. This S3 bucket will also be used for temporary 'scratch space' storage. 


#### Required Libraries

In [None]:
%conda install -q -y boto3 ffmpeg huggingface_hub numpy numba jinja2 sqlalchemy 

In [None]:
%pip install boto3 --upgrade opencv-python-headless langchain librosa  

You should restart the kernel after installing the required dependencies.

#### Configure your variables

First specify the bucket and file location of the video you wish to analyse. 

We recommend using a compressed version of your video file for analysis, as a full, high-definition copy is not required. At the end of this notebook there is some example code using AWS Elemental MediaConvert that you can use to compress your video.

In [None]:
bucket_name="s3-bucket-name-for-video"

key="myvideo-for-analysis.mp4" 


In [None]:
import uuid
import time 
import jinja2
import boto3
import json
import re
import pandas as pd
import cv2
import math 
from langchain.llms import Bedrock
from langchain.prompts import PromptTemplate
from langchain.chains.llm import LLMChain
import base64 
import os
import librosa
import numpy as np

media_file_uri="s3://" + bucket_name + "/" + key 

jobID=str(uuid.uuid4()) 


## Solution Deployment

### Obtaining a Transcript using Amazon Transcribe

In this section we will start a Transcribe job using the specified video file as the input.
This will output the transcription, together with the SRT and VTT subtitle files, to a folder in the S3 bucket called 'transcribeOutput'.

In [None]:
def create_transcribe_job(media_file_uri, output_bucket, output_key, language_code='en-US'):
    transcribe = boto3.client('transcribe')
    job_name = 'TranscribeJob-' + jobID
    response = transcribe.start_transcription_job(
        TranscriptionJobName=job_name,
        Media={'MediaFileUri': media_file_uri},
        MediaFormat='mp4',  # Change this to the format of your media file
        OutputBucketName=output_bucket,
        OutputKey=output_key,
        LanguageCode=language_code,
        Subtitles={
        'Formats': [
            'vtt','srt'
        ],
        'OutputStartIndex': 1
        },
        ToxicityDetection=[
        {
            'ToxicityCategories': [
                'ALL',
            ]
        }
        ] 
    )

    print("Transcription job created:")
    print(response)
    return job_name

def wait_until_transcription_job_completed(job_name):
    transcribe = boto3.client('transcribe')

    while True:
        response = transcribe.get_transcription_job(TranscriptionJobName=job_name)
        status = response['TranscriptionJob']['TranscriptionJobStatus']

        if status in ['COMPLETED', 'FAILED']:
            print(f"Transcription job {job_name} {status.lower()}.")
            break

        print(f"Transcription job {job_name} is still in progress. Checking again in 30 seconds...")
        time.sleep(30)


In [None]:
output_bucket = bucket_name
output_key = "transcribeOutput/transcribe-" + jobID
job_name=create_transcribe_job(media_file_uri, output_bucket, output_key)
wait_until_transcription_job_completed(job_name)
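
The waiting function above loops until the job reports `COMPLETED` or `FAILED`, with no upper bound on how long it waits. The following is a minimal sketch of a bounded version. `poll_until` and its parameters are hypothetical names used for illustration, not part of boto3, and the status source is stubbed so the cell runs without calling AWS.

```python
import time


def poll_until(get_status, terminal_states, interval_seconds=30, timeout_seconds=3600,
               sleep=time.sleep):
    """Call get_status() until it returns a terminal state or the timeout passes.

    Hypothetical helper for illustration; not part of boto3.
    """
    waited = 0
    while True:
        status = get_status()
        if status in terminal_states:
            return status
        if waited >= timeout_seconds:
            raise TimeoutError(f"Still {status} after {waited} seconds")
        sleep(interval_seconds)
        waited += interval_seconds


# Stubbed status source standing in for a real Transcribe status lookup
fake_statuses = iter(["IN_PROGRESS", "IN_PROGRESS", "COMPLETED"])
final_status = poll_until(lambda: next(fake_statuses), {"COMPLETED", "FAILED"},
                          interval_seconds=30, sleep=lambda seconds: None)
print(final_status)
```

With Transcribe, `get_status` would wrap the same `get_transcription_job` call used above and return `response['TranscriptionJob']['TranscriptionJobStatus']`.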

Once the transcription job is completed, we can retrieve the SRT subtitles file and the transcription.

In [None]:
def download_from_s3(bucket_name, object_key, local_path):
    s3 = boto3.client('s3')

    try:
        s3.download_file(bucket_name, object_key, local_path)
        print(f"File downloaded successfully from S3: {local_path}")
    except Exception as e:
        print(f"File not available: {e}")


In [None]:
object_key=output_key + ".srt"
srt_file_path = "output-transcribe.srt"
download_from_s3(output_bucket, object_key, srt_file_path)

transcribe_json = "output-transcribe-text.json"
download_from_s3(output_bucket, output_key, transcribe_json)


We now have the SRT (subtitles with timestamps) and the JSON file of the transcription itself. 

In [None]:
# Load the transcription JSON and extract the full transcript text
with open(transcribe_json, 'r') as file:
    data = json.load(file)
transcript = data.get('results', {}).get('transcripts', [{}])[0].get('transcript', None)

# Print the transcript
print("Transcript:", transcript)
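
Besides the full transcript text, the Transcribe result file also lists individual items with their own timestamps. The sketch below shows how word timings could be read from it. The sample JSON is hand-written with made-up values in the shape of a Transcribe result, and `word_timings` is a hypothetical helper, so check the field names against your own output file before relying on it.

```python
import json

# Hand-written sample in the shape of a Transcribe result file (values are made up)
sample = json.loads("""
{"results": {
  "transcripts": [{"transcript": "Hello world."}],
  "items": [
    {"type": "pronunciation", "start_time": "0.04", "end_time": "0.42",
     "alternatives": [{"confidence": "0.99", "content": "Hello"}]},
    {"type": "pronunciation", "start_time": "0.42", "end_time": "0.91",
     "alternatives": [{"confidence": "0.98", "content": "world"}]},
    {"type": "punctuation",
     "alternatives": [{"confidence": "0.0", "content": "."}]}
  ]}}
""")


def word_timings(data):
    """Return (word, start_seconds, end_seconds) for each spoken item."""
    timings = []
    for item in data.get("results", {}).get("items", []):
        if item.get("type") != "pronunciation":
            continue  # punctuation items carry no timestamps
        word = item["alternatives"][0]["content"]
        timings.append((word, float(item["start_time"]), float(item["end_time"])))
    return timings


words = word_timings(sample)
print(words)
```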


The following function parses an SRT file and finds every point where speech occurs. It marks each 0.5-second step as containing speech or not, first in a dictionary and then as a pandas DataFrame that we can use later to determine where the breaks in the content are.

In [None]:


def parse_file(file_path):
    with open(file_path, 'r') as file:
        content = file.read()

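    # Extract the start and end timestamps (the cue text is matched but not used).
    # The final cue may not be followed by a blank line, so also accept end of file.
    pattern = re.compile(r'(\d+:\d+:\d+,\d+) --> (\d+:\d+:\d+,\d+)\n(.+?)(?:\n\n|\Z)', re.DOTALL)

    # Find all matches in the content
    matches = re.findall(pattern, content)
    if not matches:
        raise ValueError(f"No subtitle cues found in {file_path}")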

    # Create a dictionary to store whether text appears every 0.5 seconds
    time_dict = {}

    # Process matches and update the dictionary
    for match in matches:
        start_time, end_time, _ = match
        start_seconds = convert_to_seconds(start_time)
        end_seconds = convert_to_seconds(end_time)

        current_time = start_seconds
        while current_time < end_seconds:
            # Round to the nearest 0.5 seconds
            time_key = round(current_time * 2) / 2
            time_dict.setdefault(time_key, True)
            current_time += 0.5

    # Iterate over every 0.5 seconds and set values to False if no text appears
    total_duration = convert_to_seconds(matches[-1][1])  # End time of the last subtitle cue
    for time_key in range(0, int(total_duration * 2) + 1):
        time_key /= 2
        if time_key not in time_dict:
            time_dict[time_key] = False

    sorted_time_dict = dict(sorted(time_dict.items()))
    df = pd.DataFrame(list(sorted_time_dict.items()), columns=['Time', 'Speech Appears'])
    df['Speech Appears'] = df['Speech Appears'].map(lambda x: '1' if x else '0')


    return df 


def convert_to_seconds(time_str):
    # Convert timestamp to seconds, including milliseconds
    h, m, s, ms = map(int, time_str.replace(',', ':').split(':'))
    return h * 3600 + m * 60 + s + ms / 1000



In [None]:
transcribe_pd = parse_file(srt_file_path)
# Print the dictionary
print("Time Dictionary:")
print (transcribe_pd)
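
To make the half-second bucketing concrete, here is a small worked sketch using two made-up cues instead of a real SRT file. `speech_grid` is a hypothetical, standard-library-only stand-in for the bucketing step inside `parse_file`, not a replacement for it.

```python
def speech_grid(cues, step=0.5):
    """Map (start, end) cue times in seconds to {time_key: bool} on a fixed grid."""
    grid = {}
    for start, end in cues:
        current = start
        while current < end:
            # Same rounding as parse_file: snap to the nearest multiple of step
            grid[round(current / step) * step] = True
            current += step
    last_end = max(end for _, end in cues)
    for i in range(int(last_end / step) + 1):
        grid.setdefault(i * step, False)
    return dict(sorted(grid.items()))


# Two made-up cues: speech from 1.2s to 2.4s and from 4.0s to 4.6s
grid = speech_grid([(1.2, 2.4), (4.0, 4.6)])
print(grid)
```

Because each step is snapped to the nearest half second, the cue that starts at 1.2s marks the 1.0s bucket. A `True` value therefore means speech close to that time rather than strictly inside the interval.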

### Analyse Transcript using a LLM

In this stage we will use Bedrock to analyse the transcript and determine what the content is about and what tags might be most appropriate to apply to the video.

In [None]:

def get_text_response_from_bedrock(text, model_id):    
    llm = Bedrock(model_id=model_id, region_name='us-east-1')
    prompt = PromptTemplate(
    template="""Human:
                    {text}
                    Assistant: """,
                    input_variables=["text"],
    )
    llmchain = LLMChain(llm=llm, prompt=prompt)

    response=llmchain.invoke({"text":text})
    return response

In [None]:
answer = get_text_response_from_bedrock('Here is a transcript from a video: \n\n' 
                                        + str(transcript) + 
                                        '\n\n Analyse the transcript and determine what type of video it is and what is happening. '
                                        ,'anthropic.claude-v2:1')
transcriptllm=answer['text']
transcriptllm

In [None]:
answer = get_text_response_from_bedrock(' Here is the transcript from the video: \n\n' 
                                        + str(transcript) + 
                                        '\n\n What are the top three keywords you would use for the content above? Output as a comma separated list.'
                                        ,'anthropic.claude-v2:1')
tagsllm=answer['text']
tagsllm 
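
The model reply is free text, so the keywords usually need a little cleaning before they can be stored as tags. The following is a minimal sketch under the assumption that the reply is a comma separated list, possibly preceded by a short preamble ending in a colon. `parse_tags` is a hypothetical helper, and real replies may need more handling than this.

```python
import re


def parse_tags(llm_text, max_tags=3):
    """Turn a free-text, comma separated model reply into a clean list of tags."""
    # Keep only the text after the last colon, in case the model adds a preamble
    candidate = llm_text.rsplit(":", 1)[-1]
    tags = []
    for part in candidate.split(","):
        tag = re.sub(r"\s+", " ", part).strip(" .\n\t\"'").lower()
        if tag and tag not in tags:
            tags.append(tag)
    return tags[:max_tags]


example_tags = parse_tags(" Here are the top three keywords: Cooking, pasta recipe, Italian food.")
print(example_tags)
```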

### Using Amazon Rekognition Shot Detection to determine changes in scene

In this step we will use the Shot Detection feature of Amazon Rekognition to determine where the scene changes. 
This serves two purposes. First, it lets us take a screenshot at each shot transition to generate a caption for. Second, it indicates where there might be a break in the content. For the breaks, we round each shot start down to the nearest 0.5 seconds so that it lines up with the speech data from above.

In [None]:

def round_down_to_nearest_half_second(timestamp_millis):
    # Convert milliseconds to seconds and round down to the nearest 0.5 seconds
    rounded_seconds = math.floor(timestamp_millis / 1000.0 * 2) / 2.0
    return float(rounded_seconds)

def populate_segment_indicator_dict(results):
    segment_indicator_dict = {}
    duration=results['VideoMetadata'][0]['DurationMillis']
    duration=round_down_to_nearest_half_second(duration)
    segments=results['Segments'] 
    
    current=0
    while current < duration:
        segment_indicator_dict[current]=0 
        current=current+0.5
    
    for segment in segments:
        start_time=segment['StartTimestampMillis']
        start_time_segment = round_down_to_nearest_half_second(segment['StartTimestampMillis'])
        segment_indicator_dict[start_time_segment]=start_time
    return segment_indicator_dict

rekognition_client = boto3.client('rekognition')

def detect_video_segments(bucket_name, video_file, interval=0.5):

    # Start segment detection
    response = rekognition_client.start_segment_detection(
        Video={
            'S3Object': {
                'Bucket': bucket_name,
                'Name': video_file
            }
        },
        Filters={
             'ShotFilter': {
            'MinSegmentConfidence': 70
        }

        },
        SegmentTypes=['SHOT'] 
    )

    job_id = response['JobId']
    return job_id

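def get_results(job_id):
    # Wait for segment detection to complete
    while True:
        result = rekognition_client.get_segment_detection(JobId=job_id)

        if result['JobStatus'] in ['SUCCEEDED', 'FAILED']:
            break
        time.sleep(5)  # Wait for 5 seconds before checking again

    # A failed job has no segments or video metadata to analyse
    if result['JobStatus'] == 'FAILED':
        raise RuntimeError(f"Segment detection failed: {result.get('StatusMessage')}")

    # Long videos can return segments over several pages, so collect them all
    next_token = result.get('NextToken')
    while next_token:
        page = rekognition_client.get_segment_detection(JobId=job_id, NextToken=next_token)
        result['Segments'].extend(page['Segments'])
        next_token = page.get('NextToken')

    # Analyse segment results
    segment_indicator_dict = populate_segment_indicator_dict(result)
    return segment_indicator_dict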




In [None]:
rek_job_id = detect_video_segments(bucket_name, key)

In [None]:
segment_indicator_dict = get_results(rek_job_id)
rekognition_pd = pd.DataFrame(list(segment_indicator_dict.items()), columns=['Time', 'Shot Transition'])
print(rekognition_pd)

In [None]:
transition_frames_ms = rekognition_pd[rekognition_pd['Shot Transition'] != 0]['Shot Transition'].values

# Display the array
print(transition_frames_ms)
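
The sketch below walks through the rounding on a hand-written fragment in the shape of a segment detection result, so you can see how shot start times land in half-second buckets. The values are made up, and `half_second_bucket` is a hypothetical name that mirrors `round_down_to_nearest_half_second`.

```python
import math

# Hand-written fragment in the shape of a get_segment_detection response (values made up)
mock_result = {
    "VideoMetadata": [{"DurationMillis": 3000}],
    "Segments": [
        {"Type": "SHOT", "StartTimestampMillis": 0, "EndTimestampMillis": 1260},
        {"Type": "SHOT", "StartTimestampMillis": 1280, "EndTimestampMillis": 2960},
    ],
}


def half_second_bucket(timestamp_millis):
    """Round a millisecond timestamp down to the 0.5 s bucket that contains it."""
    return math.floor(timestamp_millis / 1000.0 * 2) / 2.0


duration = half_second_bucket(mock_result["VideoMetadata"][0]["DurationMillis"])
buckets = {i * 0.5: 0 for i in range(int(duration * 2))}
for segment in mock_result["Segments"]:
    start_ms = segment["StartTimestampMillis"]
    buckets[half_second_bucket(start_ms)] = start_ms

print(buckets)
```

Note that a shot starting at exactly 0 ms stores the value 0, which is the same as "no transition". Filtering on `!= 0` therefore leaves out the opening shot, so it gets no screenshot of its own unless you handle it separately.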


Now we have the timestamps (in milliseconds) of each shot transition, and we want to generate a caption for each one.

First, we'll create the screenshots folder if it does not exist and remove any existing contents.

In [None]:
%mkdir -p screenshots
%rm -Rf screenshots/* 

Download the video from S3 for local processing 

In [None]:
download_from_s3(bucket_name, key, key)


In [None]:


def extract_frame(video_path, frame_milliseconds, output_path):
    # Open the video file
    cap = cv2.VideoCapture(video_path)

    # Set the frame position to the desired milliseconds
    cap.set(cv2.CAP_PROP_POS_MSEC, frame_milliseconds)

    # Read the frame at the specified position
    success, frame = cap.read()

    # If the frame is successfully read, save it
    if success:
        cv2.imwrite(output_path, frame)
        print(f"Frame at {frame_milliseconds} milliseconds extracted and saved to {output_path}")
    else:
        print(f"Failed to extract frame at {frame_milliseconds} milliseconds")

    # Release the video capture object
    cap.release()
    
    


In [None]:
for frame_milliseconds in transition_frames_ms:
    # add 10ms to ensure into the new scene 
    frame_milliseconds=frame_milliseconds+10 
    
    output_path = 'screenshots/frame_' + str(frame_milliseconds) + '.jpg'
    extract_frame(key, frame_milliseconds, output_path)


### Use Claude 3 to create a caption for each scene

Using a multimodal Claude 3 model through Amazon Bedrock, we can generate a caption for every screenshot captured.

In [None]:

def encode_image(img_file):
    with open(img_file, "rb") as image_file:
        img_str = base64.b64encode(image_file.read())
        base64_string = img_str.decode("utf-8")
    return base64_string

def run_inference(bedrock_runtime, model_id, messages, max_tokens):
    body = json.dumps(
        {
            "anthropic_version": "bedrock-2023-05-31",
            "max_tokens": max_tokens,
             "messages": messages
        }
    )

    response = bedrock_runtime.invoke_model(
        body=body, modelId=model_id)
    response_body = json.loads(response.get('body').read())

    return response_body["content"][0]["text"] 


def process_files_in_folder(folder_path):
    captions={}
    try:
        # List all files in the specified folder
        files = os.listdir(folder_path)

        # Loop through each file in the folder
        for file_name in files:
            # Full path to the file
            file_path = os.path.join(folder_path, file_name)

            # Check if it's a file (not a subfolder)
            if os.path.isfile(file_path):
                # Encode the screenshot and build the multimodal message
                base64_string = encode_image(file_path)
                message = {"role": "user",
                 "content": [
                    {"type": "image", "source": {"type": "base64",
                    "media_type": "image/jpeg", "data": base64_string}},
                    {"type": "text", "text": "What is happening in this image?"}
                    ]}

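                match = re.search(r'frame_(\d+)', file_path)

                # Skip files that do not follow the frame_<milliseconds>.jpg naming
                if not match:
                    print(f"Skipping {file_path}: no frame timestamp in file name")
                    continue
                extracted_number = float(match.group(1))
                seconds_timestamp=round_down_to_nearest_half_second(extracted_number)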
                bedrock_runtime = boto3.client(service_name='bedrock-runtime')
                model_id = 'anthropic.claude-3-sonnet-20240229-v1:0'
                messages = [message]
                caption=run_inference(bedrock_runtime,model_id,messages, 100)
                captions[seconds_timestamp]=caption

    except FileNotFoundError:
        print(f"Error: Folder not found - {folder_path}")
    return captions 

captions=process_files_in_folder('screenshots')

# Create a pandas DataFrame

captions_df = pd.DataFrame(list(captions.items()), columns=['Time', 'Captions'])
captions_df = captions_df.sort_values(by='Time')

listofcaptions=captions_df['Captions'].tolist()


In [None]:
listofcaptions

We want a caption for every 0.5s, so we need to 'infill' the gaps with the previous caption. 

In [None]:
captions_df['Time'] = pd.to_numeric(captions_df['Time'])
captions_df = captions_df.sort_values(by='Time')

last_row=captions_df.iloc[-1]
last_row=float(last_row['Time']) 

time_values = [i * 0.5 for i in range(0, int(last_row * 2) + 1)]
df = pd.DataFrame({'Time': time_values})
captions_df = pd.merge(df, captions_df, on='Time', how='left')
captions_df['Captions'] = captions_df['Captions'].ffill()


In [None]:
captions_df
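
As a small illustration of the infill step, the sketch below applies the same left merge and forward fill to two made-up captions. The DataFrame names and caption text are invented for the example.

```python
import pandas as pd

# Made-up captions at the half-second marks where a new shot starts
shots = pd.DataFrame({"Time": [0.0, 1.5],
                      "Captions": ["A chef chops onions", "Pasta boiling in a pan"]})

# Full 0.5 s grid from zero up to the last captioned shot
last_time = float(shots["Time"].max())
grid = pd.DataFrame({"Time": [i * 0.5 for i in range(int(last_time * 2) + 1)]})

# Left-join onto the grid, then carry each caption forward until the next shot
filled = pd.merge(grid, shots, on="Time", how="left")
filled["Captions"] = filled["Captions"].ffill()
print(filled)
```

Rows before the first captioned shot would stay empty, because a forward fill has nothing to carry into them.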

### Use Bedrock to analyse the captions 

We can now use an LLM via Bedrock to analyse what is happening in the video based on its captions. 

In [None]:

answer = get_text_response_from_bedrock('Here is a list of captions of what is happening in a video: \n\n' 
                                        + str(listofcaptions) + 
                                        '\n\n Summarise what is happening.'
                                        ,'anthropic.claude-v2:1')
captionllm=answer['text']
captionllm


### Combining captions and transcription 

Next we can combine the captions with the transcript to create a richer view of what is happening. 

In [None]:
answer = get_text_response_from_bedrock('Here is a list of captions of what is happening in a video: \n\n' 
                                        + str(listofcaptions) + 
                                        '\n\n Here is the transcript from the video: \n\n' 
                                        + str(transcript) + 
                                        '\n\n What is happening in the video?'
                                        ,'anthropic.claude-v2:1')
# Use a separate variable so the keyword tags generated earlier are not overwritten
combinedllm=answer['text']
combinedllm 

### Gather volume levels from the video

We can use the librosa library to extract the volume levels for the video every 0.5s. 

In [None]:


def extract_volume_levels(audio_file_path):
    # Load audio file using librosa
    audio, sr = librosa.load(audio_file_path, sr=None)

    # Calculate volume levels every 500 milliseconds
    frame_size = int(sr / 2)  # 500 milliseconds
    num_frames = len(audio) // frame_size

    volume_data = {"Time": [], "Volume": []}

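    # Peak absolute amplitude, used to normalise each window to the range 0-255
    peak = np.max(np.abs(audio))

    for i in range(num_frames):
        frame = audio[i * frame_size: (i + 1) * frame_size]
        time_in_seconds = i * 0.5  # 500 milliseconds is 0.5 seconds

        # Mean absolute amplitude of the window (guarding against a silent file)
        volume_level = np.mean(np.abs(frame))
        normalized_volume = int((volume_level / peak) * 255) if peak > 0 else 0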
        
        volume_data["Time"].append(time_in_seconds)
        volume_data["Volume"].append(normalized_volume)

    # Create a Pandas DataFrame
    df = pd.DataFrame(volume_data)
    return df 


#### Extract Volume

If you get errors referring to formats, please ensure you have installed ffmpeg. 
Depending on your input file format, you may get warnings about using a different module to read the audio file. You may continue to the next cell. 

In [None]:
volume_df=extract_volume_levels(key)
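
To see what the volume figures represent, the sketch below runs the same windowing on a synthetic signal instead of a real file: one loud second followed by one quiet second. The sample rate and tone are assumptions chosen for the example. Note that the measure is a mean absolute amplitude scaled against the peak, not a true RMS.

```python
import numpy as np

sr = 8000  # assumed sample rate for the synthetic signal
t = np.arange(sr * 2) / sr  # two seconds of audio
tone = np.sin(2 * np.pi * 440 * t)
# Loud first second, quiet second second
audio = np.where(t < 1.0, 1.0, 0.1) * tone

frame_size = sr // 2  # samples per 0.5 s window
peak = np.max(np.abs(audio))
levels = []
for i in range(len(audio) // frame_size):
    frame = audio[i * frame_size:(i + 1) * frame_size]
    # Mean absolute amplitude of the window, scaled to 0-255 against the peak
    levels.append(int(np.mean(np.abs(frame)) / peak * 255))

print(levels)
```

The two loud windows score roughly ten times higher than the two quiet ones, which is the contrast the break detection relies on later.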

### Merge the data together 

Now that we have data on the shot transitions, the captions, the volume levels and whether speech is present, we can combine it all into a single source.

In [None]:
merged_df=pd.merge(rekognition_pd, transcribe_pd, on='Time', how='outer')
merged_df=pd.merge(merged_df, volume_df, on='Time', how='outer')
merged_df=pd.merge(merged_df, captions_df, on='Time', how='outer')

merged_df = merged_df.fillna(0)
csvkey='analysis-'+ key + '.csv'
merged_df.to_csv(csvkey, index=False)
merged_df 

### Create a score for each 0.5s for suitability for a break 

There is some further analysis we can do on this data. We can determine whether a 0.5s interval is among the quietest 10% of the whole video, and also whether it is among the quietest 10% within its own 30-second block (60 consecutive intervals).

We also invert some of the data so it is a binary 'yes or no' and then create a rudimentary score for every 0.5s as to its suitability for a break.

In [None]:
def sliding_label_bottom_10_percent(df):
    total_rows = len(df)

    # Create a new column 'sliding_quiet' and initialise it with 0
    df['sliding_quiet'] = 0

    # Work through consecutive blocks of 60 rows (30 seconds) and flag the
    # 6 quietest rows (10%) in each block with a 1
    for i in range(0, total_rows, 60):
        subset = df.iloc[i:i+60]  # Select the next 60 data points
        bottom_10_subset = subset.nsmallest(6, 'Volume')  # Select the lowest 10%
        df.loc[bottom_10_subset.index, 'sliding_quiet'] = 1
    return df 

analysis_df=merged_df 

# Volume is a mean absolute amplitude scaled to 0-255, so it is already non-negative

# Invert Speech Appears so that No Speech is 1 when there is silence
analysis_df['No Speech'] = 1 - analysis_df['Speech Appears'].astype(int)

# Shot Transition is to be a 1 or a 0 
analysis_df['Shot Transition Binary'] = analysis_df['Shot Transition'].apply(lambda x: 1 if x != 0.0 else 0)

# Now find the threshold for the lowest 10% of the volume values
lowest_10_percent = analysis_df['Volume'].quantile(0.1)
analysis_df['quiet'] = analysis_df['Volume'].apply(lambda x: 1 if x <= lowest_10_percent else 0)

#### get the sliding 10%
analysis_df=sliding_label_bottom_10_percent(analysis_df)

analysis_df['Break Score'] = analysis_df['Shot Transition Binary'] + analysis_df['No Speech'] + analysis_df['quiet']+ analysis_df['sliding_quiet']

statscsvkey='stats-analysis-'+ key + '.csv'
analysis_df.to_csv(statscsvkey, index=False)
analysis_df.set_index('Time', inplace=True)

analysis_df.head(200)
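
Here is a worked sketch of the scoring on four made-up rows, to show how the individual signals add up. It leaves out the `sliding_quiet` signal for brevity, so the maximum here is 3 rather than 4, and with so few rows it flags only the single quietest one instead of using a 10% quantile.

```python
import pandas as pd

# Four made-up half-second rows
rows = pd.DataFrame({
    "Time": [0.0, 0.5, 1.0, 1.5],
    "Shot Transition": [0, 0, 1040, 0],   # millisecond timestamp, 0 means none
    "Speech Appears": ["1", "1", "0", "0"],
    "Volume": [120, 90, 4, 30],
})

rows["Shot Transition Binary"] = (rows["Shot Transition"] != 0).astype(int)
rows["No Speech"] = 1 - rows["Speech Appears"].astype(int)
# With so few rows, flag only the single quietest one
rows["quiet"] = (rows["Volume"] <= rows["Volume"].min()).astype(int)

rows["Break Score"] = rows["Shot Transition Binary"] + rows["No Speech"] + rows["quiet"]
best_time = rows.loc[rows["Break Score"].idxmax(), "Time"]
print(rows[["Time", "Break Score"]])
print("Best break candidate at", best_time, "seconds")
```

The row at 1.0s scores highest because a shot transition, silence and low volume coincide there.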

### Optional: Overlay the data onto the video by using SRT captions

Having the data in textual format every 0.5s is useful and can be used for determining where to break the content and also for tagging the video.
However, for experimentation purposes it is useful to be able to visualise the data we have captured.

For this, we will overlay the data using SRT captions onto the video using Amazon Elemental MediaConvert.

You can obtain your MediaConvert API Endpoint URL from the AWS Elemental MediaConvert console under "Account". 

In [None]:
media_convert_endpoint='https://xxxxxxxx.mediaconvert.us-east-1.amazonaws.com'

You will need to create an IAM role for MediaConvert. If you create the role in the IAM console and select "MediaConvert" as the "Use Case", the default permissions will allow access to your S3 bucket.

In [None]:
media_convert_arn='arn:aws:iam::xxxxxxxxx:role/service-role/MediaConvert_Default_Role'

You will then need to amend the role you are using to execute this notebook to allow the "iam:PassRole" action.

In [None]:
def seconds_to_srt_time(seconds):
    hours, remainder = divmod(seconds, 3600)
    minutes, seconds = divmod(remainder, 60)
    milliseconds = int((seconds - int(seconds)) * 1000)
    formatted_time = "{:02}:{:02}:{:02},{:03}".format(int(hours), int(minutes), int(seconds), milliseconds)
    return formatted_time


def csv_to_srt(csv_file_path, srt_file_path='output.srt'):
    # Read the CSV file
    df = pd.read_csv(csv_file_path)

    # Create an SRT file
    with open(srt_file_path, 'w') as srt_file:
        for index, row in df.iterrows():
            start_time = seconds_to_srt_time(row['Time']) 
            end_time = seconds_to_srt_time(row['Time']+0.5) 

            # Format time as HH:MM:SS,mmm
            srt_file.write(f"{index+1}\n")
            srt_file.write(f"{start_time} --> {end_time}\n")
            volume=int(row['Volume'])

            text_to_display='' 

            text_to_display += ' Break Score: ' + str(row['Break Score']) + '/4\n' 
                
            if row['Speech Appears'] == 1:
                text_to_display += ' ** SPEECH ** \n'
            else: 
                text_to_display += ' - \n' 
            if row['Shot Transition'] != 0:
                text_to_display += ' ** SHOT TRANSITION ** ' + str(row['Shot Transition']) +'\n'
            else:
                text_to_display += ' - \n'
            if row['Volume']:
                text_to_display += ' VOLUME: ' + str(volume) + '\n' 
            else:
                text_to_display += ' 0\n'
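            # Missing captions were filled with 0 before the CSV was written
            if str(row['Captions']) not in ('0', '0.0', 'nan'):
                text_to_display += ' C: ' + str(row['Captions']) + '\n' 
            else:
                text_to_display += ' No captions\n'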

            srt_file.write(f"{text_to_display}\n\n")


    print(f"SRT file '{srt_file_path}' has been created.")

csv_to_srt(statscsvkey)
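
For reference, each SRT cue consists of an index line, a time range in `HH:MM:SS,mmm` format, the text, and a blank line. The sketch below builds one cue using integer milliseconds, which avoids truncation from floating point fractions. `srt_timestamp` and `srt_cue` are hypothetical helper names, shown as an alternative formulation rather than the notebook's own functions.

```python
def srt_timestamp(seconds):
    """Format seconds as HH:MM:SS,mmm using integer milliseconds."""
    total_ms = int(round(seconds * 1000))
    hours, remainder = divmod(total_ms, 3_600_000)
    minutes, remainder = divmod(remainder, 60_000)
    secs, millis = divmod(remainder, 1000)
    return f"{hours:02}:{minutes:02}:{secs:02},{millis:03}"


def srt_cue(index, start_seconds, text, duration=0.5):
    """Build one SRT cue: index line, time range, text, then a blank line."""
    start = srt_timestamp(start_seconds)
    end = srt_timestamp(start_seconds + duration)
    return f"{index}\n{start} --> {end}\n{text}\n\n"


print(srt_cue(1, 3661.5, "Break Score: 3/4"), end="")
```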

In [None]:
def upload_to_s3(file_path, bucket_name, object_name):
    s3 = boto3.client('s3')

    try:
        s3.upload_file(file_path, bucket_name, object_name)
        print(f"File uploaded to S3: s3://{bucket_name}/{object_name}")
    except Exception as e:
        print(f"Error uploading file to S3: {e}")

# Example usage:
srt_file_path = 'output.srt'  # Replace with the actual path to your SRT file
s3_bucket_name = bucket_name
s3_object_name = 'output.srt'

upload_to_s3(srt_file_path, s3_bucket_name, s3_object_name)

In [None]:
def create_mediaconvert_job_captions(media_file_uri,bucket_name):


    # Initialize a MediaConvert client
    mediaconvert = boto3.client('mediaconvert', endpoint_url=media_convert_endpoint)
    jobinputs ={
    "OutputGroups": [
      {
        "Name": "File Group",
        "Outputs": [
          {
            "ContainerSettings": {
              "Container": "MP4",
              "Mp4Settings": {}
            },
            "VideoDescription": {
              "CodecSettings": {
                "Codec": "H_264",
                "H264Settings": {
                  "MaxBitrate": 1000000,
                  "RateControlMode": "QVBR",
                  "SceneChangeDetect": "TRANSITION_DETECTION"
                }
              }
            },
            "AudioDescriptions": [
              {
                "AudioSourceName": "Audio Selector 1",
                "CodecSettings": {
                  "Codec": "AAC",
                  "AacSettings": {
                    "Bitrate": 96000,
                    "CodingMode": "CODING_MODE_2_0",
                    "SampleRate": 48000
                  }
                }
              }
            ],
            "CaptionDescriptions": [
              {
                "CaptionSelectorName": "Captions Selector 1",
                "DestinationSettings": {
                  "DestinationType": "BURN_IN",
                  "BurninDestinationSettings": {
                    "FontSize": 14,
                    "FontColor": "RED",
                    "BackgroundColor": "WHITE"

                  }
                }
              }
            ]
          }
        ],
        "OutputGroupSettings": {
          "Type": "FILE_GROUP_SETTINGS",
          "FileGroupSettings": {
            "Destination": "s3://"+bucket_name+"/outputVideos/"
          }
        }
      }
    ],
    "FollowSource": 1,
    "Inputs": [
      {
        "AudioSelectors": {
          "Audio Selector 1": {
            "Tracks": [
              1
            ],
            "DefaultSelection": "DEFAULT",
            "SelectorType": "TRACK"
          }
        },
        "VideoSelector": {},
        "TimecodeSource": "ZEROBASED",
        "CaptionSelectors": {
          "Captions Selector 1": {
            "SourceSettings": {
              "SourceType": "SRT",
              "FileSourceSettings": {
                "SourceFile": "s3://"+bucket_name+"/output.srt"
              }
            }
          }
        },
        "FileInput": media_file_uri
      }
    ]
  }


    # Create a MediaConvert job
    response = mediaconvert.create_job(
        Role=media_convert_arn,
        Settings=jobinputs
    )
    
    print("MediaConvert job created successfully.")
    print("Job ID:", response['Job']['Id'])

In [None]:
create_mediaconvert_job_captions(media_file_uri,bucket_name)

Once this job is complete, you'll be able to see your original video with our analysis overlaid. It will be in your bucket in the 'outputVideos' folder.

## Clean Up


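If you deployed a SageMaker model and endpoint alongside this notebook, the first step is to remove them. The `sm_client`, `model_name` and `endpoint_name` variables are not defined earlier in this notebook, so set them before running the cell below.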

In [None]:
sm_client.delete_model(ModelName=model_name)
sm_client.delete_endpoint(EndpointName=endpoint_name)

Next, you should remove any IAM roles you have created.

The S3 bucket you utilised to store your videos and the analysis files can be removed if desired.

The video file and analysis outputs will also have been downloaded locally for volume analysis and can be removed. 

## Additional Code: Compressing Video

If you wish, you can use the following code as a first step to compress your video file using the AWS Elemental MediaConvert service. This will create a compressed version of your video in the original S3 bucket with the suffix "-compressed". 

In [None]:
def create_mediaconvert_job_compress(media_file_uri,bucket_name):


    # Initialize a MediaConvert client
    mediaconvert = boto3.client('mediaconvert', endpoint_url=media_convert_endpoint)
    jobinputs ={
      "OutputGroups": [
      {
        "Name": "File Group",
        "Outputs": [
          {
            "ContainerSettings": {
              "Container": "MP4",
              "Mp4Settings": {}
            },
            "VideoDescription": {
              "Height": 720,
              "CodecSettings": {
                "Codec": "H_264",
                "H264Settings": {
                  "MaxBitrate": 1000000,
                  "RateControlMode": "QVBR",
                  "SceneChangeDetect": "TRANSITION_DETECTION"
                }
              }
            },
            "AudioDescriptions": [
              {
                "CodecSettings": {
                  "Codec": "AAC",
                  "AacSettings": {
                    "Bitrate": 96000,
                    "CodingMode": "CODING_MODE_2_0",
                    "SampleRate": 48000
                  }
                }
              }
            ],
            "NameModifier": "-compressed"
          }
        ],
        "OutputGroupSettings": {
          "Type": "FILE_GROUP_SETTINGS",
          "FileGroupSettings": {
            "Destination": "s3://" + bucket_name + "/"
          }
        }
      }
    ],
    "FollowSource": 1,
    "Inputs": [
      {
        "AudioSelectors": {
          "Audio Selector 1": {
            "DefaultSelection": "DEFAULT"
          }
        },
        "VideoSelector": {},
        "TimecodeSource": "ZEROBASED",
        "FileInput": media_file_uri
      }
    ]
  }



    # Create a MediaConvert job
    response = mediaconvert.create_job(
        Role=media_convert_arn,
        Settings=jobinputs
    )
    return response['Job']['Id']

def check_mediaconvert_job_status(job_id):
    mediaconvert = boto3.client('mediaconvert', endpoint_url=media_convert_endpoint)

    while True:
        response = mediaconvert.get_job(Id=job_id)
        job_status = response['Job']['Status']

        print(f"Job ID: {job_id}, Status: {job_status}")

        if job_status in ['COMPLETE', 'ERROR']:
            break

        time.sleep(30)  # Sleep for 30 seconds before checking again

    if job_status == 'COMPLETE':
        print("MediaConvert job completed successfully.")
    else:
        print("MediaConvert job failed.")



In [None]:
jobid=create_mediaconvert_job_compress(media_file_uri,bucket_name)
check_mediaconvert_job_status(jobid)