# Video Moderation - detecting inappropriate content in stored videos with the image API

In general, we recommend the Amazon Rekognition video-based API [StartContentModeration](https://docs.aws.amazon.com/rekognition/latest/APIReference/API_StartContentModeration.html) for video content moderation. However, you can also sample frames from a video yourself and detect inappropriate content by sending those images to the Amazon Rekognition image-based API [DetectModerationLabels](https://docs.aws.amazon.com/rekognition/latest/APIReference/API_DetectModerationLabels.html). Image results are returned in real time, with labels for inappropriate or offensive content along with a confidence score.

Depending on your requirements for accuracy, cost, performance, and architectural complexity, you can choose whichever approach best suits your use case. Refer to this blog post for a detailed [comparison of the two content moderation approaches](https://aws.amazon.com/blogs/machine-learning/how-to-decide-between-amazon-rekognition-image-and-video-api-for-video-moderation/).

This lab shows how to use [ffmpeg](https://ffmpeg.org/) to sample frames from a video and store them as images, send those images to the image moderation API, and display the moderation results in JSON format.

![video-moderation-with-image-api](../images/video-moderation-with-image-api.png)

- [Step 1: Setup Notebook](#step1)
- [Step 2: Sample image frames](#step2)
- [Step 3: Moderate sampled image frames](#step3)
- [Step 4: Clean up](#step4)

# Step 1: Setup Notebook <a id="step1"></a>
Run the cell below to install or update the Python dependencies if you are running the lab in a local IDE. This step is optional in a SageMaker Studio Jupyter Notebook, whose kernel already includes the dependencies.

In [3]:
# First, let's get the latest installations of our dependencies
%pip install pip -qU 
%pip install boto3 -qU
%pip install IPython -qU

Note: you may need to restart the kernel to use updated packages.


Run the cell below to install [ffmpeg](https://ffmpeg.org/), which is used to decode the video file and sample image frames.

In [4]:
# Install ffmpeg
!conda install ffmpeg -y
!which ffmpeg

Collecting package metadata (current_repodata.json): done
Solving environment: done

# All requested packages already installed.

Retrieving notices: ...working... done
/opt/conda/bin/ffmpeg
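
If you are not using conda, it can help to confirm from Python that an ffmpeg binary is reachable before sampling frames. The following is a minimal sketch; the helper name `ffmpeg_version` is hypothetical and not part of the lab.

```python
import shutil
import subprocess

def ffmpeg_version(binary="ffmpeg"):
    """Return the first line of `<binary> -version`, or None if it is not on PATH."""
    path = shutil.which(binary)
    if path is None:
        return None
    out = subprocess.run([path, "-version"], capture_output=True, text=True)
    return out.stdout.splitlines()[0] if out.stdout else None

# Prints a version banner when ffmpeg is installed, otherwise None
print(ffmpeg_version())
```

A `None` result suggests the install cell above did not succeed or that the kernel's PATH does not include the install location.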


Import the required Python libraries and set up environment variables

In [5]:
import boto3
import sagemaker as sm
import os
import io
from datetime import datetime
from IPython.display import HTML, display
import uuid
import json
import time
import subprocess

# Constants
IMAGE_NAME_EXTENSION = '.png'
LOCAL_DIR = '/tmp'
SAMPLE_FREQUENCY = 2 # sample 2 frames per second of video
API_NAME = 'cm_video_moderation_image_sampling'
HOME_DIR=os.getcwd()
VIDEO_LOCATION = HOME_DIR + "/../datasets/moderation-video.mp4"
MIN_CONFIDENCE = 50 

# print(HOME_DIR)
# print(VIDEO_LOCATION)

# Initializing environment variables
bucket_name = sm.Session().default_bucket()
region = boto3.session.Session().region_name

os.environ["BUCKET"] = bucket_name
os.environ["REGION"] = region
role = sm.get_execution_role()
list_temp_s3_prefix = []

print(f"SageMaker role is: {role}\nDefault SageMaker Bucket: s3://{bucket_name}")

s3=boto3.client('s3', region_name=region)
data_bucket = boto3.resource('s3').Bucket(bucket_name)
rekognition=boto3.client('rekognition', region_name=region)

SageMaker role is: arn:aws:iam::206236004915:role/service-role/SageMaker-cm-role
Default SageMaker Bucket: s3://sagemaker-us-east-1-206236004915


# Step 2: Sample image frames <a id="step2"></a>
Use ffmpeg to sample image frames from the stored video file

In [6]:
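# Build the argument list directly so that paths containing spaces are not split apart
cmd = ["ffmpeg", "-i", VIDEO_LOCATION, "-r", str(SAMPLE_FREQUENCY), f"{LOCAL_DIR}/%07d{IMAGE_NAME_EXTENSION}"]
p1 = subprocess.run(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
# Surface ffmpeg failures instead of silently continuing with no frames
if p1.returncode != 0:
    print(p1.stderr.decode())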
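
With the `%07d` pattern, ffmpeg writes zero-padded, sequentially numbered files (`0000001.png`, `0000002.png`, and so on) into the output directory. Because that directory may also hold unrelated files, one option is to select only names that match the pattern and order them by sequence number. This is a sketch under the assumption that frames are `.png` files with seven-digit names; `list_sampled_frames` is a hypothetical helper, and the demo uses a throwaway temporary directory.

```python
import os
import re
import tempfile

# Matches the names produced by the %07d output pattern with a .png extension
FRAME_PATTERN = re.compile(r"^(\d{7})\.png$")

def list_sampled_frames(directory):
    """Return (sequence_number, file_name) pairs for sampled frames, in order."""
    frames = []
    for name in os.listdir(directory):
        match = FRAME_PATTERN.match(name)
        if match:
            frames.append((int(match.group(1)), name))
    return sorted(frames)

# Demo with empty placeholder files; non-frame files are ignored
with tempfile.TemporaryDirectory() as demo_dir:
    for name in ("0000002.png", "0000001.png", "notes.txt", "logo.png"):
        open(os.path.join(demo_dir, name), "w").close()
    demo_frames = list_sampled_frames(demo_dir)

print(demo_frames)  # [(1, '0000001.png'), (2, '0000002.png')]
```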

# Step 3: Moderate sampled image frames <a id="step3"></a>
Generate the S3 prefix under which the sampled images will be uploaded for moderation

In [7]:
# Parse the video filename and generate S3 prefix for sampled image frames
file_name=VIDEO_LOCATION.split('/')[-1].replace('.','-')
print("Video file name is: " + file_name)
folder_suffix = datetime.now().strftime('%Y%m%d-%H-%M')
# Target folder: using the video file name as a sub folder
s3_target_folder = file_name.lower() + "-" + folder_suffix
print("S3 prefix is: " + s3_target_folder)

Video file name is: moderation-video-mp4
S3 prefix is: moderation-video-mp4-20230622-16-38
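
The prefix logic can be wrapped in a small function so that it is easy to reuse and test. The sketch below mirrors the cell above; `make_s3_prefix` and its `now` parameter are hypothetical additions that make the timestamp injectable.

```python
import os
from datetime import datetime

def make_s3_prefix(video_path, now=None):
    """Build '<file-name-with-dashes>-<YYYYmmdd-HH-MM>' from a video path."""
    now = now or datetime.now()
    file_name = os.path.basename(video_path).replace(".", "-")
    return f"{file_name.lower()}-{now.strftime('%Y%m%d-%H-%M')}"

# The path is illustrative; the fixed datetime reproduces the prefix shown above
prefix = make_s3_prefix("/home/user/datasets/moderation-video.mp4",
                        datetime(2023, 6, 22, 16, 38))
print(prefix)  # moderation-video-mp4-20230622-16-38
```

Because the timestamp has minute resolution, two runs on the same video within the same minute share a prefix and overwrite each other's frames.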


Upload sampled image frames to S3 and call image-based API [DetectModerationLabels](https://docs.aws.amazon.com/rekognition/latest/APIReference/API_DetectModerationLabels.html) to moderate them.
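
Each DetectModerationLabels response carries a `ModerationLabels` list whose entries include `Name`, `ParentName`, and `Confidence`; top-level categories have an empty `ParentName`. The sketch below shows one way to filter such a response without calling AWS. The `sample_response` values are hand-written for illustration, and `summarize_labels` is a hypothetical helper.

```python
def summarize_labels(response, min_confidence=50.0):
    """Keep labels at or above min_confidence, with rounded confidence values."""
    summary = []
    for label in response.get("ModerationLabels", []):
        if label["Confidence"] >= min_confidence:
            summary.append({
                "Name": label["Name"],
                "ParentName": label.get("ParentName", ""),
                "Confidence": round(label["Confidence"], 2),
            })
    return summary

# Hand-written sample in the shape of a DetectModerationLabels response
sample_response = {
    "ModerationLabels": [
        {"Confidence": 99.51, "Name": "Barechested Male", "ParentName": "Suggestive"},
        {"Confidence": 99.51, "Name": "Suggestive", "ParentName": ""},
        {"Confidence": 42.0, "Name": "Violence", "ParentName": ""},
    ]
}

kept = summarize_labels(sample_response)
top_level = [item["Name"] for item in kept if item["ParentName"] == ""]
print(top_level)  # ['Suggestive']
```

Passing `MinConfidence` to the API, as the lab does, applies the same threshold on the service side, so a client-side filter like this is mainly useful for applying a stricter threshold afterwards.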

In [8]:
# Define the function to moderate image samples using image moderation API
def moderate_image(s3_bucket, s3_key):
    # The object name (without extension) is the frame's time position in milliseconds
    ts = s3_key.split('/')[-1].replace(IMAGE_NAME_EXTENSION,'')
    response = rekognition.detect_moderation_labels(
        Image={
            'S3Object': {
                'Bucket': s3_bucket,
                'Name': s3_key,
            }
        },
        MinConfidence=MIN_CONFIDENCE,
    )
    result = {"Timestamp": float(ts), "ModerationLabel": []}
    for label in response["ModerationLabels"]:
        result["ModerationLabel"].append(
            {
                "Confidence": label["Confidence"],
                "Name": label["Name"],
                "ParentName": label["ParentName"]
            }
        )
    return result

# Upload images to S3, perform moderation, and clean up temp files on local disk
labels = []
for file in os.listdir(LOCAL_DIR):
    if file.endswith(IMAGE_NAME_EXTENSION):
        # Convert the file name from a frame sequence number to a time position in milliseconds
        seq = float(file.replace(IMAGE_NAME_EXTENSION,''))
        ms_pos = 1/SAMPLE_FREQUENCY * (seq-1) * 1000
        s3_key = f'{s3_target_folder}/{ms_pos}{IMAGE_NAME_EXTENSION}'
        s3.upload_file(f'{LOCAL_DIR}/{file}', bucket_name, s3_key)

        # Moderate the image
        mr = moderate_image(bucket_name, s3_key)
        if mr is not None and len(mr["ModerationLabel"]) > 0:
            labels.append(mr)

        # Delete only the sampled frame; other files in LOCAL_DIR are left untouched
        os.remove(f'{LOCAL_DIR}/{file}')

list_temp_s3_prefix.append(s3_target_folder)
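
The conversion from frame sequence number to time position used in the loop above can be checked in isolation. This sketch assumes, as the notebook's formula does, that frame `N` corresponds to `(N - 1) / SAMPLE_FREQUENCY` seconds into the video; both function names are hypothetical.

```python
def frame_seq_to_ms(seq, frames_per_second):
    """Time position in milliseconds of the 1-based frame number `seq`."""
    return (seq - 1) * 1000.0 / frames_per_second

def ms_to_timecode(ms):
    """Format a millisecond position as MM:SS.mmm."""
    total_seconds, millis = divmod(int(ms), 1000)
    minutes, seconds = divmod(total_seconds, 60)
    return f"{minutes:02d}:{seconds:02d}.{millis:03d}"

print(frame_seq_to_ms(1, 2))    # 0.0
print(frame_seq_to_ms(28, 2))   # 13500.0
print(ms_to_timecode(13500.0))  # 00:13.500
```

At 2 frames per second, consecutive frames are 500 ms apart, so frame 28 maps to 13500.0 ms.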

Display the moderation results

In [9]:
# sort labels
labels.sort(key=lambda x: x["Timestamp"], reverse=False)
    
result = {
    "API": API_NAME,
    "Video": {
        "S3Bucket": bucket_name,
        "S3ObjectName": file_name
    },
    "ModerationLabels": labels
}

# Display results
print(result)
    

{'API': 'cm_video_moderation_image_sampling', 'Video': {'S3Bucket': 'sagemaker-us-east-1-206236004915', 'S3ObjectName': 'moderation-video-mp4'}, 'ModerationLabels': [{'Timestamp': 13500.0, 'ModerationLabel': [{'Confidence': 99.51351165771484, 'Name': 'Barechested Male', 'ParentName': 'Suggestive'}, {'Confidence': 99.51351165771484, 'Name': 'Suggestive', 'ParentName': ''}]}, {'Timestamp': 14000.0, 'ModerationLabel': [{'Confidence': 99.24789428710938, 'Name': 'Barechested Male', 'ParentName': 'Suggestive'}, {'Confidence': 99.24789428710938, 'Name': 'Suggestive', 'ParentName': ''}]}, {'Timestamp': 14500.0, 'ModerationLabel': [{'Confidence': 99.3197250366211, 'Name': 'Barechested Male', 'ParentName': 'Suggestive'}, {'Confidence': 99.3197250366211, 'Name': 'Suggestive', 'ParentName': ''}]}, {'Timestamp': 15000.0, 'ModerationLabel': [{'Confidence': 98.54800415039062, 'Name': 'Barechested Male', 'ParentName': 'Suggestive'}, {'Confidence': 98.54800415039062, 'Name': 'Suggestive', 'ParentName':
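
A result dictionary of this shape can be serialized with `json.dumps` for readability, and summarized per label. The sketch below uses a small hand-written `sample_result` (rounded, illustrative values, not the full output of the lab), and `label_time_ranges` is a hypothetical helper.

```python
import json

# Hand-written sample in the same shape as the result above
sample_result = {
    "API": "cm_video_moderation_image_sampling",
    "ModerationLabels": [
        {"Timestamp": 13500.0, "ModerationLabel": [
            {"Confidence": 99.51, "Name": "Suggestive", "ParentName": ""}]},
        {"Timestamp": 14000.0, "ModerationLabel": [
            {"Confidence": 99.25, "Name": "Suggestive", "ParentName": ""}]},
    ],
}

def label_time_ranges(result):
    """Map each label name to the (first, last) timestamp in ms where it appears."""
    ranges = {}
    for frame in result["ModerationLabels"]:
        ts = frame["Timestamp"]
        for label in frame["ModerationLabel"]:
            first, last = ranges.get(label["Name"], (ts, ts))
            ranges[label["Name"]] = (min(first, ts), max(last, ts))
    return ranges

print(json.dumps(sample_result, indent=2))
ranges = label_time_ranges(sample_result)
print(ranges)  # {'Suggestive': (13500.0, 14000.0)}
```

The first and last timestamps only bound where a label was seen; they do not imply that the label was present in every sampled frame in between.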

# Step 4: Clean up <a id="step4"></a>
Delete the sampled images from the S3 bucket

In [10]:
prefix_to_be_deleted = list(set(list_temp_s3_prefix))
for pf in prefix_to_be_deleted:
    for obj in data_bucket.objects.filter(Prefix=pf):
        s3.delete_object(Bucket=bucket_name, Key=obj.key)
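
For longer videos, deleting one object per request can be slow. The S3 `delete_objects` call accepts up to 1,000 keys per request, so one alternative is to group keys into batches first. The following sketch only builds the request payloads and makes no AWS calls; `build_delete_batches` is a hypothetical helper and the keys are illustrative.

```python
def build_delete_batches(keys, batch_size=1000):
    """Group object keys into payloads for the `Delete` argument of delete_objects."""
    keys = list(keys)
    return [
        {"Objects": [{"Key": key} for key in keys[i:i + batch_size]], "Quiet": True}
        for i in range(0, len(keys), batch_size)
    ]

# Illustrative keys; in the lab these would come from data_bucket.objects.filter(...)
batches = build_delete_batches([f"prefix/{n}.png" for n in range(2500)])
print([len(batch["Objects"]) for batch in batches])  # [1000, 1000, 500]
```

Each payload could then be passed as `s3.delete_objects(Bucket=bucket_name, Delete=batch)`.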