## AWS Rekognition and VideoDB - Effortlessly Remove Inappropriate Content from Video
---

This section of our cookbook demonstrates how to use video analysis to identify sections of inappropriate content and then remove them from a video.

🥡 Key components of this technique include:
- **AWS Rekognition API**: The [StartContentModeration](https://docs.aws.amazon.com/rekognition/latest/APIReference/API_StartContentModeration.html) endpoint of the AWS Rekognition API will be used to scan the video and detect inappropriate content.
- **VideoDB**: This tool stores the video in a database specifically designed for videos. It also helps with extracting clips and removing sections of a video.


We will collect the timestamps where inappropriate content appears in the video, then use VideoDB to filter that content out. The result is **instantly playable, with no video editor and no waiting in a render queue**.


## Setup
---

### Installing Required Packages

To ensure our Python environment has the necessary tools, we need to install the following packages:
- boto3: to use AWS services such as [S3](https://docs.aws.amazon.com/s3/) and the [AWS Rekognition API](https://docs.aws.amazon.com/rekognition/)
- pytube: for downloading YouTube videos
- requests: for making HTTP requests
- videodb: to access VideoDB

In [None]:
!pip install -U boto3 pytube requests videodb

### Helper functions

In [3]:
import requests
import pytube
import os
import datetime
import time
import json

# Downloads a YouTube video and returns the path of the saved file
def download_video_yt(youtube_url, output_file="video.mp4"):
    youtube_object = pytube.YouTube(youtube_url)
    video_stream = youtube_object.streams.get_highest_resolution()
    video_stream.download(filename=output_file)
    print(f"Downloaded video to: {output_file}")
    return output_file

## ⚙️ Configuration
We must set up the AWS and VideoDB API keys.

### 🔗 Setting Up a connection to db

To connect to `VideoDB`, simply create a `Connection` object.

This can be done by either providing your VideoDB API key directly to the constructor or by setting the `VIDEO_DB_API_KEY` environment variable with your API key.

> 💡 Your API key is available in the [VideoDB dashboard](https://console.videodb.io)

In [1]:
from videodb import connect, play_stream
conn = connect(api_key="")
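
Hardcoding the key in a notebook makes it easy to leak. As a minimal sketch, the key could instead be resolved from the `VIDEO_DB_API_KEY` environment variable mentioned above before connecting. The helper name `resolve_api_key` is hypothetical and not part of the VideoDB SDK.

```python
import os


def resolve_api_key(explicit_key=None, env_var="VIDEO_DB_API_KEY"):
    """Return an explicitly passed key, else the value of env_var.

    Hypothetical helper for illustration; not part of the videodb package.
    """
    key = explicit_key or os.environ.get(env_var, "")
    if not key:
        raise ValueError(f"No API key given and {env_var} is not set")
    return key


# Usage (assumes the variable was exported before starting the notebook):
# conn = connect(api_key=resolve_api_key())
```

Failing early with a clear message is a design choice here; it avoids a less obvious authentication error later in the notebook.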

### AWS Configuration

- Provide your AWS credentials and region: `aws_access_key_id`, `aws_secret_access_key` and `region_name`.
- Ensure your AWS user has the necessary policies attached, such as `AmazonRekognitionFullAccess` and `AmazonS3FullAccess`.

In [None]:
import boto3

aws_access_key_id = "YOUR_AWS_KEY_ID"
aws_secret_access_key = "YOUR_AWS_SECRET_KEY"
region_name = "YOUR_AWS_REGION"

bucket_name = "videorekog"

rekognition_client = boto3.client(
    "rekognition",
    aws_access_key_id=aws_access_key_id,
    aws_secret_access_key=aws_secret_access_key,
    region_name=region_name,
)
s3 = boto3.client('s3',
    aws_access_key_id=aws_access_key_id,
    aws_secret_access_key=aws_secret_access_key,
    region_name=region_name,
)
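
The bucket named above is created later in this notebook with `s3.create_bucket`. In regions other than `us-east-1`, S3 expects a `CreateBucketConfiguration` whose `LocationConstraint` matches the region. A minimal sketch of building the arguments per region follows; the helper name `bucket_create_kwargs` is hypothetical.

```python
def bucket_create_kwargs(bucket_name, region_name):
    """Build keyword arguments for s3.create_bucket for the given region.

    Hypothetical helper: us-east-1 takes no LocationConstraint, while
    other regions need one that matches the client's region.
    """
    kwargs = {"Bucket": bucket_name}
    if region_name != "us-east-1":
        kwargs["CreateBucketConfiguration"] = {"LocationConstraint": region_name}
    return kwargs


# Usage with the client configured above:
# s3.create_bucket(**bucket_create_kwargs(bucket_name, region_name))
```

Bucket names are shared across all AWS accounts, so a short name such as `videorekog` may already be taken; choosing a unique name avoids that error.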

### Downloading media

Our task is to download a YouTube video and remove any parts that might not be suitable for all audiences, a process commonly known as content moderation.

In this demonstration, we're going to download a 10-minute clip from the TV show "Breaking Bad" and remove violence, gore, and other inappropriate content from it.

<div style="background-color: #ffffcc; color: black; padding: 10px; border-radius: 5px;">
    <strong>Note:</strong> Please be mindful in selecting your YouTube video, as we are utilizing a premium API service. Opting for a longer video could result in extra charges.
</div>

In [5]:
video_url_yt = "https://www.youtube.com/watch?v=Xa7UaHgOGfM"
video_output = "video.mp4"

download_video_yt(video_url_yt, video_output)


Downloaded video to: video.mp4


'video.mp4'

## Rekognition API Workflow

- Upload the video to an S3 bucket and start content moderation using [StartContentModeration](https://docs.aws.amazon.com/rekognition/latest/APIReference/API_StartContentModeration.html)

In [None]:
# Define function to start a content moderation job for a video stored in S3
def start_content_moderation(video_path, bucket_name):
    response = rekognition_client.start_content_moderation(
        Video={"S3Object": {"Bucket": bucket_name, "Name": video_path}}
    )

    return response["JobId"]


# Define function to poll for and collect content moderation results
def get_content_moderation(job_id):
    wait_for = 5  # seconds to wait between polls while the job is running
    next_token = None
    response = {"ModerationLabels": []}
    while True:
        # Pass NextToken only when fetching a subsequent page of results
        kwargs = {"JobId": job_id}
        if next_token:
            kwargs["NextToken"] = next_token
        moderation_res = rekognition_client.get_content_moderation(**kwargs)
        status = moderation_res["JobStatus"]
        if status == "IN_PROGRESS":
            time.sleep(wait_for)
            continue
        if status != "SUCCEEDED":
            # A failed job never succeeds, so stop instead of looping forever
            raise RuntimeError(
                f"Content moderation job failed: {moderation_res.get('StatusMessage')}"
            )
        response["ModerationLabels"].extend(moderation_res["ModerationLabels"])
        next_token = moderation_res.get("NextToken")
        if not next_token:
            return response

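# Upload the target video to an S3 bucket
# Note: outside us-east-1, create_bucket also needs
# CreateBucketConfiguration={"LocationConstraint": region_name}
s3.create_bucket(Bucket=bucket_name)
s3.upload_file(video_output, bucket_name, video_output)

# Start content moderation using the Rekognition API
job_id = start_content_moderation(video_output, bucket_name)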
print(job_id)
moderation_res = get_content_moderation(job_id)
print(moderation_res)
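
Each entry in `ModerationLabels` carries a `Timestamp` in milliseconds and a `ModerationLabel` with a `Name` and a `Confidence` score. Before merging timestamps, it may be worth dropping low-confidence detections. The sketch below assumes that response shape; the function name `confident_timestamps`, the cutoff of 80, and the sample data are made up for illustration.

```python
def confident_timestamps(labels, min_confidence=80.0):
    """Return sorted, de-duplicated timestamps (whole seconds) of confident labels."""
    seconds = set()
    for entry in labels:
        if entry["ModerationLabel"]["Confidence"] >= min_confidence:
            # Rekognition reports Timestamp in milliseconds
            seconds.add(round(entry["Timestamp"] / 1000))
    return sorted(seconds)


# Made-up sample shaped like the ModerationLabels list
sample = [
    {"Timestamp": 1200, "ModerationLabel": {"Name": "Violence", "Confidence": 91.2}},
    {"Timestamp": 1900, "ModerationLabel": {"Name": "Violence", "Confidence": 55.0}},
    {"Timestamp": 5400, "ModerationLabel": {"Name": "Violence", "Confidence": 88.7}},
]
print(confident_timestamps(sample))
```

Alternatively, `start_content_moderation` accepts a `MinConfidence` argument, which applies the cutoff on the service side.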


### Preparing clip timestamps

The Rekognition API flags moments in a video that are inappropriate, unwanted, or offensive by providing timestamps. Our objective is to consolidate timestamps that belong to the same sequence.

Though the [AWS Segment API](https://docs.aws.amazon.com/rekognition/latest/dg/segment-api.html) offers a method for this, we will employ a more straightforward strategy.

If the gap between two consecutive timestamps is less than a `threshold`, they will be combined into a single continuous scene. To ensure thorough coverage, we'll also introduce a `padding` on both the right and left sides of each scene.

Then we take the complement of the inappropriate clips over the video's timeline, which gives us the appropriate, safe clips.

Feel free to adjust the `threshold` and `padding` settings to optimize the results.

In [20]:
timestamps = []
threshold = 1
padding = 1

for label in moderation_res["ModerationLabels"]:
    # Rekognition reports timestamps in milliseconds; convert to whole seconds
    timestamp = label["Timestamp"] / 1000
    timestamps.append(round(timestamp))

def merge_timestamps(numbers, threshold, padding):
    grouped_numbers = []
    end_last_segment = 0
    current_group = [numbers[0]]

    for i in range(1, len(numbers)):
        # if the timestamp is within the threshold of the previous one, keep it in the same group
        if numbers[i] - numbers[i-1] <= threshold:
            current_group.append(numbers[i])

        # otherwise close the group and add the safe clip that precedes it to the result
        else:
            start_segment = current_group[0] - padding
            end_segment = current_group[-1] + padding
            grouped_numbers.append([end_last_segment, start_segment])
            end_last_segment = end_segment
            current_group = [numbers[i]]

    # Final clip: from the end of the last closed group to the last flagged timestamp
    grouped_numbers.append([end_last_segment, numbers[-1]])
    return grouped_numbers

shots = merge_timestamps(timestamps,threshold=threshold,padding=padding)
print(shots)

[[0, 102], [104, 119], [122, 188], [192, 197], [202, 202], [207, 209], [223, 225], [231, 234], [245, 273], [275, 275], [277, 280], [282, 291], [293, 382], [384, 396], [398, 402], [405, 438], [440, 473], [475, 532], [534, 545], [547, 558]]
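
Note that `merge_timestamps` ends its final clip at the last flagged timestamp, raises an error on an empty list, and does not know how long the video is. The following is a sketch of a variant that takes the video duration as input and returns the full complement. The name `safe_clips` and the `video_length` parameter are assumptions, not part of the original notebook or of VideoDB.

```python
def safe_clips(flagged, video_length, threshold=1, padding=1):
    """Return [start, end] clips (seconds) that avoid the flagged timestamps."""
    clips = []
    cursor = 0  # start of the next safe clip
    group_start = group_end = None

    def close_group():
        nonlocal cursor
        # Pad the flagged group, clamped to the video bounds
        cut_start = max(group_start - padding, 0)
        cut_end = min(group_end + padding, video_length)
        if cut_start > cursor:
            clips.append([cursor, cut_start])
        cursor = max(cursor, cut_end)

    for t in sorted(flagged):
        if group_start is None:
            group_start = group_end = t
        elif t - group_end <= threshold:
            group_end = t  # extend the current flagged group
        else:
            close_group()
            group_start = group_end = t
    if group_start is not None:
        close_group()
    if cursor < video_length:
        clips.append([cursor, video_length])  # tail after the last flagged group
    return clips


print(safe_clips([10, 11, 12, 30], video_length=60))
# prints [[0, 9], [13, 29], [31, 60]]
```

Unlike the function above, this variant skips zero-length clips, so its output for the same input will not match the list printed earlier.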



### Removing inappropriate content from video using VideoDB

The idea behind VideoDB is straightforward: it functions as a database specifically for videos. Similar to how you upload tables or JSON data to a standard database, you can upload your videos to VideoDB. You can also retrieve your videos through queries, much like accessing regular data from a database.

Additionally, VideoDB lets you create clips from your videos ⚡️ fast, just like retrieving text data from a database.

For this demo, we'll be uploading our clip from "Breaking Bad" to `VideoDB`.

Following this, we will compile a master clip composed of smaller segments that contain appropriate content only (i.e., excluding the inappropriate portions of the video).
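
Before generating the stream, a quick sanity check on the timeline can be useful. The sketch below drops zero-length clips such as `[202, 202]` and reports how many seconds are kept. The helper name `summarize_timeline` is hypothetical, and dropping empty clips is a precaution rather than a documented VideoDB requirement.

```python
def summarize_timeline(shots):
    """Drop zero-length or inverted clips and total the kept seconds."""
    cleaned = [[start, end] for start, end in shots if end > start]
    kept_seconds = sum(end - start for start, end in cleaned)
    return cleaned, kept_seconds


# A few clips taken from the output above
cleaned, kept = summarize_timeline([[0, 102], [104, 119], [192, 197], [202, 202]])
print(cleaned, kept)
```

Passing the cleaned list as the `timeline` argument instead of `shots` would be one way to use the result.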

In [22]:
video = conn.upload(url=video_url_yt)
stream_link = video.generate_stream(timeline=shots)
play_stream(stream_link)

https://console.dev.videodb.io/player?url=https://dseetlpshk2tb.cloudfront.net/v3/published/manifests/9ec3d1e9-499f-488a-b8b2-f2880f35d1a6.m3u8
