In [None]:
# Copyright 2024 Google LLC

# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at

#     https://www.apache.org/licenses/LICENSE-2.0

# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

In this Colab, we create YouTube Shorts from a long-form VOD video on YouTube, using YouTube Analytics and AI-based video analysis.

We will use the [YouTube Analytics API](https://developers.google.com/youtube/analytics) to find the most engaging segments of the video. Then, for each segment that is most likely to perform well, we will create a vertically framed version using the [Video Intelligence API](https://cloud.google.com/video-intelligence/docs/), the Google Cloud Platform service for AI on videos.

So just read the text cells, provide your inputs, execute all the code cells and get ready to see some magic Shorts.

**Prerequisite**: To execute this Colab you need access to a Content Manager account (CMS). Content Manager, also known as Studio Content Manager, is a web-based tool for partners who manage content and rights on YouTube. A Content Manager account owns one or more YouTube channels and the assets associated with them. Here is the [full documentation](https://support.google.com/youtube/answer/6301172?hl=en&sjid=16179625971008095581-EU). You can reach your CMS from [the studio page](https://studio.youtube.com/).


# Shorts proposal from Long form video



First, let's complete some preparation steps to set up your project.

## 0. Preparation



### a. Install Relevant Packages

We will use a few packages for this Colab: MoviePy for video editing and the Video Intelligence API for object detection on frames. When executing the installation command lines, you might need to restart the session. If so, you can skip the installation step once the session has restarted.

In [None]:
# useful install
!pip install moviepy
!pip install --upgrade google-cloud-videointelligence

### b. Set up all needed elements for the Colab


 **Step 0: Get Video URL and mp4 file**



* Get the video URL you want to analyse. For example: https://www.youtube.com/watch?v=a1b2c3d4



**Step 1: Create a Google Cloud Platform Project**

* Create a Google Cloud Platform project if you don't already have one. Follow the instructions [here](https://developers.google.com/workspace/guides/create-project) on how to set that up.

**Step 2: Enable APIs**

* Enable the YouTube Analytics API [here](https://console.cloud.google.com/marketplace/product/google/youtubeanalytics.googleapis.com)

* Enable the Video Intelligence API [here](https://console.cloud.google.com/marketplace/product/google/videointelligence.googleapis.com)

**Step 3: Create a Google Cloud Platform bucket**

You need a Google Cloud Platform Storage bucket for your project. Follow the tutorial [here](https://cloud.google.com/storage/docs/creating-buckets) to create Google Cloud Platform buckets.
Once this is done you can create 3 folders within your Google Cloud Platform bucket:
- one called `input`: this one will contain the video(s) you want to process
- one called `temp`: this one will contain all your temporary files.
- one called `output`: this one will contain all the Shorts created from the long-form video.

**Step 4: Create a Service account for authentication to the YouTube Analytics API**

* Create a service account in the IAM section and a service account key for it using the [following tutorial](https://cloud.google.com/iam/docs/keys-create-delete).
* Get the email of the service account. It ends with iam.gserviceaccount.com and can be found on your credentials page by clicking on the service account you created, in the Details > e-mail section.

* Then generate a JSON key file for this service account using the tutorial [here](https://developers.google.com/workspace/guides/create-credentials#create_credentials_for_a_service_account) (Section: Create Credentials for service account)

* Add your service account email as an admin in your CMS using the [following tutorial](https://support.google.com/youtube/answer/4524878).


### c. Set all necessary variables

Please enter below the different parameters:


1. Your Google Cloud Platform Project ID. This can be found on the home page of the [Google Cloud Console](https://console.cloud.google.com/).

2. Your Google Cloud Bucket name: this is the name of the bucket created in step 3 above (which contains your 3 folders: 'input', 'output', 'temp').

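3. The video ID: this is what comes after `watch?v=` in your video URL. Example: for the video https://www.youtube.com/watch?v=XXXXXXXX it will be XXXXXXXX.

4. Your Content Owner ID: the ID of the Content Manager (CMS) account that owns the video. It is used in the `ids` parameter of the YouTube Analytics query.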
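If you only have the full URL at hand, the video ID can also be extracted programmatically. The snippet below is a minimal sketch using the Python standard library; the helper name `extract_video_id` is hypothetical and not part of this Colab, and it assumes a standard `watch?v=` URL.

```python
from urllib.parse import urlparse, parse_qs


def extract_video_id(url):
    """Return the value of the `v` query parameter, or None if absent."""
    query = parse_qs(urlparse(url).query)
    values = query.get("v")
    return values[0] if values else None


# Placeholder URL taken from the example above
print(extract_video_id("https://www.youtube.com/watch?v=XXXXXXXX"))
```

You can then paste the printed value into the `VIDEO_ID` field of the next cell.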




In [None]:

global GOOGLE_CLOUD_PROJECT_ID
global CONTENT_OWNER_ID
global BUCKET_NAME
global VIDEO_ID

# Google Cloud Project ID

GOOGLE_CLOUD_PROJECT_ID = ''  #@param {type:"string"}

# Google Cloud Bucket name
BUCKET_NAME = ''  #@param {type:"string"}


# Content owner ID
CONTENT_OWNER_ID = ''  #@param {type:"string"}

# Video ID
VIDEO_ID = '' #@param {type:"string"}

### d. Import all relevant python packages

In [None]:
# Useful Imports
# Standard packages

import numpy as np
import re
import os
import cv2
import pandas as pd
from tqdm import tqdm
import json
from datetime import date, timedelta
import io
#Google clients

from googleapiclient import discovery
from pydrive.auth import GoogleAuth
from pydrive.drive import GoogleDrive
from google.colab import drive
import google.api_core.exceptions as gcs_exceptions

from googleapiclient.discovery import build
from google.colab import files
from google.oauth2 import service_account

from google.cloud import videointelligence
from google.cloud import storage


# Installed packages
import moviepy.editor as mov



# 1. Get the best performing segments of the video using YouTube Analytics

In this section, we are going to create Shorts versions of long-form videos using the Analytics API `relativeRetentionPerformance` metric.

The [YouTube Analytics API](https://developers.google.com/youtube/analytics) is the API that allows you to get metrics on your videos when you are authenticated. You can get metrics like watch time, views etc.

To explain further, the [relativeRetentionPerformance metric](https://developers.google.com/youtube/analytics/metrics#Audience_Retention_Metrics) assigns a value between 0 and 1 to each segment of a given video (the video is split into 100 equal time segments).

Those values measure how well the video retains viewers compared to videos of similar length. So if the value is higher than 0.5, it's already quite good. But what if all segments of your video have a value higher than 0.5?

That's great, but you do not want to make 10 Shorts out of your video, right? This is why we will keep the segments where relativeRetentionPerformance is the highest within each video, which are considered **the most engaging segments** of your video.

In this section we will:

a. Upload your video to your Google Cloud bucket (created above) as an mp4 file.

b. Authenticate to the YT Analytics API using your service account and get the RelativeRetentionPerformance metric for your video.

c. Get the best performing segments of your video.

d. Create the resulting video clips we will use to create YouTube Shorts videos.

## a. Upload the long-form video to your bucket

In your YouTube Studio, download the mp4 version of your video using the [following tutorial](https://support.google.com/youtube/answer/56100).

Then go back to your Google Cloud Platform project, open the 'input' folder that you created in your Google Cloud Platform bucket, and upload the video you just downloaded. Copy the name of your file and paste it below.

In [None]:
global VIDEO_FILE
VIDEO_FILE = ""  #@param {type:"string"}


## b. Authenticate to the YouTube Analytics API and get the RelativeRetentionPerformance data

The code below will allow you to authenticate to the YouTube Analytics API and query the relativeRetentionPerformance data for your video.

You will need to have completed preparation step 4 of the previous section. This allows you to authenticate within the Colab environment as your service account, which has rights on the CMS (granted in the previous steps) and is allowed to make calls to the API.

The authentication process gives rights to the following scope:
- yt-analytics.readonly


When you run the next cell, it will ask you for the JSON key you created and downloaded locally in the preparation step. Upload that local file to authenticate to the API with this service account.

In [None]:
"""YouTube Analytics API Authentication"""

SCOPES = [
          'https://www.googleapis.com/auth/yt-analytics.readonly'
          ]


API_SERVICE_NAME = 'youtubeAnalytics'
API_VERSION = 'v2'


# Authorize the request and store authorization credentials.
def get_authenticated_service():
  service_account_upload = files.upload()
  service_account_js = json.loads(
      next(iter(service_account_upload.values()))
  )
  credentials = service_account.Credentials.from_service_account_info(
        service_account_js, scopes=SCOPES)
  print('Success! You are now authenticated to the YouTube Analytics API')
  return build(API_SERVICE_NAME, API_VERSION, credentials = credentials)


youtube_analytics = get_authenticated_service()

Now, let's get the relativeRetentionPerformance data for the video. This can be obtained by querying the metric along the `elapsedVideoTimeRatio` dimension over specific dates.

The relativeRetentionPerformance is calculated for a video over a given time frame, based on the performance of the video during that period. We prefer to consider all-time performance here in order to avoid any seasonal effect or time-based bias, so we will take the largest time frame possible for each video.

As the freshness of the data is 48 hours ([documentation](https://developers.google.com/youtube/reporting/v1/reports)), we will choose the startDate as the first date a [video](https://www.youtube.com/watch?v=jNQXAC9IVRw) was ever published to YouTube  and the endDate as 2 days ago to have the latest performance.

The `elapsedVideoTimeRatio` dimension breaks down the results by position in the video, showing how well each slice of your video retains viewers (for example, the first 1% of the video, the second 1%, and so on).
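To make the ratio dimension more concrete, here is a small sketch that converts ratio rows into timestamps in seconds. The helper name `ratio_rows_to_seconds` and the sample values are made up for illustration, and the sketch assumes each ratio marks the end of a slice covering 1% of the video, which is how the segment code later in this Colab interprets it.

```python
def ratio_rows_to_seconds(rows, video_duration_s):
    """Map [ratio, value] rows to [start_s, end_s, value].

    Assumption: each ratio is the END of a slice covering 1% of the video.
    """
    out = []
    for ratio, value in rows:
        start_s = round((ratio - 0.01) * video_duration_s, 2)
        end_s = round(ratio * video_duration_s, 2)
        out.append([start_s, end_s, value])
    return out


# Made-up rows for a hypothetical 10-minute (600 s) video
sample_rows = [[0.01, 0.75], [0.02, 0.62], [1.0, 0.81]]
for row in ratio_rows_to_seconds(sample_rows, video_duration_s=600):
    print(row)
```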


In [None]:
# Calculate the D-2 Date which is the latest the Analytics will provide
two_days_ago = date.today() - timedelta(days=2)

# Format the date as a string in the 'YYYY-mm-dd' format
formatted_date = two_days_ago.strftime('%Y-%m-%d')

def execute_api_request(client_library_function, **kwargs):
  response = client_library_function(
    **kwargs
  ).execute()
  return response



relative_retention_performance = execute_api_request(
      youtube_analytics.reports().query,
      filters=f'video=={VIDEO_ID}',
      dimensions='elapsedVideoTimeRatio',
      ids=f'contentOwner=={CONTENT_OWNER_ID}',
      startDate='2005-04-23',
      endDate= formatted_date,
      metrics='relativeRetentionPerformance'
  )
print('Gathered the relativeRetentionMetrics for this video')

## c. Get the best segments

The relative retention performance shows the performance of each segment.\
The output from the query above will be a list that will look like: \

`[[1st segment of the video, 0.75],[2nd segment of the video, 0.62]....,[100th segment of the video, 0.81]]`

We'll now isolate the top 5% performing segments of your video to create engaging Shorts. Since this data isn't grouped together, we'll use a bit of code to organize the segments.

This output will look like:

`[[start_time_short_1, end_time_short_1],[start_time_short_2, end_time_short_2]...]`
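The grouping step can be illustrated on toy data. The sketch below is not the function used in this Colab: `merge_adjacent` is a hypothetical helper that works on integer slice indices, so that adjacency does not depend on comparing floating-point values. It assumes each ratio marks the end of a 1% slice.

```python
def merge_adjacent(ratios, step=0.01):
    """Merge consecutive ratio slices into [start, end] ranges.

    `ratios` are slice END positions (0.12 means the slice 0.11-0.12).
    """
    indices = sorted({round(r / step) for r in ratios})
    merged = []
    for idx in indices:
        if merged and merged[-1][1] == idx - 1:
            merged[-1][1] = idx  # extend the current run
        else:
            merged.append([idx, idx])  # start a new run
    return [[round((a - 1) * step, 2), round(b * step, 2)] for a, b in merged]


# Made-up top slices: three consecutive ones, then two consecutive ones
print(merge_adjacent([0.12, 0.13, 0.14, 0.40, 0.41]))
```

With these sample values, the five slices collapse into two ranges, one per group of consecutive slices.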

In [None]:
def top_5_pc_short_segments(relative_retention_performance):
  """Gets the top 5% segments for a given video.

  Args:
    relative_retention_performance: result of the Analytics query,
      whose 'rows' entry looks like
      [[xth segment of the video,
        relativeRetentionPerformance on the segment], ...]

  Returns:
    limit: the lowest performance value kept in the top 5%
    aggregated_segments: segments with a start and end time
      [[start_time, end_time], ...]
  """

  perfs = [l[1] for l in relative_retention_performance['rows']]

  # Sort the floats in descending order
  perfs.sort(reverse=True)

  # Index of the last value inside the top 5%
  # (index 4 for 100 rows, so that 5 values are kept when there are no ties)
  top_5_percent_index = max(int(0.05 * len(perfs)) - 1, 0)
  limit = perfs[top_5_percent_index]
  best_segments_short = [l for l in relative_retention_performance['rows']
                          if l[1] >= limit]
  # Create segments with start and end times
  segments = [[round(i[0]-0.01,2),
               round(i[0],2)] for i in best_segments_short]
  aggregated_segments = []

  if segments:
      current_segment = segments[0]
      for segment in segments[1:]:
          # Check adjacency with rounding
          if current_segment[1] == segment[0]:
              # Extend the current segment
              current_segment[1] = segment[1]
          else:
              # Append the completed segment
              aggregated_segments.append(current_segment)
              # Start a new segment
              current_segment = segment

      # Append the final segment
      aggregated_segments.append(current_segment)

  return limit, aggregated_segments

limit, best_segments = top_5_pc_short_segments(
    relative_retention_performance
    )
print(f'Best performing segments: {best_segments}')

## d. Create Video clips from those segments

The section below will create video clips from the segments above. They will all be marked as temporary files in your 'temp' folder of your Google Cloud Platform bucket.

**About the duration of your clips**

From the section above we got the best segments, but what if those add up to just a few seconds?
To prevent that, we propose a minimum Short duration variable that sets the minimum desired duration for your Short. We will add padding time around the segment to make sure it reaches the minimum duration.
The segments will also be trimmed if they are longer than 1 minute.
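The padding and trimming rules can be sketched as a standalone function. This is an illustrative sketch rather than the exact code used below: `fit_clip_window` is a hypothetical name, the 60-second cap comes from the text above, and the final clamp to the video bounds is an added assumption for videos shorter than the minimum duration.

```python
def fit_clip_window(start_s, end_s, video_duration_s, min_s, max_s=60):
    """Return (start, end) padded to at least min_s and trimmed to max_s."""
    duration = end_s - start_s
    if duration < min_s:
        pad = (min_s - duration) / 2
        start_s, end_s = start_s - pad, end_s + pad
        # Shift the window back inside the video if the padding overflowed
        if start_s < 0:
            start_s, end_s = 0, min_s
        elif end_s > video_duration_s:
            start_s, end_s = video_duration_s - min_s, video_duration_s
    elif duration > max_s:
        # Trim equally from start and end
        trim = (duration - max_s) / 2
        start_s, end_s = start_s + trim, end_s - trim
    # Never go outside the video, even if it is shorter than min_s
    return max(0, start_s), min(video_duration_s, end_s)


print(fit_clip_window(2, 8, video_duration_s=600, min_s=15))      # near the start
print(fit_clip_window(100, 106, video_duration_s=600, min_s=15))  # padded both sides
print(fit_clip_window(100, 200, video_duration_s=600, min_s=15))  # trimmed to 60 s
```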

In [None]:
global MIN_SHORT_DURATION_S
MIN_SHORT_DURATION_S = 0  #@param {type:"number"}


First we will start by reading the video from your file location URI on Google Cloud Storage

The cell below will prompt you to authenticate your project. It will ask you to click on a link, which opens in a new tab and prompts you to choose a Google account to authenticate with. After choosing the account that has access to your Google Cloud Platform project, copy the code shown on the page and paste it below where it says "Enter Authorization Code".

The authentication process gives rights to the following scope:

* cloud-platform


Technically, scopes define access permissions within your Google Cloud Platform project. The login step stores credentials in a JSON file, which is then used to authenticate to the API services.

In order to execute the following cells you will need `storage.objects.create` access to Google Cloud Storage. You can grant yourself the Storage Object Creator role on the bucket.

In [None]:
!gcloud config set project $GOOGLE_CLOUD_PROJECT_ID
!gcloud auth application-default login \
    --scopes='https://www.googleapis.com/auth/cloud-platform'

The cell below defines two utility functions: one downloads Cloud Storage files locally and the other uploads local files to a Google Cloud Platform bucket. The full documentation for objects in Cloud Storage can be found [here](https://cloud.google.com/storage/docs/objects).

In [None]:
# Utils

def download_blob(bucket_name, source_blob_name, destination_file_name):
    """Downloads a blob from the bucket.
    A blob is a Cloud Storage object stored in a bucket."""

    storage_client = storage.Client()

    bucket = storage_client.bucket(bucket_name)

    blob = bucket.blob(source_blob_name)
    try:
      blob.download_to_filename(destination_file_name)
      print(
          "Downloaded storage object {} "
          "from bucket {} to local file {}."
          .format(
              source_blob_name,
              bucket_name,
              destination_file_name
          )
      )
    except Exception as e:  # Catch any error
      print(f"An error occurred during download: {e}")



def upload_blob(bucket_name,
                source_file_name,
                destination_blob_name):
    """Uploads a file to the bucket."""

    storage_client = storage.Client()
    bucket = storage_client.bucket(bucket_name)
    blob = bucket.blob(destination_blob_name)

    gen_match_precondition = 0
    try:
      blob.upload_from_filename(
          source_file_name,
          if_generation_match=gen_match_precondition)
      print(f"File {source_file_name} "
            f"uploaded to {destination_blob_name}.")
    # Catch 412 error
    except gcs_exceptions.PreconditionFailed:
      print(f"File {destination_blob_name} already exists in the bucket.")

    except Exception as e:  # Catch any other potential errors
      print(f"An error occurred during upload: {e}")


The code section below creates video clips from the segments calculated in part c. We will use the MoviePy Python package to create the videos.
As MoviePy requires local file access, we must first download the video from the bucket, process it locally to create the clips, and then upload them to the Google Cloud Platform bucket.

In [None]:

def create_video_clips(bucket, video_file, segments, min_short_duration):
    """
    Creates video clips based on the
    provided segments from the input video.

    Args:
        bucket (str): Name of the Google Cloud Platform bucket.
        video_file (str): Name of the input video file
                          in the 'input' folder of the bucket.
        segments (list): A list of segments in the format
                        [[start_ratio, end_ratio], ...], where each
                        value is a fraction (0 to 1) of the video duration.
        min_short_duration (float): Minimum clip duration in seconds.

    """
    # download locally the file from the bucket
    download_blob(bucket, f'input/{video_file}', video_file)
    video = mov.VideoFileClip(video_file)
    video_duration = video.duration
    for i, segment in enumerate(segments):
        start_time = segment[0] * video_duration
        end_time = segment[1] * video_duration
        duration = end_time - start_time
        # Clip is shorter than the minimum Short duration
        if duration < min_short_duration :
          padding_amount = min_short_duration - duration
          if start_time - padding_amount/2 < 0:
            # then we take the first min_duration_s as the short
            start_time = 0
            end_time = min_short_duration
          elif end_time + padding_amount/2 >  video_duration :
            # then we take the last min_duration_s as the short
            start_time = video_duration - min_short_duration
            end_time = video_duration
          else:
            #otherwise we can pad equally at start and end
            start_time = start_time - padding_amount/2
            end_time = end_time + padding_amount/2

        if duration>60:
          excess_duration = duration - 60
          # Trim equally from start and end
          trim_amount = excess_duration / 2

          start_time = start_time + trim_amount
          end_time = end_time - trim_amount
        clip = video.subclip(start_time, end_time)
        base_name =  re.sub(r"\.[^.]*$", "", video_file)
        output_file_local = (
                f'temporary_{base_name}_top_segment_{i+1}.mp4'
        )
        clip.write_videofile(output_file_local)

        # upload the file created to bucket

        output_blob = (
            f"temp/temporary_{base_name}_top_segment_{i+1}.mp4"
        )
        upload_blob(bucket, output_file_local, output_blob)


    video.close()
    print('All done!')

# Creation of video Clips
create_video_clips(bucket = BUCKET_NAME,
                  video_file = VIDEO_FILE,
                  segments = best_segments,
                  min_short_duration = MIN_SHORT_DURATION_S)


The result produced here is a collection of videos in the temp folder of your Google Cloud Platform bucket, representing the best segments of the input VOD video. The next part focuses on creating vertically cropped versions of those videos to turn them into Shorts.

# 2. Image Analysis with Video Intelligence API

The section below will propose a vertically cropped version of a video segment, starting from the segment mp4 file(s) created above.

For that we will use the Video Intelligence API from Google Cloud Platform. You can check the full documentation of the Video Intelligence API [here](https://cloud.google.com/video-intelligence/docs/).



This API is a paid service on Google Cloud Platform. You can find all billing information [here](https://cloud.google.com/video-intelligence/pricing).

If you want a solution that is free of charge, you can instead use [Mediapipe](https://research.google/pubs/mediapipe-a-framework-for-perceiving-and-processing-reality/), a framework released by Google Research for running inference (full paper [here](https://static1.squarespace.com/static/5c3f69e1cc8fedbc039ea739/t/5e130ff310a69061a71cbd7c/1578307584840/NewTitle_May1_MediaPipe_CVPR_CV4ARVR_Workshop_2019.pdf)).

## a. Selection of the segments to crop vertically

Check the segments created in your bucket and enter below the one you want to process. All the created segments end with `_top_segment_{SEGMENT_NUMBER}.mp4`. Choose the segment you want to process from that list and input its `SEGMENT_NUMBER` in the cell below.


For example: the first segment will be a file ending with `_top_segment_1.mp4` in the temp folder of your bucket. If you want to process `_top_segment_1`, enter `SEGMENT_NUMBER = 1`.


If you want to produce the vertically cropped Shorts video for another segment, re-execute the section below after changing the segment number.
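As a quick check of the naming convention, the sketch below builds the blob names for a given segment number. The helper `segment_blob_names` and the file name `my_video.mp4` are hypothetical; the pattern itself mirrors the one used by the clip creation code above and the annotation cell below.

```python
import re


def segment_blob_names(video_file, segment_number):
    """Return (clip_blob, detections_blob) for one top segment."""
    base_name = re.sub(r"\.[^.]*$", "", video_file)  # drop the extension
    stem = f"temp/temporary_{base_name}_top_segment_{segment_number}"
    return f"{stem}.mp4", f"{stem}.json"


# Hypothetical file name, for illustration only
print(segment_blob_names("my_video.mp4", 1))
```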

In [None]:
global SEGMENT_NUMBER
SEGMENT_NUMBER = 1 #@param {type:"number"}



The code section below takes the video segment as input, runs the Video Intelligence API Object Tracking method on it, and lets you visualize the detected objects using the Visualizer.

## b. Detect objects within the video and visualize it


The Video Intelligence API allows developers to use Google video analysis technology as part of their applications. The REST API enables users to annotate videos stored locally or in Cloud Storage, or live-streamed, with contextual information at the level of the entire video, per segment, per shot, and per frame.

We will focus on one service here, which is Object Tracking (documentation [here](https://cloud.google.com/video-intelligence/docs/feature-object-tracking)). Object Tracking returns the objects that appear in a video and, for each object, the frames in which it appears, with their timestamps and bounding boxes.
Below, we also provide a quick tutorial to visualize those bounding boxes on the [Visualizer](https://zackakil.github.io/video-intelligence-api-visualiser/#Object%20Tracking), an open-source tool that displays the results of the Video Intelligence API.


The response object is the result of the Video Intelligence API. It contains all the objects found, along with all the frames of the video (24 frames per second) where they appear and the corresponding bounding boxes. Sounds like a lot of information, right?
This step is only meant to help you make sense of that information by viewing the detected objects on the Visualizer.

In [None]:
video_client = videointelligence.VideoIntelligenceServiceClient()
features = [videointelligence.Feature.OBJECT_TRACKING]
base_name =  re.sub(r"\.[^.]*$", "",VIDEO_FILE)
segment = f"temporary_{base_name}_top_segment_" \
          f"{SEGMENT_NUMBER}.mp4"
segment_blob = 'temp/' + segment
segment_detections_blob = f"temp/" \
                          f"temporary_{base_name}_top_segment_" \
                          f"{SEGMENT_NUMBER}.json"
operation = video_client.annotate_video(
        request={
            "features": features,
            "input_uri": f'gs://{BUCKET_NAME}/{segment_blob}',
            "output_uri": f'gs://{BUCKET_NAME}/{segment_detections_blob}'}
    )
print(f'Your video clip is located at: '
      f'gs://{BUCKET_NAME}/{segment_blob}')
print(f'Your file containing all the detected objects is located at: '
      f'gs://{BUCKET_NAME}/{segment_detections_blob}')


Now you can download the 2 files from the locations printed above to your computer and go to the [visualizer](https://zackakil.github.io/video-intelligence-api-visualiser/#Object%20Tracking).

The first file is your video clip (.mp4) and the second one is a JSON file (.json) containing all objects detected by the Video Intelligence API Object Tracker.

Upload the segment detections JSON file in the 'your .json' section and your video in the 'your .mp4' section.

This visualizer shows all the objects, whatever their confidence level. Play with the confidence threshold to see the most probable objects and select a confidence for which the objects found make sense for you.
Usually 0.8 is a good threshold to make sure we are not considering objects that aren't there, but it might depend on your case.
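To get a feel for how the threshold affects the result, you can count how many detected objects survive at different confidence levels. The sketch below uses made-up annotations stored as plain dictionaries; `count_kept` is a hypothetical helper and the real API response has a different structure.

```python
def count_kept(annotations, thresholds):
    """Count annotations whose confidence is strictly above each threshold."""
    return {
        t: sum(1 for a in annotations if a["confidence"] > t)
        for t in thresholds
    }


# Made-up detections, for illustration only
sample = [
    {"name": "person", "confidence": 0.93},
    {"name": "chair", "confidence": 0.81},
    {"name": "plant", "confidence": 0.55},
    {"name": "bottle", "confidence": 0.32},
]
print(count_kept(sample, [0.5, 0.8, 0.9]))
```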


## d. Create the focus points for each frame using the ObjectTracking results

The following function computes, for each frame, a focus point: the horizontal center of the biggest object detected in that frame.

To work frame by frame, we need to restructure the Video Intelligence API result. Currently, the output is a list of objects, where each object contains the frames (timestamps) and bounding boxes of its appearances. Instead, we want the center of the biggest object for each video frame (timestamp). This requires inverting the structure so that the coordinates are grouped by frame rather than by object.
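The inversion from "per object" to "per frame" can be shown on toy data. The sketch below is a simplified stand-in for the function defined in the next cells: `focus_per_frame` is a hypothetical helper, tracks are plain dictionaries with made-up normalized boxes, and the confidence filter is left out.

```python
def focus_per_frame(tracks, nb_frames, fps):
    """Return, per frame, the x-center of the largest box (None if no box).

    tracks: list of dicts with "frames": [(time_s, left, right, top, bottom)]
    """
    best = [None] * nb_frames  # (area, center_x) of the largest box per frame
    for track in tracks:
        for time_s, left, right, top, bottom in track["frames"]:
            idx = int(time_s * fps)
            if idx >= nb_frames:
                continue  # ignore offsets past the last frame
            area = (right - left) * (bottom - top)
            center_x = (left + right) / 2
            if best[idx] is None or area > best[idx][0]:
                best[idx] = (area, center_x)
    return [b[1] if b else None for b in best]


# Two made-up tracks on a 3-frame video sampled at 2 frames per second
tracks = [
    {"name": "person", "frames": [(0.0, 0.0, 0.5, 0.0, 1.0),
                                  (0.5, 0.25, 0.75, 0.0, 1.0)]},
    {"name": "ball", "frames": [(0.5, 0.5, 0.75, 0.5, 0.75)]},
]
print(focus_per_frame(tracks, nb_frames=3, fps=2))
```

In the second frame both objects are present, and the larger "person" box determines the focus point. The last frame has no detection, so its focus point is `None`.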

In [None]:
global CONFIDENCE_THRESHOLD
CONFIDENCE_THRESHOLD = 0.8 #@param {type:"number"}

In [None]:
def get_frames_from_bbox(object_annotations,
                         width,
                         height,
                         nb_frames,
                         fps,
                         confidence_thr):
  """
  Creates frames and focus points based on the annotations
  from the ObjectTracking results

  Args:
      object_annotations: the object annotations returned
                          by the Video Intelligence API

      width (int): the width of the video

      height (int): the height of the video

      nb_frames (int): the number of frames in the video

      fps (float): the frame rate of the video

      confidence_thr (float): the confidence threshold of objects
                              to keep in the output
  Output:

      all_frames: a list of lists that contains for every frame
                  all the objects that are detected
                  with a confidence higher than the threshold

      focus_points: a list that contains for each frame
                    the focus point being the center
                    of the biggest bounding box detected in the frame
                    (frames without any object reuse the previous one)

  """

  all_frames = [[] for i in range(nb_frames)]
  for object_annotation in object_annotations:
    confidence = object_annotation.confidence
    if confidence > confidence_thr:
      object_name = object_annotation.entity.description
      for frame in object_annotation.frames:
        offset = frame.time_offset
        time_frame_s = offset.seconds + offset.microseconds/1e6
        time_frame_abs = int(time_frame_s*fps)
        right = frame.normalized_bounding_box.right
        left = frame.normalized_bounding_box.left
        top = frame.normalized_bounding_box.top
        bottom = frame.normalized_bounding_box.bottom
        appended_object = {'object_name': object_name,
            'confidence': confidence,
            'bbox_left': left*width,
            'bbox_right': right*width,
            'bbox_top': top*height,
            'bbox_bottom': bottom*height,
            'surface': (right-left)*(bottom - top)*width*height
            }
        # Skip offsets that fall past the last frame index
        if time_frame_abs < nb_frames:
          all_frames[time_frame_abs].append(appended_object)


  focus_points = []
  # go back over the per-frame lists and select the biggest bounding box
  prev_focus_x = None

  for frame_objects in all_frames :
    if frame_objects:
      biggest_object = max(frame_objects,
                           key=lambda obj: obj["surface"])
      focus_x = (biggest_object['bbox_right']
                + biggest_object['bbox_left'])/2
      focus_points.append(focus_x)
      prev_focus_x = focus_x
    else:
      focus_points.append(prev_focus_x)
  return all_frames, focus_points


The cell below will produce the focus point for each frame.


First, we download the segment from the Google Cloud Platform bucket to read it locally.

Then, we execute the function above on this segment so we get the object annotations, the frames and the focus points for this video.

As the focus points represent an object you want to follow, they should evolve smoothly. However, since the bounding boxes can be a bit unstable, the resulting plot below will usually show focus points that evolve with some glitches. If you create a video directly from those, the video itself will glitch.


In [None]:
# Load the video: Download the segment as a blob
# from GCP and read it locally

download_blob(BUCKET_NAME,
              segment_blob,
              segment)

cap = cv2.VideoCapture(segment)

# Get video properties for output video setup
width = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
height = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))
nb_frames = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
fps = cap.get(cv2.CAP_PROP_FPS)
duration_s = nb_frames/fps
# Get the annotations

response = operation.result()
object_annotations = response.annotation_results[0].object_annotations

all_frames, focus_points = get_frames_from_bbox(
                              object_annotations,
                              width,
                              height,
                              nb_frames,
                              fps,
                              CONFIDENCE_THRESHOLD
                              )

# Plot the focus points
pd.Series(focus_points).plot()

The bounding boxes usually look smooth on the Visualizer, but in the plot above the focus points are often far less smooth.

## e. Smooth the Focus points

The evolution above of the focus points is usually not smooth. Therefore, if we take each center of frame and aggregate cropped images from that, we will have a resulting glitching video, in the sense that the centers might be slightly offset. In order to avoid that we will propose 2 smoothing techniques to avoid glitches

Both techniques use rolling windows to compute a smoothed version of x at each frame.

- **The first method** will generate centers that are **piecewise constant**. What this means is that we are going to take on rolling windows the x coordinate that appears the most. This will work best if your focus point is quite fix on the image.  It will be adapted **if you know your camera is static**.
- **The second method** will generated centers that are **moving linearly**. What this means is that for each window we take the median of the values of x. This method is more adapted **if your video has a moving subjects**


The window size can also be set:

- **Small value for window size** (<24): the median or static value is computed over fewer than 24 frames. This produces focus points that stay closer to the actual values. It is best suited when the focus points already evolve fairly linearly, with few glitches.
- **High value for window size** (>48): the median or static value is computed over more than 48 frames. This produces focus points that look a lot more static. It is best suited when you have a lot of glitches (a lot of small bumps in the previous graph).

The default value in the form below usually produces nice results, but if you see that the crop does not follow the subject quickly enough, you can reduce it.
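
To build intuition for the two aggregations before applying them to real data, here is a minimal sketch on made-up focus points (the values and the 3-frame window are illustrative assumptions, not taken from any video). It compares a centered rolling median with a centered rolling maximum, the two pandas aggregations used by the function below, on a series containing a single glitch.

```python
import pandas as pd

# Hypothetical focus points: a subject around x=100 with one glitch at x=400
toy_focus_points = pd.Series([100, 102, 101, 400, 103, 104, 102, 105, 103])

# Centered rolling median over 3 frames
toy_median = toy_focus_points.rolling(window=3, center=True).median()

# Centered rolling maximum over 3 frames
toy_max = toy_focus_points.rolling(window=3, center=True).max()

print(toy_median.tolist())
print(toy_max.tolist())
```

On this toy series the median discards the isolated glitch, while the maximum spreads it over the neighbouring frames, so it is worth comparing both curves on your own focus points before choosing a method.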

In [None]:
def compute_new_focuses(x, method, window_size):
  """Creates the new focus points, smoothed
  according to a method and window size.

  Args:
    x (list): focus points
    method (str): 'static' or 'median'
    window_size (int): size of the rolling window

  Returns:
    focus_points_smoothed (pd.Series): smoothed focus points
  """
  if method not in ('static', 'median'):
    raise ValueError(f"method must be 'static' or 'median', got {method!r}")

  x_reversed = x[::-1]
  if method == 'median':
    focus_points_smoothed = pd.Series(x).rolling(
        window=window_size,
        center=True
    ).median()

    # The centered window leaves NaN values
    # on about the first window_size//2 and last window_size//2 values.
    # For the last values, we use a window that ends
    # on the current frame (it only looks at previous frames)
    focus_points_smoothed_right = pd.Series(x).rolling(
        window=window_size,
        center=False
    ).median()
    # For the first values, we use the same reasoning
    # with a window that starts on the current frame.
    # To do that we need to reverse the series
    focus_points_smoothed_rev_right = pd.Series(x_reversed).rolling(
        window=window_size,
        center=False
    ).median()
    focus_points_smoothed_left = (
        focus_points_smoothed_rev_right
        .iloc[::-1]
        .reset_index(drop=True)
    )
  else:  # method == 'static': same logic with the rolling maximum
    focus_points_smoothed = pd.Series(x).rolling(
        window=window_size,
        center=True
    ).max()

    focus_points_smoothed_right = pd.Series(x).rolling(
        window=window_size,
        center=False
    ).max()
    focus_points_smoothed_rev_right = pd.Series(x_reversed).rolling(
        window=window_size,
        center=False
    ).max()
    focus_points_smoothed_left = (
        focus_points_smoothed_rev_right
        .iloc[::-1]
        .reset_index(drop=True)
    )
  # Fill the NaN values at the start of the series
  focus_points_smoothed[0:window_size//2] = \
    focus_points_smoothed_left[0:window_size//2]

  # Fill the NaN values at the end of the series
  focus_points_smoothed[- window_size//2 + 1 ::] = \
    focus_points_smoothed_right[- window_size//2 + 1 ::]

  return focus_points_smoothed


Choose below the focal method you want to use and the window size. We highly recommend experimenting with different values of the focal method and of window_size to see how the curve of the focus points is smoothed out.
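
When experimenting with window sizes, it helps to know which frames a centered window cannot cover, since those are the ones `compute_new_focuses` fills from one-sided windows. The sketch below uses a made-up 10-frame series and a window of 4 (both illustrative assumptions) to list the positions left as NaN by a centered rolling median and by a trailing one.

```python
import pandas as pd

# Hypothetical series of 10 focus points
toy_points = pd.Series(range(10), dtype=float)
toy_window = 4

# Centered window: incomplete at both ends of the series
centered = toy_points.rolling(window=toy_window, center=True).median()
centered_nan_positions = centered[centered.isna()].index.tolist()

# Trailing window (center=False): incomplete only at the start
trailing = toy_points.rolling(window=toy_window, center=False).median()
trailing_nan_positions = trailing[trailing.isna()].index.tolist()

print("centered:", centered_nan_positions)
print("trailing:", trailing_nan_positions)
```

With an even window, the centered version leaves `window_size // 2` NaN values at the start and one fewer at the end, which is why the two slices patched at the end of the function are not symmetric.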


In [None]:

WINDOW_SIZE = 72 #@param {type:"integer"}
FOCAL_METHOD = "static" # @param ["static", "median"]

The cell below produces the new, smoothed focus points and also plots the original focus points so you can check whether they are smoothed enough.

In [None]:
# Make sure the first focus point is defined
if not focus_points[0]:
  # If it is null, replace it with the first non-null value in the list
  focus_points[0] = next(x for x in focus_points if x)
focus_points_smoothed = compute_new_focuses(focus_points,
                                            method = FOCAL_METHOD,
                                            window_size=WINDOW_SIZE)
pd.Series(focus_points).plot()
focus_points_smoothed.plot()

And the focus points are now smoother!


## f. Create the smoothed Shorts video by aggregating the frames

The section below creates a video by aggregating all the frames cropped around the smoothed focus points in a 9:16 format. \
We use `VideoCapture` from cv2 to read the segment frame by frame and `VideoWriter` to aggregate the cropped frames into an output file that is saved locally.
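
The horizontal crop can be reasoned about independently of the video I/O. The sketch below isolates that arithmetic in a hypothetical helper, `crop_bounds` (introduced here for illustration only, not part of this Colab's code), assuming the crop keeps the full frame height and is clamped to the frame borders, as in the cell that follows.

```python
def crop_bounds(focus_x, frame_width, frame_height):
  """Returns (x_start, x_end) of a 9:16 crop, centered on focus_x when possible.

  Hypothetical helper for illustration only.
  """
  # Width of a 9:16 crop using the full height, rounded down to an even number
  output_width = int(frame_height * 9 / 16) & ~1
  half_width = output_width // 2
  if focus_x - half_width < 0:
    # Too close to the left border: align the crop with the left edge
    return 0, output_width
  if focus_x + half_width > frame_width:
    # Too close to the right border: align the crop with the right edge
    return frame_width - output_width, frame_width
  # Otherwise the crop can be centered on focus_x
  return focus_x - half_width, focus_x + half_width


# Example on a 1920x1080 frame with three made-up focus points
for focus_x in (100, 960, 1900):
  print(focus_x, crop_bounds(focus_x, frame_width=1920, frame_height=1080))
```

In all three cases the crop has the same width, which matters because the video writer expects every frame to have the same size.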

In [None]:

# Define the codec for the VideoWriter object
fourcc = cv2.VideoWriter_fourcc(*'MJPG')

cropped_no_audio = f'temporary_cropped_{base_name}_top_segment_' \
                   f'{SEGMENT_NUMBER}.mp4'

# Read the video
cap = cv2.VideoCapture(segment)

# Reframe the video frame by frame
frame_index = 0
video_output = None
# Use a single progress bar, updated once per frame
with tqdm(total=len(focus_points), desc="Progress") as pbar:
  while True:
    success, raw_image = cap.read()
    if not success:
      break
    # Use the current frame's height
    output_height = raw_image.shape[0]
    # Adjusted width for 9:16 aspect ratio, rounded down to an even number
    output_width = int(output_height * (9/16)) & ~1
    focus_x = round(focus_points_smoothed[frame_index])
    if focus_x - output_width // 2 < 0:
      # Too close to the left border: align the crop on the left
      x_start = 0
      x_end = output_width
    elif focus_x + output_width // 2 > raw_image.shape[1]:
      # Too close to the right border: align the crop on the right
      x_end = raw_image.shape[1]
      x_start = raw_image.shape[1] - output_width
    else:
      # Otherwise we can center on focus_x
      x_start = focus_x - output_width // 2
      x_end = focus_x + output_width // 2
    y_start = 0
    y_end = output_height
    cropped_frame = raw_image[y_start:y_end, x_start:x_end]
    if video_output is None:
      # Create the writer on the first frame, once the output size is known
      video_output = cv2.VideoWriter(cropped_no_audio,
                                     fourcc,
                                     fps,
                                     (output_width, output_height))
    # Write the cropped frame to the output video
    video_output.write(cropped_frame)
    frame_index += 1
    pbar.update(1)

# Release resources
cap.release()
if video_output is not None:
  video_output.release()  # Release the VideoWriter

The created video is an aggregation of frames, so it has no audio. We will use the MoviePy package, which we used above to create the time-cropped segment, to add the audio of that segment to the AI-generated Shorts video.

In [None]:
segment_audio = f'temporary_audio_{base_name}_top_segment_' \
                f'{SEGMENT_NUMBER}.aac'
cropped_with_audio = f'{base_name}_top_segment_{SEGMENT_NUMBER}_' \
                     f'final_proposal.mp4'

my_clip = mov.VideoFileClip(segment)

my_clip.audio.write_audiofile(segment_audio, codec = 'aac')

videoclip = mov.VideoFileClip(cropped_no_audio)
duration = videoclip.duration
audioclip = mov.AudioFileClip(segment_audio)
audio_subclip = audioclip.subclip(0, duration)
new_audioclip = mov.CompositeAudioClip([audio_subclip])
videoclip.audio = new_audioclip
videoclip.write_videofile(cropped_with_audio, audio_codec='aac')

print(f"All done! You have now created locally your short proposal: "
      f"{cropped_with_audio}")

Now that we have a local, vertically cropped short with audio, let's upload it to the output folder in Google Cloud Storage!

In [None]:
upload_blob(BUCKET_NAME, cropped_with_audio,
            f'output/{base_name}_top_segment_{SEGMENT_NUMBER}'
            '_final_proposal.mp4')

Congratulations! You now have a Shorts video in the output folder of your Google Cloud Storage bucket, cropped vertically in a 9:16 format around the focus points and corresponding to one of the top segments in terms of Relative Retention performance. \
You can now re-execute this section for each of the other segments found in your video before wrapping up.

# Wrapping up

**Only go to this step once you have repeated part 2 for all segments**

Now that everything is done, a few files have been created locally in your folder:
 - the segments: the VOD cut at the right times
 - the audio of each segment: ends with .aac
 - the segments cropped vertically without any audio
 - the segments cropped vertically with audio

The first 3 are temporary files that were needed to complete the process. We add an optional step here so you can delete them and avoid keeping temporary files where your Colab is executed.

This part should only be executed once you have processed all the segments by looping through part 2.
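
Before deleting anything, it can be reassuring to preview which files match a prefix. The sketch below is a hypothetical dry-run helper, `list_files_with_prefixes` (an assumption for illustration, not part of this Colab), demonstrated in a throwaway directory with made-up file names.

```python
import os
import tempfile


def list_files_with_prefixes(folder_path, prefixes):
  """Returns the sorted file names in folder_path starting with any prefix.

  Hypothetical helper for illustration only; it does not delete anything.
  """
  return sorted(
      name for name in os.listdir(folder_path)
      if name.startswith(tuple(prefixes))
  )


# Demonstration in a temporary directory with made-up file names
with tempfile.TemporaryDirectory() as demo_dir:
  for name in ('temporary_audio_demo.aac', 'demo_final.mp4', 'notes.txt'):
    open(os.path.join(demo_dir, name), 'w').close()
  matches = list_files_with_prefixes(demo_dir, ['temporary_'])
  print(matches)
```

Running such a preview with the same prefixes as the deletion step shows exactly which files would be removed.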

In [None]:
# Optional clean up step: delete all the temporary files here

def delete_local_files(folder_path='.'):
    """Deletes files starting with 'temporary_' or with base_name
    within a given folder.

    Note that names starting with base_name include the final
    proposals created above.

    Args:
        folder_path (str): The path to the folder containing the files.
    """

    for filename in os.listdir(folder_path):
        if filename.startswith('temporary_') \
          or filename.startswith(base_name):
            try:
                os.remove(os.path.join(folder_path, filename))
                print(f"Deleted: {filename}")
            except OSError as e:
                print(f"Error deleting {filename}: {e}")

# Example usage:
delete_local_files()