In [None]:
# Copyright 2024 Google LLC

# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at

#     https://www.apache.org/licenses/LICENSE-2.0

# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# Introduction

Let's have some fun with AI and YouTube!  
Would you like to create a video recap of all the great things that
happened on your different channels? Let's use AI to create a nice "recap of
the month / year / decade" for your channels.

## Objective
**The goal of this Colab** is to create an AI-generated video recap of the best
moments that happened on your YouTube channels.  
We will do that by:


1.   Using the YouTube Data API to get the best performing video for each of
     your channels (videos with the most views).
2.   Using the YouTube Analytics API to get the best performing segment of
     each of the videos collected in step 1 (~"most rewatched" segment).
3.   Using the Vertex AI API to find clusters for those videos and create
     sections for our final recap video (things like "You made us laugh",
     "You made us learn new things", etc).
4.   Using the YouTube Player API to put together all those segments, organize
     them in buckets and play our nice recap video.



## Things to know before we start
**Audience for this Colab:**
This Colab is meant to be used by YouTube users with multiple channels - we are
indeed making a recap that will sample only one video per channel (the one with
the most views during the selected period). If you only possess one channel,
the recap will be very short :).

**Disclaimer:**
This Colab will not generate any new content for the recap video; it only
strings together content that has already been uploaded to your
channel(s).

<br/>

***

<br/>

Are you ready? Let's go!

# Step 1: Google Cloud Setup

To use Vertex AI, the YouTube Data API and the YouTube Analytics API, we need to create a Google Cloud Project and set it up correctly.

1. First, if it's your first time using Google Cloud, [create a new free account](https://cloud.google.com/free) and claim your free credits, which should let you run this Colab free of charge for many hours or days (more info on free credits [here](https://cloud.google.com/pricing/))!
2. Create a [Google Cloud Project](https://developers.google.com/workspace/guides/create-project).
3. When your project is created, you can then go to your [Google Cloud Project Dashboard](https://console.cloud.google.com/home), make sure the right project is selected (otherwise, use the top left dropdown menu to select your newly created project), then copy the ID of your project (Project ID), paste the value below and run the script (by clicking on the play icon that shows when you hover over the script):

In [None]:
global PROJECT_ID
PROJECT_ID = ""  # @param {type:"string"}

print("Project_ID saved!")

4. If this is your first time using Google Cloud, then you may not need to activate billing if you have free credits available to use. Otherwise, make sure billing is activated [here](https://console.cloud.google.com/billing).

5. Enable the following APIs for your project:
- [Vertex AI API](https://console.cloud.google.com/marketplace/product/google/aiplatform.googleapis.com)
- [YouTube Data API](https://console.cloud.google.com/marketplace/product/google/youtube.googleapis.com)
- [YouTube Analytics API](https://console.cloud.google.com/marketplace/product/google/youtubeanalytics.googleapis.com)

6. Follow [this guide](https://developers.google.com/workspace/guides/configure-oauth-consent) to set up your OAuth Consent Screen. Use the following parameters:
  - Select "external" for the user type
  - Add yourself (i.e. your email address) as a test user
  - Select the following scopes: `https://www.googleapis.com/auth/youtube`, `https://www.googleapis.com/auth/yt-analytics.readonly`

7. Follow [this guide](https://developers.google.com/workspace/guides/create-credentials#oauth-client-id) to create OAuth 2.0 Client ID credentials. Use the following parameters:
  - Choose "Web application" for the application type
  - Under "Authorized JavaScript origins" and "Authorized redirect URIs", add the following URI: http://localhost
  - Don't forget to download your OAuth Client ID JSON key!
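
Before uploading the key in Step 2, it can help to sanity-check the downloaded file. The sketch below assumes the usual layout of a "Web application" client secret file (a top-level `web` object containing a `redirect_uris` list). The helper name `check_client_secret` and the sample values are hypothetical and only serve as an illustration.

```python
import json

def check_client_secret(config, expected_redirect="http://localhost"):
  """Return a list of problems found in a parsed OAuth client secret file."""
  problems = []
  web = config.get("web")
  if web is None:
    # Assumption: a "Web application" client stores its settings under "web".
    problems.append('missing "web" section (was "Web application" selected?)')
    return problems
  if not web.get("client_id"):
    problems.append("missing client_id")
  if expected_redirect not in web.get("redirect_uris", []):
    problems.append(f"{expected_redirect} is not an authorized redirect URI")
  return problems

# Hypothetical file content, for illustration only.
sample = json.loads("""
{"web": {"client_id": "1234.apps.googleusercontent.com",
         "client_secret": "not-a-real-secret",
         "redirect_uris": ["http://localhost"]}}
""")
print(check_client_secret(sample))
```

An empty list means the checks passed; any other result names what to fix in the Cloud console before continuing.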

# Step 2: Authenticating to Google Cloud and YouTube using OAuth

## Let's connect this Colab to your Google Cloud Project

1. Connect your Google Cloud Project to this Colab by running the following code snippet (please follow the given instructions and don't forget to hit the "enter" key after copy / pasting your authorization code):

In [None]:
!gcloud config set project $PROJECT_ID
!gcloud auth application-default login --scopes="https://www.googleapis.com/auth/cloud-platform"

## Now, let's Authenticate to YouTube using OAuth

1. First, run the following piece of code to upload your OAuth Client ID JSON key (downloaded in the previous steps):

In [None]:
from google.colab import files

client_secret_file = next(iter(files.upload().keys()))

  2. Authenticate to YouTube via OAuth: run the following code snippet. It will ask you to click on a link. Click it and follow the OAuth flow. At the end of this flow, you will be redirected to an empty page. Copy the URL of that page and paste it as requested by the script.

    *Note: the final "empty page" URL is actually supposed to be a redirect URL that redirects to your own service / server. Since there is not a simple way to start servers in Colab, it just redirects to a 404 page, which is why we need to copy and paste that URL to retrieve the OAuth code in the query parameters :)*
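
To make the copy and paste step less mysterious, here is a minimal sketch of how the authorization code can be read from the query parameters of the redirect URL. The URL and the code value are made up; a real redirect URL will carry different values.

```python
from urllib.parse import parse_qs, urlparse

# Made-up redirect URL, shaped like the one the browser ends up on.
example_redirect_url = (
  "http://localhost/?state=abc123&code=4/0Ab-FAKE_code"
  "&scope=https://www.googleapis.com/auth/yt-analytics.readonly")

# parse_qs maps each query parameter name to a list of values.
query = parse_qs(urlparse(example_redirect_url).query)
example_code = query["code"][0]
print(example_code)
```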

In [None]:
from google_auth_oauthlib.flow import Flow
from getpass import getpass
from urllib.parse import parse_qs, urlparse

flow = Flow.from_client_secrets_file(
  client_secret_file,
  scopes=[
    "https://www.googleapis.com/auth/youtube.force-ssl",
    "https://www.googleapis.com/auth/yt-analytics.readonly"
    ],
  redirect_uri="http://localhost")
auth_uri = flow.authorization_url()[0]

oauth_redirect_url = getpass(
  f"Click on this link: {auth_uri} \nThen paste the URL you are redirected to"
  " at the end of the OAuth flow here: ")
oauth_code = parse_qs(urlparse(oauth_redirect_url).query)["code"][0]

flow.fetch_token(code=oauth_code)
credentials = flow.credentials

# Step 3: Making an AI video recap of the best moments that happened on your channels!

Now, let's get to the fun part!

1. First, we need the list of channel IDs you'd like to make a recap video from. You can find the ID of any channel on YouTube by going to its channel page, clicking "more" next to the channel description, scrolling down and clicking "share channel" > "copy channel id". `CHANNEL_IDS` below should be a comma-separated list of channel IDs, e.g. "UC123ABC, UC456DEF".

    **Important note**: the Google account you used to log in via OAuth in the previous steps needs to be the primary owner of the channels you list here. Alternatively, you can use the `CONTENT_OWNER_IDS` field below if the channels listed in `CHANNEL_IDS` are part of a CMS you have access to (only for certain YouTube partners; leave blank if this doesn't apply to you). If not blank, `CONTENT_OWNER_IDS` should be a comma-separated list of content owner (CO) IDs of the same length as `CHANNEL_IDS` (it specifies which CO to use for each channel), e.g.:
    - CHANNEL_IDS = "UC123ABC, UC456DEF, UC789GHI"
    - CONTENT_OWNER_IDS = "ZYX987, WVU654, RST321"
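
The pairing rule described above (one content owner ID per channel, or none at all) can be sketched as follows. The function name `pair_channels_with_owners` is hypothetical, and the explicit length check is an addition for illustration, not something the notebook itself enforces.

```python
def pair_channels_with_owners(channel_ids, content_owner_ids=""):
  """Pair each channel ID with its content owner ID (or None)."""
  channels = [c.strip() for c in channel_ids.split(",") if c.strip()]
  owners = [o.strip() for o in content_owner_ids.split(",") if o.strip()]
  if not owners:
    # No content owner given: every channel is queried as its own owner.
    owners = [None] * len(channels)
  if len(owners) != len(channels):
    raise ValueError(
      f"Got {len(channels)} channel IDs but {len(owners)} content owner IDs")
  return [{"channelId": c, "coId": o} for c, o in zip(channels, owners)]

print(pair_channels_with_owners("UC123ABC, UC456DEF", "ZYX987, WVU654"))
```

Failing early on a length mismatch avoids an index error later, when the two lists are combined.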


In [None]:
global CHANNEL_IDS, CONTENT_OWNER_IDS

CHANNEL_IDS = "" # @param {type:"string"}
CHANNEL_IDS = [id.strip() for id in CHANNEL_IDS.split(",")]

CONTENT_OWNER_IDS = "" # @param {type:"string"}
# Parentheses let the conditional expression span several lines
CONTENT_OWNER_IDS = (
  [co_id.strip() for co_id in CONTENT_OWNER_IDS.split(",")]
  if CONTENT_OWNER_IDS != ""
  else [None] * len(CHANNEL_IDS))

CHANNELS = [
  {"channelId": cid, "coId": CONTENT_OWNER_IDS[i]}
  for i, cid in enumerate(CHANNEL_IDS)]

print("CHANNEL_IDS & CONTENT_OWNER_IDS saved!")

2. Then, let's select a start date and an end date to only look at videos published in a given time frame for each channel (the video search interprets both dates as UTC).
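
As a rough illustration of how two `YYYY-MM-DD` dates become the UTC timestamps used to filter videos by publish date, here is a minimal sketch. The helper name `to_search_window` and the start/end ordering check are assumptions added for clarity.

```python
from datetime import datetime

def to_search_window(start_date, end_date):
  """Turn two YYYY-MM-DD strings into (publishedAfter, publishedBefore)."""
  start = datetime.strptime(start_date, "%Y-%m-%d")
  end = datetime.strptime(end_date, "%Y-%m-%d")
  if start > end:
    raise ValueError("START_DATE must not be after END_DATE")
  # Cover the whole end day, and mark both timestamps as UTC with "Z".
  end = end.replace(hour=23, minute=59, second=59)
  return start.isoformat() + "Z", end.isoformat() + "Z"

print(to_search_window("2024-01-01", "2024-01-31"))
```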

In [None]:
global START_DATE, END_DATE
START_DATE = "" # @param {type:"date"}
END_DATE = "" # @param {type:"date"}

print("START_DATE, and END_DATE saved!")

3. We will now use the Vertex AI API, the YouTube Data API and the YouTube Analytics API to get the most viewed video from each channel that matches our date criteria. Let's first define the constants and utility functions we will need for our main script.

In [None]:
import re
import google.auth
import os
import time

from datetime import date, datetime, timedelta
from googleapiclient import discovery
from google import genai

video_clustering_prompt = ("You are an assistant tasked with sorting and "
  "organizing videos in different buckets. \n"
  "Your task is to find $BUCKET_NUMBER buckets for the list of videos, assign "
  "each video to one bucket, and name each bucket. It's a clustering "
  "exercise: you'll be given the [[VIDEO_ID]], [[TITLE]] and [[DESCRIPTION]]"
  " of each video as a list, and you need to identify $BUCKET_NUMBER relevant "
  "clusters, name them and assign each video to one cluster. The name of each "
  "bucket / cluster should start with 'You made us'. "
  "Example: 'You made us laugh', "
  "'You made us master new skills', "
  "'You made us witness historic events'. "
  "Your final answer should be structured this way: \n"
  "'[Bucket]You made us ... \n"
  "Video_id: 1A \n"
  "Video_id: 2B \n"
  "Video_id: 3C \n"
  "[Bucket]You made us ... \n"
  "Video_id: Z9 \n"
  "Video_id: Y8' \n"
  "etc. \n"
  "IMPORTANT NOTES: It's extremely important that each video is included "
  "in one bucket and one bucket only. Also, try to make the buckets as even as "
  "possible in terms of number of videos per bucket.\n"
  "Here is the list of videos: \n"
  "$VIDEO_LIST")

# Build YT Data and Analytics API service object
ytdata_service = discovery.build("youtube", "v3", credentials=credentials)
ytanalytics_service = discovery.build(
    "youtubeAnalytics", "v2", credentials=credentials)

# Init VertexAI
cloud_credentials, cloud_project = google.auth.default()
cloud_location = os.environ.get("GOOGLE_CLOUD_REGION", "global")
client = genai.Client(
  vertexai=True, project=cloud_project, location=cloud_location)

def find_video_from_video_list(video_id, videos):
  """Utils function: Find a video in a video list by id"""
  for v in videos:
    if v["videoId"] == video_id:
      return v

  return None

def execute_youtube_api_call(api_method, **kwargs):
  """Utils function: Execute YT API endpoint and catch errors"""
  result = None
  retry = kwargs.pop("retry", False)
  print(
    f"Calling the YouTube API endpoint: {api_method.__name__}, "
    "with the following parameters: ",
    kwargs)
  try:
    result = api_method(**kwargs).execute()
    print(
      f"Called YouTube API endpoint {api_method.__name__} with success !"
    )
  except Exception as e:
    # `retry` is True when this call is already the second attempt
    if retry:
      print(
        "An error occurred while calling the YouTube API endpoint: "
        f"{api_method.__name__}. Error:",
        e)
    else:
      time.sleep(3)
      result = execute_youtube_api_call(
        api_method, **dict(kwargs, **{"retry": True}))

  return result

def get_best_video_for_channel(channel):
  """Get the most viewed video for a channel matching global publish dates"""
  print(f"Fetching best video for channel: {channel}...")

  channel_id = channel["channelId"]
  co_id = channel["coId"]

  video_results = execute_youtube_api_call(
    ytdata_service.search().list,
    part="snippet",
    channelId=channel_id,
    maxResults=50,
    order="viewCount",
    publishedBefore=datetime.strptime(
      END_DATE + " 23:59:59",
      "%Y-%m-%d %H:%M:%S").isoformat() + "Z",
    publishedAfter=datetime.strptime(
      START_DATE + " 00:00:00",
      "%Y-%m-%d %H:%M:%S").isoformat() + "Z",
    type="video",
    onBehalfOfContentOwner=co_id
  )

  if not video_results or not video_results["items"]:
    print(f"WARNING: Could not find any videos for channel {channel_id}")
    return None

  print(f"Successfully fetched video for channel {channel_id} ! ")
  vod_videos = [
    v for v in video_results["items"]
      if v["snippet"]["liveBroadcastContent"] == "none"]
  # Prefer regular (non-live) videos; fall back to any result if there are none
  video = (vod_videos or video_results["items"])[0]

  return {
      "videoId": video["id"]["videoId"],
      "channelId": channel_id,
      "coId": co_id}

def get_video_details(video):
  """Get details of a video via the YouTube Data API"""
  print(f"Getting video details for video: {video}...")

  video_id = video["videoId"]
  co_id = video["coId"]

  video_details = execute_youtube_api_call(
    ytdata_service.videos().list,
    part="snippet,contentDetails",
    id=video_id,
    onBehalfOfContentOwner=None
  )

  if not video_details or not video_details["items"]:
    print(f"WARNING: Could not find details for video {video_id}")
    return None

  print(f"Successfully fetched video details for video {video_id} ! ")
  video = video_details["items"][0]
  duration = video["contentDetails"]["duration"]
  hours = int((re.findall("([0-9]*)H", duration) or ["0"])[0])
  minutes = int((re.findall("([0-9]*)M", duration) or ["0"])[0])
  seconds = int((re.findall("([0-9]*)S", duration) or ["0"])[0])

  return {
    "videoId": video_id,
    "title": video["snippet"]["title"],
    "description": video["snippet"]["description"],
    "duration": hours * 3600 + minutes * 60 + seconds,
    "thumbnailUrl": video["snippet"]["thumbnails"]["default"]["url"],
    "channelId": video["snippet"]["channelId"],
    "channelTitle": video["snippet"]["channelTitle"],
    "coId": co_id
  }

def find_best_segment_in_video(video):
  """Find the most interesting segment of a video using the YT Analytics API"""
  print(f"Finding best segment for video: {video['videoId']}...")

  # Calculate the D-2 Date which is the latest the Analytics will provide
  two_days_ago = date.today() - timedelta(days=2)
  formatted_date = two_days_ago.strftime("%Y-%m-%d")

  video_id = video["videoId"]
  channel_id = video["channelId"]
  co_id = video["coId"]
  video_duration = video["duration"]

  rel_ret_perf = execute_youtube_api_call(
    ytanalytics_service.reports().query,
    filters=f"video=={video_id}",
    dimensions="elapsedVideoTimeRatio",
    ids=f"contentOwner=={co_id}"
      if co_id else f"channel=={channel_id}",
    startDate=START_DATE,
    endDate=formatted_date,
    metrics="relativeRetentionPerformance"
  )

  # Use .get() so a response without a "rows" key is handled like an empty one
  if not rel_ret_perf or not rel_ret_perf.get("rows"):
    print(f"WARNING: Could not find best segment for video: {video_id}")
    return None

  best_perf = sorted(rel_ret_perf["rows"], reverse=True, key=lambda s: s[1])[0]
  best_segment = [
    (best_perf[0] - 0.01) * video_duration,
    best_perf[0] * video_duration]

  # Make sure the segment is max 10 seconds long
  best_segment[1] = min(best_segment[1], best_segment[0] + 10)

  # Make sure the segment is at least 5 seconds long (if possible)
  if best_segment[1] - best_segment[0] < 5:
    best_segment[1] = min(5 + best_segment[0], video_duration)

    if best_segment[1] - best_segment[0] < 5:
      best_segment[0] = max(best_segment[1] - 5, 0)

  print(f"Successfully found best segment for video: {video_id}")

  return best_segment

def pretty_print_videos(videos):
  """Print videos list"""
  video_strings = [
    f"Video ID: {v['videoId']} - Channel ID: {v['channelId']} \n"
    f"Channel title: {v['channelTitle']} \n"
    "Video title: "
    f"{v['title'] if len(v['title']) <= 100 else v['title'][:100] + '...'} \n"
    f"Best segment: {v['bestSegment']} \n"
    f"Best segment url: https://www.youtube.com/embed/{v['videoId']}?"
    f"start={round(v['bestSegment'][0])}"
    f"&end={round(v['bestSegment'][1])}"
    for v in videos
  ]
  print("--------\n" + "\n--------\n".join(video_strings))

def find_clusters_for_videos(videos):
  """Use VertexAI to sort and organize videos in different clusters"""
  print(f"Finding clusters for videos...")

  parameters = {
    "temperature": 0.2,
    "maxOutputTokens": 8192,
  }
  model = "gemini-2.0-flash"

  response = ""
  try:
    bucket_number = min(max(len(CHANNEL_IDS) // 10, 2), 6)
    # One video per line so that entries do not run into each other
    video_list_str = "\n".join([
      f"[[VIDEO_ID]] {v['videoId']} "
      f"[[TITLE]] {v['title']} "
      f"[[DESCRIPTION]] {v['description'][:150]}" for v in videos])
    prompt = video_clustering_prompt.replace(
      "$BUCKET_NUMBER", str(bucket_number)).replace(
        "$VIDEO_LIST", video_list_str
      )
    response = client.models.generate_content(
      model=model,
      contents=prompt,
      config=genai.types.GenerateContentConfig(
        **parameters
      )
    )
    print(f"Response from Model: {response.text}")
  except Exception as e:
    print("An error occurred while using VertexAI: ", e)
    return []

  # create cluster list from Gemini's answer
  cluster_str_list = filter(
    lambda c: re.search("Video_id", c),
    response.text.split("[Bucket]"))
  clusterlist = [
    {
      "name": re.findall(r"(.*?)\n?Video_id", b)[0].strip(),
      "videos": [v for v in [
        find_video_from_video_list(vid, videos)
        for vid in set(re.findall(r"Video_id:\s?([\w-]+)", b))] if v]
    } for b in cluster_str_list if b
  ]

  # Check if some videos were left out by Gemini and add them in
  # a new generic cluster
  video_ids = [v["videoId"] for v in videos]
  clusters_video_ids = [v["videoId"] for c in clusterlist for v in c["videos"]]
  missing_videos = [
    find_video_from_video_list(vid, videos)
    for vid in list(set(video_ids) - set(clusters_video_ids))]

  if len(missing_videos) > 0:
    generic_cluster_name = "Here's to many more videos to come !"
    print(
      "WARNING: some videos were left out by Gemini.",
      f"Adding these videos in a generic \"{generic_cluster_name}\" cluster...")
    clusterlist.append({"name": generic_cluster_name, "videos": missing_videos})

  print("Successfully fetched clusters for videos")
  return clusterlist

def sec_to_time(total_seconds):
  """Transform seconds to time. Eg 62 to "00:01:02"."""
  hours = total_seconds // 3600
  remaining_seconds_after_hours = total_seconds % 3600
  minutes = remaining_seconds_after_hours // 60
  seconds = remaining_seconds_after_hours % 60
  return f"{int(hours):02d}:{int(minutes):02d}:{int(seconds):02d}"

def sec_segment_to_time_segment_str(video_segment):
  """Transform a video segment to a time string.
     Eg [62, 66] to 00:01:02 00:01:06"""
  # Compare against None explicitly: a segment starting at 0 seconds is valid
  return " ".join([
    sec_to_time(sec) if sec is not None else "N/A" for sec in video_segment])


def pretty_print_clusters_summary(clusters):
  """Prints clusters - summary only"""
  url = "https://www.youtube.com/watch?v="

  for idx, c in enumerate(clusters):
    print(
      f"\n\n--------------- CLUSTER {idx + 1} ---------------\n\n",
      f"Cluster Name: {c['name']}\n",
      "\n ".join([
        f"{url}{v['videoId']} "
        f"{sec_segment_to_time_segment_str(v['bestSegment'] or [None, None])}"
        for v in c['videos']]))

def pretty_print_clusters_verbose(clusters):
  """Print clusters with full details"""
  for idx, c in enumerate(clusters):
    print(
      f"\n\n--------------- CLUSTER {idx + 1} ---------------\n\n",
      f"Cluster Name: {c['name']}\n")
    pretty_print_videos(c["videos"])
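
To see how the segment logic in `find_best_segment_in_video` behaves, here is a standalone sketch of the same arithmetic: the best `elapsedVideoTimeRatio` marks a 1% slice of the video, which is then capped at 10 seconds and stretched to 5 seconds where possible. The function name `segment_from_ratio` and its keyword arguments are hypothetical.

```python
def segment_from_ratio(ratio, duration, min_len=5, max_len=10):
  """Map an elapsed-time ratio (0.01 to 1.0) to [start, end] in seconds."""
  start = (ratio - 0.01) * duration
  end = ratio * duration
  # Cap long slices: 1% of a long video can exceed max_len seconds.
  end = min(end, start + max_len)
  # Stretch short slices, first forward, then backward near the video's end.
  if end - start < min_len:
    end = min(start + min_len, duration)
    if end - start < min_len:
      start = max(end - min_len, 0)
  return [start, end]

print(segment_from_ratio(0.5, 2000))  # long video: slice capped at max_len
print(segment_from_ratio(1.0, 120))   # slice at the very end: stretched back
print(segment_from_ratio(0.5, 3))     # video shorter than min_len
```

For a video shorter than 5 seconds, the segment simply covers the whole video.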

4. We can now run our main script and get our AI generated recap video !

In [None]:
# Let's get the best performing video for each channel in the given timeframe
best_videos = [get_best_video_for_channel(c) for c in CHANNELS]
best_videos = [get_video_details(v) for v in best_videos if v is not None]
# get_video_details can also return None, so filter again before the next step
best_videos = [
  dict(v, **{"bestSegment": find_best_segment_in_video(v)})
  for v in best_videos if v is not None]
#pretty_print_videos(best_videos)

clusters = find_clusters_for_videos(best_videos)
pretty_print_clusters_summary(clusters)
#pretty_print_clusters_verbose(clusters)
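
If the clusters look odd, it can help to try the parsing step on its own. The sketch below parses a made-up model answer that follows the `[Bucket]` / `Video_id:` format requested in the prompt. It is a simplified variant of the parsing done in `find_clusters_for_videos` (the bucket name is taken from the first line of each block), and the video IDs are invented.

```python
import re

# Made-up model answer following the format requested in the prompt.
sample_answer = (
  "[Bucket]You made us laugh \n"
  "Video_id: a1B2c3D4e5F \n"
  "Video_id: zZ9-yY8_xX7 \n"
  "[Bucket]You made us learn new things \n"
  "Video_id: q1W2e3R4t5Y \n")

buckets = {}
for block in sample_answer.split("[Bucket]"):
  if "Video_id" not in block:
    continue  # Skips the empty chunk before the first [Bucket] marker.
  name = block.split("\n", 1)[0].strip()
  # Video IDs may contain letters, digits, "_" and "-".
  buckets[name] = re.findall(r"Video_id:\s?([\w-]+)", block)

print(buckets)
```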

5. If you want, you can now just use the result above to manually build a nice recap video (and upload it to YouTube !).  
In this Colab, we will instead use the [YouTube Player API](https://developers.google.com/youtube/iframe_api_reference) to automatically play all the video segments one after the other. Run the code below, click on play if the YouTube player does not autoplay, and let the YouTube player do its magic. It will automatically start each video at the right time, stop it at the right time, and then play the next video in line. Just sit back, relax and enjoy this video recap :)  
  
*Pro tip: click on the output action button (square and arrow button) in the top left corner of this code output cell and select "view output fullscreen" for an optimized experience!*

In [None]:
from IPython.display import display, HTML

yt_player_videos = [
  {
    "videoId": v["videoId"],
    "startSeconds": int(v["bestSegment"][0]) if v["bestSegment"] else 0,
    "endSeconds": int(v["bestSegment"][1]) if v["bestSegment"] else 5,
    "clusterName": c["name"],
    "channelTitle": v["channelTitle"],
    "videoTitle": v["title"]
  } for c in clusters for v in c["videos"]]

html_player = """
<h1>YouTube Player</h1>
<h2>loading...</h2>
<h3>--<br/><br/></h3>
<div id="player-container">
  <div id="player"></div>
</div>
<style>
  #player-container {
    position: relative;
    aspect-ratio: 16 / 9;
    width: 80%;
    margin:auto
  }
</style>
<script>
  var VIDEOS = [$VIDEO_LIST];
  var currentVideo;
  var currentPlayerState;
  var player;

  var tag = document.createElement('script');
  tag.src = "https://www.youtube.com/iframe_api";
  var firstScriptTag = document.getElementsByTagName('script')[0];
  firstScriptTag.parentNode.insertBefore(tag, firstScriptTag);

  function onYouTubeIframeAPIReady() {
    player = new YT.Player('player', {
      height: '100%',
      width: '100%',
      playerVars: {
        'autoplay': 1,
        'cc_lang_pref': 'en',
        'cc_load_policy': 1,
        'hl': 'en'
      },
      events: {
        'onReady': onPlayerReady,
        'onStateChange': onPlayerStateChange,
        'onError': onPlayerError
      }
    });
  };

  function loadNextVideo() {
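    var nextVideo = VIDEOS.shift();
    if (!nextVideo) {
      // No segment left to play: keep the last video's info and stop here.
      return;
    }
    currentVideo = nextVideo;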
    var titleTag = document.getElementsByTagName('h1')[0];
    var subtitleTag = document.getElementsByTagName('h2')[0];

    titleTag.innerHTML = currentVideo.clusterName;
    subtitleTag.innerHTML =
      currentVideo.channelTitle + " - " + currentVideo.videoTitle;
    player.loadVideoById(currentVideo);
  };

  function onPlayerReady() {
    loadNextVideo();
    resetStatusTag();
  };

  function onPlayerStateChange(event){
    if (currentPlayerState == YT.PlayerState.PLAYING) {
      if (event.data == YT.PlayerState.ENDED) {
        loadNextVideo();
        resetStatusTag();
      }
    }
    currentPlayerState = event.data;
  };

  function onPlayerError(event) {
    var statusTag = document.getElementsByTagName('h3')[0];

    var errorDetails;
    switch (event.data) {
      case 100:
        errorDetails = "Video removed or made private.";
        break;
      case 101:
      case 150:
        errorDetails = "Owner disabled embeds";
        break;
      default:
        errorDetails = "Could not play video";
    }
    statusTag.innerHTML =
      "--<br/>Error for video: " + currentVideo.channelTitle + " - " +
      currentVideo.videoTitle + "<br/>" + "Error details: " + errorDetails;

    loadNextVideo();
  }

  function resetStatusTag() {
    var statusTag = document.getElementsByTagName('h3')[0];
    statusTag.innerHTML = "--<br/><br/>";
  }
</script>
""".replace("$VIDEO_LIST", ",".join(map(str, yt_player_videos)))

display(HTML(html_player))
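
The cell above injects the video list into the page by converting each Python dict with `str`. Titles containing quotes or a `</script>` sequence may not survive that conversion cleanly. A possible alternative, sketched here with a made-up entry, is to serialize the list with `json.dumps`; the variable names are hypothetical and this is not what the notebook currently does.

```python
import json

# Hypothetical entry with characters that are awkward inside a <script> tag.
videos = [{
  "videoId": "a1B2c3D4e5F",
  "startSeconds": 12,
  "endSeconds": 20,
  "videoTitle": "It's a \"wrap\" </script>",
}]

# json.dumps yields text that is also a valid JavaScript array literal.
# Escaping "</" keeps a title from closing the <script> element early.
video_list_js = json.dumps(videos).replace("</", "<\\/")
print(video_list_js)
```

With this approach the template line would read `var VIDEOS = $VIDEO_LIST;` without the surrounding brackets, since the serialized value is already an array.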