##### Copyright 2025 Google LLC.

In [41]:
#@title Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# Video understanding with Gemini

<a target="_blank" href="https://colab.research.google.com/github/google-gemini/cookbook/blob/main/quickstarts/Video_understanding.ipynb"><img src="https://colab.research.google.com/assets/colab-badge.svg" height=30/></a>

Gemini has been a multimodal model from the beginning, capable of analyzing all sorts of media using its [long context window](https://developers.googleblog.com/en/new-features-for-the-gemini-api-and-google-ai-studio/).

[Gemini 2.0](https://ai.google.dev/gemini-api/docs/models/gemini-v2) and later bring video analysis to a whole new level as illustrated in [this video](https://www.youtube.com/watch?v=Mot-JEU26GQ):


In [42]:
#@title Building with Gemini 2.0: Video understanding
%%html
<iframe width="560" height="315" src="https://www.youtube.com/embed/Mot-JEU26GQ?si=pcb7-_MZTSi_1Zkw" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" referrerpolicy="strict-origin-when-cross-origin" allowfullscreen></iframe>

This notebook shows you how to use Gemini to perform the same kind of video analysis. Each example comes with several prompts that you can select from a dropdown, and you can also experiment with your own.

You can also check the [live demo](https://aistudio.google.com/starter-apps/video) and try it on your own videos on [AI Studio](https://aistudio.google.com/starter-apps/video).

## Setup

This section installs the SDK, sets it up using your [API key](../quickstarts/Authentication.ipynb), imports the relevant libraries, downloads the sample videos, and uploads them to Gemini.

Expand the section if you are curious, but you can also just run it (it should take a couple of minutes since there are large files) and go straight to the examples.

### Install SDK

The new **[Google Gen AI SDK](https://ai.google.dev/gemini-api/docs/sdks)** provides programmatic access to Gemini 2.0 (and previous models) using both the [Google AI for Developers](https://ai.google.dev/gemini-api/docs) and [Vertex AI](https://cloud.google.com/vertex-ai/generative-ai/docs/overview) APIs. With a few exceptions, code that runs on one platform will run on both. This means that you can prototype an application using the Developer API and then migrate the application to Vertex AI without rewriting your code.

You can find more details about this new SDK in the [documentation](https://ai.google.dev/gemini-api/docs/sdks) or in the [Getting started](../quickstarts/Get_started.ipynb) notebook.

In [43]:
%pip install -U -q 'google-genai'

### Set up your API key

To run the following cell, your API key must be stored in a Colab Secret named `GOOGLE_API_KEY`. If you don't already have an API key, or you're not sure how to create a Colab Secret, see [Authentication](../quickstarts/Authentication.ipynb) for an example.

In [44]:
from google.colab import userdata

GOOGLE_API_KEY=userdata.get('GOOGLE_API_KEY')
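
Outside Colab, `google.colab` cannot be imported, so the cell above fails. A minimal sketch of a fallback, assuming you have exported the key as an environment variable that is also named `GOOGLE_API_KEY` (the helper name `load_api_key` is hypothetical and not part of any SDK):

```python
import os


def load_api_key(name="GOOGLE_API_KEY"):
    """Return the API key from a Colab Secret, or from the environment outside Colab."""
    try:
        # This module only exists inside a Colab runtime.
        from google.colab import userdata
    except ImportError:
        # Assumption: outside Colab the key lives in an environment variable.
        key = os.environ.get(name)
        if not key:
            raise RuntimeError(f"Set the {name} environment variable first.")
        return key
    return userdata.get(name)
```

You could then write `GOOGLE_API_KEY = load_api_key()` and keep the rest of the notebook unchanged.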

### Initialize SDK client

With the new SDK, you only need to initialize a client with your API key (or OAuth if using [Vertex AI](https://cloud.google.com/vertex-ai)). The model is now set in each call.

In [45]:
from google import genai
from google.genai import types

client = genai.Client(api_key=GOOGLE_API_KEY)

### Select the Gemini model

Video understanding works best with the Gemini 2.5 Pro model. You can also select earlier models to compare their behavior, but it is recommended to use at least the 2.0 ones.

For more information about each Gemini model, check the [documentation](https://ai.google.dev/gemini-api/docs/models/gemini).


In [46]:
model_name = "gemini-2.5-pro-preview-05-06" # @param ["gemini-1.5-flash-latest","gemini-2.0-flash-lite","gemini-2.0-flash","gemini-2.5-flash-preview-04-17","gemini-2.5-pro-exp-05-06"] {"allow-input":true, isTemplate: true}

### Get sample videos

You will start with uploaded videos, as that is the more common use case, and you will see later that you can also use YouTube videos.

In [47]:
# Download the sample videos
!wget https://storage.googleapis.com/generativeai-downloads/videos/Pottery.mp4 -O Pottery.mp4 -q
!wget https://storage.googleapis.com/generativeai-downloads/videos/Jukin_Trailcam_Videounderstanding.mp4 -O Trailcam.mp4 -q
!wget https://storage.googleapis.com/generativeai-downloads/videos/post_its.mp4 -O Post_its.mp4 -q
!wget https://storage.googleapis.com/generativeai-downloads/videos/user_study.mp4 -O User_study.mp4 -q

### Upload the videos

Upload all the videos using the File API. You can find more details about how to use it in the [Get Started](../quickstarts/Get_started.ipynb#scrollTo=KdUjkIQP-G_i) notebook.

This can take a couple of minutes as the videos will need to be processed and tokenized.

In [48]:
import time

def upload_video(video_file_name):
  """Upload a video with the File API and wait until it has been processed."""
  video_file = client.files.upload(file=video_file_name)

  # Poll until the service has finished processing the file.
  while video_file.state == "PROCESSING":
    print('Waiting for video to be processed.')
    time.sleep(10)
    video_file = client.files.get(name=video_file.name)

  if video_file.state == "FAILED":
    raise ValueError(video_file.state)
  print(f'Video processing complete: {video_file.uri}')

  return video_file

pottery_video = upload_video('Pottery.mp4')
trailcam_video = upload_video('Trailcam.mp4')
post_its_video = upload_video('Post_its.mp4')
user_study_video = upload_video('User_study.mp4')

Waiting for video to be processed.
Video processing complete: https://generativelanguage.googleapis.com/v1beta/files/wv3slg4reya5
Waiting for video to be processed.
Waiting for video to be processed.
Video processing complete: https://generativelanguage.googleapis.com/v1beta/files/q9wdrm3uttne
Waiting for video to be processed.
Video processing complete: https://generativelanguage.googleapis.com/v1beta/files/8kukihxnv7xv
Waiting for video to be processed.
Video processing complete: https://generativelanguage.googleapis.com/v1beta/files/pp9gelxpqe09
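
The loop in `upload_video` polls for as long as the file stays in the `PROCESSING` state, with no upper bound. Here is a sketch of a bounded variant. `wait_for_state` and its parameters are hypothetical names, and `fetch` stands for any zero-argument callable that returns the current state, for example `lambda: client.files.get(name=video_file.name).state`:

```python
import time


def wait_for_state(fetch, pending="PROCESSING", interval=10.0, timeout=600.0, sleep=time.sleep):
    """Poll fetch() until it stops returning `pending`, or raise after `timeout` seconds."""
    waited = 0.0
    state = fetch()
    while state == pending:
        if waited >= timeout:
            raise TimeoutError(f"Still {pending} after {waited:.0f}s")
        sleep(interval)
        waited += interval
        state = fetch()
    return state
```

Passing `sleep` as a parameter keeps the helper easy to exercise without actually waiting. The default `timeout` of 600 seconds is an arbitrary choice for illustration.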


### Imports

In [49]:
import json
from PIL import Image
from IPython.display import display, Markdown, HTML

# Search within videos

First, try using the model to search within your videos and describe all the animal sightings in the trailcam video.

<video controls width="500"><source src="https://storage.googleapis.com/generativeai-downloads/videos/Jukin_Trailcam_Videounderstanding.mp4" type="video/mp4"></video>

In [56]:
prompt = "For each scene in this video, generate captions that describe the scene along with any spoken text placed in quotation marks. Place each caption into an object with the timecode of the caption in the video."  # @param ["For each scene in this video, generate captions that describe the scene along with any spoken text placed in quotation marks. Place each caption into an object with the timecode of the caption in the video.", "Organize all scenes from this video in a table, along with timecode, a short description, a list of objects visible in the scene (with representative emojis) and an estimation of the level of excitement on a scale of 1 to 10"] {"allow-input":true}

video = trailcam_video # @param ["trailcam_video", "pottery_video", "post_its_video", "user_study_video"] {"type":"raw","allow-input":true}

response = client.models.generate_content(
    model=model_name,
    contents=[
        video,
        prompt,
    ]
)

Markdown(response.text)

```json
[
  {"timecode": "0:00", "caption": "The camera, positioned low to the ground, is quickly obscured by a large, furry animal, possibly a bear, moving past. The animal then clears the frame, revealing a rocky, wooded area. Two gray foxes enter the scene, one in the foreground and another behind it, sniffing the ground. The fox in the foreground walks towards the rocks, and the other follows. One fox climbs onto a large rock, while the other stays on the ground, still sniffing. The fox on the rock then walks further up, and the fox on the ground turns to face the camera briefly before continuing to sniff."},
  {"timecode": "0:16", "caption": "The scene shifts to a black and white, night vision view. A large feline, likely a puma or mountain lion, walks into the frame from the left, sniffing the ground covered in leaves. The animal stops, lifts its head, and looks around before continuing to walk out of the frame to the right."},
  {"timecode": "0:34", "caption": "The scene shifts to a darker, night vision view. Two foxes, with glowing eyes, are in a wooded area. One fox is in the foreground, and the other is further back, partially obscured. They are both sniffing the ground. The fox in the foreground rolls on its back, then sits up. The other fox approaches, and they interact playfully, one pouncing on the other before they both run out of the frame to the right. The camera is then knocked over, showing a brief flash of light and then a blurry view of the ground."},
  {"timecode": "0:50", "caption": "The camera is still on the ground, but the view is clearer. Two foxes with glowing eyes are playing on the rocky terrain, one chasing the other. One fox jumps onto a rock and then down again. The other fox follows, and they continue to play, running around and jumping on and off the rocks. The camera is knocked again, briefly showing a blurry, overexposed image."},
  {"timecode": "1:04", "caption": "A puma or mountain lion stands on a rocky outcrop at night, looking to the left. It then turns and walks down the rocks and out of the frame to the left."},
  {"timecode": "1:18", "caption": "Two pumas or mountain lions are on a rocky slope at night. One is in the foreground, walking towards the camera, and the other is further back, also moving. The one in the foreground walks past the camera and out of the frame to the left. The other puma follows, walking down the slope and also exiting to the left."},
  {"timecode": "1:29", "caption": "A bobcat stands in a wooded area at night, its eyes glowing in the dark. It sniffs the ground, then looks up and around before continuing to sniff the ground and then walking out of the frame to the left."},
  {"timecode": "1:51", "caption": "A large, dark brown bear walks into the frame from the right in a sun-dappled wooded area. It walks towards the camera, then turns and walks out of the frame to the right."},
  {"timecode": "1:56", "caption": "A puma or mountain lion, seen in black and white night vision, walks quickly from right to left across the frame in a wooded area."},
  {"timecode": "2:04", "caption": "The camera is knocked over by a large, furry animal, likely a bear, showing a blurry, close-up view of its fur before settling on a view of the ground. The bear's rear end is visible as it walks away from the camera. Another, smaller bear, likely a cub, follows, sniffing the ground. They both walk out of the frame to the right."},
  {"timecode": "2:23", "caption": "A fox stands on a rocky outcrop at night, overlooking a city skyline with twinkling lights. It sniffs the ground and then looks out at the city."},
  {"timecode": "2:34", "caption": "A large bear walks into the frame from the right, on the same rocky outcrop overlooking the city at night. It walks past the camera and out of the frame to the right."},
  {"timecode": "2:42", "caption": "A puma or mountain lion walks from right to left across the rocky outcrop at night, overlooking the city. It pauses briefly before continuing out of the frame."},
  {"timecode": "2:51", "caption": "A puma or mountain lion is in a wooded area at night, sniffing the ground near a tree. It then looks up and around before continuing to sniff the ground."},
  {"timecode": "3:05", "caption": "A large, dark brown bear stands in a sun-dappled wooded area, facing the camera. It looks around, then opens its mouth and pants, its tongue visible. It continues to look around before turning its head to the side."},
  {"timecode": "3:22", "caption": "A lighter-colored bear stands in a wooded area, sniffing the ground. Another, darker bear approaches from the right and also sniffs the ground."},
  {"timecode": "3:32", "caption": "The two bears are still in the wooded area. The lighter-colored bear looks up and around, then sniffs the ground. The darker bear approaches from the right, sniffs the ground, and then nudges the camera with its nose before walking away to the right. The lighter-colored bear continues to sniff the ground."},
  {"timecode": "3:40", "caption": "Two bears, one lighter and one darker, are in the wooded area, sniffing the ground. The darker bear walks towards the camera, then turns and walks away. The lighter bear follows, and they both exit the frame to the right. A third, smaller bear, likely a cub, enters from the left, sniffs the ground, and then also walks out of the frame to the right."},
  {"timecode": "4:03", "caption": "Three bears, two larger and one smaller, are in the wooded area, sniffing the ground. One of the larger bears walks towards the camera, then turns and walks away. The other two bears follow, and they all exit the frame to the right."},
  {"timecode": "4:22", "caption": "A bobcat sits in a dark, wooded area at night, its eyes glowing. It looks around, then walks a few steps and sits down again, still looking around."},
  {"timecode": "4:30", "caption": "A fox is in a dark, wooded area at night, its eyes glowing. It stands still, looking around, then walks a few steps and stops again, still looking around. It then walks out of the frame to the right. Another fox appears from the left, its eyes also glowing, and walks towards the right before disappearing into the darkness."},
  {"timecode": "4:50", "caption": "A fox walks from right to left in a dark, wooded area at night, its eyes glowing. It then disappears into the darkness."},
  {"timecode": "4:57", "caption": "A puma or mountain lion is in a wooded area at night, sniffing the ground. It looks up and around before walking out of the frame to the right."}
]
```

The prompt used here is quite generic, but you can get even better results if you customize it to your needs (for example, by asking specifically for foxes).

The [live demo on AI Studio](https://aistudio.google.com/starter-apps/video) shows how you can postprocess this output to jump directly to the specific part of the video by clicking on the timecodes. If you are interested, you can check the [code of that demo on GitHub](https://github.com/google-gemini/starter-applets/tree/main/video).
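
As a starting point for that kind of postprocessing, here is a minimal sketch that turns the model's reply into Python objects with timecodes in seconds. It assumes the reply is a JSON array, optionally wrapped in a Markdown code fence, with `timecode` and `caption` keys as in the output above. The model is not guaranteed to use that shape, and `parse_captions` and `to_seconds` are hypothetical helper names:

```python
import json
import re

FENCE = "`" * 3  # Markdown code fence marker


def to_seconds(timecode):
    """Convert 'M:SS' or 'H:MM:SS' to a number of seconds."""
    seconds = 0
    for part in timecode.split(":"):
        seconds = seconds * 60 + int(part)
    return seconds


def parse_captions(text):
    """Strip an optional Markdown code fence, then decode the JSON array."""
    match = re.search(FENCE + r"(?:json)?\s*(.*?)" + FENCE, text, re.DOTALL)
    payload = match.group(1) if match else text
    return [
        {"seconds": to_seconds(item["timecode"]), "caption": item["caption"]}
        for item in json.loads(payload)
    ]


sample = FENCE + 'json\n[{"timecode": "1:04", "caption": "A puma stands on a rocky outcrop."}]\n' + FENCE
print(parse_captions(sample))
```

In the notebook you would call `parse_captions(response.text)` and use the `seconds` value to seek within a video player.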

# Extract and organize text

Gemini can also read the text that appears in a video and extract it in an organized way. You can even use Gemini's reasoning capabilities to generate new ideas for you.

<video controls width="400"><source src="https://storage.googleapis.com/generativeai-downloads/videos/post_its.mp4" type="video/mp4"></video>

In [51]:
prompt = "Transcribe the sticky notes, organize them and put it in a table. Can you come up with a few more ideas?" # @param ["Transcribe the sticky notes, organize them and put it in a table. Can you come up with a few more ideas?", "Which of those names would fit an AI product that can resolve complex questions using its thinking abilities?"] {"allow-input":true}

video = post_its_video # @param ["trailcam_video", "pottery_video", "post_its_video", "user_study_video"] {"type":"raw","allow-input":true}

response = client.models.generate_content(
    model=model_name,
    contents=[
        video,
        prompt,
    ]
)

Markdown(response.text)

Okay, I've transcribed the sticky notes from the whiteboard. There are quite a few, and some themes are emerging (astronomy, mythology, scientific/mathematical concepts).

Here's the list organized into a table:

| Project Name Ideas       | Project Name Ideas      | Project Name Ideas      |
| :----------------------- | :----------------------- | :----------------------- |
| Leo Minor                | Supernova Echo           | Euler's Path            |
| Canis Major              | Chaos Field              | Zephyr                  |
| Andromeda's Reach        | Draco                    | Titan                   |
| Lunar Eclipse            | Lynx                     | Echo                    |
| Convergence              | Delphinus                | Odin                    |
| Stellar Nexus            | Serpens                  | Aether                  |
| Orion's Belt             | Centaurus                | Phoenix                 |
| Lyra                     | Symmetry                 | Cerberus                |
| Fractal                  | Golden Ratio             | Vector                  |
| Chaos Theory             | Infinity Loop            | Orion's Sword           |
| Bayes Theorem            | Medusa                   | Athena                  |
| Riemann's Hypothesis     | Taylor Series            | Hera                    |
| Sagitta                  | Stokes Theorem           | Athena's Eye            |
| Pandora's Box            | Equilibrium              | Chaos Theory (appears again) |
| Celestial Drift          | Perseus Shield           | Echo (appears again)    |
| Astral Forge             | Chimera Dream            |                         |
| Comet's Tail             | Prometheus Rising        |                         |
|                          | Galactic Core            |                         |

---

Here are a few more ideas, trying to blend some of the existing themes or introduce related ones:

1.  **Nebula Prime:** (Astronomy + Importance)
2.  **Quantum Leap:** (Science + Progress)
3.  **Icarus Ascent:** (Mythology + Ambition)
4.  **Valkyrie:** (Mythology - powerful, decisive)
5.  **Mobius Path:** (Mathematics + Continuous/Cyclical)
6.  **Event Horizon:** (Astronomy/Physics - a point of no return, breakthrough)
7.  **Cygnus Core:** (Astronomy + Centrality)
8.  **Oracle Engine:** (Mythology + Prediction/Insight)
9.  **Zenith Point:** (Abstract/Astronomy - Highest point)
10. **Helios Flare:** (Mythology/Astronomy - Bright, powerful)

# Structure information

Gemini 2.0 is able not only to read text but also to reason about real-world objects and structure information about them, as in this video of a display of ceramics with handwritten prices and notes.

<video controls width="500"><source src="https://storage.googleapis.com/generativeai-downloads/videos/Pottery.mp4" type="video/mp4"></video>

In [52]:
prompt = "Give me a table of my items and notes" # @param ["Give me a table of my items and notes", "Help me come up with a selling pitch for my potteries"] {"allow-input":true}

video = pottery_video # @param ["trailcam_video", "pottery_video", "post_its_video", "user_study_video"] {"type":"raw","allow-input":true}

response = client.models.generate_content(
    model=model_name,
    contents=[
        video,
        prompt,
    ],
    config = types.GenerateContentConfig(
        system_instruction="Don't forget to escape the dollar signs",
    )
)

Markdown(response.text)

Okay, here's a table summarizing the items and their notes from the image:

| Item         | Dimensions        | Price | Glaze/Notes               |
| :----------- | :---------------- | :---- | :------------------------ |
| Tumblers     | 4"h x 3"d -ISH    | \$20  | #5 Artichoke double dip   |
| Small bowls  | 3.5"h x 6.5"d     | \$35  | #6 Gemini double dip SLOW COOL |
| Med bowls    | 4"h x 7"d         | \$40  | #6 Gemini double dip SLOW COOL |

As you can see, Gemini is able to work out which item each note corresponds to, including the last one.
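
If you need the rows as data rather than rendered Markdown, a sketch like the following can parse the table. It assumes a simple pipe table with a header row and a separator row, as in the output above, and no escaped pipes inside cells. `parse_markdown_table` is a hypothetical helper:

```python
def parse_markdown_table(text):
    """Parse a simple Markdown pipe table into a list of dicts keyed by header."""
    rows = [
        [cell.strip() for cell in line.strip().strip("|").split("|")]
        for line in text.splitlines()
        if line.strip().startswith("|")
    ]
    if len(rows) < 2:
        return []
    header, body = rows[0], rows[2:]  # rows[1] is the |:---| separator row
    return [dict(zip(header, row)) for row in body]


table = r"""
| Item     | Price | Glaze/Notes             |
| :------- | :---- | :---------------------- |
| Tumblers | \$20  | #5 Artichoke double dip |
"""
print(parse_markdown_table(table))
```

The escaped dollar signs requested by the system instruction are kept as-is in the parsed values, so strip the backslashes if you want to convert prices to numbers.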

# Analyze screen recordings for key moments

You can also use the model to analyze screen recordings. Let's say you're doing user studies on how people use your product, so you end up with lots of screen recordings, like this one, that you have to manually comb through.
With just one prompt, the model can describe all the actions in your video.

<video controls width="400"><source src="https://storage.googleapis.com/generativeai-downloads/videos/user_study.mp4" type="video/mp4"></video>

In [53]:
prompt = "Generate a paragraph that summarizes this video. Keep it to 3 to 5 sentences with corresponding timecodes." # @param ["Generate a paragraph that summarizes this video. Keep it to 3 to 5 sentences with corresponding timecodes.", "Choose 5 key shots from this video and put them in a table with the timecode, text description of 10 words or less, and a list of objects visible in the scene (with representative emojis).", "Generate bullet points for the video. Place each bullet point into an object with the timecode of the bullet point in the video."] {"allow-input":true}

video = user_study_video # @param ["trailcam_video", "pottery_video", "post_its_video", "user_study_video"] {"type":"raw","allow-input":true}

response = client.models.generate_content(
    model=model_name,
    contents=[
        video,
        prompt,
    ]
)

Markdown(response.text)

Okay, here's a summary of the video:

The video showcases a "My Garden App" interface where users can browse a list of plants. (0:00-0:09) The user interacts by "liking" several plants like the Rose Plant, Fern, Cactus, and Hibiscus, which changes the "Like" button's color to red. (0:10, 0:12, 0:15, 0:21) They then add some of these liked plants (Fern, Cactus, Hibiscus) to their shopping cart; the "Add to Cart" button briefly changes to "Added!" before reverting. (0:13, 0:16, 0:24) The user navigates to the "Cart" section, which correctly displays the added Fern, Cactus, and Hibiscus along with a total price. (0:30-0:32) Finally, they visit the "Profile" page, which accurately shows the count of liked plants and items in the cart. (0:33-0:34)

# Analyze YouTube videos

In addition to using your own videos, you can ask Gemini to fetch a video from YouTube and analyze it. Here's an example using the keynote from Google I/O 2023. Guess what the main theme was?


In [54]:
response = client.models.generate_content(
    model=model_name,
    contents=types.Content(
        parts=[
            types.Part(text="Find all the instances where Sundar says \"AI\". Provide timestamps and broader context for each instance."),
            types.Part(
                file_data=types.FileData(file_uri='https://www.youtube.com/watch?v=ixRanV-rdAQ')
            )
        ]
    )
)

Markdown(response.text)

Okay, here are the instances where Sundar Pichai says "AI" during his opening keynote, along with timestamps and broader context:

1.  **Timestamp:** 0:29
    **Utterance:** "...AI is having a very busy year."
    **Broader Context:** Sundar is starting his keynote, acknowledging the significant attention and development in the field of AI recently, setting the stage for the announcements to come.

2.  **Timestamp:** 0:38
    **Utterance:** "Seven years into our journey as an AI-first company..."
    **Broader Context:** He's highlighting Google's long-term commitment and strategic shift towards being an AI-centric company, emphasizing that they are at an important juncture.

3.  **Timestamp:** 0:45
    **Utterance:** "...opportunity to make AI even more helpful for people..."
    **Broader Context:** Sundar is discussing Google's ambition to leverage AI to enhance its usefulness for individuals, businesses, and communities.

4.  **Timestamp:** 0:54
    **Utterance:** "We've been applying AI to make our products radically more helpful for a while."
    **Broader Context:** He's explaining that AI integration into Google products isn't new and that with generative AI, they are taking the next evolutionary step.

5.  **Timestamp:** 1:40
    **Utterance:** "...advanced writing features powered by AI."
    **Broader Context:** Sundar is discussing the evolution of features in Gmail, from Smart Reply to Smart Compose, and how AI has powered these advancements in writing assistance within Google Workspace.

6.  **Timestamp:** 3:02
    **Utterance:** "Since the early days of Street View, AI has stitched together billions of panoramic images..."
    **Broader Context:** He's explaining the foundational role of AI in Google Maps features like Street View, enabling the creation of explorable digital representations of the world.

7.  **Timestamp:** 3:14
    **Utterance:** "...Immersive View, which uses AI to create a high-fidelity representation of a place..."
    **Broader Context:** Sundar is detailing how AI is used in the Immersive View feature of Google Maps to allow users to experience a digital twin of locations before visiting.

8.  **Timestamp:** 5:08
    **Utterance:** "Another product made better by AI is Google Photos."
    **Broader Context:** He is transitioning to discuss how AI has enhanced Google Photos, introducing it as an example of AI's positive impact on products.

9.  **Timestamp:** 5:15
    **Utterance:** "It was one of our first AI-native products."
    **Broader Context:** Sundar is emphasizing that Google Photos was designed with AI at its core from its inception, allowing for features like photo search by content.

10. **Timestamp:** 5:38
    **Utterance:** "AI advancements give us more powerful ways to do this."
    **Broader Context:** He's referring to the ability to edit and improve photos within Google Photos, highlighting how AI makes these editing tools more effective.

11. **Timestamp:** 5:47
    **Utterance:** "...Magic Eraser... uses AI-powered computational photography to remove unwanted distractions."
    **Broader Context:** Sundar is explaining the technology behind the Magic Eraser feature in Google Photos, specifically citing AI's role.

12. **Timestamp:** 5:58
    **Utterance:** "...using a combination of semantic understanding and generative AI, you can do much more..."
    **Broader Context:** He is introducing the new Magic Editor feature in Google Photos, explaining that it leverages both semantic understanding and generative AI for advanced photo editing.

13. **Timestamp:** 7:40
    **Utterance:** "...examples of how AI can help you in moments that matter."
    **Broader Context:** Sundar is summarizing the product updates (Gmail, Maps, Photos) he just presented as demonstrations of AI's helpfulness.

14. **Timestamp:** 7:47
    **Utterance:** "...deliver the full potential of AI across the products you know and love."
    **Broader Context:** He's reiterating Google's commitment to broadly integrating AI capabilities into its popular products.

15. **Timestamp:** 8:23
    **Utterance:** "Looking ahead, making AI helpful for everyone is the most profound way we will advance our mission."
    **Broader Context:** Sundar is connecting Google's mission to organize the world's information with the goal of making AI universally beneficial.

16. **Timestamp:** 8:53
    **Utterance:** "...by building and deploying AI responsibly so that everyone can benefit equally."
    **Broader Context:** He is outlining one of the four key approaches Google is taking to advance its AI mission, emphasizing responsible development and deployment.

17. **Timestamp:** 9:02
    **Utterance:** "Our ability to make AI helpful for everyone relies on continuously advancing our foundation models."
    **Broader Context:** Sundar is transitioning to discuss the importance of their underlying foundation models (like PaLM) in achieving their AI goals.

18. **Timestamp:** 12:26
    **Utterance:** "It uses AI to better detect malicious scripts..."
    **Broader Context:** He is explaining Sec-PaLM, a specialized version of PaLM 2, and how it uses AI for enhanced security threat detection.

19. **Timestamp:** 12:46
    **Utterance:** "...our decade-long journey to bring AI in responsible ways to billions of people."
    **Broader Context:** Sundar is recapping Google's long-standing efforts in AI and its commitment to responsible development as he introduces PaLM 2.

20. **Timestamp:** 12:57
    **Utterance:** "Looking back at the defining AI breakthroughs over the last decade..."
    **Broader Context:** He is highlighting the contributions of Google Brain and DeepMind to major AI advancements over the past ten years, leading into the announcement of Google DeepMind.

21. **Timestamp:** 14:09
    **Utterance:** "...we are also deeply investing in AI responsibility."
    **Broader Context:** Sundar is emphasizing that alongside developing advanced AI models like Gemini, Google is equally focused on ensuring AI is developed and used responsibly.

22. **Timestamp:** 15:04
    **Utterance:** "...every one of our AI-generated images has that metadata."
    **Broader Context:** He is discussing the importance of identifying synthetically generated content, mentioning that Google's AI-generated images will include metadata for transparency.

23. **Timestamp:** 15:11
    **Utterance:** "...our responsible approach to AI later."
    **Broader Context:** Sundar is referring to a later segment in the keynote where James Manyika will discuss Google's responsible AI practices in more detail.

24. **Timestamp:** 15:29
    **Utterance:** "...our experiment for conversational AI."
    **Broader Context:** He is introducing Bard, Google's conversational AI, and setting the stage for updates on its capabilities and availability.

Once again, the [live demo on AI Studio](https://aistudio.google.com/starter-apps/video) shows an example of how to postprocess this output. Check the [code of that demo](https://github.com/google-gemini/starter-applets/tree/main/video) for more details.
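
For a YouTube video, one simple form of postprocessing is turning each timestamp into a link that opens the video at that moment. The sketch below assumes the reply keeps the `**Timestamp:** M:SS` layout shown above (the model may format it differently) and relies on the `t` query parameter of YouTube watch URLs. `timestamp_links` is a hypothetical helper:

```python
import re

VIDEO_URL = "https://www.youtube.com/watch?v=ixRanV-rdAQ"


def timestamp_links(text, video_url=VIDEO_URL):
    """Return (timestamp, url) pairs, one per '**Timestamp:** M:SS' entry in text."""
    links = []
    for stamp in re.findall(r"\*\*Timestamp:\*\*\s*(\d+(?::\d{2}){1,2})", text):
        seconds = 0
        for part in stamp.split(":"):
            seconds = seconds * 60 + int(part)
        links.append((stamp, f"{video_url}&t={seconds}s"))
    return links


reply = "1.  **Timestamp:** 0:29\n2.  **Timestamp:** 12:26\n"
for stamp, url in timestamp_links(reply):
    print(stamp, url)
```

In the notebook you would pass `response.text` instead of the short `reply` sample.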

# Next Steps

Try it with your own videos using [AI Studio's live demo](https://aistudio.google.com/starter-apps/video) or play with the examples from this notebook (in case you missed them, there are other prompts you can try in the dropdowns).

For more examples of Gemini's capabilities, check the other guides in the [Cookbook](https://github.com/google-gemini/cookbook/). You'll learn how to use the [Live API](../quickstarts/Get_started_LiveAPI.ipynb), juggle [multiple tools](../quickstarts/Get_started_LiveAPI_tools.ipynb), or use Gemini 2.0's [spatial understanding](../quickstarts/Spatial_understanding.ipynb) abilities.

The [examples](https://github.com/google-gemini/cookbook/tree/main/examples/) folder of the Cookbook is also full of code samples illustrating creative ways to use Gemini's multimodal capabilities and long context.