<a href="https://colab.research.google.com/github/VijayaJothi24/Python_Project/blob/main/quickstarts/Video_understanding.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

##### Copyright 2025 Google LLC.

In [None]:
#@title Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# Video understanding with Gemini

<a target="_blank" href="https://colab.research.google.com/github/google-gemini/cookbook/blob/main/quickstarts/Video_understanding.ipynb"><img src="https://colab.research.google.com/assets/colab-badge.svg" height=30/></a>

Gemini has been a multimodal model from the beginning, capable of analyzing all sorts of media using its [long context window](https://developers.googleblog.com/en/new-features-for-the-gemini-api-and-google-ai-studio/).

[Gemini 2.0](https://ai.google.dev/gemini-api/docs/models/gemini-v2) and later bring video analysis to a whole new level as illustrated in [this video](https://www.youtube.com/watch?v=Mot-JEU26GQ):


In [None]:
#@title Building with Gemini 2.0: Video understanding
%%html
<iframe width="560" height="315" src="https://www.youtube.com/embed/Mot-JEU26GQ?si=pcb7-_MZTSi_1Zkw" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" referrerpolicy="strict-origin-when-cross-origin" allowfullscreen></iframe>

This notebook shows how to use Gemini to perform the same kind of video analysis. Each example comes with several prompts that you can select from a dropdown; feel free to experiment with your own as well.

You can also check the [live demo](https://aistudio.google.com/starter-apps/video) and try it on your own videos on [AI Studio](https://aistudio.google.com/starter-apps/video).

## Setup

This section installs the SDK, sets it up using your [API key](../quickstarts/Authentication.ipynb), imports the relevant libraries, downloads the sample videos, and uploads them to Gemini.

Expand the section if you are curious, but you can also just run it (it should take a couple of minutes since there are large files) and go straight to the examples.

### Install SDK

The new **[Google Gen AI SDK](https://ai.google.dev/gemini-api/docs/sdks)** provides programmatic access to Gemini 2.0 (and previous models) using both the [Google AI for Developers](https://ai.google.dev/gemini-api/docs) and [Vertex AI](https://cloud.google.com/vertex-ai/generative-ai/docs/overview) APIs. With a few exceptions, code that runs on one platform will run on both. This means that you can prototype an application using the Developer API and then migrate the application to Vertex AI without rewriting your code.

More details about this new SDK are available in the [documentation](https://ai.google.dev/gemini-api/docs/sdks) or in the [Getting started](../quickstarts/Get_started.ipynb) notebook.

In [1]:
%pip install -U -q 'google-genai'


### Set up your API key

To run the following cell, your API key must be stored in a Colab Secret named `GOOGLE_API_KEY`. If you don't already have an API key, or you're not sure how to create a Colab Secret, see [Authentication](../quickstarts/Authentication.ipynb) for an example.

In [2]:
from google.colab import userdata

GOOGLE_API_KEY=userdata.get('GOOGLE_API_KEY')
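Outside Colab, `google.colab` cannot be imported, so the cell above fails. The following is a minimal sketch of a fallback, assuming the key is also exported as an environment variable named `GOOGLE_API_KEY`. The helper name `load_api_key` and its parameters are hypothetical and not part of any SDK.

```python
import os


def load_api_key(name="GOOGLE_API_KEY", env=None):
    """Return the key from Colab Secrets when available, else from the environment."""
    env = os.environ if env is None else env

    try:
        # google.colab only exists inside a Colab runtime.
        from google.colab import userdata
    except ImportError:
        userdata = None

    if userdata is not None:
        try:
            key = userdata.get(name)
            if key:
                return key
        except Exception:
            # The secret is missing or notebook access was not granted.
            pass

    key = env.get(name)
    if not key:
        raise RuntimeError(f"No API key found: set the {name} secret or environment variable.")
    return key
```

With this helper, `GOOGLE_API_KEY = load_api_key()` would work both in Colab and in a local Jupyter session, provided one of the two sources is configured.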

### Initialize SDK client

With the new SDK, you only need to initialize a client with your API key (or OAuth if using [Vertex AI](https://cloud.google.com/vertex-ai)). The model is now set in each call.

In [3]:
from google import genai
from google.genai import types

client = genai.Client(api_key=GOOGLE_API_KEY)

### Select the Gemini model

Video understanding works best with the Gemini 2.5 Pro model. You can also select earlier models to compare their behavior, but it is recommended to use at least the 2.0 ones.

For more information about each Gemini model, check the [documentation](https://ai.google.dev/gemini-api/docs/models/gemini).


In [4]:
model_name = "gemini-2.5-pro-exp-03-25" # @param ["gemini-1.5-flash-latest","gemini-2.0-flash-lite","gemini-2.0-flash","gemini-2.5-pro-exp-03-25"] {"allow-input":true, isTemplate: true}

### Get sample videos

You will start with uploaded videos, as that is the more common use case; later you will see that you can also use YouTube videos.

In [5]:
# Download the sample videos
!wget https://storage.googleapis.com/generativeai-downloads/videos/Pottery.mp4 -O Pottery.mp4 -q
!wget https://storage.googleapis.com/generativeai-downloads/videos/Jukin_Trailcam_Videounderstanding.mp4 -O Trailcam.mp4 -q
!wget https://storage.googleapis.com/generativeai-downloads/videos/post_its.mp4 -O Post_its.mp4 -q
!wget https://storage.googleapis.com/generativeai-downloads/videos/user_study.mp4 -O User_study.mp4 -q

### Upload the videos

Upload all the videos using the File API. You can find more details about how to use it in the [Get Started](../quickstarts/Get_started.ipynb#scrollTo=KdUjkIQP-G_i) notebook.

This can take a couple of minutes as the videos will need to be processed and tokenized.

In [6]:
import time

def upload_video(video_file_name):
  # Upload the local file through the File API.
  video_file = client.files.upload(file=video_file_name)

  # Poll until the service has finished processing the video.
  while video_file.state == "PROCESSING":
    print('Waiting for video to be processed.')
    time.sleep(10)
    video_file = client.files.get(name=video_file.name)

  if video_file.state == "FAILED":
    raise ValueError(video_file.state)
  print(f'Video processing complete: {video_file.uri}')

  return video_file

pottery_video = upload_video('Pottery.mp4')
trailcam_video = upload_video('Trailcam.mp4')
post_its_video = upload_video('Post_its.mp4')
user_study_video = upload_video('User_study.mp4')

Waiting for video to be processed.
Video processing complete: https://generativelanguage.googleapis.com/v1beta/files/u7sxx2ablrih
Waiting for video to be processed.
Waiting for video to be processed.
Video processing complete: https://generativelanguage.googleapis.com/v1beta/files/lqf6rhsj7jiq
Waiting for video to be processed.
Video processing complete: https://generativelanguage.googleapis.com/v1beta/files/2ydqyj1y8vv6
Waiting for video to be processed.
Video processing complete: https://generativelanguage.googleapis.com/v1beta/files/jjzhwgjzvmlq
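The loop in `upload_video` keeps waiting for as long as the file stays in `PROCESSING`. As a hedged sketch, the same polling pattern can be given a timeout. `wait_for_state` and all of its parameters are hypothetical names; the `fetch_state` callable stands in for something like `lambda: client.files.get(name=video_file.name).state`, and the code assumes the state compares equal to a plain string, as it does in the cell above.

```python
import time


def wait_for_state(fetch_state, pending="PROCESSING", poll_seconds=10.0,
                   timeout_seconds=600.0, sleep=time.sleep, clock=time.monotonic):
    """Poll fetch_state() until it stops returning `pending` or the timeout expires."""
    deadline = clock() + timeout_seconds
    state = fetch_state()
    while state == pending:
        if clock() >= deadline:
            raise TimeoutError(f"Still {pending} after {timeout_seconds} seconds")
        sleep(poll_seconds)
        state = fetch_state()
    # The caller decides what to do with the final state (for example "FAILED").
    return state
```

Passing `sleep` and `clock` as arguments is a design choice that keeps the helper easy to exercise without real waiting.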


### Imports

In [7]:
import json
from PIL import Image
from IPython.display import display, Markdown, HTML

# Search within videos

First, try using the model to search within your videos and describe all the animal sightings in the trailcam video.

<video controls width="500"><source src="https://storage.googleapis.com/generativeai-downloads/videos/Jukin_Trailcam_Videounderstanding.mp4" type="video/mp4"></video>

In [9]:
prompt = "For each scene in this video, generate captions that describe the scene along with any spoken text placed in quotation marks. Place each caption into an object with the timecode of the caption in the video."  # @param ["For each scene in this video, generate captions that describe the scene along with any spoken text placed in quotation marks. Place each caption into an object with the timecode of the caption in the video.", "Organize all scenes from this video in a table, along with timecode, a short description, a list of objects visible in the scene (with representative emojis) and an estimation of the level of excitement on a scale of 1 to 10"] {"allow-input":true}

video = trailcam_video # @param ["trailcam_video", "pottery_video", "post_its_video", "user_study_video"] {"type":"raw","allow-input":true}

response = client.models.generate_content(
    model=model_name,
    contents=[
        video,
        prompt,
    ]
)

Markdown(response.text)

```json
[
  {"timecode": [0, 0], "caption": "\"Grrr\""},
  {"timecode": [0, 1], "caption": "In a rocky, leaf-strewn wooded area during the day, a gray fox walks into view from the right."},
  {"timecode": [0, 5], "caption": "A second gray fox follows the first, sniffing the ground."},
  {"timecode": [0, 11], "caption": "The first fox jumps onto a large boulder in the center."},
  {"timecode": [0, 17], "caption": "Now seen in infrared black and white, a mountain lion walks through the same wooded area, sniffing the ground intently."},
  {"timecode": [0, 29], "caption": "The mountain lion stops, shakes its head, looks around, and continues sniffing before walking out of frame to the right."},
  {"timecode": [0, 35], "caption": "In infrared footage at night, two gray foxes are foraging on the ground near a tree."},
  {"timecode": [0, 40], "caption": "One fox lies down and rolls playfully on its back."},
  {"timecode": [0, 44], "caption": "The standing fox approaches the other, and they briefly tussle before the standing one leaps away."},
  {"timecode": [0, 48], "caption": "The fox runs towards the camera, knocking it over."},
  {"timecode": [0, 50], "caption": "The view is restored. It's still night, and infrared shows three gray foxes interacting near the rocks."},
  {"timecode": [0, 55], "caption": "One fox jumps up onto a rock."},
  {"timecode": [1, 1], "caption": "Another fox runs towards the camera, again knocking it over."},
  {"timecode": [1, 5], "caption": "In infrared at night, a mountain lion stands near the rocks, looking up the slope."},
  {"timecode": [1, 8], "caption": "It turns and walks away up the rocky slope."},
  {"timecode": [1, 18], "caption": "At night, infrared shows a mountain lion cub appearing on the rocks, followed by an adult mountain lion walking past the camera."},
  {"timecode": [1, 29], "caption": "Infrared at night shows a bobcat standing near a tree, looking towards the camera."},
  {"timecode": [1, 32], "caption": "The bobcat lowers its head to sniff the ground."},
  {"timecode": [1, 41], "caption": "It lifts its head, looks around, then walks back slightly before pausing again."},
  {"timecode": [1, 51], "caption": "During the day, a large black bear stands in the wooded area, facing the camera."},
  {"timecode": [1, 53], "caption": "The bear turns and walks away to the right."},
  {"timecode": [1, 57], "caption": "In infrared, a mountain lion walks from left to right past the camera."},
  {"timecode": [2, 4], "caption": "During the day, the rear end of a small black bear cub is seen walking away from the camera before disappearing behind bushes."},
  {"timecode": [2, 12], "caption": "A black bear cub forages on the ground."},
  {"timecode": [2, 14], "caption": "A larger black bear approaches from behind it."},
  {"timecode": [2, 17], "caption": "The two bears forage close together before walking away."},
  {"timecode": [2, 23], "caption": "At night, infrared shows a gray fox standing on a ridge overlooking distant city lights."},
  {"timecode": [2, 25], "caption": "The fox lowers its head to sniff the ground."},
  {"timecode": [2, 34], "caption": "A large black bear walks past the camera from right to left."},
  {"timecode": [2, 42], "caption": "In infrared at night, a mountain lion walks along the same ridge, overlooking the city lights, and moves out of frame."},
  {"timecode": [2, 51], "caption": "Infrared at night shows a mountain lion backing up to a tree and scent marking it by spraying."},
  {"timecode": [2, 55], "caption": "The mountain lion turns and sniffs the ground near the base of the tree."},
  {"timecode": [3, 5], "caption": "During the day, an adult black bear stands in the wooded area, looking towards the camera."},
  {"timecode": [3, 11], "caption": "It turns its head, looking around, and opens and closes its mouth, making jaw-popping sounds."},
  {"timecode": [3, 19], "caption": "The bear walks towards the camera, sniffing the ground."},
  {"timecode": [3, 22], "caption": "A light-brown colored black bear cub stands in the woods."},
  {"timecode": [3, 26], "caption": "It lowers its head and begins foraging on the ground."},
  {"timecode": [3, 30], "caption": "Another, slightly darker cub approaches from the right."},
  {"timecode": [3, 32], "caption": "The first cub looks up briefly before continuing to forage."},
  {"timecode": [3, 40], "caption": "The second cub walks past the camera as a third, darker cub approaches the first."},
  {"timecode": [3, 44], "caption": "The two cubs forage near each other."},
  {"timecode": [3, 50], "caption": "One cub sits down and scratches its side with its hind leg."},
  {"timecode": [3, 57], "caption": "The third, darker cub walks past the sitting cub."},
  {"timecode": [4, 1], "caption": "The two lighter cubs follow the darker one, walking away from the camera."},
  {"timecode": [4, 22], "caption": "In infrared at night, a bobcat sits in a clearing, looking at the camera."},
  {"timecode": [4, 24], "caption": "The bobcat gets up, turns, and walks away over a fallen log."},
  {"timecode": [4, 29], "caption": "Infrared at night shows a gray fox appearing from the left."},
  {"timecode": [4, 33], "caption": "It pauses, looking directly at the camera."},
  {"timecode": [4, 36], "caption": "The fox turns slightly, then looks back at the camera before walking away over the log."},
  {"timecode": [4, 44], "caption": "In infrared at night, a gray fox appears, looking at the camera."},
  {"timecode": [4, 47], "caption": "It suddenly turns and runs away quickly, followed briefly by the rear end of another fox."},
  {"timecode": [4, 57], "caption": "At night, infrared shows a mountain lion sniffing the ground near the base of a tree."},
  {"timecode": [5, 4], "caption": "The mountain lion looks up briefly, then turns and walks away to the right."}
]
```

The prompt used is quite generic, but you can get even better results if you customize it to your needs (for example, by asking specifically for foxes).

The [live demo on AI Studio](https://aistudio.google.com/starter-apps/video) shows how you can postprocess this output to jump directly to a specific part of the video by clicking on the timecodes. If you are interested, you can check the [code of that demo on GitHub](https://github.com/google-gemini/starter-applets/tree/main/video).
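As a rough illustration of that kind of postprocessing, the sketch below parses a fenced JSON reply shaped like the captions output above and converts each `[minutes, seconds]` timecode to seconds. The helper names are hypothetical, and the model is not guaranteed to return this exact shape, so the parsing rests on that assumption.

```python
import json
import re

FENCE = "`" * 3  # Markdown code fence, built here to keep this example readable.


def parse_json_reply(text):
    """Strip an optional Markdown code fence and decode the JSON payload."""
    match = re.search(r"`{3}(?:json)?\s*(.*?)`{3}", text, flags=re.DOTALL)
    payload = match.group(1) if match else text
    return json.loads(payload)


def find_captions(captions, keyword):
    """Return (seconds, caption) pairs whose caption mentions keyword."""
    hits = []
    for item in captions:
        minutes, seconds = item["timecode"]
        if keyword.lower() in item["caption"].lower():
            hits.append((minutes * 60 + seconds, item["caption"]))
    return hits


# Three entries copied from the output above; in the notebook you would
# pass response.text instead.
sample = FENCE + '''json
[
  {"timecode": [0, 48], "caption": "The fox runs towards the camera, knocking it over."},
  {"timecode": [1, 29], "caption": "Infrared at night shows a bobcat standing near a tree, looking towards the camera."},
  {"timecode": [2, 23], "caption": "At night, infrared shows a gray fox standing on a ridge overlooking distant city lights."}
]
''' + FENCE

for seconds, caption in find_captions(parse_json_reply(sample), "fox"):
    print(f"{seconds:>4}s  {caption}")
```

The offsets in seconds could then drive a video player's seek position, which is the general idea behind clickable timecodes.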

# Extract and organize text

Gemini can also read what's in the video and extract it in an organized way. You can even use Gemini's reasoning capabilities to generate new ideas for you.

<video controls width="400"><source src="https://storage.googleapis.com/generativeai-downloads/videos/post_its.mp4" type="video/mp4"></video>

In [12]:
prompt = "Transcribe the sticky notes, organize them and put it in a table. Can you come up with a few more ideas?" # @param ["Transcribe the sticky notes, organize them and put it in a table. Can you come up with a few more ideas?", "Which of those names would fit an AI product that can resolve complex questions using its thinking abilities?"] {"allow-input":true}

video = post_its_video # @param ["trailcam_video", "pottery_video", "post_its_video", "user_study_video"] {"type":"raw","allow-input":true}

response = client.models.generate_content(
    model=model_name,
    contents=[
        video,
        prompt,
    ]
)

Markdown(response.text)

Okay, here are the transcribed project names from the sticky notes in the video, organized into a table, followed by a few additional ideas based on the existing themes.

**Transcribed Project Names from Sticky Notes**

| Project Name Ideas         |
| :------------------------- |
| Brainstorm: Project Name (Title on board) |
| Convergence              |
| Supernova Echo           |
| Chaos Field              |
| Prometheus Rising        |
| Lunar Eclipse            |
| Astral Forge             |
| Draco                    |
| Lynx                     |
| Chimera Dream            |
| Galactic Core            |
| Canis Major              |
| Comet's Tail             |
| Delphinus                |
| Perseus Shield           |
| Euler's Path             |
| Zephyr                   |
| Titan                    |
| Stellar Nexus            |
| Centaurus                |
| Serpens                  |
| Equilibrium              |
| Chaos Theory             |
| Echo                     |
| Odin                     |
| Leo Minor                |
| Andromeda's Reach        |
| Orion's Belt             |
| Symmetry                 |
| Golden Ratio             |
| Athena's Eye             |
| Phoenix                  |
| Aether                   |
|                           |
| Bayes Theorem            |
| Lyra                     |
| Fractal                  |
| Infinity Loop            |
| Medusa                   |
| Hera                     |
| Athena                   |
| Cerberus                 |
| Riemann's Hypothesis     |
| Chaos Theory (appears again) |
| Taylor Series            |
| Stokes Theorem           |
| Orion's Sword            |
| Vector                   |
| Sagitta                  |
| Pandora's Box            |
|                           |
| Celestial Drift          |
| *Possibly others obscured/partially visible* |

**Additional Brainstormed Ideas (Based on Themes)**

Here are a few more ideas, following the patterns of astronomy, mythology, math/physics concepts, and abstract terms seen on the board:

1.  **Event Horizon:** (Astronomy/Physics) - The boundary around a black hole beyond which nothing can escape.
2.  **Nebula:** (Astronomy) - An interstellar cloud of dust, hydrogen, helium and other ionized gases.
3.  **Möbius Strip:** (Mathematics/Topology) - A surface with only one side and only one boundary.
4.  **Quantum Leap:** (Physics/Figurative) - An abrupt change or significant advance.
5.  **Icarus Flight:** (Mythology) - Referencing ambition and potential risk.
6.  **Cygnus X-1:** (Astronomy) - A well-known galactic X-ray source, believed to be a black hole.
7.  **Oracle Engine:** (Mythology/Technology) - Suggests prediction or deep insight.
8.  **Lagrange Point:** (Celestial Mechanics) - Points in space where gravitational forces produce enhanced regions of equilibrium.

# Structure information

Gemini 2.0 can not only read text but also reason about real-world objects and structure the information it finds, as in this video of a ceramics display with handwritten prices and notes.

<video controls width="500"><source src="https://storage.googleapis.com/generativeai-downloads/videos/Pottery.mp4" type="video/mp4"></video>

In [13]:
prompt = "Give me a table of my items and notes" # @param ["Give me a table of my items and notes", "Help me come up with a selling pitch for my potteries"] {"allow-input":true}

video = pottery_video # @param ["trailcam_video", "pottery_video", "post_its_video", "user_study_video"] {"type":"raw","allow-input":true}

response = client.models.generate_content(
    model=model_name,
    contents=[
        video,
        prompt,
    ],
    config = types.GenerateContentConfig(
        system_instruction="Don't forget to escape the dollar signs",
    )
)

Markdown(response.text)

Okay, here are the items and their associated notes from the image, presented in a table.

| Item        | Notes/Details                                  | Price   |
| :---------- | :--------------------------------------------- | :------ |
| Tumblers    | #5 Artichoke double dip, 4"h x 3"d -ish         | \$20    |
| Small bowls | 3.5"h x 6.5"d                                  | \$35    |
| Med bowls   | 4"h x 7"d                                      | \$40    |
| Glaze Note  | #6 gemini double dip, SLOW COOL (Test tile shown) | N/A     |

As you can see, Gemini is able to grasp which item each note corresponds to, including the last one.
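If you want to reuse such a table in code, a small parser can turn the Markdown into Python dictionaries. This is only a sketch under the assumption that the reply contains a single pipe-delimited table like the one above; `parse_markdown_table` and `price_to_float` are hypothetical helpers, and cells containing escaped pipes are not handled.

```python
def parse_markdown_table(text):
    """Turn a pipe-delimited Markdown table into a list of dicts keyed by header."""
    rows = []
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue
        cells = [cell.strip() for cell in line.strip("|").split("|")]
        # Skip the alignment row, e.g. | :--- | :--- |
        if all(cell and set(cell) <= set(":-") for cell in cells):
            continue
        rows.append(cells)
    if not rows:
        return []
    header, *body = rows
    return [dict(zip(header, row)) for row in body]


def price_to_float(price):
    """Convert a price such as '\\$20' (escaped for Markdown) into 20.0."""
    return float(price.replace("\\$", "").replace("$", ""))


# Two rows copied from the output above; in the notebook you would pass response.text.
table = r"""
| Item        | Notes/Details                           | Price |
| :---------- | :-------------------------------------- | :---- |
| Tumblers    | #5 Artichoke double dip, 4"h x 3"d -ish | \$20  |
| Small bowls | 3.5"h x 6.5"d                           | \$35  |
"""

for row in parse_markdown_table(table):
    print(row["Item"], price_to_float(row["Price"]))
```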

# Analyze screen recordings for key moments

You can also use the model to analyze screen recordings. Let's say you're doing user studies on how people use your product, so you end up with lots of screen recordings, like this one, that you have to manually comb through.
With just one prompt, the model can describe all the actions in your video.

<video controls width="400"><source src="https://storage.googleapis.com/generativeai-downloads/videos/user_study.mp4" type="video/mp4"></video>

In [14]:
prompt = "Generate a paragraph that summarizes this video. Keep it to 3 to 5 sentences with corresponding timecodes." # @param ["Generate a paragraph that summarizes this video. Keep it to 3 to 5 sentences with corresponding timecodes.", "Choose 5 key shots from this video and put them in a table with the timecode, text description of 10 words or less, and a list of objects visible in the scene (with representative emojis).", "Generate bullet points for the video. Place each bullet point into an object with the timecode of the bullet point in the video."] {"allow-input":true}

video = user_study_video # @param ["trailcam_video", "pottery_video", "post_its_video", "user_study_video"] {"type":"raw","allow-input":true}

response = client.models.generate_content(
    model=model_name,
    contents=[
        video,
        prompt,
    ]
)

Markdown(response.text)

Here is a summary of the video:

(00:00-00:12) The video opens on a mobile app interface called "My Garden App," displaying a list of plants like Rose, Fern, Cactus, and Monstera, each with a description, price, a "Like" button, and an "Add to Cart" button. The user begins by clicking the "Like" button for the Rose Plant and then the Fern.
(00:13-00:18) The user then adds the Fern to the cart by clicking the "Add to Cart" button, which temporarily changes to "Added!". They proceed to like the Cactus and then add it to the cart as well.
(00:22-00:35) After scrolling down, the user likes the Hibiscus plant and adds it to the cart. They navigate using the bottom tabs, first viewing the "Cart" which lists the Fern, Cactus, and Hibiscus with a total price, and then the "Profile" page showing counts for liked plants and cart items.
(00:36-00:48) Returning to the "Home" screen, the user unlikes the Hibiscus, scrolls down, likes the Snake Plant, and adds the Orchid to the cart.
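The `(MM:SS-MM:SS)` prefixes in a summary like this one can be turned into numeric ranges with a regular expression. The sketch below assumes every segment starts its own line with that exact prefix, which the model will not necessarily do for other prompts; `parse_segments` is a hypothetical helper name.

```python
import re

# Matches a leading "(MM:SS-MM:SS)" range followed by the segment text.
RANGE_PATTERN = re.compile(r"\((\d{1,2}):(\d{2})-(\d{1,2}):(\d{2})\)\s*(.*)")


def parse_segments(summary):
    """Extract (start_seconds, end_seconds, text) from lines like '(00:13-00:18) ...'."""
    segments = []
    for line in summary.splitlines():
        match = RANGE_PATTERN.match(line.strip())
        if match:
            m1, s1, m2, s2, text = match.groups()
            segments.append((int(m1) * 60 + int(s1), int(m2) * 60 + int(s2), text))
    return segments


# Abridged lines from the output above; in the notebook you would pass response.text.
summary = """Here is a summary of the video:

(00:00-00:12) The video opens on a mobile app interface called "My Garden App".
(00:13-00:18) The user then adds the Fern to the cart.
"""

for start, end, text in parse_segments(summary):
    print(f"{start:>3}s to {end:>3}s: {text}")
```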

# Analyze YouTube videos

On top of using your own videos, you can also ask Gemini to get a video from YouTube and analyze it. Here's an example using the keynote from Google I/O 2023. Guess what the main theme was?


In [15]:
response = client.models.generate_content(
    model=model_name,
    contents=types.Content(
        parts=[
            types.Part(text="Find all the instances where Sundar says \"AI\". Provide timestamps and broader context for each instance."),
            types.Part(
                file_data=types.FileData(file_uri='https://www.youtube.com/watch?v=ixRanV-rdAQ')
            )
        ]
    )
)

Markdown(response.text)

Okay, here are all the instances where Sundar Pichai says "AI" during his presentation, along with timestamps and context:

1.  **0:29** - "**AI** is having a very busy year."
    *   **Context:** Sundar is welcoming the audience and setting the stage for the keynote, acknowledging the significant recent activity and public interest in Artificial Intelligence.
2.  **0:38** - "Seven years into our journey as an **AI**-first company..."
    *   **Context:** He's reflecting on Google's long-term strategic commitment to integrating AI across the company, positioning them at an "exciting inflection point."
3.  **0:45** - "...make **AI** even more helpful for people, for businesses, for communities, for everyone."
    *   **Context:** Sundar is describing the opportunity presented by the current advancements in AI to make it broadly beneficial and useful.
4.  **0:54** - "We've been applying **AI** to make our products radically more helpful for a while."
    *   **Context:** He's referencing Google's history of using AI to improve its existing product offerings, leading into the discussion of generative AI.
5.  **1:40** - "...more advanced writing features powered by **AI**."
    *   **Context:** Sundar is discussing the evolution of Gmail features, explaining that Smart Compose led to even more sophisticated AI-driven writing assistance within Google Workspace.
6.  **3:02** - "...Since the early days of Street View, **AI** has stitched together billions of panoramic images..."
    *   **Context:** He is introducing Google Maps improvements and explaining how AI was fundamental in creating the Street View experience from the beginning.
7.  **3:14** - "...Immersive View, which uses **AI** to create a high-fidelity representation of a place..."
    *   **Context:** Sundar is describing the technology behind the Immersive View feature in Google Maps, highlighting AI's role in generating detailed 3D models.
8.  **5:08** - "Another product made better by **AI** is Google Photos."
    *   **Context:** He transitions to discussing Google Photos, explicitly stating it's another example of a product enhanced by AI.
9.  **5:15** - "...It was one of our first **AI**-native products."
    *   **Context:** Sundar is emphasizing the deep integration and foundation of AI within Google Photos since its inception in 2015.
10. **5:38** - "**AI** advancements give us more powerful ways to do this."
    *   **Context:** He's explaining that progress in AI enables Google to create more powerful photo editing tools within Google Photos.
11. **5:47** - "...uses **AI**-powered computational photography to remove unwanted distractions."
    *   **Context:** Sundar is describing how the Magic Eraser feature in Google Photos works, attributing its capabilities to AI and computational photography.
12. **5:58** - "...semantic understanding and generative **AI**, you can do much more..."
    *   **Context:** He is introducing the upcoming Magic Editor feature, explaining that it leverages a combination of different AI techniques, including generative AI.
13. **7:40** - "...examples of how **AI** can help you in moments that matter."
    *   **Context:** Sundar is summarizing the previously shown examples (Gmail, Maps, Photos) to illustrate the practical helpfulness of AI in everyday situations.
14. **7:47** - "...deliver the full potential of **AI** across the products you know and love."
    *   **Context:** He is expressing Google's broader ambition to continue integrating and leveraging AI capabilities throughout its product suite.
15. **8:23** - "...making **AI** helpful for everyone is the most profound way we will advance our mission."
    *   **Context:** Sundar is connecting Google's focus on AI directly to its core mission, positioning helpful AI as the key driver for future progress.
16. **8:53** - "...building and deploying **AI** responsibly so that everyone can benefit equally."
    *   **Context:** He is outlining the fourth key aspect of Google's approach to AI, emphasizing the commitment to responsible and ethical development and deployment.
17. **9:02** - "...make **AI** helpful for everyone relies on continuously advancing our foundation models."
    *   **Context:** Sundar is linking the goal of creating helpful AI applications to the necessity of improving the underlying large-scale AI models (foundation models).
18. **11:27** - "It uses **AI** to better detect malicious scripts..."
    *   **Context:** He is describing Sec-PaLM, a fine-tuned version of their PaLM model, explaining its specific application of AI in identifying security threats.
19. **12:15** - "You can imagine an **AI** collaborator that helps radiologists interpret images..."
    *   **Context:** Sundar is illustrating a potential future use case for Med-PaLM 2, showcasing how AI could assist medical professionals.
20. **12:46** - "...journey to bring **AI** in responsible ways to billions of people."
    *   **Context:** He is framing the development of PaLM 2 as part of Google's ongoing, long-term effort to make AI accessible and beneficial globally, while emphasizing responsibility.
21. **12:57** - "...defining **AI** breakthroughs over the last decade..."
    *   **Context:** Sundar is reflecting on the history of AI innovation at Google, crediting the Brain and DeepMind teams for major advancements.
22. **14:09** - "...deeply investing in **AI** responsibility."
    *   **Context:** He is reiterating Google's commitment to safety and ethical considerations as they develop increasingly powerful AI models like Gemini.
23. **15:04** - "...every one of our **AI**-generated images has that metadata."
    *   **Context:** Sundar is stating Google's policy to include metadata in images created by their AI, ensuring transparency about the origin of the content.
24. **15:11** - "...our responsible approach to **AI** later."
    *   **Context:** He is mentioning that James Manyika will delve deeper into Google's responsible AI practices later in the keynote.
25. **15:29** - "...our experiment for conversational **AI**."
    *   **Context:** Sundar is defining Google Bard and its purpose as an experimental platform for exploring and developing conversational AI interfaces.

Once again, the [live demo on AI Studio](https://aistudio.google.com/starter-apps/video) shows an example of how to postprocess this output. Check the [code of that demo](https://github.com/google-gemini/starter-applets/tree/main/video) for more details.
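For a YouTube source, one simple form of postprocessing is to turn each bold timestamp into a link that opens the video at that moment. The sketch below is not the demo's code. It assumes the reply formats timestamps as `**M:SS**`, as in the output above, and that a `t=<seconds>s` query parameter on the watch URL sets the start offset. `to_seconds`, `link_at`, and `extract_timestamps` are hypothetical helpers.

```python
import re
from urllib.parse import parse_qs, urlencode, urlparse, urlunparse


def to_seconds(timestamp):
    """Convert 'M:SS' or 'H:MM:SS' into a number of seconds."""
    seconds = 0
    for part in timestamp.split(":"):
        seconds = seconds * 60 + int(part)
    return seconds


def link_at(video_url, timestamp):
    """Return video_url with a `t` query parameter pointing at timestamp."""
    parts = urlparse(video_url)
    query = {key: values[0] for key, values in parse_qs(parts.query).items()}
    query["t"] = f"{to_seconds(timestamp)}s"
    return urlunparse(parts._replace(query=urlencode(query)))


def extract_timestamps(text):
    """Find bold timestamps such as **5:08** in a Markdown reply."""
    return re.findall(r"\*\*(\d+(?::\d{2})+)\*\*", text)


# One line copied from the output above; in the notebook you would pass response.text.
reply = '8.  **5:08** - "Another product made better by **AI** is Google Photos."'
video_url = "https://www.youtube.com/watch?v=ixRanV-rdAQ"

for stamp in extract_timestamps(reply):
    print(stamp, link_at(video_url, stamp))
```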

# Next Steps

Try with your own videos using the [AI Studio's live demo](https://aistudio.google.com/starter-apps/video) or play with the examples from this notebook (in case you haven't noticed, there are other prompts you can try in the dropdowns).

For more examples of the Gemini 2.0 capabilities, check the [Gemini 2.0 folder of the cookbook](https://github.com/google-gemini/cookbook/tree/main/gemini-2/). You'll learn how to use the [Live API](../quickstarts/Get_started_LiveAPI.ipynb), juggle with [multiple tools](../quickstarts/Get_started_LiveAPI_tools.ipynb) or use Gemini 2.0 [spatial understanding](../quickstarts/Spatial_understanding.ipynb) abilities.

The [examples](https://github.com/google-gemini/cookbook/tree/main/examples/) folder from the cookbook is also full of code samples illustrating creative ways to use Gemini's multimodal capabilities and long context.