In [1]:
import os
GOOGLE_API_KEY=os.getenv('GOOGLE_GENERATIVE_AI_API_KEY')

### Initialize SDK client

With the new SDK, you only need to initialize a client with your API key (or OAuth if using [Vertex AI](https://cloud.google.com/vertex-ai)). The model is now set in each call.

In [2]:
from google import genai
from google.genai import types

client = genai.Client(api_key=GOOGLE_API_KEY)

In [4]:
MODEL_ID = "gemini-2.5-flash" # @param ["gemini-2.5-flash", "gemini-2.5-pro","gemini-2.0-flash","gemini-2.5-flash-lite-preview-06-17"] {"allow-input":true, isTemplate: true}

In [5]:
import time

def upload_video(video_file_name):
  # Upload the file, then poll until the File API has finished processing it.
  video_file = client.files.upload(file=video_file_name)

  while video_file.state == "PROCESSING":
    print('Waiting for video to be processed.')
    time.sleep(10)
    video_file = client.files.get(name=video_file.name)

  if video_file.state == "FAILED":
    raise ValueError(video_file.state)
  print(f'Video processing complete: {video_file.uri}')

  return video_file

pottery_video = upload_video('Pottery.mp4')
trailcam_video = upload_video('Trailcam.mp4')
post_its_video = upload_video('Post_its.mp4')
user_study_video = upload_video('User_study.mp4')

Waiting for video to be processed.
Video processing complete: https://generativelanguage.googleapis.com/v1beta/files/rpk5mjjl6alr
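
The polling loop in `upload_video` waits for as long as the file stays in the `PROCESSING` state. As a minimal sketch, the same loop can be bounded with a timeout. The helper name `wait_until_processed`, its parameters, and the `"ACTIVE"` value used in the example are assumptions for illustration, not part of the SDK.

```python
import time
from typing import Callable


def wait_until_processed(
    get_state: Callable[[], str],
    timeout_s: float = 300.0,
    poll_s: float = 10.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> str:
    """Poll get_state() until it stops returning "PROCESSING" or the timeout expires.

    Hypothetical helper: get_state is any zero-argument callable that returns
    the current processing state.
    """
    deadline = clock() + timeout_s
    state = get_state()
    while state == "PROCESSING":
        if clock() >= deadline:
            raise TimeoutError(f"still processing after {timeout_s} seconds")
        sleep(poll_s)
        state = get_state()
    if state == "FAILED":
        raise ValueError(state)
    return state


# Example with a fake state source instead of a real upload.
fake_states = iter(["PROCESSING", "PROCESSING", "ACTIVE"])
final_state = wait_until_processed(lambda: next(fake_states), poll_s=0, sleep=lambda _: None)
print(final_state)
```

With the client from this notebook, `get_state` could be something like `lambda: client.files.get(name=video_file.name).state`, assuming the state compares equal to these strings as it does in the loop above.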



# Customizing video preprocessing

The Gemini API lets you define preprocessing steps that help the model understand and extract information from your videos.

You can use clipping intervals (time offsets that focus the model on specific parts of a video) and a custom FPS (the number of frames per second sampled for analysis).

For more details about these features, see the [Customizing video preprocessing](https://ai.google.dev/gemini-api/docs/video-understanding#customize-video-preprocessing) section of the Gemini API documentation.

## Analyze specific parts of videos using clipping intervals

Sometimes you are only interested in a specific part of a video. You can define time offsets in your request to tell the model which interval of the video to focus on.

**Note:** The time offsets you provide in `video_metadata` must be expressed in seconds.

This example uses a video of the [Google I/O 2025 keynote](https://www.youtube.com/watch?v=XEzRZ35urlk) and asks the model to consider only the interval between 20min50s and 26min10s.
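
Since the offsets are given in seconds, a small conversion helper can avoid arithmetic slips. This is a minimal sketch; `to_offset` is a hypothetical helper, not part of the SDK, and it produces the same `'<seconds>s'` strings used in the request below.

```python
def to_offset(minutes: int = 0, seconds: int = 0, hours: int = 0) -> str:
    """Convert hours/minutes/seconds into an offset string such as '1250s'."""
    total = hours * 3600 + minutes * 60 + seconds
    if total < 0:
        raise ValueError("offset must not be negative")
    return f"{total}s"


start_offset = to_offset(minutes=20, seconds=50)  # 20min50s
end_offset = to_offset(minutes=26, seconds=10)    # 26min10s
print(start_offset, end_offset)
```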

In [6]:
from IPython.display import Markdown

response = client.models.generate_content(
    model=MODEL_ID,
    contents=types.Content(
        parts=[
            types.Part(
                file_data=types.FileData(file_uri='https://www.youtube.com/watch?v=XEzRZ35urlk'),
                video_metadata=types.VideoMetadata(
                    start_offset='1250s',
                    end_offset='1570s',
                    fps=10
                )
            ),
            types.Part(text='Please summarize the video in 3 sentences.')
        ]
    )
)

Markdown(response.text)


You can also use clipping intervals with videos uploaded to the File API and with videos passed inline in your prompts (keeping in mind that inline data cannot exceed 20MB in size).
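
As a rough sketch, a size check can decide whether a local video is small enough to send inline or should go through the File API instead. The names `MAX_INLINE_BYTES`, `fits_inline`, and `choose_transport` are hypothetical, and reading "20MB" as 20 * 1024 * 1024 bytes is an assumption.

```python
import os

# Assumption: "20MB" is interpreted as 20 * 1024 * 1024 bytes.
MAX_INLINE_BYTES = 20 * 1024 * 1024


def fits_inline(size_in_bytes: int, limit: int = MAX_INLINE_BYTES) -> bool:
    """Return True when a payload of this size stays within the inline limit."""
    return 0 <= size_in_bytes <= limit


def choose_transport(path: str) -> str:
    """Return 'inline' for small files and 'file_api' for larger ones."""
    return "inline" if fits_inline(os.path.getsize(path)) else "file_api"


print(fits_inline(5 * 1024 * 1024))
print(fits_inline(MAX_INLINE_BYTES + 1))
```

For a file that fits, the video bytes would take the place of `file_data` in the request, for example `inline_data=types.Blob(data=video_bytes, mime_type='video/mp4')`, with the same `video_metadata` offsets. Leaving some headroom below the limit for the prompt text is probably sensible.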

In [19]:
prompt = "Summarize this video in few short bullets"  # @param ["For each scene in this video, generate captions that describe the scene along with any spoken text placed in quotation marks. Place each caption into an object with the timecode of the caption in the video.", "Organize all scenes from this video in a table, along with timecode, a short description, a list of objects visible in the scene (with representative emojis) and an estimation of the level of excitement on a scale of 1 to 10"] {"allow-input":true}

video = trailcam_video # @param ["trailcam_video", "pottery_video", "post_its_video", "user_study_video"] {"type":"raw","allow-input":true}

response = client.models.generate_content(
    model=MODEL_ID,
    contents=types.Content(
        parts=[
            types.Part(
                file_data=types.FileData(
                    file_uri=video.uri,
                    mime_type=video.mime_type),
                video_metadata=types.VideoMetadata(
                    start_offset='60s',
                    end_offset='120s'
                )
            ),
            types.Part(text=prompt)
        ]
    )
)

Markdown(response.text)

Here are the key events from the trail camera footage:

* **0:00-0:16:** Two grey foxes move around a rocky area.
* **0:17-0:34:** A mountain lion explores a wooded area.
* **0:35-0:50:** Two foxes play/fight, one gets tossed into the air.
* **0:51-1:16:** Two mountain lions, likely a mother and cub, move through a rocky area at night.
* **1:17-1:28:** Two mountain lions walk toward the camera.
* **1:29-1:50:** A bobcat explores a wooded area at night.
* **1:51-2:22:** Two young bears walk toward and investigate the trail camera.
* **2:23-2:51:** A fox and then a bear pass by a scenic overlook with a city lit up in the distance.
* **2:52-3:04:** A mountain lion approaches and scratches at something in the ground.
* **3:05-3:21:** A bear walks toward the trail camera and starts panting.
* **3:22-4:19:** A bear followed by a bear cub are seen walking in the woods, followed by a mountain lion approaching the camera. 
* **4:22-4:56:** A bobcat walks along a log and looks at the trail camera.
* **4:57-5:09:** A mountain lion explores the area, smelling the ground.
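
Timecoded bullets like the ones above can be turned into structured data for further processing. The following is a minimal sketch that assumes the model keeps this exact `* **M:SS-M:SS:** description` layout, which is not guaranteed; `parse_timecoded_bullets` is a hypothetical helper.

```python
import re

# Matches lines such as "* **0:51-1:16:** Some description."
BULLET = re.compile(r"^\*\s+\*\*(\d+):(\d{2})-(\d+):(\d{2}):\*\*\s*(.+?)\s*$")


def parse_timecoded_bullets(text: str) -> list[tuple[int, int, str]]:
    """Return (start_seconds, end_seconds, description) for each matching bullet."""
    events = []
    for line in text.splitlines():
        match = BULLET.match(line.strip())
        if match is None:
            continue  # Skip headings, blank lines, and bullets in other formats.
        start_min, start_sec, end_min, end_sec, description = match.groups()
        start = int(start_min) * 60 + int(start_sec)
        end = int(end_min) * 60 + int(end_sec)
        events.append((start, end, description))
    return events


sample = (
    "Here are the key events from the trail camera footage:\n"
    "\n"
    "* **0:00-0:16:** Two grey foxes move around a rocky area.\n"
    "* **0:51-1:16:** Two mountain lions walk toward the camera.\n"
)
events = parse_timecoded_bullets(sample)
for start, end, description in events:
    print(start, end, description)
```

In the notebook, `parse_timecoded_bullets(response.text)` would apply the same parsing to the model's answer.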

## Customize the number of video frames per second (FPS) analyzed

By default, the Gemini API samples 1 (one) frame per second (FPS) to analyze your videos. This may be more than you need for videos with little activity (like a lecture), while fast-changing visuals may need a higher FPS to preserve detail.

This example looks at a specific interval of a NASCAR pit stop and samples it at a higher rate (in this case, 24 FPS).
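
To get a feel for how the FPS setting scales the amount of data analyzed, the frame count for a clip can be estimated as duration times FPS. This is back-of-the-envelope arithmetic only; `sampled_frames` is a hypothetical helper, and the API's actual sampling may differ.

```python
def sampled_frames(start_s: float, end_s: float, fps: float = 1.0) -> int:
    """Estimate how many frames are sampled from a clip at the given FPS."""
    if end_s <= start_s:
        raise ValueError("end_s must be greater than start_s")
    if fps <= 0:
        raise ValueError("fps must be positive")
    return round((end_s - start_s) * fps)


default_frames = sampled_frames(15, 35)           # 20-second clip at the default 1 FPS
pit_stop_frames = sampled_frames(15, 35, fps=24)  # same clip at 24 FPS
print(default_frames, pit_stop_frames)
```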

In [20]:
response = client.models.generate_content(
    model=MODEL_ID,
    contents=types.Content(
        parts=[
            types.Part(
                file_data=types.FileData(file_uri='https://www.youtube.com/watch?v=McN0-DpyHzE'),
                video_metadata=types.VideoMetadata(
                    start_offset='15s',
                    end_offset='35s',
                    fps=24
                )
            ),
            types.Part(text='How many tires were changed? Front tires or rear tires?')
        ]
    )
)

Markdown(response.text)

According to the video, only the tires on the left side of the car were replaced. 

Once again, the [live demo on AI Studio](https://aistudio.google.com/starter-apps/video) shows an example of how to postprocess this output. Check the [code of that demo](https://github.com/google-gemini/starter-applets/tree/main/video) for more details.

# Next Steps

Try it with your own videos using the [AI Studio's live demo](https://aistudio.google.com/starter-apps/video), or play with the examples from this notebook (in case you haven't noticed, there are other prompts you can try in the dropdowns).

For more examples of Gemini's capabilities, check the other guides in the [Cookbook](https://github.com/google-gemini/cookbook/). You'll learn how to use the [Live API](../quickstarts/Get_started_LiveAPI.ipynb), juggle [multiple tools](../quickstarts/Get_started_LiveAPI_tools.ipynb), or use Gemini 2.0 [spatial understanding](../quickstarts/Spatial_understanding.ipynb) abilities.

The [examples](https://github.com/google-gemini/cookbook/tree/main/examples/) folder of the Cookbook is also full of code samples illustrating creative ways to use Gemini's multimodal and long-context capabilities.