##### Copyright 2024 Google LLC.

In [1]:
#@title Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# Gemini API: Prompting with Video

<table align="left">
  <td>
    <a target="_blank" href="https://colab.research.google.com/github/google-gemini/cookbook/blob/main/quickstarts/Video.ipynb"><img src="https://www.tensorflow.org/images/colab_logo_32px.png" />Run in Google Colab</a>
  </td>
</table>

This notebook provides a quick example of how to prompt Gemini 1.5 Pro using a video file. In this case, you'll use a short clip of [Sherlock Jr.](https://en.wikipedia.org/wiki/Sherlock_Jr.)

In [2]:
!pip install -U -q google-generativeai

In [1]:
import google.generativeai as genai

### Authentication Overview

**Important:** The File API uses API keys for authentication and access. Uploaded files are associated with the API key's cloud project. Unlike other Gemini APIs that use API keys, your API key also grants access to data you've uploaded to the File API, so take extra care to keep your API key secure. For best practices on securing API keys, refer to Google's [documentation](https://support.google.com/googleapi/answer/6310037).

### Set up your API key

To run the following cell, your API key must be stored in a Colab Secret named `GOOGLE_API_KEY`. If you don't already have an API key, or you're not sure how to create a Colab Secret, see [Authentication](https://github.com/google-gemini/cookbook/blob/main/quickstarts/Authentication.ipynb) for an example.

In [2]:
from google.colab import userdata
import google.generativeai as genai

# Read the key from the Colab Secret named GOOGLE_API_KEY.
GOOGLE_API_KEY = userdata.get('GOOGLE_API_KEY')
genai.configure(api_key=GOOGLE_API_KEY)
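
Outside Colab, one option is to read the key from an environment variable and fail early when it is missing, rather than passing `None` to `genai.configure`. The following is a minimal sketch under that assumption; `load_api_key` is a hypothetical helper, not part of the SDK.

```python
import os


def load_api_key(env=os.environ, name="GOOGLE_API_KEY"):
    """Return the API key stored under `name`, or raise if it is absent.

    `load_api_key` is a hypothetical helper written for this example.
    """
    key = env.get(name)
    if not key:
        raise RuntimeError(f"{name} is not set; export it before running this cell.")
    return key
```

The result could then be passed to `genai.configure(api_key=...)` in place of the value read above.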

## Extract frames

The Gemini API currently does not support video files directly. Instead, you can provide a series of timestamps and image files.

We will extract one frame per second from a 10-minute clip of the film [Sherlock Jr.](https://en.wikipedia.org/wiki/Sherlock_Jr.).

Note: You can also [upload your own files](https://github.com/google-gemini/cookbook/tree/main/examples/Upload_files.ipynb) to use.

In [3]:
video_file_name = "https://storage.googleapis.com/generativeai-downloads/data/SherlockJr._10min.mp4"

Use OpenCV to extract image frames from the video at 1 frame per second.

In [5]:
import cv2
import os
import shutil

# Create or cleanup existing extracted image frames directory.
FRAME_EXTRACTION_DIRECTORY = "/content/frames"
FRAME_PREFIX = "_frame"
def create_frame_output_dir(output_dir):
  if not os.path.exists(output_dir):
    os.makedirs(output_dir)
  else:
    shutil.rmtree(output_dir)
    os.makedirs(output_dir)

def extract_frame_from_video(video_file_path):
  print(f"Extracting {video_file_path} at 1 frame per second. This might take a bit...")
  create_frame_output_dir(FRAME_EXTRACTION_DIRECTORY)
  vidcap = cv2.VideoCapture(video_file_path)
  fps = vidcap.get(cv2.CAP_PROP_FPS)  # Frames per second reported by the video
  output_file_prefix = os.path.basename(video_file_path).replace('.', '_')
  frame_count = 0
  count = 0
  while vidcap.isOpened():
      success, frame = vidcap.read()
      if not success: # End of video
          break
      if int(count / fps) == frame_count: # Extract a frame every second
          # Use names that do not shadow the built-in min().
          minutes = frame_count // 60
          seconds = frame_count % 60
          time_string = f"{minutes:02d}:{seconds:02d}"
          image_name = f"{output_file_prefix}{FRAME_PREFIX}{time_string}.jpg"
          output_filename = os.path.join(FRAME_EXTRACTION_DIRECTORY, image_name)
          cv2.imwrite(output_filename, frame)
          frame_count += 1
      count += 1
  vidcap.release()  # Release the capture object
  print(f"Completed video frame extraction!\n\nExtracted: {frame_count} frames")

extract_frame_from_video(video_file_name)

Extracting https://storage.googleapis.com/generativeai-downloads/data/SherlockJr._10min.mp4 at 1 frame per second. This might take a bit...
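
To see which frames the `int(count / fps) == frame_count` test keeps, the selection rule can be replayed without OpenCV. This is a small sketch of that rule only; `sampled_frame_indices` is a hypothetical helper and the frame rates are illustrative values, not properties of the clip.

```python
def sampled_frame_indices(total_frames: int, fps: float) -> list[int]:
    """Return the indices of the frames the one-per-second rule would keep."""
    kept = []
    next_second = 0  # Plays the role of `frame_count` in the extraction loop.
    for index in range(total_frames):
        # Keep the first frame whose time offset reaches the next whole second.
        if int(index / fps) == next_second:
            kept.append(index)
            next_second += 1
    return kept


print(sampled_frame_indices(total_frames=72, fps=24.0))  # → [0, 24, 48]
```

With a non-integer rate such as 29.97, the kept frame is the first one at or past each whole second, so the spacing between kept indices is not always identical.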


## Upload frames using the File API

Once the frames are extracted, we are ready to upload them with the File API.

The File API accepts files under 2GB in size and can store up to 20GB of files per project. Files last for 2 days and cannot be downloaded from the API.

We will just upload 10 frames so this example runs quickly. You can modify the code below to upload the entire video.
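
Individual JPEG frames are far below the per-file limit, but if you adapt this notebook to larger files, a local size check before uploading may save a failed request. The sketch below assumes "2GB" means 2 * 1024**3 bytes, which may not match the exact cutoff the service enforces; `files_over_limit` is a hypothetical helper.

```python
import os

# Assumption: "2GB" is read here as 2 * 1024**3 bytes.
MAX_FILE_BYTES = 2 * 1024**3


def files_over_limit(paths, limit_bytes=MAX_FILE_BYTES):
    """Return the paths whose size on disk is at or above the limit."""
    return [p for p in paths if os.path.getsize(p) >= limit_bytes]
```

An empty result means every file in `paths` is under the assumed limit.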

In [7]:
import os

class File:
  def __init__(self, file_path: str, display_name: str = None):
    self.file_path = file_path
    # Always define the attribute so later reads cannot raise AttributeError.
    self.display_name = display_name
    self.timestamp = get_timestamp(file_path)

  def set_file_response(self, response):
    self.response = response

def get_timestamp(filename):
  """Extracts the timestamp (as an 'MM:SS' string) from a filename with the
     format 'output_file_prefix_frame00:00.jpg'.
  """
  parts = filename.split(FRAME_PREFIX)
  if len(parts) != 2:
      return None  # Indicates the filename might be incorrectly formatted
  return parts[1].split('.')[0]

# Process each frame in the output directory
files = os.listdir(FRAME_EXTRACTION_DIRECTORY)
files = sorted(files)
files_to_upload = []
for file in files:
  files_to_upload.append(
      File(file_path=os.path.join(FRAME_EXTRACTION_DIRECTORY, file)))

# Upload the files to the API
# Only upload a 10 second slice of files to reduce upload time.
# Change full_video to True to upload the whole video.
full_video = False

uploaded_files = []
print(f'Uploading {len(files_to_upload) if full_video else 10} files. This might take a bit...')

for file in files_to_upload if full_video else files_to_upload[40:50]:
  print(f'Uploading: {file.file_path}...')
  response = genai.upload_file(path=file.file_path)
  file.set_file_response(response)
  uploaded_files.append(file)

print(f"Completed file uploads!\n\nUploaded: {len(uploaded_files)} files")

Uploading 10 files. This might take a bit...
Uploading: /content/frames/SherlockJr__10min_mp4_frame00:40.jpg...
Uploading: /content/frames/SherlockJr__10min_mp4_frame00:41.jpg...
Uploading: /content/frames/SherlockJr__10min_mp4_frame00:42.jpg...
Uploading: /content/frames/SherlockJr__10min_mp4_frame00:43.jpg...
Uploading: /content/frames/SherlockJr__10min_mp4_frame00:44.jpg...
Uploading: /content/frames/SherlockJr__10min_mp4_frame00:45.jpg...
Uploading: /content/frames/SherlockJr__10min_mp4_frame00:46.jpg...
Uploading: /content/frames/SherlockJr__10min_mp4_frame00:47.jpg...
Uploading: /content/frames/SherlockJr__10min_mp4_frame00:48.jpg...
Uploading: /content/frames/SherlockJr__10min_mp4_frame00:49.jpg...
Completed file uploads!

Uploaded: 10 files
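
The slice `files_to_upload[40:50]` picks frames by list position, which matches seconds 40 to 49 only because there is exactly one frame per second. As an alternative sketch, frames could be selected by the timestamp embedded in each filename. Both helpers below are hypothetical and assume the `_frameMM:SS.jpg` naming used in this notebook.

```python
def timestamp_to_seconds(timestamp: str) -> int:
    """Convert an 'MM:SS' string such as '00:40' to whole seconds."""
    minutes, seconds = timestamp.split(":")
    return int(minutes) * 60 + int(seconds)


def frames_in_window(filenames, start_s, end_s, frame_prefix="_frame"):
    """Return filenames whose timestamp t satisfies start_s <= t < end_s."""
    selected = []
    for name in sorted(filenames):
        parts = name.split(frame_prefix)
        if len(parts) != 2:
            continue  # Skip names that do not follow the expected pattern.
        timestamp = parts[1].split(".")[0]
        if start_s <= timestamp_to_seconds(timestamp) < end_s:
            selected.append(name)
    return selected
```

For example, `frames_in_window(os.listdir(FRAME_EXTRACTION_DIRECTORY), 40, 50)` would return the same ten frames as the slice above. Sorting by name relies on the zero-padded two-digit minutes, so it would need adjusting for clips of 100 minutes or more.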


## List Files

After uploading the files, you can verify that the API has received them by calling `files.list` (`genai.list_files()` in the Python SDK).

`files.list` lets you see all files uploaded to the File API that are associated with the Cloud project your API key belongs to. Only the `name` and, by extension, the `uri` are unique.

In [None]:
import itertools

# List files uploaded in the API, stopping after as many entries as were uploaded above.
for f in itertools.islice(genai.list_files(), len(uploaded_files)):
  print(f.uri)

https://generativelanguage.googleapis.com/v1beta/files/5ea7iyxfz49g
https://generativelanguage.googleapis.com/v1beta/files/g4bug5rmphkz
https://generativelanguage.googleapis.com/v1beta/files/xt2qoh3x0in5
https://generativelanguage.googleapis.com/v1beta/files/pibycmgydtwl
https://generativelanguage.googleapis.com/v1beta/files/q477xj4wnisv
https://generativelanguage.googleapis.com/v1beta/files/hldcyzswc7yh
https://generativelanguage.googleapis.com/v1beta/files/j0fqkxto51af
https://generativelanguage.googleapis.com/v1beta/files/cpq3lorfi4jm
https://generativelanguage.googleapis.com/v1beta/files/ube15rpb295f
https://generativelanguage.googleapis.com/v1beta/files/60x22ejdo34p


## Generate Content

After the files have been uploaded, you can make `GenerateContent` requests that reference their File API URIs.

To understand videos with Gemini 1.5 Pro, provide 2 consecutive `Part`s for each frame: a `text` part with the **timestamp** and a `fileData` part with the frame's **image URI**:

```
part { text = "00:00" }
part { fileData = fileData {
  fileUri = "https://generativelanguage.googleapis.com/v1/files/frame-0"
  mimeType = "image/jpeg"
}}
```
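
The same layout can be written out with plain Python dictionaries to make the ordering explicit. This is only an illustration of the structure, reusing the field names from the snippet above; `build_parts` is a hypothetical helper, placing the prompt first is an assumption of this sketch, and the result is not claimed to be the exact object the Python SDK expects.

```python
def build_parts(prompt, frames, mime_type="image/jpeg"):
    """Return the prompt followed by a (timestamp, image) pair per frame.

    `frames` is an iterable of (timestamp, file_uri) tuples.
    """
    parts = [{"text": prompt}]
    for timestamp, file_uri in frames:
        # Each frame contributes two consecutive parts: its timestamp, then its image.
        parts.append({"text": timestamp})
        parts.append({"fileData": {"fileUri": file_uri, "mimeType": mime_type}})
    return parts
```

For `n` frames this yields `1 + 2 * n` parts, so ten frames produce a 21-part request.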

In [9]:
# Create the prompt.
prompt = "Describe this video."

# Set the model to Gemini 1.5 Pro.
model = genai.GenerativeModel(model_name="models/gemini-1.5-pro-latest")

# Make GenerateContent request with the structure described above.
def make_request(prompt, files):
  request = [prompt]
  for file in files:
    request.append(file.timestamp)
    request.append(file.response)
  return request

# Make the LLM request.
request = make_request(prompt, uploaded_files)
response = model.generate_content(request,
                                  request_options={"timeout": 600})
print(response.text)

Two men in suits and bowler hats are walking down a dirt road. They are walking in the same direction as the cars that are parked on the side of the road. There are houses and other buildings in the background. The men cross a set of railroad tracks. One of the men opens the door of a boxcar and climbs in, while the other man waves goodbye. In the background, there are oil derricks.


## Delete Files

Files are automatically deleted after 2 days, or you can delete them manually using `files.delete()`.
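
If you record when each upload happened, you can estimate how long a file has left before automatic deletion. The sketch below assumes "2 days" means exactly 48 hours after upload, which the service may not guarantee; `time_until_deletion` is a hypothetical helper that works on timestamps you track yourself.

```python
from datetime import datetime, timedelta, timezone

# Assumption: "2 days" is treated as exactly 48 hours after upload.
RETENTION = timedelta(days=2)


def time_until_deletion(uploaded_at: datetime, now: datetime) -> timedelta:
    """Return how long a file has left, or zero if the window has passed."""
    remaining = uploaded_at + RETENTION - now
    return max(remaining, timedelta(0))


uploaded = datetime(2024, 1, 1, tzinfo=timezone.utc)
print(time_until_deletion(uploaded, uploaded + timedelta(hours=12)))  # → 1 day, 12:00:00
```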

In [10]:
print(f'Deleting {len(uploaded_files)} images. This might take a bit...')
for file in uploaded_files:
  genai.delete_file(file.response.name)
  print(f'Deleted {file.file_path} at URI {file.response.uri}')
print(f"Completed deleting files!\n\nDeleted: {len(uploaded_files)} files")

Deleting 10 images. This might take a bit...
Deleted /content/frames/SherlockJr__10min_mp4_frame00:40.jpg at URI https://generativelanguage.googleapis.com/v1beta/files/60x22ejdo34p
Deleted /content/frames/SherlockJr__10min_mp4_frame00:41.jpg at URI https://generativelanguage.googleapis.com/v1beta/files/ube15rpb295f
Deleted /content/frames/SherlockJr__10min_mp4_frame00:42.jpg at URI https://generativelanguage.googleapis.com/v1beta/files/cpq3lorfi4jm
Deleted /content/frames/SherlockJr__10min_mp4_frame00:43.jpg at URI https://generativelanguage.googleapis.com/v1beta/files/j0fqkxto51af
Deleted /content/frames/SherlockJr__10min_mp4_frame00:44.jpg at URI https://generativelanguage.googleapis.com/v1beta/files/hldcyzswc7yh
Deleted /content/frames/SherlockJr__10min_mp4_frame00:45.jpg at URI https://generativelanguage.googleapis.com/v1beta/files/q477xj4wnisv
Deleted /content/frames/SherlockJr__10min_mp4_frame00:46.jpg at URI https://generativelanguage.googleapis.com/v1beta/files/pibycmgydtwl
Del

## Learning more

The File API lets you upload a variety of multimodal MIME types, including images and audio formats. The File API handles inputs that can be used to generate content with [`model.generateContent`](https://ai.google.dev/api/rest/v1/models/generateContent) or [`model.streamGenerateContent`](https://ai.google.dev/api/rest/v1/models/streamGenerateContent).

* Learn more about the [File API](https://github.com/google-gemini/cookbook/blob/main/quickstarts/File_API.ipynb) with the quickstart.

* Learn more about prompting with [media files](https://ai.google.dev/tutorials/prompting_with_media) in the docs, including the supported formats and maximum length.