In [4]:
!pip install -U google-generativeai

Collecting google-generativeai
  Downloading google_generativeai-0.7.0-py3-none-any.whl (163 kB)
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m163.1/163.1 kB[0m [31m3.8 MB/s[0m eta [36m0:00:00[0m
[?25hCollecting google-ai-generativelanguage==0.6.5 (from google-generativeai)
  Downloading google_ai_generativelanguage-0.6.5-py3-none-any.whl (717 kB)
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m717.3/717.3 kB[0m [31m21.3 MB/s[0m eta [36m0:00:00[0m
Installing collected packages: google-ai-generativelanguage, google-generativeai
  Attempting uninstall: google-ai-generativelanguage
    Found existing installation: google-ai-generativelanguage 0.6.4
    Uninstalling google-ai-generativelanguage-0.6.4:
      Successfully uninstalled google-ai-generativelanguage-0.6.4
  Attempting uninstall: google-generativeai
    Found existing installation: google-generativeai 0.5.4
    Uninstalling google-generativeai-0.5.4:
      Successfully uninstalled goog

In [18]:
import google.generativeai as genai
import os
from google.colab import userdata
import time

In [25]:
os.environ["GOOGLE_API_KEY"] = userdata.get("GOOGLE_API_KEY")
genai.configure(api_key=os.environ['GOOGLE_API_KEY'])

In [26]:
video_file_name = "sam.mp4"

In [27]:
video_file = genai.upload_file(path=video_file_name)

In [28]:
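while video_file.state.name == "PROCESSING":
    print("Waiting for video to be processed.")
    time.sleep(2)
    # Re-fetch the file to pick up its latest state
    video_file = genai.get_file(video_file.name)

# Stop early if the upload could not be processed
if video_file.state.name == "FAILED":
    raise ValueError(video_file.state.name)

print(f"Video processing complete: {video_file.uri}")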

Waiting for video to be processed.
Video processing complete: https://generativelanguage.googleapis.com/v1beta/files/p68jd6kcni96
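The loop above polls the file state every two seconds with no upper bound. A minimal sketch of the same poll-until-ready pattern with a timeout follows. `wait_until_ready` is a hypothetical helper, and `get_state` is an assumed caller-supplied function; in this notebook it could wrap something like `lambda: genai.get_file(video_file.name).state.name`. Here a fake state sequence stands in for the API so the sketch runs offline.

```python
import time
from typing import Callable


def wait_until_ready(
    get_state: Callable[[], str],
    poll_interval: float = 2.0,
    timeout: float = 300.0,
) -> str:
    """Poll get_state() until it stops returning "PROCESSING".

    Returns the final state. Raises RuntimeError if the state is "FAILED"
    and TimeoutError if the deadline passes while still processing.
    """
    deadline = time.monotonic() + timeout
    state = get_state()
    while state == "PROCESSING":
        if time.monotonic() >= deadline:
            raise TimeoutError(f"still PROCESSING after {timeout} s")
        time.sleep(poll_interval)
        state = get_state()
    if state == "FAILED":
        raise RuntimeError("file processing failed")
    return state


# Fake state sequence standing in for repeated genai.get_file(...) calls
states = iter(["PROCESSING", "PROCESSING", "ACTIVE"])
print(wait_until_ready(lambda: next(states), poll_interval=0.01))  # ACTIVE
```

Using `time.monotonic()` for the deadline keeps the timeout correct even if the system clock changes while waiting.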


"Context caching" is a feature that reduces cost and latency by caching input tokens once and referencing the cached tokens in subsequent requests.

When you cache tokens, you specify how long they should be stored: the cache duration, or time to live (TTL). The cost of storing the cache depends on the number of cached input tokens and how long they are kept.
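Since storage cost depends on both token count and duration, a rough estimate multiplies the two. The sketch below assumes cost scales linearly with both and that the price is quoted per million tokens per hour; `cache_storage_cost` is a hypothetical helper and the rate of 1.0 is a placeholder, not a real price.

```python
from datetime import timedelta


def cache_storage_cost(
    cached_tokens: int,
    ttl: timedelta,
    rate_per_million_tokens_per_hour: float,
) -> float:
    """Estimate storage cost as (tokens / 1M) x hours x rate.

    The rate is a placeholder: look up the current price for your model.
    """
    hours = ttl.total_seconds() / 3600
    return cached_tokens / 1_000_000 * hours * rate_per_million_tokens_per_hour


# 100,000 cached tokens kept for 2 hours at a hypothetical rate of 1.0
print(cache_storage_cost(100_000, timedelta(hours=2), 1.0))  # 0.2
```

Under this model, doubling either the TTL or the number of cached tokens doubles the storage cost, which is why a short TTL such as the 5 minutes used below is sensible for a quick experiment.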

"Context caching" is supported in both "Gemini 1.5 Pro" and "Gemini 1.5 Flash".

In [29]:
from google.generativeai import caching
import datetime

Use cases for Context caching
Context caching is best suited to cases where a large initial context is referenced repeatedly by shorter requests, for example:

・Chatbots with long system instructions
・Repetitive analysis of long video files
・Periodic queries against large document sets
・Frequent code repository analysis
・Bug fixing
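To see why repeated short requests benefit most, compare how many prompt tokens must be sent as fresh input over several requests. This is a simplified token-count sketch, not a billing calculation: `new_input_tokens` is a hypothetical helper, and it assumes that with a cache each request adds only its own question while the context is read from the cache.

```python
def new_input_tokens(context_tokens: int, query_tokens: int,
                     n_requests: int, cached: bool) -> int:
    """Count prompt tokens sent as fresh (non-cached) input over n requests.

    Without a cache, every request resends the full context plus the query.
    With a cache, each request only adds its query; the context is
    referenced from the cache instead.
    """
    per_request = query_tokens if cached else context_tokens + query_tokens
    return per_request * n_requests


# A 100,000-token context queried 20 times with 50-token questions
print(new_input_tokens(100_000, 50, 20, cached=False))  # 2001000
print(new_input_tokens(100_000, 50, 20, cached=True))   # 1000
```

The gap grows with the number of requests and with the ratio of context size to question size, which matches the use cases listed above.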


In [30]:
cache = caching.CachedContent.create(
    model="models/gemini-1.5-flash-001",
    display_name="AI Anytime Video",
    system_instruction="You are an expert video analyzer, and your task is to answer user's query based on the video file you have access to.",
    contents=[video_file],
    ttl=datetime.timedelta(minutes=5),
)

In [31]:
model = genai.GenerativeModel.from_cached_content(cached_content=cache)

In [32]:
response = model.generate_content(
    ["What is this video all about?"]
)

In [33]:
response

response:
GenerateContentResponse(
    done=True,
    iterator=None,
    result=protos.GenerateContentResponse({
      "candidates": [
        {
          "content": {
            "parts": [
              {
                "text": "This video is an interview with Sam Altman, CEO of OpenAI, about his advice on work-life balance in your twenties. He stresses that working hard early in your career is important for long-term success.  He also talks about the importance of enjoying your work and believing in what you're doing to stay motivated and push through the difficult times. \n"
              }
            ],
            "role": "model"
          },
          "finish_reason": "STOP",
          "index": 0,
          "safety_ratings": [
            {
              "category": "HARM_CATEGORY_SEXUALLY_EXPLICIT",
              "probability": "NEGLIGIBLE"
            },
            {
              "category": "HARM_CATEGORY_HATE_SPEECH",
              "probability": "NEGLIGIBLE"
           

In [34]:
print(response.usage_metadata)

prompt_token_count: 77326
candidates_token_count: 72
total_token_count: 77398
cached_content_token_count: 77318
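The metadata shows that `prompt_token_count` includes the cached tokens: 77,318 of the 77,326 prompt tokens came from the cache, so only 8 were new, and 77,326 + 72 = 77,398 total tokens. A small sketch that computes this split; `summarize_usage` is a hypothetical helper, and a `SimpleNamespace` holding the values printed above stands in for `response.usage_metadata` so it runs offline.

```python
from types import SimpleNamespace


def summarize_usage(usage) -> dict:
    """Split a usage_metadata-like object into cached and fresh prompt tokens."""
    cached = usage.cached_content_token_count
    fresh = usage.prompt_token_count - cached
    return {
        "cached_prompt_tokens": cached,
        "fresh_prompt_tokens": fresh,
        "output_tokens": usage.candidates_token_count,
        "cached_share": cached / usage.prompt_token_count,
    }


# Values copied from the usage_metadata printed in this notebook
usage = SimpleNamespace(
    prompt_token_count=77326,
    candidates_token_count=72,
    total_token_count=77398,
    cached_content_token_count=77318,
)
summary = summarize_usage(usage)
print(summary["fresh_prompt_tokens"])     # 8
print(round(summary["cached_share"], 4))  # 0.9999
```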



・The minimum number of input tokens for Context caching is 32,768, and the maximum is the same as the maximum for the specified model.

・If no TTL is set, it defaults to 1 hour.

・The model does not distinguish between cached and normal tokens. Cached content is simply prefixed to the prompt.

・The cache service provides a delete operation to manually delete content from the cache.

・On the paid tier, there are no special rate or usage limits for Context caching. The standard rate limits for GenerateContent apply, and token limits include cached tokens. On the free tier, "Gemini 1.5 Flash" has a storage limit of 1 million tokens, and caching is not available for "Gemini 1.5 Pro".

・You cannot retrieve or view the cached content itself, but you can retrieve its metadata (name, display_name, model, and the create, update, and expire times).

・You can update the display_name and set a new ttl or expire_time. No other changes are supported.

・The number of cached tokens is reported in usage_metadata by the create, get, and list operations of the cache service, and also by GenerateContent when a cache is used.
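The minimum size and default TTL from the notes above can be checked locally before calling `caching.CachedContent.create`. A minimal sketch: `plan_cache` is a hypothetical helper, the constants come from the notes above, the token count is assumed to be obtained separately, and the model-specific maximum is not checked.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

MIN_CACHE_TOKENS = 32_768          # minimum input size for Context caching
DEFAULT_TTL = timedelta(hours=1)   # used when no TTL is given


def plan_cache(token_count: int, ttl: Optional[timedelta] = None,
               now: Optional[datetime] = None) -> datetime:
    """Validate a planned cache and return the time it would expire."""
    if token_count < MIN_CACHE_TOKENS:
        raise ValueError(
            f"{token_count} tokens is below the {MIN_CACHE_TOKENS}-token minimum")
    if now is None:
        now = datetime.now(timezone.utc)
    return now + (ttl if ttl is not None else DEFAULT_TTL)


start = datetime(2024, 6, 20, 12, 0, tzinfo=timezone.utc)
print(plan_cache(77_318, timedelta(minutes=5), now=start))  # 2024-06-20 12:05:00+00:00
print(plan_cache(77_318, now=start))                        # 2024-06-20 13:00:00+00:00
```

Passing `now` explicitly keeps the helper deterministic for testing; in real use it defaults to the current UTC time.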