In [1]:
# Copyright 2024 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# Introduction to Long Context Window with Gemini on Vertex AI

<table align="left">
  <td style="text-align: center">
    <a href="https://colab.research.google.com/github/GoogleCloudPlatform/generative-ai/blob/main/gemini/long-context/intro_long_context.ipynb">
      <img src="https://cloud.google.com/ml-engine/images/colab-logo-32px.png" alt="Google Colaboratory logo"><br> Open in Colab
    </a>
  </td>
  <td style="text-align: center">
    <a href="https://console.cloud.google.com/vertex-ai/colab/import/https:%2F%2Fraw.githubusercontent.com%2FGoogleCloudPlatform%2Fgenerative-ai%2Fmain%2Fgemini%2Flong-context%2Fintro_long_context.ipynb">
      <img width="32px" src="https://lh3.googleusercontent.com/JmcxdQi-qOpctIvWKgPtrzZdJJK-J3sWE1RsfjZNwshCFgE_9fULcNpuXYTilIR2hjwN" alt="Google Cloud Colab Enterprise logo"><br> Open in Colab Enterprise
    </a>
  </td>    
  <td style="text-align: center">
    <a href="https://console.cloud.google.com/vertex-ai/workbench/deploy-notebook?download_url=https://raw.githubusercontent.com/GoogleCloudPlatform/generative-ai/main/gemini/long-context/intro_long_context.ipynb">
      <img src="https://lh3.googleusercontent.com/UiNooY4LUgW_oTvpsNhPpQzsstV5W8F7rYgxgGBD85cWJoLmrOzhVs_ksK_vgx40SHs7jCqkTkCk=e14-rj-sc0xffffff-h130-w32" alt="Vertex AI logo"><br> Open in Workbench
    </a>
  </td>
  <td style="text-align: center">
    <a href="https://github.com/GoogleCloudPlatform/generative-ai/blob/main/gemini/long-context/intro_long_context.ipynb">
      <img src="https://cloud.google.com/ml-engine/images/github-logo-32px.png" alt="GitHub logo"><br> View on GitHub
    </a>
  </td>
</table>


| | |
|-|-|
|Author(s) | [Holt Skinner](https://github.com/holtskinner) |

## Overview

Gemini 1.5 Flash comes standard with a 1 million token context window, and Gemini 1.5 Pro comes with a 2 million token context window. Historically, large language models (LLMs) were significantly limited by the amount of text (or tokens) that could be passed to the model at one time. The Gemini 1.5 long context window, with [near-perfect retrieval (>99%)](https://storage.googleapis.com/deepmind-media/gemini/gemini_v1_5_report.pdf), unlocks many new use cases and developer paradigms.

In practice, 1 million tokens would look like:

-   50,000 lines of code (with the standard 80 characters per line)
-   All the text messages you have sent in the last 5 years
-   8 average length English novels
-   Transcripts of over 200 average length podcast episodes
-   1 hour of video
-   ~45 minutes of video with audio
-   9.5 hours of audio

While the standard use case for most generative models is still text input, the Gemini 1.5 model family enables a new paradigm of multimodal use cases. These models can natively understand text, video, audio, and images.

In this notebook, we'll explore multimodal use cases of the long context window.

For more information, refer to the [Gemini documentation about long context](https://ai.google.dev/gemini-api/docs/long-context).

## Tokens

Tokens can be single characters like `z` or whole words like `cat`. Long words
are broken up into several tokens. The set of all tokens used by the model is
called the vocabulary, and the process of splitting text into tokens is called
_tokenization_.

> **Important:** For Gemini models, a token is equivalent to about 4 characters. 100 tokens is equal to about 60-80 English words.

For multimodal input, this is how tokens are calculated regardless of display or file size:

* Images: `258` tokens
* Video: `263` tokens per second
* Audio: `32` tokens per second
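
As a rough illustration of the figures above, the following sketch estimates token usage from input sizes. The function name `estimate_tokens` and its parameters are hypothetical and not part of the Vertex AI SDK. The rates are the approximate values quoted in this section, so the authoritative number still comes from `count_tokens()`.

```python
# Approximate per-unit token rates quoted in this section (estimates only).
CHARS_PER_TOKEN = 4
TOKENS_PER_IMAGE = 258
VIDEO_TOKENS_PER_SECOND = 263
AUDIO_TOKENS_PER_SECOND = 32


def estimate_tokens(
    text_chars: int = 0,
    images: int = 0,
    video_seconds: float = 0,
    audio_seconds: float = 0,
) -> int:
    """Return a rough token estimate for a mixed-modality prompt (hypothetical helper)."""
    return round(
        text_chars / CHARS_PER_TOKEN
        + images * TOKENS_PER_IMAGE
        + video_seconds * VIDEO_TOKENS_PER_SECOND
        + audio_seconds * AUDIO_TOKENS_PER_SECOND
    )


# Example: a 400-character prompt, two images, and one minute of audio.
print(estimate_tokens(text_chars=400, images=2, audio_seconds=60))  # prints 2536
```

An estimate like this is only useful for quick planning; real counts depend on the tokenizer and on how the service samples media.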

## Why is the long context window useful?

The basic way you use the Gemini models is by passing information (context)
to the model, which will subsequently generate a response. An analogy for the
context window is short term memory. There is a limited amount of information
that can be stored in someone's short term memory, and the same is true for
generative models.

You can read more about how models work under the hood in our [generative models guide](https://cloud.google.com/vertex-ai/generative-ai/docs/learn/overview).

Even though the models can take in more and more context, much of the
conventional wisdom about using large language models still assumes a small
context window, a limitation that, as of 2024, no longer applies.

Some common strategies to handle the limitation of small context windows
included:

-   Arbitrarily dropping old messages / text from the context window as new text
    comes in
-   Summarizing previous content and replacing it with the summary when the
    context window gets close to being full
-   Using RAG with semantic search to move data out of the context window and
    into a vector database
-   Using deterministic or generative filters to remove certain text /
    characters from prompts to save tokens
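
For comparison, the first strategy in the list (dropping old messages) might look like the minimal sketch below. The helper `trim_history` is hypothetical, and it assumes the rough "about 4 characters per token" heuristic from the Tokens section rather than a real tokenizer.

```python
from collections import deque


def trim_history(messages: list[str], max_tokens: int, chars_per_token: int = 4) -> list[str]:
    """Keep the most recent messages whose estimated token total fits max_tokens."""
    kept = deque()
    used = 0
    # Walk backwards from the newest message and stop at the first one that does not fit.
    for message in reversed(messages):
        cost = max(1, len(message) // chars_per_token)  # heuristic estimate, not a tokenizer
        if used + cost > max_tokens:
            break
        kept.appendleft(message)
        used += cost
    return list(kept)


history = ["a" * 40, "b" * 40, "c" * 40]  # roughly 10 tokens each
print(trim_history(history, max_tokens=25))  # the oldest message is dropped
```

With a long context window, this kind of bookkeeping can often be skipped because the whole history fits in a single request.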

While many of these are still relevant in certain cases, the default place to start is now just putting all of the tokens into the context window. Because Gemini 1.5 models were purpose-built with a long context window, they are much more capable of in-context learning. This means that instructional materials provided in context can be highly effective for handling inputs that are not covered by the model's training data.

## Getting Started

### Install Vertex AI SDK for Python


In [2]:
%pip install --upgrade --user --quiet google-cloud-aiplatform

Note: you may need to restart the kernel to use updated packages.


### Restart runtime

To use the newly installed packages in this Jupyter runtime, you must restart the runtime. You can do this by running the cell below, which restarts the current kernel.

In [3]:
import IPython

app = IPython.Application.instance()
app.kernel.do_shutdown(True)

{'status': 'ok', 'restart': True}

<div class="alert alert-block alert-warning">
<b>⚠️ The kernel is going to restart. Please wait until it is finished before continuing to the next step. ⚠️</b>
</div>


### Authenticate your notebook environment (Colab only)

If you are running this notebook on Google Colab, run the cell below to authenticate your environment.


In [2]:
import sys

if "google.colab" in sys.modules:
    from google.colab import auth

    auth.authenticate_user()

### Set Google Cloud project information and initialize Vertex AI SDK

To get started using Vertex AI, you must have an existing Google Cloud project and [enable the Vertex AI API](https://console.cloud.google.com/flows/enableapi?apiid=aiplatform.googleapis.com).

Learn more about [setting up a project and a development environment](https://cloud.google.com/vertex-ai/docs/start/cloud-environment).

In [5]:
PROJECT_ID = "qwiklabs-gcp-02-9c890bf5875d"  # @param {type:"string"}
LOCATION = "us-central1"  # @param {type:"string"}

import vertexai

vertexai.init(project=PROJECT_ID, location=LOCATION)

### Import libraries


In [6]:
from IPython.display import Markdown, display
from vertexai.generative_models import GenerationConfig, GenerativeModel, Part

### Load the Gemini 1.5 Flash model

Learn more about all [Gemini API models on Vertex AI](https://cloud.google.com/vertex-ai/generative-ai/docs/learn/models#gemini-models).


In [7]:
MODEL_ID = "gemini-1.5-flash"  # @param {type:"string"}

model = GenerativeModel(
    MODEL_ID, generation_config=GenerationConfig(max_output_tokens=8192)
)

## Long-form text

Text has proved to be the layer of intelligence underpinning much of the momentum around LLMs. As mentioned earlier, many practical limitations of LLMs stemmed from context windows that were too small for certain tasks. This led to the rapid adoption of retrieval augmented generation (RAG) and other techniques that dynamically provide the model with relevant contextual information.

Some emerging and standard use cases for text based long context include:

-   Summarizing large corpuses of text
    -   Previous summarization options with smaller context models would require
        a sliding window or another technique to keep state of previous sections
        as new tokens are passed to the model
-   Question answering
    -   Historically this was only possible with RAG given the limited amount of
        context and models' factual recall being low
-   Agentic workflows
    -   Text is the underpinning of how agents keep state of what they have done
        and what they need to do; not having enough information about the world
        and the agent's goal is a limitation on the reliability of agents

[War and Peace by Leo Tolstoy](https://en.wikipedia.org/wiki/War_and_Peace) is considered one of the greatest literary works of all time; however, it is over 1,225 pages long, and the average reader will spend 37 hours and 48 minutes reading it at 250 WPM (words per minute). 😵‍💫 The text alone takes up 3.4 MB of storage space. However, the entire novel consists of fewer than 900,000 tokens, so it will fit within the Gemini context window.

We are going to pass the entire text to Gemini 1.5 Flash and get a detailed summary of the plot. For this example, we have the text of the novel from [Project Gutenberg](https://www.gutenberg.org/ebooks/2600) stored in a public Google Cloud Storage bucket.

First, we will use the `count_tokens()` method to examine the token count of the full prompt, then send the prompt to Gemini.

In [8]:
# Set contents to send to the model
contents = [
    "Provide a detailed summary of the following novel.",
    Part.from_uri(
        "gs://github-repo/generative-ai/gemini/long-context/WarAndPeace.txt",
        mime_type="text/plain",
    ),
]

# Counts tokens
print(model.count_tokens(contents))

# Prompt the model to generate content
response = model.generate_content(
    contents,
)

# Print the model response
print(f"\nUsage metadata:\n{response.usage_metadata}")

display(Markdown(response.text))

total_tokens: 839583
total_billable_characters: 43
prompt_tokens_details {
  modality: TEXT
  token_count: 839583
}


Usage metadata:
prompt_token_count: 839583
candidates_token_count: 1746
total_token_count: 841329
prompt_tokens_details {
  modality: TEXT
  token_count: 839583
}
candidates_tokens_details {
  modality: TEXT
  token_count: 1746
}



## War and Peace: A Detailed Summary

Leo Tolstoy's epic novel, *War and Peace*, is a sprawling tale of love, loss, and the impact of war on Russian society during the Napoleonic era. It follows the intertwined lives of several aristocratic families, particularly the Rostovs, the Bolkonskys, and the Bezukhovs, as they navigate personal challenges and the upheaval of the 1805 and 1812 wars.

**Part One:**

The novel opens in 1805, amidst the societal and political tensions leading up to Napoleon's invasion of Russia. We are introduced to a vibrant cast of characters:

* **The Rostovs:** A loving and boisterous family who embody the traditional Russian way of life, focused on family, social events, and military service. The count, Ilyá, is a jovial, generous, and slightly frivolous man. The countess, Nataly, is a caring and protective mother. Their children, Nicholas, Natásha, and Pétya, are spirited and full of life.
* **The Bolkonskys:** A family of intellectual and proud aristocrats, deeply rooted in traditional values and service to the Tsar. Prince Nicholas, a stern and demanding man, is the father of Prince Andrew, a disillusioned and contemplative military man, and Princess Mary, a pious and gentle woman.
* **The Bezukhovs:** This family is defined by wealth, intrigue, and loose morals. Count Bezúkhov, a wealthy and powerful old man, is dying, leaving his vast fortune and his illegitimate son, Pierre, to the whims of ambitious relatives. 

The first part explores the characters' individual journeys:

* **Prince Andrew:**  Disillusioned with society and his marriage, he seeks meaning in military service. He develops a strong respect for Napoleon, but also feels a growing sense of alienation and despair.
* **Natásha:**  A spirited, mischievous, and passionate girl, she explores her first love, but remains mostly unfazed by the social expectations surrounding her.
* **Pierre:**  Awkward and introspective, he searches for meaning in various pursuits, including philosophy and the pursuit of a moral life. 

**Part Two:**

The second part focuses on the early stages of the war against Napoleon. The Russian army, led by General Kutúzov, is in a precarious position, facing a superior French force.  

* **Prince Andrew:** Serving on Kutúzov’s staff, he experiences the complexities of war and the shortcomings of the Austrian alliance. He encounters a sense of hopelessness and a desire for something more meaningful.
* **Nicholas:** Joining the Pávlograd Hussars, he experiences his first taste of battlefield action and learns to navigate the chaos of war. He witnesses Dólokhov’s reckless behavior and experiences the consequences of impulsive decisions.
* **Pierre:**  He struggles to find his place in the midst of war, ultimately finding a sense of purpose and meaning in the simplicity of the soldiers he encounters. 

**Part Three:**

This part focuses on the pivotal battle of Austerlitz, a major defeat for the Russian and Austrian alliance. Prince Andrew is wounded and experiences a spiritual awakening as he gazes at the sky, recognizing the futility of war and the importance of something beyond earthly matters.

* **Natásha:** At a Moscow ball, she is captivated by the beauty of society and the possibility of love and romance.
* **Pierre:**  He becomes enamored by Hélène, and despite his initial misgivings, eventually marries her.

**Part Four:**

Now a rich man, Pierre struggles with his newfound wealth and responsibility, grappling with his wife’s infidelity and the corruption of the society around him. He finds solace in Freemasonry and seeks a more meaningful life through philanthropy and self-improvement. 

**Part Five:**

The novel shifts focus to the war against Prussia and Napoleon. Prince Andrew, now disillusioned with military service and unable to find fulfillment in the upper circles of Petersburg society, seeks refuge in the country and dedicates himself to improving the lives of his serfs. 

**Part Six:**

This part explores the personal struggles and transformations of the characters, with a focus on:

* **Natásha:** She experiences heartbreak and despair after the death of her mother and grapples with the loss of her youthful innocence.
* **Prince Andrew:** He finds new purpose in caring for his son, reconnecting with his family, and exploring his spiritual and philosophical interests.
* **Pierre:**  He continues his pursuit of a more fulfilling life through Freemasonry and philanthropic endeavors.

**Part Seven:**

The Russian army prepares for a fresh confrontation with Napoleon. The tension and complexity of the war machine are portrayed, with conflicting opinions and strategies emerging among the military leaders. Prince Andrew is determined to contribute to the defense of his country.

**Part Eight:**

The invasion begins. The Russian army, initially disorganized and lacking clear leadership, experiences the horrors of the war as they retreat from the French advance. 

* **Nicholas:** He faces challenges, including financial woes and his growing feelings for Sónya. 
* **Prince Andrew:** He seeks a role in the war, and ultimately rejoins the Russian army. 
* **Pierre:** He feels a renewed sense of purpose and a desire to sacrifice himself for a greater good.

**Part Nine:**

The battle of Borodinó is fought, a pivotal event that shifts the momentum of the war.

* **Nicholas:**  He experiences the horrors of the battlefield and the consequences of war.
* **Prince Andrew:** He encounters Napoleon, and has a vision of the infinite, but is fatally wounded.
* **Pierre:** He witnesses the horrors of war firsthand, and seeks meaning amidst the chaos. 

**Part Ten:**

The French advance to Moscow, and the city is occupied and burned down. The
impact of these events on the characters is profound.

* **Natásha:** She is devastated by the loss of her brother, Pétya, and falls ill with despair.
* **Pierre:** He embarks on a journey of self-discovery, ultimately seeking solace and truth in the simple wisdom of a Russian peasant, Platón Karatáev.

**Part Eleven:**

The Russian army re-groups and the French are forced to retreat, leading to the battle of Tarútino. 

* **Prince Andrew:**  He dies in the arms of Natásha, who is dedicated to caring for him. 

**Part Twelve:**

The French retreat continues. The Russian army pursues them, and the prisoners they capture, including Pierre, face the brutal realities of war and captivity.  Pierre finds peace and a new sense of life through his interactions with Platón Karatáev.

**Part Thirteen:**

The French army is in full retreat. The Russian army pursues them, but the pursuit is hindered by the exhaustion of the troops.

* **Nicholas:** He seeks to distinguish himself in the war, but faces moral dilemmas and financial difficulties. 
* **Princess Mary:** She grapples with the loss of her father and her love for Nicholas. 
* **Pierre:** He is imprisoned and witnesses the brutality of the French army.

**Part Fourteen:**

The final phase of the war, characterized by the relentless pursuit of
the French. 

* **Pierre:** He is liberated from captivity and joins the Russian army.
* **Nicholas:** He is reunited with Sónya, and experiences a profound shift in his feelings for her.

**Part Fifteen:**

The war comes to an end. 

* **Nicholas:** He marries Princess Mary, and finds happiness in family life and his passion for farming.
* **Natásha:** She eventually marries Pierre, finding solace and a new sense of purpose in family life. 

**Epilogue:**

The novel concludes with reflections on the nature of power, war, and the
meaning of life. It emphasizes the importance of compassion, faith, and
the simple virtues of family and home. 

*War and Peace* is a complex and nuanced exploration of human
nature, love, and the destructive forces of war. It is not a novel
that offers easy answers, but rather invites readers to contemplate
the intricacies of life, the complexity of human relationships, and
the enduring power of love and faith. 
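
The token counts reported for this prompt can be compared against the context window before sending a request. The sketch below is one way to do that. The helper `context_headroom` and the `CONTEXT_WINDOW_TOKENS` table are hypothetical, and the limits are the round numbers from the Overview rather than values queried from the API.

```python
# Context window sizes as described in the Overview (round numbers, assumed).
CONTEXT_WINDOW_TOKENS = {
    "gemini-1.5-flash": 1_000_000,
    "gemini-1.5-pro": 2_000_000,
}


def context_headroom(model_id: str, prompt_tokens: int) -> int:
    """Return the tokens left in the window; negative if the prompt is too large."""
    return CONTEXT_WINDOW_TOKENS[model_id] - prompt_tokens


# total_tokens reported by count_tokens() for the War and Peace prompt.
headroom = context_headroom("gemini-1.5-flash", 839_583)
print(f"Tokens remaining: {headroom:,}")  # Tokens remaining: 160,417
```

In the notebook, `prompt_tokens` could be taken from `model.count_tokens(contents).total_tokens` instead of a hard-coded value.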


## Long-form video

Video content has been difficult to process due to constraints of the format itself.
It was hard to skim the content, transcripts often failed to capture the nuance of a video, and most tools don't process images, text, and audio together.
The Gemini 1.5 long context window makes it possible to reason and answer questions about multimodal inputs with sustained performance.

When tested on the needle-in-a-video-haystack problem with 1M tokens, Gemini 1.5 Flash obtained >99.8% recall of the video in the context window, and Gemini 1.5 Pro reached state-of-the-art performance on the [Video-MME benchmark](https://video-mme.github.io/home_page.html).

Some emerging and standard use cases for video long context include:

-   Video question answering
-   Video memory, as shown with [Google's Project Astra](https://deepmind.google/technologies/gemini/project-astra/)
-   Video captioning
-   Video recommendation systems, by enriching existing metadata with new
    multimodal understanding
-   Video customization, by looking at a corpus of data and associated video
    metadata and then removing parts of videos that are not relevant to the
    viewer
-   Video content moderation
-   Real-time video processing

[Google I/O](https://io.google/) is one of the major events where Google's developer tools are announced. Workshop sessions are filled with a lot of material, so it can be difficult to keep track of everything that is discussed.

We are going to use a video of a session from Google I/O 2024 focused on [Grounding for Gemini](https://www.youtube.com/watch?v=v4s5eU2tfd4) to calculate tokens and process the information presented. We will ask a specific question about a point in the video and ask for a general summary.

In [9]:
# Set contents to send to the model
video = Part.from_uri(
    "gs://github-repo/generative-ai/gemini/long-context/GoogleIOGroundingRAG.mp4",
    mime_type="video/mp4",
)

contents = ["At what time in the following video is the Cymbal Starlight demo?", video]

# Counts tokens
print(model.count_tokens(contents))

# Prompt the model to generate content
response = model.generate_content(
    contents,
)

# Print the model response
print(f"\nUsage metadata:\n{response.usage_metadata}")

display(Markdown(response.text))

total_tokens: 607064
total_billable_characters: 54
prompt_tokens_details {
  modality: VIDEO
  token_count: 607050
}
prompt_tokens_details {
  modality: TEXT
  token_count: 14
}


Usage metadata:
prompt_token_count: 607064
candidates_token_count: 20
total_token_count: 607084
prompt_tokens_details {
  modality: VIDEO
  token_count: 607050
}
prompt_tokens_details {
  modality: TEXT
  token_count: 14
}
candidates_tokens_details {
  modality: TEXT
  token_count: 20
}



The Cymbal Starlight demo is at **24:53** in the video. 


In [10]:
contents = [
    "Provide an enthusiastic summary of the video, tailored for software developers.",
    video,
]

# Counts tokens
print(model.count_tokens(contents))

# Prompt the model to generate content
response = model.generate_content(contents)

# Print the model response
print(f"\nUsage metadata:\n{response.usage_metadata}")

display(Markdown(response.text))

total_tokens: 607063
total_billable_characters: 69
prompt_tokens_details {
  modality: TEXT
  token_count: 13
}
prompt_tokens_details {
  modality: VIDEO
  token_count: 607050
}


Usage metadata:
prompt_token_count: 607063
candidates_token_count: 167
total_token_count: 607230
prompt_tokens_details {
  modality: TEXT
  token_count: 13
}
prompt_tokens_details {
  modality: VIDEO
  token_count: 607050
}
candidates_tokens_details {
  modality: TEXT
  token_count: 167
}



Hey developers! This video is a must-watch if you want to level up your AI game. Get ready to learn about ground systems for Gemini using Vertex AI search and some DIY RAG techniques. The speaker, Holt Skinner, is a developer advocate for Google Cloud and he breaks down everything you need to know about RAG – Retrieval-Augmented Generation. You'll discover how to ground your large language models on real-world data, make your systems more accurate and up-to-date, and avoid pesky hallucinations. Plus, learn about different approaches to building a DIY RAG system, using both Google Search and Vertex AI vector search.  Don't miss out on this insightful breakdown of powerful tools that will help you build smarter, more informative AI applications.  Get ready to take your AI skills to the next level! 


## Long-form audio

To process audio, developers have typically needed to string together multiple models, such as a speech-to-text model and a text-to-text model. This led to additional latency due to multiple round-trip requests, and the context of the audio itself could be lost.

The Gemini 1.5 models were the first natively multimodal large language models that could understand audio.

On standard audio-haystack evaluations, Gemini 1.5 Pro is able to find the hidden audio in 100% of the tests and Gemini 1.5 Flash is able to find it in 98.7% [of the tests](https://storage.googleapis.com/deepmind-media/gemini/gemini_v1_5_report.pdf). Further, on a test set of 15-minute audio clips, Gemini 1.5 Pro achieves a word error rate (WER) of ~5.5%, much lower than even specialized speech-to-text models, without the added complexity of extra input segmentation and pre-processing.

The long context window accepts up to 9.5 hours of audio in a single request.

Some emerging and standard use cases for audio context include:

-   Real-time transcription and translation
-   Podcast / video question answering
-   Meeting transcription and summarization
-   Voice assistants

Podcasts are a great way to learn about the latest news in technology, but there are so many out there that it can be difficult to follow them all. It's also challenging to find a specific episode with a given topic or a quote.

In this example, we will process 9 episodes of the [Google Kubernetes Podcast](https://cloud.google.com/podcasts/kubernetespodcast) and ask specific questions about the content.

In [11]:
# Podcast episodes stored in a public Cloud Storage bucket
AUDIO_URI_PREFIX = "gs://github-repo/generative-ai/gemini/long-context"
episode_files = [
    "20240417-kpod223.mp3",
    "20240430-kpod224.mp3",
    "20240515-kpod225.mp3",
    "20240529-kpod226.mp3",
    "20240606-kpod227.mp3",
    "20240611-kpod228.mp3",
    "20240625-kpod229.mp3",
    "20240709-kpod230.mp3",
    "20240723-kpod231.mp3",
]

# Set contents to send to the model: the question followed by one Part per episode
contents = [
    "According to the following podcasts, what can you tell me about AI/ML workloads on Kubernetes?",
    *[
        Part.from_uri(f"{AUDIO_URI_PREFIX}/{name}", mime_type="audio/mpeg")
        for name in episode_files
    ],
]

# Counts tokens
print(model.count_tokens(contents))

# Prompt the model to generate content
response = model.generate_content(
    contents,
)

# Print the model response
print(f"\nUsage metadata:\n{response.usage_metadata}")

display(Markdown(response.text))

total_tokens: 843569
total_billable_characters: 80
prompt_tokens_details {
  modality: TEXT
  token_count: 19
}
prompt_tokens_details {
  modality: AUDIO
  token_count: 843550
}


Usage metadata:
prompt_token_count: 843569
candidates_token_count: 74
total_token_count: 843643
prompt_tokens_details {
  modality: TEXT
  token_count: 19
}
prompt_tokens_details {
  modality: AUDIO
  token_count: 843550
}
candidates_tokens_details {
  modality: TEXT
  token_count: 74
}



The podcasts discuss how Kubernetes has evolved and become more mature in the last 10 years. They talk about how it has been adopted by a wide range of companies and how it is being used for new types of workloads, such as AI/ML workloads.  They also discuss how the Kubernetes community is trying to make the project more accessible to new contributors. 


## Code

For a long context window use case that involves ingesting an entire GitHub repository, check out [Analyze a codebase with Vertex AI Gemini 1.5 Pro](https://github.com/GoogleCloudPlatform/generative-ai/blob/main/gemini/use-cases/code/analyze_codebase_with_gemini_1_5_pro.ipynb).

## Context caching

[Context caching](https://cloud.google.com/vertex-ai/generative-ai/docs/context-cache/context-cache-overview) allows developers to reduce the time and cost of repeated requests using the large context window.
For examples of how to use context caching with Gemini on Vertex AI, refer to [Intro to Context Caching with Gemini on Vertex AI](https://github.com/GoogleCloudPlatform/generative-ai/blob/main/gemini/context-caching/intro_context_caching.ipynb).
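
To see why repeated requests are a natural fit for caching, consider the two video prompts in this notebook, which each sent the same 607,050 video tokens. The sketch below only does token arithmetic on the counts reported in the cell outputs. The helper `repeated_prompt_tokens` is hypothetical, it does not call the context caching API, and it makes no claim about pricing.

```python
def repeated_prompt_tokens(shared_tokens: int, num_requests: int) -> int:
    """Tokens of shared context that are re-sent after the first request."""
    return shared_tokens * max(0, num_requests - 1)


# Counts reported in the usage metadata of the two video requests.
video_tokens = 607_050
prompt_token_counts = [607_064, 607_063]

print(sum(prompt_token_counts))  # total prompt tokens sent: 1214127
print(repeated_prompt_tokens(video_tokens, len(prompt_token_counts)))  # re-sent video tokens: 607050
```

Roughly half of the prompt tokens across the two requests were the same video, which is the kind of shared prefix that context caching is designed to reuse.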