In [None]:
# Copyright 2024 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# Using Gemini Long Context Window for Text

<table align="left">
  <td style="text-align: center">
    <a href="https://colab.research.google.com/github/GoogleCloudPlatform/applied-ai-engineering-samples/blob/main/genai-on-vertex-ai/gemini/long_context_window/gemini_long_context_text.ipynb">
      <img width="32px" src="https://www.gstatic.com/pantheon/images/bigquery/welcome_page/colab-logo.svg" alt="Google Colaboratory logo"><br> Open in Colab
    </a>
  </td>
  <td style="text-align: center">
    <a href="https://console.cloud.google.com/vertex-ai/colab/import/https:%2F%2Fraw.githubusercontent.com%2FGoogleCloudPlatform%2Fapplied-ai-engineering-samples%2Fmain%2Fgenai-on-vertex-ai%2Fgemini%2Flong_context_window%2Fgemini_long_context_text.ipynb">
      <img width="32px" src="https://lh3.googleusercontent.com/JmcxdQi-qOpctIvWKgPtrzZdJJK-J3sWE1RsfjZNwshCFgE_9fULcNpuXYTilIR2hjwN" alt="Google Cloud Colab Enterprise logo"><br> Open in Colab Enterprise
    </a>
  </td>
  <td style="text-align: center">
    <a href="https://console.cloud.google.com/vertex-ai/workbench/deploy-notebook?download_url=https://raw.githubusercontent.com/GoogleCloudPlatform/applied-ai-engineering-samples/main/genai-on-vertex-ai/gemini/long_context_window/gemini_long_context_text.ipynb">
      <img src="https://www.gstatic.com/images/branding/gcpiconscolors/vertexai/v1/32px.svg" alt="Vertex AI logo"><br> Open in Vertex AI Workbench
    </a>
  </td>
  <td style="text-align: center">
    <a href="https://github.com/GoogleCloudPlatform/applied-ai-engineering-samples/blob/main/genai-on-vertex-ai/gemini/long_context_window/gemini_long_context_text.ipynb">
      <img width="32px" src="https://upload.wikimedia.org/wikipedia/commons/9/91/Octicons-mark-github.svg" alt="GitHub logo"><br> View on GitHub
    </a>
  </td>
</table>

| | |
|-|-|
| Author(s) | [Vijay Reddy](https://github.com/vijaykyr) |
| Reviewer(s) | [Rajesh Thallam](https://github.com/rthallam), [Skander Hannachi](https://github.com/skanderhn)  |

# Overview

---

Gemini 1.5 Pro supports up to 2 million input tokens. This is roughly equivalent to:
- ~2000 pages of text
- ~19 hours of audio
- ~2 hours of video
- ~60K lines of code

This [long context window](https://cloud.google.com/vertex-ai/generative-ai/docs/long-context) (LCW) opens up possibilities for prompting on large contexts that previously could only be approximated using pre-processing steps such as Retrieval Augmented Generation (RAG). Long context windows in LLMs are enabling new use cases and optimizing standard use cases such as:
- Summarizing, analyzing and question-answering on large documents
- Analyzing large code repositories
- Agentic workflows for keeping the state of agents
- [Many-shot in-context learning](https://arxiv.org/pdf/2404.11018) providing examples at the scale of hundreds or thousands leading to performance comparable to fine-tuned models.

---

In this notebook we demonstrate the long context window (LCW) using the text modality*. We walk through 3 approaches to long context prompting and compare them, alongside a no-context baseline, on accuracy, latency and cost.

Below is the summary of results observed at the time these experiments were run. Continue on for a detailed analysis of each.

| Trial | Accuracy | Latency | Cost |
|---|---|---|---|
| **Baseline** | 43% (3/7) | 0.4 min | \$0.004 |
| **LCW - Naive** | 100% (7/7) | 3 min | \$17.24 |
| **LCW - Batched** | 86% (6/7) | 0.6 min | \$2.47 |
| **LCW - Cached** | 100% (7/7) | 2.2 min | \$1.60 |


<div class="alert alert-block alert-info">
* For an example of another modality, see the companion <a href="https://github.com/GoogleCloudPlatform/applied-ai-engineering-samples/blob/main/genai-on-vertex-ai/gemini/long_context_window/gemini_long_context_video.ipynb">video notebook</a>.
</div>


# Getting Started

The following steps are necessary to run this notebook, no matter what notebook environment you're using.

If you're entirely new to Google Cloud, [get started here](https://cloud.google.com/docs/get-started).

## Google Cloud Project Setup

1. [Select or create a Google Cloud project](https://console.cloud.google.com/cloud-resource-manager). When you first create an account, you get a $300 free credit towards your compute/storage costs.
1. [Make sure that billing is enabled for your project](https://cloud.google.com/billing/docs/how-to/modify-project).
1. [Enable the Service Usage API](https://console.cloud.google.com/apis/library/serviceusage.googleapis.com)
1. [Enable the Vertex AI API](https://console.cloud.google.com/flows/enableapi?apiid=aiplatform.googleapis.com).
1. [Enable the Cloud Storage API](https://console.cloud.google.com/flows/enableapi?apiid=storage.googleapis.com).

## Google Cloud Permissions

**To run the complete Notebook, including the optional section, you will need to have the [Owner role](https://cloud.google.com/iam/docs/understanding-roles) for your project.**

If you want to skip the optional section, you need at least the following [roles](https://cloud.google.com/iam/docs/granting-changing-revoking-access):
* **`roles/serviceusage.serviceUsageAdmin`** to enable APIs
* **`roles/iam.serviceAccountAdmin`** to modify service agent permissions
* **`roles/aiplatform.user`** to use AI Platform components
* **`roles/storage.objectAdmin`** to modify and delete GCS buckets

## Install Vertex AI SDK for Python and other dependencies (If Needed)

The next cell installs or upgrades `pandas` and the Vertex AI SDK (`google-cloud-aiplatform`) quietly for the current user.

In [None]:
! pip install pandas google-cloud-aiplatform --upgrade --quiet --user

## Restart Runtime

To use the newly installed packages in this Jupyter runtime, you must restart the runtime. You can do this by running the cell below, which will restart the current kernel.

In [None]:
# Restart kernel after installs so that your environment can access the new packages
import IPython

app = IPython.Application.instance()
app.kernel.do_shutdown(True)

## Authenticate

If you're using Colab, run the code in the next cell. Follow the popups and authenticate with an account that has access to your Google Cloud [project](https://cloud.google.com/resource-manager/docs/creating-managing-projects#identifying_projects).

If you're running this notebook somewhere besides Colab, make sure your environment has the right Google Cloud access. If that's a new concept to you, consider looking into [Application Default Credentials for your local environment](https://cloud.google.com/docs/authentication/provide-credentials-adc#local-dev) and [initializing the Google Cloud CLI](https://cloud.google.com/docs/authentication/gcloud). In many cases, running `gcloud auth application-default login` in a shell on the machine running the notebook kernel is sufficient.

More authentication options are discussed [here](https://cloud.google.com/docs/authentication).

In [None]:
# Colab authentication.
import sys

if "google.colab" in sys.modules:
    from google.colab import auth

    auth.authenticate_user()
    print("Authenticated")

## Set Google Cloud project information and Initialize Vertex AI SDK

To get started using Vertex AI, you must have an existing Google Cloud project and [enable the Vertex AI API](https://console.cloud.google.com/flows/enableapi?apiid=aiplatform.googleapis.com).

Learn more about [setting up a project and a development environment](https://cloud.google.com/vertex-ai/docs/start/cloud-environment).

Make sure to change `PROJECT_ID` in the next cell. You can leave the value of `REGION` unchanged unless you have a specific reason to change it.

In [None]:
import vertexai

PROJECT_ID = "[your-project-id]"  # @param {type:"string"}
REGION = "us-central1"  # @param {type:"string"}

vertexai.init(project=PROJECT_ID, location=REGION)
print("Vertex AI SDK initialized.")
print(f"Vertex AI SDK version = {vertexai.__version__}")

Vertex AI SDK initialized.
Vertex AI SDK version = 1.64.0


## Import Libraries

In [None]:
import datetime

import pandas as pd
from IPython.display import Markdown
from vertexai.generative_models import (GenerativeModel, HarmBlockThreshold,
                                        HarmCategory, Part)

pd.set_option("display.max_colwidth", None)

## Initialize Gemini

In [None]:
# Gemini Config
GENERATION_CONFIG = dict(temperature=0)
SAFETY_CONFIG = {
    HarmCategory.HARM_CATEGORY_HATE_SPEECH: HarmBlockThreshold.BLOCK_NONE,
    HarmCategory.HARM_CATEGORY_HARASSMENT: HarmBlockThreshold.BLOCK_NONE,
    HarmCategory.HARM_CATEGORY_DANGEROUS_CONTENT: HarmBlockThreshold.BLOCK_NONE,
    HarmCategory.HARM_CATEGORY_SEXUALLY_EXPLICIT: HarmBlockThreshold.BLOCK_NONE,
}

gemini_pro_model = GenerativeModel(
    model_name="gemini-1.5-pro-001",
    generation_config=GENERATION_CONFIG,
    safety_settings=SAFETY_CONFIG,
)

# Long Context for Question and Answering

To demonstrate Gemini's long context capabilities in the text modality, we will run question answering over the novel [David Copperfield](https://www.gutenberg.org/cache/epub/43111/pg43111) by Charles Dickens. At ~360K words and ~506K tokens, it is long enough to evaluate Gemini's long context capabilities.

The questions were sourced manually from the novel, without any prior knowledge of how they would perform in the various tests. The answers for these questions are roughly evenly distributed throughout the source material (beginning, middle, end) so as to evaluate performance across the full context window.

In [None]:
questions = [
    "What are the first objects that David can recall from his infancy?",
    "At the inn where the mail stops, what is painted on the door?",
    "What name does David's aunt suggest to Mr. Dick they call him by?",
    "Describe the room in which Mr. Copperfield meets Steerforth for breakfast.",
    "After his engagement to Dora, David write Agnes a letter. What is the letter about?",
    "What does David find in the hotel where Mr. Micawber requested him to meet in the middle of the night?",
    "Who are David's final thoughts about in the book?",
]

answers = [
    "His mother with her pretty hair, and Peggotty.",
    "DOLPHIN",
    "Trotwood Copperfield",
    "A snug private apartment, red-curtained and Turkey-carpeted, where the fire burnt bright, and a fine hot breakfast was set forth on a table covered with a clean cloth; and a cheerful miniature of the room, the fire, the breakfast, Steerforth, and all, was shining in the little round mirror over the sideboard.",
    "He writes to Agnes to assure her of his deep love for Agnes and that this was not a passing fancy or hasty decision.",
    "A letter stating that he would appear in the morning at half past nine.",
    "Agnes",
]

## Baseline Behavior: Without adding any context

As David Copperfield is a popular classic novel, it is likely Gemini already knows something about it from its internal knowledge. So that we can later measure how adding the novel explicitly as context helps, let's first ask our sample questions without any added context (aka zero-shot).

In [None]:
prompt_template = "Answer the following question about {context}: {question}"
context = "the book 'David Copperfield' by Charles Dickens"


def evaluate(
    questions, answers, prompt_template, context, model, is_context_cached=False
):
    """Ask each question and collect responses and token counts in a DataFrame.

    When `is_context_cached` is True the context already lives in the model's
    cache, so the template is formatted with the question only.
    """
    df = pd.DataFrame(
        columns=[
            "question",
            "ground_truth",
            "model_response",
            "input_token_count",
            "output_token_count",
        ]
    )
    for question, answer in zip(questions, answers):
        if is_context_cached:
            prompt = prompt_template.format(question=question)
        else:
            prompt = prompt_template.format(context=context, question=question)
        response = model.generate_content(prompt)
        res = response.text
        # Token counts stay at 0 when the response carries no usage metadata.
        input_token_count = 0
        output_token_count = 0
        if response.usage_metadata:
            input_token_count = response.usage_metadata.prompt_token_count
            output_token_count = response.usage_metadata.candidates_token_count

        # Append one row per question.
        df.loc[len(df)] = {
            "question": question,
            "model_response": res,
            "ground_truth": answer,
            "input_token_count": input_token_count,
            "output_token_count": output_token_count,
        }
    return df

In [None]:
%%time
df_zeroshot = evaluate(questions, answers, prompt_template, context, gemini_pro_model)
df_zeroshot

CPU times: user 110 ms, sys: 4.49 ms, total: 114 ms
Wall time: 33.4 s


Unnamed: 0,question,ground_truth,model_response,input_token_count,output_token_count
0,What are the first objects that David can recall from his infancy?,"His mother with her pretty hair, and Peggotty.","The first objects David Copperfield remembers from his infancy are:\n\n* **The hand of his mother:** He describes it as a ""soft hand"" that he used to ""fancy was always holding mine.""\n* **The sparkling objects on his mother's dress:** He remembers them as ""twinkling"" and ""shining"" and associates them with the loving presence of his mother.\n\nThese early memories, though fragmented, highlight the importance of his mother and the sense of security she provided in his early childhood. \n",29,105
1,"At the inn where the mail stops, what is painted on the door?",DOLPHIN,"At the inn where the mail stops in ""David Copperfield,"" the door features a painting of a **large cheese**. \n\nThis detail is mentioned in Chapter 5, when young David is traveling by coach to Yarmouth. The image of the cheese is described as being so poorly painted that it's difficult to discern what it is supposed to be, highlighting the shabbiness of the inn. \n",31,82
2,What name does David's aunt suggest to Mr. Dick they call him by?,Trotwood Copperfield,"David's aunt, Betsey Trotwood, suggests that Mr. Dick call David by the name **""Trotwood""**. \n\nShe dislikes the name ""David"" because it was her deceased fiancé's name, and she associates it with bad luck. She believes that calling David ""Trotwood"" will help to distance him from any potential misfortune. \n",33,76
3,Describe the room in which Mr. Copperfield meets Steerforth for breakfast.,"A snug private apartment, red-curtained and Turkey-carpeted, where the fire burnt bright, and a fine hot breakfast was set forth on a table covered with a clean cloth; and a cheerful miniature of the room, the fire, the breakfast, Steerforth, and all, was shining in the little round mirror over the sideboard.","While the specific details of the breakfast room where Mr. Copperfield (David) meets Steerforth aren't extensively described in ""David Copperfield,"" we can glean some information from the text to paint a picture. \n\nThe meeting takes place at the residence of Steerforth's mother in Highgate. Considering their social standing and Steerforth's somewhat arrogant nature, we can assume the following:\n\n* **The room would be spacious and well-appointed:** It likely reflects the wealth and status of Steerforth's family. Expect high ceilings, large windows, and possibly a fireplace.\n* **The furnishings would be elegant and likely somewhat traditional:** Think polished wood furniture, perhaps mahogany, with plush upholstery and tasteful decorations. \n* **The breakfast setting itself would be lavish:** A white tablecloth, silverware, fine china, and a spread of breakfast delicacies would be the norm for a family like Steerforth's. \n\nHowever, Dickens doesn't dwell on the physical details of the room. He's more interested in the social dynamics and the contrast between David's awe of Steerforth and the latter's somewhat condescending demeanor. \n\nTherefore, while we can infer the general opulence of the setting, the specific details are left to the reader's imagination. \n",31,266
4,"After his engagement to Dora, David write Agnes a letter. What is the letter about?",He writes to Agnes to assure her of his deep love for Agnes and that this was not a passing fancy or hasty decision.,"In Charles Dickens's ""David Copperfield,"" after David becomes engaged to Dora Spenlow, he does indeed write a letter to his dear friend Agnes Wickfield. This letter is full of turmoil and reveals David's inner conflict about his engagement. \n\nHere's a breakdown of the letter's content:\n\n* **Expressing Doubt:** David confides in Agnes his growing doubts about his relationship with Dora. He acknowledges that while Dora is beautiful and charming, he sees her more as a ""child-wife"" who needs constant care and may not possess the depth of character he truly desires in a life partner.\n* **Seeking Validation:** He seeks validation from Agnes, hoping she will reassure him that he's doing the right thing. Deep down, he craves her approval and perhaps even secretly wishes for a different future with her.\n* **Acknowledging Agnes's Qualities:** The letter implicitly contrasts Dora's childlike innocence with Agnes's maturity, intelligence, and steadfastness. He praises her virtues, further highlighting his internal conflict.\n* **Fear of Causing Pain:** David expresses his fear of hurting Dora by breaking off the engagement. He feels trapped by his own sense of obligation and his desire to protect her from pain.\n\nIn essence, the letter to Agnes is a cry for help, a confession of David's growing unease about his impending marriage. It reveals his emotional dependence on Agnes and foreshadows the challenges and heartache that lie ahead in his relationship with Dora. \n",34,312
5,What does David find in the hotel where Mr. Micawber requested him to meet in the middle of the night?,A letter stating that he would appear in the morning at half past nine.,"David finds Mr. Micawber in a state of emotional distress, having already consumed a significant amount of punch. He's about to expose the villainy of his employer, Uriah Heep. \n\nWhile the prompt mentions finding something ""in the hotel,"" it's more accurate to say David finds a situation unfolding rather than a physical object. \n",40,74
6,Who are David's final thoughts about in the book?,Agnes,"In the closing lines of ""David Copperfield,"" David's final thoughts are about **Agnes Wickfield**. \n\nHe reflects on their shared past, her unwavering love and support, and the happiness they have found together. He acknowledges her profound influence on his life, stating that ""The light of the fire is upon your face, and the light of the last confidences I have ever to make, my dearest wife, is upon it also."" \n\nThis ending emphasizes the importance of Agnes as the anchor and true love of David's life, bringing the novel to a close on a note of peace and fulfillment. \n",28,127


### Analysis

- Latency: 25 seconds
- Cost: \$0.004
- Accuracy: 3/7

Accuracy is determined manually by comparing the ground truth to the model response. There is some subjectivity in this evaluation, and at the time of this writing Gemini is non-deterministic, so your results may vary slightly. 

In our analysis only two answers are unambiguously correct, and another two we consider close enough to give partial credit, for a total of 1+1+0.5+0.5=3 out of 7, or 43% accuracy.

The cost is negligible.
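The partial-credit tally above can be reproduced with a few lines of Python. This is a minimal sketch: the order of the credits in the list is illustrative, since only the totals (two full credits and two half credits out of seven) come from the analysis.

```python
# Per-question credit: 1 = correct, 0.5 = partial credit, 0 = incorrect.
# The ordering is illustrative; only the tally (two full, two partial,
# three incorrect) is taken from the analysis above.
credits = [1, 1, 0.5, 0.5, 0, 0, 0]

score = sum(credits)
accuracy = score / len(credits)
print(f"{score:g}/{len(credits)} = {accuracy:.0%}")
```

Keeping the per-question credits in a list makes it easy to re-score after a rerun, which is useful because the model responses can vary between runs.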

## Long Context Window

Now let's take advantage of the 2M context window with Gemini 1.5 Pro and see if accuracy improves by feeding the entire novel text as context.

### Download Novel

In [None]:
import requests

url = "https://www.gutenberg.org/ebooks/766.txt.utf-8"
try:
    response = requests.get(url, timeout=60)
    response.raise_for_status()  # Raise an exception for bad status codes (4xx and 5xx)
except requests.exceptions.RequestException as e:
    print(f"Error downloading file: {e}")
    raise  # Stop here: the cells below need the novel text

novel = response.text

### Naive approach

We will first construct our prompt in a naive way by stuffing the entirety of the novel into the prompt and asking it one question at a time.

<div class="alert alert-block alert-info">

💡 <b>TIP</b> <br/>
<hr/>

When working with long context prompts, you can follow a few prompting strategies: <br/>

<ol>
    <li>It’s important to structure your prompt so that input data (documents) is separated from the instructions. In the prompt template, we use XML tags to separate the document from the instructions. This helps Gemini 1.5 Pro disambiguate data from instructions and process the prompt optimally.</li>
    <li>Location of instruction and user input matters! Documents are added first followed by instructions and user input/question. This placement helps the model to address the question better.</li>
</ol>
</div>

In [None]:
%%time
prompt_template = """
Your task is to read the full text of the novel David Copperfield and then answer the questions below. 

<document>
{context}
</document>

Based on the novel text provided, answer the following: 
{question}
"""

df_noncache = evaluate(questions, answers, prompt_template, novel, gemini_pro_model)
df_noncache

CPU times: user 10.7 s, sys: 93.8 ms, total: 10.8 s
Wall time: 3min 8s


Unnamed: 0,question,ground_truth,model_response,input_token_count,output_token_count
0,What are the first objects that David can recall from his infancy?,"His mother with her pretty hair, and Peggotty.","The first objects David Copperfield remembers from his infancy are his mother, with her pretty hair and youthful shape, and Peggotty, with ""no shape at all"" and dark eyes, red cheeks, and hard arms. \n",539714,48
1,"At the inn where the mail stops, what is painted on the door?",DOLPHIN,The door has **DOLPHIN** painted on it. \n,539716,14
2,What name does David's aunt suggest to Mr. Dick they call him by?,Trotwood Copperfield,"David's aunt suggests to Mr. Dick that they call him ""Trotwood"". \n",539718,20
3,Describe the room in which Mr. Copperfield meets Steerforth for breakfast.,"A snug private apartment, red-curtained and Turkey-carpeted, where the fire burnt bright, and a fine hot breakfast was set forth on a table covered with a clean cloth; and a cheerful miniature of the room, the fire, the breakfast, Steerforth, and all, was shining in the little round mirror over the sideboard.","The room where David Copperfield meets Steerforth for breakfast is described as a ""snug private apartment,"" a welcome contrast to the ""little loft over a stable"" David had been initially assigned. It is decorated with red curtains and has a Turkey carpet. A bright fire burns in the fireplace, and a ""fine hot breakfast"" is laid out on a table covered with a clean tablecloth. A ""cheerful miniature"" of the room, including the fire and breakfast, is reflected in a small, round mirror over the sideboard. \n",539716,107
4,"After his engagement to Dora, David write Agnes a letter. What is the letter about?",He writes to Agnes to assure her of his deep love for Agnes and that this was not a passing fancy or hasty decision.,"After David gets engaged to Dora, he writes Agnes a long letter telling her how happy he is and how wonderful Dora is. He describes his love for Dora as profound and unlike anything ever known, trying to convince Agnes that this is not a passing fancy like his childhood infatuations. He wants her to understand the depth and seriousness of his feelings for Dora. \n",539719,73
5,What does David find in the hotel where Mr. Micawber requested him to meet in the middle of the night?,A letter stating that he would appear in the morning at half past nine.,"In the hotel where Mr. Micawber requested to meet David in the middle of the night, David finds a **letter**. \n\nThis letter informs David that Mr. Micawber will be appearing in the morning at precisely half past nine. \n",539725,52
6,Who are David's final thoughts about in the book?,Agnes,"At the end of the book, David's final thoughts are about **Agnes**. He reflects on their journey together through life, surrounded by their children and friends. He acknowledges her unwavering support and guidance through his darkest times, and expresses his enduring love for her. He sees her as a guiding light, a source of strength and inspiration, and hopes that her face will be the last thing he sees before he dies. \n",539713,86


#### Analysis

- Latency: ~3 minutes
- Cost: \$17.24
- Accuracy: 7/7

Unsurprisingly, latency is much greater since we're increasing our prompt size by more than 500K tokens.

Cost also increases significantly. At the time of this writing (refer to the [pricing](https://cloud.google.com/vertex-ai/generative-ai/pricing) page for the latest), Gemini 1.5 Pro costs \$0.00125/1k input characters. The novel is 1,970,730 characters, which amounts to \$2.46 per invocation, or \$17.24 for all 7 questions.

However we now have 100% accuracy.
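The cost figures above follow from simple arithmetic, sketched below. The rate and character count are the ones quoted in this notebook and may be out of date. The helper name `input_cost` is illustrative. The sketch ignores the few dozen characters of question text and the output tokens, which are negligible next to the novel.

```python
# Values quoted in this notebook; check the pricing page for current rates.
PRICE_PER_1K_INPUT_CHARS = 0.00125  # USD per 1k input characters
NOVEL_CHARS = 1_970_730
NUM_QUESTIONS = 7


def input_cost(num_chars, price_per_1k_chars):
    """Return the input cost in USD for a prompt of `num_chars` characters."""
    return num_chars / 1000 * price_per_1k_chars


cost_per_call = input_cost(NOVEL_CHARS, PRICE_PER_1K_INPUT_CHARS)
total_cost = cost_per_call * NUM_QUESTIONS
print(f"per call: ${cost_per_call:.2f}, total: ${total_cost:.2f}")
```

Because the whole novel is resent with every question, the total grows linearly with the number of questions.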

### Batching multiple questions

One way to save on cost and latency when dealing with long contexts is by batching multiple questions into one prompt. Let's try asking all 7 of our questions at once.

In [None]:
%%time
prompt_template = """
Your task is to read the full text of the novel David Copperfield and then answer the questions below. 

<document>
{context}
</document>

Based on the novel text provided, answer the following: 
{question}
"""

prompt = prompt_template.format(context=novel, question=questions)
response = gemini_pro_model.generate_content(prompt).text
Markdown(response)

CPU times: user 1.54 s, sys: 16.6 ms, total: 1.56 s
Wall time: 36.3 s


Here are the answers to your questions, based on the provided text of *David Copperfield*:

1. **What are the first objects that David can recall from his infancy?**  The first objects David remembers are his mother, with her pretty hair and youthful shape, and Peggotty, with her dark eyes and hard, red cheeks and arms.

2. **At the inn where the mail stops, what is painted on the door?** The door of David's room at the inn has a DOLPHIN painted on it.

3. **What name does David's aunt suggest to Mr. Dick they call him by?** David's aunt suggests they call him Trotwood, which later gets shortened to Trot.

4. **Describe the room in which Mr. Copperfield meets Steerforth for breakfast.** Steerforth and David have breakfast in a 'snug private apartment, red-curtained and Turkey-carpeted, where the fire burnt bright, and a fine hot breakfast was set forth on a table covered with a clean cloth'. A miniature of the room is reflected in a 'little round mirror over the sideboard'.

5. **After his engagement to Dora, David writes Agnes a letter. What is the letter about?** David writes to tell Agnes about his engagement to Dora, trying to convey how deeply in love he is and how wonderful Dora is. He also mentions the sadness at Yarmouth over Emily's disappearance, saying it is a double wound for him due to the circumstances. He doesn't mention Steerforth by name.

6. **What does David find in the hotel where Mr. Micawber requested him to meet in the middle of the night?** David finds a letter from Mr. Micawber stating that there is no hope of the remittance he was expecting and that he is financially ruined. The letter is written in a melodramatic style, typical of Mr. Micawber.

7. **Who are David's final thoughts about in the book?** David's final thoughts are about Agnes, his wife. He reflects on how she has always been his guiding light and how much he loves her. He expresses his hope that her face will be with him when he dies. 


#### Analysis

- Latency: 37 seconds
- Cost: \$2.47
- Accuracy: 6/7

While we save considerably on latency and cost, the model now hallucinates the answer to question 6. Asking several questions in a single prompt, while efficient in cost and latency, can sacrifice accuracy.

You can treat the number of questions per prompt as a sort of hyperparameter, reducing it to prioritize accuracy and increasing it to prioritize cost and/or latency.
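One way to make the questions-per-prompt setting tunable is to split the question list into fixed-size batches, as in the sketch below. The helper names `make_batches` and `format_batch` are illustrative. The numbered-list formatting is an assumption as well, since the cell above passes the raw Python list into the template.

```python
def make_batches(questions, batch_size):
    """Split `questions` into consecutive batches of at most `batch_size`."""
    return [
        questions[i : i + batch_size] for i in range(0, len(questions), batch_size)
    ]


def format_batch(batch):
    """Render a batch as a numbered list for the {question} slot of a prompt."""
    return "\n".join(f"{n}. {q}" for n, q in enumerate(batch, start=1))


# Placeholder questions; substitute the real `questions` list.
sample_questions = [f"Question {i}" for i in range(1, 8)]
batches = make_batches(sample_questions, batch_size=3)
for batch in batches:
    print(format_batch(batch))
    print("---")
```

Each formatted batch can then be passed as `question` to `prompt_template.format(...)`, with `batch_size` adjusted until accuracy is acceptable.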

### Context Caching

In cases where we anticipate multiple model invocations about the same long context, instead of passing the whole context in the prompt each time we can take advantage of [context caching](https://cloud.google.com/vertex-ai/generative-ai/docs/context-cache/context-cache-overview).

Caching can be combined with batching to further reduce cost and latency. For example if you have 100 questions, ask in 10 batches of 10 questions. However in this case for the sake of comparison we will forgo batching and ask only one question per prompt.

A few things to note with context caching:
- The minimum size of a context cache is 32K tokens.
- By default, each context cache has an expiration time of 60 minutes, which can be updated either at or after cache creation.

In [None]:
from vertexai.preview import caching
from vertexai.preview.generative_models import GenerativeModel

system_instruction = """
Your task is to read the full text of the novel David Copperfield and then answer the questions below. 
"""

contents = [Part.from_text(novel)]

cached_content = caching.CachedContent.create(
    model_name="gemini-1.5-pro-001",
    system_instruction=system_instruction,
    contents=contents,
    ttl=datetime.timedelta(minutes=10),
)
cached_content = caching.CachedContent(cached_content_name=cached_content.name)

model_cached = GenerativeModel.from_cached_content(
    cached_content=cached_content,
    generation_config=GENERATION_CONFIG,
    safety_settings=SAFETY_CONFIG,
)

In [None]:
%%time
cached_prompt_template = "Answer the following question from the full text: {question}"

df_cache = evaluate(
    questions,
    answers,
    cached_prompt_template,
    novel,
    model_cached,
    is_context_cached=True,
)
df_cache

CPU times: user 90.7 ms, sys: 21.3 ms, total: 112 ms
Wall time: 58.5 s


Unnamed: 0,question,ground_truth,model_response,input_token_count,output_token_count
0,What are the first objects that David can recall from his infancy?,"His mother with her pretty hair, and Peggotty.","The first objects David Copperfield remembers from his infancy are his mother, with her pretty hair and youthful shape, and Peggotty, with ""no shape at all"" and dark eyes, red cheeks, and hard arms. \n",539701,48
1,"At the inn where the mail stops, what is painted on the door?",DOLPHIN,The door has **DOLPHIN** painted on it. \n,539703,14
2,What name does David's aunt suggest to Mr. Dick they call him by?,Trotwood Copperfield,"David's aunt suggests to Mr. Dick that they call David ""Trotwood"". \n",539705,20
3,Describe the room in which Mr. Copperfield meets Steerforth for breakfast.,"A snug private apartment, red-curtained and Turkey-carpeted, where the fire burnt bright, and a fine hot breakfast was set forth on a table covered with a clean cloth; and a cheerful miniature of the room, the fire, the breakfast, Steerforth, and all, was shining in the little round mirror over the sideboard.","Mr. Copperfield describes the room where he has breakfast with Steerforth as a ""snug private apartment."" It is decorated with red curtains and has a Turkey carpet. A bright fire burns in the fireplace, and a hot breakfast is laid out on a table covered with a clean tablecloth. The narrator notes that a miniature reflection of the cozy scene, including the fire, breakfast, and Steerforth himself, is visible in the small, round mirror hanging over the sideboard. \n",539703,96
4,"After his engagement to Dora, David write Agnes a letter. What is the letter about?",He writes to Agnes to assure her of his deep love for Agnes and that this was not a passing fancy or hasty decision.,"After getting engaged to Dora, David writes a long letter to Agnes telling her about his engagement and how blissful he is. He describes how much he adores Dora and tries to convince Agnes that this love is different from the boyish fancies they used to joke about, claiming its depth is unfathomable. He avoids mentioning Steerforth and only tells her about the sadness in Yarmouth due to Emily's flight, which has caused him additional pain because of the circumstances surrounding it. \n",539706,98
5,What does David find in the hotel where Mr. Micawber requested him to meet in the middle of the night?,A letter stating that he would appear in the morning at half past nine.,David finds a letter from Mr. Micawber stating that he will appear in the morning at half past nine. \n,539712,25
6,Who are David's final thoughts about in the book?,Agnes,"At the end of the book, David's final thoughts are about **Agnes**. He reflects on their journey together through life, surrounded by their children and friends. He acknowledges her unwavering support and guidance, and expresses his enduring love for her. He sees her as a guiding light, a source of solace and inspiration, and hopes that her presence will remain with him until the end of his life. \n",539700,82


#### Analysis

- Latency: 2.2 minutes
- Cost: \$1.60 (\$1.23 query cost + \$0.37 storage for 10 minutes)
- Accuracy: 7/7


Cached input is discounted to \$0.000625/1k input characters (> 128K context window), half the \$0.00125 rate used in the naive cost estimate above, plus a storage charge of \$0.001125/1k characters/hour.

The more often you query, the more the caching approach saves. Latency also improves relative to the naive approach (2.2 minutes versus about 3), and accuracy is back at 100%.
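To see how the savings scale with the number of queries, the sketch below compares resending the novel every time against querying a cached copy. It uses only the rates quoted in this notebook, which may be out of date. The function names are illustrative, and the sketch assumes no additional charge for creating the cache, which may not hold in practice.

```python
# Rates quoted in this notebook (USD); check the pricing page for current values.
STANDARD_PER_1K_CHARS = 0.00125  # uncached input
CACHED_PER_1K_CHARS = 0.000625  # cached input
STORAGE_PER_1K_CHARS_HOUR = 0.001125  # cache storage, per hour
NOVEL_CHARS = 1_970_730


def uncached_cost(n_queries):
    """Cost of sending the full novel with each of `n_queries` prompts."""
    return n_queries * NOVEL_CHARS / 1000 * STANDARD_PER_1K_CHARS


def cached_cost(n_queries, storage_hours):
    """Cost of `n_queries` against a cache kept alive for `storage_hours`.

    Assumption: cache creation itself is not billed separately.
    """
    query = n_queries * NOVEL_CHARS / 1000 * CACHED_PER_1K_CHARS
    storage = NOVEL_CHARS / 1000 * STORAGE_PER_1K_CHARS_HOUR * storage_hours
    return query + storage


for n in (1, 10, 100):
    print(n, round(uncached_cost(n), 2), round(cached_cost(n, storage_hours=10 / 60), 2))
```

At these rates, one cached query plus 10 minutes of storage comes to about \$1.60 (\$1.23 + \$0.37). The storage charge is fixed for a given time-to-live, so it is spread over more queries as usage grows.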

# Conclusion

| Trial | Accuracy | Latency | Cost |
|---|---|---|---|
| **Baseline** | 43% (3/7) | 0.4 min | \$0.004 |
| **LCW - Naive** | 100% (7/7) | 3 min | \$17.24 |
| **LCW - Batched** | 86% (6/7) | 0.6 min | \$2.47 |
| **LCW - Cached** | 100% (7/7) | 2.2 min | \$1.60 |

We have demonstrated various approaches to long context prompting and compared them across the dimensions of latency, cost and accuracy. 

Whenever you use the same long context across multiple prompts, caching is a great option to reduce cost. You can amplify the cost savings and reduce latency by batching, but be careful: batching too many questions starts to hurt accuracy.