# Prompt Caching

Load all of the app data into the model's context, then ask it questions with follow-ups.

https://ai.google.dev/gemini-api/docs/caching?lang=python



In [1]:
%load_ext autoreload
%autoreload 2

In [2]:
%pip install -qU google-generativeai python-dotenv pandas



### Setup

#### Load Env

In [3]:
import os
import google.generativeai as genai
from google.generativeai import caching
import datetime
import time
from dotenv import load_dotenv
from utils import get_app_store_data, get_context

load_dotenv()

genai.configure(api_key=os.environ['GOOGLE_API_KEY'])



#### Get all of the app store data

In [4]:
# Get app store data
df = get_app_store_data()

# Get 150,000 tokens of context
app_data_str, app_df = get_context(150000, df)
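`get_context` comes from the local `utils` module and isn't shown in this notebook. As a rough idea of what a token-budgeted context builder could look like, here is a minimal sketch. The names `estimate_tokens` and `build_context`, the 4-characters-per-token estimate, and the row format are all assumptions for illustration, not the actual helper.

```python
def estimate_tokens(text: str, chars_per_token: int = 4) -> int:
    """Very rough token estimate (assumption: about 4 characters per token)."""
    return max(1, len(text) // chars_per_token)


def build_context(rows: list[str], max_tokens: int) -> tuple[str, list[str]]:
    """Hypothetical stand-in for get_context: keep rows until the budget is used up."""
    kept, used = [], 0
    for row in rows:
        cost = estimate_tokens(row)
        if used + cost > max_tokens:
            break  # the next row would exceed the budget, so stop here
        kept.append(row)
        used += cost
    return "\n".join(kept), kept


# Made-up rows, only to show the call shape.
rows = ["App A | Games | 4.5", "App B | Health & Fitness | 4.1", "App C | Games | 3.9"]
context, kept = build_context(rows, max_tokens=10)
print(len(kept), "rows kept")
```

A character-based estimate like this will drift from the model's real tokenizer, so the cached token count the API reports may not match the requested budget exactly.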


#### System Prompt and Examples

In [14]:
system_prompt = """You are an App Store Data Analyzer. You should analyze the provided app store data and answer the user's questions.

- Only use the App Store Data provided in your context.
- Do not answer questions you are not confident in answering because the answer can't be found in the provided context.
- Think through your answer slowly, step by step before providing the final answer."""


In [None]:
print(app_data_str)

## Gemini

The Gemini docs say:
> The model doesn't make any distinction between cached tokens and regular input tokens. Cached content is simply a prefix to the prompt.

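That means we can pretty much only cache the system prompt, plus any static context that sits directly behind it (the app data here). You wouldn't be able to cache search results or function call responses, since those show up mid-conversation rather than in the prefix. We can't cache tools either. It does make sense though: in an agent, the only thing that is static is the system prompt.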
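Because the cached content is just a prefix, each request's prompt is the cached prefix plus whatever is new in that request. The usage metadata printed further down bears this out: `prompt_token_count` includes the cached tokens, so the difference between the two counts is the part that was not served from the cache. A small sketch of that arithmetic, using the counts from this notebook's own output (the helper name `uncached_prompt_tokens` is made up for illustration):

```python
def uncached_prompt_tokens(prompt_token_count: int, cached_content_token_count: int) -> int:
    """Prompt tokens that were not served from the cache (hypothetical helper)."""
    return prompt_token_count - cached_content_token_count


# Counts copied from the first two responses in this notebook's output.
first = uncached_prompt_tokens(157119, 157111)
second = uncached_prompt_tokens(157124, 157111)
print(first, second)
```

Only the question itself (a handful of tokens) falls outside the cache in each call.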

In [18]:
examples = [
    "What app has the most ratings?",
    "What are the features for the app 'Online Head Ball'?",
    "What is the most expensive app in the 'Games' category?",
    # "Which app has the longest description in the app store?",
    # "What is the average rating of all free apps?",
    # "Identify any app that is paid and has fewer than 100 ratings.",
    # "List all apps that are categorized under 'Games' and have more than 400,000 ratings.",
    # "Which app has the lowest price in the app store?",
    # "Find all apps that have a title starting with the letter 'A'.",
    # "What is the total number of ratings for all apps in the 'Health & Fitness' category?"
]

In [16]:
# Create a cache with a 1 minute TTL
cache = caching.CachedContent.create(
    model='models/gemini-1.5-flash-001',
    display_name='App Store Data',  # used to identify the cache
    system_instruction=system_prompt,
    contents=[app_data_str],
    ttl=datetime.timedelta(minutes=1),
)
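The TTL passed in the cell above is one minute, so the cache can expire while you are still working through questions. The sketch below is local bookkeeping only: it does not call the Gemini API, and `is_cache_fresh` and its `safety_margin` parameter are hypothetical names, assuming you record the creation time yourself.

```python
import datetime


def is_cache_fresh(created_at, ttl, now=None,
                   safety_margin=datetime.timedelta(seconds=10)):
    """Return True if the cache should still be alive, with a small safety margin.

    Hypothetical helper: compares timestamps locally, never asks the API.
    """
    if now is None:
        now = datetime.datetime.now(datetime.timezone.utc)
    return now + safety_margin < created_at + ttl


created_at = datetime.datetime.now(datetime.timezone.utc)  # record right after creating the cache
ttl = datetime.timedelta(minutes=1)
print(is_cache_fresh(created_at, ttl))
```

If the check fails, recreate the cache or extend its TTL as described in the caching docs linked at the top before sending more questions.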

In [19]:
model = genai.GenerativeModel.from_cached_content(cached_content=cache)

for example in examples:
    # Ask the cached model one question at a time
    response = model.generate_content([example])

    print(response.usage_metadata)
    print(f"Question: {example}")
    print(f"Answer: {response.text}")
    print("\n======\n")

prompt_token_count: 157119
cached_content_token_count: 157111
candidates_token_count: 21
total_token_count: 157140

Question: What app has the most ratings?
Answer: The app with the most ratings is **Township**, with 80,801 ratings. 



prompt_token_count: 157124
cached_content_token_count: 157111
candidates_token_count: 138
total_token_count: 157262

Question: What are the features for the app 'Online Head Ball'?
Answer: Here are the features for the app 'Online Head Ball':

- Play with more than 20 million players 
- Almost 200 different characters and even more accessories
- Super Powers that bring more excitement to matches
- Cross-platform gameplay support
- Realistic stadiums and sound effects
- Playing 1 on 1 matches against friends
- Ability to play without internet in Offline Mode
- Weekly seasons with in-game rewards
- Continuous rivalry with all-time leaderboards
- Community events and promotional in-game rewards
- New characters being added with updates
- Fast and fluent ga