# Prompting

In [2]:
import google.generativeai as genai
from IPython.display import HTML, Markdown, display
import os

genai.configure(api_key=os.environ["GOOGLE_API_KEY"])


Single-turn, text-in/text-out structure

In [3]:
flash = genai.GenerativeModel('gemini-1.5-flash')
response = flash.generate_content("Explain AI to me like I'm a kid.")
print(response.text)

Imagine you have a really smart robot friend. This robot can learn things just like you do! You can teach it by showing it pictures, telling it stories, or playing games with it. The more you teach it, the smarter it gets.

That's kind of like AI (Artificial Intelligence). It's a computer program that can learn and solve problems like a human. AI can do lots of cool things, like:

* **Play games:** AI can beat you at chess or even video games!
* **Answer questions:** Ask AI a question and it can try to give you an answer.
* **Recognize faces:** AI can look at pictures and tell you who's in them.
* **Write stories:** Some AIs can even write stories and poems.

AI is still learning and getting smarter every day. It can help us do many things, like make our lives easier, learn new things, and even solve problems in the world. It's like having a really cool robot friend that can help us do amazing things! 



In [4]:
Markdown(response.text)

Imagine you have a really smart robot friend. This robot can learn things just like you do! You can teach it by showing it pictures, telling it stories, or playing games with it. The more you teach it, the smarter it gets.

That's kind of like AI (Artificial Intelligence). It's a computer program that can learn and solve problems like a human. AI can do lots of cool things, like:

* **Play games:** AI can beat you at chess or even video games!
* **Answer questions:** Ask AI a question and it can try to give you an answer.
* **Recognize faces:** AI can look at pictures and tell you who's in them.
* **Write stories:** Some AIs can even write stories and poems.

AI is still learning and getting smarter every day. It can help us do many things, like make our lives easier, learn new things, and even solve problems in the world. It's like having a really cool robot friend that can help us do amazing things! 


**Start a chat**

Multi-turn chat structure

In [5]:
chat = flash.start_chat(history=[])
response = chat.send_message("Hello! My name is Maharajan.")
print(response.text)

Hello Maharajan! It's nice to meet you. 😊  What can I do for you today? 



In [6]:
response = chat.send_message("Can you tell something interesting about dinosaurs")
print(response.text)

You bet! Dinosaurs are fascinating creatures, and there's a lot to learn about them. Here are a few interesting facts:

* **The biggest land animal ever was a dinosaur:** The Argentinosaurus, a long-necked sauropod, could have reached up to 100 feet long and weighed over 100 tons! That's bigger than a Boeing 737 airplane!

* **Some dinosaurs had feathers:**  While we often imagine dinosaurs as scaly beasts, many, including the Velociraptor, had feathers.  Some even used them for display or flight!

* **Dinosaurs ruled the Earth for over 180 million years:** That's a very long time! During that time, they evolved into a vast array of shapes, sizes, and lifestyles. 

* **Dinosaurs lived on all the continents:** Even Antarctica, which was much warmer millions of years ago, was home to dinosaurs!

* **Not all dinosaurs were giants:**  While some were truly enormous, many were smaller than a chicken.  The Compsognathus, one of the smallest, was about the size of a turkey. 

* **The T-Rex wa

In [7]:
# While you have the `chat` object around, the conversation state
# persists. Confirm that by asking if it knows my name.
response = chat.send_message('Do you remember what my name is?')
print(response.text)

Of course! You are Maharajan.  😊  It's great to be talking with you.  Do you want to continue learning about dinosaurs, or is there something else you'd like to chat about? 
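The name is remembered because each new message is sent together with the earlier turns. The SDK's chat object keeps those turns in `chat.history`; the sketch below is a simplified stand-in for that state, assuming alternating `user` and `model` roles, and the `record_turn` helper is hypothetical, not part of the SDK.

```python
# Simplified stand-in for the conversation state a chat object keeps.
# Assumed layout: one dict per turn, with a role and a list of text parts.
history = []

def record_turn(history, role, text):
    """Append one turn (hypothetical helper, not an SDK function)."""
    history.append({"role": role, "parts": [text]})

record_turn(history, "user", "Hello! My name is Maharajan.")
record_turn(history, "model", "Hello Maharajan! It's nice to meet you.")
record_turn(history, "user", "Do you remember what my name is?")

# Every new request carries the earlier turns along with the latest message.
for turn in history:
    print(f'{turn["role"]}: {turn["parts"][0]}')
```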



**Choose a model**

Available models and their capabilities

In [8]:
for model in genai.list_models():
    print(model.name)

models/chat-bison-001
models/text-bison-001
models/embedding-gecko-001
models/gemini-1.0-pro-latest
models/gemini-1.0-pro
models/gemini-pro
models/gemini-1.0-pro-001
models/gemini-1.0-pro-vision-latest
models/gemini-pro-vision
models/gemini-1.5-pro-latest
models/gemini-1.5-pro-001
models/gemini-1.5-pro-002
models/gemini-1.5-pro
models/gemini-1.5-pro-exp-0801
models/gemini-1.5-pro-exp-0827
models/gemini-1.5-flash-latest
models/gemini-1.5-flash-001
models/gemini-1.5-flash-001-tuning
models/gemini-1.5-flash
models/gemini-1.5-flash-exp-0827
models/gemini-1.5-flash-002
models/gemini-1.5-flash-8b
models/gemini-1.5-flash-8b-001
models/gemini-1.5-flash-8b-latest
models/gemini-1.5-flash-8b-exp-0827
models/gemini-1.5-flash-8b-exp-0924
models/embedding-001
models/text-embedding-004
models/aqa


In [10]:
for model in genai.list_models():
    if model.name == 'models/gemini-1.5-flash':
        print(model)
        break

Model(name='models/gemini-1.5-flash',
      base_model_id='',
      version='001',
      display_name='Gemini 1.5 Flash',
      description='Fast and versatile multimodal model for scaling across diverse tasks',
      input_token_limit=1000000,
      output_token_limit=8192,
      supported_generation_methods=['generateContent', 'countTokens'],
      temperature=1.0,
      max_temperature=2.0,
      top_p=0.95,
      top_k=40)
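The `supported_generation_methods` field shown above can be used to narrow the listing to models that fit a task. A minimal sketch, using stand-in objects so it runs without an API key; the `models_supporting` helper and the sample entries are illustrative, and with the real SDK you would pass `genai.list_models()` instead.

```python
from types import SimpleNamespace

def models_supporting(models, method="generateContent"):
    """Return the names of models whose supported methods include `method`."""
    return [m.name for m in models if method in m.supported_generation_methods]

# Illustrative stand-ins that mimic two fields of the real model objects.
sample_models = [
    SimpleNamespace(name="models/gemini-1.5-flash",
                    supported_generation_methods=["generateContent", "countTokens"]),
    SimpleNamespace(name="models/embedding-001",
                    supported_generation_methods=["embedContent"]),
]

print(models_supporting(sample_models))
```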


## Explore generation parameters

**Output length**

When generating text with an LLM, the output length affects cost and performance. Generating more tokens increases computation, leading to higher energy consumption, latency, and cost.

To stop the model from generating tokens past a limit, you can specify the `max_output_tokens` parameter when using the API.

Prompt engineering may be required to generate a more complete output within your given limit.


In [11]:
short_model = genai.GenerativeModel(
    'gemini-1.5-flash',
    generation_config=genai.GenerationConfig(max_output_tokens=200))

response = short_model.generate_content('Write a 1000 word essay on the importance of olives in modern society.')
print(response.text)

## The Enduring Significance of Olives: A Modern Tale of Antiquity

The olive tree, a symbol of peace, longevity, and prosperity, has graced the Mediterranean landscape for millennia. Its fruit, the olive, has transcended its culinary role to become a cornerstone of various aspects of modern society, weaving its influence through food, health, and even the very fabric of the environment. This essay will explore the diverse and enduring significance of olives in the 21st century, highlighting their multifaceted impact on our lives.

Firstly, olives are undeniably a culinary staple in many cultures. The distinct, briny flavor and versatility of olives have made them a ubiquitous ingredient in Mediterranean cuisine. From the classic Greek olive oil and Kalamata olives to the vibrant tapenade spread in France, olives add a unique depth of flavor to countless dishes. Their versatility extends beyond savory applications as well. Olives are used in salads, pizzas, pasta sauces, and even desse

In [12]:
response = short_model.generate_content('Write a short poem on the importance of olives in modern society.')
print(response.text)

A tiny fruit, a verdant sphere,
A symbol of the earth held dear.
The olive, small, yet mighty strong,
A taste of history, all along.

From ancient feasts to modern plates,
It graces tables, conquers fates.
In oil it sings, a golden gleam,
A culinary masterpiece, it would seem.

From salads bright to savory bread,
Its flavor dances, unsaid.
A source of health, a vibrant hue,
The olive's presence, ever true.

So raise a glass, a toast we share,
To this small fruit, beyond compare.
For in its essence, we find grace,
The olive's story, in time and space. 
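The essay above stops mid-word because it ran into the 200-token limit, while the poem fits within it. One way to detect this in code is to inspect the candidate's finish reason. The sketch below assumes the response exposes `candidates[0].finish_reason` with a `MAX_TOKENS` member, which matches my understanding of the SDK but is worth verifying against your installed version; the stand-in objects let it run without an API call.

```python
from types import SimpleNamespace

def hit_token_limit(response) -> bool:
    """Return True if the first candidate stopped because of the token limit."""
    reason = response.candidates[0].finish_reason
    # Compare by name so the helper works with the SDK enum and with stand-ins.
    return getattr(reason, "name", str(reason)) == "MAX_TOKENS"

# Stand-in responses (assumed shape), one truncated and one complete.
fake_truncated = SimpleNamespace(
    candidates=[SimpleNamespace(finish_reason=SimpleNamespace(name="MAX_TOKENS"))])
fake_complete = SimpleNamespace(
    candidates=[SimpleNamespace(finish_reason=SimpleNamespace(name="STOP"))])

print(hit_token_limit(fake_truncated), hit_token_limit(fake_complete))
```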



**Temperature**

Temperature controls the degree of randomness in token selection. Higher temperatures result in a higher number of candidate tokens from which the next output token is selected, and can produce more diverse results, while lower temperatures have the opposite effect, such that a temperature of 0 results in greedy decoding, selecting the most probable token at each step.

Temperature doesn't provide any guarantees of randomness, but it can be used to "nudge" the output somewhat.
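To build intuition, here is a sketch of the textbook way temperature is applied: the model's raw scores (logits) are divided by the temperature before being turned into probabilities. This is an illustration of the general technique, not a description of how the Gemini API implements it, and the logits are made up.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Turn raw scores into probabilities, scaled by temperature."""
    if temperature <= 0:
        # Treat temperature 0 as greedy decoding: all mass on the top score.
        best = max(range(len(logits)), key=lambda i: logits[i])
        return [1.0 if i == best else 0.0 for i in range(len(logits))]
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical scores for three candidate tokens.
logits = [2.0, 1.0, 0.5]
for t in (0.0, 0.5, 1.0, 2.0):
    probs = softmax_with_temperature(logits, t)
    print(t, [round(p, 3) for p in probs])
```

Lower temperatures concentrate probability on the top-scoring token; higher temperatures flatten the distribution, so less likely tokens are sampled more often.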

In [13]:
from google.api_core import retry

high_temp_model = genai.GenerativeModel(
    'gemini-1.5-flash',
    generation_config=genai.GenerationConfig(temperature=2.0))


# When running lots of queries, it's a good practice to use a retry policy so your code
# automatically retries when hitting Resource Exhausted (quota limit) errors.
retry_policy = {
    "retry": retry.Retry(predicate=retry.if_transient_error, initial=10, multiplier=1.5, timeout=300)
}

for _ in range(5):
  response = high_temp_model.generate_content('Pick a random colour... (respond in a single word)',
                                              request_options=retry_policy)
  if response.parts:
    print(response.text, '-' * 25)

Teal 
 -------------------------
Purple 
 -------------------------
Blue. 
 -------------------------
Purple 
 -------------------------
Purple. 
 -------------------------


Now try the same prompt with temperature set to zero. Note that the output is not completely deterministic, as other parameters affect token selection, but the results will tend to be more stable.

In [14]:
low_temp_model = genai.GenerativeModel(
    'gemini-1.5-flash',
    generation_config=genai.GenerationConfig(temperature=0.0))

for _ in range(5):
  response = low_temp_model.generate_content('Pick a random colour... (respond in a single word)',
                                             request_options=retry_policy)
  if response.parts:
    print(response.text, '-' * 25)

Purple 
 -------------------------
Purple 
 -------------------------
Purple 
 -------------------------
Purple 
 -------------------------
Purple 
 -------------------------


**Top-K and top-P**

Like temperature, top-K and top-P parameters are also used to control the diversity of the model's output.

Top-K is a positive integer that defines the number of most probable tokens from which to select the output token. A top-K of 1 selects a single token, performing greedy decoding.

Top-P defines a cumulative probability threshold: tokens are added as candidates in order of decreasing probability, and once their combined probability exceeds the threshold, no further tokens are considered. A top-P of 0 is typically equivalent to greedy decoding, and a top-P of 1 typically selects every token in the model's vocabulary.

When both are supplied, the Gemini API will filter top-K tokens first, then top-P and then finally sample from the candidate tokens using the supplied temperature.

Run this example a number of times, change the settings and observe the change in output.

In [15]:
model = genai.GenerativeModel(
    'gemini-1.5-flash-001',
    generation_config=genai.GenerationConfig(
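        # Typical defaults for gemini-1.5-flash-001. The model listing earlier in this
        # notebook reports top_k=40 for gemini-1.5-flash, so check it for current values.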
        temperature=1.0,
        top_k=64,
        top_p=0.95,
    ))

story_prompt = "You are a creative writer. Write a short story about a cat who goes on an adventure."
response = model.generate_content(story_prompt, request_options=retry_policy)
print(response.text)

Bartholomew, a ginger tabby with a heart of gold and a belly perpetually rumbling with hunger, was tired of the routine. The same old food bowl, the same old window perch, the same old scratching post – it all felt a little too… predictable. One day, as the sun dipped below the horizon, casting long shadows across the living room, a daring idea sparked in his emerald eyes. He would go on an adventure. 

He snuck out through the cat flap, the cool night air tingling his whiskers. The world was a symphony of scents – damp earth, honeysuckle, and the intoxicating aroma of a distant bakery. Bartholomew, his tail held high, set off towards the beckoning darkness. 

He navigated the maze of back alleys, dodging stray dogs and avoiding the grumpy old lady who always threw water at him. His journey led him to the heart of the city, where towering buildings scraped the sky and neon signs flickered like fireflies. He climbed a fire escape, his claws finding purchase on the rusty metal, and gazed
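The filtering order described in this section (top-K first, then top-P, then sampling) can be sketched in plain Python. This is a simplified illustration with a made-up four-token distribution; it omits the temperature step, and the function names are hypothetical rather than part of any API.

```python
import random

def filter_top_k_top_p(probs, top_k, top_p):
    """Return renormalised {token: prob} after top-K, then top-P filtering."""
    # Rank tokens from most to least probable.
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    # Top-K: keep only the K most probable tokens.
    ranked = ranked[:top_k]
    # Top-P: keep tokens until their cumulative probability reaches top_p.
    kept, cumulative = [], 0.0
    for token, p in ranked:
        kept.append((token, p))
        cumulative += p
        if cumulative >= top_p:
            break
    total = sum(p for _, p in kept)
    return {token: p / total for token, p in kept}

def sample(probs, rng=random):
    """Draw one token according to the filtered probabilities."""
    tokens = list(probs)
    return rng.choices(tokens, weights=[probs[t] for t in tokens], k=1)[0]

# Hypothetical next-token distribution.
probs = {"purple": 0.5, "blue": 0.3, "teal": 0.15, "red": 0.05}
candidates = filter_top_k_top_p(probs, top_k=3, top_p=0.75)
print(candidates)
print(sample(candidates))
```

With `top_k=1` or `top_p=0` the candidate set collapses to the single most probable token, which matches the greedy-decoding behaviour described above.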

## Prompting

**Zero-shot**

Zero-shot prompts are prompts that describe the request for the model directly.

In [16]:
model = genai.GenerativeModel(
    'gemini-1.5-flash-001',
    generation_config=genai.GenerationConfig(
        temperature=0.1,
        top_p=1,
        max_output_tokens=5,
    ))

zero_shot_prompt = """Classify movie reviews as POSITIVE, NEUTRAL or NEGATIVE.
Review: "Her" is a disturbing study revealing the direction
humanity is headed if AI is allowed to keep evolving,
unchecked. I wish there were more movies like this masterpiece.
Sentiment: """

response = model.generate_content(zero_shot_prompt, request_options=retry_policy)
print(response.text)

Sentiment: **POSITIVE**


**Enum mode**

The models are trained to generate text, and can sometimes produce more text than you may wish for. In the preceding example, the model outputs the label, but it can also include a preceding "Sentiment" label and, without an output token limit, may add explanatory text afterwards.

The Gemini API has an Enum mode feature that allows you to constrain the output to a fixed set of values.

In [17]:
import enum

class Sentiment(enum.Enum):
    POSITIVE = "positive"
    NEUTRAL = "neutral"
    NEGATIVE = "negative"


model = genai.GenerativeModel(
    'gemini-1.5-flash-001',
    generation_config=genai.GenerationConfig(
        response_mime_type="text/x.enum",
        response_schema=Sentiment
    ))

response = model.generate_content(zero_shot_prompt, request_options=retry_policy)
print(response.text)
    

positive
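Because the constrained output is one of the enum's values, it can be mapped straight back to a Python enum member. A minimal sketch that repeats the `Sentiment` class so it runs on its own; `parse_sentiment` is a hypothetical helper, and it takes a plain string where you would pass `response.text`.

```python
import enum

class Sentiment(enum.Enum):
    POSITIVE = "positive"
    NEUTRAL = "neutral"
    NEGATIVE = "negative"

def parse_sentiment(text: str) -> Sentiment:
    """Map the model's text output onto a Sentiment member."""
    # Lookup by value raises ValueError for anything outside the enum.
    return Sentiment(text.strip())

label = parse_sentiment("positive")
print(label, label is Sentiment.POSITIVE)
```

The unconstrained zero-shot output (`Sentiment: **POSITIVE**`) would raise `ValueError` here, which is one practical reason to prefer the constrained mode when the result feeds into code.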


**One-shot and few-shot**

Providing an example of the expected response is known as a "one-shot" prompt. When you provide multiple examples, it is a "few-shot" prompt.

In [18]:
model = genai.GenerativeModel(
    'gemini-1.5-flash-latest',
    generation_config=genai.GenerationConfig(
        temperature=0.1,
        top_p=1,
        max_output_tokens=250,
    ))

few_shot_prompt = """Parse a customer's pizza order into valid JSON:

EXAMPLE:
I want a small pizza with cheese, tomato sauce, and pepperoni.
JSON Response:
```
{
"size": "small",
"type": "normal",
"ingredients": ["cheese", "tomato sauce", "pepperoni"]
}
```

EXAMPLE:
Can I get a large pizza with tomato sauce, basil and mozzarella
JSON Response:
```
{
"size": "large",
"type": "normal",
"ingredients": ["tomato sauce", "basil", "mozzarella"]
}
```

ORDER:
"""

customer_order = "Give me a large with cheese & pineapple"


response = model.generate_content([few_shot_prompt, customer_order], request_options=retry_policy)
print(response.text)

```json
{
"size": "large",
"type": "normal",
"ingredients": ["cheese", "pineapple"]
}
``` 
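Note that the reply is wrapped in a Markdown code fence, so it cannot be passed to `json.loads` as-is. A minimal sketch of one way to unwrap it, assuming the reply contains at most one fenced block; `extract_json` is a hypothetical helper and the sample reply is rebuilt by hand from the output above.

```python
import json
import re

# Matches a Markdown code fence (three backticks) with an optional "json" tag.
FENCED_BLOCK = re.compile(r"`{3}(?:json)?\s*(.*?)`{3}", re.DOTALL)

def extract_json(text: str):
    """Parse JSON from model output, whether or not it is fenced."""
    match = FENCED_BLOCK.search(text)
    payload = match.group(1) if match else text
    return json.loads(payload)

fence = "`" * 3
reply = (fence + 'json\n'
         '{"size": "large", "type": "normal", "ingredients": ["cheese", "pineapple"]}\n'
         + fence)
order = extract_json(reply)
print(order["size"], order["ingredients"])
```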



**JSON mode**

To provide control over the schema, and to ensure that you only receive JSON (with no other text or markdown), you can use the Gemini API's JSON mode. This forces the model to constrain decoding, such that token selection is guided by the supplied schema.

In [19]:
import typing_extensions as typing

class PizzaOrder(typing.TypedDict):
    size: str
    ingredients: list[str]
    type: str


model = genai.GenerativeModel(
    'gemini-1.5-flash-latest',
    generation_config=genai.GenerationConfig(
        temperature=0.1,
        response_mime_type="application/json",
        response_schema=PizzaOrder,
    ))

response = model.generate_content("Can I have a large dessert pizza with apple and chocolate")
print(response.text)

{"ingredients": ["apple", "chocolate"], "size": "large", "type": "dessert"}
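Since JSON mode returns bare JSON, the response text can be decoded directly. A minimal sketch that also checks the keys against the schema; it uses the standard library's `typing.TypedDict` rather than `typing_extensions` to stay dependency-free (Python 3.9+ for `list[str]`), and `parse_order` is a hypothetical helper that takes a string where you would pass `response.text`.

```python
import json
from typing import TypedDict

class PizzaOrder(TypedDict):
    size: str
    ingredients: list[str]
    type: str

def parse_order(text: str) -> PizzaOrder:
    """Decode a JSON-mode response and check it has exactly the schema's keys."""
    data = json.loads(text)
    expected = set(PizzaOrder.__annotations__)
    if set(data) != expected:
        raise ValueError(f"unexpected keys: {sorted(set(data) ^ expected)}")
    return data

order = parse_order(
    '{"ingredients": ["apple", "chocolate"], "size": "large", "type": "dessert"}')
print(order["type"], order["ingredients"])
```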



**Chain of Thought (CoT)**

Direct prompting on LLMs can return answers quickly and (in terms of output token usage) efficiently, but the answers can be prone to hallucination. An answer may "look" correct (in terms of language and syntax) but be incorrect in terms of factuality and reasoning.

Chain-of-Thought prompting is a technique where you instruct the model to output intermediate reasoning steps, and it typically gets better results, especially when combined with few-shot examples. It is worth noting that this technique doesn't completely eliminate hallucinations, and that it tends to cost more to run, due to the increased token count.

As models like the Gemini family are trained to be "chatty" and provide reasoning steps, you can ask the model to be more direct in the prompt.

In [21]:
prompt = """When I was 4 years old, my partner was 3 times my age. Now, I
am 20 years old. How old is my partner? Return the answer directly."""

model = genai.GenerativeModel('gemini-1.5-flash-latest')
response = model.generate_content(prompt, request_options=retry_policy)

print(response.text)

52 



In [22]:
prompt = """When I was 4 years old, my partner was 3 times my age. Now,
I am 20 years old. How old is my partner? Let's think step by step."""

response = model.generate_content(prompt, request_options=retry_policy)
print(response.text)

Here's how to solve this:

* **When you were 4:** Your partner was 3 times your age, meaning they were 4 * 3 = 12 years old.
* **Age difference:** Your partner is 12 - 4 = 8 years older than you.
* **Current age:** Since you are now 20 years old, your partner is 20 + 8 = **28 years old**. 
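The step-by-step reasoning is easy to check with a few lines of arithmetic, which also shows why the direct answer of 52 is wrong: the age gap stays constant, the multiplier does not.

```python
my_age_then, multiplier, my_age_now = 4, 3, 20

partner_age_then = my_age_then * multiplier   # 4 * 3 = 12
age_gap = partner_age_then - my_age_then      # 12 - 4 = 8
partner_age_now = my_age_now + age_gap        # 20 + 8 = 28

print(partner_age_now)  # prints 28
```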



**ReAct: Reason and act**

In this example you will run a ReAct prompt directly in the Gemini API and perform the searching steps yourself. As this prompt follows a well-defined structure, there are frameworks available that wrap the prompt into easier-to-use APIs that make tool calls automatically, such as the LangChain example from the chapter.

In [23]:
model_instructions = """
Solve a question answering task with interleaving Thought, Action, Observation steps. Thought can reason about the current situation,
Observation is understanding relevant information from an Action's output and Action can be one of three types:
 (1) <search>entity</search>, which searches the exact entity on Wikipedia and returns the first paragraph if it exists. If not, it
     will return some similar entities to search and you can try to search the information from those topics.
 (2) <lookup>keyword</lookup>, which returns the next sentence containing keyword in the current context. This only does exact matches,
     so keep your searches short.
 (3) <finish>answer</finish>, which returns the answer and finishes the task.
"""

example1 = """Question
Musician and satirist Allie Goertz wrote a song about the "The Simpsons" character Milhouse, who Matt Groening named after who?

Thought 1
The question simplifies to "The Simpsons" character Milhouse is named after who. I only need to search Milhouse and find who it is named after.

Action 1
<search>Milhouse</search>

Observation 1
Milhouse Mussolini Van Houten is a recurring character in the Fox animated television series The Simpsons voiced by Pamela Hayden and created by Matt Groening.

Thought 2
The paragraph does not tell who Milhouse is named after, maybe I can look up "named after".

Action 2
<lookup>named after</lookup>

Observation 2
Milhouse was named after U.S. president Richard Nixon, whose middle name was Milhous.

Thought 3
Milhouse was named after U.S. president Richard Nixon, so the answer is Richard Nixon.

Action 3
<finish>Richard Nixon</finish>
"""

example2 = """Question
What is the elevation range for the area that the eastern sector of the Colorado orogeny extends into?

Thought 1
I need to search Colorado orogeny, find the area that the eastern sector of the Colorado orogeny extends into, then find the elevation range of the area.

Action 1
<search>Colorado orogeny</search>

Observation 1
The Colorado orogeny was an episode of mountain building (an orogeny) in Colorado and surrounding areas.

Thought 2
It does not mention the eastern sector. So I need to look up eastern sector.

Action 2
<lookup>eastern sector</lookup>

Observation 2
The eastern sector extends into the High Plains and is called the Central Plains orogeny.

Thought 3
The eastern sector of Colorado orogeny extends into the High Plains. So I need to search High Plains and find its elevation range.

Action 3
<search>High Plains</search>

Observation 3
High Plains refers to one of two distinct land regions

Thought 4
I need to instead search High Plains (United States).

Action 4
<search>High Plains (United States)</search>

Observation 4
The High Plains are a subregion of the Great Plains. From east to west, the High Plains rise in elevation from around 1,800 to 7,000 ft (550 to 2,130m).

Thought 5
High Plains rise in elevation from around 1,800 to 7,000 ft, so the answer is 1,800 to 7,000 ft.

Action 5
<finish>1,800 to 7,000 ft</finish>
"""


To capture a single step at a time, while ignoring any hallucinated Observation steps, you will use `stop_sequences` to end the generation process. The steps are `Thought`, `Action`, `Observation`, in that order.
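The effect of a stop sequence on the returned text can be pictured as cutting the output at the first match, with the stop sequence itself left out. The sketch below simulates that on a plain string; the API applies it during generation rather than afterwards, and both `apply_stop_sequences` and the sample text are hypothetical.

```python
def apply_stop_sequences(text: str, stop_sequences: list[str]) -> str:
    """Cut text at the earliest occurrence of any stop sequence (excluded)."""
    cut = len(text)
    for stop in stop_sequences:
        index = text.find(stop)
        if index != -1:
            cut = min(cut, index)
    return text[:cut]

# Hypothetical unconstrained output that includes a made-up Observation.
raw = ("Thought 1\nI need to search.\n\n"
       "Action 1\n<search>Milhouse</search>\n\n"
       "Observation 1\nInvented text.")
print(apply_stop_sequences(raw, ["\nObservation"]))
```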

In [24]:
question = """Question
Who was the youngest author listed on the transformers NLP paper?
"""

model = genai.GenerativeModel('gemini-1.5-flash-latest')
react_chat = model.start_chat()

# You will perform the Action, so generate up to, but not including, the Observation.
config = genai.GenerationConfig(stop_sequences=["\nObservation"])

resp = react_chat.send_message(
    [model_instructions, example1, example2, question],
    generation_config=config,
    request_options=retry_policy)
print(resp.text)

Thought 1
I need to search for the transformers NLP paper and look for the authors list. Then, I need to find the youngest author.

Action 1
<search>transformers NLP paper</search>



Now you can perform this research yourself and supply it back to the model.

In [25]:
observation = """Observation 1
[1706.03762] Attention Is All You Need
Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, Illia Polosukhin
We propose a new simple network architecture, the Transformer, based solely on attention mechanisms, dispensing with recurrence and convolutions entirely.
"""
resp = react_chat.send_message(observation, generation_config=config, request_options=retry_policy)
print(resp.text)

Thought 2
The authors are listed in the first paragraph. I need to find the youngest author. I do not have enough information to determine the youngest author.

Action 2
<finish>I need more information to answer this question. I cannot determine the youngest author from the provided information.</finish> 



This process repeats until the `<finish>` action is reached. You can continue running this yourself if you like, or try the [Wikipedia example](https://github.com/google-gemini/cookbook/blob/main/examples/Search_Wikipedia_using_ReAct.ipynb) to see a fully automated ReAct system at work.
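Automating the loop mostly comes down to reading the action tag out of each reply and dispatching on it. A minimal sketch of that parsing step, assuming the three tag names defined in the instructions; `parse_action` is a hypothetical helper, not taken from the linked example.

```python
import re

# One of the three action tags from the instructions, with its argument.
ACTION_PATTERN = re.compile(r"<(search|lookup|finish)>(.*?)</\1>", re.DOTALL)

def parse_action(model_output: str):
    """Return (action, argument) for the first action tag, or None."""
    match = ACTION_PATTERN.search(model_output)
    if match is None:
        return None
    return match.group(1), match.group(2).strip()

step = """Thought 1
I need to search for the transformers NLP paper and look for the authors list.

Action 1
<search>transformers NLP paper</search>
"""
print(parse_action(step))
```

A driver loop would call the matching tool for `search` or `lookup`, send the result back as the next Observation, and stop when the action is `finish`.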