# Day 1 - Prompt Engineering

Welcome to the Kaggle 5-day Generative AI course!

This notebook will show you how to get started with the Gemini API and walk you through some of the example prompts and techniques that you can also read about in the Prompting whitepaper. You don't need to read the whitepaper to use this notebook, but it will give you some theoretical context and background to complement this interactive notebook.


## Before you begin

In this notebook, you'll start exploring prompting using the Python SDK and AI Studio. For some inspiration, you might enjoy exploring some apps that have been built using the Gemini family of models. Here are a few that we like, and we think you will too.

* [TextFX](https://textfx.withgoogle.com/) is a suite of AI-powered tools for rappers, made in collaboration with Lupe Fiasco,
* [SQL Talk](https://sql-talk-r5gdynozbq-uc.a.run.app/) shows how you can talk directly to a database using the Gemini API,
* [NotebookLM](https://notebooklm.google/) uses Gemini models to build your own personal AI research assistant.


## New for Gemini 2.0!

This course material was first launched in November 2024. The AI and LLM space is moving incredibly fast, so we have made some updates to use the latest models and capabilities.

* These codelabs have been updated to use the Gemini 2.0 family of models.
* The Python SDK has been updated from `google-generativeai` to the new, unified [`google-genai`](https://pypi.org/project/google-genai) SDK.
  * This new SDK works with both the developer Gemini API as well as Google Cloud Vertex AI, and switching is [as simple as changing some fields](https://pypi.org/project/google-genai/#:~:text=.Client%28%29-,API%20Selection,-By%20default%2C%20the).
* New model capabilities have been added to the relevant codelabs, such as "thinking mode" in this lab.
* Day 1 includes a new [Evaluation codelab](https://www.kaggle.com/code/markishere/day-1-evaluation-and-structured-output).

## Install SDK

In [1]:
!pip uninstall -qqy jupyterlab  # Remove unused packages from Kaggle's base image that conflict
!pip install -U -q "google-genai==1.7.0"


Import the SDK and some helpers for rendering the output.

In [1]:
from google import genai
from google.genai import types

from IPython.display import HTML, Markdown, display

Set up a retry helper. This lets you "Run all" without worrying about the per-minute quota.

In [2]:
from google.api_core import retry

# Retry only on rate-limit (429) and service-unavailable (503) errors.
is_retriable = lambda e: (isinstance(e, genai.errors.APIError) and e.code in {429, 503})

genai.models.Models.generate_content = retry.Retry(
    predicate = is_retriable)(genai.models.Models.generate_content)
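
To make the idea behind the retry helper concrete, here is a minimal, standalone sketch of retrying with exponential backoff. It does not use `google.api_core`; `FakeAPIError`, `call_with_retry` and `flaky` are hypothetical names invented for this illustration, and the delays are assumptions chosen to keep the example fast.

```python
import time


class FakeAPIError(Exception):
    """Hypothetical stand-in for an API error that carries an HTTP status code."""

    def __init__(self, code):
        super().__init__(f"API error {code}")
        self.code = code


def is_retriable(e):
    # Same shape as the notebook's predicate: retry only on 429 and 503.
    return isinstance(e, FakeAPIError) and e.code in {429, 503}


def call_with_retry(fn, predicate, max_attempts=5, initial_delay=0.01, multiplier=2.0):
    """Call fn(), retrying with exponential backoff while predicate(error) is true."""
    delay = initial_delay
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except Exception as e:
            # Give up on the last attempt or on errors that should not be retried.
            if attempt == max_attempts or not predicate(e):
                raise
            time.sleep(delay)
            delay *= multiplier


# A fake call that fails twice with a quota error, then succeeds.
attempts = {"count": 0}


def flaky():
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise FakeAPIError(429)
    return "ok"


result = call_with_retry(flaky, is_retriable)
print(result, attempts["count"])
```

Errors that the predicate rejects (for example a 400) are raised immediately instead of being retried.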


## Set up your API key

To run the following cell, your API key must be stored in a secret named `GOOGLE_API_KEY`.


In [4]:
from google.colab import userdata
import os

os.environ["GOOGLE_API_KEY"] = userdata.get('GOOGLE_API_KEY')

## Run your first prompt

In [5]:
client = genai.Client(api_key = os.environ["GOOGLE_API_KEY"])

response = client.models.generate_content(
    model = "gemini-2.0-flash",
    contents = "Explain AI to me like I'm a kid."
)

print(response.text)

Imagine you have a really, really smart robot that can learn new things. That's kind of like AI!

It's like teaching a puppy to fetch. At first, the puppy doesn't know what "fetch" means. But you show it, you throw the ball, and when it brings it back, you give it a treat!

AI is similar. We give it lots and lots of information (like showing the puppy the ball over and over). Then, we tell it what's right and what's wrong (like giving the puppy a treat when it brings back the ball). 

Over time, the AI learns to do things on its own, like:

*   **Answering questions:** Like Siri or Alexa, but even smarter!
*   **Playing games:** It can learn the best way to win at games like chess or video games.
*   **Driving cars:** Some cars can even drive themselves now because of AI!
*   **Drawing pictures:** You can ask AI to draw a cat wearing a hat, and it will!

So, AI is like a really smart, learning machine that can help us do lots of cool things. It's not magic, but it's pretty amazing!



## The response often comes back in Markdown format, which you can render directly in this notebook

In [9]:
Markdown(response.text)

Imagine you have a really, really smart robot that can learn new things. That's kind of like AI!

It's like teaching a puppy to fetch. At first, the puppy doesn't know what "fetch" means. But you show it, you throw the ball, and when it brings it back, you give it a treat!

AI is similar. We give it lots and lots of information (like showing the puppy the ball over and over). Then, we tell it what's right and what's wrong (like giving the puppy a treat when it brings back the ball). 

Over time, the AI learns to do things on its own, like:

*   **Answering questions:** Like Siri or Alexa, but even smarter!
*   **Playing games:** It can learn the best way to win at games like chess or video games.
*   **Driving cars:** Some cars can even drive themselves now because of AI!
*   **Drawing pictures:** You can ask AI to draw a cat wearing a hat, and it will!

So, AI is like a really smart, learning machine that can help us do lots of cool things. It's not magic, but it's pretty amazing!


> ### The previous example uses a single-turn, text-in/text-out structure, but you can also set up a multi-turn chat structure.

## Start a Chat

In [18]:
chat = client.chats.create(model = "gemini-2.0-flash", history=[])
response = chat.send_message("Hello! My name is Pavan")
print(response.text)

Hello Pavan! It's nice to meet you. How can I help you today?



In [19]:
response = chat.send_message("Can you tell me something interesting about AI Agents?")
Markdown(response.text)

Okay, here's something interesting about AI Agents:

**AI Agents are starting to be designed with "intrinsic motivation," much like humans.**

Traditionally, AI agents were purely driven by extrinsic rewards – a programmed goal they were trying to achieve, like winning a game or completing a task. However, researchers are now exploring how to equip agents with *intrinsic* motivation, things like curiosity, novelty-seeking, and the desire to learn.

**Why is this interesting?**

*   **More Efficient Learning:** Intrinsic motivation can lead to more efficient and robust learning. Instead of simply optimizing for a specific reward, agents explore their environment more broadly, discover new skills, and become more adaptable to unexpected situations.
*   **Autonomous Discovery:** These agents can discover things on their own that their programmers never explicitly told them to look for. They can identify patterns and solve problems in creative ways.
*   **Human-Like Behavior:** Modeling intrinsic motivation makes AI agents seem more natural and believable. They're not just machines executing instructions; they appear to have a drive to learn and explore, which is fundamental to human intelligence.
*   **Potential for Unforeseen Consequences:**  (And this is where it gets really interesting... and potentially a little concerning!)  An agent driven by curiosity might, in theory, pursue knowledge or actions that aren't necessarily beneficial or aligned with human values if not properly designed and monitored.

**Examples of Intrinsic Motivation in AI Agents:**

*   **Curiosity-driven exploration:** An agent explores an environment to find "interesting" states, where "interesting" could be defined as states that are unpredictable or have high information content.
*   **Mastery:** An agent is motivated to master skills or learn new abilities, even if those skills aren't immediately useful for achieving a specific goal.

So, the shift towards incorporating intrinsic motivation is a fascinating development in AI research, potentially leading to more intelligent, adaptable, and human-like agents. However, it also raises important ethical and safety considerations.

Does that make sense, Pavan? Would you like to explore a specific aspect of this further?


> ### While you have the `chat` object alive, the conversation state persists. Confirm that by asking if it knows the user's name.

In [21]:
response = chat.send_message("Do you remember what my name is?")
print(response.text)

Yes, I remember your name is Pavan.
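
One way to picture why the state persists is that a chat object keeps the earlier turns and makes them available on every new turn. The toy below illustrates only that idea and is not how the SDK is implemented; `ToyChat` and `toy_respond` are hypothetical names, and the "model" is a hard-coded function.

```python
class ToyChat:
    """Hypothetical chat wrapper that stores every turn in a history list."""

    def __init__(self, respond):
        self.history = []        # list of (role, text) tuples
        self._respond = respond  # callable that sees the full history

    def send_message(self, text):
        self.history.append(("user", text))
        reply = self._respond(self.history)
        self.history.append(("model", reply))
        return reply


def toy_respond(history):
    """Stand-in 'model': answers the name question by scanning earlier user turns."""
    last_message = history[-1][1]
    if "what my name is" in last_message:
        for role, text in history[:-1]:
            if role == "user" and "My name is " in text:
                name = text.split("My name is ", 1)[1].strip(" .!")
                return f"Yes, your name is {name}."
        return "I don't know your name."
    return "Hello!"


toy_chat = ToyChat(toy_respond)
toy_chat.send_message("Hello! My name is Pavan")
reply = toy_chat.send_message("Do you remember what my name is?")
print(reply)
```

A fresh `ToyChat` has an empty history, so the same question asked there cannot be answered.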



### Choose a model

The Gemini API provides access to a number of models from the Gemini model family. Read about the available models and their capabilities on the [model overview page](https://ai.google.dev/gemini-api/docs/models/gemini).

In this step you'll use the API to list all of the available models.

In [30]:
for model in client.models.list():
    print(f"'{model.name}': '{model.description}'")

'models/chat-bison-001': 'A legacy text-only model optimized for chat conversations'
'models/text-bison-001': 'A legacy model that understands text and generates text as an output'
'models/embedding-gecko-001': 'Obtain a distributed representation of a text.'
'models/gemini-1.0-pro-vision-latest': 'The original Gemini 1.0 Pro Vision model version which was optimized for image understanding. Gemini 1.0 Pro Vision was deprecated on July 12, 2024. Move to a newer Gemini version.'
'models/gemini-pro-vision': 'The original Gemini 1.0 Pro Vision model version which was optimized for image understanding. Gemini 1.0 Pro Vision was deprecated on July 12, 2024. Move to a newer Gemini version.'
'models/gemini-1.5-pro-latest': 'Alias that points to the most recent production (non-experimental) release of Gemini 1.5 Pro, our mid-size multimodal model that supports up to 2 million tokens.'
'models/gemini-1.5-pro-001': 'Stable version of Gemini 1.5 Pro, our mid-size multimodal model that supports up 

In [31]:
from pprint import pprint

for model in client.models.list():
  if model.name == 'models/gemini-2.0-flash':
    pprint(model.to_json_dict())
    break

{'description': 'Gemini 2.0 Flash',
 'display_name': 'Gemini 2.0 Flash',
 'input_token_limit': 1048576,
 'name': 'models/gemini-2.0-flash',
 'output_token_limit': 8192,
 'supported_actions': ['generateContent', 'countTokens', 'createCachedContent'],
 'tuned_model_info': {},
 'version': '2.0'}


### Output length

When generating text with an LLM, the output length affects cost and performance. Generating more tokens increases computation, leading to higher energy consumption, latency, and cost.

To stop the model from generating tokens past a limit, you can specify the `max_output_tokens` parameter when using the Gemini API. Specifying this parameter does not influence the generation of the output tokens, so the output will not become more stylistically or textually succinct, but it will stop generating tokens once the specified length is reached. Prompt engineering may be required to generate a more complete output for your given limit.
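
The hard-cutoff behaviour can be pictured with a toy generator. This is only an illustration: `generate_with_limit` is a hypothetical function, and whole words stand in for tokens, which is a simplification of how real tokenizers work.

```python
def generate_with_limit(planned_tokens, max_output_tokens):
    """Emit tokens from a fixed plan, stopping abruptly once the limit is reached."""
    output = []
    for token in planned_tokens:
        if len(output) >= max_output_tokens:
            break  # the limit cuts the text off; it does not make the plan shorter
        output.append(token)
    return output


# The "model" still intends to write the full sentence; the limit only truncates it.
essay_plan = "The olive plays a pivotal role in modern cuisine and industry".split()
truncated = generate_with_limit(essay_plan, 5)
print(" ".join(truncated))
```

The truncated text ends mid-sentence, which is why asking for a shorter piece in the prompt itself usually gives a more complete result for the same limit.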

In [33]:
from google.genai import types

short_config = types.GenerateContentConfig(max_output_tokens=200)

response = client.models.generate_content(
    model='gemini-2.0-flash',
    config=short_config,
    contents='Write a 1000 word essay on the importance of olives in modern society.')

Markdown(response.text)

## The Enduring Olive: A Cornerstone of Modern Society

The olive, a humble fruit borne from a gnarled and enduring tree, holds a significance far exceeding its small size and seemingly simple flavor. From its historical roots in the cradle of civilization to its continued presence in modern cuisine, medicine, and industry, the olive plays a pivotal, often understated, role in shaping societies worldwide. Its importance transcends geographical boundaries, cultural differences, and dietary trends, cementing its place as a vital component of our modern world.

One of the most obvious contributions of the olive lies in its culinary applications. Olive oil, extracted from the fruit, is a cornerstone of the Mediterranean diet, a dietary pattern renowned for its health benefits and widely adopted globally. Its rich, fruity flavor and versatility make it an essential ingredient in countless dishes, from simple salads to complex stews. Beyond flavor, olive oil boasts a remarkable nutritional profile, rich in monounsaturated fats, antioxidants, and anti-inflammatory compounds. These components

In [36]:
response = client.models.generate_content(
    model='gemini-2.0-flash',
    config=short_config,
    contents='Write a short poem on the importance of olives in modern society.')

Markdown(response.text)

From sun-baked groves to tables grand,
The humble olive, close at hand.
In oil it flows, a golden stream,
A culinary waking dream.

On pizzas perched, in salads bright,
A briny burst, a pure delight.
From tapas plates to martinis cool,
It adds a touch, a subtle rule.

A taste of health, a story told,
Of ancient roots, in days of old.
So raise a glass, to branches green,
The olive's worth, a vital scene.


### Temperature

Temperature controls the degree of randomness in token selection. Higher temperatures result in a higher number of candidate tokens from which the next output token is selected, and can produce more diverse results, while lower temperatures have the opposite effect, such that a temperature of 0 results in greedy decoding, selecting the most probable token at each step.

Temperature doesn't provide any guarantees of randomness, but it can be used to "nudge" the output somewhat.
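
As a rough illustration of what temperature does, the sketch below applies the common "divide the logits by the temperature, then softmax" formulation to a made-up three-token distribution. The logits are invented for the example, and this is not a description of how the Gemini API implements sampling internally.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Convert logits to probabilities after scaling them by 1 / temperature."""
    if temperature <= 0:
        raise ValueError("use greedy decoding (argmax) instead of temperature <= 0")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.1]  # hypothetical scores for three candidate tokens

low = softmax_with_temperature(logits, 0.1)   # sharply peaked on the top token
high = softmax_with_temperature(logits, 2.0)  # flatter, so other tokens are more likely

print([round(p, 3) for p in low])
print([round(p, 3) for p in high])
```

At low temperature almost all of the probability mass sits on the most likely token, which approximates greedy decoding; at high temperature the distribution flattens and sampling becomes more varied.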

In [37]:
high_temp_config = types.GenerateContentConfig(temperature=2.0)


for _ in range(5):
  response = client.models.generate_content(
      model='gemini-2.0-flash',
      config=high_temp_config,
      contents='Pick a random colour... (respond in a single word)')

  if response.text:
    print(response.text, '-' * 25)

Azure
 -------------------------
Cerulean.
 -------------------------
Purple
 -------------------------
Orange
 -------------------------
Orange
 -------------------------


Now try the same prompt with temperature set to zero. Note that the output is not completely deterministic, as other parameters affect token selection, but the results will tend to be more stable.

In [39]:
low_temp_config = types.GenerateContentConfig(temperature=0.0)

for _ in range(5):
  response = client.models.generate_content(
      model='gemini-2.0-flash',
      config=low_temp_config,
      contents='Pick a random colour... (respond in a single word)')

  if response.text:
    print(response.text, '-' * 25)

Azure
 -------------------------
Azure
 -------------------------
Azure
 -------------------------
Azure
 -------------------------
Azure
 -------------------------


### Top-P

Like temperature, the top-P parameter is also used to control the diversity of the model's output.

Top-P defines a cumulative probability threshold: tokens are considered in order of decreasing probability, and once their cumulative probability exceeds the threshold, no further tokens are selected as candidates. A top-P of 0 is typically equivalent to greedy decoding, and a top-P of 1 typically selects every token in the model's vocabulary.

You may also see top-K referenced in LLM literature. Top-K is not configurable in the Gemini 2.0 series of models, but can be changed in older models. Top-K is a positive integer that defines the number of most probable tokens from which to select the output token. A top-K of 1 selects a single token, performing greedy decoding.
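
The two filters can be sketched on a toy distribution as follows. The token probabilities are made up, `top_k_filter` and `top_p_filter` are hypothetical helper names, and implementations differ on details such as whether the threshold comparison is inclusive, so treat this as one plausible reading rather than the API's exact behaviour.

```python
def top_k_filter(probs, k):
    """Keep only the k most probable tokens."""
    ranked = sorted(probs.items(), key=lambda item: item[1], reverse=True)
    return dict(ranked[:k])


def top_p_filter(probs, p):
    """Keep the most probable tokens until their cumulative probability reaches p."""
    ranked = sorted(probs.items(), key=lambda item: item[1], reverse=True)
    kept = {}
    cumulative = 0.0
    for token, prob in ranked:
        kept[token] = prob
        cumulative += prob
        if cumulative >= p:
            break
    return kept


# Hypothetical next-token distribution.
probs = {"blue": 0.5, "green": 0.3, "red": 0.15, "mauve": 0.05}

print(list(top_p_filter(probs, 0.9)))  # the smallest set covering 90% of the mass
print(list(top_p_filter(probs, 0.0)))  # only the top token, i.e. greedy decoding
print(list(top_k_filter(probs, 1)))    # also only the top token
```

After filtering, a sampler would typically renormalise the surviving probabilities and draw the next token from them.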


Run this example a number of times, change the settings and observe the change in output.

In [40]:
model_config = types.GenerateContentConfig(
    # These are the default values for gemini-2.0-flash.
    temperature=1.0,
    top_p=0.95,
)

story_prompt = "You are a creative writer. Write a short story about a cat who goes on an adventure."
response = client.models.generate_content(
    model='gemini-2.0-flash',
    config=model_config,
    contents=story_prompt)

print(response.text)

Clementine, a ginger tabby with a perpetually surprised expression, considered herself a connoisseur of sunbeams. Her days were usually spent draped across sun-drenched windowsills, dreaming of chasing dust motes in sunlit galaxies. But today, the ordinary felt dull. The usual robin's song lacked its usual zest. A peculiar restlessness stirred in her furry belly.

It started with the scent. A wild, earthy aroma, unlike anything she'd encountered in her pampered life, wafted in through the crack of the back door. It smelled of damp earth, decaying leaves, and something…untamed.

Clementine hesitated. Outside the back door lay the Great Unknown – the garden. A terrifying expanse ruled by buzzing bees, rustling leaves that could be monstrous snakes, and the dreaded vacuum cleaner that lived in the shed. But the scent, oh, the scent! It tugged at a primal instinct she didn’t know she possessed.

With a deep breath, Clementine squeezed through the crack and found herself on a mossy stepping

## Prompting

This section contains some prompts from the chapter for you to try out directly in the API. Try changing the text here to see how each prompt performs with different instructions, more examples, or any other changes you can think of.

### Zero-shot

Zero-shot prompts are prompts that describe the request for the model directly.

<table align=left>
  <td>
    <a target="_blank" href="https://aistudio.google.com/prompts/1gzKKgDHwkAvexG5Up0LMtl1-6jKMKe4g"><img src="https://ai.google.dev/site-assets/images/marketing/home/icon-ais.png" style="height: 24px" height=24/> Open in AI Studio</a>
  </td>
</table>

In [41]:
model_config = types.GenerateContentConfig(
    temperature=0.1,
    top_p=1,
    max_output_tokens=5,
)

zero_shot_prompt = """Classify movie reviews as POSITIVE, NEUTRAL or NEGATIVE.
Review: "Her" is a disturbing study revealing the direction
humanity is headed if AI is allowed to keep evolving,
unchecked. I wish there were more movies like this masterpiece.
Sentiment: """

response = client.models.generate_content(
    model='gemini-2.0-flash',
    config=model_config,
    contents=zero_shot_prompt)

print(response.text)

POSITIVE



#### Enum mode

The models are trained to generate text, and while the Gemini 2.0 models are great at following instructions, other models can sometimes produce more text than you may wish for. In the preceding example, the model will output the label, but sometimes it can include a preceding "Sentiment" label, and without an output token limit, it may also add explanatory text afterwards. See [this prompt in AI Studio](https://aistudio.google.com/prompts/1gzKKgDHwkAvexG5Up0LMtl1-6jKMKe4g) for an example.

The Gemini API has an [Enum mode](https://github.com/google-gemini/cookbook/blob/main/quickstarts/Enum.ipynb) feature that allows you to constrain the output to a fixed set of values.

In [42]:
import enum

class Sentiment(enum.Enum):
    POSITIVE = "positive"
    NEUTRAL = "neutral"
    NEGATIVE = "negative"


response = client.models.generate_content(
    model='gemini-2.0-flash',
    config=types.GenerateContentConfig(
        response_mime_type="text/x.enum",
        response_schema=Sentiment
    ),
    contents=zero_shot_prompt)

print(response.text)

positive


When using constrained output like an enum, the Python SDK will attempt to convert the model's text response into a Python object automatically. It's stored in the `response.parsed` field.

In [43]:
enum_response = response.parsed
print(enum_response)
print(type(enum_response))

Sentiment.POSITIVE
<enum 'Sentiment'>
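
If you ever need to do this conversion yourself (for example, when working from the raw text), a plain lookup by value is enough. The sketch below is not the SDK's code; `parse_sentiment` is a hypothetical helper, and the `Sentiment` class is repeated so the block is self-contained.

```python
import enum


class Sentiment(enum.Enum):
    POSITIVE = "positive"
    NEUTRAL = "neutral"
    NEGATIVE = "negative"


def parse_sentiment(text):
    """Return the Sentiment whose value matches text, or None if there is no match."""
    try:
        return Sentiment(text.strip())
    except ValueError:
        return None


parsed = parse_sentiment("positive")
print(parsed)
print(parse_sentiment("great"))  # not one of the allowed values
```

Returning `None` for unknown labels is a design choice for this sketch; raising an error may suit stricter pipelines better.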


### One-shot and few-shot

Providing an example of the expected response is known as a "one-shot" prompt. When you provide multiple examples, it is a "few-shot" prompt.

<table align=left>
  <td>
    <a target="_blank" href="https://aistudio.google.com/prompts/1jjWkjUSoMXmLvMJ7IzADr_GxHPJVV2bg"><img src="https://ai.google.dev/site-assets/images/marketing/home/icon-ais.png" style="height: 24px" height=24/> Open in AI Studio</a>
  </td>
</table>


In [44]:
few_shot_prompt = """Parse a customer's pizza order into valid JSON:

EXAMPLE:
I want a small pizza with cheese, tomato sauce, and pepperoni.
JSON Response:
```
{
"size": "small",
"type": "normal",
"ingredients": ["cheese", "tomato sauce", "pepperoni"]
}
```

EXAMPLE:
Can I get a large pizza with tomato sauce, basil and mozzarella
JSON Response:
```
{
"size": "large",
"type": "normal",
"ingredients": ["tomato sauce", "basil", "mozzarella"]
}
```

ORDER:
"""

customer_order = "Give me a large with cheese & pineapple"

response = client.models.generate_content(
    model='gemini-2.0-flash',
    config=types.GenerateContentConfig(
        temperature=0.1,
        top_p=1,
        max_output_tokens=250,
    ),
    contents=[few_shot_prompt, customer_order])

print(response.text)

```json
{
"size": "large",
"type": "normal",
"ingredients": ["cheese", "pineapple"]
}
```
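
Note that the response above is wrapped in a Markdown code fence, so it cannot be passed straight to a JSON parser. A minimal sketch of stripping the fence first is shown below; `extract_json` is a hypothetical helper, and the sample string is a hand-written copy of the output above, not a live API response.

```python
import json
import re

FENCE = "`" * 3  # three backticks, built this way to keep this example readable

# Match an optional "json" language tag, then capture everything up to the closing fence.
FENCED_BLOCK = re.compile(FENCE + r"(?:json)?\s*(.*?)" + FENCE, re.DOTALL)


def extract_json(text):
    """Parse JSON from text, unwrapping a Markdown code fence if one is present."""
    match = FENCED_BLOCK.search(text)
    payload = match.group(1) if match else text
    return json.loads(payload)


sample = (
    FENCE + "json\n"
    '{"size": "large", "type": "normal", "ingredients": ["cheese", "pineapple"]}\n'
    + FENCE
)

order = extract_json(sample)
print(order["size"], order["ingredients"])
```

This kind of post-processing is brittle, since the model may format its answer differently from one run to the next.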



#### JSON mode

To provide control over the schema, and to ensure that you only receive JSON (with no other text or markdown), you can use the Gemini API's [JSON mode](https://github.com/google-gemini/cookbook/blob/main/quickstarts/JSON_mode.ipynb). This forces the model to constrain decoding, such that token selection is guided by the supplied schema.

In [46]:
import typing_extensions as typing

class PizzaOrder(typing.TypedDict):
    size: str
    ingredients: list[str]
    type: str


response = client.models.generate_content(
    model='gemini-2.0-flash',
    config=types.GenerateContentConfig(
        temperature=0.1,
        response_mime_type="application/json",
        response_schema=PizzaOrder,
    ),
    contents="Can I have a large dessert pizza with apple and chocolate")

print(response.text)

{
  "size": "large",
  "ingredients": ["apple", "chocolate"],
  "type": "dessert"
}
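
Because the response is now plain JSON, it can be loaded directly. The sketch below parses a hand-written copy of the output above and checks that the expected keys are present; `load_order` is a hypothetical helper, and it uses the standard library's `typing.TypedDict` rather than `typing_extensions` to stay dependency-free.

```python
import json
from typing import TypedDict


class PizzaOrder(TypedDict):
    size: str
    ingredients: list[str]
    type: str


def load_order(raw):
    """Parse a JSON string and check that every PizzaOrder key is present."""
    data = json.loads(raw)
    missing = set(PizzaOrder.__annotations__) - set(data)
    if missing:
        raise ValueError(f"missing keys: {sorted(missing)}")
    return data


raw_response = '{"size": "large", "ingredients": ["apple", "chocolate"], "type": "dessert"}'
pizza = load_order(raw_response)
print(pizza["type"], pizza["ingredients"])
```

The key check is a light sanity test only; it does not validate the types of the values.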


### Chain of Thought (CoT)

Direct prompting on LLMs can return answers quickly and (in terms of output token usage) efficiently, but they can be prone to hallucination. The answer may "look" correct (in terms of language and syntax) but is incorrect in terms of factuality and reasoning.

Chain-of-Thought prompting is a technique where you instruct the model to output intermediate reasoning steps, and it typically gets better results, especially when combined with few-shot examples. It is worth noting that this technique doesn't completely eliminate hallucinations, and that it tends to cost more to run, due to the increased token count.

Models like the Gemini family are trained to be "chatty" or "thoughtful" and will provide reasoning steps without prompting, so for this simple example you can ask the model to be more direct in the prompt to force a non-reasoning response. Try re-running this step if the model gets lucky and gets the answer correct on the first try.

In [47]:
prompt = """When I was 4 years old, my partner was 3 times my age. Now, I
am 20 years old. How old is my partner? Return the answer directly."""

response = client.models.generate_content(
    model='gemini-2.0-flash',
    contents=prompt)

print(response.text)

52



Now try the same approach, but indicate to the model that it should "think step by step".

In [48]:
prompt = """When I was 4 years old, my partner was 3 times my age. Now,
I am 20 years old. How old is my partner? Let's think step by step."""

response = client.models.generate_content(
    model='gemini-2.0-flash',
    contents=prompt)

Markdown(response.text)

Here's how to solve this step-by-step:

1.  **Find the age difference:** When you were 4, your partner was 3 times your age, so they were 4 * 3 = 12 years old.
2.  **Calculate the age gap:** The age difference between you and your partner is 12 - 4 = 8 years.
3.  **Determine current partner's age:** Since your partner is 8 years older than you, and you are now 20, your partner is currently 20 + 8 = 28 years old.

**Answer:** Your partner is 28 years old.
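
The same reasoning steps can be checked with a few lines of arithmetic, which is a handy way to verify a model's chain of thought. The variable names are chosen for this illustration.

```python
my_age_then = 4
partner_age_then = 3 * my_age_then        # 12: "3 times my age" at that time

# The age gap stays constant over time.
age_gap = partner_age_then - my_age_then  # 8

my_age_now = 20
partner_age_now = my_age_now + age_gap    # 28

print(partner_age_now)
```

The direct answer of 52 seen earlier does not match this result, which is the kind of error the step-by-step prompt helps avoid.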


### ReAct: Reason and act

In this example you will run a ReAct prompt directly in the Gemini API and perform the searching steps yourself. As this prompt follows a well-defined structure, there are frameworks available that wrap the prompt into easier-to-use APIs that make tool calls automatically, such as the LangChain example from the "Prompting" whitepaper.

To try this out with the Wikipedia search engine, check out the [Searching Wikipedia with ReAct](https://github.com/google-gemini/cookbook/blob/main/examples/Search_Wikipedia_using_ReAct.ipynb) cookbook example.


> Note: The prompt and in-context examples used here are from [https://github.com/ysymyth/ReAct](https://github.com/ysymyth/ReAct) which is published under an [MIT license](https://opensource.org/licenses/MIT), Copyright (c) 2023 Shunyu Yao.

<table align=left>
  <td>
    <a target="_blank" href="https://aistudio.google.com/prompts/18oo63Lwosd-bQ6Ay51uGogB3Wk3H8XMO"><img src="https://ai.google.dev/site-assets/images/marketing/home/icon-ais.png" style="height: 24px" height=24/> Open in AI Studio</a>
  </td>
</table>


In [49]:
model_instructions = """
Solve a question answering task with interleaving Thought, Action, Observation steps. Thought can reason about the current situation,
Observation is understanding relevant information from an Action's output and Action can be one of three types:
 (1) <search>entity</search>, which searches the exact entity on Wikipedia and returns the first paragraph if it exists. If not, it
     will return some similar entities to search and you can try to search the information from those topics.
 (2) <lookup>keyword</lookup>, which returns the next sentence containing keyword in the current context. This only does exact matches,
     so keep your searches short.
 (3) <finish>answer</finish>, which returns the answer and finishes the task.
"""

example1 = """Question
Musician and satirist Allie Goertz wrote a song about the "The Simpsons" character Milhouse, who Matt Groening named after who?

Thought 1
The question simplifies to "The Simpsons" character Milhouse is named after who. I only need to search Milhouse and find who it is named after.

Action 1
<search>Milhouse</search>

Observation 1
Milhouse Mussolini Van Houten is a recurring character in the Fox animated television series The Simpsons voiced by Pamela Hayden and created by Matt Groening.

Thought 2
The paragraph does not tell who Milhouse is named after, maybe I can look up "named after".

Action 2
<lookup>named after</lookup>

Observation 2
Milhouse was named after U.S. president Richard Nixon, whose middle name was Milhous.

Thought 3
Milhouse was named after U.S. president Richard Nixon, so the answer is Richard Nixon.

Action 3
<finish>Richard Nixon</finish>
"""

example2 = """Question
What is the elevation range for the area that the eastern sector of the Colorado orogeny extends into?

Thought 1
I need to search Colorado orogeny, find the area that the eastern sector of the Colorado orogeny extends into, then find the elevation range of the area.

Action 1
<search>Colorado orogeny</search>

Observation 1
The Colorado orogeny was an episode of mountain building (an orogeny) in Colorado and surrounding areas.

Thought 2
It does not mention the eastern sector. So I need to look up eastern sector.

Action 2
<lookup>eastern sector</lookup>

Observation 2
The eastern sector extends into the High Plains and is called the Central Plains orogeny.

Thought 3
The eastern sector of Colorado orogeny extends into the High Plains. So I need to search High Plains and find its elevation range.

Action 3
<search>High Plains</search>

Observation 3
High Plains refers to one of two distinct land regions

Thought 4
I need to instead search High Plains (United States).

Action 4
<search>High Plains (United States)</search>

Observation 4
The High Plains are a subregion of the Great Plains. From east to west, the High Plains rise in elevation from around 1,800 to 7,000 ft (550 to 2,130m).

Thought 5
High Plains rise in elevation from around 1,800 to 7,000 ft, so the answer is 1,800 to 7,000 ft.

Action 5
<finish>1,800 to 7,000 ft</finish>
"""

# Come up with more examples yourself, or take a look through https://github.com/ysymyth/ReAct/

To capture a single step at a time, while ignoring any hallucinated Observation steps, you will use `stop_sequences` to end the generation process. The steps are `Thought`, `Action`, `Observation`, in that order.

In [50]:
question = """Question
Who was the youngest author listed on the transformers NLP paper?
"""

# You will perform the Action; so generate up to, but not including, the Observation.
react_config = types.GenerateContentConfig(
    stop_sequences=["\nObservation"],
    system_instruction=model_instructions + example1 + example2,
)

# Create a chat that has the model instructions and examples pre-seeded.
react_chat = client.chats.create(
    model='gemini-2.0-flash',
    config=react_config,
)

resp = react_chat.send_message(question)
print(resp.text)

Thought 1
I need to find the transformers NLP paper and then find the youngest author listed on the paper.

Action 1
<search>transformers NLP paper</search>



Now you can perform this research yourself and supply it back to the model.

In [51]:
observation = """Observation 1
[1706.03762] Attention Is All You Need
Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, Illia Polosukhin
We propose a new simple network architecture, the Transformer, based solely on attention mechanisms, dispensing with recurrence and convolutions entirely.
"""
resp = react_chat.send_message(observation)
print(resp.text)

Thought 2
The paper is titled "Attention Is All You Need" and the authors are Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, Illia Polosukhin. I need to find the youngest author listed on the paper. I will have to search each of them to find out their age.

Action 2
<search>Ashish Vaswani age</search>



This process repeats until the `<finish>` action is reached. You can continue running this yourself if you like, or try the [Wikipedia example](https://github.com/google-gemini/cookbook/blob/main/examples/Search_Wikipedia_using_ReAct.ipynb) to see a fully automated ReAct system at work.
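
The loop described above can be sketched in a few lines. This is an illustrative outline, not the cookbook's implementation: `parse_action` and `run_react` are hypothetical helpers, the tools are stubs, and the "model" is a scripted function that stands in for a real chat session.

```python
import re

# Matches <search>...</search>, <lookup>...</lookup> or <finish>...</finish>.
ACTION_RE = re.compile(r"<(search|lookup|finish)>(.*?)</\1>", re.DOTALL)


def parse_action(text):
    """Return (action_type, argument) for the first action tag in text, or None."""
    match = ACTION_RE.search(text)
    if match is None:
        return None
    return match.group(1), match.group(2).strip()


def run_react(send_message, tools, question, max_steps=5):
    """Alternate model turns and tool calls until a <finish> action appears."""
    reply = send_message(question)
    for step in range(1, max_steps + 1):
        action = parse_action(reply)
        if action is None:
            return None  # the model did not produce a recognisable action
        kind, argument = action
        if kind == "finish":
            return argument
        observation = tools[kind](argument)
        reply = send_message(f"Observation {step}\n{observation}\n")
    return None  # gave up after max_steps


# Scripted stand-in for the model, so the sketch runs without an API key.
scripted_replies = iter([
    "Thought 1\nI should search Milhouse.\n\nAction 1\n<search>Milhouse</search>",
    "Thought 2\nThe answer is Richard Nixon.\n\nAction 2\n<finish>Richard Nixon</finish>",
])


def fake_send(message):
    return next(scripted_replies)


# Stub tools; a real version would query a search backend.
tools = {
    "search": lambda query: f"(stub search result for {query})",
    "lookup": lambda keyword: f"(stub lookup result for {keyword})",
}

answer = run_react(fake_send, tools, "Question\nWho is Milhouse named after?\n")
print(answer)
```

With the chat session from this notebook, `send_message` could plausibly be `lambda m: react_chat.send_message(m).text`, with the stub tools swapped for real search and lookup functions.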

## Thinking mode

The experimental Gemini Flash 2.0 "Thinking" model has been trained to generate the "thinking process" the model goes through as part of its response. As a result, the Flash Thinking model is capable of stronger reasoning in its responses.

Using a "thinking mode" model can provide you with high-quality responses without needing specialised prompting like the previous approaches. One reason this technique is effective is that you induce the model to generate relevant information ("brainstorming", or "thoughts") that is then used as part of the context in which the final response is generated.

Note that when you use the API, you get the final response from the model, but the thoughts are not captured. To see the intermediate thoughts, try out [the thinking mode model in AI Studio](https://aistudio.google.com/prompts/new_chat?model=gemini-2.0-flash-thinking-exp-01-21).

<table align=left>
  <td>
    <a target="_blank" href="https://aistudio.google.com/prompts/1Z991SV7lZZZqioOiqIUPv9a9ix-ws4zk"><img src="https://ai.google.dev/site-assets/images/marketing/home/icon-ais.png" style="height: 24px" height=24/> Open in AI Studio</a>
  </td>
</table>

In [52]:
import io
from IPython.display import Markdown, clear_output


response = client.models.generate_content_stream(
    model='gemini-2.0-flash-thinking-exp',
    contents='Who was the youngest author listed on the transformers NLP paper?',
)

buf = io.StringIO()
for chunk in response:
    buf.write(chunk.text)
    # Display the response as it is streamed
    print(chunk.text, end='')

# And then render the finished response as formatted markdown.
clear_output()
Markdown(buf.getvalue())

Based on available information, the youngest author listed on the "Attention is All You Need" paper (the Transformer paper) is likely **Aidan N. Gomez**.

Here's why:

* **Aidan N. Gomez** was a PhD student at the University of Toronto at the time of the paper's publication. PhD students are generally younger than research scientists and other established researchers.
* The other authors were mostly researchers at Google Brain and other established institutions, suggesting they were further along in their careers and likely older than a PhD student.

While we don't have the exact birthdates of all authors publicly available to definitively confirm age, being a PhD student at the time strongly indicates that Aidan N. Gomez was likely the youngest author on the paper.

## Code prompting

### Generating code

The Gemini family of models can be used to generate code, configuration and scripts. Generating code can be helpful when learning to code, learning a new language or for rapidly generating a first draft.

Because LLMs can make mistakes and can repeat training data, it's essential to read and test generated code before you use it, and to comply with any relevant licenses.

<table align=left>
  <td>
    <a target="_blank" href="https://aistudio.google.com/prompts/1YX71JGtzDjXQkgdes8bP6i3oH5lCRKxv"><img src="https://ai.google.dev/site-assets/images/marketing/home/icon-ais.png" style="height: 24px" height=24/> Open in AI Studio</a>
  </td>
</table>

In [53]:
# The Gemini models love to talk, so it helps to tell them to stick to the code if that
# is all that you want.
code_prompt = """
Write a Python function to calculate the factorial of a number. No explanation, provide only the code.
"""

response = client.models.generate_content(
    model='gemini-2.0-flash',
    config=types.GenerateContentConfig(
        temperature=1,
        top_p=1,
        max_output_tokens=1024,
    ),
    contents=code_prompt)

Markdown(response.text)

```python
def factorial(n):
  """
  Calculate the factorial of a non-negative integer.
  """
  if n == 0:
    return 1
  else:
    result = 1
    for i in range(1, n + 1):
      result *= i
    return result
```
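
Since generated code should be tested before use, here is a minimal sketch of one way to check the function above. It copies the generated `factorial` verbatim and compares it against `math.factorial` from the standard library; the choice of test range and the negative-input probe are illustrative assumptions, not part of the original notebook.

```python
import math


def factorial(n):
    """Copy of the model-generated function shown above."""
    if n == 0:
        return 1
    else:
        result = 1
        for i in range(1, n + 1):
            result *= i
        return result


# Compare against the standard library for a range of valid inputs.
for n in range(0, 20):
    assert factorial(n) == math.factorial(n), n

# Edge case the docstring does not cover: for a negative input the loop body
# never runs, so the function silently returns 1, whereas math.factorial
# raises ValueError.
negative_result = factorial(-3)
print(negative_result)
```

A check like this shows that the generated code is correct for non-negative integers but does not validate its input, which is the kind of gap that reading and testing is meant to catch.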


### Code execution

The Gemini API can automatically run generated code too, and will return the output.

<table align=left>
  <td>
    <a target="_blank" href="https://aistudio.google.com/prompts/11veFr_VYEwBWcLkhNLr-maCG0G8sS_7Z"><img src="https://ai.google.dev/site-assets/images/marketing/home/icon-ais.png" style="height: 24px" height=24/> Open in AI Studio</a>
  </td>
</table>

In [54]:
from pprint import pprint

config = types.GenerateContentConfig(
    tools=[types.Tool(code_execution=types.ToolCodeExecution())],
)

code_exec_prompt = """
Generate the first 14 odd prime numbers, then calculate their sum.
"""

response = client.models.generate_content(
    model='gemini-2.0-flash',
    config=config,
    contents=code_exec_prompt)

for part in response.candidates[0].content.parts:
  pprint(part.to_json_dict())
  print("-----")

{'text': 'Okay, I can do that. First, I need to generate the first 14 odd '
         'prime numbers. Remember that a prime number is a number greater than '
         '1 that has no positive divisors other than 1 and itself. The first '
         'few prime numbers are 2, 3, 5, 7, 11, and so on. Since the question '
         'asks for odd prime numbers, I will exclude 2.\n'
         '\n'
         "Then, I'll sum those prime numbers up.\n"
         '\n'}
-----
{'executable_code': {'code': 'primes = [3, 5, 7, 11, 13, 17, 19, 23, 29, 31, '
                             '37, 41, 43, 47]\n'
                             'sum_of_primes = sum(primes)\n'
                             "print(f'{primes=}')\n"
                             "print(f'{sum_of_primes=}')\n",
                     'language': 'PYTHON'}}
-----
{'code_execution_result': {'outcome': 'OUTCOME_OK',
                           'output': 'primes=[3, 5, 7, 11, 13, 17, 19, 23, 29, '
                                     '31, 37, 41, 43

This response contains multiple parts: opening and closing `text` parts that hold the model's regular text responses, an `executable_code` part that holds the generated code, and a `code_execution_result` part that holds the output from running that code.

You can explore them individually.

In [55]:
for part in response.candidates[0].content.parts:
    if part.text:
        display(Markdown(part.text))
    elif part.executable_code:
        display(Markdown(f'```python\n{part.executable_code.code}\n```'))
    elif part.code_execution_result:
        if part.code_execution_result.outcome != 'OUTCOME_OK':
            display(Markdown(f'## Status {part.code_execution_result.outcome}'))

        display(Markdown(f'```\n{part.code_execution_result.output}\n```'))

Okay, I can do that. First, I need to generate the first 14 odd prime numbers. Remember that a prime number is a number greater than 1 that has no positive divisors other than 1 and itself. The first few prime numbers are 2, 3, 5, 7, 11, and so on. Since the question asks for odd prime numbers, I will exclude 2.

Then, I'll sum those prime numbers up.



```python
primes = [3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47]
sum_of_primes = sum(primes)
print(f'{primes=}')
print(f'{sum_of_primes=}')

```

```
primes=[3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47]
sum_of_primes=326

```

The first 14 odd prime numbers are 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, and 47. Their sum is 326.
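
Note that in the run above the model wrote the list of primes by hand rather than computing it, so the executed code only checks the sum. As an independent check, the sketch below derives the primes locally with simple trial division. The helper names `is_prime` and `first_odd_primes` are hypothetical and are not part of the Gemini API.

```python
def is_prime(n: int) -> bool:
    """Return True if n is prime, using trial division up to sqrt(n)."""
    if n < 2:
        return False
    i = 2
    while i * i <= n:
        if n % i == 0:
            return False
        i += 1
    return True


def first_odd_primes(count: int) -> list[int]:
    """Return the first `count` odd primes (2 is skipped by starting at 3)."""
    primes = []
    candidate = 3
    while len(primes) < count:
        if is_prime(candidate):
            primes.append(candidate)
        candidate += 2  # Only odd candidates need checking.
    return primes


odd_primes = first_odd_primes(14)
total = sum(odd_primes)
print(odd_primes)
print(total)  # 326, matching the model's answer
```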


### Explaining code

The Gemini family of models can explain code to you too. In this example, you pass a [bash script](https://github.com/magicmonty/bash-git-prompt) and ask some questions.

<table align=left>
  <td>
    <a target="_blank" href="https://aistudio.google.com/prompts/1N7LGzWzCYieyOf_7bAG4plrmkpDNmUyb"><img src="https://ai.google.dev/site-assets/images/marketing/home/icon-ais.png" style="height: 24px" height=24/> Open in AI Studio</a>
  </td>
</table>

In [56]:
file_contents = !curl https://raw.githubusercontent.com/magicmonty/bash-git-prompt/refs/heads/master/gitprompt.sh

explain_prompt = f"""
Please explain what this file does at a very high level. What is it, and why would I use it?

```
{file_contents}
```
"""

response = client.models.generate_content(
    model='gemini-2.0-flash',
    contents=explain_prompt)

Markdown(response.text)

This file is a Bash script designed to enhance your command-line prompt with information about the Git repository you're currently working in. It customizes the prompt to display things like the current branch, the status of your working directory (e.g., staged changes, untracked files), and how your local branch relates to the remote repository.

Here's a breakdown:

*   **Git Status Display:**  The primary function is to show you the current state of your Git repository directly in your prompt. This includes branch name, whether you have uncommitted changes, if you're ahead/behind the remote, etc.  This makes it much easier to keep track of your Git status without constantly running `git status`.
*   **Customization:** It allows for extensive customization of the prompt's appearance through themes and configuration options.  You can control colors, symbols, and the information displayed. You can define your own themes to make the prompt look exactly how you want it.
*   **Asynchronous Operations:** To avoid slowing down your shell, it uses asynchronous operations to fetch remote Git status in the background.  This means the prompt update won't block your typing while it's checking for remote changes.
*   **Cross-Shell Compatibility:** The script attempts to work in both Bash and Zsh shells, handling the differences in their syntax.
*   **Virtual Environment Awareness:** The script is also aware of python virtual environments and will display what venv you're in, in your prompt as well.

**Why you would use it:**

*   **Improved Workflow:** It provides immediate, at-a-glance information about your Git repository, improving your workflow by reducing the need to run `git status` constantly.
*   **Enhanced Productivity:** Knowing the status of your Git repository directly in the prompt allows you to make quicker decisions about your next steps.
*   **Customizable Appearance:** You can tailor the prompt's appearance to match your personal preferences and workflow.
*   **Convenience:** It automates the process of displaying Git information, saving you time and effort.

In essence, this script aims to make working with Git repositories from the command line more efficient and informative by providing a dynamic and customizable prompt that reflects the current Git status. You would typically include/source this script in your `.bashrc` (or `.zshrc`) file to have the prompt automatically updated whenever you open a new terminal.
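
One detail about the prompt in the cell above: in IPython, assigning the result of a `!` shell command gives a list of output lines, so the f-string embeds the list's Python representation rather than the raw script text. The model coped with that here, but a cleaner prompt may help with longer files. The sketch below joins the lines before building the prompt. The `build_explain_prompt` helper and the two-line demo script are hypothetical and only illustrate the idea.

```python
def build_explain_prompt(lines: list[str]) -> str:
    """Build the explanation prompt from a list of source lines."""
    source = '\n'.join(lines)
    fence = '`' * 3  # Markdown code fence, built here to keep this example readable.
    return (
        'Please explain what this file does at a very high level. '
        'What is it, and why would I use it?\n\n'
        f'{fence}\n{source}\n{fence}\n'
    )


# Stand-in for the list of lines returned by the `!curl` command.
demo_lines = ['#!/bin/bash', 'echo "hello"']
demo_prompt = build_explain_prompt(demo_lines)
print(demo_prompt)
```

In the notebook, you could pass `file_contents` to this helper in place of `demo_lines` and send the result as `contents`.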
