In [1]:
# Copyright 2025 Narges Kurkani

# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
# 
#     http://www.apache.org/licenses/LICENSE-2.0
# 
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

In [2]:
# Note:
# This notebook is based on the original GenAI notebook from Google (Apache 2.0 License).
# I have added some description and changes.

# 1. Prompt Engineering

You don’t need to be a data scientist or a machine learning engineer to write a prompt; anyone can. Prompt engineering is the process of designing high-quality prompts that guide LLMs to produce accurate outputs.

LLM output configuration:
- Output length: the maximum number of tokens to generate in a response.
- Sampling controls: an LLM predicts a probability for each possible next token. Those probabilities are then sampled to choose the token that is actually produced.
- Temperature: Temperature controls the degree of randomness in token selection. Lower temperatures suit prompts that expect a more deterministic response, while higher temperatures can lead to more diverse or unexpected results.
- Top-K: Top-K sampling restricts selection to the K most likely tokens in the model’s predicted distribution.
- Top-P: Top-P sampling restricts selection to the most likely tokens whose cumulative probability does not exceed a certain value (P). Values for P range from 0 (greedy decoding) to 1 (all tokens in the LLM’s vocabulary).

NOTE: With more freedom (higher temperature, top-K, top-P, and output tokens), the LLM might generate text that is less relevant.

## 1.1. Install the SDK

All of the exercises in this notebook use the Gemini API, so install the `google-genai` SDK first.

In [None]:
!pip install -U -q "google-genai==1.7.0"

In [2]:
from google import genai
from google.genai import types

from IPython.display import HTML, Markdown, display

## 1.2. Set up API key

You can get an API key from [AI Studio](https://aistudio.google.com/app/apikey). Choose "Create API key" and save the key in `GOOGLE_API_KEY`. A free API key is available in Google AI Studio.

In [None]:
GOOGLE_API_KEY = 'API_Key'

client = genai.Client(api_key=GOOGLE_API_KEY)
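
As an alternative to pasting the key into the notebook, the key can be read from an environment variable so that it never ends up in a saved or shared copy. The sketch below is only one way to do this; the helper name `load_api_key` is hypothetical, and it assumes the variable `GOOGLE_API_KEY` was set in the shell before the notebook was started.

```python
import os

def load_api_key(env_var="GOOGLE_API_KEY"):
    """Return the key stored in the given environment variable, or raise if it is unset."""
    key = os.environ.get(env_var)
    if not key:
        raise RuntimeError(f"Set the {env_var} environment variable first.")
    return key

# Usage (commented out so the sketch runs without a real key):
# client = genai.Client(api_key=load_api_key())
```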

## 1.3. Choose A Model

The Gemini API provides access to a number of models from the Gemini model family. Use the following command to list the models available through the API.

In [3]:
for model in client.models.list():
  print(model.name)

models/chat-bison-001
models/text-bison-001
models/embedding-gecko-001
models/gemini-1.0-pro-vision-latest
models/gemini-pro-vision
models/gemini-1.5-pro-latest
models/gemini-1.5-pro-001
models/gemini-1.5-pro-002
models/gemini-1.5-pro
models/gemini-1.5-flash-latest
models/gemini-1.5-flash-001
models/gemini-1.5-flash-001-tuning
models/gemini-1.5-flash
models/gemini-1.5-flash-002
models/gemini-1.5-flash-8b
models/gemini-1.5-flash-8b-001
models/gemini-1.5-flash-8b-latest
models/gemini-1.5-flash-8b-exp-0827
models/gemini-1.5-flash-8b-exp-0924
models/gemini-2.5-pro-exp-03-25
models/gemini-2.0-flash-exp
models/gemini-2.0-flash
models/gemini-2.0-flash-001
models/gemini-2.0-flash-exp-image-generation
models/gemini-2.0-flash-lite-001
models/gemini-2.0-flash-lite
models/gemini-2.0-flash-lite-preview-02-05
models/gemini-2.0-flash-lite-preview
models/gemini-2.0-pro-exp
models/gemini-2.0-pro-exp-02-05
models/gemini-exp-1206
models/gemini-2.0-flash-thinking-exp-01-21
models/gemini-2.0-flash-thinking

## 1.4. Explore generation parameters


### 1.4.1. Output Length

When generating text with an LLM, the length of the output affects both cost and performance. More tokens require greater computational resources, resulting in increased energy consumption, latency, and expense. You can cap the output length with the `max_output_tokens` parameter. The cap truncates the response once it is reached; it does not make the model write more concisely, as the cut-off essay below shows.

In [4]:
from google.genai import types

short_config = types.GenerateContentConfig(max_output_tokens=200)

response = client.models.generate_content(
    model='gemini-2.0-flash',
    config=short_config,
    contents='Write a 600 word essay on the importance of Fruits in modern life.')

print(response.text)

## The Sweet Sustenance: Why Fruits are Indispensable in Modern Life

In the hustle and bustle of modern life, where processed foods and quick fixes often dominate dietary choices, the importance of fruits is frequently overlooked. Yet, these naturally occurring gifts of nature are far more than just a sweet treat. They are vital contributors to our health, well-being, and overall quality of life, offering a plethora of benefits that are indispensable in navigating the challenges of a fast-paced and often unhealthy world.

One of the most compelling reasons to prioritize fruits in modern diets is their unparalleled nutritional value. Fruits are brimming with essential vitamins and minerals, vital for maintaining optimal bodily functions. Vitamin C, abundant in citrus fruits and berries, strengthens the immune system and protects against oxidative stress. Vitamin A, found in mangoes and papayas, promotes healthy vision and skin. Potassium, plentiful in bananas and avocados, helps regula

### 1.4.2. Temperature

Temperature adjusts the randomness in token selection. Higher values flatten the probability distribution, so less likely tokens are chosen more often and the output becomes more diverse. Lower values sharpen the distribution and make outputs more deterministic, with 0 resulting in greedy decoding. A high temperature does not guarantee varied output, as the repeated colours below show, but it helps steer the output in the desired direction.
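
To make the effect concrete, here is a minimal sketch of temperature scaling on a toy three-token distribution. The logits are made up for illustration, and treating temperature 0 as a pure argmax is a convention assumed here; it is not a description of how the Gemini API implements sampling internally.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Turn raw scores into probabilities after dividing them by the temperature."""
    if temperature == 0:
        # Assumed convention: temperature 0 means greedy decoding,
        # so all probability mass goes to the highest-scoring token.
        best = max(range(len(logits)), key=lambda i: logits[i])
        return [1.0 if i == best else 0.0 for i in range(len(logits))]
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical logits for three candidate tokens.
toy_logits = [2.0, 1.0, 0.5]

for t in (0, 0.5, 1.0, 2.0):
    probs = softmax_with_temperature(toy_logits, t)
    print(t, [round(p, 3) for p in probs])
```

As the temperature rises, the probability of the top token falls and the other tokens gain share, which is why repeated calls become more varied.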

In [5]:
high_temp_config = types.GenerateContentConfig(temperature=2.0)


for _ in range(5):
  response = client.models.generate_content(
      model='gemini-2.0-flash',
      config=high_temp_config,
      contents='Pick a random colour... (respond in a single word)')

  if response.text:
    print(response.text, '-' * 25)

Magenta
 -------------------------
Azure
 -------------------------
Magenta
 -------------------------
Purple
 -------------------------
Purple
 -------------------------


In [6]:
low_temp_config = types.GenerateContentConfig(temperature=0.0)

for _ in range(5):
  response = client.models.generate_content(
      model='gemini-2.0-flash',
      config=low_temp_config,
      contents='Pick a random colour... (respond in a single word)')

  if response.text:
    print(response.text, '-' * 25)

Azure
 -------------------------
Azure
 -------------------------
Azure
 -------------------------
Azure
 -------------------------
Azure
 -------------------------


### 1.4.3. Top-P

Like temperature, the top-P parameter controls output diversity by setting a probability threshold for token selection. A top-P of 0 results in greedy decoding, while 1 includes all possible tokens. Top-K, which is not adjustable in Gemini 2.0 models, limits selection to the K most probable tokens, with K=1 also leading to greedy decoding.
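
The two filters can be sketched on a toy distribution as follows. The token names and probabilities are invented, and the top-P rule used here (keep the smallest set of top tokens whose cumulative probability reaches P) is one common formulation; the exact handling of the boundary token varies between implementations.

```python
def top_k_filter(probs, k):
    """Keep the k most probable tokens and renormalise their probabilities."""
    kept = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)[:k]
    total = sum(p for _, p in kept)
    return {tok: p / total for tok, p in kept}

def top_p_filter(probs, p):
    """Keep the smallest set of top tokens whose cumulative probability reaches p."""
    kept, cumulative = [], 0.0
    for tok, prob in sorted(probs.items(), key=lambda kv: kv[1], reverse=True):
        kept.append((tok, prob))
        cumulative += prob
        if cumulative >= p:
            break
    total = sum(prob for _, prob in kept)
    return {tok: prob / total for tok, prob in kept}

# Hypothetical next-token distribution.
toy_probs = {"azure": 0.5, "magenta": 0.3, "purple": 0.15, "teal": 0.05}

print(top_k_filter(toy_probs, 2))    # only the two most likely tokens survive
print(top_p_filter(toy_probs, 0.9))  # tokens are added until 90% of the mass is covered
print(top_p_filter(toy_probs, 0.0))  # a single token remains: greedy decoding
```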

In [6]:
content_config = types.GenerateContentConfig(
    temperature=0.8,  
    top_p=0.9,        
)

story_prompt = "You are an imaginative writer. Create a captivating short story about a curious dog embarking on an adventure."

response = client.models.generate_content(
    model='gemini-2.0-flash',
    config=content_config, 
    contents=story_prompt  
)

print(response.text)

Barnaby wasn't your average Beagle. Sure, he loved belly rubs and chasing squirrels, but beneath his floppy ears and wagging tail resided a spirit of boundless curiosity. He yearned for more than the familiar scent of Mrs. Higgins' lavender and the predictability of his backyard. He craved adventure, a quest worthy of his magnificent nose.

One Tuesday, while Mrs. Higgins was distracted by a particularly juicy episode of "Gardening Gurus," Barnaby noticed it. A glint of gold, nestled beneath the ancient oak tree. Not the usual lost button or discarded bottle cap, but something…different.

He nudged it with his nose. It was a tiny, tarnished key. Not a house key, not a car key, but a key that hummed with the faintest whisper of magic. Barnaby, never one to back down from a good mystery, knew he had to find the lock it opened.

His investigation began immediately. He sniffed the key, tracing its faint metallic scent. It led him to the edge of the woods bordering Mrs. Higgins' garden. He'

## 1.5. Prompting

Large language models (LLMs) can generate text that seems remarkably human-like, but guiding them to produce specific responses can be challenging. To achieve more accurate or desired outputs, you can use various prompting techniques to direct the model's behavior more effectively. Here are a few techniques that can help you get closer to the responses you want from LLMs:


1. Zero-shot prompt
2. Enum mode
3. One-shot and few-shot prompts
4. JSON mode
5. Chain of Thought (CoT)
6. ReAct: Reason and Act
7. Thinking mode

### 1.5.1. Zero Shot

Zero-shot prompting is when an LLM generates a response to a task without any prior examples, relying solely on its pre-trained knowledge.

In [8]:
model_config = types.GenerateContentConfig(
    temperature=0.1,
    top_p=1,
    max_output_tokens=5,
)

zero_shot_prompt = """Classify movie reviews as POSITIVE, NEUTRAL or NEGATIVE.
Review: "Her" is a disturbing study revealing the direction
humanity is headed if AI is allowed to keep evolving,
unchecked. I wish there were more movies like this masterpiece.
Sentiment: """

response = client.models.generate_content(
    model='gemini-2.0-flash',
    config=model_config,
    contents=zero_shot_prompt)

print(response.text)

POSITIVE



### 1.5.2. Enum Mode

Enum mode in the Gemini API helps constrain model output to a predefined set of values, preventing unnecessary text generation.

In [9]:
import enum

class Sentiment(enum.Enum):
    POSITIVE = "positive"
    NEUTRAL = "neutral"
    NEGATIVE = "negative"


response = client.models.generate_content(
    model='gemini-2.0-flash',
    config=types.GenerateContentConfig(
        response_mime_type="text/x.enum",
        response_schema=Sentiment
    ),
    contents=zero_shot_prompt)

print(response.text)

positive
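
Because the response text is one of the enum values, it can be mapped back to a `Sentiment` member in ordinary Python. The sketch below repeats the class so that it runs on its own; the helper name `parse_sentiment` is hypothetical, and the string passed to it stands in for `response.text`.

```python
import enum

class Sentiment(enum.Enum):
    POSITIVE = "positive"
    NEUTRAL = "neutral"
    NEGATIVE = "negative"

def parse_sentiment(text):
    """Map raw response text to a Sentiment member; raises ValueError otherwise."""
    return Sentiment(text.strip().lower())

# Stand-in for response.text from the cell above.
label = parse_sentiment("positive")
print(label, label is Sentiment.POSITIVE)
```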


### 1.5.3. One-Shot and Few-Shot

Providing one example of the expected response is known as a "one-shot" prompt. When you provide multiple examples, it is a "few-shot" prompt.

In [7]:
few_shot_prompt = """Parse a customer's pizza order into valid JSON:

EXAMPLE:
I want a small pizza with cheese, tomato sauce, and pepperoni.
JSON Response:
```
{
"size": "small",
"type": "normal",
"ingredients": ["cheese", "tomato sauce", "pepperoni"]
}
```

EXAMPLE:
Can I get a large pizza with tomato sauce, basil and mozzarella
JSON Response:
```
{
"size": "large",
"type": "normal",
"ingredients": ["tomato sauce", "basil", "mozzarella"]
}
```

ORDER:
"""

customer_order = "Give me a large with cheese & pineapple"

response = client.models.generate_content(
    model='gemini-2.0-flash',
    config=types.GenerateContentConfig(
        temperature=0.1,
        top_p=1,
        max_output_tokens=250,
    ),
    contents=[few_shot_prompt, customer_order])

print(response.text)

```json
{
"size": "large",
"type": "normal",
"ingredients": ["cheese", "pineapple"]
}
```
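
Note that the reply above is wrapped in a Markdown code fence, so it cannot be passed to `json.loads` directly. A minimal sketch of one way to unwrap it is shown below; the helper name `extract_json` is hypothetical, and the sample string simply reproduces the reply printed above.

```python
import json

FENCE = "`" * 3  # built this way to avoid a literal fence inside the example

def extract_json(text):
    """Parse JSON from a reply that may be wrapped in a Markdown code fence."""
    body = text.strip()
    if body.startswith(FENCE):
        # Drop the opening fence line, which may carry a language tag such as json.
        _, _, body = body.partition("\n")
        body = body.rstrip()
        # Drop the closing fence if present.
        if body.endswith(FENCE):
            body = body[: -len(FENCE)]
    return json.loads(body)

sample_reply = (
    FENCE + "json\n"
    '{\n"size": "large",\n"type": "normal",\n"ingredients": ["cheese", "pineapple"]\n}\n'
    + FENCE
)

order = extract_json(sample_reply)
print(order["size"], order["ingredients"])
```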



### 1.5.4. JSON Mode

JSON mode in the Gemini API enforces structured output by restricting token selection to a predefined schema, ensuring responses are in pure JSON format. 

In [8]:
import typing_extensions as typing

class PizzaOrder(typing.TypedDict):
    size: str
    ingredients: list[str]
    type: str


response = client.models.generate_content(
    model='gemini-2.0-flash',
    config=types.GenerateContentConfig(
        temperature=0.1,
        response_mime_type="application/json",
        response_schema=PizzaOrder,
    ),
    contents="Can I have a large dessert pizza with apple and chocolate")

print(response.text)

{
  "size": "large",
  "ingredients": ["apple", "chocolate"],
  "type": "dessert"
}
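
Since JSON mode returns plain JSON, the text can be parsed directly and checked against the expected keys. The following is a sketch under assumptions: it uses the standard-library `typing.TypedDict`, the helper name `parse_order` is hypothetical, and the raw string stands in for `response.text`.

```python
import json
from typing import TypedDict

class PizzaOrder(TypedDict):
    size: str
    ingredients: list[str]
    type: str

def parse_order(raw):
    """Parse a JSON string and check that it has exactly the PizzaOrder keys."""
    data = json.loads(raw)
    expected = set(PizzaOrder.__annotations__)
    if not isinstance(data, dict) or set(data) != expected:
        raise ValueError(f"expected keys {sorted(expected)}, got {data!r}")
    return data

# Stand-in for response.text from the cell above.
raw_reply = '{"size": "large", "ingredients": ["apple", "chocolate"], "type": "dessert"}'
print(parse_order(raw_reply)["type"])
```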


### 1.5.5. Chain of Thought (CoT)

Direct prompting in LLMs offers quick and efficient responses but can lead to hallucinations, where the answer seems correct but is factually or logically flawed. Chain-of-Thought prompting, which involves instructing the model to show intermediate reasoning, generally improves accuracy, especially with few-shot examples. However, it doesn't eliminate hallucinations and tends to be more costly due to increased token usage. Models like Gemini are naturally "chatty" and provide reasoning by default, so you may need to prompt for more direct responses if needed.

In [14]:
prompt = """When I was 4 years old, my partner was 3 times my age. Now, I
am 20 years old. How old is my partner? Return the answer directly."""

response = client.models.generate_content(
    model='gemini-2.0-flash',
    contents=prompt)

print(response.text)

52



In [15]:
prompt = """When I was 4 years old, my partner was 3 times my age. Now,
I am 20 years old. How old is my partner? Let's think step by step."""

response = client.models.generate_content(
    model='gemini-2.0-flash',
    contents=prompt)

Markdown(response.text)

Here's how to solve the problem:

1.  When you were 4, your partner was 3 times your age, so they were 4 * 3 = 12 years old.
2.  This means your partner is 12 - 4 = 8 years older than you.
3.  Since you are now 20, your partner is 20 + 8 = 28 years old.

So the answer is 28.
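
The direct answer of 52 in the first run is wrong, while the step-by-step answer of 28 is correct. The same arithmetic can be checked in a few lines; the function name is only illustrative.

```python
def partner_age_now(my_age_then, multiple, my_age_now):
    """The age gap never changes, so add it to the current age."""
    partner_age_then = my_age_then * multiple   # 4 * 3 = 12
    age_gap = partner_age_then - my_age_then    # 12 - 4 = 8
    return my_age_now + age_gap                 # 20 + 8 = 28

print(partner_age_now(4, 3, 20))  # 28
```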

### 1.5.6. ReAct: Reason and act

ReAct (Reason and Act) is a prompting pattern in which the model interleaves reasoning with actions on external tools, such as a search engine, and uses what those actions return to decide what to do next. Each round has three parts:

Thought: the model reasons about the current situation and decides what to do next, much like a step in Chain-of-Thought prompting.

Action: the model emits a request for the surrounding program to carry out. In this example that is a `<search>`, `<lookup>`, or `<finish>` tag.

Observation: the result of the action is fed back to the model, which then begins the next Thought.

In the cells below, the model generates only the Thoughts and Actions. Generation stops before each Observation, and the observation text is supplied by hand.

In [10]:
model_instructions = """
Solve a question answering task with interleaving Thought, Action, Observation steps. Thought can reason about the current situation,
Observation is understanding relevant information from an Action's output and Action can be one of three types:
 (1) <search>entity</search>, which searches the exact entity on Wikipedia and returns the first paragraph if it exists. If not, it
     will return some similar entities to search and you can try to search the information from those topics.
 (2) <lookup>keyword</lookup>, which returns the next sentence containing keyword in the current context. This only does exact matches,
     so keep your searches short.
 (3) <finish>answer</finish>, which returns the answer and finishes the task.
"""

example1 = """Question
Musician and satirist Allie Goertz wrote a song about the "The Simpsons" character Milhouse, who Matt Groening named after who?

Thought 1
The question simplifies to "The Simpsons" character Milhouse is named after who. I only need to search Milhouse and find who it is named after.

Action 1
<search>Milhouse</search>

Observation 1
Milhouse Mussolini Van Houten is a recurring character in the Fox animated television series The Simpsons voiced by Pamela Hayden and created by Matt Groening.

Thought 2
The paragraph does not tell who Milhouse is named after, maybe I can look up "named after".

Action 2
<lookup>named after</lookup>

Observation 2
Milhouse was named after U.S. president Richard Nixon, whose middle name was Milhous.

Thought 3
Milhouse was named after U.S. president Richard Nixon, so the answer is Richard Nixon.

Action 3
<finish>Richard Nixon</finish>
"""

example2 = """Question
What is the elevation range for the area that the eastern sector of the Colorado orogeny extends into?

Thought 1
I need to search Colorado orogeny, find the area that the eastern sector of the Colorado orogeny extends into, then find the elevation range of the area.

Action 1
<search>Colorado orogeny</search>

Observation 1
The Colorado orogeny was an episode of mountain building (an orogeny) in Colorado and surrounding areas.

Thought 2
It does not mention the eastern sector. So I need to look up eastern sector.

Action 2
<lookup>eastern sector</lookup>

Observation 2
The eastern sector extends into the High Plains and is called the Central Plains orogeny.

Thought 3
The eastern sector of Colorado orogeny extends into the High Plains. So I need to search High Plains and find its elevation range.

Action 3
<search>High Plains</search>

Observation 3
High Plains refers to one of two distinct land regions

Thought 4
I need to instead search High Plains (United States).

Action 4
<search>High Plains (United States)</search>

Observation 4
The High Plains are a subregion of the Great Plains. From east to west, the High Plains rise in elevation from around 1,800 to 7,000 ft (550 to 2,130m).

Thought 5
High Plains rise in elevation from around 1,800 to 7,000 ft, so the answer is 1,800 to 7,000 ft.

Action 5
<finish>1,800 to 7,000 ft</finish>
"""

In [11]:
question = """Question
Who was the youngest author listed on the transformers NLP paper?
"""

# You will perform the Action; so generate up to, but not including, the Observation.
react_config = types.GenerateContentConfig(
    stop_sequences=["\nObservation"],
    system_instruction=model_instructions + example1 + example2,
)

# Create a chat that has the model instructions and examples pre-seeded.
react_chat = client.chats.create(
    model='gemini-2.0-flash',
    config=react_config,
)

resp = react_chat.send_message(question)
print(resp.text)

Thought 1
I need to find the transformers NLP paper and then find the list of authors. After that, I need to find the youngest author.

Action 1
<search>transformers NLP paper</search>



In [12]:
observation = """Observation 1
[1706.03762] Attention Is All You Need
Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, Illia Polosukhin
We propose a new simple network architecture, the Transformer, based solely on attention mechanisms, dispensing with recurrence and convolutions entirely.
"""
resp = react_chat.send_message(observation)
print(resp.text)

Thought 2
I have the list of authors: Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, Illia Polosukhin. Now I need to find their age. Since I don't have access to their ages, I should search each name along with the keyword "age" or "birthdate" to determine their birthdates and thus, their age when the paper was published (2017). Let's start with the first author.

Action 2
<search>Ashish Vaswani age</search>
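
To automate this loop instead of pasting observations by hand, the surrounding program first has to pull the action out of the model's text. A minimal sketch of that parsing step is shown below; the helper name `parse_action` is hypothetical, and actually running the search or lookup is left out.

```python
import re

# Matches the three action tags defined in model_instructions.
ACTION_PATTERN = re.compile(r"<(search|lookup|finish)>(.*?)</\1>", re.DOTALL)

def parse_action(model_text):
    """Return (action_type, argument) for the first action tag, or None if absent."""
    match = ACTION_PATTERN.search(model_text)
    if match is None:
        return None
    return match.group(1), match.group(2).strip()

# Stand-in for resp.text from the first ReAct turn above.
sample = (
    "Thought 1\nI need to find the transformers NLP paper.\n\n"
    "Action 1\n<search>transformers NLP paper</search>\n"
)
print(parse_action(sample))
```

A driver loop would then dispatch on the action type, send the tool result back as the next Observation, and stop once a `finish` action appears.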



### 1.5.7. Thinking mode

The Gemini Flash 2.0 "Thinking" model is designed to generate its reasoning process as part of its response, enhancing its logical capabilities without requiring specialized prompting. This approach improves response quality by incorporating intermediate "thoughts" into the final output.

In [None]:
import io

response = client.models.generate_content_stream(
    model='gemini-2.0-flash-thinking-exp',
    contents='Who was the youngest author listed on the transformers NLP paper?',
)

buf = io.StringIO()
for chunk in response:
    # Some streamed chunks may carry no text; skip them.
    if chunk.text:
        buf.write(chunk.text)
        # Display the response as it is streamed
        print(chunk.text, end='')


The youngest author listed on the original Transformers NLP paper, "Attention is All You Need," is **Aidan N. Gomez**.

While determining the exact birthdates of all authors to definitively say who is *absolutely* the youngest is difficult without personal information, Aidan N. Gomez was a PhD student at the University of Oxford at the time of publication (2017).  PhD students are typically younger than researchers in more senior positions at companies like Google Research (where many of the other authors were affiliated).

Based on publicly available information and typical career paths, it's highly likely that **Aidan N. Gomez** was the youngest author on the paper.

## 1.6. Code Prompting

### 1.6.1. Generating Code

The Gemini family of models can be used to generate code, configuration, and scripts. Generating code can be helpful when learning to code, learning a new language, or rapidly producing a first draft.

In [22]:
# The Gemini models love to talk, so it helps to specify they stick to the code if that
# is all that you want.
code_prompt = """
Write a Python function to calculate the factorial of a number. No explanation, provide only the code.
"""

response = client.models.generate_content(
    model='gemini-2.0-flash',
    config=types.GenerateContentConfig(
        temperature=1,
        top_p=1,
        max_output_tokens=1024,
    ),
    contents=code_prompt)

Markdown(response.text)

```python
def factorial(n):
  if n == 0:
    return 1
  else:
    return n * factorial(n-1)
```
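
Generated code is a draft, so it is worth a quick check before use. The sketch below copies the function from the output above and compares it with `math.factorial` for a few small inputs. As generated, the function has no guard for negative input and would recurse until Python raises `RecursionError`.

```python
import math

# Copied from the model output above.
def factorial(n):
    if n == 0:
        return 1
    else:
        return n * factorial(n - 1)

# Compare against the standard library for small non-negative inputs.
for n in range(10):
    assert factorial(n) == math.factorial(n)

print("matches math.factorial for n = 0..9")
```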


### 1.6.2. Code Execution

The Gemini API can also run generated code automatically and return the output.


In [23]:
from pprint import pprint

config = types.GenerateContentConfig(
    tools=[types.Tool(code_execution=types.ToolCodeExecution())],
)

code_exec_prompt = """
Generate the first 14 odd prime numbers, then calculate their sum.
"""

response = client.models.generate_content(
    model='gemini-2.0-flash',
    config=config,
    contents=code_exec_prompt)

for part in response.candidates[0].content.parts:
  pprint(part.to_json_dict())
  print("-----")

{'text': 'Okay, I can do that. First, I need to generate the first 14 odd '
         "prime numbers. I'll start by listing prime numbers and excluding "
         'even numbers (except for 2, but since we want *odd* primes, we will '
         "skip 2). Then, I'll sum them up.\n"
         '\n'
         "Here's the list of prime numbers, and I'll pick out the first 14 odd "
         'ones: 2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53, '
         '59...\n'
         '\n'
         'The first 14 odd prime numbers are: 3, 5, 7, 11, 13, 17, 19, 23, 29, '
         '31, 37, 41, 43, 47.\n'
         '\n'
         "Now, let's calculate their sum using a python tool.\n"
         '\n'}
-----
{'executable_code': {'code': 'numbers = [3, 5, 7, 11, 13, 17, 19, 23, 29, 31, '
                             '37, 41, 43, 47]\n'
                             'sum_of_numbers = sum(numbers)\n'
                             'print(sum_of_numbers)\n'
                             '\n',
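
The printed output is cut off before the result appears. As an independent local check, and not a reproduction of what the model returned, the same sum can be computed with a short trial-division sketch; the helper name `first_odd_primes` is hypothetical.

```python
def first_odd_primes(count):
    """Return the first `count` odd primes using trial division."""
    primes = []
    candidate = 3
    while len(primes) < count:
        # Only odd candidates are tried, so dividing by earlier odd primes suffices.
        if all(candidate % p for p in primes if p * p <= candidate):
            primes.append(candidate)
        candidate += 2
    return primes

odd_primes = first_odd_primes(14)
print(odd_primes)
print(sum(odd_primes))  # 326
```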
                   

### 1.6.3. Explaining code


The Gemini family of models can also explain existing code.

In [24]:
file_contents = !curl https://raw.githubusercontent.com/magicmonty/bash-git-prompt/refs/heads/master/gitprompt.sh

explain_prompt = f"""
Please explain what this file does at a very high level. What is it, and why would I use it?

```
{file_contents}
```
"""

response = client.models.generate_content(
    model='gemini-2.0-flash',
    contents=explain_prompt)

Markdown(response.text)

This file, `git-prompt.sh`, is a shell script designed to enhance your command-line prompt with information about the current Git repository.

**What it does:**

*   **Displays Git Status:** Shows the current branch, whether there are uncommitted changes (staged, unstaged, untracked), if your branch is ahead or behind the remote, and other relevant Git status information.
*   **Customization:** It allows you to customize the appearance of the Git information in your prompt through theming and variable settings. You can change the colors, symbols, and even what information is displayed.
*   **Asynchronous Fetching:** It can optionally fetch remote branch information in the background, so your prompt doesn't lag while waiting for the `git fetch` command.
*   **Virtualenv Awareness:** It can also display information about activated Python virtual environments.
*   **Works with Bash and Zsh:** The script aims to be compatible with both Bash and Zsh shells.

**Why you would use it:**

*   **Improved Workflow:**  The visual Git status in your prompt makes it easier to keep track of the state of your repositories, reducing the need to constantly run `git status`.
*   **Quick Glance Information:** At a glance, you can see the branch you're on, if you have changes to commit, and whether you need to pull or push.
*   **Enhanced Productivity:** By providing immediate feedback on your Git status, it can speed up your development workflow.

In essence, `git-prompt.sh` is a tool to make your command-line prompt more informative and Git-aware, leading to a more efficient and less error-prone development experience.  You would typically source this script in your `.bashrc` or `.zshrc` file to activate its features in your shell.
