## Importing the Required Libraries

In [56]:
# pip install -U -q "google-generativeai>=0.8.3"

In [2]:
import google.generativeai as genai
from IPython.display import HTML, Markdown, display

## Set Up the Gemini API Key

In [5]:
from dotenv import load_dotenv
import os

# Load the .env file
load_dotenv()

# Access environment variables
GOOGLE_API_KEY = os.getenv('GOOGLE_API_KEY')
genai.configure(api_key=GOOGLE_API_KEY)
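
If `GOOGLE_API_KEY` is missing, `os.getenv` returns `None` and the problem may only surface later, when a request is made. A minimal sketch of failing earlier with a clear message; `require_env` and `EXAMPLE_API_KEY` are hypothetical names used for illustration, not part of `dotenv` or `google.generativeai`:

```python
import os


def require_env(name: str) -> str:
    """Return the value of environment variable `name`, or raise if it is unset or empty."""
    value = os.getenv(name)
    if not value:
        raise RuntimeError(f"{name} is not set; add it to your .env file or shell environment.")
    return value


# Placeholder variable so the snippet runs anywhere; use 'GOOGLE_API_KEY' in the notebook.
os.environ["EXAMPLE_API_KEY"] = "not-a-real-key"
print(require_env("EXAMPLE_API_KEY"))
```

In the notebook, the returned value could be passed to `genai.configure(api_key=...)` in place of the bare `os.getenv` result.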

In [59]:
flash = genai.GenerativeModel('gemini-1.5-flash')
response = flash.generate_content('Provide a brief summary in a single paragraph about the Cricket')
print(response.text)

Cricket is a bat-and-ball game played between two teams of eleven players on a field at the centre of which is a 22-yard pitch with a wicket at each end.  One team bats, attempting to score runs by hitting a ball bowled by the opposing team, while the fielding team tries to prevent runs and dismiss batsmen.  The game can range from a few hours (Twenty20) to several days (Test matches), with the objective being to score more runs than the opposition.  Variations in format and rules exist, making it a globally diverse and popular sport.



The response often comes back as Markdown, which you can render directly in the notebook.

In [10]:
Markdown(response.text)

Cricket is a bat-and-ball game played between two teams of eleven players on a field at the centre of which is a 22-yard pitch with a wicket at each end.  One team bats, attempting to score runs by hitting a ball bowled by the opposing team, while the fielding team tries to prevent this by getting the batters "out" and taking wickets.  The game can vary in length, from Twenty20 matches lasting a few hours to Test matches spanning five days.  The team with the most runs at the end wins.


## Start a Chat
The previous example is a single-input, single-output interaction. You can also start a multi-turn chat session with an LLM.

In [11]:
chat = flash.start_chat(history=[])
response = chat.send_message("Hello")
print(response.text)

Hello there! How can I help you today?



In [12]:
response = chat.send_message("Nataka kujua kuhusu mpira wa Tanzania")
print(response.text)

Nataka kuelewa unachomaanisha na "mpira wa Tanzania".  Je, unamaanisha:

* **Mpira wa miguu (soka) nchini Tanzania?**  Kama ndio, naweza kuzungumzia kuhusu ligi kuu ya Tanzania (ligi kuu bara), timu za taifa (Taifa Stars), wachezaji maarufu wa Kitanzania, historia ya mpira wa miguu nchini Tanzania, au maendeleo ya mchezo huo.

* **Mchezo mwingine wa mpira?**  Kama kuna mchezo mwingine wa mpira unaofikiria, tafadhali nieleze.

Tafadhali fafanua swali lako ili niweze kukupa majibu sahihi zaidi.



In [13]:
response = chat.send_message("Ligi Kuu ya Tanzania")
print(response.text)

Ligi Kuu ya Tanzania, pia inajulikana kama Ligi Kuu Bara, ni ligi ya juu zaidi ya mpira wa miguu nchini Tanzania.  Hapa kuna baadhi ya mambo muhimu kuhusu ligi hiyo:

* **Muundo:** Ligi ina timu 16 zinazocheza dhidi ya kila timu nyingine mara mbili (nyumbani na ugenini), jumla ya michezo 30 kwa kila timu.

* **Washindi:** Timu mbalimbali zimeshinda ligi hiyo kwa miaka mingi, ikiwa ni pamoja na Simba SC na Young Africans SC, ambazo ni timu zenye ushindani mkubwa na historia ndefu.

* **Viwanja:**  Michezo huchezwa katika viwanja mbalimbali nchini kote,  ikiwa na viwanja vikubwa kama vile Uwanja wa Benjamin Mkapa (Dar es Salaam) na Uwanja wa Azam Complex (Chamazi).

* **Changamoto:** Kama ligi nyingi za Afrika, Ligi Kuu Bara inakabiliwa na changamoto mbalimbali, ikiwa ni pamoja na:  ufadhili, miundombinu ya viwanja, uamuzi wa waamuzi, na masuala ya uongozi.

* **Ukuaji:** Licha ya changamoto hizo, ligi inaendelea kukua na kuvutia mashabiki wengi. Kuna juhudi zinazofanywa kuboresha ubora 

In [14]:
response = chat.send_message("Hapana kwa sasa, inatosha")
print(response.text)

Sawa.  Kama utahitaji maelezo zaidi kuhusu Ligi Kuu ya Tanzania au mpira wa miguu nchini Tanzania kwa ujumla, tafadhali usisite kuniuliza.



## Choose a model
The Gemini API provides access to a number of models from the Gemini model family. Read about the available models and their capabilities on the model overview page.

In this step you'll use the API to list all of the available models.

In [16]:
for model in genai.list_models():
    print(model.name)

models/chat-bison-001
models/text-bison-001
models/embedding-gecko-001
models/gemini-1.0-pro-latest
models/gemini-1.0-pro
models/gemini-pro
models/gemini-1.0-pro-001
models/gemini-1.0-pro-vision-latest
models/gemini-pro-vision
models/gemini-1.5-pro-latest
models/gemini-1.5-pro-001
models/gemini-1.5-pro-002
models/gemini-1.5-pro
models/gemini-1.5-pro-exp-0801
models/gemini-1.5-pro-exp-0827
models/gemini-1.5-flash-latest
models/gemini-1.5-flash-001
models/gemini-1.5-flash-001-tuning
models/gemini-1.5-flash
models/gemini-1.5-flash-exp-0827
models/gemini-1.5-flash-002
models/gemini-1.5-flash-8b
models/gemini-1.5-flash-8b-001
models/gemini-1.5-flash-8b-latest
models/gemini-1.5-flash-8b-exp-0827
models/gemini-1.5-flash-8b-exp-0924
models/learnlm-1.5-pro-experimental
models/gemini-exp-1114
models/gemini-exp-1121
models/embedding-001
models/text-embedding-004
models/aqa


In [20]:
models = list(genai.list_models())
len(models)

32

In [22]:
for model in genai.list_models():
    if model.name == 'models/gemini-1.5-flash':
        print(model)
        break

Model(name='models/gemini-1.5-flash',
      base_model_id='',
      version='001',
      display_name='Gemini 1.5 Flash',
      description=('Alias that points to the most recent stable version of Gemini 1.5 Flash, our '
                   'fast and versatile multimodal model for scaling across diverse tasks.'),
      input_token_limit=1000000,
      output_token_limit=8192,
      supported_generation_methods=['generateContent', 'countTokens'],
      temperature=1.0,
      max_temperature=2.0,
      top_p=0.95,
      top_k=40)
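
The `supported_generation_methods` field shown above can be used to narrow the list to models that fit a task. A small sketch, assuming each model object exposes `name` and `supported_generation_methods` as in the printed output; the stand-in objects and the name `models/example-embedding` are made up so the snippet runs without an API key:

```python
from types import SimpleNamespace


def models_supporting(models, method):
    """Return names of models whose supported_generation_methods include `method`."""
    return [m.name for m in models if method in m.supported_generation_methods]


# Stand-in objects with the same two attributes as the Model printed above.
# The second entry is hypothetical and only illustrates a non-matching model.
sample_models = [
    SimpleNamespace(name="models/gemini-1.5-flash",
                    supported_generation_methods=["generateContent", "countTokens"]),
    SimpleNamespace(name="models/example-embedding",
                    supported_generation_methods=["embedContent"]),
]

print(models_supporting(sample_models, "generateContent"))
```

In the notebook, the same function could be called with `genai.list_models()` as the first argument.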


## Exploring Generation Parameters
### Output length
When generating text with an LLM, the output length affects cost and performance. Generating more tokens increases computation, leading to higher energy consumption, latency, and cost.

To stop the model from generating tokens past a limit, you can specify the max_output_tokens parameter when using the Gemini API. Specifying this parameter does not influence the generation of the output tokens, so the output will not become more stylistically or textually succinct, but it will stop generating tokens once the specified length is reached. Prompt engineering may be required to generate a more complete output for your given limit.

In [30]:
short_model = genai.GenerativeModel(
    'gemini-1.5-flash',
    generation_config=genai.GenerationConfig(max_output_tokens=100))

response = short_model.generate_content('Write a 1000 word essay on the importance of olives in modern society.')
print(response.text)

## The Enduring Significance of Olives in Modern Society

The olive, *Olea europaea*, a seemingly unassuming fruit, holds a position of profound importance in modern society that extends far beyond its culinary applications.  Its impact reverberates across economic, environmental, cultural, and even health spheres, shaping landscapes, livelihoods, and traditions worldwide.  Understanding the significance of olives requires acknowledging its multifaceted contributions, from its economic power as a global commodity to its deep-seated cultural relevance and emerging role in


### Temperature
Temperature controls the degree of randomness in token selection. Higher temperatures result in a higher number of candidate tokens from which the next output token is selected, and can produce more diverse results, while lower temperatures have the opposite effect, such that a temperature of 0 results in greedy decoding, selecting the most probable token at each step.

Temperature doesn't provide any guarantees of randomness, but it can be used to "nudge" the output somewhat.
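
One common way to picture temperature is as a divisor applied to the model's raw scores before they are turned into probabilities. The sketch below is a toy illustration with made-up scores, not Gemini's actual decoder; `softmax_with_temperature` is a hypothetical helper:

```python
import math


def softmax_with_temperature(logits, temperature):
    """Convert scores to probabilities; temperature 0 is treated as greedy (argmax)."""
    if temperature == 0:
        best = max(range(len(logits)), key=lambda i: logits[i])
        return [1.0 if i == best else 0.0 for i in range(len(logits))]
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.1]  # made-up scores for three candidate tokens
for t in (0, 0.5, 1.0, 2.0):
    probs = softmax_with_temperature(logits, t)
    print(t, [round(p, 3) for p in probs])
```

As the temperature rises, the printed distribution flattens, so less likely tokens are sampled more often; at 0 all of the probability sits on the top-scoring token.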

Note that if you see a 429 Resource Exhausted error here, you may be able to edit the words in the prompt slightly to progress.

In [36]:
from google.api_core import retry

high_temp_model = genai.GenerativeModel(
    'gemini-1.5-flash',
    generation_config=genai.GenerationConfig(temperature=2))


# When running lots of queries, it's a good practice to use a retry policy so your code
# automatically retries when hitting Resource Exhausted (quota limit) errors.
retry_policy = {
    "retry": retry.Retry(predicate=retry.if_transient_error, initial=10, multiplier=1.5, timeout=300)
}

for _ in range(5):
    response = high_temp_model.generate_content(
        'Pick a random colour... (respond in a single word)',
        request_options=retry_policy)
    if response.parts:
        print(response.text, '-' * 25)


Maroon
 -------------------------
Purple
 -------------------------
Purple
 -------------------------
Purple
 -------------------------
Maroon
 -------------------------
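
To get a feel for what `initial=10, multiplier=1.5, timeout=300` means in the retry policy, the sketch below lists an idealised sequence of waits. It is an approximation under stated assumptions: the real `google.api_core.retry.Retry` adds random jitter and counts wall-clock time including the requests themselves, and the cap `maximum=60` is an assumed value that the notebook does not set. `backoff_delays` is a hypothetical helper.

```python
def backoff_delays(initial, multiplier, maximum, timeout):
    """Yield un-jittered sleep intervals until their running total would exceed `timeout`."""
    delay, elapsed = initial, 0.0
    while elapsed + delay <= timeout:
        yield delay
        elapsed += delay
        delay = min(delay * multiplier, maximum)  # grow geometrically, up to the cap


# initial, multiplier and timeout mirror retry_policy; maximum=60 is an assumption.
delays = list(backoff_delays(initial=10, multiplier=1.5, maximum=60, timeout=300))
print(delays)
print("total seconds waited:", sum(delays))
```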


### Top-K and top-P
Like temperature, top-K and top-P parameters are also used to control the diversity of the model's output.

Top-K is a positive integer that defines the number of most probable tokens from which to select the output token. A top-K of 1 selects a single token, performing greedy decoding.

Top-P defines a cumulative probability threshold: tokens are added as candidates in order of probability until their combined probability exceeds the threshold, and the remaining tokens are excluded. A top-P of 0 is typically equivalent to greedy decoding, and a top-P of 1 typically selects every token in the model's vocabulary.

When both are supplied, the Gemini API will filter top-K tokens first, then top-P and then finally sample from the candidate tokens using the supplied temperature.

Run this example a number of times, change the settings and observe the change in output.

In [43]:
model = genai.GenerativeModel(
    'gemini-1.5-flash-001',
    generation_config=genai.GenerationConfig(
        # These are the default values for gemini-1.5-flash-001.
        temperature=1.0,
        top_k=64,
        top_p=0.95,
    ))

story_prompt = "You are a creative writer. Write a short story about a cat who goes on an adventure."
response = model.generate_content(story_prompt, request_options=retry_policy)
print(response.text)

Bartholomew was a cat of simple pleasures. He enjoyed sunbeams, naps, and the occasional chase after a rogue dust bunny. His life, however, was about to be turned upside down by a simple, yet audacious, act: a misplaced tuna can. 

It all started with a rumbling stomach and a mischievous glint in Bartholomew's eye. He had managed to snag a tuna can lid off the counter, leaving the can precariously balanced on the edge. A slight nudge, a playful paw swipe, and the can tumbled to the floor, rolling away with a satisfying clatter.

Bartholomew, ever the opportunist, followed the can, his nose twitching with anticipation. He was met with the sight of the can rolling under a heavy, old bookcase, a gateway to a world beyond his familiar living room. Intrigued, he squeezed under the bookcase, feeling a thrill of adventure.

The world beyond the bookcase was a labyrinth of forgotten objects and dusty shadows. Bartholomew, fueled by curiosity and tuna-induced hunger, pressed on. He dodged forgo
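
The filtering order described in this section (top-K first, then top-P, then sampling) can be sketched on a toy distribution. This is an illustration with made-up probabilities, not the Gemini API's implementation; temperature is left out to keep the sketch short, and `filter_top_k_top_p` and `sample` are hypothetical helpers:

```python
import random


def filter_top_k_top_p(probs, top_k, top_p):
    """Keep the top_k most probable tokens, then the shortest prefix whose mass reaches top_p."""
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)[:top_k]
    kept, cumulative = [], 0.0
    for token, p in ranked:
        kept.append((token, p))
        cumulative += p
        if cumulative >= top_p:
            break
    total = sum(p for _, p in kept)
    return {token: p / total for token, p in kept}  # renormalise the survivors


def sample(candidates, rng):
    """Draw one token according to the renormalised probabilities."""
    tokens, weights = zip(*candidates.items())
    return rng.choices(tokens, weights=weights, k=1)[0]


# Made-up next-token distribution.
probs = {"cat": 0.5, "dog": 0.3, "fox": 0.15, "emu": 0.05}
candidates = filter_top_k_top_p(probs, top_k=3, top_p=0.75)
print(candidates)
print(sample(candidates, random.Random(0)))
```

With `top_k=1` or `top_p=0`, only the most probable token survives, which matches the greedy-decoding cases described above.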

## Prompting


### Zero Shot Prompting
Zero-shot prompts describe the request to the model directly, without providing any examples.

In [44]:
model = genai.GenerativeModel(
    'gemini-1.5-flash-001',
    generation_config=genai.GenerationConfig(
        temperature=0.1,
        top_p=1,
        max_output_tokens=5,
    ))

zero_shot_prompt = """Classify movie reviews as POSITIVE, NEUTRAL or NEGATIVE.
Review: "Her" is a disturbing study revealing the direction
humanity is headed if AI is allowed to keep evolving,
unchecked. I wish there were more movies like this masterpiece.
Sentiment: """

response = model.generate_content(zero_shot_prompt, request_options=retry_policy)
print(response.text)

Sentiment: **POSITIVE**


### Enum mode
The models are trained to generate text, and can sometimes produce more text than you want. In the preceding example the model outputs the label, but it sometimes repeats the "Sentiment" prefix, and without an output token limit it may also add explanatory text afterwards.

The Gemini API has an Enum mode feature that allows you to constrain the output to a fixed set of values.

In [45]:
import enum

class Sentiment(enum.Enum):
    POSITIVE = "positive"
    NEUTRAL = "neutral"
    NEGATIVE = "negative"


model = genai.GenerativeModel(
    'gemini-1.5-flash-001',
    generation_config=genai.GenerationConfig(
        response_mime_type="text/x.enum",
        response_schema=Sentiment
    ))

response = model.generate_content(zero_shot_prompt, request_options=retry_policy)
print(response.text)

positive
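
Because the constrained output is one of the enum's values, it can be mapped back to the Python enum for type-safe handling downstream. A minimal sketch; `parse_sentiment` is a hypothetical helper, and the `Sentiment` class is repeated so the snippet runs on its own:

```python
import enum


class Sentiment(enum.Enum):
    POSITIVE = "positive"
    NEUTRAL = "neutral"
    NEGATIVE = "negative"


def parse_sentiment(text: str) -> Sentiment:
    """Map the model's text output to a Sentiment member; raises ValueError on anything else."""
    return Sentiment(text.strip())


print(parse_sentiment("positive\n"))
```

Free-form output such as the `Sentiment: **POSITIVE**` string from the zero-shot example would raise `ValueError` here, which makes unexpected responses easy to detect.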


### One-shot and few-shot
Providing an example of the expected response is known as a "one-shot" prompt. When you provide multiple examples, it is a "few-shot" prompt.

In [46]:
model = genai.GenerativeModel(
    'gemini-1.5-flash-latest',
    generation_config=genai.GenerationConfig(
        temperature=0.1,
        top_p=1,
        max_output_tokens=250,
    ))

few_shot_prompt = """Parse a customer's pizza order into valid JSON:

EXAMPLE:
I want a small pizza with cheese, tomato sauce, and pepperoni.
JSON Response:
```
{
"size": "small",
"type": "normal",
"ingredients": ["cheese", "tomato sauce", "pepperoni"]
}
```

EXAMPLE:
Can I get a large pizza with tomato sauce, basil and mozzarella
JSON Response:
```
{
"size": "large",
"type": "normal",
"ingredients": ["tomato sauce", "basil", "mozzarella"]
}
```

ORDER:
"""

customer_order = "Give me a large with cheese & pineapple"


response = model.generate_content([few_shot_prompt, customer_order], request_options=retry_policy)
print(response.text)

```json
{
  "size": "large",
  "type": "normal",
  "ingredients": ["cheese", "pineapple"]
}
```
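
The few-shot response above arrives wrapped in a Markdown code fence, so it cannot be passed to `json.loads` as-is. A minimal sketch of unwrapping it, assuming at most one fenced block in the response; `extract_json` is a hypothetical helper, and the fence marker is built at runtime only so that this snippet contains no literal fence:

```python
import json
import re

FENCE = "`" * 3  # three backticks


def extract_json(text: str):
    """Parse JSON from model output, tolerating an optional Markdown code fence around it."""
    pattern = re.escape(FENCE) + r"(?:json)?\s*(.*?)\s*" + re.escape(FENCE)
    match = re.search(pattern, text, flags=re.DOTALL)
    payload = match.group(1) if match else text
    return json.loads(payload)


# A string shaped like the response above.
raw = FENCE + 'json\n{"size": "large", "type": "normal", "ingredients": ["cheese", "pineapple"]}\n' + FENCE
order = extract_json(raw)
print(order["size"], order["ingredients"])
```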



### JSON mode
To provide control over the schema, and to ensure that you only receive JSON (with no other text or markdown), you can use the Gemini API's JSON mode. This forces the model to constrain decoding, such that token selection is guided by the supplied schema.

In [47]:
import typing_extensions as typing

class PizzaOrder(typing.TypedDict):
    size: str
    ingredients: list[str]
    type: str


model = genai.GenerativeModel(
    'gemini-1.5-flash-latest',
    generation_config=genai.GenerationConfig(
        temperature=0.1,
        response_mime_type="application/json",
        response_schema=PizzaOrder,
    ))

response = model.generate_content("Can I have a large dessert pizza with apple and chocolate")
print(response.text)


{"ingredients": ["apple", "chocolate"], "size": "large", "type": "dessert pizza"}
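
With JSON mode the response text can be decoded directly. It may still be worth checking that every expected field is present before using the result. A small sketch, assuming Python 3.9+ and using the standard-library `typing.TypedDict`; `parse_order` is a hypothetical helper:

```python
import json
from typing import TypedDict


class PizzaOrder(TypedDict):
    size: str
    ingredients: list[str]
    type: str


def parse_order(text: str) -> PizzaOrder:
    """Decode JSON-mode output and check that exactly the PizzaOrder keys are present."""
    data = json.loads(text)
    expected = set(PizzaOrder.__annotations__)
    if set(data) != expected:
        raise ValueError(f"unexpected or missing keys: {sorted(set(data) ^ expected)}")
    return data


# The response text printed above.
order = parse_order('{"ingredients": ["apple", "chocolate"], "size": "large", "type": "dessert pizza"}')
print(order["size"], order["ingredients"])
```

The check covers key names only; value types are not validated in this sketch.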


### Chain of Thought (CoT)
Direct prompting of LLMs can return answers quickly and (in terms of output token usage) efficiently, but the answers can be prone to hallucination. An answer may "look" correct (in terms of language and syntax) while being incorrect in terms of factuality and reasoning.

Chain-of-Thought prompting is a technique where you instruct the model to output intermediate reasoning steps, and it typically gets better results, especially when combined with few-shot examples. It is worth noting that this technique doesn't completely eliminate hallucinations, and that it tends to cost more to run, due to the increased token count.

As models like the Gemini family are trained to be "chatty" and provide reasoning steps, you can ask the model to be more direct in the prompt.

In [48]:
prompt = """When I was 4 years old, my partner was 3 times my age. Now, I
am 20 years old. How old is my partner? Return the answer directly."""

model = genai.GenerativeModel('gemini-1.5-flash-latest')
response = model.generate_content(prompt, request_options=retry_policy)

print(response.text)

47



Now try the same approach, but indicate to the model that it should "think step by step".

In [49]:
prompt = """When I was 4 years old, my partner was 3 times my age. Now,
I am 20 years old. How old is my partner? Let's think step by step."""

response = model.generate_content(prompt, request_options=retry_policy)
print(response.text)

Step 1: Find the partner's age when you were 4.

* Your partner was 3 times your age, so they were 3 * 4 = 12 years old.

Step 2: Find the age difference between you and your partner.

* The age difference is 12 - 4 = 8 years.

Step 3: Calculate your partner's current age.

* You are now 20 years old.
* Your partner is 8 years older than you, so they are 20 + 8 = 28 years old.

Therefore, your partner is now 28 years old.
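
The step-by-step answer (28) differs from the direct answer (47) returned earlier. The arithmetic is simple enough to check in code; `partner_age_now` is a hypothetical helper that mirrors the three steps in the response:

```python
def partner_age_now(my_age_then, ratio, my_age_now):
    """The age gap between two people is constant, so add it to the current age."""
    partner_age_then = my_age_then * ratio      # step 1: 4 * 3
    age_gap = partner_age_then - my_age_then    # step 2: 12 - 4
    return my_age_now + age_gap                 # step 3: 20 + 8


print(partner_age_now(my_age_then=4, ratio=3, my_age_now=20))  # 28
```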



### ReAct: Reason and act
In this example you will run a ReAct prompt directly in the Gemini API and perform the searching steps yourself. As this prompt follows a well-defined structure, there are frameworks available that wrap the prompt into easier-to-use APIs that make tool calls automatically, such as the LangChain example from the chapter.

To try this out with the Wikipedia search engine, check out the Searching Wikipedia with ReAct cookbook example.

Note: The prompt and in-context examples used here are from https://github.com/ysymyth/ReAct which is published under a MIT license, Copyright (c) 2023 Shunyu Yao.

In [50]:
model_instructions = """
Solve a question answering task with interleaving Thought, Action, Observation steps. Thought can reason about the current situation,
Observation is understanding relevant information from an Action's output and Action can be one of three types:
 (1) <search>entity</search>, which searches the exact entity on Wikipedia and returns the first paragraph if it exists. If not, it
     will return some similar entities to search and you can try to search the information from those topics.
 (2) <lookup>keyword</lookup>, which returns the next sentence containing keyword in the current context. This only does exact matches,
     so keep your searches short.
 (3) <finish>answer</finish>, which returns the answer and finishes the task.
"""

example1 = """Question
Musician and satirist Allie Goertz wrote a song about the "The Simpsons" character Milhouse, who Matt Groening named after who?

Thought 1
The question simplifies to "The Simpsons" character Milhouse is named after who. I only need to search Milhouse and find who it is named after.

Action 1
<search>Milhouse</search>

Observation 1
Milhouse Mussolini Van Houten is a recurring character in the Fox animated television series The Simpsons voiced by Pamela Hayden and created by Matt Groening.

Thought 2
The paragraph does not tell who Milhouse is named after, maybe I can look up "named after".

Action 2
<lookup>named after</lookup>

Observation 2
Milhouse was named after U.S. president Richard Nixon, whose middle name was Milhous.

Thought 3
Milhouse was named after U.S. president Richard Nixon, so the answer is Richard Nixon.

Action 3
<finish>Richard Nixon</finish>
"""

example2 = """Question
What is the elevation range for the area that the eastern sector of the Colorado orogeny extends into?

Thought 1
I need to search Colorado orogeny, find the area that the eastern sector of the Colorado orogeny extends into, then find the elevation range of the area.

Action 1
<search>Colorado orogeny</search>

Observation 1
The Colorado orogeny was an episode of mountain building (an orogeny) in Colorado and surrounding areas.

Thought 2
It does not mention the eastern sector. So I need to look up eastern sector.

Action 2
<lookup>eastern sector</lookup>

Observation 2
The eastern sector extends into the High Plains and is called the Central Plains orogeny.

Thought 3
The eastern sector of Colorado orogeny extends into the High Plains. So I need to search High Plains and find its elevation range.

Action 3
<search>High Plains</search>

Observation 3
High Plains refers to one of two distinct land regions

Thought 4
I need to instead search High Plains (United States).

Action 4
<search>High Plains (United States)</search>

Observation 4
The High Plains are a subregion of the Great Plains. From east to west, the High Plains rise in elevation from around 1,800 to 7,000 ft (550 to 2,130m).

Thought 5
High Plains rise in elevation from around 1,800 to 7,000 ft, so the answer is 1,800 to 7,000 ft.

Action 5
<finish>1,800 to 7,000 ft</finish>
"""

# Come up with more examples yourself, or take a look through https://github.com/ysymyth/ReAct/

To capture a single step at a time, while ignoring any hallucinated Observation steps, you will use stop_sequences to end the generation process. The steps are Thought, Action, Observation, in that order.

In [51]:
question = """Question
Who was the youngest author listed on the transformers NLP paper?
"""

model = genai.GenerativeModel('gemini-1.5-flash-latest')
react_chat = model.start_chat()

# You will perform the Action, so generate up to, but not including, the Observation.
config = genai.GenerationConfig(stop_sequences=["\nObservation"])

resp = react_chat.send_message(
    [model_instructions, example1, example2, question],
    generation_config=config,
    request_options=retry_policy)
print(resp.text)

Thought 1
I need to find the Transformers NLP paper and then find the authors' ages to determine the youngest.  This will require multiple steps. First, I need to find the paper.

Action 1
<search>Transformers NLP paper</search>
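
When the actions are performed by code rather than by hand, the first step is to extract the action tag from the model's output. A minimal sketch using the three tag names defined in `model_instructions`; `parse_action` is a hypothetical helper, and the regular expression assumes the tags are well formed:

```python
import re

# Matches <search>...</search>, <lookup>...</lookup> or <finish>...</finish>.
ACTION_RE = re.compile(r"<(search|lookup|finish)>(.*?)</\1>", re.DOTALL)


def parse_action(model_output: str):
    """Return (action_type, argument) for the first action tag, or None if there is none."""
    match = ACTION_RE.search(model_output)
    if match is None:
        return None
    return match.group(1), match.group(2).strip()


# Text shaped like the response above.
step = """Thought 1
I need to find the paper first.

Action 1
<search>Transformers NLP paper</search>
"""
print(parse_action(step))
```

A `finish` action signals that the loop should stop; any other action would be executed and its result sent back as the next Observation.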



Now you can perform this research yourself and supply it back to the model.

In [52]:
observation = """Observation 1
[1706.03762] Attention Is All You Need
Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, Illia Polosukhin
We propose a new simple network architecture, the Transformer, based solely on attention mechanisms, dispensing with recurrence and convolutions entirely.
"""
resp = react_chat.send_message(observation, generation_config=config, request_options=retry_policy)
print(resp.text)

Thought 2
The observation provides the authors of the paper "Attention is All You Need".  I don't have their ages, so I cannot answer the question directly.  I need to find another way to determine their ages.  Finding their ages directly might be difficult.  I'll try a different approach.


Action 2
<finish>I cannot answer this question. The provided text gives the authors of the paper, but does not provide their ages.</finish>



## Code prompting
### Generating code
The Gemini family of models can be used to generate code, configuration and scripts. Generating code can be helpful when learning to code, learning a new language or for rapidly generating a first draft.

Because LLMs can't reason and can repeat training data, it's essential to read and test generated code before using it, and to comply with any relevant licenses.

In [53]:
model = genai.GenerativeModel(
    'gemini-1.5-flash-latest',
    generation_config=genai.GenerationConfig(
        temperature=1,
        top_p=1,
        max_output_tokens=1024,
    ))

# Gemini 1.5 models are very chatty, so it helps to specify they stick to the code.
code_prompt = """
Write a Python function to calculate the factorial of a number. No explanation, provide only the code.
"""

response = model.generate_content(code_prompt, request_options=retry_policy)
Markdown(response.text)

```python
def factorial(n):
  if n == 0:
    return 1
  else:
    return n * factorial(n-1)
```
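
As a sketch of what testing generated code can look like, the function above can be compared against the standard library's `math.factorial` and probed with an edge case. The function body is copied from the generated output; the checks are additions for illustration:

```python
import math


def factorial(n):
    # Copied from the generated output above.
    if n == 0:
        return 1
    else:
        return n * factorial(n - 1)


# Compare against the standard library for a range of valid inputs.
for n in range(0, 11):
    assert factorial(n) == math.factorial(n), n

# A negative input never reaches the base case, so the recursion does not terminate.
try:
    factorial(-1)
except RecursionError:
    print("factorial(-1) hits the recursion limit; the generated code lacks input validation")
```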


#### Code execution
The Gemini API can automatically run generated code too, and will return the output.

In [54]:
model = genai.GenerativeModel(
    'gemini-1.5-flash-latest',
    tools='code_execution',)

code_exec_prompt = """
Calculate the sum of the first 14 prime numbers. Only consider the odd primes, and make sure you count them all.
"""

response = model.generate_content(code_exec_prompt, request_options=retry_policy)
Markdown(response.text)

To calculate the sum of the first 14 odd prime numbers, I need to first identify those primes.  Odd primes are prime numbers that are not equal to 2.  Let's use Python to generate these numbers.



``` python
def is_prime(n):
    """Checks if a number is prime."""
    if n <= 1:
        return False
    for i in range(2, int(n**0.5) + 1):
        if n % i == 0:
            return False
    return True

count = 0
number = 1
sum_of_primes = 0
while count < 14:
    number += 2 # Start from 3 and increment by 2 to only check odd numbers
    if is_prime(number):
        sum_of_primes += number
        count += 1

print(f"The sum of the first 14 odd prime numbers is: {sum_of_primes}")


```
```
The sum of the first 14 odd prime numbers is: 326

```
Therefore, the sum of the first 14 odd prime numbers is 326.
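
The reported total can be checked independently of the model's own code. A short sketch using trial division by previously found primes; `first_odd_primes` is a hypothetical helper:

```python
def first_odd_primes(count):
    """Return the first `count` odd primes using trial division by earlier odd primes."""
    primes = []
    candidate = 3
    while len(primes) < count:
        # Only odd candidates are tested, so dividing by 2 is unnecessary.
        if all(candidate % p for p in primes if p * p <= candidate):
            primes.append(candidate)
        candidate += 2
    return primes


primes = first_odd_primes(14)
print(primes)
print(sum(primes))  # 326
```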


### Explaining code
The Gemini family of models can explain code to you too.

In [55]:
file_contents = !curl https://raw.githubusercontent.com/magicmonty/bash-git-prompt/refs/heads/master/gitprompt.sh

explain_prompt = f"""
Please explain what this file does at a very high level. What is it, and why would I use it?

```
{file_contents}
```
"""

model = genai.GenerativeModel('gemini-1.5-flash-latest')

response = model.generate_content(explain_prompt, request_options=retry_policy)
Markdown(response.text)

This file is a bash script that enhances your shell prompt to display information about your current Git repository.  Think of it as a highly customizable Git status indicator integrated directly into your terminal prompt.

**What it does:**

At a high level, the script adds to your shell prompt:

* **Branch name:** Shows the current Git branch you're working on.
* **Status indicators:**  Indicates whether the repository has uncommitted changes, staged changes, conflicts, untracked files, etc., often using color-coding for visual clarity.
* **Upstream tracking:**  If your branch is tracking a remote branch, it shows how many commits you're ahead or behind.
* **Optional features:** It offers many options for customization, including themes, symbols, and the inclusion of username/repo information.

**Why you would use it:**

You'd use this script if you frequently work with Git and want a quick and visual way to see the status of your repositories without needing to run `git status` every time. It improves your workflow by providing essential Git information at a glance.  The customization allows it to be tailored to your preferences and terminal setup.
