<a href="https://colab.research.google.com/github/RDGopal/IB9LQ0-GenAI/blob/main/Google_LLMs.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Working with Google Gemini Models

Prerequisites: You need an API key from [Google AI Studio](https://aistudio.google.com/apikey). Everything can be done on the free tier.

Acknowledgement: [Patrickloeber](https://github.com/patrickloeber/workshop-build-with-gemini/blob/main/README.md)

# Setup

In [None]:
%pip install -q -U google-genai

In [None]:
from google.colab import userdata
GOOGLE_API_KEY = userdata.get('Google_API')

In [None]:
from google import genai
from google.genai import types

client = genai.Client(api_key=GOOGLE_API_KEY)

Models available: [Models](https://ai.google.dev/gemini-api/docs/models)

In [None]:
MODEL = "gemini-2.0-flash"

In [None]:
from IPython.display import Markdown

# Prompting


In [None]:
response = client.models.generate_content(
    model=MODEL,
    contents="Tell me three funny dad jokes"
)
display(Markdown(response.text))

## List of Prompts

In [None]:
response = client.models.generate_content(
    model=MODEL,
    contents=["Find the three best Chinese restaurants", "city=Birmingham"]
)
display(Markdown(response.text))

## Streaming response
By default, the model returns a response after completing the entire text generation process. You can achieve faster interactions by using streaming to return outputs as they are generated.

In [None]:
response = client.models.generate_content_stream(
    model=MODEL,
    contents=["Explain how Variational Autoencoders Work"]
)

for chunk in response:
    display(Markdown(chunk.text))
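
Rendering each chunk as its own Markdown element can split formatting across chunks. One option is to join the pieces and render once at the end. The following is a minimal sketch, assuming each streamed chunk exposes a `text` attribute that may be `None`. The helper `collect_stream` and the stand-in chunks are hypothetical, used so the example runs without an API call.

```python
from types import SimpleNamespace
from typing import Iterable


def collect_stream(chunks: Iterable) -> str:
    """Join the text of streamed chunks into one string, skipping empty chunks."""
    pieces = []
    for chunk in chunks:
        text = getattr(chunk, "text", None)
        if text:  # assumption: some chunks may carry no text
            pieces.append(text)
    return "".join(pieces)


# Stand-in chunks that mimic the shape of streamed responses (illustrative only).
fake_chunks = [
    SimpleNamespace(text="A VAE has "),
    SimpleNamespace(text=None),
    SimpleNamespace(text="an encoder and a decoder."),
]
full_text = collect_stream(fake_chunks)
print(full_text)
```

With the real API, passing the iterator returned by `client.models.generate_content_stream(...)` to `collect_stream` and calling `display(Markdown(...))` once on the result should behave the same way.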

## Parameters

Every prompt you send to the model includes parameters that control how the model generates responses. You can configure these parameters, or let the model use the default options.

* `max_output_tokens`: Sets the maximum number of tokens to include in a candidate.
* `temperature`: Controls the randomness of the output. Use higher values for more creative responses, and lower values for more deterministic responses. Values range from 0.0 to 2.0.
* `top_p`: Changes how the model selects tokens for output. Tokens are selected from the most to least probable until the sum of their probabilities equals the top_p value.
* `top_k`: Changes how the model selects tokens for output. A top_k of 1 means the selected token is the most probable among all the tokens in the model's vocabulary, while a top_k of 3 means that the next token is selected from among the 3 most probable using the temperature. Tokens are further filtered based on top_p with the final token selected using temperature sampling.
* `stop_sequences`: List of strings (up to 5) that tells the model to stop generating text if one of the strings is encountered in the response. If specified, the API will stop at the first appearance of a stop sequence.
* `seed`: If specified, the model makes a best effort to provide the same response for repeated requests. By default, a random number is used.
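
To make the interaction between `top_k`, `top_p`, `temperature`, and `seed` concrete, here is a toy sketch in plain Python. It is not Gemini's decoder: the function `sample_token`, the score dictionary, and the exact order of operations (top_k, then top_p, then temperature) are assumptions chosen to mirror the descriptions above.

```python
import math
import random
from typing import Optional


def sample_token(
    logits: dict,
    temperature: float = 1.0,
    top_k: int = 40,
    top_p: float = 0.95,
    seed: Optional[int] = None,
) -> str:
    """Pick one token using top_k, then top_p, then temperature sampling."""
    rng = random.Random(seed)

    # Softmax over the raw scores (subtract the max for numerical stability).
    peak = max(logits.values())
    exps = {tok: math.exp(score - peak) for tok, score in logits.items()}
    total = sum(exps.values())
    probs = {tok: e / total for tok, e in exps.items()}

    # top_k: keep only the k most probable tokens.
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)[:top_k]

    # top_p: walk from most to least probable until the running sum reaches top_p.
    kept = []
    cumulative = 0.0
    for tok, p in ranked:
        kept.append((tok, p))
        cumulative += p
        if cumulative >= top_p:
            break

    # Temperature 0 is treated here as greedy decoding.
    if temperature <= 0:
        return kept[0][0]

    # Temperature sampling: p ** (1 / T) sharpens (T < 1) or flattens (T > 1).
    tokens = [tok for tok, _ in kept]
    weights = [p ** (1.0 / temperature) for _, p in kept]
    return rng.choices(tokens, weights=weights, k=1)[0]


# Made-up scores for a four-token vocabulary.
toy_logits = {"cat": 2.0, "dog": 1.0, "fish": 0.1, "bird": -1.0}
print(sample_token(toy_logits, top_k=1))          # always the most probable token
print(sample_token(toy_logits, top_k=3, seed=7))  # repeatable because of the seed
```

In this sketch, a fixed `seed` makes repeated calls return the same token, which is the behavior the `seed` parameter aims for on a best-effort basis.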

In [None]:
response = client.models.generate_content(
    model=MODEL,
    contents=["Explain Gaussian Splatting"],
    config=types.GenerateContentConfig(
        max_output_tokens=100,
        temperature=1.0,
        top_p=0.95,
        top_k=40,
        stop_sequences=None,
        seed=1234,
    )
)
display(Markdown(response.text))

## Long context and token counting

Gemini 2.0 Flash and 2.5 Pro have a 1M token context window.

In practice, 1 million tokens could look like:

* 50,000 lines of code (with the standard 80 characters per line)
* All the text messages you have sent in the last 5 years
* 8 average-length English novels
* 1 hour of video data

Let's feed in an entire book and ask questions:

In [None]:
import requests
res = requests.get("https://gutenberg.org/cache/epub/16317/pg16317.txt")
book = res.text

In [None]:
display(Markdown(book[:300]))

In [None]:
display(Markdown(f"# characters {len(book)}"))
display(Markdown(f"# words {len(book.split())}"))
display(Markdown(f"# tokens: ~{int(len(book.split()) * 4/3)}"))
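
The estimate above uses about 4/3 tokens per word. The "50,000 lines of code at 80 characters per line" figure points to a second rule of thumb of roughly 4 characters per token. Both are approximations, not the tokenizer's real output. A small sketch comparing the two (the helper names and the default of 4 characters per token are assumptions for illustration):

```python
def estimate_tokens_from_words(text: str) -> int:
    """Rough estimate: about 4/3 tokens per whitespace-separated word."""
    return int(len(text.split()) * 4 / 3)


def estimate_tokens_from_chars(text: str, chars_per_token: float = 4.0) -> int:
    """Rough estimate: about 4 characters per token (assumed rule of thumb)."""
    return int(len(text) / chars_per_token)


sample = "The quick brown fox jumps over the lazy dog."
print(estimate_tokens_from_words(sample))   # 9 words  -> 12
print(estimate_tokens_from_chars(sample))   # 44 chars -> 11

# 50,000 lines of 80 characters each is 4,000,000 characters, or about 1M tokens.
code_like_text = "x" * (80 * 50_000)
print(estimate_tokens_from_chars(code_like_text))  # 1000000
```

For an exact figure, `count_tokens` and `usage_metadata` report the tokenizer's real counts.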


In [None]:
prompt = f"""Summarize the book.

Book:
{book}
"""

response = client.models.generate_content(
    model=MODEL,
    contents=prompt
)

display(Markdown(response.text))

To understand the token usage, you can check `usage_metadata`:

In [None]:
display(Markdown(f"{response.usage_metadata.candidates_token_count}"))  # output
display(Markdown(f"{response.usage_metadata.prompt_token_count}"))   # input
display(Markdown(f"{response.usage_metadata.total_token_count}"))   # total

You can also use `count_tokens` to check the size of your input prompt(s):

In [None]:
res = client.models.count_tokens(model=MODEL, contents=prompt)
display(Markdown(f"{res}"))

## Chat with a book!


* Create a chat
* Use a system prompt: "You are an expert book reviewer with a witty tone."
* Use a temperature of 1.5
* Ask it to summarize the book
* Ask one question that explores a topic from the book in more detail
* Ask it to create a social media post based on the book
* Print the total number of tokens used during the chat

In [None]:
chat = client.chats.create(
    model=MODEL,
    config=types.GenerateContentConfig(
        system_instruction="You are an expert book reviewer with a smart and funny tone.",
        temperature=1.5
    )
)

prompt = f"""Summarize the book in 10 bullet points.

Book:
{book}
"""

response = chat.send_message(prompt)
display(Markdown(response.text))

In [None]:
response = chat.send_message("Create a LinkedIn post with 1 or 2 key insights from the book. Keep the tone casual and make it inspirational.")
display(Markdown(response.text))
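
The cells above do not yet print the total number of tokens used during the chat. One way to do it is to keep every response returned by `chat.send_message` in a list and sum the `usage_metadata.total_token_count` values. The sketch below uses a hypothetical helper, `total_tokens_used`, and made-up stand-in responses so that it runs without an API call. Because a chat likely re-sends the conversation history on every turn, each turn's count probably includes the earlier messages.

```python
from types import SimpleNamespace


def total_tokens_used(responses) -> int:
    """Sum total_token_count over responses, ignoring ones without usage data."""
    total = 0
    for response in responses:
        usage = getattr(response, "usage_metadata", None)
        count = getattr(usage, "total_token_count", None)
        if count:
            total += count
    return total


# Stand-in responses with invented token counts (illustrative only).
fake_responses = [
    SimpleNamespace(usage_metadata=SimpleNamespace(total_token_count=1200)),
    SimpleNamespace(usage_metadata=SimpleNamespace(total_token_count=1350)),
    SimpleNamespace(usage_metadata=None),
]
print(total_tokens_used(fake_responses))  # 2550
```

In the notebook, you would append each `response` from `chat.send_message(...)` to a list and pass that list to the helper.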

# Multimodality



## Image understanding
Gemini models are able to process and understand images. You can prompt Gemini to describe, caption, and answer questions about images, and use it for object detection.

In [None]:
!curl -o image.jpg "https://storage.googleapis.com/generativeai-downloads/images/Cupcakes.jpg"

In [None]:
from PIL import Image
image = Image.open("image.jpg")
print(image.size)
image

In [None]:
response = client.models.generate_content(
    model=MODEL,
    contents=["What is this image?", image])

display(Markdown(response.text))

### Your turn
* Tell Gemini to describe the image.
* Then ask Gemini for a recipe to bake this item. Include item names and quantities for the recipe.

In [None]:
!curl -o croissant.jpg "https://storage.googleapis.com/generativeai-downloads/images/croissant.jpg"
image2 = Image.open("croissant.jpg")
image2

## Video
Gemini models are able to process videos. The 1M context window supports up to approximately an hour of video data.

The Gemini API and AI Studio support YouTube URLs as a file data Part. You can include a YouTube URL with a prompt asking the model to summarize, translate, or otherwise interact with the video content.

In [None]:
youtube_url = "https://youtu.be/LlWDx0LSDok"

response = client.models.generate_content(
    model=MODEL,
    contents=types.Content(
        parts=[
            types.Part(text='Can you summarize this video?'),
            types.Part(
                file_data=types.FileData(file_uri=youtube_url)
            )
        ]
    )
)

display(Markdown(response.text))

## Audio

You can use Gemini to process audio files. For example, you can use it to generate a transcript of an audio file or to summarize the content of an audio file.

Gemini represents each second of audio as 32 tokens; for example, one minute of audio is represented as 1,920 tokens.
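
The 32-tokens-per-second figure makes it easy to estimate how many tokens a clip will consume before uploading it. A minimal sketch (the helper name `audio_tokens` is hypothetical):

```python
AUDIO_TOKENS_PER_SECOND = 32  # figure stated above


def audio_tokens(duration_seconds: float) -> int:
    """Estimate the number of tokens used by an audio clip of the given length."""
    return int(duration_seconds * AUDIO_TOKENS_PER_SECOND)


print(audio_tokens(60))    # one minute -> 1920
print(audio_tokens(3600))  # one hour   -> 115200
```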

In [None]:
URL = "https://storage.googleapis.com/generativeai-downloads/data/jeff-dean-presentation.mp3"
!wget -q $URL -O sample.mp3

In [None]:
import IPython
IPython.display.Audio("sample.mp3")

In [None]:
uploaded_audio_file = client.files.upload(file='sample.mp3')

response = client.models.generate_content(
  model=MODEL,
  contents=[
    'Here is a talk by Jeff Dean. Summarize the talk in 4-5 sentences.',
    uploaded_audio_file,
  ]
)

display(Markdown(response.text))

## Structured output
Gemini generates unstructured text by default, but some applications require structured text. For these use cases, you can constrain Gemini to respond with JSON, a structured data format suitable for automated processing.

In [None]:
from pydantic import BaseModel

class Recipe(BaseModel):
  recipe_name: str
  ingredients: list[str]

response = client.models.generate_content(
    model=MODEL,
    contents='List three popular cookie recipes. Be sure to include the amounts of ingredients.',
    config={
        'response_mime_type': 'application/json',
        'response_schema': list[Recipe],
    },
)
# Use the response as a JSON string.
print(response.text)
# Use instantiated objects.
my_recipes: list[Recipe] = response.parsed
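
Because `response.text` is a JSON string, it can also be parsed with the standard library when Pydantic objects are not needed. The sketch below is one possible approach. The `RecipeRecord` dataclass, the `parse_recipes` helper, and the sample JSON string are made up for illustration and stand in for a real model response.

```python
import json
from dataclasses import dataclass


@dataclass
class RecipeRecord:
    recipe_name: str
    ingredients: list


def parse_recipes(json_text: str) -> list:
    """Parse a JSON array of recipes into RecipeRecord objects."""
    items = json.loads(json_text)
    if not isinstance(items, list):
        raise ValueError("expected a JSON array of recipes")
    return [
        RecipeRecord(
            recipe_name=item["recipe_name"],
            ingredients=list(item["ingredients"]),
        )
        for item in items
    ]


# Invented sample in the shape requested by the schema (not real model output).
sample_text = (
    '[{"recipe_name": "Sugar Cookies", '
    '"ingredients": ["2 cups flour", "1 cup sugar"]}]'
)
recipes = parse_recipes(sample_text)
print(recipes[0].recipe_name, len(recipes[0].ingredients))
```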

## Grounding with Google Search
If Google Search is configured as a tool, Gemini can decide when to use Google Search to improve the accuracy and recency of responses.

In [None]:
from google.genai.types import Tool, GenerateContentConfig, GoogleSearch

google_search_tool = Tool(
    google_search = GoogleSearch()
)

response = client.models.generate_content(
    model=MODEL,
    contents="What is the big news item today?",
    config=GenerateContentConfig(
        tools=[google_search_tool],
        response_modalities=["TEXT"],
    )
)

In [None]:
for part in response.candidates[0].content.parts:
    display(Markdown(part.text))

## Function Calling
Function calling lets you connect models to external tools and APIs. Instead of generating text responses, the model understands when to call specific functions and provides the necessary parameters to execute real-world actions.

In [None]:
from google.genai import types

# Define the Python function that the model can ask us to call
def add_numbers(a, b):
  """Adds two numbers together."""
  return a + b

add_numbers_function = {
    "name": "add_numbers",
    "description": "Adds two numbers together.",
    "parameters": {
        "type": "object",
        "properties": {
            "a": {
                "type": "number",
                "description": "The first number",
            },
            "b": {
                "type": "number",
                "description": "The second number",
            },
        },
        "required": ["a", "b"],
    },
}

# Configure the client and tools
tools = types.Tool(function_declarations=[add_numbers_function])

# Send request with function declarations
response = client.models.generate_content(
    model=MODEL,
    contents="What is the sum of 300 and 5?",
    config=types.GenerateContentConfig(tools=[tools])
)

# Check if the response contains a function call
if response.candidates[0].content.parts[0].function_call:
    function_call = response.candidates[0].content.parts[0].function_call
    print(f"Function to call: {function_call.name}")
    print(f"Arguments: {function_call.args}")

    # Call the function with the provided arguments
    result = add_numbers(**function_call.args)  # Unpack arguments
    print(f"Result: {result}")
else:
    print("No function call found in the response.")
    print(response.text)

In [None]:
import pandas as pd
from google.genai import types
import io  # lets pandas read the CSV string as a file-like object

# Define a function to analyze data
def analyze_data(data: str, column: str, operation: str) -> dict:
  """Analyzes the provided data using pandas."""

  # Convert the data string to a DataFrame using io.StringIO
  df = pd.read_csv(io.StringIO(data))

  # Perform the requested operation
  if operation == "mean":
    result = df[column].mean()
  elif operation == "median":
    result = df[column].median()
  elif operation == "sum":
    result = df[column].sum()
  else:
    raise ValueError(f"Invalid operation: {operation}")

  return {"result": result}

# Create a function declaration
analyze_data_function = {
    "name": "analyze_data",
    "description": "Analyzes data using pandas. Accepts a CSV string, column name, and operation (mean, median, sum).",
    "parameters": {
        "type": "object",
        "properties": {
            "data": {
                "type": "string",
                "description": "The data to analyze in CSV format",
            },
            "column": {
                "type": "string",
                "description": "The name of the column to analyze",
            },
            "operation": {
                "type": "string",
                "description": "The operation to perform (mean, median, sum)",
            },
        },
        "required": ["data", "column", "operation"],
    },
}

# Configure the tools
tools = types.Tool(function_declarations=[analyze_data_function])

# Sample data
data = """
col1,col2,col3,col4
1,2,3,4
5,6,7,8
9,10,11,12
1,5,9,8
"""

# Send the request
response = client.models.generate_content(
    model=MODEL,
    contents=f"Calculate the median of the 'col3' column in this data:\n\n{data}",
    config=types.GenerateContentConfig(tools=[tools])
)

# Check for function call and process the response
if response.candidates[0].content.parts[0].function_call:
  function_call = response.candidates[0].content.parts[0].function_call
  print(f"Function to call: {function_call.name}")
  print(f"Arguments: {function_call.args}")

  result = analyze_data(**function_call.args)
  print(f"Result: {result}")
else:
  print("No function call found in the response.")
  print(response.text)
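
Both cells above inspect only the first response part and call one known function directly. When several tools are declared, a name-to-callable registry keeps the dispatch in one place. The sketch below is one way to do that. The helpers `find_function_call` and `dispatch_function_call`, the second tool `multiply_numbers`, and the stand-in objects are hypothetical and only mimic the `name` and `args` attributes used above.

```python
from types import SimpleNamespace


def add_numbers(a, b):
    """Adds two numbers together."""
    return a + b


def multiply_numbers(a, b):
    """Hypothetical second tool, included only to show dispatch by name."""
    return a * b


TOOL_REGISTRY = {
    "add_numbers": add_numbers,
    "multiply_numbers": multiply_numbers,
}


def find_function_call(parts):
    """Return the first function call found in a list of response parts, or None."""
    for part in parts:
        call = getattr(part, "function_call", None)
        if call:
            return call
    return None


def dispatch_function_call(function_call, registry=TOOL_REGISTRY):
    """Look up the requested function by name and call it with the given arguments."""
    func = registry.get(function_call.name)
    if func is None:
        raise KeyError(f"Unknown function requested: {function_call.name}")
    return func(**dict(function_call.args or {}))


# Stand-in parts: a text-only part followed by a part carrying a function call.
fake_parts = [
    SimpleNamespace(function_call=None, text="Let me compute that."),
    SimpleNamespace(
        function_call=SimpleNamespace(name="add_numbers", args={"a": 300, "b": 5}),
        text=None,
    ),
]
call = find_function_call(fake_parts)
print(dispatch_function_call(call))  # 305
```

With a real response, `response.candidates[0].content.parts` would take the place of `fake_parts`.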