# Gemini for Text Generation


In [None]:
# !pip install -q -U google-genai

In [1]:
from google import genai
import os

gemini_api_key = os.getenv("GEMINI_API_KEY")

client = genai.Client(api_key=gemini_api_key)

response = client.models.generate_content(
    model="gemini-2.0-flash", contents="Explain how AI works in a few words"
)
print(response.text)

AI learns from data to make predictions or decisions.
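`os.getenv` returns `None` when the variable is unset. A minimal sketch of checking for that up front with an explicit error message; `require_api_key` is a hypothetical helper introduced here for illustration, not part of the SDK:

```python
import os

def require_api_key(name="GEMINI_API_KEY", env=None):
    """Return the API key stored under `name`, or raise a clear error."""
    # `env` defaults to the real environment; a plain dict can be passed for testing
    env = os.environ if env is None else env
    key = env.get(name)
    if not key:
        raise RuntimeError(f"Set the {name} environment variable before creating the client.")
    return key

# Intended use in this notebook (not executed here):
# client = genai.Client(api_key=require_api_key())
```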



In [2]:
from IPython.display import Markdown

response = client.models.generate_content(
    model="gemini-2.0-flash", contents="Explain how AI works in 200 words."
)

Markdown(response.text)

AI, or Artificial Intelligence, broadly encompasses computer systems mimicking human intelligence. Currently, most AI relies on **machine learning**, particularly **deep learning**.

Machine learning involves feeding algorithms massive datasets. The algorithm then identifies patterns and relationships within this data, allowing it to make predictions or decisions on new, unseen data.

Deep learning uses artificial neural networks with multiple layers to analyze data in a hierarchical way, similar to how the human brain processes information. These networks learn complex features from raw data like images, text, or sound.

For example, to recognize cats in images, a deep learning model is trained on millions of cat pictures. It learns to identify edges, textures, and shapes that define a cat, allowing it to accurately identify cats in new images.

While impressive, AI is not truly "thinking" like humans. It's sophisticated pattern recognition and application, reliant on the quality and quantity of data it's trained on. Different AI approaches exist beyond machine learning, but this is the dominant method today.


# Gemini for Image Understanding

<img src="./assets-resources/sample-image.png" width=20%>

In [3]:
from google import genai

client = genai.Client(api_key=gemini_api_key)

my_file = client.files.upload(file="./assets-resources/sample-image.png")

response = client.models.generate_content(
    model="gemini-2.0-flash",
    contents=[my_file, "Caption this image."],
)

print(response.text)

Here are the bounding box detections:
```json
[
  {"box_2d": [563, 276, 579, 312], "label": "Table"},
  {"box_2d": [896, 237, 942, 937], "label": "(illustrated by tokens of different colors in the input sequence). They can output responses with\ninterleaved image and text."},
  {"box_2d": [725, 792, 780, 839], "label": "Aa"},
  {"box_2d": [877, 236, 896, 292], "label": "Figure"},
  {"box_2d": [877, 301, 896, 309], "label": "2"},
  {"box_2d": [616, 289, 637, 310], "label": "Aa"},
  {"box_2d": [580, 276, 595, 325], "label": "Sequence"},
  {"box_2d": [664, 528, 782, 630], "label": "Transformer"},
  {"box_2d": [877, 315, 896, 916], "label": "Gemini models support interleaved sequences of text, image, audio, and video as inputs"},
  {"box_2d": [343, 58, 382, 173], "label": "arrow"},
  {"box_2d": [746, 719, 758, 757], "label": "Decoder"}
]
```
The image presents a diagram of the Gemini 1.0 model architecture. It illustrates how the model is structured to handle various types of input, includ
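The `box_2d` values in the output above can be mapped back onto the image. The sketch below assumes the convention described in the Gemini image understanding docs, `[ymin, xmin, ymax, xmax]` normalized to a 0-1000 scale; the image size used here is hypothetical, and `box_to_pixels` is an illustrative helper rather than an SDK function.

```python
def box_to_pixels(box_2d, image_width, image_height):
    """Convert [ymin, xmin, ymax, xmax] on a 0-1000 scale to pixel (left, top, right, bottom)."""
    ymin, xmin, ymax, xmax = box_2d
    left = round(xmin * image_width / 1000)
    top = round(ymin * image_height / 1000)
    right = round(xmax * image_width / 1000)
    bottom = round(ymax * image_height / 1000)
    return left, top, right, bottom

# One detection copied from the output above
detection = {"box_2d": [664, 528, 782, 630], "label": "Transformer"}

# Hypothetical image size of 2000x1000 pixels (width x height)
print(detection["label"], box_to_pixels(detection["box_2d"], 2000, 1000))
```

The returned tuple uses the (left, top, right, bottom) order that most drawing and cropping utilities expect.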

## To pass image data inline

In [4]:
from google.genai import types

with open('./assets-resources/sample-image.png', 'rb') as f:
    image_bytes = f.read()

response = client.models.generate_content(
    model='gemini-2.5-flash-preview-04-17',
    contents=[
        types.Part.from_bytes(
            data=image_bytes,
            mime_type='image/png',
        ),
        'What is the diagram in this picture? Answer it concisely.',
    ],
)

print(response.text)

Figure 2 illustrates how Gemini models support interleaved sequences of text, image, audio, and video as inputs and can output interleaved image and text responses.
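The cell above hard-codes `mime_type='image/png'`. When the file type varies, the MIME type can be derived from the file name instead. A minimal sketch using the standard library; `guess_image_mime_type` is a hypothetical helper, and it only inspects the extension, not the file contents.

```python
import mimetypes

def guess_image_mime_type(path):
    """Guess an image MIME type from the file extension, or raise if it is not an image."""
    mime_type, _ = mimetypes.guess_type(path)
    if mime_type is None or not mime_type.startswith("image/"):
        raise ValueError(f"Cannot determine an image MIME type for {path!r}")
    return mime_type

print(guess_image_mime_type("./assets-resources/sample-image.png"))

# Intended use in this notebook (not executed here):
# types.Part.from_bytes(data=image_bytes, mime_type=guess_image_mime_type(path))
```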


See more examples for working with images in Gemini [here](https://ai.google.dev/gemini-api/docs/image-understanding).

# Document Understanding

In [5]:
from google import genai
from google.genai import types
import httpx

client = genai.Client(api_key=gemini_api_key)

doc_url = "https://discovery.ucl.ac.uk/id/eprint/10089234/1/343019_3_art_0_py4t4l_convrt.pdf"

# Download the PDF as raw bytes
doc_data = httpx.get(doc_url).content

prompt = "Summarize this document in bullet points"
response = client.models.generate_content(
  model="gemini-2.5-flash-preview-04-17",
  contents=[
      types.Part.from_bytes(
        data=doc_data,
        mime_type='application/pdf',
      ),
      prompt])
print(response.text)

Here is a bullet-point summary of the document:

*   **Problem:** Predicting a protein's 3D structure from its amino acid sequence is crucial for understanding function but experimentally difficult and time-consuming.
*   **Background:** Previous methods used genetic covariation analysis to infer residue contacts, aiding structure prediction. Fragment assembly methods used statistical potentials and sampling.
*   **AlphaFold's Novel Approach:**
    *   Trains a deep neural network to predict accurate *distances* between pairs of residues (specifically, the Cβ atoms). Distance predictions provide richer structural information than binary contact predictions.
    *   Constructs a protein-specific potential of mean force based on these predicted distance distributions.
    *   Optimizes this potential using a simple gradient descent algorithm (L-BFGS) to realize protein structures, avoiding complex sampling procedures.
    *   Predicts full chains without requiring explicit domain segment
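`httpx.get(doc_url).content` returns whatever the server sent back, which could be an HTML error page rather than a PDF. A small sketch of a sanity check before passing the bytes to the model; `looks_like_pdf` is a hypothetical helper that only checks the standard `%PDF-` header marker.

```python
def looks_like_pdf(data: bytes) -> bool:
    """Return True if the payload starts with the PDF header marker."""
    return data.startswith(b"%PDF-")

# Intended use in this notebook (not executed here):
# http_response = httpx.get(doc_url)
# http_response.raise_for_status()  # raise on 4xx/5xx responses
# doc_data = http_response.content
# if not looks_like_pdf(doc_data):
#     raise ValueError("Downloaded content does not look like a PDF")

print(looks_like_pdf(b"%PDF-1.7 example"))
print(looks_like_pdf(b"<html>Not found</html>"))
```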

## For locally stored PDFs

In [10]:
from google import genai
from google.genai import types
import pathlib
import httpx

client = genai.Client(api_key=gemini_api_key)

# Uncomment to download a PDF and save it locally.
# Note: the URL below is the paper from the previous cell; replace it with the URL of the guide you want to summarize.
# doc_url = "https://discovery.ucl.ac.uk/id/eprint/10089234/1/343019_3_art_0_py4t4l_convrt.pdf"
# filepath = pathlib.Path('./assets-resources/prompt-eng-guide-google.pdf')
# filepath.write_bytes(httpx.get(doc_url).content)

# This example assumes the PDF is already stored locally;
# its bytes are read from disk when the request is built
filepath = pathlib.Path('./assets-resources/prompt-eng-guide-google.pdf')

prompt = "Write markdown style 3 page report on this prompt engineering guide with just the practical tips and relevant information."
response = client.models.generate_content(
  model="gemini-2.5-pro-preview-03-25",
  contents=[
      types.Part.from_bytes(
        data=filepath.read_bytes(),
        mime_type='application/pdf',
      ),
      prompt])
print(response.text)

Okay, here is a 3-page markdown report summarizing the practical tips and relevant information from the Google Prompt Engineering guide.

---

# Prompt Engineering Guide: Practical Summary (Page 1/3)

## 1. Introduction to Prompt Engineering

*   **Core Idea:** Prompt engineering is the iterative process of designing effective inputs (prompts) to guide Large Language Models (LLMs) toward desired outputs. It's essential because LLMs are prediction engines, and the prompt sets the context for that prediction.
*   **Accessibility:** You don't need to be a data scientist; anyone can write prompts, but crafting *effective* ones takes practice and iteration.
*   **Goal:** To create prompts that are clear, specific, and provide sufficient context, leading to accurate, relevant, and useful LLM responses. Inadequate prompts cause ambiguity and poor results.
*   **Scope:** This guide focuses on prompting models like Gemini directly (via API or tools like Vertex AI Studio) where configuration is 
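Sending the PDF inline works for small files. For larger ones, the upload call used earlier for the image (`client.files.upload`) is the usual alternative. The sketch below assumes an inline request limit of about 20 MB, which should be verified against the current Gemini docs; `should_use_files_api` is a hypothetical helper.

```python
# Assumed inline request limit of 20 MB; this is an assumption, check the current docs
INLINE_LIMIT_BYTES = 20 * 1024 * 1024

def should_use_files_api(size_bytes, limit=INLINE_LIMIT_BYTES):
    """Return True when a payload is too large to send inline under the assumed limit."""
    return size_bytes > limit

# Intended use in this notebook (not executed here):
# if should_use_files_api(filepath.stat().st_size):
#     uploaded = client.files.upload(file=filepath)
#     contents = [uploaded, prompt]
# else:
#     contents = [types.Part.from_bytes(data=filepath.read_bytes(), mime_type='application/pdf'), prompt]

print(should_use_files_api(5 * 1024 * 1024))
print(should_use_files_api(50 * 1024 * 1024))
```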

In [6]:
# Live coding session

# Question: can Gemini 2.5 Flash pull every table, in whatever format, out of a PDF of around 100 pages? Or is there a more efficient approach, model, or set of steps to recommend? Some tables have ruled lines, others do not.

<img src="./table1-paper.png" width=50%>

In [8]:
import pathlib
pdf_path = "./paper.pdf"

filepath = pathlib.Path(pdf_path)

prompt = "Extract table 1 from this paper. Your output should be just the table. Make it markdown style."
response = client.models.generate_content(
  model="gemini-2.5-flash-preview-04-17",
  contents=[
      types.Part.from_bytes(
        data=filepath.read_bytes(),
        mime_type='application/pdf',
      ),
      prompt])
print(response.text)

```markdown
| LLM Type   | Model              | #Size | Form          | Ver.   | Creator               |
| :--------- | :----------------- | :---- | :------------ | :----- | :-------------------- |
| API        | gpt-4              | N/A   | api           | 0613   | OpenAI                |
| API        | gpt-3.5-turbo      | N/A   | api           | 0613   | OpenAI                |
| API        | text-davinci-003   | N/A   | api           |        | OpenAI                |
| API        | text-davinci-002   | N/A   | api           |        | OpenAI                |
| API        | claude-2           | N/A   | api           |        | Anthropic             |
| API        | claude             | N/A   | api           | v1.3   | Anthropic             |
| API        | claude-instant     | N/A   | api           | v1.1   | Anthropic             |
| API        | chat-bison-001     | N/A   | api           |        | Google                |
| OSS        | codellama-34b      | 34B   | open instruct 
```
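A markdown table returned as text, like the one above, can be turned into Python rows without another model call. This is a minimal sketch, assuming a simple pipe-delimited table with a header row and a separator row; it does not handle escaped pipes inside cells, and `parse_markdown_table` is a hypothetical helper.

```python
def parse_markdown_table(text):
    """Parse a pipe-delimited markdown table into a list of row dictionaries."""
    lines = [line.strip() for line in text.strip().splitlines()]
    # Keep only table rows; this also drops any code-fence lines around the table
    rows = [line for line in lines if line.startswith("|")]
    cells = []
    for row in rows:
        # Remove the outer pipes, then split the remaining text into cells
        inner = row[1:-1] if row.endswith("|") else row[1:]
        cells.append([cell.strip() for cell in inner.split("|")])
    if len(cells) < 2:
        return []
    # cells[0] is the header and cells[1] is the alignment/separator row
    header, body = cells[0], cells[2:]
    return [dict(zip(header, row)) for row in body]

# Small stand-in for response.text
sample = """
| Model    | Creator   |
| :------- | :-------- |
| gpt-4    | OpenAI    |
| claude-2 | Anthropic |
"""
rows = parse_markdown_table(sample)
print(rows)
```

The list of dictionaries can be passed to `pd.DataFrame(rows)` if a DataFrame is needed.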

In [9]:
from google.genai import types

pdf_figure_image_path = "./table1-paper.png"

with open(pdf_figure_image_path, 'rb') as f:
    image_bytes = f.read()

response = client.models.generate_content(
    model='gemini-2.5-flash-preview-04-17',
    contents=[
        types.Part.from_bytes(
            data=image_bytes,
            mime_type='image/png',
        ),
        'Extract the full table in this example as a dictionary.',
    ],
)

print(response.text)

```json
{
 "Table 1: AGENTBENCH evaluates 27 API-based or OSS LLMs on LLM-as-Agent challenges.": {
  "Model": [
   "gpt-4 (OpenAI, 2023)",
   "gpt-3.5-turbo (OpenAI, 2022)",
   "text-davinci-003 (Ouyang et al., 2022)",
   "text-davinci-002 (Ouyang et al., 2022)",
   "claude-2 (Anthropic, 2023b)",
   "claude (Anthropic, 2023a)",
   "claude-instant (Anthropic, 2023a)",
   "chat-bison-001 (Anil et al., 2023)",
   "chatglm-6b (Zeng et al., 2022; Du et al., 2022)",
   "codegeex2-6b (Zheng et al., 2023)",
   "codellama-34b (Rozière et al., 2023)",
   "codellama-13b (Rozière et al., 2023)",
   "codellama-7b (Rozière et al., 2023)",
   "dolly-12b (Conover et al., 2023)",
   "llama2-70b (Touvron et al., 2023)",
   "llama2-13b (Touvron et al., 2023)",
   "llama2-7b (Touvron et al., 2023)",
   "guanaco-65b (Dettmers et al., 2023)",
   "guanaco-33b (Dettmers et al., 2023)",
   "vicuna-33b (Chiang et al., 2023)",
   "vicuna-13b (Chiang et al., 2023)",
   "vicuna-7b (Chiang et al., 2023)",
   "openc
```

In [13]:
import pandas as pd
import json

def extract_table_to_dataframe(json_str):
    """
    Convert the JSON table output from Gemini into a pandas DataFrame.
    
    Args:
        json_str (str): JSON string containing the table data, expected to be wrapped in ```json
        
    Returns:
        pd.DataFrame: DataFrame containing the table data
    """
    # Clean the string by removing ```json and ``` if present
    cleaned_str = json_str.strip()
    if cleaned_str.startswith('```json'):
        cleaned_str = cleaned_str[7:]
    if cleaned_str.endswith('```'):
        cleaned_str = cleaned_str[:-3]
    cleaned_str = cleaned_str.strip()
    
    # Parse the JSON string into a Python dictionary
    data = json.loads(cleaned_str)
    
    # Convert to DataFrame
    df = pd.DataFrame(data)
    
    return df

# Example usage:
df = extract_table_to_dataframe(response.text)
df

Unnamed: 0,Table 1: AGENTBENCH evaluates 27 API-based or OSS LLMs on LLM-as-Agent challenges.
Model,"[gpt-4 (OpenAI, 2023), gpt-3.5-turbo (OpenAI, ..."
#Size,"[N/A, N/A, N/A, N/A, N/A, N/A, N/A, N/A, 6B, 6..."
Form,"[api, api, api, api, api, api, api, api, open,..."
Ver.,"[0613, 0613, -, -, -, v1.3, v1.1, -, v1.1, -, ..."
Creator,"[OpenAI, OpenAI, OpenAI, OpenAI, Anthropic, An..."
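The DataFrame above has a single column because the parsed JSON nests every table column under one top-level key (the table title), so each cell holds a whole list. A minimal sketch of unwrapping that outer level first, assuming the response keeps the `{title: {column: [values]}}` shape; `unwrap_table` is a hypothetical helper and the data below is a shortened stand-in.

```python
def unwrap_table(data):
    """Split {title: {column: [values]}} into (title, {column: [values]})."""
    if len(data) != 1:
        raise ValueError("expected exactly one top-level key (the table title)")
    title, columns = next(iter(data.items()))
    return title, columns

# Shortened stand-in for json.loads(cleaned_str)
data = {
    "Table 1": {
        "Model": ["gpt-4", "claude-2"],
        "Creator": ["OpenAI", "Anthropic"],
    }
}
title, columns = unwrap_table(data)
print(title)
print(columns)
```

Passing `columns` to `pd.DataFrame` then gives one column per key and one row per model, provided all the lists have the same length.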


In [20]:
from pydantic import BaseModel, Field

class Table(BaseModel):
    table_title: str = Field(description="title of the figure or table")
    # Note: this open-ended `dict` field is what the ValidationError in the output points at
    table_contents: dict = Field(description="the data within the table as a dictionary")

prompt = "Extract table 1 from this paper:"
# Request structured JSON output that follows the Table schema
table_contents = client.models.generate_content(
    model='gemini-2.5-flash-preview-04-17',
    contents=[
        types.Part.from_bytes(
            data=filepath.read_bytes(),
            mime_type='application/pdf',
        ),
        prompt,
    ],
    config={
        'response_mime_type': 'application/json',
        'response_schema': list[Table],
    },
)

table_contents

ValidationError: 1 validation error for Schema
items.properties.table_contents.additionalProperties
  Extra inputs are not permitted [type=extra_forbidden, input_value=True, input_type=bool]
    For further information visit https://errors.pydantic.dev/2.11/v/extra_forbidden
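The path in the error (`table_contents.additionalProperties`) suggests that the open-ended `dict` field is what the schema conversion rejects. One possible workaround, sketched here under the assumption that fixed-shape fields (strings, lists, nested models) are accepted, is to describe the table as column names plus rows. `ExtractedTable`, `Row`, and `table_to_records` are hypothetical names, and the JSON string is a made-up stand-in for `response.text`.

```python
from pydantic import BaseModel, Field, TypeAdapter

class Row(BaseModel):
    cells: list[str] = Field(description="cell values of one table row, left to right")

class ExtractedTable(BaseModel):
    table_title: str = Field(description="title of the figure or table")
    column_names: list[str] = Field(description="header cells, left to right")
    rows: list[Row] = Field(description="table body, one entry per row")

def table_to_records(table):
    """Pair each row's cells with the column names."""
    return [dict(zip(table.column_names, row.cells)) for row in table.rows]

# Intended use in this notebook (not executed here):
# config={'response_mime_type': 'application/json', 'response_schema': list[ExtractedTable]}

# Made-up JSON in the shape the schema describes
sample_json = (
    '[{"table_title": "Table 1", "column_names": ["Model", "Creator"], '
    '"rows": [{"cells": ["gpt-4", "OpenAI"]}]}]'
)
tables = TypeAdapter(list[ExtractedTable]).validate_json(sample_json)
print(table_to_records(tables[0]))
```

Keeping every field typed means the generated schema has no free-form object, at the cost of returning all cell values as strings.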

In [None]:
# `df` is the DataFrame built earlier; to_markdown() needs the optional `tabulate` package
print("\nAs Markdown:")
print(df.to_markdown())

More examples with PDFs [here](https://ai.google.dev/gemini-api/docs/document-processing?lang=python).

# Check Gemini Docs for All Capabilities

- [Audio Understanding](https://ai.google.dev/gemini-api/docs/audio)
- [Video Understanding](https://ai.google.dev/gemini-api/docs/video-understanding)