# Playing with the Gemini API

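**Description**: This notebook demonstrates how to call the Gemini API through the OpenAI Python library, using Gemini's OpenAI-compatible endpoint, and uses it to reformat text extracted from a PDF into Markdown.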

## Imports

In [1]:
import os

from openai import OpenAI
import pymupdf
import pypdfium2 as pdfium

## Setup client

In [2]:
GEMINI_API_KEY = os.getenv("GEMINI_API_KEY")
model_id = "gemini-2.5-pro-exp-03-25"  # "gemini-2.5-flash-preview-04-17"
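
`os.getenv` returns `None` when the variable is unset, so a missing key would not be caught at this point. A minimal sketch of a fail-fast check follows; `require_env` is a hypothetical helper introduced here for illustration and is not part of the notebook or of any library.

```python
import os


def require_env(name: str) -> str:
    """Return the value of environment variable `name`, or raise if it is unset or empty."""
    value = os.getenv(name)
    if not value:
        raise RuntimeError(f"Environment variable {name} is not set")
    return value


# Assumed usage in this notebook (not executed here):
# GEMINI_API_KEY = require_env("GEMINI_API_KEY")
```

With this in place, a missing key raises immediately in the setup cell rather than at the first request.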

In [3]:
client = OpenAI(
    api_key=GEMINI_API_KEY,
    base_url="https://generativelanguage.googleapis.com/v1beta/openai/"
)
client.models.list().to_dict()

{'data': [{'id': 'models/chat-bison-001',
   'object': 'model',
   'owned_by': 'google'},
  {'id': 'models/text-bison-001', 'object': 'model', 'owned_by': 'google'},
  {'id': 'models/embedding-gecko-001',
   'object': 'model',
   'owned_by': 'google'},
  {'id': 'models/gemini-1.0-pro-vision-latest',
   'object': 'model',
   'owned_by': 'google'},
  {'id': 'models/gemini-pro-vision', 'object': 'model', 'owned_by': 'google'},
  {'id': 'models/gemini-1.5-pro-latest',
   'object': 'model',
   'owned_by': 'google'},
  {'id': 'models/gemini-1.5-pro-001', 'object': 'model', 'owned_by': 'google'},
  {'id': 'models/gemini-1.5-pro-002', 'object': 'model', 'owned_by': 'google'},
  {'id': 'models/gemini-1.5-pro', 'object': 'model', 'owned_by': 'google'},
  {'id': 'models/gemini-1.5-flash-latest',
   'object': 'model',
   'owned_by': 'google'},
  {'id': 'models/gemini-1.5-flash-001',
   'object': 'model',
   'owned_by': 'google'},
  {'id': 'models/gemini-1.5-flash-001-tuning',
   'object': 'model',
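
The full listing is long, so it can help to reduce it to the model IDs of interest. The sketch below assumes the dictionary shape shown in the output above (a `data` list of entries with an `id` key); `model_ids` is a hypothetical helper, and `sample` copies a few entries from that output so the block runs without network access.

```python
def model_ids(listing: dict, contains: str = "gemini") -> list[str]:
    """Return sorted model IDs containing `contains`, with the 'models/' prefix removed."""
    ids = (entry["id"] for entry in listing.get("data", []))
    return sorted(i.removeprefix("models/") for i in ids if contains in i)


# A few entries copied from the listing above, used as stand-in data.
sample = {
    "data": [
        {"id": "models/chat-bison-001", "object": "model", "owned_by": "google"},
        {"id": "models/gemini-1.5-pro-latest", "object": "model", "owned_by": "google"},
        {"id": "models/gemini-1.5-flash-001", "object": "model", "owned_by": "google"},
    ]
}

print(model_ids(sample))
```

In the notebook, the same helper could be applied to the live result, for example `model_ids(client.models.list().to_dict())`.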

## Load PDF document for testing

We compare how the PyMuPDF and pypdfium2 libraries extract text from the same PDF document.

In [4]:
pdf_path = "data/pdf_docs/a-practical-guide-to-building-agents.pdf"

In [5]:
pdf1 = pymupdf.open(pdf_path)
print(pdf1[3].get_text())

What is an 
agent?
While conventional software enables users to streamline and automate workflows, agents are able 
to perform the same workflows on the users’ behalf with a high degree of independence.
Agents are systems that independently accomplish tasks on your behalf.
A workflow is a sequence of steps that must be executed to meet the user’s goal, whether that's 
resolving a customer service issue, booking a restaurant reservation, committing a code change,  
or generating a report.
Applications that integrate LLMs but don’t use them to control workflow execution—think simple 
chatbots, single-turn LLMs, or sentiment classifiers—are not agents.
More concretely, an agent possesses core characteristics that allow it to act reliably and 
consistently on behalf of a user:
01
It leverages an LLM to manage workflow execution and make decisions. It recognizes 
when a workflow is complete and can proactively correct its actions if needed. In case  
of failure, it can halt execution and tr

In [6]:
pdf2 = pdfium.PdfDocument(pdf_path)
print(pdf2[3].get_textpage().get_text_range())

What is an 
agent?
While conventional software enables users to streamline and automate workflows, agents are able 
to perform the same workflows on the users’ behalf with a high degree of independence.
Agents are systems that independently accomplish tasks on your behalf.
A workflow is a sequence of steps that must be executed to meet the user’s goal, whether that's 
resolving a customer service issue, booking a restaurant reservation, committing a code change,  
or generating a report.

Applications that integrate LLMs but don’t use them to control workflow execution—think simple 
chatbots, single-turn LLMs, or sentiment classifiers—are not agents.

More concretely, an agent possesses core characteristics that allow it to act reliably and 
consistently on behalf of a user:
01 It leverages an LLM to manage workflow execution and make decisions. It recognizes 
when a workflow is complete and can proactively correct its actions if needed. In case  
of failure, it can halt execution and 
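
On this page the two outputs above appear to differ mainly in line breaks and blank lines: for example, `01` sits on its own line in the PyMuPDF output but shares a line with the following sentence in the pypdfium2 output. A rough sketch of how such a comparison could be automated with the standard-library `difflib` follows. The helper names are hypothetical, and the two sample strings are short excerpts modelled on the outputs above, not the full page text.

```python
import difflib


def normalize_lines(text: str) -> list[str]:
    """Collapse runs of whitespace within each line and drop empty lines."""
    return [" ".join(line.split()) for line in text.splitlines() if line.strip()]


def same_tokens(text_a: str, text_b: str) -> bool:
    """True if both texts contain the same whitespace-separated tokens in the same order."""
    return text_a.split() == text_b.split()


def line_diff(text_a: str, text_b: str) -> list[str]:
    """Unified diff of the normalized lines; empty when the line layout matches."""
    return list(
        difflib.unified_diff(
            normalize_lines(text_a),
            normalize_lines(text_b),
            fromfile="pymupdf",
            tofile="pypdfium2",
            lineterm="",
        )
    )


# Short excerpts modelled on the two outputs above.
text_a = "01\nIt leverages an LLM to manage workflow execution and make decisions."
text_b = "01 It leverages an LLM to manage workflow execution and make decisions."

print(same_tokens(text_a, text_b))  # content identical apart from whitespace?
for line in line_diff(text_a, text_b):
    print(line)
```

In the notebook, the same functions could be called with `pdf1[3].get_text()` and `pdf2[3].get_textpage().get_text_range()` as the two inputs.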

## Reformat the extracted text with Gemini

In [7]:
response = client.chat.completions.create(
    model=model_id,
    messages=[
        {
            "role": "system", 
            "content": """You are DocFormatter, an AI assistant specialized in transforming raw, line-broken, code-style text extracted from PDFs into clean, human-readable, richly formatted documents. Always output valid Markdown, preserving the original content and logical structure. Follow these rules:

1. Detect and format headings
- Lines in ALL CAPS or surrounded by blank lines with no punctuation → convert to Markdown headings (#, ##, etc.) based on logical hierarchy.

2. Reflow paragraphs
- Remove hard line-breaks within sentences; merge wrapped lines into single paragraphs.
- Preserve intentional blank lines between paragraphs.

3. Restore lists
- Lines beginning with bullets (-, *, •) or ordered markers (1., a)) → convert to Markdown lists.
- Properly indent nested lists.

4. Convert simple tables
- Sequences of lines with consistent spacing → convert to Markdown tables.

5. Handle footnotes & citations
- Detect bracketed markers like [1] or (Smith et al., 2020) → preserve in-text and, if possible, collect into a “References” section at the end in proper Markdown list form.

6. Clean hyphenation
- Remove orphaned hyphens at line ends (exam-\nple → example).

7. Preserve special elements
- Blockquotes (> …), code blocks (indented or fenced), images (URLs), and figures → retain or convert to Markdown equivalents.

8. Maintain fidelity
- Do not add or omit content; if something is ambiguous, preserve it verbatim and flag with a comment like <!-- Check formatting -->.
"""},
        {
            "role": "user",
            "content": f"""Here is the raw text extracted from a PDF. Please reformat it into clean, readable Markdown, following the system instructions exactly:

```
{pdf2[3].get_textpage().get_text_range()}
```
"""
        }
    ]
)

print(response.choices[0].message.content)
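
The reply printed by this cell is itself wrapped in a `markdown` code fence, as the output that follows shows. If the Markdown is to be saved or rendered directly, that outer fence needs to be removed first. A minimal sketch of one way to do this is given here; `strip_outer_fence` is a hypothetical helper, and the regular expression assumes the reply is either a single fenced block or plain text.

```python
import re

FENCE = "`" * 3  # three backticks, built this way to keep the example readable

# Matches a reply that consists of exactly one outer fenced block,
# optionally tagged with a language such as "markdown".
_OUTER_FENCE = re.compile(
    rf"\A\s*{FENCE}[\w-]*[ \t]*\n(.*?)\n?{FENCE}\s*\Z",
    re.DOTALL,
)


def strip_outer_fence(reply: str) -> str:
    """Return the reply without a surrounding code fence, if one is present."""
    match = _OUTER_FENCE.match(reply)
    return match.group(1) if match else reply.strip()


# Stand-in reply for illustration; in the notebook this would be
# response.choices[0].message.content.
demo_reply = f"{FENCE}markdown\n## What is an agent?\n\nAgents are systems.\n{FENCE}"
print(strip_outer_fence(demo_reply))
```

Fences that appear inside the reply are left untouched, because the pattern only matches when the closing fence is the last thing in the text.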

```markdown
## What is an agent?

While conventional software enables users to streamline and automate workflows, agents are able to perform the same workflows on the users’ behalf with a high degree of independence.

Agents are systems that independently accomplish tasks on your behalf.

A workflow is a sequence of steps that must be executed to meet the user’s goal, whether that's resolving a customer service issue, booking a restaurant reservation, committing a code change, or generating a report.

Applications that integrate LLMs but don’t use them to control workflow execution—think simple chatbots, single-turn LLMs, or sentiment classifiers—are not agents.

More concretely, an agent possesses core characteristics that allow it to act reliably and consistently on behalf of a user:

1.  It leverages an LLM to manage workflow execution and make decisions. It recognizes when a workflow is complete and can proactively correct its actions if needed. In case of failure, it can halt exec