# Gemini App Chatting with PDFs

In [1]:
import os
import getpass

def _set_env(var: str):
    if not os.environ.get(var):
        os.environ[var] = getpass.getpass(f"{var}: ")

_set_env("GEMINI_API_KEY")

In [4]:
!wget https://arxiv.org/pdf/2308.03688 && mv 2308.03688 ./paper.pdf

--2025-05-01 12:23:59--  https://arxiv.org/pdf/2308.03688
Resolving arxiv.org (arxiv.org)... 151.101.195.42, 151.101.3.42, 151.101.67.42, ...
Connecting to arxiv.org (arxiv.org)|151.101.195.42|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 23176585 (22M) [application/pdf]
Saving to: ‘2308.03688’


2025-05-01 12:23:59 (105 MB/s) - ‘2308.03688’ saved [23176585/23176585]
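
Before sending the downloaded file to the model, it can help to confirm that it really is a PDF and not, say, an HTML error page. The sketch below checks for the `%PDF-` marker that PDF files start with. The helper name `looks_like_pdf` and the variable `local_pdf_bytes` are hypothetical and not part of the original notebook; the path `./paper.pdf` is the one produced by the `mv` command above.

```python
from pathlib import Path


def looks_like_pdf(data: bytes) -> bool:
    """Return True if the bytes begin with the PDF header marker."""
    return data.startswith(b"%PDF-")


# Hypothetical usage: read the file saved by the wget/mv cell, if it exists.
pdf_path = Path("./paper.pdf")
if pdf_path.exists():
    local_pdf_bytes = pdf_path.read_bytes()
    if not looks_like_pdf(local_pdf_bytes):
        raise ValueError(f"{pdf_path} does not look like a PDF")
```

Reading the local copy this way would also avoid downloading the same paper again in later cells.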



In [4]:
from google import genai
from google.genai import types
import httpx
import os

gemini_api_key = os.getenv("GEMINI_API_KEY")

client = genai.Client(api_key=gemini_api_key)

doc_url = "https://arxiv.org/pdf/2308.03688"

# Download the PDF as raw bytes
doc_data = httpx.get(doc_url).content

prompt = "Summarize this document"
response = client.models.generate_content(
  model="gemini-2.0-flash",
  contents=[
      types.Part.from_bytes(
        data=doc_data,
        mime_type='application/pdf',
      ),
      prompt])
print(response.text)

Here is a summary of the document:

The paper introduces AgentBench, a new multi-dimensional benchmark designed to evaluate Large Language Models (LLMs) as agents in complex interactive environments. AgentBench consists of 8 distinct environments that assess an LLM-as-Agent's reasoning and decision-making abilities in a multi-turn open-ended generation setting. The evaluation of 27 LLMs (API-based and open-sourced) on AgentBench reveals a significant disparity in performance between commercial LLMs and OSS competitors, with commercial LLMs demonstrating a strong ability to act as agents in complex environments. The paper identifies the typical reasons for failures, highlighting poor long-term reasoning, decision-making, and instruction following abilities as the main obstacles for developing usable LLM agents. The authors suggest that training on code and high-quality multi-turn alignment data could improve agent performance. Datasets, environments, and an integrated evaluation package
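
The PDF used here is about 22 MB, which is large for an inline request. A possible alternative is to upload large files with `client.files.upload` (the same call this notebook uses for the figure) and pass the returned file handle in `contents`. The sketch below picks between the two paths by file size. The 20 MB cutoff is an assumption based on guidance in the Gemini docs, not a limit verified in this notebook, and `should_use_file_api` and `summarize_pdf` are hypothetical helper names.

```python
from pathlib import Path

# Assumed cutoff, not verified here: prefer the File API above roughly 20 MB.
INLINE_LIMIT_BYTES = 20 * 1024 * 1024


def should_use_file_api(size_bytes: int, limit: int = INLINE_LIMIT_BYTES) -> bool:
    """Return True when a payload is larger than the assumed inline limit."""
    return size_bytes > limit


def summarize_pdf(client, path: str,
                  prompt: str = "Summarize this document",
                  model: str = "gemini-2.0-flash") -> str:
    """Hypothetical helper: send a local PDF inline or via the File API."""
    # Imported lazily so the size check can be used without the SDK installed.
    from google.genai import types

    data = Path(path).read_bytes()
    if should_use_file_api(len(data)):
        # Large file: upload once and reference the uploaded file.
        part = client.files.upload(file=path)
    else:
        # Small file: embed the bytes directly in the request.
        part = types.Part.from_bytes(data=data, mime_type="application/pdf")
    response = client.models.generate_content(model=model, contents=[part, prompt])
    return response.text
```

With the `client` created earlier, a call might look like `print(summarize_pdf(client, "./paper.pdf"))`.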

In [5]:
doc_url = "https://arxiv.org/pdf/2308.03688"

# Download the PDF as raw bytes
doc_data = httpx.get(doc_url).content

prompt = "Extract all the tables from this paper as json objects."
response = client.models.generate_content(
  model="gemini-2.0-flash",
  contents=[
      types.Part.from_bytes(
        data=doc_data,
        mime_type='application/pdf',
      ),
      prompt])
print(response.text)
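
A reply to a prompt like this one usually arrives as text, often wrapped in a Markdown json code fence, so it has to be unwrapped before it can be used as data. The following is a minimal sketch of one way to do that; `parse_json_reply` is a hypothetical helper, and it assumes the reply contains at most one fenced block holding complete JSON.

```python
import json
import re

# Build the fence marker programmatically to keep this snippet easy to embed.
FENCE = "`" * 3
FENCE_RE = re.compile(FENCE + r"(?:json)?\s*(.*?)\s*" + FENCE, re.DOTALL)


def parse_json_reply(text: str):
    """Parse JSON from a model reply, removing a surrounding code fence if present."""
    match = FENCE_RE.search(text)
    payload = match.group(1) if match else text
    # Raises json.JSONDecodeError if the payload is incomplete or malformed.
    return json.loads(payload)
```

A call such as `tables = parse_json_reply(response.text)` would then give a list of dictionaries, while a reply that is cut off partway raises `json.JSONDecodeError`. Another option, if it suits the use case, is to pass `config=types.GenerateContentConfig(response_mime_type="application/json")` to `generate_content` so the model is asked for raw JSON in the first place.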

```json
[
  {
    "table_num": 1,
    "title": "AGENTBENCH evaluates 27 API-based or OSS LLMs on LLM-as-Agent challenges",
    "columns": [
      "Model",
      "#Size",
      "Form",
      "Ver.",
      "Creator",
      "Model",
      "#Size",
      "Form",
      "Ver.",
      "Creator"
    ],
    "rows": [
      [
        "gpt-4 (OpenAI, 2023)",
        "N/A",
        "api",
        "0613",
        "OpenAI",
        "text-davinci-002 (Ouyang et al., 2022)",
        "N/A",
        "api",
        "",
        "OpenAI"
      ],
      [
        "gpt-3.5-turbo (OpenAI, 2022)",
        "N/A",
        "api",
        "0613",
        "OpenAI",
        "llama2-70b (Touvron et al., 2023)",
        "70B",
        "open chat",
        "",
        "Meta"
      ],
      [
        "text-davinci-003 (Ouyang et al., 2022)",
        "N/A",
        "api",
        "",
        "OpenAI",
        "llama2-13b (Touvron et al., 2023)",
        "13B",
        "open chat",
        "",
        "Meta"
      ],
```

![](sample_image_from_paper_agent_bench.png)

In [6]:
from google import genai

client = genai.Client(api_key=gemini_api_key)

my_file = client.files.upload(file="./sample_image_from_paper_agent_bench.png")

response = client.models.generate_content(
    model="gemini-2.0-flash",
    contents=[my_file, "Explain this figure."],
)

print(response.text)

Here's a breakdown of the figure, explaining the two plots and the overall message:

**Figure 1: Overview of LLMs on AgentBench**

The figure presents the performance of various Large Language Models (LLMs) on the AgentBench benchmark.  AgentBench is a tool designed to evaluate how well LLMs perform as agents in different simulated environments.  The key takeaway is that there are significant differences in performance, indicating that the gap towards practical usability is still considerable.

**Panel (a):  Typical LLMs' AgentBench Performance (Relative)**

*   **Chart Type:**  Radar chart
*   **What it shows:**  This shows the *relative* performance of several LLMs across 8 different environment categories within the AgentBench framework. The environments are:
    *   Operating System
    *   Database
    *   Knowledge Graph
    *   Digital Card Game
    *   Lateral Thinking Puzzle
    *   House-holding
    *   Web Shopping
    *   Web Browsing

*   **How to interpret:**  The further