# Document Understanding

- The Gemini API supports PDF input, including long documents (up to 1,000 pages). Gemini models process PDFs with native vision, so they can understand both the text and the images inside a document. With native PDF vision support, Gemini models are able to:

- Analyze diagrams, charts, and tables inside documents
- Extract information into structured output formats
- Answer questions about visual and text contents in documents
- Summarize documents
- Transcribe document content (e.g., to HTML), preserving layouts and formatting, for use in downstream applications
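
As an illustration of the structured-output capability, here is a minimal sketch, assuming you simply ask the model to reply with JSON. `response_mime_type="application/json"` is a `GenerateContentConfig` field in the `google-genai` SDK; the `DocumentFacts` fields, the `parse_facts` and `extract_facts` helpers, and the prompt wording are hypothetical choices for this example. Parsing is kept separate from the API call so it can be checked without an API key.

```python
import json
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class DocumentFacts:
    """Hypothetical schema for facts extracted from a PDF."""

    title: str
    topics: List[str] = field(default_factory=list)
    page_count: Optional[int] = None


def parse_facts(raw_json: str) -> DocumentFacts:
    """Convert the model's JSON reply into a typed object, ignoring unexpected keys."""
    data = json.loads(raw_json)
    return DocumentFacts(
        title=str(data.get("title", "")),
        topics=[str(t) for t in data.get("topics", [])],
        page_count=data.get("page_count"),
    )


def extract_facts(client, pdf_bytes: bytes, model: str = "gemini-2.0-flash") -> DocumentFacts:
    """Ask Gemini for a JSON description of the PDF (needs network access and an API key)."""
    # Imported here so parse_facts can be used and tested without the SDK installed
    from google.genai import types

    prompt = (
        "Return only JSON with the keys 'title' (string), 'topics' (list of strings) "
        "and 'page_count' (integer) describing this document."
    )
    response = client.models.generate_content(
        model=model,
        contents=[types.Part.from_bytes(data=pdf_bytes, mime_type="application/pdf"), prompt],
        # Ask for a JSON response body instead of free-form text
        config=types.GenerateContentConfig(response_mime_type="application/json"),
    )
    return parse_facts(response.text)
```

If you want the SDK to enforce the shape rather than relying on the prompt, `GenerateContentConfig` also accepts a `response_schema`; check the SDK documentation for the schema types it supports.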

# PDF input

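- For PDF payloads under 20 MB, you can either pass the document inline in the request (it is sent base64-encoded; `types.Part.from_bytes` takes the raw bytes and the SDK handles the encoding) or upload the file with the File API and reference it. For larger PDFs, use the File API.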
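
The choice between the two routes can be made mechanically. A minimal sketch, assuming "20 MB" means 20 × 1024 × 1024 bytes and keeping a hypothetical 1 MiB margin, since the limit applies to the whole request (prompt included), not only the PDF. `choose_transport` and `looks_like_pdf` are hypothetical helpers; the `%PDF-` header check is a cheap sanity test, not full validation.

```python
# Assumption: "20 MB" read as 20 * 1024 * 1024 bytes. The documented limit covers the
# whole request (PDF plus prompt), so a safety margin is subtracted below.
INLINE_LIMIT_BYTES = 20 * 1024 * 1024
DEFAULT_HEADROOM_BYTES = 1024 * 1024  # hypothetical 1 MiB margin for prompt and overhead


def looks_like_pdf(data: bytes) -> bool:
    """Cheap sanity check: PDF files begin with the b'%PDF-' signature."""
    return data.startswith(b"%PDF-")


def choose_transport(size_bytes: int, headroom_bytes: int = DEFAULT_HEADROOM_BYTES) -> str:
    """Return 'inline' if the PDF fits under the limit with headroom, else 'file_api'."""
    if size_bytes < 0:
        raise ValueError("size_bytes must be non-negative")
    if size_bytes + headroom_bytes < INLINE_LIMIT_BYTES:
        return "inline"
    return "file_api"
```

With `"inline"`, pass the bytes through `types.Part.from_bytes`; with `"file_api"`, upload the file with `client.files.upload` and pass the returned file object in `contents`.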

# Get your Gemini API key from: https://aistudio.google.com/apikey

```python
import os
import getpass

# Prompt for the key without echoing it, and store it for the later cells
os.environ["Gemini_API_KEY"] = getpass.getpass("Gemini API Key:")
```



```python
from google import genai
from google.genai import types
import os

# Retrieve the API key from the environment variable set earlier
api_key = os.environ.get("Gemini_API_KEY")

# Initialize the client with the API key
client = genai.Client(api_key=api_key)

# Path to a PDF uploaded to the Colab session (a local file, not a URL)
doc_path = "/content/datastructures-and-algorithms-roadmap.pdf"

# Read the raw PDF bytes from disk.
# For a remote PDF you could fetch the bytes instead, e.g. httpx.get(url).content;
# httpx.get does not accept local file paths.
with open(doc_path, "rb") as f:
    doc_data = f.read()

# Send the PDF inline together with the text prompt
prompt = "Summarize this document"
response = client.models.generate_content(
    model="gemini-2.0-flash",
    contents=[
        types.Part.from_bytes(
            data=doc_data,
            mime_type="application/pdf",
        ),
        prompt,
    ],
)
print(response.text)
```

The image shows a comprehensive roadmap for learning Data Structures and Algorithms (DSA). It starts with selecting a programming language (like JavaScript, Python, Java, etc.) and covers fundamental programming concepts. It then dives into data structures, including basic types like arrays and linked lists, and progresses to more advanced structures like trees and graphs. Algorithmic complexity, sorting and searching algorithms, graph algorithms, and indexing techniques are also covered. It recommends practicing problem-solving with different techniques (Brute force, backtracking, greedy) and leveraging platforms like Leetcode and Edabit. The final section emphasizes "Keep Learning".
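
The cell above reads a local file because `httpx.get` only handles URLs. A small loader can accept either form; this sketch swaps in the standard library's `urllib.request` for remote files so it needs no extra dependency. `load_pdf_bytes` is a hypothetical helper name, and the `%PDF-` check is only a quick guard against passing the wrong file.

```python
from pathlib import Path
from urllib.parse import urlparse
from urllib.request import urlopen


def load_pdf_bytes(source: str, timeout: float = 30.0) -> bytes:
    """Return raw PDF bytes from an http(s) URL or a local file path."""
    scheme = urlparse(source).scheme
    if scheme in ("http", "https"):
        # Remote document: download the whole response body into memory
        with urlopen(source, timeout=timeout) as resp:
            data = resp.read()
    else:
        # Anything else is treated as a path on the local filesystem
        data = Path(source).read_bytes()
    if not data.startswith(b"%PDF-"):
        raise ValueError(f"{source!r} does not look like a PDF")
    return data
```

For example, `doc_data = load_pdf_bytes("/content/datastructures-and-algorithms-roadmap.pdf")` would replace the `open(...)` block, and the same call works with an `https://` URL.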



# Locally Stored PDFs

- For locally stored PDFs, read the file's bytes from disk and pass them inline with `types.Part.from_bytes`, as in the cell below.

```python
from google import genai
from google.genai import types
import pathlib
import os

# Retrieve the API key stored in the Gemini_API_KEY environment variable by the earlier cell
api_key = os.environ.get("Gemini_API_KEY")

# Initialize the client with the API key
client = genai.Client(api_key=api_key)

# Path to the PDF uploaded to the notebook's environment
doc_path = pathlib.Path("/content/prompt_engineering_cheat_sheet.pdf")

# Read the local PDF; fall back to None so the request below is skipped if it is missing
try:
    doc_data = doc_path.read_bytes()
except FileNotFoundError:
    print(f"Error: The file {doc_path} was not found.")
    doc_data = None

if doc_data:  # Proceed only if the file was read successfully
    prompt = "Summarize this document"
    response = client.models.generate_content(
        model="gemini-2.0-flash",
        contents=[
            types.Part.from_bytes(
                data=doc_data,
                mime_type="application/pdf",
            ),
            prompt,
        ],
    )
    print(response.text)
```

Here's a summary of the document:

This document provides tips for writing effective prompts for Large Language Models (LLMs), particularly in a technical or programming context. It uses a "DO" and "DON'T" format with examples to illustrate good and bad prompting practices.

**Key takeaways:**

*   **Be Clear and Specific:**  Provide ample detail and avoid ambiguity.
*   **Define the LLM's Role:**  Explicitly tell the LLM what role you want it to take (e.g., "You are an expert..."). and provide your own expertise.
*   **Contextualize:**  Mention relevant programming languages, libraries, and other technologies.
*   **Explain the Code's Purpose:**  Describe what the code is intended to do.
*   **Specify Constraints:**  Include any constraints or requirements (e.g., memory limitations, software versions).
*   **Example of your expected output:** This is important to help the LLM better understand what you are looking for.
*   **Step-by-step instructions:** Ask the LLM to work step-by-ste

# Multiple PDFs

- The Gemini API is capable of processing multiple PDF documents in a single request, as long as the combined size of the documents and the text prompt stays within the model's context window.
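
Whether several PDFs fit in one request can be estimated before sending them. A rough sketch, assuming about 258 tokens per PDF page (the figure given in Google's document-understanding guide) and a 1,048,576-token input window for `gemini-2.0-flash`; treat both numbers as assumptions to re-check against current documentation. For an exact figure on a concrete request, `client.models.count_tokens(model=..., contents=[...])` returns the real count. `estimate_tokens` and `fits_in_context` are hypothetical helpers.

```python
# Assumptions (verify against current Gemini docs): ~258 tokens per PDF page,
# and a 1,048,576-token input window for gemini-2.0-flash.
TOKENS_PER_PAGE = 258
CONTEXT_WINDOW = 1_048_576
MAX_PAGES_PER_PDF = 1000  # the page limit stated at the top of this notebook


def estimate_tokens(page_counts: list, prompt_tokens: int) -> int:
    """Rough input size: total pages across all PDFs times the per-page cost, plus the prompt."""
    for pages in page_counts:
        if not 0 < pages <= MAX_PAGES_PER_PDF:
            raise ValueError(f"unsupported page count: {pages}")
    return sum(page_counts) * TOKENS_PER_PAGE + prompt_tokens


def fits_in_context(page_counts: list, prompt_tokens: int) -> bool:
    """True if the estimated input stays within the assumed context window."""
    return estimate_tokens(page_counts, prompt_tokens) <= CONTEXT_WINDOW
```

Under these assumptions, four 1,000-page PDFs (about 1,032,000 tokens) would just fit, while five would not.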

```python
from google import genai
import io
import os

api_key = os.environ.get("Gemini_API_KEY")

# Initialize the client with the API key
client = genai.Client(api_key=api_key)

doc_path_1 = "/content/datastructures-and-algorithms-roadmap.pdf"
doc_path_2 = "/content/prompt_engineering_cheat_sheet.pdf"

# Read the local PDF files; if either is missing, skip the request below
try:
    with open(doc_path_1, "rb") as f:
        doc_data_1 = f.read()
    with open(doc_path_2, "rb") as f:
        doc_data_2 = f.read()
except FileNotFoundError as e:
    print(f"Error: File not found - {e}")
    doc_data_1 = None
    doc_data_2 = None

if doc_data_1 and doc_data_2:  # Proceed only if both files were read successfully
    # Upload each PDF with the File API; file-like objects need an explicit MIME type
    sample_pdf_1 = client.files.upload(
        file=io.BytesIO(doc_data_1),
        config=dict(mime_type="application/pdf"),
    )
    sample_pdf_2 = client.files.upload(
        file=io.BytesIO(doc_data_2),
        config=dict(mime_type="application/pdf"),
    )

    prompt = "What is the difference between each of the pdfs? Output these in a table."

    # The uploaded file objects can be passed directly in `contents`
    response = client.models.generate_content(
        model="gemini-2.0-flash",
        contents=[sample_pdf_1, sample_pdf_2, prompt],
    )
    print(response.text)
```

Okay, I will compare the two PDFs and summarize the differences in a table.

| Feature | PDF 1 (Data Structures & Algorithms Roadmap) | PDF 2 (Prompt Engineering Tips) |
|---|---|---|
| **Content Focus** | Focuses on Computer Science concepts, specifically Data Structures and Algorithms. | Focuses on how to write effective prompts for Large Language Models (LLMs). |
| **Type of Information** | Presents a structured learning path for Data Structures and Algorithms. It outlines topics, subtopics, and relationships. | Provides practical advice and examples on how to formulate prompts to get better results from LLMs.  Includes "DO" and "DON'T" examples. |
| **Target Audience** | Students or developers who want to learn or improve their knowledge of Data Structures and Algorithms. | Individuals who are using or planning to use Large Language Models (like ChatGPT, etc.) and want to learn how to better interact with them. |
| **Format** | A roadmap using a directed graph, with topics represen