# Multimodal Parsing with Gemini 2.0 Flash

<a href="https://colab.research.google.com/github/run-llama/llama_parse/blob/main/examples/multimodal/gemini2_flash.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

This cookbook shows you how to use LlamaParse to parse any document with the multimodal capabilities of Gemini 2.0 Flash.

LlamaParse allows you to plug in external, multimodal model vendors for parsing - we handle the error correction, validation, and scalability/reliability for you.


## Setup

Download the data - we'll use a technical datasheet for a programmable logic device (Xilinx's XC9500 In-System Programmable CPLD).

In [1]:
import nest_asyncio

nest_asyncio.apply()

In [5]:
!mkdir -p data
!wget "https://media.digikey.com/pdf/Data%20Sheets/AMD/XC9500_CPLD_Family.pdf" -O data/XC9500_CPLD_Family.pdf

--2025-02-12 17:28:56--  https://media.digikey.com/pdf/Data%20Sheets/AMD/XC9500_CPLD_Family.pdf
Resolving media.digikey.com (media.digikey.com)... 2.22.70.9
Connecting to media.digikey.com (media.digikey.com)|2.22.70.9|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 201899 (197K) [application/pdf]
Saving to: ‘data/XC9500_CPLD_Family.pdf’


2025-02-12 17:28:57 (7.19 MB/s) - ‘data/XC9500_CPLD_Family.pdf’ saved [201899/201899]



## Initialize LlamaParse

Initialize LlamaParse in multimodal mode, and specify the vendor as `gemini-2.0-flash-001`.

**NOTE**: Current pricing is 2 credits per page ($0.006 USD / page). This covers the core model, infrastructure, and algorithm costs needed to fully process the page.

In [6]:
from llama_index.core.schema import TextNode
from typing import List
import json

def get_text_nodes(json_list: List[dict]) -> List[TextNode]:
    """Convert parsed page dicts into TextNodes, keeping the page number as metadata."""
    text_nodes = []
    for page in json_list:
        text_node = TextNode(text=page["md"], metadata={"page": page["page"]})
        text_nodes.append(text_node)
    return text_nodes


def save_jsonl(data_list, filename):
    """Save a list of dictionaries as JSON Lines."""
    with open(filename, "w") as file:
        for item in data_list:
            json.dump(item, file)
            file.write("\n")


def load_jsonl(filename):
    """Load a list of dictionaries from JSON Lines."""
    data_list = []
    with open(filename, "r") as file:
        for line in file:
            data_list.append(json.loads(line))
    return data_list
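
The two JSON Lines helpers above can be checked with a quick round trip. The sketch below repeats them so that it runs on its own; the sample records and the temporary file name are hypothetical stand-ins for the serialized nodes saved later in this cookbook.

```python
import json
import os
import tempfile


def save_jsonl(data_list, filename):
    """Write one JSON object per line."""
    with open(filename, "w") as file:
        for item in data_list:
            json.dump(item, file)
            file.write("\n")


def load_jsonl(filename):
    """Read one JSON object per line."""
    with open(filename, "r") as file:
        return [json.loads(line) for line in file]


# Hypothetical records standing in for serialized nodes.
records = [
    {"text": "page one markdown", "metadata": {"page": 1}},
    {"text": "page two\nwith a newline", "metadata": {"page": 2}},
]

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "records.jsonl")
    save_jsonl(records, path)
    restored = load_jsonl(path)

# json.dump escapes embedded newlines, so each record stays on one line.
print(restored == records)
```

Because each record is written on its own line, the file can also be appended to or streamed line by line without loading everything at once.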

In [12]:
from llama_parse import LlamaParse

parsing_instruction = """
You are given a technical datasheet of an electronic component.
For any graphs, try to create a 2D table of relevant values, along with a description of the graph.
For any schematic diagrams, MAKE SURE to describe a list of all components and their connections to each other.
Make sure that you always parse out the text with the correct reading order.
"""

parser = LlamaParse(
    result_type="markdown",
    use_vendor_multimodal_model=True,
    vendor_multimodal_model_name="gemini-2.0-flash-001",
    invalidate_cache=True,
    parsing_instruction=parsing_instruction,
)
json_objs = parser.get_json_result("./data/XC9500_CPLD_Family.pdf")
json_list = json_objs[0]["pages"]
docs = get_text_nodes(json_list)

Started parsing the file under job_id d231f1d0-73f1-43d5-8bed-79f5a3b60045




In [10]:
# Optional: Save
save_jsonl([d.dict() for d in docs], "docs_gemini_2.0_flash.jsonl")

In [9]:
# Optional: Load
from llama_index.core import Document

docs_dicts = load_jsonl("docs_gemini_2.0_flash.jsonl")
docs = [Document.model_validate(d) for d in docs_dicts]


### Setup GPT-4o baseline

For comparison, we will also parse the document using GPT-4o ($0.03 per page).
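
To put the two per-page prices quoted in this cookbook side by side, here is a rough cost estimate. The 100-page document length is a hypothetical figure chosen for illustration, and the prices are simply the ones stated above, which may change.

```python
# Per-page prices quoted in this cookbook (USD); they may change over time.
GEMINI_FLASH_USD_PER_PAGE = 0.006
GPT4O_USD_PER_PAGE = 0.03


def parsing_cost(num_pages: int, usd_per_page: float) -> float:
    """Return the estimated parsing cost in USD for num_pages pages."""
    return round(num_pages * usd_per_page, 4)


num_pages = 100  # hypothetical document length, not the datasheet's page count
gemini_cost = parsing_cost(num_pages, GEMINI_FLASH_USD_PER_PAGE)
gpt4o_cost = parsing_cost(num_pages, GPT4O_USD_PER_PAGE)
cost_ratio = gpt4o_cost / gemini_cost

print(f"Gemini 2.0 Flash: ${gemini_cost:.2f}")
print(f"GPT-4o:           ${gpt4o_cost:.2f}")
print(f"Ratio:            {cost_ratio:.1f}x")
```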

In [10]:
from llama_parse import LlamaParse

parser_gpt4o = LlamaParse(
    result_type="markdown",
    use_vendor_multimodal_model=True,
    vendor_multimodal_model_name="openai-gpt4o",
    invalidate_cache=True,
    parsing_instruction=parsing_instruction,
)
json_objs_gpt4o = parser_gpt4o.get_json_result("./data/XC9500_CPLD_Family.pdf")
json_list_gpt4o = json_objs_gpt4o[0]["pages"]
docs_gpt4o = get_text_nodes(json_list_gpt4o)

Started parsing the file under job_id f5096af5-72f1-49c9-b00c-b5c245fd20a1


In [11]:
# Optional: Save
save_jsonl([d.dict() for d in docs_gpt4o], "docs_gpt4o.jsonl")

In [12]:
# Optional: Load
from llama_index.core import Document

docs_gpt4o_dicts = load_jsonl("docs_gpt4o.jsonl")
docs_gpt4o = [Document.model_validate(d) for d in docs_gpt4o_dicts]


## View Results

Let's compare the GPT-4o and Gemini 2.0 Flash results against the original document page.

Page 3 is shown below as an example.

![xc9500_img](XC9500_CPLD_Family_p3.png)

We see that the parsed text is fairly similar between Gemini 2.0 Flash and GPT-4o. 
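
One rough way to put a number on "fairly similar" is a character-level similarity ratio from Python's standard `difflib` module. This is only a crude proxy, not a parsing-quality metric, and the helper name `text_similarity` and the two short snippets are illustrative assumptions; in the notebook you would pass the two page contents instead.

```python
from difflib import SequenceMatcher


def text_similarity(a: str, b: str) -> float:
    """Return a 0..1 similarity ratio, ignoring case and extra whitespace."""
    norm_a = " ".join(a.lower().split())
    norm_b = " ".join(b.lower().split())
    return SequenceMatcher(None, norm_a, norm_b).ratio()


# Short stand-in snippets; in the notebook you would pass
# docs[2].get_content() and docs_gpt4o[2].get_content() instead.
gemini_text = "JTAG Port: Connects to the JTAG Controller via 3 lines."
gpt4o_text = "JTAG Port: Connects to the JTAG Controller."

score = text_similarity(gemini_text, gpt4o_text)
print(f"similarity: {score:.2f}")
```

A ratio close to 1.0 means the two strings share most of their characters in order. It says nothing about which parse is more faithful to the page, so it complements reading the outputs rather than replacing it.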

In [13]:
# using Gemini 2.0 Flash
print(docs[2].get_content(metadata_mode="all"))

page: 3

Here's a breakdown of the provided document content, including a description of the schematic diagram and relevant text:

**1.  Overall Document Context**

*   The document is a technical datasheet for the Xilinx XC9500 In-System Programmable CPLD (Complex Programmable Logic Device) family.
*   It is marked as "PRODUCT OBSOLETE / UNDER OBSOLESCENCE".

**2.  Schematic Diagram (Figure 1: XC9500 Architecture)**

*   **Description:** The diagram shows a high-level architecture of the XC9500 CPLD. It illustrates the connections between key functional blocks.

*   **Components and Connections:**

    *   **JTAG Port:**  Connects to the JTAG Controller via 3 lines.
    *   **JTAG Controller:**  Bidirectionally connected to the In-System Programming Controller.
    *   **In-System Programming Controller:** Bidirectionally connected to the Fast CONNECT II Switch Matrix.
    *   **I/O Blocks:** Multiple I/O blocks are connected to the Fast CONNECT II Switch Matrix.  There are also dedic

In [14]:
# using GPT-4o
print(docs_gpt4o[2].get_content(metadata_mode="all"))

page: 3

The image is a block diagram of the XC9500 In-System Programmable CPLD Family architecture. Here's a breakdown of the components and their connections:

### Components and Connections:

1. **JTAG Port:**
   - Connects to the JTAG Controller.

2. **JTAG Controller:**
   - Interfaces with the In-System Programming Controller.

3. **In-System Programming Controller:**
   - Connects to the Fast CONNECT Switch Matrix.

4. **I/O Blocks:**
   - Multiple I/O lines connect to the Fast CONNECT Switch Matrix.

5. **Fast CONNECT Switch Matrix:**
   - Connects to multiple Function Blocks (1 to N).
   - Each Function Block has 36 inputs and 18 outputs.

6. **Function Blocks (1 to N):**
   - Each block contains 18 Macrocells.
   - Outputs from the Function Blocks drive the I/O Blocks directly.

7. **I/O/GCK, I/O/GSR, I/O/GTS:**
   - Special I/O lines for global clock, set/reset, and output enable signals.

### Function Block Details:

- Each Function Block consists of 18 independent macrocel

## Setup RAG Pipeline

Let's set up a RAG pipeline over this data.

(We use o3-mini for the actual text synthesis step.)

In [15]:
from llama_index.core import Settings
from llama_index.llms.openai import OpenAI
from llama_index.embeddings.openai import OpenAIEmbedding

Settings.llm = OpenAI(model="o3-mini")
Settings.embed_model = OpenAIEmbedding(model="text-embedding-3-large")

In [16]:
from llama_index.core import VectorStoreIndex

index = VectorStoreIndex(docs)
query_engine = index.as_query_engine(similarity_top_k=5)

index_gpt4o = VectorStoreIndex(docs_gpt4o)
query_engine_gpt4o = index_gpt4o.as_query_engine(similarity_top_k=5)
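
Conceptually, `similarity_top_k=5` asks the retriever for the five nodes whose embeddings score highest against the query embedding. The sketch below illustrates that idea in plain Python, assuming cosine similarity as the scoring function; the vectors are toy values and the function names are hypothetical, so this is not the library's actual implementation.

```python
import math
from typing import List, Tuple


def cosine_similarity(u: List[float], v: List[float]) -> float:
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def top_k(
    query_vec: List[float], doc_vecs: List[List[float]], k: int
) -> List[Tuple[int, float]]:
    """Return (index, score) pairs for the k most similar document vectors."""
    scored = [(i, cosine_similarity(query_vec, vec)) for i, vec in enumerate(doc_vecs)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy 3-dimensional "embeddings"; real embeddings have far more dimensions.
page_vectors = [
    [1.0, 0.0, 0.0],
    [0.9, 0.1, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
]
query_vector = [0.8, 0.2, 0.0]

results = top_k(query_vector, page_vectors, k=2)
print([index for index, _ in results])
```

A larger `k` gives the synthesis model more context to work with, at the price of more tokens and a higher chance of pulling in loosely related pages.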

In [17]:
query = "Give me the full output slew-Rate curve for (a) Rising and (b) Falling Outputs"

response = query_engine.query(query)
response_gpt4o = query_engine_gpt4o.query(query)

In [18]:
print(response)

For rising outputs the transition begins at a low voltage and then ramps upward until it settles at the high level. In a standard (unlimited) drive the transition is quite steep; in contrast, when slew‐rate control is used an extra delay is introduced (denoted by a TSLEW delay) so that the voltage increases more gradually. The curve in this case shows a controlled, slower rise with a gentler slope—starting from the low level, moving upward (often noted by an intermediate level such as around 1.5V on the graph), and finally reaching the full high voltage.

For falling outputs the behavior is analogous but with the voltage decreasing. Without slew rate limitation the fall is abrupt, whereas with slew‐rate control the output voltage descends gradually. The controlled fall begins at the high voltage level, then follows a smoothed, slower descent (again incorporating the TSLEW delay), and ultimately stabilizes at the low voltage.

Thus, the full slew‐rate curves depict two cases for each tr

In [19]:
print(response.source_nodes[0].get_content())

Here is the information parsed from the document:

**Figure 11: Output slew-Rate for (a) Rising and (b) Falling Outputs**

| Parameter | Description                               |
| :-------- | :---------------------------------------- |
| TSLEW     | Additional time delay for slew rate control |
| 1.5V      | Voltage level shown on the graph          |

**Description of the Graph:**

Figure 11 shows the output slew rate for rising and falling outputs. The graph shows the output voltage over time for both standard and slew-rate limited outputs.

**Figure 12: XC9500 Devices in (a) 5V Systems and (b) Mixed 5V/3.3V Systems**

**(a) 5V Systems**

*   **XC9500 CPLD:**
    *   VCCINT connected to 5V
    *   VCCIO connected to 5V
    *   GND connected to ground
    *   IN connected to:
        *   5V CMOS or 5V with a switch
        *   5V TTL or 3.6V with a switch
        *   3.3V or 3.3V with a switch
    *   OUT connected to 5V TTL or -4V with a switch

**(b) Mixed 5V/3.3V Systems**

*   

In [None]:
print(response_gpt4o)

The output slew-rate curve for (a) Rising and (b) Falling Outputs is represented in a timing diagram where the output voltage transitions from a low state to a high state and vice versa. 

For the rising output, the curve starts at 1.5V and transitions to the desired output voltage level over a time period defined as T<sub>SLEW</sub>. 

For the falling output, the curve similarly begins at the high output voltage and decreases to a low state, also taking the time defined as T<sub>SLEW</sub> to complete the transition.

The specific values and graphical representation would typically be illustrated in a figure, but the key takeaway is that the output slew rate can be controlled to manage system noise by programming the desired T<sub>SLEW</sub> time.


In [None]:
print(response_gpt4o.source_nodes[0].get_content())

# XC9500 In-System Programmable CPLD Family

Each output has independent slew rate control. Output edge rates may be slowed down to reduce system noise (with an additional time delay of T<sub>SLEW</sub>) through programming. See Figure 11.

Each IOB provides user programmable ground pin capability. This allows device I/O pins to be configured as additional ground pins. By tying strategically located programmable ground pins to the external ground connection, system noise generated from large numbers of simultaneous switching outputs may be reduced.

A control pull-up resistor (typically 10K ohms) is attached to each device I/O pin to prevent them from floating when the device is not in normal user operation. This resistor is active during device programming mode and system power-up. It is also activated for an erased device. The resistor is deactivated during normal operation.

The output driver is capable of supplying 24 mA output drive. All output drivers in the device may be configu
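
The cells above print only the top-ranked source node for each engine. To compare everything that was retrieved, a small helper can list the page, score, and a short preview per node. The sketch below assumes each retrieved node exposes `metadata`, `score`, and `get_content()`, as the nodes in this notebook appear to. The helper name `summarize_sources` and the stand-in objects are hypothetical.

```python
from types import SimpleNamespace


def summarize_sources(source_nodes, preview_chars=60):
    """Return one (page, score, preview) tuple per retrieved node."""
    rows = []
    for node in source_nodes:
        page = node.metadata.get("page")
        # Collapse whitespace so the preview fits on one line.
        preview = " ".join(node.get_content().split())[:preview_chars]
        rows.append((page, node.score, preview))
    return rows


# Stand-in objects that mimic the attributes used above; in the notebook,
# pass response.source_nodes or response_gpt4o.source_nodes instead.
fake_nodes = [
    SimpleNamespace(
        metadata={"page": 1}, score=0.9, get_content=lambda: "alpha\n\nbeta"
    ),
    SimpleNamespace(
        metadata={"page": 2}, score=0.5, get_content=lambda: "gamma   delta"
    ),
]

rows = summarize_sources(fake_nodes)
for page, score, preview in rows:
    print(f"page {page} (score {score:.2f}): {preview}")
```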