## Demo Tab Extractor Example 1

Extract structured data from documents containing repeating entities like tables, lists, or catalogs.

---
Config:
* ``Schema alignment``:
    * extraction_target=PER_TABLE_ROW
        - **Clear structure**: The document has explicit table formatting with rows and columns
        - **Repeating entities**: Each row represents one hospital with consistent attributes
        - **Local information**: All data for each hospital (county, name, plans) is contained within its row
* ``Model settings``:
    * extraction_mode=PREMIUM (suitable for complex tables and information-dense documents)
    * parse_model="anthropic-sonnet-4.5"
* ``Advanced options``:
    * chunk_mode=PAGE
---


### Choosing the Right Extraction Target
| Extraction Target | Procedure/Result |
|-------------------------------------|:---------------------------------------|
| PER_DOC (Document-level extraction) | * ``Operation mode``: Looks at the entire document's context. <br> * ``Issue(s)``: When extracting lists of entities, **LLM-based extraction has a critical failure mode**: it often extracts only the first few tens of entries from a long list, because LLMs have **limited attention spans for repetitive data**. <br> &rarr; ``Result``: Complete extraction of long lists is not guaranteed. |
| PER_TABLE_ROW | * ``Operation mode``: You define a schema for a single entity (e.g., one hospital, one product, one invoice line item), not for the full document. LlamaExtract then automatically:<br> * Detects the formatting patterns that distinguish individual entities (table rows, list items, section headers, etc.)<br> * Applies your schema to each identified entity<br> * Returns a **list[YourSchema]** with one object per entity<br> * ``Fixes``: Solves incomplete list extraction by processing each entity individually or in smaller batches.<br> &rarr; ``Result``: Exhaustive extraction of all entries, regardless of list length. |
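
The difference between the two targets can be illustrated with a toy sketch. This is not how LlamaExtract is implemented: `extract_batch` is a hypothetical stand-in for a single LLM call that reliably handles only a limited number of rows, and the cap of 25 is an arbitrary illustrative figure. Processing rows in small batches keeps every call under that cap, so no entries are dropped.

```python
from typing import Iterator

ATTENTION_LIMIT = 25  # assumed, illustrative cap on rows one call handles reliably


def extract_batch(rows: list[str]) -> list[dict]:
    """Hypothetical stand-in for one LLM call: silently drops rows past the cap."""
    return [{"raw": row} for row in rows[:ATTENTION_LIMIT]]


def batched(rows: list[str], size: int) -> Iterator[list[str]]:
    """Yield consecutive slices of at most `size` rows."""
    for start in range(0, len(rows), size):
        yield rows[start:start + size]


def extract_per_doc(rows: list[str]) -> list[dict]:
    """One call over the whole document: long lists get truncated."""
    return extract_batch(rows)


def extract_per_row(rows: list[str], batch_size: int = 10) -> list[dict]:
    """Many small calls: every row falls inside some batch."""
    results: list[dict] = []
    for batch in batched(rows, batch_size):
        results.extend(extract_batch(batch))
    return results


rows = [f"hospital {i}" for i in range(380)]
print(len(extract_per_doc(rows)))  # 25
print(len(extract_per_row(rows)))  # 380
```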

### Provide API keys manually

In [1]:
import os
from getpass import getpass

# Prompt for each key only if it is not already set in the environment
if "LLAMA_CLOUD_API_KEY" not in os.environ:
    os.environ["LLAMA_CLOUD_API_KEY"] = getpass("Enter your Llama Cloud API Key: ")
if "OPENAI_KEY" not in os.environ:
    os.environ["OPENAI_KEY"] = getpass("Enter your OpenAI API Key: ")

### Create instance of extractor

In [16]:
import nest_asyncio

nest_asyncio.apply()

In [17]:
from llama_cloud_services import LlamaExtract

# Optionally pass your project ID; otherwise the 'Default' project is used
llama_extract = LlamaExtract()

### Define the data schema

In [18]:
from pydantic import BaseModel, Field


class Hospital(BaseModel):
    """List of hospitals by county available for different BSC plans"""

    county: str = Field(description="County name")
    hospital_name: str = Field(description="Name of the hospital")
    plan_names: list[str] = Field(
        description="List of plans available at the hospital. One of: Trio HMO, SaveNet, Access+ HMO, BlueHPN PPO, Tandem PPO, PPO"
    )
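
The `plan_names` description lists the permitted values, but the `list[str]` type does not enforce them. A minimal sketch of a post-extraction check follows, assuming each extracted row is available as a plain dict keyed by the schema's field names. The helper `unknown_plans` and the sample rows are hypothetical.

```python
# Allowed values, copied from the plan_names field description
ALLOWED_PLANS = {
    "Trio HMO", "SaveNet", "Access+ HMO", "BlueHPN PPO", "Tandem PPO", "PPO",
}


def unknown_plans(row: dict) -> list[str]:
    """Return the plan names in `row` that are not among the allowed values."""
    return [plan for plan in row.get("plan_names", []) if plan not in ALLOWED_PLANS]


# Hypothetical rows shaped like the Hospital schema
good = {"county": "Example County", "hospital_name": "Example Hospital",
        "plan_names": ["PPO", "SaveNet"]}
bad = {"county": "Example County", "hospital_name": "Example Hospital",
       "plan_names": ["PPO", "Gold Plus"]}

print(unknown_plans(good))  # []
print(unknown_plans(bad))   # ['Gold Plus']
```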

---
### Testing
---

In [19]:
from llama_cloud_services.extract import ExtractConfig, ExtractMode, ExtractTarget

# aextract is the async variant: several extractions can run in parallel instead of each job waiting for the others to finish
result = await llama_extract.aextract(
    data_schema=Hospital,
    # files="/home/daghbeji/ragragi/genAI_3D_CAD/llamaindex/data/resumes/BSC-Hospital-List-by-County.pdf",
    files="/home/daghbeji/ragragi/genAI_3D_CAD/llamaindex/data/resumes/BSC-Hospital-short.pdf",
    config=ExtractConfig(
        extraction_mode=ExtractMode.PREMIUM,
        extraction_target=ExtractTarget.PER_TABLE_ROW,
        parse_model="anthropic-sonnet-4.5",
    ),
)

AttributeError: 'LlamaExtract' object has no attribute 'aextract'
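
The `AttributeError` above indicates that the installed `llama_cloud_services` version does not expose `aextract` directly on the `LlamaExtract` client. Upgrading the package may be the simplest fix. Alternatively, one possible workaround is to fall back to an extraction agent when the method is missing, as sketched below. The helper `run_extraction` and the agent name are hypothetical, and the `create_agent(name=..., data_schema=..., config=...)` and `agent.aextract(files)` calls are assumptions about the installed version that should be checked against its documentation.

```python
async def run_extraction(client, data_schema, files, config, agent_name="hospital-rows"):
    """Use client.aextract when it exists, otherwise go through an agent."""
    if hasattr(client, "aextract"):
        return await client.aextract(data_schema=data_schema, files=files, config=config)
    # Assumed fallback API: create_agent(...) returns an object with aextract(files)
    agent = client.create_agent(name=agent_name, data_schema=data_schema, config=config)
    return await agent.aextract(files)


# In the notebook, where top-level await is available:
# result = await run_extraction(llama_extract, Hospital, pdf_path, extract_config)
```

Here `pdf_path` and `extract_config` stand for the file path and the `ExtractConfig` used in the cell above.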

### Result
When the extraction call succeeds, all 380 hospitals are extracted from the multi-page PDF, each parsed with its county, hospital name, and applicable insurance plans. With PER_DOC, the run would likely have returned only the first 20-30 entries.
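
A completeness claim like this can be checked by comparing the number of extracted rows with the expected count and grouping rows by county. The sketch below assumes the extracted rows are available as a list of plain dicts keyed by the schema's field names. The helper `summarize` and the sample rows are hypothetical.

```python
from collections import Counter


def summarize(rows: list[dict], expected_total: int) -> dict:
    """Count rows per county and compare the total with the expected count."""
    per_county = Counter(row["county"] for row in rows)
    return {
        "total": len(rows),
        "complete": len(rows) == expected_total,
        "per_county": dict(per_county),
    }


# Hypothetical rows for illustration only
sample_rows = [
    {"county": "County A", "hospital_name": "Hospital 1", "plan_names": ["PPO"]},
    {"county": "County A", "hospital_name": "Hospital 2", "plan_names": ["SaveNet"]},
    {"county": "County B", "hospital_name": "Hospital 3", "plan_names": ["PPO"]},
]
print(summarize(sample_rows, expected_total=3))
```

For the full hospital list, `expected_total` would be 380. A total below that suggests the extraction stopped early.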