## Structured Table Example (Hospital Data)

Extract structured data from documents that contain repeating entities, such as tables, lists, or catalogs.

---
Config:
* ``Schema alignment``:
    * extraction_target = PER_TABLE_ROW, because:
        - **Clear structure**: the document has explicit table formatting with rows and columns.
        - **Repeating entities**: each row represents one hospital with consistent attributes.
        - **Local information**: all data for each hospital (county, name, plans) is contained within its row.
* ``Model settings``:
    * extraction_mode = PREMIUM (suitable for complex tables and information-dense documents)
    * parse_model = "anthropic-sonnet-4.5"

* ``Advanced options``:
    * chunk_mode = PAGE (not set explicitly in the ``ExtractConfig`` calls below)
---


### Choosing the Right Extraction Target
| Extraction Target | Procedure/Result |
|-------------------|:-----------------|
| PER_DOC (document-level extraction) | * ``Operation mode``: applies the schema once, using the entire document as context. <br> * ``Issue(s)``: when extracting lists of entities, **LLM-based extraction has a critical failure mode**: it often extracts only the first few tens of entries from a long list, because LLMs struggle to stay exhaustive over long, repetitive outputs. <br> &rarr; ``Result``: complete extraction of long lists is not guaranteed. |
| PER_TABLE_ROW | * ``Operation mode``: you define a schema for a single entity (e.g., one hospital, one product, one invoice line item), not for the full document. LlamaExtract then automatically:<br> * detects the formatting patterns that distinguish individual entities (table rows, list items, section headers, etc.);<br> * applies your schema to each identified entity;<br> * returns a **list[YourSchema]** with one object per entity.<br> * ``Fix``: processing each entity individually or in small batches avoids incomplete list extractions.<br> &rarr; ``Result``: designed to extract every entry, regardless of list length. |
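To make the per-entity contract in the table concrete, here is a toy sketch in plain Python with Pydantic. It is not how LlamaExtract works internally: the hypothetical ``extract_per_row`` assumes each row is already isolated as a pipe-delimited string and parses it deterministically, whereas the service detects rows and fills fields with an LLM. What it does illustrate is the shape of the result: the schema is applied once per row, and the output is a list with one object per row.

```python
from pydantic import BaseModel, Field


class HospitalRow(BaseModel):
    """Hypothetical per-entity schema (mirrors the Hospital schema used in this notebook)."""

    county: str
    hospital_name: str
    plan_names: list[str] = Field(default_factory=list)


def extract_per_row(lines: list[str]) -> list[HospitalRow]:
    """Toy stand-in for PER_TABLE_ROW: apply the schema to each row independently.

    Assumes each row is already isolated as 'county | name | plan, plan, ...'.
    Row detection and LLM-based field filling are NOT reproduced here.
    """
    results: list[HospitalRow] = []
    for line in lines:
        county, name, plans = (cell.strip() for cell in line.split("|"))
        results.append(
            HospitalRow(
                county=county,
                hospital_name=name,
                plan_names=[p.strip() for p in plans.split(",") if p.strip()],
            )
        )
    return results


# Two rows taken from the PER_TABLE_ROW output later in this notebook.
rows = [
    "Alameda | Alameda Hospital | Trio HMO, SaveNet, Access+ HMO, BlueHPN PPO, Tandem PPO, PPO",
    "Alameda | Alta Bates Med Ctr Herrick Campus | Trio HMO, Access+ HMO, BlueHPN PPO, Tandem PPO, PPO",
]
hospitals = extract_per_row(rows)  # one HospitalRow per input row
print(len(hospitals), hospitals[1].plan_names)
```

Because each row is handled on its own, the length of the table never affects how much of any single row is extracted; that independence is the property PER_TABLE_ROW relies on.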

### Provide API keys manually

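```python
import os
from getpass import getpass

# Prompt for each key only if it is not already set in the environment.
if "LLAMA_CLOUD_API_KEY" not in os.environ:
    os.environ["LLAMA_CLOUD_API_KEY"] = getpass("Enter your Llama Cloud API Key: ")

# Optional: the extraction calls in this notebook do not use an OpenAI key.
if "OPENAI_API_KEY" not in os.environ:
    os.environ["OPENAI_API_KEY"] = getpass("Enter your OpenAI API Key: ")
```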

### Create an extractor instance

```python
# Patch asyncio so nested event loops work inside Jupyter, whose loop is already running.
import nest_asyncio
nest_asyncio.apply()
```

```python
from llama_cloud_services import (
    LlamaExtract,
    EU_BASE_URL,
)

# Use the EU endpoint; omit base_url to use the default endpoint.
# Optionally pass your project ID; otherwise the 'Default' project is used.
llama_extract = LlamaExtract(base_url=EU_BASE_URL)

print(EU_BASE_URL)
```

```text
https://api.cloud.eu.llamaindex.ai
```


### Define the data schema

```python
from pydantic import BaseModel, Field


# Per-entity schema: with PER_TABLE_ROW it describes one table row, not the whole document.
class Hospital(BaseModel):
    """One hospital row: its county and the BSC plans available there."""

    county: str = Field(description="County name")
    hospital_name: str = Field(description="Name of the hospital")
    plan_names: list[str] = Field(
        description="List of plans available at the hospital. Each plan is one of: Trio HMO, SaveNet, Access+ HMO, BlueHPN PPO, Tandem PPO, PPO"
    )
```
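The field description above only *asks* the model to pick from six plan names; nothing rejects an unexpected value. A minimal sketch of enforcing that list locally, assuming Pydantic v2: the hypothetical ``StrictHospital`` swaps ``list[str]`` for a ``list`` of a ``Literal`` type and validates a record copied from the PER_TABLE_ROW output later in this notebook. We have not checked how LlamaExtract itself treats the resulting ``enum`` constraint, so treat this as a post-extraction check rather than a change to the extraction schema.

```python
from typing import Literal

from pydantic import BaseModel, Field, ValidationError

# Hypothetical: the six plan names listed in the original field description.
PlanName = Literal["Trio HMO", "SaveNet", "Access+ HMO", "BlueHPN PPO", "Tandem PPO", "PPO"]


class StrictHospital(BaseModel):
    county: str = Field(description="County name")
    hospital_name: str = Field(description="Name of the hospital")
    plan_names: list[PlanName] = Field(description="Plans available at the hospital")


# One record copied from the PER_TABLE_ROW output.
row = {
    "county": "Alameda",
    "hospital_name": "Alameda Hospital",
    "plan_names": ["Trio HMO", "SaveNet", "Access+ HMO", "BlueHPN PPO", "Tandem PPO", "PPO"],
}
valid = StrictHospital.model_validate(row)

# An unknown plan name is rejected instead of silently accepted.
try:
    StrictHospital.model_validate({**row, "plan_names": ["Trio HMO", "Gold PPO"]})
    rejected = False
except ValidationError:
    rejected = True

# The Literal shows up as an "enum" in the generated JSON Schema.
schema = StrictHospital.model_json_schema()
print(rejected, schema["properties"]["plan_names"]["items"])
```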

---
### Testing - With ``extraction_target``: 'PER_TABLE_ROW'
---

```python
from llama_cloud_services.extract import ExtractConfig, ExtractMode, ExtractTarget

# PER_TABLE_ROW: the Hospital schema is applied to every detected table row,
# so the result is a list with one Hospital dict per row.
result_per_tab_row = await llama_extract.aextract(
    data_schema=Hospital,
    files="/home/daghbeji/ragragi/genAI_3D_CAD/llamaindex/data/tabs/BSC-Hospital-List-by-County.pdf",
    config=ExtractConfig(
        extraction_mode=ExtractMode.PREMIUM,
        extraction_target=ExtractTarget.PER_TABLE_ROW,
        parse_model="anthropic-sonnet-4.5",
    ),
)
```

```python
len(result_per_tab_row.data)
```

```text
380
```

```python
result_per_tab_row.data[:10]
```

```text
[{'county': 'Alameda',
  'hospital_name': 'Alameda Hospital',
  'plan_names': ['Trio HMO',
   'SaveNet',
   'Access+ HMO',
   'BlueHPN PPO',
   'Tandem PPO',
   'PPO']},
 {'county': 'Alameda',
  'hospital_name': 'Alta Bates Med Ctr Herrick Campus',
  'plan_names': ['Trio HMO',
   'Access+ HMO',
   'BlueHPN PPO',
   'Tandem PPO',
   'PPO']},
 {'county': 'Alameda',
  'hospital_name': 'Alta Bates Summit Med Ctr Alta Bates Campus',
  'plan_names': ['Trio HMO',
   'Access+ HMO',
   'BlueHPN PPO',
   'Tandem PPO',
   'PPO']},
 {'county': 'Alameda',
  'hospital_name': 'Alta Bates Summit Med Ctr Summit Campus',
  'plan_names': ['Trio HMO',
   'Access+ HMO',
   'BlueHPN PPO',
   'Tandem PPO',
   'PPO']},
 {'county': 'Alameda',
  'hospital_name': 'Alta Bates Summit Medical Center',
  'plan_names': ['Trio HMO',
   'Access+ HMO',
   'BlueHPN PPO',
   'Tandem PPO',
   'PPO']},
 {'county': 'Alameda',
  'hospital_name': 'BHC Fremont Hospital',
  'plan_names': ['Trio HMO',
   'SaveNet',
   'Access+ HM
```
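Since PER_TABLE_ROW returns a flat list of dicts, ordinary Python is enough to query it. A minimal sketch using only the standard library: the three records are copied from the output above (in the notebook you would pass ``result_per_tab_row.data`` instead), and ``hospitals_accepting`` is a hypothetical helper name.

```python
from collections import Counter

# First three records from the PER_TABLE_ROW output above.
records = [
    {"county": "Alameda", "hospital_name": "Alameda Hospital",
     "plan_names": ["Trio HMO", "SaveNet", "Access+ HMO", "BlueHPN PPO", "Tandem PPO", "PPO"]},
    {"county": "Alameda", "hospital_name": "Alta Bates Med Ctr Herrick Campus",
     "plan_names": ["Trio HMO", "Access+ HMO", "BlueHPN PPO", "Tandem PPO", "PPO"]},
    {"county": "Alameda", "hospital_name": "Alta Bates Summit Med Ctr Alta Bates Campus",
     "plan_names": ["Trio HMO", "Access+ HMO", "BlueHPN PPO", "Tandem PPO", "PPO"]},
]


def hospitals_accepting(rows: list[dict], plan: str) -> list[str]:
    """Names of the hospitals whose plan list includes `plan`."""
    return [r["hospital_name"] for r in rows if plan in r["plan_names"]]


county_counts = Counter(r["county"] for r in records)              # hospitals per county
plan_counts = Counter(p for r in records for p in r["plan_names"])  # hospitals per plan
savenet_hospitals = hospitals_accepting(records, "SaveNet")
print(county_counts, plan_counts.most_common(2), savenet_hospitals)
```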

---
### Testing - With ``extraction_target``: 'PER_DOC'
---

```python
from llama_cloud_services.extract import ExtractConfig, ExtractMode, ExtractTarget

# PER_DOC: the Hospital schema is applied once to the whole document,
# so the result is a single Hospital dict rather than a list.
result_per_doc = await llama_extract.aextract(
    data_schema=Hospital,
    files="/home/daghbeji/ragragi/genAI_3D_CAD/llamaindex/data/tabs/BSC-Hospital-List-by-County.pdf",
    config=ExtractConfig(
        extraction_mode=ExtractMode.PREMIUM,
        extraction_target=ExtractTarget.PER_DOC,
        parse_model="anthropic-sonnet-4.5",
    ),
)
```

```python
# result_per_doc.data is a single dict, so len() counts its keys, not hospitals.
len(result_per_doc.data)
```

```text
3
```

```python
result_per_doc.data
```

```text
{'county': 'Los Angeles',
 'hospital_name': 'Ronald Reagan UCLA Med Ctr',
 'plan_names': ['Trio HMO',
  'SaveNet',
  'Access+ HMO',
  'BlueHPN PPO',
  'Tandem PPO',
  'PPO']}
```
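With PER_DOC, the ``Hospital`` schema describes the whole document, so the output above contains exactly one hospital. A common way to request a list at document level is to wrap the per-entity model in a container with a list field. This is a sketch under assumptions: ``HospitalList`` is a hypothetical name, ``Hospital`` is redefined only to keep the block self-contained, and we have not re-run the extraction with it. In the notebook you would pass ``data_schema=HospitalList`` together with ``ExtractTarget.PER_DOC``; per the comparison table, a long list may still come back incomplete.

```python
from pydantic import BaseModel, Field


class Hospital(BaseModel):
    """One hospital row (redefined here to keep the block self-contained)."""

    county: str = Field(description="County name")
    hospital_name: str = Field(description="Name of the hospital")
    plan_names: list[str] = Field(description="Plans available at the hospital")


# Hypothetical container: at document level, a list needs an explicit list field.
class HospitalList(BaseModel):
    hospitals: list[Hospital] = Field(description="Every hospital listed in the document")


# Shape a PER_DOC result would take under this schema (records copied from outputs in this notebook).
doc_result = {
    "hospitals": [
        {"county": "Los Angeles", "hospital_name": "Ronald Reagan UCLA Med Ctr",
         "plan_names": ["Trio HMO", "SaveNet", "Access+ HMO", "BlueHPN PPO", "Tandem PPO", "PPO"]},
        {"county": "Kings", "hospital_name": "Adventist Medical Center",
         "plan_names": ["Trio HMO", "SaveNet", "Access+ HMO", "Tandem PPO", "PPO"]},
    ]
}
parsed = HospitalList.model_validate(doc_result)
print(len(parsed.hospitals), parsed.hospitals[1].county)
```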

---
### Testing - With ``extraction_target``: 'PER_PAGE'
---

```python
from llama_cloud_services.extract import ExtractConfig, ExtractMode, ExtractTarget

# PER_PAGE: the Hospital schema is applied once per page,
# so the result is a list of Hospital dicts.
result_per_page = await llama_extract.aextract(
    data_schema=Hospital,
    files="/home/daghbeji/ragragi/genAI_3D_CAD/llamaindex/data/tabs/BSC-Hospital-List-by-County.pdf",
    config=ExtractConfig(
        extraction_mode=ExtractMode.PREMIUM,
        extraction_target=ExtractTarget.PER_PAGE,
        parse_model="anthropic-sonnet-4.5",
    ),
)
```

```python
len(result_per_page.data)
```

```text
12
```

```python
result_per_page.data
```

```text
[{'county': 'Alameda',
  'hospital_name': 'Alameda Hospital',
  'plan_names': ['Trio HMO',
   'SaveNet',
   'Access+ HMO',
   'BlueHPN PPO',
   'Tandem PPO',
   'PPO']},
 {'county': 'El Dorado',
  'hospital_name': 'Marshall Medical Center',
  'plan_names': ['Trio HMO',
   'SaveNet',
   'Access+ HMO',
   'BlueHPN PPO',
   'Tandem PPO',
   'PPO']},
 {'county': 'Kings',
  'hospital_name': 'Adventist Medical Center',
  'plan_names': ['Trio HMO', 'SaveNet', 'Access+ HMO', 'Tandem PPO', 'PPO']},
 {'county': 'Los Angeles',
  'hospital_name': 'Henry Mayo Newhall Hospital',
  'plan_names': ['Trio HMO',
   'SaveNet',
   'Access+ HMO',
   'BlueHPN PPO',
   'Tandem PPO',
   'PPO']},
 {'county': 'Los Angeles',
  'hospital_name': 'Providence Little Co Of Mary Med Ctr Torrance',
  'plan_names': ['Trio HMO',
   'SaveNet',
   'Access+ HMO',
   'BlueHPN PPO',
   'Tandem PPO',
   'PPO']},
 {'county': 'Modoc',
  'hospital_name': 'Modoc Medical Center',
  'plan_names': ['Trio HMO', 'SaveNet', 'Access+ HMO'
```
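Counts alone do not show *which* hospitals a target missed. A minimal sketch of a coverage check, assuming each hospital is uniquely identified by its ``(county, hospital_name)`` pair and treating the PER_TABLE_ROW result as the reference (an assumption: it is the most complete run here, not verified ground truth). ``entity_keys`` and ``coverage`` are hypothetical helpers; the tiny samples below are copied from the outputs above purely for illustration, so their ratio says nothing about the real runs. In the notebook you would call ``coverage(result_per_tab_row.data, result_per_page.data)``.

```python
def entity_keys(rows: list[dict]) -> set[tuple[str, str]]:
    """Identify each hospital by (county, hospital_name); assumes these pairs are unique."""
    return {(r["county"], r["hospital_name"]) for r in rows}


def coverage(reference_rows: list[dict], candidate_rows: list[dict]) -> float:
    """Fraction of reference hospitals that the candidate extraction also found."""
    ref = entity_keys(reference_rows)
    if not ref:
        return 1.0
    return len(ref & entity_keys(candidate_rows)) / len(ref)


# Small illustrative samples (names copied from the PER_TABLE_ROW and PER_PAGE outputs).
reference = [
    {"county": "Alameda", "hospital_name": "Alameda Hospital"},
    {"county": "Alameda", "hospital_name": "Alta Bates Med Ctr Herrick Campus"},
]
candidate = [
    {"county": "Alameda", "hospital_name": "Alameda Hospital"},
    {"county": "El Dorado", "hospital_name": "Marshall Medical Center"},
]
ratio = coverage(reference, candidate)
missing = sorted(entity_keys(reference) - entity_keys(candidate))
print(ratio, missing)
```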

## Result

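- ``PER_TABLE_ROW``: all 380 hospitals were extracted.
- ``PER_DOC``: only 1 hospital was extracted. The result is a single ``Hospital`` dict, so ``len(result_per_doc.data)`` returned 3, the number of its keys (``county``, ``hospital_name``, ``plan_names``), not the number of hospitals.
- ``PER_PAGE``: only 12 hospitals were extracted.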
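Because the three targets return different shapes (a single dict for PER_DOC and a list for the other two, judging by the outputs above), calling ``len()`` directly can mislead: on a dict it counts keys. A minimal sketch of a hypothetical ``count_entities`` helper that normalizes the count, under that assumption about result shapes:

```python
from typing import Any


def count_entities(data: Any) -> int:
    """Count extracted entities regardless of the result's shape.

    Assumption based on the outputs in this notebook: PER_DOC returns a single
    dict (one schema instance), while PER_TABLE_ROW and PER_PAGE return lists.
    """
    if data is None:
        return 0
    if isinstance(data, dict):
        return 1  # one schema instance; len() would count its keys instead
    if isinstance(data, list):
        return len(data)
    raise TypeError(f"Unexpected result type: {type(data).__name__}")


# The PER_DOC result shown above.
per_doc = {
    "county": "Los Angeles",
    "hospital_name": "Ronald Reagan UCLA Med Ctr",
    "plan_names": ["Trio HMO", "SaveNet", "Access+ HMO", "BlueHPN PPO", "Tandem PPO", "PPO"],
}
print(len(per_doc), count_entities(per_doc))  # prints: 3 1
```

In the notebook, ``count_entities(result_per_doc.data)``, ``count_entities(result_per_page.data)`` and ``count_entities(result_per_tab_row.data)`` then give directly comparable numbers.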