## Unstructured Table Example (Toy Catalog)

Extract structured data from documents that contain repeating entities, such as table rows, list items, or catalog entries.

---
Config:
* ``Schema alignment``:
    * extraction_target = PER_TABLE_ROW
        - **Clear structure**: The document has explicit table formatting with rows and columns
        - **Repeating entities**: Each row represents one toy with consistent attributes
        - **Local information**: All data for each toy (product code, name, age range, etc.) is contained within its row
* ``Model settings``:
    * extraction_mode = PREMIUM (suitable for complex tables and information-dense documents)
    * parse_model = "anthropic-sonnet-4.5"

* ``Advanced options``:
    * chunk_mode = PAGE
---


### Choosing the Right Extraction Target
| Extraction Target | Procedure/Result |
|-------------------|:-----------------|
| PER_DOC (document-level extraction) | * ``Operation mode``: Looks at the entire document's context. <br> * ``Issue(s)``: When extracting lists of entities, **LLM-based extraction has a critical failure mode**: it often extracts only the first few dozen entries of a long list, because LLMs have **limited attention spans for repetitive data**. <br> &rarr; ``Result``: Complete extraction of long lists is not guaranteed. |
| PER_TABLE_ROW | * ``Operation mode``: You define a schema for a single entity (e.g., one hospital, one product, one invoice line item), not for the full document. <br> LlamaExtract automatically:<br> * Detects the formatting patterns that distinguish individual entities (table rows, list items, section headers, etc.)<br> * Applies your schema to each identified entity<br> * Returns a **list[YourSchema]** with one object per entity<br> * ``Fixes``: Solves incomplete list extraction by processing each entity individually or in smaller batches.<br> &rarr; ``Result``: Exhaustive extraction of all entries, regardless of list length. |

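The "smaller batches" idea from the table can be sketched in plain Python. This only illustrates why batching keeps the tail of a long list from being dropped; it is not LlamaExtract's internal implementation, and the `batched` helper and the `rows` stand-in data are hypothetical.

```python
from typing import Iterator, List, Sequence, TypeVar

T = TypeVar("T")


def batched(items: Sequence[T], batch_size: int) -> Iterator[List[T]]:
    """Yield consecutive slices of `items`, each holding at most `batch_size` elements."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield list(items[start:start + batch_size])


# Hypothetical stand-in for the rows detected in a long table.
rows = [f"row-{i}" for i in range(159)]
batches = list(batched(rows, batch_size=20))

# Every row lands in exactly one batch, so nothing is lost from the end of the list.
assert sum(len(batch) for batch in batches) == len(rows)
print(len(batches), len(batches[-1]))
```

Each batch is short enough to be processed on its own, and the union of all batches covers the full list.
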
### Provide API keys manually

In [1]:
import os
from getpass import getpass

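# Prompt for each key only if it is not already set in the environment.
if "LLAMA_CLOUD_API_KEY" not in os.environ:
    os.environ["LLAMA_CLOUD_API_KEY"] = getpass("Enter your Llama Cloud API Key: ")
if "OPENAI_KEY" not in os.environ:
    os.environ["OPENAI_KEY"] = getpass("Enter your OpenAI API Key: ")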

### Create an instance of the extractor

In [2]:
import nest_asyncio
nest_asyncio.apply()

In [3]:
from llama_cloud_services import (
    LlamaExtract,
    EU_BASE_URL,
)

# Optionally pass your project ID; if omitted, the 'Default' project is used
llama_extract = LlamaExtract(base_url=EU_BASE_URL)

print(EU_BASE_URL)

https://api.cloud.eu.llamaindex.ai


### Define the data schema

In [4]:
from pydantic import BaseModel, Field


class ToyCatalog(BaseModel):
    """Product information from a toy catalog."""

    section_name: str = Field(
        description="The name of the toy section (e.g. Table Toys, Active Toys)."
    )
    product_code: str = Field(
        description="The unique product code for the toy (e.g., GA457)."
    )
    toy_name: str = Field(description="The name of the toy.")
    age_range: str = Field(
        description="The recommended age range for the toy (e.g., 6 +, 4 +).",
    )
    player_range: str = Field(
        description="The number of players the toy is designed for (e.g., 2, 2-4, 1-6).",
    )
    material: str = Field(
        description="The primary material(s) the toy is made of (e.g., wood, cardboard).",
    )
    description: str = Field(
        description="A brief description of the toy and its components and dimensions.",
    )

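Because `age_range` and `player_range` are declared as free-form strings, the extracted values may vary in spacing (the field description shows `6 +`, while the outputs below show `6+`). A minimal post-processing sketch, assuming you want numeric values for filtering or sorting; the helper names `parse_min_age` and `parse_player_range` are hypothetical and not part of LlamaExtract:

```python
import re
from typing import Optional, Tuple


def parse_min_age(age_range: str) -> Optional[int]:
    """Return the minimum age from strings like '6+' or '6 +', or None if no number is present."""
    match = re.search(r"\d+", age_range)
    return int(match.group()) if match else None


def parse_player_range(player_range: str) -> Optional[Tuple[int, int]]:
    """Return (min_players, max_players) from strings like '2' or '2-4', or None if empty."""
    numbers = [int(n) for n in re.findall(r"\d+", player_range)]
    if not numbers:
        return None
    # A single number such as '2' means the minimum and maximum are the same.
    return numbers[0], numbers[-1]


print(parse_min_age("6 +"))
print(parse_player_range("2-4"))
print(parse_player_range("2"))
```
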
---
### Testing - With ``extraction_target``: 'PER_TABLE_ROW'
---

In [5]:
from llama_cloud_services.extract import ExtractConfig, ExtractMode, ExtractTarget

result_per_tab_row = await llama_extract.aextract(
    data_schema=ToyCatalog,
    files="/home/daghbeji/ragragi/genAI_3D_CAD/llamaindex/data/tabs/Click-BS-Toys-Catalogue-2024.pdf",
    config=ExtractConfig(
        extraction_mode=ExtractMode.PREMIUM,
        extraction_target=ExtractTarget.PER_TABLE_ROW,
        parse_model="anthropic-sonnet-4.5",
    ),
)

In [16]:
result_per_tab_row.config.chunk_mode

<DocumentChunkMode.PAGE: 'PAGE'>

In [7]:
len(result_per_tab_row.data)

159

In [10]:
result_per_tab_row.data

[{'section_name': 'Table Toys',
  'product_code': 'GA457',
  'toy_name': 'Dots and Boxes',
  'age_range': '6+',
  'player_range': '2',
  'material': 'wood',
  'description': 'base 17x17 cm; 50 border pieces 4x1,2x0,3 cm; 34 trees 2,6x1,4 cm'},
 {'section_name': 'Table Toys',
  'product_code': 'GA456',
  'toy_name': '3 In a Row',
  'age_range': '8+',
  'player_range': '2',
  'material': 'wood, pine, cardboard',
  'description': 'base 24x22,5x2,5 cm; 30 cards 5,5x5 cm; 6 chips'},
 {'section_name': 'Table Toys',
  'product_code': 'GA467',
  'toy_name': 'Which Cow am i?',
  'age_range': '6+',
  'player_range': '2',
  'material': 'wood, beech',
  'description': '2 cow bases 56x4x4,5 cm; 16 cards 4x5 cm'},
 {'section_name': 'Table Toys',
  'product_code': 'GA460',
  'toy_name': 'Balance Bunnies',
  'age_range': '4+',
  'player_range': '2',
  'material': 'wood',
  'description': '1 base 35x12x25 cm; 7 bunnies 7 foxes; 1 dice 3 cm'},
 {'section_name': 'Table Toys',
  'product_code': 'GA462',
 

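With 159 rows returned, a quick sanity check helps before trusting the list. The sketch below counts rows per section and flags repeated product codes. It assumes that `product_code` should be unique per toy, which the schema describes but the catalog may not guarantee; `summarize` is a hypothetical helper, and the third sample row is a deliberate duplicate added only to exercise the check.

```python
from collections import Counter
from typing import Dict, List


def summarize(records: List[Dict[str, str]]) -> Dict[str, object]:
    """Count rows per section and list product codes that occur more than once."""
    per_section = Counter(r.get("section_name", "") for r in records)
    code_counts = Counter(r.get("product_code", "") for r in records)
    duplicates = sorted(code for code, n in code_counts.items() if code and n > 1)
    return {"per_section": dict(per_section), "duplicate_codes": duplicates}


# Two rows copied from the output above, plus one deliberately repeated row.
sample = [
    {"section_name": "Table Toys", "product_code": "GA457", "toy_name": "Dots and Boxes"},
    {"section_name": "Table Toys", "product_code": "GA456", "toy_name": "3 In a Row"},
    {"section_name": "Table Toys", "product_code": "GA457", "toy_name": "Dots and Boxes"},
]
report = summarize(sample)
print(report)
```
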
---
### Testing - With ``extraction_target``: 'PER_DOC'
---

In [9]:
result_per_doc = await llama_extract.aextract(
    data_schema=ToyCatalog,
    files="/home/daghbeji/ragragi/genAI_3D_CAD/llamaindex/data/tabs/Click-BS-Toys-Catalogue-2024.pdf",
    config=ExtractConfig(
        extraction_mode=ExtractMode.PREMIUM,
        extraction_target=ExtractTarget.PER_DOC,
        parse_model="anthropic-sonnet-4.5",
    ),
)

In [11]:
len(result_per_doc.data)

7

In [12]:
result_per_doc.data

{'section_name': 'Table Toys',
 'product_code': 'GA457',
 'toy_name': 'Dots and Boxes',
 'age_range': '6+',
 'player_range': '2',
 'material': 'wood',
 'description': 'Base 17x17 cm, 50 border pieces 4x1,2x0,3 cm, 34 trees 2,6x1,4 cm. A tabletop game for 2 players, made of wood.'}

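Note that `result_per_doc.data` above is a single dict, so `len()` counts its seven fields rather than seven toys. A small sketch of a shape-aware counter, assuming `data` is either a dict (one entity) or a list of dicts (one per entity) as in the outputs shown in this notebook; `count_records` is a hypothetical helper:

```python
from typing import Any


def count_records(data: Any) -> int:
    """Count extracted entities: a list holds one per item, a dict is a single entity."""
    if data is None:
        return 0
    if isinstance(data, dict):
        return 1
    if isinstance(data, list):
        return len(data)
    raise TypeError(f"Unexpected data type: {type(data).__name__}")


# The PER_DOC result shown above.
per_doc_data = {
    "section_name": "Table Toys",
    "product_code": "GA457",
    "toy_name": "Dots and Boxes",
    "age_range": "6+",
    "player_range": "2",
    "material": "wood",
    "description": "Base 17x17 cm, 50 border pieces 4x1,2x0,3 cm, 34 trees 2,6x1,4 cm. "
    "A tabletop game for 2 players, made of wood.",
}

print(len(per_doc_data))            # number of fields in the dict
print(count_records(per_doc_data))  # number of extracted entities
```
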
---
### Testing - With ``extraction_target``: 'PER_PAGE'
---

In [13]:
result_per_page = await llama_extract.aextract(
    data_schema=ToyCatalog,
    files="/home/daghbeji/ragragi/genAI_3D_CAD/llamaindex/data/tabs/Click-BS-Toys-Catalogue-2024.pdf",
    config=ExtractConfig(
        extraction_mode=ExtractMode.PREMIUM,
        extraction_target=ExtractTarget.PER_PAGE,
        parse_model="anthropic-sonnet-4.5",
    ),
)

In [14]:
len(result_per_page.data)

39

In [15]:
result_per_page.data

[{'section_name': '',
  'product_code': '',
  'toy_name': '',
  'age_range': '',
  'player_range': '',
  'material': '',
  'description': ''},
 {'section_name': '',
  'product_code': '',
  'toy_name': '',
  'age_range': '',
  'player_range': '',
  'material': '',
  'description': 'This page is an introduction to the BS Toys catalog. It does not provide explicit product information or describe specific toys according to the schema. It highlights the catalog contents, company mission, rebranding, and the developmental benefits symbols used throughout the catalog.'},
 {'section_name': 'Table Toys',
  'product_code': 'GA457',
  'toy_name': 'Dots and Boxes',
  'age_range': '6+',
  'player_range': '2',
  'material': 'wood',
  'description': 'Base 17x17 cm, 50 border pieces 4x1.2x0.3 cm, 34 trees 2.6x1.4 cm.'},
 {'section_name': 'Table Toys',
  'product_code': 'GA465',
  'toy_name': 'Plop It',
  'age_range': '6+',
  'player_range': '2-4',
  'material': 'wood, elastic, cardboard',
  'descripti

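The `PER_PAGE` list above contains entries for pages that hold no product data, so its length overstates the number of toys. A minimal filtering sketch, assuming a record counts as a toy only when its `product_code` is non-blank (an assumption about this catalog, not a LlamaExtract rule); `has_product` and `drop_empty` are hypothetical helpers, and the sample records are abbreviated from the output above:

```python
from typing import Dict, List


def has_product(record: Dict[str, str]) -> bool:
    """Treat a record as a toy only if it carries a non-blank product code."""
    return bool((record.get("product_code") or "").strip())


def drop_empty(records: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Keep only the records that describe an actual product."""
    return [record for record in records if has_product(record)]


# First three entries of the PER_PAGE output, reduced to three fields each.
per_page_sample = [
    {"product_code": "", "toy_name": "", "description": ""},
    {"product_code": "", "toy_name": "", "description": "This page is an introduction to the BS Toys catalog."},
    {"product_code": "GA457", "toy_name": "Dots and Boxes", "description": "Base 17x17 cm"},
]

toys = drop_empty(per_page_sample)
print(len(per_page_sample), len(toys))
```

Counting only the filtered records gives a fairer comparison against the `PER_TABLE_ROW` result.
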
## Result

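- ``PER_TABLE_ROW``: All toys (159) were extracted, one object per toy.
- ``PER_DOC``: Only a single toy was extracted. The result is one dict, and the `7` reported by `len()` is its number of fields, not a number of toys.
- ``PER_PAGE``: 39 objects were returned, apparently one per page. Some of them are empty or describe non-product pages, so fewer than 39 toys were extracted.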