## Demo - Extract with agent 
---
Custom config:
* ``Schema alignment``:
    * extraction_target = PER_DOC
* ``Model settings``:
    * extraction_mode = MULTIMODAL (suitable for visually rich documents with a mix of text, simple tables, and images)
    * parse_model = gemini-2.0-flash (default model)

* ``system_prompt``: tells the model it is reading a professional resume and should focus on personal information, contact information, and qualifications

* ``Metadata extensions``:
    * use_reasoning = True
    * cite_sources = True
    * confidence_scores = True (a numeric estimate of how reliable each extracted value is, which helps you spot potentially unreliable extractions)

* ``Advanced options``:
    * chunk_mode = PAGE
---

### Provide API keys manually

In [1]:
import os
from getpass import getpass

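# Prompt for each key independently, so a preset Llama Cloud key does not skip the OpenAI prompt.
if "LLAMA_CLOUD_API_KEY" not in os.environ:
    os.environ["LLAMA_CLOUD_API_KEY"] = getpass("Enter your Llama Cloud API Key: ")
if "OPENAI_KEY" not in os.environ:
    os.environ["OPENAI_KEY"] = getpass("Enter your OpenAI API Key: ")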

### Create instance of extractor

In [2]:
from llama_cloud_services import LlamaExtract

# Optionally pass your project ID; otherwise the 'Default' project is used.
llama_extract = LlamaExtract()
# Alternatively, pass the API key explicitly:
# llama_extract = LlamaExtract(api_key="YOUR_API_KEY")

### Define the data schema

In [3]:
from pydantic import BaseModel, Field
from typing import List, Optional

class TechnicalSkills(BaseModel):
    programming_languages: List[str] = Field(
        description="The programming languages the candidate is proficient in."
    )
    frameworks: List[str] = Field(
        description="The tools/frameworks the candidate is proficient in, e.g. React, Django, PyTorch, etc."
    )
    skills: List[str] = Field(
        description="Other general skills the candidate is proficient in, e.g. Data Engineering, Machine Learning, etc."
    )

class Education(BaseModel):
    institution: str = Field(description="The institution of the candidate")
    degree: str = Field(description="The degree of the candidate")
    start_date: Optional[str] = Field(
        default=None, description="The start date of the candidate's education"
    )
    end_date: Optional[str] = Field(
        default=None, description="The end date of the candidate's education"
    )

class Experience(BaseModel):
    company: str = Field(description="The name of the company")
    title: str = Field(description="The title of the candidate")
    description: Optional[str] = Field(
        default=None, description="The description of the candidate's experience"
    )
    start_date: Optional[str] = Field(
        default=None, description="The start date of the candidate's experience"
    )
    end_date: Optional[str] = Field(
        default=None, description="The end date of the candidate's experience"
    )

class Resume(BaseModel):
    name: str = Field(description="The name of the candidate")
    email: str = Field(description="The email address of the candidate")
    links: List[str] = Field(
        description="The links to the candidate's social media profiles"
    )
    experience: List[Experience] = Field(description="The candidate's experience")
    education: List[Education] = Field(description="The candidate's education")
    technical_skills: TechnicalSkills = Field(
        description="The candidate's technical skills"
    )
    key_accomplishments: str = Field(
        description="Summarize the candidate's highest achievements."
    )

### Define extraction configuration

In [None]:
from llama_cloud import ExtractConfig, ExtractMode, ChunkMode, ExtractTarget

custom_config = ExtractConfig(
    # Schema alignment
    extraction_target=ExtractTarget.PER_DOC,

    # Model settings
    extraction_mode=ExtractMode.MULTIMODAL,  # Required for confidence scores
    parse_model="gemini-2.0-flash",          # Default

    # System prompt
    system_prompt="You are dealing with a professional resume, focus on personal information, contact information and qualifications",

    # Metadata extensions
    cite_sources=True,
    use_reasoning=True,
    confidence_scores=True,

    # Advanced options
    chunk_mode=ChunkMode.PAGE,
    high_resolution_mode=False,
    invalidate_cache=False,
)

### Create extraction Agent

In [5]:
from llama_cloud.core.api_error import ApiError

AGENT_NAME = "resume-screening"

# Delete any existing agent with this name so it is recreated with the current schema and config.
try:
    existing_agent = llama_extract.get_agent(name=AGENT_NAME)
    if existing_agent:
        print("============== Agent exists already ==============")
        llama_extract.delete_agent(existing_agent.id)
        print("============== Old Agent deleted ==============")
except ApiError as e:
    # A 404 means no agent with this name exists yet; re-raise anything else.
    if e.status_code != 404:
        raise

print("============== Creating Agent from scratch ==============")
agent = llama_extract.create_agent(
    name=AGENT_NAME,
    data_schema=Resume,
    config=custom_config,
)





---
### Testing
---

#### List the agents

In [6]:
llama_extract.list_agents()

[ExtractionAgent(id=628c02ec-722a-411d-91fb-818ff5b46500, name=resume-screening)]

#### Extract information

In [7]:
resume = agent.extract("/home/daghbeji/ragragi/genAI_3D_CAD/llamaindex/data/resumes/ai_researcher.pdf")
print("============== Extraction finished successfully ==============")

Uploading files: 100%|██████████| 1/1 [00:02<00:00,  2.24s/it]
Creating extraction jobs: 100%|██████████| 1/1 [00:00<00:00,  2.18it/s]
Extracting files: 100%|██████████| 1/1 [00:22<00:00, 22.61s/it]






#### Print results

In [8]:
resume.data

{'name': 'Dr. Rachel Zhang, Ph.D.',
 'email': 'rachel.zhang@email.com',
 'links': ['linkedin.com/in/rachelzhang',
  'github.com/rzhang-ai',
  'scholar.google.com/rachelzhang'],
 'experience': [{'company': 'DeepMind',
   'title': 'Senior Research Scientist',
   'description': 'Lead researcher on large-scale multi-task learning systems, developing novel architectures that improve cross-task generalization by 40%. Pioneered new approach to zero-shot learning using contrastive training, published in NeurIPS 2023. Built and led team of 6 researchers working on foundational ML models. Developed novel regularization techniques for large language models, reducing catastrophic forgetting by 35%.',
   'start_date': '2019',
   'end_date': None},
  {'company': 'Google Research',
   'title': 'Research Scientist',
   'description': 'Developed probabilistic frameworks for robust ML, published in ICML 2018. Created novel attention mechanisms for computer vision models, improving accuracy by 25%. Led c

In [9]:
resume.extraction_metadata

{'field_metadata': {'name': {'reasoning': 'VERBATIM EXTRACTION',
   'parsing_confidence': 0.995353622168994,
   'extraction_confidence': 0.9801656136732345,
   'confidence': 0.9756113938951488,
   'citation': [{'page': 1, 'matching_text': 'Rachel Zhang, Ph.D.'}]},
  'email': {'reasoning': 'VERBATIM EXTRACTION',
   'parsing_confidence': 0.995353622168994,
   'extraction_confidence': 0.9999999081293959,
   'confidence': 0.9953535307252555,
   'citation': [{'page': 1,
     'matching_text': 'New York City Area | rachel.zhang@email.com | (555) 123-4567'}]},
  'links': [{'reasoning': 'VERBATIM EXTRACTION',
    'parsing_confidence': 0.995353622168994,
    'extraction_confidence': 0.9990411213314138,
    'confidence': 0.9943991988129961,
    'citation': [{'page': 1,
      'matching_text': 'linkedin.com/in/rachelzhang | github.com/rzhang-ai | scholar.google.com/rachelzhang'}]},
   {'reasoning': 'VERBATIM EXTRACTION',
    'parsing_confidence': 0.995353622168994,
    'extraction_confidence': 1.0,
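
The metadata above holds one confidence record per extracted value, so a small walker can surface the values that deserve a manual check. The sketch below is a minimal example, assuming the dictionary layout shown in this output. `iter_confidences`, `low_confidence_fields`, and the 0.98 threshold are hypothetical names and values chosen for illustration; they are not part of LlamaExtract. The sample dictionary copies only the `confidence` values printed above.

```python
from typing import Any, Iterator, List, Tuple


def iter_confidences(node: Any, path: str = "") -> Iterator[Tuple[str, float]]:
    """Yield (field_path, confidence) for every record found in field_metadata."""
    if isinstance(node, dict):
        score = node.get("confidence")
        if isinstance(score, (int, float)):
            # Assumption: a dict with a numeric 'confidence' key is one record.
            yield path, float(score)
        else:
            # Otherwise treat the dict as a nested object and descend into it.
            for key, child in node.items():
                yield from iter_confidences(child, f"{path}.{key}" if path else key)
    elif isinstance(node, list):
        # List-valued fields (e.g. 'links') carry one record per item.
        for index, child in enumerate(node):
            yield from iter_confidences(child, f"{path}[{index}]")


def low_confidence_fields(
    field_metadata: dict, threshold: float = 0.98
) -> List[Tuple[str, float]]:
    """Return the (path, confidence) pairs whose confidence is below threshold."""
    return [(p, s) for p, s in iter_confidences(field_metadata) if s < threshold]


# Values copied from the output above; other keys are omitted for brevity.
sample_field_metadata = {
    "name": {"reasoning": "VERBATIM EXTRACTION", "confidence": 0.9756113938951488},
    "email": {"reasoning": "VERBATIM EXTRACTION", "confidence": 0.9953535307252555},
    "links": [
        {"reasoning": "VERBATIM EXTRACTION", "confidence": 0.9943991988129961},
    ],
}

flagged = low_confidence_fields(sample_field_metadata, threshold=0.98)
print(flagged)
```

In the notebook, the same helper could be called on `resume.extraction_metadata["field_metadata"]`. Note that a schema field literally named `confidence` would confuse this walker.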

#### Save extraction template for later use

In [10]:
agent.save()
print("============== Saved extraction agent's schema and config to the database ==============")

agent = llama_extract.get_agent("resume-screening")
agent.data_schema  # Latest schema should be returned



{'additionalProperties': False,
 'properties': {'name': {'description': 'The name of the candidate',
   'type': 'string'},
  'email': {'description': 'The email address of the candidate',
   'type': 'string'},
  'links': {'description': "The links to the candidate's social media profiles",
   'items': {'type': 'string'},
   'type': 'array'},
  'experience': {'description': "The candidate's experience",
   'items': {'additionalProperties': False,
    'properties': {'company': {'description': 'The name of the company',
      'type': 'string'},
     'title': {'description': 'The title of the candidate', 'type': 'string'},
     'description': {'anyOf': [{'type': 'string'}, {'type': 'null'}],
      'description': "The description of the candidate's experience"},
     'start_date': {'anyOf': [{'type': 'string'}, {'type': 'null'}],
      'description': "The start date of the candidate's experience"},
     'end_date': {'anyOf': [{'type': 'string'}, {'type': 'null'}],
      'description': "The 

### Test the saved data_schema on my personal resume

#### Complex CV (3 pages)

In [14]:
from llama_cloud.core.api_error import ApiError

try:
    existing_agent = llama_extract.get_agent(name="resume-screening")
    if existing_agent:
        print("============== Agent exists already ==============")
        print(existing_agent.data_schema)
        print(existing_agent.config)

        new_cv_path = "/home/daghbeji/ragragi/genAI_3D_CAD/llamaindex/data/resumes/Lebenslauf_complex.pdf"
        my_resume = existing_agent.extract(new_cv_path)

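except ApiError as e:
    if e.status_code == 404:
        # No saved agent with this name, so my_resume is not set.
        print("============== Agent not found ==============")
    else:
        raise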

{'additionalProperties': False, 'properties': {'name': {'description': 'The name of the candidate', 'type': 'string'}, 'email': {'description': 'The email address of the candidate', 'type': 'string'}, 'links': {'description': "The links to the candidate's social media profiles", 'items': {'type': 'string'}, 'type': 'array'}, 'experience': {'description': "The candidate's experience", 'items': {'additionalProperties': False, 'properties': {'company': {'description': 'The name of the company', 'type': 'string'}, 'title': {'description': 'The title of the candidate', 'type': 'string'}, 'description': {'anyOf': [{'type': 'string'}, {'type': 'null'}], 'description': "The description of the candidate's experience"}, 'start_date': {'anyOf': [{'type': 'string'}, {'type': 'null'}], 'description': "The start date of the candidate's experience"}, 'end_date': {'anyOf': [{'type': 'string'}, {'type': 'null'}], 'description': "The end date of the candidate's experience"}}, 'required': ['company', 'ti

Uploading files: 100%|██████████| 1/1 [00:02<00:00,  2.52s/it]
Creating extraction jobs: 100%|██████████| 1/1 [00:00<00:00,  2.78it/s]
Extracting files: 100%|██████████| 1/1 [00:33<00:00, 33.13s/it]


In [15]:
my_resume.data

{'name': 'Abderraouf Ayadi',
 'email': 'ayadi_raouf@outlook.com',
 'links': ['https://www.linkedin.com/in/raouf-ayadi-a0a142223/'],
 'experience': [{'company': 'Leibniz Universität Hannover | Institut für Produktentwicklung und Gerätebau (iPeG)',
   'title': 'WISSENSCHAFTLICHE HILFSKRAFT',
   'description': 'Aufbau eines RAG-basierten (Retrieval-Augmented Generation) Systems zum effizienten Durchsuchen von Nachschlagewerken zur mechanischen Konstruktionstechnik. Testen und Bewerten vortrainierter LLM-Modelle auf dem neuesten Stand der Technik zur Generierung parametrischer 3D-CAD-Modelle.',
   'start_date': '11.2025',
   'end_date': '12.2025'},
  {'company': 'Leibniz Universität Hannover | Institut für Montagetechnik und Industrierobotik (Match)',
   'title': 'WISSENSCHAFTLICHE HILFSKRAFT',
   'description': 'Entwicklung eines ROS-basierten Simulations- und Steuerungsframeworks für Multikopter, mit Integration von PX4 und MAVROS. Integration, Test und Benchmarking moderner SLAM-Algorit

In [16]:
my_resume.extraction_metadata

{'field_metadata': {'name': {'reasoning': 'VERBATIM EXTRACTION.',
   'parsing_confidence': 0.8618428227656375,
   'extraction_confidence': 0.9669882941743665,
   'confidence': 0.8333919210325647,
   'citation': [{'page': 1, 'matching_text': '# M. Sc. Abderraouf Ayadi'}]},
  'email': {'reasoning': 'VERBATIM EXTRACTION.',
   'parsing_confidence': 0.8618428227656375,
   'extraction_confidence': 0.9999999757984191,
   'confidence': 0.8618428019076787,
   'citation': [{'page': 1, 'matching_text': 'ayadi_raouf@outlook.com'}]},
  'links': [{'reasoning': 'VERBATIM EXTRACTION.',
    'parsing_confidence': 0.8618428227656375,
    'extraction_confidence': 1.0,
    'confidence': 0.8618428227656375,
    'citation': [{'page': 1,
      'matching_text': 'https://www.linkedin.com/in/raouf-ayadi-a0a142223/'}]}],
  'experience': [{'company': {'parsing_confidence': 0.8618428227656375,
     'extraction_confidence': 0.9999494262530574,
     'confidence': 0.8617992361448147,
     'citation': [{'page': 1,
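
In the values printed above, `confidence` looks like the product of `parsing_confidence` and `extraction_confidence`. This is an observation from the sample output, not documented behavior, so the sketch below only checks it numerically against numbers copied from that output. `matches_product` and the tolerance are hypothetical choices for illustration.

```python
import math

# (field, parsing_confidence, extraction_confidence, confidence), copied from the output above.
observed = [
    ("name", 0.8618428227656375, 0.9669882941743665, 0.8333919210325647),
    ("email", 0.8618428227656375, 0.9999999757984191, 0.8618428019076787),
    ("links[0]", 0.8618428227656375, 1.0, 0.8618428227656375),
    ("experience[0].company", 0.8618428227656375, 0.9999494262530574, 0.8617992361448147),
]


def matches_product(parsing: float, extraction: float, combined: float, tol: float = 1e-6) -> bool:
    """Check whether combined is approximately parsing * extraction."""
    return math.isclose(parsing * extraction, combined, abs_tol=tol)


results = {
    field: matches_product(parsing, extraction, combined)
    for field, parsing, extraction, combined in observed
}
for field, ok in results.items():
    print(f"{field}: product matches reported confidence -> {ok}")
```

If the relationship holds in general, the lower `parsing_confidence` on this 3-page CV (about 0.86, versus about 0.995 for the first resume) is what pulls every combined score down, even where `extraction_confidence` is 1.0.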
    