### Microsoft Agent Framework: Part 4

#### Create the agent with structured output
- An agent with structured output is an AI agent that returns information in a fixed schema rather than free text, using a schema definition such as a Pydantic model to produce predictable JSON output.

- It keeps the model from adding free-form commentary; instead, the response consists of clean, machine-readable fields (like name, age, skills, experience) that backend systems can parse reliably.

- Structured output is essential when integrating AI into real applications (ATS, HRMS, ERP, CRM) because the output follows a strict format that downstream automations rely on.

- A ChatAgent can be built on any chat client implementation that supports structured output. It uses the response_format parameter to specify the desired output schema.

- When creating or running the agent, you can provide a Pydantic model that defines the structure of the expected output.

- This example creates an agent that produces structured output in the form of a JSON object that conforms to a Pydantic model schema.

In [1]:

from pydantic import BaseModel

class PersonInfo(BaseModel):
    name: str | None = None
    age: int | None = None
    occupation: str | None = None

In [6]:
from agent_framework.azure import AzureOpenAIChatClient
from azure.identity import AzureCliCredential

agent = AzureOpenAIChatClient(credential=AzureCliCredential()).create_agent(
    name="HelpfulAssistant",
    instructions="You are a helpful assistant that extracts person information from text."
)


In [3]:
response = await agent.run(
    "John Sir is a 35-year-old software engineer who loves automation.",
    response_format=PersonInfo
)

if response.value:
    info = response.value
    print(info)


name='John Sir' age=35 occupation='software engineer'


### Use Case: Parsing a Resume PDF to Get Structured Output


Imagine you're building an AI-powered hiring system. Candidates upload resumes in PDF format, but HR wants clean structured data rather than free-form text:

- Name
- Email
- Phone
- Skills
- Years of experience
- Education
- Last job title

By combining Microsoft Agent Framework (MAF) structured output with PDF text extraction, we can convert a text-based resume into clean JSON ready for our ATS.

**Step-1: Extract Resume Text Using PyMuPDF**

In [3]:
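import fitz  # PyMuPDF

def extract_text_from_pdf(pdf_path: str) -> str:
    """Return the text of every page in the PDF, concatenated in page order."""
    # The context manager closes the document even if extraction raises.
    with fitz.open(pdf_path) as doc:
        return "".join(page.get_text() for page in doc)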

resume_text = extract_text_from_pdf("./data/Resume-1.pdf")
print(resume_text[:500])


1 of 2 
Mountain View, CA 94041 
650-336-4590 | johnsir@gmail.com
linkedin.com/in/johnsir | 
johnsir.github.io 
John Sir 
 Data Scientist 
Professional Profile 
Passionate about data analysis and experiments, mainly focused on user behavior, experience, and engagement, with a solid 
background in data science and statistics, and extensive experience using data insights to drive business growth. 
Education
2016 
University of California, Berkeley 
Master of Information and Data Science 
GPA: 3.93
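The extracted text above carries trailing spaces and a page counter ("1 of 2"). An optional cleanup pass before sending the text to the agent could look like the sketch below. The helper name `normalize_resume_text` and the page-marker pattern are assumptions made for this particular PDF layout, not part of PyMuPDF or the framework.

```python
import re

# Matches standalone page counters such as "1 of 2" (an assumption about this PDF's layout).
PAGE_MARKER = re.compile(r"^\d+ of \d+$")


def normalize_resume_text(text: str) -> str:
    """Strip surrounding whitespace per line, drop page counters, collapse blank-line runs."""
    cleaned: list[str] = []
    for line in text.splitlines():
        line = line.strip()
        if PAGE_MARKER.match(line):
            continue  # skip "N of M" page counters
        if not line and (not cleaned or not cleaned[-1]):
            continue  # skip leading blanks and repeated blank lines
        cleaned.append(line)
    return "\n".join(cleaned).strip()


sample = "1 of 2 \nMountain View, CA 94041 \n\n\n650-336-4590 | johnsir@gmail.com\n"
print(normalize_resume_text(sample))
```

Whether this step helps depends on the documents; the model copes with noisy text in many cases, so treat it as a token-saving convenience rather than a requirement.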


**Step-2: Define a Richer Pydantic Model for Resume Parsing**

In [4]:
from pydantic import BaseModel
from typing import List, Optional

class ResumeInfo(BaseModel):
    name: Optional[str] = None
    email: Optional[str] = None
    phone: Optional[str] = None
    skills: List[str] = []
    total_experience_years: Optional[float] = None
    last_job_title: Optional[str] = None
    education: Optional[str] = None
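Because every field in `ResumeInfo` has a default, a resume that lacks some details still validates. The sketch below, assuming Pydantic v2, shows how a sparse payload behaves and how to list the fields that were not supplied; the payload itself is invented for illustration.

```python
from typing import List, Optional

from pydantic import BaseModel


class ResumeInfo(BaseModel):
    name: Optional[str] = None
    email: Optional[str] = None
    phone: Optional[str] = None
    skills: List[str] = []
    total_experience_years: Optional[float] = None
    last_job_title: Optional[str] = None
    education: Optional[str] = None


# A sparse payload: only two of the seven fields are present.
partial = ResumeInfo.model_validate({"name": "John Sir", "skills": ["Python", "SQL"]})
print(partial.email)   # None, because the field was absent
print(partial.skills)

# Fields left at their defaults, useful as a quick completeness check.
missing = sorted(set(ResumeInfo.model_fields) - partial.model_fields_set)
print(missing)
```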


**Step-3: Create Resume Extraction Agent**

In [7]:
resume_agent = AzureOpenAIChatClient(credential=AzureCliCredential()).create_agent(
    name="ResumeParser",
    instructions="""
    You are an advanced AI Resume Parser.
    Extract structured candidate information ONLY in the schema provided.
    Do NOT add extra text. Do NOT summarize.
    """
)


**Step-4: Run Structured Extraction on the Resume**

In [8]:
extracted = await resume_agent.run(
    resume_text,
    response_format=ResumeInfo
)

if extracted.value:
    parsed_resume = extracted.value
    print(parsed_resume)


name='John Sir' email='johnsir@gmail.com' phone='650-336-4590' skills=['R', 'Python', 'SQL', 'Hadoop', 'Hive', 'MrJob', 'Tableau', 'Git', 'AWS', 'SPSS', 'SAS', 'Matlab', 'Spark', 'Storm', 'Bash', 'EViews', 'Demetra+', 'D3.js', 'Gephi', 'Neo4j', 'QGIS'] total_experience_years=5.0 last_job_title='Data Scientist' education='Master of Information and Data Science from University of California, Berkeley'


In [40]:
r = extracted.to_dict()
r

{'type': 'agent_run_response',
 'messages': [{'type': 'chat_message',
   'role': {'type': 'role', 'value': 'assistant'},
   'contents': [{'type': 'text',
     'text': '{"name":"John Sir","email":"johnsir@gmail.com","phone":"650-336-4590","skills":["R","Python","SQL","Hadoop","Hive","MrJob","Tableau","Git","AWS","SPSS","SAS","Matlab","Spark","Storm","Bash","EViews","Demetra+","D3.js","Gephi","Neo4j","QGIS"],"total_experience_years":5,"last_job_title":"Data Scientist","education":"Master of Information and Data Science from University of California, Berkeley"}'}],
   'author_name': 'ResumeParser',
   'additional_properties': {}}],
 'response_id': 'chatcmpl-CfUXB8pwDUcHG0v5N1JTmSCsAQKJt',
 'created_at': '2025-11-24T22:50:57.000000Z',
 'usage_details': {'type': 'usage_details',
  'input_token_count': 1637,
  'output_token_count': 114,
  'total_token_count': 1751},
 'additional_properties': {'system_fingerprint': 'fp_efad92c60b',
  'logprobs': None}}

In [41]:
raw = r['messages'][0]['contents'][0]['text']
raw

'{"name":"John Sir","email":"johnsir@gmail.com","phone":"650-336-4590","skills":["R","Python","SQL","Hadoop","Hive","MrJob","Tableau","Git","AWS","SPSS","SAS","Matlab","Spark","Storm","Bash","EViews","Demetra+","D3.js","Gephi","Neo4j","QGIS"],"total_experience_years":5,"last_job_title":"Data Scientist","education":"Master of Information and Data Science from University of California, Berkeley"}'

In [42]:
import json
data = json.loads(raw)
data

{'name': 'John Sir',
 'email': 'johnsir@gmail.com',
 'phone': '650-336-4590',
 'skills': ['R',
  'Python',
  'SQL',
  'Hadoop',
  'Hive',
  'MrJob',
  'Tableau',
  'Git',
  'AWS',
  'SPSS',
  'SAS',
  'Matlab',
  'Spark',
  'Storm',
  'Bash',
  'EViews',
  'Demetra+',
  'D3.js',
  'Gephi',
  'Neo4j',
  'QGIS'],
 'total_experience_years': 5,
 'last_job_title': 'Data Scientist',
 'education': 'Master of Information and Data Science from University of California, Berkeley'}

In [43]:
data['skills']

['R',
 'Python',
 'SQL',
 'Hadoop',
 'Hive',
 'MrJob',
 'Tableau',
 'Git',
 'AWS',
 'SPSS',
 'SAS',
 'Matlab',
 'Spark',
 'Storm',
 'Bash',
 'EViews',
 'Demetra+',
 'D3.js',
 'Gephi',
 'Neo4j',
 'QGIS']
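The manual `json.loads` route above yields plain dicts and lists with no type checks; note that `total_experience_years` comes back as the integer `5`. As an alternative sketch, assuming Pydantic v2, the raw JSON text can be validated straight into the model. The `raw` string here is a shortened stand-in for the one shown above.

```python
from typing import List, Optional

from pydantic import BaseModel


class ResumeInfo(BaseModel):
    name: Optional[str] = None
    email: Optional[str] = None
    phone: Optional[str] = None
    skills: List[str] = []
    total_experience_years: Optional[float] = None
    last_job_title: Optional[str] = None
    education: Optional[str] = None


# Shortened stand-in for the `raw` JSON text returned by the agent.
raw = (
    '{"name":"John Sir","skills":["R","Python","SQL"],'
    '"total_experience_years":5,"last_job_title":"Data Scientist"}'
)

resume = ResumeInfo.model_validate_json(raw)
print(resume.skills)
print(type(resume.total_experience_years).__name__)  # coerced to the declared float type
```

In the notebook itself, `extracted.value` already holds this typed object, so the manual parse is mainly useful for inspecting the raw response.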

#### Benefits

1. **ATS integration:** Clean JSON can go directly into your Applicant Tracking System.

2. **Little post-processing:** No regex or custom text parsing is needed, because the response already matches the schema.

3. **Predictable formatting:** The schema constrains the output to a fixed JSON structure. It does not guarantee that the extracted values are correct, so spot checks are still worthwhile.

4. **Scales to enterprise volumes:** Thousands of resumes per day can be processed automatically.

5. **Reusable across domains:** The same technique works for:

    - Invoice parsing
    - Contract extraction
    - Support ticket classification
    - Insurance claims
    - Medical records
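Processing thousands of resumes per day typically means running many extractions concurrently while capping the number of in-flight requests. The sketch below shows one way to do that with `asyncio`. `parse_resume` is a hypothetical stand-in for `resume_agent.run(text, response_format=ResumeInfo)` so the example runs without Azure, and the concurrency limit is an arbitrary illustrative value.

```python
import asyncio


async def parse_resume(text: str) -> dict:
    """Hypothetical stand-in for resume_agent.run(text, response_format=ResumeInfo)."""
    await asyncio.sleep(0.01)  # simulates network latency
    return {"chars": len(text)}


async def parse_many(texts: list[str], max_concurrency: int = 5) -> list[dict]:
    """Parse resumes concurrently, never running more than max_concurrency at once."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def bounded(text: str) -> dict:
        async with semaphore:
            return await parse_resume(text)

    # gather returns results in the same order as the inputs.
    return await asyncio.gather(*(bounded(t) for t in texts))


results = asyncio.run(parse_many(["resume one", "resume two!", "r3"], max_concurrency=2))
print(results)
```

Inside a notebook, where an event loop is already running, call `await parse_many(...)` directly instead of `asyncio.run(...)`.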