# HR Structured Outputs with LangChain 1.0

**Module:** Working with Structured Response Formats

**Learning Objectives:**
- Understand four ways to define structured outputs
- Compare Pydantic, Dataclass, TypedDict, and JSON Schema
- Build production-ready HR agents with structured responses
- Apply best practices for data extraction

**Use Case:** Extract structured employee information from unstructured text

**Time:** 2-3 hours

---
## Setup: Install Dependencies

In [1]:
# Install LangChain 1.0 packages (--pre also permits pre-release builds)
!pip install --pre -U langchain langchain-openai pydantic

Collecting langchain
  Downloading langchain-1.0.0-py3-none-any.whl.metadata (4.6 kB)
Collecting langchain-openai
  Downloading langchain_openai-1.0.0-py3-none-any.whl.metadata (1.8 kB)
Collecting pydantic
  Downloading pydantic-2.12.3-py3-none-any.whl.metadata (87 kB)
Collecting langchain-core<2.0.0,>=1.0.0 (from langchain)
  Downloading langchain_core-1.0.0-py3-none-any.whl.metadata (3.4 kB)
Collecting langgraph<1.1.0,>=1.0.0 (from langchain)
  Downloading langgraph-1.0.0-py3-none-any.whl.metadata (7.4 kB)
Collecting pydantic-core==2.41.4 (from pydantic)
  Downloading pydantic_core-2.41.4-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (7.3 kB)
Collecting langgraph-checkpoint<3.0.0,>=2.1.0 (from langgraph<1.1.0,>=1.0.0->langchain)
  Downloading langgraph_checkpoint-2.1.2-py3-none-any.whl.metadata (4.2 kB)
Collecting langgraph-prebuilt<1.1.0

## Setup: Configure OpenAI API Key

In [2]:
# For Google Colab
from google.colab import userdata
import os

OPENAI_API_KEY = userdata.get('OPENAI_API_KEY')
os.environ['OPENAI_API_KEY'] = OPENAI_API_KEY

print("✅ API Key configured!")

✅ API Key configured!


In [None]:
# Alternative: For local Jupyter or other environments
# import os
# os.environ['OPENAI_API_KEY'] = 'your-api-key-here'
# print("✅ API Key configured!")
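
Pasting a key into a cell risks committing it with the notebook. As an alternative for local environments, here is a minimal sketch that reads the key from the environment and only prompts when it is missing. The helper name `ensure_api_key` and its parameters are assumptions made for this example, not part of LangChain or OpenAI.

```python
import os
from getpass import getpass
from typing import Callable


def ensure_api_key(
    env_var: str = "OPENAI_API_KEY",
    prompt: Callable[[str], str] = getpass,
) -> str:
    """Return the key from the environment, prompting only if it is missing.

    `ensure_api_key` is a hypothetical helper written for this notebook.
    """
    key = os.environ.get(env_var, "").strip()
    if not key:
        # getpass hides the typed value so it never appears in cell output
        key = prompt(f"Enter {env_var}: ").strip()
        if not key:
            raise ValueError(f"{env_var} was not provided")
        os.environ[env_var] = key
    return key
```

Calling `ensure_api_key()` once at the top of the notebook would leave `OPENAI_API_KEY` set for the cells that follow.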

## Import Required Libraries

In [3]:
from typing import Optional, List
from dataclasses import dataclass
from typing_extensions import TypedDict
from pydantic import BaseModel, Field
from langchain.agents import create_agent
from langchain_core.tools import tool

print("✅ All imports successful!")

✅ All imports successful!


---
# Lab 1: Pydantic BaseModel (⭐ Recommended)

**Objective:** Use Pydantic BaseModel for structured output

**Benefits:**
- Automatic validation
- Rich field descriptions
- IDE autocomplete support
- Easy serialization
- Best integration with LangChain

## Step 1: Define Pydantic Model

In [4]:
class EmployeeInfo(BaseModel):
    """Structured employee information using Pydantic."""

    employee_id: str = Field(
        description="Unique employee identifier (e.g., EMP001)"
    )
    full_name: str = Field(
        description="Full name of the employee"
    )
    email: str = Field(
        description="Work email address"
    )
    phone: str = Field(
        description="Contact phone number"
    )
    department: str = Field(
        description="Department name (e.g., Engineering, HR, Sales)"
    )
    position: str = Field(
        description="Job title/position"
    )
    salary: Optional[float] = Field(
        default=None,
        description="Annual salary in INR (optional)"
    )
    joining_date: Optional[str] = Field(
        default=None,
        description="Date of joining in YYYY-MM-DD format"
    )
    skills: Optional[List[str]] = Field(
        default=None,
        description="List of key skills"
    )

print("✅ EmployeeInfo Pydantic model defined!")
print(f"\nModel fields: {list(EmployeeInfo.model_fields.keys())}")

✅ EmployeeInfo Pydantic model defined!

Model fields: ['employee_id', 'full_name', 'email', 'phone', 'department', 'position', 'salary', 'joining_date', 'skills']
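
The "automatic validation" benefit can be observed without calling a model. The sketch below uses a cut-down, hypothetical `MiniEmployee` model (not the notebook's `EmployeeInfo`) to show Pydantic's default behaviour: compatible values are coerced, and missing or unparseable values raise a `ValidationError` that names each failing field.

```python
from typing import Optional

from pydantic import BaseModel, Field, ValidationError


class MiniEmployee(BaseModel):
    """Cut-down, hypothetical stand-in for EmployeeInfo."""

    employee_id: str = Field(description="Unique employee identifier")
    full_name: str = Field(description="Full name of the employee")
    salary: Optional[float] = Field(default=None, description="Annual salary in INR")


# In Pydantic's default (lax) mode a numeric string is coerced to float.
ok = MiniEmployee(employee_id="EMP101", full_name="Priya Sharma", salary="1200000")
print(ok.salary)

# A missing required field and a non-numeric salary are both reported.
try:
    MiniEmployee(employee_id="EMP102", salary="twelve lakh")
except ValidationError as exc:
    failed_fields = sorted(err["loc"][0] for err in exc.errors())
    print(failed_fields)
```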


## Step 2: Create Agent with Pydantic Response Format

In [5]:
# Define a simple tool (optional - for demonstration)
@tool
def get_employee_database(query: str) -> str:
    """Search employee database for information."""
    return "Database contains employee records..."

# Create agent with Pydantic response format
tools = [get_employee_database]

agent_pydantic = create_agent(
    model="openai:gpt-4o-mini",
    tools=tools,
    response_format=EmployeeInfo  # Auto-selects ProviderStrategy
)

print("✅ Agent created with Pydantic response format!")

✅ Agent created with Pydantic response format!


## Step 3: Test the Agent

In [6]:
# Sample unstructured employee data
input_text = """
Extract employee info: Priya Sharma, EMP101, works in Engineering
department as Senior Developer. Email: priya.sharma@company.com,
Phone: +91-9876543210. Joined on 2020-05-15. Salary: 1200000 INR.
Skills: Python, Django, AWS, Docker.
"""

result = agent_pydantic.invoke({
    "messages": [{"role": "user", "content": input_text}]
})

employee = result["structured_response"]

print("=" * 70)
print("PYDANTIC BASEMODEL RESULT")
print("=" * 70)
print(f"Type: {type(employee)}")
print(f"\nEmployee ID: {employee.employee_id}")
print(f"Name: {employee.full_name}")
print(f"Email: {employee.email}")
print(f"Phone: {employee.phone}")
print(f"Department: {employee.department}")
print(f"Position: {employee.position}")
if employee.salary is not None:
    print(f"Salary: ₹{employee.salary:,.2f}")
if employee.joining_date:
    print(f"Joining Date: {employee.joining_date}")
if employee.skills:
    print(f"Skills: {', '.join(employee.skills)}")

print("\n✅ Pydantic provides validation, serialization, and IDE support!")

PYDANTIC BASEMODEL RESULT
Type: <class '__main__.EmployeeInfo'>

Employee ID: EMP101
Name: Priya Sharma
Email: priya.sharma@company.com
Phone: +91-9876543210
Department: Engineering
Position: Senior Developer
Salary: ₹1,200,000.00
Joining Date: 2020-05-15
Skills: Python, Django, AWS, Docker

✅ Pydantic provides validation, serialization, and IDE support!
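
Because `joining_date` is typed as `str`, the YYYY-MM-DD format is only a request in the field description; nothing in the model enforces it. One way to check the value after extraction is sketched below using the standard library. The helper name `parse_joining_date` is an assumption made for this example.

```python
import re
from datetime import date
from typing import Optional

# Four digits, two digits, two digits, separated by hyphens
_ISO_DATE = re.compile(r"[0-9]{4}-[0-9]{2}-[0-9]{2}")


def parse_joining_date(value: Optional[str]) -> Optional[date]:
    """Return a date for a YYYY-MM-DD string, or None when the value is missing."""
    if value is None:
        return None
    if not _ISO_DATE.fullmatch(value):
        raise ValueError(f"Expected YYYY-MM-DD, got {value!r}")
    # fromisoformat also rejects impossible dates such as 2020-02-30
    return date.fromisoformat(value)


print(parse_joining_date("2020-05-15"))
```

An alternative design is to declare the field as `Optional[date]` in the Pydantic model so that parsing happens during validation.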


## Bonus: Serialize to Dictionary/JSON

In [7]:
import json

# Convert to dictionary
employee_dict = employee.model_dump()
print("As Dictionary:")
print(employee_dict)

# Convert to JSON
employee_json = employee.model_dump_json(indent=2)
print("\nAs JSON:")
print(employee_json)

As Dictionary:
{'employee_id': 'EMP101', 'full_name': 'Priya Sharma', 'email': 'priya.sharma@company.com', 'phone': '+91-9876543210', 'department': 'Engineering', 'position': 'Senior Developer', 'salary': 1200000.0, 'joining_date': '2020-05-15', 'skills': ['Python', 'Django', 'AWS', 'Docker']}

As JSON:
{
  "employee_id": "EMP101",
  "full_name": "Priya Sharma",
  "email": "priya.sharma@company.com",
  "phone": "+91-9876543210",
  "department": "Engineering",
  "position": "Senior Developer",
  "salary": 1200000.0,
  "joining_date": "2020-05-15",
  "skills": [
    "Python",
    "Django",
    "AWS",
    "Docker"
  ]
}
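
Serialization also works in the other direction. The following sketch, again with a hypothetical `MiniEmployee` model, round-trips an instance through JSON using `model_validate_json` and shows `exclude_none=True`, which drops optional fields that were never filled in.

```python
from typing import List, Optional

from pydantic import BaseModel


class MiniEmployee(BaseModel):
    """Hypothetical stand-in for EmployeeInfo, kept short for the example."""

    employee_id: str
    full_name: str
    skills: Optional[List[str]] = None


original = MiniEmployee(
    employee_id="EMP101", full_name="Priya Sharma", skills=["Python", "AWS"]
)

# JSON string -> validated model instance
payload = original.model_dump_json()
restored = MiniEmployee.model_validate_json(payload)
print(restored == original)

# exclude_none leaves out optional fields whose value is None
compact = MiniEmployee(employee_id="EMP105", full_name="Test User").model_dump(
    exclude_none=True
)
print(compact)
```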


---
# Lab 2: Python Dataclass

**Objective:** Use Python's built-in dataclass for structured output

**Benefits:**
- Built into Python 3.7+
- No external dependencies
- Simple and lightweight
- Good for prototypes

## Step 1: Define Dataclass

In [8]:
@dataclass
class EmployeeInfoDataclass:
    """Structured employee information using dataclass."""

    employee_id: str
    full_name: str
    email: str
    phone: str
    department: str
    position: str
    salary: Optional[float] = None
    joining_date: Optional[str] = None
    skills: Optional[List[str]] = None

print("✅ EmployeeInfoDataclass defined!")

✅ EmployeeInfoDataclass defined!
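
A standard-library dataclass does not check field types when it is constructed directly, so any rule you care about has to be written by hand. The sketch below contrasts a plain dataclass with one that validates in `__post_init__`. Both classes and the "EMP" prefix rule are hypothetical and exist only for this example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PlainEmployee:
    employee_id: str
    salary: Optional[float] = None


@dataclass
class CheckedEmployee:
    employee_id: str
    salary: Optional[float] = None

    def __post_init__(self) -> None:
        # Runs right after the generated __init__; the rules are illustrative.
        if not self.employee_id.startswith("EMP"):
            raise ValueError("employee_id must start with 'EMP'")
        if self.salary is not None and self.salary < 0:
            raise ValueError("salary must be non-negative")


# The type hint is not enforced: a string is stored as-is.
unchecked = PlainEmployee(employee_id="EMP102", salary="a lot")
print(repr(unchecked.salary))

try:
    CheckedEmployee(employee_id="EMP102", salary=-1.0)
except ValueError as exc:
    print(exc)
```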


## Step 2: Create Agent with Dataclass Response Format

In [9]:
agent_dataclass = create_agent(
    model="openai:gpt-4o-mini",
    tools=tools,
    response_format=EmployeeInfoDataclass
)

print("✅ Agent created with Dataclass response format!")

✅ Agent created with Dataclass response format!


## Step 3: Test the Agent

In [10]:
input_text = """
Extract info: Rahul Verma (EMP102) - Engineering Manager
Contact: rahul.verma@company.com, +91-9876543211
Joined: 2018-03-20, Salary: 1800000 INR
Skills: Team Management, System Design, Kubernetes
"""

result = agent_dataclass.invoke({
    "messages": [{"role": "user", "content": input_text}]
})

employee = result["structured_response"]

print("=" * 70)
print("PYTHON DATACLASS RESULT")
print("=" * 70)
print(f"Type: {type(employee)}")
print(f"\nEmployee ID: {employee.employee_id}")
print(f"Name: {employee.full_name}")
print(f"Email: {employee.email}")
print(f"Phone: {employee.phone}")
print(f"Department: {employee.department}")
print(f"Position: {employee.position}")
if employee.salary is not None:
    print(f"Salary: ₹{employee.salary:,.2f}")
if employee.skills:
    print(f"Skills: {', '.join(employee.skills)}")

print("\n✅ Dataclass is simple and built into Python!")

PYTHON DATACLASS RESULT
Type: <class '__main__.EmployeeInfoDataclass'>

Employee ID: EMP102
Name: Rahul Verma
Email: rahul.verma@company.com
Phone: +91-9876543211
Department: Engineering
Position: Manager
Salary: ₹1,800,000.00
Skills: Team Management, System Design, Kubernetes

✅ Dataclass is simple and built into Python!


---
# Lab 3: TypedDict

**Objective:** Use TypedDict for dictionary-based structured output

**Benefits:**
- Dictionary-based access
- Type hints for IDEs
- Flexible structure
- Works well with dict workflows

## Step 1: Define TypedDict

In [11]:
class EmployeeInfoTypedDict(TypedDict):
    """Structured employee information using TypedDict."""

    employee_id: str
    full_name: str
    email: str
    phone: str
    department: str
    position: str
    salary: Optional[float]
    joining_date: Optional[str]
    skills: Optional[List[str]]

print("✅ EmployeeInfoTypedDict defined!")

✅ EmployeeInfoTypedDict defined!
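
Note that `Optional[float]` in a TypedDict means the value may be `None`; the key itself is still required. If some keys should be allowed to be absent, one option is to split the definition and mark the second part `total=False`, as sketched here. The class names are hypothetical, and this example imports `TypedDict` from `typing` to stay standard-library only (the notebook uses `typing_extensions`). How the agent's schema conversion treats absent keys is not verified here.

```python
from typing import List, Optional, TypedDict


class EmployeeRequired(TypedDict):
    # Keys declared here must always be present.
    employee_id: str
    full_name: str


class EmployeeRecord(EmployeeRequired, total=False):
    # Keys declared here may be left out entirely.
    salary: Optional[float]
    skills: Optional[List[str]]


print(sorted(EmployeeRecord.__required_keys__))
print(sorted(EmployeeRecord.__optional_keys__))

# A type checker accepts this record even though salary and skills are absent.
record: EmployeeRecord = {"employee_id": "EMP103", "full_name": "Anjali Patel"}
print(record.get("salary"))
```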


## Step 2: Create Agent with TypedDict Response Format

In [12]:
agent_typeddict = create_agent(
    model="openai:gpt-4o-mini",
    tools=tools,
    response_format=EmployeeInfoTypedDict
)

print("✅ Agent created with TypedDict response format!")

✅ Agent created with TypedDict response format!


## Step 3: Test the Agent

In [13]:
input_text = """
Employee details: Anjali Patel, ID: EMP103
HR Director, anjali.patel@company.com
Phone: +91-9876543212, Joined: 2015-01-10
Annual compensation: 2500000 INR
Key skills: Recruitment, Policy Development, Employee Relations
"""

result = agent_typeddict.invoke({
    "messages": [{"role": "user", "content": input_text}]
})

employee = result["structured_response"]

print("=" * 70)
print("TYPEDDICT RESULT")
print("=" * 70)
print(f"Type: {type(employee)}")
print(f"\nEmployee ID: {employee['employee_id']}")
print(f"Name: {employee['full_name']}")
print(f"Email: {employee['email']}")
print(f"Phone: {employee['phone']}")
print(f"Department: {employee['department']}")
print(f"Position: {employee['position']}")
if employee.get('salary') is not None:
    print(f"Salary: ₹{employee['salary']:,.2f}")
if employee.get('skills'):
    print(f"Skills: {', '.join(employee['skills'])}")

print("\n✅ TypedDict returns a dictionary with type hints!")

TYPEDDICT RESULT
Type: <class 'dict'>

Employee ID: EMP103
Name: Anjali Patel
Email: anjali.patel@company.com
Phone: +91-9876543212
Department: Human Resources
Position: HR Director
Salary: ₹2,500,000.00
Skills: Recruitment, Policy Development, Employee Relations

✅ TypedDict returns a dictionary with type hints!


---
# Lab 4: JSON Schema

**Objective:** Use JSON Schema for structured output

**Benefits:**
- Language-agnostic
- Fine-grained validation
- Enum constraints
- Cross-platform compatibility

## Step 1: Define JSON Schema

In [14]:
EMPLOYEE_INFO_JSON_SCHEMA = {
    "type": "object",
    "title": "EmployeeInfo",
    "description": "Structured employee information using JSON Schema",
    "properties": {
        "employee_id": {
            "type": "string",
            "description": "Unique employee identifier (e.g., EMP001)"
        },
        "full_name": {
            "type": "string",
            "description": "Full name of the employee"
        },
        "email": {
            "type": "string",
            "description": "Work email address",
            "format": "email"
        },
        "phone": {
            "type": "string",
            "description": "Contact phone number"
        },
        "department": {
            "type": "string",
            "description": "Department name",
            "enum": ["Engineering", "HR", "Sales", "Marketing", "Finance", "Operations"]
        },
        "position": {
            "type": "string",
            "description": "Job title/position"
        },
        "salary": {
            "type": ["number", "null"],
            "description": "Annual salary in INR"
        },
        "joining_date": {
            "type": ["string", "null"],
            "description": "Date of joining in YYYY-MM-DD format",
            "format": "date"
        },
        "skills": {
            "type": ["array", "null"],
            "description": "List of key skills",
            "items": {
                "type": "string"
            }
        }
    },
    "required": ["employee_id", "full_name", "email", "phone", "department", "position"],
    "additionalProperties": False
}

import json

print("✅ JSON Schema defined!")
print("\nSchema preview:")
print(json.dumps(EMPLOYEE_INFO_JSON_SCHEMA, indent=2)[:500] + "...")

✅ JSON Schema defined!

Schema preview:
{
  "type": "object",
  "title": "EmployeeInfo",
  "description": "Structured employee information using JSON Schema",
  "properties": {
    "employee_id": {
      "type": "string",
      "description": "Unique employee identifier (e.g., EMP001)"
    },
    "full_name": {
      "type": "string",
      "description": "Full name of the employee"
    },
    "email": {
      "type": "string",
      "description": "Work email address",
      "format": "email"
    },
    "phone": {
      "type": "stri...


## Step 2: Create Agent with JSON Schema Response Format

In [15]:
agent_json_schema = create_agent(
    model="openai:gpt-4o-mini",
    tools=tools,
    response_format=EMPLOYEE_INFO_JSON_SCHEMA
)

print("✅ Agent created with JSON Schema response format!")

✅ Agent created with JSON Schema response format!


## Step 3: Test the Agent

In [16]:
input_text = """
Parse employee info: Arjun Reddy (EMP104), Sales Team Lead
Email: arjun.reddy@company.com, Mobile: +91-9876543213
Department: Sales, Joining: 2019-07-01
CTC: 1500000 per annum
Expertise: B2B Sales, CRM Management, Negotiation
"""

result = agent_json_schema.invoke({
    "messages": [{"role": "user", "content": input_text}]
})

employee = result["structured_response"]

print("=" * 70)
print("JSON SCHEMA RESULT")
print("=" * 70)
print(f"Type: {type(employee)}")
print(f"\nEmployee ID: {employee['employee_id']}")
print(f"Name: {employee['full_name']}")
print(f"Email: {employee['email']}")
print(f"Phone: {employee['phone']}")
print(f"Department: {employee['department']}")
print(f"Position: {employee['position']}")
if employee.get('salary') is not None:
    print(f"Salary: ₹{employee['salary']:,.2f}")
if employee.get('skills'):
    print(f"Skills: {', '.join(employee['skills'])}")

print("\n✅ JSON Schema provides fine-grained validation and is language-agnostic!")

JSON SCHEMA RESULT
Type: <class 'dict'>

Employee ID: EMP104
Name: Arjun Reddy
Email: arjun.reddy@company.com
Phone: +91-9876543213
Department: Sales
Position: Sales Team Lead
Salary: ₹1,500,000.00
Skills: B2B Sales, CRM Management, Negotiation

✅ JSON Schema provides fine-grained validation and is language-agnostic!
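
The result here is a plain `dict`, so it can be useful to re-check it against the schema before passing it on. The sketch below covers only the `required` and `enum` keywords and is deliberately not a full JSON Schema validator; a dedicated package such as `jsonschema` handles the rest. The function name and the trimmed schema are assumptions made for this example.

```python
from typing import Any, Dict, List


def check_required_and_enums(data: Dict[str, Any], schema: Dict[str, Any]) -> List[str]:
    """Return a list of problems; an empty list means both checks passed."""
    problems: List[str] = []
    for key in schema.get("required", []):
        if key not in data:
            problems.append(f"missing required property: {key}")
    for key, spec in schema.get("properties", {}).items():
        allowed = spec.get("enum")
        if allowed is not None and key in data and data[key] not in allowed:
            problems.append(f"{key}={data[key]!r} is not one of {allowed}")
    return problems


# Trimmed, hypothetical schema in the same shape as EMPLOYEE_INFO_JSON_SCHEMA
schema = {
    "type": "object",
    "properties": {
        "employee_id": {"type": "string"},
        "department": {"type": "string", "enum": ["Engineering", "HR", "Sales"]},
    },
    "required": ["employee_id", "department"],
}

valid = check_required_and_enums({"employee_id": "EMP104", "department": "Sales"}, schema)
invalid = check_required_and_enums({"department": "Human Resources"}, schema)
print(valid)
print(invalid)
```

The second call illustrates why the enum matters: a free-text value such as "Human Resources" would not match the allowed "HR".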


---
# Comparison Summary

Let's compare all four approaches side by side.

In [19]:
import pandas as pd

comparison_data = {
    "Format": ["Pydantic BaseModel", "Python Dataclass", "TypedDict", "JSON Schema"],
    "Validation": ["✅ Rich", "⚠️ Basic", "❌ None", "✅ Rich"],
    "Complexity": ["Medium", "Low", "Low", "High"],
    "Dependencies": ["External", "Built-in", "Built-in", "None"],
    "IDE Support": ["✅ Excellent", "✅ Good", "✅ Good", "❌ Limited"],
    "Documentation": ["✅ Field-level", "❌ Class-only", "❌ Class-only", "✅ Property-level"],
    "Best For": ["Production", "Simple cases", "Dict workflows", "Cross-platform"]
}

df = pd.DataFrame(comparison_data)
print("\n" + "=" * 80)
print("COMPARISON: STRUCTURED OUTPUT FORMATS")
print("=" * 80)
print(df.to_string(index=False))
print("\n" + "=" * 80)
print("RECOMMENDATION: Use Pydantic BaseModel for most HR use cases!")
print("=" * 80)


COMPARISON: STRUCTURED OUTPUT FORMATS
            Format Validation Complexity Dependencies IDE Support    Documentation       Best For
Pydantic BaseModel     ✅ Rich     Medium     External ✅ Excellent    ✅ Field-level     Production
  Python Dataclass   ⚠️ Basic        Low     Built-in      ✅ Good     ❌ Class-only   Simple cases
         TypedDict     ❌ None        Low     Built-in      ✅ Good     ❌ Class-only Dict workflows
       JSON Schema     ✅ Rich       High         None   ❌ Limited ✅ Property-level Cross-platform

RECOMMENDATION: Use Pydantic BaseModel for most HR use cases!


---
# Exercises

## Exercise 1: Create a Leave Request Model

Create a Pydantic model for leave requests that includes:
- employee_id
- leave_type (Casual/Sick/Earned)
- start_date
- end_date
- reason
- days_requested

Then create an agent that extracts this information from text.

In [None]:
# Your code here
class LeaveRequest(BaseModel):
    """TODO: Define the leave request model."""
    pass

# TODO: Create agent and test

## Exercise 2: Performance Review Model

Create a model for performance reviews with:
- employee_id
- reviewer_id
- review_period
- technical_rating (1-5)
- communication_rating (1-5)
- achievements (list)
- areas_of_improvement (list)
- promotion_recommended (boolean)

Add validation to ensure ratings are between 1-5.

In [None]:
# Your code here
class PerformanceReview(BaseModel):
    """TODO: Define the performance review model."""
    pass

# TODO: Add validation constraints
# Hint: Use Field(ge=1, le=5) for ratings
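
To see what the hint does before applying it, here is a small sketch on a hypothetical `SkillRating` model (deliberately not the `PerformanceReview` solution). The `ge` and `le` arguments reject out-of-range values at validation time and also appear as `minimum` and `maximum` in the generated JSON schema.

```python
from pydantic import BaseModel, Field, ValidationError


class SkillRating(BaseModel):
    """Hypothetical toy model used only to demonstrate range constraints."""

    skill: str
    rating: int = Field(ge=1, le=5, description="Rating from 1 (low) to 5 (high)")


print(SkillRating(skill="Python", rating=4).rating)

try:
    SkillRating(skill="Python", rating=7)
except ValidationError as exc:
    error_type = exc.errors()[0]["type"]
    print(error_type)

# The bounds are part of the schema generated from the model.
rating_schema = SkillRating.model_json_schema()["properties"]["rating"]
print(rating_schema["minimum"], rating_schema["maximum"])
```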

## Exercise 3: Compare All Four Formats

For the same input text, extract employee information using all four formats and compare:
1. Execution time
2. Response structure
3. Ease of access to fields

Which format would you choose for a production HR system?

In [None]:
# Your code here
import time

test_input = "Your test employee data here"

# TODO: Test all four formats and measure time
# TODO: Compare results
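
For the execution-time comparison, a single run is noisy because network latency dominates. A minimal timing sketch follows; the helper `time_calls` is an assumption made for this exercise, and the workload shown is a stand-in so the cell runs without an API key.

```python
import time
from statistics import mean
from typing import Any, Callable, Dict, List


def time_calls(fn: Callable[[], Any], repeats: int = 3) -> Dict[str, float]:
    """Call fn `repeats` times and report min and mean wall-clock seconds."""
    if repeats < 1:
        raise ValueError("repeats must be at least 1")
    durations: List[float] = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        durations.append(time.perf_counter() - start)
    return {"min": min(durations), "mean": mean(durations)}


# Stand-in workload. In the notebook this could be, for example:
# lambda: agent_pydantic.invoke({"messages": [{"role": "user", "content": test_input}]})
stats = time_calls(lambda: sum(range(10_000)))
print(f"min={stats['min']:.6f}s mean={stats['mean']:.6f}s")
```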

## 🌟 Bonus Challenge: Multi-Department Report

Create a complex nested model that can:
1. Process employees from multiple departments
2. Calculate average salary per department
3. List top skills across all employees
4. Identify departments that are understaffed (< 3 employees)
5. Generate an executive summary

Test with at least 10 employees across 4 departments.

In [None]:
# Your code here - Be creative!
class DepartmentReport(BaseModel):
    """TODO: Design your comprehensive report model."""
    pass

---
# Conclusion

**What you learned:**
1. ✅ Four different ways to define structured outputs in LangChain 1.0
2. ✅ Using Pydantic BaseModel for production-ready extraction
3. ✅ Python Dataclass for simple, lightweight schemas
4. ✅ TypedDict for dictionary-based workflows
5. ✅ JSON Schema for language-agnostic specifications
6. ✅ Where nested models fit for complex data structures (see the Bonus Challenge)
7. ✅ Best practices for HR data extraction

**Key Takeaways:**
- **Pydantic** is the recommended choice for most production use cases
- **Field descriptions** are critical for LLM understanding
- **Validation** catches errors early and ensures data quality
- **Nested models** enable complex hierarchical data structures
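
As a concrete illustration of the nested-model takeaway, the sketch below places one Pydantic model inside another. The `Employee` and `Department` classes are hypothetical and much smaller than what the Bonus Challenge asks for; they only show that nested dictionaries are validated into nested model instances.

```python
from typing import List

from pydantic import BaseModel, Field


class Employee(BaseModel):
    full_name: str = Field(description="Full name of the employee")
    position: str = Field(description="Job title/position")


class Department(BaseModel):
    name: str = Field(description="Department name")
    employees: List[Employee] = Field(
        default_factory=list, description="Employees in this department"
    )


# Nested dicts are converted into Employee instances during validation.
dept = Department.model_validate(
    {
        "name": "Engineering",
        "employees": [
            {"full_name": "Priya Sharma", "position": "Senior Developer"},
            {"full_name": "Rahul Verma", "position": "Engineering Manager"},
        ],
    }
)
print(type(dept.employees[0]).__name__, len(dept.employees))
```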

**Next Steps:**
- Integrate with actual HR databases
- Add more complex validation rules
- Build end-to-end HR automation workflows
- Deploy as production API services

---
**Created with:** LangChain 1.0 + OpenAI + Pydantic

**References:**
- [LangChain Documentation](https://python.langchain.com/)
- [Pydantic Documentation](https://docs.pydantic.dev/)
- [JSON Schema](https://json-schema.org/)