# Structured Output

In **LangChain**, structured output refers to the practice of having language models return responses in a well-defined data format (for example, JSON), rather than free-form text. This makes the model output easier to parse and work with programmatically.

**[Prompt]** - Can you create a one-day travel itinerary for Paris?

**[LLM’s Unstructured Response]**  
Here’s a suggested itinerary:  
- Morning: Visit the Eiffel Tower.  
- Afternoon: Walk through the Louvre Museum.  
- Evening: Enjoy dinner at a Seine riverside café.

**[JSON enforced output]**  
```json
[
  {
    "time": "Morning",
    "activity": "Visit the Eiffel Tower"
  },
  {
    "time": "Afternoon",
    "activity": "Walk through the Louvre Museum"
  },
  {
    "time": "Evening",
    "activity": "Enjoy dinner at a Seine riverside café"
  }
]
```

### What is Structured Output in LangChain?

**Structured output** is a technique where language models (LLMs) are prompted to return their results in a specific, machine-readable format (such as JSON, lists, tables, or key-value pairs) rather than free-form text. Downstream code can then parse the result directly instead of guessing at its layout, which is what makes automation reliable.

### Why Use Structured Output?

- **Reliable Parsing:** Easily extract values for use in your application.
- **Automation:** Automate post-processing, function calls, or data storage.
- **Validation:** Ensure the response follows a schema or expected structure.
- **Chaining:** Pass structured results between components or chains.

### How to Achieve Structured Output in LangChain

1. **Prompt Engineering:**  
   Instruct the LLM to return output in the desired format, e.g.,  
   “Return your answer as a JSON object with the fields ‘summary’ and ‘keywords’.”

2. **Output Parsers:**  
   Use LangChain’s built-in output parser classes to validate and extract fields from the LLM’s response.


### Example (Python)

```python
from langchain.prompts import PromptTemplate
from langchain.output_parsers import StructuredOutputParser, ResponseSchema

response_schemas = [
    ResponseSchema(name="summary", description="A short summary"),
    ResponseSchema(name="keywords", description="List of main keywords")
]

output_parser = StructuredOutputParser.from_response_schemas(response_schemas)

prompt = PromptTemplate(
    template="Summarize the following text and extract keywords. {format_instructions}\nText: {text}",
    input_variables=["text"],
    partial_variables={"format_instructions": output_parser.get_format_instructions()}
)

# Sample usage
chain_input = {"text": "LangChain is a framework for building LLM-powered applications."}
final_prompt = prompt.format(**chain_input)
# Pass final_prompt to your LLM and parse with output_parser
```
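The last comment above leaves the parsing step open. With LangChain, that step would typically be `output_parser.parse(reply_text)`. The stdlib-only sketch below shows roughly what such a step amounts to, assuming the model follows the format instructions and wraps its answer in a `json` fenced block. `fake_reply` is a hard-coded stand-in for a real model response, and `parse_fenced_json` is a hypothetical helper, not a LangChain API.

```python
import json
import re

FENCE = "`" * 3  # built at runtime so this example can sit inside a Markdown fence

# Hard-coded stand-in for a model reply (an assumption; no LLM is called here).
fake_reply = (
    "Sure, here is the result:\n"
    f"{FENCE}json\n"
    '{"summary": "LangChain helps build LLM apps.", "keywords": ["LangChain", "LLM"]}\n'
    f"{FENCE}"
)

def parse_fenced_json(reply: str, required_keys: list[str]) -> dict:
    """Extract the first json-fenced block, decode it, and check required keys."""
    match = re.search(FENCE + r"json\s*(.*?)" + FENCE, reply, re.DOTALL)
    if match is None:
        raise ValueError("no fenced JSON block found in reply")
    data = json.loads(match.group(1))
    missing = [key for key in required_keys if key not in data]
    if missing:
        raise ValueError(f"missing keys: {missing}")
    return data

result = parse_fenced_json(fake_reply, ["summary", "keywords"])
print(result["summary"])
```

The key check mirrors the two `ResponseSchema` entries: a reply that omits `summary` or `keywords` is rejected rather than passed downstream.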

### Advanced Usage

- **Custom Schemas:** Define complex nested data structures for more advanced tasks.
- **Error Handling:** Gracefully handle responses that don’t match the desired format.
- **Integration:** Use structured outputs to trigger actions, populate databases, or drive APIs.
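As one illustration of the error-handling point, here is a minimal sketch that turns a malformed reply into an error message instead of an exception. `safe_parse` is a hypothetical helper name, and the replies are hard-coded stand-ins for model output.

```python
import json
from typing import Optional

def safe_parse(reply: str) -> tuple[Optional[dict], Optional[str]]:
    """Return (data, None) on success or (None, error message) on failure."""
    try:
        data = json.loads(reply)
    except json.JSONDecodeError as exc:
        return None, f"invalid JSON: {exc.msg}"
    if not isinstance(data, dict):
        return None, "expected a JSON object at the top level"
    return data, None

good, good_err = safe_parse('{"summary": "ok"}')
bad, bad_err = safe_parse("Sorry, I cannot answer that.")
print(good, good_err)
print(bad, bad_err)
```

The caller can then decide what to do with the error: retry, re-prompt with the message included, or fall back to a default.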

# Why do we need Structured Output?

Common use cases:

- Data extraction
- API building
- Agents

We need structured output to ensure that responses from language models are in a predictable, machine-readable format (like JSON, lists, or tables). This makes it easy to:

- Reliably extract information for downstream processing or automation
- Validate and parse results with code, reducing errors
- Integrate LLM outputs into other systems and workflows
- Chain multiple components together by passing structured data

In short, structured output turns free-form AI responses into actionable, dependable data for real-world applications.

# Ways to get Structured Output

### 1. Prompt Engineering  
Carefully instruct the model to respond in a specific format, such as JSON, a table, or bullet points.  
**Example:**  
```
Please summarize the following text. Return the result as a JSON object with "summary" and "keywords" fields.
```

### 2. Output Parsers  
LangChain provides output parsers that check, validate, and extract structured data from the model’s response.  
- **StructuredOutputParser:** Works with prompts that request JSON or similar formats.
- **PydanticOutputParser:** Uses Pydantic schemas to validate and parse the output.

### 3. Response Schemas  
Define a schema for the output (e.g., what fields should be present) and use it to guide both the prompt and the parser.


### 4. Function Calling (Tool Use)  
Some LLMs (like OpenAI GPT-4) support “function calling” where the model outputs a JSON object matching a specified schema, which can be directly used by downstream code.

### 5. Few-shot Examples  
Include examples of the desired structured output in your prompt.  
**Example:**  
```
Text: "LangChain is a framework for LLM applications."
Output: {"summary": "LangChain is a framework for LLM apps.", "keywords": ["LangChain", "framework", "LLM"]}
```

### 6. Post-processing  
If the model output is not strictly structured, use regular expressions or custom code to extract the needed fields as a last resort.


### Summary Table

| Method                | Key Benefit                    | LangChain Support    |
|-----------------------|-------------------------------|---------------------|
| Prompt Engineering    | Flexible, quick                | Yes                 |
| Output Parsers        | Automatic validation           | Yes                 |
| Response Schemas      | Consistency, reliability       | Yes                 |
| Function Calling      | Highly reliable, direct JSON   | Yes (with OpenAI, etc.) |
| Few-shot Examples     | More control, higher accuracy  | Yes                 |
| Post-processing       | Works with messy outputs       | Manual              |

**Best Practice:**  
Combine prompt engineering with output parsers or schemas for the most robust structured output in LangChain.

# With structured output


### 1. Prompt Engineering (with explicit format instruction)
```python
from langchain.prompts import PromptTemplate

prompt = PromptTemplate(
    template="Extract keywords from the text and return them as a JSON list: {text}",
    input_variables=["text"]
)
```


### 2. Output Parsers (LangChain)
```python
from langchain.prompts import PromptTemplate
from langchain.output_parsers import StructuredOutputParser, ResponseSchema

schemas = [
    ResponseSchema(name="summary", description="Short summary"),
    ResponseSchema(name="keywords", description="List of keywords")
]
output_parser = StructuredOutputParser.from_response_schemas(schemas)

prompt = PromptTemplate(
    template="Summarize and extract keywords. {format_instructions}\nText: {text}",
    input_variables=["text"],
    partial_variables={"format_instructions": output_parser.get_format_instructions()}
)
```
- The parser will validate and parse the model’s response into structured Python data.


### 3. Response Schemas
- Define the expected JSON/schema fields and provide them in both prompt and parser, ensuring reliable output.
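A stdlib-only sketch of the idea, with one schema definition driving both the prompt text and the check on the reply. The `SCHEMA` dict and both helper functions are hypothetical names for illustration. In LangChain, `ResponseSchema` and `StructuredOutputParser` play these roles.

```python
import json

# Hypothetical schema: field name -> description (mirrors the ResponseSchema idea).
SCHEMA = {
    "summary": "Short summary of the text",
    "keywords": "List of keywords",
}

def format_instructions(schema: dict[str, str]) -> str:
    """Render the schema as instructions to embed in the prompt."""
    lines = [f'- "{name}": {desc}' for name, desc in schema.items()]
    return "Return a JSON object with exactly these fields:\n" + "\n".join(lines)

def check_fields(reply: str, schema: dict[str, str]) -> dict:
    """Decode the reply and confirm it has exactly the fields the schema names."""
    data = json.loads(reply)
    if set(data) != set(schema):
        raise ValueError(f"expected fields {sorted(schema)}, got {sorted(data)}")
    return data

print(format_instructions(SCHEMA))
parsed = check_fields('{"summary": "A framework.", "keywords": ["LangChain"]}', SCHEMA)
print(parsed["keywords"])
```

Because the prompt and the check read from the same dict, adding a field in one place updates both.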


### 4. Function Calling (Tool Use)
- For OpenAI (GPT-4) and similar models, define a function schema and ask the model to "call" it; the model will return a structured JSON object.
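A sketch of what such a function schema can look like, written in the JSON-Schema-based shape used by OpenAI-style tool definitions. The function name `record_summary` and its fields are hypothetical, and `fake_arguments` is a hard-coded stand-in for the arguments string a model would return. No API call is made here.

```python
import json

# Tool definition in the shape used by OpenAI-style tool calling.
# The function name and fields are hypothetical, chosen to match the running example.
summarize_tool = {
    "type": "function",
    "function": {
        "name": "record_summary",
        "description": "Record a summary and keywords for a text.",
        "parameters": {
            "type": "object",
            "properties": {
                "summary": {"type": "string"},
                "keywords": {"type": "array", "items": {"type": "string"}},
            },
            "required": ["summary", "keywords"],
        },
    },
}

# Stand-in for the JSON arguments a model would return with its tool call.
fake_arguments = '{"summary": "LangChain is a framework.", "keywords": ["LangChain"]}'

def record_summary(summary: str, keywords: list[str]) -> str:
    """The downstream function that receives the model's arguments."""
    return f"{summary} [{', '.join(keywords)}]"

args = json.loads(fake_arguments)
output = record_summary(**args)
print(output)
```

The model never runs `record_summary` itself. It only produces arguments that match the schema, and your code decides whether and how to call the function.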


### 5. Few-shot Prompting
```
Text: "LangChain is a framework for LLMs."
Output: {"summary": "LangChain is a framework.", "keywords": ["LangChain", "framework", "LLM"]}
```
- Including such examples in your prompt improves structured compliance.
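A small sketch of assembling such a prompt in code. The example pairs and the helper name `build_few_shot_prompt` are hypothetical. In practice the examples would come from your own data.

```python
import json

# Hypothetical example pairs: (input text, desired structured output).
EXAMPLES = [
    ("LangChain is a framework for LLMs.",
     {"summary": "LangChain is a framework.", "keywords": ["LangChain", "framework", "LLM"]}),
    ("Pydantic validates data using type hints.",
     {"summary": "Pydantic validates data.", "keywords": ["Pydantic", "validation"]}),
]

def build_few_shot_prompt(text: str) -> str:
    """Render each example as Text/Output lines, then leave Output open for the model."""
    parts = []
    for example_text, example_output in EXAMPLES:
        parts.append(f'Text: "{example_text}"\nOutput: {json.dumps(example_output)}')
    parts.append(f'Text: "{text}"\nOutput:')
    return "\n\n".join(parts)

prompt_text = build_few_shot_prompt("JSON is a data interchange format.")
print(prompt_text)
```

Serializing the examples with `json.dumps` keeps them valid JSON, so the model is never shown a malformed pattern to imitate.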


### 6. Post-processing (as fallback)
- Use regex or custom code to extract fields from unstructured output if needed.
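A minimal sketch of this fallback, assuming the reply contains a single JSON object surrounded by chatter. `extract_first_json_object` is a hypothetical helper, and `messy` is a hard-coded stand-in for model output.

```python
import json
import re
from typing import Optional

def extract_first_json_object(reply: str) -> Optional[dict]:
    """Find a {...} span in free text and try to decode it as JSON."""
    match = re.search(r"\{.*\}", reply, re.DOTALL)  # greedy: first "{" to last "}"
    if match is None:
        return None
    try:
        return json.loads(match.group(0))
    except json.JSONDecodeError:
        return None

messy = 'Here you go! {"summary": "A framework.", "keywords": ["LangChain"]} Hope that helps.'
extracted = extract_first_json_object(messy)
print(extracted)
```

The greedy match spans from the first `{` to the last `}`, so a reply containing several separate objects will fail to decode and return `None`. That brittleness is one reason to treat this approach as a last resort.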


**Best Practice:** Combine prompt engineering with LangChain’s output parsers and schemas for the most robust structured outputs. This ensures responses are machine-readable, reliable, and ready for downstream automation.

# TypedDict

TypedDict is a way to define a dictionary in Python where you specify what keys and values should exist. It helps ensure that your dictionary follows a specific structure.

### Why use TypedDict?

- It tells static type checkers (and human readers) which keys are expected and what types their values should have.
- It does not validate data at runtime (it just helps with type hints for better coding).

#### Example:
- simple TypedDict
- Annotated TypedDict
- Literal
- More complex — with pros and cons

### TypedDict in Python

**TypedDict** is a feature in Python’s `typing` module that allows you to specify the expected keys and value types of a dictionary, enabling type checking and better code clarity. It is especially useful when working with dictionaries that have a fixed schema, similar to records or objects.

#### Example Usage

```python
from typing import TypedDict

class Movie(TypedDict):
    title: str
    year: int
    rating: float

movie: Movie = {
    "title": "Inception",
    "year": 2010,
    "rating": 8.8
}
```
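The outline earlier also lists `Annotated` and `Literal`. A brief sketch of both on a hypothetical `Review` type (Python 3.9+), which also demonstrates that nothing is checked at runtime:

```python
from typing import Annotated, Literal, TypedDict, get_type_hints

class Review(TypedDict):
    # Annotated attaches a description; it does not change the underlying type.
    summary: Annotated[str, "A brief summary of the review"]
    # Literal restricts the value to a fixed set, as far as type checkers are concerned.
    sentiment: Literal["positive", "neutral", "negative"]

# No runtime validation: this wrong value is accepted when the code runs,
# although a static checker such as mypy would flag it.
bad: Review = {"summary": "Great phone", "sentiment": "amazing"}
print(type(bad))

# The Annotated metadata stays available for tools that want to read it.
hints = get_type_hints(Review, include_extras=True)
print(hints["summary"].__metadata__)
```

At runtime a `TypedDict` instance is a plain `dict`, which is why the invalid `sentiment` passes through unnoticed.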

#### Key Points

- Introduced in Python 3.8 (`typing.TypedDict`). On older Python versions, it is available through the `typing_extensions` backport.
- Helps with static type checking using type checkers like mypy or Pyright.
- **Optional fields**: Passing `total=False` makes every key declared in that class optional (Python 3.11+ also offers `typing.NotRequired` for marking individual keys):

```python
class User(TypedDict, total=False):
    name: str
    age: int
```

- Supports inheritance to extend schemas.
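A short sketch of inheritance combined with `total=False`, using hypothetical `BaseUser` and `FullUser` classes (Python 3.9+ for the introspection attributes). Keys from the base class stay required, while keys added in the `total=False` subclass are optional.

```python
from typing import TypedDict

class BaseUser(TypedDict):
    name: str          # required

class FullUser(BaseUser, total=False):
    age: int           # optional, because this class sets total=False
    email: str         # optional

user: FullUser = {"name": "Alice"}  # valid: only the required key is present

print(sorted(FullUser.__required_keys__))
print(sorted(FullUser.__optional_keys__))
```

This pattern is a common way to mix required and optional keys without needing `NotRequired`.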


#### When to Use TypedDict

- When you want type safety for dictionary objects with a known structure.
- When collaborating on codebases or using static analysis tools for bug prevention.

#### Learn More

- [Python Docs: TypedDict](https://docs.python.org/3/library/typing.html#typed-dict)
- [PEP 589 – TypedDict: Type Hints for Dictionaries](https://peps.python.org/pep-0589/)

**Summary:**  
`TypedDict` brings the advantages of static typing to Python dicts, making your code safer and clearer.

# Pydantic

Pydantic is a data validation and data parsing library for Python. It ensures that the data you work with is correct, structured, and type-safe.

### Pydantic Overview

[Pydantic](https://docs.pydantic.dev/) is a Python library for data validation and settings management using Python type annotations. It is widely used for creating data models with type safety, automatic parsing, and validation of input data.

#### Key Features

- **Data validation:** Ensures the data matches the specified types.
- **Type annotations:** Models are defined using standard Python type hints.
- **Automatic parsing:** Converts input types (e.g., strings to integers) if possible.
- **Error reporting:** Provides clear error messages for invalid data.
- **Nested models:** Supports complex, nested data structures.
- **Serialization:** Easily convert models to and from dictionaries and JSON.


#### Example Usage

```python
from pydantic import BaseModel, ValidationError

class User(BaseModel):
    id: int
    name: str
    is_active: bool = True

# Creating an instance (automatic type coercion)
user = User(id="123", name="Alice")
print(user.id)         # 123 (int)
print(user.is_active)  # True

# Validation error example: "abc" cannot be coerced to int
try:
    User(id="abc", name="Bob")
except ValidationError as e:
    print(e)
```


#### When to Use Pydantic

- **API request/response validation** (e.g., FastAPI)
- **Configuration management**
- **Structured data parsing**
- **Type-safe data pipelines**


#### Additional Resources

- [Pydantic Documentation](https://docs.pydantic.dev/)
- [Pydantic Tutorial](https://docs.pydantic.dev/latest/tutorial/)

**Summary:**  
Pydantic makes it easy to define, validate, and parse structured data in Python, leveraging type hints for robust, clear, and maintainable code.

# JSON in Python & LangChain Context

**JSON** (JavaScript Object Notation) is a lightweight, human-readable data format commonly used for data interchange between systems. In Python and frameworks like LangChain, JSON is used for structured output, configuration, and data exchange.


#### Key Points

- **Structure:** JSON supports objects (dictionaries), arrays (lists), strings, numbers, booleans, and null.
- **Serialization/Deserialization:**  
  - To encode Python objects as JSON: `json.dumps()`
  - To decode JSON strings to Python objects: `json.loads()`
- **Common Use Cases:**  
  - API communication  
  - Configuration files  
  - Structured outputs from LLMs

#### Example: Working with JSON in Python

```python
import json

data = {
    "name": "Adil",
    "age": 30,
    "active": True
}

# Serialize Python dict to JSON string
json_str = json.dumps(data)
print(json_str)  # {"name": "Adil", "age": 30, "active": true}

# Deserialize JSON string back to Python dict
parsed = json.loads(json_str)
print(parsed["name"])  # Adil
```

#### JSON in LangChain

- **Structured Output:**  
  Prompt LLMs to return output in JSON for easy parsing and automated processing.
- **With Output Parsers:**  
  Use LangChain’s output parsers to validate/parse JSON responses for robust workflows.

### Useful Links

- [Python json module docs](https://docs.python.org/3/library/json.html)
- [LangChain: Output Parsers (structured output)](https://python.langchain.com/docs/modules/model_io/output_parsers/)


**Summary:**  
JSON is essential for reliable, machine-readable data exchange and is especially useful when prompting LLMs for structured output or integrating with other systems.

# When to use what?

#### Use **TypedDict** if:
- You only need type hints (basic structure enforcement).
- You don’t need validation (e.g., checking numbers are positive).
- You trust the LLM to return correct data.

#### Use **Pydantic** if:
- You need data validation (e.g., sentiment must be "positive", "neutral", or "negative").
- You need default values if the LLM misses fields.
- You want automatic type conversion (e.g., "100" → 100).
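The three points above can be sketched with one hypothetical `Review` model. This assumes `pydantic` is installed, and the keyword arguments stand in for data decoded from a model reply.

```python
from typing import Literal
from pydantic import BaseModel, ValidationError

class Review(BaseModel):
    summary: str
    sentiment: Literal["positive", "neutral", "negative"]  # validated at runtime
    rating: int = 0                                         # default if the field is missing

# "rating" arrives as a string and is converted to int.
review = Review(summary="Great phone", sentiment="positive", rating="5")
print(review.rating)

# "rating" is missing, so the default applies.
missing = Review(summary="Fine", sentiment="neutral")
print(missing.rating)

# "amazing" is not one of the allowed sentiments, so validation fails.
rejected = False
try:
    Review(summary="Bad", sentiment="amazing")
except ValidationError:
    rejected = True
print(rejected)
```

A `TypedDict` with the same fields would accept the invalid sentiment silently, since its types are only checked statically.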

#### Use **JSON Schema** if:
- You don’t want to import extra Python libraries (Pydantic).
- You need validation but don’t need Python objects.
- You want to define structure in a standard JSON format.
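A sketch of a JSON Schema written as a plain Python dict, with hypothetical field names. The hand-rolled check covers only `required` and `enum` to keep the example stdlib-only. For real validation, a third-party validator such as the `jsonschema` package would be the usual choice.

```python
import json

# A JSON Schema document expressed as a Python dict; field names are hypothetical.
review_schema = {
    "title": "Review",
    "type": "object",
    "properties": {
        "summary": {"type": "string"},
        "sentiment": {"type": "string", "enum": ["positive", "neutral", "negative"]},
    },
    "required": ["summary", "sentiment"],
}

def check_against_schema(data: dict, schema: dict) -> list[str]:
    """Hand-rolled check of `required` and `enum` only; not a full validator."""
    errors = [f"missing field: {name}" for name in schema["required"] if name not in data]
    for name, rules in schema["properties"].items():
        if name in data and "enum" in rules and data[name] not in rules["enum"]:
            errors.append(f"{name} must be one of {rules['enum']}")
    return errors

print(json.dumps(review_schema, indent=2))  # language-neutral form of the schema
errors = check_against_schema({"summary": "ok", "sentiment": "amazing"}, review_schema)
print(errors)
```

Because the schema is ordinary JSON once serialized, the same document can be shared with services written in other languages.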

### Feature Comparison
| Feature               | TypedDict | Pydantic | JSON Schema |
|------------------------|------------|----------|--------------|
| Basic structure        | ✅         | ✅       | ✅           |
| Type enforcement       | ✅         | ✅       | ✅           |
| Data validation        | ❌         | ✅       | ✅           |
| Default values         | ❌         | ✅       | ❌           |
| Automatic conversion   | ❌         | ✅       | ❌           |
| Cross-language compatibility | ❌     | ❌       | ✅           |