# 2.5 Output Parsers in LangChain

## 🎯 Learning Objectives

**Output parsers** transform unstructured LLM text into structured Python objects. In this notebook, you'll learn:

1. **PydanticOutputParser** - Parse LLM output into Pydantic models
2. **JsonOutputParser** - Parse into Python dictionaries
3. **CommaSeparatedListOutputParser** - Parse into Python lists

## 💡 Why Use Output Parsers?

| Without Parser | With Parser |
|---------------|-------------|
| `"The sentiment is positive and the topic is AI"` | `{"sentiment": "positive", "topic": "AI"}` |
| Manual string parsing | Automatic type validation |
| Error-prone | Reliable and consistent |
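To make the contrast in the table concrete, here is a minimal standard-library sketch. The regular expression and the rephrased sentence are illustrative assumptions; nothing here uses LangChain yet.

```python
import json
import re

free_text = "The sentiment is positive and the topic is AI"
json_text = '{"sentiment": "positive", "topic": "AI"}'

# Manual approach: a hand-written pattern that only fits this exact phrasing.
pattern = re.compile(r"sentiment is (\w+) and the topic is (\w+)")
match = pattern.search(free_text)
manual = {"sentiment": match.group(1), "topic": match.group(2)}

# The same pattern finds nothing once the model rephrases its answer.
rephrased = "Topic: AI. Overall the tone seems positive."
rephrased_match = pattern.search(rephrased)

# Structured approach: the model is asked for JSON, so parsing is one call.
structured = json.loads(json_text)

print(manual)
print(rephrased_match)
print(structured)
```

Both approaches recover the same dictionary from the first sentence, but only the structured one survives a change in wording.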

## 🔑 Key Concept

Output parsers work in two steps:
1. **Inject format instructions** into the prompt
2. **Parse the LLM response** into structured data
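The two steps can be sketched without any LLM call. `TinyJsonParser` below is a hypothetical stand-in written for this illustration, not a LangChain class, and the canned response replaces a real model reply.

```python
import json


class TinyJsonParser:
    """Hypothetical stand-in for an output parser (not part of LangChain)."""

    def __init__(self, keys):
        self.keys = list(keys)

    def get_format_instructions(self) -> str:
        # Step 1: text that gets appended to the prompt.
        return "Respond only with a JSON object containing the keys: " + ", ".join(self.keys)

    def parse(self, text: str) -> dict:
        # Step 2: turn the raw response into a dict and check the expected keys.
        data = json.loads(text)
        missing = [key for key in self.keys if key not in data]
        if missing:
            raise ValueError(f"Missing keys: {missing}")
        return data


tiny_parser = TinyJsonParser(["sentiment", "topic"])
prompt_text = "Classify this review: 'Great AI course!'\n" + tiny_parser.get_format_instructions()

# Canned response standing in for a real LLM call.
fake_response = '{"sentiment": "positive", "topic": "AI"}'
result = tiny_parser.parse(fake_response)

print(prompt_text)
print(result)
```

The LangChain parsers covered in this notebook follow the same shape: `get_format_instructions()` feeds the prompt and `parse()` handles the reply.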

---

## 📦 Installation (Run Once)

## 🔐 Environment Setup

#### Enter your OpenAI key here

You can get a key from [here](https://platform.openai.com/api-keys) after creating an account or signing in.

In [None]:
# from getpass import getpass

# OPENAI_KEY = getpass('Please enter your Open AI API Key here: ')

In [None]:
# import os

# os.environ['OPENAI_API_KEY'] = OPENAI_KEY

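## 🤖 Initialize the LLM

The cell below picks an LLM through the helper factory functions: Groq on Windows and Linux, and a Databricks-hosted model on macOS. To use an OpenAI model such as GPT-4o-mini instead, swap in `get_openai_llm`. More capable models generally produce more reliable structured output.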

In [4]:
# ============================================================================
# ENVIRONMENT SETUP: Load API Keys & Import Dependencies
# ============================================================================
# We use python-dotenv to securely load API keys from a .env file
# This is a best practice - never hardcode API keys in your notebooks!
# ============================================================================

from dotenv import load_dotenv
import os
import sys
import platform

# Load environment variables from .env file
load_dotenv()

# Add parent directory to path for importing helpers
sys.path.append(os.path.abspath("../.."))

# Import our LLM factory functions
# - get_groq_llm(): Creates a Groq-hosted LLM (fast inference with open-source models)
# - get_openai_llm(): Creates an OpenAI GPT model
# - get_databricks_llm(): Creates a Databricks-hosted LLM
from helpers.utils import get_groq_llm, get_openai_llm, get_databricks_llm

print("✅ Environment variables loaded successfully!")
print(f"📍 Running on: {platform.system()}")

# -----------------------------------------------------------------------------
# Initialize the LLM based on platform or preference
# The choice of LLM affects tool calling capabilities and speed
# -----------------------------------------------------------------------------
if sys.platform == "win32":
    # Windows: Use Groq for fast inference
    llm = get_groq_llm()
elif sys.platform == "darwin":
    # macOS: Use a Databricks-hosted model
    llm = get_databricks_llm("databricks-gpt-5-1")
else:
    # Linux: Default to Groq
    llm = get_groq_llm()

# Print which LLM we're using
if hasattr(llm, 'model_name'):
    print(f"🤖 LLM initialized: {llm.model_name}")
elif hasattr(llm, 'model'):
    print(f"🤖 LLM initialized: {llm.model}")
else:
    print("🤖 LLM initialized successfully")

✅ Environment variables loaded successfully!
📍 Running on: Darwin
🤖 LLM initialized: databricks-gpt-5-1


---

## 📊 Overview of Output Parsers

LangChain provides several output parsers for different use cases:

| Parser | Output Type | Best For |
|--------|-------------|----------|
| `PydanticOutputParser` | Pydantic Model | Complex schemas with validation |
| `JsonOutputParser` | Python Dict | Simple JSON structures |
| `CommaSeparatedListOutputParser` | Python List | Lists of items |
| `StrOutputParser` | String | Simple text extraction |


---

## 🔷 Parser 1: PydanticOutputParser

The most powerful parser - converts LLM output into **validated Pydantic models**.

**Advantages:**
- Type checking and validation
- IDE autocomplete support
- Clear schema definition
- Automatic error messages

> **Note:** Use a capable LLM (GPT-4o-mini or better) for reliable structured output.

In [5]:
# ============================================================================
# STEP 1: DEFINE YOUR DATA MODEL
# ============================================================================
# Pydantic models define the structure of your expected output
# Field() adds descriptions that help the LLM understand what to generate
# ============================================================================

from langchain_core.prompts import PromptTemplate, ChatPromptTemplate
from langchain_core.output_parsers import PydanticOutputParser
from pydantic import BaseModel, Field
from typing import List

# Define the output structure using Pydantic
class QueryResponse(BaseModel):
    """Structured response for topic analysis"""
    description: str = Field(description="A brief description of the topic asked by the user")
    pros: str = Field(description="3 bullet points showing the pros of the topic")
    cons: str = Field(description="3 bullet points showing the cons of the topic")
    conclusion: str = Field(description="One line conclusion of the topic")

# Create the parser from the Pydantic model
parser = PydanticOutputParser(pydantic_object=QueryResponse)

print("✅ PydanticOutputParser created")
print(f"📋 Expected fields: {list(QueryResponse.model_fields.keys())}")

✅ PydanticOutputParser created
📋 Expected fields: ['description', 'pros', 'cons', 'conclusion']


In [6]:
# ============================================================================
# STEP 2: VIEW FORMAT INSTRUCTIONS
# ============================================================================
# The parser auto-generates instructions that tell the LLM how to format output
# These instructions are injected into your prompt!
# ============================================================================

print("üìú Format Instructions (injected into prompt):")
print("=" * 60)
print(parser.get_format_instructions())
print("=" * 60)

üìú Format Instructions (injected into prompt):
The output should be formatted as a JSON instance that conforms to the JSON schema below.

As an example, for the schema {"properties": {"foo": {"title": "Foo", "description": "a list of strings", "type": "array", "items": {"type": "string"}}}, "required": ["foo"]}
the object {"foo": ["bar", "baz"]} is a well-formatted instance of the schema. The object {"properties": {"foo": ["bar", "baz"]}} is not well-formatted.

Here is the output schema:
```
{"description": "Structured response for topic analysis", "properties": {"description": {"description": "A brief description of the topic asked by the user", "title": "Description", "type": "string"}, "pros": {"description": "3 bullet points showing the pros of the topic", "title": "Pros", "type": "string"}, "cons": {"description": "3 bullet points showing the cons of the topic", "title": "Cons", "type": "string"}, "conclusion": {"description": "One line conclusion of the topic", "title": "Concl

In [9]:
# ============================================================================
# STEP 3: CREATE THE CHAIN
# ============================================================================
# The chain: Prompt → LLM → Parser
# - partial_variables injects the format instructions automatically
# - The parser converts the LLM's text output into a Pydantic object
# ============================================================================

prompt_txt = """
Answer the user query and generate the response based on the following formatting instructions.

Format Instructions:
{format_instructions}

Query:
{query}
"""

prompt = PromptTemplate(
    template=prompt_txt,
    input_variables=["query"],
    partial_variables={"format_instructions": parser.get_format_instructions()}
)

# Build the LCEL chain
chain = prompt | llm | parser

# Test with a simple query
response = chain.invoke({"query": "What is the capital of France?"})

print("üéØ Parsed Response:")
print(f"   Type: {type(response)}")
print(f"   Description: {response.description}...")

üéØ Parsed Response:
   Type: <class '__main__.QueryResponse'>
   Description: The user is asking for the capital city of France, which is Paris....


In [None]:
# ============================================================================
# USING THE PARSED RESPONSE
# ============================================================================

question = "Tell me about Commercial Real Estate"
response = chain.invoke({"query": question})

print(f"üìù Query: {question}")
print("-" * 50)

In [None]:
# Access fields as Python attributes (type-safe!)
print("üìã Description:")
print(response.description)

In [None]:
print("\n✅ Pros:")
print(response.pros)

In [None]:
# Convert to dictionary for JSON serialization
print("\nüì¶ As Dictionary:")
response.model_dump()

In [None]:
# Pretty print all fields
print("üìã All Fields:")
print("=" * 50)
for k, v in response.model_dump().items():
    print(f"\nüîπ {k.upper()}:")
    print(f"   {v}")

---

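## 🔷 Parser 2: JsonOutputParser

Similar to PydanticOutputParser, but it returns a **Python dictionary** instead of a Pydantic model. The Pydantic class is used only to generate the format instructions; the returned dictionary is not validated against it.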

**When to use:**
- When you need a simple dict, not a Pydantic model
- For dynamic schemas
- When you want flexibility over strict typing


In [None]:
from typing import List

from langchain_core.prompts import PromptTemplate
from langchain_core.output_parsers import JsonOutputParser
from pydantic import BaseModel, Field

# Define your desired data structure - like a python data class.
class QueryResponse(BaseModel):
    description: str = Field(description="A brief description of the topic asked by the user")
    pros: str = Field(description="3 bullet points showing the pros of the topic asked by the user")
    cons: str = Field(description="3 bullet points showing the cons of the topic asked by the user")
    conclusion: str = Field(description="One line conclusion of the topic asked by the user")

# Set up a parser + inject instructions into the prompt template.
parser = JsonOutputParser(pydantic_object=QueryResponse)
parser

In [None]:
# create the final prompt with formatting instructions from the parser
prompt_txt = """
             Answer the user query and generate the response based on the following formatting instructions

             Format Instructions:
             {format_instructions}

             Query:
             {query}
            """
prompt = PromptTemplate(
    template=prompt_txt,
    input_variables=["query"],
    partial_variables={"format_instructions": parser.get_format_instructions()},
)

print(prompt)

In [None]:
# Create a simple LCEL chain: build the prompt, pass it to the LLM,
# then parse the response with the JSON parser.
# Uses the `llm` object initialized at the top of the notebook.
chain = prompt | llm | parser
chain

In [None]:
topic_queries = [
    "Tell me about commercial real estate",
    "Tell me about Generative AI"
]

topic_queries_formatted = [{"query": topic}
                    for topic in topic_queries]
topic_queries_formatted

In [None]:
responses = chain.map().invoke(topic_queries_formatted)

In [None]:
responses[0], type(responses[0])

In [None]:
import pandas as pd

df = pd.DataFrame(responses)
df

In [None]:
for response in responses:
  for k,v in response.items():
    print(f"{k}:\n{v}\n")
  print('-----')

---

## 🔷 Parser 3: CommaSeparatedListOutputParser

The simplest parser - converts comma-separated text into a **Python list**.

**When to use:**
- Generating lists of items
- Simple enumerations
- Tags or categories

In [None]:
# ============================================================================
# COMMASEPARATEDLISTOUTPUTPARSER
# ============================================================================

from langchain_core.output_parsers import CommaSeparatedListOutputParser
from langchain_core.prompts import PromptTemplate

output_parser = CommaSeparatedListOutputParser()

# View the format instructions
format_instructions = output_parser.get_format_instructions()
print("üìú Format Instructions:")
print(format_instructions)

In [None]:
# A query intended to prompt a language model to populate the list.
prompt_txt = """
             Create a list of 5 different ways in which Generative AI can be used

             Output format instructions:
             {format_instructions}
             """

# The template has no user-supplied variables; the format instructions
# (from the previous cell) are filled in up front through partial_variables.
prompt = PromptTemplate(
    template=prompt_txt,
    input_variables=[],
    partial_variables={"format_instructions": format_instructions},
)
print(prompt)

In [None]:
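# Create a simple LCEL chain: prompt, then LLM, then parser.
# Uses the `llm` object initialized at the top of the notebook.
llm_chain = prompt | llm | output_parser

# Run the chain; the prompt has no input variables, so pass an empty dict
response = llm_chain.invoke({})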

In [None]:
print("üìã Parsed List:")
print(response)

In [None]:
# Access list items
print("\nüìù List Items:")
for i, item in enumerate(response, 1):
    print(f"  {i}. {item}")

In [None]:
print(f"\nüìä Type: {type(response)}")  # <class 'list'>

# ============================================================================
# 📝 KEY TAKEAWAYS FROM THIS NOTEBOOK:
# ============================================================================
# 1. Output parsers convert unstructured LLM text → structured Python objects
# 2. PydanticOutputParser: Best for complex schemas with validation
# 3. JsonOutputParser: Returns Python dictionaries
# 4. CommaSeparatedListOutputParser: Returns Python lists
# 5. Use .get_format_instructions() to see what's injected into prompts
# 6. Chain: prompt | llm | parser for clean LCEL integration
# ============================================================================