# Output Parsers
Output parsers take the raw output of an LLM and transform it into a more suitable format. This is especially useful when you use LLMs to generate structured data.

# What is an Output Parser?
An output parser takes the raw text output from an LLM and processes it into structured data formats, such as lists, JSON, dates, or even custom data structures. This is valuable in NLP workflows where parsed data can be processed, analyzed, or integrated into applications.
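
To make the idea concrete, here is a minimal sketch of two hand-written parsers, one for a comma-separated list and one for a date. The function names and the sample replies are hypothetical and use only the standard library; they are not part of LangChain.

```python
from datetime import date, datetime


def parse_comma_list(text: str) -> list[str]:
    """Split a comma-separated reply into a list of trimmed, non-empty items."""
    return [item.strip() for item in text.split(",") if item.strip()]


def parse_iso_date(text: str) -> date:
    """Parse a reply such as '2024-03-15' into a date object."""
    return datetime.strptime(text.strip(), "%Y-%m-%d").date()


# Hypothetical raw replies, standing in for real LLM output
colors = parse_comma_list("red, green , blue,")
released = parse_iso_date(" 2024-03-15\n")

print(colors)    # ['red', 'green', 'blue']
print(released)  # 2024-03-15
```

Both functions raise or return a typed value, so the rest of the application never has to touch the raw string.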

# Why Use Output Parsers?
Consistency: Ensures model outputs are always in the same format.

Usability: Structured data is easier to manipulate and integrate.

Error Handling: Parse failures surface as exceptions, which makes retry or error-correction mechanisms possible.

# Parsing JSON Output
When an LLM returns a JSON string, the JsonOutputParser class can parse it into a Python dictionary:



In [1]:
from langchain_core.output_parsers import JsonOutputParser

# Assume you have an LLM output in JSON format
llm_output = '{"name": "John", "age": 30, "occupation": "Developer"}'

# Create an instance of the JsonOutputParser
parser = JsonOutputParser()

# Parse the LLM output
parsed_output = parser.parse(llm_output)

print(parsed_output)  # Output: {'name': 'John', 'age': 30, 'occupation': 'Developer'}

{'name': 'John', 'age': 30, 'occupation': 'Developer'}
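
Real model replies are often less tidy than the string above: the JSON may be wrapped in a Markdown code fence or surrounded by a sentence of prose. The sketch below shows one way to pre-clean such a reply with only the standard library. The helper `extract_json`, the sample reply, and the fence-stripping heuristic are assumptions made for illustration, not LangChain's own logic.

```python
import json
import re

FENCE = "`" * 3  # a Markdown code fence, built this way to keep the example readable


def extract_json(text: str) -> dict:
    """Return the JSON object embedded in text, with or without a code fence."""
    # Prefer the contents of a fenced block if one is present
    fenced = re.search(FENCE + r"(?:json)?\s*(.*?)" + FENCE, text, re.DOTALL)
    candidate = fenced.group(1) if fenced else text
    # Keep only the outermost pair of braces to skip any surrounding prose
    start, end = candidate.find("{"), candidate.rfind("}")
    if start == -1 or end == -1:
        raise ValueError("no JSON object found in model output")
    return json.loads(candidate[start:end + 1])


# Hypothetical chatty reply standing in for real LLM output
raw = "Sure! Here is the record:\n" + FENCE + 'json\n{"name": "John", "age": 30}\n' + FENCE
print(extract_json(raw))  # {'name': 'John', 'age': 30}
```

The heuristic assumes a single JSON object per reply; a reply containing several objects would need a stricter extraction rule.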


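# Parsing XML Output

LangChain provides an XMLOutputParser for XML output. The example below parses a well-formed XML string directly with defusedxml's ElementTree, which guards against malicious XML payloads: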

In [6]:
pip install defusedxml



In [11]:
from defusedxml import ElementTree

# Well-formed XML string for testing
xml_data = "<response><data>Some model output data here</data></response>"
root = ElementTree.fromstring(xml_data)
print(root.find('data').text)


Some model output data here
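
Once the XML is parsed, it is often convenient to turn the element tree into plain Python structures. The following sketch does this recursively. It uses the standard library's `xml.etree.ElementTree` so that it runs without extra packages (for untrusted model output, `defusedxml.ElementTree` is the safer choice), and `element_to_dict` and the sample XML are hypothetical names and data chosen for illustration.

```python
from xml.etree import ElementTree  # stdlib; swap in defusedxml.ElementTree for untrusted input


def element_to_dict(element):
    """Recursively convert an element to {tag: text} or {tag: [children...]}."""
    children = list(element)
    if not children:
        # Leaf element: keep its text, treating missing text as an empty string
        return {element.tag: (element.text or "").strip()}
    return {element.tag: [element_to_dict(child) for child in children]}


# Hypothetical XML reply standing in for real LLM output
xml_data = "<movies><film><title>Big</title><year>1988</year></film></movies>"
root = ElementTree.fromstring(xml_data)
print(element_to_dict(root))
# {'movies': [{'film': [{'title': 'Big'}, {'year': '1988'}]}]}
```

Attributes are ignored in this sketch; a fuller version would also copy `element.attrib`.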


# Parsing YAML Output

The JsonOutputParser expects JSON, so YAML output is handled with the yaml library (PyYAML). There are two options:

* If you’re dealing with YAML output, you can load it directly without converting it to JSON:

In [16]:
import yaml

# Your YAML output string
llm_output = """name: John
age: 30
occupation: Developer"""

# Load YAML to a dictionary
parsed_output = yaml.safe_load(llm_output)

print(parsed_output)  # Output: {'name': 'John', 'age': 30, 'occupation': 'Developer'}


{'name': 'John', 'age': 30, 'occupation': 'Developer'}


* Alternatively, you can convert the YAML to a proper JSON string first and then pass it to the JsonOutputParser:

In [17]:
import json

import yaml
from langchain_core.output_parsers import JsonOutputParser

# YAML-formatted LLM output that should go through the JSON parser
llm_output = """name: John
age: 30
occupation: Developer"""

# Load the YAML into a dictionary
data_dict = yaml.safe_load(llm_output)

# Convert the dictionary to a JSON string
json_output = json.dumps(data_dict)

# Initialize the parser
parser = JsonOutputParser()

# Parse the JSON string
parsed_output = parser.parse(json_output)

print(parsed_output)  # Output: {'name': 'John', 'age': 30, 'occupation': 'Developer'}


{'name': 'John', 'age': 30, 'occupation': 'Developer'}


# PydanticOutputParser
The PydanticOutputParser uses Pydantic models to enforce a specific structure for the language model’s output. Pydantic is a Python library for data validation and parsing, and it lets you define expected fields, types, and validation rules for your data. By using PydanticOutputParser, you can ensure that responses from the language model meet your required schema.

In [9]:
from langchain_core.output_parsers import PydanticOutputParser
from pydantic import BaseModel, Field

# Define the expected structure of the response
class Action(BaseModel):
    action: str = Field(description="action to take")
    action_input: str = Field(description="input to the action")

# Initialize the PydanticOutputParser with the Action model
parser = PydanticOutputParser(pydantic_object=Action)

# Example response to parse
response = '{"action": "search", "action_input": "leo di caprio girlfriend"}'
parsed_output = parser.parse(response)

print(parsed_output)
# Output: action='search' action_input='leo di caprio girlfriend'


action='search' action_input='leo di caprio girlfriend'
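
To show what schema enforcement means in practice, here is a rough standard-library sketch of the same idea using a dataclass and explicit checks. It is not Pydantic and not how PydanticOutputParser is implemented; the helper `parse_action` is a hypothetical name, and the checks cover only the two string fields of the Action example.

```python
import json
from dataclasses import dataclass


@dataclass
class Action:
    action: str
    action_input: str


def parse_action(text: str) -> Action:
    """Build an Action from JSON text, raising ValueError if the schema is not met."""
    data = json.loads(text)
    for name in ("action", "action_input"):
        if name not in data:
            raise ValueError(f"missing field: {name}")
        if not isinstance(data[name], str):
            raise ValueError(f"field {name} must be a string")
    return Action(action=data["action"], action_input=data["action_input"])


print(parse_action('{"action": "search", "action_input": "leo di caprio girlfriend"}'))

try:
    parse_action('{"action": "search"}')  # action_input is absent
except ValueError as error:
    print("Rejected:", error)  # Rejected: missing field: action_input
```

Pydantic performs this kind of checking from the model definition alone, which is why the parser only needs the Action class.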


In [8]:
#pip install -U langchain-openai

# RetryOutputParser
The RetryOutputParser is used when the language model’s output does not match the expected structure and parsing fails (for example, when a field is missing). It wraps another parser: when that parser raises an error, it makes a second language model call, passing the original prompt together with the failed completion, and asks the model to try again. Because it needs the original prompt, you call parse_with_prompt rather than parse.

In [11]:
from langchain.output_parsers import RetryOutputParser
from langchain_core.output_parsers import PydanticOutputParser
from langchain_core.prompts import PromptTemplate
from langchain_openai import OpenAI
from pydantic import BaseModel, Field

# Define the Action model to parse the response into structured fields
class Action(BaseModel):
    action: str = Field(description="action to take")
    action_input: str = Field(description="input to the action")

# Initialize the PydanticOutputParser with the Action model
parser = PydanticOutputParser(pydantic_object=Action)

# Define a prompt template
prompt_template = PromptTemplate(
    template="Based on the user question, provide an Action and Action Input for what step should be taken.\n{format_instructions}\nQuestion: {query}\nResponse:",
    input_variables=["query"],
    partial_variables={"format_instructions": parser.get_format_instructions()},
)

# Format the prompt with a user query
prompt_value = prompt_template.format_prompt(query="who is leo di caprio's girlfriend?")

# Initialize the RetryOutputParser with a language model for retry attempts
retry_parser = RetryOutputParser.from_llm(parser=parser, llm=OpenAI(temperature=0))

# Use parse_with_prompt instead of parse
try:
    # "bad_response" simulates an incomplete or incorrect response that caused the initial parse to fail
    bad_response = '{"action": "search"}'
    parsed_output = retry_parser.parse_with_prompt(bad_response, prompt_value)
    print(parsed_output)
except Exception as e:
    print("Parse failed:", e)


action='search' action_input='leo di caprio girlfriend'
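
The retry mechanism itself can be sketched without LangChain or a real model. In the sketch below, `parse_action_dict`, `parse_with_retry`, `fake_llm`, and the wording of the retry prompt are all hypothetical; the code only illustrates the control flow of parsing, failing, re-prompting, and parsing again.

```python
import json
from typing import Callable

REQUIRED_FIELDS = ("action", "action_input")


def parse_action_dict(text: str) -> dict:
    """Parse JSON and require every expected field, raising ValueError otherwise."""
    data = json.loads(text)
    missing = [name for name in REQUIRED_FIELDS if name not in data]
    if missing:
        raise ValueError(f"missing fields: {missing}")
    return data


def parse_with_retry(completion: str, prompt: str, llm: Callable[[str], str],
                     max_retries: int = 1) -> dict:
    """Try to parse; on failure, show the model its prompt and bad completion and ask again."""
    for attempt in range(max_retries + 1):
        try:
            return parse_action_dict(completion)
        except ValueError as error:
            if attempt == max_retries:
                raise  # out of retries: surface the last parse error
            retry_prompt = (
                f"Prompt:\n{prompt}\nCompletion:\n{completion}\n"
                f"The completion failed with: {error}\nPlease try again:"
            )
            completion = llm(retry_prompt)


def fake_llm(retry_prompt: str) -> str:
    """Hypothetical stand-in for a real model call; always returns a complete answer."""
    return '{"action": "search", "action_input": "leo di caprio girlfriend"}'


result = parse_with_retry('{"action": "search"}', "who is leo di caprio's girlfriend?", fake_llm)
print(result)  # {'action': 'search', 'action_input': 'leo di caprio girlfriend'}
```

Capping the number of retries matters in practice, since every retry is an extra model call and a model that keeps failing would otherwise loop indefinitely.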
