# Managing Outputs with Output Parsers


In a production setting, it is often desirable to receive outputs from language models in a predictable data structure. Consider, for instance, developing a thesaurus application that generates a collection of alternative words relevant to a given context. Large language models (LLMs) can generate numerous suggestions for synonyms or similar terms. Below is an example of output from ChatGPT listing several words closely related to “behavior.”

    Here are some substitute words for "behavior":  
      
    Conduct  
    Manner  
    Demeanor  
    Attitude  
    Disposition  
    Deportment  
    Etiquette  
    Protocol  
    Performance  
    Actions

The challenge is that there is no reliable way to extract the relevant information from this free-form text. One could split the response on newlines and discard the introductory line. However, this approach is fragile because there is no guarantee that responses will keep a consistent format: the list might be numbered, or it might not include an introductory line at all.
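A small sketch makes the fragility concrete. The hypothetical `naive_parse` helper below implements the split-and-skip approach. It works on a response shaped like the one above, but when the model returns a numbered list without an introduction, it silently drops a real suggestion and keeps the numbering. Both sample responses are written by hand for illustration.

```python
from typing import List


def naive_parse(response: str) -> List[str]:
    """Hypothetical fragile parser: split on newlines and skip the first line."""
    lines = [line.strip() for line in response.strip().splitlines()]
    # Drop the assumed introductory line and any blank lines.
    return [line for line in lines[1:] if line]


with_intro = 'Here are some substitute words for "behavior":\n\nConduct\nManner\nDemeanor'
numbered_no_intro = "1. Conduct\n2. Manner\n3. Demeanor"

print(naive_parse(with_intro))         # ['Conduct', 'Manner', 'Demeanor']
print(naive_parse(numbered_no_intro))  # ['2. Manner', '3. Demeanor']
```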

Output Parsers enable us to define a data structure that precisely describes what is expected from the model. In a word suggestion application, you might request a list of words or a combination of different variables, such as a word and an explanation.

Structured outputs can also be enforced through APIs, such as those provided by OpenAI models, where the model can be prompted to generate outputs following a predefined schema. For instance, you can specify a JSON schema or use a Pydantic model to ensure that the outputs conform to the expected structure, making it easier to integrate into applications that require predictable data formats. This capability will be covered in more detail in the book, where we will explore practical methods to structure and validate outputs using these techniques.
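As a sketch of the schema side, assuming Pydantic is installed, the model below (the same `Suggestions` shape used later in this section) can be exported as a JSON schema. How such a schema is passed to a particular provider’s structured-output API differs between providers and is not shown here.

```python
import json
from typing import List

from pydantic import BaseModel, Field


class Suggestions(BaseModel):
    words: List[str] = Field(description="list of substitute words based on context")


# Pydantic v2 exposes model_json_schema(); Pydantic v1 uses schema().
if hasattr(Suggestions, "model_json_schema"):
    schema = Suggestions.model_json_schema()
else:
    schema = Suggestions.schema()

print(json.dumps(schema, indent=2))
```

The field description ends up in the schema, which is one reason to always write a meaningful `description`.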

The Pydantic parser is versatile and has three unique types. However, other options are also available for less complex tasks.

**Note:**  The thesaurus application will serve as a practical example to clarify the nuances of each approach.

### PydanticOutputParser

This class instructs the model to produce its output in JSON format and then parses that JSON into a Pydantic object. Fields declared as lists can be indexed directly, which eliminates the formatting issues described above.

> 💡Not all models are equally capable of generating valid JSON output. For the best results, use a more powerful model, such as the most recent models from OpenAI or Anthropic.
This wrapper uses the Pydantic library to define and validate data structures in Python. It lets you specify the expected output structure, including each field’s name, type, and description. In the thesaurus application, for instance, one variable must hold multiple suggestions, i.e., a list. This is achieved by creating a class that inherits from Pydantic’s `BaseModel` class.

Set the `OPENAI_API_KEY` environment variable to your OpenAI API key.

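```python
import os

# If you have the `langchain_custom_utils` helper module, you can use:
#   from langchain_custom_utils.helper import get_openai_api_key
#   OPENAI_API_KEY = get_openai_api_key()
# Otherwise, read the key directly from the environment:
OPENAI_API_KEY = os.environ["OPENAI_API_KEY"]
```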

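```python
# Classic LangChain import paths; newer releases provide ChatOpenAI in the `langchain_openai` package.
from langchain.chat_models import ChatOpenAI
from langchain.chains import LLMChain
from langchain.prompts import PromptTemplate
from langchain.output_parsers import PydanticOutputParser
```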

```python
from pydantic import BaseModel, Field, validator
from typing import List
```

```python
model_name = 'gpt-3.5-turbo'
temperature = 0.0
model = ChatOpenAI(model_name=model_name, temperature=temperature)
```

#### Pydantic OutputParser Example

```python
# Define the desired data structure.
class Suggestions(BaseModel):
    words: List[str] = Field(description="list of substitute words based on context")

    # Raise an error if the API returns a numbered list
    @validator('words')
    def not_start_with_number(cls, field):
        # `field` is the whole list, so check the first character of every word
        for word in field:
            if word and word[0].isnumeric():
                raise ValueError("The word cannot start with numbers!")
        return field
```

```python
parser = PydanticOutputParser(pydantic_object=Suggestions)
```

Import the necessary libraries and create the `Suggestions` schema class, which consists of two components:

1.  **Expected Outputs:** Each output is defined by declaring a variable with the desired type, such as a list of strings (`: List[str]`) in the example code. Alternatively, it could be a single string (`: str`) for cases expecting a single word or sentence as the response. Always provide a brief description through the `Field` function’s `description` argument; it is included in the format instructions and guides the model during inference. (An illustration of handling multiple outputs will be presented later in the book.)
2.  **Validators:** We can declare functions to validate the formatting. For instance, the provided code includes a validator that rejects words starting with a number, a sign that the model returned a numbered list. The function’s name is not critical, but the `@validator` decorator must be applied to the variable requiring validation (e.g., `@validator('words')`). Note that if the variable is declared as a list, the `field` argument within the validator function will also be a list.

We then pass the created class to the `PydanticOutputParser` wrapper to turn it into a LangChain parser object. The next step is to prepare the prompt.

```python
template = """
Offer a list of suggestions to substitute the specified target_word based on the presented context.
{format_instructions}
target_word={target_word}
context={context}
"""
```

```python
target_word = "behaviour"
context = "The behaviour of the students in the classroom was disruptive and made it difficult for the teacher to conduct the lesson."
```

```python
prompt_template = PromptTemplate(
    template=template,
    input_variables=["target_word", "context"],
    partial_variables={"format_instructions": parser.get_format_instructions()}
)
```

The `template` variable is a string containing named placeholders in the `{variable_name}` format. It defines the prompt for the model, combining the inputs with the formatting expected by the output parser (the `{format_instructions}` placeholder is replaced by the instructions the parser generates). `PromptTemplate` takes the template string and specifies how each placeholder is filled. Placeholders fall into two categories: `input_variables`, whose values are supplied later (for example, through the `.format_prompt()` method or when the chain runs), and `partial_variables`, whose values are set immediately.
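As a rough plain-Python analogy (not LangChain’s implementation), a partial variable is bound once when the template is built, while input variables are supplied later for each request. The sketch uses only `str.format` and `functools.partial`; the `render` helper and the `FORMAT_INSTRUCTIONS` text are hypothetical stand-ins, the latter for whatever `parser.get_format_instructions()` returns.

```python
from functools import partial

template = """
Offer a list of suggestions to substitute the specified target_word based on the presented context.
{format_instructions}
target_word={target_word}
context={context}
"""

# Hypothetical stand-in for the text returned by parser.get_format_instructions().
FORMAT_INSTRUCTIONS = 'Answer with JSON like {"words": ["..."]}.'


def render(template: str, **values: str) -> str:
    """Fill every {placeholder} in the template with the given values."""
    return template.format(**values)


# "Partial variable": bound once, when the prompt template is created.
render_prompt = partial(render, template, format_instructions=FORMAT_INSTRUCTIONS)

# "Input variables": supplied later, once per request.
prompt = render_prompt(
    target_word="behaviour",
    context="The behaviour of the students was disruptive.",
)
print(prompt)
```

Braces inside the bound value are safe here because `str.format` does not re-parse substituted values.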

To query models like GPT, the prompt is passed to LangChain’s `ChatOpenAI` wrapper through an `LLMChain`. (Remember to set the `OPENAI_API_KEY` environment variable to your OpenAI API key.) Setting the temperature to 0 makes the outputs as consistent and reproducible as possible, although minor variations between runs can still occur.

> 💡For OpenAI models, the temperature ranges from 0 to 2, and higher values make the output more random and “creative.” In production, a larger value is a good choice for tasks that require creative output.
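To see why a higher temperature yields more varied output, here is a sketch of the standard temperature-scaled softmax: logits are divided by the temperature T before normalizing, so T < 1 sharpens the distribution toward the top token and T > 1 flattens it. The logit values are made up, the code shows the general sampling idea rather than OpenAI’s internal implementation, and T = 0 is treated as the greedy limit case.

```python
import math
from typing import List


def softmax_with_temperature(logits: List[float], temperature: float) -> List[float]:
    """Return token probabilities after dividing the logits by the temperature."""
    if temperature <= 0:
        # Limit case: all probability mass on the highest-scoring token (greedy decoding).
        best = max(range(len(logits)), key=lambda i: logits[i])
        return [1.0 if i == best else 0.0 for i in range(len(logits))]
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.5]  # hypothetical scores for three candidate words
for t in (0.0, 0.5, 1.0, 2.0):
    probs = softmax_with_temperature(logits, t)
    print(t, [round(p, 3) for p in probs])
```

As T grows, the top word’s probability shrinks and the alternatives become more likely to be sampled.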

```python
chain = LLMChain(llm=model, prompt=prompt_template)
```

```python
# Run the LLMChain to get the AI-generated answer
output = chain.run({"target_word": target_word, "context": context})
```

```python
parser.parse(output)
```

The parser object’s `parse()` method converts the model’s string response into the structure we specified: an instance of `Suggestions` whose `words` attribute is a list. You can index into this list and use the words in your applications. Notice how simple it is to access the third suggestion with `.words[2]`, instead of dealing with a lengthy string that requires extensive preprocessing, as in the initial example.
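The following stand-in skips the API call and LangChain entirely. Assuming Pydantic is installed, it validates a hand-written sample response against the same `Suggestions` schema, mirroring the JSON-then-validate step that `parser.parse()` performs, and then indexes the result. The sample response is hypothetical, not real model output.

```python
import json
from typing import List

from pydantic import BaseModel, Field


class Suggestions(BaseModel):
    words: List[str] = Field(description="list of substitute words based on context")


# Hypothetical model output in the format requested by the format instructions.
output = '{"words": ["conduct", "manner", "demeanor", "deportment", "actions"]}'

# Parse the JSON text, then validate it against the schema.
suggestions = Suggestions(**json.loads(output))

print(suggestions.words)     # the full list
print(suggestions.words[2])  # third suggestion: demeanor
```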