- The importance of structured data output from language models, particularly when the generated text needs to be consumed by applications or systems. The lesson uses the example of a thesaurus application whose goal is to generate a list of substitute words for a given term, such as "behavior." The difficulty is that the generated text may vary in format, which makes it hard to extract the relevant information programmatically.
- "Output Parsers," tools that help define and extract specific information from the generated text. They impose a structured data format that precisely describes the expected output. In the thesaurus application, the parser can extract the list of substitute words even if the response format changes.
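
To make the formatting problem concrete, here is a small sketch using only the standard library. The replies below are hypothetical model outputs for "behavior," and `extract_words` is an illustrative helper, not part of LangChain. A naive comma split handles one format but not the other, while a reply that follows an agreed JSON structure parses the same way every time.

```python
import json
from typing import List

# Two hypothetical free-form replies for the word "behavior".
numbered_reply = "1. conduct\n2. manners\n3. demeanor"
inline_reply = "Here are some options: conduct, manners, demeanor."

# A naive comma split only fits one of the formats.
print(numbered_reply.split(", "))  # one element: the whole reply
print(inline_reply.split(", "))    # first element still contains the preamble

# With an agreed structure, extraction no longer depends on the wording.
structured_reply = '{"words": ["conduct", "manners", "demeanor"]}'

def extract_words(reply: str) -> List[str]:
    """Hypothetical helper: read the `words` key from a JSON reply."""
    return json.loads(reply)["words"]

print(extract_words(structured_reply))
```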

# 1. Output Parsers
The Pydantic parser is the most powerful and flexible wrapper, but knowing the other options for less complicated problems is beneficial.
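
For a simpler problem, such as a reply that is already a single comma-separated line of words, a plain string split may be enough. The function below is a hypothetical standard-library sketch in the spirit of LangChain's simpler list parsers (for example `CommaSeparatedListOutputParser`); it is not that class's implementation.

```python
from typing import List

def parse_comma_separated(reply: str) -> List[str]:
    """Hypothetical helper: split 'a, b, c' into ['a', 'b', 'c']."""
    # Drop surrounding whitespace and skip empty pieces such as a trailing comma.
    return [part.strip() for part in reply.strip().split(",") if part.strip()]

print(parse_comma_separated("conduct, manners, demeanor\n"))
```

This approach breaks as soon as the model adds a preamble or numbers its list, which is where a schema-based parser such as the Pydantic one earns its extra complexity.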

## 1-1. PydanticOutputParser
- Instructs the model to generate its output in JSON format and then extracts the information from the response.
- Lets you treat the parsed output (for example, a list of words) as a regular Python list, so you can index into the results without worrying about formatting.
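
The two steps above (ask for JSON, then extract it) can be sketched without LangChain. Everything here is an assumption for illustration: the `Suggestions` dataclass stands in for the Pydantic model, and `FORMAT_INSTRUCTIONS`, `build_prompt`, and `parse_reply` are hypothetical names. LangChain generates its own instruction text via the parser's `get_format_instructions()` method and parses replies with `parse()`.

```python
import json
from dataclasses import dataclass
from typing import List

@dataclass
class Suggestions:
    """Stand-in for the Pydantic schema: a single list-of-strings field."""
    words: List[str]

# Hypothetical instruction text appended to the prompt (step 1).
FORMAT_INSTRUCTIONS = (
    'Answer only with a JSON object of the form {"words": [...]}, '
    "where words is a list of substitute words based on context."
)

def build_prompt(word: str) -> str:
    """Ask for substitutes and tell the model which structure to use."""
    return f"Offer a list of substitutes for the word '{word}'.\n{FORMAT_INSTRUCTIONS}"

def parse_reply(reply: str) -> Suggestions:
    """Step 2: cut the JSON object out of the reply and load it into the schema."""
    start, end = reply.find("{"), reply.rfind("}")
    if start == -1 or end < start:
        raise ValueError("No JSON object found in the reply")
    data = json.loads(reply[start:end + 1])
    return Suggestions(words=[str(w) for w in data["words"]])

# A hypothetical reply that wraps the JSON in extra chatter.
reply = 'Sure! Here you go: {"words": ["conduct", "manners", "demeanor"]}'
result = parse_reply(reply)
print(result.words[0])  # the parsed field is an ordinary list, so indexing works
```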

```python
from langchain.output_parsers import PydanticOutputParser
from pydantic import BaseModel, Field, validator
from typing import List

# Define your desired data structure.
class Suggestions(BaseModel):
    words: List[str] = Field(description="list of substitute words based on context")

    # Throw an error if the API returns a numbered list.
    # (Pydantic v1-style validator; Pydantic v2 deprecates it in favor of field_validator.)
    @validator('words')
    def not_start_with_number(cls, field):
        for item in field:
            # Guard against empty strings before checking the first character
            if item and item[0].isnumeric():
                raise ValueError("The word can not start with numbers!")
        return field

# Wrap the schema in a parser that can produce format instructions and parse replies
parser = PydanticOutputParser(pydantic_object=Suggestions)
```

We first import the necessary libraries and then create the `Suggestions` schema class. There are two essential parts to this class:

Expected Outputs: Each output is defined by declaring a variable with the desired type, such as a list of strings (`: List[str]`) in the sample code, or a single string (`: str`) if you expect just one word or sentence as the response. You must also write a short explanation in the `Field` function's `description` attribute to help the model during inference. (We will see an example with multiple outputs later in the lesson.)

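
To see why the `description` matters, the sketch below builds a JSON-Schema-style description of the fields, roughly the kind of schema information that ends up in the format instructions the model reads. It uses only the standard library; the `FIELDS` mapping and the `to_json_schema` and `matches_types` helpers are hypothetical, and the second field, `best_word` (a single `str`), is an invented example of the one-value case.

```python
import json
from typing import Dict, Tuple

# Hypothetical field spec: name -> (JSON type, description shown to the model).
FIELDS: Dict[str, Tuple[str, str]] = {
    "words": ("array", "list of substitute words based on context"),  # like List[str]
    "best_word": ("string", "the single best substitute word"),        # like str
}

def to_json_schema(fields: Dict[str, Tuple[str, str]]) -> dict:
    """Build a JSON-Schema-style object listing each field's type and description."""
    properties = {}
    for name, (kind, description) in fields.items():
        prop = {"description": description, "type": kind}
        if kind == "array":
            prop["items"] = {"type": "string"}  # a list of strings
        properties[name] = prop
    return {"type": "object", "properties": properties, "required": list(fields)}

def matches_types(data: dict, fields: Dict[str, Tuple[str, str]]) -> bool:
    """Check a parsed reply: arrays must hold strings, strings must be str."""
    for name, (kind, _) in fields.items():
        value = data.get(name)
        if kind == "array":
            if not isinstance(value, list) or not all(isinstance(v, str) for v in value):
                return False
        elif kind == "string" and not isinstance(value, str):
            return False
    return True

print(json.dumps(to_json_schema(FIELDS), indent=2))
print(matches_types({"words": ["conduct"], "best_word": "conduct"}, FIELDS))  # True
print(matches_types({"words": "conduct", "best_word": "conduct"}, FIELDS))    # False
```
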
Validators: You can declare functions that validate the formatting. In the sample code, we ensure that the first character of each word is not a number. The function's name is unimportant, but the `@validator` decorator must receive the name of the variable you want to validate (like `@validator('words')`). Note that the `field` argument inside the validator function will be a list if you declared the variable as one.
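
The validator's check can be tried on its own with plain Python. The sketch below mirrors the logic of `not_start_with_number` from the sample code, receiving the whole list as `field` just as the decorator passes it; the extra `item and` guard avoids an `IndexError` on empty strings. The `strip_numbering` function is a hypothetical alternative that cleans a numbered list instead of rejecting it.

```python
import re
from typing import List

def not_start_with_number(field: List[str]) -> List[str]:
    """Reject the whole list if any word starts with a digit."""
    for item in field:
        if item and item[0].isnumeric():
            raise ValueError("The word can not start with numbers!")
    return field

def strip_numbering(field: List[str]) -> List[str]:
    """Hypothetical alternative: remove prefixes like '1. ' or '2) ' instead of failing."""
    return [re.sub(r"^\s*\d+[.)]\s*", "", item) for item in field]

print(not_start_with_number(["conduct", "manners"]))
print(strip_numbering(["1. conduct", "2) manners"]))  # ['conduct', 'manners']

try:
    not_start_with_number(["1. conduct", "2. manners"])
except ValueError as err:
    print(err)
```

Rejecting surfaces formatting problems immediately, while normalizing keeps more replies usable at the cost of silently accepting a format you did not ask for.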
