## Output Parsing
Language models output text. But there are times when you want to get more structured information than just text back.
Output parsers are classes that **help structure language model responses**. There are **two main methods** an output parser **must implement**:

- **Get format instructions**: A method which returns a **string containing instructions** for how the output of a language model **should be formatted**.
- **Parse**: A method which **takes in a string** (assumed to be the response from a language model) and **parses it** into **some structure**.
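
To make the two-method contract concrete, here is a minimal sketch in plain Python. `PersonOutputParser` and `Person` are hypothetical names invented for illustration; they are not LangChain classes, and real parsers add error handling and integrate with chains.

```python
import json
from dataclasses import dataclass


@dataclass
class Person:
    name: str
    age: int


class PersonOutputParser:
    """Hypothetical parser (not a LangChain class) showing the two methods."""

    def get_format_instructions(self) -> str:
        # Text to embed in the prompt so the model knows the expected shape.
        return 'Return a JSON object with the keys "name" (string) and "age" (integer).'

    def parse(self, text: str) -> Person:
        # Turn the raw model response into a typed structure.
        data = json.loads(text)
        return Person(name=str(data["name"]), age=int(data["age"]))


parser = PersonOutputParser()

# Hand-written stand-in for a model response.
person = parser.parse('{"name": "Ada", "age": 36}')
print(person)
```

The format instructions go into the prompt, and `parse` is applied to whatever text the model sends back.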

- Output parsers:
    - StrOutputParser
    - JsonOutputParser
    - CommaSeparatedListOutputParser (CSV)
    - DatetimeOutputParser
    - Structured output (Pydantic or JSON)

In [1]:
from dotenv import load_dotenv
import os
from langchain_core.prompts import (
    SystemMessagePromptTemplate,
    HumanMessagePromptTemplate,
    PromptTemplate,
    ChatPromptTemplate
)
from langchain_ollama import ChatOllama

load_dotenv("./../.env")

base_url = os.getenv("OLLAMA_BASE_URL")
model = "llama3.2:latest"

llm = ChatOllama(
    base_url=base_url,
    model=model,
)
llm

ChatOllama(model='llama3.2:latest', base_url='http://localhost:11434')

### `Pydantic` Output Parser

#### LLM -> AI Message -> Pydantic Class -> [BaseModel, Joke(BaseModel)]

In [3]:
from typing import Optional
from pydantic import BaseModel, Field
from langchain_core.output_parsers import PydanticOutputParser

In [5]:
class Joke(BaseModel):
    """Joke to tell user"""
    
    setup: str = Field(description="The setup of the joke")
    punchline: str = Field(description="The punchline of the joke")
    # Optional[int] allows None, but without a default the field is still required
    rating: Optional[int] = Field(description="The rating of the joke is from 1 to 10")

In [6]:
# Init the PydanticOutputParser
parser = PydanticOutputParser(
    pydantic_object=Joke,
)
parser

PydanticOutputParser(pydantic_object=<class '__main__.Joke'>)

In [8]:
# Get the instruction of the parser
instruction = parser.get_format_instructions()
print(instruction)

The output should be formatted as a JSON instance that conforms to the JSON schema below.

As an example, for the schema {"properties": {"foo": {"title": "Foo", "description": "a list of strings", "type": "array", "items": {"type": "string"}}}, "required": ["foo"]}
the object {"foo": ["bar", "baz"]} is a well-formatted instance of the schema. The object {"properties": {"foo": ["bar", "baz"]}} is not well-formatted.

Here is the output schema:
```
{"description": "Joke to tell user", "properties": {"setup": {"description": "The setup of the joke", "title": "Setup", "type": "string"}, "punchline": {"description": "The punchline of the joke", "title": "Punchline", "type": "string"}, "rating": {"anyOf": [{"type": "integer"}, {"type": "null"}], "description": "The rating of the joke is from 1 to 10", "title": "Rating"}}, "required": ["setup", "punchline", "rating"]}
```


In [11]:
prompt = PromptTemplate(
    template="""
    Answer the user query with a joke. Here is your formatting instruction.
    {format_instruction}
    
    Query: {query}
    Answer:
    """,
    input_variables=['query'],  # Variables that must be supplied when the chain is invoked
    partial_variables={'format_instruction': instruction},  # Values the template needs that are already known
)

In [18]:
chain = prompt | llm
output = chain.invoke({
    "query": "Tell me a joke about the cat.",
})
print(output.content)

{"setup": "Why did the cat join a band?", "punchline": "Because it wanted to be the purr-cussionist!", "rating": 8}


In [19]:
# Append the PydanticOutputParser so the chain returns a Joke instance instead of raw text
chain = prompt | llm | parser

output = chain.invoke({
    "query": "Tell me a joke about the dog.",
})
print(output)

setup='Why did the dog go to the vet?' punchline='Because he was feeling ruff!' rating=8


### Parsing with `.with_structured_output()` Method
- This method takes **a schema as input** which specifies the names, types, and descriptions of the desired output attributes.
- The schema can be specified as a `TypedDict` class, a JSON Schema, or a Pydantic class.
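
As a rough illustration of the three schema forms, the sketch below defines the same joke schema as a Pydantic class, as a `TypedDict`, and as a JSON Schema dict derived from the Pydantic class. `JokeDict` is a hypothetical name, and the `Annotated[type, default, description]` layout is assumed from the convention shown in LangChain's documentation. No model is called here.

```python
from typing import Optional

from pydantic import BaseModel, Field
from typing_extensions import Annotated, TypedDict


# 1. Pydantic class
class Joke(BaseModel):
    """Joke to tell user"""

    setup: str = Field(description="The setup of the joke")
    punchline: str = Field(description="The punchline of the joke")
    rating: Optional[int] = Field(description="The rating of the joke is from 1 to 10")


# 2. TypedDict (hypothetical name), with descriptions carried in Annotated metadata
class JokeDict(TypedDict):
    """Joke to tell user"""

    setup: Annotated[str, ..., "The setup of the joke"]
    punchline: Annotated[str, ..., "The punchline of the joke"]
    rating: Annotated[Optional[int], None, "The rating of the joke is from 1 to 10"]


# 3. JSON Schema, generated here from the Pydantic class (Pydantic v2)
joke_json_schema = Joke.model_json_schema()
print(sorted(joke_json_schema["properties"]))

# Any of the three could be passed to a chat model that supports structured
# output, e.g. llm.with_structured_output(JokeDict). That call is not run here
# because it needs a live model.
```

With a Pydantic class the result is a validated model instance; with a `TypedDict` or JSON Schema the result is expected to be a plain dict.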




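##### `.with_structured_output()` wraps the original LLM and returns a new runnable
##### Its output is a parsed object rather than an AI message, which can cause problems later if you chain it with other runnables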

In [22]:
output = llm.invoke("Tell me a joke about the bird")
print(output.content)

Why did the bird go to the doctor?

Because it had a fowl cough! (get it?)


In [23]:
structured_llm = llm.with_structured_output(Joke)

In [24]:
output = structured_llm.invoke("Tell me a joke about the bird")
print(output)

setup="A bird walked into a doctor's office and said..." punchline='Why did the bird go to the doctor? Because it had a fowl cough.' rating=8


### JSON Output Parser
- Output parsers accept a **string** or **BaseMessage** as input and can return an arbitrary type.

In [26]:
from langchain_core.output_parsers import JsonOutputParser

json_parser = JsonOutputParser(pydantic_object=Joke)
instruction = json_parser.get_format_instructions()

In [28]:
prompt = PromptTemplate(
    template="""
    Answer the user query with a joke. Here is your formatting instruction.
    {format_instruction}
    
    Query: {query}
    Answer:
    """,
    input_variables=['query'],  # Variables that must be supplied when the chain is invoked
    partial_variables={'format_instruction': instruction},  # Values the template needs that are already known
)

chain = prompt | llm 
output = chain.invoke({
    "query": "Tell me a joke about the crocodile"
})

print(output.content)

{"setup": "Why did the crocodile go to the party?", "punchline": "Because he was a snappy dresser!", "rating": 8}


In [30]:
chain = prompt | llm | json_parser
output = chain.invoke({
    "query": "Tell me a joke about the crocodile"
})

print(output)

{'setup': 'Why did the crocodile go to the party?', 'punchline': 'Because he was a snappy dresser!', 'rating': 8}


### CSV Output Parser
- This output parser can be used when you want to return a list of comma-separated items.

In [33]:
from langchain_core.output_parsers import CommaSeparatedListOutputParser

csv_parser = CommaSeparatedListOutputParser()

instruction = csv_parser.get_format_instructions()
print(instruction)

Your response should be a list of comma separated values, eg: `foo, bar, baz` or `foo,bar,baz`


In [34]:
prompt = PromptTemplate(
    template="""
    Answer the user query with a list of values. Here is your formatting instruction.
    {format_instruction}
    
    Query: {query}
    Answer:
    """,
    input_variables=['query'], 
    partial_variables={'format_instruction': instruction}, 
)

In [35]:
chain = prompt | llm | csv_parser

output = chain.invoke({
    'query': "Generate my website seo keywords. I have content about the NLP and LLM."
})
print(output)

['nlp', 'llm', 'artificial intelligence', 'natural language processing', 'machine learning', 'language model', 'chatbots', 'sentiment analysis', 'text analysis', 'language understanding', 'cognitive computing', 'human-computer interaction', 'language generation']


### Datetime Output Parser
- Parses the LLM output into a Python `datetime` object. It **raises an error** if the LLM output **does not match the expected datetime format**.

In [36]:
from langchain.output_parsers import DatetimeOutputParser

In [37]:
datetime_parser = DatetimeOutputParser()
instruction = datetime_parser.get_format_instructions()

print(instruction)

Write a datetime string that matches the following pattern: '%Y-%m-%dT%H:%M:%S.%fZ'.

Examples: 0377-05-19T15:09:45.845568Z, 0888-05-23T13:22:02.989670Z, 0744-03-28T19:41:09.659302Z

Return ONLY this string, no other words!


In [38]:
prompt = PromptTemplate(
    template="""
    Answer the user query with a datetime. Here is your formatting instruction.
    {format_instruction}
    
    Query: {query}
    Answer:
    """,
    input_variables=['query'], 
    partial_variables={'format_instruction': instruction}, 
)

In [39]:
chain = prompt | llm | datetime_parser

In [42]:
output = chain.invoke({
    'query': "When the america got discovered?"
})

print(output)  # A valid datetime, but the date itself is the model's guess and may not be historically accurate

1492-08-10 04:00:00
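
The failure mode mentioned at the top of this section can be reproduced with the standard library alone. The sketch below assumes the parser essentially applies `datetime.strptime` with the pattern from its format instructions; `try_parse` is a hypothetical helper, not part of LangChain, and it returns `None` instead of raising.

```python
from datetime import datetime
from typing import Optional

# Pattern taken from the format instructions printed above.
PATTERN = "%Y-%m-%dT%H:%M:%S.%fZ"


def try_parse(text: str) -> Optional[datetime]:
    """Hypothetical helper: return a datetime, or None when the text does not match."""
    try:
        return datetime.strptime(text.strip(), PATTERN)
    except ValueError:
        return None


# A reply that follows the pattern parses cleanly.
ok = try_parse("1492-08-10T04:00:00.000000Z")

# A chatty reply does not match the pattern and is rejected.
bad = try_parse("In 1492, on October 12.")

print(ok, bad)
```

In a real chain, a similar `try`/`except` around `chain.invoke` is one way to keep a malformed reply from stopping the program.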
