## OUTPUT PARSING
- Output Parsing:
  - StrOutputParser.
  - JsonOutputParser.
  - CSV Output Parser.
  - Datetime Output Parser.
  - Structured Output Parser (Pydantic or JSON).

Output parsers are classes that help structure LLM responses. They expose two main methods:
- **Get format instructions**: a method that returns a string describing how the LLM's output should be formatted.
- **Parse**: a method that takes a string (the LLM response) and parses it into a structured object.
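
To make the two methods concrete, here is a minimal sketch of a toy parser in plain Python. `ToyKeyValueParser` is a hypothetical class written for illustration and is not part of LangChain; it only mirrors the `get_format_instructions` / `parse` pair described above, and the sample response is made up.

```python
class ToyKeyValueParser:
    """Hypothetical parser: expects lines of the form `key: value`."""

    def get_format_instructions(self) -> str:
        # Text that would be added to the prompt so the model knows the format.
        return "Answer with one `key: value` pair per line and nothing else."

    def parse(self, text: str) -> dict[str, str]:
        # Turn the raw model response (a string) into a dictionary.
        result = {}
        for line in text.strip().splitlines():
            if ":" not in line:
                raise ValueError(f"Line does not follow `key: value`: {line!r}")
            key, value = line.split(":", 1)
            result[key.strip()] = value.strip()
        return result


parser = ToyKeyValueParser()
# Made-up model response used only to exercise the parser.
raw_response = (
    "setup: Why did the cat sit on the laptop?\n"
    "punchline: To keep an eye on the mouse."
)
parsed = parser.parse(raw_response)
print(parsed["punchline"])
```

The real parsers used in the cells below follow the same pattern: the instructions go into the prompt, and `parse` runs on whatever text comes back.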

### `Pydantic` Output Parser

In [1]:
from dotenv import load_dotenv
load_dotenv('.env')

True

In [2]:
from langchain_ollama import ChatOllama
from langchain_core.prompts import (
    SystemMessagePromptTemplate,
    HumanMessagePromptTemplate,
    ChatPromptTemplate,
    PromptTemplate
)

base_url = 'http://localhost:11434'
model = 'deepseek-r1:1.5b'

llm = ChatOllama(base_url=base_url, model=model)

In [3]:
from typing import Optional
from pydantic import BaseModel, Field
from langchain_core.output_parsers import PydanticOutputParser

In [8]:
class Joke(BaseModel):
    """Joke to tell user"""
    setup: str = Field(description="The setup of the joke")
    punchline: str = Field(description="The punchline of the joke")
    rating: Optional[int] = Field(description="The rating of the joke is from 1 to 10", default=None)

In [5]:
# Build a parser for the output schema we want
parser = PydanticOutputParser(pydantic_object=Joke)
instruction = parser.get_format_instructions()
print(instruction)

The output should be formatted as a JSON instance that conforms to the JSON schema below.

As an example, for the schema {"properties": {"foo": {"title": "Foo", "description": "a list of strings", "type": "array", "items": {"type": "string"}}}, "required": ["foo"]}
the object {"foo": ["bar", "baz"]} is a well-formatted instance of the schema. The object {"properties": {"foo": ["bar", "baz"]}} is not well-formatted.

Here is the output schema:
```
{"description": "Joke to tell user", "properties": {"setup": {"description": "The setup of the joke", "title": "Setup", "type": "string"}, "punchline": {"description": "The punchline of the joke", "title": "Punchline", "type": "string"}, "rating": {"anyOf": [{"type": "integer"}, {"type": "null"}], "description": "The rating of the joke is from to 10", "title": "Rating"}}, "required": ["setup", "punchline", "rating"]}
```


In [6]:
prompt = PromptTemplate(
    template='''
    Answer the user query with a joke. Here is your formatting instruction.
    {format_instruction}

    Query: {query}
    Answer:''',
    input_variables=['query'],
    partial_variables={'format_instruction': parser.get_format_instructions()}
)

chain = prompt | llm
output = chain.invoke({'query': 'Tell about the cat'})
print(output.content)

<think>
Okay, so I've got this query where the user wants me to tell them about the cat. But it's not just a straightforward question; they want a joke in return with some setup and punchline that fits into the provided schema.

First, I need to understand what the user is asking for. They're probably looking for something funny related to cats, maybe a pun or a humorous fact. The output needs to be formatted as a JSON object with setup, punchline, and rating fields. 

Looking at the example they gave, it's clear that each property is defined: setup describes what goes in, punchline is the joke itself, and rating determines how good the joke is on a scale from 1 to 10.

So, for the cat joke, I should think of something that plays on familiar cat-related terms. The cat's tail is a common symbol, but maybe there's another angle. How about the cat being a superhero? That could be a clever twist and make it humorous in a fun way.

Setting up this joke, I can say: "The cat is a superhero! W
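
The response above begins with a `<think>` reasoning block, so it cannot be handed directly to a JSON-based parser. One possible workaround, sketched here with only the standard library, is to drop the reasoning block and decode the first JSON object that follows. `extract_json` is a hypothetical helper, and the sample response is made up for illustration; it is not the model's actual answer.

```python
import json
import re

THINK_BLOCK = re.compile(r"<think>.*?</think>", flags=re.DOTALL)


def extract_json(text: str) -> dict:
    """Drop <think>...</think> blocks, then decode the first JSON object found."""
    cleaned = THINK_BLOCK.sub("", text)
    start = cleaned.find("{")
    if start == -1:
        raise ValueError("No JSON object found in the response")
    # raw_decode stops at the end of the object and ignores any trailing text.
    obj, _ = json.JSONDecoder().raw_decode(cleaned[start:])
    return obj


# Made-up response in the shape a reasoning model might return.
sample = (
    "<think>The user wants a cat joke as JSON.</think>\n"
    '{"setup": "Why was the cat sitting on the computer?", '
    '"punchline": "To keep an eye on the mouse.", "rating": 6}'
)
joke = extract_json(sample)
print(joke["setup"])
```

The resulting dictionary could then be validated against the `Joke` model, for example with `Joke(**joke)`.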

### `JSON` Output Parser

In [9]:
from langchain_core.output_parsers import JsonOutputParser

In [10]:
parser = JsonOutputParser(pydantic_object=Joke)
print(parser.get_format_instructions())

The output should be formatted as a JSON instance that conforms to the JSON schema below.

As an example, for the schema {"properties": {"foo": {"title": "Foo", "description": "a list of strings", "type": "array", "items": {"type": "string"}}}, "required": ["foo"]}
the object {"foo": ["bar", "baz"]} is a well-formatted instance of the schema. The object {"properties": {"foo": ["bar", "baz"]}} is not well-formatted.

Here is the output schema:
```
{"description": "Joke to tell user", "properties": {"setup": {"description": "The setup of the joke", "title": "Setup", "type": "string"}, "punchline": {"description": "The punchline of the joke", "title": "Punchline", "type": "string"}, "rating": {"anyOf": [{"type": "integer"}, {"type": "null"}], "default": null, "description": "The rating of the joke is from to 10", "title": "Rating"}}, "required": ["setup", "punchline"]}
```


### `CSV` Output Parser

In [11]:
from langchain_core.output_parsers import CommaSeparatedListOutputParser
parser = CommaSeparatedListOutputParser()
print(parser.get_format_instructions())

Your response should be a list of comma separated values, eg: `foo, bar, baz` or `foo,bar,baz`


In [12]:
format_instruction = parser.get_format_instructions()

prompt = PromptTemplate(
    template='''
    Answer the user query with a list of values. Here is your formatting instruction.
    {format_instruction}

    Query: {query}
    Answer:''',
    input_variables=['query'],
    partial_variables={'format_instruction': format_instruction}
)


In [14]:
chain = prompt | llm | parser
output = chain.invoke({'query': 'List of fruits'})
print(output)

['<think>', 'Okay', "I need to respond to the user's query by listing the fruits they specified. First", "I'll check if there are any specific fruits mentioned in their answer. It seems like they've listed several options like apples", 'bananas', 'oranges', 'grapes', 'and others. ', "I should make sure all these items are included without missing any. Let me go through each one again to confirm they're all present in the provided list. Yes", 'that covers apples', 'bananas', 'oranges', 'grapes', 'strawberries', 'blueberries', 'melons', 'kiwis', 'and more. ', 'Now', "I'll format them into a comma-separated list as per their instructions. That should provide a clear and concise answer.", '</think>', 'apples', 'bananas', 'oranges', 'grapes', 'strawberries', 'blueberries', 'melons', 'kiwis']
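
Because the model's `<think>` block is passed straight to the parser, the reasoning text is split on commas along with the real answer. A possible fix is a small cleaning step between `llm` and `parser`. The sketch below is plain Python; `strip_think` is a hypothetical helper, the sample response is shortened and made up, and the commented chain line assumes that LCEL coerces plain functions into runnables when they are piped.

```python
import re


def strip_think(message) -> str:
    """Return the response text without any <think>...</think> block."""
    # Accept either a plain string or a message-like object with `.content`.
    text = message if isinstance(message, str) else message.content
    return re.sub(r"<think>.*?</think>", "", text, flags=re.DOTALL).strip()


# Shortened, made-up response in the same shape as the output above.
raw = (
    "<think>\nOkay, I need to list fruits, such as apples, bananas.\n</think>\n"
    "apples, bananas, oranges"
)
cleaned = strip_think(raw)
fruits = [item.strip() for item in cleaned.split(",")]
print(fruits)

# Assumed usage inside the chain (not run here):
# chain = prompt | llm | strip_think | parser
```

With the reasoning removed, only the comma-separated answer reaches the list-splitting step.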


### `Datetime` Output Parser

In [None]:
from langchain.output_parsers import DatetimeOutputParser
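
The cell above only imports `DatetimeOutputParser` and was never run. As a rough sketch of the idea behind a datetime parser, the snippet below fixes a timestamp pattern and converts a matching string with `datetime.strptime`. It uses only the standard library; the format string, the `parse_timestamp` helper, and the sample response are assumptions chosen for illustration, and this notebook does not confirm them as the parser's defaults.

```python
from datetime import datetime

# Assumed timestamp pattern; the real parser's format may differ.
TIMESTAMP_FORMAT = "%Y-%m-%dT%H:%M:%S.%fZ"


def parse_timestamp(text: str) -> datetime:
    """Convert a response such as '2024-01-15T09:30:00.000000Z' to a datetime."""
    return datetime.strptime(text.strip(), TIMESTAMP_FORMAT)


# Made-up response string for illustration.
parsed = parse_timestamp(" 2024-01-15T09:30:00.000000Z\n")
print(parsed.year, parsed.month, parsed.day)
```

As with the other parsers, the matching format instruction would need to be included in the prompt so the model answers in the expected pattern.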