## LangChain Output Parsers with Gemini
Output parsers take the raw output of a model and transform it into a format better suited to downstream tasks. They are useful when you use LLMs to generate structured data, or when you need to normalize output from chat models and LLMs.

LangChain provides many types of output parsers. The table below lists the ones it supports:

| **Name**            | **Supports Streaming** | **Has Format Instructions** | **Calls LLM** | **Input Type**    | **Output Type**          | **Description**                                                                                                                                   |
|---------------------|------------------------|-----------------------------|---------------|-------------------|--------------------------|---------------------------------------------------------------------------------------------------------------------------------------------------|
| **JSON**            | ✅                      | ✅                           |               | str \| Message     | JSON object              | Returns a JSON object as specified. You can specify a Pydantic model, and it will return JSON for that model. Most reliable for structured data.   |
| **XML**             | ✅                      | ✅                           |               | str \| Message     | dict                     | Returns a dictionary of tags. Use when XML output is needed. Use with models that are good at writing XML (like Anthropic's).                      |
| **CSV**             | ✅                      | ✅                           |               | str \| Message     | List[str]                | Returns a list of comma-separated values.                                                                                                          |
| **OutputFixing**    |                        |                             | ✅            | str \| Message     |                          | Wraps another output parser. If that output parser errors, it passes the error message and bad output to an LLM to fix the output.                 |
| **RetryWithError**  |                        |                             | ✅            | str \| Message     |                          | Wraps another output parser. If that output parser errors, it passes the original inputs, bad output, and error message to an LLM to fix it.       |
| **Pydantic**        |                        | ✅                           |               | str \| Message     | pydantic.BaseModel       | Takes a user-defined Pydantic model and returns data in that format.                                                                               |
| **YAML**            |                        | ✅                           |               | str \| Message     | pydantic.BaseModel       | Takes a user-defined Pydantic model and returns data in that format using YAML to encode it.                                                      |
| **PandasDataFrame** |                        | ✅                           |               | str \| Message     | dict                     | Useful for doing operations with pandas DataFrames.                                                                                               |
| **Enum**            |                        | ✅                           |               | str \| Message     | Enum                     | Parses response into one of the provided enum values.                                                                                             |
| **Datetime**        |                        | ✅                           |               | str \| Message     | datetime.datetime        | Parses response into a `datetime.datetime` object.                                                                                                |
| **Structured**      |                        | ✅                           |               | str \| Message     | Dict[str, str]           | Returns structured information. Less powerful than other parsers since it only allows for string fields. Useful with smaller LLMs.                |


In [1]:
import os
# Disable pip version check
os.environ['PIP_DISABLE_PIP_VERSION_CHECK'] = '1'
import warnings
warnings.filterwarnings('ignore')

In [2]:
from dotenv import load_dotenv
import google.generativeai as genai

# Load GOOGLE_API_KEY from a local .env file and configure the Gemini client.
load_dotenv()
my_api_key = os.getenv("GOOGLE_API_KEY")
genai.configure(api_key=my_api_key)

#### JsonOutputParser

In [3]:


from langchain_core.output_parsers import JsonOutputParser
from langchain_core.prompts import PromptTemplate
from langchain_core.pydantic_v1 import BaseModel, Field
from langchain_google_genai.chat_models import  ChatGoogleGenerativeAI


# ChatGoogleGenerativeAI limits the response length with max_output_tokens.
model = ChatGoogleGenerativeAI(model="gemini-1.5-flash", max_output_tokens=512, temperature=0)


# Define your desired data structure.
class Joke(BaseModel):
    setup: str = Field(description="question to set up a joke")
    punchline: str = Field(description="answer to resolve the joke")


# And a query intended to prompt a language model to populate the data structure.
joke_query = "Tell me a joke."

# Set up a parser + inject instructions into the prompt template.
parser = JsonOutputParser(pydantic_object=Joke)

prompt = PromptTemplate(
    template="Answer the user query.\n{format_instructions}\n{query}\n",
    input_variables=["query"],
    partial_variables={"format_instructions": parser.get_format_instructions()},
)

chain = prompt | model | parser

chain.invoke({"query": joke_query})

{'setup': "Why don't scientists trust atoms? ",
 'punchline': 'Because they make up everything!'}
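
To make the parsing step less opaque: chat models often wrap JSON in a Markdown code fence, so a JSON parser has to strip the fence before decoding. The sketch below is a simplified stand-in that uses only the standard library. It is not LangChain's implementation, and `parse_json_reply` and the sample reply are hypothetical.

```python
import json
import re

FENCE = "`" * 3  # a Markdown code fence, built indirectly so it does not clash with this block


def parse_json_reply(text: str) -> dict:
    """Decode a model reply that may wrap its JSON in a Markdown fence."""
    pattern = re.escape(FENCE) + r"(?:json)?\s*(.*?)\s*" + re.escape(FENCE)
    match = re.search(pattern, text, re.DOTALL)
    # Fall back to the whole reply when no fence is present.
    payload = match.group(1) if match else text
    return json.loads(payload.strip())


# Hypothetical raw reply, shaped like the joke above.
raw_reply = (
    FENCE + "json\n"
    '{"setup": "Why don\'t scientists trust atoms?", '
    '"punchline": "Because they make up everything!"}\n' + FENCE
)
joke = parse_json_reply(raw_reply)
print(joke["punchline"])
```

The same function accepts an unfenced reply, which is why the fence match is optional.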

#### XMLOutputParser

In [4]:
from langchain_core.output_parsers import XMLOutputParser
from langchain_core.prompts import PromptTemplate

parser = XMLOutputParser(tags=["movies", "actor", "film", "name", "genre"])
prompt = PromptTemplate(
    template="""{query}\n{format_instructions}""",
    input_variables=["query"],
    partial_variables={"format_instructions": parser.get_format_instructions()},
)

actor_query = "Generate the shortened filmography for Amir Khan."
chain = prompt | model | parser

output = chain.invoke({"query": actor_query})

print(output)



{'movies': [{'actor': [{'name': 'Amir Khan'}, {'film': [{'name': 'Lagaan: Once Upon a Time in India'}, {'genre': 'Drama, Musical, Sport'}]}, {'film': [{'name': 'Rang De Basanti'}, {'genre': 'Drama, Musical'}]}, {'film': [{'name': '3 Idiots'}, {'genre': 'Comedy, Drama'}]}, {'film': [{'name': 'Dhoom 3'}, {'genre': 'Action, Thriller'}]}, {'film': [{'name': 'PK'}, {'genre': 'Comedy, Drama, Science Fiction'}]}, {'film': [{'name': 'Dangal'}, {'genre': 'Biography, Drama, Sport'}]}, {'film': [{'name': 'Secret Superstar'}, {'genre': 'Drama, Musical'}]}, {'film': [{'name': 'Thugs of Hindostan'}, {'genre': 'Action, Adventure, History'}]}, {'film': [{'name': 'Laal Singh Chaddha'}, {'genre': 'Comedy, Drama, Romance'}]}]}]}
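
The nested shape of this result (each tag maps either to its text or to a list of child dictionaries) can be reproduced with the standard library. This is only an illustrative sketch of that shape, assuming well-formed XML; `element_to_dict` and the sample reply are hypothetical, not part of LangChain.

```python
import xml.etree.ElementTree as ET


def element_to_dict(element: ET.Element) -> dict:
    """Convert an element into {tag: text} for leaves or {tag: [child dicts]} otherwise."""
    children = list(element)
    if not children:
        return {element.tag: (element.text or "").strip()}
    return {element.tag: [element_to_dict(child) for child in children]}


# Hypothetical model reply that uses the tags requested above.
xml_reply = """
<movies>
  <actor>
    <name>Amir Khan</name>
    <film><name>3 Idiots</name><genre>Comedy, Drama</genre></film>
  </actor>
</movies>
"""

result = element_to_dict(ET.fromstring(xml_reply.strip()))
print(result)
```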


#### CommaSeparatedListOutputParser

In [5]:
from langchain.output_parsers import CommaSeparatedListOutputParser
from langchain_core.prompts import PromptTemplate

output_parser = CommaSeparatedListOutputParser()

format_instructions = output_parser.get_format_instructions()
prompt = PromptTemplate(
    template="List five {subject}.\n{format_instructions}",
    input_variables=["subject"],
    partial_variables={"format_instructions": format_instructions},
)

csvchain = prompt | model | output_parser
csvchain.invoke({"subject": "ice cream flavors"})

['chocolate', 'vanilla', 'strawberry', 'mint chocolate chip', 'cookie dough']
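
Conceptually, the last step of this chain just splits the model's text on commas. A minimal standard-library sketch of that idea follows; `parse_comma_separated` is a hypothetical helper, and using the `csv` module here is an assumption made so that quoted items containing commas survive.

```python
import csv
from io import StringIO
from typing import List


def parse_comma_separated(text: str) -> List[str]:
    """Split a comma-separated reply into a list of trimmed items."""
    reader = csv.reader(StringIO(text.strip()), skipinitialspace=True)
    return [item.strip() for row in reader for item in row]


items = parse_comma_separated(
    "chocolate, vanilla, strawberry, mint chocolate chip, cookie dough"
)
print(items)
```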

#### PydanticOutputParser

In [6]:
from langchain_core.output_parsers import PydanticOutputParser
from langchain_core.prompts import PromptTemplate
from langchain_core.pydantic_v1 import BaseModel, Field, validator


# Define your desired data structure.
class Joke(BaseModel):
    setup: str = Field(description="question to set up a joke")
    punchline: str = Field(description="answer to resolve the joke")

    # You can add custom validation logic easily with Pydantic.
    @validator("setup")
    def question_ends_with_question_mark(cls, field):
        # endswith also handles an empty string, where field[-1] would raise IndexError.
        if not field.endswith("?"):
            raise ValueError("Badly formed question!")
        return field


# Set up a parser + inject instructions into the prompt template.
parser = PydanticOutputParser(pydantic_object=Joke)

prompt = PromptTemplate(
    template="Answer the user query.\n{format_instructions}\n{query}\n",
    input_variables=["query"],
    partial_variables={"format_instructions": parser.get_format_instructions()},
)

# Build the chain and run a query that prompts the model to populate the data structure.
chain = prompt | model | parser
chain.invoke({"query": "Tell me a joke about Nepal."})

Joke(setup='Why did the Nepalese climber bring a ladder to Mount Everest?', punchline='He heard it was a really tall mountain.')
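
The value of this parser is that decoding and validation happen in one step, so a reply that breaks the rule on `setup` fails loudly. The sketch below shows the same decode-then-validate idea with a standard-library dataclass standing in for the Pydantic model; `JokeRecord` and `parse_joke` are hypothetical names, and the sample replies are made up.

```python
import json
from dataclasses import dataclass


@dataclass
class JokeRecord:
    setup: str
    punchline: str

    def __post_init__(self) -> None:
        # Same rule as the validator above: the setup must be a question.
        if not self.setup.endswith("?"):
            raise ValueError("Badly formed question!")


def parse_joke(text: str) -> JokeRecord:
    """Decode a JSON reply and validate it by constructing the record."""
    return JokeRecord(**json.loads(text))


joke = parse_joke('{"setup": "Why did the climber bring a ladder?", "punchline": "It was a tall mountain."}')
print(joke)

try:
    parse_joke('{"setup": "A climber brings a ladder", "punchline": "..."}')
except ValueError as exc:
    print("Rejected:", exc)
```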

#### YamlOutputParser

In [7]:
from langchain.output_parsers.yaml import YamlOutputParser
from langchain_core.prompts import PromptTemplate
from langchain_core.pydantic_v1 import BaseModel, Field, validator


# Define your desired data structure.
class Joke(BaseModel):
    setup: str = Field(description="question to set up a joke")
    punchline: str = Field(description="answer to resolve the joke")
    @validator("setup")
    def question_ends_with_question_mark(cls, field):
        if field[-1] != "?":
            raise ValueError("Badly formed question!")
        return field



# And a query intended to prompt a language model to populate the data structure.
joke_query = "Tell me a joke about Nepal."

# Set up a parser + inject instructions into the prompt template.
parser = YamlOutputParser(pydantic_object=Joke)

prompt = PromptTemplate(
    template="Answer the user query.\n{format_instructions}\n{query}\n",
    input_variables=["query"],
    partial_variables={"format_instructions": parser.get_format_instructions()},
)

chain = prompt | model | parser

chain.invoke({"query": joke_query})

Joke(setup='Why did the Nepalese man bring a ladder to the mountain?', punchline='He wanted to get to the top of the world!')

#### PandasDataFrameOutputParser

In [8]:
import pandas as pd
from langchain.output_parsers import PandasDataFrameOutputParser
from langchain_core.prompts import PromptTemplate


# Define your desired Pandas DataFrame.
df = pd.DataFrame(
    {
        "num_legs": [2, 4, 8, 0],
        "num_wings": [2, 0, 0, 0],
        "num_specimen_seen": [10, 2, 1, 8],
    }
)

# Set up a parser + inject instructions into the prompt template.
parser = PandasDataFrameOutputParser(dataframe=df)


In [9]:
# Here's an example of a column operation being performed.
df_query = "Retrieve the num_wings column."

# Set up the prompt.
prompt = PromptTemplate(
    template="Answer the user query.\n{format_instructions}\n{query}\n",
    input_variables=["query"],
    partial_variables={"format_instructions": parser.get_format_instructions()},
)

chain = prompt | model | parser
parser_output = chain.invoke({"query": df_query})

parser_output = pd.DataFrame(parser_output)
parser_output

|   | num_wings |
|---|-----------|
| 0 | 2         |
| 1 | 0         |
| 2 | 0         |
| 3 | 0         |


In [10]:
# Here's an example of an aggregate operation restricted to a range of rows.
df_query = "Retrieve the average of the num_legs column from rows 1 to 3."

# Set up the prompt.
prompt = PromptTemplate(
    template="Answer the user query.\n{format_instructions}\n{query}\n",
    input_variables=["query"],
    partial_variables={"format_instructions": parser.get_format_instructions()},
)

chain = prompt | model | parser
parser_output = chain.invoke({"query": df_query})

print(parser_output)

{'mean': 4.0}
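
In both DataFrame examples the model does not compute anything itself: it emits a short request string and the parser runs it against the DataFrame. The sketch below illustrates that idea on a plain dictionary of columns. It assumes request strings of the form `column:num_wings` and `mean:num_legs[1..3]` with an inclusive row range; the exact grammar used by LangChain should be checked against its format instructions, and `run_request` is a hypothetical helper.

```python
import re
from statistics import fmean

# Same data as the DataFrame above, held as plain columns.
table = {
    "num_legs": [2, 4, 8, 0],
    "num_wings": [2, 0, 0, 0],
    "num_specimen_seen": [10, 2, 1, 8],
}

# Assumed request shape: operation:column, optionally followed by [start..stop].
REQUEST = re.compile(r"^(\w+):(\w+)(?:\[(\d+)\.\.(\d+)\])?$")


def run_request(request: str, data: dict) -> dict:
    """Execute a small request string against a dict of columns."""
    match = REQUEST.match(request.strip())
    if match is None:
        raise ValueError(f"Unrecognized request: {request!r}")
    operation, column, start, stop = match.groups()
    values = data[column]
    first_row = 0
    if start is not None:
        first_row = int(start)
        values = values[first_row : int(stop) + 1]  # inclusive row range
    if operation == "column":
        return {column: dict(enumerate(values, start=first_row))}
    if operation == "mean":
        return {"mean": fmean(values)}
    raise ValueError(f"Unsupported operation: {operation}")


print(run_request("column:num_wings", table))
print(run_request("mean:num_legs[1..3]", table))
```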


#### EnumOutputParser

In [13]:
from langchain.output_parsers.enum import EnumOutputParser
from enum import Enum


class Colors(Enum):
    RED = "red"
    GREEN = "green"
    BLUE = "blue"
    BROWN = "brown"

# EnumOutputParser only needs the enum class to parse into.
parser = EnumOutputParser(enum=Colors)

In [14]:
from langchain_core.prompts import PromptTemplate


prompt = PromptTemplate.from_template(
    """What color eyes does this person have?

> Person: {person}

Instructions: {instructions}"""
).partial(instructions=parser.get_format_instructions())
chain = prompt | ChatGoogleGenerativeAI(model="gemini-pro") | parser
chain.invoke({"person": "Salman Khan"})

<Colors.BROWN: 'brown'>
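
Mapping a reply onto an enum comes down to looking the trimmed text up by value and rejecting anything else. A minimal sketch using only the standard library follows; `parse_enum` is a hypothetical helper, and the error message format is an assumption.

```python
from enum import Enum


class Colors(Enum):
    RED = "red"
    GREEN = "green"
    BLUE = "blue"
    BROWN = "brown"


def parse_enum(text: str, enum_cls):
    """Return the enum member whose value matches the trimmed reply."""
    try:
        return enum_cls(text.strip())
    except ValueError:
        valid = ", ".join(member.value for member in enum_cls)
        raise ValueError(f"{text!r} is not one of: {valid}") from None


print(parse_enum(" brown\n", Colors))
```

A reply such as `"hazel"` raises `ValueError`, which is the point at which a wrapping fixing or retry parser would take over.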

#### DatetimeOutputParser

In [15]:
from langchain.output_parsers import DatetimeOutputParser
from langchain_core.prompts import PromptTemplate


output_parser = DatetimeOutputParser()
template = """Answer the user's question:

{question}

{format_instructions}"""
prompt = PromptTemplate.from_template(
    template,
    partial_variables={"format_instructions": output_parser.get_format_instructions()},
)

chain = prompt | ChatGoogleGenerativeAI(model="gemini-pro") | output_parser
output = chain.invoke({"question": "when was the United Nations founded?"})
print("Raw Output:", output)

# The parser returns a datetime object; format it as MM/DD/YYYY.
formatted_date = output.strftime('%m/%d/%Y')

print("Formatted Date:", formatted_date)


Raw Output: 1945-10-24 00:00:00
Formatted Date: 10/24/1945
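
The parsing step here is a `strptime` call against a fixed pattern that the format instructions ask the model to follow. The sketch below assumes the pattern `%Y-%m-%dT%H:%M:%S.%fZ`; treat that as an assumption and check the parser's `format` attribute in your installed version. `parse_timestamp` and the sample reply are hypothetical.

```python
from datetime import datetime

# Assumed timestamp pattern; verify against the parser you are using.
ASSUMED_FORMAT = "%Y-%m-%dT%H:%M:%S.%fZ"


def parse_timestamp(text: str, fmt: str = ASSUMED_FORMAT) -> datetime:
    """Turn a model reply into a datetime, raising ValueError on a mismatch."""
    return datetime.strptime(text.strip(), fmt)


founded = parse_timestamp("1945-10-24T00:00:00.000000Z")
print("Raw Output:", founded)
print("Formatted Date:", founded.strftime("%m/%d/%Y"))
```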


#### Structured output parser
This output parser can be used when you want to return multiple fields. While the **Pydantic and JSON parsers are more powerful**, this one is useful for less powerful models.

In [16]:
from langchain.output_parsers import ResponseSchema, StructuredOutputParser
from langchain_core.prompts import PromptTemplate

response_schemas = [
    ResponseSchema(name="answer", description="answer to the user's question"),
    ResponseSchema(
        name="source",
        description="source used to answer the user's question, should be a website.",
    ),
]
output_parser = StructuredOutputParser.from_response_schemas(response_schemas)

format_instructions = output_parser.get_format_instructions()
prompt = PromptTemplate(
    template="answer the users question as best as possible.\n{format_instructions}\n{question}",
    input_variables=["question"],
    partial_variables={"format_instructions": format_instructions},
)

chain = prompt | model | output_parser

chain.invoke({"question": "what's the capital of Nepal?"})

{'answer': 'The capital of Nepal is Kathmandu.',
 'source': 'https://www.worldpopulationreview.com/countries/nepal-population'}

#### Output-fixing parser
This output parser wraps another output parser; if the wrapped parser fails, it calls an LLM to fix the errors.

Rather than only raising an error, it passes the misformatted output, along with the format instructions, to the model and asks it to produce a corrected version.

In [17]:
from typing import List
from langchain.output_parsers import PydanticOutputParser, OutputFixingParser
from langchain_core.pydantic_v1 import BaseModel, Field
from langchain_google_genai.llms import GoogleGenerativeAI

llm = GoogleGenerativeAI(model="models/text-bison-001")

# Define your Pydantic model
class Actor(BaseModel):
    name: str = Field(description="name of an actor")
    film_names: List[str] = Field(description="list of names of films they starred in")

actor_query = "Generate the filmography for a random actor."

# Initialize PydanticOutputParser with Actor model
parser = PydanticOutputParser(pydantic_object=Actor)

# Misformatted JSON string (single quotes make it invalid JSON)
misformatted = "{'name': 'Tom Hanks', 'film_names': ['Forrest Gump']}"

# Build the OutputFixingParser with from_llm so that its internal retry chain is created.
# Calling OutputFixingParser(parser=parser, llm=llm) directly leaves that chain unset, and
# parsing then fails with "'NoneType' object has no attribute 'invoke'".
new_parser = OutputFixingParser.from_llm(parser=parser, llm=llm)

# Parsing the misformatted input
try:
    parsed_output = new_parser.parse(misformatted)
    print(parsed_output)
except Exception as e:
    print(f"Error: {e}")
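
The fix-on-failure idea in this section can be sketched without any model call: try the wrapped parser, and on failure hand the bad text and the error message to a fixer, then parse again. In the sketch below the fixer is a deterministic stand-in for the LLM, and `parse_actor`, `stand_in_fixer`, and `parse_with_fix` are hypothetical names rather than LangChain APIs.

```python
import ast
import json
from typing import Callable


def parse_actor(text: str) -> dict:
    """Strict parser: the text must be valid JSON with both expected keys."""
    data = json.loads(text)
    for key in ("name", "film_names"):
        if key not in data:
            raise ValueError(f"missing field: {key}")
    return data


def stand_in_fixer(bad_text: str, error: str) -> str:
    """Stand-in for the LLM call: re-encode a Python-style dict literal as JSON."""
    return json.dumps(ast.literal_eval(bad_text))


def parse_with_fix(
    text: str,
    parser: Callable[[str], dict],
    fixer: Callable[[str, str], str],
    max_retries: int = 1,
) -> dict:
    """Parse the text; on failure, ask the fixer for a corrected version and retry."""
    for attempt in range(max_retries + 1):
        try:
            return parser(text)
        except ValueError as exc:  # json.JSONDecodeError is a ValueError
            if attempt == max_retries:
                raise
            text = fixer(text, str(exc))


misformatted = "{'name': 'Tom Hanks', 'film_names': ['Forrest Gump']}"
fixed = parse_with_fix(misformatted, parse_actor, stand_in_fixer)
print(fixed)
```

The fixer only ever sees the bad output and the error, which is enough for formatting mistakes but not for missing information.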


#### Retry parser
While in some cases it is possible to fix parsing mistakes by looking only at the output, in other cases it isn't. One example is output that is not just in the wrong format but also incomplete. Consider the example below.



In [18]:
from langchain.output_parsers import PydanticOutputParser
from langchain_core.prompts import (
    PromptTemplate,
)
from langchain_core.pydantic_v1 import BaseModel, Field

template = """Based on the user question, provide an Action and Action Input for what step should be taken.
{format_instructions}
Question: {query}
Response:"""


class Action(BaseModel):
    action: str = Field(description="action to take")
    action_input: str = Field(description="input to the action")


retry_parser = PydanticOutputParser(pydantic_object=Action)

In [19]:
prompt = PromptTemplate(
    template="Answer the user query.\n{format_instructions}\n{query}\n",
    input_variables=["query"],
    partial_variables={"format_instructions": retry_parser.get_format_instructions()},
)

In [20]:
prompt_value = prompt.format_prompt(query="who is leo di caprios gf?")
prompt_value

StringPromptValue(text='Answer the user query.\nThe output should be formatted as a JSON instance that conforms to the JSON schema below.\n\nAs an example, for the schema {"properties": {"foo": {"title": "Foo", "description": "a list of strings", "type": "array", "items": {"type": "string"}}}, "required": ["foo"]}\nthe object {"foo": ["bar", "baz"]} is a well-formatted instance of the schema. The object {"properties": {"foo": ["bar", "baz"]}} is not well-formatted.\n\nHere is the output schema:\n```\n{"properties": {"action": {"title": "Action", "description": "action to take", "type": "string"}, "action_input": {"title": "Action Input", "description": "input to the action", "type": "string"}}, "required": ["action", "action_input"]}\n```\nwho is leo di caprios gf?\n')

In [27]:
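bad_response = '{"action": "search"}'
# retry_parser.parse(bad_response) would raise here, because "action_input" is missing.

from langchain.output_parsers.retry import RetryOutputParser

# Wrap the Action parser defined above, which is still bound to `retry_parser` at this point.
action_parser = retry_parser
retry_parser = RetryOutputParser.from_llm(parser=action_parser, llm=model)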

In [None]:
retry_parser.parse_with_prompt(bad_response, prompt_value)
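
Why the prompt matters can be shown with a small stand-in: when `action_input` is missing, nothing in the bad output says what it should be, so the retry step needs the original question. The sketch below replaces the model with a deterministic function; `parse_action`, `stand_in_llm`, and this `parse_with_prompt` are hypothetical and only mirror the idea, not LangChain's implementation.

```python
import json
from typing import Callable

REQUIRED_KEYS = ("action", "action_input")


def parse_action(text: str) -> dict:
    """Strict parser: both fields must be present."""
    data = json.loads(text)
    missing = [key for key in REQUIRED_KEYS if key not in data]
    if missing:
        raise ValueError(f"missing fields: {missing}")
    return data


def stand_in_llm(prompt: str, completion: str, error: str) -> str:
    """Stand-in for the model: recover the missing input from the prompt's last line."""
    question = prompt.rsplit("\n", 1)[-1]
    return json.dumps({"action": "search", "action_input": question})


def parse_with_prompt(
    completion: str,
    prompt: str,
    llm: Callable[[str, str, str], str],
    max_retries: int = 1,
) -> dict:
    """Parse the completion; on failure, retry with the prompt, bad output, and error."""
    for attempt in range(max_retries + 1):
        try:
            return parse_action(completion)
        except ValueError as exc:
            if attempt == max_retries:
                raise
            completion = llm(prompt, completion, str(exc))


prompt_text = "Answer the user query.\nwho is leo di caprios gf?"
result = parse_with_prompt('{"action": "search"}', prompt_text, stand_in_llm)
print(result)
```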