## Output parser 

An output parser takes the output of a model and transforms it into a format better suited to downstream tasks. This is useful when you use LLMs to generate structured data, or when you need to normalize output from chat models and LLMs. LangChain ships many output parsers, including JSON, XML, CSV, OutputFixing, RetryWithError, Pydantic, YAML, PandasDataFrame, Enum, Datetime, and Structured.

### Output parsing with Prompt Template

First, try output parsing without any LangChain output parser: ask for the format in the prompt and let the model structure its reply.

In [12]:
# Chat Prompt Template
from langchain_core.prompts import ChatPromptTemplate
import os
from langchain_core.output_parsers import StrOutputParser
from langchain_google_genai import ChatGoogleGenerativeAI
from dotenv import load_dotenv

# Load environment variables from the .env file
load_dotenv()


# Access the API key from the environment
api_key = os.getenv("GOOGLE_GEN_API")
llm = ChatGoogleGenerativeAI(model="gemini-1.5-pro", api_key=api_key)
result = llm.invoke("Tell me about Dhaka.")

In [13]:
result.content

"## Dhaka: A City of Chaos and Charm\n\nDhaka, the capital of Bangladesh, is a city of stark contrasts. It's a vibrant, bustling metropolis steeped in history and culture, yet grappling with challenges of overpopulation, poverty, and traffic congestion. \n\n**Here's a glimpse into Dhaka:**\n\n**History & Culture:**\n\n* **Ancient Roots:**  Dhaka's history dates back to the 7th century, serving as a prominent Mughal capital in the 17th century.  \n* **Architectural Marvels:** Mughal architecture shines in places like Lalbagh Fort, Ahsan Manzil (Pink Palace), and the Star Mosque.\n* **Cultural Hub:** Dhaka boasts numerous museums, art galleries, and theaters, showcasing the rich heritage of Bangladesh.\n* **Festivals:** Experience the vibrancy of Bengali culture during festivals like Pohela Boishakh (Bengali New Year), Durga Puja, and Eid.\n\n**Modern Metropolis:**\n\n* **Economic Powerhouse:** Dhaka is Bangladesh's economic center, with industries ranging from textiles and garments to f

In [14]:
prompt_template = ChatPromptTemplate.from_messages([
    ("system", "You are a helpful assistant. You will respond in CSV format."),
    ("user", "Tell me about {domain} in {city}")
])

chain = prompt_template | llm 

result = chain.invoke({"domain": "tourist places, culture, Facilities & issues", "city": "Dhaka"})
print(result.content)

Tourist Places,Culture,Facilities,Issues
Lalbagh Fort,Mughal architecture,Guided tours,Limited accessibility for people with disabilities
Ahsan Manzil,Pink Palace,Museum,Overcrowding during peak season
Dhakeshwari Temple,Hindu temple,Religious services,Traffic congestion
Armenian Church,Armenian history,Historical site,Limited parking space
Sadarghat,River port,Boat trips,Sanitation concerns
Star Mosque,Mughal architecture,Religious services,Limited accessibility for people with disabilities
National Parliament House,Modern architecture,Guided tours,Limited public transportation options
Bangladesh National Museum,Bangladeshi history and culture,Museum,Limited interactive exhibits
Balda Garden,Botanical garden,Picnic areas,Maintenance issues
Ramna Park,Urban park,Recreational activities,Overcrowding during weekends
,,,,,
**Culture:**
- **Rich history and heritage:** Dhaka is an ancient city with a rich history dating back to the Mughal era.
- **Vibrant arts and crafts:** The city is kno
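
The reply above starts as a CSV table and then drifts into Markdown, which shows why prompt-only formatting needs a defensive parse step. The following is a minimal sketch using only Python's standard `csv` module. The helper name `parse_csv_reply` and the shortened `sample` text are illustrative assumptions, not part of LangChain.

```python
import csv
import io
from typing import Dict, List


def parse_csv_reply(text: str) -> List[Dict[str, str]]:
    """Parse the leading CSV table of a model reply and ignore trailing prose."""
    reader = csv.reader(io.StringIO(text))
    header = next(reader, None)
    if header is None:
        return []
    rows = []
    for fields in reader:
        # Stop at the first line whose column count differs from the header,
        # e.g. the stray ",,,,," line or the Markdown that follows it.
        if len(fields) != len(header):
            break
        rows.append(dict(zip(header, fields)))
    return rows


# Shortened, hypothetical copy of the reply shown above.
sample = (
    "Tourist Places,Culture,Facilities,Issues\n"
    "Lalbagh Fort,Mughal architecture,Guided tours,Limited accessibility\n"
    "Ramna Park,Urban park,Recreational activities,Overcrowding during weekends\n"
    ",,,,,\n"
    "**Culture:**\n"
)
rows = parse_csv_reply(sample)
print(rows[0]["Tourist Places"])
```

Cutting off at the first malformed row is a deliberately strict choice. A more forgiving parser could skip bad rows and keep reading.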

In [15]:
prompt_template = ChatPromptTemplate.from_messages([
    ("system", "You are a helpful assistant. You will respond in JSON format."),
    ("user", "Tell me about {domain} in {city}")
])

chain = prompt_template | llm 

result = chain.invoke({"domain": "tourist places, culture, Facilities & issues", "city": "Dhaka"})
print(result.content)

```json
{
  "tourist_places": [
    {
      "name": "Ahsan Manzil",
      "description": "A historic palace, once home to the Nawab of Dhaka, showcasing Mughal architecture and rich history.",
      "type": "Historical & Architectural"
    },
    {
      "name": "Lalbagh Fort",
      "description": "An incomplete 17th-century Mughal fort with beautiful gardens, showcasing impressive architecture and a tragic history.",
      "type": "Historical & Architectural"
    },
    {
      "name": "National Parliament House",
      "description": "An architectural marvel designed by Louis Kahn, offering guided tours to witness its grandeur.",
      "type": "Architectural & Government"
    },
    {
      "name": "Dhakeshwari Temple",
      "description": "An ancient Hindu temple dedicated to Goddess Dhakeshwari, considered the city's namesake and a significant religious site.",
      "type": "Religious"
    },
    {
      "name": "Star Mosque",
      "description": "A stunning mosque adorned with

### Output Parsing with LangChain

An output parser lets you specify an arbitrary JSON schema in the prompt, query a model for output that conforms to that schema, and finally parse that output as JSON.
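
The prompt-only JSON reply shown earlier arrived wrapped in a Markdown `json` code fence, so it cannot be passed straight to `json.loads`. A minimal sketch of the final parse step, using only the standard library, might look like this. The name `extract_json` and the sample `reply` are assumptions for illustration, and this is not LangChain's own implementation.

```python
import json
import re

# A Markdown code fence, built programmatically to avoid literal backticks here.
FENCE = "`" * 3


def extract_json(text: str):
    """Return the JSON value in a reply, unwrapping an optional Markdown fence."""
    pattern = re.escape(FENCE) + r"(?:json)?\s*(.*?)\s*" + re.escape(FENCE)
    match = re.search(pattern, text, flags=re.DOTALL)
    # Fall back to the raw text when the model did not add a fence.
    payload = match.group(1) if match else text
    return json.loads(payload)


# Hypothetical, shortened model reply.
reply = FENCE + 'json\n{"tourist_places": [{"name": "Lalbagh Fort"}]}\n' + FENCE
data = extract_json(reply)
print(data["tourist_places"][0]["name"])
```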

The JsonOutputParser is one built-in option for prompting for and then parsing JSON output. While it is similar in functionality to the PydanticOutputParser, it also supports streaming back partial JSON objects.
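
To make "partial JSON objects" concrete, here is a rough standard-library sketch of the idea: as text chunks arrive, close any open string, array, or object and try to parse what has been received so far. `parse_partial_json` and `stream_json` are hypothetical helpers written for this illustration. They approximate the behavior and are not the code LangChain uses.

```python
import json
from typing import Any, Iterable, Iterator, Optional


def parse_partial_json(text: str) -> Optional[Any]:
    """Try to parse a JSON prefix by closing open strings, arrays, and objects."""
    closers = []
    in_string = False
    escaped = False
    for ch in text:
        if in_string:
            if escaped:
                escaped = False
            elif ch == "\\":
                escaped = True
            elif ch == '"':
                in_string = False
        elif ch == '"':
            in_string = True
        elif ch == "{":
            closers.append("}")
        elif ch == "[":
            closers.append("]")
        elif ch in "}]" and closers:
            closers.pop()
    candidate = text + ('"' if in_string else "") + "".join(reversed(closers))
    try:
        return json.loads(candidate)
    except json.JSONDecodeError:
        return None  # not parseable yet, e.g. the text ends inside a key


def stream_json(chunks: Iterable[str]) -> Iterator[Any]:
    """Yield a snapshot each time the accumulated text becomes parseable."""
    buffer = ""
    for chunk in chunks:
        buffer += chunk
        snapshot = parse_partial_json(buffer)
        if snapshot is not None:
            yield snapshot


# Hypothetical token chunks, as a model might stream them.
chunks = ['{"setup": "Dha', 'ka", "tourist_places": ["Lalbagh', ' Fort"]}']
snapshots = list(stream_json(chunks))
for snapshot in snapshots:
    print(snapshot)
```

Each snapshot is a complete Python object, so downstream code can render progress without waiting for the final token.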

Pydantic is used in LangChain to define and validate data models, which helps structure the data passed between components of a LangChain application. It is often used to create structured data schemas, like the one in the example below, which defines a Joke model with the fields setup, description, and tourist_places.

In [16]:
from langchain_core.output_parsers import JsonOutputParser
from langchain_core.prompts import PromptTemplate
from pydantic import BaseModel, Field


# Define your desired data structure.
class Joke(BaseModel):
    setup: str = Field(description="question to set up about places description")
    description: str = Field(description="answer to provide some description of that place")
    tourist_places: str = Field(description="answer to provide the best tourist spots of that city")


# And a query intended to prompt a language model to populate the data structure.
joke_query = "Dhaka"

# Set up a parser + inject instructions into the prompt template.
parser = JsonOutputParser(pydantic_object=Joke)

prompt = PromptTemplate(
    template="Answer the user query.\n{format_instructions}\nTell me something about {query}\n",
    input_variables=["query"],
    partial_variables={"format_instructions": parser.get_format_instructions()},
)

chain = prompt | llm | parser

chain.invoke({"query": joke_query})

{'setup': 'Dhaka, the vibrant capital of Bangladesh, is a city brimming with history, culture, and chaos.',
 'description': 'Known for its bustling streets, Mughal architecture, and delicious street food, Dhaka offers a sensory overload for visitors. The city is located on the banks of the Buriganga River and is a melting pot of traditions and modernity.',
 'tourist_places': "Some of the must-visit tourist spots in Dhaka include the Lalbagh Fort, Ahsan Manzil (Pink Palace), the Star Mosque, Dhakeshwari Temple, and Sadarghat River Front. Don't miss the opportunity to explore the narrow alleys of Old Dhaka with its vibrant markets and street food stalls."}

In [17]:
parser.get_format_instructions()

'The output should be formatted as a JSON instance that conforms to the JSON schema below.\n\nAs an example, for the schema {"properties": {"foo": {"title": "Foo", "description": "a list of strings", "type": "array", "items": {"type": "string"}}}, "required": ["foo"]}\nthe object {"foo": ["bar", "baz"]} is a well-formatted instance of the schema. The object {"properties": {"foo": ["bar", "baz"]}} is not well-formatted.\n\nHere is the output schema:\n```\n{"properties": {"setup": {"description": "question to set up about places description", "title": "Setup", "type": "string"}, "description": {"description": "answer to provide some description of that place", "title": "Description", "type": "string"}, "tourist_places": {"description": "answer to provide the best tourist spots of that city", "title": "Tourist Places", "type": "string"}}, "required": ["setup", "description", "tourist_places"]}\n```'
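
The `required` list in the schema above names the keys a valid reply must contain. As a rough illustration of what checking a reply against that schema involves, here is a sketch that uses only the standard library. `PlaceInfo` and `parse_place` are hypothetical stand-ins for the Pydantic model, and this is not how Pydantic or LangChain implement validation.

```python
import json
from dataclasses import dataclass, fields


@dataclass
class PlaceInfo:
    """Hypothetical stand-in mirroring the three string fields in the schema."""
    setup: str
    description: str
    tourist_places: str


def parse_place(raw: str) -> PlaceInfo:
    """Parse a JSON reply and check that every required string field is present."""
    data = json.loads(raw)
    names = [f.name for f in fields(PlaceInfo)]
    missing = [name for name in names if name not in data]
    if missing:
        raise ValueError(f"missing required fields: {missing}")
    wrong_type = [name for name in names if not isinstance(data[name], str)]
    if wrong_type:
        raise ValueError(f"fields must be strings: {wrong_type}")
    return PlaceInfo(**{name: data[name] for name in names})


# Hypothetical, shortened reply.
place = parse_place(
    '{"setup": "Dhaka", "description": "Capital of Bangladesh", '
    '"tourist_places": "Lalbagh Fort, Ahsan Manzil"}'
)
print(place.tourist_places)
```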

You can also use the JsonOutputParser without Pydantic. This will prompt the model to return JSON, but doesn't provide specifics about what the schema should be.

In [18]:
query = "Dhaka"

parser = JsonOutputParser()
print(parser.get_format_instructions())

prompt = PromptTemplate(
    template="Answer the user query.\n{format_instructions}\n tell me something about {query}\n",
    input_variables=["query"],
    partial_variables={"format_instructions": parser.get_format_instructions()},
)

chain = prompt | llm | parser

chain.invoke({"query": query})

Return a JSON object.


{'response': "Dhaka, the capital of Bangladesh, is a vibrant megacity known for its rich history, bustling streets, and delicious cuisine. Here are a few interesting facts about Dhaka:\n\n* **Historical Significance:** Dhaka has a long and fascinating history, dating back to the 7th century. It served as the Mughal capital of Bengal and boasts numerous historical landmarks, including the Lalbagh Fort and the Ahsan Manzil (Pink Palace).\n\n* **Rickshaw Capital of the World:** Dhaka is famous for its colorful and ubiquitous rickshaws, a primary mode of transportation in the city. It's estimated that millions of rickshaws navigate its streets daily.\n\n* **Delicious Cuisine:** Dhaka offers a diverse culinary scene, with Bengali and Mughlai influences. Must-try dishes include biryani, fish curry, and sweets like roshogolla and mishti doi.\n\n* **Cultural Hub:** The city is a center for art, literature, and music. It hosts numerous cultural events throughout the year, including the Dhaka Li

### XML Parsing

This guide shows you how to use the XMLOutputParser to prompt models for XML output, and then parse that output into a usable format.

In [19]:
from langchain_core.output_parsers import XMLOutputParser

actor_query = "Generate the shortened filmography for Tom Hanks."

parser = XMLOutputParser(tags=["movies", "actor", "film", "name", "genre"])

# We will add these instructions to the prompt below
parser.get_format_instructions()

prompt = PromptTemplate(
    template="""{query}\n{format_instructions}""",
    input_variables=["query"],
    partial_variables={"format_instructions": parser.get_format_instructions()},
)

chain = prompt | llm | parser

output = chain.invoke({"query": actor_query})
print(output)

{'movies': [{'actor': [{'name': 'Tom Hanks'}, {'film': [{'name': "He Knows You're Alone"}, {'genre': 'Horror'}]}, {'film': [{'name': 'Splash'}, {'genre': 'Romantic Comedy'}]}, {'film': [{'name': 'Bachelor Party'}, {'genre': 'Comedy'}]}, {'film': [{'name': 'The Man Who Knew Too Much'}, {'genre': 'Thriller'}]}, {'film': [{'name': 'Big'}, {'genre': 'Fantasy Comedy-Drama'}]}, {'film': [{'name': 'Turner & Hooch'}, {'genre': 'Comedy'}]}, {'film': [{'name': 'A League of Their Own'}, {'genre': 'Sports Comedy-Drama'}]}, {'film': [{'name': 'Sleepless in Seattle'}, {'genre': 'Romantic Comedy'}]}, {'film': [{'name': 'Philadelphia'}, {'genre': 'Legal Drama'}]}, {'film': [{'name': 'Forrest Gump'}, {'genre': 'Comedy-Drama'}]}, {'film': [{'name': 'Apollo 13'}, {'genre': 'Docudrama'}]}, {'film': [{'name': 'Toy Story'}, {'genre': 'Animated Buddy Comedy'}]}, {'film': [{'name': 'Saving Private Ryan'}, {'genre': 'Epic War Drama'}]}, {'film': [{'name': "You've Got Mail"}, {'genre': 'Romantic Comedy'}]}, {'f

In [20]:
parser.get_format_instructions()

'The output should be formatted as a XML file.\n1. Output should conform to the tags below. \n2. If tags are not given, make them on your own.\n3. Remember to always open and close all the tags.\n\nAs an example, for the tags ["foo", "bar", "baz"]:\n1. String "<foo>\n   <bar>\n      <baz></baz>\n   </bar>\n</foo>" is a well-formatted instance of the schema. \n2. String "<foo>\n   <bar>\n   </foo>" is a badly-formatted instance.\n3. String "<foo>\n   <tag>\n   </tag>\n</foo>" is a badly-formatted instance.\n\nHere are the output tags:\n```\n[\'movies\', \'actor\', \'film\', \'name\', \'genre\']\n```'
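
The parsed output printed earlier nests each tag as a single-key dict whose value is either the tag's text or a list of child dicts. A small sketch with the standard `xml.etree.ElementTree` module reproduces that shape for a well-formed reply. `element_to_dict` is a hypothetical helper for illustration, not the XMLOutputParser implementation, and the sample XML is a shortened assumption.

```python
import xml.etree.ElementTree as ET


def element_to_dict(element: ET.Element) -> dict:
    """Convert an element to {tag: text} for leaves or {tag: [children]} otherwise."""
    children = list(element)
    if not children:
        return {element.tag: element.text}
    return {element.tag: [element_to_dict(child) for child in children]}


# Hypothetical, shortened model reply using the tags requested above.
xml_text = (
    "<movies><actor><name>Tom Hanks</name>"
    "<film><name>Big</name><genre>Fantasy Comedy-Drama</genre></film>"
    "</actor></movies>"
)
result = element_to_dict(ET.fromstring(xml_text))
print(result)
```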

### Create a custom Output Parser

In some situations you may want to implement a custom parser to structure the model output into a custom format. Here, we will make a simple parser that inverts the case of the output from the model.

For example, if the model outputs: "Meow", the parser will produce "mEOW".

In [None]:
from typing import Iterable
from langchain_core.messages import AIMessage, AIMessageChunk



def parse(ai_message: AIMessage) -> str:
    """Parse the AI message."""
    return ai_message.content.swapcase()


chain = llm | parse
chain.invoke("hello")
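
The parse logic can be checked without calling a model. The sketch below swaps in a hypothetical `FakeMessage` class that only carries a `content` string in place of `AIMessage` and `AIMessageChunk`. It also shows a chunk-by-chunk variant, `streaming_parse`, which is an assumption about how a streaming version might look and not code from LangChain.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator


@dataclass
class FakeMessage:
    """Hypothetical stand-in for AIMessage / AIMessageChunk: only has .content."""
    content: str


def parse(ai_message: FakeMessage) -> str:
    """Invert the case of a complete message."""
    return ai_message.content.swapcase()


def streaming_parse(chunks: Iterable[FakeMessage]) -> Iterator[str]:
    """Invert the case of each chunk as it arrives."""
    for chunk in chunks:
        yield chunk.content.swapcase()


whole = parse(FakeMessage("Meow"))
pieces = list(streaming_parse([FakeMessage("Me"), FakeMessage("ow")]))
print(whole, pieces)
```

Because `swapcase` works character by character, joining the streamed pieces gives the same result as parsing the whole message at once.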
