#### **Output Parsers**

**Why Use a Parser at All?**

Even when a chat model returns text, the response is a message object that also carries metadata such as token usage and response time. Without a parser, you have to extract the text yourself using result.content.

➡️ Problem: Manually extracting .content every time is tedious.

➡️ Solution: StrOutputParser automates this. It extracts the core text as a plain string, ready to use in chains or further processing.

**Goal:**

Call the LLM twice:  
First, generate a detailed report on a topic (e.g., Black Hole).  
Then, use that report to generate a 5-line summary.  

Two methods are shown:  
**Without parser →** extracting result.content manually  
**With StrOutputParser →** cleaner and reusable via a chain
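
To see why the parser matters inside a chain, the sketch below mimics the two-step flow in plain Python. Everything in it is a hypothetical stand-in, not LangChain code: FakeMessage imitates a chat response carrying text plus metadata, fake_model imitates the LLM call, and run_chain applies steps left to right, which is roughly what the | operator does conceptually.

```python
from dataclasses import dataclass, field
from functools import reduce


@dataclass
class FakeMessage:
    """Hypothetical stand-in for a chat model response (not LangChain's class)."""
    content: str
    response_metadata: dict = field(default_factory=dict)


def fake_model(prompt: str) -> FakeMessage:
    """Hypothetical stand-in for model.invoke(): echoes the prompt with metadata."""
    return FakeMessage(
        content=f"[answer to: {prompt}]",
        response_metadata={"total_tokens": len(prompt.split())},
    )


def to_text(message: FakeMessage) -> str:
    """What a string parser boils down to: keep the text, drop the metadata."""
    return message.content


def run_chain(steps, value):
    """Feed the output of each step into the next one, left to right."""
    return reduce(lambda acc, step: step(acc), steps, value)


def report_prompt(topic: str) -> str:
    return f"Write a detailed report on {topic}."


def summary_prompt(text: str) -> str:
    return f"Write a 5-line summary of the following report: {text}"


# Manual version: .content has to be pulled out by hand between the two calls.
report = fake_model(report_prompt("Black Hole"))
manual = fake_model(summary_prompt(report.content)).content

# Chained version: the to_text step sits wherever a plain string is needed.
steps = [report_prompt, fake_model, to_text, summary_prompt, fake_model, to_text]
chained = run_chain(steps, "Black Hole")

print(chained == manual)  # True
```

The to_text step appears twice in the list for the same reason the parser appears twice in a real chain: once so the second prompt receives plain text, and once so the final result is a string rather than a message object.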

##### **Without Parser**

In [9]:
from langchain_groq import ChatGroq
from langchain_core.prompts import PromptTemplate
from dotenv import load_dotenv
import os
import textwrap

load_dotenv()

# llm = HuggingFaceEndpoint(
#     repo_id="google/gemma-2-2b-it",
#     task = "text-generation",
#     huggingfacehub_api_token=os.getenv("HUGGINGFACE_API_TOKEN")
# )

model = ChatGroq(model="llama-3.3-70b-versatile")

# 1st Prompt -> Detailed Report
template1 = PromptTemplate(
    template="Write a detailed report on {topic}.",
    input_variables=["topic"]
)

# 2nd Prompt -> Summary
template2 = PromptTemplate(
    template="Write a 5-line summary of the following report: {text}",
    input_variables=["text"]
)

prompt1 = template1.invoke({"topic":"Black Hole"})
result1 = model.invoke(prompt1)

prompt2 = template2.invoke({"text":result1.content})
result2 = model.invoke(prompt2)

wrapped_output = textwrap.fill(result2.content, width=80)
print(wrapped_output)

Here is a 5-line summary of the report: Black holes are regions in space with
incredibly strong gravitational pull, formed when massive stars collapse.  They
are characterized by their event horizon, singularity, and ergosphere, and come
in four types: stellar, intermediate-mass, supermassive, and primordial.  Black
holes have unique properties, including their gravitational pull, Hawking
radiation, and ability to warp spacetime.  Their presence can be inferred
through observational evidence such as X-rays, radio waves, and gravitational
lensing.  Despite significant research, many challenges and open questions
remain, including the information paradox and the nature of singularities, which
continue to be the subject of ongoing study and research.


##### **Using StrOutputParser with Chains**

In [12]:
from langchain_groq import ChatGroq
from langchain_core.prompts import PromptTemplate
from dotenv import load_dotenv
import os
import textwrap
from langchain_core.output_parsers import StrOutputParser

load_dotenv()

model = ChatGroq(model="llama-3.3-70b-versatile")

# 1st Prompt -> Detailed Report
template1 = PromptTemplate(
    template="Write a detailed report on {topic}.",
    input_variables=["topic"]
)

# 2nd Prompt -> Summary
template2 = PromptTemplate(
    template="Write a 5-point summary of the following report: {text}",
    input_variables=["text"]
)

parser = StrOutputParser()

chain = template1 | model | parser | template2 | model | parser

result = chain.invoke({'topic':'Black Hole'})
print(result)

Here is a 5-point summary of the report on black holes:

1. **Formation of Black Holes**: Black holes are formed when a massive star collapses in on itself, causing a massive amount of matter to be compressed into an incredibly small space, creating an intense gravitational field that warps spacetime. There are four types of black holes, including stellar, supermassive, intermediate-mass, and primordial black holes.

2. **Properties of Black Holes**: Black holes have unique properties, including an event horizon (the point of no return), a singularity (the point at the center with infinite density and gravity), an ergosphere (a region where gravity can extract energy from objects), and Hawking radiation (radiation emitted due to quantum effects near the event horizon).

3. **Effects of Black Holes on Spacetime**: Black holes have a profound impact on spacetime, causing gravitational lensing (bending and distorting light), frame-dragging (twisting and rotating spacetime), and time dilat

##### **JSON Output Parsers**  
JsonOutputParser asks the LLM (through format instructions added to the prompt) to return valid JSON and parses the response into Python objects such as dictionaries: clean, structured, and easy to work with.  

**Limitation of JsonOutputParser**  
❌ It does NOT enforce a schema, so the output might not arrive in the definitive format you want.  
E.g., change the template to 'Give me 5 facts about {topic} \n {format_instructions}'.  
You can ask for specific fields, but the LLM might still return a slightly different structure.
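
This limitation can be illustrated without calling a model. In the sketch below, the two response strings are made-up examples of what an LLM might return for the same request, missing_keys is a hypothetical helper, and plain json.loads stands in for the JSON parsing step.

```python
import json

# Two made-up model responses to the same "name, age, city" request.
expected_shape = '{"name": "Emily Wilson", "age": 32, "city": "New York"}'
drifted_shape = (
    '{"person": {"full_name": "Emily Wilson", "age": "32"}, "location": "New York"}'
)


def missing_keys(data: dict, required: set) -> set:
    """Hypothetical helper: report which required top-level keys are absent."""
    return required - set(data)


required = {"name", "age", "city"}

for raw in (expected_shape, drifted_shape):
    data = json.loads(raw)  # both strings are valid JSON, so both parse fine
    print(sorted(missing_keys(data, required)))
# Prints [] for the first response and ['age', 'city', 'name'] for the second.
```

Both responses parse successfully, so any code that later reads result['name'] only fails at that point. Checking the shape is left to the caller.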

In [16]:
from langchain_groq import ChatGroq
from langchain_core.prompts import PromptTemplate
from langchain_core.output_parsers import JsonOutputParser
from dotenv import load_dotenv

load_dotenv()

model = ChatGroq(model="llama-3.3-70b-versatile")

parser = JsonOutputParser()

template = PromptTemplate(
    template="Give me the name, age, and city of a fictional person \n {format_instructions}",
    input_variables=[],  # Empty list because we're not using any input variables 
    partial_variables={"format_instructions": parser.get_format_instructions()}  # Passes the format instructions to the template
)

chain = template | model | parser

result = chain.invoke({})
print(result)

{'name': 'Emily Wilson', 'age': 32, 'city': 'New York'}


##### **Pydantic Output Parser**  
PydanticOutputParser is an output parser in LangChain that uses Pydantic models to:  
1) Enforce a strict JSON schema  
2) Validate data types  
3) Apply constraints (e.g., age > 18)  
4) Convert compatible values to the declared type when possible (e.g., the string "30" to the integer 30)  
It is essentially an upgraded StructuredOutputParser with validation and type safety.
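
The validation behaviour listed above can be checked directly on a Pydantic model, with no LLM involved. This is a minimal sketch that assumes Pydantic is installed; the input dictionaries are made-up examples of what a parsed model response might contain.

```python
from pydantic import BaseModel, Field, ValidationError


class Person(BaseModel):
    name: str = Field(description="The name of the person")
    age: int = Field(gt=18, description="The age of the person must be greater than 18")
    place: str = Field(description="The place where the person lives")


# A numeric string is coerced to int because the field is declared as int.
person = Person(**{"name": "Nimal Perera", "age": "30", "place": "Colombo"})
print(type(person.age).__name__, person.age)

# A value that breaks the gt=18 constraint is rejected with a ValidationError.
try:
    Person(**{"name": "Young Person", "age": 15, "place": "Kandy"})
    rejected = False
except ValidationError:
    rejected = True
print(rejected)
```

When the parser sits at the end of a chain, the same kind of validation failure surfaces as a parsing error instead of bad data being passed along silently.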

In [26]:
from langchain_groq import ChatGroq
from langchain_core.prompts import PromptTemplate
from langchain_core.output_parsers import PydanticOutputParser
from dotenv import load_dotenv
from pydantic import BaseModel, Field

load_dotenv()

model = ChatGroq(model="llama-3.3-70b-versatile")

# Schema
class Person(BaseModel):

    name: str = Field(description="The name of the person")
    age: int = Field(gt=18, description="The age of the person must be greater than 18")
    place: str = Field(description="The place where the person lives")

parser = PydanticOutputParser(pydantic_object=Person)

template = PromptTemplate(
    template="Give me the name, age, and city of a fictional person from {country} \n {format_instructions}",
    input_variables=["country"],
    partial_variables={"format_instructions": parser.get_format_instructions()}
)

prompt = template.invoke({"country":"India"})
print(prompt)  # Show the fully formatted prompt that is sent to the LLM behind the scenes

print("\n")

chain = template | model | parser
result = chain.invoke({'country': "Sri Lanka"})
print(result)

text='Give me the name, age, and city of a fictional person from India \n The output should be formatted as a JSON instance that conforms to the JSON schema below.\n\nAs an example, for the schema {"properties": {"foo": {"title": "Foo", "description": "a list of strings", "type": "array", "items": {"type": "string"}}}, "required": ["foo"]}\nthe object {"foo": ["bar", "baz"]} is a well-formatted instance of the schema. The object {"properties": {"foo": ["bar", "baz"]}} is not well-formatted.\n\nHere is the output schema:\n```\n{"properties": {"name": {"description": "The name of the person", "title": "Name", "type": "string"}, "age": {"description": "The age of the person must be greater than 18", "exclusiveMinimum": 18, "title": "Age", "type": "integer"}, "place": {"description": "The place where the person lives", "title": "Place", "type": "string"}}, "required": ["name", "age", "place"]}\n```'


name='Nimal Perera' age=30 place='Colombo'


##### **Structured Output Parser** (Deprecated)  
Instructs the LLM to output structured JSON that follows a predefined schema.  
The schema is declared using the ResponseSchema class.  
**Main Benefit:** You enforce the structure of the LLM response (specific keys such as Fact 1, Fact 2, etc.).  
**Limitation:** It does not perform data validation (for example, it does not check data types).
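
The difference between enforcing keys and validating data can be sketched in plain Python. The check_keys helper and the sample response below are hypothetical illustrations, not StructuredOutputParser's actual implementation: the check passes as long as every expected key is present, whatever the type of each value.

```python
import json

expected_keys = ["Fact 1", "Fact 2", "Fact 3"]

# Made-up response: every key is present, but the values have mixed types.
raw = '{"Fact 1": "Black holes warp spacetime.", "Fact 2": 42, "Fact 3": null}'


def check_keys(data: dict, keys: list) -> dict:
    """Hypothetical helper: fail on a missing key, ignore value types."""
    for key in keys:
        if key not in data:
            raise KeyError(f"missing expected key: {key}")
    return data


result = check_keys(json.loads(raw), expected_keys)
print({key: type(value).__name__ for key, value in result.items()})
# {'Fact 1': 'str', 'Fact 2': 'int', 'Fact 3': 'NoneType'}
```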

In [None]:
from langchain_groq import ChatGroq
from langchain_core.prompts import PromptTemplate
from dotenv import load_dotenv
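# StructuredOutputParser and ResponseSchema are not exported by langchain_core.
# Assumption: with the legacy langchain package installed, they are imported from langchain.output_parsers.
from langchain.output_parsers import StructuredOutputParser, ResponseSchema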

load_dotenv()

model = ChatGroq(model="llama-3.3-70b-versatile")

schema = [
    ResponseSchema(name="Fact 1", description="Fact 1 about the topic"),
    ResponseSchema(name="Fact 2", description="Fact 2 about the topic"),
    ResponseSchema(name="Fact 3", description="Fact 3 about the topic"),
    ResponseSchema(name="Fact 4", description="Fact 4 about the topic"),
    ResponseSchema(name="Fact 5", description="Fact 5 about the topic")
]

parser = StructuredOutputParser.from_response_schemas(schema)

template = PromptTemplate(
    template = "Give me 5 facts about {topic} \n {format_instructions}",
    input_variables = ["topic"],
    partial_variables = {"format_instructions": parser.get_format_instructions()}
)

chain = template | model | parser
result = chain.invoke({"topic":"Black Hole"})
print(result)
