# 🔧 Output Parsers in LangChain

## Overview
This notebook covers **Output Parsers**, which transform raw LLM text responses into structured data formats like JSON, lists, or custom objects.

## What You'll Learn
- Why output parsing is essential for applications
- Using `JsonOutputParser` to extract structured data
- Building chains with output parsers
- Handling real-world data extraction (financial data example)

## Prerequisites
```bash
pip install langchain langchain-groq python-dotenv
```

---

## Why Output Parsers?

LLMs return **unstructured text**, but applications often need **structured data**:

| LLM Output (Text) | Application Need |
|-------------------|------------------|
| "The revenue was $25 billion..." | `{"revenue": 25000000000}` |
| "Apple, Banana, Cherry" | `["Apple", "Banana", "Cherry"]` |
| "Yes, it's valid" | `True` |

**Output Parsers** bridge this gap by:
1. Instructing the LLM to format its response
2. Parsing the response into the desired structure

---

In [1]:
from langchain_groq import ChatGroq
from langchain_core.prompts import PromptTemplate
from langchain_core.output_parsers import JsonOutputParser
from langchain_core.exceptions import OutputParserException

## 1. Setup

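Import the necessary modules:
- `ChatGroq`: LLM provider
- `PromptTemplate`: For creating structured prompts
- `JsonOutputParser`: Parses JSON from LLM responses
- `OutputParserException`: Raised when a response cannot be parsed
- `load_dotenv`: Loads environment variables (such as `GROQ_API_KEY`) from a `.env` file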

In [2]:
from dotenv import load_dotenv

load_dotenv()

True

In [3]:
prompt = '''
From the below news article, extract revenue and eps in JSON format containing the
following keys: 'revenue_actual', 'revenue_expected', 'eps_actual', 'eps_expected'. 

Each value should have a unit such as million or billion.

Only return the valid JSON. No preamble.

Article
=======
{article}
'''

pt = PromptTemplate.from_template(prompt)
pt



---

## 2. Creating the Extraction Prompt

### Key Elements of a Good Extraction Prompt:

1. **Clear Instructions**: Specify exactly what data to extract
2. **Output Format**: Define the JSON structure with key names
3. **No Preamble**: Tell the LLM to return only the JSON (no explanatory text)
4. **Examples**: (Optional) Provide sample outputs
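When a sample output is added to the prompt (item 4), the literal JSON braces need to be doubled, because `PromptTemplate.from_template` uses Python-style `{}` formatting by default. The sketch below shows the escaping rule with plain `str.format`, so it runs without LangChain; the template wording and the sample values are made-up placeholders.

```python
# Literal JSON braces are doubled so the formatter does not treat them as
# placeholders; {article} remains the only variable.
template = '''
Extract revenue and eps from the article as JSON. No preamble.

Example output (placeholder values):
{{"revenue_actual": "1.2 billion", "revenue_expected": "1.1 billion"}}

Article
=======
{article}
'''

rendered = template.format(article="<article text goes here>")
print(rendered)
```

In the rendered prompt the doubled braces collapse to single ones, so the model sees ordinary JSON in the example.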

### Our Use Case: Financial Data Extraction
Extract earnings data from news articles:
- `revenue_actual`: Reported revenue
- `revenue_expected`: Expected revenue
- `eps_actual`: Actual earnings per share
- `eps_expected`: Expected earnings per share

In [4]:
llm = ChatGroq(model_name="llama-3.3-70b-versatile")
chain = pt | llm

---

## 3. Building the Chain

Create a simple chain: `PromptTemplate → LLM`

The chain will:
1. Fill in the article text into the prompt template
2. Send the formatted prompt to the LLM
3. Return the response

In [5]:
article_text = '''
Here’s what the company reported compared with what Wall Street was expecting, based on a survey of analysts by LSEG:

Earnings per share: 72 cents, adjusted vs. 58 cents expected
Revenue: $25.18 billion vs. $25.37 billion expected
'''
response = chain.invoke({'article': article_text})
response.content

'{"revenue_actual": "$25.18 billion", "revenue_expected": "$25.37 billion", "eps_actual": "72 cents", "eps_expected": "58 cents"}'

---

## 4. Testing with Real Articles

### Example 1: Generic Earnings Report
Let's extract data from a company earnings announcement.

In [6]:
article_text = '''
Here’s how the iPhone maker did versus LSEG consensus estimates for the quarter ending Sept. 28:

Earnings per share: $1.64, adjusted, versus $1.60 estimated 
Revenue: $94.93 billion vs. $94.58 billion estimated 
'''
response = chain.invoke({'article': article_text})
response.content

'{"revenue_actual": "94.93 billion", "revenue_expected": "94.58 billion", "eps_actual": "1.64", "eps_expected": "1.60"}'

### Example 2: Apple Earnings Report
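Testing with a different article to verify consistency. Note that the two runs format values differently (`"$25.18 billion"` and `"72 cents"` in the first, `"94.93 billion"` and `"1.64"` in the second), so downstream code should not assume a fixed value format.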

In [7]:
parser = JsonOutputParser()
res = parser.parse(response.content)
res

{'revenue_actual': '94.93 billion',
 'revenue_expected': '94.58 billion',
 'eps_actual': '1.64',
 'eps_expected': '1.60'}

---

## 5. Parsing the JSON Response

The `JsonOutputParser` converts the string response into a Python dictionary.

**Why use a parser instead of `json.loads()`?**
- Handles JSON wrapped in markdown code fences, which `json.loads()` rejects
- Raises `OutputParserException` with the offending LLM output included in the message
- Is a runnable, so it can be piped into a chain with `|`

In [8]:
res['revenue_actual']

'94.93 billion'

---

## 📝 Summary

### Output Parser Types in LangChain

| Parser | Output Type | Use Case |
|--------|-------------|----------|
| `JsonOutputParser` | `dict` | Structured data extraction |
| `StrOutputParser` | `str` | Clean text output |
| `CommaSeparatedListOutputParser` | `list` | Lists of items |
| `PydanticOutputParser` | Pydantic model | Type-validated objects |

### Best Practices
1. **Be specific** in your prompt about the expected JSON structure
2. **Use "No preamble"** to avoid explanatory text before JSON
3. **Handle errors** gracefully: LLMs don't always produce valid JSON, so catch `OutputParserException` around parsing
4. **Test with multiple inputs** to ensure consistency

### Chain with Parser
For cleaner code, add the parser directly to the chain:
```python
chain = pt | llm | JsonOutputParser()  # pt is the PromptTemplate, not the raw prompt string
result = chain.invoke({'article': article_text})  # Returns dict directly
```

### Accessing Parsed Data

The parsed result is a regular Python dictionary, so individual fields are accessed by key, as in `res['revenue_actual']`.
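Because the prompt asks for values with units, the fields come back as strings such as `'94.93 billion'`. If numbers are needed, a small conversion step can follow. The helper below is a hypothetical sketch that covers only the formats seen in this notebook (an optional `$`, an optional unit of `million`, `billion`, or `cents`) and raises `KeyError` for any other unit.

```python
from decimal import Decimal

# Multipliers for the unit words that appear in the example outputs.
UNITS = {
    "million": Decimal(10) ** 6,
    "billion": Decimal(10) ** 9,
    "cents": Decimal("0.01"),
}


def to_number(value):
    """Convert strings such as '$25.18 billion' or '72 cents' to a Decimal."""
    parts = value.replace("$", "").replace(",", "").split()
    amount = Decimal(parts[0])
    unit = parts[1].lower() if len(parts) > 1 else None
    return amount * UNITS[unit] if unit else amount


res = {"revenue_actual": "94.93 billion", "eps_actual": "1.64"}
revenue = to_number(res["revenue_actual"])
eps = to_number(res["eps_actual"])

print(revenue)
print(eps)
```

`Decimal` is used instead of `float` so that amounts like 94.93 billion convert without binary rounding error.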