# LangChain Structured Output

- Why is this needed? LLMs generally respond in natural language. But if you want your LLM to request data from an API, it has to respond in a specific format (usually defined by the user). This is essential when building agents.
- To get structured output, you need some familiarity with Pydantic (a third-party Python package) and `TypedDict` (from the `typing` module in the standard library).
- Of the LLMs available today, some can generate structured output natively and some cannot.
- With LangChain, we can get structured output from both kinds.
- For LLMs that can provide structured output, there is a method called `.with_structured_output()`.
- For LLMs that cannot, we can use **Output Parsers**.

**`.with_structured_output()`**

- It's pretty simple. After creating the model instance, we call `.with_structured_output()` on it, and it returns a wrapped model that produces structured output.
- The only thing the method needs is the format in which the output should be returned. That format can be given in 3 ways:
  1. Using the built-in `TypedDict`
  2. Using Pydantic
  3. Using JSON Schema
- The `.with_structured_output()` method takes a parameter called `method`. The accepted values depend on the model integration; a common set is `Literal['function_calling', 'json_mode', 'json_schema']`.
  - `function_calling`: the schema is passed to the model as a function (tool) definition, and the structured output is read from the arguments of the resulting function call.
  - `json_mode`: the model's JSON mode is used, so the reply comes back as JSON (for an API call or some other reason). The expected keys usually still have to be described in the prompt.
  - `json_schema`: where the provider supports it, the schema is handed to the provider's native structured output feature.

## TypedDict

- It is similar to how we define a type in TypeScript.
- After the type is defined, the IDE and static type checkers can suggest which keys and value types are expected.
- The catch is that nothing is validated at runtime: if you break the declared types, Python still does not raise an error.

### Basic Implementation

In [1]:
from typing import TypedDict


class Employee(TypedDict):
    id: int
    name: str

amit: Employee = {
    'id': 123456,
    'name': 'Amit Kumar Mahapatra'
}

print(amit)

{'id': 123456, 'name': 'Amit Kumar Mahapatra'}
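
To make the "no error at runtime" point concrete, here is a minimal sketch that reuses the `Employee` type from above and deliberately assigns values of the wrong types. The variable name `bad_employee` is only for illustration.

```python
from typing import TypedDict


class Employee(TypedDict):
    id: int
    name: str


# Wrong types on purpose: a static type checker would flag this line,
# but Python itself does not validate TypedDict values at runtime.
bad_employee: Employee = {'id': 'not-a-number', 'name': 42}  # type: ignore

# At runtime a TypedDict is just a plain dict.
print(type(bad_employee))  # <class 'dict'>
print(bad_employee['id'])  # not-a-number
```

This is why a `TypedDict` schema guides the LLM but cannot reject a malformed response on its own.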


### Structured Output - TypedDict

In [None]:
from typing import Literal, TypedDict, Dict, Annotated

from dotenv import load_dotenv
from langchain_huggingface import HuggingFaceEndpoint, ChatHuggingFace


load_dotenv()


# Plain version: the model only sees the field names and types.
# class ReviewAction(TypedDict):
#     review_sentiment: Literal['Positive', 'Negative']
#     action: str

# Annotated version: each field also carries a description for the model.
class ReviewAction(TypedDict):
    review_sentiment: Annotated[Literal['Positive', 'Negative'], "Generate the sentiment of the review"]
    action: Annotated[str, "Based on the review, please generate the next action item"]


llm = HuggingFaceEndpoint(
    repo_id='',  # fill in the Hugging Face model repo id
    task='text-generation'
)

model = ChatHuggingFace(llm=llm).with_structured_output(schema=ReviewAction)

# With a TypedDict schema, the result is a plain dict
response: Dict = model.invoke('This is a great purchase.')
print(response)

response = model.invoke('This is a horrible purchase.')
print(response)

**What happens in the backend?**

When we send the prompt, it does not go to the model alone. The schema (field names, types, and descriptions) is sent along with it. Conceptually, the model ends up seeing something like: "You are a helpful AI assistant. Looking at the review, generate the review sentiment (Positive or Negative) and the next action item. Return the response in JSON format." The exact mechanism depends on the `method` in use.

Because the model has been trained on JSON data, it can return a JSON response.
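
The idea can be sketched without calling any model. The snippet below is only a conceptual illustration, not LangChain's actual implementation: the helper names `describe_fields` and `build_instruction` are hypothetical, and the instruction wording is made up. It shows how the `Annotated` descriptions on a `TypedDict` can be read back and folded into an instruction.

```python
from typing import Annotated, Dict, Literal, TypedDict, get_type_hints


class ReviewAction(TypedDict):
    review_sentiment: Annotated[Literal['Positive', 'Negative'], "Generate the sentiment of the review"]
    action: Annotated[str, "Based on the review, please generate the next action item"]


def describe_fields(schema) -> Dict[str, str]:
    """Hypothetical helper: map each field name to its Annotated description."""
    hints = get_type_hints(schema, include_extras=True)
    descriptions = {}
    for name, hint in hints.items():
        # Annotated[...] stores its extra arguments in __metadata__
        metadata = getattr(hint, '__metadata__', ())
        descriptions[name] = metadata[0] if metadata else ''
    return descriptions


def build_instruction(schema) -> str:
    """Hypothetical helper: turn the schema into a JSON-format instruction."""
    lines = [f'- "{name}": {text}' for name, text in describe_fields(schema).items()]
    return 'Return a JSON object with these keys:\n' + '\n'.join(lines)


print(build_instruction(ReviewAction))
```

Whatever the real mechanism is for a given `method`, the takeaway is the same: the field descriptions you write are part of what the model sees, so they are worth wording carefully.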

## Pydantic

### Structured Output - Pydantic

In [None]:
from typing import Literal

from dotenv import load_dotenv
from pydantic import BaseModel, Field
from langchain_huggingface import HuggingFaceEndpoint, ChatHuggingFace


load_dotenv()


class ReviewAction(BaseModel):
    review_sentiment: Literal['Positive', 'Negative'] = Field(description="Generate the sentiment of the review as either Positive or Negative")
    action: str = Field(description="Based on the review, please generate the next action item")
    # ge=1 and le=5 keep the rating between 1 and 5, both included
    rating: int = Field(ge=1, le=5, description='Rating of the review, which should be between 1 and 5 (both included)')

llm = HuggingFaceEndpoint(
    repo_id='',  # fill in the Hugging Face model repo id
    task='text-generation'
)

model = ChatHuggingFace(llm=llm).with_structured_output(schema=ReviewAction)

# With a Pydantic schema, the result is a ReviewAction instance, not a dict
response: ReviewAction = model.invoke('This is a great purchase.')
print(response)

response = model.invoke('This is a horrible purchase.')
print(response)
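
Unlike `TypedDict`, Pydantic checks the data when the object is created. The sketch below shows this without calling any model, assuming `pydantic` is installed. The helper `try_parse` and the sample payloads are hypothetical stand-ins for what an LLM might return.

```python
from typing import Literal

from pydantic import BaseModel, Field, ValidationError


class ReviewAction(BaseModel):
    review_sentiment: Literal['Positive', 'Negative']
    action: str
    rating: int = Field(ge=1, le=5)


def try_parse(payload: dict):
    """Hypothetical helper: return the parsed object, or the names of the invalid fields."""
    try:
        return ReviewAction(**payload)
    except ValidationError as exc:
        # Each error records the location (field name) that failed
        return [error['loc'][0] for error in exc.errors()]


# A well-formed payload is accepted
good = try_parse({'review_sentiment': 'Positive', 'action': 'Send a thank-you note', 'rating': 5})
print(good)

# An out-of-range rating and an unknown sentiment are both rejected
bad = try_parse({'review_sentiment': 'Neutral', 'action': 'Follow up', 'rating': 7})
print(bad)
```

The same check is what protects you when a model returns a value outside the allowed range or an unexpected label.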

## JSON Schema

In [None]:
from typing import Dict

from dotenv import load_dotenv
from langchain_huggingface import HuggingFaceEndpoint, ChatHuggingFace


load_dotenv()


review_action = {
  "title": "ReviewAction",
  "type": "object",
  "properties": {
    "review_sentiment": {
      "type": "string",
      "description": "Generate the sentiment of the review as either Positive or Negative",
      "enum": ["Positive", "Negative"]
    },
    "action": {
      "type": "string",
      "description": "Based on the review, please generate the next action item"
    },
    "rating": {
      "type": "integer",
      "description": "Rating of the review, which should be between 1 and 5(both included)",
      "minimum": 1,
      "maximum": 5
    }
  },
  "required": ["review_sentiment", "action", "rating"]
}

llm = HuggingFaceEndpoint(
    repo_id='',  # fill in the Hugging Face model repo id
    task='text-generation'
)

model = ChatHuggingFace(llm=llm).with_structured_output(schema=review_action)

# With a JSON Schema, the result is a plain dict
response: Dict = model.invoke('This is a great purchase.')
print(response)

response = model.invoke('This is a horrible purchase.')
print(response)
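
The JSON Schema does not have to be written by hand. As a rough sketch, assuming Pydantic v2 (where `model_json_schema()` is available), the Pydantic model from the previous section can generate an equivalent schema. Field descriptions are left out here to keep the example short.

```python
import json
from typing import Literal

from pydantic import BaseModel, Field


class ReviewAction(BaseModel):
    review_sentiment: Literal['Positive', 'Negative']
    action: str
    rating: int = Field(ge=1, le=5)


# Pydantic v2: export the model definition as a JSON Schema dict
schema = ReviewAction.model_json_schema()

print(json.dumps(schema, indent=2))
```

The generated dict has the same overall shape as the hand-written `review_action` above: a `title`, a `properties` mapping, and a `required` list, with `ge`/`le` showing up as `minimum`/`maximum`.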

## Which one to use when

**All of these formats exist to describe and check the structure of the LLM's output. The validation applies to the LLM's response, not to the code you write.**

- `TypedDict`: when you trust that the LLM is going to provide correct data and you only need type hints, with no runtime validation.
- Pydantic: when you don't trust your LLM and need validation, or default values in case the LLM misses anything.
- JSON Schema: when, for some reason, you have to work with languages other than Python.
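
The "default values" case can be sketched as follows, assuming `pydantic` is installed. The defaults chosen here (`'No action required'` and `None`) and the partial payload are hypothetical; they stand in for an LLM response that omits some keys.

```python
from typing import Literal, Optional

from pydantic import BaseModel, Field


class ReviewAction(BaseModel):
    review_sentiment: Literal['Positive', 'Negative']
    # Hypothetical defaults, used only when the LLM leaves a field out
    action: str = Field(default='No action required')
    rating: Optional[int] = Field(default=None, ge=1, le=5)


# Simulated LLM output that is missing 'action' and 'rating'
partial_output = {'review_sentiment': 'Positive'}

parsed = ReviewAction(**partial_output)
print(parsed.action)
print(parsed.rating)
```

With a `TypedDict` schema, the same partial output would simply be a dict with missing keys, and the calling code would have to handle that itself.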