### Input and Output Parsing with Serialization in LangChain and OpenAI

This notebook demonstrates four LangChain output parsers: `PydanticOutputParser`, `DatetimeOutputParser`, `CommaSeparatedListOutputParser`, and `OutputFixingParser`. Each section parses, validates, or repairs a piece of text and prints the result.

- The examples parse text both ways: without an LLM and with one.

In [140]:
# Read the OpenAI API key from a local text file
with open('C:\\Users\\Shailendra Kadre\\Desktop\\OPEN_AI_KEY.txt') as f:
    api_key = f.read().strip()  # Strip surrounding whitespace such as a trailing newline

In [78]:
# Import necessary parsers from LangChain
from langchain.output_parsers import PydanticOutputParser  # For parsing data into Pydantic models
from langchain.output_parsers import DatetimeOutputParser  # For parsing datetime strings
from langchain.output_parsers import CommaSeparatedListOutputParser  # For parsing comma-separated lists
from langchain.output_parsers import OutputFixingParser  # For fixing and handling parser errors
from pydantic import BaseModel  # Import Pydantic to define structured data models

In [80]:
# Sample input data
input_data = "2025-01-22 12:30:00, Shailendra, Engineering, Deep Learning"  # Example input string

# Define a Pydantic model for structured data
class DataModel(BaseModel):
    date_time: str
    name: str
    field: str
    subject: str

In [82]:
# First, use DatetimeOutputParser to parse the date and time from input data
from langchain.output_parsers import DatetimeOutputParser
from datetime import datetime

# Example input data
input_data = "2025-01-22 12:30:00, Shailendra, Engineering, Deep Learning"

# Extract the datetime string (first part before the comma)
datetime_str = input_data.split(',')[0].strip()

# Convert the datetime string to ISO 8601 format (with milliseconds and 'Z' suffix)
try:
    iso_datetime_str = datetime.strptime(datetime_str, "%Y-%m-%d %H:%M:%S").isoformat(timespec='milliseconds') + "Z"
    print("ISO 8601 formatted datetime:", iso_datetime_str)  # Output the formatted datetime string
except ValueError as e:
    print("Datetime Parsing Error:", e)
    iso_datetime_str = None  # Handle the error gracefully

# Initialize the DatetimeOutputParser only if the conversion above succeeded
if iso_datetime_str:
    datetime_parser = DatetimeOutputParser()  # Initialize the datetime parser

    try:
        # Parse the ISO 8601 string into a datetime object
        datetime_parsed = datetime_parser.parse(iso_datetime_str)
        print("Parsed Date and Time:", datetime_parsed)  # Output the parsed datetime
    except Exception as e:
        print("Parsing Error:", e)
else:
    print("Invalid datetime format.")

ISO 8601 formatted datetime: 2025-01-22T12:30:00.000Z
Parsed Date and Time: 2025-01-22 12:30:00
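
The round trip above can be checked with the standard library alone. The sketch below assumes the parser's default pattern is `%Y-%m-%dT%H:%M:%S.%fZ`, which is consistent with the output shown; `ASSUMED_FORMAT` and `to_parser_format` are illustrative names, not part of LangChain.

```python
from datetime import datetime

# Assumption: this is the pattern the datetime parser expects by default
ASSUMED_FORMAT = "%Y-%m-%dT%H:%M:%S.%fZ"


def to_parser_format(raw: str) -> str:
    """Convert 'YYYY-MM-DD HH:MM:SS' text into the assumed parser format."""
    parsed = datetime.strptime(raw, "%Y-%m-%d %H:%M:%S")
    return parsed.strftime(ASSUMED_FORMAT)


formatted = to_parser_format("2025-01-22 12:30:00")
round_trip = datetime.strptime(formatted, ASSUMED_FORMAT)
print(formatted)
print(round_trip)
```

Note that `strftime` writes microseconds with six digits, while the cell above writes three; `strptime` accepts either for `%f`.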


In [83]:
# Next, use CommaSeparatedListOutputParser to parse the remaining comma-separated values
comma_str = ', '.join(part.strip() for part in input_data.split(',')[1:])  # Drop the datetime, trim each field, and rejoin
comma_parser = CommaSeparatedListOutputParser()  # Initialize the parser for comma-separated lists
comma_parsed = comma_parser.parse(comma_str)  # Parse the comma-separated values
print("Parsed Comma-Separated List:", comma_parsed)  # Output the parsed list

Parsed Comma-Separated List: ['Shailendra', 'Engineering', 'Deep Learning']
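
For comparison, the same split can be done without LangChain. This is a minimal stand-in using the standard `csv` module, not LangChain's implementation; `split_comma_separated` is a hypothetical helper name.

```python
import csv


def split_comma_separated(text: str) -> list[str]:
    """Split one line of comma-separated text and trim each item."""
    # skipinitialspace drops the blank that follows each comma
    reader = csv.reader([text], skipinitialspace=True)
    return [item.strip() for row in reader for item in row]


items = split_comma_separated(" Shailendra,  Engineering,  Deep Learning")
print(items)
```

Trimming each item keeps the result stable even when the input has irregular spacing around the commas.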


In [106]:
# Let's also demonstrate OutputFixingParser for fixing and parsing faulty inputs
from langchain.llms import OpenAI
from langchain.output_parsers import DatetimeOutputParser, OutputFixingParser
from datetime import datetime

# api_key was read from the text file in the first cell

# Initialize the LLM with the correct API key
llm = OpenAI(temperature=0.7, openai_api_key=api_key)

# Example input that already matches the parser's expected format,
# so the wrapped parser should succeed without needing a fix from the LLM
datetime_str = "2025-01-22T12:30:00.000Z"

# Initialize the datetime parser
datetime_parser = DatetimeOutputParser()

# Use the OutputFixingParser with the LLM
output_fixer = OutputFixingParser.from_llm(parser=datetime_parser, llm=llm)

# Fix and parse the datetime string
try:
    fixed_output = output_fixer.parse(datetime_str)  # Parsing the datetime string
    print("Fixed Output:", fixed_output)  # Output the fixed parsed result
except Exception as e:
    print("Error:", e)


Fixed Output: 2025-01-22 12:30:00
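
The idea behind an output-fixing parser is a parse, repair, and retry loop. The sketch below illustrates that pattern under stated assumptions: `hypothetical_fixer` stands in for the LLM call, and `strict_parse`, `parse_with_fix`, and `TARGET_FORMAT` are illustrative names rather than LangChain APIs.

```python
from datetime import datetime
from typing import Callable

TARGET_FORMAT = "%Y-%m-%dT%H:%M:%S.%fZ"  # assumed target format for this sketch
fixer_calls = []  # records every time the stand-in fixer is used


def strict_parse(text: str) -> datetime:
    """Parse text that must already be in TARGET_FORMAT."""
    return datetime.strptime(text.strip(), TARGET_FORMAT)


def hypothetical_fixer(bad_text: str, error: str) -> str:
    """Stand-in for the LLM: rewrites 'YYYY-MM-DD HH:MM:SS' into TARGET_FORMAT."""
    fixer_calls.append(error)
    return datetime.strptime(bad_text.strip(), "%Y-%m-%d %H:%M:%S").strftime(TARGET_FORMAT)


def parse_with_fix(
    text: str,
    parser: Callable[[str], datetime],
    fixer: Callable[[str, str], str],
    max_retries: int = 1,
) -> datetime:
    """Try the parser; on failure, ask the fixer for a corrected string and retry."""
    for attempt in range(max_retries + 1):
        try:
            return parser(text)
        except ValueError as exc:
            if attempt == max_retries:
                raise  # give up after the last allowed retry
            text = fixer(text, str(exc))


valid = parse_with_fix("2025-01-22T12:30:00.000Z", strict_parse, hypothetical_fixer)
calls_after_valid = len(fixer_calls)
repaired = parse_with_fix("2025-01-22 12:30:00", strict_parse, hypothetical_fixer)
print(valid, calls_after_valid)
print(repaired, len(fixer_calls))
```

In this sketch a well-formed input never reaches the fixer; only the malformed string triggers a repair, which is the case the fixing step exists for.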


In [None]:
# Demo of pydantic parser
# Install the library if not done already
#!pip install pydantic

In [111]:
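from langchain.output_parsers import PydanticOutputParser
from pydantic import BaseModel, Field  # Field attaches a description to each attribute

# Schema that the model's answer must conform to
class Planet(BaseModel):
    name: str = Field(description="Name of a planet")
    discoveries: list = Field(description="Python list of three facts about it")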

In [131]:
query = 'Name a well known planet and a list of three facts about it' 

In [136]:
parser = PydanticOutputParser(pydantic_object=Planet)
print(parser.get_format_instructions())

The output should be formatted as a JSON instance that conforms to the JSON schema below.

As an example, for the schema {"properties": {"foo": {"title": "Foo", "description": "a list of strings", "type": "array", "items": {"type": "string"}}}, "required": ["foo"]}
the object {"foo": ["bar", "baz"]} is a well-formatted instance of the schema. The object {"properties": {"foo": ["bar", "baz"]}} is not well-formatted.

Here is the output schema:
```
{"properties": {"name": {"description": "Name of a planet", "title": "Name", "type": "string"}, "discoveries": {"description": "Python list of three facts about it", "items": {}, "title": "Discoveries", "type": "array"}}, "required": ["name", "discoveries"]}
```


In [142]:
from langchain.prompts import PromptTemplate
from langchain.llms import OpenAI

# Initialize the OpenAI model with the API key read in the first cell
llm = OpenAI(temperature=0.7, openai_api_key=api_key)

# Create a prompt template with placeholders for 'query' and 'format_instructions'
prompt = PromptTemplate(
    template="Answer the user query.\n{format_instructions}\n{query}\n",
    input_variables=["query"],
    partial_variables={"format_instructions": parser.get_format_instructions()},  # Filled in once from the parser
)

# Format the prompt with the query defined earlier
_input = prompt.format_prompt(query=query)

# Send the formatted prompt to the model and receive the output
output = llm(_input.to_string())

# Output is plain text, so you can directly print or process it
print("Model Output:", output)


Model Output: 
{"name": "Earth", "discoveries": ["Earth is the third planet from the Sun", "It is the only known planet to have liquid water on its surface", "Earth has a single natural satellite, the Moon"]}
