#**Calculate Employee's Term Life Insurance as per the Payscale**

###**Reliable JSON Output Generation with an LLM: Parsing the Output with Error Handling and Validating the Schema with Pydantic**

#####**✅ Problem:**

When interacting with LLMs, we often want structured output (like JSON) rather than freeform text. However, models may produce output that:

- Is not valid JSON.

- Does not follow a pre-defined schema.

- Fails integration with downstream systems expecting structured data.
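A minimal sketch of these failure modes, using only the standard library. The reply strings are hand-written stand-ins for model output, not real responses, and `try_parse` is a hypothetical helper name.

```python
import json

# Hypothetical model replies, written by hand for illustration.
replies = {
    "valid": '{"name": "Alice", "age": 30}',
    "leading_text": 'Sure, here it is: {"name": "Alice", "age": 30}',
    "trailing_text": '{"name": "Alice", "age": 30} Hope this helps!',
    "wrong_shape": '{"name": "Alice", "age": "thirty"}',
}


def try_parse(raw):
    """Return (data, error): the parsed object, or the parse error message."""
    try:
        return json.loads(raw), None
    except json.JSONDecodeError as exc:
        return None, str(exc)


results = {label: try_parse(raw) for label, raw in replies.items()}
for label, (data, error) in results.items():
    status = "parsed" if error is None else f"not valid JSON ({error})"
    print(f"{label}: {status}")
```

Note that `wrong_shape` parses without error even though `age` is not a number. Parsing only proves the text is JSON; checking the structure needs a separate validation step.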

#####**🛠️ Solution:**

This notebook demonstrates how to:

- Prompt the LLM to return JSON output explicitly.

- Use a Pydantic model as the schema to validate the structure of the output.

- Handle model responses gracefully when schema validation fails.

#####**Example Techniques Used:**

- Prompt engineering for instructing the LLM to produce JSON.

- Validation via Pydantic models (`BaseModel` with a `field_validator`).

###**Install Dependencies**

In [1]:
!pip install pydantic openai



###**Import Required Modules**

In [2]:
import json

from openai import OpenAI
from pydantic import BaseModel, ValidationError

###**Get API key from Secret and Set as ENV**

In [3]:
from google.colab import userdata
# Retrieve the API key from Colab's secrets
OPENAI_API_KEY = userdata.get('OPENAI_API_KEY')

In [4]:
import os
# Set the API key (recommended: store securely, not hardcoded)
os.environ['OPENAI_API_KEY'] = OPENAI_API_KEY

###**Set Up OpenAI Client**

In [5]:
client = OpenAI()

###**Generate Response Using Prompt**

In [6]:
# JSON output generation
prompt = """
Generate name, age, payscale and city of 5 employees like employee1, employee2 etc.
Respond ONLY with a JSON object.
"""

response = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": prompt}],
)
output = response.choices[0].message.content.strip()
print("Raw output:\n", output)

Raw output:
 {
  "employee1": {
    "name": "Alice",
    "age": 30,
    "payscale": "$60,000",
    "city": "New York"
  },
  "employee2": {
    "name": "Bob",
    "age": 35,
    "payscale": "$55,000",
    "city": "Los Angeles"
  },
  "employee3": {
    "name": "Charlie",
    "age": 28,
    "payscale": "$50,000",
    "city": "Chicago"
  },
  "employee4": {
    "name": "David",
    "age": 40,
    "payscale": "$70,000",
    "city": "Houston"
  },
  "employee5": {
    "name": "Emily",
    "age": 25,
    "payscale": "$45,000",
    "city": "Miami"
  }
}


In [7]:
# Convert string to dictionary
data = json.loads(output)

In [8]:
print(data)

{'employee1': {'name': 'Alice', 'age': 30, 'payscale': '$60,000', 'city': 'New York'}, 'employee2': {'name': 'Bob', 'age': 35, 'payscale': '$55,000', 'city': 'Los Angeles'}, 'employee3': {'name': 'Charlie', 'age': 28, 'payscale': '$50,000', 'city': 'Chicago'}, 'employee4': {'name': 'David', 'age': 40, 'payscale': '$70,000', 'city': 'Houston'}, 'employee5': {'name': 'Emily', 'age': 25, 'payscale': '$45,000', 'city': 'Miami'}}


In [9]:
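# Access employee1's payscale to calculate the term life insurance provided by the company.
# payscale is still a string here, so * 4 repeats the text instead of multiplying the amount.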
print("Employee1 Term life insurance:", data["employee1"]["payscale"]*4)

Employee1 Term life insurance: $60,000$60,000$60,000$60,000
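The output above is the string repeated four times, because `str * int` in Python is repetition, not arithmetic. A minimal sketch of the difference, assuming the payscale always looks like `"$60,000"` (a dollar sign and comma separators, no decimals):

```python
payscale_text = "$60,000"  # value as returned by the model in the run above

# str * int repeats the string; no arithmetic happens.
repeated = payscale_text * 4

# Strip the currency symbol and thousands separator, then convert to int.
payscale_number = int(payscale_text.replace("$", "").replace(",", ""))
insurance = payscale_number * 4

print(repeated)   # the repeated text, as in the cell above
print(insurance)  # the numeric product
```

Doing this conversion by hand at every point of use is easy to forget, which is the motivation for moving it into the schema.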


###**Define Pydantic Models**

In [17]:
import re
from typing import List

from pydantic import BaseModel, field_validator

class PersonInfo(BaseModel):
    name: str
    age: int
    payscale: int
    city: str

    @field_validator('payscale', mode='before')
    @classmethod
    def parse_payscale(cls, v):
        if isinstance(v, int):
            return v
        if isinstance(v, str):
            # Remove $ and commas
            cleaned = re.sub(r'[^\d]', '', v)
            return int(cleaned)
        raise ValueError("Invalid payscale format")

class EmployeeDict(BaseModel):
    employees: List[PersonInfo]
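The string branch of `parse_payscale` keeps every digit and drops everything else. The sketch below reproduces that one step with the standard library so its behavior on other inputs is visible; `clean_payscale` and the sample strings are hypothetical and not part of the notebook.

```python
import re


def clean_payscale(value):
    """Mirror of the validator's string branch: keep digits only, then int()."""
    return int(re.sub(r"[^\d]", "", value))


# Hand-written sample inputs, including two that the regex handles poorly.
samples = ["$60,000", "60000", "USD 45,000", "$60,000.50", "60k"]
cleaned = {text: clean_payscale(text) for text in samples}
for text, number in cleaned.items():
    print(f"{text!r} -> {number}")

# A string with no digits is reduced to "", and int("") raises ValueError.
try:
    clean_payscale("n/a")
    no_digits_error = None
except ValueError as exc:
    no_digits_error = exc
    print("'n/a' ->", type(exc).__name__)
```

Two cases deserve attention: a decimal amount such as `"$60,000.50"` becomes `6000050` because the decimal point is removed, and `"60k"` becomes `60`. If the model might return such formats, the validator would need a stricter pattern. Inside a Pydantic field validator, the `ValueError` from the empty-string case is reported as a `ValidationError`.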

###**Compose Prompt and Call GPT**

- JSON Output Generation

- Parsing the JSON output for error handling

- Schema validation

In [None]:
#JSON Output Generation
prompt = """
Generate a JSON object containing a list of 5 employees.
Each employee object in the list should have the following keys: "name", "age", "payscale", and "city".
The top-level key of the JSON object should be "employees".
Respond ONLY with a JSON object.
"""

try:
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": prompt}]
    )
    output = response.choices[0].message.content.strip()
    print("Raw output:\n", output)

    try:
        # Parse the raw text; malformed JSON raises json.JSONDecodeError
        data = json.loads(output)
        # Validate against the Pydantic schema; a mismatch raises ValidationError
        validated_data = EmployeeDict.model_validate(data)
        print("\n✅ Validated Data:\n", validated_data.model_dump())
    except json.JSONDecodeError as e:
        print("❌ JSON parsing failed:", e)
    except ValidationError as ve:
        print("❌ Validation failed:", ve)

except Exception as e:
    print("OpenAI API error:", e)

Raw output:
 {
  "employees": [
    {
      "name": "employee1",
      "age": 30,
      "payscale": "$50,000",
      "city": "New York"
    },
    {
      "name": "employee2",
      "age": 35,
      "payscale": "$60,000",
      "city": "Los Angeles"
    },
    {
      "name": "employee3",
      "age": 28,
      "payscale": "$45,000",
      "city": "Chicago"
    },
    {
      "name": "employee4",
      "age": 32,
      "payscale": "$55,000",
      "city": "Houston"
    },
    {
      "name": "employee5",
      "age": 40,
      "payscale": "$70,000",
      "city": "Miami"
    }
  ]
}

✅ Validated Data:
 {'employees': [{'name': 'employee1', 'age': 30, 'payscale': 50000, 'city': 'New York'}, {'name': 'employee2', 'age': 35, 'payscale': 60000, 'city': 'Los Angeles'}, {'name': 'employee3', 'age': 28, 'payscale': 45000, 'city': 'Chicago'}, {'name': 'employee4', 'age': 32, 'payscale': 55000, 'city': 'Houston'}, {'name': 'employee5', 'age': 40, 'payscale': 70000, 'city': 'Miami'}]}


In [24]:
data_dict = validated_data.model_dump()

In [34]:
# payscale is now an int, so * 5 is numeric multiplication rather than string repetition
print(data_dict['employees'][0]['payscale'] * 5)

250000
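The same calculation can be applied to every employee once the payscales are integers. This is a sketch only: the list is copied from the validated output above, the factor of 5 is taken from the previous cell, and `term_life_cover` and `MULTIPLIER` are hypothetical names.

```python
# Validated employees, copied from the model_dump() output above.
employees = [
    {"name": "employee1", "age": 30, "payscale": 50000, "city": "New York"},
    {"name": "employee2", "age": 35, "payscale": 60000, "city": "Los Angeles"},
    {"name": "employee3", "age": 28, "payscale": 45000, "city": "Chicago"},
    {"name": "employee4", "age": 32, "payscale": 55000, "city": "Houston"},
    {"name": "employee5", "age": 40, "payscale": 70000, "city": "Miami"},
]

MULTIPLIER = 5  # assumption: the same factor used in the cell above


def term_life_cover(payscale, multiplier):
    """Term life cover as a simple multiple of the payscale."""
    return payscale * multiplier


cover_by_name = {
    person["name"]: term_life_cover(person["payscale"], MULTIPLIER)
    for person in employees
}
for name, cover in cover_by_name.items():
    print(f"{name}: {cover}")
```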
