#**Calculate Employee's Term Life Insurance as per the Payscale**

###**Reliable JSON Output from an LLM: Generation, Parsing with Error Handling, and Schema Validation with Pydantic**

#####**✅ Problem:**

When interacting with LLMs, we often want structured output (like JSON) rather than freeform text. However, models may produce output that:

- Is not valid JSON.

- Does not follow a pre-defined schema.

- Breaks integration with downstream systems that expect structured data.
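The failure modes listed above can be reproduced without calling a model. The sketch below is illustrative only: the replies are hand-written, and `REQUIRED_KEYS` and `diagnose` are hypothetical names introduced for this example, not part of the notebook.

```python
import json

# Hypothetical raw replies, written by hand to illustrate each failure mode.
replies = {
    "prose_wrapped": 'Sure! Here is the data: {"name": "John Doe"}',
    "wrong_shape": '{"name": "John Doe"}',
    "wrong_type": '{"name": "John Doe", "age": "thirty"}',
}

REQUIRED_KEYS = {"name", "age"}  # assumed minimal schema for this sketch


def diagnose(raw: str) -> str:
    """Return a short label describing why a reply is unusable, or 'ok'."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return "not valid JSON"
    if not isinstance(data, dict):
        return "not a JSON object"
    missing = REQUIRED_KEYS - data.keys()
    if missing:
        return "missing keys: " + ", ".join(sorted(missing))
    if not isinstance(data["age"], int):
        return "age is not an integer"
    return "ok"


for label, raw in replies.items():
    print(label, "->", diagnose(raw))
```

Hand-rolled checks like these grow quickly as the schema grows, which is the motivation for delegating validation to a schema library.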

#####**🛠️ Solution:**

This notebook demonstrates how to:

- Prompt the LLM to return JSON output explicitly.

- Use a Pydantic model as the schema to validate the structure of the output.

- Handle model responses gracefully when schema validation fails.

#####**Example Techniques Used:**

- Prompt engineering for instructing the LLM to produce JSON.

- Schema validation with Pydantic models (`BaseModel`, `RootModel`, and a `field_validator`).

##**0. Install Dependencies**

In [3]:
!pip install pydantic openai



##**1. Import Required Modules**

In [4]:
import json
from typing import Dict
from pydantic import BaseModel, ValidationError, RootModel
from openai import OpenAI

##**2. Get API key from Secret**

In [5]:
from google.colab import userdata
# Retrieve the API key from Colab's secrets
api_key = userdata.get('OPENAI_API_KEY')

##**3. Set Up OpenAI Client**

In [6]:
# Set your OpenAI API key
client = OpenAI(api_key=api_key)

In [38]:
#JSON Output Generation
prompt = """
Generate name, age, payscale and city of 5 employees like employee1, employee2 etc.
Respond ONLY with a JSON object.
"""

response = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": prompt}],
)
output = response.choices[0].message.content.strip()
print("Raw output:\n", output)

Raw output:
 {
  "employee1": {
    "name": "John Doe",
    "age": 30,
    "payscale": "$50,000",
    "city": "New York"
  },
  "employee2": {
    "name": "Jane Smith",
    "age": 28,
    "payscale": "$45,000",
    "city": "Los Angeles"
  },
  "employee3": {
    "name": "Mike Johnson",
    "age": 35,
    "payscale": "$60,000",
    "city": "Chicago"
  },
  "employee4": {
    "name": "Emily Wilson",
    "age": 26,
    "payscale": "$40,000",
    "city": "Houston"
  },
  "employee5": {
    "name": "Alex Rodriguez",
    "age": 32,
    "payscale": "$55,000",
    "city": "Miami"
  }
}


In [45]:
# Convert string to dictionary
data = json.loads(output)

In [46]:
print(data)

{'employee1': {'name': 'John Doe', 'age': 30, 'payscale': '$50,000', 'city': 'New York'}, 'employee2': {'name': 'Jane Smith', 'age': 28, 'payscale': '$45,000', 'city': 'Los Angeles'}, 'employee3': {'name': 'Mike Johnson', 'age': 35, 'payscale': '$60,000', 'city': 'Chicago'}, 'employee4': {'name': 'Emily Wilson', 'age': 26, 'payscale': '$40,000', 'city': 'Houston'}, 'employee5': {'name': 'Alex Rodriguez', 'age': 32, 'payscale': '$55,000', 'city': 'Miami'}}


In [78]:
# Try to compute employee1's term life insurance (4x payscale) provided by the company.
# payscale is still a string here, so * 4 repeats the string instead of multiplying.
print("Employee1 Term life insurance:", data["employee1"]["payscale"]*4)

Employee1 Term life insurance: $50,000$50,000$50,000$50,000
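The repeated string in the output above comes from Python's sequence repetition: multiplying a `str` by an `int` concatenates copies of it. A minimal sketch of the difference, using the value from the run above (the manual `replace` cleanup is only for illustration):

```python
payscale = "$50,000"  # the string returned by the model in the run above

# Multiplying a str by an int repeats the string.
repeated = payscale * 4

# Stripping the currency formatting first gives a number that can be multiplied.
amount = int(payscale.replace("$", "").replace(",", ""))
coverage = amount * 4

print(repeated)
print(coverage)
```

Valid JSON is therefore not enough on its own; the field types also have to match what the downstream calculation expects.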


##**4. Define Pydantic Models**

In [79]:
class PersonInfo(BaseModel):
    name: str
    age: int
    payscale: int
    city: str

# Using RootModel to support JSON like {"employee1": {...}, "employee2": {...}}
class EmployeeDict(RootModel[Dict[str, PersonInfo]]):
    pass
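With `payscale: int` and no custom validator, a formatted string such as `"$50,000"` should be rejected rather than silently accepted. The sketch below assumes Pydantic v2 (which the `RootModel` import implies) and uses a hand-written record shaped like the model output shown earlier.

```python
from pydantic import BaseModel, ValidationError


class PersonInfo(BaseModel):
    name: str
    age: int
    payscale: int
    city: str


# Hand-written record mirroring the shape of the model output above.
record = {"name": "John Doe", "age": 30, "payscale": "$50,000", "city": "New York"}

try:
    PersonInfo.model_validate(record)
    failed_fields = []
except ValidationError as err:
    # Each error entry carries the location of the offending field.
    failed_fields = [error["loc"][0] for error in err.errors()]

print("Fields that failed validation:", failed_fields)
```

This is why the next cell redefines `PersonInfo` with a validator that normalizes `payscale` before the integer check runs.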


In [80]:
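import re

from pydantic import BaseModel, RootModel, field_validator

# Redefine PersonInfo so that payscale strings such as "$50,000" are converted to int before validation.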

class PersonInfo(BaseModel):
    name: str
    age: int
    payscale: int
    city: str

    @field_validator('payscale', mode='before')
    @classmethod
    def parse_payscale(cls, v):
        if isinstance(v, int):
            return v
        if isinstance(v, str):
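            # Strip every non-digit character, e.g. "$50,000" -> "50000".
            # A string with no digits leaves '' and int('') raises ValueError,
            # which Pydantic reports as a validation error.
            cleaned = re.sub(r'[^\d]', '', v)
            return int(cleaned)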
        raise ValueError("Invalid payscale format")

class EmployeeDict(RootModel[Dict[str, PersonInfo]]):
    pass
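One caveat with the validator above: because it removes every non-digit character, a value with cents such as `"$50,000.50"` becomes `5000050`. A stricter parser is sketched below as one possible alternative; `parse_payscale_strict` and its accepted formats are assumptions made for this example, not part of the notebook.

```python
import re

# Accepts an optional "$", then either comma-grouped digits or plain digits,
# optionally followed by ".0", ".00", and so on. Anything else is rejected.
_PAYSCALE = re.compile(r"\$?\s*(\d{1,3}(?:,\d{3})+|\d+)(?:\.0+)?")


def parse_payscale_strict(value):
    """Convert 50000, "50000" or "$50,000" to an int; raise ValueError otherwise."""
    if isinstance(value, bool):  # bool is a subclass of int, so exclude it explicitly
        raise ValueError(f"Invalid payscale format: {value!r}")
    if isinstance(value, int):
        return value
    if isinstance(value, str):
        match = _PAYSCALE.fullmatch(value.strip())
        if match:
            return int(match.group(1).replace(",", ""))
    raise ValueError(f"Invalid payscale format: {value!r}")


# The loose approach keeps the digits of the cents as part of the amount.
loose = int(re.sub(r"[^\d]", "", "$50,000.50"))
print("loose :", loose)
print("strict:", parse_payscale_strict("$50,000"))
```

The body of this function could replace the body of the `parse_payscale` validator if rejecting ambiguous amounts is preferable to guessing.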

##**5. Compose Prompt and Call GPT**
- JSON Output Generation
- Parsing the JSON output for error handling
- Schema validation

In [87]:
#JSON Output Generation
prompt = """
Generate name, age, payscale and city of 5 employees like employee1, employee2 etc.
Respond ONLY with a JSON object.
"""

try:
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": prompt}]
    )
    output = response.choices[0].message.content.strip()
    print("Raw output:\n", output)

    try:
        # Parse the reply; json.loads raises JSONDecodeError on malformed JSON
        data = json.loads(output)
        # Schema validation
        validated_data = EmployeeDict.model_validate(data)
        print("\n✅ Validated Data:\n", validated_data.model_dump())
    except json.JSONDecodeError as e:
        print("❌ JSON parsing failed:", e)
    except ValidationError as ve:
        print("❌ Validation failed:", ve)

except Exception as e:
    print("OpenAI API error:", e)

Raw output:
 {
  "employee1": {
    "name": "John Doe",
    "age": 28,
    "payscale": "$50,000",
    "city": "New York"
  },
  "employee2": {
    "name": "Jane Smith",
    "age": 35,
    "payscale": "$60,000",
    "city": "Los Angeles"
  },
  "employee3": {
    "name": "Michael Johnson",
    "age": 30,
    "payscale": "$55,000",
    "city": "Chicago"
  },
  "employee4": {
    "name": "Emily Davis",
    "age": 25,
    "payscale": "$45,000",
    "city": "San Francisco"
  },
  "employee5": {
    "name": "Sam Wilson",
    "age": 33,
    "payscale": "$70,000",
    "city": "Houston"
  }
}

✅ Validated Data:
 {'employee1': {'name': 'John Doe', 'age': 28, 'payscale': 50000, 'city': 'New York'}, 'employee2': {'name': 'Jane Smith', 'age': 35, 'payscale': 60000, 'city': 'Los Angeles'}, 'employee3': {'name': 'Michael Johnson', 'age': 30, 'payscale': 55000, 'city': 'Chicago'}, 'employee4': {'name': 'Emily Davis', 'age': 25, 'payscale': 45000, 'city': 'San Francisco'}, 'employee5': {'name': 'Sam Wi
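The cell above prints an error and stops when parsing or validation fails. Since model output varies between calls, one common way to handle this is to ask again a bounded number of times. The sketch below is a hypothetical helper, not part of the notebook: `get_validated`, `generate`, and `validate` are illustrative names, and the replies are canned so the example runs offline. In the notebook, `generate` could wrap the `client.chat.completions.create` call and `validate` could be `EmployeeDict.model_validate`.

```python
import json
from typing import Any, Callable


def get_validated(
    generate: Callable[[], str],
    validate: Callable[[Any], Any],
    max_attempts: int = 3,
) -> Any:
    """Call generate() until its reply parses as JSON and passes validate()."""
    last_error = None
    for _ in range(max_attempts):
        raw = generate()
        try:
            return validate(json.loads(raw))
        except ValueError as err:
            # json.JSONDecodeError is a ValueError subclass; Pydantic v2's
            # ValidationError is assumed here to derive from ValueError as well.
            last_error = err
    raise RuntimeError(f"No valid reply after {max_attempts} attempts") from last_error


# Offline demo with canned replies: the first is malformed, the second is fine.
canned = iter(["Here you go: {oops", '{"employee1": {"payscale": 50000}}'])
result = get_validated(lambda: next(canned), lambda data: data)
print(result)
```

Capping the number of attempts keeps cost and latency bounded when the model repeatedly returns unusable output.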

In [88]:
data_dict = validated_data.model_dump()

In [89]:
print(data_dict['employee1']['payscale'])

50000


In [91]:
# Multiply employee1's salary by 4
print("\n💰 Calculate employee1's Term Life insurance:\n", data_dict['employee1']['payscale']*4)



💰 Calculate employee1's Term Life insurance:
 200000
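Once `payscale` is an integer, the same calculation extends to every employee. A minimal sketch, using two records copied from the validated output above and assuming the same multiplier of 4 applies to everyone (`COVERAGE_MULTIPLIER` is a name introduced here):

```python
COVERAGE_MULTIPLIER = 4  # the factor used for employee1 above

# Two records copied from the validated output; in the notebook this would be
# validated_data.model_dump().
data_dict = {
    "employee1": {"name": "John Doe", "age": 28, "payscale": 50000, "city": "New York"},
    "employee2": {"name": "Jane Smith", "age": 35, "payscale": 60000, "city": "Los Angeles"},
}

coverage = {
    key: person["payscale"] * COVERAGE_MULTIPLIER
    for key, person in data_dict.items()
}

for key, amount in coverage.items():
    print(f"{key}: ${amount:,}")
```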
