<a href="https://colab.research.google.com/github/VSriram-py/ai-cloud/blob/main/car_structured_output_openai.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## Structured Output for Car Specifications

This notebook demonstrates how to extract structured information about cars using OpenAI's API with LangChain.

## How to Get Your OpenAI API Key

Before running this notebook, you need an OpenAI API key. Follow these steps:

1. **Create an OpenAI Account**:
   - Go to [platform.openai.com](https://platform.openai.com)
   - Click "Sign up", or "Log in" if you already have a ChatGPT account
   - Complete email and phone verification

2. **Generate Your API Key**:
   - Once logged in, go to the API Keys section (Settings > API Keys)
   - Click "Create new secret key"
   - Give it a name (e.g., "Colab Car Analysis")
   - **IMPORTANT**: Copy and save this key immediately - you won't be able to see it again!

3. **Set Up Billing**:
   - Navigate to "Billing" in your account settings
   - Add payment information (OpenAI uses pay-per-use pricing)
   - You can set usage limits to control spending

4. **Security Best Practices**:
   - Never share your API key publicly
   - Don't commit it to GitHub or public repositories
   - Use environment variables or secure secret management
   - Regularly rotate your keys

💡 **For Google Colab**: We'll use Colab's `userdata` feature to store your API key securely.
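
Outside Colab, the "use environment variables" advice above could look roughly like the sketch below. It assumes the key has been exported as `OPENAI_API_KEY`. The helper names `load_api_key` and `mask_key` are hypothetical and not part of any library.

```python
import os


def load_api_key(env_var: str = "OPENAI_API_KEY") -> str:
    """Return the API key stored in an environment variable, or fail loudly."""
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(
            f"{env_var} is not set. Export it in your shell or add it as a Colab secret."
        )
    return key


def mask_key(key: str, visible: int = 4) -> str:
    """Hide all but the last few characters so the key is safer to print or log."""
    if len(key) <= visible:
        return "*" * len(key)
    return "*" * (len(key) - visible) + key[-visible:]
```

Calling `mask_key(load_api_key())` confirms that a key was found without echoing the full secret into the notebook output.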

### Install Required Packages

In [None]:
# Install required packages for Google Colab
!pip install langchain-openai pydantic python-dotenv -q

### Setup OpenAI API Key

**Option 1: Using Google Colab Secrets (Recommended)**
- Click the key icon (🔑) in the left sidebar
- Add a new secret named `OPENAI_API_KEY`
- Paste your API key as the value
- Toggle on notebook access

**Option 2: Direct input (Less secure)**
- In the cell below, replace `'your-api-key-here'` in the fallback branch with your actual key
- Remove the key again before sharing or committing the notebook

In [None]:
import os
from langchain_openai import ChatOpenAI

# Option 1: Using Colab Secrets (Recommended)
try:
    from google.colab import userdata
    api_key = userdata.get('OPENAI_API_KEY')
except Exception:  # Not running in Colab, or the secret is missing or inaccessible
    # Option 2: Direct input (for testing only)
    api_key = 'your-api-key-here'  # Replace with your actual API key

# Initialize OpenAI model
llm = ChatOpenAI(
    model="gpt-4o",  # You can also use "gpt-4o-mini" for lower cost
    api_key=api_key,
    temperature=0  # For more deterministic outputs
)

print("✓ OpenAI API initialized successfully!")

### Simple Schema Example

In [None]:
text = "The Tesla Model 3 is a 2023 electric vehicle with a starting price of $40,000."

In [None]:
from pydantic import BaseModel

class BasicCarInfo(BaseModel):
    brand: str
    model: str
    year: int
    fuel_type: str
    price_usd: float

In [None]:
llm_info = llm.with_structured_output(BasicCarInfo)

In [None]:
response = llm_info.invoke(f"Extract structured output from the text according to the schema. Text: {text}")
response

In [None]:
response.model_dump()

In [None]:
response.brand
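
The type checking behind `BasicCarInfo` can be tried locally without an API call. The sketch below assumes Pydantic v2 and feeds the model a hand-written payload shaped like the sample text. In Pydantic's default (lax) mode, a numeric string is coerced to `int`, while a value that cannot be coerced is rejected.

```python
from pydantic import BaseModel, ValidationError


class BasicCarInfo(BaseModel):
    brand: str
    model: str
    year: int
    fuel_type: str
    price_usd: float


# Hand-written payload; the values come from the sample sentence above.
payload = {
    "brand": "Tesla",
    "model": "Model 3",
    "year": "2023",       # numeric string: coerced to int in lax mode
    "fuel_type": "Electric",
    "price_usd": 40000,   # int: accepted for a float field
}

car = BasicCarInfo.model_validate(payload)
print(car.year, type(car.year).__name__)

# A value that cannot be coerced raises ValidationError instead of passing through.
try:
    BasicCarInfo.model_validate({**payload, "year": "twenty twenty-three"})
except ValidationError as exc:
    print(f"Rejected: {exc.error_count()} validation error(s)")
```

This is the same validation that runs when a structured response is parsed into the schema class, so a malformed field surfaces as an error rather than as silently wrong data.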

### Complex Schema for Car Specifications

This schema covers petrol, diesel, hybrid, and electric vehicles (EVs). Sections that do not apply to a given fuel type are optional.

In [None]:
# First, let's see what raw information looks like
response = llm.invoke("What are the specifications of Tesla Model 3 2023?")
print(response.content)

In [None]:
from typing import Optional, Literal
from pydantic import BaseModel, Field


class EngineSpecs(BaseModel):
    """Engine specifications for petrol and diesel cars"""
    engine_type: Optional[str] = Field(None, description="Engine configuration (e.g., Inline-4, V6, V8)")
    displacement_liters: Optional[float] = Field(None, description="Engine displacement in liters")
    cylinders: Optional[int] = Field(None, description="Number of cylinders")
    horsepower_hp: Optional[int] = Field(None, description="Maximum horsepower")
    torque_nm: Optional[int] = Field(None, description="Maximum torque in Newton-meters")
    fuel_system: Optional[str] = Field(None, description="Fuel injection system type")


class ElectricSpecs(BaseModel):
    """Electric motor specifications for EVs"""
    motor_type: Optional[str] = Field(None, description="Type of electric motor (e.g., Permanent Magnet, AC Induction)")
    power_kw: Optional[float] = Field(None, description="Electric motor power in kilowatts")
    horsepower_hp: Optional[int] = Field(None, description="Electric motor horsepower equivalent")
    torque_nm: Optional[int] = Field(None, description="Electric motor torque in Newton-meters")
    drive_configuration: Optional[str] = Field(None, description="Drive configuration (e.g., RWD, AWD, Dual Motor)")


class BatterySpecs(BaseModel):
    """Battery specifications for EVs and hybrids"""
    capacity_kwh: Optional[float] = Field(None, description="Battery capacity in kilowatt-hours")
    type: Optional[str] = Field(None, description="Battery chemistry (e.g., Lithium-ion, LFP, NMC)")
    range_km: Optional[int] = Field(None, description="Electric range in kilometers (WLTP/EPA)")
    charging_time_hours: Optional[str] = Field(None, description="Charging time details (e.g., '0-80% in 30 min with fast charging')")
    fast_charging_kw: Optional[int] = Field(None, description="Maximum fast charging power in kilowatts")


class FuelSpecs(BaseModel):
    """Fuel specifications for petrol and diesel cars"""
    tank_capacity_liters: Optional[float] = Field(None, description="Fuel tank capacity in liters")
    fuel_economy_city_kmpl: Optional[float] = Field(None, description="City fuel economy in km per liter")
    fuel_economy_highway_kmpl: Optional[float] = Field(None, description="Highway fuel economy in km per liter")
    fuel_economy_combined_kmpl: Optional[float] = Field(None, description="Combined fuel economy in km per liter")
    emissions_co2_gkm: Optional[int] = Field(None, description="CO2 emissions in grams per kilometer")


class PerformanceSpecs(BaseModel):
    acceleration_0_100_kmh_sec: Optional[float] = Field(None, description="0-100 km/h acceleration time in seconds")
    top_speed_kmh: Optional[int] = Field(None, description="Maximum top speed in km/h")
    transmission: Optional[str] = Field(None, description="Transmission type (e.g., 6-speed manual, 8-speed automatic, Single-speed)")
    drivetrain: Optional[str] = Field(None, description="Drivetrain configuration (FWD, RWD, AWD, 4WD)")


class DimensionsSpecs(BaseModel):
    length_mm: Optional[int] = Field(None, description="Vehicle length in millimeters")
    width_mm: Optional[int] = Field(None, description="Vehicle width in millimeters")
    height_mm: Optional[int] = Field(None, description="Vehicle height in millimeters")
    wheelbase_mm: Optional[int] = Field(None, description="Wheelbase in millimeters")
    weight_kg: Optional[int] = Field(None, description="Curb weight in kilograms")
    seating_capacity: Optional[int] = Field(None, description="Number of seats")
    boot_capacity_liters: Optional[int] = Field(None, description="Boot/trunk capacity in liters")


class SafetyFeatures(BaseModel):
    airbags: Optional[int] = Field(None, description="Number of airbags")
    safety_rating: Optional[str] = Field(None, description="Safety rating (e.g., Euro NCAP 5-star, IIHS Top Safety Pick)")
    adas_features: Optional[list[str]] = Field(None, description="List of Advanced Driver Assistance Systems")


class CarSpecs(BaseModel):
    """Complete car specifications for Petrol, Diesel, and Electric Vehicles"""
    brand: str = Field(description="Manufacturer or brand name")
    model: str = Field(description="Car model name")
    variant: Optional[str] = Field(None, description="Specific variant or trim level")
    year: Optional[int] = Field(None, description="Model year")
    fuel_type: Literal["Petrol", "Diesel", "Electric", "Hybrid", "Plug-in Hybrid"] = Field(
        description="Primary fuel/power type"
    )

    # Conditional fields based on fuel type
    engine: Optional[EngineSpecs] = Field(None, description="Engine specs (for petrol/diesel cars)")
    electric_motor: Optional[ElectricSpecs] = Field(None, description="Electric motor specs (for EVs)")
    battery: Optional[BatterySpecs] = Field(None, description="Battery specs (for EVs and hybrids)")
    fuel: Optional[FuelSpecs] = Field(None, description="Fuel specs (for petrol/diesel cars)")

    # Common fields for all car types
    performance: PerformanceSpecs = Field(description="Performance specifications")
    dimensions: DimensionsSpecs = Field(description="Dimensions and capacity")
    safety: Optional[SafetyFeatures] = Field(None, description="Safety features and ratings")

    price_usd: Optional[float] = Field(None, description="Starting price in US dollars")
    price_inr: Optional[float] = Field(None, description="Starting price in Indian Rupees")

In [None]:
carSpec_llm = llm.with_structured_output(CarSpecs)
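
To see what a Pydantic class contributes to a structured-output request, it can help to inspect the JSON Schema that Pydantic generates from it. The sketch below assumes Pydantic v2 and uses a hypothetical, reduced schema (`MiniCarSpecs`, `MiniBatterySpecs`) so the output stays short. How LangChain packages this schema for the model depends on the library version and method, so treat this as an inspection aid only.

```python
import json
from typing import Optional

from pydantic import BaseModel, Field


class MiniBatterySpecs(BaseModel):
    """Hypothetical trimmed version of BatterySpecs, for illustration only."""
    capacity_kwh: Optional[float] = Field(None, description="Battery capacity in kilowatt-hours")
    range_km: Optional[int] = Field(None, description="Electric range in kilometers (WLTP/EPA)")


class MiniCarSpecs(BaseModel):
    """Hypothetical reduced schema: two required fields and one optional nested model."""
    brand: str = Field(description="Manufacturer or brand name")
    model: str = Field(description="Car model name")
    battery: Optional[MiniBatterySpecs] = Field(None, description="Battery specs (for EVs and hybrids)")


schema = MiniCarSpecs.model_json_schema()
print(json.dumps(schema, indent=2))

# Fields without a default are listed as required; Optional fields defaulting to None are not.
print(schema["required"])
```

The `description` strings appear verbatim in the generated schema, which is why units and examples are worth spelling out in each `Field`.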

### Example 1: Electric Vehicle (Tesla Model 3)

In [None]:
query_ev = "What are the specifications of Tesla Model 3 Long Range 2023?"
response_ev = carSpec_llm.invoke(query_ev)
response_ev

In [None]:
response_ev.model_dump()

In [None]:
# Access specific attributes
print(f"Brand: {response_ev.brand}")
print(f"Model: {response_ev.model}")
print(f"Fuel Type: {response_ev.fuel_type}")
print(f"Battery Capacity: {response_ev.battery.capacity_kwh if response_ev.battery else 'N/A'} kWh")
print(f"Range: {response_ev.battery.range_km if response_ev.battery else 'N/A'} km")
print(f"0-100 km/h: {response_ev.performance.acceleration_0_100_kmh_sec} seconds")
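
The access pattern above repeats an `if ... else 'N/A'` guard for every nested field, and it still prints `None` when the nested object exists but the field itself is empty. One possible alternative is a small helper that walks a dotted path. `get_nested` is a hypothetical name, and the `SimpleNamespace` object with its placeholder values only stands in for a parsed response.

```python
from types import SimpleNamespace
from typing import Any


def get_nested(obj: Any, path: str, default: Any = "N/A") -> Any:
    """Follow a dotted attribute path, returning `default` if any step is missing or None."""
    current = obj
    for name in path.split("."):
        current = getattr(current, name, None)
        if current is None:
            return default
    return current


# Stand-in for a parsed response; the numbers are placeholders, not real specifications.
demo = SimpleNamespace(
    brand="Tesla",
    battery=SimpleNamespace(capacity_kwh=75.0, range_km=None),
    engine=None,
)

print(get_nested(demo, "battery.capacity_kwh"))   # 75.0
print(get_nested(demo, "battery.range_km"))       # N/A (field present but None)
print(get_nested(demo, "engine.horsepower_hp"))   # N/A (nested object missing)
```

The same helper should work on a `CarSpecs` instance, since Pydantic models expose their fields as ordinary attributes.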

### Example 2: Petrol Car (BMW 3 Series)

In [None]:
query_petrol = "What are the specifications of BMW 3 Series 330i 2023?"
response_petrol = carSpec_llm.invoke(query_petrol)
response_petrol

In [None]:
# Access engine specifications for petrol car
print(f"Brand: {response_petrol.brand}")
print(f"Model: {response_petrol.model}")
print(f"Fuel Type: {response_petrol.fuel_type}")
print(f"Engine: {response_petrol.engine.displacement_liters if response_petrol.engine else 'N/A'} L, {response_petrol.engine.cylinders if response_petrol.engine else 'N/A'} cylinders")
print(f"Horsepower: {response_petrol.engine.horsepower_hp if response_petrol.engine else 'N/A'} HP")
print(f"Fuel Economy (Combined): {response_petrol.fuel.fuel_economy_combined_kmpl if response_petrol.fuel else 'N/A'} km/l")

### Example 3: Diesel Car (Toyota Fortuner)

In [None]:
query_diesel = "What are the specifications of Toyota Fortuner 2.8 Diesel 4x4 2023?"
response_diesel = carSpec_llm.invoke(query_diesel)
response_diesel

In [None]:
# Access diesel engine specifications
print(f"Brand: {response_diesel.brand}")
print(f"Model: {response_diesel.model}")
print(f"Fuel Type: {response_diesel.fuel_type}")
print(f"Engine Displacement: {response_diesel.engine.displacement_liters if response_diesel.engine else 'N/A'} L")
print(f"Torque: {response_diesel.engine.torque_nm if response_diesel.engine else 'N/A'} Nm")
print(f"Drivetrain: {response_diesel.performance.drivetrain}")

### Compare Multiple Cars

In [None]:
import pandas as pd

# Create a comparison dataframe
cars_data = [
    {
        'Brand': response_ev.brand,
        'Model': response_ev.model,
        'Type': response_ev.fuel_type,
        'Power': f"{response_ev.electric_motor.horsepower_hp if response_ev.electric_motor else 'N/A'} HP",
        'Range/Efficiency': f"{response_ev.battery.range_km if response_ev.battery else 'N/A'} km",
        '0-100 km/h': f"{response_ev.performance.acceleration_0_100_kmh_sec} sec"
    },
    {
        'Brand': response_petrol.brand,
        'Model': response_petrol.model,
        'Type': response_petrol.fuel_type,
        'Power': f"{response_petrol.engine.horsepower_hp if response_petrol.engine else 'N/A'} HP",
        'Range/Efficiency': f"{response_petrol.fuel.fuel_economy_combined_kmpl if response_petrol.fuel else 'N/A'} km/l",
        '0-100 km/h': f"{response_petrol.performance.acceleration_0_100_kmh_sec} sec"
    },
    {
        'Brand': response_diesel.brand,
        'Model': response_diesel.model,
        'Type': response_diesel.fuel_type,
        'Power': f"{response_diesel.engine.horsepower_hp if response_diesel.engine else 'N/A'} HP",
        'Range/Efficiency': f"{response_diesel.fuel.fuel_economy_combined_kmpl if response_diesel.fuel else 'N/A'} km/l",
        '0-100 km/h': f"{response_diesel.performance.acceleration_0_100_kmh_sec} sec"
    }
]

df = pd.DataFrame(cars_data)
df

### Custom Query - Try Your Own Car!

In [None]:
# Try with your own car query
custom_query = "What are the specifications of Tata Nexon EV Max 2024?"  # Change this to any car
custom_response = carSpec_llm.invoke(custom_query)
custom_response

In [None]:
# Pretty print the response
import json
print(json.dumps(custom_response.model_dump(), indent=2))

## Notes

### Advantages of this approach:
1. **Structured Data**: Output is parsed into an object matching your schema instead of free-form text
2. **Type Safety**: Pydantic validates data types automatically
3. **Flexible**: Works with petrol, diesel, and electric vehicles
4. **Easy Integration**: Can be integrated into databases, APIs, or dashboards

### Cost Considerations:
- OpenAI charges per token (input + output); the rates below are approximate and change over time
- GPT-4o: ~$2.50 per 1M input tokens, ~$10 per 1M output tokens
- GPT-4o-mini: ~$0.15 per 1M input tokens, ~$0.60 per 1M output tokens
- Monitor usage in your OpenAI dashboard
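
As a rough worked example of the per-token arithmetic, the sketch below uses the approximate prices listed above and a hypothetical workload. The function name `estimate_cost_usd` and the token counts are assumptions for illustration. Actual counts depend on your prompts and on the size of the schema sent with each request.

```python
# Approximate USD prices per 1M tokens, copied from the list above; verify before relying on them.
PRICES_PER_MILLION = {
    "gpt-4o": {"input": 2.50, "output": 10.00},
    "gpt-4o-mini": {"input": 0.15, "output": 0.60},
}


def estimate_cost_usd(model: str, input_tokens: int, output_tokens: int) -> float:
    """Estimate the cost of a workload from its total input and output token counts."""
    rates = PRICES_PER_MILLION[model]
    return (input_tokens * rates["input"] + output_tokens * rates["output"]) / 1_000_000


# Hypothetical workload: 1,000 requests of roughly 1,500 input and 500 output tokens each.
for name in PRICES_PER_MILLION:
    cost = estimate_cost_usd(name, input_tokens=1_500 * 1_000, output_tokens=500 * 1_000)
    print(f"{name}: ${cost:.2f}")
```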

### Tips:
- Set temperature=0 for more consistent outputs; results may still vary slightly between runs
- Use gpt-4o-mini for faster and cheaper responses during development
- Switch to gpt-4o for better accuracy in production
- Always verify critical information in the model's response against an authoritative source; schema validation checks structure and types, not factual accuracy
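
A cheap first line of defence before any manual verification is a plausibility check on the extracted numbers. The sketch below is one possible approach. The bounds in `PLAUSIBLE_RANGES` are illustrative assumptions, not authoritative limits, and `find_implausible` is a hypothetical helper that looks only at top-level fields such as `year` and `price_usd`.

```python
from typing import Any

# Illustrative plausibility bounds (assumptions; tune them for your own data).
PLAUSIBLE_RANGES = {
    "year": (1950, 2035),
    "price_usd": (1_000, 5_000_000),
}


def find_implausible(values: dict[str, Any]) -> list[str]:
    """Return one message per known field whose value falls outside its expected range."""
    problems = []
    for field, (low, high) in PLAUSIBLE_RANGES.items():
        value = values.get(field)
        if value is None:
            continue  # missing values are not treated as errors here
        if not low <= value <= high:
            problems.append(f"{field}={value} is outside [{low}, {high}]")
    return problems


# A price that looks like a units mistake is flagged; the year passes.
print(find_implausible({"year": 2023, "price_usd": 40}))
```

Passing `custom_response.model_dump()` to the helper would flag suspicious top-level values for review. It cannot confirm that a plausible-looking value is actually correct.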