Lab M1.06: API Calling JSON Report
Cindy Lund

This lab extends your product listing generator by adding robust data validation using Pydantic. You'll learn how to ensure data quality before processing, which is crucial for production systems.

What you'll build:

Pydantic models for product data validation
JSON validation system
Integration with your existing ChatGPT API workflow
Error handling for invalid data
A complete validated API pipeline

In [1]:
%pip install datasets openai python-dotenv pydantic Pillow

Note: you may need to restart the kernel to use updated packages.


In [2]:
from datasets import load_dataset

dataset = load_dataset("ashraq/fashion-product-images-small", split="train[:100]")
print(len(dataset))
                  



100


In [3]:
import PIL
print(PIL.__version__)


12.1.0


In [4]:
# Install: pip install datasets

from pathlib import Path


print("Loading product dataset...")


Loading product dataset...


In [5]:
# import Pydantic for data validation
from pydantic import BaseModel, Field, ValidationError




In [6]:
#checkpoint (before writing any validation logic)

import pydantic
print("Pydantic version:", pydantic.__version__)

Pydantic version: 2.12.5


In [7]:
# Create the input model: ProductRequest, built on Pydantic's BaseModel
from pydantic import BaseModel, Field, ValidationError, field_validator
from typing import Optional, List

class ProductRequest(BaseModel):
    name: str = Field(min_length=1)
    price: float
    category: str = Field(min_length=1)

    # Optional fields (allowed but not required)
    brand: Optional[str] = None
    description: Optional[str] = None
    features: Optional[List[str]] = None

    @field_validator("price")
    @classmethod
    def price_must_be_positive(cls, v):
        if v <= 0:
            raise ValueError("price must be > 0")
        return v
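
The custom validator above could also be written as a declarative constraint. The following is a minimal sketch, assuming a hypothetical model name `ProductRequestGt` that is not part of the lab; it uses `Field(gt=0)` so that Pydantic enforces the bound itself and reports its built-in `greater_than` error type instead of a `value_error`.

```python
from pydantic import BaseModel, Field, ValidationError


class ProductRequestGt(BaseModel):
    """Hypothetical variant of ProductRequest (illustration only)."""
    name: str = Field(min_length=1)
    price: float = Field(gt=0)  # declarative: zero and negative prices are rejected
    category: str = Field(min_length=1)


try:
    ProductRequestGt(name="Running Shoes", price=0, category="Footwear")
    gt_error_type = None
except ValidationError as e:
    # Each entry in e.errors() is a dict with "type", "loc", "msg", ...
    gt_error_type = e.errors()[0]["type"]

print("error type:", gt_error_type)
```

A custom `field_validator` remains the better fit when the rule needs a custom message or depends on more than a simple numeric bound.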


In [8]:
# Test the ProductRequest model with valid and invalid data
print("\n--- ProductRequest tests ---")

# 1) Valid example
try:
    req_ok = ProductRequest(name="Running Shoes", price=79.99, category="Footwear")
    print("✓ Valid request:", req_ok.model_dump())
except ValidationError as e:
    print("✗ Should be valid, but got error:")
    print(e)

# 2) Invalid example (negative price)
try:
    req_bad = ProductRequest(name="Running Shoes", price=-1, category="Footwear")
    print(req_bad)
except ValidationError as e:
    print("✓ ValidationError (expected):")
    print(e)



--- ProductRequest tests ---
✓ Valid request: {'name': 'Running Shoes', 'price': 79.99, 'category': 'Footwear', 'brand': None, 'description': None, 'features': None}
✓ ValidationError (expected):
1 validation error for ProductRequest
price
  Value error, price must be > 0 [type=value_error, input_value=-1, input_type=int]
    For further information visit https://errors.pydantic.dev/2.12/v/value_error
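
Note that the invalid example passes an `int` for a `float` field and only fails because of the price rule. Pydantic v2 validates in lax mode by default, which also converts numeric strings. The sketch below, using a hypothetical one-field model `PriceOnly`, shows that coercion and how `strict=True` turns it off.

```python
from pydantic import BaseModel, ValidationError


class PriceOnly(BaseModel):
    """Hypothetical one-field model used only to illustrate coercion."""
    price: float


# Lax mode (the default): a numeric string is converted to float.
coerced = PriceOnly(price="79.99")
print(type(coerced.price).__name__, coerced.price)

# Strict mode: the same string input is rejected.
try:
    PriceOnly.model_validate({"price": "79.99"}, strict=True)
    strict_ok = True
except ValidationError:
    strict_ok = False

print("accepted in strict mode:", strict_ok)
```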


In [9]:
# Define the output model: ProductListing, built on Pydantic's BaseModel
from pydantic import BaseModel, Field, ValidationError
from typing import List

class ProductListing(BaseModel):
    title: str = Field(min_length=1)
    description: str = Field(min_length=1)
    features: List[str]
    keywords: List[str]

print("\n--- ProductListing tests ---")

# 1) Valid output example (what ChatGPT should return)
try:
    listing_ok = ProductListing(
        title="Lightweight Running Shoes",
        description="Breathable running shoes designed for comfort and speed.",
        features=["Breathable mesh", "Lightweight sole", "Comfort fit"],
        keywords=["running shoes", "lightweight", "breathable", "sports footwear"]
    )
    print("✓ Valid ProductListing:", listing_ok.model_dump())
except ValidationError as e:
    print("✗ Should be valid, but got error:")
    print(e)

# 2) Invalid output example (missing required field)
try:
    listing_bad = ProductListing(
        title="Lightweight Running Shoes",
        features=["Breathable mesh"],
        keywords=["running shoes"]
    )
except ValidationError as e:
    print("✓ ValidationError (expected for missing fields):")
    print(e.errors())




--- ProductListing tests ---
✓ Valid ProductListing: {'title': 'Lightweight Running Shoes', 'description': 'Breathable running shoes designed for comfort and speed.', 'features': ['Breathable mesh', 'Lightweight sole', 'Comfort fit'], 'keywords': ['running shoes', 'lightweight', 'breathable', 'sports footwear']}
✓ ValidationError (expected for missing fields):
[{'type': 'missing', 'loc': ('description',), 'msg': 'Field required', 'input': {'title': 'Lightweight Running Shoes', 'features': ['Breathable mesh'], 'keywords': ['running shoes']}, 'url': 'https://errors.pydantic.dev/2.12/v/missing'}]
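
Because the model's reply arrives as a JSON string, parsing and validation can also be done in a single step with `model_validate_json`. This is a minimal sketch, assuming a hypothetical stand-in model `ListingSketch` with the same fields as `ProductListing`; malformed JSON and schema violations both surface as `ValidationError`.

```python
from typing import List

from pydantic import BaseModel, Field, ValidationError


class ListingSketch(BaseModel):
    """Hypothetical stand-in with the same fields as ProductListing."""
    title: str = Field(min_length=1)
    description: str = Field(min_length=1)
    features: List[str]
    keywords: List[str]


raw = (
    '{"title": "Trail Shoes", "description": "Grippy and light.", '
    '"features": ["Lugged sole"], "keywords": ["trail"]}'
)
listing = ListingSketch.model_validate_json(raw)  # parse + validate in one call
print(listing.title)

try:
    ListingSketch.model_validate_json("not json at all")
    json_error_type = None
except ValidationError as e:
    json_error_type = e.errors()[0]["type"]

print("error type for malformed input:", json_error_type)
```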


In [10]:
# Utility function: load JSON from a file with error handling

import json
from pathlib import Path

def load_json(path: str | Path):
    """
    Load JSON from a file.
    Returns (data, None) on success
    Returns (None, error_message) on JSONDecodeError or FileNotFoundError
    """
    path = Path(path)
    try:
        with path.open("r", encoding="utf-8") as f:
            return json.load(f), None
    except FileNotFoundError:
        return None, f"File not found: {path}"
    except json.JSONDecodeError as e:
        return None, f"Invalid JSON format in {path}: {e}"
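
The `JSONDecodeError` caught above carries more than a message: it also exposes `msg`, `lineno`, and `colno`, which can help point at the broken spot in a request file. A small sketch, using a made-up payload with a trailing comma:

```python
import json

# Made-up example payload: the trailing comma makes it invalid JSON.
broken = '{"name": "Running Shoes", "price": 79.99,}'

try:
    json.loads(broken)
    location = None
except json.JSONDecodeError as e:
    # lineno and colno are 1-based positions of the parse failure.
    location = (e.lineno, e.colno)
    print(f"{e.msg} (line {e.lineno}, column {e.colno})")
```

The exact wording of `e.msg` varies between Python versions, so it is safer to log it than to match on it.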


In [11]:
#Implement validate_request(data)
from pydantic import ValidationError

def validate_request(data: dict):
    """
    Validate incoming request data using ProductRequest.
    Returns (ProductRequest_obj, None) on success
    Returns (None, errors_list) on ValidationError
    """
    try:
        # model_validate also turns non-dict JSON (e.g. a list) into a ValidationError
        req = ProductRequest.model_validate(data)
        return req, None
    except ValidationError as e:
        return None, e.errors()
    
#checkpoint: test the validate_request function with sample JSON files (valid and invalid)
tests = [
    "sample_requests/valid_product.json",
    "sample_requests/invalid_negative_price.json",
    "sample_requests/invalid_missing_name.json",
]

for p in tests:
    data, load_err = load_json(p)
    print(f"\n=== {p} ===")

    if load_err:
        print("LOAD ERROR:", load_err)
        continue

    req, val_err = validate_request(data)
    if val_err is None:
        print("✓ VALID -> ProductRequest")
        print(req.model_dump())
    else:
        print("✗ INVALID -> field errors (e.errors())")
        print(val_err)




=== sample_requests/valid_product.json ===
✓ VALID -> ProductRequest
{'name': 'Running Shoes', 'price': 79.99, 'category': 'Footwear', 'brand': 'Acme', 'description': 'Lightweight shoes for daily training', 'features': ['Breathable mesh', 'Rubber sole']}

=== sample_requests/invalid_negative_price.json ===
✗ INVALID -> field errors (e.errors())
[{'type': 'value_error', 'loc': ('price',), 'msg': 'Value error, price must be > 0', 'input': -5, 'ctx': {'error': ValueError('price must be > 0')}, 'url': 'https://errors.pydantic.dev/2.12/v/value_error'}]

=== sample_requests/invalid_missing_name.json ===
✗ INVALID -> field errors (e.errors())
[{'type': 'missing', 'loc': ('name',), 'msg': 'Field required', 'input': {'category': 'electronics', 'price': 49.99}, 'url': 'https://errors.pydantic.dev/2.12/v/missing'}]
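
The raw `e.errors()` lists are precise but noisy. One possible way to condense them into one line per field is sketched below; `format_errors` is a hypothetical helper, not part of the lab, and it only relies on the `loc` and `msg` keys visible in the output above.

```python
def format_errors(errors):
    """Turn a list of Pydantic-style error dicts into 'field: message' lines."""
    lines = []
    for err in errors:
        # "loc" is a tuple path to the field, e.g. ("price",) or ("features", 0)
        field = ".".join(str(part) for part in err.get("loc", ())) or "(root)"
        lines.append(f"{field}: {err.get('msg', 'unknown error')}")
    return lines


# Sample data shaped like the e.errors() output shown above.
sample = [
    {"type": "missing", "loc": ("name",), "msg": "Field required"},
    {"type": "value_error", "loc": ("price",), "msg": "Value error, price must be > 0"},
]

for line in format_errors(sample):
    print(line)
```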


In [12]:
import random

# Lab M1.06 — Prompt builder with Style Guides & Personas (JSON-only, no images)
def create_prompt_from_request(req: ProductRequest) -> str:
    """
    Build a prompt from validated ProductRequest (JSON input).
    No images. Strict JSON output.
    Includes style guidelines and personas for richer descriptions.
    """

    # ---------- Style rules ----------
    STYLE_RULES = """
Writing rules:
- Write a natural, engaging product description (no filler).
- Vary the opening sentence; avoid generic openings like "This product is".
- Be specific and concrete.
- Focus on benefits, not just features.
- Do not mention images or visual analysis.
"""

    # ---------- Optional personas ----------
    PERSONAS = [
        "Tone: modern minimalist, crisp and factual.",
        "Tone: sporty and energetic, performance-focused.",
        "Tone: premium and refined, elegant and confident.",
        "Tone: friendly and approachable, clear and helpful."
    ]

    # Pick a stable persona based on product name (deterministic)
    rng = random.Random(req.name)
    persona = rng.choice(PERSONAS)

    # ---------- Product information ----------
    info_lines = [
        f"- Name: {req.name}",
        f"- Category: {req.category}",
        f"- Price: {req.price}",
    ]

    if req.brand:
        info_lines.append(f"- Brand: {req.brand}")
    if req.description:
        info_lines.append(f"- Description: {req.description}")
    if req.features:
        info_lines.append(f"- Features: {', '.join(req.features)}")

    product_info = "\n".join(info_lines)

    # ---------- Final prompt ----------
    return f"""You are an expert e-commerce copywriter.

{persona}

{STYLE_RULES}

Product Information:
{product_info}

Your task:
- Create a compelling, professional product listing.
- Do NOT reference images.
- Do NOT invent specifications that are not implied by the data.

Return STRICT JSON only. No markdown. No extra text.
Use EXACTLY this JSON structure:

{{
  "title": "SEO-friendly product title (max 60 characters)",
  "description": "Detailed, persuasive description (150–200 words)",
  "features": ["5–7 concise bullet features"],
  "keywords": ["10–15 SEO-relevant keywords"]
}}
"""

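The persona choice in the prompt builder is deterministic because `random.Random` is seeded with the product name, so the same product always gets the same tone. The sketch below isolates that idea; the shortened persona labels and the `pick_persona` helper are illustrative assumptions.

```python
import random

# Shortened, illustrative labels standing in for the full persona strings.
PERSONAS = ["minimalist", "sporty", "premium", "friendly"]


def pick_persona(name: str) -> str:
    """Seed a private RNG with the product name so the choice is repeatable."""
    return random.Random(name).choice(PERSONAS)


first = pick_persona("Running Shoes")
second = pick_persona("Running Shoes")
print(first, second, first == second)
```

Using a private `random.Random` instance also leaves the global random state untouched, so other code that relies on `random` is not affected.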

In [13]:
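import os, json, time
from openai import OpenAI
from pydantic import ValidationError

client = OpenAI(api_key=os.getenv("OPENAI_API_KEY"))

def generate_listing_from_request(req: ProductRequest, max_retries=2):
    """
    Call the model and validate its JSON reply against ProductListing.
    Returns (ProductListing_obj, None) on success
    Returns (None, error_dict) once all attempts have failed
    """
    prompt = create_prompt_from_request(req)
    last_error = {"type": "unknown_error"}

    for attempt in range(max_retries + 1):
        try:
            resp = client.responses.create(
                model="gpt-4o-mini",
                input=[{
                    "role": "user",
                    "content": [{"type": "input_text", "text": prompt}]
                }]
            )

            parsed = json.loads(resp.output_text)
            listing = ProductListing.model_validate(parsed)
            return listing, None

        except (json.JSONDecodeError, ValidationError) as e:
            # Malformed or incomplete model output: remember the error and retry
            last_error = {"type": "validation_error", "details": str(e)}

        except Exception as e:
            # Network or API failure: remember the error and retry
            last_error = {"type": "api_error", "details": str(e)}

        # Back off briefly (1s, 2s, ...) before the next attempt
        if attempt < max_retries:
            time.sleep(2 ** attempt)

    return None, last_error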
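
Even with a "STRICT JSON only" instruction, models sometimes wrap their reply in a Markdown code fence, which makes `json.loads` fail. A defensive pre-processing step is sketched below; `parse_model_json` is a hypothetical helper, not part of the lab, and the fence marker is built with string repetition only to keep the example readable.

```python
import json
import re

FENCE = "`" * 3  # the three-backtick Markdown fence marker

# Matches an optional "json" language tag and captures the fenced body.
FENCE_RE = re.compile(
    r"^" + FENCE + r"(?:json)?\s*\n(.*?)\n?" + FENCE + r"\s*$",
    re.DOTALL,
)


def parse_model_json(text: str):
    """Strip a surrounding Markdown fence, if present, then parse JSON."""
    cleaned = text.strip()
    match = FENCE_RE.match(cleaned)
    if match:
        cleaned = match.group(1)
    return json.loads(cleaned)


fenced = FENCE + 'json\n{"title": "Trail Shoes"}\n' + FENCE
plain = '{"title": "Trail Shoes"}'

print(parse_model_json(fenced))
print(parse_model_json(plain))
```

Input that is still not valid JSON after stripping raises `json.JSONDecodeError` as before, so the existing error handling keeps working.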


In [14]:
def process_request_file(path: str):
    data, load_err = load_json(path)
    if load_err:
        return {"status": "load_error", "error": load_err}

    req, val_err = validate_request(data)
    if val_err:
        return {"status": "validation_error", "errors": val_err}

    listing, api_err = generate_listing_from_request(req)
    if api_err:
        return {"status": "processing_error", "error": api_err}

    return {"status": "ok", "listing": listing.model_dump()}


print(process_request_file("sample_requests/invalid_negative_price.json"))
print(process_request_file("sample_requests/valid_product.json"))


{'status': 'validation_error', 'errors': [{'type': 'value_error', 'loc': ('price',), 'msg': 'Value error, price must be > 0', 'input': -5, 'ctx': {'error': ValueError('price must be > 0')}, 'url': 'https://errors.pydantic.dev/2.12/v/value_error'}]}
{'status': 'ok', 'listing': {'title': 'Acme Lightweight Running Shoes for Daily Training', 'description': "Elevate your daily workout with Acme's Lightweight Running Shoes, designed specifically for the discerning athlete. Crafted with breathable mesh, these shoes ensure your feet remain cool and comfortable, allowing for greater focus on your performance. The rubber sole provides exceptional grip and durability, making every stride confident and responsive on any surface. Perfectly balancing style and functionality, these shoes transition seamlessly from the track to casual outings. Experience the harmony of advanced engineering and elegant design, propelling you toward your fitness goals with poise and assurance.", 'features': ['Lightweigh

In [15]:
# Batch processing: run every JSON file in a folder through the pipeline and
# collect results and errors in separate lists for a summary report at the end.
from pathlib import Path

def run_batch_from_folder(folder: str):
    folder = Path(folder)
    paths = sorted(folder.glob("*.json"))

    results = []
    errors = []

    for path in paths:
        out = process_request_file(str(path))  # reuses your Step 4 pipeline

        if out["status"] == "ok":
            results.append({"file": path.name, "listing": out["listing"]})
            print(f"✓ OK: {path.name}")
        else:
            errors.append({"file": path.name, **out})
            print(f"⚠ FAIL: {path.name} — {out['status']}")

    print("\n===== BATCH SUMMARY =====")
    print(f"{len(results)} succeeded, {len(errors)} failed")

    return results, errors

results, errors = run_batch_from_folder("sample_requests")


⚠ FAIL: invalid_missing_name.json — validation_error
⚠ FAIL: invalid_negative_price.json — validation_error
✓ OK: valid_product.json

===== BATCH SUMMARY =====
1 succeeded, 2 failed
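
For larger batches it can help to break the failure count down by status. The sketch below assumes error records shaped like the `errors` list built above (each has a `file` and a `status` key); `summarize_errors` and the `not_json.json` entry are hypothetical and only illustrate the grouping.

```python
from collections import Counter


def summarize_errors(errors):
    """Count failed files per pipeline status."""
    return Counter(item["status"] for item in errors)


# Sample records; the third file name is invented for illustration.
sample_errors = [
    {"file": "invalid_missing_name.json", "status": "validation_error"},
    {"file": "invalid_negative_price.json", "status": "validation_error"},
    {"file": "not_json.json", "status": "load_error"},
]

summary = summarize_errors(sample_errors)
for status, count in summary.most_common():
    print(f"{status}: {count}")
```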


In [16]:
# Request handler: the front door of an API. It takes a JSON payload, runs it
# through the validation and processing pipeline, and returns a structured
# response dict with a status code and a message.

def handle_request(payload: dict):
    """
    Front door request handler:
    payload -> validate input -> process -> validate output -> return response dict
    """
    # 1) Validate input
    req, val_err = validate_request(payload)
    if val_err:
        return {
            "status_code": 422,
            "body": {
                "message": "Validation error",
                "errors": val_err
            }
        }

    # 2) Process (API call + output validation)
    listing, api_err = generate_listing_from_request(req)
    if api_err:
        return {
            "status_code": 500,
            "body": {
                "message": "Processing error",
                "error": api_err
            }
        }

    # 3) Success
    return {
        "status_code": 200,
        "body": {
            "message": "Success",
            "data": listing.model_dump()
        }
    }


print("\n--- handle_request tests ---")

valid_payload = {
    "name": "Running Shoes",
    "price": 79.99,
    "category": "Footwear"
}

invalid_payload = {
    "name": "   ",
    "price": -5,
    "category": ""
}
# Test the handle_request function with both valid and invalid payloads, and print the structured responses to verify correct status codes and error messages.

print(handle_request(valid_payload))
print(handle_request(invalid_payload))



--- handle_request tests ---
{'status_code': 200, 'body': {'message': 'Success', 'data': {'title': 'Premium Performance Running Shoes for Every Athlete', 'description': 'Elevate your running experience with our expertly crafted Running Shoes, designed to provide unparalleled comfort and support. Engineered for both seasoned runners and weekend warriors, these shoes feature advanced cushioning that absorbs impact, allowing you to focus solely on your performance. The breathable materials keep your feet cool and dry, while the elegant design seamlessly transitions from the track to casual outings. Enjoy an optimal fit with a true-to-size structure that enhances stability, reducing the risk of injury. At just $79.99, you invest in a blend of style, function, and durability that propels your passion for running to new heights.', 'features': ['Advanced cushioning for superior comfort', 'Breathable materials for optimal airflow', 'True-to-size fit enhancing stability', 'Elegant design suita
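
One detail of the invalid payload above is worth noting: `min_length=1` counts characters, so a whitespace-only name such as `"   "` satisfies it, and that payload is rejected because of its price and category. If blank names should be rejected too, one option is a validator that strips the value first. This is a sketch on a hypothetical one-field model `NameOnly`, not a change to the lab's `ProductRequest`.

```python
from pydantic import BaseModel, ValidationError, field_validator


class NameOnly(BaseModel):
    """Hypothetical one-field model used only to illustrate the check."""
    name: str

    @field_validator("name")
    @classmethod
    def name_not_blank(cls, v: str) -> str:
        v = v.strip()  # drop leading and trailing whitespace
        if not v:
            raise ValueError("name must not be blank")
        return v


try:
    NameOnly(name="   ")
    blank_rejected = False
except ValidationError:
    blank_rejected = True

trimmed = NameOnly(name="  Running Shoes ").name
print(blank_rejected, repr(trimmed))
```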