
---

# 🧠 Lesson: Structured Output & Schema-Aware Prompting in LLMs

---

## 🔹 1. Experiment with Output Formats (e.g., JSON, XML)

---

### ✅ Core Principle:

> For non-creative or data-centric tasks, ask for **structured formats** like **JSON** or **XML** to make the output **consistent**, **parseable**, and easier to **control**.

---

### 🤖 When to Use Structured Output:

You’re doing tasks like:

* **Data extraction**: Names, dates, prices, IDs
* **Categorization**: Product tagging, sentiment
* **Ranking or sorting**: Ordered lists of results
* **Entity recognition**: Pulling specific pieces of text
* **Filtering and summarizing**: Specific parts of a document

---

### ✅ Why JSON is the Most Popular Choice:

| Advantage                   | Description                                                   |
| --------------------------- | ------------------------------------------------------------- |
| 📐 **Consistent Format**    | Keeps your response structure predictable                     |
| 🎯 **Data Focused**         | You name exactly the fields you want back                     |
| 🧠 **Less Filler**          | The model sticks to the fields instead of adding extra prose  |
| 🔄 **Relationship-Aware**   | Nested objects express clear hierarchies                      |
| 🔢 **Includes Data Types**  | String, number, boolean, array, object, null                  |
| 📊 **Easy to Post-process** | Ideal for APIs, automation, and database updates              |

---

### 🧪 Example: Extracting User Details

#### ❌ Bad Prompt (Plaintext):

```
Extract the name, age, and email from this message:  
"Hi, I’m Sam, 32 years old, and my email is sam@example.com"
```

> Response might be:
> “The user is Sam, aged 32, with email sam@example.com.”

The reply is free-form prose, so it is hard to parse, and its wording can change from one run to the next.

---

#### ✅ Good Prompt (Structured JSON):

```
Extract the following information and return as a JSON object:
- name (string)
- age (integer)
- email (string)

Input: "Hi, I’m Sam, 32 years old, and my email is sam@example.com"

Output:
```

> Response:

```json
{
  "name": "Sam",
  "age": 32,
  "email": "sam@example.com"
}
```

🎯 Now the output is:

* Machine-readable
* Consistent
* Typed (`age` is a number, not text), although the model can still break the format, so validate before use

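The JSON above is typed, but nothing forces the model to comply: it could still return `"age": "32"` or leave out a field. Below is a minimal sketch of checking the parsed reply before using it. `parseUserDetails` is a hypothetical helper name, and its checks simply mirror the three fields requested in the prompt.

```javascript
// Parse a model reply and confirm it has the requested fields and types.
// Throws instead of passing malformed data downstream.
function parseUserDetails(reply) {
  const data = JSON.parse(reply); // SyntaxError if the reply is not valid JSON

  if (data === null || typeof data !== "object" || Array.isArray(data)) {
    throw new TypeError("expected a JSON object");
  }
  if (typeof data.name !== "string") {
    throw new TypeError("name must be a string");
  }
  if (!Number.isInteger(data.age)) {
    throw new TypeError("age must be an integer");
  }
  if (typeof data.email !== "string" || !data.email.includes("@")) {
    throw new TypeError("email must be a string containing '@'");
  }
  // Return only the requested fields, dropping anything extra the model added.
  return { name: data.name, age: data.age, email: data.email };
}

const reply = '{"name": "Sam", "age": 32, "email": "sam@example.com"}';
console.log(parseUserDetails(reply).age + 1); // 33, because age is a number
```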
---

## 🔧 2. JSON Repair

---

### ✅ Problem:

> JSON is **token-heavy** (every quote, brace, and repeated key costs tokens), so long generations can get **cut off**, leaving you with **broken JSON** like:

```json
{
  "user": {
    "name": "Sam",
    "email": "sam@example.com",
    "age":
```

This happens when:

* `max_tokens` is set lower than the length of the full response
* The prompt plus the output exceeds the model's context window (for example, because the input is very long)

---

### 🔥 Solution: Use the `jsonrepair` library

#### 🛠 How It Works:

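`jsonrepair` auto-corrects malformed JSON by:

* Adding missing closing brackets and braces
* Closing unterminated strings
* Removing trailing commas and inserting missing ones
* Fixing other common syntax errors, such as single quotes or unquoted keys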

---

### 📦 Example in Node.js:

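```javascript
import { jsonrepair } from 'jsonrepair';

// Truncated model output: trailing comma and a missing closing brace
const brokenJSON = `{"name": "Sam", "age": 32,`;

try {
  const fixedJSON = jsonrepair(brokenJSON); // returns a repaired JSON string
  console.log(JSON.parse(fixedJSON));
} catch (err) {
  // jsonrepair throws if the input cannot be repaired
  console.error('Could not repair model output:', err.message);
}
```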

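✅ This lets you build **robust systems** that don’t crash when the model output is slightly invalid. Repairing truncated JSON only closes the structure, though: fields the model never generated are still missing, so validate the repaired result.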

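To make "adding missing brackets" concrete, here is a deliberately naive sketch that handles only the truncation case shown at the start of this section. It is not how `jsonrepair` works internally and it misses many cases (for example a cut in the middle of a key or of `true`); `closeTruncatedJSON` is a hypothetical name used for illustration.

```javascript
// Naive repair for JSON that was cut off mid-generation. It handles only
// truncation, and can still return invalid JSON (for example when the cut
// lands inside a key or inside a literal such as `true`).
function closeTruncatedJSON(text) {
  const closers = []; // closing brackets still owed, innermost last
  let inString = false;
  let escaped = false;

  for (const ch of text) {
    if (inString) {
      if (escaped) escaped = false;
      else if (ch === "\\") escaped = true;
      else if (ch === '"') inString = false;
      continue;
    }
    if (ch === '"') inString = true;
    else if (ch === "{") closers.push("}");
    else if (ch === "[") closers.push("]");
    else if (ch === "}" || ch === "]") closers.pop();
  }

  let out = text;
  if (inString) {
    // Drop a dangling backslash, then terminate the unfinished string.
    out = (escaped ? out.slice(0, -1) : out) + '"';
  }
  out = out.trimEnd();
  if (out.endsWith(",")) out = out.slice(0, -1); // dangling comma
  if (out.endsWith(":")) out += " null"; // key with no value yet
  return out + closers.reverse().join("");
}

const truncated = `{
  "user": {
    "name": "Sam",
    "email": "sam@example.com",
    "age":`;

console.log(JSON.parse(closeTruncatedJSON(truncated))); // user.age is null
```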
---

### 🧠 Best Practices:

| Problem           | Solution                                            |
| ----------------- | --------------------------------------------------- |
| Truncated output  | Raise `max_tokens`; use `jsonrepair` as a fallback  |
| Large input       | Preprocess & trim irrelevant data                   |
| Unexpected format | Include the schema in the prompt                    |

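The practices above can be combined into one parsing step: try strict `JSON.parse` first, fall back to a repair function only if that fails, and record that a repair happened so callers can treat the data with more suspicion. This is a sketch; `parseWithFallback` is hypothetical, and the repair function is passed in so the snippet has no dependencies. With the library from the previous example you would call `parseWithFallback(raw, jsonrepair)`.

```javascript
// Strict parse first; repair only as a fallback, and report which path ran.
// `repair` is any function that maps a malformed JSON string to a fixed one.
function parseWithFallback(text, repair) {
  try {
    return { data: JSON.parse(text), repaired: false };
  } catch {
    // If the repair or the second parse also fails, that error reaches the caller.
    return { data: JSON.parse(repair(text)), repaired: true };
  }
}

// Hypothetical stand-in repair for the demo: appends one closing brace.
const closeBrace = (s) => s + "}";

console.log(parseWithFallback('{"ok": true}', closeBrace).repaired); // false
console.log(parseWithFallback('{"ok": true', closeBrace).repaired); // true
```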
---

## 🔸 3. Working with Schemas

---

### ✅ Core Idea:

> A **JSON Schema** tells the model *which structure and data types* to follow. Think of it as a blueprint for the output.

---

### 🧱 What is a JSON Schema?

A **schema** defines:

* The **keys** expected
* Their **types** (string, array, number)
* Whether they are **required**
* Any **format constraints** (email, date-time, UUID)

---

### 📐 Example JSON Schema:

```json
{
  "type": "object",
  "properties": {
    "product_name": { "type": "string" },
    "price": { "type": "number" },
    "release_date": { "type": "string", "format": "date" },
    "in_stock": { "type": "boolean" }
  },
  "required": ["product_name", "price", "release_date"]
}
```

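A schema is also useful after generation: check the model's output against it before trusting it. In JavaScript a full validator library such as Ajv is the usual choice; to stay self-contained, this sketch hand-rolls a check for only the keywords used in the schema above (`type`, `properties`, `required`, and `format: "date"`). `validate`, `typeMatches`, and `isIsoDate` are hypothetical helpers, and this is not a complete JSON Schema implementation.

```javascript
// Product schema from the example above.
const productSchema = {
  type: "object",
  properties: {
    product_name: { type: "string" },
    price: { type: "number" },
    release_date: { type: "string", format: "date" },
    in_stock: { type: "boolean" },
  },
  required: ["product_name", "price", "release_date"],
};

const has = (obj, key) => Object.prototype.hasOwnProperty.call(obj, key);

// True if value has the given JSON Schema type (subset of types only).
function typeMatches(value, type) {
  switch (type) {
    case "object":
      return value !== null && typeof value === "object" && !Array.isArray(value);
    case "array":
      return Array.isArray(value);
    case "integer":
      return Number.isInteger(value);
    case "number":
      return typeof value === "number" && Number.isFinite(value);
    case "string":
      return typeof value === "string";
    case "boolean":
      return typeof value === "boolean";
    default:
      return false; // types not covered by this sketch are rejected
  }
}

// True for a real calendar date written as YYYY-MM-DD (rejects 2023-02-30).
function isIsoDate(s) {
  if (!/^\d{4}-\d{2}-\d{2}$/.test(s)) return false;
  const d = new Date(`${s}T00:00:00Z`);
  return !Number.isNaN(d.getTime()) && d.toISOString().startsWith(s);
}

// Collect readable errors; an empty array means the data passed.
function validate(data, schema, path = "$") {
  if (!typeMatches(data, schema.type)) return [`${path}: expected ${schema.type}`];
  const errors = [];
  if (schema.format === "date" && !isIsoDate(data)) {
    errors.push(`${path}: expected a YYYY-MM-DD date`);
  }
  if (schema.type === "object") {
    for (const key of schema.required ?? []) {
      if (!has(data, key)) errors.push(`${path}.${key}: missing required field`);
    }
    for (const [key, sub] of Object.entries(schema.properties ?? {})) {
      if (has(data, key)) errors.push(...validate(data[key], sub, `${path}.${key}`));
    }
  }
  return errors;
}

const good = { product_name: "Echo speaker", price: 89.99, release_date: "2023-11-15", in_stock: true };
const bad = { product_name: "Echo speaker", price: "$89.99" };
console.log(validate(good, productSchema)); // []
console.log(validate(bad, productSchema)); // missing release_date, price not a number
```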
---

### 🧠 How to Use This with LLMs:

#### Prompt:

```
You are a data extractor. Output product data in this format:

Schema:
- product_name (string)
- price (number)
- release_date (YYYY-MM-DD)
- in_stock (true/false)

Input: "The new Echo speaker was released on 2023-11-15 for $89.99 and is currently available."
```

#### Output:

```json
{
  "product_name": "Echo speaker",
  "price": 89.99,
  "release_date": "2023-11-15",
  "in_stock": true
}
```

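The prompt above lists the fields by hand. One way to keep the prompt and the schema in sync is to render the field list from the schema object itself. A minimal sketch: `buildExtractionPrompt` is a hypothetical helper, and the exact wording it produces is an assumption, not a required format.

```javascript
// Product schema from the example above.
const productSchema = {
  type: "object",
  properties: {
    product_name: { type: "string" },
    price: { type: "number" },
    release_date: { type: "string", format: "date" },
    in_stock: { type: "boolean" },
  },
  required: ["product_name", "price", "release_date"],
};

// Render each schema property as a bullet such as "- price (number, required)".
function buildExtractionPrompt(schema, input) {
  const required = new Set(schema.required ?? []);
  const fields = Object.entries(schema.properties).map(([name, spec]) => {
    const type = spec.format === "date" ? "YYYY-MM-DD" : spec.type;
    const flag = required.has(name) ? ", required" : "";
    return `- ${name} (${type}${flag})`;
  });
  return [
    "You are a data extractor. Return only a JSON object with these fields:",
    "",
    ...fields,
    "",
    `Input: "${input}"`,
  ].join("\n");
}

console.log(
  buildExtractionPrompt(
    productSchema,
    "The new Echo speaker was released on 2023-11-15 for $89.99 and is currently available."
  )
);
```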
---

### 🧠 Why Schema Helps:

| Benefit                          | Description                                                        |
| -------------------------------- | ------------------------------------------------------------------ |
| 🧠 **Directs Attention**         | Focuses the model on the relevant fields                           |
| 🛡️ **Reduces Hallucination**    | Less room for the model to invent extra fields or content          |
| 📅 **Normalizes Dates**          | A format like YYYY-MM-DD gives one unambiguous date representation |
| ⚙️ **Ready for API Integration** | Pass structured results to databases, endpoints, etc.              |

---

## ⚠️ Common Pitfalls & How to Handle Them

| Problem                 | Fix                                         |
| ----------------------- | ------------------------------------------- |
| JSON too long → cut off | Raise `max_tokens`, request fewer fields, or chunk the input |
| Invalid JSON            | Use `jsonrepair`                            |
| Model ignores schema    | Reinforce structure with examples           |
| Unwanted text with JSON | Use system prompt: "Return only valid JSON" |

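For the last row, a system prompt helps but does not guarantee a bare JSON reply; models sometimes still wrap the JSON in prose or a Markdown code fence. Below is a defensive sketch that pulls the JSON out before parsing. `extractJSON` is hypothetical and assumes the reply contains a single top-level object or array.

```javascript
// Pull a JSON value out of a reply that may wrap it in prose or a ```json fence.
// Limitation: braces or brackets in the surrounding prose can confuse it.
function extractJSON(reply) {
  const fenced = reply.match(/```(?:json)?\s*([\s\S]*?)```/);
  const body = fenced ? fenced[1] : reply;
  const start = body.search(/[{[]/);
  const end = Math.max(body.lastIndexOf("}"), body.lastIndexOf("]"));
  if (start === -1 || end < start) {
    throw new Error("No JSON object or array found in reply");
  }
  return JSON.parse(body.slice(start, end + 1)); // throws if the slice is invalid
}

const reply = 'Sure! Here is the data: {"name": "Sam", "age": 32} Let me know if you need more.';
console.log(extractJSON(reply).name); // Sam
```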
---

## 🔁 Bonus Pro Tip: Schema + Few-Shot Examples = 🔥

Combine **schemas** and **few-shot examples** to **nail structure + accuracy**.

### Prompt:

```
Format: JSON
Fields: title, price, release_date (YYYY-MM-DD, or YYYY-MM if the day is not given)

Examples:
Text: "The iPhone 13 launched on Sept 24, 2021, priced at $799."
Output:
{ "title": "iPhone 13", "price": 799, "release_date": "2021-09-24" }

Text: "Samsung’s latest phone hit shelves in March 2023 for $999."
Output:
{ "title": "Samsung phone", "price": 999, "release_date": "2023-03" }

Now do this:
Text: "Pixel 8 came out in October 2023 for $699"
```

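✅ Result: the field list fixes the structure, and the examples show how to normalize values (prices as numbers, dates as ISO-style strings), so the response stays format-consistent.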

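Hand-typed example outputs in a few-shot prompt can themselves contain JSON typos, which the model may then copy. Below is a sketch that builds the prompt from data and serializes each example output with `JSON.stringify`, so every example is valid JSON by construction. `buildFewShotPrompt` is hypothetical, and its layout simply mirrors the prompt above.

```javascript
// Build a few-shot extraction prompt from a field list and worked examples.
// JSON.stringify guarantees every example output is itself valid JSON.
function buildFewShotPrompt(fields, examples, text) {
  const lines = ["Format: JSON", `Fields: ${fields.join(", ")}`, "", "Examples:"];
  for (const ex of examples) {
    lines.push(`Text: "${ex.text}"`, "Output:", JSON.stringify(ex.output), "");
  }
  lines.push("Now do this:", `Text: "${text}"`, "Output:");
  return lines.join("\n");
}

const prompt = buildFewShotPrompt(
  ["title", "price", "release_date"],
  [
    {
      text: "The iPhone 13 launched on Sept 24, 2021, priced at $799.",
      output: { title: "iPhone 13", price: 799, release_date: "2021-09-24" },
    },
  ],
  "Pixel 8 came out in October 2023 for $699"
);
console.log(prompt);
```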
---

## 🧠 Summary Cheat Sheet

| Concept                      | Key Insight                                    | Tip                                         |
| ---------------------------- | ---------------------------------------------- | ------------------------------------------- |
| **Structured Output (JSON)** | Control, consistency, parseability             | Use for data tasks                          |
| **JSON Repair**              | Fix broken JSON on-the-fly                     | Use `jsonrepair`                            |
| **Schema Prompting**         | Teach the model structure                      | Use JSON Schema or bullet field definitions |
| **Token Efficiency**         | JSON can be heavy                              | Leave enough `max_tokens` headroom; chunk large inputs |
| **Few-shot + Schema**        | Combines example-based learning with structure | Mix formatting examples with fields         |

---
