# Pattern 2. Grammar: 5 Examples (Including Deep Dive)

## Educational Arc: From Practice to Understanding

This notebook demonstrates the Grammar Pattern with **5 practical examples** that progress from high-level usage to foundational understanding:

1. **Insurance Forms** - Complex nested JSON extraction (Pydantic schemas)
2. **SQL Query Generation** - Generate safe SQL with BNF Grammar Pattern (outlines + llama-cpp)
3. **Pipe-Separated Data** - Extract structured data with strict format (Pydantic)
4. **English Grammar Correction** - Fix grammar with output constraints (outlines + regex)
5. **🎓 Math Expression Generation** - **Deep dive into direct logits processing**

### 📚 Abstraction Levels

| Level | Approach | Examples | What You See |
|-------|----------|----------|--------------|
| **High** | Pydantic + API | 1, 3 | Define schema, get structured output |
| **Medium** | outlines/llama-cpp | 2, 4 | Define grammar/regex, library handles rest |
| **Low** | Direct logits | **5** | **See HOW it works under the hood** |

**💡 Key Insight:** Example 5 reveals the **foundational mechanism** that `outlines` and `llama-cpp` use internally. Understanding it shows why grammar constraints can guarantee the output format instead of merely encouraging it.

---

## Notebook ideas

1) Compare Structured Output and the True Grammar Pattern.
2) Demonstrate how to use the True Grammar Pattern with llama-cpp and outlines.
3) 🎓 **Deep Dive**: Understand how it works internally with direct logits processing (Example 5).

## Cases + Ideas
* Guaranteed format and rule compliance
* Generate proper SQL queries
* Fix grammar in an input sentence (generating the corrected sentence as output)
* Understand the underlying rules (math, natural language, query language, etc.)

**Key Learning:** Grammar Pattern provides 100% guarantee of valid output format through token-level logits masking.
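
As a minimal sketch of what "token-level logits masking" means, the toy example below sets the score of every token a grammar would reject to `-inf` before applying softmax. The vocabulary, the scores, and the `masked_softmax` helper are illustrative assumptions; they are not part of `outlines` or `llama-cpp`.

```python
import math


def masked_softmax(logits, allowed):
    """Softmax over `logits` after masking tokens outside `allowed` with -inf."""
    if not any(token in allowed for token in logits):
        raise ValueError("the grammar allows no token from this vocabulary")
    masked = {
        token: (score if token in allowed else float("-inf"))
        for token, score in logits.items()
    }
    peak = max(masked.values())  # subtract the max for numerical stability
    exps = {token: math.exp(score - peak) for token, score in masked.items()}
    total = sum(exps.values())
    return {token: value / total for token, value in exps.items()}


# Hypothetical raw scores for the first token of a SQL statement.
logits = {"SELECT": 1.2, "DELETE": 3.5, "DROP": 2.9, "UPDATE": 0.4}
allowed_first_tokens = {"SELECT"}  # what a SELECT-only grammar permits here

probs = masked_softmax(logits, allowed_first_tokens)
print(probs)
```

Even though `DELETE` has the highest raw score in this made-up example, its probability after masking is exactly zero, so sampling can never pick it.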

## Approach Comparison. Structured output vs Grammar Pattern

| Feature                  | Grammar Pattern (outlines) | Structured Outputs (Azure OpenAI) |
|--------------------------|----------------------------|-----------------------------------|
| Implementation           | Self-hosted model          | Azure OpenAI API                  |
| Constraint Type          | Token-level grammar        | Schema + parsing                  |
| Safety Guarantee         | ✅ HARD (impossible)       | ⚠️ SOFT (almost impossible, 99%)  |
| Grammar Support          | ✅ BNF, regex, FSM         | ❌ Not supported                  |
| Dangerous SQL Blocking   | ✅ Physically blocked      | ⚠️ Relies on prompt               |
| Output Structure         | ⚠️ Text (grammar)          | ✅ Pydantic objects               |
| Setup Complexity         | Medium (install)           | Low (just API)                    |
| Runtime Performance      | Local (fast)               | API call (latency)                |
| Cost                     | Hardware only              | Per-token pricing                 |
| Model Control            | ✅ Full control            | ❌ Server-side only               |

## Installation

In [3]:
# Install required packages
!pip install openai pydantic python-dotenv transformers torch accelerate pandas outlines huggingface_hub

# For Example 5: Direct logits processing
!pip install transformers-cfg

# Optional: For llama-cpp alternative (requires manual model download)
# !pip install llama-cpp-python




## Setup for Structured Output (this configuration is reused in several places)

In [4]:
import os
from openai import AzureOpenAI
from pydantic import BaseModel, Field
from typing import List, Optional, Literal
from enum import Enum
import json
from dotenv import load_dotenv

# Load environment variables
load_dotenv()

# Verify that all required environment variables are set
if not os.getenv("AZURE_OPENAI_API_KEY") or not os.getenv("AZURE_OPENAI_ENDPOINT") or not os.getenv("AZURE_OPENAI_DEPLOYMENT_NAME"):
    raise ValueError("Please set the AZURE_OPENAI_API_KEY, AZURE_OPENAI_ENDPOINT, and AZURE_OPENAI_DEPLOYMENT_NAME environment variables")

# Setup Azure OpenAI client
client = AzureOpenAI(
    api_key=os.getenv("AZURE_OPENAI_API_KEY"),  # same variable that is validated above
    api_version="2024-12-01-preview",
    azure_endpoint=os.getenv("AZURE_OPENAI_ENDPOINT")
)

# gpt-4o, 4o-mini, 4.1-mini, and others could be used with slightly different results
MODEL = os.getenv("AZURE_OPENAI_DEPLOYMENT_NAME")

print("✅ Setup complete!")

✅ Setup complete!


---
# Example 1: Insurance Forms - Complex Nested JSON Extraction
## Uses the approach known as Structured Output (schema-guided generation)
### Structured Output: This approach works with OpenAI/Bedrock models through their structured output APIs
### Text Generation Inference (TGI): Self-hosted models (for example, those served with TGI or llama-cpp) support a comparable schema-guided mode

## Business Problem

1. Different fields, complex nesting

   Insurance companies process thousands of claim forms daily. These forms contain:
   - Personal information and field mapping
   - Multiple incidents (car accidents often involve multiple vehicles)
   - Nested damage descriptions
   - Medical records
   - Financial details

2. Make the model less verbose where extra text is not needed
   - No extra words; generate only the requested content.

**Challenge:** Parsing errors and extra wording in the model output cause claim delays.

**Solution:** Grammar Pattern with Pydantic Schema guarantees valid structure.

## Example 1. Step 1. Define Complex Insurance Schema

In [None]:
# Enums for controlled vocabularies
class ClaimType(str, Enum):
    AUTO = "auto"
    HEALTH = "health"
    HOME = "home"
    LIFE = "life"

class IncidentSeverity(str, Enum):
    MINOR = "minor"
    MODERATE = "moderate"
    SEVERE = "severe"
    TOTAL_LOSS = "total_loss"

class InjuryType(str, Enum):
    NONE = "none"
    MINOR = "minor"
    SERIOUS = "serious"
    CRITICAL = "critical"

# Nested structures - No @dataclass needed with BaseModel!
class PersonalInfo(BaseModel):
    full_name: str = Field(description="Full legal name")
    policy_number: str = Field(description="Insurance policy number")
    phone: str = Field(description="Contact phone number")
    email: Optional[str] = Field(default=None, description="Email address")

class Address(BaseModel):
    street: str
    city: str
    state: str
    zip_code: str

class VehicleInfo(BaseModel):
    license_plate: str
    vin: Optional[str] = Field(default=None, description="Vehicle Identification Number")
    make: str = Field(description="Vehicle manufacturer")
    model: str = Field(description="Vehicle model")
    year: int = Field(ge=1900, le=2030, description="Model year")

class DamageItem(BaseModel):
    component: str = Field(description="Damaged component (e.g., 'front bumper', 'windshield')")
    description: str = Field(description="Detailed damage description")
    estimated_cost: float = Field(ge=0, description="Estimated repair cost in USD")

class Injury(BaseModel):
    person_name: str
    injury_type: InjuryType
    description: str
    medical_facility: Optional[str] = Field(default=None, description="Hospital or clinic name")

class Incident(BaseModel):
    incident_location: Address
    severity: IncidentSeverity
    weather_conditions: Optional[str] = Field(default=None)
    description: str = Field(description="Detailed description of what happened")
    police_report_filed: bool = Field(default=False, description="Whether police report was filed")
    police_report_number: Optional[str] = Field(default=None, description="Police report number if filed")
    incident_date: str = Field(description="Date of the incident in the following format (YYYY-MM-DD)")

class AutoIncident(Incident):
    vehicles_involved: List[VehicleInfo] = Field(min_length=1)
    damages: List[DamageItem] = Field(description="List of damages to each vehicle")
    injuries: List[Injury] = Field(default=[], description="List of injuries")
    other_driver_info: Optional[PersonalInfo] = Field(default=None)

# Main claim structure
class InsuranceClaim(BaseModel):
    claim_type: ClaimType
    claimant: PersonalInfo
    incident: AutoIncident
    claim_id: str = Field(description="Unique claim identifier")
    filing_date: str = Field(description="Date claim was filed (YYYY-MM-DD)")
    total_estimated_cost: float = Field(ge=0, description="Total estimated cost in USD")
    priority: Literal["low", "medium", "high", "urgent"] = Field(
        default="medium",
        description="Claim priority level"
    )

print("✅ Complex insurance schema setup is finished!")

## Example 1. Step 2. Sample Insurance Claim Form (Unstructured Text)
The goal is to extract JSON-formatted output that fits our schema so it can be processed further downstream.

In [None]:
insurance_form_text = """
CLAIM REPORT - AUTO ACCIDENT

Date Filed: January 5, 2026
Claim Reference: CLM-2026-00472

CLAIMANT INFORMATION:
Name: Sarah Johnson
Policy #: POL-847392-AZ
Contact: (555) 123-4567
Email: sarah.johnson@email.com

INCIDENT DETAILS:
Date of Accident: December 28, 2025
Location: 1234 Main Street, Phoenix, Arizona, 85001
Weather: Rainy conditions, reduced visibility
Police Report: Yes, Report #PX-2025-9847

DESCRIPTION:
I was driving northbound on Main Street at approximately 3:30 PM when another vehicle 
ran a red light and struck my vehicle on the passenger side. The impact caused 
significant damage to both vehicles. The other driver admitted fault at the scene.

MY VEHICLE:
2023 Toyota Camry
License Plate: ABC-1234
VIN: 1HGBH41JXMN109186

DAMAGES TO MY VEHICLE:
1. Front passenger door - Major dent and paint damage - Estimated $2,500
2. Rear passenger door - Moderate dent - Estimated $1,800
3. Passenger side mirror - Broken, needs replacement - Estimated $450
4. Front passenger window - Shattered - Estimated $350

OTHER VEHICLE:
2021 Honda Civic
License Plate: XYZ-9876
Driver: Michael Chen
Driver's Policy: POL-293847-CA
Driver's Phone: (555) 987-6543

DAMAGES TO OTHER VEHICLE:
1. Front bumper - Completely destroyed - Estimated $1,200
2. Hood - Crumpled - Estimated $2,000
3. Headlight assembly - Both broken - Estimated $800

INJURIES:
1. Sarah Johnson (me) - Minor whiplash and bruising - Treated at Phoenix General Hospital
2. Passenger Emma Johnson (my daughter, age 8) - Minor cuts from broken glass - 
   Treated at Phoenix General Hospital

SEVERITY ASSESSMENT: Moderate - vehicles drivable but require significant repairs

TOTAL ESTIMATED DAMAGES: $9,100

Priority: HIGH (injuries involved)
"""

print("Sample Insurance Form Loaded")
print("=" * 80)
print(insurance_form_text[:500] + "...")
print("\n📄 This unstructured text needs to be parsed into a structured InsuranceClaim object")

## Example 1. Step 3. Option 1: Pydantic Schema
### Extract with a Pydantic schema and the Grammar Pattern (guaranteed valid output)

In [None]:
def extract_insurance_claim(form_text: str) -> InsuranceClaim:
    """
    Extract insurance claim with Grammar Pattern guarantee.
    """
    system_prompt = """
    You are an expert insurance claim processing system. Extract all relevant information from the claim form.
    
    ## Insurance Form Structure:
    
    The insurance claim form typically contains the following sections:
    
    ### 1. HEADER SECTION
    - Claim ID/Reference number (e.g., "CLM-2026-00472")
    - Filing date (when the claim was submitted)
    
    ### 2. CLAIMANT INFORMATION SECTION
    - Full legal name of the person filing the claim
    - Policy number (format: POL-XXXXXX-XX)
    - Contact phone number (format: (XXX) XXX-XXXX)
    - Email address (optional)
    
    ### 3. INCIDENT DETAILS SECTION
    - Date of accident/incident (YYYY-MM-DD format)
    - Location: full address including street, city, state, and zip code
    - Weather conditions at time of incident (optional)
    - Police report information (whether filed, report number if available)
    - Detailed narrative description of what happened
    
    ### 4. VEHICLES INVOLVED (for auto claims)
    Each vehicle section includes:
    - Year, Make, Model (e.g., "2023 Toyota Camry")
    - License plate number
    - VIN (Vehicle Identification Number) - optional
    - For other vehicles: driver name, driver's policy number, contact info
    
    ### 5. DAMAGES SECTION
    List of damaged components for each vehicle:
    - Component name (e.g., "front bumper", "passenger door", "windshield")
    - Detailed description of the damage
    - Estimated repair cost in USD (numeric value)
    
    ### 6. INJURIES SECTION (if applicable)
    For each injured person:
    - Person's name
    - Type of injury (none, minor, serious, critical)
    - Description of injuries
    - Medical facility where treated (hospital/clinic name) - optional
    
    ### 7. ASSESSMENT SECTION
    - Severity classification (minor, moderate, severe, total_loss)
    - Total estimated damages (sum of all repair costs)
    - Priority level (low, medium, high, urgent) - often based on injuries and severity
    
    ## Extraction Rules:
    
    1. **Be thorough**: Extract ALL vehicles mentioned, even if it's the other driver's vehicle
    2. **Accuracy**: Use exact values from the form - don't approximate or round numbers
    3. **Dates**: Convert all dates to YYYY-MM-DD format
    4. **Costs**: Extract numeric values only, convert to float (remove $, commas)
    5. **Enums**: Map text to correct enum values:
       - Claim type: auto, health, home, life
       - Severity: minor, moderate, severe, total_loss
       - Injury type: none, minor, serious, critical
    6. **Missing data**: Use appropriate defaults:
       - Optional fields can be null
       - Use empty lists [] for injuries if none reported
    7. **Priority**: Infer from context:
       - urgent: life-threatening injuries or total loss
       - high: any injuries or severe damage
       - medium: moderate damage, no injuries
       - low: minor damage only
    
    ## Common Patterns to Watch For:
    
    - "MY VEHICLE" vs "OTHER VEHICLE" sections - both should be included
    - Damage estimates may be listed per item or as subtotals
    - Police report: "Yes" means filed=true, extract the report number
    - Multiple people may be injured in a single incident
    - The claimant's vehicle should be listed first in vehicles_involved
    
    Extract with precision and completeness.
    """
    
    response = client.beta.chat.completions.parse(
        model=MODEL,
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": form_text}
        ],
        response_format=InsuranceClaim,
        temperature=0
    )

    return response.choices[0].message.parsed

print("Extracting insurance claim with enhanced system prompt...")
print("=" * 80)

claim = extract_insurance_claim(insurance_form_text)

print("\n✅ SUCCESSFULLY EXTRACTED!\n")
print(f"Claim ID: {claim.claim_id}")
print(f"Type: {claim.claim_type.value}")
print(f"Priority: {claim.priority}")
print(f"\nClaimant: {claim.claimant.full_name}")
print(f"Policy: {claim.claimant.policy_number}")
print(f"\nIncident Date: {claim.incident.incident_date}")
print(f"Location: {claim.incident.incident_location.street}, {claim.incident.incident_location.city}")
print(f"Severity: {claim.incident.severity.value}")
print(f"Police Report: {'Yes' if claim.incident.police_report_filed else 'No'}")

print(f"\nVehicles Involved: {len(claim.incident.vehicles_involved)}")
for i, vehicle in enumerate(claim.incident.vehicles_involved, 1):
    print(f"  {i}. {vehicle.year} {vehicle.make} {vehicle.model} ({vehicle.license_plate})")

print(f"\nDamages: {len(claim.incident.damages)}")
for i, damage in enumerate(claim.incident.damages, 1):
    print(f"  {i}. {damage.component}: ${damage.estimated_cost:,.2f}")

print(f"\nInjuries: {len(claim.incident.injuries)}")
for i, injury in enumerate(claim.incident.injuries, 1):
    print(f"  {i}. {injury.person_name}: {injury.injury_type.value} - {injury.description[:50]}...")

print(f"\nTotal Estimated Cost: ${claim.total_estimated_cost:,.2f}")

print("\n" + "=" * 80)
print("\n🎯 ENHANCED SYSTEM PROMPT BENEFITS:")
print("  ✅ Explains insurance form structure in detail")
print("  ✅ Provides clear extraction rules")
print("  ✅ Guides on handling edge cases (MY VEHICLE vs OTHER VEHICLE)")
print("  ✅ Specifies date and number formatting")
print("  ✅ Maps common phrases to enum values")
print("  ✅ Improves accuracy with context-specific instructions")
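
The extraction rules in the system prompt (dates as YYYY-MM-DD, costs as plain floats) can also be checked independently of the model. The sketch below, using only the standard library, normalizes a few values copied from the sample form and verifies that the itemized damages add up to the stated total. The helper names `to_iso_date` and `to_amount` are hypothetical, and the date parsing assumes an English locale.

```python
from datetime import datetime


def to_iso_date(text):
    """Convert a date such as 'January 5, 2026' to 'YYYY-MM-DD' (English month names)."""
    return datetime.strptime(text.strip(), "%B %d, %Y").strftime("%Y-%m-%d")


def to_amount(text):
    """Convert a string such as '$2,500' to a float by dropping '$' and commas."""
    return float(text.replace("$", "").replace(",", "").strip())


filing_date = to_iso_date("January 5, 2026")
incident_date = to_iso_date("December 28, 2025")

# Itemized estimates from the sample form (both vehicles).
damage_estimates = ["$2,500", "$1,800", "$450", "$350", "$1,200", "$2,000", "$800"]
damage_total = sum(to_amount(item) for item in damage_estimates)
stated_total = to_amount("$9,100")
totals_agree = abs(damage_total - stated_total) < 0.01

print(filing_date, incident_date, damage_total, totals_agree)
```

A check like this is useful as a cross-validation step: a schema guarantees the shape of the output, not that the extracted numbers are consistent with each other.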

## Example 1. Step 3. Option 2: JSON Mode
### Compare: JSON Mode Without a Schema vs. Pydantic Classes + Grammar Pattern
JSON mode only guarantees syntactically valid JSON; it does not enforce the `InsuranceClaim` schema. To use JSON mode, specify the response format this way:

`response_format={"type": "json_object"}`

In [None]:
# Simulate WITHOUT grammar pattern (just JSON mode)
def extract_without_grammar(form_text: str) -> str:
    """
    Extract without schema constraint - just asks for JSON.
    """
    system_prompt = """
    Extract insurance claim information and return it in JSON format.
    Include: claim_id, claimant info, vehicles, damages, injuries, costs.
    Return valid JSON only.
    """
    
    response = client.chat.completions.create(
        model=MODEL,
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": form_text}
        ],
        response_format={"type": "json_object"},
        temperature=0
    )
    
    return response.choices[0].message.content

print("Extracting WITHOUT grammar pattern...")
print("=" * 80)

json_result = extract_without_grammar(insurance_form_text)
print("\nRaw JSON output:")
print(json.dumps(json.loads(json_result), indent=2))

# Try to parse into our schema
print("\n" + "=" * 80)
print("\n⚠️ PROBLEMS WITHOUT GRAMMAR PATTERN:")
try:
    parsed_dict = json.loads(json_result)
    print("✅ Valid JSON")
    
    # But check if it matches our schema
    try:
        claim_obj = InsuranceClaim(**parsed_dict)
        print("✅ Matches InsuranceClaim schema (lucky!)")
    except Exception as e:
        print("❌ Does NOT match InsuranceClaim schema!")
        print(f"   Error: {str(e)[:200]}...")
        
except json.JSONDecodeError as e:
    print(f"❌ Invalid JSON: {e}")
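
To see offline why "valid JSON" is weaker than "matches the schema", here is a small standard-library sketch. The `json_mode_output` string is a made-up example of what a schema-free JSON response might look like, and `REQUIRED_TOP_LEVEL` simply lists the `InsuranceClaim` fields that have no default.

```python
import json

# Top-level InsuranceClaim fields without a default value.
REQUIRED_TOP_LEVEL = {
    "claim_type", "claimant", "incident",
    "claim_id", "filing_date", "total_estimated_cost",
}


def missing_fields(raw_json):
    """Return the required top-level fields that are absent from a JSON document."""
    data = json.loads(raw_json)  # raises json.JSONDecodeError on invalid JSON
    return sorted(REQUIRED_TOP_LEVEL - set(data))


# Hypothetical JSON-mode response: valid JSON, but with field names the model invented.
json_mode_output = (
    '{"claim_id": "CLM-2026-00472", '
    '"claimant_name": "Sarah Johnson", '
    '"total_cost": 9100}'
)

missing = missing_fields(json_mode_output)
print(missing)
```

The document parses without error, yet five of the six required fields are missing, which is exactly the failure that schema-constrained output is meant to rule out.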

## Example 1. Step 4. Export to JSON for downstream systems

In [None]:
# Export the claim
claim_json = claim.model_dump_json(indent=2)

print("Final Structured Claim (ready for downstream systems):")
print("=" * 80)
print(claim_json)

# Save to file
with open('insurance_claim.json', 'w') as f:
    f.write(claim_json)

print("\n✅ Saved to insurance_claim.json")
print("\n🎯 This JSON is GUARANTEED to be valid and processable!")

---
# Example 2: SQL Query Generation with TRUE BNF Grammar Pattern

## Business Problem

1. Generate safe queries that fetch data based on the user's natural-language instructions.
2. A natural-language-to-SQL system should generate syntactically correct queries:
- No wrong column names
- Only valid SQL syntax
- No malformed WHERE clauses
- No dangerous operations. **DANGEROUS: DELETE, UPDATE, DROP operations from malicious/confused prompts**

**Solution:** Use the **TRUE Grammar Pattern** (token-level constraints with `outlines`, or a BNF grammar with llama-cpp) so that only SELECT statements can be generated.

## What We'll Demonstrate

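Unlike Example 1 (Structured Outputs with Azure OpenAI), this example uses:
- **Self-hosted model** (Qwen 2.5-3B-Instruct, loaded through Hugging Face `transformers`; the llama-cpp alternative uses the GGUF build)
- **`outlines`** library for token-level constraints, with **llama-cpp-python** as an alternative
- **BNF Grammar or regex** as a hard constraint

This is the **REAL Grammar Pattern**: the constraint is enforced during decoding, not checked afterwards.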

## Step 1: Create Sample Database & Define BNF Grammar

We'll create an in-memory SQLite database and define the BNF grammar for safe SQL.

In [None]:
import sqlite3
import pandas as pd

# ============================================================================
# Part A: Create Sample Database
# ============================================================================
print("=" * 80)
print("PART A: CREATING SAMPLE DATABASE")
print("=" * 80)

# Create in-memory SQLite database
conn = sqlite3.connect(':memory:')
cursor = conn.cursor()

# Create tables
cursor.execute('''
CREATE TABLE employees (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    department TEXT NOT NULL,
    salary REAL NOT NULL,
    hire_date TEXT NOT NULL,
    manager_id INTEGER
)
''')

cursor.execute('''
CREATE TABLE departments (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    budget REAL NOT NULL,
    location TEXT NOT NULL
)
''')

# Insert sample data
employees_data = [
    (1, 'John Smith', 'Engineering', 95000, '2020-01-15', None),
    (2, 'Sarah Johnson', 'Engineering', 87000, '2021-03-20', 1),
    (3, 'Michael Chen', 'Engineering', 82000, '2022-06-10', 1),
    (4, 'Emily Brown', 'Sales', 78000, '2020-08-01', None),
    (5, 'David Lee', 'Sales', 71000, '2021-11-15', 4),
    (6, 'Maria Garcia', 'Sales', 69000, '2023-02-01', 4),
    (7, 'James Wilson', 'Marketing', 76000, '2021-05-10', None),
    (8, 'Lisa Anderson', 'Marketing', 68000, '2022-09-20', 7),
    (9, 'Robert Taylor', 'HR', 72000, '2020-04-15', None),
    (10, 'Jennifer Martinez', 'HR', 65000, '2023-01-10', 9)
]

cursor.executemany('INSERT INTO employees VALUES (?, ?, ?, ?, ?, ?)', employees_data)

departments_data = [
    (1, 'Engineering', 500000, 'San Francisco'),
    (2, 'Sales', 300000, 'New York'),
    (3, 'Marketing', 250000, 'Los Angeles'),
    (4, 'HR', 150000, 'Chicago')
]

cursor.executemany('INSERT INTO departments VALUES (?, ?, ?, ?)', departments_data)
conn.commit()

print("✅ Sample database created!")
print("\nTables:")
print("  • employees (id, name, department, salary, hire_date, manager_id)")
print("  • departments (id, name, budget, location)")

print("\nSample data:")
print(pd.read_sql_query("SELECT * FROM employees LIMIT 3", conn))
print("\n", pd.read_sql_query("SELECT * FROM departments", conn))

# ============================================================================
# Part B: Define BNF Grammar for Safe SQL
# ============================================================================
print("\n" + "=" * 80)
print("PART B: DEFINING BNF GRAMMAR FOR SAFE SQL")
print("=" * 80)

# BNF Grammar for Safe SQL (SELECT only)
SQL_BNF_GRAMMAR = """
root ::= ws select_statement ws

select_statement ::= "SELECT" ws select_list ws from_clause (ws where_clause)? (ws group_by_clause)? (ws order_by_clause)? (ws limit_clause)? ";"?

select_list ::= "*" | column_list
column_list ::= column_expr (ws "," ws column_expr)*
column_expr ::= (identifier ".")? identifier (ws "AS" ws identifier)?
            | aggregate_func "(" ws (column_expr | "*") ws ")" (ws "AS" ws identifier)?

aggregate_func ::= "COUNT" | "SUM" | "AVG" | "MIN" | "MAX"

from_clause ::= "FROM" ws table_ref (ws join_clause)*
table_ref ::= identifier (ws "AS"? ws identifier)?

join_clause ::= join_type ws "JOIN" ws table_ref ws "ON" ws condition
join_type ::= "INNER" | "LEFT" | "RIGHT" | "FULL"

where_clause ::= "WHERE" ws condition

condition ::= simple_condition (ws logic_op ws simple_condition)*
simple_condition ::= column_expr ws comparison_op ws value
                  | column_expr ws "BETWEEN" ws value ws "AND" ws value
                  | column_expr ws "IN" ws "(" ws value_list ws ")"
                  | "(" ws condition ws ")"

comparison_op ::= "=" | "!=" | "<" | ">" | "<=" | ">="
logic_op ::= "AND" | "OR"

group_by_clause ::= "GROUP BY" ws column_list (ws "HAVING" ws condition)?

order_by_clause ::= "ORDER BY" ws order_expr (ws "," ws order_expr)*
order_expr ::= column_expr (ws ("ASC" | "DESC"))?

limit_clause ::= "LIMIT" ws number (ws "OFFSET" ws number)?

value ::= number | string_literal | identifier
value_list ::= value (ws "," ws value)*
string_literal ::= "'" [^']* "'"

identifier ::= [a-zA-Z_][a-zA-Z0-9_]*
number ::= [0-9]+ ("." [0-9]+)?

ws ::= [ \t\n]*
"""

print("\n📝 BNF Grammar defined!")
print("\n🔒 What this grammar enforces:")
print("   ✅ root ::= ws select_statement ws")
print("      → ONLY SELECT statements allowed")
print("      → DELETE, UPDATE, INSERT, DROP are NOT statement keywords in the grammar")
print("      → Model CANNOT start a statement with them (logit = -inf)")
print("\n   ✅ Supports:")
print("      • WHERE clauses with conditions")
print("      • JOINs (INNER, LEFT, RIGHT, FULL)")
print("      • GROUP BY with HAVING")
print("      • ORDER BY with ASC/DESC")
print("      • LIMIT and OFFSET")
print("      • Aggregate functions (COUNT, SUM, AVG, MIN, MAX)")
print("\n   ✅ Blocks:")
print("      • DELETE (not in grammar)")
print("      • UPDATE (not in grammar)")
print("      • INSERT (not in grammar)")
print("      • DROP, TRUNCATE, ALTER, CREATE (not in grammar)")

print("\n💡 Key Insight:")
print("   Every query must derive from select_statement, so it has to start with SELECT.")
print("   A grammar-constrained decoder such as llama-cpp masks every token that cannot")
print("   continue a valid derivation (logit = -inf). This is TRUE token-level enforcement!")

print("\n" + "=" * 80)
print("Database and Grammar are ready!")
print("=" * 80)
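
Before handing a hand-written grammar to a constrained decoder, it can help to check that every rule it references is actually defined. The sketch below is a rough lint, not a real BNF parser: it strips quoted literals and character classes, then compares referenced names with defined ones. The `undefined_rules` helper and the shortened `GRAMMAR` sample are illustrative assumptions.

```python
import re

# Shortened sample in the same notation as the SQL grammar above.
GRAMMAR = r'''
root ::= ws select_statement ws
select_statement ::= "SELECT" ws select_list ws "FROM" ws identifier
select_list ::= "*" | identifier (ws "," ws identifier)*
identifier ::= [a-zA-Z_][a-zA-Z0-9_]*
ws ::= [ \t\n]*
'''


def undefined_rules(grammar):
    """Return rule names that are referenced in `grammar` but never defined."""
    defined = set(re.findall(r"^\s*([a-z_][a-z0-9_]*)\s*::=", grammar, flags=re.M))
    # Drop "quoted literals" and [character classes] so only rule names remain.
    stripped = re.sub(r'"[^"]*"|\[[^\]]*\]', " ", grammar)
    referenced = set(re.findall(r"\b[a-z_][a-z0-9_]*\b", stripped))
    return referenced - defined


print(undefined_rules(GRAMMAR))
print(undefined_rules('root ::= "SELECT" ws column_list\nws ::= [ ]*'))
```

The first call reports no undefined rules; the second reports `column_list`, the kind of typo that would otherwise only surface when the grammar is compiled.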

## Step 2: Load a Self-Hosted Model That Supports Grammar Constraints

Now we'll load the model and wrap it with `outlines` so that generation can be constrained at the token level.

In [None]:
from transformers import AutoTokenizer, AutoModelForCausalLM
from outlines import from_transformers
import torch

MODEL_NAME = "Qwen/Qwen2.5-3B-Instruct"

print("=" * 80)
print("LOADING MODEL WITH OUTLINES + HUGGINGFACE")
print("=" * 80)

print("\n📚 Using 'outlines' library for TRUE Grammar Pattern")
print("   • Supports BNF grammar, regex, and JSON schema constraints")
print("   • Token-level logits masking based on grammar rules")
print("   • Same approach used by HuggingFace TGI")
print("   • Works directly with HuggingFace transformers")

# Get HuggingFace token from environment (optional for non-gated models)
hf_token = os.getenv("HF_TOKEN")

print(f"\nLoading model: {MODEL_NAME}...")
print("This may take a few minutes on first run (downloads model)...\n")

# Step 1: Load HuggingFace tokenizer and model
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME, token=hf_token)

hf_model = AutoModelForCausalLM.from_pretrained(
    MODEL_NAME,
    torch_dtype=torch.bfloat16,  # Use bfloat16 for efficiency
    device_map="auto",  # Automatically use GPU/MPS if available
    token=hf_token
)

print("✅ HuggingFace model loaded")

# Step 2: Wrap with outlines for grammar support
print("\n⏳ Wrapping model with outlines...")
model = from_transformers(hf_model, tokenizer)

print("\n✅ Model loaded successfully!")
print(f"   Model: {MODEL_NAME}")
print(f"   Device: {hf_model.device}")
print(f"   dtype: {hf_model.dtype}")

print("\n" + "=" * 80)
print("Now we can use TRUE Grammar constraints!")
print("=" * 80)


## Step 3. Test 1. Generate SQL with a TRUE Grammar Constraint (Regex)
Now we'll use a regex constraint, compiled by `outlines`, to generate SQL queries with HARD constraints.

In [None]:
from outlines import Generator, regex

print("=" * 80)
print("CREATING SQL GRAMMAR WITH OUTLINES (REGEX)")
print("=" * 80)

print("\n🔧 Grammar Type: Regex (Regular Expression)")
print("   • Constrains output to match specific patterns")
print("   • Token-level logits masking (tokens that don't match = -inf)")
print("   • TRUE Grammar Pattern (not just prompt engineering)")

# Define a SQL regex pattern that only allows a single SELECT statement.
# [^;]+ forbids semicolons inside the query, so no second statement can be
# appended, and the trailing ";" gives the generator a clear stopping point.
sql_regex = r"SELECT [^;]+ FROM [^;]+;"

print("\n📝 SQL Regex Pattern:")
print(f"   {sql_regex}")

print("\n🔒 This regex enforces:")
print("   ✅ Must start with SELECT")
print("   ✅ Must have FROM clause")
print("   ✅ Exactly one statement, terminated by ';'")
print("   ❌ IMPOSSIBLE: statements that start with DELETE, UPDATE, INSERT, DROP, ALTER")
print("      (output that does not begin with SELECT cannot match the regex)")

# Create generator with regex grammar
print("\n⏳ Compiling regex grammar...")
sql_generator = Generator(model, output_type=regex(sql_regex))
print("✅ Grammar compiled!")

print("\n💡 How it works:")
print("   1. outlines compiles regex to finite state machine (FSM)")
print("   2. At each token generation:")
print("      → FSM determines which tokens are valid")
print("      → Invalid tokens get logit = -inf (impossible to generate)")
print("      → Model MUST choose valid token")
print("   3. Output GUARANTEED to match regex")

print("\n" + "=" * 80)
print("TESTING: Generate SQL Queries with Grammar Constraint")
print("=" * 80)

# Test questions - SAFE requests
test_questions = [
    "Show me all employees in Engineering department",
    "What is the average salary by department?",
    "List employees who earn more than $75,000"
]

for i, question in enumerate(test_questions, 1):
    print(f"\n{'='*80}")
    print(f"Question {i}: {question}")
    print("-" * 80)
    
    # Create prompt
    prompt = f"""Generate a SQL SELECT query.

Database Schema:
- employees: id, name, department, salary, hire_date, manager_id
- departments: id, name, budget, location

Question: {question}

SQL Query:\n"""
    
    print("\n⏳ Generating with regex grammar constraint...")
    
    try:
        # Generate with grammar constraint
        sql_query = sql_generator(prompt)
        
        print("\n✅ Generated SQL:")
        print(f"   {sql_query}")
        
        # Try to execute it
        try:
            # Clean the query
            clean_sql = sql_query.split(';')[0].strip()
            if not clean_sql.upper().startswith('SELECT'):
                clean_sql = 'SELECT' + clean_sql
                
            df = pd.read_sql_query(clean_sql, conn)
            print("\n📊 Execution: SUCCESS")
            print(f"   Rows returned: {len(df)}")
            if len(df) > 0:
                print("\n   Results preview:")
                print(df.head().to_string(index=False))
        except Exception as e:
            print(f"\n⚠️ Execution error: {str(e)[:150]}")
            print("   Note: Grammar ensures SELECT-only, but not perfect SQL syntax")
    except Exception as e:
        print(f"\n❌ Generation failed: {str(e)[:200]}")

print("\n" + "=" * 80)
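
How much a regex constraint protects depends on how tight the pattern is. The sketch below uses Python's `re` module (not `outlines`) to compare a permissive pattern, which accepts anything after `FROM`, with a stricter one that forbids semicolons inside the query. Both patterns and the candidate strings are illustrative assumptions.

```python
import re

# Permissive: any characters may follow FROM, including a second statement.
PERMISSIVE = re.compile(r"SELECT[\s\S]+FROM[\s\S]+")
# Stricter: no ';' inside the query, exactly one terminating ';'.
STRICT = re.compile(r"SELECT [^;]+ FROM [^;]+;")

candidates = [
    "SELECT name FROM employees;",
    "SELECT name FROM employees; DROP TABLE employees;",
    "DELETE FROM employees;",
]

# Map each candidate to (accepted by PERMISSIVE, accepted by STRICT).
results = {
    sql: (bool(PERMISSIVE.fullmatch(sql)), bool(STRICT.fullmatch(sql)))
    for sql in candidates
}

for sql, (permissive_ok, strict_ok) in results.items():
    print(f"{sql!r}: permissive={permissive_ok}, strict={strict_ok}")
```

Both patterns reject the plain `DELETE`, but only the stricter one rejects the stacked `SELECT ...; DROP TABLE ...` string, so the guarantee is only as strong as the pattern that encodes it.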


## Step 4. Test 2. Does the Grammar Block Dangerous Operations?

Let's try to make the model generate dangerous SQL operations.

In [None]:
print("=" * 80)
print("TESTING GRAMMAR SAFETY: Attempting Dangerous Operations")
print("=" * 80)

# Dangerous requests that should be blocked
dangerous_questions = [
    "Delete all employees from the Engineering department",
    "Update all salaries to $100,000",
    "Drop the employees table"
]

for i, question in enumerate(dangerous_questions, 1):
    print(f"\n{'='*80}")
    print(f"⚠️ Dangerous Request {i}: {question}")
    print("-" * 80)
    
    prompt = f"""Generate SQL for this request.

Database Schema:
- employees: id, name, department, salary, hire_date, manager_id
- departments: id, name, budget, location

Request: {question}

SQL Query:\n"""
    
    print("\n‚è≥ Attempting to generate with grammar constraint...")
    
    try:
        # Generate with grammar constraint
        # The regex ONLY allows SELECT, so dangerous operations are impossible
        generated_sql = sql_generator(prompt)
        
        print(f"\nüìù What the model generated:")
        print(f"   {generated_sql}")
        
        # Check if it's a SELECT
        if generated_sql.strip().upper().startswith("SELECT"):
            print(f"\n‚úÖ SAFE: Grammar forced a SELECT query instead!")
            print(f"   The model CANNOT generate DELETE/UPDATE/DROP with this regex.")
            print(f"   Even when explicitly asked, grammar physically blocks it.")
        else:
            print(f"\n‚ö†Ô∏è Unexpected: Not a SELECT query")
            print(f"   This shouldn't happen with the regex constraint.")
    
    except Exception as e:
        print(f"\n‚ùå Generation failed: {str(e)[:150]}")
        print("   This might happen if the model can't satisfy the grammar")

print("\n" + "=" * 80)
print("🛡️ GRAMMAR PATTERN SAFETY DEMONSTRATED:")
print("=" * 80)
print("\n✅ Dangerous SQL operations were BLOCKED by grammar constraint")
print("   • DELETE, UPDATE, INSERT, DROP are NOT in the regex pattern")
print("   • Any token that would break the pattern gets logit = -inf (impossible)")
print("   • Model physically CANNOT generate them")
print("   • This is TRUE Grammar Pattern - hard constraint at token level")

print("\n🎯 Key Difference from Prompt Engineering:")
print("   ❌ Prompt: 'Only generate SELECT' → Model might ignore")
print("   ✅ Grammar: regex only allows SELECT → Physically impossible to violate")
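
A grammar constraint limits what the model can emit, but it can still be worth guarding the execution side as well. The following is a minimal sketch of such a second layer, assuming the database is SQLite (the helper name `run_read_only` and the demo table are hypothetical, not part of this notebook). It relies on SQLite's `query_only` pragma, which makes write statements fail while it is enabled.

```python
import sqlite3

def run_read_only(conn: sqlite3.Connection, sql: str) -> list:
    """Execute `sql` with SQLite's query_only pragma enabled, then restore it."""
    conn.execute("PRAGMA query_only = ON")
    try:
        return conn.execute(sql).fetchall()
    finally:
        conn.execute("PRAGMA query_only = OFF")

# Hypothetical demo table, not the notebook's real data
demo = sqlite3.connect(":memory:")
demo.execute("CREATE TABLE employees (id INTEGER, name TEXT, department TEXT)")
demo.execute("INSERT INTO employees VALUES (1, 'Ada', 'Engineering')")
demo.commit()

# A SELECT passes through unchanged
rows = run_read_only(demo, "SELECT name FROM employees")

# A DELETE is rejected by SQLite itself, whatever produced the statement
try:
    run_read_only(demo, "DELETE FROM employees")
    blocked = False
except sqlite3.OperationalError:
    blocked = True

remaining = demo.execute("SELECT COUNT(*) FROM employees").fetchone()[0]
print(rows, blocked, remaining)
```

The two layers cover different failures: the grammar keeps the model from producing a write statement, and the read-only guard protects the data even if a statement reaches the database by some other path.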


---
## Alternative: Using llama-cpp-python with BNF Grammar

The `llama-cpp-python` library provides an alternative way to use TRUE BNF Grammar Pattern.

**Key Differences:**
- `outlines`: Works with HuggingFace models (FP16/BF16), auto-downloads
- `llama-cpp`: Works with GGUF models (quantized), requires manual download

**Both provide TRUE Grammar Pattern** with token-level logits masking.

### When to use llama-cpp:
- Need smaller model size (quantized GGUF)
- Want faster inference on CPU
- Prefer BNF grammar over regex

### Setup:
```bash
# Install llama-cpp-python
pip install llama-cpp-python

# Download GGUF model (example)
# Visit: https://huggingface.co/Qwen/Qwen2.5-3B-Instruct-GGUF
# Download: qwen2.5-3b-instruct-q5_k_m.gguf (~2.5GB)
```

In [None]:
!pip install llama-cpp-python

In [None]:
# OPTIONAL: llama-cpp-python implementation
# Skip this cell if you only want the outlines approach shown above

from llama_cpp import Llama, LlamaGrammar
from huggingface_hub import hf_hub_download
import os

print("=" * 80)
print("LOADING MODEL WITH LLAMA-CPP + BNF GRAMMAR")
print("=" * 80)

# Create models directory if it doesn't exist
os.makedirs("models", exist_ok=True)

# Model details
REPO_ID = "Qwen/Qwen2.5-3B-Instruct-GGUF"
FILENAME = "qwen2.5-3b-instruct-q5_k_m.gguf"
MODEL_PATH = f"models/{FILENAME}"

# Download if not exists
if not os.path.exists(MODEL_PATH):
    print(f"\n📥 Downloading {FILENAME} (~2.5GB)...")
    print("   This may take 5-10 minutes depending on your connection...")
    MODEL_PATH = hf_hub_download(
        repo_id=REPO_ID,
        filename=FILENAME,
        local_dir="models",
        local_dir_use_symlinks=False
    )
    print("✅ Download complete!")
else:
    print(f"\n✅ Model already exists at {MODEL_PATH}")

print(f"\n📦 Loading model from: {MODEL_PATH}")
print("   This may take 30-60 seconds...\n")

# Load model with llama-cpp
llm = Llama(
    model_path=MODEL_PATH,
    n_ctx=2048,           # Context window
    n_threads=8,          # Use multiple threads
    n_gpu_layers=0,       # 0 = CPU only, increase for Metal/MPS
    verbose=False
)
    
print("✅ Model loaded successfully!")

# Create BNF grammar
print("\n⏳ Compiling BNF grammar...")
sql_grammar = LlamaGrammar.from_string(SQL_BNF_GRAMMAR)
print("✅ Grammar compiled!")
    
# Test generation
print("\n" + "=" * 80)
print("TESTING: Generate SQL with llama-cpp BNF Grammar")
print("=" * 80)
    
test_question = "What is the average salary by department?"
    
prompt = f"""Generate a SQL SELECT query.
Database Schema:
- employees: id, name, department, salary, hire_date, manager_id
- departments: id, name, budget, location

Question: {test_question}

SQL Query:\n"""
    
print(f"\nQuestion: {test_question}")
print("\n⏳ Generating with BNF grammar constraint...")

# Generate with grammar constraint
response = llm(
    prompt,
    max_tokens=150,
    temperature=0.3,
    grammar=sql_grammar,  # <-- BNF Grammar constraint!
    stop=[";", "\n\n"]
)

sql_query = response['choices'][0]['text'].strip()

print(f"\n✅ Generated SQL:")
print(f"   {sql_query}")
    
# Try to execute
try:
    df = pd.read_sql_query(sql_query, conn)
    print(f"\n📊 Execution: SUCCESS")
    print(f"   Rows returned: {len(df)}")
    if len(df) > 0:
        print(f"\n   Results:")
        print(df.to_string(index=False))
except Exception as e:
    print(f"\n⚠️ Execution error: {str(e)[:150]}")

print("\n" + "=" * 80)
print("✅ Both outlines and llama-cpp provide TRUE Grammar Pattern!")
print("   See comparison table below for detailed differences.")
print("=" * 80)

## Implementation Comparison: outlines vs llama-cpp

Both libraries provide TRUE Grammar Pattern with token-level logits masking.

| Feature | outlines | llama-cpp |
|---------|----------|-----------|
| Model Format | HuggingFace models | GGUF models |
| Precision | FP16/BF16 (full) | Quantized (Q4, Q5, Q8) |
| Model Download | ✅ Auto-downloads | ⚠️ Separate GGUF download (manual or via `hf_hub_download`) |
| Grammar Support | Regex, JSON schema, FSM | ✅ Full BNF grammar (GBNF) |
| Model Size | ~6GB (full precision) | ~2.5GB (Q5 quantization) |
| Token-level Masking | ✅ Yes | ✅ Yes |
| Format Guarantee | ✅ 100% guaranteed | ✅ 100% guaranteed |
| Grammar Violation | ❌ Physically impossible | ❌ Physically impossible |
| Library Ecosystem | transformers + outlines | llama-cpp-python |
| Hardware Optimization | GPU/MPS preferred | CPU optimized |
| Setup Complexity | Medium (pip install) | Medium (download + pip) |
| Best For | Latest models, GPU | Smaller footprint, CPU |
| Memory Usage | Higher (~6-8GB) | Lower (~3-4GB) |
| Integration | Python native | C++ backend |

**Bottom Line:** Both provide TRUE Grammar Pattern. Choose based on your infrastructure:
- **outlines**: When you want latest HuggingFace models with auto-download
- **llama-cpp**: When you need smaller model size and CPU efficiency

---
## Example 3. Pipe-separated extraction

Extract Named Entities from the text in the following format:
`SKU | Product Name | Price | Category`

---

In [None]:
# Sample product descriptions
product_descriptions = [
    """
    Apple iPhone 17 Pro - Latest flagship smartphone with Pro chip,
    48MP camera system, and titanium design. SKU: IPHONE-17-PRO-256.
    Price: $999. Category: Electronics/Smartphones
    """,
    """
    Samsung 65" QLED 4K Smart TV - Quantum HDR, Object Tracking Sound+, 
    and Gaming Hub. Model: QN65Q80C. Retail price: $1,299.99
    Category: Electronics/TVs
    """,
    """
    Nike Air Max 270 - Men's running shoes with visible Max Air cushioning.
    Style code: DM9652-001. Price $160. Category: Footwear/Athletic
    """,
    """
    Organic Green Tea - Premium loose leaf tea from Japan. 
    No SKU assigned yet. Price: $24.99 per package. Category: Groceries/Beverages
    """
]

print("Sample Product Descriptions:")
print("=" * 80)
for i, desc in enumerate(product_descriptions, 1):
    print(f"\n{i}. {desc.strip()[:100]}...")

print("\n" + "=" * 80)
print("\n🎯 Goal: Extract as pipe-separated format:")
print("   SKU | Product Name | Price | Category")
print("\nConstraints:")
print("  • Exactly 3 pipes (4 fields)")
print("  • Use NULL for missing SKU")
print("  • Only alphanumeric and basic punctuation")
print("  • Price must be numeric")

## Define Pipe-Separated Schema

For pipe-separated format, we'll use Pydantic schema approach (simpler than BNF for this case).

In [None]:
from pydantic import BaseModel, Field

class ProductRecord(BaseModel):
    """Single product in structured format (we'll convert to pipe-separated)"""
    sku: str = Field(description="Product SKU, use 'NULL' if not available")
    product_name: str = Field(description="Product name, alphanumeric only")
    price: float = Field(ge=0, description="Product price in USD")
    category: str = Field(description="Product category")
    
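    def to_pipe_format(self) -> str:
        """Convert to pipe-separated format (4 fields, 3 pipes)"""
        # The schema does not forbid '|' or newlines inside text fields, so
        # replace them here; a stray pipe would otherwise add an extra column.
        def clean(value: str) -> str:
            return value.replace("|", "/").replace("\n", " ").strip()
        return f"{clean(self.sku)}|{clean(self.product_name)}|{self.price:.2f}|{clean(self.category)}"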

def extract_product_record(description: str) -> ProductRecord:
    """
    Extract product information with Grammar Pattern.
    """
    system_prompt = """
    Extract product information from the description.
    Use 'NULL' for SKU if not provided.
    Clean product name to alphanumeric characters only.
    """
    
    response = client.beta.chat.completions.parse(
        model=MODEL,
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": description}
        ],
        response_format=ProductRecord,
        temperature=0
    )
    
    return response.choices[0].message.parsed

print("✅ Pipe-separated extraction ready!")

## Extract Products in Pipe-Separated Format

In [None]:
print("Extracting Products in Pipe-Separated Format")
print("=" * 80)

pipe_records = []

for i, description in enumerate(product_descriptions, 1):
    print(f"\n{'='*80}")
    print(f"\n📦 Product {i}:")
    print("-" * 80)
    print(description.strip()[:150] + "...")
    
    # Extract with Grammar Pattern
    product = extract_product_record(description)
    
    # Convert to pipe format
    pipe_format = product.to_pipe_format()
    pipe_records.append(pipe_format)
    
    print(f"\n✅ Extracted (structured):")
    print(f"   SKU: {product.sku}")
    print(f"   Name: {product.product_name}")
    print(f"   Price: ${product.price:.2f}")
    print(f"   Category: {product.category}")
    
    print(f"\n📄 Pipe-separated format:")
    print(f"   {pipe_format}")

print("\n" + "=" * 80)
print("\n📋 FINAL OUTPUT (ready for legacy system import):")
print("=" * 80)
for record in pipe_records:
    print(record)

print("\n🎯 GUARANTEES:")
print("  ✅ All four fields present in each line (enforced by the schema)")
print("  ✅ Price always numeric (float)")
print("  ✅ Exactly 3 pipes in each line, provided no text field contains '|'")
print("  ⚠️ NULL for a missing SKU is requested in the prompt, not enforced")
print("  ✅ No JSON parsing errors")

## Save to File

In [None]:
# Save to PSV (Pipe-Separated Values) file
with open('products.psv', 'w') as f:
    f.write("SKU|Product Name|Price|Category\n")  # Header
    for record in pipe_records:
        f.write(record + "\n")

print("✅ Saved to products.psv")
print("\nFile contents:")
with open('products.psv', 'r') as f:
    print(f.read())

print("\n🎯 This file is expected to be:")
print("  ✅ Properly formatted (four fields per row)")
print("  ✅ Importable by legacy systems")
print("  ⚠️ Worth a quick validation pass before import")
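
The schema fixes field types, but it does not stop a text field from containing a pipe character, so a cheap check on each line before import can catch malformed rows. Below is a minimal sketch; the function name `validate_psv_line` and the sample lines are illustrative assumptions, not part of the notebook.

```python
def validate_psv_line(line: str) -> list[str]:
    """Return the problems found in one 'SKU|Name|Price|Category' record."""
    fields = line.rstrip("\n").split("|")
    if len(fields) != 4:
        # A pipe inside a text field shows up here as an extra column
        return [f"expected 4 fields, got {len(fields)}"]

    problems: list[str] = []
    sku, name, price, category = fields
    if not sku.strip():
        problems.append("empty SKU (expected a value or 'NULL')")
    try:
        if float(price) < 0:
            problems.append("negative price")
    except ValueError:
        problems.append(f"non-numeric price: {price!r}")
    return problems

# Illustrative records (made up for this sketch)
good = "IPHONE-17-PRO-256|Apple iPhone 17 Pro|999.00|Electronics/Smartphones"
bad_pipes = "NULL|Tea | Premium blend|24.99|Groceries/Beverages"
bad_price = "NULL|Organic Green Tea|$24.99|Groceries/Beverages"

for record in (good, bad_pipes, bad_price):
    print(validate_psv_line(record) or "OK")
```

Returning a list of problems rather than raising keeps the helper usable in a batch pass that reports every bad row at once.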

---
# Example 4: English Grammar Correction

## Business Problem

Grammar correction tools need to:
- Fix grammatical errors in user input
- Preserve the original meaning
- Output ONLY the corrected sentence (not explanations)
- Handle various types of errors (subject-verb agreement, tense, articles, etc.)

**Challenge:** LLMs naturally want to explain their corrections or add commentary.

**Solution:** Grammar Pattern constrains output to be valid English sentence only.

## What We'll Demonstrate

We'll use **regex grammar** with `outlines` to ensure:
1. Output is a single sentence
2. Starts with capital letter
3. Ends with proper punctuation
4. Contains only valid characters (no explanations, no bullet points)


## Define English Sentence Grammar

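We'll use regex to constrain the output to valid English sentences:

```regex
[A-Z][A-Za-z0-9\s,.'"\-!?]*[.!?]
```

**What this regex enforces:**
- `[A-Z]` - Must start with capital letter
- `[A-Za-z0-9\s,.'"\-!?]*` - Contains only letters, numbers, whitespace, and basic punctuation
- `[.!?]` - Must end with sentence-ending punctuation

No `^`/`$` anchors are needed, because the constrained generator has to match the whole pattern.

**What it prevents:**
- Bullet points and markdown formatting (`*`, `#`, backticks, and a leading `-` are not allowed)
- Colons, parentheses, and other characters that explanations often use
- Missing capitalization or punctuation

**What it does not prevent:** `.`, `!`, `?`, and whitespace are allowed mid-string, so a multi-sentence answer such as "She doesn't like pizza. The error was agreement." still matches. Keeping the output to one sentence relies on the prompt and on the model stopping after the first sentence.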
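
To see what the pattern accepts in practice, it can be exercised with Python's `re.fullmatch` before it is handed to a generator. This is a minimal sketch: the first pattern is the one from the notebook's code cell, while `single_sentence_re` is a stricter variant introduced here as an assumption, not something the notebook uses.

```python
import re

# Pattern as used in the notebook's code cell
sentence_re = re.compile(r"[A-Z][A-Za-z0-9\s,.'\"\-!?]*[.!?]")

# Stricter variant (assumption): no newlines, and no sentence-ending
# punctuation anywhere except the final character
single_sentence_re = re.compile(r"[A-Z][A-Za-z0-9 ,'\"\-]*[.!?]")

candidates = [
    "She doesn't like pizza.",
    "She doesn't like pizza. The error was subject-verb agreement.",
    "* She doesn't like pizza.",
    "she doesn't like pizza.",
]

for text in candidates:
    original = sentence_re.fullmatch(text) is not None
    strict = single_sentence_re.fullmatch(text) is not None
    print(f"original={original!s:5} strict={strict!s:5} {text!r}")
```

The two-sentence candidate passes the original pattern and fails the stricter one. The stricter pattern has its own cost: it also rejects abbreviations such as "Dr." and decimals such as "3.5", so which one fits depends on the sentences you expect.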


In [None]:
from outlines import Generator, regex

print("=" * 80)
print("CREATING ENGLISH SENTENCE GRAMMAR")
print("=" * 80)

# Regex for the corrected sentence
# - Starts with capital letter
# - Contains alphanumerics, whitespace, and basic punctuation
# - Ends with . ! or ?
english_sentence_regex = r"[A-Z][A-Za-z0-9\s,.'\"\-!?]*[.!?]"

print("\n📝 English Sentence Regex:")
print(f"   {english_sentence_regex}")

print("\n🔒 This regex enforces:")
print("   ✅ Starts with capital letter")
print("   ✅ Valid characters only (letters, numbers, basic punctuation)")
print("   ✅ Ends with proper punctuation (. ! ?)")
print("   ❌ BLOCKS: bullets, markdown, colons, parentheses, lowercase starts")

print("\n⏳ Creating grammar-constrained generator...")
grammar_corrector = Generator(model, output_type=regex(english_sentence_regex))
print("✅ Grammar generator created!")

print("\n" + "=" * 80)


## Test: Grammar Correction with Constrained Output

In [None]:
# Sample sentences with grammatical errors
test_sentences = [
    "She don't like pizza.",  # Subject-verb agreement
    "He go to school yesterday.",  # Tense error
    "I have a apple and orange.",  # Article error
    "They was happy about the news.",  # Subject-verb agreement
    "Me and him went to store.",  # Pronoun case error
    "The cat it is sleeping on the chair.",  # Redundant pronoun
    "She have three brother and two sister.",  # Multiple errors
]

In [None]:
print("=" * 80)
print("CORRECTING GRAMMAR WITH CONSTRAINED OUTPUT")
print("=" * 80)

for i, incorrect_sentence in enumerate(test_sentences, 1):
    print(f"\n{'='*80}")
    print(f"Example {i}:")
    print("-" * 80)
    print(f"❌ Original: {incorrect_sentence}")
    
    # Create prompt
    prompt = f"""Fix the grammar in this sentence. Output ONLY the corrected sentence, nothing else.

Incorrect: {incorrect_sentence}

Corrected: """
    
    print("\n⏳ Generating with grammar constraint...")
    
    try:
        # Generate with grammar constraint
        corrected = grammar_corrector(prompt)

        print(f"✅ Corrected: {corrected}")

        # Verify that the WHOLE output matches our regex
        # (re.match only anchors at the start, so use re.fullmatch)
        import re
        if re.fullmatch(english_sentence_regex, corrected):
            print("   ✅ Matches grammar constraints")
        else:
            print("   ⚠️  Does not match regex (shouldn't happen!)")

    except Exception as e:
        print(f"❌ Generation failed: {str(e)[:150]}")

print("\n" + "=" * 80)
print("🎯 GRAMMAR PATTERN BENEFITS:")
print("=" * 80)
print("\n✅ Output ALWAYS has the constrained shape:")
print("   • Starts with capital letter")
print("   • Ends with proper punctuation")
print("   • No bullets, markdown, or other disallowed characters")
print("   • Usually a single sentence (the regex alone does not forbid a second one)")
print("\n✅ Physical constraint at token level:")
print("   • Model CANNOT generate invalid characters")
print("   • Model CANNOT skip capitalization or punctuation")
print("   • Format compliance is enforced, as long as generation is not cut off early")


In [None]:
test_sentence = "She don't like pizza."

print("=" * 80)
print("COMPARISON: Grammar Pattern vs Regular Prompting")
print("=" * 80)
print(f"\nTest sentence: {test_sentence}")

# Without Grammar Pattern (using API)
print("\n" + "=" * 80)
print("❌ WITHOUT GRAMMAR PATTERN (regular prompting):")
print("-" * 80)

response_no_grammar = client.chat.completions.create(
    model=MODEL,
    messages=[
        {"role": "system", "content": "You are a grammar checker. Fix errors in user sentences."},
        {"role": "user", "content": f"Fix the grammar: {test_sentence}"}
    ],
    temperature=0
)

print(response_no_grammar.choices[0].message.content)

print("\n⚠️ Problems:")
print("   • May include explanation ('The error is...')")
print("   • May format with quotes or markdown")
print("   • May include multiple variations")
print("   • Inconsistent output format")
print("   • Hard to parse programmatically")

# With Grammar Pattern
print("\n" + "=" * 80)
print("✅ WITH GRAMMAR PATTERN (regex constraint):")
print("-" * 80)

prompt = f"""Fix the grammar in this sentence. Output ONLY the corrected sentence.

Incorrect: {test_sentence}

Corrected: """

corrected = grammar_corrector(prompt)
print(corrected)

print("\n✅ Benefits:")
print("   • Output always matches the sentence pattern")
print("   • No markdown, bullets, or other formatting")
print("   • Consistent output structure")
print("   • Easy to parse and use")
print("   • Physical guarantee (not just prompt)")

print("\n" + "=" * 80)
print("💡 KEY INSIGHT:")
print("=" * 80)
print("\nRegular prompting (soft constraint):")
print("   ⚠️  'Please only output the sentence' → Model might ignore")
print("   ⚠️  Relies on model's instruction following")
print("   ⚠️  Can be bypassed or misunderstood")
print("\nGrammar Pattern (hard constraint):")
print("   ✅ Regex physically blocks invalid tokens")
print("   ✅ Model CANNOT emit characters outside the pattern")
print("   ✅ Format compliance enforced at every token")
print("\n🎯 Use Grammar Pattern when output format MUST be exact!")


---
# 🎓 Example 5: Math Expression Generation - Direct Logits Processing

## Deep Dive: Understanding How Grammar Constraints Work Internally

In **Examples 2 and 4**, we used the `outlines` library which provides a high-level abstraction. In the **llama-cpp alternative**, we showed another library approach.

**But how do they actually work under the hood?**

This example demonstrates the **foundational mechanism** that both `outlines` and `llama-cpp` use internally: **Direct Logits Processing**.

## Business Problem

Math tutoring applications need to:
- Extract mathematical expressions from word problems
- Ensure output is ONLY valid mathematical syntax (no explanations)
- Block natural language responses
- Generate symbolic expressions that can be evaluated programmatically

**Challenge**: LLMs naturally want to explain their reasoning instead of just providing expressions.

**Solution**: Use grammar constraints to physically block all tokens except valid math expressions.

## What We'll Demonstrate

Unlike previous examples, we'll:
- Use **direct logits manipulation** with `GrammarConstrainedLogitsProcessor`
- See the **low-level mechanism** that libraries abstract away
- Understand **how grammar constraints work at the token level**

## Comparison Table

| Approach | Library | Abstraction Level | Examples |
|----------|---------|-------------------|----------|
| Pydantic Schemas | OpenAI API | High (API-based) | 1, 3 |
| Regex/BNF Grammar | outlines | Medium (library) | 2, 4 |
| BNF Grammar | llama-cpp | Medium (library) | 2 (alternative) |
| **Direct Logits** | **transformers-cfg** | **Low (foundational)** | **5 (this!)** |

**Key Insight**: This is what `outlines` does when you write `Generator(model, output_type=regex(...))` - it creates a logits processor similar to what we'll implement here!

In [1]:
!pip install transformers-cfg

Collecting transformers-cfg
  Downloading transformers_cfg-0.2.7-py3-none-any.whl.metadata (12 kB)
Collecting termcolor>=2.4.0 (from transformers-cfg)
  Downloading termcolor-3.3.0-py3-none-any.whl.metadata (6.5 kB)
Collecting protobuf>=4.25.2 (from transformers-cfg)
  Downloading protobuf-6.33.4-cp39-abi3-macosx_10_9_universal2.whl.metadata (593 bytes)
Downloading transformers_cfg-0.2.7-py3-none-any.whl (67 kB)
Downloading protobuf-6.33.4-cp39-abi3-macosx_10_9_universal2.whl (427 kB)
Downloading termcolor-3.3.0-py3-none-any.whl (7.7 kB)
Installing collected packages: termcolor, protobuf, transformers-cfg
Successfully installed protobuf-6.33.4 termcolor-3.3.0 transformers-cfg-0.2.7


## Step 1: Define BNF Grammar for Math Expressions

We'll define a grammar that accepts simple arithmetic expressions:

```bnf
root ::= (expr "=" ws term "\n")+
expr ::= term ([-+*/] term)*
term ::= ident | num | "(" ws expr ")" ws
ident ::= [a-z] [a-z0-9_]* ws
num ::= [0-9]+ ws
ws ::= [ \t\n]*
```

**What this grammar enforces:**
- `root`: One or more lines of the form `expr = term`, each ending with a newline
- `expr`: Terms connected by +, -, *, /
- `term`: Can be an identifier (bill_apples), a number (3), or a parenthesized expression
- `ident`: Variable names (lowercase letters, numbers, underscores)
- `num`: Non-negative integers
- `ws`: Optional whitespace (spaces, tabs, newlines)

Note that `ws` only follows a term, so whitespace is allowed before an operator but not after it: `3 +2` is valid, `3 + 2` is not.

**What it blocks:**
- Natural language explanations
- Multiple sentences
- Anything that's not a mathematical expression
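
One quick way to build intuition for which lines this grammar accepts is to approximate it with a regular expression and try a few strings. The sketch below is only an approximation under stated assumptions: it checks one line at a time (so newlines are dropped from `ws`) and it leaves out parenthesized terms, since nesting cannot be expressed as a regex. The names `TERM`, `EQUATION_LINE`, and `is_equation_line` are hypothetical.

```python
import re

# ident | num, each followed by optional spaces/tabs (the grammar's `ws`,
# minus newlines, because this check works on a single line)
TERM = r"(?:[a-z][a-z0-9_]*|[0-9]+)[ \t]*"

# expr "=" ws term, with expr ::= term ([-+*/] term)*
# Assumption: parenthesized terms are omitted from this approximation.
EQUATION_LINE = re.compile(rf"{TERM}(?:[-+*/]{TERM})*=[ \t]*{TERM}")

def is_equation_line(line: str) -> bool:
    """True if `line` fits the parenthesis-free subset of the grammar."""
    return EQUATION_LINE.fullmatch(line) is not None

samples = [
    "bill_apples +mae_apples = 5",   # valid: no space after the operator
    "3 +2 +2 +4 = 13",               # valid shape (the arithmetic is not checked)
    "3 + 2 = 5",                     # invalid: space after '+'
    "x = y + 1",                     # invalid: right-hand side must be one term
    "Bill has 3 apples",             # invalid: natural language
]

for s in samples:
    print(f"{is_equation_line(s)!s:5} {s!r}")
```

The second sample is a reminder that the grammar constrains the form of the output only; it says nothing about whether the equation is true.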

In [5]:
from transformers import pipeline, AutoTokenizer, AutoModelForCausalLM
from transformers_cfg.grammar_utils import IncrementalGrammarConstraint
from transformers_cfg.generation.logits_process import GrammarConstrainedLogitsProcessor
import torch
import os

print("=" * 80)
print("STEP 1: DEFINE BNF GRAMMAR FOR MATH EXPRESSIONS")
print("=" * 80)

# Define grammar for simple arithmetic expressions
grammar_str = """
root ::= (expr "=" ws term "\\n")+
expr ::= term ([-+*/] term)*
term ::= ident | num | "(" ws expr ")" ws
ident ::= [a-z] [a-z0-9_]* ws
num ::= [0-9]+ ws
ws ::= [ \\t\\n]*
"""

print("\n📝 Math Expression Grammar defined!")
print("\n🔒 This grammar enforces:")
print("   ✅ Expressions with equals sign (bill_apples = 3)")
print("   ✅ Arithmetic operations (+, -, *, /)")
print("   ✅ Variables and numbers")
print("   ✅ Parentheses for grouping")
print("\n❌ This grammar blocks:")
print("   ❌ Natural language ('Bill has 3 apples')")
print("   ❌ Explanations ('The answer is 5 because...')")
print("   ❌ Anything that's not a math expression")

print("\n" + "=" * 80)
print("STEP 2: LOAD MODEL AND CREATE PIPELINE")
print("=" * 80)

# Check if model is already loaded (from Example 2)
try:
    # Test if hf_model exists
    _ = hf_model
    _ = tokenizer
    print("\n✅ Using model already loaded from Example 2")
except NameError:
    # Model not loaded, load it now
    print("\n⏳ Model not found, loading Qwen model...")
    MODEL_NAME = "Qwen/Qwen2.5-3B-Instruct"
    hf_token = os.getenv("HF_TOKEN")
    
    tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME, token=hf_token)
    hf_model = AutoModelForCausalLM.from_pretrained(
        MODEL_NAME,
        torch_dtype=torch.bfloat16,
        device_map="auto",
        token=hf_token
    )
    print("✅ Model loaded successfully!")

# Create a text generation pipeline
pipe = pipeline(
    "text-generation",
    model=hf_model,
    tokenizer=tokenizer,
    max_new_tokens=256,
    do_sample=False
)

print("✅ Pipeline created")

# Create grammar constraint
print("\n⏳ Creating grammar constraint and logits processor...")
grammar = IncrementalGrammarConstraint(grammar_str, "root", pipe.tokenizer)
grammar_processor = GrammarConstrainedLogitsProcessor(grammar)

print("✅ Grammar constraint created!")
print("\n💡 How it works:")
print("   1. Grammar is compiled into a constraint object")
print("   2. GrammarConstrainedLogitsProcessor intercepts token generation")
print("   3. At each step, it sets logits to -inf for invalid tokens")
print("   4. Only valid tokens (per grammar) can be selected")
print("   5. Result: Output GUARANTEED to match grammar")

print("\n" + "=" * 80)

STEP 1: DEFINE BNF GRAMMAR FOR MATH EXPRESSIONS

📝 Math Expression Grammar defined!

🔒 This grammar enforces:
   ✅ Expressions with equals sign (bill_apples = 3)
   ✅ Arithmetic operations (+, -, *, /)
   ✅ Variables and numbers
   ✅ Parentheses for grouping

❌ This grammar blocks:
   ❌ Natural language ('Bill has 3 apples')
   ❌ Explanations ('The answer is 5 because...')
   ❌ Anything that's not a math expression

STEP 2: LOAD MODEL AND CREATE PIPELINE

⏳ Model not found, loading Qwen model...


`torch_dtype` is deprecated! Use `dtype` instead!


Device set to use mps
The following generation flags are not valid and may be ignored: ['temperature', 'top_p', 'top_k']. Set `TRANSFORMERS_VERBOSITY=info` for more details.


✅ Model loaded successfully!
✅ Pipeline created

⏳ Creating grammar constraint and logits processor...
✅ Grammar constraint created!

💡 How it works:
   1. Grammar is compiled into a constraint object
   2. GrammarConstrainedLogitsProcessor intercepts token generation
   3. At each step, it sets logits to -inf for invalid tokens
   4. Only valid tokens (per grammar) can be selected
   5. Result: Output GUARANTEED to match grammar



## Step 2: Test with Math Word Problem

Let's test with a real math word problem and see how the grammar constraint ensures we get only mathematical expressions.

In [6]:
print("=" * 80)
print("STEP 3: TESTING WITH MATH WORD PROBLEM")
print("=" * 80)

# Math word problem
math_question = """Bill has 3 apples and 2 oranges.
Mae has 2 apples and 4 oranges.
How many apples do Bill and Mae have in total?"""

print(f"\n📝 Math Problem:")
print(f"   {math_question}")

# Create system prompt
system_prompt = """You are a math instructor. I will ask you a math question.
Respond with the mathematical expression that can be used to solve the problem."""

# Combine into input message
input_message = f"{system_prompt}\n\nQuestion: {math_question}\n\nMath Expression:\n"

print("\n" + "=" * 80)
print("GENERATING WITH GRAMMAR CONSTRAINT")
print("=" * 80)

# Generate WITH grammar constraint
print("\n⏳ Generating with GrammarConstrainedLogitsProcessor...")
results = pipe(
    input_message,
    max_new_tokens=256,
    do_sample=False,
    logits_processor=[grammar_processor]  # <-- Grammar constraint!
)

# Extract generated text
generated_text = results[0]['generated_text']
# Get only the new part (after the input)
math_expression = generated_text[len(input_message):].strip()

print(f"\n✅ Generated Math Expression:")
print("=" * 80)
print(math_expression)
print("=" * 80)

print("\n🎯 Analysis:")
if "bill_apples" in math_expression or "apples" in math_expression:
    print("   ✅ Uses variable names (bill_apples, mae_apples, etc.)")
if "+" in math_expression:
    print("   ✅ Contains arithmetic operations")
if "=" in math_expression:
    print("   ✅ Contains equals sign")
if len(math_expression.split()) < 20:  # Short response
    print("   ✅ Concise (no explanations)")

print("\n💡 Key Point:")
print("   The model COULD NOT generate explanations like:")
print("   ❌ 'Bill has 3 apples and Mae has 2, so...'")
print("   ❌ 'The answer is 5 because...'")
print("   ❌ 'To solve this, we add 3 + 2...'")
print("\n   Because those tokens are PHYSICALLY BLOCKED by the grammar!")

STEP 3: TESTING WITH MATH WORD PROBLEM

📝 Math Problem:
   Bill has 3 apples and 2 oranges.
Mae has 2 apples and 4 oranges.
How many apples do Bill and Mae have in total?

GENERATING WITH GRAMMAR CONSTRAINT

⏳ Generating with GrammarConstrainedLogitsProcessor...

✅ Generated Math Expression:
3 +2
+2 +4

+3

=13
3 +2 +2 +4 = 13

3 +2 = 5
2 +4 = 6
5 +6 = 11

3 +2 +2 +4 = 11

3 +2 = 5
2 +4 = 6
5 +6 = 11

5 +6 = 11

11

=11

3 +2 +2 +4 = 11

11

=11

[... the lines "11" and "=11" repeat 19 more times until max_new_tokens=256 is reached ...]

11

=1

🎯 Analysis:
   ✅ Contains arithmetic operations
   ✅ Contains equals sign

💡 Key Point:
   The model COULD NOT generate explanations like:
   ❌ 'Bill has 3 apples and Mae has 2, so...'
   ❌ 'The answer is 5 because...'
   ❌ 'To solve this, we add 3 + 2...'

   Because those tokens are PHYSICALLY BLOCKED by the grammar!


## Step 3: Comparison - Without Grammar Constraint

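Note that the constrained output above is syntactically valid but not useful: `ws` allows newlines and `root` allows repetition, so greedy decoding keeps emitting valid lines until `max_new_tokens` cuts it off mid-expression (the final `=1`). The grammar guarantees the form of the output, not its correctness, and not that generation stops early.

Let's see what happens when we DON'T use the grammar constraint.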

In [None]:
print("=" * 80)
print("COMPARISON: WITHOUT GRAMMAR CONSTRAINT")
print("=" * 80)

print("\n⏳ Generating WITHOUT GrammarConstrainedLogitsProcessor...")

# Generate WITHOUT grammar constraint
results_no_grammar = pipe(
    input_message,
    max_new_tokens=256,
    do_sample=False
    # No logits_processor!
)

# Extract generated text
generated_no_grammar = results_no_grammar[0]['generated_text']
response_no_grammar = generated_no_grammar[len(input_message):].strip()

print(f"\n❌ Generated Response (no constraint):")
print("=" * 80)
print(response_no_grammar[:500])  # Limit to first 500 chars
print("=" * 80)

print("\n⚠️ Problems WITHOUT grammar constraint:")
print("   • May include natural language explanations")
print("   • May provide the answer instead of expression")
print("   • May include reasoning steps")
print("   • Unpredictable format")
print("   • Hard to parse programmatically")

print("\n" + "=" * 80)
print("🎯 KEY INSIGHT: How Grammar Constraints Work")
print("=" * 80)
print("\nWhat outlines and llama-cpp do internally:")
print("   1. Compile grammar into a state machine")
print("   2. At each token generation step:")
print("      → Check which tokens are valid per current state")
print("      → Set logits of INVALID tokens to -inf")
print("      → Model MUST choose from valid tokens")
print("   3. Result: output cannot leave the grammar (it can still be cut off by max_new_tokens)")

print("\nThis is the foundational mechanism!")
print("   • outlines: Abstracts this into Generator(model, regex(...))")
print("   • llama-cpp: Abstracts this into grammar=LlamaGrammar(...)")
print("   • This example: Shows the raw logits processing")

print("\n✅ Now you understand how grammar constraints work under the hood!")
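
The masking loop described above can be reproduced on a toy scale without any model. The sketch below is a pure-Python illustration, not the code that `outlines`, `llama-cpp`, or `transformers-cfg` actually run: the vocabulary, the hand-written state machine for the language `digits ('+' digits)*`, and the made-up logits are all assumptions chosen to keep the example small.

```python
import math

# Hypothetical toy vocabulary; real tokenizers have tens of thousands of entries
VOCAB = ["3", "2", "12", "+", "=", " because ", "The", "<eos>"]
EOS = "<eos>"

def advance(state, token):
    """Run a tiny DFA for the language  digits ('+' digits)*  over `token`.

    States: "start" (nothing read yet), "num" (just read a digit, accepting),
    "op" (just read '+'). Returns the new state, or None if the token would
    take the text outside the language.
    """
    for ch in token:
        if ch in "0123456789":
            state = "num"
        elif ch == "+" and state == "num":
            state = "op"
        else:
            return None
    return state

def mask_logits(logits, state):
    """Keep logits of tokens the DFA accepts next; set all others to -inf."""
    masked = []
    for token, logit in zip(VOCAB, logits):
        if token == EOS:
            allowed = state == "num"   # stopping is valid only in an accepting state
        else:
            allowed = advance(state, token) is not None
        masked.append(logit if allowed else -math.inf)
    return masked

def constrained_greedy(step_logits, max_steps):
    """Greedy decoding over precomputed logits, applying the mask at each step."""
    state, pieces = "start", []
    for logits in step_logits[:max_steps]:
        masked = mask_logits(logits, state)
        best = max(range(len(VOCAB)), key=lambda i: masked[i])
        if VOCAB[best] == EOS:
            break
        pieces.append(VOCAB[best])
        state = advance(state, VOCAB[best])
    return "".join(pieces)

# Made-up logits standing in for a model that "wants" to write prose
STEP_LOGITS = [
    [1.0, 0.5, 0.2, 0.1, 0.0, 2.0, 3.0, 0.3],
    [0.1, 0.2, 0.1, 1.5, 0.3, 4.0, 0.2, 1.0],
    [0.2, 0.9, 0.1, 0.5, 0.1, 3.0, 0.1, 5.0],
    [0.0, 0.0, 0.0, 0.1, 0.0, 2.0, 0.0, 1.0],
]

print(constrained_greedy(STEP_LOGITS, max_steps=10))  # complete: 3+2
print(constrained_greedy(STEP_LOGITS, max_steps=2))   # cut off early: 3+
```

Two details carry over to the real libraries. The end-of-sequence token is masked like any other, so the model can stop only where the grammar allows it. And a token budget that runs out first leaves a valid prefix rather than a complete match, which is what happened to the trailing `=1` in the math example.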