<a href="https://colab.research.google.com/github/CesarChalcoElias/prompting-techniques/blob/main/03_Constrained_and_Guided_Generation.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Constrained and Guided Generation

### **Overview**  
This tutorial explores constrained and guided generation with large language models. It focuses on techniques for setting constraints on model outputs and for implementing rule-based generation with OpenAI's GPT models and the LangChain library.

### **Motivation**  
While large language models are powerful tools for generating text, they sometimes produce outputs that are too open-ended or lack specific desired characteristics. Constrained and guided generation techniques give us more control over the model's outputs, so that they suit specific tasks and adhere to required rules and formats.

## Setup

In [None]:
!pip install langchain-openai langchain-core -q


In [None]:
import os
import random
from collections import Counter
from google.colab import userdata

from langchain_openai import ChatOpenAI
from langchain_core.prompts import PromptTemplate

In [None]:
os.environ["OPENAI_API_KEY"] = userdata.get('OPENAI_API_KEY')

In [None]:
llm = ChatOpenAI(model_name="gpt-5-nano")

In [None]:
# Function to display model outputs
def display_output(output):
    """Display the model's output in a formatted manner."""
    print("Model Output:")
    print("-" * 40)
    print(output)
    print("-" * 40)
    print()

## Setting Up Constraints for Model Outputs

In [None]:
constrained_prompt = PromptTemplate(
    input_variables=["product", "target_audience", "tone", "word_limit"],
    template="""Create a product description for {product} targeted at {target_audience}.
    Use a {tone} tone and keep it under {word_limit} words.
    The description should include:
    1. A catchy headline
    2. Three key features
    3. A call to action

    Product Description:
    """
)

In [None]:
input_variables = {
    "product": "smart water bottle",
    "target_audience": "health-conscious millennials",
    "tone": "casual and friendly",
    "word_limit": "75"
}

In [None]:
chain = constrained_prompt | llm
output = chain.invoke(input_variables).content
display_output(output)

Model Output:
----------------------------------------
Sip Smart. Live Well.

- Real-time hydration reminders and in-app tracking
- Temperature control with a durable, BPA-free design
- UV-C self-clean mode for effortless freshness

Grab yours and upgrade your hydration today.
----------------------------------------
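
The prompt above only asks the model to respect the word limit and the three-part structure; nothing checks that it did. A minimal sketch of such a check follows. The helper `check_description` and its heuristics (bullets start with `- `, `* `, or `• `; the headline is the first non-bullet line) are assumptions made for illustration, not part of LangChain.

```python
BULLET_MARKERS = ("- ", "* ", "• ")


def check_description(text: str, word_limit: int) -> list[str]:
    """Return the violated constraints; an empty list means all checks passed."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    if not lines:
        return ["output is empty"]
    bullets = [ln for ln in lines if ln.startswith(BULLET_MARKERS)]
    # Count words without the bullet markers themselves.
    word_count = sum(len(ln.lstrip("-*• ").split()) for ln in lines)

    violations = []
    if word_count >= word_limit:  # "under N words" means strictly fewer
        violations.append(f"{word_count} words, limit is under {word_limit}")
    if len(bullets) != 3:
        violations.append(f"expected 3 bulleted features, found {len(bullets)}")
    if lines[0].startswith(BULLET_MARKERS):
        violations.append("missing headline before the features")
    if lines[-1].startswith(BULLET_MARKERS):
        violations.append("missing call to action after the features")
    return violations


# The model output shown above, pasted in as a string.
sample = """Sip Smart. Live Well.

- Real-time hydration reminders and in-app tracking
- Temperature control with a durable, BPA-free design
- UV-C self-clean mode for effortless freshness

Grab yours and upgrade your hydration today."""

print(check_description(sample, word_limit=75))  # → []
```

A check like this only detects violations after the fact. It does not make the prompt any more binding on the model.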



## Implementing Rule-Based Generation

In [None]:
job_posting_prompt = PromptTemplate(
    input_variables=["job_title", "company", "location", "experience"],
    template="""Create a job posting for a {job_title} position at {company} in {location}.
    The candidate should have {experience} years of experience.
    Follow these rules:
    1. Start with a brief company description (2 sentences)
    2. List 5 key responsibilities, each starting with an action verb
    3. List 5 required qualifications, each in a single sentence
    4. End with a standardized equal opportunity statement

    Format the output as follows:
    COMPANY: [Company Description]

    RESPONSIBILITIES:
    - [Responsibility 1]
    - [Responsibility 2]
    - [Responsibility 3]
    - [Responsibility 4]
    - [Responsibility 5]

    QUALIFICATIONS:
    - [Qualification 1]
    - [Qualification 2]
    - [Qualification 3]
    - [Qualification 4]
    - [Qualification 5]

    EEO: [Equal Opportunity Statement]
    """
)

In [None]:
# Inputs for the rule-based job posting prompt
input_variables = {
    "job_title": "Staff AI Engineer",
    "company": "Deloitte Chile",
    "location": "Santiago, Chile",
    "experience": "5+"
}

In [None]:
chain = job_posting_prompt | llm
output = chain.invoke(input_variables).content
display_output(output)

Model Output:
----------------------------------------
COMPANY: Based in Santiago, Deloitte Chile is a leading professional services firm delivering audit, consulting, tax, and advisory services across Chile. We are seeking a Staff AI Engineer to join our team and accelerate AI-driven initiatives for clients.

RESPONSIBILITIES:
- Lead the end-to-end design, development, and deployment of AI/ML solutions for client engagements.
- Architect scalable AI systems and data pipelines on cloud platforms aligned with security and governance requirements.
- Collaborate with product, data, and business teams to translate requirements into robust ML models and deployment plans.
- Mentor and coach junior engineers, promoting best practices in MLops, model governance, and responsible AI.
- Ensure successful delivery by conducting code reviews, implementing automated tests, and upholding Deloitte's engineering standards.

QUALIFICATIONS:
- Have at least 5 years of hands-on experience in AI/ML enginee
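
Because the prompt fixes a section layout (`COMPANY:`, `RESPONSIBILITIES:`, `QUALIFICATIONS:`, `EEO:`), the rules can be checked mechanically. The sketch below is one possible way to do that; `parse_job_posting` and `check_job_posting` are hypothetical helpers, and the test string is a made-up, deliberately incomplete posting rather than real model output.

```python
import re

SECTION_NAMES = ("COMPANY", "RESPONSIBILITIES", "QUALIFICATIONS", "EEO")
SECTION_RE = re.compile(r"^(COMPANY|RESPONSIBILITIES|QUALIFICATIONS|EEO):\s*(.*)$")


def parse_job_posting(text: str) -> dict[str, list[str]]:
    """Split the posting into sections, each holding a list of text items."""
    sections: dict[str, list[str]] = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        match = SECTION_RE.match(line)
        if match:
            current = match.group(1)
            # Inline text after the header (e.g. "COMPANY: ...") is the first item.
            sections[current] = [match.group(2)] if match.group(2) else []
        elif current is not None:
            sections[current].append(line.lstrip("- ").strip())
    return sections


def check_job_posting(text: str) -> list[str]:
    """Return rule violations: missing sections or wrong bullet counts."""
    sections = parse_job_posting(text)
    problems = [f"missing section: {n}" for n in SECTION_NAMES if not sections.get(n)]
    for name in ("RESPONSIBILITIES", "QUALIFICATIONS"):
        items = sections.get(name, [])
        if items and len(items) != 5:
            problems.append(f"{name}: expected 5 items, found {len(items)}")
    return problems


# Hypothetical posting that stops partway through the qualifications.
incomplete = """COMPANY: Example Corp builds things. It is hiring.

RESPONSIBILITIES:
- Lead a
- Build b
- Review c
- Mentor d
- Ship e

QUALIFICATIONS:
- Have 5 years of experience."""

print(check_job_posting(incomplete))
```

For this input the check reports the missing `EEO` section and the short qualifications list, which is the kind of signal a retry or repair step could act on.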

## Using JSON Parser for Structured Output

In [None]:
from pydantic import BaseModel, Field
from langchain_core.output_parsers import JsonOutputParser

In [None]:
class JobPosting(BaseModel):
    company_description: str = Field(description="The company's description.")
    role: str = Field(description="The job role.")
    job_description: str = Field(description="The job description.")
    responsibilities: list[str] = Field(description="The role's key responsibilities.")
    qualifications: list[str] = Field(description="The required qualifications for the role.")
    eeo_statement: str = Field(description="The equal opportunity statement.")

In [None]:
# Parser defined for reference; the chains below use with_structured_output instead.
json_parser = JsonOutputParser(pydantic_object=JobPosting)

In [None]:
# Create a new prompt template that spells out the JSON fields and their rules
parsed_job_posting_prompt = PromptTemplate(
    input_variables=["job_title", "company", "location", "experience"],
    template="""Create a job posting in JSON format for a {job_title} position at {company} in {location}.
The role requires {experience} years of experience.

You MUST return a single valid JSON object and NOTHING else.
Do not include explanations, markdown, or extra text.

The JSON must follow these rules:

- "company_description": a brief description of the company in exactly 2 sentences.
- "role": the job title.
- "job_description": a concise paragraph describing the role and its purpose.
- "responsibilities": an array of exactly 5 items.
  - Each item must start with an action verb.
- "qualifications": an array of exactly 5 items.
  - Each item must be a single sentence.
- "eeo_statement": a standardized equal opportunity employment statement.
"""
)

In [None]:
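import re

def clean_output(output):
    """Normalize whitespace in every string value of the dict (modified in place)."""
    for key, value in output.items():
        if isinstance(value, str):
            # Remove leading/trailing whitespace and collapse each newline
            # plus any following whitespace into a single newline
            output[key] = re.sub(r'\n\s*', '\n', value.strip())
    return output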

In [None]:
# Generate the raw, unparsed output (plain text that should contain JSON)
chain = parsed_job_posting_prompt | llm
raw_output = chain.invoke(input_variables).content

In [None]:
import re

# Use with_structured_output for robust JSON parsing
structured_llm = llm.with_structured_output(JobPosting)

# Create the chain
chain = parsed_job_posting_prompt | structured_llm

# Invoke the chain; the output is now a JobPosting Pydantic object
parsed_output_pydantic = chain.invoke(input_variables)

# Convert the Pydantic object to a dictionary before cleaning,
# using .model_dump() for Pydantic V2 compatibility or .dict() for V1.
parsed_output_dict = parsed_output_pydantic.model_dump()

# Clean the output
cleaned_output = clean_output(parsed_output_dict)

In [None]:
# Display the parsed output
print("Parsed Output:")
for key, value in cleaned_output.items():
    print(f"{key.upper()}:")
    print(value)
    print()

Parsed Output:
COMPANY_DESCRIPTION:
Deloitte is a leading professional services firm providing audit, consulting, financial advisory, risk management, and tax services. In Chile, we serve clients across industries with a focus on innovative solutions and technology-enabled transformations.

ROLE:
Staff AI Engineer

JOB_DESCRIPTION:
As a Staff AI Engineer at Deloitte Chile, you will lead the design, development, and deployment of AI and data-driven solutions that address client challenges. You will collaborate with cross-functional teams to translate business needs into scalable AI architectures, ensure compliance with local regulations, and mentor junior engineers.

RESPONSIBILITIES:
['Design and implement AI models and pipelines that deliver measurable business impact.', 'Architect scalable machine learning solutions, including data ingestion, feature engineering, and model deployment.', 'Collaborate with data engineers, product managers, and client teams to translate business require
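
The `JobPosting` schema checks field names and types, but the prompt's finer rules (exactly 2 sentences, exactly 5 items) are not expressed in it. A minimal sketch of a post-generation check for those rules follows, written against a plain dict such as the one returned by `.model_dump()`. `validate_posting` and `count_sentences` are hypothetical helpers, the sentence splitter is deliberately naive, and the `posting` dict is made-up sample data.

```python
import re


def count_sentences(text: str) -> int:
    """Naive count: split after '.', '!' or '?' followed by whitespace."""
    return len([s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s])


def validate_posting(posting: dict) -> list[str]:
    """Check the prompt rules that the type annotations alone cannot express."""
    errors = []
    n_sentences = count_sentences(posting.get("company_description", ""))
    if n_sentences != 2:
        errors.append(f"company_description: expected 2 sentences, found {n_sentences}")
    for key in ("responsibilities", "qualifications"):
        items = posting.get(key)
        if not isinstance(items, list) or len(items) != 5:
            errors.append(f"{key}: expected a list of exactly 5 items")
    return errors


# Made-up data shaped like the dict produced by .model_dump() above.
posting = {
    "company_description": "We are a services firm. We work across industries.",
    "responsibilities": ["Design a", "Build b", "Lead c", "Review d", "Mentor e"],
    "qualifications": ["Has q1.", "Has q2.", "Has q3."],
}

print(validate_posting(posting))
# → ['qualifications: expected a list of exactly 5 items']
```

Abbreviations such as "e.g." would confuse the sentence counter, so a production check would need a more careful splitter or a looser rule.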

## Conclusions

### Trade-offs

#### Guided generation
- Usually produces well-formed output with minimal engineering effort, but does **not guarantee** structural correctness.
- Often cheaper and faster per call than constrained approaches, since decoding runs without token-level restrictions; any retries add to that cost.
- Still prone to format violations, hallucinated fields, or partial schema compliance.
- Requires retries, repair logic, or fallback handling in production systems.
- Behavior depends heavily on prompt quality and model alignment.
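
The retry and repair logic mentioned in this list can be sketched as a small loop: generate, validate, and on failure feed the error back into the next attempt. This is an illustrative sketch under stated assumptions, not a LangChain API. `generate_with_retries` is a hypothetical helper, `generate` stands in for any callable that sends a prompt to a model and returns text, and `fake_model` replaces a real model call so the example runs offline.

```python
import json
from typing import Callable


def generate_with_retries(
    generate: Callable[[str], str],
    prompt: str,
    required_keys: set[str],
    max_attempts: int = 3,
) -> dict:
    """Call `generate` until it returns a JSON object containing all required keys."""
    feedback = ""
    for _ in range(max_attempts):
        raw = generate(prompt + feedback)
        try:
            data = json.loads(raw)
        except json.JSONDecodeError as exc:
            feedback = f"\n\nYour previous reply was not valid JSON ({exc.msg}). Return only JSON."
            continue
        if not isinstance(data, dict):
            feedback = "\n\nYour previous reply was not a JSON object."
            continue
        missing = required_keys - set(data)
        if not missing:
            return data
        feedback = f"\n\nYour previous reply was missing keys: {sorted(missing)}."
    raise ValueError(f"no valid output after {max_attempts} attempts")


# Stand-in for a model: fails twice, then returns a valid object.
replies = iter(["Sure! Here is the JSON:", '{"role": "x"}', '{"role": "x", "eeo_statement": "y"}'])
prompts_seen = []


def fake_model(prompt: str) -> str:
    prompts_seen.append(prompt)
    return next(replies)


result = generate_with_retries(fake_model, "Create a job posting.", {"role", "eeo_statement"})
print(result)             # → {'role': 'x', 'eeo_statement': 'y'}
print(len(prompts_seen))  # → 3
```

Each retry is another model call, so the cost advantage of guided generation shrinks as the failure rate rises.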

#### Constrained generation
- Guarantees structural correctness by preventing invalid tokens at decoding time.
- More reliable for automation and system-to-system communication.
- Can reduce model expressiveness and sometimes degrade semantic quality.
- More complex to implement and often backend- or model-dependent.
- May increase latency due to constrained decoding logic.

---

### When to apply it

#### Guided generation
- When schemas are flexible or evolving.
- When human review is involved downstream.
- For reasoning-heavy, analytical, or explanatory outputs.
- When occasional retries or fixes are acceptable.
- During prototyping, experimentation, or rapid iteration.

#### Constrained generation
- When outputs feed directly into automated systems.
- For tool invocation, workflow orchestration, or database writes.
- When strict contracts are required (e.g., enums, IDs, commands).
- At scale, where retries are costly or unacceptable.
- When failure modes must be eliminated, not handled.

---

### When NOT to apply it

#### Guided generation
- When invalid structure can cause system failures.
- For safety-critical or compliance-sensitive outputs.
- When strict determinism is required.

#### Constrained generation
- For creative or open-ended tasks with no single ground truth.
- When the schema is large, deeply nested, or frequently changing.
- For tasks where guided generation already performs well.
- When the added engineering complexity outweighs reliability gains.

---

### Clarifying terms

- **Guided generation** means the model is *encouraged* via prompts or examples to follow a structure, but is still free to generate any token.
- **Constrained generation** means the model is *restricted at decoding time* so invalid tokens cannot be generated at all.
- **Parsers and validators do not constrain generation**; they only detect or repair errors after generation.
- **Structured output APIs are not always truly constrained**; some rely on strong guidance plus retries under the hood, so check what the provider actually guarantees.
- The key distinction is **where control happens**:
  - Prompt level → guided
  - Post-generation parsing and checking → validation
  - Decoder token selection → constrained
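
To make "restricted at decoding time" concrete, here is a toy sketch of a single decoding step. It assumes the model's scores are available as a dict from token string to logit; real systems work on token IDs and derive the allowed set from a grammar or schema at every step. `constrained_argmax` and the example logits are invented for illustration.

```python
import math


def constrained_argmax(logits: dict[str, float], allowed: set[str]) -> str:
    """Pick the highest-scoring token among those the constraint allows."""
    # Mask disallowed tokens so they can never be selected.
    masked = {tok: (score if tok in allowed else -math.inf) for tok, score in logits.items()}
    best = max(masked, key=masked.get)
    if masked[best] == -math.inf:
        raise ValueError("no allowed token in the vocabulary")
    return best


# Invented scores for one decoding step.
logits = {"maybe": 2.1, "yes": 1.4, "no": 0.3}

unconstrained = max(logits, key=logits.get)
constrained = constrained_argmax(logits, allowed={"yes", "no"})

print(unconstrained)  # → maybe
print(constrained)    # → yes
```

The unconstrained step picks `maybe`, which a validator could only reject afterwards. The constrained step cannot produce it at all, which is the distinction drawn in the list above.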
