# Notebook 11: Reflection Agent

Purpose:
- Evaluate the LLM’s generated response before showing it to the user
- Detect overconfidence, unsafe advice, or misalignment
- Enforce NEEL’s confidence and safety policies
- Act as a post-reasoning quality gate

The Reflection Agent does NOT generate advice.
It evaluates advice.

LLM outputs are treated as untrusted drafts.

Even after a response has passed through:
- Analytics
- ML signals
- Supervisor approval

it must still be reviewed before it reaches the user.

The Reflection Agent ensures:
- Tone matches confidence
- Suggestions are non-prescriptive
- Safety boundaries are respected
- The response aligns with the user's goals

The Reflection Agent can return one of three outcomes:

PASS    → Safe to show user
SOFTEN  → Needs more cautious wording
REJECT  → Must not be shown

These outcomes are enforced programmatically.
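
One way to make the three outcomes enforceable in code is to model them as an enumeration, so that a misspelled value such as "SOFTN" fails loudly instead of slipping through a string comparison. This is only a sketch: the `Decision` type and the `parse_decision` helper are hypothetical names, not part of this notebook's code.

```python
from enum import Enum


class Decision(str, Enum):
    """The three outcomes the Reflection Agent can return (hypothetical type)."""
    PASS = "PASS"      # safe to show the user
    SOFTEN = "SOFTEN"  # needs more cautious wording
    REJECT = "REJECT"  # must not be shown


def parse_decision(value: str) -> Decision:
    """Convert a raw string into a Decision; unknown values raise ValueError."""
    return Decision(value)


print(parse_decision("SOFTEN").value)
```

Because `Decision` also subclasses `str`, its members still compare equal to the plain strings used elsewhere in this notebook.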

Inputs:
- LLM generated response
- Supervisor confidence level
- User profile (goal, priority)

The Reflection Agent does NOT see raw data or ML outputs.
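
The input contract above could be made explicit with a small immutable container. The sketch below assumes a hypothetical `ReflectionInput` type; the point it illustrates is that raw data and ML outputs simply have no field to travel in. The empty-draft check is an added assumption, not a rule stated in this notebook.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReflectionInput:
    """Everything the Reflection Agent is allowed to see (hypothetical type)."""
    llm_text: str       # the LLM generated response
    confidence: str     # Supervisor confidence level, e.g. "LOW"
    user_goal: str      # from the user profile
    user_priority: str  # from the user profile

    def __post_init__(self) -> None:
        # Assumption: an empty draft is a caller error, not something to review.
        if not self.llm_text.strip():
            raise ValueError("llm_text must not be empty")


example = ReflectionInput(
    llm_text="Sustained long work hours can sometimes affect long-term energy.",
    confidence="LOW",
    user_goal="Become an ML Engineer",
    user_priority="Learning + Health",
)
```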

In [1]:
llm_output = """
OBSERVATION:
Your work hours are high and productivity is strong.

REASONING:
Sustained long work hours can sometimes affect long-term energy.

SUGGESTION:
You should increase your study hours further to accelerate progress.

CONFIDENCE NOTE:
This suggestion is based on limited data.
"""

In [2]:
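def reflection_agent(
    llm_text: str,
    confidence: str,
    user_goal: str,
    user_priority: str
) -> dict:
    """Review an LLM draft and return a decision plus the issues found.

    All checks are case-insensitive substring matches, so they are
    deterministic but coarse. `user_goal` is part of the interface,
    but none of the current rules inspect it.
    """
    issues = []

    text_lower = llm_text.lower()

    # Rule 1: Absolute or commanding language
    absolute_phrases = ["you should", "must", "definitely", "increase your"]
    if any(p in text_lower for p in absolute_phrases):
        issues.append("Uses commanding or prescriptive language")

    # Rule 2: Confidence mismatch
    # Overlaps with Rule 1: "you should" under LOW confidence raises both issues.
    if confidence == "LOW" and "should" in text_lower:
        issues.append("Overconfident language for LOW confidence")

    # Rule 3: Goal misalignment
    # Any mention of "increase" is flagged when the user's priority includes health.
    if "increase" in text_lower and "health" in user_priority.lower():
        issues.append("Suggestion may conflict with health priority")

    # Decide outcome: no issues -> PASS, one or two -> SOFTEN, three or more -> REJECT
    if not issues:
        return {
            "decision": "PASS",
            "issues": []
        }

    if len(issues) <= 2:
        return {
            "decision": "SOFTEN",
            "issues": issues
        }

    return {
        "decision": "REJECT",
        "issues": issues
    }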

In [3]:
reflection_result = reflection_agent(
    llm_text=llm_output,
    confidence="LOW",
    user_goal="Become an ML Engineer",
    user_priority="Learning + Health"
)

reflection_result

{'decision': 'REJECT',
 'issues': ['Uses commanding or prescriptive language',
  'Overconfident language for LOW confidence',
  'Suggestion may conflict with health priority']}
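
The rules in `reflection_agent` use plain substring checks, so "must" would also match inside a word such as "mustard". A possible refinement, offered here as an assumption rather than as this notebook's method, is to compile the phrases into one regular expression with word boundaries. `contains_absolute_phrase` is a hypothetical helper.

```python
import re

# Same phrases as Rule 1 in reflection_agent.
ABSOLUTE_PHRASES = ["you should", "must", "definitely", "increase your"]

# \b anchors each phrase at word boundaries; re.escape guards special characters.
_ABSOLUTE_PATTERN = re.compile(
    r"\b(?:" + "|".join(re.escape(p) for p in ABSOLUTE_PHRASES) + r")\b",
    re.IGNORECASE,
)


def contains_absolute_phrase(text: str) -> bool:
    """Return True if any phrase appears as whole words in the text."""
    return _ABSOLUTE_PATTERN.search(text) is not None
```

With this variant, "You must rest" is still flagged while "I like mustard" is not. It does not address paraphrases that avoid the listed phrases entirely.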

If decision == PASS:
- The response is shown to the user

If decision == SOFTEN:
- The LLM is asked to regenerate the response with more cautious wording

If decision == REJECT:
- The response is blocked
- The system may ask clarifying questions instead
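
The branching above can be sketched as a small dispatcher. This is an illustration under assumptions: `handle_reflection` and the `regenerate` callback are hypothetical names, and returning `None` stands in for "blocked" so the caller can decide whether to ask a clarifying question.

```python
from typing import Callable, List, Optional


def handle_reflection(
    result: dict,
    draft: str,
    regenerate: Callable[[str, List[str]], str],
) -> Optional[str]:
    """Map a reflection result onto the next action for the draft."""
    decision = result["decision"]

    if decision == "PASS":
        # Safe to show: hand the draft back unchanged.
        return draft

    if decision == "SOFTEN":
        # Ask the LLM (via the caller-supplied callback) for a more cautious draft.
        return regenerate(draft, result["issues"])

    if decision == "REJECT":
        # Blocked: nothing is returned for display.
        return None

    # Unknown decisions are treated as errors rather than silently shown.
    raise ValueError(f"Unknown decision: {decision!r}")
```

Since no LLM output should reach the user without reflection, a regenerated draft would presumably go through `reflection_agent` again before being displayed; this sketch leaves that loop to the caller.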

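Reflection is intentionally rule-based.

Reasons:
- Deterministic behavior: the same draft always yields the same decision
- No hallucination risk in the review step itself
- Auditable decisions: every issue traces back to a specific rule
- Predictable safety enforcement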
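
To illustrate the determinism and auditability points, rules can be stored as named data so that every evaluation leaves a record. This is a sketch, not the notebook's implementation: `Rule`, `RULES`, and `audit_trail` are hypothetical names, the first rule copies the Rule 1 phrases from `reflection_agent`, and the second rule is invented purely for illustration.

```python
from typing import Callable, Dict, List, NamedTuple


class Rule(NamedTuple):
    name: str                     # stable identifier for audit logs
    message: str                  # human-readable issue text
    check: Callable[[str], bool]  # pure function of the lowercased draft


RULES: List[Rule] = [
    Rule(
        name="commanding_language",
        message="Uses commanding or prescriptive language",
        check=lambda text: any(
            p in text for p in ("you should", "must", "definitely", "increase your")
        ),
    ),
    Rule(
        name="mentions_increase",  # illustrative rule, not from the notebook
        message="Suggests increasing something",
        check=lambda text: "increase" in text,
    ),
]


def audit_trail(draft: str) -> List[Dict[str, object]]:
    """Return one record per rule, whether or not it fired."""
    lowered = draft.lower()
    return [
        {"rule": rule.name, "fired": rule.check(lowered), "message": rule.message}
        for rule in RULES
    ]
```

Because each check is a pure function of the text, calling `audit_trail` twice on the same draft returns identical records, which is what makes the decisions reproducible after the fact.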

LLMs reason.
Rules enforce.

Final NEEL execution flow:

Analytics
   ↓
ML Signals
   ↓
Supervisor
   ↓
LLM Reasoning
   ↓
Reflection Agent
   ↓
User Response

No LLM output reaches the user without reflection.

The Reflection Agent transforms NEEL from:
"LLM with guardrails"
into
"A self-reviewing AI system"

This agent enforces humility, safety, and alignment.

It is one of NEEL’s strongest differentiators.