# Lesson 3: The Verifier (System 2 Thinking)

## 🕵️ Fixing the "Confidently Wrong" Model
In Lesson 2, `mistral:7b` confidently said 91 is prime. It failed because it was answering based on **intuition** (System 1).

To fix this, we need **System 2** (Deliberative Reasoning). We need to force the model to "Show Its Work".

**The Strategy: "Best-of-N" with Verification**
1.  **Generator**: We ask the model not just for an answer, but for a **step-by-step proof**.
2.  **Verifier**: We use a second LLM call (or the same LLM) to act as a "Judge" or "Teacher". It looks at the proof and checks for errors.
3.  **Selector**: Instead of picking the most *common* answer (Voting), we pick the *highest scored* answer.
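Here is a minimal sketch of how the Selector differs from plain voting. It uses hypothetical `(answer, verifier score)` pairs; the sample values and the helper names `majority_vote` / `select_best` are illustrative assumptions, not part of the lesson's code.

```python
from collections import Counter

# Hypothetical samples: (final answer, verifier score).
# Made-up values chosen to show the two rules disagreeing, not real model output.
samples = [
    ("prime", 0.0),
    ("prime", 0.0),
    ("prime", 0.0),
    ("not prime", 1.0),
    ("not prime", 1.0),
]

def majority_vote(samples):
    """Voting: return the most common final answer, ignoring the scores."""
    counts = Counter(answer for answer, _ in samples)
    return counts.most_common(1)[0][0]

def select_best(samples):
    """Best-of-N: return the answer with the highest verifier score."""
    answer, _score = max(samples, key=lambda pair: pair[1])
    return answer

print("Majority vote:", majority_vote(samples))  # prime
print("Best-of-N:    ", select_best(samples))    # not prime
```

When the model is wrong *most* of the time, voting amplifies the majority mistake, while score-based selection can still surface the minority chain that the verifier accepts.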

In [None]:
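from rich.console import Console

try:
    import ollama
except ImportError as exc:
    # Fail fast: every cell below calls ollama.chat()
    raise ImportError("The `ollama` package is required: pip install ollama") from exc

console = Console()
MODEL_NAME = "mistral:7b"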

### 📝 Step 1: The Generator (Thinker)
We change our prompt. We don't say "Answer Yes/No".
We say: **"Check for factors. Show your work."**

This prompts the model to write out a **Chain of Thought (CoT)**. Spelling out each intermediate check (dividing 91 by 2, 3, 5, 7, ...) often lets the model catch the error it would make when answering directly.
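For reference, this is the kind of "work" the prompt is asking for, computed deterministically instead of by the model: a trial-division trace. `trial_division_trace` is a hypothetical helper of our own, useful as ground truth to compare model output against; the model never runs it.

```python
import math

def trial_division_trace(n: int) -> tuple[bool, list[str]]:
    """Return (is_prime, steps) for n >= 2, recording every divisibility check."""
    steps = []
    # Only divisors up to sqrt(n) matter: if n = a * b with a <= b, then a <= sqrt(n).
    for d in range(2, math.isqrt(n) + 1):
        q, r = divmod(n, d)
        steps.append(f"{n} / {d} = {q} remainder {r}")
        if r == 0:
            steps.append(f"{n} = {d} * {q}, so {n} is not prime.")
            return False, steps
    steps.append(f"No divisor found up to sqrt({n}), so {n} is prime.")
    return True, steps

is_prime, steps = trial_division_trace(91)
for step in steps:
    print(step)
```

For 91 the trace stops at the sixth check (`91 / 7 = 13 remainder 0`), which is exactly the step a correct CoT from the model needs to contain.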

In [None]:
def generate_thought(prompt: str, temp: float = 0.7) -> str:
    """Generates a step-by-step reasoning trace."""
    response = ollama.chat(
        model=MODEL_NAME,
        messages=[
            {"role": "system", "content": "You are a math expert. Think step-by-step to check for factors. Show your work."},
            {"role": "user", "content": prompt}
        ],
        options={"temperature": temp}
    )
    return response['message']['content'].strip()

### ⚖️ Step 2: The Verifier (LLM-as-a-Judge)
This is the crucial new component. 

We take the `solution` generated above, feed it back into the model, and ask: **"Is this logic correct?"**

This relies on the idea that **evaluation is often easier than generation**: checking a math proof is usually easier than writing one from scratch.
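A toy illustration of this asymmetry, using factoring instead of an LLM (both helper names are hypothetical): checking a claimed factorization costs one multiplication, while finding a factor by trial division costs a number of steps that grows with the smallest factor. The analogy is loose; an LLM judge is not automatically more reliable than the generator.

```python
from typing import Optional, Tuple

def verify_factorization(n: int, a: int, b: int) -> bool:
    """Check a claimed factorization n = a * b: one multiplication and a comparison."""
    return 1 < a < n and a * b == n

def find_factor(n: int) -> Tuple[Optional[int], int]:
    """Search for the smallest factor by trial division.

    Returns (factor or None, number of divisions tried).
    """
    tried = 0
    d = 2
    while d * d <= n:
        tried += 1
        if n % d == 0:
            return d, tried
        d += 1
    return None, tried

print(verify_factorization(91, 7, 13))  # True, after a single multiplication
print(find_factor(91))                  # (7, 6): six trial divisions
print(find_factor(10403))               # (101, 100): 10403 = 101 * 103
```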

In [None]:
import re

def verify_solution(problem: str, solution: str) -> float:
    """
    Asks the model to critique the reasoning.
    Returns 1.0 if the verifier accepts the solution, 0.0 otherwise.
    """
    verifier_prompt = f"""
    Problem: {problem}
    Proposed Solution: {solution}

    Task: Check the math calculations in the solution.
    If you find ANY calculation error (e.g. claiming 7*13 != 91), score it 0.
    If the reasoning is sound and the conclusion is correct, score it 1.
    Reply with ONLY the score (0 or 1).
    """

    response = ollama.chat(
        model=MODEL_NAME,  # Same model acts as its own judge (self-verification)
        messages=[{"role": "user", "content": verifier_prompt}],
        options={"temperature": 0.0}  # Deterministic for judging
    )

    content = response['message']['content'].strip()
    # Take the first standalone 0 or 1 in the reply. A plain `"1" in content`
    # check would also fire on digits elsewhere, e.g. "0, error in 7*13".
    match = re.search(r"\b[01]\b", content)
    return 1.0 if match and match.group() == "1" else 0.0
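
The LLM judge can itself make arithmetic mistakes. For claims as simple as these, one alternative (a rule-based verifier, swapped in here for illustration and not part of this lesson's pipeline) is to re-check every `a * b = c` or `a + b = c` statement in the chain with Python. `check_arithmetic_claims` is a hypothetical helper that keeps the same 0.0/1.0 contract as `verify_solution`.

```python
import re

# Matches simple integer claims such as "7 * 13 = 91", "7 x 13 = 91" or "70 + 21 = 91".
CLAIM_PATTERN = re.compile(r"(\d+)\s*([*×x+])\s*(\d+)\s*=\s*(\d+)")

def check_arithmetic_claims(text: str) -> float:
    """Return 1.0 if every parsed arithmetic claim in `text` is correct, else 0.0."""
    for a, op, b, c in CLAIM_PATTERN.findall(text):
        a, b, c = int(a), int(b), int(c)
        result = a + b if op == "+" else a * b
        if result != c:
            return 0.0
    return 1.0

print(check_arithmetic_claims("7 * 10 = 70 and 7 * 3 = 21, so 7 * 13 = 91."))  # 1.0
print(check_arithmetic_claims("91 is prime because 7 * 13 = 93."))             # 0.0
```

Because it only sees claims it can parse, a chain with no arithmetic at all passes by default, so a check like this would complement the LLM judge rather than replace it.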

### 🔎 Step 3: Best-of-N Search Loop
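Best-of-N with a verifier is one of the simplest ways to spend extra compute at inference time. It is a basic instance of the *test-time compute* idea behind reasoning models such as **OpenAI o1**, whose exact inference procedure has not been published.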

1.  **Generate $N$** distinct thought processes.
2.  **Score** each one.
3.  **Select**: Throw away the chains the verifier rejects and keep the highest-scoring one.

Even if each sample is wrong 80% of the time, we only need **one** correct chain to succeed, as long as the verifier recognizes it.
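Under idealized assumptions (samples are independent, and the verifier accepts a correct chain whenever one appears), the chance that at least one of $N$ samples is correct is $1 - (1 - p)^N$, where $p$ is the per-sample accuracy. The helper below is a hypothetical illustration of that formula.

```python
def success_probability(p_correct: float, n: int) -> float:
    """P(at least one of n independent samples is correct) = 1 - (1 - p)^n."""
    return 1.0 - (1.0 - p_correct) ** n

for n in (1, 5, 10, 20):
    print(f"N={n:2d}: {success_probability(0.2, n):.3f}")
```

With $p = 0.2$ and this lesson's $N = 5$, the estimate is $1 - 0.8^5 \approx 0.67$. Samples from the same model and prompt are correlated in practice, so real gains are typically smaller than this formula suggests.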

In [None]:
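from rich.markup import escape

PROBLEM = "Is 91 a prime number?"
N_SAMPLES = 5

console.print(f"\n[bold yellow]Running Best-of-N Search (N={N_SAMPLES})...[/bold yellow]")
console.print(f"Problem: {PROBLEM}")

best_score = -1.0
best_solution = ""

for i in range(N_SAMPLES):
    # 1. Generate a reasoning chain (higher temperature -> more diverse samples)
    solution = generate_thought(PROBLEM, temp=0.8)

    # 2. Verify it with the LLM judge
    score = verify_solution(PROBLEM, solution)

    # Logging: build the preview outside the f-string (a backslash inside an
    # f-string expression is a SyntaxError before Python 3.12) and escape it
    # so square brackets in the model output are not parsed as Rich markup.
    color = "green" if score > 0.5 else "red"
    preview = escape(solution[:150].replace("\n", " "))
    console.print(f"\n[bold]Sample {i+1} (Score: {score}):[/bold]")
    console.print(f"[{color}]{preview}...[/{color}]")

    # 3. Selection: keep the highest-scoring chain seen so far
    if score > best_score:
        best_score = score
        best_solution = solution
        if score == 1.0:
            console.print("[bold green]Found a verified solution! Stopping early.[/bold green]")
            break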

### 🏆 Step 4: Final Result
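Did we find the truth?
Typically, across these 5 samples, `mistral:7b` produces at least one chain where it computes `7 * 10 = 70`, `7 * 3 = 21`, `70 + 21 = 91`, and concludes that 91 = 7 × 13, so 91 is not prime. Keep in mind that a score of 1.0 only means the verifier *accepted* the chain; the judge can still be wrong.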

In [None]:
from rich.markup import escape

console.print(f"\n[bold cyan]🏆 Best Verification Score:[/bold cyan] {best_score}")

if best_score == 1.0:
    console.print("[bold green]TTRL SUCCESS:[/bold green] Found a valid reasoning path!")
    # Escape the model output so brackets in it are not treated as Rich markup
    console.print(f"Best Thought:\n{escape(best_solution)}")
else:
    console.print("[bold red]FAILURE:[/bold red] Could not find a verified solution.")