# Lesson 2: When Majority Voting Fails

## 📉 The Limits of Naive Consensus
In Lesson 1, we saw that voting works well for logic puzzles where the model is *conflicted*.

However, what happens if the model is **confidently wrong**?

If a model holds a fundamental misconception (e.g., believing "91 is prime"), it will give the same wrong answer in nearly every sample, even at high temperature. In that case, $5 \times \text{Wrong} = \text{Consensus Wrong}$: voting only amplifies the error.
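
To make this concrete, consider a simplified setting in which each sample is independently correct with probability `p`. This is an assumption for illustration only, since real samples from a single model are correlated. The sketch below computes the chance that a strict majority of `n` samples is correct; `majority_correct_prob` is a hypothetical helper, not part of the lesson's code.

```python
from math import comb

def majority_correct_prob(p: float, n: int = 5) -> float:
    """Probability that more than half of n independent samples are correct."""
    need = n // 2 + 1  # smallest strict majority
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(need, n + 1))

# Compare a mostly-right model, a coin flip, and confidently wrong models.
for p in (0.8, 0.5, 0.2, 0.0):
    print(f"p={p:.1f} -> majority correct with prob {majority_correct_prob(p):.3f}")
```

Under these assumptions, voting helps when `p` is above 0.5 and hurts when it is below: at `p = 0.2` the majority is right less often than a single sample, and at `p = 0` it is never right.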

**In this lesson:**
1.  We try a harder math problem: is $91$ prime? ($91 = 7 \times 13$ is a semiprime that is often mistaken for a prime.)
2.  We watch TTRL with naive voting fail.
3.  This motivates the need for **Verifiers** (Lesson 3).

In [None]:
import time
from collections import Counter
from rich.console import Console
try:
    import ollama
except ImportError as exc:
    # Fail fast: every later cell needs the Ollama client.
    raise ImportError("Please run: pip install ollama") from exc

console = Console()

# --- CONFIGURATION ---
MODEL_NAME = "mistral:7b"
console.print(f"[bold]Connecting to Local Model:[/bold] {MODEL_NAME}")

### 🔍 Step 1: The Zero-Shot Attempt (Baseline)
First, we ask the model normally. 
We use `temperature=0` (Greedy Decoding) to see its "best guess".

In [None]:
def get_completion(prompt: str, temp: float = 0.7) -> str:
    """Gets a single completion from Ollama."""
    try:
        response = ollama.chat(
            model=MODEL_NAME,
            messages=[
                {"role": "system", "content": "You are a math assistant. Answer with just the result. Is the number Prime? Yes/No."},
                {"role": "user", "content": prompt}
            ],
            options={"temperature": temp}
        )
        return response['message']['content'].strip()
    except Exception as e:
        return f"Error: {e}"

PROBLEM = "Is 91 a prime number? Answer Yes or No."
console.print(f"\n[bold cyan]Problem:[/bold cyan] {PROBLEM}")

# Single Attempt (Greedy decoding)
start_t = time.time()
greedy_ans = get_completion(PROBLEM, temp=0.0)
console.print(f"[bold]Greedy Answer (Temp=0):[/bold] {greedy_ans} (took {time.time()-start_t:.2f}s)")

### 🎲 Step 2: The TTRL Loop
Now we try to fix it by sampling 5 times.
We raise `temperature` to 0.9 so that the samples can diverge instead of repeating the greedy answer.
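
Why might a higher temperature fail to help? Temperature rescales the model's logits before sampling. The sketch below uses made-up logits for the two answers (the values 4.0 and 0.0 are assumptions, not measurements from `mistral:7b`) to show that when the gap is large, a temperature of 0.9 moves very little probability to the alternative. `softmax_with_temperature` is a hypothetical helper.

```python
import math

def softmax_with_temperature(logits: list[float], temperature: float) -> list[float]:
    """Convert logits to probabilities after dividing by a positive temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive; 0 corresponds to greedy decoding")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical logits: the model strongly prefers the wrong answer "Yes".
hypothetical_logits = [4.0, 0.0]  # order: ["Yes", "No"]
for t in (0.5, 0.9, 2.0):
    p_yes, p_no = softmax_with_temperature(hypothetical_logits, t)
    print(f"T={t}: P(Yes)={p_yes:.3f}, P(No)={p_no:.3f}")
```

This is a single-token simplification of what a chat model does, but it suggests why five samples at 0.9 can still all agree on the wrong answer.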

In [None]:
N_SAMPLES = 5
console.print(f"\n[bold yellow]Running TTRL Loop (N={N_SAMPLES})...[/bold yellow]")

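import re

samples = []
for i in range(N_SAMPLES):
    # High temp for diversity
    ans = get_completion(PROBLEM, temp=0.9)

    # Normalize chatty replies to Yes/No using whole-word matches, so that
    # words like "not" or "know" are not counted as a "No" vote.
    match = re.search(r"\b(yes|no)\b", ans.lower())
    clean_ans = match.group(1).capitalize() if match else "Unsure"

    samples.append(clean_ans)
    # Build the preview outside the f-string: backslashes inside f-string
    # expressions are a SyntaxError before Python 3.12.
    preview = ans[:50].replace("\n", " ")
    console.print(f"  Sample {i+1}: {preview}... -> [blue]{clean_ans}[/blue]")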

### 🗳️ Step 3: Analysis of Failure
If the model answers "Yes" (Incorrect) 5 times out of 5, Majority Voting confirms the **Wrong** answer.

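This is why majority voting alone is not enough. Reasoning models such as **OpenAI o1** and **DeepSeek-R1** work through an explicit chain of thought before answering, which opens the door to **Chain of Thought Verification**: checking the intermediate steps rather than counting final answers.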

They don't just ask "Is it prime?" (System 1).
They ask "Check the factors of 91. 91/7=? 91/3=?" (System 2).
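
The same System 2 check can also be run outside the model. As a minimal sketch (not the verifier from Lesson 3, which is not shown here), trial division up to $\sqrt{n}$ settles the question deterministically; `smallest_factor` is a hypothetical helper name.

```python
from math import isqrt
from typing import Optional

def smallest_factor(n: int) -> Optional[int]:
    """Return the smallest divisor d with 1 < d < n, or None if there is none."""
    for d in range(2, isqrt(n) + 1):
        if n % d == 0:
            return d
    return None

n = 91
factor = smallest_factor(n)
if n > 1 and factor is None:
    print(f"{n} is prime")
elif factor is not None:
    print(f"{n} is not prime: {n} = {factor} * {n // factor}")
```

A check like this cannot be outvoted: however many samples say "Yes", the factor 7 is enough to reject the claim.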

In [None]:
counts = Counter(samples)
consensus_ans, votes = counts.most_common(1)[0]

console.print(f"\n[bold]Final Consensus:[/bold] [green]{consensus_ans}[/green] ({votes}/{N_SAMPLES} votes)")

GROUND_TRUTH = "No" # 91 is NOT prime (7 * 13)

if consensus_ans == GROUND_TRUTH:
    console.print("✅ [bold green]CORRECT[/bold green] - TTRL fixed the error!")
else:
    console.print("❌ [bold red]FAIL[/bold red] - TTRL Failed. The model is confidently wrong.")
    console.print("👉 This motivates Lesson 3: The Verifier.")