# Deep Reasoning with Large Language Models

This notebook demonstrates **deep reasoning** capabilities using Large Language Models (LLMs). Deep reasoning involves breaking down complex problems into multiple steps, using intermediate reasoning to arrive at solutions, and employing various techniques to enhance the model's analytical and problem-solving abilities.

## Table of Contents
1. [Introduction to Deep Reasoning](#introduction)
2. [Environment Setup](#setup)
3. [Chain-of-Thought Reasoning](#cot)
4. [Tree of Thoughts](#tot)
5. [Self-Consistency](#self-consistency)
6. [Multi-Step Problem Solving](#multi-step)
7. [Reasoning with Verification](#verification)
8. [Summary and Best Practices](#summary)

## 1. Introduction to Deep Reasoning <a id="introduction"></a>

### What is Deep Reasoning?

**Deep reasoning** refers to the ability of LLMs to:
- Break down complex problems into manageable steps
- Explore multiple solution paths
- Verify and validate intermediate results
- Self-reflect on reasoning quality
- Adapt strategies when initial approaches fail

### Key Techniques

1. **Chain-of-Thought (CoT)**: Step-by-step reasoning
2. **Tree of Thoughts (ToT)**: Exploring multiple reasoning paths
3. **Self-Consistency**: Generating multiple solutions and selecting the most common answer
4. **Least-to-Most Prompting**: Breaking problems into sub-problems
5. **Self-Verification**: Having the model check its own work

### Benefits of Deep Reasoning

- **Improved Accuracy**: Especially for complex problems
- **Transparency**: Clear reasoning traces for debugging
- **Error Detection**: Self-verification catches mistakes
- **Better Generalization**: Works across diverse problem types
- **Trust**: Users can follow the model's reasoning

## 2. Environment Setup <a id="setup"></a>

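Let's set up our connection to Azure AI Foundry. The cell below reads `PROJECT_ENDPOINT` and `MODEL` from the environment (or a `.env` file) and authenticates with `DefaultAzureCredential`.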

In [None]:
from azure.ai.projects import AIProjectClient
from azure.identity import DefaultAzureCredential
from dotenv import load_dotenv
import os
import json
from typing import List, Dict, Any
from collections import Counter

# Load environment variables
load_dotenv()

# Initialize the AI Project client
project_client = AIProjectClient(
    endpoint=os.environ["PROJECT_ENDPOINT"],
    credential=DefaultAzureCredential()
)

# Get the OpenAI client
chat = project_client.get_openai_client()
model = os.environ["MODEL"]

print("Connected to Azure AI Foundry")
print(f"Using model: {model}")

In [None]:
# Helper functions

def get_completion(messages: List[Dict[str, str]], temperature: float = 0.7, max_tokens: int = 500) -> str:
    """Get completion from the model"""
    response = chat.chat.completions.create(
        model=model,
        messages=messages,
        temperature=temperature,
        max_tokens=max_tokens
    )
    return response.choices[0].message.content

def print_section(title: str):
    """Print formatted section header"""
    print("\n" + "=" * 80)
    print(title)
    print("=" * 80)

print("Helper functions loaded successfully!")

## 3. Chain-of-Thought Reasoning <a id="cot"></a>

Chain-of-Thought prompting encourages the model to show its reasoning steps.

### 3.1 Basic Chain-of-Thought

In [None]:
# Without Chain-of-Thought
problem = """A farmer has 17 sheep. All but 9 die. How many sheep are left?"""

print_section("Without Chain-of-Thought")
direct_answer = get_completion(
    [{"role": "user", "content": f"{problem}\n\nProvide only the final answer."}],
    temperature=0.3
)
print(direct_answer)

# With Chain-of-Thought
print_section("With Chain-of-Thought")
cot_answer = get_completion(
    [{"role": "user", "content": f"{problem}\n\nLet's solve this step by step:"}],
    temperature=0.3,
    max_tokens=300
)
print(cot_answer)

### 3.2 Few-Shot Chain-of-Thought

Providing examples of step-by-step reasoning improves consistency.

In [None]:
# Few-shot CoT with examples
few_shot_cot = [
    {"role": "system", "content": "You are a helpful assistant that solves problems step by step."},
    {"role": "user", "content": "A store has 48 apples. They sell 3/4 of them. How many apples are left?"},
    {"role": "assistant", "content": """Let me solve this step by step:

Step 1: Calculate how many apples were sold
- They sold 3/4 of 48 apples
- 3/4 × 48 = (3 × 48) / 4 = 144 / 4 = 36 apples sold

Step 2: Calculate remaining apples
- Started with: 48 apples
- Sold: 36 apples
- Remaining: 48 - 36 = 12 apples

Answer: 12 apples are left."""},
    {"role": "user", "content": "If a car travels at 60 mph for 2.5 hours, then at 45 mph for 1.5 hours, what is the total distance traveled?"}
]

print_section("Few-Shot Chain-of-Thought Reasoning")
response = get_completion(few_shot_cot, temperature=0.3, max_tokens=400)
print(response)

### 3.3 Complex Problem Solving with CoT

Let's tackle a more complex problem.

In [None]:
complex_problem = """A company has three departments: Engineering (60 people), Sales (40 people), and Marketing (30 people).
- Engineering is getting a 15% increase in headcount
- Sales is getting a 25% increase
- Marketing is decreasing by 10%
- New hires cost $5,000 each to recruit
- Departing employees cost $2,000 each in offboarding

What is the total cost for all these changes, and what is the final total headcount?"""

print_section("Complex Problem with Detailed CoT")
complex_cot_prompt = f"{complex_problem}\n\nLet's solve this systematically, showing all calculations:"

response = get_completion(
    [{"role": "user", "content": complex_cot_prompt}],
    temperature=0.2,
    max_tokens=600
)
print(response)
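
To judge the model's chain of thought, it helps to have a reference answer computed outside the model. The sketch below works the headcount problem directly in Python. The function name `headcount_changes` and the constants are illustrative and not part of the notebook's helpers; it also assumes each percentage change yields a whole number of people, which holds for these inputs.

```python
# Reference calculation for the headcount problem above (not model output).
departments = {
    "Engineering": (60, 0.15),   # (current size, fractional change)
    "Sales": (40, 0.25),
    "Marketing": (30, -0.10),
}
HIRE_COST = 5_000       # recruiting cost per new hire
OFFBOARD_COST = 2_000   # offboarding cost per departing employee


def headcount_changes(depts):
    """Return (total_cost, final_headcount) for a dict of name -> (size, change)."""
    total_cost = 0
    final_headcount = 0
    for size, change in depts.values():
        delta = round(size * change)  # whole people; exact for these inputs
        if delta > 0:
            total_cost += delta * HIRE_COST
        else:
            total_cost += -delta * OFFBOARD_COST
        final_headcount += size + delta
    return total_cost, final_headcount


expected_cost, expected_headcount = headcount_changes(departments)
print(f"Expected total cost: ${expected_cost:,}")
print(f"Expected final headcount: {expected_headcount}")
```

Comparing these two values against the model's final line is a quick way to see whether the step-by-step prompt actually produced the right result.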

## 4. Tree of Thoughts <a id="tot"></a>

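Tree of Thoughts explores multiple reasoning paths before selecting the best one. The simplified version in this section samples complete solutions independently and scores each one; the full method described in the paper also evaluates and prunes intermediate reasoning steps.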

### 4.1 Generating Multiple Reasoning Paths

In [None]:
def tree_of_thoughts(problem: str, num_paths: int = 3) -> Dict[str, Any]:
    """
    Generate multiple reasoning paths and evaluate them
    """
    paths = []
    
    # Generate multiple reasoning paths
    print("Generating reasoning paths...\n")
    for i in range(num_paths):
        prompt = f"""Problem: {problem}

Provide a step-by-step solution. Think carefully and show your reasoning."""
        
        response = get_completion(
            [{"role": "user", "content": prompt}],
            temperature=0.7,  # Higher temperature for diversity
            max_tokens=400
        )
        
        paths.append({"path_id": i+1, "reasoning": response})
        print(f"Path {i+1}:")
        print("-" * 80)
        print(response[:200] + "..." if len(response) > 200 else response)
        print()
    
    # Evaluate paths
    print("\nEvaluating paths...\n")
    evaluations = []
    
    for path in paths:
        eval_prompt = f"""Evaluate this solution on a scale of 1-10 based on:
- Logical correctness
- Completeness
- Clarity of reasoning

Solution:
{path['reasoning']}

Provide ONLY a number from 1-10."""
        
        score_response = get_completion(
            [{"role": "user", "content": eval_prompt}],
            temperature=0.2,
            max_tokens=10
        )
        
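        try:
            score = float(score_response.strip())
        except (ValueError, AttributeError):
            # Fall back to a neutral score when the reply is empty or not a bare number
            score = 5.0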
        
        evaluations.append({"path_id": path['path_id'], "score": score})
        print(f"Path {path['path_id']} score: {score}/10")
    
    # Select best path
    best_path = max(evaluations, key=lambda x: x['score'])
    best_reasoning = next(p['reasoning'] for p in paths if p['path_id'] == best_path['path_id'])
    
    return {
        "paths": paths,
        "evaluations": evaluations,
        "best_path_id": best_path['path_id'],
        "best_score": best_path['score'],
        "best_reasoning": best_reasoning
    }

# Test Tree of Thoughts
tot_problem = """You have 3 boxes: one contains only apples, one contains only oranges, and one contains both.
All boxes are labeled incorrectly. You can pick one fruit from one box to determine the correct labels for all boxes.
Which box should you pick from and why?"""

print_section("Tree of Thoughts Reasoning")
result = tree_of_thoughts(tot_problem, num_paths=3)

print_section("Best Solution Selected")
print(f"Path {result['best_path_id']} (Score: {result['best_score']}/10)")
print("-" * 80)
print(result['best_reasoning'])
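
Evaluator replies do not always come back as a bare number; a model may answer "8/10" or "Score: 7.5", and a plain `float()` call then falls through to the default. One possible way to make the scoring step more tolerant is sketched below. The helper `parse_score` is hypothetical and not part of the notebook; it takes the first number in the reply and clamps it to the 1-10 range.

```python
import re


def parse_score(text: str, default: float = 5.0) -> float:
    """Extract the first number from an evaluator reply and clamp it to 1-10.

    Returns `default` when the reply contains no number at all.
    """
    match = re.search(r"\d+(?:\.\d+)?", text)
    if match is None:
        return default
    return min(10.0, max(1.0, float(match.group())))


for reply in ["8", "Score: 7.5/10", "I would rate this 9 out of 10.", "N/A"]:
    print(f"{reply!r} -> {parse_score(reply)}")
```

Taking the first number is a deliberate simplification: it works for replies that lead with the score, but a reply such as "On a scale of 1-10, I give this 8" would need a stricter pattern.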

## 5. Self-Consistency <a id="self-consistency"></a>

Self-consistency generates multiple reasoning paths and selects the most common answer.

### 5.1 Multiple Reasoning Paths with Majority Voting

In [None]:
def self_consistency(problem: str, num_samples: int = 5) -> Dict[str, Any]:
    """
    Generate multiple solutions and use majority voting
    """
    solutions = []
    final_answers = []
    
    print(f"Generating {num_samples} independent solutions...\n")
    
    for i in range(num_samples):
        prompt = f"{problem}\n\nSolve this step by step and provide your final answer clearly."
        
        response = get_completion(
            [{"role": "user", "content": prompt}],
            temperature=0.7,
            max_tokens=400
        )
        
        solutions.append(response)
        
        # Extract final answer (simplified - look for numbers or key phrases)
        # In production, use more sophisticated extraction
        lines = response.split('\n')
        for line in reversed(lines):
            if line.strip():
                final_answers.append(line.strip())
                break
        
        print(f"Solution {i+1}: {response[:150]}...")
        print()
    
    # Find most common answer
    answer_counts = Counter(final_answers)
    most_common_answer, count = answer_counts.most_common(1)[0]
    
    return {
        "solutions": solutions,
        "final_answers": final_answers,
        "most_common_answer": most_common_answer,
        "confidence": count / num_samples,
        "agreement_count": count,
        "total_samples": num_samples
    }

# Test self-consistency
sc_problem = """A bat and a ball together cost $1.10.
The bat costs $1.00 more than the ball.
How much does the ball cost?"""

print_section("Self-Consistency Approach")
result = self_consistency(sc_problem, num_samples=5)

print_section("Self-Consistency Results")
print(f"Most Common Answer: {result['most_common_answer']}")
print(f"Confidence: {result['confidence']:.0%} ({result['agreement_count']}/{result['total_samples']} solutions agree)")
print("\nAll final answers:")
for i, answer in enumerate(result['final_answers'], 1):
    print(f"{i}. {answer}")
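
Majority voting only works when equivalent answers compare equal. Voting on the last line of free text rarely produces matches, because "The ball costs $0.05." and "Answer: $0.05" are different strings. The sketch below shows one way to normalize numeric answers before counting. The helpers `extract_amount` and `majority_vote` are hypothetical, and the approach assumes the final answer is the last number in the response and that all responses use the same unit.

```python
import re
from collections import Counter
from typing import List, Optional, Tuple


def extract_amount(text: str) -> Optional[str]:
    """Return the last number in the text, normalized to two decimals, or None."""
    numbers = re.findall(r"\d+(?:\.\d+)?", text.replace(",", ""))
    if not numbers:
        return None
    return f"{float(numbers[-1]):.2f}"


def majority_vote(responses: List[str]) -> Tuple[Optional[str], float]:
    """Return (most common normalized answer, share of all responses that agree)."""
    answers = [a for a in map(extract_amount, responses) if a is not None]
    if not answers:
        return None, 0.0
    answer, count = Counter(answers).most_common(1)[0]
    return answer, count / len(responses)


sample_responses = [
    "Let x be the ball. x + (x + 1.00) = 1.10, so x = 0.05.\nAnswer: The ball costs $0.05.",
    "The bat is $1.00, so the ball costs $0.10.",
    "**Final answer:** $0.05",
]
answer, agreement = majority_vote(sample_responses)
print(f"Voted answer: {answer} (agreement {agreement:.0%})")
```

Responses with no number are excluded from the vote but still count in the denominator, so the agreement share drops when extraction fails.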

## 6. Multi-Step Problem Solving <a id="multi-step"></a>

### 6.1 Least-to-Most Prompting

Break complex problems into simpler sub-problems.

In [None]:
def least_to_most_prompting(problem: str) -> Dict[str, Any]:
    """
    Break problem into sub-problems and solve incrementally
    """
    # Step 1: Decompose the problem
    print("Step 1: Problem Decomposition")
    print("-" * 80)
    
    decompose_prompt = f"""Break down this complex problem into 3-4 simpler sub-problems that need to be solved.

Problem: {problem}

List the sub-problems in order:"""
    
    decomposition = get_completion(
        [{"role": "user", "content": decompose_prompt}],
        temperature=0.3,
        max_tokens=300
    )
    print(decomposition)
    print()
    
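    # Parse sub-problems: keep lines that look like numbered or bulleted list items
    sub_problems = []
    for line in decomposition.split('\n'):
        line = line.strip()
        if line and (line[0].isdigit() or line.startswith('-')):
            # Drop only the list marker ("1.", "2)", "-"); lstrip() with a character
            # set would also eat leading digits of the text itself (e.g. "10,000")
            _marker, _, clean = line.partition(' ')
            clean = clean.strip()
            if clean:
                sub_problems.append(clean)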
    
    # Step 2: Solve sub-problems incrementally
    print("\nStep 2: Solving Sub-Problems")
    print("=" * 80)
    
    solutions = []
    context = ""
    
    for i, sub_problem in enumerate(sub_problems[:4], 1):
        print(f"\nSub-problem {i}: {sub_problem}")
        print("-" * 80)
        
        solve_prompt = f"""Previous context:
{context}

Now solve this sub-problem:
{sub_problem}

Provide a concise solution:"""
        
        solution = get_completion(
            [{"role": "user", "content": solve_prompt}],
            temperature=0.3,
            max_tokens=300
        )
        
        solutions.append({"sub_problem": sub_problem, "solution": solution})
        context += f"\n\nSub-problem {i}: {sub_problem}\nSolution: {solution}"
        
        print(solution)
    
    # Step 3: Synthesize final answer
    print("\n\nStep 3: Synthesizing Final Answer")
    print("=" * 80)
    
    synthesize_prompt = f"""Based on the solutions to these sub-problems, provide a comprehensive final answer to the original problem.

Original Problem: {problem}

Sub-problem Solutions:
{context}

Final comprehensive answer:"""
    
    final_answer = get_completion(
        [{"role": "user", "content": synthesize_prompt}],
        temperature=0.3,
        max_tokens=400
    )
    
    print(final_answer)
    
    return {
        "decomposition": decomposition,
        "sub_problems": sub_problems,
        "solutions": solutions,
        "final_answer": final_answer
    }

# Test least-to-most prompting
ltm_problem = """Design a cloud-native microservices architecture for an e-commerce platform that needs to:
- Handle 10,000 concurrent users
- Process payments securely
- Manage inventory across multiple warehouses
- Provide real-time order tracking
- Scale automatically during peak shopping periods

What are the key architectural components and how should they interact?"""

print_section("Least-to-Most Prompting")
result = least_to_most_prompting(ltm_problem)

## 7. Reasoning with Verification <a id="verification"></a>

### 7.1 Self-Verification

The model checks its own work for errors.

In [None]:
def reasoning_with_verification(problem: str) -> Dict[str, Any]:
    """
    Solve problem, then verify the solution
    """
    # Step 1: Generate initial solution
    print("Step 1: Initial Solution")
    print("-" * 80)
    
    solve_prompt = f"{problem}\n\nSolve this step by step:"
    
    initial_solution = get_completion(
        [{"role": "user", "content": solve_prompt}],
        temperature=0.3,
        max_tokens=400
    )
    print(initial_solution)
    print()
    
    # Step 2: Verify the solution
    print("\nStep 2: Verification")
    print("-" * 80)
    
    verify_prompt = f"""Review this solution and check for errors:

Problem: {problem}

Solution:
{initial_solution}

Verify:
1. Are all calculations correct?
2. Is the logic sound?
3. Are there any mistakes or oversights?

Provide your verification:"""
    
    verification = get_completion(
        [{"role": "user", "content": verify_prompt}],
        temperature=0.2,
        max_tokens=400
    )
    print(verification)
    print()
    
    # Step 3: Correct if needed
    print("\nStep 3: Correction (if needed)")
    print("-" * 80)
    
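    # Naive keyword check: it also fires on phrases such as "no errors found",
    # so a correction pass may run even when the verifier approved the solution
    if "error" in verification.lower() or "mistake" in verification.lower() or "incorrect" in verification.lower():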
        correct_prompt = f"""The verification found issues with the solution.

Original Solution:
{initial_solution}

Verification Feedback:
{verification}

Provide a corrected solution:"""
        
        corrected_solution = get_completion(
            [{"role": "user", "content": correct_prompt}],
            temperature=0.2,
            max_tokens=400
        )
        print("Issues found. Corrected solution:")
        print(corrected_solution)
        final_solution = corrected_solution
    else:
        print("No issues flagged. Keeping the original solution.")
        final_solution = initial_solution
    
    return {
        "initial_solution": initial_solution,
        "verification": verification,
        "final_solution": final_solution,
        # Use the same keywords as the correction check above so the flag matches what ran
        "corrected": any(k in verification.lower() for k in ("error", "mistake", "incorrect"))
    }

# Test reasoning with verification
verify_problem = """Calculate the compound interest on $5,000 invested at 6% annual interest rate,
compounded quarterly, for 3 years. Show your work."""

print_section("Reasoning with Self-Verification")
result = reasoning_with_verification(verify_problem)

print_section("Summary")
print(f"Correction needed: {'Yes' if result['corrected'] else 'No'}")
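
Deciding whether to correct a solution by searching the verifier's text for words like "error" is fragile, since an approving review can contain the same words. One alternative is to ask the verifier for an explicit verdict line and parse only that. The sketch below assumes a hypothetical convention, `VERDICT: PASS` or `VERDICT: FAIL`, that would have to be appended to the verification prompt; neither the constant nor the helper `needs_correction` exists in the notebook.

```python
import re

# Hypothetical instruction to append to the verification prompt.
VERDICT_INSTRUCTION = (
    "End your review with exactly one line: VERDICT: PASS or VERDICT: FAIL"
)


def needs_correction(verification: str) -> bool:
    """Return True when the verifier's last verdict line says FAIL.

    If no verdict line is present, return True as well, which errs on the
    side of running a correction pass.
    """
    verdicts = re.findall(
        r"^\s*VERDICT:\s*(PASS|FAIL)\b",
        verification,
        flags=re.IGNORECASE | re.MULTILINE,
    )
    if not verdicts:
        return True
    return verdicts[-1].upper() == "FAIL"


print(needs_correction("No errors found in the calculation.\nVERDICT: PASS"))
print(needs_correction("The quarterly rate was applied once per year.\nVERDICT: FAIL"))
```

Treating a missing verdict as a failure is a design choice; a cheaper alternative is to treat it as a pass and log the reply for review.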

### 7.2 Multi-Agent Verification

Use different "agents" with different perspectives to verify solutions.

In [None]:
def multi_agent_verification(problem: str) -> Dict[str, Any]:
    """
    Have multiple 'agents' with different roles verify the solution
    """
    # Generate initial solution
    initial_solution = get_completion(
        [{"role": "user", "content": f"{problem}\n\nSolve this step by step:"}],
        temperature=0.3,
        max_tokens=400
    )
    
    print("Initial Solution:")
    print("-" * 80)
    print(initial_solution)
    print()
    
    # Different verification perspectives
    agents = [
        {
            "role": "Mathematical Rigor Checker",
            "prompt": "Check if all mathematical calculations and formulas are correct."
        },
        {
            "role": "Logical Consistency Checker",
            "prompt": "Verify that the logical flow and reasoning steps are sound."
        },
        {
            "role": "Assumptions Validator",
            "prompt": "Identify any assumptions made and verify they are reasonable and stated."
        }
    ]
    
    verifications = []
    
    print("\nMulti-Agent Verification:")
    print("=" * 80)
    
    for agent in agents:
        print(f"\n{agent['role']}:")
        print("-" * 80)
        
        verify_prompt = f"""You are a {agent['role']}.

Problem: {problem}

Solution to verify:
{initial_solution}

Task: {agent['prompt']}
Provide your assessment:"""
        
        verification = get_completion(
            [{"role": "user", "content": verify_prompt}],
            temperature=0.2,
            max_tokens=300
        )
        
        verifications.append({
            "agent": agent['role'],
            "assessment": verification
        })
        
        print(verification)
    
    # Synthesize all verifications
    print("\n\nFinal Synthesis:")
    print("=" * 80)
    
    all_assessments = "\n\n".join([f"{v['agent']}:\n{v['assessment']}" for v in verifications])
    
    synthesis_prompt = f"""Based on these verification assessments, provide a final verdict on the solution:

{all_assessments}

Is the solution correct? If not, what needs to be corrected?"""
    
    final_verdict = get_completion(
        [{"role": "user", "content": synthesis_prompt}],
        temperature=0.2,
        max_tokens=300
    )
    
    print(final_verdict)
    
    return {
        "initial_solution": initial_solution,
        "verifications": verifications,
        "final_verdict": final_verdict
    }

# Test multi-agent verification
ma_problem = """A rectangular garden is 20 meters long and 15 meters wide.
If you want to put a fence around it that costs $12 per meter, and also need
to leave a 3-meter opening for a gate, what is the total cost?"""

print_section("Multi-Agent Verification")
result = multi_agent_verification(ma_problem)
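
A reference value makes it easier to tell whether the reviewers' final verdict is right. The sketch below computes the fence cost directly, under the assumption that the 3-meter gate opening is left unfenced, which is exactly the kind of assumption the "Assumptions Validator" role should surface. The function name `fence_cost` is illustrative.

```python
def fence_cost(length_m: float, width_m: float, gate_m: float, price_per_m: float) -> float:
    """Cost of fencing a rectangle's perimeter, leaving gate_m meters unfenced."""
    perimeter = 2 * (length_m + width_m)
    return (perimeter - gate_m) * price_per_m


reference_cost = fence_cost(20, 15, 3, 12)
print(f"Reference fence cost: ${reference_cost}")
```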

## 8. Summary and Best Practices <a id="summary"></a>

### Key Takeaways

**Deep Reasoning Techniques:**

1. **Chain-of-Thought**: Essential for complex reasoning tasks
   - Use "Let's solve this step by step"
   - Provide examples (few-shot) for consistency
   - Lower temperature (0.2-0.4) for factual accuracy

2. **Tree of Thoughts**: For problems with multiple valid approaches
   - Generate diverse paths with higher temperature
   - Evaluate and select best path
   - Costs more API calls (one generation plus one evaluation per path) in exchange for a ranked choice

3. **Self-Consistency**: Improves accuracy through voting
   - Generate 3-5 independent solutions
   - Use majority voting for final answer
   - Confidence = agreement percentage

4. **Least-to-Most**: For complex, decomposable problems
   - Break into sub-problems
   - Solve incrementally with context
   - Synthesize final answer

5. **Self-Verification**: Catch errors automatically
   - Verify calculations and logic
   - Multi-agent verification for thorough checking
   - Run a correction pass when issues are flagged, and repeat if needed

### Best Practices

**When to Use Each Technique:**

| Technique | Best For | Relative Cost | Accuracy Gain (qualitative) |
|-----------|----------|------|---------------|
| Basic CoT | Most reasoning tasks | Low | Medium |
| Few-Shot CoT | Consistent formatting needed | Low | High |
| Tree of Thoughts | Multiple valid approaches | High | Very High |
| Self-Consistency | High-stakes decisions | High | High |
| Least-to-Most | Complex decomposable problems | Medium | High |
| Self-Verification | Error-sensitive applications | Medium | Very High |
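
The cost column can be made concrete by counting model calls. The sketch below estimates calls per run for the functions as they are written in this notebook; the helper `estimated_calls` and its technique labels are hypothetical, and token usage per call is not modeled.

```python
def estimated_calls(technique: str, n: int = 3, corrected: bool = False) -> int:
    """Estimate model calls for one run of a technique as implemented in this notebook.

    n is num_paths, num_samples, the number of sub-problems, or the number of
    reviewer roles, depending on the technique.
    """
    if technique == "tree_of_thoughts":
        return 2 * n                  # n generations + n scoring calls
    if technique == "self_consistency":
        return n                      # n independent samples
    if technique == "least_to_most":
        return 1 + min(n, 4) + 1      # decompose + up to 4 sub-problems + synthesis
    if technique == "self_verification":
        return 3 if corrected else 2  # solve + verify (+ correct)
    if technique == "multi_agent":
        return 1 + n + 1              # solve + n reviewers + final verdict
    raise ValueError(f"Unknown technique: {technique}")


for name in ["tree_of_thoughts", "self_consistency", "least_to_most",
             "self_verification", "multi_agent"]:
    print(f"{name}: {estimated_calls(name)} calls")
```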

**Implementation Tips:**

1. **Start Simple**: Begin with basic CoT, add complexity as needed
2. **Adjust Temperature**: 
   - Low (0.2-0.3) for factual accuracy
   - Medium (0.5-0.7) for diverse reasoning paths
3. **Set Token Limits**: Allow enough for detailed reasoning (400-600 tokens)
4. **Monitor Costs**: Deep reasoning uses more API calls
5. **Cache Results**: Store intermediate reasoning for efficiency
6. **Measure Impact**: A/B test to verify accuracy improvements
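
The caching tip can be sketched as a small wrapper around any completion function. The code below is a minimal in-memory version; `make_cached_completion` is a hypothetical helper, and the stand-in `fake_completion` replaces the real model call so the example runs offline. In the notebook the wrapper would be applied to `get_completion`.

```python
import hashlib
import json
from typing import Callable, Dict, List


def make_cached_completion(complete: Callable[..., str]) -> Callable[..., str]:
    """Wrap a completion function with an in-memory cache keyed by its inputs."""
    cache: Dict[str, str] = {}

    def cached(messages: List[Dict[str, str]], **params) -> str:
        # Serialize messages and parameters deterministically to build the key.
        key_source = json.dumps({"messages": messages, "params": params}, sort_keys=True)
        key = hashlib.sha256(key_source.encode("utf-8")).hexdigest()
        if key not in cache:
            cache[key] = complete(messages, **params)
        return cache[key]

    return cached


# Stand-in for a real model call, used only to demonstrate cache hits.
calls = []

def fake_completion(messages, **params):
    calls.append(messages)
    return f"reply {len(calls)}"


cached_completion = make_cached_completion(fake_completion)
msgs = [{"role": "user", "content": "What is 2 + 2?"}]
first = cached_completion(msgs, temperature=0.0)
second = cached_completion(msgs, temperature=0.0)   # served from the cache
third = cached_completion(msgs, temperature=0.7)    # different parameters, new call
print(first, second, third, f"({len(calls)} underlying calls)")
```

Caching suits low-temperature steps such as decomposition or verification. It should be bypassed for Self-Consistency and Tree of Thoughts sampling, where identical prompts are expected to return different responses.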

**Combining Techniques:**

For maximum accuracy:
1. Use Least-to-Most to decompose complex problems
2. Apply Self-Consistency to each sub-problem
3. Use Self-Verification on the final answer
4. Keep reasoning traces for transparency and debugging
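
The four steps above can be sketched as one function that takes the model-backed pieces as parameters. Everything here is hypothetical: `combined_pipeline` and its callable arguments are not defined in the notebook, the stubs stand in for real model calls, and the final answer is simply the last sub-problem's answer rather than a separate synthesis call.

```python
import itertools
from collections import Counter
from typing import Callable, Dict, List


def combined_pipeline(
    problem: str,
    decompose: Callable[[str], List[str]],
    solve: Callable[[str, str], str],
    verify: Callable[[str, str], bool],
    num_samples: int = 3,
) -> Dict[str, object]:
    """Decompose, vote on each sub-problem, verify the result, and keep a trace."""
    trace = []
    context = ""
    for sub_problem in decompose(problem):
        # Self-consistency: sample several answers and keep the most common one.
        samples = [solve(sub_problem, context) for _ in range(num_samples)]
        answer, votes = Counter(samples).most_common(1)[0]
        trace.append({
            "sub_problem": sub_problem,
            "answer": answer,
            "agreement": votes / num_samples,
        })
        context += f"\n{sub_problem}: {answer}"
    final_answer = trace[-1]["answer"] if trace else ""
    return {
        "final_answer": final_answer,
        "verified": verify(problem, final_answer),
        "trace": trace,
    }


# Stubs that imitate a noisy solver: two of every three samples say "5".
noisy_answers = itertools.cycle(["5", "5", "6"])
result = combined_pipeline(
    "toy problem",
    decompose=lambda p: ["part A", "part B"],
    solve=lambda sub, ctx: next(noisy_answers),
    verify=lambda p, ans: ans == "5",
)
print(result["final_answer"], result["verified"])
for step in result["trace"]:
    print(step)
```

Passing the model calls in as functions keeps the control flow testable without network access, and the returned trace covers the transparency step.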

### Production Considerations

1. **Latency vs. Accuracy**: Deep reasoning increases response time
2. **Cost Management**: Multiple API calls add up quickly
3. **Caching**: Cache reasoning for similar questions
4. **Monitoring**: Track reasoning quality and success rates
5. **Fallbacks**: Have simpler methods if deep reasoning fails
6. **User Experience**: Show reasoning traces to build trust

### Next Steps

- Explore domain-specific reasoning (code, math, science)
- Integrate with tool usage for enhanced capabilities
- Build evaluation harnesses to measure improvements
- Combine with RAG for knowledge-grounded reasoning
- Implement reasoning traces in production systems

### Additional Resources

- [Chain-of-Thought Prompting](https://arxiv.org/abs/2201.11903)
- [Tree of Thoughts](https://arxiv.org/abs/2305.10601)
- [Self-Consistency Improves Chain of Thought](https://arxiv.org/abs/2203.11171)
- [Least-to-Most Prompting](https://arxiv.org/abs/2205.10625)
- [Azure AI Foundry Documentation](https://learn.microsoft.com/azure/ai-studio/)

## Practice Exercises

Try implementing these challenges:

1. **Hybrid Reasoning**: Combine Tree of Thoughts with Self-Consistency
2. **Domain-Specific**: Apply deep reasoning to code debugging or mathematical proofs
3. **Iterative Refinement**: Build a system that refines reasoning based on verification feedback
4. **Explainable AI**: Create reasoning traces that non-technical users can understand
5. **Multi-Modal Reasoning**: Extend to problems involving images or structured data

In [None]:
# Your practice code here
# Try implementing one of the challenges above!