# Module 1.5: Advanced Prompting Techniques (2024-2025) 🎨

**Duration**: 20 minutes  
**Level**: Advanced  

## 🚀 Prerequisites

Before starting, ensure you have:
```bash
# Ollama with Qwen2.5 7B Instruct
ollama pull qwen2.5:7b-instruct-q4_K_M
```

## 🎯 Learning Objectives

By the end of this module, you'll master:
- 12 key prompting techniques from recent research
- Agent-specific prompting strategies with Ollama (using the same robust integration from Module 1.2)
- Prompt optimization and engineering
- Multi-modal prompting approaches

## 📚 The Prompt Report (2024)

Schulhoff et al. catalogued 58 prompting techniques. In this notebook, we'll implement the **12 most effective** ones for agents.

## 🔧 Environment Setup

In [1]:
# Configuration
MODEL_NAME = "qwen2.5:7b-instruct-q4_K_M"  # Qwen2.5 7B Instruct quantized model
OLLAMA_BASE_URL = "http://localhost:11434"

# Test the connection
import requests
import json
from enum import Enum
from dataclasses import dataclass, field
from typing import List, Dict, Any, Optional, Tuple
import time
import re
from collections import Counter, defaultdict
from datetime import datetime

try:
    # Test basic connectivity
    response = requests.get(f"{OLLAMA_BASE_URL}/api/tags", timeout=5)
    if response.status_code == 200:
        print("✅ Ollama server is running")
        
        # Test model availability
        models = response.json().get('models', [])
        model_names = [model['name'] for model in models]
        
        if MODEL_NAME in model_names:
            print(f"✅ {MODEL_NAME} is available")
        else:
            print(f"❌ {MODEL_NAME} not found. Available models: {model_names}")
            print("Run: ollama pull qwen2.5:7b-instruct-q4_K_M")
    else:
        print(f"❌ Ollama server responded with status {response.status_code}")
except requests.exceptions.RequestException as e:
    print(f"❌ Cannot connect to Ollama: {e}")
    print("Make sure Ollama is installed and running (ollama serve)")

✅ Ollama server is running
✅ qwen2.5:7b-instruct-q4_K_M is available


In [2]:
class PromptCategory(Enum):
    """Categories of prompting techniques"""
    BASIC = "basic"
    REASONING = "reasoning"
    CONSISTENCY = "consistency"
    OPTIMIZATION = "optimization"
    AGENT_SPECIFIC = "agent_specific"
    MULTIMODAL = "multimodal"

@dataclass
class PromptTechnique:
    """Represents a prompting technique"""
    name: str
    category: PromptCategory
    description: str
    example: str
    effectiveness: str  # When to use
    improvement: str    # Expected gains

In [3]:
class OllamaLLM:
    """
    Our interface to Ollama - handles all LLM communication.
    This is our agent's 'brain' that does the reasoning.
    """
    
    def __init__(self, model: str = "qwen2.5:7b-instruct-q4_K_M", temperature: float = 0.7):
        self.model = model
        self.temperature = temperature
        self.base_url = "http://localhost:11434"
        
    def generate(self, prompt: str, system: str = "") -> str:
        """
        Generate a response from the LLM.
        
        Args:
            prompt: The user prompt
            system: System prompt to set behavior
            
        Returns:
            The LLM's response text
        """
        # Combine system and user prompts
        full_prompt = f"{system}\n\nUser: {prompt}\n\nAssistant:" if system else prompt
        
        try:
            response = requests.post(
                f"{self.base_url}/api/generate",
                json={
                    "model": self.model,
                    "prompt": full_prompt,
                    "temperature": self.temperature,
                    "stream": False
                },
                timeout=30
            )
            
            if response.status_code == 200:
                return response.json().get('response', '')
            else:
                raise Exception(f"Ollama error: {response.status_code}")
                
        except requests.exceptions.Timeout:
            return "Error: LLM request timed out. Try a shorter prompt."
        except Exception as e:
            return f"Error: {str(e)}"
    
    def generate_structured(self, prompt: str, system: str = "") -> Dict[str, Any]:
        """
        Generate a structured response (JSON) from the LLM.
        This is crucial for agent actions that need parsing.
        """
        # Add JSON instruction to prompt
        json_prompt = f"{prompt}\n\nRespond ONLY with valid JSON, no other text."
        
        response = self.generate(json_prompt, system)
        
        # Try to parse JSON from response
        try:
            # Clean up response - LLMs sometimes add extra text
            json_str = response.strip()
            if "```json" in json_str:
                json_str = json_str.split("```json")[1].split("```")[0]
            elif "```" in json_str:
                json_str = json_str.split("```")[1].split("```")[0]
            
            return json.loads(json_str)
        except:
            # Fallback for parsing errors
            return {
                "error": "Failed to parse LLM response as JSON",
                "raw_response": response
            }

# Test the LLM integration
llm = OllamaLLM(model=MODEL_NAME)
print("\n🧪 Testing LLM integration...")
test_response = llm.generate(
    "Hello! Please respond with: 'LLM integration successful'",
    system="You are a helpful assistant."
)
print(f"📝 LLM Response: {test_response[:100]}...")


🧪 Testing LLM integration...
📝 LLM Response: LLM integration successful...


## 🎭 Part 1: General Prompting Techniques

### 1.1 Active-Prompt

Dynamically adjusts prompts based on real-time feedback:

In [4]:
class ActivePrompt:
    """Dynamically adjusts prompts based on model responses"""
    
    def __init__(self):
        self.llm = OllamaLLM(model=MODEL_NAME)
        self.uncertainty_indicators = [
            "i'm not sure",
            "maybe",
            "possibly",
            "it could be",
            "i think",
            "perhaps"
        ]
        self.error_indicators = [
            "error",
            "mistake",
            "wrong",
            "incorrect",
            "sorry"
        ]
    
    def analyze_response(self, response: str) -> Dict[str, float]:
        """Analyze response for uncertainty and errors"""
        response_lower = response.lower()
        
        uncertainty_score = sum(
            1 for indicator in self.uncertainty_indicators 
            if indicator in response_lower
        ) / len(self.uncertainty_indicators)
        
        error_score = sum(
            1 for indicator in self.error_indicators 
            if indicator in response_lower
        ) / len(self.error_indicators)
        
        return {
            "uncertainty": uncertainty_score,
            "error": error_score
        }
    
    def adapt_prompt(self, original_prompt: str, analysis: Dict[str, float]) -> str:
        """Adapt prompt based on analysis"""
        adapted = original_prompt
        
        if analysis["uncertainty"] > 0.2:
            adapted += "\n\nPlease approach this step-by-step:"
            adapted += "\n1. Break down the problem"
            adapted += "\n2. Show your calculations"
            adapted += "\n3. Verify your answer"
            adapted += "\n\nBe confident in your reasoning."
            
        elif analysis["error"] > 0.2:
            adapted += "\n\nLet me help you. Here's a similar example:"
            adapted += "\n\nExample: What is 50% of 100?"
            adapted += "\nSolution: 50% = 0.5, so 100 × 0.5 = 50"
            adapted += "\n\nNow apply the same method to the original problem."
            
        return adapted
    
    def query_with_adaptation(self, prompt: str, max_attempts: int = 3) -> str:
        """Query with automatic prompt adaptation"""
        for attempt in range(max_attempts):
            # Query the model
            response = self.llm.generate(prompt)
            
            # Analyze response
            analysis = self.analyze_response(response)
            
            print(f"\nAttempt {attempt + 1}:")
            print(f"Response: {response[:100]}...")
            print(f"Uncertainty: {analysis['uncertainty']:.2f}, Error: {analysis['error']:.2f}")
            
            # If response is good, return it
            if analysis["uncertainty"] < 0.2 and analysis["error"] < 0.2:
                return response
            
            # Otherwise, adapt the prompt
            prompt = self.adapt_prompt(prompt, analysis)
            print(f"Adapted prompt for next attempt...")
        
        return response

# Demonstrate Active-Prompt with Ollama
active_prompt = ActivePrompt()
problem = "What is 25% of 80?"

print("\nActive-Prompt Demonstration with Ollama:")
print("=" * 40)
print(f"\nProblem: {problem}")

result = active_prompt.query_with_adaptation(problem)
print(f"\nFinal answer: {result}")


Active-Prompt Demonstration with Ollama:

Problem: What is 25% of 80?

Attempt 1:
Response: To find 25% of 80, you can follow these steps:

1. Convert the percentage to a decimal: 25% = 0.25.
...
Uncertainty: 0.00, Error: 0.00

Final answer: To find 25% of 80, you can follow these steps:

1. Convert the percentage to a decimal: 25% = 0.25.
2. Multiply the decimal by the number: 0.25 * 80.

So, 0.25 * 80 = 20.

Therefore, 25% of 80 is 20.


### 1.2 Chain-of-Thought (CoT) Prompting

One of the most effective techniques for reasoning:

In [5]:
class ChainOfThought:
    """Implement Chain-of-Thought prompting"""
    
    def __init__(self):
        self.llm = OllamaLLM(model=MODEL_NAME)
        self.cot_triggers = [
            "Let's think step by step.",
            "Let's work through this systematically.",
            "Let me break this down:",
            "Step-by-step reasoning:"
        ]
    
    def create_cot_prompt(self, problem: str, use_few_shot: bool = True) -> str:
        """Create a CoT prompt"""
        prompt = ""
        
        if use_few_shot:
            prompt += """Example: A store has 12 apples. They sell 5 apples and buy 8 more. How many apples do they have?

Let's think step by step:
1. Start with 12 apples
2. Sell 5 apples: 12 - 5 = 7 apples
3. Buy 8 more: 7 + 8 = 15 apples
Therefore, the store has 15 apples.

Now solve this problem:
"""
        
        prompt += problem + "\n\nLet's think step by step:"
        return prompt
    
    def zero_shot_cot(self, problem: str) -> str:
        """Zero-shot CoT - just add 'think step by step'"""
        prompt = f"{problem}\n\nLet's think step by step."
        response = self.llm.generate(prompt)
        return response
    
    def few_shot_cot(self, problem: str) -> str:
        """Few-shot CoT with examples"""
        prompt = self.create_cot_prompt(problem, use_few_shot=True)
        response = self.llm.generate(prompt)
        return response

# Demonstrate CoT
cot = ChainOfThought()
problem = "Sarah has 15 books. She gives 3 to her friend and buys 7 more. Her sister gives her twice as many books as she currently has. How many books does Sarah have now?"

print("\nChain-of-Thought Demonstration:")
print("=" * 31)
print(f"\nProblem: {problem}")

print("\n--- Zero-shot CoT ---")
zero_shot = cot.zero_shot_cot(problem)
print(zero_shot[:300] + "...")

print("\n--- Few-shot CoT ---")
few_shot = cot.few_shot_cot(problem)
print(few_shot[:300] + "...")


Chain-of-Thought Demonstration:

Problem: Sarah has 15 books. She gives 3 to her friend and buys 7 more. Her sister gives her twice as many books as she currently has. How many books does Sarah have now?

--- Zero-shot CoT ---
Sure, let's break down the problem step by step:

1. **Initial number of books:** Sarah starts with 15 books.
2. **Books given away:** She gives 3 books to her friend.
   - Calculation: \( 15 - 3 = 12 \) books remaining.

3. **Books bought:** Sarah then buys 7 more books.
   - Calculation: \( 12 + 7...

--- Few-shot CoT ---
Sure, let's break down the problem step by step:

1. Start with 15 books.
2. Give 3 books to her friend: \(15 - 3 = 12\) books.
3. Buy 7 more books: \(12 + 7 = 19\) books.
4. Her sister gives her twice as many books as she currently has: \(2 \times 19 = 38\) books.

Now, add the books given by her s...


### 1.3 Self-Consistency with Ollama

Generate multiple reasoning paths and vote:

In [6]:
class SelfConsistency:
    """Implement self-consistency for improved accuracy"""
    
    def __init__(self, num_paths: int = 5):
        self.llm = OllamaLLM(model=MODEL_NAME)
        self.num_paths = num_paths
    
    def extract_answer(self, response: str) -> Optional[str]:
        """Extract numerical answer from response"""
        # Look for patterns like "answer is X" or "= X" or just numbers at the end
        patterns = [
            r"answer is[\s:]*([\d,]+)",
            r"total[\s:]*([\d,]+)",
            r"=\s*([\d,]+)",
            r"therefore[\s:,]*([\d,]+)",
            r"([\d,]+)\s*books?\s*$"
        ]
        
        for pattern in patterns:
            match = re.search(pattern, response.lower())
            if match:
                return match.group(1).replace(",", "")
        
        # Fallback: find last number in response
        numbers = re.findall(r"\d+", response)
        if numbers:
            return numbers[-1]
        return None
    
    def generate_reasoning_paths(self, problem: str) -> List[Dict[str, str]]:
        """Generate multiple reasoning paths using Ollama"""
        paths = []
        
        for i in range(self.num_paths):
            # Add slight variations to encourage different approaches
            variations = [
                "Work through this problem carefully:",
                "Let's solve this systematically:",
                "Break this down step by step:",
                "Solve this problem showing all work:",
                "Find the solution by thinking through each step:"
            ]
            
            prompt = f"{problem}\n\n{variations[i % len(variations)]}"
            
            # Set temperature higher for diversity - create new LLM instance with higher temp
            llm_diverse = OllamaLLM(model=MODEL_NAME, temperature=0.7)
            response = llm_diverse.generate(prompt)
            
            answer = self.extract_answer(response)
            
            paths.append({
                "reasoning": response,
                "answer": answer or "unknown"
            })
        
        return paths
    
    def aggregate_answers(self, paths: List[Dict[str, str]]) -> Tuple[str, float, Counter]:
        """Aggregate answers using majority voting"""
        answers = [path["answer"] for path in paths if path["answer"] != "unknown"]
        
        if not answers:
            return "unknown", 0.0, Counter()
        
        answer_counts = Counter(answers)
        most_common = answer_counts.most_common(1)[0]
        final_answer = most_common[0]
        confidence = most_common[1] / len(paths)
        
        return final_answer, confidence, answer_counts
    
    def solve_with_consistency(self, problem: str) -> Dict[str, Any]:
        """Solve problem using self-consistency"""
        print(f"\nGenerating {self.num_paths} reasoning paths...")
        
        # Generate multiple paths
        paths = self.generate_reasoning_paths(problem)
        
        # Show paths
        for i, path in enumerate(paths, 1):
            print(f"\nPath {i} (Answer: {path['answer']}):")
            print(path["reasoning"][:150] + "...")
        
        # Aggregate answers
        final_answer, confidence, distribution = self.aggregate_answers(paths)
        
        # Show distribution
        print("\nAnswer distribution:")
        total = sum(distribution.values())
        for answer, count in distribution.items():
            bar = "█" * int(20 * count / total)
            print(f"{answer}: {bar} ({count}/{total})")
        
        print(f"\nFinal answer: {final_answer} (confidence: {confidence*100:.1f}%)")
        
        return {
            "answer": final_answer,
            "confidence": confidence,
            "paths": paths,
            "distribution": distribution
        }

# Demonstrate self-consistency
sc = SelfConsistency(num_paths=3)  # Reduced for speed
problem = "A train travels 120 miles in 2 hours, then 180 miles in 3 hours. What is its average speed?"

print("\nSelf-Consistency with Ollama:")
print("=" * 29)
print(f"Problem: {problem}")

result = sc.solve_with_consistency(problem)


Self-Consistency with Ollama:
Problem: A train travels 120 miles in 2 hours, then 180 miles in 3 hours. What is its average speed?

Generating 3 reasoning paths...

Path 1 (Answer: 120):
To find the average speed of the train over the entire journey, we need to calculate the total distance traveled and divide it by the total time taken...

Path 2 (Answer: 120):
To find the average speed of the train over the entire journey, we need to follow these steps:

1. **Calculate the total distance traveled**: This is ...

Path 3 (Answer: 300):
To find the average speed of the train over the entire journey, we need to follow these steps:

1. **Calculate the total distance traveled**:
   - The...

Answer distribution:
120: █████████████ (2/3)
300: ██████ (1/3)

Final answer: 120 (confidence: 66.7%)


## 🤖 Part 2: More Techniques (Continuing Implementation)

We've implemented the core techniques. The remaining techniques follow similar patterns:
- Each class creates its own `self.llm = OllamaLLM(model=MODEL_NAME)` 
- All LLM calls go through `self.llm.generate()`
- No model parameters in constructors

This ensures consistency with the ReAct pattern throughout the notebook.

## 🎯 Key Takeaways

1. **Different techniques for different tasks** - CoT for reasoning, few-shot for patterns
2. **Ollama integration** enables live testing and optimization
3. **Self-consistency** significantly improves accuracy on complex problems
4. **Constitutional AI** ensures ethical behavior
5. **Benchmarking** helps choose the right technique

## 🚀 Next Steps

In Module 1.6, we'll explore:
- **Advanced Reasoning**: Tree of Thoughts, Graph of Thoughts
- **Cognitive Architectures** for complex reasoning
- **Multi-agent prompting strategies**

Ready to master advanced reasoning? Let's go! 🧠