# Day 21: Prompt Engineering Patterns - Part 2

In this notebook, we'll implement and experiment with Chain-of-Thought (CoT) reasoning and Self-consistency methods. These advanced prompting techniques can significantly improve language model performance on complex reasoning tasks.

## Overview

We'll cover:
1. Setting up the environment
2. Chain-of-Thought (CoT) reasoning implementation
3. Self-consistency methods implementation
4. Comparing the effectiveness of different approaches

## 1. Setting Up the Environment

First, let's install and import the necessary libraries. We'll use OpenAI's API for this demonstration, but the concepts apply to any language model.

In [None]:
# Install required packages
!pip install openai python-dotenv requests matplotlib pandas numpy

In [None]:
import os
import openai
import json
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import time
import re
from collections import Counter
from dotenv import load_dotenv

# Load environment variables (API keys)
load_dotenv()

# Set up OpenAI API
openai.api_key = os.getenv("OPENAI_API_KEY")

# If you don't have an API key, you can use this function to simulate API calls
def simulate_llm_call(prompt, model="gpt-3.5-turbo", temperature=0.7):
    """Simulate a call to a language model API for demonstration purposes."""
    print("\n--- Prompt ---\n")
    print(prompt)
    print("\n--- [Simulated LLM Response] ---\n")
    
    # Simulate different responses based on the prompt
    if "step by step" in prompt.lower():
        if "math problem" in prompt.lower() or "calculate" in prompt.lower():
            return "Let me solve this step by step:\n1. First, I'll identify the key values\n2. Then I'll set up the equation\n3. Next, I'll solve for the unknown\n4. Finally, I'll verify my answer\n\nThe answer is 42."
        else:
            return "I'll think through this step by step and provide a detailed answer."
    else:
        return "This is a simulated response. Please provide your OpenAI API key for actual responses."

# Function to call OpenAI API
def call_llm(prompt, model="gpt-3.5-turbo", temperature=0.7):
    """Call the language model API with the given prompt."""
    try:
        if not openai.api_key or openai.api_key == "your-api-key-here":
            return simulate_llm_call(prompt, model, temperature)
        
        # Requires openai>=1.0: the legacy openai.ChatCompletion interface was removed in 1.0.
        response = openai.chat.completions.create(
            model=model,
            messages=[
                {"role": "system", "content": "You are a helpful assistant."},
                {"role": "user", "content": prompt}
            ],
            temperature=temperature
        )
        return response.choices[0].message.content
    except Exception as e:
        print(f"Error calling LLM API: {e}")
        return simulate_llm_call(prompt, model, temperature)

## 2. Chain-of-Thought (CoT) Reasoning

Chain-of-Thought (CoT) prompting encourages the model to break down complex problems into step-by-step reasoning processes, similar to showing your work in mathematics.

In [None]:
def standard_prompt(problem):
    """Create a standard prompt without CoT."""
    return f"Problem: {problem}\n\nAnswer:"

def zero_shot_cot_prompt(problem):
    """Create a zero-shot CoT prompt."""
    return f"Problem: {problem}\n\nLet's think step by step:"

def few_shot_cot_prompt(problem, examples):
    """Create a few-shot CoT prompt with examples."""
    prompt = "Solve the following problems by thinking step by step.\n\n"
    
    # Add examples
    for example in examples:
        prompt += f"Problem: {example['problem']}\n\n{example['solution']}\n\n"
    
    # Add the actual problem
    prompt += f"Problem: {problem}\n\nLet's think step by step:"
    
    return prompt
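
To make the structure of a few-shot CoT prompt concrete, the sketch below renders one for a toy problem and prints it. It uses a compact stand-in named `render_few_shot_prompt` (a hypothetical name, chosen so that it does not shadow the builder above) and a made-up `demo_examples` list, so it runs on its own.

```python
def render_few_shot_prompt(problem, examples):
    """Standalone stand-in for few_shot_cot_prompt, producing the same layout."""
    header = "Solve the following problems by thinking step by step.\n\n"
    shots = "".join(
        f"Problem: {ex['problem']}\n\n{ex['solution']}\n\n" for ex in examples
    )
    return f"{header}{shots}Problem: {problem}\n\nLet's think step by step:"


# Illustrative worked example, invented for this demo.
demo_examples = [
    {
        "problem": "Tom has 3 pens and buys 4 more. How many pens does he have?",
        "solution": (
            "Let's think step by step:\n"
            "1. Tom starts with 3 pens.\n"
            "2. He buys 4 more, so 3 + 4 = 7.\n\n"
            "Answer: 7 pens"
        ),
    }
]

rendered = render_few_shot_prompt("What is 15% of 80?", demo_examples)
print(rendered)
```

The rendered prompt ends with the reasoning cue rather than with "Answer:", so the model is nudged to continue with its steps before committing to a result.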

### 2.1 Math Word Problems

Let's test CoT reasoning on math word problems.

In [None]:
# Example math word problems
math_problems = [
    "A store has 24 apples. They sell 1/3 of them in the morning and 1/4 of the remaining apples in the afternoon. How many apples are left?",
    "If a train travels 60 miles per hour for 2.5 hours, how far does it travel?",
    "A recipe calls for 2 cups of flour to make 12 cookies. How much flour is needed for 18 cookies?",
    "A car uses 1 gallon of gas to travel 25 miles. How many gallons are needed for a 150-mile trip?"
]

# CoT examples for few-shot prompting
cot_examples = [
    {
        "problem": "John has 5 marbles. He buys 2 more bags of marbles, with 7 marbles in each bag. How many marbles does John have now?",
        "solution": "Let's think step by step:\n1. John starts with 5 marbles.\n2. He buys 2 bags of marbles, with 7 marbles in each bag.\n3. The number of marbles in the bags is 2 × 7 = 14 marbles.\n4. Now John has 5 + 14 = 19 marbles.\n\nAnswer: 19 marbles"
    },
    {
        "problem": "Sarah has $30. She spends 2/3 of her money on a book. How much money does she have left?",
        "solution": "Let's think step by step:\n1. Sarah starts with $30.\n2. She spends 2/3 of her money on a book.\n3. The amount she spends is 2/3 × $30 = $20.\n4. The amount she has left is $30 - $20 = $10.\n\nAnswer: $10"
    }
]

# Test standard prompting vs. zero-shot CoT vs. few-shot CoT
for problem in math_problems:
    print(f"Problem: {problem}\n")
    
    # Standard prompting
    standard_prompt_text = standard_prompt(problem)
    standard_response = call_llm(standard_prompt_text)
    print("Standard Response:")
    print(standard_response)
    print("\n" + "-"*50 + "\n")
    
    # Zero-shot CoT
    zero_shot_cot_prompt_text = zero_shot_cot_prompt(problem)
    zero_shot_cot_response = call_llm(zero_shot_cot_prompt_text)
    print("Zero-Shot CoT Response:")
    print(zero_shot_cot_response)
    print("\n" + "-"*50 + "\n")
    
    # Few-shot CoT
    few_shot_cot_prompt_text = few_shot_cot_prompt(problem, cot_examples)
    few_shot_cot_response = call_llm(few_shot_cot_prompt_text)
    print("Few-Shot CoT Response:")
    print(few_shot_cot_response)
    print("\n" + "="*80 + "\n")
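
When reading the model outputs above, it helps to have reference values to compare against. The following sketch works out the four word problems with exact fractions; the variable names are illustrative and not part of the notebook's pipeline.

```python
from fractions import Fraction

# Problem 1: 24 apples, sell 1/3 in the morning, then 1/4 of the remainder.
after_morning = 24 * (1 - Fraction(1, 3))           # apples left after the morning
apples_left = after_morning * (1 - Fraction(1, 4))  # apples left after the afternoon

# Problem 2: distance = speed * time, with 2.5 hours written as 5/2.
distance_miles = 60 * Fraction(5, 2)

# Problem 3: flour scales in proportion to the number of cookies.
flour_cups = Fraction(2, 12) * 18

# Problem 4: gallons = trip distance / miles per gallon.
gallons = Fraction(150, 25)

reference_answers = {
    "apples left": apples_left,
    "distance (miles)": distance_miles,
    "flour (cups)": flour_cups,
    "gas (gallons)": gallons,
}
for name, value in reference_answers.items():
    print(f"{name}: {value}")
```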

### 2.2 Logical Reasoning Problems

Let's test CoT reasoning on logical reasoning problems.

In [None]:
# Example logical reasoning problems
logic_problems = [
    "If all A are B, and some B are C, can we conclude that some A are C?",
    "If it's raining, then the ground is wet. The ground is wet. Does that mean it's raining?",
    "All lions are mammals. Some mammals are carnivores. Are all lions carnivores?",
    "If I have a red marble, a blue marble, and a green marble in a bag, and I pick two marbles without replacement, what is the probability that I pick the red marble and then the blue marble?"
]

# CoT examples for logical reasoning
logic_cot_examples = [
    {
        "problem": "If all dogs are animals, and all animals need food, do all dogs need food?",
        "solution": "Let's think step by step:\n1. We know that all dogs are animals.\n2. We also know that all animals need food.\n3. Since all dogs are animals, and all animals need food, it follows that all dogs need food.\n\nAnswer: Yes, all dogs need food."
    },
    {
        "problem": "If no fish are mammals, and all whales are mammals, are any whales fish?",
        "solution": "Let's think step by step:\n1. We know that no fish are mammals.\n2. We also know that all whales are mammals.\n3. If all whales are mammals, and no fish are mammals, then whales cannot be fish.\n\nAnswer: No, no whales are fish."
    }
]

# Test standard prompting vs. zero-shot CoT vs. few-shot CoT
for problem in logic_problems:
    print(f"Problem: {problem}\n")
    
    # Standard prompting
    standard_prompt_text = standard_prompt(problem)
    standard_response = call_llm(standard_prompt_text)
    print("Standard Response:")
    print(standard_response)
    print("\n" + "-"*50 + "\n")
    
    # Zero-shot CoT
    zero_shot_cot_prompt_text = zero_shot_cot_prompt(problem)
    zero_shot_cot_response = call_llm(zero_shot_cot_prompt_text)
    print("Zero-Shot CoT Response:")
    print(zero_shot_cot_response)
    print("\n" + "-"*50 + "\n")
    
    # Few-shot CoT
    few_shot_cot_prompt_text = few_shot_cot_prompt(problem, logic_cot_examples)
    few_shot_cot_response = call_llm(few_shot_cot_prompt_text)
    print("Few-Shot CoT Response:")
    print(few_shot_cot_response)
    print("\n" + "="*80 + "\n")
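
Two of the logic problems above can be checked mechanically, which gives a reference point for judging the model's reasoning. The sketch below builds a counterexample for the first syllogism (the third problem has the same logical form) and enumerates the marble draws for the last problem. The sets `A`, `B`, and `C` are arbitrary illustrative choices.

```python
from fractions import Fraction
from itertools import permutations

# Problem 1: "all A are B" and "some B are C" do not imply "some A are C".
# One counterexample is enough to show that the conclusion does not follow.
A, B, C = {1}, {1, 2}, {2}
all_a_are_b = A <= B         # every element of A is in B
some_b_are_c = bool(B & C)   # B and C share at least one element
some_a_are_c = bool(A & C)   # A and C share at least one element
conclusion_fails = all_a_are_b and some_b_are_c and not some_a_are_c

# Problem 4: ordered draws of two marbles without replacement.
draws = list(permutations(["red", "blue", "green"], 2))
p_red_then_blue = Fraction(draws.count(("red", "blue")), len(draws))

print(f"Premises hold while the conclusion fails: {conclusion_fails}")
print(f"P(red then blue) = {p_red_then_blue}")
```

The second problem (wet ground, therefore rain) is the classic fallacy of affirming the consequent: the ground could be wet for another reason, so the correct answer is also "no".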

## 3. Self-Consistency Methods

Self-consistency involves generating multiple reasoning paths for the same problem and selecting the most frequent answer. This helps reduce errors from individual reasoning mistakes.
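
A simplified model gives some intuition for why voting can help. Assume, purely for illustration, that each sampled reasoning path is independently correct with probability `p` and that all wrong paths agree on a single wrong answer. Real samples are correlated and wrong answers usually scatter, so treat the numbers below as a rough sketch rather than a prediction. The function name `majority_vote_accuracy` is hypothetical.

```python
from math import comb


def majority_vote_accuracy(p, n):
    """Probability that more than half of n independent samples are correct.

    Each sample is assumed to be correct with probability p (binomial model).
    """
    majority = n // 2 + 1
    return sum(
        comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(majority, n + 1)
    )


# With p above 0.5, accuracy under this model grows with the number of samples.
for n in (1, 3, 5, 11):
    print(f"n={n:2d}  majority-vote accuracy={majority_vote_accuracy(0.6, n):.3f}")
```

Under the same model, voting makes things worse when `p` is below 0.5, which is one reason self-consistency is usually paired with prompts that already produce mostly correct reasoning.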

In [None]:
def extract_final_answer(response):
    """Extract the final answer from a CoT response."""
    # Look for patterns like "Answer: X" or "Therefore, X" at the end
    answer_patterns = [
        r"Answer:\s*(.+)$",
        r"Therefore,\s*(.+)$",
        r"Thus,\s*(.+)$",
        r"So,\s*(.+)$",
        r"In conclusion,\s*(.+)$"
    ]
    
    for pattern in answer_patterns:
        match = re.search(pattern, response, re.IGNORECASE | re.MULTILINE)
        if match:
            return match.group(1).strip()
    
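    # If no pattern matches, fall back to the last non-empty sentence.
    # Splitting on sentence-ending punctuation followed by whitespace keeps
    # decimals such as "2.5" intact and avoids the empty trailing element
    # that response.split('.') produces when the text ends with a period.
    sentences = [s.strip() for s in re.split(r'(?<=[.!?])\s+', response.strip()) if s.strip()]
    if sentences:
        return sentences[-1]
    
    return response.strip()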

def self_consistency(problem, num_samples=5, temperature=0.7):
    """Implement self-consistency by generating multiple reasoning paths."""
    responses = []
    answers = []
    
    # Generate multiple responses using zero-shot CoT
    prompt = zero_shot_cot_prompt(problem)
    
    for i in range(num_samples):
        response = call_llm(prompt, temperature=temperature)
        responses.append(response)
        
        # Extract the final answer
        answer = extract_final_answer(response)
        answers.append(answer)
        
        # Add a small delay to avoid rate limits
        time.sleep(1)
    
    # Find the most common answer
    answer_counts = Counter(answers)
    most_common_answer = answer_counts.most_common(1)[0][0]
    confidence = answer_counts[most_common_answer] / num_samples
    
    return {
        "responses": responses,
        "answers": answers,
        "most_common_answer": most_common_answer,
        "confidence": confidence
    }
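
One practical caveat: the vote above counts raw strings, so equivalent answers such as "42%" and "The probability is 0.42." are tallied separately. The sketch below shows one possible way to normalize answers before voting. The helpers `normalize_answer` and `vote` are hypothetical and not part of the notebook's pipeline, and the rule "take the last number, convert percentages to fractions" is an assumption that suits numeric problems only.

```python
import re
from collections import Counter


def normalize_answer(answer):
    """Reduce an answer string to its last number, as a float.

    Percentages are converted to fractions, so "42%" and "0.42" compare equal.
    If the string contains no number, the stripped lowercase text is returned.
    """
    matches = re.findall(r"(\d+(?:\.\d+)?)\s*(%?)", answer.replace(",", ""))
    if not matches:
        return answer.strip().lower()
    value, percent = matches[-1]
    number = float(value)
    return round(number / 100 if percent else number, 6)


def vote(answers):
    """Return the winning normalized answer and its agreement ratio."""
    counts = Counter(normalize_answer(a) for a in answers)
    winner, count = counts.most_common(1)[0]
    return winner, count / len(answers)


sample_answers = ["42%", "The probability is 0.42.", "0.42", "13%"]
print(vote(sample_answers))
```

With raw string matching, the four sample answers would count as four different votes. After normalization, three of them agree.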

### 3.1 Self-Consistency for Math Problems

Let's apply self-consistency to a challenging math problem.

In [None]:
# A challenging math problem
challenging_math_problem = "If the probability of rain on Saturday is 60% and the probability of rain on Sunday is 70%, what is the probability of rain on both days? Assume the events are independent."

# Apply self-consistency
sc_results = self_consistency(challenging_math_problem, num_samples=3, temperature=0.7)

# Display results
print(f"Problem: {challenging_math_problem}\n")
print(f"Most common answer: {sc_results['most_common_answer']}")
print(f"Confidence: {sc_results['confidence']:.2f} ({sc_results['answers'].count(sc_results['most_common_answer'])}/{len(sc_results['answers'])})\n")

print("Individual reasoning paths:")
for i, (response, answer) in enumerate(zip(sc_results['responses'], sc_results['answers'])):
    print(f"\nAttempt {i+1}:")
    print(response)
    print(f"Extracted answer: {answer}")
    print("-" * 50)
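
For reference, the expected result follows directly from the independence assumption stated in the problem. The sketch below computes it with exact fractions and, for contrast, also computes the probability of rain on at least one day, a different quantity that a faulty reasoning path could plausibly produce instead.

```python
from fractions import Fraction

p_saturday = Fraction(60, 100)
p_sunday = Fraction(70, 100)

# Independence means P(A and B) = P(A) * P(B).
p_both = p_saturday * p_sunday

# For contrast: rain on at least one of the two days.
p_at_least_one = 1 - (1 - p_saturday) * (1 - p_sunday)

print(f"Rain on both days: {float(p_both):.2f}")
print(f"Rain on at least one day: {float(p_at_least_one):.2f}")
```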

### 3.2 Self-Consistency with Different Temperatures

Let's explore how temperature affects self-consistency results.

In [None]:
# A problem with potential for different reasoning paths
probability_problem = "A bag contains 3 red marbles, 4 blue marbles, and 5 green marbles. If you draw 2 marbles without replacement, what is the probability of drawing a red marble followed by a blue marble?"

# Test different temperatures
temperatures = [0.1, 0.5, 0.9]
temperature_results = {}

for temp in temperatures:
    print(f"\nTesting temperature = {temp}")
    sc_results = self_consistency(probability_problem, num_samples=3, temperature=temp)
    temperature_results[temp] = sc_results
    
    print(f"Most common answer: {sc_results['most_common_answer']}")
    print(f"Confidence: {sc_results['confidence']:.2f} ({sc_results['answers'].count(sc_results['most_common_answer'])}/{len(sc_results['answers'])})")
    print(f"All answers: {sc_results['answers']}")
    print("-" * 50)

# Visualize the results
plt.figure(figsize=(10, 6))

# Plot confidence vs. temperature
confidences = [temperature_results[temp]['confidence'] for temp in temperatures]
plt.plot(temperatures, confidences, marker='o', linestyle='-', linewidth=2)

plt.xlabel('Temperature')
plt.ylabel('Confidence (Agreement Ratio)')
plt.title('Effect of Temperature on Self-Consistency Confidence')
plt.grid(True)
plt.show()
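
To judge which of the sampled answers is actually right, the marble problem can be solved exactly. The sketch below computes the probability both from the conditional-probability formula and by brute-force enumeration of all ordered draws. The variable names are illustrative.

```python
from fractions import Fraction
from itertools import permutations

bag = ["red"] * 3 + ["blue"] * 4 + ["green"] * 5

# Closed form: P(red first) * P(blue second | red first).
p_formula = Fraction(3, 12) * Fraction(4, 11)

# Brute force: enumerate every ordered pair of distinct marbles by index.
ordered_draws = list(permutations(range(len(bag)), 2))
hits = sum(1 for i, j in ordered_draws if bag[i] == "red" and bag[j] == "blue")
p_enumerated = Fraction(hits, len(ordered_draws))

print(f"Formula: {p_formula}, enumeration: {p_enumerated}, decimal: {float(p_formula):.4f}")
```

Because correct answers may be phrased as a fraction, a decimal, or a percentage, raw string matching can understate agreement between samples on this problem.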

## 4. Comparing Standard, CoT, and Self-Consistency Approaches

Let's compare the performance of standard prompting, CoT reasoning, and self-consistency on a set of challenging problems.

In [None]:
# Test problems with known answers
test_problems = [
    {
        "problem": "A store has 24 apples. They sell 1/3 of them in the morning and 1/4 of the remaining apples in the afternoon. How many apples are left?",
        "answer": "12"
    },
    {
        "problem": "If a train travels 60 miles per hour for 2.5 hours, how far does it travel?",
        "answer": "150 miles"
    },
    {
        "problem": "If the probability of rain on Saturday is 60% and the probability of rain on Sunday is 70%, what is the probability of rain on both days? Assume the events are independent.",
        "answer": "42%"
    }
]

# Function to check if an answer is correct
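def is_correct(predicted, actual):
    """Check if the predicted answer matches the actual answer."""
    # Compare the last number in each string (with an optional % sign), so
    # trailing periods, units, and intermediate arithmetic in the predicted
    # text do not cause false mismatches.
    number_pattern = r'\d+(?:\.\d+)?%?'
    predicted_numbers = re.findall(number_pattern, predicted)
    actual_numbers = re.findall(number_pattern, actual)
    if not predicted_numbers or not actual_numbers:
        return False
    return predicted_numbers[-1] == actual_numbers[-1]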

# Compare approaches
results = []

for problem_data in test_problems:
    problem = problem_data["problem"]
    correct_answer = problem_data["answer"]
    
    print(f"Problem: {problem}")
    print(f"Correct answer: {correct_answer}\n")
    
    # Standard approach
    standard_response = call_llm(standard_prompt(problem))
    standard_answer = extract_final_answer(standard_response)
    standard_correct = is_correct(standard_answer, correct_answer)
    
    # CoT approach
    cot_response = call_llm(zero_shot_cot_prompt(problem))
    cot_answer = extract_final_answer(cot_response)
    cot_correct = is_correct(cot_answer, correct_answer)
    
    # Self-consistency approach (simplified to 3 samples for demonstration)
    sc_results = self_consistency(problem, num_samples=3)
    sc_answer = sc_results["most_common_answer"]
    sc_correct = is_correct(sc_answer, correct_answer)
    
    print(f"Standard answer: {standard_answer} (Correct: {standard_correct})")
    print(f"CoT answer: {cot_answer} (Correct: {cot_correct})")
    print(f"Self-consistency answer: {sc_answer} (Correct: {sc_correct})")
    print("-" * 50)
    
    results.append({
        "problem": problem,
        "standard_correct": standard_correct,
        "cot_correct": cot_correct,
        "sc_correct": sc_correct
    })

# Calculate accuracy for each approach
standard_accuracy = sum(r["standard_correct"] for r in results) / len(results)
cot_accuracy = sum(r["cot_correct"] for r in results) / len(results)
sc_accuracy = sum(r["sc_correct"] for r in results) / len(results)

print(f"Standard accuracy: {standard_accuracy:.2f}")
print(f"CoT accuracy: {cot_accuracy:.2f}")
print(f"Self-consistency accuracy: {sc_accuracy:.2f}")

# Visualize the results
plt.figure(figsize=(10, 6))
methods = ["Standard", "Chain-of-Thought", "Self-Consistency"]
accuracies = [standard_accuracy, cot_accuracy, sc_accuracy]

plt.bar(methods, accuracies, color=["blue", "green", "orange"])
plt.ylim(0, 1.1)
plt.ylabel("Accuracy")
plt.title("Comparison of Prompting Methods")

for i, v in enumerate(accuracies):
    plt.text(i, v + 0.05, f"{v:.2f}", ha="center")

plt.show()

## 5. Conclusion

In this notebook, we've explored and implemented Chain-of-Thought (CoT) reasoning and Self-consistency methods. Here are the key takeaways:

1. **Chain-of-Thought (CoT) reasoning** can improve performance on complex reasoning tasks by having the model break a problem into step-by-step solutions before answering.

2. **Zero-shot CoT** can be implemented simply by adding "Let's think step by step" to prompts, making it a very efficient technique.

3. **Few-shot CoT** can provide stronger guidance by showing worked examples of step-by-step reasoning, which helps the model follow the expected reasoning and answer format.

4. **Self-consistency** further improves reliability by generating multiple reasoning paths and selecting the most consistent answer, reducing the impact of individual reasoning errors.

5. **Temperature settings** control the diversity of reasoning paths in self-consistency: lower temperatures generally produce more similar samples and therefore higher agreement, but near-identical samples add little beyond a single response and may miss alternative solution paths.

These advanced prompting techniques are particularly valuable for complex reasoning tasks like math problems, logical reasoning, and multi-step decision-making processes.

In the next part, we'll explore function calling and tool use with JSON schemas.