# 🐛 SOFAI-Core: Code Debugging Domain

## Automated Bug Fixing with DebugBench & LeetCode Validation

In this notebook, you will learn:

1. **What is the Code Debugging Domain?** - Problem definition and components
2. **The DebugBench Dataset** - 17 bug types, 4,253+ instances
3. **Setting Up LeetCode Validation** - Step-by-step session cookie setup
4. **Domain Architecture** - All components explained
5. **Running the Full Pipeline** - End-to-end debugging with SOFAI

---

### ⚠️ Important Prerequisites

This domain requires:
1. **DebugBench Dataset** - Already included in `domains/code_debugging/data/`
2. **LeetCode Account** - For real code validation
3. **LEETCODE_SESSION Cookie** - See setup instructions below
4. **Ollama** - For LLM inference (optional for component demos)

In [None]:
# ============================================================
# COLAB SETUP - Run this cell first if using Google Colab
# ============================================================
import subprocess
import sys
import os

# Check if running in Colab
IN_COLAB = 'google.colab' in sys.modules

if IN_COLAB:
    print("Running in Google Colab - Setting up environment...")
    
    # Clone the repository
    if not os.path.exists('SOFAI-LM'):
        !git clone https://github.com/YOUR_USERNAME/SOFAI-LM.git
    %cd SOFAI-LM
    
    # Install dependencies
    !pip install -q -r requirements.txt
    
    # Install Ollama for Colab
    !curl -fsSL https://ollama.com/install.sh | sh
    
    # Start Ollama server in background (subprocess is already imported above)
    subprocess.Popen(['ollama', 'serve'], stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
    import time
    time.sleep(3)  # Give the server a moment to start
    
    print("✅ Colab setup complete!")
else:
    print("Running locally - no Colab setup needed.")


---

## Part 1: Understanding the Code Debugging Domain

### What is Code Debugging?

**Code Debugging** is the process of identifying and fixing errors (bugs) in source code. In the SOFAI framework, this is treated as a Constraint Satisfaction Problem (CSP) where:

- **Problem**: A buggy code snippet with a known problem description
- **Solution**: Fixed code that passes all test cases
- **Constraint**: The fixed code must be semantically equivalent to the intended solution

### The Challenge

Unlike graph coloring, where validation is deterministic, code debugging requires:
- **Semantic understanding** of what the code should do
- **Syntax correctness** (no compile errors)
- **Passing all test cases** (often hidden)

### SOFAI's Approach

```
┌─────────────────────────────────────────────────────────────────┐
│                     Code Debugging Flow                         │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│  1. DEBUGBENCH        2. PROMPT             3. LLM              │
│  ───────────────      ────────             ─────                │
│  Load buggy code  →   IO_INTENTION   →    Generate             │
│  + description        format prompt        fixed code           │
│                                                                 │
│       ↓                                                         │
│                                                                 │
│  4. PARSER            5. LEETCODE API      6. FEEDBACK          │
│  ──────────           ─────────────        ──────────          │
│  Extract code    →    Submit &        →   Pass: Done!          │
│  from <code> tags     run tests           Fail: Retry          │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘
```
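
The flow above amounts to a generate, validate, retry loop. The sketch below is a minimal illustration of that loop, not the SOFAI implementation: `debug_loop`, `fake_llm`, and `fake_validator` are hypothetical names, and the validator simply compiles the candidate locally instead of submitting it to LeetCode.

```python
from typing import Callable, Optional, Tuple


def debug_loop(
    buggy_code: str,
    generate_fix: Callable[[str, Optional[str]], str],
    validate: Callable[[str], Tuple[bool, str]],
    max_attempts: int = 3,
) -> Tuple[bool, str, int]:
    """Generate a fix, validate it, and retry with feedback until it passes."""
    feedback: Optional[str] = None
    candidate = buggy_code
    for attempt in range(1, max_attempts + 1):
        candidate = generate_fix(buggy_code, feedback)  # prompt + LLM + parse
        passed, feedback = validate(candidate)          # run the tests
        if passed:
            return True, candidate, attempt
    return False, candidate, max_attempts


# Hypothetical stand-in for the LLM: it only fixes the bug once it sees feedback.
def fake_llm(code: str, feedback: Optional[str]) -> str:
    return code if feedback is None else code.replace("def f()", "def f():")


# Hypothetical stand-in for the validator: a local syntax check only.
def fake_validator(code: str) -> Tuple[bool, str]:
    try:
        compile(code, "<candidate>", "exec")
        return True, "Accepted"
    except SyntaxError as exc:
        return False, f"Compile Error: {exc.msg}"


solved, fixed, attempts = debug_loop("def f()\n    return 1\n", fake_llm, fake_validator)
print(solved, attempts)
```

Passing the generator and validator in as callables keeps the loop itself independent of Ollama and LeetCode, which makes it easy to exercise offline.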

In [None]:
# Setup: Add project root to path
import sys
import os

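# Only step up one level when the notebook is run from the notebooks/ directory itself
project_root = os.path.dirname(os.getcwd()) if os.path.basename(os.getcwd()) == 'notebooks' else os.getcwd()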
if project_root not in sys.path:
    sys.path.insert(0, project_root)

print(f"Project root: {project_root}")

---

## Part 2: The DebugBench Dataset

### What is DebugBench?

[DebugBench](https://github.com/thunlp/DebugBench) is a benchmark dataset for code debugging containing:
- **4,253 buggy code instances** in total across C++, Java, and Python3 (this domain uses the Python3 subset)
- **17 distinct bug types**
- Problems sourced from LeetCode
- Oracle (correct) solutions for reference

### The 17 Bug Types

| Category | Bug Types |
|----------|----------|
| **Logic Errors** | condition error, operation error, variable error |
| **Syntax Errors** | missing colons, illegal indentation, unclosed parentheses, unclosed string |
| **Reference Errors** | undefined methods, undefined objects, faulty indexing |
| **Semantic Errors** | misused == or =, illegal keywords, illegal comment |
| **Multiple Bugs** | double (2 bugs), triple (3 bugs), quadruple (4 bugs), other error |

### Dataset Location

The dataset is pre-loaded in:
```
domains/code_debugging/data/
├── python3_condition error.json
├── python3_double.json
├── python3_faulty indexing.json
├── python3_illegal comment.json
├── python3_illegal indentation.json
├── python3_illegal keywords.json
├── python3_missing colons.json
├── python3_misused == or =.json
├── python3_operation error.json
├── python3_other error.json
├── python3_quadruple.json
├── python3_triple.json
├── python3_unclosed parentheses.json
├── python3_unclosed string.json
├── python3_undefined methods.json
├── python3_undefined objects.json
└── python3_variable error.json
```
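
The file names follow the pattern `python3_<bug type>.json`, spaces included. As a rough sketch of how a bug type maps to a file, the helpers below build the path and read the raw JSON. `dataset_path` and `load_raw_instances` are hypothetical names, and the assumption that each file holds a JSON array of instances has not been checked against the repository; the project's own `data_loader` module is the supported way to load problems.

```python
import json
from pathlib import Path
from typing import Any, List

# Directory named in the listing above, relative to the project root
DATA_DIR = Path("domains/code_debugging/data")


def dataset_path(bug_type: str, data_dir: Path = DATA_DIR) -> Path:
    """Map a bug type such as 'missing colons' to its JSON file."""
    return data_dir / f"python3_{bug_type}.json"


def load_raw_instances(bug_type: str, data_dir: Path = DATA_DIR) -> List[Any]:
    """Read one file; assumes (unverified) that it contains a JSON array."""
    with dataset_path(bug_type, data_dir).open(encoding="utf-8") as fh:
        data = json.load(fh)
    return data if isinstance(data, list) else [data]


print(dataset_path("missing colons").name)  # python3_missing colons.json
```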

In [None]:
# Explore the bug types and dataset size
from domains.code_debugging.data_loader import BUG_TYPES, get_problem_count, DEBUGBENCH_PATH

print("📊 DebugBench Dataset Overview")
print("=" * 60)
print(f"\nDataset location: {DEBUGBENCH_PATH}")
print(f"\nAvailable bug types ({len(BUG_TYPES)} total):")
print("-" * 60)

total_problems = 0
for bug_type in BUG_TYPES:
    count = get_problem_count(bug_type=bug_type)
    total_problems += count
    print(f"  • {bug_type:25s} : {count:4d} problems")

print("-" * 60)
print(f"  {'TOTAL':25s} : {total_problems:4d} problems")

In [None]:
# Load and examine a sample problem
from domains.code_debugging.data_loader import load_problem_from_dataset

print("📝 Sample Problem (condition error)")
print("=" * 60)

# Load a specific bug type
problem = load_problem_from_dataset(language="Python3", bug_type="condition error", problem_index=0)

print(f"\n🔖 Problem: {problem.slug}")
print(f"📊 Level: {problem.level}")
print(f"🐛 Bug Type: {problem.bug_type}")
print(f"\n📋 Description:")
print(problem.description[:500] + "..." if len(problem.description) > 500 else problem.description)

In [None]:
# Show the buggy code
print("\n🐛 Buggy Code:")
print("-" * 60)
print(problem.buggy_code)

In [None]:
# Show the oracle (correct) code for comparison
print("\n✅ Oracle (Correct) Code:")
print("-" * 60)
print(problem.oracle_code)

In [None]:
# The CodeDebuggingProblem dataclass
print("\n📦 CodeDebuggingProblem Structure")
print("=" * 60)
print("""
@dataclass
class CodeDebuggingProblem:
    slug: str          # LeetCode problem ID (e.g., 'two-sum')
    description: str   # Problem statement
    examples: List[str] # Input/output examples
    constraints: str    # Problem constraints
    level: str         # 'easy', 'medium', 'hard'
    buggy_code: str    # The code to fix
    oracle_code: str   # Reference correct solution
    explanations: str  # Bug explanation
    content: str       # Full problem content
    bug_type: str      # One of the 17 bug types
""")
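
To make the structure above concrete, here is a standalone sketch that declares the same fields and builds one instance by hand. The class is named `CodeDebuggingProblemSketch` so it cannot shadow the real class, and every field value is made up for illustration rather than taken from DebugBench.

```python
from dataclasses import dataclass, fields
from typing import List


@dataclass
class CodeDebuggingProblemSketch:
    slug: str
    description: str
    examples: List[str]
    constraints: str
    level: str
    buggy_code: str
    oracle_code: str
    explanations: str
    content: str
    bug_type: str


# Illustrative values only; not a real dataset entry.
example = CodeDebuggingProblemSketch(
    slug="two-sum",
    description="Return the indices of two numbers that add up to target.",
    examples=["Input: nums = [2,7,11,15], target = 9\nOutput: [0,1]"],
    constraints="2 <= nums.length <= 10^4",
    level="easy",
    buggy_code="def f()\n    return 1",
    oracle_code="def f():\n    return 1",
    explanations="The function header is missing a colon.",
    content="(full problem statement)",
    bug_type="missing colons",
)

field_names = [f.name for f in fields(example)]
print(field_names)
```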

---

## Part 3: Setting Up LeetCode Validation 🔐

The Code Debugging domain uses **real LeetCode submission** to validate solutions. This ensures the fixed code actually works!

### Why LeetCode Validation?

- ✅ Real test cases (including hidden ones)
- ✅ Accurate feedback (runtime errors, wrong answers)
- ✅ Performance metrics (runtime, memory)
- ⚠️ Requires LeetCode account
- ⚠️ Rate limited (15s cooldown between submissions)

### Step-by-Step Setup

#### Step 1: Log into LeetCode

1. Open your browser and go to [leetcode.com](https://leetcode.com)
2. Log into your account

#### Step 2: Get the Session Cookie

**For Chrome/Edge:**
1. Right-click anywhere on LeetCode → "Inspect" (or press F12)
2. Go to **Application** tab → **Cookies** → `https://leetcode.com`
3. Find the cookie named `LEETCODE_SESSION`
4. Copy its **Value** (it's a long string starting with `eyJ...`)

**For Firefox:**
1. Right-click → "Inspect Element" → **Storage** tab → **Cookies**
2. Find `LEETCODE_SESSION` and copy its value

#### Step 3: Set Environment Variable

```bash
# In terminal (temporary, for current session)
export LEETCODE_SESSION='eyJ...your_long_session_value...'

# Or add to ~/.bashrc or ~/.zshrc (permanent)
echo 'export LEETCODE_SESSION="eyJ..."' >> ~/.zshrc
source ~/.zshrc
```

#### Step 4: Verify Setup

In [None]:
# Check if LEETCODE_SESSION is set
from domains.code_debugging.utils import check_leetcode_session

print("🔐 LeetCode Session Check")
print("=" * 60)

if check_leetcode_session():
    session = os.environ.get('LEETCODE_SESSION', '')
    print("✅ LEETCODE_SESSION is set!")
    # Avoid echoing any part of the cookie: notebook outputs are often saved and shared
    print(f"   Cookie length: {len(session)} characters (value hidden)")
    print("\n   LeetCode validation will work.")
else:
    print("❌ LEETCODE_SESSION is NOT set!")
    print("\n   To set it, run in your terminal:")
    print("   export LEETCODE_SESSION='your_session_cookie_value'")
    print("\n   Or set it here for this notebook session:")
    print("   (uncomment and fill in the next cell)")

In [None]:
# OPTIONAL: Set the session cookie here (for this notebook only)
# ⚠️ WARNING: Do not commit this value to git!

# Uncomment and fill in your session cookie:
# os.environ['LEETCODE_SESSION'] = 'eyJ...your_session_value_here...'

# Then re-run the check:
# print("Session set!" if check_leetcode_session() else "Failed to set session")
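
If you would rather not paste the cookie into a cell at all, one option is to prompt for it with the standard-library `getpass`, which does not echo the input. This is a sketch under that assumption; `set_leetcode_session` is a hypothetical helper, not part of the project.

```python
import os
from getpass import getpass
from typing import Callable


def set_leetcode_session(prompt: Callable[[str], str] = getpass) -> str:
    """Prompt for the cookie, store it in the environment, and report its length."""
    value = prompt("Paste LEETCODE_SESSION: ").strip()
    if not value:
        raise ValueError("Empty session cookie")
    os.environ["LEETCODE_SESSION"] = value
    # Return only the length so the secret never lands in the cell output
    return f"{len(value)} characters stored"


# Interactive use (uncomment in the notebook):
# print(set_leetcode_session())
```

The `prompt` parameter exists only so the helper can be exercised without a keyboard, for example by passing a function that returns a dummy value.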

### Important Notes About LeetCode Validation

| Aspect | Details |
|--------|--------|
| **Rate Limiting** | 15-second cooldown between submissions (enforced by code) |
| **Session Expiry** | Cookies expire. If validation fails, get a new session cookie. |
| **Account Risk** | Using automated submissions may violate LeetCode ToS. Use responsibly. |
| **Without Session** | Domain can still load problems and parse solutions, but cannot validate. |

---

## Part 4: Domain Architecture Deep Dive

### File Structure

```
domains/code_debugging/
├── code_debugging_domain.py    # Main DomainInterface implementation
├── data_loader.py              # Load problems from DebugBench JSON
├── validator.py                # LeetCode API wrapper
├── leetcode_tester.py          # LeetCodeTester class
├── leetcode_env/               # LeetCode environment (from DebugBench)
├── prompt_builder.py           # IO_INTENTION_PROMPT construction
├── solution_parser.py          # Extract <code></code> tags
├── utils.py                    # Helper functions
└── data/                       # DebugBench JSON files
```

### 4.1 The Data Loader

In [None]:
print("📂 Data Loader")
print("=" * 60)
print("""
load_problem_from_dataset(
    language='Python3',        # Only Python3 supported
    bug_type=None,             # Specific type or None for random
    problem_index=None         # Specific index or None for random
) -> CodeDebuggingProblem

Loads problems from domains/code_debugging/data/python3_*.json
""")

# Demo: Load random problem
random_problem = load_problem_from_dataset()
print(f"\nRandom problem loaded:")
print(f"  Slug: {random_problem.slug}")
print(f"  Bug type: {random_problem.bug_type}")
print(f"  Level: {random_problem.level}")

### 4.2 The Prompt Builder (IO_INTENTION_PROMPT)

In [None]:
from domains.code_debugging.prompt_builder import build_debugging_prompt, IO_INTENTION_PROMPT_TEMPLATE

print("📝 Prompt Builder (IO_INTENTION_PROMPT)")
print("=" * 60)
print("\nTemplate structure:")
print("-" * 60)
print(IO_INTENTION_PROMPT_TEMPLATE[:500] + "...")

In [None]:
# Generate a full prompt
print("\n📜 Generated Prompt Example")
print("=" * 60)

prompt = build_debugging_prompt(problem, episodic_examples=None)
print(prompt[:1500] + "..." if len(prompt) > 1500 else prompt)

### Key Points About the Prompt:

1. **Structured Format**: Uses `{LANG}`, `{DESCRIPTION}`, `{EXAMPLES}`, `{CONSTRAINTS}`, `{BUGGY_CODE}`

2. **Expected Output**: LLM must respond with:
   ```
   <code>
   fixed code here
   </code>
   <exp>
   short explanation about the bug
   </exp>
   ```

3. **Episodic Examples**: Can include past problem-solution pairs for few-shot learning
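
As a rough illustration of how the placeholders get filled, the sketch below uses `str.format` on a made-up template. The wording of `SKETCH_TEMPLATE` and the name `build_prompt_sketch` are assumptions for this example; the real text lives in `IO_INTENTION_PROMPT_TEMPLATE`.

```python
from typing import List

# Hypothetical template: only the placeholder names come from the notebook.
SKETCH_TEMPLATE = (
    "Fix the bug in the following {LANG} code.\n\n"
    "Description:\n{DESCRIPTION}\n\n"
    "Examples:\n{EXAMPLES}\n\n"
    "Constraints:\n{CONSTRAINTS}\n\n"
    "Buggy code:\n{BUGGY_CODE}\n\n"
    "Respond with <code>...</code> followed by <exp>...</exp>."
)


def build_prompt_sketch(
    lang: str,
    description: str,
    examples: List[str],
    constraints: str,
    buggy_code: str,
) -> str:
    """Fill every placeholder; braces inside the values are left untouched."""
    return SKETCH_TEMPLATE.format(
        LANG=lang,
        DESCRIPTION=description,
        EXAMPLES="\n".join(examples),
        CONSTRAINTS=constraints,
        BUGGY_CODE=buggy_code,
    )


sketch_prompt = build_prompt_sketch(
    "Python3",
    "Count the occurrences of each value.",
    ["Input: [1,1,2]", "Output: {1: 2, 2: 1}"],
    "1 <= n <= 100",
    "counts = {}\nfor v in values\n    counts[v] = counts.get(v, 0) + 1",
)
print(sketch_prompt)
```

Because `str.format` parses only the template, code containing `{` or `}` can be substituted safely; the template itself must avoid stray braces.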

### 4.3 The Solution Parser

In [None]:
from domains.code_debugging.solution_parser import parse_fixed_code, parse_explanation

print("🔍 Solution Parser")
print("=" * 60)
print("""
Extracts fixed code from LLM response.

Priority:
1. <code>...</code> tags (preferred)
2. ```python...``` markdown blocks
3. ```...``` generic code blocks
4. Entire response (fallback)
""")

# Test parsing
test_responses = [
    # Correct format
    """<code>
class Solution:
    def twoSum(self, nums, target):
        for i in range(len(nums)):
            for j in range(i+1, len(nums)):
                if nums[i] + nums[j] == target:
                    return [i, j]
</code>
<exp>
Fixed the loop range to avoid index out of bounds
</exp>""",
    
    # Markdown format
    """Here is the fixed code:
```python
def solution():
    return True
```""",
]

for i, response in enumerate(test_responses, 1):
    code = parse_fixed_code(response)
    exp = parse_explanation(response)
    print(f"\nResponse {i}:")
    print(f"  Extracted code ({len(code)} chars): {code[:50]}...")
    print(f"  Explanation: {exp if exp else 'None found'}")
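
The priority order listed above can be sketched with three regular expressions tried in sequence. This is a guess at the approach rather than the actual `solution_parser` code, and `extract_code_sketch` is a hypothetical name. The fence marker is built with `"`" * 3` only so the example can sit inside a fenced block.

```python
import re

FENCE = "`" * 3  # a markdown code fence

# Patterns in priority order: <code> tags, python fences, generic fences
_PATTERNS = [
    re.compile(r"<code>(.*?)</code>", re.DOTALL),
    re.compile(FENCE + r"python\s*\n(.*?)" + FENCE, re.DOTALL),
    re.compile(FENCE + r"[^\n]*\n(.*?)" + FENCE, re.DOTALL),
]


def extract_code_sketch(response: str) -> str:
    """Return the first match in priority order, else the whole response."""
    for pattern in _PATTERNS:
        match = pattern.search(response)
        if match:
            # Strip only newlines so leading indentation is preserved
            return match.group(1).strip("\n")
    return response.strip()


tagged = extract_code_sketch("<code>\nx = 1\n</code>\n<exp>fixed</exp>")
fenced = extract_code_sketch("Here you go:\n" + FENCE + "python\ny = 2\n" + FENCE)
print(repr(tagged), repr(fenced))
```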

### 4.4 The LeetCode Validator

In [None]:
print("✅ LeetCode Validator")
print("=" * 60)
print("""
validate_code_with_leetcode(
    code: str,           # The Python code to submit
    task_id: str,        # LeetCode problem slug (e.g., 'two-sum')
    language: str        # 'Python3'
) -> Tuple[bool, Dict]

Returns:
  - (True, {status_msg: 'Accepted', runtime, memory})
  - (False, {status_msg, last_testcase, expected_output, code_output, ...})

Features:
  • 15-second cooldown between submissions (prevents rate limiting)
  • Singleton pattern (reuses LeetCodeTester instance)
  • Handles environment errors gracefully
""")

In [None]:
# Demonstrate validation (without actually submitting)
print("\n📋 Validation Flow")
print("-" * 60)
print("""
1. Check LEETCODE_SESSION environment variable
2. Create LeetCodeTester (singleton)
3. Wait for cooldown if needed (15s)
4. Submit code to LeetCode API
5. Wait for results
6. Return (is_valid, feedback)
""")

# Show what feedback looks like
print("\n📊 Example Feedback (Success):")
print({
    'status_msg': 'Accepted',
    'runtime': '40 ms',
    'memory': '14.2 MB',
    'status_runtime': 'beats 95%'
})

print("\n📊 Example Feedback (Failure):")
print({
    'status_msg': 'Wrong Answer',
    'last_testcase': 'nums = [2,7,11,15], target = 9',
    'expected_output': '[0, 1]',
    'code_output': '[1, 0]'
})

---

## Part 5: The Complete Domain Implementation

In [None]:
from domains.code_debugging.code_debugging_domain import CodeDebuggingDomain

print("🎮 Complete Domain Workflow (Without LLM)")
print("=" * 60)

# Step 1: Create domain
domain = CodeDebuggingDomain()
print("\n1️⃣ Created CodeDebuggingDomain")

# Step 2: Generate a problem
problem = domain.generate_problem(bug_type="condition error")
print(f"\n2️⃣ Generated Problem:")
print(f"   Slug: {problem.slug}")
print(f"   Bug Type: {problem.bug_type}")
print(f"   Level: {problem.level}")

# Step 3: Build prompt
prompt = domain.build_prompt(problem, episodic_examples=None)
print(f"\n3️⃣ Built Prompt ({len(prompt)} chars)")

# Step 4: Simulate LLM response (use oracle code)
simulated_response = f"<code>\n{problem.oracle_code}\n</code>\n<exp>Fixed the condition error</exp>"
print(f"\n4️⃣ Simulated LLM Response (using oracle code)")

# Step 5: Parse solution
solution = domain.parse_solution(simulated_response)
print(f"\n5️⃣ Parsed Solution ({len(solution)} chars)")
print(f"   First 100 chars: {solution[:100]}...")

# Step 6: Memory representation
prob_repr = domain.get_problem_representation(problem)
sol_repr = domain.format_solution_for_memory(solution)
print(f"\n6️⃣ Memory Representations:")
print(f"   Problem: {prob_repr[:80]}...")
print(f"   Solution: {sol_repr[:80]}...")

In [None]:
# Test validation (only if LEETCODE_SESSION is set)
print("\n7️⃣ Validation Step")
print("-" * 60)

if check_leetcode_session():
    print("⚠️ LeetCode session is set.")
    print("   To actually validate, uncomment the code below.")
    print("   WARNING: This will submit to LeetCode and count toward rate limits!")
    
    # Uncomment to actually validate:
    # is_valid, feedback = domain.validate_solution(problem, solution)
    # print(f"   Valid: {is_valid}")
    # print(f"   Feedback: {domain.format_feedback(feedback)}")
else:
    print("❌ LEETCODE_SESSION not set - skipping validation")
    print("   Set the environment variable to enable real validation.")

---

## Part 6: Running with the Full SOFAI Framework 🚀

**Requirements:**
- Ollama running with a model
- LEETCODE_SESSION set (for validation)

Without a LeetCode session, the domain will still run, but every validation attempt will fail.

In [None]:
# system deps
!sudo apt-get update
!sudo apt-get install -y zstd


In [None]:
# Install Ollama
!curl -fsSL https://ollama.com/install.sh | sh
!pip install -q ollama

!nvidia-smi
!ollama serve > /tmp/ollama.log 2>&1 &
!sleep 2



In [None]:
# Check Ollama availability and ensure model is pulled
import subprocess

MODEL_NAME = 'mistral'  # Change this to use a different model

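def check_ollama():
    """Return True if the `ollama` CLI is installed and responds."""
    try:
        result = subprocess.run(['ollama', 'list'], capture_output=True, text=True, timeout=5)
        if result.returncode == 0:
            print("✅ Ollama is available!")
            return True
    except (OSError, subprocess.TimeoutExpired):
        pass  # CLI not installed, or the server did not respond in time
    print("❌ Ollama not available")
    return False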

def ensure_model_available(model_name):
    """Check if model exists, pull if not."""
    try:
        result = subprocess.run(['ollama', 'list'], capture_output=True, text=True, timeout=10)
        if model_name in result.stdout:
            print(f"✅ Model '{model_name}' is available.")
            return True
        
        print(f"⬇️ Model '{model_name}' not found. Pulling...")
        pull_result = subprocess.run(['ollama', 'pull', model_name], timeout=600)
        if pull_result.returncode == 0:
            print(f"✅ Model '{model_name}' pulled successfully.")
            return True
    except Exception as e:
        print(f"❌ Error: {e}")
    return False

ollama_available = check_ollama()
if ollama_available:
    ensure_model_available(MODEL_NAME)


In [None]:
if ollama_available:
    from core.metacognitive_module import MCModule
    
    print("=" * 60)
    print("🐛 Running SOFAI with Code Debugging Domain")
    print("=" * 60)
    
    # Create domain and problem
    domain = CodeDebuggingDomain()
    problem = domain.generate_problem(bug_type="missing colons")  # Easier bug type
    
    print(f"\n📊 Problem: {problem.slug}")
    print(f"   Bug Type: {problem.bug_type}")
    print(f"   Level: {problem.level}")
    print(f"\n🐛 Buggy Code Preview:")
    print(problem.buggy_code[:300] + "...")
    
    # Create MCModule
    mc = MCModule(
        domain=domain,
        llm_model=MODEL_NAME,  # Same model that was checked/pulled above
        max_iterations=2  # Fewer iterations due to LeetCode rate limits
    )
    
    print("\n⏳ Starting solve process...")
    print("   (Note: Each iteration may take 15+ seconds due to LeetCode cooldown)")
    
    # Solve!
    result = mc.solve(problem, verbose=True)
    
    # Display results
    print("\n" + "=" * 60)
    print("📋 Final Results")
    print("=" * 60)
    print(f"Solved: {result['solved']}")
    print(f"Solved by: {'S1' if result['s1_solved'] else 'S2' if result['s2_solved'] else 'None'}")
    print(f"Iterations: {result['iterations']}")
    print(f"Total time: {result['total_time']:.2f}s")
    if result['solution']:
        print(f"\n✅ Fixed Code Preview:")
        print(result['solution'][:300] + "...")
else:
    print("\n⚠️ Skipping live demo - Ollama not available")

---

## Part 7: Understanding the Feedback Loop

When validation fails, the domain provides detailed feedback to help the LLM correct its solution.

In [None]:
# Demonstrate feedback formatting
print("🔄 Feedback Examples")
print("=" * 60)

domain = CodeDebuggingDomain()

# Example 1: Accepted
feedback1 = {'status_msg': 'Accepted', 'runtime': '40 ms', 'memory': '14.2 MB'}
print(f"\n✅ Success Feedback:")
print(f"   {domain.format_feedback(feedback1)}")

# Example 2: Wrong Answer
feedback2 = {
    'status_msg': 'Wrong Answer',
    'last_testcase': 'nums = [2,7,11,15], target = 9',
    'expected_output': '[0, 1]',
    'code_output': '[1, 0]'
}
print(f"\n❌ Wrong Answer Feedback:")
print(f"   {domain.format_feedback(feedback2)}")

# Example 3: Runtime Error
feedback3 = {
    'status_msg': 'Runtime Error',
    'full_runtime_error': 'IndexError: list index out of range'
}
print(f"\n❌ Runtime Error Feedback:")
print(f"   {domain.format_feedback(feedback3)}")

# Example 4: Compile Error
feedback4 = {
    'status_msg': 'Compile Error',
    'compile_error': 'SyntaxError: invalid syntax at line 5'
}
print(f"\n❌ Compile Error Feedback:")
print(f"   {domain.format_feedback(feedback4)}")
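
To show what turning these dictionaries into LLM-readable text might look like, here is a standalone sketch. It is not the domain's `format_feedback`; the function name, labels, and output layout are assumptions, and only the dictionary keys come from the examples above.

```python
from typing import Dict

# Keys taken from the feedback examples above; the labels are made up.
_FAILURE_FIELDS = [
    ("last_testcase", "Last test case"),
    ("expected_output", "Expected"),
    ("code_output", "Got"),
    ("full_runtime_error", "Runtime error"),
    ("compile_error", "Compile error"),
]


def format_feedback_sketch(feedback: Dict[str, str]) -> str:
    """Flatten a feedback dict into a single line of text."""
    status = feedback.get("status_msg", "Unknown")
    if status == "Accepted":
        runtime = feedback.get("runtime", "n/a")
        memory = feedback.get("memory", "n/a")
        return f"Accepted (runtime: {runtime}, memory: {memory})"
    parts = [status]
    for key, label in _FAILURE_FIELDS:
        if feedback.get(key):
            parts.append(f"{label}: {feedback[key]}")
    return " | ".join(parts)


print(format_feedback_sketch({
    "status_msg": "Wrong Answer",
    "last_testcase": "nums = [2,7,11,15], target = 9",
    "expected_output": "[0, 1]",
    "code_output": "[1, 0]",
}))
```

Keeping the failure fields in a single ordered list means a new status detail can be surfaced by adding one tuple.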

---

## Part 8: Best Practices & Troubleshooting

### Common Issues

| Issue | Solution |
|-------|----------|
| `EnvironmentError: LEETCODE_SESSION not set` | Set the environment variable (see Part 3) |
| `Session expired` | Get a new session cookie from LeetCode |
| `Rate limited` | Wait longer between submissions (cooldown is 15s) |
| `Timeout` | LeetCode API may be slow; increase timeout |
| `404 error on problem` | Problem slug might be wrong in dataset |

### Tips for Better Results

1. **Start with syntax bugs** (missing colons, indentation) - easier for LLMs
2. **Use fewer iterations** to avoid rate limiting
3. **Consider using S2 with a stronger model** for logic bugs
4. **Monitor LeetCode daily submission limits**

---

## Summary: Key Takeaways 🎯

1. **Code Debugging as CSP**: Fix buggy code to pass all test cases

2. **DebugBench Dataset**: Python3 problems across 17 bug types, drawn from DebugBench's 4,253 instances

3. **LeetCode Validation**: Real test execution for accurate feedback
   - Requires `LEETCODE_SESSION` environment variable
   - 15-second cooldown between submissions

4. **IO_INTENTION_PROMPT**: Structured format for LLM debugging
   - Expects `<code>...</code>` and `<exp>...</exp>` tags

5. **Feedback Loop**: Detailed error messages help the LLM iterate

6. **Challenges**: Rate limiting, session expiry, complex logic bugs

---

## Next Steps

- Review the Graph Coloring notebook for comparison
- Try creating a custom domain using the templates
- Experiment with different LLM models for code debugging

Happy debugging! 🐛🔧