# Demo 1: The Self-Reparation

## Concept: Self-Correction / Reflexion

In this demo, we'll explore the most basic feedback loop for self-improving code:

```
Code → Error → LLM → Fixed Code
```

### What We'll Build

A Python script that:
1. Executes a buggy function
2. Captures the error (traceback)
3. Sends the code + error to an LLM
4. Receives corrected code
5. Replaces the buggy function and runs successfully

### Key Insight

We're using the **traceback (the error text Python would otherwise print to stderr)** as a "learning signal" for the LLM. This closes the feedback loop between execution and improvement.
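
To make the "learning signal" concrete, here is a minimal sketch comparing the bare exception message with the full traceback string. The function name `buggy_average` is a hypothetical stand-in, not part of the demo code below.

```python
import traceback

def buggy_average(numbers):
    """Hypothetical stand-in for a function that fails on an empty list."""
    return sum(numbers) / len(numbers)

try:
    buggy_average([])
except ZeroDivisionError as exc:
    # The message alone says what happened, but not where.
    short_message = str(exc)
    # The formatted traceback adds the exception type, call stack, and line numbers.
    full_traceback = traceback.format_exc()

print("Message only:", short_message)
print("Full traceback:")
print(full_traceback)
```

The traceback names the failing function and the exception type, which is the extra context the LLM gets beyond the one-line message.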

---

### Related Papers

- **Self-Refine**: Iterative Refinement with Self-Feedback  
  [arXiv:2303.17651](https://arxiv.org/abs/2303.17651)

- **Reflexion**: Language Agents with Verbal Reinforcement Learning  
  [arXiv:2303.11366](https://arxiv.org/abs/2303.11366)

## Setup

You have three options for the LLM provider:

### Option 1: OpenAI (Recommended)
- Get an API key from [platform.openai.com](https://platform.openai.com)
- Add secret `OPENAI_API_KEY` in Colab Secrets

### Option 2: Google Gemini (FREE)
- Get a free API key from [Google AI Studio](https://aistudio.google.com/apikey)
- Add secret `GEMINI_API_KEY` in Colab Secrets

### Option 3: Groq (FREE - Very Fast)
- Get a free API key from [console.groq.com](https://console.groq.com)
- Add secret `GROQ_API_KEY` in Colab Secrets
- Uses gpt-oss model

In the next cell, uncomment the option you want to use.

In [2]:
# ============================================================
# OPTION 1: OpenAI (Recommended - requires API key with credits)
# ============================================================
!pip install openai -q

from google.colab import userdata
from openai import OpenAI

client = OpenAI(api_key=userdata.get('OPENAI_API_KEY'))

print("Setup complete! Using OpenAI")

# ============================================================
# OPTION 2: Google Gemini (FREE - uncomment below, comment above)
# ============================================================
# !pip install google-generativeai -q
#
# from google.colab import userdata
# import google.generativeai as genai
#
# genai.configure(api_key=userdata.get('GEMINI_API_KEY'))
#
# # Wrapper class to make Gemini API compatible with OpenAI-style calls
# class GeminiClient:
#     def __init__(self):
#         self._model = genai.GenerativeModel('gemini-1.5-flash')
#
#     class _Completions:
#         def __init__(self, model):
#             self._model = model
#
#         def create(self, model=None, messages=None, temperature=0.7, **kwargs):
#             # Convert OpenAI message format to Gemini prompt
#             prompt_parts = []
#             for msg in messages:
#                 role = msg.get('role', 'user')
#                 content = msg.get('content', '')
#                 if role == 'system':
#                     prompt_parts.append(f"Instructions: {content}")
#                 else:
#                     prompt_parts.append(content)
#
#             prompt = "\n\n".join(prompt_parts)
#
#             response = self._model.generate_content(
#                 prompt,
#                 generation_config=genai.GenerationConfig(temperature=temperature)
#             )
#
#             # Create OpenAI-compatible response structure
#             class Message:
#                 def __init__(self, text):
#                     self.content = text
#
#             class Choice:
#                 def __init__(self, text):
#                     self.message = Message(text)
#
#             class Response:
#                 def __init__(self, text):
#                     self.choices = [Choice(text)]
#
#             return Response(response.text)
#
#     @property
#     def chat(self):
#         class Chat:
#             def __init__(chat_self):
#                 chat_self.completions = GeminiClient._Completions(self._model)
#         return Chat()
#
# client = GeminiClient()
#
# print("Setup complete! Using Google Gemini (FREE)")

# ============================================================
# OPTION 3: Groq (FREE - very fast, uncomment below, comment above)
# ============================================================
# !pip install openai -q
#
# from google.colab import userdata
# from openai import OpenAI
#
# client = OpenAI(
#     api_key=userdata.get('GROQ_API_KEY'),
#     base_url="https://api.groq.com/openai/v1"
# )
#
# # IMPORTANT: When using Groq, change the model in API calls from
# # "gpt-4o-mini" to "openai/gpt-oss-20b"
#
# print("Setup complete! Using Groq (FREE)")

Setup complete! Using OpenAI


## The Broken Code

Here's our intentionally buggy function. It calculates the average of a list of numbers, but has a critical flaw: **it doesn't handle empty lists**.

When given an empty list, it will raise a `ZeroDivisionError`.

In [3]:
# Intentionally buggy function - division by zero when empty list
def calculate_average(numbers):
    """Calculate the average of a list of numbers."""
    total = sum(numbers)
    return total / len(numbers)  # Bug: fails on empty list!

## Error Capture Utility

This helper function runs any function and captures both successful results and errors. The error information (including the full traceback) is what we'll feed to the LLM.

In [4]:
import traceback

def run_with_error_capture(func, *args):
    """
    Execute a function and capture any errors.

    Returns a dict with:
    - success: bool
    - result: the return value (if success)
    - error_type, error_message, traceback: error details (if failure)
    """
    try:
        result = func(*args)
        return {"success": True, "result": result}
    except Exception as e:
        return {
            "success": False,
            "error_type": type(e).__name__,
            "error_message": str(e),
            "traceback": traceback.format_exc()
        }

## Watch It Fail

Let's run our function with two test cases:
1. A normal list `[19, 92, 20, 25]` → should return `39.0`
2. An empty list `[]` → will trigger the bug!

In [5]:
# Test cases
# "To err is human, to self-repair is divine" - Workshop 2025
test_cases = [
    [19, 92, 20, 25],  # Normal case: should return 39.0
    [],                # Edge case: empty list (bug trigger!)
]

print("🧪 Running tests on the buggy function:\n")

for test in test_cases:
    result = run_with_error_capture(calculate_average, test)
    print(f"Input: {test}")
    if result["success"]:
        print(f"✅ Result: {result['result']}")
    else:
        print(f"❌ Error: {result['error_type']}: {result['error_message']}")
    print()

🧪 Running tests on the buggy function:

Input: [19, 92, 20, 25]
✅ Result: 39.0

Input: []
❌ Error: ZeroDivisionError: division by zero



## The Self-Repairing Loop

Now for the magic! We'll create a function that:
1. Takes broken code and an error message
2. Sends them to the LLM with a prompt asking for a fix
3. Extracts and returns the corrected code

This is the **core of the self-repairing pattern**: using execution feedback to guide improvement.
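
The cell below pulls the code out of the reply with string splits. One possible weakness of that approach is a fence tagged with something other than `python` (for example `py`), where the tag would be left in the extracted text. As a hedged alternative, here is a regex-based sketch; `extract_code_block` and `FENCE_PATTERN` are hypothetical names, not part of the original demo.

```python
import re

# Matches an opening fence with an optional language tag, then lazily captures
# everything up to the next closing fence.
FENCE_PATTERN = re.compile(r"```[ \t]*[A-Za-z0-9_+-]*[ \t]*\n(.*?)```", re.DOTALL)

def extract_code_block(content: str) -> str:
    """Return the body of the first fenced block, or the whole reply if unfenced."""
    match = FENCE_PATTERN.search(content)
    if match:
        return match.group(1).strip()
    return content.strip()

# Example reply using a short language tag
reply = "Here is the fix:\n```py\ndef f():\n    return 1\n```\nHope that helps."
print(extract_code_block(reply))
```

Only the first fenced block is returned, which fits a prompt that asks for a single code block and nothing else.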

In [6]:
def ask_llm_to_fix(code: str, error: str) -> str:
    """
    Send broken code + error to LLM, get fixed code back.

    Args:
        code: The source code of the buggy function
        error: The full traceback from the error

    Returns:
        The corrected Python code as a string
    """

    prompt = f"""Here is a Python function that has a bug:

    ```python
    {code}
    ```

    When executed, it produces this error:
    ```
    {error}
    ```

    Please fix the bug and return ONLY the corrected Python code block, nothing else.
    Make sure to handle edge cases appropriately.
    """

    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}],
        temperature=0  # More stable output for reproducibility
    )

    # Extract code from response
    content = response.choices[0].message.content

    # Parse code block (handle different markdown formats)
    if "```python" in content:
        code = content.split("```python")[1].split("```")[0]
    elif "```" in content:
        code = content.split("```")[1].split("```")[0]
    else:
        code = content

    return code.strip()

## Running the Self-Repairing Process

Here's where everything comes together:

1. We get the source code of our buggy function
2. We run it and capture the error
3. We ask the LLM to fix it
4. We execute the fixed code to redefine the function
5. We test again to verify the fix works!
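
Step 4 executes the model's code straight into the notebook's globals. A slightly more cautious variant, sketched here under assumptions, loads the candidate into a scratch namespace and checks it against test cases before anything is replaced. The helpers `load_function` and `passes_all`, the `(args, expected)` case format, and the choice of `0` as the expected result for an empty list are all illustrative assumptions. This is not a sandbox: `exec()` still runs whatever the model returned.

```python
def load_function(source: str, name: str):
    """Execute `source` in a fresh namespace and return the callable named `name`."""
    namespace = {}
    exec(source, namespace)  # still runs arbitrary code; isolation of names only
    func = namespace.get(name)
    if not callable(func):
        raise ValueError(f"{name!r} is not defined by the candidate code")
    return func

def passes_all(func, cases) -> bool:
    """Return True only if func(*args) == expected for every (args, expected) pair."""
    for args, expected in cases:
        try:
            if func(*args) != expected:
                return False
        except Exception:
            return False
    return True

# Hypothetical candidate, shaped like the fix an LLM might return
candidate_source = '''
def calculate_average(numbers):
    if not numbers:
        return 0
    return sum(numbers) / len(numbers)
'''

cases = [(([19, 92, 20, 25],), 39.0), (([],), 0)]
candidate = load_function(candidate_source, "calculate_average")
accepted = passes_all(candidate, cases)
print("Candidate accepted:", accepted)
```

Only when `accepted` is true would the candidate be swapped in for the original function.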

In [7]:
import inspect

# Get the source code of our buggy function
original_code = inspect.getsource(calculate_average)
print("📄 Original code:")
print(original_code)
print()

# Run and capture error
result = run_with_error_capture(calculate_average, [])

if not result["success"]:
    print("🔴 Error detected!")
    print(f"Error: {result['error_type']}: {result['error_message']}\n")

    print("🔧 Asking LLM to fix the code...\n")
    fixed_code = ask_llm_to_fix(original_code, result["traceback"])

    print("✅ Fixed code received:")
    print("-" * 40)
    print(fixed_code)
    print("-" * 40)

    # Execute the fixed code to define the new function
    # This replaces the old calculate_average with the fixed version
    exec(fixed_code, globals())

    print("\n🧪 Testing the fixed function:")
    print()

    for test in test_cases:
        result = run_with_error_capture(calculate_average, test)
        print(f"Input: {test}")
        if result["success"]:
            print(f"✅ Result: {result['result']}")
        else:
            print(f"❌ Still broken: {result['error_message']}")
        print()

📄 Original code:
def calculate_average(numbers):
    """Calculate the average of a list of numbers."""
    total = sum(numbers)
    return total / len(numbers)  # Bug: fails on empty list!


🔴 Error detected!
Error: ZeroDivisionError: division by zero

🔧 Asking LLM to fix the code...

✅ Fixed code received:
----------------------------------------
def calculate_average(numbers):
    """Calculate the average of a list of numbers."""
    if not numbers:  # Check for empty list
        return 0  # Return 0 or an appropriate value for empty list
    total = sum(numbers)
    return total / len(numbers)
----------------------------------------

🧪 Testing the fixed function:

Input: [19, 92, 20, 25]
✅ Result: 39.0

Input: []
✅ Result: 0



## Key Takeaways

### What We Learned

1. **The Feedback Loop**: By capturing execution errors and feeding them back to an LLM, we create a simple but powerful self-correction mechanism.

2. **stderr as Learning Signal**: The traceback isn't just for humans - it's rich information that tells the LLM exactly what went wrong and where.

3. **Dynamic Code Replacement**: Using `exec()` we can replace functions at runtime with their corrected versions.

### Limitations & Safety Considerations

⚠️⚠️⚠️ **This is a demo!** In production:

- **Never `exec()` untrusted code** - always sandbox LLM-generated code
- **Validate fixes** - run comprehensive tests before accepting changes
- **Human review** - critical changes should be reviewed before deployment
- **Retry limits** - cap the number of repair attempts to prevent infinite loops of failed fixes
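
As a rough illustration of the last point, the sketch below caps the number of repair attempts. It is a sketch under assumptions, not the demo's implementation: `repair_loop` is a hypothetical helper, and `stub_fix` is a hard-coded stand-in for `ask_llm_to_fix` so the example runs without an API key.

```python
import traceback

def repair_loop(source, func_name, args, fix_fn, max_attempts=3):
    """Run func_name(*args) from `source`; on failure, ask fix_fn for new source.

    Stops after max_attempts executions so a model that keeps returning
    broken code cannot loop forever.
    """
    last_error = None
    for attempt in range(1, max_attempts + 1):
        namespace = {}
        try:
            exec(source, namespace)  # demo only: this is not sandboxed
            result = namespace[func_name](*args)
        except Exception:
            last_error = traceback.format_exc()
            if attempt < max_attempts:
                source = fix_fn(source, last_error)
            continue
        return {"success": True, "result": result, "attempts": attempt, "source": source}
    return {"success": False, "error": last_error, "attempts": max_attempts, "source": source}

def stub_fix(source, error):
    """Hypothetical stand-in for the LLM call; always returns the same patch."""
    return (
        "def calculate_average(numbers):\n"
        "    if not numbers:\n"
        "        return 0\n"
        "    return sum(numbers) / len(numbers)\n"
    )

buggy = "def calculate_average(numbers):\n    return sum(numbers) / len(numbers)\n"
outcome = repair_loop(buggy, "calculate_average", ([],), stub_fix)
print(outcome["success"], outcome["attempts"])
```

With a real LLM in place of `stub_fix`, the same cap bounds both API cost and runtime when the fixes keep failing.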

### Next Steps

In Demo 2, we'll expand this concept to **evolutionary optimization** - instead of fixing one bug, we'll evolve an entire population of solutions!