# <font color='blue'> Sec 12. Human Loop Feedback </font>

#### <font color='red'> Content: Code Generation and Debugging Assistant </font>
In this hands-on exercise, you will <font color='red'> implement **iterative feedback loops** where an AI generates, tests, and revises Python code snippets based on test results and feedback. </font>

We will use an LLM feedback loop to create and iteratively improve a Python function called `process_data`, whose intended behavior is illustrated by the following examples:

```python
process_data([1, 2, 3, 4, 5], mode='average')  # Should return 3.0
process_data([1, 2, 'a', 3], mode='sum')  # Should return 6
```


### Outline:

    0. Setup
    1. Define Task and Test Cases
    2. Initial Generation
    3. Expand the Test Cases
    4. First Iteration with Feedback
    5. Create Feedback Loop
    6. Reflection

## 0. Setup

In [1]:
# Import necessary libraries
# No changes needed in this cell
from openai import OpenAI
from IPython.display import Markdown, display
import os

import json

with open('api_key.json', 'r') as file:
    api_keys = json.load(file)
    openai_key = api_keys["openai"]

# print(openai_key)
client = OpenAI(api_key=openai_key)

from utils import get_completion
from utils_human_feedback import execute_code, format_feedback


## 1. Define Task and Test Cases

We will create a Python function called `process_data` that analyzes numerical data with the following (possibly incomplete) set of requirements:

1. The function should accept a list of numbers and an optional parameter 'mode' that can be 'sum' or 'average' (default should be 'average').
2. If mode is 'sum', return the sum of all numbers.
3. If mode is 'average', return the average (mean) of all numbers.

Example:
```python
process_data([1, 2, 3, 4, 5], mode='average')  # Should return 3.0
process_data([1, 2, 'a', 3], mode='sum')  # Should return 6
```
 

In [2]:
task_description = """
We will create a Python function called `process_data` that

1. has a list as input, and an optional input parameter 'mode' that can be 'sum' or 'average' (default should be 'average').
2. The input list can contain data of any type: float, int, str, etc.
3. If mode is 'sum', the function returns the sum of all numbers in the input list.
4. If mode is 'average', the function returns the average (mean) of all numbers in the input list.
"""

In [7]:
test_cases = [
    {"inputs": ([1, 2, 3, 4, 5], "sum"), "expected": 15},
    {"inputs": ([1, 2, 3, 4, 5], "average"), "expected": 3.0},
    {"inputs": ([1, 2, 3], "average"), "expected": 2},
    {"inputs": ([], "average"), "expected": 0},
    {"inputs": ([], "sum"), "expected": 0},
    {"inputs": ([1, 2, 'a', 3], "sum"), "expected": 6},
    {"inputs": ([1, 200, 'a', {}], "average"), "expected": 100.5},
]
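
The helper `execute_code` is imported from `utils_human_feedback`, and its source is not shown in this notebook. As a rough idea of what such a harness might do, here is a minimal sketch. The function name `run_tests_sketch`, the `failures` key, and the sample function are assumptions made for illustration; only the `inputs`/`expected` test-case shape and the `passed`/`failed` counts come from this notebook.

```python
# Hypothetical stand-in for `execute_code`; the real helper in
# utils_human_feedback may behave differently.
def run_tests_sketch(code, test_cases, func_name="process_data"):
    namespace = {}
    # exec defines the generated function in `namespace`.
    # It runs arbitrary code, so only use it on trusted or sandboxed input.
    exec(code, namespace)
    func = namespace[func_name]

    results = {"passed": 0, "failed": 0, "failures": []}
    for number, case in enumerate(test_cases, start=1):
        try:
            actual = func(*case["inputs"])
            ok = actual == case["expected"]
            detail = {"actual": actual}
        except Exception as exc:
            # Any exception counts as a failure in this simplified sketch.
            ok = False
            detail = {"error": str(exc)}
        if ok:
            results["passed"] += 1
        else:
            results["failed"] += 1
            results["failures"].append({"test": number, **case, **detail})
    return results


# A deliberately simple sample implementation, used only to exercise the sketch.
sample_code = '''
def process_data(input_list, mode="average"):
    numbers = [x for x in input_list if isinstance(x, (int, float))]
    if not numbers:
        return 0
    return sum(numbers) if mode == "sum" else sum(numbers) / len(numbers)
'''

sample_cases = [
    {"inputs": ([1, 2, 3, 4, 5], "sum"), "expected": 15},
    {"inputs": ([1, 2, "a", 3], "sum"), "expected": 6},
    {"inputs": ([], "average"), "expected": None},  # fails on purpose: the sample returns 0
]

summary = run_tests_sketch(sample_code, sample_cases)
print(summary["passed"], "passed,", summary["failed"], "failed")
```

Keeping the failing inputs, expected value, and actual value (or error message) for each failure is what makes the result useful as feedback for the next prompt.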

## 2. Initial Generation

### Initial basic Prompt

In [4]:
# Basic prompt for initial code generation

initial_prompt = f"""
You are a professional Python developer.

{task_description}

Write only the function surrounded by ```python and ``` without any additional explanations or examples.

Example:

def process_data(input_list, mode="average"):
    .......
    .....

Running process_data([1, 2]) returns 1.5
"""

# Get initial completion
messages = [{"role": "user", "content": initial_prompt}]
initial_response = get_completion(client, messages)

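def extract_code(code):
    # Return the body of the first ```python fenced block in the response.
    # Raises ValueError if no line is exactly ```python (or no closing ``` follows).
    lines = code.split("\n")
    start = lines.index("```python") + 1
    end = lines.index("```", start)
    return "\n".join(lines[start:end])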

initial_code = extract_code(initial_response)

print("Initial Generated Code:")
print(initial_code)

# Execute and test the initial code
initial_results = execute_code(initial_code, test_cases)
initial_feedback = format_feedback(initial_results)

print()
print("**********************************")
print("\nTest Results:")
print(initial_feedback)

Initial Generated Code:
def process_data(input_list, mode="average"):
    total = 0
    count = 0
    for item in input_list:
        if isinstance(item, (int, float)):
            total += item
            count += 1
        elif isinstance(item, str):
            try:
                num = float(item)
                total += num
                count += 1
            except ValueError:
                continue
    if count == 0:
        return 0
    if mode == "sum":
        return total
    elif mode == "average":
        return total / count
    else:
        raise ValueError("Invalid mode. Choose 'sum' or 'average'.")

**********************************

Test Results:
Test Results: 7 passed, 0 failed
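
The `extract_code` helper above relies on the response containing a line that is exactly the opening python fence, so it raises `ValueError` if the model adds trailing spaces, uses a different language tag, or omits the fence. A more tolerant variant could look like the sketch below. The name `extract_code_tolerant` and the fallback of treating an unfenced reply as code are assumptions for illustration, not part of the original notebook.

```python
import re

# Built this way only to avoid writing a literal fence inside this example.
FENCE = "`" * 3

# Matches an opening fence with an optional python/py tag, then captures
# everything up to the next fence.
_CODE_BLOCK = re.compile(
    FENCE + r"(?:python|py)?[ \t]*\n(.*?)" + FENCE,
    re.DOTALL | re.IGNORECASE,
)


def extract_code_tolerant(response):
    """Hypothetical variant of extract_code that accepts slightly messy fences."""
    match = _CODE_BLOCK.search(response)
    if match is None:
        # No fence found: assume the whole reply is code (an assumption).
        return response.strip()
    return match.group(1).strip()


reply = "Sure:\n" + FENCE + "python  \ndef f():\n    return 1\n" + FENCE + "\nDone."
extracted = extract_code_tolerant(reply)
print(extracted)
```

In a feedback loop this matters because an extraction failure would otherwise stop the run before any test results are produced.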


### Add feedback to correct the code

In [8]:
# A new prompt that asks the AI to revise its own work
prompt_corrective = f"""
You are a professional Python developer.

Your previously generated code "process_data" failed a test case.
Please fix the bug and explain the issue mentioned in the feedback.

Your previous code:
---
{initial_code}
---

Feedback on your code:
---
{initial_feedback}
---

Please provide the complete, corrected python code, still named process_data.
"""

In [9]:
messages = [{"role": "user", "content": prompt_corrective}]
response = get_completion(client, messages)

code = extract_code(response)

print("Generated Code:")
print(code)

# Execute and test the revised code
results = execute_code(code, test_cases)
feedback = format_feedback(results)

print()
print("**********************************")
print("\nTest Results:")
print(feedback)

Generated Code:
def process_data(input_list, mode="average"):
    total = 0
    count = 0
    for item in input_list:
        if isinstance(item, (int, float)):
            total += item
            count += 1
        elif isinstance(item, str):
            try:
                num = float(item)
                total += num
                count += 1
            except ValueError:
                continue
    if count == 0:
        return 0  # No numeric data found
    if mode == "sum":
        return total
    elif mode == "average":
        return total / count
    else:
        raise ValueError("Invalid mode. Choose 'sum' or 'average'.")

**********************************

Test Results:
Test Results: 7 passed, 0 failed


## 3. Expand the Test Cases

Now, pretend that you've used this code in a production setting and have received feedback. The first version of your generated code worked marvelously, and now you are seeking to expand the capabilities of your function.

Unfortunately, your product manager is on vacation, but you know your function needs to:
1. support a new mode, "median"
2. ignore non-numeric values
3. handle empty lists, returning None

So, following test-driven development practices, you update your tests:

In [10]:
# These are the new test cases. No updates needed.
test_cases = [
    {"inputs": ([1, 2, 3, 4, 5], "sum"), "expected": 15},
    {"inputs": ([1, 2, 3, 4, 5], "average"), "expected": 3.0},
    {"inputs": ([11, 12, 13, 14, 15], "sum"), "expected": 65},
    {"inputs": ([11, 12, 13, 14, 15], "average"), "expected": 13.0},
    {"inputs": ([], "sum"), "expected": None},
    {"inputs": ([1, 3, 4], "median"), "expected": 3},
    {"inputs": ([1, 2, 3, 5], "median"), "expected": 2.5},
    {"inputs": ([1, 2, "a", 3], "sum"), "expected": 6},
    {"inputs": ([1, 2, None, 3, "b", 4], "average"), "expected": 2.5},
    {"inputs": ([10], "median"), "expected": 10},
    {"inputs": ([], "median"), "expected": None},
    {"inputs": ([1, 2, 3, 4, 5], "invalid_mode"), "expected": ValueError},
]
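
The last test case uses the exception class `ValueError` as its expected value. How `execute_code` interprets this is not shown here. One plausible convention, sketched below with hypothetical names (`case_passes`, `toy`), is to treat an exception class as "the call must raise this exception" and anything else as "the call must return this value".

```python
def case_passes(func, inputs, expected):
    """Return True if func(*inputs) matches `expected`.

    `expected` may be a plain value or an exception class (an assumed convention).
    """
    expects_error = isinstance(expected, type) and issubclass(expected, BaseException)
    try:
        actual = func(*inputs)
    except Exception as exc:
        # A raised exception only passes when that exception type was expected.
        return expects_error and isinstance(exc, expected)
    # No exception: pass only if a value was expected and it matches.
    return (not expects_error) and actual == expected


# Toy function used only to exercise the check.
def toy(values, mode):
    if mode != "sum":
        raise ValueError("Invalid mode")
    return sum(values)


print(case_passes(toy, ([1, 2, 3], "sum"), 6))
print(case_passes(toy, ([1, 2, 3], "invalid_mode"), ValueError))
print(case_passes(toy, ([1, 2, 3], "invalid_mode"), 6))
```

With this convention, a function that returns a value when an error was expected fails the case, as does one that raises when a value was expected.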

In [11]:
# Re-test the code
# No updates are needed in this cell
print("Initial Generated Code:")
print(code)

# Execute and test the current code against the new test cases
results = execute_code(code, test_cases)
feedback = format_feedback(results)

print("\nTest Results:")
print(feedback)

Initial Generated Code:
def process_data(input_list, mode="average"):
    total = 0
    count = 0
    for item in input_list:
        if isinstance(item, (int, float)):
            total += item
            count += 1
        elif isinstance(item, str):
            try:
                num = float(item)
                total += num
                count += 1
            except ValueError:
                continue
    if count == 0:
        return 0  # No numeric data found
    if mode == "sum":
        return total
    elif mode == "average":
        return total / count
    else:
        raise ValueError("Invalid mode. Choose 'sum' or 'average'.")

Test Results:
Test Results: 7 passed, 5 failed

Failed Test Cases:

Test #5:
  Inputs: ([], 'sum')
  Expected: None
  Actual: 0

Test #6:
  Inputs: ([1, 3, 4], 'median')
  Expected: 3
  Error: Invalid mode. Choose 'sum' or 'average'.

Test #7:
  Inputs: ([1, 2, 3, 5], 'median')
  Expected: 2.5
  Error: Invalid mode. Choose 'sum' or 'average

## 4. First Iteration with Feedback

In [13]:
# A new prompt that asks the AI to revise its own work
prompt_corrective = f"""
You are a professional Python developer.

Your previously generated code "process_data" has now failed 5 test cases.
In addition to "sum" and "average", the function must now support mode = "median".
If mode is none of the above, the function should raise a ValueError.

Please fix the bugs and explain the issues mentioned in the feedback.

Your previous code:
---
{code}
---

Feedback on your code:
---
{feedback}
---

Please provide the complete, corrected python code, still named process_data.
"""

messages = [{"role": "user", "content": prompt_corrective}]
response = get_completion(client, messages)

code = extract_code(response)

print("Generated Code:")
print(code)

# Execute and test the revised code
results = execute_code(code, test_cases)
feedback = format_feedback(results)

print()
print("**********************************")
print("\nTest Results:")
print(feedback)

Generated Code:
def process_data(input_list, mode="average"):
    # Supported modes
    supported_modes = {"sum", "average", "median"}

    # Validate mode
    if mode not in supported_modes:
        raise ValueError(f"Invalid mode. Choose one of {supported_modes}.")

    # Initialize a list to store numeric values
    numeric_values = []

    # Extract numeric data from input_list
    for item in input_list:
        if isinstance(item, (int, float)):
            numeric_values.append(item)
        elif isinstance(item, str):
            try:
                num = float(item)
                numeric_values.append(num)
            except ValueError:
                continue

    # Handle empty numeric data
    if not numeric_values:
        return None

    # Perform computation based on mode
    if mode == "sum":
        return sum(numeric_values)
    elif mode == "average":
        return sum(numeric_values) / len(numeric_values)
    elif mode == "median":
        sorted_vals = sorted

## 5. Create Feedback Loop

We may want to give the LLM more than one chance to generate the correct code. We may even want to introduce test cases gradually, so that it has the opportunity to fix errors one at a time.

Let's build a loop that starts from scratch and repeats until the code passes all tests or a maximum number of iterations is reached.
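
Before wiring in the LLM, the shape of such a loop can be sketched with the model and the test harness passed in as plain functions. Everything named here (`refine_until_passing`, `fake_generate`, `fake_evaluate`) is hypothetical and stands in for `get_completion` plus `extract_code`, and for `execute_code`; the stubs let the control flow run without an API key.

```python
def refine_until_passing(generate, evaluate, first_prompt, build_retry_prompt, max_rounds=3):
    """Generic generate-test-revise loop: one initial attempt plus up to max_rounds revisions."""
    history = []
    code = generate(first_prompt)
    for round_number in range(max_rounds + 1):
        results = evaluate(code)
        history.append({"iteration": round_number, "code": code, "failed": results["failed"]})
        # Stop on success, or when the revision budget is used up.
        if results["failed"] == 0 or round_number == max_rounds:
            break
        code = generate(build_retry_prompt(code, results))
    return history


# Stub "model": returns canned answers in order, standing in for an LLM call.
canned = iter(["return 0", "return None"])


def fake_generate(prompt):
    return next(canned)


# Stub harness: only the second canned answer "passes".
def fake_evaluate(code):
    ok = code == "return None"
    return {"passed": int(ok), "failed": int(not ok)}


history = refine_until_passing(
    fake_generate,
    fake_evaluate,
    first_prompt="write it",
    build_retry_prompt=lambda code, results: f"fix this: {code} ({results['failed']} failed)",
)
print([step["failed"] for step in history])
```

Separating the loop from the model call also makes it easy to test the stopping conditions deterministically.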

In [14]:
from pprint import pprint

iterations = []

# Get initial completion and extract code
messages = [{"role": "user", "content": initial_prompt}]
initial_response = get_completion(client, messages) # <-- initial_response
initial_code = extract_code(initial_response) # <-- initial_code

# Execute and test the initial code
initial_results = execute_code(initial_code, test_cases) # <-- initial_results
initial_feedback = format_feedback(initial_results) # <-- initial_feedback

# Store the initial iteration
iterations.append(
    {
        "iteration": 0,
        "code": initial_code,
        "test_results": {
            "passed": initial_results["passed"],
            "failed": initial_results["failed"],
        },
    }
)

pprint(iterations[-1]["test_results"])

current_code = initial_code
current_feedback = initial_feedback

task_description = """
We will create a Python function called `process_data` that

1. has a list as input, and an optional input parameter 'mode' that can be 'sum', 'average', or 'median' (default should be 'average').
2. The input list can contain data of any type: float, int, str, etc.
3. If mode is 'sum', the function returns the sum of all numbers in the input list.
4. If mode is 'average', the function returns the average (mean) of all numbers in the input list.
5. If mode is 'median', the function returns the median value of all numbers in the input list.
6. If mode is none of the above, the function raises a ValueError.
"""

# Loop to improve the code based on feedback
for i in range(3):
    if iterations[-1]["test_results"]["failed"] == 0:
        print("\nSuccess! All tests passed.")
        break
    feedback_prompt = f"""
    You are an expert Python developer. You wrote a function based on these requirements:

    {task_description}

    Here is your current implementation:
    ```python
    {current_code}
    ```
    I've tested your code and here are the results:
    {current_feedback}
    Please improve your code to fix any issues and make sure it passes all test cases.
    Write only the improved function without any explanation.
    """    

    # Feedback prompt with retry
    # feedback_prompt = ...
    messages = [{"role": "user", "content": feedback_prompt}]

    improved_response = get_completion(client, messages) # <-- improved_response
    improved_code = extract_code(improved_response) # <-- improved_code

    # Execute and test the improved code

    improved_results = execute_code(improved_code, test_cases) # <-- improved_results
    improved_feedback = format_feedback(improved_results) # <-- improved_feedback

    # NOTE: record each iteration's code and test results in `iterations`
    iterations.append(
        {
            "iteration": i + 1,
            "code": improved_code,
            "test_results": {
                "passed": improved_results["passed"],
                "failed": improved_results["failed"],
            },
        }
    )
    pprint(iterations[-1]["test_results"])

    current_code = improved_code
    current_feedback = improved_feedback


{'failed': 6, 'passed': 6}
{'failed': 0, 'passed': 12}

Success! All tests passed.


### Visualize all iterations

In [15]:
# View a summary of the different iterations
from pprint import pprint
pprint(iterations, width=200)

[{'code': 'def process_data(input_list, mode="average"):\n'
          '    total = 0\n'
          '    count = 0\n'
          '    for item in input_list:\n'
          '        if isinstance(item, (int, float)):\n'
          '            total += item\n'
          '            count += 1\n'
          '    if count == 0:\n'
          '        return 0\n'
          '    if mode == "sum":\n'
          '        return total\n'
          '    elif mode == "average":\n'
          '        return total / count',
  'iteration': 0,
  'test_results': {'failed': 6, 'passed': 6}},
 {'code': 'def process_data(input_list, mode="average"):\n'
          '    import math\n'
          '\n'
          '    # Extract numeric values from the list\n'
          '    numeric_values = [item for item in input_list if isinstance(item, (int, float))]\n'
          '    \n'
          '    # Handle empty numeric list\n'
          '    if not numeric_values:\n'
          '        if mode == "median" or mode == "averag

### Latest code

In [16]:
# Print the final code
print(iterations[-1]["code"])

def process_data(input_list, mode="average"):
    import math

    # Extract numeric values from the list
    numeric_values = [item for item in input_list if isinstance(item, (int, float))]
    
    # Handle empty numeric list
    if not numeric_values:
        if mode == "median" or mode == "average" or mode == "sum":
            return None
        else:
            raise ValueError("Invalid mode")
    
    if mode == "sum":
        return sum(numeric_values)
    elif mode == "average":
        return sum(numeric_values) / len(numeric_values)
    elif mode == "median":
        sorted_vals = sorted(numeric_values)
        n = len(sorted_vals)
        mid = n // 2
        if n % 2 == 1:
            return sorted_vals[mid]
        else:
            return (sorted_vals[mid - 1] + sorted_vals[mid]) / 2
    else:
        raise ValueError("Invalid mode")
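
One edge case the test suite does not exercise: in Python, `bool` is a subclass of `int`, so the `isinstance(item, (int, float))` filter in the final code treats `True` and `False` as the numbers 1 and 0. Whether booleans should count as numeric is a requirements question this notebook leaves open. If they should be ignored, a filter along these lines could be used; `numeric_only` is a hypothetical helper name.

```python
def numeric_only(values):
    """Keep ints and floats but drop booleans (bool is a subclass of int)."""
    return [v for v in values if isinstance(v, (int, float)) and not isinstance(v, bool)]


mixed = [1, True, 2.5, "3", None, False]

# The plain isinstance check lets booleans through.
plain_filter = [v for v in mixed if isinstance(v, (int, float))]
print(plain_filter)

# The stricter filter keeps only 1 and 2.5.
print(numeric_only(mixed))
```

Adding a test case with a boolean in the input list would make the intended behavior explicit and give the feedback loop something concrete to satisfy.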
