In [1]:
import os
from openai import OpenAI
from dotenv import load_dotenv
from IPython.display import display_markdown


load_dotenv()

openai_api_key = os.getenv("OPENAI_API_KEY")
anthropic_api_key = os.getenv("ANTHROPIC_API_KEY")
groq_api_key = os.getenv("GROQ_API_KEY")


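# Report each key by a short prefix only, never the full secret
if openai_api_key:
    print(f"OpenAI API Key exists and begins {openai_api_key[:8]}")
else:
    print("OpenAI API Key not set")

if anthropic_api_key:
    print(f"Anthropic API Key exists and begins {anthropic_api_key[:7]}")
else:
    print("Anthropic API Key not set")

if groq_api_key:
    print(f"groq API Key exists and begins {groq_api_key[:7]}")
else:
    print("groq API Key not set")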

OpenAI API Key exists and begins sk-proj-
Anthropic API Key exists and begins sk-ant-
groq API Key exists and begins gsk_FZH


In [2]:
LLM = "OPENAI"
MODEL = "gpt-4o-mini"
client = OpenAI()

As we said before, we start the **"generation"** chat history with the system prompt. In this case, the LLM acts as a Python programmer who is eager to receive feedback and critique from the user.


In [4]:
generation_chat_history = [
    {
        "role": "system",
        "content": "You are a Python programmer tasked with generating high quality Python code. "
        "Your task is to generate the best content possible for the user's request. If the user provides critique, "
        "respond with a revised version of your previous attempt.",
    }
]
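The system prompt above is built from several adjacent string literals. Python joins such literals with no separator, so each fragment needs its own trailing space or the sentences run together. A minimal sketch of that behaviour, using shortened placeholder sentences rather than the real prompt:

```python
# Adjacent string literals are concatenated with no separator between them.
without_spaces = (
    "You are a Python programmer."
    "Your task is to write code."
)

# A trailing space on each fragment keeps the sentences apart.
with_spaces = (
    "You are a Python programmer. "
    "Your task is to write code."
)

print(repr(without_spaces))
print(repr(with_spaces))
```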

Now, as the user, we ask the LLM to generate code that extracts the content between two XML tags. We just add a new message with the **user** role to the chat history.


In [5]:
generation_chat_history.append(
    {
        "role": "user",
        "content": "Extract content between two XML tags <tool>Tool description</tool>",
    }
)

Let's generate the first version of the code.


In [6]:
mergesort_code = (
    client.chat.completions.create(messages=generation_chat_history, model=MODEL)
    .choices[0]
    .message.content
)


generation_chat_history.append({"role": "assistant", "content": mergesort_code})
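The call-then-append pattern in the cell above recurs in every step of this notebook, so it can be wrapped in a small helper. The sketch below is one possible shape, not part of the original notebook: `chat_once` and `FakeClient` are hypothetical names, and `FakeClient` only mimics the `client.chat.completions.create(...)` attribute path so the example runs without an API key.

```python
from types import SimpleNamespace


def chat_once(client, history, model):
    """Send `history` to the model, append the reply as an assistant turn, return it."""
    response = client.chat.completions.create(messages=history, model=model)
    reply = response.choices[0].message.content
    history.append({"role": "assistant", "content": reply})
    return reply


class FakeClient:
    """Hypothetical offline stand-in for OpenAI(); always returns the same reply."""

    def __init__(self, reply):
        def create(messages, model):
            message = SimpleNamespace(content=reply)
            return SimpleNamespace(choices=[SimpleNamespace(message=message)])

        self.chat = SimpleNamespace(completions=SimpleNamespace(create=create))


history = [{"role": "user", "content": "hello"}]
reply = chat_once(FakeClient("stub reply"), history, "gpt-4o-mini")
print(reply, len(history))
```

With a real client, `chat_once(client, generation_chat_history, MODEL)` would stand in for the two statements in the cell above.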

In [7]:
display_markdown(mergesort_code, raw=True)

To extract content between two XML tags using Python, you can utilize the `xml.etree.ElementTree` module, which provides efficient methods for parsing and navigating XML data. Below is a simple example that demonstrates how to extract content between `<tool>` tags.

```python
import xml.etree.ElementTree as ET

def extract_tool_description(xml_string):
    # Parse the XML string
    root = ET.fromstring(xml_string)
    
    # Find all 'tool' elements and extract their text
    tool_descriptions = [tool.text for tool in root.findall('tool')]
    
    return tool_descriptions

# Example XML input
xml_data = '''
<tools>
    <tool>Tool description 1</tool>
    <tool>Tool description 2</tool>
    <tool>Tool description 3</tool>
</tools>
'''

# Extract tool descriptions
descriptions = extract_tool_description(xml_data)
print(descriptions)
```

### Explanation:
- The code defines a function `extract_tool_description` that takes an XML string as input.
- It uses `ElementTree` to parse the XML and find all occurrences of the `<tool>` tag.
- The text content of each `<tool>` tag is collected in a list and returned.
- The example XML demonstrates how to use the function, and the results are printed out.

### Output:
The output of the above code will be:
```
['Tool description 1', 'Tool description 2', 'Tool description 3']
```

You can modify the `xml_data` string to include different tool descriptions, and the function will extract them accordingly.

In [8]:
reflection_chat_history = [
    {
        "role": "system",
        "content": "You are Andrej Karpathy, an experienced computer scientist. You are tasked with generating critique and recommendations for the user's code",
    }
]

The user message, in this case, is the code generated in the previous step. We simply add `mergesort_code` (which, despite its name, holds the XML extraction code) to the `reflection_chat_history`.


In [9]:
reflection_chat_history.append({"role": "user", "content": mergesort_code})

Now, let's generate a critique of the Python code.


In [10]:
critique = (
    client.chat.completions.create(messages=reflection_chat_history, model=MODEL)
    .choices[0]
    .message.content
)

In [11]:
display_markdown(critique, raw=True)

Your code for extracting content between XML tags using Python's `xml.etree.ElementTree` module is quite well-structured and serves its purpose effectively. Here are some critiques and recommendations to enhance the code quality, usability, and robustness:

### Critique:

1. **XML Parsing Assumptions**:
   - The code assumes that the input XML string is well-formed and contains a single root element. If the input is not well-formed, `ET.fromstring()` will raise a `ParseError`.

2. **Handling Namespaces**:
   - If your XML includes namespaces, the `findall` method will not work as expected. The code does not currently handle XML namespaces.

3. **Text Content Only**:
   - The code currently only extracts the text if the `<tool>` tag has direct text content. If the `<tool>` contains sub-elements or is empty, you might want to consider how to handle that.

4. **Return Type**:
   - The function returns a list of strings, which is great for simplicity. However, if no tools are found, it may be useful to return an empty list explicitly, indicating that the input was processed correctly.

### Recommendations:

1. **Error Handling**:
   - Include error handling to catch parsing errors or malformed XML. For example:
     ```python
     try:
         root = ET.fromstring(xml_string)
     except ET.ParseError as e:
         print(f"Error parsing XML: {e}")
         return []
     ```

2. **Namespace Support**:
   - If you expect to handle XML with namespaces, consider modifying the `findall` call to incorporate namespace handling:
     ```python
     tool_descriptions = [tool.text for tool in root.findall('{namespace}tool')]
     ```

3. **Robust Text Retrieval**:
   - If there’s a possibility of encountering empty `<tool>` tags or sub-elements, modify the extraction logic:
     ```python
     tool_descriptions = [ET.tostring(tool, encoding='utf-8').decode('utf-8') for tool in root.findall('tool')]
     ```

4. **Unit Testing**:
   - Consider writing unit tests for various input cases to ensure the function behaves as expected. This can include tests for well-formed XML, malformed XML, XML with namespaces, and instructions for missing tags.

5. **Function Documentation**:
   - Add a docstring to the function to improve code readability and provide clarity on its purpose and input/output. Here’s an example:
     ```python
     def extract_tool_description(xml_string):
         """
         Extracts descriptions from <tool> elements in an XML string.
         
         Parameters:
         xml_string (str): A string representation of XML containing <tool> elements.

         Returns:
         List[str]: A list of tool descriptions extracted from the XML.
         """
     ```

### Revised Code Example:

Here is an improved version of your code incorporating some of the recommendations:

```python
import xml.etree.ElementTree as ET

def extract_tool_description(xml_string):
    """
    Extracts descriptions from <tool> elements in an XML string.
    
    Parameters:
    xml_string (str): A string representation of XML containing <tool> elements.

    Returns:
    List[str]: A list of tool descriptions extracted from the XML, or an empty list in case of errors.
    """
    try:
        root = ET.fromstring(xml_string)
    except ET.ParseError as e:
        print(f"Error parsing XML: {e}")
        return []

    # Find all 'tool' elements and extract their text
    tool_descriptions = [tool.text.strip() if tool.text is not None else '' for tool in root.findall('tool')]
    return tool_descriptions

# Example XML input
xml_data = '''
<tools>
    <tool>Tool description 1</tool>
    <tool>Tool description 2</tool>
    <tool>Tool description 3</tool>
</tools>
'''

# Extract tool descriptions
descriptions = extract_tool_description(xml_data)
print(descriptions)
```

These suggestions should help improve the overall quality and flexibility of your XML extraction function. Good luck with your coding!
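Two of the critique's points, namespaces and `<tool>` elements with sub-elements, can be checked directly with `xml.etree.ElementTree`. The sketch below is an illustration rather than the model's output: the namespace URI and the prefix `t` are made up for the example, and `itertext()` is used here as one possible way to collect nested text.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace URI and prefix, chosen only for this example.
NS = {"t": "http://example.com/tools"}

xml_data = """
<tools xmlns="http://example.com/tools">
    <tool>Plain description</tool>
    <tool>Nested <b>bold</b> text</tool>
    <tool></tool>
</tools>
"""

root = ET.fromstring(xml_data)

# A bare tag name does not match elements that live in a default namespace.
unqualified = root.findall("tool")

# Passing a prefix map lets findall resolve the qualified name.
tools = root.findall("t:tool", NS)

# itertext() yields the element's own text plus the text of its descendants,
# so nested markup and empty elements are both handled.
descriptions = ["".join(tool.itertext()).strip() for tool in tools]

print(unqualified)
print(descriptions)
```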

Finally, we add this _critique_ to the `generation_chat_history`, this time with the `user` role.


In [12]:
generation_chat_history.append({"role": "user", "content": critique})

## Generation Step (II)


In [13]:
essay = (
    client.chat.completions.create(messages=generation_chat_history, model=MODEL)
    .choices[0]
    .message.content
)

In [14]:
display_markdown(essay, raw=True)

Thank you for your thoughtful critique and recommendations! Here's a revised version of the code that integrates your feedback, focusing on error handling, namespace support, improved text retrieval, and documentation. 

```python
import xml.etree.ElementTree as ET
from typing import List

def extract_tool_description(xml_string: str) -> List[str]:
    """
    Extracts descriptions from <tool> elements in an XML string.
    
    Parameters:
    xml_string (str): A string representation of XML containing <tool> elements.

    Returns:
    List[str]: A list of tool descriptions extracted from the XML, or an empty list in the case of errors.
    """
    try:
        root = ET.fromstring(xml_string)
    except ET.ParseError as e:
        print(f"Error parsing XML: {e}")
        return []

    # Find all 'tool' elements and extract their text, accounting for possible None values
    tool_descriptions = [tool.text.strip() if tool.text is not None else '' for tool in root.findall('tool')]
    
    return tool_descriptions

# Example XML input
xml_data = '''
<tools>
    <tool>Tool description 1</tool>
    <tool>Tool description 2</tool>
    <tool>Tool description 3</tool>
    <tool></tool> <!-- Example of an empty tool -->
</tools>
'''

# Extract tool descriptions
descriptions = extract_tool_description(xml_data)
print(descriptions)  # Output: ['Tool description 1', 'Tool description 2', 'Tool description 3', '']
```

### Improvements Made:
1. **Error Handling**: The function now handles `ParseError` and returns an empty list if the XML cannot be parsed, providing better user feedback.
  
2. **Improved Text Handling**: The extraction process now ensures that empty tool descriptions are represented as empty strings instead of `None`, thus giving a clearer output.

3. **Documentation**: A docstring has been added to explain the function, its parameters, and return values clearly.

### Further Considerations:
4. **Namespace Handling**: If we need to deal with namespaces in future inputs, we can modify the `findall` call to handle those cases by passing the full qualified name along with the namespace.

5. **Unit Tests**: Implementing unit tests for different scenarios will further enhance the reliability of this function and help verify its functionality.

Feel free to provide any additional feedback or specific requirements, and I'll be happy to revise again or make further enhancements!
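The generate, critique, and regenerate steps walked through above can be repeated for a fixed number of rounds. The following is a minimal sketch of such a loop under stated assumptions: `reflection_loop`, `generate`, and `reflect` are hypothetical names, the two callables stand in for the chat completion calls, system prompts are omitted for brevity, and the stubs at the bottom exist only so the example runs offline. Unlike the notebook, the sketch also stores each critique in the reflection history as an assistant turn.

```python
def reflection_loop(generate, reflect, request, n_rounds=2):
    """Alternate generation and critique, returning the last draft.

    `generate` and `reflect` are hypothetical callables that take a chat
    history (a list of role/content dicts) and return the model's reply.
    """
    generation_history = [{"role": "user", "content": request}]
    reflection_history = []
    draft = ""
    for round_index in range(n_rounds):
        draft = generate(generation_history)
        generation_history.append({"role": "assistant", "content": draft})
        if round_index == n_rounds - 1:
            break  # no critique is needed after the final draft
        # The generator's output becomes the reflector's user message...
        reflection_history.append({"role": "user", "content": draft})
        critique = reflect(reflection_history)
        reflection_history.append({"role": "assistant", "content": critique})
        # ...and the critique goes back to the generator as a user message.
        generation_history.append({"role": "user", "content": critique})
    return draft


# Offline stubs that replace the real API calls.
def fake_generate(history):
    version = sum(m["role"] == "assistant" for m in history) + 1
    return f"draft v{version}"


def fake_reflect(history):
    return "add error handling"


final = reflection_loop(fake_generate, fake_reflect, "Extract content between two XML tags")
print(final)
```

A fixed round count is the simplest stopping rule; a different rule, such as stopping when the critique reports no further issues, would need its own check inside the loop.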