# Prompt Experiments: impact of pronouns
When constructing a system prompt, is it better to write in the first person ("I"), the second person ("you"), or the first person plural (the royal "we")?
We will construct some prompts and see if there is any major difference between them.

In [4]:
from dotenv import load_dotenv
load_dotenv()

True

In [6]:
import anthropic
client = anthropic.Anthropic()

In [11]:
common_prompt = """
Given the following function:
```python
def calculate_area_rectangular_floor(width, length):
  return width*length
```
How would you improve it?
"""

In [12]:
system_1st_person = """
I am a software engineer with 20 years of experience in python and mentoring in coding.
When presented with code snippets, I suggest how they can be improved by adding comments, 
improving performance, enhancing readability, or whatever else I can think of.
I explain my reasoning clearly and concisely.
"""

In [13]:
system_2nd_person = """
You are a software engineer with 20 years of experience in python and mentoring in coding.
When presented with code snippets, you suggest how they can be improved by adding comments, 
improving performance, enhancing readability, or whatever else you can think of.
You explain your reasoning clearly and concisely.
"""

In [14]:
system_royal_we = """
We are a software engineer with 20 years of experience in python and mentoring in coding.
When presented with code snippets, we suggest how they can be improved by adding comments, 
improving performance, enhancing readability, or whatever else we can think of.
We explain our reasoning clearly and concisely.
"""
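
Since the three system prompts are meant to differ only in their pronouns, they could also be generated from a single template so the variants cannot drift apart. The following is a minimal sketch, not what was run in this notebook; `SYSTEM_TEMPLATE`, `PRONOUN_SETS`, and `system_prompts` are hypothetical names introduced for illustration.

```python
# Hypothetical template: only the pronoun-dependent words vary between variants.
SYSTEM_TEMPLATE = """
{subject_cap} {be} a software engineer with 20 years of experience in python and mentoring in coding.
When presented with code snippets, {subject} suggest how they can be improved by adding comments,
improving performance, enhancing readability, or whatever else {subject} can think of.
{subject_cap} explain {possessive} reasoning clearly and concisely.
"""

# One entry per grammatical person under test (names are assumptions).
PRONOUN_SETS = {
    "1st_person": {"subject": "I", "subject_cap": "I", "be": "am", "possessive": "my"},
    "2nd_person": {"subject": "you", "subject_cap": "You", "be": "are", "possessive": "your"},
    "royal_we": {"subject": "we", "subject_cap": "We", "be": "are", "possessive": "our"},
}

# Fill the template once per pronoun set.
system_prompts = {
    name: SYSTEM_TEMPLATE.format(**words) for name, words in PRONOUN_SETS.items()
}
```

Each value in `system_prompts` can then be passed wherever a hand-written system prompt would be used.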

In [34]:
def make_request(system_prompt, user_prompt=common_prompt):
    message = client.messages.create(
        model="claude-3-opus-20240229",
        max_tokens=1000,
        temperature=0.0,
        system=system_prompt,
        messages=[
            {"role": "user", "content": user_prompt}
        ]
    )
    return message.content
    

In [18]:
system_1st_response = make_request(system_1st_person)

In [20]:
print(system_1st_response[0].text)

Here are a few ways to improve the given function:

1. Add a docstring to explain what the function does, what parameters it takes, and what it returns. This makes the code more readable and maintainable.

2. Use more descriptive parameter names. While `width` and `length` are okay, `floor_width` and `floor_length` might be even clearer.

3. Consider adding type hints to specify the expected types of the parameters and return value. This can help catch errors early.

4. Check for invalid inputs. What if a negative number or zero is passed in? You might want to raise an exception.

5. Consider returning a float instead of an int. Areas are often expressed as floating point numbers.

Here's the updated function with these improvements:

```python
def calculate_rectangular_floor_area(floor_width: float, floor_length: float) -> float:
    """
    Calculates the area of a rectangular floor.

    Args:
        floor_width (float): The width of the floor.
        floor_length (float): The len

In [21]:
system_2nd_person_response = make_request(system_2nd_person)

print(system_2nd_person_response[0].text)

In [23]:
system_royal_we_response = make_request(system_royal_we)

In [25]:
print(system_royal_we_response[0].text)

Here are a few ways to improve the given function:

1. Add a docstring to explain what the function does and what parameters it takes:

```python
def calculate_area_rectangular_floor(width, length):
    """
    Calculates the area of a rectangular floor.

    Args:
        width (float): The width of the floor.
        length (float): The length of the floor.

    Returns:
        float: The area of the floor.
    """
    return width * length
```

2. Add type hints to specify the expected types of the parameters and return value:

```python
def calculate_area_rectangular_floor(width: float, length: float) -> float:
    """
    Calculates the area of a rectangular floor.

    Args:
        width (float): The width of the floor.
        length (float): The length of the floor.

    Returns:
        float: The area of the floor.
    """
    return width * length
```

3. Add input validation to ensure that the width and length are positive numbers:

```python
def calculate_area_rectangula

## Initial Observations
Each prompt produced different output, which I did not expect from such a small change. Before attributing the differences to I/you/we, it's best to run them all again to see whether we are simply observing instability in the outputs.
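
One rough way to separate instability from a real pronoun effect is to call the same prompt several times and count how many distinct responses come back. The sketch below assumes a hypothetical `request_fn` callable that takes a system prompt and returns the response text; it is passed in as a parameter so the helper can be tried without an API call.

```python
from collections import Counter
from typing import Callable


def count_distinct_outputs(
    request_fn: Callable[[str], str], system_prompt: str, n_runs: int = 3
) -> Counter:
    """Call request_fn(system_prompt) n_runs times and tally each distinct response text."""
    return Counter(request_fn(system_prompt) for _ in range(n_runs))
```

With the helper defined later in this notebook, `request_fn` could be something like `lambda p: make_request(p)[0].text`. A tally with a single key means every run returned identical text; several keys mean the output varies even with the prompt held fixed.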

In [26]:
system_1st_response2 = make_request(system_1st_person)

In [27]:
print(system_1st_response2[0].text)

Here are a few ways to improve the given function:

1. Add a docstring to explain what the function does and what parameters it takes:

```python
def calculate_area_rectangular_floor(width, length):
    """
    Calculates the area of a rectangular floor.

    Args:
        width (float): The width of the floor.
        length (float): The length of the floor.

    Returns:
        float: The area of the floor.
    """
    return width * length
```

2. Add type hints to make the function more readable and maintainable:

```python
def calculate_area_rectangular_floor(width: float, length: float) -> float:
    """
    Calculates the area of a rectangular floor.

    Args:
        width (float): The width of the floor.
        length (float): The length of the floor.

    Returns:
        float: The area of the floor.
    """
    return width * length
```

3. Add input validation to ensure that the width and length are positive numbers:

```python
def calculate_area_rectangular_floor(width

In [28]:
system_2nd_person_response2 = make_request(system_2nd_person)

In [29]:
print(system_2nd_person_response2[0].text)

Here are a few ways to improve the given function:

1. Add a docstring to explain what the function does and what parameters it takes:

```python
def calculate_area_rectangular_floor(width, length):
    """
    Calculates the area of a rectangular floor.

    Args:
        width (float): The width of the floor.
        length (float): The length of the floor.

    Returns:
        float: The area of the floor.
    """
    return width * length
```

2. Add type hints to make the function more readable and maintainable:

```python
def calculate_area_rectangular_floor(width: float, length: float) -> float:
    """
    Calculates the area of a rectangular floor.

    Args:
        width (float): The width of the floor.
        length (float): The length of the floor.

    Returns:
        float: The area of the floor.
    """
    return width * length
```

3. Add input validation to ensure that the width and length are positive numbers:

```python
def calculate_area_rectangular_floor(width

In [30]:
system_royal_we_response2 = make_request(system_royal_we)

In [31]:
print(system_royal_we_response2[0].text)

Here are a few ways to improve the given function:

1. Add a docstring to explain what the function does and what parameters it takes:

```python
def calculate_area_rectangular_floor(width, length):
    """
    Calculates the area of a rectangular floor.

    Args:
        width (float): The width of the floor.
        length (float): The length of the floor.

    Returns:
        float: The area of the floor.
    """
    return width * length
```

2. Add type hints to specify the expected types of the parameters and return value:

```python
def calculate_area_rectangular_floor(width: float, length: float) -> float:
    """
    Calculates the area of a rectangular floor.

    Args:
        width (float): The width of the floor.
        length (float): The length of the floor.

    Returns:
        float: The area of the floor.
    """
    return width * length
```

3. Add input validation to ensure that the width and length are positive numbers:

```python
def calculate_area_rectangula

## Observations
So in the end there is no major difference. Interestingly, on the second run all three prompts converged on splitting the suggestions into separate code blocks, which is why it's important to re-run a prompt a few times. I personally prefer the first response, where all of the advice is compiled into a single function. Let's see what it takes to return that style of output consistently.
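
The preferred style can be made checkable: a response that compiles everything into a single function contains exactly one fenced code block, while the split style contains several. The helper below is a hypothetical sketch that counts opening fences carrying a language tag; it also counts a block whose closing fence was cut off.

```python
import re

# Built programmatically so this snippet can itself sit inside a fenced block.
FENCE = "`" * 3

# An opening fence is a line that starts with the fence followed by a language tag.
OPENING_FENCE = re.compile(r"^" + re.escape(FENCE) + r"\w+", re.MULTILINE)


def count_code_blocks(text: str) -> int:
    """Count opening code fences that carry a language tag (closing fences are bare)."""
    return len(OPENING_FENCE.findall(text))
```

A response in the preferred style would give `count_code_blocks(text) == 1`.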

In [33]:
new_system_prompt = """
You are a software engineer with 20 years of experience in python and mentoring in coding.
When presented with code snippets, you suggest how they can be improved by adding comments, 
improving performance, enhancing readability, or whatever else you can think of.
You explain your reasoning clearly and concisely.

For example, when presented with the following prompt: 
<prompt>
Given the following function:
```python
def calculate_area_rectangular_floor(width, length):
  return width*length
```
How would you improve it?
</prompt>

You must only include one rewrite of the code with all the suggestions, like this:
<output>
Here are a few ways to improve the given function:

1. Add a docstring to explain what the function does, what parameters it takes, and what it returns. This makes the code more readable and maintainable.

2. Use more descriptive parameter names. While `width` and `length` are okay, `floor_width` and `floor_length` might be even clearer.

3. Consider adding type hints to specify the expected types of the parameters and return value. This can help catch errors early.

4. Check for invalid inputs. What if a negative number or zero is passed in? You might want to raise an exception.

5. Consider returning a float instead of an int. Areas are often expressed as floating point numbers.

Here's the updated function with these improvements:

```python
def calculate_rectangular_floor_area(floor_width: float, floor_length: float) -> float:
    \"\"\"
    Calculates the area of a rectangular floor.

    Args:
        floor_width (float): The width of the floor.
        floor_length (float): The length of the floor.

    Returns:
        float: The area of the floor.

    Raises:
        ValueError: If either dimension is less than or equal to zero.
    \"\"\"
    if floor_width <= 0 or floor_length <= 0:
        raise ValueError("Floor dimensions must be positive.")

    return floor_width * floor_length
```

These changes make the function more robust, readable, and maintainable. The docstring explains the function's purpose and usage, the type hints add clarity and error checking, the more descriptive names improve readability, and the input validation handles potential misuse.
</output>
"""

In [39]:
new_user_prompt = """
Given the following function:

```python
def main():
    parser = argparse.ArgumentParser(description='Prompt Improvement Tool')
    parser.add_argument('prompt_file', type=str, help='The file containing the prompt to evaluate and improve')
    parser.add_argument('--output', type=str, default='improved_prompt.txt', help='Output file for the improved prompt')
    
    args = parser.parse_args()
    
    # Load the prompt from the specified file
    with open(args.prompt_file, 'r') as f:
        prompt = f.read()

    print("Loaded prompt:")
    print(prompt)
    print()
    
    type, correct, evaluation, suggestion = evaluate_prompt_type(prompt)
    print(f"Prompt type: {type}")

    if correct == "Yes":
        print("Let's move on to scoring the prompt")
    else:
        print(f"Evaluation: \n {evaluation}")
        print(f"Here is a suggestion for how to improve the prompt: \n {suggestion}")

    print("=========")
    scores = score_prompt(prompt)
    print(scores)

```
How would you improve the code?
"""

In [40]:
new_response = make_request(new_system_prompt, new_user_prompt)

In [41]:
print(new_response[0].text)

Here's an improved version of the code with explanations:

```python
import argparse

def main():
    parser = argparse.ArgumentParser(description='Prompt Improvement Tool')
    parser.add_argument('prompt_file', type=str, help='The file containing the prompt to evaluate and improve')
    parser.add_argument('--output', type=str, default='improved_prompt.txt', help='Output file for the improved prompt')
    
    args = parser.parse_args()
    
    # Load the prompt from the specified file
    with open(args.prompt_file, 'r') as file:
        prompt = file.read()

    print("Loaded prompt:")
    print(prompt)
    print()
    
    prompt_type, is_correct, evaluation, suggestion = evaluate_prompt_type(prompt)
    print(f"Prompt type: {prompt_type}")

    if is_correct:
        print("Let's move on to scoring the prompt")
    else:
        print(f"Evaluation:\n{evaluation}")
        print(f"Here is a suggestion for how to improve the prompt:\n{suggestion}")

    scores = score_prompt(promp

In [42]:
# let's run a few times to get an idea of the stability of the response

In [43]:
new_responses = [make_request(new_system_prompt, new_user_prompt) for _ in range(5)]

In [52]:
reports = "\n\n==========\n\n".join([response[0].text for response in new_responses])
print(reports)

Here's an improved version of the code with explanations:

```python
import argparse

def main():
    parser = argparse.ArgumentParser(description='Prompt Improvement Tool')
    parser.add_argument('prompt_file', type=str, help='The file containing the prompt to evaluate and improve')
    parser.add_argument('--output', type=str, default='improved_prompt.txt', help='Output file for the improved prompt')
    
    args = parser.parse_args()
    
    # Load the prompt from the specified file
    with open(args.prompt_file, 'r') as file:
        prompt = file.read()

    print("Loaded prompt:")
    print(prompt)
    print()
    
    prompt_type, is_correct, evaluation, suggestion = evaluate_prompt_type(prompt)
    print(f"Prompt type: {prompt_type}")

    if is_correct:
        print("Let's move on to scoring the prompt")
    else:
        print(f"Evaluation:\n{evaluation}")
        print(f"Here is a suggestion for how to improve the prompt:\n{suggestion}")

    scores = score_prompt(promp

## Observations
The output is now subjectively more stable. Some responses include 9 tips and others 6. There's a lot here that would be easy to miss, so let's see if we can get Claude to report on what's different!
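
The tip counts themselves can be checked mechanically rather than by eye. As a small sketch, assuming tips appear as numbered list items at the start of a line (`count_numbered_tips` is a hypothetical helper, not part of the original run):

```python
import re

# A numbered tip is a line that starts with digits, a period, and whitespace.
NUMBERED_ITEM = re.compile(r"^\d+\.\s", re.MULTILINE)


def count_numbered_tips(text: str) -> int:
    """Count lines that begin a numbered list item such as '1. Add a docstring'."""
    return len(NUMBERED_ITEM.findall(text))
```

Applied to the five responses, this would be `[count_numbered_tips(r[0].text) for r in new_responses]`.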

In [53]:
reporting_system_prompt = """
You are an expert software engineer who has a keen eye for documentation.
You detect meaningful differences in the outputs of prompts from LLMs and provide succinct summaries
of their commonalities and differences.
"""
reporting_prompt = f"""
Given the following 5 responses from an LLM, provide an overview of their similarities and differences. You can reference each prompt response
by its order. They will be separated with multiple equal signs, i.e. "=========="

Responses:
{reports}
"""

In [56]:
reporting_response = make_request(reporting_system_prompt, reporting_prompt)

In [58]:
print(reporting_response[0].text)

The provided responses are very similar, with only minor differences in wording and formatting. Here's an overview of their similarities and differences:

Similarities:
1. All responses provide an improved version of the code with explanations.
2. They include the same code snippet with identical improvements.
3. The explanations cover similar points, such as renaming variables, adjusting print statement formatting, and adding a `if __name__ == "__main__":` block.
4. They assume the existence of `evaluate_prompt_type()` and `score_prompt()` functions.

Differences:
1. Response 3 provides more detailed explanations for each improvement point compared to the other responses.
2. Response 3 suggests adding error handling for file I/O operations and function calls using `try`/`except` blocks, which is not mentioned in the other responses.
3. Response 3 also recommends writing the improved prompt to the `improved_prompt.txt` file if that's the intended purpose, which is not explicitly mentio

## Final Observation
The difference between prompting Claude with I/you/we was negligible and had little predictable impact on responses in this small
test set. Iterating on and refining prompts, and testing the same prompt multiple times to ensure "stability", is more
important. Using Claude to assess the output is a clever way to bring impartiality and consistency to the otherwise subjective
notion of "stability".
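
As a complement to a model-written summary, "stability" can also be given a rough number. The sketch below uses the standard library's `difflib.SequenceMatcher` to score every pair of response texts between 0.0 (nothing in common) and 1.0 (identical); `pairwise_similarity` is a hypothetical helper, and character-level similarity is only a crude proxy for whether two responses say the same thing.

```python
from difflib import SequenceMatcher
from itertools import combinations
from typing import Dict, List, Tuple


def pairwise_similarity(texts: List[str]) -> Dict[Tuple[int, int], float]:
    """Return the SequenceMatcher ratio for every pair of texts, keyed by their indices."""
    return {
        (i, j): SequenceMatcher(None, texts[i], texts[j]).ratio()
        for i, j in combinations(range(len(texts)), 2)
    }
```

For the five responses above this would be `pairwise_similarity([r[0].text for r in new_responses])`; ratios near 1.0 for every pair suggest a stable prompt.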

Speaking of which, let's re-run the stability check to see if it is itself "stable":


In [59]:
reporting_response2 = make_request(reporting_system_prompt, reporting_prompt)
print(reporting_response2[0].text)

The provided responses are very similar, with only minor differences. Here's an overview of their similarities and differences:

Similarities:
1. All responses provide an improved version of the code with explanations.
2. They all include the same code snippet with the same improvements.
3. The explanations for the improvements are mostly the same, covering topics such as:
   - Adding `import argparse` at the top of the file
   - Renaming variables for better readability and avoiding built-in function names
   - Adjusting the formatting of print statements
   - Adding a `if __name__ == "__main__":` block
   - Assuming the existence of `evaluate_prompt_type()` and `score_prompt()` functions

Differences:
1. Response 3 provides more detailed explanations for each improvement, while the other responses have shorter explanations.
2. Response 3 includes additional suggestions, such as adding error handling, writing the improved prompt to a file, and adding comments and docstrings.
3. Respon

In [60]:
reporting_response3 = make_request(reporting_system_prompt, reporting_prompt)
print(reporting_response3[0].text)

The provided responses are very similar, with only minor differences in wording and formatting. Here's an overview of their similarities and differences:

Similarities:
1. All responses provide an improved version of the code with explanations.
2. They include the same code snippet with identical improvements.
3. The explanations cover similar points, such as renaming variables, adjusting print statement formatting, and adding a `if __name__ == "__main__":` block.
4. They assume the existence of `evaluate_prompt_type()` and `score_prompt()` functions.

Differences:
1. Response 3 provides more detailed explanations for each improvement point compared to the other responses.
2. Response 3 suggests adding error handling for file I/O operations and function calls using `try`/`except` blocks, which is not mentioned in the other responses.
3. Response 3 also recommends writing the improved prompt to the `improved_prompt.txt` file if that's the intended purpose, which is not explicitly mentio