# Simple Python agent workflow

![image info](../../images/agent_demo1.png)

This notebook walks through a simple Python agent workflow with the following steps:

1. User Query:

The process starts when the user submits a query or request to the Python agent.

2. Code and Test Case Generation:

The agent interprets the user's query and generates the corresponding Python code. Alongside the code, the agent creates a test case to verify the functionality of the generated code.

3. Execution and Validation:

The agent attempts to run the generated code to ensure it executes without errors.
The agent then runs the test case to confirm that the code produces the correct output.

4. Retry Mechanism:

If the code fails to run or the test case does not pass, the agent initiates a retry.
It regenerates the code and test case, addressing any issues identified during the previous attempt.

5. Result Output:

Once the code runs successfully and passes the test case, the agent delivers the result to the user.


## In summary

user query

-> agent generates code and a test case

-> check whether the code runs and the test passes

  -> if no, retry and regenerate

  -> if yes, output the result
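
The loop summarized above can be sketched without any API calls. This is a minimal illustration, not the notebook's implementation: `retry_until_valid`, `fake_generate`, and `passes_test` are hypothetical names, and the "agent" is replaced by two hard-coded drafts so the example runs offline.

```python
from typing import Callable, Optional, Tuple


def retry_until_valid(
    generate: Callable[[], str],
    validate: Callable[[str], bool],
    max_retries: int = 3,
) -> Tuple[Optional[str], int]:
    """Call generate() until validate() accepts its output or attempts run out."""
    for attempt in range(1, max_retries + 1):
        candidate = generate()
        if validate(candidate):
            return candidate, attempt
    return None, max_retries


# Stand-in for the agent (assumption): the first draft is wrong, the second is correct.
drafts = iter(["def double(x): return x + 1", "def double(x): return x * 2"])


def fake_generate() -> str:
    return next(drafts)


def passes_test(code: str) -> bool:
    """Execute the draft in its own namespace and check one assertion."""
    namespace = {}
    try:
        exec(code, namespace)
        assert namespace["double"](3) == 6
        return True
    except Exception:
        return False


validated_code, attempts_used = retry_until_valid(fake_generate, passes_test)
print(attempts_used, validated_code)
```

Here the first draft fails the assertion, so the loop asks for a second draft and stops once it passes.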



## Install dependencies & setup keys

First, install the Mistral Python SDK and set the API key.

In [1]:
!pip install mistralai==1.0.0

In [2]:
import os
from mistralai import Mistral
import re

api_key = os.environ["MISTRAL_API_KEY"]

client = Mistral(api_key=api_key)

## Agent

You can create an agent at https://console.mistral.ai/build/agents/new. For this notebook, we use `mistral-large-2407` as the model powering the agent.

Here is the instruction provided to the Python agent we create:

```
You are a Python coding assistant that only outputs Python code without any explanations or comments.

Return one Python function for the given query and one test case using assertion.

Return Python code with two sections:

## Python function

## Test case

  """
```
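
To make the expected format concrete, here is a small sketch that parses a hypothetical reply following the two-section layout. The reply text is invented for illustration; the regular expressions mirror the ones used in `extract_code` later in this notebook. The fence marker is built with string multiplication only so that this example can itself sit inside a fenced block.

```python
import re

FENCE = "`" * 3  # three backticks, i.e. a Markdown code fence

# Hypothetical agent reply (assumption): one function and one assertion.
sample_response = "\n".join([
    FENCE + "python",
    "## Python function",
    "def add(a, b):",
    "    return a + b",
    "",
    "## Test case",
    "assert add(2, 3) == 5",
    FENCE,
])

# Everything between the two section headers is the function.
function_match = re.search(
    r"## Python function(.*?)## Test case", sample_response, flags=re.DOTALL
)
# Everything between the second header and the closing fence is the test.
test_match = re.search(
    r"## Test case(.*?)" + FENCE, sample_response, flags=re.DOTALL
)

function_code = function_match.group(1).strip()
test_code = test_match.group(1).strip()
print(function_code)
print(test_code)
```

Note that a reply which omits the closing fence does not match the second pattern, so the test case would be reported as missing.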

After creating the agent, copy the agent ID from the UI where it was created:



In [3]:
agent_id= "ag:8e2706f0:20240807:python-agent:ffdc79f8"

## Simple Python agent workflow

Now we can write the functions `run_python_agent`, `extract_code` (with its helper `extract_pattern`), and `check_code`, and combine them in a `run_workflow` function.

In [18]:
def run_python_agent(query):
    """
    Sends a user query to a Python agent and returns the response.

    Args:
        query (str): The user query to be sent to the Python agent.

    Returns:
        str: The response content from the Python agent.
    """
    print("### Run Python agent")
    print(f"User query: {query}")
    try:
        response = client.agents.complete(
            agent_id= agent_id,
            messages = [
                {
                    "role": "user",
                    "content":  query
                },
            ]
        )
        result = response.choices[0].message.content
        return result
    except Exception as e:
        print(f"Request failed: {e}. Please check your request.")
        return None

def extract_pattern(text, pattern):
    """
    Extracts a pattern from the given text.

    Args:
        text (str): The text to search within.
        pattern (str): The regex pattern to search for.

    Returns:
        str: The extracted pattern or None if not found.
    """
    match = re.search(pattern, text, flags=re.DOTALL)
    if match:
        return match.group(1).strip()
    return None

def extract_code(result):
    """
    Extracts Python function and test case from the response content.

    Args:
        result (str): The response content from the Python agent.

    Returns:
        tuple: A tuple containing the extracted Python function, test function, and a retry flag.
    """
    retry = False
    print("### Extracting Python code")
    python_function = extract_pattern(result, r'## Python function(.*?)## Test case')
    if not python_function:
        retry = True
        print("Python function failed to generate or wrong output format. Setting retry to True.")

    print("### Extracting test case")
    test_function = extract_pattern(result, r'## Test case(.*?)```')
    if not test_function:
        retry = True
        print("Test function failed to generate or wrong output format. Setting retry to True.")

    return python_function, test_function, retry

def check_code(python_function, test_function):
    """
    Executes the Python function and its test case, and checks for any errors.

    Args:
        python_function (str): The Python function to be executed.
        test_function (str): The test case to be executed.

    Returns:
        bool: A flag indicating whether the code execution needs to be retried.

    Warning:
        This code is designed to run code that’s been generated by a model, which may not be entirely reliable.
        It's strongly recommended to run this in a sandbox environment.
    """
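    retry = False
    # Run both snippets in one explicit namespace so the test can see the
    # generated function regardless of how exec() treats function locals.
    namespace = {}
    try:
        exec(python_function, namespace)
        try:
            exec(test_function, namespace)
            print("Code passed test case.")
        except Exception:
            print("Test failed")
            retry = True
            print("Setting retry to True")
    except Exception:
        print("Code failed.")
        retry = True
        print("Setting retry to True")
    return retry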

def run_workflow(query):
    """
    Runs the complete workflow to generate, extract, and validate Python code.

    Args:
        query (str): The user query to be processed.
    """
    print("### ENTER WORKFLOW")
    i = 0
    max_retries = 3
    retry = True # just to get it started
    while i < max_retries and retry:
        print(f"TRY # {i}")
        i += 1
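        result = run_python_agent(query)
        if result is None:
            # The request failed; skip extraction and try again.
            continue
        python_function, test_function, retry = extract_code(result)
        # check_code overwrites the extraction flag; a missing function or test
        # still fails there because exec(None) raises an exception.
        retry = check_code(python_function, test_function)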

    if not retry:
        print(f"Validated Python function: ```{python_function}```")
    print("### EXIT WORKFLOW")
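
The warning in `check_code` recommends a sandbox. As one partial step in that direction, the sketch below runs the generated function and its test in a separate interpreter process with a timeout. This is an assumption-laden illustration, not part of the original workflow: `run_in_subprocess` is a hypothetical helper, and a child process alone is not a real sandbox, since the code can still reach the file system and the network.

```python
import subprocess
import sys


def run_in_subprocess(python_function: str, test_function: str,
                      timeout_s: float = 10.0) -> bool:
    """Return True if the function and its test run cleanly in a child interpreter."""
    script = python_function + "\n\n" + test_function + "\n"
    try:
        completed = subprocess.run(
            [sys.executable, "-c", script],  # same Python as the notebook kernel
            capture_output=True,
            text=True,
            timeout=timeout_s,               # stop runaway or looping code
        )
    except subprocess.TimeoutExpired:
        return False
    # A failed assertion or any uncaught exception gives a non-zero exit code.
    return completed.returncode == 0


ok = run_in_subprocess("def add(a, b):\n    return a + b", "assert add(2, 3) == 5")
bad = run_in_subprocess("def add(a, b):\n    return a - b", "assert add(2, 3) == 5")
print(ok, bad)
```

Compared with `exec` in the notebook kernel, this keeps generated code from modifying the notebook's own variables, at the cost of starting a new process per check.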


In [93]:
run_workflow("How can I remove duplicates from a list")

### ENTER WORKFLOW
TRY # 0
### Run Python agent
User query: How can I remove duplicates from a list
### Extracting Python code
### Extracting test case
Test function failed to generate or wrong output format. Setting retry to True.
Test failed
Setting retry to True
TRY # 1
### Run Python agent
User query: How can I remove duplicates from a list
### Extracting Python code
### Extracting test case
Code passed test case.
Validated Python function: ```def remove_duplicates(input_list):
    return list(set(input_list))```
### EXIT WORKFLOW


In [6]:
run_workflow("How can I sort a list of words and add the word love to each of word ")

### ENTER WORKFLOW
TRY # 0
### Run Python agent
User query: How can I sort a list of words and add the word love to each of word 
### Extracting Python code
### Extracting test case
Code passed test case.
Validated Python function: ```def sort_and_add_love(words):
    words.sort()
    return [word + 'love' for word in words]```
### EXIT WORKFLOW


In [7]:
run_workflow("How can calculate fibinacci ")

### ENTER WORKFLOW
TRY # 0
### Run Python agent
User query: How can calculate fibinacci 
### Extracting Python code
### Extracting test case
Code passed test case.
Validated Python function: ```def fibonacci(n):
    if n <= 0:
        return []
    elif n == 1:
        return [0]
    elif n == 2:
        return [0, 1]
    else:
        fib_sequence = [0, 1]
        while len(fib_sequence) < n:
            fib_sequence.append(fib_sequence[-1] + fib_sequence[-2])
        return fib_sequence```
### EXIT WORKFLOW


# (Optional) Trace and Evaluate your Agent

Now that your agent is running, you can optionally trace and evaluate it with Arize Phoenix. Phoenix is an open-source framework for tracing and evaluating LLM applications, including agents and RAG pipelines.

Tracing refers to the process of recording the calls made between your application and the LLM. Evaluation can be thought of as the performance testing of your agent. Phoenix provides a UI for you to view traces and evaluations, as well as a suite of evaluation templates.

To start off, create a Phoenix account and get your API key [here](https://phoenix.arize.com).

### Set up Phoenix

In [8]:
!pip install -q openinference-instrumentation-mistralai arize-phoenix

In [None]:
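from getpass import getpass
# Use a separate name so the Mistral `api_key` defined earlier is not overwritten.
if not (phoenix_api_key := os.getenv("PHOENIX_API_KEY")):
    phoenix_api_key = getpass("🔑 Enter your Phoenix API key: ")
os.environ["PHOENIX_API_KEY"] = phoenix_api_key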

In [14]:
from openinference.instrumentation.mistralai import MistralAIInstrumentor
from phoenix.otel import register
import os

# Add Phoenix API Key for tracing
os.environ["PHOENIX_CLIENT_HEADERS"] = f"api_key={os.getenv('PHOENIX_API_KEY')}"
os.environ["PHOENIX_COLLECTOR_ENDPOINT"] = "https://app.phoenix.arize.com"

# configure the Phoenix tracer
tracer_provider = register() 

# Phoenix provides an openinference package that automatically traces all requests to Mistral
MistralAIInstrumentor().instrument(tracer_provider=tracer_provider, skip_dep_check=True)

Overriding of current TracerProvider is not allowed
Attempting to instrument while already instrumented


🔭 OpenTelemetry Tracing Details 🔭
|  Phoenix Project: mistral_agent_tracing
|  Span Processor: SimpleSpanProcessor
|  Collector Endpoint: https://app.phoenix.arize.com/v1/traces
|  Transport: HTTP
|  Transport Headers: {'api_key': '****'}
|  
|  Using a default SpanProcessor. `add_span_processor` will overwrite this default.
|  
|  `register` has set this TracerProvider as the global OpenTelemetry default.
|  To disable this behavior, call `register` with `set_global_tracer_provider=False`.



### Run your agent again

In [None]:
run_workflow("How can I remove duplicates from a list")

In [None]:
run_workflow("How can I sort a list of words and add the word love to each of word ")

In [None]:
run_workflow("How can calculate fibinacci ")

### View your traces
You should now be able to view traces in [Phoenix](https://app.phoenix.arize.com).

### Evaluate your agent

Now let's evaluate your agent. The flow for batch evaluation is as follows:

1. Export traces from Phoenix
2. Attach labels to the traces. These can be created using an LLM as a judge, using code-based evaluation, or using a combination of both.
3. Import the labeled traces into Phoenix.

#### Export traces from Phoenix

In [None]:
import phoenix as px

spans = px.Client().get_spans_dataframe()

spans.head()

#### Evaluate traces
Phoenix has a [built-in LLM Judge template that can be used to evaluate Code Generation Agents](https://docs.arize.com/phoenix/evaluation/how-to-evals/running-pre-tested-evals/code-generation-eval). We'll use that template here.

The template requires two columns to be added to the dataframe:
- output: The code generated by the agent
- input: The original user query

In [37]:
spans['output'] = spans['attributes.llm.output_messages'].apply(lambda x: extract_code(x[0]['message.content']))
spans['input'] = spans['attributes.input.value']
spans.head()

### Extracting Python code
### Extracting test case
Test function failed to generate or wrong output format. Setting retry to True.
### Extracting Python code
### Extracting test case
Test function failed to generate or wrong output format. Setting retry to True.
### Extracting Python code
### Extracting test case
### Extracting Python code
### Extracting test case
### Extracting Python code
### Extracting test case


context.span_id,name,span_kind,parent_id,start_time,end_time,status_code,status_message,events,context.span_id,context.trace_id,...,attributes.llm.token_count.completion,attributes.llm.token_count.total,attributes.output.mime_type,attributes.output.value,attributes.llm.invocation_parameters,attributes.llm.input_messages,code,query,output,input
9dcc050d2cbe1fb3,MistralClient.agents,LLM,,2024-10-16 16:49:07.371632+00:00,2024-10-16 16:49:10.186793+00:00,OK,,[],9dcc050d2cbe1fb3,441ee3cd7233ddeb502369742fa743aa,...,85,155,application/json,"{""id"":""036330244c4d490a94921cb76403738a"",""obje...","{""agent_id"": ""ag:ad73bfd7:20240912:python-code...",[{'message.content': 'How can I remove duplica...,(```python\ndef remove_duplicates(lst):\n r...,"{""messages"": [{""role"": ""user"", ""content"": ""How...",(```python\ndef remove_duplicates(lst):\n r...,"{""messages"": [{""role"": ""user"", ""content"": ""How..."
da0d7cde2e4cf311,MistralClient.agents,LLM,,2024-10-16 16:49:10.353094+00:00,2024-10-16 16:49:15.158835+00:00,OK,,[],da0d7cde2e4cf311,6e5ea1185f6161d5fea4558435b796be,...,85,155,application/json,"{""id"":""c6ba9bdb8c604fe1af0d05d5f9e9ad77"",""obje...","{""agent_id"": ""ag:ad73bfd7:20240912:python-code...",[{'message.content': 'How can I remove duplica...,(```python\ndef remove_duplicates(lst):\n r...,"{""messages"": [{""role"": ""user"", ""content"": ""How...",(```python\ndef remove_duplicates(lst):\n r...,"{""messages"": [{""role"": ""user"", ""content"": ""How..."
33f7a33fad606580,MistralClient.agents,LLM,,2024-10-16 16:50:29.864045+00:00,2024-10-16 16:50:33.187772+00:00,OK,,[],33f7a33fad606580,45afc6c918d091a852db77ab10e58710,...,77,147,application/json,"{""id"":""0f7b9c06648941478e0163558281d716"",""obje...","{""agent_id"": ""ag:ad73bfd7:20240912:python-code...",[{'message.content': 'How can I remove duplica...,(def remove_duplicates(lst):\n return list(...,"{""messages"": [{""role"": ""user"", ""content"": ""How...",(def remove_duplicates(lst):\n return list(...,"{""messages"": [{""role"": ""user"", ""content"": ""How..."
d88a511a82477003,MistralClient.agents,LLM,,2024-10-16 16:50:33.351857+00:00,2024-10-16 16:50:36.773357+00:00,OK,,[],d88a511a82477003,04e3dc347d55f19c59b4f3e89f90ca64,...,82,159,application/json,"{""id"":""167bc13355cd4a998c8aed587e76266b"",""obje...","{""agent_id"": ""ag:ad73bfd7:20240912:python-code...",[{'message.content': 'How can I sort a list of...,(def sort_and_add_love(words):\n words.sort...,"{""messages"": [{""role"": ""user"", ""content"": ""How...",(def sort_and_add_love(words):\n words.sort...,"{""messages"": [{""role"": ""user"", ""content"": ""How..."
671411e820abfd2d,MistralClient.agents,LLM,,2024-10-16 16:50:36.884457+00:00,2024-10-16 16:50:41.075278+00:00,OK,,[],671411e820abfd2d,4aa91861501996e84880a7eea6a7b812,...,162,228,application/json,"{""id"":""771eceabd81e44a2a277ebe30ab6f0d7"",""obje...","{""agent_id"": ""ag:ad73bfd7:20240912:python-code...",[{'message.content': 'How can calculate fibina...,(def fibonacci(n):\n if n <= 0:\n re...,"{""messages"": [{""role"": ""user"", ""content"": ""How...",(def fibonacci(n):\n if n <= 0:\n re...,"{""messages"": [{""role"": ""user"", ""content"": ""How..."


Now we can use the `CODE_READABILITY_PROMPT_TEMPLATE` to evaluate the code. The function below takes a row of the dataframe and returns a label and an explanation.

Note that we're also asking the LLM to return an explanation for the label. This will be visible in the UI, and is extremely helpful for debugging.

In [90]:
from phoenix.evals import CODE_READABILITY_PROMPT_TEMPLATE

def evaluate_code(row):
    try:
        response = client.chat.complete(
            model="mistral-large-latest",
            messages = [
                {
                    "role": "user",
                    "content":  CODE_READABILITY_PROMPT_TEMPLATE.explanation_template.format(output=row['output'], input=row['input'])
                },
            ]
        )
        result = response.choices[0].message.content
        explanation, label = result.split("LABEL: ")
        label = label.replace("\"", "")
        return label, explanation
    except Exception as e:
        print(f"Request failed: {e}. Please check your request.")
        return None
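
Splitting on `"LABEL: "` keeps whatever the judge writes after the label, such as trailing newlines or asterisks, and a label with trailing text will not equal `"readable"` when the score is computed. The sketch below shows one more tolerant way to parse the reply. It is a hypothetical helper (`parse_judge_response` is not part of Phoenix or the Mistral SDK) and assumes the judge writes its explanation first, followed by `LABEL:` and either `readable` or `unreadable`.

```python
import re
from typing import Optional, Tuple


def parse_judge_response(result: str) -> Optional[Tuple[str, str]]:
    """Split a judge reply into (label, explanation); return None if no label is found."""
    explanation, separator, tail = result.partition("LABEL:")
    if not separator:
        return None
    # Keep only the label word and ignore quotes, newlines, or decoration after it.
    match = re.search(r"\b(unreadable|readable)\b", tail.lower())
    if match is None:
        return None
    return match.group(1), explanation.strip()


print(parse_judge_response('The code is clear.\nLABEL: "readable"\n\n************'))
print(parse_judge_response("Hard to follow.\nLABEL: unreadable\n\n************"))
print(parse_judge_response("No label here."))
```

With a parser like this, the `label` column would contain only `readable` or `unreadable`, which keeps the later score comparison exact.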

Since we're going to use Mistral to evaluate the code, we'll uninstrument Mistral to prevent our traces view from being too cluttered.

In [None]:
MistralAIInstrumentor().uninstrument()

Now we can run our `evaluate_code` function on each row of the dataframe:

In [91]:
results = spans.apply(evaluate_code, axis=1)
spans['label'] = results.apply(lambda x: x[0] if x else None)
spans['explanation'] = results.apply(lambda x: x[1] if x else None)
spans['score'] = spans['label'].apply(lambda x: 1 if x == "readable" else 0)

We should now have three more columns in our dataframe: `label`, `explanation`, and `score`.

In [92]:
spans.head()

context.span_id,name,span_kind,parent_id,start_time,end_time,status_code,status_message,events,context.span_id,context.trace_id,...,attributes.output.value,attributes.llm.invocation_parameters,attributes.llm.input_messages,code,query,output,input,label,explanation,score
9dcc050d2cbe1fb3,MistralClient.agents,LLM,,2024-10-16 16:49:07.371632+00:00,2024-10-16 16:49:10.186793+00:00,OK,,[],9dcc050d2cbe1fb3,441ee3cd7233ddeb502369742fa743aa,...,"{""id"":""036330244c4d490a94921cb76403738a"",""obje...","{""agent_id"": ""ag:ad73bfd7:20240912:python-code...",[{'message.content': 'How can I remove duplica...,(```python\ndef remove_duplicates(lst):\n r...,"{""messages"": [{""role"": ""user"", ""content"": ""How...",(```python\ndef remove_duplicates(lst):\n r...,"{""messages"": [{""role"": ""user"", ""content"": ""How...",readable,"To evaluate the readability of the given code,...",1
da0d7cde2e4cf311,MistralClient.agents,LLM,,2024-10-16 16:49:10.353094+00:00,2024-10-16 16:49:15.158835+00:00,OK,,[],da0d7cde2e4cf311,6e5ea1185f6161d5fea4558435b796be,...,"{""id"":""c6ba9bdb8c604fe1af0d05d5f9e9ad77"",""obje...","{""agent_id"": ""ag:ad73bfd7:20240912:python-code...",[{'message.content': 'How can I remove duplica...,(```python\ndef remove_duplicates(lst):\n r...,"{""messages"": [{""role"": ""user"", ""content"": ""How...",(```python\ndef remove_duplicates(lst):\n r...,"{""messages"": [{""role"": ""user"", ""content"": ""How...",readable,"To evaluate the readability of the code, we ne...",1
33f7a33fad606580,MistralClient.agents,LLM,,2024-10-16 16:50:29.864045+00:00,2024-10-16 16:50:33.187772+00:00,OK,,[],33f7a33fad606580,45afc6c918d091a852db77ab10e58710,...,"{""id"":""0f7b9c06648941478e0163558281d716"",""obje...","{""agent_id"": ""ag:ad73bfd7:20240912:python-code...",[{'message.content': 'How can I remove duplica...,(def remove_duplicates(lst):\n return list(...,"{""messages"": [{""role"": ""user"", ""content"": ""How...",(def remove_duplicates(lst):\n return list(...,"{""messages"": [{""role"": ""user"", ""content"": ""How...",unreadable\n\n************,1. **Initial Observation**: The given code sni...,0
d88a511a82477003,MistralClient.agents,LLM,,2024-10-16 16:50:33.351857+00:00,2024-10-16 16:50:36.773357+00:00,OK,,[],d88a511a82477003,04e3dc347d55f19c59b4f3e89f90ca64,...,"{""id"":""167bc13355cd4a998c8aed587e76266b"",""obje...","{""agent_id"": ""ag:ad73bfd7:20240912:python-code...",[{'message.content': 'How can I sort a list of...,(def sort_and_add_love(words):\n words.sort...,"{""messages"": [{""role"": ""user"", ""content"": ""How...",(def sort_and_add_love(words):\n words.sort...,"{""messages"": [{""role"": ""user"", ""content"": ""How...",readable,"To evaluate the readability of the code, let's...",1
671411e820abfd2d,MistralClient.agents,LLM,,2024-10-16 16:50:36.884457+00:00,2024-10-16 16:50:41.075278+00:00,OK,,[],671411e820abfd2d,4aa91861501996e84880a7eea6a7b812,...,"{""id"":""771eceabd81e44a2a277ebe30ab6f0d7"",""obje...","{""agent_id"": ""ag:ad73bfd7:20240912:python-code...",[{'message.content': 'How can calculate fibina...,(def fibonacci(n):\n if n <= 0:\n re...,"{""messages"": [{""role"": ""user"", ""content"": ""How...",(def fibonacci(n):\n if n <= 0:\n re...,"{""messages"": [{""role"": ""user"", ""content"": ""How...",unreadable\n\n************,### EXPLANATION:\n\n1. **Structure and Formatt...,0


#### Import labeled traces into Phoenix

In [None]:
from phoenix.trace import SpanEvaluations

px.Client().log_evaluations(SpanEvaluations(eval_name="Code Quality", dataframe=spans))

You can now view your evaluations in the Phoenix UI:

![image info](../../third_party/Phoenix/images/phoenix-agent-eval.png)