# Data analytical multi-agent workflow:

![image info](../../images/agent_demo2.png)

You can also use multiple agents in a workflow. Here is an example:

1. Data Analysis Planning:

  The planning agent writes a comprehensive data analysis plan, outlining the steps required to analyze the data.

2. Code Generation and Execution:

  For each step in the analysis plan, the Python agent generates the corresponding code.
The Python agent then executes the generated code to perform the specified analysis.

3. Analysis Report Summarization:

  Based on the results of the executed code, the summarization agent writes an analysis report.
The report summarizes the findings and insights derived from the data analysis.


## Install dependencies

First we will install the python SDK and set our API key!

In [1]:
!pip install mistralai==1.0.0



In [3]:
import os
from mistralai import Mistral
import re

api_key = os.environ["MISTRAL_API_KEY"]

client = Mistral(api_key=api_key)

## Agents
You can create an Agent in https://console.mistral.ai/build/agents/new, for this notebook we will use mistral-large-2407 as the model powering our agents!

Here are the instructions provided to the agents we created:

### Planning agent:

```
You are a data analytical planning assistant. Given a dataset and its description,
your task is to provide specific and simple analysis plans, detailed instructions,
and suggested Python code that can later be given to a separate Python agent to generate
the Python code for executing the analysis plan.
Do not create figures.

Return output with the following format:

## Total number of steps:

## Step 1:
```

### Python agent:
```
You are a Python coding assistant that only outputs Python code without any explanations or comments.
Given an instruction and the suggested Python code, return the correct Python code.
```

### Summarization agent:
```
You are an analysis summarization assistant.
Given a dataset's description and the analysis results. Provide an analysis report.
```

### Agents IDs
Next, we will retrieve the Agents IDs from the UI where we created the agents.


In [4]:
planning_agent_id =  "ag:8e2706f0:20240807:planning-agent:a0b3a053"
summarization_agent_id = "ag:8e2706f0:20240807:summarization-agent:d7482b8f"
python_agent_id = "ag:8e2706f0:20240807:python-agent:482c86de"

# Analysis Planning

In [5]:
def run_analysis_planning_agent(query):
    """
    Sends a user query to a Python agent and returns the response.

    Args:
        query (str): The user query to be sent to the Python agent.

    Returns:
        str: The response content from the Python agent.
    """
    print("### Run Planning agent")
    print(f"User query: {query}")
    try:
        response = client.agents.complete(
            agent_id= planning_agent_id,
            messages = [
                {
                    "role": "user",
                    "content":  query
                },
            ]
        )
        result = response.choices[0].message.content
        return result
    except Exception as e:
        print(f"Request failed: {e}. Please check your request.")
        return None

In [6]:
query = """
Load this data: https://raw.githubusercontent.com/fivethirtyeight/data/master/bad-drivers/bad-drivers.csv

The dataset consists of 51 datapoints and has eight columns:
- State
- Number of drivers involved in fatal collisions per billion miles
- Percentage Of Drivers Involved In Fatal Collisions Who Were Speeding
- Percentage Of Drivers Involved In Fatal Collisions Who Were Alcohol-Impaired
- Percentage Of Drivers Involved In Fatal Collisions Who Were Not Distracted
- Percentage Of Drivers Involved In Fatal Collisions Who Had Not Been Involved In Any Previous Accidents
- Car Insurance Premiums ($)
- Losses incurred by insurance companies for collisions per insured driver ($)
"""

In [7]:
planning_result = run_analysis_planning_agent(query)

### Run Planning agent
User query: 
Load this data: https://raw.githubusercontent.com/fivethirtyeight/data/master/bad-drivers/bad-drivers.csv

The dataset consists of 51 datapoints and has eight columns:
- State
- Number of drivers involved in fatal collisions per billion miles
- Percentage Of Drivers Involved In Fatal Collisions Who Were Speeding
- Percentage Of Drivers Involved In Fatal Collisions Who Were Alcohol-Impaired
- Percentage Of Drivers Involved In Fatal Collisions Who Were Not Distracted
- Percentage Of Drivers Involved In Fatal Collisions Who Had Not Been Involved In Any Previous Accidents
- Car Insurance Premiums ($)
- Losses incurred by insurance companies for collisions per insured driver ($)



In [8]:
print(planning_result)

## Total number of steps: 3

## Step 1:

### Analysis Plan:
Load the dataset from the provided URL.

### Detailed Instructions:
1. Use the pandas library to read the CSV file.
2. Store the data in a DataFrame.

### Suggested Python Code:
```python
import pandas as pd

url = "https://raw.githubusercontent.com/fivethirtyeight/data/master/bad-drivers/bad-drivers.csv"
data = pd.read_csv(url)
```

## Step 2:

### Analysis Plan:
Inspect the dataset to understand its structure and check for any missing values.

### Detailed Instructions:
1. Display the first few rows of the dataset.
2. Check for missing values in each column.

### Suggested Python Code:
```python
print(data.head())
print(data.isnull().sum())
```

## Step 3:

### Analysis Plan:
Perform basic statistical analysis on the dataset.

### Detailed Instructions:
1. Calculate and display summary statistics for the dataset.
2. Calculate the mean, median, and standard deviation for the numerical columns.

### Suggested Python Code:
```p

# Generate and execute Python code for each planning step

In [9]:
class PythonAgentWorkflow:
    def __init__(self):
        pass

    def extract_pattern(self, text, pattern):
        """
        Extracts a pattern from the given text.

        Args:
            text (str): The text to search within.
            pattern (str): The regex pattern to search for.

        Returns:
            str: The extracted pattern or None if not found.
        """
        match = re.search(pattern, text, flags=re.DOTALL)
        if match:
            return match.group(1).strip()
        return None

    def extract_step_i(self, planning_result, i, n_step):
        """
        Extracts the content of a specific step from the planning result.

        Args:
            planning_result (str): The planning result text.
            i (int): The step number to extract.
            n_step (int): The total number of steps.

        Returns:
            str: The extracted step content or None if not found.
        """
        if i < n_step:
            pattern = rf'## Step {i}:(.*?)## Step {i+1}'
        elif i == n_step:
            pattern = rf'## Step {i}:(.*)'
        else:
            print(f"Invalid step number {i}. It should be between 1 and {n_step}.")
            return None

        step_i = self.extract_pattern(planning_result, pattern)
        if not step_i:
            print(f"Failed to extract Step {i} content.")
            return None

        return step_i

    def extract_code(self, python_agent_result):
          """
          Extracts Python function and test case from the response content.

          Args:
              result (str): The response content from the Python agent.

          Returns:
              tuple: A tuple containing the extracted Python function and a retry flag.
          """
          retry = False
          print("### Extracting Python code")
          python_code = self.extract_pattern(python_agent_result, r'```python(.*?)```')
          if not python_code:
              retry = True
              print("Python function failed to generate or wrong output format. Setting retry to True.")

          return python_code, retry

    def run_python_agent(self, query):
        """
        Sends a user query to a Python agent and returns the response.

        Args:
            query (str): The user query to be sent to the Python agent.

        Returns:
            str: The response content from the Python agent.
        """
        print("### Run Python agent")
        print(f"User query: {query}")
        try:
            response = client.agents.complete(
                agent_id= python_agent_id,
                messages = [
                    {
                        "role": "user",
                        "content":  query
                    },
                ]
            )
            result = response.choices[0].message.content
            return result

        except Exception as e:
            print(f"Request failed: {e}. Please check your request.")
            return None

    def check_code(self, python_function, state):
        """
        Executes the Python function and checks for any errors.

        Args:
            python_function (str): The Python function to be executed.

        Returns:
            bool: A flag indicating whether the code execution needs to be retried.

        Warning:
            This code is designed to run code that’s been generated by a model, which may not be entirely reliable.
            It's strongly recommended to run this in a sandbox environment.
        """
        retry = False
        try:
            print(f"### Python function to run: {python_function}")
            exec(python_function, state)
            print("Code executed successfully.")
        except Exception:
            print(f"Code failed.")
            retry = True
            print("Setting retry to True")
        return retry

    def process_step(self, planning_result, i, n_step, max_retries, state):
        """
        Processes a single step, including retries.

        Args:
            planning_result (str): The planning result text.
            i (int): The step number to process.
            n_step (int): The total number of steps.
            max_retries (int): The maximum number of retries.

        Returns:
            str: The extracted step content or None if not found.
        """

        retry = True
        j = 0
        while j < max_retries and retry:
            print(f"TRY # {j}")
            j += 1
            step_i = self.extract_step_i(planning_result, i, n_step)
            if step_i:
                print(step_i)
                python_agent_result = self.run_python_agent(step_i)
                python_code, retry = self.extract_code(python_agent_result)
                print(python_code)
                retry = self.check_code(python_code, state)
        return None

    def workflow(self, planning_result):
        """
        Executes the workflow for processing planning results.

        Args:
            planning_result (str): The planning result text.
        """
        state = {}
        print("### ENTER WORKFLOW")
        n_step = int(self.extract_pattern(planning_result, '## Total number of steps:\s*(\d+)'))
        for i in range(1, n_step + 1):
            print(f"STEP # {i}")
            self.process_step(planning_result, i, n_step, max_retries=2, state=state)


        print("### Exit WORKFLOW")
        return None

In [10]:
import sys
import io

# See the output of print statements in the console while also capturing it in a variable,
class Tee(io.StringIO):
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.original_stdout = sys.stdout

    def write(self, data):
        self.original_stdout.write(data)
        super().write(data)

    def flush(self):
        self.original_stdout.flush()
        super().flush()

# Create an instance of the Tee class
tee_stream = Tee()

# Redirect stdout to the Tee instance
sys.stdout = tee_stream


Python_agent = PythonAgentWorkflow()
Python_agent.workflow(planning_result)

# Restore the original stdout
sys.stdout = tee_stream.original_stdout

# Get the captured output
captured_output = tee_stream.getvalue()

### ENTER WORKFLOW
STEP # 1
TRY # 0
### Analysis Plan:
Load the dataset from the provided URL.

### Detailed Instructions:
1. Use the pandas library to read the CSV file.
2. Store the data in a DataFrame.

### Suggested Python Code:
```python
import pandas as pd

url = "https://raw.githubusercontent.com/fivethirtyeight/data/master/bad-drivers/bad-drivers.csv"
data = pd.read_csv(url)
```
### Run Python agent
User query: ### Analysis Plan:
Load the dataset from the provided URL.

### Detailed Instructions:
1. Use the pandas library to read the CSV file.
2. Store the data in a DataFrame.

### Suggested Python Code:
```python
import pandas as pd

url = "https://raw.githubusercontent.com/fivethirtyeight/data/master/bad-drivers/bad-drivers.csv"
data = pd.read_csv(url)
```
### Extracting Python code
import pandas as pd

url = "https://raw.githubusercontent.com/fivethirtyeight/data/master/bad-drivers/bad-drivers.csv"
data = pd.read_csv(url)
### Python function to run: import pandas as pd

url 

# Summarization

In [11]:
response = client.agents.complete(
    agent_id= summarization_agent_id,
    messages = [
        {
            "role": "user",
            "content":  query + captured_output
        },
    ]
)
result = response.choices[0].message.content


In [12]:
print(result)

## Analysis Report

### Dataset Description
The dataset consists of 51 data points representing different states in the U.S. and includes the following eight columns:
1. **State**: The name of the state.
2. **Number of drivers involved in fatal collisions per billion miles**: The rate of drivers involved in fatal collisions per billion miles driven.
3. **Percentage Of Drivers Involved In Fatal Collisions Who Were Speeding**: The percentage of drivers speeding during fatal collisions.
4. **Percentage Of Drivers Involved In Fatal Collisions Who Were Alcohol-Impaired**: The percentage of drivers who were alcohol-impaired during fatal collisions.
5. **Percentage Of Drivers Involved In Fatal Collisions Who Were Not Distracted**: The percentage of drivers who were not distracted during fatal collisions.
6. **Percentage Of Drivers Involved In Fatal Collisions Who Had Not Been Involved In Any Previous Accidents**: The percentage of drivers who had not been involved in previous accidents.
7. **

# (Optional) Trace and Evaluate your Agent

Now that your agent is running, you can optionally trace and evaluate it with Arize Phoenix. Phoenix is an open-source framework for tracing and evaluating LLM applications, including agents and RAG pipelines.

Tracing refers to the process of recording the calls made between your application and the LLM. Evaluation can be thought of as the performance testing of your agent. Phoenix provides a UI for you to view traces and evaluations, as well as a suite of evaluation templates.

To start off, create a Phoenix account and get your API key [here](https://phoenix.arize.com).

### Set up Phoenix

In [13]:
!pip install -q openinference-instrumentation-mistralai arize-phoenix

In [14]:
from openinference.instrumentation.mistralai import MistralAIInstrumentor
from phoenix.otel import register
import os

# Add Phoenix API Key for tracing
PHOENIX_API_KEY = "48624b3c339c53f04c6:a6d9fef"
os.environ["PHOENIX_CLIENT_HEADERS"] = f"api_key={PHOENIX_API_KEY}"
os.environ["PHOENIX_COLLECTOR_ENDPOINT"] = "https://app.phoenix.arize.com"

# configure the Phoenix tracer
tracer_provider = register() 

# Phoenix provides an openinference package that automatically traces all requests to Mistral
MistralAIInstrumentor().instrument(tracer_provider=tracer_provider, skip_dep_check=True)

  from .autonotebook import tqdm as notebook_tqdm


🔭 OpenTelemetry Tracing Details 🔭
|  Phoenix Project: mistral_agent_2
|  Span Processor: SimpleSpanProcessor
|  Collector Endpoint: https://app.phoenix.arize.com/v1/traces
|  Transport: HTTP
|  Transport Headers: {'api_key': '****'}
|  
|  Using a default SpanProcessor. `add_span_processor` will overwrite this default.
|  
|  `register` has set this TracerProvider as the global OpenTelemetry default.
|  To disable this behavior, call `register` with `set_global_tracer_provider=False`.



### Run your agent

In [15]:
planning_result = run_analysis_planning_agent(query)

### Run Planning agent
User query: 
Load this data: https://raw.githubusercontent.com/fivethirtyeight/data/master/bad-drivers/bad-drivers.csv

The dataset consists of 51 datapoints and has eight columns:
- State
- Number of drivers involved in fatal collisions per billion miles
- Percentage Of Drivers Involved In Fatal Collisions Who Were Speeding
- Percentage Of Drivers Involved In Fatal Collisions Who Were Alcohol-Impaired
- Percentage Of Drivers Involved In Fatal Collisions Who Were Not Distracted
- Percentage Of Drivers Involved In Fatal Collisions Who Had Not Been Involved In Any Previous Accidents
- Car Insurance Premiums ($)
- Losses incurred by insurance companies for collisions per insured driver ($)



In [16]:
# Create an instance of the Tee class
tee_stream = Tee()

# Redirect stdout to the Tee instance
sys.stdout = tee_stream


Python_agent = PythonAgentWorkflow()
Python_agent.workflow(planning_result)

# Restore the original stdout
sys.stdout = tee_stream.original_stdout

# Get the captured output
captured_output = tee_stream.getvalue()

### ENTER WORKFLOW
STEP # 1
TRY # 0
### Description:
Load the dataset from the provided URL into a Pandas DataFrame.

### Instructions:
1. Use the Pandas library to read the CSV file from the URL.
2. Store the data in a DataFrame named `df`.

### Suggested Python Code:
```python
import pandas as pd

url = 'https://raw.githubusercontent.com/fivethirtyeight/data/master/bad-drivers/bad-drivers.csv'
df = pd.read_csv(url)
```
### Run Python agent
User query: ### Description:
Load the dataset from the provided URL into a Pandas DataFrame.

### Instructions:
1. Use the Pandas library to read the CSV file from the URL.
2. Store the data in a DataFrame named `df`.

### Suggested Python Code:
```python
import pandas as pd

url = 'https://raw.githubusercontent.com/fivethirtyeight/data/master/bad-drivers/bad-drivers.csv'
df = pd.read_csv(url)
```
### Extracting Python code
import pandas as pd

url = 'https://raw.githubusercontent.com/fivethirtyeight/data/master/bad-drivers/bad-drivers.csv'
df = pd

In [17]:
response = client.agents.complete(
    agent_id= summarization_agent_id,
    messages = [
        {
            "role": "user",
            "content":  query + captured_output
        },
    ]
)
result = response.choices[0].message.content

In [18]:
print(result)

### Analysis Report

#### Dataset Description
The dataset contains information about driving behaviors and insurance metrics across 51 U.S. states. The dataset includes the following columns:

- **State**: The name of the state.
- **Number of drivers involved in fatal collisions per billion miles**: The rate of fatal collisions per billion miles driven.
- **Percentage Of Drivers Involved In Fatal Collisions Who Were Speeding**: The percentage of fatal collisions involving speeding.
- **Percentage Of Drivers Involved In Fatal Collisions Who Were Alcohol-Impaired**: The percentage of fatal collisions involving alcohol impairment.
- **Percentage Of Drivers Involved In Fatal Collisions Who Were Not Distracted**: The percentage of fatal collisions not involving distractions.
- **Percentage Of Drivers Involved In Fatal Collisions Who Had Not Been Involved In Any Previous Accidents**: The percentage of fatal collisions involving drivers with no previous accidents.
- **Car Insurance Premiums (

### View your traces
You should now be able to view traces in [Phoenix](https://app.phoenix.arize.com).

### Evaluate your agent

Now let's evaluate your agent. The flow for batch evaluation is as follows:

1. Export traces from Phoenix
2. Attach labels to the traces. These can be created using an LLM as a judge, using code-based evaluation, or using a combination of both.
3. Import the labeled traces into Phoenix.

#### Export traces from Phoenix

In [21]:
import phoenix as px

spans = px.Client().get_spans_dataframe()

spans.head()



Unnamed: 0_level_0,name,span_kind,parent_id,start_time,end_time,status_code,status_message,events,context.span_id,context.trace_id,...,attributes.llm.output_messages,attributes.llm.token_count.prompt,attributes.input.value,attributes.input.mime_type,attributes.llm.token_count.completion,attributes.llm.token_count.total,attributes.output.mime_type,attributes.output.value,attributes.llm.invocation_parameters,attributes.llm.input_messages
context.span_id,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
e47365057e2c8bcc,MistralClient.agents,LLM,,2024-10-16 18:56:34.274511+00:00,2024-10-16 18:56:46.406556+00:00,OK,,[],e47365057e2c8bcc,22678831aa324e4b171394f156af9a03,...,[{'message.content': '## Total number of steps...,293,"{""messages"": [{""role"": ""user"", ""content"": ""\nL...",application/json,367,660,application/json,"{""id"":""86817fa2de0c495cb6120fbdc4bdfe93"",""obje...","{""agent_id"": ""ag:ad73bfd7:20241009:planning-ag...",[{'message.content': ' Load this data: https:/...
81682642a644b5e5,MistralClient.agents,LLM,,2024-10-16 18:57:56.825176+00:00,2024-10-16 18:57:58.956025+00:00,OK,,[],81682642a644b5e5,4725fed7be91e18ee8e70f64eeed5b82,...,[{'message.content': '```python import pandas ...,168,"{""messages"": [{""role"": ""user"", ""content"": ""###...",application/json,61,229,application/json,"{""id"":""d89506c5343e4f93a09f6cbe8b42facf"",""obje...","{""agent_id"": ""ag:ad73bfd7:20241009:python-code...",[{'message.content': '### Description: Load th...
d8635ee34648298b,MistralClient.agents,LLM,,2024-10-16 18:57:59.294874+00:00,2024-10-16 18:58:00.961450+00:00,OK,,[],d8635ee34648298b,7e723183239af2bb7cc300c383c124e7,...,[{'message.content': '```python # Display the ...,142,"{""messages"": [{""role"": ""user"", ""content"": ""###...",application/json,40,182,application/json,"{""id"":""93c6d4434cd946718744629d31efa346"",""obje...","{""agent_id"": ""ag:ad73bfd7:20241009:python-code...",[{'message.content': '### Description: Inspect...
a2247b0c2142629e,MistralClient.agents,LLM,,2024-10-16 18:58:01.057661+00:00,2024-10-16 18:58:03.009694+00:00,OK,,[],a2247b0c2142629e,1d823518ded053d0244ee0e20fadf3d3,...,[{'message.content': '```python descriptive_st...,142,"{""messages"": [{""role"": ""user"", ""content"": ""###...",application/json,26,168,application/json,"{""id"":""90507fa7f96c4cc0a87f646e80b17303"",""obje...","{""agent_id"": ""ag:ad73bfd7:20241009:python-code...",[{'message.content': '### Description: Perform...
2aef7e010a055107,MistralClient.agents,LLM,,2024-10-16 18:58:21.822048+00:00,2024-10-16 18:58:47.476847+00:00,OK,,[],2aef7e010a055107,358f3431ae046baa14f14ccec1e2e842,...,[{'message.content': '### Analysis Report ###...,3104,"{""messages"": [{""role"": ""user"", ""content"": ""\nL...",application/json,944,4048,application/json,"{""id"":""e37eaaf3445b45049518be4576a1ebff"",""obje...","{""agent_id"": ""ag:ad73bfd7:20241009:summarizati...",[{'message.content': ' Load this data: https:/...


When it comes to evaluating agents, a good general approach is to break down the steps your agent must complete, and evaluate each step individually.

In this case, we can evaluate:
1. The code generated by the Python agent
2. The analysis report written by the summarization agent

We'll evaluate each of these steps individually.

#### Evaluate Code Generation

Phoenix has a [built-in LLM Judge template that can be used to evaluate Code Generation Agents](https://docs.arize.com/phoenix/evaluation/how-to-evals/running-pre-tested-evals/code-generation-eval). We'll use that template here.

The template requires two columns to be added to the dataframe:
- output: The code generated by the agent
- input: The original user query

We already have the input, so just need to extract solely the generated code from the `attributes.llm.output_messages` column:

In [32]:
code_gen_spans = spans[spans['attributes.llm.invocation_parameters'].str.contains(python_agent_id)]
code_gen_spans['generated_code'] = code_gen_spans['attributes.llm.output_messages'].apply(lambda x: PythonAgentWorkflow().extract_code(x[0]['message.content']))
code_gen_spans.head()

### Extracting Python code
### Extracting Python code
### Extracting Python code


A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  code_gen_spans['generated_code'] = code_gen_spans['attributes.llm.output_messages'].apply(lambda x: PythonAgentWorkflow().extract_code(x[0]['message.content']))


Unnamed: 0_level_0,name,span_kind,parent_id,start_time,end_time,status_code,status_message,events,context.span_id,context.trace_id,...,attributes.llm.token_count.prompt,attributes.input.value,attributes.input.mime_type,attributes.llm.token_count.completion,attributes.llm.token_count.total,attributes.output.mime_type,attributes.output.value,attributes.llm.invocation_parameters,attributes.llm.input_messages,generated_code
context.span_id,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
81682642a644b5e5,MistralClient.agents,LLM,,2024-10-16 18:57:56.825176+00:00,2024-10-16 18:57:58.956025+00:00,OK,,[],81682642a644b5e5,4725fed7be91e18ee8e70f64eeed5b82,...,168,"{""messages"": [{""role"": ""user"", ""content"": ""###...",application/json,61,229,application/json,"{""id"":""d89506c5343e4f93a09f6cbe8b42facf"",""obje...","{""agent_id"": ""ag:ad73bfd7:20241009:python-code...",[{'message.content': '### Description: Load th...,(import pandas as pd\n\nurl = 'https://raw.git...
d8635ee34648298b,MistralClient.agents,LLM,,2024-10-16 18:57:59.294874+00:00,2024-10-16 18:58:00.961450+00:00,OK,,[],d8635ee34648298b,7e723183239af2bb7cc300c383c124e7,...,142,"{""messages"": [{""role"": ""user"", ""content"": ""###...",application/json,40,182,application/json,"{""id"":""93c6d4434cd946718744629d31efa346"",""obje...","{""agent_id"": ""ag:ad73bfd7:20241009:python-code...",[{'message.content': '### Description: Inspect...,(# Display the first few rows of the DataFrame...
a2247b0c2142629e,MistralClient.agents,LLM,,2024-10-16 18:58:01.057661+00:00,2024-10-16 18:58:03.009694+00:00,OK,,[],a2247b0c2142629e,1d823518ded053d0244ee0e20fadf3d3,...,142,"{""messages"": [{""role"": ""user"", ""content"": ""###...",application/json,26,168,application/json,"{""id"":""90507fa7f96c4cc0a87f646e80b17303"",""obje...","{""agent_id"": ""ag:ad73bfd7:20241009:python-code...",[{'message.content': '### Description: Perform...,(descriptive_stats = df.describe()\nprint(desc...


In [37]:
from phoenix.evals import CODE_READABILITY_PROMPT_TEMPLATE

def evaluate_code(row):
    try:
        response = client.chat.complete(
            model="mistral-large-latest",
            messages = [
                {
                    "role": "user",
                    "content":  CODE_READABILITY_PROMPT_TEMPLATE.explanation_template.format(output=row['generated_code'], input=row['attributes.input.value'])
                },
            ]
        )
        result = response.choices[0].message.content
        explanation, label = result.split("LABEL: ")
        label = label.replace("\"", "")
        return label, explanation
    except Exception as e:
        print(f"Request failed: {e}. Please check your request.")
        return None

Since we're going to use Mistral to evaluate the code, we'll uninstrument Mistral to prevent our traces view from being too cluttered.

In [34]:
MistralAIInstrumentor().uninstrument()

Now we can run our `evaluate_code` method on each row of the dataframe:

In [None]:
results = code_gen_spans.apply(evaluate_code, axis=1)
code_gen_spans['label'] = results.apply(lambda x: x[0] if x else None)
code_gen_spans['explanation'] = results.apply(lambda x: x[1] if x else None)
code_gen_spans['score'] = code_gen_spans['label'].apply(lambda x: 1 if x == "readable" else 0)

#### Evaluate Summarization

We'll use a simple prompt to evaluate the summarization agent, this time using a different prebuilt evaluation template.

In [42]:
# Filter the spans to only include those from our summarization agent
summarization_spans = spans[spans['attributes.llm.invocation_parameters'].str.contains(summarization_agent_id)]
summarization_spans.head()

Unnamed: 0_level_0,name,span_kind,parent_id,start_time,end_time,status_code,status_message,events,context.span_id,context.trace_id,...,attributes.llm.token_count.completion,attributes.llm.token_count.total,attributes.output.mime_type,attributes.output.value,attributes.llm.invocation_parameters,attributes.llm.input_messages,generated_code,label,explanation,score
context.span_id,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
2aef7e010a055107,MistralClient.agents,LLM,,2024-10-16 18:58:21.822048+00:00,2024-10-16 18:58:47.476847+00:00,OK,,[],2aef7e010a055107,358f3431ae046baa14f14ccec1e2e842,...,944,4048,application/json,"{""id"":""e37eaaf3445b45049518be4576a1ebff"",""obje...","{""agent_id"": ""ag:ad73bfd7:20241009:summarizati...",[{'message.content': ' Load this data: https:/...,"(None, True)",,,0


In [44]:
import phoenix.evals.default_templates as templates

def evaluate_summarization(row):
    try:
        response = client.chat.complete(
            model="mistral-large-latest",
            messages = [
                {
                    "role": "user",
                    "content":  templates.SUMMARIZATION_PROMPT_TEMPLATE.explanation_template.format(output=row['attributes.output.value'], input=row['attributes.input.value'])
                },
            ]
        )
        result = response.choices[0].message.content
        explanation, label = result.split("LABEL: ")
        label = label.replace("\"", "")
        return label, explanation
    except Exception as e:
        print(f"Request failed: {e}. Please check your request.")
        return None

In [None]:
summarization_spans['label'], summarization_spans['explanation'] = zip(*summarization_spans.apply(evaluate_summarization, axis=1))
summarization_spans['score'] = summarization_spans['label'].apply(lambda x: 1 if x == "good" else 0)
summarization_spans.head()

#### Import labeled traces into Phoenix

In [46]:
from phoenix.trace import SpanEvaluations

px.Client().log_evaluations(
    SpanEvaluations(eval_name="Code Quality", dataframe=code_gen_spans),
    SpanEvaluations(eval_name="Summarization", dataframe=summarization_spans)
)



You can now view your evaluations in the Phoenix UI:

![image info](../../third_party/Phoenix/images/phoenix-agent-summarization-eval.png)