## Loading Conversations from DataFrames

Conversation data is sometimes available as DataFrames rather than JSON.
For example, you might have one DataFrame for messages and another for outcomes.
This section shows how to create `Conversation` objects from these separate DataFrames.

## Setup and Imports

In [2]:
# Import necessary libraries
import os
import pandas as pd

# Import Agentune simulate components
from agentune.simulate.models import Conversation, Message, Outcome, ParticipantRole

## Create Sample Conversation Data

First, let's create a fabricated sample dataset that mimics the structure of real conversation data.

In [15]:
# Create sample DataFrames that might come from a database or CSV files
# First, let's create a DataFrame for messages
messages_df = pd.DataFrame([
    {'conversation_id': 'conv_001', 'sender': 'customer', 'content': 'I received a damaged product and need a replacement', 'timestamp': '2024-05-10T09:15:00.000000'},
    {'conversation_id': 'conv_001', 'sender': 'agent', 'content': 'I apologize for the inconvenience. We can arrange a replacement right away.', 'timestamp': '2024-05-10T09:17:30.000000'},
    {'conversation_id': 'conv_001', 'sender': 'customer', 'content': 'Please do, and I expect a refund on the delivery fee as well.', 'timestamp': '2024-05-10T09:21:05.000000'},
    {'conversation_id': 'conv_002', 'sender': 'customer', 'content': 'Is your warranty transferable if I sell the product?', 'timestamp': '2024-05-15T14:35:22.000000'},
    {'conversation_id': 'conv_002', 'sender': 'agent', 'content': 'Yes, our warranty stays with the product for the full term regardless of ownership changes.', 'timestamp': '2024-05-15T14:38:45.000000'},
    {'conversation_id': 'conv_002', 'sender': 'customer', 'content': 'No, that\'s all. Thanks again!', 'timestamp': '2024-05-15T14:42:20.000000'}
])

# Now, let's create a DataFrame for outcomes
outcomes_df = pd.DataFrame([
    {'conversation_id': 'conv_001', 'name': 'resolved', 'description': 'Issue was successfully resolved'},
    {'conversation_id': 'conv_002', 'name': 'unresolved', 'description': 'Issue was not resolved'}
])
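
Before converting real data, it can help to confirm that both DataFrames carry the columns used in this section. The following is a minimal sketch with plain pandas. The helper `missing_columns` and the two `REQUIRED_*` sets are hypothetical names introduced for illustration, not part of Agentune.

```python
import pandas as pd

# Column names used by the sample DataFrames in this section
REQUIRED_MESSAGE_COLUMNS = {"conversation_id", "sender", "content", "timestamp"}
REQUIRED_OUTCOME_COLUMNS = {"conversation_id", "name", "description"}


def missing_columns(df: pd.DataFrame, required: set[str]) -> list[str]:
    """Return the required column names that are absent from df, sorted."""
    return sorted(required - set(df.columns))


# Made-up example rows; the outcomes frame deliberately lacks 'description'
example_messages = pd.DataFrame([
    {"conversation_id": "conv_001", "sender": "customer",
     "content": "Hello", "timestamp": "2024-05-10T09:15:00"},
])
example_outcomes = pd.DataFrame([
    {"conversation_id": "conv_001", "name": "resolved"},
])

print(missing_columns(example_messages, REQUIRED_MESSAGE_COLUMNS))  # []
print(missing_columns(example_outcomes, REQUIRED_OUTCOME_COLUMNS))  # ['description']
```

An empty list means the frame has every column the conversion step reads.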

### Display the DataFrames

In [16]:
# Display the DataFrames
print("Messages DataFrame:")
messages_df.head()

Messages DataFrame:


| | conversation_id | sender | content | timestamp |
|---|---|---|---|---|
| 0 | conv_001 | customer | I received a damaged product and need a replac... | 2024-05-10T09:15:00.000000 |
| 1 | conv_001 | agent | I apologize for the inconvenience. We can arra... | 2024-05-10T09:17:30.000000 |
| 2 | conv_001 | customer | Please do, and I expect a refund on the delive... | 2024-05-10T09:21:05.000000 |
| 3 | conv_002 | customer | Is your warranty transferable if I sell the pr... | 2024-05-15T14:35:22.000000 |
| 4 | conv_002 | agent | Yes, our warranty stays with the product for t... | 2024-05-15T14:38:45.000000 |


In [17]:
print("\nOutcomes DataFrame:")
outcomes_df.head()


Outcomes DataFrame:


| | conversation_id | name | description |
|---|---|---|---|
| 0 | conv_001 | resolved | Issue was successfully resolved |
| 1 | conv_002 | unresolved | Issue was not resolved |


### Functions for Loading and Processing Conversation Data

In [18]:
def create_conversations_from_dataframes(
    messages_df: pd.DataFrame,
    outcomes_df: pd.DataFrame
) -> list[Conversation]:
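    """
    Convert message and outcome DataFrames into Conversation objects.
    Simplified version using patterns from utils.py.

    Messages are grouped by conversation_id and ordered by timestamp.
    If outcomes_df has several rows for a conversation, only the first is used;
    if it has none, the conversation's outcome is None.
    """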
    conversations = []

    # Group by conversation_id, similar to load_conversations_from_csv
    for conv_id, group in messages_df.groupby('conversation_id'):
        # Sort by timestamp to ensure message order
        # (string sort; assumes timestamps share one consistent ISO 8601 format)
        group = group.sort_values('timestamp')

        # Create messages using the same logic as utils.py
        messages = []
        for _, row in group.iterrows():
            # Reuse the sender conversion logic from utils.py
            sender = ParticipantRole.CUSTOMER if row['sender'].lower() == 'customer' else ParticipantRole.AGENT

            message = Message(
                sender=sender,
                content=str(row['content']),
                timestamp=pd.to_datetime(row['timestamp']).to_pydatetime()
            )
            messages.append(message)

        # Get outcome for this conversation
        outcome_row = outcomes_df[outcomes_df['conversation_id'] == conv_id]
        outcome = None
        if not outcome_row.empty:
            first_outcome = outcome_row.iloc[0]
            outcome = Outcome(
                name=str(first_outcome['name']),
                description=str(first_outcome['description'])
            )

        # Create conversation
        conversation = Conversation(
            messages=tuple(messages),
            outcome=outcome
        )
        conversations.append(conversation)
    
    return conversations


def conversations_to_dataframe(conversations: list[Conversation]) -> pd.DataFrame:
    """
    Convert a list of Conversation objects to a DataFrame for analysis.

    Args:
        conversations: List of Conversation objects

    Returns:
        DataFrame containing conversation data
    """
    data = []

    for i, conv in enumerate(conversations):
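        # Extract first message content, truncated to 100 characters for display
        first_message = ""
        if conv.messages:
            content = conv.messages[0].content
            # Only append an ellipsis when the content was actually cut
            first_message = content[:100] + "..." if len(content) > 100 else content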

        # Extract outcome name (if available)
        outcome_name = conv.outcome.name if conv.outcome else "unknown"

        # Get conversation length (number of messages)
        num_messages = len(conv.messages)

        # Get conversation participants
        # Use the value attribute of the ParticipantRole enum
        participants = set(msg.sender.value for msg in conv.messages)

        # Calculate conversation duration (if timestamps are available)
        duration_minutes = None
        if num_messages > 1:
            try:
                # The timestamp is already a datetime object, so no need for fromisoformat
                start_time = conv.messages[0].timestamp
                end_time = conv.messages[-1].timestamp
                duration = end_time - start_time
                duration_minutes = duration.total_seconds() / 60
            except (ValueError, AttributeError, TypeError):
                pass

        # Build row data
        row = {
            'id': f'conversation_{i}',
            'num_messages': num_messages,
            'outcome': outcome_name,
            'participants': ', '.join(sorted(participants)),  # sorted for a stable order
            'first_message': first_message,
            'duration_minutes': duration_minutes,
            'conversation_object': conv  # Store the original conversation object
        }

        data.append(row)

    return pd.DataFrame(data)
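
`create_conversations_from_dataframes` uses the first matching outcome row and leaves `outcome` as `None` when no row matches. A quick coverage check before converting can surface both cases. The sketch below is one possible way to do this with plain pandas. `outcome_coverage` is a hypothetical helper and the rows are made up for the example.

```python
import pandas as pd


def outcome_coverage(messages_df: pd.DataFrame, outcomes_df: pd.DataFrame) -> pd.DataFrame:
    """Count outcome rows per conversation_id found in messages_df."""
    # One row per conversation, in order of first appearance
    message_ids = messages_df[["conversation_id"]].drop_duplicates()

    # Number of outcome rows for each conversation_id
    outcome_counts = (
        outcomes_df.groupby("conversation_id")
        .size()
        .rename("num_outcomes")
        .reset_index()
    )

    # Left merge keeps conversations that have no outcome at all
    merged = message_ids.merge(outcome_counts, on="conversation_id", how="left")
    merged["num_outcomes"] = merged["num_outcomes"].fillna(0).astype(int)
    return merged


# Made-up data: conv_a has two outcome rows, conv_c has none
demo_messages = pd.DataFrame({
    "conversation_id": ["conv_a", "conv_a", "conv_b", "conv_c"],
    "content": ["hi", "hello", "question", "complaint"],
})
demo_outcomes = pd.DataFrame({
    "conversation_id": ["conv_a", "conv_a", "conv_b"],
    "name": ["resolved", "unresolved", "resolved"],
})

coverage = outcome_coverage(demo_messages, demo_outcomes)
print(coverage[coverage["num_outcomes"] != 1])
```

Rows with `num_outcomes` equal to 0 would get no outcome, and rows above 1 would silently drop all but the first outcome.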

## Convert the Sample DataFrames to Conversation Objects

In [19]:
# Convert the DataFrames to Conversation objects
df_conversations = create_conversations_from_dataframes(messages_df, outcomes_df)

# Create a DataFrame from the Conversation objects for display and analysis
df_conversations_df = conversations_to_dataframe(df_conversations)

# Display the resulting DataFrame
print(f"Created {len(df_conversations)} conversations from DataFrames")
display(df_conversations_df[['id', 'num_messages', 'outcome', 'participants', 'first_message', 'duration_minutes']])

Created 2 conversations from DataFrames


| | id | num_messages | outcome | participants | first_message | duration_minutes |
|---|---|---|---|---|---|---|
| 0 | conversation_0 | 3 | resolved | agent, customer | I received a damaged product and need a replac... | 6.083333 |
| 1 | conversation_1 | 3 | unresolved | agent, customer | Is your warranty transferable if I sell the pr... | 6.966667 |


### Load Conversation Data from File

In [20]:
# Load conversations from the CSV file we saw earlier.
# Notebooks do not define __file__, so build the path from the current working directory.
csv_path = os.path.join(os.getcwd(), 'data', 'sample_conversations.csv')

# Check if the file exists
if os.path.exists(csv_path):
    # Read the CSV file into a single DataFrame (messages plus outcome columns)
    csv_messages_df = pd.read_csv(csv_path)

    # Collect the unique conversation IDs, in order of first appearance
    conversation_ids = csv_messages_df['conversation_id'].unique()
    
    # Create an outcomes DataFrame from the CSV data
    # For each conversation, get the last row to extract outcome information
    outcomes_data = []
    for conv_id in conversation_ids:
        conv_rows = csv_messages_df[csv_messages_df['conversation_id'] == conv_id]
        last_row = conv_rows.iloc[-1]
        outcomes_data.append({
            'conversation_id': conv_id,
            'name': last_row.get('outcome_name', 'unknown'),
            'description': last_row.get('outcome_description', '')
        })
    
    csv_outcomes_df = pd.DataFrame(outcomes_data)
    
    # Display the DataFrames
    print("CSV Messages DataFrame:")
    display(csv_messages_df.head())
    
    print("\nExtracted Outcomes DataFrame:")
    display(csv_outcomes_df.head())
    
    # Convert to Conversation objects
    csv_conversations = create_conversations_from_dataframes(csv_messages_df, csv_outcomes_df)
    print(f"\nCreated {len(csv_conversations)} conversations from CSV file")
else:
    print(f"CSV file not found at: {csv_path}")

CSV Messages DataFrame:


| | conversation_id | sender | content | timestamp | outcome_name | outcome_description |
|---|---|---|---|---|---|---|
| 0 | conv_001 | customer | Last night, I waited in line for 2 hours in th... | 2024-01-15T09:00:00+00:00 | resolved | Issue was successfully resolved |
| 1 | conv_001 | agent | We’re very sorry, I am the Guangdong Customer ... | 2024-01-15T09:02:00+00:00 | resolved | Issue was successfully resolved |
| 2 | conv_001 | customer | How can consumers supervise you if you don't s... | 2024-01-15T09:05:00+00:00 | resolved | Issue was successfully resolved |
| 3 | conv_001 | agent | We will continue to improve various services a... | 2024-01-15T09:09:00+00:00 | resolved | Issue was successfully resolved |
| 4 | conv_001 | customer | Nonsense. China Telecom has failed to make pro... | 2024-01-15T09:14:00+00:00 | resolved | Issue was successfully resolved |



Extracted Outcomes DataFrame:


| | conversation_id | name | description |
|---|---|---|---|
| 0 | conv_001 | resolved | Issue was successfully resolved |
| 1 | conv_002 | resolved | Issue was successfully resolved |
| 2 | conv_003 | resolved | Issue was successfully resolved |
| 3 | conv_004 | resolved | Issue was successfully resolved |
| 4 | conv_005 | unresolved | Issue was not resolved |



Created 100 conversations from CSV file
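
The per-conversation loop that extracts outcomes from the last row can also be written with `drop_duplicates`. The following is a sketch, assuming the `outcome_name` and `outcome_description` columns are present (the loop falls back to defaults when they are missing). The rows below are made up and stand in for the CSV contents.

```python
import pandas as pd

# Made-up stand-in for the CSV: outcome columns repeat on every message row
csv_like_df = pd.DataFrame([
    {"conversation_id": "conv_001", "sender": "customer", "content": "Hi",
     "outcome_name": "resolved", "outcome_description": "Issue was successfully resolved"},
    {"conversation_id": "conv_001", "sender": "agent", "content": "Hello",
     "outcome_name": "resolved", "outcome_description": "Issue was successfully resolved"},
    {"conversation_id": "conv_002", "sender": "customer", "content": "Help",
     "outcome_name": "unresolved", "outcome_description": "Issue was not resolved"},
])

# Keep the last row of each conversation, then rename to the outcome schema
extracted_outcomes_df = (
    csv_like_df
    .drop_duplicates(subset="conversation_id", keep="last")
    [["conversation_id", "outcome_name", "outcome_description"]]
    .rename(columns={"outcome_name": "name", "outcome_description": "description"})
    .reset_index(drop=True)
)

print(extracted_outcomes_df)
```

When each conversation's rows are contiguous, this yields the same row order as iterating over `unique()` conversation IDs.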


## Summary of DataFrame Loading Approach

The approach demonstrated above allows you to:

1. Work with conversation data stored in separate DataFrames (messages and outcomes)
2. Convert these DataFrames into Conversation objects compatible with Agentune
3. Support multiple data sources (JSON, CSV, database exports) by transforming them into DataFrames first
4. Apply the same analysis and RAG techniques to conversations regardless of their original data source

This flexibility is particularly useful when integrating with existing data pipelines or when combining data from multiple sources.
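
As an illustration of point 3, nested JSON-style records can be flattened into the two DataFrames used in this section. The record layout below is an assumption made for the example, not a format defined by Agentune, and the variable names are hypothetical.

```python
import pandas as pd

# Hypothetical nested records, e.g. parsed from a JSON export
records = [
    {
        "conversation_id": "conv_001",
        "outcome": {"name": "resolved", "description": "Issue was successfully resolved"},
        "messages": [
            {"sender": "customer", "content": "I received a damaged product.",
             "timestamp": "2024-05-10T09:15:00"},
            {"sender": "agent", "content": "We can arrange a replacement.",
             "timestamp": "2024-05-10T09:17:30"},
        ],
    },
    {
        "conversation_id": "conv_002",
        "outcome": None,  # no outcome recorded for this conversation
        "messages": [
            {"sender": "customer", "content": "Is your warranty transferable?",
             "timestamp": "2024-05-15T14:35:22"},
        ],
    },
]

# One row per message, with the parent conversation_id attached as a column
json_messages_df = pd.json_normalize(
    records, record_path="messages", meta=["conversation_id"]
)

# One row per conversation that has an outcome
json_outcomes_df = pd.DataFrame(
    [{"conversation_id": r["conversation_id"], **r["outcome"]}
     for r in records if r.get("outcome")],
    columns=["conversation_id", "name", "description"],
)

print(json_messages_df)
print(json_outcomes_df)
```

The resulting frames have the same columns as `messages_df` and `outcomes_df` above, so they could be passed to `create_conversations_from_dataframes` in the same way.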