## Dataset Usage and Extensions

This notebook demonstrates how to process and analyze multi-agent LLM interaction data. Here are some ways you can extend this work:
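
One possible extension, sketched here under assumptions, is to check whether the preferred response tends to be the longer one. The helper name `longer_response_win_rate` and the toy examples are hypothetical; the sketch only assumes the processed format used in this notebook, where `context` holds `response_a`, `response_b`, and `winner`.

```python
from typing import Dict, List, Optional


def longer_response_win_rate(examples: List[Dict]) -> Optional[float]:
    """Return the share of decisive matches won by the longer response.

    Ties, unknown winner labels, and equal-length pairs are skipped.
    Returns None when no match qualifies.
    """
    decided = 0
    longer_won = 0
    for example in examples:
        context = example["context"]
        len_a = len(context["response_a"])
        len_b = len(context["response_b"])
        winner = context["winner"]
        if winner not in ("model_a", "model_b") or len_a == len_b:
            continue
        decided += 1
        longer_is_a = len_a > len_b
        # The longer response won if the winner label points at the longer side.
        if (winner == "model_a") == longer_is_a:
            longer_won += 1
    return longer_won / decided if decided else None


# Toy data in the processed format (not taken from the real dataset).
toy_examples = [
    {"context": {"response_a": "short", "response_b": "a much longer reply", "winner": "model_b"}},
    {"context": {"response_a": "a fairly long reply", "response_b": "brief", "winner": "model_b"}},
    {"context": {"response_a": "same", "response_b": "size", "winner": "tie"}},
]
print(longer_response_win_rate(toy_examples))
```

With only a handful of examples the ratio says little; it becomes more informative on a larger slice of the dataset.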

In [None]:
def analyze_response_quality(examples):
    """Analyze response characteristics and quality indicators."""
    analysis = {
        'response_lengths': {'a': [], 'b': []},
        'model_performance': {},
        'question_types': {},
        'language_distribution': {}
    }
    
    for example in examples:
        context = example['context']
        
        # Response lengths
        resp_a_len = len(context['response_a'])
        resp_b_len = len(context['response_b'])
        analysis['response_lengths']['a'].append(resp_a_len)
        analysis['response_lengths']['b'].append(resp_b_len)
        
        # Model performance tracking
        model_a = context['model_a']
        model_b = context['model_b']
        winner = context['winner']
        
        for model in [model_a, model_b]:
            if model not in analysis['model_performance']:
                analysis['model_performance'][model] = {'wins': 0, 'losses': 0, 'ties': 0, 'total': 0}
        
        analysis['model_performance'][model_a]['total'] += 1
        analysis['model_performance'][model_b]['total'] += 1
        
        if winner == 'model_a':
            analysis['model_performance'][model_a]['wins'] += 1
            analysis['model_performance'][model_b]['losses'] += 1
        elif winner == 'model_b':
            analysis['model_performance'][model_b]['wins'] += 1
            analysis['model_performance'][model_a]['losses'] += 1
        elif winner == 'tie':
            analysis['model_performance'][model_a]['ties'] += 1
            analysis['model_performance'][model_b]['ties'] += 1
        
        # Language distribution
        language = context.get('language', 'Unknown')
        analysis['language_distribution'][language] = analysis['language_distribution'].get(language, 0) + 1
        
        # Simple question type classification
        question = example['input'].lower()
        if 'what' in question or 'how' in question:
            qtype = 'informational'
        elif 'why' in question:
            qtype = 'explanatory'
        elif 'vs' in question or 'better' in question:
            qtype = 'comparative'
        else:
            qtype = 'other'
        
        analysis['question_types'][qtype] = analysis['question_types'].get(qtype, 0) + 1
    
    return analysis

# Perform analysis
print("🔬 Analyzing Multi-Agent Response Patterns...")
analysis = analyze_response_quality(processed_examples)

print(f"\n{CYAN}📏 Response Length Analysis:{END}")
avg_len_a = sum(analysis['response_lengths']['a']) / len(analysis['response_lengths']['a'])
avg_len_b = sum(analysis['response_lengths']['b']) / len(analysis['response_lengths']['b'])
print(f"  Average Model A response length: {avg_len_a:.0f} characters")
print(f"  Average Model B response length: {avg_len_b:.0f} characters")

print(f"\n{CYAN}🏆 Model Performance Summary:{END}")
for model, stats in analysis['model_performance'].items():
    win_rate = stats['wins'] / stats['total'] * 100 if stats['total'] > 0 else 0
    print(f"  {model}: {win_rate:.1f}% win rate ({stats['wins']}/{stats['total']} matches)")

print(f"\n{CYAN}❓ Question Type Distribution:{END}")
for qtype, count in analysis['question_types'].items():
    print(f"  {qtype.capitalize()}: {count} questions")

print(f"\n{CYAN}🌍 Language Distribution:{END}")
for lang, count in analysis['language_distribution'].items():
    print(f"  {lang}: {count} examples")
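
The question-type classification above relies on substring checks, so `'how'` also matches inside "show" and `'vs'` inside "TVs". A minimal sketch of a stricter variant using word-boundary regular expressions follows. The labels mirror the categories above, while `QUESTION_PATTERNS`, `classify_question`, and the added `versus` keyword are illustrative assumptions.

```python
import re

# Word-boundary patterns, checked in order; the first match wins.
# The labels mirror the notebook's categories; the patterns are illustrative.
QUESTION_PATTERNS = [
    ("informational", re.compile(r"\b(what|how)\b")),
    ("explanatory", re.compile(r"\bwhy\b")),
    ("comparative", re.compile(r"\b(vs|versus|better)\b")),
]


def classify_question(question: str) -> str:
    """Classify a question by the first keyword pattern that matches."""
    text = question.lower()
    for label, pattern in QUESTION_PATTERNS:
        if pattern.search(text):
            return label
    return "other"


print(classify_question("Fuji vs. Nikon, which is better?"))
print(classify_question("Show me some cheap TVs"))
```

Keyword rules like these remain a rough heuristic; they only remove the false positives caused by matching inside longer words.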

## Interactive Data Exploration

Let's add some interactive features to explore the dataset further and understand the multi-agent dynamics.

In [None]:
# Create the final dataset structure
output_data = {
    "examples": processed_examples
}

print(f"{BLUE}Final Dataset Statistics:{END}")
print(f"  Total examples: {len(output_data['examples'])}")
print(f"  Dataset source: lmsys/chatbot_arena_conversations")
print(f"  Focus: Multi-LLM agent interaction and coordination")
print()

# Show the JSON structure (pretty formatted)
print(f"{BLUE}Sample JSON Output Structure:{END}")
print("=" * 60)
sample_output = {
    "examples": [output_data["examples"][0]]  # Just show first example
}

# Pretty print the JSON structure
import json
print(json.dumps(sample_output, indent=2, ensure_ascii=False)[:1000] + "...")

print(f"\n{GREEN}✅ Dataset processing complete!{END}")
print("This dataset can be used for research into:")
print("  • Multi-agent coordination patterns")
print("  • Communication efficiency between LLMs")
print("  • Model comparison and evaluation methods")
print("  • Preference learning and ranking")
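
The cell above only prints the structure. If the result should be persisted, one way to do it is sketched below, assuming a dictionary shaped like `output_data`. The helper names and the file name `data_out.json` are hypothetical, and a temporary directory is used so the sketch leaves no files behind.

```python
import json
import tempfile
from pathlib import Path
from typing import Dict


def save_dataset(data: Dict, path: Path) -> None:
    """Write the dataset as UTF-8 JSON, keeping non-ASCII text readable."""
    with path.open("w", encoding="utf-8") as f:
        json.dump(data, f, indent=2, ensure_ascii=False)


def load_dataset(path: Path) -> Dict:
    """Read a dataset previously written by save_dataset."""
    with path.open("r", encoding="utf-8") as f:
        return json.load(f)


# Stand-in for output_data; the real notebook would pass its own dictionary.
demo_data = {"examples": [{"input": "Hi", "output": "Winner: tie"}]}

with tempfile.TemporaryDirectory() as tmp_dir:
    out_path = Path(tmp_dir) / "data_out.json"  # hypothetical file name
    save_dataset(demo_data, out_path)
    round_trip = load_dataset(out_path)

print(round_trip == demo_data)
```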

## Generate Final Dataset Output

Now let's create the final output structure that matches the original script's format.

In [None]:
# Let's examine the first processed example in detail
print("🔍 First Processed Example Structure:")
print("=" * 60)
first_example = processed_examples[0]

print(f"{CYAN}Input (User Question):{END}")
print(f"  {first_example['input'][:100]}...")
print()

print(f"{CYAN}Context (Multi-Agent Metadata):{END}")
for key, value in first_example['context'].items():
    if key in ['response_a', 'response_b']:
        print(f"  {key}: {str(value)[:100]}...")
    else:
        print(f"  {key}: {value}")
print()

print(f"{CYAN}Output (Winner Decision):{END}")
print(f"  {first_example['output']}")
print()

print(f"{CYAN}Dataset Info:{END}")
print(f"  Dataset: {first_example['dataset']}")
print(f"  Split: {first_example['split']}")
print()

# Show comparison of models
print("🏆 Model Comparison Analysis:")
print("=" * 60)
for i, example in enumerate(processed_examples):
    model_a = example['context']['model_a']
    model_b = example['context']['model_b']
    winner = example['context']['winner']
    
    print(f"Example {i+1}: {model_a} vs {model_b} → Winner: {winner}")

# Summary statistics
winners = [ex['context']['winner'] for ex in processed_examples]
winner_counts = {}
for winner in winners:
    winner_counts[winner] = winner_counts.get(winner, 0) + 1

print("\n📊 Winner Distribution:")
for winner, count in winner_counts.items():
    print(f"  {winner}: {count} examples")
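
The manual counting loop above can also be written with `collections.Counter` from the standard library. The sketch below assumes the same processed format; `winner_distribution` and the toy examples are hypothetical names introduced for illustration.

```python
from collections import Counter
from typing import Dict, List


def winner_distribution(examples: List[Dict]) -> Counter:
    """Count how often each winner label appears in processed examples."""
    return Counter(example["context"]["winner"] for example in examples)


# Toy examples in the processed format (labels match the sample data).
toy_examples = [
    {"context": {"winner": "model_b"}},
    {"context": {"winner": "tie"}},
    {"context": {"winner": "model_b"}},
]

counts = winner_distribution(toy_examples)
# most_common() yields labels from most to least frequent.
for label, count in counts.most_common():
    share = count / len(toy_examples) * 100
    print(f"{label}: {count} examples ({share:.1f}%)")
```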

## Explore Processed Data

Let's examine the structure of our processed data to understand how the standardization works.

In [None]:
def process_chatbot_arena(data: List[Dict], limit: int = 200) -> List[Dict]:
    """
    Process lmsys/chatbot_arena_conversations dataset.
    
    This function transforms raw chatbot arena data into a standardized format
    that focuses on multi-LLM agent interactions and comparisons.
    
    Args:
        data: List of raw chatbot arena conversation records
        limit: Maximum number of examples to process
    
    Returns:
        List of processed examples in standardized format
    """
    print(f"{YELLOW}Processing{END} chatbot_arena_conversations")
    examples = []

    for item in data[:limit]:
        # Extract conversation from both models
        conv_a = item.get("conversation_a", [])
        conv_b = item.get("conversation_b", [])

        if not conv_a or not conv_b:
            continue

        # Use the first user message as input and each model's first reply as its response
        user_msg = conv_a[0].get("content", "") if conv_a else ""
        response_a = conv_a[1].get("content", "") if len(conv_a) > 1 else ""
        response_b = conv_b[1].get("content", "") if len(conv_b) > 1 else ""

        # Create structured example
        example = {
            "input": user_msg,
            "context": {
                "model_a": item.get("model_a", ""),
                "model_b": item.get("model_b", ""),
                "winner": item.get("winner", ""),
                "judge": item.get("judge", ""),
                "language": item.get("language", ""),
                "turn": item.get("turn", 1),
                "response_a": response_a,
                "response_b": response_b
            },
            "output": f"Winner: {item.get('winner', 'unknown')}",
            "dataset": "lmsys/chatbot_arena_conversations",
            "split": "train"
        }
        examples.append(example)

    print(f"{GREEN}Processed{END} {len(examples)} examples from chatbot_arena")
    return examples

# Test the function with our sample data
processed_examples = process_chatbot_arena(sample_chatbot_arena_data, limit=10)
print(f"\n✅ Successfully processed {len(processed_examples)} examples")
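
`process_chatbot_arena` keeps only the first user message and the first reply from each conversation. For records with several turns, a possible extension is sketched below. It assumes messages strictly alternate between user and model, as they do in the sample data, and the helper name `extract_turns` is hypothetical.

```python
from typing import Dict, List, Tuple


def extract_turns(conversation: List[Dict]) -> List[Tuple[str, str]]:
    """Pair messages as (user_message, model_reply) tuples.

    Assumes messages alternate user, model, user, model, ... as in the
    sample data; a trailing user message without a reply is dropped.
    """
    turns = []
    for i in range(0, len(conversation) - 1, 2):
        user_msg = conversation[i].get("content", "")
        reply = conversation[i + 1].get("content", "")
        turns.append((user_msg, reply))
    return turns


# Toy conversation with two complete turns and one unanswered message.
toy_conversation = [
    {"content": "Hello"},
    {"content": "Hi, how can I help?"},
    {"content": "Tell me a joke"},
    {"content": "Why did the chicken cross the road?"},
    {"content": "Unanswered follow-up"},
]
print(extract_turns(toy_conversation))
```

If the raw records carry an explicit speaker field, pairing on that field would be more robust than relying on position.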

## Data Processing Functions

Now let's implement the core data processing functions from the original script. These functions transform the raw chatbot arena data into a standardized format suitable for multi-agent coordination research.

In [None]:
# Sample raw chatbot arena data (simulating what would be loaded from external JSON files)
# This data represents conversations where two different LLMs compete to answer questions
# and a judge determines the winner

sample_chatbot_arena_data = [
    {
        "conversation_a": [
            {
                "content": "What is the difference between OpenCL and CUDA?"
            },
            {
                "content": "OpenCL and CUDA are two different programming models that are used for parallel computing.OpenCL is a general-purposeÂπ∂Ë°åÁºñÁ®ãÊé•Âè£ that allows developers to write parallel code that can run on any platform that supportsCL, which includes most modern operating systems and computer systems, including Windows, Linux, and macOS. It provides a lower-level, more flexible API that is more suitable for building large-scale distributed computing systems.CUDA is a specific implementation ofOpenCL that is designed for performance and scalability in devices with multiple GPU(s). It was developed by Nvidia and is widely used for scientific computing, machine learning, andÂõæÂΩ¢ÂõæÂÉèÂ§ÑÁêÜ. The CUDA API provides a high-level, more efficient way to write parallel code that can run on top of a GPU, while still maintaining the ability to write code that can be easily adapted for other parallel computing platforms."
            }
        ],
        "conversation_b": [
            {
                "content": "What is the difference between OpenCL and CUDA?"
            },
            {
                "content": "OpenCL and CUDA are both programming languages for parallel computing on GPUs, but they differ in several key ways:\n\n1.   Architecture: OpenCL is a general-purpose parallel computing language, while CUDA is designed specifically for GPU computing. OpenCL can run on various types of processors, including CPUs, GPUs, and FPGAs, whereas CUDA is only compatible with NVIDIA GPUs.\n2.   Language: OpenCL is a C++ language, while CUDA is a C++ language with additional features for GPU computing. CUDA provides additional libraries and tools for handling graphics and other tasks unique to GPUs.\n3.   Driver support: OpenCL has a wider range of device support, with support for not just NVIDIA GPUs but also other manufacturers like AMD, Intel, and ARM. CUDA is only available on NVIDIA GPUs, and requires the CUDA Toolkit to be installed on the host machine.\n4.   Memory management: OpenCL provides a unified memory model, where all the data is stored in global memory, and the kernel can access any part of it. CUDA, on the other hand, uses a hierarchical memory architecture, where data is divided into host, device, and shared memory.\n5.   Performance: OpenCL and CUDA have different strengths and weaknesses in terms of performance. OpenCL is generally faster for numerical computations, while CUDA is faster for graphics and other tasks that require fast memory access. CUDA also has better support for 64-bit floating-point operations, which are essential for some scientific applications.\n\nIn summary, OpenCL and CUDA are both programming languages for parallel computing on GPUs, but they differ in terms of architecture, language, driver support, memory management, and performance. OpenCL is a general-purpose language that can run on various types of processors, while CUDA is specific to NVIDIA GPUs and provides additional libraries and tools for handling graphics and other tasks unique to GPUs."
            }
        ],
        "model_a": "chatglm-6b",
        "model_b": "koala-13b",
        "winner": "model_b",
        "judge": "arena_user_973",
        "language": "English",
        "turn": 1
    },
    {
        "conversation_a": [
            {
                "content": "Why did my parent not invite me to their wedding?"
            },
            {
                "content": "It is possible that your parent did not invite you to their wedding for a variety of reasons. It could be that they did not want to burden you with the planning or expense of the event, or they may not have had the time or resources to include you. Additionally, it could be that they did not feel comfortable having you attend due to your age or relationship with them. It is important to remember that your parent may have had their own reasons for not inviting you, and it is not your place to judge their choices. If you feel neglected or upset about not being invited, it is best to talk to them about it and express your feelings in a non-accusatory way."
            }
        ],
        "conversation_b": [
            {
                "content": "Why did my parent not invite me to their wedding?"
            },
            {
                "content": "It is likely that they wanted to keep the guest list small and intimate. Maybe they felt that inviting you would make the event too crowded, or that they wanted to keep the event reserved for close friends and family. It could be that they wanted to avoid any potential drama or conflict that could arise from having too many people at the event. Whatever the reason, it's important to remember that they made this decision out of love and respect for you, and that their decision doesn't define your relationship."
            }
        ],
        "model_a": "oasst-pythia-12b",
        "model_b": "alpaca-13b",
        "winner": "tie",
        "judge": "arena_user_973",
        "language": "English",
        "turn": 1
    },
    {
        "conversation_a": [
            {
                "content": "Fuji vs. Nikon, which is better?"
            },
            {
                "content": "Both Fuji and Nikon are popular camera brands that offer a wide range of cameras for different types of photography. It really depends on your specific needs and preferences as to which brand is \"better\" for you.\n\nFujifilm is known for its high image quality, especially in the areas of color accuracy and dynamic range. Many photographers also appreciate the company's focus on ergonomics and the overall user experience of its cameras. Additionally, Fujifilm has a loyal following among enthusiasts and professional photographers, thanks to its commitment to film photography and its high-end mirrorless cameras.\n\nNikon, on the other hand, is known for its advanced autofocus system and image stabilization technology, as well as its high-end cameras and lenses. Nikon also has a wide selection of cameras and lenses to choose from, including both DSLR and mirrorless options.\n\nUltimately, it's difficult to say which brand is \"better\" without knowing more about your specific needs and preferences. It might be helpful to research both brands further and try out a few different models before making a decision."
            }
        ],
        "conversation_b": [
            {
                "content": "Fuji vs. Nikon, which is better?"
            },
            {
                "content": "This is a subjective question and the answer depends on the individual preferences and needs. Both Fujifilm and Nikon are well-known for their high-quality cameras, and the choice between the two ultimately depends on the user's personal preferences and needs.\n\nSome factors to consider when deciding between a Fuji and a Nikon camera include:\n\n1. Image quality: Fuji cameras are known for their high-quality images, with a distinctive film-like quality. Nikon cameras are also notable for their image quality, but they may not offer the same level of sharpness and color accuracy as Fuji cameras.\n\n2. Focusing: Fuji cameras are known for their fast and precise autofocus, while Nikon cameras tend to have a slower and more manual focus system.\n\n3. Image size: Fuji cameras are known for their wide range of photo sizes, from small compacts to large-format cameras. Nikon cameras are also capable of producing high-quality images, but they may not be as versatile as Fuji cameras.\n\n4. User interface: Fuji cameras have a user-friendly interface, with a simplified design that is easy to navigate. Nikon cameras may have a more complex interface, but they also offer a wealth of features and customization options.\n\nUltimately, the best camera for you will depend on your specific needs and preferences. If you're looking for a high-quality camera that offers a unique film-like experience, a Fuji camera may be the right choice for you. If you're looking for a camera that is easy to use and offers a wide range of photo sizes, a Nikon camera may be the best choice."
            }
        ],
        "model_a": "koala-13b",
        "model_b": "oasst-pythia-12b",
        "winner": "model_b",
        "judge": "arena_user_973",
        "language": "English",
        "turn": 1
    }
]

print(f"📊 Sample data loaded: {len(sample_chatbot_arena_data)} examples")
print("🔍 Each example contains two models' conversations and a winner judgment")

## Sample Data

Instead of loading from external files, we'll inline some sample data from the chatbot arena conversations dataset. This makes the notebook completely self-contained.
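
Because the records are inlined by hand, a quick structural check can catch missing fields before processing. The sketch below is one way to do that. The key list reflects the fields used in this notebook, while `REQUIRED_KEYS`, `find_record_problems`, and the toy records are hypothetical.

```python
from typing import Dict, List

# Fields the processing step reads from each raw record.
REQUIRED_KEYS = ("conversation_a", "conversation_b", "model_a", "model_b", "winner")


def find_record_problems(record: Dict) -> List[str]:
    """Return human-readable problems; an empty list means the record looks usable."""
    problems = [f"missing key: {key}" for key in REQUIRED_KEYS if key not in record]
    for side in ("conversation_a", "conversation_b"):
        messages = record.get(side, [])
        # A usable conversation needs at least a question and one reply.
        if side in record and len(messages) < 2:
            problems.append(f"{side} has fewer than 2 messages")
    return problems


good = {
    "conversation_a": [{"content": "Q"}, {"content": "A1"}],
    "conversation_b": [{"content": "Q"}, {"content": "A2"}],
    "model_a": "m1", "model_b": "m2", "winner": "tie",
}
bad = {"conversation_a": [{"content": "Q"}], "model_a": "m1"}

print(find_record_problems(good))
print(find_record_problems(bad))
```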

In [None]:
# Import required libraries
import json
import sys
from typing import Any, Dict, List
from pprint import pprint

# Define color constants for formatted output
BLUE, GREEN, YELLOW, CYAN, END = "\033[94m", "\033[92m", "\033[93m", "\033[96m", "\033[0m"

print("✅ Imports completed successfully!")

# Multi-Agent Coordination Communication-Efficiency Dataset

This notebook demonstrates the processing of multi-LLM agent interaction datasets. The original script loads and processes data from the lmsys/chatbot_arena_conversations dataset to create a standardized format for studying multi-agent coordination and communication patterns.

## Overview

This dataset processing script:
- Loads chatbot arena conversation data
- Standardizes the format to match a specific schema
- Extracts conversation examples with model comparisons
- Focuses on multi-LLM agent interactions and their evaluation

Let's explore the data processing pipeline step by step.
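
The standardized format mentioned in the overview can be sketched as typed dictionaries. The field names follow the examples produced in this notebook; the class names `ExampleContext` and `ProcessedExample` and the toy values are assumptions made for illustration, not part of the original script.

```python
from typing import TypedDict


class ExampleContext(TypedDict):
    """Metadata about one head-to-head comparison."""
    model_a: str
    model_b: str
    winner: str      # "model_a", "model_b", or "tie" in the sample data
    judge: str
    language: str
    turn: int
    response_a: str
    response_b: str


class ProcessedExample(TypedDict):
    """One standardized example."""
    input: str       # the user's question
    context: ExampleContext
    output: str      # e.g. "Winner: tie"
    dataset: str
    split: str


# Toy instance with made-up values, shown only to illustrate the shape.
example: ProcessedExample = {
    "input": "Which one is better?",
    "context": {
        "model_a": "model-x",
        "model_b": "model-y",
        "winner": "tie",
        "judge": "judge_1",
        "language": "English",
        "turn": 1,
        "response_a": "Answer from A",
        "response_b": "Answer from B",
    },
    "output": "Winner: tie",
    "dataset": "lmsys/chatbot_arena_conversations",
    "split": "train",
}
print(sorted(example.keys()))
```

`TypedDict` only documents the shape for type checkers; at runtime the value is still a plain dictionary.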