# Multi-Agent Translation System - Research Analysis

**Project:** LLM Course HW3 - Turing Assignment

**Date:** 2025-11-25

**Research Question:** How does the rate of spelling errors in input text affect semantic drift in multi-agent sequential translation systems?

---

## Table of Contents

1. [Setup and Imports](#setup)
2. [Data Loading](#data-loading)
3. [Exploratory Data Analysis](#eda)
4. [Statistical Analysis](#statistical-analysis)
5. [Sensitivity Analysis](#sensitivity-analysis)
6. [Visualization](#visualization)
7. [Conclusions](#conclusions)

## 1. Setup and Imports <a id='setup'></a>

In [None]:
# Standard library imports
import json
import sys
from pathlib import Path

# Data analysis imports
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from scipy import stats
from scipy.stats import pearsonr, spearmanr

# Configure visualization
sns.set_style("whitegrid")
plt.rcParams['figure.figsize'] = (12, 8)
plt.rcParams['font.size'] = 11

# Add the working directory (assumed to be the project root) to the import path
# so that the `src` package can be imported
sys.path.append(str(Path.cwd()))

print("✓ Imports successful")
print(f"NumPy version: {np.__version__}")
print(f"Pandas version: {pd.__version__}")

## 2. Data Loading <a id='data-loading'></a>

Load experimental results from the translation pipeline.

In [None]:
# Load analysis results
results_dir = Path('results/analysis')

# Find the most recent analysis file
analysis_files = list(results_dir.glob('analysis_*.csv'))
if analysis_files:
    latest_file = max(analysis_files, key=lambda p: p.stat().st_mtime)
    df = pd.read_csv(latest_file)
    print(f"✓ Loaded data from: {latest_file.name}")
    print(f"  Rows: {len(df)}")
    print(f"  Columns: {list(df.columns)}")
else:
    print("⚠ No analysis files found. Please run the main pipeline first.")
    print("  Command: python -m src.main")
    df = None
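
The later cells only reference the `error_rate` and `distance` columns, and they treat `error_rate` as a fraction between 0 and 1 (it is multiplied by 100 for plotting). A minimal sketch of a schema check under those assumptions follows; `validate_results` and `REQUIRED_COLUMNS` are hypothetical names, not part of the pipeline.

```python
import pandas as pd

# Assumed schema: the two columns that the analysis cells in this notebook use.
REQUIRED_COLUMNS = ("error_rate", "distance")


def validate_results(frame: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: check the columns and value range this notebook relies on."""
    missing = [col for col in REQUIRED_COLUMNS if col not in frame.columns]
    if missing:
        raise ValueError(f"Missing required columns: {missing}")

    # Drop rows where either value is absent so later statistics do not propagate NaN.
    clean = frame.dropna(subset=list(REQUIRED_COLUMNS)).copy()

    # Assumption: error_rate is stored as a fraction in [0, 1], not as a percentage.
    out_of_range = ~clean["error_rate"].between(0.0, 1.0)
    if out_of_range.any():
        raise ValueError(f"{int(out_of_range.sum())} rows have error_rate outside [0, 1]")
    return clean
```

Calling `df = validate_results(df)` right after loading would surface a mismatched results file before any statistics are computed.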

In [None]:
# Display first few rows
if df is not None:
    display(df.head(10))
    
    # Basic statistics
    print("\nBasic Statistics:")
    display(df.describe())

## 3. Exploratory Data Analysis <a id='eda'></a>

Examine the distribution and characteristics of our experimental data.

In [None]:
if df is not None:
    # Distribution of error rates
    print("Error Rate Distribution:")
    print(df['error_rate'].value_counts().sort_index())
    
    # Distribution of distances
    print("\nDistance Statistics by Error Rate:")
    display(df.groupby('error_rate')['distance'].describe())

In [None]:
if df is not None:
    # Visualize distributions
    fig, axes = plt.subplots(1, 2, figsize=(15, 5))
    
    # Error rate distribution
    axes[0].hist(df['error_rate'], bins=20, edgecolor='black')
    axes[0].set_xlabel('Error Rate')
    axes[0].set_ylabel('Frequency')
    axes[0].set_title('Distribution of Error Rates')
    axes[0].grid(True, alpha=0.3)
    
    # Distance distribution
    axes[1].hist(df['distance'], bins=20, edgecolor='black', color='coral')
    axes[1].set_xlabel('Semantic Distance')
    axes[1].set_ylabel('Frequency')
    axes[1].set_title('Distribution of Semantic Distances')
    axes[1].grid(True, alpha=0.3)
    
    plt.tight_layout()
    plt.show()

## 4. Statistical Analysis <a id='statistical-analysis'></a>

Perform statistical tests to determine the relationship between error rate and semantic distance.

In [None]:
if df is not None:
    # Correlation analysis
    pearson_corr, pearson_p = pearsonr(df['error_rate'], df['distance'])
    spearman_corr, spearman_p = spearmanr(df['error_rate'], df['distance'])
    
    print("Correlation Analysis:")
    print("=" * 60)
    print(f"Pearson Correlation:  r = {pearson_corr:.4f}, p-value = {pearson_p:.6f}")
    print(f"Spearman Correlation: ρ = {spearman_corr:.4f}, p-value = {spearman_p:.6f}")
    print("\nInterpretation:")
    if pearson_p < 0.001:
        print("  *** Highly significant relationship (p < 0.001)")
    elif pearson_p < 0.01:
        print("  ** Very significant relationship (p < 0.01)")
    elif pearson_p < 0.05:
        print("  * Significant relationship (p < 0.05)")
    else:
        print("  No significant relationship (p >= 0.05)")
    
    if abs(pearson_corr) > 0.7:
        print("  Strong correlation")
    elif abs(pearson_corr) > 0.4:
        print("  Moderate correlation")
    else:
        print("  Weak correlation")
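
A p-value says whether the correlation differs from zero, but not how precisely it is estimated. One possible complement is a percentile bootstrap confidence interval for Pearson's r. The sketch below assumes paired observations; `bootstrap_pearson_ci` and its default parameters are illustrative choices, not part of the original analysis.

```python
import numpy as np


def bootstrap_pearson_ci(x, y, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for Pearson's r."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    rng = np.random.default_rng(seed)
    n = len(x)
    estimates = []
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=n)  # resample (x, y) pairs with replacement
        xs, ys = x[idx], y[idx]
        if xs.std() == 0 or ys.std() == 0:
            continue  # r is undefined when a resample is constant
        estimates.append(np.corrcoef(xs, ys)[0, 1])
    lower, upper = np.percentile(estimates, [100 * alpha / 2, 100 * (1 - alpha / 2)])
    return float(lower), float(upper)
```

With the loaded data this could be called as `bootstrap_pearson_ci(df['error_rate'], df['distance'])`. An interval that excludes zero agrees with a significant p-value, and its width shows how stable the estimate is.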

In [None]:
if df is not None:
    # Linear regression
    from scipy.stats import linregress
    
    slope, intercept, r_value, p_value, std_err = linregress(
        df['error_rate'], df['distance']
    )
    
    print("\nLinear Regression Results:")
    print("=" * 60)
    print(f"Equation: distance = {slope:.4f} × error_rate + {intercept:.4f}")
    print(f"R² = {r_value**2:.4f}")
    print(f"Standard Error: {std_err:.6f}")
    print("\nFor every 10-percentage-point increase in error rate:")
    print(f"  Expected change in distance: {slope * 0.1:+.4f}")

## 5. Sensitivity Analysis <a id='sensitivity-analysis'></a>

Examine how sensitive the semantic distance is to changes in error rate at different ranges.

In [None]:
if df is not None:
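    # Bin error rates and calculate statistics
    # include_lowest=True keeps rows with error_rate == 0 in the first bin;
    # without it they fall outside every interval and are dropped as NaN
    df['error_bin'] = pd.cut(
        df['error_rate'], 
        bins=[0, 0.1, 0.25, 0.4, 1.0],
        labels=['Low (0-10%)', 'Medium (10-25%)', 'High (25-40%)', 'Very High (40%+)'],
        include_lowest=True
    )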
    
    print("Sensitivity by Error Rate Range:")
    print("=" * 60)
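    # observed=False keeps empty bins in the table and makes the categorical grouping explicit
    sensitivity_stats = df.groupby('error_bin', observed=False)['distance'].agg([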
        ('Mean', 'mean'),
        ('Std Dev', 'std'),
        ('Min', 'min'),
        ('Max', 'max'),
        ('Count', 'count')
    ])
    display(sensitivity_stats)
    
    # Calculate rate of change between bins
    print("\nRate of Change Between Ranges:")
    means = sensitivity_stats['Mean'].values
    for i in range(1, len(means)):
        change = means[i] - means[i-1]
        pct_change = (change / means[i-1]) * 100
        print(f"  {sensitivity_stats.index[i-1]} → {sensitivity_stats.index[i]}:")
        print(f"    Δ = {change:.4f} ({pct_change:+.1f}%)")
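
The percentage change between bin means does not account for the bins having different widths. A possible refinement, sketched below, divides the change in mean distance by the change in mean error rate between consecutive bins, giving a local slope per range. `slopes_between_bins` is a hypothetical helper; it assumes the same bin edges as above and includes rows with an error rate of exactly 0 in the first bin.

```python
import numpy as np
import pandas as pd


def slopes_between_bins(frame, edges=(0, 0.1, 0.25, 0.4, 1.0)):
    """Change in mean distance per unit of error rate between consecutive bins."""
    bins = pd.cut(frame["error_rate"], bins=list(edges), include_lowest=True)
    bins = bins.rename("error_bin")  # avoid clashing with the error_rate column name

    # Mean error rate and mean distance inside each non-empty bin.
    grouped = frame.groupby(bins, observed=True)[["error_rate", "distance"]].mean()

    d_rate = np.diff(grouped["error_rate"].to_numpy())
    d_dist = np.diff(grouped["distance"].to_numpy())
    return pd.Series(d_dist / d_rate, index=grouped.index[1:], name="slope")
```

Slopes that grow from one range to the next would point to accelerating drift, while roughly constant slopes are consistent with a linear relationship.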

In [None]:
if df is not None:
    # Box plot by error range
    plt.figure(figsize=(12, 6))
    sns.boxplot(data=df, x='error_bin', y='distance', palette='Set2')
    plt.xlabel('Error Rate Range')
    plt.ylabel('Semantic Distance')
    plt.title('Semantic Distance Distribution by Error Rate Range')
    plt.xticks(rotation=15)
    plt.grid(True, alpha=0.3)
    plt.tight_layout()
    plt.show()

## 6. Visualization <a id='visualization'></a>

Create publication-quality visualizations of the experimental results.

In [None]:
if df is not None:
    # Main scatter plot with regression
    fig, ax = plt.subplots(figsize=(14, 8))
    
    # Scatter plot
    scatter = ax.scatter(
        df['error_rate'] * 100,  # Convert to percentage
        df['distance'],
        alpha=0.6,
        s=100,
        c=df['distance'],
        cmap='viridis',
        edgecolors='black',
        linewidth=0.5
    )
    
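    # Add quadratic (degree-2 polynomial) trend line
    z = np.polyfit(df['error_rate'], df['distance'], 2)
    p = np.poly1d(z)
    x_line = np.linspace(df['error_rate'].min(), df['error_rate'].max(), 100)
    
    # R² of the quadratic fit itself; r_value from the linear regression does not apply here
    ss_res = ((df['distance'] - p(df['error_rate'])) ** 2).sum()
    ss_tot = ((df['distance'] - df['distance'].mean()) ** 2).sum()
    r2_poly = 1 - ss_res / ss_tot
    ax.plot(
        x_line * 100, 
        p(x_line), 
        'r--', 
        linewidth=2, 
        label=f'Quadratic Fit (R²={r2_poly:.3f})'
    )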
    
    # Formatting
    ax.set_xlabel('Spelling Error Rate (%)', fontsize=13, fontweight='bold')
    ax.set_ylabel('Semantic Distance (Cosine)', fontsize=13, fontweight='bold')
    ax.set_title(
        'Impact of Spelling Errors on Semantic Drift\nin Multi-Agent Translation Pipeline',
        fontsize=15,
        fontweight='bold',
        pad=20
    )
    ax.legend(fontsize=11)
    ax.grid(True, alpha=0.3)
    
    # Add colorbar
    cbar = plt.colorbar(scatter, ax=ax)
    cbar.set_label('Semantic Distance', fontsize=11)
    
    plt.tight_layout()
    plt.show()
    
    # Save high-resolution version
    # fig.savefig('results/graphs/research_analysis.png', dpi=300, bbox_inches='tight')
    # print("\n✓ High-resolution graph saved to results/graphs/research_analysis.png")
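
The y-axis above is labelled as a cosine distance. Assuming the pipeline computes it as one minus the cosine similarity between two embedding vectors (this notebook does not confirm the exact definition), the metric can be sketched as follows; `cosine_distance` is an illustrative name.

```python
import numpy as np


def cosine_distance(u, v):
    """Cosine distance: 1 - (u . v) / (||u|| * ||v||)."""
    u = np.asarray(u, dtype=float)
    v = np.asarray(v, dtype=float)
    norm_product = np.linalg.norm(u) * np.linalg.norm(v)
    if norm_product == 0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return float(1.0 - np.dot(u, v) / norm_product)
```

Under this definition, identical directions give 0, orthogonal vectors give 1, and opposite directions give 2, so larger values on the plot mean greater semantic drift.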

In [None]:
if df is not None:
    # Heatmap showing relationship
    fig, ax = plt.subplots(figsize=(10, 6))
    
    # Create pivot table for heatmap
    pivot_data = df.pivot_table(
        values='distance',
        index=pd.cut(df['error_rate'], bins=10),
        aggfunc=['mean', 'std', 'count']
    )
    
    # Display summary heatmap data
    print("Summary Statistics Heatmap Data:")
    display(pivot_data)
    
    # Create mean distance heatmap
    # pivot_data['mean'] is already a one-column DataFrame, so transpose it directly
    sns.heatmap(
        pivot_data['mean'].T,
        annot=True,
        fmt='.3f',
        cmap='YlOrRd',
        cbar_kws={'label': 'Mean Semantic Distance'},
        ax=ax
    )
    ax.set_title('Mean Semantic Distance by Error Rate Range', fontsize=13, fontweight='bold')
    ax.set_xlabel('Error Rate Range')
    ax.set_ylabel('')
    plt.xticks(rotation=45, ha='right')
    plt.tight_layout()
    plt.show()

## 7. Conclusions <a id='conclusions'></a>

### Key Findings

Based on the statistical analysis:

1. **Correlation**: Pearson's r and Spearman's ρ (Section 4) quantify the strength of the relationship between spelling error rate and semantic distance, and their p-values indicate whether it is statistically significant.

2. **Non-linearity**: The polynomial fit (Section 6) tests for non-linear effects. A quadratic curve that fits clearly better than the linear regression and bends upward would indicate that LLMs handle low error rates better than high ones.

3. **Sensitivity**: The binned means and bin-to-bin changes (Section 5) show how the impact varies across the low, medium, high, and very high error rate ranges.

4. **Robustness**: The standard deviation and range within each bin measure the variability in results. A wide spread at a similar error rate suggests that factors beyond error rate affect semantic drift.
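
The non-linearity finding depends on whether the quadratic fit is a real improvement, since adding a term never lowers plain R². One way to check is to compare adjusted R², which penalises the extra coefficient. The sketch below is an illustration under that assumption; `adjusted_r2` is a hypothetical helper built on `np.polyfit`.

```python
import numpy as np


def adjusted_r2(x, y, degree):
    """Adjusted R² of a least-squares polynomial fit of the given degree."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    coeffs = np.polyfit(x, y, degree)
    residuals = y - np.polyval(coeffs, x)
    ss_res = float(np.sum(residuals ** 2))
    ss_tot = float(np.sum((y - y.mean()) ** 2))
    n, k = len(x), degree  # k = number of predictors, excluding the intercept
    r2 = 1.0 - ss_res / ss_tot
    return 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)
```

Comparing `adjusted_r2(df['error_rate'], df['distance'], 1)` with the same call at degree 2 gives a like-for-like measure: a clearly higher value for degree 2 supports a non-linear effect, while near-equal values favour the simpler linear model.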

### Implications

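- **Input Quality Matters**: If the regression slope is positive and significant, reducing spelling errors in the input should reduce semantic drift.
- **Error Thresholds**: There may be critical thresholds beyond which semantic drift accelerates; the bin-to-bin changes in Section 5 are where such a threshold would show up.
- **LLM Resilience**: A flat trend at low error rates followed by a steeper one at high error rates would indicate that modern LLMs tolerate occasional errors but struggle with frequent ones.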

### Future Work

1. Test with different LLM models
2. Explore different types of errors (semantic vs. orthographic)
3. Investigate error correction strategies
4. Analyze intermediate translation steps
5. Examine specific error patterns that cause maximum drift

---

**Generated with Claude Code**

**Co-Authored-By:** Claude <noreply@anthropic.com>