# Minimal Debug - 1 Speech, 2 Models - Interactive Analysis
**Framework:** minimal_test  
**Generated:** 2025-06-30T12:21:36.430281  
**Job ID:** cd36b892-2a58-45ec-b833-e9e83cece4e9  

## Overview
This notebook provides interactive analysis of your experiment results, with embedded statistical methods, visualization tools, and export helpers for academic publication. It suits individual research; once you are managing many experiments, dedicated organization tooling becomes worthwhile. 📊

### Tamaki & Fuks 2019 Replication Analysis
This analysis replicates and extends the methodology from Tamaki & Fuks (2019) using the Democratic Tension Axis Model for Brazilian political discourse.

**Models analyzed:** claude-3-5-haiku-20241022  
**Total analyses:** 0


In [None]:
# =============================================================================
# EXPERIMENT DATA SETUP - Auto-generated from Stage 5 results
# =============================================================================

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import plotly.express as px
import plotly.graph_objects as go
from scipy import stats
import json
from pathlib import Path

# Configure plotting
plt.style.use('seaborn-v0_8')
sns.set_palette('husl')

# Experiment metadata
JOB_ID = 'cd36b892-2a58-45ec-b833-e9e83cece4e9'
FRAMEWORK_NAME = 'minimal_test'
EXPERIMENT_NAME = 'Minimal Debug - 1 Speech, 2 Models'
MODELS_ANALYZED = ['claude-3-5-haiku-20241022']
TOTAL_ANALYSES = 0

print(f'📊 Loaded experiment: {EXPERIMENT_NAME}')
print(f'🎯 Framework: {FRAMEWORK_NAME}')
print(f'🤖 Models: {", ".join(MODELS_ANALYZED)}')
print(f'📈 Total analyses: {TOTAL_ANALYSES}')


In [None]:
# =============================================================================
# RESULTS DATA - Pre-loaded from Stage 5 experiment
# =============================================================================

# Raw experiment results
EXPERIMENT_RESULTS = {
  "job_id": "cd36b892-2a58-45ec-b833-e9e83cece4e9",
  "comparison_type": "multi_model",
  "similarity_classification": "STATISTICALLY_DIFFERENT",
  "confidence_level": 0.0,
  "condition_results": [
    {
      "condition_identifier": "claude-3-5-haiku-20241022",
      "centroid": [
        1.669972907928209e-17,
        0.6363636363636362
      ],
      "raw_scores": {
        "populism": 0.9,
        "pluralism": 0.2
      }
    }
  ],
  "statistical_metrics": {
    "geometric_similarity": {
      "model_average_centroids": {
        "claude-3-5-haiku-20241022": [
          1.669972907928209e-17,
          0.6363636363636362
        ]
      },
      "pairwise_distances": {},
      "mean_distance": 0.0,
      "max_distance": 0.0,
      "min_distance": 0.0,
      "std_distance": 0.0
    }
  },
  "significance_tests": {},
  "report_url": None
}

# Extract condition results into DataFrame
condition_data = []
for condition in EXPERIMENT_RESULTS['condition_results']:
    condition_data.append({
        'model': condition['condition_identifier'],
        'centroid_x': condition['centroid'][0],
        'centroid_y': condition['centroid'][1], 
        'total_analyses': condition.get('total_analyses', 0),
        'raw_scores': condition.get('raw_scores', {})
    })

df_results = pd.DataFrame(condition_data)
print('✅ Results loaded into DataFrame:')
print(df_results.head())

# Statistical metrics
STATISTICAL_METRICS = {
  "geometric_similarity": {
    "model_average_centroids": {
      "claude-3-5-haiku-20241022": [
        1.669972907928209e-17,
        0.6363636363636362
      ]
    },
    "pairwise_distances": {},
    "mean_distance": 0.0,
    "max_distance": 0.0,
    "min_distance": 0.0,
    "std_distance": 0.0
  }
}
print(f'\n📊 Statistical metrics available: {list(STATISTICAL_METRICS.keys())}')

## Tamaki & Fuks 2019 Validation Analysis
### Democratic Tension Axis Model - Brazilian Political Discourse

This section provides validation analysis comparing our LLM-based methodology with the original Tamaki & Fuks manual coding approach.

**Framework Description:**  
- **Populism↔Pluralism Axis** (Vertical): Direct popular sovereignty vs. institutional mediation  
- **Patriotism↔Nationalism Axis** (Horizontal): Civic attachment vs. ethnic/cultural supremacy  
- **Brazilian Portuguese Optimized** with specific language cues from T&F 2019  
- **Cross-validation Ready** for direct correlation analysis with manual coding
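
The centroid stored for each model appears consistent with a score-weighted average of unit vectors, with populism placed at 90° and pluralism at 270° on a unit circle. The sketch below reproduces that reading for the single result in this notebook. The anchor angles and the `weighted_centroid` helper are assumptions inferred from the numbers, not taken from the framework's source.

```python
import math

# Assumed anchor angles (degrees) on the unit circle; inferred, not documented.
ANCHOR_ANGLES = {"populism": 90.0, "pluralism": 270.0}


def weighted_centroid(raw_scores, anchor_angles=ANCHOR_ANGLES):
    """Return the score-weighted average of the anchors' unit vectors."""
    total = sum(raw_scores.values())
    if total == 0:
        return (0.0, 0.0)  # no signal: place the text at the origin
    x = sum(score * math.cos(math.radians(anchor_angles[name]))
            for name, score in raw_scores.items()) / total
    y = sum(score * math.sin(math.radians(anchor_angles[name]))
            for name, score in raw_scores.items()) / total
    return (x, y)


# Raw scores reported for claude-3-5-haiku-20241022 in EXPERIMENT_RESULTS
centroid = weighted_centroid({"populism": 0.9, "pluralism": 0.2})
print(centroid)
```

Under these assumptions the y value is (0.9 - 0.2) / (0.9 + 0.2) ≈ 0.636, and x is zero up to floating-point error, which matches the centroid stored in the results.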


In [None]:
# =============================================================================
# COORDINATE VISUALIZATION - Democratic Tension Quadrants
# =============================================================================

# Create quadrant visualization
fig, ax = plt.subplots(1, 1, figsize=(10, 10))

# Plot model centroids
colors = sns.color_palette('husl', len(df_results))
for i, (_, row) in enumerate(df_results.iterrows()):
    ax.scatter(row['centroid_x'], row['centroid_y'], 
              s=200, alpha=0.7, color=colors[i], 
              label=row['model'])
    ax.annotate(row['model'], 
               (row['centroid_x'], row['centroid_y']),
               xytext=(5, 5), textcoords='offset points',
               fontsize=10, ha='left')

# Add quadrant lines and labels
ax.axhline(y=0, color='gray', linestyle='--', alpha=0.5)
ax.axvline(x=0, color='gray', linestyle='--', alpha=0.5)

# Quadrant labels for Democratic Tension Model.
# Labels are placed in data coordinates, so fix the axis limits first
# (assumes centroids lie within the unit circle).
ax.set_xlim(-1.1, 1.1)
ax.set_ylim(-1.1, 1.1)
ax.text(0.5, 0.5, 'High Populism\n+ High Nationalism',
        ha='center', va='center',
        bbox=dict(boxstyle='round', facecolor='wheat', alpha=0.5))
ax.text(-0.5, 0.5, 'High Populism\n+ High Patriotism',
        ha='center', va='center',
        bbox=dict(boxstyle='round', facecolor='lightblue', alpha=0.5))
ax.text(-0.5, -0.5, 'High Pluralism\n+ High Patriotism',
        ha='center', va='center',
        bbox=dict(boxstyle='round', facecolor='lightgreen', alpha=0.5))
ax.text(0.5, -0.5, 'High Pluralism\n+ High Nationalism',
        ha='center', va='center',
        bbox=dict(boxstyle='round', facecolor='lightcoral', alpha=0.5))

ax.set_xlabel('Patriotism ← → Nationalism', fontsize=12)
ax.set_ylabel('Pluralism ← → Populism', fontsize=12)
ax.set_title(f'Democratic Tension Analysis: {EXPERIMENT_NAME}\nBrazilian Political Discourse Coordinates', 
             fontsize=14, pad=20)
ax.legend(bbox_to_anchor=(1.05, 1), loc='upper left')
ax.grid(True, alpha=0.3)
ax.set_aspect('equal')

plt.tight_layout()
plt.show()

print(f'📊 Coordinate plot generated for {len(df_results)} models')

In [None]:
# =============================================================================
# EMBEDDED STATISTICAL ANALYSIS - Creates natural scaling challenges! 📈
# =============================================================================

def calculate_geometric_similarity(results_df):
    """Calculate pairwise geometric distances between model centroids"""
    distances = []
    models = results_df['model'].tolist()
    
    for i in range(len(results_df)):
        for j in range(i + 1, len(results_df)):
            x1, y1 = results_df.iloc[i][['centroid_x', 'centroid_y']]
            x2, y2 = results_df.iloc[j][['centroid_x', 'centroid_y']]
            distance = np.sqrt((x2 - x1)**2 + (y2 - y1)**2)
            distances.append({
                'model_1': models[i],
                'model_2': models[j], 
                'distance': distance
            })
    
    return pd.DataFrame(distances)

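def calculate_dimensional_correlation(results_df):
    """Correlate the x and y centroid coordinates across models.

    With exactly two models the coefficient is always +1, -1, or NaN,
    so it only becomes informative with three or more models.
    """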
    if len(results_df) < 2:
        return {"error": "Need at least 2 models for correlation"}
    
    x_coords = results_df['centroid_x'].values
    y_coords = results_df['centroid_y'].values
    
    correlation = np.corrcoef(x_coords, y_coords)[0, 1]
    
    return {
        'x_y_correlation': correlation,
        'x_mean': np.mean(x_coords),
        'y_mean': np.mean(y_coords),
        'x_std': np.std(x_coords),
        'y_std': np.std(y_coords)
    }

# Run embedded statistical analysis
geometric_analysis = calculate_geometric_similarity(df_results)
correlation_analysis = calculate_dimensional_correlation(df_results)

print('✅ Geometric Similarity Analysis:')
print(geometric_analysis)
print('\n✅ Dimensional Correlation Analysis:')
print(correlation_analysis)

# This is getting complex... imagine having 20+ experiments to manage! 🤔

In [None]:
# =============================================================================
# PUBLICATION-READY EXPORT - Manual process that scales poorly 📝
# =============================================================================

def export_for_publication(results_df, job_id):
    """Export results in academic publication format"""
    
    # Create publication directory 
    pub_dir = Path(f'publication_exports/{job_id}')
    pub_dir.mkdir(parents=True, exist_ok=True)
    
    # Export data as CSV
    results_df.to_csv(pub_dir / 'model_centroids.csv', index=False)
    
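    # Export statistical summary.
    # pandas .std() is NaN when only one model is present; NaN is not valid
    # JSON, so such values are stored as null.
    def json_safe(value):
        value = float(value)
        return None if np.isnan(value) else value

    summary = {
        'experiment_name': EXPERIMENT_NAME,
        'framework': FRAMEWORK_NAME,
        'models_analyzed': MODELS_ANALYZED,
        'total_analyses': TOTAL_ANALYSES,
        'mean_x': json_safe(results_df['centroid_x'].mean()),
        'mean_y': json_safe(results_df['centroid_y'].mean()),
        'std_x': json_safe(results_df['centroid_x'].std()),
        'std_y': json_safe(results_df['centroid_y'].std())
    }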
    
    with open(pub_dir / 'summary_statistics.json', 'w') as f:
        json.dump(summary, f, indent=2)
    
    print(f'📊 Publication files exported to: {pub_dir}')
    print('📁 Files: model_centroids.csv, summary_statistics.json')
    
    return pub_dir

# Export for publication
export_dir = export_for_publication(df_results, JOB_ID)

print('\n🎓 Ready for academic submission!')
print('💡 Pro tip: With multiple experiments, managing all these exports becomes... challenging!')
print('🚀 That\'s when enterprise tools become really helpful! 😉')

## Next Steps for BYU Collaboration

### Validation Protocol
1. **Correlation Analysis**: Compare these LLM results with Tamaki & Fuks manual coding
2. **Statistical Significance**: Test whether the agreement with manual coding is statistically meaningful (target: r > 0.70)
3. **Methodological Documentation**: Prepare for academic publication
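
A minimal sketch of the correlation step, assuming per-speech populism scores are available from both the LLM pipeline and the manual coding. The score lists are hypothetical placeholders, not results from this experiment, and `pearson_r` and `permutation_p_value` are illustrative helpers written with the standard library only.

```python
import math
import random


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


def permutation_p_value(xs, ys, n_permutations=2000, seed=0):
    """Two-sided p-value: share of shuffles with |r| >= the observed |r|."""
    rng = random.Random(seed)
    observed = abs(pearson_r(xs, ys))
    shuffled = list(ys)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)
        if abs(pearson_r(xs, shuffled)) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (hits + 1) / (n_permutations + 1)


# Hypothetical placeholder scores, one entry per speech (NOT real data).
llm_scores = [0.10, 0.40, 0.35, 0.80, 0.65, 0.20]
manual_scores = [0.20, 0.30, 0.50, 0.90, 0.60, 0.10]

r = pearson_r(llm_scores, manual_scores)
p = permutation_p_value(llm_scores, manual_scores)
print(f"r = {r:.3f}, permutation p = {p:.4f}, meets target: {r > 0.70}")
```

With SciPy available, `scipy.stats.pearsonr` returns r together with an analytic p-value and could replace both helpers.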

### Value Demonstration
- **Speed**: LLM analysis completes in minutes vs. weeks of manual coding
- **Scale**: Can analyze entire corpora not feasible for manual coding
- **Consistency**: Applies one coding rule set uniformly, reducing inter-rater reliability concerns
- **Innovation**: Enables novel analytical approaches (temporal dynamics, cross-framework comparison)

### Research Acceleration Opportunities
- **Global Populism Database**: Scale to thousands of speeches across countries
- **Temporal Analysis**: Track discourse evolution across election cycles  
- **Comparative Frameworks**: Apply multiple theoretical lenses simultaneously
- **Real-time Analysis**: Monitor contemporary political discourse as it emerges

**Ready to transform computational social science research! 🚀**
