# External Validation Analysis: Iusmorfos Cross-Country Framework

**Cross-Country Validation**

This notebook runs an external validation of the Iusmorfos framework, with Argentina
as the baseline, on synthetic datasets for four countries with different legal systems
and cultural contexts. It tests how well the framework generalizes and transfers
across those contexts.

## Validation Objectives

1. **Cross-System Validation**: Test across Civil Law, Common Law, and Mixed systems
2. **Cultural Transferability**: Assess performance across different cultural dimensions
3. **Economic Context Adaptation**: Validate across development levels
4. **Temporal Generalization**: Test consistency across different time periods
5. **Crisis Response Patterns**: Validate crisis-innovation relationships globally

## Target Countries

- **🇨🇱 Chile**: Civil law system, similar cultural context to Argentina
- **🇿🇦 South Africa**: Mixed legal system, different economic context
- **🇸🇪 Sweden**: Civil law system, Nordic governance model
- **🇮🇳 India**: Common law system, large population, complex federal structure

## Reproducibility Configuration

- **Validation Protocol**: Standardized cross-country methodology
- **Cultural Metrics**: Hofstede dimensions integration
- **Statistical Tests**: Transferability and adaptation metrics
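
As a minimal sketch of what a seeded, reproducible run can look like, the snippet below shows that two generators built from the same seed produce identical draws. The helper name `make_rng` and the seed value are assumptions for illustration; the notebook itself reads its seed from the project configuration.

```python
import random


def make_rng(seed: int) -> random.Random:
    """Return an isolated generator so the global random state is untouched."""
    return random.Random(seed)


SEED = 42  # hypothetical value; the notebook takes its seed from `config`

rng_a = make_rng(SEED)
rng_b = make_rng(SEED)

draws_a = [rng_a.random() for _ in range(5)]
draws_b = [rng_b.random() for _ in range(5)]

# Identical seeds give identical sequences, which is what makes a rerun comparable.
print(draws_a == draws_b)
```

Passing a generator object around, rather than seeding the global state, keeps each analysis step reproducible even when cells are run out of order.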

In [None]:
# Environment Setup and Configuration
import sys
import warnings
from pathlib import Path

import pandas as pd  # imported here because pd.Timestamp is used below

# Suppress warnings for cleaner output
warnings.filterwarnings('ignore')

# Add project source to path
project_root = Path.cwd().parent
sys.path.insert(0, str(project_root / 'src'))

print(f"🌍 Iusmorfos External Validation - Project root: {project_root}")
print(f"⏰ Analysis timestamp: {pd.Timestamp.now()}")

In [None]:
# Core Libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import json
from datetime import datetime
from typing import Dict, List, Any, Tuple

# Statistical Analysis
from scipy import stats
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.metrics import mean_squared_error, r2_score, mean_absolute_error
from sklearn.cluster import KMeans

# External Validation Framework
from external_validation import ExternalValidationFramework, LegalSystem
from config import get_config

# Set up configuration
config = get_config()
print(f"✅ Configuration loaded - Seed: {config.config['reproducibility']['random_seed']}")

# Plotting configuration
plt.style.use('seaborn-v0_8')
sns.set_palette("Set2")
plt.rcParams['figure.figsize'] = (14, 10)
plt.rcParams['font.size'] = 11

# Initialize validation framework
validator = ExternalValidationFramework()
print("🔬 External validation framework initialized")

## 1. Country Data Generation and Characteristics

Generating synthetic legal innovation data for each target country based on their
cultural, economic, and legal system characteristics.
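
To make the shape of such a dataset concrete, here is a toy generator. It is not the framework's `generate_synthetic_country_data`; the function name, the distributions, the year range, and the crisis rate are all assumptions chosen for illustration. Only the column names mirror those used later in this notebook.

```python
import random
from typing import Dict, List


def sketch_country_records(n_innovations: int, crisis_rate: float, seed: int) -> List[Dict]:
    """Generate toy innovation records; every distribution here is illustrative."""
    rng = random.Random(seed)
    records = []
    for _ in range(n_innovations):
        # Toy assumption: fitness is Gaussian noise clipped to [0, 1].
        fitness = min(1.0, max(0.0, rng.gauss(0.6, 0.15)))
        records.append({
            "year": rng.randint(1990, 2020),          # hypothetical year range
            "in_crisis": rng.random() < crisis_rate,  # Bernoulli crisis flag
            "complexity_score": rng.uniform(1.0, 10.0),
            "fitness_score": fitness,
        })
    return records


records = sketch_country_records(n_innovations=350, crisis_rate=0.25, seed=7)
crisis_share = sum(r["in_crisis"] for r in records) / len(records)
print(len(records), round(crisis_share, 3))
```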

In [None]:
# Generate data for all target countries
target_countries = ['CL', 'ZA', 'SE', 'IN']
country_datasets = {}
country_summaries = {}

print("🏗️ Generating country-specific datasets...")

for country_code in target_countries:
    # Generate synthetic data
    country_data = validator.generate_synthetic_country_data(country_code, n_innovations=350)
    country_datasets[country_code] = country_data
    
    # Calculate summary statistics
    profile = validator.country_profiles[country_code]
    
    summary = {
        'country_name': profile.name,
        'legal_system': profile.legal_system.value,
        'gdp_per_capita': profile.gdp_per_capita,
        'governance_index': profile.governance_index,
        'n_innovations': len(country_data),
        'year_range': [int(country_data['year'].min()), int(country_data['year'].max())],
        'crisis_proportion': float(country_data['in_crisis'].mean()),
        'mean_complexity': float(country_data['complexity_score'].mean()),
        'mean_adoption': float(country_data['adoption_success'].mean()),
        'mean_fitness': float(country_data['fitness_score'].mean()),
        'reform_type_distribution': country_data['reform_type'].value_counts().to_dict()
    }
    
    country_summaries[country_code] = summary
    
    print(f"  ✅ {country_code} ({profile.name}): {len(country_data)} innovations, "
          f"fitness μ={summary['mean_fitness']:.3f}")

print(f"\n📊 Generated datasets for {len(country_datasets)} countries")

# Display country characteristics table
characteristics_df = pd.DataFrame({
    country: {
        'Legal System': summary['legal_system'].replace('_', ' ').title(),
        'GDP per Capita': f"${summary['gdp_per_capita']:,}",
        'Governance Index': f"{summary['governance_index']:.2f}",
        'Innovations': summary['n_innovations'],
        'Crisis %': f"{summary['crisis_proportion']:.1%}",
        'Mean Fitness': f"{summary['mean_fitness']:.3f}"
    }
    for country, summary in country_summaries.items()
}).T

print("\n🌍 Country Characteristics:")
print(characteristics_df.to_string())

## 2. Cultural and Legal System Analysis

Analysis of cultural dimensions and legal system characteristics that may affect
model transferability.
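
The cultural distance used in this section is a mean absolute difference across the cultural dimensions. The sketch below reproduces that calculation on its own. The Argentina scores are the baseline values from this notebook; the comparison profile is invented purely for illustration, and the helper name `mean_abs_distance` is an assumption.

```python
from typing import Dict

# Argentina baseline scores, as hard-coded in this notebook.
ARGENTINA = {
    "power_distance": 49,
    "individualism": 46,
    "masculinity": 56,
    "uncertainty_avoidance": 86,
    "long_term_orientation": 20,
}

# Hypothetical comparison profile, not a real country's scores.
EXAMPLE_COUNTRY = {
    "power_distance": 59,
    "individualism": 26,
    "masculinity": 46,
    "uncertainty_avoidance": 76,
    "long_term_orientation": 30,
}


def mean_abs_distance(a: Dict[str, float], b: Dict[str, float]) -> float:
    """Average absolute difference over the dimensions both profiles share."""
    shared = sorted(set(a) & set(b))
    if not shared:
        raise ValueError("profiles share no dimensions")
    return sum(abs(a[dim] - b[dim]) for dim in shared) / len(shared)


print(mean_abs_distance(ARGENTINA, EXAMPLE_COUNTRY))  # → 12.0
```

Because every dimension is weighted equally, one large gap on a single dimension can be masked by small gaps elsewhere; a per-dimension breakdown is worth checking alongside the average.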

In [None]:
# Extract cultural dimensions for visualization
cultural_data = {}
legal_systems = {}

# Include Argentina as baseline
argentina_cultural = {
    'power_distance': 49,
    'individualism': 46, 
    'masculinity': 56,
    'uncertainty_avoidance': 86,
    'long_term_orientation': 20
}

cultural_data['AR'] = argentina_cultural
legal_systems['AR'] = 'Civil Law'

for country_code in target_countries:
    profile = validator.country_profiles[country_code]
    cultural_data[country_code] = profile.cultural_dimensions
    legal_systems[country_code] = profile.legal_system.value.replace('_', ' ').title()

# Create cultural dimensions DataFrame
cultural_df = pd.DataFrame(cultural_data).T

# Create comprehensive visualization
fig, axes = plt.subplots(2, 3, figsize=(18, 12))
fig.suptitle('Cross-Country Cultural and Legal Analysis', fontsize=16)

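# 1. Cultural dimensions radar chart (a radar needs a polar axis, so swap the Cartesian one)
axes[0, 0].remove()
ax1 = fig.add_subplot(2, 3, 1, projection='polar')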
dimensions = list(cultural_df.columns)
countries_to_plot = ['AR', 'CL', 'SE', 'IN']  # Select representative countries

angles = np.linspace(0, 2 * np.pi, len(dimensions), endpoint=False).tolist()
angles += angles[:1]  # Complete the circle

colors = ['red', 'blue', 'green', 'orange']
for i, country in enumerate(countries_to_plot):
    values = cultural_df.loc[country].tolist()
    values += values[:1]  # Complete the circle
    
    ax1.plot(angles, values, 'o-', linewidth=2, label=f"{country} ({legal_systems[country]})", color=colors[i])
    ax1.fill(angles, values, alpha=0.1, color=colors[i])

ax1.set_xticks(angles[:-1])
ax1.set_xticklabels([dim.replace('_', '\n').title() for dim in dimensions], fontsize=9)
ax1.set_ylim(0, 100)
ax1.set_title('Cultural Dimensions Comparison')
ax1.legend(loc='upper right', bbox_to_anchor=(1.3, 1.0))
ax1.grid(True)

# 2. Cultural distance from Argentina
ax2 = axes[0, 1]
distances_from_argentina = {}
for country in target_countries:
    distance = np.mean([abs(cultural_data[country][dim] - argentina_cultural[dim]) 
                       for dim in dimensions])
    distances_from_argentina[country] = distance

countries = list(distances_from_argentina.keys())
distances = list(distances_from_argentina.values())
bars = ax2.bar(countries, distances, color=['skyblue', 'lightcoral', 'lightgreen', 'gold'])
ax2.set_title('Cultural Distance from Argentina')
ax2.set_ylabel('Average Cultural Distance')
ax2.set_xlabel('Country')

# Add value labels on bars
for bar, distance in zip(bars, distances):
    height = bar.get_height()
    ax2.text(bar.get_x() + bar.get_width()/2., height + 0.5,
             f'{distance:.1f}', ha='center', va='bottom')

# 3. Economic development comparison
ax3 = axes[0, 2]
gdp_data = {'AR': 10937}  # Argentina baseline
governance_data = {'AR': 0.65}

for country_code in target_countries:
    profile = validator.country_profiles[country_code]
    gdp_data[country_code] = profile.gdp_per_capita
    governance_data[country_code] = profile.governance_index

countries = list(gdp_data.keys())
gdp_values = list(gdp_data.values())
governance_values = list(governance_data.values())

scatter = ax3.scatter(gdp_values, governance_values, 
                     c=['red', 'blue', 'purple', 'green', 'orange'], 
                     s=100, alpha=0.7)

for i, country in enumerate(countries):
    ax3.annotate(country, (gdp_values[i], governance_values[i]), 
                xytext=(5, 5), textcoords='offset points', fontsize=10)

ax3.set_xlabel('GDP per Capita (USD)')
ax3.set_ylabel('Governance Index')
ax3.set_title('Economic Development vs Governance Quality')
ax3.grid(True, alpha=0.3)

# 4. Legal system distribution
ax4 = axes[1, 0]
legal_system_counts = pd.Series(legal_systems).value_counts()
wedges, texts, autotexts = ax4.pie(legal_system_counts.values, labels=legal_system_counts.index, 
                                  autopct='%1.1f%%', startangle=90)
ax4.set_title('Legal System Distribution')

# 5. Innovation characteristics by country
ax5 = axes[1, 1]
innovation_data = []
for country_code in ['AR'] + target_countries:
    if country_code == 'AR':
        # Hard-coded Argentina baseline values (not generated by the validator)
        fitness_mean = 0.65
        complexity_mean = 5.2
    else:
        summary = country_summaries[country_code]
        fitness_mean = summary['mean_fitness']
        complexity_mean = summary['mean_complexity']
    
    innovation_data.append({
        'Country': country_code,
        'Fitness': fitness_mean,
        'Complexity': complexity_mean
    })

innovation_df = pd.DataFrame(innovation_data)
scatter = ax5.scatter(innovation_df['Complexity'], innovation_df['Fitness'], 
                     c=['red', 'blue', 'purple', 'green', 'orange'], s=100, alpha=0.7)

for _, row in innovation_df.iterrows():
    ax5.annotate(row['Country'], (row['Complexity'], row['Fitness']), 
                xytext=(5, 5), textcoords='offset points', fontsize=10)

ax5.set_xlabel('Mean Complexity Score')
ax5.set_ylabel('Mean Fitness Score')
ax5.set_title('Innovation Characteristics by Country')
ax5.grid(True, alpha=0.3)

# 6. Crisis proportion comparison
ax6 = axes[1, 2]
crisis_data = {'AR': 0.25}  # Argentina baseline estimate
for country_code in target_countries:
    crisis_data[country_code] = country_summaries[country_code]['crisis_proportion']

countries = list(crisis_data.keys())
crisis_proportions = list(crisis_data.values())
bars = ax6.bar(countries, crisis_proportions, 
              color=['red', 'blue', 'purple', 'green', 'orange'], alpha=0.7)
ax6.set_title('Crisis Period Proportion by Country')
ax6.set_ylabel('Proportion of Innovations During Crises')
ax6.set_xlabel('Country')

# Add percentage labels
for bar, proportion in zip(bars, crisis_proportions):
    height = bar.get_height()
    ax6.text(bar.get_x() + bar.get_width()/2., height + 0.005,
             f'{proportion:.1%}', ha='center', va='bottom')

plt.tight_layout()
plt.show()

# Display cultural analysis summary
print("\n🎭 Cultural Analysis Summary:")
print(f"Most similar to Argentina (culturally): {min(distances_from_argentina, key=distances_from_argentina.get)}")
print(f"Most different from Argentina: {max(distances_from_argentina, key=distances_from_argentina.get)}")
print(f"Legal system diversity: {len(set(legal_systems.values()))} different systems")

print("\n📊 Expected Transferability Ranking (cultural similarity):")
sorted_countries = sorted(distances_from_argentina.items(), key=lambda x: x[1])
for i, (country, distance) in enumerate(sorted_countries, 1):
    profile = validator.country_profiles[country]
    print(f"{i}. {country} ({profile.name}) - Distance: {distance:.1f} - {legal_systems[country]}")

## 3. Model Validation Execution

Running the comprehensive external validation across all target countries.
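
The validation results report R², RMSE, and MAE per country. As a reference for how these three metrics relate, here is a dependency-free sketch. The function name `regression_metrics` and the sample values are assumptions for illustration, and the framework may compute its metrics differently.

```python
import math
from typing import Dict, Sequence


def regression_metrics(y_true: Sequence[float], y_pred: Sequence[float]) -> Dict[str, float]:
    """Compute R², RMSE and MAE for paired observations and predictions."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(y_true)
    mean_true = sum(y_true) / n
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))  # residual sum of squares
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)          # total sum of squares
    if ss_tot == 0:
        raise ValueError("R² is undefined when the target is constant")
    return {
        "r2_score": 1.0 - ss_res / ss_tot,
        "rmse": math.sqrt(ss_res / n),
        "mae": sum(abs(t - p) for t, p in zip(y_true, y_pred)) / n,
    }


# Hypothetical fitness scores and predictions.
observed = [0.2, 0.4, 0.6, 0.8]
predicted = [0.25, 0.35, 0.65, 0.75]
example_metrics = regression_metrics(observed, predicted)
print({name: round(value, 3) for name, value in example_metrics.items()})
```

R² can be negative on out-of-sample data when predictions are worse than the target mean, so a low or negative score for a country is a meaningful signal rather than a computation error.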

In [None]:
# Run comprehensive external validation
print("🚀 Starting comprehensive external validation...")

validation_results = validator.run_comprehensive_external_validation()

print("\n✅ Validation complete! Processing results...")

# Extract key metrics for analysis
validation_metrics = {}
successful_validations = {}

for country_code, result in validation_results['country_results'].items():
    if 'error' not in result:
        successful_validations[country_code] = result
        
        metrics = result['performance_metrics']
        transferability = result['transferability_metrics']
        cultural_adaptation = result['cultural_adaptation']
        
        validation_metrics[country_code] = {
            'country_name': validator.country_profiles[country_code].name,
            'r2_score': metrics['r2_score'],
            'rmse': metrics['rmse'],
            'mae': metrics['mae'],
            'transferability_score': transferability['overall_transferability_score'],
            'cultural_adaptation_score': cultural_adaptation['cultural_adaptation_score'],
            'legal_compatibility': cultural_adaptation['legal_system_compatibility'],
            'cultural_distance': cultural_adaptation['overall_cultural_distance'],
            'governance_similarity': cultural_adaptation['governance_similarity']
        }
    else:
        print(f"❌ Validation failed for {country_code}: {result.get('error', 'Unknown error')}")

print(f"\n📊 Successfully validated on {len(successful_validations)} countries")

# Create validation results DataFrame
if validation_metrics:
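    # orient='index' keeps per-column dtypes, so the numeric metrics stay float
    # (transposing a mixed-type frame would turn every column into object dtype)
    validation_df = pd.DataFrame.from_dict(validation_metrics, orient='index')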
    
    print("\n🎯 Validation Performance Summary:")
    summary_table = validation_df[['country_name', 'r2_score', 'transferability_score', 'cultural_adaptation_score']].copy()
    summary_table['r2_score'] = summary_table['r2_score'].apply(lambda x: f"{x:.3f}")
    summary_table['transferability_score'] = summary_table['transferability_score'].apply(lambda x: f"{x:.3f}")
    summary_table['cultural_adaptation_score'] = summary_table['cultural_adaptation_score'].apply(lambda x: f"{x:.3f}")
    summary_table.columns = ['Country', 'R² Score', 'Transferability', 'Cultural Adaptation']
    
    print(summary_table.to_string(index=False))
else:
    print("❌ No successful validations to analyze")

## 4. Performance Analysis and Visualization

Comprehensive analysis of validation performance across countries and contexts.
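
Two of the panels in this section annotate a Pearson correlation. The sketch below shows the calculation in plain Python; the function name `pearson_r` and the sample values are assumptions for illustration. With only four target countries, any such correlation rests on four points, so the reported r and p-values should be read with caution.

```python
import math
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient for two equally long samples."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two samples of equal length with at least 2 points")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical values: performance falling linearly as cultural distance grows.
example_distances = [10.0, 20.0, 30.0, 40.0]
example_r2 = [0.8, 0.7, 0.6, 0.5]
example_corr = pearson_r(example_distances, example_r2)
print(round(example_corr, 3))
```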

In [None]:
if not validation_metrics:
    print("⚠️ No validation metrics available for analysis")
else:
    # Create comprehensive performance visualization
    fig, axes = plt.subplots(2, 3, figsize=(18, 12))
    fig.suptitle('External Validation Performance Analysis', fontsize=16)
    
    # 1. R² Score by Country
    ax1 = axes[0, 0]
    countries = list(validation_df.index)
    r2_scores = validation_df['r2_score'].values
    colors = ['blue', 'purple', 'green', 'orange'][:len(countries)]
    
    bars = ax1.bar(countries, r2_scores, color=colors, alpha=0.7)
    ax1.axhline(y=0.6, color='red', linestyle='--', alpha=0.7, label='Good Performance Threshold')
    ax1.set_title('Model Performance (R² Score) by Country')
    ax1.set_ylabel('R² Score')
    ax1.set_xlabel('Country')
    ax1.legend()
    
    # Add value labels
    for bar, score in zip(bars, r2_scores):
        height = bar.get_height()
        ax1.text(bar.get_x() + bar.get_width()/2., height + 0.01,
                 f'{score:.3f}', ha='center', va='bottom', fontweight='bold')
    
    # 2. Performance vs Cultural Distance
    ax2 = axes[0, 1]
    cultural_distances = validation_df['cultural_distance'].values
    
    scatter = ax2.scatter(cultural_distances, r2_scores, c=colors, s=100, alpha=0.7)
    
    # Add trend line
    if len(cultural_distances) > 2:
        z = np.polyfit(cultural_distances, r2_scores, 1)
        p = np.poly1d(z)
        ax2.plot(cultural_distances, p(cultural_distances), "r--", alpha=0.8, linewidth=2)
        
        # Calculate correlation
        correlation, p_value = stats.pearsonr(cultural_distances, r2_scores)
        ax2.text(0.05, 0.95, f'r = {correlation:.3f}\np = {p_value:.3f}', 
                transform=ax2.transAxes, verticalalignment='top',
                bbox=dict(boxstyle='round', facecolor='wheat', alpha=0.8))
    
    for i, country in enumerate(countries):
        ax2.annotate(country, (cultural_distances[i], r2_scores[i]), 
                    xytext=(5, 5), textcoords='offset points', fontsize=10)
    
    ax2.set_xlabel('Cultural Distance from Argentina')
    ax2.set_ylabel('R² Score')
    ax2.set_title('Performance vs Cultural Distance')
    ax2.grid(True, alpha=0.3)
    
    # 3. Transferability Score Analysis
    ax3 = axes[0, 2]
    transferability_scores = validation_df['transferability_score'].values
    
    bars = ax3.bar(countries, transferability_scores, color=colors, alpha=0.7)
    ax3.axhline(y=0.7, color='red', linestyle='--', alpha=0.7, label='High Transferability Threshold')
    ax3.set_title('Transferability Score by Country')
    ax3.set_ylabel('Transferability Score')
    ax3.set_xlabel('Country')
    ax3.legend()
    
    # Add value labels
    for bar, score in zip(bars, transferability_scores):
        height = bar.get_height()
        ax3.text(bar.get_x() + bar.get_width()/2., height + 0.01,
                 f'{score:.3f}', ha='center', va='bottom', fontweight='bold')
    
    # 4. Legal System Compatibility
    ax4 = axes[1, 0]
    legal_compatibility = validation_df['legal_compatibility'].values
    
    bars = ax4.bar(countries, legal_compatibility, color=colors, alpha=0.7)
    ax4.set_title('Legal System Compatibility')
    ax4.set_ylabel('Compatibility Score')
    ax4.set_xlabel('Country')
    ax4.set_ylim(0, 1.1)
    
    # Add legal system labels
    for i, (country, bar) in enumerate(zip(countries, bars)):
        profile = validator.country_profiles[country]
        legal_system = profile.legal_system.value.replace('_', ' ').title()
        ax4.text(bar.get_x() + bar.get_width()/2., -0.1,
                 legal_system, ha='center', va='top', rotation=45, fontsize=9)
    
    # 5. Multidimensional Performance Radar (a radar needs a polar axis)
    axes[1, 1].remove()
    ax5 = fig.add_subplot(2, 3, 5, projection='polar')
    
    # Prepare radar chart data
    metrics_radar = ['R² Score', 'Transferability', 'Cultural Adaptation', 
                    'Legal Compatibility', 'Governance Similarity']
    
    angles = np.linspace(0, 2 * np.pi, len(metrics_radar), endpoint=False).tolist()
    angles += angles[:1]
    
    # Plot each country
    for i, country in enumerate(countries):
        values = [
            validation_df.loc[country, 'r2_score'],
            validation_df.loc[country, 'transferability_score'],
            validation_df.loc[country, 'cultural_adaptation_score'],
            validation_df.loc[country, 'legal_compatibility'],
            validation_df.loc[country, 'governance_similarity']
        ]
        values += values[:1]
        
        ax5.plot(angles, values, 'o-', linewidth=2, label=country, color=colors[i])
        ax5.fill(angles, values, alpha=0.1, color=colors[i])
    
    ax5.set_xticks(angles[:-1])
    ax5.set_xticklabels(metrics_radar, fontsize=9)
    ax5.set_ylim(0, 1)
    ax5.set_title('Multidimensional Performance Comparison')
    ax5.legend(loc='upper right', bbox_to_anchor=(1.3, 1.0))
    ax5.grid(True)
    
    # 6. Performance vs Economic Development
    ax6 = axes[1, 2]
    gdp_values = [validator.country_profiles[country].gdp_per_capita for country in countries]
    
    scatter = ax6.scatter(gdp_values, r2_scores, c=colors, s=100, alpha=0.7)
    
    # Add trend line
    if len(gdp_values) > 2:
        z = np.polyfit(gdp_values, r2_scores, 1)
        p = np.poly1d(z)
        x_trend = np.linspace(min(gdp_values), max(gdp_values), 100)
        ax6.plot(x_trend, p(x_trend), "r--", alpha=0.8, linewidth=2)
        
        correlation, p_value = stats.pearsonr(gdp_values, r2_scores)
        ax6.text(0.05, 0.95, f'r = {correlation:.3f}\np = {p_value:.3f}', 
                transform=ax6.transAxes, verticalalignment='top',
                bbox=dict(boxstyle='round', facecolor='wheat', alpha=0.8))
    
    for i, country in enumerate(countries):
        ax6.annotate(country, (gdp_values[i], r2_scores[i]), 
                    xytext=(5, 5), textcoords='offset points', fontsize=10)
    
    ax6.set_xlabel('GDP per Capita (USD)')
    ax6.set_ylabel('R² Score')
    ax6.set_title('Performance vs Economic Development')
    ax6.grid(True, alpha=0.3)
    
    plt.tight_layout()
    plt.show()
    
    # Statistical analysis summary
    print("\n📈 Statistical Analysis Summary:")
    print(f"Mean R² Score: {validation_df['r2_score'].mean():.3f} ± {validation_df['r2_score'].std():.3f}")
    print(f"Mean Transferability: {validation_df['transferability_score'].mean():.3f} ± {validation_df['transferability_score'].std():.3f}")
    print(f"Best performing country: {validation_df['r2_score'].idxmax()} (R² = {validation_df['r2_score'].max():.3f})")
    print(f"Most transferable: {validation_df['transferability_score'].idxmax()} (Score = {validation_df['transferability_score'].max():.3f})")

## 5. Generalizability Assessment

Comprehensive assessment of the framework's generalizability across different contexts.
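
The assessment below checks three criteria, using the thresholds printed in its output: R² above 0.6, a standard deviation below 0.15, and transferability above 0.7. The sketch shows one way such a check could be expressed. Applying the thresholds to cross-country means and using the sample standard deviation are assumptions, as are the function name and the example scores; the framework's own rules may differ.

```python
from statistics import mean, stdev
from typing import Dict, Sequence


def assess_criteria(r2_scores: Sequence[float],
                    transferability_scores: Sequence[float]) -> Dict[str, object]:
    """Evaluate three generalizability criteria on cross-country scores."""
    mean_r2 = mean(r2_scores)
    # Sample standard deviation (assumption); a single country has no spread.
    std_r2 = stdev(r2_scores) if len(r2_scores) > 1 else 0.0
    mean_transferability = mean(transferability_scores)
    checks = {
        "good_performance": mean_r2 > 0.6,
        "consistent_performance": std_r2 < 0.15,
        "high_transferability": mean_transferability > 0.7,
    }
    return {**checks, "criteria_met": sum(checks.values()), "total_criteria": len(checks)}


# Hypothetical scores for four countries.
example_assessment = assess_criteria(
    r2_scores=[0.70, 0.65, 0.60, 0.75],
    transferability_scores=[0.8, 0.7, 0.6, 0.9],
)
print(example_assessment["criteria_met"], "of", example_assessment["total_criteria"])
```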

In [None]:
# Extract generalizability assessment from validation results
if 'generalizability_assessment' in validation_results:
    generalizability = validation_results['generalizability_assessment']
    
    print("🎯 GENERALIZABILITY ASSESSMENT")
    print("=" * 40)
    
    print(f"\n📊 Overall Level: {generalizability['generalizability_level'].upper()}")
    
    criteria = generalizability['criteria_assessment']
    print(f"\n✅ Criteria Assessment:")
    print(f"  Good Performance (R² > 0.6): {'✅' if criteria['good_performance'] else '❌'}")
    print(f"  Consistent Performance (σ < 0.15): {'✅' if criteria['consistent_performance'] else '❌'}")
    print(f"  High Transferability (> 0.7): {'✅' if criteria['high_transferability'] else '❌'}")
    print(f"  Criteria Met: {criteria['criteria_met']}/{criteria['total_criteria']}")
    
    metrics = generalizability['quantitative_metrics']
    print(f"\n📈 Quantitative Metrics:")
    print(f"  Mean R²: {metrics['mean_r2']:.3f}")
    print(f"  Standard Deviation: {metrics['std_r2']:.3f}")
    print(f"  Mean Transferability: {metrics['mean_transferability']:.3f}")
    print(f"  Countries Validated: {metrics['n_countries_validated']}")
    
    if generalizability['limitations']:
        print(f"\n⚠️ Limitations Identified:")
        for limitation in generalizability['limitations']:
            print(f"  - {limitation}")
    
    if generalizability['recommendations']:
        print(f"\n💡 Recommendations:")
        for rec in generalizability['recommendations'][:5]:  # Show top 5
            print(f"  {rec}")
    
    # Create generalizability visualization
    fig, axes = plt.subplots(1, 3, figsize=(15, 5))
    fig.suptitle('Generalizability Assessment Dashboard', fontsize=14)
    
    # 1. Criteria fulfillment
    ax1 = axes[0]
    criteria_names = ['Good\nPerformance', 'Consistent\nPerformance', 'High\nTransferability']
    criteria_values = [criteria['good_performance'], criteria['consistent_performance'], criteria['high_transferability']]
    colors = ['green' if val else 'red' for val in criteria_values]
    
    bars = ax1.bar(criteria_names, [1 if val else 0 for val in criteria_values], color=colors, alpha=0.7)
    ax1.set_ylim(0, 1.2)
    ax1.set_ylabel('Criteria Met')
    ax1.set_title('Generalizability Criteria')
    
    # Add checkmarks/X marks
    for bar, met in zip(bars, criteria_values):
        symbol = '✓' if met else '✗'
        ax1.text(bar.get_x() + bar.get_width()/2., 0.6, symbol, 
                ha='center', va='center', fontsize=20, fontweight='bold', color='white')
    
    # 2. Performance distribution
    ax2 = axes[1]
    if validation_metrics:
        r2_scores = [vm['r2_score'] for vm in validation_metrics.values()]
        ax2.hist(r2_scores, bins=10, alpha=0.7, color='skyblue', edgecolor='black')
        ax2.axvline(x=0.6, color='red', linestyle='--', linewidth=2, label='Performance Threshold')
        ax2.axvline(x=np.mean(r2_scores), color='green', linestyle='-', linewidth=2, label='Mean Performance')
        ax2.set_xlabel('R² Score')
        ax2.set_ylabel('Frequency')
        ax2.set_title('Performance Distribution')
        ax2.legend()
    
    # 3. Transferability radar (a radar needs a polar axis)
    axes[2].remove()
    ax3 = fig.add_subplot(1, 3, 3, projection='polar')
    if validation_metrics:
        # Create summary radar for overall assessment
        assessment_metrics = ['Performance', 'Consistency', 'Transferability', 'Legal Compatibility']
        
        performance_score = metrics['mean_r2']
        consistency_score = 1.0 - min(metrics['std_r2'] / 0.15, 1.0)  # Invert and normalize
        transferability_score = metrics['mean_transferability']
        legal_score = np.mean([vm['legal_compatibility'] for vm in validation_metrics.values()])
        
        values = [performance_score, consistency_score, transferability_score, legal_score]
        
        angles = np.linspace(0, 2 * np.pi, len(assessment_metrics), endpoint=False).tolist()
        angles += angles[:1]
        values += values[:1]
        
        ax3.plot(angles, values, 'o-', linewidth=2, color='blue')
        ax3.fill(angles, values, alpha=0.2, color='blue')
        ax3.set_xticks(angles[:-1])
        ax3.set_xticklabels(assessment_metrics)
        ax3.set_ylim(0, 1)
        ax3.set_title('Overall Assessment Radar')
        ax3.grid(True)
    
    plt.tight_layout()
    plt.show()
    
else:
    print("⚠️ Generalizability assessment not available")

## 6. Comparative Analysis and Insights

Deep comparative analysis across legal systems and cultural contexts.
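
One comparison reported in this section is average performance per legal system. The sketch below groups per-country scores by legal system and averages them. The legal-system assignments follow the target country list at the top of this notebook; the label strings, the helper name `mean_by_group`, and the scores are assumptions for illustration.

```python
from collections import defaultdict
from typing import Dict

# Legal systems as listed for the target countries; label strings are assumed.
LEGAL_SYSTEMS = {"CL": "civil_law", "ZA": "mixed", "SE": "civil_law", "IN": "common_law"}

# Hypothetical R² scores, not results from the framework.
EXAMPLE_R2 = {"CL": 0.70, "ZA": 0.50, "SE": 0.60, "IN": 0.40}


def mean_by_group(scores: Dict[str, float], groups: Dict[str, str]) -> Dict[str, float]:
    """Average the scores of all countries that share a group label."""
    buckets = defaultdict(list)
    for code, score in scores.items():
        buckets[groups[code]].append(score)
    return {label: sum(values) / len(values) for label, values in buckets.items()}


by_system = mean_by_group(EXAMPLE_R2, LEGAL_SYSTEMS)
for label, avg in sorted(by_system.items()):
    print(f"{label.replace('_', ' ').title()}: {avg:.3f}")
```

With one country each for the mixed and common law systems, those group averages are single observations, so differences between legal systems here cannot be separated from differences between individual countries.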

In [None]:
# Extract comparative analysis from validation results
if 'comparative_analysis' in validation_results:
    comparative = validation_results['comparative_analysis']
    
    print("🔍 COMPARATIVE ANALYSIS")
    print("=" * 30)
    
    if 'performance_ranking' in comparative and comparative['performance_ranking']:
        print(f"\n🏆 Performance Ranking:")
        for i, country in enumerate(comparative['performance_ranking'], 1):
            country_name = validator.country_profiles[country].name
            legal_system = validator.country_profiles[country].legal_system.value
            r2_score = comparative['performance_comparison'][country]['r2_score']
            transferability = comparative['performance_comparison'][country]['transferability_score']
            
            print(f"{i}. {country} ({country_name})")
            print(f"   Legal System: {legal_system.replace('_', ' ').title()}")
            print(f"   R² Score: {r2_score:.3f}")
            print(f"   Transferability: {transferability:.3f}")
    
    if 'legal_system_performance' in comparative:
        print(f"\n⚖️ Performance by Legal System:")
        for legal_sys, avg_performance in comparative['legal_system_performance'].items():
            print(f"  {legal_sys.replace('_', ' ').title()}: {avg_performance:.3f} (avg R²)")
    
    if 'performance_range' in comparative:
        perf_range = comparative['performance_range']
        print(f"\n📊 Performance Statistics:")
        print(f"  Best Performance: {perf_range['max_r2']:.3f}")
        print(f"  Worst Performance: {perf_range['min_r2']:.3f}")
        print(f"  Performance Range: {perf_range['max_r2'] - perf_range['min_r2']:.3f}")

# Detailed country-specific insights
if validation_metrics:
    print(f"\n🌍 COUNTRY-SPECIFIC INSIGHTS")
    print("=" * 35)
    
    for country_code, metrics in validation_metrics.items():
        profile = validator.country_profiles[country_code]
        validation_result = successful_validations[country_code]
        
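        # Build the flag emoji from the two-letter ISO code (regional indicator symbols)
        flag = ''.join(chr(0x1F1E6 + ord(ch) - ord('A')) for ch in country_code)
        print(f"\n{flag} {country_code} - {profile.name}")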
        print(f"-" * 25)
        
        # Performance summary
        print(f"📊 Performance: R² = {metrics['r2_score']:.3f}, RMSE = {metrics['rmse']:.3f}")
        
        # Determine performance category
        if metrics['r2_score'] >= 0.7:
            performance_category = "Excellent"
        elif metrics['r2_score'] >= 0.6:
            performance_category = "Good"
        elif metrics['r2_score'] >= 0.4:
            performance_category = "Moderate"
        else:
            performance_category = "Poor"
        
        print(f"🎯 Category: {performance_category}")
        
        # Key characteristics
        print(f"⚖️ Legal System: {profile.legal_system.value.replace('_', ' ').title()}")
        print(f"💰 GDP per Capita: ${profile.gdp_per_capita:,}")
        print(f"🏛️ Governance Index: {profile.governance_index:.2f}")
        print(f"🎭 Cultural Distance: {metrics['cultural_distance']:.1f}")
        
        # Transferability insights
        transferability_result = validation_result['transferability_metrics']
        cultural_result = validation_result['cultural_adaptation']
        
        print(f"🔄 Transferability Score: {metrics['transferability_score']:.3f}")
        
        # Adaptation challenges
        if 'adaptation_challenges' in cultural_result and cultural_result['adaptation_challenges']:
            print(f"⚠️ Adaptation Challenges:")
            for challenge in cultural_result['adaptation_challenges'][:2]:  # Show top 2
                print(f"   - {challenge}")
        
        # Key insights based on performance
        if metrics['r2_score'] > 0.6:
            print(f"✅ Strong model transferability - ready for implementation")
        elif metrics['cultural_distance'] > 40:
            print(f"🎭 High cultural distance may require adaptation")
        elif profile.legal_system != LegalSystem.CIVIL_LAW:
            print(f"⚖️ Different legal system may need specialized adaptation")
        else:
            print(f"📈 Moderate performance - investigate specific factors")

# Summary recommendations
print(f"\n💡 IMPLEMENTATION RECOMMENDATIONS")
print("=" * 40)

if validation_metrics:
    best_country = max(validation_metrics.keys(), key=lambda k: validation_metrics[k]['r2_score'])
    worst_country = min(validation_metrics.keys(), key=lambda k: validation_metrics[k]['r2_score'])
    
    print(f"🥇 Priority Implementation: {best_country} ({validator.country_profiles[best_country].name})")
    print(f"   - Highest validation performance (R² = {validation_metrics[best_country]['r2_score']:.3f})")
    print(f"   - Strong transferability ({validation_metrics[best_country]['transferability_score']:.3f})")
    
    print(f"\n🔧 Requires Adaptation: {worst_country} ({validator.country_profiles[worst_country].name})")
    print(f"   - Lower validation performance (R² = {validation_metrics[worst_country]['r2_score']:.3f})")
    print(f"   - Cultural distance: {validation_metrics[worst_country]['cultural_distance']:.1f}")
    
    # General recommendations
    mean_performance = np.mean([m['r2_score'] for m in validation_metrics.values()])
    
    if mean_performance >= 0.6:
        print(f"\n✅ Framework shows strong cross-country generalizability")
        print(f"📈 Recommend phased international rollout")
    else:
        print(f"\n⚠️ Framework needs improvement for global deployment")
        print(f"🔬 Recommend additional cultural adaptation research")

print(f"\n🏁 External Validation Analysis Complete")
print(f"⏰ {datetime.now().strftime('%Y-%m-%d %H:%M:%S')}")

## 7. Export Results and Metadata

Save comprehensive validation results and metadata for reproducibility.
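
Export metadata often contains values that `json` cannot serialize directly, such as timestamps and paths. As a complement to recursive type conversion, the sketch below uses the `default` hook of `json.dumps`. The function name `json_default` and the payload are assumptions for illustration. The `.tolist()` branch relies on duck typing: NumPy arrays and scalars expose that method, but the example itself needs only the standard library.

```python
import json
from datetime import datetime
from pathlib import Path


def json_default(obj):
    """Fallback called by json.dumps for objects it cannot serialize natively."""
    if isinstance(obj, datetime):
        return obj.isoformat()
    if isinstance(obj, Path):
        return str(obj)
    if hasattr(obj, "tolist"):  # e.g. NumPy arrays and scalars
        return obj.tolist()
    raise TypeError(f"Object of type {type(obj).__name__} is not JSON serializable")


# Hypothetical metadata payload.
payload = {
    "analysis_timestamp": datetime(2024, 1, 1, 12, 0, 0),
    "results_dir": Path("results"),
    "target_countries": ("CL", "ZA", "SE", "IN"),  # tuples are written as JSON arrays
}

encoded = json.dumps(payload, default=json_default, indent=2)
decoded = json.loads(encoded)
print(decoded["analysis_timestamp"])  # → 2024-01-01T12:00:00
```

Raising `TypeError` for unknown types keeps unexpected objects from being silently written as strings, which makes a broken export visible at save time rather than at load time.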

In [None]:
# Prepare comprehensive results for export
export_results = {
    'analysis_metadata': {
        'notebook_version': '2.0.0',
        'analysis_timestamp': datetime.now().isoformat(),
        'config_seed': config.config['reproducibility']['random_seed'],
        'target_countries': target_countries,
        'successful_validations': len(successful_validations) if 'successful_validations' in locals() else 0
    },
    'country_profiles': {
        code: {
            'name': profile.name,
            'legal_system': profile.legal_system.value,
            'gdp_per_capita': profile.gdp_per_capita,
            'governance_index': profile.governance_index,
            'cultural_dimensions': profile.cultural_dimensions
        }
        for code, profile in validator.country_profiles.items()
    },
    'validation_results': validation_results if 'validation_results' in locals() else {},
    'performance_summary': {
        'validation_metrics': validation_metrics if 'validation_metrics' in locals() else {},
        'country_summaries': country_summaries if 'country_summaries' in locals() else {}
    }
}

# Convert numpy types to native Python types for JSON serialization
def convert_numpy_types(obj):
    if isinstance(obj, np.integer):
        return int(obj)
    elif isinstance(obj, np.floating):
        return float(obj)
    elif isinstance(obj, np.ndarray):
        return obj.tolist()
    elif isinstance(obj, dict):
        return {key: convert_numpy_types(value) for key, value in obj.items()}
    elif isinstance(obj, list):
        return [convert_numpy_types(item) for item in obj]
    else:
        return obj

export_results = convert_numpy_types(export_results)

# Save results
results_path = config.get_path('results_dir') / f'external_validation_analysis_{config.timestamp}.json'

with open(results_path, 'w', encoding='utf-8') as f:
    json.dump(export_results, f, indent=2, ensure_ascii=False)

print(f"💾 External validation analysis results saved: {results_path}")

# Generate summary report
# Build each section first: backslashes inside f-string expressions are a
# SyntaxError before Python 3.12.
countries_section = '\n'.join(
    f"- **{code}** ({validator.country_profiles[code].name}): "
    f"{validator.country_profiles[code].legal_system.value.replace('_', ' ').title()}"
    for code in target_countries
)

if 'validation_metrics' in locals() and validation_metrics:
    findings_section = '\n'.join(
        f"- **{code}**: R² = {m['r2_score']:.3f}, Transferability = {m['transferability_score']:.3f}"
        for code, m in validation_metrics.items()
    )
else:
    findings_section = 'No successful validations completed.'

assessment = validation_results.get('generalizability_assessment', {}) if 'validation_results' in locals() else {}
generalizability_level = assessment.get('generalizability_level', 'Not assessed').upper()
recommendations_section = '\n'.join(
    f"- {rec}" for rec in assessment.get('recommendations', [])[:3]
) or 'No recommendations available.'

summary_report = f"""
# External Validation Analysis Summary

**Analysis Date**: {datetime.now().strftime('%Y-%m-%d %H:%M:%S')}
**Framework**: Iusmorfos Cross-Country Validation
**Seed**: {config.config['reproducibility']['random_seed']}

## Countries Analyzed

{countries_section}

## Key Findings

{findings_section}

## Generalizability Assessment

{generalizability_level}

## Recommendations

{recommendations_section}

---
*Generated by Iusmorfos External Validation Framework*
"""

# Save summary report
summary_path = config.get_path('results_dir') / f'external_validation_summary_{config.timestamp}.md'
with open(summary_path, 'w', encoding='utf-8') as f:
    f.write(summary_report)

print(f"📄 Summary report saved: {summary_path}")

print(f"\n🌍 External Validation Analysis Complete!")
print(f"📁 Results Location: {config.get_path('results_dir')}")
print(f"🔬 Framework Status: {'Validated across multiple countries' if 'validation_metrics' in locals() and validation_metrics else 'Requires further validation'}")
print("=" * 60)