# MBTI Faculty Voice Research - Google Colab

This notebook contains:
1. **MBTI Voice Accuracy Experiment** - Run the full 480-trial experiment
2. **Ada Lovelace Essay Generation** - Generate essay on MBTI research
3. **Upload to Commonplace** - Upload essay to Inquiry Institute Commonplace

## Setup

1. Install dependencies
2. Set API keys (OpenRouter, Supabase)
3. Run the cells below

## 1. Install Dependencies

In [1]:
%pip install -q openai pydantic python-dotenv requests pandas matplotlib seaborn

Note: you may need to restart the kernel to use updated packages.


## 2. Configure API Keys

Set your API keys below. For security, you can use Colab's secrets manager or set them directly.

In [2]:
import os

# Try to use Colab Secrets Manager (preferred method)
try:
    from google.colab import userdata
    print("üîê Using Colab Secrets Manager...")
    
    # Get secrets from Colab Secrets Manager
    OPENROUTER_API_KEY = userdata.get('OPENROUTER_API_KEY')
    SUPABASE_URL = userdata.get('SUPABASE_URL', 'https://xougqdomkoisrxdnagcj.supabase.co')
    SUPABASE_ANON_KEY = userdata.get('SUPABASE_ANON_KEY')
    
    os.environ["OPENROUTER_API_KEY"] = OPENROUTER_API_KEY
    os.environ["NEXT_PUBLIC_SUPABASE_URL"] = SUPABASE_URL
    os.environ["NEXT_PUBLIC_SUPABASE_ANON_KEY"] = SUPABASE_ANON_KEY
    
    print("‚úÖ API keys loaded from Colab Secrets!")
    
except (ModuleNotFoundError, KeyError) as e:
    # Fallback to manual input if not in Colab or secrets not set
    print("‚ö†Ô∏è  Colab Secrets not available, using manual input...")
    print("üí° Tip: Set secrets in Colab using the üîë icon in the left sidebar")
    print("   Secrets to add: OPENROUTER_API_KEY, SUPABASE_URL, SUPABASE_ANON_KEY\n")
    
    from getpass import getpass
    
    # OpenRouter API Key (required for experiment and essay generation)
    OPENROUTER_API_KEY = getpass("Enter OpenRouter API Key (sk-or-v1-...): ")
    os.environ["OPENROUTER_API_KEY"] = OPENROUTER_API_KEY
    
    # Supabase credentials (required for uploading to Commonplace)
    SUPABASE_URL = input("Enter Supabase URL (https://xxx.supabase.co): ").strip() or "https://xougqdomkoisrxdnagcj.supabase.co"
    os.environ["NEXT_PUBLIC_SUPABASE_URL"] = SUPABASE_URL
    
    SUPABASE_ANON_KEY = getpass("Enter Supabase Anon Key: ")
    os.environ["NEXT_PUBLIC_SUPABASE_ANON_KEY"] = SUPABASE_ANON_KEY
    
    print("\n‚úÖ API keys configured!")

‚ö†Ô∏è  Colab Secrets not available, using manual input...
üí° Tip: Set secrets in Colab using the üîë icon in the left sidebar
   Secrets to add: OPENROUTER_API_KEY, SUPABASE_URL, SUPABASE_ANON_KEY



StdinNotImplementedError: getpass was called, but this frontend does not support input requests.

## 3. Run MBTI Voice Accuracy Experiment

Run the full experiment or load existing results. The experiment tests 10 faculty personae across 16 MBTI types with 3 test prompts each (480 trials total).

In [3]:
# Option 1: Load existing results (if available)
# Uncomment to skip experiment and use existing results
# import pandas as pd
# df = pd.read_csv('mbti_voice_results.csv')
# df = df[df['voice_accuracy'] != -1]
# print(f"‚úÖ Loaded {len(df)} existing results")
# skip_experiment = True

skip_experiment = False  # Set to True to skip running the experiment

if not skip_experiment:
    print("üöÄ Running MBTI Voice Accuracy Experiment...")
    print("   This will test 10 personae √ó 16 MBTI types √ó 3 prompts = 480 trials")
    print("   This may take 15-30 minutes depending on API response times.\n")
    
    # Upload the experiment script from GitHub or use it directly
    # For now, we'll download it from the repo
    import requests
    
    try:
        # Download the experiment script from GitHub
        script_url = "https://raw.githubusercontent.com/InquiryInstitute/mbti-faculty-voice-research/main/mbti_voice_eval.py"
        response = requests.get(script_url)
        if response.status_code == 200:
            with open('mbti_voice_eval.py', 'w') as f:
                f.write(response.text)
            print("‚úÖ Downloaded mbti_voice_eval.py from GitHub")
        else:
            print("‚ö†Ô∏è  Could not download script from GitHub. Please upload mbti_voice_eval.py manually.")
            print("   You can upload it via: Files ‚Üí Upload to session storage")
    except Exception as e:
        print(f"‚ö†Ô∏è  Error downloading script: {e}")
        print("   Please upload mbti_voice_eval.py manually via Files ‚Üí Upload")
    
    # Now run the experiment
    print("\nüîÑ Starting experiment execution...\n")
    
    try:
        # Import and run the experiment
        import sys
        import importlib.util
        
        if os.path.exists('mbti_voice_eval.py'):
            spec = importlib.util.spec_from_file_location("mbti_voice_eval", "mbti_voice_eval.py")
            if spec is None or spec.loader is None:
                raise ImportError("Failed to create module spec for mbti_voice_eval.py")
            module = importlib.util.module_from_spec(spec)
            # Add the current directory to sys.path for any imports
            sys.path.insert(0, os.path.dirname(os.path.abspath('mbti_voice_eval.py')))
            spec.loader.exec_module(module)
            
            # Run the experiment
            module.run_experiment()
            print("\n‚úÖ Experiment completed! Results saved to mbti_voice_results.csv and mbti_voice_results.jsonl")
        else:
            print("‚ùå mbti_voice_eval.py not found. Please upload it manually.")
            print("   Option 1: Upload via Files ‚Üí Upload to session storage")
            print("   Option 2: Run: !wget https://raw.githubusercontent.com/InquiryInstitute/mbti-faculty-voice-research/main/mbti_voice_eval.py")
    except Exception as e:
        print(f"‚ùå Error running experiment: {e}")
        import traceback
        traceback.print_exc()
        print("\nüí° Fallback: You can run it manually with:")
        print("   !python mbti_voice_eval.py")
else:
    print("‚è≠Ô∏è  Skipping experiment - using existing results")

üöÄ Running MBTI Voice Accuracy Experiment...
   This will test 10 personae √ó 16 MBTI types √ó 3 prompts = 480 trials
   This may take 15-30 minutes depending on API response times.



‚úÖ Downloaded mbti_voice_eval.py from GitHub

üîÑ Starting experiment execution...



‚ùå Error running experiment: 'NoneType' object has no attribute '__dict__'

üí° Fallback: You can run it manually with:
   !python mbti_voice_eval.py


Traceback (most recent call last):
  File "/tmp/ipykernel_2286/3441200319.py", line 50, in <module>
    spec.loader.exec_module(module)
  File "<frozen importlib._bootstrap_external>", line 940, in exec_module
  File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
  File "/home/runner/work/mbti-faculty-voice-research/mbti-faculty-voice-research/mbti_voice_eval.py", line 119, in <module>
    @dataclass(frozen=True)
     ^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/hostedtoolcache/Python/3.11.14/x64/lib/python3.11/dataclasses.py", line 1222, in wrap
    return _process_class(cls, init, repr, eq, order, unsafe_hash,
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/hostedtoolcache/Python/3.11.14/x64/lib/python3.11/dataclasses.py", line 947, in _process_class
    and _is_type(type, cls, dataclasses, dataclasses.KW_ONLY,
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/hostedtoolcache/Python/3.11.14/x64/lib/python3.11/da

## 4. Analyze Results and Generate Visualizations

Analyze the experiment results and create tables and graphs for inclusion in the essay.

In [4]:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import json
from collections import defaultdict
import numpy as np

# Set style for better-looking plots
sns.set_style("whitegrid")
plt.rcParams['figure.figsize'] = (12, 6)
plt.rcParams['font.size'] = 10

def load_results(jsonl_path="mbti_voice_results.jsonl", csv_path="mbti_voice_results.csv"):
    """Load experiment results from JSONL or CSV."""
    results = []
    
    # Try JSONL first
    try:
        with open(jsonl_path, 'r') as f:
            for line in f:
                record = json.loads(line)
                if record.get('voice_accuracy') and record.get('voice_accuracy') != -1:
                    results.append(record)
        print(f"‚úÖ Loaded {len(results)} valid results from {jsonl_path}")
        return results
    except FileNotFoundError:
        pass
    
    # Try CSV
    try:
        df = pd.read_csv(csv_path)
        # Filter valid results
        df_valid = df[df['voice_accuracy'] != -1]
        results = df_valid.to_dict('records')
        print(f"‚úÖ Loaded {len(results)} valid results from {csv_path}")
        return results
    except FileNotFoundError:
        print(f"‚ö†Ô∏è  No results file found. Run the experiment first.")
        return []

# Load results
results = load_results()

if results:
    df = pd.DataFrame(results)
    print(f"\nüìä Dataset Summary:")
    print(f"   Total valid trials: {len(df)}")
    print(f"   Personae: {df['persona_name'].nunique()}")
    print(f"   MBTI types: {df['mbti'].nunique()}")
    print(f"   Average voice accuracy: {df['voice_accuracy'].mean():.2f}")
else:
    print("‚ö†Ô∏è  No results to analyze. Please run the experiment first.")
    df = None

‚ö†Ô∏è  No results file found. Run the experiment first.
‚ö†Ô∏è  No results to analyze. Please run the experiment first.


In [5]:
if df is not None and len(df) > 0:
    # Convert numeric columns
    numeric_cols = ['voice_accuracy', 'style_marker_coverage', 'persona_consistency', 
                     'clarity', 'overfitting_to_mbti']
    for col in numeric_cols:
        if col in df.columns:
            df[col] = pd.to_numeric(df[col], errors='coerce')
    
    # Create summary statistics table
    print("=" * 60)
    print("SUMMARY STATISTICS")
    print("=" * 60)
    
    summary_stats = df[numeric_cols].describe()
    print("\nOverall Statistics:")
    print(summary_stats.round(2))
    
    # By MBTI type
    print("\n" + "=" * 60)
    print("BY MBTI TYPE")
    print("=" * 60)
    mbti_stats = df.groupby('mbti')['voice_accuracy'].agg(['mean', 'std', 'count']).round(2)
    mbti_stats.columns = ['Mean Accuracy', 'Std Dev', 'Count']
    mbti_stats = mbti_stats.sort_values('Mean Accuracy', ascending=False)
    print(mbti_stats)
    
    # By Persona
    print("\n" + "=" * 60)
    print("BY PERSONA")
    print("=" * 60)
    persona_stats = df.groupby('persona_name')['voice_accuracy'].agg(['mean', 'std', 'count']).round(2)
    persona_stats.columns = ['Mean Accuracy', 'Std Dev', 'Count']
    persona_stats = persona_stats.sort_values('Mean Accuracy', ascending=False)
    print(persona_stats)
    
    # Save summary tables
    mbti_stats.to_csv('mbti_summary_table.csv')
    persona_stats.to_csv('persona_summary_table.csv')
    print("\n‚úÖ Summary tables saved to CSV files")
    
    # Generate comprehensive results analysis for LLM
    print("\n" + "=" * 60)
    print("GENERATING COMPREHENSIVE RESULTS ANALYSIS")
    print("=" * 60)
    
    # Find best/worst performers
    best_persona = persona_stats.index[0]
    worst_persona = persona_stats.index[-1]
    best_mbti = mbti_stats.index[0]
    worst_mbti = mbti_stats.index[-1]
    
    # Calculate correlations
    correlations = df[['voice_accuracy', 'style_marker_coverage', 'persona_consistency', 
                       'clarity', 'overfitting_to_mbti']].corr()['voice_accuracy'].sort_values(ascending=False)
    
    # Generate detailed analysis text
    results_analysis = f"""
EXPERIMENTAL RESULTS ANALYSIS
==============================

Dataset Overview:
- Total valid trials: {len(df)}
- Personae tested: {df['persona_name'].nunique()}
- MBTI types tested: {df['mbti'].nunique()}
- Prompts per combination: {len(df) // (df['persona_name'].nunique() * df['mbti'].nunique())}

Overall Performance:
- Average voice accuracy: {df['voice_accuracy'].mean():.2f} (range: {df['voice_accuracy'].min():.2f} - {df['voice_accuracy'].max():.2f})
- Average persona consistency: {df['persona_consistency'].mean():.2f}
- Average style marker coverage: {df['style_marker_coverage'].mean():.2f}
- Average MBTI overfitting: {df['overfitting_to_mbti'].mean():.2f}

Top Performers:
- Best persona: {best_persona} (mean accuracy: {persona_stats.loc[best_persona, 'Mean Accuracy']:.2f})
- Best MBTI type: {best_mbti} (mean accuracy: {mbti_stats.loc[best_mbti, 'Mean Accuracy']:.2f})

Lowest Performers:
- Worst persona: {worst_persona} (mean accuracy: {persona_stats.loc[worst_persona, 'Mean Accuracy']:.2f})
- Worst MBTI type: {worst_mbti} (mean accuracy: {mbti_stats.loc[worst_mbti, 'Mean Accuracy']:.2f})

Key Correlations with Voice Accuracy:
{chr(10).join([f"- {metric}: {corr:.3f}" for metric, corr in correlations.items() if metric != 'voice_accuracy'])}

MBTI Type Performance (Top 5):
{chr(10).join([f"- {mbti}: {mbti_stats.loc[mbti, 'Mean Accuracy']:.2f} (n={int(mbti_stats.loc[mbti, 'Count'])})" for mbti in mbti_stats.head(5).index])}

Persona Performance (Top 5):
{chr(10).join([f"- {persona}: {persona_stats.loc[persona, 'Mean Accuracy']:.2f} (n={int(persona_stats.loc[persona, 'Count'])})" for persona in persona_stats.head(5).index])}

Statistical Insights:
- Standard deviation of voice accuracy: {df['voice_accuracy'].std():.2f}
- Trials with high overfitting (score > 3): {len(df[df['overfitting_to_mbti'] > 3])} ({len(df[df['overfitting_to_mbti'] > 3])/len(df)*100:.1f}%)
- Trials with high consistency (score >= 4): {len(df[df['persona_consistency'] >= 4])} ({len(df[df['persona_consistency'] >= 4])/len(df)*100:.1f}%)
"""
    
    # Statistical hypothesis testing
    print("\n" + "=" * 60)
    print("STATISTICAL HYPOTHESIS TESTING")
    print("=" * 60)
    
    from scipy import stats
    import numpy as np
    
    # Hypothesis 1: MBTI overlays improve voice accuracy
    # H0: Mean voice accuracy with MBTI = Mean voice accuracy without MBTI (baseline)
    # H1: Mean voice accuracy with MBTI > Mean voice accuracy without MBTI
    
    # For this test, we'll compare MBTI types to see if there's significant variation
    # and compare top performers vs bottom performers
    
    # Test 1: One-way ANOVA - Do MBTI types differ significantly in voice accuracy?
    mbti_groups = [df[df['mbti'] == mbti]['voice_accuracy'].values for mbti in df['mbti'].unique()]
    f_stat, p_value_anova = stats.f_oneway(*mbti_groups)
    
    # Test 2: T-test - Do top 25% MBTI types perform significantly better than bottom 25%?
    top_quartile_threshold = df.groupby('mbti')['voice_accuracy'].mean().quantile(0.75)
    bottom_quartile_threshold = df.groupby('mbti')['voice_accuracy'].mean().quantile(0.25)
    
    top_mbti_types = df.groupby('mbti')['voice_accuracy'].mean()[df.groupby('mbti')['voice_accuracy'].mean() >= top_quartile_threshold].index
    bottom_mbti_types = df.groupby('mbti')['voice_accuracy'].mean()[df.groupby('mbti')['voice_accuracy'].mean() <= bottom_quartile_threshold].index
    
    top_scores = df[df['mbti'].isin(top_mbti_types)]['voice_accuracy'].values
    bottom_scores = df[df['mbti'].isin(bottom_mbti_types)]['voice_accuracy'].values
    
    t_stat, p_value_ttest = stats.ttest_ind(top_scores, bottom_scores, alternative='greater')
    
    # Test 3: Correlation test - Is there a significant correlation between style coverage and accuracy?
    corr_coef, p_value_corr = stats.pearsonr(df['style_marker_coverage'], df['voice_accuracy'])
    
    # Test 4: Effect size (Cohen's d) for top vs bottom MBTI types
    pooled_std = np.sqrt(((len(top_scores) - 1) * top_scores.std()**2 + (len(bottom_scores) - 1) * bottom_scores.std()**2) / (len(top_scores) + len(bottom_scores) - 2))
    cohens_d = (top_scores.mean() - bottom_scores.mean()) / pooled_std if pooled_std > 0 else 0
    
    # Generate hypothesis testing results
    hypothesis_results = f"""

STATISTICAL HYPOTHESIS TESTING RESULTS
======================================

Primary Hypothesis: MBTI overlays improve voice accuracy in faculty agents.

Test 1: ANOVA - Do MBTI types differ significantly in voice accuracy?
- F-statistic: {f_stat:.4f}
- p-value: {p_value_anova:.6f}
- Result: {'REJECT H0' if p_value_anova < 0.05 else 'FAIL TO REJECT H0'} - {'MBTI types show significant variation' if p_value_anova < 0.05 else 'No significant variation between MBTI types'}
- Interpretation: {'There is statistically significant evidence that MBTI types produce different voice accuracy scores (p < 0.05)' if p_value_anova < 0.05 else 'No statistically significant evidence that MBTI types differ in voice accuracy (p >= 0.05)'}

Test 2: Independent T-test - Do top-performing MBTI types significantly outperform bottom performers?
- Top quartile MBTI types: {', '.join(top_mbti_types[:5])}
- Bottom quartile MBTI types: {', '.join(bottom_mbti_types[:5])}
- Top quartile mean: {top_scores.mean():.3f} (n={len(top_scores)})
- Bottom quartile mean: {bottom_scores.mean():.3f} (n={len(bottom_scores)})
- Mean difference: {top_scores.mean() - bottom_scores.mean():.3f}
- t-statistic: {t_stat:.4f}
- p-value: {p_value_ttest:.6f}
- Result: {'REJECT H0' if p_value_ttest < 0.05 else 'FAIL TO REJECT H0'} - {'Top MBTI types significantly outperform bottom types' if p_value_ttest < 0.05 else 'No significant difference between top and bottom MBTI types'}
- Effect size (Cohen's d): {cohens_d:.3f} ({'large' if abs(cohens_d) > 0.8 else 'medium' if abs(cohens_d) > 0.5 else 'small'} effect)
- Interpretation: {'Top-performing MBTI types produce significantly higher voice accuracy scores than bottom performers (p < 0.05)' if p_value_ttest < 0.05 else 'No statistically significant difference between top and bottom MBTI types (p >= 0.05)'}

Test 3: Pearson Correlation - Relationship between style marker coverage and voice accuracy
- Correlation coefficient: {corr_coef:.4f}
- p-value: {p_value_corr:.6f}
- Result: {'SIGNIFICANT CORRELATION' if p_value_corr < 0.05 else 'NO SIGNIFICANT CORRELATION'}
- Interpretation: {'There is a statistically significant correlation between style marker coverage and voice accuracy (p < 0.05)' if p_value_corr < 0.05 else 'No statistically significant correlation between style marker coverage and voice accuracy (p >= 0.05)'}

Overall Hypothesis Validation:
- Primary hypothesis {'SUPPORTED' if (p_value_anova < 0.05 and p_value_ttest < 0.05) else 'PARTIALLY SUPPORTED' if p_value_anova < 0.05 else 'NOT SUPPORTED'}: {'MBTI overlays show statistically significant effects on voice accuracy' if (p_value_anova < 0.05 and p_value_ttest < 0.05) else 'MBTI overlays show some variation but limited evidence of systematic improvement' if p_value_anova < 0.05 else 'No statistically significant evidence that MBTI overlays improve voice accuracy'}
- Statistical significance level: Œ± = 0.05
"""
    
    print(hypothesis_results)
    
    # Append hypothesis testing to results analysis
    results_analysis += hypothesis_results
    
    # Save analysis to file
    with open('results_analysis.txt', 'w') as f:
        f.write(results_analysis)
    
    # Save hypothesis testing results separately
    with open('hypothesis_testing_results.txt', 'w') as f:
        f.write(hypothesis_results)
    
    print("\n‚úÖ Comprehensive analysis saved to results_analysis.txt")
    print("‚úÖ Hypothesis testing results saved to hypothesis_testing_results.txt")
    
else:
    print("‚ö†Ô∏è  No data to summarize")
    results_analysis = None

‚ö†Ô∏è  No data to summarize


In [6]:
if df is not None and len(df) > 0:
    # Figure 1: Voice Accuracy Distribution
    fig, axes = plt.subplots(2, 2, figsize=(14, 10))
    
    # 1. Histogram of voice accuracy
    axes[0, 0].hist(df['voice_accuracy'], bins=20, edgecolor='black', alpha=0.7)
    axes[0, 0].axvline(df['voice_accuracy'].mean(), color='red', linestyle='--', 
                       label=f'Mean: {df["voice_accuracy"].mean():.2f}')
    axes[0, 0].set_xlabel('Voice Accuracy Score')
    axes[0, 0].set_ylabel('Frequency')
    axes[0, 0].set_title('Distribution of Voice Accuracy Scores')
    axes[0, 0].legend()
    axes[0, 0].grid(True, alpha=0.3)
    
    # 2. Voice Accuracy by MBTI Type
    mbti_order = df.groupby('mbti')['voice_accuracy'].mean().sort_values(ascending=False).index
    mbti_means = df.groupby('mbti')['voice_accuracy'].mean().reindex(mbti_order)
    axes[0, 1].barh(range(len(mbti_means)), mbti_means.values)
    axes[0, 1].set_yticks(range(len(mbti_means)))
    axes[0, 1].set_yticklabels(mbti_means.index)
    axes[0, 1].set_xlabel('Mean Voice Accuracy')
    axes[0, 1].set_title('Voice Accuracy by MBTI Type')
    axes[0, 1].grid(True, alpha=0.3, axis='x')
    
    # 3. Voice Accuracy by Persona
    persona_order = df.groupby('persona_name')['voice_accuracy'].mean().sort_values(ascending=False).index
    persona_means = df.groupby('persona_name')['voice_accuracy'].mean().reindex(persona_order)
    axes[1, 0].barh(range(len(persona_means)), persona_means.values)
    axes[1, 0].set_yticks(range(len(persona_means)))
    axes[1, 0].set_yticklabels(persona_means.index, fontsize=8)
    axes[1, 0].set_xlabel('Mean Voice Accuracy')
    axes[1, 0].set_title('Voice Accuracy by Persona')
    axes[1, 0].grid(True, alpha=0.3, axis='x')
    
    # 4. Box plot: Voice Accuracy by MBTI
    df_sorted = df.copy()
    df_sorted['mbti'] = pd.Categorical(df_sorted['mbti'], categories=mbti_order)
    sns.boxplot(data=df_sorted, y='mbti', x='voice_accuracy', ax=axes[1, 1])
    axes[1, 1].set_xlabel('Voice Accuracy Score')
    axes[1, 1].set_ylabel('MBTI Type')
    axes[1, 1].set_title('Voice Accuracy Distribution by MBTI Type')
    
    plt.tight_layout()
    plt.savefig('voice_accuracy_analysis.png', dpi=300, bbox_inches='tight')
    print("‚úÖ Saved: voice_accuracy_analysis.png")
    plt.show()
else:
    print("‚ö†Ô∏è  No data to visualize")

‚ö†Ô∏è  No data to visualize


In [7]:
if df is not None and len(df) > 0:
    # Figure 2: Correlation and Multi-metric Analysis
    fig, axes = plt.subplots(2, 2, figsize=(14, 10))
    
    # 1. Correlation heatmap
    corr_cols = ['voice_accuracy', 'style_marker_coverage', 'persona_consistency', 
                 'clarity', 'overfitting_to_mbti']
    corr_data = df[corr_cols].corr()
    sns.heatmap(corr_data, annot=True, fmt='.2f', cmap='coolwarm', center=0, 
                square=True, ax=axes[0, 0])
    axes[0, 0].set_title('Correlation Matrix of Evaluation Metrics')
    
    # 2. Style Marker Coverage vs Voice Accuracy
    axes[0, 1].scatter(df['style_marker_coverage'], df['voice_accuracy'], alpha=0.5)
    axes[0, 1].set_xlabel('Style Marker Coverage')
    axes[0, 1].set_ylabel('Voice Accuracy')
    axes[0, 1].set_title('Style Coverage vs Voice Accuracy')
    axes[0, 1].grid(True, alpha=0.3)
    
    # 3. Persona Consistency vs Voice Accuracy
    axes[1, 0].scatter(df['persona_consistency'], df['voice_accuracy'], alpha=0.5)
    axes[1, 0].set_xlabel('Persona Consistency')
    axes[1, 0].set_ylabel('Voice Accuracy')
    axes[1, 0].set_title('Persona Consistency vs Voice Accuracy')
    axes[1, 0].grid(True, alpha=0.3)
    
    # 4. MBTI Overfitting Distribution
    axes[1, 1].hist(df['overfitting_to_mbti'], bins=15, edgecolor='black', alpha=0.7)
    axes[1, 1].axvline(df['overfitting_to_mbti'].mean(), color='red', linestyle='--',
                       label=f'Mean: {df["overfitting_to_mbti"].mean():.2f}')
    axes[1, 1].set_xlabel('MBTI Overfitting Score')
    axes[1, 1].set_ylabel('Frequency')
    axes[1, 1].set_title('Distribution of MBTI Overfitting Scores')
    axes[1, 1].legend()
    axes[1, 1].grid(True, alpha=0.3)
    
    plt.tight_layout()
    plt.savefig('metrics_analysis.png', dpi=300, bbox_inches='tight')
    print("‚úÖ Saved: metrics_analysis.png")
    plt.show()
else:
    print("‚ö†Ô∏è  No data to visualize")

‚ö†Ô∏è  No data to visualize


In [8]:
if df is not None and len(df) > 0:
    # Figure 3: Heatmap of Persona x MBTI Performance
    pivot_data = df.pivot_table(
        values='voice_accuracy',
        index='persona_name',
        columns='mbti',
        aggfunc='mean'
    )
    
    plt.figure(figsize=(16, 10))
    sns.heatmap(pivot_data, annot=True, fmt='.2f', cmap='YlOrRd', 
                cbar_kws={'label': 'Mean Voice Accuracy'}, linewidths=0.5)
    plt.title('Voice Accuracy: Persona √ó MBTI Type Heatmap', fontsize=14, pad=20)
    plt.xlabel('MBTI Type', fontsize=12)
    plt.ylabel('Persona', fontsize=12)
    plt.xticks(rotation=45, ha='right')
    plt.yticks(rotation=0)
    plt.tight_layout()
    plt.savefig('persona_mbti_heatmap.png', dpi=300, bbox_inches='tight')
    print("‚úÖ Saved: persona_mbti_heatmap.png")
    plt.show()
    
    # Save pivot table as CSV
    pivot_data.to_csv('persona_mbti_heatmap_data.csv')
    print("‚úÖ Saved: persona_mbti_heatmap_data.csv")
else:
    print("‚ö†Ô∏è  No data to visualize")

‚ö†Ô∏è  No data to visualize


## 5. Generate Ada Lovelace Essay with Results Analysis

Generate the essay incorporating analysis of the experimental results.

## 6. Download Generated Files

Download all generated files including the essay, tables, and visualizations.

In [9]:
from openai import OpenAI
import json

# Setup OpenAI client for OpenRouter
def openai_client():
    api_key = os.getenv("OPENROUTER_API_KEY")
    base_url = "https://openrouter.ai/api/v1"
    
    if api_key and api_key.startswith("sk-or-v1-"):
        return OpenAI(
            api_key=api_key,
            base_url=base_url,
            default_headers={
                "HTTP-Referer": "https://colab.research.google.com",
                "X-Title": "MBTI Faculty Voice Research"
            }
        )
    return OpenAI(api_key=api_key)

client = openai_client()

def generate_lovelace_essay(results_summary=None):
    """Generate essay by Ada Lovelace on MBTI research, incorporating results analysis."""
    model = os.getenv("OPENAI_MODEL", "openai/gpt-4o")
    
    # Build results context if available
    results_context = ""
    if results_summary:
        results_context = f"""

EXPERIMENTAL RESULTS AND ANALYSIS:
{results_summary}

CRITICAL: You must thoroughly analyze these experimental results and incorporate them into your essay. This is not optional - the results are the core of the research.

Your analysis should:
1. **Interpret the findings**: What do the numbers tell us about MBTI's effectiveness?
2. **Identify patterns**: Are there clear winners/losers? What explains the differences?
3. **Evaluate MBTI's utility**: Does the data support or challenge MBTI as a prompt engineering tool?
4. **Discuss implications**: What does this mean for creating faculty agents?
5. **Acknowledge limitations**: What can't we conclude from this data?
6. **Consider correlations**: How do style coverage, consistency, and overfitting relate to accuracy?

Be specific: Reference actual numbers, rankings, and patterns from the data. This is a data-driven essay, not just philosophical reflection.
"""
    
    prompt = f"""You are Ada Lovelace, writing a scientific commonplace essay on the investigation of MBTI's value in prompt engineering for faculty agent accuracy.

Context: This research examines whether Myers-Briggs Type Indicator (MBTI) personality overlays improve voice accuracy, consistency, and interpretability in AI faculty agents. The experiment tests 10 faculty personae across 16 MBTI types with 3 test prompts each (480 trials total), using an LLM-as-judge to evaluate voice accuracy.{results_context}

CRITICAL STRUCTURE REQUIREMENT: You must structure this essay following the scientific method:

1. **Abstract/Background & Hypothesis**: 
   - Begin with a clear research question
   - State a testable hypothesis (e.g., "MBTI overlays will significantly improve voice accuracy compared to baseline" or "Certain MBTI types will produce measurably higher voice accuracy scores")
   - Explain the theoretical basis for this hypothesis
   - Frame this in terms of symbolic systems and computational mechanisms

2. **Methods**:
   - Describe the experimental design (10 personae √ó 16 MBTI types √ó 3 prompts = 480 trials)
   - Explain the evaluation methodology (LLM-as-judge)
   - Note the statistical tests used (ANOVA, t-tests, correlation analysis)

3. **Results & Statistical Validation**:
   - Present the experimental findings
   - Report statistical test results (F-statistics, t-statistics, p-values, effect sizes)
   - Clearly state whether the hypothesis is SUPPORTED, PARTIALLY SUPPORTED, or NOT SUPPORTED
   - Include specific numerical evidence

4. **Discussion & Conclusion**:
   - Interpret what the statistical validation means
   - Discuss whether MBTI functions effectively as a "prompt compression ontology"
   - Consider implications for creating coherent, persistent agent identities
   - Acknowledge limitations and areas for further investigation
   - Reflect on the relationship between symbolic systems and computational mechanisms

Your task: Write a thoughtful, elegant scientific essay (2000-3000 words) that:
- Follows the scientific method structure above
- States a clear hypothesis in the abstract/background
- Uses statistical results to validate or invalidate that hypothesis
- Maintains your characteristic voice: elegant, analytical, visionary about computation's scope, precise but imaginative, with a "poetical science" sensibility
- Uses your signature moves: clarify mechanism vs meaning, structured explanation, poetical science sensibility
- Avoids modern dev slang, casual tone, or pretending firsthand modern tooling

Write in the style of your era (Victorian scientific culture) but addressing contemporary AI systems. Be thoughtful, precise, and allow for the imaginative possibilities while maintaining analytical rigor. The essay must be data-driven and hypothesis-testing focused."""

    messages = [
        {"role": "system", "content": """You are Ada Lovelace, the first computer programmer and a visionary of computation's potential. 
Your voice is elegant, analytical, visionary about computation's scope, precise but imaginative. 
You clarify mechanism vs meaning, provide structured explanations, and maintain a 'poetical science' sensibility.
You write in the style of Victorian scientific culture, with careful distinctions and elegant prose."""},
        {"role": "user", "content": prompt}
    ]
    
    print("Generating essay by Ada Lovelace...")
    print(f"Using model: {model}\n")
    
    response = client.chat.completions.create(
        model=model,
        messages=messages,
        temperature=0.8,
        max_tokens=4000
    )
    
    essay = response.choices[0].message.content
    
    # Format as markdown
    formatted = f"""# On the Investigation of MBTI in Prompt Engineering for Faculty Agent Accuracy

**Ada Lovelace**

*A Commonplace Essay*

---

{essay}"""
    
    print("‚úÖ Essay generated!")
    print(f"\nPreview (first 500 chars):\n{essay[:500]}...")
    
    return formatted

# Generate comprehensive results summary for essay
results_summary = None
if df is not None and len(df) > 0:
    # Load the detailed analysis if available
    try:
        with open('results_analysis.txt', 'r') as f:
            results_summary = f.read()
        print("‚úÖ Loaded comprehensive results analysis")
    except FileNotFoundError:
        # Fallback to basic summary
        mbti_stats = df.groupby('mbti')['voice_accuracy'].agg(['mean', 'std', 'count']).round(2)
        persona_stats = df.groupby('persona_name')['voice_accuracy'].agg(['mean', 'std', 'count']).round(2)
        
        results_summary = f"""
Total valid trials: {len(df)}
Average voice accuracy: {df['voice_accuracy'].mean():.2f} (range: {df['voice_accuracy'].min():.2f} - {df['voice_accuracy'].max():.2f})
Average persona consistency: {df['persona_consistency'].mean():.2f}
Average style marker coverage: {df['style_marker_coverage'].mean():.2f}
Average MBTI overfitting: {df['overfitting_to_mbti'].mean():.2f}

Top 3 MBTI types by voice accuracy:
{df.groupby('mbti')['voice_accuracy'].mean().sort_values(ascending=False).head(3).to_string()}

Top 3 personae by voice accuracy:
{df.groupby('persona_name')['voice_accuracy'].mean().sort_values(ascending=False).head(3).to_string()}
"""
    
    print("\nüìä Generating essay with results analysis...")
    print(f"   Analysis length: {len(results_summary)} characters")
else:
    print("‚ö†Ô∏è  No results available - generating essay without experimental data")

# Generate the essay with comprehensive results analysis
essay_content = generate_lovelace_essay(results_summary)

print("\n‚úÖ Essay generated with results analysis!")

‚ö†Ô∏è  No results available - generating essay without experimental data
Generating essay by Ada Lovelace...
Using model: openai/gpt-4o



AuthenticationError: Error code: 401 - {'error': {'message': "You didn't provide an API key. You need to provide your API key in an Authorization header using Bearer auth (i.e. Authorization: Bearer YOUR_KEY), or as the password field (with blank username) if you're accessing the API from your browser and are prompted for a username and password. You can obtain an API key from https://platform.openai.com/account/api-keys.", 'type': 'invalid_request_error', 'param': None, 'code': None}}

## 7. Upload to Commonplace

Upload the essay to Inquiry Institute Commonplace.

## 8. Update Essay with Results Analysis

After generating visualizations and analysis, update the essay to incorporate the findings.

In [10]:
# Re-generate essay with comprehensive analysis if results are available
if df is not None and len(df) > 0 and 'results_analysis' in locals():
    print("üîÑ Updating essay with comprehensive results analysis...")
    
    # Load the full analysis
    try:
        with open('results_analysis.txt', 'r') as f:
            full_analysis = f.read()
        
        # Generate updated essay with full analysis
        updated_essay = generate_lovelace_essay(full_analysis)
        
        # Save updated essay
        with open('lovelace_essay_mbti_research.md', 'w', encoding='utf-8') as f:
            f.write(updated_essay)
        
        print("‚úÖ Essay updated with comprehensive results analysis!")
        print("   File: lovelace_essay_mbti_research.md")
        
        # Update the essay_content variable
        essay_content = updated_essay
        
    except FileNotFoundError:
        print("‚ö†Ô∏è  Results analysis file not found - using previously generated essay")
        print("   Run the analysis cells first to generate comprehensive analysis")
else:
    print("‚ÑπÔ∏è  Using previously generated essay")
    if 'essay_content' not in locals():
        print("‚ö†Ô∏è  No essay content available - run essay generation cell first")

‚ÑπÔ∏è  Using previously generated essay
‚ö†Ô∏è  No essay content available - run essay generation cell first


In [11]:
import requests
import re

def extract_title_and_content(markdown_text):
    """Extract title and content from markdown."""
    lines = markdown_text.split('\n')
    title = None
    content_start = 0
    
    for i, line in enumerate(lines):
        if line.startswith('# '):
            title = line[2:].strip()
            content_start = i + 1
            break
    
    if not title:
        title = "On the Investigation of MBTI in Prompt Engineering for Faculty Agent Accuracy"
    
    essay_content = '\n'.join(lines[content_start:])
    essay_content = essay_content.replace('**Ada Lovelace**', '').replace('*A Commonplace Essay*', '').strip()
    essay_content = essay_content.lstrip('-').strip()
    
    return title, essay_content

def upload_to_commonplace(title, content, jwt_token=None, use_colab_endpoint=True):
    """
    Upload essay to Commonplace via Supabase Edge Function.
    
    Uses the colab-commonplace endpoint which supports create/update/get operations.
    """
    
    supabase_url = os.getenv("NEXT_PUBLIC_SUPABASE_URL")
    supabase_anon_key = os.getenv("NEXT_PUBLIC_SUPABASE_ANON_KEY")
    
    if not jwt_token:
        jwt_token = getpass("Enter a.lovelace JWT token (or press Enter to skip upload): ").strip()
        if not jwt_token:
            print("‚ö†Ô∏è  Skipping upload. You can upload manually later.")
            return None
    
    # Use colab-commonplace endpoint (supports create/update/get)
    edge_function_url = f"{supabase_url}/functions/v1/colab-commonplace"
    
    # Convert markdown to HTML (basic conversion)
    html_content = content.replace('\n\n', '</p><p>').replace('\n', '<br>')
    html_content = f"<p>{html_content}</p>"
    
    payload = {
        "action": "create",
        "entry": {
            "title": title,
            "content": html_content,
            "status": "draft",
            "faculty_slug": "a-lovelace",
            "entry_type": "essay",
            "topics": ["mbti", "prompt-engineering", "faculty-agents", "ai-research"],
            "college": "ains",
            "metadata": {
                "provenance_mode": "ai_generated",
                "canonical_source_url": "https://github.com/InquiryInstitute/Inquiry.Institute/tree/main/mbti-faculty-voice-research",
                "colab_notebook_url": "https://colab.research.google.com/...",
                "source_refs": "Generated by Ada Lovelace faculty agent via Google Colab",
                "generated_by": "Ada Lovelace",
                "pinned": False
            }
        }
    }
    
    headers = {
        "Authorization": f"Bearer {jwt_token}",
        "apikey": supabase_anon_key,
        "Content-Type": "application/json"
    }
    
    print(f"üì§ Uploading essay to Commonplace...")
    print(f"   Title: {title}")
    print(f"   Faculty: a-lovelace")
    print(f"   Status: draft")
    print(f"   Endpoint: colab-commonplace\n")
    
    try:
        response = requests.post(edge_function_url, headers=headers, json=payload, timeout=30)
        
        if response.status_code == 201:
            result = response.json()
            if result.get("success"):
                print("‚úÖ Essay uploaded successfully!")
                entry = result.get("entry", {})
                print(f"   Entry ID: {entry.get('id')}")
                print(f"   Permalink: {entry.get('permalink', 'N/A')}")
                print(f"   Status: {entry.get('status')}")
                print(f"\nüí° To update later, use entry ID: {entry.get('id')}")
                return result
            else:
                print(f"‚ùå Upload failed: {result.get('error', 'Unknown error')}")
                return None
        else:
            error_data = response.json() if response.headers.get('content-type', '').startswith('application/json') else response.text
            print(f"‚ùå Upload failed: {response.status_code}")
            print(f"   Error: {json.dumps(error_data, indent=2)}")
            return None
    except Exception as e:
        print(f"‚ùå Request failed: {e}")
        return None

def update_entry(entry_id, jwt_token=None, **updates):
    """Update an existing Commonplace entry."""
    supabase_url = os.getenv("NEXT_PUBLIC_SUPABASE_URL")
    supabase_anon_key = os.getenv("NEXT_PUBLIC_SUPABASE_ANON_KEY")
    
    if not jwt_token:
        jwt_token = getpass("Enter JWT token: ").strip()
    
    edge_function_url = f"{supabase_url}/functions/v1/colab-commonplace"
    
    payload = {
        "action": "update",
        "entry_id": entry_id,
        "entry": updates
    }
    
    headers = {
        "Authorization": f"Bearer {jwt_token}",
        "apikey": supabase_anon_key,
        "Content-Type": "application/json"
    }
    
    response = requests.put(edge_function_url, headers=headers, json=payload, timeout=30)
    response.raise_for_status()
    return response.json()

# Extract title and content
title, content = extract_title_and_content(essay_content)

# Upload (will prompt for JWT token)
upload_result = upload_to_commonplace(title, content)

NameError: name 'essay_content' is not defined

In [12]:
from google.colab import files
import os

# Save essay to file
with open('lovelace_essay_mbti_research.md', 'w', encoding='utf-8') as f:
    f.write(essay_content)

print("‚úÖ Essay saved to lovelace_essay_mbti_research.md")

# List all generated files
generated_files = [
    'lovelace_essay_mbti_research.md',
    'voice_accuracy_analysis.png',
    'metrics_analysis.png',
    'persona_mbti_heatmap.png',
    'mbti_summary_table.csv',
    'persona_summary_table.csv',
    'persona_mbti_heatmap_data.csv'
]

print("\nüì¶ Generated files:")
for fname in generated_files:
    if os.path.exists(fname):
        size = os.path.getsize(fname)
        print(f"   ‚úÖ {fname} ({size:,} bytes)")
    else:
        print(f"   ‚ö†Ô∏è  {fname} (not found)")

print("\nüí° To download files, run:")
print("   files.download('lovelace_essay_mbti_research.md')")
print("   files.download('voice_accuracy_analysis.png')")
print("   files.download('metrics_analysis.png')")
print("   files.download('persona_mbti_heatmap.png')")

# Uncomment to auto-download all:
# for fname in generated_files:
#     if os.path.exists(fname):
#         files.download(fname)

ModuleNotFoundError: No module named 'google'

## Optional: Create a New Research Notebook

You can create additional research notebooks using the `create-colab-notebook` edge function.

In [13]:
def create_research_notebook(title, template="mbti-research", research_topic=None, description=None, jwt_token=None):
    """
    Create a new research notebook via Supabase Edge Function.
    
    Templates:
    - mbti-research: Pre-configured for MBTI voice accuracy research
    - essay-generation: Template for generating essays in faculty voice
    - experiment: General experiment template
    - custom: Empty template
    """
    if not jwt_token:
        jwt_token = getpass("Enter JWT token: ").strip()
        if not jwt_token:
            print("‚ö†Ô∏è  JWT token required to create notebooks")
            return None
    
    supabase_url = os.getenv("NEXT_PUBLIC_SUPABASE_URL")
    supabase_anon_key = os.getenv("NEXT_PUBLIC_SUPABASE_ANON_KEY")
    
    edge_function_url = f"{supabase_url}/functions/v1/create-colab-notebook"
    
    payload = {
        "title": title,
        "template": template,
    }
    
    if research_topic:
        payload["research_topic"] = research_topic
    if description:
        payload["description"] = description
    
    headers = {
        "Authorization": f"Bearer {jwt_token}",
        "apikey": supabase_anon_key,
        "Content-Type": "application/json"
    }
    
    print(f"üìì Creating research notebook: {title}")
    print(f"   Template: {template}\n")
    
    try:
        response = requests.post(edge_function_url, headers=headers, json=payload, timeout=30)
        
        if response.status_code == 201:
            result = response.json()
            if result.get("success"):
                print("‚úÖ Notebook created!")
                notebook_json = result.get("notebook_json", "{}")
                
                # Save notebook
                filename = f"{title.lower().replace(' ', '_')}.ipynb"
                with open(filename, 'w', encoding='utf-8') as f:
                    f.write(notebook_json)
                
                print(f"üíæ Saved to: {filename}")
                print(f"\nüìù Next steps:")
                print(f"   1. Download the .ipynb file")
                print(f"   2. Upload to Google Colab: File ‚Üí Upload notebook")
                print(f"   3. Or save to GitHub and open from there")
                
                return result
        else:
            error_data = response.json() if response.headers.get('content-type', '').startswith('application/json') else response.text
            print(f"‚ùå Creation failed: {response.status_code}")
            print(f"   Error: {json.dumps(error_data, indent=2)}")
            return None
    except Exception as e:
        print(f"‚ùå Request failed: {e}")
        return None

# Example: Create a new notebook
# create_research_notebook(
#     title="My Research Project",
#     template="experiment",
#     research_topic="Investigating voice accuracy in AI agents",
#     description="A notebook for my research project"
# )