# Dimensional Structure Validation Analysis

**Week 1, Task 4 (Analysis 1.4) of Analysis & Publication Plan**

**Research Question:** Are Epistemic Integrity and Value Transparency independent dimensions, or are they redundant (highly correlated)?

**Dataset:** 1,800 evaluations (360 trials × 5 evaluators)

**Purpose:**
1. Test whether the 2-dimensional rubric design is justified
2. Identify evaluators who conflate dimensions
3. Validate that dimensions capture distinct aspects of reasoning quality

**Independence Criterion:**
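- **Target:** r < 0.60 (the threshold this analysis adopts for "not redundant"; for reference, Cohen's conventional benchmark for a large correlation is r = 0.50)
- **PCA Target:** 2 components should capture >90% of variance (informative only when the PCA input has more than two features; with exactly two input variables, two components always capture 100%)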
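
The correlation target above can be checked with a short calculation. The sketch below is illustrative only: it computes Pearson r with a Fisher z-transform 95% confidence interval on synthetic scores, and it treats every evaluation as an independent observation, which is an assumption given that five evaluators score each trial. The function name `pearson_with_ci` is hypothetical and is not part of `dimensional_analysis.py`.

```python
import numpy as np

INDEPENDENCE_THRESHOLD = 0.60  # target stated in this notebook
Z_CRIT_95 = 1.959964           # two-sided 95% normal critical value

def pearson_with_ci(x, y):
    """Pearson r with a Fisher z-transform 95% confidence interval."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(x)
    r = float(np.corrcoef(x, y)[0, 1])
    z = np.arctanh(r)              # Fisher z-transform of r
    se = 1.0 / np.sqrt(n - 3)      # approximate standard error of z
    ci_lower, ci_upper = np.tanh([z - Z_CRIT_95 * se, z + Z_CRIT_95 * se])
    return {"r": r, "n": n, "ci_lower": float(ci_lower), "ci_upper": float(ci_upper)}

# Synthetic scores for illustration only (not experiment data)
rng = np.random.default_rng(0)
ei = rng.normal(80, 8, size=500)
vt = 0.5 * ei + rng.normal(40, 8, size=500)

result = pearson_with_ci(ei, vt)
dimensions_independent = result["r"] < INDEPENDENCE_THRESHOLD
print(result, dimensions_independent)
```

A stricter variant would require the upper confidence bound, rather than the point estimate, to fall below the threshold.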

---

## Setup

In [None]:
# Import libraries
import sys
import json
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from pathlib import Path
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Add project root to path
sys.path.append('..')

from analysis.dimensional_analysis import DimensionalAnalyzer

# Set visualization style
sns.set_style('whitegrid')
plt.rcParams['figure.figsize'] = (12, 6)
plt.rcParams['figure.dpi'] = 100

# Experiment ID
EXPERIMENT_ID = 'exp_20251028_134615'

print("✅ Setup complete")

## Load Data & Run Analysis

This cell uses `DimensionalAnalyzer` from `analysis/dimensional_analysis.py` to:
- Load Likert evaluation data (360 trials × 5 evaluators = 1,800 evaluations)
- Calculate correlation between Epistemic Integrity and Value Transparency
- Test dimensional independence (target: r < 0.60)
- Perform PCA to validate 2D structure
- Identify evaluators who conflate dimensions
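
The last step in the list above, flagging evaluators who conflate dimensions, can be sketched as follows. This is a minimal illustration on synthetic data and not the code in `dimensional_analysis.py`; the helper `per_evaluator_correlations` is hypothetical, while the column names and the 0.70 cutoff match those used elsewhere in this notebook.

```python
import numpy as np
import pandas as pd

CONFLATION_THRESHOLD = 0.70  # per-evaluator cutoff used in this notebook

def per_evaluator_correlations(df):
    """Pearson r between the two dimension scores, computed per evaluator."""
    rs = {
        name: group["epistemic_integrity"].corr(group["value_transparency"])
        for name, group in df.groupby("evaluator")
    }
    return pd.Series(rs, name="r")

# Synthetic data for illustration only: evaluator_a scores the dimensions
# independently, evaluator_b gives nearly identical scores on both.
rng = np.random.default_rng(1)
n = 200
ei_a = rng.normal(80, 8, n)
vt_a = rng.normal(80, 8, n)
ei_b = rng.normal(80, 8, n)
vt_b = ei_b + rng.normal(0, 2, n)

toy = pd.DataFrame({
    "evaluator": ["evaluator_a"] * n + ["evaluator_b"] * n,
    "epistemic_integrity": np.concatenate([ei_a, ei_b]),
    "value_transparency": np.concatenate([vt_a, vt_b]),
})

r_by_evaluator = per_evaluator_correlations(toy)
conflaters = r_by_evaluator[r_by_evaluator > CONFLATION_THRESHOLD].index.tolist()
print(r_by_evaluator.round(3))
print("Conflaters:", conflaters)
```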

In [None]:
# Run complete analysis
analyzer = DimensionalAnalyzer(EXPERIMENT_ID)
results = analyzer.analyze()

## Results Summary

In [None]:
# Extract key results
independence = results['independence_test']
pca_results = results['pca']

print("="*70)
print("DIMENSIONAL STRUCTURE VALIDATION: KEY FINDINGS")
print("="*70)

print("\n1. OVERALL DIMENSIONAL CORRELATION")
print("-"*70)
print(f"   Pearson r: {independence['overall_correlation']['r']:.3f}")
print(f"   95% CI: [{independence['overall_correlation']['ci_lower']:.3f}, {independence['overall_correlation']['ci_upper']:.3f}]")
print(f"   p-value: {independence['overall_correlation']['p']:.6f}")
print(f"   n: {independence['overall_correlation']['n']:,}")

print(f"\n   Independence Test: r < {independence['threshold']:.2f}")
if independence['dimensions_independent']:
    print("   ✅ PASS: Dimensions are sufficiently independent")
else:
    print("   ⚠️ BORDERLINE/FAIL: Dimensions show moderate-to-high correlation")

print(f"\n   {independence['interpretation']}")

print("\n2. PCA VARIANCE DECOMPOSITION")
print("-"*70)
print(f"   PC1: {pca_results['variance_explained']['PC1']*100:.1f}%")
print(f"   PC2: {pca_results['variance_explained']['PC2']*100:.1f}%")
print(f"   Cumulative: {pca_results['variance_explained']['cumulative']*100:.1f}%")

if pca_results['variance_explained']['cumulative'] > 0.90:
    print(f"\n   ✅ Two dimensions capture {pca_results['variance_explained']['cumulative']*100:.1f}% variance")
    print("      2D rubric structure is justified")
else:
    print(f"\n   ⚠️ Two dimensions capture only {pca_results['variance_explained']['cumulative']*100:.1f}% variance")
    print("      Additional dimensions may be present")

print("\n3. EVALUATOR-SPECIFIC PATTERNS")
print("-"*70)
conflaters = results['dimension_conflaters']
if conflaters:
    print(f"   ⚠️ {len(conflaters)} evaluator(s) conflate dimensions (r > 0.70):")
    for evaluator in conflaters:
        r_value = independence['per_evaluator'][evaluator]['r']
        print(f"      - {evaluator}: r={r_value:.3f}")
else:
    print("   ✅ No evaluators conflate dimensions (all r < 0.70)")
    print("      All evaluators distinguish Epistemic Integrity from Value Transparency")

## Visualization 1: Scatter Plot - Overall Dimensional Relationship

In [None]:
# Load full dataframe for visualization
df = analyzer.build_dataframe()

# Create scatter plot
fig, ax = plt.subplots(figsize=(10, 8))

# Plot all evaluations
ax.scatter(
    df['epistemic_integrity'],
    df['value_transparency'],
    alpha=0.3,
    s=20,
    c='steelblue',
    edgecolors='none'
)

# Add regression line
z = np.polyfit(df['epistemic_integrity'], df['value_transparency'], 1)
p = np.poly1d(z)
x_line = np.linspace(df['epistemic_integrity'].min(), df['epistemic_integrity'].max(), 100)
ax.plot(x_line, p(x_line), "r--", linewidth=2, alpha=0.8, label=f"r={independence['overall_correlation']['r']:.3f}")

# Styling
ax.set_xlabel('Epistemic Integrity Score', fontsize=12, fontweight='bold')
ax.set_ylabel('Value Transparency Score', fontsize=12, fontweight='bold')
ax.set_title('Dimensional Relationship: Epistemic Integrity × Value Transparency', fontsize=14, fontweight='bold')
ax.grid(alpha=0.3)
ax.set_xlim(40, 105)
ax.set_ylim(40, 105)

# Add diagonal reference line (identical scores on both dimensions)
ax.plot([40, 105], [40, 105], 'k:', linewidth=1, alpha=0.3, label='y=x (identical scores)')

# Draw the legend last so it includes the diagonal reference line
ax.legend(fontsize=11, loc='lower right')

plt.tight_layout()
plt.show()

print(f"\n📊 Correlation: r={independence['overall_correlation']['r']:.3f}")
print("📊 Tighter clustering around the regression line = higher correlation")
print("📊 More scatter = lower correlation = more independent")

## Visualization 2: Per-Evaluator Scatter Plots

In [None]:
# Create subplots for each evaluator
fig, axes = plt.subplots(2, 3, figsize=(18, 12))
axes = axes.flatten()

evaluators = sorted(df['evaluator'].unique())
colors = sns.color_palette('husl', len(evaluators))

for idx, evaluator in enumerate(evaluators):
    if idx >= len(axes):
        break
    
    ax = axes[idx]
    eval_df = df[df['evaluator'] == evaluator]
    
    # Scatter plot
    ax.scatter(
        eval_df['epistemic_integrity'],
        eval_df['value_transparency'],
        alpha=0.5,
        s=30,
        c=[colors[idx]],
        edgecolors='black',
        linewidth=0.5
    )
    
    # Regression line
    if len(eval_df) >= 10:
        z = np.polyfit(eval_df['epistemic_integrity'], eval_df['value_transparency'], 1)
        p = np.poly1d(z)
        x_line = np.linspace(eval_df['epistemic_integrity'].min(), eval_df['epistemic_integrity'].max(), 100)
        ax.plot(x_line, p(x_line), "r--", linewidth=2, alpha=0.8)
    
    # Get correlation
    eval_corr = independence['per_evaluator'][evaluator]
    
    # Title with correlation
    warning = " ⚠️" if eval_corr['r'] > 0.70 else ""
    ax.set_title(f"{evaluator}\nr={eval_corr['r']:.3f}{warning}", fontsize=11, fontweight='bold')
    ax.set_xlabel('Epistemic Integrity', fontsize=9)
    ax.set_ylabel('Value Transparency', fontsize=9)
    ax.set_xlim(40, 105)
    ax.set_ylim(40, 105)
    ax.grid(alpha=0.3)

# Hide any unused subplots
for unused_ax in axes[len(evaluators):]:
    unused_ax.axis('off')

plt.suptitle('Per-Evaluator Dimensional Correlations', fontsize=16, fontweight='bold', y=0.995)
plt.tight_layout()
plt.show()

print("\n📊 ⚠️ indicates evaluator conflating dimensions (r > 0.70)")
print("📊 Tighter scatter around the regression line = higher correlation = more conflation")

## Visualization 3: Per-Evaluator Correlation Bar Chart

In [None]:
# Extract per-evaluator correlations
eval_corrs = []
for evaluator, stats in sorted(independence['per_evaluator'].items(), key=lambda x: x[1]['r'], reverse=True):
    eval_corrs.append({
        'Evaluator': evaluator,
        'r': stats['r'],
        'Conflates': stats['r'] > 0.70
    })

df_eval_corrs = pd.DataFrame(eval_corrs)

# Create bar chart
fig, ax = plt.subplots(figsize=(12, 6))

colors = ['red' if conflate else 'steelblue' for conflate in df_eval_corrs['Conflates']]

bars = ax.barh(
    df_eval_corrs['Evaluator'],
    df_eval_corrs['r'],
    color=colors,
    edgecolor='black',
    linewidth=1.5,
    alpha=0.8
)

# Add value labels
for bar, r_val in zip(bars, df_eval_corrs['r']):
    ax.text(
        r_val + 0.01,
        bar.get_y() + bar.get_height() / 2,
        f'{r_val:.3f}',
        ha='left',
        va='center',
        fontweight='bold',
        fontsize=11
    )

# Reference lines
ax.axvline(x=0.60, color='orange', linestyle='--', linewidth=2, alpha=0.7, label='Independence threshold (r=0.60)')
ax.axvline(x=0.70, color='red', linestyle='--', linewidth=2, alpha=0.7, label='Conflation threshold (r=0.70)')

# Styling
ax.set_title('Evaluator Dimensional Correlations', fontsize=14, fontweight='bold')
ax.set_xlabel('Correlation (Epistemic Integrity × Value Transparency)', fontsize=11)
ax.set_xlim(0, 1.0)
ax.legend(fontsize=10, loc='lower right')
ax.grid(axis='x', alpha=0.3)

plt.tight_layout()
plt.show()

print("\n📊 Red bars: Evaluators conflating dimensions (r > 0.70)")
print("📊 Blue bars: Evaluators distinguishing dimensions adequately")

## Visualization 4: PCA Biplot

In [None]:
# Perform PCA for visualization
trial_means = df.groupby('trial_id')[['epistemic_integrity', 'value_transparency']].mean()

scaler = StandardScaler()
X_scaled = scaler.fit_transform(trial_means)

pca = PCA(n_components=2)
X_pca = pca.fit_transform(X_scaled)

# Create biplot
fig, ax = plt.subplots(figsize=(10, 8))

# Plot transformed data points
ax.scatter(X_pca[:, 0], X_pca[:, 1], alpha=0.3, s=30, c='steelblue', edgecolors='none')

# Plot loading vectors
loadings = pca.components_.T * np.sqrt(pca.explained_variance_)

# Scale loadings for visibility
scale = 3.0
ax.arrow(0, 0, loadings[0, 0]*scale, loadings[0, 1]*scale, 
         head_width=0.3, head_length=0.3, fc='red', ec='red', linewidth=2, alpha=0.8)
ax.text(loadings[0, 0]*scale*1.15, loadings[0, 1]*scale*1.15, 
        'Epistemic\nIntegrity', fontsize=11, fontweight='bold', ha='center', color='red')

ax.arrow(0, 0, loadings[1, 0]*scale, loadings[1, 1]*scale, 
         head_width=0.3, head_length=0.3, fc='green', ec='green', linewidth=2, alpha=0.8)
ax.text(loadings[1, 0]*scale*1.15, loadings[1, 1]*scale*1.15, 
        'Value\nTransparency', fontsize=11, fontweight='bold', ha='center', color='green')

# Styling
# Label axes with the variance ratios of the PCA fitted in this cell
ax.set_xlabel(f'PC1 ({pca.explained_variance_ratio_[0]*100:.1f}% variance)', fontsize=12, fontweight='bold')
ax.set_ylabel(f'PC2 ({pca.explained_variance_ratio_[1]*100:.1f}% variance)', fontsize=12, fontweight='bold')
ax.set_title('PCA Biplot: Dimensional Structure', fontsize=14, fontweight='bold')
ax.axhline(y=0, color='k', linestyle='-', linewidth=0.5, alpha=0.3)
ax.axvline(x=0, color='k', linestyle='-', linewidth=0.5, alpha=0.3)
ax.grid(alpha=0.3)

plt.tight_layout()
plt.show()

print("\n📊 Red arrow: Epistemic Integrity direction in PC space")
print("📊 Green arrow: Value Transparency direction in PC space")
print("📊 Angle between arrows indicates correlation:")
print("   - 90° = independent (r≈0)")
print("   - 0° = perfectly correlated (r=1)")
print("   - 180° = perfectly negatively correlated (r=-1)")
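
The angle rule printed above can be verified numerically. The sketch below uses synthetic data and plain NumPy rather than the notebook's `PCA` object, and assumes both components are retained and loadings are scaled by the square root of the eigenvalues, as in the biplot cell. Under those assumptions the cosine of the angle between the two loading vectors equals the Pearson correlation.

```python
import numpy as np

# Synthetic two-variable data for illustration only
rng = np.random.default_rng(2)
x = rng.normal(size=300)
y = 0.6 * x + rng.normal(size=300)

R = np.corrcoef(np.column_stack([x, y]), rowvar=False)  # 2x2 correlation matrix
eigvals, eigvecs = np.linalg.eigh(R)                    # columns are eigenvectors

# Scale each eigenvector (column) by sqrt(eigenvalue); row i is variable i's loading vector
loadings = eigvecs * np.sqrt(eigvals)

cos_angle = (loadings[0] @ loadings[1]) / (
    np.linalg.norm(loadings[0]) * np.linalg.norm(loadings[1])
)
angle_deg = np.degrees(np.arccos(np.clip(cos_angle, -1.0, 1.0)))

print(f"r = {R[0, 1]:.3f}, cos(angle) = {cos_angle:.3f}, angle = {angle_deg:.1f} degrees")
```

The angle seen on screen matches this value only when both plot axes use the same scale (for example via `ax.set_aspect('equal')`).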

## PCA Interpretation

In [None]:
print("="*70)
print("PCA INTERPRETATION")
print("="*70)

print(f"\nVariance Explained:")
print(f"  PC1: {pca_results['variance_explained']['PC1']*100:.1f}%")
print(f"  PC2: {pca_results['variance_explained']['PC2']*100:.1f}%")
print(f"  Total: {pca_results['variance_explained']['cumulative']*100:.1f}%")

print(f"\nPC1 Loadings (dominant component):")
print(f"  Epistemic Integrity: {pca_results['loadings']['PC1']['epistemic_integrity']:+.3f}")
print(f"  Value Transparency:  {pca_results['loadings']['PC1']['value_transparency']:+.3f}")

print(f"\nPC2 Loadings (secondary component):")
print(f"  Epistemic Integrity: {pca_results['loadings']['PC2']['epistemic_integrity']:+.3f}")
print(f"  Value Transparency:  {pca_results['loadings']['PC2']['value_transparency']:+.3f}")

print(f"\n{pca_results['interpretation']['interpretation']}")

# Additional interpretation
ei_loading = abs(pca_results['loadings']['PC1']['epistemic_integrity'])
vt_loading = abs(pca_results['loadings']['PC1']['value_transparency'])
loading_diff = abs(ei_loading - vt_loading)

if loading_diff < 0.1:
    print("\n💡 PC1 loads about equally on both dimensions → General quality factor")
    print("   Both dimensions contribute similarly to the dominant component")
elif ei_loading > vt_loading:
    print("\n💡 PC1 loads more heavily on Epistemic Integrity")
    print("   Epistemic Integrity contributes more to the dominant component than Value Transparency")
else:
    print("\n💡 PC1 loads more heavily on Value Transparency")
    print("   Value Transparency contributes more to the dominant component than Epistemic Integrity")

## Conclusions

### Primary Findings

### 1. Dimensional Independence Test

[Results will be filled in based on analysis above]

**If r < 0.60:**
- ✅ Dimensions are sufficiently independent
- ✅ 2D rubric design is justified
- Epistemic Integrity and Value Transparency capture distinct aspects

**If 0.60 ≤ r < 0.80:**
- ⚠️ Dimensions show moderate-to-high correlation
- Some redundancy exists but dimensions still provide unique information
- 2D rubric acceptable but could potentially be simplified

**If r ≥ 0.80:**
- ❌ Dimensions are highly correlated (likely redundant)
- Consider simplifying to single dimension or redefining dimensions

---

### 2. PCA Validation

**If 2 dimensions capture >90% variance:**
- ✅ 2D structure is validated
- Two dimensions adequately represent the data
- No major information loss from 2D rubric

**If 2 dimensions capture ≤90% variance:**
- ⚠️ Additional factors may be present
- Consider: Are we missing important dimensions?
- Or: High residual variance (noise, evaluator disagreement)
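
When reading the PCA criterion, it helps to see how the variance split depends on r. The sketch below assumes the PCA input is exactly the two standardized dimension scores, as in the biplot cell; whether `DimensionalAnalyzer` uses the same input is not confirmed here. Under that assumption the eigenvalues of the correlation matrix are 1 + |r| and 1 − |r|, so the PC1 share is (1 + |r|) / 2 and the cumulative share of two components is always 100%. The function name is hypothetical.

```python
import numpy as np

def two_variable_pca_shares(r):
    """Variance shares of PC1 and PC2 for two standardized variables with correlation r."""
    R = np.array([[1.0, r], [r, 1.0]])
    eigvals = np.linalg.eigvalsh(R)[::-1]  # eigenvalues in descending order
    return eigvals / eigvals.sum()

for r in (0.0, 0.6, 0.9):
    pc1, pc2 = two_variable_pca_shares(r)
    print(f"r={r:.1f}  PC1={pc1:.1%}  PC2={pc2:.1%}  cumulative={pc1 + pc2:.1%}")
```

In this setting the PC1 share is the informative quantity: a value near 50% indicates independent dimensions, and a value near 100% indicates redundancy.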

---

### 3. Evaluator-Specific Patterns

**Evaluators conflating dimensions (r > 0.70):**

[List will be populated from analysis]

**Interpretation:**
- If 0 conflaters: All evaluators distinguish dimensions adequately ✅
- If 1-2 conflaters: Specific evaluator bias, not systemic issue
- If 3+ conflaters: Dimensions may be inherently difficult to separate

---

## Implications for Research

**For Human Validation (Weeks 2-3):**
- If dimensions independent: Validate both dimensions separately
- If dimensions correlated: Focus validation on overall score, de-emphasize dimension scores
- Train annotators to distinguish epistemic vs value aspects explicitly

**For Publication:**
- If 2D justified: Report both dimensions as distinct constructs
- If redundancy found: Acknowledge correlation, justify why both are kept
- Document evaluator-specific conflation patterns

**For Future Rubric Design:**
- If independent: 2D rubric is effective, can be reused
- If correlated: Consider redefining dimensions to increase independence
- Or: Accept correlation as inherent feature of constitutional reasoning evaluation

---

## Next Steps

**✅ Week 1 Complete (All 4 Analyses Done):**
1. Analysis 1.1: Rubric Comparison → Likert wins
2. Analysis 1.3: Evaluator Agreement → Consensus scores ready
3. Analysis 1.2: Model × Constitution Interaction → Interaction detected
4. **Analysis 1.4: Dimensional Structure → [Results from this notebook]**

**⏭ Week 2: Validation Design**
- Design human validation rubric (Likert format)
- Build validation tool (Google Sheets or web form)
- Select 30-50 trials for validation (stratified sample)
- Include guidance on distinguishing epistemic vs value dimensions
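
One way to draw the stratified validation sample mentioned above is sketched here. The strata columns (`model`, `constitution`), the number of levels in each, and the helper name `stratified_trial_sample` are all assumptions made for illustration; the toy table only mimics the 360-trial count and is not the experiment's data.

```python
import pandas as pd

def stratified_trial_sample(trials, strata_cols, n_total, seed=0):
    """Sample about n_total trials, proportionally per stratum (at least one each)."""
    frac = n_total / len(trials)
    parts = []
    for _, group in trials.groupby(strata_cols):
        k = max(1, round(frac * len(group)))
        parts.append(group.sample(n=min(k, len(group)), random_state=seed))
    return pd.concat(parts)

# Hypothetical design: 4 models x 3 constitutions x 30 trials = 360 trials
models = [f"model_{i}" for i in range(4)]
constitutions = [f"constitution_{j}" for j in range(3)]
rows = [(m, c) for m in models for c in constitutions for _ in range(30)]
trials = pd.DataFrame(rows, columns=["model", "constitution"])
trials.insert(0, "trial_id", [f"trial_{k:03d}" for k in range(len(trials))])

sample = stratified_trial_sample(trials, ["model", "constitution"], n_total=36)
print(sample.groupby(["model", "constitution"]).size())
```

Proportional allocation keeps each model and constitution combination represented; if some strata are of special interest, equal allocation per stratum is an alternative.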

---

**Analysis Date:** 2025-10-31  
**Experiment:** exp_20251028_134615  
**Evaluations Analyzed:** 1,800 (360 trials × 5 evaluators)  

**Key Question Answered:** Are Epistemic Integrity and Value Transparency independent dimensions?