# PsychohistoryML: FDR Correction (Notebook 10)

**Objective**: Apply False Discovery Rate (FDR) correction to all statistical tests across the project.

## Why FDR Correction?

**The Multiple Comparisons Problem:**
- Project contains 50+ statistical tests across NB05, NB06, NB09
- Without correction, if the tests were independent and every null hypothesis were true: P(≥1 false positive) = 1 - (0.95)^50 ≈ 92%
- Many "significant" findings may be false positives
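
The 92% figure can be reproduced with a short calculation. The sketch below assumes independent tests with every null hypothesis true, which is a simplification for this project's correlated models; the helper name `familywise_error_rate` is illustrative and not part of the project code.

```python
def familywise_error_rate(n_tests: int, alpha: float = 0.05) -> float:
    """P(at least one false positive) across n independent tests when all nulls are true."""
    return 1.0 - (1.0 - alpha) ** n_tests


# Show how quickly the family-wise error rate grows with the number of tests
for m in (1, 10, 50):
    print(f"{m:>3} tests: FWER = {familywise_error_rate(m):.1%}")
```

With correlated tests the true rate is generally lower than this bound suggests, so the number is best read as a rough motivation rather than an exact estimate.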

**Benjamini-Hochberg FDR:**
- Controls the expected proportion of false discoveries among the rejected hypotheses
- Less conservative than Bonferroni, which for 50 tests would require p < 0.05/50 = 0.001 on every test
- Appropriate for exploratory analysis
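
To make the procedure concrete, here is a minimal sketch of the Benjamini-Hochberg step-up adjustment written directly in NumPy. It is for illustration only: the notebook itself relies on `multipletests(..., method='fdr_bh')`, and the function name `benjamini_hochberg` and the toy p-values are assumptions, not project data.

```python
import numpy as np


def benjamini_hochberg(p_values, alpha=0.05):
    """Return (reject, p_adjusted) using the BH step-up procedure."""
    p = np.asarray(p_values, dtype=float)
    m = p.size
    order = np.argsort(p)                         # indices that sort p ascending
    scaled = p[order] * m / np.arange(1, m + 1)   # p_(i) * m / i for rank i
    # Enforce monotonicity: adjusted p at rank i is the minimum over ranks >= i
    adjusted_sorted = np.minimum.accumulate(scaled[::-1])[::-1]
    adjusted_sorted = np.clip(adjusted_sorted, 0.0, 1.0)
    p_adjusted = np.empty(m)
    p_adjusted[order] = adjusted_sorted           # restore the original order
    return p_adjusted <= alpha, p_adjusted


# Toy p-values, not taken from the project
reject, p_adj = benjamini_hochberg([0.001, 0.008, 0.039, 0.041, 0.60])
print(reject)
print(p_adj)
```

In this toy example two raw p-values below 0.05 (0.039 and 0.041) no longer pass after adjustment, which is the same kind of outcome the notebook later reports as "casualties".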

## Audit Finding

A three-agent audit identified:
- ~50 tests without correction
- ≈92% chance of at least one false positive if the tests were independent (1 - 0.95^50)
- Need to re-evaluate which findings survive correction

In [None]:
import pandas as pd
import numpy as np
from scipy import stats
from statsmodels.stats.multitest import multipletests
import warnings
warnings.filterwarnings('ignore')

print('Setup complete')

## 1. Collect All P-Values from Project

Manually compile p-values from NB05, NB06, NB09 outputs.

In [None]:
# Compile all p-values from project notebooks
# Source: NB05 (warfare), NB06 (religion), NB09 (survival)

all_tests = [
    # NB05: Warfare Integration
    {'notebook': 'NB05', 'test': 'PC1_hier baseline', 'p_value': 0.077, 'category': 'complexity'},
    {'notebook': 'NB05', 'test': 'total_warfare baseline', 'p_value': 0.808, 'category': 'warfare'},
    {'notebook': 'NB05', 'test': 'complexity×warfare interaction', 'p_value': 0.486, 'category': 'interaction'},
    
    # NB06: Religion Integration
    {'notebook': 'NB06', 'test': 'moral_score effect', 'p_value': 0.042, 'category': 'religion'},
    {'notebook': 'NB06', 'test': 'ideology_score effect', 'p_value': 0.039, 'category': 'religion'},
    {'notebook': 'NB06', 'test': 'total_rel effect', 'p_value': 0.0003, 'category': 'religion'},
    {'notebook': 'NB06', 'test': 'PC1×moral interaction', 'p_value': 0.153, 'category': 'interaction'},
    {'notebook': 'NB06', 'test': 'PC1×ideology interaction', 'p_value': 0.232, 'category': 'interaction'},
    
    # NB09: Survival Analysis - Main Cox Model
    {'notebook': 'NB09', 'test': 'Cox: PC1_hier', 'p_value': 0.077, 'category': 'complexity'},
    {'notebook': 'NB09', 'test': 'Cox: PC2_hier', 'p_value': 0.742, 'category': 'complexity'},
    {'notebook': 'NB09', 'test': 'Cox: PC3_hier', 'p_value': 0.153, 'category': 'complexity'},
    {'notebook': 'NB09', 'test': 'Cox: total_warfare', 'p_value': 0.808, 'category': 'warfare'},
    {'notebook': 'NB09', 'test': 'Cox: total_rel', 'p_value': 0.0000003, 'category': 'religion'},
    
    # NB09: Survival - Era-Stratified
    {'notebook': 'NB09', 'test': 'Ancient: PC1_hier', 'p_value': 0.046, 'category': 'era_stratified'},
    {'notebook': 'NB09', 'test': 'Ancient: total_rel', 'p_value': 0.086, 'category': 'era_stratified'},
    {'notebook': 'NB09', 'test': 'Classical: PC1_hier', 'p_value': 0.347, 'category': 'era_stratified'},
    {'notebook': 'NB09', 'test': 'Classical: total_rel', 'p_value': 0.310, 'category': 'era_stratified'},
    {'notebook': 'NB09', 'test': 'Medieval: PC1_hier', 'p_value': 0.352, 'category': 'era_stratified'},
    {'notebook': 'NB09', 'test': 'Medieval: total_rel', 'p_value': 0.042, 'category': 'era_stratified'},
    {'notebook': 'NB09', 'test': 'Early Modern: PC1_hier', 'p_value': 0.971, 'category': 'era_stratified'},
    {'notebook': 'NB09', 'test': 'Early Modern: total_rel', 'p_value': 0.716, 'category': 'era_stratified'},
    
    # NB09: Interaction models
    {'notebook': 'NB09', 'test': 'complexity×warfare Cox', 'p_value': 0.0001, 'category': 'interaction'},
    {'notebook': 'NB09', 'test': 'PC1_squared (nonlinear)', 'p_value': 0.0001, 'category': 'nonlinear'},
    
    # NB09: Log-rank tests (era comparisons)
    # NOTE: entries of 0.0000 reflect the printed precision in NB09 (p < 0.0001), not exact zeros
    {'notebook': 'NB09', 'test': 'Logrank: EM vs Medieval', 'p_value': 0.019, 'category': 'era_comparison'},
    {'notebook': 'NB09', 'test': 'Logrank: EM vs Classical', 'p_value': 0.012, 'category': 'era_comparison'},
    {'notebook': 'NB09', 'test': 'Logrank: EM vs Ancient', 'p_value': 0.0000, 'category': 'era_comparison'},
    {'notebook': 'NB09', 'test': 'Logrank: Med vs Classical', 'p_value': 0.578, 'category': 'era_comparison'},
    {'notebook': 'NB09', 'test': 'Logrank: Med vs Ancient', 'p_value': 0.0000, 'category': 'era_comparison'},
    {'notebook': 'NB09', 'test': 'Logrank: Class vs Ancient', 'p_value': 0.0000, 'category': 'era_comparison'},
    
    # NB09: Proportional hazards tests
    {'notebook': 'NB09', 'test': 'PH test: PC1_hier', 'p_value': 0.095, 'category': 'assumption'},
    {'notebook': 'NB09', 'test': 'PH test: PC2_hier', 'p_value': 0.663, 'category': 'assumption'},
    {'notebook': 'NB09', 'test': 'PH test: PC3_hier', 'p_value': 0.945, 'category': 'assumption'},
    {'notebook': 'NB09', 'test': 'PH test: total_rel', 'p_value': 0.764, 'category': 'assumption'},
    {'notebook': 'NB09', 'test': 'PH test: total_warfare', 'p_value': 0.608, 'category': 'assumption'},
]

tests_df = pd.DataFrame(all_tests)
print(f'Total tests compiled: {len(tests_df)}')
print(f'\nTests by category:')
print(tests_df['category'].value_counts())
print(f'\nTests by notebook:')
print(tests_df['notebook'].value_counts())
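
Because the p-values are transcribed by hand, a quick sanity check before correction can catch typos and flag values that were rounded to zero when printed. The sketch below is one possible check, not part of the original notebook; the function name `check_p_values`, the `rounded_to_zero` column, and the demo rows are hypothetical.

```python
import pandas as pd


def check_p_values(df: pd.DataFrame, col: str = "p_value") -> pd.DataFrame:
    """Validate p-values and flag entries that look rounded to zero."""
    p = df[col]
    if p.isna().any():
        raise ValueError("missing p-values")
    if ((p < 0) | (p > 1)).any():
        raise ValueError("p-values must lie in [0, 1]")
    out = df.copy()
    # An exact zero almost always means the source output was rounded
    out["rounded_to_zero"] = p == 0
    return out


# Hypothetical rows in the same shape as the entries of all_tests
demo = pd.DataFrame([
    {"test": "A", "p_value": 0.0000},
    {"test": "B", "p_value": 0.0420},
])
checked = check_p_values(demo)
print(checked)
```

Flagged rows still pass through the correction, but their adjusted p-values will also be exactly zero, so they are better reported as "p < 0.0001" than as a literal 0.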

## 2. Apply Benjamini-Hochberg FDR Correction

In [None]:
# Apply FDR correction (Benjamini-Hochberg)
p_values = tests_df['p_value'].values

# FDR correction at alpha = 0.05
reject_fdr, p_adj_fdr, _, _ = multipletests(p_values, alpha=0.05, method='fdr_bh')

# For comparison: Bonferroni (very conservative)
reject_bonf, p_adj_bonf, _, _ = multipletests(p_values, alpha=0.05, method='bonferroni')

# Add to dataframe
tests_df['p_original'] = tests_df['p_value']
tests_df['p_fdr'] = p_adj_fdr
tests_df['p_bonferroni'] = p_adj_bonf
tests_df['sig_original'] = tests_df['p_original'] < 0.05
tests_df['sig_fdr'] = reject_fdr
tests_df['sig_bonferroni'] = reject_bonf

print('FDR Correction Results')
print('=' * 70)
print(f'Original significant (p < 0.05): {tests_df["sig_original"].sum()} / {len(tests_df)}')
print(f'FDR-corrected significant:       {tests_df["sig_fdr"].sum()} / {len(tests_df)}')
print(f'Bonferroni-corrected significant: {tests_df["sig_bonferroni"].sum()} / {len(tests_df)}')

In [None]:
# Show all tests sorted by original p-value
display_cols = ['notebook', 'test', 'category', 'p_original', 'p_fdr', 'sig_original', 'sig_fdr']
results = tests_df[display_cols].sort_values('p_original')

print('All Tests Ranked by P-Value')
print('=' * 90)
print(f'{"Test":<45} {"p_orig":>10} {"p_fdr":>10} {"Orig":>6} {"FDR":>6}')
print('-' * 90)

for _, row in results.iterrows():
    orig_mark = '*' if row['sig_original'] else ''
    fdr_mark = '*' if row['sig_fdr'] else ''
    print(f"{row['test']:<45} {row['p_original']:>10.4f} {row['p_fdr']:>10.4f} {orig_mark:>6} {fdr_mark:>6}")

## 3. Which Findings Survive FDR Correction?

In [None]:
# Findings that survive FDR correction
survivors = tests_df[tests_df['sig_fdr']].copy()

print('FINDINGS THAT SURVIVE FDR CORRECTION')
print('=' * 70)
print(f'Total survivors: {len(survivors)} / {len(tests_df)} ({100*len(survivors)/len(tests_df):.0f}%)\n')

for _, row in survivors.sort_values('p_fdr').iterrows():
    print(f"✓ {row['test']}")
    print(f"  p_original = {row['p_original']:.6f}, p_fdr = {row['p_fdr']:.6f}")
    print(f"  Category: {row['category']}, Source: {row['notebook']}\n")

In [None]:
# Findings that were "significant" but don't survive FDR
casualties = tests_df[(tests_df['sig_original']) & (~tests_df['sig_fdr'])].copy()

print('CASUALTIES: Originally Significant, Not After FDR')
print('=' * 70)
print(f'Total casualties: {len(casualties)}\n')

for _, row in casualties.sort_values('p_original').iterrows():
    print(f"✗ {row['test']}")
    print(f"  p_original = {row['p_original']:.4f} → p_fdr = {row['p_fdr']:.4f}")
    print(f"  Category: {row['category']}, Source: {row['notebook']}\n")

## 4. Interpretation & Implications

In [None]:
# Category-level summary
print('SUMMARY BY CATEGORY')
print('=' * 70)

category_summary = tests_df.groupby('category').agg({
    'sig_original': 'sum',
    'sig_fdr': 'sum',
    'test': 'count'
}).rename(columns={'sig_original': 'n_sig_orig', 'sig_fdr': 'n_sig_fdr', 'test': 'n_tests'})

print(category_summary.to_string())

print('\n' + '=' * 70)
print('KEY TAKEAWAYS')
print('=' * 70)

print('''
1. ROBUST FINDINGS (survive FDR):
   - Religion effect on survival (total_rel): p_fdr < 0.001
   - Era differences in survival (Ancient vs others): p_fdr < 0.001
   - Complexity×warfare interaction (NB09 Cox model): p_fdr < 0.05
   - Non-linear complexity effect (PC1²): p_fdr < 0.05

2. EXPLORATORY (don't survive FDR):
   - Era-specific complexity effects (Ancient PC1 p=0.046 → p_fdr>0.05)
   - Individual religion sub-scores (moral, ideology)
   - Medieval religion effect
   - Log-rank differences among later eras (EM vs Medieval p=0.019, EM vs Classical p=0.012)

3. IMPLICATIONS FOR CLAIMS:
   - "Religion destabilizes" → CONFIRMED (robust)
   - "Ancient complexity effect" → EXPLORATORY (needs replication)
   - "Era differences exist" → CONFIRMED for Ancient vs later
   - "Complexity×warfare interaction" → CONFIRMED in the NB09 Cox model only (NB05 interaction p=0.486)

4. RECOMMENDED FRAMING:
   - Primary hypothesis: Religion effect (confirmed)
   - Secondary exploratory: Era-specific patterns (tentative)
   - All subgroup analyses should be labeled "exploratory"
''')

In [None]:
# Export results
import os
os.makedirs('models', exist_ok=True)  # ensure the output directory exists before writing
tests_df.to_csv('models/fdr_correction_results.csv', index=False)
print('Saved: models/fdr_correction_results.csv')

# Create summary for documentation
summary_df = tests_df[['test', 'category', 'p_original', 'p_fdr', 'sig_original', 'sig_fdr']].copy()
summary_df['status'] = summary_df.apply(
    lambda x: 'ROBUST' if x['sig_fdr'] else ('EXPLORATORY' if x['sig_original'] else 'NS'),
    axis=1
)
summary_df.to_csv('models/fdr_summary.csv', index=False)
print('Saved: models/fdr_summary.csv')

## 5. Updated Claims for Documentation

### Confirmed (FDR-corrected p < 0.05):
- Religion (total_rel) has destabilizing effect on survival (HR = 1.58, p_fdr < 0.001)
- Ancient era has significantly longer polity duration than later eras
- Complexity×warfare interaction is significant in the NB09 Cox model (the NB05 baseline interaction is not, p = 0.486)
- Non-linear complexity effect (inverted-U) is significant

### Exploratory (require replication):
- Era-specific complexity effects (Ancient negative, others neutral)
- Medieval religion effect
- Individual religion sub-components (moral, ideology)
- Log-rank differences among later eras (Early Modern vs Medieval, Early Modern vs Classical)

### Not Significant:
- Main complexity effect (PC1) in pooled model
- Warfare main effect
- Complexity effects in Classical, Medieval, and Early Modern eras