# Reproducibility Lab: Day 2 -- Validation

## HYPOTHESIS-DRIVEN (Pain)

Welcome back. Today you will find out whether your pre-registered hypothesis holds up in an independent dataset.

---

## Part 1: Setup and Re-enter Your Pre-Registered Analysis

Run the setup cell below. **Before clicking Run**, fill in the three variables at the top of the cell to match your Day 1 pre-registration. These must match exactly what you committed to on Day 1 -- do not change them based on the results.

In [None]:
# =====================================================================
# FILL IN YOUR PRE-REGISTERED CHOICES FROM DAY 1 BEFORE RUNNING
# =====================================================================

covariates = None        # Example: ['Age', 'Sex'] or None
outlier_threshold = None # Example: 2 or 3, or None for no removal
subgroup = None          # Example: {'Sex': 0} or None for full sample

# =====================================================================

import subprocess, sys
subprocess.check_call([sys.executable, '-m', 'pip', 'install', 'nilearn', '-q'])

import os, urllib.request
base_url = 'https://raw.githubusercontent.com/cmahlen/python-stats-demo/main/'
files_needed = [
    'lab_helpers.py', 'atlas_labels.txt', 'data/roi_mni_coords.npy',
    'data/pain_discovery.npz', 'data/pain_validation.npz',
]
os.makedirs('data', exist_ok=True)
for f in files_needed:
    if not os.path.exists(f):
        urllib.request.urlretrieve(base_url + f, f)

import lab_helpers as helpers
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import pearsonr
from statsmodels.stats.multitest import multipletests

helpers.load_dataset('pain', 'discovery')
behavior = helpers.get_behavior()

# Re-run your hypothesis test with your pre-registered choices
all_results = helpers.test_network_edges('Subcortical',
                                          covariates=covariates,
                                          exclude_outliers=outlier_threshold,
                                          subgroup=subgroup)
hyp_mask = (
    (all_results['ROI_A'].str.contains('NAc') & all_results['ROI_B'].str.contains('DefaultA_PFCm')) |
    (all_results['ROI_B'].str.contains('NAc') & all_results['ROI_A'].str.contains('DefaultA_PFCm'))
)
hyp_results = all_results[hyp_mask].reset_index(drop=True)
n_hyp_tests = len(hyp_results)

alpha = 0.05
reject, p_corrected, _, _ = multipletests(hyp_results['p'], alpha=alpha, method='fdr_bh')
hyp_results['p_corrected'] = p_corrected
hyp_results['significant_fdr'] = reject
n_significant = hyp_results['significant_fdr'].sum()

def _short_name(roi):
    if '-' in roi:
        return roi
    parts = roi.split('_')
    if len(parts) >= 4:
        return '_'.join(parts[1:-1])
    return roi

print(f"Reloaded! {n_significant} significant NAc-PFCm edges (FDR corrected)")
print("\nYour hypothesis edges:")
print(hyp_results[['ROI_A', 'ROI_B', 'r', 'p', 'p_corrected', 'significant_fdr']].to_string())

---

## Part 2: Reflection Questions

Before we test your findings in the validation data, think about these questions:

1. **How confident are you that your findings will replicate?**
   - What would make you more or less confident?

2. **What could go wrong?**
   - Even with FDR correction, could your results still be false positives?

3. **How is your approach different from the exploratory group's?**
   - How many tests did you run compared to them?
   - Did you choose your analysis plan before or after seeing the results?

4. **Effect sizes**
   - How strong were the correlations you found?
   - Are these practically meaningful for understanding pain?
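
For question 2, a small simulation can make the issue concrete. The sketch below is not part of the lab helpers. It draws p-values for a hypothetical set of 6 edges with no true effect (under the null, p-values are uniform on [0, 1]), applies a hand-written Benjamini-Hochberg step-up rule, and counts how often at least one edge is still declared significant. The function name `bh_reject`, the edge count, and the number of simulated experiments are illustrative assumptions.

```python
import numpy as np

def bh_reject(p_values, alpha=0.05):
    """Benjamini-Hochberg step-up rule; returns a boolean reject mask."""
    p = np.asarray(p_values, dtype=float)
    m = p.size
    reject = np.zeros(m, dtype=bool)
    if m == 0:
        return reject
    order = np.argsort(p)                          # indices of p, smallest first
    thresholds = alpha * np.arange(1, m + 1) / m   # rank-specific cutoffs
    below = p[order] <= thresholds
    if below.any():
        k = np.nonzero(below)[0].max()             # largest rank meeting its cutoff
        reject[order[:k + 1]] = True               # reject it and all smaller p-values
    return reject

rng = np.random.default_rng(0)
n_experiments, n_edges = 2000, 6   # hypothetical values, chosen for illustration

false_alarms = 0
for _ in range(n_experiments):
    null_p = rng.uniform(size=n_edges)   # no true effect anywhere
    false_alarms += int(bh_reject(null_p).any())

false_alarm_rate = false_alarms / n_experiments
print(f"Experiments with at least one 'significant' null edge: {false_alarm_rate:.1%}")
```

Under these assumptions the rate should land near alpha, which suggests that FDR correction limits false positives but does not eliminate them.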

---

## Part 3: Validation

### Test your findings in the independent validation set

True effects should replicate in new data. You will now re-run your **exact pre-registered analysis** on the validation dataset.
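
To see why replication is informative, consider a minimal simulation on synthetic data (not the lab datasets). It draws a "discovery" and a "validation" sample twice: once with a true population correlation of 0.5 and once with no correlation at all. The helper `simulate_sample`, the sample size, and the effect size are hypothetical choices made only for this sketch.

```python
import numpy as np
from scipy.stats import pearsonr

def simulate_sample(rng, n, rho):
    """Draw n (x, y) pairs whose population correlation is rho."""
    x = rng.standard_normal(n)
    noise = rng.standard_normal(n)
    y = rho * x + np.sqrt(1 - rho ** 2) * noise
    return x, y

rng = np.random.default_rng(42)
n_subjects = 200  # hypothetical sample size per dataset

results = {}
for label, rho in [('true effect', 0.5), ('no effect', 0.0)]:
    # Two independent samples from the same population
    r_disc, p_disc = pearsonr(*simulate_sample(rng, n_subjects, rho))
    r_val, p_val = pearsonr(*simulate_sample(rng, n_subjects, rho))
    results[label] = {'r_disc': r_disc, 'p_disc': p_disc,
                      'r_val': r_val, 'p_val': p_val}
    print(f"{label}: discovery r = {r_disc:.2f} (p = {p_disc:.3g}), "
          f"validation r = {r_val:.2f} (p = {p_val:.3g})")
```

With an effect this large, the two samples should agree closely. For the null case, try changing the seed and watch how the sign and size of r move around from sample to sample.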

In [None]:
# Load the validation dataset
helpers.load_dataset('pain', 'validation')

The cell below tests each of your hypothesis edges in the validation dataset and compares the results to your discovery findings. An edge counts as replicated if it was FDR-significant in discovery and, in validation, reaches uncorrected p < 0.05 with a correlation of the same sign. You do not need to understand every line of code -- just run the cell and read the output.
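
Note that the validation cell correlates each edge with `Pain_VAS` using plain `pearsonr`. If your pre-registration included covariates, one common way to account for them is a partial correlation: regress the covariates out of both variables and correlate the residuals. The sketch below shows that idea on synthetic data. The function `partial_corr` is hypothetical, and this is an assumption about how covariates could be handled, not necessarily what `helpers.test_network_edges` does internally.

```python
import numpy as np

def partial_corr(x, y, covariates):
    """Correlation between x and y after regressing covariates out of both."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Design matrix: intercept column plus one column per covariate
    design = np.column_stack([np.ones(len(x)), covariates])
    # Residuals from an ordinary least-squares fit on the covariates
    x_res = x - design @ np.linalg.lstsq(design, x, rcond=None)[0]
    y_res = y - design @ np.linalg.lstsq(design, y, rcond=None)[0]
    return np.corrcoef(x_res, y_res)[0, 1]

# Synthetic check: "age" drives both variables, which are otherwise unrelated
rng = np.random.default_rng(1)
age = rng.standard_normal(500)
connectivity = age + rng.standard_normal(500)
pain = age + rng.standard_normal(500)

raw_r = np.corrcoef(connectivity, pain)[0, 1]
adj_r = partial_corr(connectivity, pain, age)
print(f"raw r = {raw_r:.2f}, r after adjusting for age = {adj_r:.2f}")
```

In this synthetic case the raw correlation is driven entirely by the shared covariate, so the adjusted value should sit close to zero.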

In [None]:
import pandas as pd

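# Test ALL hypothesis edges in the validation dataset.
# NOTE: this loop computes a plain Pearson correlation per edge; the covariates,
# outlier threshold, and subgroup set in Part 1 are not passed to it.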
val_behavior = helpers.get_behavior()
val_outcome = val_behavior['Pain_VAS'].values

validation_results = []
for _, row in hyp_results.iterrows():
    edge_vals = helpers.get_edge(row['ROI_A'], row['ROI_B'])
    r_val, p_val = pearsonr(edge_vals, val_outcome)

    sig_disc = row['significant_fdr']
    same_direction = (r_val * row['r']) > 0
    if sig_disc:
        if p_val < 0.05 and same_direction:
            replicated = 'YES'
        elif p_val < 0.05 and not same_direction:
            replicated = 'FLIPPED'
        else:
            replicated = 'NO'
    else:
        replicated = 'N/A'

    validation_results.append({
        'ROI_A': row['ROI_A'],
        'ROI_B': row['ROI_B'],
        'Disc_r': row['r'],
        'Disc_p': row['p'],
        'Sig_Disc': sig_disc,
        'Val_r': r_val,
        'Val_p': p_val,
        'Replicated': replicated,
    })

val_df = pd.DataFrame(validation_results)

print('=' * 80)
print('VALIDATION RESULTS: NAc-PFCm Edges')
print('=' * 80)
print('Validation uses uncorrected p < 0.05 + same direction as replication criterion\n')

display_df = val_df[['ROI_A', 'ROI_B', 'Disc_r', 'Disc_p', 'Sig_Disc', 'Val_r', 'Val_p', 'Replicated']]
display_df.index = range(1, len(display_df) + 1)
print(display_df.to_string())

if n_significant > 0:
    n_rep = (val_df['Replicated'] == 'YES').sum()
    print(f"\n{'=' * 80}")
    print(f'REPLICATION: {n_rep}/{n_significant} significant findings replicated')
    print(f"{'=' * 80}")

n_flipped = (val_df['Replicated'] == 'FLIPPED').sum()
if n_flipped > 0:
    print(f'WARNING: {n_flipped} finding(s) were significant but in the OPPOSITE direction!')
    print('A flipped direction means the effect did not replicate; the discovery result is more likely noise than a real effect.')

The cell below creates side-by-side scatter plots comparing your discovery and validation results. The left plot shows the discovery set; the right plot shows the validation set. If the finding replicated, both plots should show a similar pattern.

In [None]:
# Side-by-side visualization for significant finding(s)
if n_significant > 0:
    sig_rows = val_df[val_df['Sig_Disc'] == True]

    for _, row in sig_rows.iterrows():
        fig, axes = plt.subplots(1, 2, figsize=(13, 5))

        # Discovery
        helpers.load_dataset('pain', 'discovery')
        disc_edge = helpers.get_edge(row['ROI_A'], row['ROI_B'])
        disc_out = helpers.get_behavior()['Pain_VAS'].values
        r_d, p_d = pearsonr(disc_edge, disc_out)

        axes[0].scatter(disc_edge, disc_out, alpha=0.5, color='steelblue')
        z = np.polyfit(disc_edge, disc_out, 1)
        x_line = np.linspace(disc_edge.min(), disc_edge.max(), 100)
        axes[0].plot(x_line, np.polyval(z, x_line), color='coral', linewidth=2)
        axes[0].set_xlabel('Functional Connectivity')
        axes[0].set_ylabel('Pain_VAS')
        axes[0].set_title(f'Discovery Set\nr = {r_d:.3f}, p = {p_d:.2e}')

        # Validation
        helpers.load_dataset('pain', 'validation')
        val_edge = helpers.get_edge(row['ROI_A'], row['ROI_B'])
        val_out = helpers.get_behavior()['Pain_VAS'].values
        r_v, p_v = pearsonr(val_edge, val_out)

        axes[1].scatter(val_edge, val_out, alpha=0.5, color='mediumseagreen')
        z = np.polyfit(val_edge, val_out, 1)
        x_line = np.linspace(val_edge.min(), val_edge.max(), 100)
        axes[1].plot(x_line, np.polyval(z, x_line), color='coral', linewidth=2)
        axes[1].set_xlabel('Functional Connectivity')
        axes[1].set_ylabel('Pain_VAS')
        axes[1].set_title(f'Validation Set\nr = {r_v:.3f}, p = {p_v:.2e}')

        short_a = _short_name(row['ROI_A'])
        short_b = _short_name(row['ROI_B'])
        fig.suptitle(f'{short_a} <-> {short_b}: Discovery vs Validation', fontsize=13)
        plt.tight_layout()
        plt.show()

        if row['Replicated'] == 'YES':
            print('REPLICATED in validation set!')
        else:
            print('Did not replicate in validation set.')
        print()
else:
    print('No significant findings to visualize.')

---

## Part 4: Your Turn -- Test a Non-Significant Edge

Pick one edge from your hypothesis set that was **not** significant in discovery. Does it look any different in validation? This helps build intuition about what "noise" looks like compared to a real effect.
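
As a reference point for what "noise" looks like, the sketch below simulates many correlations between two unrelated variables and reports the range that 95% of them fall into. The sample size of 100 is a hypothetical stand-in, so substitute the number of subjects in your own dataset. Everything here is synthetic and independent of the lab helpers.

```python
import numpy as np

rng = np.random.default_rng(7)
n_subjects, n_sims = 100, 5000  # hypothetical values for illustration

# Each row is one simulated study with two unrelated variables
x = rng.standard_normal((n_sims, n_subjects))
y = rng.standard_normal((n_sims, n_subjects))

# Row-wise Pearson correlation, computed from centered data
xc = x - x.mean(axis=1, keepdims=True)
yc = y - y.mean(axis=1, keepdims=True)
null_r = (xc * yc).sum(axis=1) / np.sqrt((xc ** 2).sum(axis=1) * (yc ** 2).sum(axis=1))

lo, hi = np.percentile(null_r, [2.5, 97.5])
print(f"95% of null correlations fall between {lo:.2f} and {hi:.2f}")
```

If the edge you pick shows an r inside this kind of range in both datasets, that is consistent with there being no real effect.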

<details>
<summary>Hint: Example code</summary>

```python
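# Pick the edge with the largest discovery p-value (the least significant one)
nonsig = hyp_results.loc[hyp_results['p'].idxmax()]

# Plot the same edge in both datasets to compare
helpers.load_dataset('pain', 'discovery')
helpers.plot_edge(nonsig['ROI_A'], nonsig['ROI_B'], 'Pain_VAS')
helpers.load_dataset('pain', 'validation')
helpers.plot_edge(nonsig['ROI_A'], nonsig['ROI_B'], 'Pain_VAS')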
```
</details>

In [None]:
# Your Turn: test a non-significant edge here
