# Reproducibility Lab: Functional Connectivity and Pain

## HYPOTHESIS-DRIVEN ANALYSIS

## Learning Objectives

By the end of this lab, you will be able to:
- Formulate a hypothesis about brain-behavior relationships based on prior literature
- Use analytical tools (covariates, outliers, subgroups) to refine your analysis
- Pre-register an analysis plan before seeing your results
- Conduct a hypothesis-driven functional connectivity analysis with multiple-comparison correction (FDR)
- Validate findings in an independent dataset
- Present rigorous, replicable findings to the class

---

## Part 1: Setup and Load Data

In [None]:
# Install nilearn for brain visualizations and download data files
import subprocess, sys
subprocess.check_call([sys.executable, '-m', 'pip', 'install', 'nilearn', '-q'])

import os
import urllib.request

base_url = 'https://raw.githubusercontent.com/cmahlen/python-stats-demo/main/'
files_needed = [
    'lab_helpers.py',
    'atlas_labels.txt',
    'data/roi_mni_coords.npy',
    'data/pain_discovery.npz',
    'data/pain_validation.npz',
]

os.makedirs('data', exist_ok=True)
for f in files_needed:
    if not os.path.exists(f):
        print(f'Downloading {f}...')
        urllib.request.urlretrieve(base_url + f, f)

print('Setup complete!')

In [None]:
import lab_helpers as helpers
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import pearsonr

# Load the discovery dataset
helpers.load_dataset('pain', 'discovery')

**Tip: Exploring the `helpers` module**

All the pre-built functions in this lab are accessed through `helpers`. Two useful ways to discover what is available:

- **Tab completion:** In any code cell, type `helpers.` and press **Tab**. Colab will show a dropdown of all available functions.
- **Function documentation:** Run `help(helpers.plot_edge)` in a code cell to see what arguments any function accepts and what it does. Replace `plot_edge` with any function name. Run `help(helpers)` to see everything at once.

**A note on Colab's AI tools:** You are welcome to use Colab's built-in AI features (Gemini) to understand code or debug errors. However, your **analytical decisions** -- which regions to test, what covariates to include, how to handle outliers -- should be your own. These choices are what you will pre-register and defend in your presentation. This mirrors best practices in real research: AI is a tool to support your thinking, not replace it. If you are unsure whether something is appropriate to use AI for, ask Caleb.

---

## Part 2: Peek at the Data

Before testing your hypothesis, look at the actual data. This is a crucial first step in any analysis.

In [None]:
# What variables do we have? Print descriptions and units for each one.
helpers.describe_variables()

In [None]:
# Look at the first few rows of behavioral data
behavior = helpers.get_behavior()
behavior.head()

In [None]:
# Summary statistics
behavior.describe()

In the behavioral table, each row is one subject. The summary table above shows the count, mean, standard deviation (std), minimum, and maximum for each variable. The 25%/50%/75% rows are **percentiles** -- for example, the 50% row is the median (the middle value).

In [None]:
# Always visualize your outcome variable first
plt.figure(figsize=(7, 4))
plt.hist(behavior['Pain_VAS'], bins=15, color='steelblue', edgecolor='white')
plt.xlabel('Pain_VAS (Pain Severity)')
plt.ylabel('Number of Subjects')
plt.title('Distribution of Pain Scores')
plt.tight_layout()
plt.show()

print(f'Mean: {behavior["Pain_VAS"].mean():.1f}')
print(f'Std: {behavior["Pain_VAS"].std():.1f}')
print(f'Range: {behavior["Pain_VAS"].min():.1f} to {behavior["Pain_VAS"].max():.1f}')

In [None]:
# Explore a relationship between two variables
plt.figure(figsize=(7, 4))
plt.scatter(behavior['Stress_Level'], behavior['Pain_VAS'], alpha=0.5,
            color='steelblue', edgecolors='white', linewidth=0.5)
plt.xlabel('Stress Level')
plt.ylabel('Pain_VAS')
plt.title('Stress Level vs Pain')
plt.tight_layout()
plt.show()

---

## Part 3: Background and Hypothesis

### Pain and Reward Circuitry

A growing body of research suggests that **reward circuitry** plays an important role in chronic pain vulnerability. The nucleus accumbens (NAc) and medial prefrontal cortex (mPFC), key nodes of the brain's reward system, may show altered connectivity in chronic pain patients.

### Relevant Literature

> <a href="https://www.sciencedirect.com/science/article/pii/S0304395913006908">Geha et al., 2014</a>: Decreased food pleasure and disrupted satiety signals in chronic low back pain, ***Pain***

> <a href="https://www.nature.com/articles/s41467-025-65080-9">Yaakub et al., 2025</a>: Non-invasive ultrasonic neuromodulation of the human nucleus accumbens impacts reward sensitivity, ***Nature Communications***

> <a href="https://pubmed.ncbi.nlm.nih.gov/25243988/">Metereau & Dreher, 2015</a>: The medial orbitofrontal cortex encodes a general unsigned value signal during anticipation of both appetitive and aversive events, ***Cortex***

Read these articles (the abstracts and perhaps a few figures are enough).

Think about which brain regions' functional connectivity you would expect to correlate with pain symptoms. Pain symptoms are most commonly measured with the **Visual Analog Scale (VAS)**.

### Your Turn: Formulate Your Hypothesis

Based on the literature above, write your hypothesis in one sentence. Connectivity between which brain regions do you expect to relate to Pain_VAS? In which direction (positive or negative)?

> **H1: Connectivity between {which regions? enter here} correlates with Pain_VAS.**

This is a focused hypothesis: you are testing only the edges connecting X regions (left and right) with Y regions -- a small number of specific connections motivated by the literature.

Take a moment to think about *why* you expect this relationship. What does the literature suggest about how reward circuitry might influence pain perception?

---

## Part 4: Explore Your Target Regions

Before testing your hypothesis, examine the regions involved and visualize an example edge.

In [None]:
# What subcortical regions are available?
helpers.list_regions('Subcortical')

In [None]:
# What Default Mode regions (including DefaultA) are available?
helpers.list_regions('Default')

In [None]:
# Overview of all brain networks
helpers.list_networks()

**Tip:** Use `helpers.describe_regions('Subcortical')` to see decoded names for each region (e.g., what "NAc-rh" stands for).

In [None]:
# See decoded region names for subcortical areas
helpers.describe_regions('Subcortical')

In [None]:
# Visualize the overall connectivity structure
helpers.plot_connectome()

In [None]:
# Zoomed view of Subcortical network connectivity
helpers.plot_network_matrix('Subcortical')

### Visualize a Single Edge

Before diving into analysis, let's see what a brain connectivity scatter plot actually looks like. Here we'll plot the connectivity between two subcortical regions against Age -- this is just to get familiar with the visualization, not to test your hypothesis yet.

In [None]:
# Plot connectivity between two subcortical regions vs Age
# This is just to see what an FC scatter plot looks like
helpers.plot_edge('NAc-rh', 'CAU-rh', 'Age')

### Quick Reference: Functional Connectivity Concepts

**Functional connectivity (FC)** measures how correlated the activity is between two brain regions during a resting-state fMRI scan.

**Key terms:**
- **ROI (Region of Interest)**: A specific brain area. This dataset has 216 ROIs from a standard brain atlas.
- **Edge**: A connection between two ROIs. Each edge has a connectivity value per subject. With 216 ROIs, there are 23,220 unique edges.
- **Network**: ROIs are grouped into brain networks (e.g., Default Mode, Salience, Subcortical) based on their function.
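
To make these terms concrete, here is a minimal sketch on synthetic data. Random numbers stand in for fMRI time series, and the names `timeseries`, `fc`, and `edge_values` are illustrative rather than part of `lab_helpers`. It assumes FC is computed as the Pearson correlation between ROI time series, and it shows where the 23,220 unique edges come from.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic "scan": 200 timepoints for each of 216 ROIs (not real data)
n_timepoints, n_rois = 200, 216
timeseries = rng.standard_normal((n_timepoints, n_rois))

# FC matrix: Pearson correlation between every pair of ROI time series
fc = np.corrcoef(timeseries, rowvar=False)  # shape (216, 216)

# Each unique edge appears once in the upper triangle (diagonal excluded)
row_idx, col_idx = np.triu_indices(n_rois, k=1)
edge_values = fc[row_idx, col_idx]

# Closed-form count of unique pairs: n * (n - 1) / 2
n_edges = n_rois * (n_rois - 1) // 2

print(f'FC matrix shape: {fc.shape}')
print(f'Unique edges: {len(edge_values)} (formula gives {n_edges})')
```

In the lab data each subject has one such matrix, so every edge becomes one connectivity value per subject.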

**Interpreting Pearson r** (correlation strength):
- |r| < 0.10 -- negligible
- |r| around 0.10-0.20 -- small
- |r| around 0.20-0.30 -- medium
- |r| > 0.30 -- large (rare in neuroimaging)

**r-squared** (r x r) tells you the proportion of variance explained. An r of 0.20 means r-squared = 0.04 -- the brain connection explains about 4% of individual differences.
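
The arithmetic above can be sketched in a few lines. The function names `r_squared` and `effect_label` are made up for this illustration, and how values exactly on a boundary (such as 0.20) are labeled is an assumption, since the bands above are only rough guides.

```python
def r_squared(r):
    """Proportion of variance explained by a correlation of r."""
    return r ** 2


def effect_label(r):
    """Label |r| using the rough bands listed above.

    Boundary handling (e.g., exactly 0.20 counts as 'medium') is an assumption.
    """
    size = abs(r)
    if size < 0.10:
        return 'negligible'
    if size < 0.20:
        return 'small'
    if size < 0.30:
        return 'medium'
    return 'large'


for r in (0.05, -0.15, 0.20, 0.35):
    print(f'r = {r:+.2f}: r-squared = {r_squared(r):.4f} ({effect_label(r)})')
```

Note that the sign of r does not matter for the label or for r-squared; only the direction of the relationship changes.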

---

## Part 5: Your Analysis Toolkit

Before testing your hypothesis, let's learn the analytical tools available to you. Each tool is a standard technique that researchers use every day. Understanding these tools will help you make informed decisions when you pre-register your analysis plan in Part 6.

All of these tools work with both `plot_edge()` and the mass testing functions (`test_all_edges()`, `test_network_edges()`).

### 5a. Covariates: Controlling for Confounding Variables

A **covariate** is a variable you account for ("control for") to see whether a relationship still holds after removing its influence. When you control for a variable, the axes show "residualized" values -- what's left of each variable after statistically removing the influence of the covariate.

**Why does this matter?** Sometimes two variables look related, but the apparent relationship is actually driven by a third variable. Let's see an example with the behavioral data first, then apply the same logic to brain data.

In [None]:
# Is Social_Media use related to Sleep_Quality?
helpers.plot_behavior('Social_Media', 'Sleep_Quality')

In [None]:
# People who use more screens overall may have both more social media use
# AND worse sleep. Let's control for Screen_Time and see what happens:
helpers.plot_behavior('Social_Media', 'Sleep_Quality', covariates=['Screen_Time'])

Notice how the correlation changes after controlling for Screen_Time. The apparent link between social media and sleep quality was largely driven by overall screen time.

The same principle applies to brain data. Let's see how controlling for a behavioral variable changes a brain connectivity relationship:

In [None]:
# Does this Dorsal Attention - Visual edge predict Pain_VAS?
helpers.plot_edge('LH_DorsAttnA_TempOcc_1', 'RH_VisCent_ExStr_2', 'Pain_VAS')

In [None]:
# Now control for Stress_Level and see what happens:
helpers.plot_edge('LH_DorsAttnA_TempOcc_1', 'RH_VisCent_ExStr_2', 'Pain_VAS',
                  covariates=['Stress_Level'])

Notice how the correlation changed after controlling for Stress_Level. Covariates can **strengthen or weaken** findings. The important thing is to choose your covariates **before** you see your results -- otherwise you might unconsciously pick covariates that make your findings look better.
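
If you are curious what "controlling for" a variable looks like under the hood, here is a minimal sketch on synthetic data. It assumes the common residualization approach (regress the covariate out of both variables, then correlate the residuals); the `residualize` function is hypothetical, and `lab_helpers` may implement this differently.

```python
import numpy as np
from scipy.stats import pearsonr

rng = np.random.default_rng(42)
n = 500

# Synthetic data: a confounder drives both variables, which are otherwise unrelated
confounder = rng.standard_normal(n)
x = confounder + rng.standard_normal(n)
y = confounder + rng.standard_normal(n)


def residualize(values, covariate):
    """Return what is left of `values` after regressing out `covariate` (with intercept)."""
    design = np.column_stack([np.ones_like(covariate), covariate])
    coefs, *_ = np.linalg.lstsq(design, values, rcond=None)
    return values - design @ coefs


r_raw, _ = pearsonr(x, y)
r_partial, _ = pearsonr(residualize(x, confounder), residualize(y, confounder))

print(f'Raw correlation:              r = {r_raw:.2f}')
print(f'After removing the confounder: r = {r_partial:.2f}')
```

In this simulation the raw correlation is driven entirely by the shared confounder, so it shrinks toward zero once the confounder is removed. With real data a covariate can also sharpen a relationship by removing unrelated variance.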

### 5b. Outlier Handling

An **outlier** is a data point that is unusually far from the rest. **Z-scores** measure how many standard deviations (SD) a value is from the mean. For roughly normally distributed data, a value with |z| of 2 or more is in about the most extreme 5% of the data; a value with |z| of 3 or more is in about the most extreme 0.3%.
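
The percentages above come from the tails of the normal distribution. The sketch below checks them with `scipy.stats.norm` and then flags outliers in a small made-up sample; the `scores` values are invented for illustration only.

```python
import numpy as np
from scipy.stats import norm

# Two-sided tail fractions under a normal distribution
for threshold in (2, 3):
    tail = 2 * norm.sf(threshold)  # P(|z| > threshold)
    print(f'|z| > {threshold}: {tail:.2%} of values')

# Flagging outliers in a sample (made-up scores with one extreme value)
scores = np.array([42.0, 45.0, 47.0, 50.0, 51.0, 53.0, 55.0, 58.0, 95.0])
z = (scores - scores.mean()) / scores.std(ddof=1)
is_outlier = np.abs(z) > 2

print('Flagged as outliers:', scores[is_outlier])
```

Because the extreme value itself inflates the mean and SD, z-score rules can be less sensitive in very small samples than they look on paper.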

Outliers can have a big impact on correlations. Sometimes a "significant" result is driven entirely by a few extreme values. Other times, outliers can obscure a real effect. Let's see both cases.

You can add an `exclude_outliers` argument to remove extreme values:

In [None]:
# EXAMPLE 1: A "significant" result that disappears when outliers are removed
print('--- With all data ---')
helpers.plot_edge('LH_ContA_IPS_3', 'RH_SomMotB_Aud_1', 'Pain_VAS')

In [None]:
print('--- After removing outliers (|z| > 2) ---')
helpers.plot_edge('LH_ContA_IPS_3', 'RH_SomMotB_Aud_1', 'Pain_VAS',
                  exclude_outliers=2)

The correlation went from "significant" to nowhere near significant. Those few extreme data points were creating the illusion of a relationship.

Now let's see the opposite -- outliers *hiding* a real effect:

In [None]:
# EXAMPLE 2: A real effect that only appears after removing outliers
print('--- With all data ---')
helpers.plot_edge('LH_LimbicB_OFC_2', 'RH_ContB_PFCld_1', 'Pain_VAS')

In [None]:
print('--- After removing outliers (|z| > 2) ---')
helpers.plot_edge('LH_LimbicB_OFC_2', 'RH_ContB_PFCld_1', 'Pain_VAS',
                  exclude_outliers=2)

After removing extreme values, a real relationship emerged. Outliers can work both ways -- they can create false positives or hide true effects. That is why your outlier handling strategy should be decided **in advance** as part of your pre-registration, not adjusted after seeing your results.
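
The first of these two cases is easy to reproduce in a simulation. The sketch below builds synthetic data with no true relationship, adds three extreme subjects, and compares the correlation before and after exclusion. The `exclude_outliers` function here is a hypothetical stand-in (it drops subjects with |z| above the threshold on either variable) and may not match how `lab_helpers` handles the `exclude_outliers` argument.

```python
import numpy as np
from scipy.stats import pearsonr

rng = np.random.default_rng(7)

# 60 synthetic subjects with no true relationship, plus 3 extreme subjects
x = np.concatenate([rng.standard_normal(60), [8.0, 8.5, 9.0]])
y = np.concatenate([rng.standard_normal(60), [8.0, 8.5, 9.0]])


def exclude_outliers(a, b, threshold=2):
    """Drop subjects whose |z| exceeds `threshold` on either variable."""
    z_a = (a - a.mean()) / a.std(ddof=1)
    z_b = (b - b.mean()) / b.std(ddof=1)
    keep = (np.abs(z_a) <= threshold) & (np.abs(z_b) <= threshold)
    return a[keep], b[keep]


r_all, p_all = pearsonr(x, y)
x_clean, y_clean = exclude_outliers(x, y, threshold=2)
r_clean, p_clean = pearsonr(x_clean, y_clean)

print(f'All {len(x)} subjects:        r = {r_all:.2f}, p = {p_all:.2e}')
print(f'After exclusion (n = {len(x_clean)}): r = {r_clean:.2f}, p = {p_clean:.2e}')
```

Three subjects out of 63 are enough to produce a strong correlation here, which is why the threshold belongs in the pre-registration rather than being tuned afterward.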

### 5c. Subgroup Analysis

Sometimes a relationship exists in one group but not another. For example, does anxiety relate to pain differently for men versus women?

In [None]:
# Overall relationship: anxiety vs pain
helpers.plot_behavior('GAD7', 'Pain_VAS')

In [None]:
# Does the relationship differ by sex?
helpers.plot_behavior('GAD7', 'Pain_VAS', subgroup={'Sex': 0})  # women
helpers.plot_behavior('GAD7', 'Pain_VAS', subgroup={'Sex': 1})  # men

Notice how the relationship between anxiety and pain is much stronger in women than in men. Subgroup analysis can reveal effects hidden in the full sample. But there is an important trade-off: splitting your sample **reduces your sample size and statistical power**. Only analyze subgroups if you have a strong reason from the literature -- not just because it makes your results look better.

The `subgroup` argument works with brain data too:
```python
helpers.plot_edge('NAc-rh', 'RH_DefaultA_PFCm_1', 'Pain_VAS', subgroup={'Sex': 0})
```
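
To put a number on the power trade-off, the sketch below computes the smallest |r| that reaches p < 0.05 (two-sided) for a given sample size, using the standard t-test for a Pearson correlation. The sample sizes are hypothetical and are not the sizes of the lab datasets.

```python
import numpy as np
from scipy.stats import t


def critical_r(n, alpha=0.05):
    """Smallest |r| that reaches two-sided significance at `alpha` with n subjects."""
    df = n - 2
    t_crit = t.ppf(1 - alpha / 2, df)
    return t_crit / np.sqrt(t_crit ** 2 + df)


# Hypothetical sample sizes: a full sample, then halves of it
for n in (200, 100, 50):
    print(f'n = {n:3d}: need |r| > {critical_r(n):.3f} for p < 0.05')
```

Halving the sample raises the bar noticeably, so an effect that is detectable in the full sample can fail to reach significance in a subgroup even if it is just as strong there.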

### 5d. Multiple Comparisons

When you test many edges at once, some will appear "significant" just by chance. If you test 100 edges at p < 0.05, you would expect about 5 false positives even if there are no real effects at all.
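
The arithmetic behind this can be sketched as follows. The calculation of the chance of at least one false positive assumes the tests are independent, which is a simplification for brain edges that share regions.

```python
alpha = 0.05

for n_tests in (1, 10, 100):
    # Expected number of false positives if no real effects exist
    expected_false_positives = n_tests * alpha
    # Chance of at least one false positive, assuming independent tests
    p_any = 1 - (1 - alpha) ** n_tests
    print(f'{n_tests:3d} tests: expect {expected_false_positives:.1f} false positives, '
          f'P(at least one) = {p_any:.3f}')
```

With 100 tests, at least one false positive is close to certain under these assumptions.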

There are two common ways to correct for this:

- **Bonferroni correction**: Divide your alpha (0.05) by the number of tests. Very conservative -- it controls the chance of *any* false positive. Simple, but can miss real effects when you have many tests.

- **FDR (False Discovery Rate) correction** (Benjamini-Hochberg): Controls the *expected proportion* of false positives among your significant results. Less conservative, and a standard in neuroimaging research.
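
The difference between the two methods is easiest to see on a small example. The sketch below applies both to ten made-up p-values (not from the lab data) using `multipletests` from `statsmodels`.

```python
import numpy as np
from statsmodels.stats.multitest import multipletests

# Ten made-up p-values (illustrative only, not from the lab data)
p_values = np.array([0.001, 0.008, 0.012, 0.03, 0.04, 0.2, 0.5, 0.6, 0.8, 0.9])
alpha = 0.05

n_uncorrected = int((p_values < alpha).sum())
reject_bonf, _, _, _ = multipletests(p_values, alpha=alpha, method='bonferroni')
reject_fdr, p_fdr, _, _ = multipletests(p_values, alpha=alpha, method='fdr_bh')

print(f'Uncorrected (p < {alpha}): {n_uncorrected}')
print(f'Bonferroni:               {reject_bonf.sum()}')
print(f'FDR (Benjamini-Hochberg): {reject_fdr.sum()}')
```

Here Bonferroni keeps fewer results than FDR, and both keep fewer than the uncorrected threshold, which matches the "very conservative" versus "less conservative" descriptions above.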

Let's see this in action with an example unrelated to your hypothesis. We will test all edges within the Limbic network against Pain_VAS:

In [None]:
# Test all within-Limbic edges vs Pain_VAS (NOT your hypothesis -- just a demo)
limbic_results = helpers.test_network_edges('Limbic', within=True)
n_limbic = len(limbic_results)

# How many are "significant" without correction?
n_uncorrected = (limbic_results['p'] < 0.05).sum()
print(f'Tested {n_limbic} within-Limbic edges')
print(f'Significant at p < 0.05 (uncorrected): {n_uncorrected}')
print(f'Expected by chance alone: {n_limbic * 0.05:.0f}')

# Now apply FDR correction
limbic_fdr = helpers.test_network_edges('Limbic', within=True, correction='fdr')
n_fdr = (limbic_fdr['p_corrected'] < 0.05).sum()
print(f'\nSignificant after FDR correction: {n_fdr}')
print(f'\nMany "findings" disappear after proper correction!')

In [None]:
# Visualize the p-values from the Limbic network test
plt.figure(figsize=(10, 5))

# Sort the p-values first so each dot's color matches its own p-value
sorted_p = np.sort(limbic_results['p'].values)
colors = ['coral' if p < 0.05 else 'gray' for p in sorted_p]
plt.scatter(range(n_limbic), sorted_p, c=colors, s=40)
plt.axhline(0.05, color='orange', linestyle='--', linewidth=1,
            label='Uncorrected alpha = 0.05')
plt.yscale('log')
plt.xlabel('Edge (ranked by p-value)')
plt.ylabel('P-value (log scale)')
plt.title(f'{n_limbic} Within-Limbic Edges: {n_uncorrected} "significant" uncorrected, {n_fdr} after FDR')
plt.legend()
plt.grid(True, alpha=0.2)
plt.tight_layout()
plt.show()

print('Coral dots are below 0.05 -- these would be "significant" without correction.')
print(f'{n_fdr} survive FDR correction. Those that do not survive may well be false positives.')

This is why multiple comparison correction is essential. Without it, you would report false positives as real findings. In your pre-registration below, we will use **FDR correction** as the default.

### Key Takeaway

You now have a full toolkit: **covariates**, **outlier handling**, **subgroup analysis**, and **multiple comparison correction**. In the next section, you will formally commit to your analysis choices **before** seeing the results. This is called **pre-registration** -- it is what separates hypothesis-driven science from exploratory analysis.

---

## Part 6: Pre-Register Your Analysis

In real research, scientists often **pre-register** their analysis plan before collecting data. This means writing down exactly what you will test and how -- before you see the results. Pre-registration prevents you from unconsciously adjusting your analysis to get the answer you want.

Fill in each section below. Once you have committed to your plan, you will execute it in Part 8.

### My Pre-Registration

**Hypothesis:** Connectivity between the nucleus accumbens (NAc) and medial prefrontal cortex (DefaultA PFCm regions) correlates with Pain_VAS.

**Edges to test:** [Which specific edges will you test? e.g., "NAc <-> DefaultA_PFCm edges"]

**Number of tests:** [How many edges does this include? You will find out when you run the code in Part 8.]

**Correction method:** FDR (Benjamini-Hochberg) -- see explanation below

**Covariates:** [Which covariates will you control for, if any? Justify your choice based on the literature. Write "None" if you will not use covariates.]

**Outlier handling:** [Will you exclude outliers? If so, at what z-score threshold? (e.g., 2 or 3) Write "None" if you will not exclude outliers.]

**Subgroup analysis:** [Will you analyze any subgroups? If so, which? Write "None" if you will analyze the full sample.]

**Significance threshold:** 0.05 (after FDR correction)

---

## Part 7: Practice -- Testing a Known Effect

Before testing your hypothesis, let's practice with a well-established finding.

It is well documented in the neuroscience literature that functional connectivity within the **Default Mode Network** decreases with age. This is one of the most replicated findings in resting-state fMRI research. Let's test this in our data to build confidence with the tools before applying them to your hypothesis.

In [None]:
# Plot a single Default Mode Network edge vs Age
# These two regions are both in the Default Mode Network:
#   - LH_DefaultA_PFCm_1 = left medial prefrontal cortex
#   - LH_DefaultA_pCunPCC_1 = left precuneus/posterior cingulate
helpers.plot_edge('LH_DefaultA_PFCm_1', 'LH_DefaultA_pCunPCC_1', 'Age')

In [None]:
# Test all within-Default Mode edges vs Age with FDR correction
dmn_age = helpers.test_network_edges('Default Mode', behavior_col='Age',
                                      correction='fdr', within=True)

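n_sig = (dmn_age['p_corrected'] < 0.05).sum()
# Count separately the significant edges where connectivity declines with age (r < 0)
n_sig_negative = ((dmn_age['p_corrected'] < 0.05) & (dmn_age['r'] < 0)).sum()
print(f"\nEdges significantly correlated with Age (FDR corrected): {n_sig}")
print(f"  of which negative (connectivity declines with age): {n_sig_negative}")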

# Show the top findings
print("\nTop 10 within-Default Mode edges correlated with Age:")
print(dmn_age.head(10)[['ROI_A', 'ROI_B', 'r', 'p', 'p_corrected']].to_string())

You should see mostly **negative correlations** -- as age increases, Default Mode Network connectivity decreases.

If significant results remain after FDR correction, that is a good sign that the tools are working as intended and that the data contain detectable brain-behavior relationships.

Now you are ready to test your own hypothesis.

---

## Part 8: Test Your Hypothesis

Now execute your pre-registered analysis plan. The code below tests your hypothesis edges -- NAc regions connected to DefaultA PFCm regions.

In [None]:
# Step 1: Test edges involving the Subcortical network
# We focus on Subcortical because our hypothesis is about NAc (nucleus accumbens)
all_results = helpers.test_network_edges('Subcortical')

In [None]:
# Step 2: Filter to your specific hypothesis edges
# Keep only rows where one region is NAc and the other is DefaultA PFCm
hyp_mask = (
    (all_results['ROI_A'].str.contains('NAc') & all_results['ROI_B'].str.contains('DefaultA_PFCm')) |
    (all_results['ROI_B'].str.contains('NAc') & all_results['ROI_A'].str.contains('DefaultA_PFCm'))
)
hyp_results = all_results[hyp_mask].reset_index(drop=True)

n_hyp_tests = len(hyp_results)
print(f"NAc <-> DefaultA PFCm edges to test: {n_hyp_tests}")
print(f"\nAll results (sorted by p-value):")
hyp_results

### Apply Your Pre-Registered Choices

Modify the code below based on what you wrote in your pre-registration (Part 6). If you chose not to use covariates, outlier exclusion, or subgroups, leave them set to `None`.

In [None]:
# Step 3: Apply your pre-registered analysis choices
# Edit these variables based on what you wrote in your pre-registration:

covariates = None              # e.g., ['Age', 'Sex'] or None for no covariates
outlier_threshold = None       # e.g., 2 or 3, or None for no outlier removal
subgroup = None                # e.g., {'Sex': 0} or None for full sample

# Re-test with your pre-registered choices (only if you specified any)
if covariates is not None or outlier_threshold is not None or subgroup is not None:
    all_results = helpers.test_network_edges('Subcortical',
                                              covariates=covariates,
                                              exclude_outliers=outlier_threshold,
                                              subgroup=subgroup)

    # Re-filter to hypothesis edges (recompute the mask for the new results)
    hyp_mask = (
        (all_results['ROI_A'].str.contains('NAc') & all_results['ROI_B'].str.contains('DefaultA_PFCm')) |
        (all_results['ROI_B'].str.contains('NAc') & all_results['ROI_A'].str.contains('DefaultA_PFCm'))
    )
    hyp_results = all_results[hyp_mask].reset_index(drop=True)
    n_hyp_tests = len(hyp_results)
    print(f"\nRe-filtered to {n_hyp_tests} hypothesis edges with your pre-registered choices.")
    print(hyp_results[['ROI_A', 'ROI_B', 'r', 'p']].to_string())
else:
    print("No covariates, outlier removal, or subgroup specified.")
    print("Using the results from Step 2 above.")

### Apply Multiple Comparison Correction

Because you are testing multiple edges, you need to correct for multiple comparisons. There are two common approaches:

- **Bonferroni correction**: Divides your alpha (0.05) by the number of tests. Very conservative -- it controls the probability of even one false positive. Simple but can miss real effects.

- **FDR (False Discovery Rate) correction**: Controls the *proportion* of false positives among your significant results. Less conservative than Bonferroni, and widely used in neuroimaging research.

We will use **FDR correction** (Benjamini-Hochberg method), which is the standard approach in neuroimaging and is what the exploratory group used as well.
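
For readers who want to see what the Benjamini-Hochberg method does step by step, here is a minimal NumPy sketch of the procedure. The `benjamini_hochberg` function and the p-values are written for this illustration only; in the lab itself the correction is done with `multipletests` from `statsmodels`.

```python
import numpy as np


def benjamini_hochberg(p_values, alpha=0.05):
    """Return (reject, adjusted_p) using the Benjamini-Hochberg step-up procedure."""
    p = np.asarray(p_values, dtype=float)
    m = len(p)
    order = np.argsort(p)
    ranked = p[order]

    # Step-up: find the largest rank k with p_(k) <= (k / m) * alpha,
    # then reject every hypothesis ranked at or below k
    thresholds = np.arange(1, m + 1) / m * alpha
    passing = np.nonzero(ranked <= thresholds)[0]
    reject_sorted = np.zeros(m, dtype=bool)
    if passing.size > 0:
        reject_sorted[:passing.max() + 1] = True

    # Adjusted p-values: p_(k) * m / k, made monotone from the largest rank down
    scaled = ranked * m / np.arange(1, m + 1)
    adjusted_sorted = np.minimum(np.minimum.accumulate(scaled[::-1])[::-1], 1.0)

    # Put the results back in the original order
    reject = np.empty(m, dtype=bool)
    adjusted = np.empty(m, dtype=float)
    reject[order] = reject_sorted
    adjusted[order] = adjusted_sorted
    return reject, adjusted


# Made-up p-values in arbitrary order (illustrative only)
reject, p_adjusted = benjamini_hochberg([0.04, 0.001, 0.5, 0.012, 0.008, 0.9])
print('Reject:    ', reject)
print('Adjusted p:', np.round(p_adjusted, 3))
```

Each ranked p-value is compared with its own threshold, which is why FDR cannot be drawn as a single cutoff line the way Bonferroni can.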

In [None]:
from statsmodels.stats.multitest import multipletests

alpha = 0.05

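# Sort by p-value so that row 0 is always the most promising edge
hyp_results = hyp_results.sort_values('p').reset_index(drop=True)

# Apply FDR (Benjamini-Hochberg) correction
reject, p_corrected, _, _ = multipletests(hyp_results['p'], alpha=alpha, method='fdr_bh')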
hyp_results['p_corrected'] = p_corrected
hyp_results['significant_fdr'] = reject

# For reference, also compute Bonferroni threshold
bonferroni_threshold = alpha / n_hyp_tests

n_significant = hyp_results['significant_fdr'].sum()

print('=' * 60)
print('MULTIPLE COMPARISON CORRECTION')
print('=' * 60)
print(f'Number of tests: {n_hyp_tests}')
print(f'Method: FDR (Benjamini-Hochberg)')
print(f'For reference -- Bonferroni threshold would be: {bonferroni_threshold:.6f}')
print(f'\nSignificant after FDR correction: {n_significant}')

if n_significant > 0:
    print(f'\nSignificant edges:')
    sig = hyp_results[hyp_results['significant_fdr']]
    for _, row in sig.iterrows():
        print(f"  {row['ROI_A']} <-> {row['ROI_B']}: r={row['r']:.3f}, p={row['p']:.2e}, p_fdr={row['p_corrected']:.2e}")
else:
    print('\nNo edges survived FDR correction.')
    print('Most promising edge:')
    top = hyp_results.iloc[0]
    print(f"  {top['ROI_A']} <-> {top['ROI_B']}: r={top['r']:.3f}, p={top['p']:.2e}")

In [None]:
# Visualize the top hypothesis-driven findings
def _short_name(roi):
    """Extract readable short name from ROI label."""
    if '-' in roi:  # Subcortical (e.g., NAc-rh)
        return roi
    parts = roi.split('_')
    if len(parts) >= 4:
        return '_'.join(parts[1:-1])  # e.g. DefaultA_PFCm
    return roi

if n_significant > 0:
    sig_edges = hyp_results[hyp_results['significant_fdr']]
    n_plots = min(len(sig_edges), 3)
    fig, axes = plt.subplots(1, n_plots, figsize=(5*n_plots, 4))
    if n_plots == 1:
        axes = [axes]

    for idx, (_, row) in enumerate(sig_edges.head(n_plots).iterrows()):
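        # Note: this panel uses the raw, full-sample values. If you set covariates,
        # outlier exclusion, or a subgroup in Step 3, r and p here can differ from
        # the pre-registered results stored in hyp_results.
        edge_vals = helpers.get_edge(row['ROI_A'], row['ROI_B'])
        outcome = behavior['Pain_VAS'].values
        r_val, p_val = pearsonr(edge_vals, outcome)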

        axes[idx].scatter(edge_vals, outcome, alpha=0.5, color='steelblue')
        z = np.polyfit(edge_vals, outcome, 1)
        x_line = np.linspace(edge_vals.min(), edge_vals.max(), 100)
        axes[idx].plot(x_line, np.polyval(z, x_line), color='coral', linewidth=2)
        axes[idx].set_xlabel('Functional Connectivity')
        axes[idx].set_ylabel('Pain_VAS')

        short_a = _short_name(row['ROI_A'])
        short_b = _short_name(row['ROI_B'])
        axes[idx].set_title(f'{short_a} <-> {short_b}\nr = {r_val:.3f}, p = {p_val:.2e}')

    plt.tight_layout()
    plt.show()
else:
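    # Plot the most promising edge even if it is not significant,
    # using the same pre-registered choices as in Step 3
    top = hyp_results.iloc[0]
    helpers.plot_edge(top['ROI_A'], top['ROI_B'], 'Pain_VAS',
                      covariates=covariates,
                      exclude_outliers=outlier_threshold,
                      subgroup=subgroup)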

### Visualize Results on the Brain

Let's see where your significant edges are located in 3D brain space:

In [None]:
# Plot significant edges on a glass brain (using FDR-corrected p-values)
helpers.plot_glass_brain(hyp_results, p_threshold=0.05, corrected=True)

In [None]:
# P-value visualization for all hypothesis tests
plt.figure(figsize=(10, 5))

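# Rank edges by p-value so the x-axis matches its label
ranked = hyp_results.sort_values('p').reset_index(drop=True)
colors = ['mediumseagreen' if sig else 'gray' for sig in ranked['significant_fdr']]
plt.scatter(range(len(ranked)), ranked['p'], c=colors, s=60)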
plt.axhline(bonferroni_threshold, color='coral', linestyle='--', linewidth=1,
            label=f'Bonferroni threshold (p={bonferroni_threshold:.2e})')
plt.axhline(0.05, color='orange', linestyle='--', linewidth=1,
            label='Uncorrected alpha = 0.05')
plt.yscale('log')
plt.xlabel('Edge (ranked by p-value)')
plt.ylabel('P-value (log scale)')
plt.title(f'P-values for {n_hyp_tests} NAc-PFCm Tests')
plt.legend()
plt.grid(True, alpha=0.2)
plt.tight_layout()
plt.show()

print('Green dots = significant after FDR correction')
print('Note: FDR correction uses a step-up procedure, not a single threshold line.')
print('The Bonferroni line is shown for reference only.')

---

## Part 9: Prepare Your Presentation

For next class, prepare a brief presentation (5-7 minutes) covering:

1. **Your hypothesis** -- What did you predict and why?
2. **Your pre-registered methods** -- Which edges did you test? What correction, covariates, and outlier threshold did you use?
3. **Your results** -- Report ALL tests (not just significant ones). Show visualizations.
4. **Why you believe your findings** -- What makes you confident? Do the brain regions make sense?
5. **How would you convince a skeptic?** -- What evidence would you point to?

**Important**: Be honest about null results! Rigorous science means reporting what you found, not just what you hoped to find.

In [None]:
# Save your key figure for your presentation
# Re-plot your strongest finding (or the most promising edge if nothing was significant)
if n_significant > 0:
    top = hyp_results[hyp_results['significant_fdr']].iloc[0]
else:
    top = hyp_results.iloc[0]

helpers.plot_edge(top['ROI_A'], top['ROI_B'], 'Pain_VAS',
                  covariates=covariates,
                  exclude_outliers=outlier_threshold,
                  subgroup=subgroup)
plt.savefig('my_finding.png', dpi=150, bbox_inches='tight')
print("Figure saved as 'my_finding.png'")

if n_significant > 0:
    print("Download this file for your presentation.")
else:
    print("Even null results are worth presenting!")

### Submit Your Notebook

When you are finished with Day 1, download this notebook and email it to **caleb_mahlen@urmc.rochester.edu**.

To download from Colab: **File → Download → Download .ipynb**

Your pre-registration, code, and any output you have generated will all be saved in the file.