# 🧠 Connectopy Analysis - One-Click Demo

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/Sean0418/connectopy/blob/main/notebooks/colab_demo.ipynb)

This notebook demonstrates the Connectopy analysis pipeline using **real HCP data**. Just click **Runtime → Run all** to execute the entire analysis!

## What this notebook does:
1. 📦 Clones the repository and installs the connectopy package
2. 📊 Loads HCP connectome data (cognitive + brain features)
3. 🔬 Runs sexual dimorphism analysis
4. 🍷 **Alcohol Classification**: Predicts alcohol use disorder using 4 models:
   - Random Forest, EBM, SVM, Logistic Regression
   - Sex-stratified models (separate for Males/Females)
   - GridSearchCV for hyperparameter tuning
   - Class imbalance handling (sample weights, SMOTE) and feature selection (SelectKBest)
5. 🔗 **Mediation Analysis**: Tests brain network mediation of cognitive-alcohol relationships
6. 📈 Visualizes the results

---


## Step 1: Setup Environment

First, we'll clone the repository and install dependencies. This takes about 2-3 minutes.


In [None]:
# ============================================================================
# REPRODUCIBILITY: Set global random seed FIRST
# ============================================================================
RANDOM_SEED = 42

import random

random.seed(RANDOM_SEED)

import numpy as np

np.random.seed(RANDOM_SEED)

# Set PYTHONHASHSEED for any subprocesses started from this session
# (it does not change hashing in the already-running interpreter)
import os

os.environ["PYTHONHASHSEED"] = str(RANDOM_SEED)

print(f"🎲 Global random seed set to {RANDOM_SEED}")

# Install interpret FIRST (required for EBM) - must be before package import
%pip install -q interpret

# Clone or update the repository
import shutil
import sys

# Always start from /content
%cd /content

# Clean up any old directories
for old_dir in ["Brain-Connectome", "connectopy"]:
    if os.path.exists(old_dir):
        print(f"Removing old {old_dir} directory...")
        shutil.rmtree(old_dir)

# Clear any cached imports BEFORE cloning
for mod in list(sys.modules.keys()):
    if "connectopy" in mod:
        del sys.modules[mod]

print("Cloning repository...")
!git clone https://github.com/Sean0418/connectopy.git
%cd /content/connectopy

# Verify structure
print(f"Current directory: {os.getcwd()}")
print(f"Contents: {os.listdir('.')}")

# Install the package
%pip install -q -e .

# Add src to path (needed for editable install with src layout in Colab)
import sys

src_path = "/content/connectopy/src"
if src_path not in sys.path:
    sys.path.insert(0, src_path)

# Verify import works
from connectopy.analysis import DimorphismAnalysis

print(f"✅ Import test passed: {DimorphismAnalysis}")

print("✅ Setup complete!")

## Step 2: Load Data

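This step loads the processed HCP data from `data/processed/full_data.csv` (relative to the cloned repository) and derives the binary alcohol target `alc_y` if it is missing. To analyze your own data, mount Google Drive or upload your HCP file to that path.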
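
If the data file lives somewhere else (for example on a mounted Google Drive), one option is to copy it into the expected location before running the loading cell. The helper below is a minimal sketch: `stage_data_file` and the Drive path in the usage note are hypothetical and not part of connectopy.

```python
import shutil
from pathlib import Path


def stage_data_file(candidates, target):
    """Copy the first existing candidate file to `target`.

    Hypothetical helper for illustration; not part of connectopy.
    Returns True if `target` exists afterwards, False otherwise.
    """
    target = Path(target)
    if target.exists():
        return True  # already in place, nothing to copy
    for candidate in map(Path, candidates):
        if candidate.is_file():
            # Create data/processed/ (or whatever the parent is) if needed
            target.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy(candidate, target)
            return True
    return False
```

In Colab, after `drive.mount('/content/drive')`, a call such as `stage_data_file(["/content/drive/MyDrive/hcp/full_data.csv"], "data/processed/full_data.csv")` would place the file where the next cell looks for it; the Drive path here is only an example.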


In [None]:
import re
from pathlib import Path

import numpy as np
import pandas as pd

# Ensure numpy seed is set after import
np.random.seed(RANDOM_SEED)


def to_display_label(col: str) -> str:
    """Convert column name to clearer display label.
    
    Mapping:
    - Struct_PC1 → TNPCA_Struct_PC1
    - Func_PC1 → TNPCA_Func_PC1
    - Raw_Struct_PC1 → PCA_Struct_PC1
    - Raw_Func_PC1 → PCA_Func_PC1
    - VAE_* → unchanged (already clear)
    """
    # TN-PCA: add TNPCA_ prefix to Struct_PC* and Func_PC* (but not Raw_ or VAE_)
    if re.match(r"^(Struct_PC|Func_PC)\d+$", col):
        return f"TNPCA_{col}"
    # PCA: rename Raw_Struct_PC* → PCA_Struct_PC*, Raw_Func_PC* → PCA_Func_PC*
    if col.startswith("Raw_Struct_PC"):
        return col.replace("Raw_Struct_PC", "PCA_Struct_PC")
    if col.startswith("Raw_Func_PC"):
        return col.replace("Raw_Func_PC", "PCA_Func_PC")
    # VAE and everything else: keep as-is
    return col


# Load the HCP data
data_path = Path("data/processed/full_data.csv")

if not data_path.exists():
    raise FileNotFoundError(
        f"Data file not found at {data_path}\n"
        "Please ensure the HCP data is available.\n"
        "Options:\n"
        "  1. Mount Google Drive with your data: from google.colab import drive; drive.mount('/content/drive')\n"
        "  2. Upload full_data.csv to data/processed/\n"
        "  3. Download HCP data from https://db.humanconnectome.org/"
    )

print("Loading HCP data...")
data = pd.read_csv(data_path)

# Create alcohol target from SSAGA_Alc_D4_Ab_Dx if not present
# HCP coding: 1 = No diagnosis, 5 = Yes diagnosis (alcohol abuse/dependence)
if "alc_y" not in data.columns:
    if "SSAGA_Alc_D4_Ab_Dx" in data.columns:
        data["alc_y"] = np.where(data["SSAGA_Alc_D4_Ab_Dx"] == 5, 1, 0).astype(int)
        print("Created alcohol target (alc_y) from SSAGA_Alc_D4_Ab_Dx")
    else:
        raise ValueError("No alcohol target column found. Need 'alc_y' or 'SSAGA_Alc_D4_Ab_Dx'")

print(f"\n📊 Dataset loaded: {data.shape[0]} subjects, {data.shape[1]} features")
print("\nGender distribution:")
print(data["Gender"].value_counts())
print("\n🍷 Alcohol diagnosis (alc_y) distribution:")
print(data["alc_y"].value_counts())
print(f"   Positive rate: {data['alc_y'].mean():.1%}")
data.head()

## Step 3: Sexual Dimorphism Analysis

We'll analyze which brain connectivity features differ significantly between males and females.
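
The results table in this step reports a `Cohen_D` effect size and an FDR-adjusted `P_Adjusted` value per feature. As a rough illustration of what those two quantities mean, here is a plain-Python sketch of Cohen's d with a pooled standard deviation and of the Benjamini-Hochberg adjustment. Whether `DimorphismAnalysis` uses exactly these formulas (pooled SD, direction of the difference, BH rather than another FDR procedure) is an assumption; the function names are illustrative.

```python
import math
from statistics import mean, variance


def cohens_d(group_a, group_b):
    """Standardized mean difference (a minus b) using the pooled sample SD."""
    n_a, n_b = len(group_a), len(group_b)
    pooled_var = (
        (n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)
    ) / (n_a + n_b - 2)
    return (mean(group_a) - mean(group_b)) / math.sqrt(pooled_var)


def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

A feature would then be flagged as significant when its adjusted p-value falls below 0.05, which matches the "FDR < 0.05" wording printed by the cell below.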


In [None]:
from connectopy.analysis import DimorphismAnalysis

# Run dimorphism analysis
analysis = DimorphismAnalysis(data, gender_column="Gender")

# Analyze ALL connectome features (structural + functional from all variants)
# TN-PCA: Struct_PC*, Func_PC* | PCA: Raw_Struct_PC*, Raw_Func_PC* | VAE: VAE_Struct_LD*, VAE_Func_LD*
all_conn_features = []
for prefix in ["Struct_PC", "Func_PC", "Raw_Struct_PC", "Raw_Func_PC", "VAE_Struct_LD", "VAE_Func_LD"]:
    all_conn_features.extend([c for c in data.columns if c.startswith(prefix)])

print(f"Analyzing {len(all_conn_features)} connectome features for sexual dimorphism...")
results = analysis.analyze(feature_columns=all_conn_features)

# Add display labels for clearer output
results["Display_Label"] = results["Feature"].apply(to_display_label)

# Show results
n_significant = results["Significant"].sum()
print(f"\n🔬 Found {n_significant} significant features (FDR < 0.05)")
print("\n📋 Top 10 features by effect size:")
results[["Display_Label", "Cohen_D", "P_Adjusted", "Significant"]].head(10)

In [None]:
import matplotlib.pyplot as plt

# Plot effect sizes with display labels
fig, ax = plt.subplots(figsize=(10, 8))

top20 = results.head(20)
colors = ["#1f77b4" if d < 0 else "#d62728" for d in top20["Cohen_D"]]

ax.barh(range(len(top20)), top20["Cohen_D"].values, color=colors)
ax.set_yticks(range(len(top20)))
ax.set_yticklabels(top20["Display_Label"])  # Use clearer display labels
ax.set_xlabel("Cohen's D (Effect Size)")
ax.set_title("Sexual Dimorphism: Top 20 Features by Effect Size")
ax.axvline(0, color="black", linestyle="-", linewidth=0.5)
ax.invert_yaxis()

plt.tight_layout()
plt.show()

print("\n📊 Blue bars: Feature is higher in females")
print("📊 Red bars: Feature is higher in males")

## Step 4: Machine Learning Classification

We'll train multiple classifiers to predict alcohol use disorder:
- **Random Forest (RF)**: Ensemble of decision trees
- **EBM**: Explainable Boosting Machine (interpretable)
- **SVM**: Support Vector Machine
- **Logistic**: Logistic Regression with L1/L2 regularization
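
Because the positive class is rare, the Random Forest in this step is created with `class_weight="balanced"`. Assuming the wrapper passes this through to scikit-learn, the "balanced" heuristic weights each class by `n_samples / (n_classes * class_count)`. The sketch below reproduces that formula in plain Python on toy labels; `balanced_class_weights` is an illustrative name, not a connectopy function.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}


# Toy labels with a 20% positive rate (made up for illustration)
labels = [0] * 8 + [1] * 2
weights = balanced_class_weights(labels)

# Per-sample weights: each class contributes the same total weight
sample_weights = [weights[y] for y in labels]
```

With these weights the two classes contribute equally to the training loss, so the minority class is not drowned out by the majority.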


In [None]:
from connectopy.models import (
    ConnectomeEBM,
    ConnectomeLogistic,
    ConnectomeRandomForest,
    ConnectomeSVM,
    get_cognitive_features,
    get_connectome_features,
)

# Get cognitive and connectome features for each variant
cog_features = get_cognitive_features(data, include_age=True)
tnpca_features = get_connectome_features(data, "tnpca")  # TN-PCA: topology-preserving dim reduction
vae_features = get_connectome_features(data, "vae")      # VAE: variational autoencoder latent dims
pca_features = get_connectome_features(data, "pca")      # PCA: standard principal components

# Define feature sets to train on
feature_sets = {
    "TNPCA": cog_features + tnpca_features,
    "PCA": cog_features + pca_features,
    "VAE": cog_features + vae_features,
    "ALL": cog_features + tnpca_features + vae_features + pca_features,
}

# Remove empty feature sets (e.g., if VAE data not available)
feature_sets = {k: v for k, v in feature_sets.items() if len(v) > len(cog_features)}

print("📊 Feature Sets:")
print(f"   Cognitive: {len(cog_features)}")
for name, feats in feature_sets.items():
    conn_count = len(feats) - len(cog_features)
    print(f"   {name}: {len(cog_features)} cog + {conn_count} conn = {len(feats)} total")

# Store results for all models
all_results = []
rf_models = {}

# Train RF for each feature set × sex combination
for feat_name, feature_cols in feature_sets.items():
    for sex in ["M", "F"]:
        df_sex = data[data["Gender"] == sex].copy()
        sub = df_sex[feature_cols + ["alc_y"]].dropna()

        if len(sub) < 30:
            continue

        X = sub[feature_cols].values
        y = sub["alc_y"].astype(int).values

        if len(np.unique(y)) < 2:
            continue

        print(f"\n{'='*50}")
        print(f"🔬 RF: {feat_name} features, Sex={sex}")
        print(f"{'='*50}")
        print(f"   Features: {len(feature_cols)}, Samples: {len(y)}, Positive: {y.mean():.1%}")

        rf = ConnectomeRandomForest(n_estimators=200, class_weight="balanced", random_state=RANDOM_SEED, n_jobs=-1)
        metrics = rf.fit_with_cv(
            X, y,
            feature_names=feature_cols,
            handle_imbalance=True,
            param_grid={"rf__n_estimators": [100, 200], "rf__max_depth": [None, 10]},
        )

        metrics["sex"] = sex
        metrics["model"] = "RF"
        metrics["features"] = feat_name
        all_results.append(metrics)
        rf_models[(feat_name, sex)] = rf

        print(f"   ✅ Test AUC: {metrics['test_auc']:.3f}, Bal Acc: {metrics['test_bal_acc']:.3f}")

In [None]:
# Plot RF feature importance for best model (ALL features) per sex
fig, axes = plt.subplots(1, 2, figsize=(14, 6))

for idx, sex in enumerate(["M", "F"]):
    # Use "ALL" features model if available, else first available
    key = ("ALL", sex) if ("ALL", sex) in rf_models else None
    if key is None:
        for k in rf_models:
            if k[1] == sex:
                key = k
                break
    if key is None:
        continue

    rf = rf_models[key]
    importance = rf.get_top_features(n=15)
    top15 = importance.head(15).iloc[::-1].copy()

    # Apply display labels for cleaner feature names
    top15["Display_Label"] = top15["Feature"].apply(to_display_label)

    ax = axes[idx]
    colors = plt.colormaps["viridis"](np.linspace(0.3, 0.9, len(top15)))
    ax.barh(top15["Display_Label"], top15["Importance"], color=colors)
    ax.set_xlabel("Importance")
    ax.set_title(f"RF Top 15 Features ({key[0]}, {sex}) - Alcohol Classification")

plt.tight_layout()
plt.show()

## Step 5: Additional Models (EBM, SVM, Logistic)

Train additional classifiers for comparison:
- **EBM**: Interpretable glass-box model
- **SVM**: Support Vector Machine with RBF kernel
- **Logistic**: Regularized logistic regression (L1/L2)
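
The models are compared on test AUC and balanced accuracy. For readers less familiar with these metrics, the sketch below computes both from first principles using their standard definitions. How connectopy computes `test_auc` and `test_bal_acc` internally is not shown in this notebook, so treat this as an illustration of the definitions rather than the package's code.

```python
def roc_auc(y_true, scores):
    """AUC as the probability that a random positive outranks a random negative."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    # Ties between a positive and a negative score count as half a win
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def balanced_accuracy(y_true, y_pred):
    """Mean of sensitivity (recall on 1s) and specificity (recall on 0s)."""
    tp = sum(1 for y, p in zip(y_true, y_pred) if y == 1 and p == 1)
    tn = sum(1 for y, p in zip(y_true, y_pred) if y == 0 and p == 0)
    n_pos = sum(1 for y in y_true if y == 1)
    n_neg = len(y_true) - n_pos
    return 0.5 * (tp / n_pos + tn / n_neg)
```

Both metrics are insensitive to the class ratio, which is why they are more informative than plain accuracy when only a small fraction of subjects has a positive diagnosis.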


In [None]:
# Store models
ebm_models = {}
svm_models = {}
logistic_models = {}

# Use only "ALL" feature set for speed (comment out to train all)
train_feature_sets = {"ALL": feature_sets["ALL"]} if "ALL" in feature_sets else feature_sets

# Train EBM, SVM, Logistic for each feature set × sex
for feat_name, feature_cols in train_feature_sets.items():
    for sex in ["M", "F"]:
        df_sex = data[data["Gender"] == sex].copy()
        sub = df_sex[feature_cols + ["alc_y"]].dropna()

        if len(sub) < 30:
            continue

        X = sub[feature_cols].values
        y = sub["alc_y"].astype(int).values

        if len(np.unique(y)) < 2:
            continue

        print(f"\n{'='*60}")
        print(f"📊 Training {feat_name} features, Sex={sex}")
        print(f"   Features: {len(feature_cols)}, Samples: {len(y)}, Positive: {y.mean():.1%}")
        print(f"{'='*60}")

        # EBM
        print("🔬 EBM...", end=" ")
        ebm = ConnectomeEBM(max_bins=32, learning_rate=0.01, max_leaves=3, interactions=0, random_state=RANDOM_SEED)
        ebm_metrics = ebm.fit_with_cv(X, y, feature_names=feature_cols, handle_imbalance=True, param_grid={"max_leaves": [2, 3]})
        ebm_metrics.update({"sex": sex, "model": "EBM", "features": feat_name})
        all_results.append(ebm_metrics)
        ebm_models[(feat_name, sex)] = ebm
        print(f"AUC={ebm_metrics['test_auc']:.3f}")

        # SVM
        print("🔬 SVM...", end=" ")
        svm = ConnectomeSVM(random_state=RANDOM_SEED)
        svm_metrics = svm.fit_with_cv(X, y, feature_names=feature_cols, param_grid={"svm__C": [1, 10], "svm__kernel": ["rbf"]}, select_k_best=50)
        svm_metrics.update({"sex": sex, "model": "SVM", "features": feat_name})
        all_results.append(svm_metrics)
        svm_models[(feat_name, sex)] = svm
        print(f"AUC={svm_metrics['test_auc']:.3f}")

        # Logistic
        print("🔬 Logistic...", end=" ")
        logistic = ConnectomeLogistic(random_state=RANDOM_SEED)
        log_metrics = logistic.fit_with_cv(X, y, feature_names=feature_cols, param_grid=[{"logistic__C": [0.1, 1], "logistic__penalty": ["l2"], "logistic__solver": ["lbfgs"]}], select_k_best=50)
        log_metrics.update({"sex": sex, "model": "Logistic", "features": feat_name})
        all_results.append(log_metrics)
        logistic_models[(feat_name, sex)] = logistic
        print(f"AUC={log_metrics['test_auc']:.3f}, NonZero={log_metrics['n_nonzero_coefs']}")

# Summary comparison table
print("\n" + "="*70)
print("📊 MODEL COMPARISON SUMMARY (by Feature Set)")
print("="*70)
results_df = pd.DataFrame(all_results)
summary_cols = ["model", "features", "sex", "test_auc", "test_bal_acc"]
summary_cols = [c for c in summary_cols if c in results_df.columns]
print(results_df[summary_cols].sort_values(["features", "model", "sex"]).to_string(index=False))

# Best model per feature set
print("\n" + "="*70)
print("🏆 BEST MODEL PER FEATURE SET")
print("="*70)
for feat in results_df["features"].unique():
    subset = results_df[results_df["features"] == feat]
    best = subset.loc[subset["test_auc"].idxmax()]
    print(f"   {feat}: {best['model']} ({best['sex']}) - AUC={best['test_auc']:.3f}")

## Step 6: Mediation Analysis

Test whether brain networks **mediate** the relationship between cognitive traits and alcohol outcomes, stratified by sex.

**Research Question**: *Do the top-ranked features from our ML models show mediation effects? Does brain connectivity mediate the cognitive-alcohol relationship differently by sex?*

**Approach**: 
1. Use the **best performing model** (by AUC) to extract **top 3 cognitive** and **top 3 brain** features per variant
2. Test all combinations to find significant mediation pathways
3. Compare effects across variants and sexes

```
Top Cognitive Features (X) → Top Brain Features (M) → Alcohol (Y)
                                      ↑
                                 Sex (moderator)
```
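
To make the X → M → Y diagram concrete, the sketch below estimates an indirect effect as the product of two regression coefficients (`a` from M ~ X, `b` from Y ~ X + M) and attaches a percentile bootstrap interval. It is a simplified linear version written in plain Python: `alc_y` is binary, and `SexStratifiedMediation` may well use a different model (for example logistic regression) and a different interval, so the function names and formulas here are assumptions for illustration only. In a sex-stratified analysis the same calculation would be run separately on the male and female subsets.

```python
import random
from statistics import mean


def _cov(u, v):
    """Sample covariance of two equal-length sequences."""
    mu, mv = mean(u), mean(v)
    return sum((a - mu) * (b - mv) for a, b in zip(u, v)) / (len(u) - 1)


def indirect_effect(x, m, y):
    """Product-of-coefficients estimate a*b from two least-squares fits."""
    sxx, smm, sxm = _cov(x, x), _cov(m, m), _cov(x, m)
    sxy, smy = _cov(x, y), _cov(m, y)
    a = sxm / sxx  # slope of M ~ X
    b = (smy * sxx - sxy * sxm) / (sxx * smm - sxm ** 2)  # M coefficient in Y ~ X + M
    return a * b


def bootstrap_ci(x, m, y, n_bootstrap=500, seed=42, alpha=0.05):
    """Percentile bootstrap interval for the indirect effect."""
    rng = random.Random(seed)
    n = len(x)
    estimates = []
    for _ in range(n_bootstrap):
        # Resample subjects with replacement, keeping (x, m, y) triples together
        idx = [rng.randrange(n) for _ in range(n)]
        estimates.append(
            indirect_effect([x[i] for i in idx], [m[i] for i in idx], [y[i] for i in idx])
        )
    estimates.sort()
    lo = estimates[int((alpha / 2) * n_bootstrap)]
    hi = estimates[int((1 - alpha / 2) * n_bootstrap) - 1]
    return lo, hi
```

Under this scheme a pathway counts as significant when the bootstrap interval excludes zero.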


In [None]:
from connectopy.analysis import SexStratifiedMediation

# Find the BEST performing model (by AUC) to extract TOP N features
# Test multiple combinations to find significant mediation pathways

N_TOP = 3  # Number of top features to test per category

print(f"📊 Selecting best model (by AUC) and extracting top {N_TOP} features...")
print()

# Find best overall model by AUC
best_overall = max(all_results, key=lambda x: x["test_auc"])
model_type = best_overall["model"]
feat_set = best_overall["features"]
sex = best_overall["sex"]

print(f"🏆 Best model: {model_type} ({feat_set}, {sex}) with AUC={best_overall['test_auc']:.3f}")

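# Get the model used for feature ranking. Only the RF and EBM stores are
# consulted: if SVM or Logistic scored best, the EBM for the same
# feature set and sex is used instead.
if model_type == "RF":
    best_model = rf_models.get((feat_set, sex))
else:
    best_model = ebm_models.get((feat_set, sex))
if best_model is None:
    raise RuntimeError(f"No RF or EBM model stored for ({feat_set}, {sex})")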

importance = best_model.get_top_features(n=100)  # Get many to find top N of each type

# Find top N cognitive features
top_cog_features = []
for feat in importance["Feature"]:
    if feat in cog_features and feat not in top_cog_features:
        top_cog_features.append(feat)
        if len(top_cog_features) >= N_TOP:
            break

# Find top N brain features from each variant
top_brain_features = {}
for variant, variant_feats in [("TNPCA", tnpca_features), ("PCA", pca_features), ("VAE", vae_features)]:
    top_brain_features[variant] = []
    for feat in importance["Feature"]:
        if feat in variant_feats and feat not in top_brain_features[variant]:
            top_brain_features[variant].append(feat)
            if len(top_brain_features[variant]) >= N_TOP:
                break

print(f"\n📋 Top {N_TOP} cognitive features: {top_cog_features}")
for var, feats in top_brain_features.items():
    print(f"📋 Top {N_TOP} {var} brain features: {[to_display_label(f) for f in feats]}")

alcohol_col = "alc_y"

In [None]:
# Run mediation for ALL COMBINATIONS of top cognitive × brain features
# Count the brain features actually found (a variant may have fewer than N_TOP)
n_brain = sum(len(feats) for feats in top_brain_features.values())
print(f"🔬 Testing all combinations: {len(top_cog_features)} cog × {n_brain} brain features across {len(top_brain_features)} variants...")
print(f"   Total tests: {len(top_cog_features) * n_brain}")
print()

mediation_results = []
mediation = SexStratifiedMediation(n_bootstrap=500, random_state=RANDOM_SEED)  # Fewer bootstraps for speed

for variant, brain_feats in top_brain_features.items():
    for cog_col in top_cog_features:
        for brain_col in brain_feats:
            result = mediation.fit(
                data=data,
                cognitive_col=cog_col,
                brain_col=brain_col,
                alcohol_col=alcohol_col,
                sex_col="Gender",
            )

            mediation_results.append({
                "variant": variant,
                "cognitive": cog_col,
                "brain": brain_col,
                "brain_label": to_display_label(brain_col),
                "male_effect": result.male.indirect_effect,
                "male_sig": result.male.significant,
                "female_effect": result.female.indirect_effect,
                "female_sig": result.female.significant,
                "diff": result.difference,
                "diff_sig": result.diff_significant,
                "result": result,
            })

med_df = pd.DataFrame(mediation_results)

# Show summary
print("="*70)
print("📋 MEDIATION RESULTS SUMMARY")
print("="*70)

# Count significant findings
n_male_sig = med_df["male_sig"].sum()
n_female_sig = med_df["female_sig"].sum()
n_diff_sig = med_df["diff_sig"].sum()

print("\n📊 Significant mediations found:")
print(f"   Males: {n_male_sig}/{len(med_df)} pathways")
print(f"   Females: {n_female_sig}/{len(med_df)} pathways")
print(f"   Sex differences: {n_diff_sig}/{len(med_df)} pathways")

# Show top pathways by absolute effect size (sign is kept in the printed value)
print("\n🏆 Top 10 pathways by absolute effect size (Males):")
top_male_idx = med_df["male_effect"].abs().nlargest(10).index
top_male = med_df.loc[top_male_idx, ["cognitive", "brain_label", "male_effect", "male_sig"]].copy()
top_male["male_effect"] = top_male["male_effect"].apply(lambda x: f"{x:+.4f}")
top_male["male_sig"] = top_male["male_sig"].apply(lambda x: "✅" if x else "")
print(top_male.to_string(index=False))

print("\n🏆 Top 10 pathways by absolute effect size (Females):")
top_female_idx = med_df["female_effect"].abs().nlargest(10).index
top_female = med_df.loc[top_female_idx, ["cognitive", "brain_label", "female_effect", "female_sig"]].copy()
top_female["female_effect"] = top_female["female_effect"].apply(lambda x: f"{x:+.4f}")
top_female["female_sig"] = top_female["female_sig"].apply(lambda x: "✅" if x else "")
print(top_female.to_string(index=False))

In [None]:
# Visualize: Bar chart comparing mediation effects across all pathways
fig, axes = plt.subplots(1, 2, figsize=(14, 6))

# Prepare data for plotting - group by variant
for idx, (sex, sex_label) in enumerate([("male", "Males"), ("female", "Females")]):
    ax = axes[idx]

    # Get data sorted by effect size
    col = f"{sex}_effect"
    sig_col = f"{sex}_sig"
    plot_df = med_df.sort_values(col, ascending=True).copy()

    # Create labels
    plot_df["label"] = plot_df["cognitive"].str[:12] + " → " + plot_df["brain_label"].str[:15]

    # Color by variant
    variant_colors = {"TNPCA": "#1f77b4", "PCA": "#ff7f0e", "VAE": "#2ca02c"}
    colors = [variant_colors.get(v, "gray") for v in plot_df["variant"]]

    # Add check mark for significant pathways
    labels = [f"{l} ✓" if s else l for l, s in zip(plot_df["label"], plot_df[sig_col])]

    bars = ax.barh(range(len(plot_df)), plot_df[col], color=colors, edgecolor="black", linewidth=0.5)
    ax.set_yticks(range(len(plot_df)))
    ax.set_yticklabels(labels, fontsize=8)
    ax.set_xlabel("Indirect Effect (Mediation)")
    ax.set_title(f"{sex_label}: Mediation Effects\n(✓ = significant)", fontsize=12)
    ax.axvline(0, color="black", linestyle="-", linewidth=0.5)

# Add legend
from matplotlib.patches import Patch

legend_elements = [Patch(facecolor=c, label=v) for v, c in variant_colors.items()]
axes[1].legend(handles=legend_elements, loc="lower right", title="Variant")

plt.tight_layout()
plt.show()

# Heatmap of effects by cognitive × variant
print("\n📊 Mediation effects by Cognitive Feature × Variant:")
pivot_male = med_df.pivot_table(values="male_effect", index="cognitive", columns="variant", aggfunc="mean")
pivot_female = med_df.pivot_table(values="female_effect", index="cognitive", columns="variant", aggfunc="mean")

fig, axes = plt.subplots(1, 2, figsize=(12, 4))
for idx, (pivot, title) in enumerate([(pivot_male, "Males"), (pivot_female, "Females")]):
    ax = axes[idx]
    im = ax.imshow(pivot.values, cmap="RdBu_r", aspect="auto", vmin=-0.001, vmax=0.001)
    ax.set_xticks(range(len(pivot.columns)))
    ax.set_xticklabels(pivot.columns)
    ax.set_yticks(range(len(pivot.index)))
    ax.set_yticklabels(pivot.index, fontsize=9)
    ax.set_title(f"{title}: Mean Indirect Effect")
    plt.colorbar(im, ax=ax, label="Effect")

plt.tight_layout()
plt.show()

## 📋 Summary

This notebook demonstrated the Connectopy analysis pipeline:

1. **Data Loading**: Loaded HCP connectome data (TN-PCA, PCA, VAE features)
2. **Dimorphism Analysis**: Identified sexually dimorphic brain connectivity patterns across all variants
3. **ML Classification**: Trained 4 classifiers for alcohol prediction:
   - **Random Forest (RF)**: Ensemble of decision trees
   - **EBM**: Explainable Boosting Machine (interpretable)
   - **SVM**: Support Vector Machine with feature selection
   - **Logistic**: Regularized logistic regression (L1/L2)
4. **Mediation Analysis**: Tested sex-stratified mediation using top features from best model
5. **Visualization**: Created publication-ready comparison plots

### Research Questions Addressed

> *1. Which connectome features show sexual dimorphism?*
> *2. Can we predict alcohol use disorder from cognitive + connectome features?*
> *3. Do brain networks mediate cognitive-alcohol relationships differently by sex and connectome representation?*

### Next Steps

- **Use your own data**: Upload HCP data to Google Drive and mount it
- **Run full pipeline**: Use `!python Runners/run_pipeline.py` for complete analysis
- **Use Docker**: `docker pull ghcr.io/sean0418/connectopy:latest`

### Links

- 📦 [GitHub Repository](https://github.com/Sean0418/connectopy)
- 🐳 [Docker Image](https://ghcr.io/sean0418/connectopy)


In [None]:
print("\n" + "=" * 60)
print("🎉 Analysis Complete!")
print("=" * 60)
print(f"\n📊 Analyzed {data.shape[0]} subjects")
print(f"🔬 Found {n_significant} significant dimorphic features")

# Show best model results
if all_results:
    best_result = max(all_results, key=lambda x: x.get("test_auc", 0))
    print("\n🍷 Alcohol Classification Results:")
    print(f"   Best model: {best_result['model']} ({best_result['features']}, {best_result['sex']})")
    print(f"   Test AUC: {best_result['test_auc']:.3f}")
    print(f"   Test Balanced Accuracy: {best_result['test_bal_acc']:.3f}")

# Show mediation summary
print(f"\n🔗 Mediation Results ({len(med_df)} pathways tested):")
print(f"   Significant in Males: {med_df['male_sig'].sum()}")
print(f"   Significant in Females: {med_df['female_sig'].sum()}")
print(f"   Significant sex differences: {med_df['diff_sig'].sum()}")

# Show strongest pathway (largest absolute indirect effect in males)
if len(med_df) > 0:
    strongest = med_df.loc[med_df["male_effect"].abs().idxmax()]
    print(f"\n   Strongest pathway: {strongest['cognitive']} → {strongest['brain_label']}")
    print(f"   Male effect: {strongest['male_effect']:+.4f}, Female effect: {strongest['female_effect']:+.4f}")

print("\n⭐ Star us on GitHub: https://github.com/Sean0418/connectopy")