# 📊 AI-Powered Neurodiagnostic Data Analysis

This notebook uses custom Python code, with data-analysis-crow as an optional AI-assisted add-on, to analyze:
- **EEG/MEG data** (epilepsy, sleep disorders, brain-computer interfaces)
- **EMG/NCS data** (neuromuscular disorders)
- **MRI volumetric data** (brain atrophy, lesion load)
- **Clinical trial data** (treatment outcomes, biomarkers)
- **Neuropsychological test scores** (cognitive assessments)

---

## Step 1: Install Required Packages

In [None]:
# Core data science packages
!pip install -q pandas numpy scipy matplotlib seaborn

# Neuroimaging and neurophysiology
!pip install -q mne nibabel nilearn

# Machine learning
!pip install -q scikit-learn xgboost

# Statistical analysis
!pip install -q pingouin statsmodels

# Optional: data-analysis-crow (if you want AI-assisted analysis)
# Note: Requires significant setup, shown for reference
# !git clone https://github.com/Future-House/data-analysis-crow.git

print("✅ All packages installed!")

## Step 2: Import Libraries

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from scipy import stats
import pingouin as pg

# Neuroimaging libraries
import mne
from nilearn import plotting, datasets

# Machine learning
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report, confusion_matrix

# Set visualization defaults
sns.set_style("whitegrid")
plt.rcParams['figure.figsize'] = (12, 6)

print("✅ Libraries imported successfully!")

## Example 1: EEG Data Analysis (Epilepsy Seizure Detection)

In [None]:
print("🧠 EEG ANALYSIS: Epilepsy Seizure Detection\n")
print("=" * 80)

# Load sample EEG data. MNE's sample dataset is an auditory/visual experiment,
# not a seizure recording; swap in your own epilepsy EEG for real use.
sample_data_folder = mne.datasets.sample.data_path()
sample_data_raw_file = sample_data_folder / 'MEG' / 'sample' / 'sample_audvis_raw.fif'

# Read raw EEG data
raw = mne.io.read_raw_fif(sample_data_raw_file, preload=True, verbose=False)

# Filter to get EEG channels only
raw.pick('eeg', exclude='bads')  # pick_types() is a legacy method in recent MNE releases

# Apply bandpass filter (1-40 Hz for typical EEG analysis)
raw.filter(1., 40., fir_design='firwin')

print("✅ Loaded EEG data:")
print(f"   Channels: {len(raw.ch_names)}")
print(f"   Sampling rate: {raw.info['sfreq']} Hz")
print(f"   Duration: {raw.times[-1]:.1f} seconds")

# Plot raw EEG
raw.plot(duration=5, n_channels=10, scalings='auto', title='Raw EEG Data')
plt.show()

# Compute power spectral density
spectrum = raw.compute_psd(fmax=40)
spectrum.plot(picks='eeg', average=True)
plt.title('EEG Power Spectral Density')
plt.show()

print("\n📊 Analysis complete!")
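
The power spectral density plotted above is often summarized as power in the classical EEG frequency bands. The following is a minimal sketch of that step, assuming a synthetic single-channel signal instead of the MNE recording so that it runs without downloading data. The helper `band_power` and the band edges are illustrative assumptions; band definitions vary between labs.

```python
import numpy as np
from scipy.signal import welch

def band_power(signal, sfreq, fmin, fmax):
    """Integrate the Welch PSD of a 1-D signal between fmin and fmax (Hz)."""
    freqs, psd = welch(signal, fs=sfreq, nperseg=int(2 * sfreq))
    mask = (freqs >= fmin) & (freqs < fmax)
    # Rectangle-rule integration: PSD values times the frequency resolution
    return float(np.sum(psd[mask]) * (freqs[1] - freqs[0]))

# Synthetic test signal: a 10 Hz (alpha) oscillation plus weak white noise
rng = np.random.default_rng(0)
sfreq = 250.0
t = np.arange(0, 20, 1 / sfreq)
eeg = np.sin(2 * np.pi * 10 * t) + 0.1 * rng.standard_normal(t.size)

# Assumed band edges in Hz
bands = {'delta': (1, 4), 'theta': (4, 8), 'alpha': (8, 13), 'beta': (13, 30)}
powers = {name: band_power(eeg, sfreq, lo, hi) for name, (lo, hi) in bands.items()}
dominant = max(powers, key=powers.get)
print(powers, dominant)
```

With the MNE object from the cell above, `raw.get_data()` returns a channels x samples array, so the same helper could be applied to each row.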

## Example 2: Clinical Trial Data Analysis

In [None]:
print("📈 CLINICAL TRIAL ANALYSIS: AD Drug Efficacy\n")
print("=" * 80)

# Create synthetic clinical trial data (replace with your actual data)
np.random.seed(42)
n_patients = 200

clinical_data = pd.DataFrame({
    'patient_id': range(1, n_patients + 1),
    'age': np.random.normal(72, 8, n_patients),
    'sex': np.random.choice(['M', 'F'], n_patients),
    'apoe4_status': np.random.choice(['carrier', 'non-carrier'], n_patients, p=[0.4, 0.6]),
    'treatment': np.random.choice(['Drug', 'Placebo'], n_patients),
    'baseline_mmse': np.random.normal(24, 2, n_patients),
    # Follow-up columns start as NaN (not None) so they keep a float dtype;
    # None would create object columns that break describe() and the tests below
    'month_6_mmse': np.nan,
    'month_12_mmse': np.nan,
    'baseline_csf_abeta': np.random.normal(450, 100, n_patients),
    'month_12_csf_abeta': np.nan,
    'baseline_csf_tau': np.random.normal(380, 80, n_patients),
    'month_12_csf_tau': np.nan,
})

# Simulate treatment effects
for idx, row in clinical_data.iterrows():
    if row['treatment'] == 'Drug':
        # Drug shows smaller decline
        clinical_data.at[idx, 'month_6_mmse'] = row['baseline_mmse'] - np.random.normal(0.8, 0.3)
        clinical_data.at[idx, 'month_12_mmse'] = row['baseline_mmse'] - np.random.normal(1.5, 0.5)
        clinical_data.at[idx, 'month_12_csf_abeta'] = row['baseline_csf_abeta'] + np.random.normal(80, 30)
        clinical_data.at[idx, 'month_12_csf_tau'] = row['baseline_csf_tau'] - np.random.normal(50, 20)
    else:
        # Placebo shows larger decline
        clinical_data.at[idx, 'month_6_mmse'] = row['baseline_mmse'] - np.random.normal(1.5, 0.4)
        clinical_data.at[idx, 'month_12_mmse'] = row['baseline_mmse'] - np.random.normal(3.0, 0.7)
        clinical_data.at[idx, 'month_12_csf_abeta'] = row['baseline_csf_abeta'] + np.random.normal(20, 30)
        clinical_data.at[idx, 'month_12_csf_tau'] = row['baseline_csf_tau'] + np.random.normal(30, 20)

# Calculate change scores
clinical_data['mmse_change_12m'] = clinical_data['month_12_mmse'] - clinical_data['baseline_mmse']
clinical_data['abeta_change_12m'] = clinical_data['month_12_csf_abeta'] - clinical_data['baseline_csf_abeta']
clinical_data['tau_change_12m'] = clinical_data['month_12_csf_tau'] - clinical_data['baseline_csf_tau']

print("\n📊 Dataset Overview:")
print(clinical_data.describe())

print("\n" + "=" * 80)
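
The row-by-row simulation loop above can also be written in vectorized form, which is shorter and guarantees float columns. This is a minimal sketch for the 12-month MMSE column only, reusing the decline parameters from the loop. The helper `simulate_followup` is a hypothetical name, and because the random draws happen in a different order, the numbers will not match the loop's output exactly.

```python
import numpy as np
import pandas as pd

def simulate_followup(df, rng):
    """Add a float 'month_12_mmse' column using per-arm decline parameters."""
    is_drug = (df['treatment'] == 'Drug').to_numpy()
    # Mean and SD of the 12-month MMSE decline, chosen per treatment arm
    mean = np.where(is_drug, 1.5, 3.0)
    sd = np.where(is_drug, 0.5, 0.7)
    out = df.copy()
    out['month_12_mmse'] = out['baseline_mmse'] - rng.normal(mean, sd)
    return out

rng = np.random.default_rng(42)
n = 200
demo = pd.DataFrame({
    'treatment': rng.choice(['Drug', 'Placebo'], n),
    'baseline_mmse': rng.normal(24, 2, n),
})
demo = simulate_followup(demo, rng)
demo['mmse_change_12m'] = demo['month_12_mmse'] - demo['baseline_mmse']
arm_means = demo.groupby('treatment')['mmse_change_12m'].mean()
print(arm_means)
```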

## Statistical Analysis: Treatment Efficacy

In [None]:
print("📊 STATISTICAL ANALYSIS: Primary Endpoint (MMSE Change)\n")
print("=" * 80)

# Group comparison
drug_group = clinical_data[clinical_data['treatment'] == 'Drug']['mmse_change_12m']
placebo_group = clinical_data[clinical_data['treatment'] == 'Placebo']['mmse_change_12m']

# T-test
t_stat, p_value = stats.ttest_ind(drug_group, placebo_group)

print(f"\nDrug Group (n={len(drug_group)}):")
print(f"   Mean MMSE change: {drug_group.mean():.2f} ± {drug_group.std():.2f}")

print(f"\nPlacebo Group (n={len(placebo_group)}):")
print(f"   Mean MMSE change: {placebo_group.mean():.2f} ± {placebo_group.std():.2f}")

print(f"\n📈 Independent T-test Results:")
print(f"   t-statistic: {t_stat:.3f}")
print(f"   p-value: {p_value:.4f}")

if p_value < 0.05:
    print(f"   ✅ Statistically significant difference (p < 0.05)")
    effect_size = (drug_group.mean() - placebo_group.mean()) / np.sqrt((drug_group.var() + placebo_group.var()) / 2)
    print(f"   Cohen's d: {effect_size:.3f}")
else:
    print(f"   ❌ No statistically significant difference (p ≥ 0.05)")

# Visualize results
fig, axes = plt.subplots(1, 2, figsize=(14, 5))

# Box plot
clinical_data.boxplot(column='mmse_change_12m', by='treatment', ax=axes[0])
axes[0].set_title('MMSE Change at 12 Months by Treatment')
axes[0].set_xlabel('Treatment Group')
axes[0].set_ylabel('MMSE Change Score')
plt.suptitle('')

# Violin plot with swarmplot
sns.violinplot(data=clinical_data, x='treatment', y='mmse_change_12m', ax=axes[1])
sns.swarmplot(data=clinical_data, x='treatment', y='mmse_change_12m', 
              color='black', alpha=0.3, size=3, ax=axes[1])
axes[1].set_title('Distribution of MMSE Changes')
axes[1].set_xlabel('Treatment Group')
axes[1].set_ylabel('MMSE Change Score')

plt.tight_layout()
plt.show()

print("\n" + "=" * 80)
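
The cell above uses Student's t-test and a Cohen's d that averages the two variances, which is exact only when both arms have the same size. As a hedged alternative, the sketch below uses Welch's t-test (`equal_var=False`) and a pooled standard deviation weighted by sample size. The data are synthetic stand-ins for the two arms, and `cohens_d` is a hypothetical helper, not part of SciPy.

```python
import numpy as np
from scipy import stats

def cohens_d(a, b):
    """Cohen's d using the sample-size-weighted pooled standard deviation."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    na, nb = a.size, b.size
    pooled_var = ((na - 1) * a.var(ddof=1) + (nb - 1) * b.var(ddof=1)) / (na + nb - 2)
    return (a.mean() - b.mean()) / np.sqrt(pooled_var)

# Synthetic MMSE change scores with unequal group sizes and variances
rng = np.random.default_rng(0)
drug = rng.normal(-1.5, 0.5, 90)
placebo = rng.normal(-3.0, 0.7, 110)

# Welch's t-test does not assume equal variances in the two arms
t_stat, p_value = stats.ttest_ind(drug, placebo, equal_var=False)
d = cohens_d(drug, placebo)
print(f"t = {t_stat:.2f}, p = {p_value:.2e}, d = {d:.2f}")
```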

## Biomarker Analysis: CSF Aβ and Tau

In [None]:
print("🧪 BIOMARKER ANALYSIS: CSF Aβ and Tau\n")
print("=" * 80)

# Multiple endpoints analysis
biomarkers = ['abeta_change_12m', 'tau_change_12m']

results_summary = []

for biomarker in biomarkers:
    drug_bio = clinical_data[clinical_data['treatment'] == 'Drug'][biomarker]
    placebo_bio = clinical_data[clinical_data['treatment'] == 'Placebo'][biomarker]
    
    t_stat, p_val = stats.ttest_ind(drug_bio, placebo_bio)
    
    results_summary.append({
        'Biomarker': biomarker.replace('_change_12m', '').upper(),
        'Drug Mean': f"{drug_bio.mean():.1f}",
        'Placebo Mean': f"{placebo_bio.mean():.1f}",
        't-stat': f"{t_stat:.3f}",
        'p-value': f"{p_val:.4f}",
        'Significant': 'Yes' if p_val < 0.05 else 'No'
    })

results_df = pd.DataFrame(results_summary)
print("\n📊 Biomarker Results Summary:\n")
print(results_df.to_string(index=False))

# Correlation analysis
print("\n\n🔗 CORRELATION ANALYSIS\n")
print("=" * 80)

# Correlation between MMSE change and biomarker changes
corr_abeta = stats.pearsonr(clinical_data['mmse_change_12m'], clinical_data['abeta_change_12m'])
corr_tau = stats.pearsonr(clinical_data['mmse_change_12m'], clinical_data['tau_change_12m'])

print(f"\nMMSE Change vs CSF Aβ Change:")
print(f"   r = {corr_abeta[0]:.3f}, p = {corr_abeta[1]:.4f}")

print(f"\nMMSE Change vs CSF Tau Change:")
print(f"   r = {corr_tau[0]:.3f}, p = {corr_tau[1]:.4f}")

# Scatter plot with regression line
fig, axes = plt.subplots(1, 2, figsize=(14, 5))

sns.regplot(data=clinical_data, x='abeta_change_12m', y='mmse_change_12m', 
            scatter_kws={'alpha':0.5}, ax=axes[0])
axes[0].set_title(f'MMSE vs CSF Aβ Change (r={corr_abeta[0]:.3f})')
axes[0].set_xlabel('CSF Aβ Change (pg/mL)')
axes[0].set_ylabel('MMSE Change')

sns.regplot(data=clinical_data, x='tau_change_12m', y='mmse_change_12m', 
            scatter_kws={'alpha':0.5}, color='orange', ax=axes[1])
axes[1].set_title(f'MMSE vs CSF Tau Change (r={corr_tau[0]:.3f})')
axes[1].set_xlabel('CSF Tau Change (pg/mL)')
axes[1].set_ylabel('MMSE Change')

plt.tight_layout()
plt.show()

print("\n" + "=" * 80)
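
Testing several endpoints at the 0.05 level (MMSE plus two biomarkers) inflates the chance of a false positive, so the p-values are commonly adjusted. The sketch below implements the Holm step-down adjustment with NumPy. The input p-values are hypothetical placeholders, not results from this notebook, and `holm_adjust` is an illustrative helper.

```python
import numpy as np

def holm_adjust(p_values):
    """Return Holm step-down adjusted p-values in the original order."""
    p = np.asarray(p_values, dtype=float)
    m = p.size
    order = np.argsort(p)
    # Multiply the k-th smallest p-value by (m - k), k = 0..m-1
    scaled = p[order] * (m - np.arange(m))
    # Enforce monotonicity across the sorted values and cap at 1
    adjusted_sorted = np.minimum(np.maximum.accumulate(scaled), 1.0)
    adjusted = np.empty(m)
    adjusted[order] = adjusted_sorted
    return adjusted

raw_p = [0.012, 0.030, 0.200]  # hypothetical p-values for three endpoints
adj_p = holm_adjust(raw_p)
print(adj_p)
```

Since statsmodels is installed in Step 1, `statsmodels.stats.multitest.multipletests(raw_p, method='holm')` should give the same adjusted values without a hand-written helper.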

## Subgroup Analysis: APOE4 Carriers

In [None]:
print("🧬 SUBGROUP ANALYSIS: APOE4 Status\n")
print("=" * 80)

# Two-way ANOVA: Treatment x APOE4 status
anova_results = pg.anova(data=clinical_data, dv='mmse_change_12m', 
                         between=['treatment', 'apoe4_status'])

print("\n📊 Two-Way ANOVA Results:\n")
print(anova_results.to_string())

# Subgroup means
subgroup_means = clinical_data.groupby(['treatment', 'apoe4_status'])['mmse_change_12m'].agg(['mean', 'std', 'count'])

print("\n\n📈 Subgroup Means:\n")
print(subgroup_means)

# Visualization
plt.figure(figsize=(10, 6))
sns.barplot(data=clinical_data, x='apoe4_status', y='mmse_change_12m', hue='treatment')
plt.title('Treatment Effect by APOE4 Status')
plt.ylabel('MMSE Change at 12 Months')
plt.xlabel('APOE4 Status')
plt.legend(title='Treatment')
plt.axhline(y=0, color='black', linestyle='--', alpha=0.3)
plt.tight_layout()
plt.show()

print("\n" + "=" * 80)
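
The interaction term in the two-way ANOVA asks whether the treatment effect differs between APOE4 carriers and non-carriers. One way to make that concrete is to compute the Drug minus Placebo difference within each stratum and then compare the two differences. The sketch below does this on a synthetic table with the same column names; the simulated effect is assumed to be identical in both strata, so the contrast should be close to zero.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
n = 400
demo = pd.DataFrame({
    'treatment': rng.choice(['Drug', 'Placebo'], n),
    'apoe4_status': rng.choice(['carrier', 'non-carrier'], n, p=[0.4, 0.6]),
})
# Hypothetical effect: the same drug benefit in both APOE4 strata
demo['mmse_change_12m'] = (np.where(demo['treatment'] == 'Drug', -1.5, -3.0)
                           + rng.normal(0, 0.6, n))

# Mean change per cell, with treatment arms as columns
cell_means = demo.pivot_table(index='apoe4_status', columns='treatment',
                              values='mmse_change_12m', aggfunc='mean')
# Treatment effect (Drug minus Placebo) within each APOE4 stratum
effect = cell_means['Drug'] - cell_means['Placebo']
# Interaction contrast: how much the effect differs between strata
interaction = effect['carrier'] - effect['non-carrier']
print(effect.round(2))
print(f"Interaction contrast: {interaction:.2f}")
```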

## Machine Learning: Predictive Modeling

In [None]:
print("🤖 MACHINE LEARNING: Response Prediction\n")
print("=" * 80)

# Define responders: patients whose MMSE change is above the median (less decline than the median patient)
median_decline = clinical_data['mmse_change_12m'].median()
clinical_data['responder'] = (clinical_data['mmse_change_12m'] > median_decline).astype(int)

# Prepare features
feature_cols = ['age', 'baseline_mmse', 'baseline_csf_abeta', 'baseline_csf_tau']
X = clinical_data[feature_cols].copy()
X['sex_encoded'] = (clinical_data['sex'] == 'M').astype(int)
X['apoe4_encoded'] = (clinical_data['apoe4_status'] == 'carrier').astype(int)
X['treatment_encoded'] = (clinical_data['treatment'] == 'Drug').astype(int)

y = clinical_data['responder']

# Train-test split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Train Random Forest classifier
rf_model = RandomForestClassifier(n_estimators=100, random_state=42)
rf_model.fit(X_train, y_train)

# Predictions
y_pred = rf_model.predict(X_test)

# Evaluation
print("\n📊 Model Performance:\n")
print(classification_report(y_test, y_pred, target_names=['Non-responder', 'Responder']))

# Confusion matrix
cm = confusion_matrix(y_test, y_pred)
plt.figure(figsize=(8, 6))
sns.heatmap(cm, annot=True, fmt='d', cmap='Blues', 
            xticklabels=['Non-responder', 'Responder'],
            yticklabels=['Non-responder', 'Responder'])
plt.title('Confusion Matrix: Treatment Response Prediction')
plt.ylabel('Actual')
plt.xlabel('Predicted')
plt.show()

# Feature importance
feature_importance = pd.DataFrame({
    'Feature': X.columns,
    'Importance': rf_model.feature_importances_
}).sort_values('Importance', ascending=False)

print("\n🔍 Feature Importance:\n")
print(feature_importance.to_string(index=False))

plt.figure(figsize=(10, 6))
sns.barplot(data=feature_importance, x='Importance', y='Feature')
plt.title('Predictive Features for Treatment Response')
plt.xlabel('Importance Score')
plt.tight_layout()
plt.show()

print("\n" + "=" * 80)
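
A single 70/30 split on 200 patients gives a noisy performance estimate. Stratified k-fold cross-validation is a common way to get a more stable one. The sketch below assumes synthetic features in which the outcome is driven mostly by the treatment flag, mirroring the simulated trial above; the feature layout and effect sizes are illustrative assumptions.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score

rng = np.random.default_rng(42)
n = 200
# Hypothetical features: treatment flag plus two uninformative baseline covariates
treatment = rng.integers(0, 2, n)
X = np.column_stack([treatment, rng.normal(72, 8, n), rng.normal(24, 2, n)])
# Outcome driven mostly by treatment, as in the synthetic trial
change = np.where(treatment == 1, -1.5, -3.0) + rng.normal(0, 0.6, n)
y = (change > np.median(change)).astype(int)

# Five stratified folds keep the responder ratio similar in every split
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
model = RandomForestClassifier(n_estimators=100, random_state=42)
auc_scores = cross_val_score(model, X, y, cv=cv, scoring='roc_auc')
print(f"ROC AUC: {auc_scores.mean():.2f} +/- {auc_scores.std():.2f}")
```

Because the responder label is derived from an outcome that depends on treatment, a high score here mainly reflects the simulated treatment effect rather than predictive value in the baseline covariates.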

## Save Results and Generate Report

In [None]:
from google.colab import drive
import datetime

# Mount Google Drive
drive.mount('/content/drive')

# Create results directory
results_dir = '/content/drive/MyDrive/Neurodiagnostic_Analysis_Results'
!mkdir -p "{results_dir}"

# Save processed data
timestamp = datetime.datetime.now().strftime("%Y%m%d_%H%M%S")
clinical_data.to_csv(f"{results_dir}/clinical_data_processed_{timestamp}.csv", index=False)

# Generate text report
report_file = f"{results_dir}/analysis_report_{timestamp}.txt"

with open(report_file, 'w', encoding='utf-8') as f:  # explicit encoding so the ± sign is written correctly
    f.write("=" * 80 + "\n")
    f.write("NEURODIAGNOSTIC DATA ANALYSIS REPORT\n")
    f.write(f"Generated: {datetime.datetime.now()}\n")
    f.write("=" * 80 + "\n\n")
    
    f.write("STUDY OVERVIEW\n")
    f.write("-" * 80 + "\n")
    f.write(f"Total patients: {len(clinical_data)}\n")
    f.write(f"Drug group: {len(clinical_data[clinical_data['treatment']=='Drug'])}\n")
    f.write(f"Placebo group: {len(clinical_data[clinical_data['treatment']=='Placebo'])}\n\n")
    
    f.write("PRIMARY ENDPOINT (MMSE Change)\n")
    f.write("-" * 80 + "\n")
    f.write(f"Drug: {drug_group.mean():.2f} ± {drug_group.std():.2f}\n")
    f.write(f"Placebo: {placebo_group.mean():.2f} ± {placebo_group.std():.2f}\n")
    f.write(f"p-value: {p_value:.4f}\n\n")
    
    f.write("BIOMARKER RESULTS\n")
    f.write("-" * 80 + "\n")
    f.write(results_df.to_string(index=False))
    f.write("\n\n")
    
    f.write("FEATURE IMPORTANCE (ML Model)\n")
    f.write("-" * 80 + "\n")
    f.write(feature_importance.to_string(index=False))

print(f"✅ Results saved to Google Drive:")
print(f"   - Data: {results_dir}/clinical_data_processed_{timestamp}.csv")
print(f"   - Report: {report_file}")
print(f"\n📁 Check your Google Drive: Neurodiagnostic_Analysis_Results folder")
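
The saving cell above only works inside Google Colab, because `google.colab` is not importable elsewhere. The following is a minimal sketch of a portable variant that falls back to the current working directory when run locally. The helper `get_results_dir` is a hypothetical name, and the one-line report is a placeholder for the full report written above.

```python
import datetime
from pathlib import Path

def get_results_dir(folder_name='Neurodiagnostic_Analysis_Results'):
    """Return a writable results directory, on Google Drive when running in Colab."""
    try:
        from google.colab import drive  # only importable inside Colab
        drive.mount('/content/drive')
        base = Path('/content/drive/MyDrive')
    except ImportError:
        base = Path.cwd()  # local fallback: current working directory
    results_dir = base / folder_name
    results_dir.mkdir(parents=True, exist_ok=True)
    return results_dir

results_dir = get_results_dir()
timestamp = datetime.datetime.now().strftime('%Y%m%d_%H%M%S')
report_file = results_dir / f'analysis_report_{timestamp}.txt'
# Placeholder content; explicit UTF-8 keeps symbols such as ± intact
report_file.write_text('NEURODIAGNOSTIC DATA ANALYSIS REPORT\n', encoding='utf-8')
print(report_file)
```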

## 💡 Customize for Your Own Data

### Upload Your Own Dataset:
```python
from google.colab import files
uploaded = files.upload()

# Read your CSV file
your_data = pd.read_csv('your_filename.csv')
```
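
Before running the analyses on an uploaded file, it helps to check that the columns the notebook expects are present and numeric. The sketch below is one way to do that, under the assumption that your file uses the same column names as the synthetic dataset. `REQUIRED_COLUMNS` and `load_trial_csv` are hypothetical names, and the in-memory CSV stands in for an uploaded file.

```python
import io
import pandas as pd

# Assumed minimum set of columns for the primary endpoint analysis
REQUIRED_COLUMNS = {'patient_id', 'treatment', 'baseline_mmse', 'month_12_mmse'}

def load_trial_csv(source):
    """Read a trial CSV and fail early if expected columns are missing."""
    df = pd.read_csv(source)
    missing = REQUIRED_COLUMNS - set(df.columns)
    if missing:
        raise ValueError(f"Missing columns: {sorted(missing)}")
    # Coerce score columns to numeric; unparseable entries become NaN
    for col in ('baseline_mmse', 'month_12_mmse'):
        df[col] = pd.to_numeric(df[col], errors='coerce')
    return df

# In-memory stand-in for an uploaded file
csv_text = (
    "patient_id,treatment,baseline_mmse,month_12_mmse\n"
    "1,Drug,25,24\n"
    "2,Placebo,23,unknown\n"
)
your_data = load_trial_csv(io.StringIO(csv_text))
print(your_data.isna().sum())
```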

### Common Neurodiagnostic Data Types:

**1. EEG Data:**
- Load with MNE: `mne.io.read_raw_edf('your_file.edf')`
- Analyze seizures, sleep stages, brain connectivity

**2. MRI Volumetric Data:**
- Load with nibabel: `nib.load('T1.nii.gz')` (after `import nibabel as nib`)
- Analyze hippocampal volumes, cortical thickness, lesion load

**3. Clinical Assessments:**
- MMSE, MoCA, ADAS-Cog scores
- Unified Parkinson's Disease Rating Scale (UPDRS)
- Expanded Disability Status Scale (EDSS) for MS

**4. Biomarker Data:**
- CSF: Aβ42, tau, p-tau, NfL
- Blood: NfL, GFAP, p-tau217
- Imaging: Amyloid PET, Tau PET standardized uptake value ratios (SUVRs)
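
As an illustration of the volumetric measures listed above, lesion load or a structure's volume can be estimated by counting the voxels in a binary mask and multiplying by the voxel volume. The sketch below uses a synthetic NumPy mask so it runs without an image file; `mask_volume_ml` is a hypothetical helper. With nibabel, the mask could come from thresholding `img.get_fdata()` and the voxel size from `img.header.get_zooms()[:3]`.

```python
import numpy as np

def mask_volume_ml(mask, voxel_size_mm):
    """Volume of a binary mask in millilitres (1 mL = 1000 mm^3)."""
    voxel_volume_mm3 = float(np.prod(voxel_size_mm))
    return int(np.count_nonzero(mask)) * voxel_volume_mm3 / 1000.0

# Hypothetical 3-D lesion mask: a 10 x 10 x 10 voxel cube inside a 64^3 volume
mask = np.zeros((64, 64, 64), dtype=bool)
mask[20:30, 20:30, 20:30] = True
volume = mask_volume_ml(mask, voxel_size_mm=(1.0, 1.0, 1.2))
print(f"Lesion volume: {volume:.2f} mL")
```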

### Next Steps:
1. Replace synthetic data with your actual clinical data
2. Adjust analysis based on your research questions
3. Combine with paper-qa for literature context
4. Use Robin to identify new therapeutic targets based on your findings