# 23. DP-SGD Baseline Experiments

DP-SGD (Abadi et al., 2016) trains a VAE from scratch on the retain set
with differential privacy guarantees via per-sample gradient clipping and
Gaussian noise injection. This provides a formal privacy baseline:
the forget set was never in training, so membership inference on it
should be at chance. The utility lost to DP noise shows what a formal
guarantee costs relative to non-private training.

We test epsilon in {1, 10, 50} to measure the privacy-utility tradeoff.
The DP-SGD implementation uses Opacus.
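
The two mechanisms named above, per-sample clipping and Gaussian noise, can be sketched in a few lines. This is a minimal pure-Python illustration of one DP-SGD gradient step, not the code inside `train_dp` or Opacus; the function name `dp_sgd_step` and its arguments are hypothetical.

```python
import math
import random


def dp_sgd_step(per_sample_grads, max_grad_norm, noise_multiplier, rng):
    """Return a noisy average gradient for one batch (illustrative only)."""
    dim = len(per_sample_grads[0])
    summed = [0.0] * dim
    for grad in per_sample_grads:
        # Clip each sample's gradient to L2 norm <= max_grad_norm.
        norm = math.sqrt(sum(x * x for x in grad))
        scale = min(1.0, max_grad_norm / (norm + 1e-12))
        for i, x in enumerate(grad):
            summed[i] += x * scale
    # Add Gaussian noise with std = noise_multiplier * max_grad_norm
    # to the clipped sum, then average over the batch.
    sigma = noise_multiplier * max_grad_norm
    noisy = [s + rng.gauss(0.0, sigma) for s in summed]
    return [v / len(per_sample_grads) for v in noisy]


# Usage: two toy gradients; the first has norm 5 and is clipped to norm 1.
rng = random.Random(0)
toy_grads = [[3.0, 4.0], [0.3, 0.4]]
noisy_avg = dp_sgd_step(toy_grads, max_grad_norm=1.0,
                        noise_multiplier=0.7102, rng=rng)
print(noisy_avg)
```

Clipping bounds how much any single sample can move the update, and the noise scale is tied to that bound, which is why a smaller epsilon (larger noise multiplier) degrades the gradient signal more.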

In [1]:
import sys
sys.path.insert(0, '../src')

import json
import numpy as np
import torch
from pathlib import Path
import time

from train_dp import train_dp

DATA_PATH = '../data/adata_processed.h5ad'
SPLIT_PATH = '../outputs/p1/split_structured.json'
OUTPUT_BASE = Path('../outputs/p2/dp_sgd')
DEVICE = 'cuda' if torch.cuda.is_available() else 'cpu'
print(f'Device: {DEVICE}')

Device: cpu


## 1. Train DP-SGD VAE at Multiple Epsilon Values

Parameters:
- target_delta = 1e-5 (< 1/n for n~28k retain samples)
- max_grad_norm = 1.0
- 50 epochs maximum, patience=15
- Opacus auto-calibrates noise_multiplier for target epsilon

Note: DP-SGD trains from scratch (no baseline checkpoint needed).
This is slower than post-hoc methods but provides formal guarantees.
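
The delta choice and the step count can be checked with simple arithmetic. The sketch below takes the retain-set size (28094) and the batches per epoch (~105) from the training log in this notebook; the variable names are illustrative.

```python
# Values copied from the training log in this notebook.
n_retain = 28094          # "Retain: 28094"
batches_per_epoch = 105   # "~105 batches/epoch"
n_epochs = 50
target_delta = 1e-5

# Common rule of thumb: delta should be smaller than 1/n.
inv_n = 1.0 / n_retain
delta_ok = target_delta < inv_n

# Total noisy gradient steps over which the privacy budget is spent.
total_steps = n_epochs * batches_per_epoch

print(f"1/n = {inv_n:.2e}, delta = {target_delta:.0e}, delta < 1/n: {delta_ok}")
print(f"total steps = {total_steps}")
```

The step count agrees with the `steps=5250` value reported when the noise multiplier is calibrated.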

In [2]:
EPSILONS = [1.0, 10.0, 50.0]
SEEDS = [42, 123, 456]

results = {}

# Epsilon sweep with seed=42
for eps in EPSILONS:
    out_dir = OUTPUT_BASE / f'eps{eps}_seed42'
    print(f'\n{"="*60}')
    print(f'DP-SGD epsilon={eps}, seed=42')
    print(f'{"="*60}')
    
    t0 = time.time()
    ckpt_path = train_dp(
        data_path=DATA_PATH,
        split_path=SPLIT_PATH,
        output_dir=str(out_dir),
        target_epsilon=eps,
        target_delta=1e-5,
        max_grad_norm=1.0,
        n_epochs=50,
        lr=1e-3,
        batch_size=256,
        latent_dim=32,
        hidden_dims=[1024, 512, 128],
        patience=15,
        seed=42,
    )
    elapsed = time.time() - t0
    results[f'eps{eps}_seed42'] = {'path': str(ckpt_path), 'time': elapsed}
    print(f'Done in {elapsed:.1f}s')

# Multi-seed for eps=10 (our main comparison point); seed 42 already ran above
for seed in SEEDS[1:]:
    out_dir = OUTPUT_BASE / f'eps10.0_seed{seed}'
    print(f'\n{"="*60}')
    print(f'DP-SGD epsilon=10.0, seed={seed}')
    print(f'{"="*60}')
    
    t0 = time.time()
    ckpt_path = train_dp(
        data_path=DATA_PATH,
        split_path=SPLIT_PATH,
        output_dir=str(out_dir),
        target_epsilon=10.0,
        target_delta=1e-5,
        max_grad_norm=1.0,
        n_epochs=50,
        lr=1e-3,
        batch_size=256,
        latent_dim=32,
        hidden_dims=[1024, 512, 128],
        patience=15,
        seed=seed,
    )
    elapsed = time.time() - t0
    results[f'eps10.0_seed{seed}'] = {'path': str(ckpt_path), 'time': elapsed}
    print(f'Done in {elapsed:.1f}s')

print(f'\nAll training complete. {len(results)} checkpoints saved.')


DP-SGD epsilon=1.0, seed=42
Data: torch.Size([33088, 2000]), Retain: 28094, Device: cpu
Target epsilon: 1.0, delta: 1e-05
Creating DP-compatible VAE (2000 -> [1024, 512, 128] -> z=32)...




Calibrated noise_multiplier: 2.8840 (target_eps=1.0, steps=5250)
Training for up to 50 epochs (~105 batches/epoch)...




  Epoch 1: train=767.03, val=600.79, eps=0.14, time=326s
  Epoch 5: train=559.22, val=505.91, eps=0.30, time=311s
  Epoch 10: train=539.51, val=468.51, eps=0.43, time=347s
  Epoch 15: train=544.76, val=462.86, eps=0.53, time=363s
  Epoch 20: train=552.00, val=461.25, eps=0.61, time=296s
  Epoch 25: train=554.71, val=461.63, eps=0.69, time=294s
  Epoch 30: train=554.88, val=460.99, eps=0.76, time=298s
  Epoch 35: train=550.93, val=460.14, eps=0.83, time=294s
  Epoch 40: train=546.67, val=458.10, eps=0.89, time=294s
  Epoch 45: train=542.05, val=456.93, eps=0.95, time=290s
  Epoch 50: train=536.76, val=456.93, eps=1.00, time=294s

Final privacy budget: epsilon=1.00, delta=1e-05
Saved to ../outputs/p2/dp_sgd/eps1.0_seed42/best_model.pt
Done in 15347.7s

DP-SGD epsilon=10.0, seed=42
Data: torch.Size([33088, 2000]), Retain: 28094, Device: cpu
Target epsilon: 10.0, delta: 1e-05
Creating DP-compatible VAE (2000 -> [1024, 512, 128] -> z=32)...




Calibrated noise_multiplier: 0.7102 (target_eps=10.0, steps=5250)
Training for up to 50 epochs (~105 batches/epoch)...




  Epoch 1: train=614.24, val=495.84, eps=2.96, time=302s
  Epoch 5: train=478.36, val=436.45, eps=4.08, time=292s
  Epoch 10: train=463.00, val=416.63, eps=5.04, time=295s
  Epoch 15: train=453.15, val=411.18, eps=5.85, time=292s
  Epoch 20: train=447.69, val=408.33, eps=6.57, time=291s
  Epoch 25: train=441.68, val=406.57, eps=7.23, time=293s
  Epoch 30: train=438.78, val=404.68, eps=7.84, time=298s
  Epoch 35: train=434.09, val=403.50, eps=8.43, time=295s
  Epoch 40: train=431.43, val=402.86, eps=8.98, time=291s
  Epoch 45: train=429.80, val=401.75, eps=9.51, time=289s
  Epoch 50: train=428.05, val=400.29, eps=10.03, time=294s

Final privacy budget: epsilon=10.03, delta=1e-05
Saved to ../outputs/p2/dp_sgd/eps10.0_seed42/best_model.pt
Done in 14705.9s

DP-SGD epsilon=50.0, seed=42
Data: torch.Size([33088, 2000]), Retain: 28094, Device: cpu
Target epsilon: 50.0, delta: 1e-05
Creating DP-compatible VAE (2000 -> [1024, 512, 128] -> z=32)...




Calibrated noise_multiplier: 0.4374 (target_eps=50.0, steps=5250)
Training for up to 50 epochs (~105 batches/epoch)...




  Epoch 1: train=578.58, val=465.78, eps=11.46, time=296s
  Epoch 5: train=454.68, val=416.33, eps=18.01, time=293s
  Epoch 10: train=440.20, val=405.58, eps=23.33, time=294s
  Epoch 15: train=432.50, val=402.37, eps=27.59, time=291s
  Epoch 20: train=428.58, val=399.94, eps=31.42, time=293s
  Epoch 25: train=424.13, val=398.33, eps=34.92, time=293s
  Epoch 30: train=422.31, val=396.75, eps=38.42, time=296s
  Epoch 35: train=418.95, val=395.73, eps=41.47, time=294s
  Epoch 40: train=417.26, val=395.30, eps=44.38, time=294s
  Epoch 45: train=416.36, val=394.42, eps=47.29, time=288s
  Epoch 50: train=415.10, val=393.44, eps=50.20, time=292s

Final privacy budget: epsilon=50.20, delta=1e-05
Saved to ../outputs/p2/dp_sgd/eps50.0_seed42/best_model.pt
Done in 14688.6s

DP-SGD epsilon=10.0, seed=123
Data: torch.Size([33088, 2000]), Retain: 28094, Device: cpu
Target epsilon: 10.0, delta: 1e-05
Creating DP-compatible VAE (2000 -> [1024, 512, 128] -> z=32)...




Calibrated noise_multiplier: 0.7102 (target_eps=10.0, steps=5250)
Training for up to 50 epochs (~105 batches/epoch)...




  Epoch 1: train=612.48, val=505.27, eps=2.96, time=299s
  Epoch 5: train=479.17, val=445.11, eps=4.08, time=294s
  Epoch 10: train=467.56, val=425.77, eps=5.04, time=302s
  Epoch 15: train=456.04, val=415.14, eps=5.85, time=300s
  Epoch 20: train=448.05, val=411.61, eps=6.57, time=361s
  Epoch 25: train=441.64, val=408.82, eps=7.23, time=298s
  Epoch 30: train=437.01, val=407.40, eps=7.84, time=294s
  Epoch 35: train=433.55, val=406.02, eps=8.43, time=293s
  Epoch 40: train=431.15, val=404.39, eps=8.98, time=292s
  Epoch 45: train=429.33, val=403.50, eps=9.51, time=298s
  Epoch 50: train=426.97, val=402.38, eps=10.03, time=294s

Final privacy budget: epsilon=10.03, delta=1e-05
Saved to ../outputs/p2/dp_sgd/eps10.0_seed123/best_model.pt
Done in 15330.7s

DP-SGD epsilon=10.0, seed=456
Data: torch.Size([33088, 2000]), Retain: 28094, Device: cpu
Target epsilon: 10.0, delta: 1e-05
Creating DP-compatible VAE (2000 -> [1024, 512, 128] -> z=32)...




Calibrated noise_multiplier: 0.7102 (target_eps=10.0, steps=5250)
Training for up to 50 epochs (~105 batches/epoch)...




  Epoch 1: train=614.92, val=508.14, eps=2.96, time=303s
  Epoch 5: train=474.22, val=438.97, eps=4.08, time=294s
  Epoch 10: train=459.26, val=419.27, eps=5.04, time=297s
  Epoch 15: train=453.24, val=415.03, eps=5.85, time=293s
  Epoch 20: train=445.09, val=412.33, eps=6.57, time=294s
  Epoch 25: train=439.71, val=409.20, eps=7.23, time=293s
  Epoch 30: train=434.61, val=408.08, eps=7.84, time=294s
  Epoch 35: train=432.56, val=406.24, eps=8.43, time=294s
  Epoch 40: train=430.11, val=405.50, eps=8.98, time=291s
  Epoch 45: train=426.60, val=404.40, eps=9.51, time=301s
  Epoch 50: train=425.19, val=403.75, eps=10.03, time=302s

Final privacy budget: epsilon=10.03, delta=1e-05
Saved to ../outputs/p2/dp_sgd/eps10.0_seed456/best_model.pt
Done in 14816.5s

All training complete. 5 checkpoints saved.


## 2. Evaluate with Canonical Fresh Attacker

In [3]:
sys.path.insert(0, '../scripts')
from eval_multiseed import (
    load_vae_model, train_fresh_attacker, evaluate_privacy,
    evaluate_utility, MARKER_GENES
)
import scanpy as sc

adata = sc.read_h5ad(DATA_PATH)
X = torch.tensor(
    adata.X.toarray() if hasattr(adata.X, 'toarray') else adata.X,
    dtype=torch.float32
)

with open(SPLIT_PATH) as f:
    split = json.load(f)
forget_idx = split['forget_indices']
retain_idx = split['retain_indices']
unseen_idx = split['unseen_indices']

with open('../outputs/p1.5/s1_matched_negatives.json') as f:
    matched_data = json.load(f)
matched_neg_idx = matched_data['matched_indices']

X_holdout = X[unseen_idx]
labels_holdout = adata.obs['leiden'].values[unseen_idx]
gene_names = list(adata.var_names)
marker_idx = [gene_names.index(g) for g in MARKER_GENES if g in gene_names]
marker_names = [g for g in MARKER_GENES if g in gene_names]

# Train fresh attacker on baseline
BASELINE_CKPT = '../outputs/p1/baseline/best_model.pt'
baseline_model, _ = load_vae_model(BASELINE_CKPT)
attacker = train_fresh_attacker(
    baseline_model, adata, forget_idx, matched_neg_idx, retain_idx, seed=42
)

Training fresh attacker on baseline F vs matched:
  Samples: 224 (30 forget + 194 matched)
  Features: 70 dims
  Train: 179, Test: 45
  Baseline AUC (F vs matched, full set): 0.7792
  (Canonical NB03 value: ~0.769)


In [4]:
eval_results = {}

for name, info in results.items():
    ckpt_path = info['path']
    print(f'\nEvaluating {name}...')
    
    model, config = load_vae_model(ckpt_path)
    
    privacy = evaluate_privacy(
        model, attacker, adata, forget_idx, matched_neg_idx, retain_idx
    )
    utility = evaluate_utility(
        model, X_holdout, labels_holdout, marker_idx, gene_names
    )
    
    # Load achieved epsilon from checkpoint
    ckpt = torch.load(ckpt_path, map_location='cpu', weights_only=False)
    achieved_eps = ckpt.get('achieved_epsilon', 'unknown')
    
    eval_results[name] = {
        'privacy': privacy,
        'utility': utility,
        'training_time': info['time'],
        'achieved_epsilon': achieved_eps,
    }
    
    print(f'  AUC={privacy["mlp_auc"]:.3f}, '
          f'Advantage={privacy["mlp_advantage"]:.3f}, '
          f'ELBO={utility["elbo"]:.1f}, '
          f'Marker r={utility["marker_r"]:.3f}, '
          f'eps={achieved_eps}')


Evaluating eps1.0_seed42...





  AUC=0.646, Advantage=0.292, ELBO=461.4, Marker r=0.495, eps=1.0035028714928582

Evaluating eps10.0_seed42...
  AUC=0.472, Advantage=0.056, ELBO=404.4, Marker r=0.781, eps=10.030516019139084

Evaluating eps50.0_seed42...
  AUC=0.438, Advantage=0.123, ELBO=397.2, Marker r=0.795, eps=50.19651091262326

Evaluating eps10.0_seed123...
  AUC=0.431, Advantage=0.137, ELBO=402.8, Marker r=0.790, eps=10.030516019139084

Evaluating eps10.0_seed456...
  AUC=0.488, Advantage=0.023, ELBO=402.7, Marker r=0.789, eps=10.030516019139084


## 3. Summary: Privacy-Utility Tradeoff

In [5]:
# Epsilon sweep comparison
print('DP-SGD Epsilon Sweep (seed=42):')
print(f'{"Epsilon":<10} {"AUC":>8} {"Advantage":>10} {"Marker r":>10} {"ELBO":>8} {"Achieved eps":>12}')
print('-' * 62)
for eps in EPSILONS:
    key = f'eps{eps}_seed42'
    if key in eval_results:
        r = eval_results[key]
        print(f'{eps:<10.1f} {r["privacy"]["mlp_auc"]:>8.3f} '
              f'{r["privacy"]["mlp_advantage"]:>10.3f} '
              f'{r["utility"]["marker_r"]:>10.3f} '
              f'{r["utility"]["elbo"]:>8.1f} '
              f'{r["achieved_epsilon"]:>12}')

print()

# Eps=10 multi-seed
eps10_aucs = []
eps10_advantages = []
for seed in SEEDS:
    key = f'eps10.0_seed{seed}'
    if key in eval_results:
        eps10_aucs.append(eval_results[key]['privacy']['mlp_auc'])
        eps10_advantages.append(eval_results[key]['privacy']['mlp_advantage'])

if eps10_aucs:
    print(f'DP-SGD eps=10 (3 seeds):')
    print(f'  AUC: {np.mean(eps10_aucs):.3f} +/- {np.std(eps10_aucs):.3f}')
    print(f'  Advantage: {np.mean(eps10_advantages):.3f} +/- {np.std(eps10_advantages):.3f}')

print()
print('Reference: Retrain AUC=0.523, Advantage=0.046')
print('Reference: Baseline AUC=0.783, Advantage=0.565')
print()
print('Note: DP-SGD trains from scratch on retain set, so forget set')
print('was never seen. MIA AUC should be near chance (0.5) unless the')
print('DP noise degrades the model so much that it creates artifacts.')

DP-SGD Epsilon Sweep (seed=42):
Epsilon         AUC  Advantage   Marker r     ELBO Achieved eps
--------------------------------------------------------------
1.0           0.646      0.292      0.495    461.4 1.0035028714928582
10.0          0.472      0.056      0.781    404.4 10.030516019139084
50.0          0.438      0.123      0.795    397.2 50.19651091262326

DP-SGD eps=10 (3 seeds):
  AUC: 0.464 +/- 0.024
  Advantage: 0.072 +/- 0.048

Reference: Retrain AUC=0.523, Advantage=0.046
Reference: Baseline AUC=0.783, Advantage=0.565

Note: DP-SGD trains from scratch on retain set, so forget set
was never seen. MIA AUC should be near chance (0.5) unless the
DP noise degrades the model so much that it creates artifacts.


In [6]:
output = {
    'method': 'dp_sgd',
    'dataset': 'PBMC',
    'forget_type': 'structured',
    'seeds': SEEDS,
    'epsilon_sweep': EPSILONS,
    'results': eval_results,
    'summary': {
        'eps10': {
            'mean_auc': float(np.mean(eps10_aucs)) if eps10_aucs else None,
            'std_auc': float(np.std(eps10_aucs)) if eps10_aucs else None,
            'mean_advantage': float(np.mean(eps10_advantages)) if eps10_advantages else None,
            'std_advantage': float(np.std(eps10_advantages)) if eps10_advantages else None,
        }
    }
}

with open(OUTPUT_BASE / 'dp_sgd_results.json', 'w') as f:
    json.dump(output, f, indent=2, default=str)

print(f'Saved to {OUTPUT_BASE / "dp_sgd_results.json"}')

Saved to ../outputs/p2/dp_sgd/dp_sgd_results.json


## 4. Analysis

DP-SGD is the only method tested that gets close to retrain-equivalent privacy. At eps=10 across three seeds, MIA AUC is 0.464 +/- 0.024 with advantage 0.072 +/- 0.048. One seed (456) reaches advantage=0.023, below the retrain reference of 0.046. The mean (0.072) sits above the retrain value, but the retrain value lies within one standard deviation of it.
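
The multi-seed summary can be reproduced from the printed per-run values. The sketch below also checks an inferred relationship: the reported advantages are consistent with 2 * |AUC - 0.5| up to rounding. That formula is an assumption read off the numbers, not taken from `evaluate_privacy`.

```python
from statistics import mean, pstdev

# (AUC, advantage) pairs as printed in the evaluation output above.
reported = {
    "eps1.0_seed42": (0.646, 0.292),
    "eps10.0_seed42": (0.472, 0.056),
    "eps50.0_seed42": (0.438, 0.123),
    "eps10.0_seed123": (0.431, 0.137),
    "eps10.0_seed456": (0.488, 0.023),
}


def advantage_from_auc(auc):
    """Assumed definition: twice the distance of the AUC from chance."""
    return 2.0 * abs(auc - 0.5)


# Largest disagreement between the assumed formula and the printed values.
max_gap = max(abs(advantage_from_auc(a) - adv) for a, adv in reported.values())

# Mean and population std over the three eps=10 seeds
# (np.std in the notebook also defaults to the population form).
eps10_aucs = [reported[f"eps10.0_seed{s}"][0] for s in (42, 123, 456)]
eps10_advs = [reported[f"eps10.0_seed{s}"][1] for s in (42, 123, 456)]
auc_mean, auc_std = mean(eps10_aucs), pstdev(eps10_aucs)
adv_mean, adv_std = mean(eps10_advs), pstdev(eps10_advs)

print(f"max |formula - reported| = {max_gap:.4f}")
print(f"AUC: {auc_mean:.3f} +/- {auc_std:.3f}")
print(f"Advantage: {adv_mean:.3f} +/- {adv_std:.3f}")
```

Because the advantage is symmetric around chance under this reading, an AUC below 0.5 raises the advantage just as an AUC above 0.5 does.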

The epsilon sweep shows that attack advantage is non-monotonic in epsilon, even though utility improves monotonically. At eps=1.0, the noise is so heavy that the model falls apart (marker r=0.495, ELBO=461), and that damage itself becomes detectable (AUC=0.646, advantage=0.292). The model is so distorted that forget samples and their matched negatives produce distinguishable outputs, but for reasons unrelated to memorization. At eps=10.0, the noise lands in a useful range: enough to mask membership signals, little enough for the model to still learn structure (marker r=0.781, ELBO=404). At eps=50.0, less noise gives slightly better utility (marker r=0.795, ELBO=397), but the AUC falls further below chance (AUC=0.438, advantage=0.123), the same direction as over-unlearning even though nothing was unlearned here. That gap should be read with caution: eps=50 was run with a single seed, and seed 123 at eps=10 shows a comparable deviation (AUC=0.431, advantage=0.137).
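
A rough sense of how much an AUC can move by chance on a small evaluation set helps when reading the sweep. The sketch below uses the Hanley-McNeil approximation for the standard error of an AUC. It assumes the evaluation compares the 30 forget samples with the 194 matched negatives reported when the attacker was trained; that assumption and the helper `hanley_mcneil_se` are not part of the study's pipeline.

```python
import math


def hanley_mcneil_se(auc, n_pos, n_neg):
    """Approximate standard error of an AUC (Hanley and McNeil, 1982)."""
    q1 = auc / (2.0 - auc)
    q2 = 2.0 * auc * auc / (1.0 + auc)
    var = (auc * (1.0 - auc)
           + (n_pos - 1) * (q1 - auc * auc)
           + (n_neg - 1) * (q2 - auc * auc)) / (n_pos * n_neg)
    return math.sqrt(var)


# Assumed evaluation set sizes (from the attacker-training output).
N_FORGET, N_MATCHED = 30, 194

# Standard error under the null hypothesis that the true AUC is 0.5.
se_chance = hanley_mcneil_se(0.5, N_FORGET, N_MATCHED)
print(f"SE at chance: {se_chance:.3f}")

# Distance from chance, in standard errors, for the seed-42 sweep.
for name, auc in [("eps=1", 0.646), ("eps=10", 0.472), ("eps=50", 0.438)]:
    z = (auc - 0.5) / se_chance
    print(f"{name}: AUC={auc:.3f}, z={z:+.2f}")
```

With only 30 forget samples, deviations of a few hundredths from 0.5 are hard to separate from sampling noise under this approximation, so differences between eps=10 and eps=50 deserve more seeds before being interpreted.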

DP-SGD "works" because the forget set was never in training. That is the whole explanation. No post-hoc method tested in this study came close to matching it, and DP-SGD itself requires accepting a real utility penalty (at eps=10, seed 42: marker r 0.781 vs 0.832, ELBO 404 vs 364).
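
The utility penalty quoted above can be put in relative terms. The sketch below uses only the four numbers from the paragraph. It treats 0.832 and 364 as the reference model's values and the ELBO as reported so that lower is better; both readings are assumptions based on the text.

```python
# Values from the paragraph above: DP-SGD vs the reference it is compared to.
marker_r_dp, marker_r_ref = 0.781, 0.832
elbo_dp, elbo_ref = 404.0, 364.0

# Relative drop in marker correlation and relative rise in the ELBO value.
marker_drop_pct = 100.0 * (marker_r_ref - marker_r_dp) / marker_r_ref
elbo_rise_pct = 100.0 * (elbo_dp - elbo_ref) / elbo_ref

print(f"Marker r drop: {marker_drop_pct:.1f}%")
print(f"ELBO increase: {elbo_rise_pct:.1f}%")
```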