## What’s not covered (or covered only conceptually)

These map directly to the “Practical Issues & Fixes” you flagged and a few production must‑haves.

1. Non‑IID mitigation (beyond FedOpt)

Not implemented: FedProx (prox term μ), FedAvgM (server momentum), SCAFFOLD / control variates, FedNova, clustering or personalized FL (pFedMe, Per‑FedAvg, FedBN).

Impact: Faster/more stable convergence on skewed clients; better per‑client accuracy.
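As one illustration of the server-side options listed above, here is a minimal sketch of FedAvgM (server momentum) on flat weight lists. The function name `fedavgm_step` and the parameters `server_lr` and `beta` are assumptions for this example, not names from the existing module; setting `beta=0` reduces it to plain FedAvg with a server learning rate.

```python
def fedavgm_step(weights, avg_delta, velocity, server_lr=1.0, beta=0.9):
    """One server update with momentum (FedAvgM-style sketch).

    weights, avg_delta, velocity: flat lists of floats with equal length.
    avg_delta is the (weighted) mean of client updates, w_client - w_global.
    """
    # Accumulate the averaged client delta into the momentum buffer.
    new_velocity = [beta * v + d for v, d in zip(velocity, avg_delta)]
    # Move the global model along the momentum direction.
    new_weights = [w + server_lr * v for w, v in zip(weights, new_velocity)]
    return new_weights, new_velocity


w, v = [0.0, 0.0], [0.0, 0.0]
for _ in range(3):
    # Toy setup: every round yields the same averaged client delta.
    w, v = fedavgm_step(w, [1.0, -1.0], v, server_lr=0.1, beta=0.5)
```

The momentum buffer lives only on the server, so clients and the wire format are unchanged.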

2. Stragglers & partial participation

Not implemented: Async/near‑async aggregation, deadline‑based drop, staleness weighting.

Impact: Keeps round time near median; realistic mobile conditions.
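Staleness weighting can be sketched as a discount applied to late updates before averaging. The polynomial discount `(1 + staleness) ** (-alpha)` below is one common choice, used here as an assumption; the function names and the `(delta, num_examples, base_round)` tuple layout are hypothetical.

```python
def staleness_weight(current_round, base_round, alpha=0.5):
    """Discount for an update computed against an older global model."""
    staleness = current_round - base_round
    if staleness < 0:
        raise ValueError("update cannot come from a future round")
    return (1.0 + staleness) ** (-alpha)


def aggregate_with_staleness(updates, current_round, alpha=0.5):
    """updates: list of (delta, num_examples, base_round) tuples.

    Returns the mean delta, weighted by example count times staleness discount.
    """
    scales = [n * staleness_weight(current_round, r, alpha) for _, n, r in updates]
    total = sum(scales)
    dim = len(updates[0][0])
    return [
        sum(s * delta[i] for s, (delta, _, _) in zip(scales, updates)) / total
        for i in range(dim)
    ]
```

With `alpha=0` every update counts equally; larger `alpha` suppresses stale updates more aggressively.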

3. Robust secure aggregation under dropout

Missing: Additive secret sharing / two‑server protocol to reconstruct masks if clients disappear; threshold‑t tolerant schemes.

Impact: Makes secure‑agg usable in real rounds with churn.

4. Communication‑cost controls

Missing: Gradient sparsification (Top‑k, Rand‑k), quantization (QSGD, 8‑bit/1‑bit), error‑feedback; payload/round accounting.

Impact: Potentially 10–100× lower uplink traffic with little accuracy loss, depending on the compression ratio and the model.
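The payload accounting can be sketched with simple arithmetic. The byte sizes below (float32 values, 4-byte indices, one float32 scale per quantized tensor) are assumptions for illustration; real encodings add framing overhead and may pack indices more tightly.

```python
def dense_bytes(num_params, bytes_per_value=4):
    """Uplink size of an uncompressed float32 update."""
    return num_params * bytes_per_value


def topk_bytes(num_params, fraction, bytes_per_value=4, bytes_per_index=4):
    """Uplink size when only the top `fraction` of entries is sent as (index, value)."""
    k = max(1, round(num_params * fraction))
    return k * (bytes_per_value + bytes_per_index)


def quantized_bytes(num_params, bits):
    """Uplink size for `bits`-per-value quantization plus one float32 scale."""
    return (num_params * bits + 7) // 8 + 4


n = 1_000_000
ratios = {
    "top-k 1%": dense_bytes(n) / topk_bytes(n, 0.01),
    "top-k 5%": dense_bytes(n) / topk_bytes(n, 0.05),
    "8-bit": dense_bytes(n) / quantized_bytes(n, 8),
    "1-bit": dense_bytes(n) / quantized_bytes(n, 1),
}
```

Under these assumptions Top-k at 1% gives a 50× reduction and 8-bit quantization alone gives just under 4×, so the upper end of the range generally requires sparsification, aggressive quantization, or both.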

5. Privacy actually in the loop

Pending (next module): DP‑FedAvg integration (per‑client clipping + server Gaussian noise), RDP privacy accountant, user‑level ε tracking and early‑stop on budget.

Impact: Formal privacy guarantees; measurable ε–accuracy tradeoff.
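The clip-then-noise step of DP-FedAvg can be sketched as follows, assuming uniform client weighting and a fixed denominator. The names `clip_update`, `dp_fedavg_aggregate`, and `noise_multiplier` are hypothetical, and the privacy accountant (RDP, ε tracking) is deliberately not shown.

```python
import math
import random


def clip_update(delta, clip_norm):
    """Scale a client delta so its L2 norm is at most clip_norm."""
    norm = math.sqrt(sum(x * x for x in delta))
    scale = min(1.0, clip_norm / norm) if norm > 0 else 1.0
    return [x * scale for x in delta]


def dp_fedavg_aggregate(client_deltas, clip_norm, noise_multiplier, rng):
    """Average clipped deltas and add Gaussian noise to the sum.

    Noise std on the sum is noise_multiplier * clip_norm, i.e. calibrated
    to the per-client sensitivity enforced by clipping.
    """
    clipped = [clip_update(d, clip_norm) for d in client_deltas]
    n, dim = len(clipped), len(clipped[0])
    sigma = noise_multiplier * clip_norm
    noisy_sum = [
        sum(c[i] for c in clipped) + rng.gauss(0.0, sigma) for i in range(dim)
    ]
    return [s / n for s in noisy_sum]


rng = random.Random(0)  # seeded only to make the demo repeatable
noisy_mean = dp_fedavg_aggregate(
    [[3.0, 4.0], [0.0, 0.0]], clip_norm=1.0, noise_multiplier=1.1, rng=rng
)
```

Python's `random` module is used only to keep the sketch dependency-free; a real deployment would need a cryptographically secure noise source.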

6. Security hardening beyond toy demo

Missing in‑loop: mTLS for gRPC, replay protection (nonces/timestamps), auth tokens and attestation stubs, secret management (KMS), CI gating on container image scans, and poisoning defenses (clip‑and‑filter, cosine‑similarity filters, median/trimmed‑mean/Krum).

Impact: Realistic threat model coverage (honest‑but‑curious + malicious clients).
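Replay protection with nonces and timestamps can be sketched at the message level using only the standard library. The envelope layout, the `ReplayGuard` class, and the 30-second freshness window are assumptions for this example; it complements transport security such as mTLS rather than replacing it.

```python
import hashlib
import hmac
import json
import secrets
import time


def sign_message(key, payload, now=None):
    """Wrap a payload with a random nonce, a timestamp, and an HMAC-SHA256 tag."""
    envelope = {
        "payload": payload,
        "nonce": secrets.token_hex(16),
        "ts": time.time() if now is None else now,
    }
    body = json.dumps(envelope, sort_keys=True).encode()
    return dict(envelope, mac=hmac.new(key, body, hashlib.sha256).hexdigest())


class ReplayGuard:
    """Rejects tampered, stale, or previously seen messages."""

    def __init__(self, key, max_age_s=30.0):
        self.key = key
        self.max_age_s = max_age_s
        self.seen = {}  # nonce -> message timestamp

    def accept(self, msg, now=None):
        now = time.time() if now is None else now
        unsigned = {k: v for k, v in msg.items() if k != "mac"}
        body = json.dumps(unsigned, sort_keys=True).encode()
        expected = hmac.new(self.key, body, hashlib.sha256).hexdigest()
        if not hmac.compare_digest(expected, msg.get("mac", "")):
            return False  # tampered or signed with the wrong key
        if abs(now - msg["ts"]) > self.max_age_s:
            return False  # outside the freshness window
        # Nonces older than the window can be forgotten: such messages
        # would fail the freshness check anyway.
        self.seen = {n: t for n, t in self.seen.items() if now - t <= self.max_age_s}
        if msg["nonce"] in self.seen:
            return False  # replay
        self.seen[msg["nonce"]] = msg["ts"]
        return True
```

Bounding the nonce cache by the freshness window keeps server memory proportional to recent traffic only.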

7. Fairness & per‑client quality

Missing: q‑FFL or reweighting, long‑tail client metrics, per‑cohort evaluation.

Impact: Avoids models that serve only “majority” clients well.

8. Evaluation & ops gaps

Missing: Federated test evaluation (global vs per‑client), checkpointing & resume, Prometheus/Grafana wiring in these runs, experiment registry, deterministic seeds across all layers.

Impact: Reproducible science and debuggability.

9. Tuning & HParam search for FL

Missing: Fed‑aware sweeps (server LR, client LR, clients sampled per round C, local epochs E, μ for FedProx) with budgeted comparisons.

Impact: Systematic performance gains, not ad‑hoc wins.

## Suggested “next steps” checklist (actionable)

If you want to round out the FL module before moving to DP/Sec:

Implement FedProx (μ grid: 0, 1e‑4, 1e‑3, 1e‑2) on the non‑IID Dirichlet(α=0.3) split; compare rounds‑to‑target vs FedAvg/FedAdam.
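The FedProx change is confined to the client: each local step adds the gradient of the proximal term (μ/2)·‖w − w_global‖², which is μ·(w − w_global). Below is a minimal sketch on a toy quadratic client loss; `fedprox_local_train`, `quadratic_grad`, and the toy objective are assumptions for illustration, and μ = 0 recovers plain FedAvg local SGD.

```python
def fedprox_local_train(global_w, grad_fn, mu, lr=0.1, steps=200):
    """Local SGD on f(w) + (mu / 2) * ||w - global_w||^2."""
    w = list(global_w)
    for _ in range(steps):
        g = grad_fn(w)
        # Proximal gradient mu * (w - global_w) pulls w back toward the global model.
        w = [wi - lr * (gi + mu * (wi - gw)) for wi, gi, gw in zip(w, g, global_w)]
    return w


def quadratic_grad(target):
    """Gradient of the toy client loss 0.5 * ||w - target||^2."""
    return lambda w: [wi - ti for wi, ti in zip(w, target)]


global_w = [0.0]
# Client drift from the global model for each mu in the suggested grid.
drift = {
    mu: abs(fedprox_local_train(global_w, quadratic_grad([1.0]), mu)[0] - global_w[0])
    for mu in (0.0, 1e-4, 1e-3, 1e-2)
}
```

On this toy problem the local optimum is `target / (1 + mu)`, so drift shrinks as μ grows; on a real non-IID split the effect has to be measured in rounds-to-target.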

Add deadline‑based aggregation: sample C clients, give them random “latencies,” drop stragglers after a deadline; plot loss vs wall‑clock.
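A deadline-based round can be simulated without any training code. In this sketch the exponential latency distribution, the function name `run_round`, and the 2-second deadline are assumptions; the real loop would aggregate only the `kept` clients and log loss against the accumulated `wall_clock`.

```python
import random


def run_round(client_ids, deadline_s, rng, mean_latency_s=1.0):
    """Simulate one round; returns (kept_ids, dropped_ids, round_time_s)."""
    latencies = {cid: rng.expovariate(1.0 / mean_latency_s) for cid in client_ids}
    kept = [cid for cid in client_ids if latencies[cid] <= deadline_s]
    kept_set = set(kept)
    dropped = [cid for cid in client_ids if cid not in kept_set]
    # The server waits until the deadline if anyone is late,
    # otherwise only until the slowest client reports.
    round_time = deadline_s if dropped else max(latencies.values())
    return kept, dropped, round_time


rng = random.Random(0)
wall_clock = 0.0
history = []
for rnd in range(5):
    sampled = rng.sample(range(100), 10)  # sample 10 of 100 clients
    kept, dropped, dt = run_round(sampled, deadline_s=2.0, rng=rng)
    wall_clock += dt
    history.append((rnd, len(kept), len(dropped), wall_clock))
```

Plotting loss against `wall_clock` instead of the round index is what makes the straggler cost visible.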

Two‑server secure‑agg prototype: split masks across Aggregator & MaskServer; show that the exact sum is still recovered with 10–20% dropouts.
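One simplified way to prototype the two-server idea: each client sends a random mask to the MaskServer and its masked, fixed-point-encoded update to the Aggregator; the MaskServer later reveals only the sum of masks for the clients that actually reported. This sketch assumes the two servers do not collude, and `MOD`, `SCALE`, and all function names are hypothetical.

```python
import secrets

MOD = 2 ** 32     # arithmetic is done modulo 2^32
SCALE = 10 ** 4   # fixed-point scale for encoding floats


def encode(update):
    return [round(v * SCALE) % MOD for v in update]


def decode(values):
    # Map residues in the upper half of the range back to negative numbers.
    return [(v - MOD if v >= MOD // 2 else v) / SCALE for v in values]


def client_shares(update):
    """Return (masked update for the Aggregator, mask for the MaskServer)."""
    mask = [secrets.randbelow(MOD) for _ in update]
    masked = [(u + m) % MOD for u, m in zip(encode(update), mask)]
    return masked, mask


def secure_sum(updates, survivors):
    """updates: {client_id: list of floats}; survivors: ids that reported in time."""
    to_aggregator, to_mask_server = {}, {}
    for cid, update in updates.items():
        masked, mask = client_shares(update)
        to_mask_server[cid] = mask          # every client registers its mask
        if cid in survivors:
            to_aggregator[cid] = masked     # dropouts never send this
    dim = len(next(iter(updates.values())))
    masked_sum = [sum(to_aggregator[c][i] for c in survivors) % MOD for i in range(dim)]
    # The MaskServer sums masks only for the survivor set named by the Aggregator.
    mask_sum = [sum(to_mask_server[c][i] for c in survivors) % MOD for i in range(dim)]
    return decode([(a - b) % MOD for a, b in zip(masked_sum, mask_sum)])


updates = {i: [0.1 * i, -0.5] for i in range(10)}
survivors = set(range(8))  # 20% of clients drop out
total = secure_sum(updates, survivors)
```

The Aggregator only ever sees masked vectors and the MaskServer only ever sees masks, so neither learns an individual update on its own; the MaskServer does learn which clients survived.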

Add comm‑compression: Top‑k (e.g., 1%/5%) with error‑feedback; log bytes/round and accuracy delta.
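Top-k with error feedback keeps a per-client residual of everything that was not sent and adds it back before the next selection. A minimal sketch on flat lists follows; the function name and the dict-based sparse format are assumptions.

```python
def topk_with_error_feedback(grad, residual, k):
    """Return (sparse update as {index: value}, new residual)."""
    # Add back what previous rounds left unsent.
    corrected = [g + r for g, r in zip(grad, residual)]
    # Select the k entries with the largest magnitude.
    top = sorted(range(len(corrected)), key=lambda i: abs(corrected[i]), reverse=True)[:k]
    sparse = {i: corrected[i] for i in top}
    # Everything not sent is carried over to the next round.
    new_residual = [0.0 if i in sparse else c for i, c in enumerate(corrected)]
    return sparse, new_residual


residual = [0.0, 0.0, 0.0, 0.0]
sent_round1, residual = topk_with_error_feedback([0.1, -2.0, 0.5, 0.3], residual, k=1)
sent_round2, residual = topk_with_error_feedback([0.1, 0.0, 0.5, 0.3], residual, k=1)
```

Small coordinates accumulate in the residual until they are large enough to be selected, which is why the accuracy delta is usually much smaller than with plain Top-k.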

Robust aggregator: median or trimmed‑mean on layer deltas; simulate 5–10% poisoned clients and show mitigation.
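Coordinate-wise median and trimmed mean can be sketched directly on flattened deltas. The function names and the toy poisoning scenario (9 honest clients, 1 attacker) are assumptions for illustration.

```python
import statistics


def coordinate_median(deltas):
    """Per-coordinate median across client deltas."""
    return [statistics.median(col) for col in zip(*deltas)]


def trimmed_mean(deltas, trim_fraction=0.1):
    """Per-coordinate mean after dropping the k smallest and k largest values."""
    n = len(deltas)
    k = int(n * trim_fraction)
    if 2 * k >= n:
        raise ValueError("trim_fraction removes every client")
    result = []
    for col in zip(*deltas):
        kept = sorted(col)[k:n - k]
        result.append(sum(kept) / len(kept))
    return result


honest = [[1.0, 1.0] for _ in range(9)]
poisoned = [[100.0, -100.0]]  # one malicious client out of ten
deltas = honest + poisoned

plain_mean = [sum(col) / len(col) for col in zip(*deltas)]
robust_median = coordinate_median(deltas)
robust_trimmed = trimmed_mean(deltas, trim_fraction=0.1)
```

Here the plain mean is pulled far from the honest value while both robust aggregators return it unchanged; trimmed mean only helps while the trim fraction is at least as large as the attacker fraction.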

Federated test evaluation: extract server weights every 5 rounds, evaluate centrally; record the IID vs non‑IID gap.

Basic fairness slice: track accuracy per label or per‑client size decile; report disparity.
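The per-client-size decile slice can be computed from three numbers per client. The tuple layout `(num_train_examples, num_correct, num_eval)` and the function names are assumptions for this sketch, and the toy data is made up to show the shape of the report.

```python
def accuracy_by_size_decile(clients):
    """clients: list of (num_train_examples, num_correct, num_eval) tuples.

    Returns 10 accuracies, from the smallest-client decile to the largest
    (None for a decile with no evaluation examples).
    """
    ordered = sorted(clients, key=lambda c: c[0])
    n = len(ordered)
    buckets = [[] for _ in range(10)]
    for rank, client in enumerate(ordered):
        buckets[rank * 10 // n].append(client)
    report = []
    for bucket in buckets:
        total = sum(c[2] for c in bucket)
        report.append(sum(c[1] for c in bucket) / total if total else None)
    return report


def disparity(report):
    """Gap between the best and worst decile accuracy."""
    values = [r for r in report if r is not None]
    return max(values) - min(values)


# Toy data: 20 clients, the 10 smallest score 50%, the 10 largest score 90%.
clients = [(size, 5 if size <= 10 else 9, 10) for size in range(1, 21)]
report = accuracy_by_size_decile(clients)
gap = disparity(report)
```

Reporting the gap alongside the global accuracy makes regressions on small clients visible even when the average improves.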