Release v0.6.1 - vectorized executor follow-ups + mypy strict fix · dataeducator/fairswarm-library

Patch release. Three fixes layered on v0.6.0:

- mypy --strict regression in update_swarm_vectorized (51c87f9)
- Sprint 8 / G11: batched fairness gradient (a3eda6e)
- Sprint 8 / G12: threads chunked dispatch + honest backend docs (dad8f9e)

No public-API change. Defaults unchanged. pip install fairswarm==0.6.1 is a drop-in upgrade from 0.6.0.

What this release closes

G11 — Batched fairness gradient

update_swarm_vectorized shipped in v0.6.0 with a residual per-particle
Python loop calling compute_fairness_gradient once per particle. That
loop was the dominant cost in the vectorized path and capped its
speedup at ~1.7x serial.

v0.6.1 adds compute_fairness_gradient_batched(positions, clients, target)
that returns the entire (P, n) gradient matrix in a single matmul plus
a handful of vector ops:

# X: (P, n) particle positions, D: (n, k) client demographics
S   = X.sum(axis=1) + eps             # (P,)    per-particle position sum
W   = X / S[:, None]                  # (P, n)  soft selection weights
C   = W @ D, then normalized          # (P, k)  soft coalition demographics
G   = log(C / target_safe) + 1        # (P, k)  KL gradient w.r.t. C
M   = G @ D.T                         # (P, n)  one matmul
q   = (G * C).sum(axis=1)             # (P,)    per-particle scalar
grad = -(M - q[:, None]) / S[:, None] # (P, n)  output

Equivalence: batched output agrees with the per-particle reference
to 4.12e-17 absolute error on a (P=25, n=30, k=5) sweep, i.e. the
two agree to within floating-point rounding. Per-particle norm clipping
(max_grad_norm=10.0) is preserved row-by-row so each particle's
restoring force stays proportional to its own divergence (Theorem 2's
drift analysis is per-particle).
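The pseudocode and the equivalence check above can be sketched in NumPy as follows. This is a minimal illustration, not the library's implementation: the function names `fairness_gradient_single` and `fairness_gradient_batched`, the `eps` default, the row-sum reading of the normalize step, and the random test data are all assumptions made for the example.

```python
import numpy as np


def fairness_gradient_single(x, D, target, eps=1e-12, max_grad_norm=10.0):
    """Hypothetical per-particle reference: x is (n,), D is (n, k), target is (k,)."""
    s = x.sum() + eps
    c = (x / s) @ D                          # (k,) soft coalition demographics
    c = c / (c.sum() + eps)                  # assumed meaning of the normalize step
    c = np.clip(c, eps, None)                # keep log() finite
    g = np.log(c / np.clip(target, eps, None)) + 1.0
    grad = -(D @ g - g @ c) / s              # (n,)
    norm = np.linalg.norm(grad)
    return grad * min(1.0, max_grad_norm / max(norm, eps))


def fairness_gradient_batched(X, D, target, eps=1e-12, max_grad_norm=10.0):
    """Hypothetical batched version: X is (P, n); returns a (P, n) matrix."""
    S = X.sum(axis=1) + eps                  # (P,)
    W = X / S[:, None]                       # (P, n)
    C = W @ D                                # (P, k)
    C = C / (C.sum(axis=1, keepdims=True) + eps)
    C = np.clip(C, eps, None)
    G = np.log(C / np.clip(target, eps, None)) + 1.0
    M = G @ D.T                              # (P, n)
    q = (G * C).sum(axis=1)                  # (P,)
    grad = -(M - q[:, None]) / S[:, None]
    # Clip each row by its own norm so particles do not affect one another.
    norms = np.linalg.norm(grad, axis=1)
    scale = np.minimum(1.0, max_grad_norm / np.maximum(norms, eps))
    return grad * scale[:, None]


# Illustrative data only, sized like the (P=25, n=30, k=5) sweep.
rng = np.random.default_rng(0)
X = rng.uniform(0.1, 1.0, size=(25, 30))
D = rng.dirichlet(np.ones(5), size=30)       # (30, 5), each row sums to 1
target = np.full(5, 0.2)

batched = fairness_gradient_batched(X, D, target)
reference = np.vstack([fairness_gradient_single(x, D, target) for x in X])
max_abs_err = float(np.abs(batched - reference).max())
print(f"max abs error: {max_abs_err:.2e}")
```

The exact error printed depends on the BLAS backend and the data, so it need not reproduce the 4.12e-17 figure; the point is only that the row-wise and batched forms compute the same quantity.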

Wall-clock impact (commodity CPU, 30 iter, 3 repeats):

| P | n | serial | vectorized v0.6.0 (vs serial) | vectorized v0.6.1 | v0.6.1 speedup vs serial |
|---|---|---|---|---|---|
| 30 | 50 | 1284 ms | 299 ms (4.3x) | 257 ms | 5.0x |
| 100 | 50 | 3278 ms | 2945 ms (1.1x) | 965 ms | 3.4x |
| 200 | 50 | 7232 ms | 7721 ms (0.9x) | 1291 ms | 5.6x |
| 30 | 200 | 2722 ms | 2806 ms (1.0x) | 306 ms | 8.9x |
| 100 | 200 | 9650 ms | 8973 ms (1.1x) | 1324 ms | 7.3x |
| 200 | 200 | 18439 ms | 7821 ms (1.7x) | 1750 ms | 10.5x |

Speedup over serial now spans 3.4x to 10.5x across the whole (P, n)
grid rather than being concentrated in a few cells.

G12 — Threads chunked dispatch

The threads executor previously submitted one Future per particle.
ThreadPoolExecutor.submit + Future.result has ~50-200us of round-trip
overhead per dispatch on a 4-worker pool (Windows numbers; lower on
Linux). For lightweight fitness functions like DemographicFitness
(~30-100us per evaluation), per-particle dispatch loses to the serial
baseline because the dispatch tax exceeds the work.

v0.6.1 changes the threads dispatch to bundle particles into
n_workers contiguous chunks and submit one task per chunk. Each
task processes its slice serially within one worker thread, then
returns the slice; the executor re-stitches the slices in input order.
Worst-case threads performance improved from 0.14x to 0.41x of serial throughput.
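The chunked dispatch described above can be sketched with the standard library alone. This is an illustration under assumptions, not the executor's actual code: `evaluate_chunked`, its parameters, and the toy fitness function are hypothetical names chosen for the example.

```python
from concurrent.futures import ThreadPoolExecutor


def evaluate_chunked(fitness, particles, n_workers=4):
    """Evaluate fitness over particles using at most n_workers chunk tasks."""
    n = len(particles)
    if n == 0:
        return []

    # Split [0, n) into contiguous slices whose sizes differ by at most one.
    n_chunks = min(n_workers, n)
    base, extra = divmod(n, n_chunks)
    bounds = []
    start = 0
    for i in range(n_chunks):
        stop = start + base + (1 if i < extra else 0)
        bounds.append((start, stop))
        start = stop

    def run_slice(lo, hi):
        # Each task walks its slice serially inside a single worker thread.
        return [fitness(p) for p in particles[lo:hi]]

    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        # One submit per chunk instead of one per particle.
        futures = [pool.submit(run_slice, lo, hi) for lo, hi in bounds]
        results = []
        for future in futures:
            # Collect in submission order so output matches input order.
            results.extend(future.result())
    return results


squares = evaluate_chunked(lambda v: v * v, list(range(10)), n_workers=4)
print(squares)
```

Iterating over the futures in submission order, rather than in completion order, is what keeps the stitched result aligned with the input regardless of which chunk finishes first.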

Threads still loses to serial on lightweight fitness, and that is
expected. As the module docstring now states, threads is for fitness
functions that release the GIL for a substantial share of their
wall-time:

- a federated training round (network or disk I/O bound),
- a large BLAS-heavy accuracy evaluation,
- a fitness that calls into a C extension for milliseconds.

For fast closed-form scores (the common case in coalition-selection
benchmarks), use vectorized (3-10x speedup) or serial.

mypy --strict regression fix

The narrowing check if ctx0.target_distribution is not None in
update_swarm_vectorized only constrained ctx0.target_distribution,
not the loop-local ctx.target_distribution, which mypy --strict
still treated as Optional inside the loop. v0.6.1 binds the
narrowed reference once outside the loop. All contexts in one
iteration share the same snapshot anyway, so this is also a tiny
performance win (no repeated attribute access inside the loop).
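The narrowing pattern can be illustrated with a small stand-alone example. The `Ctx` dataclass and `sum_targets` function are hypothetical stand-ins, not the library's types; only the shape of the fix (bind the narrowed value to a local before the loop) is taken from the description above.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Ctx:
    """Hypothetical stand-in for a per-particle context object."""
    target_distribution: Optional[Sequence[float]]


def sum_targets(contexts: Sequence[Ctx]) -> float:
    if not contexts:
        return 0.0
    ctx0 = contexts[0]
    if ctx0.target_distribution is None:
        return 0.0
    # Bind the narrowed value once. mypy now knows `target` is
    # Sequence[float], independent of any other Ctx instance.
    target = ctx0.target_distribution

    total = 0.0
    for _ctx in contexts:
        # Reading _ctx.target_distribution here would still be Optional
        # for mypy, because the check above narrowed only ctx0's attribute.
        total += sum(target)
    return total


print(sum_targets([Ctx([0.5, 0.5]), Ctx([0.5, 0.5])]))
```

As in the release note, this sketch assumes every context in the batch shares the same target, so reusing the first context's value is safe.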

Tests & quality

845 passing (+4 vs v0.6.0), 1 skipped, 0 failing
mypy --strict: clean across 48 source files
Four new tests pin the new contracts:
- TestBatchedFairnessGradient::test_batched_matches_per_particle_machine_precision
- TestBatchedFairnessGradient::test_batched_clip_matches_per_particle_clip
- TestThreadsChunkedDispatch::test_threads_chunked_result_in_input_order
- TestThreadsChunkedDispatch::test_threads_chunked_matches_serial_n_workers_invariant

Install

pip install --upgrade fairswarm==0.6.1

Reproducibility

Re-run the vectorized executor benchmark:

python experiments/bench_vectorized.py

The post-G11+G12 artifact is committed at
results/parallel_speedup/bench_vectorized_20260513_010046.json.

Reference

T. Norwood, D. Das, P. Chatterjee, E. Bentley, and U. Ghosh,
"FairSwarm: Trustworthy Coalition Selection for Fair and Secure
Federated Intelligence," IEEE Trans. Consum. Electron., 2026 (Submitted).
