Releases: JimGalasyn/jax-solitons
v0.0.4 — campaign robustness: FleetExecutor, self-healing REST, reap by label
The campaign-robustness release. Everything from the 2026-06-15 farm post-mortem — a transient DNS storm that made the fleet treat 14 of 16 legs as terminally failed and tore the boxes down — turned into hardening, plus the refactor that collapses three forked fleet drivers into one library executor. All additive: no breaking changes from v0.0.3.
pip install --upgrade git+https://github.com/JimGalasyn/jax-solitons.git@v0.0.4
Headline: FleetExecutor — a fleet run is data, not a fork
jax_solitons.campaign.FleetExecutor runs a list of FleetLeg(label, command, ship, fetch) over any Provider, one rented host per leg, in parallel. The three hand-rolled private drivers (run_eps_fleet / run_stability_fleet / run_eps_kick_fleet), each copy-pasting the same rent → wait-worker → scp → ssh → fetch → fail-over loop, now collapse to thin callers that just build legs. Physics-agnostic (no model/stepper import).
Every robustness feature is the fix for a specific failure that session paid for:
- Per-leg failover on a bad host (HostProbeFailed) or an offer race (RentUnavailable).
- Fast-fail a corpse (VastProvider.dead_reason): a container that never came up is detected via the API and failed over in seconds instead of ssh-polling it for the full ~20-min deadline.
- Offer-pool refresh when it drains under heavy failover, instead of NO_OFFERS while new offers exist.
- Resume / skip legs whose output already exists locally, so a relaunch is idempotent.
- Launch jitter against the thundering herd; signal-safe teardown that destroys in-flight rentals on SIGTERM/SIGINT.
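The failover, resume, and jitter behaviours listed above can be sketched in a few lines. This is a minimal illustration, not the FleetExecutor implementation: the exception classes are local stand-ins named after the signals in these notes, and `Leg`, `run_leg_with_failover`, `rent_and_run`, and `max_jitter_s` are hypothetical names introduced only for this example.

```python
import os
import random
import time
from dataclasses import dataclass


# Stand-ins for the failover signals named in the notes; the real classes
# live in the library and may carry different fields.
class HostProbeFailed(Exception):
    """The rented host came up but failed its health probe."""


class RentUnavailable(Exception):
    """The offer was taken before any instance was created."""


@dataclass
class Leg:
    label: str
    output_path: str  # local path whose existence marks the leg as done


def run_leg_with_failover(leg, offers, rent_and_run, max_jitter_s=0.0):
    """Try offers in order until one completes the leg.

    rent_and_run(offer, leg) is a caller-supplied callable (an assumption of
    this sketch) that rents a host, runs the leg, and fetches its output.
    """
    if os.path.exists(leg.output_path):
        return "SKIPPED"  # resume: output already present locally
    # Launch jitter: spread simultaneous launches over a short window.
    time.sleep(random.uniform(0.0, max_jitter_s))
    for offer in offers:
        try:
            rent_and_run(offer, leg)
            return "OK"
        except (HostProbeFailed, RentUnavailable):
            continue  # per-leg failover: move on to the next offer
    return "NO_OFFERS"
```

A real executor would also refresh the offer pool before returning NO_OFFERS and would register each rental for teardown on SIGTERM/SIGINT; both are left out here to keep the sketch short.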
Self-healing Vast REST (_req)
Every Vast API call now retries transient transport faults — DNS EAI_AGAIN, connection reset, read timeout, 5xx — with exponential backoff before giving up. The retry is idempotent-aware: create retries only pre-send DNS failures, so a half-completed rent can never double-rent a GPU. A raw URLError no longer escapes; every network failure surfaces as VastError (now carrying .code). This single change at the one function every REST call flows through is what would have made the original DNS storm invisible.
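A minimal sketch of an idempotent-aware retry wrapper, assuming a caller-supplied `send()` callable that performs one HTTP request. The `VastError` class here is a local stand-in (only its `.code` attribute comes from these notes), and `request_with_retry`, `is_transient`, and `is_pre_send_dns_failure` are hypothetical names; the library's `_req` may be structured differently.

```python
import socket
import time
import urllib.error


class VastError(Exception):
    """Stand-in for the library's error type; carries an optional .code."""

    def __init__(self, message, code=None):
        super().__init__(message)
        self.code = code


def is_pre_send_dns_failure(exc):
    """True for a name-resolution failure, which occurs before any bytes are sent."""
    reason = getattr(exc, "reason", exc)
    return isinstance(reason, socket.gaierror)


def is_transient(exc):
    """DNS failure, connection reset, timeout, or an HTTP 5xx response."""
    if isinstance(exc, urllib.error.HTTPError):
        return 500 <= exc.code < 600
    reason = getattr(exc, "reason", exc)
    transient = (socket.gaierror, ConnectionResetError, TimeoutError, socket.timeout)
    return isinstance(reason, transient)


def request_with_retry(send, idempotent=True, attempts=4, base_delay=0.5,
                       sleep=time.sleep):
    """Call send(), retrying transient faults with exponential backoff.

    Non-idempotent calls (such as a create) retry only pre-send DNS failures,
    so a request that may already have reached the server is never repeated.
    """
    last = None
    for attempt in range(attempts):
        try:
            return send()
        except OSError as exc:  # URLError and HTTPError are OSError subclasses
            last = exc
            retryable = is_transient(exc) if idempotent else is_pre_send_dns_failure(exc)
            if not retryable or attempt == attempts - 1:
                break
            sleep(base_delay * 2 ** attempt)
    # Every network failure surfaces as one error type.
    raise VastError(str(last), code=getattr(last, "code", None)) from last
```

Passing `sleep` as a parameter keeps the backoff testable without real delays.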
Orphan attribution: LaunchSpec.label + reap --label
Every rented instance is stamped with a campaign/run label at create, and reap gains a third scope — reap --label NAME destroys only that campaign's boxes. Safe under concurrent farming like --ledger (no --all gate), but ledger-free, so a crashed run's orphans are reapable by label from any machine. Scopes AND with --ledger.
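The way the label and ledger scopes combine can be illustrated with a small filter. This is only a sketch of the ANDing rule described above: `Instance`, `select_for_reap`, and `ledger_ids` are hypothetical names, and refusing to run with no scope at all is an assumption of this example rather than documented reap behaviour.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instance:
    instance_id: int
    label: str


def select_for_reap(instances, label=None, ledger_ids=None):
    """Return the instances to destroy; every scope that is given must match."""
    if label is None and ledger_ids is None:
        # Assumption of this sketch: an unscoped reap is refused outright.
        raise ValueError("refusing to reap without a scope")
    picked = []
    for inst in instances:
        if label is not None and inst.label != label:
            continue  # stamped by a different campaign
        if ledger_ids is not None and inst.instance_id not in ledger_ids:
            continue  # not recorded in this ledger
        picked.append(inst)
    return picked
```

Because the label is stamped on the instance itself, the label scope needs no local state, which is what makes a crashed run's orphans reapable from another machine.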
Also
- RentUnavailable: a Provider failover signal for an offer taken before any instance was created (nothing to tear down); keeps executors provider-agnostic.
- campaign status: python -m jax_solitons.campaign.status --ledger <path> prints live instances (what's billing now) + cumulative spend/outcomes, replacing log-grepping and shelling out to vastai.
- make_verlet_step is now exported from jax_solitons.steppers.
See the CHANGELOG entry for the structured list.
v0.0.3 — knot ID, gauged L₂+L₃ model, anti-vacuum pole fix
Released 2026-06-15. Everything merged since the v0.0.2 PyPI debut, in one patch. Two new physics capabilities land, core-curve knot identification (jax_solitons.knots) and the coupled L₂+L₃ gauged Faddeev–Skyrme–Higgs model, alongside the multi-cloud campaign Provider seam. And one real correctness bug is fixed: the knot tracer assumed the wrong vacuum hemisphere, so the library's own seeds tripped the knot-ID it ships. That bug is almost certainly the "census on evolved fields blocked" wall; it surfaced while dogfooding the engine on a bare-vs-gauged knot-stability sweep.
v0.0.3 Zenodo DOI: minted on release (backfilled here) · Concept DOI: 10.5281/zenodo.20680195
Install / upgrade
pip install --upgrade git+https://github.com/JimGalasyn/jax-solitons.git@v0.0.3
The headline fix: auto-detect the anti-vacuum pole
core_curves_from_n traces a soliton's core curve as the preimage of a pole of the Faddeev map n: R³ → S² — the {n1=0, n2=0} set on one hemisphere. It was hard-coded to pole=+1, which assumes the vacuum sits at −z. But the library's own torus_knot_hopfion + arrested_flow leave the vacuum at +z (mean n3 ≈ +0.985). So the tracer seeded on the entire +z vacuum bulk — millions of points instead of the ~600-point core curve — and the pure-Python predictor-corrector ground for hours.
In other words: the library was internally inconsistent. The fields it generates tripped the knot-ID it ships. Any consumer tracing a jax-solitons field hit the hang. This is almost certainly the root of the "census on evolved fields blocked" / "skeletons unknot the trefoil" wall seen downstream.
The fix makes pole selection automatic and convention-agnostic:
core_curves_from_n(n1, n2, n3, axes)  # pole="auto" (new default)
# pole = -1 when mean(n3) > 0 (vacuum +z -> trace the -z core), else +1
- vacuum at −z → pole=+1 (the historical default), unchanged for those fields;
- vacuum at +z → pole=−1, the case that used to hang;
- a degenerate field that fills one pole entirely falls back to the opposite pole;
- any value other than "auto"/±1 now raises ValueError up front (not a cryptic failure deep in pole * n3).
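The selection rule above can be written out as a standalone helper. This is a sketch of the stated rule only: `resolve_pole` is a hypothetical name, not a function the library is known to export, and the degenerate-field fallback to the opposite pole is omitted.

```python
import numpy as np


def resolve_pole(n3, pole="auto"):
    """Return the pole (+1 or -1) whose preimage should be traced.

    With pole="auto" the core is sought opposite the vacuum: a field whose
    mean n3 is positive has its vacuum at +z, so the -z pole is traced.
    """
    if isinstance(pole, str):
        if pole != "auto":
            raise ValueError(f"pole must be 'auto', +1 or -1, got {pole!r}")
        return -1 if float(np.mean(n3)) > 0.0 else +1
    if pole in (+1, -1):
        return int(pole)
    # Reject bad values up front instead of failing later inside pole * n3.
    raise ValueError(f"pole must be 'auto', +1 or -1, got {pole!r}")
```

For a field like the one described above, with mean n3 ≈ +0.985, this returns −1, so the tracer seeds on the thin core rather than on the vacuum bulk.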
On the dogfood sweep this took tracing from a 600 s timeout to ~2 s/knot. New regression test test_core_curves_from_n_auto_pole_plusz_vacuum asserts a +z-vacuum field is found by auto and missed by the old default.
Robustness: lower the resample cap
identify_knot / identify_core_knot resample a curve before handing it to pyknotid's Alexander routine. The default cap was 600 points, but on jittery evolved curves even 600 points went combinatorial when pyknotid's cython chelpers are not compiled (the common pip case). The default is now 200, which resolves every torus knot through T(2,9) on noisy traces in under a second; identify_core_knot also forwards max_points so callers can tune it. Raise it for genuinely high-crossing curves.
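To make the cap concrete, here is one way a closed curve can be reduced to at most max_points samples. The notes do not say how the library resamples; this sketch assumes uniform arc-length spacing with linear interpolation, and `resample_closed_curve` is a hypothetical name.

```python
import numpy as np


def resample_closed_curve(points, max_points=200):
    """Resample a closed polyline to at most max_points, evenly spaced in arc length."""
    pts = np.asarray(points, dtype=float)
    if len(pts) <= max_points:
        return pts  # already under the cap; leave untouched
    closed = np.vstack([pts, pts[:1]])  # repeat the first point to close the loop
    seg = np.linalg.norm(np.diff(closed, axis=0), axis=1)  # segment lengths
    s = np.concatenate([[0.0], np.cumsum(seg)])  # cumulative arc length
    # endpoint=False avoids emitting the start point twice.
    targets = np.linspace(0.0, s[-1], max_points, endpoint=False)
    columns = [np.interp(targets, s, closed[:, k]) for k in range(pts.shape[1])]
    return np.column_stack(columns)
```

Fewer points mean fewer spurious crossings from jitter, which is why a lower cap helps the pure-Python path; the trade-off is that a genuinely high-crossing curve can be under-resolved.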
New: core-curve knot identification (jax_solitons.knots)
The inverse of the carrier ladder. Given a soliton field, trace its core curve and read off the Alexander determinant (unknot = 1, trefoil = 3, cinquefoil = 5, …):
from jax_solitons.knots import core_curves_from_n, identify_core_knot
curves = core_curves_from_n(n1, n2, n3, axes) # predictor-corrector trace
info = identify_core_knot(curves)  # {'determinant': 3, 'carrier': 'trefoil', ...}
Front ends for both the Faddeev n-field (core_curves_from_n) and a GPE ψ-field (core_curves_from_psi), plus curve_energy_scores to pick the dominant component and a with_time_limit guard for turbulent fields where the pure-Python tracer/Alexander can go pathological.
New: the coupled L₂ + L₃ gauged model
models.gauged_faddeev implements the Paper-16 L_NWT — an SU(2) Skyrme field slaved to a C² doublet with a U(1) gauge + Higgs sector. gauged_faddeev_model builds it; n_from_doublet recovers the Skyrme field so the same relax-then-ID pipeline works on gauged solutions. Seeded by the new torus_knot_hopfion (T(p,q) hopfion, Q_H = p·m) and flux_threaded_knot_seed.
New: the multi-cloud campaign Provider seam (F)
A pluggable cloud-broker Protocol so a new GPU cloud is a ~150-line adapter rather than a fork, with leak-proof rent() as the contract invariant. Ships VastProvider and RunPodProvider, remote execution via the Executor seam (ModalExecutor, ProviderExecutor, InProcessExecutor), campaign.multi for partition-and-merge across executors, and a shared object-store backend. See the full CHANGELOG entry for the complete list. Breaking: VastClient → VastProvider (method signatures changed; a VastClient alias is kept for import compatibility).
What's next
- Build the cython chelpers (or a vectorized determinant) so high-point curves are fast without resampling.
- Harden the knot-ID against the diffuse gauged core (a short bare-relax sharpen currently makes it traceable; do it inside the tracer).
- Continue the campaign ProviderExecutor / run_multi path toward end-to-end cross-cloud sweeps.
See also
- CHANGELOG.md: structured changelog for every version
- src/jax_solitons/knots.py: the core-curve tracer + knot ID
- docs/RELEASING.md: the maintainer release runbook
v0.0.2 — PyPI debut + trusted publishing
Packaging release (no engine/API changes).
- On PyPI now: pip install jax-solitons (0.0.2).
- oracle extra pinned to nwt-substrate==0.5.0 (now on PyPI) for a reproducible cross-engine equivalence gate.
- Release-triggered PyPI trusted-publishing workflow (OIDC, no token).
Version DOI: 10.5281/zenodo.20683741 · concept DOI: 10.5281/zenodo.20680195
v0.0.1 — first pre-alpha
First tagged pre-alpha of jax-solitons. The API will change without notice until 0.1.
What's in it
- One Model abstraction: composable energy terms + manifold constraint + topological charges; forces via jax.grad. Faddeev-Skyrme (E₂ + area-form E₄, S²/CP¹) and Gross-Pitaevskii are validated.
- Exact topology: the Berg-Lüscher solid-angle Hopf charge as a native, differentiable primitive (best-in-class per Phys. Rev. B 111, 134408 (2025)).
- Restartable, registered runs over a fleet: the physics-agnostic A/B/C/E campaign contract (jax_solitons.campaign), with config-hashed registry, bit-identical full-state restart, streamed event-records with triggered capture, and probe-or-bail admission, over a pluggable executor (local backends shipped; SkyPilot executor stubbed). See CAMPAIGN.md.
- First RunFn: faddeev_relax_then_id (relax-then-ID behind the contract) + a runnable campaign example.
Validation
Cross-checked bit-identically against the source research engine; live acceptance gates in CI; CodeQL clean. Suite: 33 passed, 3 skipped (GPU tier).
See VALIDATION.md for the full claims → tests → references map.