fix(ant-dev): clean up orphan anvil/antnode and stale node identities on stop #81
Merged
… on stop

ant-devnet keeps anvil alive past Testnet::new's scope via std::mem::forget on the AnvilInstance, then relies on graceful Drop at process exit to clean it up. SIGTERM/SIGKILL skip destructors, so every ant dev stop leaks one anvil child and one ~/.local/share/ant/nodes/<peer_id>/ tree for each of the spawned nodes. After a handful of start/stop or killed-mid-startup cycles, the LXC accumulates orphan anvils plus 100+ stale node dirs, and subsequent ant dev start runs flake or hang.

This is a workaround at the ant-dev layer (Option B in #73). The proper fix lives in ant-devnet itself (Option A: tempfile::TempDir + tokio signal handler, mirroring how ant-client's MiniTestnet and ant-node's tests/e2e/testnet.rs already do it) and will be a separate PR against WithAutonomi/ant-node.

In ant dev stop now:

- pkill anvil and antnode in addition to ant-devnet
- rm -rf ~/.local/share/ant/nodes and ~/.local/share/ant/spill so the next start begins from a clean state
- Centralise the pkill calls into a _pkill() helper

No behaviour change on Windows (the pkill / rm paths are POSIX-only).

Closes #16 (local task); helps mitigate #73 (upstream).
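The stop-time cleanup described above can be sketched roughly as follows. This is a minimal illustration, not the code from the PR: it assumes ant-dev is written in Python, and the names `PROCESS_PATTERNS`, `data_dirs`, `clean_data_dirs`, and `stop` are hypothetical. Only the `_pkill()` helper name, the `pkill -9 -f` invocation, and the two data directories come from the PR text. The exact antnode match pattern is elided in the PR, so a plain `antnode` pattern stands in for it here.

```python
import shutil
import subprocess
import sys
from pathlib import Path
from typing import List, Optional

# Assumed match patterns; the PR elides the real antnode path pattern.
PROCESS_PATTERNS = ["ant-devnet", "anvil", "antnode"]


def data_dirs(home: Path) -> List[Path]:
    """Directories the PR says are wiped on stop (nodes and spill)."""
    base = home / ".local" / "share" / "ant"
    return [base / "nodes", base / "spill"]


def _pkill(pattern: str) -> bool:
    """Send SIGKILL to every process whose command line matches `pattern`.

    Returns True if pkill reported at least one match (exit status 0).
    """
    result = subprocess.run(
        ["pkill", "-9", "-f", pattern],
        check=False,
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def clean_data_dirs(home: Path) -> List[Path]:
    """Remove stale node identity and spill directories; return what was removed."""
    removed = []
    for directory in data_dirs(home):
        if directory.exists():
            shutil.rmtree(directory, ignore_errors=True)
            removed.append(directory)
    return removed


def stop(home: Optional[Path] = None) -> None:
    """POSIX-only cleanup; the existing Windows path is left out of this sketch."""
    if sys.platform == "win32":
        return
    for pattern in PROCESS_PATTERNS:
        _pkill(pattern)
    clean_data_dirs(home or Path.home())
```

Calling `stop()` would kill any matching process on the machine, so it is only suitable for a dedicated dev box or container, which matches the LXC setup the PR describes.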
Nic-dorman added a commit that referenced this pull request May 14, 2026
…l) (#88)

`ant dev start` previously hardcoded `ant-devnet --preset default` (25 nodes). On a cold-cache fresh start that reproducibly hits the manifest-wait timeout (#73), even after #81's stop-time cleanup, because spinning up 25 nodes exceeds the 6-min wait window. New contributors following SETUP.md hit `Timed out waiting for devnet manifest` on their first run with no obvious cause.

`--preset small` (10 nodes) finishes in seconds and is plenty for SDK development. Switch the default to `small` and expose `--preset` so users can opt back into `default` / `large` for stress runs.

## Why default changes (not just a new flag)

Defaults should make the documented happy path actually work. With `default` as the default, `ant dev start --ant-node-dir …` per SETUP.md fails on cold cache; with `small`, it works. The proper fix is #73 Option A in `WithAutonomi/ant-node` (`tempfile::TempDir` + tokio signal handler in `ant-devnet/main.rs`). Once that lands and `default` works reliably, this flag still gives users a fast option for tight iteration loops.

## Test plan

- [x] `ant dev start --ant-node-dir ~/Projects/ant-node` (no --preset) → uses `small`, devnet ready in ~10s
- [x] `ant dev start --preset default --ant-node-dir …` → reproduces #73 manifest-timeout symptom (as expected; the underlying ant-devnet bug is unchanged)
- [x] `ant dev start --help` shows the new flag with all three choices
- [x] Cross-SDK e2e harness (15/15 SDKs) green with the default preset change

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
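The flag change described in this commit message could look roughly like the sketch below. It assumes ant-dev parses arguments with Python's `argparse`, which the source does not confirm; `build_parser` and `devnet_command` are hypothetical names. The three preset names, the new `small` default, and the `ant-devnet --preset` invocation are taken from the commit message.

```python
import argparse
from typing import List

# Preset names from the commit message; `small` is the new default.
PRESETS = ("small", "default", "large")


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser for `ant dev start` with the new --preset flag."""
    parser = argparse.ArgumentParser(prog="ant dev start")
    parser.add_argument(
        "--preset",
        choices=PRESETS,
        default="small",
        help="devnet size preset (default: small)",
    )
    parser.add_argument(
        "--ant-node-dir",
        help="path to a local ant-node checkout",
    )
    return parser


def devnet_command(args: argparse.Namespace) -> List[str]:
    """Build the ant-devnet invocation instead of hardcoding `--preset default`."""
    return ["ant-devnet", "--preset", args.preset]
```

With this shape, omitting `--preset` yields `ant-devnet --preset small`, while `--preset default` or `--preset large` remains available for stress runs. `argparse` rejects any other value and lists the valid choices in `--help`.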
Nic-dorman added a commit that referenced this pull request May 14, 2026
Cuts v0.7.1 atop v0.7.0. Primarily refreshes the upstream `ant-core` pin to the `ant-cli-v0.2.3` release tag (no API change for antd consumers). Bundles a substantial round of cross-SDK example/build fixes, dispatcher improvements, and CI/release workflow hardening.

## antd

- chore(antd): bump ant-core to v0.2.3 (#85)

## SDK example/build fixes

- fix(antd-php): use cost-estimate fields in example 02 (#74)
- fix(antd-elixir): print cost-estimate fields in examples (#75)
- fix(antd-lua): add missing discover module to rockspec (#76)
- fix(antd-kotlin): make put-response cost optional + ship gradle wrapper (#77)
- fix(antd-zig): pass payment_mode to dataPutPublic/dataPutPrivate (#79)
- fix(antd-java): make examples runnable via gradle :examples subproject (#80)
- fix(antd-zig): align stdlib API to declared 0.14.x minimum (#82)
- fix(antd-swift): port to Linux + populate cost-estimate fields (#87)

## ant-dev (developer CLI)

- fix(ant-dev): clean up orphan anvil/antnode and stale node identities on stop (#81)
- fix(ant-dev): tooling cluster - flag alias, sys.executable, anvil preflight, README (#83)
- feat(ant-dev): expand `ant dev example` to dispatch all 15 SDKs (#84)
- fix(ant-dev): dispatcher swift no-skip + lua LUA_PATH wrap (#86)
- feat(ant-dev): expose --preset flag on `ant dev start` (default: small) (#88)

## CI / release

- ci: authenticate arduino/setup-protoc on ci.yml too (#60)
- feat(release): publish antd-linux-arm64 artifact (#89)

## Validation

15/15 SDKs round-tripped end-to-end against a daemon built from this commit on a Linux dev box (Ubuntu 24.04, 0.7.1 atop ant-core v0.2.3).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Helps mitigate #73 (Option B path).

`ant-devnet` keeps anvil alive past `Testnet::new`'s scope with `std::mem::forget(testnet)` and relies on graceful Drop at process exit to clean it up. SIGTERM/SIGKILL skip destructors, so every `ant dev stop` leaks one `anvil` child and one `~/.local/share/ant/nodes/<peer_id>/` directory per spawned node (25 dirs on the default preset). After a handful of start/stop cycles, and especially after kill-mid-startup events, the LXC accumulates orphan anvils plus 100+ stale node dirs, and subsequent `ant dev start` runs flake or hang.

This is the Option B workaround proposed in #73 (the band-aid at the `ant-dev` layer). The proper fix is Option A: change `ant-devnet/main.rs` to use `tempfile::TempDir` + a tokio signal handler, mirroring how `ant-client`'s `MiniTestnet` and `ant-node`'s `tests/e2e/testnet.rs` already do it. That lives in `WithAutonomi/ant-node` and will go up as a separate PR there.

## Changes in `ant dev stop`

- `pkill -9 -f anvil` and `pkill -9 -f .../antnode` in addition to the existing `ant-devnet` pkill
- `rm -rf ~/.local/share/ant/{nodes,spill}` so the next `ant dev start` begins from a clean slate
- `_pkill()` helper for readability

No behaviour change on Windows: `pkill` and the data-dir cleanup are POSIX-only branches.

## Test plan

- `ant dev start` followed by `ant dev stop` left an orphan `anvil` and 25 dirs in `~/.local/share/ant/nodes/`. Reproducible every run.
- `start` → `stop` leaves zero processes and an empty data dir: