
feat: extend automatic upgrades with caching, cross-platform support, and clean restarts#22

Merged
jacderida merged 12 commits into WithAutonomi:main from jacderida:feat-extend_automatic_upgrades
Mar 19, 2026
Conversation


@jacderida jacderida commented Mar 10, 2026

Summary

Extends the automatic upgrade system with production-ready features for multi-node deployments across all platforms:

  • Always-on upgrades: Removed the --auto-upgrade flag; upgrades are now always enabled, simplifying deployment configuration
  • Release metadata cache: Shared disk cache with configurable TTL so multiple nodes on the same machine don't each poll the GitHub API independently
  • Binary cache: Downloaded and verified binaries are cached on disk (with SHA-256 integrity checks) so only the first node needs to download; subsequent nodes copy from cache
  • Clean process restart: Replaced Unix exec() with a spawn-and-exit model that works cross-platform. Added --stop-on-upgrade flag for service manager environments (systemd, launchd, Windows Service) where the manager handles restarts
  • Windows support: Added zip archive extraction alongside tar.gz, enabling auto-upgrades on Windows nodes
  • Staggered upgrade checks: Nodes randomise their check intervals (±5%) to avoid thundering-herd API calls
  • Robustness fixes: Handle deleted binary paths, improve error context in download/extraction, fix staged rollout check integration in the node loop
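The release metadata cache's TTL behaviour can be sketched as follows. This is an illustrative stand-in, assuming a simple age check against a fixed TTL; `CACHE_TTL` and `is_fresh` are hypothetical names, not the PR's actual API:

```rust
use std::time::{Duration, SystemTime};

// Assumed TTL for illustration; the PR describes the TTL as configurable.
const CACHE_TTL: Duration = Duration::from_secs(60 * 60);

/// Returns true if a cache entry written at `written_at` is still fresh,
/// in which case a node reads the cached releases instead of polling GitHub.
fn is_fresh(written_at: SystemTime, now: SystemTime, ttl: Duration) -> bool {
    match now.duration_since(written_at) {
        Ok(age) => age < ttl,
        // Clock went backwards: treat the entry as fresh rather than re-polling.
        Err(_) => true,
    }
}

fn main() {
    let now = SystemTime::now();
    let recent = now - Duration::from_secs(10);
    let stale = now - Duration::from_secs(2 * 60 * 60);
    assert!(is_fresh(recent, now, CACHE_TTL));
    assert!(!is_fresh(stale, now, CACHE_TTL));
}
```

With a check like this, only the first node to find a stale entry pays the GitHub API call; the others read the refreshed file.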

Validation

Three separate testnet deployments were used to validate the changes:

  • 100-node Linux testnet: All nodes successfully auto-upgraded to the fake 0.3.12-rc.1 release
  • 100-node Linux testnet + standalone Windows node: All Linux nodes upgraded successfully, and the standalone Windows node also upgraded correctly (zip extraction path)
  • 100-node Linux testnet + standalone macOS node: All Linux nodes upgraded successfully, and the standalone macOS node also upgraded correctly

Test plan

  • 100-node Linux-only testnet upgrade validated
  • 100-node Linux testnet + Windows standalone node upgrade validated
  • 100-node Linux testnet + macOS standalone node upgrade validated

🤖 Generated with Claude Code

Copilot AI review requested due to automatic review settings March 10, 2026 18:24

Copilot AI left a comment


Pull request overview

This PR expands the node auto-upgrade subsystem for multi-node, cross-platform deployments by adding shared on-disk caching, Windows archive support, and a service-manager-friendly restart mode, while also making upgrades always enabled by default.

Changes:

  • Make auto-upgrades always enabled (remove --auto-upgrade / config enabled) and add --stop-on-upgrade for service-manager restarts.
  • Add shared disk caches for GitHub release metadata and extracted binaries (with locking and TTL support in the cache primitives).
  • Add cross-platform restart behavior and Windows zip extraction; update packaging/deploy/docs to match the new behavior and repo URLs.

Reviewed changes

Copilot reviewed 21 out of 22 changed files in this pull request and generated 10 comments.

Show a summary per file
File Description
wix/main.wxs Update ARP metadata links to new GitHub org/repo.
systemd/saorsa-node.service Add --stop-on-upgrade and switch to Restart=always for service-manager restarts.
src/upgrade/release_cache.rs New: disk-backed release metadata cache with TTL and file locking + tests.
src/upgrade/monitor.rs Integrate ReleaseCache; add upgrade timing logs; update repo references in tests.
src/upgrade/mod.rs Export new cache modules and restart exit code; update platform support flag/test.
src/upgrade/cache_dir.rs New: shared upgrade cache directory helper + test.
src/upgrade/binary_cache.rs New: extracted-binary disk cache with SHA-256 integrity metadata and a download lock helper + tests.
src/upgrade/apply.rs Add binary cache usage, zip extraction, deleted-path handling, and spawn-and-exit restart model with --stop-on-upgrade.
src/node.rs Make upgrades always on; wire in caches + stop-on-upgrade; use staged-rollout-aware monitor method; add startup jitter.
src/config.rs Remove upgrade enabled; add stop_on_upgrade; update default GitHub repo.
src/bin/saorsa-node/cli.rs Remove --auto-upgrade; add --stop-on-upgrade; update upgrade config wiring.
scripts/testnet/upgrade-test.sh Adjust testnet script output for always-on upgrades.
scripts/testnet/start-genesis.sh Remove --auto-upgrade from generated systemd unit.
scripts/testnet/spawn-nodes.sh Remove --auto-upgrade from generated systemd unit.
scripts/testnet/deploy-all.sh Update binary URL to new GitHub org/repo.
scripts/testnet/build-and-deploy.sh Update clone URL to new GitHub org/repo.
docs/DESIGN.md Update repo references and remove mention of --auto-upgrade flag.
deploy/terraform/cloud-init/worker.yml Update release download URL to new GitHub org/repo.
deploy/scripts/spawn-nodes.sh Update release download URL to new GitHub org/repo.
README.md Update CLI examples, repo references, and add “Running as a Service” guidance.
Cargo.toml Add zip, fs2, and Windows self-replace; install systemd unit in packages; update repository URL.
.github/workflows/release.yml Adjust Windows MSI version derivation to strip prerelease suffix.



greptile-apps bot commented Mar 10, 2026

Greptile Summary

This PR extends the auto-upgrade system from a basic Unix-only downloader into a production-grade, cross-platform upgrade pipeline. Key additions are a shared disk cache for release metadata (ReleaseCache) and verified binaries (BinaryCache), zip archive extraction for Windows, and a --stop-on-upgrade flag that lets service managers (systemd, launchd, WinSW) own the restart instead of exec(). The --auto-upgrade flag is removed in favour of always-on behaviour.

Highlights:

  • Release and binary caches are both well-guarded: release writes use an exclusive file lock + atomic rename; binary cache reads verify SHA-256 against stored metadata.
  • on_disk_version() correctly uses tokio::process::Command with a 5 s timeout (addressing the prior review concern).
  • Archive format detection by magic bytes is clean and the new test suite covers tar.gz, zip, unknown format, and missing-binary cases.
  • Repository URLs updated consistently across all scripts, config defaults, CI, and docs.
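The magic-byte detection praised above can be sketched like this. A minimal stand-in, assuming only the two standard signatures (gzip streams start with `1f 8b`, zip archives with `PK\x03\x04`); the enum and function names are illustrative, not the PR's actual API:

```rust
/// Illustrative archive-format detection by magic bytes.
#[derive(Debug, PartialEq)]
enum ArchiveFormat {
    TarGz,
    Zip,
    Unknown,
}

fn detect_format(header: &[u8]) -> ArchiveFormat {
    match header {
        // gzip magic bytes (tar.gz is a gzip stream)
        [0x1f, 0x8b, ..] => ArchiveFormat::TarGz,
        // zip local file header signature "PK\x03\x04"
        [b'P', b'K', 0x03, 0x04, ..] => ArchiveFormat::Zip,
        _ => ArchiveFormat::Unknown,
    }
}

fn main() {
    assert_eq!(detect_format(&[0x1f, 0x8b, 0x08]), ArchiveFormat::TarGz);
    assert_eq!(detect_format(b"PK\x03\x04rest"), ArchiveFormat::Zip);
    assert_eq!(detect_format(b"hello"), ArchiveFormat::Unknown);
}
```

Detecting by content rather than file extension keeps the same code path working for both the Linux tar.gz and Windows zip release assets.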

Issues found:

  • with_download_lock is called from the async fn resolve_upgrade_binary but uses fs2::FileExt::lock_exclusive(), a blocking syscall. When another process holds the lock the Tokio worker thread stalls — should use tokio::task::spawn_blocking.
  • Both extract_from_zip and extract_from_tar_gz read the entire binary into a Vec<u8> before writing; a streaming std::io::copy would avoid potentially large heap allocations.
  • current_binary_path() can return a relative argv[0] path, which could silently break the spawn in trigger_restart if the CWD changes.
  • BinaryCache::store() does not ensure the cache directory exists before writing, leading to a less-informative error if the directory is absent.
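The streaming suggestion in the second bullet can be sketched as below. This is a generic illustration, not the PR's code: `entry` and `dest` stand in for the archive-entry reader and destination file, and `std::io::copy` moves data through a small fixed-size buffer instead of buffering the whole binary in a `Vec<u8>`:

```rust
use std::io::{self, Cursor, Read, Write};

/// Copy an archive entry to its destination without a full in-memory buffer.
fn extract_streaming<R: Read, W: Write>(entry: &mut R, dest: &mut W) -> io::Result<u64> {
    // io::copy loops over a small internal buffer, so peak memory stays
    // constant regardless of the binary's size.
    io::copy(entry, dest)
}

fn main() -> io::Result<()> {
    let payload = vec![0xABu8; 1 << 20]; // pretend 1 MiB binary inside an archive
    let mut entry = Cursor::new(payload.clone());
    let mut dest = Vec::new(); // stands in for the destination file
    let n = extract_streaming(&mut entry, &mut dest)?;
    assert_eq!(n, 1 << 20);
    assert_eq!(dest, payload);
    Ok(())
}
```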

Confidence Score: 3/5

  • Functionally sound and validated on real testnets, but the blocking file lock in an async task is a correctness concern under contention that should be addressed before merging to a high-node-count fleet.
  • The core upgrade logic is well-tested and the security gate (ML-DSA signature verification before any extraction) is preserved. The testnet validation gives confidence in the happy path. The blocking with_download_lock call inside an async task is a real concern: while the lock is held only briefly (a single cache-file read), under contention it blocks a Tokio worker, and the issue compounds at scale with many nodes starting simultaneously. The relative-path and large-allocation issues are lower severity but worth fixing. No critical data-loss or security regressions were found.
  • Primary: src/upgrade/apply.rs (blocking lock in async, full-binary Vec allocation, relative argv[0] path). Secondary: src/upgrade/binary_cache.rs (missing cache dir creation in store()).

Important Files Changed

Filename Overview
src/upgrade/apply.rs Core upgrade logic refactored: adds binary cache integration, cross-platform restart (spawn-and-exit vs exec), zip extraction, and blocking file-lock in async context
src/upgrade/binary_cache.rs New file: SHA-256-verified binary cache with exclusive file locking; store() writes without holding the download lock (benign due to identical content) but could cause transient SHA-256 mismatches
src/upgrade/release_cache.rs New file: disk cache for GitHub release metadata with TTL, atomic writes, and exclusive file locking; well-implemented with thorough test coverage
src/upgrade/monitor.rs Adds release cache integration; falls back transparently to network on cache miss; clean implementation
src/node.rs Upgrade monitor always-on, adds jitter, uses check_for_ready_upgrade with richer branch handling; upgrade monitor no longer Arc-wrapped
systemd/saorsa-node.service Adds --stop-on-upgrade to ExecStart and upgrades Restart=on-failure to Restart=always for clean upgrade restarts
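The atomic-write pattern attributed to release_cache.rs above (write to a temp file, then rename) can be sketched like this. A simplified illustration that omits the exclusive-lock step; `atomic_write` is a hypothetical name, not the PR's API:

```rust
use std::fs;
use std::io::Write;
use std::path::Path;

/// Write `data` so that readers only ever see the old or the new contents,
/// never a partially written file.
fn atomic_write(path: &Path, data: &[u8]) -> std::io::Result<()> {
    let tmp = path.with_extension("tmp");
    let mut f = fs::File::create(&tmp)?;
    f.write_all(data)?;
    f.sync_all()?; // flush to disk before the rename makes it visible
    fs::rename(&tmp, path) // atomic on the same filesystem
}

fn main() -> std::io::Result<()> {
    let target = std::env::temp_dir().join("releases.json");
    atomic_write(&target, b"{\"releases\":[]}")?;
    assert_eq!(fs::read(&target)?, b"{\"releases\":[]}".to_vec());
    fs::remove_file(&target)?;
    Ok(())
}
```

Because `rename` is atomic on a single filesystem, a node reading the cache concurrently can never observe a torn JSON document.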

Sequence Diagram

sequenceDiagram
    participant N as saorsa-node (any instance)
    participant RC as ReleaseCache (disk)
    participant GH as GitHub API
    participant BC as BinaryCache (disk)
    participant FS as Filesystem (binary)

    N->>RC: read_if_valid(repo)
    alt cache hit (fresh)
        RC-->>N: Vec<GitHubRelease>
    else cache miss or expired
        N->>GH: GET /repos/{owner}/{repo}/releases
        GH-->>N: Vec<GitHubRelease>
        N->>RC: write(repo, releases) [exclusive lock]
        RC-->>N: Ok
    end

    N->>N: select_upgrade_from_releases()
    alt upgrade available
        N->>BC: get_verified(version)
        alt binary cache hit
            BC-->>N: PathBuf (cached binary)
        else cache miss
            N->>BC: with_download_lock { double-check }
            BC-->>N: still None
            N->>GH: download archive + signature
            GH-->>N: archive, .sig
            N->>N: verify ML-DSA signature
            N->>N: extract binary
            N->>BC: store(version, binary)
        end
        N->>FS: check on_disk_version()
        alt disk already upgraded
            N->>N: trigger_restart (skip replace)
        else needs replacement
            N->>FS: backup current binary
            N->>FS: replace_binary (atomic rename / self_replace)
            alt stop_on_upgrade=true
                N->>N: process::exit(0) [Unix] or exit(100) [Windows]
            else stop_on_upgrade=false
                N->>N: spawn new binary, then process::exit(0)
            end
        end
    end

Comments Outside Diff (2)

  1. src/upgrade/apply.rs, line 957-961 (link)

    Blocking file lock called from async context

    cache.with_download_lock(...) calls fs2::FileExt::lock_exclusive(), which is a blocking syscall. When another process holds the lock, the current Tokio worker thread stalls until the lock is released. Since resolve_upgrade_binary is async and runs inside a tokio::spawn task, blocking here starves other futures sharing that worker thread.

    While the closure body (a single get_verified file read) is fast, the lock acquisition itself can block for as long as the competing process holds it. This should be moved to a tokio::task::spawn_blocking block:

    ```rust
    let cached_under_lock = tokio::task::spawn_blocking({
        let cache = cache.clone(); // if BinaryCache is Clone, or Arc-wrap it
        let version_str = version_str.to_string();
        move || {
            cache.with_download_lock(|| {
                cache.get_verified(&version_str)
                    .map_or(Ok(None), |p| Ok(Some(p)))
            })
        }
    })
    .await
    .map_err(|e| Error::Upgrade(format!("Download lock task panicked: {e}")))??;
    ```

    Alternatively, BinaryCache could be made Clone (or wrapped in Arc) and with_download_lock could be a free async fn that uses tokio::task::spawn_blocking internally.

  2. src/upgrade/binary_cache.rs, line 1545-1549 (link)

    store() doesn't ensure the cache directory exists

    BinaryCache::new accepts any PathBuf without creating the directory. If upgrade_cache_dir() fails (or the directory is removed between creation and first use), the fs::copy call here will fail with an opaque I/O error. Callers currently suppress this with warn!, so it won't crash — but an explicit fs::create_dir_all(&self.cache_dir) at the top of store() would make the failure much easier to diagnose:

    ```rust
    pub fn store(&self, version: &str, source_path: &std::path::Path) -> Result<()> {
        fs::create_dir_all(&self.cache_dir)
            .map_err(|e| Error::Upgrade(format!("Failed to create binary cache dir: {e}")))?;
        let hash = sha256_file(source_path)?;
        // ...
    }
    ```

Last reviewed commit: 426bde1

@jacderida jacderida force-pushed the feat-extend_automatic_upgrades branch from e86bca9 to 09500a1 Compare March 10, 2026 18:35
Copilot AI review requested due to automatic review settings March 10, 2026 19:16
@jacderida
Copy link
Collaborator Author

@greptile review


Copilot AI left a comment


Pull request overview

Copilot reviewed 21 out of 22 changed files in this pull request and generated 5 comments.



@jacderida jacderida force-pushed the feat-extend_automatic_upgrades branch from 426bde1 to 99f6bdc Compare March 10, 2026 22:25
Copilot AI review requested due to automatic review settings March 10, 2026 22:31
@jacderida jacderida force-pushed the feat-extend_automatic_upgrades branch from 99f6bdc to 5f4545a Compare March 10, 2026 22:31
@jacderida jacderida force-pushed the feat-extend_automatic_upgrades branch from 5f4545a to ab3ecdb Compare March 10, 2026 22:35

Copilot AI left a comment


Pull request overview

Copilot reviewed 21 out of 22 changed files in this pull request and generated 4 comments.

Comments suppressed due to low confidence (2)

.github/workflows/release.yml:168

  • Stripping the pre-release suffix from the git tag when generating the MSI ProductVersion can cause version collisions (e.g. v1.2.3-rc.1 and v1.2.3 both become 1.2.3.0). MSI upgrades generally require monotonically increasing versions; this may prevent upgrading from an RC to the final release. Consider encoding pre-release information into the 4-part MSI version instead of dropping it (or ensure your WiX upgrade strategy accounts for this).
```powershell
          # Remove 'v' prefix and strip any pre-release suffix (e.g., -rc.1, -beta.2)
          $version = $tag -replace '^v', '' -replace '-.*$', ''
          # Ensure 4-part version for MSI (add .0 if needed)
          $parts = $version.Split('.')
          while ($parts.Length -lt 4) {
            $version = "$version.0"
            $parts = $version.Split('.')
          }
```

README.md:933

  • The CLI reference doesn’t mention the new --stop-on-upgrade flag, even though it’s documented earlier and implemented in cli.rs. Please add it here so operators can discover it via the README’s option list.
    --upgrade-channel <CHANNEL>
        Release channel: stable, beta
        [default: stable]



@jacderida jacderida force-pushed the feat-extend_automatic_upgrades branch from ab3ecdb to 0e0d136 Compare March 10, 2026 22:55
Copilot AI review requested due to automatic review settings March 11, 2026 00:18

Copilot AI left a comment


Pull request overview

Copilot reviewed 22 out of 23 changed files in this pull request and generated 4 comments.

Comments suppressed due to low confidence (1)

.github/workflows/release.yml:168

  • The MSI version derivation now strips any pre-release suffix (e.g. v1.2.3-rc.1 → 1.2.3). Since this workflow runs on all v* tags and publishes pre-releases too, this can create MSI version collisions between pre-release and stable tags (and between multiple pre-releases), which can break Windows upgrade/uninstall behavior. Consider either skipping MSI generation for pre-release tags or mapping pre-release identifiers into a monotonically increasing 4-part MSI version (so each tag produces a unique, ordered ProductVersion).
```powershell
          $tag = "${{ github.ref_name }}"
          # Remove 'v' prefix and strip any pre-release suffix (e.g., -rc.1, -beta.2)
          $version = $tag -replace '^v', '' -replace '-.*$', ''
          # Ensure 4-part version for MSI (add .0 if needed)
          $parts = $version.Split('.')
          while ($parts.Length -lt 4) {
            $version = "$version.0"
            $parts = $version.Split('.')
          }
```


@jacderida jacderida force-pushed the feat-extend_automatic_upgrades branch from 4180c36 to 0e0d136 Compare March 11, 2026 21:50

@grumbach grumbach left a comment


PR #22 Review: Extend Automatic Upgrades

Overall Assessment: The architecture is sound — two-layer caching, ML-DSA signature verification before extraction, zip-slip protection via enclosed_name(), and the spawn-and-exit restart model are all well-designed. However, there are correctness issues that should be addressed before merge.

Must Fix

  1. std::process::exit() in async context — bypasses graceful shutdown, tokio runtime, open handles, and the CancellationToken flow (see inline comments on apply.rs)
  2. current_binary_path() (deleted) logic bug — the (deleted) stripping inside exists() guard is unreachable (see inline comment)
  3. use rand::Rng inline in function bodies — violates project coding standards (see inline comments on node.rs)

Should Fix

  1. Replace std::thread::sleep with tokio::time::sleep in the Windows retry loop (apply.rs:560)
  2. Log a warning when chrono::Duration::from_std() fails silently (node.rs)
  3. Extract jitter calculation into a helper — it's duplicated in node.rs
  4. Add migration note in release notes: users with enabled = false in config will now silently get auto-upgrades

Consider for Follow-up

  • rollout.rs:69,136 uses direct array indexing (self.node_id_hash[0] through [7]). Blake3 is always 32 bytes so it's safe, but violates the project's no-direct-indexing rule. Consider hash.as_bytes().first_chunk::<8>().
  • No cache eviction in cache_dir.rs — old versions accumulate forever
  • apply.rs:272 reads entire response into memory before writing. For 200 MiB max archive, consider streaming to disk.
  • Missing tests for resolve_upgrade_binary(), current_binary_path() with (deleted), concurrent cache access, and jitter calculation
  • check_interval_hours = 0 in config would cause a tight polling loop — consider validating
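The last follow-up item (validating `check_interval_hours = 0`) could be handled with a small clamp at config load. A minimal sketch; `MIN_CHECK_INTERVAL_HOURS` and `validated_check_interval` are illustrative names, not the PR's code:

```rust
// Assumed floor for the upgrade-check interval; a value of 0 in the config
// would otherwise produce a tight polling loop against the GitHub API.
const MIN_CHECK_INTERVAL_HOURS: u64 = 1;

fn validated_check_interval(hours: u64) -> u64 {
    hours.max(MIN_CHECK_INTERVAL_HOURS)
}

fn main() {
    assert_eq!(validated_check_interval(0), 1); // clamped instead of tight-looping
    assert_eq!(validated_check_interval(6), 6); // sane values pass through
}
```

Rejecting the value with a config error instead of clamping would also work; either way the tight loop is ruled out.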

```rust
let invoked_path = env::args().next().map(PathBuf::from);

if let Some(ref invoked) = invoked_path {
    if invoked.exists() {
```

Bug (M1): Logic error — this (deleted) check (lines 103-110) is inside the if invoked.exists() guard. If the binary was deleted (which is why the (deleted) suffix appears), then invoked.exists() returns false, and this code is unreachable.

The (deleted) stripping only works in the current_exe() fallback path (lines 124-132). Move the (deleted) check before the exists() check, or handle deleted paths as a special case:

```rust
// Check for (deleted) suffix first, before exists()
let cleaned = if invoked_str.ends_with(" (deleted)") {
    PathBuf::from(invoked_str.trim_end_matches(" (deleted)"))
} else {
    invoked.clone()
};
if cleaned.exists() {
    return Ok(cleaned);
}
```

```rust
info!("Exiting with code 0 for service manager restart");
std::thread::sleep(std::time::Duration::from_millis(100));
std::process::exit(0);
}
```

Critical (C1): std::process::exit() in async context terminates the process immediately without running destructors or flushing async resources. This is called from within a tokio::spawn'd task. The 100ms thread::sleep before exit is best-effort but:

  • Tokio runtime is not gracefully shut down
  • Open file handles, network connections, and LMDB transactions may not be flushed
  • The CancellationToken shutdown flow in RunningNode::run() is completely bypassed

Recommendation: Instead of calling exit() directly, cancel the shutdown token and let the main loop handle graceful shutdown, then exit from the main function. Or at minimum, spawn a dedicated OS thread for the exit sequence.
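The recommended shape can be sketched with a plain channel standing in for the real `CancellationToken` flow. This is an illustration of the pattern only, using threads and `std::sync::mpsc` rather than the project's tokio types; all names here are hypothetical:

```rust
use std::sync::mpsc;
use std::thread;

// The upgrade task requests a restart; only main exits the process.
enum ControlMsg {
    RestartForUpgrade,
}

fn main() {
    let (tx, rx) = mpsc::channel::<ControlMsg>();

    // Stand-in for the spawned upgrade task: it signals instead of exiting.
    let upgrade_task = thread::spawn(move || {
        tx.send(ControlMsg::RestartForUpgrade).expect("main loop gone");
    });

    // Stand-in for the main loop: receive the request, run graceful
    // shutdown (flush DB transactions, close sockets), then exit from main.
    let mut shutdown_ran = false;
    if let Ok(ControlMsg::RestartForUpgrade) = rx.recv() {
        // ... graceful shutdown work happens here ...
        shutdown_ran = true;
    }
    upgrade_task.join().unwrap();
    assert!(shutdown_ran);
    // std::process::exit(0) would only be called here, after cleanup.
}
```

The key property is that destructors and flushes run before the process terminates, which `std::process::exit()` inside the spawned task cannot guarantee.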

```rust
);
last_err = Some(e);
std::thread::sleep(std::time::Duration::from_millis(*delay_ms));
}
```

Major (M2): std::thread::sleep blocks the tokio worker thread. This Windows retry loop sleeps for 500ms, 1000ms, and 2000ms — up to 3.5 seconds of tokio worker blocking in the worst case.

Use tokio::time::sleep instead, or wrap in spawn_blocking.

src/node.rs Outdated
```rust
let variance = interval_secs / 20; // 5%
let jitter_secs = if variance > 0 {
    use rand::Rng;
    rand::thread_rng().gen_range(0..=variance * 2)
```

Major (M3): use rand::Rng; is inline in a function body. Project rules (CLAUDE.md) explicitly state: "always use top level use for dependencies instead of inlining them in the functions." Please move this to the top-level imports.

src/node.rs Outdated
```rust
let variance = interval_secs / 20; // 5%
let jittered_secs = if variance > 0 {
    use rand::Rng;
    let jitter = rand::thread_rng().gen_range(0..=variance * 2);
```

Major (M3, duplicate): Second occurrence of inline use rand::Rng;. Same as above — move to top-level imports.

Also, this entire jitter calculation block (lines 617-631) is duplicated from lines 541-558. Extract into a helper function.
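The requested helper could look like the sketch below. Taking the random sample as a parameter keeps the function pure and testable; the caller would supply `rand::thread_rng().gen_range(0..=2 * variance)`. The function name and signature are illustrative, not a proposal for the exact API:

```rust
/// Spread the effective check interval across roughly ±5% of the base
/// interval, given a random value sampled from [0, 2 * variance].
fn jittered_interval_secs(interval_secs: u64, random_in_0_to_2v: u64) -> u64 {
    let variance = interval_secs / 20; // 5% of the interval
    if variance == 0 {
        return interval_secs; // interval too small for meaningful jitter
    }
    // Map [0, 2*variance] onto [interval - variance, interval + variance].
    interval_secs - variance + random_in_0_to_2v.min(2 * variance)
}

fn main() {
    let base = 6 * 60 * 60; // 6-hour check interval
    let v = base / 20;
    assert_eq!(jittered_interval_secs(base, 0), base - v);
    assert_eq!(jittered_interval_secs(base, 2 * v), base + v);
    assert_eq!(jittered_interval_secs(10, 999), 10); // variance rounds to 0
}
```

Both call sites in node.rs could then share this function, and the boundary behaviour (including the `variance == 0` case) becomes unit-testable without an RNG.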

@grumbach grumbach dismissed their stale review March 12, 2026 05:19

Re-submitting as comment review


Copilot AI review requested due to automatic review settings March 12, 2026 22:18

Copilot AI left a comment


Pull request overview

Copilot reviewed 21 out of 22 changed files in this pull request and generated 5 comments.

Comments suppressed due to low confidence (1)

.github/workflows/release.yml:166

  • The MSI version derivation strips the entire pre-release suffix (e.g. v0.3.12-rc.1 → 0.3.12.0). If you publish MSIs for both pre-releases and the eventual stable v0.3.12, they’ll share the same ProductVersion, which can break Windows Installer upgrade semantics (stable may not upgrade/replace the RC MSI). Consider mapping pre-release identifiers into the 4th version component (e.g. 0.3.12.1 for rc.1) so every tag produces a monotonically increasing MSI version.
          # Remove 'v' prefix and strip any pre-release suffix (e.g., -rc.1, -beta.2)
          $version = $tag -replace '^v', '' -replace '-.*$', ''
          # Ensure 4-part version for MSI (add .0 if needed)
          $parts = $version.Split('.')
          while ($parts.Length -lt 4) {
            $version = "$version.0"
            $parts = $version.Split('.')
          }


Comment on lines +637 to +651
// Standalone mode: spawn new process, then exit after graceful shutdown
let args: Vec<String> = env::args().skip(1).collect();

info!("Spawning new process: {} {:?}", binary_path.display(), args);

std::process::Command::new(binary_path)
    .args(&args)
    .stdin(std::process::Stdio::null())
    .stdout(std::process::Stdio::inherit())
    .stderr(std::process::Stdio::inherit())
    .spawn()
    .map_err(|e| Error::Upgrade(format!("Failed to spawn new binary: {e}")))?;

info!("New process spawned, will exit after graceful shutdown");
Ok(0)
Copilot AI Mar 12, 2026

In standalone mode (stop_on_upgrade = false), prepare_restart spawns the upgraded binary immediately, before the node has completed its graceful shutdown/cleanup. This can briefly run two instances concurrently and cause port/DB lock conflicts (the child may fail to start while the parent still holds resources). Consider deferring the spawn until after shutdown has completed (e.g., return a restart action/command to be executed by the main run loop after cleanup), or always use an external supervisor for restarts.

Comment on lines 358 to 363
/// Whether the current platform supports in-place auto-upgrade.
///
/// Supported on Unix (via `exec()`) and Windows (via `self-replace` crate).
const fn auto_upgrade_supported() -> bool {
    true
}
Copilot AI Mar 12, 2026

auto_upgrade_supported() now always returns true, but the surrounding docs/commentary indicate Windows support via self-replace/exec(). In this module, perform_upgrade still uses fs::rename for replacement and doesn’t use self-replace, so Windows in-place upgrades will likely fail at runtime. Either restore a real platform gate here (fail closed on Windows) or update Upgrader to use the same cross-platform replacement approach as AutoApplyUpgrader.

Comment on lines +39 to +43
/// Note: this creates a real directory under the platform data dir.
/// Modifying env vars to isolate this requires `unsafe` (due to
/// `deny(unsafe_code)`), so we accept the minor side-effect.
#[test]
fn test_upgrade_cache_dir_returns_path() {
Copilot AI Mar 12, 2026

This unit test creates a real directory under the user/platform data dir, which can leave behind state and can fail in restricted CI environments. Also, std::env::set_var isn’t gated by deny(unsafe_code), so the comment about needing unsafe to isolate env looks incorrect. Consider making the test hermetic (e.g., set XDG_DATA_HOME/APPDATA to a temp dir in a serialized test), or mark it #[ignore]/remove it if it can’t be made isolated.

Suggested change
/// Note: this creates a real directory under the platform data dir.
/// Modifying env vars to isolate this requires `unsafe` (due to
/// `deny(unsafe_code)`), so we accept the minor side-effect.
#[test]
fn test_upgrade_cache_dir_returns_path() {
/// This test uses a temporary directory as the platform data dir by
/// setting the appropriate environment variable (e.g. `XDG_DATA_HOME` or
/// `APPDATA`) before calling `upgrade_cache_dir()`, to avoid touching the
/// real user data directory.
#[test]
fn test_upgrade_cache_dir_returns_path() {
// Use a dedicated temp directory as the platform data directory root.
let temp_root = std::env::temp_dir().join("saorsa_upgrade_cache_test");
let _ = std::fs::remove_dir_all(&temp_root);
std::fs::create_dir_all(&temp_root).expect("failed to create temp data dir root");
// Point the platform data dir to our temp root so that ProjectDirs
// resolves paths under it instead of the real user data directory.
#[cfg(target_os = "windows")]
{
std::env::set_var("APPDATA", &temp_root);
}
#[cfg(not(target_os = "windows"))]
{
std::env::set_var("XDG_DATA_HOME", &temp_root);
}

Comment on lines +154 to +156
/// caller should fetch from the network. The returned
/// The returned lock guard must be held until after writing the fresh
/// data so that other nodes block rather than all hitting the API.
Copilot AI Mar 12, 2026

Doc comment has a duplicated/unfinished sentence (“caller should fetch from the network. The returned The returned…”). Please remove the stray fragment so the rustdoc for lock_and_recheck is readable.

Suggested change
/// caller should fetch from the network. The returned
/// The returned lock guard must be held until after writing the fresh
/// data so that other nodes block rather than all hitting the API.
/// caller should fetch from the network. The returned lock guard must be
/// held until after writing the fresh data so that other nodes block
/// rather than all hitting the API.

Comment on lines +27 to +28
/// On Windows, `trigger_restart` exits with this code instead of using
/// `exec()`. The wrapping service (e.g. NSSM or Windows Service) should be
Copilot AI Mar 12, 2026

The RESTART_EXIT_CODE docs mention trigger_restart/exec(), but the code now uses prepare_restart and no longer calls exec(). Update this comment to reflect the current restart mechanism (spawn-and-exit vs service-manager exit codes).

Suggested change
/// On Windows, `trigger_restart` exits with this code instead of using
/// `exec()`. The wrapping service (e.g. NSSM or Windows Service) should be
/// When using a service manager–based restart flow (instead of spawning the
/// new binary directly), the process should exit with this code and rely on
/// the wrapping service (e.g. systemd, NSSM, or Windows Service) being

Copilot AI review requested due to automatic review settings March 14, 2026 21:10
@jacderida jacderida force-pushed the feat-extend_automatic_upgrades branch from 96b1411 to 06909cb Compare March 14, 2026 21:10
Copilot AI left a comment

Pull request overview

Copilot reviewed 21 out of 22 changed files in this pull request and generated 3 comments.



Comment on lines +789 to +797
fn jittered_interval(base: std::time::Duration) -> std::time::Duration {
let secs = base.as_secs();
let variance = secs / 20; // 5%
if variance == 0 {
return base;
}
let jitter = rand::thread_rng().gen_range(0..=variance * 2);
std::time::Duration::from_secs(secs.saturating_sub(variance) + jitter)
}
@@ -504,54 +527,102 @@ impl RunningNode {
self.start_protocol_routing();

// Start upgrade monitor if enabled
Copilot AI review requested due to automatic review settings March 17, 2026 16:32
@jacderida jacderida force-pushed the feat-extend_automatic_upgrades branch from 37cc2a1 to 6e0ebae Compare March 17, 2026 16:32
Copilot AI left a comment

Pull request overview

Copilot reviewed 21 out of 24 changed files in this pull request and generated 5 comments.



jacderida and others added 10 commits March 18, 2026 22:22
Add shared disk caches so multiple saorsa-node instances on the same
machine avoid redundant GitHub API calls and binary downloads:

- Release cache (release_cache.rs): TTL-based JSON cache with fs2 file
  locking. Only one node per host fetches from the API; others reuse
  the cached response. TTL set to 1 hour to match the default check
  interval.
- Binary cache (binary_cache.rs): SHA-256 verified cache of extracted
  binaries with fs2 download locking. Only one node downloads and
  extracts; others copy the cached binary.
- Shared cache directory (cache_dir.rs): ~/.local/share/saorsa/upgrades/

Add Windows auto-upgrade support:
- Binary replacement via self-replace crate with retry/backoff
- RESTART_EXIT_CODE (100) for service manager restart signalling
- .exe suffix handling in extract_binary
- auto_upgrade_supported() now returns true on all platforms

Add log messages matching the antnode upgrade test verification pipeline:
- "Node is running on port: {port}"
- "Next upgrade check scheduled for {time}"
- "Node will stop/restart for upgrade at {time}"
- "Downloading saorsa-node binary..."
- "Cached binary verified for version {version}"
- "Using cached release info" / "No valid cache, fetching from API"
- "Error during upgrade process: {reason}"
- "Already running latest version"

Strip pre-release suffix from MSI version in the release workflow so
that WiX candle receives a valid 4-part numeric version.

Update GitHub repo references from dirvine to saorsa-labs.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The upgrade loop was calling `check_for_updates()` which bypasses the
staged rollout delay calculation. Switch to `check_for_ready_upgrade()`
so that nodes respect their deterministic rollout delay before applying
upgrades, preventing mass simultaneous restarts.

The UpgradeMonitor no longer needs Arc wrapping since only the single
spawned upgrade task uses it; moving ownership in gives the required
`&mut self` access for the stateful rollout tracking.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Strip the " (deleted)" suffix that Linux appends to /proc/self/exe when
another node has already replaced the on-disk binary during an upgrade.
This prevents backup creation and binary replacement from targeting a
nonexistent path.

Add randomized jitter (0-10% of check interval) before the first upgrade
check so nodes started simultaneously don't all race to the GitHub API
on the first cycle.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Auto-upgrades are now unconditionally active rather than opt-in.
The `--auto-upgrade` CLI flag and `SAORSA_AUTO_UPGRADE` env var
are removed, along with the `enabled` field from `UpgradeConfig`.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…ades

Add --stop-on-upgrade flag for service manager mode (systemd, launchd,
Windows Service) where the node exits cleanly and lets the service
manager restart it. Without the flag, standalone mode spawns the new
binary as a child process before exiting.

This fixes leaked inodes from exec(), ensures clean resource cleanup,
and makes systemd aware of the restart lifecycle.

- Add --stop-on-upgrade CLI flag and UpgradeConfig field
- Replace trigger_restart: exit(0) on Unix / exit(100) on Windows for
  service manager mode; spawn+exit for standalone mode
- Update systemd service: add --stop-on-upgrade, change Restart=always
- Add service file to RPM package assets
- Add "Running as a Service" README section (systemd/launchd/Windows)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
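Service-manager mode can be illustrated with a unit file like the following. The unit name, paths, and `RestartSec` value are assumptions for illustration, not taken from the repository's shipped service file:

```ini
# /etc/systemd/system/saorsa-node.service (illustrative)
[Unit]
Description=Saorsa node
After=network-online.target

[Service]
ExecStart=/usr/local/bin/saorsa-node --stop-on-upgrade
# With --stop-on-upgrade the node exits cleanly after staging an upgrade;
# systemd restarts it onto the new binary.
Restart=always
RestartSec=5

[Install]
WantedBy=multi-user.target
```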
- Skip binary replacement when sibling service already upgraded (checks
  on-disk version via --version before backup/replace)
- Distinguish "no upgrade available" from "upgrade pending" in log
  messages using time_until_upgrade()
- Add 100ms sleep before process exit to flush tracing log buffer
- Prefer argv[0] over current_exe() for symlink-aware path resolution

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The extract_binary() function unconditionally used GzDecoder (tar.gz),
but Windows release assets are packaged as .zip files. This caused
"invalid gzip header" errors on Windows during auto-upgrade extraction.

Now detects archive format by magic bytes (gzip=1f8b, zip=504b) and
dispatches to the appropriate extractor, supporting both formats on
all platforms.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Remove sentinel `_failed_` path pattern; propagate errors properly
  from download_verify_extract and map to RolledBack in apply_upgrade
- Make on_disk_version async with 5s timeout to avoid blocking tokio
- Wire up BinaryCache::with_download_lock at the cache-miss call site
  to prevent concurrent downloads from multiple nodes
- Apply jitter to every upgrade check interval, not just the first, to
  prevent fleet re-alignment thundering herd
- Remove stale `enabled = true` from README upgrade config example
- Fix formatting in apply.rs

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Fix unreachable `(deleted)` suffix check in `current_binary_path()` by
  moving it before the `exists()` guard
- Replace `std::process::exit()` in async context with graceful shutdown
  via CancellationToken, ensuring tokio runtime, LMDB, and file handles
  are properly cleaned up before exit
- Move inline `use rand::Rng` to top-level imports and extract duplicated
  jitter calculation into `jittered_interval()` helper
- Offload blocking `replace_binary` to `spawn_blocking` to avoid starving
  the tokio worker thread (especially on Windows with retry sleeps)
- Add warning logs when `chrono::Duration::from_std()` falls back to
  default values

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
macOS App Nap aggressively coalesces timers and reduces CPU priority
for background processes with no visible UI. This caused upgrade check
timers to be deferred by 3-15+ minutes, making the auto-upgrade feature
unreliable on macOS when running as a standalone process.

Call NSProcessInfo.beginActivity(NSActivityUserInitiated) at startup to
tell macOS this process is performing important user-initiated work.
This prevents timer coalescing and CPU throttling for the lifetime of
the process.

Note: IOKit power assertions (PreventUserIdleSystemSleep) do NOT prevent
App Nap — only the NSProcessInfo activity API works for this purpose.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@jacderida jacderida force-pushed the feat-extend_automatic_upgrades branch from 6e0ebae to 4ab01f1 Compare March 19, 2026 01:02
saorsa-core v0.17 changed MultiAddr::port() to return Option<u16>.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings March 19, 2026 01:04
@jacderida jacderida force-pushed the feat-extend_automatic_upgrades branch from 4ab01f1 to a80a71e Compare March 19, 2026 01:04
Copilot AI left a comment

Pull request overview

Copilot reviewed 21 out of 24 changed files in this pull request and generated 3 comments.



- Replace redundant closure with method reference for MultiAddr::port
- Replace deprecated msg_send_id! with msg_send! (objc2 deprecation)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings March 19, 2026 16:51
@jacderida jacderida force-pushed the feat-extend_automatic_upgrades branch from 3127a30 to f6a24c8 Compare March 19, 2026 16:51
Copilot AI left a comment

Pull request overview

Copilot reviewed 21 out of 24 changed files in this pull request and generated 6 comments.



Comment on lines +129 to +131
let now = SystemTime::now().duration_since(UNIX_EPOCH).ok()?.as_secs();
let age_secs = now.saturating_sub(cached.fetched_at_epoch_secs);
if age_secs >= self.ttl.as_secs() {
Comment on lines +311 to +315
if let Ok(cache_dir) = upgrade_cache_dir() {
monitor = monitor.with_release_cache(ReleaseCache::new(
cache_dir,
std::time::Duration::from_secs(3600),
));
Comment on lines +108 to +113
let path_str = invoked.to_string_lossy();
let cleaned = if path_str.ends_with(" (deleted)") {
let stripped = path_str.trim_end_matches(" (deleted)");
debug!("Stripped '(deleted)' suffix from invoked path: {stripped}");
PathBuf::from(stripped)
} else {
Comment on lines +99 to +100
// Prefer the invoked path (argv[0]) if it exists, as it preserves symlinks.
// Fall back to current_exe() which resolves symlinks via /proc/self/exe.
@jacderida jacderida merged commit 1d3f7f3 into WithAutonomi:main Mar 19, 2026
14 checks passed