
refactor(e2e): replace bash e2e tests with Rust integration tests that invoke the CLI #116

@drew

Problem

We currently have 3 bash e2e test scripts in e2e/bash/ that test CLI functionality by shelling out to the nemoclaw binary:

| Script | What it tests |
| --- | --- |
| test_sandbox_custom_image.sh | Custom Dockerfile build + sandbox creation via --from |
| test_sandbox_sync.sh | Bidirectional file sync (directories, single files, large files with checksum verification) |
| test_port_forward.sh | TCP port forwarding through a sandbox via sandbox forward start |

These bash tests are brittle, hard to maintain, and inconsistent with the rest of the test suite, which is written in Rust and Python. They rely on hand-rolled helpers (strip_ansi(), poll loops, trap-based cleanup) whose jobs are better handled by Rust's type system, assert! macros, and RAII cleanup.
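
As a rough illustration of how two of those bash helpers might map onto Rust, the sketch below shows a std-only ANSI stripper and a Drop-based guard standing in for trap-based cleanup. The name strip_ansi is borrowed from the bash helper; CleanupGuard is a hypothetical name. The guard takes an arbitrary closure because the exact teardown command is not specified in this issue, and the stripper only handles CSI sequences such as color codes.

```rust
/// Remove ANSI CSI escape sequences (for example color codes) from CLI output.
/// Assumption: only `ESC [ ... <final byte>` sequences need to be handled.
pub fn strip_ansi(input: &str) -> String {
    let mut out = String::with_capacity(input.len());
    let mut chars = input.chars().peekable();
    while let Some(c) = chars.next() {
        if c == '\u{1b}' && chars.peek() == Some(&'[') {
            chars.next(); // consume '['
            // Skip parameter bytes until the final byte, which lies in '@'..='~'.
            for n in chars.by_ref() {
                if ('@'..='~').contains(&n) {
                    break;
                }
            }
        } else {
            out.push(c);
        }
    }
    out
}

/// Hypothetical RAII guard: runs the cleanup closure when dropped,
/// including when the test unwinds from a failed assertion.
pub struct CleanupGuard<F: FnMut()> {
    cleanup: F,
}

impl<F: FnMut()> CleanupGuard<F> {
    pub fn new(cleanup: F) -> Self {
        Self { cleanup }
    }
}

impl<F: FnMut()> Drop for CleanupGuard<F> {
    fn drop(&mut self) {
        (self.cleanup)();
    }
}
```

A test would create the guard right after the sandbox is created, so teardown runs on every exit path without a trap handler.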

Proposed Solution

Replace all 3 bash e2e test scripts with Rust integration tests that invoke the nemoclaw CLI binary as a subprocess (using std::process::Command or assert_cmd). The new tests should live in crates/navigator-cli/tests/ alongside the existing Rust integration tests (provider_commands_integration.rs, mtls_integration.rs).
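
A minimal sketch of the subprocess approach using only std::process::Command is shown below. The helper names nemoclaw_bin, run_cli, and CliResult are hypothetical. The sketch uses option_env! rather than env! so it still compiles outside an integration-test build, where Cargo does not set CARGO_BIN_EXE_nemoclaw; falling back to a nemoclaw on PATH is an assumption.

```rust
use std::ffi::OsStr;
use std::io;
use std::path::{Path, PathBuf};
use std::process::Command;

/// Resolve the CLI binary. Cargo sets CARGO_BIN_EXE_<name> when compiling
/// integration tests for the package that defines the binary; otherwise this
/// falls back to looking up `nemoclaw` on PATH (an assumption of this sketch).
pub fn nemoclaw_bin() -> PathBuf {
    option_env!("CARGO_BIN_EXE_nemoclaw")
        .map(PathBuf::from)
        .unwrap_or_else(|| PathBuf::from("nemoclaw"))
}

/// Captured result of one CLI invocation (hypothetical helper type).
pub struct CliResult {
    pub success: bool,
    pub stdout: String,
    pub stderr: String,
}

/// Run the binary with the given arguments and capture its output.
/// Returns Err only if the process could not be spawned at all.
pub fn run_cli<I, S>(bin: &Path, args: I) -> io::Result<CliResult>
where
    I: IntoIterator<Item = S>,
    S: AsRef<OsStr>,
{
    let output = Command::new(bin).args(args).output()?;
    Ok(CliResult {
        success: output.status.success(),
        stdout: String::from_utf8_lossy(&output.stdout).into_owned(),
        stderr: String::from_utf8_lossy(&output.stderr).into_owned(),
    })
}
```

If assert_cmd is adopted instead, it would replace most of this helper; the std-only version is shown just to make the shape of the tests concrete.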

Key design decisions

  • Invoke the actual binary (cargo build artifact or env!("CARGO_BIN_EXE_nemoclaw")) rather than calling library functions directly — these are true e2e tests that should exercise the full CLI entrypoint.
  • Use assert_cmd (or std::process::Command + helpers) for ergonomic subprocess assertions.
  • Use tempfile (already a dev-dependency) for temp directories and cleanup via RAII/Drop.
  • Port all test scenarios faithfully — each bash test has specific edge cases that must be preserved:
    • Custom image: Dockerfile build, marker file verification in sandbox output
    • Sync: nested directories, single-file mode, large file (~512 KiB) with SHA-256 checksum + size verification, multi-chunk ordering
    • Port forward: background process management, TCP echo server, retry logic for tunnel readiness
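
The port-forward scenario above could be approached along these lines. This is a std-only sketch under stated assumptions: spawn_echo_server, retry, and echo_roundtrip are hypothetical helper names, the echo server runs in-process rather than inside a sandbox, and the attempt count and delay are placeholders. In the real test the address probed would be the local end of the forwarded tunnel.

```rust
use std::io::{self, Read, Write};
use std::net::{SocketAddr, TcpListener, TcpStream};
use std::thread;
use std::time::Duration;

/// Start a TCP echo server on an ephemeral loopback port and return its address.
/// Connections are served one at a time on a background thread.
pub fn spawn_echo_server() -> io::Result<SocketAddr> {
    let listener = TcpListener::bind("127.0.0.1:0")?;
    let addr = listener.local_addr()?;
    thread::spawn(move || {
        for mut stream in listener.incoming().flatten() {
            let mut buf = [0u8; 1024];
            while let Ok(n) = stream.read(&mut buf) {
                if n == 0 || stream.write_all(&buf[..n]).is_err() {
                    break;
                }
            }
        }
    });
    Ok(addr)
}

/// Call `op` until it succeeds, at most `attempts` times (and at least once),
/// sleeping `delay` between tries. Returns the last result.
pub fn retry<T, E>(
    attempts: u32,
    delay: Duration,
    mut op: impl FnMut() -> Result<T, E>,
) -> Result<T, E> {
    let mut last = op();
    for _ in 1..attempts {
        if last.is_ok() {
            break;
        }
        thread::sleep(delay);
        last = op();
    }
    last
}

/// Send `payload` and read back exactly the same number of bytes.
pub fn echo_roundtrip(addr: SocketAddr, payload: &[u8]) -> io::Result<Vec<u8>> {
    let mut stream = TcpStream::connect_timeout(&addr, Duration::from_secs(1))?;
    stream.set_read_timeout(Some(Duration::from_secs(1)))?;
    stream.write_all(payload)?;
    let mut buf = vec![0u8; payload.len()];
    stream.read_exact(&mut buf)?;
    Ok(buf)
}
```

Wrapping the whole round trip in retry, rather than only the connect, covers the case where the tunnel accepts connections before the far side is ready to echo.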

Tasks

  • Add assert_cmd as a dev-dependency for navigator-cli
  • Create crates/navigator-cli/tests/e2e_custom_image.rs — port test_sandbox_custom_image.sh
  • Create crates/navigator-cli/tests/e2e_sync.rs — port test_sandbox_sync.sh (all 5 steps)
  • Create crates/navigator-cli/tests/e2e_port_forward.rs — port test_port_forward.sh
  • Extract shared helpers (binary resolution, ANSI stripping, sandbox cleanup) into a shared test utility module (e.g., crates/navigator-cli/tests/common/mod.rs)
  • Update tasks/test.toml: replace the bash task definitions (test:e2e:custom-image, test:e2e:sync, test:e2e:port-forward) with tasks that run the new Rust tests (e.g., via cargo test --test 'e2e_*', quoting the glob so the shell does not expand it)
  • Delete e2e/bash/ directory and all 3 bash scripts
  • Verify all new tests pass against a running cluster (mise run e2e)
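
For the e2e_sync.rs port listed above, the large-file check might be structured as in the sketch below. It is std-only, so ScratchDir stands in for tempfile::TempDir and a size plus byte-for-byte comparison stands in for the SHA-256 checksum that the bash script uses; a real port would likely pull in a hashing crate instead. All helper names here are hypothetical.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Scratch directory that is removed on drop (std-only stand-in for tempfile::TempDir).
pub struct ScratchDir(PathBuf);

impl ScratchDir {
    pub fn new(label: &str) -> io::Result<Self> {
        let path = std::env::temp_dir().join(format!("{}-{}", label, std::process::id()));
        fs::create_dir_all(&path)?;
        Ok(Self(path))
    }

    pub fn path(&self) -> &Path {
        &self.0
    }
}

impl Drop for ScratchDir {
    fn drop(&mut self) {
        // Best-effort cleanup; errors are ignored so drop never panics.
        let _ = fs::remove_dir_all(&self.0);
    }
}

/// Deterministic payload whose bytes vary with position, so that dropped or
/// reordered chunks are likely to change the content.
pub fn patterned_bytes(len: usize) -> Vec<u8> {
    (0..len).map(|i| (i.wrapping_mul(31) ^ (i >> 8)) as u8).collect()
}

/// Compare two files by size first, then by content.
pub fn files_match(a: &Path, b: &Path) -> io::Result<bool> {
    if fs::metadata(a)?.len() != fs::metadata(b)?.len() {
        return Ok(false);
    }
    Ok(fs::read(a)? == fs::read(b)?)
}
```

In the actual test, the source file would be written locally, synced into and back out of the sandbox via the CLI, and then compared against the original.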

Acceptance Criteria

  • All 3 bash e2e test scripts are deleted
  • Equivalent Rust integration tests exist that invoke the nemoclaw binary as a subprocess
  • All test scenarios from the bash scripts are faithfully ported (no coverage regression)
  • Shared test helpers are extracted to avoid duplication
  • tasks/test.toml is updated so mise run tasks point to the new Rust tests
  • All new tests pass in CI
