Windows: investigate restart_service crash blocking component-load test paths #549

@heskew

Context

While trying to expand Windows integration test coverage (#525, #546, #547), the actual underlying blocker came into focus. Multiple integration test files have per-test skips for restart_service operations with the same explanation:

'Windows: Skipping restart_service to avoid crash'

Found in:

  • integrationTests/apiTests/tests/23_blob.mjs (2 occurrences)
  • Likely elsewhere — grep -rE "Windows.*Skipping restart_service|restart_service.*crash" for the full set
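For auditing from Node rather than the shell (for example on a Windows box without grep), the same pattern can be applied to a file's contents. This is a minimal sketch: `findRestartSkips` is a hypothetical helper name, not something in the repo, and it takes the file text as a string so the caller decides how files are read.

```javascript
// Same expression as the grep -rE pattern quoted in this issue.
const RESTART_SKIP_PATTERN = /Windows.*Skipping restart_service|restart_service.*crash/;

// Hypothetical helper: returns the 1-based line number and trimmed text
// of every line in `source` that matches the skip pattern.
function findRestartSkips(source) {
  return source
    .split(/\r?\n/)
    .map((text, index) => ({ line: index + 1, text: text.trim() }))
    .filter(({ text }) => RESTART_SKIP_PATTERN.test(text));
}
```

Feeding it the contents of each file under integrationTests/ would give the full set of per-test skips to remove once the crash is fixed.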

These skips look surgical and isolated but they're load-bearing in suite flow: any test that needs Harper to pick up a newly-written component file depends on restart_service to actually load it. With the operation skipped, downstream tests fail with "schema not found" / "component not loaded" cascades.

This was the latent root cause behind #546/#547. The outer suite-level skip on 23_blob.mjs ({ skip: process.platform === 'win32' }) is effectively a workaround for this — undocumented, but real.
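One way to make these skips easier to track and later remove is to route them through a single helper. The sketch below is an assumption about how that could look, not what the test files do today: `restartSkipReason` is a hypothetical name, and it assumes the suites use a runner whose `skip` option accepts either `false` or a reason string (as node:test does).

```javascript
// Skip message copied verbatim from the existing per-test skips.
const WINDOWS_RESTART_SKIP = 'Windows: Skipping restart_service to avoid crash';

// Hypothetical helper: returns the skip reason on Windows, false elsewhere.
// `platform` is injectable so the helper can be exercised on any OS.
function restartSkipReason(platform = process.platform) {
  return platform === 'win32' ? WINDOWS_RESTART_SKIP : false;
}

// Assumed usage with a node:test-style options object:
//   it('restarts the service', { skip: restartSkipReason() }, async () => { ... });
```

With one helper, un-skipping becomes a single change, and the reason string stays greppable in the meantime.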

See #547 (closed) for the cascade trace: the file-write portion of the blob suite passes on Windows, then "Confirm Blob schema and table created" fails because the schema isn't there, because the component was never loaded, because restart_service was skipped.

Goal

Make restart_service work on Windows, OR replace the operation in the test paths with a Windows-compatible alternative that achieves the same effect (Harper picks up a newly-written component).

Approach (suggested)

  1. Reproduce the crash locally on Windows. Run a minimal test that calls restart_service (e.g., one of the currently-skipped tests in 23_blob.mjs). Capture the actual crash signature — stack trace, exit code, hdb.log output.
  2. Identify the failure mode. Likely candidates based on Kris's prior Windows fixes:
    • File-locking / EPERM during the restart sequence (7823d97a "workaround database drop crashes on windows" suggests this pattern is present)
    • Worker-process spawn ordering specific to Windows process semantics
    • Socket / pipe handling for IPC during the restart handshake
    • Database file handles not released cleanly across the restart
  3. Fix or work around. Either fix the crash in Harper itself, or introduce a Windows-specific code path (e.g., separate-process restart, retry-with-backoff on EPERM, explicit file-handle release before restart).
  4. Un-skip the affected tests. Once restart_service works on Windows, remove the 'Windows: Skipping restart_service to avoid crash' skips. The outer suite-level skip on 23_blob.mjs then becomes removable as a follow-up; that work is already framed in #546 (Windows integration tests: investigate and reduce skips in 23_blob.mjs) and is only gated on this issue.
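If step 2 does point at file locking, the retry-with-backoff option from step 3 could be sketched roughly as below. This is illustrative only: `retryOnFileLock` is a hypothetical helper, the retry count and delays are placeholders, and treating EBUSY alongside EPERM as transient is an assumption that should be checked against the actual crash signature.

```javascript
// Hypothetical helper: runs `operation` and retries it with exponential
// backoff when it fails with an error code that suggests a file is still
// locked. Any other error, or running out of retries, rethrows.
async function retryOnFileLock(operation, { retries = 5, baseDelayMs = 50 } = {}) {
  // Assumption: EPERM (named in this issue) and EBUSY are the transient codes.
  const transientCodes = new Set(['EPERM', 'EBUSY']);
  for (let attempt = 0; ; attempt++) {
    try {
      return await operation();
    } catch (error) {
      if (!transientCodes.has(error.code) || attempt >= retries) throw error;
      // Wait 50ms, 100ms, 200ms, ... (with the defaults) before retrying.
      const delayMs = baseDelayMs * 2 ** attempt;
      await new Promise((resolve) => setTimeout(resolve, delayMs));
    }
  }
}
```

A wrapper like this only masks the problem if handles are never released, so it is a workaround to reach for after the failure mode is confirmed, not a substitute for step 2.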

Success criteria

  • A Windows integration test can successfully call restart_service and observe the newly-loaded component active after the call.
  • The 'Windows: Skipping restart_service to avoid crash' skips can be removed from 23_blob.mjs (and any other files with the same pattern).
  • run-integration-apiTests-windows CI job continues to pass.
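The first criterion implies the test has to wait for the restart to finish before asserting. A rough sketch of that wait, under stated assumptions: `waitUntil` is a hypothetical helper, the timeout and interval are placeholders, and `check` stands in for whatever call the test already uses to confirm the component is loaded (for instance, the "Confirm Blob schema and table created" check).

```javascript
// Hypothetical helper: polls `check` until it resolves truthy or the
// timeout elapses. Resolves to true on success and false on timeout.
async function waitUntil(check, { timeoutMs = 30000, intervalMs = 250 } = {}) {
  const deadline = Date.now() + timeoutMs;
  while (true) {
    if (await check()) return true;
    if (Date.now() >= deadline) return false;
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
}

// Assumed usage inside a test, after calling restart_service:
//   const loaded = await waitUntil(() => componentIsActive());
//   assert.ok(loaded, 'component not loaded after restart_service');
// where componentIsActive is a placeholder for the suite's own check.
```

Polling with a bounded timeout keeps a slow Windows restart from being misread as a failed component load, while still failing the test if the component never appears.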

Out of scope

References

(🤖)
