
fix: make :MISSION:SAVE: async to stop blocking ArmA main thread #159

Merged: fank merged 4 commits into main from fix/async-mission-save on Apr 12, 2026


@fank commented Apr 11, 2026

Summary

  • :MISSION:SAVE: now returns queued immediately and runs the export and upload in a goroutine.
  • A single flag prevents concurrent saves.
  • Completion is reported via a new :MISSION:SAVED: extension callback: [ok|partial|error, path, errorDetail].
  • Panics in EndMission / encoding / upload are recovered; the ArmA host stays alive.
  • Matching addon changes land in a parallel PR on OCAP2/addon (fix/async-mission-saved-callback).
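
The pattern described in these bullets can be sketched roughly as follows. This is an illustration, not the code in this PR: the `saver` type, its `notify` and `export` fields, and the "busy" reply are all hypothetical, and the real completion report goes through the extension callback rather than a plain function.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// saver is a hypothetical stand-in for the save worker.
type saver struct {
	inFlight atomic.Bool    // single flag guarding against concurrent saves
	wg       sync.WaitGroup // lets callers (or tests) wait for the worker

	// notify stands in for the :MISSION:SAVED: callback
	// with its [status, path, errorDetail] payload.
	notify func(status, path, detail string)
	// export stands in for EndMission + encoding + upload.
	export func() (path string, err error)
}

// Save never blocks the caller. It replies "queued" when a worker was
// started, or "busy" (an assumed reply) when a save is already running.
func (s *saver) Save() string {
	if !s.inFlight.CompareAndSwap(false, true) {
		return "busy"
	}
	s.wg.Add(1)
	go func() {
		// Deferred calls run last-in, first-out: recover, clear flag, Done.
		defer s.wg.Done()
		defer s.inFlight.Store(false)
		defer func() {
			// A panic in the export must not take down the host process.
			if r := recover(); r != nil {
				s.notify("error", "", fmt.Sprintf("panic: %v", r))
			}
		}()

		path, err := s.export()
		if err != nil {
			s.notify("error", path, err.Error())
			return
		}
		s.notify("ok", path, "")
	}()
	return "queued"
}
```

Using `CompareAndSwap` makes the check and the claim a single atomic step, so two near-simultaneous save commands cannot both start a worker.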

Why

Investigation of a user crash showed that :MISSION:SAVE: blocks the ArmA main thread for 13 to 30+ seconds while EndMission builds and serializes the v1 export. On larger missions this is long enough for the OS OOM killer or a watchdog to terminate the server, leaving a 0-byte recording. Making the handler async is the minimum fix that keeps save duration, however long, from being a crash trigger.

Investigation details: a successful 2.5-hour, 22-player run produced an 834 MB uncompressed JSON with 11.45 million position entries; the save held the dispatcher for 13s and the upload held it for another 19s. A 30-player, 2-hour Zeus session with 10 Hz capture scales this up and has been crashing the host with a 0-byte output file, consistent with the process being killed between os.Create and the first gzip flush.

Test plan

  • go test ./... - all green
  • Manual: start a short mission on a dev server, call save, observe ArmA remains responsive and :MISSION:SAVED: fires in the logs
  • Manual: point the addon at a dead api.serverUrl, confirm save still completes locally and a partial callback is emitted
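
For the dead-server case, the status field could be derived from the two stage results along these lines. The mapping is inferred from the test plan and the [ok|partial|error, path, errorDetail] payload, not copied from the PR; `saveStatus` and `errUnreachable` are hypothetical names.

```go
package main

import "errors"

// errUnreachable is a sample error for the "dead api.serverUrl" case.
var errUnreachable = errors.New("upload: server unreachable")

// saveStatus maps the export and upload results onto the callback's
// status and errorDetail fields.
func saveStatus(exportErr, uploadErr error) (status, detail string) {
	switch {
	case exportErr != nil:
		// Nothing usable was written locally.
		return "error", exportErr.Error()
	case uploadErr != nil:
		// The local file is complete, only the upload failed.
		return "partial", uploadErr.Error()
	default:
		return "ok", ""
	}
}
```

With this mapping, `saveStatus(nil, errUnreachable)` yields "partial" together with the upload error text, which is what the second manual check looks for.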


@gemini-code-assist (bot) left a comment


Code Review

This pull request refactors the mission saving process to be asynchronous, moving the logic from the main dispatcher into a dedicated worker in mission_save.go. This change prevents the application from blocking during disk I/O or network uploads and introduces a state-managed worker that prevents concurrent save operations. Additionally, a testing hook was added to the a3interface package to allow for intercepting and verifying extension callbacks in unit tests. One piece of feedback was provided to ensure idiomatic context management by using defer cancel().

@github-actions

Merging this branch will not change overall coverage

| Impacted Packages | Coverage Δ | 🤖 |
|---|---|---|
| github.com/OCAP2/extension/v5/cmd/ocap_recorder | 0.00% (ø) | |
| github.com/OCAP2/extension/v5/pkg/a3interface | 0.00% (ø) | |

Coverage by file

Changed files (no unit tests)

| Changed File | Coverage Δ | Total | Covered | Missed | 🤖 |
|---|---|---|---|---|---|
| github.com/OCAP2/extension/v5/cmd/ocap_recorder/main.go | 0.00% (ø) | 0 | 0 | 0 | |
| github.com/OCAP2/extension/v5/cmd/ocap_recorder/mission_save.go | 0.00% (ø) | 0 | 0 | 0 | |
| github.com/OCAP2/extension/v5/pkg/a3interface/extensioncallback.go | 0.00% (ø) | 0 | 0 | 0 | |

Please note that the "Total", "Covered", and "Missed" counts above refer to code statements instead of lines of code. The value in brackets refers to the test coverage of that file in the old version of the code.

Changed unit test files

  • github.com/OCAP2/extension/v5/cmd/ocap_recorder/mission_save_test.go
  • github.com/OCAP2/extension/v5/pkg/a3interface/extensioncallback_test.go

@fank merged commit d09a1d5 into main on Apr 12, 2026
3 checks passed
@fank deleted the fix/async-mission-save branch on April 12, 2026 10:24