fix: make :MISSION:SAVE: async to stop blocking ArmA main thread #159
Code Review
This pull request refactors the mission saving process to be asynchronous, moving the logic from the main dispatcher into a dedicated worker in `mission_save.go`. This change prevents the application from blocking during disk I/O or network uploads and introduces a state-managed worker that prevents concurrent save operations. Additionally, a testing hook was added to the `a3interface` package to allow for intercepting and verifying extension callbacks in unit tests. One piece of feedback was provided to ensure idiomatic context management by using `defer cancel()`.
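For reference, the `defer cancel()` idiom mentioned in the review can be sketched as follows. This is a minimal illustration, not the code from this PR: `upload` and `uploadWithTimeout` are hypothetical names, and the durations are arbitrary.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// upload is a hypothetical stand-in for the network upload step.
// It returns early if the context is cancelled or its deadline passes.
func upload(ctx context.Context, work time.Duration) error {
	select {
	case <-time.After(work):
		return nil
	case <-ctx.Done():
		return ctx.Err()
	}
}

// uploadWithTimeout bounds the upload with a deadline. Deferring cancel
// right after the context is created releases its timer on every return
// path, including early returns and panics.
func uploadWithTimeout(work, limit time.Duration) error {
	ctx, cancel := context.WithTimeout(context.Background(), limit)
	defer cancel()
	return upload(ctx, work)
}

func main() {
	// Work finishes well inside the limit.
	fmt.Println(uploadWithTimeout(time.Millisecond, time.Second))

	// Work outlasts the limit, so the context deadline fires first.
	err := uploadWithTimeout(time.Second, 10*time.Millisecond)
	fmt.Println(errors.Is(err, context.DeadlineExceeded))
}
```

Placing `defer cancel()` directly under the `WithTimeout` call keeps the cleanup next to the allocation, so later edits that add return paths cannot skip it.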
Merging this branch will not change overall coverage
Summary
- `:MISSION:SAVE:` now returns immediately with `queued` and runs the export + upload in a goroutine.
- `:MISSION:SAVED:` extension callback: `[ok|partial|error, path, errorDetail]`.
- Panics in `EndMission` / encoding / upload are recovered; the ArmA host stays alive.
- `OCAP2/addon` (`fix/async-mission-saved-callback`).

Why
Investigation of a user crash showed `:MISSION:SAVE:` blocks the ArmA main thread for 13-30+ s while `EndMission` builds and serializes the v1 export. On larger missions this is long enough for the OS OOM killer or a watchdog to terminate the server, leaving a 0-byte recording. Making the handler async is the minimum fix that prevents any save duration from being a crash trigger.

Investigation details: a 2.5 hr / 22-player successful run produced an 834 MB uncompressed JSON with 11.45 million position entries; save held the dispatcher for 13 s and upload held it for another 19 s. A 30-player / 2 hr Zeus session with 10 Hz capture scales this up and has been crashing the host with a 0-byte output file, consistent with the process being killed between `os.Create` and the first gzip flush.

Test plan
- `go test ./...` - all green
- `:MISSION:SAVED:` fires in the logs
- `api.serverUrl`, confirm save still completes locally and a `partial` callback is emitted
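
The behaviour described in the Summary (immediate `queued` reply, background export + upload, panic recovery, and a final status callback) can be sketched roughly as below. This is an illustration under stated assumptions, not the contents of `mission_save.go`: the `Saver` type, its fields, the `busy` reply for a concurrent request, and the `mission.json.gz` path are all hypothetical.

```go
package main

import (
	"errors"
	"fmt"
	"sync/atomic"
)

// SaveResult mirrors the callback payload from the summary:
// [ok|partial|error, path, errorDetail].
type SaveResult struct {
	Status, Path, Detail string
}

// errUploadDown is a hypothetical upload failure used in the demo.
var errUploadDown = errors.New("upload failed")

// Saver is a hypothetical worker; every name here is an assumption.
type Saver struct {
	busy   atomic.Bool
	export func() (string, error)  // writes the recording, returns its path
	upload func(path string) error // sends the recording to the server
	notify func(SaveResult)        // stands in for the :MISSION:SAVED: callback
}

// Handle returns immediately; the slow work runs in a goroutine.
func (s *Saver) Handle() string {
	if !s.busy.CompareAndSwap(false, true) {
		return "busy" // assumed reply when a save is already in flight
	}
	go s.run()
	return "queued"
}

// run performs export then upload and always reports a result,
// even if one of the steps panics.
func (s *Saver) run() {
	res := SaveResult{Status: "ok"}
	defer func() {
		if r := recover(); r != nil {
			res = SaveResult{Status: "error", Path: res.Path, Detail: fmt.Sprint(r)}
		}
		s.busy.Store(false)
		s.notify(res)
	}()

	path, err := s.export()
	if err != nil {
		res = SaveResult{Status: "error", Detail: err.Error()}
		return
	}
	res.Path = path
	if err := s.upload(path); err != nil {
		// The local file exists but the upload failed: report partial.
		res.Status, res.Detail = "partial", err.Error()
	}
}

func main() {
	done := make(chan SaveResult, 1)
	s := &Saver{
		export: func() (string, error) { return "mission.json.gz", nil },
		upload: func(string) error { return errUploadDown },
		notify: func(r SaveResult) { done <- r },
	}
	fmt.Println(s.Handle()) // prints "queued"
	fmt.Println((<-done).Status)
}
```

Doing the recovery and the notification in a single deferred function means the caller gets exactly one callback per accepted save, whichever step fails.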