Fix concurrent map access race in network-ports module and bump version to 2.0.5#130
Merged
Conversation
- actions/checkout v4 -> v6 - actions/setup-go v4 -> v5 - go-version ^1.20 -> ^1.25 - Replace abandoned marvinpinto/action-automatic-releases with softprops/action-gh-release v2 Node.js 20 actions are deprecated and will be forced to Node 24 starting June 2, 2026, and removed September 16, 2026.
The NetworkPortsModule cached EC2 NACLs and Security Groups in two per-region maps. The cache lookup read the maps without holding a lock while the cache population path took a write lock, producing a "concurrent map read and map write" runtime panic when multiple region goroutines hit the cache check concurrently. The panic reproduced reliably in low-density AWS accounts where every region goroutine raced through the cache check at the same instant. Take the read lock on the cache fast-path in both getEC2NACLsPerRegion and getEC2SecurityGroupsPerRegion. Move naclsMutex and securityGroupsMutex from package-level vars onto the module struct so concurrent module instances stop serializing on each other's locks. Convert CommandCounter from plain int fields with naked ++/-- to int64 fields with Incr/Decr/Load methods that use atomic.AddInt64 and atomic.LoadInt64 internally. Worker goroutines and the spinner goroutine were updating and reading the same counters without synchronization, producing torn integer writes that could show as off-by-one progress in the spinner. Plain int64 storage keeps the struct copyable so existing constructors and helpers that take modules by value continue to vet-clean. Add TestNetworkPortsCacheConcurrentAccess as a regression test. With the read lock removed, "go test -race" reports the mapaccess2_faststr read racing the locked write; with the fix in place the test passes.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Fixes the concurrent map read/write panic in
NetworkPortsModulereported in #129, hardens worker-goroutine counter access fleet-wide, updates GitHub Actions for Node 24 compatibility, and bumps version to 2.0.5.Bug 1: Concurrent map read/write panic in network-ports (Fixes #129)
NetworkPortsModulecached EC2 NACLs and Security Groups in two per-region maps. The cache lookup read the maps without holding a lock while the cache population path took a write lock, producing aconcurrent map read and map writeruntime panic when multiple region goroutines hit the cache check concurrently. Reproduced reliably in low-density AWS accounts where every region goroutine raced through the cache check at the same instant.Fix: take the read lock on the cache fast-path in both
getEC2NACLsPerRegionandgetEC2SecurityGroupsPerRegion.naclsMutexandsecurityGroupsMutexmoved from package-level vars to struct fields so concurrent module instances stop serializing on each other's locks.Bug 2: Torn integer writes in CommandCounter
CommandCounterused plainintfields with naked++/--from worker goroutines, whileSpinUntilread them concurrently. Plain int increments are not atomic in Go: lost updates and torn reads were possible across both AWS and GCP modules, producing inconsistent progress display and drifted Error counts. Not a panic, but a silent correctness bug.Fix: convert
CommandCounterfields toint64withIncr*/Decr*/Load*methods backed byatomic.AddInt64andatomic.LoadInt64. Plainint64storage keeps the struct copyable so existing constructors and helpers that take modules by value continue to vet-clean (nonoCopyfanout). 1087 call sites updated acrossaws/,gcp/commands/, andinternal/gcp/.Bug 3: Stale GitHub Actions before Node 24 enforcement
actions/checkoutv4,actions/setup-gov4, andmarvinpinto/action-automatic-releasesall run on Node 20, which becomes unsupported on June 2, 2026.Fix: upgrade
actions/checkoutto v6,actions/setup-goto v5,go-versionto ^1.25 (matches project), and replace the abandonedmarvinpinto/action-automatic-releaseswithsoftprops/action-gh-release@v2.Testing
go build ./...cleango vet ./...warning count unchanged frommain(zero new warnings)go test -race -run TestNetworkPorts ./aws/passesTestNetworkPortsCacheConcurrentAccess): with the read lock removed,go test -racereports themapaccess2_faststrread racing the locked write; with the fix in place the test passes