Fix concurrent map access race in network-ports module and bump version to 2.0.5#130

Merged: jbarciabf merged 3 commits into main from fix/network-ports-map-race on May 26, 2026

Conversation

@jbarciabf
Collaborator

Summary

Fixes the concurrent map read/write panic in NetworkPortsModule reported in #129, hardens worker-goroutine counter access fleet-wide, updates GitHub Actions for Node 24 compatibility, and bumps version to 2.0.5.

Bug 1: Concurrent map read/write panic in network-ports (Fixes #129)

NetworkPortsModule cached EC2 NACLs and Security Groups in two per-region maps. The cache lookup read the maps without holding a lock while the cache population path took a write lock, so one region goroutine could read a map while another was writing to it. The Go runtime detects this and aborts with "concurrent map read and map write". Reproduced reliably in low-density AWS accounts, where every region goroutine raced through the cache check at the same instant.

Fix: take the read lock on the cache fast-path in both getEC2NACLsPerRegion and getEC2SecurityGroupsPerRegion. naclsMutex and securityGroupsMutex moved from package-level vars to struct fields so concurrent module instances stop serializing on each other's locks.
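
A minimal sketch of the locking pattern described above, not the module's actual code. The `regionCache` type, its `get` method, and the `fetch` callback are hypothetical names used for illustration; only the use of a `sync.RWMutex` held as a struct field, with a read lock on the fast path and a write lock on population, comes from the description.

```go
package main

import (
	"fmt"
	"sync"
)

// regionCache is a hypothetical stand-in for one of the per-region caches.
// The mutex is a struct field, so each instance locks independently.
type regionCache struct {
	mu    sync.RWMutex
	items map[string][]string
}

func newRegionCache() *regionCache {
	return &regionCache{items: make(map[string][]string)}
}

// get returns the cached entry for region, calling fetch on a miss.
func (c *regionCache) get(region string, fetch func(string) []string) []string {
	// Fast path: the map read is protected by the read lock.
	c.mu.RLock()
	v, ok := c.items[region]
	c.mu.RUnlock()
	if ok {
		return v
	}

	// Slow path: fetch outside the lock, then publish under the write lock.
	v = fetch(region)
	c.mu.Lock()
	defer c.mu.Unlock()
	if existing, ok := c.items[region]; ok {
		return existing // another goroutine populated the entry first
	}
	c.items[region] = v
	return v
}

func main() {
	c := newRegionCache()
	fetch := func(region string) []string { return []string{"acl-" + region} }

	var wg sync.WaitGroup
	for i := 0; i < 8; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			c.get("us-east-1", fetch)
		}()
	}
	wg.Wait()
	fmt.Println(c.get("us-east-1", fetch)) // prints "[acl-us-east-1]"
}
```

In this sketch several goroutines may call `fetch` for the same region before the first result is published; the re-check under the write lock only ensures that a single value ends up in the map.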

Bug 2: Torn integer writes in CommandCounter

CommandCounter used plain int fields with naked ++/-- from worker goroutines, while SpinUntil read them concurrently. Plain int increments are not atomic in Go: ++ is a read-modify-write sequence, so concurrent updates can be lost, and the unsynchronized reads in SpinUntil are a data race. This affected both AWS and GCP modules, producing inconsistent progress display and drifted Error counts. Not a panic, but a silent correctness bug.

Fix: convert CommandCounter fields to int64 with Incr*/Decr*/Load* methods backed by atomic.AddInt64 and atomic.LoadInt64. Plain int64 storage keeps the struct copyable, so existing constructors and helpers that take modules by value continue to pass go vet cleanly (no noCopy warnings fanning out across call sites). 1087 call sites updated across aws/, gcp/commands/, and internal/gcp/.
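
The counter change can be sketched as follows. This is an illustration under assumptions: the `counter` type and its field and method names are hypothetical, chosen to match the Incr*/Decr*/Load* naming above; only the use of plain int64 fields with `atomic.AddInt64` and `atomic.LoadInt64` is taken from the description.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// counter is a hypothetical stand-in for CommandCounter. The fields are
// plain int64 values, so the struct stays copyable, but every access goes
// through sync/atomic.
type counter struct {
	executing int64
	errors    int64
}

func (c *counter) IncrExecuting()       { atomic.AddInt64(&c.executing, 1) }
func (c *counter) DecrExecuting()       { atomic.AddInt64(&c.executing, -1) }
func (c *counter) IncrError()           { atomic.AddInt64(&c.errors, 1) }
func (c *counter) LoadExecuting() int64 { return atomic.LoadInt64(&c.executing) }
func (c *counter) LoadError() int64     { return atomic.LoadInt64(&c.errors) }

func main() {
	var c counter
	var wg sync.WaitGroup
	for i := 0; i < 1000; i++ {
		wg.Add(1)
		go func(n int) {
			defer wg.Done()
			c.IncrExecuting()
			if n%10 == 0 {
				c.IncrError() // every tenth worker records an error
			}
			c.DecrExecuting()
		}(i)
	}
	wg.Wait()
	fmt.Println(c.LoadExecuting(), c.LoadError()) // prints "0 100"
}
```

With naked `c.errors++` in place of `IncrError`, the final error count could come out below 100 and `go run -race` would flag the access.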

Bug 3: Stale GitHub Actions before Node 24 enforcement

actions/checkout v4, actions/setup-go v4, and marvinpinto/action-automatic-releases all run on Node 20. Node 20 actions are deprecated and will be forced to Node 24 starting June 2, 2026, with removal on September 16, 2026.

Fix: upgrade actions/checkout to v6 and actions/setup-go to v5, raise go-version from ^1.20 to ^1.25 (matching the project), and replace the abandoned marvinpinto/action-automatic-releases with softprops/action-gh-release@v2.

Testing

  • go build ./... clean
  • go vet ./... warning count unchanged from main (zero new warnings)
  • go test -race -run TestNetworkPorts ./aws/ passes
  • Regression test added (TestNetworkPortsCacheConcurrentAccess): with the read lock removed, go test -race reports the mapaccess2_faststr read racing the locked write; with the fix in place the test passes

jbarciabf added 3 commits May 26, 2026 10:30
- actions/checkout v4 -> v6
- actions/setup-go v4 -> v5
- go-version ^1.20 -> ^1.25
- Replace abandoned marvinpinto/action-automatic-releases with
  softprops/action-gh-release v2

Node.js 20 actions are deprecated and will be forced to Node 24
starting June 2, 2026, and removed September 16, 2026.

The NetworkPortsModule cached EC2 NACLs and Security Groups in two
per-region maps. The cache lookup read the maps without holding a
lock while the cache population path took a write lock, producing a
"concurrent map read and map write" runtime panic when multiple
region goroutines hit the cache check concurrently. The panic
reproduced reliably in low-density AWS accounts where every region
goroutine raced through the cache check at the same instant.

Take the read lock on the cache fast-path in both
getEC2NACLsPerRegion and getEC2SecurityGroupsPerRegion. Move
naclsMutex and securityGroupsMutex from package-level vars onto the
module struct so concurrent module instances stop serializing on
each other's locks.

Convert CommandCounter from plain int fields with naked ++/-- to
int64 fields with Incr/Decr/Load methods that use atomic.AddInt64
and atomic.LoadInt64 internally. Worker goroutines and the spinner
goroutine were updating and reading the same counters without
synchronization, producing torn integer writes that could show as
off-by-one progress in the spinner. Plain int64 storage keeps the
struct copyable so existing constructors and helpers that take
modules by value continue to vet-clean.

Add TestNetworkPortsCacheConcurrentAccess as a regression test.
With the read lock removed, "go test -race" reports the
mapaccess2_faststr read racing the locked write; with the fix in
place the test passes.
@jbarciabf jbarciabf requested a review from bishopfaure as a code owner May 26, 2026 14:45
@jbarciabf jbarciabf merged commit ba4ff47 into main May 26, 2026

Development

Successfully merging this pull request may close these issues.

panic: concurrent map read and map write in NetworkPortsModule (getECSNetworkPortsPerRegion → getEC2NACLsPerRegion)

1 participant