Skip to content

DAOS-18086 control: Fix racy logic in unit test#18090

Merged
daltonbohning merged 1 commit intomasterfrom
kjacque/drpc-utest-race-part-deux
Apr 23, 2026
Merged

DAOS-18086 control: Fix racy logic in unit test#18090
daltonbohning merged 1 commit intomasterfrom
kjacque/drpc-utest-race-part-deux

Conversation

@kjacque
Copy link
Copy Markdown
Contributor

@kjacque kjacque commented Apr 22, 2026

The mock for this test wasn't updated when we switched to dRPC message chunking, so the mocked sessions were failing and being removed. Whether they were removed early enough for the test to detect was intermittent.

  • Populate valid data in the mockConn used for each mock session. This ensures the sessions remain "open" until the context is canceled (i.e. the test ends).
  • Add a brief sleep in the test to allow for the goroutines to either die or remain stable. This allowed the test to reproduce the bug 100% of the time.
  • Update to a table-based test format for the affected tests.

Steps for the author:

  • Commit message follows the guidelines.
  • Appropriate Features or Test-tag pragmas were used.
  • Appropriate Functional Test Stages were run.
  • At least two positive code reviews including at least one code owner from each category referenced in the PR.
  • Testing is complete. If necessary, forced-landing label added and a reason added in a comment.

After all prior steps are complete:

  • Gatekeeper requested (daos-gatekeeper added as a reviewer).

The mock for this test wasn't updated when we switched to dRPC
message chunking, so the mocked sessions were failing and being
removed. Whether they were removed early enough for the test to
detect was intermittent.

- Populate valid data in the mockConn used for each mock session.
  This ensures the sessions remain "open" until the context is
  canceled (i.e. the test ends).
- Add a brief sleep in the test to allow for the goroutines to
  either die or remain stable. This allowed the test to reproduce
  the bug 100% of the time.
- Update to a table-based test format for the affected tests.

Signed-off-by: Kris Jacque <kris.jacque@hpe.com>
@kjacque kjacque requested review from knard38, liw, mjmac and tanabarr April 22, 2026 15:41
@kjacque kjacque self-assigned this Apr 22, 2026
@kjacque kjacque requested review from a team as code owners April 22, 2026 15:41
@kjacque
Copy link
Copy Markdown
Contributor Author

kjacque commented Apr 22, 2026

This remains a unit-test-only fix.

@github-actions
Copy link
Copy Markdown

github-actions Bot commented Apr 22, 2026

Ticket title is 'Unit Tests / Unit Test on EL 8.8 / UTEST_control.drpc.TestServer_Listen_AcceptConnection: data race'
Status is 'Awaiting backport'
Job should run at elevated priority (1)
https://daosio.atlassian.net/browse/DAOS-18086

@github-actions github-actions Bot added the priority Ticket has high priority (automatically managed) label Apr 22, 2026
Copy link
Copy Markdown
Contributor

@knard38 knard38 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@kjacque kjacque added the forced-landing The PR has known failures or has intentionally reduced testing, but should still be landed. label Apr 23, 2026
@kjacque
Copy link
Copy Markdown
Contributor Author

kjacque commented Apr 23, 2026

Fault injection test setup failure is a known issue being worked in CI. This patch affects Go unit tests only, and they passed.

For local testing, I did 100 runs of this test locally, had 100% bug reproduction with the fix to the tests, but without the fix to the mock. After the mock fix, had 0% bug reproduction.

@kjacque kjacque requested a review from a team April 23, 2026 15:09
@daltonbohning daltonbohning merged commit a75ec4d into master Apr 23, 2026
43 of 44 checks passed
@daltonbohning daltonbohning deleted the kjacque/drpc-utest-race-part-deux branch April 23, 2026 15:12
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

forced-landing The PR has known failures or has intentionally reduced testing, but should still be landed. priority Ticket has high priority (automatically managed)

Development

Successfully merging this pull request may close these issues.

5 participants