Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sql: TestDistSQLReceiverReportsContention/contention=false is flakey #86112

Closed
ajwerner opened this issue Aug 15, 2022 · 4 comments · Fixed by #87115
Closed

sql: TestDistSQLReceiverReportsContention/contention=false is flakey #86112

ajwerner opened this issue Aug 15, 2022 · 4 comments · Fixed by #87115
Assignees
Labels
C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior.

Comments

@ajwerner
Copy link
Contributor

ajwerner commented Aug 15, 2022

Describe the problem

Here's a build where it failed

To Reproduce
Run ./dev test pkg/sql --filter 'TestDistSQLReceiverReportsContention' --stress -v --show-logs 2>&1 | tee out.

TestDistSQLReceiverReportsContention.txt

Jira issue: CRDB-18579

@ajwerner ajwerner added the C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. label Aug 15, 2022
@matthewtodd matthewtodd self-assigned this Aug 19, 2022
@ajwerner
Copy link
Contributor Author

@matthewtodd
Copy link
Contributor

Bisecting on a gceworker late last week showed 8303dda, from #81995, as the commit that introduced the flake:

git bisect start
git bisect good dc32c9248e5d730216653d06652d8ee045f76828
git bisect bad 21365bf129f9eb618fcf23999129d82794d76f88
git bisect run ./dev test pkg/sql:sql_test --filter 'TestDistSQLReceiverReportsContention' --stress -v --show-logs --timeout=10m

I'll keep looking, but do you have any thoughts, @yuzefovich?

@yuzefovich
Copy link
Member

My guess is that some internal query experienced contention. I think we probably want to disable sampling altogether with SET CLUSTER SETTING sql.txn_stats.sample_rate = 0 so that only the our query gets a trace - then it wouldn't matter if there is contention on internal queries since the tracing would be disabled for them and there would be no way to propagate the contention events.

@matthewtodd
Copy link
Contributor

Thanks, @yuzefovich, fixed in #87115.

craig bot pushed a commit that referenced this issue Aug 30, 2022
84620: admin: tenant support jobs endpoints r=matthewtodd,knz,ericharmeling a=dhartunian

Previously, the Job() and Jobs() endpoints were not implemented on the
tenant admin server as there was not a need and we had split the
implementation into a tenant-only server.

Now that we want to ship the jobs page into CC Console and support it
for multi-tenant, the endpoints for the UI should work as expected.

This PR edits the `jobHelper` and `jobsHelper` helpers to be functions
instead of methods in `adminServer` which allows the implementation to
be shared between the two servers.

Fixes #84621.

Release note: None

Release justification: low risk, high benefit addition to multi-tenant

87115: sql: deflake TestDistSQLReceiverReportsContention r=matthewtodd a=matthewtodd

Fixes #86112.

Turning off sampling altogether means we'll only trace the query of
interest, avoiding flakes when internal queries also experience
contention and happen to get sampled.

Before this change,
`TestDistSQLReceiverReportsContention/contention=false` would reliably
fail under stress[^1]; now it succeeds.

[^1]: Run as:
    ```
    ./dev test pkg/sql:sql_test \
      --filter 'TestDistSQLReceiverReportsContention' \
      --stress -v --show-logs --timeout=10m
    ```

Release justification: Category 1: Non-production code changes

Release note: None

Co-authored-by: David Hartunian <davidh@cockroachlabs.com>
Co-authored-by: Matthew Todd <todd@cockroachlabs.com>
@craig craig bot closed this as completed in acfd53f Aug 30, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants