Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

pkg/sql/sqlstats/insights/integration/integration_test: TestInsightsIntegrationForContention failed #108368

Closed
cockroach-teamcity opened this issue Aug 8, 2023 · 5 comments · Fixed by #108479
Assignees
Labels
branch-master Failures on the master branch. C-test-failure Broken test (automatically or manually discovered). O-robot Originated from a bot.
Milestone

Comments

@cockroach-teamcity
Copy link
Member

cockroach-teamcity commented Aug 8, 2023

pkg/sql/sqlstats/insights/integration/integration_test.TestInsightsIntegrationForContention failed with artifacts on master @ 4c397610e416cd81c839ec2f8b7cbc5ac51f8322:

=== RUN   TestInsightsIntegrationForContention
    test_log_scope.go:167: test logs captured to: /artifacts/tmp/_tmp/8db7e2fbdb77cb910d7e66552998132e/logTestInsightsIntegrationForContention4598468
    test_log_scope.go:81: use -show-logs to present logs inline
    test_server_shim.go:98: 
        Test server was configured to route SQL queries to a secondary tenant (virtual cluster).
        If you are only seeing a test failure when this message appears, there may be a problem
        specific to cluster virtualization or multi-tenancy.
        
        To investigate, consider using "COCKROACH_TEST_TENANT=true" to force-enable just
        the secondary tenant in all runs (or, alternatively, "false" to force-disable), or use
        "COCKROACH_INTERNAL_DISABLE_METAMORPHIC_TESTING=false" to disable all random test variables altogether.
*
* WARNING: test tenant requested by configuration, but code organization prevents start!
* OSS binaries do not include enterprise features
*
    insights_test.go:566: condition failed to evaluate within 5s: node_execution_insights did not return any rows
    panic.go:522: -- test log scope end --
test logs left over in: /artifacts/tmp/_tmp/8db7e2fbdb77cb910d7e66552998132e/logTestInsightsIntegrationForContention4598468
--- FAIL: TestInsightsIntegrationForContention (8.50s)
Help

See also: How To Investigate a Go Test Failure (internal)

Same failure on other branches

/cc @cockroachdb/cluster-observability

This test on roachdash | Improve this report!

Jira issue: CRDB-30441

@cockroach-teamcity cockroach-teamcity added branch-master Failures on the master branch. C-test-failure Broken test (automatically or manually discovered). O-robot Originated from a bot. T-cluster-observability labels Aug 8, 2023
@cockroach-teamcity cockroach-teamcity added this to the 23.2 milestone Aug 8, 2023
@xinhaoz xinhaoz self-assigned this Aug 8, 2023
@xinhaoz xinhaoz moved this from Triage to Active Issues in Cluster Observability Aug 8, 2023
@j82w
Copy link
Contributor

j82w commented Aug 8, 2023

FYI: not sure if this was before or after the change got merged: #108230

@xinhaoz
Copy link
Member

xinhaoz commented Aug 8, 2023

@j82w it looks like this failed on master with those changes :(
I'm taking a look now - let me know if you have any ideas off the top of your head on the cause!

@j82w
Copy link
Contributor

j82w commented Aug 8, 2023

My current theory is a race condition exists with secondary tenants. The only suggestions I have are

  1. Query the transaction_contention_events to see if the contention event exists.
  2. If possible to capture the trace events from the KV layer and validate the KV layer firing the contention event.

@j82w
Copy link
Contributor

j82w commented Aug 8, 2023

I have not been able to reproduce this issue locally. I don't think @koorosh was able to either.

@cockroach-teamcity
Copy link
Member Author

pkg/sql/sqlstats/insights/integration/integration_test.TestInsightsIntegrationForContention failed with artifacts on master @ 0d110cd57dc85944b3c74bf21546f789b285e509:

=== RUN   TestInsightsIntegrationForContention
    test_log_scope.go:167: test logs captured to: /artifacts/tmp/_tmp/8db7e2fbdb77cb910d7e66552998132e/logTestInsightsIntegrationForContention362415568
    test_log_scope.go:81: use -show-logs to present logs inline
    insights_test.go:566: condition failed to evaluate within 5s: node_execution_insights did not return any rows
    panic.go:522: -- test log scope end --
test logs left over in: /artifacts/tmp/_tmp/8db7e2fbdb77cb910d7e66552998132e/logTestInsightsIntegrationForContention362415568
--- FAIL: TestInsightsIntegrationForContention (11.58s)
Help

See also: How To Investigate a Go Test Failure (internal)

Same failure on other branches

This test on roachdash | Improve this report!

xinhaoz added a commit to xinhaoz/cockroach that referenced this issue Aug 9, 2023
`TestInsightsIntegrationForContention` is flaky due to checking for an
expected contention insight event that in rare cases may not be generated.
Previously, this test was not attempting to enforce that the blocking
transaction blocks the expected query for the required amount of time. The
test sets the insight latency threshold to 30ms, however in rare cases
it's possible that the waiting query finishes executing in that time.

This fix adds a wait group s.t. the blocking txn will wait for the waiting
txn to begin executing, then sleep for a duration of 500ms.  This should
ensure we hit the 30ms threshold required by insights.

Epic: none
Fixes: cockroachdb#108368

Release note: None
xinhaoz added a commit to xinhaoz/cockroach that referenced this issue Aug 10, 2023
`TestInsightsIntegrationForContention` is flaky due to checking for an
expected contention insight event that in rare cases may not be generated.
Previously, this test was not attempting to enforce that the blocking
transaction blocks the expected query for the required amount of time. The
test sets the insight latency threshold to 30ms, however in rare cases
it's possible that the waiting query finishes executing in that time.

This fix adds a wait group s.t. the blocking txn will wait for the waiting
txn to begin executing, then sleep for a duration of 500ms.  This should
ensure we hit the 30ms threshold required by insights.

Epic: none
Fixes: cockroachdb#108368

Release note: None
craig bot pushed a commit that referenced this issue Aug 10, 2023
108384: release: update predecessor map for 22.2.13 r=THardy98 a=THardy98

Release note: None
Epic: None

108479: insights: fix flaky `TestInsightsIntegrationForContention` r=xinhaoz a=xinhaoz

`TestInsightsIntegrationForContention` is flaky due to checking for an expected contention insight event that in rare cases may not be generated. Previously, this test was not attempting to enforce that the blocking transaction blocks the expected query for the required amount of time. The test sets the insight latency threshold to 30ms, however in rare cases it's possible that the waiting query finishes executing in that time.

This fix adds a wait group s.t. the blocking txn will wait for the waiting txn to begin executing, then sleep for a duration of 500ms.  This should ensure we hit the 30ms threshold required by insights.

Epic: none
Fixes: #108368

Release note: None

Co-authored-by: Thomas Hardy <thardy@cockroachlabs.com>
Co-authored-by: Xin Hao Zhang <xzhang@cockroachlabs.com>
@craig craig bot closed this as completed in 682be9d Aug 10, 2023
Cluster Observability automation moved this from Active Issues to Done Aug 10, 2023
blathers-crl bot pushed a commit that referenced this issue Aug 11, 2023
`TestInsightsIntegrationForContention` is flaky due to checking for an
expected contention insight event that in rare cases may not be generated.
Previously, this test was not attempting to enforce that the blocking
transaction blocks the expected query for the required amount of time. The
test sets the insight latency threshold to 30ms, however in rare cases
it's possible that the waiting query finishes executing in that time.

This fix adds a wait group s.t. the blocking txn will wait for the waiting
txn to begin executing, then sleep for a duration of 500ms.  This should
ensure we hit the 30ms threshold required by insights.

Epic: none
Fixes: #108368

Release note: None
xinhaoz added a commit that referenced this issue Aug 14, 2023
`TestInsightsIntegrationForContention` is flaky due to checking for an
expected contention insight event that in rare cases may not be generated.
Previously, this test was not attempting to enforce that the blocking
transaction blocks the expected query for the required amount of time. The
test sets the insight latency threshold to 30ms, however in rare cases
it's possible that the waiting query finishes executing in that time.

This fix adds a wait group s.t. the blocking txn will wait for the waiting
txn to begin executing, then sleep for a duration of 500ms.  This should
ensure we hit the 30ms threshold required by insights.

Epic: none
Fixes: #108368

Release note: None
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
branch-master Failures on the master branch. C-test-failure Broken test (automatically or manually discovered). O-robot Originated from a bot.
Projects
No open projects
Development

Successfully merging a pull request may close this issue.

3 participants