ccl/kvccl/kvfollowerreadsccl: TestBoundedStalenessDataDriven failed #124694

Closed
cockroach-teamcity opened this issue May 25, 2024 · 4 comments
Labels
branch-master Failures and bugs on the master branch. C-test-failure Broken test (automatically or manually discovered). O-robot Originated from a bot. P-2 Issues/test failures with a fix SLA of 3 months T-kv KV Team X-unactionable This was closed because it was unactionable.

Comments

@cockroach-teamcity
Member

cockroach-teamcity commented May 25, 2024

ccl/kvccl/kvfollowerreadsccl.TestBoundedStalenessDataDriven failed on master @ b5081e997adaf5950c9b85dfb6655f226bf16d29:

    datadriven.go:144: 
        /var/lib/engflow/worker/work/1/exec/bazel-out/k8-fastbuild/bin/pkg/ccl/kvccl/kvfollowerreadsccl/kvfollowerreadsccl_test_/kvfollowerreadsccl_test.runfiles/com_github_cockroachdb_cockroach/pkg/ccl/kvccl/kvfollowerreadsccl/testdata/boundedstaleness/single_row:136:
        reset-matching-stmt-for-tracing [0 args]
        <no input to command>
        ----
    datadriven.go:144: 
        /var/lib/engflow/worker/work/1/exec/bazel-out/k8-fastbuild/bin/pkg/ccl/kvccl/kvfollowerreadsccl/kvfollowerreadsccl_test_/kvfollowerreadsccl_test.runfiles/com_github_cockroachdb_cockroach/pkg/ccl/kvccl/kvfollowerreadsccl/testdata/boundedstaleness/single_row:141:
        exec [0 args]
        SET CLUSTER SETTING kv.closed_timestamp.target_duration = '1hr';
        ----
    datadriven.go:144: 
        /var/lib/engflow/worker/work/1/exec/bazel-out/k8-fastbuild/bin/pkg/ccl/kvccl/kvfollowerreadsccl/kvfollowerreadsccl_test_/kvfollowerreadsccl_test.runfiles/com_github_cockroachdb_cockroach/pkg/ccl/kvccl/kvfollowerreadsccl/testdata/boundedstaleness/single_row:145:
        exec [0 args]
        ALTER TABLE t ADD COLUMN new_col INT NOT NULL DEFAULT 2
        ----
    datadriven.go:144: 
        /var/lib/engflow/worker/work/1/exec/bazel-out/k8-fastbuild/bin/pkg/ccl/kvccl/kvfollowerreadsccl/kvfollowerreadsccl_test_/kvfollowerreadsccl_test.runfiles/com_github_cockroachdb_cockroach/pkg/ccl/kvccl/kvfollowerreadsccl/testdata/boundedstaleness/single_row:151:
        query [1 args]
        SELECT * FROM t AS OF SYSTEM TIME with_max_staleness('10s') WHERE pk = 1
        ----
        1 2
        events (1 found):
         * event 1: colbatchscan trace on node_idx 2: local read then remote leaseholder read
    datadriven.go:144: 
        /var/lib/engflow/worker/work/1/exec/bazel-out/k8-fastbuild/bin/pkg/ccl/kvccl/kvfollowerreadsccl/kvfollowerreadsccl_test_/kvfollowerreadsccl_test.runfiles/com_github_cockroachdb_cockroach/pkg/ccl/kvccl/kvfollowerreadsccl/testdata/boundedstaleness/single_row:158:
        query [1 args]
        SELECT * FROM t AS OF SYSTEM TIME with_min_timestamp(now() - '10s') WHERE pk = 1
        ----
        1 2
        events (1 found):
         * event 1: colbatchscan trace on node_idx 2: local read then remote leaseholder read
    datadriven.go:144: 
        /var/lib/engflow/worker/work/1/exec/bazel-out/k8-fastbuild/bin/pkg/ccl/kvccl/kvfollowerreadsccl/kvfollowerreadsccl_test_/kvfollowerreadsccl_test.runfiles/com_github_cockroachdb_cockroach/pkg/ccl/kvccl/kvfollowerreadsccl/testdata/boundedstaleness/single_row:165:
        query [1 args]
        SELECT * FROM t AS OF SYSTEM TIME with_max_staleness('10s', false) WHERE pk = 1
        ----
        1 2
        events (1 found):
         * event 1: colbatchscan trace on node_idx 2: local read then remote leaseholder read
    datadriven.go:144: 
        /var/lib/engflow/worker/work/1/exec/bazel-out/k8-fastbuild/bin/pkg/ccl/kvccl/kvfollowerreadsccl/kvfollowerreadsccl_test_/kvfollowerreadsccl_test.runfiles/com_github_cockroachdb_cockroach/pkg/ccl/kvccl/kvfollowerreadsccl/testdata/boundedstaleness/single_row:172:
        query [1 args]
        SELECT * FROM t AS OF SYSTEM TIME with_min_timestamp(now() - '10s', false) WHERE pk = 1
        ----
        1 2
        events (1 found):
         * event 1: colbatchscan trace on node_idx 2: local read then remote leaseholder read
    boundedstaleness_test.go:383: condition failed to evaluate within 45s: from boundedstaleness_test.go:408: not follower reads found:
        events (0 found):
    --- FAIL: TestBoundedStalenessDataDriven/single_row (82.86s)

Parameters:

  • attempt=1
  • deadlock=true
  • run=3
  • shard=1
Help

See also: How To Investigate a Go Test Failure (internal)

/cc @cockroachdb/kv

This test on roachdash | Improve this report!

Jira issue: CRDB-39014

@cockroach-teamcity cockroach-teamcity added branch-master Failures and bugs on the master branch. C-test-failure Broken test (automatically or manually discovered). O-robot Originated from a bot. release-blocker Indicates a release-blocker. Use with branch-release-2x.x label to denote which branch is blocked. T-kv KV Team labels May 25, 2024
@miraradeva
Contributor

I haven't been able to reproduce this on a GCE worker, under deadlock, on the same SHA as the failure:

./dev test pkg/ccl/kvccl/kvfollowerreadsccl/ --filter=TestBoundedStalenessDataDriven --stress --deadlock --count=5000

The main thing I see in the logs right before the timeout is a run of repeated lines (over 200 of them) like this:

I240525 08:16:07.851733 2280561 sql/catalog/lease/descriptor_state.go:256 ⋮ [T1,Vsystem,n3,client=127.0.0.1:43292,hostssl,user=root] 2083  release: 104(‹"t"›,0101807ad272fe77444245aeb087b0700b9108) ver=9:1716625276.970286272,0, refcount=0

I think it's possible that one of the schema change jobs is not completing in time while waiting for that SQL lease to expire:

I240525 08:16:06.960110 2381445 sql/catalog/lease/lease.go:264 ⋮ [T1,Vsystem,n1,job=‹NEW SCHEMA CHANGE id=971749981398958081›] 2077  waiting for 1 leases to expire: desc=[{‹t› 104 8}]
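For anyone who wants to tally these release lines in a saved test log, here is a minimal Go sketch. It assumes the log format matches the single line quoted above; the function name `countReleases` and the regular expression are illustrative and not part of CockroachDB.

```go
package main

import (
	"bufio"
	"fmt"
	"regexp"
	"strings"
)

// releaseRE captures the descriptor ID and version from lines such as
// "... release: 104(...) ver=9:1716625276.970286272,0, refcount=0".
// The pattern is an assumption based on the one log line quoted in this thread.
var releaseRE = regexp.MustCompile(`release: (\d+)\(.*\) ver=(\d+):`)

// countReleases returns the number of release lines seen per "descID@version".
func countReleases(log string) map[string]int {
	counts := make(map[string]int)
	sc := bufio.NewScanner(strings.NewReader(log))
	for sc.Scan() {
		if m := releaseRE.FindStringSubmatch(sc.Text()); m != nil {
			counts[m[1]+"@"+m[2]]++
		}
	}
	return counts
}

func main() {
	line := `I240525 08:16:07.851733 2280561 sql/catalog/lease/descriptor_state.go:256 ⋮ [n3] 2083  release: 104(‹"t"›,0101) ver=9:1716625276.970286272,0, refcount=0`
	sample := line + "\n" + line + "\nsome unrelated line\n"
	for key, n := range countReleases(sample) {
		fmt.Printf("%s: %d release lines\n", key, n)
	}
}
```

Grouping by descriptor ID and version makes it easy to see whether the repeated releases all refer to the same version of `t` that the schema change job is waiting on.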

@rickystewart
Collaborator

FYI, this test is given only one core even under deadlock. You may find it easier to reproduce with --test_env=GOMAXPROCS=1; alternatively, we can grant it additional resources under deadlock to see if that helps.

@miraradeva
Contributor

Also no luck reproducing with ./dev test pkg/ccl/kvccl/kvfollowerreadsccl/ --filter=TestBoundedStalenessDataDriven --stress --deadlock --count=5000 -- --test_env=GOMAXPROCS=1.

@nicktrav nicktrav added the P-2 Issues/test failures with a fix SLA of 3 months label Jun 10, 2024
@arulajmani arulajmani added X-unactionable This was closed because it was unactionable. and removed release-blocker Indicates a release-blocker. Use with branch-release-2x.x label to denote which branch is blocked. labels Jun 12, 2024
@arulajmani
Collaborator

I'm going to close this out as unactionable. We've seen this fail only once, there's nothing useful in the logs, and Mira couldn't reproduce it.

Projects
No open projects
Status: roachtest/unit test backlog