

roachtest: import/nodeShutdown/coordinator failed #110782

Closed
cockroach-teamcity opened this issue Sep 17, 2023 · 3 comments
Labels:
- branch-release-23.1: Used to mark GA and release blockers and technical advisories for 23.1
- C-test-failure: Broken test (automatically or manually discovered).
- O-roachtest
- O-robot: Originated from a bot.
- release-blocker: Indicates a release-blocker. Use with branch-release-2x.x label to denote which branch is blocked.
- T-sql-queries: SQL Queries Team

Milestone: 23.1

Comments

@cockroach-teamcity
Member

cockroach-teamcity commented Sep 17, 2023

roachtest.import/nodeShutdown/coordinator failed with artifacts on release-23.1 @ 12a0fdf76785787a3a7e83198f1adfd7184ea910:

(monitor.go:153).Wait: monitor failure: getting the job status: pq: crdb-internal-jobs-table: system-jobs-scan: rpc error: code = Unavailable desc = error reading from server: read tcp 10.142.0.214:37010->10.142.0.200:26257: read: connection reset by peer
test artifacts and logs in: /artifacts/import/nodeShutdown/coordinator/run_1

Parameters: ROACHTEST_arch=amd64, ROACHTEST_cloud=gce, ROACHTEST_cpu=4, ROACHTEST_encrypted=false, ROACHTEST_ssd=0

Help

See: roachtest README

See: How To Investigate (internal)

See: Grafana

/cc @cockroachdb/sql-queries

This test on roachdash | Improve this report!

Jira issue: CRDB-31611

@cockroach-teamcity added the branch-release-23.1, C-test-failure, O-roachtest, O-robot, release-blocker, and T-sql-queries labels Sep 17, 2023
@cockroach-teamcity cockroach-teamcity added this to the 23.1 milestone Sep 17, 2023
@mgartner
Collaborator

From the logs of node 2:

cockroach exited with code 137: Sun Sep 17 05:54:41 UTC 2023

I believe this indicates it was killed by the OOM killer. I've attached the debug zip below:

debug.zip

@yuzefovich
Member

I don't think it's an OOM; the test itself shuts down a node:

05:54:40 jobs.go:128: stopping node (using SIGKILL) :2
05:54:40 cluster.go:709: test status: stopping nodes :2
05:54:40 jobs.go:114: job %!s(catpb.JobID=900660104002240514) still running, waiting to succeed
05:54:42 jobs.go:134: stopped node :2

This type of error indicates that we happened to shut down node 2 concurrently with the crdb-internal-jobs-table query being issued (we have a separate goroutine that polls SHOW JOBS every second). I believe on master we solved this scenario via the "retry-as-local" mechanism, which isn't really backportable, so the question is what we want to do on the 23.1 and 22.2 branches.

My inclination is to simply close this issue since it seems quite rare (I wasn't able to quickly find similar issues). Alternatively, we could swallow a subset of errors when polling SHOW JOBS and not fail the test. @mgartner WDYT?

craig bot pushed a commit that referenced this issue Sep 20, 2023
110931: ui: add plan gist as option on bundle collection r=maryliag a=maryliag

Note to reviewers: There is another option, to get any plan _except_ the selected gist, but that is not part of this PR. Once a design is created, this option can be added.

---
Part of #103018

This commit adds an option to collect statement bundle based on a specific plan gist.

<img width="578" alt="Screenshot 2023-09-19 at 3 18 01 PM" src="https://github.com/cockroachdb/cockroach/assets/1017486/5ab807b7-08f4-49e7-b540-2dadface766d">


https://www.loom.com/share/59335438f0884b75a7d163d96effe5a8

Release note (ui change): Added an option to filter by a specific plan gist when collecting a statement bundle.

110976: syntheticprivilege: admin always has ALL global privileges r=rafiss a=rafiss

### syntheticprivilege: admin always has ALL global privileges

As we move away from requiring the admin role to perform cluster
debug/repair operations, we want to use a privilege instead. To
facilitate that, the admin role now implicitly has ALL global
privileges. The privilege for admins is not revocable.
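A hypothetical Go sketch of the rule described above: membership in `admin` short-circuits every global privilege check, and revoking from `admin` is refused. The `grants` type, its methods, and the error text are illustrative only and do not reflect CockroachDB's internal privilege code; VIEWACTIVITY and MODIFYCLUSTERSETTING serve merely as sample privilege names.

```go
package main

import (
	"errors"
	"fmt"
)

// Privilege is a hypothetical stand-in for a global (system) privilege name.
type Privilege string

const adminRole = "admin"

// grants maps a role to the global privileges explicitly granted to it.
type grants map[string]map[Privilege]bool

// hasGlobalPrivilege reports whether any of the user's roles holds priv.
// Members of admin implicitly hold every global privilege.
func (g grants) hasGlobalPrivilege(roles []string, priv Privilege) bool {
	for _, r := range roles {
		if r == adminRole || g[r][priv] {
			return true
		}
	}
	return false
}

// revoke removes priv from role, refusing to touch admin's implicit ALL.
func (g grants) revoke(role string, priv Privilege) error {
	if role == adminRole {
		return errors.New("privileges cannot be revoked from admin")
	}
	delete(g[role], priv)
	return nil
}

func main() {
	g := grants{"ops": {"VIEWACTIVITY": true}}
	fmt.Println(g.hasGlobalPrivilege([]string{"admin"}, "MODIFYCLUSTERSETTING")) // true
	fmt.Println(g.hasGlobalPrivilege([]string{"ops"}, "MODIFYCLUSTERSETTING"))   // false
	fmt.Println(g.revoke("admin", "VIEWACTIVITY") != nil)                        // true
}
```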

---

### sql: use better error message for missing system privilege

Since we document privileges on the GlobalPrivilegeObject using the
phrase "system privilege", we should make the error message say that
too.

informs #109814
Release note: None

110979: roachtest: use correct format directive for job ID r=yuzefovich a=yuzefovich

`catpb.JobID` doesn't implement `fmt.Stringer`.

Touches: #110782.

Epic: None

Release note: None
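The `%!s(catpb.JobID=...)` text in the test log earlier in this thread is how Go's `fmt` package reports a bad verb: the job ID is an integer-valued type without a `String` method, so `%s` has nothing to format. A small sketch with a local stand-in type (not the real `catpb` package) reproduces that output and shows the `%d` fix:

```go
package main

import "fmt"

// JobID is a local stand-in for catpb.JobID: an int64-based type with no
// String method. It is not the real cockroach type.
type JobID int64

// formatBoth renders the same ID with %s and with %d.
func formatBoth(id JobID) (withS, withD string) {
	// Passing the value as `any` keeps go vet's printf check from flagging
	// the deliberate %s misuse below.
	var v any = id
	withS = fmt.Sprintf("job %s still running", v)
	withD = fmt.Sprintf("job %d still running", id)
	return withS, withD
}

func main() {
	s, d := formatBoth(900660104002240514)
	fmt.Println(s) // job %!s(main.JobID=900660104002240514) still running
	fmt.Println(d) // job 900660104002240514 still running
}
```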

Co-authored-by: maryliag <marylia@cockroachlabs.com>
Co-authored-by: Rafi Shamim <rafi@cockroachlabs.com>
Co-authored-by: Yahor Yuzefovich <yahor@cockroachlabs.com>
craig bot pushed a commit that referenced this issue Sep 20, 2023
110396: sql: add support for foreign key cascades in udfs r=rharding6373 a=rharding6373

This commit adds testing and makes some fixes to support foreign key cascades in UDFs.

Epic: CRDB-25388
Informs: #87289

Release note: none

110925: dev: add support for `podman` r=healthy-pod a=rickystewart

Part of: DEVINF-522

Epic: none
Release note: None

110978: catpb: make JobID implement fmt.Stringer r=yuzefovich a=yuzefovich

This will make some things nicer (e.g., in roachtest/tests/jobs.go we used the %s format directive).

Touches: #110782.

Epic: None

Release note: None

Co-authored-by: rharding6373 <rharding6373@users.noreply.github.com>
Co-authored-by: Ricky Stewart <ricky@cockroachlabs.com>
Co-authored-by: Yahor Yuzefovich <yahor@cockroachlabs.com>
@mgartner
Collaborator

I'm ok closing this for now.
