roachtest: schemachange/index/tpcc/w=1000 failed #54304
(roachtest).schemachange/index/tpcc/w=1000 failed on release-20.2@85cc9fe61acc633d38c5c7078725e7ea68b5352c:
Artifacts: /schemachange/index/tpcc/w=1000
See this test on roachdash |
This last failure is an OOM. |
(roachtest).schemachange/index/tpcc/w=1000 failed on release-20.2@bdb8cd0e7b2f25a08569a56464838486b6d16421:
Artifacts: /schemachange/index/tpcc/w=1000
See this test on roachdash |
(roachtest).schemachange/index/tpcc/w=1000 failed on release-20.2@f180b8d178f7c81dbb73e7f4b139bd7b7bf829dc:
Artifacts: /schemachange/index/tpcc/w=1000
See this test on roachdash |
The first failure is an infra flake, and the two most recent are from the backfiller memory accounting bug (fixed in #55092). That leaves the OOM, which I can't explain yet. It happened 3 minutes into an index backfill. A heap profile taken less than a minute before the process was killed reports under 700 MB of in-use memory with nothing in particular standing out, and the runtime stats indicate that most of the allocation is happening in cgo anyway. Everything just looks slow, and some of the SST ingestion delays look very large (this is about a minute before the OOM):
Some of this might just be typical for this test, though. @ajwerner I can't remember whether you were looking at this one. Did you find out anything interesting about it? I think a rare OOM on this test warrants more investigation but is not a release blocker, so I'm removing the tag. |
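For anyone wanting to quantify the gap described above between what the Go heap profile sees and what the process actually holds, here is a minimal sketch. It assumes a Linux host where `/proc/<pid>/status` exposes a `VmRSS` line in kB. The function names and the sample numbers are hypothetical and chosen only to mirror the rough magnitudes mentioned in this thread (about 6 GB RSS, under 700 MB in-use heap); they are not taken from the test artifacts.

```python
def parse_vm_rss_kib(status_text: str) -> int:
    """Return the VmRSS value (in KiB) from the text of /proc/<pid>/status."""
    for line in status_text.splitlines():
        if line.startswith("VmRSS:"):
            # The line looks like "VmRSS:    6291456 kB".
            return int(line.split()[1])
    raise ValueError("VmRSS line not found")


def untracked_bytes(rss_kib: int, go_heap_inuse_bytes: int) -> int:
    """Estimate memory the Go heap profile does not account for (e.g. cgo)."""
    return rss_kib * 1024 - go_heap_inuse_bytes


if __name__ == "__main__":
    # Illustrative input only, not real output from this run.
    sample_status = "Name:\tcockroach\nVmRSS:\t 6291456 kB\n"
    rss_kib = parse_vm_rss_kib(sample_status)
    heap_inuse = 700 * 1024 * 1024  # hypothetical heap-profile in-use figure
    gap = untracked_bytes(rss_kib, heap_inuse)
    print(f"memory not visible in the heap profile: {gap / 2**30:.2f} GiB")
```

A large gap here would only say that the memory lives outside the Go heap; it would not by itself identify which cgo allocator is responsible.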
Also, we've had a few similar-looking failures on master in the last few months. The first one seems to be #44071 (comment). |
I'm also confused because this test is running on machines with 16 CPUs and 14.4 GB of memory, and dmesg says
But our runtime stats report the RSS for the process hovering at around 6 GB before it gets killed, the last entry being
Is |
(roachtest).schemachange/index/tpcc/w=1000 failed on release-20.2@8c79e2bc4b35d36c8527f4c40c974f03d9034f46:
Artifacts: /schemachange/index/tpcc/w=1000
See this test on roachdash |
(roachtest).schemachange/index/tpcc/w=1000 failed on release-20.2@b0012907c1bc9627ae2de83e6099c4930a32699e:
Artifacts: /schemachange/index/tpcc/w=1000
See this test on roachdash |
Will triage these recent failures and assign based on the outcome. |
(roachtest).schemachange/index/tpcc/w=1000 failed on release-20.2@01dad2f783c4be5060d332b975f96a3ad4be8ecb:
Artifacts: /schemachange/index/tpcc/w=1000
See this test on roachdash |
This wasn't actually the test I wanted to bisect (it was #62320). What we saw over there was that this test is running close to overload and so it will sometimes fall over. I'd suggest toning down the workload here a bunch. |
(roachtest).schemachange/index/tpcc/w=1000 failed on release-20.2@e9f553b570957d34f5bb39f8976f6d2c893bd8b4:
Artifacts: /schemachange/index/tpcc/w=1000
See this test on roachdash |
Closing this as stale. |
(roachtest).schemachange/index/tpcc/w=1000 failed on release-20.2@a504fc7f26be39586c14c3fa22ad728ed8fa9b6d:
Artifacts: /schemachange/index/tpcc/w=1000
Related:
See this test on roachdash
powered by pkg/cmd/internal/issues