[YSQL] Creating a table with "IF NOT EXISTS" leads to tables being created over and over again until OOM #2001
Comments
@aphyr I'm assuming you don't have the logs for this (or they were too massive). Dug a bit into the sampled data you were able to collect and found this:
So my assumption of what happened was that we generated 19961 tablets worth of tables until we could not do any more... cc @m-iancu @ndeodhar I'm guessing this is still txn DDL related, despite
I've been working on logs--I have a 6.3GB tarball of logs from n3, which took all day to make, haha. Would it be helpful for me to throw that up on S3? It's 121GB expanded, which might... be difficult...
@aphyr if I'm right, maybe it would be useful to instead get the logs just from the master (leader). Is that significantly smaller? 🙏
They're... all pretty huge. I actually just wiped the other node logs to get started on some new tests--figured it wasn't worth waiting the whole weekend for the other node logs to pack up. :(
Haha, definitely not! erm...it wouldn't happen that n3 was the master leader? :) In any case, yeah, some s3 upload would be sweet, we can copy it in our own s3 and go from there.
OK! Upload starting! See y'all... in the morning, haha :)
See related comment in #1991. Duplicate tables are being created due to an existing limitation with the way we maintain caches in master for postgres tables.
Couldn't reproduce using the given Jepsen version and command line (with much higher
I had a cluster of 1.3.1 nodes go very sideways in a Jepsen test last night. During table creation (which uses CREATE TABLE ... IF NOT EXISTS), one particular table, append4, repeatedly failed to create, logging: ... over and over. We've seen this error before, during testing, but it's usually transient. This time it went on for hours before the box OOMed, and presumably killed a bunch of processes, but left one tserver (on node n3) spinning at 93-97% CPU use, on a 48-way (with HT) box.
Some debugging data:
mem-trackers
threadz?group=all
contention
rpcz.txt
Unfortunately I wasn't able to get a corefile--attaching GDB to the process made it crash immediately.
This happened with Jepsen af7285b96952258f3e3cb22cb18796ddbf37c56f, running