sqllogictests on mac is slow #16963
A command that caused this:
This pain is also actively being felt by the compute team
This either isn't true or is very bad if we are indeed doing that (state is observed across statements). It's a new envd per file, though.
Sorry, I should have checked more closely. It's indeed not true that an envd is created for every statement; one is created per file.
FWIW, running the same command pasted above took 38s on my machine on the second try (the first try compiled a bunch of stuff). Did some of that time include compilation?
Nope. It was the second run for me also.
Today's investigation: 8s startup locally, over 6s of that in coord bootstrap. 5s of it is from the catalog entries init, with most of the time spent in #17057.
https://gist.github.com/mjibson/bc5a7c17abaf04cb2e3958b12653f71c is a JSON file of a trace from a CREATE TABLE issued by a sqllogictest statement run on a mac that took 7s to run. It can be uploaded into Jaeger for anyone to view. The screenshot below shows that stash and persist operations are just incredibly slow. This is almost certainly either the local cockroach being slow (seems unlikely that it'd be THIS slow), or some translation/network/??? thing the mac is doing to all outgoing network connections, or something like it. I don't have a dev mac, so I cannot test. Someone who has a mac needs to drive this investigation now.
300ms+ latencies for those apply_unbatched_cmd_cas calls are crazy and indeed seem to be crdb slowness. The 700ms rollup::set is blob, though, which in dev is the local filesystem.
A couple of notes: 1) this is with running cockroach on Docker; 2) I have an M1 mac.
For Cockroach slowness we can put together a "packaged" report of what's wrong to share with them.
700ms for a local filesystem operation is also way too long. This is for a CREATE TABLE on an empty process. What is taking 700ms to write? More evidence that this is not a cockroach-specific problem.
I don't see large CRDB latencies (using Docker CRDB on an M1 mac), but I do see the file system slowness. For me, the slowest operations happening during
I think the take-away is that disk I/O is just slow. Maybe it could be sped up by using a ramdisk. Or maybe we can find a lightweight S3 mock backend and run our tests against that instead.
I did some more testing. Using the file system, my run of the above sqllogictest command takes ~50s.
All those filesystem operations look anywhere from 10-40× more expensive than I'd expect them to be -- like
That was me running
Note that those fs functions I instrumented were not literally fs operations but async functions performing the same. So there is probably some scheduling overhead involved as well.
Brutal, so that's just HFS+ or whatever. Apparently sync_all on macOS really does cost 20ms though (it does fcntl F_FULLFSYNC rather than fsync). Maybe we need a reduced-durability-guarantees version of FileBlob for macOS :-)
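To check the sync_all cost on a given machine, a rough sketch like the one below can help. It writes a handful of 4 KiB files to a directory, once with `sync_all` after each write and once without, and returns the elapsed time for each run. The function name `time_writes`, the file naming, and the payload size are assumptions made for illustration; this is not the instrumentation used in the thread, and it measures plain blocking std I/O rather than the async wrappers mentioned above.

```rust
use std::fs::{self, File};
use std::io::{self, Write};
use std::path::Path;
use std::time::{Duration, Instant};

/// Hypothetical helper: writes `n` small files into `dir` and returns the
/// total elapsed time, including file creation and removal.
/// When `durable` is true, every file is flushed with `sync_all` before it
/// is closed (on macOS, std implements this with fcntl(F_FULLFSYNC)).
fn time_writes(dir: &Path, n: usize, durable: bool) -> io::Result<Duration> {
    let payload = [0u8; 4096];
    let start = Instant::now();
    for i in 0..n {
        // Include the process id so parallel runs do not collide.
        let path = dir.join(format!("blob-bench-{}-{}", std::process::id(), i));
        let mut file = File::create(&path)?;
        file.write_all(&payload)?;
        if durable {
            file.sync_all()?;
        }
        drop(file);
        // Clean up so repeated runs start from the same state.
        fs::remove_file(&path)?;
    }
    Ok(start.elapsed())
}
```

Calling `time_writes(&std::env::temp_dir(), 50, true)` and again with `false`, then dividing each result by 50, gives an approximate per-file cost with and without the durability barrier. If the gap is in the tens of milliseconds, that would be consistent with the 20ms figure quoted above.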
Can we get an in-memory local persist store? That should be the bulk of the remainder after #17326.
We have one, but it can't be shared across processes. Is that enough?
That should work for sqllogictest, which never restarts anything and checks state. Seems like this could be configurable in the process orchestrator via envd config that sqllogictest sets, and maybe we can also allow setting via the envd cli so that we can opt in locally. All mzcompose tests could use the existing disk store as the default.
We discovered using the existing mem impl isn't enough, and we'd need to create some shared server which itself uses the mem impl. Someone might type this up someday soon.
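For illustration, the "configurable store" idea could look roughly like the sketch below: one small trait, an in-memory and a file-backed implementation, and a constructor that picks between them from a config string. Everything here is hypothetical, including the `Blob` trait, `MemBlob`, this simplified `FileBlob`, `open_blob`, and the `"mem"` / `"file:<dir>"` config format; it is not the real persist API. It also only covers the single-process case, so it does not solve the cross-process sharing problem described above, which would need a shared server in front of the in-memory store.

```rust
use std::collections::HashMap;
use std::fs;
use std::io;
use std::path::PathBuf;

/// Hypothetical minimal blob-store interface (not the real persist trait).
trait Blob {
    fn set(&mut self, key: &str, value: &[u8]) -> io::Result<()>;
    fn get(&self, key: &str) -> io::Result<Option<Vec<u8>>>;
}

/// In-memory store: fast, but visible only inside this process.
struct MemBlob {
    data: HashMap<String, Vec<u8>>,
}

impl Blob for MemBlob {
    fn set(&mut self, key: &str, value: &[u8]) -> io::Result<()> {
        self.data.insert(key.to_string(), value.to_vec());
        Ok(())
    }
    fn get(&self, key: &str) -> io::Result<Option<Vec<u8>>> {
        Ok(self.data.get(key).cloned())
    }
}

/// File-backed store: one file per key under `root` (keys must be plain
/// file names in this sketch; no durability barrier is issued).
struct FileBlob {
    root: PathBuf,
}

impl Blob for FileBlob {
    fn set(&mut self, key: &str, value: &[u8]) -> io::Result<()> {
        fs::create_dir_all(&self.root)?;
        fs::write(self.root.join(key), value)
    }
    fn get(&self, key: &str) -> io::Result<Option<Vec<u8>>> {
        match fs::read(self.root.join(key)) {
            Ok(bytes) => Ok(Some(bytes)),
            Err(e) if e.kind() == io::ErrorKind::NotFound => Ok(None),
            Err(e) => Err(e),
        }
    }
}

/// Picks a backend from a config string: "mem" or "file:<dir>".
/// The config format is an assumption made for this example.
fn open_blob(config: &str) -> io::Result<Box<dyn Blob>> {
    if config == "mem" {
        Ok(Box::new(MemBlob { data: HashMap::new() }))
    } else if let Some(dir) = config.strip_prefix("file:") {
        Ok(Box::new(FileBlob { root: PathBuf::from(dir) }))
    } else {
        Err(io::Error::new(
            io::ErrorKind::InvalidInput,
            format!("unknown blob config: {}", config),
        ))
    }
}
```

With a shape like this, a test harness could pass `"mem"` while other callers keep a `"file:<dir>"` default, and the calling code would not change.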
The forked Homebrew tap avoids slow filesystem operations on macOS by configuring Cockroach to run with an in-memory store. Touches #16963.
Just had a good Slack discussion about how we could instead build some tooling around using a ramfs for the blob store:
Closing as stale. SLT is still slower than we'd like, but it's totally workable these days, including on my macOS machine.
From @frankmcsherry, who was running sqllogictests for ~10 files on an already-built release.
Also
Which took 7 minutes at 100% CPU.
This slowness in build + execution is affecting all developers, and fixing it aligns with the goal of making envd startup sub-1s.