Measure read latency of single row SQL reads done by users, without capturing contention or hardware queueing #98305
CC @tbg
What do you mean by that? You will measure hardware overload no matter what, no? Similarly, network latencies are measured too, and if there is network congestion, that too will influence the measurement. I'd think all of this is intentional, too.
Ideally, AC should shift queueing into the AC system rather than the goroutine scheduler, etc. So, for the same reason we expect node liveness HBs & kvprober requests to succeed under hardware overload, we expect this metric to stay low. Perhaps the problem is the term "hardware overload". In this case, by "hardware overload", I mean some hardware resource is saturated, but only a little, since AC is working. Re: network, yes, since AC doesn't take this into account. We have not observed that problem in practice in CC to date.
I would replace the word "hardware overload" with "hardware queueing". Specifically, there are a few places I could see hardware queueing:
Disk is a little more tricky, since the OS will also queue and reorder operations because we go through the file system rather than directly to the raw disk. The file system exposes metrics (dirty bytes, ...) that can be used to determine how far behind it is on writes. Ideally, AC would reduce the size of these queues to prevent hardware queueing. I'm not sure it will want to consider all of them, though. For instance, memory queueing requires a deep understanding of caches and NUMA architecture, so that is likely outside the scope of what AC should do.
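To make the dirty-bytes signal mentioned above concrete: on Linux, the amount of dirty page cache awaiting writeback is exposed in `/proc/meminfo`. Below is a minimal, hedged sketch (not CockroachDB code) that parses that field from `/proc/meminfo`-style text; the function name `dirtyKB` is hypothetical.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// dirtyKB extracts the "Dirty:" line from /proc/meminfo-style text and
// returns the kilobytes of dirty page cache awaiting writeback -- a
// rough indicator of how far behind the OS is on writes.
func dirtyKB(meminfo string) (int64, bool) {
	for _, line := range strings.Split(meminfo, "\n") {
		if strings.HasPrefix(line, "Dirty:") {
			fields := strings.Fields(line)
			if len(fields) >= 2 {
				n, err := strconv.ParseInt(fields[1], 10, 64)
				return n, err == nil
			}
		}
	}
	return 0, false
}

func main() {
	// Sample content; on Linux, read this from /proc/meminfo instead.
	sample := "MemTotal: 16314528 kB\nDirty: 1204 kB\nWriteback: 0 kB\n"
	if kb, ok := dirtyKB(sample); ok {
		fmt.Printf("dirty pages: %d kB\n", kb)
	}
}
```

A monitoring loop could poll this value and alert when it stays high, which would indicate the OS writeback queue is growing.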
"Single row" seems overly specific, and also maybe not what it sounds like you're looking for here -- a "row" could be 200 reads or one read depending on how many column families it has, and plenty of workloads are going to be scans rather than single-row reads. It sounds like what you're getting at is measuring the latency / service time of the storage layer in finding and returning the data requested by the transaction layer, as an indicator of storage performance that is independent of the performance of the transaction layer sitting on top of it, including anything like AC queueing, latch acquisitions, etc. that happen in that transaction layer. You just want a pure indicator of how quickly the storage layer is getting the data to it? For that, I think the iterator stats, specifically the timing of `Seek*()` and `Next*()` calls, might get you closer to what you want, i.e. the retrieval performance of the storage engine.
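The timing idea in the comment above can be sketched as a wrapper that accumulates wall time spent inside iterator `Seek*`/`Next*` calls. This is a hedged illustration, not Pebble's actual API: `storageIter`, `sliceIter`, and `timedIter` are all hypothetical stand-ins, and the toy in-memory iterator exists only to make the example runnable.

```go
package main

import (
	"bytes"
	"fmt"
	"time"
)

// storageIter is a hypothetical, minimal view of a storage-engine
// iterator; a real engine exposes more Seek*/Next* variants.
type storageIter interface {
	SeekGE(key []byte) bool
	Next() bool
}

// sliceIter is a toy iterator over sorted in-memory keys, standing in
// for the real storage engine.
type sliceIter struct {
	keys [][]byte
	pos  int
}

func (s *sliceIter) SeekGE(key []byte) bool {
	for s.pos = 0; s.pos < len(s.keys); s.pos++ {
		if bytes.Compare(s.keys[s.pos], key) >= 0 {
			return true
		}
	}
	return false
}

func (s *sliceIter) Next() bool {
	s.pos++
	return s.pos < len(s.keys)
}

// timedIter accumulates time spent inside Seek/Next calls, which
// approximates pure storage retrieval latency with no latching, AC
// queueing, or other transaction-layer work included.
type timedIter struct {
	storageIter
	elapsed time.Duration
	ops     int
}

func (t *timedIter) SeekGE(key []byte) bool {
	start := time.Now()
	ok := t.storageIter.SeekGE(key)
	t.elapsed += time.Since(start)
	t.ops++
	return ok
}

func (t *timedIter) Next() bool {
	start := time.Now()
	ok := t.storageIter.Next()
	t.elapsed += time.Since(start)
	t.ops++
	return ok
}

func main() {
	it := &timedIter{storageIter: &sliceIter{
		keys: [][]byte{[]byte("a"), []byte("b"), []byte("c")},
	}}
	n := 0
	for ok := it.SeekGE([]byte("b")); ok; ok = it.Next() {
		n++
	}
	fmt.Printf("scanned %d keys in %d iterator ops\n", n, it.ops)
	// prints "scanned 2 keys in 3 iterator ops"
}
```

The design choice here is that timing lives entirely below the iterator boundary, so the measurement cannot pick up contention or admission queueing that happens above it.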
Just noticed this bit. Hardware overload is a problem for the customer experience, so it makes sense to measure it with certain metrics. But we don't want to wake up an SRE at 3am because a cluster is hardware overloaded. As a result, for this metric, I like that it mostly doesn't capture hardware overload.
Good idea. Updating ticket title.
Agreed! I think it's okay that AC is not perfect. AC needs to be good enough that SRE / dev can page on things like kvprober / this & not receive too many false positive pages (pages whose cause is hardware overload). Some false positive pages are fine tho.
I'll take a look at that. Thanks for the idea! Re: goal, the goal you are laying out is perhaps not the same as the goal I have. I think the goal is to measure E2E SQL perf in a latency sense for the single SQL row SELECT workload, but with certain pieces of latency that are generally caused by customer workload choices, e.g. contention + hardware queueing, excluded. You might say: Josh, you are not measuring any SQL stuff. Agreed, but that is more for expediency! One way to think about this ticket is as a very scrappy & minimal version of #71169 (comment).
Re: 200 reads or one read depending on column families, let's see if that is a problem in practice. Re: scans, I think that is out of scope. It is useful to know perf in a latency sense for non-scan workloads, even tho people do scans. Scans are tricky since perf is expected to be extremely variable.
I guess I'm arguing with the goal; why is your goal "single row SELECT latency that excludes $stuff"? Is that a goal because we think that, if we get it, we will be able to answer questions like "how does disagg storage affect performance?" or "does this node on this cluster have a dodgy EBS volume and need to trigger an alert?", without the confounding variables of per-workload differences like contention, AC, schema, etc.? If so, then I don't think a metric for "single row SQL read excluding $stuff" is actually the right goal, the one that, if we had it, would best give us those answers, for reasons like the ones I gave above. In particular, I think exposing the pebble iterator operation latencies, measured below all of the "$stuff", would be a better goal.
Ya, it's an interesting point. I shall dig more into the details of exactly what the iterator stats are measuring & then report back!
Is your feature request related to a problem? Please describe.
It would be nice to measure the latency of single row SQL point reads done by users, without capturing contention or hardware overload. CRDB should deliver predictable performance for single row SQL reads.
Describe the solution you'd like
Implement `kv.replica_read_batch_evaluate.single_row.latency`. Similar to `kv.replica_read_batch_evaluate.latency`, but it only captures read requests that touch a single SQL row (or similar, e.g. allow joins by FK). @rytaft suggests a way to check if a read touches just one SQL row at #71169 (comment). We could do the check in SQL land, attach some metadata to the context indicating that only a single SQL row is affected, plumb that into `BatchRequest` like we do with tracing info, then plumb it into the `kvserver` context, then use it to implement `kv.replica_read_batch_evaluate.single_row.latency`.
Describe alternatives you've considered
#71169 is a more complete solution but will take much longer to implement.
Additional context
This will help with:
Jira issue: CRDB-25182