Understanding the stats->map output #232

Open · KaliszAd opened this issue May 8, 2023 · 3 comments

KaliszAd commented May 8, 2023

I have trouble understanding the `(manifold-exec/stats->map (.getStats executor))` output.
After a trivial transformation, I get:

#:thread-pool{:num-workers 93,
              :utilization #:summary{:permille-950 0.7666895163808638,
                                     :permille-999 0.8500435866770647,
                                     :permille-900 0.7023507840369302,
                                     :permille-500 0.0,
                                     :permille-990 0.8200846893009659},
              :queue-latency #:summary{:permille-950 2.602875999999999,
                                       :permille-999 11.711050468000012,
                                       :permille-900 1.6487680000000007,
                                       :permille-500 0.085797,
                                       :permille-990 6.60346608},
              :task-completion-rate #:summary{:permille-950 400.0,
                                              :permille-999 1060.7999999999993,
                                              :permille-900 240.0,
                                              :permille-500 0.0,
                                              :permille-990 640.0},
              :task-latency #:summary{:permille-950 903.397588,
                                      :permille-999 2057.2366079300305,
                                      :permille-900 761.185869,
                                      :permille-500 1.472786,
                                      :permille-990 1242.8296557000006},
              :queue-length #:summary{:permille-950 0.0,
                                      :permille-999 0.0,
                                      :permille-900 0.0,
                                      :permille-500 0.0,
                                      :permille-990 0.0},
              :task-arrival-rate #:summary{:permille-950 380.0,
                                           :permille-999 710.3999999999996,
                                           :permille-900 240.0,
                                           :permille-500 0.0,
                                           :permille-990 560.0},
              :task-rejection-rate #:summary{:permille-950 0.0,
                                             :permille-999 0.0,
                                             :permille-900 0.0,
                                             :permille-500 0.0,
                                             :permille-990 0.0}}

I don't get how the task arrival rate can be 0 in the Q-50. Why is there a queue latency when the queue length is 0? This particular executor is a utilization executor (0 queue length by default). It is created using `(flow/utilization-executor 0.9 512 {:initial-thread-count 10})`.
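
For reference, here's a minimal sketch of how a stats map like this can be produced (assuming both aliases resolve to `manifold.executor`, which provides `utilization-executor` and `stats->map`; the sleep workload is just a stand-in for the real one):

```clojure
(require '[manifold.executor :as ex]
         '[manifold.deferred :as d])

;; Same construction as above: target utilization 0.9, up to 512 threads.
(def executor (ex/utilization-executor 0.9 512 {:initial-thread-count 10}))

;; Submit some throwaway work so the executor has something to measure.
(dotimes [_ 100]
  @(d/future-with executor (Thread/sleep 1)))

;; Dirigiste's Executor exposes getStats; stats->map turns it into the map above.
(ex/stats->map (.getStats executor))
```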

KingMob (Collaborator) commented May 9, 2023

I don't know off the top of my head. I'd have to dig deep into the Dirigiste code to refresh my memory before I could give a definite answer. But I'll give you my immediate guesses. @arnaudgeiser may also have some insights.

> I don't get how the task arrival rate can be 0 in the Q-50.

If no tasks arrive for at least half the recording period that stats were collected for, then the median (50th pctile) will be 0. If you start stuff up in the background, and only use it occasionally (like in the REPL, or a low-use server), this seems pretty natural to me. Let me turn it around: why do you think it couldn't be 0?
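
Here's a toy illustration, which has nothing to do with Dirigiste's actual sampling code: if more than half of the sampled intervals saw zero arrivals, the median of those samples is 0 even though the higher quantiles aren't.

```clojure
;; 100 sampled intervals: 60 saw no arrivals at all, 40 saw a small burst.
(def samples (concat (repeat 60 0) (repeatedly 40 #(inc (rand-int 20)))))

;; Naive quantile: index into the sorted samples.
(defn quantile [q xs]
  (let [sorted (vec (sort xs))]
    (nth sorted (long (Math/floor (* q (dec (count sorted))))))))

(quantile 0.5 samples) ;=> 0
(quantile 0.9 samples) ;=> non-zero
```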

> Why is there a queue latency when the queue length is 0? This particular executor is a utilization executor (0 queue length by default).

Yeah, this one's a little confusing. What queue-latency is actually measuring is time to start executing submitted tasks, which is always non-zero. (The name, or docs, might be clarified on this point. PRs welcome.)

Even if you specify a queue length of 0, the code must still deal with the executor not being ready to run a task immediately. To handle that, it uses a SynchronousQueue, which blocks the submitting thread until the executor can accept the Runnable. And even if the executor were always ready, some time would still elapse between when the job is submitted and when it starts.

It should probably be called something like job-start-latency.
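
Here's a toy sketch of that handoff using a bare SynchronousQueue (not Dirigiste's internals), just to show where the elapsed time comes from:

```clojure
(import 'java.util.concurrent.SynchronousQueue)

(def handoff (SynchronousQueue.))

;; The "worker" isn't ready right away, so the submitter has to wait for the handoff.
(def worker
  (future
    (Thread/sleep 50)
    (let [task (.take handoff)]
      (task))))

;; Submit a task and let it report how long it took to actually start.
(let [submitted-at (System/nanoTime)]
  (.put handoff
        (fn []
          (println "job-start latency (ms):"
                   (/ (- (System/nanoTime) submitted-at) 1e6))))
  @worker)
```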

KaliszAd (Author) commented May 9, 2023

> I don't get how the task arrival rate can be 0 in the Q-50.

> If no tasks arrive for at least half the recording period that stats were collected for, then the median (50th pctile) will be 0. If you start stuff up in the background, and only use it occasionally (like in the REPL, or a low-use server), this seems pretty natural to me. Let me turn it around: why do you think it couldn't be 0?

Ah ok, that makes sense. I somehow didn't realize there could actually be no tasks in at least half of the possible arrival time slots. Now it also makes sense why arrival rates are often modeled using exponential distributions.
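
A quick, purely illustrative sketch (nothing to do with how Dirigiste actually samples): with exponentially distributed inter-arrival times at a low average rate, well over half of the fixed-size windows come up empty, so a median of 0 is exactly what you'd expect:

```clojure
;; Draw exponential inter-arrival times with the given average rate per window.
(defn exp-sample [rate]
  (/ (- (Math/log (rand))) rate))

(let [rate      0.5  ; on average half an arrival per window
      arrivals  (take 200 (reductions + (repeatedly #(exp-sample rate))))
      windows   (inc (long (last arrivals)))
      non-empty (count (distinct (map long arrivals)))]
  {:total-windows windows
   :empty-windows (- windows non-empty)}) ; typically ~60% of windows are empty
```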

Yes, there could definitely be a bit more explanation of what the metrics and their stats in Dirigiste and Manifold mean and why they behave this way. It would also be useful to document the units (AFAIK milliseconds for latencies), so that people don't need to sift through the source code.

KingMob (Collaborator) commented May 10, 2023

PRs welcome 😉
