Provide metric for tasks queue when using limit-active-tasks placement strategy #5057

tenjaa · 2020-01-22T13:52:10Z

What challenge are you facing?

We switched to the limit-active-tasks placement strategy, and so far it has solved a lot of our problems. We now want to go further by scaling our workers depending on the size of the task queue.
We are running our Concourse in a Kubernetes environment.

What would make this better?

A metric for the size of the task queue would let us scale workers based on actual demand.

Are you interested in implementing this yourself?

Sure :)
We already saw that the first proposed implementation had this metric exported: #4612

jamieklassen · 2020-01-24T20:09:17Z

Rather than implementing a full queue (like concurrency-safe FIFO guarantee or any kind of priority - which does not currently exist, for the record!) I suspect it would be enough for you to emit a metric whenever the "All workers are busy at the moment, please stand-by." event happens, which seems to be around this block of code. I can imagine this being a strong enough heuristic to say "my workers are getting busy".

Then I'm thinking you could autoscale depending on how often this event has occurred in the last hour (or whatever granularity/tuning makes sense)?
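The heuristic above could be sketched roughly as follows. This is only an illustration, not Concourse or Kubernetes code: the class `BusyEventWindow`, the function `desired_workers`, the threshold, and the one-worker-at-a-time scaling step are all hypothetical names and assumptions made for the example.

```python
from collections import deque


class BusyEventWindow:
    """Counts hypothetical 'all workers busy' events within a trailing time window."""

    def __init__(self, window_seconds: float):
        self.window_seconds = window_seconds
        self._timestamps = deque()  # event times, oldest first

    def record(self, now: float) -> None:
        # Assumes events are recorded in non-decreasing time order.
        self._timestamps.append(now)

    def count(self, now: float) -> int:
        # Drop events that fell out of the window, then count the rest.
        cutoff = now - self.window_seconds
        while self._timestamps and self._timestamps[0] <= cutoff:
            self._timestamps.popleft()
        return len(self._timestamps)


def desired_workers(current: int, busy_events: int, threshold: int, max_workers: int) -> int:
    """Add one worker when the window saw at least `threshold` busy events (assumed policy)."""
    if busy_events >= threshold:
        return min(current + 1, max_workers)
    return current
```

The window length and threshold are the tuning knobs mentioned above; a real autoscaler would also need a scale-down rule, which this sketch leaves out.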

Frankly I'm not sharp when it comes to k8s autoscaling, so I'd need a sanity check on this assertion. Am I making sense? off-base?

tenjaa · 2020-01-25T16:56:17Z

Oh, I only looked at the linked proposed implementation and assumed that the final one was more or less the same and that exposing the metric had simply been forgotten.

In general it should be possible to build a custom metric based on the logs.
Using heuristics we could probably say "three logs in the last minute => three jobs in the queue".
But the Prometheus FAQ advises against it: https://prometheus.io/docs/introduction/faq/#how-to-feed-logs-into-prometheus

What do you think about not having a queue but a counter?
Maybe even the number of currently active tasks would be enough. If we just subtract the number of workers (which is already known), we also have the number of unscheduled tasks.
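As a rough sketch of that subtraction: the function name `unscheduled_tasks` and its parameters are hypothetical, and the example assumes the task count includes tasks still waiting for a worker. The per-worker limit defaults to 1, which matches the plain "tasks minus workers" arithmetic; a higher limit is an added assumption.

```python
def unscheduled_tasks(requested_tasks: int, workers: int, max_active_per_worker: int = 1) -> int:
    """Estimate how many tasks cannot be placed right now.

    requested_tasks: tasks that are running or waiting for a worker (assumed input).
    workers: number of registered workers.
    max_active_per_worker: assumed per-worker limit on active tasks.
    """
    capacity = workers * max_active_per_worker
    # Never report a negative backlog when there is spare capacity.
    return max(0, requested_tasks - capacity)
```

For example, 7 tasks on 4 workers with a limit of 1 leaves 3 unscheduled, while 3 tasks on the same workers leaves 0.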

tenjaa · 2020-03-05T19:36:38Z

Hey @pivotal-jamie-klassen do you have any feedback about the counter idea?
Would it be fine if we proposed a pull request for it?

jamieklassen · 2020-03-27T01:26:16Z

@tenjaa seems fair to me. Especially if you experiment with using that metric in your own environment! I guess you'd probably emit a counter per worker.
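One way a per-worker metric might look, rendered by hand in the Prometheus text exposition format so the sketch has no dependencies. The metric name `example_tasks_waiting`, the `worker` label, and both function names are hypothetical and not taken from Concourse. The sketch uses a gauge rather than a counter on the assumption that the value can go down as tasks get scheduled.

```python
def escape_label(value: str) -> str:
    """Escape a label value for the Prometheus text format (backslash, quote, newline)."""
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def render_per_worker_gauge(name: str, help_text: str, values: dict) -> str:
    """Render one gauge sample per worker, sorted by worker name."""
    lines = [f"# HELP {name} {help_text}", f"# TYPE {name} gauge"]
    for worker in sorted(values):
        lines.append(f'{name}{{worker="{escape_label(worker)}"}} {values[worker]}')
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(render_per_worker_gauge(
        "example_tasks_waiting",       # hypothetical metric name
        "Tasks waiting per worker.",
        {"worker-b": 0, "worker-a": 2},
    ), end="")
```

Summing the samples across the `worker` label would give a single cluster-wide value to scale on.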

tenjaa added the enhancement label Jan 22, 2020

tenjaa mentioned this issue Apr 16, 2020

atc: behaviour: emit tasks waiting prometheus metric #5448

Merged

8 tasks

jamieklassen closed this as completed in #5448 May 12, 2020
