Support progress and/or rate indication #1658
Comments
We are also writing a custom solution for this, where workflows report progress to a REST endpoint. If this were built into Argo, that would be nice.
See #3557
We have a request for this in the user interface, at both the node and workflow level. This could be done in the UI using existing backend code by getting the last successful execution of the workflow (determined by […]). This is not a very popular issue, so it warrants a low-cost solution.
I've linked the PR, which provides a coarse way to track your workflow. It estimates how long the workflow will take to complete and displays progress towards that time. This is purely time-based, so it is not as fine-grained or nuanced as some of the ideas in this PR. For example, if a workflow had 10 steps, another way to do this would be to base progress on the number of steps completed. @brabster @ddseapy please take a look and make suggestions or comments.
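To make the two options above concrete, here is a minimal sketch contrasting a purely time-based estimate with a step-count estimate. It is not the code from the linked PR; the function names and the clamping to the range 0 to 1 are assumptions made for illustration.

```python
from datetime import datetime, timedelta


def time_based_progress(started_at, now, estimated_duration):
    """Fraction of the estimated duration that has elapsed, clamped to [0, 1].

    Hypothetical helper: the estimate itself would come from elsewhere,
    for example the duration of a previous successful run.
    """
    if estimated_duration <= timedelta(0):
        return 0.0
    elapsed = now - started_at
    # Dividing two timedeltas yields a float ratio.
    return max(0.0, min(1.0, elapsed / estimated_duration))


def step_based_progress(completed_steps, total_steps):
    """Fraction of steps that have completed, clamped to [0, 1]."""
    if total_steps <= 0:
        return 0.0
    return max(0.0, min(1.0, completed_steps / total_steps))
```

The time-based variant needs no cooperation from the workflow but depends on runs having similar durations; the step-based variant ignores durations but needs the total step count up front.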
Unfortunately, most of our current workflow templates yield workflows whose durations vary drastically, from a couple of minutes to several hours. So, as the PR mentions in the docs, it's not ideal for our current use case, but we will definitely keep this in mind for future wftmpls.
Any thoughts on other ways to estimate this that would work for you?
Currently, for most workflows we have a specific parameter (with a hardcoded name that we look for) whose value is the amount by which workflow progress should be incremented once that node is complete. This parameter is passed from a previous step that knows how many items/nodes there will be. Progress is computed by looking at the workflow in a shared informer and summing the parameter values across all nodes. For workflows with just a few nodes, we also have a REST service that allows nodes to increment the progress themselves. This progress is stored in a separate Postgres table. Both of these have the downside that the workflow needs to know about and report its own progress. What is in the PR clearly doesn't have that restriction, so I'm not sure I have a better generalized suggestion. At least not a performant one that doesn't involve periodically analysing lots of workflows from the archive.
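The parameter-summing approach described above could look roughly like the sketch below. The parameter name `progress-increment` is hypothetical (the comment does not give the real name), and the node dictionaries are a simplified stand-in for the node statuses an informer would provide.

```python
# Hypothetical parameter name; the actual hardcoded name is not given in the thread.
PROGRESS_PARAM = "progress-increment"


def workflow_progress(nodes):
    """Sum the progress increments of all successfully completed nodes.

    `nodes` maps node IDs to simplified node-status dicts of the assumed form
    {"phase": ..., "inputs": {"parameters": [{"name": ..., "value": ...}]}}.
    """
    total = 0.0
    for node in nodes.values():
        # Only completed nodes contribute their increment.
        if node.get("phase") != "Succeeded":
            continue
        for param in node.get("inputs", {}).get("parameters", []):
            if param.get("name") == PROGRESS_PARAM:
                total += float(param["value"])
    return total
```

As noted in the comment, this requires an earlier step to know how many items there will be so that it can hand each node its share.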
@dseapy that's really interesting. Let me play that back so I can be sure I understand. In effect, you have a method for nodes to report their progress? That is just another way to report progress. Proposal: currently, nodes report status by annotating their pod. What if there were an annotation that we recognise as progress? Your nodes could update it, and we could report it back via the CLI and UI. If the annotation were absent, we would default to a computed metric.
Yeah, I believe that would work for our wftmpls.
@dseapy I've tweaked my POC. Setting annotations works today, but (a) it requires the workflow role to have pod patch permission, which is a security issue we want to remove, and (b) it exposes implementation details. Instead, what about just recognising a log line:
Better still:
That does indeed sound much better for security and friendlier to the container. I'm guessing there is not too much of a performance hit from doing the log parsing/matching for each pod?
I think this approach works for me too. I had long-running tasks within a workflow that I tracked via logging,
I really like the progress stuff, definitely solves some UX issues I have! One thought is: the ability to have multiple independent progress meters would be nice (think things like monitoring rollouts of multiple different Kubernetes workloads). Obviously this can be handled currently by just breaking those out into distinct leaf nodes, so it's not really an issue.
Another thought would be to have the node config include a regex that would yield the progress information?
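As an illustration of the regex idea, here is a minimal sketch of pulling progress out of log lines with a configurable pattern. The `progress=N/M` format and the group names `done` and `total` are assumptions made up for this example; they are not the log-line format proposed earlier in the thread, and this is not how Argo parses logs.

```python
import re

# Hypothetical default pattern; a node config could supply its own.
PROGRESS_PATTERN = re.compile(r"progress=(?P<done>\d+)/(?P<total>\d+)")


def latest_progress(log_lines, pattern=PROGRESS_PATTERN):
    """Return (done, total) from the last matching log line, or None if none match."""
    latest = None
    for line in log_lines:
        match = pattern.search(line)
        if match:
            # Later lines overwrite earlier ones, so the most recent report wins.
            latest = (int(match.group("done")), int(match.group("total")))
    return latest
```

A per-node pattern would let existing containers be tracked without changing what they log, at the cost of running a regex over every log line.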
I think this way of parsing the stdout log works for me too 👍
The implementation in #4015, reporting progress via
@salanki thank you. I think what I'm saying is that I don't plan to work on this anymore. But if someone wants to take it on - that'd be great!
FYI, I started working on this in https://github.com/helio/argo-workflows/tree/custom-workflow-progress. In case anyone wants to have an early look at it, you're more than welcome :). I only ported the custom N/M reporting, not yet the message part, as the way pods report status messages has changed quite a bit and I'm not sure if it's even needed, TBH. Compare link: https://github.com/argoproj/argo-workflows/compare/master...helio:custom-workflow-progress?expand=1
original code from: https://github.com/argoproj/argo-workflows/pull/4015/files closes argoproj#1658, argoproj#4245 Signed-off-by: Michael Weibel <michael@helio.exchange>
In case anyone wants to start using this, docs are here: https://github.com/argoproj/argo-workflows/blob/master/docs/progress.md#self-reporting-progress. Let me know if they're clear enough.
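For readers skimming the thread, a container could self-report along these lines. This is a minimal sketch, assuming progress is written as `N/M` (the custom N/M reporting mentioned above) to the file named by the `ARGO_PROGRESS_FILE` environment variable; check the linked docs for the authoritative behaviour. The helper name `report_progress` is hypothetical.

```python
import os


def report_progress(done, total, path=None):
    """Write progress as "N/M" to the progress file.

    Falls back to the ARGO_PROGRESS_FILE environment variable when no path
    is given. Returns False when no file is configured, so the same code
    can also run outside a workflow pod.
    """
    path = path or os.environ.get("ARGO_PROGRESS_FILE")
    if not path:
        return False
    # Overwrite rather than append: only the latest value matters.
    with open(path, "w") as progress_file:
        progress_file.write(f"{done}/{total}\n")
    return True
```

A long-running task would call something like `report_progress(i, n)` periodically from its main loop.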
Hi, can I get this information from inside a workflow manifest using a workflow variable, something like
As far as I'm aware, not using a variable (all workflow variables are documented here). However, you might be able to read the file exposed via ARGO_PROGRESS_FILE, though I'm not sure about that.
Thanks, just tried using the https://github.com/argoproj/argo-workflows/blob/729d0a7b35825ff47254bad3fbea3d571ea621c8/examples/dag-coinflip.yaml example, on L50 I replaced
Looks like it's not a two-way binding.
Is it possible to get the global total number of tasks in the workflow so that I can calculate it myself? I spent some time checking the docs but couldn't find any way to do it.
Is this a BUG REPORT or FEATURE REQUEST?:
Feature Request
What happened:
It would be helpful if it were possible to see how much progress is being made by a long-running task
What you expected to happen:
A progress indication of some kind - ideally something that can be watched at the CLI and rendered in argo-ui.
For example I have a job that runs in the middle of a workflow over a few million lines and it takes a while. I'm trying to tune k8s autoscaling to scale out a service it uses and it would be helpful to know how fast it is going and how far it is through the work. Rather than implement my own solution to log this or produce metrics it would be neat if there were a way to publish this information straight into Argo.
How to reproduce it (as minimally and precisely as possible):
N/A from here down...
Anything else we need to know?:
Environment:
Other debugging information (if applicable):