ProgressReports should report start/finish times. #5268

Open
ggainey opened this issue Apr 16, 2024 · 3 comments

@ggainey
Contributor

ggainey commented Apr 16, 2024

ProgressReports are used for tasks that have multiple parts, so you can tell how much they're doing/have done. When debugging large tasks (e.g., a sync of the one Really Big Repo), it would be handy to be able to see which subsection of the task took the bulk of the time, before digging into the logs.

Currently, for example, a sync task has a progress-report section that looks like this:

"progress_reports": [
    {
      "message": "Downloading Metadata Files",
      "code": "sync.downloading.metadata",
      "state": "completed",
      "total": null,
      "done": 6,
      "suffix": null
    },
    {
      "message": "Skipping Packages",
      "code": "sync.skipped.packages",
      "state": "completed",
      "total": 0,
      "done": 0,
      "suffix": null
    },
    {
      "message": "Parsed Advisories",
      "code": "sync.parsing.advisories",
      "state": "completed",
      "total": 4565,
      "done": 4565,
      "suffix": null
    },
    {
      "message": "Parsed Packages",
      "code": "sync.parsing.packages",
      "state": "completed",
      "total": 20851,
      "done": 20851,
      "suffix": null
    },
    {
      "message": "Parsed Comps",
      "code": "sync.parsing.comps",
      "state": "completed",
      "total": 23,
      "done": 23,
      "suffix": null
    },
    {
      "message": "Downloading Artifacts",
      "code": "sync.downloading.artifacts",
      "state": "completed",
      "total": null,
      "done": 0,
      "suffix": null
    },
    {
      "message": "Associating Content",
      "code": "associating.content",
      "state": "completed",
      "total": null,
      "done": 0,
      "suffix": null
    }
]

It would be very handy if it had a started_at: that was the ProgRpt's create-time (or maybe when state changed to running?) and a finished_at: that was filled in when the state changed to completed. This would make it possible to see at a glance that, for example, "oh, all the time was in 'Downloading Artifacts' - I need to check network-speed".
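As a rough sketch of how the proposed fields could be used: the snippet below assumes each progress report carries `started_at` and `finished_at` as ISO 8601 strings. Those two fields are the proposal in this issue, not something the API returns today, and the timestamps and the `slowest_sections` helper are made up for illustration.

```python
from datetime import datetime

# Hypothetical data: "started_at" and "finished_at" are the fields proposed in
# this issue, and the timestamps are invented for the example.
progress_reports = [
    {
        "code": "sync.downloading.metadata",
        "started_at": "2024-04-16T10:00:00+00:00",
        "finished_at": "2024-04-16T10:00:05+00:00",
    },
    {
        "code": "sync.parsing.packages",
        "started_at": "2024-04-16T10:00:05+00:00",
        "finished_at": "2024-04-16T10:00:45+00:00",
    },
    {
        "code": "sync.downloading.artifacts",
        "started_at": "2024-04-16T10:00:05+00:00",
        "finished_at": "2024-04-16T10:10:05+00:00",
    },
]


def slowest_sections(reports):
    """Return (code, seconds) pairs sorted by duration, longest first."""
    durations = []
    for report in reports:
        # Skip reports that never started or have not finished yet.
        if not report.get("started_at") or not report.get("finished_at"):
            continue
        started = datetime.fromisoformat(report["started_at"])
        finished = datetime.fromisoformat(report["finished_at"])
        durations.append((report["code"], (finished - started).total_seconds()))
    return sorted(durations, key=lambda item: item[1], reverse=True)


for code, seconds in slowest_sections(progress_reports):
    print(f"{code}: {seconds:.0f}s")
```

With data like this, "Downloading Artifacts" sorts to the top, which is the at-a-glance answer I'm after.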

@mdellweg
Member

The thing is that we designed our most interesting (long-running) tasks in a pipeline manner. They do not perform one step after another; all stages run at the same time. I fear these times would be rather meaningless.
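To illustrate the concern with made-up numbers: if the stages overlap almost entirely, every stage reports roughly the full task duration, so the per-stage times do not single out a culprit. The stage names, the intervals, and the `overlap_summary` helper below are assumptions for the sake of the example, not measurements.

```python
# Hypothetical stage intervals in seconds since task start; in a pipeline the
# stages run concurrently, so the intervals overlap almost completely.
stage_intervals = {
    "sync.downloading.artifacts": (0, 100),
    "sync.parsing.packages": (1, 99),
    "associating.content": (2, 100),
}


def overlap_summary(intervals):
    """Return per-stage durations and the overall wall-clock duration."""
    per_stage = {name: end - start for name, (start, end) in intervals.items()}
    first_start = min(start for start, _ in intervals.values())
    last_end = max(end for _, end in intervals.values())
    return per_stage, last_end - first_start


per_stage, wall_clock = overlap_summary(stage_intervals)
print(per_stage)                           # each stage spans nearly the whole task
print(sum(per_stage.values()), wall_clock)  # summed stage time far exceeds wall clock
```

Here each stage looks like it took about as long as the whole task, even though only one of them may have been the actual bottleneck.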

@ggainey
Contributor Author

ggainey commented Apr 19, 2024

It's not perfect, surely. But not everything happens in the sync-pipeline. In fact, in more complicated cases like pulp_rpm, some parts of the sync-pipeline do not run concurrently. orphan-cleanup and import/export both use multiple progress-reports in ways where these times might give some useful insight.

In any event, it can be useful, depending on the circumstances, and it isn't a very intrusive change.

@mdellweg
Member

Sure, some information may be better than none.
Should we add a docs page that states: "These times may absolutely include waiting on other pieces of the same task or even external resources. You may need an engineer to understand that they tell you little to nothing."

I'm not saying we shouldn't do this. I just fear that this can potentially increase the support surface.
