fix: add pagination to job_collector task #8240

Merged
klesh merged 1 commit into apache:release-v1.0 from ClaudioMascaro:feat/pagination-collect-jobs-release on Dec 18, 2024

Conversation

@ClaudioMascaro
Contributor

⚠️ Pre Checklist

Please complete ALL items in this checklist, and remove before submitting

  • I have read through the Contributing Documentation.
  • I have added relevant tests.
  • I have added relevant documentation.
  • I will add labels to the PR, such as pr-type/bug-fix, pr-type/feature-development, etc.

Summary

Add pagination in github_graphql job collector task.
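
For readers unfamiliar with the approach, here is a minimal sketch of cursor-based pagination, the general technique used with GraphQL connections (`pageInfo.hasNextPage` and `pageInfo.endCursor`). It is not the code in this PR: `page`, `fetchFunc`, `collectAllJobs`, and `fakeFetcher` are hypothetical names chosen for illustration, and the fetcher is an in-memory fake rather than a real GitHub GraphQL call.

```go
package main

import (
	"errors"
	"fmt"
	"strconv"
)

// page is a hypothetical stand-in for one page of a GraphQL connection:
// the nodes plus the pageInfo fields (hasNextPage, endCursor).
type page struct {
	Jobs        []string
	HasNextPage bool
	EndCursor   string
}

// fetchFunc is a hypothetical fetcher. It returns the page that starts
// after the given cursor; an empty cursor means "first page".
type fetchFunc func(cursor string, pageSize int) (page, error)

// collectAllJobs keeps requesting small pages and following the cursor
// until the server reports there is no next page. It returns every job
// plus the number of requests that were made.
func collectAllJobs(fetch fetchFunc, pageSize int) ([]string, int, error) {
	if pageSize <= 0 {
		return nil, 0, errors.New("pageSize must be positive")
	}
	var all []string
	cursor := ""
	requests := 0
	for {
		p, err := fetch(cursor, pageSize)
		if err != nil {
			return nil, requests, fmt.Errorf("fetch after cursor %q: %w", cursor, err)
		}
		requests++
		all = append(all, p.Jobs...)
		if !p.HasNextPage {
			return all, requests, nil
		}
		cursor = p.EndCursor
	}
}

// fakeFetcher simulates a workflow run with `total` jobs (assumption:
// the cursor is just a stringified offset, which real APIs do not promise).
func fakeFetcher(total int) fetchFunc {
	return func(cursor string, pageSize int) (page, error) {
		start := 0
		if cursor != "" {
			n, err := strconv.Atoi(cursor)
			if err != nil {
				return page{}, err
			}
			start = n
		}
		end := start + pageSize
		if end > total {
			end = total
		}
		p := page{HasNextPage: end < total, EndCursor: strconv.Itoa(end)}
		for i := start; i < end; i++ {
			p.Jobs = append(p.Jobs, "job-"+strconv.Itoa(i))
		}
		return p, nil
	}
}

func main() {
	jobs, requests, err := collectAllJobs(fakeFetcher(250), 100)
	if err != nil {
		panic(err)
	}
	fmt.Printf("collected %d jobs in %d requests\n", len(jobs), requests)
}
```

With the fake above, a run with 250 jobs and a page size of 100 is collected in 3 requests. The point of the design is that each individual query stays small enough to finish within the API timeout, at the cost of more round trips.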

Does this close any open issues?

Closes #8028

Screenshots

We needed to extract data from a large, complex repository with over 30,000 workflow runs, some of which have more than 200 job runs.

Whenever the Collect Job task started, the query simply wouldn't finish in time, and the task entered the retry flow:

Screenshot from 2024-12-09 16-56-00

We tried reducing the API timeout, but that only increased the number of unsuccessful retries:

Screenshot from 2024-12-06 15-38-14

After we implemented the solution, the task was able to collect all the data, finishing after 17 hours:

image
(don't mind the log I added locally to debug)

image

For comparison, we also extracted data from a much less complex repository. Before the change, the pipeline took the following time:

Screenshot from 2024-12-11 08-11-06

After applying the solution (rerunning in hard refresh mode, of course), there was no change in overall pipeline time:

Screenshot from 2024-12-11 08-18-53

Other Information

Already merged on main branch, reopening as requested: #8233 (review)

@dosubot (bot) added the size:M, component/plugins, and pr-type/bug-fix labels on Dec 12, 2024
@ClaudioMascaro
Contributor Author

@klesh

@klesh
Contributor

klesh commented Dec 16, 2024

This looks good! Could you also update the end-to-end (e2e) test cases to reflect these changes? Thanks!

@ClaudioMascaro
Contributor Author

@klesh do you have an example of writing e2e tests for Collector tasks? I couldn't find a practical approach to that. Also, there are tests failing for the jira plugin.

@klesh force-pushed the feat/pagination-collect-jobs-release branch from 2462665 to 43caa9e on December 18, 2024 02:31
@klesh
Contributor

klesh commented Dec 18, 2024

Yes, that's unexpected; collectors shouldn't impact e2e tests.

Further investigation revealed that PR #8223 introduced the error, which I've addressed in PR #8243. Everything should be working correctly now.

My apologies for the inconvenience, and thank you for your contribution!

@klesh merged commit 7beae18 into apache:release-v1.0 on Dec 18, 2024

Labels

component/plugins, pr-type/bug-fix, size:M
