fix: add pagination to job_collector task #8233

Merged: klesh merged 1 commit into apache:main from ClaudioMascaro:feat/pagination-collect-jobs on Dec 12, 2024

@ClaudioMascaro ClaudioMascaro commented Dec 6, 2024

⚠️ Pre Checklist

Please complete ALL items in this checklist, and remove before submitting

  • I have read through the Contributing Documentation.
  • I have added relevant tests.
  • I have added relevant documentation.
  • I will add labels to the PR, such as pr-type/bug-fix, pr-type/feature-development, etc.

Summary

Add pagination in github_graphql job collector task.
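
For readers unfamiliar with cursor-based pagination, here is a minimal sketch of the general technique in Go. It is not the code from this PR: `FetchPage`, `CollectAllJobs`, and `FakeFetch` are hypothetical names, and the in-memory fake stands in for a real GraphQL request. The only assumption about the API is that the connection exposes `pageInfo { endCursor hasNextPage }`, as GitHub's GraphQL connections do.

```go
package main

import (
	"fmt"
	"strconv"
)

// PageInfo mirrors the pageInfo object returned by GraphQL connections.
type PageInfo struct {
	EndCursor   string
	HasNextPage bool
}

// JobPage is one page of job names plus the cursor data for the next page.
type JobPage struct {
	Jobs     []string
	PageInfo PageInfo
}

// FetchPage is a hypothetical stand-in for a single GraphQL request that
// asks for `first` items located after the cursor `after`.
type FetchPage func(after string, first int) (JobPage, error)

// CollectAllJobs requests bounded pages until the server reports that no
// further page exists, so no single query has to return everything.
func CollectAllJobs(fetch FetchPage, pageSize int) ([]string, error) {
	if pageSize <= 0 {
		return nil, fmt.Errorf("pageSize must be positive, got %d", pageSize)
	}
	var all []string
	cursor := "" // an empty cursor means "start from the first item"
	for {
		page, err := fetch(cursor, pageSize)
		if err != nil {
			return nil, fmt.Errorf("fetching page after %q: %w", cursor, err)
		}
		all = append(all, page.Jobs...)
		if !page.PageInfo.HasNextPage {
			return all, nil
		}
		cursor = page.PageInfo.EndCursor
	}
}

// FakeFetch simulates a server holding `total` jobs. The cursor is simply
// the stringified offset of the next item (an assumption for this demo;
// real cursors are opaque strings).
func FakeFetch(total int) FetchPage {
	return func(after string, first int) (JobPage, error) {
		start := 0
		if after != "" {
			n, err := strconv.Atoi(after)
			if err != nil {
				return JobPage{}, fmt.Errorf("invalid cursor %q: %w", after, err)
			}
			start = n
		}
		end := start + first
		if end > total {
			end = total
		}
		page := JobPage{}
		for i := start; i < end; i++ {
			page.Jobs = append(page.Jobs, fmt.Sprintf("job-%d", i))
		}
		page.PageInfo = PageInfo{EndCursor: strconv.Itoa(end), HasNextPage: end < total}
		return page, nil
	}
}
```

Calling `CollectAllJobs(FakeFetch(205), 50)` walks five pages and returns all 205 job names. Each request stays small, which is the property that matters when a single unpaginated query cannot finish before the timeout.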

Does this close any open issues?

Closes #8028

Screenshots

We needed to extract data from a large and complex repository with over 30,000 workflow runs, some of which have more than 200 job runs.

Whenever the Collect Job task started, it could not finish the query in time and fell into the retry flow:

Screenshot from 2024-12-09 16-56-00

We tried reducing the API timeout, but that only increased the number of unsuccessful retries:

Screenshot from 2024-12-06 15-38-14

After we implemented pagination, the task handled our case and collected all the data in 17 hours:

image
(Please ignore the extra log line; I added it locally for debugging.)

image

For comparison, we also extracted data from a much less complex repository. Before the change, the pipeline took the following time:

Screenshot from 2024-12-11 08-11-06

After applying the change (and rerunning in hard refresh mode, of course), the overall pipeline time did not change:

Screenshot from 2024-12-11 08-18-53

Other Information

Any other information that is important to this PR.

@ClaudioMascaro ClaudioMascaro marked this pull request as ready for review December 7, 2024 11:10
@dosubot (bot) added the size:M, component/plugins, and pr-type/bug-fix labels Dec 7, 2024
klesh commented Dec 9, 2024

Thank you for your contribution!

Have you had a chance to test the code? If so, it would be great if you could share some screenshots to help us review. Thank you!

@ClaudioMascaro ClaudioMascaro force-pushed the feat/pagination-collect-jobs branch from 1d9685a to 83f3e97 Compare December 10, 2024 16:47
@ClaudioMascaro (Contributor, Author) commented

Hey @klesh

After some testing, I have arrived at a final solution. All the evidence is in the PR description. Thanks!

@klesh klesh left a comment

LGTM, thanks for your contribution.
Would you like to submit another PR to the release-v1.0 branch so it can be released to the community sooner?

@klesh klesh merged commit 47b4014 into apache:main Dec 12, 2024
@ClaudioMascaro (Contributor, Author) commented

@klesh Sure, here it is: #8240 🙌

Development

Successfully merging this pull request may close these issues.

[Bug][Github] GraphQL API requests will eventually fail forever collecting large repositories data

2 participants