Skip to content

fix(circleci-plugin): pipeline collector time range#7820

Merged
klesh merged 4 commits into
apache:mainfrom
Nickcw6:fix/circleci-time-range
Aug 15, 2024
Merged

fix(circleci-plugin): pipeline collector time range#7820
klesh merged 4 commits into
apache:mainfrom
Nickcw6:fix/circleci-time-range

Conversation

@Nickcw6
Copy link
Copy Markdown
Contributor

@Nickcw6 Nickcw6 commented Aug 1, 2024

Summary

  • Fixes bug whereby CircleCI pipeline collector would continue to collect data and not consider the timeAfter config of the DevLake pipeline
  • Handle case where circleci workflow or job has been deleted

Does this close any open issues?

Closes #7797

Screenshots

Before:
Running DevLake pipeline with time range set to 2024-06-01 would pull in results from Feb (ie back 180 days until hitting circleci's data retention policy for my org):
image
image

After:
Only results from 2024-06-01 onwards are considered:
Screenshot 2024-07-31 at 19 26 56

Other Information

May have uncovered another bug in the CircleCI plugin whereby existing data is overwritten by a subsequent collection, see this comment Confirmed as intended behaviour for a full sync plugin

@dosubot dosubot Bot added size:M This PR changes 30-99 lines, ignoring generated files. component/plugins This issue or PR relates to plugins pr-type/bug-fix This PR fixes a bug severity/p1 This bug affects functionality or significantly affect ux labels Aug 1, 2024
// CircleCI API will return a 404 response for a workflow/job that has been deleted
// due to their data retention policy. We should ignore these errors.
if res.StatusCode == http.StatusNotFound {
return api.ErrIgnoreAndContinue
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't it be api.ErrFinishCollect ? There would be no more data to be collected if the reason was "data retention", would it?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With api.ErrFinishCollect it potentially misses workflow/jobs that are still available if the ci data isn't processed exactly in order from newest -> oldest. With api.ErrIgnoreAndContinue if that does occur those rows would still be collected.

E.g. say there are 600 pipelines to be collected, 1 has been deleted. These end up out of order and the deleted record gets processed as 300/600. The devlake pipeline would end with 300 rows uncollected.

There shouldn't be a significant number of deleted workflows & jobs for attempted collection (as the associated pipelines will also have been deleted), so the increase in devlake pipeline duration should be minimal & worth the reduced risk of missing rows.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Aah, I see. It makes total sense.
Sorry for taking so long I was on vacation and somehow missed the notification.

@Nickcw6
Copy link
Copy Markdown
Contributor Author

Nickcw6 commented Aug 14, 2024

Hey @klesh, is there an ETA on when this PR might be reviewed/merged?

Copy link
Copy Markdown
Contributor

@klesh klesh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

// CircleCI API will return a 404 response for a workflow/job that has been deleted
// due to their data retention policy. We should ignore these errors.
if res.StatusCode == http.StatusNotFound {
return api.ErrIgnoreAndContinue
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Aah, I see. It makes total sense.
Sorry for taking so long I was on vacation and somehow missed the notification.

@dosubot dosubot Bot added the lgtm This PR has been approved by a maintainer label Aug 15, 2024
@klesh klesh merged commit edc2412 into apache:main Aug 15, 2024
@klesh
Copy link
Copy Markdown
Contributor

klesh commented Aug 15, 2024

@Nickcw6 Thanks for your contribution.
Will release a new beta on Friday.

@jibsen-vh
Copy link
Copy Markdown
Contributor

@klesh looks like this was never cherry-picked into any beta or release. It's missing from 1.0.1 and from any of the 1.0.2 betas.
Thank you!
https://github.com/apache/incubator-devlake/blob/v1.0.1/backend/plugins/circleci/tasks/pipeline_collector.go
https://github.com/apache/incubator-devlake/blob/v1.0.2-beta6/backend/plugins/circleci/tasks/pipeline_collector.go

magichair pushed a commit to magichair/incubator-devlake that referenced this pull request Mar 18, 2025
* fix(circleci-plugin): only collect pipelines from after data time range

* fix(circleci-plugin): ignore 404 not found when collecting jobs or workflows
klesh pushed a commit that referenced this pull request Mar 19, 2025
* feat(circleci-plugin): incremental data collection (#7986)

* feat(api_collector_stateful): handle case were response has records from both before & after createdAfter of last collection

* feat(circleci-plugin): incremental collection for pipelines

* feat(api_collector_stateful): expose Input collector arg for StatefulFinalizableEntity to collect data based on previous collection

* feat(circleci-plugin): incremental data collection for workflows

* feat(circleci-plugin): incremental data collection for jobs

* refactor(circleci-plugin): use common query param function

* refactor(circleci-plugin): use BuildQueryParamsWithPageToken func when collecting unfinished job details

* fix(circleci-plugin): pipeline collector time range (#7820)

* fix(circleci-plugin): only collect pipelines from after data time range

* fix(circleci-plugin): ignore 404 not found when collecting jobs or workflows

* Cleanup from bad merge

---------

Co-authored-by: Nick Williams <65220492+Nickcw6@users.noreply.github.com>
Co-authored-by: John Ibsen <john@videa.ai>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

component/plugins This issue or PR relates to plugins lgtm This PR has been approved by a maintainer needs-cherrypick-v1.0 pr-type/bug-fix This PR fixes a bug severity/p1 This bug affects functionality or significantly affect ux size:M This PR changes 30-99 lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Bug][CircleCI Plugin] CircleCI pipelines collected from before time range

3 participants