What happened
When collecting CircleCI pipelines, the time range specified in the sync policy has no effect on the data collected - pipelines are collected from before the specified date.
E.g. Sync policy settings set to collect from 1st June 2024:


Excerpt of the data JSON blob from the top row - it has created_at and updated_at values of 1st Feb 2024 (i.e. 180 days before today's date, 2024-07-30):
{
  "id" : "eae60b4c-7dcc-4293-8b00-45f18a494881",
  "updated_at" : "2024-02-01T14:11:31.988Z",
  "created_at" : "2024-02-01T14:11:31.988Z",
  ...
  "trigger" : {
    "received_at" : "2024-02-01T14:11:31.481Z",
    "type" : "webhook",
    ...
  },
  ...
}
What do you expect to happen
No CircleCI pipelines, workflows or jobs are collected from before the time range start point.
How to reproduce
- Set the sync policy time range to start at a date that is well within the CircleCI data retention period (e.g. if the retention period is 90 days, set it to 30 days ago - see below).
- Run the DevLake pipeline
- Sort the _raw_circleci_api_pipelines table by the created_at JSON property of the data column:
SELECT *
FROM _raw_circleci_api_pipelines
ORDER BY STR_TO_DATE(JSON_UNQUOTE(JSON_EXTRACT(CONVERT(data USING utf8mb4), '$.created_at')), '%Y-%m-%dT%H:%i:%s.%fZ') ASC;
- Compare the created_at property to the start date set in the sync policy time range.
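The comparison in the last step can also be scripted. The following is only a rough sketch in Python, assuming the values of the data column have already been exported as JSON strings; the helper names parse_created_at and pipelines_before are made up for illustration and are not part of DevLake.

```python
import json
from datetime import datetime, timezone


def parse_created_at(value: str) -> datetime:
    """Parse a timestamp such as "2024-02-01T14:11:31.988Z" as UTC."""
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def pipelines_before(raw_data_values, start):
    """Return (id, created_at) pairs for pipelines created before `start`.

    `raw_data_values` is an iterable of JSON strings, one per row of the
    data column (hypothetical input, exported from the raw table).
    """
    too_old = []
    for raw in raw_data_values:
        pipeline = json.loads(raw)
        created_at = parse_created_at(pipeline["created_at"])
        if created_at < start:
            too_old.append((pipeline["id"], created_at))
    # Oldest first, matching the ORDER BY ... ASC in the query above.
    return sorted(too_old, key=lambda pair: pair[1])


if __name__ == "__main__":
    rows = [
        '{"id": "eae60b4c-7dcc-4293-8b00-45f18a494881", '
        '"created_at": "2024-02-01T14:11:31.988Z"}',
    ]
    sync_start = datetime(2024, 6, 1, tzinfo=timezone.utc)
    for pipeline_id, created_at in pipelines_before(rows, sync_start):
        print(pipeline_id, created_at.isoformat())
```

Any row this reports was collected even though it predates the sync policy start date.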
Anything else
This is in part due to the recent pagination fix on the plugin (#7770). The pagination now works, but as the CircleCI API does not offer any date range controls for pagination, the collector loops through the pages until next_page_token is null, which only happens once the data retention limit for the account is hit (e.g. for me it is 180 days, but it could be less or more, see here).
When the collector subsequently attempts to collect the workflows and jobs for a pipeline that sits at the edge of the data retention range, it races against CircleCI cleaning up build data. If the data has already been removed, the request returns a 404 and the DevLake pipeline errors:
subtask collectWorkflows ended unexpectedly Wraps: (2) Retry exceeded 3 times calling /v2/pipeline/6b7c4513-56bd-4e0c-ad72-d562df7513b1/workflow. The last error was: Http DoAsync error calling [method:GET path:/v2/pipeline/6b7c4513-56bd-4e0c-ad72-d562df7513b1/workflow query:map[]]. Response: {:message "Pipeline not found"} (404) Error types: (1) *hintdetail.withDetail (2) *errors.errorString
There needs to be an additional check that the created_at property of each returned pipeline is not before the start of the specified time range.
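As a rough illustration of such a check (not the plugin's actual code, which is written in Go), the sketch below pages through results and stops as soon as it meets a pipeline older than the start date. It assumes that each page is a dict with an items list and a next_page_token, and that pipelines are returned newest first; fetch_page and collect_pipelines are hypothetical names.

```python
from datetime import datetime, timezone
from typing import Callable, Iterator, Optional


def parse_created_at(value: str) -> datetime:
    """Parse a timestamp such as "2024-02-01T14:11:31.988Z" as UTC."""
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def collect_pipelines(
    fetch_page: Callable[[Optional[str]], dict],
    start: datetime,
) -> Iterator[dict]:
    """Yield pipelines created at or after `start`, then stop paging.

    `fetch_page(token)` is a hypothetical callable returning one page as
    {"items": [...], "next_page_token": str or None}.
    Assumption: pages are ordered newest first, so the first pipeline
    older than `start` means every later one is older too.
    """
    token = None
    while True:
        page = fetch_page(token)
        for pipeline in page["items"]:
            if parse_created_at(pipeline["created_at"]) < start:
                return  # outside the time range: stop collecting
            yield pipeline
        token = page.get("next_page_token")
        if token is None:
            return  # no more pages (retention limit reached)
```

If the ordering assumption does not hold, the early return would need to become a per-item skip, at the cost of still walking every page.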
Version
44c3ecb