Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DatetimeBasedCursor should support incremental syncs without steps #26570

Closed
5 tasks
maxi297 opened this issue May 25, 2023 · 2 comments
Closed
5 tasks

DatetimeBasedCursor should support incremental syncs without steps #26570

maxi297 opened this issue May 25, 2023 · 2 comments
Assignees
Labels

Comments

@maxi297
Copy link
Contributor

maxi297 commented May 25, 2023

What area the feature impact?

Connectors

Revelant Information

Tell us about the problem you're trying to solve

Today when configuring an incremental sync, the concept of step and granularity is very confusing for a couple of reasons:

  • It is implementation details and the user doesn't really understand why
  • The possible values are complicated

It is possible to create a sync that is incremental without providing those. Therefore, those values should be optional.

Describe the solution you’d like

Bundle step and granularity together and make them optional. This means that this YAML component is valid:

    incremental_sync:
      type: "DatetimeBasedCursor"
      start_datetime:
        datetime: "{{ config['start_date'] }}"
        datetime_format: "%Y-%m"
      datetime_format: "%Y-%m-%dT%H:%M:%S%z"
      cursor_field: "pub_date"

which would lead to only one slice.

However, we can define it like this:

    incremental_sync:
      type: "DatetimeBasedCursor"
      start_datetime:
        datetime: "{{ config['start_date'] }}"
        datetime_format: "%Y-%m"
      step: "P1M"
      cursor_granularity: "PT1S"
      datetime_format: "%Y-%m-%dT%H:%M:%S%z"
      cursor_field: "pub_date"

... which will create multiple slices. We can set the step totimedelta.max within the DatetimeBasedCursor or something when the step is not provided. It should stop iterating at end_datetime so it should be fine.

Acceptance Criteria

  • Implement the proposed solution in the CDK
    • Throw and error when only one between step and cursor_granularity is defined
  • Deprecated step and cursor_granularity
  • Update Connector Builder to reflect that. Right now, step is marked as “advanced” but not the cursor granularity and therefore those concepts are far away even though they are linked. It should be clear that both go together. We should as well set them both behind a checkbox under “Advanced”
  • Update the documentation (docs folder and declarative_component_schema.yaml)
@lmossman
Copy link
Contributor

lmossman commented May 26, 2023

A connector builder user has run into an issue that I believe is related to Step. Pasting my email response here for context:

I have a theory as to what is going on here, though it is hard for me to test it without access to your account, so I'd like you to try out a change to see if it fixes the problem.

What I think may be happening here is that because your connector's Incremental Sync section is only configured to pass a "start time" to the API and not any "end time", our connector framework is not able to break up the entire set of results into separate timespan chunks, because it can only provide a start time to the API.

The problem with this is that the "Step" configuration (in the "Advanced" part of the Incremental Sync section) defaults to P1M (i.e. 1 month), which means the connector is trying to break up the requests into 1-month timeframe chunks to have better checkpointing in the case of failed syncs. But since it cannot provide an end time to the API, it ends up re-requesting the entire set of records, but with start times that are increased by 1 month at a time. You can read more about Step in our docs here if you are interested.

TL;DR: Because end times are not being passed to the API, the Step configuration is causing this to re-request data that has already been requested.

To fix this, please try setting the Step value to a very large timeframe, such as P1000Y (i.e. 1000 years), which will mean that the low-code framework will not try to break up the timeframe into chunks and it should work as expected. Requests will still be paginated of course.

Another way to fix this would be to configure your connector to also pass an end date to the API, but this is likely more complicated to do.

@maxi297
Copy link
Contributor Author

maxi297 commented May 30, 2023

Grooming notes:

  • Make them optional and once we have a way to migrate these, migrate them to the new format. In the meanwhile, UI would need to support both

At first, the proposition was to have a steps subobject like this:

    incremental_sync:
      type: "DatetimeBasedCursor"
      start_datetime:
        datetime: "{{ config['start_date'] }}"
        datetime_format: "%Y-%m"
      steps:
        duration: "P1M"
        cursor_granularity: "PT1S"
      datetime_format: "%Y-%m-%dT%H:%M:%S%z"
      cursor_field: "pub_date"

However, this would implies a migration. Therefore, we will keep the same schema for now

maxi297 added a commit that referenced this issue Jun 5, 2023
maxi297 added a commit that referenced this issue Jun 5, 2023
* [ISSUE #26570] make step and cursor_granularity optional

* [ISSUE #26570] fix typos
@maxi297 maxi297 closed this as completed Jun 5, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

3 participants