Source Hubspot: Engagements stream rate-limited in incremental sync #26365

Closed
1 task
nataliekwong opened this issue May 22, 2023 · 4 comments · Fixed by #27161
Assignees
Labels
area/connectors Connector related issues team/connectors-python type/bug Something isn't working

Comments

@nataliekwong
Contributor

Connector Name

source-hubspot

Connector Version

0.2.3

What step the error happened?

During the sync

Relevant information

The Hubspot connector has a stream "Engagements".

That stream erroneously uses this endpoint (https://legacydocs.hubspot.com/docs/methods/engagements/get-recent-engagements) for incremental syncs, a change introduced in this PR: #8887. The issue with this approach is that the endpoint is limited to returning only 10,000 records.

That means that a user, when selecting an incremental sync mode for this stream, will only receive a maximum of 10,000 records. This problem has been observed here: #23630. This means the user experience differs when selecting Full Refresh or Incremental. This is particularly important for the Engagements stream because it is event-based, which means it contains 1 row per event and is naturally a table with many rows.

The stream can use this endpoint instead: https://legacydocs.hubspot.com/docs/methods/engagements/get-all-engagements. As a result, when syncing incrementally, more than 10,000 records will be synced.

Relevant log output

No response

Contribute

  • Yes, I want to contribute
@nataliekwong
Contributor Author

To be added to the Epic for stream parity

@sherifnada
Contributor

thank you @nataliekwong for the incredible detail in the ticket. Super helpful.

@sherifnada
Contributor

sherifnada commented May 25, 2023

Some notes after reading the API docs:

  • Getting all engagements (/engagements/v1/engagements/paged) means we might no longer be doing incremental for this stream. Since this is a stream with many rows, this is not great. Ideally we find a way to get both incremental sync and avoid the 10k limit.
  • I can see a few ways to achieve both goals (incremental, and more than 10k records):
    • repeatedly use the incremental endpoint (/engagements/v1/engagements/recent/modified). Assuming it returns data in ASC order, we can call it as many times as needed to get all the data, each time passing the largest updated_at value we saw in the previous response, until we're done. This only works if it returns data ASC though. The docs make no guarantees about this.
    • if the above approach doesn't work, then if the /engagements/v1/engagements/paged endpoint for getting all engagements returns data DESC, we can get all the data in a DESC order then exit the sync once we've synced all the data up to the cursor value from the previous sync
    • Alternatively, the incremental endpoint returns a total field which says how many records will be in the response. If total >= 10000 then use the full refresh endpoint, otherwise use the incremental endpoint. If we go with this approach, in the case where we end up using the full refresh endpoint, we could filter out results generated/updated before the input cursor value (client-side incremental) to at least not rewrite the entire stream to the destination. In all cases we should make it clear to the user that we may be running a full_refresh so they're not surprised if their sync takes a lot longer.
  • A separate question we need to answer is: should we be using the newer v3 CRM API to pull engagement data? Why aren't we using that and still on the legacy API? We can use the same solutions listed above except using the CRM Search API for incremental and the dedicated engagement endpoints for full refresh.
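
To make two of the options above concrete, here is a rough Python sketch of the "advance the cursor repeatedly" idea and the "fall back to full refresh when total >= 10000" idea. It is not the connector's actual code: `fetch_recent`, `fetch_all`, the flat `id` / `updated_at` record shape and the function names are all hypothetical stand-ins for whatever wraps the two endpoints, and the first function is only correct if the incremental endpoint returns data in ASC order, which the docs do not guarantee.

```python
from typing import Callable, Dict, Iterable, Iterator, List

# Hypothetical flat record shape: {"id": ..., "updated_at": <epoch ms>}.
Record = Dict[str, int]

# Cap on the incremental endpoint, as described in this issue.
RECENT_LIMIT = 10_000


def sync_by_advancing_cursor(
    fetch_recent: Callable[[int], List[Record]],
    cursor: int,
) -> Iterator[Record]:
    """Call the incremental endpoint repeatedly, moving the cursor forward.

    Assumes fetch_recent(since) returns records with updated_at >= since,
    sorted ASC. If the API does not sort ASC, this can skip records.
    """
    seen_ids = set()  # the boundary record is returned again on the next call
    while True:
        batch = fetch_recent(cursor)
        new_records = [r for r in batch if r["id"] not in seen_ids]
        if not new_records:
            return  # nothing newer than what we already emitted
        for record in new_records:
            seen_ids.add(record["id"])
            yield record
        # Largest updated_at seen so far becomes the next starting point.
        cursor = max(r["updated_at"] for r in new_records)


def sync_with_fallback(
    total: int,
    fetch_recent: Callable[[int], List[Record]],
    fetch_all: Callable[[], Iterable[Record]],
    cursor: int,
    limit: int = RECENT_LIMIT,
) -> Iterator[Record]:
    """Use the incremental endpoint unless `total` says it would be truncated."""
    if total < limit:
        yield from fetch_recent(cursor)
        return
    # Full refresh endpoint, filtered client-side so that only records at or
    # after the previous cursor are written to the destination.
    for record in fetch_all():
        if record["updated_at"] >= cursor:
            yield record
```

Both functions take the fetchers as arguments so the ordering assumption and the fallback threshold can be checked against fake data before touching the real API.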

@nataliekwong
Contributor Author

nataliekwong commented Jun 29, 2023

@midavadim Is it possible to link the PR to this issue as well? #23630


3 participants