
Source Intercom: Support high volume syncs #11595

Open
sherifnada opened this issue Mar 30, 2022 · 13 comments

Comments

@sherifnada
Contributor

sherifnada commented Mar 30, 2022

Tell us about the problem you're trying to solve

A user was trying to sync a high-volume Intercom instance (logs below). The connector spent over 50 hours syncing data from the contacts stream. This is a bad user experience, as it does not let them make use of the product and their data quickly.
logs-97415.txt

Note that this issue is not just about doing this for Intercom; it should also be used as a learning opportunity for how this can be done at the CDK level, as described in airbytehq/airbyte-internal-issues#504.

Describe the solution you’d like

I would like us to find a way to speed up Intercom syncs coming from high-volume instances such as this one. Ideally, a sync should take no longer than a couple of hours in 99% of cases.

@lazebnyi
Collaborator

@sherifnada do you think this performance issue is related to the source connector itself?

As I understand it, @alafanechere talks about that here: #12671 (comment)

@lazebnyi lazebnyi self-assigned this May 19, 2022
@sherifnada
Contributor Author

@lazebnyi this should not block certifications atm

@lazebnyi lazebnyi removed their assignment May 23, 2022
@misteryeo
Contributor

Team, let's pick this back up alongside an investigation of: https://github.com/airbytehq/oncall/issues/274. Please reach out to @sherifnada when you dig in to gain access to the impacted workspace.

@bazarnov
Collaborator

@sherifnada
Can I have the creds for this high-volume data account so I can proceed with tests?

@marcosmarxm
Member

Another complaint in Discourse: https://airbyte7538.zendesk.com/agent/tickets/1459

And in another Intercom issue on GitHub, #12506, it looks like the contacts stream took 15h to finish; in that case the stream holds the majority of the data (1MM records).

2022-05-02 00:53:35 source > Read 1002750 records from contacts stream
2022-05-02 00:53:35 source > Finished syncing contacts
2022-05-02 00:53:35 source > SourceIntercom runtimes:
Syncing stream admins 0:00:02.460130
Syncing stream contacts 15:01:43.301241
2022-05-02 00:53:35 source > Syncing stream: tags
2022-05-02 00:53:37 INFO i.a.w.DefaultReplicationWorker(lambda$getReplicationRunnable$5):301 - Records read: 1004000 (1 GB)
2022-05-02 00:53:37 source > Read 1268 records from tags stream
2022-05-02 00:53:37 source > Finished syncing tags
2022-05-02 00:53:37 source > SourceIntercom runtimes:
Syncing stream admins 0:00:02.460130
Syncing stream contacts 15:01:43.301241
Syncing stream tags 0:00:01.593165

@marcosmarxm
Member

Zendesk ticket #1459 has been linked to this issue.

@marcosmarxm
Member

Comment made from Zendesk by Marcos Marx on 2022-07-05 at 12:28:

Hello Alelxis, there is an issue on GitHub, #11595, about improving Intercom sync speed. I looked at the code implementation and this stream doesn't have any special code compared to other streams (companies, tags, segments). In any case, I'll get back to you when the issue is resolved.

@IzioDev

IzioDev commented Aug 5, 2022

We disabled incremental for the contacts stream and swapped from /contacts/search (POST) to /contacts (GET), and this solved the request throttling.
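
For anyone looking at the same workaround, here is a minimal sketch of what reading contacts through the plain list endpoint could look like with the Python CDK's HttpStream and Intercom's cursor pagination (pages.next.starting_after). The class name, page size, and field handling are illustrative assumptions, not the actual connector code:

```python
from typing import Any, Iterable, Mapping, Optional

import requests
from airbyte_cdk.sources.streams.http import HttpStream


class ContactsList(HttpStream):
    """Reads contacts via GET /contacts instead of POST /contacts/search (illustrative sketch)."""

    url_base = "https://api.intercom.io/"
    primary_key = "id"
    page_size = 150  # illustrative; check the documented per_page maximum

    def path(self, **kwargs) -> str:
        return "contacts"

    def request_params(
        self, next_page_token: Optional[Mapping[str, Any]] = None, **kwargs
    ) -> Mapping[str, Any]:
        params = {"per_page": self.page_size}
        if next_page_token:
            params.update(next_page_token)
        return params

    def next_page_token(self, response: requests.Response) -> Optional[Mapping[str, Any]]:
        # Intercom list endpoints expose the next cursor under pages.next.starting_after
        next_page = response.json().get("pages", {}).get("next") or {}
        cursor = next_page.get("starting_after")
        return {"starting_after": cursor} if cursor else None

    def parse_response(self, response: requests.Response, **kwargs) -> Iterable[Mapping]:
        yield from response.json().get("data", [])
```

Note that switching to the list endpoint gives up the server-side filtering that /contacts/search provides, so this effectively trades incremental reads for fewer throttled requests, which matches the workaround described above.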

@bazarnov
Collaborator

bazarnov commented Aug 5, 2022

We disabled incremental for the contacts stream and swapped from /contacts/search (POST) to /contacts (GET), and this solved the request throttling.

How many records do you have in the contacts stream?

@IzioDev

IzioDev commented Aug 5, 2022

We disabled incremental for the contacts stream and swapped from /contacts/search (POST) to /contacts (GET), and this solved the request throttling.

How many records do you have in the contacts stream?

More than 9 GB according to the logs.

@sherifnada
Contributor Author

One potentially promising direction here is to use the export functionality of the Intercom API. More information here: https://developers.intercom.com/intercom-api-reference/reference/export-job-model
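
If that direction is pursued, a rough sketch of the create/poll/download flow could look like the following. The endpoint paths, field names (job_identifier, status, the download path) and the 30-second poll interval are assumptions based on my reading of the export-job docs and should be verified against the current API reference:

```python
import time

import requests

API_BASE = "https://api.intercom.io"
HEADERS = {"Authorization": "Bearer <access_token>", "Accept": "application/json"}


def run_export(created_after: int, created_before: int) -> bytes:
    # 1. Create an export job for the given unix-timestamp window (assumed endpoint).
    job = requests.post(
        f"{API_BASE}/export/content/data",
        headers=HEADERS,
        json={"created_after": created_after, "created_before": created_before},
    ).json()
    job_id = job["job_identifier"]

    # 2. Poll the job until the API reports it has completed.
    while True:
        status = requests.get(
            f"{API_BASE}/export/content/data/{job_id}", headers=HEADERS
        ).json()
        if status["status"] == "completed":
            break
        time.sleep(30)  # arbitrary poll interval for illustration

    # 3. Download the resulting archive (gzipped CSV) in a single request.
    download = requests.get(
        f"{API_BASE}/download/content/data/{job_id}",
        headers={**HEADERS, "Accept": "application/octet-stream"},
    )
    return download.content
```

The appeal is that a single bulk export replaces many thousands of paginated, rate-limited API calls, though (as noted later in this thread) the export jobs may not cover every stream.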

@mrhallak
Contributor

mrhallak commented Sep 7, 2022

@sherifnada We are currently facing this issue with company_segments taking at least 12 hours

@bazarnov bazarnov removed their assignment Oct 18, 2022
@bazarnov
Collaborator

bazarnov commented Feb 9, 2023

@sherifnada
The link https://developers.intercom.com/intercom-api-reference/reference/export-job-model is not available.
Instead, this one works fine: https://developers.intercom.com/intercom-api-reference/reference/the-export-job-model

The Export Jobs are available for the Messages stream only and are used along with the Unstable API version.
More context here: #9188 (comment)

Unfortunately, we cannot use it for all of the available streams for now.

@mrhallak
As for the company_segments stream, it's slow by nature, since it depends on the Companies stream. Neither of them allows filtering records on the API side, so we have to fetch all of the data from both and then filter for the latest. There is no workaround for this right now.

The general speed of the connector has already been tuned to its max, considering the rate limits and the caching strategy. The other option is to make dependent streams call their endpoints in async mode (in theory, of course); see the sketch below.
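
A minimal sketch of that async idea, assuming a per-company segments endpoint and a hand-picked concurrency limit (both illustrative; any real implementation would still need to respect the documented rate limits):

```python
import asyncio

import aiohttp

API_BASE = "https://api.intercom.io"
HEADERS = {"Authorization": "Bearer <access_token>", "Accept": "application/json"}


async def fetch_company_segments(session, semaphore, company_id):
    # Fetch the segments attached to one company (endpoint path is an assumption).
    async with semaphore:
        async with session.get(f"{API_BASE}/companies/{company_id}/segments") as resp:
            body = await resp.json()
            return company_id, body.get("data", [])


async def fetch_all_segments(company_ids, concurrency=10):
    # Bound concurrency so we don't immediately run into the API rate limit.
    semaphore = asyncio.Semaphore(concurrency)
    async with aiohttp.ClientSession(headers=HEADERS) as session:
        tasks = [fetch_company_segments(session, semaphore, cid) for cid in company_ids]
        return dict(await asyncio.gather(*tasks))


# Example usage (company ids are placeholders):
# results = asyncio.run(fetch_all_segments(["<company_id_1>", "<company_id_2>"]))
```

Whether this helps in practice depends on how much of the 12h+ runtime is spent waiting on individual requests versus being rate limited by the API.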

@bleonard bleonard added the frozen (Not being actively worked on) label Mar 22, 2024