Source-Salesforce: Bulk API limited to 10,000 records #6122

Closed
jagannathsrs opened this issue Sep 16, 2021 · 6 comments · Fixed by #6209

Comments

@jagannathsrs
Contributor

jagannathsrs commented Sep 16, 2021

Environment

  • Airbyte version: 0.29.17-alpha
  • OS Version / Instance: AWS EC2
  • Deployment: Docker
  • Source Connector and version: salesforce 0.1.0
  • Destination Connector and version: snowflake 0.3.13
  • Severity: High
  • Step where error happened: Sync Job

Current Behavior

When using the Bulk API option for Salesforce source, it only pulls 10,000 records for any object.

Expected Behavior

Should be able to pull more than that since the SF Bulk API documentation does not specify any record limitation, only file size limitation.

I forked the SF repo and changed the limit from 10,000 to 500,000, and it works better. Is there any reason this limit was applied in the first place?

Are you willing to submit a PR?

Yes

@jagannathsrs jagannathsrs added the type/bug Something isn't working label Sep 16, 2021
@marcosmarxm
Member

thanks, @jagannathsrs! @yevhenii-ldv can you give your opinion on this?

@yevhenii-ldv
Contributor

Hello @jagannathsrs! This limit applies to a single request for a stream's data. There is also pagination: when a request does not return all the data, the connector asks for the next batch.
Which streams are you having a problem with?
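
For readers following along, the interaction between a per-request limit and pagination can be sketched roughly as below. This is a minimal illustration and not the connector's actual code: `fetch_page`, `read_all`, the in-memory `records` list, and the `Id`-based cursor are all hypothetical stand-ins chosen for the example. Only the 10,000 figure comes from this issue.

```python
from typing import Dict, Iterator, List, Optional

PAGE_LIMIT = 10_000  # per-request cap discussed in this issue


def fetch_page(records: List[Dict], after_id: Optional[int],
               limit: int = PAGE_LIMIT) -> List[Dict]:
    """Hypothetical stand-in for one request: return at most `limit`
    records, ordered by Id, whose Id is greater than `after_id`."""
    ordered = sorted(records, key=lambda r: r["Id"])
    if after_id is not None:
        ordered = [r for r in ordered if r["Id"] > after_id]
    return ordered[:limit]


def read_all(records: List[Dict], limit: int = PAGE_LIMIT) -> Iterator[Dict]:
    """Keep requesting pages until one comes back shorter than the limit."""
    after_id: Optional[int] = None
    while True:
        page = fetch_page(records, after_id, limit)
        yield from page
        if len(page) < limit:
            break  # a short (or empty) page means nothing is left
        after_id = page[-1]["Id"]  # cursor for the next request


if __name__ == "__main__":
    data = [{"Id": i} for i in range(1, 25_001)]  # toy data, 25,000 rows
    print(len(fetch_page(data, None)))  # a single request stops at the cap
    print(len(list(read_all(data))))    # the paginated read gets every row
```

If the loop in `read_all` stopped after its first page, every large stream would end up with exactly `PAGE_LIMIT` rows, which matches the symptom reported here.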

@jagannathsrs
Contributor Author

jagannathsrs commented Sep 16, 2021

Hey @yevhenii-ldv! I am facing this issue with both custom and standard streams. So far I've tried Account, AccountContactRelation, and Contact. All of them have more than 50k records, and only 10,000 records were synced.

@yevhenii-ldv
Contributor

@jagannathsrs Thanks a lot for the information, I'll see what might not work correctly.

@marcosmarxm marcosmarxm added the area/connectors Connector related issues label Sep 16, 2021
@jagannathsrs
Contributor Author

jagannathsrs commented Sep 16, 2021

Here is some more data to help you debug: row counts from the Airbyte native connector and from the modified connector, where I raised the limit to 500,000 records.

Note that "Did not Sync" is not an issue; I forgot to select those tables while syncing.

SF Connector settings: BULK API, Sync Mode: Dedup+History, startDate: 2010

| TABLE_NAME | ROW_COUNT_NATIVE_CONNECTOR | ROW_COUNT_MODIFIED_CONNECTOR |
|---|---|---|
| ACCOUNT | 10001 | 12978 |
| ACCOUNTCONTACTRELATION | 10000 | 21669 |
| AFFILIATE_NETWORK__C | 68 | 68 |
| BILLING_PROFILE__C | 10000 | 77579 |
| CONTACT | 10000 | 14282 |
| OPPORTUNITY | 10000 | 89191 |
| PRODUCT2 | 10000 | 201618 |
| QUOTE_LINE__C | 10000 | 282465 |
| QUOTE__C | 10000 | 88705 |
| SITE_STORE__C | 10000 | 11770 |
| TENANT_NETWORK_STORE__C | 10000 | Did not Sync |
| TENANT__C | 15 | Did not Sync |
| USER | 292 | 292 |

I am also attaching the logs from both the runs. Hope these help!
airbyte-sf-native-logs.txt
airbyte-sf-modified-logs.txt

@yevhenii-ldv
Contributor

@jagannathsrs

We just merged this fix into master and released a new version of the connector.

Upgrade your connector to version 0.1.1 and get started. To upgrade your connector version, go to the admin panel on the left-hand side of the UI, find this connector in the list, and enter the latest connector version.

Please let us know if you have any further questions.

Enjoy!
