Source SalesForce: Remove pagination and query limits #25700
/test connector=connectors/source-salesforce
Build Passed
I'd like to publish a pre-release with:
/publish connector=connectors/source-salesforce pre-release=true
Is a review still required on this, @artem1205?
@maxi297, yes. I've reduced the slice interval to 30 days; we can try this once more on the Doxo test connection.
airbyte-integrations/connectors/source-salesforce/source_salesforce/streams.py
Cause OOM for large responses
# Conflicts: # docs/integrations/sources/salesforce.md
/publish connector=connectors/source-salesforce pre-release=true
Tested in production as AB testing and it worked so I'll approve
/publish connector=connectors/source-salesforce
@@ -432,7 +432,7 @@ def download_data(self, url: str, chunk_size: int = 1024) -> tuple[str, str]:
         # set filepath for binary data from response
         tmp_file = os.path.realpath(os.path.basename(url))
         with closing(self._send_http_request("GET", f"{url}/results", stream=True)) as response, open(tmp_file, "wb") as data_file:
-            response_encoding = response.apparent_encoding or response.encoding or self.encoding
+            response_encoding = response.encoding or self.encoding
can you clarify why you made this change?
The response.apparent_encoding property tries to guess the encoding by reading the whole response content, which caused OOM for large responses.
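To illustrate the point above, here is a minimal sketch of a streamed download that relies only on the declared encoding and never reads apparent_encoding. It is not the connector's actual code: the function name download_to_file, the fallback_encoding parameter, and the temporary-file handling are assumptions made for this example. It expects any object that exposes encoding and iter_content the way a requests response opened with stream=True does.

```python
import os
import tempfile
from typing import Iterable, Optional, Protocol, Tuple


class StreamedResponse(Protocol):
    """Minimal shape of a streamed HTTP response (modelled on requests.Response)."""

    encoding: Optional[str]

    def iter_content(self, chunk_size: int) -> Iterable[bytes]: ...


def download_to_file(
    response: StreamedResponse,
    fallback_encoding: str = "utf-8",  # hypothetical default, stands in for self.encoding
    chunk_size: int = 1024,
) -> Tuple[str, str]:
    """Write the body to a temp file chunk by chunk; return (path, encoding)."""
    # Use only the encoding declared by the server, then the fallback.
    # apparent_encoding is deliberately not consulted: it inspects the
    # full body, which defeats streaming for large responses.
    encoding = response.encoding or fallback_encoding

    fd, path = tempfile.mkstemp(suffix=".csv")
    with os.fdopen(fd, "wb") as data_file:
        for chunk in response.iter_content(chunk_size=chunk_size):
            data_file.write(chunk)
    return path, encoding
```

With this shape, memory use is bounded by chunk_size rather than by the size of the response body.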
makes sense!
@@ -579,7 +579,7 @@ def transform_empty_string_to_none(instance: Any, schema: Any):


 class IncrementalRestSalesforceStream(RestSalesforceStream, ABC):
     state_checkpoint_interval = 500
-    STREAM_SLICE_STEP = 120
+    STREAM_SLICE_STEP = 30
what is the impact of changing this? Do other salesforce syncs now take longer?
👍🏼
# Conflicts: # airbyte-config-oss/init-oss/src/main/resources/seed/oss_registry.json
I was just asking questions, don't block on my review!
* Source SalesForce: remove pagination and query limits
* Source SalesForce: remove pagination and query limits
* Reduce slice interval to 30 days
* Source SalesForce: remove apparent_encoding guess for response; Cause OOM for large responses
* Source SalesForce: bump versions
---------
Co-authored-by: artem1205 <artem1205@users.noreply.github.com>
What
Resolving https://github.com/airbytehq/oncall/issues/1787
How
Remove pagination and query limits; use only stream slices instead.
After these changes, the connector successfully downloaded about 440k records.
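As a rough illustration of relying on stream slices alone, the sketch below splits a date range into consecutive windows of STREAM_SLICE_STEP days (30, the value from the diff). The function name, its signature, and the start_date/end_date keys are assumptions for this example and may not match the connector's real implementation.

```python
from datetime import datetime, timedelta, timezone
from typing import Dict, Iterator

STREAM_SLICE_STEP = 30  # days; the value this PR sets in the diff


def stream_slices(
    start: datetime,
    end: datetime,
    step_days: int = STREAM_SLICE_STEP,
) -> Iterator[Dict[str, str]]:
    """Yield contiguous [start_date, end_date) windows covering start..end."""
    step = timedelta(days=step_days)
    cursor = start
    while cursor < end:
        # Clamp the last window so it never runs past the requested end.
        slice_end = min(cursor + step, end)
        # Key names are hypothetical; each slice would drive one bounded query.
        yield {"start_date": cursor.isoformat(), "end_date": slice_end.isoformat()}
        cursor = slice_end


if __name__ == "__main__":
    first_day = datetime(2023, 1, 1, tzinfo=timezone.utc)
    last_day = datetime(2023, 3, 15, tzinfo=timezone.utc)
    for window in stream_slices(first_day, last_day):
        print(window)
```

A smaller step means more slices, each returning less data, which is the trade-off the reviewer asks about when the step drops from 120 to 30 days.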
🚨 User Impact 🚨
no breaking changes
Pre-merge Actions
Updating a connector
Community member or Airbyter
Airbyter
If this is a community PR, the Airbyte engineer reviewing this PR is responsible for the below items.