Source SalesForce: Remove pagination and query limits #25700

Merged
merged 14 commits into from May 19, 2023

artem1205
Collaborator

@artem1205 artem1205 commented Apr 30, 2023

What

Resolving https://github.com/airbytehq/oncall/issues/1787

How

Remove pagination and query limits; use only stream slices instead.
After these changes, the connector successfully downloaded about 440k records.
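
As a rough illustration of the idea above (not the connector's actual code), reading by stream slices instead of paginating could look like the sketch below. The names `date_slices` and `build_query`, the `SystemModstamp` cursor field, and the SOQL-style query shape are assumptions made for this example.

```python
from datetime import datetime, timedelta, timezone
from typing import Iterator


def date_slices(start: datetime, end: datetime, step_days: int) -> Iterator[dict]:
    """Yield consecutive [start_date, end_date) windows that cover start..end."""
    cursor = start
    while cursor < end:
        # The last window is clipped so it never runs past `end`.
        window_end = min(cursor + timedelta(days=step_days), end)
        yield {"start_date": cursor.isoformat(), "end_date": window_end.isoformat()}
        cursor = window_end


def build_query(table: str, cursor_field: str, stream_slice: dict) -> str:
    """Build one bounded query per slice; no LIMIT clause or page token is used."""
    return (
        f"SELECT Id FROM {table} "
        f"WHERE {cursor_field} >= {stream_slice['start_date']} "
        f"AND {cursor_field} < {stream_slice['end_date']}"
    )


# Hypothetical date range, chosen only to show the windowing.
slices = list(
    date_slices(
        datetime(2023, 1, 1, tzinfo=timezone.utc),
        datetime(2023, 3, 1, tzinfo=timezone.utc),
        step_days=30,
    )
)
queries = [build_query("Account", "SystemModstamp", s) for s in slices]
```

Each slice is a self-contained request bounded by the cursor field, so the size of a single response is controlled by the window length rather than by a page size.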

Recommended reading order

  1. x.java
  2. y.python

🚨 User Impact 🚨

no breaking changes

Pre-merge Actions

Updating a connector

Community member or Airbyter

  • Grant edit access to maintainers (instructions)
  • Unit & integration tests added

Airbyter

If this is a community PR, the Airbyte engineer reviewing this PR is responsible for the below items.

  • Create a non-forked branch based on this PR and test the below items on it
  • Build is successful
  • If new credentials are required for use in CI, add them to GSM. Instructions.

@github-actions
Contributor

github-actions bot commented Apr 30, 2023

Before Merging a Connector Pull Request

Wow! What a great pull request you have here! 🎉

To merge this PR, ensure the following has been done/considered for each connector added or updated:

  • PR name follows PR naming conventions
  • Breaking changes are considered. If a Breaking Change is being introduced, ensure an Airbyte engineer has created a Breaking Change Plan and you've followed all steps in the Breaking Changes Checklist
  • Connector version has been incremented in the Dockerfile and metadata.yaml according to our Semantic Versioning for Connectors guidelines
  • Secrets in the connector's spec are annotated with airbyte_secret
  • All documentation files are up to date. (README.md, bootstrap.md, docs.md, etc...)
  • Changelog updated in docs/integrations/<source or destination>/<name>.md with an entry for the new version. See changelog example
  • You, or an Airbyter, have run /test successfully on this PR - or on a non-forked branch
  • You, or an Airbyter, have run /publish successfully on this PR - or on a non-forked branch
  • You've updated the connector's metadata.yaml file (new!)
  • The Octavia bot updated the source_definitions.yaml or destination_definitions.yaml, or you ran processResources manually (deprecated)

If the checklist is complete, but the CI check is failing,

  1. Check for hidden checklists in your PR description

  2. Toggle the github label checklist-action-run on/off to re-run the checklist CI.

@artem1205
Collaborator Author

artem1205 commented Apr 30, 2023

/test connector=connectors/source-salesforce

🕑 connectors/source-salesforce https://github.com/airbytehq/airbyte/actions/runs/4843479008
✅ connectors/source-salesforce https://github.com/airbytehq/airbyte/actions/runs/4843479008
Python tests coverage:

Name                                         Stmts   Miss  Cover
----------------------------------------------------------------
source_salesforce/utils.py                       8      0   100%
source_salesforce/__init__.py                    2      0   100%
source_salesforce/source.py                    102      6    94%
source_salesforce/streams.py                   409     32    92%
source_salesforce/api.py                       155     14    91%
source_salesforce/exceptions.py                  8      1    88%
source_salesforce/rate_limiting.py              22      3    86%
source_salesforce/availability_strategy.py      17      8    53%
----------------------------------------------------------------
TOTAL                                          723     64    91%
Name                                         Stmts   Miss  Cover
----------------------------------------------------------------
source_salesforce/__init__.py                    2      0   100%
source_salesforce/exceptions.py                  8      1    88%
source_salesforce/api.py                       155     21    86%
source_salesforce/availability_strategy.py      17      3    82%
source_salesforce/streams.py                   409     91    78%
source_salesforce/rate_limiting.py              22      6    73%
source_salesforce/source.py                    102     34    67%
source_salesforce/utils.py                       8      7    12%
----------------------------------------------------------------
TOTAL                                          723    163    77%

Build Passed

Test summary info:

=========================== short test summary info ============================
SKIPPED [1] ../usr/local/lib/python3.9/site-packages/connector_acceptance_test/tests/test_core.py:100: The previous and actual specifications are identical.
SKIPPED [1] ../usr/local/lib/python3.9/site-packages/connector_acceptance_test/tests/test_core.py:578: The previous and actual discovered catalogs are identical.
================== 39 passed, 2 skipped in 1300.40s (0:21:40) ==================

@octavia-squidington-iii octavia-squidington-iii added the area/documentation Improvements or additions to documentation label Apr 30, 2023
@artem1205
Collaborator Author

I'd like to publish with a dev tag and pin this version for the Doxo.com customer only.

@artem1205
Collaborator Author

artem1205 commented Apr 30, 2023

/publish connector=connectors/source-salesforce pre-release=true

🕑 Publishing the following connectors:
connectors/source-salesforce
https://github.com/airbytehq/airbyte/actions/runs/4844283865


| Connector | Version | Did it publish? | Were definitions generated? |
| --- | --- | --- | --- |
| connectors/source-salesforce | 2.0.12-dev.e5a5ac90 | | |

if you have connectors that successfully published but failed definition generation, follow step 4 here ▶️

@maxi297
Contributor

maxi297 commented May 4, 2023

Is a review still required on this, @artem1205?

@artem1205
Collaborator Author

@maxi297, yes. I've reduced the slice interval to 30 days, so we can try this once more in the test Doxo connection.

@lazebnyi lazebnyi requested a review from erohmensing May 9, 2023 10:32
@artem1205
Collaborator Author

artem1205 commented May 18, 2023

/publish connector=connectors/source-salesforce pre-release=true

🕑 Publishing the following connectors:
connectors/source-salesforce
https://github.com/airbytehq/airbyte/actions/runs/5012429260


| Connector | Version | Did it publish? |
| --- | --- | --- |
| connectors/source-salesforce | 2.0.13-dev.b206a930 | ✅ |

if you have connectors that successfully published but failed definition generation, follow step 4 here ▶️

Contributor

@maxi297 maxi297 left a comment

Tested in production as an A/B test and it worked, so I'll approve.

@artem1205
Collaborator Author

artem1205 commented May 18, 2023

/publish connector=connectors/source-salesforce

🕑 Publishing the following connectors:
connectors/source-salesforce
https://github.com/airbytehq/airbyte/actions/runs/5017075910


| Connector | Version | Did it publish? |
| --- | --- | --- |
| connectors/source-salesforce | 2.0.13 | ✅ |

if you have connectors that successfully published but failed definition generation, follow step 4 here ▶️

@@ -432,7 +432,7 @@ def download_data(self, url: str, chunk_size: int = 1024) -> tuple[str, str]:
         # set filepath for binary data from response
         tmp_file = os.path.realpath(os.path.basename(url))
         with closing(self._send_http_request("GET", f"{url}/results", stream=True)) as response, open(tmp_file, "wb") as data_file:
-            response_encoding = response.apparent_encoding or response.encoding or self.encoding
+            response_encoding = response.encoding or self.encoding
Contributor

can you clarify why you made this change?

Collaborator Author

The response.apparent_encoding property tries to guess the encoding by reading the whole response content. This caused OOM for large responses.

Contributor

makes sense!
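
To make the memory argument in this thread concrete, here is a minimal sketch (not the connector's code) of writing a streamed body to disk while taking the encoding only from the response headers or a fixed default. `FakeStreamedResponse` and `download_to_file` are hypothetical names; the stand-in class mimics only the `encoding` attribute and the `iter_content` method of a streamed `requests.Response`.

```python
import os
import tempfile
from typing import Iterator, List, Optional


class FakeStreamedResponse:
    """Hypothetical stand-in for a streamed requests.Response (illustration only)."""

    def __init__(self, chunks: List[bytes], encoding: Optional[str]):
        self._chunks = chunks
        self.encoding = encoding  # in requests this comes from the Content-Type header

    def iter_content(self, chunk_size: int = 1024) -> Iterator[bytes]:
        # The stand-in ignores chunk_size and simply yields its pre-cut pieces.
        yield from self._chunks


def download_to_file(response, path: str, default_encoding: str = "utf-8", chunk_size: int = 1024) -> str:
    """Write the body to disk chunk by chunk and return the encoding to decode it with."""
    # Header-declared encoding or a fixed default: no content-based guessing,
    # so the full body never has to be held in memory at once.
    encoding = response.encoding or default_encoding
    with open(path, "wb") as data_file:
        for chunk in response.iter_content(chunk_size=chunk_size):
            data_file.write(chunk)
    return encoding


tmp_path = os.path.join(tempfile.mkdtemp(), "results.csv")
encoding = download_to_file(FakeStreamedResponse([b"Id,Name\n", b"1,Acme\n"], encoding=None), tmp_path)
```

The trade-off is that a body with a missing or wrong charset header is decoded with the default rather than a detected encoding.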

@@ -579,7 +579,7 @@ def transform_empty_string_to_none(instance: Any, schema: Any):

 class IncrementalRestSalesforceStream(RestSalesforceStream, ABC):
     state_checkpoint_interval = 500
-    STREAM_SLICE_STEP = 120
+    STREAM_SLICE_STEP = 30
Contributor

what is the impact of changing this? Do other salesforce syncs now take longer?

Collaborator Author

By decreasing the window, we make each response smaller and thus iterate over it more quickly.
According to the screenshot, the sync takes roughly the same time (or even slightly less).
The first sync was performed using 2.0.13-dev.b206a930; the next two used 2.0.9.
[image: sync duration screenshot]

Contributor

👍🏼
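
A back-of-the-envelope sketch of the trade-off discussed in this thread: a smaller step means more requests, each covering a shorter date range. The helper `slice_count` and the one-year backfill are assumptions for illustration, and the figures say nothing about actual sync time.

```python
import math


def slice_count(total_days: int, step_days: int) -> int:
    """Number of date windows needed to cover total_days with a given step."""
    return math.ceil(total_days / step_days)


ONE_YEAR_DAYS = 365  # hypothetical backfill range

count_with_120 = slice_count(ONE_YEAR_DAYS, 120)  # 4 windows of up to 120 days
count_with_30 = slice_count(ONE_YEAR_DAYS, 30)    # 13 windows of up to 30 days
```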

artem1205 and others added 2 commits May 18, 2023 23:59
# Conflicts:
#	airbyte-config-oss/init-oss/src/main/resources/seed/oss_registry.json
@sherifnada
Contributor

I was just asking questions, don't block on my review!

# Conflicts:
#	airbyte-config-oss/init-oss/src/main/resources/seed/oss_registry.json
@artem1205 artem1205 enabled auto-merge (squash) May 18, 2023 23:04
artem1205 and others added 4 commits May 18, 2023 23:22
@artem1205 artem1205 merged commit 1c029bc into master May 19, 2023
22 of 26 checks passed
@artem1205 artem1205 deleted the artem1205/source-salesforce-OC-1787 branch May 19, 2023 10:13
nguyenaiden pushed a commit that referenced this pull request May 25, 2023
* Source SalesForce: remove pagination and query limits

* Source SalesForce: remove pagination and query limits

* Reduce slice interval to 30 days

* Source SalesForce : remove apparent_encoding guess for response;
Cause OOM for large responses

* Source SalesForce : bump versions


---------

Co-authored-by: artem1205 <artem1205@users.noreply.github.com>
marcosmarxm pushed a commit to natalia-miinto/airbyte that referenced this pull request Jun 8, 2023
Labels
area/connectors Connector related issues area/documentation Improvements or additions to documentation connectors/source/salesforce
4 participants