[ISSUE #6829] update salesforce to support partitioned state #36942

Merged

maxi297 (Contributor) commented Apr 9, 2024

What

Addressing https://github.com/airbytehq/airbyte-internal-issues/issues/6829 to avoid stuck syncs.

Basically, syncs would get stuck when an error occurred in one thread: we would try to shut down the threads, but threads that were already active would keep syncing with nothing left to consume the queue. To avoid that, we will run all the threads without breaking the main thread, and save the state in a partitioned format so that we only have to retry the slices that failed.
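A simplified, standard-library sketch of the behavior described above: workers read slices concurrently, the main thread keeps consuming results even when one slice fails, and the resulting state records only the windows that completed. The names read_all_slices and fake_read and the {"slices": [...]} state shape are hypothetical; per the description, the connector itself drains a queue fed by its threads rather than using as_completed.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable, Dict, List, Tuple

# Hypothetical slice type: an ISO-8601 (start, end) window.
Slice = Tuple[str, str]


def read_all_slices(
    slices: List[Slice],
    read_slice: Callable[[Slice], List[dict]],
    max_workers: int = 4,
) -> Tuple[List[dict], dict, Dict[Slice, Exception]]:
    """Read every slice concurrently; a failing slice does not stop the others."""
    records: List[dict] = []
    completed: List[Slice] = []
    failed: Dict[Slice, Exception] = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(read_slice, s): s for s in slices}
        # The main thread keeps consuming results until every worker is done,
        # instead of aborting on the first error.
        for future in as_completed(futures):
            window = futures[future]
            try:
                records.extend(future.result())
                completed.append(window)
            except Exception as exc:  # record the failure and keep going
                failed[window] = exc
    # Partitioned state: only successfully synced windows are recorded,
    # so a retry only needs the windows that are missing.
    state = {"slices": [{"start": a, "end": b} for a, b in sorted(completed)]}
    return records, state, failed


def fake_read(window: Slice) -> List[dict]:
    # Simulated reader: the middle window fails.
    if window == ("2024-01-02", "2024-01-03"):
        raise RuntimeError("simulated API error")
    return [{"window": window}]


windows = [("2024-01-01", "2024-01-02"), ("2024-01-02", "2024-01-03"), ("2024-01-03", "2024-01-04")]
records, state, failed = read_all_slices(windows, fake_read)
```

Here the failed middle window is absent from the state, so a retry would only need to re-read that one window rather than everything after the last cursor value.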

How

  • Update how we instantiate the ConcurrentCursor (__init__ has breaking changes)
  • Pass the cursor to the stream so that stream slice generation relies on the cursor and not on the old stream_slices implementation
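To illustrate the second point, here is a minimal sketch of a stream that delegates slicing to its cursor instead of computing windows itself. SketchDateCursor and SketchStream are hypothetical classes, not the Airbyte CDK's ConcurrentCursor API; the sketch assumes completed windows are stored with exactly the same boundaries the cursor generates.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import Iterator, List, Tuple

Window = Tuple[datetime, datetime]


@dataclass
class SketchDateCursor:
    """Hypothetical cursor that owns slice generation (not the CDK API)."""

    start: datetime
    end: datetime
    step: timedelta
    completed: List[Window] = field(default_factory=list)

    def generate_slices(self) -> Iterator[Window]:
        done = set(self.completed)
        lower = self.start
        while lower < self.end:
            upper = min(lower + self.step, self.end)
            # Skip windows already recorded as synced in the partitioned state.
            if (lower, upper) not in done:
                yield lower, upper
            lower = upper

    def close_slice(self, window: Window) -> None:
        # Mark a window as fully synced.
        self.completed.append(window)


class SketchStream:
    """Hypothetical stream that delegates slicing to the cursor."""

    def __init__(self, cursor: SketchDateCursor) -> None:
        self.cursor = cursor

    def stream_slices(self) -> Iterator[Window]:
        return self.cursor.generate_slices()


cursor = SketchDateCursor(
    start=datetime(2024, 1, 1),
    end=datetime(2024, 1, 4),
    step=timedelta(days=1),
    completed=[(datetime(2024, 1, 1), datetime(2024, 1, 2))],
)
pending = list(SketchStream(cursor).stream_slices())
```

Keeping slice generation in the cursor means the same object that knows which windows are done also decides which windows to read next.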

Manual Testing

Compared output for:

  • Accounts
  • AcceptedEventRelation (UNSUPPORTED_BULK_API_SALESFORCE_OBJECTS)
  • ContentDocumentLink (PARENT_SALESFORCE_OBJECTS): this helped me understand an issue that we had in production (see this)

🚨 User Impact 🚨

This is a breaking change, as the state format will change. It does not need an opt-in mechanism because the new version of the connector can understand the previous state format. However, we can't revert this change, as the old version doesn't understand the new version's state format.
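A minimal sketch of why the migration is one-way: a reader that accepts both formats and upgrades the legacy one, next to a legacy-only reader that cannot handle the new format. The key names (SystemModstamp, state_type, slices) and the lower bound used for converted state are assumptions for illustration, not the connector's exact state schema.

```python
from typing import Any, Dict

CURSOR_FIELD = "SystemModstamp"  # assumption: the stream's cursor field
LEGACY_LOWER_BOUND = "1970-01-01T00:00:00.000Z"  # assumption: start used for converted state


def to_partitioned(state: Dict[str, Any]) -> Dict[str, Any]:
    """Accept either state format and always return the partitioned one."""
    if state.get("state_type") == "date-range":
        return state  # already in the new format
    if CURSOR_FIELD in state:
        # Legacy single-value state: everything up to the cursor value was synced.
        return {
            "state_type": "date-range",
            "slices": [{"start": LEGACY_LOWER_BOUND, "end": state[CURSOR_FIELD]}],
        }
    return {"state_type": "date-range", "slices": []}  # no prior state


def legacy_read_cursor(state: Dict[str, Any]) -> str:
    # An older connector version only knows the single-value format.
    return state[CURSOR_FIELD]
```

Calling legacy_read_cursor on the output of to_partitioned raises KeyError, which mirrors why rolling back to the old version is not possible once the new state has been written.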

This should allow threads to continue syncing records even when one of them fails.

vercel bot commented Apr 9, 2024

The latest updates on your projects. Learn more about Vercel for Git ↗︎

1 Ignored Deployment
Name | Status | Preview | Comments | Updated (UTC)
airbyte-docs | ⬜️ Ignored (Inspect) | Visit Preview | | Apr 11, 2024 0:33am

octavia-squidington-iv requested review from a team April 9, 2024 21:17
maxi297 requested review from bazarnov and removed request for bazarnov April 9, 2024 21:18
octavia-squidington-iii added the area/documentation label Apr 10, 2024
@@ -404,131 +406,6 @@ def configure_request_params_mock(stream_1, stream_2):
stream_2.request_params.return_value = {"q": "query"}


def test_rate_limit_bulk(stream_config, stream_api, bulk_catalog, state):
Contributor Author

These tests are very hard to maintain because they rely on private internals to run (for example, page_size). Also, the way they are instantiated means they don't get the ConcurrentCursor, since they rely on aggressively re-implementing source methods (like source.streams = Mock()). We will move these to mock server tests. We have already added a test that ensures the stream retries on a 406, so we know the integration with the error handling works; I'm not too worried about removing this test specifically.
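For readers unfamiliar with this kind of test, here is a minimal, self-contained sketch of what a retry-on-406 check asserts, using unittest.mock instead of a mock server. send_with_retry and RETRYABLE_STATUS are hypothetical stand-ins, not the connector's error handler.

```python
from unittest.mock import Mock

RETRYABLE_STATUS = {406}  # assumption: only 406 is treated as retryable here


def send_with_retry(send, max_attempts: int = 3):
    """Call `send()` until it returns a non-retryable status or attempts run out."""
    response = None
    for _ in range(max_attempts):
        response = send()
        if response.status_code not in RETRYABLE_STATUS:
            return response
    return response  # last retryable response once attempts are exhausted


def test_retries_on_406():
    # First call answers 406, second answers 200: exactly one retry expected.
    send = Mock(side_effect=[Mock(status_code=406), Mock(status_code=200)])
    response = send_with_retry(send)
    assert response.status_code == 200
    assert send.call_count == 2


test_retries_on_406()
```

A test at this level only needs the public call and the HTTP responses, which is what makes it easier to maintain than tests that patch private attributes.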

@@ -949,15 +826,15 @@ def test_bulk_stream_error_on_wait_for_job(requests_mock, stream_config, stream_
@freezegun.freeze_time("2023-01-01")
@pytest.mark.parametrize(
"lookback, stream_slice_step, expected_len_stream_slices, expect_error",
[(None, "P30D", 0, True), (0, "P30D", 158, False), (10, "P1D", 4732, False), (10, "PT12H", 9463, False), (-1, "P30D", 0, True)],
Contributor Author

Removing these tests because they cover odd cases where either the lookback is invalid or "we change lookback even though it is a constant". We have moved some of those tests to test_slice_generation.py.
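A minimal sketch of date-window slice generation that rejects an invalid lookback up front, so those edge cases need no special-casing in each test. generate_windows is hypothetical, and the assumption that lookback moves the lower bound earlier is illustrative, not taken from the connector.

```python
from datetime import datetime, timedelta
from typing import List, Optional, Tuple


def generate_windows(
    start: datetime,
    end: datetime,
    step: timedelta,
    lookback: Optional[timedelta],
) -> List[Tuple[datetime, datetime]]:
    """Split [start - lookback, end) into consecutive windows of at most `step`."""
    if lookback is None or lookback < timedelta(0):
        raise ValueError("lookback must be a non-negative duration")
    if step <= timedelta(0):
        raise ValueError("step must be positive")
    windows = []
    lower = start - lookback  # assumption: lookback moves the lower bound earlier
    while lower < end:
        upper = min(lower + step, end)
        windows.append((lower, upper))
        lower = upper
    return windows
```

With the validation in one place, each test can focus on window counts and boundaries for a valid configuration.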

maxi297 requested a review from bazarnov April 10, 2024 03:16
maxi297 (Contributor Author) commented Apr 10, 2024

I'll add one test tomorrow morning to show that the new code supports both state formats.

maxi297 requested a review from girarda April 10, 2024 15:03
girarda (Contributor) left a comment

:shipit:

maxi297 merged commit 2617a03 into master Apr 11, 2024
29 checks passed
maxi297 deleted the issue-6829/update-source-salesforce-to-have-partitioned-state branch April 11, 2024 12:59
Labels
area/connectors (Connector related issues), area/documentation (Improvements or additions to documentation), connectors/source/salesforce

4 participants