Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

🚨🚨 🐛 Convex source fix skipped records #27226

Merged
merged 17 commits into from
Jun 21, 2023

Conversation

ldanilek
Copy link
Contributor

@ldanilek ldanilek commented Jun 9, 2023

What

There is a bug in the Convex source connector due pagination interacting with the default HttpAvailabilityStrategy.

Convex is a fairly simple source that has one pagination cursor per stream. In the existing code we store the cursor on the stream, and update the cursor in parse_response. This is the same cursor we use in the state property for checkpointing, and it's the same cursor used to construct request_params.

The bug comes from the usage of HttpAvailabilityStrategy which calls read_records on the stream to see if there are any records to read. We have conflicting constraints: read_records must bump the cursor so stream.state can be used as a checkpoint. But read_records must not bump the cursor because then the availability check skips records.

See community slack thread: https://airbytehq.slack.com/archives/C027KKE4BCZ/p1686329017267509

How

This PR solves the issue by bumping the cursor in next_page_token instead of parse_response, because next_page_token is only called after the first result is yielded from read_records. Therefore the availability check -- which discards the iterator after getting the first result -- does not call next_page_token and the records are no longer dropped.

Additionally, we add a new request header with the client version, so the server can detect whether the Airbyte connector is running with this fix. If the server detects a version prior to this PR, it will throw a 400 error and block syncing entirely.

Recommended reading order

  1. source_convex/source.py

🚨 User Impact 🚨

Are there any breaking changes? What is the end result perceived by the user?

I believe this change is a breaking change because the current connector skips records and is disabled server-side, so this applies:

the full dataset will need to be re-synced

For connector PRs, use this section to explain which type of semantic versioning bump occurs as a result of the changes. Refer to our Semantic Versioning for Connectors guidelines for more information. Breaking changes to connectors must be documented by an Airbyte engineer (PR author, or reviewer for community PRs) by using the Breaking Change Release Playbook.

If there are breaking changes, please merge this PR with the 🚨🚨 emoji so changelog authors can further highlight this if needed.

Pre-merge Actions

Expand the relevant checklist and delete the others.

Updating a connector

Community member or Airbyter

  • Grant edit access to maintainers (instructions)
  • Unit & integration tests added

Airbyter

If this is a community PR, the Airbyte engineer reviewing this PR is responsible for the below items.

  • Create a non-forked branch based on this PR and test the below items on it
  • Build is successful
  • If new credentials are required for use in CI, add them to GSM. Instructions.

@github-actions
Copy link
Contributor

github-actions bot commented Jun 9, 2023

Before Merging a Connector Pull Request

Wow! What a great pull request you have here! 🎉

To merge this PR, ensure the following has been done/considered for each connector added or updated:

  • PR name follows PR naming conventions
  • Breaking changes are considered. If a Breaking Change is being introduced, ensure an Airbyte engineer has created a Breaking Change Plan and you've followed all steps in the Breaking Changes Checklist
  • Connector version has been incremented in the Dockerfile and metadata.yaml according to our Semantic Versioning for Connectors guidelines
  • Secrets in the connector's spec are annotated with airbyte_secret
  • All documentation files are up to date. (README.md, bootstrap.md, docs.md, etc...)
  • Changelog updated in docs/integrations/<source or destination>/<name>.md with an entry for the new version. See changelog example
  • You, or an Airbyter, have run /test successfully on this PR - or on a non-forked branch
  • You've updated the connector's metadata.yaml file (new!)

If the checklist is complete, but the CI check is failing,

  1. Check for hidden checklists in your PR description

  2. Toggle the github label checklist-action-run on/off to re-run the checklist CI.

@marcosmarxm
Copy link
Member

marcosmarxm commented Jun 12, 2023

/test connector=connectors/source-convex

🕑 connectors/source-convex https://github.com/airbytehq/airbyte/actions/runs/5248049347
❌ connectors/source-convex https://github.com/airbytehq/airbyte/actions/runs/5248049347
🐛 https://gradle.com/s/4enpxkt6sow2a

Build Failed

Test summary info:

=========================== short test summary info ============================
FAILED test_core.py::TestConnection::test_check[inputs0] - AssertionError: as...
FAILED test_core.py::TestDiscovery::test_discover[inputs0] - docker.errors.Co...
ERROR test_core.py::TestDiscovery::test_defined_cursors_exist_in_schema[inputs0]
ERROR test_core.py::TestDiscovery::test_defined_refs_exist_in_schema[inputs0]
ERROR test_core.py::TestDiscovery::test_defined_keyword_exist_in_schema[inputs0-allOf]
ERROR test_core.py::TestDiscovery::test_defined_keyword_exist_in_schema[inputs0-not]
ERROR test_core.py::TestDiscovery::test_primary_keys_exist_in_schema[inputs0]
ERROR test_core.py::TestDiscovery::test_streams_has_sync_modes[inputs0] - doc...
ERROR test_core.py::TestDiscovery::test_additional_properties_is_true[inputs0]
ERROR test_core.py::TestDiscovery::test_backward_compatibility[inputs0] - doc...
SKIPPED [1] ../usr/local/lib/python3.9/site-packages/connector_acceptance_test/plugin.py:63: Skipping TestBasicRead.test_read: not found in the config.
SKIPPED [1] ../usr/local/lib/python3.9/site-packages/connector_acceptance_test/plugin.py:63: Skipping TestFullRefresh.test_sequential_reads: not found in the config.
SKIPPED [1] ../usr/local/lib/python3.9/site-packages/connector_acceptance_test/plugin.py:63: Skipping TestIncremental.test_two_sequential_reads: not found in the config.
SKIPPED [1] ../usr/local/lib/python3.9/site-packages/connector_acceptance_test/tests/test_core.py:100: The previous and actual specifications are identical.
SKIPPED [1] ../usr/local/lib/python3.9/site-packages/connector_acceptance_test/tests/test_core.py:695: This tests currently leads to too much failures. We need to fix the connectors at scale first.
============== 2 failed, 23 passed, 5 skipped, 8 errors in 27.29s ==============

@ldanilek
Copy link
Contributor Author

@marcosmarxm test_backward_compatibility is failing because the existing deployed connector is intentionally throwing errors on the Convex side to prevent broken syncs.

@airbytehq airbytehq deleted a comment from ldanilek Jun 21, 2023
@octavia-squidington-iii octavia-squidington-iii added the area/documentation Improvements or additions to documentation label Jun 21, 2023
@marcosmarxm marcosmarxm merged commit 77ea559 into airbytehq:master Jun 21, 2023
33 of 37 checks passed
sh4sh pushed a commit that referenced this pull request Jun 22, 2023
* update cursor in next_page_token, add request_headers

* .

* fix unit tests

* also fix SyncMode.incremental so it doesn't return duplicate records in race conditions

* fix client version in json_schemas

* fix version number

* fix sync_mode

* cache json_schemas to pass typecheck

* never mind

* fix sync_mode

* update metadata and doc

* backward compatiblity bypass

---------

Co-authored-by: Marcos Marx <marcosmarxm@users.noreply.github.com>
Co-authored-by: marcosmarxm <marcosmarxm@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/connectors Connector related issues area/documentation Improvements or additions to documentation community connectors/source/convex
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants