-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
🚨🚨 🐛 Convex source fix skipped records #27226
🚨🚨 🐛 Convex source fix skipped records #27226
Conversation
Before Merging a Connector Pull RequestWow! What a great pull request you have here! 🎉 To merge this PR, ensure the following has been done/considered for each connector added or updated:
If the checklist is complete, but the CI check is failing,
|
/test connector=connectors/source-convex
Build FailedTest summary info:
|
…in race conditions
…anilek/airbyte into lee/convex-source-fix-skipped
@marcosmarxm |
* update cursor in next_page_token, add request_headers * . * fix unit tests * also fix SyncMode.incremental so it doesn't return duplicate records in race conditions * fix client version in json_schemas * fix version number * fix sync_mode * cache json_schemas to pass typecheck * never mind * fix sync_mode * update metadata and doc * backward compatiblity bypass --------- Co-authored-by: Marcos Marx <marcosmarxm@users.noreply.github.com> Co-authored-by: marcosmarxm <marcosmarxm@gmail.com>
What
There is a bug in the Convex source connector due pagination interacting with the default
HttpAvailabilityStrategy
.Convex is a fairly simple source that has one pagination cursor per stream. In the existing code we store the cursor on the stream, and update the cursor in
parse_response
. This is the same cursor we use in thestate
property for checkpointing, and it's the same cursor used to constructrequest_params
.The bug comes from the usage of
HttpAvailabilityStrategy
which callsread_records
on the stream to see if there are any records to read. We have conflicting constraints:read_records
must bump the cursor sostream.state
can be used as a checkpoint. Butread_records
must not bump the cursor because then the availability check skips records.See community slack thread: https://airbytehq.slack.com/archives/C027KKE4BCZ/p1686329017267509
How
This PR solves the issue by bumping the cursor in
next_page_token
instead ofparse_response
, becausenext_page_token
is only called after the first result is yielded fromread_records
. Therefore the availability check -- which discards the iterator after getting the first result -- does not callnext_page_token
and the records are no longer dropped.Additionally, we add a new request header with the client version, so the server can detect whether the Airbyte connector is running with this fix. If the server detects a version prior to this PR, it will throw a 400 error and block syncing entirely.
Recommended reading order
source_convex/source.py
🚨 User Impact 🚨
Are there any breaking changes? What is the end result perceived by the user?
I believe this change is a breaking change because the current connector skips records and is disabled server-side, so this applies:
For connector PRs, use this section to explain which type of semantic versioning bump occurs as a result of the changes. Refer to our Semantic Versioning for Connectors guidelines for more information. Breaking changes to connectors must be documented by an Airbyte engineer (PR author, or reviewer for community PRs) by using the Breaking Change Release Playbook.
If there are breaking changes, please merge this PR with the 🚨🚨 emoji so changelog authors can further highlight this if needed.
Pre-merge Actions
Expand the relevant checklist and delete the others.
Community member or Airbyter
Airbyter
If this is a community PR, the Airbyte engineer reviewing this PR is responsible for the below items.