Contributor

@maxi297 maxi297 commented Jul 28, 2025

What

We still have a couple of usages of the declarative cursors. We would rather use the concurrent ones, because the declarative ones receive only minimal maintenance. This will allow us to migrate to DefaultStream and remove DeclarativeStream down the line.

How

Summary by CodeRabbit

  • Refactor

    • Removed record comparison features from multiple cursor types to simplify data processing.
    • Simplified record tracking and slice closure in data retrieval for better maintainability.
  • Tests

    • Removed tests related to record comparison to reflect updated cursor behavior.

👋 Greetings, Airbyte Team Member!

Here are some helpful tips and reminders for your convenience.

Testing This CDK Version

You can test this version of the CDK using the following:

# Run the CLI from this branch:
uvx 'git+https://github.com/airbytehq/airbyte-python-cdk.git@maxi297/clean-declarative-stream#egg=airbyte-python-cdk[dev]' --help

# Update a connector to use the CDK from this branch ref:
cd airbyte-integrations/connectors/source-example
poe use-cdk-branch maxi297/clean-declarative-stream

Helpful Resources

PR Slash Commands

Airbyte Maintainers can execute the following slash commands on your PR:

  • /autofix - Fixes most formatting and linting issues
  • /poetry-lock - Updates poetry.lock file
  • /test - Runs connector tests with the updated CDK
  • /poe <command> - Runs any poe command in the CDK environment

@github-actions github-actions bot added the chore label Jul 28, 2025
Contributor

coderabbitai bot commented Jul 28, 2025

Walkthrough

This change removes the is_greater_than_or_equal comparison method from multiple cursor classes and their base abstract class in both the declarative and checkpoint modules. Corresponding test cases and related logic in the SimpleRetriever class and its tests are also deleted, simplifying the codebase and removing now-unused comparison logic.
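For readers unfamiliar with the removed method, here is a minimal sketch of the kind of comparison it performed for a datetime-style cursor. The class name `ToyDatetimeCursor`, the `cursor_field` argument, the use of plain dicts as records, and the ISO-8601 values are all assumptions made for illustration, not the CDK's actual code. The handling of missing cursor values follows the names of the removed tests listed in the PyTest results.

```python
from datetime import datetime
from typing import Any, Mapping, Optional


class ToyDatetimeCursor:
    """Hypothetical stand-in for a declarative cursor; not the CDK class."""

    def __init__(self, cursor_field: str) -> None:
        self.cursor_field = cursor_field

    def _cursor_value(self, record: Mapping[str, Any]) -> Optional[datetime]:
        # Parse the record's cursor value, or return None when it is absent.
        raw = record.get(self.cursor_field)
        return datetime.fromisoformat(raw) if raw else None

    def is_greater_than_or_equal(
        self, first: Mapping[str, Any], second: Mapping[str, Any]
    ) -> bool:
        first_value = self._cursor_value(first)
        second_value = self._cursor_value(second)
        if first_value is not None and second_value is not None:
            return first_value >= second_value
        # A record without a cursor value never outranks the other one;
        # a record with a value outranks one without.
        return first_value is not None
```

With the method gone, callers no longer need to rank records against each other just to close a slice.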

Changes

| Cohort / File(s) | Change Summary |
| --- | --- |
| Declarative Incremental Cursor Classes: airbyte_cdk/sources/declarative/incremental/datetime_based_cursor.py, .../global_substream_cursor.py, .../per_partition_cursor.py, .../per_partition_with_global.py, .../resumable_full_refresh_cursor.py | Removed the is_greater_than_or_equal method from all listed cursor classes. |
| Checkpoint Cursor Classes: airbyte_cdk/sources/streams/checkpoint/cursor.py, .../resumable_full_refresh_cursor.py, .../substream_resumable_full_refresh_cursor.py | Removed the abstract is_greater_than_or_equal method from Cursor and its implementations from the two resumable full refresh cursor classes. |
| SimpleRetriever Logic: airbyte_cdk/sources/declarative/retrievers/simple_retriever.py | Removed _get_most_recent_record method and related logic from read_records, including import cleanup and simplification of cursor.close_slice call. |
| DatetimeBasedCursor Tests: unit_tests/sources/declarative/incremental/test_datetime_based_cursor.py | Deleted four test functions that validated the is_greater_than_or_equal method in DatetimeBasedCursor. |
| PerPartitionCursor Tests: unit_tests/sources/declarative/incremental/test_per_partition_cursor.py | Removed tests related to PerPartitionCursor.is_greater_than_or_equal, including error and delegation checks. |
| SimpleRetriever Tests: unit_tests/sources/declarative/retrievers/test_simple_retriever.py | Deleted the parameterized test ensuring SimpleRetriever.read_records called cursor.close_slice with the correct record based on cursor comparison logic. |

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~8 minutes

Suggested reviewers

  • tolik0

Would you like to have a second reviewer with deep familiarity in the incremental sync logic, wdyt?

📜 Recent review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 0b3e5bd and 0f17729.

📒 Files selected for processing (12)
  • airbyte_cdk/sources/declarative/incremental/datetime_based_cursor.py (0 hunks)
  • airbyte_cdk/sources/declarative/incremental/global_substream_cursor.py (0 hunks)
  • airbyte_cdk/sources/declarative/incremental/per_partition_cursor.py (0 hunks)
  • airbyte_cdk/sources/declarative/incremental/per_partition_with_global.py (0 hunks)
  • airbyte_cdk/sources/declarative/incremental/resumable_full_refresh_cursor.py (0 hunks)
  • airbyte_cdk/sources/declarative/retrievers/simple_retriever.py (1 hunks)
  • airbyte_cdk/sources/streams/checkpoint/cursor.py (0 hunks)
  • airbyte_cdk/sources/streams/checkpoint/resumable_full_refresh_cursor.py (0 hunks)
  • airbyte_cdk/sources/streams/checkpoint/substream_resumable_full_refresh_cursor.py (0 hunks)
  • unit_tests/sources/declarative/incremental/test_datetime_based_cursor.py (0 hunks)
  • unit_tests/sources/declarative/incremental/test_per_partition_cursor.py (0 hunks)
  • unit_tests/sources/declarative/retrievers/test_simple_retriever.py (1 hunks)
💤 Files with no reviewable changes (10)
  • unit_tests/sources/declarative/incremental/test_per_partition_cursor.py
  • airbyte_cdk/sources/streams/checkpoint/cursor.py
  • airbyte_cdk/sources/declarative/incremental/per_partition_cursor.py
  • airbyte_cdk/sources/declarative/incremental/per_partition_with_global.py
  • airbyte_cdk/sources/declarative/incremental/resumable_full_refresh_cursor.py
  • airbyte_cdk/sources/declarative/incremental/global_substream_cursor.py
  • airbyte_cdk/sources/streams/checkpoint/resumable_full_refresh_cursor.py
  • airbyte_cdk/sources/declarative/incremental/datetime_based_cursor.py
  • airbyte_cdk/sources/streams/checkpoint/substream_resumable_full_refresh_cursor.py
  • unit_tests/sources/declarative/incremental/test_datetime_based_cursor.py
🚧 Files skipped from review as they are similar to previous changes (2)
  • airbyte_cdk/sources/declarative/retrievers/simple_retriever.py
  • unit_tests/sources/declarative/retrievers/test_simple_retriever.py
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (9)
  • GitHub Check: Check: source-hardcoded-records
  • GitHub Check: Check: source-shopify
  • GitHub Check: Check: source-intercom
  • GitHub Check: Check: source-pokeapi
  • GitHub Check: Check: destination-motherduck
  • GitHub Check: Pytest (Fast)
  • GitHub Check: Pytest (All, Python 3.10, Ubuntu)
  • GitHub Check: SDM Docker Image Build
  • GitHub Check: Pytest (All, Python 3.11, Ubuntu)

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

🧹 Nitpick comments (1)
airbyte_cdk/sources/declarative/retrievers/simple_retriever.py (1)

536-537: Consider prioritizing the cleanup referenced in the FIXME comment.

The FIXME comment suggests that the removed _get_most_recent_record logic can be fully eliminated as part of addressing the linked internal issue. Since this migration is removing legacy declarative cursor functionality, would it make sense to tackle that cleanup sooner rather than later to avoid accumulating technical debt, wdyt?

Would you like me to help track down any remaining references to the removed record tracking logic or generate a script to verify the cleanup is complete across the codebase?
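A check of that kind could be sketched roughly as follows. This is an illustrative helper, not something from the PR: the function name `find_references` is hypothetical, and the two names it searches for are the ones mentioned in the walkthrough.

```python
import re
from pathlib import Path
from typing import List, Tuple

# Identifiers the walkthrough reports as removed.
REMOVED_NAMES = ("is_greater_than_or_equal", "_get_most_recent_record")
PATTERN = re.compile("|".join(re.escape(name) for name in REMOVED_NAMES))


def find_references(root: Path) -> List[Tuple[str, int, str]]:
    """Return (relative path, line number, stripped line) for each match under root."""
    hits: List[Tuple[str, int, str]] = []
    for path in sorted(root.rglob("*.py")):
        text = path.read_text(encoding="utf-8", errors="replace")
        for number, line in enumerate(text.splitlines(), start=1):
            if PATTERN.search(line):
                hits.append((path.relative_to(root).as_posix(), number, line.strip()))
    return hits
```

Calling `find_references(Path("airbyte_cdk"))` from the repository root and getting an empty list back would suggest that no Python file still mentions the removed names. It is a plain text search, so it also matches comments and strings.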

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 51cfea5 and 0b3e5bd.

📒 Files selected for processing (12)
  • airbyte_cdk/sources/declarative/incremental/datetime_based_cursor.py (0 hunks)
  • airbyte_cdk/sources/declarative/incremental/global_substream_cursor.py (0 hunks)
  • airbyte_cdk/sources/declarative/incremental/per_partition_cursor.py (0 hunks)
  • airbyte_cdk/sources/declarative/incremental/per_partition_with_global.py (0 hunks)
  • airbyte_cdk/sources/declarative/incremental/resumable_full_refresh_cursor.py (0 hunks)
  • airbyte_cdk/sources/declarative/retrievers/simple_retriever.py (1 hunks)
  • airbyte_cdk/sources/streams/checkpoint/cursor.py (0 hunks)
  • airbyte_cdk/sources/streams/checkpoint/resumable_full_refresh_cursor.py (0 hunks)
  • airbyte_cdk/sources/streams/checkpoint/substream_resumable_full_refresh_cursor.py (0 hunks)
  • unit_tests/sources/declarative/incremental/test_datetime_based_cursor.py (0 hunks)
  • unit_tests/sources/declarative/incremental/test_per_partition_cursor.py (0 hunks)
  • unit_tests/sources/declarative/retrievers/test_simple_retriever.py (0 hunks)
💤 Files with no reviewable changes (11)
  • unit_tests/sources/declarative/incremental/test_per_partition_cursor.py
  • airbyte_cdk/sources/declarative/incremental/global_substream_cursor.py
  • airbyte_cdk/sources/declarative/incremental/per_partition_with_global.py
  • airbyte_cdk/sources/declarative/incremental/per_partition_cursor.py
  • unit_tests/sources/declarative/incremental/test_datetime_based_cursor.py
  • airbyte_cdk/sources/streams/checkpoint/resumable_full_refresh_cursor.py
  • airbyte_cdk/sources/declarative/incremental/resumable_full_refresh_cursor.py
  • airbyte_cdk/sources/streams/checkpoint/substream_resumable_full_refresh_cursor.py
  • airbyte_cdk/sources/declarative/incremental/datetime_based_cursor.py
  • unit_tests/sources/declarative/retrievers/test_simple_retriever.py
  • airbyte_cdk/sources/streams/checkpoint/cursor.py
🧰 Additional context used
🧠 Learnings (2)
📓 Common learnings
Learnt from: ChristoGrab
PR: airbytehq/airbyte-python-cdk#58
File: airbyte_cdk/sources/declarative/yaml_declarative_source.py:0-0
Timestamp: 2024-11-18T23:40:06.391Z
Learning: When modifying the `YamlDeclarativeSource` class in `airbyte_cdk/sources/declarative/yaml_declarative_source.py`, avoid introducing breaking changes like altering method signatures within the scope of unrelated PRs. Such changes should be addressed separately to minimize impact on existing implementations.
airbyte_cdk/sources/declarative/retrievers/simple_retriever.py (1)

Learnt from: aaronsteers
PR: #58
File: airbyte_cdk/cli/source_declarative_manifest/_run.py:62-65
Timestamp: 2024-11-15T01:04:21.272Z
Learning: The files in airbyte_cdk/cli/source_declarative_manifest/, including _run.py, are imported from another repository, and changes to these files should be minimized or avoided when possible to maintain consistency.

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (8)
  • GitHub Check: Check: destination-motherduck
  • GitHub Check: Check: source-shopify
  • GitHub Check: Check: source-intercom
  • GitHub Check: Check: source-hardcoded-records
  • GitHub Check: SDM Docker Image Build
  • GitHub Check: Pytest (Fast)
  • GitHub Check: Pytest (All, Python 3.10, Ubuntu)
  • GitHub Check: Pytest (All, Python 3.11, Ubuntu)
🔇 Additional comments (1)
airbyte_cdk/sources/declarative/retrievers/simple_retriever.py (1)

533-533: LGTM - Simplified cursor.close_slice call aligns with the migration goals.

The removal of the most recent record parameter from cursor.close_slice(_slice) is consistent with eliminating the is_greater_than_or_equal comparison logic from cursor implementations. This simplification is a good step toward the concurrent cursor framework migration, wdyt?
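To make the shape of that simplification concrete, here is a small before/after sketch. `RecordingCursor`, `read_before`, and `read_after` are hypothetical names, and the records are plain dicts with an assumed `updated_at` field; none of this is the actual SimpleRetriever code.

```python
from typing import Any, Iterable, Iterator, List, Mapping, Optional, Tuple

Record = Mapping[str, Any]


class RecordingCursor:
    """Hypothetical cursor that only logs how close_slice is called."""

    def __init__(self) -> None:
        self.calls: List[Tuple[Any, ...]] = []

    def is_greater_than_or_equal(self, first: Record, second: Record) -> bool:
        return first["updated_at"] >= second["updated_at"]

    def close_slice(self, stream_slice: Mapping[str, Any], *args: Any) -> None:
        self.calls.append((stream_slice, *args))


def read_before(
    cursor: RecordingCursor, stream_slice: Mapping[str, Any], records: Iterable[Record]
) -> Iterator[Record]:
    """Pre-change shape: track the most recent record, then pass it along."""
    most_recent: Optional[Record] = None
    for record in records:
        if most_recent is None or cursor.is_greater_than_or_equal(record, most_recent):
            most_recent = record
        yield record
    cursor.close_slice(stream_slice, most_recent)


def read_after(
    cursor: RecordingCursor, stream_slice: Mapping[str, Any], records: Iterable[Record]
) -> Iterator[Record]:
    """Post-change shape: no per-record comparison; the slice is closed on its own."""
    yield from records
    cursor.close_slice(stream_slice)
```

In the second form the read loop carries no comparison state, which is what lets the comparison method be dropped from the cursor interface.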


github-actions bot commented Jul 28, 2025

PyTest Results (Fast)

3 689 tests (-11): 3 678 ✅ (-11), 11 💤 (±0), 0 ❌ (±0)
1 suite (±0), 1 file (±0), 6m 28s ⏱️ (-16s)

Results for commit 0f17729. ± Comparison against base commit 51cfea5.

This pull request removes 11 tests.
unit_tests.sources.declarative.incremental.test_datetime_based_cursor ‑ test_given_first_greater_than_second_then_return_true
unit_tests.sources.declarative.incremental.test_datetime_based_cursor ‑ test_given_first_lesser_than_second_then_return_false
unit_tests.sources.declarative.incremental.test_datetime_based_cursor ‑ test_given_no_cursor_value_for_first_than_second_then_return_false
unit_tests.sources.declarative.incremental.test_datetime_based_cursor ‑ test_given_no_cursor_value_for_second_than_second_then_return_true
unit_tests.sources.declarative.incremental.test_per_partition_cursor ‑ test_given_records_with_different_slice_when_is_greater_than_or_equal_then_raise_error
unit_tests.sources.declarative.incremental.test_per_partition_cursor ‑ test_given_records_without_a_slice_when_is_greater_than_or_equal_then_raise_error[first record does not have a slice]
unit_tests.sources.declarative.incremental.test_per_partition_cursor ‑ test_given_records_without_a_slice_when_is_greater_than_or_equal_then_raise_error[second record does not have a slice]
unit_tests.sources.declarative.incremental.test_per_partition_cursor ‑ test_given_slice_is_unknown_when_is_greater_than_or_equal_then_raise_error
unit_tests.sources.declarative.incremental.test_per_partition_cursor ‑ test_when_is_greater_than_or_equal_then_return_underlying_cursor_response
unit_tests.sources.declarative.retrievers.test_simple_retriever ‑ test_when_read_records_then_cursor_close_slice_with_greater_record[test_first_greater_than_second-True]
…
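The per-partition test names above hint at the behaviour that went away: the comparison only made sense for two records from the same known slice, and was otherwise delegated to an underlying cursor. A rough sketch of that pattern, with hypothetical `Toy*` classes, string partition keys, and `ValueError` chosen purely for illustration:

```python
from typing import Any, Dict, Mapping, Optional


class ToyRecord:
    """Hypothetical record that remembers the slice (partition) it came from."""

    def __init__(self, data: Mapping[str, Any], partition: Optional[str]) -> None:
        self.data = data
        self.partition = partition


class ToyUnderlyingCursor:
    """Hypothetical per-slice cursor that compares an assumed updated_at field."""

    def is_greater_than_or_equal(self, first: ToyRecord, second: ToyRecord) -> bool:
        return first.data["updated_at"] >= second.data["updated_at"]


class ToyPerPartitionCursor:
    """Hypothetical wrapper that validates slices, then delegates."""

    def __init__(self, cursors: Dict[str, ToyUnderlyingCursor]) -> None:
        self._cursors = cursors

    def is_greater_than_or_equal(self, first: ToyRecord, second: ToyRecord) -> bool:
        if first.partition is None or second.partition is None:
            raise ValueError("Both records must be associated with a slice")
        if first.partition != second.partition:
            raise ValueError("Records from different slices cannot be compared")
        if first.partition not in self._cursors:
            raise ValueError(f"No cursor known for slice {first.partition!r}")
        # Same, known slice: let that slice's cursor decide.
        return self._cursors[first.partition].is_greater_than_or_equal(first, second)
```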

♻️ This comment has been updated with latest results.

@maxi297 maxi297 force-pushed the maxi297/clean-declarative-stream branch from 0b3e5bd to 0f17729 July 28, 2025 19:12

PyTest Results (Full)

3 692 tests (-11): 3 681 ✅ (-11), 11 💤 (±0), 0 ❌ (±0)
1 suite (±0), 1 file (±0), 11m 46s ⏱️ (-6m 25s)

Results for commit 0f17729. ± Comparison against base commit 51cfea5.

@maxi297 maxi297 requested a review from brianjlai July 29, 2025 01:23
Contributor

@brianjlai brianjlai left a comment

Let's just chat about the one case I mentioned; I might not be understanding this final case quite right. Otherwise no other comments, and I can 👍 after.

Contributor

@brianjlai brianjlai left a comment

🚢

@maxi297 maxi297 changed the title from "chore: migrate cursors to concurrent framework" to "chore: remove cursor.is_greater_than_or_equal" Jul 30, 2025
@maxi297 maxi297 merged commit e4cbaaf into main Jul 30, 2025
26 of 27 checks passed
@maxi297 maxi297 deleted the maxi297/clean-declarative-stream branch July 30, 2025 13:04