Skip to content

Conversation

brianjlai
Copy link
Contributor

@brianjlai brianjlai commented Sep 3, 2025

Note: This should not be merged until after #707 since this PR depends on logic implemented there. And this will fail a lot of tests until after that change is rebased into this one

What

Warning this will be breaking and part of the CDK release to v7

Now that the ConcurrentDeclarativeSource has been updated to only receive and process DefaultStream from the model to component factory here, we can significantly simplify how we handle connector operations and process syncs without account for the legacy synchronous flow.

All the legacy DeclarativeStream and AbstractStream inheritance has been extracted from ConcurrentDeclarativeSource.

How

Broadly the big changes are:

  • ConcurrentDeclarativeSource no longer implements AbstractSource and instead directly implements the primary Source interface
  • There is no more grouping logic because all streams are DefaultStream.
  • Methods like message_repository() are removed since only AbstractSource required them
  • Simplified some of the connection check logic. Ideally we would have streamlined this more, but because the ConnectionChecker component takes in the source itself. The CheckStream component does a lot of shenanigans withe source object itself so extracting it is a bit annoying
  • Moved all legacy (non-concurrent) cursors to legacy
  • Moved DeclarativeStream to the legacy folder

Something worth noting is that I did not move AbstractSource to the legacy package. It has a lot of dependencies in the file CDK, scenario builder, and various Legacy <> Synchronous CDK adapters, so its not quite a full on legacy class yet. Despite it still being in the main package, it has been fully extracted out of the ConcurrentDeclarativeSource flow so we're still left with a streamline code flow when processing all low-code syncs.

Out of scope

Two things that I've left out of scope to make the change more manageable and because they should have minimal impact on the customer facing interface

  • SimpleRetriever still has state methods and fields for legacy cursors. These should be more easily extractable because they're not invoked in the flow anymore
  • The AbstractStream still makes references to things like RFR and checkpoint readers. While these are effectively legacy components, they're not referenced by the concurrent CDK so cleanup is neither urgent nor breaking.

Summary by CodeRabbit

  • New Features

    • check(logger, config) now returns a status object (Status.SUCCEEDED / Status.FAILED).
    • streams(config) accessor added to retrieve declarative stream lists.
  • Refactor

    • Declarative sources now use the public Source typing and a unified config-driven flow for concurrent sources.
    • Several declarative cursor types and stream implementations moved to legacy namespaces; internal concurrency utilities reorganized.
    • Removed deprecation marker from AbstractStream.
  • Removal

    • CustomIncrementalSync removed; incremental sync supports only DatetimeBasedCursor and IncrementingCountCursor.
  • Tests

    • Unit tests updated to use the new status/check API, stream-access patterns, and adjusted imports.

…treamline inheritance to Source interface, deprecate legacy components
Copy link

github-actions bot commented Sep 3, 2025

👋 Greetings, Airbyte Team Member!

Here are some helpful tips and reminders for your convenience.

Testing This CDK Version

You can test this version of the CDK using the following:

# Run the CLI from this branch:
uvx 'git+https://github.com/airbytehq/airbyte-python-cdk.git@brian/streamline_cds_move_legacy_components#egg=airbyte-python-cdk[dev]' --help

# Update a connector to use the CDK from this branch ref:
cd airbyte-integrations/connectors/source-example
poe use-cdk-branch brian/streamline_cds_move_legacy_components

Helpful Resources

PR Slash Commands

Airbyte Maintainers can execute the following slash commands on your PR:

  • /autofix - Fixes most formatting and linting issues
  • /poetry-lock - Updates poetry.lock file
  • /test - Runs connector tests with the updated CDK
  • /poe build - Regenerate git-committed build artifacts, such as the pydantic models which are generated from the manifest JSON schema in YAML.
  • /poe <command> - Runs any poe command in the CDK environment

📝 Edit this welcome message.

@github-actions github-actions bot added the chore label Sep 3, 2025
Copy link

github-actions bot commented Sep 3, 2025

PyTest Results (Fast)

3 742 tests   - 10   3 730 ✅  - 10   6m 12s ⏱️ -11s
    1 suites ± 0      12 💤 ± 0 
    1 files   ± 0       0 ❌ ± 0 

Results for commit e6f9485. ± Comparison against base commit dd52cfe.

This pull request removes 82 and adds 72 tests. Note that renamed tests count towards both.
unit_tests.sources.declarative.incremental.test_datetime_based_cursor ‑ test_close_slice[test_close_slice_latest_record_cursor_value_is_higher_than_slice_end-2021-01-01-stream_slice2-observed_records2-expected_state2]
unit_tests.sources.declarative.incremental.test_datetime_based_cursor ‑ test_close_slice[test_close_slice_previous_cursor_is_highest-2023-01-01-stream_slice0-observed_records0-expected_state0]
unit_tests.sources.declarative.incremental.test_datetime_based_cursor ‑ test_close_slice[test_close_slice_stream_slice_partition_end_is_highest-2020-01-01-stream_slice1-observed_records1-expected_state1]
unit_tests.sources.declarative.incremental.test_datetime_based_cursor ‑ test_close_slice[test_close_slice_with_all_records_out_of_slice_and_no_previous_cursor-None-stream_slice9-observed_records9-expected_state9]
unit_tests.sources.declarative.incremental.test_datetime_based_cursor ‑ test_close_slice[test_close_slice_with_all_records_out_of_slice_boundaries-2021-01-01-stream_slice8-observed_records8-expected_state8]
unit_tests.sources.declarative.incremental.test_datetime_based_cursor ‑ test_close_slice[test_close_slice_with_no_records_observed-2021-01-01-stream_slice3-observed_records3-expected_state3]
unit_tests.sources.declarative.incremental.test_datetime_based_cursor ‑ test_close_slice[test_close_slice_with_no_records_observed_and_no_previous_state-None-stream_slice4-observed_records4-expected_state4]
unit_tests.sources.declarative.incremental.test_datetime_based_cursor ‑ test_close_slice[test_close_slice_with_out_of_order_records-2021-01-01-stream_slice6-observed_records6-expected_state6]
unit_tests.sources.declarative.incremental.test_datetime_based_cursor ‑ test_close_slice[test_close_slice_with_some_records_out_of_slice_boundaries-2021-01-01-stream_slice7-observed_records7-expected_state7]
unit_tests.sources.declarative.incremental.test_datetime_based_cursor ‑ test_close_slice[test_close_slice_without_previous_cursor-None-stream_slice5-observed_records5-expected_state5]
…
unit_tests.legacy.sources.declarative.incremental.test_datetime_based_cursor ‑ test_close_slice[test_close_slice_latest_record_cursor_value_is_higher_than_slice_end-2021-01-01-stream_slice2-observed_records2-expected_state2]
unit_tests.legacy.sources.declarative.incremental.test_datetime_based_cursor ‑ test_close_slice[test_close_slice_previous_cursor_is_highest-2023-01-01-stream_slice0-observed_records0-expected_state0]
unit_tests.legacy.sources.declarative.incremental.test_datetime_based_cursor ‑ test_close_slice[test_close_slice_stream_slice_partition_end_is_highest-2020-01-01-stream_slice1-observed_records1-expected_state1]
unit_tests.legacy.sources.declarative.incremental.test_datetime_based_cursor ‑ test_close_slice[test_close_slice_with_all_records_out_of_slice_and_no_previous_cursor-None-stream_slice9-observed_records9-expected_state9]
unit_tests.legacy.sources.declarative.incremental.test_datetime_based_cursor ‑ test_close_slice[test_close_slice_with_all_records_out_of_slice_boundaries-2021-01-01-stream_slice8-observed_records8-expected_state8]
unit_tests.legacy.sources.declarative.incremental.test_datetime_based_cursor ‑ test_close_slice[test_close_slice_with_no_records_observed-2021-01-01-stream_slice3-observed_records3-expected_state3]
unit_tests.legacy.sources.declarative.incremental.test_datetime_based_cursor ‑ test_close_slice[test_close_slice_with_no_records_observed_and_no_previous_state-None-stream_slice4-observed_records4-expected_state4]
unit_tests.legacy.sources.declarative.incremental.test_datetime_based_cursor ‑ test_close_slice[test_close_slice_with_out_of_order_records-2021-01-01-stream_slice6-observed_records6-expected_state6]
unit_tests.legacy.sources.declarative.incremental.test_datetime_based_cursor ‑ test_close_slice[test_close_slice_with_some_records_out_of_slice_boundaries-2021-01-01-stream_slice7-observed_records7-expected_state7]
unit_tests.legacy.sources.declarative.incremental.test_datetime_based_cursor ‑ test_close_slice[test_close_slice_without_previous_cursor-None-stream_slice5-observed_records5-expected_state5]
…

♻️ This comment has been updated with latest results.

Copy link

github-actions bot commented Sep 3, 2025

PyTest Results (Full)

3 745 tests   - 10   3 733 ✅  - 10   10m 49s ⏱️ -20s
    1 suites ± 0      12 💤 ± 0 
    1 files   ± 0       0 ❌ ± 0 

Results for commit e6f9485. ± Comparison against base commit dd52cfe.

This pull request removes 82 and adds 72 tests. Note that renamed tests count towards both.
unit_tests.sources.declarative.incremental.test_datetime_based_cursor ‑ test_close_slice[test_close_slice_latest_record_cursor_value_is_higher_than_slice_end-2021-01-01-stream_slice2-observed_records2-expected_state2]
unit_tests.sources.declarative.incremental.test_datetime_based_cursor ‑ test_close_slice[test_close_slice_previous_cursor_is_highest-2023-01-01-stream_slice0-observed_records0-expected_state0]
unit_tests.sources.declarative.incremental.test_datetime_based_cursor ‑ test_close_slice[test_close_slice_stream_slice_partition_end_is_highest-2020-01-01-stream_slice1-observed_records1-expected_state1]
unit_tests.sources.declarative.incremental.test_datetime_based_cursor ‑ test_close_slice[test_close_slice_with_all_records_out_of_slice_and_no_previous_cursor-None-stream_slice9-observed_records9-expected_state9]
unit_tests.sources.declarative.incremental.test_datetime_based_cursor ‑ test_close_slice[test_close_slice_with_all_records_out_of_slice_boundaries-2021-01-01-stream_slice8-observed_records8-expected_state8]
unit_tests.sources.declarative.incremental.test_datetime_based_cursor ‑ test_close_slice[test_close_slice_with_no_records_observed-2021-01-01-stream_slice3-observed_records3-expected_state3]
unit_tests.sources.declarative.incremental.test_datetime_based_cursor ‑ test_close_slice[test_close_slice_with_no_records_observed_and_no_previous_state-None-stream_slice4-observed_records4-expected_state4]
unit_tests.sources.declarative.incremental.test_datetime_based_cursor ‑ test_close_slice[test_close_slice_with_out_of_order_records-2021-01-01-stream_slice6-observed_records6-expected_state6]
unit_tests.sources.declarative.incremental.test_datetime_based_cursor ‑ test_close_slice[test_close_slice_with_some_records_out_of_slice_boundaries-2021-01-01-stream_slice7-observed_records7-expected_state7]
unit_tests.sources.declarative.incremental.test_datetime_based_cursor ‑ test_close_slice[test_close_slice_without_previous_cursor-None-stream_slice5-observed_records5-expected_state5]
…
unit_tests.legacy.sources.declarative.incremental.test_datetime_based_cursor ‑ test_close_slice[test_close_slice_latest_record_cursor_value_is_higher_than_slice_end-2021-01-01-stream_slice2-observed_records2-expected_state2]
unit_tests.legacy.sources.declarative.incremental.test_datetime_based_cursor ‑ test_close_slice[test_close_slice_previous_cursor_is_highest-2023-01-01-stream_slice0-observed_records0-expected_state0]
unit_tests.legacy.sources.declarative.incremental.test_datetime_based_cursor ‑ test_close_slice[test_close_slice_stream_slice_partition_end_is_highest-2020-01-01-stream_slice1-observed_records1-expected_state1]
unit_tests.legacy.sources.declarative.incremental.test_datetime_based_cursor ‑ test_close_slice[test_close_slice_with_all_records_out_of_slice_and_no_previous_cursor-None-stream_slice9-observed_records9-expected_state9]
unit_tests.legacy.sources.declarative.incremental.test_datetime_based_cursor ‑ test_close_slice[test_close_slice_with_all_records_out_of_slice_boundaries-2021-01-01-stream_slice8-observed_records8-expected_state8]
unit_tests.legacy.sources.declarative.incremental.test_datetime_based_cursor ‑ test_close_slice[test_close_slice_with_no_records_observed-2021-01-01-stream_slice3-observed_records3-expected_state3]
unit_tests.legacy.sources.declarative.incremental.test_datetime_based_cursor ‑ test_close_slice[test_close_slice_with_no_records_observed_and_no_previous_state-None-stream_slice4-observed_records4-expected_state4]
unit_tests.legacy.sources.declarative.incremental.test_datetime_based_cursor ‑ test_close_slice[test_close_slice_with_out_of_order_records-2021-01-01-stream_slice6-observed_records6-expected_state6]
unit_tests.legacy.sources.declarative.incremental.test_datetime_based_cursor ‑ test_close_slice[test_close_slice_with_some_records_out_of_slice_boundaries-2021-01-01-stream_slice7-observed_records7-expected_state7]
unit_tests.legacy.sources.declarative.incremental.test_datetime_based_cursor ‑ test_close_slice[test_close_slice_without_previous_cursor-None-stream_slice5-observed_records5-expected_state5]
…

♻️ This comment has been updated with latest results.

Copy link
Contributor

coderabbitai bot commented Sep 3, 2025

📝 Walkthrough

Walkthrough

Routes DeclarativeStream and cursor types to legacy modules, removes CustomIncrementalSync from schema/models/parsing, refactors ConcurrentDeclarativeSource to subclass Source with a Status-based check and public streams(config) API, adds local utilities to concurrent partition cursor, and updates many tests/imports accordingly.

Changes

Cohort / File(s) Summary
Top-level exports
airbyte_cdk/__init__.py
Repointed public imports for DeclarativeStream and DatetimeBasedCursor to legacy module paths; exported names unchanged.
Legacy declarative package
airbyte_cdk/legacy/sources/declarative/declarative_stream.py, airbyte_cdk/legacy/sources/declarative/incremental/...
Adjusted imports to reference legacy incremental modules; added re-exports in legacy incremental __init__.py exposing multiple cursor classes.
Declarative incremental (non-legacy) API reduction
airbyte_cdk/sources/declarative/incremental/__init__.py
Removed numerous cursor exports (leaving concurrent cursor variants), shrinking the non-legacy public API surface.
Concurrent partition cursor utilities
airbyte_cdk/sources/declarative/incremental/concurrent_partition_cursor.py
Inlined iterate_with_last_flag_and_state (generic) and introduced a local Timer class; removed previous external imports.
ConcurrentDeclarativeSource refactor
airbyte_cdk/sources/declarative/concurrent_declarative_source.py
Base class changed to Source; centralizes config in self._config; removed legacy grouping helpers; adds public streams(self, config) and check(self, logger, config) returning Status; stream construction updated.
Model factory / partition routing
airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py
Migrated runtime types to legacy imports; removed CustomIncrementalSyncModel; added partition_router: Optional[PartitionRouter] to create_simple_retriever and route request options via it; adjusted default stream return typing.
CustomIncrementalSync removal
airbyte_cdk/sources/declarative/declarative_component_schema.yaml, .../models/declarative_component_schema.py, .../parsers/manifest_component_transformer.py, .../migrations/legacy_to_per_partition_state_migration.py
Deleted CustomIncrementalSync schema/model, removed related DEFAULT_MODEL_TYPES entries, and narrowed migration ctor to accept only DatetimeBasedCursor.
Checks & typing updates
airbyte_cdk/sources/declarative/checks/*
Replaced AbstractSource annotations with public Source; check APIs now use a Status-bearing result object instead of (bool, message) tuples.
Retrievers & simple retriever
airbyte_cdk/sources/declarative/retrievers/simple_retriever.py
Switched several incremental/cursor imports to legacy paths; added TODOs; no behavioral changes.
Tests — imports & API alignment
unit_tests/... (many files)
Updated tests to import declarative types from legacy modules, adopt Status-based check API and streams(config) usage, convert some manifests to inline schemas, remove CustomIncrementalSync coverage, and reshape concurrent-cursor tests.
Misc small changes
airbyte_cdk/sources/declarative/requesters/http_job_repository.py, .../transformations/keys_replace_transformation.py, .../streams/concurrent/abstract_stream.py
Copyright year bump; moved some imports (AirbyteMessage, InterpolatedString); removed the deprecation decorator from AbstractStream.

Sequence Diagram(s)

sequenceDiagram
  autonumber
  actor User
  participant S as ConcurrentDeclarativeSource
  participant F as ModelToComponentFactory
  participant C as ConnectionChecker
  participant R as AirbyteConnectionStatus

  User->>S: check(logger, config)
  S->>F: build ConnectionChecker component (uses self._config)
  F-->>S: ConnectionChecker
  S->>C: check_connection(Source, logger, config)
  C-->>S: result {status: SUCCEEDED|FAILED, message}
  S-->>User: AirbyteConnectionStatus(status, message)

  Note right of S: check() returns Status-based result replacing (bool, message)
Loading
sequenceDiagram
  autonumber
  actor User
  participant S as ConcurrentDeclarativeSource
  participant F as ModelToComponentFactory

  User->>S: streams(config)
  S->>F: create stream components (may use partition_router)
  F-->>S: stream instances
  S->>S: select/filter concurrent streams
  S-->>User: List[Stream]

  Note right of S: public streams(config) replaces previous _group_streams tuple flow
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~75 minutes

Possibly related PRs

Suggested reviewers

  • maxi297
  • tolik0

Would you like me to flag the highest-risk files for deeper review (ConcurrentDeclarativeSource, ModelToComponentFactory, concurrent_partition_cursor) — wdyt?


📜 Recent review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

💡 Knowledge Base configuration:

  • MCP integration is disabled by default for public repositories
  • Jira integration is disabled by default for public repositories
  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 723f940 and e6f9485.

📒 Files selected for processing (4)
  • airbyte_cdk/sources/declarative/retrievers/simple_retriever.py (2 hunks)
  • unit_tests/connector_builder/test_connector_builder_handler.py (2 hunks)
  • unit_tests/sources/declarative/partition_routers/test_cartesian_product_partition_router.py (2 hunks)
  • unit_tests/sources/declarative/retrievers/test_simple_retriever.py (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (4)
  • unit_tests/sources/declarative/partition_routers/test_cartesian_product_partition_router.py
  • unit_tests/sources/declarative/retrievers/test_simple_retriever.py
  • unit_tests/connector_builder/test_connector_builder_handler.py
  • airbyte_cdk/sources/declarative/retrievers/simple_retriever.py
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (12)
  • GitHub Check: Check: source-intercom
  • GitHub Check: Check: destination-motherduck
  • GitHub Check: Check: source-shopify
  • GitHub Check: Check: source-pokeapi
  • GitHub Check: Check: source-hardcoded-records
  • GitHub Check: Pytest (All, Python 3.12, Ubuntu)
  • GitHub Check: Pytest (All, Python 3.13, Ubuntu)
  • GitHub Check: Pytest (All, Python 3.11, Ubuntu)
  • GitHub Check: Pytest (All, Python 3.10, Ubuntu)
  • GitHub Check: SDM Docker Image Build
  • GitHub Check: Manifest Server Docker Image Build
  • GitHub Check: Pytest (Fast)
✨ Finishing Touches
  • 📝 Generate Docstrings
🧪 Generate unit tests
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch brian/streamline_cds_move_legacy_components

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share
🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

‼️ IMPORTANT
Auto-reply has been disabled for this repository in the CodeRabbit settings. The CodeRabbit bot will not respond to your replies unless it is explicitly tagged.

  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.

Support

Need help? Create a ticket on our support page for assistance with any issues or questions.

CodeRabbit Commands (Invoked using PR/Issue comments)

Type @coderabbitai help to get the list of available commands.

Other keywords and placeholders

  • Add @coderabbitai ignore or @coderabbit ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Status, Documentation and Community

  • Visit our Status Page to check the current availability of CodeRabbit.
  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 5

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (10)
airbyte_cdk/sources/declarative/incremental/concurrent_partition_cursor.py (2)

220-239: Don’t block on a semaphore while holding the global lock.

acquire() may block for a while; doing it under self._lock serializes unrelated partitions and can hurt throughput. Grab the semaphore under lock, release the lock, block on the semaphore, then re-acquire the lock for the rest. This keeps invariants without long critical sections. wdyt?

-        with self._lock:
-            self._semaphore_per_partition[partition_key].acquire()
-            if not self._use_global_cursor:
-                self._cursor_per_partition[partition_key].close_partition(partition=partition)
-                cursor = self._cursor_per_partition[partition_key]
-                if (
-                    partition_key in self._partitions_done_generating_stream_slices
-                    and self._semaphore_per_partition[partition_key]._value == 0
-                ):
-                    self._update_global_cursor(cursor.state[self.cursor_field.cursor_field_key])
-
-            # Clean up the partition if it is fully processed
-            self._cleanup_if_done(partition_key)
-
-            self._check_and_update_parent_state()
-
-            self._emit_state_message()
+        # Read semaphore under lock, then block outside the lock
+        with self._lock:
+            semaphore = self._semaphore_per_partition[partition_key]
+        semaphore.acquire()
+
+        with self._lock:
+            if not self._use_global_cursor:
+                self._cursor_per_partition[partition_key].close_partition(partition=partition)
+                cursor = self._cursor_per_partition[partition_key]
+                if (
+                    partition_key in self._partitions_done_generating_stream_slices
+                    and self._semaphore_per_partition[partition_key]._value == 0
+                ):
+                    self._update_global_cursor(cursor.state[self.cursor_field.cursor_field_key])
+
+            # Clean up the partition if it is fully processed
+            self._cleanup_if_done(partition_key)
+
+            self._check_and_update_parent_state()
+
+        self._emit_state_message()

266-273: Guard Timer.finish() to avoid RuntimeError when no slices were generated.

If no partitions/slices were ever generated and ensure_at_least_one_state_emitted() is called, finish() can raise because _timer.start() never ran. Guarding with is_running() makes this safe. wdyt?

-        if not any(
+        if not any(
             semaphore_item[1]._value for semaphore_item in self._semaphore_per_partition.items()
         ):
             self._global_cursor = self._new_global_cursor
-            self._lookback_window = self._timer.finish()
+            if self._timer.is_running():
+                self._lookback_window = self._timer.finish()
+            else:
+                self._lookback_window = 0
             self._parent_state = self._partition_router.get_stream_state()
         self._emit_state_message(throttle=False)
airbyte_cdk/legacy/sources/declarative/incremental/global_substream_cursor.py (2)

63-66: Bug: calling float.ceil directly raises AttributeError at runtime.

((...)/1e9).__ceil__() will fail because float doesn’t expose __ceil__ publicly; use math.ceil and a None check.

Apply this diff?

@@
-    def finish(self) -> int:
-        if self._start:
-            return ((time.perf_counter_ns() - self._start) / 1e9).__ceil__()
-        else:
-            raise RuntimeError("Global substream cursor timer not started")
+    def finish(self) -> int:
+        import math
+        if self._start is not None:
+            seconds = (time.perf_counter_ns() - self._start) / 1e9
+            return math.ceil(seconds)
+        raise RuntimeError("Global substream cursor timer not started")

238-241: Avoid relying on Semaphore._value (private, non-portable).

Reading _slice_semaphore._value is implementation detail and racy across Python versions. Would you track an explicit “inflight slices” counter guarded by the same lock (or a Condition) and check inflight == 0 instead, wdyt?

@@ class GlobalSubstreamCursor(DeclarativeCursor):
-        self._slice_semaphore = threading.Semaphore(0)
+        self._slice_semaphore = threading.Semaphore(0)
+        self._inflight = 0
@@
-        self._slice_semaphore.release()
+        self._slice_semaphore.release()
+        with self._lock:
+            self._inflight += 1
         if last:
             self._all_slices_yielded = True
@@
-            self._slice_semaphore.acquire()
-            if self._all_slices_yielded and self._slice_semaphore._value == 0:
+            self._slice_semaphore.acquire()
+            self._inflight -= 1
+            if self._all_slices_yielded and self._inflight == 0:
                 self._lookback_window = self._timer.finish()
                 self._stream_cursor.close_slice(
                     StreamSlice(partition={}, cursor_slice=stream_slice.cursor_slice), *args
                 )
unit_tests/sources/declarative/custom_state_migration.py (2)

24-26: Bug: computed self._cursor_field is never used; raw self._cursor.cursor_field is used instead.

This can break when the cursor field is interpolated. Use the evaluated self._cursor_field for state lookups and emitted keys.

Apply this diff?

-        self._cursor_field = InterpolatedString.create(
-            self._cursor.cursor_field, parameters=self._parameters
-        ).eval(self._config)
+        self._cursor_field = InterpolatedString.create(
+            self._cursor.cursor_field, parameters=self._parameters
+        ).eval(self._config)
@@
-        updated_at = stream_state[self._cursor.cursor_field]
+        updated_at = stream_state[self._cursor_field]
@@
-                    "cursor": {self._cursor.cursor_field: updated_at},
+                    "cursor": {self._cursor_field: updated_at},
@@
-                    "cursor": {self._cursor.cursor_field: updated_at},
+                    "cursor": {self._cursor_field: updated_at},

Also, do you want to guard against a missing key in stream_state and raise a clearer error, wdyt?

Also applies to: 31-45


19-23: Defensive check: ensure incremental_sync exists.

If declarative_stream.incremental_sync can be None, accessing .cursor_field will crash. Add a clear check?

 def __init__(self, declarative_stream: DeclarativeStreamModel, config: Config):
     self._config = config
-    self.declarative_stream = declarative_stream
-    self._cursor = declarative_stream.incremental_sync
+    self.declarative_stream = declarative_stream
+    self._cursor = declarative_stream.incremental_sync
+    if self._cursor is None:
+        raise ValueError("CustomStateMigration requires an incremental_sync with a cursor_field")

Wdyt?

airbyte_cdk/sources/declarative/concurrent_declarative_source.py (2)

389-399: Narrow streams(...) to return only concurrent streams and assert at runtime.

Since CDS now “only processes DefaultStream”, can we tighten the annotation and assert to fail fast if a non-concurrent stream slips in?

-    def streams(self, config: Mapping[str, Any]) -> List[Union[Stream, AbstractStream]]:  # type: ignore  # we are migrating away from the AbstractSource and are expecting that this will only be called by ConcurrentDeclarativeSource or the Connector Builder
+    def streams(self, config: Mapping[str, Any]) -> List[AbstractStream]:  # type: ignore
@@
-        return source_streams
+        # Fail fast if any non-concurrent stream appears (should not happen per design).
+        assert all(isinstance(s, AbstractStream) for s in source_streams), "ConcurrentDeclarativeSource.streams expected AbstractStream instances only"
+        return source_streams  # type: ignore

Wdyt?

Also applies to: 409-421


632-644: Filter to concurrent streams in _select_streams(...).

If a non-concurrent Stream ever gets through, ConcurrentSource.read could break. Filtering is cheap insurance.

-        for configured_stream in configured_catalog.streams:
-            stream_instance = stream_name_to_instance.get(configured_stream.stream.name)
-            if stream_instance:
-                abstract_streams.append(stream_instance)
+        for configured_stream in configured_catalog.streams:
+            stream_instance = stream_name_to_instance.get(configured_stream.stream.name)
+            if stream_instance and isinstance(stream_instance, AbstractStream):
+                abstract_streams.append(stream_instance)

Wdyt?

airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py (2)

1966-1973: Bug: partition_field_start uses partition_field_end (leads to wrong request option injection).

This will send both start/end under the end field key. Should be start->partition_field_start. Patch below, wdyt?

-            request_options_provider = DatetimeBasedRequestOptionsProvider(
-                start_time_option=start_time_option,
-                end_time_option=end_time_option,
-                partition_field_start=cursor_model.partition_field_end,
-                partition_field_end=cursor_model.partition_field_end,
-                config=config,
-                parameters=model.parameters or {},
-            )
+            request_options_provider = DatetimeBasedRequestOptionsProvider(
+                start_time_option=start_time_option,
+                end_time_option=end_time_option,
+                partition_field_start=cursor_model.partition_field_start,
+                partition_field_end=cursor_model.partition_field_end,
+                config=config,
+                parameters=model.parameters or {},
+            )

2656-2665: Missing required kw-only arg: stream_slicer in create_dynamic_schema_loader.

create_simple_retriever requires stream_slicer (kw-only). This call omits it and will raise TypeError. Pass None explicitly while keeping partition_router for request option routing, wdyt?

         retriever = self._create_component_from_model(
             model=model.retriever,
             config=config,
             name=name,
             primary_key=None,
-            partition_router=self._build_stream_slicer_from_partition_router(
+            stream_slicer=None,
+            partition_router=self._build_stream_slicer_from_partition_router(
                 model.retriever, config
             ),
             transformations=[],
             use_cache=True,
🧹 Nitpick comments (36)
airbyte_cdk/sources/declarative/incremental/concurrent_partition_cursor.py (9)

33-48: Fix docstring to match the function signature (zero-arg state supplier).

The docstring still claims get_stream_state_func “takes an element,” but it’s a zero-arg callable invoked between elements. Also clarify that the function is a generator yielding (element, is_last, state). Adjusting the doc clarifies usage and avoids confusion, wdyt?

Apply this diff to update the docstring:

-    Iterates over the given generator, yielding tuples containing the element, a flag
-    indicating whether it's the last element in the generator, and the result of
-    `get_stream_state_func` applied to the element.
+    Iterates over the given generator and yields (element, is_last, state) tuples.
+    `get_stream_state_func` is a zero-argument callable that is invoked before the
+    first element and between successive elements to fetch the current stream state.
@@
-        get_stream_state_func: A function that takes an element from the generator and
-            returns its state.
+        get_stream_state_func: A zero-argument callable returning the current stream
+            state mapping (or None) for the stream.
@@
-        An iterator that yields tuples of the form (element, is_last, state).
+        A generator that yields tuples of the form (element, is_last, state).

66-85: Harden Timer.finish() None-check; keep semantics intact.

Using if self._start: can mask a legitimate 0 value and is less explicit. Switching to a None check makes intent clear and avoids surprises. Do you want to also reset _start after finish() to prevent accidental reuse, or do you intentionally rely on is_running() staying True as a “stream_slices was invoked” sentinel, wdyt?

     def finish(self) -> int:
-        if self._start:
+        if self._start is not None:
             return ((time.perf_counter_ns() - self._start) / 1e9).__ceil__()
         else:
             raise RuntimeError("Global substream cursor timer not started")

303-313: Avoid unused variable ‘last’.

last from the outer partition iterator isn’t used. Renaming to _ (or _is_last) avoids lint noise and clarifies intent. wdyt?

-        for partition, last, parent_state in iterate_with_last_flag_and_state(
+        for partition, _, parent_state in iterate_with_last_flag_and_state(
             slices, self._partition_router.get_stream_state
         ):

320-331: Reuse computed partition_key to avoid duplicate work.

Nit: you compute partition_key at Line 320, then recompute the same key at Line 322. Reusing it improves clarity and avoids extra serialization. wdyt?

-        cursor = self._cursor_per_partition.get(self._to_partition_key(partition.partition))
+        cursor = self._cursor_per_partition.get(partition_key)

371-379: Docstring references a non-existent _finished_partitions.

The implementation removes finished partitions by checking membership in _partition_key_to_index and tracking _partitions_done_generating_stream_slices. Update the docstring to reflect current behavior. wdyt?

-        1. Attempt to remove partitions that are marked as finished in `_finished_partitions`.
-           These partitions are considered processed and safe to delete.
+        1. Attempt to remove partitions that have completed processing
+           (i.e., not present in `_partition_key_to_index`). These are safe to delete.

387-407: Make the partition limit condition clearer.

Using >= DEFAULT_MAX_PARTITIONS_NUMBER is clearer than > DEFAULT_MAX_PARTITIONS_NUMBER - 1. Same semantics, better readability. wdyt?

-            while len(self._cursor_per_partition) > self.DEFAULT_MAX_PARTITIONS_NUMBER - 1:
+            while len(self._cursor_per_partition) >= self.DEFAULT_MAX_PARTITIONS_NUMBER:

109-113: Typo in class docstring: “CurrentPerPartitionCursor” → “ConcurrentPerPartitionCursor”.

Tiny nit but helps searchability and avoids confusion. wdyt?

-    CurrentPerPartitionCursor expects the state of the ConcurrentCursor to follow the format {cursor_field: cursor_value}.
+    ConcurrentPerPartitionCursor expects the state of the ConcurrentCursor to follow the format {cursor_field: cursor_value}.

221-231: Avoid relying on Semaphore._value (private API).

Accessing _value is CPython-specific and not thread-safe. Could we track “in-flight slices per partition” explicitly (e.g., an int counter in a dict updated alongside the semaphore) or use a BoundedSemaphore and a separate completion flag to determine when to advance the global cursor, wdyt?


5-11: Import cleanup (minor).

Both import copy and from copy import deepcopy are used. Would you prefer to standardize on one (e.g., only deepcopy) and replace copy.deepcopy(...) calls accordingly, wdyt?

airbyte_cdk/sources/declarative/migrations/legacy_to_per_partition_state_migration.py (1)

35-41: Type narrowing to DatetimeBasedCursor only — confirm callers/manifests are migrated

Constraining cursor to DatetimeBasedCursor aligns with the schema changes. Could we double-check that any manifests/tests that previously provided CustomIncrementalSync are updated, and add a short note in the migration docstring that only DatetimeBasedCursor is supported now, wdyt?

airbyte_cdk/sources/declarative/checks/check_dynamic_stream.py (1)

9-9: Adopt Source/Stream in check_dynamic_stream — aligned with refactor

The type updates are coherent with the rest of the PR. One small hardening idea: if an exception occurs before the first iteration assigns stream, referencing stream.name in the except could raise an UnboundLocalError. Would you like to guard the name used in the error message, wdyt?

For example:

current_stream_name = "<unknown>"
try:
    for stream in streams[: min(self.stream_count, len(streams))]:
        current_stream_name = getattr(stream, "name", "<unknown>")
        stream_is_available, reason = evaluate_availability(stream, logger)
        if not stream_is_available:
            logger.warning(f"Stream {current_stream_name} is not available: {reason}")
            return False, reason
except Exception as error:
    error_message = f"Encountered an error trying to connect to stream {current_stream_name}. Error: {error}"
    logger.error(error_message, exc_info=True)
    return False, error_message

Also applies to: 12-12, 36-39

unit_tests/sources/streams/concurrent/test_cursor.py (2)

16-18: Tests updated to legacy import — nice. Add a guard for compatibility?

Would you add a tiny test asserting from airbyte_cdk import DatetimeBasedCursor still works (re-export), to protect downstreams relying on the top-level import, wdyt?


1288-1296: Tests rely on private attrs of DatetimeBasedCursor.

The test accesses _start_datetime, _partition_field_start/_end, _step, etc. Could we expose small accessors or a thin helper for these to reduce brittleness against internal refactors, wdyt?

Also applies to: 1299-1303

airbyte_cdk/legacy/sources/declarative/incremental/per_partition_cursor.py (2)

89-96: Partition limit loop is a bit opaque with “- 1”.

Would >= self.DEFAULT_MAX_PARTITIONS_NUMBER read clearer and avoid off-by-one confusion, wdyt?

-        while len(self._cursor_per_partition) > self.DEFAULT_MAX_PARTITIONS_NUMBER - 1:
+        while len(self._cursor_per_partition) >= self.DEFAULT_MAX_PARTITIONS_NUMBER:

204-209: Unreachable code after raise.

The second if not stream_slice: is dead. Remove for clarity, wdyt?

     def select_state(self, stream_slice: Optional[StreamSlice] = None) -> Optional[StreamState]:
         if not stream_slice:
             raise ValueError("A partition needs to be provided in order to extract a state")
 
-        if not stream_slice:
-            return None
-
         return self._get_state_for_partition(stream_slice.partition)
unit_tests/sources/declarative/partition_routers/helpers.py (2)

55-58: Type hints: make optional explicit

cursor_field and stream_state can be None in stream_slices; would you update hints to Optional[...] for accuracy, wdyt?

-    def stream_slices(
-        self,
-        *,
-        sync_mode: SyncMode,
-        cursor_field: List[str] = None,
-        stream_state: Mapping[str, Any] = None,
-    ) -> Iterable[Optional[StreamSlice]]:
+    def stream_slices(
+        self,
+        *,
+        sync_mode: SyncMode,
+        cursor_field: Optional[List[str]] = None,
+        stream_state: Optional[Mapping[str, Any]] = None,
+    ) -> Iterable[Optional[StreamSlice]]:

64-83: read_records return type vs yielded Records

read_records advertises Iterable[Mapping[str, Any]] but yields Record objects when stream_slice is set. For clarity, would you either change the annotation or yield dicts consistently, wdyt?

-    ) -> Iterable[Mapping[str, Any]]:
+    ) -> Iterable[Union[Mapping[str, Any], Record]]:
airbyte_cdk/legacy/sources/declarative/declarative_stream.py (1)

8-12: Prefer public re-exports to reduce internal coupling

Since these classes are re-exported at airbyte_cdk.init, would you consider importing via the top-level package to avoid binding tests/runtime to internal legacy paths, wdyt?

-from airbyte_cdk.legacy.sources.declarative.incremental import (
-    GlobalSubstreamCursor,
-    PerPartitionCursor,
-    PerPartitionWithGlobalCursor,
-)
+from airbyte_cdk import (
+    GlobalSubstreamCursor,
+    PerPartitionCursor,
+    PerPartitionWithGlobalCursor,
+)
unit_tests/sources/declarative/retrievers/test_simple_retriever.py (1)

15-20: Optional: import via top-level re-exports to decouple tests

To avoid tests depending on internal legacy structure, would you switch to top-level re-exports, wdyt?

-from airbyte_cdk.legacy.sources.declarative.incremental import (
-    DatetimeBasedCursor,
-    DeclarativeCursor,
-    ResumableFullRefreshCursor,
-)
+from airbyte_cdk import (
+    DatetimeBasedCursor,
+    DeclarativeCursor,
+    ResumableFullRefreshCursor,
+)
unit_tests/sources/declarative/partition_routers/test_cartesian_product_partition_router.py (1)

7-9: Optionally use top-level re-export

Would using the package-level re-export help keep this test insulated from internal module layout, wdyt?

-from airbyte_cdk.legacy.sources.declarative.incremental.datetime_based_cursor import (
-    DatetimeBasedCursor,
-)
+from airbyte_cdk import DatetimeBasedCursor
airbyte_cdk/legacy/sources/declarative/incremental/per_partition_with_global.py (1)

10-13: Minor: avoid importing helpers from deep modules?

iterate_with_last_flag_and_state coming from global_substream_cursor creates a subtle coupling; would you consider relocating that helper to a small shared utils module within legacy.incremental and import from there, wdyt?

unit_tests/sources/declarative/stream_slicers/test_stream_slicer_read_decorator.py (1)

143-147: Typo in test name (“aysnc”) — rename for clarity.

Would you mind renaming the test to “async” to avoid confusion, wdyt?

-def test_isinstance_global_cursor_aysnc_job_partition_router():
+def test_isinstance_global_cursor_async_job_partition_router():
unit_tests/sources/declarative/checks/test_check_dynamic_stream.py (2)

125-127: Make parametrized test IDs unique for readability.

Both cases use “test_stream_available”; shall we differentiate them to ease test triage, wdyt?

-        pytest.param(200, Status.SUCCEEDED, True, [], id="test_stream_available"),
-        pytest.param(200, Status.SUCCEEDED, False, [], id="test_stream_available"),
+        pytest.param(200, Status.SUCCEEDED, True, [], id="test_stream_available_with_availability"),
+        pytest.param(200, Status.SUCCEEDED, False, [], id="test_stream_available_without_availability"),

169-171: Guard against None message on failure-expected cases.

To avoid surprises if message is missing, would you add a small guard when expected_messages is non-empty, wdyt?

-    for message in expected_messages:
-        assert message in connection_status.message
+    if expected_messages:
+        assert connection_status.message, "Expected failure message missing"
+        for message in expected_messages:
+            assert message in connection_status.message
airbyte_cdk/sources/declarative/checks/check_stream.py (2)

66-71: Signature updated to source: Source—keep tuple return for ConnectionChecker but consider future-proofing.

Since the public check surface across the PR is moving to Status-based results, would you like to add a small note to the docstring here clarifying why this still returns (bool, Any) as an internal component contract, to avoid confusion, wdyt?


122-127: Param type updated to Source—guarding already handled.

should_check_dynamic_streams pre-check avoids attribute errors; looks good. Tiny nit: the comment/docstring for evaluate_availability still references “migrate everything to AbstractStream” which is outdated with the DefaultStream-only direction—want to update that for clarity, wdyt?

unit_tests/sources/declarative/checks/test_check_stream.py (2)

372-399: Status-based expectations read well; consider asserting messages on failures too.

You already validate HTTP error messages elsewhere; would you like to also assert connection_status.message when expected_result is Status.FAILED here to tighten failure-path coverage, wdyt?

Also applies to: 400-470, 471-519, 520-569, 570-617


694-694: Expecting ValueError with only type provided.

If the model validation raises ValidationError instead, this test could become brittle. Do you want to accept either ValueError or ValidationError to future-proof it, wdyt?

airbyte_cdk/sources/declarative/concurrent_declarative_source.py (1)

514-533: ConditionalStreams evaluation uses empty parameters.

If any legacy manifests rely on $parameters in the condition, this could surprise them. Should we pass through manifest-level parameters (if present) to InterpolatedBoolean for parity, wdyt?

unit_tests/sources/declarative/parsers/test_model_to_component_factory.py (1)

18-25: Use top-level re-exports for DeclarativeStream and DatetimeBasedCursor?
We’ve confirmed they’re re-exported in airbyte_cdk/init.py (imports on lines 68–69, all at 237, 241), so you can simplify the test imports:

-from airbyte_cdk.legacy.sources.declarative.declarative_stream import DeclarativeStream
+from airbyte_cdk import DeclarativeStream
@@
-from airbyte_cdk.legacy.sources.declarative.incremental import (
-    CursorFactory,
-    DatetimeBasedCursor,
-    PerPartitionCursor,
-    PerPartitionWithGlobalCursor,
-    ResumableFullRefreshCursor,
-)
+from airbyte_cdk.legacy.sources.declarative.incremental import (
+    CursorFactory,
+    PerPartitionCursor,
+    PerPartitionWithGlobalCursor,
+    ResumableFullRefreshCursor,
+)
+from airbyte_cdk import DatetimeBasedCursor

wdyt?

airbyte_cdk/legacy/sources/declarative/incremental/__init__.py (1)

2-2: Add a short module docstring to signal deprecation surface?

A tiny docstring will guide users toward preferred import paths without breaking anything. Add something like below, wdyt?

 # Copyright (c) 2025 Airbyte, Inc., all rights reserved.
 
+"""
+Legacy declarative incremental cursor re-exports.
+Prefer importing from `airbyte_cdk` top-level or concurrent modules where available.
+This module exists for backward compatibility.
+"""
+
 from airbyte_cdk.legacy.sources.declarative.incremental.datetime_based_cursor import (
unit_tests/sources/declarative/test_concurrent_declarative_source.py (4)

715-731: Stabilize stream selection: prefer name-based lookup over list indexes

Indexing into streams (e.g., [0], [2], [7]) is brittle as ordering may shift with manifest or constructor changes. Shall we switch to a helper that resolves by name, and update these spots, wdyt?

Apply within the shown ranges:

-    party_members_stream = streams[0]
+    party_members_stream = get_stream_by_name(streams, "party_members")
...
-    locations_stream = streams[2]
+    locations_stream = get_stream_by_name(streams, "locations")
...
-    incremental_counting_stream = streams[7]
+    incremental_counting_stream = get_stream_by_name(streams, "incremental_counting_stream")

Add this small helper once (outside these ranges):

def get_stream_by_name(streams, name: str):
    return next(s for s in streams if s.name == name)

Also applies to: 733-747, 758-771


1393-1398: Avoid index-based access in other tests, too

Same ask here to reduce flakiness; resolve by stream name instead, wdyt?

-    streams = source.streams(_CONFIG)
-    assert streams[0].cursor.state.get("state") != state_blob.__dict__, "State was not migrated."
-    assert streams[0].cursor.state.get("states") == [
+    streams = source.streams(_CONFIG)
+    party_members_stream = get_stream_by_name(streams, "party_members")
+    assert party_members_stream.cursor.state.get("state") != state_blob.__dict__, "State was not migrated."
+    assert party_members_stream.cursor.state.get("states") == [
       {"cursor": {"updated_at": "2024-08-21"}, "partition": {"type": "type_1"}},
       {"cursor": {"updated_at": "2024-08-21"}, "partition": {"type": "type_2"}},
     ], "State was migrated, but actual state don't match expected"
-    async_job_stream = streams[6]
+    async_job_stream = get_stream_by_name(streams, "async_job_stream")
-    locations_stream = streams[2]
+    locations_stream = get_stream_by_name(streams, "locations")
-    locations_stream = streams[2]
+    locations_stream = get_stream_by_name(streams, "locations")

Also applies to: 1637-1639, 1695-1697, 1754-1756


903-906: Harden the patched init against signature drift

Patching AbstractStreamStateConverter.__init__ with a fixed signature is fragile. Can we accept *args, **kwargs so future params don’t break tests, wdyt?

Proposed replacement for mocked_init (defined above this patch):

def mocked_init(self, *args, **kwargs):
    # Force concurrent state format in tests without depending on the ctor signature
    self._is_sequential_state = False

2183-2187: DRY the check-path mocking and avoid coupling to a private method

Two tests patch SimpleRetriever._fetch_next_page for check(). Would you consider a small fixture/context helper to centralize this and reduce reliance on a private method, wdyt?

Example fixture:

import contextlib
import pytest
from unittest.mock import patch

@pytest.fixture
def mock_check_pages():
    @contextlib.contextmanager
    def _mock(pages):
        with patch.object(SimpleRetriever, "_fetch_next_page", side_effect=pages):
            yield
    return _mock

Usage:

with mock_check_pages([_create_page({...})]):
    connection_status = source.check(logging.getLogger(""), config)
    assert connection_status.status == Status.SUCCEEDED

Also applies to: 2962-2973

airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py (1)

3387-3391: Be defensive when swapping request_options_provider for partition_router.

PartitionRouter may not always implement the RequestOptionsProvider methods. Guard with hasattr to avoid runtime AttributeError and cast to satisfy typing, wdyt?

-        if isinstance(request_options_provider, DefaultRequestOptionsProvider) and isinstance(
-            partition_router, PartitionRouter
-        ):
-            request_options_provider = partition_router
+        if (
+            isinstance(request_options_provider, DefaultRequestOptionsProvider)
+            and isinstance(partition_router, PartitionRouter)
+            and hasattr(partition_router, "get_request_params")
+        ):
+            request_options_provider = cast(RequestOptionsProvider, partition_router)
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

💡 Knowledge Base configuration:

  • MCP integration is disabled by default for public repositories
  • Jira integration is disabled by default for public repositories
  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between e5a1fc2 and 4cb9150.

📒 Files selected for processing (38)
  • airbyte_cdk/__init__.py (1 hunks)
  • airbyte_cdk/legacy/sources/declarative/declarative_stream.py (1 hunks)
  • airbyte_cdk/legacy/sources/declarative/incremental/__init__.py (1 hunks)
  • airbyte_cdk/legacy/sources/declarative/incremental/datetime_based_cursor.py (1 hunks)
  • airbyte_cdk/legacy/sources/declarative/incremental/global_substream_cursor.py (1 hunks)
  • airbyte_cdk/legacy/sources/declarative/incremental/per_partition_cursor.py (1 hunks)
  • airbyte_cdk/legacy/sources/declarative/incremental/per_partition_with_global.py (1 hunks)
  • airbyte_cdk/legacy/sources/declarative/incremental/resumable_full_refresh_cursor.py (1 hunks)
  • airbyte_cdk/sources/declarative/checks/check_dynamic_stream.py (2 hunks)
  • airbyte_cdk/sources/declarative/checks/check_stream.py (3 hunks)
  • airbyte_cdk/sources/declarative/checks/connection_checker.py (2 hunks)
  • airbyte_cdk/sources/declarative/concurrent_declarative_source.py (8 hunks)
  • airbyte_cdk/sources/declarative/declarative_component_schema.yaml (0 hunks)
  • airbyte_cdk/sources/declarative/incremental/__init__.py (0 hunks)
  • airbyte_cdk/sources/declarative/incremental/concurrent_partition_cursor.py (2 hunks)
  • airbyte_cdk/sources/declarative/migrations/legacy_to_per_partition_state_migration.py (1 hunks)
  • airbyte_cdk/sources/declarative/models/declarative_component_schema.py (1 hunks)
  • airbyte_cdk/sources/declarative/parsers/manifest_component_transformer.py (0 hunks)
  • airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py (4 hunks)
  • airbyte_cdk/sources/declarative/retrievers/simple_retriever.py (1 hunks)
  • unit_tests/connector_builder/test_connector_builder_handler.py (1 hunks)
  • unit_tests/legacy/sources/declarative/incremental/test_datetime_based_cursor.py (1 hunks)
  • unit_tests/legacy/sources/declarative/incremental/test_per_partition_cursor.py (1 hunks)
  • unit_tests/legacy/sources/declarative/incremental/test_resumable_full_refresh_cursor.py (1 hunks)
  • unit_tests/legacy/sources/declarative/test_declarative_stream.py (1 hunks)
  • unit_tests/legacy/sources/declarative/test_manifest_declarative_source.py (0 hunks)
  • unit_tests/sources/declarative/checks/test_check_dynamic_stream.py (3 hunks)
  • unit_tests/sources/declarative/checks/test_check_stream.py (17 hunks)
  • unit_tests/sources/declarative/custom_state_migration.py (1 hunks)
  • unit_tests/sources/declarative/parsers/test_model_to_component_factory.py (2 hunks)
  • unit_tests/sources/declarative/partition_routers/helpers.py (1 hunks)
  • unit_tests/sources/declarative/partition_routers/test_cartesian_product_partition_router.py (1 hunks)
  • unit_tests/sources/declarative/partition_routers/test_substream_partition_router.py (1 hunks)
  • unit_tests/sources/declarative/requesters/paginators/test_stop_condition.py (2 hunks)
  • unit_tests/sources/declarative/retrievers/test_simple_retriever.py (1 hunks)
  • unit_tests/sources/declarative/stream_slicers/test_stream_slicer_read_decorator.py (1 hunks)
  • unit_tests/sources/declarative/test_concurrent_declarative_source.py (15 hunks)
  • unit_tests/sources/streams/concurrent/test_cursor.py (2 hunks)
💤 Files with no reviewable changes (4)
  • airbyte_cdk/sources/declarative/parsers/manifest_component_transformer.py
  • unit_tests/legacy/sources/declarative/test_manifest_declarative_source.py
  • airbyte_cdk/sources/declarative/declarative_component_schema.yaml
  • airbyte_cdk/sources/declarative/incremental/init.py
🧰 Additional context used
🧠 Learnings (7)
📓 Common learnings
Learnt from: ChristoGrab
PR: airbytehq/airbyte-python-cdk#58
File: airbyte_cdk/sources/declarative/yaml_declarative_source.py:0-0
Timestamp: 2024-11-18T23:40:06.391Z
Learning: When modifying the `YamlDeclarativeSource` class in `airbyte_cdk/sources/declarative/yaml_declarative_source.py`, avoid introducing breaking changes like altering method signatures within the scope of unrelated PRs. Such changes should be addressed separately to minimize impact on existing implementations.
📚 Learning: 2024-11-15T00:59:08.154Z
Learnt from: aaronsteers
PR: airbytehq/airbyte-python-cdk#58
File: airbyte_cdk/cli/source_declarative_manifest/spec.json:9-15
Timestamp: 2024-11-15T00:59:08.154Z
Learning: When code in `airbyte_cdk/cli/source_declarative_manifest/` is being imported from another repository, avoid suggesting modifications to it during the import process.

Applied to files:

  • unit_tests/legacy/sources/declarative/incremental/test_resumable_full_refresh_cursor.py
  • unit_tests/legacy/sources/declarative/test_declarative_stream.py
  • airbyte_cdk/legacy/sources/declarative/incremental/datetime_based_cursor.py
  • airbyte_cdk/legacy/sources/declarative/declarative_stream.py
  • unit_tests/sources/declarative/partition_routers/helpers.py
  • airbyte_cdk/legacy/sources/declarative/incremental/resumable_full_refresh_cursor.py
  • airbyte_cdk/__init__.py
📚 Learning: 2024-11-15T01:04:21.272Z
Learnt from: aaronsteers
PR: airbytehq/airbyte-python-cdk#58
File: airbyte_cdk/cli/source_declarative_manifest/_run.py:62-65
Timestamp: 2024-11-15T01:04:21.272Z
Learning: The files in `airbyte_cdk/cli/source_declarative_manifest/`, including `_run.py`, are imported from another repository, and changes to these files should be minimized or avoided when possible to maintain consistency.

Applied to files:

  • unit_tests/legacy/sources/declarative/incremental/test_resumable_full_refresh_cursor.py
  • unit_tests/legacy/sources/declarative/test_declarative_stream.py
  • airbyte_cdk/legacy/sources/declarative/incremental/datetime_based_cursor.py
  • airbyte_cdk/legacy/sources/declarative/declarative_stream.py
  • unit_tests/sources/declarative/partition_routers/helpers.py
  • airbyte_cdk/legacy/sources/declarative/incremental/per_partition_cursor.py
  • unit_tests/connector_builder/test_connector_builder_handler.py
  • airbyte_cdk/sources/declarative/retrievers/simple_retriever.py
  • airbyte_cdk/legacy/sources/declarative/incremental/resumable_full_refresh_cursor.py
  • airbyte_cdk/__init__.py
  • unit_tests/sources/declarative/parsers/test_model_to_component_factory.py
  • unit_tests/sources/declarative/stream_slicers/test_stream_slicer_read_decorator.py
  • unit_tests/sources/declarative/test_concurrent_declarative_source.py
📚 Learning: 2025-01-13T23:39:15.457Z
Learnt from: aaronsteers
PR: airbytehq/airbyte-python-cdk#174
File: unit_tests/source_declarative_manifest/resources/source_the_guardian_api/components.py:21-29
Timestamp: 2025-01-13T23:39:15.457Z
Learning: The CustomPageIncrement class in unit_tests/source_declarative_manifest/resources/source_the_guardian_api/components.py is imported from another connector definition and should not be modified in this context.

Applied to files:

  • unit_tests/sources/declarative/requesters/paginators/test_stop_condition.py
  • unit_tests/connector_builder/test_connector_builder_handler.py
📚 Learning: 2024-11-18T23:40:06.391Z
Learnt from: ChristoGrab
PR: airbytehq/airbyte-python-cdk#58
File: airbyte_cdk/sources/declarative/yaml_declarative_source.py:0-0
Timestamp: 2024-11-18T23:40:06.391Z
Learning: When modifying the `YamlDeclarativeSource` class in `airbyte_cdk/sources/declarative/yaml_declarative_source.py`, avoid introducing breaking changes like altering method signatures within the scope of unrelated PRs. Such changes should be addressed separately to minimize impact on existing implementations.

Applied to files:

  • unit_tests/legacy/sources/declarative/test_declarative_stream.py
  • airbyte_cdk/legacy/sources/declarative/declarative_stream.py
  • unit_tests/sources/declarative/partition_routers/helpers.py
  • airbyte_cdk/sources/declarative/concurrent_declarative_source.py
  • unit_tests/sources/declarative/parsers/test_model_to_component_factory.py
  • airbyte_cdk/sources/declarative/checks/check_stream.py
📚 Learning: 2024-12-11T16:34:46.319Z
Learnt from: pnilan
PR: airbytehq/airbyte-python-cdk#0
File: :0-0
Timestamp: 2024-12-11T16:34:46.319Z
Learning: In the airbytehq/airbyte-python-cdk repository, the `declarative_component_schema.py` file is auto-generated from `declarative_component_schema.yaml` and should be ignored in the recommended reviewing order.

Applied to files:

  • airbyte_cdk/sources/declarative/models/declarative_component_schema.py
  • airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py
📚 Learning: 2025-01-14T00:20:32.310Z
Learnt from: aaronsteers
PR: airbytehq/airbyte-python-cdk#174
File: airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py:1093-1102
Timestamp: 2025-01-14T00:20:32.310Z
Learning: In the `airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py` file, the strict module name checks in `_get_class_from_fully_qualified_class_name` (requiring `module_name` to be "components" and `module_name_full` to be "source_declarative_manifest.components") are intentionally designed to provide early, clear feedback when class declarations won't be found later in execution. These restrictions may be loosened in the future if the requirements for class definition locations change.

Applied to files:

  • airbyte_cdk/sources/declarative/concurrent_declarative_source.py
  • unit_tests/sources/declarative/parsers/test_model_to_component_factory.py
  • airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py
🧬 Code graph analysis (33)
unit_tests/sources/declarative/requesters/paginators/test_stop_condition.py (2)
airbyte_cdk/sources/streams/concurrent/adapters.py (1)
  • cursor (196-197)
airbyte_cdk/sources/streams/concurrent/cursor.py (1)
  • Cursor (51-87)
unit_tests/legacy/sources/declarative/test_declarative_stream.py (2)
airbyte_cdk/legacy/sources/declarative/declarative_stream.py (1)
  • DeclarativeStream (32-241)
airbyte_cdk/sources/declarative/models/declarative_component_schema.py (1)
  • DeclarativeStream (2406-2473)
unit_tests/legacy/sources/declarative/incremental/test_datetime_based_cursor.py (1)
airbyte_cdk/legacy/sources/declarative/incremental/datetime_based_cursor.py (1)
  • DatetimeBasedCursor (28-448)
airbyte_cdk/legacy/sources/declarative/incremental/datetime_based_cursor.py (1)
airbyte_cdk/legacy/sources/declarative/incremental/declarative_cursor.py (1)
  • DeclarativeCursor (9-13)
airbyte_cdk/legacy/sources/declarative/declarative_stream.py (3)
airbyte_cdk/legacy/sources/declarative/incremental/global_substream_cursor.py (1)
  • GlobalSubstreamCursor (73-353)
airbyte_cdk/legacy/sources/declarative/incremental/per_partition_cursor.py (1)
  • PerPartitionCursor (27-365)
airbyte_cdk/legacy/sources/declarative/incremental/per_partition_with_global.py (1)
  • PerPartitionWithGlobalCursor (22-199)
unit_tests/legacy/sources/declarative/incremental/test_per_partition_cursor.py (2)
airbyte_cdk/legacy/sources/declarative/incremental/declarative_cursor.py (1)
  • DeclarativeCursor (9-13)
airbyte_cdk/legacy/sources/declarative/incremental/global_substream_cursor.py (1)
  • GlobalSubstreamCursor (73-353)
airbyte_cdk/legacy/sources/declarative/incremental/global_substream_cursor.py (2)
airbyte_cdk/legacy/sources/declarative/incremental/datetime_based_cursor.py (1)
  • DatetimeBasedCursor (28-448)
airbyte_cdk/legacy/sources/declarative/incremental/declarative_cursor.py (1)
  • DeclarativeCursor (9-13)
unit_tests/sources/streams/concurrent/test_cursor.py (1)
airbyte_cdk/legacy/sources/declarative/incremental/datetime_based_cursor.py (1)
  • DatetimeBasedCursor (28-448)
unit_tests/sources/declarative/partition_routers/helpers.py (1)
airbyte_cdk/legacy/sources/declarative/declarative_stream.py (1)
  • DeclarativeStream (32-241)
airbyte_cdk/legacy/sources/declarative/incremental/per_partition_cursor.py (1)
airbyte_cdk/legacy/sources/declarative/incremental/declarative_cursor.py (1)
  • DeclarativeCursor (9-13)
airbyte_cdk/legacy/sources/declarative/incremental/per_partition_with_global.py (3)
airbyte_cdk/legacy/sources/declarative/incremental/datetime_based_cursor.py (1)
  • DatetimeBasedCursor (28-448)
airbyte_cdk/legacy/sources/declarative/incremental/global_substream_cursor.py (2)
  • GlobalSubstreamCursor (73-353)
  • iterate_with_last_flag_and_state (19-49)
airbyte_cdk/legacy/sources/declarative/incremental/per_partition_cursor.py (2)
  • CursorFactory (19-24)
  • PerPartitionCursor (27-365)
unit_tests/sources/declarative/partition_routers/test_cartesian_product_partition_router.py (2)
airbyte_cdk/legacy/sources/declarative/incremental/datetime_based_cursor.py (1)
  • DatetimeBasedCursor (28-448)
airbyte_cdk/sources/declarative/models/declarative_component_schema.py (1)
  • DatetimeBasedCursor (1781-1889)
unit_tests/sources/declarative/retrievers/test_simple_retriever.py (3)
airbyte_cdk/legacy/sources/declarative/incremental/datetime_based_cursor.py (1)
  • DatetimeBasedCursor (28-448)
airbyte_cdk/legacy/sources/declarative/incremental/declarative_cursor.py (1)
  • DeclarativeCursor (9-13)
airbyte_cdk/legacy/sources/declarative/incremental/resumable_full_refresh_cursor.py (1)
  • ResumableFullRefreshCursor (12-93)
unit_tests/connector_builder/test_connector_builder_handler.py (2)
airbyte_cdk/legacy/sources/declarative/declarative_stream.py (1)
  • DeclarativeStream (32-241)
airbyte_cdk/sources/declarative/models/declarative_component_schema.py (1)
  • DeclarativeStream (2406-2473)
airbyte_cdk/sources/declarative/retrievers/simple_retriever.py (2)
airbyte_cdk/legacy/sources/declarative/incremental/resumable_full_refresh_cursor.py (1)
  • ResumableFullRefreshCursor (12-93)
airbyte_cdk/legacy/sources/declarative/incremental/declarative_cursor.py (1)
  • DeclarativeCursor (9-13)
airbyte_cdk/sources/declarative/checks/connection_checker.py (1)
airbyte_cdk/sources/source.py (1)
  • Source (55-95)
unit_tests/sources/declarative/checks/test_check_dynamic_stream.py (2)
airbyte_cdk/sources/declarative/concurrent_declarative_source.py (1)
  • check (487-505)
airbyte_cdk/test/mock_http/mocker.py (1)
  • assert_number_of_calls (131-140)
airbyte_cdk/legacy/sources/declarative/incremental/__init__.py (6)
airbyte_cdk/legacy/sources/declarative/incremental/datetime_based_cursor.py (1)
  • DatetimeBasedCursor (28-448)
airbyte_cdk/legacy/sources/declarative/incremental/declarative_cursor.py (1)
  • DeclarativeCursor (9-13)
airbyte_cdk/legacy/sources/declarative/incremental/global_substream_cursor.py (1)
  • GlobalSubstreamCursor (73-353)
airbyte_cdk/legacy/sources/declarative/incremental/per_partition_cursor.py (2)
  • CursorFactory (19-24)
  • PerPartitionCursor (27-365)
airbyte_cdk/legacy/sources/declarative/incremental/per_partition_with_global.py (1)
  • PerPartitionWithGlobalCursor (22-199)
airbyte_cdk/legacy/sources/declarative/incremental/resumable_full_refresh_cursor.py (2)
  • ChildPartitionResumableFullRefreshCursor (97-116)
  • ResumableFullRefreshCursor (12-93)
airbyte_cdk/sources/declarative/models/declarative_component_schema.py (1)
airbyte_cdk/legacy/sources/declarative/incremental/datetime_based_cursor.py (1)
  • DatetimeBasedCursor (28-448)
airbyte_cdk/sources/declarative/migrations/legacy_to_per_partition_state_migration.py (2)
airbyte_cdk/legacy/sources/declarative/incremental/datetime_based_cursor.py (1)
  • DatetimeBasedCursor (28-448)
airbyte_cdk/sources/declarative/models/declarative_component_schema.py (1)
  • DatetimeBasedCursor (1781-1889)
unit_tests/sources/declarative/partition_routers/test_substream_partition_router.py (1)
airbyte_cdk/legacy/sources/declarative/incremental/resumable_full_refresh_cursor.py (2)
  • ChildPartitionResumableFullRefreshCursor (97-116)
  • ResumableFullRefreshCursor (12-93)
airbyte_cdk/legacy/sources/declarative/incremental/resumable_full_refresh_cursor.py (1)
airbyte_cdk/legacy/sources/declarative/incremental/declarative_cursor.py (1)
  • DeclarativeCursor (9-13)
airbyte_cdk/__init__.py (3)
airbyte_cdk/legacy/sources/declarative/declarative_stream.py (1)
  • DeclarativeStream (32-241)
airbyte_cdk/sources/declarative/models/declarative_component_schema.py (2)
  • DeclarativeStream (2406-2473)
  • DatetimeBasedCursor (1781-1889)
airbyte_cdk/legacy/sources/declarative/incremental/datetime_based_cursor.py (1)
  • DatetimeBasedCursor (28-448)
airbyte_cdk/sources/declarative/concurrent_declarative_source.py (6)
airbyte_cdk/sources/source.py (3)
  • Source (55-95)
  • read (36-45)
  • discover (48-52)
airbyte_cdk/sources/abstract_source.py (5)
  • streams (74-79)
  • read (101-211)
  • discover (85-90)
  • check (92-99)
  • check_connection (59-71)
airbyte_cdk/entrypoint.py (3)
  • read (271-283)
  • discover (260-269)
  • check (219-258)
airbyte_cdk/sources/declarative/checks/connection_checker.py (2)
  • ConnectionChecker (12-38)
  • check_connection (18-38)
airbyte_cdk/sources/declarative/checks/check_dynamic_stream.py (1)
  • check_connection (35-61)
airbyte_cdk/sources/declarative/checks/check_stream.py (1)
  • check_connection (66-102)
unit_tests/sources/declarative/parsers/test_model_to_component_factory.py (6)
airbyte_cdk/legacy/sources/declarative/declarative_stream.py (1)
  • DeclarativeStream (32-241)
airbyte_cdk/legacy/sources/declarative/incremental/per_partition_cursor.py (2)
  • CursorFactory (19-24)
  • PerPartitionCursor (27-365)
airbyte_cdk/legacy/sources/declarative/incremental/datetime_based_cursor.py (1)
  • DatetimeBasedCursor (28-448)
airbyte_cdk/legacy/sources/declarative/incremental/per_partition_with_global.py (1)
  • PerPartitionWithGlobalCursor (22-199)
airbyte_cdk/legacy/sources/declarative/incremental/resumable_full_refresh_cursor.py (1)
  • ResumableFullRefreshCursor (12-93)
airbyte_cdk/sources/declarative/incremental/concurrent_partition_cursor.py (1)
  • ConcurrentPerPartitionCursor (99-596)
unit_tests/sources/declarative/stream_slicers/test_stream_slicer_read_decorator.py (5)
airbyte_cdk/legacy/sources/declarative/incremental/per_partition_cursor.py (1)
  • CursorFactory (19-24)
airbyte_cdk/legacy/sources/declarative/incremental/datetime_based_cursor.py (1)
  • DatetimeBasedCursor (28-448)
airbyte_cdk/legacy/sources/declarative/incremental/global_substream_cursor.py (1)
  • GlobalSubstreamCursor (73-353)
airbyte_cdk/legacy/sources/declarative/incremental/per_partition_with_global.py (1)
  • PerPartitionWithGlobalCursor (22-199)
airbyte_cdk/legacy/sources/declarative/incremental/declarative_cursor.py (1)
  • DeclarativeCursor (9-13)
unit_tests/sources/declarative/checks/test_check_stream.py (3)
unit_tests/sources/test_source.py (2)
  • source (65-66)
  • check (44-45)
airbyte_cdk/sources/declarative/concurrent_declarative_source.py (1)
  • check (487-505)
airbyte_cdk/sources/abstract_source.py (1)
  • check (92-99)
airbyte_cdk/sources/declarative/checks/check_dynamic_stream.py (5)
airbyte_cdk/sources/source.py (1)
  • Source (55-95)
airbyte_cdk/sources/declarative/checks/check_stream.py (1)
  • evaluate_availability (17-29)
airbyte_cdk/sources/declarative/checks/connection_checker.py (1)
  • ConnectionChecker (12-38)
airbyte_cdk/sources/declarative/concurrent_declarative_source.py (1)
  • streams (389-421)
airbyte_cdk/sources/streams/core.py (1)
  • Stream (118-703)
unit_tests/sources/declarative/test_concurrent_declarative_source.py (4)
airbyte_cdk/legacy/sources/declarative/declarative_stream.py (1)
  • DeclarativeStream (32-241)
airbyte_cdk/sources/declarative/concurrent_declarative_source.py (2)
  • streams (389-421)
  • check (487-505)
airbyte_cdk/legacy/sources/declarative/manifest_declarative_source.py (2)
  • streams (301-338)
  • check (412-414)
airbyte_cdk/sources/declarative/retrievers/simple_retriever.py (3)
  • state (573-574)
  • state (577-580)
  • SimpleRetriever (53-602)
airbyte_cdk/sources/declarative/incremental/concurrent_partition_cursor.py (1)
airbyte_cdk/legacy/sources/declarative/incremental/global_substream_cursor.py (5)
  • iterate_with_last_flag_and_state (19-49)
  • Timer (52-70)
  • start (60-61)
  • finish (63-67)
  • is_running (69-70)
airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py (8)
airbyte_cdk/legacy/sources/declarative/declarative_stream.py (1)
  • DeclarativeStream (32-241)
airbyte_cdk/sources/declarative/models/declarative_component_schema.py (2)
  • DeclarativeStream (2406-2473)
  • DatetimeBasedCursor (1781-1889)
airbyte_cdk/legacy/sources/declarative/incremental/per_partition_cursor.py (1)
  • CursorFactory (19-24)
airbyte_cdk/legacy/sources/declarative/incremental/datetime_based_cursor.py (1)
  • DatetimeBasedCursor (28-448)
airbyte_cdk/legacy/sources/declarative/incremental/global_substream_cursor.py (1)
  • GlobalSubstreamCursor (73-353)
airbyte_cdk/legacy/sources/declarative/incremental/per_partition_with_global.py (1)
  • PerPartitionWithGlobalCursor (22-199)
airbyte_cdk/sources/declarative/partition_routers/partition_router.py (1)
  • PartitionRouter (14-62)
airbyte_cdk/sources/declarative/requesters/request_options/default_request_options_provider.py (1)
  • DefaultRequestOptionsProvider (15-60)
airbyte_cdk/sources/declarative/checks/check_stream.py (1)
airbyte_cdk/sources/source.py (1)
  • Source (55-95)
unit_tests/sources/declarative/custom_state_migration.py (2)
airbyte_cdk/legacy/sources/declarative/declarative_stream.py (1)
  • DeclarativeStream (32-241)
airbyte_cdk/sources/declarative/models/declarative_component_schema.py (18)
  • DeclarativeStream (2406-2473)
  • Config (136-137)
  • Config (150-151)
  • Config (164-165)
  • Config (178-179)
  • Config (192-193)
  • Config (206-207)
  • Config (220-221)
  • Config (234-235)
  • Config (248-249)
  • Config (262-263)
  • Config (276-277)
  • Config (292-293)
  • Config (306-307)
  • Config (320-321)
  • Config (354-355)
  • Config (378-379)
  • CustomStateMigration (291-302)
🪛 GitHub Actions: Pytest (Fast)
unit_tests/connector_builder/test_connector_builder_handler.py

[error] 820-820: Test 'test_config_update' failed: latest_config_update was None.

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (12)
  • GitHub Check: Validate PR title
  • GitHub Check: Check: source-shopify
  • GitHub Check: Check: destination-motherduck
  • GitHub Check: Check: source-pokeapi
  • GitHub Check: Check: source-hardcoded-records
  • GitHub Check: Check: source-intercom
  • GitHub Check: Pytest (All, Python 3.10, Ubuntu)
  • GitHub Check: Pytest (All, Python 3.12, Ubuntu)
  • GitHub Check: Pytest (All, Python 3.13, Ubuntu)
  • GitHub Check: Pytest (All, Python 3.11, Ubuntu)
  • GitHub Check: SDM Docker Image Build
  • GitHub Check: Manifest Server Docker Image Build

Copy link
Contributor

@maxi297 maxi297 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added a couple of comments. Should we redo a bit of regression tests once this is merged with the removal of DeclarativeStream stuff?

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py (2)

2065-2083: TypeError risk: passing both stream_slicer and partition_router into retriever construction

create_simple_retriever no longer accepts stream_slicer, so this call will raise got an unexpected keyword argument 'stream_slicer' when retriever is SimpleRetrieverModel. Can we branch by retriever type and pass only the param each constructor supports, wdyt?

Apply:

-        retriever = self._create_component_from_model(
-            model=model.retriever,
-            config=config,
-            name=model.name,
-            primary_key=primary_key,
-            request_options_provider=request_options_provider,
-            stream_slicer=stream_slicer,
-            partition_router=partition_router,
-            stop_condition_cursor=concurrent_cursor
-            if self._is_stop_condition_on_cursor(model)
-            else None,
-            client_side_incremental_sync={"cursor": concurrent_cursor}
-            if self._is_client_side_filtering_enabled(model)
-            else None,
-            transformations=transformations,
-            file_uploader=file_uploader,
-            incremental_sync=model.incremental_sync,
-        )
+        common_kwargs = dict(
+            model=model.retriever,
+            config=config,
+            name=model.name,
+            primary_key=primary_key,
+            request_options_provider=request_options_provider,
+            stop_condition_cursor=concurrent_cursor if self._is_stop_condition_on_cursor(model) else None,
+            client_side_incremental_sync={"cursor": concurrent_cursor} if self._is_client_side_filtering_enabled(model) else None,
+            transformations=transformations,
+            file_uploader=file_uploader,
+            incremental_sync=model.incremental_sync,
+        )
+        if isinstance(model.retriever, AsyncRetrieverModel):
+            retriever = self._create_component_from_model(**common_kwargs, stream_slicer=stream_slicer)
+        else:
+            retriever = self._create_component_from_model(**common_kwargs, partition_router=partition_router)

3868-3875: Potential mismatch: passing stream_slicer to retriever may break when retriever is SimpleRetrieverModel

Same issue as above: SimpleRetriever expects partition_router, AsyncRetriever expects stream_slicer. Can we branch on model.retriever type here too, wdyt?

-        retriever = self._create_component_from_model(
-            model=model.retriever,
-            config=config,
-            name=f"{stream_name if stream_name else '__http_components_resolver'}",
-            primary_key=None,
-            stream_slicer=self._build_stream_slicer_from_partition_router(model.retriever, config),
-            transformations=[],
-        )
+        slicer = self._build_stream_slicer_from_partition_router(model.retriever, config)
+        if isinstance(model.retriever, AsyncRetrieverModel):
+            retriever = self._create_component_from_model(
+                model=model.retriever,
+                config=config,
+                name=f"{stream_name if stream_name else '__http_components_resolver'}",
+                primary_key=None,
+                stream_slicer=slicer,
+                transformations=[],
+            )
+        else:
+            retriever = self._create_component_from_model(
+                model=model.retriever,
+                config=config,
+                name=f"{stream_name if stream_name else '__http_components_resolver'}",
+                primary_key=None,
+                partition_router=slicer,
+                transformations=[],
+            )
🧹 Nitpick comments (5)
unit_tests/sources/declarative/test_concurrent_declarative_source.py (4)

715-721: Prefer name-based stream lookup to avoid brittle index assertions

Indexing into streams[...] ties tests to ordering in the manifest/resolution and makes them fragile when streams are added/removed/reordered. Can we key by name instead, and apply the same pattern in the other spots listed above, wdyt?

Example for test_create_concurrent_cursor (apply similar changes in the other call sites):

-    streams = source.streams(config=_CONFIG)
-
-    party_members_stream = streams[0]
+    streams_by_name = {s.name: s for s in source.streams(config=_CONFIG)}
+    party_members_stream = streams_by_name["party_members"]
@@
-    locations_stream = streams[2]
+    locations_stream = streams_by_name["locations"]
@@
-    incremental_counting_stream = streams[7]
+    incremental_counting_stream = streams_by_name["incremental_counting_stream"]

Also applies to: 733-737, 758-762, 1393-1396, 1637-1639, 1694-1697, 1754-1757, 2974-2983, 3853-3885, 4564-4567


1597-1606: Use the locally mutated manifest variable (minor correctness/clarity)

You deepcopy _MANIFEST into manifest and mutate it, but then pass _MANIFEST to ConcurrentDeclarativeSource. This works today but is confusing and error-prone. Shall we pass manifest instead, wdyt?

-    source = ConcurrentDeclarativeSource(
-        source_config=_MANIFEST, config=config, catalog=catalog, state=[]
-    )
+    source = ConcurrentDeclarativeSource(
+        source_config=manifest, config=config, catalog=catalog, state=[]
+    )

2183-2187: Align mocked response bodies with configured record_selector paths

Several checks patch SimpleRetriever._fetch_next_page but return payloads under keys that don’t match the configured record_selector (e.g., using "records" or "students" while the selector is ["result"]). It currently passes because the check logic doesn’t assert on extracted records, but keeping shapes consistent reduces future flakiness if check behavior evolves. Update the fixtures to use "result", wdyt?

-        pages = [_create_page({"records": [{"id": 0}], "_metadata": {}})]
+        pages = [_create_page({"result": [{"id": 0}], "_metadata": {}})]
-        pages = [
-            _create_page(
-                {
-                    "students": [{"id": 0, "first_name": "yu", "last_name": "narukami"}],
-                    "_metadata": {},
-                }
-            )
-        ]
+        pages = [
+            _create_page(
+                {
+                    "result": [{"id": 0, "first_name": "yu", "last_name": "narukami"}],
+                    "_metadata": {},
+                }
+            )
+        ]

Also applies to: 2962-2973


1964-1969: Guard spec.yaml writes to avoid cross-test interference

The fixture writes to a fixed spec.yaml path under the tests tree. If tests run in parallel (xdist) or another test relies on that file, this can race. Would you consider backing up and restoring if the file already exists (or marking the test serial), wdyt?

-        with open(yaml_path, "w") as f:
-            f.write(yaml.dump(spec))
-        yield
-        os.remove(yaml_path)
+        backup = None
+        if os.path.exists(yaml_path):
+            backup = yaml_path + ".bak"
+            os.replace(yaml_path, backup)
+        try:
+            with open(yaml_path, "w") as f:
+                f.write(yaml.dump(spec))
+            yield
+        finally:
+            try:
+                os.remove(yaml_path)
+            finally:
+                if backup and os.path.exists(backup):
+                    os.replace(backup, yaml_path)
airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py (1)

2616-2632: Stub for IncrementingCountCursor: emit an explicit runtime warning

Since this returns a DatetimeBasedCursor as a stub, can we warn loudly to aid debugging if it ever leaks to runtime, wdyt?

     def create_incrementing_count_cursor(
         model: IncrementingCountCursorModel, config: Config, **kwargs: Any
     ) -> DatetimeBasedCursor:
-        # This should not actually get used anywhere at runtime, but needed to add this to pass checks since
+        # This should not actually get used anywhere at runtime, but needed to add this to pass checks since
         # we still parse models into components. The issue is that there's no runtime implementation of a
         # IncrementingCountCursor.
         # A known and expected issue with this stub is running a check with the declared IncrementingCountCursor because it is run without ConcurrentCursor.
+        logging.getLogger(__name__).warning(
+            "IncrementingCountCursor is deprecated in CDS runtime; returning placeholder DatetimeBasedCursor. "
+            "If you see this at runtime, please migrate off IncrementingCountCursor."
+        )
         return DatetimeBasedCursor(
             cursor_field=model.cursor_field,
             datetime_format="%Y-%m-%d",
             start_datetime="2024-12-12",
             config=config,
             parameters={},
         )
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

💡 Knowledge Base configuration:

  • MCP integration is disabled by default for public repositories
  • Jira integration is disabled by default for public repositories
  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 4cb9150 and c8f2c12.

📒 Files selected for processing (10)
  • airbyte_cdk/legacy/sources/declarative/incremental/global_substream_cursor.py (1 hunks)
  • airbyte_cdk/legacy/sources/declarative/incremental/per_partition_cursor.py (1 hunks)
  • airbyte_cdk/sources/declarative/incremental/concurrent_partition_cursor.py (2 hunks)
  • airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py (3 hunks)
  • unit_tests/connector_builder/test_connector_builder_handler.py (1 hunks)
  • unit_tests/legacy/sources/declarative/incremental/test_per_partition_cursor.py (1 hunks)
  • unit_tests/legacy/sources/declarative/test_manifest_declarative_source.py (0 hunks)
  • unit_tests/sources/declarative/parsers/test_model_to_component_factory.py (2 hunks)
  • unit_tests/sources/declarative/partition_routers/test_cartesian_product_partition_router.py (1 hunks)
  • unit_tests/sources/declarative/test_concurrent_declarative_source.py (15 hunks)
💤 Files with no reviewable changes (1)
  • unit_tests/legacy/sources/declarative/test_manifest_declarative_source.py
🚧 Files skipped from review as they are similar to previous changes (6)
  • unit_tests/connector_builder/test_connector_builder_handler.py
  • airbyte_cdk/legacy/sources/declarative/incremental/global_substream_cursor.py
  • unit_tests/legacy/sources/declarative/incremental/test_per_partition_cursor.py
  • unit_tests/sources/declarative/partition_routers/test_cartesian_product_partition_router.py
  • airbyte_cdk/legacy/sources/declarative/incremental/per_partition_cursor.py
  • airbyte_cdk/sources/declarative/incremental/concurrent_partition_cursor.py
🧰 Additional context used
🧠 Learnings (5)
📓 Common learnings
Learnt from: ChristoGrab
PR: airbytehq/airbyte-python-cdk#58
File: airbyte_cdk/sources/declarative/yaml_declarative_source.py:0-0
Timestamp: 2024-11-18T23:40:06.391Z
Learning: When modifying the `YamlDeclarativeSource` class in `airbyte_cdk/sources/declarative/yaml_declarative_source.py`, avoid introducing breaking changes like altering method signatures within the scope of unrelated PRs. Such changes should be addressed separately to minimize impact on existing implementations.
📚 Learning: 2025-01-14T00:20:32.310Z
Learnt from: aaronsteers
PR: airbytehq/airbyte-python-cdk#174
File: airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py:1093-1102
Timestamp: 2025-01-14T00:20:32.310Z
Learning: In the `airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py` file, the strict module name checks in `_get_class_from_fully_qualified_class_name` (requiring `module_name` to be "components" and `module_name_full` to be "source_declarative_manifest.components") are intentionally designed to provide early, clear feedback when class declarations won't be found later in execution. These restrictions may be loosened in the future if the requirements for class definition locations change.

Applied to files:

  • unit_tests/sources/declarative/parsers/test_model_to_component_factory.py
  • airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py
📚 Learning: 2024-11-15T01:04:21.272Z
Learnt from: aaronsteers
PR: airbytehq/airbyte-python-cdk#58
File: airbyte_cdk/cli/source_declarative_manifest/_run.py:62-65
Timestamp: 2024-11-15T01:04:21.272Z
Learning: The files in `airbyte_cdk/cli/source_declarative_manifest/`, including `_run.py`, are imported from another repository, and changes to these files should be minimized or avoided when possible to maintain consistency.

Applied to files:

  • unit_tests/sources/declarative/parsers/test_model_to_component_factory.py
  • unit_tests/sources/declarative/test_concurrent_declarative_source.py
📚 Learning: 2024-11-18T23:40:06.391Z
Learnt from: ChristoGrab
PR: airbytehq/airbyte-python-cdk#58
File: airbyte_cdk/sources/declarative/yaml_declarative_source.py:0-0
Timestamp: 2024-11-18T23:40:06.391Z
Learning: When modifying the `YamlDeclarativeSource` class in `airbyte_cdk/sources/declarative/yaml_declarative_source.py`, avoid introducing breaking changes like altering method signatures within the scope of unrelated PRs. Such changes should be addressed separately to minimize impact on existing implementations.

Applied to files:

  • unit_tests/sources/declarative/parsers/test_model_to_component_factory.py
  • unit_tests/sources/declarative/test_concurrent_declarative_source.py
📚 Learning: 2024-12-11T16:34:46.319Z
Learnt from: pnilan
PR: airbytehq/airbyte-python-cdk#0
File: :0-0
Timestamp: 2024-12-11T16:34:46.319Z
Learning: In the airbytehq/airbyte-python-cdk repository, the `declarative_component_schema.py` file is auto-generated from `declarative_component_schema.yaml` and should be ignored in the recommended reviewing order.

Applied to files:

  • airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py
🧬 Code graph analysis (3)
unit_tests/sources/declarative/parsers/test_model_to_component_factory.py (5)
airbyte_cdk/legacy/sources/declarative/declarative_stream.py (1)
  • DeclarativeStream (32-241)
airbyte_cdk/legacy/sources/declarative/incremental/per_partition_cursor.py (2)
  • CursorFactory (19-24)
  • PerPartitionCursor (27-367)
airbyte_cdk/legacy/sources/declarative/incremental/datetime_based_cursor.py (1)
  • DatetimeBasedCursor (28-448)
airbyte_cdk/legacy/sources/declarative/incremental/per_partition_with_global.py (1)
  • PerPartitionWithGlobalCursor (22-199)
airbyte_cdk/legacy/sources/declarative/incremental/resumable_full_refresh_cursor.py (1)
  • ResumableFullRefreshCursor (12-93)
unit_tests/sources/declarative/test_concurrent_declarative_source.py (5)
airbyte_cdk/legacy/sources/declarative/declarative_stream.py (1)
  • DeclarativeStream (32-241)
airbyte_cdk/sources/declarative/models/declarative_component_schema.py (2)
  • DeclarativeStream (2406-2473)
  • SimpleRetriever (2786-2840)
airbyte_cdk/sources/declarative/concurrent_declarative_source.py (2)
  • streams (389-421)
  • check (487-505)
airbyte_cdk/legacy/sources/declarative/manifest_declarative_source.py (2)
  • streams (301-338)
  • check (412-414)
airbyte_cdk/sources/declarative/retrievers/simple_retriever.py (3)
  • state (573-574)
  • state (577-580)
  • SimpleRetriever (53-602)
airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py (2)
airbyte_cdk/legacy/sources/declarative/declarative_stream.py (1)
  • DeclarativeStream (32-241)
airbyte_cdk/sources/declarative/models/declarative_component_schema.py (2)
  • DeclarativeStream (2406-2473)
  • DatetimeBasedCursor (1781-1889)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (13)
  • GitHub Check: Check: source-intercom
  • GitHub Check: Check: source-pokeapi
  • GitHub Check: Check: source-shopify
  • GitHub Check: Check: source-hardcoded-records
  • GitHub Check: Check: destination-motherduck
  • GitHub Check: Pytest (Fast)
  • GitHub Check: Pytest (All, Python 3.13, Ubuntu)
  • GitHub Check: SDM Docker Image Build
  • GitHub Check: Manifest Server Docker Image Build
  • GitHub Check: Pytest (All, Python 3.11, Ubuntu)
  • GitHub Check: Pytest (All, Python 3.12, Ubuntu)
  • GitHub Check: Pytest (All, Python 3.10, Ubuntu)
  • GitHub Check: Analyze (python)
🔇 Additional comments (5)
unit_tests/sources/declarative/parsers/test_model_to_component_factory.py (2)

54-54: ConcurrentPerPartitionCursor import path is correct
Verified with a ripgrep search that no imports of ConcurrentPerPartitionCursor remain under airbyte_cdk.legacy; approving the change.


18-25: Import path hygiene confirmed. All DeclarativeStream and cursor imports now reference airbyte_cdk.legacy and no stray non-legacy imports remain. LGTM!

airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py (3)

3249-3254: Nice: routing DefaultRequestOptionsProvider through PartitionRouter

Good improvement to avoid request option leakage across partitions. LGTM.


3148-3149: CustomIncrementalSync removal fully addressed

The migration guide in cdk-migrations.md clearly documents the removal in v7.0.0, and a grep over the codebase shows no remaining CustomIncrementalSync references. wdyt, shall we resolve?


36-39: Update imports to legacy path and add deprecation notes

There are still direct imports from airbyte_cdk.sources.declarative.declarative_stream and …incremental (e.g., in unit tests, incremental/__init__.py, model_to_component_factory.py), which should be changed to airbyte_cdk.legacy.sources.declarative equivalents. Also ensure user-facing deprecation warnings are documented in the v7 release notes and relevant docs. wdyt?

⛔ Skipped due to learnings
Learnt from: aaronsteers
PR: airbytehq/airbyte-python-cdk#174
File: airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py:1093-1102
Timestamp: 2025-01-14T00:20:32.310Z
Learning: In the `airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py` file, the strict module name checks in `_get_class_from_fully_qualified_class_name` (requiring `module_name` to be "components" and `module_name_full` to be "source_declarative_manifest.components") are intentionally designed to provide early, clear feedback when class declarations won't be found later in execution. These restrictions may be loosened in the future if the requirements for class definition locations change.
Learnt from: aaronsteers
PR: airbytehq/airbyte-python-cdk#58
File: airbyte_cdk/cli/source_declarative_manifest/_run.py:62-65
Timestamp: 2024-11-15T01:04:21.272Z
Learning: The files in `airbyte_cdk/cli/source_declarative_manifest/`, including `_run.py`, are imported from another repository, and changes to these files should be minimized or avoided when possible to maintain consistency.
Learnt from: aaronsteers
PR: airbytehq/airbyte-python-cdk#58
File: airbyte_cdk/cli/source_declarative_manifest/spec.json:9-15
Timestamp: 2024-11-15T00:59:08.154Z
Learning: When code in `airbyte_cdk/cli/source_declarative_manifest/` is being imported from another repository, avoid suggesting modifications to it during the import process.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

♻️ Duplicate comments (4)
unit_tests/sources/streams/concurrent/test_cursor.py (1)

23-35: Nice decoupling from legacy cursors.

Switching tests to work directly with ConcurrentCursor and concurrent state converters addresses the past concern about instantiating DatetimeBasedCursor just to extract values. LGTM.

airbyte_cdk/sources/declarative/concurrent_declarative_source.py (3)

373-382: Add a debug log when no streams selected (keep using initialized config).

Totally get the intent to prefer self._config. For observability, can we log when nothing is read to help users debug catalogs? This was mentioned earlier; reiterating just the logging piece, wdyt?

         if len(selected_concurrent_streams) > 0:
             yield from self._concurrent_source.read(selected_concurrent_streams)
+        else:
+            logger.debug("No streams selected in catalog; skipping concurrent read.")

383-386: Discover: acknowledge deprecation and add a gentle log?

Previously suggested using config or self._config; since you’re intentionally deprecating the arg, would adding a debug to signal it’s ignored help avoid confusion, wdyt?

 def discover(self, logger: logging.Logger, config: Mapping[str, Any]) -> AirbyteCatalog:
+        if config and config is not self._config:
+            logger.debug("discover(config=...) is deprecated; using initialized config.")
         return AirbyteCatalog(
             streams=[stream.as_airbyte_stream() for stream in self.streams(config=self._config)]
         )

487-509: Gracefully handle missing/unknown check types and improve error message.

Could we default to CheckStream when check is absent and validate the type mapping to avoid a KeyError? Also, using str(error) yields a cleaner user message than repr(error), wdyt?

-        check = self._source_config.get("check")
-        if not check:
-            raise ValueError(f"Missing 'check' component definition within the manifest.")
+        check = dict(self._source_config.get("check", {}))
         if "type" not in check:
             check["type"] = "CheckStream"
-        connection_checker = self._constructor.create_component(
-            COMPONENTS_CHECKER_TYPE_MAPPING[check["type"]],
+        checker_model = COMPONENTS_CHECKER_TYPE_MAPPING.get(check["type"])
+        if not checker_model:
+            raise ValueError(f"Unknown check type '{check['type']}'. Expected one of {list(COMPONENTS_CHECKER_TYPE_MAPPING.keys())}.")
+        connection_checker = self._constructor.create_component(
+            checker_model,
             check,
             dict(),
             emit_connector_builder_messages=self._emit_connector_builder_messages,
         )
@@
-        check_succeeded, error = connection_checker.check_connection(self, logger, self._config)
+        check_succeeded, error = connection_checker.check_connection(self, logger, self._config)
         if not check_succeeded:
-            return AirbyteConnectionStatus(status=Status.FAILED, message=repr(error))
+            return AirbyteConnectionStatus(status=Status.FAILED, message=str(error))
         return AirbyteConnectionStatus(status=Status.SUCCEEDED)
🧹 Nitpick comments (14)
unit_tests/sources/declarative/parsers/test_model_to_component_factory.py (2)

18-19: Legacy imports look right; optionally avoid runtime dependency on DeclarativeStream

Shifting DeclarativeStream/DatetimeBasedCursor to legacy matches the v7 plan. Since DeclarativeStream is only used for typing here, would you guard that import behind TYPE_CHECKING to avoid importing legacy code at runtime, wdyt?

Apply within-range change:

-from airbyte_cdk.legacy.sources.declarative.declarative_stream import DeclarativeStream
+if TYPE_CHECKING:
+    from airbyte_cdk.legacy.sources.declarative.declarative_stream import DeclarativeStream

And outside this hunk, update typing import and the annotation:

# near line 9
from typing import Any, Iterable, Mapping, Optional, Union, TYPE_CHECKING

# near lines 4664-4670
def get_retriever(stream: Union["DeclarativeStream", DefaultStream]):
    ...

742-742: Fix nit: TODO wording mismatches (class vs test) and tie to v7

This references a “class” but the next item is a test function. Update the note and mention the v7 removal for clarity, wdyt?

-# todo: delete this class once we deprecate SimpleRetriever.cursor and SimpleRetriever.state methods
+# TODO: delete this test once we deprecate SimpleRetriever.cursor and SimpleRetriever.state methods (CDK v7)
unit_tests/sources/streams/concurrent/test_cursor.py (6)

53-60: Defaulting _stream_name to a Mock can mask issues; default to a real name instead.

Returning a Mock from partition.stream_name() can leak into assertions/logs; using the stable _A_STREAM_NAME is safer. Wdyt?

-def _partition(
-    _slice: Optional[Mapping[str, Any]], _stream_name: Optional[str] = Mock()
-) -> Partition:
+def _partition(
+    _slice: Optional[Mapping[str, Any]], _stream_name: str = _A_STREAM_NAME
+) -> Partition:
     partition = Mock(spec=Partition)
     partition.to_slice.return_value = _slice
     partition.stream_name.return_value = _stream_name
     return partition

62-70: Avoid defaulting parameters to Mock instances.

Default arguments are evaluated once; the same Mock will be reused across calls. Let’s default to None and create a partition on demand, ensuring isolation. Wdyt?

-def _record(
-    cursor_value: CursorValueType, partition: Optional[Partition] = Mock(spec=Partition)
-) -> Record:
+def _record(
+    cursor_value: CursorValueType, partition: Optional[Partition] = None
+) -> Record:
+    if partition is None:
+        partition = _partition(_NO_SLICE)
     return Record(
         data={_A_CURSOR_FIELD_KEY: cursor_value},
         associated_slice=partition.to_slice(),
         stream_name=_A_STREAM_NAME,
     )

1214-1221: Unify state_type to the enum value for consistency.

There’s a mix of "date-range" string literals and ConcurrencyCompatibleStateType.date_range.value. Prefer the enum for resilience to typos/refactors. Apply similarly to other occurrences in this file, wdyt?

-        {
-            "slices": [],
-            "state_type": "date-range",
-        },
+        {
+            "slices": [],
+            "state_type": ConcurrencyCompatibleStateType.date_range.value,
+        },

1009-1034: Add a type hint for end_provider in _cursor helper.

It’s passed a ClampingEndProvider; hinting improves readability and editor help. Wdyt?

-    def _cursor(
+    def _cursor(
         self,
         start: datetime,
-        end_provider,
+        end_provider: ClampingEndProvider,
         slice_range: timedelta,
         granularity: Optional[timedelta],
         clamping_strategy: ClampingStrategy,
     ) -> ConcurrentCursor:

1199-1210: Assert style nit: prefer assert/assert not over == True/False.

Cleaner and yields better failure messages. Wdyt?

-    assert (
-        cursor.should_be_synced(
-            Record(data={_A_CURSOR_FIELD_KEY: state_value - 1}, stream_name="test_stream")
-        )
-        == False
-    )
-    assert (
-        cursor.should_be_synced(
-            Record(data={_A_CURSOR_FIELD_KEY: state_value}, stream_name="test_stream")
-        )
-        == True
-    )
+    assert not cursor.should_be_synced(
+        Record(data={_A_CURSOR_FIELD_KEY: state_value - 1}, stream_name="test_stream")
+    )
+    assert cursor.should_be_synced(
+        Record(data={_A_CURSOR_FIELD_KEY: state_value}, stream_name="test_stream")
+    )

258-349: Consider a small factory to reduce repeated ConcurrentCursor construction.

Many tests inline nearly identical constructors; extracting a parameterized factory (start/state/slice_range/granularity) would trim noise and make intent pop. Want me to draft it, wdyt?

airbyte_cdk/sources/declarative/concurrent_declarative_source.py (6)

245-257: Docstring return type mismatch.

_pre_process_manifest returns a dict but the docstring says None. Can we fix the docstring to prevent confusion in readers and tools, wdyt?

-        Returns:
-            None
+        Returns:
+            Dict[str, Any]: The fully resolved and transformed manifest.

290-295: Docstring claims a return value but method returns None.

The method mutates state in place and returns nothing. Adjusting the docstring would remove ambiguity, wdyt?

-        The original manifest is returned if any error occurs during migration.
+        Mutates self._source_config in place. If migration fails, the previous manifest remains.

346-349: Name the effective config and warn about deprecation.

Since the goal is to prefer self._config, would you like to (a) avoid shadowing the param and (b) emit a deprecation hint if the caller passes config which is ignored, wdyt?

-    def configure(self, config: Mapping[str, Any], temp_dir: str) -> Mapping[str, Any]:
-        config = self._config or config
-        return super().configure(config, temp_dir)
+    def configure(self, config: Mapping[str, Any], temp_dir: str) -> Mapping[str, Any]:
+        effective_config = self._config or config
+        if config and config is not self._config:
+            self.logger.debug("configure(config=...) is deprecated; using initialized config.")
+        return super().configure(effective_config, temp_dir)

388-401: Tighten signature and make the unused param explicit?

Now that this class only deals with concurrent AbstractStreams, would you like to (a) narrow the return type and (b) mark the config param as intentionally unused to reduce confusion, wdyt?

-    def streams(self, config: Mapping[str, Any]) -> List[Union[Stream, AbstractStream]]:  # type: ignore  # we are migrating away from the AbstractSource and are expecting that this will only be called by ConcurrentDeclarativeSource or the Connector Builder
+    def streams(self, _config: Mapping[str, Any]) -> List[AbstractStream]:  # We only return concurrent-compatible streams
@@
-        if self._spec_component:
-            self._spec_component.validate_config(self._config)
+        if self._spec_component:
+            self._spec_component.validate_config(self._config)

If external callers still rely on Union[Stream, AbstractStream], we can keep the return type but add a docstring note that only concurrent-compatible streams are produced now. Wdyt?

Also applies to: 409-421


525-531: Pass manifest parameters into ConditionalStreams evaluation?

InterpolatedBoolean(..., parameters={}) ignores manifest-level params. Should we forward self._source_config.get("parameters", {}) (or per-stream parameters) so conditions can reference them, wdyt?

-                interpolated_boolean = InterpolatedBoolean(
-                    condition=current_stream_config.get("condition"),
-                    parameters={},
-                )
+                interpolated_boolean = InterpolatedBoolean(
+                    condition=current_stream_config.get("condition"),
+                    parameters=self._source_config.get("parameters", {}),
+                )

378-381: Optional: track the “empty streams causes loop” with a TODO or issue link.

Since this notes an upstream behavior, would you like to add a TODO with an issue reference so we don’t lose it during the v7 migration, wdyt?

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

💡 Knowledge Base configuration:

  • MCP integration is disabled by default for public repositories
  • Jira integration is disabled by default for public repositories
  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between c8f2c12 and bbaf25a.

📒 Files selected for processing (3)
  • airbyte_cdk/sources/declarative/concurrent_declarative_source.py (8 hunks)
  • unit_tests/sources/declarative/parsers/test_model_to_component_factory.py (3 hunks)
  • unit_tests/sources/streams/concurrent/test_cursor.py (1 hunks)
🧰 Additional context used
🧠 Learnings (4)
📓 Common learnings
Learnt from: ChristoGrab
PR: airbytehq/airbyte-python-cdk#58
File: airbyte_cdk/sources/declarative/yaml_declarative_source.py:0-0
Timestamp: 2024-11-18T23:40:06.391Z
Learning: When modifying the `YamlDeclarativeSource` class in `airbyte_cdk/sources/declarative/yaml_declarative_source.py`, avoid introducing breaking changes like altering method signatures within the scope of unrelated PRs. Such changes should be addressed separately to minimize impact on existing implementations.
📚 Learning: 2024-11-15T01:04:21.272Z
Learnt from: aaronsteers
PR: airbytehq/airbyte-python-cdk#58
File: airbyte_cdk/cli/source_declarative_manifest/_run.py:62-65
Timestamp: 2024-11-15T01:04:21.272Z
Learning: The files in `airbyte_cdk/cli/source_declarative_manifest/`, including `_run.py`, are imported from another repository, and changes to these files should be minimized or avoided when possible to maintain consistency.

Applied to files:

  • unit_tests/sources/declarative/parsers/test_model_to_component_factory.py
📚 Learning: 2025-01-14T00:20:32.310Z
Learnt from: aaronsteers
PR: airbytehq/airbyte-python-cdk#174
File: airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py:1093-1102
Timestamp: 2025-01-14T00:20:32.310Z
Learning: In the `airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py` file, the strict module name checks in `_get_class_from_fully_qualified_class_name` (requiring `module_name` to be "components" and `module_name_full` to be "source_declarative_manifest.components") are intentionally designed to provide early, clear feedback when class declarations won't be found later in execution. These restrictions may be loosened in the future if the requirements for class definition locations change.

Applied to files:

  • unit_tests/sources/declarative/parsers/test_model_to_component_factory.py
  • airbyte_cdk/sources/declarative/concurrent_declarative_source.py
📚 Learning: 2024-11-18T23:40:06.391Z
Learnt from: ChristoGrab
PR: airbytehq/airbyte-python-cdk#58
File: airbyte_cdk/sources/declarative/yaml_declarative_source.py:0-0
Timestamp: 2024-11-18T23:40:06.391Z
Learning: When modifying the `YamlDeclarativeSource` class in `airbyte_cdk/sources/declarative/yaml_declarative_source.py`, avoid introducing breaking changes like altering method signatures within the scope of unrelated PRs. Such changes should be addressed separately to minimize impact on existing implementations.

Applied to files:

  • airbyte_cdk/sources/declarative/concurrent_declarative_source.py
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (13)
  • GitHub Check: Check: source-intercom
  • GitHub Check: Check: destination-motherduck
  • GitHub Check: Check: source-shopify
  • GitHub Check: Check: source-pokeapi
  • GitHub Check: Check: source-hardcoded-records
  • GitHub Check: Pytest (All, Python 3.11, Ubuntu)
  • GitHub Check: Pytest (All, Python 3.13, Ubuntu)
  • GitHub Check: Pytest (All, Python 3.12, Ubuntu)
  • GitHub Check: Pytest (All, Python 3.10, Ubuntu)
  • GitHub Check: Pytest (Fast)
  • GitHub Check: Manifest Server Docker Image Build
  • GitHub Check: SDM Docker Image Build
  • GitHub Check: Analyze (python)
🔇 Additional comments (3)
unit_tests/sources/declarative/parsers/test_model_to_component_factory.py (1)

48-48: Update deprecated declarative.incremental imports
Multiple tests and source files still import symbols (ConcurrentPerPartitionCursor, CursorFactory, PerPartitionWithGlobalCursor, ResumableFullRefreshCursor) from airbyte_cdk.sources.declarative.incremental. Can you migrate these to the new export paths or remove legacy exports? wdyt?

⛔ Skipped due to learnings
Learnt from: aaronsteers
PR: airbytehq/airbyte-python-cdk#58
File: airbyte_cdk/cli/source_declarative_manifest/_run.py:62-65
Timestamp: 2024-11-15T01:04:21.272Z
Learning: The files in `airbyte_cdk/cli/source_declarative_manifest/`, including `_run.py`, are imported from another repository, and changes to these files should be minimized or avoided when possible to maintain consistency.
unit_tests/sources/streams/concurrent/test_cursor.py (1)

505-560: Confirm inclusive/exclusive upper-bound semantics with granularity.

With a 1-second granularity, earlier slices use [20,29]/[30,39] but the last slice is [40,50] (not [40,49]). Is the final upper bound intentionally inclusive, or should all be consistent? Can you double-check against ConcurrentCursor’s contract, wdyt?

airbyte_cdk/sources/declarative/concurrent_declarative_source.py (1)

46-47: No logger fallback needed

The base Source class defines a @property name returning the class’s __name__, so self.name is always available at initialization. We can safely use logging.getLogger(f"airbyte.{self.name}") without a fallback—wdyt?

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

♻️ Duplicate comments (2)
airbyte_cdk/sources/declarative/concurrent_declarative_source.py (2)

372-381: Add a debug log when no streams are selected to avoid silent no-op

When selected_concurrent_streams is empty, read returns without emitting anything. Could we log a debug line to aid troubleshooting, without changing the config handling? Wdyt?

Apply this diff:

         if len(selected_concurrent_streams) > 0:
             yield from self._concurrent_source.read(selected_concurrent_streams)
+        else:
+            self.logger.debug("No streams selected in catalog; skipping concurrent read.")

486-508: Prefer a backward-compatible default for missing 'check' block (or confirm v7 intention)

Currently a missing check raises, which could break existing manifests that relied on implicit CheckStream. Do we want to default to CheckStream when check is absent to preserve behavior, or is this an intentional v7 break? If it’s intentional, can we mention it in the migration notes? Wdyt?

Apply this diff if we want the non-breaking path:

-        check = self._source_config.get("check")
-        if not check:
-            raise ValueError(f"Missing 'check' component definition within the manifest.")
+        check = dict(self._source_config.get("check", {}))
+        # Default to CheckStream if not provided, for compatibility
+        if "type" not in check:
+            check["type"] = "CheckStream"
🧹 Nitpick comments (28)
unit_tests/sources/declarative/extractors/test_dpath_extractor.py (1)

11-12: Unify imports via package init for consistency?

If JsonLineParser is re-exported from airbyte_cdk.sources.declarative.decoders.__init__, would you like to import it from the same package path to keep all decoder imports uniform, wdyt?

-from airbyte_cdk.sources.declarative.decoders import CompositeRawDecoder, Decoder
-from airbyte_cdk.sources.declarative.decoders.composite_raw_decoder import JsonLineParser
+from airbyte_cdk.sources.declarative.decoders import CompositeRawDecoder, Decoder, JsonLineParser
unit_tests/sources/streams/concurrent/test_partition_reader.py (4)

32-44: Deflake: add a timeout and assert the queue is empty to truly enforce “only sentinel”

Queue.get() can block forever if a regression drops the sentinel. Also, the test doesn’t currently assert that nothing else was queued. Shall we add a timeout and an emptiness check, wdyt?

-        while queue_item := self._queue.get():
-            if not isinstance(queue_item, PartitionCompleteSentinel):
-                pytest.fail("Only one PartitionCompleteSentinel is expected")
-            break
+        queue_item = self._queue.get(timeout=2)
+        if not isinstance(queue_item, PartitionCompleteSentinel):
+            pytest.fail("Only one PartitionCompleteSentinel is expected")
+        assert self._queue.empty()

62-67: Stabilize mock expectations with spec and explicit stream name

Using a bare Mock() may yield surprising equality behavior for partition.stream_name() in StreamThreadException. Would you set a spec and a concrete return value to make this test stricter and less brittle, wdyt?

-        partition = Mock()
+        partition = Mock(spec=Partition)
+        partition.stream_name.return_value = "stream"
         cursor = Mock()
         exception = ValueError()
         partition.read.side_effect = self._read_with_exception(_RECORDS, exception)
         self._partition_reader.process_partition(partition, cursor)
@@
-        assert queue_content == _RECORDS + [
-            StreamThreadException(exception, partition.stream_name()),
+        assert queue_content == _RECORDS + [
+            StreamThreadException(exception, "stream"),
             PartitionCompleteSentinel(partition),
         ]

Also applies to: 70-73


114-121: Deflake: avoid potential infinite wait in _consume_queue

Same blocking-risk here; adding a timeout and handling Empty keeps the test from hanging on regressions. Good to keep CI resilient, wdyt?

-from queue import Queue
+from queue import Queue, Empty
@@
-        while queue_item := self._queue.get():
-            queue_content.append(queue_item)
-            if isinstance(queue_item, PartitionCompleteSentinel):
-                break
+        while True:
+            try:
+                queue_item = self._queue.get(timeout=2)
+            except Empty:
+                pytest.fail("Timed out waiting for queue item")
+            queue_content.append(queue_item)
+            if isinstance(queue_item, PartitionCompleteSentinel):
+                break

Also applies to: 4-4


75-78: Naming nit: align with “partition” terminology

Test name still says “close_slice.” For consistency with the code and sentinel type, shall we rename to test_given_exception_from_close_partition_when_process_partition_then_queue_records_and_exception_and_sentinel, wdyt?

airbyte_cdk/sources/declarative/concurrent_declarative_source.py (4)

382-385: Document and surface that discover ignores the incoming config

Given the plan to deprecate the config parameter, could we log once at debug level when a non-empty config differs from self._config, and add a short docstring note so callers aren’t surprised? Wdyt?

Apply this diff:

 def discover(self, logger: logging.Logger, config: Mapping[str, Any]) -> AirbyteCatalog:
-        return AirbyteCatalog(
+        # NOTE: discover ignores the incoming `config` and relies on the initialized config.
+        if config and config != self._config:
+            self.logger.debug("discover(config=...) argument is ignored; using initialized config.")
+        return AirbyteCatalog(
             streams=[stream.as_airbyte_stream() for stream in self.streams(config=self._config)]
         )

504-507: Use str(error) for user-facing message

repr(error) can include noisy type wrappers. Would str(error) be cleaner for end users?

Apply this diff:

-        if not check_succeeded:
-            return AirbyteConnectionStatus(status=Status.FAILED, message=repr(error))
+        if not check_succeeded:
+            return AirbyteConnectionStatus(status=Status.FAILED, message=str(error))

566-568: Be defensive when toggling use_cache in dynamic resolver

Some manifests may omit requester under retriever. Should we guard these writes to avoid KeyErrors on misconfigurations, while still keeping the happy path unchanged?

Apply this diff:

-            if "retriever" in components_resolver_config:
-                components_resolver_config["retriever"]["requester"]["use_cache"] = True
+            retriever_cfg = components_resolver_config.get("retriever")
+            if isinstance(retriever_cfg, dict):
+                requester_cfg = retriever_cfg.get("requester")
+                if isinstance(requester_cfg, dict):
+                    requester_cfg["use_cache"] = True

463-473: Optional: safer cache flagging for parent streams

Direct indexing into nested structures assumes exact shapes. Would using get/type checks reduce surprises from schema drift in manifests, while leaving behavior intact?

Apply this diff:

-                if stream_config["type"] == "StateDelegatingStream":
-                    stream_config["full_refresh_stream"]["retriever"]["requester"]["use_cache"] = (
-                        True
-                    )
-                    stream_config["incremental_stream"]["retriever"]["requester"]["use_cache"] = (
-                        True
-                    )
-                else:
-                    stream_config["retriever"]["requester"]["use_cache"] = True
+                if stream_config.get("type") == "StateDelegatingStream":
+                    fr = stream_config.get("full_refresh_stream", {}).get("retriever", {}).get("requester")
+                    inc = stream_config.get("incremental_stream", {}).get("retriever", {}).get("requester")
+                    if isinstance(fr, dict):
+                        fr["use_cache"] = True
+                    if isinstance(inc, dict):
+                        inc["use_cache"] = True
+                else:
+                    requester = stream_config.get("retriever", {}).get("requester")
+                    if isinstance(requester, dict):
+                        requester["use_cache"] = True
airbyte_cdk/sources/declarative/requesters/http_job_repository.py (6)

143-149: Prefer is None over truthiness to avoid conflating non-2xx Responses with “no response.”

This keeps semantics consistent with _get_validated_polling_response. Shall we align it, wdyt?

-        if not response:
+        if response is None:
             raise AirbyteTracedException(
                 internal_message="Always expect a response or an exception from creation_requester",
                 failure_type=FailureType.system_error,
             )

62-74: Fix “pooling” typos in docstring.

Minor text nit for clarity, wdyt?

-        Validates and retrieves the pooling response for a given stream slice.
+        Validates and retrieves the polling response for a given stream slice.
@@
-            stream_slice (StreamSlice): The stream slice to send the pooling request for.
+            stream_slice (StreamSlice): The stream slice to send the polling request for.
@@
-            requests.Response: The validated pooling response.
+            requests.Response: The validated polling response.

286-297: PEP 8 membership style and tiny readability polish.

Flip membership checks for readability; no behavior change, wdyt?

-        if not "headers" in creation_response_context:
+        if "headers" not in creation_response_context:
             creation_response_context["headers"] = self._create_job_response_by_id[
                 job.api_job_id()
             ].headers
-        if not "request" in creation_response_context:
+        if "request" not in creation_response_context:
             creation_response_context["request"] = self._create_job_response_by_id[
                 job.api_job_id()
             ].request
-        if not "headers" in polling_response_context:
+        if "headers" not in polling_response_context:
             polling_response_context["headers"] = self._polling_job_response_by_id[
                 job.api_job_id()
             ].headers
-        if not "request" in polling_response_context:
+        if "request" not in polling_response_context:
             polling_response_context["request"] = self._polling_job_response_by_id[
                 job.api_job_id()
             ].request

Also applies to: 312-320


259-269: Typo: delete_job_reponsedelete_job_response.

Purely cosmetic but helps grep-ability, wdyt?

-        delete_job_reponse = self.delete_requester.send_request(
+        delete_job_response = self.delete_requester.send_request(
             stream_slice=self._get_create_job_stream_slice(job),
-            log_formatter=lambda delete_job_reponse: format_http_message(
-                response=delete_job_reponse,
+            log_formatter=lambda delete_job_response: format_http_message(
+                response=delete_job_response,
                 title="Async Job -- Delete",
                 description="Delete the specified job from the list of Jobs.",
                 stream_name=None,
                 is_auxiliary=True,
                 type="ASYNC_DELETE",
             ),
         )

273-275: Avoid KeyError on cleanup using pop(..., None).

This is safer if delete() is called before a polling response is recorded, wdyt?

-        del self._create_job_response_by_id[job_id]
-        del self._polling_job_response_by_id[job_id]
+        self._create_job_response_by_id.pop(job_id, None)
+        self._polling_job_response_by_id.pop(job_id, None)

237-237: Remove no-op yield from [].

Redundant; can be dropped, wdyt?

-        yield from []
+        # no-op
unit_tests/sources/declarative/requesters/error_handlers/backoff_strategies/test_wait_time_from_header.py (4)

21-46: Drop the unused test_name arg and use pytest ids for readability.

The test_name param isn’t used and duplicates some labels. Using pytest.param(..., id=...) keeps parametrization self-documenting. Apply?

-@pytest.mark.parametrize(
-    "test_name, header, header_value, regex, expected_backoff_time",
-    [
-        ("test_wait_time_from_header", "wait_time", SOME_BACKOFF_TIME, None, SOME_BACKOFF_TIME),
-        ("test_wait_time_from_header_string", "wait_time", "60", None, SOME_BACKOFF_TIME),
-        (
-            "test_wait_time_from_header_parameters",
-            "{{ parameters['wait_time'] }}",
-            "60",
-            None,
-            SOME_BACKOFF_TIME,
-        ),
-        (
-            "test_wait_time_from_header_config",
-            "{{ config['wait_time'] }}",
-            "60",
-            None,
-            SOME_BACKOFF_TIME,
-        ),
-        ("test_wait_time_from_header_not_a_number", "wait_time", "61,60", None, None),
-        ("test_wait_time_from_header_with_regex", "wait_time", "61,60", r"([-+]?\d+)", 61),  # noqa
-        ("test_wait_time_fœrom_header_with_regex_no_match", "wait_time", "...", "[-+]?\d+", None),  # noqa
-        ("test_wait_time_from_header", "absent_header", None, None, None),
-    ],
-)
-def test_wait_time_from_header(test_name, header, header_value, regex, expected_backoff_time):
+@pytest.mark.parametrize(
+    "header, header_value, regex, expected_backoff_time",
+    [
+        pytest.param("wait_time", SOME_BACKOFF_TIME, None, SOME_BACKOFF_TIME, id="int-header"),
+        pytest.param("wait_time", "60", None, SOME_BACKOFF_TIME, id="str-header"),
+        pytest.param("{{ parameters['wait_time'] }}", "60", None, SOME_BACKOFF_TIME, id="parameter-ref"),
+        pytest.param("{{ config['wait_time'] }}", "60", None, SOME_BACKOFF_TIME, id="config-ref"),
+        pytest.param("wait_time", "61,60", None, None, id="not-a-number"),
+        pytest.param("wait_time", "61,60", r"([-+]?\d+)", 61, id="regex-first-number"),
+        pytest.param("wait_time", "...", "[-+]?\d+", None, id="regex-no-match"),
+        pytest.param("absent_header", None, None, None, id="missing-header"),
+    ],
+)
+def test_wait_time_from_header(header, header_value, regex, expected_backoff_time):

Also applies to: 46-46


41-44: Minor label nits: fix typo and duplicate label while you’re here?

The id string “test_wait_time_fœrom_header_with_regex_no_match” includes a typo (“fœrom”), and “test_wait_time_from_header” appears twice. If you adopt parametrization ids (prior comment), these go away—otherwise, shall we correct them, wdyt?


47-49: Mimic requests’ case-insensitive headers to avoid surprises.

Requests’ Response.headers is a CaseInsensitiveDict. Using it in the mock keeps behavior realistic. Update?

+from requests.structures import CaseInsensitiveDict
@@
-    response_mock = MagicMock(spec=Response)
-    response_mock.headers = {"wait_time": header_value}
+    response_mock = MagicMock(spec=Response)
+    response_mock.headers = CaseInsensitiveDict({"wait_time": header_value})

70-80: Add boundary test for == max_waiting_time_in_seconds.

You cover “< max” and “> max”. Adding “== max” locks intended behavior. Should it return max or raise? Suggest returning max. Add?

+def test_given_retry_after_equal_to_max_time_then_use_max_time():
+    response_mock = MagicMock(spec=Response)
+    response_mock.headers = {_A_RETRY_HEADER: str(_A_MAX_TIME)}
+    backoff_strategy = WaitTimeFromHeaderBackoffStrategy(
+        header=_A_RETRY_HEADER, max_waiting_time_in_seconds=_A_MAX_TIME, parameters={}, config={}
+    )
+    assert backoff_strategy.backoff_time(response_mock, 1) == _A_MAX_TIME
unit_tests/sources/declarative/async_job/test_job_orchestrator.py (5)

59-64: Fix misnamed test: it returns COMPLETED, not RUNNING.

Rename for clarity to match the asserted status, wdyt?

-def test_given_only_completed_jobs_when_status_then_return_running(self) -> None:
+def test_given_only_completed_jobs_when_status_then_return_completed(self) -> None:

320-361: This method won’t run (missing test_ prefix) and is timing-sensitive; rename and patch sleep?

Renaming ensures pytest collects it; patching sleep brings it in line with adjacent tests and reduces flakiness. Apply?

-    def given_budget_already_taken_before_start_when_create_and_get_completed_partitions_then_wait_for_budget_to_be_freed(
-        self,
-    ) -> None:
+    @mock.patch(sleep_mock_target)
+    def test_given_budget_already_taken_before_start_when_create_and_get_completed_partitions_then_wait_for_budget_to_be_freed(
+        self, mock_sleep: MagicMock
+    ) -> None:
@@
-        threading.Thread(target=wait_and_free_intent, args=[job_tracker, intent_to_free]).start()
-        partitions = list(orchestrator.create_and_get_completed_partitions())
+        t = threading.Thread(target=wait_and_free_intent, args=[job_tracker, intent_to_free])
+        t.start()
+        partitions = list(orchestrator.create_and_get_completed_partitions())
+        t.join()

26-29: Use real StreamSlice objects to exercise types and reduce mocking.

Switching from bare Mock()s to minimal StreamSlice instances tightens the contract and future-proofs assertions. Interested?

-_ANY_STREAM_SLICE = Mock()
-_A_STREAM_SLICE = Mock()
-_ANOTHER_STREAM_SLICE = Mock()
+_ANY_STREAM_SLICE = StreamSlice(partition={"p": "any"}, cursor_slice={"c": 0})
+_A_STREAM_SLICE = StreamSlice(partition={"p": "a"}, cursor_slice={"c": 1})
+_ANOTHER_STREAM_SLICE = StreamSlice(partition={"p": "b"}, cursor_slice={"c": 2})

31-31: Trim the buffer to speed tests.

_BUFFER = 10000 may slow the tight loop; could we drop to 100 without losing intent, wdyt?

-_BUFFER = 10000  # this buffer allows us to be unconcerned with the number of times the update status is called
+_BUFFER = 100  # sufficient buffer for status updates in tests

66-78: Type hint consistency: prefer typing.Set (or drop the import).

You import Set but annotate with set[...]. For consistency (and older tooling), shall we switch to Set[AsyncJob], wdyt?

-) -> Callable[[set[AsyncJob]], None]:
+) -> Callable[[Set[AsyncJob]], None]:
unit_tests/sources/declarative/parsers/test_model_to_component_factory.py (2)

742-742: Nit: fix the TODO wording (it’s a test, not a class).

Minor clarity tweak so future readers don’t trip over this. Also, would you like to track this with a GH issue ID in the comment, wdyt?

-# todo: delete this class once we deprecate SimpleRetriever.cursor and SimpleRetriever.state methods
+# TODO: delete this test once we deprecate SimpleRetriever.cursor and SimpleRetriever.state methods

17-18: Add a re-export for DeclarativeStream and consider aliasing legacy imports?

  • DeclarativeStream is defined in legacy/sources/declarative/declarative_stream.py but not re-exported from legacy/sources/declarative/__init__.py, so your explicit import is required—would you add
    from .declarative_stream import DeclarativeStream
    to the package init to mirror the incremental export?
  • DatetimeBasedCursor is already re-exported in legacy/sources/declarative/incremental/__init__.py, so that import works. To avoid confusion with its model counterpart, would you alias it as
    from airbyte_cdk.legacy.sources.declarative.incremental import DatetimeBasedCursor as LegacyDatetimeBasedCursor
    wdyt?
airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py (2)

1973-1976: Return type now AbstractStream — align annotations and drop unused import

create_default_stream now returns AbstractStream. However, create_state_delegating_stream is still annotated to return DeclarativeStream while it will also produce DefaultStream via the same mapping. To avoid confusion and type ignores, shall we update the annotation and remove the now-unneeded DeclarativeStream import, wdyt?

Apply:

- from airbyte_cdk.legacy.sources.declarative.declarative_stream import DeclarativeStream
+ # (import no longer needed if not referenced elsewhere)

@@
-    def create_state_delegating_stream(
+    def create_state_delegating_stream(
         self,
         model: StateDelegatingStreamModel,
         config: Config,
         has_parent_state: Optional[bool] = None,
         **kwargs: Any,
-    ) -> DeclarativeStream:
+    ) -> AbstractStream:
@@
-        return self._create_component_from_model(stream_model, config=config, **kwargs)  # type: ignore[no-any-return]  # DeclarativeStream will be created as stream_model is alwyas DeclarativeStreamModel
+        return self._create_component_from_model(stream_model, config=config, **kwargs)

Also applies to: 3381-3395, 36-39


2616-2633: Stub for IncrementingCountCursor: prefer fail-fast or a stable epoch baseline

This stub returns a DatetimeBasedCursor with a hard-coded 2024-12-12 start. That can cause surprising behavior if it’s accidentally used outside tests. Would you either fail fast or choose a neutral epoch to minimize impact, wdyt?

Option A (fail fast):

 def create_incrementing_count_cursor(
     model: IncrementingCountCursorModel, config: Config, **kwargs: Any
 ) -> DatetimeBasedCursor:
-    # This should not actually get used anywhere at runtime, but needed to add this to pass checks since
-    # we still parse models into components. The issue is that there's no runtime implementation of a
-    # IncrementingCountCursor.
-    # A known and expected issue with this stub is running a check with the declared IncrementingCountCursor because it is run without ConcurrentCursor.
-    return DatetimeBasedCursor(
-        cursor_field=model.cursor_field,
-        datetime_format="%Y-%m-%d",
-        start_datetime="2024-12-12",
-        config=config,
-        parameters={},
-    )
+    # This path should never be hit at runtime; IncrementingCount is only supported via ConcurrentCursor.
+    raise NotImplementedError("IncrementingCountCursor has no legacy runtime; use concurrent path.")

Option B (safer default if fail-fast breaks checks):

-        start_datetime="2024-12-12",
+        start_datetime="1970-01-01",

[submitter can pick one]

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

💡 Knowledge Base configuration:

  • MCP integration is disabled by default for public repositories
  • Jira integration is disabled by default for public repositories
  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between bbaf25a and c63ca4a.

📒 Files selected for processing (13)
  • airbyte_cdk/sources/declarative/concurrent_declarative_source.py (9 hunks)
  • airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py (4 hunks)
  • airbyte_cdk/sources/declarative/requesters/http_job_repository.py (2 hunks)
  • airbyte_cdk/sources/declarative/transformations/keys_replace_transformation.py (1 hunks)
  • unit_tests/sources/declarative/async_job/test_integration.py (1 hunks)
  • unit_tests/sources/declarative/async_job/test_job_orchestrator.py (1 hunks)
  • unit_tests/sources/declarative/decoders/test_decoders_memory_usage.py (1 hunks)
  • unit_tests/sources/declarative/extractors/test_dpath_extractor.py (1 hunks)
  • unit_tests/sources/declarative/interpolation/test_jinja.py (1 hunks)
  • unit_tests/sources/declarative/parsers/test_model_to_component_factory.py (4 hunks)
  • unit_tests/sources/declarative/requesters/error_handlers/backoff_strategies/test_wait_time_from_header.py (1 hunks)
  • unit_tests/sources/declarative/retrievers/test_simple_retriever.py (1 hunks)
  • unit_tests/sources/streams/concurrent/test_partition_reader.py (1 hunks)
✅ Files skipped from review due to trivial changes (1)
  • airbyte_cdk/sources/declarative/transformations/keys_replace_transformation.py
🚧 Files skipped from review as they are similar to previous changes (1)
  • unit_tests/sources/declarative/retrievers/test_simple_retriever.py
🧰 Additional context used
🧠 Learnings (6)
📓 Common learnings
Learnt from: ChristoGrab
PR: airbytehq/airbyte-python-cdk#58
File: airbyte_cdk/sources/declarative/yaml_declarative_source.py:0-0
Timestamp: 2024-11-18T23:40:06.391Z
Learning: When modifying the `YamlDeclarativeSource` class in `airbyte_cdk/sources/declarative/yaml_declarative_source.py`, avoid introducing breaking changes like altering method signatures within the scope of unrelated PRs. Such changes should be addressed separately to minimize impact on existing implementations.
📚 Learning: 2024-11-18T23:40:06.391Z
Learnt from: ChristoGrab
PR: airbytehq/airbyte-python-cdk#58
File: airbyte_cdk/sources/declarative/yaml_declarative_source.py:0-0
Timestamp: 2024-11-18T23:40:06.391Z
Learning: When modifying the `YamlDeclarativeSource` class in `airbyte_cdk/sources/declarative/yaml_declarative_source.py`, avoid introducing breaking changes like altering method signatures within the scope of unrelated PRs. Such changes should be addressed separately to minimize impact on existing implementations.

Applied to files:

  • unit_tests/sources/declarative/decoders/test_decoders_memory_usage.py
  • unit_tests/sources/declarative/async_job/test_integration.py
  • airbyte_cdk/sources/declarative/concurrent_declarative_source.py
📚 Learning: 2024-11-15T00:59:08.154Z
Learnt from: aaronsteers
PR: airbytehq/airbyte-python-cdk#58
File: airbyte_cdk/cli/source_declarative_manifest/spec.json:9-15
Timestamp: 2024-11-15T00:59:08.154Z
Learning: When code in `airbyte_cdk/cli/source_declarative_manifest/` is being imported from another repository, avoid suggesting modifications to it during the import process.

Applied to files:

  • unit_tests/sources/declarative/decoders/test_decoders_memory_usage.py
  • unit_tests/sources/declarative/async_job/test_integration.py
📚 Learning: 2024-11-15T01:04:21.272Z
Learnt from: aaronsteers
PR: airbytehq/airbyte-python-cdk#58
File: airbyte_cdk/cli/source_declarative_manifest/_run.py:62-65
Timestamp: 2024-11-15T01:04:21.272Z
Learning: The files in `airbyte_cdk/cli/source_declarative_manifest/`, including `_run.py`, are imported from another repository, and changes to these files should be minimized or avoided when possible to maintain consistency.

Applied to files:

  • unit_tests/sources/declarative/decoders/test_decoders_memory_usage.py
  • unit_tests/sources/declarative/async_job/test_job_orchestrator.py
  • unit_tests/sources/declarative/async_job/test_integration.py
  • airbyte_cdk/sources/declarative/concurrent_declarative_source.py
  • unit_tests/sources/declarative/parsers/test_model_to_component_factory.py
📚 Learning: 2024-12-11T16:34:46.319Z
Learnt from: pnilan
PR: airbytehq/airbyte-python-cdk#0
File: :0-0
Timestamp: 2024-12-11T16:34:46.319Z
Learning: In the airbytehq/airbyte-python-cdk repository, the `declarative_component_schema.py` file is auto-generated from `declarative_component_schema.yaml` and should be ignored in the recommended reviewing order.

Applied to files:

  • airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py
📚 Learning: 2025-01-14T00:20:32.310Z
Learnt from: aaronsteers
PR: airbytehq/airbyte-python-cdk#174
File: airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py:1093-1102
Timestamp: 2025-01-14T00:20:32.310Z
Learning: In the `airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py` file, the strict module name checks in `_get_class_from_fully_qualified_class_name` (requiring `module_name` to be "components" and `module_name_full` to be "source_declarative_manifest.components") are intentionally designed to provide early, clear feedback when class declarations won't be found later in execution. These restrictions may be loosened in the future if the requirements for class definition locations change.

Applied to files:

  • airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py
  • airbyte_cdk/sources/declarative/concurrent_declarative_source.py
  • unit_tests/sources/declarative/parsers/test_model_to_component_factory.py
🧬 Code graph analysis (11)
unit_tests/sources/declarative/decoders/test_decoders_memory_usage.py (1)
airbyte_cdk/sources/declarative/yaml_declarative_source.py (1)
  • YamlDeclarativeSource (17-69)
unit_tests/sources/declarative/requesters/error_handlers/backoff_strategies/test_wait_time_from_header.py (1)
airbyte_cdk/utils/traced_exception.py (1)
  • AirbyteTracedException (25-145)
unit_tests/sources/declarative/async_job/test_job_orchestrator.py (2)
airbyte_cdk/sources/types.py (1)
  • StreamSlice (75-169)
airbyte_cdk/utils/traced_exception.py (1)
  • AirbyteTracedException (25-145)
unit_tests/sources/declarative/interpolation/test_jinja.py (1)
airbyte_cdk/sources/types.py (1)
  • StreamSlice (75-169)
unit_tests/sources/declarative/async_job/test_integration.py (4)
airbyte_cdk/sources/declarative/models/declarative_component_schema.py (3)
  • DeclarativeStream (2406-2473)
  • AsyncRetriever (2843-2946)
  • InlineSchemaLoader (825-832)
airbyte_cdk/sources/abstract_source.py (1)
  • AbstractSource (52-326)
airbyte_cdk/sources/declarative/partition_routers/single_partition_router.py (1)
  • SinglePartitionRouter (13-57)
airbyte_cdk/sources/types.py (1)
  • StreamSlice (75-169)
unit_tests/sources/declarative/extractors/test_dpath_extractor.py (2)
airbyte_cdk/sources/declarative/decoders/composite_raw_decoder.py (1)
  • CompositeRawDecoder (130-221)
airbyte_cdk/sources/declarative/decoders/decoder.py (1)
  • Decoder (15-32)
airbyte_cdk/sources/declarative/requesters/http_job_repository.py (2)
airbyte_cdk/models/airbyte_protocol.py (1)
  • AirbyteMessage (79-88)
airbyte_cdk/sources/declarative/models/declarative_component_schema.py (1)
  • FailureType (743-746)
airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py (3)
airbyte_cdk/legacy/sources/declarative/declarative_stream.py (1)
  • DeclarativeStream (32-241)
airbyte_cdk/legacy/sources/declarative/incremental/datetime_based_cursor.py (1)
  • DatetimeBasedCursor (28-448)
airbyte_cdk/sources/streams/concurrent/abstract_stream.py (1)
  • AbstractStream (21-96)
unit_tests/sources/streams/concurrent/test_partition_reader.py (1)
airbyte_cdk/sources/message/repository.py (1)
  • InMemoryMessageRepository (74-95)
airbyte_cdk/sources/declarative/concurrent_declarative_source.py (3)
airbyte_cdk/sources/source.py (1)
  • Source (55-95)
airbyte_cdk/sources/abstract_source.py (4)
  • streams (74-79)
  • read (101-211)
  • check (92-99)
  • check_connection (59-71)
airbyte_cdk/sources/declarative/checks/connection_checker.py (2)
  • ConnectionChecker (12-38)
  • check_connection (18-38)
unit_tests/sources/declarative/parsers/test_model_to_component_factory.py (3)
airbyte_cdk/legacy/sources/declarative/declarative_stream.py (1)
  • DeclarativeStream (32-241)
airbyte_cdk/legacy/sources/declarative/incremental/datetime_based_cursor.py (1)
  • DatetimeBasedCursor (28-448)
airbyte_cdk/utils/traced_exception.py (1)
  • AirbyteTracedException (25-145)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (12)
  • GitHub Check: Check: source-pokeapi
  • GitHub Check: Check: source-shopify
  • GitHub Check: Check: destination-motherduck
  • GitHub Check: Check: source-intercom
  • GitHub Check: Check: source-hardcoded-records
  • GitHub Check: Pytest (All, Python 3.11, Ubuntu)
  • GitHub Check: Pytest (All, Python 3.13, Ubuntu)
  • GitHub Check: SDM Docker Image Build
  • GitHub Check: Manifest Server Docker Image Build
  • GitHub Check: Pytest (All, Python 3.10, Ubuntu)
  • GitHub Check: Pytest (Fast)
  • GitHub Check: Pytest (All, Python 3.12, Ubuntu)
🔇 Additional comments (14)
unit_tests/sources/declarative/extractors/test_dpath_extractor.py (2)

11-11: Import path consolidation LGTM

Switching to the package-level airbyte_cdk.sources.declarative.decoders import reads cleaner and matches the refactor direction. Nice touch.


11-11: Consolidate deep decoder imports
I’m still seeing direct imports from airbyte_cdk.sources.declarative.decoders.decoder or …decoders.composite_raw_decoder in these files:
• unit_tests/sources/declarative/extractors/test_dpath_extractor.py
• unit_tests/sources/declarative/decoders/test_json_decoder.py
• airbyte_cdk/sources/declarative/decoders/xml_decoder.py
• airbyte_cdk/sources/declarative/decoders/noop_decoder.py
• airbyte_cdk/sources/declarative/decoders/json_decoder.py
• airbyte_cdk/sources/declarative/decoders/zipfile_decoder.py
• airbyte_cdk/sources/declarative/auth/token_provider.py
• airbyte_cdk/sources/declarative/decoders/init.py

Could we switch these to the package-level imports (e.g. from airbyte_cdk.sources.declarative.decoders import Decoder, CompositeRawDecoder, JsonLineParser)? wdyt?

⛔ Skipped due to learnings
Learnt from: aaronsteers
PR: airbytehq/airbyte-python-cdk#58
File: airbyte_cdk/cli/source_declarative_manifest/_run.py:62-65
Timestamp: 2024-11-15T01:04:21.272Z
Learning: The files in `airbyte_cdk/cli/source_declarative_manifest/`, including `_run.py`, are imported from another repository, and changes to these files should be minimized or avoided when possible to maintain consistency.
Learnt from: aaronsteers
PR: airbytehq/airbyte-python-cdk#58
File: airbyte_cdk/cli/source_declarative_manifest/spec.json:9-15
Timestamp: 2024-11-15T00:59:08.154Z
Learning: When code in `airbyte_cdk/cli/source_declarative_manifest/` is being imported from another repository, avoid suggesting modifications to it during the import process.
unit_tests/sources/declarative/interpolation/test_jinja.py (1)

12-12: Import path shift to sources.types looks good

The switch to airbyte_cdk.sources.types.StreamSlice aligns with the new typing module layout; no test logic change. LGTM.

unit_tests/sources/declarative/decoders/test_decoders_memory_usage.py (1)

16-16: YamlDeclarativeSource import relocation is correct

Using airbyte_cdk.sources.declarative.yaml_declarative_source matches the refactor. No other changes needed here.

unit_tests/sources/declarative/async_job/test_integration.py (1)

8-10: Imports updated to new public/legacy paths — LGTM

  • DeclarativeStream from legacy path, AbstractSource from airbyte_cdk.sources, and SinglePartitionRouter/Stream/StreamSlice from their new homes all look consistent with the v7 layout.

Also applies to: 18-18, 27-28

airbyte_cdk/sources/declarative/concurrent_declarative_source.py (1)

128-128: Switching to Source (vs AbstractSource) matches PR goals

The inheritance change is consistent with the v7 surface area reduction and keeps concerns localized.

airbyte_cdk/sources/declarative/requesters/http_job_repository.py (2)

1-2: LGTM: Updated header year.

The 2025 copyright header and spacing look good, wdyt?


13-13: Import-path consolidation verified — no FailureType collisions detected
No imports of FailureType from any airbyte_cdk.sources.declarative.models path were found in this module or nearby packages.

unit_tests/sources/declarative/requesters/error_handlers/backoff_strategies/test_wait_time_from_header.py (1)

14-14: Update stale import of AirbyteTracedException

In airbyte_cdk/sources/file_based/availability_strategy/default_file_based_availability_strategy.py (line 11), replace the import:

- from airbyte_cdk import AirbyteTracedException
+ from airbyte_cdk.utils import AirbyteTracedException

wdyt?

⛔ Skipped due to learnings
Learnt from: aaronsteers
PR: airbytehq/airbyte-python-cdk#58
File: airbyte_cdk/cli/source_declarative_manifest/spec.json:9-15
Timestamp: 2024-11-15T00:59:08.154Z
Learning: When code in `airbyte_cdk/cli/source_declarative_manifest/` is being imported from another repository, avoid suggesting modifications to it during the import process.
Learnt from: aaronsteers
PR: airbytehq/airbyte-python-cdk#58
File: airbyte_cdk/cli/source_declarative_manifest/_run.py:62-65
Timestamp: 2024-11-15T01:04:21.272Z
Learning: The files in `airbyte_cdk/cli/source_declarative_manifest/`, including `_run.py`, are imported from another repository, and changes to these files should be minimized or avoided when possible to maintain consistency.
unit_tests/sources/declarative/async_job/test_job_orchestrator.py (1)

23-25: Update stale StreamSlice and AirbyteTracedException imports to new public API

Could you update the following occurrences to use
from airbyte_cdk.sources.types import StreamSlice
and
from airbyte_cdk.utils import AirbyteTracedException?

  • unit_tests/sources/declarative/async_job/test_job.py: line 9
  • unit_tests/sources/declarative/requesters/request_options/test_per_partition_request_option_provider.py: line 9
  • unit_tests/sources/declarative/requesters/request_options/test_datetime_based_request_options_provider.py: line 14
  • unit_tests/sources/declarative/extractors/test_record_filter.py: line 26
  • airbyte_cdk/sources/file_based/stream/default_file_based_stream.py: line 36
  • airbyte_cdk/sources/file_based/stream/abstract_file_based_stream.py: line 32
  • airbyte_cdk/sources/file_based/stream/permissions_file_based_stream.py: line 14
  • airbyte_cdk/sources/file_based/stream/concurrent/adapters.py: line 31
  • airbyte_cdk/sources/file_based/availability_strategy/default_file_based_availability_strategy.py: line 11
⛔ Skipped due to learnings
Learnt from: aaronsteers
PR: airbytehq/airbyte-python-cdk#58
File: airbyte_cdk/cli/source_declarative_manifest/_run.py:62-65
Timestamp: 2024-11-15T01:04:21.272Z
Learning: The files in `airbyte_cdk/cli/source_declarative_manifest/`, including `_run.py`, are imported from another repository, and changes to these files should be minimized or avoided when possible to maintain consistency.
unit_tests/sources/declarative/parsers/test_model_to_component_factory.py (2)

177-177: No changes needed for AirbyteTracedException import
airbyte_cdk/utils/__init__.py re-exports AirbyteTracedException in its __all__, so importing from airbyte_cdk.utils is valid.


47-47: Confirmed export of ConcurrentPerPartitionCursor
ConcurrentPerPartitionCursor is re-exported in airbyte_cdk/sources/declarative/incremental/init.py, so this import is safe.

airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py (2)

36-39: Legacy imports alignment — LGTM

Re-routing DeclarativeStream and DatetimeBasedCursor to legacy paths matches the v7 migration goals.


3146-3150: No remaining CustomIncrementalSync references

Code and config searches return no lingering usages; only cdk-migrations.md mentions it (as intended).

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

♻️ Duplicate comments (1)
unit_tests/sources/declarative/stream_slicers/test_stream_slicer_read_decorator.py (1)

35-39: Past TODO addressed; any remaining legacy references to prune?

Looks like you’ve moved to concurrent components as planned. Do you want a quick sweep to ensure no tests still import legacy cursors, wdyt?

#!/bin/bash
# Find any lingering legacy cursor usages
rg -nP 'DeclarativeCursor|GlobalSubstreamCursor|PerPartitionWithGlobalCursor|DatetimeBasedCursor|MinMaxDatetime' -g '!**/site-packages/**' -C2
🧹 Nitpick comments (7)
unit_tests/sources/declarative/stream_slicers/test_stream_slicer_read_decorator.py (7)

49-67: Remove unused MockedCursorBuilder.

This helper isn’t referenced anywhere; dropping it will keep the test leaner, wdyt?

-class MockedCursorBuilder:
-    def __init__(self):
-        self._stream_slices = []
-        self._stream_state = {}
-
-    def with_stream_slices(self, stream_slices):
-        self._stream_slices = stream_slices
-        return self
-
-    def with_stream_state(self, stream_state):
-        self._stream_state = stream_state
-        return self
-
-    def build(self):
-        cursor = Mock(spec=Cursor)
-        cursor.get_stream_state.return_value = self._stream_state
-        cursor.stream_slices.return_value = self._stream_slices
-        return cursor

81-98: Prefer None over empty-string for stream_namespace and use stable no-op collaborators.

Using None reads clearer for “no namespace”. Also, since this is a unit test that doesn’t inspect side-effects, NoopMessageRepository() and a real ConnectorStateManager() reduce mocking overhead. Shall we simplify, wdyt?

-    return ConcurrentCursor(
-        stream_name="test",
-        stream_namespace="",
-        stream_state={},
-        message_repository=mocked_message_repository(),
-        connector_state_manager=mocked_connector_state_manager(),
+    return ConcurrentCursor(
+        stream_name="test",
+        stream_namespace=None,
+        stream_state={},
+        message_repository=NoopMessageRepository(),
+        connector_state_manager=ConnectorStateManager(),
         connector_state_converter=state_converter,
         cursor_field=CursorField("created_at"),
         slice_boundary_fields=None,
         start=ab_datetime_parse("2021-01-01"),
         end_provider=state_converter.get_end_provider(),
     )

193-204: Align stream_namespace and collaborators in ConcurrentPerPartitionCursor setup.

For consistency with the earlier factory, would you switch stream_namespace to None and consider NoopMessageRepository()/ConnectorStateManager() to avoid brittle mocks, wdyt?

-    substream_cursor = ConcurrentPerPartitionCursor(
+    substream_cursor = ConcurrentPerPartitionCursor(
         cursor_factory=cursor_factory,
         partition_router=partition_router,
-        stream_name="test",
-        stream_namespace="",
-        stream_state={},
-        message_repository=mocked_message_repository(),
-        connector_state_manager=mocked_connector_state_manager(),
+        stream_name="test",
+        stream_namespace=None,
+        stream_state={},
+        message_repository=NoopMessageRepository(),
+        connector_state_manager=ConnectorStateManager(),
         connector_state_converter=connector_state_converter,
         cursor_field=CursorField(cursor_field_key="updated_at"),
         use_global_cursor=use_global_cursor,
     )

211-235: Avoid asserting on protected attributes; remove duplicate assertion.

Asserting on underscored fields (_cursor_factory, _partition_router, _use_global_cursor) couples tests to internals. Could we expose read-only properties (or validate via behavior) and forward them through the decorator instead, wdyt? Also, the _use_global_cursor check is duplicated.

-    assert wrapped_slicer._cursor_factory == cursor_factory
+    # Prefer public surface; consider adding properties for these if we want to assert identity.
+    assert wrapped_slicer._cursor_factory == cursor_factory
@@
-    assert wrapped_slicer._use_global_cursor == use_global_cursor
+    assert wrapped_slicer._use_global_cursor == use_global_cursor
@@
-    assert wrapped_slicer._use_global_cursor == use_global_cursor
+    # (duplicate) removed; already asserted above

If adding properties is feasible, something like:

# in ConcurrentPerPartitionCursor (production code)
@property
def cursor_factory(self) -> ConcurrentCursorFactory: return self._cursor_factory
@property
def partition_router(self) -> PartitionRouter: return self._partition_router
@property
def use_global_cursor(self) -> bool: return self._use_global_cursor

Then tests can assert wrapped_slicer.cursor_factory is cursor_factory, etc.


15-17: Minor: consistent import aliasing.

Since both classes are used frequently, would aliasing to short names (ConcurrentFactory, PerPartitionCursor) improve readability across assertions, or shall we keep the explicit names for clarity, wdyt?


141-149: Double-check intended isinstance behavior through the decorator.

StreamSlicerTestReadDecorator passing isinstance(..., ConcurrentPerPartitionCursor)/SubstreamPartitionRouter implies intentional type proxying. If that’s the contract, all good; otherwise, consider using isinstance(..., StreamSlicerTestReadDecorator) only to reduce type coupling, wdyt?


35-41: Import MessageRepository/NoopMessageRepository from airbyte_cdk.sources.message.repository for consistency? Some modules (e.g. partition_reader.py, file_based_concurrent_cursor.py) already use the concrete path—unifying the rest can avoid surprises, wdyt?

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

💡 Knowledge Base configuration:

  • MCP integration is disabled by default for public repositories
  • Jira integration is disabled by default for public repositories
  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between c63ca4a and e5bc81c.

📒 Files selected for processing (1)
  • unit_tests/sources/declarative/stream_slicers/test_stream_slicer_read_decorator.py (7 hunks)
🧰 Additional context used
🧠 Learnings (1)
📓 Common learnings
Learnt from: ChristoGrab
PR: airbytehq/airbyte-python-cdk#58
File: airbyte_cdk/sources/declarative/yaml_declarative_source.py:0-0
Timestamp: 2024-11-18T23:40:06.391Z
Learning: When modifying the `YamlDeclarativeSource` class in `airbyte_cdk/sources/declarative/yaml_declarative_source.py`, avoid introducing breaking changes like altering method signatures within the scope of unrelated PRs. Such changes should be addressed separately to minimize impact on existing implementations.
🧬 Code graph analysis (1)
unit_tests/sources/declarative/stream_slicers/test_stream_slicer_read_decorator.py (6)
airbyte_cdk/sources/connector_state_manager.py (1)
  • ConnectorStateManager (32-161)
airbyte_cdk/sources/declarative/incremental/concurrent_partition_cursor.py (4)
  • ConcurrentCursorFactory (94-103)
  • ConcurrentPerPartitionCursor (106-646)
  • cursor_field (194-195)
  • start (81-82)
airbyte_cdk/sources/message/repository.py (2)
  • MessageRepository (45-60)
  • NoopMessageRepository (63-71)
airbyte_cdk/sources/streams/concurrent/cursor.py (2)
  • ConcurrentCursor (135-518)
  • CursorField (40-48)
airbyte_cdk/sources/streams/concurrent/state_converters/datetime_stream_state_converter.py (1)
  • CustomFormatConcurrentStreamStateConverter (193-223)
airbyte_cdk/utils/datetime_helpers.py (1)
  • ab_datetime_parse (361-442)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (13)
  • GitHub Check: Check: source-pokeapi
  • GitHub Check: Check: destination-motherduck
  • GitHub Check: Check: source-shopify
  • GitHub Check: Check: source-intercom
  • GitHub Check: Check: source-hardcoded-records
  • GitHub Check: Pytest (All, Python 3.12, Ubuntu)
  • GitHub Check: Pytest (All, Python 3.11, Ubuntu)
  • GitHub Check: Pytest (All, Python 3.13, Ubuntu)
  • GitHub Check: Pytest (All, Python 3.10, Ubuntu)
  • GitHub Check: SDM Docker Image Build
  • GitHub Check: Manifest Server Docker Image Build
  • GitHub Check: Pytest (Fast)
  • GitHub Check: Analyze (python)
🔇 Additional comments (3)
unit_tests/sources/declarative/stream_slicers/test_stream_slicer_read_decorator.py (3)

119-149: Type assertions around AsyncJobPartitionRouter look good.

These checks accurately validate the decorator/wrapper layering and ensure we’re not accidentally constructing a concurrent cursor here. LGTM.


160-173: SubstreamPartitionRouter assertions look good.

This mirrors the async-job case and correctly avoids concurrent-cursor types here. LGTM.


176-183: Parametrization reads well.

Clear IDs and coverage for both global and non-global cursor scenarios. LGTM.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
airbyte_cdk/sources/declarative/concurrent_declarative_source.py (1)

364-380: Read path ignores incoming state; risks losing incremental progress.

state passed to read(...) is not propagated into the existing ConnectorStateManager nor the factory, so concurrent components likely initialize with empty state. Given entrypoint.read(...) passes state at runtime, this can cause incorrect checkpoints and full re-syncs. Can we hydrate the state manager and factory at read time, before building streams? Example:

 def read(
   self,
   logger: logging.Logger,
   config: Mapping[str, Any],
   catalog: ConfiguredAirbyteCatalog,
   state: Optional[List[AirbyteStateMessage]] = None,
 ) -> Iterator[AirbyteMessage]:
+        # Ensure the concurrent components see the latest runtime state
+        if state is not None:
+            # Reuse the same message repository to preserve queueing behavior
+            self._connector_state_manager = ConnectorStateManager(state=state)
+            self._constructor = ModelToComponentFactory(
+                emit_connector_builder_messages=self._emit_connector_builder_messages,
+                message_repository=self._message_repository,
+                connector_state_manager=self._connector_state_manager,
+                max_concurrent_async_job_count=self._source_config.get("max_concurrent_async_job_count"),
+                limit_pages_fetched_per_slice=self._limits.max_pages_per_slice if self._limits else None,
+                limit_slices_fetched=self._limits.max_slices if self._limits else None,
+                disable_retries=True if self._limits else False,
+                disable_cache=True if self._limits else False,
+            )
 
         selected_concurrent_streams = self._select_streams(
             streams=self.streams(config=self._config),  # type: ignore
             configured_catalog=catalog,
         )
 
         # It would appear that passing in an empty set of streams causes an infinite loop in ConcurrentReadProcessor.
         # This is also evident in concurrent_source_adapter.py so I'll leave this out of scope to fix for now
         if len(selected_concurrent_streams) > 0:
             yield from self._concurrent_source.read(selected_concurrent_streams)
+        else:
+            self.logger.debug("No streams selected in catalog; skipping concurrent read.")
♻️ Duplicate comments (2)
airbyte_cdk/sources/declarative/concurrent_declarative_source.py (2)

381-384: Discover should accept the passed config (duplicate of prior feedback).

Understood you plan to deprecate the config parameter, but until then, callers may expect discover(logger, config) to honor it. Do you want to keep this as-is and document the deprecation, or switch to config or self._config for now? Wdyt?


485-507: Check should degrade gracefully and avoid uncaught ValueError; also pass config and return cleaner messages.

  • Raising ValueError on missing check will bypass entrypoint.check(...)’s exception handling (it only catches AirbyteTracedException), potentially crashing the CLI.
  • Defaulting to CheckStream preserves legacy behavior.
  • create_component(...) should receive self._config to resolve interpolations in the check block.
  • Prefer str(error) over repr(error) for user-facing messages.

Proposed patch:

 def check(self, logger: logging.Logger, config: Mapping[str, Any]) -> AirbyteConnectionStatus:
-        check = self._source_config.get("check")
-        if not check:
-            raise ValueError(f"Missing 'check' component definition within the manifest.")
-
-        if "type" not in check:
-            check["type"] = "CheckStream"
-        connection_checker = self._constructor.create_component(
+        check = dict(self._source_config.get("check", {}))
+        if "type" not in check:
+            check["type"] = "CheckStream"
+        connection_checker = self._constructor.create_component(
             COMPONENTS_CHECKER_TYPE_MAPPING[check["type"]],
             check,
-            dict(),
+            self._config,
             emit_connector_builder_messages=self._emit_connector_builder_messages,
         )
         if not isinstance(connection_checker, ConnectionChecker):
             raise ValueError(
                 f"Expected to generate a ConnectionChecker component, but received {connection_checker.__class__}"
             )
 
-        check_succeeded, error = connection_checker.check_connection(self, logger, self._config)
-        if not check_succeeded:
-            return AirbyteConnectionStatus(status=Status.FAILED, message=repr(error))
+        check_succeeded, error = connection_checker.check_connection(self, logger, self._config)
+        if not check_succeeded:
+            return AirbyteConnectionStatus(status=Status.FAILED, message=str(error))
         return AirbyteConnectionStatus(status=Status.SUCCEEDED)
🧹 Nitpick comments (2)
airbyte_cdk/sources/declarative/concurrent_declarative_source.py (2)

214-221: Pass transformed config to the ConcurrencyLevel component?

self._config is available by this point (Line 212), but the factory here uses config or {}. Using the migrated/transformed config keeps behavior consistent with the rest of the class. Wdyt?

-                config=config or {},
+                config=self._config,

253-255: Docstring return type is wrong.

_pre_process_manifest returns the processed manifest, not None. Update to avoid misleading readers and tooling, wdyt?

-        Returns:
-            None
+        Returns:
+            Dict[str, Any]: The processed manifest with references resolved and types/parameters propagated.
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

💡 Knowledge Base configuration:

  • MCP integration is disabled by default for public repositories
  • Jira integration is disabled by default for public repositories
  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between e5bc81c and 723f940.

📒 Files selected for processing (2)
  • airbyte_cdk/sources/declarative/concurrent_declarative_source.py (9 hunks)
  • airbyte_cdk/sources/streams/concurrent/abstract_stream.py (0 hunks)
💤 Files with no reviewable changes (1)
  • airbyte_cdk/sources/streams/concurrent/abstract_stream.py
🧰 Additional context used
🧠 Learnings (4)
📓 Common learnings
Learnt from: ChristoGrab
PR: airbytehq/airbyte-python-cdk#58
File: airbyte_cdk/sources/declarative/yaml_declarative_source.py:0-0
Timestamp: 2024-11-18T23:40:06.391Z
Learning: When modifying the `YamlDeclarativeSource` class in `airbyte_cdk/sources/declarative/yaml_declarative_source.py`, avoid introducing breaking changes like altering method signatures within the scope of unrelated PRs. Such changes should be addressed separately to minimize impact on existing implementations.
📚 Learning: 2024-11-18T23:40:06.391Z
Learnt from: ChristoGrab
PR: airbytehq/airbyte-python-cdk#58
File: airbyte_cdk/sources/declarative/yaml_declarative_source.py:0-0
Timestamp: 2024-11-18T23:40:06.391Z
Learning: When modifying the `YamlDeclarativeSource` class in `airbyte_cdk/sources/declarative/yaml_declarative_source.py`, avoid introducing breaking changes like altering method signatures within the scope of unrelated PRs. Such changes should be addressed separately to minimize impact on existing implementations.

Applied to files:

  • airbyte_cdk/sources/declarative/concurrent_declarative_source.py
📚 Learning: 2025-01-14T00:20:32.310Z
Learnt from: aaronsteers
PR: airbytehq/airbyte-python-cdk#174
File: airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py:1093-1102
Timestamp: 2025-01-14T00:20:32.310Z
Learning: In the `airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py` file, the strict module name checks in `_get_class_from_fully_qualified_class_name` (requiring `module_name` to be "components" and `module_name_full` to be "source_declarative_manifest.components") are intentionally designed to provide early, clear feedback when class declarations won't be found later in execution. These restrictions may be loosened in the future if the requirements for class definition locations change.

Applied to files:

  • airbyte_cdk/sources/declarative/concurrent_declarative_source.py
📚 Learning: 2024-11-15T01:04:21.272Z
Learnt from: aaronsteers
PR: airbytehq/airbyte-python-cdk#58
File: airbyte_cdk/cli/source_declarative_manifest/_run.py:62-65
Timestamp: 2024-11-15T01:04:21.272Z
Learning: The files in `airbyte_cdk/cli/source_declarative_manifest/`, including `_run.py`, are imported from another repository, and changes to these files should be minimized or avoided when possible to maintain consistency.

Applied to files:

  • airbyte_cdk/sources/declarative/concurrent_declarative_source.py
🧬 Code graph analysis (1)
airbyte_cdk/sources/declarative/concurrent_declarative_source.py (4)
airbyte_cdk/sources/source.py (2)
  • Source (55-95)
  • discover (48-52)
airbyte_cdk/sources/abstract_source.py (5)
  • streams (74-79)
  • read (101-211)
  • discover (85-90)
  • check (92-99)
  • check_connection (59-71)
airbyte_cdk/entrypoint.py (3)
  • read (271-283)
  • discover (260-269)
  • check (219-258)
airbyte_cdk/sources/declarative/checks/connection_checker.py (2)
  • ConnectionChecker (12-38)
  • check_connection (18-38)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (6)
  • GitHub Check: Check: source-shopify
  • GitHub Check: Pytest (All, Python 3.10, Ubuntu)
  • GitHub Check: Pytest (All, Python 3.12, Ubuntu)
  • GitHub Check: Pytest (All, Python 3.13, Ubuntu)
  • GitHub Check: Pytest (All, Python 3.11, Ubuntu)
  • GitHub Check: Pytest (Fast)
🔇 Additional comments (1)
airbyte_cdk/sources/declarative/concurrent_declarative_source.py (1)

127-131: Good move: switch to Source interface is cleaner.

Directly implementing Source simplifies the flow and decouples from AbstractSource. Nice.

Copy link
Contributor

@maxi297 maxi297 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, let's do this!

@brianjlai
Copy link
Contributor Author

Regression test results run against source-bing-ads, source-gitlab, and source-instagram
airbytehq/airbyte#65612 (comment)

Results look good and based on the changes in the PR would catastrophically fail during a sync if there was an issue. There are a few small mismatches, but nothing beyond what was already being observed. I'll do a final local validation sanity check prior to the merge and release of the version.

@brianjlai brianjlai merged commit c40125a into main Sep 5, 2025
29 checks passed
@brianjlai brianjlai deleted the brian/streamline_cds_move_legacy_components branch September 5, 2025 18:38
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants