-
Notifications
You must be signed in to change notification settings - Fork 30
chore(low code): refactor model to component factory #673
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
chore(low code): refactor model to component factory #673
Conversation
👋 Greetings, Airbyte Team Member!Here are some helpful tips and reminders for your convenience. Testing This CDK VersionYou can test this version of the CDK using the following: # Run the CLI from this branch:
uvx 'git+https://github.com/airbytehq/airbyte-python-cdk.git@daryna/low-code/refactor-model-to-component-factory#egg=airbyte-python-cdk[dev]' --help
# Update a connector to use the CDK from this branch ref:
cd airbyte-integrations/connectors/source-example
poe use-cdk-branch daryna/low-code/refactor-model-to-component-factory Helpful ResourcesPR Slash CommandsAirbyte Maintainers can execute the following slash commands on your PR:
|
/autofix
|
📝 WalkthroughWalkthroughIntroduces a ComponentConstructor pattern with AdditionalFlags for DI, migrates default declarative components (RecordFilter, QueryProperties, SimpleRetriever) to that pattern, updates ModelToComponentFactory to build ComponentConstructor-based defaults, and adds a ClientSideIncrementalRecordFilterDecorator for cursor-based pre-filtering. Changes
Sequence Diagram(s)sequenceDiagram
participant User
participant ModelToComponentFactory
participant ComponentConstructor
participant SimpleRetriever
participant RecordFilter
participant QueryProperties
User->>ModelToComponentFactory: request component (model, config)
ModelToComponentFactory->>ComponentConstructor: build(model, config, dependency_constructor, additional_flags)
ComponentConstructor->>ComponentConstructor: resolve_dependencies(...)
ComponentConstructor->>SimpleRetriever: instantiate (or return LazySimpleRetriever)
ComponentConstructor->>RecordFilter: instantiate
ComponentConstructor->>QueryProperties: instantiate
ComponentConstructor-->>ModelToComponentFactory: constructed component
ModelToComponentFactory-->>User: return component
sequenceDiagram
participant Retriever
participant RecordFilterDecorator
participant Cursor
participant BaseRecordFilter
Retriever->>RecordFilterDecorator: filter_records(records, state, slice, token)
RecordFilterDecorator->>Cursor: should_be_synced(record, state)
Cursor-->>RecordFilterDecorator: keep/drop
alt condition present
RecordFilterDecorator->>BaseRecordFilter: filter_records(cursor_filtered_records, ...)
BaseRecordFilter-->>RecordFilterDecorator: filtered output
RecordFilterDecorator-->>Retriever: final records
else no condition
RecordFilterDecorator-->>Retriever: cursor_filtered_records
end
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Possibly related PRs
Suggested labels
Suggested reviewers
Want a short example showing how to call ModelToComponentFactory.build() for a Pre-merge checks (2 passed, 1 warning)❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
Tip 👮 Agentic pre-merge checks are now available in preview!Pro plan users can now enable pre-merge checks in their settings to enforce checklists before merging PRs.
Please see the documentation for more information. Example: reviews:
pre_merge_checks:
custom_checks:
- name: "Undocumented Breaking Changes"
mode: "warning"
instructions: |
Pass/fail criteria: All breaking changes to public APIs, CLI flags, environment variables, configuration keys, database schemas, or HTTP/GraphQL endpoints must be documented in the "Breaking Change" section of the PR description and in CHANGELOG.md. Exclude purely internal or private changes (e.g., code not exported from package entry points or explicitly marked as internal). Please share your feedback with us on this Discord post. ✨ Finishing touches
🧪 Generate unit tests
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
🧹 Nitpick comments (4)
airbyte_cdk/sources/declarative/parsers/component_constructor.py (2)
34-62
: Consider refining the return type annotation, wdyt?The build method logic is solid, but the return type annotation
ComponentConstructor[M]
might be too restrictive. Sincecls(**resolved_dependencies)
returns an instance of the actual subclass (not the base class), would usingSelf
or making it more generic be more accurate?+from typing import Self # Add to imports if using Python 3.11+ ... - ) -> "ComponentConstructor[M]": + ) -> Self:The orchestration logic itself looks great - first resolve dependencies, then build the component.
63-73
: Potential code duplication - is this intentional, wdyt?I notice this
_json_schema_type_name_to_type
method is identical to the one inmodel_to_component_factory.py
(lines 1023-1032). If this is part of moving the functionality to a more appropriate location, that's great! But if it's unintentional duplication, we might want to consolidate them.The implementation itself is solid - handles the None case gracefully and provides the correct type mappings.
airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py (2)
628-630
: Remove unused type variableD
The type variable
D
is defined but never used in the code. Consider removing it to keep the code clean, wdyt?M = TypeVar("M", bound=BaseModel) -D = TypeVar("D", bound=BaseModel)
840-852
: Fix variable type annotationThe
component_instance
variable holds the actual component instance returned bybuild()
, not aComponentConstructor
. Consider updating the type annotation for clarity, wdyt?if inspect.isclass(component) and issubclass(component, ComponentConstructor): # Default components flow - component_instance: ComponentConstructor[BaseModel] = component.build( + component_instance = component.build( model=model, config=config, dependency_constructor=self._create_component_from_model, additional_flags=self._flags, **kwargs, ) return component_instance
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
airbyte_cdk/sources/declarative/extractors/record_filter.py
(2 hunks)airbyte_cdk/sources/declarative/parsers/component_constructor.py
(1 hunks)airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py
(6 hunks)
🧰 Additional context used
🧠 Learnings (4)
📓 Common learnings
Learnt from: pnilan
PR: airbytehq/airbyte-python-cdk#0
File: :0-0
Timestamp: 2024-12-11T16:34:46.319Z
Learning: In the airbytehq/airbyte-python-cdk repository, the `declarative_component_schema.py` file is auto-generated from `declarative_component_schema.yaml` and should be ignored in the recommended reviewing order.
Learnt from: ChristoGrab
PR: airbytehq/airbyte-python-cdk#58
File: airbyte_cdk/sources/declarative/yaml_declarative_source.py:0-0
Timestamp: 2024-11-18T23:40:06.391Z
Learning: When modifying the `YamlDeclarativeSource` class in `airbyte_cdk/sources/declarative/yaml_declarative_source.py`, avoid introducing breaking changes like altering method signatures within the scope of unrelated PRs. Such changes should be addressed separately to minimize impact on existing implementations.
Learnt from: aaronsteers
PR: airbytehq/airbyte-python-cdk#174
File: airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py:1093-1102
Timestamp: 2025-01-14T00:20:32.310Z
Learning: In the `airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py` file, the strict module name checks in `_get_class_from_fully_qualified_class_name` (requiring `module_name` to be "components" and `module_name_full` to be "source_declarative_manifest.components") are intentionally designed to provide early, clear feedback when class declarations won't be found later in execution. These restrictions may be loosened in the future if the requirements for class definition locations change.
airbyte_cdk/sources/declarative/parsers/component_constructor.py (2)
Learnt from: aaronsteers
PR: #174
File: airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py:1093-1102
Timestamp: 2025-01-14T00:20:32.310Z
Learning: In the airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py
file, the strict module name checks in _get_class_from_fully_qualified_class_name
(requiring module_name
to be "components" and module_name_full
to be "source_declarative_manifest.components") are intentionally designed to provide early, clear feedback when class declarations won't be found later in execution. These restrictions may be loosened in the future if the requirements for class definition locations change.
Learnt from: pnilan
PR: airbytehq/airbyte-python-cdk#0
File: :0-0
Timestamp: 2024-12-11T16:34:46.319Z
Learning: In the airbytehq/airbyte-python-cdk repository, the declarative_component_schema.py
file is auto-generated from declarative_component_schema.yaml
and should be ignored in the recommended reviewing order.
airbyte_cdk/sources/declarative/extractors/record_filter.py (1)
Learnt from: ChristoGrab
PR: #58
File: airbyte_cdk/sources/declarative/yaml_declarative_source.py:0-0
Timestamp: 2024-11-18T23:40:06.391Z
Learning: When modifying the YamlDeclarativeSource
class in airbyte_cdk/sources/declarative/yaml_declarative_source.py
, avoid introducing breaking changes like altering method signatures within the scope of unrelated PRs. Such changes should be addressed separately to minimize impact on existing implementations.
airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py (5)
Learnt from: aaronsteers
PR: #174
File: airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py:1093-1102
Timestamp: 2025-01-14T00:20:32.310Z
Learning: In the airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py
file, the strict module name checks in _get_class_from_fully_qualified_class_name
(requiring module_name
to be "components" and module_name_full
to be "source_declarative_manifest.components") are intentionally designed to provide early, clear feedback when class declarations won't be found later in execution. These restrictions may be loosened in the future if the requirements for class definition locations change.
Learnt from: ChristoGrab
PR: #58
File: airbyte_cdk/sources/declarative/yaml_declarative_source.py:0-0
Timestamp: 2024-11-18T23:40:06.391Z
Learning: When modifying the YamlDeclarativeSource
class in airbyte_cdk/sources/declarative/yaml_declarative_source.py
, avoid introducing breaking changes like altering method signatures within the scope of unrelated PRs. Such changes should be addressed separately to minimize impact on existing implementations.
Learnt from: pnilan
PR: airbytehq/airbyte-python-cdk#0
File: :0-0
Timestamp: 2024-12-11T16:34:46.319Z
Learning: In the airbytehq/airbyte-python-cdk repository, the declarative_component_schema.py
file is auto-generated from declarative_component_schema.yaml
and should be ignored in the recommended reviewing order.
Learnt from: aaronsteers
PR: #58
File: airbyte_cdk/cli/source_declarative_manifest/_run.py:62-65
Timestamp: 2024-11-15T01:04:21.272Z
Learning: The files in airbyte_cdk/cli/source_declarative_manifest/
, including _run.py
, are imported from another repository, and changes to these files should be minimized or avoided when possible to maintain consistency.
Learnt from: aaronsteers
PR: #58
File: airbyte_cdk/cli/source_declarative_manifest/spec.json:9-15
Timestamp: 2024-11-15T00:59:08.154Z
Learning: When code in airbyte_cdk/cli/source_declarative_manifest/
is being imported from another repository, avoid suggesting modifications to it during the import process.
🧬 Code Graph Analysis (2)
airbyte_cdk/sources/declarative/parsers/component_constructor.py (3)
airbyte_cdk/sources/declarative/models/declarative_component_schema.py (1)
ValueType
(1407-1411)airbyte_cdk/sources/declarative/extractors/record_filter.py (1)
resolve_dependencies
(34-46)airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py (1)
_json_schema_type_name_to_type
(1024-1033)
airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py (3)
airbyte_cdk/sources/declarative/parsers/component_constructor.py (2)
ComponentConstructor
(17-73)build
(35-61)airbyte_cdk/sources/declarative/extractors/record_filter.py (1)
RecordFilter
(21-68)airbyte_cdk/sources/declarative/models/declarative_component_schema.py (17)
RecordFilter
(1266-1277)Config
(136-137)Config
(150-151)Config
(164-165)Config
(178-179)Config
(196-197)Config
(210-211)Config
(224-225)Config
(238-239)Config
(252-253)Config
(266-267)Config
(280-281)Config
(294-295)Config
(310-311)Config
(324-325)Config
(338-339)Config
(372-373)
🪛 GitHub Actions: Linters
airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py
[error] 679-4107: Mypy type checking failed with 96 errors including missing type parameters, unexpected keyword arguments, incompatible argument types, incompatible return types, and attribute errors. See mypy output for details.
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
- GitHub Check: Check: source-shopify
- GitHub Check: Pytest (Fast)
- GitHub Check: Pytest (All, Python 3.10, Ubuntu)
- GitHub Check: Pytest (All, Python 3.11, Ubuntu)
🔇 Additional comments (9)
airbyte_cdk/sources/declarative/parsers/component_constructor.py (2)
1-14
: LGTM! Clean imports and type setup.The imports are well-organized and the type variable
M
bound toBaseModel
provides good type safety for the generic component constructor.
16-32
: Solid foundation for the component constructor pattern.The generic dataclass design with the template method pattern is well-executed. The
resolve_dependencies
method provides a safe default while establishing a clear contract for subclasses to implement their specific dependency resolution logic.airbyte_cdk/sources/declarative/extractors/record_filter.py (3)
5-5
: Clean import additions for the new component constructor pattern.The new imports are well-organized and necessary for integrating with the ComponentConstructor framework. The alias pattern for
RecordFilterModel
is a nice touch to avoid potential naming conflicts.Also applies to: 13-16
21-21
: Excellent integration with the component constructor pattern.The inheritance change from a plain dataclass to
ComponentConstructor[RecordFilterModel]
cleanly integrates RecordFilter into the new declarative component construction framework while maintaining strong type safety.
33-46
: Perfect implementation of the dependency resolution pattern.The
resolve_dependencies
method cleanly extracts the necessary parameters from the model and configuration, with sensible defaults for optional fields. The returned mapping aligns perfectly with the RecordFilter's initialization requirements, and the method signature correctly follows the ComponentConstructor contract.airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py (4)
633-638
: Great documentation addition!The docstring clearly explains the factory's purpose and the separation between default and custom component implementations.
669-677
: Clean implementation of reusable configuration flagsThe
_flags
dictionary provides a nice way to pass commonly needed configuration to component constructors without repetition.
679-681
: Correct type annotation for dual constructor supportThe updated type annotation properly reflects support for both
ComponentConstructor
subclasses and legacy callable constructors, maintaining backward compatibility.
759-759
: Simplified mapping for RecordFilterGood change to directly use the
RecordFilter
class instead of a factory method, aligning with the newComponentConstructor
pattern.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
♻️ Duplicate comments (1)
airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py (1)
828-833
: Return type annotation issue already identified.This matches the previous review comment about the misleading return type annotation. The TODO comment indicates this is intentionally temporary during the transition period, which seems reasonable for now, wdyt?
🧹 Nitpick comments (1)
airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py (1)
828-858
: Excellent dual-pattern support logic with a small typo to fix!The logic perfectly handles both ComponentConstructor and legacy patterns. I love how it uses the recursive
dependency_constructor
for building nested components and passes theadditional_flags
for configuration sharing.One tiny fix needed - there's a typo on line 857: "Instance shource be" should be "Instance should be", wdyt?
- f"Instance shource be one of ComponentConstructor or method implemented in ModelToComponentFactory" + f"Instance should be one of ComponentConstructor or method implemented in ModelToComponentFactory"
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py
(6 hunks)
🧰 Additional context used
🧠 Learnings (2)
📓 Common learnings
Learnt from: pnilan
PR: airbytehq/airbyte-python-cdk#0
File: :0-0
Timestamp: 2024-12-11T16:34:46.319Z
Learning: In the airbytehq/airbyte-python-cdk repository, the `declarative_component_schema.py` file is auto-generated from `declarative_component_schema.yaml` and should be ignored in the recommended reviewing order.
Learnt from: ChristoGrab
PR: airbytehq/airbyte-python-cdk#58
File: airbyte_cdk/sources/declarative/yaml_declarative_source.py:0-0
Timestamp: 2024-11-18T23:40:06.391Z
Learning: When modifying the `YamlDeclarativeSource` class in `airbyte_cdk/sources/declarative/yaml_declarative_source.py`, avoid introducing breaking changes like altering method signatures within the scope of unrelated PRs. Such changes should be addressed separately to minimize impact on existing implementations.
Learnt from: aaronsteers
PR: airbytehq/airbyte-python-cdk#174
File: airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py:1093-1102
Timestamp: 2025-01-14T00:20:32.310Z
Learning: In the `airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py` file, the strict module name checks in `_get_class_from_fully_qualified_class_name` (requiring `module_name` to be "components" and `module_name_full` to be "source_declarative_manifest.components") are intentionally designed to provide early, clear feedback when class declarations won't be found later in execution. These restrictions may be loosened in the future if the requirements for class definition locations change.
airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py (5)
Learnt from: aaronsteers
PR: #174
File: airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py:1093-1102
Timestamp: 2025-01-14T00:20:32.310Z
Learning: In the airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py
file, the strict module name checks in _get_class_from_fully_qualified_class_name
(requiring module_name
to be "components" and module_name_full
to be "source_declarative_manifest.components") are intentionally designed to provide early, clear feedback when class declarations won't be found later in execution. These restrictions may be loosened in the future if the requirements for class definition locations change.
Learnt from: ChristoGrab
PR: #58
File: airbyte_cdk/sources/declarative/yaml_declarative_source.py:0-0
Timestamp: 2024-11-18T23:40:06.391Z
Learning: When modifying the YamlDeclarativeSource
class in airbyte_cdk/sources/declarative/yaml_declarative_source.py
, avoid introducing breaking changes like altering method signatures within the scope of unrelated PRs. Such changes should be addressed separately to minimize impact on existing implementations.
Learnt from: pnilan
PR: airbytehq/airbyte-python-cdk#0
File: :0-0
Timestamp: 2024-12-11T16:34:46.319Z
Learning: In the airbytehq/airbyte-python-cdk repository, the declarative_component_schema.py
file is auto-generated from declarative_component_schema.yaml
and should be ignored in the recommended reviewing order.
Learnt from: aaronsteers
PR: #58
File: airbyte_cdk/cli/source_declarative_manifest/_run.py:62-65
Timestamp: 2024-11-15T01:04:21.272Z
Learning: The files in airbyte_cdk/cli/source_declarative_manifest/
, including _run.py
, are imported from another repository, and changes to these files should be minimized or avoided when possible to maintain consistency.
Learnt from: aaronsteers
PR: #58
File: airbyte_cdk/cli/source_declarative_manifest/spec.json:9-15
Timestamp: 2024-11-15T00:59:08.154Z
Learning: When code in airbyte_cdk/cli/source_declarative_manifest/
is being imported from another repository, avoid suggesting modifications to it during the import process.
🧬 Code Graph Analysis (1)
airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py (3)
airbyte_cdk/sources/declarative/parsers/component_constructor.py (2)
ComponentConstructor
(17-73)build
(35-61)airbyte_cdk/sources/declarative/extractors/record_filter.py (1)
RecordFilter
(21-68)airbyte_cdk/sources/declarative/models/declarative_component_schema.py (17)
RecordFilter
(1266-1277)Config
(136-137)Config
(150-151)Config
(164-165)Config
(178-179)Config
(196-197)Config
(210-211)Config
(224-225)Config
(238-239)Config
(252-253)Config
(266-267)Config
(280-281)Config
(294-295)Config
(310-311)Config
(324-325)Config
(338-339)Config
(372-373)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (9)
- GitHub Check: Check: source-intercom
- GitHub Check: Check: destination-motherduck
- GitHub Check: Check: source-pokeapi
- GitHub Check: Check: source-hardcoded-records
- GitHub Check: Check: source-shopify
- GitHub Check: Pytest (All, Python 3.11, Ubuntu)
- GitHub Check: Pytest (All, Python 3.10, Ubuntu)
- GitHub Check: SDM Docker Image Build
- GitHub Check: Analyze (python)
🔇 Additional comments (6)
airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py (6)
21-21
: LGTM! Clean addition of generic type support.The TypeVar import and generic type variables M and D are properly defined and bound to BaseModel, which aligns perfectly with the ComponentConstructor pattern. The ComponentConstructor import is also correctly placed. Nice work on maintaining type safety throughout the refactor!
Also applies to: 628-630, 450-450
633-638
: Excellent documentation update!The updated docstring clearly explains the architectural goal of decoupling standard and custom component construction approaches. This helps future maintainers understand the design decisions, wdyt?
669-677
: Smart design pattern for configuration sharing!Creating a centralized
_flags
dictionary to hold reusable factory configuration is an elegant solution. This ensures consistent access to factory settings across all ComponentConstructor instances and reduces the need to pass individual parameters repeatedly. Well thought out!
679-681
: Perfect type annotation for the transition period!The Union type correctly captures that the mapping can contain either new ComponentConstructor classes or legacy callable methods. This provides excellent type safety during the gradual migration while being flexible enough to handle the mixed approach, wdyt?
759-759
: Great migration to the ComponentConstructor pattern!Changing the mapping from a factory method to the RecordFilter class directly is exactly what we want for the new ComponentConstructor pattern. Since RecordFilter now inherits from ComponentConstructor and implements
resolve_dependencies
, this change makes perfect sense and promotes consistency, wdyt?
632-858
: Outstanding refactor execution!This is a textbook example of how to introduce a significant architectural change. The approach:
✅ Maintains backward compatibility during transition
✅ Provides clear type safety with Union types
✅ Centralizes configuration through the_flags
pattern
✅ Uses proper dependency injection with recursivedependency_constructor
✅ Includes helpful documentation and TODO commentsThe gradual migration starting with RecordFilter is smart - it allows testing the pattern before rolling it out broadly. Really well done, wdyt?
/autofix
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
🧹 Nitpick comments (4)
airbyte_cdk/sources/declarative/requesters/query_properties/query_properties.py (1)
35-63
: Nice implementation of the resolve_dependencies pattern!The method cleanly handles both direct list and constructed property_list cases. One question though - I noticed that when constructing property_list via dependency_constructor (lines 47-49), the
additional_flags
parameter isn't passed along, unlike some other dependency constructor calls in the codebase. Is this intentional because property_list components don't need flags, or should we include them for consistency? wdyt?airbyte_cdk/sources/declarative/parsers/component_constructor.py (1)
18-43
: Consider using dataclass fields instead of manual initThe
AdditionalFlags
class is decorated with@dataclass
but defines a manual__init__
method. This defeats some benefits of dataclasses like automatic__repr__
,__eq__
, and field declarations. Would it make sense to use dataclass fields directly? wdyt?@dataclass class AdditionalFlags: - def __init__( - self, - emit_connector_builder_messages: bool, - disable_retries: bool, - message_repository: MessageRepository, - connector_state_manager: ConnectorStateManager, - limit_pages_fetched_per_slice: Optional[int], - limit_slices_fetched: Optional[int], - ): - self.emit_connector_builder_messages = emit_connector_builder_messages - self.disable_retries = disable_retries - self.message_repository = message_repository - self.connector_state_manager = connector_state_manager - self.limit_pages_fetched_per_slice = limit_pages_fetched_per_slice - self.limit_slices_fetched = limit_slices_fetched + emit_connector_builder_messages: bool + disable_retries: bool + message_repository: MessageRepository + connector_state_manager: ConnectorStateManager + limit_pages_fetched_per_slice: Optional[int] + limit_slices_fetched: Optional[int]airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py (2)
631-632
: Question about theD
TypeVarI notice that
D
is declared but doesn't seem to be used anywhere in this file. TheM
TypeVar makes sense for the generic component pattern, but could we removeD
if it's not needed here, wdyt? This would keep the code cleaner and avoid confusion.
832-862
: Solid implementation of the dual construction pattern!The logic flow correctly handles both ComponentConstructor subclasses and legacy callable constructors. The recursive dependency resolution is well implemented by passing
self._create_component_from_model
as thedependency_constructor
.One small suggestion: could we make the error message in the final ValueError a bit more specific about what types are expected, wdyt? Something like:
- raise ValueError( - f"Unexpected component mapping type for {model.__class__}. " - f"Instance should be one of ComponentConstructor or method implemented in ModelToComponentFactory" - ) + raise ValueError( + f"Unexpected component mapping type for {model.__class__}: {type(component)}. " + f"Expected ComponentConstructor subclass or callable method, got {component}" + )
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (5)
airbyte_cdk/sources/declarative/extractors/record_filter.py
(2 hunks)airbyte_cdk/sources/declarative/parsers/component_constructor.py
(1 hunks)airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py
(6 hunks)airbyte_cdk/sources/declarative/requesters/query_properties/query_properties.py
(3 hunks)airbyte_cdk/sources/declarative/retrievers/simple_retriever.py
(5 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- airbyte_cdk/sources/declarative/extractors/record_filter.py
🧰 Additional context used
🧬 Code Graph Analysis (1)
airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py (5)
airbyte_cdk/sources/declarative/parsers/component_constructor.py (3)
AdditionalFlags
(19-42)ComponentConstructor
(46-102)build
(64-90)airbyte_cdk/sources/declarative/requesters/query_properties/query_properties.py (1)
QueryProperties
(21-84)airbyte_cdk/sources/declarative/models/declarative_component_schema.py (19)
QueryProperties
(2770-2787)RecordFilter
(1266-1277)SimpleRetriever
(2806-2860)Config
(136-137)Config
(150-151)Config
(164-165)Config
(178-179)Config
(196-197)Config
(210-211)Config
(224-225)Config
(238-239)Config
(252-253)Config
(266-267)Config
(280-281)Config
(294-295)Config
(310-311)Config
(324-325)Config
(338-339)Config
(372-373)airbyte_cdk/sources/declarative/extractors/record_filter.py (1)
RecordFilter
(24-71)airbyte_cdk/sources/declarative/retrievers/simple_retriever.py (2)
SimpleRetriever
(93-962)build
(409-432)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (10)
- GitHub Check: Check: source-shopify
- GitHub Check: Check: source-hardcoded-records
- GitHub Check: Check: source-pokeapi
- GitHub Check: Check: source-intercom
- GitHub Check: Check: destination-motherduck
- GitHub Check: Pytest (Fast)
- GitHub Check: SDM Docker Image Build
- GitHub Check: Pytest (All, Python 3.10, Ubuntu)
- GitHub Check: Pytest (All, Python 3.11, Ubuntu)
- GitHub Check: Analyze (python)
🔇 Additional comments (10)
airbyte_cdk/sources/declarative/requesters/query_properties/query_properties.py (1)
4-12
: LGTM! Clean import organizationThe imports are well-organized, properly distinguishing between model schemas and construction utilities.
airbyte_cdk/sources/declarative/parsers/component_constructor.py (2)
45-91
: Clean template method pattern implementation!The
ComponentConstructor
base class provides a nice abstraction for declarative component construction. The separation betweenresolve_dependencies
andbuild
is well thought out. One minor observation - the return type annotation on line 71 showsComponentConstructor[M]
, but it actually returns an instance of the concrete subclass (e.g.,QueryProperties
,SimpleRetriever
). This is technically correct due to inheritance, but might be worth noting in the docstring for clarity? wdyt?
92-103
: Useful type conversion helperThe
_json_schema_type_name_to_type
method provides a clean mapping from JSON schema types to Python types. Good defensive programming with the None check.airbyte_cdk/sources/declarative/retrievers/simple_retriever.py (3)
93-93
: Good adoption of the ComponentConstructor patternThe SimpleRetriever now properly implements the ComponentConstructor interface, enabling standardized dependency resolution.
206-224
: Clean closure implementation for URL retrievalThe
_get_url
closure elegantly handles the deferred URL resolution needed for lazy retriever scenarios.
272-328
: Thorough query properties resolution with good error handlingThe method carefully handles all possible sources of query properties configuration and provides clear error messages for conflicting definitions. The backward compatibility consideration (line 304) is appreciated.
airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py (4)
450-453
: LGTM on the new imports!These imports are essential for the ComponentConstructor pattern being introduced. The imports are clean and properly organized within the existing import structure.
672-680
: Nice centralization of flags!The
_flags
initialization looks solid and follows the ComponentConstructor pattern well. All the necessary configuration flags are included and will be properly passed to component constructors for dependency resolution.
683-685
: Great refactor to the ComponentConstructor pattern!The type hint update to support both
ComponentConstructor
subclasses and callable factory methods is well thought out. The specific entries forQueryProperties
,RecordFilter
, andSimpleRetriever
correctly point to their respective classes that now inherit fromComponentConstructor
. This maintains backward compatibility while enabling the new pattern.Also applies to: 762-763, 770-770
636-640
: Excellent documentation of the architectural change!The updated class docstring clearly explains the new dual construction pattern and the reasoning behind decoupling standard and custom component construction. This will help other developers understand the architectural direction.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
♻️ Duplicate comments (2)
airbyte_cdk/sources/declarative/retrievers/simple_retriever.py (2)
135-135
: Past review concern still applies regarding class-level state modificationThis issue was previously identified - the class-level
_should_use_lazy_simple_retriever
attribute creates shared state problems across instances. The validation method modifies this class attribute, which could lead to race conditions. wdyt about making this an instance variable or passing it through the resolution process instead?
408-432
: Critical issue from past review still exists - LazySimpleRetriever created but not returnedThe build method has the same issue previously identified: when
_should_use_lazy_simple_retriever
is True, aLazySimpleRetriever
is created on line 431 but discarded, and a regularSimpleRetriever
is always returned instead. Shouldn't we return the lazy variant when the conditions are met? wdyt?if cls._should_use_lazy_simple_retriever: - LazySimpleRetriever(**resolved_dependencies) - return SimpleRetriever(**resolved_dependencies) + return LazySimpleRetriever(**resolved_dependencies) + return SimpleRetriever(**resolved_dependencies)
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
airbyte_cdk/sources/declarative/retrievers/simple_retriever.py
(5 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (10)
- GitHub Check: Check: source-intercom
- GitHub Check: Check: destination-motherduck
- GitHub Check: Check: source-pokeapi
- GitHub Check: Check: source-hardcoded-records
- GitHub Check: Check: source-shopify
- GitHub Check: Pytest (Fast)
- GitHub Check: SDM Docker Image Build
- GitHub Check: Pytest (All, Python 3.11, Ubuntu)
- GitHub Check: Pytest (All, Python 3.10, Ubuntu)
- GitHub Check: Analyze (python)
🔇 Additional comments (5)
airbyte_cdk/sources/declarative/retrievers/simple_retriever.py (5)
13-13
: LGTM on the imports and class declaration changes!The new imports and the updated class declaration to implement
ComponentConstructor[SimpleRetrieverModel]
properly support the component factory pattern refactor. The imports are well-organized and necessary for the new functionality.Also applies to: 22-22, 26-27, 31-31, 33-67, 78-78, 80-80, 82-82, 93-93
178-206
: Method signature and structure look well-designedThe
resolve_dependencies
method signature properly follows the ComponentConstructor pattern with comprehensive parameter support. The early validation for the requiredname
parameter is good defensive programming. The method structure with helper functions promotes readability. Nice work!
206-256
: Helper functions are well-craftedThe helper functions within
resolve_dependencies
are nicely designed:
- The
_get_url()
closure provides lazy URL resolutionquery_properties_in_request_parameters()
handles detection logic cleanly_get_log_formatter()
appropriately respects the slice limiting flagsThe single-responsibility principle is well-applied here. Great approach! wdyt?
272-328
: Query properties resolution logic is robustThe query properties resolution handles the complexity of multiple potential sources very well. The conflict detection between
fetch_properties_from_endpoint
andrequest_parameters
with clear error messages is excellent defensive programming. The deprecation note for the legacy approach is helpful for maintainers. Solid implementation! wdyt?
329-406
: Component construction and dependency resolution is comprehensiveThe component construction logic handles the various edge cases well:
- Cursor assignment from stream_slicer is appropriate
- The DatetimeBasedCursor special handling for request_options_provider makes sense
- Conditional application of StreamSlicerTestReadDecorator respects the flags properly
- The lazy URL evaluation in paginator construction is clever
- Early return for lazy retriever case maintains the optimization
The final dependency dictionary construction is thorough and complete. Great work on this complex orchestration! wdyt?
/autofix
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
♻️ Duplicate comments (1)
airbyte_cdk/sources/declarative/retrievers/simple_retriever.py (1)
422-429
: Duplicate comment from previous review - lazy retriever creation issue was addressed.This issue where LazySimpleRetriever was created but not returned has already been resolved in the current implementation. The build method now correctly returns the LazySimpleRetriever when conditions are met.
🧹 Nitpick comments (1)
airbyte_cdk/sources/declarative/retrievers/simple_retriever.py (1)
176-253
: The closure pattern for URL retrieval is clever - wdyt about adding documentation?The
_get_url()
closure and helper functions are well-designed for lazy evaluation scenarios. The logic handles both direct URL access and fallback to requester methods appropriately. Would you consider adding a bit more documentation to the_get_log_formatter
function to clarify when the log formatter is applied vs when it returns None?def _get_log_formatter( log_formatter: Callable[[Response], Any] | None, name: str, additional_flags: AdditionalFlags, ) -> Callable[[Response], Any] | None: + """ + Returns a log formatter for test slice limiting scenarios. + Only applies formatting when slices are being limited for testing. + """ if additional_flags.should_limit_slices_fetched:
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py
(6 hunks)airbyte_cdk/sources/declarative/retrievers/simple_retriever.py
(5 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (1)
airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py (5)
airbyte_cdk/sources/declarative/parsers/component_constructor.py (3)
AdditionalFlags
(19-42)ComponentConstructor
(46-102)build
(64-90)airbyte_cdk/sources/declarative/requesters/query_properties/query_properties.py (1)
QueryProperties
(21-84)airbyte_cdk/sources/declarative/models/declarative_component_schema.py (19)
QueryProperties
(2770-2787)RecordFilter
(1266-1277)SimpleRetriever
(2806-2860)Config
(136-137)Config
(150-151)Config
(164-165)Config
(178-179)Config
(196-197)Config
(210-211)Config
(224-225)Config
(238-239)Config
(252-253)Config
(266-267)Config
(280-281)Config
(294-295)Config
(310-311)Config
(324-325)Config
(338-339)Config
(372-373)airbyte_cdk/sources/declarative/extractors/record_filter.py (1)
RecordFilter
(24-71)airbyte_cdk/sources/declarative/retrievers/simple_retriever.py (2)
SimpleRetriever
(93-959)build
(401-429)
🪛 GitHub Actions: Linters
airbyte_cdk/sources/declarative/retrievers/simple_retriever.py
[error] 423-423: mypy: Argument "name" to "_should_use_lazy_simple_retriever" of "SimpleRetriever" has incompatible type "Any | None"; expected "str" [arg-type]
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (10)
- GitHub Check: Check: source-shopify
- GitHub Check: Check: destination-motherduck
- GitHub Check: Check: source-intercom
- GitHub Check: Check: source-hardcoded-records
- GitHub Check: Check: source-pokeapi
- GitHub Check: SDM Docker Image Build
- GitHub Check: Pytest (All, Python 3.11, Ubuntu)
- GitHub Check: Pytest (All, Python 3.10, Ubuntu)
- GitHub Check: Pytest (Fast)
- GitHub Check: Analyze (python)
🔇 Additional comments (10)
airbyte_cdk/sources/declarative/retrievers/simple_retriever.py (5)
13-87
: Import additions look well-structured for the new component factory pattern.The new imports support the refactoring to use ComponentConstructor and related models. The imports are logically organized and all appear necessary for the new functionality. Good work on keeping the imports clean!
93-93
: Clean inheritance from ComponentConstructor - nice architectural improvement!The class now properly inherits from
ComponentConstructor[SimpleRetrieverModel]
, which aligns perfectly with the component factory pattern described in the PR objectives. This sets up the foundation for standardized dependency resolution across declarative components.
136-174
: The lazy retriever validation logic is comprehensive and well-structured.The conditions for applying the lazy retriever are thorough - checking partition router type, stream state, lazy read pointers, incremental sync constraints, and decoder type. The error messages are clear and informative. This addresses the previous concerns about class-level state modification by returning a boolean instead of modifying class state.
255-331
: Excellent dependency resolution for decoder, record selector, and query properties handling.The logic for handling query properties from multiple locations (request_parameters, fetch_properties_from_endpoint, query_properties) is robust with good error handling for conflicting definitions. The fallback to JsonDecoder when no decoder is specified is sensible. The validation preventing multiple QueryProperties definitions is a great safeguard.
332-399
: Smart handling of cursor and request options provider relationships.The logic correctly identifies when to use the stream_slicer as the request_options_provider for custom DatetimeBasedCursor implementations, while defaulting appropriately for standard cases. The slice limiting logic for tests is well-integrated. The early return for lazy retrievers with base dependencies is a clean optimization.
airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py (5)
21-21
: LGTM! New imports and type variables are well-structured.The addition of
TypeVar
import and the newComponentConstructor
/AdditionalFlags
imports align perfectly with the refactoring goals. The type variablesM
andD
are appropriately bound toBaseModel
and follow good naming conventions, wdyt?Also applies to: 450-453, 631-632
636-641
: Nice documentation update!The enhanced docstring clearly explains the separation between default and custom component construction patterns. This helps developers understand the architectural changes, wdyt?
672-680
: Excellent abstraction with AdditionalFlags!Creating a reusable container for common configuration flags is a clean design pattern. The selected flags (emit_connector_builder_messages, disable_retries, etc.) are exactly what components would need during construction. This should make dependency injection much cleaner, wdyt?
683-685
: Perfect transition to the dual constructor pattern!The updated type annotation correctly captures that the mapping now supports both
ComponentConstructor
subclasses and legacy factory methods. The specific changes forQueryPropertiesModel
,RecordFilterModel
, andSimpleRetrieverModel
show a clean migration to the new pattern - mapping directly to classes instead of factory methods. This maintains backward compatibility while enabling the new approach, wdyt?Also applies to: 762-763, 770-770
832-862
: Excellent implementation of the dual construction pattern!This refactored method elegantly handles both the new
ComponentConstructor
pattern and legacy factory methods. The logic flow is very clean:
- Type inspection to detect
ComponentConstructor
subclasses- Calling
.build()
with proper dependency injection for new components- Fallback to legacy factory methods
- Comprehensive error handling
The recursive dependency construction via
self._create_component_from_model
is particularly well-designed. The past review comment about the return type annotation was correctly addressed, wdyt?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (1)
airbyte_cdk/sources/declarative/retrievers/simple_retriever.py (1)
178-400
: Comprehensive dependency resolution, but wondering about decomposition?This method handles the dependency orchestration really well - I can see how it resolves all the complex interdependencies between components. The helper functions like
_get_url
andquery_properties_in_request_parameters
help organize the logic nicely.The query properties resolution logic is particularly thorough, handling conflicts between different definition sources with clear error messages. The early return pattern for lazy retrievers is also clean.
That said, at ~220 lines, this method is doing quite a lot. Would you consider breaking it down into smaller, more focused methods? Something like
_resolve_query_properties
,_resolve_core_components
,_resolve_pagination_components
? It might make testing and maintenance easier. wdyt?
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
airbyte_cdk/sources/declarative/retrievers/simple_retriever.py
(5 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (10)
- GitHub Check: Check: destination-motherduck
- GitHub Check: Check: source-pokeapi
- GitHub Check: Check: source-intercom
- GitHub Check: Check: source-hardcoded-records
- GitHub Check: Check: source-shopify
- GitHub Check: Pytest (All, Python 3.10, Ubuntu)
- GitHub Check: Pytest (Fast)
- GitHub Check: Pytest (All, Python 3.11, Ubuntu)
- GitHub Check: SDM Docker Image Build
- GitHub Check: Analyze (python)
🔇 Additional comments (3)
airbyte_cdk/sources/declarative/retrievers/simple_retriever.py (3)
13-93
: LGTM! Clean imports and class signature update.The import additions support the new component factory pattern nicely, and the class signature change to inherit from
ComponentConstructor[SimpleRetrieverModel]
aligns perfectly with the refactor objectives. The imports are well-organized and necessary for the new functionality.
136-177
: Nice validation logic for lazy retriever conditions!The method does a great job ensuring LazySimpleRetriever is only used when all constraints are met. The validation for incremental sync type, decoder type, and parent stream configurations looks comprehensive. I appreciate how it raises clear error messages when constraints are violated - this will help developers understand the limitations quickly. wdyt?
402-432
: Clean build method implementation!The build method follows the ComponentConstructor pattern perfectly - delegating to
resolve_dependencies
and then making the right decision about whether to return a LazySimpleRetriever or regular SimpleRetriever. The flow is clear and the implementation looks solid. wdyt?
/autofix
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This change makes sense to me. Thanks for cleaning stuff up! Just a couple of questions/nits but I'll approve anyway
**kwargs, | ||
) | ||
return component_instance | ||
if inspect.ismethod(component) or inspect.isfunction(component): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: elif here to clarify that both if clauses are mutually exclusive
**kwargs: Any, | ||
) -> Mapping[str, Any]: | ||
return { | ||
"condition": model.condition or "", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm trying to better understand the pattern: In an ideal world where this isn't a dataclass that exposes everything as public, would the logic of having the condition as InterpolatedBoolean
be done in resolve_dependencies
or would we still have this in the init?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think for now we will still have this in __init__
just to have less thing to refactor and avoid missing something. But it's a good idea and I think we consider changing the place of interpolation here or in __init__
return cls(**resolved_dependencies) | ||
|
||
@staticmethod | ||
def _json_schema_type_name_to_type(value_type: Optional[ValueType]) -> Optional[Type[Any]]: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Where is the usage of this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is used for AddedFieldDefinition
, probably it is better to add this method only for the AddedFieldDefinition
class.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh ok! So in the PoC PR (or whatever we call this PR), we are using it but in this PR specifically, it is not used yet, is that it? I think I'm fine with having this here then. Thanks for the added context
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (1)
airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py (1)
831-861
: Excellent dynamic dispatch implementation!The _create_component_from_model refactor handles both patterns elegantly:
- ComponentConstructor subclasses use the new build() method with dependency injection
- Legacy callables maintain backward compatibility
The type checking with
inspect.isclass
andissubclass
is robust, and the error messages are clear. One small suggestion - would it make sense to useelif
on line 856 instead ofif
to clarify the mutual exclusivity, as mentioned in the past review comments?- if inspect.ismethod(component) or inspect.isfunction(component): + elif inspect.ismethod(component) or inspect.isfunction(component):This would make the flow more explicit, wdyt?
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (4)
airbyte_cdk/sources/declarative/extractors/record_filter.py
(2 hunks)airbyte_cdk/sources/declarative/parsers/component_constructor.py
(1 hunks)airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py
(6 hunks)airbyte_cdk/sources/declarative/retrievers/simple_retriever.py
(5 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- airbyte_cdk/sources/declarative/parsers/component_constructor.py
🧰 Additional context used
🧠 Learnings (2)
📚 Learning: 2025-01-14T00:20:32.310Z
Learnt from: aaronsteers
PR: airbytehq/airbyte-python-cdk#174
File: airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py:1093-1102
Timestamp: 2025-01-14T00:20:32.310Z
Learning: In the `airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py` file, the strict module name checks in `_get_class_from_fully_qualified_class_name` (requiring `module_name` to be "components" and `module_name_full` to be "source_declarative_manifest.components") are intentionally designed to provide early, clear feedback when class declarations won't be found later in execution. These restrictions may be loosened in the future if the requirements for class definition locations change.
Applied to files:
airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py
📚 Learning: 2024-12-11T16:34:46.319Z
Learnt from: pnilan
PR: airbytehq/airbyte-python-cdk#0
File: :0-0
Timestamp: 2024-12-11T16:34:46.319Z
Learning: In the airbytehq/airbyte-python-cdk repository, the `declarative_component_schema.py` file is auto-generated from `declarative_component_schema.yaml` and should be ignored in the recommended reviewing order.
Applied to files:
airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py
🧬 Code Graph Analysis (3)
airbyte_cdk/sources/declarative/extractors/record_filter.py (3)
airbyte_cdk/sources/declarative/interpolation/interpolated_boolean.py (1)
InterpolatedBoolean
(29-66)airbyte_cdk/sources/declarative/models/declarative_component_schema.py (17)
RecordFilter
(1266-1277)Config
(136-137)Config
(150-151)Config
(164-165)Config
(178-179)Config
(196-197)Config
(210-211)Config
(224-225)Config
(238-239)Config
(252-253)Config
(266-267)Config
(280-281)Config
(294-295)Config
(310-311)Config
(324-325)Config
(338-339)Config
(372-373)airbyte_cdk/sources/declarative/parsers/component_constructor.py (3)
AdditionalFlags
(19-42)ComponentConstructor
(46-90)resolve_dependencies
(48-61)
airbyte_cdk/sources/declarative/retrievers/simple_retriever.py (16)
airbyte_cdk/sources/declarative/models/declarative_component_schema.py (23)
JsonDecoder
(864-865)DatetimeBasedCursor
(1801-1909)HttpRequester
(2542-2681)IncrementingCountCursor
(1779-1798)QueryProperties
(2772-2789)SimpleRetriever
(2808-2862)SubstreamPartitionRouter
(2971-2978)Config
(136-137)Config
(150-151)Config
(164-165)Config
(178-179)Config
(196-197)Config
(210-211)Config
(224-225)Config
(238-239)Config
(252-253)Config
(266-267)Config
(280-281)Config
(294-295)Config
(310-311)Config
(324-325)Config
(338-339)Config
(372-373)airbyte_cdk/sources/declarative/decoders/json_decoder.py (1)
JsonDecoder
(19-47)airbyte_cdk/sources/declarative/incremental/datetime_based_cursor.py (2)
DatetimeBasedCursor
(28-448)get_stream_state
(130-131)airbyte_cdk/sources/declarative/incremental/resumable_full_refresh_cursor.py (2)
ResumableFullRefreshCursor
(12-93)get_stream_state
(18-19)airbyte_cdk/sources/declarative/requesters/http_requester.py (1)
HttpRequester
(37-484)airbyte_cdk/sources/declarative/partition_routers/substream_partition_router.py (2)
SubstreamPartitionRouter
(80-422)get_stream_state
(397-418)airbyte_cdk/sources/declarative/parsers/component_constructor.py (5)
AdditionalFlags
(19-42)ComponentConstructor
(46-90)resolve_dependencies
(48-61)should_limit_slices_fetched
(37-42)build
(64-90)airbyte_cdk/sources/declarative/retrievers/file_uploader/default_file_uploader.py (1)
DefaultFileUploader
(32-97)airbyte_cdk/sources/declarative/retrievers/retriever.py (1)
Retriever
(13-54)airbyte_cdk/sources/declarative/stream_slicers/stream_slicer_test_read_decorator.py (1)
StreamSlicerTestReadDecorator
(14-28)airbyte_cdk/sources/declarative/stream_slicers/stream_slicer.py (1)
StreamSlicer
(15-25)airbyte_cdk/sources/http_logger.py (1)
format_http_message
(12-51)airbyte_cdk/sources/declarative/manifest_declarative_source.py (1)
streams
(300-332)airbyte_cdk/sources/declarative/partition_routers/single_partition_router.py (2)
get_stream_state
(59-63)SinglePartitionRouter
(13-63)airbyte_cdk/sources/declarative/requesters/request_options/request_options_provider.py (1)
RequestOptionsProvider
(13-79)airbyte_cdk/sources/declarative/requesters/request_options/default_request_options_provider.py (1)
DefaultRequestOptionsProvider
(15-60)
airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py (4)
airbyte_cdk/sources/declarative/parsers/component_constructor.py (3)
AdditionalFlags
(19-42)ComponentConstructor
(46-90)build
(64-90)airbyte_cdk/sources/declarative/models/declarative_component_schema.py (19)
QueryProperties
(2772-2789)RecordFilter
(1266-1277)SimpleRetriever
(2808-2862)Config
(136-137)Config
(150-151)Config
(164-165)Config
(178-179)Config
(196-197)Config
(210-211)Config
(224-225)Config
(238-239)Config
(252-253)Config
(266-267)Config
(280-281)Config
(294-295)Config
(310-311)Config
(324-325)Config
(338-339)Config
(372-373)airbyte_cdk/sources/declarative/requesters/query_properties/query_properties.py (1)
QueryProperties
(21-84)airbyte_cdk/sources/declarative/retrievers/simple_retriever.py (2)
SimpleRetriever
(94-939)build
(403-431)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (10)
- GitHub Check: Check: source-pokeapi
- GitHub Check: Check: destination-motherduck
- GitHub Check: Check: source-intercom
- GitHub Check: Check: source-shopify
- GitHub Check: Check: source-hardcoded-records
- GitHub Check: Pytest (Fast)
- GitHub Check: SDM Docker Image Build
- GitHub Check: Pytest (All, Python 3.10, Ubuntu)
- GitHub Check: Pytest (All, Python 3.11, Ubuntu)
- GitHub Check: Analyze (python)
🔇 Additional comments (12)
airbyte_cdk/sources/declarative/extractors/record_filter.py (4)
8-14
: LGTM! Imports align well with the refactoring pattern.The imports properly support the component constructor pattern with AdditionalFlags and ComponentConstructor, and the RecordFilterModel alias creates clean separation between the model and implementation.
20-20
: Component inheritance looks good!The change from dataclass to ComponentConstructor[RecordFilterModel] inheritance properly implements the new factory pattern while maintaining the existing functionality.
32-45
: Solid dependency resolution implementation!The resolve_dependencies method cleanly extracts the required dependencies from the model and provides sensible defaults. The mapping structure is clear and follows the expected pattern.
70-109
: Great addition of the decorator pattern!The ClientSideIncrementalRecordFilterDecorator is well-implemented with proper cursor integration. The conditional chaining logic in filter_records (lines 102-108) nicely handles the case where both cursor filtering and condition filtering are needed. The cursor pre-filtering followed by optional condition filtering makes sense, wdyt?
airbyte_cdk/sources/declarative/retrievers/simple_retriever.py (4)
94-94
: Perfect inheritance update!The multiple inheritance from both Retriever and ComponentConstructor[SimpleRetrieverModel] properly supports the new factory pattern while maintaining backward compatibility.
137-177
: Comprehensive lazy retriever validation logic!The _should_use_lazy_simple_retriever method has thorough validation with clear error messages. The conditions for lazy retriever usage are well-defined:
- SubstreamPartitionRouter with lazy_read_pointer
- No existing stream state
- DatetimeBasedCursor only
- JsonDecoder only
- Single slice constraint
The validation provides helpful error messages for each constraint violation, which will make debugging easier for users, wdyt?
179-400
: Excellent dependency resolution orchestration!The resolve_dependencies method is comprehensive and well-structured. I particularly like:
- The closure pattern for _get_url() to handle lazy URL resolution
- The query properties conflict detection and resolution
- The conditional logic for request_options_provider based on cursor types
- The clean separation between lazy and non-lazy dependency sets
The method handles all the complex interdependencies between decoder, requester, query properties, and cursor components very elegantly.
402-431
: Clean build method implementation!The build method properly orchestrates the dependency resolution and correctly returns either LazySimpleRetriever or SimpleRetriever based on the lazy conditions. The flow is clear and follows the expected ComponentConstructor pattern.
airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py (4)
630-639
: Nice documentation and type variable additions!The class docstring clearly explains the separation between default and custom component construction approaches. The type variables M and D provide better type safety for the generic patterns used throughout the factory.
671-679
: Smart addition of AdditionalFlags integration!The _flags attribute centralizes all the reusable configuration flags needed for dependency injection with default components. This creates a clean abstraction for passing runtime configuration down to component constructors.
682-684
: Type annotation improvement looks good!The updated mapping type
Dict[Type[BaseModel], Union[Type[ComponentConstructor[Any]], Callable[..., Any]]]
properly reflects that mappings can now point to either component classes or legacy callables.
761-762
: Clean migration to class-based mappings!The direct mapping to QueryProperties, RecordFilter, and SimpleRetriever classes (instead of factory methods) properly implements the new ComponentConstructor pattern. This creates cleaner separation and enables the dependency injection flow.
Also applies to: 769-769
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py (1)
1914-1933
: Ensure parameter inference works for ComponentConstructor classesWhen the mapping value is a class that implements
ComponentConstructor
, usinggetfullargspec
on the class can miss keyword-only args accepted bybuild(...)
. Shall we detect that case and introspectbuild
instead so$parameters
and parent kwargs are forwarded correctly, wdyt?Apply:
- model_constructor = self.PYDANTIC_MODEL_TO_CONSTRUCTOR.get(parsed_model.__class__) - constructor_kwargs = inspect.getfullargspec(model_constructor).kwonlyargs + model_constructor = self.PYDANTIC_MODEL_TO_CONSTRUCTOR.get(parsed_model.__class__) + if inspect.isclass(model_constructor) and issubclass(model_constructor, ComponentConstructor): + sig = inspect.signature(model_constructor.build) + constructor_kwargs = [ + name for name, p in sig.parameters.items() if p.kind == inspect.Parameter.KEYWORD_ONLY + ] + else: + constructor_kwargs = inspect.getfullargspec(model_constructor).kwonlyargsairbyte_cdk/sources/declarative/retrievers/simple_retriever.py (1)
974-975
: Fix infinite recursion in LazySimpleRetriever fallback path
self._read_pages(...)
calls itself in theelse
branch, leading to infinite recursion whenchild_response
is falsy. Should this delegate to the parent implementation viasuper()
, wdyt?Apply:
- else: - yield from self._read_pages(records_generator_fn, stream_state, stream_slice) + else: + yield from super()._read_pages(records_generator_fn, stream_state, stream_slice)
♻️ Duplicate comments (2)
airbyte_cdk/sources/declarative/retrievers/simple_retriever.py (2)
395-402
: Guard and pass a stringname
to satisfy mypy and runtime intentLet’s validate
name
before use and pass a guaranteedstr
to_should_use_lazy_simple_retriever
, wdyt?Apply:
- if cls._should_use_lazy_simple_retriever( + name = kwargs.get("name") + if not isinstance(name, str): + raise ValueError("name must be a string") + if cls._should_use_lazy_simple_retriever( model=model, additional_flags=additional_flags, incremental_sync=kwargs.get("incremental_sync"), - name=kwargs.get("name"), + name=name, ): return LazySimpleRetriever(**resolved_dependencies)
133-141
: Type mismatch causes mypy failure (name must be str)
name
is Optional here, but it’s used to callget_stream_state(name, ...)
which expectsstr
. This is the mypy error flagged by CI. Can we makename
a requiredstr
, wdyt?Apply:
- name: Optional[str] = None, + name: str,
🧹 Nitpick comments (5)
airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py (2)
652-654
: Drop unused TypeVar D
D
is defined but unused. Remove to avoid dead code, wdyt?Apply:
-M = TypeVar("M", bound=BaseModel) -D = TypeVar("D", bound=BaseModel) +M = TypeVar("M", bound=BaseModel)
2946-2953
: Make wildcard check robust to InterpolatedString
lazy_read_pointer
elements can beInterpolatedString
. Converting tostr
avoids a possible type issue. Shall we cast tostr
, wdyt?Apply:
- if model.lazy_read_pointer and any("*" in pointer for pointer in model.lazy_read_pointer): + if model.lazy_read_pointer and any("*" in str(pointer) for pointer in model.lazy_read_pointer):airbyte_cdk/sources/declarative/retrievers/simple_retriever.py (3)
195-206
: Nit: grammar in error message“more that one slice per parent” → “more than one slice per parent”. Quick copyedit, wdyt?
Apply:
- f"Found more that one slice per parent. LazySimpleRetriever only supports single slice read for stream - {name}." + f"Found more than one slice per parent. LazySimpleRetriever only supports single slice read for stream - {name}."
183-184
: Correct type for transformations
transformations
is a list everywhere it’s passed/used. Shall we update the annotation to reflect that, wdyt?Apply:
- transformations: Optional[RecordTransformation] = None, + transformations: Optional[List[RecordTransformation]] = None,
227-246
: Optional: preserve custom log_formatter when not limiting slicesCurrently the helper returns
None
unlessshould_limit_slices_fetched
is True, discarding a user-providedlog_formatter
. Do we want to return the provided formatter in the else branch to honor explicit configuration, wdyt?Apply:
- return None + return log_formatter
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py
(6 hunks)airbyte_cdk/sources/declarative/retrievers/simple_retriever.py
(4 hunks)
🧰 Additional context used
🧬 Code graph analysis (2)
airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py (4)
airbyte_cdk/sources/declarative/parsers/component_constructor.py (3)
AdditionalFlags
(19-42)ComponentConstructor
(46-90)build
(64-90)airbyte_cdk/sources/declarative/requesters/query_properties/query_properties.py (1)
QueryProperties
(21-84)airbyte_cdk/sources/declarative/extractors/record_filter.py (1)
RecordFilter
(20-67)airbyte_cdk/sources/declarative/retrievers/simple_retriever.py (2)
SimpleRetriever
(89-916)build
(374-402)
airbyte_cdk/sources/declarative/retrievers/simple_retriever.py (7)
airbyte_cdk/sources/declarative/models/declarative_component_schema.py (23)
JsonDecoder
(846-847)CustomRequester
(219-230)HttpRequester
(2520-2659)QueryProperties
(2750-2767)SimpleRetriever
(2786-2840)SubstreamPartitionRouter
(2949-2956)Config
(136-137)Config
(150-151)Config
(164-165)Config
(178-179)Config
(192-193)Config
(206-207)Config
(220-221)Config
(234-235)Config
(248-249)Config
(262-263)Config
(276-277)Config
(292-293)Config
(306-307)Config
(320-321)Config
(354-355)Config
(378-379)NoPagination
(1004-1005)airbyte_cdk/sources/declarative/decoders/json_decoder.py (1)
JsonDecoder
(19-47)airbyte_cdk/sources/declarative/requesters/http_requester.py (4)
HttpRequester
(37-484)_get_url
(145-179)get_url
(130-143)get_url_base
(181-193)airbyte_cdk/sources/declarative/requesters/query_properties/query_properties.py (2)
QueryProperties
(21-84)resolve_dependencies
(36-62)airbyte_cdk/sources/declarative/parsers/component_constructor.py (5)
AdditionalFlags
(19-42)ComponentConstructor
(46-90)resolve_dependencies
(48-61)should_limit_slices_fetched
(37-42)build
(64-90)airbyte_cdk/sources/declarative/partition_routers/single_partition_router.py (2)
SinglePartitionRouter
(13-57)get_stream_state
(53-57)airbyte_cdk/sources/declarative/requesters/request_options/default_request_options_provider.py (1)
DefaultRequestOptionsProvider
(15-60)
🪛 GitHub Actions: Linters
airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py
[error] 5-5: Ruff: Import block is unsorted or unformatted. (I001).
airbyte_cdk/sources/declarative/retrievers/simple_retriever.py
[error] 145-145: Mypy type-check failed. Command: poetry run mypy --config-file mypy.ini airbyte_cdk. Argument 1 to 'get_stream_state' of 'ConnectorStateManager' has incompatible type 'str | None'; expected 'str' [arg-type].
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (12)
- GitHub Check: Check: source-intercom
- GitHub Check: Check: destination-motherduck
- GitHub Check: Check: source-pokeapi
- GitHub Check: Check: source-hardcoded-records
- GitHub Check: Check: source-shopify
- GitHub Check: Pytest (All, Python 3.13, Ubuntu)
- GitHub Check: Pytest (All, Python 3.10, Ubuntu)
- GitHub Check: Pytest (All, Python 3.11, Ubuntu)
- GitHub Check: Pytest (All, Python 3.12, Ubuntu)
- GitHub Check: Manifest Server Docker Image Build
- GitHub Check: SDM Docker Image Build
- GitHub Check: Pytest (Fast)
🔇 Additional comments (3)
airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py (2)
780-789
: Nice switch to DI-backed constructorsMapping
QueryPropertiesModel
,RecordFilterModel
, andSimpleRetrieverModel
directly to theirComponentConstructor
classes looks good and aligns with the new DI pattern. 👍
635-638
: Fix Ruff import-order violation (I001)Poetry wasn't available in the sandbox ("poetry: command not found"), so I couldn't run ruff here. Could you run one of these locally to auto-fix (or move the import into the existing airbyte_cdk.sources.declarative.* import block), wdyt?
- poetry install && poetry run ruff check --fix airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py
- ruff check --fix airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py
File: airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py (lines 635–638).
airbyte_cdk/sources/declarative/retrievers/simple_retriever.py (1)
145-145
: Re-run mypy/ruff after Optional typing fix in get_stream_statemypy failure is at: and not bool(additional_flags.connector_state_manager.get_stream_state(name, None)) — Optional[str] may be passed to get_stream_state. Local verification failed with "/bin/bash: line 3: poetry: command not found". Can you re-run the provided checks in an environment with poetry (or run
python -m mypy --config-file mypy.ini airbyte_cdk/sources/declarative/retrievers/simple_retriever.py
andruff check airbyte_cdk/sources/declarative/retrievers/simple_retriever.py
) to confirm green? wdyt?
based on airbytehq/airbyte#43322
Summary by CodeRabbit
New Features
Refactor