feat: extend spec class for config migrations #538
Pull Request Overview
This PR extends the Spec class by adding support for configuration migrations, transformations, and validations. Key changes include:
- Adding new optional fields (config_migrations, config_transformations, config_validations) and a message repository to the Spec class.
- Introducing new methods to migrate, transform, and validate the configuration.
- Updating the component factory (ModelToComponentFactory) to propagate the new fields from the normalization rules.
Reviewed Changes
Copilot reviewed 19 out of 19 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| airbyte_cdk/sources/declarative/spec/spec.py | Extended Spec with new config migration/transformation/validation fields and corresponding methods. |
| airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py | Updated create_spec to forward new normalization rules fields. |
| pyproject.toml | Added extra dependencies. |
📝 Walkthrough
This change introduces a declarative framework for configuration migration, transformation, and validation in Airbyte's CDK. It extends the specification schema and models to support config normalization rules, and implements the corresponding runtime logic.
Sequence Diagram(s)
sequenceDiagram
participant User
participant Spec
participant ConfigMigration
participant ConfigTransformation
participant Validator
participant MessageRepository
User->>Spec: migrate_config(args, source, config)
Spec->>ConfigMigration: Apply transformations to config
ConfigMigration-->>Spec: Return migrated config
alt Config changed
Spec->>MessageRepository: Emit config control message
Spec->>User: Write migrated config to file
Spec->>User: Print emitted messages
end
User->>Spec: transform_config(config)
Spec->>ConfigTransformation: Apply all transformations
ConfigTransformation-->>Spec: Return transformed config
User->>Spec: validate_config(config)
Spec->>Validator: Run all validations
Validator-->>Spec: Raise if invalid, else return
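The migration branch of the diagram above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the CDK implementation: `SketchSpec`, `Migration`, `emitted_messages`, and `rename_api_token` are hypothetical names, the message repository is reduced to a list, and the CLI-args handling is replaced by an explicit `config_path` argument.

```python
import json
import os
import tempfile
from copy import deepcopy
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Mapping

# Hypothetical alias: a migration mutates the config dict in place.
Migration = Callable[[Dict[str, Any]], None]


@dataclass
class SketchSpec:
    """Illustrative stand-in for Spec; names and signatures are assumptions."""

    config_migrations: List[Migration] = field(default_factory=list)
    # Stand-in for the message repository: emitted messages are just collected.
    emitted_messages: List[Dict[str, Any]] = field(default_factory=list)

    def migrate_config(self, config_path: str, config: Mapping[str, Any]) -> Dict[str, Any]:
        # Work on a copy so the caller's config is never mutated.
        migrated = deepcopy(dict(config))
        for migration in self.config_migrations:
            migration(migrated)
        # "alt Config changed": only write and emit when something changed.
        if migrated != dict(config):
            with open(config_path, "w") as f:
                json.dump(migrated, f)
            self.emitted_messages.append({"type": "CONTROL", "config": migrated})
        return migrated


def rename_api_token(config: Dict[str, Any]) -> None:
    """Example migration: move a legacy key to its new name."""
    if "api_token" in config:
        config["api_key"] = config.pop("api_token")
```

Running the same migration twice shows the "config changed" guard: the second call finds nothing to rename, so no file write or control message happens.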
Actionable comments posted: 4
🔭 Outside diff range comments (1)
airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py (1)
3469-3478: ⚠️ Potential issue
Critical: Fix create_spec to match the Spec signature and handle optional rules.
The Spec constructor now expects config_transformations and config_validations (not transformations/validations), and model.config_normalization_rules may be None, causing attribute errors. Could we update this block accordingly? For example:
- return Spec(
-     connection_specification=model.connection_specification,
-     documentation_url=model.documentation_url,
-     advanced_auth=model.advanced_auth,
-     parameters={},
-     config_migrations=model.config_normalization_rules.config_migrations,
-     transformations=model.config_normalization_rules.transformations,
-     validations=model.config_normalization_rules.validations,
- )
+ return Spec(
+     connection_specification=model.connection_specification,
+     documentation_url=model.documentation_url,
+     advanced_auth=model.advanced_auth,
+     parameters={},
+     config_migrations=model.config_normalization_rules.config_migrations if model.config_normalization_rules else [],
+     config_transformations=model.config_normalization_rules.transformations if model.config_normalization_rules else [],
+     config_validations=model.config_normalization_rules.validations if model.config_normalization_rules else [],
+ )
This change should resolve the mypy errors about unexpected keyword arguments and guard against None normalization rules. wdyt?
🧰 Tools
🪛 GitHub Actions: Linters
[error] 3469-3469: mypy error: Unexpected keyword argument "transformations" for "Spec"; did you mean "config_transformations"? [call-arg]
[error] 3469-3469: mypy error: Unexpected keyword argument "validations" for "Spec"; did you mean "config_validations"? [call-arg]
[error] 3474-3474: mypy error: Item "None" of "ConfigNormalizationRules | None" has no attribute "config_migrations" [union-attr]
[error] 3474-3474: mypy error: Argument "config_migrations" to "Spec" has incompatible type "list[RemapField] | Any | None"; expected "list[ConfigTransformation] | None" [arg-type]
[error] 3475-3475: mypy error: Item "None" of "ConfigNormalizationRules | None" has no attribute "transformations" [union-attr]
[error] 3476-3476: mypy error: Item "None" of "ConfigNormalizationRules | None" has no attribute "validations" [union-attr]
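The None guard suggested above repeats the same conditional three times. One possible way to factor it out is sketched below. `NormalizationRules` and `rules_to_spec_kwargs` are hypothetical names used only for illustration; the real model class and the factory method may look different.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class NormalizationRules:
    """Hypothetical stand-in for the ConfigNormalizationRules model."""

    config_migrations: List[Any] = field(default_factory=list)
    transformations: List[Any] = field(default_factory=list)
    validations: List[Any] = field(default_factory=list)


def rules_to_spec_kwargs(rules: Optional[NormalizationRules]) -> Dict[str, List[Any]]:
    """Map optional rules onto the keyword names the Spec constructor expects.

    A missing rules object yields empty lists, so callers never touch
    attributes of None.
    """
    if rules is None:
        return {
            "config_migrations": [],
            "config_transformations": [],
            "config_validations": [],
        }
    return {
        "config_migrations": list(rules.config_migrations),
        "config_transformations": list(rules.transformations),
        "config_validations": list(rules.validations),
    }
```

With a helper like this, the constructor call could unpack the result (`Spec(..., **rules_to_spec_kwargs(model.config_normalization_rules))`), which keeps the model-side names (`transformations`, `validations`) and the Spec-side names (`config_transformations`, `config_validations`) mapped in a single place.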
🧹 Nitpick comments (18)
airbyte_cdk/sources/declarative/transformations/config_transformations/config_transformation.py (1)
9-24: Consider using MutableMapping instead of Dict for better flexibility?
The implementation looks great! One small suggestion - I noticed that in the remap_field.py implementation (from the snippets), you're using MutableMapping[str, Any] for the config parameter, but here you're using Dict[str, Any]. Using MutableMapping would be more flexible and consistent with the implementations. Wdyt?
- from typing import Any, Dict
+ from typing import Any, Dict, MutableMapping

  @abstractmethod
  def transform(
      self,
-     config: Dict[str, Any],
+     config: MutableMapping[str, Any],
  ) -> None:
airbyte_cdk/sources/declarative/validators/predicate_validator.py (1)
11-26: Consider aligning validate method signature with other validators?
The implementation looks clean and follows good composition practices! Based on the snippets from other validators, I noticed that other validators implement a validate method that takes an input_data parameter. Would it make sense to align the method signature here for consistency across validators? Something like:
- def validate(self) -> None:
+ def validate(self, input_data: Any = None) -> None:
      """
      Applies the validation strategy to the value.

      :raises ValueError: If validation fails
      """
      self.strategy.validate(self.value)
This way all validators would have a consistent interface, even if this particular implementation ignores the input. Wdyt?
airbyte_cdk/sources/declarative/validators/dpath_validator.py (3)
25-33: Consider simplifying the field_path conversion logic
There appears to be redundancy in the way you're handling the field_path conversion. You first create a new list with all paths converted, and then iterate through the original list again to convert string elements. Could this be simplified to a single-pass approach, wdyt?
- self._field_path = [
-     InterpolatedString.create(path, parameters={}) for path in self.field_path
- ]
- for path_index in range(len(self.field_path)):
-     if isinstance(self.field_path[path_index], str):
-         self._field_path[path_index] = InterpolatedString.create(
-             self.field_path[path_index], parameters={}
-         )
+ self._field_path = [
+     InterpolatedString.create(path, parameters={}) for path in self.field_path
+ ]
47-59: Consider consolidating duplicate error handling
The error handling logic for both wildcard and non-wildcard paths is duplicated. Maybe you could refactor this to reduce duplication and improve maintainability, wdyt?
  if "*" in path:
      try:
          values = dpath.values(input_data, path)
          for value in values:
              self.strategy.validate(value)
-     except KeyError as e:
-         raise ValueError(f"Error validating path '{self.field_path}': {e}")
  else:
      try:
          value = dpath.get(input_data, path)
          self.strategy.validate(value)
-     except KeyError as e:
-         raise ValueError(f"Error validating path '{self.field_path}': {e}")
+ except KeyError as e:
+     raise ValueError(f"Error validating path '{self.field_path}': {e}")
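As written, the consolidated diff above leaves the inner try blocks without handlers, so the branches would need restructuring as well. A minimal sketch of one way to do that follows. It avoids the dpath library so that it runs on its own: `lookup` is a hypothetical stand-in for the dpath calls that handles nested dictionaries only, and `validate_path` and its parameters are illustrative names.

```python
from typing import Any, Callable, List


def lookup(data: Any, path: List[str]) -> List[Any]:
    """Hypothetical stand-in for dpath lookups over nested dicts.

    Returns every value matching the path, where "*" matches all keys at
    that level. Raises KeyError when a non-wildcard segment is missing.
    """
    current = [data]
    for segment in path:
        next_level: List[Any] = []
        for node in current:
            if segment == "*":
                next_level.extend(node.values())
            else:
                next_level.append(node[segment])  # KeyError if absent
        current = next_level
    return current


def validate_path(data: Any, path: List[str], validate: Callable[[Any], None]) -> None:
    """Resolve the path once, then validate every matched value."""
    try:
        values = lookup(data, path)
    except KeyError as e:
        # Single place where a missing path becomes a ValueError.
        raise ValueError(f"Error validating path '{path}': {e}") from e
    for value in values:
        validate(value)
```

Treating the non-wildcard case as a one-element list removes the branch entirely. Keeping the strategy call outside the try block also means a KeyError raised by the strategy itself is not misreported as a missing path.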
35-59: Add validation for input_data type
The method assumes input_data is a dictionary without explicitly checking. Adding a type check could prevent cryptic errors if a non-dict value is passed.
  def validate(self, input_data: dict[str, Any]) -> None:
+     if not isinstance(input_data, dict):
+         raise ValueError(f"Expected dictionary input, got {type(input_data).__name__}")
      path = [path.eval({}) for path in self._field_path]
      # rest of the method...
unit_tests/sources/declarative/transformations/config_transformations/test_remap_field.py (1)
93-96: Consider extending exception test to verify message
The test confirms an exception is raised with an empty field path, but doesn't verify the exception message. Would it be helpful to also check that the exception message matches expectations, wdyt?
  with pytest.raises(Exception) as exc_info:
      RemapField(field_path=[], map={"old_value": "new_value"})
+ assert "field_path cannot be empty" in str(exc_info.value)
unit_tests/sources/declarative/validators/test_validate_adheres_to_schema.py (1)
119-131: Consider adding invalid JSON string test
You're testing validation with a valid JSON string, which is great. Would it also be valuable to test with an invalid JSON string to verify appropriate error handling, wdyt?
def test_given_invalid_json_string_when_validate_then_raises_error(self):
    schema = {"type": "object"}
    validator = ValidateAdheresToSchema(schema=schema)
    with pytest.raises(ValueError) as exc_info:
        validator.validate('{"invalid json')
    assert "Invalid JSON" in str(exc_info.value)
airbyte_cdk/sources/declarative/transformations/config_transformations/remap_field.py (3)
23-33: Simplify field_path conversion logic
Similar to the comment in DpathValidator, there's redundancy in the way field_path is handled. You create a new list with all elements converted, then iterate again to convert string elements. Could this be simplified to a single approach, wdyt?
- self._field_path = [
-     InterpolatedString.create(path, parameters={}) for path in self.field_path
- ]
- for path_index in range(len(self.field_path)):
-     if isinstance(self.field_path[path_index], str):
-         self._field_path[path_index] = InterpolatedString.create(
-             self.field_path[path_index], parameters={}
-         )
+ self._field_path = [
+     InterpolatedString.create(path, parameters={}) for path in self.field_path
+ ]
24-25: Improve error message for empty field path
The error message could be more descriptive about why empty paths aren't allowed.
  if not self.field_path:
-     raise Exception("field_path cannot be empty.")
+     raise ValueError("field_path cannot be empty. A valid path is required to identify the field to remap.")
59-60: Maybe add support for non-string map keys?
Currently, the map lookup assumes string keys. If there's a chance of non-string values in the field being remapped (like integers), would it be valuable to add type conversion, wdyt?
  if field_name in current and current[field_name] in self.map:
      current[field_name] = self.map[current[field_name]]
+ elif field_name in current and str(current[field_name]) in self.map:
+     current[field_name] = self.map[str(current[field_name])]
unit_tests/sources/declarative/validators/test_dpath_validator.py (1)
77-93: Redundant assertion in wildcard test.
There's a duplicate assertion at line 92 that's using unittest style after already using pytest style at line 91. Consider removing one of these assertions for clarity, wdyt?
  assert strategy.validate_called
  assert strategy.validated_value in ["user1@example.com", "user2@example.com"]
- self.assertIn(strategy.validated_value, ["user1@example.com", "user2@example.com"])
unit_tests/sources/declarative/validators/test_predicate_validator.py (1)
1-56: Consider adding tests for edge cases.
The tests look solid for the main use cases, but perhaps consider adding tests for edge cases like None values or other special cases that might occur in real configurations, wdyt?
def test_given_none_value_when_validate_then_validation_occurs():
    strategy = MockValidationStrategy()
    validator = PredicateValidator(value=None, strategy=strategy)
    validator.validate()
    assert strategy.validate_called
    assert strategy.validated_value is None
airbyte_cdk/sources/declarative/declarative_component_schema.yaml (2)
3806-3832: Consider extracting config_normalization_rules to a reusable definition?
Rather than inlining the schema under Spec, would it be clearer to define ConfigNormalizationRules under definitions and reference it here for reuse and consistency? wdyt?
4310-4339: Add default empty-list values for config_* arrays?
A lot of our YAML arrays (e.g., state_migrations) include default: [] to simplify downstream logic. Would you consider adding default: [] to config_migrations, transformations, and validations so they always resolve to a list? wdyt?
airbyte_cdk/sources/declarative/spec/spec.py (3)
71-74: Should we short-circuit when no migrations are configured?
After adopting default_factory=list, we could still save a needless copy when the list is empty.
if not self.config_migrations:
    return  # nothing to migrate
Minor, but avoids touching the file and emitting control messages when no-op, wdyt?
🧰 Tools
🪛 GitHub Actions: Linters
[error] 72-72: mypy error: Item "None" of "list[ConfigTransformation] | None" has no attribute "iter" (not iterable) [union-attr]
83-96: Mirror the migration improvements in transform_config.
With the default_factory=list change, both loops become safe, but adding an early return keeps things tidy and avoids an unnecessary dict copy when no transformations exist.
🧰 Tools
🪛 GitHub Actions: Linters
[error] 92-92: mypy error: Item "None" of "list[ConfigTransformation] | None" has no attribute "iter" (not iterable) [union-attr]
98-105: Return early (or raise) on failed validations?
Right now we iterate through validators but never surface which one failed. Would it make sense to accumulate exceptions and raise an aggregate (or raise immediately) so users know exactly what went wrong? Happy to sketch code if useful, wdyt?
🧰 Tools
🪛 GitHub Actions: Linters
[error] 104-104: mypy error: Item "None" of "list[Validator] | None" has no attribute "iter" (not iterable) [union-attr]
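The "accumulate and raise an aggregate" option from the 98-105 comment could look roughly like the sketch below. It also uses `default_factory=list`, as suggested in the earlier spec.py comments, so the loop never iterates over None. `ValidatingSpec`, `Validator`, and the two example validators are hypothetical names; the real validators are classes with a `validate` method rather than plain callables.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, List, Mapping

# Hypothetical alias: a validator raises ValueError when the config is invalid.
Validator = Callable[[Mapping[str, Any]], None]


@dataclass
class ValidatingSpec:
    """Illustrative stand-in for Spec; names are assumptions."""

    # default_factory=list means the field is always iterable, never None.
    config_validations: List[Validator] = field(default_factory=list)

    def validate_config(self, config: Mapping[str, Any]) -> None:
        errors: List[str] = []
        for index, validator in enumerate(self.config_validations):
            try:
                validator(config)
            except ValueError as e:
                # Record which validator failed instead of stopping at the first.
                errors.append(f"validation #{index}: {e}")
        if errors:
            raise ValueError("; ".join(errors))


def require_api_key(config: Mapping[str, Any]) -> None:
    if "api_key" not in config:
        raise ValueError("api_key is required")


def require_port(config: Mapping[str, Any]) -> None:
    if "port" not in config:
        raise ValueError("port is required")
```

Raising immediately is simpler and would also satisfy the comment. The aggregate form is shown because it reports every failing rule in a single pass, which tends to matter when a user is fixing a config by hand.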
airbyte_cdk/sources/declarative/models/declarative_component_schema.py (1)
1972-1975: Duplicate GraphQL request-body types – intentional?
There is already a RequestBodyGraphQlQuery class (note the lowercase l) a few hundred lines above.
Adding RequestBodyGraphQL introduces two near-identical schema nodes, which might confuse the manifest authors and muddy auto-completion.
Would it be simpler to consolidate both into a single well-named class (e.g. RequestBodyGraphQL only) and deprecate the other? Happy to suggest a deprecation alias if backwards compatibility is required, wdyt?
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
poetry.lock is excluded by !**/*.lock
📒 Files selected for processing (18)
airbyte_cdk/sources/declarative/declarative_component_schema.yaml (2 hunks)
airbyte_cdk/sources/declarative/models/declarative_component_schema.py (13 hunks)
airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py (1 hunks)
airbyte_cdk/sources/declarative/spec/spec.py (3 hunks)
airbyte_cdk/sources/declarative/transformations/config_transformations/__init__.py (1 hunks)
airbyte_cdk/sources/declarative/transformations/config_transformations/config_transformation.py (1 hunks)
airbyte_cdk/sources/declarative/transformations/config_transformations/remap_field.py (1 hunks)
airbyte_cdk/sources/declarative/validators/__init__.py (1 hunks)
airbyte_cdk/sources/declarative/validators/dpath_validator.py (1 hunks)
airbyte_cdk/sources/declarative/validators/predicate_validator.py (1 hunks)
airbyte_cdk/sources/declarative/validators/validate_adheres_to_schema.py (1 hunks)
airbyte_cdk/sources/declarative/validators/validation_strategy.py (1 hunks)
airbyte_cdk/sources/declarative/validators/validator.py (1 hunks)
pyproject.toml (1 hunks)
unit_tests/sources/declarative/transformations/config_transformations/test_remap_field.py (1 hunks)
unit_tests/sources/declarative/validators/test_dpath_validator.py (1 hunks)
unit_tests/sources/declarative/validators/test_predicate_validator.py (1 hunks)
unit_tests/sources/declarative/validators/test_validate_adheres_to_schema.py (1 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (9)
airbyte_cdk/sources/declarative/transformations/config_transformations/__init__.py (1)
airbyte_cdk/sources/declarative/transformations/config_transformations/remap_field.py (1)
RemapField(15-60)
airbyte_cdk/sources/declarative/validators/predicate_validator.py (2)
airbyte_cdk/sources/declarative/validators/validation_strategy.py (2)
ValidationStrategy(9-22)
validate(15-22)
airbyte_cdk/sources/declarative/validators/validator.py (1)
validate(11-18)
airbyte_cdk/sources/declarative/transformations/config_transformations/config_transformation.py (1)
airbyte_cdk/sources/declarative/transformations/config_transformations/remap_field.py (1)
transform(35-60)
airbyte_cdk/sources/declarative/validators/__init__.py (5)
airbyte_cdk/sources/declarative/validators/dpath_validator.py (1)
DpathValidator(16-59)
airbyte_cdk/sources/declarative/validators/predicate_validator.py (1)
PredicateValidator(12-26)
airbyte_cdk/sources/declarative/validators/validate_adheres_to_schema.py (1)
ValidateAdheresToSchema(15-39)
airbyte_cdk/sources/declarative/validators/validation_strategy.py (1)
ValidationStrategy(9-22)
airbyte_cdk/sources/declarative/validators/validator.py (1)
Validator(9-18)
airbyte_cdk/sources/declarative/validators/validate_adheres_to_schema.py (4)
airbyte_cdk/sources/declarative/validators/validation_strategy.py (2)
ValidationStrategy(9-22)
validate(15-22)
airbyte_cdk/sources/declarative/validators/dpath_validator.py (1)
validate(35-59)
airbyte_cdk/sources/declarative/validators/predicate_validator.py (1)
validate(20-26)
airbyte_cdk/sources/declarative/validators/validator.py (1)
validate(11-18)
airbyte_cdk/sources/declarative/validators/validation_strategy.py (4)
airbyte_cdk/sources/declarative/validators/dpath_validator.py (1)
validate(35-59)
airbyte_cdk/sources/declarative/validators/validate_adheres_to_schema.py (1)
validate(22-39)
airbyte_cdk/sources/declarative/validators/predicate_validator.py (1)
validate(20-26)
airbyte_cdk/sources/declarative/validators/validator.py (1)
validate(11-18)
unit_tests/sources/declarative/transformations/config_transformations/test_remap_field.py (1)
airbyte_cdk/sources/declarative/transformations/config_transformations/remap_field.py (2)
RemapField(15-60)
transform(35-60)
airbyte_cdk/sources/declarative/transformations/config_transformations/remap_field.py (2)
airbyte_cdk/sources/declarative/interpolation/interpolated_string.py (1)
InterpolatedString(13-79)
airbyte_cdk/sources/declarative/transformations/config_transformations/config_transformation.py (2)
ConfigTransformation(9-23)
transform(15-23)
unit_tests/sources/declarative/validators/test_predicate_validator.py (2)
airbyte_cdk/sources/declarative/validators/validation_strategy.py (1)
ValidationStrategy(9-22)
unit_tests/sources/declarative/validators/test_dpath_validator.py (2)
MockValidationStrategy(9-20)
validate(16-20)
🪛 GitHub Actions: Dependency Analysis
pyproject.toml
[error] 1-1: DEP002 'dagger-io' defined as a dependency but not used in the codebase
[error] 1-1: DEP002 'anyio' defined as a dependency but not used in the codebase
🪛 GitHub Actions: Linters
airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py
[error] 3469-3469: mypy error: Unexpected keyword argument "transformations" for "Spec"; did you mean "config_transformations"? [call-arg]
[error] 3469-3469: mypy error: Unexpected keyword argument "validations" for "Spec"; did you mean "config_validations"? [call-arg]
[error] 3474-3474: mypy error: Item "None" of "ConfigNormalizationRules | None" has no attribute "config_migrations" [union-attr]
[error] 3474-3474: mypy error: Argument "config_migrations" to "Spec" has incompatible type "list[RemapField] | Any | None"; expected "list[ConfigTransformation] | None" [arg-type]
[error] 3475-3475: mypy error: Item "None" of "ConfigNormalizationRules | None" has no attribute "transformations" [union-attr]
[error] 3476-3476: mypy error: Item "None" of "ConfigNormalizationRules | None" has no attribute "validations" [union-attr]
airbyte_cdk/sources/declarative/spec/spec.py
[error] 72-72: mypy error: Item "None" of "list[ConfigTransformation] | None" has no attribute "iter" (not iterable) [union-attr]
[error] 76-76: mypy error: Argument 1 to "open" has incompatible type "Any | None"; expected "int | str | bytes | PathLike[str] | PathLike[bytes]" [arg-type]
[error] 80-80: mypy error: "AirbyteMessage" has no attribute "json" [attr-defined]
[error] 92-92: mypy error: Item "None" of "list[ConfigTransformation] | None" has no attribute "iter" (not iterable) [union-attr]
[error] 104-104: mypy error: Item "None" of "list[Validator] | None" has no attribute "iter" (not iterable) [union-attr]
airbyte_cdk/sources/declarative/models/declarative_component_schema.py
[error] 1-1: Ruff formatting check failed. File would be reformatted. Run 'ruff format' to fix code style issues.
⏰ Context from checks skipped due to timeout of 90000ms (4)
- GitHub Check: Check: 'source-shopify' (skip=false)
- GitHub Check: Check: 'source-hardcoded-records' (skip=false)
- GitHub Check: Pytest (All, Python 3.11, Ubuntu)
- GitHub Check: Pytest (All, Python 3.10, Ubuntu)
🔇 Additional comments (28)
pyproject.toml (1)
93-94: ❌ Incorrect review comment
Dependencies added but not being used?
The pipeline analysis indicates that the dagger-io and anyio dependencies are declared but not used in the codebase. Are these dependencies going to be used in future PRs or are they needed for the config migrations feature that's currently being implemented? Maybe these are transitive dependencies needed by other new code?
🏁 Script executed:
#!/bin/bash
# Let's check if these packages are imported anywhere in the code but the static analysis is missing it
echo "Checking for dagger-io imports..."
rg -l "import dagger" || echo "No direct dagger imports found"
rg -l "from dagger" || echo "No direct dagger imports found"
echo -e "\nChecking for anyio imports..."
rg -l "import anyio" || echo "No direct anyio imports found"
rg -l "from anyio" || echo "No direct anyio imports found"
echo -e "\nChecking the broader context of what's being added in this PR..."
fd -e py -e yaml -c 'git diff' | head -n 100
Length of output: 793
dagger-io & anyio are utilized in the bin script
• bin/generate_component_manifest_files.py imports both dagger and anyio.
These dependencies are in use there, so they’re not unused. Does this cover all usage, or are you planning further integration? wdyt?
Likely an incorrect or invalid review comment.
airbyte_cdk/sources/declarative/transformations/config_transformations/__init__.py (1)
1-7: LGTM!
Clean and concise implementation of the module's __init__.py file. It follows best practices by explicitly declaring what symbols are exported from the module using __all__.
airbyte_cdk/sources/declarative/validators/validator.py (1)
1-18: LGTM!
Well-designed abstract base class with clear documentation. The method signature and docstring provide good guidance for implementers on how validation should be performed.
airbyte_cdk/sources/declarative/validators/__init__.py (1)
1-19: Looks good to me!
The new __all__ cleanly exposes the validator abstractions as intended. Nice work!
airbyte_cdk/sources/declarative/validators/validation_strategy.py (1)
9-22: Well-designed abstract base class for validation strategies!
Good job creating a clean, concise interface for validation strategies with clear documentation and proper error handling expectations. This abstract class provides a solid foundation for the validation framework.
airbyte_cdk/sources/declarative/validators/validate_adheres_to_schema.py (1)
14-39: Solid implementation of the ValidationStrategy for JSON schema validation!
The implementation handles both string and non-string inputs gracefully, with clear error handling and good separation of concerns. Nice job adding the string-to-JSON conversion logic to make this validator more flexible.
airbyte_cdk/sources/declarative/validators/dpath_validator.py (1)
43-43: Should path evaluation consider the input context?
I notice you're evaluating the path using an empty context {}. Is this intentional, or should it use input_data for interpolation? This might be important if the path needs to be dynamically resolved based on the input.
unit_tests/sources/declarative/transformations/config_transformations/test_remap_field.py (4)
11-36: The test cases look thorough and well-structured
Good job testing both the positive case and verifying that the original config is preserved.
37-51: Great edge case coverage
This test effectively verifies that values not found in the mapping remain unchanged, which is an important edge case.
68-82: Well-designed test for interpolated path functionality
This test effectively demonstrates that fields can be remapped when paths contain dynamic components.
97-112: Good test for complex interactions
This test effectively verifies that multiple transformations can be applied sequentially without interference, which is important for real-world scenarios with multiple config transformations.
unit_tests/sources/declarative/validators/test_validate_adheres_to_schema.py (3)
11-29: Well-structured test for successful validation
The test effectively verifies that valid data passes schema validation.
30-52: Great test for required field validation
This test effectively verifies that missing required fields result in appropriate validation errors with descriptive messages.
80-90: Good edge case handling for invalid schema
Testing with an invalid schema and checking for a specific error type demonstrates robust error handling.
airbyte_cdk/sources/declarative/transformations/config_transformations/remap_field.py (1)
48-56: Efficient path traversal logic
The navigation through nested dictionary structures is well-implemented with appropriate checks for existence and type. This makes the transformation safely handle incomplete or unexpected configurations.
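The traversal pattern praised in the 48-56 comment (check existence and type at each level, then remap the leaf) can be sketched as a standalone function. This is an illustration under assumptions, not the RemapField source: `remap_field` and its parameters are hypothetical, interpolation of path segments is omitted, and the hashability check is an extra guard added only for this sketch.

```python
from collections.abc import Hashable, Mapping, MutableMapping
from typing import Any, List


def remap_field(config: MutableMapping, field_path: List[str], mapping: Mapping) -> None:
    """Replace the value at field_path with mapping[value], in place.

    Silently leaves the config untouched when the path is missing, passes
    through a non-dict node, or holds a value that has no mapping entry.
    """
    if not field_path:
        raise ValueError("field_path cannot be empty.")

    current: Any = config
    for key in field_path[:-1]:
        # Stop quietly on incomplete or unexpectedly shaped configs.
        if key not in current or not isinstance(current[key], MutableMapping):
            return
        current = current[key]

    field_name = field_path[-1]
    if field_name not in current:
        return
    value = current[field_name]
    # Unhashable values (lists, dicts) cannot be mapping keys; skip them.
    if isinstance(value, Hashable) and value in mapping:
        current[field_name] = mapping[value]
```

For example, remapping `["auth", "type"]` with `{"oauth": "oauth2.0"}` updates `{"auth": {"type": "oauth"}}` in place, while a config whose `auth` entry is a plain string is left unchanged.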
unit_tests/sources/declarative/validators/test_dpath_validator.py (5)
9-21: Clean mock implementation for ValidationStrategy.
The MockValidationStrategy implementation provides a clear way to track validation calls and simulate both success and failure cases. Good approach with tracking validation call state for verification in tests.
24-34: Test for happy path looks good.
The test covers the expected behavior when both the path and validation are valid. The assertions effectively verify that the strategy was called and received a value.
35-46: Good error handling test.
This test correctly verifies that the validator raises a ValueError with an appropriate message when the path doesn't exist, and that the strategy isn't called in this case.
47-59: Well-structured strategy failure test.
Nice job testing that validation errors from the strategy are properly propagated. The assertions verify both the error and that the strategy was called with the correct value.
60-76: Good boundary condition tests.
These tests for empty path and empty input data are valuable edge cases that ensure robust error handling in the validator.
unit_tests/sources/declarative/validators/test_predicate_validator.py (4)
9-21: Well-designed mock implementation.
The MockValidationStrategy is consistent with the one in the test_dpath_validator.py file and properly implements the ValidationStrategy interface. This consistency across test files helps maintain clarity.
24-33: Good basic validation test.
This test effectively verifies that the validator passes the value to the strategy correctly and that validation succeeds when expected.
34-46: Thorough error handling test.
The test properly validates that errors from the strategy are propagated and that the error message is preserved in the exception. The additional assertions confirm the strategy was called with the right value.
47-56: Good test for complex objects.
Testing with a nested object is important since validators should handle complex data structures. The test confirms that the entire object is passed to the strategy unchanged.
airbyte_cdk/sources/declarative/declarative_component_schema.yaml (2)
4224-4257: PredicateValidator schema looks solid
The PredicateValidator definition aligns well with our validation abstractions and mirrors the Pydantic model perfectly. Nice work, no changes needed here, wdyt?
4259-4297: ValidateAdheresToSchema definition is good to go
This validator supports both string and object schemas as intended. Everything appears consistent and complete. wdyt?
airbyte_cdk/sources/declarative/spec/spec.py (1)
75-81: AirbyteMessage.json() isn’t in stubs – wrap with cast or use model_dump_json.
mypy flags json as missing. One option:
-from message in self.message_repository.consume_queue():
-    print(message.json(exclude_unset=True))
+for message in self.message_repository.consume_queue():
+    # mypy: AirbyteMessage inherits from pydantic.BaseModel, so json() exists.
+    print(message.model_dump_json(exclude_unset=True))  # type: ignore[attr-defined]
This silences mypy without runtime impact. Another approach is a cast to BaseModel.
🧰 Tools
🪛 GitHub Actions: Linters
[error] 76-76: mypy error: Argument 1 to "open" has incompatible type "Any | None"; expected "int | str | bytes | PathLike[str] | PathLike[bytes]" [arg-type]
[error] 80-80: mypy error: "AirbyteMessage" has no attribute "json" [attr-defined]
airbyte_cdk/sources/declarative/models/declarative_component_schema.py (1)
2069-2087: Ruff formatting failed – run ruff format.
The CI job reports a single formatting violation in this file. Running ruff format airbyte_cdk/sources/declarative/models/declarative_component_schema.py (or pre-commit run --all-files) will auto-fix white-space & import-ordering issues.
Actionable comments posted: 5
🧹 Nitpick comments (9)
airbyte_cdk/sources/declarative/spec/spec.py (2)
72-100: Well-implemented config migration method
The migrate_config method is well structured to:
- Extract config path from CLI args
- Apply transformations to a copy of the config
- Only write back changes and emit messages if the config actually changed
One suggestion: Should we include a try/except block around the file operations to handle potential I/O errors gracefully? What do you think?
-        if mutable_config != config:
-            with open(config_path, "w") as f:
-                json.dump(mutable_config, f)
-            self.message_repository.emit_message(
-                create_connector_config_control_message(mutable_config)
-            )
-            for message in self.message_repository.consume_queue():
-                print(orjson.dumps(AirbyteMessageSerializer.dump(message)).decode())
+        if mutable_config != config:
+            try:
+                with open(config_path, "w") as f:
+                    json.dump(mutable_config, f)
+                self.message_repository.emit_message(
+                    create_connector_config_control_message(mutable_config)
+                )
+                for message in self.message_repository.consume_queue():
+                    print(orjson.dumps(AirbyteMessageSerializer.dump(message)).decode())
+            except IOError as e:
+                print(f"Error writing migrated config to {config_path}: {e}")
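For readers following the thread, here is a minimal, standalone sketch of the flow this comment describes: apply transformations to a copy, then write back only when something changed, with the file write guarded against I/O errors. The names migrate_config_file and rename_planet_key are hypothetical and are not part of the CDK; message emission is left out entirely.

```python
import copy
import json
from typing import Any, Callable, Dict, List

# Hypothetical stand-in for a config transformation: mutates the dict in place.
Transformation = Callable[[Dict[str, Any]], None]


def migrate_config_file(config_path: str, transformations: List[Transformation]) -> bool:
    """Apply transformations to a copy of the config and write back only on change.

    Returns True if the file was rewritten, False otherwise.
    """
    with open(config_path) as f:
        config = json.load(f)

    # Work on a deep copy so the original stays available for comparison.
    mutable_config = copy.deepcopy(config)
    for transform in transformations:
        transform(mutable_config)

    if mutable_config == config:
        return False  # Already migrated: skip the write.

    try:
        with open(config_path, "w") as f:
            json.dump(mutable_config, f)
    except OSError as e:
        # IOError is an alias of OSError in Python 3.
        print(f"Error writing migrated config to {config_path}: {e}")
        return False
    return True


def rename_planet_key(config: Dict[str, Any]) -> None:
    """Hypothetical migration: rename 'planet_code' to 'planet'."""
    if "planet_code" in config:
        config["planet"] = config.pop("planet_code")
```

Running the function twice on the same file should rewrite it the first time and leave it untouched the second time, which is the idempotence the "already migrated" test relies on.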
83-85: Consider more explicit handling of missing config path
The check for if not config_path: is good, but would it be more explicit to add a log message or warning when skipping due to missing config path? This could help with debugging.
-        if not config_path:
-            return
+        if not config_path:
+            print("Skipping config migration: No config path provided in arguments")
+            return
airbyte_cdk/sources/declarative/declarative_component_schema.yaml (2)
4372-4399: Nit: typo in example for ConfigAddFields
The example uses config['environemnt']; I believe it should be config['environment'] to avoid confusion. wdyt?
 ConfigAddFields:
   ...
   properties:
     condition:
       description: Fields will be added if expression is evaluated to True.
       type: string
       default: ""
       interpolation_context:
         - config
         - property
       examples:
-        - "{{ config['environemnt'] == 'sandbox' }}"
+        - "{{ config['environment'] == 'sandbox' }}"
         - "{{ property is integer }}"
         - "{{ property|length > 5 }}"
         - "{{ property == 'some_string_to_match' }}"
4401-4433: Nit: typo in example for ConfigRemoveFields
Similarly here, the example has config['environemnt']; updating to config['environment'] might prevent downstream confusion. wdyt?
 ConfigRemoveFields:
   ...
   properties:
       examples:
-        - "{{ config['environemnt'] == 'sandbox' }}"
+        - "{{ config['environment'] == 'sandbox' }}"
airbyte_cdk/sources/declarative/models/declarative_component_schema.py (3)
1575-1593: Small typo in examples
The conditional examples contain a typo in "environemnt" which should be "environment":
-        "{{ config['environemnt'] == 'sandbox' }}",
+        "{{ config['environment'] == 'sandbox' }}",
1983-2000: Tiny typo in title field
There's a small spelling error in the title for the validation_strategy field:
-        title="Validation Stragey",
+        title="Validation Strategy",
2022-2039: Same typo as noted earlier in examples
The same "environemnt" typo appears in the examples here. Consider fixing it consistently across all occurrences.
airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py (2)
823-833: Minor: local variable shadows imported module transformations - rename for clarity?
Inside create_config_migration() the list variable is named transformations, which shadows the earlier from airbyte_cdk.sources.declarative import transformations import (even though it's scoped locally). While harmless, renaming to something like transformation_components could avoid head-scratching when debugging. wdyt?
3584-3612: create_spec() does not propagate model.parameters - intentional?
All other factory helpers forward model.parameters or {} to the runtime component, but create_spec() hard-codes parameters={}. If the Spec model ever supports $parameters, this silently drops user input.
Would you consider passing them through for consistency?
-        return Spec(
+        return Spec(
             connection_specification=model.connection_specification,
             documentation_url=model.documentation_url,
             advanced_auth=model.advanced_auth,
-            parameters={},
+            parameters=model.parameters or {},
             config_migrations=config_migrations,
             config_transformations=config_transformations,
             config_validations=config_validations,
         )
Let me know what you think! wdyt?
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
Cache: Disabled due to data retention organization setting
Knowledge Base: Disabled due to data retention organization setting
📒 Files selected for processing (10)
airbyte_cdk/sources/declarative/declarative_component_schema.yaml (3 hunks)
airbyte_cdk/sources/declarative/models/declarative_component_schema.py (5 hunks)
airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py (14 hunks)
airbyte_cdk/sources/declarative/spec/__init__.py (1 hunks)
airbyte_cdk/sources/declarative/spec/spec.py (3 hunks)
airbyte_cdk/sources/declarative/transformations/config_transformations/add_fields.py (4 hunks)
airbyte_cdk/sources/declarative/transformations/config_transformations/remap_field.py (2 hunks)
airbyte_cdk/sources/declarative/validators/dpath_validator.py (1 hunks)
unit_tests/sources/declarative/spec/test_spec.py (3 hunks)
unit_tests/sources/declarative/transformations/config_transformations/test_config_add_fields.py (5 hunks)
✅ Files skipped from review due to trivial changes (3)
- airbyte_cdk/sources/declarative/spec/__init__.py
- airbyte_cdk/sources/declarative/validators/dpath_validator.py
- unit_tests/sources/declarative/transformations/config_transformations/test_config_add_fields.py
🚧 Files skipped from review as they are similar to previous changes (1)
- airbyte_cdk/sources/declarative/transformations/config_transformations/remap_field.py
⏰ Context from checks skipped due to timeout of 90000ms (4)
- GitHub Check: Check: 'source-amplitude' (skip=false)
- GitHub Check: Check: 'source-shopify' (skip=false)
- GitHub Check: Pytest (All, Python 3.11, Ubuntu)
- GitHub Check: Pytest (All, Python 3.10, Ubuntu)
🔇 Additional comments (24)
airbyte_cdk/sources/declarative/transformations/config_transformations/add_fields.py (2)
12-15: Good refactoring! Using centralized module imports
Nice job moving AddedFieldDefinition and ParsedAddFieldDefinition imports from local definitions to the central module. This improves maintainability by centralizing these shared components.
78-79: Consistent parameters argument - nicely done!
You've correctly updated the ParsedAddFieldDefinition construction to include the empty parameters={} argument, making it compatible with the updated dataclass signature from the centralized module.
Also applies to: 87-88
unit_tests/sources/declarative/spec/test_spec.py (6)
164-203: Well-structured test fixture for migration mocks
Excellent job creating this detailed fixture to mock all the external dependencies for migration testing. The comprehensive setup mocks message repositories, file I/O, serialization, and printing, giving you full control over testing the migration process in isolation.
206-234: Thorough testing of unmigrated config scenario
This test effectively verifies that when encountering an unmigrated config, the system:
- Properly applies the migration transformations
- Emits a connector config control message
- Writes the migrated config to a file
All assertions comprehensively validate the side effects - good job!
236-263: Nice negative test case for already migrated configs
Good thinking adding this test to verify that already-migrated configs don't trigger unnecessary side effects. It ensures that the implementation avoids redundant operations when the config is already in the expected state.
266-310: Comprehensive test of transformation sequence
Excellent test case demonstrating a multi-step transformation workflow. The test shows how different transformations (add fields, remap fields, remove fields) can be composed together to achieve complex config normalization. The detailed assertions verify the exact expected output.
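As an illustration of how add, remap, and remove steps compose, the sketch below chains three small transformation helpers over a copy of a config. All helper names and the sample values (including the population figure) are invented for this example and do not reflect the CDK's ConfigAddFields, ConfigRemapField, or ConfigRemoveFields implementations.

```python
from typing import Any, Callable, Dict, List

Config = Dict[str, Any]
Transformation = Callable[[Config], None]


def add_field(key: str, value_fn: Callable[[Config], Any]) -> Transformation:
    """Return a transformation that sets config[key] from the current config."""
    def _apply(config: Config) -> None:
        config[key] = value_fn(config)
    return _apply


def remap_field(key: str, mapping: Dict[Any, Any]) -> Transformation:
    """Return a transformation that replaces config[key] if it appears in mapping."""
    def _apply(config: Config) -> None:
        if key in config and config[key] in mapping:
            config[key] = mapping[config[key]]
    return _apply


def remove_field(key: str) -> Transformation:
    """Return a transformation that drops config[key] if present."""
    def _apply(config: Config) -> None:
        config.pop(key, None)
    return _apply


def transform_config(config: Config, transformations: List[Transformation]) -> Config:
    """Apply transformations in order to a shallow copy and return the copy."""
    result = dict(config)
    for transform in transformations:
        transform(result)
    return result


# Seed two fields from planet_code, remap each one, then drop the source field.
steps = [
    add_field("planet_name", lambda c: c["planet_code"]),
    add_field("planet_population", lambda c: c["planet_code"]),
    remap_field("planet_name", {"CRSC": "Coruscant"}),
    remap_field("planet_population", {"CRSC": 3_000_000_000_000}),
    remove_field("planet_code"),
]
original = {"planet_code": "CRSC"}
normalized = transform_config(original, steps)
```

Because each step only sees the dict left by the previous one, the order matters: removing planet_code before the add steps would raise a KeyError in this sketch.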
313-341: Good validation test for valid config
This test effectively verifies that valid config values won't raise exceptions when validated against a JSON schema. Using the DpathValidator with ValidateAdheresToSchema is a clean approach to validating nested structures.
344-373: Properly testing exception cases for invalid configs
Well done testing the negative case - confirming that invalid config values properly trigger exceptions when validated. The type mismatch (number vs string) is a good validation test scenario.
airbyte_cdk/sources/declarative/spec/spec.py (4)
28-31: Clean design of ConfigMigration dataclass
This is a nice, simple dataclass that groups related transformations with an optional description. This approach allows for organized, named groups of transformations that can be applied as a unit.
48-51: Good use of field default_factory
Nice choice using field(default_factory=list) for collections and InMemoryMessageRepository() for the message repository. This avoids the shared-state issues that would occur with mutable default values.
101-113: Clean implementation of transform_config
Good choice to operate on a copy of the config (dict(config)) to avoid modifying the input. The method is simple, focused, and returns the transformed config as expected.
115-122: Simple and effective validate_config implementation
The validation method iterates through all validators and applies them to the config. Clean and straightforward implementation.
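To make the "iterate through all validators" pattern concrete, here is a small self-contained sketch. PathValidator and TypeCheckStrategy are hypothetical stand-ins, loosely modelled on the path-plus-strategy idea behind DpathValidator; they use plain dict traversal and isinstance checks, not dpath or JSON Schema.

```python
from dataclasses import dataclass
from typing import Any, List, Mapping


class TypeCheckStrategy:
    """Hypothetical validation strategy: the value must be of the expected type."""

    def __init__(self, expected_type: type) -> None:
        self.expected_type = expected_type

    def validate(self, value: Any) -> None:
        if not isinstance(value, self.expected_type):
            raise ValueError(
                f"Expected {self.expected_type.__name__}, got {type(value).__name__}"
            )


@dataclass
class PathValidator:
    """Hypothetical validator: walk field_path, then delegate to the strategy."""

    field_path: List[str]
    strategy: TypeCheckStrategy

    def validate(self, config: Mapping[str, Any]) -> None:
        value: Any = config
        for key in self.field_path:
            if not isinstance(value, Mapping) or key not in value:
                raise ValueError(f"Path {self.field_path} not found in config")
            value = value[key]
        self.strategy.validate(value)


def validate_config(config: Mapping[str, Any], validators: List[PathValidator]) -> None:
    """Run every validator in order; the first failure raises."""
    for validator in validators:
        validator.validate(config)


validators = [PathValidator(["credentials", "api_key"], TypeCheckStrategy(str))]
validate_config({"credentials": {"api_key": "secret"}}, validators)  # passes silently
```

A config with a number under credentials.api_key would raise ValueError here, mirroring the number-versus-string mismatch the negative test exercises.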
airbyte_cdk/sources/declarative/declarative_component_schema.yaml (5)
171-171: Looks good: updated inject_into description is clear.
The new description for how the API key is sent reads well and aligns with other request option components.
3835-3853: Great: ConfigMigration schema definition looks solid
The new ConfigMigration object has the required transformations list and sensible defaults. This aligns nicely with the model-to-component factory additions. wdyt?
4215-4243: Verified: DpathValidator enum matches component name
The enum: [DpathValidator] now correctly matches the component class, resolving the previous mismatch. Nice catch!
4246-4278: Approved: PredicateValidator definition is consistent
The required fields (type, value, validation_strategy) and interpolation context are well-defined.
4280-4332: Well done: ValidateAdheresToSchema schema is comprehensive
This definition covers both JSON and string base schemas, and provides clear examples.
airbyte_cdk/sources/declarative/models/declarative_component_schema.py (7)
1524-1549: Looks good: Added validation against JSON schema
The ValidateAdheresToSchema model provides a flexible validation approach that can work with both string-based schema references and inline dictionary definitions. This will be useful for ensuring configs adhere to predefined schemas.
1977-1981: LGTM: Good addition for GraphQL support
Clean implementation that follows the pattern of other request body models. This will be helpful for interacting with GraphQL APIs.
2002-2020: LGTM: Versatile validation approach
The PredicateValidator provides a nice complement to the path-based validator, allowing direct value validation with good type flexibility.
2096-2105: LGTM: Good grouping mechanism for transformations
The ConfigMigration model provides a nice structure for grouping related transformations and documenting their purpose.
2107-2125: LGTM: Well-structured normalization rules container
This model clearly separates migrations, transformations, and validations, making the configuration process more organized and easier to follow.
2127-2147: LGTM: Clean extension of the Spec model
Adding config_normalization_rules to the Spec model is a non-breaking change that enables all the new functionality. Nice work!
2328-2331: Good clarification on schema loader behavior
The improved description clearly explains how multiple schema loaders interact, which will help prevent confusion when users define multiple loaders.
Actionable comments posted: 0
🧹 Nitpick comments (3)
airbyte_cdk/sources/declarative/declarative_component_schema.yaml (3)
171-171: Consider highlighting mutual exclusivity of header and inject_into
The inject_into description was simplified nicely, but we no longer call out that header and inject_into cannot both be defined. Should we reintroduce a brief note (or cross-reference the header deprecation) to avoid confusion? wdyt?
4372-4399: Fix typo in example for ConfigAddFields
The examples reference config['environemnt'], which looks like a misspelling. Could we correct it to config['environment']? wdyt?
-        - "{{ config['environemnt'] == 'sandbox' }}"
+        - "{{ config['environment'] == 'sandbox' }}"
4401-4430: Fix typo in example for ConfigRemoveFields
Similarly, the ConfigRemoveFields examples include config['environemnt']. Should we update to config['environment'] for consistency? wdyt?
-        - "{{ config['environemnt'] == 'sandbox' }}"
+        - "{{ config['environment'] == 'sandbox' }}"
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
Cache: Disabled due to data retention organization setting
Knowledge Base: Disabled due to data retention organization setting
⛔ Files ignored due to path filters (1)
poetry.lock is excluded by !**/*.lock
📒 Files selected for processing (3)
airbyte_cdk/sources/declarative/declarative_component_schema.yaml (3 hunks)
airbyte_cdk/sources/declarative/models/declarative_component_schema.py (5 hunks)
airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py (14 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
- airbyte_cdk/sources/declarative/models/declarative_component_schema.py
- airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py
⏰ Context from checks skipped due to timeout of 90000ms (8)
- GitHub Check: Check: 'source-pokeapi' (skip=false)
- GitHub Check: Check: 'source-shopify' (skip=false)
- GitHub Check: Check: 'source-amplitude' (skip=false)
- GitHub Check: Check: 'source-hardcoded-records' (skip=false)
- GitHub Check: Pytest (All, Python 3.11, Ubuntu)
- GitHub Check: Pytest (All, Python 3.10, Ubuntu)
- GitHub Check: SDM Docker Image Build
- GitHub Check: Pytest (Fast)
🔇 Additional comments (3)
airbyte_cdk/sources/declarative/declarative_component_schema.yaml (3)
3806-3834: Config normalization rules added
Great addition of the config_normalization_rules block under Spec to support declarative migrations, transformations, and validations. The schema structure and camelCase additionalProperties are consistent with the rest of the file. Looks solid!
4215-4245: DpathValidator definition aligns with new validation workflow
The new DpathValidator schema correctly requires type, field_path, and validation_strategy. The enum matches the component name, and the JSON schema keyword casing is consistent.
4333-4371: ConfigRemapField schema is correctly named and defined
The ConfigRemapField definition uses the matching enum value and describes the transformation appropriately. The interpolation context on map and field_path looks complete.
Actionable comments posted: 1
🧹 Nitpick comments (3)
airbyte_cdk/sources/declarative/declarative_component_schema.yaml (3)
3835-3853: Consider stricter validation for ConfigMigration objects.
Would it be valuable to add additionalProperties: false under the ConfigMigration definition to catch typos or unsupported keys early in migrations? wdyt?
4215-4245: Fix spelling in DpathValidator property title
The validation_strategy property title currently reads "Validation Stragey". Could we update it to "Validation Strategy" for clarity and consistency? wdyt?
4372-4400: Fix typo in condition example for ConfigAddFields.
The example uses config['environemnt'], which appears misspelled. Should it be config['environment'] instead? wdyt?
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
Cache: Disabled due to data retention organization setting
Knowledge Base: Disabled due to data retention organization setting
📒 Files selected for processing (2)
airbyte_cdk/sources/declarative/declarative_component_schema.yaml (2 hunks)
airbyte_cdk/sources/declarative/models/declarative_component_schema.py (4 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- airbyte_cdk/sources/declarative/models/declarative_component_schema.py
⏰ Context from checks skipped due to timeout of 90000ms (7)
- GitHub Check: Check: 'source-hardcoded-records' (skip=false)
- GitHub Check: Check: 'source-pokeapi' (skip=false)
- GitHub Check: Check: 'source-amplitude' (skip=false)
- GitHub Check: Check: 'source-shopify' (skip=false)
- GitHub Check: Pytest (All, Python 3.10, Ubuntu)
- GitHub Check: Pytest (All, Python 3.11, Ubuntu)
- GitHub Check: Pytest (Fast)
🔇 Additional comments (4)
airbyte_cdk/sources/declarative/declarative_component_schema.yaml (4)
3806-3834: Excellent addition of config_normalization_rules to Spec!
The new structure for migrations, transformations, and validations aligns perfectly with the PR objectives and maintains consistency with the rest of the schema.
4246-4273: PredicateValidator schema looks solid.
The required fields, types, and interpolation context are well defined and consistent with the new validators.
4280-4331: ValidateAdheresToSchema definition is clear and well-structured.
It correctly accommodates both string and object variants for base_schema and supports interpolation context.
4333-4371: ConfigRemapField schema is spot on.
The enum value, map types, and field_path interpolation contexts are all consistent and correctly defined.
brianjlai
left a comment
just a few small nits and clarifications. But non-blocking so happy to ✅
Actionable comments posted: 0
♻️ Duplicate comments (1)
airbyte_cdk/sources/declarative/declarative_component_schema.yaml (1)
4408-4439: ⚠️ Potential issue
Nested field_pointers items array needs a type: array
The field_pointers property currently declares:
 field_pointers:
   type: array
   items:
     items:
       type: string
We're missing type: array for the inner items. It should be:
 field_pointers:
-  type: array
-  items:
-    items:
-      type: string
+  type: array
+  items:
+    type: array
+    items:
+      type: string
This will ensure each pointer is an array of strings. wdyt?
🧹 Nitpick comments (3)
airbyte_cdk/sources/declarative/spec/spec.py (3)
51-51: Consider using default_factory for message_repository as well?
The message_repository is initialized with a direct instance, which means all Spec instances will share the same repository. Would it make sense to use field(default_factory=InMemoryMessageRepository) here too to avoid shared state across instances, as suggested in past reviews?
-    message_repository: MessageRepository = InMemoryMessageRepository()
+    message_repository: MessageRepository = field(default_factory=InMemoryMessageRepository)
wdyt?
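The shared-state concern can be reproduced with plain dataclasses. In the sketch below, InMemoryQueue is a hypothetical stand-in for a message repository (it is not the CDK's InMemoryMessageRepository); the first dataclass evaluates its default once at class definition time, while the second builds a fresh instance per object.

```python
from dataclasses import dataclass, field
from typing import List


class InMemoryQueue:
    """Hypothetical stand-in for a message repository."""

    def __init__(self) -> None:
        self.messages: List[str] = []


@dataclass
class SharedDefault:
    # The default is created once, when the class body runs,
    # so every instance that relies on it gets the same object.
    repo: InMemoryQueue = InMemoryQueue()


@dataclass
class PerInstanceDefault:
    # default_factory is called for each new instance.
    repo: InMemoryQueue = field(default_factory=InMemoryQueue)


a, b = SharedDefault(), SharedDefault()
a.repo.messages.append("control message")
shared_leak = b.repo.messages  # b sees a's message

c, d = PerInstanceDefault(), PerInstanceDefault()
c.repo.messages.append("control message")
isolated = d.repo.messages  # d stays empty
```

With the shared default, a message emitted through one object would be consumed by whichever object drains the queue first, which is the kind of cross-instance coupling the suggestion aims to avoid.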
72-86: Looks good, but could the error handling be a bit more explicit?
The migrate_config method looks well-structured. For line 84-85, the check for an empty config path is good, but would it be clearer to handle the None case with an explicit message as suggested in a previous review?
-        if not config_path:
+        if config_path is None:
+            # Silently skip when no config path is provided
             return
This makes the intentional skipping more explicit for future readers. Wdyt?
93-100: Consider adding a comment about the print statement.
A previous reviewer questioned the print statement on line 99. If it's intentional to both emit through the queue and print, maybe adding a brief comment would help clarify the dual purpose?
             for message in self.message_repository.consume_queue():
+                # Print messages directly to stdout for immediate feedback, in addition to queue emission
                 print(orjson.dumps(AirbyteMessageSerializer.dump(message)).decode())
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
Cache: Disabled due to data retention organization setting
Knowledge Base: Disabled due to data retention organization setting
⛔ Files ignored due to path filters (1)
poetry.lock is excluded by !**/*.lock
📒 Files selected for processing (5)
airbyte_cdk/sources/declarative/declarative_component_schema.yaml (2 hunks)
airbyte_cdk/sources/declarative/models/declarative_component_schema.py (4 hunks)
airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py (15 hunks)
airbyte_cdk/sources/declarative/spec/spec.py (3 hunks)
pyproject.toml (1 hunks)
✅ Files skipped from review due to trivial changes (1)
- pyproject.toml
🚧 Files skipped from review as they are similar to previous changes (1)
- airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py
⏰ Context from checks skipped due to timeout of 90000ms (6)
- GitHub Check: Check: 'source-hardcoded-records' (skip=false)
- GitHub Check: Check: 'source-shopify' (skip=false)
- GitHub Check: Check: 'source-amplitude' (skip=false)
- GitHub Check: Pytest (Fast)
- GitHub Check: Pytest (All, Python 3.10, Ubuntu)
- GitHub Check: Pytest (All, Python 3.11, Ubuntu)
🔇 Additional comments (26)
airbyte_cdk/sources/declarative/spec/spec.py (8)
5-7: Good standard library imports added!
Nice choice to import json and dataclasses.field here for the new functionality. The new type annotations for ConfigTransformation lists look good too.
9-12: Using orjson for performance is a nice touch.
Including orjson for message serialization and adding the config observation imports makes sense for the new migration features. Good job adding the minimum necessary imports.
18-25: Imports look well-organized.
Good organization of the imports, keeping related functionality together. The message repository additions align well with the new functionality being added.
28-31: Simple and clean ConfigMigration dataclass!
This looks like a nice abstraction to group related transformations with an optional description. Makes the migration intent clear.
48-50: Good use of field(default_factory=list)!
You've applied the suggestion from the past review comment to use default_factory=list instead of None defaults. This is a solid improvement that avoids having to check for None before iteration.
87-92: Good approach for immutability!
Creating a mutable copy of the config before processing is a nice way to preserve the original until you're sure changes are needed. The check at line 92 ensuring changes actually occurred before writing is also a great optimization.
101-108: Clean implementation of transform_config.
This method is straightforward and has good documentation. It iterates through the transformations and applies them directly to the provided config.
110-117: Well-implemented validate_config method.
The validate_config implementation is concise and follows the same pattern as transform_config. Good consistency across these related operations.
airbyte_cdk/sources/declarative/declarative_component_schema.yaml (7)
3806-3836: New config_normalization_rules property looks solid - well structured and properly optional?
The addition of config_normalization_rules to the Spec schema with defaults for each sub-property feels right, and you've used the correct camelCase additionalProperties. Everything aligns with our JSON Schema conventions. wdyt?
3838-3860: ConfigMigration definition sanity check - required fields are set
Nice to see you added required: [type, transformations] to enforce the presence of the type and transformations properties. This matches our best practices for schema definitions. wdyt?
4223-4250: DpathValidator schema enum & properties look correct
The DpathValidator now has enum: [DpathValidator] and the required properties (type, field_path, validation_strategy) are in place. This resolves the previous mismatch, and interpolation contexts are accurate. wdyt?
4253-4284: PredicateValidator schema looks consistent
Great job defining the PredicateValidator with the correct enum value and required fields (type, value, validation_strategy). The value type supports all JSON types as intended. wdyt?
4287-4325: ValidateAdheresToSchema schema definition is clear and complete
The ValidateAdheresToSchema object now correctly requires type and base_schema with appropriate types and interpolation contexts. Looks good for validating JSON schema adherence. wdyt?
4340-4378: ConfigRemapField enum and properties align with the pattern
Well done updating the ConfigRemapField enum to match the definition name and including the map and field_path requirements with context interpolation. This will integrate seamlessly with the component factory. wdyt?
4379-4407: ConfigAddFields schema follows the established pattern
The ConfigAddFields transformation correctly lists enum: [ConfigAddFields], enforces fields, and includes an optional condition. Interpolation contexts look appropriate. wdyt?
airbyte_cdk/sources/declarative/models/declarative_component_schema.py (11)
1526-1550: LGTM - Good implementation of schema validation component
The ValidateAdheresToSchema model provides a comprehensive framework for validating user configs against a JSON schema. The flexibility to accept either string or dict for base_schema is a nice touch that allows for runtime schema definition.
1553-1574: LGTM - Remapping utility looks good
The ConfigRemapField model provides a clean way to transform config field values using a mapping dictionary. The field path specification with nested path support is particularly useful.
1577-1594: LGTM - ConfigRemoveFields implementation is clean
The conditional field removal functionality will be very useful for handling environment-specific configurations. The examples in the documentation are helpful for understanding the feature.
1979-1981: LGTM - GraphQL request body wrapper
Simple and focused wrapper for GraphQL queries.
1984-2001: LGTM - Path-based validator
The DpathValidator implementation for validating specific config paths looks solid.
2004-2021: LGTM - Value-based validator
The PredicateValidator allows for flexible validation of configuration values directly or through interpolation.
2024-2040: LGTM - Field addition component looks good
The ConfigAddFields model provides a clean way to conditionally add fields to a configuration.
2098-2108: LGTM - Migration framework looks well-designed
The ConfigMigration model provides a comprehensive way to define ordered transformations for migrating configurations.
2110-2130: LGTM - Good organization of normalization components
The ConfigNormalizationRules model nicely aggregates the different types of configuration operations. Setting extra = Extra.forbid ensures strict schema adherence.
2133-2153: LGTM - Spec model updated appropriately
The Spec model has been properly extended to include the new config normalization rules.
2335-2335: Improved documentation clarity, nice!
The expanded description for schema_loader now clearly explains how multiple schema loaders work together and how precedence is handled.
Actionable comments posted: 0
🧹 Nitpick comments (4)
unit_tests/sources/declarative/spec/test_spec.py (4)
2-2: Copyright updated to 2025
Looking ahead a bit with that copyright year there 😄. Might want to update to the current year (2024) instead?
-# Copyright (c) 2025 Airbyte, Inc., all rights reserved.
+# Copyright (c) 2024 Airbyte, Inc., all rights reserved.
164-204: Well-structured fixture for migration testing, wdyt about adding a docstring?
Nice comprehensive fixture that mocks all the necessary dependencies for testing the migration functionality. It's a clean pattern to return a dictionary of mocks.
Could we add a docstring to explain what each mock is for? This would help other developers understand the purpose of each mock more easily, wdyt?
 @pytest.fixture
 def migration_mocks(monkeypatch):
+    """
+    Creates and configures mocks for testing config migrations.
+
+    Mocks message repositories, entrypoint, file operations, JSON serialization,
+    printing, and orjson serialization to enable isolated testing of migration logic.
+
+    Returns:
+        dict: A dictionary containing all configured mocks.
+    """
     mock_message_repository = Mock()
     mock_message_repository.consume_queue.return_value = [Mock()]
206-234: Test for unmigrated config migration looks good!
This test thoroughly verifies that unmigrated configs trigger the appropriate migration actions. It checks message emission, file writes, and serialization calls.
One small suggestion: perhaps verify the actual content that's being migrated? It would make the test even more robust to check that the configuration was transformed as expected, wdyt?
 spec.migrate_config(["spec"], migration_mocks["source"], input_config)
 migration_mocks["message_repository"].emit_message.assert_called_once()
 migration_mocks["open"].assert_called_once_with("/fake/config/path", "w")
 migration_mocks["json_dump"].assert_called_once()
 migration_mocks["print"].assert_called()
 migration_mocks["serializer_dump"].assert_called()
 migration_mocks["orjson_dumps"].assert_called()
 migration_mocks["decoded_bytes"].decode.assert_called()
+# Verify the config has been transformed
+assert input_config == {"planet": "Coruscant"}
347-376: Validation failure test - consider specifying exception type
Good test for validation failure. It would be even more robust to assert the specific type of exception that should be raised and possibly check the error message to ensure it provides helpful feedback to users, wdyt?
-    with pytest.raises(Exception):
+    with pytest.raises(ValueError, match="Field 'field_to_validate' expected type 'string' but got 'integer'"):
         spec.validate_config(input_config)
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
Cache: Disabled due to data retention organization setting
Knowledge Base: Disabled due to data retention organization setting
⛔ Files ignored due to path filters (1)
poetry.lock is excluded by !**/*.lock
📒 Files selected for processing (1)
unit_tests/sources/declarative/spec/test_spec.py (3 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (9)
- GitHub Check: Pytest (Fast)
- GitHub Check: Check: 'source-hardcoded-records' (skip=false)
- GitHub Check: Check: 'source-amplitude' (skip=false)
- GitHub Check: Check: 'source-pokeapi' (skip=false)
- GitHub Check: Check: 'source-shopify' (skip=false)
- GitHub Check: SDM Docker Image Build
- GitHub Check: Pytest (All, Python 3.11, Ubuntu)
- GitHub Check: Pytest (All, Python 3.10, Ubuntu)
- GitHub Check: Analyze (python)
🔇 Additional comments (3)
unit_tests/sources/declarative/spec/test_spec.py (3)
236-264: Test for already migrated config looks good!
Great complementary test case that verifies the negative path: when a config is already migrated, no actions should be taken. The test correctly verifies that no messages are emitted and no file operations or serializations are performed.
266-314: Comprehensive test for transformation sequence
This test nicely covers a complex sequence of transformations (adding fields, remapping values, removing fields). The only thing that seems a bit unusual is that you're initially setting both planet_name and planet_population to template values from planet_code and then remapping them separately.
Would it be clearer to directly set planet_name with a template for the name and planet_population with a template for the population? Or is this specific pattern important to test? Just curious!
316-345: Good validation success test
This test properly verifies that no exception is raised when validating a config that matches the schema. The JSON schema definition is well-structured.
What
- ValidateAdheresToSchema
- PredicateValidator
- DpathValidator
- ConfigRemapField
- ConfigAddFields
- ConfigRemoveFields
- ConfigMigration
- Spec runtime component, declarative component schema, and model to component factory method per new config migrations, transformations, and validations.
Recommended Review Order
- declarative_component_schema.py
- spec.py
- model_to_component_factory.py
- test_spec.py
Summary by CodeRabbit
New Features
Bug Fixes
Tests
Documentation