Conversation

@pnilan
Contributor

@pnilan pnilan commented May 8, 2025

What

  • Closes https://github.com/airbytehq/airbyte-internal-issues/issues/12697
  • Adds component schemas and creates model_to_component_factory methods for:
    • ValidateAdheresToSchema
    • PredicateValidator
    • DpathValidator
    • ConfigRemapField
    • ConfigAddFields
    • ConfigRemoveFields
    • ConfigMigration
  • Updates the Spec runtime component, the declarative component schema, and the model-to-component factory to support the new config migrations, transformations, and validations.

Recommended Review Order

  1. declarative_component_schema.py
  2. spec.py
  3. model_to_component_factory.py
  4. test_spec.py

Summary by CodeRabbit

  • New Features

    • Introduced support for declarative configuration migrations, transformations, and validations, allowing configs to be automatically updated, transformed, and validated before syncs.
    • Added new schema options to specify config normalization rules, including migrations, field additions/removals, remapping, and schema-based validations.
  • Bug Fixes

    • Improved type consistency for configuration field paths in transformation and validation components.
  • Tests

    • Added comprehensive tests for config migration, transformation, and validation logic, ensuring correct behavior and integration.
  • Documentation

    • Enhanced schema and model descriptions for improved clarity on configuration normalization features.

@github-actions github-actions bot added the enhancement New feature or request label May 8, 2025
@pnilan pnilan marked this pull request as ready for review May 12, 2025 18:09
Copilot AI review requested due to automatic review settings May 12, 2025 18:09
Contributor

Copilot AI left a comment

Pull Request Overview

This PR extends the Spec class by adding support for configuration migrations, transformations, and validations. Key changes include:

  • Adding new optional fields (config_migrations, config_transformations, config_validations) and a message repository to the Spec class.
  • Introducing new methods to migrate, transform, and validate the configuration.
  • Updating the component factory (ModelToComponentFactory) to propagate the new fields from the normalization rules.

Reviewed Changes

Copilot reviewed 19 out of 19 changed files in this pull request and generated 3 comments.

| File | Description |
| --- | --- |
| airbyte_cdk/sources/declarative/spec/spec.py | Extended Spec with new config migration/transformation/validation fields and corresponding methods. |
| airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py | Updated create_spec to forward new normalization rules fields. |
| pyproject.toml | Added extra dependencies. |

@coderabbitai
Contributor

coderabbitai bot commented May 12, 2025

📝 Walkthrough

This change introduces a declarative framework for configuration migration, transformation, and validation in Airbyte's CDK. It extends the specification schema and models to support config normalization rules, implements corresponding runtime logic in the Spec class, updates the component factory for instantiation, and adds comprehensive unit tests for migration, transformation, and validation workflows.

Changes

| File(s) | Change Summary |
| --- | --- |
| airbyte_cdk/sources/declarative/declarative_component_schema.yaml | Extended the Spec schema with config_normalization_rules, enabling declarative config migrations, transformations, and validations. Added new definitions for migration, transformation, and validation objects. |
| airbyte_cdk/sources/declarative/models/declarative_component_schema.py | Added Pydantic models for config migration, transformation, and validation. Updated the Spec model to include config_normalization_rules. Refined some field descriptions. |
| airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py | Added creation logic for new migration, transformation, and validation components. Updated mappings and create_spec to instantiate and attach these components to the Spec. |
| airbyte_cdk/sources/declarative/spec/spec.py | Added config_migrations, config_transformations, config_validations, and message_repository fields to Spec. Implemented migrate_config, transform_config, and validate_config methods. Introduced ConfigMigration dataclass. |
| airbyte_cdk/sources/declarative/transformations/config_transformations/remap_field.py | Changed field_path type annotation in ConfigRemapField from List[Union[InterpolatedString, str]] to List[str]. Removed unused imports. |
| airbyte_cdk/sources/declarative/validators/dpath_validator.py | Changed field_path type annotation in DpathValidator from List[Union[InterpolatedString, str]] to List[str]. |
| airbyte_cdk/sources/declarative/spec/__init__.py | Exported ConfigMigration alongside Spec in the module's public API. |
| airbyte_cdk/sources/declarative/transformations/config_transformations/add_fields.py | Removed local definitions of AddedFieldDefinition and ParsedAddFieldDefinition, importing them instead. Adjusted instantiation to explicitly pass parameters. Minor comment formatting. |
| unit_tests/sources/declarative/spec/test_spec.py | Added fixture and tests for config migration, transformation, and validation, including mocks for file and message operations. |
| unit_tests/sources/declarative/transformations/config_transformations/test_config_add_fields.py | Updated tests to explicitly pass value_type=None and parameters={} to AddedFieldDefinition. |

Sequence Diagram(s)

sequenceDiagram
    participant User
    participant Spec
    participant ConfigMigration
    participant ConfigTransformation
    participant Validator
    participant MessageRepository

    User->>Spec: migrate_config(args, source, config)
    Spec->>ConfigMigration: Apply transformations to config
    ConfigMigration-->>Spec: Return migrated config
    alt Config changed
        Spec->>MessageRepository: Emit config control message
        Spec->>User: Write migrated config to file
        Spec->>User: Print emitted messages
    end

    User->>Spec: transform_config(config)
    Spec->>ConfigTransformation: Apply all transformations
    ConfigTransformation-->>Spec: Return transformed config

    User->>Spec: validate_config(config)
    Spec->>Validator: Run all validations
    Validator-->>Spec: Raise if invalid, else return
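
For readers who prefer code to diagrams, the flow above can be sketched roughly as follows. This is a minimal illustration only: SpecSketch, RemapValue, and RequireKey are hypothetical stand-ins, the method signatures are simplified (the diagram's migrate_config also takes args and source), and a plain list replaces the message repository and the config file write.

```python
from copy import deepcopy
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class RemapValue:
    """Hypothetical transformation: swap a top-level value via a lookup map."""

    key: str
    mapping: Dict[str, Any]

    def transform(self, config: Dict[str, Any]) -> None:
        value = config.get(self.key)
        if isinstance(value, str) and value in self.mapping:
            config[self.key] = self.mapping[value]


@dataclass
class RequireKey:
    """Hypothetical validator: raise ValueError when a key is absent."""

    key: str

    def validate(self, config: Dict[str, Any]) -> None:
        if self.key not in config:
            raise ValueError(f"Missing required config key: {self.key}")


@dataclass
class SpecSketch:
    config_migrations: List[RemapValue] = field(default_factory=list)
    config_transformations: List[RemapValue] = field(default_factory=list)
    config_validations: List[RequireKey] = field(default_factory=list)
    # Assumption: stands in for the control message and the file write.
    persisted: List[Dict[str, Any]] = field(default_factory=list)

    def migrate_config(self, config: Dict[str, Any]) -> Dict[str, Any]:
        migrated = deepcopy(config)
        for migration in self.config_migrations:
            migration.transform(migrated)
        if migrated != config:  # only persist when something actually changed
            self.persisted.append(deepcopy(migrated))
        return migrated

    def transform_config(self, config: Dict[str, Any]) -> Dict[str, Any]:
        transformed = deepcopy(config)
        for transformation in self.config_transformations:
            transformation.transform(transformed)
        return transformed

    def validate_config(self, config: Dict[str, Any]) -> None:
        for validator in self.config_validations:
            validator.validate(config)  # raises ValueError if invalid


spec = SpecSketch(
    config_migrations=[RemapValue("auth_type", {"oauth": "oauth2.0"})],
    config_transformations=[RemapValue("region", {"eu": "eu-west-1"})],
    config_validations=[RequireKey("auth_type")],
)
config = spec.migrate_config({"auth_type": "oauth", "region": "eu"})
config = spec.transform_config(config)
spec.validate_config(config)
```

In this sketch only the migration result is persisted; the transformation is applied in memory, which mirrors the "Config changed" branch appearing only under migrate_config in the diagram.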

Would you like a breakdown of how these new config normalization rules could be composed in a real-world connector spec, or perhaps an example migration scenario? Wdyt?

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 4

🔭 Outside diff range comments (1)
airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py (1)

3469-3478: ⚠️ Potential issue

Critical: Fix create_spec to match Spec signature and handle optional rules.
The Spec constructor now expects config_transformations and config_validations (not transformations/validations), and model.config_normalization_rules may be None, causing attribute errors. Could we update this block accordingly? For example:

-        return Spec(
-            connection_specification=model.connection_specification,
-            documentation_url=model.documentation_url,
-            advanced_auth=model.advanced_auth,
-            parameters={},
-            config_migrations=model.config_normalization_rules.config_migrations,
-            transformations=model.config_normalization_rules.transformations,
-            validations=model.config_normalization_rules.validations,
-        )
+        return Spec(
+            connection_specification=model.connection_specification,
+            documentation_url=model.documentation_url,
+            advanced_auth=model.advanced_auth,
+            parameters={},
+            config_migrations=model.config_normalization_rules.config_migrations if model.config_normalization_rules else [],
+            config_transformations=model.config_normalization_rules.transformations if model.config_normalization_rules else [],
+            config_validations=model.config_normalization_rules.validations if model.config_normalization_rules else [],
+        )

This change should resolve the mypy errors about unexpected keyword arguments and guard against None normalization rules. wdyt?
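
The None guard can also be factored into one helper instead of three inline conditionals. A minimal sketch, assuming a hypothetical NormalizationRulesModel stand-in for the parsed model and a hypothetical spec_kwargs helper; it does not address the separate arg-type error, which suggests the model objects still need to be turned into runtime components:

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional


@dataclass
class NormalizationRulesModel:
    """Hypothetical stand-in for the parsed config_normalization_rules model."""

    config_migrations: Optional[List[Any]] = None
    transformations: Optional[List[Any]] = None
    validations: Optional[List[Any]] = None


def spec_kwargs(rules: Optional[NormalizationRulesModel]) -> Dict[str, List[Any]]:
    """Map model field names to the keyword names Spec expects.

    Missing rules, or a missing list inside the rules, resolve to [].
    """
    return {
        "config_migrations": (rules.config_migrations or []) if rules else [],
        "config_transformations": (rules.transformations or []) if rules else [],
        "config_validations": (rules.validations or []) if rules else [],
    }
```

The result could then be splatted into the constructor call, e.g. `Spec(..., **spec_kwargs(model.config_normalization_rules))`.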

🧰 Tools
🪛 GitHub Actions: Linters

[error] 3469-3469: mypy error: Unexpected keyword argument "transformations" for "Spec"; did you mean "config_transformations"? [call-arg]


[error] 3469-3469: mypy error: Unexpected keyword argument "validations" for "Spec"; did you mean "config_validations"? [call-arg]


[error] 3474-3474: mypy error: Item "None" of "ConfigNormalizationRules | None" has no attribute "config_migrations" [union-attr]


[error] 3474-3474: mypy error: Argument "config_migrations" to "Spec" has incompatible type "list[RemapField] | Any | None"; expected "list[ConfigTransformation] | None" [arg-type]


[error] 3475-3475: mypy error: Item "None" of "ConfigNormalizationRules | None" has no attribute "transformations" [union-attr]


[error] 3476-3476: mypy error: Item "None" of "ConfigNormalizationRules | None" has no attribute "validations" [union-attr]

🧹 Nitpick comments (18)
airbyte_cdk/sources/declarative/transformations/config_transformations/config_transformation.py (1)

9-24: Consider using MutableMapping instead of Dict for better flexibility?

The implementation looks great! One small suggestion - I noticed in the remap_field.py implementation (from the snippets), you're using MutableMapping[str, Any] for the config parameter, but here you're using Dict[str, Any]. Using MutableMapping would be more flexible and consistent with the implementations. Wdyt?

- from typing import Any, Dict
+ from typing import Any, Dict, MutableMapping

@abstractmethod
def transform(
    self,
-   config: Dict[str, Any],
+   config: MutableMapping[str, Any],
) -> None:
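
To illustrate why MutableMapping is the more flexible annotation, here is a small sketch with hypothetical names (ConfigTransformationSketch, DropKey). The same transformation works on a plain dict and on any other mutable mapping, such as collections.UserDict:

```python
from abc import ABC, abstractmethod
from collections import UserDict
from typing import Any, MutableMapping


class ConfigTransformationSketch(ABC):
    """Hypothetical base class mirroring the suggested signature."""

    @abstractmethod
    def transform(self, config: MutableMapping[str, Any]) -> None:
        """Mutate the config in place."""


class DropKey(ConfigTransformationSketch):
    """Hypothetical transformation that removes one top-level key."""

    def __init__(self, key: str) -> None:
        self.key = key

    def transform(self, config: MutableMapping[str, Any]) -> None:
        config.pop(self.key, None)  # no error if the key is absent


plain = {"token": "abc", "debug": True}
wrapped = UserDict({"token": "abc", "debug": True})  # not a dict subclass
DropKey("debug").transform(plain)
DropKey("debug").transform(wrapped)
```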
airbyte_cdk/sources/declarative/validators/predicate_validator.py (1)

11-26: Consider aligning validate method signature with other validators?

The implementation looks clean and follows good composition practices! Based on the snippets from other validators, I noticed that other validators implement a validate method that takes an input_data parameter. Would it make sense to align the method signature here for consistency across validators? Something like:

- def validate(self) -> None:
+ def validate(self, input_data: Any = None) -> None:
    """
    Applies the validation strategy to the value.

    :raises ValueError: If validation fails
    """
    self.strategy.validate(self.value)

This way all validators would have a consistent interface, even if this particular implementation ignores the input. Wdyt?
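
A rough sketch of what the aligned interface buys: with a shared signature, a caller can loop over mixed validators without special-casing the predicate one. All class names below are hypothetical stand-ins, not the CDK classes:

```python
from abc import ABC, abstractmethod
from typing import Any, Iterable, Mapping


class NonEmpty:
    """Hypothetical strategy: rejects None and empty strings."""

    def validate(self, value: Any) -> None:
        if value is None or value == "":
            raise ValueError("value must not be empty")


class ValidatorSketch(ABC):
    @abstractmethod
    def validate(self, input_data: Any) -> None:
        """Raise ValueError when validation fails."""


class PredicateValidatorSketch(ValidatorSketch):
    """Validates a fixed value; input_data is accepted but ignored."""

    def __init__(self, value: Any, strategy: NonEmpty) -> None:
        self.value = value
        self.strategy = strategy

    def validate(self, input_data: Any = None) -> None:
        self.strategy.validate(self.value)


class KeyValidatorSketch(ValidatorSketch):
    """Validates one top-level key of the input mapping."""

    def __init__(self, key: str, strategy: NonEmpty) -> None:
        self.key = key
        self.strategy = strategy

    def validate(self, input_data: Mapping[str, Any]) -> None:
        self.strategy.validate(input_data.get(self.key))


def run_validators(validators: Iterable[ValidatorSketch], config: Mapping[str, Any]) -> None:
    for validator in validators:
        validator.validate(config)  # same call shape for every validator
```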

airbyte_cdk/sources/declarative/validators/dpath_validator.py (3)

25-33: Consider simplifying the field_path conversion logic

There appears to be redundancy in the way you're handling the field_path conversion. You first create a new list with all paths converted, and then iterate through the original list again to convert string elements. Could this be simplified to a single pass approach, wdyt?

- self._field_path = [
-     InterpolatedString.create(path, parameters={}) for path in self.field_path
- ]
- for path_index in range(len(self.field_path)):
-     if isinstance(self.field_path[path_index], str):
-         self._field_path[path_index] = InterpolatedString.create(
-             self.field_path[path_index], parameters={}
-         )
+ self._field_path = [
+     InterpolatedString.create(path, parameters={}) for path in self.field_path
+ ]

47-59: Consider consolidating duplicate error handling

The error handling logic for both wildcard and non-wildcard paths is duplicated. Maybe you could refactor this to reduce duplication and improve maintainability, wdyt?

- if "*" in path:
-     try:
-         values = dpath.values(input_data, path)
-         for value in values:
-             self.strategy.validate(value)
-     except KeyError as e:
-         raise ValueError(f"Error validating path '{self.field_path}': {e}")
- else:
-     try:
-         value = dpath.get(input_data, path)
-         self.strategy.validate(value)
-     except KeyError as e:
-         raise ValueError(f"Error validating path '{self.field_path}': {e}")
+ try:
+     if "*" in path:
+         for value in dpath.values(input_data, path):
+             self.strategy.validate(value)
+     else:
+         self.strategy.validate(dpath.get(input_data, path))
+ except KeyError as e:
+     raise ValueError(f"Error validating path '{self.field_path}': {e}")

35-59: Add validation for input_data type

The method assumes input_data is a dictionary without explicitly checking. Adding a type check could prevent cryptic errors if a non-dict value is passed.

def validate(self, input_data: dict[str, Any]) -> None:
+   if not isinstance(input_data, dict):
+       raise ValueError(f"Expected dictionary input, got {type(input_data).__name__}")
    
    path = [path.eval({}) for path in self._field_path]
    # rest of the method...
unit_tests/sources/declarative/transformations/config_transformations/test_remap_field.py (1)

93-96: Consider extending exception test to verify message

The test confirms an exception is raised with empty field path, but doesn't verify the exception message. Would it be helpful to also check that the exception message matches expectations, wdyt?

with pytest.raises(Exception) as exc_info:
    RemapField(field_path=[], map={"old_value": "new_value"})
+ assert "field_path cannot be empty" in str(exc_info.value)
unit_tests/sources/declarative/validators/test_validate_adheres_to_schema.py (1)

119-131: Consider adding invalid JSON string test

You're testing validation with a valid JSON string, which is great. Would it also be valuable to test with an invalid JSON string to verify appropriate error handling, wdyt?

def test_given_invalid_json_string_when_validate_then_raises_error(self):
    schema = {"type": "object"}
    validator = ValidateAdheresToSchema(schema=schema)
    
    with pytest.raises(ValueError) as exc_info:
        validator.validate('{"invalid json')
    
    assert "Invalid JSON" in str(exc_info.value)
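
For context, the string-handling step such a test would exercise might look like the sketch below. It assumes the strategy wraps json.JSONDecodeError in a ValueError whose message contains "Invalid JSON"; the actual message in validate_adheres_to_schema.py is not shown in this thread, so the assertion text above would need to match whatever the implementation really raises. parse_if_string is a hypothetical helper name.

```python
import json
from typing import Any


def parse_if_string(value: Any) -> Any:
    """Decode JSON strings; pass every other value through unchanged."""
    if isinstance(value, str):
        try:
            return json.loads(value)
        except json.JSONDecodeError as error:
            # Assumption: surface decode failures as ValueError so callers
            # only need to handle one exception type.
            raise ValueError(f"Invalid JSON string: {error}") from error
    return value
```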
airbyte_cdk/sources/declarative/transformations/config_transformations/remap_field.py (3)

23-33: Simplify field_path conversion logic

Similar to the comment in DpathValidator, there's redundancy in the way field_path is handled. You create a new list with all elements converted, then iterate again to convert string elements. Could this be simplified to a single approach, wdyt?

- self._field_path = [
-     InterpolatedString.create(path, parameters={}) for path in self.field_path
- ]
- for path_index in range(len(self.field_path)):
-     if isinstance(self.field_path[path_index], str):
-         self._field_path[path_index] = InterpolatedString.create(
-             self.field_path[path_index], parameters={}
-         )
+ self._field_path = [
+     InterpolatedString.create(path, parameters={}) for path in self.field_path
+ ]

24-25: Improve error message for empty field path

The error message could be more descriptive about why empty paths aren't allowed.

if not self.field_path:
-   raise Exception("field_path cannot be empty.")
+   raise ValueError("field_path cannot be empty. A valid path is required to identify the field to remap.")

59-60: Maybe add support for non-string map keys?

Currently, the map lookup assumes string keys. If there's a chance of non-string values in the field being remapped (like integers), would it be valuable to add type conversion, wdyt?

if field_name in current and current[field_name] in self.map:
    current[field_name] = self.map[current[field_name]]
+ elif field_name in current and str(current[field_name]) in self.map:
+    current[field_name] = self.map[str(current[field_name])]
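
A standalone sketch of the fallback, for discussion. remap_field here is a hypothetical free function rather than the RemapField.transform method, and it skips dict and list values because they cannot be map keys. One caveat with the str() fallback: the integer 1 and the string "1" become indistinguishable, which may or may not be desirable.

```python
from collections.abc import MutableMapping
from typing import Any, List, Mapping


def remap_field(config: Any, field_path: List[str], mapping: Mapping[Any, Any]) -> None:
    """Replace the value at field_path using mapping, in place."""
    if not field_path:
        raise ValueError("field_path cannot be empty.")

    current = config
    for key in field_path[:-1]:
        if not isinstance(current, MutableMapping) or key not in current:
            return  # path does not exist: leave the config untouched
        current = current[key]

    field_name = field_path[-1]
    if not isinstance(current, MutableMapping) or field_name not in current:
        return

    value = current[field_name]
    if isinstance(value, (dict, list)):
        return  # unhashable containers can never match a map key
    if value in mapping:
        current[field_name] = mapping[value]  # exact match wins
    elif str(value) in mapping:
        current[field_name] = mapping[str(value)]  # fallback for e.g. ints
```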
unit_tests/sources/declarative/validators/test_dpath_validator.py (1)

77-93: Redundant assertion in wildcard test.

There's a duplicate assertion at line 92 that's using unittest style after already using pytest style at line 91. Consider removing one of these assertions for clarity, wdyt?

  assert strategy.validate_called
  assert strategy.validated_value in ["user1@example.com", "user2@example.com"]
- self.assertIn(strategy.validated_value, ["user1@example.com", "user2@example.com"])
unit_tests/sources/declarative/validators/test_predicate_validator.py (1)

1-56: Consider adding tests for edge cases.

The tests look solid for the main use cases, but perhaps consider adding tests for edge cases like None values or other special cases that might occur in real configurations, wdyt?

def test_given_none_value_when_validate_then_validation_occurs():
    strategy = MockValidationStrategy()
    validator = PredicateValidator(value=None, strategy=strategy)
    
    validator.validate()
    
    assert strategy.validate_called
    assert strategy.validated_value is None
airbyte_cdk/sources/declarative/declarative_component_schema.yaml (2)

3806-3832: Consider extracting config_normalization_rules to a reusable definition?
Rather than inlining the schema under Spec, would it be clearer to define ConfigNormalizationRules under definitions and reference it here for reuse and consistency? wdyt?


4310-4339: Add default empty-list values for config_* arrays?
A lot of our YAML arrays (e.g., state_migrations) include default: [] to simplify downstream logic. Would you consider adding default: [] to config_migrations, transformations, and validations so they always resolve to a list? wdyt?

airbyte_cdk/sources/declarative/spec/spec.py (3)

71-74: Should we short-circuit when no migrations are configured?

After adopting default_factory=list, we could still save a needless copy when the list is empty.

if not self.config_migrations:
    return  # nothing to migrate

Minor, but avoids touching the file and emitting control messages when no-op, wdyt?
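
Combined with default_factory=list, the early return could look roughly like this. MigratingSpec and the Migration alias are hypothetical, the method returns the config for easier testing (the snippet above returns nothing), and copies_made exists only to make the saved copy observable in the sketch:

```python
from copy import deepcopy
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List

# Hypothetical: a migration is any callable that mutates the config in place.
Migration = Callable[[Dict[str, Any]], None]


@dataclass
class MigratingSpec:
    # default_factory gives every instance its own empty list, never None.
    config_migrations: List[Migration] = field(default_factory=list)
    copies_made: int = 0  # instrumentation for this sketch only

    def migrate_config(self, config: Dict[str, Any]) -> Dict[str, Any]:
        if not self.config_migrations:
            return config  # nothing to migrate: no copy, no write, no message
        self.copies_made += 1
        migrated = deepcopy(config)
        for migrate in self.config_migrations:
            migrate(migrated)
        return migrated


def add_region(config: Dict[str, Any]) -> None:
    config.setdefault("region", "us")
```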

🧰 Tools
🪛 GitHub Actions: Linters

[error] 72-72: mypy error: Item "None" of "list[ConfigTransformation] | None" has no attribute "__iter__" (not iterable) [union-attr]


83-96: Mirror the migration improvements in transform_config.

With the default_factory=list change, both loops become safe, but adding an early return keeps things tidy and avoids an unnecessary dict copy when no transformations exist.

🧰 Tools
🪛 GitHub Actions: Linters

[error] 92-92: mypy error: Item "None" of "list[ConfigTransformation] | None" has no attribute "__iter__" (not iterable) [union-attr]


98-105: Return early (or raise) on failed validations?

Right now we iterate through validators but never surface which one failed. Would it make sense to accumulate exceptions and raise an aggregate (or raise immediately) so users know exactly what went wrong? Happy to sketch code if useful, wdyt?
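
One possible shape for the aggregate option, as a sketch only. It assumes validators signal failure by raising ValueError, and the RequireKey class and message format are made up for illustration:

```python
from typing import Any, Iterable, List, Mapping, Protocol


class ValidatorLike(Protocol):
    def validate(self, input_data: Any) -> None: ...


class RequireKey:
    """Hypothetical validator used only to exercise the sketch."""

    def __init__(self, key: str) -> None:
        self.key = key

    def validate(self, input_data: Any) -> None:
        if self.key not in input_data:
            raise ValueError(f"missing key '{self.key}'")


def validate_config(validators: Iterable[ValidatorLike], config: Mapping[str, Any]) -> None:
    """Run every validator, then raise once with all failures listed."""
    failures: List[str] = []
    for index, validator in enumerate(validators):
        try:
            validator.validate(config)
        except ValueError as error:
            # Record which validator failed instead of stopping at the first.
            failures.append(f"[{index}] {type(validator).__name__}: {error}")
    if failures:
        raise ValueError("Config validation failed:\n" + "\n".join(failures))
```

Raising immediately is simpler and is what a bare loop already does when a validator raises; collecting failures mainly helps when users would otherwise fix one config error at a time.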

🧰 Tools
🪛 GitHub Actions: Linters

[error] 104-104: mypy error: Item "None" of "list[Validator] | None" has no attribute "__iter__" (not iterable) [union-attr]

airbyte_cdk/sources/declarative/models/declarative_component_schema.py (1)

1972-1975: Duplicate GraphQL request-body types – intentional?

There is already a RequestBodyGraphQlQuery class (note the lowercase l) a few hundred lines above.
Adding RequestBodyGraphQL introduces two near-identical schema nodes, which might confuse the manifest authors and muddy auto-completion.

Would it be simpler to consolidate both into a single well-named class (e.g. RequestBodyGraphQL only) and deprecate the other? Happy to suggest a deprecation alias if backwards compatibility is required, wdyt?

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between bcfcf04 and a29e424.

⛔ Files ignored due to path filters (1)
  • poetry.lock is excluded by !**/*.lock
📒 Files selected for processing (18)
  • airbyte_cdk/sources/declarative/declarative_component_schema.yaml (2 hunks)
  • airbyte_cdk/sources/declarative/models/declarative_component_schema.py (13 hunks)
  • airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py (1 hunks)
  • airbyte_cdk/sources/declarative/spec/spec.py (3 hunks)
  • airbyte_cdk/sources/declarative/transformations/config_transformations/__init__.py (1 hunks)
  • airbyte_cdk/sources/declarative/transformations/config_transformations/config_transformation.py (1 hunks)
  • airbyte_cdk/sources/declarative/transformations/config_transformations/remap_field.py (1 hunks)
  • airbyte_cdk/sources/declarative/validators/__init__.py (1 hunks)
  • airbyte_cdk/sources/declarative/validators/dpath_validator.py (1 hunks)
  • airbyte_cdk/sources/declarative/validators/predicate_validator.py (1 hunks)
  • airbyte_cdk/sources/declarative/validators/validate_adheres_to_schema.py (1 hunks)
  • airbyte_cdk/sources/declarative/validators/validation_strategy.py (1 hunks)
  • airbyte_cdk/sources/declarative/validators/validator.py (1 hunks)
  • pyproject.toml (1 hunks)
  • unit_tests/sources/declarative/transformations/config_transformations/test_remap_field.py (1 hunks)
  • unit_tests/sources/declarative/validators/test_dpath_validator.py (1 hunks)
  • unit_tests/sources/declarative/validators/test_predicate_validator.py (1 hunks)
  • unit_tests/sources/declarative/validators/test_validate_adheres_to_schema.py (1 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (9)
airbyte_cdk/sources/declarative/transformations/config_transformations/__init__.py (1)
airbyte_cdk/sources/declarative/transformations/config_transformations/remap_field.py (1)
  • RemapField (15-60)
airbyte_cdk/sources/declarative/validators/predicate_validator.py (2)
airbyte_cdk/sources/declarative/validators/validation_strategy.py (2)
  • ValidationStrategy (9-22)
  • validate (15-22)
airbyte_cdk/sources/declarative/validators/validator.py (1)
  • validate (11-18)
airbyte_cdk/sources/declarative/transformations/config_transformations/config_transformation.py (1)
airbyte_cdk/sources/declarative/transformations/config_transformations/remap_field.py (1)
  • transform (35-60)
airbyte_cdk/sources/declarative/validators/__init__.py (5)
airbyte_cdk/sources/declarative/validators/dpath_validator.py (1)
  • DpathValidator (16-59)
airbyte_cdk/sources/declarative/validators/predicate_validator.py (1)
  • PredicateValidator (12-26)
airbyte_cdk/sources/declarative/validators/validate_adheres_to_schema.py (1)
  • ValidateAdheresToSchema (15-39)
airbyte_cdk/sources/declarative/validators/validation_strategy.py (1)
  • ValidationStrategy (9-22)
airbyte_cdk/sources/declarative/validators/validator.py (1)
  • Validator (9-18)
airbyte_cdk/sources/declarative/validators/validate_adheres_to_schema.py (4)
airbyte_cdk/sources/declarative/validators/validation_strategy.py (2)
  • ValidationStrategy (9-22)
  • validate (15-22)
airbyte_cdk/sources/declarative/validators/dpath_validator.py (1)
  • validate (35-59)
airbyte_cdk/sources/declarative/validators/predicate_validator.py (1)
  • validate (20-26)
airbyte_cdk/sources/declarative/validators/validator.py (1)
  • validate (11-18)
airbyte_cdk/sources/declarative/validators/validation_strategy.py (4)
airbyte_cdk/sources/declarative/validators/dpath_validator.py (1)
  • validate (35-59)
airbyte_cdk/sources/declarative/validators/validate_adheres_to_schema.py (1)
  • validate (22-39)
airbyte_cdk/sources/declarative/validators/predicate_validator.py (1)
  • validate (20-26)
airbyte_cdk/sources/declarative/validators/validator.py (1)
  • validate (11-18)
unit_tests/sources/declarative/transformations/config_transformations/test_remap_field.py (1)
airbyte_cdk/sources/declarative/transformations/config_transformations/remap_field.py (2)
  • RemapField (15-60)
  • transform (35-60)
airbyte_cdk/sources/declarative/transformations/config_transformations/remap_field.py (2)
airbyte_cdk/sources/declarative/interpolation/interpolated_string.py (1)
  • InterpolatedString (13-79)
airbyte_cdk/sources/declarative/transformations/config_transformations/config_transformation.py (2)
  • ConfigTransformation (9-23)
  • transform (15-23)
unit_tests/sources/declarative/validators/test_predicate_validator.py (2)
airbyte_cdk/sources/declarative/validators/validation_strategy.py (1)
  • ValidationStrategy (9-22)
unit_tests/sources/declarative/validators/test_dpath_validator.py (2)
  • MockValidationStrategy (9-20)
  • validate (16-20)
🪛 GitHub Actions: Dependency Analysis
pyproject.toml

[error] 1-1: DEP002 'dagger-io' defined as a dependency but not used in the codebase


[error] 1-1: DEP002 'anyio' defined as a dependency but not used in the codebase

🪛 GitHub Actions: Linters
airbyte_cdk/sources/declarative/spec/spec.py

[error] 72-72: mypy error: Item "None" of "list[ConfigTransformation] | None" has no attribute "__iter__" (not iterable) [union-attr]


[error] 76-76: mypy error: Argument 1 to "open" has incompatible type "Any | None"; expected "int | str | bytes | PathLike[str] | PathLike[bytes]" [arg-type]


[error] 80-80: mypy error: "AirbyteMessage" has no attribute "json" [attr-defined]


[error] 92-92: mypy error: Item "None" of "list[ConfigTransformation] | None" has no attribute "__iter__" (not iterable) [union-attr]


[error] 104-104: mypy error: Item "None" of "list[Validator] | None" has no attribute "__iter__" (not iterable) [union-attr]

airbyte_cdk/sources/declarative/models/declarative_component_schema.py

[error] 1-1: Ruff formatting check failed. File would be reformatted. Run 'ruff format' to fix code style issues.

⏰ Context from checks skipped due to timeout of 90000ms (4)
  • GitHub Check: Check: 'source-shopify' (skip=false)
  • GitHub Check: Check: 'source-hardcoded-records' (skip=false)
  • GitHub Check: Pytest (All, Python 3.11, Ubuntu)
  • GitHub Check: Pytest (All, Python 3.10, Ubuntu)
🔇 Additional comments (28)
pyproject.toml (1)

93-94:

❌ Incorrect review comment

Dependencies added but not being used?

The pipeline analysis indicates that dagger-io and anyio dependencies are declared but not used in the codebase. Are these dependencies going to be used in future PRs or are they needed for the config migrations feature that's currently being implemented? Maybe these are transitive dependencies needed by other new code?


🏁 Script executed:

#!/bin/bash
# Let's check if these packages are imported anywhere in the code but the static analysis is missing it

echo "Checking for dagger-io imports..."
rg -l "import dagger" || echo "No direct dagger imports found"
rg -l "from dagger" || echo "No direct dagger imports found"

echo -e "\nChecking for anyio imports..."
rg -l "import anyio" || echo "No direct anyio imports found"
rg -l "from anyio" || echo "No direct anyio imports found"

echo -e "\nChecking the broader context of what's being added in this PR..."
fd -e py -e yaml -c 'git diff' | head -n 100

Length of output: 793


dagger-io & anyio are utilized in the bin script
bin/generate_component_manifest_files.py imports both dagger and anyio.

These dependencies are in use there, so they’re not unused. Does this cover all usage, or are you planning further integration? wdyt?

Likely an incorrect or invalid review comment.

airbyte_cdk/sources/declarative/transformations/config_transformations/__init__.py (1)

1-7: LGTM!

Clean and concise implementation of the module's __init__.py file. It follows best practices by explicitly declaring what symbols are exported from the module using __all__.

airbyte_cdk/sources/declarative/validators/validator.py (1)

1-18: LGTM!

Well-designed abstract base class with clear documentation. The method signature and docstring provide good guidance for implementers on how validation should be performed.

airbyte_cdk/sources/declarative/validators/__init__.py (1)

1-19: Looks good to me!
The new __all__ cleanly exposes the validator abstractions as intended. Nice work!

airbyte_cdk/sources/declarative/validators/validation_strategy.py (1)

9-22: Well-designed abstract base class for validation strategies!

Good job creating a clean, concise interface for validation strategies with clear documentation and proper error handling expectations. This abstract class provides a solid foundation for the validation framework.

airbyte_cdk/sources/declarative/validators/validate_adheres_to_schema.py (1)

14-39: Solid implementation of the ValidationStrategy for JSON schema validation!

The implementation handles both string and non-string inputs gracefully, with clear error handling and good separation of concerns. Nice job adding the string-to-JSON conversion logic to make this validator more flexible.
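
As an illustration of the string handling mentioned here, a minimal sketch might look like the following. It assumes string inputs are meant to be parsed as JSON before being checked against the schema; `normalize_value_sketch` is a hypothetical helper, not the validator's actual code, and the schema check itself is left out.

```python
import json
from typing import Any


def normalize_value_sketch(value: Any) -> Any:
    """Hypothetical helper: parse string inputs as JSON, pass other values through."""
    if isinstance(value, str):
        try:
            return json.loads(value)
        except json.JSONDecodeError as e:
            # Surface a clear error instead of letting a raw decode error escape.
            raise ValueError(f"Invalid JSON string: {e}") from e
    return value
```

Whether a non-JSON string should be rejected or validated as a plain string is a design choice this sketch does not settle.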

airbyte_cdk/sources/declarative/validators/dpath_validator.py (1)

43-43: Should path evaluation consider the input context?

I notice you're evaluating the path using an empty context {}. Is this intentional, or should it use input_data for interpolation? This might be important if the path needs to be dynamically resolved based on the input.

unit_tests/sources/declarative/transformations/config_transformations/test_remap_field.py (4)

11-36: The test cases look thorough and well-structured

Good job testing both the positive case and verifying that the original config is preserved.


37-51: Great edge case coverage

This test effectively verifies that values not found in the mapping remain unchanged, which is an important edge case.


68-82: Well-designed test for interpolated path functionality

This test effectively demonstrates that fields can be remapped when paths contain dynamic components.


97-112: Good test for complex interactions

This test effectively verifies that multiple transformations can be applied sequentially without interference, which is important for real-world scenarios with multiple config transformations.

unit_tests/sources/declarative/validators/test_validate_adheres_to_schema.py (3)

11-29: Well-structured test for successful validation

The test effectively verifies that valid data passes schema validation.


30-52: Great test for required field validation

This test effectively verifies that missing required fields result in appropriate validation errors with descriptive messages.


80-90: Good edge case handling for invalid schema

Testing with an invalid schema and checking for a specific error type demonstrates robust error handling.

airbyte_cdk/sources/declarative/transformations/config_transformations/remap_field.py (1)

48-56: Efficient path traversal logic

The navigation through nested dictionary structures is well-implemented with appropriate checks for existence and type. This makes the transformation safely handle incomplete or unexpected configurations.

unit_tests/sources/declarative/validators/test_dpath_validator.py (5)

9-21: Clean mock implementation for ValidationStrategy.

The MockValidationStrategy implementation provides a clear way to track validation calls and simulate both success and failure cases. Good approach with tracking validation call state for verification in tests.


24-34: Test for happy path looks good.

The test covers the expected behavior when both the path and validation are valid. The assertions effectively verify that the strategy was called and received a value.


35-46: Good error handling test.

This test correctly verifies that the validator raises a ValueError with an appropriate message when the path doesn't exist, and that the strategy isn't called in this case.


47-59: Well-structured strategy failure test.

Nice job testing that validation errors from the strategy are properly propagated. The assertions verify both the error and that the strategy was called with the correct value.


60-76: Good boundary condition tests.

These tests for empty path and empty input data are valuable edge cases that ensure robust error handling in the validator.

unit_tests/sources/declarative/validators/test_predicate_validator.py (4)

9-21: Well-designed mock implementation.

The MockValidationStrategy is consistent with the one in the test_dpath_validator.py file and properly implements the ValidationStrategy interface. This consistency across test files helps maintain clarity.


24-33: Good basic validation test.

This test effectively verifies that the validator passes the value to the strategy correctly and that validation succeeds when expected.


34-46: Thorough error handling test.

The test properly validates that errors from the strategy are propagated and that the error message is preserved in the exception. The additional assertions confirm the strategy was called with the right value.


47-56: Good test for complex objects.

Testing with a nested object is important since validators should handle complex data structures. The test confirms that the entire object is passed to the strategy unchanged.

airbyte_cdk/sources/declarative/declarative_component_schema.yaml (2)

4224-4257: PredicateValidator schema looks solid
The PredicateValidator definition aligns well with our validation abstractions and mirrors the Pydantic model perfectly. Nice work—no changes needed here, wdyt?


4259-4297: ValidateAdheresToSchema definition is good to go
This validator supports both string and object schemas as intended. Everything appears consistent and complete. wdyt?

airbyte_cdk/sources/declarative/spec/spec.py (1)

75-81: AirbyteMessage.json() isn’t in stubs – wrap with cast or use model_dump_json.

mypy flags json as missing. One option:

-for message in self.message_repository.consume_queue():
-    print(message.json(exclude_unset=True))
+for message in self.message_repository.consume_queue():
+    # mypy: AirbyteMessage inherits from pydantic.BaseModel, so json() exists.
+    print(message.model_dump_json(exclude_unset=True))  # type: ignore[attr-defined]

This silences mypy, though note that it also switches the call from json() to model_dump_json(). Another approach is a cast to BaseModel.

🧰 Tools
🪛 GitHub Actions: Linters

[error] 76-76: mypy error: Argument 1 to "open" has incompatible type "Any | None"; expected "int | str | bytes | PathLike[str] | PathLike[bytes]" [arg-type]


[error] 80-80: mypy error: "AirbyteMessage" has no attribute "json" [attr-defined]

airbyte_cdk/sources/declarative/models/declarative_component_schema.py (1)

2069-2087: Ruff formatting failed – run ruff format.

The CI job reports a single formatting violation in this file. Running ruff format airbyte_cdk/sources/declarative/models/declarative_component_schema.py (or pre-commit run --all-files) will auto-fix whitespace and other formatting issues.

@pnilan pnilan changed the base branch from main to pnilan/feat/implement-validators May 12, 2025 18:15
Base automatically changed from pnilan/feat/implement-validators to main May 14, 2025 16:16
@pnilan pnilan requested review from brianjlai and maxi297 May 15, 2025 00:20

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 5

🧹 Nitpick comments (9)
airbyte_cdk/sources/declarative/spec/spec.py (2)

72-100: Well-implemented config migration method

The migrate_config method is well structured to:

  1. Extract config path from CLI args
  2. Apply transformations to a copy of the config
  3. Only write back changes and emit messages if the config actually changed

One suggestion: Should we include a try/except block around the file operations to handle potential I/O errors gracefully? What do you think?

-        if mutable_config != config:
-            with open(config_path, "w") as f:
-                json.dump(mutable_config, f)
-            self.message_repository.emit_message(
-                create_connector_config_control_message(mutable_config)
-            )
-            for message in self.message_repository.consume_queue():
-                print(orjson.dumps(AirbyteMessageSerializer.dump(message)).decode())
+        if mutable_config != config:
+            try:
+                with open(config_path, "w") as f:
+                    json.dump(mutable_config, f)
+                self.message_repository.emit_message(
+                    create_connector_config_control_message(mutable_config)
+                )
+                for message in self.message_repository.consume_queue():
+                    print(orjson.dumps(AirbyteMessageSerializer.dump(message)).decode())
+            except IOError as e:
+                print(f"Error writing migrated config to {config_path}: {e}")
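
The flow described in this comment (copy, transform, write back only on change, guard the file write) can be sketched roughly as below. This is a minimal illustration under stated assumptions, not the CDK implementation: `migrate_config_sketch`, `Transformation`, and `rename_api_token` are hypothetical names, and emitting the control message is omitted.

```python
import json
import os
import tempfile
from typing import Any, Callable, Dict, List, Optional

# Hypothetical stand-in for a config transformation: mutates the dict in place.
Transformation = Callable[[Dict[str, Any]], None]


def migrate_config_sketch(
    config_path: Optional[str],
    config: Dict[str, Any],
    transformations: List[Transformation],
) -> bool:
    """Apply transformations to a copy; rewrite the file only if something changed.

    Returns True when the file was rewritten.
    """
    if not config_path:
        return False

    mutable_config = dict(config)  # shallow copy; the caller's mapping is untouched
    for transform in transformations:
        transform(mutable_config)

    if mutable_config == config:
        return False

    try:
        with open(config_path, "w") as f:
            json.dump(mutable_config, f)
    except OSError as e:  # IOError is an alias of OSError in Python 3
        print(f"Error writing migrated config to {config_path}: {e}")
        return False
    return True


def rename_api_token(cfg: Dict[str, Any]) -> None:
    """Hypothetical migration: move a legacy key to its new name."""
    if "api_token" in cfg:
        cfg["api_key"] = cfg.pop("api_token")


with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "config.json")
    # The first call rewrites the file; the second finds nothing to change.
    print(migrate_config_sketch(path, {"api_token": "x"}, [rename_api_token]))
    print(migrate_config_sketch(path, {"api_key": "x"}, [rename_api_token]))
```

Printing the error and returning keeps the sketch short; whether a failed write should instead abort the sync is a separate decision.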

83-85: Consider more explicit handling of missing config path

The check for if not config_path: is good, but would it be more explicit to add a log message or warning when skipping due to missing config path? This could help with debugging.

-        if not config_path:
-            return
+        if not config_path:
+            print("Skipping config migration: No config path provided in arguments")
+            return
airbyte_cdk/sources/declarative/declarative_component_schema.yaml (2)

4372-4399: Nit: typo in example for ConfigAddFields
The example uses config['environemnt']; I believe it should be config['environment'] to avoid confusion. wdyt?

   ConfigAddFields:
     ...
     properties:
       condition:
         description: Fields will be added if expression is evaluated to True.
         type: string
         default: ""
         interpolation_context:
           - config
           - property
         examples:
-          - "{{ config['environemnt'] == 'sandbox' }}"
+          - "{{ config['environment'] == 'sandbox' }}"
           - "{{ property is integer }}"
           - "{{ property|length > 5 }}"
           - "{{ property == 'some_string_to_match' }}"

4401-4433: Nit: typo in example for ConfigRemoveFields
Similarly here, the example has config['environemnt']; updating to config['environment'] might prevent downstream confusion. wdyt?

   ConfigRemoveFields:
     ...
     properties:
       examples:
-          - "{{ config['environemnt'] == 'sandbox' }}"
+          - "{{ config['environment'] == 'sandbox' }}"
airbyte_cdk/sources/declarative/models/declarative_component_schema.py (3)

1575-1593: Small typo in examples

The conditional examples contain a typo in "environemnt" which should be "environment":

-            "{{ config['environemnt'] == 'sandbox' }}",
+            "{{ config['environment'] == 'sandbox' }}",

1983-2000: Tiny typo in title field

There's a small spelling error in the title for the validation_strategy field:

-        title="Validation Stragey",
+        title="Validation Strategy",

2022-2039: Same typo as noted earlier in examples

The same "environemnt" typo appears in the examples here. Consider fixing it consistently across all occurrences.

airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py (2)

823-833: Minor: local variable shadows imported module transformations – rename for clarity?

Inside create_config_migration() the list variable is named transformations, which shadows the earlier from airbyte_cdk.sources.declarative import transformations import (even though it’s scoped locally). While harmless, renaming to something like transformation_components could avoid head-scratching when debugging. wdyt?


3584-3612: create_spec() does not propagate model.parameters – intentional?

All other factory helpers forward model.parameters or {} to the runtime component, but create_spec() hard-codes parameters={}. If the Spec model ever supports $parameters, this silently drops user input.

Would you consider passing them through for consistency?

-        return Spec(
+        return Spec(
             connection_specification=model.connection_specification,
             documentation_url=model.documentation_url,
             advanced_auth=model.advanced_auth,
-            parameters={},
+            parameters=model.parameters or {},
             config_migrations=config_migrations,
             config_transformations=config_transformations,
             config_validations=config_validations,
         )

Let me know what you think! wdyt?

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
Cache: Disabled due to data retention organization setting
Knowledge Base: Disabled due to data retention organization setting

📥 Commits

Reviewing files that changed from the base of the PR and between a29e424 and 12043e5.

📒 Files selected for processing (10)
  • airbyte_cdk/sources/declarative/declarative_component_schema.yaml (3 hunks)
  • airbyte_cdk/sources/declarative/models/declarative_component_schema.py (5 hunks)
  • airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py (14 hunks)
  • airbyte_cdk/sources/declarative/spec/__init__.py (1 hunks)
  • airbyte_cdk/sources/declarative/spec/spec.py (3 hunks)
  • airbyte_cdk/sources/declarative/transformations/config_transformations/add_fields.py (4 hunks)
  • airbyte_cdk/sources/declarative/transformations/config_transformations/remap_field.py (2 hunks)
  • airbyte_cdk/sources/declarative/validators/dpath_validator.py (1 hunks)
  • unit_tests/sources/declarative/spec/test_spec.py (3 hunks)
  • unit_tests/sources/declarative/transformations/config_transformations/test_config_add_fields.py (5 hunks)
✅ Files skipped from review due to trivial changes (3)
  • airbyte_cdk/sources/declarative/spec/__init__.py
  • airbyte_cdk/sources/declarative/validators/dpath_validator.py
  • unit_tests/sources/declarative/transformations/config_transformations/test_config_add_fields.py
🚧 Files skipped from review as they are similar to previous changes (1)
  • airbyte_cdk/sources/declarative/transformations/config_transformations/remap_field.py
⏰ Context from checks skipped due to timeout of 90000ms (4)
  • GitHub Check: Check: 'source-amplitude' (skip=false)
  • GitHub Check: Check: 'source-shopify' (skip=false)
  • GitHub Check: Pytest (All, Python 3.11, Ubuntu)
  • GitHub Check: Pytest (All, Python 3.10, Ubuntu)
🔇 Additional comments (24)
airbyte_cdk/sources/declarative/transformations/config_transformations/add_fields.py (2)

12-15: Good refactoring! Using centralized module imports

Nice job moving AddedFieldDefinition and ParsedAddFieldDefinition imports from local definitions to the central module. This improves maintainability by centralizing these shared components.


78-79: Consistent parameters argument - nicely done!

You've correctly updated the ParsedAddFieldDefinition construction to include the empty parameters={} argument, making it compatible with the updated dataclass signature from the centralized module.

Also applies to: 87-88

unit_tests/sources/declarative/spec/test_spec.py (6)

164-203: Well-structured test fixture for migration mocks

Excellent job creating this detailed fixture to mock all the external dependencies for migration testing. The comprehensive setup mocks message repositories, file I/O, serialization, and printing - giving you full control over testing the migration process in isolation.


206-234: Thorough testing of unmigrated config scenario

This test effectively verifies that when encountering an unmigrated config, the system:

  1. Properly applies the migration transformations
  2. Emits a connector config control message
  3. Writes the migrated config to a file

All assertions comprehensively validate the side effects - good job!


236-263: Nice negative test case for already migrated configs

Good thinking adding this test to verify that already-migrated configs don't trigger unnecessary side effects. It ensures that the implementation avoids redundant operations when the config is already in the expected state.


266-310: Comprehensive test of transformation sequence

Excellent test case demonstrating a multi-step transformation workflow. The test shows how different transformations (add fields, remap fields, remove fields) can be composed together to achieve complex config normalization. The detailed assertions verify the exact expected output.
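
A rough sketch of how such a sequence composes is shown below. The helper names (`add_field`, `remap_field`, `remove_field`, `transform_config_sketch`) are hypothetical and only mirror the idea of the add/remap/remove transformations; they are not the CDK components.

```python
from typing import Any, Callable, Dict, List

Config = Dict[str, Any]
# Hypothetical transformation type: mutates the config in place.
Transformation = Callable[[Config], None]


def add_field(key: str, value: Any) -> Transformation:
    def apply(config: Config) -> None:
        config[key] = value

    return apply


def remap_field(key: str, mapping: Dict[Any, Any]) -> Transformation:
    def apply(config: Config) -> None:
        # Values absent from the mapping are left unchanged.
        # Assumes the current value is hashable.
        if key in config and config[key] in mapping:
            config[key] = mapping[config[key]]

    return apply


def remove_field(key: str) -> Transformation:
    def apply(config: Config) -> None:
        config.pop(key, None)

    return apply


def transform_config_sketch(config: Config, steps: List[Transformation]) -> Config:
    result = dict(config)  # work on a copy so the input is preserved
    for step in steps:
        step(result)
    return result


original = {"env": "dev", "legacy": True}
migrated = transform_config_sketch(
    original,
    [add_field("region", "us"), remap_field("env", {"dev": "development"}), remove_field("legacy")],
)
print(migrated)
```

Order matters in such a pipeline: a remap placed after a remove of the same key would silently do nothing.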


313-341: Good validation test for valid config

This test effectively verifies that valid config values won't raise exceptions when validated against a JSON schema. Using the DpathValidator with ValidateAdheresToSchema is a clean approach to validating nested structures.


344-373: Properly testing exception cases for invalid configs

Well done testing the negative case - confirming that invalid config values properly trigger exceptions when validated. The type mismatch (number vs string) is a good validation test scenario.
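
The path-plus-strategy pattern exercised by these two tests can be sketched as follows. This is an assumption-laden illustration: `validate_at_path` and `require_string` are hypothetical, and a simple type check stands in for JSON schema validation.

```python
from typing import Any, Callable, Dict, List

# Hypothetical strategy: raises ValueError when the value is not acceptable.
Strategy = Callable[[Any], None]


def require_string(value: Any) -> None:
    if not isinstance(value, str):
        raise ValueError(f"expected a string, got {type(value).__name__}")


def validate_at_path(config: Dict[str, Any], field_path: List[str], strategy: Strategy) -> None:
    """Walk the nested config along field_path, then hand the value to the strategy."""
    if not field_path:
        raise ValueError("field_path must not be empty")
    current: Any = config
    for key in field_path:
        if not isinstance(current, dict) or key not in current:
            raise ValueError(f"path {field_path} not found in config")
        current = current[key]
    strategy(current)


# Valid value: no exception is raised.
validate_at_path({"credentials": {"client_id": "abc"}}, ["credentials", "client_id"], require_string)

# Type mismatch (number vs string): the strategy's error propagates to the caller.
try:
    validate_at_path({"credentials": {"client_id": 123}}, ["credentials", "client_id"], require_string)
except ValueError as e:
    print(f"validation failed: {e}")
```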

airbyte_cdk/sources/declarative/spec/spec.py (4)

28-31: Clean design of ConfigMigration dataclass

This is a nice, simple dataclass that groups related transformations with an optional description. This approach allows for organized, named groups of transformations that can be applied as a unit.


48-51: Good use of field default_factory

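Nice choice using field(default_factory=list) for the collection fields. This avoids the shared-state issues that would occur with mutable default values. Note that message_repository still defaults to a direct InMemoryMessageRepository() instance, so that one object is shared across Spec instances unless it also moves to default_factory.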


101-113: Clean implementation of transform_config

Good choice to operate on a copy of the config (dict(config)) to avoid modifying the input. The method is simple, focused, and returns the transformed config as expected.


115-122: Simple and effective validate_config implementation

The validation method iterates through all validators and applies them to the config. Clean and straightforward implementation.

airbyte_cdk/sources/declarative/declarative_component_schema.yaml (5)

171-171: Looks good: updated inject_into description is clear.
The new description for how the API key is sent reads well and aligns with other request option components.


3835-3853: Great: ConfigMigration schema definition looks solid
The new ConfigMigration object has the required transformations list and sensible defaults. This aligns nicely with the model-to-component factory additions. wdyt?


4215-4243: Verified: DpathValidator enum matches component name
The enum: [DpathValidator] now correctly matches the component class, resolving the previous mismatch. Nice catch!


4246-4278: Approved: PredicateValidator definition is consistent
The required fields (type, value, validation_strategy) and interpolation context are well-defined.


4280-4332: Well done: ValidateAdheresToSchema schema is comprehensive
This definition covers both JSON and string base schemas, and provides clear examples.

airbyte_cdk/sources/declarative/models/declarative_component_schema.py (7)

1524-1549: Looks good: Added validation against JSON schema

The ValidateAdheresToSchema model provides a flexible validation approach that can work with both string-based schema references and inline dictionary definitions. This will be useful for ensuring configs adhere to predefined schemas.


1977-1981: LGTM: Good addition for GraphQL support

Clean implementation that follows the pattern of other request body models. This will be helpful for interacting with GraphQL APIs.


2002-2020: LGTM: Versatile validation approach

The PredicateValidator provides a nice complement to the path-based validator, allowing direct value validation with good type flexibility.


2096-2105: LGTM: Good grouping mechanism for transformations

The ConfigMigration model provides a nice structure for grouping related transformations and documenting their purpose.


2107-2125: LGTM: Well-structured normalization rules container

This model clearly separates migrations, transformations, and validations - making the configuration process more organized and easier to follow.


2127-2147: LGTM: Clean extension of the Spec model

Adding config_normalization_rules to the Spec model is a non-breaking change that enables all the new functionality. Nice work!


2328-2331: Good clarification on schema loader behavior

The improved description clearly explains how multiple schema loaders interact, which will help prevent confusion when users define multiple loaders.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

🧹 Nitpick comments (3)
airbyte_cdk/sources/declarative/declarative_component_schema.yaml (3)

171-171: Consider highlighting mutual exclusivity of header and inject_into
The inject_into description was simplified nicely, but we no longer call out that header and inject_into cannot both be defined. Should we reintroduce a brief note (or cross-reference the header deprecation) to avoid confusion? wdyt?


4372-4399: Fix typo in example for ConfigAddFields
The examples reference config['environemnt'], which looks like a misspelling. Could we correct it to config['environment']? wdyt?

-          - "{{ config['environemnt'] == 'sandbox' }}"
+          - "{{ config['environment'] == 'sandbox' }}"

4401-4430: Fix typo in example for ConfigRemoveFields
Similarly, the ConfigRemoveFields examples include config['environemnt']. Should we update to config['environment'] for consistency? wdyt?

-          - "{{ config['environemnt'] == 'sandbox' }}"
+          - "{{ config['environment'] == 'sandbox' }}"
📜 Review details

📥 Commits

Reviewing files that changed from the base of the PR and between 12043e5 and 1d908a4.

⛔ Files ignored due to path filters (1)
  • poetry.lock is excluded by !**/*.lock
📒 Files selected for processing (3)
  • airbyte_cdk/sources/declarative/declarative_component_schema.yaml (3 hunks)
  • airbyte_cdk/sources/declarative/models/declarative_component_schema.py (5 hunks)
  • airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py (14 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
  • airbyte_cdk/sources/declarative/models/declarative_component_schema.py
  • airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py
⏰ Context from checks skipped due to timeout of 90000ms (8)
  • GitHub Check: Check: 'source-pokeapi' (skip=false)
  • GitHub Check: Check: 'source-shopify' (skip=false)
  • GitHub Check: Check: 'source-amplitude' (skip=false)
  • GitHub Check: Check: 'source-hardcoded-records' (skip=false)
  • GitHub Check: Pytest (All, Python 3.11, Ubuntu)
  • GitHub Check: Pytest (All, Python 3.10, Ubuntu)
  • GitHub Check: SDM Docker Image Build
  • GitHub Check: Pytest (Fast)
🔇 Additional comments (3)
airbyte_cdk/sources/declarative/declarative_component_schema.yaml (3)

3806-3834: Config normalization rules added
Great addition of the config_normalization_rules block under Spec to support declarative migrations, transformations, and validations. The schema structure and camelCase additionalProperties are consistent with the rest of the file. Looks solid!


4215-4245: DpathValidator definition aligns with new validation workflow
The new DpathValidator schema correctly requires type, field_path, and validation_strategy. The enum matches the component name, and the JSON schema keyword casing is consistent.


4333-4371: ConfigRemapField schema is correctly named and defined
The ConfigRemapField definition uses the matching enum value and describes the transformation appropriately. The interpolation context on map and field_path looks complete.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🧹 Nitpick comments (3)
airbyte_cdk/sources/declarative/declarative_component_schema.yaml (3)

3835-3853: Consider stricter validation for ConfigMigration objects.
Would it be valuable to add additionalProperties: false under the ConfigMigration definition to catch typos or unsupported keys early in migrations? wdyt?


4215-4245: Fix spelling in DpathValidator property title
The validation_strategy property title currently reads "Validation Stragey". Could we update it to "Validation Strategy" for clarity and consistency? wdyt?


4372-4400: Fix typo in condition example for ConfigAddFields.
The example uses config['environemnt'], which appears misspelled. Should it be config['environment'] instead? wdyt?

📜 Review details

📥 Commits

Reviewing files that changed from the base of the PR and between 1d908a4 and ded5fcb.

📒 Files selected for processing (2)
  • airbyte_cdk/sources/declarative/declarative_component_schema.yaml (2 hunks)
  • airbyte_cdk/sources/declarative/models/declarative_component_schema.py (4 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • airbyte_cdk/sources/declarative/models/declarative_component_schema.py
⏰ Context from checks skipped due to timeout of 90000ms (7)
  • GitHub Check: Check: 'source-hardcoded-records' (skip=false)
  • GitHub Check: Check: 'source-pokeapi' (skip=false)
  • GitHub Check: Check: 'source-amplitude' (skip=false)
  • GitHub Check: Check: 'source-shopify' (skip=false)
  • GitHub Check: Pytest (All, Python 3.10, Ubuntu)
  • GitHub Check: Pytest (All, Python 3.11, Ubuntu)
  • GitHub Check: Pytest (Fast)
🔇 Additional comments (4)
airbyte_cdk/sources/declarative/declarative_component_schema.yaml (4)

3806-3834: Excellent addition of config_normalization_rules to Spec!
The new structure for migrations, transformations, and validations aligns perfectly with the PR objectives and maintains consistency with the rest of the schema.


4246-4273: PredicateValidator schema looks solid.
The required fields, types, and interpolation context are well defined and consistent with the new validators.


4280-4331: ValidateAdheresToSchema definition is clear and well-structured.
It correctly accommodates both string and object variants for base_schema and supports interpolation context.


4333-4371: ConfigRemapField schema is spot on.
The enum value, map types, and field_path interpolation contexts are all consistent and correctly defined.

@brianjlai brianjlai left a comment

just a few small nits and clarifications. But non-blocking so happy to ✅

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

♻️ Duplicate comments (1)
airbyte_cdk/sources/declarative/declarative_component_schema.yaml (1)

4408-4439: ⚠️ Potential issue

Nested field_pointers items array needs a type: array

The field_pointers property currently declares:

field_pointers:
  type: array
  items:
    items:
      type: string

We’re missing type: array for the inner items. It should be:

 field_pointers:
-  type: array
-  items:
-    items:
-      type: string
+  type: array
+  items:
+    type: array
+    items:
+      type: string

This will ensure each pointer is an array of strings. wdyt?

🧹 Nitpick comments (3)
airbyte_cdk/sources/declarative/spec/spec.py (3)

51-51: Consider using default_factory for message_repository as well?

The message_repository is initialized with a direct instance, which means all Spec instances will share the same repository. Would it make sense to use field(default_factory=InMemoryMessageRepository) here too to avoid shared state across instances, as suggested in past reviews?

-    message_repository: MessageRepository = InMemoryMessageRepository()
+    message_repository: MessageRepository = field(default_factory=InMemoryMessageRepository)

wdyt?
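
To make the shared-instance concern concrete, here is a small self-contained demonstration. `Repo` is a hypothetical stand-in for a message repository; the two dataclasses differ only in how the default is declared.

```python
from dataclasses import dataclass, field
from typing import List


class Repo:
    """Hypothetical stand-in for a message repository."""

    def __init__(self) -> None:
        self.messages: List[str] = []


@dataclass
class SharedDefault:
    repo: Repo = Repo()  # evaluated once, when the class body runs


@dataclass
class PerInstanceDefault:
    repo: Repo = field(default_factory=Repo)  # evaluated for every new instance


a, b = SharedDefault(), SharedDefault()
a.repo.messages.append("hello")
print(b.repo.messages)  # ['hello'] because a and b hold the same Repo object

c, d = PerInstanceDefault(), PerInstanceDefault()
c.repo.messages.append("hello")
print(d.repo.messages)  # [] because each instance got its own Repo
```

Dataclasses reject list, dict, and set defaults outright, but an arbitrary object like this one is accepted silently, which is why the sharing is easy to miss.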


72-86: Looks good, but could the error handling be a bit more explicit?

The migrate_config method looks well-structured. For lines 84-85, the check for an empty config path is good, but would it be clearer to handle the None case explicitly, as suggested in a previous review?

-        if not config_path:
+        if config_path is None:
+            # Silently skip when no config path is provided
             return

This makes the intentional skipping more explicit for future readers. Wdyt?
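
One thing worth checking before adopting this suggestion: the two conditions are not equivalent for an empty string. The sketch below uses hypothetical helper names to compare them.

```python
from typing import Optional


def skips_with_falsy_check(config_path: Optional[str]) -> bool:
    # Mirrors `if not config_path:` and treats both None and "" as missing.
    return not config_path


def skips_with_none_check(config_path: Optional[str]) -> bool:
    # Mirrors `if config_path is None:` and treats only None as missing.
    return config_path is None


for candidate in (None, "", "/tmp/config.json"):
    print(repr(candidate), skips_with_falsy_check(candidate), skips_with_none_check(candidate))
```

With the `is None` form, an empty path would fall through to the file write, so the choice depends on whether an empty string can reach this method.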


93-100: Consider adding a comment about the print statement.

A previous reviewer questioned the print statement on line 99. If it's intentional to both emit through the queue and print, maybe adding a brief comment would help clarify the dual purpose?

            for message in self.message_repository.consume_queue():
+                # Print messages directly to stdout for immediate feedback, in addition to queue emission
                print(orjson.dumps(AirbyteMessageSerializer.dump(message)).decode())
📜 Review details

📥 Commits

Reviewing files that changed from the base of the PR and between ded5fcb and 3983495.

⛔ Files ignored due to path filters (1)
  • poetry.lock is excluded by !**/*.lock
📒 Files selected for processing (5)
  • airbyte_cdk/sources/declarative/declarative_component_schema.yaml (2 hunks)
  • airbyte_cdk/sources/declarative/models/declarative_component_schema.py (4 hunks)
  • airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py (15 hunks)
  • airbyte_cdk/sources/declarative/spec/spec.py (3 hunks)
  • pyproject.toml (1 hunks)
✅ Files skipped from review due to trivial changes (1)
  • pyproject.toml
🚧 Files skipped from review as they are similar to previous changes (1)
  • airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py
⏰ Context from checks skipped due to timeout of 90000ms (6)
  • GitHub Check: Check: 'source-hardcoded-records' (skip=false)
  • GitHub Check: Check: 'source-shopify' (skip=false)
  • GitHub Check: Check: 'source-amplitude' (skip=false)
  • GitHub Check: Pytest (Fast)
  • GitHub Check: Pytest (All, Python 3.10, Ubuntu)
  • GitHub Check: Pytest (All, Python 3.11, Ubuntu)
🔇 Additional comments (26)
airbyte_cdk/sources/declarative/spec/spec.py (8)

5-7: Good standard library imports added!

Nice choice to import json and dataclasses.field here for the new functionality. The new type annotations for ConfigTransformation lists look good too.


9-12: Using orjson for performance is a nice touch.

Including orjson for message serialization and adding the config observation imports makes sense for the new migration features. Good job adding the minimum necessary imports.


18-25: Imports look well-organized.

Good organization of the imports, keeping related functionality together. The message repository additions align well with the new functionality being added.


28-31: Simple and clean ConfigMigration dataclass!

This looks like a nice abstraction to group related transformations with an optional description. Makes the migration intent clear.


48-50: Good use of field(default_factory=list)!

You've applied the suggestion from the past review comment to use default_factory=list instead of None defaults. This is a solid improvement that avoids having to check for None before iteration.
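
To illustrate why `default_factory=list` is preferable here, the following is a minimal sketch rather than the actual `Spec` class. The class name `SpecSketch` and its two field names are assumptions made for illustration only.

```python
from dataclasses import dataclass, field
from typing import Any, List


@dataclass
class SpecSketch:
    # Hypothetical stand-in for the real Spec; only the list-typed fields matter here.
    # default_factory=list builds a fresh list for every instance.
    config_transformations: List[Any] = field(default_factory=list)
    config_validations: List[Any] = field(default_factory=list)


first = SpecSketch()
second = SpecSketch()
first.config_transformations.append("add_fields")

# Each instance owns its list, so the append above does not leak into `second`.
assert second.config_transformations == []

# Iterating needs no None check, even when nothing was configured.
for _validation in second.config_validations:
    pass
```

Declaring a bare `= []` default on a dataclass field raises a `ValueError` at class definition time, and a `None` default would force a guard before every loop, so the factory form avoids both.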


87-92: Good approach for immutability!

Creating a mutable copy of the config before processing is a nice way to preserve the original until you're sure changes are needed. The check at line 92 ensuring changes actually occurred before writing is also a great optimization.
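
The copy-then-compare flow described above can be sketched roughly as follows. This is not the code from `spec.py`: the function name `migrate_config_sketch`, the callable-based transformations, and the use of `copy.deepcopy` are all assumptions chosen to keep the example self-contained.

```python
import copy
import json
import os
import tempfile
from typing import Any, Callable, Dict, List

# Hypothetical transformation type: a callable that edits the config in place.
Transformation = Callable[[Dict[str, Any]], None]


def migrate_config_sketch(
    config: Dict[str, Any], config_path: str, transformations: List[Transformation]
) -> bool:
    """Apply transformations to a copy and persist only if something changed."""
    mutable_config = copy.deepcopy(config)  # assumption: a deep copy keeps the input untouched
    for transform in transformations:
        transform(mutable_config)
    if mutable_config == config:
        return False  # already migrated: skip the write entirely
    with open(config_path, "w") as config_file:
        json.dump(mutable_config, config_file)
    return True


def rename_planet_code(config: Dict[str, Any]) -> None:
    # Hypothetical migration: move the legacy key to its new name.
    if "planet_code" in config:
        config["planet"] = config.pop("planet_code")


with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "config.json")
    wrote_first = migrate_config_sketch({"planet_code": "CRSC"}, path, [rename_planet_code])
    wrote_second = migrate_config_sketch({"planet": "CRSC"}, path, [rename_planet_code])
    with open(path) as config_file:
        persisted = json.load(config_file)
```

In this sketch the second call returns `False` because the config already has the migrated shape, which mirrors the "no write unless changed" check the comment praises.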


101-108: Clean implementation of transform_config.

This method is straightforward and has good documentation. It iterates through the transformations and applies them directly to the provided config.


110-117: Well-implemented validate_config method.

The validate_config implementation is concise and follows the same pattern as transform_config. Good consistency across these related operations.
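
The shared pattern in `transform_config` and `validate_config` (loop over configured components and apply each one) might look like the sketch below. The protocol and class names are hypothetical, and the method names `transform` and `validate` are assumptions, not confirmed signatures from the CDK.

```python
from typing import Any, Dict, List, Protocol


class TransformationSketch(Protocol):
    # Assumed interface: mutate the config in place.
    def transform(self, config: Dict[str, Any]) -> None: ...


class ValidatorSketch(Protocol):
    # Assumed interface: raise if the config is invalid.
    def validate(self, config: Dict[str, Any]) -> None: ...


class AddDefaultRegion:
    """Hypothetical transformation that fills in a missing field."""

    def transform(self, config: Dict[str, Any]) -> None:
        config.setdefault("region", "us")


class RequireApiKey:
    """Hypothetical validator that rejects configs without an api_key."""

    def validate(self, config: Dict[str, Any]) -> None:
        if "api_key" not in config:
            raise ValueError("api_key is required")


def transform_config(config: Dict[str, Any], transformations: List[TransformationSketch]) -> None:
    # Apply every transformation, in order, directly to the provided config.
    for transformation in transformations:
        transformation.transform(config)


def validate_config(config: Dict[str, Any], validators: List[ValidatorSketch]) -> None:
    # Run every validator; the first failure propagates to the caller.
    for validator in validators:
        validator.validate(config)
```

Keeping both functions structurally identical means a new transformation or validator only has to implement one method to plug in.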

airbyte_cdk/sources/declarative/declarative_component_schema.yaml (7)

3806-3836: New config_normalization_rules property looks solid – well structured and properly optional?

The addition of config_normalization_rules to the Spec schema with defaults for each sub-property feels right, and you’ve used the correct camelCase additionalProperties. Everything aligns with our JSON Schema conventions. wdyt?


3838-3860: ConfigMigration definition sanity check – required fields are set

Nice to see you added required: [type, transformations] to enforce the presence of the type and transformations properties. This matches our best practices for schema definitions. wdyt?


4223-4250: DpathValidator schema enum & properties look correct

The DpathValidator now has enum: [DpathValidator] and the required properties (type, field_path, validation_strategy) are in place. This resolves the previous mismatch, and interpolation contexts are accurate. wdyt?


4253-4284: PredicateValidator schema looks consistent

Great job defining the PredicateValidator with the correct enum value and required fields (type, value, validation_strategy). The value type supports all JSON types as intended. wdyt?


4287-4325: ValidateAdheresToSchema schema definition is clear and complete

The ValidateAdheresToSchema object now correctly requires type and base_schema with appropriate types and interpolation contexts. Looks good for validating JSON schema adherence. wdyt?


4340-4378: ConfigRemapField enum and properties align with the pattern

Well done updating the ConfigRemapField enum to match the definition name and including the map and field_path requirements with context interpolation. This will integrate seamlessly with the component factory. wdyt?


4379-4407: ConfigAddFields schema follows the established pattern

The ConfigAddFields transformation correctly lists enum: [ConfigAddFields], enforces fields, and includes an optional condition. Interpolation contexts look appropriate. wdyt?

airbyte_cdk/sources/declarative/models/declarative_component_schema.py (11)

1526-1550: LGTM - Good implementation of schema validation component

The ValidateAdheresToSchema model provides a comprehensive framework for validating user configs against a JSON schema. The flexibility to accept either string or dict for base_schema is a nice touch that allows for runtime schema definition.
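
One way a component could accept `base_schema` as either a string or a dict is to normalize it up front. The helper below is a hypothetical sketch, and it assumes that a string value holds a JSON-encoded schema; the actual component may handle this differently.

```python
import json
from typing import Any, Dict, Union


def load_base_schema(base_schema: Union[str, Dict[str, Any]]) -> Dict[str, Any]:
    """Return the schema as a dict, parsing it first if it was given as a JSON string."""
    if isinstance(base_schema, str):
        # Assumption: string schemas are JSON documents.
        parsed = json.loads(base_schema)
    else:
        parsed = base_schema
    if not isinstance(parsed, dict):
        raise ValueError("base_schema must describe a JSON object")
    return parsed


schema_from_dict = load_base_schema({"type": "object", "required": ["api_key"]})
schema_from_text = load_base_schema('{"type": "object", "required": ["api_key"]}')
```

After normalization, the rest of the validation logic only ever deals with a dict, regardless of how the manifest author wrote the schema.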


1553-1574: LGTM - Remapping utility looks good

The ConfigRemapField model provides a clean way to transform config field values using a mapping dictionary. The field path specification with nested path support is particularly useful.
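
As a rough illustration of remapping a value at a nested path, consider the sketch below. The function `remap_field` and its behavior on missing paths or unmapped values are assumptions for this example, not the CDK implementation.

```python
from typing import Any, Dict, List


def remap_field(config: Dict[str, Any], field_path: List[str], mapping: Dict[Any, Any]) -> None:
    """Replace the value at field_path with mapping[value], editing config in place."""
    current: Any = config
    # Walk down to the parent of the target field; stop quietly if the path is absent.
    for key in field_path[:-1]:
        current = current.get(key) if isinstance(current, dict) else None
        if not isinstance(current, dict):
            return
    leaf = field_path[-1]
    if leaf not in current:
        return
    try:
        if current[leaf] in mapping:
            current[leaf] = mapping[current[leaf]]
    except TypeError:
        # Unhashable values (lists, dicts) cannot be mapping keys; leave them unchanged.
        return


config = {"credentials": {"auth_type": "oauth"}, "mode": "fast"}
remap_field(config, ["credentials", "auth_type"], {"oauth": "oauth2.0"})
remap_field(config, ["mode"], {"slow": "standard"})  # "fast" is unmapped, so nothing changes
remap_field(config, ["missing", "field"], {"a": "b"})  # absent path is a no-op
```

Treating missing paths and unmapped values as no-ops is a design choice made for this sketch so that a remap can run safely against configs that were already migrated.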


1577-1594: LGTM - ConfigRemoveFields implementation is clean

The conditional field removal functionality will be very useful for handling environment-specific configurations. The examples in the documentation are helpful for understanding the feature.


1979-1981: LGTM - GraphQL request body wrapper

Simple and focused wrapper for GraphQL queries.


1984-2001: LGTM - Path-based validator

The DpathValidator implementation for validating specific config paths looks solid.


2004-2021: LGTM - Value-based validator

The PredicateValidator allows for flexible validation of configuration values directly or through interpolation.


2024-2040: LGTM - Field addition component looks good

The ConfigAddFields model provides a clean way to conditionally add fields to a configuration.


2098-2108: LGTM - Migration framework looks well-designed

The ConfigMigration model provides a comprehensive way to define ordered transformations for migrating configurations.


2110-2130: LGTM - Good organization of normalization components

The ConfigNormalizationRules model nicely aggregates the different types of configuration operations. Setting extra = Extra.forbid ensures strict schema adherence.


2133-2153: LGTM - Spec model updated appropriately

The Spec model has been properly extended to include the new config normalization rules.


2335-2335: Improved documentation clarity, nice!

The expanded description for schema_loader now clearly explains how multiple schema loaders work together and how precedence is handled.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

🧹 Nitpick comments (4)
unit_tests/sources/declarative/spec/test_spec.py (4)

2-2: Copyright updated to 2025

Looking ahead a bit with that copyright year there 😄. Might want to update to the current year (2024) instead?

- # Copyright (c) 2025 Airbyte, Inc., all rights reserved.
+ # Copyright (c) 2024 Airbyte, Inc., all rights reserved.

164-204: Well-structured fixture for migration testing, wdyt about adding a docstring?

Nice comprehensive fixture that mocks all the necessary dependencies for testing the migration functionality. It's a clean pattern to return a dictionary of mocks.

Could we add a docstring to explain what each mock is for? This would help other developers understand the purpose of each mock more easily, wdyt?

 @pytest.fixture
 def migration_mocks(monkeypatch):
+    """
+    Creates and configures mocks for testing config migrations.
+    
+    Mocks message repositories, entrypoint, file operations, JSON serialization,
+    printing, and orjson serialization to enable isolated testing of migration logic.
+    
+    Returns:
+        dict: A dictionary containing all configured mocks.
+    """
     mock_message_repository = Mock()
     mock_message_repository.consume_queue.return_value = [Mock()]

206-234: Test for unmigrated config migration looks good!

This test thoroughly verifies that unmigrated configs trigger the appropriate migration actions. It checks message emission, file writes, and serialization calls.

One small suggestion - perhaps verify the actual content that's being migrated? It would make the test even more robust to check that the configuration was transformed as expected, wdyt?

 spec.migrate_config(["spec"], migration_mocks["source"], input_config)
 
 migration_mocks["message_repository"].emit_message.assert_called_once()
 migration_mocks["open"].assert_called_once_with("/fake/config/path", "w")
 migration_mocks["json_dump"].assert_called_once()
 migration_mocks["print"].assert_called()
 migration_mocks["serializer_dump"].assert_called()
 migration_mocks["orjson_dumps"].assert_called()
 migration_mocks["decoded_bytes"].decode.assert_called()
+# Verify the config has been transformed
+assert input_config == {"planet": "Coruscant"}

347-376: Validation failure test - consider specifying exception type

Good test for validation failure. It would be even more robust to assert the specific type of exception that should be raised and possibly check the error message to ensure it provides helpful feedback to users, wdyt?

-    with pytest.raises(Exception):
+    with pytest.raises(ValueError, match="Field 'field_to_validate' expected type 'string' but got 'integer'"):
         spec.validate_config(input_config)
📜 Review details

📥 Commits

Reviewing files that changed from the base of the PR and between 3983495 and 07c96e0.

⛔ Files ignored due to path filters (1)
  • poetry.lock is excluded by !**/*.lock
📒 Files selected for processing (1)
  • unit_tests/sources/declarative/spec/test_spec.py (3 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (9)
  • GitHub Check: Pytest (Fast)
  • GitHub Check: Check: 'source-hardcoded-records' (skip=false)
  • GitHub Check: Check: 'source-amplitude' (skip=false)
  • GitHub Check: Check: 'source-pokeapi' (skip=false)
  • GitHub Check: Check: 'source-shopify' (skip=false)
  • GitHub Check: SDM Docker Image Build
  • GitHub Check: Pytest (All, Python 3.11, Ubuntu)
  • GitHub Check: Pytest (All, Python 3.10, Ubuntu)
  • GitHub Check: Analyze (python)
🔇 Additional comments (3)
unit_tests/sources/declarative/spec/test_spec.py (3)

236-264: Test for already migrated config looks good!

Great complementary test case that verifies the negative path - when a config is already migrated, no actions should be taken. The test correctly verifies that no messages are emitted and no file operations or serializations are performed.


266-314: Comprehensive test for transformation sequence

This test nicely covers a complex sequence of transformations (adding fields, remapping values, removing fields). The only thing that seems a bit unusual is that you're initially setting both planet_name and planet_population to template values from planet_code and then remapping them separately.

Would it be clearer to directly set planet_name with a template for the name and planet_population with a template for the population? Or is this specific pattern important to test? Just curious!


316-345: Good validation success test

This test properly verifies that no exception is raised when validating a config that matches the schema. The JSON schema definition is well-structured.

@pnilan pnilan merged commit ce44cc5 into main May 20, 2025
26 of 28 checks passed
@pnilan pnilan deleted the pnilan/feat/extend-spec-class-for-config-migrations branch May 20, 2025 15:00