Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CAT: add validation for stream statuses #34675

Merged
merged 24 commits into from
Feb 8, 2024
Merged

CAT: add validation for stream statuses #34675

merged 24 commits into from
Feb 8, 2024

Conversation

artem1205
Copy link
Collaborator

What

Resolve #33565

How

add validation for stream statuses

Recommended reading order

  1. airbyte-integrations/bases/connector-acceptance-test/connector_acceptance_test/tests/test_core.py

🚨 User Impact 🚨

no breaking changes

Pre-merge Actions

Updating the Python CDK

Airbyter

Before merging:

  • Pull Request description explains what problem it is solving
  • Code change is unit tested
  • Build and my-py check pass
  • Smoke test the change on at least one affected connector
    • On Github: Run this workflow, passing --use-local-cdk --name=source-<connector> as options
    • Locally: airbyte-ci connectors --use-local-cdk --name=source-<connector> test
  • PR is reviewed and approved

After merging:

  • Publish the CDK
    • The CDK does not follow proper semantic versioning. Choose minor if this the change has significant user impact or is a breaking change. Choose patch otherwise.
    • Write a thoughtful changelog message so we know what was updated.
  • Merge the platform PR that was auto-created for updating the Connector Builder's CDK version
    • This step is optional if the change does not affect the connector builder or declarative connectors.

Copy link

vercel bot commented Jan 30, 2024

The latest updates on your projects. Learn more about Vercel for Git ↗︎

1 Ignored Deployment
Name Status Preview Comments Updated (UTC)
airbyte-docs ⬜️ Ignored (Inspect) Visit Preview Feb 8, 2024 9:00pm

@artem1205 artem1205 self-assigned this Jan 30, 2024
@artem1205
Copy link
Collaborator Author

artem1205 commented Jan 31, 2024

Source Google sheets FAILED as was predicted in #33565 (comment) :

airbyte-ci connectors --name=source-google-sheets test --only-step="acceptance" --acceptance.-k=test_read
image

Source Klaviyo: SUCCEEDED

image

@lazebnyi lazebnyi requested review from maxi297 and removed request for a team February 2, 2024 13:04
# Conflicts:
#	airbyte-integrations/bases/connector-acceptance-test/CHANGELOG.md
Copy link
Contributor

@natikgadzhi natikgadzhi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good to me. @maxi297 does this unblock your work for Stripe? Should we merge it today then?

@@ -136,7 +131,7 @@ def test_verify_records_schema(configured_catalog: ConfiguredAirbyteCatalog):
)
def test_verify_records_schema_with_fail_on_extra_columns(configured_catalog: ConfiguredAirbyteCatalog, json_schema, record, should_fail):
"""Test that fail_on_extra_columns works correctly with nested objects, array of objects"""
configured_catalog.streams[0].stream.json_schema =json_schema
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Out of curiosity, what formatter do you personally use, and what formatter does most of the team have?

@alafanechere works on setting up requirements on new PRs including type checks with mypy, and I'm considering a formatter as well.

Copy link
Collaborator Author

@artem1205 artem1205 Feb 5, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what formatter do you personally use?

Mainly airbyte-ci format fix all (but it is needed to remove unit_test folder from excluded list in pyproject.toml

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@natikgadzhi we already have a repo wide formatter (airbyte-ci format fix all/python/js/java) which runs in CI.
We can double the same rules at the connector level if we want to localize the formatting.

Copy link
Contributor

@maxi297 maxi297 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would like to understand the deploy process for this change

@@ -188,6 +188,7 @@ class BasicReadTestConfig(BaseConfig):
)
expect_records: Optional[ExpectedRecordsConfig] = Field(description="Expected records from the read")
validate_schema: bool = Field(True, description="Ensure that records match the schema of the corresponding stream")
validate_stream_statuses: bool = Field(True, description="Ensure that all streams emit status messages")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This means that it is a breaking change. What is the plan for updating the sources so that CATs pass after this is merged?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We expect that all certified connectors will pass this test.

What is the plan for updating the sources so that CATs pass after this is merged?

Our connector-health engineer will take care of it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you be more explicit about this? I want to understand is CATs will be red for a while. We have a goal for Q1 which is to have CATs be green for certified connectors. This includes having a process which ensure that connectors are passing CATs

@maxi297
Copy link
Contributor

maxi297 commented Feb 5, 2024

To answer @natikgadzhi on #34675 (review): No it does not as the change that affected my PR is not related to this. I've tried to remove the validation but not CATs are failing for another reason. I'll work on unblocking some people on other issues and I'll come back to that

Copy link
Contributor

@maxi297 maxi297 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you please address those concerns?

@@ -188,6 +188,7 @@ class BasicReadTestConfig(BaseConfig):
)
expect_records: Optional[ExpectedRecordsConfig] = Field(description="Expected records from the read")
validate_schema: bool = Field(True, description="Ensure that records match the schema of the corresponding stream")
validate_stream_statuses: bool = Field(True, description="Ensure that all streams emit status messages")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you be more explicit about this? I want to understand is CATs will be red for a while. We have a goal for Q1 which is to have CATs be green for certified connectors. This includes having a process which ensure that connectors are passing CATs

if not inputs.validate_stream_statuses and test_strictness_level is Config.TestStrictnessLevel.high:
pytest.fail("High strictness level error: validate_stream_statuses must be set to true in the basic read test configuration.")
else:
return inputs.validate_stream_statuses
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is a grooming note that mentions that we should not apply this by default for community connectors. Right now, the default value for validate_stream_statuses is True. Does isn't that contradictory?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed, this test will not run if ql < 400

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With the new change, does that mean that if we want to enable this for ql < 400, we just can't? It feels like we should be able to test this even if the connector is ql < 400

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in this case, do we need to set it to False as default, and force to True for certified (ql > 400) connectors?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we set it to "None" so that we have three states:

  • Not explicitly defined which means ql > 400 are enabled but not the rest
  • False would fail for ql > 400 but skip for non-certified
  • True would work for everyone

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point!
fixed.

@artem1205 artem1205 requested a review from a team as a code owner February 7, 2024 18:31
@artem1205
Copy link
Collaborator Author

@maxi297 ,

I want to understand is CATs will be red for a while. We have a goal for Q1 which is to have CATs be green for certified connectors. This includes having a process which ensure that connectors are passing CATs

I think all certified connectors should be good with this test, otherwise we had to check them and fix or skip this test.

Copy link
Contributor

@maxi297 maxi297 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One last comment and I think I'm good

if not inputs.validate_stream_statuses and test_strictness_level is Config.TestStrictnessLevel.high:
pytest.fail("High strictness level error: validate_stream_statuses must be set to true in the basic read test configuration.")
else:
return inputs.validate_stream_statuses
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With the new change, does that mean that if we want to enable this for ql < 400, we just can't? It feels like we should be able to test this even if the connector is ql < 400

Copy link
Contributor

@maxi297 maxi297 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🚢

@artem1205 artem1205 merged commit a96b7f4 into master Feb 8, 2024
20 checks passed
@artem1205 artem1205 deleted the artem1205/CAT-33565 branch February 8, 2024 21:13
jatinyadav-cc pushed a commit to ollionorg/datapipes-airbyte that referenced this pull request Feb 21, 2024
jatinyadav-cc pushed a commit to ollionorg/datapipes-airbyte that referenced this pull request Feb 26, 2024
jatinyadav-cc pushed a commit to ollionorg/datapipes-airbyte that referenced this pull request Feb 26, 2024
jatinyadav-cc pushed a commit to ollionorg/datapipes-airbyte that referenced this pull request Feb 26, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add CAT scenario validating that a sync emits at least one stream status for a successful sync
5 participants