Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CDK: Add schema normalization to declarative stream #32786

Merged
merged 22 commits into from
Dec 21, 2023

Conversation

keu
Copy link
Contributor

@keu keu commented Nov 24, 2023

What

solving #30737

The implementation follows this proposal here

@dataclass
class RecordSelector(HttpSelector):
    ...
    schema_normalization: TypeTransformer
    ...

It applies schema_normalization at the very end - after we extracted records and transformed them.

How

Describe the solution

Recommended reading order

  1. x.java
  2. y.python

🚨 User Impact 🚨

Are there any breaking changes? What is the end result perceived by the user?

For connector PRs, use this section to explain which type of semantic versioning bump occurs as a result of the changes. Refer to our Semantic Versioning for Connectors guidelines for more information. Breaking changes to connectors must be documented by an Airbyte engineer (PR author, or reviewer for community PRs) by using the Breaking Change Release Playbook.

If there are breaking changes, please merge this PR with the 🚨🚨 emoji so changelog authors can further highlight this if needed.

Pre-merge Actions

Expand the relevant checklist and delete the others.

New Connector

Community member or Airbyter

  • Community member? Grant edit access to maintainers (instructions)
  • Unit & integration tests added and passing. Community members, please provide proof of success locally e.g: screenshot or copy-paste unit, integration, and acceptance test output. To run acceptance tests for a Python connector, follow instructions in the README. For java connectors run ./gradlew :airbyte-integrations:connectors:<name>:integrationTest.
  • Connector version is set to 0.0.1
    • Dockerfile has version 0.0.1
  • Documentation updated
    • Connector's README.md
    • Connector's bootstrap.md. See description and examples
    • docs/integrations/<source or destination>/<name>.md including changelog with an entry for the initial version. See changelog example
    • docs/integrations/README.md

Airbyter

If this is a community PR, the Airbyte engineer reviewing this PR is responsible for the below items.

  • Create a non-forked branch based on this PR and test the below items on it
  • Build is successful
  • If new credentials are required for use in CI, add them to GSM. Instructions.
Updating a connector

Community member or Airbyter

  • Grant edit access to maintainers (instructions)
  • Unit & integration tests added

Airbyter

If this is a community PR, the Airbyte engineer reviewing this PR is responsible for the below items.

  • Create a non-forked branch based on this PR and test the below items on it
  • Build is successful
  • If new credentials are required for use in CI, add them to GSM. Instructions.
Connector Generator
  • Issue acceptance criteria met
  • PR name follows PR naming conventions
  • If adding a new generator, add it to the list of scaffold modules being tested
  • The generator test modules (all connectors with -scaffold in their name) have been updated with the latest scaffold by running ./gradlew :airbyte-integrations:connector-templates:generator:generateScaffolds then checking in your changes
  • Documentation which references the generator is updated as needed
Updating the Python CDK

Airbyter

Before merging:

  • Pull Request description explains what problem it is solving
  • Code change is unit tested
  • Build and my-py check pass
  • Smoke test the change on at least one affected connector
    • On Github: Run this workflow, passing --use-local-cdk --name=source-<connector> as options
    • Locally: airbyte-ci connectors --use-local-cdk --name=source-<connector> test
  • PR is reviewed and approved

After merging:

  • Publish the CDK
    • The CDK does not follow proper semantic versioning. Choose minor if this the change has significant user impact or is a breaking change. Choose patch otherwise.
    • Write a thoughtful changelog message so we know what was updated.
  • Merge the platform PR that was auto-created for updating the Connector Builder's CDK version
    • This step is optional if the change does not affect the connector builder or declarative connectors.

@keu keu requested a review from a team as a code owner November 24, 2023 06:01
Copy link

vercel bot commented Nov 24, 2023

The latest updates on your projects. Learn more about Vercel for Git ↗︎

Name Status Preview Comments Updated (UTC)
airbyte-docs ✅ Ready (Inspect) Visit Preview 💬 Add feedback Dec 21, 2023 1:39am

@keu keu self-assigned this Nov 24, 2023
@CLAassistant
Copy link

CLAassistant commented Nov 24, 2023

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you all sign our Contributor License Agreement before we can accept your contribution.
3 out of 4 committers have signed the CLA.

✅ keu
✅ girarda
✅ yevhenii-ldv
❌ eugene-kulak
You have signed the CLA already but the status is still pending? Let us recheck it.

@keu keu marked this pull request as draft November 24, 2023 06:01
@octavia-squidington-iii octavia-squidington-iii added the CDK Connector Development Kit label Nov 24, 2023
@keu keu force-pushed the keu/declarative-cdk/add-schema-normalization branch from 686fdfd to 5da985f Compare November 29, 2023 13:28
@keu keu marked this pull request as ready for review November 29, 2023 13:28
@keu keu force-pushed the keu/declarative-cdk/add-schema-normalization branch from 05eb109 to 7b49fd1 Compare November 29, 2023 13:38
Copy link
Contributor

@girarda girarda left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Confirmed this works as expected.

example record without casting ID to a string:
{"type": "RECORD", "record": {"stream": "email_templates", "data": {"id": 4071586003,

example with the normalization enabled and the schema defining id as a string:
{"type": "RECORD", "record": {"stream": "email_templates", "data": {"id": "4071585003",

description: Responsible for normalization according to the schema.
type: string
enum:
- no
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let's call this NO_TRANSFORM

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I assume this is the option that user will see in the UI, right? wouldn't "No Trasform" will look better in that case?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good question. @lmossman does the FE rename enums or show them as they are in the schema?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The Connector Builder currently just shows enum values as they are, e.g. for HTTP Method we show GET and POST in the UI.

But it also would be pretty low effort to rename the values when adding them to the builder UI

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks Lake! @keu let's go with something user-friendly then.

Thinking more about the name, I sort of dislike normalization because the name clashes with another Airbyte concept.

How about

  • None
  • Default

@keu keu force-pushed the keu/declarative-cdk/add-schema-normalization branch from 50a60b5 to 3c6b935 Compare November 30, 2023 10:21
@keu keu requested a review from girarda December 1, 2023 11:24
@keu keu linked an issue Dec 18, 2023 that may be closed by this pull request
@keu keu changed the base branch from master to 09-14-add_connectorBuildOptions_to_connector_metadata December 20, 2023 00:10
@keu keu requested review from a team as code owners December 20, 2023 00:10
@keu keu changed the base branch from 09-14-add_connectorBuildOptions_to_connector_metadata to master December 20, 2023 00:10
@@ -707,17 +709,15 @@ def create_http_requester(self, model: HttpRequesterModel, config: Config, *, na
parameters=model.parameters or {},
)

model_http_method = (
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@girarda removed this after adding --set-default-enum-member flag

Copy link
Contributor

Warning

Soft code freeze is in effect until 2024-01-02. Please avoid merging to master. #freedom-and-responsibility

Copy link
Contributor

@girarda girarda left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is great thanks @keu !

@keu keu merged commit 4061f08 into master Dec 21, 2023
21 of 22 checks passed
@keu keu deleted the keu/declarative-cdk/add-schema-normalization branch December 21, 2023 12:51
jatinyadav-cc pushed a commit to ollionorg/datapipes-airbyte that referenced this pull request Feb 26, 2024
Co-authored-by: Eugene Kulak <kulak.eugene@gmail.com>
Co-authored-by: Yevhenii Kurochkin <ykurochkin@flyaps.com>
Co-authored-by: Alexandre Girard <alexandre@airbyte.io>
jatinyadav-cc pushed a commit to ollionorg/datapipes-airbyte that referenced this pull request Feb 26, 2024
Co-authored-by: Eugene Kulak <kulak.eugene@gmail.com>
Co-authored-by: Yevhenii Kurochkin <ykurochkin@flyaps.com>
Co-authored-by: Alexandre Girard <alexandre@airbyte.io>
jatinyadav-cc pushed a commit to ollionorg/datapipes-airbyte that referenced this pull request Feb 26, 2024
Co-authored-by: Eugene Kulak <kulak.eugene@gmail.com>
Co-authored-by: Yevhenii Kurochkin <ykurochkin@flyaps.com>
Co-authored-by: Alexandre Girard <alexandre@airbyte.io>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
CDK Connector Development Kit
Projects
None yet
Development

Successfully merging this pull request may close these issues.

low-code cdk: add support for DefaultSchemaNormalization
7 participants