Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bmoric/add namespace to protocol #13356

Merged
merged 7 commits into from
Jun 6, 2022
Merged

Conversation

benmoriceau
Copy link
Contributor

@benmoriceau benmoriceau commented May 31, 2022

What

update the protocol in order to add a namespace in association with the stream name.

There was a field that was missing in the change that happened to the state message in the protocol. This was about the need of a namespace field to make the stream name schema specific. This is related to this discussion

@github-actions github-actions bot added area/platform issues related to the platform area/protocol CDK Connector Development Kit labels May 31, 2022
@@ -106,6 +106,9 @@ definitions:
type: string
state:
"$ref": "#/definitions/AirbyteStateBlob"
namespace:
description: Optional Source-defined namespace. Currently only used by JDBC destinations to determine what schema to write to. Airbyte streams from the same sources should have the same namespace.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently only used by JDBC destinations to determine what schema to write to.

I'm not sure if we want to put this in the protocol, as more connectors could start using this in the future.

Airbyte streams from the same sources should have the same namespace.

Is this always true? I'm not sure if it exists in prod, but I believe we have tests where a single source has multiple namespaces

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this always true? I'm not sure if it exists in prod, but I believe we have tests where a single source has multiple namespaces

Not necessarily - we have a few tickets planned to allow streams in the CDK to define namespaces #12492. This would be used for things like bases in airtable (#12491).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps this could be re-worded to address the original intent and @lmossman's suggestion:

Examples include JDBC destinations, which use this field to determine the destination schema...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel like that the change is in progress and the use of it might change as well. I will just keep it at optional source defined namespace.

Copy link
Contributor

@pedroslopez pedroslopez left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note: looks like format needs to be run to fix the connectors-base build

Copy link
Contributor

@evantahler evantahler left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Assuming the format & comment is fixed up, this looks good to me!

There may be a spot to talk about this on AirbyteStateMessage, but it's not required.

A question: When looking at the storage designed for per-stream state messages (doc), why was the choice to store both the stream_name and namespace rather than a "combined" name like name (string) = "${namespace}:#{stream_name}"? I'm a fan for clear column storage, but it may be possible to change the /meaning/ of AirbyteStreamState.name to accomplish the goal without updating the protocol - if that were a goal.

@lmossman
Copy link
Contributor

lmossman commented Jun 1, 2022

why was the choice to store both the stream_name and namespace rather than a "combined" name like name (string) = "${namespace}:#{stream_name}"?

This approach feels more prone to error to me, as it requires everything implementing the protocol to know about this special format, and to only use it in the case where there is a namespace. I think having the namespace be an explicit optional field is a clearer interface

@@ -106,6 +106,9 @@ definitions:
type: string
state:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should probably add the following to the AirbyteStreamState object:

required:
  - name
  - state

to make it clear that those two properties are always required, and namespace is optional

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@benmoriceau
Copy link
Contributor Author

This approach feels more prone to error to me, as it requires everything implementing the protocol to know about this special format, and to only use it in the case where there is a namespace. I think having the namespace be an explicit optional field is a clearer interface

yes it is more error prone. This was also an organization that got use in other part of the protocol (AirbyteCatalog). We don't want any discrepancy here

@benmoriceau benmoriceau temporarily deployed to more-secrets June 2, 2022 01:14 Inactive
@cgardens
Copy link
Contributor

cgardens commented Jun 2, 2022

why was the choice to store both the stream_name and namespace rather than a "combined" name like name (string) = "${namespace}:#{stream_name}"?

to piggyback on what lake and benoit said. because we allow a stream name to include any utf8 characters there is no separator we can use without getting into character escaping hell. separate fields avoids that, but it is admittedly more verbose.

Copy link
Contributor

@cgardens cgardens left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks!

@benmoriceau benmoriceau requested review from cgardens, pedroslopez and michel-tricot and removed request for jdpgrailsdev June 2, 2022 15:34
@pedroslopez pedroslopez temporarily deployed to more-secrets June 3, 2022 17:35 Inactive
@pedroslopez pedroslopez temporarily deployed to more-secrets June 3, 2022 18:26 Inactive
@benmoriceau benmoriceau temporarily deployed to more-secrets June 3, 2022 21:28 Inactive
@github-actions github-actions bot added area/api Related to the api area/documentation Improvements or additions to documentation area/server labels Jun 3, 2022
@benmoriceau benmoriceau temporarily deployed to more-secrets June 3, 2022 22:25 Inactive
@benmoriceau benmoriceau force-pushed the bmoric/add-namespace-to-protocol branch from f450a6f to adb912a Compare June 3, 2022 22:34
@github-actions github-actions bot removed area/documentation Improvements or additions to documentation area/api Related to the api area/server labels Jun 3, 2022
@benmoriceau benmoriceau temporarily deployed to more-secrets June 3, 2022 22:36 Inactive
@cgardens
Copy link
Contributor

cgardens commented Jun 4, 2022

@benmoriceau is there a reason we are doing name and namespace as top-level fields as opposed to a stream key object? Michel had mentioned it and I think it's a solid idea. Curious how we picked the current format.

@benmoriceau benmoriceau temporarily deployed to more-secrets June 6, 2022 22:08 Inactive
@benmoriceau benmoriceau merged commit 5a7b1aa into master Jun 6, 2022
@benmoriceau benmoriceau deleted the bmoric/add-namespace-to-protocol branch June 6, 2022 22:41
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/platform issues related to the platform area/protocol CDK Connector Development Kit
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants