-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add Namespace Field. #2704
Add Namespace Field. #2704
Conversation
public class WriteConfig { | ||
|
||
private final String streamName; | ||
private final String outputNamespaceName; | ||
private final String outputSchemaName; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since namespace is something that doesn't appear in the destination concept space now, I felt it was clearer to rename this field.
@cgardens want to take a quick sanity check? The destination side implementation is very simple - mostly struggling with testing this on the non-locally available tests. I'm pretty confident; just 2 tests cases and this should be good to go. |
…orking for postgres and jdbc destination.
* SOFTWARE. | ||
*/ | ||
|
||
package io.airbyte.integrations.standardtest.destination; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Broke this out since the TestDestination
class was getting quite large.
*/ | ||
public class DataArgumentsProvider implements ArgumentsProvider { | ||
|
||
public static final CatalogMessageTestConfigPair EXCHANGE_RATE_CONFIG = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
also allows us to centralise naming and references to the various test configs here instead of scattered manually typed strings.
@@ -1,335 +0,0 @@ | |||
/* |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I decided that this test class here was of very little use since:
- they are almost an exact duplicate of the JDBC Integration tests in terms of what is being tested.
- they serve no real value since the JDBC integration test can be run locally without additional credentials.
- the main value this class brings is letting us run tests without building the docker image (the integration tests require doing so). however I feel this benefit is not worth the additional maintenance cost.
Same reasons to removing the Postgres test file.
@@ -127,20 +122,12 @@ | |||
* |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are many changes in this file since I did some refactoring here for readability.
Main changes are adding a namespace
field to the retrieveRecords
and retrieveNormalizedRecords
functions. Adding the getDefaultSchema
function to make it easy for destinations to specify the default schema.
I rearrange the test methods so it's
- Get
- Check(s)
- Syncs
- Incremental
- Normalisation
- Namespace
And pushed all the helper functions down. Sadly, can't use nested classes for better organisation here since we are using an abstract class pattern.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the diff on this file is hard to parse. i'm assuming it is just 1. a rearrange, 2 adding the defaultNamespace method, and 3 adding this test: source does not specify a schema. lmk if there's another change in here i should be looking for.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
aye apologies. I should have broken the refactoring changes into another commit. Will be better next time.
yes you were looked at all the right bits!
…ination. Running into a strange test error. Checkpointing for now.
/test connector=connectors/destination-csv
|
/test connector=connectors/destination-local-json
|
/test connector=connectors/destination-meilisearch
|
/test connector=connectors/destination-redshift
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One blocking question
airbyte-protocol/models/src/main/resources/airbyte_protocol/airbyte_protocol.yaml
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Worked out my question - no longer blocking
@sherifnada was my tech spec clear? how did you work it out? |
airbyte-integrations/connectors/destination-postgres/src/main/resources/spec.json
Show resolved
Hide resolved
@@ -11,34 +11,40 @@ | |||
"properties": { | |||
"host": { | |||
"description": "Host Endpoint of the Redshift Cluster (must include the cluster-id, region and end with .redshift.amazonaws.com)", | |||
"type": "string" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@@ -21,37 +21,44 @@ | |||
"host": { | |||
"description": "Host domain of the snowflake instance (must include the account, region, cloud environment, and end with snowflakecomputing.com).", | |||
"examples": ["accountname.us-east-2.aws.snowflakecomputing.com"], | |||
"type": "string" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@@ -76,8 +76,8 @@ | |||
"form.edit": "Edit", | |||
"form.done": "Done", | |||
"form.examples": "(e.g. {examples})", | |||
"form.prefix": "Namespace Prefix", | |||
"form.prefix.message": "Where to replicate your data to in your destination", | |||
"form.prefix": "Table Prefix", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nice!
@cgardens @sherifnada @ChristopheDuong besides publishing the new docker images, the destination side of enabling namespaces is done, so please take a look. Will follow this up with a source side PR to enable this for JDBC sources. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Very happy with this. Nice work! I especially like how you added support for testing the namespace stuff to testing. It's concise but I think will give us a lot of security. I also appreciate the screenshots you put in to show how your changes will affect UX.
I think one thing that needs to happen (either in this PR or in a follow up) is adding to the docs for each of these destinations what the expect behavior is around namespaces. Maybe just a new subsection we put in the docs for each destination. e.g. the csv, meiliseach, json ones ignore it. BQ should say namespace corresponds to dataset id, etc. I just want to make sure we don't forget it because it's going to be a nuanced idea for our users to wrap their head around. Also you should think about how how we want to document the "namespace" concept in general in our docs. maybe it gets its own readme page in the architecture section alongside incremental.md, etc?
@@ -16,8 +16,8 @@ | |||
}, | |||
"dataset_id": { | |||
"type": "string", | |||
"description": "The BigQuery dataset id that will house replicated tables.", | |||
"title": "Dataset ID" | |||
"description": "Default BigQuery Dataset ID tables are replicated to if the source does not specify a schema.", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i think this might be the wrong screenshot? i think this is the one for redshift?
airbyte-integrations/connectors/destination-postgres/src/main/resources/spec.json
Show resolved
Hide resolved
...tegration/java/io/airbyte/integrations/destination/redshift/RedshiftCopyIntegrationTest.java
Outdated
Show resolved
Hide resolved
@@ -76,8 +76,8 @@ | |||
"form.edit": "Edit", | |||
"form.done": "Done", | |||
"form.examples": "(e.g. {examples})", | |||
"form.prefix": "Namespace Prefix", | |||
"form.prefix.message": "Where to replicate your data to in your destination", | |||
"form.prefix": "Table Prefix", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nice!
@@ -16,8 +16,8 @@ | |||
}, | |||
"dataset_id": { | |||
"type": "string", | |||
"description": "The BigQuery dataset id that will house replicated tables.", | |||
"title": "Dataset ID" | |||
"description": "Default BigQuery Dataset ID tables are replicated to if the source does not specify a schema.", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
but reading the actual json, i think it's reasonable.
@@ -127,20 +122,12 @@ | |||
* |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the diff on this file is hard to parse. i'm assuming it is just 1. a rearrange, 2 adding the defaultNamespace method, and 3 adding this test: source does not specify a schema. lmk if there's another change in here i should be looking for.
Glad this made sense! Good call on documentation @cgardens. Agree with ya. Will release an actual version + add the necessary documentation in a follow up PR. I was thinking we should release the new destinations + documentation after the source side changes are done (should be in the next few days) since this isn't useful as is yet. What do you think? |
Add namespace field to the Airbyte Stream in preparation to propagate a source defined namespace to the Destination. This namespace field is then consumed as the destination schema the table is written to. This only applies to JDBC destinations. This is Steps 1 - 4 of the namespace tech spec, seen at https://docs.google.com/document/d/1qFk4YqnwxE4MCGeJ9M2scGOYej6JnDy9A0zbICP_zjI/edit. Some minor refactoring and commenting as I go. * Remove unnecessary test classes as they match Integration tests in terms of what is being tested. They have no real value since the corresponding integration test can be run locally without additional credentials. The main value the classes brings is letting us run tests without building the docker image (the integration tests require doing so). however I feel this benefit is not worth the additional maintenance cost. * Centralise DataArgumentProvider into it's own class for easier maintenance and usability.
What
Add namespace field to the Airbyte Stream in preparation to propagate a source defined namespace to the Destination.
This namespace field is then consumed as the destination schema the table is written to.
This only applies to JDBC destinations.
This is Steps 1 - 4 of this tech spec.
Some minor refactoring and commenting as I go.
How
See the above.
Pre-merge Checklist
[x] Run integration test.
Recommended reading order
airbyte_protocol.yaml
(main change)JdbcBufferedConsumerFactory
(propagate change to JDBC destinations)BigQueryDestination
(propagate change to BigQuery)RedshiftCopyDestination
andRedshiftCopier
(propagate change to Redshift)TestDestination
and variousYDestination
classes (understand how this change tests)spec.json
(UI changes)en.json
(UI changes)