
Add Namespace Field. #2704

Merged: 21 commits, Apr 6, 2021

Conversation

davinchia (Contributor) commented Apr 1, 2021

What

Add namespace field to the Airbyte Stream in preparation to propagate a source defined namespace to the Destination.

This namespace field is then consumed as the destination schema the table is written to.

This only applies to JDBC destinations.

This covers Steps 1 - 4 of the namespace tech spec.

Some minor refactoring and commenting as I go.

How

See the above.
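For readers without the diff, a hedged sketch of what the `airbyte_protocol.yaml` change might look like (the `namespace` field is from this PR; the surrounding fields are abbreviated and partly assumed):

```yaml
# Sketch only: abbreviated AirbyteStream definition with the new field.
AirbyteStream:
  type: object
  required:
    - name
    - json_schema
  properties:
    name:
      type: string
    namespace:
      type: string
      description: Optional source-defined namespace. Destinations use it as the schema/dataset the stream's tables are written to.
    json_schema:
      type: object
```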

Pre-merge Checklist

[x] Run integration test.

Recommended reading order

  1. airbyte_protocol.yaml (main change)
  2. JdbcBufferedConsumerFactory (propagate change to JDBC destinations)
  3. BigQueryDestination (propagate change to BigQuery)
  4. RedshiftCopyDestination and RedshiftCopier (propagate change to Redshift)
  5. TestDestination and the various destination-specific test classes (understand how this change is tested)
  6. spec.json (UI changes)
  7. en.json (UI changes)

@davinchia davinchia changed the title Davinchia/add namespace field Add Namespace Field. Apr 1, 2021
public class WriteConfig {

private final String streamName;
private final String outputNamespaceName;
private final String outputSchemaName;
davinchia (Contributor Author):

Since namespace doesn't currently appear in the destination concept space, I felt it was clearer to rename this field.
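For context, a minimal hypothetical sketch of the renamed class; the constructor shape, the namespace-over-default fallback, and the getter names are assumptions, not the actual Airbyte code:

```java
// Hypothetical sketch: per-stream write configuration where the output
// schema is resolved from the stream's source-defined namespace, falling
// back to the destination's configured default schema.
public class WriteConfig {

    private final String streamName;
    private final String outputSchemaName; // renamed from outputNamespaceName

    public WriteConfig(String streamName, String namespace, String defaultSchemaName) {
        this.streamName = streamName;
        // a source-defined namespace wins over the destination default
        this.outputSchemaName = namespace != null ? namespace : defaultSchemaName;
    }

    public String getStreamName() {
        return streamName;
    }

    public String getOutputSchemaName() {
        return outputSchemaName;
    }
}
```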

davinchia (Contributor Author) commented:

@cgardens want to take a quick sanity check? The destination-side implementation is very simple; I'm mostly struggling to test this with the tests that can't be run locally. I'm pretty confident; just 2 test cases and this should be good to go.

* SOFTWARE.
*/

package io.airbyte.integrations.standardtest.destination;
davinchia (Contributor Author):

Broke this out since the TestDestination class was getting quite large.

*/
public class DataArgumentsProvider implements ArgumentsProvider {

public static final CatalogMessageTestConfigPair EXCHANGE_RATE_CONFIG =
davinchia (Contributor Author):

Also allows us to centralise naming and references to the various test configs here, instead of scattering manually typed strings.
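As a rough illustration of the idea, with the JUnit `ArgumentsProvider` plumbing omitted; beyond the `EXCHANGE_RATE_CONFIG` constant shown above, the field and file names are assumptions:

```java
// Hypothetical simplified sketch: one place to pair each test catalog file
// with its matching messages file, so tests reference named constants
// instead of repeating string literals.
public class DataArgumentsProvider {

    public static class CatalogMessageTestConfigPair {
        public final String catalogFile;
        public final String messageFile;

        public CatalogMessageTestConfigPair(String catalogFile, String messageFile) {
            this.catalogFile = catalogFile;
            this.messageFile = messageFile;
        }
    }

    public static final CatalogMessageTestConfigPair EXCHANGE_RATE_CONFIG =
        new CatalogMessageTestConfigPair("exchange_rate_catalog.json", "exchange_rate_messages.txt");
}
```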

@@ -1,335 +0,0 @@
/*
davinchia (Contributor Author) commented Apr 2, 2021:

I decided this test class was of very little use since:

  1. it is almost an exact duplicate of the JDBC integration tests in terms of what is being tested.
  2. it adds no real value, since the JDBC integration test can be run locally without additional credentials.
  3. the main value this class brings is letting us run tests without building the Docker image (the integration tests require doing so); however, I feel this benefit is not worth the additional maintenance cost.

The same reasons apply to removing the Postgres test file.

@@ -127,20 +122,12 @@
*
davinchia (Contributor Author) commented Apr 2, 2021:

There are many changes in this file since I did some refactoring for readability.

The main changes are adding a namespace field to the retrieveRecords and retrieveNormalizedRecords functions, and adding a getDefaultSchema function to make it easy for destinations to specify the default schema.

I also rearranged the test methods into this order:

  1. Get
  2. Check(s)
  3. Syncs
  4. Incremental
  5. Normalisation
  6. Namespace

And I pushed all the helper functions down. Sadly, we can't use nested classes for better organisation here, since we are using an abstract class pattern.
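In sketch form, the two hooks described above might look like the following; the names, signatures, and default value are assumptions based on this description, not the actual Airbyte test harness:

```java
// Hypothetical sketch of the two test-harness hooks described above.
import java.util.List;

public abstract class DestinationAcceptanceTestSketch {

    // Destinations override this to read back the raw records written to a
    // given namespace (a schema for JDBC destinations, a dataset for BigQuery).
    protected abstract List<String> retrieveRecords(String streamName, String namespace) throws Exception;

    // The namespace records land in when the source does not set one.
    // Destinations with a different default are expected to override this.
    protected String getDefaultSchema(String config) {
        return "public";
    }
}
```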

cgardens (Contributor):

the diff on this file is hard to parse. i'm assuming it is just 1. a rearrange, 2 adding the defaultNamespace method, and 3 adding this test: source does not specify a schema. lmk if there's another change in here i should be looking for.

davinchia (Contributor Author):

Aye, apologies. I should have broken the refactoring changes into another commit. Will do better next time.

Yes, you were looking at all the right bits!

Davin Chia added 2 commits April 3, 2021 01:52
…ination. Running into a strange test error. Checkpointing for now.
davinchia commented Apr 3, 2021

/test connector=connectors/destination-csv

🕑 connectors/destination-csv https://github.com/airbytehq/airbyte/actions/runs/714239619
✅ connectors/destination-csv https://github.com/airbytehq/airbyte/actions/runs/714239619

davinchia commented Apr 3, 2021

/test connector=connectors/destination-local-json

🕑 connectors/destination-local-json https://github.com/airbytehq/airbyte/actions/runs/714239742
✅ connectors/destination-local-json https://github.com/airbytehq/airbyte/actions/runs/714239742

davinchia commented Apr 3, 2021

/test connector=connectors/destination-meilisearch

🕑 connectors/destination-meilisearch https://github.com/airbytehq/airbyte/actions/runs/714253798
✅ connectors/destination-meilisearch https://github.com/airbytehq/airbyte/actions/runs/714253798

@davinchia davinchia marked this pull request as ready for review April 3, 2021 12:10
@davinchia davinchia requested review from ChristopheDuong and removed request for michel-tricot April 3, 2021 12:27
davinchia commented Apr 3, 2021

/test connector=connectors/destination-redshift

🕑 connectors/destination-redshift https://github.com/airbytehq/airbyte/actions/runs/714464398
❌ connectors/destination-redshift https://github.com/airbytehq/airbyte/actions/runs/714464398
🕑 connectors/destination-redshift https://github.com/airbytehq/airbyte/actions/runs/714464398
✅ connectors/destination-redshift https://github.com/airbytehq/airbyte/actions/runs/714464398

@sherifnada (Contributor) left a comment:

One blocking question

@sherifnada (Contributor) left a comment:

Worked out my question - no longer blocking

davinchia commented Apr 4, 2021

@sherifnada was my tech spec clear? how did you work it out?

@@ -11,34 +11,40 @@
"properties": {
"host": {
"description": "Host Endpoint of the Redshift Cluster (must include the cluster-id, region and end with .redshift.amazonaws.com)",
"type": "string"
davinchia (Contributor Author):

Result: [screenshot]

@@ -21,37 +21,44 @@
"host": {
"description": "Host domain of the snowflake instance (must include the account, region, cloud environment, and end with snowflakecomputing.com).",
"examples": ["accountname.us-east-2.aws.snowflakecomputing.com"],
"type": "string"
davinchia (Contributor Author):

Result: [screenshot]

@@ -76,8 +76,8 @@
"form.edit": "Edit",
"form.done": "Done",
"form.examples": "(e.g. {examples})",
"form.prefix": "Namespace Prefix",
"form.prefix.message": "Where to replicate your data to in your destination",
"form.prefix": "Table Prefix",
davinchia (Contributor Author):

Result: [screenshot]

A reviewer (Contributor):

nice!

davinchia (Contributor Author) commented:

@cgardens @sherifnada @ChristopheDuong besides publishing the new docker images, the destination side of enabling namespaces is done, so please take a look. Will follow this up with a source side PR to enable this for JDBC sources.

@cgardens (Contributor) left a comment:

Very happy with this. Nice work! I especially like how you added namespace coverage to the test harness. It's concise, but I think it will give us a lot of security. I also appreciate the screenshots you put in to show how your changes will affect UX.

I think one thing that needs to happen (either in this PR or in a follow-up) is documenting, for each of these destinations, what the expected behavior is around namespaces. Maybe just a new subsection we put in the docs for each destination: e.g. the csv, meilisearch, and json ones ignore it; BQ should say namespace corresponds to dataset id, etc. I just want to make sure we don't forget it, because it's going to be a nuanced idea for our users to wrap their heads around. You should also think about how we want to document the "namespace" concept in general in our docs. Maybe it gets its own readme page in the architecture section alongside incremental.md, etc.?

@@ -16,8 +16,8 @@
},
"dataset_id": {
"type": "string",
"description": "The BigQuery dataset id that will house replicated tables.",
"title": "Dataset ID"
"description": "Default BigQuery Dataset ID tables are replicated to if the source does not specify a schema.",
cgardens (Contributor):

i think this might be the wrong screenshot? i think this is the one for redshift?

cgardens (Contributor) replied:

but reading the actual json, i think it's reasonable.
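Read that way, the described behavior is a simple fallback; a hypothetical sketch of it (class and method names are mine, not Airbyte's):

```java
// Hypothetical sketch: pick the BigQuery dataset for a stream, per the
// spec.json description above. A source-defined stream namespace wins;
// otherwise the configured default dataset_id is used.
public class DatasetResolver {

    public static String datasetFor(String streamNamespace, String configuredDatasetId) {
        return streamNamespace == null || streamNamespace.isEmpty()
            ? configuredDatasetId
            : streamNamespace;
    }
}
```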

@sherifnada sherifnada self-requested a review April 5, 2021 22:38
davinchia (Contributor Author) commented:

Glad this made sense! Good call on documentation, @cgardens. Agree with ya. Will release an actual version + add the necessary documentation in a follow-up PR.

I was thinking we should release the new destinations + documentation after the source-side changes are done (should be in the next few days), since this isn't useful as-is yet. What do you think?

@davinchia davinchia merged commit 9f16651 into master Apr 6, 2021
@davinchia davinchia deleted the davinchia/add-namespace-field branch April 6, 2021 03:50
davinchia pushed a commit that referenced this pull request Apr 6, 2021
Add namespace field to the Airbyte Stream in preparation to propagate a source defined namespace to the Destination.

This namespace field is then consumed as the destination schema the table is written to.

This only applies to JDBC destinations.

This is Steps 1 - 4 of the namespace tech spec, seen at https://docs.google.com/document/d/1qFk4YqnwxE4MCGeJ9M2scGOYej6JnDy9A0zbICP_zjI/edit.

Some minor refactoring and commenting as I go.
* Remove unnecessary test classes, as they match the integration tests in terms of what is being tested. They add no real value, since the corresponding integration tests can be run locally without additional credentials. The main value these classes bring is letting us run tests without building the docker image (the integration tests require doing so); however, I feel this benefit is not worth the additional maintenance cost.
* Centralise DataArgumentsProvider into its own class for easier maintenance and usability.