Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Oracle destination implementation #3498

Merged
merged 11 commits into from
Jun 3, 2021
Merged

Oracle destination implementation #3498

merged 11 commits into from
Jun 3, 2021

Conversation

masonwheeler
Copy link
Contributor

@masonwheeler masonwheeler commented May 19, 2021

7/16 tests passing

Checking in a WIP update at @cgardens's request, to get some help with debugging.

┆Issue is synchronized with this Asana task by Unito

@michel-tricot
Copy link
Contributor

@masonwheeler can you give a more explicit title to the PR? :)

@masonwheeler masonwheeler changed the title WIP initial checkin of changes Oracle destination implementation May 19, 2021
Comment on lines 34 to 41
"schema": {
"title": "Default Schema",
"description": "The default schema tables are written to if the source does not specify a namespace. The usual value for this field is \"public\".",
"type": "string",
"examples": ["public"],
"default": "public",
"order": 3
},
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

https://dba.stackexchange.com/questions/37012/difference-between-database-vs-user-vs-schema#:~:text=In%20Oracle%2C%20users%20and%20schemas,that%20belong%20to%20that%20account.

basically you dont need schema... or could be a good thing if on the docs we suggest creating a user airbyte and give permission to CREATE in another user/schema =)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Apparently in Oracle, a "user" is what a "schema" is in other SQL databases. Oracle documentation even talks about user names in identifiers as a "schema name" in various places. I've set this up to use users as schemas, and all the tests passed that way. We should be good just keeping the Airbyte concept of schemas and mapping them onto Users under the hood.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

2 questions:

  • How should we support namespaces? should we just not support them for oracle? (@marcosmarxm - would love your opinion since you're the only one here who has used oracle at all) @masonwheeler for context: when the source provides a namespace (e.g. if you're replicating from the public schema in postgres , the namespace is public), then the destination tries to put data in the same schema in the destination. so the data would end up in the public schema in the destination. So I guess what I'm asking is should we support that for Oracle or force all data to go into one schema / user? I think the current implementation is allow us to put data in multiple schemas but requires that the user have create schema access? If my current understanding is correct, then that seems reasonable to me.
  • Does it make sense to have both of these fields? I guess the idea is that the provided username has to have permission to create multiple schemas? but the schema field is the default place we will put data if no namespace is provided. the idea is that if needed the username has permission to to create that schema?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we shouldn't support them for now. If Airbyte needs to create schemas in Oracle this means we need to create users(user and passwords) and for that the Airbyte user must need admin privileges to do that because it needs to give permissions and manage this to every new schema.

@masonwheeler masonwheeler marked this pull request as ready for review May 25, 2021 23:31
@auto-assign auto-assign bot requested a review from jrhizor May 25, 2021 23:31
Copy link
Contributor

@cgardens cgardens left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice! A bit of clean up to do but overall looks great.

One blocking question on a change you made to the test suite. After that I think most of the rest of the changes are pretty straight forward.

Remember to add a docs page in docs/. It should include what permissions someone needs to grant a user (ideally with the actual queries one runs in order to grant the user those permissions)


// Simple class to host a Destination in-memory rather than spinning up a container for it.
// For debugging and testing purposes only; not recommended to use this for real code
public class LocalAirbyteDestination implements AirbyteDestination {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this can be handy while developing / testing a connector. I'd like to keep it.

I don't love the idea of base-java having a dependency on airybte-workers. Would it makes ense to put this in bases/standard-destination-test? This way it is more clearly only for testing. I believe that package already depends on airbyte-workers anyway.

@@ -400,14 +405,17 @@ public void testIncrementalSync() throws Exception {
final ConfiguredAirbyteCatalog configuredCatalog = CatalogHelpers.toDefaultConfiguredCatalog(catalog);
configuredCatalog.getStreams().forEach(s -> {
s.withSyncMode(SyncMode.INCREMENTAL);
s.withDestinationSyncMode(DestinationSyncMode.APPEND);
s.withDestinationSyncMode(DestinationSyncMode.OVERWRITE);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why is this changing? I think the point of the test is to test the append functionality? If a destination can't handle an APPEND here it will break in production. lmk if this isn't clear, happy to clarify.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The first one is changing to overwrite so that it clears out any data that may have been left there by previous tests. The second one changes back to append to test append functionality correctly. This is not a problem with the destination being unable to handle appends; it's a problem with the destination using a container that takes a long time to recycle, so we're not recycling it. None of the other tests have a problem with this because they all use overwrite mode; this one needs to begin on overwrite mode as well in order to give consistent results.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tests should clean themselves up. The test case should not be having to worry about this. In production it is totally reasonable that on the first run there is going to be an APPEND so it is valuable for the test to emulate that behavior.

JavaBaseConstants.COLUMN_NAME_DATA.substring(1));
}

/*
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we remove this?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No. The JavaBaseConstants all start with underscores, which are toxic to Oracle for whatever reason. (Underscores are just fine within identifiers, but if you try to start with one, it will blow up on you.)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

my question here was just referring to the commented out code. i'm asking if we need this commented out code.

@Override
public void createSchemaIfNotExists(JdbcDatabase database, String schemaName) throws Exception {
final String query = String.format(
"declare userExists integer; begin select count(*) into userExists from dba_users where upper(username) = upper('%s'); " +
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is it safe to assume the PL/SQL will be available on all oracle dbs?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably, but I can rewrite this to not use it easily enough.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i think we need to be clear on this. I'm fine if we do depend on it, but if we do, we need to be clear about it in the docs. that way, at least, if it turns out it was bad to depend on it, we will get the feedback clearly.

Comment on lines 148 to 150
// todo (cgardens) - rework this abstraction so that we do not have to pass a null into the
// constru
// ctor. at least explicitly handle it, even if the impl doesn't change.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

kill this comment or at least get it to format nicely?

Comment on lines 162 to 164
// how to interact with the mssql test container manually.
// 1. exec into mssql container (not the test container container)
// 2. /opt/mssql-tools/bin/sqlcmd -S localhost -U SA -P "A_Str0ng_Required_Password"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

mssql instead of oracle db? would be great to do the same instructions for oracle though if you learned them.

}

@Override
protected void tearDown(TestDestinationEnv testEnv) {}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should the tear down try to remove the tables? i guess since setup does it we don't technically need to worry about it, but it is still more predictable if tests do clean themselves up. probably just putting line 174 in a helper.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's no good way for the tests to clean up after themselves. Look at the functionality provided by AirbyteDestination and you'll see what I mean. On our original Destination (the Postgres one), this was handled by simply recycling the container each time, so it wasn't an issue until we started working with heavier DB containers.

Copy link
Contributor

@cgardens cgardens May 26, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fair enough.

  1. first priority is to get these tests to clean up properly so that each test runs in a hermetic environment.
  2. second priority is speed.

given how slow the start up is on the container it would be nice if we didn't turn it on and off every time time, but if there's no reasonable way to query the system schema for all of the tables and truncate / delete them, then i guess that's what we have to do.

Comment on lines 175 to 177
ctx.fetch("CREATE TABLE id_and_name(id INTEGER NOT NULL, name VARCHAR(200), born TIMESTAMP WITH TIME ZONE)");
ctx.fetch(
"INSERT ALL INTO id_and_name (id, name, born) VALUES (1,'picard', TIMESTAMP '2124-03-04 01:01:01') INTO id_and_name (id, name, born) VALUES (2, 'crusher', TIMESTAMP '2124-03-04 01:01:01') INTO id_and_name (id, name, born) VALUES (3, 'vash', TIMESTAMP '2124-03-04 01:01:01') SELECT 1 FROM DUAL");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why are we creating this table and inserting rows? the extended test class should handle setting up all test cases. i think this might be copy pasta.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh. i guess the drop table above isn't relative to this test suite either.


@Override
public String getTmpTableName(String streamName) {
return maxStringLength(convertStreamName("airbyte_tmp_" + streamName + "_" + Instant.now().toEpochMilli()), 30);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i like this reformulation of the tmp table name. it does risk that if a previous sync failed in a bad state and didn't clean up its temporary tables that there be collisions. I think either we need to decrease the chance of collision by guaranteeing some randomness in the name OR we need to make sure to wipe out the table when we start syncing. e.g. before this line drop the table if it exists.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@masonwheeler any thoughts on this comment? ^

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also i think the logic below can be simplified. maxStringLength is getting called in a couple different places which a bit confusing, and I think not necessary (see lines 46 and 62).

Copy link
Contributor

@cgardens cgardens left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

still a couple loose ends to tie up here.

cleanup(configuredCatalog);
}

private void cleanup(ConfiguredAirbyteCatalog catalog) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can this be removed?

@@ -408,6 +413,9 @@ public void testIncrementalSync() throws Exception {
final JsonNode config = getConfig();
runSync(config, firstSyncMessages, configuredCatalog);

configuredCatalog.getStreams().forEach(s -> {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

wasn't this already done on line 408?


integrationTestJavaImplementation project(':airbyte-integrations:bases:standard-destination-test')
integrationTestJavaImplementation project(':airbyte-integrations:connectors:destination-oracle')
integrationTestJavaImplementation project(':airbyte-workers')
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do you still need this dep?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, AirbyteDestination is declared in that package.


@Override
public String getTmpTableName(String streamName) {
return maxStringLength(convertStreamName("airbyte_tmp_" + streamName + "_" + Instant.now().toEpochMilli()), 30);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@masonwheeler any thoughts on this comment? ^


@Override
public String getTmpTableName(String streamName) {
return maxStringLength(convertStreamName("airbyte_tmp_" + streamName + "_" + Instant.now().toEpochMilli()), 30);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also i think the logic below can be simplified. maxStringLength is getting called in a couple different places which a bit confusing, and I think not necessary (see lines 46 and 62).

.put("port", db.getFirstMappedPort())
.put("username", db.getUsername())
.put("password", db.getPassword())
.put("schema", "testSchema")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there's a schema in these tests but not in the spec.json? something doesn't seem right here. based on the docs I think you wanted to go with supporting name spaces (hence the create databases privileges), which I'm supportive of, but then it needs to be in the spec.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're right. I thought I left it in! 🤦‍♂️

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's a bit scary that tests were all still passing despite this being the case. it would be helpful to identify how we could have caught this programmatically.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's as simple as "there's nothing in the system that validates this generated config object in the test system against the destination spec." Should there be?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah. i think there should be. would be worth creating creating an issue for this, i think.

Copy link
Contributor

@cgardens cgardens left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

see the comment below on the usage LocalAirbyteDestination. i want to make sure that's clear before we move forward.


@Override
protected AirbyteDestination getDestination() {
return new LocalAirbyteDestination(new OracleDestination());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's definitely fine to use LocalAirbyteDestination while you're developing but the integration tests that run in the CI need to use the DefaultAirbyteDestination. This is how we get the best guarantee that the docker image that we're publishing actually works.

By switching back to the default one, I think you now be able to drop the airbyte-workers dep.


@Override
public String getTmpTableName(String streamName) {
return convertStreamName("airbyte_tmp_" + streamName + "_" + UUID.randomUUID().toString().replace("`", ""));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

.replace("`", "")

why are we replacing backticks? is this just a utf8 character that is not supported? are there other that oracle doesn't support that should be handled here as well?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ouch! Not sure how that happened; that's supposed to be stripping out hyphens. How in the world did I typo that to something waaaaay on the other side of the keyboard? 😳

.put("port", db.getFirstMappedPort())
.put("username", db.getUsername())
.put("password", db.getPassword())
.put("schema", "testSchema")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's a bit scary that tests were all still passing despite this being the case. it would be helpful to identify how we could have caught this programmatically.

Copy link
Contributor

@cgardens cgardens left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

wahoo! looks good. make sure to publish it before merging and get the build passing.

@masonwheeler
Copy link
Contributor Author

masonwheeler commented Jun 3, 2021

/publish connector=connectors/destination-oracle

🕑 connectors/destination-oracle https://github.com/airbytehq/airbyte/actions/runs/904397599
❌ connectors/destination-oracle https://github.com/airbytehq/airbyte/actions/runs/904397599

@masonwheeler masonwheeler merged commit 8dadd1c into master Jun 3, 2021
@masonwheeler masonwheeler deleted the issue_2962 branch June 3, 2021 22:27
@masonwheeler masonwheeler restored the issue_2962 branch June 3, 2021 22:37
@@ -41,7 +41,7 @@ public String getRawTableName(String streamName) {

@Override
public String getTmpTableName(String streamName) {
return convertStreamName("_airbyte_" + Instant.now().toEpochMilli() + "_" + streamName);
return convertStreamName("_airbyte_" + Instant.now().toEpochMilli() + "_" + getRawTableName(streamName));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey

I am curious to know if there was any particular reason for this change?

Instead of _airbyte_20120010310_streamname as TmpTableName, it is now _airbyte_20120010310__airbyte_raw_streamname which makes the temporary names even longer than before...

Copy link
Contributor

@cgardens cgardens Jun 15, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ChristopheDuong I don't think that's true. He did a lift and shift of the original behavior as far as I can tell. Here's a screenshot of the logic he replaced in this PR in JdbcBufferedConsumerFactory.

Screen Shot 2021-06-15 at 2 47 36 PM

I agree the current naming is needlessly verbose, but that didn't happen as part of this change. The logic just moved.

The reason we pushed this logic into here is to avoid hacks like this: https://github.com/airbytehq/airbyte/blob/master/airbyte-integrations/connectors/destination-jdbc/src/main/java/io/airbyte/integrations/destination/jdbc/JdbcBufferedConsumerFactory.java#L114-L120. Essentially by pushing the logic into the name transformer it is not configurable specifically for each database based on their specific needs (you'll see that the oracle name transformer handles shortening names).

By all means though, we can change the standard naming convention to something less silly.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants