Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Destination Oracle: change table limit and format record data #5542

Merged
merged 5 commits into from
Aug 24, 2021

Conversation

marcosmarxm
Copy link
Member

@marcosmarxm marcosmarxm commented Aug 20, 2021

What

Closes #5498

Today we support Oracle minimum version 11 source and destination.
Unfortunately Oracle 11 doesn't have the json_value function or support json objects. This makes normalization really hard to be implemented.

Suggestion: Removes support for Oracle 11g in destination. The EOL for Oracle 11G was dec-2020.
What this causes to our tests? The unique public container exists is Dockerhub is Oracle 11g (dont know how...). Oracle allows you download Oracle 18 or 19 version from their registry, which could be a way to run integration tests. The problem is this container took ~45 min to spin up which can be really complicated. The viable solution is have a cloud instance similar to Redshift to help us with integration tests.

This modification will support Oracle 12g+ (this is the first version supports table name with 128 chars and json objects). The suggestions to use Oracle 19 in tests is because this version is available in Oracle Registry/AWS RDS.

How

Oracle 19 json_value doesnt allow ' or " in keys so it we need to clean some records the key name using a formatted function. Example: column`_'with\"_quotes becomes column___with\__quotes.

Recommended reading order

  1. x.java
  2. y.python

Checklist

  • create AWS RDS Instance
  • add secrets
  • run tests
  • update docs

@github-actions github-actions bot added the area/connectors Connector related issues label Aug 20, 2021
@marcosmarxm marcosmarxm mentioned this pull request Aug 22, 2021
9 tasks
@@ -150,8 +152,9 @@ private static void insertRawRecordsInSingleQuery(String tableName,
int i = 1;
for (final AirbyteRecordMessage message : records) {
// 1-indexed
final JsonNode formattedData = StandardNameTransformer.formatJsonPath(message.getData());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is modifying the data itself. Is there any way around this?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

agreed with sherif. the name transformer was intended to transform "names" (e.g. table names and column names). in general we want to avoid modifying the data if at all possible.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm doing the same as in BigQuery destination (this change the json path to remove any non-supported char in the stream name not change the original data):

final JsonNode formattedData = StandardNameTransformer.formatJsonPath(recordMessage.getData());
return Jsons.jsonNode(ImmutableMap.of(
JavaBaseConstants.COLUMN_NAME_AB_ID, UUID.randomUUID().toString(),
JavaBaseConstants.COLUMN_NAME_DATA, Jsons.serialize(formattedData),
JavaBaseConstants.COLUMN_NAME_EMITTED_AT, formattedEmittedAt));
}

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I updated the comment in PR, sorry the confuction: Oracle 19 json_value doesnt allow ' or " in keys so it we need to clean some records the key name using a formatted function. Example: column`_'with\"_quotes becomes column___with\__quotes.

Copy link
Member Author

@marcosmarxm marcosmarxm Aug 23, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Example of AIRBYTE_DATA saved using this function. The original data is kept but to run normalization I need a "cleaned" field name.

'{"date":"2020-08-29T00:00:00Z",
"partition":{"DATA":[{"currency":"EUR"}],
"double_array_data":[[{"id":"EUR"}]],
"column___with__quotes":[{"currency":"EUR"}], --formatted
"column`_\'with\\"_quotes":[{"currency":"EUR"}]}, --original 
"id":4.2}'

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fair enough. this feels like a weird sacrifice we are making for dbt.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is something needed for the destination itself because the source is writing a json key not supported.

Copy link
Contributor

@cgardens cgardens left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removes support for Oracle 11g in destination.

👍 Fine with me. Is there some way that we can check the version of a database at runtime? e.g. during check connection. If so, can we throw a useful error message if they are using a version of the db that is too old.

What this causes to our tests? The unique public container exists is Dockerhub is Oracle 11g (dont know how...). Oracle allows you download Oracle 18 or 19 version from their registry, which could be a way to run integration tests. The problem is this container took ~45 min to spin up which can be really complicated. The viable solution is have a cloud instance similar to Redshift to help us with integration tests.

I don't understand what you're saying here. Is is possible to user a container with newer versions of oracle that is faster to start up? or are you saying even with newer versions it is really slow? I am fine with use using newer versions of oracle db in the tests if it allows us to user a docker container that is faster to spin up.

@@ -150,8 +152,9 @@ private static void insertRawRecordsInSingleQuery(String tableName,
int i = 1;
for (final AirbyteRecordMessage message : records) {
// 1-indexed
final JsonNode formattedData = StandardNameTransformer.formatJsonPath(message.getData());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

agreed with sherif. the name transformer was intended to transform "names" (e.g. table names and column names). in general we want to avoid modifying the data if at all possible.

@marcosmarxm
Copy link
Member Author

I don't understand what you're saying here. Is is possible to user a container with newer versions of oracle that is faster to start up? or are you saying even with newer versions it is really slow? I am fine with use using newer versions of oracle db in the tests if it allows us to user a docker container that is faster to spin up.

@cgardens sorry. The container we use today (Oracle 11g) is pretty fast to spin up. The problem is with Oracle 18/19 that takes 45 min to setup (the newest version...)

@cgardens
Copy link
Contributor

The container we use today (Oracle 11g) is pretty fast to spin up. The problem is with Oracle 18/19 that takes 45 min to setup (the newest version...)

That is awful! 😵 Yeah. I guess if that's the case we need to just have an oracle db that we keep running in our integration test environment. @sherifnada is that the right way to do it or is there something better?

@sherifnada
Copy link
Contributor

@cgardens yup we should keep one running encoded via terraform

@github-actions github-actions bot added area/documentation Improvements or additions to documentation CDK Connector Development Kit labels Aug 23, 2021
@github-actions github-actions bot removed the CDK Connector Development Kit label Aug 23, 2021
@marcosmarxm
Copy link
Member Author

marcosmarxm commented Aug 23, 2021

/test connector=connectors/destination-oracle

🕑 connectors/destination-oracle https://github.com/airbytehq/airbyte/actions/runs/1159962596
✅ connectors/destination-oracle https://github.com/airbytehq/airbyte/actions/runs/1159962596

Comment on lines -178 to -181
database.query(ctx -> {
ctx.execute("alter database default tablespace users");
return null;
});
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I removed this because requires a very strong privilege to execute it. I'm running the AWS instance where I can change the permission of my admin user.

@jrhizor jrhizor temporarily deployed to more-secrets August 23, 2021 19:30 Inactive
@marcosmarxm
Copy link
Member Author

@cgardens yup we should keep one running encoded via terraform

I created #5581 to solve this, I already created an instance in AWS to run tests. Can we follow this PR using that instance?

@cgardens
Copy link
Contributor

yeah. i think that's okay.

@marcosmarxm
Copy link
Member Author

marcosmarxm commented Aug 24, 2021

/publish connector=connectors/destination-oracle

🕑 connectors/destination-oracle https://github.com/airbytehq/airbyte/actions/runs/1160984054
✅ connectors/destination-oracle https://github.com/airbytehq/airbyte/actions/runs/1160984054

@jrhizor jrhizor temporarily deployed to more-secrets August 24, 2021 02:29 Inactive
@marcosmarxm marcosmarxm merged commit 29d65e9 into master Aug 24, 2021
@marcosmarxm marcosmarxm deleted the marcosmarxm/oracle-dest branch August 24, 2021 02:39
avida pushed a commit that referenced this pull request Aug 25, 2021
* change oracle dest connector

* remove container to run tests add depency to secrets/config.json

* correct functions in tests

* gradlew format
@marcosmarxm marcosmarxm restored the marcosmarxm/oracle-dest branch August 31, 2021 00:00
@swyxio swyxio deleted the marcosmarxm/oracle-dest branch October 12, 2022 18:19
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/connectors Connector related issues area/documentation Improvements or additions to documentation
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Destination Oracle: json_value doesnt handle special char
4 participants