-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Use parse base64Binary to parse binary related data #32610
Conversation
The latest updates on your projects. Learn more about Vercel for Git ↗︎
|
@@ -6,7 +6,7 @@ plugins { | |||
airbyteJavaConnector { | |||
cdkVersionRequired = '0.4.1' | |||
features = ['db-sources'] | |||
useLocalCdk = false | |||
useLocalCdk = true |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remember to toggle these to false and bump ckd version finally
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Couple of comments:
- Let's add a test to MySqlSourceOperationsTest.java
- This actually only affects cases where binary data is set as the cursor.
The flow is :
- From the source DB, data is read in via putBinary method, which reads it as Base 64 encoded string
- Data is saved in the load state, again as a base 64 encoded string
- Data is parsed while setting the cursor as a hex string, which this pR fixes. Thanks!
But to clarify setting the cursor field is a source connector only setting - this isn't changing what is sent to the destination in the case of a binary field
Co-authored-by: Alexandre Girard <alexandre@airbyte.io>
@akashkulk I'll add the test in the next PR where we point source connector to the new CDK version (otherwise it won't pass because it's still using old logic to parse out binary) |
@@ -222,7 +222,7 @@ protected void setString(final PreparedStatement preparedStatement, final int pa | |||
} | |||
|
|||
protected void setBinary(final PreparedStatement preparedStatement, final int parameterIndex, final String value) throws SQLException { | |||
preparedStatement.setBytes(parameterIndex, DatatypeConverter.parseHexBinary(value)); | |||
preparedStatement.setBytes(parameterIndex, DatatypeConverter.parseBase64Binary(value)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just FYI, this affects other connectors too, since this is shared code. We should understand which connectors this affects - this would be the classes with source operations that inheirit from this and classes that allow BINARY columns as cursors (which I don't think we have more of)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I should clarify, no need to publish them - more of a documentation exercise
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
They are called from postgres and mysql, and general JdbcSourceOperations which is inherited by Teradata, Oracle, Cockroach, Db2, Snowflake, Mssql, Redshift.
Indeed we do not allow binary column as cursor, both in individual SourceOperations(Mysql, Postgres) and general JDBCSourceOperations
airbyte/airbyte-cdk/java/airbyte-cdk/core/src/main/java/io/airbyte/cdk/db/jdbc/JdbcUtils.java
Line 67 in da32fc8
public static final Set<JDBCType> ALLOWED_CURSOR_TYPES = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cool, good to know - then this shouldn't affect anything for now other than MySQL
@@ -63,6 +69,7 @@ public class MySqlSourceOperations extends AbstractJdbcCompatibleSourceOperation | |||
private static final Logger LOGGER = LoggerFactory.getLogger(MySqlSourceOperations.class); | |||
private static final Set<MysqlType> ALLOWED_CURSOR_TYPES = Set.of(TINYINT, TINYINT_UNSIGNED, SMALLINT, | |||
SMALLINT_UNSIGNED, MEDIUMINT, MEDIUMINT_UNSIGNED, INT, INT_UNSIGNED, BIGINT, BIGINT_UNSIGNED, | |||
TINYBLOB, BLOB, MEDIUMBLOB, LONGBLOB, BINARY, VARBINARY, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As discussed yesterday, we should not allow BINARY as a cursor type, since it's hard for the user to make sure that this value is increasing (hallmark of a good cursor)
We mainly use it for the initial load (where we checkpoint by PK for the user), so I think we can exclude it here
airbyte-integrations/connectors/source-mysql-strict-encrypt/metadata.yaml
Outdated
Show resolved
Hide resolved
@@ -11,7 +11,7 @@ data: | |||
connectorSubtype: database | |||
connectorType: source | |||
definitionId: 435bb9a5-7887-4809-aa58-28c27df0d7ad | |||
dockerImageTag: 3.1.8 | |||
dockerImageTag: 3.1.10 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why 3.1.10 instead of 3.1.9?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
3.1.9 was used by my previous PR (The one to extend debezium schema loading time)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good in general - some very minor nits. Approving to unblock!! FYI - I bumped the CDK to 0.5.3....so you're going to have to use 0.5.4
/publish-java-cdk
|
Fix for
https://github.com/airbytehq/oncall/issues/3079
#31311
Rational is
we treat all binary in JDBC as String in discover according to this line:
airbyte/airbyte-cdk/java/airbyte-cdk/core/src/main/java/io/airbyte/cdk/db/jdbc/JdbcSourceOperations.java
Line 147 in da32fc8
And we are reading it assuming the value is hex: https://github.com/airbytehq/airbyte/blob/da32fc86bf6450b4d02d4cc9e5f761b80529255a[…]airbyte/cdk/db/jdbc/AbstractJdbcCompatibleSourceOperations.java
Tested on mysql connector locally. However this will take effect on all JDBC source connectors too.
Logs output. Please confirm we are going to output base64 encoded string for binary typed columns.
What
Describe what the change is solving
It helps to add screenshots if it affects the frontend.
How
Describe the solution
Recommended reading order
x.java
y.python
🚨 User Impact 🚨
Are there any breaking changes? What is the end result perceived by the user?
For connector PRs, use this section to explain which type of semantic versioning bump occurs as a result of the changes. Refer to our Semantic Versioning for Connectors guidelines for more information. Breaking changes to connectors must be documented by an Airbyte engineer (PR author, or reviewer for community PRs) by using the Breaking Change Release Playbook.
If there are breaking changes, please merge this PR with the 🚨🚨 emoji so changelog authors can further highlight this if needed.
Pre-merge Actions
Expand the relevant checklist and delete the others.
New Connector
Community member or Airbyter
./gradlew :airbyte-integrations:connectors:<name>:integrationTest
.0.0.1
Dockerfile
has version0.0.1
README.md
bootstrap.md
. See description and examplesdocs/integrations/<source or destination>/<name>.md
including changelog with an entry for the initial version. See changelog exampledocs/integrations/README.md
Airbyter
If this is a community PR, the Airbyte engineer reviewing this PR is responsible for the below items.
Updating a connector
Community member or Airbyter
Airbyter
If this is a community PR, the Airbyte engineer reviewing this PR is responsible for the below items.
Connector Generator
-scaffold
in their name) have been updated with the latest scaffold by running./gradlew :airbyte-integrations:connector-templates:generator:generateScaffolds
then checking in your changesUpdating the Python CDK
Airbyter
Before merging:
--use-local-cdk --name=source-<connector>
as optionsairbyte-ci connectors --use-local-cdk --name=source-<connector> test
After merging: