New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Don't convert a string to binary base64 unnecessarily #25586
Conversation
/test connector=connectors/source-mysql
Build PassedTest summary info:
|
Affected Connector ReportNOTE
|
Connector | Version | Changelog | Publish |
---|---|---|---|
source-mysql |
2.0.23 |
✅ | ❌ (diff seed version) |
source-mysql-strict-encrypt |
2.0.23 |
🔵 (ignored) |
🔵 (ignored) |
- See "Actionable Items" below for how to resolve warnings and errors.
✅ Destinations (0)
Connector | Version | Changelog | Publish |
---|
- See "Actionable Items" below for how to resolve warnings and errors.
✅ Other Modules (0)
Actionable Items
(click to expand)
Category | Status | Actionable Item |
---|---|---|
Version | ❌ mismatch |
The version of the connector is different from its normal variant. Please bump the version of the connector. |
⚠ doc not found |
The connector does not seem to have a documentation file. This can be normal (e.g. basic connector like source-jdbc is not published or documented). Please double-check to make sure that it is not a bug. |
|
Changelog | ⚠ doc not found |
The connector does not seem to have a documentation file. This can be normal (e.g. basic connector like source-jdbc is not published or documented). Please double-check to make sure that it is not a bug. |
❌ changelog missing |
There is no chnagelog for the current version of the connector. If you are the author of the current version, please add a changelog. | |
Publish | ⚠ not in seed |
The connector is not in the cloud or oss registry, so its publication status cannot be checked. This can be normal (e.g. some connectors are cloud-specific, and only listed in the cloud seed file). Please double-check to make sure that you have added a metadata.yaml file and the expected registries are enabled. |
putString(json, columnName, resultSet, colIndex); | ||
} | ||
} | ||
case CHAR, VARCHAR -> putString(json, columnName, resultSet, colIndex); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need a test case for binary fields to ensure that conversion to string logic is taking place as expected?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This code change is putting us back to handling VARCHAR the same way under any charset or collation.
We don't have tests for variation of charset as far as I can tell, but the change itself is returning to the way this used to behave before checking for .isBinary()
was added.
checking my understanding:
this sounds like a breaking change. Do we need any coordination to release this? Should we recommend that users run a reset after upgrading? (or they can manually rewrite their raw tables + rebuild normalized tables I guess, but that sounds complicated) |
What you describe @edgao is correct - that's the change. |
presumably there are some folks who need to manually base64-decode their data right now; we should at least tell them to stop doing that decode. does our schema include anything like |
I don't think that's the case. I'm not sure what can be done other than notifying customers that may have reverse conversion on they destination. |
sorry, what does this mean?
|
The case here is db with charset encoding of utf8,
The data is encoded in plain utf8 but when sorted for example in Not to be confused with Binary encoding that has its own "binary" collation order which is unrelated to this case and will still get destination strings converted to base64 as it should. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ah got it :/ that seems challenging to coordinate then, if we can't even recognize which workspaces are affected...
is our plan to just tell all source-mysql users something like "check your source DB for columns with COLLATE=utf8_bin
and run a reset if you have them?" (or maybe "update your query to attempt base64_decode, and fallback to the raw value if that decode fails"?)
approving since I have no opinion on the code change, but I think the release plan needs some clarification. (destinations has no problems with this, since the schema was and will remain type: string
)
Thanks Ed |
Airbyte technical support: Will see the following potentially breaking change: What can be done: Note: the issue won't affect anybody who is running with default settings but only customers that intentionally created a binary collated DB or tables or columns. cc: @erica-airbyte |
Thanks @rodireich can you fill out the breaking changes doc? |
Hi @rodireich, is this still scheduled to be released tomorrow, May 24th? |
/connector-performance connector=connectors/source-mysql |
/connector-performance connector=connectors/source-mysql ref=master |
* Don't convert a string to binary base64 unnecessarily * update doc * update doc
* Don't convert a string to binary base64 unnecessarily * update doc * update doc
What
Mysql column that is encoded with any character set and collated (sorted) in a binary collation logic (e.g utf8mb4_bin) is today treated as a binary value and converted to base64 on detonation side.
This is not necessary as the value itself is still a simple string.
We need to treat it as a string and create one on destination.
How
Our code is checking incorrectly the
.isBinary()
property of a column when deciding how to convert it.Remove this logic when copying to json.
🚨 User Impact 🚨
Any string value previously synced as base64 will remain so until a reset or a full refresh.
Newly synced values will appear as the original on source side ("abcd").
Resolves #9938 .