Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Source MySQL: Connector provides wrong values for some datatypes #3932

Closed
DoNotPanicUA opened this issue Jun 7, 2021 · 4 comments · Fixed by #6093
Closed

Source MySQL: Connector provides wrong values for some datatypes #3932

DoNotPanicUA opened this issue Jun 7, 2021 · 4 comments · Fixed by #6093

Comments

@DoNotPanicUA
Copy link
Contributor

DoNotPanicUA commented Jun 7, 2021

Found by comprehensive tests #3562
Requires #5704

Expected Behavior

A connector should transform source data into an Airbyte datatype without value/meaning losses.

Current Behavior

General source implementation:

  • bit. Returns values true, false instead of 1, 0
  • varbinary, blob. Binary text type transforms into binary value instead of text.
  • mediumtext, longtext. Long text replaces by empty value. For example, text with length 16777214 will be empty after transformation.
  • bool/boolean. MySQL transformation logic from number to boolean treats only 1 as true. All other values are false. The connector treats all values >0 as true.

CDC source implementation:

  • bit. Returns values true, false instead of 1, 0
  • varchar. Values with different carchecter sets become unreadable after transformation. For example, тест -> тест or utf16 symbol -> �
  • varbinary, blob. Binary text type transforms into binary value instead of text.
  • mediumtext, longtext. Long text replaces by empty value. For example, text with length 16777214 will be empty after transformation.
  • bool/boolean. Returns number values instead of boolean.

Steps to Reproduce

Short way:

  1. Checkout master branch
  2. Go to MySqlSourceComprehensiveTest.java and uncomment test values at lines: 197, 273, 282, 291, 339. Or go to CdcMySqlSourceComprehensiveTest.java and uncomment test values at lines: 238, 272, 282, 301, 310, 319, 36\
  3. Run the test.
  4. 💣

Long way:

  1. Create a table with corresponding data types.
  2. Insert test data.
  3. Setup and run connector
  4. 💣

Severity of the bug for you

High

Airbyte Version

0.24.7-alpha

Connector Version

0.3.4

@DoNotPanicUA DoNotPanicUA added the type/bug Something isn't working label Jun 7, 2021
@sherifnada sherifnada added area/connectors Connector related issues lang/java labels Jun 8, 2021
@dl-lim
Copy link

dl-lim commented Jun 8, 2021

I'm also experiencing this issue. Some tinyint(1) values from mysql are being turned into boolean upon reaching a "normalised" postgres.

Boolean can be expressed as tinyint(1) but not all tinyint(1) are booleans.

I DO note, however, that where a tinyint(1) column has non-'1' values (e.g. '2' or '13'), the normalised field becomes double precision.

But, let's say a table was synced with a tinyint(1) with only '1' and '0', but '2' or '3' are also possible values. Then airbyte would write it as a boolean field. If, say, an incremental sync method was used, it would probably run into issues when trying to added the '3' value into the destination boolean field later. (untested)

I would suggest staying true to the original data type e.g. TINYINT => SMALLINT, etc.

@tuanchris
Copy link
Contributor

I'm also facing this issue.

  • Bad double value: AA==
  • Bad double value: CFg7AA==

This is the data type and value in the db
image
This is what was replicated to BQ:
image

@DoNotPanicUA
Copy link
Contributor Author

I'm also facing this issue.

* Bad double value: AA==

* Bad double value: CFg7AA==

This is the data type and value in the db
image
This is what was replicated to BQ:
image

Hi @tuanchris,
As you might know, there are two parts: Source and Destination connectors.
In your case looks like the problem is in the Destination (write) connector, because I've checked your data set locally and the Source (read) connector works properly.
In means that unfortunately your case will not be fixed as part of this issue. But don't worry we will handle this bug as part of the Destination comprehensive testing.

@tuanchris
Copy link
Contributor

tuanchris commented Jun 10, 2021

I'm also facing this issue.

* Bad double value: AA==

* Bad double value: CFg7AA==

This is the data type and value in the db
image
This is what was replicated to BQ:
image

Hi @tuanchris,
As you might know, there are two parts: Source and Destination connectors.
In your case looks like the problem is in the Destination (write) connector, because I've checked your data set locally and the Source (read) connector works properly.
In means that unfortunately your case will not be fixed as part of this issue. But don't worry we will handle this bug as part of the Destination comprehensive testing.

Hi @DoNotPanicUA,

I think this might be a problem with how debezium treats float values. I faced a similar issue before with debezium & kafka as outlined here.

Also since the raw data replicated to BQ is in byte already, I doubt that this is a destination problem.

@DoNotPanicUA DoNotPanicUA self-assigned this Aug 11, 2021
@sherifnada sherifnada added the cdc label Sep 24, 2021
@alexandr-shegeda alexandr-shegeda linked a pull request Oct 4, 2021 that will close this issue
38 tasks
alexandr-shegeda added a commit that referenced this issue Oct 6, 2021
DoNotPanicUA added a commit that referenced this issue Oct 12, 2021
* move logic from static JdbcUtils to JdbcSourceOperations

* format

* Add methods for all types in order to have possibility rewrite them and use common type mapping.

* Make BigQuery in line impl with JDBC changes

* fix binary type

* add database creations methods with sourceOperations input

* add MySqlSourceOperations

* fix processing

* Fix CDC processing

* format

* add some tests for double and float

* incr version

* #3931 #3932 added zero-date converting to null param "zeroDateTimeBehavior=convertToNull"

* remove old tests covered by data type tests + incr ver

* Update airbyte-integrations/connectors/destination-mysql/src/main/java/io/airbyte/integrations/destination/mysql/MySQLDestination.java

Co-authored-by: Sherif A. Nada <snadalive@gmail.com>

* Update docs/integrations/sources/mysql.md

Co-authored-by: Sherif A. Nada <snadalive@gmail.com>

* add back comments to the data type mapping

* incr config version

Co-authored-by: Oleksandr Sheheda <alexandrshegeda@gmail.com>
Co-authored-by: Sherif A. Nada <snadalive@gmail.com>
schlattk pushed a commit to schlattk/airbyte that referenced this issue Jan 4, 2022
* move logic from static JdbcUtils to JdbcSourceOperations

* format

* Add methods for all types in order to have possibility rewrite them and use common type mapping.

* Make BigQuery in line impl with JDBC changes

* fix binary type

* add database creations methods with sourceOperations input

* add MySqlSourceOperations

* fix processing

* Fix CDC processing

* format

* add some tests for double and float

* incr version

* airbytehq#3931 airbytehq#3932 added zero-date converting to null param "zeroDateTimeBehavior=convertToNull"

* remove old tests covered by data type tests + incr ver

* Update airbyte-integrations/connectors/destination-mysql/src/main/java/io/airbyte/integrations/destination/mysql/MySQLDestination.java

Co-authored-by: Sherif A. Nada <snadalive@gmail.com>

* Update docs/integrations/sources/mysql.md

Co-authored-by: Sherif A. Nada <snadalive@gmail.com>

* add back comments to the data type mapping

* incr config version

Co-authored-by: Oleksandr Sheheda <alexandrshegeda@gmail.com>
Co-authored-by: Sherif A. Nada <snadalive@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
Archived in project
Development

Successfully merging a pull request may close this issue.

6 participants