Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DBZ-3966 JsonTableChangeSerializer support serialization for defaultValue #2671

Conversation

Jiabao-Sun
Copy link
Contributor

@Jiabao-Sun Jiabao-Sun commented Sep 2, 2021

Both JsonTableChangeSerializer and ConnectTableChangeSerializer ignored defaultValue, hasDefaultValue, enumValues of ColumnImpl.

If we serialize and then deserialize the TableChanges, missing default value may cause:

org.apache.kafka.connect.errors.DataException: Invalid value: null used for required field

https://issues.redhat.com/browse/DBZ-3966

@Jiabao-Sun Jiabao-Sun marked this pull request as draft September 3, 2021 02:57
@Jiabao-Sun
Copy link
Contributor Author

Failed tests need to be fixed.

@gunnarmorling
Copy link
Member

@Jiabao-Sun, any insights on the test failures? Are you going to push a fix to this PR?

@Jiabao-Sun
Copy link
Contributor Author

@gunnarmorling
Sorry, I found that serializing TableChange is a bit more complicated than I thought.

Because we need to deal with some possible types of ColumnImpl's default values ​​that are not recognized by Value such as ByteBuffer and Short.

Thanks for your reply and I'm going to push a fix to this PR.

@jpechane
Copy link
Contributor

jpechane commented Sep 6, 2021

@Jiabao-Sun Cool, thanks! Do you think you might re-use *DefaultValueConverter classes to help with deserialization?

@Jiabao-Sun
Copy link
Contributor Author

@Jiabao-Sun Cool, thanks! Do you think you might re-use *DefaultValueConverter classes to help with deserialization?

@jpechane Thanks! I'll try with it.

@Jiabao-Sun Jiabao-Sun force-pushed the fix-table-change-serializer-missing-default-value branch 4 times, most recently from c7c8f7b to 8ba42a1 Compare September 6, 2021 19:58
@Jiabao-Sun
Copy link
Contributor Author

@gunnarmorling @jpechane Sorry for the late fix.

It's much difficult to serialize the defaultValue object to json directly, cause the data type of defaultValue may be unpredictable and non-serializable. In addition, forcibly serializing to json may also lose data types.

So, I added a new defaultValueExpression property of ColumnImpl to retain database original default expression.
Serialize defaultValueExpression to json and use DefaultValueConverter to restore the defaultValue when deserialization.

It works but changes a lot.
If there is a better plan, please suggest it.

@Jiabao-Sun Jiabao-Sun marked this pull request as ready for review September 6, 2021 20:10
@Jiabao-Sun Jiabao-Sun force-pushed the fix-table-change-serializer-missing-default-value branch 2 times, most recently from 843bfa2 to 7e5789e Compare September 7, 2021 09:36
@Jiabao-Sun
Copy link
Contributor Author

@gunnarmorling @jpechane

Failed tests were fixed and the PR is ready now.
Do you have any suggestions, or is there a better plan?

@jpechane
Copy link
Contributor

jpechane commented Sep 9, 2021

@Jiabao-Sun I think this is nice approach, thanks!

@Naros What is the current support of default values in Oracle connector? Is it properly supported yet so this PR is vlaid for it or not?

@gunnarmorling This allows us finally to switch fully from DDL parsing to logical changes based history recovery.
The problem is that the currently created histories lack the information about default value so they would need to be resnpashotted. The problem is that we cannot automatically decide whether DDL or logical changes should be used, so do you think it makes sense to move the full switch to 2.0?

@jpechane
Copy link
Contributor

@Jiabao-Sun We are near the release of 1.7.0.Final. The PR will be merged right after the realse into 1.8 master.

@Jiabao-Sun
Copy link
Contributor Author

@jpechane OK. Thanks a lot.

@Naros
Copy link
Member

Naros commented Sep 29, 2021

What is the current support of default values in Oracle connector? Is it properly supported yet so this PR is vlaid for it or not?

@jpechane I vaguely remember that the shortcoming of the Oracle connector is we don't have a column default value listener and parser for the DDL. So during snapshot we are able to get the values from the driver we don't capture and parse the value during streaming changes like we do for MySQL.

@gunnarmorling
Copy link
Member

@Naros, can you log a separate issue for default value support for the Oracle connector then?

@jpechane, @Jiabao-Sun, other than Oracle support, is this PR good to go? If not, what's missing?

@jpechane
Copy link
Contributor

jpechane commented Oct 5, 2021

@gunnarmorling This is good to go for me

@Naros
Copy link
Member

Naros commented Oct 5, 2021

can you log a separate issue for default value support for the Oracle connector then?

@gunnarmorling Done, see DBZ-4115. I could have sworn we had one previously but I didn't see one currently opened.

@Jiabao-Sun Jiabao-Sun force-pushed the fix-table-change-serializer-missing-default-value branch from 7e5789e to a3ca796 Compare October 13, 2021 16:27
@Jiabao-Sun
Copy link
Contributor Author

Conflicts resolved.

@Jiabao-Sun Jiabao-Sun force-pushed the fix-table-change-serializer-missing-default-value branch from a3ca796 to 974b13e Compare October 14, 2021 08:13
@Jiabao-Sun Jiabao-Sun changed the title DBZ-3966 JsonTableChangeSerializer support serialization for defaultV… DBZ-3966 JsonTableChangeSerializer support serialization for defaultValue Oct 14, 2021
@Jiabao-Sun Jiabao-Sun force-pushed the fix-table-change-serializer-missing-default-value branch 3 times, most recently from c2509a5 to 6bc3b67 Compare November 2, 2021 14:48
@Jiabao-Sun Jiabao-Sun force-pushed the fix-table-change-serializer-missing-default-value branch 2 times, most recently from ac85a08 to d5413ab Compare November 2, 2021 18:45
Copy link
Member

@gunnarmorling gunnarmorling left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks a lot for the rework, @Jiabao-Sun!

I think we're getting there. One more comment inline, looks like things can be simplified some more. I'll take another look tomorrow my time, but I don't see any more large road blocks.

Comment on lines 140 to 142
public Object defaultValue() {
return defaultValue;
return null;
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like a left-over?

Also hasDefaultValue() should not be needed any longer; as we now can differentiate whether a default value has been provided or not by examining the Optional returned from defaultValueExpression(). The explicit hasDefaultValue() method was needed before as we could not tell differentiate between no default value given or a default value of null.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@gunnarmorling

The left-over has been cleaned up.

I think we should keep hasDefaultValue cause there are scenarios for implicit default values.

Usually, defaultValueExpression is null and hasDefaultValue is false.
But in the implicit case, defaultValueExpression is null but hasDefaultValue should be true.

In a word, the null defaultValueExpression does not directly indicate whether the column has a default value.

https://dev.mysql.com/doc/refman/8.0/en/data-type-defaults.html

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this scenario needs to be concerned, then the hasDefaultValue should also be serialized in TableChangeSerializer.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see, that's a good point. Indeed we are exporting some "implicit default values" in certain cases, and my current thinking is we probably should not do this. But it seems largely independent from the matter at hand and fixing this also would be sort of a breaking change, so it's something we should leave for Debezium 2.0 (I'll add this to the list of things to be considered for that, see DBZ-3899). No re serializing hasDefaultValue, I'd prefer if we'd not do that, given we should get rid of this entire implicit default value business at all. Admittedly, there'll be an inconsistency when reading back the schema history for such value, but I feel that's an acceptable limitation.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @gunnarmorling for the detailed explanation.
I will remove all the hasDefaultValue.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I'm just not sure whether we can or should do this at this point. Here's a PR where I did that (just look at the last commit): #2887. It changes some assertions, i.e. behavior exposed to consumers is slightly modified in some cases. @jpechane expressed concerns about doing this in a release like 1.8 (as opposed to a major release like 2.0). @jpechane, can you take a look at that last commit and the assertions I've changed; as stated below, I think one could also argue that the previous behavior actually was incorrect when it comes to those implicit default values.

@github-actions
Copy link

github-actions bot commented Nov 3, 2021

Hi @Jiabao-Sun, thanks for your contribution. Please prefix the commit message(s) with the DBZ-xxx JIRA issue key.

@Jiabao-Sun Jiabao-Sun force-pushed the fix-table-change-serializer-missing-default-value branch from 0c9cb7e to ad12282 Compare November 3, 2021 03:50
@Jiabao-Sun Jiabao-Sun force-pushed the fix-table-change-serializer-missing-default-value branch from ad12282 to fea6ac6 Compare November 3, 2021 04:59
@gunnarmorling
Copy link
Member

@Jiabao-Sun, pushed one more commit to return Optional<String> from ColumnEditor, too. If that passes CI, I think we can merge this PR. I'll create a follow-up draft PR, so to show how removing hasDefaultValue would look like. As said, that's something we likely want to keep for 2.0 though. Still going to share that preview PR, so to facilitate early feedback on it. In fact, one could also argue that the current behavior isn't correct actually, and a fix is warranted.

@Jiabao-Sun
Copy link
Contributor Author

@gunnarmorling
By the way, if we still have to keep hasDefaultValue now, I recommend to serialize it in the TableChangeSerializer as well. Because ignoring it may cause inconsistent deserialization results eg. the column has implicit default value null.

@gunnarmorling
Copy link
Member

By the way, if we still have to keep hasDefaultValue now, I recommend to serialize it in the TableChangeSerializer as well.

Yes, you are right. So I think we have two options at this point:

  • Keep hasDefaultValue and also serialize it
  • Do what I've sketched in that PR linked above, which would get rid of hasDefaultValue altogether

@Jiabao-Sun, @jpechane, which option would you prefer? I'm on the fence, I think I'm slighly leaning towards removing it, as I feel it's more about rectifying previous, buggy behavior rather than breaking correct, expected behavior. But I'm open for both options really.

@Jiabao-Sun
Copy link
Contributor Author

@gunnarmorling

Personally, I prefer the first option.
We can keep it in the minor version and try to remove it completely in the major version.

@gunnarmorling
Copy link
Member

Ok, so let's do this then. Thanks a lot for iterating over this with us, really appreciating your efforts, @Jiabao-Sun! Are you going to push another commit for adding that serialization work? I'll log an issue describing the follow-up work, also linking to that branch of mine with the preliminary work for it. Thx!

@Jiabao-Sun
Copy link
Contributor Author

OK, I will push another commit to complete it.

@github-actions
Copy link

github-actions bot commented Nov 4, 2021

Hi @Jiabao-Sun, thanks for your contribution. Please prefix the commit message(s) with the DBZ-xxx JIRA issue key.

@Jiabao-Sun Jiabao-Sun force-pushed the fix-table-change-serializer-missing-default-value branch from e4d8b30 to 8d306a3 Compare November 4, 2021 06:53
@github-actions
Copy link

github-actions bot commented Nov 4, 2021

Hi @Jiabao-Sun, thanks for your contribution. Please prefix the commit message(s) with the DBZ-xxx JIRA issue key.

@gunnarmorling
Copy link
Member

Rebased and applied. Thanks a lot, @Jiabao-Sun!

@gunnarmorling
Copy link
Member

I've logged https://issues.redhat.com/browse/DBZ-4239 as a follow-up for removing hasDefaultValue() altogether.

@Jiabao-Sun
Copy link
Contributor Author

Thanks, Cheers!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
4 participants