New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
DBZ-132 Capturing only first char of String in Enum for each row entr… #116
Conversation
THE logs show following: Debezium Checkstyle Rules .......................... SUCCESS [ 4.446 s] How it can affect mongodb connector any clue ? |
@pranmitt, our build runs MySQL and MongoDB in Docker containers, and unfortunately there is a quirk (mostly on Travis) where Maven is not able to see that the MongoDB containers are ready even when they are, and thus the build times out and fails. Don't worry, it's not related to your code changes. I've restarted the build to see if it will go green. |
} else { | ||
sb.append(ENUM_AND_SET_DELIMINATOR); | ||
} | ||
sb.append(option); | ||
} | ||
} | ||
return sb.toString(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The method used to return a String
when I thought the allowed values were limited to single characters, but perhaps even then it would have been better in hindsight to returned char[]
instead. Now that we know that allowed values can be any string, wouldn't it be better for this method to return a List<String>
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually my concern was suppose we had column which can have maximum 50/60 enum values, schema builder will always contain it as comma separated string, then
- for each row insert, update i will be un necessary creating array of 60 strings values just to get single enum value at particular index .
- similarly i will be performing split based on deliminator for each row insert/update.
Above two can be high performance impact, if my number of row entries are huge.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, this method is called only when the schema changes, and it is not called every time a row is updated. Plus, given the impact that using List<String>
would have also means that the code that is run on every row change/update is far faster and more efficient.
enumLen ++; | ||
} | ||
} | ||
column.length(maxLength); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This logic would be far simpler if (see above comment) it could be:
List<String> options = parseSetAndEnumOptions(dataType.expression());
column.length(options.size());
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As enum column will always has single enum value. Max column length should be length of longest string in allowed enum values, don't you think ?
List options = parseSetAndEnumOptions(dataType.expression());
column.length(options.size());
This will give you total number of allowed enum values
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is a difference between the length of the string representation of the value, and the column length. MySQL enums actually don't have a "length" like VARCHAR, CHAR, etc. columns, so I'm not sure this is really all that important anyway.
Consider how the JDBC driver works with an enum column, and what it reports as the column's length or precision. It is not the length of the string representation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Given that, I'm not sure the proposed changes in this method are correct. The column length for an enum column will always be one, since for any row is only a single enumeration value.
@@ -85,11 +86,11 @@ public SchemaBuilder schemaBuilder(Column column) { | |||
return Year.builder(); | |||
} | |||
if (matches(typeName, "ENUM")) { | |||
String commaSeparatedOptions = extractEnumAndSetOptions(column, true); | |||
String commaSeparatedOptions = extractEnumAndSetOptions(column); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Again, I think it would be far clearer and simpler for extractEnumAndSetOptions(...)
to be changed to return a List<String>
. Yes, it changes the signature of several methods, but I think it's worth it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@pranmitt, just for reference, this schemaBuilder(Column)
method is called only when the table schema changes (e.g., because of a DDL statement), and this is the method that ultimately calls the DDL parser method.
return options.substring(startDeliminatorIndex); | ||
} else if (nextDelimiatorIndex != -1) { | ||
return options.substring(startDeliminatorIndex,nextDelimiatorIndex); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This logic becomes trivial if the options are in the form of List<String>
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a method that is called with every row change, so if the options were passed in as a List<String>
and this method becomes far simpler, it will also perform much better.
sb.append(options.substring(startDeliminatorIndex)); | ||
} else if (nextDelimiatorIndex != -1) { | ||
sb.append(options.substring(startDeliminatorIndex,nextDelimiatorIndex)); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This logic also becomes a lot simpler if the options are in a List<String>
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a method that is called with every row change, so if the options were passed in as a List<String>
and this method becomes far simpler, it will also perform much better.
@pranmitt, thank you for submitting this PR to fix this issue. This is pretty close, but I do think the changes could be made even better and simpler by parsing the allowed values as a |
} | ||
commaSeparatedOptions.append(value); | ||
} | ||
return io.debezium.data.Enum.builder(commaSeparatedOptions.toString()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This could be done more easily with:
String commaSeparatedOptions = Strings.join(","options);
using the io.debezium.util.Strings.join(...) utility.
} | ||
commaSeparatedOptions.append(value); | ||
} | ||
return io.debezium.data.EnumSet.builder(commaSeparatedOptions.toString()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And can use Strings.join
here, too.
} | ||
sb.append(value); | ||
} | ||
assertThat(optionString).isEqualTo(sb.toString()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Another use for Strings.join
to make things much simpler.
@pranmitt, excellent work. I mentioned a few places where one of our utility methods can be used to dramatically simplify the code. Then, please squash the commits into one so the commit keeps your authorship and we lose the intermediate states. |
@pranmitt looks great, but please squash the commits locally in your branch, and then use |
Updated MysqlParser to return list of String for allowed enum and set values And also added code fix to get a enum value at a particular index and for set option too. Used debezium string utility to join list of string into deliminator seperated String. Updating old test cases as per required to handle list of strings.
…y in kafka
Enum and Set were assumed to single character
Adding functionality to convert back allowed values as comma seperated string of enum allowed values
And also added to get a enum value at a particular index from comma seperated allowed values