DBZ-132 Capturing only first char of String in Enum for each row entr… #116

Merged
merged 2 commits into from Oct 11, 2016

Conversation

pranmitt
Contributor

…y in kafka

Enum and Set values were assumed to be a single character.
Added functionality to convert the allowed values back into a comma-separated string of allowed enum values.
Also added a way to get the enum value at a particular index from the comma-separated allowed values.

@pranmitt
Contributor Author

The logs show the following:

Debezium Checkstyle Rules .......................... SUCCESS [ 4.446 s]
[INFO] Debezium Parent POM ................................ SUCCESS [ 1.412 s]
[INFO] Debezium Assembly Descriptors ...................... SUCCESS [ 1.450 s]
[INFO] Debezium Core ...................................... SUCCESS [01:06 min]
[INFO] Debezium Embedded .................................. SUCCESS [ 2.026 s]
[INFO] Debezium Connector for MySQL ....................... SUCCESS [01:35 min]
[INFO] Debezium Connector for MongoDB ..................... FAILURE [ 33.848 s]

How can this affect the MongoDB connector? Any clue?

@rhauch
Member

rhauch commented Oct 11, 2016

@pranmitt, our build runs MySQL and MongoDB in Docker containers, and unfortunately there is a quirk (mostly on Travis) where Maven is not able to see that the MongoDB containers are ready even when they are, and thus the build times out and fails. Don't worry, it's not related to your code changes. I've restarted the build to see if it will go green.

            } else {
                sb.append(ENUM_AND_SET_DELIMINATOR);
            }
            sb.append(option);
        }
    }
    return sb.toString();
Member

The method used to return a String when I thought the allowed values were limited to single characters, but perhaps even then it would have been better, in hindsight, to have returned char[] instead. Now that we know that allowed values can be any string, wouldn't it be better for this method to return a List<String>?

Contributor Author

Actually, my concern was this: suppose a column can have up to 50 or 60 enum values. The schema builder will always hold them as a comma-separated string, so:

  1. for each row insert or update, I would unnecessarily create an array of 60 string values just to get the single enum value at a particular index;
  2. similarly, I would perform a split on the delimiter for each row insert or update.

Both of these can have a high performance impact if the number of row entries is huge.

Member

Actually, this method is called only when the schema changes; it is not called every time a row is updated. Plus, using List<String> also means that the code that runs on every row change/update is far faster and more efficient.
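
A minimal sketch of the "parse once, at schema-change time" idea described above. The class `EnumOptionsSketch` and the method `parseOptions` are hypothetical names used only for illustration; this is not the connector's actual DDL parser, and it assumes the type expression looks like `ENUM('a','b')` with single-quoted values and `''` as an escaped quote.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Debezium.
final class EnumOptionsSketch {

    // Parses the quoted values out of an expression such as ENUM('a','b','c').
    // Intended to run once per schema change, not once per row.
    static List<String> parseOptions(String typeExpression) {
        List<String> options = new ArrayList<>();
        StringBuilder current = null; // non-null while inside a quoted value
        for (int i = 0; i < typeExpression.length(); i++) {
            char c = typeExpression.charAt(i);
            if (current == null) {
                if (c == '\'') {
                    current = new StringBuilder(); // opening quote
                }
            } else if (c == '\'') {
                boolean doubled = i + 1 < typeExpression.length()
                        && typeExpression.charAt(i + 1) == '\'';
                if (doubled) {
                    current.append('\''); // '' inside a value is a literal quote
                    i++;
                } else {
                    options.add(current.toString()); // closing quote
                    current = null;
                }
            } else {
                current.append(c);
            }
        }
        return options;
    }
}
```

The resulting list can be cached alongside the column definition, so per-row code never has to split a string again.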

            enumLen ++;
        }
    }
    column.length(maxLength);
Member

@rhauch rhauch Oct 11, 2016

This logic would be far simpler if (see above comment) it could be:

List<String> options = parseSetAndEnumOptions(dataType.expression());
column.length(options.size());

Contributor Author

Since an enum column will always hold a single enum value, the max column length should be the length of the longest string among the allowed enum values, don't you think?

List<String> options = parseSetAndEnumOptions(dataType.expression());
column.length(options.size());

This will give you the total number of allowed enum values.

Member

There is a difference between the length of the string representation of the value, and the column length. MySQL enums actually don't have a "length" like VARCHAR, CHAR, etc. columns, so I'm not sure this is really all that important anyway.

Consider how the JDBC driver works with an enum column, and what it reports as the column's length or precision. It is not the length of the string representation.

Member

Given that, I'm not sure the proposed changes in this method are correct. The column length for an enum column will always be one, since any row holds only a single enumeration value.
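
To make the three candidate "lengths" in this thread concrete, here is a small sketch. The class and method names are hypothetical and nothing here is Debezium API; it only computes the number of allowed values (what `options.size()` yields) and the length of the longest allowed value (the PR author's proposal), for comparison with the constant 1 suggested in the comment above.

```java
import java.util.List;

// Hypothetical illustration only.
final class EnumLengthSketch {

    // Number of allowed values, i.e. what options.size() would report.
    static int optionCount(List<String> options) {
        return options.size();
    }

    // Length of the longest allowed value, as proposed by the PR author.
    static int longestOptionLength(List<String> options) {
        int max = 0;
        for (String option : options) {
            max = Math.max(max, option.length());
        }
        return max;
    }
}
```

For the options `small`, `medium`, `large`, the count is 3 and the longest value has 6 characters, while a row still holds exactly one value.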

@@ -85,11 +86,11 @@ public SchemaBuilder schemaBuilder(Column column) {
             return Year.builder();
         }
         if (matches(typeName, "ENUM")) {
-            String commaSeparatedOptions = extractEnumAndSetOptions(column, true);
+            String commaSeparatedOptions = extractEnumAndSetOptions(column);
Member

Again, I think it would be far clearer and simpler for extractEnumAndSetOptions(...) to be changed to return a List<String>. Yes, it changes the signature of several methods, but I think it's worth it.

Member

@pranmitt, just for reference, this schemaBuilder(Column) method is called only when the table schema changes (e.g., because of a DDL statement), and this is the method that ultimately calls the DDL parser method.

    return options.substring(startDeliminatorIndex);
} else if (nextDelimiatorIndex != -1) {
    return options.substring(startDeliminatorIndex,nextDelimiatorIndex);
}
Member

This logic becomes trivial if the options are in the form of List<String>.

Member

This is a method that is called with every row change, so if the options were passed in as a List<String> and this method becomes far simpler, it will also perform much better.
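
A sketch of how the per-row enum lookup might look once the options arrive as a List<String>. The names are hypothetical, and two things are assumptions rather than facts from this PR: that the row event carries a 1-based index into the allowed values, and that an out-of-range index (including 0) should map to an empty string.

```java
import java.util.List;

// Hypothetical illustration only, not the connector's converter.
final class EnumLookupSketch {

    // Assumption: 'index' is 1-based; 0 or out-of-range yields "".
    static String enumValueAt(List<String> options, int index) {
        if (index < 1 || index > options.size()) {
            return "";
        }
        return options.get(index - 1); // the list itself is 0-based
    }
}
```

Compared with scanning a delimited string for the n-th delimiter, this is a bounds check plus one list access per row.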

    sb.append(options.substring(startDeliminatorIndex));
} else if (nextDelimiatorIndex != -1) {
    sb.append(options.substring(startDeliminatorIndex,nextDelimiatorIndex));
}
Member

This logic also becomes a lot simpler if the options are in a List<String>.

Member

This is a method that is called with every row change, so if the options were passed in as a List<String> and this method becomes far simpler, it will also perform much better.
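
The SET case can be sketched the same way. This assumes the row event carries the set value as a bitmask in which bit i selects the i-th allowed value, and that the selected values are emitted as a comma-separated string; the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only, not the connector's converter.
final class SetLookupSketch {

    // Assumption: bit i of 'mask' (lowest bit first) selects options.get(i).
    static String setValuesFor(List<String> options, long mask) {
        List<String> selected = new ArrayList<>();
        for (int i = 0; i < options.size() && i < Long.SIZE; i++) {
            if ((mask & (1L << i)) != 0) {
                selected.add(options.get(i));
            }
        }
        return String.join(",", selected);
    }
}
```

With the options `a`, `b`, `c`, `d`, a mask of 5 (binary 101) selects the first and third values.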

@rhauch
Member

rhauch commented Oct 11, 2016

@pranmitt, thank you for submitting this PR to fix this issue. This is pretty close, but I do think the changes could be made even better and simpler by parsing the allowed values as a List<String>. Perhaps you were trying to minimize the impact of your changes, but I do think that this fix requires fixing some of the incorrect assumptions I made earlier. Do you want to make those changes?

        }
        commaSeparatedOptions.append(value);
    }
    return io.debezium.data.Enum.builder(commaSeparatedOptions.toString());
Member

This could be done more easily with:

String commaSeparatedOptions = Strings.join(",", options);

using the io.debezium.util.Strings.join(...) utility.
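
As a rough comparison of the hand-written loop and a join utility, here is a self-contained sketch. It uses the JDK's `String.join` as a stand-in and assumes that `io.debezium.util.Strings.join` behaves like a plain delimiter join; the class and method names are hypothetical.

```java
import java.util.List;

// Hypothetical illustration only; String.join stands in for the Debezium utility.
final class JoinSketch {

    // The pattern used in the diff: append a delimiter before every value but the first.
    static String joinWithLoop(List<String> options) {
        StringBuilder sb = new StringBuilder();
        boolean first = true;
        for (String value : options) {
            if (first) {
                first = false;
            } else {
                sb.append(",");
            }
            sb.append(value);
        }
        return sb.toString();
    }

    // The same result in one call.
    static String joinWithUtility(List<String> options) {
        return String.join(",", options);
    }
}
```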

        }
        commaSeparatedOptions.append(value);
    }
    return io.debezium.data.EnumSet.builder(commaSeparatedOptions.toString());
Member

And can use Strings.join here, too.

        }
        sb.append(value);
    }
    assertThat(optionString).isEqualTo(sb.toString());
Member

Another use for Strings.join to make things much simpler.

@rhauch
Member

rhauch commented Oct 11, 2016

@pranmitt, excellent work. I mentioned a few places where one of our utility methods can be used to dramatically simplify the code. Then, please squash the commits into one so the commit keeps your authorship and we lose the intermediate states.

@rhauch
Member

rhauch commented Oct 11, 2016

@pranmitt looks great, but please squash the commits locally in your branch, and then use git push -f origin DBZ-132 (if origin is the name of your fork in your local git).

Updated MysqlParser to return a list of Strings for allowed enum and set values.
Also added a fix to get the enum value at a particular index, and likewise for set options.
Used the Debezium string utility to join the list of strings into a delimiter-separated String.
Updated old test cases as required to handle lists of strings.
@rhauch rhauch merged commit 0984ed3 into debezium:master Oct 11, 2016
2 participants