New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
DRAFT: DBZ-121 Added options for how change events are mapped to topics #129
Conversation
…atabase to a database-specific topic, or using a custom mapping.
@rhauch: I've also added a |
@hchiorean Awesome. Any preferences or suggested changes to names of anything? I do like |
the only comment I would have based on the current proposed PR changes is that I'm not convinced of the benefit of the added complexity for the custom As far as integrating with the rest of the code, it's hard to say until I actually get around to doing that, so I'm fine with it in its current form. |
@hchiorean, agreed. Ideally we can completely hide how this is implemented, and instead use a configuration option that specifies something like DBZ-121 not only deals with expanding how the connector maps events to topics beyond just topic-per-table to also handle topic-per-database and topic-per-schema (whichever makes sense for the DBMS), but it also calls for a way to deal with sharded tables. Originally, I was thinking that dealing with shards is really just a different topic mapping strategy, since you may want to annotate the key to do this. There are so many different approaches to sharding that I'm not sure we can have a built-in approach that works for everyone. Thus the need for specifying a custom mapping strategy. But just because we could use a custom mapping strategy doesn't mean the configuration should require an implementation class name. This still needs to be changed/refined. |
} | ||
|
||
/** | ||
* Get the schema of the keys for all messages produced from the table. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
docblock seems a bit weird... "Get" implies to me something other than a void
return.
void enhanceKeySchema(SchemaBuilder keySchemaBuilder); | ||
|
||
/** | ||
* Get the key for the row defined by the specified |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
docblock seems incomplete here.
|
||
@Override | ||
public TopicMapping getMapper(String prefix, Table table) { | ||
String topicName = prefix + table.id().toString(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is supposed to be equivalent to the current dbz approach, right? AFAICT, the topic name in current logic ends with tableName
, which is populated via tableId.table()
, which is equivalent to table.id().table()
.
But in this new logic, the topic name ends with table.id().toString()
. Is that equivalent?
* @author Randall Hauch | ||
* | ||
*/ | ||
public interface TopicMappingProvider { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is the Provider
abstraction necessary? Compared to the old approach, it seems like it just adds another layer needlessly to the abstraction. Could users specify in the config a TopicMapping
rather than a TopicMappingProvider
?
} | ||
|
||
@Override | ||
public void enhanceKeySchema(SchemaBuilder keySchemaBuilder) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
might be nice to add docs about why this enhancement is necessary.
* | ||
* @return the key's schema name; may not be null | ||
*/ | ||
default String getKeySchemaName() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why are we providing a way to customize the KeySchemaName and ValueSchemaName? As I understand it, the motivation for this PR was custom DML topic mappings. That doesn't seem to require customizing the KeySchemaName or ValueSchemaName...
That said, I guess it doesn't hurt to add that customization feature.
I'm excited about these changes. Let me know if I can help! |
@dasl- I hope to get back to this soon. It is possible to limit the changes to only allow someone to simply customize the names of the topics, but not the table-to-topic mapping. This proposal tries to be even more flexible by allowing multiple tables to be mapping to a single topic, and to do this we'll likely need to augment the message keys since a table's primary key is no longer sufficient to distinguish the row in table A from perhaps a row from table B with the same primary key. Having said all this, though, I've not really looked at this PR for some time, and I may want to do things differently. Any chance you want to take a whack at it, even if it's just a start/concept? |
@rhauch My approach would probably be very similar to the one you took here. I would like to allow multiple tables to map to a single topic also. I will try applying this patch and seeing how it works. Perhaps I will have a follow up code review with some additions. |
Closed without merging. Instead, we've gone with using Single Message Transforms per #211. |
The default is to map all change events in each table to a separate topic, but this commit adds the ability to map all change events for all database tables to a database-specific topic, or to use a custom mapping via
TopicMappingProvider
implementation.NOTE: This is still incomplete and not thoroughly tested.