Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DRAFT: DBZ-121 Added options for how change events are mapped to topics #129

Closed
wants to merge 1 commit into from

Conversation

rhauch
Copy link
Member

@rhauch rhauch commented Nov 1, 2016

The default is to map all change events in each table to a separate topic, but this commit adds the ability to map all change events for all database tables to a database-specific topic, or to use a custom mapping via TopicMappingProvider implementation.

NOTE: This is still incomplete and not thoroughly tested.

…atabase to a database-specific topic, or using a custom mapping.
@hchiorean
Copy link

hchiorean commented Nov 4, 2016

@rhauch: I've also added a TOPIC_SELECTION_STRATEGY config option for the PG connector with the options of PER_TABLE(default) or PER_SCHEMA. This PR contains the missing pieces in TableSchemaBuilder to support this feature (i.e. correct event Key schema and value generator) so I will not attempt to make the same/similar changes to TableSchemaBuilder, but rather rely on this once it's merged.

@rhauch
Copy link
Member Author

rhauch commented Nov 4, 2016

@hchiorean Awesome. Any preferences or suggested changes to names of anything? I do like TOPIC_SELECTION_STRATEGY better than TOPIC_MAPPING.

@hchiorean
Copy link

the only comment I would have based on the current proposed PR changes is that I'm not convinced of the benefit of the added complexity for the custom TopicMappingProvider option. For PG, there are only 2 options which make sense : per table and per schema.

As far as integrating with the rest of the code, it's hard to say until I actually get around to doing that, so I'm fine with it in its current form.

@rhauch
Copy link
Member Author

rhauch commented Nov 4, 2016

@hchiorean, agreed. Ideally we can completely hide how this is implemented, and instead use a configuration option that specifies something like per-table, per-database, per-schemas or whatever reads well. We can expand these if/when we add new built-in strategies.

DBZ-121 not only deals with expanding how the connector maps events to topics beyond just topic-per-table to also handle topic-per-database and topic-per-schema (whichever makes sense for the DBMS), but it also calls for a way to deal with sharded tables. Originally, I was thinking that dealing with shards is really just a different topic mapping strategy, since you may want to annotate the key to do this. There are so many different approaches to sharding that I'm not sure we can have a built-in approach that works for everyone. Thus the need for specifying a custom mapping strategy.

But just because we could use a custom mapping strategy doesn't mean the configuration should require an implementation class name. This still needs to be changed/refined.

@rhauch rhauch changed the title DBZ-121 Added options for how change events are mapped to topics DRAFT: DBZ-121 Added options for how change events are mapped to topics Feb 10, 2017
}

/**
* Get the schema of the keys for all messages produced from the table.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

docblock seems a bit weird... "Get" implies to me something other than a void return.

void enhanceKeySchema(SchemaBuilder keySchemaBuilder);

/**
* Get the key for the row defined by the specified
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

docblock seems incomplete here.


@Override
public TopicMapping getMapper(String prefix, Table table) {
String topicName = prefix + table.id().toString();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is supposed to be equivalent to the current dbz approach, right? AFAICT, the topic name in current logic ends with tableName, which is populated via tableId.table(), which is equivalent to table.id().table().

But in this new logic, the topic name ends with table.id().toString(). Is that equivalent?

* @author Randall Hauch
*
*/
public interface TopicMappingProvider {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is the Provider abstraction necessary? Compared to the old approach, it seems like it just adds another layer needlessly to the abstraction. Could users specify in the config a TopicMapping rather than a TopicMappingProvider?

}

@Override
public void enhanceKeySchema(SchemaBuilder keySchemaBuilder) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

might be nice to add docs about why this enhancement is necessary.

*
* @return the key's schema name; may not be null
*/
default String getKeySchemaName() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are we providing a way to customize the KeySchemaName and ValueSchemaName? As I understand it, the motivation for this PR was custom DML topic mappings. That doesn't seem to require customizing the KeySchemaName or ValueSchemaName...

That said, I guess it doesn't hurt to add that customization feature.

@dasl-
Copy link
Contributor

dasl- commented Feb 17, 2017

I'm excited about these changes. Let me know if I can help!

@rhauch
Copy link
Member Author

rhauch commented Feb 18, 2017

@dasl- I hope to get back to this soon. It is possible to limit the changes to only allow someone to simply customize the names of the topics, but not the table-to-topic mapping. This proposal tries to be even more flexible by allowing multiple tables to be mapping to a single topic, and to do this we'll likely need to augment the message keys since a table's primary key is no longer sufficient to distinguish the row in table A from perhaps a row from table B with the same primary key.

Having said all this, though, I've not really looked at this PR for some time, and I may want to do things differently. Any chance you want to take a whack at it, even if it's just a start/concept?

@dasl-
Copy link
Contributor

dasl- commented Feb 22, 2017

@rhauch My approach would probably be very similar to the one you took here. I would like to allow multiple tables to map to a single topic also.

I will try applying this patch and seeing how it works. Perhaps I will have a follow up code review with some additions.

This was referenced Mar 1, 2017
@rhauch
Copy link
Member Author

rhauch commented Apr 4, 2017

Closed without merging. Instead, we've gone with using Single Message Transforms per #211.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
3 participants