DBZ-6591: Document read preference changes in 2.4 #4654

Merged
merged 2 commits into from
Jul 24, 2023

Conversation

jcechace
Member

@jcechace jcechace commented Jul 4, 2023

@jcechace jcechace requested a review from roldanbob July 4, 2023 11:24
@jcechace
Member Author

jcechace commented Jul 4, 2023

@roldanbob feel free to push or suggest any changes as you see fit.

Contributor

@roldanbob roldanbob left a comment

As always, let me know if you have any questions, @jcechace

documentation/modules/ROOT/pages/connectors/mongodb.adoc Outdated
// ModuleID: description-of-how-the-mongodb-connector-uses-read-preference
[id="read-preference"]
=== Read Preference
Starting with Debezium 2.4 the MongoDB connector honors the link:https://www.mongodb.com/docs/manual/core/read-preference/[read preference] configured in either xref:mongodb-property-mongodb-connection-string[`mongodb.connection.string`] or xref:mongodb-property-mongodb-connection-string-shard-params[`mongodb.connection.string.shard.params`] for sharded clusters when xref:mongodb-property-mongodb-connection-mode[`mongodb.connection.mode`] is set to `replica_set`.
Member Author

@roldanbob I believe this isn't right; your wording changes the meaning. The important point here is that mongodb.connection.string is used for everything except a sharded cluster with mongodb.connection.mode set to replica_set. In that case mongodb.connection.string.shard.params is used instead.

Contributor

Thanks, @jcechace I think that I've addressed the issue. Please take a look to verify.

Member Author

@roldanbob not quite... though reading my original post it is ambiguous.

So, by default in 2.4 the connector honors the setting in mongodb.connection.string. However, specifically (and only) for sharded clusters with mongodb.connection.mode set to replica_set, the setting is taken from mongodb.connection.string.shard.params.

This is because for all other combinations of connector config and MongoDB topology the connector uses the connection string set in mongodb.connection.string to connect to the database AND capture changes. However, for sharded clusters with mongodb.connection.mode set to replica_set (as opposed to sharded) the initial connection string from mongodb.connection.string is used ONLY to read the cluster topology from the config.shards collection. Once that is done, a new connection string is constructed for each shard by combining the information retrieved from the config.shards collection with the parameters specified in mongodb.connection.string.shard.params.
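
To make the rule above concrete, here is a minimal sketch of the decision in Python. It is not Debezium code: the function name and the topology labels ("replica_set", "sharded_cluster") are made up for illustration, and only the two property names and the connection mode values come from this thread.

```python
def read_preference_source(topology: str, connection_mode: str) -> str:
    """Return the connector property whose read preference applies.

    `topology` uses hypothetical labels ("replica_set" or "sharded_cluster");
    `connection_mode` is the value of mongodb.connection.mode.
    """
    if topology == "sharded_cluster" and connection_mode == "replica_set":
        # Tasks connect to each shard directly, so the per-shard
        # parameters supply the read preference.
        return "mongodb.connection.string.shard.params"
    # Every other combination connects and captures changes through
    # the single user-supplied connection string.
    return "mongodb.connection.string"
```

The only branch that differs is a sharded cluster combined with the replica_set connection mode; everything else falls through to mongodb.connection.string.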

Contributor

@roldanbob roldanbob Jul 10, 2023

Ok, yeah, thanks for teasing that out a little more. I was/am still a little confused about the linkage between how Debezium connects to a shard, and the read preference mode, but it's beginning to make sense.
Would it be worth adding a sentence or two to make that linkage explicit? I don't see anywhere else in our docs where we mention read preference, but maybe that's common knowledge for a MongoDB user? From what I can piece together I'm guessing that the read preference is implied from the connection interface?
I'm not sure that I've cracked this nut yet, but here's another go at it. Getting closer?

Member Author

@jcechace jcechace Jul 11, 2023

@roldanbob OK... I will try to explain each bit separately. Hopefully it will help.

Read Preference

Read preference lets a MongoDB client specify which type of node it wants to read from (this covers find queries and change streams). How exactly this happens is mostly up to the driver: for replica sets the driver simply forwards the operations to the right type of node, whereas for sharded clusters it has to go through the router (mongos). The setting can be made either programmatically, using the client settings builder, or entirely in the connection string, which is what we want because Debezium users need to be able to provide this configuration.
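
As an illustration of the connection-string route, the sketch below pulls the readPreference option out of a MongoDB URI using only the Python standard library. The helper name is hypothetical and no driver is involved; the fallback of "primary" reflects MongoDB's default read preference mode.

```python
from urllib.parse import parse_qsl, urlsplit


def read_preference_of(connection_string: str, default: str = "primary") -> str:
    """Return the readPreference option of a MongoDB URI, or `default`."""
    query = urlsplit(connection_string).query
    mode = default
    for key, value in parse_qsl(query):
        # Compare option names case-insensitively; if the option is
        # repeated, this sketch keeps the last value it sees.
        if key.lower() == "readpreference":
            mode = value
    return mode
```

For example, `read_preference_of("mongodb://host1:27017,host2:27017/?replicaSet=rs0&readPreference=secondaryPreferred")` yields `"secondaryPreferred"`, while a URI without the option yields the default.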

How Debezium connects to MongoDB

For a replica set deployment of MongoDB the connection is simple. The user provides a connection string via mongodb.connection.string and the client is initialized from that (more or less; some settings can also be supplied via additional properties, but that is just an alternative that exists for historical reasons). This connection string is then used to connect to the MongoDB replica set (RS) and to perform all operations in connector tasks.

For a sharded MongoDB cluster the connection is a bit more complicated, as there are two options, determined by the value of mongodb.connection.mode.

With mongodb.connection.mode: sharded we have the same behaviour as for the replica set topology. The user provides a connection string via mongodb.connection.string and the client is initialized from that. This connection is then used to perform all operations in the connector tasks. The difference (compared to the RS case) is that the connection string shouldn't contain the addresses of individual mongod nodes (these are the "database" nodes which store data) but rather the addresses of mongos routers.

With mongodb.connection.mode: replica_set we have the most complicated and slightly dirty case, which is also the default for sharded clusters (for backwards compatibility and historical reasons). Once again the user provides a connection string (containing the addresses of mongos routers) via mongodb.connection.string and the client is initialized from that. HOWEVER, in this mode we use this connection ONLY to read the information about the individual shards in the cluster (from the config.shards collection). This information is retrieved periodically, and when the shard information changes the connector tasks are reconfigured. The connector tasks themselves do not use the initial client created from the connection string provided in mongodb.connection.string, as that contains mongos router addresses. Rather, the tasks are provided with a new connection string which is constructed for each individual shard in the cluster (and thus the tasks completely bypass the mongos routers). So if a readPreference were specified in mongodb.connection.string, it would not be applied to this shard-specific connection string (as there might be reasons to have a slightly different configuration for the initial connection and for the connections to individual shards). For this reason the mongodb.connection.string.shard.params property exists: it allows specifying the query parameters of the shard-specific connection string.
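
The per-shard construction described above might look roughly like the sketch below. This is an assumption-laden illustration, not the connector's implementation: the function name is hypothetical, the exact URI layout Debezium produces is not stated in this thread, and the input is assumed to be the `host` value of a config.shards document in its usual `<replicaSetName>/<host1>,<host2>,...` form.

```python
def build_shard_connection_string(shard_host: str, shard_params: str = "") -> str:
    """Combine a config.shards `host` value with user-supplied query parameters.

    `shard_params` stands in for mongodb.connection.string.shard.params,
    for example "readPreference=secondaryPreferred".
    """
    # Split "<replicaSetName>/<members>" into its two halves.
    rs_name, _, members = shard_host.partition("/")
    if not members:
        raise ValueError(f"expected '<replicaSet>/<hosts>', got {shard_host!r}")
    options = f"replicaSet={rs_name}"
    if shard_params:
        # Tolerate a leading "?" or "&" in the supplied parameters.
        options += "&" + shard_params.lstrip("?&")
    # The mongos addresses from the initial connection string play no part here.
    return f"mongodb://{members}/?{options}"
```

Because the shard URI is built from scratch, nothing from the query part of mongodb.connection.string carries over, which is why a read preference set there has no effect on the tasks in this mode.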

Note: mongodb.connection.mode: replica_set is a bit dirty. The MongoDB documentation states that applications should not connect to the individual shards and, as stated previously, the mode exists for historical reasons. Lately our community users have been running into issues connected to this mode more frequently, which is something I expected and exactly the reason why mongodb.connection.mode: sharded was introduced. We intend to deprecate the replica_set connection mode in the future.

Contributor

@roldanbob roldanbob left a comment

@jcechace Apologies for taking a while to get this back to you. I ended up going a bit deeper on the content, and I think that this new version accomplishes what we want.

documentation/modules/ROOT/pages/connectors/mongodb.adoc Outdated Show resolved Hide resolved
Co-authored-by: roldanbob <broldan@redhat.com>
@jcechace
Member Author

@roldanbob I've squashed the commit and added all your suggestions. Please give it one final re-read to confirm this is now good for merge :). LGTM

Contributor

@roldanbob roldanbob left a comment

@jcechace I just added one xref, otherwise I think you can merge this. Thanks for your patience in helping me figure out what this was all about.

documentation/modules/ROOT/pages/connectors/mongodb.adoc Outdated Show resolved Hide resolved
Adds link to the `mongodb.connection.string.shard.params` property
Contributor

@roldanbob roldanbob left a comment

I went ahead and committed the change to add a link to the mongodb.connection.string.shard.params property.

@jcechace jcechace merged commit a798995 into debezium:main Jul 24, 2023
23 checks passed
3 participants