DBZ-6591: Document read preference changes in 2.4 #4654
@roldanbob feel free to push or suggest any changes as you see fit.
As always, let me know if you have any questions, @jcechace
// ModuleID: description-of-how-the-mongodb-connector-uses-read-preference
[id="read-preference"]
=== Read Preference
Starting with Debezium 2.4 the MongoDB connector honors the link:https://www.mongodb.com/docs/manual/core/read-preference/[read preference] configured in either xref:mongodb-property-mongodb-connection-string[`mongodb.connection.string`] or xref:mongodb-property-mongodb-connection-string-shard-params[`mongodb.connection.string.shard.params`] for sharded clusters when xref:mongodb-property-mongodb-connection-mode[`mongodb.connection.mode`] is set to `replica_set`.
@roldanbob I believe this isn't right. Your wording changes the meaning. The important point here is that `mongodb.connection.string` is used for everything except a sharded cluster with `mongodb.connection.mode` set to `replica_set`. In that case, `mongodb.connection.string.shard.params` is used instead.
Thanks, @jcechace I think that I've addressed the issue. Please take a look to verify.
@roldanbob not quite... though rereading my original post, it is ambiguous.
So, by default in 2.4 the connector honors the setting in `mongodb.connection.string`. However, specifically (and only) for sharded clusters with `mongodb.connection.mode` set to `replica_set`, the setting is taken from `mongodb.connection.string.shard.params`.
This is because for all other combinations of connector config and MongoDB topology, the connector uses the connection string set in `mongodb.connection.string` to connect to the database AND capture changes. However, for sharded clusters with `mongodb.connection.mode` set to `replica_set` (as opposed to `sharded`), the initial connection string from `mongodb.connection.string` is used ONLY to read the cluster topology from the `config.shards` collection. Once that is done, a new connection string is constructed for each shard by combining the information retrieved from the `config.shards` collection and the parameters specified in `mongodb.connection.string.shard.params`.
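For illustration only, the selection rule described in this comment could be sketched as follows. The function name and the topology labels (`replica_set`, `sharded_cluster`) are hypothetical and are not taken from the Debezium code base; only the two property names and the two connection modes come from the discussion.

```python
def read_preference_source(topology: str, connection_mode: str = "replica_set") -> str:
    """Return the name of the property whose read preference is honored.

    `topology` is a hypothetical label: "replica_set" or "sharded_cluster".
    `connection_mode` mirrors mongodb.connection.mode ("replica_set" or "sharded").
    """
    # The one special case: a sharded cluster accessed shard by shard.
    if topology == "sharded_cluster" and connection_mode == "replica_set":
        return "mongodb.connection.string.shard.params"
    # Every other combination uses the main connection string.
    return "mongodb.connection.string"
```

Under these assumptions, `read_preference_source("sharded_cluster", "sharded")` and `read_preference_source("replica_set")` both point at `mongodb.connection.string`.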
Ok, yeah, thanks for teasing that out a little more. I was/am still a little confused about the linkage between how Debezium connects to a shard, and the read preference mode, but it's beginning to make sense.
Would it be worth adding a sentence or two to make that linkage explicit? I don't see anywhere else in our docs where we mention read preference, but maybe that's common knowledge for a MongoDB user? From what I can piece together I'm guessing that the read preference is implied from the connection interface?
I'm not sure that I've cracked this nut yet, but here's another go at it. Getting closer?
@roldanbob OK... I will try to explain each bit separately. Hopefully it will help.
Read Preference
Read preference provides a means for a MongoDB client to specify which type of node it wishes to read from (that is, for find queries and change streams). How exactly this happens is more or less a driver matter (for replica sets the driver simply forwards the operations to the right type of node; for sharded clusters it needs to go through the router, mongos). This setting can be applied either programmatically, by using the client settings builder, or entirely in the connection string, which is what we want, because we want Debezium users to be able to provide this configuration.
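As a rough illustration of read preference carried in a connection string, the sketch below pulls the `readPreference` query parameter out of a MongoDB URI using only the Python standard library. The helper name and the host names are made up for the example, and a real driver does considerably more validation than this; the fallback to `primary` reflects MongoDB's default read preference mode.

```python
from urllib.parse import parse_qs, urlsplit


def read_preference_of(connection_string: str) -> str:
    """Return the readPreference option of a MongoDB URI (hypothetical helper)."""
    # Everything after '?' is the option list, e.g. "replicaSet=rs0&readPreference=...".
    query = urlsplit(connection_string).query
    values = parse_qs(query).get("readPreference")
    # MongoDB reads from the primary when no read preference is given.
    return values[0] if values else "primary"


uri = "mongodb://host1:27017,host2:27017/?replicaSet=rs0&readPreference=secondaryPreferred"
print(read_preference_of(uri))
```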
How Debezium connects to MongoDB
For a replica set deployment of MongoDB the connection is simple. The user provides a connection string via `mongodb.connection.string` and the client is initialized from that (more or less; some properties can also be set via additional properties, but that is just an alternative that exists for historical reasons). This connection string is then used to connect to the MongoDB RS and to perform all operations in connector tasks.
For a sharded MongoDB cluster the connection is a bit more complicated, as there are two options, determined by the value of `mongodb.connection.mode`.
With `mongodb.connection.mode: sharded` we have the same behaviour as for the replica set topology. The user provides a connection string via `mongodb.connection.string` and the client is initialized from that. This connection is then used to perform all operations in the connector tasks. The difference (compared to the RS case) is that the connection string shouldn't contain the addresses of individual mongod nodes (these are the "database" nodes which store data) but rather the addresses of mongos routers.
With `mongodb.connection.mode: replica_set` we have the most complicated and slightly dirty case, which is also the default for sharded clusters (for backwards compatibility and historical reasons). Once again the user provides a connection string (containing the addresses of mongos routers) via `mongodb.connection.string` and the client is initialized from that. HOWEVER, in this mode we use this connection ONLY to read the information about the individual shards in the cluster (from the `config.shards` collection). This information is retrieved periodically, and when the shard information changes the connector tasks are reconfigured. The connector tasks themselves do not use the initial client created from the connection string provided in `mongodb.connection.string`, as that contains mongos router addresses. Rather, the tasks are provided with a new connection string which is constructed for each individual shard in the cluster (and thus the tasks completely ignore the mongos routers). So if a `readPreference` were specified in `mongodb.connection.string`, it would not be applied to this shard-specific connection string (as there might be reasons to have slightly different configuration for the initial connection and for the connections to individual shards). For this reason the `mongodb.connection.string.shard.params` property exists: it allows specifying the query parameters of the shard-specific connection string.
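To make the shard-specific connection string more concrete, here is a minimal sketch of how one might be assembled. It assumes a `config.shards` document whose `host` field has the form `<replicaSetName>/<host:port>,<host:port>`, and it treats the value of `mongodb.connection.string.shard.params` as a ready-made query string. The function name, shard name, and hosts are hypothetical, and the connector's real construction logic may differ.

```python
def shard_connection_string(shard_doc: dict, shard_params: str) -> str:
    """Build a per-shard URI from a config.shards document (illustrative only)."""
    # Assumed shape: {"_id": "shard01", "host": "shard01/mongo1:27018,mongo2:27018"}
    replica_set, _, hosts = shard_doc["host"].partition("/")
    query = f"replicaSet={replica_set}"
    # Append the user-supplied shard parameters, e.g. "readPreference=secondaryPreferred".
    if shard_params:
        query += "&" + shard_params
    return f"mongodb://{hosts}/?{query}"


shard = {"_id": "shard01", "host": "shard01/mongo1:27018,mongo2:27018"}
print(shard_connection_string(shard, "readPreference=secondaryPreferred"))
```

Note that nothing from `mongodb.connection.string` appears in the result, which is why a read preference set there would not reach the shard connections in this mode.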
Note: `mongodb.connection.mode: replica_set` is a bit dirty. The MongoDB documentation states that applications should not connect to the individual shards and, as stated previously, the mode exists for historical reasons. Lately our community users have started to encounter issues connected to this mode more frequently, which is something I expected and exactly the reason why `mongodb.connection.mode: sharded` was introduced. We intend to deprecate the `replica_set` connection mode in the future.
@jcechace Apologies for taking a while to get this back to you. I ended up going a bit deeper on the content, and I think that this new version accomplishes what we want.
Co-authored-by: roldanbob <broldan@redhat.com>
@roldanbob I've squashed the commit and added all your suggestions. Please give it one final re-read to confirm this is now good for merge :). LGTM
@jcechace I just added one xref, otherwise I think you can merge this. Thanks for your patience in helping me figure out what this was all about.
Adds link to the `mongodb.connection.string.shard.params` property
I went ahead and committed the change to add a link to the `mongodb.connection.string.shard.params` property.
https://issues.redhat.com/browse/DBZ-6591