msg: the main messenger also randomizes nonces instead of using 0#58673
msg: the main messenger also randomizes nonces instead of using 0#58673rzarzynski wants to merge 4 commits intoceph:mainfrom
Conversation
Lack of nonce randomization can lead to reusing existing-but-stale connection preventing a mon to join cluster which actually has been observed in the field. This 0-as-nonce behavior can be tracked at least back to 091b176 from 2012. Fixes: https://tracker.ceph.com/issues/67033 Signed-off-by: Radoslaw Zarzynski <rzarzyns@redhat.com>
| int rank = monmap.get_rank(g_conf()->name.get_id()); | ||
| std::string public_msgr_type = g_conf()->ms_public_type.empty() ? g_conf().get_val<std::string>("ms_type") : g_conf()->ms_public_type; | ||
| Messenger *msgr = Messenger::create(g_ceph_context, public_msgr_type, | ||
| entity_name_t::MON(rank), "mon", 0); |
There was a problem hiding this comment.
I thought 0 was used so that one could connect to a monitor without knowing the nonce in advance?
The addresses of other daemons come from cluster maps, but monitor addresses typically come from a configuration file and don't have nonces on them.
There was a problem hiding this comment.
I thought 0 was used so that one could connect to a monitor without knowing the nonce in advance?
The addresses of other daemons come from cluster maps, but monitor addresses typically come from a configuration file and don't have nonces on them.
Explicit NONCE_WILDCARD with nonce learning? I'm thinking about a mechanism partially analogous to the learning of own addr from peer but based on the ServerIdentFrame. Messenger already doesn't require there (at handle_server_ident) the strict equality of addrvecs precisely because of "connecting based on addresses parsed out of mon_host or something". IIUC updating AsyncConnection::target_addr at this stage should be fine as it is for reconnects. The wildcard value set to 0 should handle the compatibility with monmaps and older peers.
Overall a better name might be NONCE_LEARN_LATER.
As `entity_addr_t` needs to be aware about the wildcard concept, `Messenger::get_random_nonce()` delegates out the randomization to it. However, there is a kludge: the space of numbers becomes far smaller (32 bits instead of 64). However, effectively this was always the case as `entity_addr_t::nonce` is typed as `__u32`. Signed-off-by: Radoslaw Zarzynski <rzarzyns@redhat.com>
Signed-off-by: Radoslaw Zarzynski <rzarzyns@redhat.com>
Signed-off-by: Radoslaw Zarzynski <rzarzyns@redhat.com>
|
Appended some drafts on experimentation to exploit the |
|
Just to record after the yesterday's CDS RADOS:
/*
* Return addr of desired type (MSGR2 or LEGACY) or error.
* Make sure there is only one match.
*
* Assume encoding with MSG_ADDR2.
*/
int ceph_decode_entity_addrvec(void **p, void *end, bool msgr2,
struct ceph_entity_addr *addr)
{
// ...
found = false;
for (i = 0; i < addr_cnt; i++) {
ret = ceph_decode_entity_addr(p, end, &tmp_addr);
if (ret)
return ret;
dout("%s i %d addr %s\n", __func__, i, ceph_pr_addr(&tmp_addr));
if (tmp_addr.type == my_type) {
if (found) {
pr_err("another match of type %d in addrvec\n",
le32_to_cpu(my_type));
return -EINVAL;
}
memcpy(addr, &tmp_addr, sizeof(*addr));
found = true;
}
}
|
|
This pull request has been automatically marked as stale because it has not had any activity for 60 days. It will be closed if no further activity occurs for another 30 days. |
|
This pull request has been automatically closed because there has been no activity for 90 days. Please feel free to reopen this pull request (or open a new one) if the proposed change is still appropriate. Thank you for your contribution! |
Lack of nonce randomization can lead to reusing existing-but-stale connection preventing a mon to join cluster which actually has been observed in the field. This 0-as-nonce behavior can be tracked at least back to 091b176 from 2012.
However, in history I can't find a justification for it nor I don't see why monitors should depart from the wide spread practice of nonce randomization.
Fixes: https://tracker.ceph.com/issues/67033
Contribution Guidelines
To sign and title your commits, please refer to Submitting Patches to Ceph.
If you are submitting a fix for a stable branch (e.g. "quincy"), please refer to Submitting Patches to Ceph - Backports for the proper workflow.
When filling out the below checklist, you may click boxes directly in the GitHub web UI. When entering or editing the entire PR message in the GitHub web UI editor, you may also select a checklist item by adding an
xbetween the brackets:[x]. Spaces and capitalization matter when checking off items this way.Checklist
Show available Jenkins commands
jenkins retest this pleasejenkins test classic perfjenkins test crimson perfjenkins test signedjenkins test make checkjenkins test make check arm64jenkins test submodulesjenkins test dashboardjenkins test dashboard cephadmjenkins test apijenkins test docsjenkins render docsjenkins test ceph-volume alljenkins test ceph-volume toxjenkins test windowsjenkins test rook e2e