New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
(WIP) Make controller go through attribute manager to clear transient attributes #1695
Conversation
ATTRD_OP_PEER_CLEAR will remove all attributes set for a particular node, and write the changes to the CIB. This differs from ATTRD_OP_PEER_REMOVE in that the changes are written, and the node is not purged from the peer caches. This will enable putting attrd in full control of writing transient attribute changes to the CIB. @todo This will not work in mixed-version clusters
…ibutes Previously, the controller would clear a node's transient attributes from the CIB directly, but if the attribute manager has any attributes in memory, this simply results in writing them back out again. The only right time to clear the CIB directly is when the attribute manager's memory has already been purged. This may close some CLBZ's @todo This has not been tested or fully thought through
Hi Ken, Right now, I am testing the behavior. Apparently, next week, I am likely to be able to present the first amendment that I can discuss. I am planning to present the following modifications.
(begin memo) Memo for me (thinking....now)
2:In the case of processing after executing erase_status_tag (node_name, XML_CIB_TAG_LRM,...), I think that there is a problem depending on the timing.When pengine calculation is early...(->need test) 3: The attrd thinks that after receiving ATTRD_OP_PEER_CLEAR it really needs to check that the node has left the member. (-->need test) 4: It seems effective to delete the execution of this attrd_peer_remove ().(-->need test)
5: In the case of deleting, it will be necessary to immediately reflect even if there is a timer. (->need test) The problem of attrd remaining after this correspondence.
(end memo) Best Regards, |
Hi Ken, There seems to be a mistake in your correction as much as 2 points in relation to the remote node. And... The PR that I issue includes your fix. Best Regards, |
|
||
/* Clear node's probed attribute */ | ||
update_attrd(node_name, CRM_OP_PROBED, NULL, NULL, TRUE); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It sounded like a magic trick...
Execution of update_attrd () is required after update_attrd_clear_node (node_name, TRUE).
The remote node can not be recognized.
* cases, clear them here as well, to be sure. Importantly, this includes | ||
* the CRM_OP_PROBED ("probe_complete") attribute. | ||
*/ | ||
update_attrd_clear_node(node_name, TRUE); | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/* Set the CRM_OP_PROBED ("probe_complete") attribute. */
update_attrd(node_name, CRM_OP_PROBED, NULL, NULL, TRUE);
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking at the code again, I am a little confused. I believe the CRM_OP_PROBED attribute should be unnecessary since 8f76b78 . Also, remote_node_up() explictly clears it in the case of a remote node. I'm not completely sure, but I think we can remove all references to CRM_OP_PROBED from the controller (we probably need to keep the references in the scheduler for regression test purposes).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi Ken,
I understood this cause.
When attrd now handles remote node, we use this "probe_complete" attribute to register remote node.(crm_remote_peer_get())
Therefore, unless the "probe_complete" attribute is sent from crmd, attrd will not recognize the remote node properly.
(snip)
static attribute_value_t *
attrd_lookup_or_create_value(GHashTable *values, const char *host, xmlNode *xml)
{
attribute_value_t *v = g_hash_table_lookup(values, host);
int is_remote = 0;
crm_element_value_int(xml, F_ATTRD_IS_REMOTE, &is_remote);
if (is_remote) {
/* If we previously assumed this node was an unseen cluster node,
* remove its entry from the cluster peer cache.
*/
crm_node_t *dup = crm_find_peer(0, host);
if (dup && (dup->uuid == NULL)) {
reap_crm_member(0, host);
}
/* Ensure this host is in the remote peer cache */
CRM_ASSERT(crm_remote_peer_get(host) != NULL);
}
(snip)
By the way, the attribute at this time works properly even if it is not "probe_complete".
#if 0
//YAMAUCHI
/* Set the CRM_OP_PROBED ("probe_complete") attribute. */
update_attrd(node_name, CRM_OP_PROBED, NULL, NULL, TRUE);
#endif
update_attrd(node_name, "remote_up", NULL, NULL, TRUE);
It is like a magic trick to be recognized by the remote node to the last.
It may be better to fix it in the future, but this update_attrd () is necessary at present.
Best Regards,
Hideo Yamauchi.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is like a magic trick to be recognized by the remote node to the last.
Ah, I see, thanks.
It may be better to fix it in the future, but this update_attrd () is necessary at present.
Agreed.
Hi Hideo, That sounds good. I'll close this one. |
Hi Ken, Thank you for closing this PR. I will be able to implement PR for new discussions tomorrow. Best Regards, |
This is for discussion only. I have made sure this compiles but otherwise not done any testing.