Heartbeat 3.0.6 compatibility #637

Merged
merged 17 commits into from Feb 19, 2015

@lge lge commented Feb 11, 2015

No description provided.

lge added 12 commits December 9, 2014 12:02
ACTIVE is defined to be MEMBER anyways:
include/crm/cluster.h:#define CRM_NODE_ACTIVE    CRM_NODE_MEMBER

Don't confuse the reader of the code
by implying it was something different.

Get rid of some spurious error messages, and speed up shutdown,
even if the connection to the stonith daemon failed.

The rest of the code deals in "online" and "offline",
not "join" and "leave". Need to map these states,
or the rest of the code won't work properly.

The "set_bit()" function used here actually deals with masks, not bit numbers.
The "flag" argument should in fact be plural: flags.

These proc flag bits are not always set one at a time,
but for example as "crm_proc_crmd | crm_proc_cpg",
and not necessarily cleared with the same combination.

Ignoring to-be-set flags just because *some* of the flag bits are
already set is clearly a bug, and may be the reason for stale process
cache information.
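To illustrate the mask problem described above, here is a minimal sketch in C. The flag values and function names are hypothetical (they are not the real crm_proc_* constants or the actual set_bit() code); the "buggy" variant is only one plausible shape of the bug, assuming the update was skipped whenever any requested bit was already present.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flag values for illustration only. */
enum {
    PROC_CPG  = 0x1,
    PROC_CRMD = 0x2,
};

/* Assumed buggy shape: bails out as soon as *any* requested bit is set,
 * so the remaining bits of the mask are never recorded. */
static bool set_flags_buggy(uint32_t *flags, uint32_t mask)
{
    if (*flags & mask) {
        return false;   /* wrongly treated as "nothing to do" */
    }
    *flags |= mask;
    return true;
}

/* Mask-aware variant: a no-op only if *all* requested bits are set. */
static bool set_flags(uint32_t *flags, uint32_t mask)
{
    if ((*flags & mask) == mask) {
        return false;   /* every requested bit was already present */
    }
    *flags |= mask;
    return true;
}
```

With PROC_CPG already set, asking for "PROC_CRMD | PROC_CPG" leaves the buggy variant's flags unchanged, while the mask-aware variant records PROC_CRMD as well.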
Don't optimistically assume that peer client processes are alive,
or that a node that can talk to us is in fact member of the same
ccm partition.

Whenever ccm tells us about a new membership, *ask* for peer client
process status.

Since the introduction of the additional F_TYPE messages
T_STONITH_NOTIFY and T_STONITH_TIMEOUT_VALUE, and their use as message
types in global heartbeat cluster messages, stonith-ng was broken on the
heartbeat cluster stack.

When delegation was made the default, and the result could only be
reaped by listening for the T_STONITH_NOTIFY message, no-one (but
stonithd itself) would ever notice successful completion,
and stonith would be re-issued forever.

Registering callbacks for these F_TYPE fixes these hung stonith and
stonith_admin operations on the heartbeat cluster stack.

…artbeat

In ha_msg_dispatch(), change from rcvmsg() to readmsg().
rcvmsg() is internally simply a wrapper around readmsg(),
which silently deletes messages without matching callback.

Use readmsg() directly here. It will only return unprocessed (by
callbacks) messages, so log a warning, notice or debug message
depending on message header information, and ha_msg_del() it ourselves.
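The difference between the two receive styles can be sketched with a toy model in C. Everything here (toy_msg, toy_conn, toy_readmsg, toy_rcvmsg, toy_dispatch) is a made-up stand-in, not the heartbeat client API; it only assumes the behaviour stated above: the readmsg-style call hands back messages no callback claimed, and the rcvmsg-style wrapper drops them silently.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Toy stand-ins; NOT the real heartbeat types. */
struct toy_msg { const char *type; };
typedef void (*toy_cb)(const struct toy_msg *);

struct toy_conn {
    const struct toy_msg *queue;  /* pending messages */
    size_t len, pos;
    const char *cb_type;          /* the one registered message type */
    toy_cb cb;
};

static int toy_seen;              /* counts callback invocations */
static void toy_count_cb(const struct toy_msg *m) { (void)m; toy_seen++; }

/* readmsg-like: run the callback for matching messages and return the
 * first message nobody claimed, or NULL once the queue is drained. */
static const struct toy_msg *toy_readmsg(struct toy_conn *c)
{
    while (c->pos < c->len) {
        const struct toy_msg *m = &c->queue[c->pos++];
        if (c->cb != NULL && strcmp(m->type, c->cb_type) == 0) {
            c->cb(m);
            continue;
        }
        return m;
    }
    return NULL;
}

/* rcvmsg-like wrapper: unclaimed messages vanish without a trace.
 * Returns 1 if a message was dropped, 0 once the queue is drained. */
static int toy_rcvmsg(struct toy_conn *c)
{
    return toy_readmsg(c) != NULL;
}

/* Dispatch in the spirit of the commit: use the readmsg-like call
 * directly, report each leftover, and count it. Real code would also
 * free the message at this point. */
static unsigned toy_dispatch(struct toy_conn *c)
{
    unsigned unhandled = 0;
    const struct toy_msg *m;

    while ((m = toy_readmsg(c)) != NULL) {
        fprintf(stderr, "no callback for message type '%s'\n", m->type);
        unhandled++;
    }
    return unhandled;
}
```

In this model, toy_dispatch() makes the unhandled message visible in the log, whereas looping on toy_rcvmsg() processes the same queue without leaving any record of it.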
Heartbeat 3.0.6 now may spawn the pengine directly, and will announce
this in the environment -- I introduced the setting "crmd_spawns_pengine".

This improves shutdown behavior.  Otherwise I regularly find an orphaned
pengine process after pacemaker shutdown.

This is for the "Why does my resource not start?" guys who
forgot to remove the limiting target-role setting.

Report target role (unless "Started", which is the default anyways),
if it limits our abilities (Slave, Stopped),
or if it differs from the current status.
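One reading of that rule, as a small C sketch. The enum and function name are hypothetical, and this assumes the "unless Started" exception applies to the whole rule; the actual patch may decide differently.

```c
#include <stdbool.h>

/* Hypothetical role enum for illustration only. */
enum toy_role { TOY_STOPPED, TOY_STARTED, TOY_SLAVE, TOY_MASTER };

/* Decide whether to annotate the status output with the target role. */
static bool should_report_target_role(enum toy_role target,
                                      enum toy_role current)
{
    if (target == TOY_STARTED) {
        return false;   /* the default, never worth mentioning */
    }
    if (target == TOY_SLAVE || target == TOY_STOPPED) {
        return true;    /* limits what the resource may do */
    }
    return target != current;   /* otherwise only when it differs */
}
```

Under this reading, a stopped resource with target role Stopped is still annotated, which matches the "(target-role:Stopped)" annotation mentioned later in this conversation.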
Add back the fake meta data for old style class heartbeat resource agents
 # crm_resource --show-metadata heartbeat:Something
It was lost during the lrmd rewrite.

lge commented Feb 12, 2015

Oops. Some "summary" changed, obviously, because of the new (target-role:Stopped) annotation.
Can you fix that with a followup commit please? I'm not too familiar with that process,
and may mistake a real fail for a format glitch or the other way around...

lge and others added 5 commits February 13, 2015 14:46
The lrmd regression tests are unhappy with double quotes.
Also, single quotes do not need to be escaped,
now it looks better, too :-)

Semantics:
- /etc/ha.d/resource.d/*
- take optional positional arguments
- start/stop exit codes as lsb,
- status exit code ignored,
  status determined by matching stdout against
  "stopped", "not running", resp. "running", "OK",
  empty or no match is equivalent to not running.
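The status rule in the list above could be sketched in C roughly as follows. The function name is hypothetical, and plain case-sensitive substring matching is an assumption; the real code may match differently (for example case-insensitively or with patterns).

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Classify the stdout of a "status" call of a class heartbeat agent.
 * The exit code is deliberately not consulted. */
static bool hb_status_is_running(const char *out)
{
    if (out == NULL || *out == '\0') {
        return false;   /* empty output counts as not running */
    }
    /* Check the negative phrases first: "not running" contains "running". */
    if (strstr(out, "stopped") != NULL || strstr(out, "not running") != NULL) {
        return false;
    }
    if (strstr(out, "running") != NULL || strstr(out, "OK") != NULL) {
        return true;
    }
    return false;       /* no match is equivalent to not running */
}
```

The order of the checks matters here: testing for "running" before "not running" would misclassify a stopped resource as running.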

lge commented Feb 13, 2015

Ok, added test case summary fixes for the target-role annotation,
and fixed the class heartbeat resource agents, including lrmd regression test cases.

beekhof added a commit that referenced this pull request Feb 19, 2015
Heartbeat 3.0.6 compatibility
@beekhof beekhof merged commit c3f10a6 into ClusterLabs:master Feb 19, 2015
@lge lge deleted the for-beekhof branch June 2, 2016 08:17