Resource agents vs next/ra-api.ng #10

marxsk · 2018-01-10T13:18:02Z

To map the inconsistencies, I'm creating this bug to track what should be fixed:

remove element from every element as it is no longer valid for new versions
actions:
- optional attribute role: master/slave (rename to lowercase)
- rename validate-all to existing verify-all
- replace status with monitor (pcs handles the "monitor" op specially, but the deprecated "status" is the very same: pcs#76) (remove from OCF 2.x)
- new actions:
  - restart
  - reload
  - notify
  - methods
  - migrate_from, migrate_to (I propose renaming to migrate-from, migrate-to; to follow meta-data, verify-all)

@oalbrigt please comment what is useful

krig · 2018-01-10T13:25:46Z

verify-all vs. validate-all is a strange case. The document and most of the resource agents use validate-all (not all though), and the XML schema uses verify-all. Is one from rgmanager and the other from Linux-HA perhaps?

marxsk · 2018-01-10T13:28:06Z

I'm happy to switch to validate-all, we have it almost everywhere

marxsk · 2018-01-10T13:32:49Z

Should actions be marked with 'replaced-by', as parameters are, to properly rename migrate_to/migrate_from/status?

oalbrigt · 2018-01-10T13:38:24Z

I guess we'll just update the case statements for the existing agents to e.g. "migrate-to|migrate_to" to avoid issues for users upgrading.

oalbrigt · 2018-01-10T13:41:17Z

...and update the metadata to only show the new actions.
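
A minimal sketch of what the two comments above describe, assuming a shell agent: the dispatcher accepts both spellings, while the metadata would advertise only the new names. The function name `agent_dispatch` and the echoed strings are illustrative only and are not taken from any existing agent; a real agent would call its own handler functions. Exit status 3 is OCF_ERR_UNIMPLEMENTED.

```sh
#!/bin/sh
# Hypothetical dispatcher (agent_dispatch is an illustrative name).
# Old spellings stay accepted so that upgrading users are not broken.
OCF_ERR_UNIMPLEMENTED=3

agent_dispatch() {
    case "$1" in
        migrate-to|migrate_to)     echo "migrate-to" ;;
        migrate-from|migrate_from) echo "migrate-from" ;;
        monitor|status)            echo "monitor" ;;
        *)                         return "$OCF_ERR_UNIMPLEMENTED" ;;
    esac
}
```

Called as `agent_dispatch migrate_to`, the sketch takes the same branch as `agent_dispatch migrate-to`.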

kgaillot · 2018-01-16T15:58:43Z

More RA vs OCF1.0 inconsistencies to clear up:

For stateful (master/slave) clones, we need to standardize everything ... even the promote/demote/notify actions are not in the spec. I'd like to take the opportunity to change the terminology, and use "stateful" instead of "master/slave", "promoted" instead of "master", and "unpromoted" instead of "slave". It will be a good idea for RAs to accept both sets of terms for a long transition period, so maybe "MUST" for the new terminology and "MAY" or "SHOULD" for the old (we could follow that model for all renamed syntax, such as migrate-to/migrate_to).
We need to decide whether to bless the Pacemaker clone notification variables (OCF_RESKEY_CRM_meta_notify_type, etc.) or leave them implementation-defined. Similarly for the other special information that Pacemaker passes to agents (local node name, etc.). Standardizing them would be, well, more standard, but would guarantee divergence as implementations evolve, and would constrain any future alternate implementations to Pacemaker's current model. Leaving them implementation-defined means agents become implementation-specific, but that might be acceptable. If agents have a standard way of identifying the implementation, that would make it cleaner, e.g. if pacemaker do this else if newthing do that. It would be easy for pacemaker to identify itself via an environment variable, but it would be more complicated to get all possible callers (pcs, crm, GUIs, administrator scripts, manual command-line, etc.) to do the same when they want the pacemaker behavior.
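
To illustrate the "if pacemaker do this else if newthing do that" idea, here is a rough sketch. The variable name `OCF_IMPLEMENTATION` and the function `clone_notify_type` are hypothetical; no such identification variable is defined by the standard or by Pacemaker as far as this thread states. Only `OCF_RESKEY_CRM_meta_notify_type` comes from the discussion above.

```sh
#!/bin/sh
# OCF_IMPLEMENTATION is a HYPOTHETICAL variable naming the calling
# implementation; it is an assumption made for this sketch only.
clone_notify_type() {
    case "${OCF_IMPLEMENTATION:-}" in
        pacemaker)
            # Pacemaker-specific clone notification variable.
            printf '%s\n' "${OCF_RESKEY_CRM_meta_notify_type:-}"
            ;;
        *)
            # Unknown caller: no notification data we know how to read.
            return 1
            ;;
    esac
}
```

The sketch also shows the drawback mentioned above: every caller that wants the Pacemaker behavior would have to set the identifying variable itself.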
Location of agents and shell include files. The standard says OCF_ROOT=/usr/ocf and RAs in $OCF_ROOT/resource.d, with no mention of includes. In practice, everyone uses OCF_ROOT=/usr/lib/ocf with agents in $OCF_ROOT/resource.d and includes in $OCF_ROOT/lib. The FHS followed by many distros would suggest no OCF_ROOT, /usr/lib/ocf for includes and /usr/libexec/ocf for agents. My suggestion: the standard should leave the two locations implementation-defined, and RAs should use some standard environment variable (e.g. OCF_INC_DIR) for the include location when present, otherwise use their own built-in (implementation-defined) default. (That has some nice benefits such as allowing testing of includes in an alternate location such as a repo checkout.) The RA location could either be a single directory as now, or potentially a path (like /usr/libexec/ocf/resource.d:/usr/local/ocf/resource.d:~/ocf/resource.d).
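
A possible shape for this suggestion, as a sketch only: `OCF_INC_DIR` is the example name proposed above, while the function names `ocf_include_dir` and `ocf_find_agent` and the built-in defaults are assumptions chosen for illustration. Note that a literal `~` inside a path variable is not expanded by the shell, so a real implementation would need to handle that separately.

```sh
#!/bin/sh
# Include location: use OCF_INC_DIR when set, otherwise a built-in
# (implementation-defined) default. The default here is an assumption.
ocf_include_dir() {
    printf '%s\n' "${OCF_INC_DIR:-/usr/lib/ocf/lib}"
}

# Search a colon-separated path for an agent such as "heartbeat/Dummy".
# Prints the first executable match and returns 0, else returns 1.
ocf_find_agent() {
    _agent=$1
    _path=${2:-/usr/lib/ocf/resource.d}
    _old_ifs=$IFS
    IFS=:
    for _dir in $_path; do
        if [ -x "$_dir/$_agent" ]; then
            IFS=$_old_ifs
            printf '%s\n' "$_dir/$_agent"
            return 0
        fi
    done
    IFS=$_old_ifs
    return 1
}
```

With this, testing includes from a repository checkout would only require setting `OCF_INC_DIR` before invoking the agent.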
The meaning of exit status codes should be clarified (and probably new ones added). The most glaring example is exit status 6 ("Program is not configured"). The interpretation that most RAs use, which seems more consistent with the standard wording, is that there is no usable service configuration on the local host (and thus is a host-specific error). However, Pacemaker currently interprets this as "the parameters passed to the agent are invalid", which is not host-specific but global to the cluster. We should have a clearly defined error code for each condition. Potential new exit codes include "promote failed but running unpromoted without problems" and "omg fence this host".
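
The ambiguity around exit status 6 can be seen in a small sketch. The function `validate_sketch` and its two arguments are hypothetical; the point is only that two different conditions currently collapse into the same code (OCF_ERR_CONFIGURED).

```sh
#!/bin/sh
OCF_SUCCESS=0
OCF_ERR_CONFIGURED=6

# validate_sketch PORT CONFIG_FILE  (hypothetical helper)
validate_sketch() {
    case "$1" in
        ''|*[!0-9]*)
            # Invalid parameter passed to the agent: wrong on every
            # node, i.e. the Pacemaker reading of exit status 6.
            return "$OCF_ERR_CONFIGURED"
            ;;
    esac
    # No usable service configuration on this host: host-specific,
    # i.e. the reading most RAs use. Same code, different meaning.
    [ -r "$2" ] || return "$OCF_ERR_CONFIGURED"
    return "$OCF_SUCCESS"
}
```

A caller that sees 6 cannot tell which branch produced it, which is why a separate code per condition is being asked for.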
The reload action not only needs to be added to the standard, but clarified (and possibly separated into two commands). All known RAs that implement reload interpret it to mean "call the service's native functionality to reload its local configuration". However, Pacemaker interprets it to mean "reloadable parameters passed to the agent have changed". I think we need separate actions for each meaning. For the record, Andrew Beekhof believes two actions are unnecessary, RAs should just do both interpretations whenever reload is called.
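
If the two meanings were split, an agent's dispatcher might look like the sketch below. The action name `reload-params` and the function `reload_dispatch` are made up for illustration; the thread does not settle on a name for the second action. Exit status 3 is OCF_ERR_UNIMPLEMENTED.

```sh
#!/bin/sh
# "reload-params" is a HYPOTHETICAL action name for this sketch.
reload_dispatch() {
    case "$1" in
        reload)
            # Meaning used by existing RAs.
            echo "service: re-read local configuration"
            ;;
        reload-params)
            # Meaning used by Pacemaker.
            echo "agent: apply changed reloadable parameters"
            ;;
        *)
            return 3
            ;;
    esac
}
```

Under the alternative view mentioned above, a single `reload` branch would perform both steps instead.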

I'm assuming this issue is just for reconciling existing RA usage with the OCF standard, not proposing new features, which would be much longer :)

marxsk · 2018-01-16T17:22:55Z

Yes, this is only to create version 1.1 without any real extensions, following the current state as much as possible. It is quite likely, though, that we will have to make some minor changes in order to cover all kinds of existing resource agents.

krig · 2018-01-16T21:41:19Z

> For stateful (master/slave) clones, we need to standardize everything ... even the promote/demote/notify actions are not in the spec. I'd like to take the opportunity to change the terminology, and use "stateful" instead of "master/slave", "promoted" instead of "master", and "unpromoted" instead of "slave".

Can I suggest "primary" and "secondary"? In particular "unpromoted" sounds awkward.

kgaillot · 2018-01-16T22:03:18Z

> > For stateful (master/slave) clones, we need to standardize everything ... even the promote/demote/notify actions are not in the spec. I'd like to take the opportunity to change the terminology, and use "stateful" instead of "master/slave", "promoted" instead of "master", and "unpromoted" instead of "slave".
>
> Can I suggest "primary" and "secondary"? In particular "unpromoted" sounds awkward.

I'm OK with primary/secondary. The reason I didn't suggest it initially was to avoid terms that were already associated with particular software, to emphasize that the cluster functionality is application-agnostic. (master/slave, master/worker, master/replicant, primary/secondary, primary/backup)

I think "promoted" works b/c that's all pacemaker really cares about. "demoted" would be OK but suggests something was actively done whereas it may have just been left in the not-promoted state. "default" or "started" could also work.

I'll bring this up on the users@clusterlabs.org mailing list for discussion, since everyone is likely to have an opinion :)

krig · 2018-01-16T22:27:55Z

"promoted" and "started" seem sensible to me, but yes, I'm sure there will be more opinions ;)

marxsk · 2018-01-22T10:01:41Z

As @oalbrigt tested, all our resource agents work with -next release without any change.

marxsk mentioned this issue Jan 10, 2018

Next chunk of minor updates #11

Closed

marxsk closed this as completed Jan 22, 2018
