Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Resource agents vs next/ra-api.ng #10

Closed
marxsk opened this issue Jan 10, 2018 · 11 comments
Closed

Resource agents vs next/ra-api.ng #10

marxsk opened this issue Jan 10, 2018 · 11 comments

Comments

@marxsk
Copy link
Contributor

marxsk commented Jan 10, 2018

In order to map inconsistencies, I'm creating this bug to follow what should be fixed:

  • remove element from every element as it is no longer valid for new versions
  • actions:

@oalbrigt please comment what is useful

@krig
Copy link
Contributor

krig commented Jan 10, 2018

verify-all vs. validate-all is a strange case. The document and most of the resource agents use validate-all (not all though), and the XML schema uses verify-all. Is one from rgmanager and the other from Linux-HA perhaps?

@marxsk
Copy link
Contributor Author

marxsk commented Jan 10, 2018

I'm happy to switch to validate-all, we have it almost everywhere

@marxsk
Copy link
Contributor Author

marxsk commented Jan 10, 2018

Should actions be marked with 'replaced-by' as parameters to properly rename migrate_to/migrate_from/status ?

@oalbrigt
Copy link

I guess we'll just update the case statements for the existing agents to e.g. "migrate-to|migrate_to" to avoid issues for users upgrading.

@oalbrigt
Copy link

...and update the metadata to only show the new actions.

@kgaillot
Copy link
Contributor

More RA vs OCF1.0 inconsistencies to clear up:

  • For stateful (master/slave) clones, we need to standardize everything ... even the promote/demote/notify actions are not in the spec. I'd like to take the opportunity to change the terminology, and use "stateful" instead of "master/slave", "promoted" instead of "master", and "unpromoted" instead of "slave". It will be a good idea for RAs to accept both sets of terms for a long transition period, so maybe "MUST" for the new terminology and "MAY" or "SHOULD" for the old (we could follow that model for all renamed syntax, such as migrate-to/migrate_to).

  • We need to decide whether to bless the Pacemaker clone notification variables (OCF_RESKEY_CRM_meta_notify_type, etc.) or leave them implementation-defined. Similarly for the other special information that Pacemaker passes to agents (local node name, etc.). Standardizing them would be, well, more standard, but would guarantee divergence as implementations evolve, and would constrain any future alternate implementations to Pacemaker's current model. Leaving them implementation-defined means agents become implementation-specific, but that might be acceptable. If agents have a standard way of identifying the implementation, that would make it cleaner, e.g. if pacemaker do this else if newthing do that. It would be easy for pacemaker to identify itself via an environment variable, but it would be more complicated to get all possible callers (pcs, crm, GUIs, administrator scripts, manual command-line, etc.) to do the same when they want the pacemaker behavior.

  • Location of agents and shell include files. The standard says OCF_ROOT=/usr/ocf and RAs in $OCF_ROOT/resource.d, with no mention of includes. In practice, everyone uses OCF_ROOT=/usr/lib/ocf with agents in $OCF_ROOT/resource.d and includes in $OCF_ROOT/lib. The FHS followed by many distros would suggest no OCF_ROOT, /usr/lib/ocf for includes and /usr/libexec/ocf for agents. My suggestion: the standard should leave the two locations implementation-defined, and RAs should use some standard environment variable (e.g. OCF_INC_DIR) for the include location when present, otherwise use their own built-in (implementation-defined) default. (That has some nice benefits such as allowing testing of includes in an alternate location such as a repo checkout.) The RA location could either be a single directory as now, or potentially a path (like /usr/libexec/ocf/resource.d:/usr/local/ocf/resource.d:~/ocf/resource.d).

  • The meaning of exit status codes should be clarified (and probably new ones added). The most glaring example is exit status 6 ("Program is not configured"). The interpretation that most RAs use, which seems more consistent with the standard wording, is that there is no usable service configuration on the local host (and thus is a host-specific error). However, Pacemaker currently interprets this as "the parameters passed to the agent are invalid", which is not host-specific but global to the cluster. We should have a clearly defined error code for each condition. Potential new exit codes include "promote failed but running unpromoted without problems" and "omg fence this host".

  • The reload action not only needs to be added to the standard, but clarified (and possibly separated into two commands). All known RAs that implement reload interpret it to mean "call the service's native functionality to reload its local configuration". However, Pacemaker interprets it to mean "reloadable parameters passed to the agent have changed". I think we need separate actions for each meaning. For the record, Andrew Beekhof believes two actions are unnecessary, RAs should just do both interpretations whenever reload is called.

I'm assuming this issue is just for reconciling existing RA usage with the OCF standard, not proposing new features, which would be much longer :)

@marxsk
Copy link
Contributor Author

marxsk commented Jan 16, 2018

Yes, this is only to create version 1.1 without any real extensions which will follow current state as much as possible. Although it is quite likely that we will have to do quite minor changes in order to cover all kind of existing resource agents.

@krig
Copy link
Contributor

krig commented Jan 16, 2018

For stateful (master/slave) clones, we need to standardize everything ... even the promote/demote/notify actions are not in the spec. I'd like to take the opportunity to change the terminology, and use "stateful" instead of "master/slave", "promoted" instead of "master", and "unpromoted" instead of "slave".

Can I suggest "primary" and "secondary"? In particular "unpromoted" sounds awkward.

@kgaillot
Copy link
Contributor

For stateful (master/slave) clones, we need to standardize everything ... even the promote/demote/notify actions are not in the spec. I'd like to take the opportunity to change the terminology, and use "stateful" instead of "master/slave", "promoted" instead of "master", and "unpromoted" instead of "slave".

Can I suggest "primary" and "secondary"? In particular "unpromoted" sounds awkward.

I'm OK with primary/secondary. The reason I didn't suggest it initially was to avoid terms that were already associated with particular software, to emphasize that the cluster functionality is application-agnostic. (master/slave, master/worker, master/replicant, primary/secondary, primary/backup)

I think "promoted" works b/c that's all pacemaker really cares about. "demoted" would be OK but suggests something was actively done whereas it may have just been left in the not-promoted state. "default" or "started" could also work.

I'll bring this up on the users@clusterlabs.org mailing list for discussion, since everyone is likely to have an opinion :)

@krig
Copy link
Contributor

krig commented Jan 16, 2018

"promoted" and "started" seem sensible to me, but yes, I'm sure there will be more opinons ;)

@marxsk
Copy link
Contributor Author

marxsk commented Jan 22, 2018

As @oalbrigt tested, all our resource agents work with -next release without any change.

@marxsk marxsk closed this as completed Jan 22, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants