-
Notifications
You must be signed in to change notification settings - Fork 339
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add op_defaults and rsc_defaults expressions #2045
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's a solid approach.
see upgrade-2.10.xsl | ||
- cibtr:table for="cluster-properties" | ||
--> | ||
<define name="cluster_property_set.nvpair.name-value-unsupported"> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jnpkrn , I forget whether we need to keep these "unsupported" blocks once the relevant file moves beyond 3.0 -- were they only needed for the 3.0 schema specifically, or do we need to keep them for all 3.x?
Something went wrong with the monitor interval parsing, see the scheduler tests. Not sure where the issue is. |
Looks like it's in e993890. That patch made sense to me at the time, but not right now. |
It makes sense to me too. unpack_operation() is only called by custom_action(), and custom_action() always has the key. I can't think of any situation where the action key has an interval different from the action meta-attributes, but that's the only thing I can see that would be different. I'd be curious if you traced the interval parsed from the key vs the one parsed from the XML to see when it's different. |
I'd be happy to hear other ideas about how I can test this further. |
We do need to add a scheduler regression test. Come up with a CIB XML that contains several test cases, e.g. something using rsc_defaults, something using op_defaults, something using both, and something using neither. Then we have to get the results to show up in the output graph which means we'll need to trigger some action. Generally I find the easiest way is to set target-role=Stopped on an active resource, then the scheduler will want to stop it. To add a scheduler test:
|
This is starting to look pretty good.
Just to be clear, I should mark every resource with target-role=Stopped? |
More specifically whatever resources you want to trigger a stop action for. The operation timeout etc. values won't show up anywhere in the output unless there's an action scheduled that needs them. With an action scheduled, you can verify in the .exp file that the values are what you want. |
11fe44b
to
82f8460
Compare
These new functions all take the same input arguments - an xmlNodePtr and a pe_rule_eval_data_t. This latter type holds all the parameters that could possibly be useful for evaluating some rule. Most functions will only need a few items out of this structure. Then, implement pe_test_*_expression in terms of these new functions.
cts/scheduler/op-rsc-defaults.exp
Outdated
<action_set> | ||
<rsc_op id="10" operation="monitor" operation_key="uses-rsc_defaults-fencing_monitor_60000" on_node="cluster01" on_node_uuid="1"> | ||
<primitive id="uses-rsc_defaults-fencing" class="stonith" type="fence_xvm"/> | ||
<attributes CRM_meta_interval="60000" CRM_meta_name="monitor" CRM_meta_on_node="cluster01" CRM_meta_on_node_uuid="1" CRM_meta_timeout="10000" /> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FYI the 10s timeout here is coming from op_defaults-monitor
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah yep, I see that. Is it reasonable to keep it this way in the test? Perhaps I should change op_defaults-monitor to something else so it doesn't get applied all over the place.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, we want to test a bunch of scenarios. I.e. we should have some action in the graph using the default-default timeout, another using a general op_defaults timeout, another using an op_expression, etc.
Testing rsc_defaults is trickier if none of them appears in the graph. The way to do that would be to set something like target-role or is-managed in the XML so the simulation shows something that needs to be done (e.g. setting target-role=Stopped for a started resource will show a start action, setting is-managed=false for a resource will show "unmanaged" in the status output, etc.)
To make this PR maximally confusing, let's start over on the test cases. I think I was trying to do too much in one file which was just making things more difficult. I'm pushing a new test case that only handles op_defaults. Everything here appears to be working except for op-monitor-interval-defaults. I'm not sure whether that's due to an incorrect test, incorrect testing, or a bug in the code. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
BTW one question we left implicit is what the precedence should be when two rules match. First match, last match, or most specific match could all make sense.
We also need to document this in Pacemaker Explained in the Rules chapter
<op_expression id="op-ping-default-expr" name="monitor"/> | ||
</rule> | ||
<nvpair id="op-ping-monitor-timeout" name="timeout" value="7s"/> | ||
</meta_attributes> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All monitor actions (other than 10s-interval, which is covered below) should have a 7s timeout (note this applies to all resources, not just ping resources)
<op_defaults> | ||
<meta_attributes id="op-defaults"> | ||
<nvpair id="op-defaults-timeout" name="timeout" value="5s"/> | ||
</meta_attributes> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just translating into English, all actions not matched by something below should have a 5s timeout
<rsc_expression id="op-dummy-default-expr" class="ocf" provider="pacemaker" type="Dummy"/> | ||
</rule> | ||
<nvpair id="op-dummy-timeout" name="timeout" value="6s"/> | ||
</meta_attributes> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All ocf:pacemaker:Dummy actions (other than monitor, which is covered below) should have a 6s timeout
<op_expression id="op-monitor-interval-default-expr" name="monitor" interval="10s"/> | ||
</rule> | ||
<nvpair id="op-monitor-interval-timeout" name="timeout" value="8s"/> | ||
</meta_attributes> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All 10-second monitors should have an 8s timeout
221aab8
to
9965b78
Compare
Updated with additional tests and documentation. The one thing I have not tested is what happens when two rules apply to the same resource. I believe the last one will take effect (and I've updated the documentation to that effect), but I need to add a test for it. That's coming next. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good, just had one question
These are like all the other rule evaluating functions, but they do not have any wrappers for the older style API.
The core functions of pe_evaluate_rules, pe_test_rule, and pe_test_expression have been turned into new, similarly named functions that take a pe_rule_eval_data_t as an argument. The old ones still exist as wrappers around the new ones.
This is just to get rid of a couple extra arguments to some internal functions and make them look like the external functions.
It should now take a pe_rule_eval_data_t instead of various separate arguments. This will allow passing further data that needs to be tested against in the future (such as rsc_defaults and op_defaults). It's also convenient to make versions of pe_unpack_nvpairs and pe_unpack_versioned_attributes that take the same arguments. Then, adapt callers of pe__unpack_dataset_nvpairs to pass the new argument.
See: rhbz#1628701.
Only show the "Setting attribute:" text when it comes time to actually set the attribute. Also show the value being set. This makes it clearer that an attribute is actually being set, not just that the function is processing something.
Woohoo! |
This is an extremely early version of this patch set. I've done very minimal testing, including not really any on a live cluster. I just wanted to start getting comments to see if I am on the right track.
With the following XML blurb:
And running that through crm_simulate with enough V args, I am seeing things like this:
Showing that the rule does not apply for a fencing device, and:
Showing that it does apply correctly for the dummy resource.