Add op_defaults and rsc_defaults expressions #2045

clumens · 2020-04-27T21:11:17Z

This is an extremely early version of this patch set. I've done very minimal testing, including not really any on a live cluster. I just wanted to start getting comments to see if I am on the right track.

With the following XML blurb:

      <primitive class="stonith" id="Fencing" type="fence_xvm">
        <instance_attributes id="Fencing-instance_attributes">
          <nvpair id="Fencing-instance_attributes-ip_family" name="ip_family" value="ipv4"/>
        </instance_attributes>
        <operations>
          <op id="Fencing-monitor-interval-60s" interval="60s" name="monitor"/>
        </operations>
      </primitive>
      <primitive class="ocf" id="dummy" provider="pacemaker" type="Dummy">
        <instance_attributes id="dummy-instance_attributes">
          <nvpair id="dummy-instance_attributes-op_sleep" name="op_sleep" value="6"/>
        </instance_attributes>
        <operations>
          <op id="dummy-migrate_from-interval-0s" interval="0s" name="migrate_from" timeout="20s"/>
          <op id="dummy-migrate_to-interval-0s" interval="0s" name="migrate_to" timeout="20s"/>
          <op id="dummy-monitor-interval-60s" interval="60s" name="monitor" on-fail="stop"/>
          <op id="dummy-reload-interval-0s" interval="0s" name="reload" timeout="20s"/>
          <op id="dummy-start-interval-0s" interval="0s" name="start" timeout="20s"/>
          <op id="dummy-stop-interval-0s" interval="0s" name="stop" timeout="20s"/>
        </operations>
      </primitive>
    </resources>
    <constraints/>
    <tags/>
    <op_defaults>
      <meta_attributes id="op_defaults-options">
        <nvpair id="op_defaults-options-timeout" name="timeout" value="5s"/>
      </meta_attributes>
      <meta_attributes id="dummy-options">
        <rule id="dummy-options-timeout-rule" score="INFINITY">
          <rsc_expression id="dummy-rule-Dummy" class="ocf" provider="pacemaker" type="Dummy"/>
        </rule>
        <nvpair id="dummy-options-timeout" name="timeout" value="7s"/>
      </meta_attributes>
    </op_defaults>

Running that through crm_simulate with enough -V flags, I see trace output like this:

(parse_op_key)  trace:   Action: monitor
(parse_op_key)  trace:   Resource: Fencing
(make_pairs)    trace: Checking for attributes
(pe_eval_expr)  trace: Testing rule dummy-options-timeout-rule
(pe_eval_rsc_defaults_expr)     trace: Testing rsc_defaults expression: dummy-rule-Dummy
(pe_eval_rsc_defaults_expr)     trace: Class doesn't match: ocf != stonith
(pe_eval_subexpr)       trace: Expression dummy-rule-Dummy failed on all nodes
(pe_eval_expr)  trace: Expression dummy-options-timeout-rule/dummy-rule-Dummy failed
(unpack_attr_set)       trace: Adding attributes from op_defaults-options
(populate_hash)         trace: Setting attribute: timeout = 5s

Showing that the rule does not apply for a fencing device, and:

(parse_op_key)  trace:   Action: monitor
(parse_op_key)  trace:   Resource: dummy
(make_pairs)    trace: Checking for attributes
(pe_eval_expr)  trace: Testing rule dummy-options-timeout-rule
(pe_eval_rsc_defaults_expr)     trace: Testing rsc_defaults expression: dummy-rule-Dummy
(pe_eval_subexpr)       trace: Expression dummy-rule-Dummy passed on all nodes
(pe_eval_expr)  trace: Rule dummy-options-timeout-rule passed
(unpack_attr_set)       trace: Adding attributes from dummy-options
(populate_hash)         trace: Setting attribute: timeout = 7s
(unpack_attr_set)       trace: Adding attributes from op_defaults-options

Showing that it does apply correctly for the dummy resource.

kgaillot

I think it's a solid approach.

kgaillot · 2020-04-28T17:04:21Z

xml/options-3.4.rng

+   see upgrade-2.10.xsl
+   - cibtr:table for="cluster-properties"
+   -->
+  <define name="cluster_property_set.nvpair.name-value-unsupported">


@jnpkrn , I forget whether we need to keep these "unsupported" blocks once the relevant file moves beyond 3.0 -- were they only needed for the 3.0 schema specifically, or do we need to keep them for all 3.x?

kgaillot · 2020-04-30T15:00:48Z

Something went wrong with the monitor interval parsing, see the scheduler tests. Not sure where the issue is.

clumens · 2020-04-30T15:40:39Z

> Something went wrong with the monitor interval parsing, see the scheduler tests. Not sure where the issue is.

Looks like it's in e993890. That patch made sense to me at the time, but not right now.

kgaillot · 2020-04-30T15:52:19Z

> Looks like it's in e993890. That patch made sense to me at the time, but not right now.

It makes sense to me too. unpack_operation() is only called by custom_action(), and custom_action() always has the key. I can't think of any situation where the action key has an interval different from the action meta-attributes, but that's the only thing I can see that would be different. I'd be curious if you traced the interval parsed from the key vs the one parsed from the XML to see when it's different.

clumens · 2020-04-30T20:31:53Z

I'd be happy to hear other ideas about how I can test this further.

kgaillot · 2020-04-30T21:19:19Z

We do need to add a scheduler regression test. Come up with a CIB XML that contains several test cases, e.g. something using rsc_defaults, something using op_defaults, something using both, and something using neither. Then we have to get the results to show up in the output graph which means we'll need to trigger some action. Generally I find the easiest way is to set target-role=Stopped on an active resource, then the scheduler will want to stop it.

To add a scheduler test:

1. Put the CIB XML at cts/scheduler/${TEST_NAME}.xml
2. Run "cts/cts-scheduler --update --run $TEST_NAME" and verify the created files are as expected
3. Edit cts/cts-scheduler.in to add the test name and description to tests
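
As a rough illustration of the target-role trick described above, a resource in the test CIB could carry a meta-attribute like the one below. This is only a sketch: the ids (`stop-me` and so on) are hypothetical and not taken from the actual test file, and the resource also needs to appear as active in the CIB status section for the scheduler to want to stop it.

```xml
<!-- Hypothetical resource; ids are made up for illustration. -->
<primitive class="ocf" id="stop-me" provider="pacemaker" type="Dummy">
  <meta_attributes id="stop-me-meta_attributes">
    <!-- Asking for Stopped on an active resource makes the scheduler plan a stop action -->
    <nvpair id="stop-me-meta_attributes-target-role" name="target-role" value="Stopped"/>
  </meta_attributes>
  <operations>
    <op id="stop-me-monitor-interval-60s" interval="60s" name="monitor"/>
  </operations>
</primitive>
```

The scheduled stop action should then show up in the generated .exp file along with the meta-attributes that were applied to it.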

clumens · 2020-05-01T19:44:30Z

This is starting to look pretty good.

> Then we have to get the results to show up in the output graph which means we'll need to trigger some action. Generally I find the easiest way is to set target-role=Stopped on an active resource, then the scheduler will want to stop it.

Just to be clear, I should mark every resource with target-role=Stopped?

kgaillot · 2020-05-01T19:57:36Z

> This is starting to look pretty good.
>
> Then we have to get the results to show up in the output graph which means we'll need to trigger some action. Generally I find the easiest way is to set target-role=Stopped on an active resource, then the scheduler will want to stop it.
>
> Just to be clear, I should mark every resource with target-role=Stopped?

More specifically whatever resources you want to trigger a stop action for. The operation timeout etc. values won't show up anywhere in the output unless there's an action scheduled that needs them. With an action scheduled, you can verify in the .exp file that the values are what you want.

These new functions all take the same input arguments - an xmlNodePtr and a pe_rule_eval_data_t. This latter type holds all the parameters that could possibly be useful for evaluating some rule. Most functions will only need a few items out of this structure. Then, implement pe_test_*_expression in terms of these new functions.

kgaillot · 2020-05-11T19:37:36Z

cts/scheduler/op-rsc-defaults.exp

+    <action_set>
+      <rsc_op id="10" operation="monitor" operation_key="uses-rsc_defaults-fencing_monitor_60000" on_node="cluster01" on_node_uuid="1">
+        <primitive id="uses-rsc_defaults-fencing" class="stonith" type="fence_xvm"/>
+        <attributes CRM_meta_interval="60000" CRM_meta_name="monitor" CRM_meta_on_node="cluster01" CRM_meta_on_node_uuid="1" CRM_meta_timeout="10000" />


FYI the 10s timeout here is coming from op_defaults-monitor

Ah yep, I see that. Is it reasonable to keep it this way in the test? Perhaps I should change op_defaults-monitor to something else so it doesn't get applied all over the place.

Sure, we want to test a bunch of scenarios. I.e. we should have some action in the graph using the default-default timeout, another using a general op_defaults timeout, another using an op_expression, etc.

Testing rsc_defaults is trickier if none of them appears in the graph. The way to do that would be to set something like target-role or is-managed in the XML so the simulation shows something that needs to be done (e.g. setting target-role=Stopped for a started resource will show a stop action, setting is-managed=false for a resource will show "unmanaged" in the status output, etc.)
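
One way this could look in a test CIB is sketched below, assuming an rsc_defaults block whose rule uses the rsc_expression element from this PR. The ids are hypothetical and the fragment is not copied from the real test file.

```xml
<!-- Hypothetical rsc_defaults block; ids are made up for illustration. -->
<rsc_defaults>
  <meta_attributes id="rsc-defaults-dummy-stopped">
    <rule id="rsc-defaults-dummy-stopped-rule" score="INFINITY">
      <!-- Matches only ocf:pacemaker:Dummy resources -->
      <rsc_expression id="rsc-defaults-dummy-stopped-expr" class="ocf" provider="pacemaker" type="Dummy"/>
    </rule>
    <!-- Applied to matching resources that do not set target-role themselves -->
    <nvpair id="rsc-defaults-dummy-stopped-target-role" name="target-role" value="Stopped"/>
  </meta_attributes>
</rsc_defaults>
```

If the rule applies, an active Dummy resource should get a stop action in the graph, while resources of other types are left alone.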

clumens · 2020-05-12T20:29:03Z

To make this PR maximally confusing, let's start over on the test cases. I think I was trying to do too much in one file which was just making things more difficult. I'm pushing a new test case that only handles op_defaults. Everything here appears to be working except for op-monitor-interval-defaults. I'm not sure whether that's due to an incorrect test, incorrect testing, or a bug in the code.

kgaillot

BTW one question we left implicit is what the precedence should be when two rules match. First match, last match, or most specific match could all make sense.

We also need to document this in the Rules chapter of Pacemaker Explained.

kgaillot · 2020-05-12T20:54:12Z

cts/scheduler/op-defaults.xml

+          <op_expression id="op-ping-default-expr" name="monitor"/>
+        </rule>
+        <nvpair id="op-ping-monitor-timeout" name="timeout" value="7s"/>
+      </meta_attributes>


All monitor actions (other than 10s-interval, which is covered below) should have a 7s timeout (note this applies to all resources, not just ping resources)

kgaillot · 2020-05-12T20:56:56Z

cts/scheduler/op-defaults.xml

+    <op_defaults>
+      <meta_attributes id="op-defaults">
+        <nvpair id="op-defaults-timeout" name="timeout" value="5s"/>
+      </meta_attributes>


Just translating into English, all actions not matched by something below should have a 5s timeout

kgaillot · 2020-05-12T20:57:16Z

cts/scheduler/op-defaults.xml

+          <rsc_expression id="op-dummy-default-expr" class="ocf" provider="pacemaker" type="Dummy"/>
+        </rule>
+        <nvpair id="op-dummy-timeout" name="timeout" value="6s"/>
+      </meta_attributes>


All ocf:pacemaker:Dummy actions (other than monitor, which is covered below) should have a 6s timeout

kgaillot · 2020-05-12T20:58:10Z

cts/scheduler/op-defaults.xml

+          <op_expression id="op-monitor-interval-default-expr" name="monitor" interval="10s"/>
+        </rule>
+        <nvpair id="op-monitor-interval-timeout" name="timeout" value="8s"/>
+      </meta_attributes>


All 10-second monitors should have an 8s timeout
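
The rsc_expression and op_expression elements reviewed above can also sit in the same rule. The fragment below is a sketch of that combination, with made-up ids and a made-up timeout value. It assumes the rule's boolean-op is "and", so the nvpair applies only to 10-second monitors of ocf:pacemaker:Dummy resources.

```xml
<!-- Hypothetical op_defaults block; ids and the 9s value are made up for illustration. -->
<op_defaults>
  <meta_attributes id="op-dummy-monitor-and">
    <rule id="op-dummy-monitor-and-rule" score="INFINITY" boolean-op="and">
      <!-- Both expressions must pass for the rule to pass -->
      <rsc_expression id="op-dummy-monitor-and-rsc-expr" class="ocf" provider="pacemaker" type="Dummy"/>
      <op_expression id="op-dummy-monitor-and-op-expr" name="monitor" interval="10s"/>
    </rule>
    <nvpair id="op-dummy-monitor-and-timeout" name="timeout" value="9s"/>
  </meta_attributes>
</op_defaults>
```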

clumens · 2020-05-13T19:24:31Z

Updated with additional tests and documentation. The one thing I have not tested is what happens when two rules apply to the same resource. I believe the last one will take effect (and I've updated the documentation to that effect), but I need to add a test for it. That's coming next.
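
For reference, the overlapping case could be set up roughly as follows. This is a sketch with hypothetical ids and values: both rules have the same score and both match a monitor on an ocf:pacemaker:Dummy resource, so the timeout that ends up in the .exp file shows which block takes effect. The fragment makes no claim about the outcome.

```xml
<!-- Hypothetical op_defaults block with two rules that both match a Dummy monitor. -->
<op_defaults>
  <meta_attributes id="overlap-by-resource">
    <rule id="overlap-by-resource-rule" score="INFINITY">
      <rsc_expression id="overlap-by-resource-expr" class="ocf" provider="pacemaker" type="Dummy"/>
    </rule>
    <nvpair id="overlap-by-resource-timeout" name="timeout" value="6s"/>
  </meta_attributes>
  <meta_attributes id="overlap-by-operation">
    <rule id="overlap-by-operation-rule" score="INFINITY">
      <op_expression id="overlap-by-operation-expr" name="monitor"/>
    </rule>
    <nvpair id="overlap-by-operation-timeout" name="timeout" value="7s"/>
  </meta_attributes>
</op_defaults>
```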

kgaillot

Looks good, just had one question

These are like all the other rule evaluating functions, but they do not have any wrappers for the older style API.

The core functions of pe_evaluate_rules, pe_test_rule, and pe_test_expression have been turned into new, similarly named functions that take a pe_rule_eval_data_t as an argument. The old ones still exist as wrappers around the new ones.

This is just to get rid of a couple extra arguments to some internal functions and make them look like the external functions.

It should now take a pe_rule_eval_data_t instead of various separate arguments. This will allow passing further data that needs to be tested against in the future (such as rsc_defaults and op_defaults). It's also convenient to make versions of pe_unpack_nvpairs and pe_unpack_versioned_attributes that take the same arguments. Then, adapt callers of pe__unpack_dataset_nvpairs to pass the new argument.

See: rhbz#1628701.

Only show the "Setting attribute:" text when it comes time to actually set the attribute. Also show the value being set. This makes it clearer that an attribute is actually being set, not just that the function is processing something.

kgaillot · 2020-05-20T17:24:06Z

Woohoo!

clumens force-pushed the rhbz1628701 branch from 82133e5 to 6470c0b Compare April 28, 2020 16:18

kgaillot reviewed Apr 28, 2020

clumens force-pushed the rhbz1628701 branch from 6470c0b to 59dd147 Compare April 29, 2020 13:53

kgaillot reviewed Apr 29, 2020

clumens force-pushed the rhbz1628701 branch from 59dd147 to f2c6bd3 Compare April 30, 2020 14:17

clumens force-pushed the rhbz1628701 branch from f2c6bd3 to c392dcd Compare April 30, 2020 17:48

kgaillot reviewed Apr 30, 2020

clumens force-pushed the rhbz1628701 branch 3 times, most recently from 11fe44b to 82f8460 Compare May 11, 2020 15:10

clumens changed the title from "WIP: Add op_defaults and rsc_defaults expressions" to "Add op_defaults and rsc_defaults expressions" May 11, 2020

clumens added 2 commits May 11, 2020 13:32

Feature: scheduler: Add new expression_type values.

2f10dde

kgaillot reviewed May 11, 2020

clumens force-pushed the rhbz1628701 branch from 82f8460 to a002f9d Compare May 12, 2020 20:29

kgaillot reviewed May 12, 2020

clumens force-pushed the rhbz1628701 branch 2 times, most recently from 221aab8 to 9965b78 Compare May 13, 2020 19:23

kgaillot reviewed May 14, 2020

clumens force-pushed the rhbz1628701 branch from b63ebff to 7db2dc5 Compare May 14, 2020 20:18

kgaillot reviewed May 15, 2020

clumens force-pushed the rhbz1628701 branch from 7db2dc5 to 342e878 Compare May 15, 2020 18:50

kgaillot reviewed May 15, 2020

clumens added 15 commits May 18, 2020 10:09

Feature: scheduler: Add new rule tests for op_defaults and rsc_defaults.

56a1337

These are like all the other rule evaluating functions, but they do not have any wrappers for the older style API.

Refactor: scheduler: Add rule_data to unpack_data_s.

ea63182

This is just to get rid of a couple extra arguments to some internal functions and make them look like the external functions.

Refactor: scheduler: unpack_operation should be static.

ad06f60

Refactor: scheduler: Pass interval to unpack_operation.

7e57d95

Feature: scheduler: Pass rsc_defaults and op_defaults data.

e4c411d

See: rhbz#1628701.

Feature: xml: Add rsc_expression and op_expression to the XML schema.

57eedca

Test: scheduler: Add a regression test for op_defaults.

d358543

Test: scheduler: Add a regression test for rsc_defaults.

6706792

Test: scheduler: Add a regression test for op_defaults with an AND expr.

bcfe068

Doc: Pacemaker Explained: Add documentation for rsc_expr and op_expr.

017b783

Test: scheduler: Add a test for multiple rules applying to the same resource.

b8dd16c

Test: scheduler: Add a test for rsc_defaults not specifying type.

b9ccde1

clumens force-pushed the rhbz1628701 branch from 342e878 to b9ccde1 Compare May 18, 2020 14:09

kgaillot merged commit 81d4b39 into ClusterLabs:master May 20, 2020

clumens deleted the rhbz1628701 branch May 21, 2020 15:30
