Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add rule support for location constraints #43

Open
wants to merge 5 commits into
base: master
Choose a base branch
from

Conversation

mib1185
Copy link

@mib1185 mib1185 commented May 17, 2024

Hi @OndrejHome

this is my approach to add rule support for location constraints. In fact there is no generic possibility to fetch a matching constraint rule (like we do for prefers constraints by checking the node-name) I thought we need to add an identifier for rule-based constraints.
it is tested with

  tasks:
    - ansible.builtin.import_role:
        name: ondrejhome.pcs-modules-2
    - pcs_constraint_location:
        resource: "RES_Webserver_VIP"
        rule: "pingd lt 1 or not_defined pingd"
        constraint_id: "location-RES_Webserver_VIP-1"
        score: "-INFINITY"
      run_once: true
    - pcs_constraint_location:
        resource: "RES_Webserver_VIP"
        rule: "pingd lt 1"
        constraint_id: "location-RES_Webserver_VIP-2"
        score: "-INFINITY"
      run_once: true

Todo

  • add docs and example

@OndrejHome
Copy link
Owner

Hello Michael (@mib1185),

Thank you for using pcs-modules-2 modules and for finding time to prepare this PR!

From initial look at code this is looks promising and I can see that you have implemented checks for content of the rule to match what pacemaker 2.1 would expect there - nice job!

Before having a better look at code itself I want to clarify few things:

    1. On what systems this code was tested/developed so far?
      (For example: system runing ansible was Fedora 40 using ansible version 2.16.5, the systems running pacemaker cluster were Fedora 39 with pacemaker 2.1.7 and pcs version 0.11.x)
    1. Which systems this you would like this code to be used with? (This could be same systems as in previous question or also some additional systems)
    1. Should you code also support the score-attribute for location rule or not? (from code I assume that it is not considering it and it is okay to not implement it)

As for the testing, documentation and examples I can see that Pacemaker Exaplained 2.1 chapter 10 Rules looks like a nice place to get some inspiration from on what to test.

Looking at pcs man page I think having one example for each type of rule should be sufficient. In other words: one example with defined, one example with <attribute > lt, one example with date lt, etc. But feel free to add more if you think that there is a good practical reason for it.

...
rule add <constraint id> [id=<rule id>] [role=master|slave] [score=<score>|score-attribute=<attribute>] <expression>
    Add a rule to a constraint where the expression looks like one of the following:

      defined|not_defined <attribute>

      <attribute> lt|gt|lte|gte|eq|ne [string|integer|version] <value>

      date gt|lt <date>

      date in_range <date> to <date>

      date in_range <date> to duration <duration options>...

      date-spec <date spec options>...

      <expression> and|or <expression>

      ( <expression> )
    where duration options and date spec options are: hours, monthdays, weekdays, yeardays, months, weeks, years, weekyears, moon If score is ommited it defaults to INFINITY. If id is ommited one is generated from the constraint id. 
...

As for documentation it is sufficient to update DOCUMENTATION and EXAMPLES variables in pcs_constraint_location.py.

Once there is documentation with examples and it is clear which systems this targets I will give it a try and provide you with feedback on code level. Note that I usually have time for checking code only during weekends so there might be no update from me on weekdays.

Once again Thank you for preparing the code and I will be looking to hear from you. Feel free to ask if you have any questions and feel free to take the time needed.

--
Ondrej (온드레이)

@mib1185
Copy link
Author

mib1185 commented May 18, 2024

Hello @OndrejHome

many thanks for fast reply and of course also developing this ansible module and the corresponding ansible role 👍

From initial look at code this is looks promising and I can see that you have implemented checks for content of the rule to match what pacemaker 2.1 would expect there - nice job!

yeah ... it take me some time to fetch (hopefully) all possible variants of rules 🙈

On what systems this code was tested/developed so far?
(For example: system runing ansible was Fedora 40 using ansible version 2.16.5, the systems running pacemaker cluster were Fedora 39 with pacemaker 2.1.7 and pcs version 0.11.x)

For developing the module code itself i've used an ubuntu 22.04 (WSL2).
All tests has been running on debian 12 with python 3.11 and ansible 2.14.4 (used within a mcr.microsoft.com/devcontainers/python:3.11 dev-container).
The target systems are based on debian 12 with pcs (0.11.5) and pacemaker (2.1.5) from debian package repository.

Which systems this you would like this code to be used with? (This could be same systems as in previous question or also some additional systems)

They are the same as above.

Should you code also support the score-attribute for location rule or not? (from code I assume that it is not considering it and it is okay to not implement it)

TBH i wasn't aware of the score-attribute, better said that there are two different score and score-attribute attributes 🙈 At the moment only the score paramater is supported and already considered here:

constraint_match = compare_rule_to_element(rule, constr_rule) and score == constr_rule.attrib.get("score")

As for documentation it is sufficient to update DOCUMENTATION and EXAMPLES variables in pcs_constraint_location.py.

I'll add documentation and examples soon, but maybe earliest week after next.

Once there is documentation with examples and it is clear which systems this targets I will give it a try and provide you with feedback on code level. Note that I usually have time for checking code only during weekends so there might be no update from me on weekdays.

Take your time and every feedback is highly appreciated 🙂

Regards,
Michael

@mib1185
Copy link
Author

mib1185 commented May 28, 2024

Hello @OndrejHome i've added the doc strings and some examples.

@OndrejHome
Copy link
Owner

Hello Michael (@mib1185 ),

Thank you for the explanations and addition of examples!

I like to see the pingd made it into examples and have one little nitpick (= no need to fix) to it - what about 'not_defined pingd or pingd lt 1' instead of current 'pingd lt 1 or not_defined pingd' ?
My reasoning for that is that we don't need to compare anything if the pingd attribute doesn't exists and also when people read (from left to right) trying to figure out where problem is it gives better hint to check if attribute exists first instead of searching for what value it is ;)

As for the score-attribute I have also not seen an use case in area I deal with so it is okay for me if that is not implemented here. Thank you for confirmation on that one.

I have done few tests so far and plan to find time to finish them by the end of this week to not let you wait for too long. If it will work functionally as expected on my systems I plan to merge this PR, give repository a new tag and let you know what it is.

--
Ondrej (온드레이)

@OndrejHome OndrejHome assigned OndrejHome and unassigned mib1185 Jun 3, 2024
@mib1185
Copy link
Author

mib1185 commented Jun 3, 2024

Hi Ondrej (@OndrejHome),

many thanks for this positiv feedback and reply 🙂

what about 'not_defined pingd or pingd lt 1' instead of current 'pingd lt 1 or not_defined pingd' ?

yeah, I just took the example from the official pcs docs about Moving Resources Due to Connectivity Changes 🙈

check if attribute exists first instead of searching for what value it is ;)

... but I also fully agree with you 👍

I have done few tests so far and plan to find time to finish them by the end of this week to not let you wait for too long. If it will work functionally as expected on my systems I plan to merge this PR, give repository a new tag and let you know what it is.

I'm looking forward for your test results 🤞
Further I'll already prepare a PR in ansible.ha-cluster-pacemaker to let it also accept the needed parameters for rule based location constraints.

regards,
Michael

@OndrejHome
Copy link
Owner

Hello Michael (@mib1185),

So far I have following results from my testing on Alma-8.10/Alma-9.4/Fedora-40/Debian-12.5:


  • issue-A - I get syntax errors when trying to use pcs_constraint_location when target systems are Alma-8.10 and Alma-9.4 (no such errors for Debian-12.5 nor Fedora-40). This happens also for tasks that were working before the introduction of new code and that don't need to use constrain rules.
The error was: TypeError: unsupported operand type(s) for |: 'type' and 'type'
Traceback (most recent call last):
  File \"/root/.ansible/tmp/ansible-tmp-1717840047.8084233-741730-22906470171929/AnsiballZ_pcs_constraint_location.py\", line 107, in <module>
    _ansiballz_main()\r\n  File \"/root/.ansible/tmp/ansible-tmp-1717840047.8084233-741730-22906470171929/AnsiballZ_pcs_constraint_location.py\", line 99, in _ansiballz_main
    invoke_module(zipped_mod, temp_path, ANSIBALLZ_PARAMS)
  File \"/root/.ansible/tmp/ansible-tmp-1717840047.8084233-741730-22906470171929/AnsiballZ_pcs_constraint_location.py\", line 48, in invoke_module
    run_name='__main__', alter_sys=True)
  File \"/usr/lib64/python3.6/runpy.py\", line 205, in run_module
    return _run_module_code(code, init_globals, run_name, mod_spec)
  File \"/usr/lib64/python3.6/runpy.py\", line 96, in _run_module_code
    mod_name, mod_spec, pkg_name, script_name)\r\n  File \"/usr/lib64/python3.6/runpy.py\", line 85, in _run_code
    exec(code, run_globals)
  File \"/tmp/ansible_pcs_constraint_location_payload_vlhycfbn/ansible_pcs_constraint_location_payload.zip/ansible/modules/pcs_constraint_location.py\", line 129, in <module>
  File \"/tmp/ansible_pcs_constraint_location_payload_vlhycfbn/ansible_pcs_constraint_location_payload.zip/ansible/modules/pcs_constraint_location.py\", line 130, in DateSpec
    TypeError: unsupported operand type(s) for |: 'type' and 'type'

Looking at what the Alma systems might be upset about I have tried changing code in class definition to following to make it work:

...
129 class DateSpec:
130 #    hours: str | int = None
131 #    monthdays: str | int = None
132 #    weekdays: str | int = None
133 #    yeardays: str | int = None
134 #    months: str | int = None
135 #    weeks: str | int = None
136 #    years: str | int = None
137 #    weekyears: str = None
138 #    moon: str | int = None
139     hours: str = None
140     monthdays: str = None
141     weekdays: str = None
142     yeardays: str = None
143     months: str = None
144     weeks: str = None
145     years: str = None
146     weekyears: str = None
147     moon: str = None
...

  • issue-B - on all tested systems (debian-12.5,fedora-40,alma-8.10,alma-9.4) the example task with data-spec is not idempotent and always reports 'changed' even if there is no need for it.
    The task is this one:
- name: resource resA prefers to run on the current node during working hours
  pcs_constraint_location:
    resource: 'resA'
    constraint_id: 'resA_working_hours'
    rule: 'date-spec hours="9-16" weekdays="1-5"'
    score: 'INFINITY'

Every time I run above task I can see this:

TASK [resource resA prefers to run on the current node during working hours] *********
changed: [192.168.22.71]

While expectation is that it should just be 'ok' once it was created, so there seems to be issue with comparing the data and determining when "rule is same or not".

I have tried following to see if I get any better results:

  • specify only one item for date_spec, for example only 'hours' --> same issue
  • specify all possible items for date-spec --> same issue
- name: resource resA prefers to run on the current node during working hours - weekdays only
  pcs_constraint_location:
    resource: 'resA'
    constraint_id: 'resA_working_hours_weekdays'
    rule: 'date-spec weekdays="1-5"'
    score: 'INFINITY'
- name: resource resA date_spec ALL
  pcs_constraint_location:
    resource: 'resA'
    constraint_id: 'resA_date_spec_all'
    rule: 'date-spec hours="1" monthdays="1" weekdays="1" yeardays="1" months="1" weeks="1" years="1" weekyears="1" moon="1"'
    score: 'INFINITY'

Issue is also present in case the following task is used:

- name: resource resA date_range with duration
  pcs_constraint_location:
    resource: 'resA'
    constraint_id: 'resA_range_2022_2023_duration'
    rule: 'date in_range 2022-01-01 to duration weeks=2'
    score: 'INFINITY'

NOTE: some old systems (looking at pacemaker-1.1 don't have all possibilities for date-spec and future pacemaker-3.x is planning to drop support for 'moon' date-spec. So ideally we should compare only the values that are in CIB and that user specified in task as parameter to module.


  • issue-C - which seems to be rather extension of issue-A is that I have tried also some fairly old pacemaker - CentOS 7.9 with pacemaker 1.1.23 :) - and because the target system uses 'Python 2.7.5' the new syntax from python3 was making it quite unhappy. I have further removed the modern syntax from code and with changes below I have managed to get it to work - with exception of issue-B that persists, so it seems to me that it will be some logical issue.
--- pcs_constraint_location.py	2024-06-03 23:53:50.000000000 +0900
+++ roles/ondrejhome.pcs-modules-2/library/pcs_constraint_location.py	2024-06-08 20:03:18.446049489 +0900
@@ -127,24 +127,33 @@
 from ansible.module_utils.basic import AnsibleModule
 
 class DateSpec:
-    hours: str | int = None
-    monthdays: str | int = None
-    weekdays: str | int = None
-    yeardays: str | int = None
-    months: str | int = None
-    weeks: str | int = None
-    years: str | int = None
-    weekyears: str = None
-    moon: str | int = None
+    hours = None
+    monthdays = None
+    weekdays = None
+    yeardays = None
+    months = None
+    weeks = None
+    years = None
+    weekyears = None
+    moon = None
 
-    def __init__(self, expression: str):
+    def __init__(self, expression):
         for match_group in re.findall(
             r"(hours|monthdays|weekdays|yeardays|months|weeks|years|weekyears|moon)=['\"]?(\w+)['\"]?\s*",
             expression,
         ):
             setattr(self, match_group[0], match_group[1])
 
-    def compare(self, xml: ET.Element) -> bool:
+    def compare(self, xml):
         """Check if given XML element matches the date-spec expression."""
         if any(
             [
@@ -163,14 +172,14 @@
         return True
 
 class RscLocationRuleExpression:
-    operation: str
-    attribute: str = None
-    value: str = None
-    start: str = None
-    end: str = None
-    date_spec: DateSpec = None
+    operation = None
+    attribute = None
+    value = None
+    start = None
+    end = None
+    date_spec = None
 
-    def __init__(self, expression: str):
+    def __init__(self, expression):
         # expression: date gt|lt <date>
         exp_parsed = re.search(r"^date\s+(gt|lt)\s+(.*)$", expression)
         if exp_parsed:
@@ -216,7 +225,7 @@
             self.value = exp_parsed.group(3)
             return
 
-    def compare(self, xml: ET.Element) -> bool:
+    def compare(self, xml):
         """Check if given XML element matches the rule expression."""
         date_spec = xml.find("duration") or xml.find("date_spec")
         if any(
@@ -240,7 +249,7 @@
 
         return True
 
-def compare_rule_to_element(rule_string: str, xml_rule: ET.Element) -> bool:
+def compare_rule_to_element(rule_string, xml_rule):
     boolean_op = xml_rule.attrib.get("boolean-op")
     if boolean_op and " %s " % boolean_op not in rule_string:
         return False

Hope the above is somewhat readable. If there are any questions, feel free to just ask. I would like to have addressed at minimum the issue-A and issue-B before merge. I will help with testing of solution for issue-C as that is for fairly old, but still in-production systems that I know of.

--
Ondrej (온드레이)

@mib1185
Copy link
Author

mib1185 commented Jun 8, 2024

Hi Ondrej,

many thanks for your test efforts.

  • issue-A is just related to older python versions (older than 3.10), which does not support these shorthand type annotations - will rewrite them to use the typing.Union form
  • issue-B needs to be investigated, will do so
  • issue-C - if we need to support python 2.7, than yes all type hints (annotations) needs to be removed, since annotations are a python 3 only feature.

I'm on a business trip next week, so I'll care about these points earliest the week after.

Regards,
Michael

@mib1185
Copy link
Author

mib1185 commented Jun 19, 2024

c0837e2 should solve issue A and C

@mib1185
Copy link
Author

mib1185 commented Jun 20, 2024

Hi Ondrej,

issue A and C should be solved no (just removed all annotations).

About issue B - could you please provide me the CIB xml files of your test clusters, where you have set these constraint rules? (feel free to contact me via discord).

NOTE: some old systems (looking at pacemaker-1.1 don't have all possibilities for date-spec and future pacemaker-3.x is planning to drop support for 'moon' date-spec. So ideally we should compare only the values that are in CIB and that user specified in task as parameter to module.

This is already ensured, since the possible attributes are initialized as None in class DateSpec and xml.get() has default return None

Regards,
Michael

@OndrejHome
Copy link
Owner

2024-06-24-issue-b-debian-12.5-files.tgz

Hi Michael,

Thank you for the changes. For the issue-b I'm attaching the playbook I have used for testing and CIB (pcs cluster cib) from time before running playbook, then after 1st run, 2nd run and 3rd run. There is visibly nothing changed in CIB, but module will delete and recreate the constraint. Please let me know if those file are enough or more are needed.

--
Ondrej

@mib1185
Copy link
Author

mib1185 commented Jun 26, 2024

Hi Ondrej,

there was an issue in the regex for finding the date spec parts (\w does only match [a-zA-Z0-9_] so the - is missing ... \S will match every non-space character). With this issue-b should also be solved.

regards,
Michael

@OndrejHome
Copy link
Owner

Hi Michael,

I'm not sure if I'm testing it wrong, but even with new match I still run into same issue - the task is always 'changed'. I have briefly tested it on Debian-12.5, Fedora 40 and even tried to create one new Debian-12.5 cluster but issue-B was still present there. Looking at pcs cluster cib I don't see any changes in it even after several runs of playbook shared previously. I plan to have a bit better look over upcoming weekend on this. Thank you for your efforts!

--
Ondrej

@mib1185
Copy link
Author

mib1185 commented Jul 2, 2024

Hi Ondrej,

thanks for your tests - I'll check it further the upcoming days 👍

Regards,
Michael

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

pcs_constraint_location: allow specification of rules
2 participants