
Fix: libpacemaker: Don't shuffle anonymous clone instances unnecessarily #2313

Closed
wants to merge 10 commits

Commits on Apr 21, 2021

  1. Refactor: libpe_status: Add pcmk__rsc_node_e enum

    This commit adds a new pcmk__rsc_node_e bitfield enum containing values
    for allocated, current, and pending. This indicates the criterion used
    to look up a resource's location (e.g., where is it now vs. where is it
    allocated?).
    
    After a compatibility break, native_location() could use these flags
    instead of an int. That would require making this enum public.
    
    Signed-off-by: Reid Wahl <nrwahl@protonmail.com>
    nrwahl2 committed Apr 21, 2021
    SHA: 1782352
  2. Refactor: libpacemaker, libpe_status: pe_node_attribute_calculated enum

    Use the pcmk__rsc_node_e enum as the flags argument to the
    pe_node_attribute_calculated() function. Pass pcmk__rsc_node_current as
    the flags argument for existing calls, as this mimics the existing
    behavior.
    
    Signed-off-by: Reid Wahl <nrwahl@protonmail.com>
    nrwahl2 committed Apr 21, 2021
    SHA: b9e56ab
  3. Fix: libpacemaker, libpe_status: Get container attr from allocated node

    lookup_promotion_score() should get a container's promotion score from
    the host to which it's allocated (if it's been allocated), rather than
    the host on which it's running.
    
    pe_node_attribute_calculated() now accepts a flags argument using the
    pcmk__rsc_node_e enum to specify whether the value should come from
    the allocated_to host or the first running_on host.
    
    Signed-off-by: Reid Wahl <nrwahl@protonmail.com>
    nrwahl2 committed Apr 21, 2021
    SHA: b223509
  4. Refactor: libpacemaker, libpe_status: Add PCMK__SCORE_MAX_LEN

    The maximum allowed length of a stringified score is
    strlen("-INFINITY"). We can use this constant instead of hard-coded
    integer array sizes with score2char_stack().
    
    Signed-off-by: Reid Wahl <nrwahl@protonmail.com>
    nrwahl2 committed Apr 21, 2021
    SHA: b658f27
  5. Refactor: libpacemaker: Functionalize sections of promotion_order()

    Signed-off-by: Reid Wahl <nrwahl@protonmail.com>
    nrwahl2 committed Apr 21, 2021
    SHA: fb50c8a
  6. Test: scheduler: Apply bundle colocations in promotion order

    Currently, bundle colocations aren't considered when determining
    promotion order. Bundle colocations apply to the bundle and its
    containers; the clone wrapper is not aware of them.
    
    This commit adds four tests for positive bundle colocations:
      - bundle (promoted) with ip, where ip has an INFINITY location score
      - bundle (promoted) with ip, where ip has a positive non-INFINITY
        location score
      - ip with bundle (promoted), where ip has an INFINITY location score
      - ip with bundle (promoted), where ip has a positive non-INFINITY
        location score
    
    Signed-off-by: Reid Wahl <nrwahl@protonmail.com>
    nrwahl2 committed Apr 21, 2021
    SHA: c7f8400
  7. Fix: libpacemaker: Apply bundle colocations in promotion order

    Currently, bundle colocations aren't considered when determining
    promotion order. Bundle colocations apply to the bundle and its
    containers; the clone wrapper is not aware of them.
    
    This commit grabs the bundle's colocations into a working table. Then it
    iterates over the clone's children, finds each child's bundle node, and
    transfers node weights from the bundle node's host to the clone
    wrapper's copy of the bundle node object.
    
    This came up incidentally during an investigation of how bundles are
    processed when scheduling promotions.
    
    Signed-off-by: Reid Wahl <nrwahl@protonmail.com>
    nrwahl2 committed Apr 21, 2021
    SHA: 1a4f92a
  8. Test: scheduler: Clone placement when recovering promoted instance

    Currently, when a promoted instance of an anonymous clone is stopped and
    about to be recovered, the instances get shuffled. Instance 0, which is
    running in non-promoted state on another node, gets moved to the
    to-be-started-and-promoted node. Then instance 1 starts on the
    non-promoted node. This causes an unnecessary restart on the
    non-promoted node.
    
    This will be fixed in an upcoming commit.
    
    This commit adds six tests:
      - clone-anon-recover (correct)
      - clone-group-anon-recover (correct)
      - promotable-anon-recover-promoted (incorrect)
      - promotable-anon-recover-non-promoted (correct)
      - promotable-group-anon-recover-promoted (questionable)
        - correct in that it doesn't shuffle instances, but maybe incorrect
          in that it promotes a different node after one monitor failure on
          the promoted node
      - promotable-group-anon-recover-non-promoted (correct)
    
    Signed-off-by: Reid Wahl <nrwahl@protonmail.com>
    nrwahl2 committed Apr 21, 2021
    SHA: 1f401d2
  9. Test: scheduler: Don't shuffle running anon clones due to constraints

    Currently, there are some circumstances under which running anonymous
    clones may be shuffled around the cluster. The exact requirements to
    reproduce the issue are unclear.
    
    In the case of this test CIB, the issue disappears if:
      - the colocation constraint between the Filesystem and clvm resources
        is removed, or
      - certain of the INFINITY location constraints for the Filesystem
        resource are removed.
    
    This will be fixed in an upcoming commit.
    
    Signed-off-by: Reid Wahl <nrwahl@protonmail.com>
    nrwahl2 committed Apr 21, 2021
    SHA: 4e536f6
  10. Fix: libpacemaker: Don't shuffle anonymous clone instances

    Currently, anonymous clone instances may be shuffled under certain
    conditions, causing an unnecessary resource downtime when an instance is
    moved away from its current running node.
    
    For example, this can happen when a stopped promotable instance is
    scheduled to promote and the stickiness is lower than the promotion
    score (see the promotable-anon-recover-promoted test). Instance 0 gets
    allocated first and goes to the node that will be promoted, causing it
    to relocate if it's already running somewhere else.
    
    There are also some other corner cases that can trigger shuffling, like
    the one in the clone-anon-no-shuffle-constraints test.
    
    The fix is to allocate an instance to its current node during
    pre-allocation if that node is going to receive an instance at all.
    
    Previously, if instance:0 was running on node1 and got pre-allocated to
    node2 due to node2 having a higher weight, we backed out and immediately
    gave up on pre-allocating instance:0.
    
    Now, if instance:0 is running on node1 and gets pre-allocated to node2,
    we increment the "reserved" counter (to ensure we don't allocate the max
    number of instances without node2 getting one), and we make node2
    unavailable. If allocated + reserved < max, we try pre-allocating
    instance:0 again with node2 out of the picture.
    
    This commit also updates several tests that contain unnecessary instance
    moves, and it updates scores files that changed due to the fix.
    
    Resolves: RHBZ#1931023
    
    Signed-off-by: Reid Wahl <nrwahl@protonmail.com>
    nrwahl2 committed Apr 21, 2021
    SHA: ca476b6