nixos/pacemaker: updates to allow ocf:heartbeat:IPaddr2 to function #208298

Open: wants to merge 3 commits into master


mitchty (Contributor) commented Dec 30, 2022

Description of changes

Updates the pacemaker NixOS module and the ocf-resource-agents package so that the IPaddr2 resource works, at least under pacemaker. In theory this should also fix other users of these resource agents, but that is untested.

Updated the pacemaker tests to also add a VIP resource. For both the cat resource and the VIP resource, the tests check that systemctl restart pacemaker works and that the resources migrate afterwards, and that a subsequent crash of the node also migrates them.

For now the VIP is only pinged after a restart or VM crash. Future updates could tie the VIP and the netcat resource together, but that is largely unnecessary for these tests as they exist today.

Things done
  • Built on platform(s)
    • x86_64-linux
    • aarch64-linux
    • x86_64-darwin
    • aarch64-darwin
  • For non-Linux: Is sandbox = true set in nix.conf? (See Nix manual)
  • Tested, as applicable:
  • Tested compilation of all packages that depend on this change using nix-shell -p nixpkgs-review --run "nixpkgs-review rev HEAD". Note: all changes have to be committed, also see nixpkgs-review usage
  • Tested basic functionality of all binary files (usually in ./result/bin/)
  • 23.05 Release Notes (or backporting 22.11 Release notes)
    • (Package updates) Added a release notes entry if the change is major or breaking
    • (Module updates) Added a release notes entry if the change is significant
    • (Module addition) Added a release notes entry if adding a new NixOS module
    • (Release notes changes) Ran nixos/doc/manual/md-to-db.sh to update generated release notes
  • Fits CONTRIBUTING.md.

Fixes: #207891

The fix might be a bit blunt, but without it pacemaker cannot write to or update
the /var/lib/pacemaker/cib/ directory and fails to come up.

There might be a better systemd way to do this, but for now this seems
serviceable.
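One systemd-native alternative, offered here as an untested assumption and not what this PR does, is a tmpfiles.d "Z" rule, which recursively resets owner and group on a path and everything beneath it. Note that tmpfiles rules only take effect when systemd-tmpfiles runs (typically at boot or on NixOS activation), not on a plain systemctl restart pacemaker, so on its own it would not cover files written as root between restarts.

```nix
{
  # Sketch only: "Z" recursively adjusts ownership of an existing path and its
  # contents. Mode "-" leaves permissions untouched; the trailing "-" leaves
  # the age field unset (no age-based cleanup).
  systemd.tmpfiles.rules = [
    "Z /var/lib/pacemaker - hacluster pacemaker -"
  ];
}
```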

Added/refactored the pacemaker tests to cover VIP (IPaddr2) resource usage. Future
work might be to wrap the config into NixOS module options for corosync.

Other ocf:heartbeat:* resources might need similar work and/or tests added. For
now I have only validated IPaddr2, as that's what I need.

I also punted on linking the VIP to the cat test resource for now, and simply
ping the VIP address in the test to see whether it moves.

mitchty (Contributor Author) commented Sep 4, 2023

Should I do anything more here to get this merged into master? It's quite useful for a basic pacemaker setup that moves VIPs around, and the added tests should demonstrate that.

    serviceConfig = {
      ExecStartPost = "${pkgs.coreutils}/bin/chown -R hacluster:pacemaker /var/lib/pacemaker";
    };

Contributor (review comment):

Why is this necessary? Shouldn't the tmpfiles.d rule above handle this?

mitchty (Contributor Author):

I can't recall, it's been a year. Let me see if it's still needed; it could be detritus from testing.

mitchty (Contributor Author):

OK, so it's not "technically" needed, but here is what can happen: if the main daemon, running as root, saves the current configuration in response to a failure on another node, the pengine backup can be saved as root rather than pacemaker. That causes pacemaker to die at startup when one of its child processes tries to read the prior configuration after fork().

I can't seem to replicate it, though, so maybe it was a bug in earlier versions. But that was basically the rationale: it's less about systemd and more about the interaction between the daemon and its children, which run as the hacluster user. I could swap it for a find that locates any files in the state directory not owned by that user and chowns only those; the current version is a bit of a sledgehammer.
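The find-based variant described above could look roughly like the following untested sketch, written in place of the ExecStartPost line quoted in the review. The use of pkgs.findutils and the exact match expression (user not hacluster, or group not pacemaker) are assumptions, not part of the PR. Because systemd does not run Exec* lines through a shell, the parentheses and "!" are passed to find literally and need no escaping, and the "{} +" form of -exec batches many paths into each chown call.

```nix
serviceConfig = {
  # Untested sketch: rather than chown -R over the whole tree, only re-own
  # entries under the state directory whose user or group is not already
  # hacluster:pacemaker (e.g. a pengine backup written by the root daemon).
  ExecStartPost = "${pkgs.findutils}/bin/find /var/lib/pacemaker ( ! -user hacluster -o ! -group pacemaker ) -exec ${pkgs.coreutils}/bin/chown hacluster:pacemaker {} +";
};
```

As with the original line, a non-zero exit from find would fail the unit; prefixing the command with "-" would make systemd ignore that failure, if that is preferred.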

wegank added the 2.status: stale (https://github.com/NixOS/nixpkgs/blob/master/.github/STALE-BOT.md) and 2.status: merge conflict labels Mar 19, 2024
stale bot removed the 2.status: stale label Mar 20, 2024
wegank added the 2.status: stale label Jul 4, 2024
Projects: None yet

Development: Successfully merging this pull request may close these issues:
  • Pacemaker doesn't appear to be fully functional in general after restarting and ocf resources don't work

5 participants