Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

nixos/haproxy: add reloading support, use upstream service hardening #88434

Merged
merged 1 commit into from May 31, 2020

Conversation

@pstch
Copy link
Contributor

pstch commented May 20, 2020

Motivation for this change

The current systemd service for haproxy doesn't support reloading. Reloading is useful for haproxy as it allows changing the configuration or updating the SSL certificates without losing the reverse proxy state (open connections, backend status, etc), and without missing requests during the restart time. Additionally, upstream recommends several configuration changes to the systemd service, mostly related to hardening, that are not used in the current systemd service.

Things done
  • Update the systemd service using the example provided by upstream
    • Enable reloadIfChanged by default
    • Store PID file in /run/haproxy/haproxy.pid
    • Add ExecReload commands for seamless reloading
    • Consider 143 as a success exit code
    • Change Restart to always
    • Add hardening options (NoNewPrivileges,Protect* and SystemCallFilter)
  • Add new tests for the reload feature
    • Check that systemctl reload haproxy succeeds
    • Check that no HTTP requests are lost during reloading
  • Tested using sandboxing (nix.useSandbox on NixOS, or option sandbox in nix.conf on non-NixOS linux)
  • Built on platform(s)
    • NixOS
    • macOS
    • other Linux distributions
  • Tested via one or more NixOS test(s) if existing and applicable for the change (look inside nixos/tests)
  • Tested compilation of all pkgs that depend on this change using nix-shell -p nixpkgs-review --run "nixpkgs-review wip"
  • Tested execution of all binary files (usually in ./result/bin/)
  • Determined the impact on package closure size (by running nix path-info -S before and after)
  • Ensured that relevant documentation is up to date
  • Fits CONTRIBUTING.md.
Possible improvements/changes
  • Changing back Restart to on-failure, to preserve existing semantics
  • Not enabling reloadIfChanged by default, to preserve existing semantics
  • The reload test may be a bit fragile currently, I can try to improve it if needed.
@@ -60,15 +60,31 @@ with lib;
description = "HAProxy";
after = [ "network.target" ];
wantedBy = [ "multi-user.target" ];
reloadIfChanged = true;

This comment has been minimized.

Copy link
@aanderse

aanderse May 22, 2020

Contributor

I'm not sure you actual want this... I'm under the impression if a CVE were reported and fixed then you updated your OS the service would continue to run with the existing vulnerable binary. A manual stop and start would be required to get the newly patched binary.

This comment has been minimized.

Copy link
@pstch

pstch May 22, 2020

Author Contributor

The reload mechanism implemented here actually starts a completely new process, and kills the old one. The transition is implemented through a socket which forwards the sockets between the two processes. There is no shared memory between the two processes, only a few environment variables are passed from the old master process to the new binaries.

I have manually tested that the old binary is gone after reloading, and I can try to write a NixOS test for that if it is necessary (comparing binaries before/after reloading).

This comment has been minimized.

Copy link
@aanderse

aanderse May 23, 2020

Contributor

@pstch thanks for mentioning that. I'll have to review some scripts as I was under a different impression.

with subtest("seamless reload"):
machine.systemctl("start haproxy-reloader")
machine.succeed("echo http://localhost:80/index.txt | http-getter -i1 -n8 -l1")

This comment has been minimized.

Copy link
@pstch

pstch May 22, 2020

Author Contributor

I am not sure this test is solid enough. Depending on the test machine's performance, the reloading might not complete before http-getter stops sending requests, so it may not even be tested. If necessary I can write a proper multi-threaded test that ensures that we keep sending requests until haproxy has finished reloading, and that HTTP connections opened before the reload are still open afterwards.

This comment has been minimized.

Copy link
@flokli

flokli May 24, 2020

Contributor

https://www.haproxy.com/de/blog/truly-seamless-reloads-with-haproxy-no-more-hacks/ mentions a quite sophisticated load generator setup to reproduce these issues, and only on a very small percentage.

I doubt this will uncover these issues - maybe just change the test to do a simple reload, and ensure it still replies afterwards.

This comment has been minimized.

Copy link
@pstch

pstch May 24, 2020

Author Contributor

Right, I will change the test to ensure that it's still replying afterwards.

# support seamless reload
ExecReload = [
"${pkgs.haproxy}/sbin/haproxy -c -f ${haproxyCfg}"
"${pkgs.coreutils}/bin/kill -USR2 $MAINPID"

This comment has been minimized.

Copy link
@flokli

flokli May 23, 2020

Contributor

How does the reload behaviour work?

Does the "${pkgs.haproxy}/sbin/haproxy -c -f ${haproxyCfg}" exit? Or will it become the new "main process"?

What is $MAINPID pointing to? Will it change after a reload?

This comment has been minimized.

Copy link
@pstch

pstch May 23, 2020

Author Contributor

Yes, "${pkgs.haproxy}/sbin/haproxy -c -f ${haproxyCfg}" exits, it is just used to check the configuration file.

$MAINPID never changes, as the master process re-executes itself when receiving $USR2.

You're right that I missed something in the reload behaviour. The master process uses exec(argv[0]) to re-execute itself, but that won't work as-is because the binary's path will change on updates.

I've updated the PR to make the main process execute from a symlink to the real binary (in /run/haproxy/haproxy).

This comment has been minimized.

Copy link
@flokli

flokli May 23, 2020

Contributor

Hrm, I'm not sure if this brings us any further to the goal of only restarting where necessary.

I spend quite some thoughts earlier on how to handle reloads and restarts, and reloadIfChanged currently doesn't really work the way it should. I just wrote down in more detail an idea on how to fix this properly in #49528 (comment).

After something like this has landed, we could probably just make use of haproxy's "bind to a specific fd" functionality to provide seamless reloads: https://news.ycombinator.com/item?id=8004153

This comment has been minimized.

Copy link
@pstch

pstch May 23, 2020

Author Contributor

Changes of the underlying binary, change of sandboxing or other systemd options should basically always restart the service, as there's no way to apply these to an already running process.

In haproxy's case, changing the underlying binary should not restart the service, since haproxy can change it itself while preserving the connections using expose-fd which is already in use in this NixOS module. The last change I made (using a symlink in /run/haproxy/haproxy to start haproxy) allows this to work properly.

It's important to avoid haproxy restarts as much as we can, because in production this can cause significant disruption (losing all established connections), so if the changes you are proposing in #49528 are implemented, we need to expose a setting allowing haproxy to be reloaded even if the binary is updated.

After something like this has landed, we could probably just make use of haproxy's "bind to a specific fd" functionality to provide seamless reloads: https://news.ycombinator.com/item?id=8004153

This is already enabled in this NixOS module (expose-fds is set globally), but it wasn't used before this PR since it only works with Reload : if the old process is killed before the new one is started, the FDs cannot be forwarded.

Indeed, reloadIfChanged comes with the problem that sandboxing options are never updated, and that's a problem that should be fixed.

This comment has been minimized.

Copy link
@flokli

flokli May 23, 2020

Contributor

haproxy goes further than what the linked comment proposes - it doesn't yet talk about applications that support re-executing themselves with a new binary.

If we add support for something like this, generating the /run/haproxy/haproxy symlink in ExecStartPre easily won't work.
The line in ExecStartPre will change on a new haproxy binary (so the logic would normally decide to restart), and even if we'd add a special case for that, the symlink won't be updated if you systemctl reload after a new binary has been deployed.

If we do necessary changes in the module, can we make use of systemd sockets for seamless restarts, of haproxy.service, by making use of its expose-fds haproxy functionality?

This comment has been minimized.

Copy link
@pstch

pstch May 23, 2020

Author Contributor

[...] won't be updated if you systemctl reload after a new binary has been deployed.

Ah yes, but that's just because I forgot to recreate the symlink in Reload. I corrected this in the last push.

If we do necessary changes in the module, can we make use of systemd sockets for seamless restarts, of haproxy.service, by making use of its expose-fds haproxy functionality?

We are already using expose-fds in this PR, it's used to transfer the socket between the old and new binary. Using systemd sockets should be possible, but I don't think that it is a good solution, as it would force the user to configure the bound addresses outside of haproxy's configuration. It also breaks the possibility for the user to bind to additional addresses using haproxy's CLI, something that is very important in HA scenarios.

In my opinion, the current form of this PR is the best way to allow seamless reloads. Of course, if the changes you propose in #49258 are implemented, we would need some way to indicate that the service should not be restarted even if the package is updated.

EDIT: If you like it better, I can drop the reloadIfChanged option from this PR. I think the rest of the work done to allow seamless reloads should still be included so that users can enable it themselves, although I also think that this should be the default behaviour : not being able to apply new systemd sandboxing options is much less of a problem than losing connections when restarting.

This comment has been minimized.

Copy link
@flokli

flokli May 24, 2020

Contributor

Yes, please remove the reloadIfChanged for now, as it's not always working.

Please also add a small comment next to where we create the binary symlink about haproxy using exec(argv[0]) to re-execute itself, so people understand why we create this symlink.

I assume without reloadIfChanged, our activation script will still just systemctl restart haproxy.service, right?

This comment has been minimized.

Copy link
@pstch

pstch May 24, 2020

Author Contributor

Ok, I will do that, and add a comment for the exec(argv[0]) code.

I assume without reloadIfChanged, our activation script will still just systemctl restart haproxy.service, right?

Yes.

@pstch pstch force-pushed the pstch:patch-2 branch from 747a5db to fac0130 May 23, 2020
@pstch
Copy link
Contributor Author

pstch commented May 23, 2020

Updated the PR to make the main process execute from a symlink to the binary (in /run/haproxy/haproxy), so that it picks up new versions.

@pstch pstch force-pushed the pstch:patch-2 branch from fac0130 to 2bf30d0 May 23, 2020
@pstch
Copy link
Contributor Author

pstch commented May 23, 2020

Updated the PR to add an indirection for the configuration file, so that it's picked up when the master process reloads.

@pstch pstch force-pushed the pstch:patch-2 branch 5 times, most recently from b13aaf8 to 83fb6a4 May 23, 2020
@pstch
Copy link
Contributor Author

pstch commented May 25, 2020

@flokli I force-pushed a commit with the requested changes:

  • removing reloadIfChanged
  • adding a comment to justify the symlink
  • changing the test to only verify that the proxy is still answering after a reload

I had to add a delay in the reload test, to ensure that the test request is handled by the new workers (haproxy took ~300ms to reload on my machine).

nixos/modules/services/networking/haproxy.nix Outdated Show resolved Hide resolved
nixos/tests/haproxy.nix Outdated Show resolved Hide resolved
@pstch pstch force-pushed the pstch:patch-2 branch 2 times, most recently from 3dcfae8 to e0423ff May 27, 2020
@aanderse aanderse mentioned this pull request May 29, 2020
4 of 10 tasks complete
@flokli
Copy link
Contributor

flokli commented May 29, 2020

@talyz, @peterhoeg, would you mind taking another look at this?

@peterhoeg
Copy link
Member

peterhoeg commented May 29, 2020

Have you tried reload with an invalid config file? Does it leave the "old" instance running?

@pstch pstch force-pushed the pstch:patch-2 branch from e0423ff to 87b4813 May 29, 2020
@pstch
Copy link
Contributor Author

pstch commented May 29, 2020

@peterhoeg Yes. I had added a check of the configuration in ExecReload, but it seems I accidentally dropped it. I updated the PR to add it again, now if the configuration file is invalid the old instance will keep running, but reloading will return a non-zero exit code.

Copy link
Contributor

flokli left a comment

I assume we want to do the config test with the (possibly new) haproxy binary, but we only want to flip the symlink if the reload was successful - otherwise, this could mess up manual reloads.

nixos/modules/services/networking/haproxy.nix Outdated Show resolved Hide resolved
Refactor the systemd service definition for the haproxy reverse proxy,
using the upstream systemd service definition. This allows the service
to be reloaded on changes, preserving existing server state, and adds
some hardening options.
@pstch pstch force-pushed the pstch:patch-2 branch from 87b4813 to c784d3a May 31, 2020
@flokli
flokli approved these changes May 31, 2020
@flokli
Copy link
Contributor

flokli commented May 31, 2020

Thanks!

@flokli flokli merged commit 09a7612 into NixOS:master May 31, 2020
13 checks passed
13 checks passed
Evaluation Performance Report Evaluator Performance Report
Details
grahamcofborg-eval ^.^!
Details
grahamcofborg-eval-check-meta config.nix: checkMeta = true
Details
grahamcofborg-eval-darwin nix-instantiate --arg nixpkgs { outPath=./.; revCount=999999; shortRev="c784d3a"; rev="c784d3ab76e872dc28c1ea62137e9f95106e6c58"; } ./pkgs/t
Details
grahamcofborg-eval-lib-tests nix-build --arg pkgs import ./. {} ./lib/tests/release.nix
Details
grahamcofborg-eval-nixos nix-instantiate --arg nixpkgs { outPath=./.; revCount=999999; shortRev="c784d3a"; rev="c784d3ab76e872dc28c1ea62137e9f95106e6c58"; } ./nixos/
Details
grahamcofborg-eval-nixos-manual nix-instantiate --arg nixpkgs { outPath=./.; revCount=999999; shortRev="c784d3a"; rev="c784d3ab76e872dc28c1ea62137e9f95106e6c58"; } ./nixos/
Details
grahamcofborg-eval-nixos-options nix-instantiate --arg nixpkgs { outPath=./.; revCount=999999; shortRev="c784d3a"; rev="c784d3ab76e872dc28c1ea62137e9f95106e6c58"; } ./nixos/
Details
grahamcofborg-eval-nixpkgs-manual nix-instantiate --arg nixpkgs { outPath=./.; revCount=999999; shortRev="c784d3a"; rev="c784d3ab76e872dc28c1ea62137e9f95106e6c58"; } ./pkgs/t
Details
grahamcofborg-eval-nixpkgs-tarball nix-instantiate --arg nixpkgs { outPath=./.; revCount=999999; shortRev="c784d3a"; rev="c784d3ab76e872dc28c1ea62137e9f95106e6c58"; } ./pkgs/t
Details
grahamcofborg-eval-nixpkgs-unstable-jobset nix-instantiate --arg nixpkgs { outPath=./.; revCount=999999; shortRev="c784d3a"; rev="c784d3ab76e872dc28c1ea62137e9f95106e6c58"; } ./pkgs/t
Details
grahamcofborg-eval-package-list nix-env -qa --json --file .
Details
grahamcofborg-eval-package-list-no-aliases nix-env -qa --json --file . --arg config { allowAliases = false; }
Details
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked issues

Successfully merging this pull request may close these issues.

None yet

4 participants
You can’t perform that action at this time.