Allow redeploying to interactive tests #281331

Open
wants to merge 1 commit into base: master

Radvendii
Contributor

Description of changes

  • Adds a redeploy command to the output of driverInteractive, also available as its own output redeploy, which deploys the current test machine configuration to an actively running driverInteractive.
  • Adds an extra machine to driverInteractive that serves as a jump host, so that the user can ssh into machines in the test (including to do the redeploy).
  • Adds an ssh config to the output of driverInteractive, also available as its own output sshConfig, to take advantage of that jump host. It can be used with ssh -F ./result/ssh_config
  • Document the changes (not yet done; waiting for feedback).
  • Automatically start the jump host. Currently this setup requires the manual step of running redeploy_jumphost.start() after starting the interactive driver. (I couldn't figure out how to automate this. Does anyone have ideas?)
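
As a rough illustration of the jump-host setup described in the list above, the following shell sketch writes an ssh config in which every test machine is reached through redeploy_jumphost. The machine names (machine1, machine2), the forwarded port 2222, the root user, and the output path are all hypothetical; the config actually generated by this PR may differ.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical values, for illustration only.
machines=(machine1 machine2)
jump_port=2222
out="${TMPDIR:-/tmp}/ssh_config_sketch"

{
  # Assumption: the jump host is reachable on a port forwarded to localhost.
  printf 'Host redeploy_jumphost\n'
  printf '  HostName localhost\n'
  printf '  Port %s\n' "$jump_port"
  printf '  User root\n'
  printf '\n'
  # Every test machine is reached through the jump host.
  printf 'Host %s\n' "${machines[*]}"
  printf '  User root\n'
  printf '  ProxyJump redeploy_jumphost\n'
} > "$out"

cat "$out"
```

With a file like this, ssh -F pointed at it and followed by a machine name would connect through the jump host, in the same way as ssh -F ./result/ssh_config.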

Things done

  • Built on platform(s)
    • x86_64-linux
    • aarch64-linux
    • x86_64-darwin
    • aarch64-darwin
  • For non-Linux: Is sandboxing enabled in nix.conf? (See Nix manual)
    • sandbox = relaxed
    • sandbox = true
  • Tested, as applicable:
  • Tested compilation of all packages that depend on this change using nix-shell -p nixpkgs-review --run "nixpkgs-review rev HEAD". Note: all changes have to be committed, also see nixpkgs-review usage
  • Tested basic functionality of all binary files (usually in ./result/bin/)
  • 24.05 Release Notes (or backporting 23.05 and 23.11 Release notes)
    • (Package updates) Added a release notes entry if the change is major or breaking
    • (Module updates) Added a release notes entry if the change is significant
    • (Module addition) Added a release notes entry if adding a new NixOS module
  • Fits CONTRIBUTING.md.

Add a 👍 reaction to pull requests you find important.

@Radvendii
Contributor Author

I opened the issue (#281332) for discussion of alternate solutions and whether this is an issue worth solving at all.

@Radvendii
Contributor Author

Radvendii commented Feb 13, 2024

Tagging some people who have touched the testing infrastructure recently and might therefore be qualified to comment or advise:

@roberth @K900 @tfc

Comment on lines +55 to +69
# taken from nixos-rebuild (we only want to do the activate part)
cmd=(
"systemd-run"
"-E" "LOCALE_ARCHIVE"
"--collect"
"--no-ask-password"
"--pty"
"--quiet"
"--same-dir"
"--service-type=exec"
"--unit=nixos-rebuild-switch-to-configuration"
"--wait"
"''${configPaths[$machine]}/bin/switch-to-configuration"
"test"
)
Member

  • Why test instead of switch? Shouldn't a reboot of the VM boot the new config?
  • This systemd-run invocation should not be duplicated; it should be factored out. I've proposed ${toplevel}/bin/apply in #266290 (NixOS apply script [refactor + feature]).
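
One way the systemd-run invocation could be factored out is a small shell function that takes the toplevel path and the action as arguments. This is only a sketch: the function name activation_cmd and the example store path are hypothetical, the flags are copied from the snippet under review, and it is not the bin/apply interface proposed in #266290.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Build the activation command for a given system toplevel and action.
# Usage: activation_cmd <toplevel> [action]   (action defaults to "test")
# The result is stored in the global array `cmd`.
activation_cmd() {
  local toplevel=$1
  local action=${2:-test}
  cmd=(
    systemd-run
    -E LOCALE_ARCHIVE
    --collect
    --no-ask-password
    --pty
    --quiet
    --same-dir
    --service-type=exec
    --unit=nixos-rebuild-switch-to-configuration
    --wait
    "$toplevel/bin/switch-to-configuration"
    "$action"
  )
}

# Hypothetical store path; print the command instead of running it.
activation_cmd /nix/store/example-nixos-system
printf '%q ' "${cmd[@]}"
printf '\n'
```

Callers would then run "${cmd[@]}" (locally or over ssh) instead of repeating the flag list.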

Contributor Author

  • I used test rather than switch because the VMs don't have bootloaders, so switch will fail.
  • Are you proposing that we hold off on this until #266290 (NixOS apply script [refactor + feature]) is merged, that I should work on it myself, or that we should factor it out some other way in this PR in the meantime? I'm happy with any of them (except holding off indefinitely if it's not likely to be merged soon).

ProxyJump redeploy_jumphost
'';

redeploy = hostPkgs.writeShellScriptBin "redeploy" ''
Member

I think I'd prefer for this to be implemented in the (python) driver instead, but I'll defer to @tfc.

Contributor Author

I'd love it if redeploy could be put in the python driver. It could be more powerful in that case, since it could also deploy new machines and power off ones that have been decommissioned.

But I'm not sure how that's possible. The python driver only has access to the configuration that existed at the time it was spun up. The whole point is that we then reconfigure the machines, and while the old driver is still running, redeploy the new configurations.
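
To make the constraint concrete: the driver is started from one build, while redeploy comes from a later build and so carries the new system paths itself. A minimal dry-run sketch of that idea follows, assuming the new paths are baked into an associative array like the configPaths used in this PR. The machine names and store paths are hypothetical, and a real script would run the activation command over ssh rather than print it.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical mapping from machine name to its newly built system toplevel.
# Assumption: this would be filled in at build time; the paths are placeholders.
declare -A configPaths=(
  [machine1]=/nix/store/example-machine1-system
  [machine2]=/nix/store/example-machine2-system
)

# Collect (rather than run) the command that would activate each new config.
plan=()
for machine in "${!configPaths[@]}"; do
  plan+=("ssh -F ./result/ssh_config $machine ${configPaths[$machine]}/bin/switch-to-configuration test")
done

# Sort for stable output, since associative array order is unspecified.
printf '%s\n' "${plan[@]}" | sort
```

Because the mapping lives in the script and not in the running driver, rebuilding the script is enough to pick up a changed configuration.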

@roberth
Member

roberth commented Feb 14, 2024

Question: The nodes are observable by the rest of the test, which will probably lead to heisenbugs. Assuming there isn't a command that just forwards socket connections to an address and port on a VDE network, perhaps you could use a different option for this extra node config; not nodes.<something>?

@tfc
Contributor

tfc commented Feb 15, 2024

It feels like this should neither increase the complexity of the python code nor of the standard test module code.
Can we maybe have this similar to profiles in the NixOS modules? So that it is not included by default and people who don't use it don't have to regard it?

The reason for my thinking is that this is quite an opinionated way to do it (i.e. not switching on a few flags but even adding another host that ought to be used as jumphost etc.) and might not automatically play well with any test.

@Radvendii
Contributor Author

Points well taken. To summarize:

  1. This is a disruptive way of doing this, such that it shouldn't be on by default.

  2. Mostly this is because it adds an extra machine

I will work on finding a way around (2). But if there isn't a clean way to do it in the end, maybe the solution is more to put this in NUR (or a flake), and in Nixpkgs the only change necessary would be to expose the underlying module system structure of the tests, so that it can be extended outside of Nixpkgs.

@nixos-discourse

This pull request has been mentioned on NixOS Discourse. There might be relevant details there:

https://discourse.nixos.org/t/tweag-nix-dev-update-55/40996/1

@RaitoBezarius
Member

cc @nikstur as folks working on the NixOS tests ecosystem have not been paged.

@K900
Contributor

K900 commented Mar 8, 2024

Hard disagree. If we have slow iteration times, we should be removing complexity, not adding more complexity to bypass original complexity.

@Radvendii
Contributor Author

It's not complexity that leads to the slow iteration time; it's the time needed to restart the VM and spin up all the services again. For some tests this is quick, but for others it can be slow, and there's not much we can do to change that if the test needs services that are slow to start.

There's also the issue of: if I am running the interactive driver, and making stateful changes to test things out, and then I want to make a change to the configuration and test that out, I now have to redo all of the stateful changes I made. It's a question of workflow getting disrupted.

@Radvendii
Contributor Author

There's another approach demonstrated here: https://github.com/tweag/nix-hour/blob/master/templates/vm/configuration.nix#L164-L167

It mounts the relevant directory into the VM so that you can run some version of nixos-rebuild from inside the VM against the latest version of the source code.

This only works, though, because the directory structure is highly constrained. NixOS tests, and really any NixOS VM, are Nix derivations that can in theory come from any Nix code, so you can't know how to extract the derivation from the directory, or even which directory to mount. There could even be references to arbitrary other parts of the file system.

I don't see a way to solve that problem...


7 participants