Configure NixOS from EC2 user-data #7370
Actually looks pretty good and simple. Without hooking into the uevent system I'm not sure if there's a better way than polling to tell when networking is available, and unfortunately we can't use systemd's management.
Why can't we use systemd's management? If it's about running nixos-rebuild from a service, I'm sure that can be accomplished.
echo "Success"
curl -s http://169.254.169.254/2011-01-01/user-data > /etc/nixos/amazon-init.nix
Why not write this to configuration.nix? That's more standard.
I wanted a simple place that would make it clear what came from user-data and what didn't. That way, configuration.nix comes pre-wired to import from amazon-init.nix but if you want to jump in and change it, you can tell it to stop importing amazon-init.nix, or use the existing file differently and so on. It also feels weird to auto-write to a location that people typically treat as human-managed. If someone were generating their own AMI, they wouldn't run the risk of getting their handmade configuration clobbered by my startup script.
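For readers following the review thread, the fetch step and the polling mentioned earlier could be combined roughly as below. This is a minimal sketch, not the script in the PR: the helper names wait_for and fetch_user_data, the retry count, and the delay are assumptions made for illustration. Only the metadata URL and the target path come from the curl line under review.

```shell
#!/bin/sh
# wait_for ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Hypothetical helper: re-run COMMAND until it succeeds or ATTEMPTS is used up.
wait_for() {
  attempts=$1
  delay=$2
  shift 2
  i=0
  while [ "$i" -lt "$attempts" ]; do
    if "$@"; then
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  return 1
}

# fetch_user_data TARGET
# Hypothetical helper: download the EC2 user-data to TARGET.
# -f makes curl fail on HTTP errors, so a half-started network counts as a miss.
# The download goes to a temporary file first so a failed attempt does not
# leave an empty or partial TARGET behind.
fetch_user_data() {
  curl -sf --max-time 5 -o "$1.tmp" \
    http://169.254.169.254/2011-01-01/user-data && mv "$1.tmp" "$1"
}

# Example (not executed here): poll once a second for up to 30 seconds.
#   wait_for 30 1 fetch_user_data /etc/nixos/amazon-init.nix
```

Writing to a separate amazon-init.nix, as argued above, means a failed or skipped fetch never touches a hand-edited configuration.nix.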
Yeah, I think it would be good to make this co-exist with the user data format used by NixOps (e.g., encode the It could be something like:
@edolstra I pretty much documented all my attempts at using systemd on the other ticket and failed. If someone can get it working, that'd be nice, but @rbvermaa also said it felt wrong to have a systemd service doing this and said he preferred the current way. About the format, I'm not super-opposed to that, but it feels like a historical accident would be making this less usable. Now I can't just go to the console and look at my NixOS configuration because it's base64-encoded because NixOps (which I don't even use) can't deal with it otherwise? It just seems like such a pity that we have this super-nice configuration format (nix itself) and are dropping it in favor of a hard-to-use simple key-value mapping that forces us to preprocess the input and not be able to read it later. This feels like a "let's rip off the bandaid and deal with a little bit of transition pain" kind of situation to me. For example, the configuration I included in the original ticket would not be human-readable right now if I followed that scheme. If you insist on the NixOps format, how should I represent the channels?
@edolstra what if I devise some sort of real transition plan that would allow NixOps to continue to work unchanged for now, but also simultaneously supported the format I wanted (I haven't yet decided how to achieve that). Then there would be no real transition pain, but NixOps would switch over at some point to my proposed format without breaking anyone.
Yeah, my only concern is that we can have a single AMI that supports both NixOps and configuration from user data. Note that (It would probably be good to move
How about this:
@edolstra I think that makes sense, thanks! Will update the PR in the next couple of days and report back if I run into any issues 😄 Did my reasoning for using a separate
I have no strong feelings about
@copumpkin Are you still working on this? It sounds pretty slick. 😄
Yep, just haven't had a chance to finish it off yet!
@copumpkin I'm working on this right now.
A couple questions: When rebooting and/or rerunning the fetch-ec2-data service, what should the behavior be?
Here's a thought: we could ensure the whole thing happens only once by putting a file in the AMI (e.g.
It already does only happen once! When it reconfigures itself from userdata, it removes the reconfiguration script from the current configuration (unless you explicitly add it back, of course).
Re: clobbering /root/.nix-channels, I figured it would be fine because at that point no user will have ever touched the system. Once they touch the system, it won't get clobbered again due to the behavior I just mentioned.
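For comparison, the marker-file idea floated above could look something like the following. This is only a sketch of that alternative, not what the PR does: as explained in the reply, the PR gets its once-only behaviour from the rebuilt configuration no longer containing the script. The helper name run_once and the marker path in the example are hypothetical.

```shell
#!/bin/sh
# run_once MARKER COMMAND [ARGS...]
# Hypothetical helper: run COMMAND only if MARKER does not exist yet, and
# create MARKER only after COMMAND succeeds, so a failed first boot retries.
run_once() {
  marker=$1
  shift
  if [ -e "$marker" ]; then
    return 0
  fi
  "$@" && : > "$marker"
}

# Example (not executed here; the marker path is an assumption):
#   run_once /var/lib/amazon-init.done nixos-rebuild switch
```

The trade-off is that a marker file is extra state that can drift from the configuration, whereas the PR's approach keeps the decision inside the NixOS configuration itself.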
@copumpkin Ah, that makes sense - you could just overwrite the I see where you're copying over the Here's what I have: master...cstrahan:nixos-userdata
The point is that the So instead, I build the image from an "external" configuration.nix that specifies that I want the machine to reconfigure itself on startup. Then after it does that, it reconfigures itself to not reconfigure itself anymore (because presumably the user doesn't ask for it to keep reconfiguring itself, although that's certainly a possibility too). It's a little subtle and there's a phasing thing going on that makes it tricky to think about, I think.
Gotcha, just clicked a moment ago :).
@copumpkin Ok, I think my branch should work - I'll give it a test tomorrow.
@copumpkin How do you build an AMI?
@cstrahan there's a script somewhere in the nixpkgs source tree, but I've been using https://gist.github.com/copumpkin/6df9c50630ed5fc5abb5 with a self-signed cert. There's a few different ways to build an AMI but I'm using a fairly simple S3-based instance store one.
@copumpkin Thanks! I don't know what else needs to change in our maintainer scripts and such to support building EBS-backed AMIs (which must be created from an existing instance), so I'll have to leave that to someone else. /cc @edolstra
There are already scripts in nixpkgs to automate the creation of EBS-backed AMIs as well, near the one I mentioned earlier. All the stuff is in here.
So what's the status on this? Is there anything left to do? I'd like to have three VM tests for it (to make sure NixOps-style userdata works, to make sure new-style userdata works, and that no userdata works), but would also like to get it into the 15.06 release. @domenkozar what's the cutoff date for getting things in there?
8f65a2d to bb25461
This is almost ready to be used. It depends on #8204 and #8013 (which also depends on #8204). I now have two VM tests, built on top of the 169.254.169.254 user-data simulation functionality I put together in #8013. One tests that we still work properly with the NixOps user-data format, and the other ensures that we process the new-style configuration properly. The main remaining thing I'd like to do is not force the user-data to contain a channel marker. To do that, I need to pre-populate the channel from the "host" building the image, so that it matches the actual content of the image. I expect not specifying a channel to be the common case, because it means minimal downloads to rebuild the system. Not having this means that one of the two tests is fairly slow, since it has to download a ton of NixOS stuff from the internet to reconfigure itself. There's still a lot of stuff I'd clean up but we're getting close. Looking forward to feedback from anyone interested.
…rahan). Now with VM tests!
9f49f46 to 4b758e3
Since I have a VM test in place that makes sure that the NixOps-style userdata still works, I'm going to go ahead and merge this, treating the "configure from userdata" portion as a "beta". Beta in the sense that I'm still not sure what the best way to do it is, but I want to iterate on it a bit and it'll be easier to do once I have something concrete that's building and multiple people can experiment with.
Configure NixOS from EC2 user-data (beta)
@copumpkin - it looks like as of 15.09, this is available in <nixpkgs/nixos/modules/virtualisation/amazon-init.nix>, but the default AMIs don't have that in their stock configuration.nix. Am I reading correctly?
Any word on when we might get a 15.xx AMI with this fix? Or alternatively, is it safe to just copy the commit results to the module and then perform a rebuild to initialize the fix?
I'll probably finish fixing the VM test for it this weekend and then I
Thanks, @copumpkin. I haven't even had a chance to look at the Hydra stuff yet. But plan to soon.
I wonder if we have network connectivity in postBoot. I got this in my log (in reverse order) and my user-data is not written to /etc/nix/configuration.nix
I've observed the behavior @aycanirican describes, in 16.03. Would the script fit better in
Yes, it's supposed to have network at that stage. In fact we even access the network in stage 1.
Is it possible that the script needs something which gets set in
[See #6662 for more discussion of how I got here and why I'm doing it this way]
This is my initial attempt at getting a working configure-from-user-data NixOS image working. The basic idea is to create an "unstable" NixOS image: its /etc/nixos/configuration.nix doesn't actually specify the way the machine is configured, but rather assumes that an /etc/nixos/amazon-init.nix exists, which is not bundled inside the image. Instead of bundling amazon-init.nix, the image bundles a postBootCommands script that downloads the EC2 user-data and writes it into /etc/nixos/amazon-init.nix.
The "unstable" NixOS image thus only configures itself from user-data on first boot, since when it calls nixos-rebuild switch on your personalized configuration, the custom postBootCommands will go away.
User-data should look something like:
One thing to note is the ### section at the top, which specifies the channels (I just strip off the ### and direct into ~/.nix-channels).
If you trust me and want to try it out, I'm hosting a public AMI (until I get sick of paying for storage) of the above with ID ami-1c477874.
Questions for anyone still reading:
postBootCommands with periodic timed check feels kind of hacky, but I also can't use a systemd service and activation scripts didn't work either.
### convention), and is very clearly just nixos configuration. Unfortunately it's not compatible with the existing format nixops uses for its temporary host keys, but I'd rather make it use a clean nix configuration file than force this into that format and have to deal with escaping and other ugliness. I haven't used NixOps much so perhaps it'll be too painful to transition it, but if possible I'd like the user-data format to be clean.
Note that it's still a WIP, so I'll obviously take out the useless echo calls and such before merging 😄
cc @edolstra @shlevy @rbvermaa
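The ### channel convention described above (strip the marker, write the rest to ~/.nix-channels) could be sketched as follows. This is an illustration under assumptions rather than the code in the PR: the function name extract_channels is hypothetical, and dropping the whitespace after ### is a guess at the intended behaviour. Because lines starting with # are comments in Nix, the same user-data file can still be written to amazon-init.nix unchanged.

```shell
#!/bin/sh
# extract_channels USER_DATA_FILE CHANNELS_FILE
# Hypothetical helper: copy every line that starts with "###" into
# CHANNELS_FILE, with the marker and any following whitespace removed.
# Lines without the marker (the Nix configuration itself) are skipped.
extract_channels() {
  sed -n 's/^###[[:space:]]*//p' "$1" > "$2"
}

# Example (not executed here); each resulting line has the usual
# "URL NAME" shape of a .nix-channels entry:
#   extract_channels /etc/nixos/amazon-init.nix /root/.nix-channels
```

If the user-data contains no ### lines, this writes an empty channels file, so a caller that wants to keep the channel baked into the image would need to check for that case first.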