EC2 EBS instances use ephemeral disks for /nix unionfs #12613
Setting |
The root store size isn't taken into account at all on these instances. I |
You'll need a fairly recent version of NixOS for that attribute to work. I'm not sure whether the normal version currently shipped in |
In any case, though, please report this issue at https://github.com/NixOS/nixops/; this repository is not quite the right place for Nixops-related items. |
This isn't a nixops issue. |
As far as I know, this isn't an issue at all, i.e. see #12613 (comment). Am I missing something? |
Yes, you are missing that the default NixOS behavior for an Amazon image is to put /nix and /tmp on /disk0 just because it exists and is assumed to be larger than the root volume. I can submit the change I'm thinking of, which would check the block device size of the root and of the other attached disks and, if the root is larger than the other discovered disks, not use the unionfs. Would that help illustrate the point? |
I too think that this is an issue having just spent half an hour figuring out why my |
@teh As far as I can tell, it's been around since at least 2013, and that code predates the m3.medium instances (which, I think, are the only ones that have this tiny instance store disk attached), which were introduced in 2014. Since root partitions will auto-resize, I'm not sure that a unionfs for /nix and /tmp on another disk even matters. There's also the side thought that with this unionfs in action, the files in /disk0/nix/store aren't write-protected via the bind mounts, which is very different from a normal NixOS deploy. |
The default behavior with an m3.medium instance is to relocate /nix and /tmp to /disk0 because an assumption is made that any ephemeral disk is larger than the root volume. Rather than make that assumption, add a check to see if the disk is larger, and only then relocate /nix and /tmp. This addresses #12613
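A minimal sketch of the size check described above, written in Python for illustration only (the actual change would live in the initrd script). It assumes the device names xvda1 and xvdb mentioned in this issue and reads sizes from /sys/class/block/<dev>/size, which Linux reports in 512-byte sectors. The function names `device_size_bytes` and `should_relocate` are hypothetical and not taken from the patch.

```python
from pathlib import Path

# Linux reports /sys/class/block/<dev>/size in 512-byte sectors.
SECTOR_BYTES = 512


def device_size_bytes(dev: str, sys_block: Path = Path("/sys/class/block")) -> int:
    """Return the size of a block device in bytes, or 0 if it cannot be read."""
    size_file = sys_block / dev / "size"
    try:
        return int(size_file.read_text().strip()) * SECTOR_BYTES
    except (OSError, ValueError):
        # Missing device or unreadable/garbled size file: treat as absent.
        return 0


def should_relocate(root_bytes: int, ephemeral_bytes: int) -> bool:
    """Relocate /nix and /tmp only when the ephemeral disk is strictly larger."""
    return ephemeral_bytes > root_bytes


if __name__ == "__main__":
    # Device names are the ones discussed in this issue (an assumption here).
    root = device_size_bytes("xvda1")
    ephemeral = device_size_bytes("xvdb")
    print("relocate /nix and /tmp to /disk0:", should_relocate(root, ephemeral))
```

With this rule, a 4 GB ephemeral disk next to a larger root volume is left unused, and a missing xvdb (size 0) never triggers relocation.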
I think the unionfs was mostly intended for instance-store instances, where (IIRC) you can't resize the root volume so you have to use the ephemeral disks. So the real WTF here is that we're using ephemeral disks for the Nix store on EBS instances - meaning that when you stop/start the instance, the Nix store rolls back to its pristine state... |
Okay, so this is a consequence of the unification of EBS and S3 images we did just before the 15.09 release. Before that, EBS root disks had a marker |
Oh boy. I deployed an instance and then stopped/started, and yes, it's completely broken. |
I have a fix that I'll push shortly. |
This is a regression introduced by merging the EBS and S3 images. The EBS images had a special marker /.ebs to prevent the initrd from using ephemeral storage for the unionfs, but this marker was missing in the consolidated image. The fix is to check the file ami-manifest-path on the metadata server to see if we're an S3-based instance. This does require networking in the initrd. Issue #12613.
This is a regression introduced by merging the EBS and S3 images. The EBS images had a special marker /.ebs to prevent the initrd from using ephemeral storage for the unionfs, but this marker was missing in the consolidated image. The fix is to check the file ami-manifest-path on the metadata server to see if we're an S3-based instance. This does require networking in the initrd. Issue #12613. (cherry picked from commit 06731df)
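The metadata check described in the commit message could look roughly like the following Python sketch (the real fix is part of the initrd, not Python). It assumes the standard EC2 metadata address 169.254.169.254 and assumes that EBS-backed instances report the placeholder value "(unknown)" for ami-manifest-path; both the placeholder handling and the function names `fetch_manifest_path` and `is_s3_backed` are illustrative, not taken from commit 06731df.

```python
import urllib.request

# Standard EC2 instance metadata endpoint; the key name comes from the fix description.
METADATA_URL = "http://169.254.169.254/latest/meta-data/ami-manifest-path"


def fetch_manifest_path(url: str = METADATA_URL, timeout: float = 2.0) -> str:
    """Fetch ami-manifest-path from the metadata server (needs networking)."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8").strip()


def is_s3_backed(manifest_path: str) -> bool:
    """Decide whether the instance is S3-based (instance-store).

    Assumption: EBS-backed instances return "(unknown)" or an empty value,
    while S3-based instances return a real manifest path.
    """
    value = manifest_path.strip()
    return value not in ("", "(unknown)")
```

Only when `is_s3_backed(fetch_manifest_path())` is true would the ephemeral disks be used for the unionfs; on EBS instances the Nix store stays on the root volume and survives a stop/start.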
Hmm, I also hit the same problem as @cransom that my EC2 machine has become unbootable after upgrading it to 15.09 with exactly the same error that is shown in the system log... :( Is this fix already part of the 15.09 release? I tried building against the latest 15.09 branch but it didn't seem to work for me :( |
The fix is in 15.09, but the AMIs haven't been updated yet, which may explain why you hit it. The best solution for 15.09 is to use a machine that doesn't have local storage (e.g. c4.*). |
Sigh, that's not an option for me atm... luckily I was able to limit the "damage" though.... |
@edolstra Am I correct that the fixes should also go to the 16.03-beta and 16.03-beta-small channels? Currently 16.03-beta-small is unbootable on c3.large. EDIT: ah, it is already there. Though I still have error
|
We were testing some m3.medium hosts on AWS with nixops and found that some of the deploys failed to complete because they ran out of disk space. I tracked the space issue down to https://github.com/NixOS/nixpkgs/blame/master/nixos/modules/virtualisation/amazon-image.nix#L52-L53, which assumes that a disk attached at xvdb is always bigger than the root disk. On an m3.medium, that disk is only 4 GB, which is very small.
This only impacts one instance type, but it would be useful to compare the sizes of xvda1 and xvdb and mount /nix on the larger disk.
This is on the current set of 15.09 AMIs available on the marketplace and listed in the nixops ec2.nix file.