EC2 EBS instances use ephemeral disks for /nix unionfs #12613

cransom · 2016-01-25T21:19:09Z

We were testing some m3.medium hosts on AWS with nixops and found that some of the deploys failed to complete because they ran out of disk space. I tracked the space issue down to https://github.com/NixOS/nixpkgs/blame/master/nixos/modules/virtualisation/amazon-image.nix#L52-L53, which assumes that an attached disk at xvdb is always bigger than the root disk. On an m3.medium, that disk is only 4g, which is super small.

This only impacts one instance type, but it would be useful to compare the sizes of xvda1 and xvdb and mount /nix on the larger disk.

This is on the current set of 15.09 AMIs available on the marketplace and detailed in the nixops ec2.nix file.

peti · 2016-01-26T08:02:25Z

Setting deployment.ec2.ebsInitialRootDiskSize = 100; etc. might help.

cransom · 2016-01-26T18:43:10Z

The root store size isn't taken into account at all on these instances. I upped the root store to 100g and the size increases, but /nix is still placed on xvdb, a 4g volume.

peti · 2016-01-26T23:04:48Z

You'll need a fairly recent version of NixOS for that attribute to work. I'm not sure whether the normal version currently shipped in release-15.09 can do it already. I'm pretty sure that nixopsUnstable from the unstable channel a.k.a. master branch can do the necessary resize.

peti · 2016-01-26T23:05:44Z

In any case, though, please report this issue at https://github.com/NixOS/nixops/; this repository is not quite the right place for Nixops-related items.

cransom · 2016-01-26T23:41:53Z

This isn't a nixops issue.

peti · 2016-01-27T10:27:42Z

As far as I know, this isn't an issue at all, i.e. see #12613 (comment). Am I missing something?

cransom · 2016-01-27T16:27:53Z

Yes, you are missing that the default NixOS behavior for an Amazon image could be putting /nix and /tmp on /disk0, just because it exists and is assumed to be larger than the root volume.
On a fresh instance:
[root@ip-10-32-25-60:/disk0/root/nix/store]# df -h .
Filesystem Size Used Avail Use% Mounted on
/dev/xvdb 3.9G 994M 2.7G 27% /disk0
[root@ip-10-32-25-60:/disk0/root/nix/store]# df -h /
Filesystem Size Used Avail Use% Mounted on
/dev/disk/by-label/nixos 20G 955M 18G 6% /

I can submit the change I'm thinking of, which would check the block device size of the root and subsequent disks and, if the root is larger than the other discovered attached disks, not use the unionfs. Would that help illustrate the point?
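
For illustration, the size check proposed here could be sketched as follows. This is a rough Python sketch, not the shell code that runs in the initrd and not the change that was eventually submitted; the function names `block_device_bytes` and `should_relocate_nix` are hypothetical. It assumes a Linux host, where `/sys/class/block/<name>/size` reports a device's size in 512-byte sectors.

```python
from pathlib import Path
from typing import Iterable

GIB = 1024 ** 3


def block_device_bytes(name: str) -> int:
    """Size of a block device such as 'xvdb', read from Linux sysfs.

    /sys/class/block/<name>/size is expressed in 512-byte sectors.
    """
    sectors = int(Path(f"/sys/class/block/{name}/size").read_text().strip())
    return sectors * 512


def should_relocate_nix(root_bytes: int, ephemeral_bytes: Iterable[int]) -> bool:
    """Hypothetical policy: relocate /nix and /tmp only if some ephemeral
    disk is strictly larger than the root volume."""
    return any(size > root_bytes for size in ephemeral_bytes)


if __name__ == "__main__":
    # Roughly the sizes from the df output above: 20G root, 4G /dev/xvdb.
    print(should_relocate_nix(20 * GIB, [4 * GIB]))  # prints False
```

With no ephemeral disks attached the function also returns False, so the root volume stays the default location.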

teh · 2016-02-01T17:18:04Z

I too think that this is an issue having just spent half an hour figuring out why my /tmp is full. I'm trying to figure out when this changed because I deployed servers not too long ago that have /nix on the root disk, not on the 4G /disk0 volume.

cransom · 2016-02-01T19:49:17Z

@teh As far as I can tell, it's been around since at least 2013, and that code predates the m3.medium instances (which, I think, are the only ones that have this tiny instance store disk attached), which were introduced in 2014.

Since root partitions will auto-resize, I'm not sure that a unionfs for /nix and /tmp on another disk even matters. There's also the side thought that with this unionfs in action, the files in /disk0/nix/store aren't write-protected via the bind mounts, which is very different from a normal NixOS deploy.

The default behavior with an m3.medium instance is to relocate /nix and /tmp to /disk0 because an assumption is made that any ephemeral disk is larger than the root volume. Rather than make that assumption, add a check to see if the disk is larger, and only then relocate /nix and /tmp. This addresses #12613

edolstra · 2016-02-02T11:05:33Z

I think the unionfs was mostly intended for instance-store instances, where (IIRC) you can't resize the root volume so you have to use the ephemeral disks. So the real WTF here is that we're using ephemeral disks for the Nix store on EBS instances - meaning that when you stop/start the instance, the Nix store rolls back to its pristine state...

edolstra · 2016-02-02T11:08:49Z

Okay, so this is a consequence of the unification of EBS and S3 images we did just before the 15.09 release. Before that, EBS root disks had a marker /.ebs, which the initrd uses to decide whether to do the unionfs thing. And now that's missing.

cransom · 2016-02-02T17:41:18Z

Oh boy. I deployed an instance and then stopped/started, and yes, it's completely broken.
mounting /dev/disk/by-label/nixos on /...
mounting /dev/xvdb on /disk0...
checking /dev/xvdb...
fsck (busybox 1.23.2, )
[fsck.ext3 (1) -- /mnt-root/disk0] fsck.ext3 -a /dev/xvdb
/dev/xvdb: recovering journal
/dev/xvdb: clean, 11/1966080 files, 167442/7862400 blocks
mounting /dev/xvdb on /disk0...
switch_root: can't execute '/nix/store/r0z1hccslxxv6d58brrhs7yz67mkkh0z-nixos-15.09.706.45128de/init': No such file or directory
[ 0.526609] Kernel panic - not syncing: Attempted to kill init! exitcode=0x00000100

edolstra · 2016-02-02T17:46:15Z

I have a fix that I'll push shortly.

This is a regression introduced by merging the EBS and S3 images. The EBS images had a special marker /.ebs to prevent the initrd from using ephemeral storage for the unionfs, but this marker was missing in the consolidated image. The fix is to check the file ami-manifest-path on the metadata server to see if we're an S3-based instance. This does require networking in the initrd. Issue #12613.

This is a regression introduced by merging the EBS and S3 images. The EBS images had a special marker /.ebs to prevent the initrd from using ephemeral storage for the unionfs, but this marker was missing in the consolidated image. The fix is to check the file ami-manifest-path on the metadata server to see if we're an S3-based instance. This does require networking in the initrd. Issue #12613. (cherry picked from commit 06731df)
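
As a rough illustration of the check described in that commit message, here is a Python sketch (the real fix lives in the initrd's shell script, which is not reproduced here). The function names are hypothetical. It assumes the standard EC2 metadata address `169.254.169.254` and assumes that EBS-backed instances report the literal string `(unknown)` for `ami-manifest-path`, while S3-backed (instance-store) instances report the path of their image manifest.

```python
import urllib.request

# Assumption: the standard EC2 instance metadata endpoint.
METADATA_URL = "http://169.254.169.254/latest/meta-data/ami-manifest-path"


def fetch_ami_manifest_path(timeout: float = 2.0) -> str:
    """Fetch ami-manifest-path with a plain GET.

    Only works from inside an EC2 instance; instances that enforce
    IMDSv2 additionally require a session token, which is omitted here.
    """
    with urllib.request.urlopen(METADATA_URL, timeout=timeout) as resp:
        return resp.read().decode("utf-8").strip()


def is_s3_backed(manifest_path: str) -> bool:
    """Assumption: EBS-backed instances report '(unknown)'; anything else
    is treated as an S3 manifest path, i.e. an instance-store image."""
    return manifest_path.strip() != "(unknown)"


def use_ephemeral_unionfs(manifest_path: str) -> bool:
    """Only S3-backed instances should put /nix and /tmp on ephemeral disks."""
    return is_s3_backed(manifest_path)
```

Because the decision depends on an HTTP request, this is also why the fix needs networking in the initrd.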

svanderburg · 2016-03-09T20:18:42Z

Hmm, I also hit the same problem as @cransom that my EC2 machine has become unbootable after upgrading it to 15.09 with exactly the same error that is shown in the system log... :(

Is this fix already part of the 15.09 release? I tried building against the latest 15.09 branch but it didn't seem to work for me :(

teh · 2016-03-09T20:50:58Z

The fix is in 15.09, but the AMIs haven't been updated yet, which may explain why you hit it. The best solution for 15.09 is to use a machine that doesn't have local storage (e.g. c4.*).

svanderburg · 2016-03-09T20:57:57Z

Sigh, that's not an option for me atm... luckily I was able to limit the "damage" though....

danbst · 2016-03-14T12:01:01Z

@edolstra Am I correct that the fixes should also go to the 16.03-beta and 16.03-beta-small channels? Currently 16.03-beta-small is unbootable on c3.large.

EDIT: ah, it is already there. Though I still get an error:

<<< NixOS Stage 1 >>>

loading module xen-blkfront...
loading module xen-netfront...
loading module af_packet...
loading module fuse...
loading module dm_mod...
running udev...
starting version 217
starting device mapper and LVM...
NOCHANGE: partition 1 is size 41940992. it cannot be grown
checking /dev/disk/by-label/nixos...
fsck (busybox 1.23.2, )
[fsck.ext4 (1) -- /mnt-root/] fsck.ext4 -a /dev/disk/by-label/nixos
nixos: clean, 65310/1310720 files, 408998/5242624 blocks
resizing /dev/disk/by-label/nixos...
resize2fs 1.42.13 (17-May-2015)
Please run 'e2fsck -f /dev/disk/by-label/nixos' first.

mounting /dev/disk/by-label/nixos on /...
mkdir: can't create directory '/mnt-root/etc': File exists
getting EC2 instance metadata...
udhcpc (v1.23.2) started
Sending discover...
Sending select for 10.0.0.39...
Lease of 10.0.0.39 obtained, lease time 3600
mounting /dev/xvdb on /disk0...
checking /dev/xvdb...
fsck (busybox 1.23.2, )
[fsck.ext3 (1) -- /mnt-root/disk0] fsck.ext3 -a /dev/xvdb
/dev/xvdb: clean, 106989/1001712 files, 672351/3999104 blocks
mounting /dev/xvdb on /disk0...
mounting /dev/xvdc on /disk1...
checking /dev/xvdc...
fsck (busybox 1.23.2, )
[fsck.ext3 (1) -- /mnt-root/disk1] fsck.ext3 -a /dev/xvdc
/dev/xvdc: clean, 11/1001712 files, 105440/3999104 blocks
mounting /dev/xvdc on /disk1...
switch_root: can't execute '/nix/store/kic9jaabjbxgn3zky1xk70jci2zs9yld-nixos-15.09.git.f6d1666/init': No such file or directory
[    0.962858] Kernel panic - not syncing: Attempted to kill init! exitcode=0x00000100
[    0.962858] 
[    0.963639] CPU: 1 PID: 1 Comm: switch_root Not tainted 3.18.26 #1-NixOS
[    0.963639] Hardware name: Xen HVM domU, BIOS 4.2.amazon 12/07/2015
[    0.963639]  ffff8800e8d38000 ffff8800e9de3e28 ffffffff814bc251 ffffffff818541b8
[    0.963639]  ffffffff81713d68 ffff8800e9de3ea8 ffffffff814bb43e 0000000000000001
[    0.963639]  0000000000000010 ffff8800e9de3eb8 ffff8800e9de3e58 ffffffff8185dd80
[    0.963639] Call Trace:
[    0.963639]  [<ffffffff814bc251>] dump_stack+0x46/0x58
[    0.963639]  [<ffffffff814bb43e>] panic+0xc1/0x1eb
[    0.963639]  [<ffffffff8106a045>] do_exit+0xac5/0xad0
[    0.963639]  [<ffffffff811aed3c>] ? vfs_write+0x15c/0x1f0
[    0.963639]  [<ffffffff8106a0e5>] do_group_exit+0x45/0xb0
[    0.963639]  [<ffffffff8106a164>] SyS_exit_group+0x14/0x20
[    0.963639]  [<ffffffff814c1fc9>] system_call_fastpath+0x12/0x17
[    0.963639] Kernel Offset: 0x0 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffff9fffffff)

Fixes NixOS/nixpkgs#12613.

This is a regression introduced by merging the EBS and S3 images. The EBS images had a special marker /.ebs to prevent the initrd from using ephemeral storage for the unionfs, but this marker was missing in the consolidated image. The fix is to check the file ami-manifest-path on the metadata server to see if we're an S3-based instance. This does require networking in the initrd. Issue NixOS#12613. (cherry picked from commit 06731df)

peti closed this as completed Jan 26, 2016

cransom mentioned this issue Feb 2, 2016

Do not relocate /nix and /tmp to small disks on AWS #12761

Merged

edolstra changed the title ~~nix store has very little space available on aws/m3.medium instance~~ EC2 EBS instances use ephemeral disks for /nix unionfs Feb 2, 2016

edolstra reopened this Feb 2, 2016

edolstra added 0.kind: bug 6.topic: nixos 2.status: work-in-progress 1.severity: blocker labels Feb 2, 2016

edolstra self-assigned this Feb 2, 2016

edolstra mentioned this issue Mar 9, 2016

Official AWS 15.09 AMIs need to be updated with user-data fix #13628

Closed

edolstra closed this as completed in NixOS/nixops@4c2e6ee Mar 15, 2016

johnalotoski pushed a commit to input-output-hk/nixops-aws that referenced this issue Jul 12, 2019

Update EC2 AMIs to 15.09.1134.19a3dd8

b903cbc

Fixes NixOS/nixpkgs#12613.

johnalotoski pushed a commit to input-output-hk/nixops-hetzner that referenced this issue Jul 13, 2019

Update EC2 AMIs to 15.09.1134.19a3dd8

32b805e

Fixes NixOS/nixpkgs#12613.

AmineChikhaoui pushed a commit to nix-community/nixops-libvirtd that referenced this issue Jul 21, 2019

Update EC2 AMIs to 15.09.1134.19a3dd8

7f5f15f

Fixes NixOS/nixpkgs#12613.

AmineChikhaoui pushed a commit to nix-community/nixops-libvirtd that referenced this issue Jul 21, 2019

Update EC2 AMIs to 15.09.1134.19a3dd8

ce99cdc

Fixes NixOS/nixpkgs#12613.

AmineChikhaoui pushed a commit to nix-community/nixops-vbox that referenced this issue Jul 22, 2019

Update EC2 AMIs to 15.09.1134.19a3dd8

2c2e66b

Fixes NixOS/nixpkgs#12613.

AmineChikhaoui pushed a commit to nix-community/nixops-gce that referenced this issue Jul 24, 2019

Update EC2 AMIs to 15.09.1134.19a3dd8

b1027a5

Fixes NixOS/nixpkgs#12613.

AmineChikhaoui pushed a commit to nix-community/nixops-datadog that referenced this issue Jul 24, 2019

Update EC2 AMIs to 15.09.1134.19a3dd8

ec4d7c8

Fixes NixOS/nixpkgs#12613.

PsyanticY pushed a commit to PsyanticY/nixops-container that referenced this issue Jul 25, 2019

Update EC2 AMIs to 15.09.1134.19a3dd8

19318e7

Fixes NixOS/nixpkgs#12613.

johnalotoski pushed a commit to input-output-hk/nixops-hetzner that referenced this issue Aug 7, 2019

Update EC2 AMIs to 15.09.1134.19a3dd8

f4559db

Fixes NixOS/nixpkgs#12613.

johnalotoski pushed a commit to input-output-hk/nixops-aws that referenced this issue Aug 7, 2019

Update EC2 AMIs to 15.09.1134.19a3dd8

5bc13bc

Fixes NixOS/nixpkgs#12613.

AmineChikhaoui pushed a commit to nix-community/nixops-libvirtd that referenced this issue Sep 4, 2019

Update EC2 AMIs to 15.09.1134.19a3dd8

405eefe

Fixes NixOS/nixpkgs#12613.

AmineChikhaoui pushed a commit to nix-community/nixops-vbox that referenced this issue Sep 4, 2019

Update EC2 AMIs to 15.09.1134.19a3dd8

9ff5f96

Fixes NixOS/nixpkgs#12613.

AmineChikhaoui pushed a commit to nix-community/nixops-datadog that referenced this issue Sep 4, 2019

Update EC2 AMIs to 15.09.1134.19a3dd8

2117a15

Fixes NixOS/nixpkgs#12613.
