This repository has been archived by the owner on Oct 16, 2020. It is now read-only.

[AWS EBS] NVMe udev rename rules #2399

Closed
jalaziz opened this issue Apr 9, 2018 · 16 comments

Comments

@jalaziz

jalaziz commented Apr 9, 2018

Issue Report

Feature Request

Environment

AWS CoreOS 1688.5.3 HVM on m5.* or c5.* instances

Desired Feature

Add udev symlink rules to map NVMe devices to traditional xvd[a-z] device names.

Other Information

With the newer m5 and c5 instances, EBS volumes show up as NVMe devices. Amazon Linux provides built-in udev rules that symlink the NVMe devices to their equivalent /dev/sd[a-z] names. This keeps things consistent with the older naming scheme and matches what is configured in the EBS block device mappings provided when launching the instance.

It would be great if CoreOS could provide similar rules. This would allow systemd mounts to work across all EC2 instance types without special hacks or reliance on fixed device names. Amazon Linux handles this with the help of a Python script named ebsnvme-id that reads EBS information from the NVMe device. I realize Python is not installed on CoreOS, but the script could be rewritten to provide the basic functionality needed for udev renaming.
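Roughly, the kind of rule I have in mind looks like this (a sketch only; "ebs-nvme-name" is a placeholder for ebsnvme-id or a rewritten equivalent, and I haven't verified the exact invocation or paths Amazon uses):

# 70-ec2-nvme-devices.rules (sketch)
# For every EBS-backed NVMe disk, ask a helper for the device name from the
# instance's block device mapping and expose it as a symlink, e.g. /dev/sdf.
KERNEL=="nvme[0-9]*n[0-9]*", ENV{DEVTYPE}=="disk", ATTRS{model}=="Amazon Elastic Block Store", PROGRAM="/opt/bin/ebs-nvme-name /dev/%k", SYMLINK+="%c"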

More information can be found here:
kubernetes-retired/kube-aws#1048
https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/nvme-ebs-volumes.html

An example udev rule without using the python script can be found here: https://github.com/oogali/ebs-automatic-nvme-mapping

I can't seem to find the Python script published anywhere, but it's available on the Amazon Linux AMI and is licensed under the Apache 2.0 license. I've copied the current udev rules and scripts from the Amazon Linux AMI here: https://gist.github.com/jalaziz/c22c8464cb602bc2b8d0a339b013a9c4

One thing I've noticed (and which has been mentioned elsewhere) is that the device name in the vendor-specific data differs depending on when and how the volume is attached. For example, volumes attached before boot do not seem to have the /dev/ prefix in the vendor info, while volumes attached afterwards do. Also, the device is not renamed to the xvd convention automatically.
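A rewritten helper could simply normalize that prefix. A minimal sketch using nvme-cli, following the byte offsets used in the oogali repository above (I haven't verified them beyond that script; the mapping name appears to live at the start of the vendor-specific area of the 4096-byte Identify Controller data):

#!/bin/bash
# ebs-nvme-name (sketch) - print the block-device-mapping name of an EBS
# NVMe device, without the /dev/ prefix, so udev can use it as a symlink.
set -euo pipefail
dev="$1"                                   # e.g. /dev/nvme1n1

# The mapping name ("sdf" or "/dev/sdf", depending on how the volume was
# attached) sits at the start of the vendor-specific area of the identify
# data; these offsets mirror the oogali script.
name="$(nvme id-ctrl --raw-binary "${dev}" | cut -c3073-3104 | tr -d '[:space:]')"

# Normalize: strip the optional /dev/ prefix so both forms map the same way.
echo "${name#/dev/}"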

@lucab

lucab commented Apr 9, 2018

Thanks for the report. We are already collecting a few cloud-storage-specific udev rules, so I think we should add the AWS NVMe ones there too.

@jalaziz
Author

jalaziz commented Apr 9, 2018

Until support is added to CoreOS, I've created a set of systemd services and a udev rule that work around the issue, based on the resources listed above: https://gist.github.com/jalaziz/bcfe2f71e3f7e8fe42a9c294c1e9279f

@venezia

venezia commented Apr 10, 2018

I'm also having this issue with CoreOS alpha and stable. @jalaziz's scripts do identify the intended partition. It would be great to have this fixed within CoreOS. It seems reasonable to expect the OS to respect the user's wish to have /dev/sdk be the device (for scripting purposes) rather than some haphazardly assigned nvme#n# name.

core@whatever ~ $ cat /proc/partitions 
major minor  #blocks  name

 259        0    8388608 nvme1n1
 259        1    8388608 nvme2n1
 259        2    8388608 nvme0n1
core@whatever ~ $ sudo ./nvme.sh /dev/nvme1n1 
sdg xvdg
core@whatever ~ $ sudo ./nvme.sh /dev/nvme2n1 
sdk xvdk
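Until it lands in CoreOS, a crude one-off workaround is to turn that mapping into real symlinks by hand (assuming a helper like the nvme.sh above that prints the desired names):

for dev in /dev/nvme*n1; do
    # nvme.sh prints the mapping names, e.g. "sdg xvdg"
    for name in $(sudo ./nvme.sh "${dev}"); do
        sudo ln -sf "${dev}" "/dev/${name}"
    done
done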

@nielssorensen

You realize the repo you listed, https://github.com/oogali/ebs-automatic-nvme-mapping, has an MIT license, correct? I believe that if anyone wishes to use it, the license must follow the code, e.g. if you are merging it into your project, or copying it, as the case may be.

@jalaziz
Author

jalaziz commented Jun 11, 2018

Thanks for the note @nielssorensen. I overlooked that. I've updated my gist to include the appropriate copyright notices and licenses.

@lucab

lucab commented Jun 28, 2018

Udev rules and helper landed in coreos/init#268.
I added this to the GH board for the next alpha, bugfix is at coreos/coreos-overlay#3309.

@lucab

lucab commented Jul 9, 2018

This has been released as part of CL 1828.0.0 (current alpha).

@zyclonite

r5.* instances have the same issue

@zyclonite

@lucab I tested against 1883.0.0 with r5.* instances and it looks solved. However, when I compare t2.* instances, where the volumes get mapped to /dev/xvd*, against r5.* instances, where I only get /dev/sd*, the naming is again not consistent. Or am I missing something?

@lucab

lucab commented Sep 11, 2018

@zyclonite you can look directly into nvme id-ctrl, but I think that difference is coming from AWS itself.
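For example, something along these lines (illustrative commands, not a full mapping script):

# serial number = the EBS volume ID (without the dash), model = the EBS marker
sudo nvme id-ctrl /dev/nvme1n1 | grep -E '^(sn|mn)'

# the vendor-specific area dumped at the end of the verbose output carries
# the device name from the block device mapping ("sdf" vs "/dev/sdf")
sudo nvme id-ctrl -v /dev/nvme1n1 | tail -n 20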

@zyclonite

@lucab True, but on one instance type the EBS device is exposed as nvme and on the other as xvd or sd... Wouldn't it make more sense to map EBS devices so that the same names can be used with a single Container Linux config, independent of the instance type?

@jalaziz
Author

jalaziz commented Sep 11, 2018

The original workaround I proposed created symlinks for both variations, /dev/xvd* and /dev/sd*, because of the inconsistency from AWS. It's a bit unfortunate, but it allows us to configure everything based on a single naming convention without worrying about which one AWS is going to choose.
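Concretely, the tail end of a helper like the one sketched earlier can emit both conventions from whatever AWS reports; as far as I can tell udev splits SYMLINK values on whitespace, so a rule using SYMLINK+="%c" then creates both links:

# "sdf", "/dev/sdf" or "xvdf" all become "sdf xvdf"
name="${name#/dev/}"
suffix="${name#sd}"; suffix="${suffix#xvd}"
echo "sd${suffix} xvd${suffix}"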

@lucab

lucab commented Sep 18, 2018

I understand it is unfortunate, and the other script was blindly creating multiple device names, but that is not a correct thing to do in distribution vendor rules. Those device names don't really exist and may clash with other device naming choices by AWS in the future.

AWS documentation explicitly mentions that device names can differ wildly according to multiple parameters: https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/device_naming.html

@frittentheke

If the concern is reusing device names that could legitimately exist (though that's unlikely to be the case on the same instance), why not create new, artificial names like /dev/clouddisk1?

@dwagoner

@lucab's reference was useful. Until a better naming convention is implemented, here is one way to work around this. We use SLES, so:

zypper install nvme-cli

Simple script:

#!/bin/bash
#######################################################
# nvme_map.bsh - Map nvme names to volume IDs
#######################################################
# For every nvme disk reported by lsblk, print the device name followed by
# its EBS volume ID (taken from the serial-number field of nvme id-ctrl).
DEVS="$(lsblk | egrep "^nvme" | awk '{print $1}')"
for I in $DEVS; do
    echo -n "$I "
    nvme id-ctrl -v /dev/${I} | grep vol | awk '{print $3}'
done
Output:

./nvme_map.bsh

nvme2n1 vol0691209dc9ce0c7e4
nvme3n1 vol0c61deb48f3cbf5ed
nvme1n1 vol05c802343fe11aad1
nvme6n1 vol091f5e89a91d1503c
nvme4n1 vol051b5c9fc9192166e
nvme0n1 vol05f9c054f35e845fe
nvme7n1 vol0395e63b8e99a71aa

Further, a wrapper for lsblk can be created to provide the mapping directly:

#!/usr/bin/perl
################################################################
# lsblk.pl - show lsblk output combined with nvme mapping
################################################################
$LSBLK = "/usr/bin/lsblk";
$NVME  = "/usr/sbin/nvme";

open(LSBLK_IN, "$LSBLK |") || die "Cannot run \"$LSBLK\"\n";
while ( $LSBLK_LINE = <LSBLK_IN> ) {
    chomp( $LSBLK_LINE );
    if ( $LSBLK_LINE =~ /NAME/ ) {
        #
        # emit header
        #
        printf("%-65s VOLUME-ID\n", $LSBLK_LINE);

    } elsif ( $LSBLK_LINE =~ /(nvme[0-9n]+)\s/ ) {
        #
        # nvme device found - map the name and emit output
        #
        $DEV = $1;
        open(NVME_IN, "$NVME id-ctrl -v /dev/$DEV | grep vol |")
            || die "Cannot run \"$NVME id-ctrl -v /dev/$DEV\"\n";
        $NVME_LINE = <NVME_IN>;
        $VOL_ID = (split(/\s+/, $NVME_LINE))[2];    # "sn : volXXXX" -> volXXXX
        printf("%-65s %s\n", $LSBLK_LINE, $VOL_ID);
        close( NVME_IN );

    } else {
        #
        # output found, but not an nvme device - emit as found
        #
        printf("%s\n", $LSBLK_LINE);
    }

}                                                 # while

close( LSBLK_IN );
printf("\n");

Output:

./lsblk.pl

NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT VOLUME-ID
nvme2n1 259:0 0 500G 0 disk vol0691209dc9ce0c7e4
├─hanavg-log 254:1 0 200G 0 lvm /hana/log
├─hanavg-data 254:2 0 600G 0 lvm /hana/data
└─hanavg-shared 254:3 0 400G 0 lvm /hana/shared
nvme3n1 259:1 0 70G 0 disk vol0c61deb48f3cbf5ed
└─appvg-usr_sap 254:0 0 20G 0 lvm /usr/sap
nvme1n1 259:2 0 128G 0 disk [SWAP] vol05c802343fe11aad1
nvme6n1 259:3 0 500G 0 disk vol091f5e89a91d1503c
├─hanavg-log 254:1 0 200G 0 lvm /hana/log
├─hanavg-data 254:2 0 600G 0 lvm /hana/data
└─hanavg-shared 254:3 0 400G 0 lvm /hana/shared
nvme4n1 259:4 0 500G 0 disk vol051b5c9fc9192166e
├─hanavg-log 254:1 0 200G 0 lvm /hana/log
├─hanavg-data 254:2 0 600G 0 lvm /hana/data
└─hanavg-shared 254:3 0 400G 0 lvm /hana/shared
nvme0n1 259:6 0 50G 0 disk vol05f9c054f35e845fe
├─nvme0n1p1 259:7 0 300M 0 part /boot
└─nvme0n1p2 259:8 0 49.7G 0 part /
nvme7n1 259:9 0 500G 0 disk vol0395e63b8e99a71aa
└─backupvg-backups 254:4 0 500G 0 lvm /hana/backups

@russellballestrini

I came up with a solid solution to the way c5 instance types present devices out of order. I've had it in production for the last couple of months and just today had a chance to document it on my blog. Have a look and see if it helps you: https://russell.ballestrini.net/aws-nvme-to-block-mapping/
