This repository has been archived by the owner on Oct 16, 2020. It is now read-only.

[AWS EBS] NVMe udev rename rules #2399

Closed
jalaziz opened this issue Apr 9, 2018 · 16 comments

Comments

@jalaziz

jalaziz commented Apr 9, 2018

Issue Report

Feature Request

Environment

AWS CoreOS 1688.5.3 HVM on m5.* or c5.* instances

Desired Feature

Add udev symlink rules to map NVMe devices to traditional xvd[a-z] device names.

Other Information

With the newer m5 and c5 instances, EBS volumes show up as NVMe devices. Amazon Linux provides built-in udev rules that symlink the NVMe devices to their equivalent /dev/sd[a-z] names. This keeps things consistent with the older naming scheme and matches what is configured in the EBS block device mappings provided when launching the instance.

It would be great if CoreOS could provide similar rules. This would allow systemd mounts to work across all EC2 instance types without special hacks or reliance on fixed device names. Amazon Linux handles this with the help of a Python script named ebsnvme-id that reads EBS information from the NVMe device. I realize Python is not installed on CoreOS, but the script could be rewritten to provide the basic functionality needed for udev renaming.
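Roughly, the kind of rule I have in mind looks like this (a sketch only; "ebs-nvme-name" is a placeholder for ebsnvme-id or a rewritten equivalent, and I haven't verified the exact invocation or paths Amazon uses):

# 70-ec2-nvme-devices.rules (sketch)
# For every EBS-backed NVMe disk, ask a helper for the device name from the
# instance's block device mapping and expose it as a symlink, e.g. /dev/sdf.
KERNEL=="nvme[0-9]*n[0-9]*", ENV{DEVTYPE}=="disk", ATTRS{model}=="Amazon Elastic Block Store", PROGRAM="/opt/bin/ebs-nvme-name /dev/%k", SYMLINK+="%c"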

More information can be found here:
kubernetes-retired/kube-aws#1048
https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/nvme-ebs-volumes.html

An example udev rule without using the python script can be found here: https://github.com/oogali/ebs-automatic-nvme-mapping

I can't seem to find the Python script published anywhere, but it's available on the Amazon Linux AMI and is licensed under the Apache 2.0 license. I've copied the current udev rules and scripts from the Amazon Linux AMI here: https://gist.github.com/jalaziz/c22c8464cb602bc2b8d0a339b013a9c4

One thing I've noticed (and which has been mentioned elsewhere) is that the device name in the vendor-specific data differs depending on when and how the volume is attached. For example, volumes attached before boot do not seem to have the /dev/ prefix in the vendor info, while volumes attached afterwards do. Also, the device is not renamed to the xvd convention automatically.
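A rewritten helper could simply normalize that prefix. A minimal sketch using nvme-cli, following the byte offsets used in the oogali repository above (I haven't verified them beyond that script; the mapping name appears to live at the start of the vendor-specific area of the 4096-byte Identify Controller data):

#!/bin/bash
# ebs-nvme-name (sketch) - print the block-device-mapping name of an EBS
# NVMe device, without the /dev/ prefix, so udev can use it as a symlink.
set -euo pipefail
dev="$1"                                   # e.g. /dev/nvme1n1

# The mapping name ("sdf" or "/dev/sdf", depending on how the volume was
# attached) sits at the start of the vendor-specific area of the identify
# data; these offsets mirror the oogali script.
name="$(nvme id-ctrl --raw-binary "${dev}" | cut -c3073-3104 | tr -d '[:space:]')"

# Normalize: strip the optional /dev/ prefix so both forms map the same way.
echo "${name#/dev/}"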

@lucab

lucab commented Apr 9, 2018

Thanks for the report. We are already collecting a few cloud-storage-specific udev rules, so I think we should add the AWS NVMe ones there too.

@jalaziz
Author

jalaziz commented Apr 9, 2018

Until support is added to CoreOS, I've created a set of systemd services and a udev rule that work around the issue, based on the resources listed above: https://gist.github.com/jalaziz/bcfe2f71e3f7e8fe42a9c294c1e9279f

@venezia

venezia commented Apr 10, 2018

I'm also having this issue with CoreOS alpha and stable. @jalaziz's scripts do identify the intended partition. It would be great to have this fixed within CoreOS. It seems reasonable to expect the OS to respect the user's wish to have /dev/sdk be the device (for scripting purposes) rather than some haphazardly assigned nvme#n# name.

core@whatever ~ $ cat /proc/partitions 
major minor  #blocks  name

 259        0    8388608 nvme1n1
 259        1    8388608 nvme2n1
 259        2    8388608 nvme0n1
core@whatever ~ $ sudo ./nvme.sh /dev/nvme1n1 
sdg xvdg
core@whatever ~ $ sudo ./nvme.sh /dev/nvme2n1 
sdk xvdk
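Until it lands in CoreOS, a crude one-off workaround is to turn that mapping into real symlinks by hand (assuming a helper like the nvme.sh above that prints the desired names):

for dev in /dev/nvme*n1; do
    # nvme.sh prints the mapping names, e.g. "sdg xvdg"
    for name in $(sudo ./nvme.sh "${dev}"); do
        sudo ln -sf "${dev}" "/dev/${name}"
    done
done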

@nielssorensen

You realize the repo you listed, https://github.com/oogali/ebs-automatic-nvme-mapping, has an MIT license, correct? I believe that if anyone wishes to use it, the license must follow the code, e.g. if you are merging it into your project, or copying it, as the case may be.

@jalaziz
Author

jalaziz commented Jun 11, 2018

Thanks for the note @nielssorensen. I overlooked that. I've updated my gist to include the appropriate copyright notices and licenses.

@lucab

lucab commented Jun 28, 2018

Udev rules and helper landed in coreos/init#268.
I added this to the GH board for the next alpha, bugfix is at coreos/coreos-overlay#3309.

@lucab

lucab commented Jul 9, 2018

This has been released as part of CL 1828.0.0 (current alpha).

@zyclonite

r5.* instances have the same issue

@zyclonite

@lucab I tested against 1883.0.0 with r5.* instances and it looks solved. However, when I compare t2.* instances, where the volumes get mapped to /dev/xvd*, against r5.* instances, where I only get /dev/sd*, the naming is again not consistent. Or am I missing something?

@lucab

lucab commented Sep 11, 2018

@zyclonite you can look directly into nvme id-ctrl, but I think that difference is coming from AWS itself.
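For example, something along these lines (illustrative commands, not a full mapping script):

# serial number = the EBS volume ID (without the dash), model = the EBS marker
sudo nvme id-ctrl /dev/nvme1n1 | grep -E '^(sn|mn)'

# the vendor-specific area dumped at the end of the verbose output carries
# the device name from the block device mapping ("sdf" vs "/dev/sdf")
sudo nvme id-ctrl -v /dev/nvme1n1 | tail -n 20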

@zyclonite

@lucab True, but on one instance type the EBS device is exposed as nvme and on the other as xvd or sd... Wouldn't it make more sense to map EBS devices so that the same names can be used with a single Container Linux config, independent of the instance type?

@jalaziz
Author

jalaziz commented Sep 11, 2018

The original workaround I proposed created symlinks for both variations, /dev/xvd* and /dev/sd*, because of the inconsistency from AWS. It's a bit unfortunate, but it allows us to configure everything based on a single naming convention without worrying about which one AWS is going to choose.
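Concretely, the tail end of a helper like the one sketched earlier can emit both conventions from whatever AWS reports; as far as I can tell udev splits SYMLINK values on whitespace, so a rule using SYMLINK+="%c" then creates both links:

# "sdf", "/dev/sdf" or "xvdf" all become "sdf xvdf"
name="${name#/dev/}"
suffix="${name#sd}"; suffix="${suffix#xvd}"
echo "sd${suffix} xvd${suffix}"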

@lucab

lucab commented Sep 18, 2018

I understand it is unfortunate, and the other script was blindly creating multiple device names, but that is not a correct thing to do in distribution vendor rules. Those device names don't really exist and may clash with other device naming choices by AWS in the future.

AWS documentation explicitly mentions that device names can differ wildly according to multiple parameters: https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/device_naming.html

@frittentheke

If the concern is reusing device names that could legitimately exist (though that's unlikely to be the case on the same instance), why not create new, artificial names like /dev/clouddisk1?

@dwagoner

@lucab's reference was useful. Until a better naming convention is implemented, here is one way to work around this. We use SLES, so:

zypper install nvme-cli

Simple script:

#!/bin/bash
#######################################################
# nvme_map.bsh - Map nvme names to volume IDs
#######################################################
# For every nvme disk reported by lsblk, print the device name followed by
# its EBS volume ID (taken from the serial-number field of nvme id-ctrl).
DEVS="$(lsblk | egrep "^nvme" | awk '{print $1}')"
for I in $DEVS; do
    echo -n "$I "
    nvme id-ctrl -v /dev/${I} | grep vol | awk '{print $3}'
done
Output:

./nvme_map.bsh

nvme2n1 vol0691209dc9ce0c7e4
nvme3n1 vol0c61deb48f3cbf5ed
nvme1n1 vol05c802343fe11aad1
nvme6n1 vol091f5e89a91d1503c
nvme4n1 vol051b5c9fc9192166e
nvme0n1 vol05f9c054f35e845fe
nvme7n1 vol0395e63b8e99a71aa

Further, a wrapper for lsblk can be created to provide the mapping directly:

#!/usr/bin/perl
################################################################
# lsblk.pl - show lsblk output combined with nvme mapping
################################################################
$LSBLK = "/usr/bin/lsblk";
$NVME  = "/usr/sbin/nvme";

open(LSBLK_IN, "$LSBLK |") || die "Cannot run \"$LSBLK\"\n";
while ( $LSBLK_LINE = <LSBLK_IN> ) {
    chomp( $LSBLK_LINE );
    if ( $LSBLK_LINE =~ /NAME/ ) {
        #
        # emit header
        #
        printf("%-65s VOLUME-ID\n", $LSBLK_LINE);

    } elsif ( $LSBLK_LINE =~ /(nvme[0-9n]+)\s/ ) {
        #
        # nvme device found - map the name and emit output
        #
        $DEV = $1;
        open(NVME_IN, "$NVME id-ctrl -v /dev/$DEV | grep vol |")
            || die "Cannot run \"$NVME id-ctrl -v /dev/$DEV\"\n";
        $NVME_LINE = <NVME_IN>;
        $VOL_ID = (split(/\s+/, $NVME_LINE))[2];    # "sn : volXXXX" -> volXXXX
        printf("%-65s %s\n", $LSBLK_LINE, $VOL_ID);
        close( NVME_IN );

    } else {
        #
        # output found, but not an nvme device - emit as found
        #
        printf("%s\n", $LSBLK_LINE);
    }

}                                                 # while

close( LSBLK_IN );
printf("\n");

Output:

./lsblk.pl

NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT VOLUME-ID
nvme2n1 259:0 0 500G 0 disk vol0691209dc9ce0c7e4
├─hanavg-log 254:1 0 200G 0 lvm /hana/log
├─hanavg-data 254:2 0 600G 0 lvm /hana/data
└─hanavg-shared 254:3 0 400G 0 lvm /hana/shared
nvme3n1 259:1 0 70G 0 disk vol0c61deb48f3cbf5ed
└─appvg-usr_sap 254:0 0 20G 0 lvm /usr/sap
nvme1n1 259:2 0 128G 0 disk [SWAP] vol05c802343fe11aad1
nvme6n1 259:3 0 500G 0 disk vol091f5e89a91d1503c
├─hanavg-log 254:1 0 200G 0 lvm /hana/log
├─hanavg-data 254:2 0 600G 0 lvm /hana/data
└─hanavg-shared 254:3 0 400G 0 lvm /hana/shared
nvme4n1 259:4 0 500G 0 disk vol051b5c9fc9192166e
├─hanavg-log 254:1 0 200G 0 lvm /hana/log
├─hanavg-data 254:2 0 600G 0 lvm /hana/data
└─hanavg-shared 254:3 0 400G 0 lvm /hana/shared
nvme0n1 259:6 0 50G 0 disk vol05f9c054f35e845fe
├─nvme0n1p1 259:7 0 300M 0 part /boot
└─nvme0n1p2 259:8 0 49.7G 0 part /
nvme7n1 259:9 0 500G 0 disk vol0395e63b8e99a71aa
└─backupvg-backups 254:4 0 500G 0 lvm /hana/backups

@russellballestrini

I came up with a solid solution to the way c5 instance types present devices out of order. I've had it in production for the last couple of months and just today had a chance to document it on my blog. Have a look and see if it helps you: https://russell.ballestrini.net/aws-nvme-to-block-mapping/
