TASK [common_baseline_install : Common | Format ECS partition(s)] Fails #456

Closed
SivaKaviyappa opened this issue Jan 10, 2019 · 6 comments

@SivaKaviyappa

Expected Behavior

The task should complete during standard installation.

Actual Behavior

(Please put additional output and logs in the section for that below)

The task fails stating that it is not able to find the device /dev/nvme2n11.

The raw device name is /dev/nvme2n1; I am not sure where the extra 1 at the end is coming from.

Steps to Reproduce Behavior

  1. Bootstrap the instance (install firewalld, disable rpcbind, ...).
  2. Run the standard install (step1).

Relevant Output and Logs

# Output and Logs go here

Jan 10 05:19:08 ip-172-31-40-131 ansible-stat: Invoked with checksum_algorithm=sha1 get_checksum=False path=/host/files/seeds checksum_algo=sha1 follow=False get_md5=False get_mime=True get_attributes=True
Jan 10 05:19:08 ip-172-31-40-131 journal: #33[1;34mok: [172.31.40.131]#33[0m#015
Jan 10 05:19:08 ip-172-31-40-131 journal: #15
Jan 10 05:19:08 ip-172-31-40-131 journal: TASK [common_baseline_install : Common | Create GPT partition table(s) on ECS block device(s)] **********************************************************************************************#15
Jan 10 05:19:08 ip-172-31-40-131 ansible-command: Invoked with warn=True executable=None _uses_shell=False _raw_params=/sbin/parted -s /dev/nvme2n1 mklabel gpt removes=None creates=None chdir=None stdin=None
Jan 10 05:19:08 ip-172-31-40-131 kernel: nvme2n1:
Jan 10 05:19:08 ip-172-31-40-131 journal: #33[1;36mchanged: [172.31.40.131] => (item=/dev/nvme2n1)#33[0m#015
Jan 10 05:19:08 ip-172-31-40-131 journal: #15
Jan 10 05:19:08 ip-172-31-40-131 journal: TASK [common_baseline_install : Common | Partition ECS block device(s)] *********************************************************************************************************************#15
Jan 10 05:19:09 ip-172-31-40-131 ansible-command: Invoked with warn=True executable=None _uses_shell=False _raw_params=/sbin/parted -s /dev/nvme2n1 mkpart xfs 0% 100% removes=None creates=None chdir=None stdin=None
Jan 10 05:19:09 ip-172-31-40-131 journal: #33[1;36mchanged: [172.31.40.131] => (item=/dev/nvme2n1)#33[0m#015
Jan 10 05:19:09 ip-172-31-40-131 journal: #15
Jan 10 05:19:09 ip-172-31-40-131 journal: TASK [common_baseline_install : Common | Check alignment of ECS partitions(s)] **************************************************************************************************************#15
Jan 10 05:19:09 ip-172-31-40-131 ansible-command: Invoked with warn=True executable=None _uses_shell=False _raw_params=/sbin/parted -s /dev/nvme2n1 align-check opt 1 removes=None creates=None chdir=None stdin=None
Jan 10 05:19:09 ip-172-31-40-131 kernel: nvme2n1: p1
Jan 10 05:19:09 ip-172-31-40-131 journal: #33[1;36mchanged: [172.31.40.131] => (item=/dev/nvme2n1)#33[0m#015
Jan 10 05:19:09 ip-172-31-40-131 journal: #15
Jan 10 05:19:09 ip-172-31-40-131 journal: TASK [common_baseline_install : Common | Format ECS partition(s)] ***************************************************************************************************************************#15
Jan 10 05:19:09 ip-172-31-40-131 ansible-filesystem: Invoked with resizefs=False force=True opts=None dev=/dev/nvme2n11 fstype=xfs
Jan 10 05:19:09 ip-172-31-40-131 journal: #33[1;31mfailed: [172.31.40.131] (item=/dev/nvme2n1) => {"changed": false, "failed": true, "item": "/dev/nvme2n1", "msg": "Device /dev/nvme2n11 not found."}#33[0m#015

lsblk
NAME        MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
nvme0n1     259:1    0   30G  0 disk
└─nvme0n1p1 259:2    0   30G  0 part /
nvme1n1     259:0    0  120G  0 disk
└─nvme1n1p1 259:3    0  120G  0 part
nvme2n1     259:4    0  108G  0 disk
└─nvme2n1p1 259:5    0  108G  0 part


Notifies: @padthaitofuhot @captntuttle @adrianmo

@SivaKaviyappa
Author

OK, I was able to resolve this by using a T2 instance type. AWS has new Nitro EC2 instance types, and the Ansible script is not able to format the EBS block devices on them correctly.

@ksteinfeldt
Contributor

Noted. I will look into this.

Thank you for posting your resolution. It is much appreciated!

@MookThompson

I'm having the same issue trying to deploy to a local physical machine that uses an NVMe drive for storage. The problem seems to be in 'ansible/roles/common_baseline_install/tasks/main.yml', which assumes that the first partition on a device can be derived by simply appending a '1' to the device name (e.g. for device 'sda', the first partition is 'sda1'). However, CentOS (and, it seems, AWS) names NVMe partitions with a 'p' before the number (i.e. for device 'nvme0n1' the first partition is actually 'nvme0n1p1', not 'nvme0n11' as the script assumes). As a result, the 'Format ECS partition(s)' task and the other steps that make this assumption fail because the partition can't be found, as described by the OP.
Unfortunately, I don't have the option of switching to a non-NVMe storage device, which is what the change to a T2 instance type seems to have done. I'm currently trying to work around this by hacking the emccorp/ecs-install Docker image so that I can replace the instances of '{{ item }}1' with '{{ item }}p1' in 'ansible/roles/common_baseline_install/tasks/main.yml', but this is proving tricky because the file is extracted from a tgz file each time the image is spun up. Any help in getting past this point would be appreciated!
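
The naming rule described in this comment can be sketched in a few lines of Python. This is only an illustration, not code from the ECS installer: the function name partition_name is hypothetical, and the sketch assumes the usual Linux convention that a 'p' separator is inserted whenever the whole-disk name ends in a digit.

```python
def partition_name(device: str, number: int = 1) -> str:
    """Return the partition path for a whole-disk path (hypothetical helper).

    Assumption: when the disk name ends in a digit (e.g. nvme0n1), a 'p'
    separates it from the partition number; otherwise the number is
    appended directly (e.g. sda -> sda1).
    """
    separator = "p" if device[-1:].isdigit() else ""
    return f"{device}{separator}{number}"


if __name__ == "__main__":
    # Compare a traditional disk name with the NVMe name from this issue.
    for disk in ("/dev/sdb", "/dev/nvme2n1"):
        print(disk, "->", partition_name(disk))
```

With a rule like this, the device from the original report would map to /dev/nvme2n1p1 rather than /dev/nvme2n11, while 'sdb' would still map to 'sdb1'.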

@MookThompson

I've been able to get the ECS installation process to complete successfully with an NVMe storage device after following this process:

  • Change any {{item}}1 references to {{item}}p1 in the following files:
    • ui/ansible/roles/common_baseline_check/tasks/main.yml
    • ui/ansible/roles/common_baseline_install/tasks/main.yml
  • Build the installation image using bootstrap.sh, also applying the workaround from issue #504 ('step1' fails after ecs-install built from sources) to fix a dependency failure.
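
The substitution in the first step could be scripted roughly as follows. This is a hedged sketch rather than the exact edit that was made: patch_partition_refs is a hypothetical helper, and it assumes the references appear in the task files literally as '{{item}}1' or '{{ item }}1'.

```python
import re
import sys

# Matches '{{item}}1' or '{{ item }}1', but not '{{ item }}10'
# and not an already patched '{{ item }}p1'.
ITEM_PARTITION = re.compile(r"(\{\{\s*item\s*\}\})1(?!\d)")


def patch_partition_refs(text: str) -> str:
    """Rewrite '{{ item }}1' to '{{ item }}p1', keeping the original spacing."""
    return ITEM_PARTITION.sub(r"\g<1>p1", text)


if __name__ == "__main__":
    # Usage: pass the paths of the task files to patch as arguments.
    for path in sys.argv[1:]:
        with open(path, encoding="utf-8") as handle:
            original = handle.read()
        with open(path, "w", encoding="utf-8") as handle:
            handle.write(patch_partition_refs(original))
```

Running the script twice leaves the files unchanged the second time, because a patched reference no longer matches the pattern.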

Obviously, this is only a hack for my system - the installation image I built will only work on an NVMe storage device. The general installation process needs to be able to derive the correct first partition name for both traditional ({{item}}1) and NVMe ({{item}}p1) storage devices.
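
One way to avoid guessing the name at all would be to ask the system which partitions exist on a device. The sketch below is an assumption about how that might look, not part of the ECS installer: both function names are hypothetical, and it relies on 'lsblk -l -n -o NAME,TYPE <device>' printing one "name type" pair per line. It would only address the partition lookup in 'step1'.

```python
import subprocess
from typing import List


def partitions_from_lsblk(listing: str) -> List[str]:
    """Extract partition paths from 'lsblk -l -n -o NAME,TYPE' output."""
    partitions = []
    for line in listing.splitlines():
        fields = line.split()
        # Keep only rows whose TYPE column is 'part'.
        if len(fields) == 2 and fields[1] == "part":
            partitions.append("/dev/" + fields[0])
    return partitions


def list_partitions(device: str) -> List[str]:
    """Run lsblk for one device and return its partition paths."""
    result = subprocess.run(
        ["lsblk", "-l", "-n", "-o", "NAME,TYPE", device],
        check=True,
        capture_output=True,
        text=True,
    )
    return partitions_from_lsblk(result.stdout)
```

For the layout shown in the original report, the first entry returned for /dev/nvme2n1 would be /dev/nvme2n1p1, and the same call would return /dev/sdb1 for a traditional disk with one partition.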

@MookThompson

OK, so I spoke too soon. My hacking of the yml files described above does allow 'step1' to run through successfully, but 'step2' never completes - it just gets stuck initialising, reporting things like WAIT: ECS API internal error.

As I need a working system, I put in a standard hard drive (which CentOS assigned the normal name of 'sdb', with the first partition 'sdb1') and reverted the changes to the yml files described above. The full installation process then completed successfully, and the system is up and running.

Therefore, it looks like there's some other bit of ECS (possibly not even in the installation process) that doesn't support the CentOS naming convention of NVMe devices.

@padthaitofuhot
Contributor

Correct, CommunityEdition does not support NVMe at this time. For NVMe demos, please contact sales.
