Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Stemcell 3262.14 breaks persistent disk formatting on Openstack #98

Closed
JamesClonk opened this issue Sep 18, 2016 · 7 comments
Closed

Stemcell 3262.14 breaks persistent disk formatting on Openstack #98

JamesClonk opened this issue Sep 18, 2016 · 7 comments
Labels

Comments

@JamesClonk
Copy link

It seems that with stemcell 3262.14 a newer version of bosh-agent has been introduced that fails mounting/formatting persistent disks on Openstack.
I was trying to deploy the latest bosh release and stemcell with bosh-init, but it fails at formatting the partition.
I tried various combinations, bosh 257.9 and 257.14 both work with stemcell versions 3262.12 or earlier, starting with 3262.14 it fails.

output of bosh-init 0.0.96:

Started deploying
  Creating VM for instance 'bosh/0' from stemcell 'c3c0b4d4-a0d9-40ee-8324-fc034e035a64'... Finished (00:01:41)
  Waiting for the agent on VM '8895e425-78d9-4cb7-8b70-7190a4d69ca8' to be ready... Finished (00:00:18)
  Creating disk... Finished (00:00:07)
  Attaching disk '7c6dadc7-3231-4f60-a327-1245c895f2ab' to VM '8895e425-78d9-4cb7-8b70-7190a4d69ca8'... Failed (00:00:19)
Failed deploying (00:02:26)

Stopping registry... Finished (00:00:00)
Cleaning up rendered CPI jobs... Finished (00:00:00)

Command 'deploy' failed:
  Deploying:
    Creating instance 'bosh/0':
      Updating instance disks:
        Updating disks:
          Deploying disk:
            Mounting disk:
              Sending 'get_task' to the agent:
                Agent responded with error: Action Failed get_task: Task d5818063-217f-41e3-5110-01a5539d86d2 result: Mounting persistent disk: Formatting partition with : Checking filesystem format of partition: Running command: 'blkid -p /dev/disk/by-id/virtio-7c6dadc7-3231-4f60-a1', stdout: '', stderr: 'error: /dev/disk/by-id/virtio-7c6dadc7-3231-4f60-a1: No such file or directory

how it looks on the VM:

root@0c5397b0-7010-4b03-45f5-87f33a0a5eae:/dev/disk/by-id# ll
total 0
drwxr-xr-x 2 root root  80 Sep 18 08:33 ./
drwxr-xr-x 5 root root 100 Sep 18 08:33 ../
lrwxrwxrwx 1 root root   9 Sep 18 08:33 virtio-7c6dadc7-3231-4f60-a -> ../../vdc
lrwxrwxrwx 1 root root  10 Sep 18 08:33 virtio-7c6dadc7-3231-4f60-a-part1 -> ../../vdc1

relevant part of bosh-agent log:

2016-09-18_08:33:14.04760 [File System] 2016/09/18 08:33:14 DEBUG - Glob '/dev/disk/by-id/*7c6dadc7-3231-4f60-a'
2016-09-18_08:33:14.04771 [File System] 2016/09/18 08:33:14 DEBUG - Checking if file exists /dev/disk/by-id/virtio-7c6dadc7-3231-4f60-a
2016-09-18_08:33:14.04775 [virtioDevicePathResolver] 2016/09/18 08:33:14 DEBUG - Resolved disk {ID:7c6dadc7-3231-4f60-a327-1245c895f2ab DeviceID: VolumeID:/dev/sdc Lun: HostDeviceID: Path:/dev/sdc FileSystemType:} by ID as '/dev/disk/by-id/virtio-7c6dadc7-3231-4f60-a'
2016-09-18_08:33:14.04777 [File System] 2016/09/18 08:33:14 DEBUG - Reading file /proc/mounts
2016-09-18_08:33:14.04792 [File System] 2016/09/18 08:33:14 DEBUG - Read content
2016-09-18_08:33:14.04793 ********************
2016-09-18_08:33:14.04793 sysfs /sys sysfs rw,nosuid,nodev,noexec,relatime 0 0
2016-09-18_08:33:14.04793 proc /proc proc rw,nosuid,nodev,noexec,relatime 0 0
2016-09-18_08:33:14.04793 udev /dev devtmpfs rw,relatime,size=4077380k,nr_inodes=1019345,mode=755 0 0
2016-09-18_08:33:14.04794 devpts /dev/pts devpts rw,nosuid,noexec,relatime,gid=5,mode=620,ptmxmode=000 0 0
2016-09-18_08:33:14.04794 tmpfs /run tmpfs rw,nosuid,noexec,relatime,size=817588k,mode=755 0 0
2016-09-18_08:33:14.04794 /dev/vda1 / ext4 rw,relatime,data=ordered 0 0
2016-09-18_08:33:14.04794 none /var/lib/ureadahead/debugfs debugfs rw,relatime 0 0
2016-09-18_08:33:14.04795 none /sys/fs/cgroup tmpfs rw,relatime,size=4k,mode=755 0 0
2016-09-18_08:33:14.04795 none /sys/fs/fuse/connections fusectl rw,relatime 0 0
2016-09-18_08:33:14.04795 none /sys/kernel/debug debugfs rw,relatime 0 0
2016-09-18_08:33:14.04795 none /sys/kernel/security securityfs rw,relatime 0 0
2016-09-18_08:33:14.04795 none /run/lock tmpfs rw,nosuid,nodev,noexec,relatime,size=5120k 0 0
2016-09-18_08:33:14.04796 none /run/shm tmpfs rw,nosuid,nodev,relatime 0 0
2016-09-18_08:33:14.04796 none /run/user tmpfs rw,nosuid,nodev,noexec,relatime,size=102400k,mode=755 0 0
2016-09-18_08:33:14.04796 none /sys/fs/pstore pstore rw,relatime 0 0
2016-09-18_08:33:14.04796 rpc_pipefs /run/rpc_pipefs rpc_pipefs rw,relatime 0 0
2016-09-18_08:33:14.04797 /dev/vda3 /var/vcap/data ext4 rw,relatime,data=ordered 0 0
2016-09-18_08:33:14.04797 /dev/vda3 /var/log ext4 rw,relatime,data=ordered 0 0
2016-09-18_08:33:14.04797 tmpfs /var/vcap/data/sys/run tmpfs rw,relatime,size=1024k 0 0
2016-09-18_08:33:14.04798 /dev/vda3 /tmp ext4 rw,relatime,data=ordered 0 0
2016-09-18_08:33:14.04798 /dev/vda3 /var/tmp ext4 rw,relatime,data=ordered 0 0
2016-09-18_08:33:14.04798
2016-09-18_08:33:14.04798 ********************
2016-09-18_08:33:14.04801 [linuxPlatform] 2016/09/18 08:33:14 INFO - realPath = /dev/disk/by-id/virtio-7c6dadc7-3231-4f60-a, devicePath = , isMountPoint = %!s(bool=false)
2016-09-18_08:33:14.04804 [File System] 2016/09/18 08:33:14 DEBUG - Making dir /var/vcap/store with perm 0700
2016-09-18_08:33:14.04820 [Cmd Runner] 2016/09/18 08:33:14 DEBUG - Running command: lsblk --nodeps -nb -o SIZE /dev/disk/by-id/virtio-7c6dadc7-3231-4f60-a
2016-09-18_08:33:14.05305 [Cmd Runner] 2016/09/18 08:33:14 DEBUG - Stdout: 68719476736
2016-09-18_08:33:14.05306 [Cmd Runner] 2016/09/18 08:33:14 DEBUG - Stderr:
2016-09-18_08:33:14.05307 [Cmd Runner] 2016/09/18 08:33:14 DEBUG - Successful: true (0)
2016-09-18_08:33:14.05307 [linuxPlatform] 2016/09/18 08:33:14 DEBUG - Persistent disk size to be partitioned is: 68719476736, and error is: <nil>
2016-09-18_08:33:14.05307 [linuxPlatform] 2016/09/18 08:33:14 DEBUG - fdisk partitioner was chosen
2016-09-18_08:33:14.05310 [Cmd Runner] 2016/09/18 08:33:14 DEBUG - Running command: sfdisk -d /dev/disk/by-id/virtio-7c6dadc7-3231-4f60-a
2016-09-18_08:33:14.06026 [Cmd Runner] 2016/09/18 08:33:14 DEBUG - Stdout:
2016-09-18_08:33:14.06027 [Cmd Runner] 2016/09/18 08:33:14 DEBUG - Stderr:
2016-09-18_08:33:14.06027 sfdisk: ERROR: sector 0 does not have an msdos signature
2016-09-18_08:33:14.06027  /dev/disk/by-id/virtio-7c6dadc7-3231-4f60-a: unrecognized partition table type
2016-09-18_08:33:14.06028 No partitions found
2016-09-18_08:33:14.06028 [Cmd Runner] 2016/09/18 08:33:14 DEBUG - Successful: true (0)
2016-09-18_08:33:14.06032 [attemptRetryStrategy] 2016/09/18 08:33:14 DEBUG - Making attempt #0
2016-09-18_08:33:14.06035 [Cmd Runner] 2016/09/18 08:33:14 DEBUG - Running command: sfdisk -uM /dev/disk/by-id/virtio-7c6dadc7-3231-4f60-a
2016-09-18_08:33:14.07894 [Cmd Runner] 2016/09/18 08:33:14 DEBUG - Stdout:
2016-09-18_08:33:14.07896 Disk /dev/disk/by-id/virtio-7c6dadc7-3231-4f60-a: 133152 cylinders, 16 heads, 63 sectors/track
2016-09-18_08:33:14.07897 Old situation:
2016-09-18_08:33:14.07897 New situation:
2016-09-18_08:33:14.07897 Units = mebibytes of 1048576 bytes, blocks of 1024 bytes, counting from 0
2016-09-18_08:33:14.07897
2016-09-18_08:33:14.07898    Device Boot Start   End    MiB    #blocks   Id  System
2016-09-18_08:33:14.07898 /dev/disk/by-id/virtio-7c6dadc7-3231-4f60-a-part1         0+ 65535- 65536-  67108607+  83  Linux
2016-09-18_08:33:14.07898 /dev/disk/by-id/virtio-7c6dadc7-3231-4f60-a-part2         0      -      0          0    0  Empty
2016-09-18_08:33:14.07898 /dev/disk/by-id/virtio-7c6dadc7-3231-4f60-a-part3         0      -      0          0    0  Empty
2016-09-18_08:33:14.07899 /dev/disk/by-id/virtio-7c6dadc7-3231-4f60-a-part4         0      -      0          0    0  Empty
2016-09-18_08:33:14.07899 Successfully wrote the new partition table
2016-09-18_08:33:14.07899
2016-09-18_08:33:14.07899 Re-reading the partition table ...
2016-09-18_08:33:14.07899
2016-09-18_08:33:14.07900 [Cmd Runner] 2016/09/18 08:33:14 DEBUG - Stderr: Checking that no-one is using this disk right now ...
2016-09-18_08:33:14.07900 OK
2016-09-18_08:33:14.07900
2016-09-18_08:33:14.07900 sfdisk: ERROR: sector 0 does not have an msdos signature
2016-09-18_08:33:14.07900  /dev/disk/by-id/virtio-7c6dadc7-3231-4f60-a: unrecognized partition table type
2016-09-18_08:33:14.07901 No partitions found
2016-09-18_08:33:14.07901 Warning: no primary partition is marked bootable (active)
2016-09-18_08:33:14.07902 This does not matter for LILO, but the DOS MBR will not boot this disk.
2016-09-18_08:33:14.07902 If you created or changed a DOS partition, /dev/foo7, say, then use dd(1)
2016-09-18_08:33:14.07902 to zero the first 512 bytes:  dd if=/dev/zero of=/dev/foo7 bs=512 count=1
2016-09-18_08:33:14.07902 (See fdisk(8).)
2016-09-18_08:33:14.07902 [Cmd Runner] 2016/09/18 08:33:14 DEBUG - Successful: true (0)
2016-09-18_08:33:14.07903 [SfdiskPartitioner] 2016/09/18 08:33:14 INFO - Succeeded in partitioning /dev/disk/by-id/virtio-7c6dadc7-3231-4f60-a with ,,L
2016-09-18_08:33:14.07903 [Cmd Runner] 2016/09/18 08:33:14 DEBUG - Running command: blkid -p /dev/disk/by-id/virtio-7c6dadc7-3231-4f60-a1
2016-09-18_08:33:14.08014 [Cmd Runner] 2016/09/18 08:33:14 DEBUG - Stdout:
2016-09-18_08:33:14.08015 [Cmd Runner] 2016/09/18 08:33:14 DEBUG - Stderr: error: /dev/disk/by-id/virtio-7c6dadc7-3231-4f60-a1: No such file or directory
2016-09-18_08:33:14.08016 [Cmd Runner] 2016/09/18 08:33:14 DEBUG - Successful: false (2)
2016-09-18_08:33:14.08019 [Task Service] 2016/09/18 08:33:14 ERROR - Failed processing task #d5818063-217f-41e3-5110-01a5539d86d2 got: Mounting persistent disk: Formatting partition with : Checking filesystem format of partition: Running command: 'blkid -p /dev/disk/by-id/virtio-7c6dadc7-3231-4f60-a1', stdout: '', stderr: 'error: /dev/disk/by-id/virtio-7c6dadc7-3231-4f60-a1: No such file or directory
2016-09-18_08:33:14.08019 ': exit status 2
2016-09-18_08:33:14.72117 [HTTPS Dispatcher] 2016/09/18 08:33:14 INFO - POST /agent
2016-09-18_08:33:14.72121 [MBus Handler] 2016/09/18 08:33:14 INFO - Received request with action get_task
2016-09-18_08:33:14.72122 [MBus Handler] 2016/09/18 08:33:14 DEBUG - Payload
2016-09-18_08:33:14.72122 ********************
2016-09-18_08:33:14.72122 {"method":"get_task","arguments":["d5818063-217f-41e3-5110-01a5539d86d2"],"reply_to":"dd316ff5-3564-4277-7201-f2f26400c236"}
2016-09-18_08:33:14.72122 ********************
2016-09-18_08:33:14.72123 [Action Dispatcher] 2016/09/18 08:33:14 INFO - Running sync action get_task
2016-09-18_08:33:14.72139 [Action Dispatcher] 2016/09/18 08:33:14 ERROR - Action Failed get_task: Task d5818063-217f-41e3-5110-01a5539d86d2 result: Mounting persistent disk: Formatting partition with : Checking filesystem format of partition: Running command: 'blkid -p /dev/disk/by-id/virtio-7c6dadc7-3231-4f60-a1', stdout: '', stderr: 'error: /dev/disk/by-id/virtio-7c6dadc7-3231-4f60-a1: No such file or directory
2016-09-18_08:33:14.72140 ': exit status 2
2016-09-18_08:33:14.72150 [MBus Handler] 2016/09/18 08:33:14 INFO - Responding
2016-09-18_08:33:14.72152 [MBus Handler] 2016/09/18 08:33:14 DEBUG - Payload
2016-09-18_08:33:14.72152 ********************
2016-09-18_08:33:14.72153 {"exception":{"message":"Action Failed get_task: Task d5818063-217f-41e3-5110-01a5539d86d2 result: Mounting persistent disk: Formatting partition with : Checking filesystem format of partition: Running command: 'blkid -p /dev/disk/by-id/virtio-7c6dadc7-3231-4f60-a1', stdout: '', stderr: 'error: /dev/disk/by-id/virtio-7c6dadc7-3231-4f60-a1: No such file or directory\n': exit status 2"}}

Is this the problem?
https://github.com/cloudfoundry/bosh-agent/blob/master/platform/linux_platform.go#L936-L939

Nothing has changed on part of our Openstack infrastructure. To confirm that I checked various older bosh deployments, the persistent disk partitions there are always under /dev/disk/by-id/virtio-*-part1, not as the bosh-agent expects with /dev/disk/by-id/virtio-*1.
There's also never anything under /dev/mapper to be found.

I'm curious as to why it works on stemcell 3262.12 and not anymore on 3262.14. I did not figure out which commit causes this problem. From looking the at code mentioned above it should have never worked?

@voelzmo
Copy link
Contributor

voelzmo commented Sep 19, 2016

Update: As there have been only 2 commits to the agent from 3262.12 to 3262.14, I suppose this one breaks the handling of disks: 287b000

@aduffeck
Copy link
Contributor

I can also confirm this issue. I think the problem is that GetRealDevicePath used to follow the symlinks in /dev/disk/by-id/ (

realPath, err = idpr.fs.ReadLink(deviceIDPath)
) and returned e.g. /dev/vdb where adding 1 to get to the partition was correct.

Now it returns the symlink directly but MountPersistentDisk and others still add the 1 instead of -part1 because the /dev/mapper condition doesn't apply.

@cppforlife
Copy link
Contributor

cc @dsboulder

@dpb587-pivotal
Copy link
Contributor

@voelzmo
Copy link
Contributor

voelzmo commented Oct 1, 2016

@dpb587-pivotal In which stemcell series is this fixed exactly? I suppose its not an issue in 3262.15, what about the other series? Have you updated the agent in those as well?

@voelzmo voelzmo reopened this Oct 1, 2016
@dpb587-pivotal
Copy link
Contributor

dpb587-pivotal commented Oct 5, 2016

The fix was backported to 3262.x (starting from 3262.15) and 3263.x (starting from 3263.2). It was not backported to earlier versions.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

5 participants