This repository has been archived by the owner on Sep 22, 2020. It is now read-only.

torusblk flexprepvol: mkfs failed (race?) #447

Open
frozenice opened this issue Jan 24, 2017 · 1 comment

Comments

@frozenice

Env: Running a pod with a volume provided by the flex volume plugin in Kubernetes 1.5.1.

For some reason flexprepvol is unable to format the device, which fails the whole mount process and the pod. I tried v0.1.2 and b783b16 (latest master at the time of writing) on Ubuntu Server 16.04 and 16.10.

This was in the logs:

torus[12325]: mke2fs 1.43.3 (04-Sep-2016)
torus[12325]: mkfs.ext4: Device size reported to be zero.  Invalid partition specified, or
torus[12325]:         partition table wasn't reread after running fdisk, due to
torus[12325]:         a modified partition being busy and in use.  You may need to reboot
torus[12325]:         to re-read your partition table.

This is the exact message I get when trying to format an unattached /dev/nbd* device.

Manually mounting the device via torusblk and running mkfs myself worked, so I added a 5-second delay (time.Sleep) just before sysd := connectSystemd() in flex.go#mountAction and used the newly compiled binary as my flex plugin. This worked!

So I'm guessing there is a race condition between attach and mount / flexprepvol: the device needs a little time to be fully initialized, and Kubernetes tries to mount the volume too soon after the attach.

@frozenice
Author

I tried 2 seconds instead and it failed; I don't know whether that was related to the shorter delay or was a different error. Back to 5 seconds.

I was also getting these warnings for my previously working volume (now unusable):

W | distributor: remote asking for non-existent block: br c : 18 : 1

Deleting the volume and creating a new one worked. I will file another issue if it happens again.

I'm also getting some of these every few minutes. I don't know if they are related; they have always been there, IIRC:

W | torus: couldn't register heartbeat: rpc error: code = 4 desc = context deadline exceeded
W | torus: couldn't update peerlist: rpc error: code = 4 desc = context deadline exceeded
