ceph-volume crashes and osd fails to initialize when creating a disk osd on a NixOS node #14120
Comments
rook-ceph-osd-prepare job log
Tried to do it with partitions on the metadata device instead of LVs, and now I'm getting an error saying
It seems like Rook is using LVM tools from the host rootfs mount to provision the drives, and this is incompatible with NixOS. I did follow the NixOS prerequisites in the docs, but it seems some additional steps are needed to make Rook's OSD provisioning work on NixOS. Any advice here?
If there's a specific directory where the LVM binaries need to be, I can add symlinks for them with host mounts or Nix configs.
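For illustration, a NixOS snippet along these lines could expose the LVM binaries at a conventional FHS path on the host. This is a hypothetical sketch only: the target directory, and whether ceph-volume actually looks there, are assumptions rather than anything confirmed in this thread.

```nix
# Hypothetical sketch: install lvm2 on the host and symlink its binaries into
# /sbin so that tools chrooting into the host rootfs might find them.
# The choice of /sbin is an assumption, not a path confirmed by Rook/ceph-volume.
{ pkgs, lib, ... }:
{
  # lvm2 userspace tools end up in /run/current-system/sw/bin on NixOS.
  environment.systemPackages = [ pkgs.lvm2 ];

  # Create /sbin and symlink the LVM tools to their Nix store paths.
  systemd.tmpfiles.rules = [
    "d /sbin 0755 root root -"
    "L+ /sbin/lvm - - - - ${lib.getBin pkgs.lvm2}/bin/lvm"
    "L+ /sbin/pvs - - - - ${lib.getBin pkgs.lvm2}/bin/pvs"
    "L+ /sbin/vgs - - - - ${lib.getBin pkgs.lvm2}/bin/vgs"
    "L+ /sbin/lvs - - - - ${lib.getBin pkgs.lvm2}/bin/lvs"
  ];
}
```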
I seem to have solved my issue now by adding an override for the PATH environment variable so that Ceph picks up the NixOS binaries; a rough sketch follows.
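The exact override isn't shown in the thread; as a rough, hypothetical reconstruction, the idea is to prepend NixOS's system profile (where lvm and friends live) to the PATH seen by the Ceph tools. The ConfigMap name below and the way such a ConfigMap gets wired into the OSD prepare pods are assumptions for illustration only; only /run/current-system/sw/bin is NixOS-specific.

```yaml
# Hypothetical reconstruction of the PATH override described above.
# Name, namespace, and the mechanism that injects this into the OSD prepare
# pods are assumptions; check what your Rook version supports.
apiVersion: v1
kind: ConfigMap
metadata:
  name: rook-ceph-osd-env-override   # illustrative name
  namespace: rook-ceph
data:
  PATH: "/run/current-system/sw/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin"
```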
This now successfully provisions all of my OSDs with the original LVM configuration from my initial post. Are there any improvements we can make to handle this interaction with NixOS better, improve the error messages, or document the mitigation in the NixOS prerequisites section of the Rook docs? I'll go ahead and update the title; feel free to change this issue into a feature request. I'm happy to open a PR for documentation changes. Let me know if this ConfigMap workaround is production-ready or if there's a better one available.
Great to hear it is working now. Sounds good to update the docs if you want to open a PR. I would imagine the existing NixOS section of the Prerequisites page would be the right place to add this info.
Is this a bug report or feature request?
Deviation from expected behavior:
ceph-volume crashes with an IndexError during the OSD initialization job when trying to create a disk OSD with an LVM db device.
My data devices are raw HDDs with no GPT/MBR partition table, and I'm using a single NVMe metadata device, partitioned with GPT and holding a single LVM PV. The metadata volume group has one LV for each OSD I'm trying to add, for a total of 4 LVs.
I am partitioning the metadata device like this so that I can add/remove OSDs associated with the metadata device in the future without destroying the whole pool.
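For context, a layout like this can be created with standard LVM tooling roughly as follows; the NVMe partition name and LV sizes are placeholders, while the VG/LV names match the paths referenced further down.

```sh
# Placeholder device and sizes: one PV/VG on the NVMe metadata partition,
# then one LV per OSD (4 total), matching the h5b_metadata0 naming below.
pvcreate /dev/nvme0n1p1
vgcreate h5b_metadata0 /dev/nvme0n1p1

for i in 0 1 2 3; do
  lvcreate -L 100G -n "h5b_metadata0_$i" h5b_metadata0
done
```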
I referenced the devices manually in the CephCluster CR, pointing each data device at its metadata LV. I tried referencing the LV both as /dev/h5b_metadata0/h5b_metadata0_3 and as /dev/mapper/h5b_metadata0-h5b_metadata0_3, but neither worked. I have included my full CephCluster and provision logs below. From the error message, it seems like ceph-volume can't find the LVs associated with a device?
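The poster's full CR is attached to the issue; for readers, the relevant part of a CephCluster spec that pairs raw data devices with LVM metadata devices generally looks something like the excerpt below. Node and data device names here are placeholders, not taken from the actual cluster.

```yaml
# Illustrative excerpt of a CephCluster spec; names are placeholders.
spec:
  storage:
    useAllNodes: false
    useAllDevices: false
    nodes:
      - name: "storage-node-1"
        devices:
          - name: "sda"   # raw HDD data device
            config:
              metadataDevice: "/dev/h5b_metadata0/h5b_metadata0_0"
          - name: "sdb"
            config:
              metadataDevice: "/dev/h5b_metadata0/h5b_metadata0_1"
```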
How to reproduce it (minimal and precise):
Right now I'm just testing things before moving to a production environment, so this is a new cluster with a single storage node. Everything creates successfully, but the storage node fails to provision the OSDs.
File(s) to submit:
CephCluster CR
Logs to submit:
See comment.
Environment:
Kernel (e.g. uname -a): 6.1.87
lsblk:
lvs:
pvs: