The content is gone after the reboot #55
@bulgaru thanks for filing a bug. Could you provide the following output so I can get an idea of what's going on?
From there we can move on to more troubleshooting if required; we would have to look at the state of the GPT and the LVM itself. Best,
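(The requested output did not survive archiving. For readers hitting the same symptom, diagnostics along these lines are typically what is being asked for here; this is a sketch, `fio-status` requires the Fusion-io utilities to be installed, and the `/dev/fioa` device name is an assumption.)

```shell
# Read-only diagnostics for a "content gone after reboot" report (sketch)
dmesg | grep -i -e fio -e iomemory   # driver load messages, version, errors
fio-status -a                        # card state, firmware, attach status
pvs && vgs && lvs -a                 # LVM physical/volume/logical view
lvmdiskscan                          # which block devices LVM can see
sgdisk -p /dev/fioa                  # GPT partition table on the card
```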
@bulgaru nothing strange there, but the top part of the dmesg is missing, the part that shows the version and hash etc.; could you paste that part too? Also, if you look back in time at your log messages from before the reboot: did the module get unloaded, and did it give any output when unloading? Generally it's something like this:
I'm not familiar with the LVM-thin setup you have, but my guess is that there is something wrong with the metadata on the LVM layer. Oddly enough, when I try to remove the module with an LVM-provisioned VG and LVs, it complains it can't because the volume group is active. Can you check whether your volume group is actually active, or whether you can see it at all? Best,
These are the only records I see:
In the syslog I noticed some lines that may or may not be related to the issue:
Let me know if you need access to the server; it's reachable from the internet, so you can easily log into Proxmox and run all the tests you need, in case that's quicker and easier.
Additionally, I went through /var/log/kern.log to check the previous session. As for lvmdiskscan:
Output of lvs:
Update #2: After I manually removed the driver via modprobe -r, I got this output:
It is not generated on reboot, so I guess the driver is not detached then. I noticed no errors related to volume groups.
Update #3:
When I try to check what's using the driver, I get this:
So my best guess is that on reboot the storage is not removed properly, which leads to data corruption.
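(An aside for later readers: the usual way to check what is holding a kernel module is `lsmod`. Below is a self-contained sketch against sample output; the module name matches this driver, but the numbers are made up.)

```shell
# lsmod-style sample output (numbers made up); on the real box: lsmod | grep iomemory
sample='Module                  Size  Used by
iomemory_vsl         1085440  4
dm_mod                151552  10 dm_thin_pool'

# Column 3 is the use count: nonzero means something still holds the module,
# so modprobe -r will refuse to unload it
used=$(printf '%s\n' "$sample" | awk '$1 == "iomemory_vsl" { print $3 }')
echo "iomemory_vsl use count: $used"
```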
@bulgaru my guess is that the metadata is full or got trashed somehow. Historically this is not uncommon on thin LVM / LVM, from what I've read; in all honesty, though, I've never had it happen myself, knock on wood. Poking around on the web I did find some pointers. For access, where do you want me to send my public key?
@snuf please check your email, the access credentials are there. There has been a breakthrough thanks to your help: I've discovered that for some reason the volume group is inactive after boot. It may be related to the mechanics of the Fusion-io drivers being loaded into the kernel, or to some Proxmox changes or bugs:
After activating the volume group, all data is back and seems to work perfectly. I've submitted a ticket on the Proxmox forum and we'll see if it can be fixed via some configs. I've never encountered an issue with volume group activation before, even though these cards are used on 4 Proxmox nodes at the moment.
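(For anyone landing here with the same symptom, the manual activation step described above boils down to the standard LVM commands below; run as root.)

```shell
vgs                                # list volume groups; the card's VG should appear
vgchange -ay                       # activate all logical volumes in all volume groups
lvs -o lv_name,vg_name,lv_active   # confirm the LVs now report as active
```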
@bulgaru ah, that is good news! I sent you an email back. Can we keep this open until we have a conclusive answer about what exactly happened, so we can record it for others? Best,
Agreed! Let's keep it open. I will be glad to post updates related to the issue if something comes out of the Proxmox forum. Thank you again for looking into it!
You're welcome! I found a couple of things.
Poking in dmesg I found an error for
Also, another thing I was looking at is what all the Proxmox example configurations show. Best,
Hey! I've played with the LVM for the past 2 days, and here was my starting point: it works great with the SanDisk drivers on Proxmox 5, and I should be able to spot the difference in Proxmox 6. Here are the 4 scenarios I've been looking into:
One by one:
This is where I am atm. I think the issue lies somewhere between poor LVM configs and the way the OS works. I can easily activate the VGs manually, and it would be trivial to add the initialisation at boot; the question remains why it does not work out of the box. Btw, another crossed-out scenario is that the timeline is messed up, i.e. that the physical device initialisation occurs after the attempt to initialise the VGs. I'm crossing it out because the device initialisation seems to occur at roughly the same time in both Proxmox 5 and 6, with almost identical log messages. Yet, unlike in Proxmox 5, where the VGs are initialised by systemd as soon as the device is attached, in Proxmox 6 nothing happens when the device is attached.
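(A sketch of the "trivial to add the initialisation at boot" workaround: a one-shot systemd unit that activates all VGs after kernel modules are loaded. The unit name and the ordering choices here are my assumptions, not something shipped by the driver.)

```ini
# /etc/systemd/system/fio-vgchange.service (hypothetical name)
[Unit]
Description=Activate LVM volume groups on Fusion-io devices
# Assumption: the iomemory-vsl module is loaded by systemd-modules-load
After=systemd-modules-load.service

[Service]
Type=oneshot
ExecStart=/sbin/vgchange -ay

[Install]
WantedBy=local-fs.target
```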
Answer finally found! The reason it works on Proxmox 5 and not on Proxmox 6 is a different approach to LVM activation: in Proxmox 6 the activation is event-based, triggered by udev. This is fantastic news, since it basically means that Fusion-io drives from the 2nd and possibly 3rd generation can be used with Proxmox 6. Thanks for the help and support, @snuf!
I ran into this today. While changing
Huh. So ID_FS_TYPE isn't being set. That should be set by blkid as noted at the beginning of 69-lvm-metad.rules:
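(The quoted rule did not survive archiving. Paraphrasing from memory, and not verbatim, the metad rules key off ID_FS_TYPE roughly like this:)

```
# 69-lvm-metad.rules, paraphrased sketch (not verbatim): once blkid has set
# ID_FS_TYPE to LVM2_member, pvscan is invoked to autoactivate the VG
ENV{ID_FS_TYPE}!="LVM2_member|LVM1_member", GOTO="lvm_end"
ACTION=="add|change", RUN+="/sbin/lvm pvscan --background --cache --activate ay --major $major --minor $minor"
LABEL="lvm_end"
```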
Where does blkid normally get triggered? 60-persistent-storage.rules:
So why isn't that happening? Well, much earlier in 60-persistent-storage.rules:
Oh. I see. Turns out someone already asked to have fio* added to that list in systemd/systemd#3718. Upstream doesn't want to include it since the driver is out-of-tree, and I can't really argue with that. So we need to provide our own udev rules for FusionIO devices. Cribbing from 60-persistent-storage.rules, the following works as minimal udev rules for FusionIO:
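(The rules themselves were stripped from this archived page. A plausible minimal version, cribbed the same way from 60-persistent-storage.rules, would look like the following; this is my reconstruction, not the author's exact text.)

```
# Reconstruction (not verbatim): run udev's blkid builtin on fio block
# devices so ID_FS_TYPE and the filesystem UUID variables get populated
ACTION=="remove", GOTO="fio_persistent_storage_end"
SUBSYSTEM!="block", GOTO="fio_persistent_storage_end"
KERNEL=="fio*", IMPORT{builtin}="blkid"
KERNEL=="fio*", ENV{ID_FS_UUID_ENC}=="?*", SYMLINK+="disk/by-uuid/$env{ID_FS_UUID_ENC}"
LABEL="fio_persistent_storage_end"
```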
Put that in /etc/udev/rules.d/60-persistent-storage-fio.rules and ID_FS_TYPE gets populated correctly which then triggers the creation of the
Further improvements could be made by shipping a script that parses fio-status output, so that udev could import things like the UUIDs for each /dev/fioX. That would allow for creating further symlinks. I'm unclear whether this repo is the right place to maintain and install udev rules for these devices. The minimal rules I wrote above do not require any of the programs from fio-util, so maybe that's OK.
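(That parsing idea could look roughly like this; the fio-status line format below is a guess for illustration, not verified against real output, and ID_FIO_UUID is a made-up key name.)

```shell
# Hypothetical fio-status output line; the real field layout may differ
line='ioDrive2 Adapter, SN:1149D0969, UUID:4a3a6b2e-0af7-4bc2-9a7b-9f5c0e6a1c2d'

# Emit a KEY=value pair that udev could consume via IMPORT{program}
uuid=$(printf '%s\n' "$line" | sed -n 's/.*UUID:\([0-9a-f-]*\).*/\1/p')
echo "ID_FIO_UUID=$uuid"
```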
And I just noticed that tools/udev/rules.d/60-persistent-fio.rules already exists. It's missing the import of blkid to discover the LVM PVs and create the matching symlinks. I'll send a PR shortly. |
LVM auto activation is triggered by udev discovering the LVM PVs and populating ID_FS_TYPE. 60-persistent-fio.rules already runs blkid on partitions on the fioX device to create /dev/disk/by-uuid symlinks. Add an import of filesystem metadata on the fioX device using udev's blkid builtin. Fixes RemixVSL#55
My OSDs kept going offline after a reboot. I was pointed to this issue, and this solved it. Thanks! You should really add this udev rule to the Wiki page; it's needed for Proxmox version 8. A dedicated Proxmox page on the Wiki would also be worth creating.
We are happy to have you create the documentation and we can approve/add it. |
Want to create a repo for the wiki, so that pull requests can be submitted? |
For now, just submit a PR that is a file called proxmox.md in the root of the repo and then we can figure out where that lives. We do appreciate the support. |
Bug description
Using a Fusion-IO ioScale2 1.65TB card. I compiled the drivers for pve-kernel-5.3.13-1-pve: 5.3.13-1 (Proxmox 6). The drivers compiled normally and the card is visible. It has a GPT partition table and has been added as LVM-thin storage. Weirdly enough, all the content seems to be gone after the reboot. Here's the storage summary:
Here's the storage content (real size is around 0.6-1.2GB):
Environment information