lmt 3.2.10 displays INACTIVE for most targets #53
Comments
That's the same LMT and Lustre version we are running on my test system. For the working OST and one of the other ones, please post the output of the following:
Thanks
Here's the requested output:
Hi,
Sure:
It looks to me like that means 0 clients have connected to fs23-OST0001. Can you check one of your Lustre client nodes with "lfs check osts" and confirm whether those same two OSTs appear? I suspect fs23-OST0002 will report "active" and fs23-OST0001 will either be missing or report "inactive". If they both say "active", please post the following for those two OSTs:
Thanks
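For anyone following along, a minimal way to run that check on a client might look like the sketch below. The file system name fs23 and the OST indexes are taken from this thread; adjust them for your site, and note that the exact output format of these commands varies between Lustre versions.

```sh
# On a Lustre client: show the connection status of all OSTs and pick out
# the two OSTs discussed above.
lfs check osts | grep -E 'fs23-OST000[12]'

# Cross-check against the client's device list; healthy OSC devices show as UP.
lctl dl | grep -E 'osc .*fs23-OST000[12]'
```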
All OSTs appear just fine on the clients. Here's the output of your command on the OSS:
Are the servers and clients both Lustre 2.10.8?
I think there was a single 2.12.3 client; all the others were 2.10.8.
Have these targets (MDTs and OSTs, on the server nodes) ever, in their lifetime, been unmounted and then re-mounted? I just created a new Lustre 2.12.4 file system from scratch and observe the same behavior you describe after the targets have been mounted for the first time: the recovery_status file just says "status: INACTIVE". After unmounting and mounting them again, the recovery_status files have the expected content.
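To see which targets are affected on a given server, a quick loop over the recovery_status files can help. This is just a sketch assuming the /proc layout referenced in this thread; newer Lustre versions expose the same parameters via something like "lctl get_param *.*.recovery_status" instead.

```sh
# On a Lustre server: print the recovery status of every locally mounted target,
# so targets stuck at "status: INACTIVE" stand out from those that completed
# recovery ("status: COMPLETE").
for f in /proc/fs/lustre/obdfilter/*/recovery_status \
         /proc/fs/lustre/mdt/*/recovery_status; do
    [ -e "$f" ] || continue
    echo "== $f =="
    grep '^status:' "$f"
done
```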
I can't say for sure, but I guess they have been mounted several times already.
There is a related (and possibly the same) issue at https://jira.whamcloud.com/browse/LU-14930.
Another user, @alvaromartin990, ran into this recently. Some background in case it helps him or others who end up here:

When a Lustre target (e.g. MDT0000) starts, it connects with clients and goes through a process called "recovery", which handles the case where a Lustre target failed and was restarted. An example might be that MDT0000 is hosted on server "lustre1", which lost power and was then powered back on, and MDT0000 was started again. There may have been I/Os in flight at the time, and the clients and servers must ensure that any such I/Os either landed on disk or are replayed. During this recovery process, no new I/O requests are accepted by the server and no new clients are allowed to mount the file system. After clients and servers have synchronized state, the Lustre target exits recovery and resumes normal operation. Lustre reports this state to sysadmins and to tools like LMT via the recovery_status file, e.g. /proc/fs/lustre/mdt/lflood-MDT0000/recovery_status. LMT tries to let sysadmins know when a Lustre target is in recovery, so they know why normal operation isn't occurring.

For LMT users running into this issue, however, the recovery_status file for one or more targets contains "status: INACTIVE" when the target clearly is up and isn't in recovery (because it's allowing new clients to mount the file system and handling new I/O requests). This seems to me like a Lustre bug, and https://jira.whamcloud.com/browse/LU-14930 did result in patches for Lustre 2.12 (never merged), 2.14 (never merged), and 2.15 (landed). That said, it would be good if LMT could work around this behavior for sites running Lustre versions before 2.15.
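To illustrate the kind of workaround a site on a pre-2.15 Lustre could apply outside of LMT, here is a rough sketch (not LMT code) that treats "status: INACTIVE" as "not actually in recovery" whenever the target is clearly serving clients. It assumes a num_exports file next to recovery_status under the same /proc tree shown above; adjust the paths or use lctl get_param if your version lays things out differently.

```sh
# Rough heuristic for pre-2.15 servers: a target reporting "status: INACTIVE"
# that nevertheless has connected clients is probably hitting LU-14930 rather
# than being in real recovery.
for dir in /proc/fs/lustre/obdfilter/* /proc/fs/lustre/mdt/*; do
    [ -e "$dir/recovery_status" ] || continue
    status=$(awk '/^status:/ {print $2}' "$dir/recovery_status")
    exports=$(cat "$dir/num_exports" 2>/dev/null || echo 0)
    if [ "$status" = "INACTIVE" ] && [ "$exports" -gt 0 ]; then
        echo "$(basename "$dir"): status INACTIVE but $exports exports connected (likely LU-14930)"
    else
        echo "$(basename "$dir"): status $status"
    fi
done
```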
Hi @ofaaland, thank you so much for your explanation; it makes sense now. It sounds like the best way to proceed is to update to Lustre 2.15. I will definitely consider doing so, since I'd like to use all of LMT's features. Thanks!
I've upgraded from 3.2.7 to 3.2.10 and now all but one OST display the message "INACTIVE 0s remaining" instead of the current statistics. Only one OST went through recovery and has "status: COMPLETE" in recovery_status; all the others have INACTIVE, but they are mounted and working fine.
I'm running Lustre 2.10.8.