Lidarr.exe process hung on FreeBSD 11.2 (FreeNAS 11.2) #602
Comments
Note: this is FreeNAS-specific, and is not FreeBSD. I am currently testing actively with 11.2 and 12.0 and have neither observed this behavior nor been able to reproduce it. FreeNAS has its own separate base system, which has a number of patches not present in FreeBSD. I would need the output of |
I don't know how to reproduce this behavior. It has happened 2 or 3 times in the last ~2 months. It also does not happen when I am actively doing stuff via the Web UI; it appears to hang during background processing. FWIW, I ran that command against it just now, and it looks like it is producing a bunch of errors: https://pastebin.com/xnnbrt4S Is this a red herring, or is it relevant? |
Yes, that is extremely helpful. The pointer here is a flood of There are complaints about numerous other applications that access files frequently running into the same problems, making it clearly Yet Another FreeNAS Specific Issue. |
I am not even sure what to put in a ticket to FreeNAS about this. |
Honestly, I wouldn't even bother. There's a pile of bugs for this exact behavior on various applications, and they've just been closing them without actually fixing them. Just a lot of falsely pointing the finger at the applications. You could attempt to tune vfs and IO Cage settings to see if that alleviates the problem, but I don't think it would help here. This is ultimately a kernel-level issue. |
I went ahead and posted on the FreeNAS forums: https://forums.freenas.org/index.php?threads/lidarr-exe-process-hangs-occassionally-process-trying-to-get-a-lock-in-kernel-space.73264/ I will likely open a ticket in their Redmine system for this issue, I just don't know enough technically to describe the real underlying problem. Is this something you would be willing to assist with? |
FWIW, this happened again. Here is
Notice that the Here are the last 25 lines of
|
This is definitely pointing to a kernel-level failure under high file loads, a common failure on FreeNAS. About how many files do you have for artists starting with a number (e.g. 0-9), A, and B? Also, what other applications are you running? This will help determine whether it can be mitigated with tuning. |
Here is a count of FLAC files for artists whose name starts with a number, A, or B:
I don't think I have files other than FLAC in there. In this IOCage jail I am not running anything else. This is what
On my FreeNAS server, I have separate IOCage jails set up for the following:
I have a few other jails, but they are not running. |
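For anyone who wants to gather the same per-initial file count on their own library, here is a minimal sketch in Python. It assumes a layout of one top-level directory per artist under a music root; the function name and the layout are assumptions for illustration, not how the count above was produced.

```python
import os
from collections import Counter


def count_flac_by_initial(music_root):
    """Count .flac files per bucket, keyed on the first character of each
    top-level artist directory name. Digits are grouped under "0-9"."""
    counts = Counter()
    for artist in sorted(os.listdir(music_root)):
        artist_dir = os.path.join(music_root, artist)
        if not os.path.isdir(artist_dir):
            continue  # skip loose files sitting directly in the music root
        first = artist[0].upper()
        bucket = "0-9" if first.isdigit() else first
        # Walk the artist directory recursively (albums, discs, and so on).
        for _dirpath, _dirnames, filenames in os.walk(artist_dir):
            counts[bucket] += sum(
                1 for name in filenames if name.lower().endswith(".flac")
            )
    return counts
```

Calling `count_flac_by_initial("/path/to/music")` (a placeholder path) returns a `Counter`, so the buckets asked about can be read as `counts["0-9"]`, `counts["A"]`, and `counts["B"]`.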
Okay, so making some fairly conservative assumptions, Lidarr should probably only be holding somewhere on the order of ~225 files open at any one point worst case. If it was the number of files, I'd have no problem reproducing with a much larger file set. So something on FreeNAS is causing the locks to spin too slow or get hung up in sleep. The extremely high CPU utilization is actually pretty normal, because that's just how mono rolls in general. The fix here is probably going to require the |
This sounds like FreeNAS would need to add this to the next incremental release for 11.2. I would like to open a ticket on their ticketing site, but am not entirely sure how to word the request. Let me take a stab at it though and will post here when I am done. |
Should just need to reference those two commits as Looking at a truss on my working system, I do see a very high number of |
Thank you @rootwyrm, you have gone above and beyond here helping me with a problem that is not your/Lidarr's problem. |
Opened Issue #72545. |
Received a response on #72545:
|
Commented on the FreeNAS issue; I don't agree with the assessment that this can wait for 11.3, because 11.3 doesn't have a date yet. However, we're at their mercy there, so all we can do is hope they listen.
As discussed with @ta264 on Discord, Lidarr also appears to hold files open excessively. This definitely would exacerbate the issue, especially on systems that have slow spinners like we have here. So there are two root causes.
Issue 1: On FreeNAS, the activity overall is leading to catastrophic pool depletion. Once pool depletion occurs, the system is basically unrecoverable unless using a bhyve instance with fully isolated kernel. My hypothesis is that commits
Issue 2: Lidarr holds files open much longer than necessary - but doesn't leak FDs. There's no evidence to suggest Lidarr is actually exhausting FDs, but the lazy freeing and reliance on the Mono GC is causing it to excessively lean on the kernel like we're seeing here. Because Lidarr will tend to work on much larger file sets as compared to Sonarr/Radarr, not holding files open is much more critical.
This behavior dates back to NzbDrone as I mentioned above. @ta264's rewrite should address that behavior but we'll need to verify. Unfortunately I'm fighting through a lang/mono problem on the 12.0 I have allocated for nightly so I won't be able to test as quickly as I'd like. |
Most of this is over my head, but just FYI: keeping the file handle open too long is a Lidarr-specific issue (reading MP3/FLAC tags), so if you have the same problems in Sonarr or Radarr, my change won't fix it. |
I may have misunderstood what you said you had changed then. The biggest concern is fopen() without fclose() and just relying on mono's GC to clean it up afterwards. Mono's GC can only do so much, and putting more load on it that way is going to cause serious problems down the road for larger libraries. Unfortunately, I don't know anything at all about .NET so I don't even know where to look for the fopen() operations in it. If it's actively closing file access after scan/update with your change, that will indeed fix it though. |
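To illustrate the concern, here is a minimal sketch of the "close it yourself" pattern, written in Python rather than Lidarr's C#. The function names are hypothetical and this is not Lidarr's code; it only shows a scan that releases each descriptor as soon as the read finishes instead of leaving cleanup to the garbage collector.

```python
def read_tag_bytes(path, size=4):
    """Read the first `size` bytes of a file.

    The `with` block closes the descriptor before the function returns,
    even if the read raises, so nothing is left for the GC to finalize.
    """
    with open(path, "rb") as handle:
        return handle.read(size)


def scan_library(paths):
    """Scan many files while holding at most one descriptor open at a time."""
    return {path: read_tag_bytes(path) for path in paths}
```

The rough .NET counterpart of `with` is a `using` statement around the stream, which disposes it deterministically at the end of the block.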
ta264's file disposal fix is here, There are also three memory fixes recently merged into nightly, related to an incorrect SQLite DB cache, a lazy-loaded datamapper keeping its parent mapper alive, and a workaround for a Mono gzip issue in the HTTP connection code that caused a leak. It will be interesting to see how this looks after all of these are integrated. |
Thanks all, I look forward to trying these fixes. |
@rootwyrm I posted a couple of responses in the ixSystems/FreeNAS issue. Not sure if you are following it. |
Mono is dead, long live .NET |
Describe the bug
The Lidarr.exe process becomes hung on FreeNAS 11.2 (FreeBSD 11.2) periodically.
The process looks like this (from ps auxwwww):
/usr/local/bin/mono /usr/local/share/Lidarr/Lidarr.exe --nobrowser --data=/usr/local/lidarr (mono-sgen)
It is running in a FreeBSD IO Cage Jail.
I have tried to kill -9 the process and it does not end. I tell the host system/OS to reboot and it is not able to because it cannot end this process.
I will provide a URL for the log, but I think it froze at 19-1-24 05:04:06.2 in the log.
To Reproduce
Steps to reproduce the behavior: Unknown
Expected behavior
The process should not hang. I should be able to restart the service, and I should be able to restart the system.
Screenshots
N/A
Logs
https://pastebin.com/t7j5qb0K
System info (please complete the following information):
Additional context
N/A
AB#321