Using ZFS on desktop with stock kernel is bad experience #169457
Comments
Hmm, so according to the flag documentation this is recommended for desktop usage: "Select this if you are building a kernel for a desktop system." https://www.linuxtopia.org/online_books/linux_kernel/kernel_configuration/re152.html |
Yes. I'm not sure how much my NVIDIA card also plays a role in this mess of a PC, but switching to a different preemption setting just feels so much smoother. |
Using the rt kernel is not always an option unfortunately. The nvidia driver doesn't like it and the open driver is just not good enough on hidpi multimonitor setups. When I need super low latencies, I use cpuset cgroups to isolate one core and disable hyperthreading on that. Then move the audio thread there. This is good enough ;) |
I am also very interested in having a responsive system with ZFS. But at some point I switched back, as I currently don't understand how to keep the ZFS modules and the Zen kernel in sync, and the ZFS module has a nice variable pointing to compatible kernel versions, which is useful for newbies like me. :) |
Ubuntu's kernel, which officially supports ZFS, is compiled with the same options.
$ uname -a
Linux ubuntu 5.13.0-39-generic #44~20.04.1-Ubuntu SMP Thu Mar 24 16:43:35 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux
$ awk '/^#/ { next } /PREEMPT/ { print }' /boot/config-5.13.0-39-generic
CONFIG_PREEMPT_VOLUNTARY=y
CONFIG_HAVE_PREEMPT_DYNAMIC=y
CONFIG_PREEMPT_NOTIFIERS=y
CONFIG_DRM_I915_PREEMPT_TIMEOUT=640
Are you sure it's not something else on your system? Of course this is anecdotal, but I have not had latency problems running ZFS on a desktop (with sufficient RAM!). But then again, I'm not doing realtime audio either. |
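For anyone who wants to repeat this check on their own machine, here is a minimal sketch that wraps the awk filter above and reports which preemption model a config file selects. The function names `preempt_options` and `preempt_model` are made up for illustration, and the sketch assumes the kernel config is available as a plain text file (as with Ubuntu's /boot/config-* files); other distributions may keep it elsewhere.

```shell
#!/bin/sh
# Hypothetical helpers, not part of any tool mentioned in this thread.

# Print the non-comment PREEMPT-related options from a kernel config file.
preempt_options() {
    config_file="$1"
    awk '/^#/ { next } /PREEMPT/ { print }' "$config_file"
}

# Report the compile-time preemption model: full, voluntary, none or unknown.
preempt_model() {
    config_file="$1"
    if grep -q '^CONFIG_PREEMPT=y' "$config_file"; then
        echo full
    elif grep -q '^CONFIG_PREEMPT_VOLUNTARY=y' "$config_file"; then
        echo voluntary
    elif grep -q '^CONFIG_PREEMPT_NONE=y' "$config_file"; then
        echo none
    else
        echo unknown
    fi
}
```

On a system laid out like the Ubuntu one above, `preempt_model "/boot/config-$(uname -r)"` should be enough to see which model the running kernel was built with.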
Personally, I would not bet on it being exactly that setting. For me, the combination of memory pressure AND IO load is needed for this to happen, which differs a bit from OP's description in detail, or rather adds an additional layer which may be the root cause. Sadly I haven't yet had the time to compile a kernel with the suggested settings to check whether that also "fixes" my issue or whether it's something different, but a system which freaks out on IO when it has "just" 10G of RAM left free sounds like a bad experience to me... :) |
You can try the Liquorix, Xanmod or Zen kernel patch sets to test; they are all available in nixpkgs. Note that support from ZFS or NVIDIA may lag a bit. I run ZFS on some of my machines, mainly with Xanmod or Zen, and I do not see any lag, except during heavy IO operations on some old HDDs. |
As mentioned in my very first post, I did try Zen at some point, and that gave a much better experience, but I switched back to stock, as I find How do you ensure that the ZFS modules are at a valid version, or is that something I don't have to care too much about? About the user experience under IO: when scrubbing kicks in (which is of course high IO, so some lag is expected), the system is not even usable any longer, as it's mostly unresponsive. I'm not sure it's really the same "issue" poelzi described, as this is nowhere near RT scenarios, and I don't want to take over the issue here with my "problems", but short feedback on how you properly handle matching the kernel to ZFS would be handy. edit: I now understand how the kernelPackages are tied to the kernel in NixOS, so the question is now cleared up. |
Scheduling under load is a difficult problem to solve. Can you isolate where the block latency is coming from? |
I somehow forgot about the issue and was talking about it today. The iostats look like the following:
I am absolutely no expert in reading this, but I see a high |
163 milliseconds to do the write, when the actual drive latency is ~1 millisecond. That's clearly CPU bound. On a non-preemptive kernel, all of this runs in high-priority ZFS IO threads doing a lot of compression/encryption/checksumming. I wonder, what are the recordsize, compression algorithm, and encryption settings for the system? There might not be enough cond_resched() calls in one or more of the corresponding code paths, or the ZFS kernel thread priority is too high for the audio thread to be able to preempt it. Before switching to a preemptive kernel, I was playing with ZFS module parameters like the following, with an inconsistent level of success:
This should be set on the module load I think. Binding the threads may be especially bad for audio as usually only one cpu core handles audio interrupts and if zfs occupies that core and doesn't preempt in time it will cause stuttering. |
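When experimenting with module parameters like this, it helps to record what is currently in effect before and after a change. A minimal sketch, assuming the zfs module is loaded and exposes its parameters under /sys/module/zfs/parameters (the usual sysfs location for module parameters); the helper name `dump_module_params` is hypothetical, and no particular parameter is assumed.

```shell
#!/bin/sh
# Hypothetical helper: print "name=value" for every readable parameter file.
# Defaults to the zfs module's sysfs directory; pass another directory to
# inspect a different module.
dump_module_params() {
    param_dir="${1:-/sys/module/zfs/parameters}"
    for f in "$param_dir"/*; do
        # Skip anything that is not a readable regular file
        # (this also covers an empty or missing directory).
        [ -f "$f" ] && [ -r "$f" ] || continue
        printf '%s=%s\n' "$(basename "$f")" "$(cat "$f")"
    done
}
```

Saving the output of `dump_module_params` to a file before and after a change, then comparing the two with `diff`, shows exactly which values differ.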
zfs get recordsize,compression,encryption rpool
NAME   PROPERTY     VALUE  SOURCE
rpool  recordsize   128K   default
rpool  compression  zstd   local
rpool  encryption   off    default
It is kept on the defaults, besides using zstd as compression.
I'll try out whether that results in a better experience. edit: Most of my bad UX has been resolved by setting
udev.extraRules = ''
ACTION=="add|change", KERNEL=="sd[a-z]*[0-9]*|mmcblk[0-9]*p[0-9]*|nvme[0-9]*n[0-9]*p[0-9]*", ENV{ID_FS_TYPE}=="zfs_member", ATTR{../queue/scheduler}="none"
'';
Which I found here: EDIT 2 |
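To verify that a rule like the one above actually took effect, the active scheduler can be read back from sysfs, where the kernel marks the selected entry with square brackets (for example "[none] mq-deadline kyber"). A minimal sketch; `active_scheduler` and `list_schedulers` are hypothetical helper names, and the sketch assumes the usual /sys/block/<device>/queue/scheduler layout.

```shell
#!/bin/sh
# Hypothetical helpers, not part of udev or ZFS.

# Extract the bracketed (active) entry from a scheduler line,
# e.g. "[none] mq-deadline kyber" -> "none".
# Prints nothing if the line contains no bracketed entry.
active_scheduler() {
    printf '%s\n' "$1" | sed -n 's/.*\[\([^]]*\)].*/\1/p'
}

# Print "<device>: <active scheduler>" for every block device under a sysfs root.
list_schedulers() {
    sys_block="${1:-/sys/block}"
    for q in "$sys_block"/*/queue/scheduler; do
        [ -r "$q" ] || continue
        dev=$(basename "$(dirname "$(dirname "$q")")")
        printf '%s: %s\n' "$dev" "$(active_scheduler "$(cat "$q")")"
    done
}
```

Running `list_schedulers` with no arguments lists every disk; the disks backing ZFS partitions should report `none` if the rule was applied.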
I (according to the comment) wrote this somewhere and since then it has spread to many configs. https://github.com/search?q=artturin+ENV%7BID_FS_TYPE%7D%3D%3D%22zfs_member%22%2C+ATTR%7B..%2Fqueue%2Fscheduler%7D%3D%22none%22&type=code NixOS#169457 (comment)
@poelzi Have you tried the proposed solution? Can we close the issue? |
I've tried it before it was merged and even after and it didn't make a difference for me. My principal problem is using atuin with a zfs root where the shell would hang while atuin does an insert in SQLite. It's been already referenced above. The real-time patch at the top made the most difference but still I see it from time to time. |
This is actually a ZFS bug causing ftruncate hangups (affecting everything using SQLite as a database, not just there are 2 "fixes" so far:
|
I'm experiencing the same issue -- though (as far as I can tell) not only with Atuin, also Firefox, Konsole, KDE Plasma, etc... for reference, I have compression, deduplication, and encryption all disabled. Neither setting |
Some other things I have tried (unsuccessfully) -- I'm interested in hearing if any of these worked for others, and what other options I should consider:
At this point I am beginning to doubt my problem is with ZFS itself (or the desktop environment for that matter), though I'm not sure where else I should be looking. |
FYI, in case someone else comes across the udev changes: they are applied by default since #250308 (#169457 (comment)), which should be in NixOS stable since 23.11. So for that part there should be no need for any manual changes. |
Thanks, I wasn't aware of that. Rather embarrassingly, the main issue for me turned out to be a power-saving setting my laptop had automatically enabled without me noticing. Opening files/applications still lags sometimes, but it usually resolves itself after the first time (so I assume this is caching-related). |
That could be https://discourse.nixos.org/t/plasma-emojier-too-slow-episode-iv/40130, so it might be fixed in Plasma 6. |
I've also run into this. Turned out I had battery saver on. |
Describe the bug
ZFS on a desktop system with the default kernel, which is compiled with PREEMPT_VOLUNTARY, causes terrible lag, short hangs and very bad realtime behaviour. This is easy to see with jackd and mixxx, for example.
If the kernel is compiled with these changes, the system behaves much better:
Steps To Reproduce
Steps to reproduce the behavior:
Expected behavior
Behaviour more similar to other filesystems
Additional context
Upstream ticket: openzfs/zfs#13128
Notify maintainers
@wizeman @hmenke @jcumming @jonringer @fpletz @globin
Metadata