
Using ZFS on desktop with stock kernel is bad experience #169457

Open
poelzi opened this issue Apr 20, 2022 · 23 comments

Comments

@poelzi
Member

poelzi commented Apr 20, 2022

Describe the bug

ZFS on a desktop system with the default kernel, which is compiled with PREEMPT_VOLUNTARY, causes terrible lag, short hangs, and very bad realtime behaviour. This is easy to see with jackd and mixxx, for example.

If the kernel is compiled with these changes, the system behaves much better:

boot.kernelPatches = [{
  name = "enable RT_FULL";
  patch = null;
  extraConfig = ''
    PREEMPT y
    PREEMPT_BUILD y
    PREEMPT_VOLUNTARY n
    PREEMPT_COUNT y
    PREEMPTION y
  '';
}];

Steps To Reproduce

Steps to reproduce the behavior:

  1. Do any ZFS file io
  2. Run mixxx + jackd for example
  3. Observe the stuttering and underruns

Expected behavior

Behaviour more similar to other filesystems

Additional context

Upstream ticket: openzfs/zfs#13128

Notify maintainers

@wizeman @hmenke @jcumming @jonringer @fpletz @globin

Metadata

[user@system:~]$ nix-shell -p nix-info --run "nix-info -m"
 - system: `"x86_64-linux"`
 - host os: `Linux 5.16.20, NixOS, 21.11 (Porcupine)`
 - multi-user?: `yes`
 - sandbox: `yes`
 - version: `nix-env (Nix) 2.4`
 - channels(poelzi): `"home-manager-21.11, nixos-21.05.4726.530a53dcbc9"`
 - channels(root): `"nixos-21.11.335665.0f316e4d72d"`
 - nixpkgs: `/nix/var/nix/profiles/per-user/root/channels/nixos`
@Mindavi
Contributor

Mindavi commented Apr 20, 2022

Hmm, so according to the flag documentation this is recommended for desktop usage:

"Select this if you are building a kernel for a desktop system."

https://www.linuxtopia.org/online_books/linux_kernel/kernel_configuration/re152.html

@poelzi
Member Author

poelzi commented Apr 20, 2022

Yes. I'm not sure how much my nvidia card also plays a role in this mess of a PC, but switching to a different preemption setting just feels so much smoother.
We should at least provide a Linux kernel derivative built with desktop settings, and either warn the user when zfs is enabled with the default kernel or document how to switch kernels.

@poelzi
Member Author

poelzi commented Apr 20, 2022

Using the rt kernel is not always an option unfortunately. The nvidia driver doesn't like it and the open driver is just not good enough on hidpi multimonitor setups. When I need super low latencies, I use cpuset cgroups to isolate one core and disable hyperthreading on that. Then move the audio thread there. This is good enough ;)
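
The cpuset approach above can be sketched roughly like this (an illustrative sketch only, assuming cgroup v2 with the cpuset controller available; the CPU numbers, cgroup name, and jackd process name are examples, and everything needs root):

```shell
# Find the SMT sibling of the chosen core (here CPU 3) and take it offline:
cat /sys/devices/system/cpu/cpu3/topology/thread_siblings_list  # e.g. "3,11"
echo 0 > /sys/devices/system/cpu/cpu11/online

# Give a dedicated cgroup exclusive use of CPU 3:
echo +cpuset > /sys/fs/cgroup/cgroup.subtree_control
mkdir -p /sys/fs/cgroup/audio
echo 3 > /sys/fs/cgroup/audio/cpuset.cpus
echo root > /sys/fs/cgroup/audio/cpuset.cpus.partition

# Move the audio process (e.g. jackd) onto the isolated core:
pgrep -x jackd > /sys/fs/cgroup/audio/cgroup.procs
```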


@Shawn8901
Contributor

I am also super interested in having a responsive system with zfs.
When my system gets above a certain level of load, the amount of short lags and sound stuttering (esp. when recording) increases heavily.
Before Nix I was on an Arch install, and at least I hadn't noticed similar things there.
I've got a 2700X in this machine, which should handle my workloads very easily. Sadly I have not (yet) taken a deep dive into the issue.
At least from what I could tell when I tried it, the zen kernel produced less stuttering, or at least I noticed less of it (but as said, I haven't done benchmarks or measurements).

But at some point I switched back, as I currently don't understand how to keep the zfs modules and the zen kernel in sync, and the zfs module has a nice variable for pointing to compatible kernel versions, which is useful for newbies like me. :)
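
For the module/kernel sync concern: NixOS can derive the kernel from the zfs package rather than the other way around, so the two never drift apart. A minimal configuration sketch using the option mentioned later in this thread:

```nix
{ config, ... }:
{
  # Let the zfs package pick the newest kernel it is known to build against,
  # instead of pinning a kernel and hoping the zfs module keeps up.
  boot.kernelPackages = config.boot.zfs.package.latestCompatibleLinuxPackages;
}
```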

@hmenke
Member

hmenke commented Apr 24, 2022

Ubuntu's kernel, which officially supports ZFS, is compiled with the same options:

$ uname -a
Linux ubuntu 5.13.0-39-generic #44~20.04.1-Ubuntu SMP Thu Mar 24 16:43:35 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux
$ awk '/^#/ { next } /PREEMPT/ { print }' /boot/config-5.13.0-39-generic 
CONFIG_PREEMPT_VOLUNTARY=y
CONFIG_HAVE_PREEMPT_DYNAMIC=y
CONFIG_PREEMPT_NOTIFIERS=y
CONFIG_DRM_I915_PREEMPT_TIMEOUT=640

Are you sure it's not something else on your system?

Of course this is anecdotal, but I have not had latency problems running ZFS on a desktop (with sufficient RAM!). But then again, I'm not doing realtime audio either.

@Shawn8901
Contributor

For me personally, I would not bet on it being exactly that setting.
I noticed sound stuttering in situations of higher IO load combined with higher memory pressure (e.g. having ~30% free memory on the system, which has 32G), which I did not have on my old installation. So that's just an observation on my side, but it sounds similar to what the OP described on the upstream ticket.

But for me the combination of memory pressure AND IO load is needed for it to happen, which differs a bit from the OP's description, or rather adds an additional layer which may be the root cause.

Sadly I haven't yet had the time for a kernel compilation with the suggested settings to check whether that also "fixes" my issue, or whether it's something different, but a system which freaks out on IO when "just" 10G of RAM is left free sounds like a bad experience to me... :)

@bryanasdev000
Member

> For me personally, I would not bet on it being exactly that setting. I noticed sound stuttering in situations of higher IO load combined with higher memory pressure (e.g. having ~30% free memory on the system, which has 32G), which I did not have on my old installation. So that's just an observation on my side, but it sounds similar to what the OP described on the upstream ticket.
>
> But for me the combination of memory pressure AND IO load is needed for it to happen, which differs a bit from the OP's description, or rather adds an additional layer which may be the root cause.
>
> Sadly I haven't yet had the time for a kernel compilation with the suggested settings to check whether that also "fixes" my issue, or whether it's something different, but a system which freaks out on IO when "just" 10G of RAM is left free sounds like a bad experience to me... :)

You can try Liquorix, Xanmod or Zen kernel patch sets to test, they are all available on nixpkgs, note that support from ZFS or NVIDIA may lag a bit.

I run ZFS in some of my machines mainly with Xanmod or Zen and I do not see any lag, only when I am with heavy IO operation on some old HDDs.

@Shawn8901
Contributor

Shawn8901 commented May 9, 2022

You can try Liquorix, Xanmod or Zen kernel patch sets to test, they are all available on nixpkgs, note that support from ZFS or NVIDIA may lag a bit.

As mentioned in my very first post, I did try zen at some point, and it gave a much better experience. But I switched back to stock, as I find config.boot.zfs.package.latestCompatibleLinuxPackages very handy to ensure a compatible kernel is installed, and that is sadly not zen.

How do you ensure that the zfs modules are at a compatible version, or is that something I don't have to care much about?
As I am using ZFS on root, it's crucial that it works in the end.

About the user experience under IO: when scrubbing kicks in (which is of course high IO, and some lag is expected), the system is not even usable any longer, as it's mostly unresponsive.
The zfs pool is hosted on a Samsung SSD 860 EVO, which is not really some old HDD.

Not sure if it's really the same "issue" poelzi described, as this is not really a near-RT scenario, and I don't want to take over the issue with my "problems", but short feedback on how you properly handle matching the kernel to zfs would be welcome.

edit: I now understand how the kernelPackages are tied to the kernel in NixOS, so the question is cleared up

@jcumming
Contributor

jcumming commented Aug 3, 2022

Scheduling under load is a difficult problem to solve.

Can you isolate where the block latency is coming from? iostat -x and zpool iostat -vl are good debugging tools for identifying whether the latency comes from the kernel or the device.
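
A way to run both checks side by side while reproducing the stutter (a sketch; iostat requires the sysstat package):

```shell
# Pool-level waits: disk_wait is roughly device latency, while syncq_wait and
# asyncq_wait are time spent queued inside ZFS (i.e. on the kernel side).
watch -n 2 'zpool iostat -vl'

# Block-layer stats for the underlying disk: look at w_await, f_await, %util.
# Low device-side waits combined with high syncq/asyncq waits suggest the
# bottleneck is above the disk.
iostat -x 2
```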

@Shawn8901
Contributor

Shawn8901 commented Feb 14, 2023

I somehow forgot about the issue and was talking about it today.
I nailed it down quite clearly to write IO, by sending zfs datasets around my network.
When the affected PC is the sender, everything works fine; as soon as it is the receiver, sound sometimes begins to stutter.

The iostats look like the following:

Every 2.0s: zpool iostat -vl                                                                                                                                               pointalpha: Tue Feb 14 18:09:27 2023

                                                       capacity     operations     bandwidth    total_wait     disk_wait    syncq_wait    asyncq_wait  scrub   trim
pool                                                 alloc   free   read  write   read  write   read  write   read  write   read  write   read  write   wait   wait
---------------------------------------------------  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----
rpool                                                 461G   467G     28     82   535K  4.89M    1ms  163ms  384us  754us  197us     1s    4ms   12ms    2ms      -
  ata-Samsung_SSD_860_EVO_1TB_S3Z9NB0K403903D-part2   461G   467G     28     82   535K  4.89M    1ms  163ms  384us  754us  197us     1s    4ms   12ms    2ms      -
---------------------------------------------------  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----
Every 2.0s: iostat -x                                                                                                                                                      pointalpha: Tue Feb 14 18:11:02 2023

Linux 6.1.7-xanmod1 (pointalpha)        02/14/23        _x86_64_        (16 CPU)

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
          19.00    0.01    3.80    0.19    0.00   77.01

Device            r/s     rkB/s   rrqm/s  %rrqm r_await rareq-sz     w/s     wkB/s   wrqm/s  %wrqm w_await wareq-sz     d/s     dkB/s   drqm/s  %drqm d_await dareq-sz     f/s f_await  aqu-sz  %util
sda             30.66    540.40     0.06   0.20    0.45    17.63   84.26   5212.12     0.79   0.93    0.57    61.86    0.00      0.00     0.00   0.00    0.00     0.00    1.21    2.89    0.07   3.56
sr0              0.00      0.00     0.00   0.00    6.00     2.22    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    0.00   0.00
zd0              0.01      0.25     0.00   0.00    0.13    27.62    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    0.00   0.00
zd16             0.01      0.18     0.00   0.00    0.08    24.87    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    0.00   0.00

I am absolutely no expert in reading this, but I see a high f_await in iostat and a high syncq_wait in zpool iostat.
When another machine is the receiver (where I sadly cannot test the behavior, as it's a server and I don't know how to verify it there), those two numbers are a lot lower.
But I am not sure how to interpret the numbers, to be honest.
Going by %util, the device should be chilling.

nazarewk added a commit to nazarewk-iac/nix-configs that referenced this issue Apr 12, 2023
@IvanVolosyuk

> Every 2.0s: zpool iostat -vl                                                                                                                                               pointalpha: Tue Feb 14 18:09:27 2023
> 
>                                                        capacity     operations     bandwidth    total_wait     disk_wait    syncq_wait    asyncq_wait  scrub   trim
> pool                                                 alloc   free   read  write   read  write   read  write   read  write   read  write   read  write   wait   wait
> ---------------------------------------------------  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----
> rpool                                                 461G   467G     28     82   535K  4.89M    1ms  163ms  384us  754us  197us     1s    4ms   12ms    2ms      -
>   ata-Samsung_SSD_860_EVO_1TB_S3Z9NB0K403903D-part2   461G   467G     28     82   535K  4.89M    1ms  163ms  384us  754us  197us     1s    4ms   12ms    2ms      -
> ---------------------------------------------------  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----
>

163 milliseconds to do the write, when the actual drive latency is ~1 millisecond. That's clearly CPU bound. On a non-preemptive kernel all of this runs in high-priority zfs IO threads doing a lot of compression/encryption/checksumming. I wonder what the recordsize, compression algorithm, and encryption settings are for the system? There might not be enough cond_resched() calls in one or more of the corresponding code paths, or the zfs kernel thread priority is too high for the audio thread to be able to preempt it.

Before switching to a preemptive kernel I was playing with zfs module parameters like the following, with inconsistent levels of success:

spl.spl_taskq_thread_bind=0
spl.spl_taskq_thread_priority=0

These should be set at module load, I think. Binding the threads may be especially bad for audio, as usually only one CPU core handles audio interrupts, and if zfs occupies that core and doesn't preempt in time, it will cause stuttering.
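
For reference, a NixOS rendering of those parameters (an untested sketch; boot.extraModprobeConfig writes a modprobe.d fragment, which should apply the options before zfs pulls in spl at boot):

```nix
{
  # Don't bind ZFS taskq threads to specific CPUs, and don't raise their
  # priority -- the settings suggested above to keep audio responsive.
  boot.extraModprobeConfig = ''
    options spl spl_taskq_thread_bind=0
    options spl spl_taskq_thread_priority=0
  '';
}
```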

@Shawn8901
Contributor

Shawn8901 commented Apr 26, 2023

> 163 milliseconds to do the write, when the actual drive latency is ~1 millisecond. That's clearly CPU bound. On a non-preemptive kernel all of this runs in high-priority zfs IO threads doing a lot of compression/encryption/checksumming. I wonder what the recordsize, compression algorithm, and encryption settings are for the system?

 zfs get recordsize,compression,encryption rpool
NAME   PROPERTY     VALUE           SOURCE
rpool  recordsize   128K            default
rpool  compression  zstd            local
rpool  encryption   off             default

It is kept at the defaults, besides using zstd compression.

> Before switching to a preemptive kernel I was playing with zfs module parameters like the following, with inconsistent levels of success:
>
>     spl.spl_taskq_thread_bind=0
>     spl.spl_taskq_thread_priority=0
>
> These should be set at module load, I think. Binding the threads may be especially bad for audio, as usually only one CPU core handles audio interrupts, and if zfs occupies that core and doesn't preempt in time, it will cause stuttering.

I'll try out whether that results in a better experience.

edit:

Most of my bad UX has been resolved by setting:

udev.extraRules =  ''
      ACTION=="add|change", KERNEL=="sd[a-z]*[0-9]*|mmcblk[0-9]*p[0-9]*|nvme[0-9]*n[0-9]*p[0-9]*", ENV{ID_FS_TYPE}=="zfs_member", ATTR{../queue/scheduler}="none"
'';

Which I found here:

EDIT 2: The udev change has been in NixOS since 23.11.

@Artturin
Member

Artturin commented Sep 4, 2023

#250308

coltfred pushed a commit to coltfred/nixpkgs that referenced this issue Sep 13, 2023
vlaci pushed a commit to vlaci/nixpkgs that referenced this issue Sep 17, 2023
@magnetophon
Member

@poelzi Have you tried the proposed solution? Can we close the issue?

@numkem
Contributor

numkem commented Nov 3, 2023

> @poelzi Have you tried the proposed solution? Can we close the issue?

I tried it before it was merged, and even after, and it didn't make a difference for me.

My principal problem is using atuin with a zfs root, where the shell hangs while atuin does an insert into SQLite. It's already been referenced above.

The real-time patch at the top made the most difference, but I still see it from time to time.

@nazarewk
Member

nazarewk commented Nov 3, 2023

This is actually a ZFS bug causing ftruncate hangups (affecting everything that uses sqlite as a database, not just atuin), as noted in atuinsh/atuin#952 (comment), which links to openzfs/zfs#14290.

There are two "fixes" so far:

  1. Put the sqlite database files on tmpfs and synchronize them (with litestream?) to persistent storage, as described in atuinsh/atuin#952 (comment)
  2. Disable sync on the dataset holding the sqlite database: atuinsh/atuin#952 (comment)
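
Fix 2 can be scoped narrowly so that only the dataset holding the databases loses sync guarantees (a sketch; the pool and dataset names are examples, and sync=disabled risks losing the last few seconds of writes on power failure):

```shell
# Create a dedicated dataset for sqlite state and disable synchronous writes
# on it only; the rest of the pool keeps normal sync semantics.
zfs create -o sync=disabled rpool/safe/state
zfs get sync rpool/safe/state   # should report VALUE "disabled", SOURCE "local"
```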

@generic-github-user

I'm experiencing the same issue -- though (as far as I can tell) not only with Atuin, but also Firefox, Konsole, KDE Plasma, etc. For reference, I have compression, deduplication, and encryption all disabled. Neither setting autotrim=on nor adding boot.kernelPackages = config.boot.zfs.package.latestCompatibleLinuxPackages; to my configuration seems to have made any difference. I would be happy to create a new dataset with sync=disabled for specific applications if I could isolate them, but at this point it seems the issue is system-wide.

@generic-github-user

generic-github-user commented Jan 20, 2024

Some other things I have tried (unsuccessfully) -- I'm interested in hearing whether any of these worked for others, and what other options I should consider:

  • adding boot.kernelParams = [ "elevator=none" ];
  • setting sync=disabled for all datasets and rebooting
  • the udev modifications mentioned by Shawn8901
  • setting boot.kernelPackages = pkgs.linuxPackages_zen;
  • switching from KDE plasma to GNOME
  • updating spl_taskq_thread_bind and spl_taskq_thread_priority

At this point I am beginning to doubt that my problem is with ZFS itself (or the desktop environment, for that matter), though I'm not sure where else I should be looking.

@Shawn8901
Contributor

> * the `udev` modifications mentioned by Shawn8901

FYI, in case someone else comes across the udev changes: they have been applied by default since #250308 (#169457 (comment)), which has been in NixOS stable since 23.11. So for that part there should be no need for manual changes.

@generic-github-user

Thanks, I wasn't aware of that. Rather embarrassingly, the main issue for me turned out to be a power-saving setting my laptop had automatically enabled without me noticing. Opening files/applications still lags sometimes, but it usually resolves itself after the first time (so I assume this is caching-related).

@illode

illode commented Feb 25, 2024

> Opening files/applications still lags sometimes, but it usually resolves itself after the first time (so I assume this is caching-related).

That could be https://discourse.nixos.org/t/plasma-emojier-too-slow-episode-iv/40130, so it might be fixed in Plasma 6.

@SuperSandro2000
Member

I've also run into this. It turned out I had battery saver on.
