
starting from kernel 5.10.x latency test fails systematically on multiple platforms #285

Open
bondagit opened this issue Nov 23, 2022 · 13 comments
Labels
wontfix This will not be worked on

Comments

@bondagit

bondagit commented Nov 23, 2022

If I run the latency test application (test/latency.c) with a Linux kernel 5.9.x and alsa-lib v1.2.4 on the following audio card (for example):
PCH [HDA Intel PCH], device 1: ALC269VC Analog [ALC269VC Analog]
I get the following result:

sudo ./latency -C hw:1 -P hw:1 -f S16_LE -r 48000 -c 2  -s 10 -m 128 -M 128 
Scheduler set to Round Robin with priority 99...
Playback device is hw:1
Capture device is hw:1
Parameters are 48000Hz, S16_LE, 2 channels, non-blocking mode
Poll mode: no
Loop limit is 480000 frames, minimum latency = 128, maximum latency = 128
Hardware PCM card 1 'HDA Intel PCH' device 0 subdevice 0
Its setup is:
  stream       : PLAYBACK
  access       : RW_INTERLEAVED
  format       : S16_LE
  subformat    : STD
  channels     : 2
  rate         : 48000
  exact rate   : 48000 (48000/1)
  msbits       : 16
  buffer_size  : 128
  period_size  : 64
  period_time  : 1333
  tstamp_mode  : NONE
  tstamp_type  : MONOTONIC
  period_step  : 1
  avail_min    : 64
  period_event : 0
  start_threshold  : 2147483647
  stop_threshold   : 128
  silence_threshold: 0
  silence_size : 0
  boundary     : 4611686018427387904
  appl_ptr     : 0
  hw_ptr       : 0
Hardware PCM card 1 'HDA Intel PCH' device 0 subdevice 0
Its setup is:
  stream       : CAPTURE
  access       : RW_INTERLEAVED
  format       : S16_LE
  subformat    : STD
  channels     : 2
  rate         : 48000
  exact rate   : 48000 (48000/1)
  msbits       : 16
  buffer_size  : 128
  period_size  : 64
  period_time  : 1333
  tstamp_mode  : NONE
  tstamp_type  : MONOTONIC
  period_step  : 1
  avail_min    : 64
  period_event : 0
  start_threshold  : 2147483647
  stop_threshold   : 128
  silence_threshold: 0
  silence_size : 0
  boundary     : 4611686018427387904
  appl_ptr     : 0
  hw_ptr       : 0
Trying latency 128 frames, 2666.667us, 2.666667ms (375.0000Hz)
Success
Playback:
*** frames = 480128 ***
  state       : RUNNING
  trigger_time: 1907393.672215
  tstamp      : 0.000000
  delay       : 80
  avail       : 48
  avail_max   : 64
Capture:
*** frames = 480000 ***
  state       : RUNNING
  trigger_time: 1907393.672215
  tstamp      : 0.000000
  delay       : 0
  avail       : 0
  avail_max   : 16
Maximum read: 16 frames
Maximum read latency: 333.333us, 0.333333ms (3000.0000Hz)
Hardware sync
Playback time = 1907393.672215, Record time = 1907393.672215, diff = 0

The tests succeeded and final state is RUNNING

If I run the same test on the same audio device and distribution but with a Linux kernel >= 5.10.x, it fails systematically and I get:

sudo ./latency -C hw:1 -P hw:1 -f S16_LE -r 48000 -c 2 -s 10 -m 128 -M 128
Scheduler set to Round Robin with priority 99...
Playback device is hw:0
Capture device is hw:0
Parameters are 48000Hz, S16_LE, 2 channels, non-blocking mode
Poll mode: no
Loop limit is 480000 frames, minimum latency = 128, maximum latency = 128
Hardware PCM card 0 'HDA Intel PCH' device 0 subdevice 0
Its setup is:
  stream       : PLAYBACK
  access       : RW_INTERLEAVED
  format       : S16_LE
  subformat    : STD
  channels     : 2
  rate         : 48000
  exact rate   : 48000 (48000/1)
  msbits       : 16
  buffer_size  : 128
  period_size  : 64
  period_time  : 1333
  tstamp_mode  : NONE
  tstamp_type  : MONOTONIC
  period_step  : 1
  avail_min    : 64
  period_event : 0
  start_threshold  : 2147483647
  stop_threshold   : 128
  silence_threshold: 0
  silence_size : 0
  boundary     : 4611686018427387904
  appl_ptr     : 0
  hw_ptr       : 0
Hardware PCM card 0 'HDA Intel PCH' device 0 subdevice 0
Its setup is:
  stream       : CAPTURE
  access       : RW_INTERLEAVED
  format       : S16_LE
  subformat    : STD
  channels     : 2
  rate         : 48000
  exact rate   : 48000 (48000/1)
  msbits       : 16
  buffer_size  : 128
  period_size  : 64
  period_time  : 1333
  tstamp_mode  : NONE
  tstamp_type  : MONOTONIC
  period_step  : 1
  avail_min    : 64
  period_event : 0
  start_threshold  : 2147483647
  stop_threshold   : 128
  silence_threshold: 0
  silence_size : 0
  boundary     : 4611686018427387904
  appl_ptr     : 0
  hw_ptr       : 0
Trying latency 128 frames, 2666.667us, 2.666667ms (375.0000Hz)
Failure
Playback:
*** frames = 90720 ***
  state       : XRUN
  trigger_time: 967.658489
  tstamp      : 0.000000
  delay       : 0
  avail       : 216
  avail_max   : 216
Capture:
*** frames = 90592 ***
  state       : XRUN
  trigger_time: 967.658489
  tstamp      : 0.000000
  delay       : 0
  avail       : 88
  avail_max   : 88
Maximum read: 16 frames
Maximum read latency: 333.333us, 0.333333ms (3000.0000Hz)
Hardware sync

I get an XRUN during the test execution.
I can reproduce the same error on multiple boards, with multiple audio cards, and on different Linux distributions just by changing the Linux kernel version.
The problem was originally reported at:
AES67 Linux daemon: Latency test fails with Linux kernel starting from 5.10.0x

I am interested in testing with low latency (-m 128 -M 128).
My guess is that something changed in the Linux audio core and the latency application has to be adapted.
Any idea?

@perexg
Member

perexg commented Nov 23, 2022

It may not be an audio issue at all. Some other task or kernel driver may be eating CPU time, preventing low-latency operation. From the result:

   avail       : 216
   avail_max   : 216

It means that the playback is missing 88 samples (216-128) at the under-run check.

You can trace the syscalls using strace, e.g. strace -r -s 256 -o trace.txt ./latency -C hw:2 -P hw:2 -f S16_LE -r 48000 -c 2 -m 128 -M 128. With -X:

     0.000027 write(1, "[146516us] POS: p=72 (min=40, rt=817us) c=8 (max=40, rt=-183us)\n", 64) = 64
     0.000030 ioctl(5, SNDRV_PCM_IOCTL_READI_FRAMES, 0x7ffcc4c7c000) = 0
     0.000029 ioctl(4, SNDRV_PCM_IOCTL_WRITEI_FRAMES, 0x7ffcc4c7bfd0) = 0
     0.000026 ioctl(5, SNDRV_PCM_IOCTL_HWSYNC, 0) = 0
     0.000029 ioctl(4, SNDRV_PCM_IOCTL_HWSYNC, 0x1) = 0
     0.000028 write(1, "[146660us] POS: p=72 (min=40, rt=840us) c=8 (max=40, rt=-160us)\n", 64) = 64
     0.000030 ioctl(5, SNDRV_PCM_IOCTL_READI_FRAMES, 0x7ffcc4c7c000) = 0
     0.000097 ioctl(4, SNDRV_PCM_IOCTL_WRITEI_FRAMES, 0x7ffcc4c7bfd0) = 0
     0.000027 ioctl(5, SNDRV_PCM_IOCTL_HWSYNC, 0) = 0
     0.000136 ioctl(4, SNDRV_PCM_IOCTL_HWSYNC, 0x1) = 0
     0.000270 write(1, "[147220us] POS: p=64 (min=40, rt=613us) c=8 (max=40, rt=-553us)\n", 64) = 64
     0.000059 ioctl(5, SNDRV_PCM_IOCTL_READI_FRAMES, 0x7ffcc4c7c000) = 0
     0.000042 ioctl(4, SNDRV_PCM_IOCTL_WRITEI_FRAMES, 0x7ffcc4c7bfd0) = 0
     0.000054 ioctl(5, SNDRV_PCM_IOCTL_HWSYNC, 0) = 0
*    0.000871 ioctl(4, SNDRV_PCM_IOCTL_HWSYNC, 0x1) = 0         
     0.000061 write(1, "[148305us] POS: p=16 (min=16, rt=695us) c=32 (max=40, rt=-972us)\n", 65) = 65
     0.000074 ioctl(5, SNDRV_PCM_IOCTL_READI_FRAMES, 0x7ffcc4c7c000) = 0
     0.000101 ioctl(4, SNDRV_PCM_IOCTL_WRITEI_FRAMES, 0x7ffcc4c7bfd0) = -1 EPIPE (Broken pipe)
Without -X:
     0.000020 ioctl(5, SNDRV_PCM_IOCTL_READI_FRAMES, 0x7ffda8f7c9b0) = 0
     0.000025 ioctl(4, SNDRV_PCM_IOCTL_WRITEI_FRAMES, 0x7ffda8f7c980) = 0
     0.000019 ioctl(5, SNDRV_PCM_IOCTL_READI_FRAMES, 0x7ffda8f7c9b0) = -1 EAGAIN (Resource temporarily unavailable)
     0.000022 ioctl(5, SNDRV_PCM_IOCTL_READI_FRAMES, 0x7ffda8f7c9b0) = -1 EAGAIN (Resource temporarily unavailable)
     0.000023 ioctl(5, SNDRV_PCM_IOCTL_READI_FRAMES, 0x7ffda8f7c9b0) = -1 EAGAIN (Resource temporarily unavailable)
     0.000022 ioctl(5, SNDRV_PCM_IOCTL_READI_FRAMES, 0x7ffda8f7c9b0) = -1 EAGAIN (Resource temporarily unavailable)
     0.000022 ioctl(5, SNDRV_PCM_IOCTL_READI_FRAMES, 0x7ffda8f7c9b0) = -1 EAGAIN (Resource temporarily unavailable)
*    0.001841 ioctl(5, SNDRV_PCM_IOCTL_READI_FRAMES, 0x7ffda8f7c9b0) = 0
     0.000081 ioctl(4, SNDRV_PCM_IOCTL_WRITEI_FRAMES, 0x7ffda8f7c980) = -1 EPIPE (Broken pipe)

Legend: I marked the scheduling time gaps with *.

Perhaps it may be worth testing with the vanilla kernel and bisecting to the commit which affects this behavior. Another way is to do kernel profiling.
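
For illustration, a kernel bisect between the two versions could look like this sketch (it assumes a mainline kernel git checkout; the build, install and boot steps depend on your setup):

    git bisect start
    git bisect bad v5.10      # first version where the latency test fails
    git bisect good v5.9      # last version where it still passes
    # for each step: build and boot the proposed kernel, run the latency
    # test, then mark the result until git names the first bad commit
    git bisect good           # or: git bisect bad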

@bondagit
Author

bondagit commented Nov 23, 2022

Thanks for the info.
I can reproduce this by simply changing the kernel from any version <= 5.9.x to any version >= 5.10.x:
nothing else changes, just the kernel.
I'm using the precompiled kernels from the Ubuntu distro (with very similar configurations), and the problem reproduces systematically on both x86_64 and ARM64 platforms.
Other people using various Linux distributions have reported the same, so my guess is that the problem is related to some kernel change introduced in 5.10.x that impacts the low-latency behaviour.

@perexg
Member

perexg commented Nov 23, 2022

I did some debugging using ftrace (trace-cmd) and it really seems that the system scheduler blocks the latency task in favour of a kworker task:

    latency-7395  [002] 22719.778930: hwptr:                pcmC0D0c/sub0: POS: pos=16, old=26768, base=26752, period=64, buf=128
    latency-7395  [002] 22719.778932: hwptr:                pcmC0D0c/sub0: POS: pos=16, old=26768, base=26752, period=64, buf=128
    latency-7395  [002] 22719.778934: hwptr:                pcmC0D0c/sub0: POS: pos=16, old=26768, base=26752, period=64, buf=128
    latency-7395  [002] 22719.778937: hwptr:                pcmC0D0c/sub0: POS: pos=48, old=26768, base=26752, period=64, buf=128
    latency-7395  [002] 22719.778938: applptr:              pcmC0D0c/sub0: prev=26768, curr=26800, avail=0, period=64, buf=128
    latency-7395  [002] 22719.778941: hwptr:                pcmC0D0p/sub0: POS: pos=96, old=26816, base=26752, period=64, buf=128
    latency-7395  [002] 22719.778942: applptr:              pcmC0D0p/sub0: prev=26896, curr=26928, avail=48, period=64, buf=128
    latency-7395  [002] 22719.778949: sched_switch:         latency:7395 [0] S ==> kworker/2:1:5386 [120]
     <idle>-0     [007] 22719.778971: irq_handler_entry:    irq=182 name=snd_hda_intel:card0
     <idle>-0     [007] 22719.779013: hwptr:                pcmC0D0c/sub0: IRQ: pos=48, old=26800, base=26752, period=64, buf=128
     <idle>-0     [007] 22719.779070: irq_handler_exit:     irq=182 ret=handled
     <idle>-0     [007] 22719.779073: irq_handler_entry:    irq=182 name=snd_hda_intel:card0
     <idle>-0     [007] 22719.779084: irq_handler_exit:     irq=182 ret=unhandled
kworker/2:1-5386  [002] 22719.779162: sched_wakeup:         latency:7395 [0] CPU:002
     <idle>-0     [007] 22719.780474: irq_handler_entry:    irq=182 name=snd_hda_intel:card0
     <idle>-0     [007] 22719.780524: sched_wakeup:         kworker/7:1:5655 [120] CPU:007
     <idle>-0     [007] 22719.780567: hwptr:                pcmC0D0p/sub0: IRQ: pos=48, old=26848, base=26752, period=64, buf=128
     <idle>-0     [007] 22719.780567: xrun:                 pcmC0D0p/sub0: XRUN: old=26928, base=26880, period=64, buf=128
     <idle>-0     [007] 22719.780699: irq_handler_exit:     irq=182 ret=handled
     <idle>-0     [007] 22719.780701: irq_handler_entry:    irq=182 name=snd_hda_intel:card0
     <idle>-0     [007] 22719.780729: irq_handler_exit:     irq=182 ret=unhandled
     <idle>-0     [007] 22719.780736: sched_switch:         swapper/7:0 [120] R ==> kworker/7:1:5655 [120]
kworker/7:1-5655  [007] 22719.780744: sched_switch:         kworker/7:1:5655 [120] W ==> swapper/7:0 [120]
kworker/2:1-5386  [002] 22719.790242: sched_switch:         kworker/2:1:5386 [120] R ==> latency:7395 [0]
    latency-7395  [002] 22719.790455: hw_mask_param:        pcmC0D0p:0 000/025 ACCESS 0000000000000000ffffffffffffffff 00000000000000000000000000000009
    latency-7395  [002] 22719.790455: hw_mask_param:        pcmC0D0p:0 000/025 FORMAT 0000000000000000ffffffffffffffff 00000000000000000000000000000404
    latency-7395  [002] 22719.790456: hw_mask_param:        pcmC0D0p:0 000/025 SUBFORMAT 0000000000000000ffffffffffffffff 00000000000000000000000000000001

As you can see, the latency task was scheduled out at 22719.778949 and the system gave the CPU back at 22719.790242, so the time difference is 0.011293 seconds (~11.3ms). @tiwai: FYI - could you check my interpretation, please?

It really looks like a problem in the Linux scheduler or the kworker task.

@perexg
Member

perexg commented Nov 23, 2022

I think I found the upstream scheduler change between the 5.9 and 5.10 kernels which causes this behavior:

https://lore.kernel.org/lkml/c596a06773658d976fb839e02843a459ed4c2edf.1479204252.git.bristot@redhat.com/

See the patch description. You can restore the old behaviour with this command:

sysctl -w kernel.sched_rt_runtime_us=1000000

Note that this command may be dangerous - see the patch description. A task can then eat the whole CPU, preventing other tasks from running.
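
A small sketch for checking and persisting the setting (the file name under /etc/sysctl.d is a placeholder of my choosing):

    # read the current value; the kernel default is 950000, i.e. realtime
    # tasks may use at most 95% of each 1000000us sched_rt_period_us window
    sysctl kernel.sched_rt_runtime_us
    # allow realtime tasks the whole period for this boot (runtime == period)
    sysctl -w kernel.sched_rt_runtime_us=1000000
    # persist across reboots (placeholder file name)
    echo 'kernel.sched_rt_runtime_us = 1000000' > /etc/sysctl.d/90-rt-runtime.conf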

@perexg added the wontfix label Nov 23, 2022
@perexg
Member

perexg commented Nov 23, 2022

For archival purposes, the trace-cmd commands:

LATENCY=128 ; USLEEP=200
trace-cmd start -b 10000 -e 'sched_wakeup*' -e sched_switch -e gpio_value -e irq_handler_entry -e irq_handler_exit -e snd_pcm \
/home/redhat/git/alsa-lib/test/.libs/latency -C hw:0 -P hw:0 -f S16_LE -r 48000 -c 2 -s 10 -m "${LATENCY}" -M "${LATENCY}" -y "${USLEEP}"
trace-cmd stop
trace-cmd extract

Optionally (to check workqueues):

trace-cmd start -b 10000 -e 'sched_wakeup*' -e sched_switch -e workqueue ...

Use kernelshark or trace-cmd report to analyze the trace.dat file.
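
For example, run in the directory containing the extracted trace.dat:

    trace-cmd report | less
    # or open it in the GUI
    kernelshark trace.dat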

@bondagit
Author

Great catch! This seems to work for me.
I will test it on some more platforms and report back here.
Thanks a lot for the prompt support!

@tiwai
Contributor

tiwai commented Nov 24, 2022

Would a different RT setup in the latency program change the behavior? It sounds like a kernel regression if it shows such a latency even with the highest RT priority.

@perexg
Member

perexg commented Nov 24, 2022

Would a different RT setup in the latency program change the behavior? It sounds like a kernel regression if it shows such a latency even with the highest RT priority.

latency.c sets the scheduler to round-robin (SCHED_RR) with the maximal priority 99. The mentioned kernel patch allows even tasks with this highest priority to be interrupted by non-realtime tasks, and it seems to do so in a large time window (from the realtime perspective) - tens of milliseconds. From my analysis, the workqueue tasks are fine (up to 1ms), but the scheduler wakes the latency task (a busy loop) too late.
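
For reference, a minimal sketch of that setup, modelled on the "Scheduler set to Round Robin with priority 99..." message in the logs above (details may differ from the actual latency.c code):

    #include <sched.h>
    #include <stdio.h>

    /* Request SCHED_RR at the maximum priority for the current process,
     * as the latency test does before streaming starts. */
    static void setscheduler(void)
    {
            struct sched_param sched_param;

            if (sched_getparam(0, &sched_param) < 0) {
                    printf("Scheduler getparam failed\n");
                    return;
            }
            sched_param.sched_priority = sched_get_priority_max(SCHED_RR);
            if (!sched_setscheduler(0, SCHED_RR, &sched_param)) {
                    printf("Scheduler set to Round Robin with priority %i...\n",
                           sched_param.sched_priority);
                    return;
            }
            printf("Scheduler set to Round Robin with priority %i FAILED\n",
                   sched_param.sched_priority);
    }

    int main(void)
    {
            setscheduler();  /* needs root or CAP_SYS_NICE to succeed */
            return 0;
    }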

I added a new option to the latency test code to allow using SCHED_FIFO, but the results are not good either. It seems that the scheduler penalty for busy-loop programs is high in current kernels, and the scheduler interrupts this busy task rather than using "free" CPUs.

For reference (pipewire discussion): https://gitlab.freedesktop.org/pipewire/pipewire/-/issues/1271

@bondagit
Author

bondagit commented Nov 24, 2022

Would a different RT setup in the latency program change the behavior? It sounds like a kernel regression if it shows such a latency even with the highest RT priority.

On the platforms I am currently testing, if I change the latency application to skip the setscheduler() call entirely, the latency app works fine on kernels >= 5.10.x (even without setting the kernel.sched_rt_runtime_us param).
The final behaviour with the new kernel still looks unexpected to me.

With a kernel <= 5.9.x I see the opposite behaviour: if I skip the setscheduler() call, the latency test fails.

@perexg
Member

perexg commented Nov 24, 2022

Wow, I did not expect that. It seems that SCHED_OTHER is better than FIFO/RR with the current kernels. I added SCHED_OTHER policy support to latency.c - -O other (75f8e2e) - so the policy can now be selected without modifying the code.
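
For example, to run the same low-latency test with the new policy option (devices and parameters as in the runs above; the flag syntax is taken from the commit referenced here):

    sudo ./latency -C hw:0 -P hw:0 -f S16_LE -r 48000 -c 2 -s 10 -m 128 -M 128 -O other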

@bondagit
Author

bondagit commented Nov 24, 2022

It sounds OK to me.
Just to summarise: according to my tests, SCHED_OTHER is required to make the latency test work with a kernel >= 5.10.x, and SCHED_RR is required with a kernel <= 5.9.x.

@bondagit
Author

bondagit commented Nov 24, 2022

In addition to allowing the user to specify the scheduler policy, could the latency application apply a default policy according to the above findings?

@perexg
Member

perexg commented Nov 24, 2022

I think it would be bad to change the application defaults according to the kernel version. Also, it's clearly a kernel scheduler problem, which should be reported to LKML (the Linux kernel mailing list) and the scheduler maintainers - see https://github.com/torvalds/linux/blob/c3eb11fbb826879be773c137f281569efce67aa8/MAINTAINERS#L18348 .
