Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Latency test fails with Linux kernel starting from 5.10.0x #79

Closed
MichalBury opened this issue Apr 25, 2022 · 29 comments
Closed

Latency test fails with Linux kernel starting from 5.10.0x #79

MichalBury opened this issue Apr 25, 2022 · 29 comments

Comments

@MichalBury
Copy link

Hello,

I made some test on Raspberry4 and CM4. I have still XRUN when running latency_test.
I have tried various Raspberry modules, Raspbian, Ubuntu etc. Tryed stock kernels, also preempt kernels.
Can some share good working setup for Low latency usage on Raspberry?

Some info from latest test :

  • ubuntu 21.10
  • stock kernel Linux 5.13.0-1025-raspi aarch64

./run_test.sh S16_LE 48000 2 1

Compiling tools ...
make: Nothing to be done for 'all'.
Creating test file ...
test file created
Creating configuration files ..
net.ipv4.igmp_max_memberships = 66
Starting PTP master ...
Starting AES67 daemon ...
Waiting for PTP slave to sync ...
Starting to record 60 sec from sink ...
Recording raw data '/tmp/sink_test.raw' : Signed 16 bit Little Endian, Rate 48000 Hz, Stereo
Starting to playback test on source ...
Recording to file "test/sink_test.raw" successfull
Test result:
ok
Terminating processes ...
daemon exiting with code: 0

./run_latency_test.sh S16_LE 48000 2 60 128

Compiling tools ...
make: Nothing to be done for 'all'.
Using buffer size of 128 frames
Creating configuration files ..
net.ipv4.igmp_max_memberships = 66
Starting PTP master ...
Starting AES67 daemon ...
Waiting for PTP slave to sync ...
Running 10 secs latency test
Scheduler set to Round Robin with priority 99...
Playback device is hw:RAVENNA
Capture device is hw:RAVENNA
Parameters are 48000Hz, S16_LE, 2 channels, non-blocking mode
Poll mode: no
Loop limit is 2880000 frames, minimum latency = 64, maximum latency = 64
Hardware PCM card 1 'Merging RAVENNA' device 0 subdevice 0
Its setup is:
stream : PLAYBACK
access : RW_INTERLEAVED
format : S16_LE
subformat : STD
channels : 2
rate : 48000
exact rate : 48000 (48000/1)
msbits : 16
buffer_size : 128
period_size : 64
period_time : 1333
tstamp_mode : NONE
tstamp_type : MONOTONIC
period_step : 1
avail_min : 64
period_event : 0
start_threshold : 2147483647
stop_threshold : 128
silence_threshold: 0
silence_size : 0
boundary : 4611686018427387904
appl_ptr : 0
hw_ptr : 0
Hardware PCM card 1 'Merging RAVENNA' device 0 subdevice 0
Its setup is:
stream : CAPTURE
access : RW_INTERLEAVED
format : S16_LE
subformat : STD
channels : 2
rate : 48000
exact rate : 48000 (48000/1)
msbits : 16
buffer_size : 128
period_size : 64
period_time : 1333
tstamp_mode : NONE
tstamp_type : MONOTONIC
period_step : 1
avail_min : 64
period_event : 0
start_threshold : 2147483647
stop_threshold : 128
silence_threshold: 0
silence_size : 0
boundary : 4611686018427387904
appl_ptr : 0
hw_ptr : 0
Trying latency 128 frames, 2666.667us, 2.666667ms (375.0000Hz)
Failure
Playback:
*** frames = 69888 ***
state : XRUN
trigger_time: 5053.170167
tstamp : 0.000000
delay : 0
avail : 64
avail_max : 64
Capture:
*** frames = 69760 ***
state : XRUN
trigger_time: 5053.170169
tstamp : 0.000000
delay : 0
avail : 128
avail_max : 128
Maximum read: 64 frames
Maximum read latency: 1333.333us, 1.333333ms (750.0000Hz)
End to end latency: 6.665 msecs
Terminating processes ...
daemon exiting with code: 0

@bondagit
Copy link
Owner

bondagit commented Apr 26, 2022

something is weird with your execution. From the traces you enclosed you have:
Loop limit is 2880000 frames, minimum latency = 64, maximum latency = 64
whereas is should be:
Loop limit is 2880000 frames, minimum latency = 128, maximum latency = 128
Are you using the latest daemon sources from master branch ?

@MichalBury
Copy link
Author

Yes I have double checked that the latest branch is used, I have simple done
git clone https://github.com/bondagit/aes67-linux-daemon

and runned build.sh

@MichalBury
Copy link
Author

Hello,

i have done one more test with Raspbian 64bit, installed from new SD card. The result is the same "XRUN".
The "Loop limit" is now OK.

Kernel: 5.15.32-v8+ #1538 SMP PREEMPT

./run_test.sh S16_LE 48000 2 1

Creating test file ...
test file created
Creating configuration files ..
net.ipv4.igmp_max_memberships = 66
Starting PTP master ...
Starting AES67 daemon ...
Waiting for PTP slave to sync ...
Starting to record 60 sec from sink ...
Recording raw data '/tmp/sink_test.raw' : Signed 16 bit Little Endian, Rate 48000 Hz, Stereo
Starting to playback test on source ...
Recording to file "test/sink_test.raw" successfull
Test result:
ok
Terminating processes ...
daemon exiting with code: 0

./run_latency_test.sh S16_LE 48000 2 60 128

Compiling tools ...
make: Nothing to be done for 'all'.
Using buffer size of 128 frames
Creating configuration files ..
net.ipv4.igmp_max_memberships = 66
Starting PTP master ...
Starting AES67 daemon ...
Waiting for PTP slave to sync ...
Running 10 secs latency test
Scheduler set to Round Robin with priority 99...
Playback device is hw:RAVENNA
Capture device is hw:RAVENNA
Parameters are 48000Hz, S16_LE, 2 channels, non-blocking mode
Poll mode: no
Loop limit is 2880000 frames, minimum latency = 128, maximum latency = 128
Hardware PCM card 2 'Merging RAVENNA' device 0 subdevice 0
Its setup is:
stream : PLAYBACK
access : RW_INTERLEAVED
format : S16_LE
subformat : STD
channels : 2
rate : 48000
exact rate : 48000 (48000/1)
msbits : 16
buffer_size : 128
period_size : 64
period_time : 1333
tstamp_mode : NONE
tstamp_type : MONOTONIC
period_step : 1
avail_min : 64
period_event : 0
start_threshold : 2147483647
stop_threshold : 128
silence_threshold: 0
silence_size : 0
boundary : 4611686018427387904
appl_ptr : 0
hw_ptr : 0
Hardware PCM card 2 'Merging RAVENNA' device 0 subdevice 0
Its setup is:
stream : CAPTURE
access : RW_INTERLEAVED
format : S16_LE
subformat : STD
channels : 2
rate : 48000
exact rate : 48000 (48000/1)
msbits : 16
buffer_size : 128
period_size : 64
period_time : 1333
tstamp_mode : NONE
tstamp_type : MONOTONIC
period_step : 1
avail_min : 64
period_event : 0
start_threshold : 2147483647
stop_threshold : 128
silence_threshold: 0
silence_size : 0
boundary : 4611686018427387904
appl_ptr : 0
hw_ptr : 0
Trying latency 128 frames, 2666.667us, 2.666667ms (375.0000Hz)
Failure
Playback:
*** frames = 45568 ***
state : XRUN
trigger_time: 2129.030265
tstamp : 0.000000
delay : 0
avail : 64
avail_max : 64
Capture:
*** frames = 45440 ***
state : XRUN
trigger_time: 2129.030266
tstamp : 0.000000
delay : 0
avail : 128
avail_max : 128
Maximum read: 64 frames
Maximum read latency: 1333.333us, 1.333333ms (750.0000Hz)
End to end latency: 7.999 msecs
Terminating processes ...
daemon exiting with code: 0

@bondagit
Copy link
Owner

bondagit commented Apr 27, 2022

you can try to increase the latency, for example to 256 frames:
./run_latency_test.sh S16_LE 48000 2 60 256
If this is not working, you can try to double the latency at each step.

@MichalBury
Copy link
Author

tried up to 2048. still same result.
but I need to stay at 128 to have "end to en latency" max 10ms.

So I'm wondering. Seen post that it's runnig on RPi4 with low latency without problems.
Checked the Kernel setup. It's Preemp, Config_hz=250.

@bondagit
Copy link
Owner

I don't have this board so I cannot test directly.
Can you report the output of:
cat /proc/cpuinfo
and
uname -a

@MichalBury
Copy link
Author

required info:

cat /proc/cpuinfo

processor : 0
BogoMIPS : 108.00
Features : fp asimd evtstrm crc32 cpuid
CPU implementer : 0x41
CPU architecture: 8
CPU variant : 0x0
CPU part : 0xd08
CPU revision : 3

processor : 1
BogoMIPS : 108.00
Features : fp asimd evtstrm crc32 cpuid
CPU implementer : 0x41
CPU architecture: 8
CPU variant : 0x0
CPU part : 0xd08
CPU revision : 3

processor : 2
BogoMIPS : 108.00
Features : fp asimd evtstrm crc32 cpuid
CPU implementer : 0x41
CPU architecture: 8
CPU variant : 0x0
CPU part : 0xd08
CPU revision : 3

processor : 3
BogoMIPS : 108.00
Features : fp asimd evtstrm crc32 cpuid
CPU implementer : 0x41
CPU architecture: 8
CPU variant : 0x0
CPU part : 0xd08
CPU revision : 3

Hardware : BCM2835
Revision : c03140
Serial : 1000000033b333ce
Model : Raspberry Pi Compute Module 4 Rev 1.0

uname -a
Linux aes67test 5.15.32-v8+ #1538 SMP PREEMPT Thu Mar 31 19:40:39 BST 2022 aarch64 GNU/Linux

@bondagit
Copy link
Owner

I don't see anything evident unfortunately. Everything seems ok.

@MichalBury
Copy link
Author

You have written that you runned the test on NanoPi NEO2. Can you share your kernel config and version which was used for test.
thank you.

@bondagit
Copy link
Owner

I used a NanoPI NEO2.

uname -a

Linux nanopineo2 5.8.8-sunxi64 #trunk SMP Sun Sep 13 01:50:54 PDT 2020 aarch64 aarch64 aarch64 GNU/Linux

In enclosed the kernel config I used.
config.gz

@neoscopio
Copy link

I also have this problem with regular desktop computers, I'm guessing something changed in newer kernels that affect latency. Thanks for the config, I will also try on intel and ryzen.

I can't test below 128 frame. Any ideia why? If I specify e.g. 64, the test is still done for 128. Is it an Alsa thing?

Thanks

@bondagit
Copy link
Owner

bondagit commented Apr 29, 2022

Any ideia why? If I specify e.g. 64, the test is still done for 128. Is it an Alsa thing?

The driver works with a tick frame size set by the parameter:
"tic_frame_size_at_1fs": 64
in general on Linux it doesn't make any sense to go below 48: all my attempts to go below 1ms @ 48Khz failed.
Having clarified this we need at least two buffers of tic_frame_size_at_1fs frames each to avoid under-run on the player/capture sides.
So the driver won't allow any buffer size programming below (2 * tic_frame_size_at_1fs), 128 frames in this case.

@bondagit
Copy link
Owner

bondagit commented Apr 29, 2022

If you change "tic_frame_size_at_1fs" to 48 in test/daemon.conf you can run a test with 96 frames and this is the minimum latency I could successfully achieve on a NanoPI NEO2 board.

./run_latency_test.sh S16_LE 48000 2 60 96 
Compiling tools ...
make: Nothing to be done for 'all'.
Using buffer size of 96 frames
Creating configuration files ..
net.ipv4.igmp_max_memberships = 66
Starting PTP master ...
Starting AES67 daemon ...
Waiting for PTP slave to sync ...
Running 10 secs latency test
Scheduler set to Round Robin with priority 99...
Playback device is hw:RAVENNA
Capture device is hw:RAVENNA
Parameters are 48000Hz, S16_LE, 2 channels, non-blocking mode
Poll mode: no
Loop limit is 2880000 frames, minimum latency = 96, maximum latency = 96
Hardware PCM card 1 'Merging RAVENNA' device 0 subdevice 0
Its setup is:
  stream       : PLAYBACK
  access       : RW_INTERLEAVED
  format       : S16_LE
  subformat    : STD
  channels     : 2
  rate         : 48000
  exact rate   : 48000 (48000/1)
  msbits       : 16
  buffer_size  : 96
  period_size  : 48
  period_time  : 1000
  tstamp_mode  : NONE
  tstamp_type  : MONOTONIC
  period_step  : 1
  avail_min    : 48
  period_event : 0
  start_threshold  : 2147483647
  stop_threshold   : 96
  silence_threshold: 0
  silence_size : 0
  boundary     : 6917529027641081856
  appl_ptr     : 0
  hw_ptr       : 0
Hardware PCM card 1 'Merging RAVENNA' device 0 subdevice 0
Its setup is:
  stream       : CAPTURE
  access       : RW_INTERLEAVED
  format       : S16_LE
  subformat    : STD
  channels     : 2
  rate         : 48000
  exact rate   : 48000 (48000/1)
  msbits       : 16
  buffer_size  : 96
  period_size  : 48
  period_time  : 1000
  tstamp_mode  : NONE
  tstamp_type  : MONOTONIC
  period_step  : 1
  avail_min    : 48
  period_event : 0
  start_threshold  : 2147483647
  stop_threshold   : 96
  silence_threshold: 0
  silence_size : 0
  boundary     : 6917529027641081856
  appl_ptr     : 0
  hw_ptr       : 0
Trying latency 96 frames, 2000.000us, 2.000000ms (500.0000Hz)
Success
Playback:
*** frames = 2880096 ***
  state       : RUNNING
  trigger_time: 5438.701330
  tstamp      : 0.000000
  delay       : 96
  avail       : 0
  avail_max   : 48
Capture:
*** frames = 2880000 ***
  state       : RUNNING
  trigger_time: 5438.701334
  tstamp      : 0.000000
  delay       : 0
  avail       : 0
  avail_max   : 48
Maximum read: 48 frames
Maximum read latency: 1000.000us, 1.000000ms (1000.0000Hz)
Playback time = 5438.701330, Record time = 5438.701334, diff = -4
End to end latency: 5.998 msecs
Terminating processes ...
daemon exiting with code: 0

@bondagit
Copy link
Owner

bondagit commented Apr 29, 2022

The reason I decided to set the "tic_frame_size_at_1fs" to 64 in the default configuration was to keep the driver ALSA interface usable via JACK (number of frames between JACK calls must be a power of 2)

@MichalBury
Copy link
Author

I've made a test on my Server with 2 x Xeon L5640 (24 Cores).
It's running Ubuntu 20.04 LTS with Kernel 5.4.0-109-generic.
Compiled and runned Latency test. Everything was OK.

The I have installed latest released kernel for this system 5.13.0-41-generic.
Recompiled the project.
Now I have "XRUN" on latency test.

So it seems that something major has changed in kernel.
Will try to compile 5.8.8 Kernel version for Raspberry and make a test.

@bondagit
Copy link
Owner

bondagit commented May 1, 2022

Thanks for sharing. This is interesting information.

@bondagit
Copy link
Owner

bondagit commented May 2, 2022

I've made a test on my Server with 2 x Xeon L5640 (24 Cores). It's running Ubuntu 20.04 LTS with Kernel 5.4.0-109-generic. Compiled and runned Latency test. Everything was OK.

The I have installed latest released kernel for this system 5.13.0-41-generic. Recompiled the project. Now I have "XRUN" on latency test.

So it seems that something major has changed in kernel. Will try to compile 5.8.8 Kernel version for Raspberry and make a test.

This is confirmed also on my N40 mini PC with Intel® Celeron® Processor N4020 , 2 Cores/2 Threads:
the latency tests fail systematically with both kernel 5.11.0-49-generic and 5.11.0-49-lowlatency and they work instead with kernel 5.8.9-050809-generic.

@MichalBury
Copy link
Author

I have made the same test on Raspberry PI4 with three different stock kernels.
5.15.32 - latency test XRUN
5.10.17 - latency test XRUN
5.4.79 - latency test OK

So now I need to find precisely from which Kernel version it's not running properly.

@bondagit
Copy link
Owner

bondagit commented May 3, 2022

it worked for me with last kernel of 5.9.x.
So my guess is that the problem started at some point of 5.10.x. When we find where the problem started it should be easier to find the root cause of the issue.

@neoscopio
Copy link

I'm relative new to the Pi when it comes to compile or change the kernel. I'm a bit embarrassed, but I don't know how to change the kernel version. Regular apt install throws an error about creating links, same kernel gets loaded. Can someone point me to a tutorial? Do I really need to change the kernel by mounting the SD card in another computer?

Thanks

@neoscopio
Copy link

neoscopio commented May 4, 2022

Btw: I've tryed mitigations=off kernel flag on desktop, but stil got xruns for frames =< 2048.

@bondagit
Copy link
Owner

bondagit commented May 4, 2022

Can someone point me to a tutorial?

in general it depends on the distro you are using if they offer some easy way to change kernel.
For example for Ubuntu you can use: https://www.how2shout.com/linux/how-to-change-default-kernel-in-ubuntu-22-04-20-04-lts/

@neoscopio
Copy link

For example for Ubuntu you can use: https://www.how2shout.com/linux/how-to-change-default-kernel-in-ubuntu-22-04-20-04-lts/

Thanks, my problem is that is not working for the Raspberry Pi. No grub, for one. Nm, I'll get around it.

@bondagit
Copy link
Owner

bondagit commented May 7, 2022

I did some more tests and the XRUN issue starts from kernel 5.10.0 on. The issue happens for both X86 and ARM platforms. For this reason, for the time being, I suggest using a kernel version <= 5.9.x

@bondagit
Copy link
Owner

bondagit commented May 7, 2022

Thanks, my problem is that is not working for the Raspberry Pi. No grub, for one. Nm, I'll get around it.

yes, usually on ARM boards in addition to the kernel and the headers you have to install the kernel dtb too.
After these you cloud also have to update your bootloader configuration (usually U-boot).
For armbian you can find instruction here: https://www.armbian.com/nanopi-neo-2/

@bondagit bondagit changed the title XRUN on Raspberry4 / CM4 Latency test fails with Linux kernel starting from 5.10.0x May 18, 2022
@ArrEssJay
Copy link

ArrEssJay commented Oct 10, 2022

Another data point: If I build the upstream driver (master) with this patch only:

https://bitbucket.org/MergingTechnologies/ravenna-alsa-lkm/pull-requests/5

Running it in combination with their daemon, I don't hear the glitching due to this latency issue on Kernel 5.18. There seems to be incompatibility in the API which prevents that build of the driver being tested with the daemon from this repo:

http_server:: GET /cometd/ response 404

I do hear the issue running the driver/daemon built from this repo. I can see some additional driver code changes upstream that I suspect would be worth investigating the relevance of.

@bondagit
Copy link
Owner

The problem reproduces on multiple audio cards by using the original latency application in alsa-lib.
So it doesn't seem to be related to the RAVENNA driver.

@bondagit
Copy link
Owner

bondagit commented Nov 23, 2022

perexg commented 11 minutes ago

I think that I found the upstream change between 5.9 and 5.10 kernels in the scheduler which causes this behavior:
https://lore.kernel.org/lkml/c596a06773658d976fb839e02843a459ed4c2edf.1479204252.git.bristot@redhat.com/
See the patch description. You can set the old behaviour using this command:
sysctl -w kernel.sched_rt_runtime_us=1000000
Note that this command may be dangerous - see the patch description. The task can eat whole CPU preventing to run other tasks.

@bondagit
Copy link
Owner

bondagit commented Nov 23, 2022

I will run some more tests and add this setting to the documentation and scripts.
On kernel >= 5.10.x use the following setting to restore old kernel scheduler behaviour:

sysctl -w kernel.sched_rt_runtime_us=1000000

bondagit added a commit that referenced this issue Nov 25, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants