occasional drops detected in output RTP stream #17
Hi @jtc-dolby,
Also, at a frequency of 250Hz the problem doesn't show up.
You said that the problem doesn't show up with 250Hz, but this is the default frequency in the kernel config file. If this is true, why would you expect a difference in behaviour with 100Hz? I will try 100Hz anyway and see if it changes anything.
I tested with 100Hz. Not good news: I got drops using aplay both with and without the -M option (memory-mapped mode). (Previously I got drops only with the -M option.) The drops were much more frequent with the -M option: 43 vs 5 drops over a 45-minute period.
Try:
Also you might want to try passing
…ode on the network loopback interface. This can be used to debug issue #17.
I have finally found some time to continue debugging this issue, and I have just pushed to the branch test_issue_17 the files required to run a test on the loopback network interface and to verify the integrity of the recorded WAV file.
To verify that the recording is correct:
The expected output is "ok"; otherwise the program prints out the file position where the corruption was detected.
Thanks for the new information. It’s been a while since I looked at it, so I will check whether it is still an issue on my end. The way I was testing was to use ‘aplay’ to transmit a known sequence and then record the output. I was seeing errors with the ‘-M’ switch, i.e. memory-mapped mode, but not in regular mode. I can repeat this test for you if that will help.
From: Andrea Bondavalli
Date: Tuesday, January 19, 2021 at 11:44 AM
Subject: Re: [bondagit/aes67-linux-daemon] occasional drops or stream realignment detected in output RTP stream (#17)
I have finally found some time to continue debugging this issue, and I have just pushed to the branch test_issue_17 the files required to run a test on the loopback network interface and to verify the integrity of the recorded WAV file.
In the test I configure a single 48kHz L24 stereo source and the corresponding sink, then play back and record on the loopback network interface a sample WAV file created artificially to enable simple detection of corruption in the recording.
To run the test and create the recording:
./run_test.sh
To verify that the recording is correct:
cd test
c++ check.cc -o check
./check
The expected output is "ok"; otherwise the program prints out the file position where the corruption was detected.
I have run the test successfully many times on both ARM and x86_64 platforms and I never had a single invalid recording.
In my tests I have tried both memory-mapped and read/write interleaved modes.
hi @jtc-dolby, great to hear from you again ;-)
Thanks for the kind words. Yes, I’m still here ☺ I’ll try running the test as you suggest. For your reference, I use a Lawo receiver with 10ms latency.
From: Andrea Bondavalli
Date: Tuesday, January 19, 2021 at 12:03 PM
Subject: Re: [bondagit/aes67-linux-daemon] occasional drops or stream realignment detected in output RTP stream (#17)
hi @jtc-dolby, great to hear from you again ;-)
Can you just try to run the simple test on the test_issue_17 branch? I'd like to hear the results. Thanks.
Anyway, this is not a conclusive analysis, and I am going to perform more device-to-device (not loopback) tests.
After additional tests on the x86 platform, I can reproduce a corruption of the output file by changing the driver base timer period to 48 ("tic_frame_size_at_1fs": 48) in the daemon configuration file used by the test (test/daemon.conf). This was set to 192 in my commit.
I have executed 'run_test.sh' on the issue 17 branch several times without making any changes, and the checker returns OK. Should I try again with tic set to 48 instead of 192?
I ran the tests again with TIC=48 three times and it was OK every time.
Hi @bondagit, is it possible to change the delay setting from 2ms to 5ms?
I don't know how to do this, check with Audinate.
What platform are you using for the tests? What Linux kernel version?
Hi @jtc-dolby
So your platform looks OK, or at least it's OK with a 4-minute recording. We should probably test it for a longer period of time.
I did see drops fairly frequently (every minute or so). I don't have a Dante AVIO device, but the Lawo devices I do have gather statistics regarding late packets. I can use the same build on the same machine as I just ran the tests and see what happens. Is it possible there are differences between stream type or codec? My focus was AM824 because it carries data covered by a CRC, so corruption is very apparent. I can put data inside L16/L24 for testing purposes.
Testing issue 17 with "tic_frame_size_at_1fs": 48 on various Raspberry Pis:
- Device A (10.1.14.120)
- Device B (10.1.14.77)
- Device C (10.1.14.55)
- Device D (10.1.14.92)
Great job, thank you. The results were as expected, since I was able to get it to work correctly even on a board with an ARM SoC older than the ones you tested.
Thanks, I think this should help to narrow down the issue: if you have late packets, your receiver device may be dropping them (replacing the late samples with silence).
Yes, a different codec can make a difference. For this reason I was also thinking of enhancing this test suite to allow for different configurations. In the meantime, you could try modifying my test to play an AM824 (L32) file and record an L32 WAV file. Afterwards you should be able to use the CRC to detect registration errors instead of relying on the sample sequence as I did.
To run the test suite:
./run_test.sh sample_format sample_rate channels duration
Where:
- sample_format can be one of S16_LE, S24_3LE, S32_LE
- sample_rate can be one of 44100, 48000, 96000
- channels can be one of 1, 2, 4
- duration is in the range 1 to 10 minutes
The test suite creates a raw file with the specified parameters, runs a loopback test where the file gets played and recorded using the loopback network interface, and checks that the recorded file, after the initial silence, contains the expected sample sequence.
This test was developed to further investigate issue #17.
Hi @jtc-dolby,
On x86_64 with tic_frame_size_at_1fs set to 48 in test/daemon.conf, I have the following results:
So, according to my tests, the sample format doesn't affect the test result. One possible reason is that the sample size of the driver's internal playback and capture buffers is always set to 32 bits. Can you repeat these 3 tests using your platform? Thanks.
…al playback buffer corruption when the read/write interleaved mode is used. The problem arises from pre-buffering performed after ALSA prepare and before the ALSA start trigger. The patch enables the early startup of the audio playback interrupt and disables the cleanup (mute) of the playback buffer. This issue can be reproduced using the test suite developed to investigate #17 by removing the (-M) option from aplay in the run_test.sh script.
I have just found another issue in the driver. This time it's related to playback in read/write interleaved mode, and it affects the initial samples of the output stream.
Hi @jtc-dolby, did you have the chance to run the new tests? Thanks
I ran the test. Results below. I didn't reinstall the driver between tests but just edited the conf file and reran the script. Is that correct? Here are my results from the tests:
Yes, that's correct, because the latest test script uninstalls and reinstalls the driver at every run.
OK. I think that is a reasonable theory. My machine is pretty old and 1ms is quite tight. I'd like to prove it by increasing the latency and making the problems go away. 192 (4ms) certainly looks better. Can I go higher? How about 480 (10ms)?
Yes, I have successfully tried raising it up to and including 480 (10ms). I think even more should be possible.
This was not included at the time I ran the tests my side. I will revise it as soon as I have time. |
Hi @jtc-dolby,
Hi Andrea, tests made:
tic_frame_size_at_1fs set to 48:
tic_frame_size_at_1fs set to 96:
tic_frame_size_at_1fs set to 192:
I made these tests on: Machine 1:
Machine 2
Machine 3
All results have been OK on all machines, except when I do "stressing work" during the test (compiling heavy code or transcoding videos...). Repeating the tests in an idle state always resulted OK. If it could be of any interest, I can share my .config (or part of it) for my kernels. If you need more info or tests, please let me know. Cheers
Hi Guido,
- removed demo
- updated documentation with info about the platform compatibility test
As reported by @jtc-dolby we have an issue with the ALSA RAVENNA driver causing occasional drops or stream realignment in output RTP stream.
It's possible to detect the problem by playing a low-frequency (e.g. 100Hz) sine wave on the RAVENNA device and checking the resulting recording on another device.
The issue can be reproduced with both memory mapped and read/write interleaved modes.
See enclosed pictures:
According to my tests the problem doesn't show up on the recording side and it seems to affect the playback only.
After a number of tries I could associate the issue with the following debug prints logged by the driver (see lines in bold):
[432559.316518] entering mr_alsa_audio_pcm_trigger (substream name=subdevice #0 #0) ...
[432559.316521] mr_alsa_audio_pcm_trigger(Start), rate=48000 format=32 channels=2 period_size=48
[432559.316523] starting playback I/O
[432559.323410] LastTICCounter = 2636220 ui64TICCounter = 2636227 (Timer period = 69 [100us])
[432559.323427] Timer period out of range: 0 [ms]. Target period = 1
[432559.323428] next_wakeup: 432561021200000 now: 432561021145163
[432560.597372] LastTICCounter = 2637506 ui64TICCounter = 2637501 (Timer period = 9 [100us])
[432660.226216] LastTICCounter = 2737128 ui64TICCounter = 2737130 (Timer period = 18 [100us])
[432660.227235] LastTICCounter = 2737131 ui64TICCounter = 2737131 (Timer period = 0 [100us])
[432719.026270] Timer period out of range: 0 [ms]. Target period = 1
[432719.026276] next_wakeup: 432720725500000 now: 432720725422743