Investigating clicks in DASH playback #65

Open
kristianhentschelbbc opened this issue May 30, 2023 · 0 comments

As reported in a thread on the MakerBox forum, there can be playback artefacts (clicks at mostly regular intervals) especially with long continuous drone sounds.

Testing on macOS with the latest build of Audio Orchestrator (AO 0.22), I can reliably reproduce the problem in Safari and Chrome using a 48 kHz test signal (a 100-300 Hz sine sweep over 30 seconds): I get clicks approximately every 4 seconds, and sometimes in between. In Firefox I get far fewer problems, but a click does tend to occur at least once around the 8-second mark.
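
For anyone wanting to reproduce this, a comparable test signal could be generated along the following lines. This is only a sketch: the issue does not say whether the original sweep was linear or logarithmic, so a linear sweep is assumed here, and the 16-bit mono format, the 0.5 amplitude and the function names are likewise illustrative choices.

```python
import math
import struct
import wave


def sweep_samples(f_start=100.0, f_end=300.0, duration=30.0,
                  sample_rate=48000, amplitude=0.5):
    """Return 16-bit integer samples of a linear sine sweep (assumed shape)."""
    total = int(round(duration * sample_rate))
    samples = []
    for n in range(total):
        t = n / sample_rate
        # Phase of a linear chirp: the integral of the instantaneous frequency.
        phase = 2.0 * math.pi * (
            f_start * t + (f_end - f_start) * t * t / (2.0 * duration)
        )
        samples.append(int(round(amplitude * 32767 * math.sin(phase))))
    return samples


def write_wav(path, samples, sample_rate=48000):
    """Write mono 16-bit PCM samples to a WAV file."""
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)
        wav.setframerate(sample_rate)
        wav.writeframes(struct.pack("<%dh" % len(samples), *samples))
```

Calling `write_wav("sweep.wav", sweep_samples())` would then produce a 30-second file that can be imported into Audio Orchestrator.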

I can think of two potential causes:

  1. Clicks being introduced by ffmpeg in splitting and encoding the audio
  2. Clicks being introduced in the browser where the split segments are recombined
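
One way to help tell these two causes apart would be to decode ffmpeg's output offline and scan it for discontinuities, without any browser involved. The sketch below is an assumption about how such a check might look, not something Audio Orchestrator does: it relies on the fact that a sine of frequency f and peak amplitude A cannot change by more than A·2πf/fs between neighbouring samples, and flags larger jumps. The `margin` parameter is a hypothetical tolerance for codec noise and would need tuning.

```python
import math


def find_clicks(samples, sample_rate, max_freq, amplitude=1.0, margin=2.0):
    """Return indices where the sample-to-sample jump is implausibly large.

    `samples` is a sequence of floats; `max_freq` is the highest frequency
    expected in the test signal (300 Hz for the sweep described above).
    """
    # Largest step a clean sine of this frequency and amplitude can take,
    # widened by `margin` to tolerate encoding noise (assumed value).
    limit = margin * amplitude * 2.0 * math.pi * max_freq / sample_rate
    return [
        i for i in range(1, len(samples))
        if abs(samples[i] - samples[i - 1]) > limit
    ]
```

If the decoded ffmpeg output shows no such jumps but the browser still clicks, that would point towards the second cause.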

The first issue, ffmpeg splitting/encoding:

We package all audio twice (once for Safari/iOS devices, and once for every other browser). As far as I remember, we've usually seen better quality with the generic version. For the generic version we use ffmpeg's built-in DASH packaging, which should correctly encode first and then split. For the Safari version we use ffmpeg's "segment" output format, and I'm not sure in which order it performs those two operations.

Below is the encoding command we use (the example being a continuous sine tone for 30 seconds). The only processing on top of ffmpeg is that Audio Orchestrator first splits each track around long silent gaps and discards the gaps (by running this command with different -ss and -t parameters for each non-silent item). It also replaces the manifest generated by ffmpeg with a simpler one that our playback library understands.

ffmpeg -ss 0 -t 30 -i 01_Tone.wav -t 4.096 -f lavfi -i anullsrc=channel_layout=mono:sample_rate=48000 -filter_complex [0:a][1:a]concat=n=2:v=0:a=1,asplit=2[outa][outb] -map [outa] -c:a aac -b:a 128k -use_template 1 -use_timeline 0 -seg_duration 4.096 -f dash encoded-items-PpKVJs/01_Tone_000000/manifest.mpd -map [outb] -c:a aac -b:a 128k -frame_size 1024 -f segment -segment_time 4.096 encoded-items-PpKVJs/01_Tone_000000/safari_%05d.m4a
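
The numbers in this command can be checked with a little arithmetic. At 48 kHz, a 4.096-second segment is 196608 samples, which is exactly 192 AAC frames of 1024 samples, so segment boundaries should fall on frame boundaries. The sketch below works this out; the function name is made up for illustration, and the segment count ignores any encoder priming or padding, which is an assumption.

```python
import math
from fractions import Fraction

SAMPLE_RATE = 48000                    # sample_rate in the anullsrc input
AAC_FRAME_SIZE = 1024                  # matches -frame_size 1024
SEGMENT_DURATION = Fraction("4.096")   # matches -seg_duration / -segment_time


def segment_layout(content_seconds, padding_seconds=SEGMENT_DURATION):
    """Return (samples per segment, frames per segment, leftover, segments)."""
    segment_samples = SEGMENT_DURATION * SAMPLE_RATE
    frames, leftover = divmod(segment_samples, AAC_FRAME_SIZE)
    # The command appends one segment's worth of silence via anullsrc.
    total_samples = (Fraction(content_seconds) + padding_seconds) * SAMPLE_RATE
    segments = math.ceil(total_samples / segment_samples)
    return int(segment_samples), int(frames), int(leftover), segments
```

For the 30-second tone, `segment_layout(30)` gives `(196608, 192, 0, 9)`: whole frames per segment with nothing left over, and nine segments once the trailing silence is included.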

I can concatenate the resulting init segment with the m4s chunks and get back a seamless track, but I can't easily do the same with the Safari .m4a segments. I suspect this is because the timing information in the m4a headers is not correct. This might need further investigation, but the fact that the clicks also appear in non-Safari/iOS browsers makes me think there's something else at play.
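
The concatenation test for the DASH output can be scripted as a plain byte-for-byte join of the init segment followed by the chunks in order. This is a minimal sketch; the function name is hypothetical, and the caller has to supply the actual file names ffmpeg wrote, since they are not listed in this issue.

```python
import shutil


def join_segments(init_path, chunk_paths, output_path):
    """Concatenate an init segment and media chunks byte-for-byte.

    `chunk_paths` must already be in playback order.
    """
    with open(output_path, "wb") as out:
        for path in [init_path, *chunk_paths]:
            with open(path, "rb") as src:
                shutil.copyfileobj(src, out)
    return output_path
```

The joined file can then be opened in an audio editor or decoded with ffmpeg to check whether the segment boundaries are audible before any browser is involved.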

The second issue, playback in the browser:

Audio Orchestrator is using the WebAudio API to emulate DASH playback, because it is based on an old internal audio toolkit library (bbcat-js) that was written before we had widespread support for Media Source Extensions as a more reliable way of playing back DASH streams. The source code for the DASH source nodes is here: https://github.com/bbc/audio-orchestration/tree/main/packages/bbcat-js/src/dash/dash-source-node

I'm unfortunately not very familiar with this code, but I think it generally works by combining the binary data for adjacent segments, decoding them, and then scheduling playback of the resulting buffers on the WebAudio timeline. That scheduling step is where browser-dependent inaccuracies could be introduced. A few years ago there were some security changes in browsers to limit access to very accurate timers that could be exploited for fingerprinting and side-channel timing attacks, and I wonder if this might have had an impact here as well.
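
To make the scheduling concern concrete, here is a toy model. It is not taken from bbcat-js and makes no claim about how that code actually behaves. It assumes each decoded buffer is started at its nominal position (segment index times nominal segment length) and reports, for each boundary, how many samples of overlap or gap would result if a decoded buffer were not exactly the nominal length.

```python
def boundary_errors(buffer_lengths, nominal_length):
    """Return the error in samples at each boundary between buffers.

    Positive values mean buffer n overlaps buffer n+1, negative values mean
    a gap, and zero means the join is sample-accurate. All lengths are in
    samples; the scheduling rule is an assumption for illustration.
    """
    errors = []
    for n in range(len(buffer_lengths) - 1):
        start = n * nominal_length             # where buffer n is scheduled
        next_start = (n + 1) * nominal_length  # where buffer n+1 is scheduled
        errors.append(start + buffer_lengths[n] - next_start)
    return errors
```

For example, `boundary_errors([196608, 196600, 196608], 196608)` returns `[0, -8]`: a buffer that decodes eight samples short leaves an eight-sample gap, which on a continuous tone could be heard as a click.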

As an alternative to splitting into short segments for DASH-like playback, we could download the entire audio for each item upfront. Audio Orchestrator will do this for short items (under 10 seconds of audio separated from other items by at least 1 second of silence). Tuning these parameters to take that approach for everything might bypass the splitting and re-assembly issues. However, it might cause other problems, such as a longer download / decode delay before the item can play, which would lead to the beginning of an item being missed if not scheduled long enough in advance.
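
The rule described above can be written as a small predicate. The function and parameter names are hypothetical and do not come from the Audio Orchestrator source; only the two thresholds (10 seconds and 1 second) are taken from the description.

```python
def should_download_whole(duration, silence_before, silence_after,
                          max_duration=10.0, min_silence=1.0):
    """Decide whether an item is fetched in one piece rather than segmented.

    All arguments are in seconds. An item qualifies when it is shorter than
    `max_duration` and separated from its neighbours by at least
    `min_silence` on both sides.
    """
    return (
        duration < max_duration
        and silence_before >= min_silence
        and silence_after >= min_silence
    )
```

Under this sketch, taking the download-everything approach would correspond to passing `max_duration=float("inf")` and `min_silence=0.0`, with the download and decode delay mentioned above as the cost.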
