Be explicit on how browsers should handle "priming samples" #1091

Open
jakearchibald opened this Issue Nov 29, 2016 · 8 comments

Comments

@jakearchibald

https://jakearchibald.github.io/aac-decode-bug/

Chrome stable, Firefox and Edge display a large gap at the start of the decoded AAC. This is added by the encoder as a series of priming samples.

Question is, should these priming samples be present in the decoded audio? The Apple document above says:

a playback system must trim the silent priming samples to preserve correct synchronization

So I guess it depends on who the playback system is. Is it the browser when it decodes, or is it the web audio API user that calls start()?

My gut feeling is that Safari is doing the right thing, it certainly feels the most developer-friendly. If it's decided that the gap should be present, we must have an easy way to access this metadata so we know it's there and can skip it.
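
If the gap is left in the decoded audio, skipping it at playback could look roughly like the sketch below. `primingOffsetSeconds` is a hypothetical helper, not part of any API, and the 2112-sample figure is an assumption (a commonly cited AAC encoder delay); the real count depends on the encoder and has to come from the file's metadata.

```typescript
// Hypothetical helper: convert a priming-sample count into an offset in
// seconds. The count is expressed at the encoded file's sample rate, so the
// result stays valid even if the decoder resamples to the context's rate.
function primingOffsetSeconds(primingSamples: number, fileSampleRate: number): number {
  if (!(fileSampleRate > 0)) {
    throw new RangeError("fileSampleRate must be positive");
  }
  // A negative count makes no sense; treat it as "nothing to skip".
  return Math.max(0, primingSamples) / fileSampleRate;
}

// Assumed values for illustration only: 2112 priming samples at 44.1 kHz.
const exampleOffset = primingOffsetSeconds(2112, 44100);

// In a browser, the offset would be passed as the second argument of start():
//   const source = ctx.createBufferSource();
//   source.buffer = decodedBuffer;
//   source.connect(ctx.destination);
//   source.start(0, exampleOffset);
```

This only helps when the browser has not already trimmed the priming samples, which is exactly the inconsistency described above.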

@padenot

padenot Nov 29, 2016

Member

This will probably be handled as part of the grand Media Decoder spec we've planned for v.next.

@rtoy

rtoy Nov 29, 2016

Contributor

I think this is pretty complicated, at least judging from when I raised the issue with the ffmpeg developers a while back. (Chrome uses ffmpeg for decoding audio.) They said at the time that it was fairly difficult because the metadata is optional, so there is no way to know for sure whether the encoded stream actually includes it. If the metadata does include the information, then ffmpeg removes the priming frames.

There are also remainder frames at the end, and the developers said those couldn't be reliably removed either.

Oh, this was for AAC and MP3. For Ogg, ffmpeg produces the expected output. I think the Opus and FLAC decoders produce the expected output too.

@jakearchibald

jakearchibald Dec 1, 2016

@padenot in the meantime, do you know whether browsers should remove these gaps as part of web audio decoding, or is that still up for debate? A decision here would be useful to point the various browsers at and try and get some consistency.

@padenot

padenot Dec 1, 2016

Member

@jakearchibald, we (some people at Mozilla) are currently investigating a number of formats, codec specifications, bad encoders present in the wild, that kind of thing.

The idea here is that we should be able to roundtrip to wav (wav -> compressed format -> wav) and get the same file back (more or less, of course, taking into account the fact that compression can be lossy).

It may be that it is not feasible, and that for some formats, UAs will have to use heuristics.

@JohanRonstrom

JohanRonstrom Dec 9, 2016

We are also having this issue, and are currently trying to find workarounds for our products, for mp3 (LAME tag) and m4a (iTunSMPB tag).

In files where the tags do show up as expected (as seen with ffprobe), getting the start value from webaudio would be a huge first improvement.

You would of course expect decodeAudioData to be able to load the audio as intended and specified by the encoder (trimmed).
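
As a rough illustration of what reading the start value from the m4a tag could involve, here is a minimal sketch. It assumes the iTunes gapless tag (commonly spelled iTunSMPB) holds space-separated hexadecimal fields in the order: reserved, priming samples, end padding, valid sample count. `parseITunSMPB` and `GaplessInfo` are hypothetical names, and the sample string is made up for the example.

```typescript
// Hypothetical result shape; none of these names come from a real API.
interface GaplessInfo {
  primingSamples: number; // samples to skip at the start
  paddingSamples: number; // remainder samples to drop at the end
  validSamples: number;   // length of the original audio in samples
}

// Assumed layout: "<reserved> <priming> <padding> <valid> ..." in hex.
function parseITunSMPB(value: string): GaplessInfo | null {
  const fields = value.trim().split(/\s+/);
  if (fields.length < 4) return null;

  // Fields 1..3 carry the values we need; reject anything that is not hex.
  const nums = fields.slice(1, 4).map((f: string) =>
    /^[0-9a-fA-F]+$/.test(f) ? parseInt(f, 16) : NaN
  );
  if (nums.some((n: number) => isNaN(n))) return null;

  return {
    primingSamples: nums[0],
    paddingSamples: nums[1],
    validSamples: nums[2],
  };
}

// Illustrative tag value, not taken from a real file.
const exampleTag = " 00000000 00000840 000001CA 00000000003F9A36 00000000 00000000";
const exampleInfo = parseITunSMPB(exampleTag);
```

The tag text would still have to be extracted from the container first (ffprobe shows it, as noted above), since decodeAudioData does not expose it.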

@notthetup

notthetup Dec 12, 2016

Contributor

Semi-relevant blog post related to this about MAD (libmad). Not sure if any UA uses libmad, but it seems that it also had some issues with priming samples.

https://thebreakfastpost.com/2016/11/26/mp3-decoding-with-the-mad-library-weve-all-been-doing-it-wrong/

@rtoy

rtoy Dec 12, 2016

Contributor

Chrome's WebAudio implementation currently uses ffmpeg to handle all of the decoding. Adding heuristics in WebAudio is not the preferred approach since the implementation currently doesn't need to know anything about the contents of the file except for the estimated duration. The removal of these frames is best handled by ffmpeg.

@hoch

hoch Dec 19, 2016

Member

I agree with @rtoy here. I had to use a 'heuristics hack' to address the M4A container priming frames, but it backfired. (A simple OS update will ruin the fix.)

Even if we handle this issue at the level of a unified decoder (i.e. FFmpeg) in Chrome, developers still have to fight the cross-browser inconsistency. It is a sad situation, but developers have to sniff the platform/browser or try to decode a short silent file to see the decoded result.

:(
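
The "decode a short silent file" probe could be sketched along these lines. It assumes the page knows the exact length of the probe file before encoding, and it compares durations rather than frame counts because decodeAudioData may resample. `probeDecodedLength`, `ProbeResult`, and the 5 ms tolerance are hypothetical choices, and the check cannot tell priming frames apart from remainder frames at the end.

```typescript
// Hypothetical result shape for the probe.
interface ProbeResult {
  extraSeconds: number;  // decoded duration minus original duration
  looksTrimmed: boolean; // true when the two durations roughly agree
}

function probeDecodedLength(
  decodedFrames: number,
  decodedSampleRate: number,
  originalFrames: number,
  originalSampleRate: number,
  toleranceSeconds: number = 0.005 // assumed tolerance, tune as needed
): ProbeResult {
  const decodedSeconds = decodedFrames / decodedSampleRate;
  const originalSeconds = originalFrames / originalSampleRate;
  const extraSeconds = decodedSeconds - originalSeconds;
  return {
    extraSeconds,
    looksTrimmed: Math.abs(extraSeconds) <= toleranceSeconds,
  };
}

// In a browser, the decoded values would come from the AudioBuffer:
//   const buf = await ctx.decodeAudioData(probeBytes);
//   const result = probeDecodedLength(buf.length, buf.sampleRate, 4410, 44100);
```

A probe like this has to run once per page load (or be cached per browser version), which is part of why it is a workaround rather than a fix.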
