
Storing AudioBuffers in native sample bit depth (was: Add support for 16-bit sample type?) #2396

Open
juj opened this issue Jan 15, 2014 · 31 comments

juj commented Jan 15, 2014

Using an integer 16-bit sample type instead of float32 would halve the memory consumed by audio data while it is resident in RAM.

Consider adding support for users to store and play back audio data in such formats.

Discussion thread about this is at http://lists.w3.org/Archives/Public/public-audio/2013OctDec/0294.html

@jussi-kalliokoski (Member)

My opinion in this matter is pretty much in sync with KG...

The thing here is that we have a lot of different scenarios with different needs. While for a game it might be OK to let the UA decide the quality/performance tradeoff, for a DAW degrading the quality is a dealbreaker. I don't want us to encourage the UA to do this sort of thing. The difference between letting the developer decide and letting the UA decide is that, especially on mobile, the developer can update the choice-making logic much faster, whereas the UA might be maintained on a longer cycle. The developer knows what the application does and will do, whereas the UA has to resort to heuristics that can easily go wrong (as I pointed out, something that's an optimization for one application is a bug for another). Hiding performance characteristics as implementation details is a terrible idea in my experience.

Another thing to consider is that both use cases I mentioned here (games and DAWs) traditionally don't actually do any heuristics, but let the users choose the performance options. This makes sense because ultimately the user is actually in the best position to make the decision because they can see the impact of it. If we let the UA decide this sort of thing, those applications lose the ability to let the user decide.

All in all, I think letting the UA decide is more or less a useless feature: it would add a lot of implementation complexity, and people would try to work around it anyway.

As for compressed assets, using <audio> leaves the unresolved problem of time-syncing it with the rest of the API. One option would be to allow creating an AudioBuffer out of an <audio> element; this would of course throw if the asset hasn't finished loading yet.

@opera-mage (Member)

I still think it is a good idea to let the UA decide what representation to use internally, but it should not degrade the quality of samples (unless there is some hint in the API that lets a developer tell the UA what quality level it can accept).

Here's a fairly non-intrusive way to support integer formats that I think would solve most problems (comments welcome):

  • decodeAudioData is extended to accept raw integer formats directly (possibly only int16 for now, with the option to add more formats in the future).
  • Overall, the API remains float32 (e.g. AudioBuffer.getChannelData continues to return float32, regardless of the internal format of the AudioBuffer).
  • The UA MAY choose to store data internally in AudioBuffers in any format that reproduces the original quality of the sample (if given int16 in decodeAudioData, there's no need to use float32 internally, and it would be prohibited to use int8, for instance).
  • TBD: Is it OK for a UA to use int16 internally for decoded OGG/MP3/whatever lossy formats, or do we need to add a hint to decodeAudioData in order to enable such memory optimizations?

Furthermore, I suggest that:

  • The resampling step in decodeAudioData is made optional (TBD: hint in API or UA decision).

In general, I don't see how we could both have integer formats internally AND resample the data. To me it seems that we need to go to float32 when resampling, right?
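
A minimal sketch of how the bullets above could look from the developer's side, assuming a hypothetical options argument to decodeAudioData (the sampleType, sampleRate and numberOfChannels fields below are illustrative only, not proposed spec text):

// Hypothetical raw-PCM path for decodeAudioData; none of the option names are real API.
const ctx = new AudioContext();
const rawBytes = await (await fetch('drums.raw')).arrayBuffer(); // raw interleaved int16 samples

const buffer = await ctx.decodeAudioData(rawBytes, /* hypothetical */ {
  sampleType: 'int16',      // UA may keep int16 internally; must not degrade below this
  sampleRate: 44100,        // no resampling requested
  numberOfChannels: 2
});

// The externally visible API stays float32 regardless of internal storage.
const left = buffer.getChannelData(0); // Float32Array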

@jussi-kalliokoski (Member)

I still think it is a good idea to let the UA decide what representation to use internally

What's the value in that? The cost is really high if you expect UAs to actually deliver heuristics that help in most cases and at least don't hurt in the others.

At most, letting the UA decide would to me be a nice-to-have feature that is either opt-in or opt-out, because in the end it's the user who has access to the most relevant information, so I'm against doing anything that prevents the application from giving the user the choice here. Don't get me wrong, I'm all for reasonable defaults, but in matters where the performance impact is this high, the defaults should be possible to override.

@jussi-kalliokoski (Member)

I think that actually the hinting should work the other way around; the UA could give hints to the application on what's the best thing to do, then the application can use that as a default unless the user overrides it, or if there is a known limitation with a given device, etc.

@joeberkovitz (Contributor)

At the moment implementations can choose (without normative spec changes) to store decoded buffers in some compressed format that is expanded to floats when read or played back. That might or might not perform well relative to other implementations that don't do this. The playback quality relative to other decoding treatments might vary. It's not for the spec to say.

That said, we plan to revisit in-memory compression more thoroughly in the next version of the spec.

@juj (Author) commented Oct 27, 2015

I wonder if there have been any thoughts or communication about this recently? I'm currently getting bitten by this again when porting a game from Android with Emscripten to run in a web browser on a mobile phone, and facing considerable memory pressure trying to get it to run. Being able to store audio in its original 16-bit form instead of expanding it to 32-bit would save around 50MB of RAM at runtime for the application, which would be a huge saving when trying to run on phones with 256MB/512MB of RAM.

@padenot (Member) commented Oct 28, 2015

We have not been talking about this recently, but we understand it is still an issue.

Joe, did you mean to push that back to v.next on June 3rd?

@joeberkovitz (Contributor)

I did mean to push it back, because I thought the sense of the group was that implementations could store audio any way they want internally inside an AudioBuffer even if the externally visible data is represented as floats.

That doesn't mean I'm dismissing it as an issue; I understand it's a big deal. But we haven't agreed on a straightforward solution in the spec, and there seems to be some room for implementations to make this better without changing the spec.

@joeberkovitz (Contributor)

By the way, I wonder if the AudioContextOptions being floated in #348 would be useful to request a lower AudioBuffer bit depth... if this were to be an opt-in.

@juj (Author) commented Nov 9, 2015

If the option lived in AudioContextOptions, wouldn't that force all AudioBuffers to have the same bitness? It sounds odd to require an application to keep all its buffers at the same bit depth; shouldn't it be a property of the buffer rather than the context?
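
For illustration, roughly what the two placements would look like; the sampleFormat option is hypothetical in both cases:

// (a) Context-wide option (the #348 AudioContextOptions approach): one format for every buffer.
const ctx = new AudioContext({ sampleRate: 48000, sampleFormat: 'int16' /* hypothetical */ });

// (b) Per-buffer option: each buffer picks its own representation, e.g. int16 for large
// sound banks but float32 for buffers that are synthesized or processed at full precision.
const sfx = new AudioBuffer({ length: 48000, numberOfChannels: 2, sampleRate: 48000,
                              sampleFormat: 'int16' /* hypothetical */ });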

@juj (Author) commented Mar 6, 2016

Have there been any recent advances on this, or thoughts on if/when support for 16-bit audio buffers might realistically be introduced? We've been working with a major game company partner on an HTML5 title to be deployed on Facebook, and out-of-memory crashes account for more than 30% of the failures in the initial QA tests conducted. Profiling shows that support for 16-bit audio would allow optimizing the game to use 10-20% less memory, which would definitely help with the OOM crashes. Games often use a lot of different sound effects, and these are preloaded up front since they need to be played back in real time in response to game logic events, so games typically keep large banks of audio in memory. The native version of the game uses 16-bit audio buffers, so having to expand them to 32-bit on the web causes a big discrepancy between the native app and HTML5 app memory footprints.

@miskoltrans

Hi, I have worked on porting a mobile game to WebGL, which you can check out at www.topeleven.com. In our case, using 16-bit audio would decrease memory usage by about 10%, so this is an optimization worth considering.

@juj (Author) commented Mar 24, 2017

This is still showing up in most Unreal Engine 4 and Unity3D titles ported to the web, and being able to use 16-bit integer formats for audio effects would be a big memory saving for these titles. I wonder what the latest thinking is on this? This bug had a "V1" label added earlier, but that was then removed by @mdjp. What does that mean, and what does its removal mean? Has there been any thought about adding this feature in the future? Thanks!

@rtoy (Member) commented Mar 24, 2017

Based on skimming over the issue and the labels set here, this will not be in v1, but in the next version.

I think there are a couple issues that need to be worked out. First, what does decodeAudioData do? And how do we specify what it should do? Second, how does AudioBuffer indicate the format and how does the user specify it?

I think specifying a new AudioBuffer feature would be relatively simple with the new AudioBufferOptions. Some work is needed to specify how it behaves. Presumably, it gets converted to float internally when used.
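
For context, a quick example of the existing AudioBufferOptions constructor that such a feature would extend (the extension itself is not sketched here):

// The AudioBuffer constructor with AudioBufferOptions exists today; all data is float32.
const buf = new AudioBuffer({ length: 44100, numberOfChannels: 2, sampleRate: 44100 });
buf.copyToChannel(new Float32Array(44100), 0); // data goes in and comes out as float32
// A sample-format / bit-depth member in AudioBufferOptions would be the new addition.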

decodeAudioData is a bit of a mess and it's really hard to add something here while keeping everything backward-compatible.

@juj (Author) commented Sep 1, 2017

I have created a test suite of different audio files and effects that currently are problematic. You can visit https://github.com/juj/audio_test_suite to find it, or http://clb.demon.fi/audio_test_suite/ to check it out live.

First, what does decodeAudioData do?

While creating the above set of tests, I noticed that decodeAudioData is currently hardcoded to decode to the sampling rate of the AudioContext, at least on Firefox. I wish that were not the case, and that decodeAudioData did not perform any resampling of the input. Similarly, it would be best if decodeAudioData did not do any sample type conversion on the input either. Overall though, I'd vote for removing decodeAudioData altogether (#1305) and replacing it with better APIs that make users memory aware, because the usage of that function in the wild is heartbreakingly lax at the moment. This probably jibes well with your mention of decodeAudioData being a mess.

Overall, I'd like to see the manipulation of compressed and uncompressed audio be much more symmetric in the API, so that all features are available for both. At that point, different uncompressed formats would probably also become easier to express. Though I only know a small subsection of the Web Audio API overall, so I'm not sure how easy or hard that would be to achieve.

@magcius commented Sep 14, 2017

I would imagine that playback is the 90% case, with an extremely shallow effect graph. For something like ConvolverNode, mandating float inputs/outputs is fine. For AudioBufferSourceNode (and perhaps ScriptProcessorNode), I'd really like to see 16-bit depth support.

@cwilso (Contributor) commented Sep 26, 2018

It would also be nice to have better control over the sample rate: the fact that decodeAudioData always resamples to the output rate is unfortunate, as it's lossy.

@karlt (Contributor) commented Sep 26, 2018

Although perhaps not the API you'd choose, control over the sample rate is available through OfflineAudioContext.
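
For reference, that workaround looks roughly like this (assuming the asset's native rate, 44100 Hz here, is known up front):

// Decode with an OfflineAudioContext created at the file's native rate, so
// decodeAudioData does not resample to the real-time context's rate.
const encoded = await (await fetch('music.ogg')).arrayBuffer();
const offline = new OfflineAudioContext({ numberOfChannels: 2, length: 1, sampleRate: 44100 });
const buffer = await offline.decodeAudioData(encoded); // stays at 44100 Hz

// The buffer can still be played in a real-time context running at another rate;
// resampling then happens at playback time rather than at decode time.
const ctx = new AudioContext();
const src = new AudioBufferSourceNode(ctx, { buffer });
src.connect(ctx.destination);
src.start();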

@cwilso (Contributor) commented Sep 26, 2018

Hmm, interesting, since AudioBuffers can (IIRC) be shared across contexts...

@juj (Author) commented Jul 17, 2019

@hoch had opened a discussion about an AudioDeviceClient API, which led to a conversation about efficient compressed audio sample playback. That prompted an illustration/sketch of an API for playing back compressed audio clips, something like the following:

var audioFeatures = AudioDevice.enumerateAudioSupport(); // Returns a list of e.g. {sampleRate: 44100, channels: 'stereo' }, {sampleRate: 48000, channels: '5.1' }
var device = new AudioDevice({sampleRate: 44100, channels: 'stereo' });

// Compressed audio playback:
var mediaSource = new MediaSource(myTypedArray, /*offset*/43242, /*length*/5325, 'audio/ogg'); // weak reference to typed array bits, no deep copy of byte data
// or mediaSource = new MediaSource(fetch('foo.ogg'));
mediaSource.downloadHint = 'download on first play'/'download up front'/'decode up front';
// mediaSource.onloaded / .readystate etc. to provide loading state information

var mediaInstance = new MediaInstance(mediaSource);
mediaInstance.start = 0;
mediaInstance.loopStart = 2342;
mediaInstance.loopEnd = 53114;
mediaInstance.loopTimes = 3; // default=infinity
mediaInstance.end = 350000;
mediaInstance.pitch/.volume/.worldPosition = ...;
mediaInstance.onloopended/.onended = function() {};

var playbackGroup = device.createAudioPlaybackGroup();
var playbackInstance1 = playbackGroup.play(mediaInstance, timeFromNow=0);
var playbackInstance2 = playbackGroup.play(mediaInstance, timeFromNow=2);
var playbackInstance3 = playbackGroup.play(mediaInstance, timeFromNow=4);

playbackInstance2.pitch = ...; // animate the playback pitch

playbackGroup.volume/.pitch = ...; // animate clips in a group
playbackGroup.stop(); // stops all audio files playing in this group

// soft real time push mode synthesis:
var playbackGroup = device.createAudioPlaybackGroup();

var mediaInstance1 = new MediaInstance(myTypedArray, /*offset=*/2000, /*length=*/400000);
myTypedArray[2000 through 402000] = /*synthesized audio frames*/;
playbackGroup.appendQueue(mediaInstance1);

var mediaInstance2 = new MediaInstance(myTypedArray, /*offset=*/402000, /*length=*/400000);
myTypedArray[402000 through 802000] = /*more synthesized audio frames*/;
playbackGroup.appendQueue(mediaInstance2); // queue up to be played back after above buffer

@hoch asked to drop it into the issue tracker for reference. Not sure how to tie it in to Web Audio, but hopefully it gives an idea of the use cases.

@hoch (Member) commented Jul 17, 2019

Thanks @juj!

In the last F2F, the WG/CG briefly chatted about WebCodecs and the Streams/Web Audio integration. This proposal might be relevant to both.

@padenot (Member) commented Jun 10, 2020

Virtual F2F:

  • It's possible to change the sample rate but not the bit depth, and the latter seems useful. Having another property when constructing an AudioBuffer would work, but the interaction with the rest of the API needs to be defined
  • WebCodecs goes a long way to help, but does not help if the decoded audio assets need to be present in memory at all times, for example in a sampler or any other advanced music app that needs to be able to seek arbitrarily in real time. As a data point, just halving the memory by not inflating to float32 would make a nice difference on mobile Firefox, since there are so many assets.

@rtoy (Member) commented Oct 1, 2020

Teleconf: This is useful. @padenot mentioned that Firefox already does this internally and transparently. The question is if it should be exposed to the developer and what the API should look like. Proposals welcome.

@padenot (Member) commented Oct 20, 2020

TPAC 2020:

  • There was nobody from the game industry in the call, so we couldn't make much concrete progress
  • https://www.bitsnbites.eu/hiqh-quality-dpcm-attempts/ / https://github.com/mbitsnbites/libsac was discussed as a way for authors to opt in to in-memory compressed audio assets, making it possible to keep a rather high quantity of audio assets in memory while reducing the footprint by about 8x (f32 -> 4 bits). This compression scheme is designed (amongst other things) for random access and real-time safety. Additionally, a flag allowing a file to be stored by the engine in 16 bits could have its use: a reduced footprint with lossless in-memory audio samples
  • Web Codecs is happening. The audio decoding part is available in Chrome pre-release, behind a flag. This solves the problem of playing long audio streams without having the whole decoded file resident in memory, while still allowing rather precise (i.e. custom) playback

The last two points are complementary and don't serve the same use case; I believe both would have their use.

cc @juj, @pmlt

@juj (Author) commented Oct 30, 2020

Hey, thanks for the ping! I was not aware of TPAC and missed out on it, but I would love to join a call if that would help progress.

My take on raw 8-bit/16-bit vs 4-bit DPCM is that neither obviates the need for the other. Both types of formats are used in native game projects, so I would vote to see support for both in Web Audio (with a preference towards raw if only one had to be chosen).

@gnarhard

To reference the second bullet point of the TPAC 2020 notes above: being able to choose the overall bit depth of the audio context would significantly help my application's memory footprint. Being forced to use 32-bit floating point is maxing out my memory when I have 16+ long-form audio files loaded.

@rtoy (Member) commented May 20, 2021

Virtual F2F 2021: Increase this to priority-1. We will support additional depths for linear PCM (i.e. not 4-bit DPCM). Lots of details need to be worked out, but probably decodeAudioData will return an AudioBuffer with int16 for mp3/aac files. We want a way to be able to say new AudioBuffer(<options>) to allow specifying the bit depth of the buffer and be able to get the bits out.
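
A rough sketch of that direction; sampleFormat and getChannelDataInt16 below are placeholder names for illustration, not agreed API:

// Hypothetical per-buffer bit depth via AudioBufferOptions.
const buf = new AudioBuffer({
  length: 48000,
  numberOfChannels: 1,
  sampleRate: 48000,
  sampleFormat: 'int16'          // hypothetical option
});

// Existing accessors keep returning float32, converted on demand...
const floats = buf.getChannelData(0);     // Float32Array

// ...while a hypothetical accessor exposes the underlying integer samples ("get the bits out").
const ints = buf.getChannelDataInt16(0);  // hypothetical; Int16Array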

@bradisbell

...but probably decodeAudioData will return an AudioBuffer with int16 for mp3/aac files...

@rtoy It's rare, but folks may want to decode these to 24-bit.

We want a way to be able to say new AudioBuffer() to allow specifying the bit depth of the buffer and be able to get the bits out.

Are you saying that the web app can specify the desired target bit depth, in cases where the original isn't known? If so, that sounds great. (For example, an MP3 encoder may take 24-bit PCM samples, and the decoder may be able to output 24-bit PCM samples, but as far as I understand it there is no inherent bit depth while in MP3-land. The web application could request 24-bit PCM if it wanted.)

@rtoy (Member) commented May 21, 2021

Sorry. I really meant that for an encoded file, decodeAudioData can return a buffer of whatever the appropriate bit depth is if there is one. So a 24-bit wav file gets a 24-bit buffer. Well, I guess there isn't really a 24-bit array type, so it would probably be a 32-bit array type.

Are you saying that the web app can specify the desired target bit depth, in cases where the original isn't known? If so, that sounds great. (For example, an MP3 encoder may take 24-bit PCM samples, and the decoder may be able to output 24-bit PCM samples, but as far as I understand it there is no inherent bit depth while in MP3-land. The web application could request 24-bit PCM if it wanted.)

Ah, I'm not sure about that. I think we want to minimize the changes to decodeAudioData since WebCodecs can probably do everything better. So, I'm not sure about what you can specify for decodeAudioData. But certainly as a user, I want to be able to create an AudioBuffer manually with a specified bit depth. If nothing else, this is useful for testing that AudioBuffers behave correctly.

@padenot (Member) commented Jul 15, 2021

AudioWG call:

  • Web Codecs can copyTo audio samples to memory, soon with format conversion
  • If we add a constructor to AudioBuffer that takes a format and a buffer, this can be quite flexible: the AudioBuffer is then used as usual and inflated to float32 as needed (a rough sketch follows below)
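
A rough sketch of how the two bullets could combine. The AudioData.copyTo() call is real WebCodecs API (its 'format' conversion option may not be supported everywhere yet), while the sampleFormat and data constructor options are hypothetical:

// Called with a decoded WebCodecs AudioData (e.g. from an AudioDecoder's output callback).
function audioDataToBuffer(audioData) {
  const frames = audioData.numberOfFrames;
  const channels = audioData.numberOfChannels;

  // Copy the decoded samples out as interleaved 16-bit integers.
  const pcm = new Int16Array(frames * channels);
  audioData.copyTo(pcm, { planeIndex: 0, format: 's16' });
  audioData.close();

  // Hypothetical AudioBuffer constructor taking a format and a buffer; the UA would
  // store the int16 data as-is and inflate to float32 only when the buffer is used.
  return new AudioBuffer({
    length: frames,
    numberOfChannels: channels,
    sampleRate: audioData.sampleRate,
    sampleFormat: 'int16',   // hypothetical
    data: pcm                // hypothetical
  });
}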

@mdjp (Member) commented Jul 22, 2021

Next step: a straw man and draft spec text are required.

@mdjp mdjp transferred this issue from WebAudio/web-audio-api-v2 Sep 23, 2021
@mdjp mdjp added the P1 WG charter deliverables; "need to have" label Sep 23, 2021
@mdjp mdjp added this to To do in v.next via automation Sep 23, 2021
@mdjp mdjp moved this from Untriaged to In discussion in v.next Sep 29, 2021
@hoch hoch removed the P1 WG charter deliverables; "need to have" label Nov 2, 2022