This repository has been archived by the owner on Oct 7, 2021. It is now read-only.

Request to Expose AudioBuffer to DedicatedWorker #111

Closed
chcunningham opened this issue Jan 5, 2021 · 29 comments

@chcunningham

WebCodecs uses AudioBuffer as a container of unencoded audio.
https://wicg.github.io/web-codecs/#dom-audioframe-buffer

The AudioFrame, which wraps AudioBuffer, is provided as an input to AudioEncoder and as an output from AudioDecoder. We've exposed the encoder and decoder interfaces to DedicatedWorker to facilitate offloaded codec IO.

I think the change is as simple as changing the Exposed= parameter for just AudioBuffer. I'll send a PR to demonstrate.
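
A minimal sketch (mine, not from the PR) of what this exposure would allow inside a dedicated worker; today the AudioBuffer constructor is Window-only, so this throws:

    // worker.js — illustrative only: construct raw audio in a DedicatedWorker.
    // With AudioBuffer exposed to workers this constructor call succeeds;
    // without the change it throws because AudioBuffer is [Exposed=Window].
    const buffer = new AudioBuffer({
      numberOfChannels: 2,
      length: 48000,       // one second at the given sample rate
      sampleRate: 48000,
    });
    // Planar float32 access, as Web Audio defines it.
    buffer.copyToChannel(new Float32Array(buffer.length), 0);
    self.postMessage({ length: buffer.length, duration: buffer.duration });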

@padenot
Member

padenot commented Jan 6, 2021

We need to reason about the (somewhat) hidden cost of the interleaving/deinterleaving operations that we'll face, and the cost of the possible (probable) memory inflation that this might cause.

Generally, decoding, encoding and IO APIs use interleaved audio buffers, so that temporal locality is better: samples that are close in time are close in memory (they are played out, recorded, encoded and decoded together). Processing APIs often prefer deinterleaved (also sometimes called planar, by analogy with the graphics world) buffers, because processing a particular audio channel with a particular algorithm is more efficient/practical that way.

The Web Audio API always uses planar audio, because its main goal is processing. It happens to have a decoding API for historical reasons, which is made to interoperate with the rest of the API, and this contradicts the above.

Additionally, the Web Audio API always uses 32-bit floating-point numbers to represent audio samples when exposed to authors. We often see codecs output (and input) 16-bit integers (or less, or more, but 16 bits is the most common). This is a 2-fold increase in memory footprint for no real good reason, which led Firefox for Android to adopt a lazy inflation scheme: audio samples stay as 16-bit integers (what the decoder provides) until they need to be exposed to script. This works well but is implicit. https://github.com/WebAudio/web-audio-api-v2/issues/11 aims at making this explicit and allowing further optimizations.

It might well be that we're okay with this, and implementations then do those inflation/reordering operations in a lazy fashion. We could also do a spin-off of https://github.com/WebAudio/web-audio-api-v2/issues/11 and allow interleaved audio, alongside different sample representations. This seems like a better way to do things, and doesn't seem particularly hard (no harder than what is implemented today in Gecko).
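
A rough sketch of the conversion cost being described, assuming a decoder that hands back interleaved 16-bit integer samples; the function name and shape are illustrative only, not an existing API:

    // Deinterleave and inflate: interleaved int16 -> planar float32,
    // the layout/format an AudioBuffer requires today. Note the 2x memory
    // growth (Int16Array -> Float32Array) plus one full copy of the data.
    function toPlanarFloat32(interleaved /* Int16Array */, numberOfChannels) {
      const frames = interleaved.length / numberOfChannels;
      const planar = [];
      for (let c = 0; c < numberOfChannels; c++) {
        const channel = new Float32Array(frames);
        for (let i = 0; i < frames; i++) {
          channel[i] = interleaved[i * numberOfChannels + c] / 32768;
        }
        planar.push(channel);
      }
      return planar;
    }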

@hoch
Member

hoch commented Jan 6, 2021

I think this is about exposing AudioBuffer to DedicatedWorker, not to AudioWorkletGlobalScope. Paul, is your concern mostly about the inflated memory footprint?

@padenot
Member

padenot commented Jan 6, 2021

And CPU, memory traffic, etc., yes.

Basically, unnecessary inefficiencies at the API boundary because the type that exists today is not what a codec API needs.

Just exposing it is fine, it's not hard. Indeed, this has nothing to do with AudioWorkletGlobalScope.

@hoch
Member

hoch commented Jan 6, 2021

I agree that we need to look into the hidden implications, but exposing AudioBuffer itself seems harmless and it would be useful for WebCodecs' use cases.

@padenot padenot transferred this issue from WebAudio/web-audio-api Jan 7, 2021
@padenot
Member

padenot commented Jan 7, 2021

(Moving this to V2 because we're not changing V1 text at this stage).

@hoch
Member

hoch commented Jan 7, 2021

For reference: crbug.com/1160580

@rtoy
Member

rtoy commented Jan 7, 2021

I would like to know what the actual use case is and whether AudioBuffer is the right API, as @padenot mentions. Perhaps there's a better solution.

@chcunningham
Author

chcunningham commented Jan 8, 2021

AudioBuffer is used by WebCodecs to describe raw audio. It is essentially the output of AudioDecoder and the input to AudioEncoder. In both cases, we wrap the buffer in an AudioFrame to add some metadata and the ability to immediately release the buffer without waiting for GC (more important for VideoFrame, but consistency is good).

We like AudioBuffer for several reasons

  • it describes a buffer of raw audio in much the same way as other codec APIs (e.g. Chrome's internal API, ffmpeg, etc.), including sampleRate, length, duration, channels
  • it offers mechanisms to operate on the raw float data
  • easy integration with WebAudio, including AudioWorklet which will be used for rendering by WebCodecs applications

Edit: Sorry, I saw @rtoy's comment in my email before seeing @padenot's earlier comments. I happily defer to @padenot and others' expertise on the performance issues he outlines.

it describes a buffer of raw audio in much the same way as other codec APIs (e.g. Chrome's internal API, ffmpeg, etc.),

But I acknowledge Paul's point about other codec APIs not forcing planar float32.
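
As an illustration of the WebAudio integration point above, a minimal sketch assuming the current draft shape where a decoder output (AudioFrame) carries an AudioBuffer; the scheduling detail is illustrative, not taken from the WebCodecs spec:

    // Render a decoded frame through Web Audio; frame.buffer is the
    // AudioBuffer under discussion, frame.timestamp is in microseconds.
    const ctx = new AudioContext();
    function renderDecodedFrame(frame) {
      const source = new AudioBufferSourceNode(ctx, { buffer: frame.buffer });
      source.connect(ctx.destination);
      // Illustrative scheduling: map the microsecond timestamp onto the
      // context timeline (a real player would track an epoch offset).
      source.start(frame.timestamp / 1e6);
    }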

@guest271314

This comment has been minimized.

@guest271314

This comment has been minimized.

@guest271314

This comment has been minimized.

@guest271314

This comment has been minimized.

@rtoy
Member

rtoy commented Jan 8, 2021

@chcunningham If the current AudioBuffer works for you and the costs that @padenot mentions are acceptable, I have no objections to using the plain AudioBuffer.

But this also allows us to update AudioBuffer to a form, as mentioned in #11, that is more amenable to the kind of processing you might want to do.

@padenot
Member

padenot commented Jan 13, 2021

it describes a buffer of raw audio in much the same way as other codec APIs (e.g. Chrome's internal API, ffmpeg, etc.), including sampleRate, length, duration, channels

I just checked, and Chromium's and ffmpeg's objects specifically support what I'm talking about here (the possibility of having interleaved samples, various bit depths, and different sample representations in their decoded-audio data type). A comment on top of Chrome's declaration also mentions that a conversion to an AudioBus (the analogue of the Web Audio API's AudioBuffer in Chromium's codebase) requires copying and a possible conversion (if the source was not f32 planar in the first place).

If we extend AudioBuffer to be able to hold interleaved data of different types (as authors have requested, but we haven't designed a solution yet), then I have of course no objections. I can also take on some of that work (as someone working on both Web Audio and Web Codecs) if we end up finding that it's the right design.
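
A purely hypothetical sketch of the kind of extension being discussed; none of these constructor options exist in the Web Audio API, the shape is simply in the spirit of web-audio-api-v2 issue #11:

    // Hypothetical: an AudioBuffer that keeps what the decoder produced
    // (interleaved 16-bit integers) and only inflates to planar float32
    // when getChannelData()/copyFromChannel() is called.
    const buffer = new AudioBuffer({
      numberOfChannels: 2,
      length: 960,
      sampleRate: 48000,
      sampleFormat: 's16',   // hypothetical option
      interleaved: true,     // hypothetical option
    });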

@guest271314

Why delete comments critical of decisions based on conjecture without evidence?

Is this

FWIW, from perspective here what we need are tests to substantiate with evidence any adverse impact (performance or otherwise unintended consequences) defining AudioBuffer in Worker, and other methods and API's in AudioWorkletGlobalScope as currently there is no evidence that restricting defining Web Audio API methods in Worker or methods available in Worker in AudioWorkletGlobalScope will have the adverse (performance)

not true and correct?

If so, where are the tests?

@rtoy
Member

rtoy commented Jan 14, 2021

Comments were not deleted. I can still see them if I click them.

@hoch
Member

hoch commented Jan 15, 2021

@padenot Thanks for the context! As someone who was not involved in WebCodecs' design, I am curious. Then why does AudioFrame include AudioBuffer? I believe AudioBuffer is specifically designed for Web Audio, so only a non-interleaved format makes sense.

@chcunningham
Author

chcunningham commented Jan 15, 2021

Then why does AudioFrame include AudioBuffer?

The full reasoning is in my comment above. I made the call pretty much independently without consideration of the tradeoffs we're now discussing. Still, I think those points in favor of using AudioBuffer are compelling and it sounds like we have a solid path forward to resolve the issues raised here. @hoch do you agree?

I believe AudioBuffer is specifically designed for Web Audio, so only a non-interleaved format makes sense.

If WebCodecs were to create its own AudioBuffer, it would be pretty much identical to WebAudio's except, perhaps, offering more formats (and clearly that is not a v1 requirement, even for WebCodecs). If we can instead add those formats to AudioBuffer I think it simplifies how raw audio is described on the web platform and makes for clean integration with WebAudio.

@rtoy rtoy added this to In discussion in V2 Jan 21, 2021
@hoch
Member

hoch commented Feb 4, 2021

I discussed the GC pressure issue with @chcunningham and we believe Chromium's GC can handle the pressure from rapid AudioBuffer object creation by the WebCodecs API. @padenot might be able to speak for Firefox's view on this.

Here I see two paths:

  1. Expose BAC (BaseAudioContext) to Workers: By exposing BAC, we have to open other related interfaces to Workers anyway. This is a longer-term project (a hypothetical sketch of what this would enable follows the list).
  2. Expose AudioBuffer to Workers first, and then BAC later: We can support WebCodecs' use case by opening AudioBuffer in Workers. The change seems straightforward and it will be a great pilot project for the BAC in the future.
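
For concreteness, a hypothetical worker-side sketch of option 1; OfflineAudioContext and decodeAudioData() exist on the main thread but are not exposed to workers today, and 'audio.ogg' is an illustrative URL:

    // Hypothetical worker code — nothing below is available in workers yet.
    (async () => {
      const ctx = new OfflineAudioContext(2, 48000, 48000);
      const response = await fetch('audio.ogg');
      const decoded = await ctx.decodeAudioData(await response.arrayBuffer());
      self.postMessage({ duration: decoded.duration, channels: decoded.numberOfChannels });
    })();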

Option 2 seems non-controversial. I am curious what other people think.

offering more formats (and clearly that is not a v1 requirement, even for WebCodecs)

This deserves a separate issue in this tracker. Definitely not a V1 or short-term project.

@chcunningham
Author

I agree option 2 seems non-controversial, particularly the first step of exposing AudioBuffer to workers. I'm happy to see the other parts of option 2 pursued as well, but for now I'd like to just unblock by changing the exposure.

@padenot @rtoy @hoch are we in agreement?

@hoch
Member

hoch commented Feb 19, 2021

Yes. I agree.

@hoch
Member

hoch commented Feb 25, 2021

@padenot I understand we generally agree on this. Right?

@guest271314

Given a user-defined Float32Array set on an AudioBuffer, how do we generate a timestamp (https://bugs.chromium.org/p/chromium/issues/detail?id=1185755) for the AudioFrame?

@guest271314

I have not found the WebCodecs 'opus' AudioEncoder and AudioDecoder to be reliable, neither in real time nor when storing the values and streaming only after the entire input has been decoded.

Nonetheless, given that AudioFrame is used by Media Capture Insertable Streams (Breakout Box; Media Transform API), we still face the issue of generating timestamps programmatically.

One way to generate timestamps is to create a MediaStreamAudioDestinationNode connected to an OscillatorNode with frequency set to 0 (Chrome's implementation of MediaStreamAudioDestinationNode is not in compliance with the specification: it is not "potentially playing" although readyState is "live", and it requires an AudioNode as source to commence playback at an HTMLMediaElement, else we would not need to create an OscillatorNode), then use the timestamp from the AudioFrame value at ReadableStreamDefaultReader.read() to supply a user-defined buffer and timestamp to a new AudioFrame passed to WritableStreamDefaultWriter.write(), something like:

    const { stream } = msd;
    const [track] = stream.getAudioTracks();
    const osc = new OscillatorNode(ac, { frequency: 0 });
    const processor = new MediaStreamTrackProcessor(track);
    const generator = new MediaStreamTrackGenerator('audio');
    const { writable } = generator;
    const { readable: audioReadable } = processor;
    const audioWriter = writable.getWriter();
    const mediaStream = new MediaStream([generator]);
    const audioReader = audioReadable.getReader();
    audio.srcObject = mediaStream;
    osc.connect(msd);
    osc.start();
    let duration = 0;
    audioReader.read().then(async function process({ value, done }) {
      // avoid glitches, gaps, clipping start of MediaStreamTrackGenerator output (~0.5 second)
      if (audio.currentTime < value.buffer.duration * 50) {
        return audioWriter
          .write(value)
          .then(() => audioReader.read().then(process));
      }
      if (writeOffset && writeOffset >= readOffset) {
        // avoid clipping end of MediaStreamTrackGenerator output (~1.0 seconds; await audio playback completion)
        if (audio.currentTime < duration + value.buffer.duration * 100) {
          // noop
          return audioReader.read().then(process);
        } else {
          msd.disconnect();
          osc.disconnect();
          track.stop();
          audioReader.releaseLock();
          audioReadable.cancel();
          audioWriter.releaseLock();
          writable.abort();
          await writable.closed;
          await ac.close();
          console.log(`readOffset: ${readOffset}, writeOffset: ${writeOffset}, duration: ${duration}, audio.currentTime: ${audio.currentTime}, ac.currentTime: ${ac.currentTime}`);
          return await Promise.all([
            new Promise((resolve) => (stream.oninactive = resolve)),
            new Promise((resolve) => (ac.onstatechange = resolve)),
          ]);
        }
      }
      // get timestamp from AudioFrame
      const { timestamp } = value;
      // ...
      const floats = new Float32Array(220);
      // set user-defined values at floats
      const buffer = new AudioBuffer({
        numberOfChannels: 1,
        length: floats.length,
        sampleRate: 22050,
      });
      buffer.copyToChannel(floats, 0, 0);
      duration += buffer.duration;
      const frame = new AudioFrame({ timestamp, buffer });
      return audioWriter.write(frame).then(() => {
        frame.close();
        return audioReader.read().then(process);
      });
    });

@guest271314

Exposing AudioBuffer is not an issue. We can effectively create a user-defined audio buffer using Float32Array.

In both cases, we wrap the buffer in an AudioFrame to add some metadata and the ability to immediately release the buffer without waiting for GC

The issue is the wrapping in an AudioFrame, whose timestamp property has serious problems in production (I am not sure how AudioEncoder and AudioDecoder have been deployed, because they do not work and are broken https://plnkr.co/plunk/yY4CZbPNnZMT7Kwd; perhaps save for a test with a single frame).

An exposed AudioBuffer in a Worker will still be less than useful when the timestamp attribute of the wrapping AudioFrame can go backwards in time with respect to the preceding AudioFrame(s):

AudioFrame {timestamp: 54858, buffer: AudioBuffer}
AudioFrame {timestamp: 54293, buffer: AudioBuffer}

AudioFrame {timestamp: 126129, buffer: AudioBuffer}
AudioFrame {timestamp: 125602, buffer: AudioBuffer}

1194760315999
1194760315470

1194760347008
1194760346476

Fixing that should be the priority.
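
A small sketch of the kind of check being asked for here, assuming frames arrive through a reader or decoder callback; purely illustrative:

    // Warn whenever an AudioFrame timestamp is not strictly greater than
    // the previous one (the behaviour reported above).
    let lastTimestamp = -Infinity;
    function checkMonotonic(frame) {
      if (frame.timestamp <= lastTimestamp) {
        console.warn('non-monotonic timestamp:', frame.timestamp, '<=', lastTimestamp);
      }
      lastTimestamp = frame.timestamp;
    }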

@guest271314

WICG banned me. Otherwise I would strongly suggest filing a PR against the WebCodecs specification adding language that timestamp MUST be greater than the previous timestamp (which also rules out duplicate timestamps) https://bugs.chromium.org/p/chromium/issues/detail?id=1181547.

@guest271314

WebCodecs uses AudioBuffer as a container of unencoded audio.
https://wicg.github.io/web-codecs/#dom-audioframe-buffer

I am curious: why is AudioFrame necessary at all?

What is the reasoning for using a timestamp attribute?

AudioWorklet works without a timestamp.

AudioFrame (and the "microsecond" timestamp that is used internally in Chrome's implementation, though not exposed as constructible to the Web) appears to be an unnecessary abstraction.

chcunningham added a commit to chcunningham/web-audio-api-v2 that referenced this issue Mar 26, 2021
@chcunningham
Author

Looks like the build is happy now!

I've updated the opening comment with links to the Chromium bug and the WPT tests. Please let me know if anything else is needed.

@padenot
Member

padenot commented May 21, 2021

@chcunningham this will happen automatically when we expose the Web Audio API in Web Workers, and Web Codecs doesn't need it anymore, so I'm closing this, thanks!

@padenot padenot closed this as completed May 21, 2021
V2 automation moved this from In discussion to Done May 21, 2021