TAG Issue: Layering considerations #257

Closed
chrislo opened this issue Oct 17, 2013 · 10 comments
Labels
w3c-tag-tracker Group bringing to attention of the TAG, or tracked by the TAG but not needing response.

Comments

@chrislo (Member) commented Oct 17, 2013

The following point was raised by the W3C TAG as part of their review of the Web Audio API. It covers a number of concerns, which we can split into separate issues if required. For now, let's capture our response here.

Layering Considerations

Web Audio is very low-level and this is a virtue. By describing a graph that operates in terms of samples of bytes, it enables developers to tightly control the behavior of processing and ensure low-latency delivery of results.

Today's Web Audio spec is an island: connected to its surroundings via loose ties, not integrated into the fabric of the platform as the natural basis and explanation of all audio processing -- despite being incredibly fit for that purpose.

Perhaps the most striking example of this comes from the presence in the platform of both Web Audio and the <audio> element. Given that the <audio> element is incredibly high-level, providing automation for loading, decoding, playback, and UI to control these processes, it would appear that Web Audio lives at an altogether lower place in the conceptual stack. A natural consequence of this might be to re-interpret the <audio> element's playback functions in terms of Web Audio. The UI could similarly be described in terms of Shadow DOM, and the loading of audio data in terms of XHR or the upcoming fetch() API. It's not necessary to re-interpret everything all at once, however.

Web Audio acknowledges that the <audio> element performs valuable audio loading work today by allowing the creation of SourceNode instances from them:

/***********************************
 * 4.11 The MediaElementAudioSourceNode Interface
 **/
// Assumes an AudioContext and a downstream AudioNode already exist, e.g.:
//   var context = new AudioContext();
//   var filterNode = context.createBiquadFilter();
var mediaElement = document.getElementById('mediaElementID');
var sourceNode = context.createMediaElementSource(mediaElement);
sourceNode.connect(filterNode);

Lots of questions arise, particularly if we think of media element audio playback as though its low-level aspects were described in terms of Web Audio:

  • Can a media element be connected to multiple AudioContexts at the same time?
  • Does ctx.createMediaElementSource(n) disconnect the output from the default context?
  • If a second context calls ctx2.createMediaElementSource(n) on the same media element, is it disconnected from the first?
  • Assuming it's possible to connect a media element to two contexts, effectively "wiring up" the output from one bit of processing to the other, is it possible to wire up the output of one context to another?
  • Why are there both MediaStreamAudioSourceNode and MediaElementAudioSourceNode in the spec? What makes them different, particularly given that neither appears to have properties or methods and both do nothing but inherit from AudioNode?

All of this seems to indicate some confusion in, at a minimum, the types used in the design. For instance, we could answer a few of the questions (see the sketch after this list) if we:

  • Eliminate MediaElementAudioSourceNode and instead re-cast media elements as possessing MediaStream audioStream attributes which can be connected to AudioContexts
  • Remove createMediaElementSource() in favor of createMediaStreamSource()
  • Add constructors for all of these generated types; this would force explanation of how things are connected.
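As a rough sketch of the shape this re-cast suggests (none of the names below are in today's spec: the audioStream attribute and the explicit constructor are hypothetical, shown only to make the suggestion concrete):

var context = new AudioContext();
var mediaElement = document.getElementById('mediaElementID');

// Hypothetical: the media element exposes its audio as a MediaStream attribute...
var sourceNode = context.createMediaStreamSource(mediaElement.audioStream);
sourceNode.connect(context.destination);

// ...and an explicit constructor (hypothetical in this sketch) would make the wiring visible:
// var sourceNode = new MediaStreamAudioSourceNode(context, { mediaStream: mediaElement.audioStream });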

That leaves a few open issues for which we don't currently have suggestions but believe the WG should address (a sketch after this list shows the kind of code in question):

  • What AudioContext do media elements use by default?
  • Is that context available to script? Is there such a thing as a "default context"?
  • What does it mean to have multiple AudioContext instances for the same hardware device? Chris Wilson advises that they are simply sum'd, but how is that described?
  • By what mechanism is an AudioContext attached to hardware? If I have multiple contexts corresponding to independent bits of hardware...how does that even happen? AudioContext doesn't seem to support any parameters and there aren't any statics defined for "default" audio contexts corresponding to attached hardware (or methods for getting them).
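For concreteness, this is the kind of code those last questions are about. Both contexts below are constructed without any parameters, and nothing in the API surface says how their outputs relate to each other or to the hardware; the summing behaviour mentioned above is only what implementations reportedly do, not something the spec describes:

// Two independent contexts; neither constructor names an output device.
var ctxA = new AudioContext();
var ctxB = new AudioContext();

function beep(ctx, frequency) {
  // Illustrative helper: an oscillator routed straight to the context's destination.
  var osc = ctx.createOscillator();
  osc.frequency.value = frequency;
  osc.connect(ctx.destination);
  osc.start();
}

// Both contexts presumably reach the same speakers; implementations reportedly sum
// their outputs, but no parameter, static, or spec text describes that relationship.
beep(ctxA, 440);
beep(ctxB, 660);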
@domenic (Contributor) commented Mar 30, 2014

@jernoble would you be able to use your https://github.com/jernoble/Sound repo to answer @slightlyoff's original set of questions above? Especially the last set?

@cwilso (Contributor) commented Mar 31, 2014

Sort of. Jer's sound.js repo just calls decodeAudioData() for all files, so it's not going to stream; it will need a complete file before it can start. We'd need to implement codecs (or expand the decoding API in Web Audio) to make it complete.
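For context, the decodeAudioData() pattern in question looks roughly like this (a sketch using the promise form, not sound.js's actual code; the URL is illustrative). Because decodeAudioData() takes a complete ArrayBuffer, nothing can be scheduled until the whole file has been fetched and decoded:

var context = new AudioContext();

fetch('music.ogg')                                             // illustrative URL
  .then(function (response) { return response.arrayBuffer(); })
  .then(function (encoded) { return context.decodeAudioData(encoded); })
  .then(function (audioBuffer) {
    // Only now, with the entire file decoded in memory, can playback be scheduled.
    var source = context.createBufferSource();
    source.buffer = audioBuffer;
    source.connect(context.destination);
    source.start();
  });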

cwilso self-assigned this Apr 17, 2014
@cwilso (Contributor) commented Apr 17, 2014

What we need to respond to this:

  • gap analysis with media elements
  • answers to specific questions above
  • model for "default audio context"

@joeberkovitz (Contributor) commented

Next step: write a response to the TAG issues, including this one.

@joeberkovitz (Contributor) commented

[...this is stuck in the same place as #250...] @cwilso has there been any discussion with the TAG? We need to address this in order to get the spec into V1 shape. Please let the group know if you are still on this, or if the chairs should find another way to resolve it.

cwilso changed the title from "Layering considerations" to "TAG Issue: Layering considerations" Oct 26, 2015
@joeberkovitz (Contributor) commented

TPAC: this is a big chunk that we should put at the top of the list for post-V1 action, taking a fresh look with the TAG in a cross-WG discussion.

@kirbysayshi commented Feb 10, 2017

One issue I've run into recently: it's impossible to accurately schedule playback (start/pause) of an HTMLMediaElement connected to an AudioContext. This means there is no way (as far as I know) to schedule accurate playback of either a large audio file (which requires streaming) or an EME-protected file.
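To illustrate the gap, here is a sketch (the decoded buffer and element id are assumed): an AudioBufferSourceNode can be started at an exact context time, while a media element routed through the graph can only be started with play(), which has no scheduling parameter.

var context = new AudioContext();

// Sample-accurate: a buffer source is scheduled against the context clock.
var bufferSource = context.createBufferSource();
bufferSource.buffer = decodedBuffer;                   // assumed to exist already
bufferSource.connect(context.destination);
bufferSource.start(context.currentTime + 1.0);         // starts exactly one second from now

// Not sample-accurate: the element starts "soon" after play(), with unspecified latency.
var mediaElement = document.getElementById('mediaElementID');  // hypothetical id
var elementSource = context.createMediaElementSource(mediaElement);
elementSource.connect(context.destination);
mediaElement.play();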

Is accurate scheduled playback also addressed by this issue? Or a separate concern?

@padenot (Member) commented Feb 13, 2017

It's separate. This issue is about speccing the HTMLMediaElement in terms of the Web Audio API and other specs, so that the Web Platform is, in a way, "layered": it offers lower-level primitives that allow the higher-level APIs to be reimplemented. Authors, depending on their needs, would target different sets of APIs (and ideally combine them).

For accurate scheduling of long media elements, I don't think there is a perfect, built-in way to do this at the moment. Authors have had good experiences using stitched AudioBufferSourceNodes, either decoding the media files in JavaScript or using a codec that lets them easily split and stitch using decodeAudioData (e.g., Vorbis, Opus), with minimal JS code. It all depends on the use case. For example, does "accurate" mean "sample accurate", or is 10 ms of scheduling jitter OK? In general, phrasing a problem in terms of real-world use cases allows for better framing of the issue. Maybe you could open a separate GitHub issue where we could discuss what does not currently work?
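A minimal sketch of that stitching approach, assuming the media has already been split into chunk files that decodeAudioData() can handle on their own (e.g. separate Opus or Vorbis files; the URLs are illustrative):

var context = new AudioContext();
var chunkUrls = ['part-000.opus', 'part-001.opus', 'part-002.opus']; // illustrative

// Decode every chunk, then schedule them back to back on the context clock.
Promise.all(chunkUrls.map(function (url) {
  return fetch(url)
    .then(function (response) { return response.arrayBuffer(); })
    .then(function (encoded) { return context.decodeAudioData(encoded); });
})).then(function (buffers) {
  var when = context.currentTime + 0.1;  // small lead time before the first chunk
  buffers.forEach(function (buffer) {
    var source = context.createBufferSource();
    source.buffer = buffer;
    source.connect(context.destination);
    source.start(when);                  // sample-accurate start
    when += buffer.duration;             // next chunk begins where this one ends
  });
});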

Being able to pipe EME-protected content into the Web Audio API using a MediaElementAudioSourceNode, or to use decodeAudioData on an EME-encrypted blob, is, for now, restricted for obvious reasons, but contacting browser vendors would be the right way forward here. For Mozilla, padenot@mozilla.com would work.

@kirbysayshi commented

OK, sorry for causing noise on this issue, and thank you for your thoughtful response! To answer your question: accurate means sample accurate (10 ms of scheduling jitter is perceptible in applications like beat-synced triggering of <audio> elements). Since HTMLMediaElement#play is the only interface for EME playback, the stitched-AudioBufferSourceNode solution unfortunately doesn't allow sample-accurate playback of EME audio (as far as I know, there is no way to get decoded buffers out of EME precisely).

I'll raise a separate issue with more details.

svgeesus added the TAG label Nov 6, 2017
@padenot (Member) commented Sep 17, 2019

It's been decided to do the decoding part of things in a different spec, because the Web Audio API really is about processing.

The Web Audio API can do everything that is needed on the playback side of the HTMLMediaElement.

padenot closed this as completed Sep 17, 2019
V2 Preparation (DO NOT USE) automation moved this from To do to Done Sep 17, 2019
plehegar added the w3c-tag-tracker label Apr 21, 2020