
Access to a different output device: AudioContext.setSinkId() #2400

Closed
cwilso opened this issue Nov 21, 2014 · 130 comments · Fixed by #2498
Assignees
Labels
P1 WG charter deliverables; "need to have" status: ready for editing
Projects

Comments

@cwilso
Contributor

cwilso commented Nov 21, 2014

Should be able to specify different audio devices, using media device selectors.

@cwilso cwilso self-assigned this Dec 3, 2014
@hoch
Member

hoch commented Feb 6, 2015

http://w3c.github.io/mediacapture-output/#h-webaudio-extensions

Just to keep the reference here. The spec enumerates several options for implementation with pros/cons.

@bill-hofmann
Contributor

Missing is any information about the characteristics of output devices (channel count, etc.). This is a getUserMedia issue, but essential to solve.

@joeberkovitz
Contributor

This is currently in Media Capture Task Force's court; we are monitoring their progress.

@cwilso
Contributor Author

cwilso commented May 12, 2015

I don't think MCTF is just gonna fix our problem, though - and we need to provide a constructor that takes a different device.

@joeberkovitz
Contributor

@cwilso If MCTF provides a way to obtain a MediaStream for some desired device (which is what we asked for, and their updated API seems well on the way to giving it to us) then doesn't AudioContext::createMediaStreamDestination() provide a way to use that device as a destination?

@cwilso
Contributor Author

cwilso commented May 12, 2015

Sure. But that would absolutely be a HORRIBLE way to connect - because you want the AudioContext to run at the rate and clock of the device, not at some arbitrary other clock and have to be coerced to that device. MediaStream device in/out implies a potential for clock conversion.

@joeberkovitz
Contributor

@cwilso Thanks for explaining - I had not been aware of that very significant point (I wonder if others were?).

Perhaps something as simple as a class method on AudioContext would fill the bill, e.g. ctx = AudioContext.createMediaStreamContext(stream). This would not lock us into an optional constructor arg that we'd have to live with forever. I'm not sure whether this sort of static pseudo-constructor is cool in Web APIs.

Do you have a specific proposal in mind for us to discuss on Thursday?

@cwilso
Contributor Author

cwilso commented May 12, 2015

Yes, it's what I gave Justin to put in the MCTF spec:

3.1.1 Constructor argument
Option 1: AudioContext constructor argument
The sink ID is passed as an argument to the AudioContext constructor, e.g.

new AudioContext({ sinkId: requestedSinkId });

Requiring the sink ID to be set at construction time simplifies the implementation, since the output sample rate is fixed.

@joeberkovitz
Contributor

I see now, it's in the document that @hoch referenced above. Sorry for the thrash here. Yup, that all looks great.

@joeberkovitz
Contributor

So... apart from MCTF needing to approve, is there a reason this is not just "Ready for Editing"?

@joeberkovitz
Contributor

@cwilso So that I can make sure MCTF is focusing on the right stuff: exactly what elements of the Audio Output API other than the Web Audio extensions must exist for V1, from your point of view? Since a sinkId is just a deviceID from enumerateDevices() (which itself is not part of the Audio Output API), do we really need to make the whole Audio Output API proposal a dependency for V1 Web Audio?

@cwilso
Contributor Author

cwilso commented May 12, 2015

I think we (Web Audio) are responsible for making sure AO API and WA API work together to define bedrock (e.g. audio device access, audio bits access a la issue #359) vs. layers on top of that bedrock (e.g. biquadfilter). If we don't work through and make sense of these architectural layers now, it will never make sense.

@joeberkovitz
Contributor

Just to confirm: you are saying we shouldn't implement the constructor until the whole AO API is accepted by MCTF?

@cwilso
Contributor Author

cwilso commented May 12, 2015

No, that's not quite what I'm saying. I'm saying we shouldn't ship until the model for how access to devices - bedrock - works is settled. We should be able to work through and prove how the Audio Output API is built on top of Web Audio, which is built on top of direct device access alongside getUserMedia (which provides device enumeration). The Audio Output API is actually a semantic layer on top of an implementation - we just need to prove that we could implement it (through device enumeration from gUM and redirecting of AudioContexts).

@joeberkovitz
Contributor

OK -- can you walk us through this on tomorrow's call? Let's discuss what "prove" means, in particular.

@cwilso
Contributor Author

cwilso commented May 13, 2015

Sure


@joeberkovitz
Contributor

This issue is Ready for Editing w/r/t the AudioContext constructor as described in the Audio Output API proposal. However, we still need to wait for the MCTF response re the ability to enumerate devices with an awareness of sample rate, latency, number of channels, etc.

@joeberkovitz joeberkovitz assigned joeberkovitz and unassigned cwilso Jun 1, 2015
@joeberkovitz
Contributor

We should also ask MCTF about Permissions API with respect to acquiring permission to access or enumerate devices.

@jasonmcaffee

Is there any workaround to get this functionality with AudioContext? With the appropriate flags enabled, I'm able to get the list of audio devices via navigator.mediaDevices.enumerateDevices(), but I'm at a loss on how to set the output to a given device ID. It would be a super useful feature.

@jasonmcaffee

I found a workaround, but I'm not sure how well it works yet. There seem to be pops and glitches with a single sine oscillator.
https://jsfiddle.net/2k7gkdqw/1/

Basically, the Audio element has a setSinkId method for setting the appropriate output device.
The sinkId can be obtained, after gaining audio permissions, via navigator.mediaDevices.enumerateDevices().

With the audio element set up to send to the appropriate output, you can stream from the AudioContext to the audio element by creating a MediaStreamAudioDestinationNode and passing its stream property to the audio element.

e.g.

var c = new AudioContext();
var o = c.createOscillator();
var m = c.createMediaStreamDestination();
o.connect(m);
var audioEl = new Audio();
// srcObject is the supported way to attach a MediaStream;
// URL.createObjectURL(MediaStream) no longer works in current browsers.
audioEl.srcObject = m.stream;
// setSinkId() returns a Promise; route to the device before starting playback.
audioEl.setSinkId('idFromEnumerateDevicesItem')
  .then(function () { return audioEl.play(); })
  .then(function () { o.start(); });

Not sure yet how well this will work when dealing with several oscillators, effects, etc., but it appears to be somewhat functional when the appropriate flags are set.

UPDATE: I plugged this behavior into my synthesizer.
There are quite a few pops and clicks, especially for the first 30-60 seconds of playing. There are periods where pops and clicks don't occur, but they seem to recur if the sound is complicated (lots of notes and/or lots of oscillators).
Another interesting behavior is that there are periods where the sound is detuned.

@petkaantonov

after gaining audio permissions

Btw, the permission dialog says "site wants to use your microphone", which is not going to get accepted if the user has just gone to, e.g., the "output device settings" section of an app. Something should probably be done about that, so that an app that simply wants to let the user choose between audio output devices can do so without having to deal with creepy microphone permissions.

@jan-ivar

You can get the deviceId of the output device without permission. Permission is only needed for the label.
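To illustrate: before permission is granted, enumerateDevices() reports devices whose label is the empty string, so a device picker has to fall back to generic names. A small sketch (the helper and its fallback naming are illustrative, not part of any spec; the browser calls are shown as comments since they need a secure browser context):

```javascript
// Illustrative helper: fall back to a generic name when `label` is empty,
// which is what enumerateDevices() returns before permission is granted.
function describeOutput(device, index) {
  return device.label !== "" ? device.label : "Output " + (index + 1);
}

// Browser usage (not runnable outside a browser):
// const devices = await navigator.mediaDevices.enumerateDevices();
// const outputs = devices.filter((d) => d.kind === "audiooutput");
// outputs.forEach((d, i) => console.log(d.deviceId, describeOutput(d, i)));
```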

@petkaantonov

I cannot imagine a use case where the label wouldn't be needed, other than malicious ones.

@jan-ivar

output 1, output 2.

@petkaantonov

It's unacceptable to present that to a normal user, who will just think the app is cheap/unfinished.

@jan-ivar

We are talking about audio people, right (didn't they invent output 1 and output 2)?

Seriously, though. What are you suggesting? That output device labels be in the clear?

@petkaantonov

I implied there should be a separate permission for seeing audio output labels, not "wants to use your microphone", which is creepy as hell when the app has no reason for it.

Applications that play sound are not for "audio people" only. Even if they were, that doesn't change the feeling of low quality and cheapness when an app cannot even get your devices right while all other apps can.


@hoch
Member

hoch commented Aug 26, 2022

I did some research on IDL files in Chromium, but couldn't find any precedent for a union type of DOMString and string enums. So at this point this is technically impossible with current WebIDL.

The other option is to create a dummy interface, interface InaudibleSink {}, and use the union type (DOMString or InaudibleSink), but it doesn't seem clean or practical.

Also, in the teleconf on 8/25, we agreed on using an internal queue for resolving multiple Promises.

@hoch
Member

hoch commented Aug 29, 2022

More details on a solution that doesn't seem clean or practical:

setSinkId((DOMString or AudioContextOptions) sinkId);

Where AudioContextOptions has:

dictionary AudioContextOptions {
  (AudioContextLatencyCategory or double) latencyHint = "interactive";
  float sampleRate;
  (DOMString or AudioContextSinkOptions) sinkId;
};

dictionary AudioContextSinkOptions {
  boolean useSilentSink;
};

This way, we can change the latency hint and the sample rate when we change the sink.
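To make the union concrete, here is a minimal sketch (not spec text; all names are illustrative assumptions) of how an implementation might normalize the proposed (DOMString or AudioContextSinkOptions) value:

```javascript
// Illustrative normalizer for the (DOMString or AudioContextSinkOptions) union.
// The names "kind", "device", and "silent" are invented for this sketch.
function normalizeSinkId(sinkId) {
  if (typeof sinkId === "string") {
    // "" conventionally selects the default device.
    return { kind: "device", id: sinkId };
  }
  if (sinkId && sinkId.useSilentSink === true) {
    return { kind: "silent" };
  }
  throw new TypeError("sinkId must be a DOMString or { useSilentSink: true }");
}
```

A string falls through to device selection, while the dictionary form carries only the silent-sink flag, matching the proposal above.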

@chrisguttandin
Contributor

Could the following be expressed in WebIDL?

new AudioContext() // uses the default device since sinkId is not defined

new AudioContext({ sinkId: null }) // uses no output device since sinkId is set to null
// or
const audioContext = new AudioContext();
audioContext.setSinkId(null);

new AudioContext({ sinkId: 'abcd' }) // uses the device with the sinkId called 'abcd'
// or
const audioContext = new AudioContext();
audioContext.setSinkId('abcd');

If I recall correctly any member of a dictionary is nullable by default. In that case it would just be a dictionary in WebIDL.

dictionary AudioContextOptions {
    DOMString sinkId;
};

Another option could be to use false instead of 'none' to select no output device.

new AudioContext({ sinkId: false })
// or
const audioContext = new AudioContext();
audioContext.setSinkId(false);

I'm not sure though if it is possible to define a union of a DOMString with a boolean in WebIDL.
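The false variant above could be distinguished at runtime even if WebIDL can't express the union cleanly; here's an illustrative dispatch function (purely a sketch, not a shipped or proposed API surface):

```javascript
// Illustrative runtime dispatch for a (DOMString or boolean-false) sinkId value.
// The return strings are invented labels for this sketch.
function interpretSinkId(value) {
  if (value === false) return "no-output";
  if (typeof value === "string") {
    return value === "" ? "default" : "device:" + value;
  }
  throw new TypeError("expected a DOMString or false");
}
```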

@hoch
Member

hoch commented Aug 29, 2022

(DOMString or boolean) should be possible, but it's not descriptive enough.

Using null is an interesting idea, and then I believe it becomes a union of (DOMString or object). It should be doable, but leaving the second field wide open to object doesn't feel great either.

I don't have a strong opinion, but the FooBarOptions approach is generally considered future-proof.

@bicknellr

Have you all considered putting this functionality on AudioDestinationNode instead? That way you'd be able to route different streams from the same graph to different outputs. This would be useful if you want to build something that supports separate master and monitor outputs when, for example, preparing upcoming tracks while DJing.

@hoch
Member

hoch commented Aug 30, 2022

There's a 1:1 association between AudioContext and AudioDestinationNode. Having multiple AudioDestinationNodes is an idea, but I am not sure we want to pursue it. Multiple devices mean the system needs to handle sample rate and callback buffer differences across them.

Also - the multi-routing is already possible with multiple instances of MediaStreamAudioDestinationNode -(MediaStream)-> AudioElement. You'll lose sample-accurate synchronization between devices, but that's expected without device aggregation and an intermediate layer.
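The workaround described above can be sketched as follows (browser-only code, so it is wrapped in a function and not invoked here; the function name and the two device-ID parameters are placeholders supplied by the caller):

```javascript
// Sketch: two independent output paths from one AudioContext, each routed to
// its own device through an Audio element via HTMLMediaElement.setSinkId().
// Not called here; requires a browser with AudioContext and Audio globals.
async function routeToTwoDevices(ctx, masterDeviceId, monitorDeviceId) {
  const master = ctx.createMediaStreamDestination();
  const monitor = ctx.createMediaStreamDestination();

  const masterEl = new Audio();
  masterEl.srcObject = master.stream;
  const monitorEl = new Audio();
  monitorEl.srcObject = monitor.stream;

  // setSinkId() is per-element, so each path can target a different device.
  await masterEl.setSinkId(masterDeviceId);
  await monitorEl.setSinkId(monitorDeviceId);
  await masterEl.play();
  await monitorEl.play();

  // Connect graph nodes to `master` or `monitor` as needed. As noted above,
  // the two paths are not sample-accurately synchronized.
  return { master, monitor };
}
```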

@hoch
Member

hoch commented Aug 31, 2022

This idea was also proposed by Chrome engineers:

audioContext.setSinkId("default");
audioContext.setSinkId("device-unique-id");
audioContext.setSinkId("silent");

No complicated types, just plain DOMStrings. This is easy and sensible, but IIUC there's no precedent for it in the Web Audio API. We've been using enums for this purpose.

@hoch
Member

hoch commented Sep 7, 2022

To recap, here are two proposals for configurability:

A. Using AudioSinkOptions pattern:

dictionary AudioContextOptions {
  ...
  (DOMString or AudioSinkOptions) sinkId;
};

dictionary AudioSinkOptions {
  boolean useSilentSink;
};

// example
audioContext.setSinkId("");
audioContext.setSinkId("5b79a953d8fb279...");
audioContext.setSinkId({useSilentSink: true});

B. Using plain strings:

dictionary AudioContextOptions {
  ...
  DOMString sinkId;
};

// example
audioContext.setSinkId("");
audioContext.setSinkId("5b79a953d8fb279...");
audioContext.setSinkId("silent");

@Sheraff

Sheraff commented Sep 8, 2022

Is it a guarantee that no device (sink) will ever have an ID of "silent" or "default"?

@hoch
Member

hoch commented Sep 8, 2022

See examples of typical IDs here:
https://developer.mozilla.org/en-US/docs/Web/API/MediaDevices/enumerateDevices#examples

This is what I get from my MacBook Pro + Chrome:

audioinput: Default - MacBook Pro Microphone (Built-in) id = default
audioinput: MacBook Pro Microphone (Built-in) id = 5b79a953d8fb279e717b108562f28b4d934473541367b4ae41b809fedb319a8d
audiooutput: Default - MacBook Pro Speakers (Built-in) id = default
audiooutput: MacBook Pro Speakers (Built-in) id = 94508a698f07a537453f6c37a9449817ff3321b8f0869588a94ee4661350f0a8

Is it a guarantee that no device (sink) will ever have an ID of "silent" or "default"?

So I would say no and yes:

  1. "No": you'll get default as an ID. This rather works nicely with this API. You can just throw default and it'll work.
  2. "Yes": you won't get silent as an ID unless there's no future spec change on MediaDevices.enumerateDevices(). Also the first proposal (using AudioSinkOptions) will be effective for this corner case.

@guest271314

This comment was marked as off-topic.

@padenot
Member

padenot commented Sep 9, 2022

The identifier is generated by the browser; it cannot be controlled by authors. It's different from the device name.

@guest271314

This comment was marked as off-topic.

@guest271314

This comment was marked as off-topic.

@hoch
Member

hoch commented Sep 14, 2022

As @padenot mentioned above, the device identifier and the label are different.

@hoch hoch self-assigned this Sep 14, 2022
@padenot
Member

padenot commented Sep 14, 2022

Update from discussing with @hoch at TPAC, this is what we're currently thinking:

enum AudioSinkType {
  "default",
  "none",
};

dictionary AudioSinkOptions {
  AudioSinkType type;
};

partial interface AudioContext {
  Promise<undefined> setSinkId((AudioSinkOptions or DOMString) sinkId);
};

AudioSinkType could grow other values. Something we thought about was the notion of "default device for communication use-case" (vs., say, listening to music). This is something that Android and Windows expose, at least.
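An illustrative validation function for the shape sketched above (the constant and function names here are invented for illustration; only the enum values come from the proposal):

```javascript
// setSinkId() under this proposal would accept either a device-ID string or a
// dictionary whose `type` is one of the AudioSinkType enum values.
const AUDIO_SINK_TYPES = ["default", "none"]; // values from the enum above

function isValidSinkArgument(arg) {
  if (typeof arg === "string") return true; // any browser-generated device ID
  return arg !== null && typeof arg === "object" &&
    AUDIO_SINK_TYPES.includes(arg.type);
}
```

This is also where growing the enum (e.g. a communications-default value) would slot in: new values extend AUDIO_SINK_TYPES without changing the call shape.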

@guest271314

This comment was marked as off-topic.

@hoch
Member

hoch commented Sep 15, 2022

A slight update and one more discussion topic:

enum AudioSinkType {
  "none"
};

dictionary AudioSinkOptions {
  AudioSinkType type;
};

partial interface AudioContext {
  Promise<undefined> setSinkId((AudioSinkOptions or DOMString) sinkId);
};

audioContext.setSinkId("") is already available for the default device, and the concept of "default device" doesn't really fall into a "type". A device being a default one is more about its identity, less of characteristics.

Question: what would be the value of audioContext.sinkId when the current sink type is "none"?

@hoch
Member

hoch commented Sep 16, 2022

We agreed upon the sinkId getter design. The up-to-date API shape is:

enum AudioSinkType {
  "none"
};

dictionary AudioSinkOptions {
  AudioSinkType type;
};

partial interface AudioContext {
  readonly attribute (DOMString or AudioSinkOptions) sinkId;
  Promise<undefined> setSinkId((DOMString or AudioSinkOptions) sinkId);
};

@guest271314

This comment was marked as off-topic.

@mjwilson-google
Contributor

hoch, would this generally mean that after a successful setSinkId we should get exactly the same argument back when we call sinkId?

guest271314, I think you are describing using PulseAudio commands to make a microphone / other input device appear as an output device, then setting that output device as the system default output device. I don't think this is the usual configuration, so if someone has set up their system that way it may be that they have a reason to and the browser should respect that. Are you concerned with the meaning of "default audio output device" in the spec? Would it make more sense to say something like the "system-reported" default audio output device?

It seems to me that if the user configures their system to have a particular default output audio device, then that is the "real" default audio output device (even if it happens to be a microphone, /dev/null, etc.). There isn't anything special that would make one audio device the natural default for a particular system.

Or am I misunderstanding your concern?

@hoch
Member

hoch commented Sep 20, 2022

Based on the algorithm, the internal slot changes only when the transition is successful. So:

await context.setSinkId('some-id'); // if this was successful
console.log(context.sinkId); // then this should be 'some-id'
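The "update only on success" behavior can be modeled with plain JavaScript (no real audio; the class, its method bodies, and the "bogus-id" failure condition are all invented for this sketch):

```javascript
// Minimal model of the algorithm described above: the internal sinkId slot is
// written only after the device transition succeeds, so a failed setSinkId()
// leaves the previous value intact.
class FakeContext {
  constructor() {
    this._sinkId = ""; // "" means the default device
  }
  get sinkId() {
    return this._sinkId;
  }
  async setSinkId(id) {
    // Stand-in for the real device-switch step; here any ID except
    // "bogus-id" is treated as a successful transition.
    if (id === "bogus-id") {
      throw new Error("could not switch to the requested sink");
    }
    this._sinkId = id; // slot changes only on success
  }
}
```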

@guest271314

This comment was marked as off-topic.
