Specify the channels order for HTMLMediaElement #1089

Closed

VincentJousse opened this issue Nov 23, 2016 · 41 comments

@VincentJousse

VincentJousse commented Nov 23, 2016

As discussed here, we need a normalized channel order for channel manipulation in the Web Audio API.
It has already been defined for the upmix/downmix algorithms. The only places where I see a lack of specification are ChannelSplitterNode and ChannelMergerNode.
I see two ways to specify this:

  • Enhance these two nodes' specifications.
  • Define the channel order globally in the Web Audio API specification.

I'm not an expert user of the WAA, so I'll let others feed this thread.

Vincent

@padenot
Member

padenot commented Nov 23, 2016

One of the things that needs to be done is to spec a channel ordering for the Web. It is possible to inspect the internals of an HTMLMediaElement or MediaStream with the Web Audio API.

This channel ordering would probably only be used when using the Web Audio API, and needs to work with all existing codecs. If you only use an HTMLMediaElement without the Web Audio API, another mapping can be used internally.

For now, it looks like some browsers are using the channel mapping of the media stack they are built on, without remapping. Depending on the OS, OS version, and browser, results differ.

We need to involve a number of people, including HTMLMediaElement people, Web Audio API people, VR people (because they have unusual requirements, like non-mapped 8-channel files), and probably authors as well, so that we find a proper solution, as multi-channel content is becoming more and more popular.

@padenot changed the title from "Specify the channels order for the ChannelSplitterNode and the ChannelMergerNode." to "Specify the channels order for HTMLMediaElement" Nov 23, 2016
@MatthieuParmentier

To match several needs and various kinds of content (multichannel, Ambisonics & HOA, object-based audio...), it may be useful to borrow the codes from the recent ITU-R BS.2076.

This document describes the Audio Definition Model, a free format for describing any kind of audio content, rather than reinventing the wheel. The model was developed to drive audio rendering engines, which is exactly the job the Web Audio API does too.

As a shortcut to help this discussion, I would recommend adopting the audioPackFormat and audioChannelFormat codes to easily describe any multichannel content. These codes are fixed and reflect the most common pack formats (mono, dual mono, stereo, surround, 5.1...) and channel formats (left, right, center, left surround...). The codes can also describe Ambisonics content and a few other formats such as AmbiX, used for Google and Facebook VR.

In the broadcast domain, when exchanging media between editors and broadcasters, we usually refer to EBU R123 codes, which specify a few combinations of packs and channels (type and order). But the main problem with EBU R123 is that the recommendation has to be updated to create new codes, and there is no rule for doing so. This is why the Audio Definition Model was born: a flexible schema for audio content description. Hope this helps.

@padenot
Member

padenot commented Nov 23, 2016

Interesting, thanks.

Quoting the document:

Therefore, the EBU plans to generate a set of standard format descriptions for many of the commonly used formats

(section 4, page 7)

This is what we need here. Something like SMPTE 2036-2-2008 [0] would work. As the document you linked notes, having a full-blown meta-model ready for introspection is very heavy, and is not a goal for us.

Most content has a defined channel mapping (whether it's Dolby, SMPTE or something like WAVEX, Vorbis, etc.). For us, it's just a matter of presenting something coherent to JavaScript.

For example, say you have a 5.1 file. Regardless of the input format, and provided a channel mapping is defined (unlike in the situation described in the next paragraph), authors should expect something like: Left, Right, Center, Low Frequency Effects, Surround Left, Surround Right. The User-Agent should re-map the channels accordingly, so that it is not necessary to detect which UA the code is running on, special-case every type of file, and have everybody re-map the channels in their JavaScript code (which is doable, of course, but not something that authors should have to do).
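
For illustration, here is a minimal sketch of the kind of re-mapping authors currently have to do in JavaScript, assuming they somehow know out-of-band that a decoded 5.1 buffer uses AAC ordering (the function name and the assumed source order are made up for the example):

function remapToSmpte(ctx, buffer, sourceOrder) {
  // Target: SMPTE/WAVE-style 5.1 order.
  const targetOrder = ['L', 'R', 'C', 'LFE', 'SL', 'SR'];
  const out = ctx.createBuffer(buffer.numberOfChannels, buffer.length, buffer.sampleRate);
  targetOrder.forEach((label, i) => {
    // Copy each channel from wherever the source file stored it.
    out.copyToChannel(buffer.getChannelData(sourceOrder.indexOf(label)), i);
  });
  return out;
}
// e.g. remapToSmpte(audioCtx, decoded, ['C', 'L', 'R', 'SL', 'SR', 'LFE']) for an AAC-ordered buffer.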

For custom things, it would be better to have something like Opus' mapping family 255 [1], where you don't really have a mapping defined, but your application code can do whatever it needs, and you are guaranteed to have the channels in the same order as they are stored in the file.

[0]: The document itself is not freely available, but this is close: https://gstreamer.freedesktop.org/data/doc/gstreamer/head/gst-plugins-base-libs/html/gst-plugins-base-libs-gstaudiochannels.html#GstAudioChannelPosition
[1]: http://www.opus-codec.org/docs/opusfile_api-0.4/structOpusHead.html, search for "OpusHead::mapping_family"

@kickermeister

This is what we need here. Something like SMPTE 2036-2-2008 [0] would work.

SMPTE 2036-2-2008 defines only the 22.2 setup. Unfortunately, there is currently no document available from SMPTE or the ITU-R that could be used as a reference for setups with more than 6 channels, such as 9.1 (aka 4+5+0). There could be something from the AES, but at least I was not able to find anything, especially not for all the different setups defined in ITU-R BS.2051. One could use TABLE 1 from ITU-R BS.2051, but I'm not sure how reliable it is.

For setups up to 6 channels, you should stick with EBU R123, EBU R91 and ITU-R BS.775.
However, any reliable order of channels would be very much appreciated.

For custom things, it would be better to have something like Opus' mapping family 255 [1], where you don't really have a mapping defined, but your application code can do whatever it needs, and you are guaranteed to have the channels in the same order as they are stored in the file.

I think this would be really great for Ambisonics, HOA and object-based content!

@padenot
Member

padenot commented Dec 10, 2016

This can be a gradual process. There is a loose de facto agreement on SMPTE ordering. I propose that we spec that. This covers up to 7.1. Anything else we can spec later.

@hoch
Member

hoch commented Dec 19, 2016

Although I agree that the Audio WG's review is needed here, this issue should be upstreamed to the HTMLMediaElement level. If the channel order is nicely defined by the core decoding components (i.e. the video and audio tags), Web Audio can simply follow it.

@hoch
Member

hoch commented Dec 19, 2016

I believe @jdsmith3000 has been working along these lines? Any opinion?

@padenot
Member

padenot commented Dec 19, 2016

There was an effort at some point to layer HTMLMediaElement on top of the Web Audio API. If this is still something we consider important, then it should be specced the other way around. Of course, conceptually, piping an HTMLMediaElement into a Web Audio graph then makes a somewhat weird loop across the various specs, but maybe it's a situation that is tenable until the Web Audio API can be used to properly implement an HTMLMediaElement?

@jdsmith3000
Contributor

jdsmith3000 commented Jan 4, 2017 via email

@padenot
Member

padenot commented Jan 9, 2017

@jdsmith3000, we have had reports from ISVs that there are inconsistencies between browsers when it comes to channel mapping for a given audio file.

In our reports, authors want to know at which index the channels for, say, Left, Right, Center, or Low Frequency are, so they can be processed appropriately for their use case. The Web Audio API has no concept of channel mapping, and instead assumes what looks like SMPTE ordering for all channels and for up/down mixing. This is becoming important because, with the Web Audio API, authors can inspect the output of an HTMLMediaElement via a MediaElementSourceNode, or a MediaStream{Track,}AudioSourceNode if the HTMLMediaElement's output has been captured via captureStream.
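
For context, a minimal sketch of how that inspection happens in practice (not from this thread; the 6-channel count is just an assumption for a 5.1 stream):

const ctx = new AudioContext();
const mediaElement = document.querySelector('audio');      // some multi-channel <audio> element
const source = ctx.createMediaElementSource(mediaElement);
const splitter = ctx.createChannelSplitter(6);              // assume a 5.1 stream
source.connect(splitter);
// Output 2 of the splitter is "whatever the UA put at index 2": Center on some
// browser/file combinations, something else on others -- the inconsistency described above.
splitter.connect(ctx.destination, 2, 0);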

In practice:

  • Some browsers simply use the channel mapping of the file. There is no Web API to detect the channel ordering of a particular media file (as of writing this), so users cannot really do anything with the files, since they don't know if the second channel is, for example, Right or Center.
  • Some browsers remap all the channels to something standard (say, SMPTE ordering), so that authors can assume a particular ordering. Internally, UAs know which channel mapping a particular file uses, and can move the channels around.

There needs to be some sort of agreement on what to do, so that multi-channel audio on the web is viable beyond the basic use-case (simple playback).

@mdjp added this to the Web Audio V1 milestone Jan 19, 2017
@joeberkovitz
Contributor

Today's WG call: Proposal is to adopt SMPTE ordering. Seeking feedback from other developers and community at large prior to making this change.

@joeberkovitz
Contributor

Status today: still awaiting feedback from @mdjp's colleagues

@joeberkovitz
Contributor

@jdsmith3000 says that SMPTE ordering appears to be same as Windows ordering.

Resolution is to reproduce SMPTE ordering in the spec.

@joeberkovitz assigned jdsmith3000 and unassigned mdjp Mar 16, 2017
@mdjp
Member

mdjp commented Mar 20, 2017

Some more feedback to consider: SMPTE sounds like a sensible approach. The only other document worth considering is BS.2094 (common definitions for the ADM). This specifies pack formats for different layouts (from page 6), and the channel ordering for 22.2 thankfully fits with the SMPTE version. However, it also highlights that there is more than one kind of 7.0/7.1, and raises questions such as whether you need a silent fourth channel when there is no LFE, as in 5.0.
Without some signalling this generally becomes very difficult.
Also, what about Ambisonics busses?

@kickermeister

I'd like to second the suggestion from @mdjp. SMPTE 2036-2-2008 only specifies the channel ordering for a 22.2 setup, which is nice but certainly not a good basis for other, more realistic channel formats for the Web. ITU-R BS.2094 (http://www.itu.int/dms_pubrec/itu-r/rec/bs/R-REC-BS.2094-0-201604-I!!PDF-E.pdf) appears to be the best option for now.

@joeberkovitz
Contributor

To add a little more color from the last WG call:

We said that we would reproduce the known SMPTE channel ordering in the specification. However, we did not intend to specifically tie the spec to SMPTE (possibly forcing future web developers to buy copies of the spec). What we were actually committing to was to document the known SMPTE orderings in the Web Audio spec as the canonical ordering (without citing SMPTE per se), and leave room for other channel layouts to be included in the future.

So maybe this goes well with the above two suggestions.

@jdsmith3000
Contributor

This is a little different from my understanding. The SMPTE ordering accommodates up to 22.2 multichannel, but can also accommodate sub-variations. We use a similar ordering scheme for Windows under WaveFormatExtensible. The ordering in it matches SMPTE, but the labels are more explicit. I believe we agreed to both state the ordering and labeling from SMPTE in the spec, and credit the document as the origin.

@padenot
Member

padenot commented Mar 23, 2017

Like @jdsmith3000, I thought we would reference other documents. I think it makes sense to do so.

@joeberkovitz
Contributor

Maybe it was my misunderstanding about referencing SMPTE, then -- I defer to the spec editors and experts on this. I am not sure there is actually a problem here.

@rtoy
Member

rtoy commented Mar 23, 2017

This is my recollection too (what @padenot and @jdsmith3000 said) from the teleconf. There was some concern expressed about referencing a document that costs a significant amount of money.

@jdsmith3000
Contributor

I have the SMPTE ST 2036-2-2008 UHDTV Audio Characteristics and Channel Mapping specification and am looking at where to specifically add the content and the SMPTE reference. To me, the logical change is to extend the mapping information in section 6.2 - Channel Ordering to include "22.2". Subsets should work with that. The SMPTE ordering, for example, lines up with the 5.1 channel ordering currently in the spec. And adding the information here aligns with the original concern about needing an ordering for ChannelSplitterNode and ChannelMergerNode.

If this sounds okay to others, I will proceed with a pull request.

@kickermeister

Sorry for bothering you again, but I still wonder how you will define subsets with fewer than 24 channels. For instance, what should the decoder channel order be for an "11.1" (aka "7.1.4" aka "7.1+4") channel setup file? Will there be a definition for this example subset in the spec? What happens if an unsupported subset is used? Moreover, the increasingly used Ambisonics (FOA/HOA) content does not even have a direct loudspeaker mapping in the traditional sense. How can that be covered by the spec?

@hoch
Member

hoch commented Apr 11, 2017

I also sense that many people are interested in 'non-diegetic' audio in the FOA/HOA scenario.

By the way, even if our WG decides to adopt some mapping scheme, I am not sure what that means in practice. The Web Audio API does not govern the decoding/streaming of media files unless they are given to decodeAudioData(). The splitter and the merger don't do anything smart to reorder the channel mapping, and we want them to stay that way. So are we talking specifically about that method, or are we talking about proposing a channel scheme to the MediaElement folks?

I understand there was an attempt to build MediaElement on top of other APIs, but I don't see that happening now or in the near future. IMO, we should focus on confirming the decision from MediaElement, and making our decodeAudioData() work consistently.

@padenot
Member

padenot commented Apr 12, 2017

This is about exposing a stable and consistent channel ordering when either:

  • Decoding a media file using decodeAudioData
  • Using an HTMLMediaElement, via MediaElementAudioSourceNode

This is not the case right now: implementations either remap to a consistent ordering (Gecko, Edge, although I probably haven't checked all cases for Edge), or expose the ordering of the underlying decoding mechanism (for example, and I could be wrong, but from trying a few things: the native OS X API on Safari, whatever ffmpeg does on Chrome).

Since there is no way (short of writing a custom parser) for authors to discover the actual mapping of a file, the proposal is to remap the channels of all files to a well-known mapping: SMPTE/WaveFormatExtensible.

Advanced use-cases can (and do) use something like Opus' mapping family 255, and get from 1 to 255 channels without an explicit mapping, but then there is supposedly associated code that works with those files, and the knowledge of what to do with which channel is the responsibility of the code (and not the media file).

Again, this only matters because the actual content of the channels is observable when using the Web Audio API (and only when using the Web Audio API). When simply playing an HTMLMediaElement, other behaviours can be (and are) implemented and valid, such as shipping the compressed audio directly to a special DSP chip (often done with mp3 on mobile for better battery life) or to dedicated hardware (say, encoded surround audio that plays on a Hi-Fi home cinema system, which has its own specific channel ordering and processing).

@hoch
Member

hoch commented Apr 12, 2017

for example, and I could be wrong, but from trying a few things: the native OS X API on Safari, whatever ffmpeg does on Chrome

I literally have an example of this.

Yes, I have personally experienced this and I understand what the problem is. My point was about the meaning of a decision we make here. Without involving the working group for MediaElement, I don't see the point of the discussion. Perhaps I am asking because I don't have the big picture on how the various working groups operate.

@padenot
Member

padenot commented Apr 13, 2017

Yes, see my last paragraph here.

This is not an issue that happens when you're not using the Web Audio API. This remapping would happen in MediaElementAudioSourceNode or at the end of the decoding when using decodeAudioData.

@hoch
Member

hoch commented Apr 13, 2017

Yes, thanks for the clarification.

So the channel mapping between the MediaElement and the audio system layer (a DSP chip or dedicated hardware) is already decided. It sounds like this invisible channel mapping might differ across platforms, but it's handled automatically. Does that mean the mapping scheme exposed to the Web Audio API also needs to be changed somehow?

@padenot
Member

padenot commented Apr 13, 2017

Does that mean the mapping scheme exposed to the Web Audio API also needs to be changed somehow?

I think the idea so far is, regardless of the actual channel mapping in the actual media, the Web Audio API would re-shuffle the channels to present a stable order.

For example, consider two files, one AAC file and one Wav file.

In AAC, the ordering is: C, L, R, SL, SR, LFE
In Wav, the ordering is: L, R, C, LFE, SL, SR

It's not uncommon for the compressed AAC to be sent directly to another device (for example over an optical cable), which is then responsible for the matrixing/whatever needs to happen. That other device knows both the speaker setup AND the channel mapping present in the file, so it's able to make an informed decision about what to do. The other file also has a defined mapping, so the UA can remap the uncompressed audio itself.

For the Web Audio API, the information about the channel mapping is lost when using MediaElementAudioSourceNode or decodeAudioData: you only have a channel count, so you can't make an informed decision and re-map the channels yourself. Of course, you can always do that if you provide the info out-of-band, but it's not great to have to do that.
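
To make the example above concrete, the index permutation between the two orderings looks like this (illustration only; the variable name is made up):

// AAC order: C, L, R, SL, SR, LFE   (indices 0..5)
// WAV order: L, R, C, LFE, SL, SR   (indices 0..5)
// Mapping from an AAC channel index to the equivalent WAV channel index:
const aacToWav = { 0: 2, 1: 0, 2: 1, 3: 4, 4: 5, 5: 3 };

Without out-of-band knowledge that the source was AAC-ordered, nothing in the decoded output tells you this permutation is needed.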

@jdsmith3000
Contributor

If we lose channel mapping information, is ordering alone sufficient? The 22.2 mapping will define the order of channels beyond our current mono, stereo and 5.1; but it will only hold up if all channels in the order are represented in the audio. Correct?

@padenot
Member

padenot commented Apr 14, 2017

If we lose channel mapping information, is ordering alone sufficient?

If we spec a consistent channel ordering regardless of the input media, then the Web Audio API implementation remaps to this consistent channel ordering, so authors have the guarantee, for example, that on an AudioBuffer that has six channels, getChannelData(2) will be the center channel, so there is no issue about the loss of information.

It appears that ordering and channel count will be sufficient to work with any file, if we have the guarantee that the Web Audio API only presents data in (say) WaveFormatExtensible or SMPTE order.
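
As a sketch of what that guarantee would look like from author code (the file name is hypothetical, and the indices assume the proposed SMPTE/WaveFormatExtensible order):

// (inside an async function or module)
const ctx = new AudioContext();
const response = await fetch('surround-5_1.wav');           // hypothetical 5.1 asset
const decoded = await ctx.decodeAudioData(await response.arrayBuffer());
if (decoded.numberOfChannels === 6) {
  const center = decoded.getChannelData(2);                  // always Center under the proposal
  const lfe = decoded.getChannelData(3);                     // always LFE under the proposal
  // ...process center/LFE without sniffing the source container.
}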

The 22.2 mapping will define the order of channels beyond our current mono, stereo and 5.1; but it will only hold up if all channels in the order are represented in the audio. Correct?

I'm sorry, I don't think I understand what you mean here.

@hoch
Member

hoch commented Apr 14, 2017

for example, that on an AudioBuffer that has six channels, getChannelData(2) will be the center channel, so there is no issue about the loss of information.

I really like that idea. Perhaps we can go further by allowing getChannelData("FC")?

As we discussed, Media*SourceNode does not know how many active channels it contains. Can we consider adding one more property there?

console.log(mediaElementSourceNode.streamInfo);
>> "[FL, FR, FC, LFE1, BL, BR]"

Once we can agree upon the channel mapping scheme, adding this should not be a problem.

@padenot
Member

padenot commented Apr 18, 2017

As we discussed, Media*SourceNode does not know how many active channels it contains.

I don't remember talking about that, do you have a link or something?

@hoch
Member

hoch commented Apr 18, 2017

I think I heard it in our teleconference, but couldn't find it in the minutes. FWIW, currently the information is not exposed anywhere (i.e. we can't query the node to find out the current active channel configuration).

This kind of introspection is always better for developers, I believe.

@jdsmith3000
Contributor

The SMPTE channel ordering is currently not supported on Windows. Is there a strong argument favoring SMPTE over the ordering established by longer usage in Windows? If not, I would prefer listing the ordering in WAVEFORMATEXTENSIBLE. It is similar to, but not the same as, SMPTE.

Separately, @hoch suggests extending the use of standard labeling on the orderings. Is the intent to avoid having to fill missing positions with blank channels? I'm not sure implementations support the labeling currently, so this might not be readily supportable today.

@joeberkovitz
Contributor

An approach that emerged on today's WG call is as follows (note that only @hoch, @rtoy and @svgeesus were present besides the chair):

In #1089 (comment) @hoch suggested that we endow Media*SourceNodes with some descriptive info that describes the mapping from channel indices to their meanings. This info could be optionally present, where known. I also note that AudioBuffer may benefit from the same idea, since its channel indices presumably reflect the channel ordering of whatever media were decoded, and different media formats use different native channel orderings.

The idea of optional descriptive data (perhaps coupled with some way to easily look up channels by their descriptive meaning, rather than their index) seems in many ways more promising than continuing down the road of forcing a uniform channel ordering on all of these interfaces, which is proving problematic (and might entail incompatible changes to existing WebRTC behavior).
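
Purely as a hypothetical shape for that idea (none of these properties exist today; the names are invented for illustration):

const ctx = new AudioContext();
const node = ctx.createMediaElementSource(document.querySelector('audio'));
// Hypothetical: a Media*SourceNode (or an AudioBuffer) exposes its channel labels when known.
const labels = node.channelLabels;                           // e.g. ["FL", "FR", "FC", "LFE1", "BL", "BR"], or null if unknown
if (labels) {
  const centerIndex = labels.indexOf('FC');                  // look a channel up by its meaning, not its index
  // ...
} else {
  // Mapping unknown: treat channels as opaque indices, as today.
}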

Since this can be added later, and since it will require some more careful design, this suggests we should push off this capability to v.next and allow the channel index assignments to remain as they are for now, still indeterminate in some cases as to what a given channel index really is.

@padenot
Member

padenot commented Jun 8, 2017

The idea of optional descriptive data (perhaps coupled with some way to easily look up channels by their descriptive meaning, rather than their index) seems in many ways more promising than continuing down the road of forcing a uniform channel ordering on all of these interfaces, which is proving problematic (and might entail incompatible changes to existing WebRTC behavior).

What problems have been encountered?

@joeberkovitz
Contributor

@jdsmith3000: At the F2F, we resolved to adopt the ordering from WAVEFORMATEXTENSIBLE and to duplicate this information in the spec. An informative note can observe that the spec ordering is based on WAV.

@jdsmith3000
Contributor

Like this?

Extended:
0: SPEAKER_FRONT_LEFT
1: SPEAKER_FRONT_RIGHT
2: SPEAKER_FRONT_CENTER
3: SPEAKER_LOW_FREQUENCY
4: SPEAKER_BACK_LEFT
5: SPEAKER_BACK_RIGHT
6: SPEAKER_FRONT_LEFT_OF_CENTER
7: SPEAKER_FRONT_RIGHT_OF_CENTER
8: SPEAKER_BACK_CENTER
9: SPEAKER_SIDE_LEFT
10: SPEAKER_SIDE_RIGHT
11: SPEAKER_TOP_CENTER
12: SPEAKER_TOP_FRONT_LEFT
13: SPEAKER_TOP_FRONT_CENTER
14: SPEAKER_TOP_FRONT_RIGHT
15: SPEAKER_TOP_BACK_LEFT
16: SPEAKER_TOP_BACK_CENTER
17: SPEAKER_TOP_BACK_RIGHT

@padenot
Member

padenot commented Jun 21, 2017

Yes. We'd make a table in the spec that has this information.

@joeberkovitz
Contributor

We can use this table as a sort ordering for channels, so that the description copes gracefully with missing channels.
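
A small sketch of that idea (the labels come from the list in the previous comment; the helper name is made up): sort whatever channel labels are present by their position in the canonical table, so missing channels simply drop out instead of leaving gaps.

const CANONICAL_ORDER = [
  'SPEAKER_FRONT_LEFT', 'SPEAKER_FRONT_RIGHT', 'SPEAKER_FRONT_CENTER',
  'SPEAKER_LOW_FREQUENCY', 'SPEAKER_BACK_LEFT', 'SPEAKER_BACK_RIGHT',
  // ...remaining entries as listed above, through SPEAKER_TOP_BACK_RIGHT.
];

function sortChannels(labels) {
  // Sort by position in the canonical table; absent labels simply never appear.
  return labels.slice().sort((a, b) => CANONICAL_ORDER.indexOf(a) - CANONICAL_ORDER.indexOf(b));
}

// e.g. sortChannels(['SPEAKER_BACK_RIGHT', 'SPEAKER_FRONT_LEFT', 'SPEAKER_FRONT_CENTER'])
// -> ['SPEAKER_FRONT_LEFT', 'SPEAKER_FRONT_CENTER', 'SPEAKER_BACK_RIGHT']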

@jdsmith3000
Contributor

PR #1271 should resolve this issue.
