KHR_audio_emitter #2137
Conversation
Love that this space is getting more attention. I am a little concerned about the elimination of a number of features that MSFT_audio_emitter has. Some limitations are understandable, such as the integration with animation; I expect the thinking is that the model and its audio are driven externally by the loading system, though this approach may give an artist less control, or at minimum a more complicated workflow. Other limitations I'm not sure I understand, such as limiting to one emitter per node. This would just lead to having a number of child nodes for scenarios that require it. I'd like to remind folks of a demo video we produced for MSFT_audio_emitter that was completely data driven.
I wanted to chime in here on the randomized audio clips. I'm generally opposed to having specific fixed-function features in glTF where not necessary, because those features will have to be implemented and maintained by everyone, even in the future when extensions such as KHR_animation2 have been developed further. That said, the demo you link is really cool. I can't remember the original document, but I did see that randomizing environment audio adds a lot to immersion compared to having a looping set of tracks that play one after the other. So regarding randomized clips, I'm a bit torn here. I might suggest a middle ground: rather than requiring a weighting/randomization system, allow multiple clips per audio emitter, but leave it up to the application or future extensions to implement the randomization / support selecting audio clips (and otherwise allow just playing the first clip). As for multiple emitters per node, I would suggest this is not necessary: it would be very easy to add a child node (with no translation, rotation or scale) with another emitter on it. This is similar to how each node only has one mesh, but multiple meshes can easily be added as child nodes.
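To make the child-node suggestion concrete, here's a hedged sketch (node names, indices, and the node-level `emitter` property are illustrative assumptions based on this discussion, not spec wording): a parent node with two zero-transform children, each referencing a single emitter.

```json
{
  "nodes": [
    {
      "name": "Radio",
      "children": [1, 2]
    },
    {
      "name": "RadioMusic",
      "extensions": { "KHR_audio": { "emitter": 0 } }
    },
    {
      "name": "RadioStatic",
      "extensions": { "KHR_audio": { "emitter": 1 } }
    }
  ]
}
```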
Updated the example C++ source code for KHR_audio here: https://github.com/ux3d/OMI/tree/KHR_audio
Yeah we talked about this extensively in the OMI glTF meeting yesterday. I'm personally on the side of making this extension as simple as possible. I also realize now that it is a bit odd that we allow multiple emitters on the scene, but not nodes. Given this feedback my recommendations are:
Here's a proposal with multiple inputs per emitter. There is a gain value on the emitter as well as on each of the sources, as you would usually see in a mixer. In this proposal the only inputs are audio sources, but you could imagine other audio processing nodes in there as well, similar to the WebAudio API. I think we want to make this spec as simple as possible without limiting future extensions from adding more features, and adding mixing to the core spec isn't a huge ask.

{
"emitters": [
{
"name": "Positional Emitter",
"type": "positional",
"gain": 0.8,
"inputs": [0, 1],
"positional": {
"coneInnerAngle": 6.283185307179586,
"coneOuterAngle": 6.283185307179586,
"coneOuterGain": 0.0,
"distanceModel": "inverse",
"maxDistance": 10.0,
"refDistance": 1.0,
"rolloffFactor": 0.8
}
}
],
"sources": [
{
"name": "Clip 1",
"gain": 0.6,
"playing": true,
"loop": true,
"audio": 0
},
{
"name": "Clip 2",
"gain": 0.6,
"playing": true,
"loop": true,
"audio": 1
}
],
"audio": [
{
"uri": "audio1.mp3"
},
{
"bufferView": 0,
"mimeType": "audio/mpeg"
}
]
}
Just want to voice a +1 to Robert's proposed changes above, particularly the bits around having multiple sources in the array. I've implemented the OMI audio spec in my WordPress plugin (https://3ov.xyz) and this change is early enough, and makes enough sense, that it won't impact my current users. Also a +1 on having one global emitter and having inputs that feed into that global. I don't feel strongly about loopStart and loopEnd but do see the benefits.
We got feedback on:

#### `playing`

Whether or not the specified audio clip is playing. Setting this property to `true` will set the audio clip to play on load (autoplay).
Is there a particular reason for naming this like a state ("playing") vs. naming it as what it does ("autoplay")? Seems that hints at possible implementation details ("changing this should change play state") but that isn't mentioned.
This is a bigger discussion about how animation, audio, etc. should be treated on load. Animations are currently just keyframe data and it's up to the implementation to figure out how to play the animations. https://www.khronos.org/registry/glTF/specs/2.0/glTF-2.0.html#animations
So this begs the question of whether `playing` or `autoPlay` or even `loop` should be included in the spec.
@najadojo @bghgary this also goes against the direction of the MSFT_audio_emitter where there are ways to specify runtime behavior.
Maybe runtime behavior should be left out of this spec and implemented in another?
I see your point regarding playback behaviour!
If the glTF is purely seen as "data setup", then it might still be desirable to have a way to connect audio clips to animations - e.g. saying "this audio emitter belongs to that animation" (potentially: "at that point in time") would be pure data. This would be similar to how a mesh + material are connected to a node.
In which direction do you think this connection should be made? Attaching an emitter to a clip, or vice versa? I think on the clip would make somewhat more sense:
"animations": [
{
"channels": [...],
"samplers": [...],
"name": "SimpleCubeAnim",
"extensions": {
"KHR_audio": {
"emitters": [
{
"id": 0,
"offset": 0.0
},
{
"id": 1,
"offset": 1.3
}
]
}
}
}
],
In this example, two emitters belong to this animation clip, viewers would interpret that as they "loop together", and one emitter has a slight offset. Note having the same emitter play multiple times during the loop would be possible by having multiple references to the same ID with separate offsets.
What do you think? I think that would preserve the "glTF is purely data" aspect by giving hints about how things are connected, not how things should be playing.
During the OMI glTF meeting we discussed this a bit. I think synchronizing audio with an animation should also be made part of another spec. AFAIK synchronizing audio with animations isn't necessarily the best way to deal with this scenario. We should talk more about `autoPlay` and `loop` though. Should this be included in the spec? Should we go into more depth on playing/mixing audio? It'd be good to get some feedback from others on this.
I think, at a minimum, we need to define both how the glTF carries the audio payload (including metadata) and how to play the audio, whether that is one or more specs. If this spec only defines the former and that's all we define, it will be hard to know if the spec works and it will be hard to demo.
Would you agree that the options are
- adding info about play behaviour to the audio data itself
- connecting audio and animations in some way (either from audio to animation or from animation to audio)
or do you have another one in mind?
I agree that without any means of knowing how the audio is relevant to the scene, viewers won't be able to do anything with it - e.g. at a minimum I think tools like model-viewer should have a way to infer what to do with a file that has multiple animation clips and multiple audio assets (could be 1:1, could be different). A counter-argument would be saying "well for this case, if there's one animation and one audio clip that will be played back, everything else is undefined" (still allowing for cases such as this) but I'm not a huge fan of undefined-by-design...
During the OMI glTF meeting we agreed that playing behavior (`autoPlay` and `loop`) should be in this spec to define the minimum behavior to play sounds. However, connecting audio and animations should be delegated to an extension like #2147.
I think we need to think about the corresponding spec(s) that define how the audio will play before completing this spec. The `KHR_animation_pointer` spec will be able to cover some things (like modifying parameters for looping sounds), but it's probably not enough (e.g. one-shot sounds triggered from an animation).
#### `coneInnerAngle`

The angle, in radians, of a cone inside of which there will be no volume reduction.

#### `coneOuterAngle`

The angle, in radians, of a cone outside of which the volume will be reduced to a constant value of `coneOuterGain`.
Might want to add a note here that setting this to some value > 2 * PI (the max allowed value) will turn this into a spherical audio source. It's implicit from the defaults but could be explained explicitly here.
I agree it should act as a point / non-directional audio source when set that way. It's worth noting the Web Audio API specification doesn't specify this detail, though. What's also missing is the behavior when `coneOuterAngle` is less than `coneInnerAngle`. We should check on this before adding these details.
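For illustration, here is a hedged sketch of a directional positional emitter (values are purely illustrative): volume is unattenuated by angle inside the inner cone, ramps down between the two angles, and is held at `coneOuterGain` outside the outer cone.

```json
{
  "positional": {
    "coneInnerAngle": 0.7853981633974483,
    "coneOuterAngle": 1.5707963267948966,
    "coneOuterGain": 0.1,
    "distanceModel": "inverse",
    "maxDistance": 10.0,
    "refDistance": 1.0,
    "rolloffFactor": 1.0
  }
}
```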
One area that I'd love to see discussed here is behaviour under root scale for AR, as this is something currently broken in both SceneViewer and QuickLook (since it was unspecified, I guess). Happy to provide sample files. Consider the typical scenario of placing a model in AR with model-viewer, and the model has spatial audio.
Unfortunately such considerations were omitted from lighting in glTF, and thus lighting in AR is also "broken everywhere" right now :) To explain why that's not trivial, here's my take (and happy to take the lighting discussion elsewhere, here for context):
An object containing the positional audio emitter properties. This may only be defined if `type` is set to `positional`.

### Positional Audio Emitter Properties
Are doppler effects considered implementation details? Might want to explicitly call this out here (e.g. animating a node with an audio source, fast viewer movement, ... might cause doppler effects on positional sources).
We may want to add a note, yeah. But perhaps audio-listener oriented properties / effects should be defined in another series of extensions?
I wouldn't think these should be specified here, and probably also not in another extension, as it's very application-specific. "Doppler effect for audio" is kind of in the same realm as "bloom effect for rendering" in my opinion. The note would be enough.
I think we can add language such as "Implementors may choose to add effects to spatial audio such as simulating the doppler effect." 👍
If needed, perhaps this could be specified under the …
I generally agree, but … is exactly what I'm trying to say: I think an extension adding audio (or lights) should talk about how that audio works. AR is a strong use case for glTF, and it would simply be "undefined" (every viewer would do something different) if not explained in the extension that adds audio, in my opinion. Better would have been if such topics ("behaviour under different types of scales") had been part of the core spec, of course; if they had been, they'd probably have forwarded responsibility to extensions though... I can just say: SceneViewer and QuickLook (glTF and USDZ, respectively) allow for the usage of audio and lights, and both haven't defined/thought about how they behave under AR scale, so right now we can't use them for many projects where we'd like to. If it stays unspecified for KHR_audio, it's immediately the same issue. I'll take a deeper look at KHR_xmp_json_ld! Could you let me know where I find more about …
Probably the place to start would be Recommendations for …

I think I'm worried that "different types of scale" as a concept has a lot to do with emerging ideas of an "AR viewer" vs. other application types, and that these definitions may change drastically on a timeline of 1-2 years or even months. Embedding these context-specific requirements into KHR_lights_punctual (for example) could cause the extension to become outdated far sooner than it might otherwise. With an XMP namespace there is a proper versioning system and more flexibility to evolve or adapt to specific contexts. I suspect the same applies to KHR_audio.
Yup that makes sense. We could remove the stereo audio source requirement from the global emitter.
We spoke about this during the OMI glTF meeting, and KHR_audio should be defined within the frame of reference of the glTF document. We agree that the document should define what the content "means", but that the behavior of scaling the content should be up to your application. However, if an animation in the document is controlling the scaling of the node, we should define that behavior, and maybe that should inform best practices in this AR viewer use case. So in the case of …
Should this respect node scale? Note that in Unity, Audio Emitter scale is separated from the GameObject scale. But in the …
I suggest that the behaviour is like in the … So, the final position of the audio is affected by scale, but not the properties of the audio.
This is unfortunately exactly what breaks all existing viewers in Augmented Reality mode, where users can "make the content smaller or bigger" without any agreement on what that means – is it the viewer getting bigger or the content getting smaller or a mix of both. See my comment above for some more descriptive cases.
We should really define this consistently inside glTF. A pure glTF viewer should behave like I described. BTW, Blender behaves this way, e.g. you can try it out with a point light radius. It stays the same, independent of the scale.
Personally I think this says "we should only care about what happens inside the ivory tower"... I created a new issue to track this problem outside of just KHR_audio:
See #2137 (comment) — I think it is fine to try to define what you're asking for, but I do not think that requirement belongs in KHR_audio or KHR_lights_punctual. A specification that can properly deal with the ambiguities of novel and rapidly changing contexts is required to do what you suggest, and that will burden these extensions too much.
Hm, I don't think it belongs tucked away in some unspecified metadata either. Otherwise a case could be made that 95% of extensions would be better suited as metadata. If you read through the issue I opened, I explicitly mention there that …
Hey everyone, we've been discussing this spec over the past couple of months and we have a few changes that allow for some of the requested features above.

First is splitting audio data and audio sources. Audio Data is similar to an … We've also changed …

We'd love to get everyone's feedback on these changes! Also, I believe we (The Matrix Foundation) have submitted our application to join Khronos. So I will hopefully be around to participate during the working group meetings in the near future. Hopefully that helps move this spec (and others) forward a little faster.
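As a hedged sketch of the data/source split described above (property names follow earlier examples in this thread and the `autoPlay`/`loop` naming discussed above; the final schema may differ): one audio data entry reused by two sources with independent gain and playback settings, both feeding a global emitter.

```json
{
  "audio": [
    { "uri": "ambience.mp3" }
  ],
  "sources": [
    { "name": "Ambience Loop", "audio": 0, "gain": 0.5, "autoPlay": true, "loop": true },
    { "name": "Ambience Stinger", "audio": 0, "gain": 1.0, "autoPlay": false, "loop": false }
  ],
  "emitters": [
    { "name": "Room Tone", "type": "global", "gain": 1.0, "sources": [0, 1] }
  ]
}
```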
"description": "The audio's MIME type. Required if `bufferView` is defined. Unless specified by another extension, the only supported mimeType is `audio/mpeg`.", | ||
"anyOf": [ | ||
{ | ||
"enum": [ "audio/mpeg" ] | ||
}, | ||
{ | ||
"type": "string" |
We should consider supporting more formats than just MP3. For example, WAV files are preferred for short sound effects due to their rapid decode time. There's also Ogg Vorbis, which is a superior format for the same use cases as MP3. There's a long list of other formats, but I don't think we should try to support all of them, maybe just 2 or 3.
todo: @antpb update this description to include wav, mp3, and other supported types that are commonly implemented in current engines. state that others are nonstandard. Set type to json instead of string: match https://github.com/KhronosGroup/glTF/blob/main/specification/2.0/schema/image.schema.json#L16C8-L27C15
Oh, don't forget OGG! Support in Safari for OGG was blocking adding it when we first drafted the OMI_audio_emitter spec. It has since been added, later in 2021.
Would be nice to have FLAC added, too.
FLAC is a great format, but I don't think it's the right fit for the base glTF audio spec. FLAC is supported on all browsers since 2017, and in Unreal, but not Unity or Godot. FLAC is great for lossless audio, but glTF is generally focused on being a last-mile format that is easy to parse by many importers, in which case lossless audio is usually unnecessary. For improved portability, I think it's best for the base spec to only support MP3/OGG/WAV to encourage assets to use those, and then FLAC can be non-standard.
When referring to Ogg audio, the Vorbis codec is usually implied, as Ogg is just a container format. The Ogg format is not supported in Safari, so it won't be added to this extension.
Yeah the main issue here is finding formats with widespread support. By listing a format here, it is mandatory to implement that codec in order to support this glTF extension.
The Safari columns here show the problem: https://caniuse.com/?search=ogg
I would like to say, however, that I don't see any requirement for a given browser rendering engine to support a particular format. As with other glTF extensions, when some feature is not natively used by a particular game engine or renderer, it is possible to transcode at load time.
In the case of Ogg Vorbis, it would be possible to transcode Ogg Vorbis to another supported audio format using widely available open code such as https://github.com/brion/ogv.js/ - would usage of a library like Ogv.js in WASM not address the possible concern of Safari support?
> As with other glTF extensions, when some feature is not natively used by a particular game engine or renderer, it is possible to transcode at load time.

The only current examples are Draco, Meshopt, and Basis compression codecs. They always have to be implemented by engines (likely with the same open-source libraries) and do not depend on browser support.
That said, Safari does support Vorbis audio inside WebM files, so using the latter instead of Ogg could be explored.
Originally when this PR spec was written we opted to only define mp3 in the base spec with any other types being "non-standard" but still capable of being used through a string definition of the MIME desired. Mainly because of the expectation of every implementation needing to support every type defined in this base spec. We established that the majority of potential implementors had a common thread of supporting mp3 so everything else would be non-standard and most compatible going forward.
My comment earlier in this thread was about adding WAV because of its wide usage for things like sound effects, and to make this more in line with the MSFT spec that only supported WAV. OGG has a similar situation with wide usage, but then we get into a long list of MIMEs with no real criteria for what should make the list of base-spec-supported types.

Since this list could be ever-changing, we should maybe agree on a base list of types with precedent in specifications (WAV/MP3), agreeing that anything else is non-standard but allowed. This puts us in a situation where any file type can be used, and common types implemented (maybe FLAC) will be supported and parsed by engines regardless of the base spec.

It feels unrealistic to expect every implementation to support every file type that we list here, and the list feels fragile, ever changing with many opinions of what should be supported.
TL;DR: can we go with `[ "audio/wav", "audio/mpeg" ]`, with the spec allowing any other string to be used for further support (like OGG and FLAC)?
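If we go that route, the schema entry might look something like this hedged sketch, following the pattern of the snippet quoted above (wording and structure are illustrative, not final):

```json
"mimeType": {
  "description": "The audio's MIME type. Required if `bufferView` is defined. Unless specified by another extension, the only standard MIME types are `audio/mpeg` and `audio/wav`.",
  "anyOf": [
    {
      "enum": [ "audio/mpeg", "audio/wav" ]
    },
    {
      "type": "string"
    }
  ]
}
```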
Would it be possible to define audio in a very compartmentalized manner? There would be the overall structure of audio that allows an audio source (a separate sub-extension for each media type) and several different emitters (also sub-extensions). There may even be the possibility for filters and mixers between the audio sources. This suggestion is somewhat along the lines of Web Audio (https://www.w3.org/TR/webaudio/). It would mean that any implementation would need to implement at least one audio source and one audio emitter. A web-based implementation may be able to integrate with the Web Audio API. I know this is much more complex than the original comments were discussing. Perhaps it would be good to have a special-case audio extension (single source type, single non-spatialized emitter, no filtering) and a more general-purpose one.
My personal opinion on this subject: basically, you need at least one lossless codec (for any serious professional audio editing) and one lossy codec, mainly designed to minimize the size of the audio data. I suggest FLAC (WAV is a bit outdated) and MP3 (the other variants were more or less developed because of patents that have since expired). Other common formats can be optionally supported (e.g. as in @antpb's comment).
@capnm is FLAC supported in major engines? From my quick research, Unity only supports .aif, .wav, .mp3, and .ogg. We would be specifying that all engines would need to natively support FLAC. While I agree it is a superior file type in a lot of ways, the history of WAV support across engines and browser implementations feels safer. Implementors get it as a freebie vs. having to get engines like Unity to support FLAC natively.
Some engines are at least already on the way to supporting it (slowly realizing the massive advantages of open standards like glTF ;) You can convert and cache it locally in any format you need with relative ease. I would specify that MP3 format support is mandatory, and for the lossless option you can fall back to WAV until the recommended FLAC support is implemented...
Unity has supported FLAC since version 2020.1.0. It's in the release notes but is not in the documentation, even though it does work.
factors in KHR PR feedback and fixes readme
Clarify spec details in KHR_audio and add property summary tables
Just wanted to give an update on the previous commits made to this PR. In recent OMI Group glTF Extensions meetings we identified some points of clarification needed around some of the properties in KHR_audio as well as some errors in the diagrams in the current PR. More details on these changes can be found here:
Hello again! I would like to get some feedback about an open issue in the … Also noting, it would be nice to use the name … If anyone has strong opinions on the naming, please give this issue a read and leave any feedback you might have there. We'd like to get general agreement from folks here in this PR before making this change and pushing it to the repo it originates from. More context: omigroup/gltf-extensions#205
I agree with the name change. Is it worth looking at …
@mikeskydev Yes, we have looked there.
Rename from KHR_audio to KHR_audio_emitter and add example file
| **coneInnerAngle** | `number` | The angular diameter of a cone inside of which there will be no angular volume reduction. | 6.2831853... (τ or 2π rad, 360 deg) |
| **coneOuterAngle** | `number` | The angular diameter of a cone outside of which the volume will be reduced to a constant value of `coneOuterGain`. | 6.2831853... (τ or 2π rad, 360 deg) |
Consider naming these innerConeAngle and outerConeAngle for consistency with KHR_lights_punctual? See:
The original spec was drafted to mirror Web Audio API given the wide acceptance of the property names and behaviors. If there is a KHR reason to align these property names I can get behind it, but my personal feelings are that this goes against what a lot of people have mostly agreed on and understand relative to audio properties.
Here's the PannerNode docs we referenced when drafting the original OMI_audio_emitter spec https://developer.mozilla.org/en-US/docs/Web/API/PannerNode
I'm also partial to being able to quickly see the word `cone`, which is much easier to identify at the beginning for a human reading a file.
Are these names used verbatim outside of the web specification? I do feel that clear inconsistencies among glTF specifications should outweigh borrowing terms verbatim from a particular existing specification.

I have no strong preference about `distanceMaximum` or `range` in the comment thread below – these aren't strong matches to existing glTF terms, and there's already some inconsistency in `accessor.max` vs. `.iridescenceThicknessMaximum`. OK with me to keep those as-is if you prefer.

But asking implementors to use both `innerConeAngle` and `coneInnerAngle` when implementing lights and audio ... I do not love that. 😅
I'm not sure that parity with lights is actually going to be an issue for implementers. The code for these should be very separated. I can see it as a cosmetic issue for consistency between specs, but not much more than that.
I find `coneInnerAngle` easier to read, and it's consistent with Web Audio. However, if Khronos wants us to switch to `innerConeAngle` for consistency with lights, it's easy to make that change. We just need the people in charge to make the decision one way or the other. We also might want to get input from Mozilla or Google.
| ------------------ | -------- | ------------------------------------------------------------------------------------------------------------------- | ----------------------------------- |
| **coneInnerAngle** | `number` | The angular diameter of a cone inside of which there will be no angular volume reduction. | 6.2831853... (τ or 2π rad, 360 deg) |
| **coneOuterAngle** | `number` | The angular diameter of a cone outside of which the volume will be reduced to a constant value of `coneOuterGain`. | 6.2831853... (τ or 2π rad, 360 deg) |
| **coneOuterGain** | `number` | The linear volume gain of the audio emitter set when outside the cone defined by the `coneOuterAngle` property. | 0.0 |
See above, perhaps outerConeGain?
| **coneOuterAngle** | `number` | The angular diameter of a cone outside of which the volume will be reduced to a constant value of `coneOuterGain`. | 6.2831853... (τ or 2π rad, 360 deg) |
| **coneOuterGain** | `number` | The linear volume gain of the audio emitter set when outside the cone defined by the `coneOuterAngle` property. | 0.0 |
| **distanceModel** | `string` | Specifies the distance model for the audio emitter. | `"inverse"` |
| **maxDistance** | `number` | The maximum distance between the emitter and listener, beyond which the audio cannot be heard. | 0.0 |
Consider naming this property `range`, for further consistency with KHR_lights_punctual? Or perhaps `distanceMaximum` for greater consistency with iridescence?
`maxDistance` is currently named this way to clarify the type of range, to be unambiguous with `refDistance`. If `maxDistance` is renamed, should `refDistance` be renamed too?
This one was also done to be consistent with the Web Audio API that this spec was inspired by. Wide acceptance of this API is my only hesitation in renaming these properties.
maxDistance: https://developer.mozilla.org/en-US/docs/Web/API/PannerNode/maxDistance
See comment above. I don't feel strongly on this one, there's also some precedence for '.max' in the existing accessor specification.
### Audio Emitter

Audio emitters define how audio sources are played back. Emitter properties are defined at the document level and are referenced by nodes. Audio may be played globally or positionally. Positional audio has further properties that define how audio volume scales with distance and angle.
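As a hedged illustration of the quoted text (indices, the scene-level `emitters` array, and the node-level `emitter` property are assumptions drawn from earlier examples in this thread, not final spec wording): emitters are defined at the document level, a node references a positional emitter, and the scene references a global one.

```json
{
  "scenes": [
    {
      "nodes": [0],
      "extensions": { "KHR_audio_emitter": { "emitters": [1] } }
    }
  ],
  "nodes": [
    {
      "name": "Speaker",
      "extensions": { "KHR_audio_emitter": { "emitter": 0 } }
    }
  ],
  "extensions": {
    "KHR_audio_emitter": {
      "emitters": [
        { "name": "Speaker", "type": "positional", "gain": 1.0, "sources": [0] },
        { "name": "Background", "type": "global", "gain": 0.5, "sources": [1] }
      ]
    }
  }
}
```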
Should a node's scale affect volume?
It would make sense to me if the global scale of the node affected the refDistance and range / maxDistance, but not the volume. This way if you scale up a glTF scene with audio, the audio can be heard at the same volume in the same places relative to that scene, and that scene's audio can be heard from farther away.
I struggle to know how to implement this in a way that content creators can anticipate. It's very common that I will attach some component to an item and scale it to fit into a space. It's also very common that I'll make something in Blender and forget to apply those scales. Making audio volume change based on that scale seems high risk for people to end up with unintended loud or quiet audio.

I do like the idea outside of this spec and would implement something like a `KHR_audio_behavior` or `KHR_audio_affector` that can define these behaviors and maybe be a solid place to reference things like effect chains. It would be cool to use things like the node structure to do scale -> volume -> reverb (more if bigger, less if smaller) -> distortion (when super big).
Understood! I don't have a preference, and I think explicitly leaving this behavior undefined could be OK. The question has come up with `KHR_lights_punctual`, particularly in AR/VR/XR applications, so I just wanted to anticipate it if possible here.
Randomly stumbling over this, from some GitHub notification, completely without context:
Iff a node `scale` should affect an audio volume, then non-uniform scaling will raise questions that should be addressed clearly and explicitly early in the process.
For folks following this PR that have concerns about different file types being supported in this spec, @aaronfranke drafted an excellent example of file types that extend KHR_audio_emitter to support Ogg and Opus. We just recently merged these in as a draft OMI spec. Anyone with file-type reservations about this spec, please give these a look to see if they solve your concerns 😄
There are several proven patterns you should examine in X3D...
In version 3.x it was all about the Sound and AudioClip nodes for providing the functionality; now in version 4 we support the W3C's WebAudio API!
See: https://web3d.org/documents/specifications/19775-1/V4.0/index.html
and specifically:
https://web3d.org/documents/specifications/19775-1/V4.0/Part01/components/sound.html
@npolys What specifically would you like to point out from that spec? What comparisons are you making, and what changes are you proposing?

I see that the specification you linked supports the MIDI and AAC audio formats. What are the intended use cases of these formats? For MIDI, those files just list information about musical notes; their interpretation depends on the system (like how emojis are displayed differently on different operating systems and browsers). Is it intentional for an X3D file using MIDI audio to not be reproducible the same way on all systems? For AAC, please write a summary of how it compares to other formats such as MP3, Ogg Vorbis, and Opus.

I also see that the specification details a lot of things that are not audio emission, such as spatialization, effects, and listener information. For glTF, those can be left to separate extensions. For example, we might have KHR_audio_listener, KHR_audio_reverb, etc. If you want to influence the feature set of these or any other extensions, please give specific feedback, use cases, and other information so that we can plan accordingly.
Sharing the KHR Audio Framework Proposal here for broader visibility. During the recent 3D Formats working group meeting, we reviewed the proposal to define the KHR audio glTF specification using an audio graph framework. The purpose of the current document is to delve deeper into that proposal, offering a comprehensive design of the KHR audio graph. This includes a detailed description of each node object within the graph along with its functionality, the specific properties associated with it, and how it interacts with other nodes in the graph. The document is structured to facilitate clear understanding and to solicit feedback on the proposed design. Based on the feedback we will update and finalize the design before it is formally schematized into the KHR core audio spec, extensions, animations, and interactivity. cc: @rudybear
@cashah KHR_audio_graph appears to be a superset of KHR_audio_emitter. Many of its properties are the same, including attenuation properties (distance model, ref distance, max distance, rolloff factor), shape properties (cone inner angle, cone outer angle, cone outer gain), emitter/sink properties (global/positional, gain, source), and playback properties (autoplay, loop, gain). As for the differences, KHR_audio_emitter defines omnidirectional as a cone angle of Tau radians, while KHR_audio_graph defines omnidirectional explicitly as the shape …

What is the purpose of defining bits per sample, sample count, sample rate, and channels? Those seem redundant with the data already present in the audio file, so I would not expect them to be included in the glTF JSON data.

Which audio file types are allowed? KHR_audio_emitter defines MP3 and WAV as allowed types, similar to how base glTF defines PNG and JPEG as allowed types. KHR_audio_graph doesn't define this.

Is there a plan to rework KHR_audio_graph to be a superset of KHR_audio_emitter, such that graph depends on emitter? As it is now, KHR_audio_emitter will be much easier to implement in game engines compared to an audio graph system, leading to wider adoption. Most 3D assets that emit audio will only need to define "here's what it sounds like and where it emits to", and do not need to define audio graphs. For example, the boom box sample asset is this simple.

Is KHR_audio_graph implemented anywhere? Are there any sample assets? Where are the JSON schemas? Or is it still in the prototype stage, pending rework to be a superset of KHR_audio_emitter?

Why impose this restriction? I can imagine situations where you'd want multiple listeners, such as a scene with security cameras that listen for audio at different locations and then either record that audio or feed it through to a screen in a security monitoring room. In KHR_audio_emitter we simply do not define the listener, leaving it up to the implementation or future extensions. We expect that most implementations by default will treat the camera as the point that audio is listened from; for example, this is the case in Unity, Unreal, and Godot.
KHR_audio_emitter
This extension adds the ability to store audio data and represent both global and positional audio emitters in a scene.
This extension is intended to unify the OMI_audio_emitter and MSFT_audio_emitter extensions into a common audio extension.
Members from the Open Metaverse Interoperability Group, Microsoft, and other Khronos members met on 3/23/2022 to discuss how we might unify OMI's work on audio emitters and Microsoft's previous efforts. OMI_audio_emitter has a subset of the MSFT_audio_emitter extension's features, eliminating features that might be out of scope for a more general extension. KHR_audio_emitter also has some spec changes from the original OMI_audio_emitter requested by Khronos members. There are still some outstanding questions on audio formats, MP3 vs WAV, and what features within these formats should be supported.
We're looking forward to working together on this extension and bringing audio to glTF!