
EXT_animation_map extension proposal #1137

Open: wants to merge 16 commits into main


@msfeldstein (Author)

I'd like to propose an extension we use at Facebook to mark up animations with semantic meanings, so client applications know what to do with all the animations packaged inside a model. The simplest use case is designating an animation as the 'enter' animation, so clients know what to play when the model first appears (think a flower opening, or a person waking up). More complicated use cases could be specifying idle/walk/die/birth animations for game assets.

It would be wonderful if this could be shared across the ecosystem so models can be more dynamic without requiring scripting.

@msfeldstein changed the title from "EXT_animation_map extension proposal" to "FB_animation_map extension proposal" on Oct 30, 2017
@pjcozzi (Member) commented Oct 30, 2017

Awesome! We'll help spread the word to get you some feedback.

@RemiArnaud (Contributor)

A related COLLADA extension is already available (Maya): https://www.khronos.org/collada/wiki/Animation_clip_OpenCOLLADA_extension

@space2 commented Oct 31, 2017

In your example you mapped two animations (0 and 3) to the "enter" semantic. In such a case (multiple animations mapped to the same semantic), how will the animations be played:

  • in sequence (i.e. play anim 0 and then anim 3), or
  • randomly selecting one?

@lexaknyazev (Member)

  • Extension semantics refer to an "object". What does it mean? An arbitrary glTF asset could contain multiple meshes/nodes/scenes.

  • glTF doesn't define any "official" way of looping animations, i.e., it's not defined how to interpolate between the last frame and the first frame. Maybe, desired behavior should be stated in the extension spec.

  • Is it OK to have several bindings with the same semantic?


@pjcozzi We need to register FB extension prefix.

@msfeldstein (Author)

@space2 I added some clarification. I believe they should all be played at once, though we could add a type modifier to allow playing them in sequence, or randomly, if desired. I think playing them all at once by default is good because you could imagine a glTF scene with multiple meshes and a separate animation targeting each.

@lexaknyazev by "object" I mean a literal JSON object in the array, not a glTF object. Is there a clearer term I could use here?

> glTF doesn't define any "official" way of looping animations, i.e., it's not defined how to interpolate between the last frame and the first frame. Maybe, desired behavior should be stated in the extension spec.

What interpolation would there be? Since there is zero time between the last frame and the first frame, I would assume no interpolation.

> Is it OK to have several bindings with the same semantic?

I think so. Do you see any problems with that? I could see this being useful if we add the 'sequential' play modes suggested by @space2, so that you could have multiple sequential animation lists.

@msfeldstein (Author)

Also, how does the naming prefix work? This is an extension proposed by us at FB, but I don't think there's anything FB-specific about it.

@msfeldstein (Author)

@lexaknyazev @space2 perhaps we should default to sequential playback of the animations in each binding object's animations array, and let people add multiple entries for one semantic to play them in parallel.
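A minimal sketch of the playback rule proposed here, assuming each binding's `animations` array plays back to back and separate bindings with the same semantic all start together at t = 0. The function name `playback_schedule` and the `durations` input (seconds per animation index, e.g. each animation's last sampler input time) are hypothetical, not part of the extension.

```python
from typing import Dict, List, Tuple


def playback_schedule(
    bindings: List[dict],
    semantic: str,
    durations: Dict[int, float],
) -> List[List[Tuple[int, float]]]:
    """Return one track per binding whose semantic matches.

    Each track is a list of (animation index, start time in seconds).
    Animations inside a binding start back to back (sequential);
    every track starts at t = 0 (parallel across bindings).
    """
    tracks = []
    for binding in bindings:
        if binding.get("semantic") != semantic:
            continue
        t = 0.0
        track = []
        for anim in binding["animations"]:
            track.append((anim, t))
            t += durations[anim]
        tracks.append(track)
    return tracks


bindings = [
    {"semantic": "ENTER", "animations": [0, 3]},
    {"semantic": "ENTER", "animations": [1]},
]
print(playback_schedule(bindings, "ENTER", {0: 2.0, 1: 4.0, 3: 1.5}))
# [[(0, 0.0), (3, 2.0)], [(1, 0.0)]]
```

Under this rule, "parallel" is expressed by adding another binding rather than by a per-binding flag, which keeps each binding object simple.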

@lexaknyazev (Member)

My question is about this "object":

> Triggered when an object is picked up, for example grabbed in vr, or dragged in WebGL.

Does it mean the whole scene? Or should implementation perform lookup from animation.channel.target to determine affected objects?


> What interpolation would there be? Since there is 0 time between the last frame and first frame I would assume no interpolation.

That means that for looping animations, the first and the last frames must have the same values, right?

@msfeldstein (Author) commented Oct 31, 2017

Ah, I see; yes, that is unclear. I was assuming the target of the animation channel, but now that I think about it, that could add undue complexity to the clients (for an animation that targets a bunch of bones, we expect clients to union all those bounding boxes for those bones' meshes).

Perhaps it would be better to supply a target Node for any animation with interactivity?

> the first and the last frames must have the same values

Yes, I think it's up to the animation author to create properly looping animations; I don't think it should be the engine's job to try to fix that. I could also imagine a non-seamless animation: think of a bubble that rises from the ground over and over again and looks like a new bubble each time.
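Since seamless looping is the author's responsibility, an exporter or validator could at most warn when a loop does not close. A sketch of such a check, assuming the sampler output has already been decoded into one tuple per keyframe and uses LINEAR or STEP interpolation (CUBICSPLINE outputs also carry tangents and would need extra handling). `loops_seamlessly` is a hypothetical helper, and rotation outputs where q and -q encode the same rotation are not treated as equal here.

```python
import math
from typing import Sequence


def loops_seamlessly(keyframes: Sequence[Sequence[float]], tol: float = 1e-5) -> bool:
    """True if the first and last keyframe values match within `tol`.

    `keyframes` holds one tuple of component values per keyframe, e.g.
    (x, y, z) for a translation sampler.
    """
    if len(keyframes) < 2:
        return True  # zero or one keyframe: nothing to wrap around
    first, last = keyframes[0], keyframes[-1]
    return len(first) == len(last) and all(
        math.isclose(a, b, abs_tol=tol) for a, b in zip(first, last)
    )


print(loops_seamlessly([(0, 0, 0), (0, 1, 0), (0, 0, 0)]))  # True
print(loops_seamlessly([(0, 0, 0), (0, 5, 0)]))  # False
```

The bubble example above would return False here and still be a valid, intentional loop, which is why this belongs in tooling rather than in the engine.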

@lexaknyazev (Member)

> Perhaps it would be better to supply a target Node for any animation with interactivity?

Yes.

> we expect clients to union all those bounding boxes for those bones meshes

There was an effort to add pre-computed bounding boxes to skinned meshes. Maybe we should revisit that.

@msfeldstein (Author)

I added an optional node list for interactive semantics:

> The following semantics require a list of glTF Node objects to specify interaction targets. If the node is omitted, it can be assumed that the entire glTF Scene is the target.

Does this seem satisfactory?

Pre-computed bounding boxes seems very useful, we're already fighting some performance problems trying to compute things on the client (we will get around this by pre-computing it on the server but it would be nice to have it already available in most models).

@lexaknyazev (Member)

Please use plural for array (node -> nodes).

I assume that any node from that list should trigger the animation, since it could be difficult for user to, e.g., grab 3 objects at the same time. On the other hand, there could be use cases for simultaneous conditions (e.g., gaze at several objects at once).

@msfeldstein (Author) commented Oct 31, 2017

Updated. That assumption is correct; I specify that any node can trigger the event. The use case for simultaneous conditions can be added later with another property (simultaneous: true or something), but I think we can leave that out for simplicity now.
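A sketch of the trigger rule for the interactive semantics: any listed node fires the binding, and an omitted `nodes` list means the whole scene is the target. It assumes the client already knows which node indices the user acted on (for example after a raycast); the helper name `triggers` and the flattened `scene_nodes` list are assumptions, and this sketch does not resolve child nodes of a listed node.

```python
from typing import Iterable, List


def triggers(binding: dict, acted_on: Iterable[int], scene_nodes: List[int]) -> bool:
    """True if acting on any node in `acted_on` should fire `binding`.

    - `nodes` present: any single listed node is enough (OR, not AND).
    - `nodes` omitted: the whole scene is the target, so any scene node counts.
    """
    targets = binding.get("nodes", scene_nodes)
    return not set(acted_on).isdisjoint(targets)


grab = {"semantic": "GRAB", "animations": [2], "nodes": [4, 7]}
scene = list(range(10))
print(triggers(grab, [7], scene))  # True
print(triggers(grab, [1], scene))  # False
print(triggers({"semantic": "GRAB", "animations": [2]}, [1], scene))  # True
```

A later `simultaneous: true` property would swap the `isdisjoint` test for a subset test (every listed node must be acted on).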

@pjcozzi (Member) commented Oct 31, 2017

> @pjcozzi We need to register FB extension prefix.

See #1139

> also how does naming prefix work? This is an extension proposed by us at FB but i don't think there's anything FB specific about it.

@msfeldstein a vendor-prefixed extension like FB_ is for an extension that is created by a vendor and only implemented by that vendor. An EXT_ prefix is for an extension with multiple implementations. The KHR_ prefix is for Khronos-ratified extensions that go through Khronos' IP process.

For this extension, it is very likely that engines like three.js, Babylon, and Cesium would be interested in implementing it so we might end up wanting to use the EXT_ prefix to avoid namespace churn later - if there are multiple implementations before this is merged.
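The prefix convention can be written down as a small lookup. This sketch only reads the prefix (the text before the first underscore); it does not check whether a vendor prefix is actually registered (see #1139), and `extension_prefix_kind` is a hypothetical name.

```python
def extension_prefix_kind(name: str) -> str:
    """Classify a glTF extension name by its prefix (the text before the first "_")."""
    prefix, sep, _ = name.partition("_")
    if not sep or not prefix:
        raise ValueError(f"no prefix in extension name: {name!r}")
    if prefix == "KHR":
        return "Khronos-ratified"
    if prefix == "EXT":
        return "multi-vendor"
    return "vendor-specific"


for name in ("FB_animation_map", "EXT_animation_map", "KHR_draco_mesh_compression"):
    print(name, "->", extension_prefix_kind(name))
```

Renaming from FB_ to EXT_ changes the key in both `extensionsUsed` and `extensions`, which is the namespace churn mentioned above.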

"animations": {
"type": "array",
"items": {
"type": "number",
Review comment (Member):

number -> integer

"nodes": {
"type": "array",
"items": {
"type": "number",
Review comment (Member):

number -> integer

@lexaknyazev (Member)

To follow existing glTF schema usage, please consider adding these properties to the animations and nodes definitions:

  • "uniqueItems": true
  • "minItems": 1

@sbtron (Contributor) commented Nov 27, 2017

I think we might be interested in implementing this too. Would it make sense to do a multivendor extension like EXT?

Are there any thoughts on allowing compound semantics like

  • semantic1 && semantic2
  • semantic1 || semantic2
  • semantic1 && !semantic2
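To make the question concrete, here is one possible evaluation rule for such compound semantics. The string syntax (`&&`, `||`, a leading `!`, no parentheses, `&&` binding tighter than `||`) is hypothetical and not part of the proposal; `active` is the set of semantics whose condition currently holds.

```python
from typing import Set


def eval_compound(expr: str, active: Set[str]) -> bool:
    """Evaluate a compound semantic such as "GAZE_ENTER && !GRAB"."""

    def term(token: str) -> bool:
        token = token.strip()
        if token.startswith("!"):
            return not term(token[1:])
        return token in active

    # An OR of AND-clauses: && binds tighter than ||.
    return any(
        all(term(t) for t in clause.split("&&"))
        for clause in expr.split("||")
    )


print(eval_compound("GAZE_ENTER && GRAB", {"GAZE_ENTER"}))  # False
print(eval_compound("GAZE_ENTER || GRAB", {"GAZE_ENTER"}))  # True
print(eval_compound("GAZE_ENTER && !GRAB", {"GAZE_ENTER"}))  # True
```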

/cc @thmignon @najadojo

@thmignon

@sbtron will be reviewing with the rest of the working group when they begin meeting again next week

@msfeldstein (Author) commented Mar 27, 2018 via email

@lexaknyazev (Member)

The extension spec says

> Animations should be played all at once

Does it mean that in a case like the following one, all four animations must play at the same time? If so, the description should be more explicit.

{
    "extensionsUsed": [
        "EXT_animation_map"
    ],
    "extensions" : {
        "EXT_animation_map" : {
              "bindings": [
                  {
                      "semantic": "ENTER",
                      "animations": [0, 3]
                  },
                  {
                      "semantic": "ENTER",
                      "animations": [1, 2]
                  }
              ]
        }
    }
}
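Under the current "played all at once" wording, a client would gather every animation bound to a semantic across all bindings and start them together. A minimal sketch against the example above; `animations_for` is a hypothetical helper, and dropping duplicate indices is an assumption the spec does not state.

```python
import json
from typing import List


def animations_for(gltf: dict, semantic: str) -> List[int]:
    """All animation indices bound to `semantic`, across every binding.

    Duplicates are dropped (assumption) while keeping first-seen order.
    """
    ext = gltf.get("extensions", {}).get("EXT_animation_map", {})
    seen: List[int] = []
    for binding in ext.get("bindings", []):
        if binding.get("semantic") == semantic:
            for index in binding.get("animations", []):
                if index not in seen:
                    seen.append(index)
    return seen


doc = json.loads("""
{
  "extensionsUsed": ["EXT_animation_map"],
  "extensions": {
    "EXT_animation_map": {
      "bindings": [
        {"semantic": "ENTER", "animations": [0, 3]},
        {"semantic": "ENTER", "animations": [1, 2]}
      ]
    }
  }
}
""")
print(animations_for(doc, "ENTER"))  # [0, 3, 1, 2]
```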

@msfeldstein (Author)

Yes, it looks like you're applying 4 animations to the ENTER semantic. I'm not sure how to change the description for clarity; do you have a suggestion?

@punkoffice

Just wondering, has this been added to Facebook? If so, is there any documentation on how to apply animations (armature / vertex) and incorporate it into a GLB for Facebook?

@donmccurdy (Contributor)

@punkoffice No, Facebook does not yet support animation from the core glTF spec. This extension (not required for animation in glTF) in particular should not be implemented unless and until it is out of draft status.

@najadojo

I created a test asset semantics.zip to exercise the implementation of this extension in Windows Mixed Reality.

@msfeldstein (Author) commented Jun 1, 2018 via email

@najadojo commented Jun 1, 2018

Yes we added it in our most recent update. Capabilities documented at https://docs.microsoft.com/en-us/windows/mixed-reality/creating-3d-models-for-use-in-the-windows-mixed-reality-home#animation-guidelines

@msfeldstein (Author) commented Jun 1, 2018

Now that there is a shipped implementation, what can we do to finalize and solidify this extension? @sbtron

@donmccurdy (Contributor) left a comment

Because the scope here is narrow and multiple vendors are interested in implementing, I think my concerns above can be considered resolved.

This extension is unusual compared to many others, in the sense that most of its implementation details are left undefined. That's OK with me, as I assume it will be most useful for application-specific situations, and we're guaranteed that models using the extension will load fine in engines that don't support it. But still, we should try to track and codify standard semantics and their use, if that seems to be turning into a pseudo-standard of its own.

* **LEAVE**: Trigger a single shot animation as the object is leaving the scene or viewport
* **ALWAYS**: Constantly loop an animation

The following semantics require a list of glTF Node objects to specify interaction targets. The event is triggered by acting on any of the nodes listed. If the nodes are omitted, it can be assumed that the entire glTF Scene is the target.
Review comment (Contributor):

I find this confusing — the section is labeled "example semantics", which leads me to believe it is not normative, but the language here claims these semantics "require" a list of nodes and presumably must be used in a particular way.

I'm guessing that none of this section's content is meant to be normative, but either way I think this should be spelled out somewhat.

Reply (Author):

Ah, I see the confusion here. The semantics above can operate without any target nodes since they act on the entire object, whereas the ones below need some more info (which nodes to act on). Perhaps better wording would be "Semantics can have a list of nodes indicating which nodes the semantic should act upon, if needed". Let me try to rewrite this.

"type" : "object",
"definitions": {
"semanticBinding": {
"type": "object",
Review comment (Contributor):

I'm not sure how to read these schema definitions so I'll just ask... can a bindings object have extras?

Reply (Author):

I believe anything in a glTF schema can have extras, right?

Reply (Member):

In that case, semanticBinding definition needs to contain

"allOf": [ { "$ref": "glTFProperty.schema.json" } ],

Reply (Author):

done, thanks

@msfeldstein (Author)

@donmccurdy thoughts on these changes? I'm open to other wordings. It might also make sense to formalize a subset: things like ENTER/LEAVE/ALWAYS are very simple and useful in almost any context I can think of, while things like GRAB are only for VR-with-hands and probably not ready to be formalized.

@msfeldstein (Author)

Actually, after talking to @sbtron, it sounds like it might be better to formalize the listed semantics and let any other usages be considered non-normative. I updated the spec to use stronger wording on that.

@najadojo commented Jul 9, 2018

In Windows MR we now have the ability to specify an animation reset state on the binding by appending _RESET_ON_START, _RESET_ON_STOP, or _DONT_RESET to any of the semantic names provided above. We could transition this to an extras object easily, but I think we should have an easy way to add a semantic without editing this spec.

@msfeldstein (Author)

Ah, that might be worth specifying in the spec (your addition makes things like ENTER animations non-compliant). Can you elaborate on what those suffixes mean? I assume DONT_RESET means holding the final frame of the animation when it finishes, which would be good to add, but how does RESET_ON_START work?

CSS calls this animation-fill-mode: forwards; three.js calls it clampWhenFinished: true.

@najadojo

The reset refers to the keyframe time. When the suffix is RESET_ON_START, the animation starts at time 0 and the initial pose is applied; this is the default behavior. When the suffix is DONT_RESET, the animation starts at the time it was last running at and that pose is applied: if an animation was stopped before it completed, it will pick up where it left off. If DONT_RESET is applied to a completed animation, starting it again will do nothing, as it's already at its final keyframe. When the suffix is RESET_ON_STOP and the animation plays to completion, the time will be set to 0 and the initial pose is applied again.
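A sketch of the keyframe-time bookkeeping these suffixes describe, following the description above. `Playback`, `split_semantic`, and `SUFFIXES` are hypothetical names, and two points are assumptions the description does not cover: a manual stop before completion keeps the current time in every mode, and a RESET_ON_STOP animation that was stopped early resumes from where it stopped. Windows MR's actual behavior may differ.

```python
from dataclasses import dataclass
from typing import Tuple

SUFFIXES = ("_RESET_ON_START", "_RESET_ON_STOP", "_DONT_RESET")


def split_semantic(name: str) -> Tuple[str, str]:
    """Split e.g. "ENTER_DONT_RESET" into ("ENTER", "DONT_RESET")."""
    for suffix in SUFFIXES:
        if name.endswith(suffix):
            return name[: -len(suffix)], suffix[1:]
    return name, "RESET_ON_START"  # described as the default behavior


@dataclass
class Playback:
    duration: float  # seconds, e.g. the last sampler input time
    mode: str = "RESET_ON_START"
    time: float = 0.0
    playing: bool = False

    def start(self) -> None:
        if self.mode == "RESET_ON_START":
            self.time = 0.0  # start at time 0 with the initial pose
        # Other modes resume from the stored time; a completed DONT_RESET
        # animation is already at its final keyframe, so nothing plays.
        self.playing = self.time < self.duration

    def stop(self) -> None:
        self.playing = False  # assumption: keep the time on a manual stop

    def advance(self, dt: float) -> None:
        if not self.playing:
            return
        self.time = min(self.time + dt, self.duration)
        if self.time >= self.duration:  # played to completion
            self.playing = False
            if self.mode == "RESET_ON_STOP":
                self.time = 0.0  # back to time 0 and the initial pose


base, mode = split_semantic("ENTER_DONT_RESET")
anim = Playback(duration=2.0, mode=mode)
anim.start()
anim.advance(0.5)
anim.stop()
anim.start()  # resumes at 0.5 s
anim.advance(5.0)  # clamps at 2.0 s and stops
anim.start()  # already finished: stays stopped
print(base, anim.time, anim.playing)  # ENTER 2.0 False
```

DONT_RESET ends up holding the last keyframe, which matches the CSS `forwards` / three.js `clampWhenFinished` behavior mentioned above.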

@msfeldstein (Author)

Honestly, I don't have enough info about exactly how this should work; I figured things like an ENTER animation will have its final frame match the rest position, so I didn't consider something like this. Perhaps we can move that to a separate property or extra and try to finalize it later?

@najadojo

An extra is fine for this data, but it highlights that a "semantic" is only meaningful in the context in which it's applied, and custom values should be allowed. That is, unless we more completely define the semantics and how they apply to different client conditions. For example, I assume ENTER, POINTER_ENTER, and GAZE_ENTER all mean the same thing to the FB feed viewing of a glTF. What does PROXIMITY mean in that context, or in any 3D-in-2D context? I'd also like to have a way to extend semantics easily, say, if we want to use a joystick or another gesture to start playback.

@msfeldstein (Author) commented Jul 10, 2018 via email

@pjcozzi (Member) commented Jul 18, 2018

@donmccurdy are we close to being able to merge this as a draft?

@donmccurdy (Contributor)

I'm not opposed to merging this as a draft, but not sure whether the thread above is resolved. Will leave that to @msfeldstein and @najadojo, fine to merge when they're ready. Some of the suffixes, e.g. _DONT_RESET, sound like they could be properties of a binding rather than suffixes of a semantic, IMO.

Also wondering if it might be worth following vertex attribute semantics in requiring application-specific semantics to be prefixed with a _, thoughts?

@msfeldstein (Author)

It sounds like @najadojo is cool with moving the suffix stuff to an extra property which would unblock that.

The underscore prefix for non-standard semantics seems fine.

@najadojo you ask "For example I assume ENTER, POINTER_ENTER, and GAZE_ENTER all mean the same thing to the FB feed viewing of a glTF. What does PROXIMITY mean in that context or any 3D in 2D context?"

ENTER is different from POINTER_ENTER and GAZE_ENTER; pointer enter and gaze enter don't mean anything in the feed context, since there is no pointer or gaze, so they should just be ignored. The same goes for PROXIMITY: if you're not in AR/VR and have no other way to understand proximity, that animation would be ignored.
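A sketch of the client-side rule described here, assuming a client declares the set of semantics it can act on and ignores the rest, and that application-specific semantics carry the leading underscore suggested by @donmccurdy. `partition_bindings` and `is_application_specific` are hypothetical helpers, and `_WAVE` is a made-up custom semantic.

```python
from typing import Dict, Iterable, List, Set


def is_application_specific(semantic: str) -> bool:
    """Assumed convention: application-specific semantics start with "_"."""
    return semantic.startswith("_")


def partition_bindings(bindings: Iterable[dict], supported: Set[str]) -> Dict[str, List[dict]]:
    """Split bindings into those this client plays and those it ignores.

    `supported` lists every semantic the client can act on, including any
    application-specific "_" semantics it registered itself.
    """
    result: Dict[str, List[dict]] = {"play": [], "ignore": []}
    for binding in bindings:
        key = "play" if binding.get("semantic") in supported else "ignore"
        result[key].append(binding)
    return result


# A 2D feed-style client: no pointer, gaze, or proximity input.
feed = {"ENTER", "LEAVE", "ALWAYS"}
bindings = [
    {"semantic": "ENTER", "animations": [0]},
    {"semantic": "GAZE_ENTER", "animations": [1]},
    {"semantic": "PROXIMITY", "animations": [2]},
    {"semantic": "_WAVE", "animations": [3]},
]
parts = partition_bindings(bindings, feed)
print([b["semantic"] for b in parts["play"]])  # ['ENTER']
print([b["semantic"] for b in parts["ignore"]])  # ['GAZE_ENTER', 'PROXIMITY', '_WAVE']
print([b["semantic"] for b in parts["ignore"] if is_application_specific(b["semantic"])])  # ['_WAVE']
```

Ignoring unsupported bindings keeps the asset loadable everywhere, in the same way that engines without the extension still load the model fine.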

@pjcozzi (Member) commented Jan 25, 2019

@msfeldstein @najadojo is this extension in use at Facebook? Is this PR ready to merge or does it need more discussion or changes?

@msfeldstein (Author) commented Jan 26, 2019 via email

@najadojo

That is correct; Windows Mixed Reality is using this extension, though we do have some current limitations and modifications that are not part of this extension description. Our documentation can be found at Animation guidelines.

@pjcozzi (Member) commented Jan 29, 2019

Thanks for the quick response, @msfeldstein @najadojo!

> Though we do have some current limitations and modifications that are not part of this extension description. Our documentation can be found at Animation guidelines.

@sbtron @najadojo do you think someone at Microsoft would be willing to update this so we can merge it?
