Interactive Video #33
Comments
I'm the developer of a web extension called Plopdown which adds interactive elements to existing videos on the web. The most complex and annoying problem in this space is making the calculations and updates needed to ensure that the elements related to the video are positioned correctly and show up in fullscreen. If anything should come out of this proposal, it's the ability to add fully embedded elements to the playing video.
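To make the positioning problem concrete, here is a minimal sketch (all names hypothetical) of mapping an overlay region, expressed in coordinates relative to the video frame, onto page pixels derived from the video's bounding rectangle:

```javascript
// Hypothetical sketch: map a normalized overlay region to page pixels.
// "region" uses fractional coordinates (0..1) relative to the video frame;
// "videoRect" is the shape getBoundingClientRect() reports for a <video>.
function overlayBox(region, videoRect) {
  return {
    left: videoRect.left + region.x * videoRect.width,
    top: videoRect.top + region.y * videoRect.height,
    width: region.w * videoRect.width,
    height: region.h * videoRect.height,
  };
}

// Example: a region covering the right half of a 640x360 video at (100, 50).
const box = overlayBox(
  { x: 0.5, y: 0, w: 0.5, h: 1 },
  { left: 100, top: 50, width: 640, height: 360 }
);
// box → { left: 420, top: 50, width: 320, height: 360 }
```

In a browser, this would be recomputed on resize and fullscreenchange events, with videoRect coming from getBoundingClientRect().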
@spaceribs, thank you. I wonder whether Plopdown uses rectangular regions for interactive elements or whether it utilizes arbitrary shapes? On the topic of implementing arbitrary shapes, in theory, one could use video tracks. These video tracks could contain multiple animated silhouettes, overlay regions, or “hotspots”. Consider an interactive video of an automobile engine, under a hood. Envision a secondary video track where there is a colored silhouette for each part of the engine. While the primary video track would be visible to a user, using the secondary video track(s), the user could, with their mouse, hover over and click on the parts of the engine.
For Plopdown, regions were somewhat restrictive, and I specifically wanted to implement other kinds of overlays like picture-in-picture, games, etc. The best course of action was to use VTT metadata to store the rendering information and place an absolutely positioned overlay element over the video. The tricky part was the positioning of the "stage" element. Originally I placed it as a sibling of the video, but the controls of YouTube/Netflix would prevent click events from coming through to my menus and elements. I eventually landed on querying via elementsFromPoint as the best solution to make sure the overlay displayed in fullscreen. The majority of elements are just normal HTML/SVG elements, nothing more complex than that.
There's some similarity here with BBC R&D's object-based media: https://www.ibc.org/trends/object-based-media-exploring-the-user-experience/6889.article
@chrisn , I approached these interactive and adaptive media topics from an AI and educational technology perspective. Thanks for that article which showcases some BBC R&D work and brings to mind other interesting scenarios, e.g., journalism. Here is a hyperlink to the referenced article: Object-Based Media: An Overview of the User Experience by Maxine Glancy, Lauren Ward, Nick Hanson, Andy Brown, and Michael Armstrong. Here is a hyperlink to the StoryFormer tool: https://www.bbc.co.uk/makerbox/tools/storyformer.
@AdamSobieski @spaceribs the Media WG is currently being rechartered. Have a look at the proposed charter, I think some of the deliverables will address several of the points raised above. |
It would be interesting to see a gap analysis:
These proposals also seem related:
@pchampin , @chrisn , thank you. If the topics aren't already in scope, I would like to propose adding the discussion and exploration of interactive video, branching video, interactive film, and related topics to the new Media WG charter. Here is a preliminary gap analysis:
While JS libraries and scripts running in Web documents can already enhance included videos with interactivity, self-contained and portable interactive videos would carry their own JS scripts and WASM modules. To support them, it appears that polyfills, prototypes, and player libraries would require the capability to inspect and access a video's file attachments, in particular its JS scripts and WASM modules. This functionality could be implemented natively by browsers and provided only to interactive videos through their standard runtime environment. Why should interactive videos be self-contained and portable?
With respect to what new browser capabilities are needed, these include:
With respect to the APIs that a standardized interactive video player would need to provide, there are two perspectives to consider. Firstly, there is the Web document perspective, viewing the interactive video player from the outside; in this regard, one can envision events raised on the player element. Secondly, there is the interactive video perspective, viewing the interactive video player from the inside. This is referred to, above, as the standard runtime environment: the APIs available to the JS scripts and WASM modules in interactive videos. Possible features for the standard runtime environment include: navigation (seeking/branching), prefetching and opening video attachments, prefetching and opening remote resources, presenting menus, accessing users’ settings and configurations, storing and accessing local and remote data, and parsing JSON, RDF, XML, and other data formats.
@Malvoz, thank you. Combining video annotation with video semantics, end-users could annotate any objects, events, or activities occurring in videos. Presently, one can select video content by making use of rectangular regions (e.g., media fragments). I am broaching uses of ancillary, or secondary, video tracks with one or more layers of uniquely-identifiable silhouettes. As envisioned, this content would be generated in part or entirely by computer vision algorithms. Hypertext presentations of annotations (or arrows connected to these) could follow or track objects, events, or activities occurring in videos. Relevant video content could be highlighted when the mouse hovers over or clicks on an annotation. There are interesting educational applications and scenarios to consider with respect to combinations of video annotation and video semantics.
The Media WG is focused on standardising the specs listed in its charter, so probably isn't the best forum for exploratory discussion - the Media & Entertainment Interest Group (MEIG) is better suited. That said, if any of your requirements would need changes to any of the Media WG specs, those could be raised as issues in the relevant GitHub repos.
@AdamSobieski, @chrisn, was there any further action with the MEIG and/or filing issues as Chris suggested? |
@LJWatson, hello. I remain interested in these interactive video topics, e.g.:
I remain interested in brainstorming and discussing how these scenarios could be supported in Web browsers in standard ways, e.g., simply by using the <video> element.
There hasn't really been much further discussion on this issue, at least in a W3C context, and MEIG would still welcome input in its GitHub repo. BBC R&D continues to do work in this area, see for example some of our pilots and collaborations. |
Thanks @AdamSobieski and @chrisn. We'll need implementor interest to be able to move this forward, but before then I think the idea itself will need to be specified a bit more. |
Thank you, @LJWatson. That makes sense. I recently opened a discussion and brainstorming thread on these topics in the Civic Technology Community Group mailing list. I am excited about possibilities pertaining to new user experiences with respect to the Web and smart television. With respect to the concept of a "channel", I am brainstorming about channels 2.0, multi-channels, or multi-stream channels. These would be more than just video streams. These constructs might utilize static or dynamic XML or JSON resources to describe their features while referring to one or more real-time video streams. Some ideas:
Per idea 1, channel-specific homepages and/or main menus, or idea 2, browsing sub-channels via grid-based navigation widgets, or guides, customers would be able to check their local weather forecasts at any point in time while remaining on a news "channel". Per idea 2, for news content, sub-channels could be the kinds of content indicated on websites' main menus and submenus (e.g., weather, national, world, local, business, technology, entertainment, sports, science, health). There could also be a default "combination" sub-channel, understood as selecting and blending together segments from those other, specialized sub-channels.

In addition to news-related scenarios, another use-case scenario that comes to mind is music video. Different kinds of music could each be streamed in a parallel sub-channel of a music "channel", rather than there having to be dozens or more traditional television channels (e.g., one per kind of music) from a content provider. After brainstorming and discussing with the group, towards specifying interactive video ideas a bit more, I look forward to providing an update here or in the MEIG repository.
Introduction
Interactive videos are videos which support user interaction. Interactive videos can navigate to, or branch to, video content depending upon: users’ interactions with menus, users’ settings and configurations, models of users, other variables or data, random numbers, and program logic.
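As a minimal sketch of the branching behaviour described above (the story graph, clip names, and choices are purely illustrative), navigation can be modelled as a graph of clips whose edges are labelled by menu choices:

```javascript
// Illustrative branching graph: each node names a clip and maps menu
// choices to follow-up nodes. All identifiers here are hypothetical.
const story = {
  intro: { clip: "intro.webm", choices: { left: "caveA", right: "caveB" } },
  caveA: { clip: "caveA.webm", choices: {} },
  caveB: { clip: "caveB.webm", choices: {} },
};

// Resolve the next node for a user's choice; stay put on unknown choices.
function nextNode(current, choice) {
  return story[current].choices[choice] ?? current;
}
```

Program logic, user models, or random numbers could equally be consulted inside nextNode to pick the branch.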
Educational uses of interactive video include, but are not limited to: instructional materials, how-to videos, and interactive visualizations.
Some chatbots or dialogue systems could be stored and distributed as interactive videos.
Interactive films are interactive videos or cinematic videogames where one or more viewers can interact to influence the courses of unfolding stories. Interactive films can be described as video forms of choose-your-own-adventure stories, or gamebooks. Contemporary examples of interactive films include: “Black Mirror: Bandersnatch”, “Puss in Book: Trapped in an Epic Tale”, and “Minecraft: Story Mode”.
One day, some AI systems could be trained using large collections of interactive films.
A Standard Runtime Environment
As envisioned, interactive videos contain JavaScript scripts and/or WebAssembly (WASM) modules. These scripts and modules should be provided with a runtime environment different from the one provided by Web browsers for Web documents. The runtime environment provided for interactive video scripts and modules should include functionality for:
- there should be a way, e.g., using URL fragments, to locate and prefetch files in videos, or video attachments
- there should be a way, e.g., using URL fragments, to locate and retrieve files in videos, or video attachments
- perhaps also functionality for presenting image maps or “hotspots” atop video
- access to models, e.g., learner, player, and user models
There is a need for one standard runtime environment for use by multiple interactive video formats. With a standard runtime environment, interactive video player software (e.g., Web browsers) could more readily play multiple formats of interactive video.
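A sketch of what such a standard runtime surface might look like follows; every function and property name here is an assumption for illustration, not a proposed standard:

```javascript
// Hypothetical runtime surface handed to an interactive video's scripts.
// All names are illustrative, not part of any real specification.
function makeRuntime(player) {
  return {
    seekTo: (time) => { player.currentTime = time; },        // navigation
    getAttachment: (name) => player.attachments.get(name) ?? null,
    presentMenu: (options) => player.onMenu(options),        // returns a choice
  };
}

// A stand-in player object for demonstration only.
const player = {
  currentTime: 0,
  attachments: new Map([["main", "(wasm bytes)"]]),
  onMenu: (options) => options[0], // auto-pick the first option
};

const rt = makeRuntime(player);
rt.seekTo(42); // e.g., branch to a clip starting at 42 seconds
```

The point of standardizing this surface is that the same script or WASM module could then run unchanged in any conforming player.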
Security Considerations and User Permissions
Only the standard runtime environment intended for interactive videos’ JavaScript scripts and WASM modules should be available to them. Containing documents could, perhaps, permit otherwise (see also: <iframe>).

Interactive videos could contain hashes for, and/or digital signatures of, files in videos, or video attachments. WASM modules could be digitally signed.
Interactive video players (e.g., Web browsers) could make use of user permissions systems to protect users’ data privacy while providing users with features.
Menus and Accessibility
As envisioned, the presentation and display of menus is handled by interactive video players through the runtime environment API. Menus could be presented to users by invoking a present function.

Spoken Language Interaction and Dialogue Systems
Interactive videos could utilize speech recognition grammars (e.g., SRGS) to enable users to select menu options via spoken natural language. In Web browsers, this functionality could be provided via the Web Speech API.
Perhaps interactive videos could also utilize remote speech recognition services.
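As a minimal stand-in for a full SRGS grammar, a recognized utterance could be matched against the current menu options after normalization (function names and the matching strategy are hypothetical):

```javascript
// Sketch: match a recognized utterance against menu options.
// A real system would use an SRGS grammar; this is exact matching
// after stripping case and punctuation.
function normalize(text) {
  return text.toLowerCase().replace(/[^a-z0-9 ]/g, "").trim();
}

function matchOption(utterance, options) {
  const spoken = normalize(utterance);
  return options.find((opt) => normalize(opt) === spoken) ?? null;
}

const choice = matchOption("Open the Hood!", ["Open the hood", "Start engine"]);
// choice → "Open the hood"
```

In a browser, the utterance would arrive via a Web Speech API recognition result; the grammar itself could be shipped as a (language-specific) video attachment.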
Internationalization
Different versions of files in videos, or of video attachments, could exist in interactive videos for specific languages (e.g., menus, speech recognition grammars).
Documents Containing Interactive Videos
A document element for interactive video (e.g., <video>) could have an event upon its interface, to be raised whenever a menu is to be presented to a user. In this way, the presentation of a menu could be intercepted by a containing document. The containing document could then process the arguments passed to the present function, display a menu to the user in a stylized manner, and provide a response back to the interactive video’s scripts or WASM modules, for example resulting in navigation to, or seeking to, a video clip, segment, or chapter.

Interactivity via Animated Colored Silhouettes in Secondary Video Tracks
One could use secondary video tracks to provide arbitrarily-shaped interactive regions. These secondary video tracks could each contain multiple animated colored silhouettes, overlay regions, or “hotspots”. The colors of these silhouettes would correspond with arbitrarily-shaped interactive elements. The color black, however, would be reserved for indicating the absence of a silhouette.
Consider an interactive video of an automobile engine, under a hood. Envision a secondary video track where there is a colored silhouette for each part of the engine. While the primary video track would be visible to a user, using the secondary video track(s), the user could, with their mouse, hover over and click on the parts of the engine.
Animated colored silhouettes could also be rectangular and mirror the motion of text or images in videos. This would facilitate traditional hyperlinks in videos.
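The hit-testing step can be sketched as follows, assuming the silhouette track's current frame has been drawn to a hidden canvas and the pixel colour under the pointer read back via getImageData; the colour-to-identifier table and the URIs in it are invented for illustration:

```javascript
// Sketch: map a silhouette pixel colour to a part identifier.
// Black is reserved for "no silhouette", per the scheme above.
const partsByColor = new Map([
  ["#ff0000", "urn:example:engine:alternator"], // illustrative URIs
  ["#00ff00", "urn:example:engine:radiator"],
]);

function partAtPixel(hexColor) {
  if (hexColor === "#000000") return null; // reserved: no silhouette here
  return partsByColor.get(hexColor) ?? null;
}

partAtPixel("#ff0000"); // → "urn:example:engine:alternator"
```

A production scheme would likely avoid colours that video compression could blur together, or encode identifiers with error margins rather than exact matches.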
Semantics and Metadata
With semantics and metadata, one could describe the contents of videos, the objects and events which occur in them, and place this information in semantic tracks. One could utilize semantic graphs as well as “semantic deltas”, or “semantic diffs”, which indicate instantaneous changes to semantic graphs.
As envisioned, the animated colored silhouettes in secondary video tracks have unique identifiers and are URI-addressable. In this way, semantics and metadata could more readily describe the silhouetted regions which map to visual contents in videos.
With semantic tracks, users could, for example, utilize queries, via user interfaces, upon the contents of videos and observe the query results, e.g., objects or events, visually selected, outlined, or highlighted in the videos. Also possible is that query results could be presented to users with storyboards or other visualizations using relevant images from videos.
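A minimal sketch of applying a "semantic delta" to a graph as playback crosses a track boundary follows; the triple encoding (plain strings) and the delta shape are assumptions for illustration:

```javascript
// Sketch: a semantic graph as a set of triples, with deltas that
// add/remove triples at an instant in the timeline. Encodings are
// illustrative; a real track might carry RDF.
function applyDelta(graph, delta) {
  const next = new Set(graph);
  for (const t of delta.remove ?? []) next.delete(t);
  for (const t of delta.add ?? []) next.add(t);
  return next;
}

let graph = new Set(["engine hasPart alternator"]);
graph = applyDelta(graph, {
  add: ["alternator status highlighted"],
  remove: [],
});
```

Deltas keep semantic tracks compact: only changes are stored per cue, and the full graph at any time is the initial graph plus the deltas up to that time.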
JavaScript and WebVTT
A syntax could be envisioned for embedding JavaScript in WebVTT text tracks, providing two lambda functions for a cue: one to be called when the cue is entered and the other to be called when the cue is exited.
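The enter/exit behaviour can be sketched independently of any concrete WebVTT syntax; the cue shape below (start, end, onenter, onexit) is illustrative, though it echoes the enter and exit events that text track cues already have:

```javascript
// Sketch: fire per-cue enter/exit callbacks as playback time advances.
// The cue shape is hypothetical; a player would wire this to timeupdate.
function makeCueRunner(cues) {
  const active = new Set();
  return function onTimeUpdate(time) {
    for (const cue of cues) {
      const inside = time >= cue.start && time < cue.end;
      if (inside && !active.has(cue)) { active.add(cue); cue.onenter?.(); }
      if (!inside && active.has(cue)) { active.delete(cue); cue.onexit?.(); }
    }
  };
}

const log = [];
const tick = makeCueRunner([
  { start: 1, end: 3, onenter: () => log.push("enter"), onexit: () => log.push("exit") },
]);
[0, 1.5, 2.5, 3.5].forEach(tick);
// log → ["enter", "exit"]
```

In a browser, the native TextTrackCue enter/exit events could drive the callbacks directly instead of polling currentTime.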
Polyfills and Prototyping
It appears that to implement a polyfill to prototype interactive video functionality, the HTML5 media API would need to surface the capability to access files in videos, or video attachments.
As envisioned, a polyfill would load JavaScript scripts and WASM modules from videos, implement and provide the standard runtime environment for those scripts and modules, and then run the videos, e.g., perhaps by calling a function like main or raising an event.

Another approach could make use of HTML5 custom elements. A custom element could utilize an <iframe> to load a generated HTML5 document. This could also be achieved utilizing the srcdoc attribute of the <iframe> element.

Attachments
Attachments in videos are additional files, such as "related cover art, font files, transcripts, reports, error recovery files, picture or text-based annotations, copies of specifications, or other ancillary files".
One can refer to a comparison of video container formats to see which video container formats presently support attachments.
Attachments in videos are a means for adding JavaScript scripts and WASM modules to videos. By placing utilized scripts and modules in interactive videos, the videos can be self-contained and portable.
Interfaces for inspecting and accessing attachments could resemble:
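A hypothetical sketch follows; no part of this is a real HTMLMediaElement API today, and the names and shapes are assumptions, loosely following the getAttachment call discussed in this section:

```javascript
// Hypothetical sketch of an attachments interface for media elements.
// Nothing here is a real, specified API; names are illustrative.
class VideoAttachments {
  constructor(entries) {
    this._entries = new Map(entries); // name → { type, data }
  }
  list() {
    return [...this._entries.keys()]; // inspect available attachments
  }
  getAttachment(name, mimeType) {
    const entry = this._entries.get(name);
    if (!entry) return null;
    if (mimeType && entry.type !== mimeType) return null; // crude negotiation
    return entry.data; // e.g., a BufferSource holding a WASM module
  }
}

const attachments = new VideoAttachments([
  ["main", { type: "application/wasm", data: new Uint8Array([0]) }],
]);
attachments.getAttachment("main", "application/wasm"); // → Uint8Array
```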
In theory, one could provide arguments including MIME type and natural language when opening a video attachment, per content negotiation, for example getAttachment("main", "application/wasm").

Related specifications include the File API and BufferSource. The BufferSource interface is utilized for loading WASM modules.
Conclusion
A standard runtime environment for interactive videos and standard formats for interactive videos are needed. With new standards, interactive videos would be readily authored, self-contained, portable, secure, accessible, interoperable, readily analyzed, and readily indexed and searched.
There is a need for one standard runtime environment for use by multiple interactive video formats. With a standard runtime environment, interactive video player software (e.g., Web browsers) could more readily play multiple formats of interactive video.
With standard interactive video formats (and perhaps open file formats for project files), extensible content authoring tools could be more readily developed. Authoring interactive stories and producing interactive films is difficult and, with new software tools, we could expect much more content.
With standard interactive video formats, interactive videos would be self-contained and portable.
With a standard runtime environment and standard interactive video formats, interactive videos would be more secure and would access user data and other resources only in accordance with user permissions.
With a standard runtime environment and standard interactive video formats, interactive videos would be accessible.
With standard interactive video formats, interactive videos would be interoperable with other technologies. For example, interactive videos could be played in Web documents and EPUB digital textbooks.
With standard interactive video formats, interactive videos could be better analyzed.
With standard interactive video formats, large collections of interactive videos could be better indexed and searched.
Thank you. I look forward to discussing these ideas with you.
References
[HTML5 Media] https://html.spec.whatwg.org/multipage/media.html
[PLS] https://www.w3.org/TR/pronunciation-lexicon/
[SMIL] https://www.w3.org/TR/SMIL/
[SRGS] https://www.w3.org/TR/speech-grammar/
[V8] https://v8.dev/
[V8 Isolates, Contexts, Worlds and Frames] https://chromium.googlesource.com/chromium/src/+/refs/heads/main/third_party/blink/renderer/bindings/core/v8/V8BindingDesign.md#A-relationship-between-isolates_contexts_worlds-and-frames
[WASM] https://webassembly.org/
[Web Animations] https://www.w3.org/TR/web-animations-1/
[Web Speech API] https://wicg.github.io/speech-api/
[WebVTT] https://www.w3.org/TR/webvtt1/