Explicit Padding of Subtitles and Timed Text tracks to comply with CMAF track/CMAF presentation model #9
Comments
wouldn't the flag |
So overall, no, I disagree, because we think this is not well supported in players, while inserting TTML or a VTTEmptyCueBox is always supported. For defragmenting, removing the WVTT empty cue should already be supported by any defragmenter; removing empty TTML may not be supported but should be straightforward to implement if you want it. Again, keeping it in a defragmented file causes no harm either. |
The minimalist conformant TTML document is:
ATSC will probably recommend this for sparse tracks. |
@mikedo thanks, I would recommend something similar in DVB-DASH (this is also in EBU Tech 3381), and I would hope that such a recommendation can eventually be included in CMAF, including how and when to do the padding. |
I cannot find in CMAF nor in part 30 anything against usage of duration-is-empty flag
Same thing, I cannot find anything in CMAF regarding this. The general problem I have with this approach is that rather than using a tool that is well documented and has no impact on the source content (hence is transparent for packagers), we now insert empty samples which are all format specific, and hence make the packager codec specific. We're doing it here for WebVTT and TTML, but in a few years we'll end up with thousands of sparse metadata formats (haptics, annotations, etc.) that will follow this same approach, each new format requiring a patch of the packager (and likely the defragmenter). And that worries me. |
We have two issues: a) with duration-is-empty you don't have media, i.e. gaps, which are not allowed in CMAF and which many players cannot handle. Both approaches, for empty TTML and for WebVTT, are explicit in MPEG-4 Part 30 (either VTTEmptyCue or TTML without body). I think the only gap is to describe in CMAF some recommendations for padding to fulfill the CMAF track / switching set / presentation requirements. |
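For readers following along, here is a minimal sketch of what a WebVTT padding sample looks like on the wire, assuming the MPEG-4 Part 30 definition of VTTEmptyCueBox as a plain box of type 'vtte' with no payload. The function name `vtt_empty_cue_sample` is hypothetical and only for illustration; the sample duration itself would be carried in the fragment's trun or tfhd, not in the sample.

```python
import struct


def vtt_empty_cue_sample() -> bytes:
    """Return the bytes of a WebVTT sample holding one VTTEmptyCueBox."""
    # A plain ISOBMFF box: 32-bit big-endian size followed by the 4-character type.
    # The box carries no payload, so its size is just the 8-byte header.
    return struct.pack(">I4s", 8, b"vtte")


sample = vtt_empty_cue_sample()
print(sample.hex())  # 0000000876747465
```

The sample is the same 8 bytes every time, so a packager can reuse it for any padding interval and vary only the duration it signals.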
I cannot find anything in CMAF stating this is not allowed, maybe I'm missing something.
I disagree, the tfdt can still be present in the empty fragment to indicate a "current decode time" although there is nothing to decode. You can insert as many of these empty segments with updated tfdt as required for your segment duration constraints, just like you insert fake empty samples currently. |
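As an illustration of the alternative being argued for here, the sketch below builds a TrackFragmentHeaderBox with the duration-is-empty flag set, using the tf_flags values from ISO/IEC 14496-12. The helper name `empty_tfhd` and the choice to signal the empty duration through default_sample_duration are assumptions for this example; whether CMAF permits such fragments is exactly what this thread is debating.

```python
import struct

# tf_flags values defined for the TrackFragmentHeaderBox in ISO/IEC 14496-12.
DEFAULT_SAMPLE_DURATION_PRESENT = 0x000008
DURATION_IS_EMPTY = 0x010000
DEFAULT_BASE_IS_MOOF = 0x020000


def empty_tfhd(track_id: int, duration: int) -> bytes:
    """Build a tfhd box that declares an empty interval of `duration` ticks."""
    flags = DURATION_IS_EMPTY | DEFAULT_SAMPLE_DURATION_PRESENT | DEFAULT_BASE_IS_MOOF
    version = 0
    # FullBox header: 8-bit version followed by 24-bit flags.
    payload = struct.pack(">I", (version << 24) | flags)
    payload += struct.pack(">I", track_id)
    # default_sample_duration, in track timescale units: the length of the empty interval.
    payload += struct.pack(">I", duration)
    return struct.pack(">I4s", 8 + len(payload), b"tfhd") + payload


box = empty_tfhd(track_id=3, duration=2000)
print(len(box), box[4:8])  # 20 b'tfhd'
```

A fragment built this way would carry no trun and no mdat payload for the track, and the tfdt would still give the decode time at which the empty interval starts.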
There are several aspects here:
One can interpret that as not missing I would suggest we create example content with duration-is-empty and have people test their implementation and report. We can then decide to restrict it (e.g. to a new structural brand such as |
If DVB, ATSC, MPEG-4 Part 30 and EBU Tech 3381 all recommend using VTTEmptyCue and/or TTML without body as the empty sample, it would be safer to recommend that in CMAF as well instead of introducing another approach. Another approach would only increase incompatibility, which should not be the goal of CMAF. duration-is-empty is not referred to in Part 30 or in CMAF, and having no samples with a decode time (regardless of tfdt) implies a discontinuity or gap. CMAF writes down what is allowed (the rest is not allowed), and this is not part of it. For content creation it may be the job of the CMAF packager, or of the live encoder producing DASH/CMAF, to do the padding, not the subtitle generator, so I disagree with your statement on design @cconcolato. |
MPEG-4 Part 30 says:
Note the use of "may" not "should".
Of course, CMAF is about improving interoperability. If indeed other SDOs are frozen on a solution, we should let them use it, but that does not mean we cannot evolve CMAF into a more efficient solution.
That's not correct. CMAF puts restrictions on ISOBMFF. When it does not put restriction on something, it does not mention it.
Then the packager is not codec-agnostic, right? It has to be TTML-aware or VTT-aware or at least have a mapping between codec and an 'empty' sample definition. I was just hinting that this design is not scalable. |
My point is that Part 30 does not mention duration-is-empty, and neither does CMAF. The "may" is used because in a non-fragmented format, which is also supported in MPEG-4 Part 30, you do not need this. So yes, using VTTEmptyCue or empty TTML is really optional, but it is currently the only method defined and used to implement the CMAF track model for subtitles. Sure, CMAF could define or evolve to something better, but typically technological advances should be made in the technology standards first (e.g. MPEG-4 Part 30) and only after that be considered in CMAF. Restricting ISOBMFF in my opinion implies writing what is allowed; it is a matter of wording, so I still believe I am correct. Yes, packagers are always codec agnostic, that is a fact, just as there are ISOBMFF bindings for AVC/HEVC/VVC/AV1/MPEG-H audio, you name it. Each has its own binding to the file format. So I don't really understand your point about this not being scalable. |
Thanks for bringing me in here @cconcolato.

Subtitle encoder requirements

In terms of the design question I consider a subtitle encoder to be responsible for generating data that effectively encodes a continuous stream of subtitle presentation, in the same way that an audio encoder generates encoded data that, when decoded, produces a continuous stream of audio samples. Clearly the encoded data are time-division-multiplexed, according to the packaging requirements, as set by whoever is configuring the encoding and packaging chain. So from that perspective, it is reasonable to expect the subtitle encoder to generate encoded subtitle samples which, when decoded, mean "for the duration of this subtitle sample, present nothing". If I saw a subtitle encoder simply stop producing output for a while, I would think it is broken, not that there are no subtitles to present.

Implementation experience

When we implemented the EBU-TT Live Interoperability Toolkit (LIT) we designed the Resequencer component, in its "output a new subtitle document every n seconds" mode, so that it would output documents containing no content for periods when it had received no subtitles. Feeding those documents to the EBU-TT-D encoder then generates empty documents. I mention this because it was my assumption that the subtitle encoder would generate empty documents, at that time.

Packaging

From a packaging perspective, if the input temporarily disappears, it may not be straightforward to update the manifest to remove the subtitle components and then add them back in again, and the impact on players may not be desirable either. So it probably would make sense for packager implementers and/or operators to make a call on whether they want to supply default "empty" subtitle documents or let the client device get a 404 when fetching the non-existent subtitles. And that in turn might depend on the player's behaviour on getting those 404s.

Empty subtitle segment TTML format

In terms of the precise format of an empty TTML / IMSC / EBU-TT-D document, this is something where the different profiles of TTML differ slightly in what is permitted, and the encoders I am aware of also differ. EBU-TT-D is the only profile that requires that the <tt xml:lang="" xmlns="http://www.w3.org/ns/ttml"/> Note that unlike @mikedo's suggestion in #9 (comment) this excludes the XML header, which is not formally required, because it is optional in XML 1.0, which is the basis used for encoding all current versions of TTML, EBU-TT-D and IMSC. (Thank you to @TairT for pointing this out to me some time ago!)

Implementation experience

One encoder supplier whose EBU-TT-D output I have had the opportunity to review in detail currently creates empty subtitle documents that are not actually conformant EBU-TT-D: they contain an empty When faced with the fact that this is not conformant, naturally, the supplier wanted to know the real world impact on players, and naturally I was unable to provide an all-encompassing answer; it may well be that many players would simply continue without any user impact at all. Always omit the
|
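The empty document quoted in the comment above can be sanity-checked with a few lines of code. This is only a sketch: the constant `EMPTY_TTML` copies the document from the comment, while the helper `is_empty_ttml` is a hypothetical name that tests nothing beyond "root element is tt in the TTML namespace and has no body child". Profile-specific requirements (EBU-TT-D, IMSC) are deliberately not checked.

```python
import xml.etree.ElementTree as ET

TTML_NS = "http://www.w3.org/ns/ttml"

# The empty document quoted in the thread, without an XML declaration.
EMPTY_TTML = '<tt xml:lang="" xmlns="http://www.w3.org/ns/ttml"/>'


def is_empty_ttml(document: str) -> bool:
    """Return True if the document is a TTML <tt> root with no <body> child."""
    root = ET.fromstring(document)
    if root.tag != f"{{{TTML_NS}}}tt":
        return False
    # Only a direct <body> child is checked; <head> content is allowed here.
    return root.find(f"{{{TTML_NS}}}body") is None


print(is_empty_ttml(EMPTY_TTML))  # True
```

A check like this could run in a packager's test suite to flag padding documents that unexpectedly carry a body.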
@RufaelDev and I had an offline discussion. Our summary is:
The suggestion would be to put these questions into a Defect Report/Tuc and welcome contributions. Maybe liaise with other SDOs to get feedback. |
Maybe a survey of current practices would best be done by an industry forum rather than MPEG? MPEG could proactively document how best to do a sparse timed text track, as something for encoder and player vendors to strive towards sooner rather than later? |
@mikedo I agree, my suggestion was to include the industry fora CTA and DASH-IF, the SDOs DVB and ATSC, and maybe EBU. I think MPEG should indeed be proactive, at least to gain an understanding of how CMAF users would solve this today and, if possible, to document a best practice. One other point: it is not only sparse subtitle tracks; we could also ask for feedback on padding of audio/video tracks. |
Yes, seems like the solution should be general to any kind of track. Although unusual, it could also be used for black video and muted audio padding, even if the coded data is nominally present. |
Apologies if I've missed this somewhere and it already exists, but it might be helpful to be able to publish/signal a 'null' segment in the same way as an init segment is signalled now.
etc. Then whatever encoded version of a null segment is appropriate for the media type could be created once and referenced whenever it is needed. For TTML it would be that empty document, for other types it would be some other kind of resource. Just thinking out loud. Forgive me if this is already covered. |
Can't one say in the MPD "nothing happens for this duration, please move along"? Downloading a resource which then explicitly says "fooled you! there's nothing here!" seems silly. |
Perhaps you can, if the meaning of "nothing happens" is completely clear for the media type concerned. Unfortunately it is not. A scheme that defines "nothing" explicitly so it can be referenced later would help tidy that up. In the case of subtitles, say, one presentation style I have seen shows a dark rectangular area where the text would be all the time when the subtitles are enabled, even if there is no text. That area is presumably defined in the subtitle documents. If no text is present for an entire segment, how would you signal to continue showing the dark area? (disclaimer: BBC doesn't typically use this style) |
Agreed, it has to be defined or obvious for each media type. Sound, well, it's silence. Video, nothing paints (not even "we regret the loss of picture", as the BBC used to say when the studio failed). For captions it seems fairly obvious? |
Already I can think of at least 3 schemes that would mean "nothing paints" and I don't know which one is right!
Does it? I think otherwise, as per #9 (comment) |
I agree video is the hardest case. In the case of captions, I think it's "as if the captions were not there or not enabled", so no, you don't get the black rectangle. For video, it no longer obscures what's below. If there is nothing below (it's not an overlay but the bottom document in the rendering stack), we're staring into the void, it's an application-specific fill (like "we regret the loss of picture"). |
just a few points to consider for the live/low latency streaming cases:
These are some things to take into account for the live (low latency streaming) case. Note that padding of CMAF tracks can be done already and works in practice, but there is a risk that people pad differently, so the question was whether some explicit recommendation is needed. My thinking was also that it would help adoption of the spec if this were a bit clearer, as tracks of unequal length cause problems. As the problem occurs most frequently for timed text/subtitles, that was the main case. If not done in CMAF itself, this issue might be better discussed and processed in an industry forum. My intention with this issue was not to introduce new client/player behaviour, but only to recommend a best practice with CMAF as is. |
@RufaelDev are you sure it's not the intended behaviour? Just wondering if this is documented anywhere: it seems weird to have predefined levels of importance for different types of representation, rather than making it content or application specific. |
I don't understand.
So you get a duration. I am honestly not sure that this flag helps much; the two useful cases are that the MPD tells you not to bother to fetch (saves a fetch), or that the file fetched tells you exactly what to do (e.g. paint a caption region with no text, as Nigel suggests). Once you've fetched something, you may as well be clear. I understand that if you're using algorithmic segment-URL generation, you always need a segment, and so the MPD telling you that there is nothing there is not possible, as you're not fetching new MPDs. |
An example is the end of section 6.6.8 of CMAF. By skipping I meant A/V/T (not only A/V; note this is also in the CMAF spec text), sorry for the misunderstanding. A player may skip all A/V/T for a part that has a discontinuity (e.g. in DASH a new period may be used). My point is that for sparse subtitles all such behavior intended for gaps/discontinuities seems rather undesirable. |
Yes indeed, and for this functionality one needs the fragment duration, not the (default) sample duration or zero (given there are no samples, one would not know how to calculate the fragment duration). In low latency the MPD is not always updated (e.g. numbering or time extension in DASH), so it cannot always tell what (not) to fetch. And yes, in the ideal world the segment would tell me exactly what to do :-) That is why I think a segment with VTTEmptyCue in samples or TTML without body in samples may be more helpful than a segment with a duration-is-empty flag: in the first case I know what to do with that information, namely render no subtitles for the duration of the fragment, while in the second I am still not sure. Last, regarding comment #9, there is no well-established way to say in the MPD "nothing happens for this duration" for a representation or adaptation set. |
m55342 http://wg11.sc29.org/doc_end_user/documents/132_OnLine/wg11/m55342-v3-m55342_v2.zip studies and highlights some of the text around gaps/continuity and handling that.
|
This issue is related to MPEG internal issue http://mpegx.int-evry.fr/software/MPEG/Systems/ApplicationFormat/CMAF/-/issues/30 |
The group discussed this issue as part of the discussion on contribution m55778 and decided to close this issue. |
CMAF presentations composed of audio, video and timed text should have tracks defining them, and tracks are composed of CMAF fragments or segments.
In some practical cases, subtitles or timed text are not available (e.g. at the end of a presentation). To comply with the CMAF presentation model, it would be nice if CMAF could include a recommendation for using empty subtitle / timed text samples that have a timespan but do not contain text. This is supported in MPEG-4 Part 30, but it is not explicit or required in CMAF. My recommendation is to define a default method for padding fragments when no timed text or subtitles are available. This would make it more explicit how media presentations with timed text or subtitles can comply with the CMAF presentation model.
My recommendation would be to recommend fragments with a sample carrying a VTTEmptyCueBox or a sample containing a valid TTML document.
It would be great if section 11 could suggest how CMAF tracks that partially lack subtitles can be supported using padding fragments, perhaps with an example.
Again, I think the padding can be done in different ways, but making this an explicit recommendation would be helpful. Too many times we see a subtitle track that is much shorter than the audio and video tracks or that has a gap.
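To make the padding idea concrete, here is a small sketch of how a packager might work out which padding fragments a short subtitle track needs. The function name `padding_fragments` and its parameters are hypothetical, all times are assumed to be in the track timescale, and it is assumed that each returned interval would become one fragment holding a single empty sample (a VTTEmptyCueBox or a TTML document without body).

```python
def padding_fragments(track_end: int, presentation_end: int, fragment_duration: int):
    """Return (start, duration) pairs covering the gap up to presentation_end."""
    if fragment_duration <= 0:
        raise ValueError("fragment_duration must be positive")
    fragments = []
    start = track_end
    while start < presentation_end:
        # The last fragment is shortened so the track ends exactly with the presentation.
        duration = min(fragment_duration, presentation_end - start)
        fragments.append((start, duration))
        start += duration
    return fragments


# A subtitle track that stops at 6000 ticks in a 9000-tick presentation,
# padded with fragments of at most 2000 ticks each.
print(padding_fragments(6000, 9000, 2000))  # [(6000, 2000), (8000, 1000)]
```

The same calculation applies to a gap in the middle of a track by passing the gap's start and end instead of the track and presentation ends.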