IOP/spec compliant way of splicing a VOD content for ad insertion #166

Closed
kqyang opened this issue Jan 11, 2018 · 13 comments

@kqyang

kqyang commented Jan 11, 2018

Let's say I have a VOD content with one content period:

<MPD ... profiles="urn:mpeg:dash:profile:isoff-on-demand:2011"  type="static" mediaPresentationDuration="PT100S">
  <Period id="0">
    <AdaptationSet id="0" ...>
      <Representation id="0" ...>
        <BaseURL>video.mp4</BaseURL>
        <SegmentBase indexRange="823-890" timescale="1000">
          <Initialization range="0-822"/>
        </SegmentBase>
      </Representation>
    </AdaptationSet>
  </Period>
</MPD>

Now I want to insert an Ad at time 30 seconds - let's assume it aligns perfectly with the subsegment boundary. The existing content period will be split into two periods.

We have two options here:

Method 1: Reuse the same media file in the two periods, i.e.

<MPD ... profiles="urn:mpeg:dash:profile:isoff-on-demand:2011"  type="static" mediaPresentationDuration="PT110S">
  <Period id="0">
    <AdaptationSet id="0" ...>
      <Representation id="0" ...>
        <BaseURL>video.mp4</BaseURL> <!--contain all subsegments-->
        <SegmentBase indexRange="823-890" timescale="1000">
          <Initialization range="0-822"/>
        </SegmentBase>
      </Representation>
    </AdaptationSet>
  </Period>
  <Period id="1" start="PT30S">
   <!--Ad Period-->
    ... 
  </Period>
  <Period id="2"  start="PT40S">
    <AdaptationSet id="0" ...>
      <Representation id="0" ...>
        <BaseURL>video.mp4</BaseURL> <!--contain all subsegments-->
        <SegmentBase indexRange="823-890" timescale="1000" presentationTimeOffset="30000">
          <Initialization range="0-822"/>
        </SegmentBase>
      </Representation>
    </AdaptationSet>
  </Period>
</MPD>

Method 2: Split the media file into 2 at time PT30S, i.e.

<MPD ... profiles="urn:mpeg:dash:profile:isoff-on-demand:2011"  type="static" mediaPresentationDuration="PT110S">
  <Period id="0">
    <AdaptationSet id="0" ...>
      <Representation id="0" ...>
        <BaseURL>video-0.mp4</BaseURL> <!--contain only subsegments before PT30S-->
        <SegmentBase indexRange="823-850" timescale="1000">
          <Initialization range="0-822"/>
        </SegmentBase>
      </Representation>
    </AdaptationSet>
  </Period>
  <Period id="1" start="PT30S">
   <!--Ad Period-->
    ... 
  </Period>
  <Period id="2"  start="PT40S">
    <AdaptationSet id="0" ...>
      <Representation id="0" ...>
        <BaseURL>video-1.mp4</BaseURL> <!--contain only subsegments after PT30S-->
        <SegmentBase indexRange="823-860" timescale="1000" presentationTimeOffset="30000">
          <Initialization range="0-822"/>
        </SegmentBase>
      </Representation>
    </AdaptationSet>
  </Period>
</MPD>

Which method is more compliant with the DASH spec [1] and the DASH-IF IOP [2]?

According to the spec[1] A.3.2 Period Start and End Times

  • the Period start time is provided as PeriodStart according to 5.3.2.1 for any Period in the MPD.
  • the Period end time referred as PeriodEnd is determined as follows: For any Period in the MPD
    except for the last one, the PeriodEnd is obtained as the value of the PeriodStart of the next Period.

So the Period 0 starts at PT0S and ends at PT30S and Period 2 starts at PT40S and ends at PT110S.
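
As a rough illustration of the A.3.2 rule quoted above, the following Python sketch derives each Period's end from the next Period's start. The function name `period_bounds` is made up for this example, and treating the last Period as ending at mediaPresentationDuration is an assumption for a static MPD; the start times and the 110 second total come from the Method 1 MPD.

```python
def period_bounds(starts, presentation_duration):
    """Return (start, end) in seconds for each Period.

    Every Period except the last ends where the next one starts; the last
    Period is assumed to end at mediaPresentationDuration (static MPD).
    """
    ends = starts[1:] + [presentation_duration]
    return list(zip(starts, ends))


# Period@start values from the Method 1 MPD.
# Period 0 has no @start attribute, so a start of 0 is assumed.
bounds = period_bounds([0, 30, 40], 110)
print(bounds)
```

The third tuple corresponds to Period 2, which runs from 40 to 110 seconds.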

The duration of the Representation in Period 0 is actually 100 seconds, which overlaps with the following periods.

In [1] 7.2.1 Media Presentation timeline note section, it mentions

NOTE At the start of a new Period, the playout procedure of the media content components may need to be adjusted at the end of the preceding Period to match the PeriodStart time of the new Period as there may be small overlaps or gaps with a Representation at the end of the preceding Period. Overlaps (respectively gaps) may result from Media Segments with actual presentation duration of the media stream longer (respectively shorter) than indicated by the Period duration. Also in the beginning of a Period if the earliest presentation time TP of any access unit of a Representation is not equal to TO then the playout procedures need to be adjusted accordingly.

So it seems a gap/overlap is allowed, though in this case, the overlap is big.

The Representation in Period 2 actually starts at presentation time 0, which is smaller than @presentationTimeOffset.

In [1] 7.2.1 Media Presentation timeline

Media Segments should not contain any presentation time TP that is smaller than the value of the
@presentationTimeOffset, TO. However, if this is the case, then presentation of the Media Segment is expected to only take place for presentation times greater than or equal to TO.

Having presentation times smaller than @presentationTimeOffset is discouraged, but the description above allows it.

So it seems that method 1 is compliant with the DASH spec [1]. Method 2 is certainly compliant with the DASH spec [1] too, but it requires cutting the media file in two at the Ad cue point, and we would need to re-package the media whenever we want to adjust the Ad cue point or add another Ad, which does not sound right.

I cannot find clear guidance in the DASH-IF IOP [2]: all the multi-period examples seem to use the live profile. Is there any recommendation or preference here?

Thanks!

[1] ISO/IEC 23009-1:2014 Information technology — Dynamic adaptive streaming over HTTP (DASH) — Part 1: Media presentation description and segment formats
[2] Guidelines for Implementation: DASH-IF Interoperability Points http://dashif.org/wp-content/uploads/2017/09/DASH-IF-IOP-v4.1-clean.pdf

@sandersaares
Member

sandersaares commented Jan 15, 2018

It appears to me that the intent in method 1 above is to use presentationTimeOffset as a type of "seek" operation. This usage is invalid.

presentationTimeOffset indicates to the client the media timestamp (pts) that is equivalent to PeriodStart. It is designed for mapping between the two timelines and not for any kind of "internal seeking". In other words, setting PTO=30s means that the first segment in the mp4 file starts at media timestamp 30s.
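
A minimal Python sketch of this timeline mapping, under the assumption that a media timestamp T is placed on the MPD timeline at PeriodStart + (T - PTO) / timescale. The function name is hypothetical; the numbers are taken from the third Period of the method 1 example.

```python
def media_to_presentation_time(media_time, pto, timescale, period_start):
    """Map a media timestamp (in timescale units) to seconds on the MPD timeline."""
    return period_start + (media_time - pto) / timescale


# Method 1, third Period: PeriodStart = 40 s, PTO = 30000, timescale = 1000.
# A sample at media time 30000 lands exactly on PeriodStart.
at_pto = media_to_presentation_time(30000, 30000, 1000, 40.0)

# The first sample of the unmodified file (media time 0) maps to a point
# 30 s before the Period starts.
first_sample = media_to_presentation_time(0, 30000, 1000, 40.0)
print(at_pto, first_sample)
```

With the file left unmodified, the first 30 seconds of media map to times before PeriodStart, which is the mismatch described here.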

In method 1 example, where PTO is modified but the content is not, playback will still start at the first segment, which will almost certainly not have any samples even close to the expected value of 30 seconds. Playback will fail at that point.

I cannot find the exact statement but I recall DASH specifying that segments are expected to start with the expected media timestamp with a max deviation of 50% of segment length. With this example, that would require 60 second segments to be compliant, which seems unrealistic (and unlikely to be well supported by players).

@kqyang
Author

kqyang commented Jan 16, 2018

@sandersaares Thanks for your comment. Not sure if you are talking about this statement:

In [1] 7.2.1 Media Presentation timeline

Media Segments should not contain any presentation time TP that is smaller than the value of the
@presentationTimeOffset, TO. However, if this is the case, then presentation of the Media Segment is expected to only take place for presentation times greater than or equal to TO.

So the spec does not recommend having samples before PTO but it does not prohibit it either. In fact, according to the above statement, playback should start from the samples with presentation times greater than or equal to PTO.

presentationTimeOffset indicates to the client the media timestamp (pts) that is equivalent to PeriodStart. It is designed for mapping between the two timelines and not for any kind of "internal seeking". In other words, setting PTO=30s means that the first segment in the mp4 file starts at media timestamp 30s.

I don't see it being reflected anywhere in the spec.

@sandersaares
Member

sandersaares commented Jan 17, 2018

I will try to clarify more what I mean.

First, let me offer a useful heuristic. The MPD is a description of media content (DASH 4.3). Therefore, a question to ask when considering MPD spec-conformance is "am I describing the files on disk". The method 1 example describes the same file on disk in two different ways (once with PTO=0s, once with PTO=30s). As the file is unlikely to change between the two periods, this is already a red flag and a good hint that something is amiss.

While you can sometimes form two periods by describing subsets of a shared set of files on disk, this is not the case here (with just one mp4 file).

My previous comment was based on my overall knowledge of DASH timing and I did not have a specific statement in mind. Let's try to find some detail, though.

DASH timing is best documented by IOP. In 3.2.7.1 we find this brief comment about PTO:

Note that typically this time is either earliest presentation time of the first segment or a value slightly larger in order to ensure synchronization of different media components. If larger, this Representation is presented with short delay with respect to the Period start.

This is a good hint that having segments where all samples are before PTO (as would be the case in the third period above, Period id="2", for all of the first 30 seconds of media) is not a typical use of DASH. Nor is 30 seconds likely to qualify as a short delay, and the offset is not used here to ensure synchronization.

Reading on, the same chapter later says:

In addition, each segment has an internal sample-accurate presentation time. Therefore, each segment has a media internal earliest presentation time EPT and sample-accurate presentation duration DUR.

For each media segment in each Representation the MPD start time of the segment should approximately be EPT - PTO. Specifically, the MPD start time shall be in the range of EPT - PTO - 0.5*DUR and EPT - PTO + 0.5*DUR according to the requirement stated above.

Let's unpack this for the first segment of the third period (Period id="2"). I will assume a 10 second segment duration, as it is not explicitly listed in the example.

  • MPD start time (period-relative) is 0s.
  • EPT is approximately 0s.
  • DUR is approximately 10s.
  • PTO is 30s

So does the MPD start time of the segment fall within the allowed range?

  • Range start is EPT - PTO - 0.5*DUR which is 0s - 30s - 0.5*10s = -35s.
  • Range end is EPT - PTO + 0.5*DUR which is 0s - 30s + 0.5*10s = -25s.

No! The MPD start time of the segment, 0s, does not fall in the range [-35s, -25s]. This is the 50% statement I had in mind.

Looking at it the other way around, if your sample and segment timing match accurately, PTO may differ from the EPT of the first segment by at most 50% of a segment duration. It is meant for synchronizing, not for seeking.
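
The range check above can be sketched in a few lines of Python. The function name is made up for illustration, all values are in seconds, and the 10 second segment duration is the same assumption as in the walkthrough.

```python
def mpd_start_in_range(mpd_start, ept, pto, dur):
    """Check EPT - PTO - 0.5*DUR <= MPD start <= EPT - PTO + 0.5*DUR."""
    low = ept - pto - 0.5 * dur
    high = ept - pto + 0.5 * dur
    return low <= mpd_start <= high, (low, high)


# Method 1: the file is unchanged (EPT = 0 s) but PTO is set to 30 s.
ok, allowed = mpd_start_in_range(mpd_start=0, ept=0, pto=30, dur=10)
print(ok, allowed)

# For comparison: a file whose first segment really starts at media time 30 s,
# as the comment in the method 2 example suggests.
ok_split, allowed_split = mpd_start_in_range(mpd_start=0, ept=30, pto=30, dur=10)
print(ok_split, allowed_split)
```

The first call fails the check and the second passes it, since there EPT and PTO agree.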

@tinskip

tinskip commented Jan 17, 2018

I think this thread is a bit off course. The fundamental question is not whether method 1 is IOP compliant, but rather how ad insertion can be done using the on-demand profile without splitting the title stream into multiple containers. There ought to be an interoperable way to do this, but it appears there is not.

Splitting the file is sub-optimal because it is not very flexible, requiring knowledge of (fixed) ad insertion points before packaging the VOD content.

@ojw28

ojw28 commented Mar 6, 2018

Suppose you take a single segment content stream, ignore the sidx box and instead list each subsegment explicitly as a segment in the MPD in a SegmentList, with @mediaRange used to define the range of each segment in the content file. At this point it's possible to insert ad periods in a spec compliant way, without modifying the content stream, by having each period only list the segments that belong to it. There's still a constraint that ads must be inserted at segment boundaries, but I think you'd want this anyway because it ensures you can start playing each content period efficiently and independently to previous content.

If large SegmentList elements are a concern, or if there's a desire to enable the use case whilst still using on-demand profile, then it's worth thinking about how to achieve something equivalent whilst still defining each content period as a single segment, which I think is what this question is about. It feels like this could be achieved pretty easily by allowing each segment to indicate a range of subsegments that should be considered as belonging to it, rather than having all subsegments belong to it as is currently the case. As an example, suppose subsegments are 5 seconds long, zero indexed, and that you want to insert an ad at t=30. You could imagine specifying the first content period using syntax like:

<!-- indexRange: tells you where the full sidx box is, as normal.
     indexSubsegments (proposed): specifies the subsegments that belong to this segment. -->
<SegmentBase
    indexRange="823-860"
    indexSubsegments="0-5"
    timescale="1000">
  <Initialization range="0-822"/>
</SegmentBase>

And for the second content period:

<!-- indexRange: tells you where the full sidx box is, as normal.
     indexSubsegments (proposed): specifies the subsegments that make up this segment.
     presentationTimeOffset: presentation time offset for subsegment 6 (zero indexed). -->
<SegmentBase
    indexRange="823-860"
    indexSubsegments="6-12"
    timescale="1000"
    presentationTimeOffset="30000">
  <Initialization range="0-822"/>
</SegmentBase>

And so on. What do you think?
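
To make the idea concrete, here is a rough Python sketch of how a packager might compute those values from the subsegment durations in the sidx box. The indexSubsegments attribute is only a proposal, the function name `split_at_cue` is hypothetical, and the 13 subsegments of 5 seconds each are an assumption chosen to match the example ranges.

```python
from itertools import accumulate


def split_at_cue(durations_ms, cue_ms):
    """Split subsegment indices at a cue point that must sit on a boundary.

    Returns ((first, last), (first, last), pto_ms) for the two content periods.
    """
    # Start time of each subsegment, in milliseconds.
    starts = [0] + list(accumulate(durations_ms))[:-1]
    if cue_ms not in starts[1:]:
        raise ValueError("cue point is not on a subsegment boundary")
    k = starts.index(cue_ms)  # first subsegment of the second content period
    return (0, k - 1), (k, len(durations_ms) - 1), cue_ms


# Assumption: 13 subsegments of 5 s each, as would be read from the sidx box.
before, after, pto = split_at_cue([5000] * 13, 30000)
print(before, after, pto)
```

A cue point that does not fall on a subsegment boundary raises an error, which is the constraint this approach is meant to enforce.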

@ojw28

ojw28 commented Mar 6, 2018

Actually, you can possibly already do what you want using on-demand profile, if you're willing to generate a sidx box for each content period. You can host each sidx box at a separate URL, and use @index to define where to find it, like:

<SegmentBase
    index="sidx-for-subsegments-0-to-5.sidx"
    timescale="1000">
  <Initialization range="0-822"/>
</SegmentBase>

@kqyang
Author

kqyang commented Mar 6, 2018

@ojw28 Good thoughts! Unfortunately, all three proposals described here violate the spec.

  1. Use SegmentList with @mediaRange to define the range of each segment in the content file.

ISO/IEC 23009-1:2014
8.3 ISO Base media file format On Demand profile
8.3.2 Media Presentation Description constraints
...
neither the Period.SegmentList element nor the Period.SegmentTemplate element shall be present

  2. Use indexSubsegments to indicate the subsegment ranges

I don't see the attribute indexSubsegments mentioned in any specification. I assume it is a new attribute you want to propose?

I actually do not see a reason to have this attribute, as the player is already able to derive the same information from Period@duration and @presentationTimeOffset. It is redundant, and unnecessary redundancy is not a good idea.

  3. Use @index to carry the index segment in a separate file

ISO/IEC 23009-1:2014
7.3 Media Presentation based on the ISO base media file format
7.3.1 General
...
Index Segments shall not be present.

@ojw28

ojw28 commented Mar 6, 2018

  • The first suggestion was intended to be a spec compliant approach that doesn't adhere to on-demand profile, as an option, because it seems to me there's no way to support this case within on-demand profile currently.
  • The second suggestion was indeed a proposal to add a new indexSubsegments attribute, so as to create a spec compliant approach that does adhere to on-demand profile. I disagree that it's redundant because I don't agree the way you're proposing to use duration and presentationTimeOffset is spec compliant. Re-defining them to behave as you propose is IMO pretty ugly. That approach doesn't do anything to enforce that the adjusted period start/end points fall on subsegment boundaries, for example, which I think would be a good constraint to have. The indexSubsegments proposal does enforce this, and mirrors the type of functionality you can achieve using SegmentList in an MPD, which seems like a nice property to have.
  • Ack that the third suggestion doesn't adhere to 23009-1 7.3.

@sandersaares
Member

This topic was discussed in a call a few weeks ago. I felt pretty sure this got reflected in GitHub somewhere already but looking in this thread it looks like not. Anyway, some thoughts popped up in my head related to this, so I continue the discussion here.

The solution that seemed most workable to participants in the call was to relax the requirement for @presentationTimeOffset to point within the first segment. For on-demand DASH, it can point to wherever, legalizing method 1 in the OP.

As this requires knowledge of media timestamps, which may not be available for live profile without SegmentTimeline, this relaxation should be only applicable to the on-demand profile in my view. For live profile, @presentationTimeOffset should retain its existing use as a timeline alignment specifier, not as a seek indicator.

It also came up that @presentationDuration may be relevant here to suppress some content at the cut-point but I propose outlawing that attribute as its net value seems negative to me due to potential for easy (even accidental) misuse (see #178).

@tinskip

tinskip commented May 16, 2018

Restricting @presentationTimeOffset to on-demand profiles for "large" adjustments seems reasonable, as the clock should keep running while ads are playing, returning to close-to live after the ad period.

However, for alignment / audio-video sync, an edit list in the init segment can be used, as for VOD. So perhaps PTO ought to just be disallowed for live.

@ojw28

ojw28 commented May 29, 2018

Isn't there a distinction between (a) the PTO adjustment itself, and (b) whether it results in any clipping?

As I understand it, the original (and primary) purpose of PTO is just to adjust the media presentation timestamps so that they're properly aligned with the timeline. Such adjustments may be arbitrarily large without resulting in any clipping, and may be required for live streams if the media is generated in a way that means the contained timestamps are not already aligned as required. If this is true then I don't think you can disallow use of PTO for live.

I also don't think that restricting large PTO adjustments to on-demand is quite the right thing. The important thing is not the size of the PTO adjustment, but rather the amount of clipping that occurs as a result. It's large amounts of clipping that should be restricted to occur for on-demand profiles only, rather than large PTO adjustments, I think?
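
To illustrate the distinction, a small Python sketch that treats clipping as the amount of media whose presentation time precedes PTO. The function name and the live timestamp value are hypothetical; the method 1 numbers come from the original post.

```python
def clipped_seconds(first_ept, pto, timescale):
    """Media (in seconds) hidden at the start of a Period because it precedes PTO."""
    return max(0, pto - first_ept) / timescale


# Live-style alignment (hypothetical timestamps): PTO is large, but the first
# segment starts at the same media timestamp, so nothing is clipped.
live = clipped_seconds(first_ept=1_515_000_000_000, pto=1_515_000_000_000, timescale=1000)

# Method 1 from the original post: the file starts at 0 but PTO is 30000.
method1 = clipped_seconds(first_ept=0, pto=30000, timescale=1000)
print(live, method1)
```

The PTO value is far larger in the first case, yet only the second case clips any media.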

@sandersaares
Member

I agree with this interpretation.

@sandersaares
Member

This issue provided much useful feedback into the formulation of the interoperable timing model. Given the lack of further activity, I close the issue.
