Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Write down how extra data is saved for usage with AVC & HEVC #373

Closed
steffenmanden opened this issue Apr 26, 2020 · 61 comments
Closed

Write down how extra data is saved for usage with AVC & HEVC #373

steffenmanden opened this issue Apr 26, 2020 · 61 comments
Labels
codec mapping format extension elements that may not merge in the main DocType

Comments

@steffenmanden
Copy link

Hi,

The creator of MakeMKV have made a custom solution for saving Dolby Vision data within a mkv file.

https://makemkv.com/forum/viewtopic.php?f=12&t=21937

Is it possible to look into how we can make this a part of the standard, so we going forward can support Dolby Vision in matroska?

The creator says:

"Dolby Vision is not a separate codec, but rather a (HEVC) stream extension - DV data (EL+RPU) are encoded in separate NAL units, according to Dolby spec.
The presence of DV data can be checked by several different methods - most obvious are:
(1) DV descriptor in codecprivate data (there is an "standard" of storing additional elements in codec private for AVC and HEVC streams, that was used for ages since 3D MVC video) or
(2)presence of DV RPU unit in first frame."

Hope this can be looked into and approved so the solution can become a part of the standard instead of a custom solution

@mbunkus
Copy link
Contributor

mbunkus commented Apr 26, 2020

I'm all for including this in the standard exactly the way MakeMKV handles it. Not only for Dolby Vision, but also for 3D MVC stuff. It's a simple & practical solution.

The CodecPrivate extension you're talking about looks like this:

  1. CodecPrivate starts with the normal avcC/hvcC codec configuration boxes as found in the corresponding standards wrt. storing AVC/HEVC in MP4 files
  2. Each extension is modeled after MP4 atoms:
  3. A four-byte, Big Endian length field (length excluding these four bytes, meaning only the length of the following bytes)
  4. A four-byte ID field (e.g. mvcC for 3D MVC video data in 3D Blu-rays with AVC; the IDs are the same as in the xyz-in-MP4 the standards)
  5. The content of that extension

That way of storing the extensions was the consensus of a mailing list thread on the Matroska-devel list from years ago.

mkvmerge doesn't support MVC yet (yeah I know), though I've looked into it over the last couple of weeks. Support will probably come within the next couple of weeks (though not yet in the upcoming release).

@JeromeMartinez
Copy link
Contributor

The creator of MakeMKV have made a custom solution [...] can become a part of the standard

Trying to force a standard by implementing before discussing is not really the way it should be done.

That said, AFAIK Dolby Vision data in (2) is "just" private NAL so Matroska would not really see them in the steam itself, not sure that there is something to put in the standard here.
And if (1) uses the "generic" (and consensus before IETF standardization began) method for adding new "boxes", this is also transparent.

I don't have Blu-ray with Dolby Vision for testing and it seems that MakeMKV does not support standalone TS or MP4 files (I have only that with Dolby Vision), could you provide a sample file? Just 1 frame would be enough.

@steffenmanden
Copy link
Author

The creator of MakeMKV have made a custom solution [...] can become a part of the standard

Trying to force a standard by implementing before discussing is not really the way it should be done.

That said, AFAIK Dolby Vision data in (2) is "just" private NAL so Matroska would not really see them in the steam itself, not sure that there is something to put in the standard here.
And if (1) uses the "generic" (and consensus before IETF standardization began) method for adding new "boxes", this is also transparent.

I don't have Blu-ray with Dolby Vision for testing and it seems that MakeMKV does not support standalone TS or MP4 files (I have only that with Dolby Vision), could you provide a sample file? Just 1 frame would be enough.

I doubt there was any intention to go around the standard - i simply think he was unaware of the IETF efforts to continue the specifications. So it's more a desire to help his community to be able to maintain data for future use.

Will see if i can get someone to throw a sample as soon as possible

@steffenmanden
Copy link
Author

Here is an example

https://drive.google.com/file/d/1HeooI-tOBnTGqFceMILLKf1X3xz0S_ue/view?usp=sharing

@JeromeMartinez
Copy link
Contributor

@steffenmanden I don't succeed to convert your ISO with "MakeMKV", 1.15.1, it complains that the m2ts file has length of 60 seconds which is less than minimum title length of 120 seconds, therefore I can not get the corresponding MKV. Would you mind to share the corresponding MKV?

@Klaus1189
Copy link

MakeMKV options -> second tab "Video" -> write "1" seconds -> then MakeMKV doesn't complain.

@steffenmanden
Copy link
Author

MakeMKV options -> second tab "Video" -> write "1" seconds -> then MakeMKV doesn't complain.

What he said :-) default settings ignores the super short sequences, so you have to overwrite it

@mcr
Copy link
Contributor

mcr commented Apr 26, 2020

Please write an extension document, using the mechanisms in matroska and ebml.

@JeromeMartinez
Copy link
Contributor

I confirm that it is as described by @mbunkus (for Dolby Vision: dvcC and hvcE atoms) so the need is more generic than Dolby Vision, it is to describe the de facto manner to have extensions in AVC and HEVC CodecPrivate parts.
The criticism I would have against such implementation (AVC or HEVC configuration must be parsed before finding the start of the extensions) are not strong enough (especially because arrays don't need to be parsed, so little effort) for willing to have another implementation, so fine for me.

I suggest to change the issue title to a more generic title.

@mcr mcr added format extension elements that may not merge in the main DocType codec mapping labels Apr 28, 2020
@MikeChenMM
Copy link

Hello everyone. Firstly, I was truly unaware of this standardization effort - thanks for keeping me in loop. Also, there was nothing close to "I did it my way, now make it a standard" attitude on my part - the forum post linked at top specifically says that officially MakeMKV does not support DV in MKV yet - the announcement was omitted from release notes on purpose. I rather wanted to create an interim version for "those who know", just to get the wheels moving, and then start a discussion in matroska-devel. Good thing that we have this place now.

Here is my input:

Some history - originally, the need to store AVC stream extension in MKV arose years ago when MVC (3D) streams appeared. At that time there was an ad-hoc process to standardize extensions for CodecPrivate that was later discussed on matroska-devel and became the de-facto standard. Moritz described it above - a generic way to keep multiple MP4 stream description atoms in MKV CodecPrivate data. At that time it was defined only for AVC and only for a single additional atom (mvcC).

With DV is looks a natural way to extend this method for HEVC as well, and make it somewhat formal. Given this is a major change, I took the liberty to use this opportunity to change the storage scheme a bit - the full description is below, mostly a copy-paste from the Moritz post.

CodecPrivate starts with the normal avcC/hvcC codec configuration boxes as found in the corresponding standards wrt. storing AVC/HEVC in MP4 files. As noted earlier, while the size of the first configuration box is not stored explixitly, it is rather trivial to compute - both avcC and hvcC are a fixed-size data structures followed by a single variable-sized part, where the size of the variable part is stored at fixed offset.

The size of entire CodecPrivate data is stored as part of EBML encoding. If the first (avcC/hvcC) structure ends before the end of CodecPrivate, then the rest of the CodecPrivate buffer contains zero or more MP4 atoms, in exactly the same way they are stored in MP4:

A four-byte, Big Endian length field INCLUDING the size field itself and the following fourcc tag and data (this is deviation I took - please read the rationale below).

A four-byte ID field (e.g. mvcC for 3D MVC video data in 3D Blu-rays with AVC; the IDs are the same as in the xyz-in-MP4 the standards)

The content of that extension

One more formal rule - The parser must correctly handle invalid data at the end of CodecPrivate. Specifically:

  • If at any time, the size of the remaining buffer is less than 8 bytes (length + fourcc) the parser must stop and the rest of CodecPrivate data must be discarded.
  • If the value of length field is invalid, specifically less than 8 or bigger than the size of rest of CodecPrivate data buffer , the parser must stop.

Otherwise the parser should assume that the data is a valid mp4 atom.

Now, the rationale behind my proposal to change the meaning of the size field - to include the header in size as opposed to "old" definition where length field specifies the size of the following fourcc tag and data, excluding the field itself.

The current storage method is sort of inconsistent - half of the header size is included in length, and length of the other half is stored implicitly. More importantly, it looks very similar similar but differs from a corresponding mp4 implementation.
Doing it a new way means that we store the MP4 atom in exactly the same way, as it is stored in mp4 - length includes everything. Given the code to create configuration boxes is often shared between containers or even implemented generically at codec level, this would remove a great source of confusion and coding errors.

Why it is safe to do now - the only usage of this extension so far is mvcC box (MVC video from 3D blu-ray) - this is just an avcC box for stereoscopic extension. I've modified that interim MakeMKV release to write length field in a new way and include few zeroes at the end of mvcC atom. This way either interpretation of the length field will be correct - the old code would expect 4 more bytes at the end of mvcC atom, would read these inserted zeroes and ignore them, as mvcC ends with a variable-sized structure whose length is explicitly specified. The new code would just read the atom properly.

I think that keeping the structures layout identical to corresponding mp4 standard is a big deal, and this change is justified.

Thanks!

@mbunkus
Copy link
Contributor

mbunkus commented Apr 30, 2020

Thanks a lot for your input, Mike.

I'm undecided on the length field semantics change. I do see your point about compatibility with MP4 atoms. However, there are two arguments against it:

  1. Compatibility with existing files. If you have a file created with the old understanding, and it is read by a parser with the new understanding, the parser would stop parsing four bytes short of the end of the buffer — because that's what the size field seems to indicate is the end of the extension atom. Parsers would have to do some kind of fancy guesswork, either from the total length of CodecPrivate or from the muxing application header field — neither approach is fine with me.
  2. Compatibility with MP4 would only be partial anyway, as I guess you wouldn't want to support the two special sizes 1 (= directly after the FourCC field there's now a 64-bit size field containing the actual size) and 0 (= the atom extends until the end of the parent element = until the end of CodecPrivate).

So to be honest I'd rather keep the currently used semantics of the size field not including its own length of four bytes.

However, my most important wish is that the same method is used for both H.264 and H.265 content; maybe even as the default methods for future cases where CodecPrivate needs to be extended for other codecs.

@JeromeMartinez
Copy link
Contributor

this is deviation I took

I unfortunately stop to implement the parsing on my side during my tests and didn't catch this deviation.
There is an issue here: this is no more a generic manner to extend the CodecPrivate for H.264/H.265.

Given the code to create configuration boxes is often shared between containers or even implemented generically at codec level, this would remove a great source of confusion and coding errors.

But now we have an even more complex spec, which needs to describe a readout of the ID field, check which "format" (old or new), based on the ID field and by guessing with 0 padding (and if the mvcC box has 0x00000000 at the end, how to differentiate?), and apply the right "rule" (old or new). So in practice it may be worse in both the spec and in the implementation, because "old" behavior is already in the wild for a long time so we can not just connect a MP4 parser.

IMO advantages of the "MP4 style" method are not enough for trying to handle both "old" and "new" methods, and keeping the "old" method for a generic manner for both H.264 and H.264 is more important.

I took the liberty to use this opportunity to change the storage scheme a bit

Issue is that the release is in the wild so files will be with it as users are starting to use it, before someone from Matroska is aware and can say a word. It is good that the issue was raised quickly.

(Moritz answer appeared while I was writing, looks like we have a similar comment :) ).

@MikeChenMM
Copy link

MikeChenMM commented Apr 30, 2020

But now we have an even more complex spec, which needs to describe a readout of the ID field, check which "format" (old or new), based on the ID field and by guessing with 0 padding (and if the mvcC box has 0x00000000 at the end, how to differentiate?), and apply the right "rule" (old or new).

We should not do any of that, this is ugly. If we go the new way, the parser should assume that length fileld includes total size, period. The only thing that gets broken - old MVC files with new parsers. I'm under assumption that no 3D player or utility actually uses these data, so nobody would notice the missing mvcC atom, should it be stripped due to mismatched len.

Issue is that the release is in the wild so files will be with it as users are starting to use it, before someone from Matroska is aware and can say a word. It is good that the issue was raised quickly.

Ok, let me clarify - the change of "length" field semantics happened only in this latest version of MakeMKV (and with zero-padding compatibility hack, so old parsers still would consume files created by 1.5.1 without error). I have no issues at all switching it back, if we decide that this is a proper way to go.
My only concern is that in order to preserve compatibility with non-existing content (nobody parses CodecPrivate for MVC, really - all interested parties do parse NAL units) we create confusion and source for bugs. But, naturally, I'm fine with either way.

@steffenmanden
Copy link
Author

Just wanted to follow up here, as i have a deep desire to get a good solution going for this!

A title change was also desired, what should i change it to ?

@JeromeMartinez
Copy link
Contributor

@steffenmanden you're right to ask for a follow up.

I understand that from people involved in Matroska standardization there is a preference for keeping the "old" method (element size not including the size bytes) as it was a consensus in mailing list thread on the Matroska-devel list from years ago, we could create more confusion by changing from AVC MVC style to a new style for Dolby Vision and/or other fields. This way, a parser could be neutral (not checking the block identifier and/or format for knowing which "size format" to use).

So it would means:

  • We (Matroska standardization) write a doc for being clear on that.
  • @MikeChenMM MakeMKV switches back to "old" method for Dobly Vision elements (and any new other element).
  • Issue title changes to e.g. "Write down how extra data is saved in CodecPrivate of AVC & HEVC".

@steffenmanden steffenmanden changed the title Allow for Dolby Vision Data to be saved within the mkv format Write down how extra data is saved in CodecPrivate of AVC & HEVC May 10, 2020
@MikeChenMM
Copy link

* @MikeChenMM MakeMKV switches back to "old" method for Dobly Vision elements (and any new other element).

Confirming.

@mbunkus
Copy link
Contributor

mbunkus commented May 22, 2020

@rcombs had some valid objections in PR #377 and raised the question if we shouldn't just use a couple of new Matroska elements for extensions.

I'm a bit concerned about this Dolby Vision data being placed in CodecPrivate, and particularly having it placed directly after the MP4-style configuration data with no other boundary. This data will be useful to the high-level application (for detecting Dolby Vision streams, distinguishing BL from EL) in addition to being used by the decoder itself. This means the demuxer will want to expose the contents of the DOVIDecoderConfigurationRecord to the application as stream metadata. By placing it after a variable-length structure with no central length code, this adds a requirement that the demuxer internally parses the AVCDecoderConfigurationRecord/HEVCDecoderConfigurationRecord to determine its length, skip it, then read the extension blocks.

Yes indeed. Parsing the end of avcC or hevcC isn't that difficult as the number of PPS, SPS and VPS structures and their sizes can be easily read. A demuxer would only have to parse that much, not the whole avcC or hevcC structures (no need for full SPS/PPS/VPS parsing).

However, if we're designing a general method of extending CodecPrivate, we cannot rely on this always being easy to do. For other codecs it might be much harder to determine where the regular CodecPrivate content ends and where extensions start.

This also means the Matroska CodecPrivate's contents would differ from that used in ISOBMFF, which means existing tools converting Matroska to MP4 will confusingly produce output with Matroska-style data.

Existing tools wouldn't be able to mux Matroska-with-Dolby-Vision to MP4 no matter how we store the additional data in Matroska. They all have to be adjusted. So this isn't really a good argument in my view.

Be that as it may. It would definitely be more fitting for Matroska to expose each extension as Matroska elements instead of bit-packing it into the existing CodecPrivate. Wouldn't have to be complicated either:

Tracks
  Track
    …
    CodecPrivateExtensions (new master)
      CodecPrivateExtension (new master)
        CodecPrivateExType (new binary[1], mandatory)
        CodecPrivateExContent (new binary, mandatory)

with CodecPrivate not being modified/not containing any extensions.[2]

MP4's relevant extensions atoms would easily be mappable to such a structure (and back).

[1] Yes, the …ExType would have to be a binary, not a string, as MP4 atom types ("boxes" in ISO Base Media File Format jargon) are just 32-bit unsigned integers which often happen to look readable when interpreted as ASCII, but they don't have to be. And Matroska should be more general wrt. extensions than simply copying how MP4 works.
[2] We would have to determine how to handle existing Matroska files with mvcC extensions. Those files do exist. I wouldn't like declaring them invalid. Maybe we can just ignore them; Mike himself said that "nobody parses CodecPrivate for MVC, really - all interested parties do parse NAL units" in one of his comments above.

@steffenmanden
Copy link
Author

steffenmanden commented May 22, 2020

I guess that this would also leave it more open for Mike's "new" way of storing stuff, instead of the "old" way?

@rcombs
Copy link
Contributor

rcombs commented May 22, 2020

A few additional thoughts:

Parsing the end of avcC or hevcC isn't that difficult as the number of PPS, SPS and VPS structures and their sizes can be easily read.

Yeah, it's not too bad, just a little awkward. Nowhere near as bad as, say, AAC, where the structures are much more complex and rarely have easy length codes.

Existing tools wouldn't be able to mux Matroska-with-Dolby-Vision to MP4 no matter how we store the additional data in Matroska.

Right, just, I'd rather have Matroska-with-Dolby-Vision get remuxed to something fairly benign (an MP4 with one or more video streams that might contain Dolby Vision data, but that are otherwise completely standard), rather than something that isn't standard MP4 and might break other things… or worse, become relied upon if some player decides to handle this kind of private data at a higher level instead of within the demuxer, which could end up accidentally creating a requirement for MP4 demuxers as well.

Another couple concerns about the current design:

  • It's possible that some H264 or HEVC decoders might fail when given CodecPrivate data that has additional content past the end of the AVCDecoderConfigurationRecord/HEVCDecoderConfigurationRecord. This would mean that files muxed with this change wouldn't be backwards-compatible with players using those decoders that aren't aware of this mechanism. Not sure if this actually happens in practice, though.
  • For future private extensions for other codecs, it might not actually be possible to append opaque data to the end of the existing CodecPrivate in a reliably decodable way. One can easily imagine a format where the CodecPrivate contains a series of elements with no way to determine whether you're finished parsing other than checking if you're at the end of the buffer. In such a case, there would be no way to determine that you've finished parsing the main CodecPrivate and now need to start parsing the extensions.

@mbunkus
Copy link
Contributor

mbunkus commented May 22, 2020

I guess that this would also leave it more open for Mike's "new" way of storing stuff, instead of the "old" way?

Mike's "new" and "old ways" are about different ways of storing additional data in CodecPrivate. My proposal would not store any additional data in CodecPrivate; instead all additional data is stored in CodecPrivateExtension masters.

Existing MP4 atoms such as dvcC would be mapped 1:1 onto corresponding pairs of CodecPrivateExType (with content 0x64 76 63 43 which is what ASCII dvcC is as a 32-bit unsigned Big Endian number) and CodecPrivateExContent. There would be no need for MP4-atom-like structures inside Matroska with such a proposal, completely negating the need for additional parsing of stuff after the regular CodecPrivate — just regular Matroska element parsing.

@MikeChenMM any comments on my proposal?

@mbunkus
Copy link
Contributor

mbunkus commented May 22, 2020

Right, just, I'd rather have Matroska-with-Dolby-Vision get remuxed to something fairly benign…

Yeah OK, I see your point.

For future private extensions for other codecs, it might not actually be possible to append opaque data to the end of the existing CodecPrivate in a reliably decodable way.

That's true and even worse than what I thought about.

So… I'm all for storing extensions in Matroska structures instead of appending them to CodecPrivate now.

@JeromeMartinez
Copy link
Contributor

The current extension is the result of extending the method used for MVC. So issues like a fail due to CodecPrivate not same as AVCDecoderConfigurationRecord/HEVCDecoderConfigurationRecord already exist. But I guess it was not spread enough for having these issues found everywhere.

I wrote the patch proposal only as a documentation about what is/was used, but I am not specifically for this method. Myself I add to change a bit my demuxer because I was considering CodecPrivate as dedicated to the corresponding decoder and not with extra metadata.

I see an advantage to this past method: a MKV remuxer not aware of the extension will smoothling remux the MVC/Dolby stuff. But maybe this is not the most important.

. One can easily imagine a format where the CodecPrivate contains a series of elements with no way to determine whether you're finished parsing other than checking if you're at the end of the buffer.

This is relevant, we can more or less easily handle AVC and HEVC, but this may be not future proof in case of complex CodecPrivate. So maybe that instead of keeping this direction we could change our mind here before it becomes too much spread and be more generic as in @mbunkus proposal.

I am fine with proposal of @mbunkus new Matroska elements + documenting the potential existence of mvcC existence after AVCDecoderConfigurationRecord as a legacy but deprecated method for transmitting mvcC.

@rcombs
Copy link
Contributor

rcombs commented May 22, 2020

It's been pointed out to me that there's one more flaw with the current method: it assumes that the AVCDecoderConfigurationRecord and HEVCDecoderConfigurationRecord structures are frozen. This is not the case. AVCDecoderConfigurationRecord has had an optional third NAL array added to it (for SPS_EXT NALs), which is currently ignored by libavcodec's parser and decoder. This means that AVCDecoderConfigurationRecord is now the exact kind of variable-length buffer-size-delimited CodecPrivate structure I was concerned about (which probably means my concern for current decoders potentially failing was unfounded). There's no guarantee that these structs won't have additional fields added on the end again in the future. I'm not sure there's any sensible way to handle both the SPS_EXT case and the mvcC case.

@JeromeMartinez
Copy link
Contributor

I'm not sure there's any sensible way to handle both the SPS_EXT case and the mvcC case.

numOfSequenceParameterSetExt and so on are mandatory depending on profile_idc, so we can compute AVCDecoderConfigurationRecord size. But true that it makes the parsing more complex and less transparent than we initially thought.

@robUx4
Copy link
Contributor

robUx4 commented May 22, 2020

I also think not adding data to the current CodecPrivate is better. Dolby Vision, as I understand it, is additional metadata to better handle the HDR of a picture (or group of pictures). A reader that doesn't know about Dolby Vision (my TV) should still be able to handle the HEVC+HDR in the file (my TV).

I don't think we need new elements though. We already have the BlockAdditions system where codec can add extra data in the track entry and in the Block with the codec data.

Ideally the Dolby Vision NAL would be stripped from the rest of the NALs and put in a BlockAdditional with a type for Dolby Vision. This way DV could be easily stripped at the Matroska level by some tools.

But maybe splitting NALs isn't a good idea or makes it more complex to reconstruct it for playback so it could (also?) be inside the regular Block as it probably is in MP4. But it would still signal a possible BlockAdditionMapping in the TrackEntry with the DOVIDecoderConfigurationRecord stored in the BlockAddIDExtraData the EBML way.

That means we can signal the Dolby Vision info in the Track and optionally strip the Dolby Vision NALs is the BlockAdditional is really used. That should also keep backward compatibility. Although BlockAddIDExtraData is relatively new (just like Dolby Vision).

@mbunkus
Copy link
Contributor

mbunkus commented May 22, 2020

We're talking about data that must be available in the track headers, Steve. BlockAdditions are part of the frames, not the track headers. Requiring a demuxer to read further into the file in order to find BlockGroups with BlockAdditions sounds like a superfluous complication to me.

Why would we divert radically from how HEVC is stored in MP4 all of a sudden? I don't see a real advantage here.

@mbunkus
Copy link
Contributor

mbunkus commented May 22, 2020

OK, new proposal based on the existing elements I hadn't been aware of. It's incomplete, especially wrt. to the ISO references.

AVC/h.264

New way

  • CodecPrivate only contains the avcC structure, nothing more.
  • One or more BlockAdditionalMapping elements can be used to store additional data required by the codec for decoding.
  • For the mvcC structure one BlockAdditionalMapping element must be used with BlockAddIDValue = 1 and BlockAddIDType = 1 [1]. BlockAddIDExtraData contains the mvcC structure as defined in ISO TODO.

Existing, deprecated way

A deprecated way of storing the mvcC structure is what MakeMKV has done for several years now, storing it in CodecPrivate right after the avcC structure with a size field, type field (value: 0x6d 76 63 43) and the content of the mvcC structure.

Demuxers MAY support this way of storing data.

Multiplexers SHOULD NOT create such files anymore. Multiplexers SHOULD convert such structures into the "new way" when re-writing such files.

HEVC/h.265

New way

  • CodecPrivate only contains the hevcC structure, nothing more.
  • One or more BlockAdditionalMapping elements can be used to store additional data required by the codec for decoding.
  • For the hvcE structure one BlockAdditionalMapping element must be used with BlockAddIDValue = 1 and BlockAddIDType = 2 [1]. BlockAddIDExtraData contains the hvcE (?) structure as defined in ISO TODO.
  • For the dvcC structure one BlockAdditionalMapping element must be used with BlockAddIDValue = 2 and BlockAddIDType = 2 [1]. BlockAddIDExtraData contains the dvcC structure as defined in ISO TODO.

Fotonotes

[1] I don't really understand the difference of BlockAddIDValue and BlockAddIDType, especially given that they're both mandatory. I assume BlockAddIDType can express something like this value (e.g. 1) is supposed to be a header structure vs this value is supposed to be a NALU? I've chosen 1 to mean "this is AVC header data" and 2 "this is HEVC header data". I haven't found an existing list of already-reserved values.

@JeromeMartinez
Copy link
Contributor

I understand BlockAddIDValue as decided by the muxer, not in spec (it is the value used in the BlockAdditions) and in that case it wouldn't be present in the file as there is no BlockAdditions.

@mbunkus
Copy link
Contributor

mbunkus commented May 22, 2020

Ah, so I would have to say…

  • BlockAddIDType == 1 is mvcC, only valid for CodecPrivate == AVC
  • BlockAddIDType == 2 is hvcE, only valid for CodecPrivate == HEVC
  • BlockAddIDType == 3 is dvcC, only valid for CodecPrivate == HEVC

That being said, BlockAddIDValue must still be present given that it is a mandatory element. For the case of only having that data in the track headers but not the block groups, its content is really actually irrelevant… or at least different from all values used for BlockAdditions in the block groups.

Man, that's somewhat cryptic to describe (and to understand).

Or we could simply change the specs, make BlockAddIDValue not be mandatory. I think I'd prefer this way.

@mbunkus
Copy link
Contributor

mbunkus commented May 22, 2020

We'll have to change the specs anyway in order to make it clear that BlockAdditionMapping might also simply provide codec extradata that's only present in the track headers. At the moment all the verbiage is tailored for describing content in the block groups.

@robUx4
Copy link
Contributor

robUx4 commented May 22, 2020

Ah damn, I wasn't aware of BlockAdditionMapping in TrackEntry. It's only present in the XML file at the moment, not on the website (my fault). Still not a fan of BlockAddIDType being an unsigned integer; we'd have to maintain a mapping of those numbers to what they mean in other standards. But those elements would suffice, yeah.

Technically the interpretation is still tied to the codec (originally the complement data in Musepack to get the lossless version). Each codec should tell how to interpret one of these ID (you can't just say MP3 has a complement data and hope decoders would know how to use it). So IMO that's not an issue.

@MikeChenMM
Copy link

* `BlockAddIDType == 1` is `mvcC`, only valid for `CodecPrivate` == AVC
* `BlockAddIDType == 2` is `hvcE`, only valid for `CodecPrivate` == HEVC
* `BlockAddIDType == 3` is `dvcC`, only valid for `CodecPrivate` == HEVC

Why can't we fix BlockAddIDType == 1 as "MP4 atom with fixed size"? this way we don't have to invent yet another ID for mvcC hvcE etc and will get 1-to-1 mapping and forward/back compatibility for free?
Just say that any BlockAdd id with type==1 is an MP4 atom (related to codec header). That's it. Easy for parser too - just enumerate all entries and add all with type==1 to the list of mp4 atoms. Then parse as usual...

@MikeChenMM
Copy link

EDIT: Oh, I've read the topic more thoroughly. I think we should not create yet another ID for the content type and should reuse MP4 IDs and layout. I'm all in favor for an additional "codecprivateextension" element - be it a new EBML id or a reuse of BlockAdditionMapping in that sort of hacky way - doesn't matter. But we should certainly keep the MP4 atom fourcc ID as is. The type=1 id=anything with fourcc and size in content is a good method of achieving this goal.

@mbunkus
Copy link
Contributor

mbunkus commented May 22, 2020

I don't like to use the whole atom structure as the size field itself is not only completely redundant, it would also pose the potential for more buffer overrun errors. And what happens when the size fields don't agree (meaning if the size from the embedded MP4 atom doesn't match the size of the Matroska element)? We'd have to spec something for that, parsers would have to handle that.

I would be OK with BlockAddIDType == 1 meaning 32-bit ID Big Endian field as found in MP4 files directly followed by the corresponding content, though. Like I wrote above I'd prefer re-using existing IDs from other standards (MP4 in this case), too.

@MikeChenMM
Copy link

I would be OK with BlockAddIDType == 1 meaning 32-bit ID Big Endian field as found in MP4 files directly followed by the corresponding content

Yes, you are right about the size. This would be the best way IMHO.

@steffenmanden
Copy link
Author

So is this the way to go ?

If so, i hope some of you would have time to do a write-up, as i simply don't have the insight into the container specs to do it properly :)

mbunkus added a commit that referenced this issue May 25, 2020
This prepares the element specs for the storage of elements such as
the mvcC extension for AVC video or the dvcC and hvcE extensions for
HEVC video. It does so by

* changing the verbiage to make BlockAdditionMapping not only
  referring to BlockAdditions in tracks,
* making BlockAddIDValue non-mandatory which would indicate that the
  BlockAdditionMapping is actually an extension to CodecPrivate and
* includes a first registered value for BlockAddIDType which signals
  that BlockAddIDExtraData contains a type & content like similar
  header atoms in ISO Base Media File Format files.

See #373 for the discussion how to store things. Will affect PR #377
which will have to be adjusted to make use of these elements.
@mbunkus
Copy link
Contributor

mbunkus commented May 25, 2020

PR created for the element specs. No, I'm not really happy with the wording yet; reviews & helpful suggestions are more than welcome.

If its accepted, Jérôme can adjust his PR #377 to make use of the elements.

@JeromeMartinez
Copy link
Contributor

@mbunkus I was actually preparing a similar PR, so I have taken some of your wording and I sent #389 for review as an alternative.

@JeromeMartinez
Copy link
Contributor

OK, new proposal based on the existing elements I hadn't been aware of.

@mbunkus I have written a PR based on your proposal, but actually I don't agree with mixing the 4-byte identifier with the content, and IMO it leads to more confusion especially if we try to avoid potential misunderstandings with WebM.

Technically the interpretation is still tied to the codec (originally the complement data in Musepack to get the lossless version).

@robUx4 being tied to the codec is not always the case, we should consider to have additions not tied to the codec. As we are clarifying the usage of BlockAdditionMapping, I would like to have a way to say if it is tied or not to the Codec.

My counter-proposal is:

  • BlockAddIDType contains the 4-byte ISOBMFF-like identifier. Actually we don't have to have a dedicated BlockAddIDType for saying that the format ISOBMFF-like, we just say that 4-byte numbers are planned for matching ISOBMFF-like identifiers + we also list values 1 and 4 with mapping as done in WebM (we also say that BlockAddIDValue value is mandatory to be same as `BlockAddIDType in that case, for compatibility with WebM which will not use these elements)
  • BlockAddIDExtraData contains only the content (no 4-byte identifier field)
  • Optionally a new element BlockAddIDTranscode (default 0) saying what to do if the track is transcoded (0 unknown, 1 discard, 2 keep; "discard" would be used for e.g. Musepack lossless addition, mvcC, hvcE and "keep" would be used for RAWcooked data)

Proposal is on #390.

@mbunkus
Copy link
Contributor

mbunkus commented May 26, 2020

Thanks! I like both of your PRs and see advantages in either method.

Pros for #389:

  • clearer separation of concerns (the "IDType" describes the namespace the actual IDs come from whereas the actual IDs are stored somewhere else) and therefore less danger of overlapping of values from different domains
  • not limited to max. 64-bit IDs

Pros for #390:

  • easier parsing
  • no conflated data in single element

I'm fine with either, I guess, though I slightly prefer #390.

Optionally a new element BlockAddIDTranscode (default 0) saying what to do if the track is transcoded (0 unknown, 1 discard, 2 keep; "discard" would be used for e.g. Musepack lossless addition, mvcC, hvcE and "keep" would be used for RAWcooked data)

I don't like such an element. Why should a muxer dictate how a remuxer should work? How should a muxer know in advance if a user wants to keep the lossless extensions upon remuxing or not? I don't see a use case for/the advantage of such an option. Let's not make Matroska more complex than it already is.

I'll add more comments about specific wordings in the respective PRs.

@JeromeMartinez
Copy link
Contributor

I'm fine with either, I guess, though I slightly prefer #390.

Me too, so if others (@robUx4?) are fine we could focus on this one only.

Why should a muxer dictate how a remuxer should work?

This is an hint about what to do during a transcoding without having the (incomplete) list of formats used, some additional data is useless if there is a transcoding (for example Musepack lossless addition is useless if the format of the main data is not Musepack) and some additional data is still useful because it is not related to the format of the main data (for example captions, any other ancillary data from MXF, RAWcooked content), and a transcoder could decide to discard/keep additional data based on this hint.

Let's not make Matroska more complex than it already is.

It is an hint only, very optional, and IMO more useful than MaxBlockAdditionID (I personally don't see how useful it is as I use 64-bit integers for them in any case :-p).
But this is optional and we can discard this debate for this issue and corresponding PRs (especially if #390 is kept) and I would open a dedicated issue.

@mbunkus
Copy link
Contributor

mbunkus commented May 26, 2020

Ah, I see, thanks for the explanation. Maybe a separate PR solely for that topic would indeed be better.

@steffenmanden steffenmanden changed the title Write down how extra data is saved in CodecPrivate of AVC & HEVC Write down how extra data is saved for usage with AVC & HEVC May 26, 2020
@retokromer
Copy link
Contributor

I'm fine with either, I guess, though I slightly prefer #390.

Me too, so if others (@robUx4?) are fine we could focus on this one only.

I agree, it’s easier to parse.

@steffenmanden
Copy link
Author

seems people are getting to agree that #390 is the way to go!

hope we can look into finalizing it - a lot of people waiting for it if you agree on this path :-)

@doctorsirius
Copy link

There is a lot of expectation around the revision spec from several important projects like; Plex, Emby, Kodi, etc, they absolutely trust on the matroska standard as the most important part to start working on give support to the new files.

@mbunkus
Copy link
Contributor

mbunkus commented Jun 1, 2020

I'm not sure what you're trying to say there. It comes across as being impatient and making sure to let us know that IMPORTANT PEOPLE are waiting for us, the slowpokes, and that we should finally get going and what are we waiting for. Please abstain from such posts, they're really not helpful; instead they're making me a bit resentful (just talking about myself here).

We're pretty much all volunteers here and work on best effort basis. A couple days more or less until specs are finalized won't make much difference if any to all of the projects you've listed.

We're happy to create specs for this particular thing. We like to have current content be storable in Matroska. We do work in that direction. That work includes peer review, and that simply takes time. It pretty much also means that the end result will be better and more consistent than had we rushed the job.

Patience, please.

@rcombs
Copy link
Contributor

rcombs commented Jun 1, 2020

…I'm in charge of media-core engineering at Plex. You won't see any pressure to rush from us.

@steffenmanden
Copy link
Author

Just take your time and do it properly, then we will all be happy :) I only push a reminder from time to time to continue the chat ;) - however i know it takes time, so we will get there when the time is right!

@doctorsirius
Copy link

@mbunkus You are getting me absolutely wrong, and I don't know where I'm saying you are slowpokes. I will not let you know which important people is waiting simply because it was just an expression, and both we know is not like that, the expectation is not on the devs but in a small part of the people following the DV thing on those projects.

But if you are feeling a bit resentful I can understand it and, in fact, assume my unintentional responsability of making you feel that way. I know you are volunteers and I also perfectly understand the spirit behind that volunteering (I myself collaborated in some important open source project in the past).

Having said that I hope you accept my apologies, I guess I need to calm down a bit with the whole thing of DV, you are specially right on the fact no one of those comments actually helps with the issue, even this one is just noise, I will take care that it doesn't happen again.

PD: The lockdown didn't help with my "anxiety" of software updates either

@mbunkus
Copy link
Contributor

mbunkus commented Jun 1, 2020

@doctorsirius Thanks for saying all that. I'm fine. I tend to be thin-skinned in general, and I've had people feel entitled too often in the past, both of which makes me interpret comments more negatively than they were intended. So… sorry from my side, too.

@JeromeMartinez
Copy link
Contributor

I merged the corresponding PR, so BlockAdditionMapping is the way to go, keeping the extra mvcC data in CodecPrivate only for legacy reasons.

@MikeChenMM please keep us posted of the implementation in your software, I would like to test my own software on the files created by it in order to be sure we are in sync.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
codec mapping format extension elements that may not merge in the main DocType
Projects
None yet
Development

Successfully merging a pull request may close this issue.