Safari can't play video completely at bilibili.com
https://bugs.webkit.org/show_bug.cgi?id=236440
rdar://88761053

Reviewed by Jer Noble.

Source/WebCore:

Video frames were incorrectly evicted during a call to appendBuffer
because the SourceBuffer wrongly assumed a discontinuity was present.

When appending data to a source buffer, the MSE spec describes a method
to detect discontinuities in the Coded Frame Processing algorithm
(https://www.w3.org/TR/media-source/#sourcebuffer-coded-frame-processing)
step 6:
"
- If last decode timestamp for track buffer is set and decode timestamp
  is less than last decode timestamp:
OR
- If last decode timestamp for track buffer is set and the difference
  between decode timestamp and last decode timestamp is greater than
  2 times last frame duration.
"
The issue is what defines the last frame duration: is it the frame
last seen in the coded frame processing loop, or the frame whose
presentation timestamp immediately precedes that of the frame currently
being processed?

H.264 and HEVC have a concept of B-frames: frames that depend on a
future frame in order to be decoded.
Such frames are found in the container and can be identified by a
presentation timestamp higher than that of the frame that follows them
in decode order.
They present a challenge because the frame prior to the current one in
presentation order may actually be found several frames back in decode
order.
Bug 181891 attempted to fix a similar issue and used the longest
"decode duration" as a workaround to detect discontinuities in the
content.
It mentioned adopting the same technique as Mozilla's MSE
implementation, but Mozilla also skips discontinuity detection within a
media segment (https://www.w3.org/TR/media-source/#media-segment, which
for fMP4 is a single moof box), an approach that can't be achieved with
CoreMedia's AVStreamDataParser.
As mentioned in bug 181891, CoreMedia ignores the decode timestamps'
deltas and adjusts the samples' durations so that there is no
discontinuity in the demuxed samples' presentation times, causing false
positives in the gap detection algorithm.

Bilibili uses HEVC content with an encoding that generates lots of
B-frames and a very wide sliding window (up to 12 frames observed).
By using the longest frame duration found in either presentation or
decode duration as the threshold to identify a discontinuity, we can
properly parse the content and avoid incorrectly evicting appended
frames.
(As a side note, the use of HEVC with B-frames is peculiar, as not all
hardware supports it.)
It is difficult to tell whether the issue lies in bilibili's content or
in CoreMedia's output, though the responsibility more than likely lies
with bilibili.

Test: media/media-source/media-mp4-hevc-bframes.html

* platform/graphics/SourceBufferPrivate.cpp:
(WebCore::SourceBufferPrivate::TrackBuffer::TrackBuffer):
(WebCore::SourceBufferPrivate::resetTrackBuffers):
(WebCore::SourceBufferPrivate::didReceiveSample):
* platform/graphics/SourceBufferPrivate.h:

LayoutTests:

* media/media-source/content/test-bframes-hevc-manifest.json: Added.
* media/media-source/content/test-bframes-hevc.mp4: Added.
* media/media-source/media-mp4-hevc-bframes-expected.txt: Added.
* media/media-source/media-mp4-hevc-bframes.html: Added.



Canonical link: https://commits.webkit.org/248840@main
git-svn-id: https://svn.webkit.org/repository/webkit/trunk@291813 268f45cc-cd09-0410-ab3c-d52691b4dbfc
jyavenard committed Mar 24, 2022
1 parent de4834c commit 52ef261a093d0516667f17055c9e7c552cf338a5
Showing 9 changed files with 163 additions and 15 deletions.
@@ -1,3 +1,16 @@
2022-03-24 Jean-Yves Avenard <jya@apple.com>

Safari can't play video completely at bilibili.com
https://bugs.webkit.org/show_bug.cgi?id=236440
rdar://88761053

Reviewed by Jer Noble.

* media/media-source/content/test-bframes-hevc-manifest.json: Added.
* media/media-source/content/test-bframes-hevc.mp4: Added.
* media/media-source/media-mp4-hevc-bframes-expected.txt: Added.
* media/media-source/media-mp4-hevc-bframes.html: Added.

2022-03-24 Matteo Flores <matteo_flores@apple.com>

Unreviewed, reverting r291789.
@@ -0,0 +1,9 @@
{
    "url": "content/test-bframes-hevc.mp4",
    "type": "video/mp4; codecs=\"hev1.1.6.L93.B0\"",
    "init": { "offset": 0, "size": 1065 },
    "duration": 236.0,
    "media": [
        { "offset": 1065, "size": 514474, "timestamp": 230.0, "duration": 6.0 }
    ]
}
Binary file not shown.
@@ -0,0 +1,13 @@

RUN(video.src = URL.createObjectURL(source))
EVENT(sourceopen)
RUN(sourceBuffer = source.addSourceBuffer(loader.type()))
RUN(sourceBuffer.appendBuffer(loader.initSegment()))
EVENT(update)
Append media segment.
RUN(sourceBuffer.appendBuffer(loader.mediaSegment(0)))
EVENT(update)
EXPECTED (sourceBuffer.buffered.length == '1') OK
EXPECTED (sourceBuffer.buffered.end(0) == '236') OK
END OF TEST

@@ -0,0 +1,49 @@
<!DOCTYPE html>
<html>
<head>
    <title>media-mp4-hevc-bframes</title>
    <script src="../../media/media-source/media-source-loader.js"></script>
    <script src="../../media/video-test.js"></script>
    <script>
    var loader;
    var source;
    var sourceBuffer;

    function loaderPromise(loader) {
        return new Promise((resolve, reject) => {
            loader.onload = resolve;
            loader.onerror = reject;
        });
    }

    window.addEventListener('load', async event => {
        try {
            findMediaElement();
            loader = new MediaSourceLoader('content/test-bframes-hevc-manifest.json');
            await loaderPromise(loader);

            source = new MediaSource();
            run('video.src = URL.createObjectURL(source)');
            await waitFor(source, 'sourceopen');
            waitFor(video, 'error').then(failTest);

            run('sourceBuffer = source.addSourceBuffer(loader.type())');
            run('sourceBuffer.appendBuffer(loader.initSegment())');
            await waitFor(sourceBuffer, 'update');

            consoleWrite('Append media segment.')
            run('sourceBuffer.appendBuffer(loader.mediaSegment(0))');
            await waitFor(sourceBuffer, 'update');
            testExpected('sourceBuffer.buffered.length', '1');
            testExpected('sourceBuffer.buffered.end(0)', '236', '==');
            endTest();
        } catch (e) {
            failTest(`Caught exception: "${e}"`);
        }
    });
    </script>
</head>
<body>
    <video controls></video>
</body>
</html>
@@ -815,6 +815,7 @@ fast/mediastream/track-ended-while-muted.html [ Failure ]
 webkit.org/b/223508 imported/w3c/web-platform-tests/mediacapture-streams/MediaStream-MediaElement-srcObject.https.html [ Failure Pass ]
 
 webkit.org/b/218317 media/media-source/media-source-trackid-change.html [ Failure ]
+webkit.org/b/238201 media/media-source/media-mp4-hevc-bframes.html [ Failure ]
 
 webkit.org/b/211995 fast/images/animated-image-mp4.html [ Failure Timeout ]
 webkit.org/b/211995 fast/images/animated-image-mp4-crash.html [ Timeout Pass ]
@@ -1,3 +1,69 @@
2022-03-24 Jean-Yves Avenard <jya@apple.com>

Safari can't play video completely at bilibili.com
https://bugs.webkit.org/show_bug.cgi?id=236440
rdar://88761053

Reviewed by Jer Noble.

Video frames were incorrectly evicted during a call to appendBuffer
because the SourceBuffer wrongly assumed a discontinuity was present.

When appending data to a source buffer, the MSE spec describes a method
to detect discontinuities in the Coded Frame Processing algorithm
(https://www.w3.org/TR/media-source/#sourcebuffer-coded-frame-processing)
step 6:
"
- If last decode timestamp for track buffer is set and decode timestamp
is less than last decode timestamp:
OR
- If last decode timestamp for track buffer is set and the difference
between decode timestamp and last decode timestamp is greater than
2 times last frame duration.
"
The issue is what defines the last frame duration: is it the frame
last seen in the coded frame processing loop, or the frame whose
presentation timestamp immediately precedes that of the frame currently
being processed?

H.264 and HEVC have a concept of B-frames: frames that depend on a
future frame in order to be decoded.
Such frames are found in the container and can be identified by a
presentation timestamp higher than that of the frame that follows them
in decode order.
They present a challenge because the frame prior to the current one in
presentation order may actually be found several frames back in decode
order.
Bug 181891 attempted to fix a similar issue and used the longest
"decode duration" as a workaround to detect discontinuities in the
content.
It mentioned adopting the same technique as Mozilla's MSE
implementation, but Mozilla also skips discontinuity detection within a
media segment (https://www.w3.org/TR/media-source/#media-segment, which
for fMP4 is a single moof box), an approach that can't be achieved with
CoreMedia's AVStreamDataParser.
As mentioned in bug 181891, CoreMedia ignores the decode timestamps'
deltas and adjusts the samples' durations so that there is no
discontinuity in the demuxed samples' presentation times, causing false
positives in the gap detection algorithm.

Bilibili uses HEVC content with an encoding that generates lots of
B-frames and a very wide sliding window (up to 12 frames observed).
By using the longest frame duration found in either presentation or
decode duration as the threshold to identify a discontinuity, we can
properly parse the content and avoid incorrectly evicting appended
frames.
(As a side note, the use of HEVC with B-frames is peculiar, as not all
hardware supports it.)
It is difficult to tell whether the issue lies in bilibili's content or
in CoreMedia's output, though the responsibility more than likely lies
with bilibili.

Test: media/media-source/media-mp4-hevc-bframes.html

* platform/graphics/SourceBufferPrivate.cpp:
(WebCore::SourceBufferPrivate::TrackBuffer::TrackBuffer):
(WebCore::SourceBufferPrivate::resetTrackBuffers):
(WebCore::SourceBufferPrivate::didReceiveSample):
* platform/graphics/SourceBufferPrivate.h:

2022-03-24 Alan Bujtas <zalan@apple.com>

[LFC][IFC] Remove slow codepath matching arithmetics in FontCascade::widthForSimpleText
@@ -67,7 +67,7 @@ static const MediaTime discontinuityTolerance = MediaTime(1, 1);
 
 SourceBufferPrivate::TrackBuffer::TrackBuffer()
     : lastDecodeTimestamp(MediaTime::invalidTime())
-    , greatestDecodeDuration(MediaTime::invalidTime())
+    , greatestFrameDuration(MediaTime::invalidTime())
     , lastFrameDuration(MediaTime::invalidTime())
     , highestPresentationTimestamp(MediaTime::invalidTime())
     , highestEnqueuedPresentationTime(MediaTime::invalidTime())
@@ -101,7 +101,7 @@ void SourceBufferPrivate::resetTrackBuffers()
 {
     for (auto& trackBufferPair : m_trackBufferMap.values()) {
         trackBufferPair.get().lastDecodeTimestamp = MediaTime::invalidTime();
-        trackBufferPair.get().greatestDecodeDuration = MediaTime::invalidTime();
+        trackBufferPair.get().greatestFrameDuration = MediaTime::invalidTime();
         trackBufferPair.get().lastFrameDuration = MediaTime::invalidTime();
         trackBufferPair.get().highestPresentationTimestamp = MediaTime::invalidTime();
         trackBufferPair.get().needRandomAccessFlag = true;
@@ -959,14 +959,8 @@ void SourceBufferPrivate::didReceiveSample(Ref<MediaSample>&& originalSample)
         // OR
         // ↳ If last decode timestamp for track buffer is set and the difference between decode timestamp and
         // last decode timestamp is greater than 2 times last frame duration:
-        MediaTime decodeDurationToCheck = trackBuffer.greatestDecodeDuration;
-
-        if (decodeDurationToCheck.isValid() && trackBuffer.lastFrameDuration.isValid()
-            && (trackBuffer.lastFrameDuration > decodeDurationToCheck))
-            decodeDurationToCheck = trackBuffer.lastFrameDuration;
-
         if (trackBuffer.lastDecodeTimestamp.isValid() && (decodeTimestamp < trackBuffer.lastDecodeTimestamp
-            || (decodeDurationToCheck.isValid() && abs(decodeTimestamp - trackBuffer.lastDecodeTimestamp) > (decodeDurationToCheck * 2)))) {
+            || (trackBuffer.greatestFrameDuration.isValid() && decodeTimestamp - trackBuffer.lastDecodeTimestamp > (trackBuffer.greatestFrameDuration * 2)))) {
 
             // 1.6.1:
             if (m_appendMode == SourceBufferAppendMode::Segments) {
@@ -983,7 +977,7 @@ void SourceBufferPrivate::didReceiveSample(Ref<MediaSample>&& originalSample)
                 // 1.6.2 Unset the last decode timestamp on all track buffers.
                 trackBuffer.get().lastDecodeTimestamp = MediaTime::invalidTime();
                 // 1.6.3 Unset the last frame duration on all track buffers.
-                trackBuffer.get().greatestDecodeDuration = MediaTime::invalidTime();
+                trackBuffer.get().greatestFrameDuration = MediaTime::invalidTime();
                 trackBuffer.get().lastFrameDuration = MediaTime::invalidTime();
                 // 1.6.4 Unset the highest presentation timestamp on all track buffers.
                 trackBuffer.get().highestPresentationTimestamp = MediaTime::invalidTime();
@@ -1261,12 +1255,15 @@ void SourceBufferPrivate::didReceiveSample(Ref<MediaSample>&& originalSample)
             trackBuffer.needsMinimumUpcomingPresentationTimeUpdating = true;
         }
 
-        // NOTE: the spec considers "Coded Frame Duration" to be the presentation duration, but this is not necessarily equal
-        // to the decoded duration. When comparing deltas between decode timestamps, the decode duration, not the presentation.
+        // NOTE: the spec considers the need to check the last frame duration but doesn't specify if that last frame
+        // is the one prior in presentation or decode order.
+        // So instead, as a workaround we use the largest frame duration seen in the current coded frame group (as defined in https://www.w3.org/TR/media-source/#coded-frame-group).
         if (trackBuffer.lastDecodeTimestamp.isValid()) {
             MediaTime lastDecodeDuration = decodeTimestamp - trackBuffer.lastDecodeTimestamp;
-            if (!trackBuffer.greatestDecodeDuration.isValid() || lastDecodeDuration > trackBuffer.greatestDecodeDuration)
-                trackBuffer.greatestDecodeDuration = lastDecodeDuration;
+            if (!trackBuffer.greatestFrameDuration.isValid())
+                trackBuffer.greatestFrameDuration = std::max(lastDecodeDuration, frameDuration);
+            else
+                trackBuffer.greatestFrameDuration = std::max({ trackBuffer.greatestFrameDuration, frameDuration, lastDecodeDuration });
         }
 
         // 1.17 Set last decode timestamp for track buffer to decode timestamp.
@@ -117,7 +117,7 @@ class SourceBufferPrivate
     struct TrackBuffer {
         WTF_MAKE_STRUCT_FAST_ALLOCATED;
         MediaTime lastDecodeTimestamp;
-        MediaTime greatestDecodeDuration;
+        MediaTime greatestFrameDuration;
         MediaTime lastFrameDuration;
         MediaTime highestPresentationTimestamp;
         MediaTime highestEnqueuedPresentationTime;
