[Cocoa] "Pop" of bad audio heard at the start of certain YouTube videos
https://bugs.webkit.org/show_bug.cgi?id=255212
rdar://106976225

Reviewed by Eric Carlson.

Tracking addition of a test via https://bugs.webkit.org/show_bug.cgi?id=255227.

Two interrelated problems cause discontinuities in the audio output at the
start of certain Opus-encoded WebM files.

1) A bug in the ffmpeg muxer causes the initial block in a cluster to be 1ms
too long, which causes an audible discontinuity to be generated from
AVSampleBufferAudioRenderer.

2) Some Opus-encoded WebM files include a CodecDelay value, which requires
players to decode, but not render, the initial audio frames in a stream.

For 2), map the CodecDelay value to a kCMSampleBufferAttachmentKey_TrimDurationAtStart
attachment in the resulting CMSampleBuffer. This causes the output duration of the
sample to be reduced by the trim duration, and the output presentation time to be
increased by the trim duration, so also shift the input presentation time by the same
amount. This aligns the first audible frame with the start time of the track.
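The CodecDelay mapping above can be sketched as follows. This is a hypothetical, simplified model (not the WebKit `MediaTime`/`CMSampleBuffer` API): times are plain nanosecond integers, and the names `SampleTiming` and `applyCodecDelay` are invented for illustration. The renderer is assumed to add the trim duration back to the output presentation time, so shifting the input PTS by the codec delay makes the first audible frame land at the track start:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical sketch: times in nanoseconds.
// A WebM CodecDelay of N ns becomes a trim at the start of the stream, and
// every input presentation timestamp is shifted back by N ns. Because the
// renderer pushes the output PTS forward by the trim duration, the first
// audible frame ends up aligned with the track's start time.
struct SampleTiming {
    int64_t presentationTimeNs; // shifted input PTS
    int64_t trimDurationNs;     // portion to decode but not render
};

SampleTiming applyCodecDelay(int64_t rawPtsNs, int64_t packetDurationNs,
                             int64_t& remainingTrimNs, int64_t codecDelayNs)
{
    SampleTiming timing { rawPtsNs - codecDelayNs, 0 };
    if (remainingTrimNs > 0) {
        // Trim at most one packet's worth per sample; carry the rest forward.
        timing.trimDurationNs = remainingTrimNs < packetDurationNs
            ? remainingTrimNs : packetDurationNs;
        remainingTrimNs -= timing.trimDurationNs;
    }
    return timing;
}
```

With a 6.5ms codec delay and 20ms packets, the first packet is trimmed by the full delay and its input PTS becomes -6.5ms, so its audible portion starts at 0.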

For 1), if a discontinuity is encountered and it is shorter than 15ms, simply
advance the presentation time of the subsequent sample by the discontinuity
duration. Track these discontinuities cumulatively, so that if multiple
discontinuities totaling more than 15ms are encountered, a real audible
discontinuity is generated and the track is brought back into sync with the
master timeline.
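The gap-collapsing logic can be sketched like so. This is a simplified illustration, not the WebKit implementation: times are millisecond integers and `collapseSmallGap` is an invented name. Because each gap is measured against the previously adjusted end time, repeated small gaps accumulate until one finally exceeds the budget and is left in place:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical sketch: times in milliseconds.
// Small gaps between consecutive audio samples -- such as the 1ms overrun the
// ffmpeg muxer leaves after a cluster's first block -- are absorbed by pulling
// the next sample's presentation time back. Once accumulated shifts would push
// audio more than 15ms ahead of video, the gap is kept, producing a real
// discontinuity that resynchronizes the track.
constexpr int64_t kMaxAllowableDiscontinuityMs = 15;

// Returns the adjusted presentation time for the next sample and updates
// lastEndMs to the sample's new end time.
int64_t collapseSmallGap(int64_t ptsMs, int64_t durationMs, int64_t& lastEndMs)
{
    int64_t gap = ptsMs - lastEndMs;
    if (gap > 0 && gap < kMaxAllowableDiscontinuityMs)
        ptsMs -= gap; // absorb the small gap
    lastEndMs = ptsMs + durationMs;
    return ptsMs;
}
```

For example, a sample arriving 1ms late is pulled back flush against its predecessor, while a 20ms gap is left untouched.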

* Source/WebCore/platform/MediaSample.h:
* Source/WebCore/platform/graphics/cocoa/CMUtilities.mm:
(WebCore::toCMSampleBuffer):
* Source/WebCore/platform/graphics/cocoa/SourceBufferParserWebM.cpp:
(WebCore::WebMParser::VideoTrackData::consumeFrameData):
(WebCore::WebMParser::AudioTrackData::AudioTrackData):
(WebCore::WebMParser::AudioTrackData::consumeFrameData):
* Source/WebCore/platform/graphics/cocoa/SourceBufferParserWebM.h:
(WebCore::WebMParser::AudioTrackData::AudioTrackData): Deleted.

Canonical link: https://commits.webkit.org/262837@main
jernoble committed Apr 11, 2023
1 parent 89a5ffd commit 7f1bcb5
Showing 9 changed files with 105 additions and 22 deletions.
@@ -21,7 +21,7 @@ EXPECTED (sourceBuffer.buffered.end(0) == '7.401') OK
RUN(sourceBuffer.appendBuffer(loader.mediaSegment(0).slice(0, loader.mediaSegmentSize(0)/2)))
EVENT(update)
EXPECTED (sourceBuffer.buffered.length == '2') OK
EXPECTED (sourceBuffer.buffered.end(0) == '3.961') OK
EXPECTED (sourceBuffer.buffered.end(0) == '3.96') OK
EXPECTED (sourceBuffer.buffered.end(1) == '7.401') OK
Clean sourcebuffer of all content.
RUN(sourceBuffer.remove(0, 100))
@@ -41,7 +41,7 @@ EVENT(update)
RUN(sourceBuffer.appendBuffer(loader.mediaSegment(0).slice(0, loader.mediaSegmentSize(0)/2)))
EVENT(update)
EXPECTED (sourceBuffer.buffered.length == '2') OK
EXPECTED (sourceBuffer.buffered.end(0) == '3.961') OK
EXPECTED (sourceBuffer.buffered.end(0) == '3.96') OK
EXPECTED (sourceBuffer.buffered.end(1) == '7.401') OK
Clean sourcebuffer of all content.
RUN(sourceBuffer.abort())
@@ -56,6 +56,6 @@ EVENT(update)
RUN(sourceBuffer.appendBuffer(loader.mediaSegment(0).slice(0, loader.mediaSegmentSize(0)/2)))
EVENT(update)
EXPECTED (sourceBuffer.buffered.length == '1') OK
EXPECTED (sourceBuffer.buffered.end(0) == '3.961') OK
EXPECTED (sourceBuffer.buffered.end(0) == '3.96') OK
END OF TEST

13 changes: 10 additions & 3 deletions LayoutTests/media/media-source/media-webm-opus-partial-abort.html
@@ -23,6 +23,13 @@

window.addEventListener('load', async event => {
try {
// NOTE: A bug in ffmpeg causes a 1ms mismatch between the Opus
// packet duration of the first WebM block, and the start time of
// the subsequent block. Until all ports agree on whether to use
// the packet duration or the block timestamps, this tolerance
// value allows this to continue passing normally.
const OpusBufferedTolerance = 0.001;

findMediaElement();
loader = new MediaSourceLoader('content/test-opus-manifest.json');
await loaderPromise(loader);
@@ -58,7 +65,7 @@
run('sourceBuffer.appendBuffer(loader.mediaSegment(0).slice(0, loader.mediaSegmentSize(0)/2))');
await waitFor(sourceBuffer, 'update');
testExpected('sourceBuffer.buffered.length', '2');
testExpected('sourceBuffer.buffered.end(0)', '3.961');
testExpectedEqualWithTolerance('sourceBuffer.buffered.end(0)', '3.96', OpusBufferedTolerance);
testExpected('sourceBuffer.buffered.end(1)', '7.401');

consoleWrite('Clean sourcebuffer of all content.');
@@ -80,7 +87,7 @@
run('sourceBuffer.appendBuffer(loader.mediaSegment(0).slice(0, loader.mediaSegmentSize(0)/2))');
await waitFor(sourceBuffer, 'update');
testExpected('sourceBuffer.buffered.length', '2');
testExpected('sourceBuffer.buffered.end(0)', '3.961');
testExpectedEqualWithTolerance('sourceBuffer.buffered.end(0)', '3.96', OpusBufferedTolerance);
testExpected('sourceBuffer.buffered.end(1)', '7.401');

consoleWrite('Clean sourcebuffer of all content.');
@@ -97,7 +104,7 @@
run('sourceBuffer.appendBuffer(loader.mediaSegment(0).slice(0, loader.mediaSegmentSize(0)/2))');
await waitFor(sourceBuffer, 'update');
testExpected('sourceBuffer.buffered.length', '1');
testExpected('sourceBuffer.buffered.end(0)', '3.961');
testExpectedEqualWithTolerance('sourceBuffer.buffered.end(0)', '3.96', OpusBufferedTolerance);

endTest();
} catch (e) {
@@ -11,25 +11,25 @@ Append a partial media segment.
RUN(sourceBuffer.appendBuffer(partial1))
EVENT(update)
EXPECTED (sourceBuffer.buffered.length == '1') OK
EXPECTED (sourceBuffer.buffered.end(0) == '3.961') OK
EXPECTED (sourceBuffer.buffered.end(0) == '3.96') OK
Complete the partial media segment.
RUN(sourceBuffer.appendBuffer(partial2))
EVENT(update)
EXPECTED (sourceBuffer.buffered.length == '1') OK
EXPECTED (sourceBuffer.buffered.end(0) == '4.981') OK
EXPECTED (sourceBuffer.buffered.end(0) == '4.98') OK
Clean sourcebuffer of all content.
RUN(sourceBuffer.remove(0, 100))
EVENT(update)
Append the two media segment in reverse order.
RUN(sourceBuffer.appendBuffer(loader.mediaSegment(1)))
EVENT(update)
EXPECTED (sourceBuffer.buffered.length == '1') OK
EXPECTED (sourceBuffer.buffered.start(0) == '4.981') OK
EXPECTED (sourceBuffer.buffered.end(0) == '9.981') OK
EXPECTED (sourceBuffer.buffered.start(0) == '4.98') OK
EXPECTED (sourceBuffer.buffered.end(0) == '9.98') OK
RUN(sourceBuffer.appendBuffer(loader.mediaSegment(0)))
EVENT(update)
EXPECTED (sourceBuffer.buffered.length == '1') OK
EXPECTED (sourceBuffer.buffered.start(0) == '0') OK
EXPECTED (sourceBuffer.buffered.end(0) == '9.981') OK
EXPECTED (sourceBuffer.buffered.end(0) == '9.98') OK
END OF TEST

17 changes: 12 additions & 5 deletions LayoutTests/media/media-source/media-webm-opus-partial.html
@@ -18,6 +18,13 @@

window.addEventListener('load', async event => {
try {
// NOTE: A bug in ffmpeg causes a 1ms mismatch between the Opus
// packet duration of the first WebM block, and the start time of
// the subsequent block. Until all ports agree on whether to use
// the packet duration or the block timestamps, this tolerance
// value allows this to continue passing normally.
const OpusBufferedTolerance = 0.001;

findMediaElement();
loader = new MediaSourceLoader('content/test-opus-manifest.json');
await loaderPromise(loader);
@@ -39,14 +46,14 @@
run('sourceBuffer.appendBuffer(partial1)');
await waitFor(sourceBuffer, 'update');
testExpected('sourceBuffer.buffered.length', '1');
testExpected('sourceBuffer.buffered.end(0)', '3.961', '==');
testExpectedEqualWithTolerance('sourceBuffer.buffered.end(0)', '3.96', OpusBufferedTolerance);

consoleWrite('Complete the partial media segment.')
run('sourceBuffer.appendBuffer(partial2)');
await waitFor(sourceBuffer, 'update');

testExpected('sourceBuffer.buffered.length', '1');
testExpected('sourceBuffer.buffered.end(0)', '4.981', '==');
testExpectedEqualWithTolerance('sourceBuffer.buffered.end(0)', '4.98', OpusBufferedTolerance);

consoleWrite('Clean sourcebuffer of all content.');
run('sourceBuffer.remove(0, 100)');
@@ -56,14 +63,14 @@
run('sourceBuffer.appendBuffer(loader.mediaSegment(1))');
await waitFor(sourceBuffer, 'update');
testExpected('sourceBuffer.buffered.length', '1');
testExpected('sourceBuffer.buffered.start(0)', '4.981', '==');
testExpected('sourceBuffer.buffered.end(0)', '9.981', '==');
testExpectedEqualWithTolerance('sourceBuffer.buffered.start(0)', '4.98', OpusBufferedTolerance);
testExpectedEqualWithTolerance('sourceBuffer.buffered.end(0)', '9.98', OpusBufferedTolerance);

run('sourceBuffer.appendBuffer(loader.mediaSegment(0))');
await waitFor(sourceBuffer, 'update');
testExpected('sourceBuffer.buffered.length', '1');
testExpected('sourceBuffer.buffered.start(0)', '0', '==');
testExpected('sourceBuffer.buffered.end(0)', '9.981', '==');
testExpectedEqualWithTolerance('sourceBuffer.buffered.end(0)', '9.98', OpusBufferedTolerance);

endTest();
} catch (e) {
11 changes: 11 additions & 0 deletions LayoutTests/media/video-test.js
@@ -95,6 +95,17 @@ function testExpected(testFuncString, expected, comparison)
}
}

function testExpectedEqualWithTolerance(testFuncString, expected, tolerance)
{
try {
let observed = eval(testFuncString);
let success = Math.abs(observed - expected) <= tolerance;
reportExpected(success, testFuncString, '==', expected, observed);
} catch (ex) {
consoleWrite(ex);
}
}

function sleepFor(duration) {
return new Promise(resolve => {
setTimeout(resolve, duration);
1 change: 1 addition & 0 deletions Source/WebCore/platform/MediaSample.h
@@ -217,6 +217,7 @@ class MediaSamplesBlock {
MediaTime presentationTime;
MediaTime decodeTime;
MediaTime duration;
MediaTime trimDuration;
MediaSampleDataType data;
MediaSample::SampleFlags flags;
};
7 changes: 7 additions & 0 deletions Source/WebCore/platform/graphics/cocoa/CMUtilities.mm
@@ -230,6 +230,7 @@ static CFStringRef convertToCMYCbCRMatrix(PlatformVideoMatrixCoefficients coeffi
packetTimings.reserveInitialCapacity(samples.size());
Vector<size_t> packetSizes;
packetSizes.reserveInitialCapacity(samples.size());
auto cumulativeTrimDuration = MediaTime::zeroTime();
for (auto& sample : samples) {
if (!std::holds_alternative<Ref<const FragmentedSharedBuffer>>(sample.data))
return makeUnexpected("Invalid MediaSamplesBlock type");
@@ -247,6 +248,7 @@ static CFStringRef convertToCMYCbCRMatrix(PlatformVideoMatrixCoefficients coeffi
}
packetTimings.append({ PAL::toCMTime(sample.duration), PAL::toCMTime(sample.presentationTime), PAL::toCMTime(sample.decodeTime) });
packetSizes.append(data->size());
cumulativeTrimDuration += sample.trimDuration;
}

CMSampleBufferRef rawSampleBuffer = nullptr;
@@ -267,6 +269,11 @@ static CFStringRef convertToCMYCbCRMatrix(PlatformVideoMatrixCoefficients coeffi
} else if (samples.isAudio() && samples.discontinuity())
PAL::CMSetAttachment(rawSampleBuffer, PAL::kCMSampleBufferAttachmentKey_FillDiscontinuitiesWithSilence, *samples.discontinuity() ? kCFBooleanTrue : kCFBooleanFalse, kCMAttachmentMode_ShouldPropagate);

if (cumulativeTrimDuration > MediaTime::zeroTime()) {
auto trimDurationDict = adoptCF(PAL::softLink_CoreMedia_CMTimeCopyAsDictionary(PAL::toCMTime(cumulativeTrimDuration), kCFAllocatorDefault));
PAL::CMSetAttachment(rawSampleBuffer, PAL::get_CoreMedia_kCMSampleBufferAttachmentKey_TrimDurationAtStart(), trimDurationDict.get(), kCMAttachmentMode_ShouldPropagate);
}

return adoptCF(rawSampleBuffer);
}

54 changes: 52 additions & 2 deletions Source/WebCore/platform/graphics/cocoa/SourceBufferParserWebM.cpp
@@ -1105,7 +1105,7 @@ webm::Status WebMParser::VideoTrackData::consumeFrameData(webm::Reader& reader,

processPendingMediaSamples(presentationTime);

m_pendingMediaSamples.append({ presentationTime, presentationTime, MediaTime::indefiniteTime(), WTFMove(m_completeFrameData), isKey ? MediaSample::SampleFlags::IsSync : MediaSample::SampleFlags::None });
m_pendingMediaSamples.append({ presentationTime, presentationTime, MediaTime::indefiniteTime(), MediaTime::zeroTime(), WTFMove(m_completeFrameData), isKey ? MediaSample::SampleFlags::IsSync : MediaSample::SampleFlags::None });

ASSERT(!*bytesRemaining);
return webm::Status(webm::Status::kOkCompleted);
@@ -1176,6 +1176,22 @@ void WebMParser::VideoTrackData::flushPendingSamples()
m_lastPresentationTime.reset();
}

WebMParser::AudioTrackData::AudioTrackData(CodecType codecType, const webm::TrackEntry& trackEntry, WebMParser& parser)
: TrackData { codecType, trackEntry, TrackInfo::TrackType::Audio, parser }
{
// https://wiki.xiph.org/MatroskaOpus#Element_Additions
// CodecDelay is a new unsigned integer element added to the TrackEntry element.
// The value is the number of nanoseconds that must be discarded, for that stream, from the
// start of that stream. The value is also the number of nanoseconds that all encoded
// timestamps for that stream must be shifted to get the presentation timestamp.
// (This will fix Vorbis encoding as well.)
if (trackEntry.codec_delay.is_present()) {
constexpr uint32_t k_ns_in_seconds = 1000000000;
m_remainingTrimDuration = MediaTime(trackEntry.codec_delay.value(), k_ns_in_seconds);
m_presentationTimeShift = -m_remainingTrimDuration;
}
}

void WebMParser::AudioTrackData::resetCompletedFramesState()
{
mNumFramesInCompleteBlock = 0;
@@ -1250,7 +1266,41 @@ webm::Status WebMParser::AudioTrackData::consumeFrameData(webm::Reader& reader,
else if (formatDescription() && *formatDescription() != *m_processedMediaSamples.info())
drainPendingSamples();

m_processedMediaSamples.append({ presentationTime, MediaTime::invalidTime(), m_packetDuration, WTFMove(m_completeFrameData), MediaSample::SampleFlags::IsSync });
auto trimDuration = MediaTime::zeroTime();
MediaTime localPresentationTime = presentationTime;
if (m_remainingTrimDuration.isFinite() && m_remainingTrimDuration > MediaTime::zeroTime()) {
if (m_remainingTrimDuration < m_packetDuration)
std::swap(trimDuration, m_remainingTrimDuration);
else {
m_remainingTrimDuration -= m_packetDuration;
trimDuration = m_packetDuration;
}
}

ASSERT(m_presentationTimeShift.isFinite());
if (m_presentationTimeShift != MediaTime::zeroTime())
localPresentationTime += m_presentationTimeShift;

if (m_lastPresentationEndTime.isValid()) {
MediaTime discontinuityGap = localPresentationTime - m_lastPresentationEndTime;
// ATSC IS-191: Relative Timing of Sound and Vision for Broadcast Operations
// "The sound program should never lead the video program by more than
// 15 milliseconds, and should never lag the video program by more than
// 45 milliseconds."
// By collapsing gaps between samples, we effectively advance the timing
// of the subsequent samples, which may result in the sound content leading
// the video content. With a reasonable assumption that the media file starts
// with A/V content in perfect sync, we can advance samples up to 15 ms.
// Any gaps larger than that amount must have a discontinuity in order to bring
// the audio and video presentations into sync.
constexpr MediaTime maximumAllowableDiscontinuity = MediaTime(15, 1000);
if (discontinuityGap > MediaTime::zeroTime() && discontinuityGap < maximumAllowableDiscontinuity)
localPresentationTime -= discontinuityGap;
}

m_lastPresentationEndTime = localPresentationTime + m_packetDuration;

m_processedMediaSamples.append({ localPresentationTime, MediaTime::invalidTime(), m_packetDuration, trimDuration, WTFMove(m_completeFrameData), MediaSample::SampleFlags::IsSync });

drainPendingSamples();

Expand Down
@@ -233,10 +233,7 @@ class WebMParser
return makeUniqueRef<AudioTrackData>(codecType, trackEntry, parser);
}

AudioTrackData(CodecType codecType, const webm::TrackEntry& trackEntry, WebMParser& parser)
: TrackData { codecType, trackEntry, TrackInfo::TrackType::Audio, parser }
{
}
AudioTrackData(CodecType, const webm::TrackEntry&, WebMParser&);

private:
webm::Status consumeFrameData(webm::Reader&, const webm::FrameMetadata&, uint64_t*, const MediaTime&) final;
@@ -247,6 +244,9 @@ class WebMParser
uint8_t m_framesPerPacket { 0 };
Seconds m_frameDuration { 0_s };
size_t mNumFramesInCompleteBlock { 0 };
MediaTime m_lastPresentationEndTime { MediaTime::invalidTime() };
MediaTime m_remainingTrimDuration;
MediaTime m_presentationTimeShift;
};

private:
