-
Notifications
You must be signed in to change notification settings - Fork 1.7k
[GStreamer] MediaPlayerPrivateGStreamer: Abort stale tasks on flushes #3802
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[GStreamer] MediaPlayerPrivateGStreamer: Abort stale tasks on flushes #3802
Conversation
EWS run on previous version of this PR (hash dbb1766) |
Is this retry label supposed to trigger something already? |
dbb1766
to
4348a5a
Compare
I hoped so, but didn't seem so. I just re-uploaded a version with indentation fixes. |
EWS run on previous version of this PR (hash 4348a5a) |
Interesting, compositing/video/video-border-radius-clipping.html failed both times. It may not be a fluke then... Will investigate. |
Okay, this has been a ride... First, I have learned that video output doesn't work in our test runner.
That would be its own issue to solve. Still, somehow, the tester managed to catch an actual issue, even if kind of by accident: With the new code, at the end of playback the frame disappears, instead of remaining green as it should, and as it did with the old code. That is a undoubtedly a regression. I pinpointed the problem to the video sink probe, confirming leaving the old code doesn't expose the bug. Then I analyzed both by hand. The only difference should be that the new one also calls AbortableTaskQueue methods. But removing such calls didn't return the old behavior, so I must have been missing something in my analysis. I added logging to both versions, processed logs, and eventually found this: With the new code:
With the old code:
Shockers! The old code doesn't do anything in reaction to flushes. Which is a bit strange since there is clearly code to call This is the
TIL When a flush event is send to a probe, Doing a bit of code archeology, I found that code to come from 05df690#diff-f6cc7b9fa85e4704ddb4a5f5e1f3300b29ee850f68259ddae19867015833836bR261-R263 Therefore the "flushCurrentBuffer() on FLUSH_START" code has been unreachable since https://commits.webkit.org/245491@main (22 Dec 2021). That change was committed because a previous commit (18 Oct 2021) to add processing of tag events had the unintended consequence of calling I hope we can agree that the stubborness of using early returns in the probe algorithm has turned it into spaghetti code, only worsening every time anyone wanted to add a feature. It took me quite a while to understand what the algorithm was supposed to do, both when making this PR, later when analyzing this bug, and in the end my analysis was still wrong! When I added logs, there was a total of 6 exit paths! So, where does this
The commit message stated:
And for this purpose it added a new function, TextureMapperPlatformLayerProxy::dropCurrentBufferWhilePreservingTexture(). I have no experience with TextureMapperPlatformLayerProxy or much of the graphics stack, and I don't understand how that code that just adds asynchronous operations would ensure that the GstMemory is released by the time a new buffer comes to the decoder, if it does at all. Also, it's clear as of now, and probably for a long time, it doesn't succeed in preventing flickering, as calling flushCurrentBuffer() has the visible side effect of consistently making the flushed frame invisible to the user. An older version of said code existed earlier before in the WPEWebKit fork, being originally committed in 10 Jun 2016. That version explicitly introduced flicker, its commit message reading:
This is the same flicker the later upstream commit would supposedly avoid with dropCurrentBufferWhilePreservingTexture(). Still, this is only flush events. Why are we doing anything on DRAIN queries? That comes from this WPEWebKit commit from 23 Jan 2015 Back then it was a different function:
Still, neither commit provides any explanation on why we need to call flushCurrentBuffer() on FLUSH_START events or DRAIN queries. There is currently code with a vague explanation, which I added as part of 4fd8d9a, but that comes from asking @eocanha.
So now I'm left wondering:
|
Some explanations about why the The Raspberry Pi OMX decoder (among others) decodes video onto buffers using a special (and very scarce) GPU memory. It must be GPU memory because that's the only way to achieve zero-copy on those platforms. That's because that GPU memory can (in some way that I don't exactly know) be converted/used as a GL texture for the GPU (hence zero-copy). Some context about a drain operation in GStreamer: https://gstreamer.freedesktop.org/documentation/additional/design/sparsestreams.html?gi-language=c#still-framemenu-support |
4348a5a
to
630eb70
Compare
EWS run on previous version of this PR (hash 630eb70)
|
630eb70
to
539d948
Compare
https://bugs.webkit.org/show_bug.cgi?id=244534 Reviewed by Philippe Normand. This patch is a combination of a follow-up of WebKit#3746, and a fix for a race condition I independently discovered before said PR was published. Minor cleanups for pr/3746 are done, additional logging is added, MediaPlayerPrivateGStreamer::m_abortableTaskQueue is renamed to MediaPlayerPrivateGStreamer::m_sinkTaskQueue, comments are added, and most importantly: the task queue is flushed when a GStreamer flush is received. This fixes a race condition on MSE flushes. To compensate the additional logic added to the WebKit video sink probe, this patch refactors the function, getting rid of the early return spaghetti it had slowly become over the years. In the process, I found the code for calling flushCurrentBuffer() on FLUSH_START in the previous code was actually unreachable and had been so for the most part of a year, as an accident of an early return in the old code (FLUSH_START events have the GST_PAD_PROBE_TYPE_EVENT_DOWNSTREAM flag. The old code did an early return when that flag was found for an event other than a tag event). Figuring out all bugs with flushCurrentBuffer() and what usages it should have is very much outside of the scope of this patch. Instead, this patch makes no change on behavior and adds a FIXME section explaining the situation and how it needs further work. * Source/WebCore/platform/graphics/gstreamer/GStreamerVideoSinkCommon.cpp: (webKitVideoSinkSetMediaPlayerPrivate): * Source/WebCore/platform/graphics/gstreamer/MediaPlayerPrivateGStreamer.cpp: (WebCore::MediaPlayerPrivateGStreamer::~MediaPlayerPrivateGStreamer): (WebCore::MediaPlayerPrivateGStreamer::updateVideoOrientation): * Source/WebCore/platform/graphics/gstreamer/MediaPlayerPrivateGStreamer.h: (WebCore::MediaPlayerPrivateGStreamer::sinkTaskQueue): Canonical link: https://commits.webkit.org/254142@main
539d948
to
e55d190
Compare
Committed 254142@main (e55d190): https://commits.webkit.org/254142@main Reviewed commits have been landed. Closing PR #3802 and removing active labels. |
https://bugs.webkit.org/show_bug.cgi?id=257035 Reviewed by Xabier Rodriguez-Calvar. Before this patch, there were two classes having a m_hasAllTracks field: MediaSourcePrivateGStreamer and MediaPlayerPrivateGStreamerMSE. This redundancy lead to non-synchronization, and in consequence this was making flushes not occur on MSE, as SourceBufferPrivateGStreamer::flush() checked for the one in MediaPlayerPrivateGStreamerMSE which no code ever set to true. A visible consequence of this was video was being repeated on quality changes, due to the lack of flush of old frames. This patch fixes the issue by keeping the `hasAllTracks` state in one single place, in MediaSourcePrivateGStreamer. Note: The triggering of MSE flushes exposed several bugs on the handling of flushes in other parts of the code. It's important these are addressed to avoid regressions in behavior when incorporating this patch: (1) WebKit#3802 [GStreamer] MediaPlayerPrivateGStreamer: Abort stale tasks on flushes Without this patch, MSE flushes can deadlock. (2) https://gitlab.freedesktop.org/gstreamer/gstreamer/-/merge_requests/2413 appsink: Fix race condition on caps handling appsink used to have a race condition on the handling of caps after a flush that can cause crashes. A fix is present in GStreamer 1.20.3+. A workaround in the WebKit side is possible and desirable to avoid headaches in Linux distros, and has been uploaded as: WebKit#3965 [GStreamer] Introduce workaround for race condition in appsink (3) https://gitlab.freedesktop.org/gstreamer/gstreamer/-/merge_requests/3471 basesink: Support position queries after non-resetting flushes basesink doesn't answer position queries between a FLUSH_STOP with reset_time=FALSE and a SEGMENT event until GStreamer 1.24.0, which incorporates that patch. For backwards-compatibility -- and present-compatibility for that matter since GStreamer 1.24.0 has not been released yet, a workaround has been Proposed for WebKit: WebKit#14082 [MSE][GStreamer] Workaround basesink's lack of support for position queries between a non-resetting flush and a pre-roll * Source/WebCore/platform/graphics/gstreamer/mse/MediaPlayerPrivateGStreamerMSE.cpp: (WebCore::MediaPlayerPrivateGStreamerMSE::sourceSetup): * Source/WebCore/platform/graphics/gstreamer/mse/MediaPlayerPrivateGStreamerMSE.h: (WebCore::MediaPlayerPrivateGStreamerMSE::hasAllTracks const): Deleted. * Source/WebCore/platform/graphics/gstreamer/mse/MediaSourcePrivateGStreamer.cpp: (WebCore::MediaSourcePrivateGStreamer::addSourceBuffer): * Source/WebCore/platform/graphics/gstreamer/mse/MediaSourcePrivateGStreamer.h: * Source/WebCore/platform/graphics/gstreamer/mse/SourceBufferPrivateGStreamer.cpp: (WebCore::SourceBufferPrivateGStreamer::flush): Canonical link: https://commits.webkit.org/264510@main
https://bugs.webkit.org/show_bug.cgi?id=257035 Reviewed by Xabier Rodriguez-Calvar. Before this patch, there were two classes having a m_hasAllTracks field: MediaSourcePrivateGStreamer and MediaPlayerPrivateGStreamerMSE. This redundancy lead to non-synchronization, and in consequence this was making flushes not occur on MSE, as SourceBufferPrivateGStreamer::flush() checked for the one in MediaPlayerPrivateGStreamerMSE which no code ever set to true. A visible consequence of this was video was being repeated on quality changes, due to the lack of flush of old frames. This patch fixes the issue by keeping the `hasAllTracks` state in one single place, in MediaSourcePrivateGStreamer. Note: The triggering of MSE flushes exposed several bugs on the handling of flushes in other parts of the code. It's important these are addressed to avoid regressions in behavior when incorporating this patch: (1) WebKit#3802 [GStreamer] MediaPlayerPrivateGStreamer: Abort stale tasks on flushes Without this patch, MSE flushes can deadlock. (2) https://gitlab.freedesktop.org/gstreamer/gstreamer/-/merge_requests/2413 appsink: Fix race condition on caps handling appsink used to have a race condition on the handling of caps after a flush that can cause crashes. A fix is present in GStreamer 1.20.3+. A workaround in the WebKit side is possible and desirable to avoid headaches in Linux distros, and has been uploaded as: WebKit#3965 [GStreamer] Introduce workaround for race condition in appsink (3) https://gitlab.freedesktop.org/gstreamer/gstreamer/-/merge_requests/3471 basesink: Support position queries after non-resetting flushes basesink doesn't answer position queries between a FLUSH_STOP with reset_time=FALSE and a SEGMENT event until GStreamer 1.24.0, which incorporates that patch. For backwards-compatibility -- and present-compatibility for that matter since GStreamer 1.24.0 has not been released yet, a workaround has been Proposed for WebKit: WebKit#14082 [MSE][GStreamer] Workaround basesink's lack of support for position queries between a non-resetting flush and a pre-roll * Source/WebCore/platform/graphics/gstreamer/mse/MediaPlayerPrivateGStreamerMSE.cpp: (WebCore::MediaPlayerPrivateGStreamerMSE::sourceSetup): * Source/WebCore/platform/graphics/gstreamer/mse/MediaPlayerPrivateGStreamerMSE.h: (WebCore::MediaPlayerPrivateGStreamerMSE::hasAllTracks const): Deleted. * Source/WebCore/platform/graphics/gstreamer/mse/MediaSourcePrivateGStreamer.cpp: (WebCore::MediaSourcePrivateGStreamer::addSourceBuffer): * Source/WebCore/platform/graphics/gstreamer/mse/MediaSourcePrivateGStreamer.h: * Source/WebCore/platform/graphics/gstreamer/mse/SourceBufferPrivateGStreamer.cpp: (WebCore::SourceBufferPrivateGStreamer::flush): Canonical link: https://commits.webkit.org/264510@main
https://bugs.webkit.org/show_bug.cgi?id=257035 Reviewed by Xabier Rodriguez-Calvar. Before this patch, there were two classes having a m_hasAllTracks field: MediaSourcePrivateGStreamer and MediaPlayerPrivateGStreamerMSE. This redundancy lead to non-synchronization, and in consequence this was making flushes not occur on MSE, as SourceBufferPrivateGStreamer::flush() checked for the one in MediaPlayerPrivateGStreamerMSE which no code ever set to true. A visible consequence of this was video was being repeated on quality changes, due to the lack of flush of old frames. This patch fixes the issue by keeping the `hasAllTracks` state in one single place, in MediaSourcePrivateGStreamer. Note: The triggering of MSE flushes exposed several bugs on the handling of flushes in other parts of the code. It's important these are addressed to avoid regressions in behavior when incorporating this patch: (1) WebKit/WebKit#3802 [GStreamer] MediaPlayerPrivateGStreamer: Abort stale tasks on flushes Without this patch, MSE flushes can deadlock. (2) https://gitlab.freedesktop.org/gstreamer/gstreamer/-/merge_requests/2413 appsink: Fix race condition on caps handling appsink used to have a race condition on the handling of caps after a flush that can cause crashes. A fix is present in GStreamer 1.20.3+. A workaround in the WebKit side is possible and desirable to avoid headaches in Linux distros, and has been uploaded as: WebKit/WebKit#3965 [GStreamer] Introduce workaround for race condition in appsink (3) https://gitlab.freedesktop.org/gstreamer/gstreamer/-/merge_requests/3471 basesink: Support position queries after non-resetting flushes basesink doesn't answer position queries between a FLUSH_STOP with reset_time=FALSE and a SEGMENT event until GStreamer 1.24.0, which incorporates that patch. For backwards-compatibility -- and present-compatibility for that matter since GStreamer 1.24.0 has not been released yet, a workaround has been Proposed for WebKit: WebKit/WebKit#14082 [MSE][GStreamer] Workaround basesink's lack of support for position queries between a non-resetting flush and a pre-roll * Source/WebCore/platform/graphics/gstreamer/mse/MediaPlayerPrivateGStreamerMSE.cpp: (WebCore::MediaPlayerPrivateGStreamerMSE::sourceSetup): * Source/WebCore/platform/graphics/gstreamer/mse/MediaPlayerPrivateGStreamerMSE.h: (WebCore::MediaPlayerPrivateGStreamerMSE::hasAllTracks const): Deleted. * Source/WebCore/platform/graphics/gstreamer/mse/MediaSourcePrivateGStreamer.cpp: (WebCore::MediaSourcePrivateGStreamer::addSourceBuffer): * Source/WebCore/platform/graphics/gstreamer/mse/MediaSourcePrivateGStreamer.h: * Source/WebCore/platform/graphics/gstreamer/mse/SourceBufferPrivateGStreamer.cpp: (WebCore::SourceBufferPrivateGStreamer::flush): Canonical link: https://commits.webkit.org/264510@main
https://bugs.webkit.org/show_bug.cgi?id=257035 Reviewed by Xabier Rodriguez-Calvar. Before this patch, there were two classes having a m_hasAllTracks field: MediaSourcePrivateGStreamer and MediaPlayerPrivateGStreamerMSE. This redundancy lead to non-synchronization, and in consequence this was making flushes not occur on MSE, as SourceBufferPrivateGStreamer::flush() checked for the one in MediaPlayerPrivateGStreamerMSE which no code ever set to true. A visible consequence of this was video was being repeated on quality changes, due to the lack of flush of old frames. This patch fixes the issue by keeping the `hasAllTracks` state in one single place, in MediaSourcePrivateGStreamer. Note: The triggering of MSE flushes exposed several bugs on the handling of flushes in other parts of the code. It's important these are addressed to avoid regressions in behavior when incorporating this patch: (1) WebKit/WebKit#3802 [GStreamer] MediaPlayerPrivateGStreamer: Abort stale tasks on flushes Without this patch, MSE flushes can deadlock. (2) https://gitlab.freedesktop.org/gstreamer/gstreamer/-/merge_requests/2413 appsink: Fix race condition on caps handling appsink used to have a race condition on the handling of caps after a flush that can cause crashes. A fix is present in GStreamer 1.20.3+. A workaround in the WebKit side is possible and desirable to avoid headaches in Linux distros, and has been uploaded as: WebKit/WebKit#3965 [GStreamer] Introduce workaround for race condition in appsink (3) https://gitlab.freedesktop.org/gstreamer/gstreamer/-/merge_requests/3471 basesink: Support position queries after non-resetting flushes basesink doesn't answer position queries between a FLUSH_STOP with reset_time=FALSE and a SEGMENT event until GStreamer 1.24.0, which incorporates that patch. For backwards-compatibility -- and present-compatibility for that matter since GStreamer 1.24.0 has not been released yet, a workaround has been Proposed for WebKit: WebKit/WebKit#14082 [MSE][GStreamer] Workaround basesink's lack of support for position queries between a non-resetting flush and a pre-roll * Source/WebCore/platform/graphics/gstreamer/mse/MediaPlayerPrivateGStreamerMSE.cpp: (WebCore::MediaPlayerPrivateGStreamerMSE::sourceSetup): * Source/WebCore/platform/graphics/gstreamer/mse/MediaPlayerPrivateGStreamerMSE.h: (WebCore::MediaPlayerPrivateGStreamerMSE::hasAllTracks const): Deleted. * Source/WebCore/platform/graphics/gstreamer/mse/MediaSourcePrivateGStreamer.cpp: (WebCore::MediaSourcePrivateGStreamer::addSourceBuffer): * Source/WebCore/platform/graphics/gstreamer/mse/MediaSourcePrivateGStreamer.h: * Source/WebCore/platform/graphics/gstreamer/mse/SourceBufferPrivateGStreamer.cpp: (WebCore::SourceBufferPrivateGStreamer::flush): Canonical link: https://commits.webkit.org/264510@main
e55d190
539d948
🧪 api-gtk