
Fix the PTS offset logic error when first reading a file with FFmpegReader #565

Merged
jonoomph merged 2 commits into develop from fix-pts-offset-logic on Sep 12, 2020

Conversation

jonoomph
Member

Use the calculated 0 - PTS offset, unless it is too large (more than 1 second off from zero). This is related to a previous PR (with much discussion): #311. This PR fixes the regression related to limiting the offset, and I think it makes more sense than the previous suggestions.

The PTS offset was always designed as a quick way to "zero" out the video and audio PTS values, so we could calculate frame numbers based on the starting PTS values, etc. But in my testing, some videos are encoded with negative timestamps. This is especially common with audio tracks, where the audio starts before the first video frame. OpenShot is not really designed to deal with this, and it causes us to search for matching packets that never come. That can introduce some amount of audio drift, depending on how negative the audio timestamps are. Also, in my testing I've come across videos with garbage timestamps (usually negative values), for example -9999999999999999, while the rest of the packets have normal PTS values. So, this is the background on why I want to limit the offset to a reasonable range (-1 to +1 seconds in timebase units).
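To make the intent concrete, here is a rough sketch of the limiting rule described above. This is not the actual FFmpegReader code; the function and variable names are illustrative, the timebase is passed in as a plain numerator/denominator pair, and the exact handling of an out-of-range offset (clamping, as shown here) is an assumption for the example.

	// Illustrative sketch only (not the real FFmpegReader implementation).
	#include <algorithm>
	#include <cstdint>

	int64_t CalcPtsOffset(int64_t first_pts, int timebase_num, int timebase_den) {
		// Ideal offset: shift the stream so its first packet lands on PTS 0.
		int64_t offset = 0 - first_pts;

		// One second expressed in timebase units (e.g. 90000 ticks for a 1/90000 timebase).
		int64_t one_second = timebase_den / timebase_num;

		// Garbage or wildly negative first timestamps (e.g. -9999999999999999)
		// would produce a huge offset, so keep it within -1..+1 seconds of zero.
		offset = std::max(offset, -one_second);
		offset = std::min(offset, one_second);
		return offset;
	}

With a rule like that, a slightly negative starting PTS (the common audio-before-video case) still gets zeroed out, while a garbage first timestamp can no longer drag the offset off to some absurd value.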

I've done some testing on this with my crazy test videos, and this seems to work great.

@jonoomph
Member Author

@ferdnyc Please take a look, and let me know that I'm not crazy, haha. I think this is much "saner" than before, and saner than not filtering out crazy values.

@codecov

codecov bot commented Sep 11, 2020

Codecov Report

Merging #565 into develop will increase coverage by 1.22%.
The diff coverage is 75.00%.

Impacted file tree graph

@@             Coverage Diff             @@
##           develop     #565      +/-   ##
===========================================
+ Coverage    48.66%   49.88%   +1.22%     
===========================================
  Files          129      129              
  Lines        10043    10846     +803     
===========================================
+ Hits          4887     5411     +524     
- Misses        5156     5435     +279     
Impacted Files              Coverage Δ
src/FFmpegReader.cpp         68.04% <75.00%> (-0.01%) ⬇️
include/ReaderBase.h         71.42% <0.00%> (-28.58%) ⬇️
include/Timeline.h           59.25% <0.00%> (-9.50%) ⬇️
include/Clip.h               83.33% <0.00%> (-5.56%) ⬇️
src/Clip.cpp                 38.00% <0.00%> (-1.72%) ⬇️
tests/Timeline_Tests.cpp    100.00% <0.00%> (ø)
src/CacheDisk.cpp            72.13% <0.00%> (+0.43%) ⬆️
src/Timeline.cpp             45.20% <0.00%> (+6.49%) ⬆️

Continue to review full report at Codecov.

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 6cc00d6...7dc9eb6. Read the comment docs.

@ferdnyc
Contributor

ferdnyc commented Sep 11, 2020

@ferdnyc Please take a look, and let me know that I'm not crazy, haha. I think this is much "saner" than before, and saner than not filtering out crazy values.

I'll take a look! I'm on my phone right now, so I only managed a quick perusal with too little context. One question, though: why is there a GetVideoPTS() method, while the audio computation just uses packet->pts? It struck me as odd that they'd be different, but I don't yet know what the contents of GetVideoPTS() are.

@jonoomph
Member Author

@ferdnyc That method does a little bit of checking before deciding which timestamp to return (PTS or DTS in some situations)

	int64_t current_pts = 0;
	if (packet->dts != AV_NOPTS_VALUE)
		current_pts = packet->dts;

@ferdnyc
Contributor

ferdnyc commented Sep 11, 2020

@ferdnyc That method does a little bit of checking before deciding which timestamp to return (PTS or DTS in some situations)

	int64_t current_pts = 0;
	if (packet->dts != AV_NOPTS_VALUE)
		current_pts = packet->dts;

In every situation where it's set, in fact. ...That feels off to me.

I'm worried that I'm re-hashing #311 here, but... DTS is the decode timestamp, so by unilaterally overriding the presentation timestamp with it, isn't the code saying, "Always store these frames in the order they were decoded, even if we know they're out of order from how they should be shown"? That's how I read the code above, and it sounds off to me.

It's also very different from what ffplay does. Here's how their decoder_decode_frame handles frame timestamps (ffplay.c@L602:626):

switch (d->avctx->codec_type) {
    case AVMEDIA_TYPE_VIDEO:
        ret = avcodec_receive_frame(d->avctx, frame);
        if (ret >= 0) {
            if (decoder_reorder_pts == -1) {
                frame->pts = frame->best_effort_timestamp;
            } else if (!decoder_reorder_pts) {
                frame->pts = frame->pkt_dts;
            }
        }
        break;
    case AVMEDIA_TYPE_AUDIO:
        ret = avcodec_receive_frame(d->avctx, frame);
        if (ret >= 0) {
            AVRational tb = (AVRational){1, frame->sample_rate};
            if (frame->pts != AV_NOPTS_VALUE)
                frame->pts = av_rescale_q(frame->pts, d->avctx->pkt_timebase, tb);
            else if (d->next_pts != AV_NOPTS_VALUE)
                frame->pts = av_rescale_q(d->next_pts, d->next_pts_tb, tb);
            if (frame->pts != AV_NOPTS_VALUE) {
                d->next_pts = frame->pts + frame->nb_samples;
                d->next_pts_tb = tb;
            }
        }
        break;

They don't even touch the packet timestamps at all, and they don't look at DTS unless they're explicitly prevented from reordering frames by decoder_reorder_pts. If they're explicitly told they should reorder, then the pts is left alone (the frame's pts, that is, which isn't necessarily the pkt_pts that's also part of AVFrame). In the absence of any explicit decoder_reorder_pts setting in either direction, the library handles the packet timestamps itself and cooks the available timestamp data down into the best_effort_timestamp provided in the AVFrame structure.

That makes sense to me, as trying to reorder packets before they're decoded into frames feels like rearranging deck chairs on a SCUD missile. You know the thing's going to be thrown away when it's decoded into a more usable format, so why even mess with it?

From the ffplay code, it sounds to me like when the library APIs create an AVFrame from an AVPacket, they're already handling the stuff we're [re-, or I guess pre-]doing in not only GetVideoPTS, but some or all of ConvertFrameToAudioPTS, ConvertFrameToVideoPTS, ConvertVideoPTStoFrame, and perhaps even GetAudioPTSLocation.

If we just hand every packet to avcodec_receive_frame(), and then deal with timing only after we get the AVFrame back, is there a chance that could work? Possibly even work better?
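For what it's worth, a minimal sketch of that frame-side approach could look something like this. It only uses standard libavcodec calls (avcodec_send_packet, avcodec_receive_frame, and the frame's best_effort_timestamp field); the DecodePacket helper, the callback, and the surrounding reader structure are hypothetical, and this isn't a claim about how FFmpegReader should ultimately be restructured.

	// Sketch: push packets in untouched, and do all timestamp handling on the
	// decoded AVFrame, relying on best_effort_timestamp (as ffplay does by default).
	extern "C" {
	#include <libavcodec/avcodec.h>
	#include <libavutil/frame.h>
	}

	// Hypothetical helper: decode one packet and report each frame's timestamp
	// (in the stream's timebase) to the caller via a callback.
	template <typename Callback>
	int DecodePacket(AVCodecContext* ctx, const AVPacket* pkt, AVFrame* frame, Callback on_frame) {
		int ret = avcodec_send_packet(ctx, pkt);   // no PTS/DTS massaging beforehand
		if (ret < 0)
			return ret;

		while ((ret = avcodec_receive_frame(ctx, frame)) >= 0) {
			// libavcodec has already cooked the packet PTS/DTS into its best guess.
			int64_t pts = frame->best_effort_timestamp;   // may be AV_NOPTS_VALUE
			on_frame(frame, pts);                         // map pts -> frame number here
			av_frame_unref(frame);
		}

		// EAGAIN / EOF just mean "no more frames from this packet".
		return (ret == AVERROR(EAGAIN) || ret == AVERROR_EOF) ? 0 : ret;
	}

Nothing in a scheme like that reorders or rewrites packet timestamps; the demuxer and decoder get to resolve PTS vs. DTS, and the reader only has to turn the resulting per-frame timestamp into a frame number.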

@jonoomph
Member Author

That makes sense to me, as trying to reorder packets before they're decoded into frames feels like rearranging deck chairs on a SCUD missile.

I love it! I think your concerns and suggestions are worth investigating for sure! However, for this PR, I'm fixing the really broken logic back to what I initially intended. I'll add a TODO above this code with some links back to your suggestion, as I'm sure we'll revisit this again. Thx!

@jonoomph jonoomph merged commit 9c35e55 into develop Sep 12, 2020
@jonoomph jonoomph deleted the fix-pts-offset-logic branch September 12, 2020 22:18