variable frame rate video #32

Closed
mkassner opened this issue Jul 8, 2014 · 10 comments

Comments

@mkassner
Contributor

mkassner commented Jul 8, 2014

Hi,

First off, this project is fantastic; we had been searching for usable ffmpeg Python bindings for a while now. Really enjoying PyAV.

We want to record video from a live source and use recording timestamps as pts for each frame, since the source may change frame rate.

Looking at video/stream.pyx L116-L120 I can see that frame.pts gets scaled. But frame.time_base is always 0/0, thus producing nonsensical pts. I cannot figure out how to set frame.time_base; before I add this as a variable I wanted to better understand your motivation for the above-mentioned lines.

I guess the question comes down to understanding the input args for lib.av_rescale_q.

Could you give me a pointer?

thanks!

@mikeboers
Member

In all honesty, the time representation needs a massive overhaul in general, and it especially hasn't gotten much attention on the encoding side of things.

That said, give me a bit to take a look and try to answer the actual question. =P


@mikeboers
Member

I do believe that the absolute best thing to do is overhaul all time representations, as "discussed" in #25.

In the meantime, the time_base on a Frame could be exposed for modification (though this mutability would go away with the time overhaul, since it would then be meaningless).

Take a look at the definition of av_rescale_q and the docs for av_rescale_rnd in the FFmpeg documentation; it essentially converts a number of ticks from one time_base to another.
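
Roughly speaking (this is a plain-Python sketch of the arithmetic, not PyAV API, and it ignores FFmpeg's rounding modes and AV_NOPTS_VALUE handling):

    # A rough Python equivalent of what av_rescale_q does: convert a tick
    # count from one time base (seconds per tick) to another.
    from fractions import Fraction

    def rescale_q(ticks, src_time_base, dst_time_base):
        # ticks * src_time_base is the time in seconds; dividing by the
        # destination time base turns seconds back into ticks.
        return int(round(ticks * src_time_base / dst_time_base))

    # 3003 ticks of 1/90000 s (about 0.0334 s) become 1001 ticks of 1/30000 s
    print(rescale_q(3003, Fraction(1, 90000), Fraction(1, 30000)))  # -> 1001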

@mikeboers
Member

....and I just noticed that your fork (from Mark's fork) does expose the time_base. =P

@markreidvfx
Contributor

Yeah, the easiest fix is to add a time_base property to the Frame so you can set it. The presentation timestamp (pts) and time base are used to determine the time a frame is to be displayed: pts * float(time_base) = seconds (I think). When encoding, the pts of the frame needs to be in the same "time base" as the codec. The rescale is there so the frame's pts can be in any time base and it converts it for you. All these fractions can be really confusing; Mike and I were thinking of creating a Timestamp object or something to help make them easier to deal with.
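
For example (made-up numbers, just to illustrate that relationship):

    # A pts of 450000 in a 1/90000 time base means the frame is displayed
    # at 5.0 seconds.
    from fractions import Fraction

    time_base = Fraction(1, 90000)
    pts = 450000
    print(pts * float(time_base))  # -> 5.0 seconds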

@mkassner
Contributor Author

mkassner commented Jul 9, 2014

hi,

Here is my summary of the time representation:

time_base (a fraction) is the conversion factor by which the int64 pts value is multiplied to get seconds.

Two time bases that we care about exist:
the time_base of the stream (AVStream): used for the packet pts/dts
the time_base of the codec (AVCodecContext): used for the frame

Going from packet pts to frame pts when decoding:
frame.pts = av_rescale_q(packet.pts, packetTimeBase, frameTimeBase)

...and when encoding (see the sketch after these notes):
packet.pts = av_rescale_q(frame.pts, frameTimeBase, packetTimeBase)

Setting the time_base:
The time base of the codec is settable (and only settable at the beginning);
currently in PyAV this is done via container.add_stream(codec, codec_timebase).

The time base of the stream is not user settable; it is determined by ffmpeg.
The stream time base uses the codec time base as a hint to find a good value,
and it is influenced by the constraints/rules of the container as well.
Only once the header of the stream has been written is stream.time_base guaranteed
to be valid, and it should only be accessed from that point on.

--- end notes
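
In plain Python terms, the two rescale directions above look roughly like this (the time base values are made up for illustration):

    # Decode direction: packet pts (stream time base) -> frame pts (codec
    # time base).  Encode direction is the reverse.
    from fractions import Fraction

    stream_time_base = Fraction(1, 12800)  # AVStream.time_base (packets)
    codec_time_base = Fraction(1, 25)      # AVCodecContext.time_base (frames)

    def rescale_q(ticks, src, dst):
        return int(round(ticks * src / dst))

    frame_pts = rescale_q(5120, stream_time_base, codec_time_base)        # -> 10
    packet_pts = rescale_q(frame_pts, codec_time_base, stream_time_base)  # -> 5120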

Can you confirm my findings? I had a hard time finding good docs on this.

frame.time_base should be set to the codec time base on encode. When the user creates a frame, they should probably set it themselves.

Writing it out like this, I guess it can be problematic because a frame can come from one codec and be destined for another... but you already have this solved by using av_rescale_q twice in encode().

Then one could introduce a class property .pts_seconds:

    property pts_seconds:
        """Presentation timestamp of this frame, in seconds."""
        def __get__(self):
            if self.ptr.pts == lib.AV_NOPTS_VALUE:
                return None
            # the time_base should be taken from the codec on decode
            return float(self.pts * self.time_base)
        def __set__(self, value):
            if value is None:
                self.ptr.pts = lib.AV_NOPTS_VALUE
            else:
                self.ptr.pts = int(value / self.time_base)

@mikeboers
Member

I also did not find any good docs on this; a very thorough description of how time works in the various libraries would be really nice to have.

The docs for AVCodecContext.time_base and AVStream.time_base both say "This is the fundamental unit of time (in seconds) in terms of which frame timestamps are represented." The first example video I threw at it has a different time_base on each, and they appear to be related by AVCodecContext.ticks_per_frame.

From what I can see with a few examples, while decoding, both the packet and frame pts/dts are expressed in AVStream.time_base. The only time we use AVCodecContext.time_base is when actually encoding, to set the frame's pts/dts. This does not make a ton of sense to me.
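
A quick way to look at the two values for a given file (a sketch; the file name is a placeholder and the attribute names follow PyAV's container/stream objects, which may differ between versions):

    # Inspect the stream and codec time bases of the first video stream.
    import av

    container = av.open('example.mp4')
    stream = container.streams.video[0]
    print(stream.time_base)                # AVStream.time_base
    print(stream.codec_context.time_base)  # AVCodecContext.time_base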

A sticky part in what you have written (that you may be aware of, but I don't know from this) is that due to frame re-ordering (and other effects), the i-th frame does not necessarily link to the i-th packet, nor is there a guaranteed 1:1 relationship in the quantities of packets or frames.

Setting a time_base on a frame is only tricky because FFmpeg/Libav's structure does not match what we would ideally want to do in Python (e.g. encoding an arbitrary frame into a stream, or attaching a manually constructed stream to a container).

... the further I dig into this the less sense it makes. facepalm

Mike


@mikeboers
Member

My confusion could be coming from using av_best_effort_timestamp.

Now my frames simply do not have a pts.

The best description I have found is in an old tutorial.

@mkassner
Contributor Author

I understand that this part is tricky; going through the ffmpeg source, the trail goes from avcodec_decode_video2 to av_frame_get_best_effort_timestamp to guess_correct_pts. I would really prefer not to bother with the exact workings of this.

From my tests I found that the frame pts can be set during encoding and read back properly during decoding for mpeg4 in an mp4 container. This allows me to read and write variable frame rate video.

Here is my understanding so far:

Encoding:

  • I set frame.pts and frame.time_base before encoding.
  • The PyAV implementation then scales the pts to the codec context time base.
  • The frame gets encoded into packets.
  • The resulting packet's pts then gets scaled to the stream time base.

Decoding:

  • PyAV does not do much (concerning pts); it simply calls avcodec_decode_video2.
  • The resulting frame's pts is in the stream time base and appropriately scaled.

I find it a bit confusing that before encoding the time base is codec-based while after decoding it is stream-based, but I can certainly live with that (the whole flow is sketched below).
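
Here is a sketch of that write path, assuming frame.time_base is exposed as discussed; the live source (capture_frames), file name, codec choice, and the encode/mux call signatures are illustrative and may not match the PyAV API exactly:

    # Write variable-frame-rate video by stamping each frame with its
    # capture time before encoding.
    import av
    import numpy as np
    from fractions import Fraction

    def capture_frames():
        # stand-in for a live source: three black frames at irregular times
        for t in (0.0, 0.040, 0.110):
            yield t, np.zeros((480, 640, 3), dtype=np.uint8)

    output = av.open('vfr_out.mp4', 'w')
    stream = output.add_stream('mpeg4', rate=30)  # rate is only a nominal hint
    stream.width, stream.height = 640, 480

    for timestamp, image in capture_frames():
        frame = av.VideoFrame.from_ndarray(image, format='rgb24')
        frame.time_base = Fraction(1, 1000)   # millisecond resolution
        frame.pts = int(timestamp * 1000)     # capture time in ms
        for packet in stream.encode(frame):   # pts is rescaled internally
            output.mux(packet)

    for packet in stream.encode():            # flush the encoder
        output.mux(packet)
    output.close()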

@markreidvfx
Contributor

I think one reason the time base of the encoded packet is AVCodecContext.time_base is that avcodec_encode_video2 doesn't know what stream the packet is going to be added to.
If this is true, then it would make sense to me that when encoding, the frame's pts should also be in AVCodecContext.time_base; I'm still trying to find out if that's the case.

Here is the example where I believe the av_rescale_q code came from:
https://www.ffmpeg.org/doxygen/trunk/transcoding_8c_source.html#l00387
It looks like they also scale AVPacket.duration, which we probably should too.

It might be more flexible to do this scaling in OutputContainer.mux, but AVPacket doesn't seem to have a time_base attribute.
https://www.ffmpeg.org/doxygen/trunk/structAVPacket.html
But we could also add time_base to our packet object, or, as I suggested, use a timestamp object that combines the timestamp and time_base for all timestamps instead. If we do that, then the scaling can occur whenever we want. We could just use doubles (in seconds) to represent timestamps, but that might change timestamp values when doing copy-stream style muxing; it sounds better to me to preserve what we get.
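
A minimal sketch of that Timestamp idea (a hypothetical class, not anything that exists in PyAV): keep the original integer value together with its time_base so nothing is lost until a rescale is actually needed:

    # Hypothetical Timestamp object combining an integer tick count with
    # its time base, so values can be preserved and rescaled on demand.
    from fractions import Fraction

    class Timestamp:
        def __init__(self, value, time_base):
            self.value = value            # integer ticks, as FFmpeg stores them
            self.time_base = time_base    # Fraction: seconds per tick

        @property
        def seconds(self):
            return float(self.value * self.time_base)

        def rescale(self, new_time_base):
            new_value = int(round(self.value * self.time_base / new_time_base))
            return Timestamp(new_value, new_time_base)

    pts = Timestamp(3003, Fraction(1, 90000))
    print(pts.seconds)                           # -> ~0.0334
    print(pts.rescale(Fraction(1, 1000)).value)  # -> 33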

@mikeboers
Member

Does this paragraph on time accurately represent what we believe to be the case of time_base(s)?
