New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix dropped frames when reading, encoding, add 10 bit support, and more #279
Fix dropped frames when reading, encoding, add 10 bit support, and more #279
Conversation
Treat N0f8 arrays like Gray{N0f8} arrays
Also the tests currently fail in part because the tests do not expect all frames to survive a round trip through VideoIO. I will work on writing tests for the new code, and updating tests to no longer expect the wrong result. However, I would definitely appreciate some help, if you have any time. |
src/avio.jl
Outdated
out_img_check(r, buf) = out_img_size_check(r, buf) && out_img_eltype_check(r, buf) | ||
|
||
function out_bytes_size(fmt, width, height, align = VIDEOIO_ALIGN) | ||
sz = av_image_get_buffer_size(fmt, width, height, VIDEOIO_ALIGN) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should be align
src/encoding.jl
Outdated
pret = Cint(0) | ||
while pret >= 0 | ||
pret = avcodec_receive_packet(encoder.codec_context, encoder.packet) | ||
@show encoder.packet.pts, encoder.packet.dts, encoder.packet.duration |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These should be deleted. Will add commit soon.
test/avptr.jl
Outdated
end | ||
|
||
@testset "@avptr" begin | ||
@avptr AVPtrTestParentStruct TestParentStruct |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Incomplete test....
Thanks for the heads-up. I'll try to take a look at this later today (maybe later in the week). An initial comment is that the solution I implemented in #277 is pretty basic; it should transfer all the image planes (if I understand the terminology correctly - I might not as I'm quite new to this) but it does so in a very simplistic way in that it just dumps the entire frame (including all image planes) into a since It looks like your solution is a lot more comprehensive than mine and so I'd be happy if my changes were undone assuming that I can still get hold of the untranscoded frames somehow. |
I haven't yet reviewed the code yet, but the intentions and fixes sound excellent. Thank you. I'm not against API changes in favor of more logical designs. Perhaps we should start a roadmap to version 1 |
It might work in practice for multi-planar image formats, but I'm not sure it will work in general. An |
That definitely sounds like a better way of doing things. I have to admit that I was relying on the fact that the code path I was working on didn't work in the first place and so something is better than nothing. I started working my way through the FFMPEG API yesterday and was rapidly coming to the conclusion that things were not done in an ideal way. |
@ianshmean starting a roadmap to v1 sounds like a good idea. I have some other suggestions for breaking changes that I think would be an improvement to VideoIO's API, but will wait for such a road map to propose them. |
Thanks for the continued work on this @galenlynch Before we merge:
|
No problem. At this point I'm just trying to remove any bugs that I find as I use this PR, which I've been using for all of my decoding and encoding needs. We still need to decide whether we will change the names of functions used to decode and encode videos, or instead change how they are used.
I could take a look at splitting off non-breaking patches, but it would probably take a bit of effort giving how much of the plumbing has changed.
Not on my end. I would be happy to throw some stuff in if you have suggestions, or at least try to. It might not be too hard to do while everything's still "fresh" in my mind. However I'm not sure if I could devote a lot of time to it at the moment.
Sounds good
I haven't really been paying attention to this since it doesn't seem to be correlated with whether tests pass or fail for me locally, and I don't understand it. Is this just testing on ARM? I'm confused why method errors would be present on one platform and not the others. I could look into it though. |
Great
First thing I'd suggest is bumping drone to 1.5. If issues remain, I can help look into it.
I'd say go with your new names and depreciate the old ones.
Let's not worry. Onward and upward How's this for a plan..
|
Only export `append_encode_mux!`.
@galenlynch It'd be good to move this forward. Can you review and merge this into your branch, to tidy up the CI failures? |
Fix CI [merges into add_10bit_gray_encoding]
Yeah sorry that I haven't made much forward progress recently: had a few looming deadlines. I should be more free after next week, though I might also be able to do some work this weekend. |
@galenlynch how are things looking? Shall we merge and move on to the deprecations and updating docs? |
Sounds good to me! Was just about to check in to see what the status of this was. |
I was just about to merge, but wondered what kind of merge we should do here. I usually squash, but is there reason to keep the commits separate? @galenlynch if I don't hear back in a few hours I'll squash merge |
Exciting! So next steps:
|
Perhaps we could also work out FileIO integration for the 0.9 release |
Super exciting!!! Thanks for this!!! |
@galenlynch I don't think the changelog does everything in this PR justice. Could you add a few brief notes to the changelog, perhaps linking to the appropriate PR (most of which will be this one) |
Sure |
@galenlynch sorry to pester, but when do you think you could update the changelog? I'm hoping to not have v0.9 linger any longer |
Yeah sorry I can do it this weekend? |
That'd be great, thanks |
I apologize in advance for the vague title of this PR. While this PR started as a simple change to add support for encoding and decoding 10 bit gray videos, it expanded in scope as I ran into other bugs lurking in the code base that prevented forward progress on adding 10 bit support. This is definitely a work in progress: while the functionality is 100% complete, the documentation and testing are definitely not. Help would be appreciated.
I will attempt to summarize the changes made in this PR, and where possible motivate each change with a bug that it fixes or code that it simplifies. I will no doubt need to expand upon this list, as many parts of the encoding and decoding pipeline have been touched.
Possible conflict with master
In rebasing onto master, there was a merge conflict with the changes made in PR #277. I believe that the PR here also fixes the same problem, but in a slightly different way as this PR refactors some of the underlying code. I am not 100% positive that my changes will also work for @dawbarton, so I would appreciate his attention/review. While some of the difference is just a question of refactoring, I believe that there is also a semantic change between this PR and #277: in #277 I believe only the first image plane is transferred, while in this PR all image planes would be transferred. This difference would be important for many multi-planar formats, e.g. YUV formats, that are popular for webcams. I am not sure that #277 would work for such multi-planar formats. One nuance that should probably be discussed is that returning the raw image planes from FFMPEG may not be very helpful without the alignment that is used for the buffer. This is made more complicated by the fact the original code used a deprecated method that was not aware of this buffer alignment.
Prevent dropped frames by not creating a damaged MPEG-2 stream
Frames were being dropped because the intermediate MPEG-2 elementary stream file was missing a start code, as documented in #271. I have fixed this bug for the video encoding workflow that uses
VideoEncoder
, but it is not relevant for the newVideoWriter
file and workflow that is part of this PR.Prevent final encoded frame from being dropped due to it having zero duration
The final frame of encoding streams are often present in the resulting video container, but not shown due to an edit list. This has the effect of dropping the final frame of an encode, though it was present in the video stream. After much head scratching, it turns out that this was because the packet duration being fed into the muxer was zero, and libav would occasionally hide it as a result. This problem, and the fix, is documented on this stack overflow question that I posted.
Come to think of it, this may not have been a bug in the original implementation, and instead fixes a bug that plagued my PR for a while. However, the files being produced by VideoIO had strange timestamp durations that ffmpeg would complain about at high verbosity levels, and that some documentation I read online suggests may become unsupported at some point.
Start encoding frames at the PTS expected by libav: 0
The function
encodevideo
feeds frames into the encoder starting at timestamp 1. This produces videos that have strange edit lists, and may also result in surprising video-audio synchronization results when adding audio tracks to video streams produced by VideoIO. I have NOT changed the behavior ofencodevideo
in this PR, but function that is the equivalent toencodevideo
for the new encoding workflow that I add to this PR,encode_mux_video
, frames start at timestamp 0, which seems to be what libav expects. This produces videos that do not have strange edit lists, since this video stream does not need to be delayed some number of samples. I think this is also consistent with what users would expect when adding audio tracks: video and audio would start at the same time, instead of the video being delayed by one frame period.Prevent dropped frames while reading by flushing decoder
The decoding loop for
VideoReader
/AVInput
objects would not flush the decoder once all packets were read from a video container. In my hands with x264 streams in mp4 files, this would result in the final two frames from a video not being produced by VideoIO. This PR properly flushes the decoder at the end of a file to drain any frames buffered by the decoder.Use libav functions to set private and non-private encoder options
For the new encoding workflow that I have added as part of this PR, I use libav's interface to set both encoder settings and private encoder settings. Currently, VideoIO sets encoder settings by modifying the struct field names listed in the
AVCodecContextProperties
keyword argument. This change means that the names of encoder settings matches the names used in the command-line documentation instead of the names of fields in the libav C struct. I think it is perhaps easier for users to access the ffmpeg command line documentation instead of the developer documentation. It also means the error checking on non-private setting names is done by libav, where as currently no error checking is done.Encoder options are set with the keyword arguments
encoder_settings
andencoder_private_settings
, instead of keyword argumentAVCodecContextProperties
. I also use named tuples rather than arrays of pairs, which I feel is more ergonomic. So instead of doingencodevideo(..., AVCodecContextProperties = [:priv_data => ("crf"=>"22", "preset"=>"medium")])
, users would instead doencode_mux_video(..., encoder_private_settings = (crf = 22, preset = "medium"))
.Make seeking consistent
Seeking currently conflates PTS and DTS, as described in #275, and uses a strange offset when seeking into a file. In this PR PTS is used instead of DTS. This also makes
seek
andgettime
have the same notion of time.Change encoding workflow
Currently, the encoding workflow in VideoIO encodes frames, writes the encoded frames to a MPEG-2 elementary stream, and then using the FFMPEG binary to mux that stream into a mp4 container. This workflow uses the
VideoEncoder
type to store all of the necessary pieces to encode a video file. I have added a new workflow that uses libav'sAVFormat
API, that will mux and write video packets as they are produced, instead of creating an intermediate file. This also has the benefit of being more general down the road, and also allowing the encoder to be properly configured for the video container format (which is not currently being done). By relying on theAVFormat
API, I think it would be quite easy to add support for other container formats, while still checking that the encoder chosen by the user is compatible with the chosen format, etc.This new workflow has a simple correspondence to the existing workflow in VideoIO, and does not require a sepearte mux step.
encode_video
encode_mux_video
prepareencoder
open_video_out!
appendencode!
append_encode_mux!
finish_encode!
close_video_out!
close(io)
mux
I have not removed the old workflow, and indeed have fixed a few bugs with it. I personally think the new workflow is better than the old, and should at least be the default. However, whether the existing workflow should be removed is good question for discussion.
Add support for 10 bit gray and RGB decoding
You can now decode videos that have 10bit YUV encoding, receive them either as RGB or Gray buffers.
Add support for 10 bit gray and RGB encoding
You can now encode videos that have 10bit YUV encoding, and write them as either Gray or RGB buffers.
Add support for encoding frames that are "scanline major"
Most video packages expect pixels with neighboring x position and the same y position, that is pixels adjacent on the same scanline, to be adjacent in memory. Encoding frames that have this memory layout required the use of
PermutedDimsArray
with VideoIO. I have added a keyword argument,scanline_major
, that allows the use of this pervasive memory format in VideoIO when encoding videos. Indeed, libav itself uses this format. As such, encoding video frames that are scanline major requires less work on VideoIO's end, and can be as simple as copying memory.Eliminate use of deprecated functions in libav
I have removed calls to deprecated libav functions and replaced them with the recommended alternatives.
avpicture_get_size
av_image_get_buffer_size
avcodec_decode_video2
avcodec_send_packet
andavcodec_receive_frame
Replacing
avcodec_decode_video2
required changing the logic of the decoding loop, in order to handleEAGAIN
errors as per the libav documentation.There is one more deprecated function that I replaced, but I am having a hard time finding which function that was.
Make implicit memory layout requirement of decoding pipeline explicit
The video decoding pipeline seemed to require that the frame buffer supplied by users by scanline-major, as described above, but did not require that the array be indexed one way or another. I have made this requirement explicit, and disabled some code that I think would have errored previously because it violated this assumption.
Use method dispatch in favor of if-else statements where possible
Instead of big if-else statements that match user-supplied buffer type with pixel format, and scale poorly with the number of buffer and pixel types supported, I have moved much of this logic into method dispatch. The type of the user-supplied buffer now uses method dispatch, instead of if-else statments, when encoding and decoding video.
Refactor repeated code snippets into functions
Some code snippets, such as "pumping" a video container for frame data, were repeated many times in many different functions. I have tried to refactor this snippets into a single function where I noticed them.
Make decoding image buffer compatible with multi-planar image formats
The image buffer used by the decoding logic to handle containers with multiple video streams did not account for the use of multi-planar image formats. I have used libav functions to transfer the image buffers to the VideIO maintained image buffer, which allows it work with multi-planar image formats. I am unclear if this image buffer is ever used, but at least it's now not wrong.
New interface to interact with nested C structs in libav
One large change in this PR is that it introduces a new type,
NestedCStruct
, and associated methods for to make interacting with nested C structs require less code in Julia, and additionally fixes some potential memory bugs in the existing code. This type and related functions are in the new fileavptr.jl
. Interacting with libav (FFMPEG) requires receiving, modifying, and passing structs that are part of the API of that library. Interacting with these C structs requires modifying their fields and generating pointers to them. The existing codebase would often store these structs as length 1 arrays of Julia structs that match the memory layout of the C struct. However, accessing this memory and modifying individual fields would often boil down to generating pointers to the array, and then modifying fields of the Julia struct directly with pointer logic, requiring manyunsafe_store!
andunsafe_load
statements throughout the codebase, without any calls toGC.@preserve
etc to ensure that the underlying memory was not garbage collected before the pointer was used. Additionally, even if theGC.@preserve
macros were added in the right places, by my reading of Julia's semantics on getting pointers to fields of Juliastruct
s is that it's not supported, and while the existing code might work at the moment there is no guarantee that future compiler or garbage collector optimizations will not break it.Beyond the potential memory bugs, interacting with these c structs would require a fair amount of boiler-plate code, which I suspect led to some C structs of interest (e.g.
AVStream
) being cached as fields of multiple mutable structs in VideoIO. This meant that there was no single authoritative source of a piece of information. For example, the contents ofStreamInfo
in the current code base is entirely redundant with theVideoReader
struct. If accessing and manipulating the contents of nested C structs were easier, I suspect that there would be less redundant information the current types of VideoIO.My solution,
NestedCStruct
, simply acts like a typed double pointer to a C struct. It does not "own" the underlying memory, though it can be used withfinalizer
s to automatically manage memory that is dynamically allocated by libav. Reading fields from the C struct, and modifying them, is automatically accomplished using Julia's reflection capabilities, and is also done so in a way that (I believe) is memory-safe and protects the memory from being prematurely garabge collected. This means that functions in VideoIO do not need to invoke unsafe functions (unsafe_store!
,unsafe_convert
, etc) to manipulate the C struct. Interaction with C functions are relatively painless due tounsafe_convert
functions that will provide double or single pointers to the wrapped C struct as required by the C function's type signature. Importantly, because the memory is internally represented as a pointer, and not a Juliastruct
, this design no longer relies on Julia behavior that works for now but is not actually supported.This type, and the functions that go along with it, is likely a subset of Blobs.jl. Whether VideoIO wants to maintain its own code to access nested C structs, or instead wants to add a dependency, is a good question for discussion. Since I was able to make this solution work, and IMO be very ergonomic to work with, in ~82 LOC (with lots of spacing) I think it might not be worth it to add a new dependency.
I have additionally added a macro,
@avptr
, which makes it easy to assign an alias to aNestedCStruct
that wraps a particular C struct, and additionally make constructors that call allocating c functions, and attach finalizers to the resulting object that call the appropriate destructor. This allows types in VideoIO to have fields of types that have similar names to the struct name in libav, for exampleAVFramePtr
which is just an alias forNestedCStruct{AVFrame}
. It also reduces the amount of boiler plate required to use libav's functions to allocate the C struct, and to free it when Julia no longer uses it.The consequence of this new type is eliminating a lot of boiler plate across VideoIO necessary to interact with C structs, eliminating redundant fields in many of VideoIO's types, and eliminating some types all together (e.g.
StreamInfo
). Additionally, accessing C structs through multiple layers of nesting in Julia looks much more like accessing these structs in C, except that the->
operator is replaced by a.
in Julia code. In contrast, doing the same in the existing code base would require (and often did) usingunsafe_load
at each layer of nesting, and would further require the use ofGC.@preserve
at multiple stages to avoid memory bugs (although that was absent in the code).Use properly initialized libav structs
AVFrame
s in VideoIO were not initialized by libav, but instead were initialized by Julia. This results in many of the fields being initialized to invalid, or non-default values from the perspective of libav. While it is unclear if passing these improperly configured structs to libav was causing any problems, some behavior of VideoIO relied on the default value assigned by Julia, and not the default value assigned by libav.Reduce type instability
Some of VideoIO's types, had abstract field types that did not need to be abstract. For example,
VideoReader
's fieldavin
was typeAVInput
, which is abstract. I have parameterized (and made concrete) the type of these fields where possible, which reduces type instability in the generated code.Use locks around functions that are not thread safe
Some libav functions, such as
avcodec_open2
, are not thread-safe according to the libav documentation. I have added a constant globalReentrantLock
,VIDEOIO_LOCK
, to VideoIO and used it to prevent multiple Julia threads from using non-thread-safe functions.closes #270, closes #271, closes #274, closes #275, closes #242, closes #283, closes #284, closes #285