Fix dropped frames when reading, encoding, add 10 bit support, and more #279

Merged
merged 102 commits into JuliaIO:master on Mar 11, 2021

Conversation

@galenlynch (Collaborator) commented Nov 23, 2020

I apologize in advance for the vague title of this PR. While this PR started as a simple change to add support for encoding and decoding 10 bit gray videos, it expanded in scope as I ran into other bugs lurking in the code base that prevented forward progress on adding 10 bit support. This is still a work in progress: the functionality is complete, but the documentation and testing are not. Help would be appreciated.

I will attempt to summarize the changes made in this PR, and where possible motivate each change with a bug that it fixes or code that it simplifies. I will no doubt need to expand upon this list, as many parts of the encoding and decoding pipeline have been touched.

Possible conflict with master

In rebasing onto master, there was a merge conflict with the changes made in PR #277. I believe that the PR here also fixes the same problem, but in a slightly different way, as this PR refactors some of the underlying code. I am not 100% positive that my changes will also work for @dawbarton, so I would appreciate his attention/review. While some of the difference is just a question of refactoring, I believe that there is also a semantic change between this PR and #277: in #277 I believe only the first image plane is transferred, while in this PR all image planes would be transferred. This difference would be important for many multi-planar formats, e.g. YUV formats, that are popular for webcams. I am not sure that #277 would work for such multi-planar formats. One nuance that should probably be discussed is that returning the raw image planes from FFMPEG may not be very helpful without the alignment that is used for the buffer. This is made more complicated by the fact that the original code used a deprecated method that was not aware of this buffer alignment.

Prevent dropped frames by not creating a damaged MPEG-2 stream

Frames were being dropped because the intermediate MPEG-2 elementary stream file was missing a start code, as documented in #271. I have fixed this bug for the video encoding workflow that uses VideoEncoder, but it is not relevant for the new VideoWriter file and workflow that is part of this PR.

Prevent final encoded frame from being dropped due to it having zero duration

The final frame of an encoded stream is often present in the resulting video container, but not shown due to an edit list. This has the effect of dropping the final frame of an encode, even though it is present in the video stream. After much head scratching, it turns out that this was because the packet duration being fed into the muxer was zero, and libav would occasionally hide it as a result. This problem, and the fix, is documented on this stack overflow question that I posted.
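For reference, the essence of the fix is just to stamp every packet with a nonzero duration (in stream time-base ticks) before handing it to the muxer. A rough sketch of the idea, not VideoIO's exact code; `packet`, `stream`, and `format_context` stand in for the wrapped libav structs:

```julia
# Duration of one frame in stream time-base ticks, e.g. framerate = 30 and a
# time base of 1/30 gives one tick per frame.
function packet_duration(framerate, time_base_num, time_base_den)
    return round(Int64, time_base_den / (time_base_num * framerate))
end

packet.duration = packet_duration(framerate, stream.time_base.num, stream.time_base.den)
ret = av_interleaved_write_frame(format_context, packet)  # zero duration is what hid the last frame
ret < 0 && error("Error writing packet to the muxer")
```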

Come to think of it, this may not have been a bug in the original implementation; it may instead fix a bug that plagued my PR for a while. However, the files being produced by VideoIO had strange timestamp durations that ffmpeg would complain about at high verbosity levels, and that some documentation I read online suggests may become unsupported at some point.

Start encoding frames at the PTS expected by libav: 0

The function encodevideo feeds frames into the encoder starting at timestamp 1. This produces videos that have strange edit lists, and may also result in surprising video-audio synchronization when adding audio tracks to video streams produced by VideoIO. I have NOT changed the behavior of encodevideo in this PR, but in encode_mux_video, the new workflow's equivalent of encodevideo, frames start at timestamp 0, which seems to be what libav expects. This produces videos that do not have strange edit lists, since the video stream does not need to be delayed by some number of samples. I think this is also consistent with what users would expect when adding audio tracks: video and audio would start at the same time, instead of the video being delayed by one frame period.

Prevent dropped frames while reading by flushing decoder

The decoding loop for VideoReader/AVInput objects would not flush the decoder once all packets were read from a video container. In my hands with x264 streams in mp4 files, this would result in the final two frames from a video not being produced by VideoIO. This PR properly flushes the decoder at the end of a file to drain any frames buffered by the decoder.
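For context, the drain procedure libav documents is to send a NULL packet and then call avcodec_receive_frame until it reports EOF. A rough sketch of that step; the EOF constant and frame-handling helper below are placeholders for whatever the bindings expose, not VideoIO's actual identifiers:

```julia
# Enter draining mode: a NULL packet signals that no more input will be sent.
avcodec_send_packet(codec_context, C_NULL)
while true
    ret = avcodec_receive_frame(codec_context, frame)
    ret == VIO_AVERROR_EOF && break   # placeholder for libav's AVERROR_EOF: decoder drained
    ret < 0 && error("Error while draining decoder")
    handle_decoded_frame(frame)       # placeholder: deliver the buffered frame to the reader
end
```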

Use libav functions to set private and non-private encoder options

For the new encoding workflow that I have added as part of this PR, I use libav's interface to set both encoder settings and private encoder settings. Currently, VideoIO sets encoder settings by modifying the struct fields named in the AVCodecContextProperties keyword argument. This change means that the names of encoder settings match the names used in the command-line documentation instead of the names of fields in the libav C struct. I think it is easier for users to consult the ffmpeg command-line documentation than the developer documentation. It also means that error checking of non-private setting names is done by libav, whereas currently no error checking is done.

Encoder options are set with the keyword arguments encoder_settings and encoder_private_settings, instead of keyword argument AVCodecContextProperties. I also use named tuples rather than arrays of pairs, which I feel is more ergonomic. So instead of doing encodevideo(..., AVCodecContextProperties = [:priv_data => ("crf"=>"22", "preset"=>"medium")]), users would instead do encode_mux_video(..., encoder_private_settings = (crf = 22, preset = "medium")).
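For example, an x264 encode with crf and preset set through the new keyword might look like the following; the positional arguments of encode_mux_video are assumed here to mirror encodevideo's filename-plus-image-stack form:

```julia
using VideoIO, Colors, FixedPointNumbers

# 100 gray frames, 480 rows by 640 columns
imgstack = [rand(Gray{N0f8}, 480, 640) for _ in 1:100]

VideoIO.encode_mux_video(
    "output.mp4", imgstack;
    framerate = 30,
    encoder_private_settings = (crf = 22, preset = "medium"),
)
```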

Make seeking consistent

Seeking currently conflates PTS and DTS, as described in #275, and uses a strange offset when seeking into a file. In this PR PTS is used instead of DTS. This also makes seek and gettime have the same notion of time.

Change encoding workflow

Currently, the encoding workflow in VideoIO encodes frames, writes the encoded frames to an MPEG-2 elementary stream, and then uses the FFMPEG binary to mux that stream into an mp4 container. This workflow uses the VideoEncoder type to store all of the pieces needed to encode a video file. I have added a new workflow that uses libav's AVFormat API, which muxes and writes video packets as they are produced, instead of creating an intermediate file. This also has the benefit of being more general down the road, and of allowing the encoder to be properly configured for the video container format (which is not currently being done). By relying on the AVFormat API, I think it would be quite easy to add support for other container formats, while still checking that the encoder chosen by the user is compatible with the chosen format, etc.

This new workflow has a simple correspondence to the existing workflow in VideoIO, and does not require a separate mux step.

Old workflow      New workflow
encodevideo       encode_mux_video
prepareencoder    open_video_out!
appendencode!     append_encode_mux!
finish_encode!    close_video_out!
close(io)         N/A
mux               N/A
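Put together, a streaming encode under the new workflow might look roughly like the following; the exact signatures of open_video_out!, append_encode_mux!, and close_video_out! are assumptions based on the correspondence above, not verified API:

```julia
using VideoIO, Colors, FixedPointNumbers

frames = [rand(Gray{N0f8}, 480, 640) for _ in 1:100]

# Open the output container and configure the encoder from the first frame.
writer = VideoIO.open_video_out!("output.mp4", first(frames); framerate = 30)
for frame in frames
    # Each frame is encoded and muxed immediately; no intermediate MPEG-2 file.
    VideoIO.append_encode_mux!(writer, frame)
end
# Flush the encoder and finalize the container (write the trailer).
VideoIO.close_video_out!(writer)
```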

I have not removed the old workflow, and indeed have fixed a few bugs in it. I personally think the new workflow is better than the old, and should at least be the default. However, whether the existing workflow should be removed is a good question for discussion.

Add support for 10 bit gray and RGB decoding

You can now decode videos that have 10 bit YUV encoding, and receive them as either RGB or Gray buffers.

Add support for 10 bit gray and RGB encoding

You can now encode videos with 10 bit YUV encoding, writing them from either Gray or RGB buffers.
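As a sketch, a 10 bit round trip could look like the following; the element type Gray{N6f10} is my assumption about how 10 bit samples map onto Julia types, and the call forms mirror the examples above:

```julia
using VideoIO, Colors, FixedPointNumbers

# Encode a 10 bit gray stack (Gray{N6f10}: 10 significant bits stored in 16 bits).
imgstack10 = [rand(Gray{N6f10}, 480, 640) for _ in 1:50]
VideoIO.encode_mux_video("gray10.mp4", imgstack10; framerate = 30)

# Read the first frame back; its eltype should reflect the 10 bit depth.
reader = VideoIO.openvideo("gray10.mp4")
frame = read(reader)
close(reader)
```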

Add support for encoding frames that are "scanline major"

Most video packages expect pixels with neighboring x position and the same y position, that is, pixels adjacent on the same scanline, to be adjacent in memory. Encoding frames with this memory layout required the use of PermutedDimsArray with VideoIO. I have added a keyword argument, scanline_major, that allows this pervasive memory format to be used directly when encoding videos with VideoIO. Indeed, libav itself uses this format, so encoding video frames that are scanline major requires less work on VideoIO's end, and can be as simple as copying memory.
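For instance, frames stored width-by-height (so that pixels along a scanline are contiguous in Julia's column-major memory) could then be passed directly, rather than through a PermutedDimsArray. Only the scanline_major keyword name comes from this PR; the rest of the call is assumed:

```julia
using VideoIO, Colors, FixedPointNumbers

width, height = 640, 480
# The first dimension runs along a scanline (x), so horizontally adjacent
# pixels are adjacent in memory.
frames = [rand(Gray{N0f8}, width, height) for _ in 1:100]

VideoIO.encode_mux_video("out.mp4", frames; framerate = 30, scanline_major = true)
```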

Eliminate use of deprecated functions in libav

I have removed calls to deprecated libav functions and replaced them with the recommended alternatives.

Removed deprecated function    Replacement
avpicture_get_size             av_image_get_buffer_size
avcodec_decode_video2          avcodec_send_packet and avcodec_receive_frame

Replacing avcodec_decode_video2 required changing the logic of the decoding loop, in order to handle EAGAIN errors as per the libav documentation.
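The resulting loop alternates send and receive calls and treats EAGAIN as "feed me another packet" rather than as an error. A hedged sketch of the pattern, with placeholder names for the error constants and the downstream frame handler:

```julia
ret = avcodec_send_packet(codec_context, packet)
ret < 0 && error("Error sending packet to decoder")
while true
    ret = avcodec_receive_frame(codec_context, frame)
    if ret == VIO_AVERROR_EAGAIN || ret == VIO_AVERROR_EOF
        break                          # EAGAIN: send the next packet; EOF: decoder drained
    elseif ret < 0
        error("Error receiving frame from decoder")
    end
    handle_decoded_frame(frame)        # placeholder for the rest of the pipeline
end
```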

There is one more deprecated function that I replaced, but I am having a hard time finding which function that was.

Make implicit memory layout requirement of decoding pipeline explicit

The video decoding pipeline seemed to require that the frame buffer supplied by users be scanline-major, as described above, but did not require that the array be indexed one way or another. I have made this requirement explicit, and disabled some code that I think would have errored previously because it violated this assumption.

Use method dispatch in favor of if-else statements where possible

Instead of big if-else statements that match the user-supplied buffer type with a pixel format, and that scale poorly with the number of buffer and pixel types supported, I have moved much of this logic into method dispatch. The type of the user-supplied buffer is now handled by method dispatch, instead of if-else statements, when encoding and decoding video.
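A hypothetical illustration of the style; the function name and the constant paths below are illustrative, not VideoIO's actual identifiers:

```julia
using VideoIO, Colors, FixedPointNumbers

# Map a buffer element type to a target libav pixel format via dispatch,
# rather than branching on the type inside one large if-else block.
target_pix_fmt(::Type{Gray{N0f8}})  = VideoIO.AV_PIX_FMT_GRAY8
target_pix_fmt(::Type{Gray{N6f10}}) = VideoIO.AV_PIX_FMT_GRAY10LE
target_pix_fmt(::Type{RGB{N0f8}})   = VideoIO.AV_PIX_FMT_RGB24

# Supporting a new buffer type is then just another method, not another branch.
target_pix_fmt(buf::AbstractMatrix{T}) where {T} = target_pix_fmt(T)
```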

Refactor repeated code snippets into functions

Some code snippets, such as "pumping" a video container for frame data, were repeated in many different functions. I have tried to refactor these snippets into a single function where I noticed them.

Make decoding image buffer compatible with multi-planar image formats

The image buffer used by the decoding logic to handle containers with multiple video streams did not account for the use of multi-planar image formats. I have used libav functions to transfer the image planes to the VideoIO-maintained image buffer, which allows it to work with multi-planar image formats. I am unclear whether this image buffer is ever used, but at least it's no longer wrong.
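For reference, libav's av_image_copy_to_buffer walks every plane (and its linesize) for a given pixel format, which is what makes this correct for multi-planar formats. A sketch of the call, with the frame field access through the struct wrapper assumed:

```julia
# Copy every plane of `frame` into a flat, Julia-owned buffer.
required = av_image_get_buffer_size(frame.format, frame.width, frame.height, align)
buf = Vector{UInt8}(undef, required)
GC.@preserve buf av_image_copy_to_buffer(
    pointer(buf), length(buf),
    frame.data, frame.linesize,            # per-plane data pointers and strides
    frame.format, frame.width, frame.height, align,
)
```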

New interface to interact with nested C structs in libav

One large change in this PR is that it introduces a new type, NestedCStruct, and associated methods, to make interacting with nested C structs require less code in Julia, and additionally fixes some potential memory bugs in the existing code. This type and related functions are in the new file avptr.jl. Interacting with libav (FFMPEG) requires receiving, modifying, and passing structs that are part of the API of that library. Interacting with these C structs requires modifying their fields and generating pointers to them. The existing codebase would often store these structs as length 1 arrays of Julia structs that match the memory layout of the C struct. However, accessing this memory and modifying individual fields would often boil down to generating pointers to the array, and then modifying fields of the Julia struct directly with pointer logic, requiring many unsafe_store! and unsafe_load statements throughout the codebase, without any calls to GC.@preserve etc. to ensure that the underlying memory was not garbage collected before the pointer was used. Additionally, even if the GC.@preserve macros were added in the right places, my reading of Julia's semantics is that getting pointers to fields of Julia structs is not supported, and while the existing code might work at the moment, there is no guarantee that future compiler or garbage collector optimizations will not break it.

Beyond the potential memory bugs, interacting with these C structs required a fair amount of boilerplate code, which I suspect led to some C structs of interest (e.g. AVStream) being cached as fields of multiple mutable structs in VideoIO. This meant that there was no single authoritative source for a piece of information. For example, the contents of StreamInfo in the current code base are entirely redundant with the VideoReader struct. If accessing and manipulating the contents of nested C structs were easier, I suspect there would be less redundant information in VideoIO's current types.

My solution, NestedCStruct, simply acts like a typed double pointer to a C struct. It does not "own" the underlying memory, though it can be used with finalizers to automatically manage memory that is dynamically allocated by libav. Reading fields of the C struct, and modifying them, is accomplished automatically using Julia's reflection capabilities, in a way that (I believe) is memory-safe and protects the memory from being prematurely garbage collected. This means that functions in VideoIO do not need to invoke unsafe functions (unsafe_store!, unsafe_convert, etc.) to manipulate the C struct. Interaction with C functions is relatively painless thanks to unsafe_convert methods that provide double or single pointers to the wrapped C struct as required by the C function's type signature. Importantly, because the memory is internally represented as a pointer, and not as a Julia struct, this design no longer relies on Julia behavior that works for now but is not actually supported.
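To make the idea concrete, here is a minimal sketch of what such a wrapper can look like. This is an illustration of the approach only, not the actual avptr.jl code (it is deliberately named CStructWrapper to avoid confusion with the real NestedCStruct, and it handles only bits-type fields):

```julia
# A typed "double pointer": a Ref holding a Ptr to a C struct of type T.
struct CStructWrapper{T}
    ptr::Base.RefValue{Ptr{T}}
end

# Pointer to a named field, computed from the struct's layout.
function field_ptr(s::CStructWrapper{T}, name::Symbol) where {T}
    i = findfirst(==(name), fieldnames(T))
    i === nothing && throw(ArgumentError("$T has no field $name"))
    return Ptr{fieldtype(T, i)}(getfield(s, :ptr)[] + fieldoffset(T, i))
end

# Field access reads like C's `->`, spelled with `.` in Julia; the wrapper is
# preserved across the unsafe call so its Ref cannot be collected mid-access.
Base.getproperty(s::CStructWrapper, name::Symbol) =
    GC.@preserve s unsafe_load(field_ptr(s, name))
Base.setproperty!(s::CStructWrapper, name::Symbol, x) =
    GC.@preserve s unsafe_store!(field_ptr(s, name), x)
```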

This type, and the functions that go along with it, is likely a subset of Blobs.jl. Whether VideoIO wants to maintain its own code to access nested C structs, or instead wants to add a dependency, is a good question for discussion. Since I was able to make this solution work, and (IMO) be very ergonomic, in ~82 LOC (with lots of spacing), I think it might not be worth adding a new dependency.

I have additionally added a macro, @avptr, which makes it easy to assign an alias to a NestedCStruct that wraps a particular C struct. It additionally generates constructors that call the allocating C functions, and attaches finalizers to the resulting object that call the appropriate destructor. This allows types in VideoIO to have fields whose types have names similar to the struct names in libav, for example AVFramePtr, which is just an alias for NestedCStruct{AVFrame}. It also reduces the amount of boilerplate required to use libav's functions to allocate the C struct, and to free it when Julia no longer uses it.

The consequence of this new type is that a lot of boilerplate needed to interact with C structs is eliminated across VideoIO, redundant fields in many of VideoIO's types are removed, and some types (e.g. StreamInfo) are eliminated altogether. Additionally, accessing C structs through multiple layers of nesting in Julia looks much more like accessing these structs in C, except that the -> operator is replaced by a . in Julia code. In contrast, doing the same in the existing code base would require (and often did) using unsafe_load at each layer of nesting, and would further require the use of GC.@preserve at multiple stages to avoid memory bugs (although that was absent in the code).

Use properly initialized libav structs

AVFrames in VideoIO were not initialized by libav, but instead were initialized by Julia. This resulted in many of the fields being set to invalid, or non-default, values from the perspective of libav. While it is unclear whether passing these improperly configured structs to libav was causing any problems, some behavior of VideoIO relied on the default values assigned by Julia rather than the default values assigned by libav.

Reduce type instability

Some of VideoIO's types had abstract field types that did not need to be abstract. For example, VideoReader's field avin had the type AVInput, which is abstract. I have parameterized (and made concrete) the type of these fields where possible, which reduces type instability in the generated code.
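A toy illustration of the change; the types here are stand-ins, not VideoIO's definitions:

```julia
abstract type AVInput end           # stand-in for VideoIO's abstract input type
struct FileInput <: AVInput end     # hypothetical concrete subtype

# Before, `avin::AVInput` made every access to `reader.avin` type-unstable.
# After, the concrete type is carried in a parameter, so accesses infer concretely.
mutable struct VideoReader{I<:AVInput}
    avin::I
end

reader = VideoReader(FileInput())   # VideoReader{FileInput}
```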

Use locks around functions that are not thread safe

Some libav functions, such as avcodec_open2, are not thread-safe according to the libav documentation. I have added a constant global ReentrantLock, VIDEOIO_LOCK, to VideoIO and used it to prevent multiple Julia threads from using non-thread-safe functions.
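For illustration, guarding one such call with the lock might look like this; the avcodec_open2 binding is assumed to be whatever VideoIO's libav wrappers expose:

```julia
const VIDEOIO_LOCK = ReentrantLock()

function open_codec_locked(codec_context, codec, options)
    # avcodec_open2 is documented as not thread-safe, so serialize calls to it.
    lock(VIDEOIO_LOCK) do
        ret = avcodec_open2(codec_context, codec, options)
        ret < 0 && error("Could not open codec")
    end
end
```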

closes #270, closes #271, closes #274, closes #275, closes #242, closes #283, closes #284, closes #285

@galenlynch (Collaborator, Author)

Also, the tests currently fail in part because they do not expect all frames to survive a round trip through VideoIO. I will work on writing tests for the new code, and on updating existing tests to no longer expect the wrong result. However, I would definitely appreciate some help, if you have any time.

src/avio.jl Outdated
out_img_check(r, buf) = out_img_size_check(r, buf) && out_img_eltype_check(r, buf)

function out_bytes_size(fmt, width, height, align = VIDEOIO_ALIGN)
sz = av_image_get_buffer_size(fmt, width, height, VIDEOIO_ALIGN)
@galenlynch (Collaborator, Author)

This should be align

src/encoding.jl Outdated
pret = Cint(0)
while pret >= 0
pret = avcodec_receive_packet(encoder.codec_context, encoder.packet)
@show encoder.packet.pts, encoder.packet.dts, encoder.packet.duration
@galenlynch (Collaborator, Author)

These should be deleted. Will add commit soon.

test/avptr.jl Outdated
end

@testset "@avptr" begin
@avptr AVPtrTestParentStruct TestParentStruct
@galenlynch (Collaborator, Author)

Incomplete test....

@dawbarton (Contributor)

Thanks for the heads-up. I'll try to take a look at this later today (or maybe later in the week). An initial comment is that the solution I implemented in #277 is pretty basic; it should transfer all the image planes (if I understand the terminology correctly - I might not, as I'm quite new to this), but it does so in a very simplistic way, in that it just dumps the entire frame (including all image planes) into a single Vector{UInt8}. The user then has to manually separate out the different image planes.

It looks like your solution is a lot more comprehensive than mine and so I'd be happy if my changes were undone assuming that I can still get hold of the untranscoded frames somehow.

@IanButterworth (Member) commented Nov 23, 2020

I haven't reviewed the code yet, but the intentions and fixes sound excellent. Thank you.

I'm not against API changes in favor of more logical designs. Perhaps we should start a roadmap to version 1

@galenlynch (Collaborator, Author) commented Nov 23, 2020

It might work in practice for multi-planar image formats, but I'm not sure it will work in general. An AVPicture in ffmpeg is described by four or more pointers, each of which points to a different plane of an image. How many image planes are used to store an AVPicture depends on the pixel format. Your code, as far as I can tell, uses only the first pointer, and copies data from that pointer into a buffer owned by Julia that is returned to the user. While this is semantically ignoring the other image planes in the trailing pointers, you are also copying more data than is just in the first image plane, because you use avpicture_get_size to get the image size in bytes, which should also include the other image planes. I suspect that the other data pointers in an AVPicture often point to a contiguous piece of memory that contains all the image planes (and are hopefully in the right order here?), so doing this may be successful even if I don't think it's explicitly considering the other image planes. At least semantically, you run the risk of reading more data than is stored in the first pointer (possible segfault?), and not copying the entire AVPicture data. I couldn't find documentation that explicitly describes how AVPicture data is internally organized, and whether all the data has to be contiguous. Instead of trying to figure out the internal arrangement of AVPicture data buffers, I just used the functions provided by libav to copy image data to buffers.

@dawbarton (Contributor)

That definitely sounds like a better way of doing things. I have to admit that I was relying on the fact that the code path I was working on didn't work in the first place and so something is better than nothing. I started working my way through the FFMPEG API yesterday and was rapidly coming to the conclusion that things were not done in an ideal way.

@galenlynch (Collaborator, Author) commented Nov 23, 2020

@ianshmean starting a roadmap to v1 sounds like a good idea. I have some other suggestions for breaking changes that I think would be an improvement to VideoIO's API, but will wait for such a road map to propose them.

@IanButterworth (Member)

Thanks for the continued work on this @galenlynch
We should make a plan to draw a line in the sand: merge this, start a NEWS.md to give upgrade guidance for breaking changes, and then get a new breaking release out. Then, after a little time in the wild, release 1.0.

Before we merge:

  • Are there any changes you think we should make to master before this PR, to release as a patch update etc. ?
  • Is there anything else you wanted to get in here?
  • We should delete the Travis and perhaps the AppVeyor CI YAMLs, given we cover Windows on GitHub now
  • Drone CI: The test failures may be real? Could be updated to 1.5?

@galenlynch (Collaborator, Author)

No problem. At this point I'm just trying to remove any bugs that I find as I use this PR, which I've been using for all of my decoding and encoding needs.

We still need to decide whether we will change the names of functions used to decode and encode videos, or instead change how they are used.

  • Are there any changes you think we should make to master before this PR, to release as a patch update etc. ?

I could take a look at splitting off non-breaking patches, but it would probably take a bit of effort given how much of the plumbing has changed.

  • Is there anything else you wanted to get in here?

Not on my end. I would be happy to throw some stuff in if you have suggestions, or at least try to. It might not be too hard to do while everything's still "fresh" in my mind. However I'm not sure if I could devote a lot of time to it at the moment.

  • We should delete the Travis and perhaps the AppVeyor CI YAMLs, given we cover Windows on GitHub now

Sounds good

  • Drone CI: The test failures may be real? Could be updated to 1.5?

I haven't really been paying attention to this since it doesn't seem to be correlated with whether tests pass or fail for me locally, and I don't understand it. Is this just testing on ARM? I'm confused why method errors would be present on one platform and not the others. I could look into it though.

@IanButterworth (Member)

Great

I'm confused why method errors would be present on one platform and not the others

First thing I'd suggest is bumping drone to 1.5. If issues remain, I can help look into it.

We still need to decide whether we will change the names of functions used to decode and encode videos, or instead change how they are used.

I'd say go with your new names and deprecate the old ones.

  I could take a look at splitting off non-breaking patches, but it would probably take a bit of effort given how much of the plumbing has changed.

Let's not worry. Onward and upward

How's this for a plan..

  1. We fix CI then merge this
  2. We prepare a PR to deprecate old functions (with proper deprecation warnings/helpers) & update docs
  3. Release a breaking change version (v0.9.0) and see how people get on
  4. Aim for v1.0 within a month or so

@IanButterworth (Member) commented Feb 3, 2021

@galenlynch It'd be good to move this forward.

Can you review and merge this into your branch, to tidy up the CI failures?

Fix CI [merges into add_10bit_gray_encoding]
@galenlynch (Collaborator, Author)

Yeah sorry that I haven't made much forward progress recently: had a few looming deadlines. I should be more free after next week, though I might also be able to do some work this weekend.

@IanButterworth (Member)

@galenlynch how are things looking?

Shall we merge and move on to the deprecations and updating docs?

@galenlynch (Collaborator, Author)

Sounds good to me! Was just about to check in to see what the status of this was.

@IanButterworth (Member)

I was just about to merge, but wondered what kind of merge we should do here. I usually squash, but is there reason to keep the commits separate?

@galenlynch if I don't hear back in a few hours I'll squash merge

@IanButterworth merged commit 0388303 into JuliaIO:master on Mar 11, 2021
@IanButterworth (Member)

Exciting! So next steps:

  • We prepare a PR to deprecate old functions (with proper deprecation warnings/helpers) & update docs
  • Release a breaking change version (v0.9.0) and see how people get on
  • Aim for v1.0 within a month or so

@IanButterworth (Member)

Perhaps we could also work out FileIO integration for the 0.9 release

@yakir12 (Contributor) commented Mar 11, 2021

Super exciting!!! Thanks for this!!!

@IanButterworth (Member)

@galenlynch I don't think the changelog does everything in this PR justice. Could you add a few brief notes to the changelog, perhaps linking to the appropriate PR (most of which will be this one)?

@galenlynch (Collaborator, Author)

Sure

@galenlynch deleted the add_10bit_gray_encoding branch on March 20, 2021
@IanButterworth (Member)

@galenlynch sorry to pester, but when do you think you could update the changelog? I'm hoping to not have v0.9 linger any longer

@galenlynch (Collaborator, Author)

Yeah sorry I can do it this weekend?

@IanButterworth (Member)

That'd be great, thanks

@IanButterworth moved this from In progress to Done in Roadmap to V1 on Apr 1, 2021