sample-encode: measure sample video stream size instead of file size #82
I was getting very weird results with mp4 files, but then I saw that the sample duration was not being correctly detected.
That directly correlates with the difference in reported stream size (57% vs 100%). Just to be sure, I remuxed the mp4 video stream into mkv: the duration showed correctly (20s), and the stream and file sizes were virtually identical and also matched the full mp4 file size, with insignificant overhead. Both the mp4 and mkv containers seem to have minimal overhead, and according to the documentation I could find (such as the Matroska overhead comparison tests), it should stay fairly low: certainly below 3%, and more commonly about 2% or less for some files. It is not a big difference, but since the typical overhead percentage is known and a correction would be fairly easy to implement, it is a free gain that brings the predictions closer. I will download and start using the new version to see how it works.
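A minimal Rust sketch of that correction, assuming a fixed overhead. The 2% figure is taken from the rough numbers quoted above rather than measured, and `ASSUMED_CONTAINER_OVERHEAD` and `estimate_stream_size` are hypothetical names, not part of ab-av1.

```rust
/// Assumed fraction of a file taken up by container overhead (2%).
/// Illustrative assumption only, not a measured constant.
const ASSUMED_CONTAINER_OVERHEAD: f64 = 0.02;

/// Hypothetical helper: estimate the video stream size in bytes by
/// removing the assumed fixed container overhead from the file size.
fn estimate_stream_size(file_size: u64) -> u64 {
    (file_size as f64 * (1.0 - ASSUMED_CONTAINER_OVERHEAD)).round() as u64
}
```

A fixed factor ignores that overhead varies with container and content, which is part of why measuring stream sizes directly comes up later in the thread.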
I've been testing the new method for calculating the size prediction, and in general it makes much more sense and seems a lot more consistent, especially the logic improvement that accounts for the real duration of the samples. However, I'm still getting pretty weird inconsistencies with mp4 files in the estimated percentage, which I think might be inherited from the previous version and may also be related to the incorrectly detected lossless sample duration (and size) shown above, but I'm not sure. Here is an example of the same video file tested in 3 different ways (with av1): the original mp4 file as a reference (how anybody would usually run ab-av1); the file remuxed into mkv with all the streams (keeping the same file size, to rule out or confirm a suspected container issue); and the file remuxed into mkv with only the video stream (to check prediction consistency in general, but especially between using the input size in the old method and the estimated video stream size in the new method). Old prediction method:
The mkv files more or less make sense according to the old method (one is a bit bigger because of the included audio stream), but the mp4 file does not. Its input file size is almost exactly the same as the first mkv, which includes all the streams, so the results for the mkv and mp4 files should have been very similar: they contain the same video and audio streams and are the same size. The problem comes from that 18% estimated percentage. This is shown more clearly when we test the new prediction method, which uses only the video stream size:
Now the predicted sizes are a lot more consistent, because all of them estimate the video stream size, which is the same in every case. Oddly, though, the mp4 file still shows that strange 18% estimated percentage, which now does not correspond to the approximate size at all. So something in the percentage calculation inherited from the previous version seems to be wrong, and it only seems to be inaccurate for mp4 files. You will know better how that works; maybe you already have an idea of what could be going wrong from the information I provided. Otherwise, you could reproduce the issue by comparing mp4 and mkv containers for the same video.
Here is the mediainfo output for one of the lossless mp4 samples from that test file, in case it helps. It shows two different durations, two bitrates, two stream sizes... I'm guessing something here is not being recognized correctly and is throwing off ab-av1's estimates. Windows File Explorer itself reports the duration of this lossless mp4 sample as 11s, and other lossless mp4 samples from this and other mp4 source files show similarly inaccurate durations and sizes:
Meanwhile, the samples from the file remuxed into mkv don't seem to have these kinds of problems, and ab-av1 identifies them correctly.
If the lossless sample containers are inflated, the only real way to fix the incorrect percent prediction is to read the video stream size properly. That could be possible just by parsing the ffmpeg output we already produce; I'm just not sure how reliable it'll be in the end.
@iPaulis, can you run this? The last line of the output should list the video stream size according to ffmpeg; that's the info we should be able to get at fairly easily.
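Reading that last line could look roughly like this. It is a Rust sketch under stated assumptions, not ab-av1's parser: it assumes ffmpeg's final stats line has the form `video:1234kB audio:0kB ...`, that the figure is a whole number of 1024-byte units, and that newer ffmpeg releases may label it `KiB` instead of `kB`. `parse_video_size` and `video_size_from_stderr` are hypothetical names.

```rust
/// Parse the video size, in bytes, from an ffmpeg final-stats line such as
/// `video:1234kB audio:0kB subtitle:0kB other streams:0kB ...`.
/// Assumes a whole number of 1024-byte units labelled `kB` or `KiB`.
fn parse_video_size(line: &str) -> Option<u64> {
    // The field may be preceded by a log prefix such as "[out#0/null @ ...]".
    let start = line.find("video:")? + "video:".len();
    let rest = &line[start..];

    // Split the leading digits from the unit suffix.
    let digits_end = rest
        .find(|c: char| !c.is_ascii_digit())
        .unwrap_or(rest.len());
    if digits_end == 0 {
        return None;
    }
    let units: u64 = rest[..digits_end].parse().ok()?;
    let suffix = &rest[digits_end..];

    if suffix.starts_with("kB") || suffix.starts_with("KiB") {
        units.checked_mul(1024)
    } else {
        None
    }
}

/// Return the video size from the last matching line of ffmpeg's stderr.
fn video_size_from_stderr(stderr: &str) -> Option<u64> {
    stderr.lines().rev().find_map(parse_video_size)
}
```

One way to produce such output for a sample, assuming a stream copy to the null muxer is acceptable, is `ffmpeg -i sample.mkv -map 0:v:0 -c copy -f null -`; the stats go to stderr, which `std::process::Command::output()` can capture.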
Yes, sure. Here are the last lines of the output for the source file:
The sizes and duration are 100% correct, and it took only about 10 seconds to read the whole file and report the results. And these are the results for a sample of that file:
There is a problem there. The size is correct (the same as the lossless samples generated from the mkv), but the time is not; it should be 20s, I checked. I saw you released a new version with more fixes, including a nice accuracy improvement for the size prediction when the video stream size method gave a higher prediction than the whole input size method, which was happening in some cases but should never happen.
This is the real encoded result (so the remuxed mkv files were correct in the encoding percentage prediction):
I don't know exactly where the percentage inaccuracy for mp4 files comes from, and it would be great to understand and fix the source of the issue itself.
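The safeguard mentioned above (a video-stream-based prediction should never exceed the whole-input prediction) can be expressed as a simple cap. This is a hedged Rust sketch, not ab-av1's code: it assumes the input video stream size is itself an estimate, for example derived from bitrate and duration, that can occasionally come out larger than the file, and `predicted_size` is a hypothetical name.

```rust
/// Hypothetical sketch: predict the encoded video size in bytes.
/// `encode_fraction` is the predicted encoded size as a fraction of the
/// original video (e.g. 0.25 for 25%).
fn predicted_size(input_file_size: u64, input_video_size: Option<u64>, encode_fraction: f64) -> u64 {
    let from_file = input_file_size as f64 * encode_fraction;
    let predicted = match input_video_size {
        // A video stream cannot be larger than the file containing it, so an
        // estimate that would predict more than the whole-file method is capped.
        Some(video) => (video as f64 * encode_fraction).min(from_file),
        // Without a stream size, fall back to scaling the whole file.
        None => from_file,
    };
    predicted.round() as u64
}
```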
I think this must be an edge case in ffmpeg where it only encodes 11s because of some quirk with the lossless samples, so the encode percent comes out too low. Just to clarify: is the video stream size there pretty much the same as the file size, then? I haven't seen this with mp4s myself, though. It's interesting that using mkv solves it; that could be a solution, as you suggest. However, I'll need to test more widely in case mp4->mkv sampling causes other issues.
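To see how a shorter encode drags the percentage down, it helps to compare bytes per second instead of raw bytes. The Rust sketch below is illustrative only: `encode_percent_per_second` and `encode_percent_by_size` are hypothetical helpers, and the per-second version assumes both durations are known accurately, which is exactly what the problematic mp4 samples don't provide. It shows the size of the error rather than fixing it.

```rust
/// Encoded size as a percentage of the lossless sample, compared per second
/// of video so that a duration mismatch doesn't skew the result.
fn encode_percent_per_second(
    lossless_bytes: u64,
    lossless_secs: f64,
    encoded_bytes: u64,
    encoded_secs: f64,
) -> f64 {
    let lossless_rate = lossless_bytes as f64 / lossless_secs;
    let encoded_rate = encoded_bytes as f64 / encoded_secs;
    100.0 * encoded_rate / lossless_rate
}

/// Plain size ratio as a percentage, ignoring duration.
fn encode_percent_by_size(lossless_bytes: u64, encoded_bytes: u64) -> f64 {
    100.0 * encoded_bytes as f64 / lossless_bytes as f64
}
```

With hypothetical figures, a 100 MB lossless sample holding 20s of video whose encode only covers 11s at 9 MB gives 9% by raw size but about 16.4% per second, so the raw ratio understates the real percentage by roughly a factor of 20/11.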
I've sourced some >20s samples to test with from 4kmedia & pixabay (it'll be useful to have a set of test videos to investigate this kind of thing). They are all .mp4; I've used them to test mp4 or mkv lossless samples.
Tests
4kmedia-sony-new-york-fashion-demo.mp4 results
mp4 -> mp4 lossless samples
mp4 -> mkv lossless samples
4kmedia-spacex-launch-demo.mp4 results
mp4 -> mp4 lossless samples
mp4 -> mkv lossless samples
pixabay-bridge-23544.mp4 results
mp4 -> mp4 lossless samples
mp4 -> mkv lossless samples
pixabay-elevator-3735.mp4 results
mp4 -> mp4 lossless samples
mp4 -> mkv lossless samples
pixabay-lemon-82602.mp4 results
mp4 -> mp4 lossless samples
mp4 -> mkv lossless samples
pixabay-nature-31377.mp4 results
mp4 -> mp4 lossless samples
mp4 -> mkv lossless samples
pixabay-sunrise-83880.mp4 results
mp4 -> mp4 lossless samples
mp4 -> mkv lossless samples
pixabay-turntable-8453.mp4 results
mp4 -> mp4 lossless samples
mp4 -> mkv lossless samples
Analysis
In all cases, the frame count and duration of the lossless vs encoded samples are closer to each other, and closer to 20s, with mkv than with mp4. In particular, pixabay-bridge-23544.mp4 and pixabay-lemon-82602.mp4 reproduce the case where the mp4 sample is significantly under 20s. Playing the lossless versions with mpv, I can see 4-5s of "negative time" before 0, so there is ~20s of video but with strange timing. When ffmpeg converts this to yuv streams for encoding or vmaf, though, this "negative time" doesn't seem to be used. This means the encoded samples have significantly fewer frames, and the file/video stream comparison is misleading because the lossless samples contain more video than they let on. In both cases the mkv samples had no issues. So, according to these results, using .mkv for lossless samples is a clear win for mp4 files.
Video stream size vs file size
Getting back to what this issue is really about, this testing also shows file size vs video stream size info, so we can compare encode-percent calculations using either. (We'll use the mkv sample results since they're better.)
This supports my initial feeling that there isn't a significant win in using video stream size over file size. If they were equally easy to do, I would still use the video stream comparison; however, measuring file size is easier as it doesn't involve parsing ffmpeg output.
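Given the analysis above, switching lossless samples to mkv could come down to choosing the sample's output extension independently of the input's. A hedged Rust sketch: `lossless_sample_path` and the `<stem>.sample<N>.mkv` naming scheme are hypothetical and not how ab-av1 actually names its samples.

```rust
use std::path::{Path, PathBuf};

/// Hypothetical helper: build the path for a lossless sample next to the
/// input, always using the .mkv container regardless of the input extension.
fn lossless_sample_path(input: &Path, sample_index: u32) -> PathBuf {
    // Fall back to a generic stem if the input name isn't valid UTF-8.
    let stem = input
        .file_stem()
        .and_then(|s| s.to_str())
        .unwrap_or("sample");
    input.with_file_name(format!("{stem}.sample{sample_index}.mkv"))
}
```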
I would strongly recommend against that. I've also been testing the input size method vs the video stream method, and using the whole input has an inherent inaccuracy. It gives the most reliable and accurate prediction when the video stream makes up most of the file size (probably the case for the test files you used), but the larger the share taken by the audio streams, the greater the inaccuracy it introduces. This is especially noticeable in files with multiple audio tracks, high-quality audio that takes up a lot of space, etc.
The new method is close enough; that's fine. Another file (real encoded video stream size is 6.0 GB):
So, in this case the new method overestimates the size (which could be improved), but it is still closer than the old method. We should be able to find a more reliable and accurate approach than the current method without regressing to the input method and without needing to parse ffmpeg output.
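The effect described above can be quantified: scaling the whole input by the encode fraction also scales the non-video bytes, so the overestimate grows with the share taken by audio and other streams. A small Rust sketch, with `whole_file_overestimate` as a hypothetical helper.

```rust
/// Extra bytes predicted when the encode fraction is applied to the whole
/// input file instead of just its video stream. Audio, subtitles and
/// container overhead get scaled as if they were video.
fn whole_file_overestimate(input_file_size: u64, input_video_size: u64, encode_fraction: f64) -> f64 {
    let non_video = input_file_size.saturating_sub(input_video_size) as f64;
    non_video * encode_fraction
}
```

With hypothetical figures, a 10 GB file containing 3 GB of audio and a 30% encode fraction predicts 0.9 GB too much: 3.0 GB instead of 2.1 GB, about 43% over.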
This is a different issue, though. I'm talking about lossless sample file size vs encoded sample file size, which according to these tests is pretty much the same as using the video stream size while being much easier to do. The full input video stream size isn't relevant to calculating the encode percentage. It could make the full predicted video stream size more accurate, but it involves a full scan of the input, which isn't desirable. The predicted size also isn't as important as the percentage, since the latter is used as part of a crf-search.
I've raised #91 so we can further investigate getting the full input video stream size for the predicted size calculation (since you won't let me avoid it 😆)
Oh, you meant only for the samples, sorry for the misunderstanding. Yes, sure: as the samples don't have audio streams, the sample file size should be pretty much the same as the sample's video stream size, with an almost insignificant overhead difference. Anyway, I believe there is a way to improve the final predicted video stream size, but I have to leave now; I'll explain later and I hope you find it useful. Sorry to bother you, it is not my intention; I just want to be helpful.
It's been super helpful mate, I appreciate the input!
I'm glad to help. I know you do; thank you for listening to all these requests and doing all the hard work. I really can't stress enough how helpful ab-av1 is and will continue to be. Great tool! I do a lot of encodes and it saves me a lot of time.
When calculating the predicted encode size percent, we compare the sum of the sample file sizes to the sum of their encoded counterparts. This works fairly well since the samples and encoded samples contain only a video stream.
However, there may also be some container overhead for each sample that introduces inaccuracy. So we could instead measure the sum of the lossless sample video stream sizes and compare that to the encoded video streams, specifically avoiding any container overhead on either side. Also see the discussion in #79.
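The sum-based comparison described above can be sketched in Rust as follows. `SampleSizes` and `encode_percent` are hypothetical names rather than ab-av1's actual types; the same calculation applies whether the sizes are whole-file sizes or video stream sizes, and only the measurement fed in changes.

```rust
/// Sizes (in bytes) measured for one sample before and after encoding.
/// These can be file sizes or video stream sizes.
struct SampleSizes {
    lossless: u64,
    encoded: u64,
}

/// Predicted encode percentage: the sum of encoded sizes over the sum of
/// lossless sizes, so larger samples carry proportionally more weight.
fn encode_percent(samples: &[SampleSizes]) -> Option<f64> {
    let lossless: u64 = samples.iter().map(|s| s.lossless).sum();
    let encoded: u64 = samples.iter().map(|s| s.encoded).sum();
    if lossless == 0 {
        return None;
    }
    Some(100.0 * encoded as f64 / lossless as f64)
}
```

One consequence worth noting: a fixed per-file overhead `o` turns each ratio into `(E + o) / (L + o)`, which is always closer to 100% than `E / L` when the encode is smaller than the original, so small encoded samples are affected most.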
Measuring video streams is a little harder than measuring file sizes and may be a bit less reliable too, as we have to trust what ffmpeg tells us. It also seems to be in "kB" precision only; I'm not sure whether we can configure that.
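To judge whether "kB" precision matters, the worst-case relative error can be bounded. A Rust sketch assuming ffmpeg's "kB" means 1024 bytes and that the reported figure is off by less than one unit (which covers both rounding and truncation); `max_relative_error` is a hypothetical name.

```rust
/// Upper bound on the relative error of a size reported in whole units of
/// `unit_bytes`, assuming the reported value is off by less than one unit.
/// `size_bytes` must be non-zero, otherwise the result is infinite.
fn max_relative_error(size_bytes: u64, unit_bytes: u64) -> f64 {
    unit_bytes as f64 / size_bytes as f64
}
```

For a 1 MiB encoded sample that bound is about 0.1%, and it shrinks as samples get larger, so kB precision is unlikely to be the limiting factor at typical sample sizes.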
My current feeling is that this won't significantly improve the sample-encode video stream size predictions beyond the noise of using samples to deduce the total predicted stream size. So, to begin with, we should gather test cases, over many different sample counts, where calculating samples vs encoded samples using the video stream size instead of the full sample size would be more accurate.
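Once such test cases are gathered, comparing the two approaches per sample count could be as simple as a mean absolute error in percentage points. A Rust sketch: `Case` and `mean_abs_errors` are hypothetical, and the actual percentage is assumed to come from a full encode of each test file.

```rust
/// One measured case: predicted encode percentages from each method and the
/// actual percentage from a full encode.
struct Case {
    samples: u32,
    by_file_size: f64,
    by_stream_size: f64,
    actual: f64,
}

/// Mean absolute error (percentage points) of the file-size and stream-size
/// predictions across all cases using the given sample count.
fn mean_abs_errors(cases: &[Case], samples: u32) -> Option<(f64, f64)> {
    let matching: Vec<&Case> = cases.iter().filter(|c| c.samples == samples).collect();
    if matching.is_empty() {
        return None;
    }
    let n = matching.len() as f64;
    let file = matching.iter().map(|c| (c.by_file_size - c.actual).abs()).sum::<f64>() / n;
    let stream = matching.iter().map(|c| (c.by_stream_size - c.actual).abs()).sum::<f64>() / n;
    Some((file, stream))
}
```

If the stream-size error isn't consistently lower than the file-size error across sample counts, that would support sticking with file sizes.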