video stream size prediction: measure input video stream size #91
Comments
As you said, the current method is not a consistently accurate approximation. It depends a lot on the source file itself, on how the encoding efficiency affects those specific samples, and on how many samples are created; the more samples, the more accurate the approximation, but with only a few of them, as with short videos, results may get skewed.

Here is an idea to achieve a more reliable, consistent and accurate prediction of the final encoded video stream size, based on an improved approximation and without having to scan the file at all. The aim is to get the percentage, and thus the size, of the video stream within the whole input file. That is very difficult to calculate from the sample video streams alone, because video bitrate varies a lot from sample to sample, but it should be more easily achievable by taking the audio streams into account.

So, ab-av1 only needs to create one full sample containing all streams (maybe 2, just to debug and confirm the audio streams are very similar in size, but that might be unnecessary, or only useful when the audio is VBR) with:

    other_streams_sample_size = full_sample_size - video_sample_size

As the bitrate of those streams is mostly constant, we can extrapolate to the whole file (this should be more consistent than the current method):

    other_streams_size ≈ other_streams_sample_size * input_duration / sample_duration

Now we should be able to get a closer approximation to vstream_size:

    vstream_size ≈ input_file_size - other_streams_size
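The steps above can be sketched as a small function (a hedged illustration with my own variable names, not ab-av1's internals; sizes in bytes, durations in seconds, and the numbers in the example are made up):

```python
# Sketch of the proposed audio-based estimate, assuming non-video streams
# have roughly constant bitrate. Names are illustrative, not ab-av1's.

def predict_vstream_size(input_size, input_duration,
                         full_sample_size, video_sample_size,
                         sample_duration):
    """Estimate the input's video stream size without scanning the file.

    full_sample_size  - sample encoded with ALL streams mapped
    video_sample_size - the same sample with only the video stream
    """
    # Non-video bytes per second in the sample; extrapolate to the whole
    # file since audio/subtitle bitrates are assumed ~constant.
    other_bitrate = (full_sample_size - video_sample_size) / sample_duration
    other_size = other_bitrate * input_duration
    return input_size - other_size


# Hypothetical numbers: a 20 s sample from a 5400 s, 20 GB input.
vstream = predict_vstream_size(
    input_size=20_000_000_000, input_duration=5400,
    full_sample_size=60_000_000, video_sample_size=50_000_000,
    sample_duration=20)
# other_bitrate = 10 MB / 20 s = 0.5 MB/s -> 2.7 GB of non-video streams
print(round(vstream))  # 17300000000
```

The final encoded-size prediction would then multiply this `vstream_size` estimate by the measured encode percent.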
That is the main idea; I hope it does not complicate things too much, because I believe it should give quite a lot better results in a greater variety of scenarios. There are already a couple of files I would like to test this with. I did a rough manual calculation, and in my limited testing it was more accurate every time with just a single 20s full sample as the only reference; it seems promising. What do you think?
That's an interesting idea, though I suspect it'll end up being an approximation of similar accuracy to the current encoded-sample-size * duration calculation. The sample vstream to audio stream proportion will fluctuate in the same way as sample vstream size (which is the cause of the current inaccuracy). It will also mean writing more samples, which is mildly undesirable. We could do better by just including all streams in the samples (or the first sample) and doing a scan of them to get the stream sizes. However, I think I'd prefer doing an async scan of the input, which will produce the video stream size. This option will be more accurate and I think simpler.
It is more accurate than the current method because the audio streams are not affected by the video stream's fluctuation (the cause of the current inaccuracy). I'm not saying to use video-to-audio proportions, which do fluctuate the same way and would keep the same inaccuracy, but to use the absolute sizes of the audio streams, taking advantage of the constant-bitrate nature of most audio and subtitle codecs, as shown in the equations I wrote above. It only requires creating 1, at most 2, more samples with all the streams. Then the absolute size of all the non-video streams can be obtained with a simple subtraction: full sample minus video sample.

The scanning method is of course the most accurate way, without a doubt, but it needs extensive HDD operations on big files that may take more than 10 minutes to complete, as it did for one of my files. And the bigger the file, the longer it takes, as with higher-resolution 4K files, uncompressed raw video, and so on. I honestly think this new approximation could prove more reliable and accurate, so scanning could be avoided entirely.

This idea is a different kind of approach and should result in a noticeably more accurate approximation; it did in my tests. I can show you an example with real numbers later, but trust me, it is noticeably different from the current method (unless the current method was already spot on for a specific file, in which case results are similar; but for files where the current method deviated, this new approach makes a good difference).
Some rough tests

Audio sample size approx. method:
Source file 27,3GB = 19,1GB vstream + 8,2GB other streams (12 audio + 13 subtitle streams)
Real encoded video stream size is 6,1GB:
Previous prediction was fine, but this one is better.

Source file 24,3GB = 13,3GB vstream + 11GB other streams (1 secondary video + 12 audio + 17 subtitle streams)
Real encoded video stream size is 6,0GB:
Previous prediction was not very good; the new one is closer. I'll test a bit more with less complex source files to see how it behaves, but from what I've seen this is a more accurate method.
Source file 22,9GB = 18,4GB vstream + 4,5GB other streams (2 audio + 2 subtitle streams)
Real encoded video stream size is 3,86GB:
This case should have been a bit challenging for the new method, as both audio streams were variable bitrate (unless I was extremely lucky with the random sample I created), but even so the results were better than the previous method's, which were already pretty good.

Source file 11,4GB = 10,3GB vstream + 1,1GB other streams (2 audio + 6 subtitle streams)
Real encoded video stream size is 2,27GB:
This would be the best-case scenario, because both audio streams are constant bitrate, and it shows in how accurate the result is.

Source file 14,2GB = 11,2GB vstream + 3GB other streams (4 audio + 8 subtitle streams)
Real encoded video stream size is 5,15GB:
Another easy case for the new method, as all the audio streams are CBR; thanks to that, it offered a far better approximation than the current method, which overestimated the encoded size.

Source file 17,9GB = 16,9GB vstream + 1GB other streams (2 audio + 3 subtitle streams)
Real encoded video stream size is 2,37GB:
It seems that the smaller the audio streams (which is the most common situation), the more accurate this method becomes, but even in extreme conditions with 12 audio streams the results were pretty good, better than the current predictions. I'm more convinced the more I test it; you may try it and see for yourself.

PS: By the way, the new results cache system is wonderful.
OK, the last one: I tried to find one of the most complex files I have. However, I did all my tests with a single random 20s sample per test and the results were already pretty good, so unless some particular file is exceptionally difficult, like one with lots of variable-bitrate audio streams, that should be enough. This is the ffmpeg command (apparently writing the seeking parameters at the beginning helps accelerate the process very significantly):

Source file 29,9GB = 22,4GB vstream + 7,5GB other streams (13 audio + 28 subtitle streams)
Real encoded video stream size is 4,26GB:
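The exact command wasn't preserved above, but the input-seeking technique it mentions can be sketched like this (a hypothetical reconstruction, not the original command; `sample_full.mkv` and the helper name are my own):

```python
# Putting -ss/-t BEFORE -i makes ffmpeg seek on the input (by keyframe,
# before demuxing), which is far faster than decoding up to the cut point.

def sample_cmd(src, start, length=20, out="sample_full.mkv"):
    return ["ffmpeg",
            "-ss", str(start), "-t", str(length),  # input seeking: fast
            "-i", src,
            "-map", "0",     # keep every stream (video/audio/subtitles)
            "-c", "copy",    # no re-encode; we only want the sizes
            out]

cmd = sample_cmd("input.mkv", 1234)
print(" ".join(cmd))
```

Note that with `-c copy`, input seeking cuts at keyframes, which is fine here since only the stream sizes matter.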
I hope you find this method effective; it really works and requires minimal processing.
Thanks! You've convinced me it's worth trying this out. Perhaps I can just include audio & subs in all lossless samples and parse the audio+sub stream sizes from ffmpeg to make the calculation. Videos can have attachments too, like fonts, but that's probably something we can look closer at later. I'll test this approach when I have some time.
Currently we have a couple of methods that use the measured sample encode percent to predict the final encoded video stream size of the whole input.
input_file_size * encode_percent
Original method; it can work fairly well when the video stream is the vast majority of the file size. This falls down otherwise, e.g. when there are multiple large audio tracks.

encode_size * input_duration / sample_duration
This can be better when the previous method would over-estimate. However, sample sizes tend to be a worse approximation of average size than they are VMAF indicators, so this is also not super accurate.

Since the first method's main problem is over-estimating, we take the minimum of these two predictions.
A more accurate approach would be to use the input video stream size.
input_vstream_size * encode_percent
which directly works around the issues with the first approach.

The problem with this is that to calculate the input video stream size we need to fully scan the file with ffmpeg. This could be undesirable in cases where the input is on slower storage. It's a relatively expensive operation and only helps the predicted video stream size output, which is only shown in crf-search.
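One hedged way such a scan could be done (my own helper names, under the assumption that summing video packet sizes via ffprobe is acceptable) is:

```python
# Measure the true input video stream size by summing every video packet's
# size reported by ffprobe. This reads the whole file -- the expensive scan
# discussed above -- but does no decoding.
import subprocess

def vstream_size_cmd(src):
    return ["ffprobe", "-v", "error",
            "-select_streams", "v:0",
            "-show_entries", "packet=size",
            "-of", "csv=p=0",
            src]

def sum_packet_sizes(csv_output):
    # One packet size per line; ignore blank or non-numeric lines.
    return sum(int(line) for line in csv_output.split() if line.isdigit())

def vstream_size(src):
    out = subprocess.run(vstream_size_cmd(src), capture_output=True,
                         text=True, check=True).stdout
    return sum_packet_sizes(out)
```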
On the other hand, we may be able to scan the input concurrently with the crf-search, since it isn't a CPU-intensive operation. So it may be worth it, depending on how valuable the predicted size is to the user.