sample-encode: measure sample video stream size instead of file size #82

Closed
alexheretic opened this issue Nov 29, 2022 · 16 comments

Comments

@alexheretic
Owner

alexheretic commented Nov 29, 2022

When calculating the predicted encode size percent we compare the sample file size sum to their encoded counterparts. This works fairly well since the samples & encoded samples contain only a video stream.

However, there may also be some container overhead for each sample that introduces inaccuracy. So we could instead measure the lossless sample video stream size sum and compare that to the encoded video streams specifically, avoiding any container overhead on each side. Also see discussion in #79.

Measuring video streams is a little harder than measuring file sizes and may be a bit less reliable too, as we have to trust what ffmpeg tells us. It also seems to be in "kB" precision only; I'm not sure if we can configure that or not.

My current feeling is that this won't significantly improve the sample-encode video stream size predictions over the noise of using sample sizes to deduce the total predicted stream size. So to begin with we should gather some test cases where calculating the samples vs encoded samples using video stream instead of full sample size would be more accurate over many different sample counts.
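To make the calculation described above concrete, here is a minimal Python sketch of the percent prediction. The function name and the sample sizes are hypothetical and are not taken from ab-av1's code; the same ratio applies whether the inputs are sample file sizes or video stream sizes.

```python
def predicted_encode_percent(lossless_sizes, encoded_sizes):
    """Encoded size as a percentage of lossless size, compared as sums."""
    if len(lossless_sizes) != len(encoded_sizes):
        raise ValueError("each lossless sample needs exactly one encoded counterpart")
    total_lossless = sum(lossless_sizes)
    if total_lossless <= 0:
        raise ValueError("lossless samples must have a positive total size")
    return 100.0 * sum(encoded_sizes) / total_lossless


# Hypothetical sizes (any consistent unit) for three samples.
lossless = [20000, 18000, 22000]
encoded = [5000, 4500, 6100]
print(f"{predicted_encode_percent(lossless, encoded):.2f}%")
```

Because the sums are compared rather than per-sample ratios, larger samples carry proportionally more weight in the result.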

@iPaulis

iPaulis commented Nov 29, 2022

I was getting very weird results with mp4 files, but then I saw that the sample duration was not being correctly detected.
It was being reported like this:

Duration : 11 s 0 ms
Original duration : 20 s 0 ms

That directly correlates to the differences in stream size being reported (57% vs 100%). Just to be sure, I remuxed the mp4 video stream into mkv, and the duration was showing correctly (20s) and stream/file size were virtually identical and matched the full mp4 file size too with insignificant overhead.
So false alarm I guess.

Both mp4 and mkv containers seem to have minimal overhead, and according to the documentation I could find online (such as the matroska overhead comparison tests), it should remain fairly low: below 3% for sure, and more commonly about 2% or less for some files.

It is not a big difference, but since the typical overhead percentage is known and a correction is fairly easy to implement, it is a free gain toward closer predictions.

I will download and start using the new version to see how it works.

@iPaulis

iPaulis commented Dec 4, 2022

I've been testing the new method to calculate the size prediction. In general it makes much more sense and seems a lot more consistent, especially the logic improvement that accounts for the real duration of the samples.

However, I'm still getting pretty weird inconsistencies in the estimated percentage with mp4 files. I think they might be inherited from the previous version, and may also be related to how the lossless sample duration (and size) is incorrectly detected, as shown above, but I'm not sure.

Here is an example of the same video file tested in 3 different ways (with av1): the original file in mp4 as reference (how anybody would usually run ab-av1), then remuxed into mkv with all the streams (keeping the same file size, to rule out or confirm a suspected container issue), and also remuxed into mkv with only the video stream (to check prediction consistency in general, but especially between using the input size in the old method vs the estimated video stream size in the new method).

Old prediction method:

  • All streams remuxed into mkv: crf 17 VMAF 95.00 predicted full encode size 249.05 MiB (30%) taking 15 minutes
  • Only video stream remuxed into mkv: crf 17 VMAF 95.00 predicted full encode size 233.11 MiB (30%) taking 15 minutes
  • Original mp4 file as reference: crf 17 VMAF 95.12 predicted full encode size 153.14 MiB (18%) taking 10 minutes

The mkv results more or less make sense under the old method (one is a bit bigger because of the included audio stream), but the mp4 result does not. The mp4 input is almost exactly the same size as the first mkv, which includes all the streams, and both contain the same video and audio streams, so the results should have been very similar. The problem therefore comes from that 18% estimated percentage.

That is more clearly shown when we test the new prediction method for only vstream size:

  • All streams remuxed into mkv: crf 17 VMAF 95.00 predicted video stream size 225.15 MiB (30%) taking 15 minutes
  • Only video stream remuxed into mkv: crf 17 VMAF 95.00 predicted video stream size 225.14 MiB (30%) taking 15 minutes
  • Original mp4 file as reference: crf 17 VMAF 95.12 predicted video stream size 230.57 MiB (18%) taking 16 minutes

Now the predicted sizes are a lot more consistent because all of them estimate the vstream size, which is the same, but oddly the mp4 file still shows that weird 18% estimated percentage that now does not correspond to the approximate size at all. So there is something wrong in the percentage there inherited from the previous version, and only seems to be inaccurate for mp4 files.
I've tested some more mp4 vs mkv files, and not all of them are inconsistent in the same degree, but they are not correct. I think the inaccuracy depends on how correctly the duration/size of the mp4 samples is detected. In some samples the detected duration was almost 20s, like 18-19s (and size was also a bit off), and then the estimated percentage was only inaccurate by 1-2% points. That is why I believe there is a direct correlation, but I'm not sure, it's difficult to say.

You will know better how that works, maybe you already have an idea of what could be going wrong with the information I provided. Or you could reproduce the issue and confirm comparing mp4 vs mkv containers for the same video.

@iPaulis

iPaulis commented Dec 4, 2022

This is the mediainfo of one of the lossless mp4 samples of that test file in case it helps. It shows two different durations, two bitrates, two stream sizes... I'm guessing the problem is something here not being correctly recognized and messing with ab-av1's estimations. Windows file explorer itself states the duration of this mp4 lossless sample as 11s, and similar inaccurate duration/sizes are reported by other mp4 lossless samples too from this and other mp4 source files:

MP4 Video

ID : 1
Format : AVC
Format/Info : Advanced Video Codec
Format profile : Main@L3.1
Format settings : CABAC / 1 Ref Frames
Format settings, CABAC :
Format settings, RefFrames : 1 frame
Codec ID : avc1
Codec ID/Info : Advanced Video Coding
Duration : 11 s 0 ms
Original duration : 20 s 0 ms
Bit rate : 3 927 kb/s
Maximum bit rate : 3 733 kb/s
Width : 1 280 pixels
Height : 720 pixels
Aspect ratio : 16:9
Frame rate mode : Constant
Frame rate : 25.000 FPS
Color space : YUV
Chroma subsampling : 4:2:0
Bit depth : 8 bits
Scan type : Progressive
Bits/(pixel*frame) : 0.170
Stream size : 5.15 MiB (62%)
Original stream size : 8.32 MiB (100%)
Language : Russian
Codec configuration box : avcC

Meanwhile, the samples from the file remuxed into mkv don't seem to have that kind of problem and they are correctly identified by ab-av1.

@alexheretic
Owner Author

If the lossless sample containers are inflated we can only really fix the incorrect percent prediction by properly reading the video stream size. That could be possible just from parsing the ffmpeg output that is already happening, I'm just not sure how reliable it'll be in the end.

@alexheretic
Owner Author

@iPaulis can you run ffmpeg -i sample.mp4 -c copy -f null /dev/null (or maybe just ffmpeg -i sample.mp4 -c copy tmp.mp4 if /dev/null doesn't work on Windows).

The last line of output should list the video stream size according to ffmpeg, that's the info we should be able to get at fairly easily.
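For reference, a minimal Python sketch of pulling the sizes out of that summary line. The regular expression and the `parse_stream_sizes` name are illustrative assumptions, not ab-av1 code, and they only match the `video:NkB audio:NkB` wording that appears in ffmpeg output posted in this thread; other ffmpeg versions may print the line differently. Note that ffmpeg writes this summary to stderr, so that is the stream to capture.

```python
import re

# Matches the stream size summary as it appears in the output quoted in this thread.
_STREAM_SIZES = re.compile(r"video:(\d+)kB audio:(\d+)kB")


def parse_stream_sizes(ffmpeg_output):
    """Return (video_kb, audio_kb) from the last summary line, or None if absent."""
    matches = _STREAM_SIZES.findall(ffmpeg_output)
    if not matches:
        return None
    video_kb, audio_kb = matches[-1]
    return int(video_kb), int(audio_kb)


# Summary line copied from a sample run reported in this thread.
sample_output = (
    "video:8522kB audio:0kB subtitle:0kB other streams:0kB "
    "global headers:0kB muxing overhead: unknown"
)
print(parse_stream_sizes(sample_output))
```

Returning `None` rather than raising lets a caller fall back to the file-size comparison when the line cannot be found.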

@iPaulis

iPaulis commented Dec 5, 2022

Yes sure, here are the last lines of the output for the source file: ffmpeg -i source.mp4 -c copy -f null /dev/null

frame=43486 fps=4510 q=-1.0 Lsize=N/A time=00:28:59.49 bitrate=N/A speed= 180x
video:792718kB audio:54030kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: unknown

The sizes and duration are 100% correct and it took just about 10 seconds to read the whole file and give the results.

And these are the results for a sample of that file: ffmpeg -i sample.mp4 -c copy -f null /dev/null

frame= 500 fps=0.0 q=-1.0 Lsize=N/A time=00:00:10.96 bitrate=N/A speed=93.7x s speed=N/A
video:8522kB audio:0kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: unknown

There is a problem there. The size is correct (the same as the lossless samples generated from the mkv), but the time is not; it should be 20s, I checked.

I saw you released a new version with more fixes, including a nice improvement for accuracy in size prediction if the vstream size method gave higher predictions than the whole input size method, which was happening in some cases, but should never happen.
So I compiled this version and reran this mp4 file, and it is indeed changing the prediction method, but for the wrong reasons this time, as the percentage is not correct:

crf 17 VMAF 95.12 predicted video stream size 153.14 MiB (18%) taking 16 minutes

This is the real encoded result (so the remuxed mkv files were correct in the encoding percentage prediction):

Encoded 291.02 MiB (35%, video:237.36 MiB, audio:52.76 MiB)

I don't know where the percentage inaccuracy is exactly coming from for mp4 files, and it would be great to understand and be able to fix the source of the issue itself.
But if it is just an mp4 inconsistency issue, maybe the simplest and easiest way to fix it could be to generate the lossless samples in mkv instead, even for mp4 source files; it could end up giving more reliable results.

@alexheretic
Owner Author

I think this must be an edge case with ffmpeg where it is only encoding 11s because of some quirk with the lossless samples. So the encoded percent comes out too low. Just to clarify, the video stream size there is pretty much the same as the file size then?

I haven't seen this with mp4s myself though. It's interesting that using mkv solves it, that could be a solution as you suggest. However, I'll need to test more widely in case mp4->mkv sampling causes other issues.

@alexheretic
Owner Author

alexheretic commented Dec 5, 2022

I've sourced some >20s samples to test with from 4kmedia & pixabay (it'll be useful to have a set of test videos to investigate this kind of thing). They are all .mp4, I've used them to test mp4 or mkv lossless samples.

Tests

4kmedia-sony-new-york-fashion-demo.mp4 results

4kmedia-sony-new-york-fashion-demo.mp4

mp4 -> mp4 lossless samples

$ ab-av1 sample-encode -i test-vids/4kmedia-sony-new-york-fashion-demo.mp4 --crf 28 --preset 12
VMAF 95.83 predicted video stream size 248.06 MiB (26%) taking 3 minutes

==> 4kmedia-sony-new-york-fashion-demo.sample45+1199f.mp4                                                               
file: 186560 KiB
frame= 1199 fps=0.0 q=-1.0 Lsize=N/A time=00:00:19.36 bitrate=N/A speed=1.46e+03x    
video:186547kB audio:0kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: unknown

==> 4kmedia-sony-new-york-fashion-demo.sample45+1199f.crf28.p12.ivf
file: 49360 KiB
frame= 1164 fps=0.0 q=-1.0 Lsize=N/A time=00:00:19.41 bitrate=N/A speed=2.17e+03x    
video:49347kB audio:0kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: unknown

mp4 -> mkv lossless samples

$ ab-av1 sample-encode -i test-vids/4kmedia-sony-new-york-fashion-demo.mp4 --crf 28 --preset 12
VMAF 95.75 predicted video stream size 255.98 MiB (27%) taking 3 minutes

==> 4kmedia-sony-new-york-fashion-demo.sample45+1199f.mkv                                                               
file: 186557 KiB                                                                                                        
frame= 1199 fps=0.0 q=-1.0 Lsize=N/A time=00:00:19.97 bitrate=N/A speed=1.36e+03x                                       
video:186547kB audio:0kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: unknown

==> 4kmedia-sony-new-york-fashion-demo.sample45+1199f.crf28.p12.ivf                                                     
file: 50936 KiB
frame= 1200 fps=0.0 q=-1.0 Lsize=N/A time=00:00:20.01 bitrate=N/A speed=2.54e+03x    
video:50922kB audio:0kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: unknown
4kmedia-spacex-launch-demo.mp4 results

4kmedia-spacex-launch-demo.mp4

mp4 -> mp4 lossless samples

$ ab-av1 sample-encode -i test-vids/4kmedia-spacex-launch-demo.mp4 --crf 28 --preset 12
VMAF 99.04 predicted video stream size 21.21 MiB (12%) taking 82 seconds

==> 4kmedia-spacex-launch-demo.sample53+600f.mp4                                                                        
file: 28657 KiB                                                                                                         
frame=  600 fps=0.0 q=-1.0 Lsize=N/A time=00:00:19.85 bitrate=N/A speed=5.23e+03x                                       
video:28653kB audio:0kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: unknown 

==> 4kmedia-spacex-launch-demo.sample53+600f.crf28.p12.ivf
file: 3418 KiB
frame=  597 fps=0.0 q=-1.0 Lsize=N/A time=00:00:19.91 bitrate=N/A speed=9.66e+03x    
video:3411kB audio:0kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: unknown

mp4 -> mkv lossless samples

$ ab-av1 sample-encode -i test-vids/4kmedia-spacex-launch-demo.mp4 --crf 28 --preset 12
VMAF 99.03 predicted video stream size 21.15 MiB (12%) taking 85 seconds

==> 4kmedia-spacex-launch-demo.sample53+600f.mkv                                                                        
file: 28659 KiB
frame=  600 fps=0.0 q=-1.0 Lsize=N/A time=00:00:19.95 bitrate=N/A speed=4.39e+03x    
video:28653kB audio:0kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: unknown

==> 4kmedia-spacex-launch-demo.sample53+600f.crf28.p12.ivf
file: 3425 KiB
frame=  600 fps=0.0 q=-1.0 Lsize=N/A time=00:00:20.01 bitrate=N/A speed=1.02e+04x    
video:3419kB audio:0kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: unknown
pixabay-bridge-23544.mp4 results

pixabay-bridge-23544.mp4

mp4 -> mp4 lossless samples

$ ab-av1 sample-encode -i test-vids/pixabay-bridge-23544.mp4 --crf 28 --preset 12
VMAF 96.36 predicted video stream size 29.29 MiB (30%) taking 23 seconds

==> pixabay-bridge-23544.sample5+600f.mp4                                                                               
file: 65739 KiB                                                                                                         
frame=  600 fps=0.0 q=-1.0 Lsize=N/A time=00:00:14.90 bitrate=N/A speed=2.45e+03x                                       
video:65731kB audio:0kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: unknown

==> pixabay-bridge-23544.sample5+600f.crf28.p12.ivf
file: 19409 KiB
frame=  450 fps=0.0 q=-1.0 Lsize=N/A time=00:00:15.00 bitrate=N/A speed=3.99e+03x    
video:19404kB audio:0kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: unknown

mp4 -> mkv lossless samples

$ ab-av1 sample-encode -i test-vids/pixabay-bridge-23544.mp4 --crf 28 --preset 12
VMAF 96.53 predicted video stream size 45.11 MiB (45%) taking 22 seconds

==> pixabay-bridge-23544.sample5+600f.mkv
file: 65737 KiB
frame=  600 fps=0.0 q=-1.0 Lsize=N/A time=00:00:19.90 bitrate=N/A speed=2.76e+03x    
video:65731kB audio:0kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: unknown

==> pixabay-bridge-23544.sample5+600f.crf28.p12.ivf
file: 29893 KiB
frame=  600 fps=0.0 q=-1.0 Lsize=N/A time=00:00:20.00 bitrate=N/A speed=4.35e+03x    
video:29886kB audio:0kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: unknown
pixabay-elevator-3735.mp4 results

pixabay-elevator-3735.mp4

mp4 -> mp4 lossless samples

$ ab-av1 sample-encode -i test-vids/pixabay-elevator-3735.mp4 --crf 28 --preset 12
VMAF 96.24 predicted video stream size 16.57 MiB (78%) taking 8 seconds

==> pixabay-elevator-3735.sample12+500f.mp4                                                                             
file: 9047 KiB                                                                                                          
frame=  500 fps=0.0 q=-1.0 Lsize=N/A time=00:00:17.88 bitrate=N/A speed=1.03e+04x                                       
video:9041kB audio:0kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: unknown

==> pixabay-elevator-3735.sample12+500f.crf28.p12.ivf
file: 7016 KiB
frame=  450 fps=0.0 q=-1.0 Lsize=N/A time=00:00:18.00 bitrate=N/A speed=7.88e+03x    
video:7011kB audio:0kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: unknown

mp4 -> mkv lossless samples

$ ab-av1 sample-encode -i test-vids/pixabay-elevator-3735.mp4 --crf 28 --preset 12
VMAF 96.28 predicted video stream size 17.47 MiB (87%) taking 8 seconds

==> pixabay-elevator-3735.sample12+500f.mkv
file: 9044 KiB
frame=  500 fps=0.0 q=-1.0 Lsize=N/A time=00:00:19.88 bitrate=N/A speed=1.1e+04x    
video:9041kB audio:0kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: unknown

==> pixabay-elevator-3735.sample12+500f.crf28.p12.ivf
file: 7837 KiB
frame=  500 fps=0.0 q=-1.0 Lsize=N/A time=00:00:20.00 bitrate=N/A speed=9.07e+03x    
video:7832kB audio:0kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: unknown
pixabay-lemon-82602.mp4 results

pixabay-lemon-82602.mp4

mp4 -> mp4 lossless samples

$ ab-av1 sample-encode -i test-vids/pixabay-lemon-82602.mp4 --crf 28 --preset 12
VMAF 97.03 predicted video stream size 15.81 MiB (17%) taking 34 seconds

==> pixabay-lemon-82602.sample16+500f.mp4                                                                               
file: 35574 KiB                                                                                                         
frame=  500 fps=0.0 q=-1.0 Lsize=N/A time=00:00:13.88 bitrate=N/A speed=3.81e+03x                                       
video:35567kB audio:0kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: unknown

==> pixabay-lemon-82602.sample16+500f.crf28.p12.ivf
file: 6220 KiB
frame=  350 fps=0.0 q=-1.0 Lsize=N/A time=00:00:14.00 bitrate=N/A speed=7e+03x    
video:6217kB audio:0kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: unknown

mp4 -> mkv lossless samples

$ ab-av1 sample-encode -i test-vids/pixabay-lemon-82602.mp4 --crf 28 --preset 12
VMAF 96.91 predicted video stream size 23.45 MiB (26%) taking 32 seconds

==> pixabay-lemon-82602.sample16+500f.mkv
file: 35571 KiB
frame=  500 fps=0.0 q=-1.0 Lsize=N/A time=00:00:19.88 bitrate=N/A speed=4.78e+03x    
video:35567kB audio:0kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: unknown

==> pixabay-lemon-82602.sample16+500f.crf28.p12.ivf
file: 9223 KiB
frame=  500 fps=0.0 q=-1.0 Lsize=N/A time=00:00:20.00 bitrate=N/A speed=8.69e+03x    
video:9218kB audio:0kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: unknown
pixabay-nature-31377.mp4 results

pixabay-nature-31377.mp4

mp4 -> mp4 lossless samples

$ ab-av1 sample-encode -i test-vids/pixabay-nature-31377.mp4 --crf 28 --preset 12
VMAF 96.67 predicted video stream size 332.43 MiB (324%) taking 22 seconds

==> pixabay-nature-31377.sample4+480f.mp4                                                                               
file: 72065 KiB                                                                                                         
frame=  480 fps=0.0 q=-1.0 Lsize=N/A time=00:00:19.72 bitrate=N/A speed=3.34e+03x                                       
video:72062kB audio:0kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: unknown

==> pixabay-nature-31377.sample4+480f.crf28.p12.ivf
file: 233506 KiB
frame=  474 fps=0.0 q=-1.0 Lsize=N/A time=00:00:19.76 bitrate=N/A speed= 717x    
video:233501kB audio:0kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: unknown

mp4 -> mkv lossless samples

$ ab-av1 sample-encode -i test-vids/pixabay-nature-31377.mp4 --crf 28 --preset 12
VMAF 96.65 predicted video stream size 329.75 MiB (325%) taking 22 seconds

==> pixabay-nature-31377.sample4+480f.mkv
file: 72067 KiB
frame=  480 fps=0.0 q=-1.0 Lsize=N/A time=00:00:19.97 bitrate=N/A speed=2.84e+03x    
video:72062kB audio:0kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: unknown

==> pixabay-nature-31377.sample4+480f.crf28.p12.ivf
file: 234560 KiB
frame=  480 fps=0.0 q=-1.0 Lsize=N/A time=00:00:20.02 bitrate=N/A speed=1.05e+03x    
video:234555kB audio:0kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: unknown
pixabay-sunrise-83880.mp4 results

pixabay-sunrise-83880.mp4

mp4 -> mp4 lossless samples

$ ab-av1 sample-encode -i test-vids/pixabay-sunrise-83880.mp4 --crf 28 --preset 12
- Sample 1 (55%) vmaf 96.13
VMAF 96.13 predicted video stream size 5.33 MiB (55%) taking 20 seconds

==> pixabay-sunrise-83880.sample6+2398f.mp4                                                                             
file: 5577 KiB                                                                                                          
frame= 2398 fps=0.0 q=-1.0 Lsize=N/A time=00:00:18.14 bitrate=N/A speed=5.33e+03x                                       
video:5549kB audio:0kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: unknown

==> pixabay-sunrise-83880.sample6+2398f.crf28.p12.ivf
file: 3072 KiB
frame= 2181 fps=0.0 q=-1.0 Lsize=N/A time=00:00:18.19 bitrate=N/A speed=3.28e+03x    
video:3047kB audio:0kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: unknown

mp4 -> mkv lossless samples

$ ab-av1 sample-encode -i test-vids/pixabay-sunrise-83880.mp4 --crf 28 --preset 12
VMAF 96.15 predicted video stream size 5.36 MiB (59%) taking 20 seconds

==> pixabay-sunrise-83880.sample6+2398f.mkv
file: 5566 KiB
frame= 2398 fps=0.0 q=-1.0 Lsize=N/A time=00:00:19.97 bitrate=N/A speed=5.48e+03x    
video:5549kB audio:0kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: unknown

==> pixabay-sunrise-83880.sample6+2398f.crf28.p12.ivf
file: 3266 KiB
frame= 2401 fps=0.0 q=-1.0 Lsize=N/A time=00:00:20.02 bitrate=N/A speed=3.47e+03x    
video:3238kB audio:0kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: unknown
pixabay-turntable-8453.mp4 results

pixabay-turntable-8453.mp4

mp4 -> mp4 lossless samples

$ ab-av1 sample-encode -i test-vids/pixabay-turntable-8453.mp4 --crf 28 --preset 12
- Sample 1 (15%) vmaf 94.14
VMAF 94.14 predicted video stream size 2.34 MiB (15%) taking 7 seconds

==> pixabay-turntable-8453.sample14+500f.mp4                                                                            
file: 6542 KiB                                                                                                          
frame=  500 fps=0.0 q=-1.0 Lsize=N/A time=00:00:19.52 bitrate=N/A speed=1.48e+04x                                       
video:6537kB audio:0kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: unknown

==> pixabay-turntable-8453.sample14+500f.crf28.p12.ivf
file: 974 KiB
frame=  491 fps=0.0 q=-1.0 Lsize=N/A time=00:00:19.64 bitrate=N/A speed=1.42e+04x    
video:969kB audio:0kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: unknown

mp4 -> mkv lossless samples

$ ab-av1 sample-encode -i test-vids/pixabay-turntable-8453.mp4 --crf 28 --preset 12
VMAF 93.95 predicted video stream size 2.29 MiB (15%) taking 8 seconds

==> pixabay-turntable-8453.sample14+500f.mkv
file: 6541 KiB
frame=  500 fps=0.0 q=-1.0 Lsize=N/A time=00:00:19.88 bitrate=N/A speed=1.3e+04x    
video:6537kB audio:0kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: unknown

==> pixabay-turntable-8453.sample14+500f.crf28.p12.ivf
file: 972 KiB
frame=  500 fps=0.0 q=-1.0 Lsize=N/A time=00:00:20.00 bitrate=N/A speed=1.48e+04x    
video:967kB audio:0kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: unknown

Analysis

In all cases frame count and duration of lossless vs encoded samples are closer using mkv and closer to 20s than with mp4.

In particular pixabay-bridge-23544.mp4, pixabay-lemon-82602.mp4 reproduce the case where the mp4 sample is significantly under 20s. Playing the lossless versions with mpv I can see 4-5s of "negative time" before 0, so there is ~20s of video but with strange timing. When ffmpeg converts this to yuv streams for encoding or vmaf though this "negative time" doesn't seem to be used. This means the encoded samples have significantly fewer frames and the file/video stream comparison is misleading as the lossless samples contain more video than they let on. In both cases mkv samples had no issues.

So according to these results, for mp4 input files, using .mkv for lossless samples is a clear win.
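The frame-count mismatch described in the analysis can be checked mechanically. The sketch below is a hypothetical Python helper (not part of ab-av1) that reads the `frame=` field from an ffmpeg progress line and reports what fraction of the lossless sample's frames made it into the encoded sample. The two lines are copied from the pixabay-bridge-23544.mp4 mp4 results above.

```python
import re

_FRAME = re.compile(r"frame=\s*(\d+)")


def frame_count(progress_line):
    """Extract the frame count from an ffmpeg progress line."""
    match = _FRAME.search(progress_line)
    if match is None:
        raise ValueError("no frame= field found")
    return int(match.group(1))


def frame_coverage(lossless_line, encoded_line):
    """Fraction of the lossless sample's frames present in the encoded sample."""
    return frame_count(encoded_line) / frame_count(lossless_line)


lossless_line = "frame=  600 fps=0.0 q=-1.0 Lsize=N/A time=00:00:14.90 bitrate=N/A speed=2.45e+03x"
encoded_line = "frame=  450 fps=0.0 q=-1.0 Lsize=N/A time=00:00:15.00 bitrate=N/A speed=3.99e+03x"
print(f"coverage: {frame_coverage(lossless_line, encoded_line):.2f}")
```

A coverage well below 1.0 means the size comparison is between unequal amounts of video, so the resulting percent should not be trusted. Where exactly to draw that line is a judgment call that this sketch does not make.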

Video stream size vs file size

Getting back to what this issue is really about, this testing also shows file size vs video size info. We can compare encode-percent calculations using either. (We'll use the mkv sample results since they're better.)

  • 4kmedia-sony-new-york-fashion-demo.mp4
    • file size (50936 / 186557) = 27.30%
    • vid size (50922 / 186547) = 27.30%
  • 4kmedia-spacex-launch-demo.mp4
    • file size (3425 / 28659) = 11.95%
    • vid size (3419 / 28653) = 11.93%
  • pixabay-bridge-23544.mp4
    • file size (29893 / 65737) = 45.47%
    • vid size (29886 / 65731) = 45.47%
  • pixabay-elevator-3735.mp4
    • file size (7837 / 9044) = 86.65%
    • vid size (7832 / 9041) = 86.63%
  • pixabay-lemon-82602.mp4
    • file size (9223 / 35571) = 25.93%
    • vid size (9218 / 35567) = 25.92%
  • pixabay-nature-31377.mp4
    • file size (234560 / 72067) = 325.47%
    • vid size (234555 / 72062) = 325.49%
  • pixabay-sunrise-83880.mp4
    • file size (3266 / 5566) = 58.68%
    • vid size (3238 / 5549) = 58.35%
  • pixabay-turntable-8453.mp4
    • file size (972 / 6541) = 14.86%
    • vid size (967 / 6537) = 14.79%
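The comparison in the list above can be reproduced with a short Python sketch. The numbers are the mkv sample figures from the test results in this comment; the helper names are illustrative only.

```python
# (name, lossless file KiB, encoded file KiB, lossless video kB, encoded video kB)
MKV_SAMPLES = [
    ("4kmedia-sony-new-york-fashion-demo.mp4", 186557, 50936, 186547, 50922),
    ("4kmedia-spacex-launch-demo.mp4", 28659, 3425, 28653, 3419),
    ("pixabay-bridge-23544.mp4", 65737, 29893, 65731, 29886),
    ("pixabay-elevator-3735.mp4", 9044, 7837, 9041, 7832),
    ("pixabay-lemon-82602.mp4", 35571, 9223, 35567, 9218),
    ("pixabay-nature-31377.mp4", 72067, 234560, 72062, 234555),
    ("pixabay-sunrise-83880.mp4", 5566, 3266, 5549, 3238),
    ("pixabay-turntable-8453.mp4", 6541, 972, 6537, 967),
]


def percent(encoded, lossless):
    return 100.0 * encoded / lossless


def compare(samples):
    """Return (name, file_percent, video_percent, absolute_gap) rows."""
    rows = []
    for name, file_l, file_e, vid_l, vid_e in samples:
        file_pct = percent(file_e, file_l)
        vid_pct = percent(vid_e, vid_l)
        rows.append((name, file_pct, vid_pct, abs(file_pct - vid_pct)))
    return rows


rows = compare(MKV_SAMPLES)
for name, file_pct, vid_pct, gap in rows:
    print(f"{name}: file {file_pct:.2f}%  video {vid_pct:.2f}%  gap {gap:.2f}")

# The sample where the two methods disagree the most.
worst = max(rows, key=lambda row: row[3])
print("largest gap:", worst[0])
```

For these eight files the largest gap is about a third of a percentage point (pixabay-sunrise-83880.mp4), and most are well under a tenth.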

This supports my initial feeling that there isn't a significant win in using video stream size over file size. If they were equally easy to do I would still do video stream comparison, however measuring file size is easier as it doesn't involve ffmpeg output parsing.

@alexheretic
Owner Author

Great spot regarding mkv vs mp4 samples @iPaulis!

With #90 merged I think we can close this issue as I plan to continue using the file-size comparison until video stream size can be demonstrated to be better enough to be worth digging out of the ffmpeg output.

@iPaulis

iPaulis commented Dec 5, 2022

I think this must be an edge case with ffmpeg where it is only encoding 11s because of some quirk with the lossless samples. So the encoded percent comes out too low. Just to clarify, the video stream size there is pretty much the same as the file size then?

Yes, completely accurate.

I haven't seen this with mp4s myself though. It's interesting that using mkv solves it, that could be a solution as you suggest. However, I'll need to test more widely in case mp4->mkv sampling causes other issues.

I've been testing more mp4 files and almost all of their samples have this issue where the duration is not correctly detected to some degree. For some files it is just 18-19s instead of 20s samples, which introduces a smaller inaccuracy, but there also are some files like the example above where the duration difference is higher like 11-12s or 15-16s, and the inaccuracy in the estimated percentage seems proportional to the duration inaccuracy.

I found a video file I could share with you so you could try to reproduce the issue and see if it also happens in your system.
I used the --samples 10 parameter to detect the issue more easily: ab-av1 crf-search --samples 10 --pix-format yuv420p10le --min-vmaf 96 --vmaf pool=harmonic_mean -i '.\webproject (5h zuzenketak).mp4' --preset 6
I'm adding a screenshot of the samples of both files, source mp4 and remuxed mkv.
[Screenshot 2022-12-05 140852: samples of the source mp4 and the remuxed mkv]

The sample sizes are equivalent and correct, but you can see the durations are not consistent. I confirm that all samples are in fact 20 seconds long.

Edit: Oh, ok, I just saw your replies, you were faster than me :)

@iPaulis

iPaulis commented Dec 5, 2022

This supports my initial feeling that there isn't a significant win in using video stream size over file size. If they were equally easy to do I would still do video stream comparison, however measuring file size is easier as it doesn't involve ffmpeg output parsing.

Please, I would strongly recommend against it. I've also been testing the input size method vs the video stream method, and using the whole input has an inherent inaccuracy. It gives the most reliable and accurate prediction when the video stream makes up most of the file (probably the case for the test files you used), but the larger the share taken by audio streams, the more inaccuracy it introduces.

That is especially noticeable in files with multiple audio tracks, high-quality audio that takes up a lot of space, etc.
A very clear example with a very common type of file (real encoded video stream size is 6.1 GB):

  • Input size method: crf 18 VMAF 96.51 predicted full encode size 9.37 GiB (32%) taking 2 hours
  • Vstream size method: crf 18 VMAF 96.51 predicted video stream size 5.90 GiB (32%) taking 2 hours

New method is close enough, that's fine.

Another file (real encoded video stream size is 6.0 GB):

  • Input size method: crf 17 VMAF 96.47 predicted full encode size 11.97 GiB (45%) taking 2 hours
  • Vstream size method: crf 17 VMAF 96.47 predicted video stream size 6.81 GiB (45%) taking 2 hours

So, in this case the new method overestimates size (which could be improved), but it is closer than the old method anyway.
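A small Python sketch of why the two methods diverge, assuming the predicted size is simply the base size multiplied by the sample-derived percent (an assumption about the calculation, not confirmed from ab-av1's code). The input and video stream sizes below are hypothetical.

```python
def predicted_size_gib(base_size_gib, encode_percent):
    """Scale a base size by the encode percent obtained from the samples."""
    return base_size_gib * encode_percent / 100.0


# Hypothetical file: 30 GiB in total, of which 18 GiB is the video stream.
input_size_gib = 30.0
video_stream_gib = 18.0
encode_percent = 32.0

full_prediction = predicted_size_gib(input_size_gib, encode_percent)
video_prediction = predicted_size_gib(video_stream_gib, encode_percent)
print(f"input size method:   {full_prediction:.2f} GiB")
print(f"vstream size method: {video_prediction:.2f} GiB")
```

The percent is identical in both cases; only the base differs. Every GiB of audio in the input gets scaled as if it were video, which is why files with large audio tracks show the biggest gap.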

We can find a more reliable and accurate way than the current method avoiding having to regress to the input method and without needing to involve ffmpeg output parsing.
I think I have an alternative idea that could work (and should be fairly simple to implement and have no impact in ab-av1 runtime), I just have to think about it a bit more, I can come back later in the afternoon to explain.

@alexheretic
Owner Author

Please, I would strongly recommend against it. I've also been testing the input size method vs the video stream method, and using the whole input has an inherent inaccuracy. It gives the most reliable and accurate prediction when the video stream makes up most of the file (probably the case for the test files you used), but the larger the share taken by audio streams, the more inaccuracy it introduces.

This is a different issue though. I'm talking about lossless sample file size vs encoded sample file size which according to these tests is pretty much the same as using the video stream while much easier to do.

The full input video stream size isn't relevant to calculating the encode percentage. It could make the full predicted video stream size more accurate, but it involves a full scan of the input which isn't desirable. The predicted size isn't as important as the percentage either, since the latter is used as part of a crf-search.

@alexheretic
Owner Author

I've raised #91 so we can investigate further getting the full input video stream size for the predicted size calc (since you won't let me avoid it 😆)

@iPaulis

iPaulis commented Dec 5, 2022

Please, I would strongly recommend against it. I've also been testing the input size method vs the video stream method, and using the whole input has an inherent inaccuracy. It gives the most reliable and accurate prediction when the video stream makes up most of the file (probably the case for the test files you used), but the larger the share taken by audio streams, the more inaccuracy it introduces.

This is a different issue though. I'm talking about lossless sample file size vs encoded sample file size which according to these tests is pretty much the same as using the video stream while much easier to do.

The full input video stream size isn't relevant to calculating the encode percentage. It could make the full predicted video stream size more accurate, but it involves a full scan of the input which isn't desirable. The predicted size isn't as important as the percentage either, since the latter is used as part of a crf-search.

Oh, you meant only for the samples, sorry for the misunderstanding. Yes, sure, as the samples don't have audio streams, sample file size should be pretty much the same as video stream size of the sample, with an almost insignificant overhead difference.

Anyway, I believe there is a way to improve the final predicted video stream size, but I have to leave now, I'll explain later and I hope you find it useful.

Sorry to bother you, it is not my intention, I just want to be helpful.

@alexheretic
Owner Author

Sorry to bother you, it is not my intention, I just want to be helpful.

It's been super helpful mate, I appreciate the input!

@iPaulis

iPaulis commented Dec 5, 2022

Sorry to bother you, it is not my intention, I just want to be helpful.

It's been super helpful mate, I appreciate the input!

I'm glad to help. I know you do, thank you for listening to all these requests and doing all the hard work. I really can't stress enough how helpful ab-av1 is and will continue being, great tool! I do a lot of encodes and it saves me a lot of time.
