FAQ: HD Video Playback

Tobias Wolf edited this page Apr 26, 2013 · 3 revisions
High Performance Video Playback

HD quality video is very taxing on your CPU, video card, and disks, so first and foremost you'll want a fast machine.

If you are concerned about dropped frames, the safest option is to load all of your video frames into VRAM and then play them back. The LoadMovieIntoTexturesDemoOSX demo is good for determining what is possible in terms of load times and playback rates when called in benchmark mode:

LoadMovieIntoTexturesDemoOSX(<video file name>, [], [], [], 1)

Note that, despite its name, it works cross-platform, not just under OS X. Usually the limiting factor is decoding the video into textures, not drawing the textures to the screen.

You may also have more success using GStreamer than QuickTime. Type help GStreamer to find out more.


Q: What kind of file→texture decode rates are possible?

A: Using Version 3.0.9 (build date: Mar 17 2012), with an SSD as source storage:

Video: 1280x720 @ 120 Hz; hardware: Intel i7 2600 (4 cores, 3.4 GHz), GeForce GT 545, 3 GB VRAM

  • Format mjpeg: 138 fps
  • Format mpeg4 (Xvid ASP, not AVC): 133 fps
  • Format x264 (AVC): 43 fps

On the topic of video formats that might decode fastest, T Wolf said:

Just my two cents: HandBrake's defaults enable all the bells and whistles that codec developers have come up with in the last 10 years to make the file as small as possible. This means decoding is complex, so don't use the defaults if you want fast decoding.

There are profiles in H.264 that limit decoding complexity. In x264 they are exposed as:

--profile <string>  Force the limits of an H.264 profile; overrides all other settings. Options: baseline, main, high, high10, high422, high444

I guess you want baseline. Note that high10 can give you 10-bit color, high422 gives you 4:2:2 chroma, and high444 gives you full RGB. Otherwise you get 4:2:0 chroma-subsampled color, with much reduced chroma resolution. Not all decoders support these profiles, but GStreamer does.

You can also try --tune fastdecode in addition. Don't worry about quality; that is governed by the CRF (constant rate factor). The file will just be bigger, but faster to decode.
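Putting those flags together, a hypothetical x264 invocation for a fast-to-decode file might look like this, launched from MATLAB via system(). The file names and the CRF value of 18 are placeholder choices, not recommendations from this page:

```matlab
% Hypothetical x264 call producing a fast-to-decode Matroska file.
% 'input.y4m' and 'stimulus.mkv' are placeholder file names; crf 18 is
% an arbitrary quality setting - tune it to your needs.
cmd = ['x264 --profile baseline --tune fastdecode --crf 18 ' ...
       '-o stimulus.mkv input.y4m'];
status = system(cmd);   % returns 0 on success
```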

FFmpeg might use a different MP4 muxer (for both ASP and AVC) than HandBrake does; the encoder is the same. I would go with Matroska (.mkv) in any case. QuickTime won't play that, though.


Note that while decoding a video frame takes the most time, uploading that frame to VRAM isn't instantaneous either. See MakeTextureTimingTest2 to find out how long this step takes.


Here are some other comments on this issue from Mario:

Try PlayGaplessMoviesDemo2 – GStreamer only, but "even more gapless" than the classic gapless demo, because it can use GStreamer's built-in gapless playback support instead of resorting to tricks as with QuickTime.

On OS choice: there's always Linux as the better alternative. The Linux GStreamer is of a more recent version, with improved support for multi-threaded decoding from Ubuntu 11.10 onwards. E.g., H.264 encoded material can get a nice boost, utilizing up to 4 cores of an 8-core machine.

Simultaneously decoding videos and drawing them to the screen (no pre-loading), via OpenMovie with asyncflag set to 4:

[This] decodes the movie at the highest speed and queues up all video frames in memory for presentation. It doesn't drop frames and doesn't care about playback timing or audio-video sync. It is the fastest method if you don't need random access to specific frames (as you would get with the LoadMovieIntoTexturesDemo method), don't care about audio-video sync (because there isn't any audio to sync to), and control playback timing yourself via the 'when' parameter of Screen('Flip'). I don't know if things like looping the movie would still work, but for your task this should be the most efficient way of doing it.

This method does put more stress on the OS scheduler, because there is no longer any throttling of decoding to the playback framerate: GStreamer will grab whatever CPU time it can to decode as fast as possible. The GStreamer threads compete for CPU resources with the main Matlab/PTB thread, so it could happen, e.g., that GStreamer gets the CPU to decode and queue yet another frame at a moment when the main thread or graphics driver needs the CPU more urgently to avoid a skipped presentation deadline. The Priority() command can help a bit there, but how much it helps depends on the realtime capabilities of the operating system scheduler. In that category Linux, when configured properly for realtime use, has a fabulous reputation, whereas MS-Windows defines the absolute zero reference point. OS X is somewhere in between those two.

The additional buffering is mainly useful for movies without sound, because it prevents automatic control of playback framerate and automatic audio-video sync.

Steps:

  1. OpenMovie with asyncflag set to 4.
  2. Start movie playback. This starts the decoding process.
  3. Wait for a few seconds to prebuffer data.
  4. Start your Screen('GetMovieImage') fetch and draw loop.

The engine decodes video buffers and queues them in an internal queue as soon as playback is started, until it is stopped. GetMovieImage fetches the oldest buffer (FIFO order) and converts it to a texture. You can control the maximum amount of buffered video via the preloadSecs parameter (default = 1 second); a setting of -1 allows infinite buffering, i.e., until you run out of system memory.

You'll probably have to use Priority() to make sure your main thread isn't deprived of computation time by all the GStreamer threads running at maximum decoding speed.
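The steps above might be sketched in Psychtoolbox code like this. This is a minimal sketch, not code from this page: 'myclip.mkv' and the 2-second prebuffer wait are placeholder choices, and error handling is omitted:

```matlab
% Minimal sketch of preloaded async playback via asyncflag = 4.
% Assumes a working Psychtoolbox + GStreamer setup; 'myclip.mkv' is a
% placeholder file name.
win = Screen('OpenWindow', max(Screen('Screens')));
[movie, duration, fps] = Screen('OpenMovie', win, 'myclip.mkv', 4);

Screen('PlayMovie', movie, 1);   % start decoding into the internal queue
WaitSecs(2);                     % let a couple of seconds prebuffer
Priority(MaxPriority(win));      % protect main thread from decoder threads

tex = Screen('GetMovieImage', win, movie);
while tex > 0                    % GetMovieImage returns -1 at end of movie
    Screen('DrawTexture', win, tex);
    Screen('Flip', win);         % add a 'when' argument to pace playback
    Screen('Close', tex);
    tex = Screen('GetMovieImage', win, movie);
end

Priority(0);
Screen('PlayMovie', movie, 0);
Screen('CloseMovie', movie);
sca;
```

Pacing is entirely up to you here: supply Screen('Flip')'s 'when' parameter to present each frame at the intended time.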

The remaining bottleneck would be the texture creation/upload/draw time. There you can try a few things, which may or may not have any effect on performance, in a good or bad direction:

  1. A new optional parameter specialFlags1 in OpenMovie: a setting of 2 disables audio decoding; a setting of 1 tries to use YUV color encoded textures instead of RGBA textures if the GPU and driver support this – this may increase or decrease performance. A setting of 3 (2+1) gives you both.

  2. There's a Screen('Preference', 'ConserveVRAM') setting (see help ConserveVRAMSettings) called TextureFormatOverride or something like that. It allows you to try an alternative texture encoding for RGBA, which may also be faster or slower, if the YUV parameter doesn't help.
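For item 1, the call might look like this. A sketch only: 'win' is assumed to be an open onscreen window and 'myclip.mkv' a placeholder file name:

```matlab
% Open with asyncflag = 4, default preloadSecs ([]), and
% specialFlags1 = 3: 2 = skip audio decoding, 1 = try YUV textures.
movie = Screen('OpenMovie', win, 'myclip.mkv', 4, [], 2 + 1);
```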

And then you can do little micro-optimizations, e.g., using the dontclear=2 flag in Flip to prevent clearing the framebuffer if you're overdrawing the video stimulus anyway – this may save up to one msec or so...

Using the additional ConserveVRAM setting 512, aka kPsychAvoidCPUGPUSync, could also make sense (see help ConserveVRAMSettings; the numbers of all used flags add up). This disables any kind of OpenGL/GPU error checking in texture creation, DrawTexture, Flip, etc. It's usually not recommended, but once your code works error-free it may save some fraction of a msec.
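Since the flag values add up, one way to enable this without clobbering flags you may have set already is (a sketch, assuming the preference can be queried before being set):

```matlab
% Add kPsychAvoidCPUGPUSync (512) on top of any ConserveVRAM flags
% already in effect; bitor avoids adding 512 twice.
oldFlags = Screen('Preference', 'ConserveVRAM');
Screen('Preference', 'ConserveVRAM', bitor(oldFlags, 512));
```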

Clever use of Screen('DrawingFinished') after the last drawing command, before you do other work like KbCheck, may also help increase parallelism between CPU and GPU. It could be that the remaining skipped frames are due to delays on the GPU, not the CPU – you can only time the CPU with GetSecs, tic/toc, the profiler, etc. For the GPU there is special profiling support on supported GPUs, as shown in DrawingSpeedTest if you follow the gpumeasure flag.
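Combining the dontclear and DrawingFinished hints, one iteration of the inner loop might look like this. A sketch only: 'win', 'tex', 'vbl', and 'ifi' are assumed to already exist from your own setup code:

```matlab
% Draw, then hand the frame to the GPU early so CPU-side work
% (e.g. keyboard polling) overlaps with GPU rendering.
Screen('DrawTexture', win, tex);
Screen('DrawingFinished', win, 2);    % 2 matches the dontclear flag below
[keyIsDown, secs, keyCode] = KbCheck; % CPU work runs while the GPU draws
vbl = Screen('Flip', win, vbl + 0.5 * ifi, 2); % dontclear=2: skip clear
```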

In the end, if we're talking about occasional misses by a few msecs, we're in the world of endless tweaks: e.g., running the GPU at its highest performance setting to avoid interference from GPU power management, choosing the right operating system instead of the wrong one, tweaking CPU power management and other settings on operating systems that support such things, and so on...