
Blender hacks

This is a clone of the Blender upstream Git repository, used for experiments, extensions, and patches

Improving the 2D stabiliser function

Note
the resulting improvement D583 was accepted into official Blender master with Git commit b167720
It is expected to become part of official Blender release 2.78 and above

Recently I tried to use Blender to prepare some video shots for editing.

While generally speaking the tracker component of Blender is very powerful and opens endless possibilities, I found the 2D stabilisation support to be somewhat lacking.

Problems

  • unpredictable bumpiness of the translation stabilisation, which just doesn’t go away, even when adding several tracks with a carefully chosen layout

  • the canvas does not adapt and expand on demand, which makes it nearly impossible to deal with sideways tracking or panning shots

  • tracks that start later or contain gaps cause problems which are hard to understand

  • rotation stabilisation works quite well in a “clean demo” setup, but is unusable when it’s needed most.

    • only a single track can be used to control rotation

    • this track doesn’t pick up the angular movement correctly

    • when the tracking marker moves towards the top or the bottom, above or below the other tracking points, unpredictable jumps can happen.

    • when some tracks start at differing positions, the rotation compensation flounders completely, flipping the frame upside down

A closer investigation of the Blender source code indicated some problems with the current implementation.

  • tracking is based on the “median” of the tracking points, which effectively means using just the information from two outlier tracks. Thus we’re throwing away most of the available tracking data. This is unfortunate, all the more so since tracks close to the border tend to be problematic in practice

  • the “median” behaves unstably over time and depends on the track configuration, especially when the outlying tracks have gaps or start and end at different frames than the rest of the tracks.

  • the rotation stabilisation support hooks in at the top call level and does not integrate well with the rest of the code structure; this might explain why there is only one rotation track

  • the rotation is measured relative to the problematic “median” location (probably with the intention of using something like a centre of gravity of the tracking positions), but the determined angle is applied at the frame centre, which leads to wrong positions and spurious movements almost everywhere in the frame.

  • there is no proper handling of tracks which cover different time intervals or contain gaps. Instead, just the first/last value is used as a placeholder. Combine this with using the “median” as the reference point for rotation, and we get all sorts of unpredictable behaviour

None of these problems is fundamental though. They can be remedied by actually using the available tracking data, by consciously picking one reasonable approximation, and by handling all the necessary calculations systematically and uniformly, without attempting to cut corners.

Concept for the reworked version

To arrive at such a more uniform approach, the reworked version measures the movement parameters at several probe points (tracking markers) and combines the normalised contributions for each parameter by a weighted average.
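
As a minimal sketch of this combination step (plain Python, hypothetical names, not the actual Blender code), the weighted average simply blends per-track values according to user-defined influence:

```python
def weighted_average(contributions, weights):
    """Combine normalised per-track contributions into one estimate.

    contributions: per-track values of one parameter (offset component,
                   angle or scale), already normalised to be comparable.
    weights:       per-track influence as set by the user; a weight of 0
                   fades that track out completely.
    """
    total = sum(weights)
    if total == 0:
        return 0.0  # no usable track contributes anything
    return sum(c * w for c, w in zip(contributions, weights)) / total
```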

Basic assumptions

We intend to apply a 2D workflow, which is distinct from and must not be confused with a 3D workflow. In post-production, a 2D workflow starts and ends at the image level, without any modelling, spatial reconstruction, camera mapping, or rendering of virtual remoulded elements into the reconstructed image. While a 3D workflow might open new and thrilling possibilities, typically a 2D workflow is chosen and preferred when there is no need for spatial reconstruction, since all we want to do is perform some direct image manipulation. In such situations, a 2D workflow is less cost-intensive and less error-prone.

Typical usage scenarios

  • fix minor deficiencies (shaky tripod, jerk in camera movement)

  • poor man’s steadycam (when a real steadycam was not available, affordable or applicable)

  • as preparation for masking, matching and rotoscoping

It is not uncommon for 2D stabilisation to have to deal with somewhat imperfect and flawed footage.

Calculation model

Given this usage scenario, we don’t even attempt to recreate or solve for the original camera’s 3D movement; rather we postulate an affine-linear image transformation (translation plus rotation and scale) and fit it to measurement values. Measurements are taken at several tracked image features, available as tracks from Blender’s tracking component.

Moreover, we acknowledge the fact that these measurements might be inaccurate, so we allow the user to use one set of tracking points to determine the image translation offset, while another set of tracking points is used to measure rotation and scale relative to the translation offset. Measurements are set up in a way that normalises away the different tracking point locations, so that every point basically contributes a “similar” value. Individual contributions can then be combined by a weighted average, allowing the user to control and fade out the effect of individual measurement points. As stated before, the general assumption is that the user knows and understands the spatial image structure and provides sensible measurement points; the averaging helps to work around image deficiencies and perspective movements.
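
The following sketch (Python, with hypothetical track accessors rather than the real data structures) outlines this two-stage fit: the translation offset is averaged from the translation tracks first, and rotation and scale are then measured by the rotation/scale tracks relative to that offset:

```python
def measure_frame(translation_tracks, rotation_tracks, frame_centre):
    """Fit the affine movement (translation, rotation, scale) of one frame.

    Tracks are assumed to expose normalised contributions through
    .offset(), .angle(pivot) and .scale(pivot), plus a user weight;
    these accessors are hypothetical and only illustrate the data flow.
    """
    def wavg(pairs, fallback):
        pairs = list(pairs)
        total = sum(w for _, w in pairs)
        return sum(v * w for v, w in pairs) / total if total else fallback

    # 1. translation offset: weighted average over the translation tracks
    dx = wavg(((t.offset()[0], t.weight) for t in translation_tracks), 0.0)
    dy = wavg(((t.offset()[1], t.weight) for t in translation_tracks), 0.0)

    # 2. rotation and scale, measured relative to the translation offset:
    #    the pivot is the frame centre shifted by the measured translation
    pivot = (frame_centre[0] + dx, frame_centre[1] + dy)
    angle = wavg(((t.angle(pivot), t.weight) for t in rotation_tracks), 0.0)
    scale = wavg(((t.scale(pivot), t.weight) for t in rotation_tracks), 1.0)
    return (dx, dy), angle, scale
```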

A mathematically correct calculation would require us to integrate the stepwise measurement contributions: that is, for each single frame to determine the average increment over all tracks, and then to sum up the incremental steps globally to form a movement path from start to end frame. Obviously, such a calculation would be problematic to implement, especially for longer video sequences, leading either to a quadratic calculation pattern or necessitating a cache of per-frame partial sums.

But this calculation can be approximated reasonably by looking at a start frame and the current frame, and then using (vector) distances to get the whole contribution of a track in one step. Basically we swap integration and averaging, and we exploit the simple geometric nature of the contributions to perform a symbolic integration instead of summing per-frame contributions numerically. Thus we calculate the cumulated contribution for each track and then average over all the tracks. This approach doesn’t capture the effect of track weights changing along the way, only the effect of the track weights at the current frame, yet this error seems acceptable.
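
The two calculation patterns can be contrasted in a short sketch (hypothetical pos() and reference_pos() accessors, all tracks assumed to cover the whole range with constant weights; under those conditions both computations yield the same result, since we merely swap the order of summation and averaging):

```python
def exact_offset(tracks, anchor, frame):
    """Mathematically correct pattern: average the increment of all tracks
    for every single frame, then sum those increments from the anchor to
    the current frame (anchor < frame assumed) -- O(frame distance) work
    per evaluated frame."""
    dx = dy = 0.0
    for f in range(anchor, frame):
        dx += sum(t.pos(f + 1)[0] - t.pos(f)[0] for t in tracks) / len(tracks)
        dy += sum(t.pos(f + 1)[1] - t.pos(f)[1] for t in tracks) / len(tracks)
    return dx, dy

def approximated_offset(tracks, frame):
    """Approximation used here: take each track's total displacement
    since its reference frame in one step, then average those totals."""
    dx = sum(t.pos(frame)[0] - t.reference_pos()[0] for t in tracks) / len(tracks)
    dy = sum(t.pos(frame)[1] - t.reference_pos()[1] for t in tracks) / len(tracks)
    return dx, dy
```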

To deal with the flexible track start positions possible in Blender, we introduce the following definitions:

anchor frame

this is a global reference point and can simply be frame 0, but preferably it is chosen in the middle of the video sequence. By definition, the position and rotation of the image are set to zero at the anchor frame. Ideally, the anchor frame should show the subject optimally placed in framing and composition.

reference frame of a track

for each track we define a local reference frame. This frame is chosen as the track’s data frame closest to the anchor frame (ideally the anchor frame itself).

baseline contribution

for each track and each individual measurement data feed, we define the baseline contribution as the cumulated value of that data feed at the reference frame. Obviously, this baseline contribution doesn’t contain values produced by the track itself, because the reference frame was chosen as close as possible to the anchor. Thus we can approximate the baseline contribution by the average measurement of all those other tracks which do cover the timespan between the anchor frame and our local reference frame.

Together, this allows us to “bootstrap” the calculation for all tracks.
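
A sketch of this bootstrap (hypothetical track accessors): pick each track’s reference frame as its marker frame closest to the anchor, and derive its baseline from the other tracks which cover the span between anchor and that reference frame:

```python
def reference_frame(track, anchor):
    """The track's data frame closest to the anchor frame."""
    return min(track.marker_frames(), key=lambda f: abs(f - anchor))

def baseline_contribution(track, all_tracks, anchor, measure):
    """Approximate the cumulated value of one data feed at the track's
    reference frame by averaging the measurement of all other tracks
    which cover the whole span between anchor and reference frame.

    measure(t, frame) is a hypothetical callback returning track t's
    cumulated value of the relevant data feed at that frame."""
    ref = reference_frame(track, anchor)
    span = (min(anchor, ref), max(anchor, ref))
    covering = [t for t in all_tracks
                if t is not track and t.covers(*span)]
    if not covering:
        return 0.0  # reference frame coincides with the anchor
    return sum(measure(t, ref) for t in covering) / len(covering)
```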

normalised contributions

For each data feed, we try to get the contributions into a “logically similar” shape.

  • for offsets, we subtract the position at the reference frame

  • for angles, we subtract the angle measurement at the reference frame

  • for scale measurements, we divide by the scale measurement of this track at the reference frame.

It turns out that in practice this normalisation constant and the baseline can be packed into a single precomputed value per track and data feed.
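
A small sketch of these rules (hypothetical, reduced to single scalar values) shows how the reference value and the baseline collapse into one constant per track and data feed:

```python
def precompute_constant(value_at_reference, baseline, is_scale=False):
    """Fold the reference value and the baseline contribution into a
    single constant: additive for offsets and angles, multiplicative
    for scale."""
    if is_scale:
        return baseline / value_at_reference
    return baseline - value_at_reference

def normalised_contribution(value_at_frame, constant, is_scale=False):
    """Normalised contribution of one track for one data feed at the
    current frame, ready to be fed into the weighted average."""
    if is_scale:
        return value_at_frame * constant
    return value_at_frame + constant
```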

By following this rule, for each data feed we get a measurement which represents the observed movement of the image frame. We try to keep these measurements clearly distinct from the setup of a function to compensate for this movement. But this compensation function must be defined in a way that corresponds precisely to the way the measurement was retrieved.

In particular, this means that for compensation the rotation has to be performed around exactly the same pivot point which was previously used to retrieve the rotation information for this frame.

The Pivot Point Problem

The trickiest part is how to pick the pivot point for rotation and scale compensation. To be more precise, we have to distinguish several pivots involved in the process.

  • the (accidental) rotation centre of the original camera movement to be compensated

  • the rotation centre chosen deliberately by the camera operator for moving shots

  • the centre used in the 2D stabiliser to retrieve measurements and move the frame for compensation

A 100% compensation is possible only when this latter pivot has been chosen suitably. Yet there seems to be no easy rule for this choice. In particular, we have to consider that in practice tracking points are not 100% stable and contain some error, both image quantisation error and errors caused by 3D perspective. Movement of tracking points in 3D space especially tends to produce spurious rotation information. This is a major obstacle for any attempt to figure out “the real” pivot automatically.

As a simple starting point, for this reworked version I’ve chosen just the image centre as the pivot point, albeit the image centre after applying the translation compensation. It remains to be discussed whether there could be a better choice, or whether we should leave this choice to the user. The following diagram shows the situation when compensating a rotation movement. Note that for the original image in camera, the rotation centre is not on the horizon, but slightly above.

Diagram: what we do to compensate 2D image movement
step-1

we determine the translation offset by averaging over the measurement points
Please note: what we measure is how the image elements are shifted inside the frame, not how the frame itself needs to be shifted for compensation. This explains the direction of the measured translation vector (in this example, down and right).

step-2

we shift the pivot point (= image centre) of frame-B by the measured translation, i.e. we shift the pivot along with the image elements, by the amount we have determined them to be shifted. This happens to bring the pivot close to the corresponding position (image centre) of frame-A

step-3

now we can measure the angle changes for each tracking point. We then average over these angle contributions. Note: in this example, the asymmetry of the pivot point relative to the horizon causes slightly differing angles for the two points; the resulting average angle lies in the middle

When we now rotate frame-B around the translation-compensated pivot point and then apply the translation offset, we get a close match, but we miss the target points by a small margin. In this example, we can just “see” that a better pivot point would have been where the two horizon lines cross, but we shouldn’t assume that matters are as clear in a real-world situation

  • the original rotation centre does not necessarily lie on the horizon line, nor at the frame centre. In fact, it can be anywhere in the frame, or even outside the frame, since it is an arbitrary combination of accidental movement and a free choice of movement by the camera operator

  • in practice, the movements of the tracking points are not “clean” (like in this example), but contain some fuzziness or errors caused by motion blur, shape and perspective changes, or by real movements of depicted objects during the shot.
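
Leaving those caveats aside, the compensation step itself can be sketched as a plain 2D point transform (a hypothetical sketch, not the actual image-buffer code): shift the pivot by the measured translation, undo rotation and scale around that shifted pivot, then undo the translation:

```python
import math

def compensate(point, offset, angle, scale, frame_centre):
    """Map a point of the current frame to its stabilised position.

    offset, angle and scale are the measured image movements; rotation
    and scale are undone around the pivot shifted by the measured
    translation, matching the way the measurements were taken."""
    # pivot = image centre after applying the translation
    px, py = frame_centre[0] + offset[0], frame_centre[1] + offset[1]
    x, y = point[0] - px, point[1] - py            # relative to the pivot
    c, s = math.cos(-angle), math.sin(-angle)      # undo the rotation...
    x, y = (x * c - y * s) / scale, (x * s + y * c) / scale  # ...and the zoom
    # back to absolute coordinates, then undo the translation
    return x + px - offset[0], y + py - offset[1]
```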

Features of this rework of Blender’s 2D stabiliser

  • ability to pick multiple tracks for translation compensation, and to pick the same or other tracks for rotation compensation.

  • detection of scale changes (zoom) together with rotation

  • the ability to define an expected target position of the stabilised image

Image Target Position

In practice, stabilised shots are rarely static. Rather, the camera made all kinds of zoom, pan, and travelling movements. We deliberately want to reconstruct these movements: we do not want to “correct” the original movements, rather we want to re-do them in a more considerate way. Moreover, please note that it is impossible in principle to calculate “correct” reconstructed movements automatically, since the actual movements involve an aesthetic component and are to some extent a matter of taste.

The approach proposed here is to work these intended movements right into the parameters of the stabilisation, and to do so before it is applied to the image. Basically this means that the raw stabilisation data is corrected by (compensates for) the intended movement. The net effect is that the corrected image stays in frame:

  • when the camera pans or travels, the raw stabilisation would soon move the image entirely out of frame. But by animating the parameter for the target position, we can keep the image roughly in frame

  • by animating the target rotation, we can follow rolling and tilting movements of the original shot

  • when the footage contains a zoom, the scale detection of the stabilisation function will apply the reverse of the zoom. If we zoomed in towards telephoto, for example, the stabilised frame becomes smaller. By animating the target zoom parameter, we can then recreate a smooth zoom according to actual aesthetic demands
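
A rough sketch of how such animated target parameters could be folded into the raw stabilisation values for one frame (hypothetical parameter names; the actual sign conventions depend on how the measurements are defined):

```python
def apply_target(raw_offset, raw_angle, raw_scale,
                 target_offset, target_angle, target_scale):
    """Correct the raw stabilisation by the intended (animated) movement,
    so that the stabilised image follows the user-defined target
    position, rotation and zoom instead of staying rigidly fixed."""
    offset = (raw_offset[0] + target_offset[0],
              raw_offset[1] + target_offset[1])
    angle = raw_angle + target_angle
    scale = raw_scale * target_scale
    return offset, angle, scale
```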
