# Lucas-Kanade Optical Flow

## Motivation
* Optical flow is the pattern of motion of objects, surfaces, and edges in a visual scene caused by the relative motion between an observer and a scene
* It is a 2D vector field in which each vector is the displacement of a point from the first frame to the second

## Applications
Optical flow has many application domains:
* Structure from motion
* Video compression
* Video stabilization
* Autonomous vehicles
* Object detection and tracking

## Foundational Ideas
Optical flow estimation rests on several assumptions:
* The pixel intensities of an object do not change between consecutive frames
* Neighbouring pixels have similar motion
* Motion is smooth, which means that the motion of one object is not independent of the motion of its neighbours

## Images as 3D objects
* Previously we've seen that although images are stored as 2D arrays of pixels, they are better thought of as samples from a continuous image surface: $I(x, y)$
* We've also talked about other dimensions that we could consider such as
    * Scale
    * Color
    * Time
* For now, we'll focus on the time dimension
* This means we can think of a sequence of frames (video) as samples of a 3D function: $I(x, y, t)$

## Optical Flow Foundational Equation
* Suppose the frames are capturing a moving object
* This means from one frame to another, the image will be displaced by some amount
    * We considered this idea previously in the context of Harris corner detection
* The object is also moving through the frame over time
* Suppose our object moves $(dx, dy)$ in a time interval of $dt$
* Assuming the intensity doesn't change and the movement is purely translational, we can write the following equation:
\begin{equation*}
    I(x + dx, y + dy, t + dt) = I(x, y, t) 
    \hspace{100em}
\end{equation*}
If we then take the first-order Taylor expansion of the left-hand side, we get:
\begin{align*}
    I(x + dx, y + dy, t + dt) &\approx I(x, y, t) + \frac{\partial I}{\partial x}dx + \frac{\partial I}{\partial y}dy + \frac{\partial I}{\partial t}dt
    \hspace{100em}
\end{align*}
Substituting this approximation into the brightness constancy equation and dividing by $dt$ gives:
\begin{align*}
    I(x + dx, y + dy, t + dt) &= I(x, y, t) \\
    I(x + dx, y + dy, t + dt) - I(x, y, t) &= 0 \\
    \frac{\partial I}{\partial x}dx + \frac{\partial I}{\partial y}dy + \frac{\partial I}{\partial t}dt &= 0\\
    \frac{\partial I}{\partial x} \frac{dx}{dt} + \frac{\partial I}{\partial y} \frac{dy}{dt}  + \frac{\partial I}{\partial t} &=0\\
    \frac{\partial I}{\partial x} u + \frac{\partial I}{\partial y} v + \frac{\partial I}{\partial t} &=0
    \hspace{100em}
\end{align*}
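where $u \triangleq \frac{dx}{dt}$ and $v \triangleq \frac{dy}{dt}$ are the velocities in the $x$ and $y$ directions respectively. This is a single equation in two unknowns, so it cannot be solved at one pixel in isolation; an additional constraint is needed.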
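
The constraint can be checked numerically. The following is a minimal sketch, assuming a synthetic Gaussian blob that translates with a known velocity; the `frame` helper, the velocity values, and the use of central differences are all choices made for this illustration. If the derivation holds, the residual $I_x u + I_y v + I_t$ should be small compared with $I_t$ itself.

```python
import numpy as np

def frame(t, u, v, size=48, sigma=5.0):
    """Gaussian blob translating with velocity (u, v) pixels per frame."""
    ys, xs = np.mgrid[0:size, 0:size]
    cx, cy = size / 2 + u * t, size / 2 + v * t
    return np.exp(-((xs - cx) ** 2 + (ys - cy) ** 2) / (2 * sigma ** 2))

u, v = 0.3, -0.2  # assumed ground-truth velocity for the demo
prev_f, curr_f, next_f = frame(-1, u, v), frame(0, u, v), frame(1, u, v)

# Central differences approximate the three partial derivatives at t = 0.
I_y, I_x = np.gradient(curr_f)
I_t = (next_f - prev_f) / 2.0

# Residual of the optical flow equation, relative to the size of I_t.
residual = I_x * u + I_y * v + I_t
relative_residual = np.abs(residual).max() / np.abs(I_t).max()
print(f"relative residual: {relative_residual:.4f}")
```

The residual is not exactly zero because the finite differences only approximate the true derivatives.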


## Lucas-Kanade Method
* The Lucas-Kanade method estimates the optical flow of an object in a video
* It is based on the idea that the motion of an object is smooth and that the pixel intensities of an object do not change between consecutive frames
* It works by taking a small window around a pixel and assuming that the motion of the object in that window is constant
* It then solves for the velocities $u$ and $v$ that satisfy the optical flow equation in that window

Let's assume we wish to minimize the squared $L_2$ norm of the error between one patch in a frame and a shifted patch in the next frame. We'll use the first-order Taylor expansion to approximate the shifted patch, assuming that the motion is small enough for the higher-order terms to be ignored. This gives us the following equations:
\begin{align*}
    I(x+u, y+v, t) &\approx I(x, y, t) + \frac{\partial I(x,y,t)}{\partial x} u + \frac{\partial I(x,y,t)}{\partial y} v \\
    E(h) &\triangleq \sum_{x,y \in R} \left(I(x+u, y+v, t) - I(x, y, t + dt)\right)^2 \\
    &\approx \sum_{x,y \in R}  \left(I(x, y, t) + \frac{\partial I(x,y,t)}{\partial x} u + \frac{\partial I(x,y,t)}{\partial y} v - I(x, y, t + dt)\right)^2 
    \hspace{100em}
\end{align*}
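where $h \triangleq [u, v]^T$ is the shift that best aligns the patch in frame $t$ with the patch at the same location in frame $t + dt$. With this convention $h$ points opposite to the object's motion, so the displacement of the object from frame $t$ to frame $t + dt$ is $-h$. The gradient of $E(h)$ is: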
\begin{align*}
    \nabla_h E(h) &\triangleq 
    \begin{bmatrix}
        \frac{\partial E}{\partial u} \\
        \frac{\partial E}{\partial v}
    \end{bmatrix} \\
    &= 2 \sum_{x,y \in R} \left(I(x, y, t) + \frac{\partial I(x,y,t)}{\partial x} u + \frac{\partial I(x,y,t)}{\partial y} v - I(x, y, t + dt)\right) \nabla_{x,y} I(x,y,t) \\
    &= 2 \sum_{x,y \in R} \nabla_{x,y} I(x,y,t) \left(I(x, y, t) + (\nabla_{x,y} I(x,y,t))^T h - I(x, y, t + dt)\right) \\
    &= 0
    \hspace{100em}
\end{align*}
Setting the gradient to zero and letting $g \triangleq \nabla_{x,y} I(x,y,t)$ denote the image gradient (column) vector, we can solve for $h$ as follows:
\begin{align*}
    \sum_{x,y \in R} g \left(I(x, y, t) + g^T h - I(x, y, t + dt)\right)  &= 0 \\
    \sum_{x,y \in R} g \left(g^T \right) h &= \sum_{x,y \in R} g \left(I(x, y, t + dt) - I(x, y, t)\right) \\
    h &= \left(\sum_{x,y \in R} g g^T \right)^{-1} \sum_{x,y \in R} g \left(I(x, y, t + dt) - I(x, y, t)\right)
    \hspace{100em}
\end{align*}
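
To make the linear system concrete, write $g = [I_x, I_y]^T$ and let $\Delta I \triangleq I(x, y, t + dt) - I(x, y, t)$ (shorthand introduced here for illustration only). The normal equations $\sum g g^T h = \sum g \, \Delta I$ then expand to a $2 \times 2$ system:
\begin{align*}
    \begin{bmatrix}
        \sum I_x^2 & \sum I_x I_y \\
        \sum I_x I_y & \sum I_y^2
    \end{bmatrix}
    \begin{bmatrix}
        u \\
        v
    \end{bmatrix}
    &=
    \begin{bmatrix}
        \sum I_x \Delta I \\
        \sum I_y \Delta I
    \end{bmatrix}
    \hspace{100em}
\end{align*}
where every sum runs over $x, y \in R$. The system has a unique solution only when the matrix on the left is invertible, which requires the window to contain gradients in more than one direction.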

* Note that, like the Harris corner detector, this is driven by the sum of outer products of the image gradient, $\sum g g^T$
* However, unlike the Harris corner detector, we're solving for the most accurate displacement of the patch in the next frame
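
The closed-form solution can be sketched in a few lines of NumPy. This is a minimal single-window illustration, not a full tracker: the function name `lucas_kanade_window`, the `blob` test pattern, the window size, and the use of `np.gradient` for the image gradient are all assumptions made for the example. In the derivation above, $h$ is the shift that aligns the patch in frame $t$ with frame $t + dt$, so the sketch negates it to report the object's motion.

```python
import numpy as np

def lucas_kanade_window(frame0, frame1, center, half_size=7):
    """Solve the 2x2 least-squares system for one square window.

    Returns h = [u, v] such that frame0 sampled at (x + u, y + v)
    approximately matches frame1 sampled at (x, y).
    """
    cy, cx = center
    win = (slice(cy - half_size, cy + half_size + 1),
           slice(cx - half_size, cx + half_size + 1))

    I_y, I_x = np.gradient(frame0)          # spatial gradient of the first frame
    diff = (frame1 - frame0)[win].ravel()   # I(x, y, t + dt) - I(x, y, t)

    # One row per pixel in the window: [I_x, I_y].
    g = np.stack([I_x[win].ravel(), I_y[win].ravel()], axis=1)
    A = g.T @ g       # sum of g g^T over the window
    b = g.T @ diff    # sum of g times the frame difference
    return np.linalg.solve(A, b)

def blob(cx, cy, size=48, sigma=5.0):
    """Synthetic test pattern: a Gaussian blob centred at (cx, cy)."""
    ys, xs = np.mgrid[0:size, 0:size]
    return np.exp(-((xs - cx) ** 2 + (ys - cy) ** 2) / (2 * sigma ** 2))

true_motion = np.array([0.4, -0.3])   # assumed object displacement (dx, dy) in pixels
frame0 = blob(24.0, 24.0)
frame1 = blob(24.0 + true_motion[0], 24.0 + true_motion[1])

h = lucas_kanade_window(frame0, frame1, center=(24, 24))
flow = -h   # h aligns frame0 to frame1, so the object's motion is its negation
print(flow)
```

The displacement here is deliberately kept well below one pixel, since the first-order Taylor approximation only holds for small motion.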

### Possible Improvements
* Spatial weighting
    * Typically Gaussian
* Account for affine transforms
    * This would require a 6D vector instead of a 2D vector
* Pyramidal approach
    * This would involve solving for the optical flow at multiple scales
    * This would allow for larger displacements to be captured
* Account for changes in lighting 
    * Brightness (additive) 
    * Contrast (multiplicative)
* Do iterative improvements
    * This would involve solving for the optical flow, then warping the image and solving again
    * This would be repeated until convergence or you reach the limit of your computational budget 
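
Two of these ideas, Gaussian spatial weighting and iterative refinement with warping, can be combined in a short sketch. Everything here is an assumption made for illustration: the names `bilinear_sample` and `iterative_lk`, the window and weight sizes, and the stopping rule. Unlike the derivation above, this sketch solves directly for the forward displacement $d$ with $I(x + d_x, y + d_y, t + dt) \approx I(x, y, t)$, so no sign flip is needed at the end.

```python
import numpy as np

def bilinear_sample(img, ys, xs):
    """Sample img at float coordinates, clamping to the image border."""
    h, w = img.shape
    xs = np.clip(xs, 0, w - 1)
    ys = np.clip(ys, 0, h - 1)
    x0 = np.floor(xs).astype(int)
    y0 = np.floor(ys).astype(int)
    x1 = np.minimum(x0 + 1, w - 1)
    y1 = np.minimum(y0 + 1, h - 1)
    fx, fy = xs - x0, ys - y0
    top = img[y0, x0] * (1 - fx) + img[y0, x1] * fx
    bottom = img[y1, x0] * (1 - fx) + img[y1, x1] * fx
    return top * (1 - fy) + bottom * fy

def iterative_lk(frame0, frame1, center, half_size=10, weight_sigma=5.0,
                 max_iters=20, tol=1e-3):
    """Estimate d = [dx, dy] such that frame1(x + d) ~ frame0(x) in one window."""
    cy, cx = center
    ys, xs = np.mgrid[cy - half_size:cy + half_size + 1,
                      cx - half_size:cx + half_size + 1].astype(float)
    # Gaussian spatial weights emphasise pixels near the window centre.
    w = np.exp(-((xs - cx) ** 2 + (ys - cy) ** 2) / (2 * weight_sigma ** 2)).ravel()

    I_y, I_x = np.gradient(frame0)
    iy, ix = ys.astype(int), xs.astype(int)
    g = np.stack([I_x[iy, ix].ravel(), I_y[iy, ix].ravel()], axis=1)
    A = (g * w[:, None]).T @ g          # weighted sum of g g^T
    template = frame0[iy, ix].ravel()

    d = np.zeros(2)
    for _ in range(max_iters):
        # Warp: sample the second frame at the currently predicted positions.
        warped = bilinear_sample(frame1, ys + d[1], xs + d[0]).ravel()
        b = -(g * w[:, None]).T @ (warped - template)
        step = np.linalg.solve(A, b)    # correction to the current estimate
        d += step
        if np.linalg.norm(step) < tol:
            break
    return d

def blob(cx, cy, size=64, sigma=6.0):
    """Synthetic test pattern: a Gaussian blob centred at (cx, cy)."""
    ys, xs = np.mgrid[0:size, 0:size]
    return np.exp(-((xs - cx) ** 2 + (ys - cy) ** 2) / (2 * sigma ** 2))

frame0 = blob(32.0, 32.0)
frame1 = blob(34.5, 30.5)   # assumed true displacement (dx, dy) = (2.5, -1.5)
d = iterative_lk(frame0, frame1, center=(32, 32))
print(d)
```

The displacement in this demo is several pixels, larger than a single linearised solve handles well; repeating the warp-and-solve step is what lets the estimate settle. A pyramidal scheme would extend the same loop across image scales.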


