# Image Basics
## Projection
While real cameras use lenses and mirrors, the simplest form of camera is the *pin-hole camera*: a point aperture with no lens, which projects the world onto a flat ***image*** (*retinal*) ***plane***. A surface that is not flat can be made flat with a transformation.

![image.png](attachment:image.png)

The triangles formed by the environment point $P=(X,Y,Z)$, the ***centre of projection*** $O$ and the ***optical axis*** (the line perpendicular to the *image plane* passing through the *centre of projection*) are similar to those formed by the projected point $Q=(x,y,f)$, $O$ and the optical axis, where $f$ is the distance from $O$ to the image plane. The projection is therefore $\frac{x}{f}=\frac{X}{Z}$ and $\frac{y}{f}=\frac{Y}{Z}$ ($\frac{y_1}{f}=\frac{x_1}{x_3}$ and $\frac{y_2}{f}=\frac{x_2}{x_3}$ respectively in the illustration). Equivalently, in vector form, $\frac{1}{f}\vec{Q} = \frac{1}{\vec{P}\cdot\hat{z}}\vec{P}$, where $\vec{P}\cdot\hat{z}=(X,Y,Z)(0,0,1)^{\text{T}}=Z$.
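A minimal Python sketch of these projection equations follows. It assumes the image plane sits at distance `f` in front of $O$, as in the similar-triangle construction, so the image is not inverted. The function name `project` is a hypothetical, illustrative choice.

```python
def project(P, f):
    """Pin-hole perspective projection of P = (X, Y, Z) onto the image plane at
    distance f from the centre of projection: Q = (f*X/Z, f*Y/Z, f) = f * P / Z."""
    X, Y, Z = P
    if Z == 0:
        raise ValueError("Z = 0: point lies in the plane through the centre of projection")
    return (f * X / Z, f * Y / Z, f)

q_near = project((2.0, 1.0, 4.0), f=1.0)   # (0.5, 0.25, 1.0)
q_far = project((2.0, 1.0, 8.0), f=1.0)    # (0.25, 0.125, 1.0): twice as far, half the size
```

Dividing by $Z$ is what makes distant objects appear smaller. It is also why depth cannot be recovered from a single projected point.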

Differentiating these projections with respect to time, $t$, relates the environment velocity to the projection velocity: $\frac{1}{f}\frac{dx}{dt}=\frac{1}{Z}\frac{dX}{dt}-\frac{X}{Z^2}\frac{dZ}{dt}=\frac{1}{Z}(\frac{dX}{dt}-\frac{x}{f}\frac{dZ}{dt})$, and likewise for $y$. 

Suppose every scene point shares the same velocity relative to the camera (a pure translation). Then this identifies the single image point that does not move, $(x_0,y_0)=(\frac{dX}{dt}f/\frac{dZ}{dt},\frac{dY}{dt}f/\frac{dZ}{dt})$, called the ***focus of expansion*** (FOE). Rewriting the perspective differentials using the FOE gives

$$\frac{1}{f}\frac{dx}{dt}=\frac{1}{Z}\frac{dX}{dt}-\frac{X}{Z^2}\frac{dZ}{dt}= \frac{\frac{dZ}{dt}}{Z}\frac{\frac{dX}{dt}}{\frac{dZ}{dt}}-\frac{X}{Z}\frac{\frac{dZ}{dt}}{Z}=\frac{\frac{dZ}{dt}}{Z}(\frac{x_0-x}{f})$$

This describes a vector field with a rate of $\frac{dZ}{dt}/Z=\frac{d}{dt}\ln{Z}$. The field points outwards from the FOE when the scene approaches ($\frac{dZ}{dt}<0$) and inwards when it recedes. For an approaching object, $-Z/\frac{dZ}{dt}$ (the magnitude of the inverse rate) is the time to impact.
- In vector form, $\frac{1}{f}\frac{d\vec{Q}}{dt}=\frac{1}{\vec{P}\cdot\hat{z}}\frac{d\vec{P}}{dt}-\frac{\vec{P}}{(\vec{P}\cdot\hat{z})^2}\frac{d(\vec{P}\cdot\hat{z})}{dt}=\frac{1}{Z}(\frac{d\vec{P}}{dt}-\frac{\vec{Q}}{f}\frac{d(\vec{P}\cdot\hat{z})}{dt})$.
- Write this as $\frac{1}{(\vec{P}\cdot\hat{z})^2}((\vec{P}\cdot\hat{z})\frac{d\vec{P}}{dt}-\vec{P}\frac{d(\vec{P}\cdot\hat{z})}{dt})$. Then apply the *cross product of a cross product* $\vec{a}\times(\vec{b}\times\vec{c})=(\vec{c}\cdot\vec{a})\vec{b}-(\vec{a}\cdot\vec{b})\vec{c}$ with $\vec{a}=\hat{z}$, $\vec{b}=\frac{d\vec{P}}{dt}$ and $\vec{c}=\vec{P}$. This gives $\frac{1}{f}\frac{d\vec{Q}}{dt}=\frac{1}{(\vec{P}\cdot\hat{z})^2}[\hat{z}\times(\frac{d\vec{P}}{dt}\times\vec{P})]$.

This means that $\frac{d\vec{Q}}{dt}$ is perpendicular to $\hat{z}$, because $\hat{z}\times\vec{v}$ is perpendicular to $\hat{z}$ for any $\vec{v}$. The image point therefore only moves within the image plane. 

If $\frac{d\vec{P}}{dt}$ is parallel to $\vec{P}$, then $\frac{d\vec{P}}{dt}\times\vec{P}=0$ and so $\frac{d\vec{Q}}{dt}=0$. A point moving along its own line of sight does not move in the image: an element moving toward the camera at the FOE will only appear to expand, not move.
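The velocity relations can be checked numerically with a plain-Python sketch. It assumes every scene point shares one hypothetical translation `V` relative to the camera; `cross`, `image_velocity` and `focus_of_expansion` are illustrative names, not from any library. The sketch does three things:

- evaluates the cross-product form of $\frac{d\vec{Q}}{dt}$;
- confirms it matches the FOE form $\frac{dZ/dt}{Z}(x_0-x)$;
- confirms that a point imaged at the FOE does not move.

```python
def cross(a, b):
    """Cross product of two 3-vectors given as tuples."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def image_velocity(P, dP, f):
    """dQ/dt = f / Z^2 * [z_hat x (dP/dt x P)] for scene point P moving with dP/dt."""
    Z = P[2]
    c = cross((0.0, 0.0, 1.0), cross(dP, P))
    return tuple(f * ci / Z**2 for ci in c)

def focus_of_expansion(V, f):
    """FOE (x0, y0) for a translation V shared by all scene points (needs dZ/dt != 0)."""
    VX, VY, VZ = V
    return (f * VX / VZ, f * VY / VZ)

f = 1.0
V = (0.5, -0.25, -2.0)     # hypothetical translation; dZ/dt < 0, so the scene approaches
P = (1.0, 2.0, 10.0)
dQ = image_velocity(P, V, f)                    # ~(0.07, 0.015, 0.0): third component is 0
x0, y0 = focus_of_expansion(V, f)               # (-0.25, 0.125)
x, y = f * P[0] / P[2], f * P[1] / P[2]
rate = V[2] / P[2]                              # (dZ/dt) / Z
foe_form = (rate * (x0 - x), rate * (y0 - y))   # same as dQ[:2], up to rounding
time_to_impact = -P[2] / V[2]                   # 5.0
P_foe = (x0 * P[2] / f, y0 * P[2] / f, P[2])    # scene point imaged at the FOE
dQ_foe = image_velocity(P_foe, V, f)            # (0, 0, 0): it only expands, never moves
```

Here `P_foe` is parallel to `V`, which is exactly the $\frac{d\vec{P}}{dt}\parallel\vec{P}$ case above.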

## Brightness
An image (grayscale) is a 2D pattern of ***brightness*** values, which can be thought of as a continuous function $E(x,y)$ (or discrete $E_{x,y}$). It depends on the ***illumination***, ***reflectance*** and ***orientation*** of the object photographed. 

Modern (digital) camera systems generate images that are discretized both in space (a rectangular grid of sensor elements) and in brightness (a fixed number of bits per value). Even so, images are often easier to reason about in the continuous domain, with the results then transformed back into discrete form.
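To make the two kinds of discretization concrete, here is a small Python sketch. It samples a continuous brightness function at pixel centres and quantises the result to 8 bits. The pattern `ripple`, the pixel-centre offset of 0.5 and the $[0, 1]$ brightness range are illustrative assumptions, not properties of any particular camera.

```python
import math

def sample_and_quantise(E, width, height, bits=8):
    """Discretise a continuous brightness function E(x, y) with values in [0, 1]:
    sample it at pixel centres (discretization in space), clip to [0, 1] and
    round to one of 2**bits levels (discretization in brightness)."""
    levels = 2**bits - 1
    return [[round(levels * min(max(E(col + 0.5, row + 0.5), 0.0), 1.0))
             for col in range(width)]
            for row in range(height)]

# Hypothetical test pattern that varies sinusoidally along x only.
def ripple(x, y):
    return 0.5 + 0.4 * math.sin(0.3 * x)

image = sample_and_quantise(ripple, width=8, height=4)   # 4 rows of 8 integers in 0..255
```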

### 1D motion
While the majority of conventional images are 2D, there are specialised 1D (line) sensors, $E(x,t)$. These are useful because they can provide a much longer sensor array than a traditional 2D sensor. However, they require a scan/sweep to produce a "real" image (e.g. conveyor belt automation, or satellite imaging as the satellite moves).

Assume that image measurements (*brightness*) in a small region remain the same even though their location may change (*brightness* consistency). Suppose a 1D image is taken as an object moves across the sensor. Then $E(x,t)=E(x+\delta x,t+\delta t)$. 

A related assumption is spatial coherence in image flow. Neighbouring points in the scene typically belong to the same surface and hence typically have similar motions.

Suppose the *brightness* curve (assumed smooth) shifts by $\delta x = \frac{dx}{dt}\delta t$ in a time $\delta t$. A sensor element at a fixed position $x$ then sees the *brightness* that was previously at $x-\delta x$. Using a linear approximation of the local *brightness* with gradient $\frac{\partial E}{\partial x}$, its *brightness* changes by $\delta E\approx-\frac{\partial E}{\partial x}\delta x =-\frac{\partial E}{\partial x}\frac{dx}{dt}\delta t$.

$$\therefore\frac{\delta E}{\delta t}= -\frac{\partial E}{\partial x}\frac{dx}{dt} \overset{\lim}{\rightarrow} \frac{\partial E}{\partial t}= -\frac{\partial E}{\partial x}\frac{dx}{dt}\rightarrow \frac{dx}{dt}=-\frac{\partial E/\partial t}{\partial E/\partial x}$$

The $-\text{ve}$ sign reflects how $\frac{\partial E}{\partial x}$ and $\frac{\partial E}{\partial t}$ are related. For forward motion ($\frac{dx}{dt}>0$) they have opposite signs: as a rising edge moves forward, a fixed sensor element sees darker values, so $\frac{\partial E}{\partial t}<0$. For backward motion they have the same sign.
- A different approach is a first-order Taylor expansion, $E(x+\delta x,t+\delta t) \approx E(x,t) + \frac{\partial E}{\partial x}\delta x + \frac{\partial E}{\partial t}\delta t$. As $E(x,t)=E(x+\delta x,t+\delta t)$, then $E(x,t)=E(x,t) + \frac{\partial E}{\partial x}\delta x + \frac{\partial E}{\partial t}\delta t\rightarrow 0=\frac{\partial E}{\partial x}\delta x + \frac{\partial E}{\partial t}\delta t$. Dividing by $\delta t$ gives $0=\frac{\partial E}{\partial x}\frac{\delta x}{\delta t} + \frac{\partial E}{\partial t}\overset{\lim}{\rightarrow} \frac{dx}{dt}=-\frac{\partial E/\partial t}{\partial E/\partial x}$.

This allows the motion $u=\frac{dx}{dt}=-\frac{\partial E/\partial t}{\partial E/\partial x}$ to be recovered from the image at a single point (only possible in the 1D case). In the discrete case, the derivatives can be estimated with forward differences: $\frac{\partial E}{\partial x}\approx \frac{1}{\delta x}(E(x+\delta x,t)-E(x,t))$, and likewise $\frac{\partial E}{\partial t}\approx \frac{1}{\delta t}(E(x,t+\delta t)-E(x,t))$.
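A sketch of the single-point 1D estimate with forward differences follows. It assumes two frames stored as Python lists with a pixel spacing of 1; the function name `velocity_1d_single` and the test frames are hypothetical. For a linear brightness ramp the estimate is exact. On a flat region it breaks down.

```python
def velocity_1d_single(E0, E1, x, dt=1.0):
    """Estimate u = dx/dt at pixel index x from two 1D frames, E0 at time t and
    E1 at time t + dt, using forward differences (pixel spacing 1):
        dE/dx ~ E0[x+1] - E0[x],   dE/dt ~ (E1[x] - E0[x]) / dt,   u = -(dE/dt)/(dE/dx)."""
    Ex = E0[x + 1] - E0[x]
    Et = (E1[x] - E0[x]) / dt
    if Ex == 0:
        raise ZeroDivisionError("zero brightness gradient: motion is undefined here")
    return -Et / Ex

# Hypothetical frames: a linear ramp E = 2x that moves right by 1.5 pixels per frame.
E0 = [2.0 * x for x in range(6)]
E1 = [2.0 * (x - 1.5) for x in range(6)]
u = velocity_1d_single(E0, E1, x=2)   # 1.5 (exact for a linear ramp)
```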

Note, however, that these finite differences subtract similar quantities. In regions of nearly uniform brightness, $\partial E/\partial x$ is 0 (or close to 0), so the motion cannot be calculated (in an implementation this is a division by 0), and small errors in the measured brightness are greatly amplified. Each individual motion measurement is therefore very noisy and not trustworthy. 

In practice, to reduce the noise, motion is estimated from many pixels using regression techniques. One option is a weighted average $\frac{\sum_{i=1}^Nw_i\frac{-\partial E/\partial t_i}{\partial E/\partial x_i}}{\sum_{i=1}^Nw_i}$. With weights $w_i=(\partial E/\partial x_i)^2$, this is the ordinary least squares estimate $-\frac{\sum_i(\partial E/\partial x_i)(\partial E/\partial t_i)}{\sum_i(\partial E/\partial x_i)^2}$. Another option is integration in the continuous domain.
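A least-squares version over many pixels can be sketched as follows. It assumes the same list-based frames as before; the name `velocity_1d_least_squares` and the sinusoidal test pattern are hypothetical. Pixels with a near-zero gradient no longer cause a division by zero, because they simply contribute little to both sums.

```python
import math

def velocity_1d_least_squares(E0, E1, dt=1.0):
    """Least-squares 1D velocity using every pixel: minimise sum_i (Ex_i*u + Et_i)^2,
    giving u = -sum(Ex*Et) / sum(Ex^2). This equals the weighted average of the
    per-pixel estimates -Et/Ex with weights Ex^2, so flat pixels contribute little."""
    num = den = 0.0
    for x in range(len(E0) - 1):
        Ex = E0[x + 1] - E0[x]        # forward difference in space, pixel spacing 1
        Et = (E1[x] - E0[x]) / dt     # forward difference in time
        num += Ex * Et
        den += Ex * Ex
    if den == 0.0:
        raise ZeroDivisionError("no brightness gradient anywhere: motion is undefined")
    return -num / den

# Hypothetical frames: a sinusoidal pattern shifted right by 0.5 pixels.
k, shift = 0.2, 0.5
E0 = [math.sin(k * x) for x in range(1000)]
E1 = [math.sin(k * (x - shift)) for x in range(1000)]
u_est = velocity_1d_least_squares(E0, E1)   # close to 0.5; finite differences add a small bias
```

The single-pixel estimate would be unreliable near the peaks of this pattern, where $\partial E/\partial x\approx 0$.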

### 2D motion
As in the 1D case, the derivatives can be approximated with forward first differences:
- $\frac{\partial E}{\partial x}\approx \frac{1}{\delta x}(E(x+\delta x,y,t)-E(x,y,t))$
- $\frac{\partial E}{\partial y}\approx \frac{1}{\delta y}(E(x,y+\delta y,t)-E(x,y,t))$
- $\frac{\partial E}{\partial t}\approx \frac{1}{\delta t}(E(x,y,t+\delta t)-E(x,y,t))$
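A direct transcription of these three forward differences follows, assuming a pixel spacing of 1 and frames stored as lists of rows indexed `E[y][x]`. The function name and test frames are hypothetical. The test frames use a brightness plane translating with a known velocity, so the estimates can be checked against the brightness change constraint.

```python
def gradients_2d(E0, E1, x, y, dt=1.0):
    """Forward-difference estimates of (dE/dx, dE/dy, dE/dt) at pixel (x, y),
    with pixel spacing 1. Frames are lists of rows, indexed E[y][x]."""
    Ex = E0[y][x + 1] - E0[y][x]
    Ey = E0[y + 1][x] - E0[y][x]
    Et = (E1[y][x] - E0[y][x]) / dt
    return Ex, Ey, Et

# Hypothetical frames: the plane E = 3x + 2y translating with (u, v) = (1, 0.5) pixels/frame.
E0 = [[3.0 * x + 2.0 * y for x in range(4)] for y in range(4)]
E1 = [[3.0 * (x - 1.0) + 2.0 * (y - 0.5) for x in range(4)] for y in range(4)]
Ex, Ey, Et = gradients_2d(E0, E1, x=1, y=1)   # (3.0, 2.0, -4.0)
```

With these values, $u E_x + v E_y + E_t = 3 + 1 - 4 = 0$.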

Alternatively, instead of measuring how the *brightness* at a fixed pixel changes between time frames, follow a point on the object as it moves. The constant *brightness* assumption means that the brightness of that point does not change over time. Its total derivative is therefore zero, and by the chain rule:
$$\frac{dE(x,y,t)}{dt}=\frac{dx}{dt}\frac{\partial E}{\partial x}+\frac{dy}{dt}\frac{\partial E}{\partial y}+\frac{\partial E}{\partial t}=0 \leftrightarrow (\frac{dx}{dt},\frac{dy}{dt})\cdot(\frac{\partial E}{\partial x},\frac{\partial E}{\partial y})=-\frac{\partial E}{\partial t}$$

This appears as a line in velocity space, $(\frac{dx}{dt},\frac{dy}{dt})$. Normalizing the equation so that the *brightness* gradient becomes a unit vector gives

$$(\frac{dx}{dt},\frac{dy}{dt})\cdot(\frac{\frac{\partial E}{\partial x}}{\sqrt{(\frac{\partial E}{\partial x})^2+(\frac{\partial E}{\partial y})^2}},\frac{\frac{\partial E}{\partial y}}{\sqrt{(\frac{\partial E}{\partial x})^2+(\frac{\partial E}{\partial y})^2}})=\frac{-\frac{\partial E}{\partial t}}{\sqrt{(\frac{\partial E}{\partial x})^2+(\frac{\partial E}{\partial y})^2}}$$

The left-hand side is the component of the optical flow vector $(\frac{dx}{dt},\frac{dy}{dt})$ in the direction of maximum change in *brightness*, $(\frac{\partial E}{\partial x},\frac{\partial E}{\partial y})$. The *brightness* gradient points across edges, from dark to light, whereas the optical flow vector points along the motion. 

A single measurement therefore determines only this component of the motion (the *normal flow*, across the edge). The constraint line is perpendicular to the *brightness* gradient, at a distance $|\frac{\partial E}{\partial t}|/\sqrt{(\frac{\partial E}{\partial x})^2+(\frac{\partial E}{\partial y})^2}$ from the origin. The component of the flow along the edge is left unconstrained. 

Note that constant *brightness* means the total derivative $\frac{dE}{dt}$ is 0, not the partial derivative $\frac{\partial E}{\partial t}$. The optical flow vector is perpendicular to the *brightness* gradient only in the special case $\frac{\partial E}{\partial t}=0$ (motion purely along the edge).
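The normalized equation can be evaluated directly. The following is a minimal sketch; the function name `normal_flow` and the measurement values are hypothetical. It returns the unit gradient and the flow component along it. The resulting vector is the point of the constraint line closest to the origin of velocity space.

```python
import math

def normal_flow(Ex, Ey, Et):
    """Return (s, n): n is the unit brightness gradient and s = -Et/|grad E| is the
    component of the optical flow along n. The component along the edge
    (perpendicular to n) is not determined by this single measurement."""
    mag = math.hypot(Ex, Ey)
    if mag == 0.0:
        raise ValueError("zero brightness gradient: no constraint on the flow")
    return -Et / mag, (Ex / mag, Ey / mag)

s, n = normal_flow(3.0, 4.0, -10.0)      # s = 2.0, n = (0.6, 0.8)
normal_velocity = (s * n[0], s * n[1])   # (1.2, 1.6): closest point of the constraint line to the origin
```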

However, there are two unknowns, $(\frac{dx}{dt},\frac{dy}{dt})$, and only one equation, so the problem is underdetermined and requires additional constraints to be solvable. This is the ***aperture problem***: the motion direction of a contour is ambiguous, because the motion component parallel to the contour cannot be inferred from the visual input. As a result, contours of different orientations moving at different speeds (i.e. different solutions of the underdetermined equation) can cause identical responses in a motion-sensitive neuron in the visual system.
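The ambiguity can be shown concretely. The sketch below uses a hypothetical single-point measurement, with gradient $(3, 4)$ and temporal change $-10$. It builds several velocities that differ only by motion along the edge direction $(-E_y, E_x)$. All of them satisfy the constraint equally well, so one measurement cannot tell them apart.

```python
def constraint_residual(u, v, Ex, Ey, Et):
    """Residual of the brightness change constraint u*Ex + v*Ey + Et (0 when satisfied)."""
    return u * Ex + v * Ey + Et

# Hypothetical measurement at one point: gradient (3, 4), temporal change -10.
Ex, Ey, Et = 3.0, 4.0, -10.0
# Normal flow (1.2, 1.6) plus s times the edge direction (-Ey, Ex): very different velocities.
candidates = [(1.2 - 4.0 * s, 1.6 + 3.0 * s) for s in (-1.0, 0.0, 0.5, 2.0)]
residuals = [constraint_residual(u, v, Ex, Ey, Et) for u, v in candidates]   # all ~0
```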

A simple solution is to add the constraint that more than one point in the image moves with the same velocity (e.g. a single element moving across the image plane). Take two such points, indexed by the subscripts 1 and 2. Then $\frac{dx}{dt}\frac{\partial E}{\partial x_1}+\frac{dy}{dt}\frac{\partial E}{\partial y_1}+\frac{\partial E}{\partial t_1}=0$ and $\frac{dx}{dt}\frac{\partial E}{\partial x_2}+\frac{dy}{dt}\frac{\partial E}{\partial y_2}+\frac{\partial E}{\partial t_2}=0$, which means that

$$\left[ {\begin{array}{cc}\frac{\partial E}{\partial x_1} & \frac{\partial E}{\partial y_1} \\ \frac{\partial E}{\partial x_2} & \frac{\partial E}{\partial y_2} \\ \end{array} } \right] \left[ {\begin{array}{cc} \frac{dx}{dt} \\ \frac{dy}{dt} \\  \end{array} } \right] = \left[ {\begin{array}{cc} -\frac{\partial E}{\partial t_1} \\ -\frac{\partial E}{\partial t_2} \\ \end{array} } \right] \rightarrow \left[ {\begin{array}{cc} \frac{dx}{dt} \\ \frac{dy}{dt} \\  \end{array} } \right] = \frac{1}{\frac{\partial E}{\partial x_1}\frac{\partial E}{\partial y_2}-\frac{\partial E}{\partial y_1}\frac{\partial E}{\partial x_2}} \left[ {\begin{array}{cc}\frac{\partial E}{\partial y_2} & -\frac{\partial E}{\partial y_1} \\ -\frac{\partial E}{\partial x_2} & \frac{\partial E}{\partial x_1} \\ \end{array} } \right] \left[ {\begin{array}{cc} -\frac{\partial E}{\partial t_1} \\ -\frac{\partial E}{\partial t_2} \\ \end{array} } \right]$$
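The closed-form $2\times 2$ inverse translates directly into code. The sketch below takes each point's measurements as a hypothetical `(Ex, Ey, Et)` tuple. It rejects near-zero determinants, which happen when the two gradients are parallel. The tolerance `eps` is an arbitrary illustrative choice.

```python
def flow_from_two_points(g1, g2, eps=1e-12):
    """Solve [Ex1 Ey1; Ex2 Ey2] [u; v] = [-Et1; -Et2] with the explicit 2x2 inverse.
    g1, g2 are (Ex, Ey, Et) measured at two points assumed to share one velocity."""
    Ex1, Ey1, Et1 = g1
    Ex2, Ey2, Et2 = g2
    det = Ex1 * Ey2 - Ey1 * Ex2
    if abs(det) < eps:
        raise ValueError("brightness gradients are (nearly) parallel: determinant ~ 0")
    u = (Ey2 * -Et1 - Ey1 * -Et2) / det
    v = (-Ex2 * -Et1 + Ex1 * -Et2) / det
    return u, v

# Hypothetical measurements consistent with a true velocity (u, v) = (1, 0.5).
u, v = flow_from_two_points((3.0, 2.0, -4.0), (1.0, 4.0, -3.0))   # (1.0, 0.5)
```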

As in the 1D case, this solution is noisy. It can be improved in two ways:

- by using more points in the calculation (the increased dimensionality requires a different solution of $E_{x|y}'[u, v]^{\text{T}}=-E_t'\rightarrow \ldots$, as the system becomes overdetermined);
- by taking multiple pairs of points and aggregating their estimates.

Note that this equation fails when the determinant is 0, i.e. when $\frac{\partial E}{\partial y_1}/\frac{\partial E}{\partial x_1}=\frac{\partial E}{\partial y_2}/\frac{\partial E}{\partial x_2}$. In that case the *brightness* gradients are proportional to each other, so the second equation provides no new information.

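A more robust solution is to reframe this as an optimisation problem and find the velocity that minimises the squared error of the motion equation over a region:

$$J(u,v)\overset{\Delta}{=}\int_{x\in X}\int_{y\in Y}(u\frac{\partial E}{\partial x}+v\frac{\partial E}{\partial y}+\frac{\partial E}{\partial t})^2dxdy$$

If the correct velocities are found, the integrand is 0 everywhere and so the integral over the whole region is 0 (the square ensures the integrand is $\geq 0$). The strategy to find the "best" velocities is then $u^*,v^*=\arg\min_{u,v}J(u,v)\rightarrow \frac{\partial J(u,v)}{\partial u}=0, \frac{\partial J(u,v)}{\partial v}=0$. Writing $E_x=\frac{\partial E}{\partial x}$, $E_y=\frac{\partial E}{\partial y}$, $E_t=\frac{\partial E}{\partial t}$ and omitting the $dxdy$, this gives 2 linear equations in the 2 unknowns, which can be solved:

$$\left[ {\begin{array}{cc}\iint E_x^2 & \iint E_xE_y \\ \iint E_xE_y & \iint E_y^2 \\ \end{array} } \right] \left[ {\begin{array}{c} u \\ v \\ \end{array} } \right] = -\left[ {\begin{array}{c} \iint E_xE_t \\ \iint E_yE_t \\ \end{array} } \right]$$

This method can also fail, because the determinant $\iint E_x^2\iint E_y^2-(\iint E_xE_y)^2$ can be 0. By the Cauchy–Schwarz inequality, this happens exactly when $E_x$ and $E_y$ are proportional throughout the region, for example:

- when $E=0$ everywhere or $E=\text{constant}$;
- when $\frac{\partial E}{\partial x}=0$ or $\frac{\partial E}{\partial y}=0$ everywhere (constant *brightness* along one direction);
- when $\frac{\partial E}{\partial x}=k\frac{\partial E}{\partial y}$ (all gradients parallel, such as a single straight edge).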
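In the discrete case the integrals become sums over pixels. The following is a minimal sketch of solving the resulting $2\times 2$ normal equations. The function name `flow_least_squares`, the `(Ex, Ey, Et)` input format and the absolute tolerance `eps` are hypothetical choices. In a real implementation the tolerance would need to scale with the image contrast.

```python
def flow_least_squares(grads, eps=1e-12):
    """Minimise J(u, v) = sum (u*Ex + v*Ey + Et)^2 over all measurements (Ex, Ey, Et).
    Setting dJ/du = dJ/dv = 0 gives the normal equations
        [Sxx Sxy] [u]   [-Sxt]
        [Sxy Syy] [v] = [-Syt],   where Sab = sum(Ea * Eb)."""
    Sxx = Sxy = Syy = Sxt = Syt = 0.0
    for Ex, Ey, Et in grads:
        Sxx += Ex * Ex
        Sxy += Ex * Ey
        Syy += Ey * Ey
        Sxt += Ex * Et
        Syt += Ey * Et
    det = Sxx * Syy - Sxy * Sxy          # >= 0 by Cauchy-Schwarz, up to rounding
    if det < eps:
        raise ValueError("degenerate: gradients zero or all parallel (determinant ~ 0)")
    u = (-Sxt * Syy + Syt * Sxy) / det
    v = (-Syt * Sxx + Sxt * Sxy) / det
    return u, v

# Hypothetical measurements at three pixels, all consistent with (u, v) = (1, 0.5).
u, v = flow_least_squares([(3.0, 2.0, -4.0), (1.0, 4.0, -3.0), (2.0, -1.0, -1.5)])   # (1.0, 0.5)
```

With noisy measurements the residuals no longer vanish, but the same normal equations give the velocity with the smallest total squared error.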