# Image Basics
## Perspective Projection
While real cameras use lenses and mirrors, the simplest form of a camera is the *pin-hole camera*: a point aperture with no lens, which projects the world onto a flat ***image*** (*retinal*) ***plane*** (a non-flat image surface can be mapped to a flat one by a transformation).

![image.png](attachment:image.png)

Let the ***optical axis*** be the line through the ***centre of projection*** $O$ perpendicular to the *image plane*. The triangles formed with the optical axis by the environment point $P=(X,Y,Z)$, the centre of projection $O$, and the projected point $Q=(x,y,f)$ are similar, so the projection can be defined as $\frac{x}{f}=\frac{X}{Z}$ and $\frac{y}{f}=\frac{Y}{Z}$ ($\frac{y_1}{f}=\frac{x_1}{x_3}$ and $\frac{y_2}{f}=\frac{x_2}{x_3}$ respectively in the illustration). Equivalently, in vector form, $\frac{1}{f}\vec{Q} = \frac{1}{\vec{P}\cdot\hat{z}}\vec{P}$ (where $\vec{P}\cdot\hat{z}=(X,Y,Z)\cdot(0,0,1)=Z$).

By differentiating these projections with respect to time, $t$, the relationship between the environment velocity and the image velocity is $\frac{1}{f}\frac{dx}{dt}=\frac{1}{Z}\frac{dX}{dt}-\frac{X}{Z^2}\frac{dZ}{dt}=\frac{1}{Z}(\frac{dX}{dt}-\frac{x}{f}\frac{dZ}{dt})$ (and likewise for $y$). Setting the image velocity to zero identifies the stationary point (the part of the image that does not exhibit any motion), $(x_0,y_0)=(\frac{dX}{dt}f/\frac{dZ}{dt},\frac{dY}{dt}f/\frac{dZ}{dt})$, called the ***focus of expansion*** (*FOE*). Rewriting the perspective differentials using the *FOE* results in $\frac{1}{f}\frac{dx}{dt}=\frac{1}{Z}\frac{dX}{dt}-\frac{X}{Z^2}\frac{dZ}{dt}= \frac{\frac{dZ}{dt}}{Z}\frac{\frac{dX}{dt}}{\frac{dZ}{dt}}-\frac{X}{Z}\frac{\frac{dZ}{dt}}{Z}=\frac{\frac{dZ}{dt}}{Z}(\frac{x_0-x}{f})$, describing a vector field moving outwards (or inwards) from the *FOE* at a rate of $\frac{dZ}{dt}/Z=\frac{d}{dt}\ln{Z}$ (its inverse, $Z/\frac{dZ}{dt}$, is the *time to contact*).
- $\frac{1}{f}\frac{d\vec{Q}}{dt}=\frac{1}{\vec{P}\cdot\hat{z}}\frac{d\vec{P}}{dt}-\frac{\vec{P}}{(\vec{P}\cdot\hat{z})^2}\frac{d(\vec{P}\cdot\hat{z})}{dt}=\frac{1}{Z}(\frac{d\vec{P}}{dt}-\frac{\vec{Q}}{f}\frac{d(\vec{P}\cdot\hat{z})}{dt})$
- given the *cross product of a cross product* $\vec{a}\times(\vec{b}\times\vec{c})=(\vec{c}\cdot\vec{a})\vec{b}-(\vec{a}\cdot\vec{b})\vec{c}$ and $\frac{1}{\vec{P}\cdot\hat{z}}\frac{d\vec{P}}{dt}-\frac{\vec{P}}{(\vec{P}\cdot\hat{z})^2}\frac{d(\vec{P}\cdot\hat{z})}{dt}=\frac{1}{(\vec{P}\cdot\hat{z})^2}((\vec{P}\cdot\hat{z})\frac{d\vec{P}}{dt}-\vec{P}\frac{d(\vec{P}\cdot\hat{z})}{dt})$, then $\frac{1}{f}\frac{d\vec{Q}}{dt}=\frac{1}{(\vec{P}\cdot\hat{z})^2}[\hat{z}\times(\frac{d\vec{P}}{dt}\times\vec{P})]$.

This means that $\frac{d\vec{Q}}{dt}$ is perpendicular to $\hat{z}$ (the cross product of 2 vectors is perpendicular to both of them); and if $\frac{d\vec{P}}{dt}$ is parallel to $\vec{P}$, then $\frac{d\vec{P}}{dt}\times\vec{P}=0$ and so $\frac{d\vec{Q}}{dt}=0$: the image point does not move (an element moving toward the camera at the *FOE* will just appear to expand and not move).
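
As a quick numerical sanity check of the relations above, the following sketch compares the *FOE* form of the image velocity with a finite-difference derivative of the projection. NumPy is assumed; the function names `project` and `image_velocity` and the sample values are illustrative choices, not part of the notes.

```python
import numpy as np

def project(P, f):
    """Perspective projection (x, y) = f * (X, Y) / Z of a 3D point P."""
    return f * P[:2] / P[2]

def image_velocity(P, V, f):
    """Image velocity in FOE form: (dZ/dt / Z) * ((x0, y0) - (x, y)).

    Assumes dZ/dt != 0 so that the focus of expansion is finite.
    """
    foe = f * V[:2] / V[2]      # (x0, y0) = f * (dX/dt, dY/dt) / (dZ/dt)
    rate = V[2] / P[2]          # (dZ/dt) / Z
    return rate * (foe - project(P, f)), foe

P = np.array([1.0, 2.0, 10.0])   # environment point (X, Y, Z)
V = np.array([0.5, -0.2, -2.0])  # its velocity (dX/dt, dY/dt, dZ/dt)
f = 1.0

velocity, foe = image_velocity(P, V, f)

# Finite-difference derivative of the projection for comparison.
h = 1e-6
numeric = (project(P + h * V, f) - project(P, f)) / h
```

For these sample values the two velocities agree to within the finite-difference error, and a point placed exactly at the *FOE* would have zero image velocity.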

### Brightness
An image (grayscale) is a 2D pattern of ***brightness*** values, which can be thought of as a continuous function, $E(x,y)$ (or discrete, $E_{x,y}$), which depends on the ***Illumination***, ***Reflectance***, and ***Orientation*** of the object photographed. While modern (digital) camera systems generate images discretized in both space (rectangular image sensor) and brightness (quantized to a fixed number of bits), the image is better understood in the continuous domain and then transformed back into discrete form.

#### 1D motion
While the majority of conventional images are 2D, there are specialised 1D sensors, $E(x,t)$, which are useful and can provide a much longer sensor array than a traditional sensor; though they require a scan/sweep to provide a "real" image (conveyor belt automation, satellite imaging as it moves, etc.).

Assuming image measurements (*brightness*) in a small region remain the same although their location may change (*brightness* constancy), suppose a 1D image is taken as an object moves across the sensor. This means that $E(x,t)=E(x+\delta x,t+\delta t)$; neighbouring points in the scene will typically belong to the same surface and hence typically have similar motions (spatial coherence in image flow is expected).

Therefore, the change in position of point $x$ in the *brightness* curve (assumed smooth) over time is given by $\delta x = \frac{dx}{dt}\delta t$; with the *brightness* gradient being $\frac{\partial E}{\partial x}$, and its change $\delta E=\frac{\partial E}{\partial x}\delta x =\frac{\partial E}{\partial x}\frac{dx}{dt}\delta t$ (linear approximation of the local *brightness*).

$$\therefore\frac{\delta E}{\delta t}= -\frac{\partial E}{\partial x}\frac{dx}{dt} \overset{\lim}{\rightarrow} \frac{\partial E}{\partial t}= -\frac{\partial E}{\partial x}\frac{dx}{dt}\rightarrow \frac{dx}{dt}=-\frac{\partial E/\partial t}{\partial E/\partial x}$$

The negative sign is needed because of how $\frac{\partial E}{\partial x}$ and $\frac{\partial E}{\partial t}$ are related along the *brightness* curve: for forward motion ($\frac{dx}{dt}>0$) the two partial derivatives have opposite signs, whereas for backward motion they have the same sign.
- A different approach is through $E(x+\delta x,t+\delta t) = E(x,t) + \frac{\partial E}{\partial x}\delta x + \frac{\partial E}{\partial t}\delta t$ (first-order expansion) and as $E(x,t)=E(x+\delta x,t+\delta t)$, then $E(x,t)=E(x,t) + \frac{\partial E}{\partial x}\delta x + \frac{\partial E}{\partial t}\delta t\rightarrow 0=\frac{\partial E}{\partial x}\delta x + \frac{\partial E}{\partial t}\delta t$. Therefore, dividing by $\delta t$ results in $0=\frac{\partial E}{\partial x}\frac{\delta x}{\delta t} + \frac{\partial E}{\partial t}\overset{\lim}{\rightarrow} \frac{dx}{dt}=-\frac{\partial E/\partial t}{\partial E/\partial x}$.

This allows the motion to be recovered from the image, $u=\frac{dx}{dt}=-\frac{\partial E/\partial t}{\partial E/\partial x}$, from a single point (only true in the 1D case). In the discrete case, the *brightness* gradient can be estimated via $\frac{\partial E}{\partial x}\approx \frac{1}{\delta x}(E(x+\delta x,t)-E(x,t))$, and likewise $\frac{\partial E}{\partial t}\approx \frac{1}{\delta t}(E(x,t+\delta t)-E(x,t))$.

Note however that these estimates subtract similar quantities, which amplifies noise, and that wherever the *brightness* is locally flat, $\partial E/\partial x = 0$ (or close to 0), the motion cannot be calculated (implementation-wise it will be a division by 0). Each individual motion measurement is therefore very noisy and not trustworthy; in practice, to reduce the noise, motion is estimated using many pixels, e.g. through the weighted average $\frac{\sum_{i=1}^Nw_i\frac{-(\partial E/\partial t)_i}{(\partial E/\partial x)_i}}{\sum_{i=1}^Nw_i}$ (with weights $w_i=(\partial E/\partial x)_i^2$ this is the Ordinary Least Squares solution), or integration in the continuous domain, etc.
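
The pooled 1D estimate can be sketched as follows. This is a minimal illustration assuming NumPy, central differences via `np.gradient`, and the least-squares weights $w_i=(\partial E/\partial x)_i^2$; the function name `estimate_velocity_1d` and the synthetic Gaussian profile are assumptions made for the example.

```python
import numpy as np

def estimate_velocity_1d(E0, E1, dx=1.0, dt=1.0, eps=1e-12):
    """Least-squares estimate of dx/dt from two 1D brightness profiles.

    Equivalent to the weighted average of -E_t/E_x with weights E_x**2,
    which avoids dividing by a near-zero gradient at any single pixel.
    """
    Ex = np.gradient(0.5 * (E0 + E1), dx)  # spatial gradient of the mid-frame
    Et = (E1 - E0) / dt                    # temporal difference
    denom = np.sum(Ex ** 2)
    if denom < eps:
        raise ValueError("brightness gradient is (almost) zero everywhere")
    return -np.sum(Ex * Et) / denom

def gaussian_profile(x, centre, sigma=8.0):
    """Smooth synthetic brightness curve used only for this demonstration."""
    return np.exp(-((x - centre) ** 2) / (2.0 * sigma ** 2))

x = np.arange(100, dtype=float)
true_u = 0.3                                # pixels per frame
E0 = gaussian_profile(x, 50.0)
E1 = gaussian_profile(x, 50.0 + true_u)     # same curve, shifted by true_u
u_hat = estimate_velocity_1d(E0, E1)
```

With this smooth profile, `u_hat` should land close to `true_u`; a perfectly flat profile raises an error instead of dividing by zero.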

#### 2D motion
As in the previous 1D case, the derivatives can be approximated with forward first differences:
- $\frac{\partial E}{\partial x}\approx \frac{1}{\delta x}(E(x+\delta x,y,t)-E(x,y,t))$
- $\frac{\partial E}{\partial y}\approx \frac{1}{\delta y}(E(x,y+\delta y,t)-E(x,y,t))$
- $\frac{\partial E}{\partial t}\approx \frac{1}{\delta t}(E(x,y,t+\delta t)-E(x,y,t))$

Alternatively, instead of taking the derivatives at a fixed pixel and seeing how it changes in the next time frame, track the object and see how its *brightness* changes. Making the constant *brightness* assumption means that $\frac{dE(x,y,t)}{dt}=0$, as the brightness of the object does not change over time, which gives the ***Brightness Change Constraint Equation*** (*BCCE*):
$$\frac{dE(x,y,t)}{dt}=\frac{dx}{dt}\frac{\partial E}{\partial x}+\frac{dy}{dt}\frac{\partial E}{\partial y}+\frac{\partial E}{\partial t}=0 \leftrightarrow (\frac{dx}{dt},\frac{dy}{dt})\cdot(\frac{\partial E}{\partial x},\frac{\partial E}{\partial y})=-\frac{\partial E}{\partial t}$$

This appears as a line in velocity space, $(\frac{dx}{dt},\frac{dy}{dt})$. Writing $E_x=\frac{\partial E}{\partial x}$, $E_y=\frac{\partial E}{\partial y}$, $E_t=\frac{\partial E}{\partial t}$, and normalizing the equation to turn the *brightness* gradient into a unit vector gives $(\frac{dx}{dt},\frac{dy}{dt})\cdot(\frac{E_x}{\sqrt{E_x^2+E_y^2}},\frac{E_y}{\sqrt{E_x^2+E_y^2}})=\frac{-E_t}{\sqrt{E_x^2+E_y^2}}$. The constraint line is therefore perpendicular to the *brightness* gradient and lies at a distance $\frac{|E_t|}{\sqrt{E_x^2+E_y^2}}$ from the origin: only the component of the optical flow along the gradient is determined, while the component along the isophote is left free. In the special case where the *brightness* at the pixel does not change over time, $\frac{\partial E}{\partial t}=0$, the right-hand side vanishes and the optical flow vector $(\frac{dx}{dt},\frac{dy}{dt})$ is perpendicular to the maximum change in *brightness* $(\frac{\partial E}{\partial x},\frac{\partial E}{\partial y})$ (the *brightness* gradient points across edges, from dark to light, whereas such a flow vector points along them).

However, as there are two unknowns, $(\frac{dx}{dt},\frac{dy}{dt})$, and only one equation, the problem is underdetermined and thus requires additional constraints to be solvable. This is the *aperture problem*: the motion direction of a contour is ambiguous because the motion component parallel to the contour cannot be inferred from the visual input, meaning that a variety of contours of different orientations moving at different speeds can cause identical responses in a motion-sensitive neuron in the visual system.

A simple solution to this would be to add the constraint that more than one point in the image moves with the same velocities (single element moving across the plane); then $\frac{dx}{dt}\frac{\partial E}{\partial x_1}+\frac{dy}{dt}\frac{\partial E}{\partial y_1}+\frac{\partial E}{\partial t_1}=0$, and $\frac{dx}{dt}\frac{\partial E}{\partial x_2}+\frac{dy}{dt}\frac{\partial E}{\partial y_2}+\frac{\partial E}{\partial t_2}=0$ which means that

$$\left[ {\begin{array}{cc}\frac{\partial E}{\partial x_1} & \frac{\partial E}{\partial y_1} \\ \frac{\partial E}{\partial x_2} & \frac{\partial E}{\partial y_2} \\ \end{array} } \right] \left[ {\begin{array}{cc} \frac{dx}{dt} \\ \frac{dy}{dt} \\  \end{array} } \right] = \left[ {\begin{array}{cc} -\frac{\partial E}{\partial t_1} \\ -\frac{\partial E}{\partial t_2} \\ \end{array} } \right] \rightarrow \left[ {\begin{array}{cc} \frac{dx}{dt} \\ \frac{dy}{dt} \\  \end{array} } \right] = \frac{1}{\frac{\partial E}{\partial x_1}\frac{\partial E}{\partial y_2}-\frac{\partial E}{\partial y_1}\frac{\partial E}{\partial x_2}} \left[ {\begin{array}{cc}\frac{\partial E}{\partial y_2} & -\frac{\partial E}{\partial y_1} \\ -\frac{\partial E}{\partial x_2} & \frac{\partial E}{\partial x_1} \\ \end{array} } \right] \left[ {\begin{array}{cc} -\frac{\partial E}{\partial t_1} \\ -\frac{\partial E}{\partial t_2} \\ \end{array} } \right]$$

Similar to the 1D case, this solution is noisy and so can be improved by either bringing more points into the calculation (the increased dimensionality will require a different $E_{x|y}'[u, v]^{\text{T}}=E_t'\rightarrow \ldots$) or taking multiple pairs of points and aggregating them. Note that this equation fails when the determinant is 0, meaning that $\frac{\partial E}{\partial y_1}/\frac{\partial E}{\partial x_1}=\frac{\partial E}{\partial y_2}/\frac{\partial E}{\partial x_2}$: the *brightness* gradients at the two points are proportional to each other (the second point does not provide new information).

A more robust solution is to reframe this as an optimisation problem, where the minimum of the squared motion-equation error is found instead: $J(u,v)\overset{\Delta}{=}\int_{x\in X}\int_{y\in Y}(u\frac{\partial E}{\partial x}+v\frac{\partial E}{\partial y}+\frac{\partial E}{\partial t})^2dxdy$, where if the correct velocities are found then the integrand would be 0 and integrating it across the whole image will result in 0 (squared to ensure $\geq 0$). The strategy to find the "best" velocities would then be $u^*,v^*=\arg\min_{u,v}J(u,v)\rightarrow \frac{\partial J(u,v)}{\partial u}=0, \frac{\partial J(u,v)}{\partial v}=0$ (2 unknowns and 2 equations, can be solved).

$$\left[ {\begin{array}{cc} \int\int\frac{\partial E}{\partial x}^2 & \int\int\frac{\partial E}{\partial x}\frac{\partial E}{\partial y} \\ \int\int\frac{\partial E}{\partial x}\frac{\partial E}{\partial y} & \int\int\frac{\partial E}{\partial y}^2 \end{array} } \right] \left[ {\begin{array}{cc} \frac{dx}{dt} \\ \frac{dy}{dt} \end{array} } \right] = -\left[ {\begin{array}{cc} \int\int\frac{\partial E}{\partial x}\frac{\partial E}{\partial t} \\ \int\int\frac{\partial E}{\partial y}\frac{\partial E}{\partial t} \end{array} } \right]$$
$$\therefore \left[ {\begin{array}{cc} \frac{dx}{dt} \\ \frac{dy}{dt} \\  \end{array} } \right] = -\frac{1}{\int\int \frac{\partial E}{\partial x}^2\int\int \frac{\partial E}{\partial y}^2-(\int\int \frac{\partial E}{\partial x}\frac{\partial E}{\partial y})^2} \left[ {\begin{array}{cc} \int\int\frac{\partial E}{\partial y}^2 & -\int\int\frac{\partial E}{\partial x}\frac{\partial E}{\partial y} \\ -\int\int\frac{\partial E}{\partial x}\frac{\partial E}{\partial y} & \int\int\frac{\partial E}{\partial x}^2 \end{array} } \right] \left[ {\begin{array}{cc} \int\int\frac{\partial E}{\partial x}\frac{\partial E}{\partial t} \\ \int\int\frac{\partial E}{\partial y}\frac{\partial E}{\partial t} \end{array} } \right]$$

This method can also fail, as the determinant $\int\int E_x^2\int\int E_y^2-(\int\int E_xE_y)^2$ can be 0: when $E=0$ everywhere or $E=\text{constant}$ (constant *brightness*, so both $\frac{\partial E}{\partial x}=0$ and $\frac{\partial E}{\partial y}=0$), when $\frac{\partial E}{\partial x}=0$ or $\frac{\partial E}{\partial y}=0$ everywhere, or when $\frac{\partial E}{\partial x}=k\frac{\partial E}{\partial y}$ everywhere.

- A different variation of this problem supposes the *brightness* depends on position only through a linear combination of the coordinates, $E(x,y)=f(ax+by)$; supposing the scalar-valued function $f$ is differentiable over the domain with derivative $f'$, then $\frac{\partial E}{\partial x}=f'(ax+by)a$ and $\frac{\partial E}{\partial y}=f'(ax+by)b$. Hence $bE_x=aE_y$ everywhere, so $\int\int E_x^2\int\int E_y^2=(\int\int E_xE_y)^2$ and the equation fails, as this parameterization creates straight, parallel isophotes.
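
The least-squares solution and its failure mode can be sketched as below. This is a minimal illustration assuming NumPy, discrete sums in place of the integrals, and central differences via `np.gradient`; the function name `estimate_flow`, the relative determinant threshold `rel_tol`, and the synthetic test images are assumptions made for the example.

```python
import numpy as np

def estimate_flow(frame0, frame1, dt=1.0, rel_tol=1e-6):
    """Single (u, v) for the whole image by least squares on the BCCE."""
    mid = 0.5 * (frame0 + frame1)
    Ey, Ex = np.gradient(mid)            # axis 0 is y (rows), axis 1 is x
    Et = (frame1 - frame0) / dt
    inner = (slice(1, -1), slice(1, -1)) # drop one-sided border differences
    Ex, Ey, Et = Ex[inner], Ey[inner], Et[inner]

    Sxx, Sxy, Syy = np.sum(Ex * Ex), np.sum(Ex * Ey), np.sum(Ey * Ey)
    det = Sxx * Syy - Sxy ** 2
    # A (near-)zero determinant means constant brightness or parallel isophotes.
    if det <= rel_tol * (Sxx + Syy) ** 2:
        raise ValueError("degenerate brightness pattern: flow not recoverable")
    rhs = -np.array([np.sum(Ex * Et), np.sum(Ey * Et)])
    return np.linalg.solve(np.array([[Sxx, Sxy], [Sxy, Syy]]), rhs)

ys, xs = np.mgrid[0:64, 0:64].astype(float)

def blob(cx, cy, sigma=6.0):
    """Isotropic Gaussian blob: gradients point in every direction."""
    return np.exp(-((xs - cx) ** 2 + (ys - cy) ** 2) / (2.0 * sigma ** 2))

# The blob moves by (0.4, -0.2) pixels between the two frames.
u_hat, v_hat = estimate_flow(blob(32.0, 32.0), blob(32.4, 31.8))

# E = f(x + y): straight parallel isophotes, so the determinant vanishes.
stripes = np.sin(0.2 * (xs + ys))
```

Calling `estimate_flow(stripes, stripes)` raises the degenerate-pattern error, which is the aperture problem showing up as a singular system.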

### Noise Gain
***Noise gain***, the relationship between errors in measurements and errors in estimation of quantities about the environment, describes how errors in observed measurements (like noise in a signal) propagate and potentially amplify when estimating other quantities derived from those measurements. An example of this is the contours of constant error of an indoor GPS system using Wi-Fi access points (transponders), which arise from the difficulty of precisely measuring electromagnetic wave propagation times, leading to significant inaccuracies in position estimates derived from noisy signal measurements.

<table><tr><td>

![image.png](attachment:image.png)
</td><td>

![image-2.png](attachment:image-2.png)
</td></tr></table>

Suppose a phenomenon $x$ in the environment cannot be directly observed, and instead $y\overset{\Delta}{=}f(x)$ is measured, written as $x\rightarrow f(x)$. This is called the *Forward Problem* of machine vision, with the *Inverse Problem* being $y\overset{\Delta}{=}f(x)\rightarrow x$, which is often more important in machine vision (i.e. deriving environment phenomena from an image/video).

When the inverse of a function can be expressed in closed form or represented via a matrix or set of coefficients, the *inverse problem* can be directly solved using $x=f^{-1}(y)$. However, this is often not feasible in real-world scenarios. In such cases, a robust machine vision system must be designed to solve the *inverse problem* effectively; crucially, it must ensure that small perturbations in $y$ do not lead to large changes in $x$ (important as sensors can exhibit measurement noise).


When solving an inverse problem, a small perturbation in the observed variable $y$ (denoted $\delta y$) leads to a corresponding perturbation in the estimated variable $x$ (denoted $\delta x$). The *noise gain* then quantifies how much the noise in $y$ affects the estimate $x$; and so mathematically, if $x=f^{-1}(y)$, then:

$$\text{noise gain}=\lim_{\delta y\rightarrow 0}\frac{\delta x}{\delta y}=\left(\frac{df(x)}{dx}\right)^{-1}$$

This fails if $\frac{df(x)}{dx}=0$ (the quantity measured does not respond to the phenomenon) or $\frac{df(x)}{dx}\approx 0$ (the quantity measured responds only slightly to the phenomenon, so the noise gain is very large).

#### Vector case
Considering the multidimensional (linear) case, the *forward problem* can be defined as $\vec{x}=M\vec{b}$ and the *inverse* as $\vec{b}=M^{-1}\vec{x}$, for a square, invertible $M\in\mathbb{R}^{n\times n}$, $n\in\mathbb{N}$; thus, $\text{"noise gain"}\rightarrow\frac{|\delta\vec{b}|}{|\delta\vec{x}|}$. This single ratio does not take into account *anisotropy* (a property which has a different value when measured in different directions): the gain may not be equally high (or low) in all directions (similar to the error patterns above), and it is good to know which component to trust.

Solving the *inverse problem* requires $M^{-1}=\frac{1}{\det(M)}\left[ {\begin{array}{ccc} \cdot & \cdot & \cdot \\ \cdot & \cdot & \cdot \\ \cdot & \cdot & \cdot \end{array} } \right]$, which can also fail if $\det(M)=0$ or $\det(M)\approx 0$ (linear dependence, or near dependence, in the columns of $M$). As can be seen above, solving the two-pixel motion estimation is indeed solving the *inverse problem*.
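
One way to look at the direction-dependent noise gain of a linear inverse problem is through the singular value decomposition; this is a swapped-in illustration rather than something derived in the notes. Assuming NumPy and an invertible square $M$, a perturbation $\delta\vec{x}$ along the $i$-th left singular vector is amplified by $1/\sigma_i$ when mapped back through $M^{-1}$. The function name `directional_noise_gain` and the sample matrix are illustrative.

```python
import numpy as np

def directional_noise_gain(M):
    """Noise gains of b = M^{-1} x along each singular direction.

    Returns (gains, U, Vt): a perturbation of x along U[:, i] produces a
    perturbation of b along Vt[i] that is gains[i] = 1 / s[i] times larger.
    """
    U, s, Vt = np.linalg.svd(M)
    if s[-1] == 0.0:
        raise ValueError("M is singular: the inverse problem has no solution")
    return 1.0 / s, U, Vt

# Nearly degenerate system: the second measurement barely responds to b.
M = np.array([[2.0, 0.0],
              [0.0, 0.01]])
gains, U, Vt = directional_noise_gain(M)

# Perturb the measurement along the worst direction and invert.
delta_x = 1e-3 * U[:, -1]
delta_b = np.linalg.solve(M, delta_x)
worst_ratio = np.linalg.norm(delta_b) / np.linalg.norm(delta_x)
```

Here the gain is 0.5 in one direction and 100 in the other, so only the first component of the estimate deserves much trust.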

### Time To Contact (*TTC*)
Previously, *TTC* was introduced through the ratio $Z/\frac{dZ}{dt}$, which is negative when the camera approaches the object ($\frac{dZ}{dt}<0$). Let the quantity $C$ therefore be the inverse of *TTC* with its sign chosen to be positive on approach, $C\overset{\Delta}{=}-\frac{dZ}{dt}/Z$; and suppose that the camera is moving directly towards/away from an object (*radial* motion in the image, $\frac{dX}{dt}=0$, $\frac{dY}{dt}=0$, $x_0=0$, and $y_0=0$). Then $\frac{1}{f}\frac{dx}{dt} = \frac{\frac{dZ}{dt}}{Z}(\frac{x_0-x}{f})=-\frac{\frac{dZ}{dt}}{Z}\frac{x}{f}\rightarrow \frac{dx}{dt}=Cx$, and likewise $\frac{dy}{dt}=Cy$.

Applying this to the *Brightness Change Constraint Equation* would then produce $C(x\frac{\partial E}{\partial x}+y\frac{\partial E}{\partial y})+\frac{\partial E}{\partial t}=0\rightarrow C=-\frac{\frac{\partial E}{\partial t}}{x\frac{\partial E}{\partial x}+y\frac{\partial E}{\partial y}}$; the denominator is called the *"radial gradient"*, $(x,y)\cdot(\frac{\partial E}{\partial x},\frac{\partial E}{\partial y})=\sqrt{x^2+y^2}(\frac{x}{\sqrt{x^2+y^2}},\frac{y}{\sqrt{x^2+y^2}})\cdot(\frac{\partial E}{\partial x},\frac{\partial E}{\partial y})$, essentially computing the *brightness* gradient component in the radial direction (multiplied by the radius). The estimate fails where the radial gradient vanishes, i.e. where $(x,y)\perp(\frac{\partial E}{\partial x},\frac{\partial E}{\partial y})$ (the *brightness* gradient is purely tangential).

![image.png](attachment:image.png)

Therefore, the "bullseye" will *NOT* cause the equation to fail while the "pie-chart" *WILL* fail. This makes intuitive sense, as moving towards the centre causes a change in the image for the "bullseye" but not for the "pie-chart", for which no motion can be ascertained.

For a more robust estimate, the least-squares approach can be taken again, minimizing across the entire image using the parameterized velocities defined above ($\frac{dX}{dt}=0$, $\frac{dY}{dt}=0$, $x_0=0$, and $y_0=0$); since the *inverse TTC* is the unknown, the error is minimised over it: $\min_{C}J(C)\overset{\Delta}{=}\int_{x\in X}\int_{y\in Y}(C(x\frac{\partial E}{\partial x}+y\frac{\partial E}{\partial y})+\frac{\partial E}{\partial t})^2dxdy$. As there is only one unknown, minimising this is simply solving $\frac{dJ(C)}{dC}=2\int_{x\in X}\int_{y\in Y}(C(x\frac{\partial E}{\partial x}+y\frac{\partial E}{\partial y})+\frac{\partial E}{\partial t})(x\frac{\partial E}{\partial x}+y\frac{\partial E}{\partial y})dxdy=0$
$$\therefore C=-\frac{\int_{x\in X}\int_{y\in Y}\frac{\partial E}{\partial t}(x\frac{\partial E}{\partial x}+y\frac{\partial E}{\partial y})dxdy}{\int_{x\in X}\int_{y\in Y}(x\frac{\partial E}{\partial x}+y\frac{\partial E}{\partial y})^2dxdy}$$
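
The one-parameter least-squares estimate can be sketched on a synthetic expanding pattern. This is a minimal illustration assuming NumPy, image coordinates measured from the image centre (taken here as the *FOE*), the radial flow $\frac{dx}{dt}=Cx$, $\frac{dy}{dt}=Cy$, and discrete sums in place of the integrals; the function name `estimate_inverse_ttc` and the test pattern are assumptions made for the example.

```python
import numpy as np

def centred_grid(h, w):
    """Pixel coordinates (x, y) measured from the image centre."""
    y, x = np.mgrid[0:h, 0:w].astype(float)
    return x - (w - 1) / 2.0, y - (h - 1) / 2.0

def estimate_inverse_ttc(frame0, frame1, dt=1.0, eps=1e-12):
    """C = -sum(G * E_t) / sum(G**2) with radial gradient G = x E_x + y E_y."""
    x, y = centred_grid(*frame0.shape)
    Ey, Ex = np.gradient(0.5 * (frame0 + frame1))  # axis 0 is y, axis 1 is x
    Et = (frame1 - frame0) / dt
    inner = (slice(1, -1), slice(1, -1))           # drop one-sided borders
    G = (x * Ex + y * Ey)[inner]
    denom = np.sum(G ** 2)
    if denom < eps:
        raise ValueError("radial gradient vanishes: TTC not recoverable")
    return -np.sum(G * Et[inner]) / denom

def pattern(x, y):
    """Smooth synthetic texture used only for this demonstration."""
    return np.sin(0.2 * x) + np.cos(0.15 * y)

true_C = 0.01                                      # 1 / (100 frames)
x, y = centred_grid(64, 64)
frame0 = pattern(x, y)
# A point at x moves to x * exp(C) after one frame, so sample at x * exp(-C).
frame1 = pattern(x * np.exp(-true_C), y * np.exp(-true_C))

C_hat = estimate_inverse_ttc(frame0, frame1)
ttc_frames = 1.0 / C_hat
```

For this pattern `C_hat` should come out close to `true_C`, i.e. a time to contact of roughly 100 frames, without any knowledge of $Z$ or $f$.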

While mathematically correct, a few practical issues arise. As real cameras utilise lenses, the object may go out of focus as the camera moves closer, changing the image information and violating the assumption of constant *brightness*. Also, whereas the image motion is initially very small, it increases substantially as the camera gets closer to the object, violating the assumptions of derivative estimation (differences between fixed neighbouring pixels do not work well for large image motion). A solution to this is to combine image pixels (motion measured in "super"-pixels is small relative to motion at full resolution) and run these algorithms at **multiple scales**, thus computing/analysing motion velocities over different scales. This can also be helpful for building more computationally tractable motion estimation implementations, since the number of pixels over which computations must occur is reduced quadratically with the scale.

This method is additionally efficient in storage: using the infinite geometric series $\sum_{n=0}^\infty(\frac{1}{r^2})^n=\frac{1}{1-\frac{1}{r^2}}=1+\frac{1}{r^2-1}$, downsampling/downscaling by a factor of $r=2$ each time and storing all the smaller image representations requires only $33\%$ more stored data than the full-size image itself (downsampling is across both $x$ and $y$ dimensions, hence $r^2$).

- Be mindful of aliasing, which causes overlap and distortion between signals in the frequency domain; the image must be sampled at a spatial frequency that is high enough to not produce aliasing artifacts. ***Nyquist’s Sampling Theorem*** states that sampling at more than **twice** the highest spatial frequency present in the image avoids aliasing and the resulting spatial artifacts.
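
The storage argument can be checked with a small image pyramid. This is a minimal sketch assuming NumPy and $2\times 2$ block averaging as a crude low-pass filter before each subsampling step (a simple way to limit aliasing, not necessarily the filter used in practice); the function name `build_pyramid` is illustrative.

```python
import numpy as np

def build_pyramid(image, levels):
    """Return [full, half, quarter, ...] resolutions by 2x2 block averaging."""
    pyramid = [np.asarray(image, dtype=float)]
    for _ in range(levels):
        prev = pyramid[-1]
        h2, w2 = prev.shape[0] // 2, prev.shape[1] // 2
        if h2 == 0 or w2 == 0:
            break                                   # cannot halve any further
        # Average each 2x2 block: low-pass filter and subsample in one step.
        blocks = prev[:2 * h2, :2 * w2].reshape(h2, 2, w2, 2)
        pyramid.append(blocks.mean(axis=(1, 3)))
    return pyramid

rng = np.random.default_rng(0)
image = rng.random((64, 64))
pyramid = build_pyramid(image, levels=6)

total_pixels = sum(level.size for level in pyramid)
overhead = total_pixels / image.size                # bounded by 4/3
```

For a $64\times 64$ image the seven levels hold 5461 values in total, just under $\frac{4}{3}$ of the 4096 pixels in the original.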

#### Relaxed velocities constraints

If $\frac{dX}{dt}\neq 0$ and $\frac{dY}{dt}\neq 0$, then $\frac{dx}{dt}=\frac{1}{Z}(f\frac{dX}{dt}-x\frac{dZ}{dt})=\frac{f\frac{dX}{dt}}{Z}+Cx$ and $\frac{dy}{dt}=\frac{1}{Z}(f\frac{dY}{dt}-y\frac{dZ}{dt})=\frac{f\frac{dY}{dt}}{Z}+Cy$, with the inverse *TTC* taken as $C\overset{\Delta}{=}-\frac{dZ}{dt}/Z$ (positive on approach). Let then $\frac{f\frac{dX}{dt}}{Z}=A$ and $\frac{f\frac{dY}{dt}}{Z}=B$. In terms of the *FOE*, $A = (\frac{\frac{dZ}{dt}}{Z})(\frac{\frac{dX}{dt}f}{\frac{dZ}{dt}})=-Cx_0$ and similarly $B = -Cy_0$; and so $\frac{dx}{dt}=C(x-x_0)$ and $\frac{dy}{dt}=C(y-y_0)$. Therefore, the *Brightness Change Constraint Equation* can be rewritten as $A\frac{\partial E}{\partial x}+B\frac{\partial E}{\partial y} + C(x\frac{\partial E}{\partial x}+y\frac{\partial E}{\partial y}) +\frac{\partial E}{\partial t}=0$ and so $\min_{A,B,C}J(A,B,C)\overset{\Delta}{=}\int_{x\in X}\int_{y\in Y}(A\frac{\partial E}{\partial x}+B\frac{\partial E}{\partial y} + C(x\frac{\partial E}{\partial x}+y\frac{\partial E}{\partial y}) +\frac{\partial E}{\partial t})^2dxdy$ will then be solved by

$$\left[ {\begin{array}{ccc} \int\int E_x^2 & \int\int E_xE_y & \int\int E_x(xE_x+yE_y) \\ \int\int E_yE_x & \int\int E_y^2 & \int\int E_y(xE_x+yE_y) \\ \int\int (xE_x+yE_y)E_x & \int\int (xE_x+yE_y)E_y & \int\int (xE_x+yE_y)^2 \end{array} } \right] \left[ {\begin{array}{c} A \\ B \\ C \end{array} } \right] = -\left[ {\begin{array}{c} \int\int E_xE_t \\ \int\int E_yE_t \\ \int\int (xE_x+yE_y)E_t \end{array} } \right]$$

(with $E_x=\frac{\partial E}{\partial x}$, $E_y=\frac{\partial E}{\partial y}$, $E_t=\frac{\partial E}{\partial t}$; obtained by solving the system of equations $\frac{\partial J(A,B,C)}{\partial A}=0$, $\frac{\partial J(A,B,C)}{\partial B}=0$, and $\frac{\partial J(A,B,C)}{\partial C}=0$)
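
The three-parameter system can be sketched as a small linear least-squares solve. This is a minimal illustration assuming NumPy, coordinates measured from the image centre, the flow model $\frac{dx}{dt}=A+Cx$, $\frac{dy}{dt}=B+Cy$ matching the constraint equation above, and discrete sums in place of the integrals; the function name `estimate_abc` and the synthetic texture are assumptions made for the example.

```python
import numpy as np

def centred_coords(h, w):
    """Pixel coordinates (x, y) measured from the image centre."""
    y, x = np.mgrid[0:h, 0:w].astype(float)
    return x - (w - 1) / 2.0, y - (h - 1) / 2.0

def estimate_abc(frame0, frame1, dt=1.0):
    """Solve the 3x3 normal equations for (A, B, C)."""
    x, y = centred_coords(*frame0.shape)
    Ey, Ex = np.gradient(0.5 * (frame0 + frame1))  # axis 0 is y, axis 1 is x
    Et = (frame1 - frame0) / dt
    inner = (slice(1, -1), slice(1, -1))           # drop one-sided borders
    Ex, Ey, Et = Ex[inner], Ey[inner], Et[inner]
    G = x[inner] * Ex + y[inner] * Ey              # radial gradient
    basis = np.stack([Ex.ravel(), Ey.ravel(), G.ravel()], axis=1)
    normal = basis.T @ basis                       # matrix of summed products
    rhs = -basis.T @ Et.ravel()
    return np.linalg.solve(normal, rhs)

def texture(x, y):
    """Smooth synthetic texture used only for this demonstration."""
    return np.sin(0.2 * x) * np.cos(0.15 * y) + 0.5 * np.sin(0.1 * x + 0.25 * y)

A_true, B_true, C_true = 0.2, -0.1, 0.01
x, y = centred_coords(64, 64)
frame0 = texture(x, y)
# A point at x moves to about x + A + C x, so sample frame0 at the inverse map.
frame1 = texture((x - A_true) / (1 + C_true), (y - B_true) / (1 + C_true))

A_hat, B_hat, C_hat = estimate_abc(frame0, frame1)
```

With $C\neq 0$, the *FOE* follows from the estimates as $(x_0,y_0)=(-A/C,-B/C)$ under this sign convention.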

#### Relaxed Planar constraints
If the optical axis is not perpendicular to the wall, but instead the camera plane is tilted, the surface can be modelled as $Z=aX+bY+c$ for some $a,b,c\in\mathbb{R}$ (lower-case $c$ to avoid a clash with the inverse *TTC*, $C$). This follows the same process as above, creating $6$ equations with $6$ unknowns; unfortunately these are no longer linear equations and so must be solved numerically rather than through a closed-form expression.

A planar surface was chosen for its simplicity (in the fronto-parallel case above it produces linear equations which can be solved in a closed-form expression). Real surfaces may not be planar; they can be handled by the same procedure (least squares optimisation) and solved numerically (no closed-form expression), but in practice the planar model gives a very good estimate of *TTC* at lower complexity (richer models add unknowns and make the problem less overconstrained). Therefore, the only time a non-planar model should be considered is when the **depth-change of the object is similar to the distance from the object to the camera**.

#### Relaxed Fixed Optical Flow constraint
The solutions found for the *Brightness Change Constraint Equation* above were done under the assumption that the entire image motion is constant and singular (move together). If this is not the case, then *BCCE* introduces one constraint to solve for two variables (under-constrained). A naive solution would then be to **divide the image into equal-sized patches and apply the *Fixed Flow Paradigm***. Note that the smaller the patch, the more uniform the *brightness* patterns will be across the patch, and patches may be too uniform to detect motion.

#### Applications
NASA's mission to Europa required a probe to land safely on the surface; however, few detailed topographical maps and images of Europa were available, so a method was wanted that could reliably bring down the spacecraft without relying on known patterns. One of the ideas explored was to use *TTC* ***in control***, as it does not need calibration or prior knowledge of topographical maps of the surface. Additionally, it works on any surface (except certain special textures which are known).

![image-2.png](attachment:image-2.png)

The difference between some desired *TTC* and the estimated *TTC* is taken, producing an error signal, and multiplying it by a gain to control the rocket engine; changing the acceleration of the probe. This change in acceleration would change the height ($Z$) which would be picked up by the imaging system and will lead to a new *TTC* estimate. This system attempts to maintain the same *TTC*, as close to the desired value as is possible.

While it is difficult to compute height from a monocular camera without calibration, the ratio of height to speed can be robustly calculated. With $Z$ decreasing during the descent, let $-\frac{Z}{\frac{dZ}{dt}}=T\rightarrow \frac{dZ}{dt}=-\frac{1}{T}Z$ where $T$ is constant; the solution of this first-order ***Ordinary Differential Equation*** is then $Z(t)=Z_0e^{\frac{-t}{T}}$ ($Z_0$ depends on initial conditions).

Compare this with a more traditional method of constant deceleration, $\frac{d^2Z}{dt^2}\overset{\Delta}{=}a\rightarrow \frac{dZ}{dt}=a(t-t_0)\rightarrow Z=\frac{1}{2}a(t-t_0)^2$ (touching down with zero speed at $t=t_0$), where the *TTC* will be $T=-\frac{Z}{\frac{dZ}{dt}}=\frac{1}{2}(t_0-t)$. While this method is more fuel efficient than constant-*TTC* control, its *TTC* is only half of the actual time remaining until touchdown, and it is harder to implement (requires accurate estimates of distance and velocity).
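
The two descent profiles can be compared numerically. This is a minimal sketch assuming NumPy, a forward-Euler integration of $\frac{dZ}{dt}=-Z/T$ (the descent direction, with $T>0$), and illustrative values; it does not model the engine, gain, or imaging loop described above, and the function names are hypothetical.

```python
import numpy as np

def simulate_constant_ttc(z0, T, dt, steps):
    """Height over time when the TTC is held at T: dZ/dt = -Z / T."""
    z = np.empty(steps + 1)
    z[0] = z0
    for k in range(steps):
        z[k + 1] = z[k] + dt * (-z[k] / T)   # forward Euler step
    return z

def ttc_constant_deceleration(a, t, t0):
    """TTC = -Z / (dZ/dt) for Z = a (t - t0)^2 / 2, evaluated before t0."""
    Z = 0.5 * a * (t - t0) ** 2
    dZ = a * (t - t0)
    return -Z / dZ

# Constant TTC of 10 s from 100 m: height decays exponentially.
heights = simulate_constant_ttc(z0=100.0, T=10.0, dt=0.01, steps=1000)
expected_final = 100.0 * np.exp(-1.0)        # Z0 * exp(-t / T) at t = 10 s

# Constant deceleration touching down at t0 = 10 s, observed at t = 4 s.
ttc_decel = ttc_constant_deceleration(a=2.0, t=4.0, t0=10.0)
```

Under constant *TTC* the height only approaches zero asymptotically, whereas under constant deceleration the *TTC* reads 3 s when 6 s actually remain.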

### Vanishing Point
Suppose there is a line in the world, defined as $\vec{R}=\vec{R_0}+s\hat{\vec{n}}$, which is imaged by a camera projecting it onto the 2D image plane, $\frac{1}{f}\vec{r}=\frac{1}{(\vec{R_0}+s\hat{\vec{n}})\cdot\hat{\vec{z}}}(\vec{R_0}+s\hat{\vec{n}})$. Parametrically, with $\hat{\vec{n}}=(\alpha,\beta,\gamma)$, $(X_0+\alpha s, Y_0+\beta s, Z_0+\gamma s)\rightarrow \frac{x}{f}=\frac{X_0+\alpha s}{Z_0+\gamma s},\frac{y}{f}=\frac{Y_0+\beta s}{Z_0+\gamma s}$ and so in the limit $s\rightarrow\infty$ (line extends infinitely), $\frac{x_\infty}{f}=\frac{\alpha}{\gamma}$ and $\frac{y_\infty}{f}=\frac{\beta}{\gamma}$ (for $\gamma\neq 0$). This means that $(x_\infty,y_\infty)=(f\frac{\alpha}{\gamma},f\frac{\beta}{\gamma})$ is the ***vanishing point*** in the image plane; as the line in the world extends to infinity, the projected line approaches this point but never reaches it. Since the vanishing point depends only on the direction $\hat{\vec{n}}$ and not on $\vec{R_0}$, parallel lines in the world have the same *vanishing point* in the image.

This is important for both finding relative orientation between coordinate systems, as well as camera calibration.
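
The vanishing-point limit can be checked numerically. This is a minimal sketch assuming NumPy and a line direction $(\alpha,\beta,\gamma)$ with $\gamma\neq 0$; the function names and sample values are illustrative.

```python
import numpy as np

def project_line_point(R0, n, s, f):
    """Image of the point R0 + s * n under perspective projection."""
    P = R0 + s * n
    return f * P[:2] / P[2]

def vanishing_point(n, f, eps=1e-12):
    """Limit of the projection as s -> infinity: f * (alpha, beta) / gamma."""
    if abs(n[2]) < eps:
        raise ValueError("line is parallel to the image plane: no finite vanishing point")
    return f * n[:2] / n[2]

f = 1.5
n = np.array([0.2, -0.1, 1.0])               # shared direction (alpha, beta, gamma)
vp = vanishing_point(n, f)

# Two parallel lines through different points converge to the same image point.
far_a = project_line_point(np.array([1.0, 2.0, 5.0]), n, 1e8, f)
far_b = project_line_point(np.array([-3.0, 0.5, 8.0]), n, 1e8, f)
```

Both far points land on `vp` to within numerical precision, regardless of the starting point $\vec{R_0}$.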

#### Calibration Objects
One way of calibrating a camera is solving for the ***Centre of Projection*** (*COP*) (centre of lens is very rarely at the centre of the sensor plane and needs to be recovered) in the image space using perspective projection, typically achieved through calibration objects.
- **Sphere**: Relatively easy to manufacture. If imaged straight-on, the projection of the world sphere onto the image plane is a circle; otherwise it is an ellipse (a plane intersecting a cone at different angles produces ellipses). This requires accurate position measurements of the object and its projection, and while it can be done, its *noise gain* is high.
- **Cube**: Harder to manufacture, though its edges can be detected and used to find *vanishing points* (edges are lines in the world, forming three sets of parallel lines). For each of these sets, the line through the *Centre of Projection*, $\vec{p}\in\mathbb{R}^3$, parallel to the edges meets the image plane at the corresponding vanishing point. While $\vec{p}$ is unknown, the vanishing points, $\vec{a},\vec{b},\vec{c}$ (points in the image plane, treated as 3D vectors), are known; and as a cube has its edges orthogonal to one another, $(\vec{p}-\vec{a})\cdot(\vec{p}-\vec{b})=0$, $(\vec{p}-\vec{b})\cdot(\vec{p}-\vec{c})=0$, and $(\vec{p}-\vec{c})\cdot(\vec{p}-\vec{a})=0$. Using ***Bezout's Theorem*** (the maximum number of solutions is the product of the polynomial orders of the equations in the system), $\text{number of solutions}=\prod_{e=1}^{E}o_e$ ($2^3=8$). This is too many solutions to work with, but these equations have a special structure and can be subtracted from one another to generate $(\vec{p}-\vec{a})\cdot(\vec{c}-\vec{b})=0$, $(\vec{p}-\vec{b})\cdot(\vec{a}-\vec{c})=0$, and $(\vec{p}-\vec{c})\cdot(\vec{b}-\vec{a})=0$, which all have order 1. However, these three equations sum to zero (they are not linearly independent), so only two of them can be used and one of the original quadratics must be kept, resulting in $2$ solutions for the system. This problem can be thought of as finding the intersection of $3$ spheres (a point on a sphere subtends a right angle at the two ends of any diameter), which produces 2 solutions (the intersection of two spheres is a circle): one above and one below the plane through the sphere centres (one in front of and one behind the image sensor), of which only one is physically meaningful.
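
The cube-based calibration can be sketched as two linear equations followed by one quadratic. This is a minimal illustration assuming NumPy, vanishing points given as 2D image coordinates, and the *COP* written as $\vec{p}=(u_0,v_0,f)$ relative to the image plane; the function name `calibrate_from_vanishing_points` and the synthetic camera used to generate the vanishing points are assumptions made for the example.

```python
import numpy as np

def calibrate_from_vanishing_points(a, b, c):
    """Recover (u0, v0) and f from three mutually orthogonal vanishing points."""
    a, b, c = (np.asarray(v, dtype=float) for v in (a, b, c))
    # Two independent linear equations: (p - a).(c - b) = 0, (p - b).(a - c) = 0.
    A = np.array([c - b, a - c])
    rhs = np.array([a @ (c - b), b @ (a - c)])
    centre = np.linalg.solve(A, rhs)
    # Remaining quadratic: (p - a).(p - b) = 0 with p = (u0, v0, f).
    f_squared = -(centre - a) @ (centre - b)
    if f_squared <= 0:
        raise ValueError("vanishing points are not consistent with orthogonal edges")
    return centre, np.sqrt(f_squared)

# Synthetic camera (illustrative values) viewing a rotated cube.
f_true, centre_true = 800.0, np.array([320.0, 240.0])
alpha, beta = 0.6, 0.5
Rx = np.array([[1, 0, 0],
               [0, np.cos(alpha), -np.sin(alpha)],
               [0, np.sin(alpha), np.cos(alpha)]])
Ry = np.array([[np.cos(beta), 0, np.sin(beta)],
               [0, 1, 0],
               [-np.sin(beta), 0, np.cos(beta)]])
R = Rx @ Ry                                  # columns are the cube edge directions
vps = [centre_true + f_true * R[:2, i] / R[2, i] for i in range(3)]

centre_hat, f_hat = calibrate_from_vanishing_points(*vps)
```

Keeping the positive square root selects the solution on one side of the image plane, which mirrors the two-solution ambiguity described above.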

#### Multilateration
This is very similar to the multilateration problem, where the position of an object is estimated based on *Time of Arrival*; like the previous problem, it is solved by finding the intersection of 3 spheres, $||\vec{r}-\vec{r}_i||_2=p_i\forall i\in\{1,\ldots,N\}$ (here $N=3$). The $i^{th}$ sphere can then be written as $\vec{r}\cdot\vec{r}-2\vec{r}\cdot\vec{r}_i+\vec{r}_i\cdot\vec{r}_i=p_i^2$ and so the difference between two spheres can be rewritten as $2\vec{r}\cdot(\vec{r}_j-\vec{r}_i)+\vec{r}_i\cdot\vec{r}_i-\vec{r}_j\cdot\vec{r}_j=p_i^2-p_j^2$ and thus repeated for the 3 spheres becomes
$$\left[ {\begin{array}{c} (\vec{r}_2-\vec{r}_1)^{\text{T}} \\ (\vec{r}_3-\vec{r}_2)^{\text{T}} \\ (\vec{r}_1-\vec{r}_3)^{\text{T}} \end{array} } \right] \vec{r}= \frac{1}{2}\left[ {\begin{array}{c} (p_1^2-p_2^2)-(\vec{r}_1\cdot\vec{r}_1-\vec{r}_2\cdot\vec{r}_2) \\ (p_2^2-p_3^2)-(\vec{r}_2\cdot\vec{r}_2-\vec{r}_3\cdot\vec{r}_3) \\ (p_3^2-p_1^2)-(\vec{r}_3\cdot\vec{r}_3-\vec{r}_1\cdot\vec{r}_1) \end{array} } \right]$$

Note however that, similar to the previous method, this also results in a linear dependence between the equations (the left-hand matrix is singular), and so it can be solved as before by keeping one second-order equation (two solutions, by *Bezout's Theorem*). Another way of solving the calibration problem is to view it geometrically, where the constraints $(\vec{r}-\vec{r}_1)\cdot(\vec{r}-\vec{r}_2)=0$ and $(\vec{r}-\vec{r}_2)\cdot(\vec{r}-\vec{r}_3)=0$ can be subtracted from each other, resulting in $(\vec{r}-\vec{r}_2)\cdot(\vec{r}_3-\vec{r}_1)=0$; meaning that $(\vec{r}-\vec{r}_2)\perp (\vec{r}_3-\vec{r}_1)$ and that the plane this represents passes through $\vec{r}_2$. Applying this to all the equations results in the interesting solution of the ***orthocenter*** point of the triangle $\vec{r}_1\vec{r}_2\vec{r}_3$; which is the ***Principal Point*** (orthogonal projection of *COP*) of the camera. Therefore, as $\vec{r}_1,\vec{r}_2,\vec{r}_3$ are on the image plane ($\vec{r}_1\cdot\hat{\vec{z}}=0$, etc.), then after solving to find the *orthocenter*, a simple quadratic equation can be solved to find the perpendicular distance of *COP* to the image plane, $f$.
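
The three-sphere intersection can be sketched by combining two of the linear difference equations with one sphere equation. This is a minimal illustration assuming NumPy, exact (noise-free) ranges, and non-collinear anchors; the function name `trilaterate` and the sample geometry are assumptions made for the example.

```python
import numpy as np

def trilaterate(anchors, dists):
    """Both intersection points of three spheres |r - r_i| = p_i."""
    r1, r2, r3 = anchors
    p1, p2, p3 = dists
    # Two independent rows of the (singular) 3x3 linear system.
    A = np.array([r2 - r1, r3 - r2])
    b = 0.5 * np.array([
        (p1 ** 2 - p2 ** 2) - (r1 @ r1 - r2 @ r2),
        (p2 ** 2 - p3 ** 2) - (r2 @ r2 - r3 @ r3),
    ])
    # The linear equations define a line r0 + s * d.
    r0 = np.linalg.lstsq(A, b, rcond=None)[0]   # minimum-norm particular solution
    d = np.cross(A[0], A[1])
    d /= np.linalg.norm(d)
    # Keep one quadratic: |r0 + s d - r1|^2 = p1^2, solved for s.
    w = r0 - r1
    half_b = d @ w
    disc = half_b ** 2 - (w @ w - p1 ** 2)
    if disc < 0:
        raise ValueError("spheres do not intersect")
    root = np.sqrt(disc)
    return r0 + (-half_b + root) * d, r0 + (-half_b - root) * d

anchors = np.array([[0.0, 0.0, 0.0],
                    [4.0, 0.0, 0.0],
                    [0.0, 3.0, 0.0]])
target = np.array([1.0, 1.0, 2.0])
dists = np.linalg.norm(anchors - target, axis=1)

solutions = trilaterate(anchors, dists)
```

The two returned points are mirror images across the plane through the three anchors; extra information (such as a fourth anchor or knowing which side the object is on) is needed to choose between them.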

![image.png](attachment:image.png)

Lines can be written as $y=mx+c\leftrightarrow ax+by+c=0$, or $\sin\theta x-\cos\theta y+\rho=0\leftrightarrow (-\sin\theta, \cos\theta)^{\text{T}}\cdot\vec{r}=\rho$, in 2D space (2 degrees of freedom); and planes as $\hat{\vec{n}}\cdot\vec{r}=\rho$, or $aX+bY+cZ+d=0$, in 3D space (3 degrees of freedom).