# **3D Computer Vision (NUS, 2020)**

My notes for the 3D Computer Vision course by Lee Gim Hee on YouTube. This course mostly covers topics in multiple view geometry from Hartley-Zisserman and occasionally from Ma-Soatto-Kosecka-Sastry. 

## **Lecture 1: 2D and 1D Projective Geometry**



We say a point $\vec{x} = [x, y, z]$ in homogeneous coordinates lies on a line $\ell = [a,b,c]$ if $\ell \cdot \vec{x} = ax + by + cz = 0$. The intersection of two lines $\ell$ and $\ell'$ is the point $\ell \times \ell'$, and the line through two points $\vec{x}$ and $\vec{x'}$ is $\vec{x} \times \vec{x'}$.

Parallel lines $[a,b,c]$ and $[a,b,c']$ meet at the point at infinity $[-b, a, 0]$.
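The cross-product rules above are easy to check numerically. The following is a minimal sketch, assuming NumPy is available; the helper names `join` and `meet` are made up for illustration.

```python
import numpy as np

def join(p, q):
    """Line through two homogeneous points (cross product)."""
    return np.cross(p, q)

def meet(l, m):
    """Intersection point of two homogeneous lines (cross product)."""
    return np.cross(l, m)

# Line through (0, 0) and (1, 1); it should also contain (2, 2).
line = join(np.array([0.0, 0.0, 1.0]), np.array([1.0, 1.0, 1.0]))

# Two parallel lines x + 2y + 3 = 0 and x + 2y - 5 = 0.
l1 = np.array([1.0, 2.0, 3.0])
l2 = np.array([1.0, 2.0, -5.0])
ideal = meet(l1, l2)   # third coordinate is 0: a point at infinity
```

Here `ideal` is proportional to $[-b, a, 0] = [-2, 1, 0]$, as expected for lines with $a = 1$, $b = 2$.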

### **Conics and Dual Conics**

A conic is a curve defined by a second degree equation in the plane. In matrix form we can write this in the form $x^T C x = 0$ where $C$ is a symmetric matrix.

Each point $(x_i, y_i)$ places a linear constraint on the conic coefficients. Since a conic is determined by $6$ parameters up to a scale factor, 5 points determine a conic.
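As a sketch of the five-point construction (assuming NumPy, with the hypothetical helper name `conic_through_points`), each point contributes one row of a $5 \times 6$ linear system and the conic coefficients span its null space:

```python
import numpy as np

def conic_through_points(pts):
    """Fit C (3x3 symmetric, up to scale) with x^T C x = 0 for five points (x, y)."""
    rows = [[x * x, x * y, y * y, x, y, 1.0] for x, y in pts]
    # The coefficient vector (a, b, c, d, e, f) spans the null space of the 5x6 system.
    _, _, vt = np.linalg.svd(np.array(rows))
    a, b, c, d, e, f = vt[-1]
    return np.array([[a, b / 2, d / 2],
                     [b / 2, c, e / 2],
                     [d / 2, e / 2, f]])

# Five points on the unit circle x^2 + y^2 = 1.
angles = [0.1, 0.9, 2.0, 3.5, 5.0]
pts = [(np.cos(t), np.sin(t)) for t in angles]
C = conic_through_points(pts)
C = C / C[0, 0]   # fix the free scale so the result can be compared
```

After fixing the scale, `C` should be close to $\mathrm{diag}(1, 1, -1)$, the matrix of the unit circle.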

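A line $\ell$ is tangent to $C$ at a point $x$ on $C$ if $\ell = Cx$. First note that in this case we have $\ell^T x = x^T C x = 0$, so $x$ lies on $\ell$. Suppose $\ell$ met $C$ in a second point $z$, so that $\ell^Tz = 0$ and $z^TCz = 0$. Then $x^TCz = \ell^T z = 0$ as well, and for every scalar $\alpha$ $$(x+\alpha z)^TC(x+\alpha z) =  x^TCx + 2\alpha\, x^TCz + \alpha^2 z^TCz = 0$$ So the whole line $\ell$ would lie on $C$, which is only possible when $C$ is degenerate. For a non-degenerate conic, $\ell$ therefore meets $C$ only at $x$, that is, $\ell$ is tangent.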

A dual (line) conic defines an equation on lines, denoted by $C^*$, and those lines satisfying $\ell^T C^* \ell = 0$ are precisely those that are tangent to $C$. If $C$ is non-singular and symmetric, then $C^* = C^{-1}$. Indeed, if $\ell = Cx$, then $\ell^T C^{-1} \ell  = x^T C C^{-1} C x = x^T C x = 0$. If $\ell^T C^{-1} \ell = 0$ Then for $x = C^{-1}\ell$ we have $\ell = Cx$, $\ell^T x = 0$, and $x^TC x = \ell^T C^{-1} C C^{-1} \ell = 0$. 

### **Projective Transformations of the Plane**

A projectivity is an invertible mapping $h : \mathbb{P}^2 \to \mathbb{P}^2$ such that $x_1,x_2,x_3$ lie on the same line if and only if $h(x_1), h(x_2), h(x_3)$ do. A projectivity is also called a *projective transformation* or a *homography*. A mapping $h : \mathbb{P}^2 \to \mathbb{P}^2$ is a projectivity if and only if there exists a non-singular $3 \times 3$ matrix $H$ such that $h(x) = Hx$ for any $x \in \mathbb{P}^2$.

*Proof*: If $x_1,x_2, x_3$ lie on a line $\ell$ then $\ell^T x_i = 0$. Now $(H^{-T} \ell)^T Hx_i = \ell^T x_i = 0$ for each $i$. The converse is more difficult, and so we omit the proof.

So under a transformation $x' = Hx$ we have that a line $\ell$ transforms by $\ell' = H^{-T}\ell$. Indeed $\ell'^T Hx = \ell^T H^{-1}Hx = \ell^T x$ so that $Hx \in \ell'$ if and only if $x \in \ell$.

 Similarly, we have $C' = H^{-T}CH^{-1}$ and $C^{*'} = H C^* H^{T}$ since we have $(Hx)^T C' (Hx) = x^T C x$ and $(H^{-T}\ell)^T C^{*'} (H^{-T}\ell) = \ell^T C^* \ell$.
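The transformation rules for points, lines, conics and dual conics can be checked on a concrete example. This is only a sketch assuming NumPy; the matrix `H` and the test point, line and circle are arbitrary choices.

```python
import numpy as np

H = np.array([[1.0, 0.2, 3.0],
              [0.1, 0.9, -1.0],
              [0.001, 0.002, 1.0]])
H_inv = np.linalg.inv(H)

x = np.array([2.0, 1.0, 1.0])
l = np.array([1.0, -1.0, -1.0])     # passes through x: 2 - 1 - 1 = 0
C = np.diag([1.0, 1.0, -5.0])       # circle x^2 + y^2 = 5, also through x
C_star = np.linalg.inv(C)           # dual conic of a non-singular conic

x_new = H @ x                       # points:      x'  = H x
l_new = H_inv.T @ l                 # lines:       l'  = H^{-T} l
C_new = H_inv.T @ C @ H_inv         # conics:      C'  = H^{-T} C H^{-1}
C_star_new = H @ C_star @ H.T       # dual conics: C*' = H C* H^T

incidence_line = l_new @ x_new              # stays (numerically) zero
incidence_conic = x_new @ C_new @ x_new     # stays (numerically) zero
```

The transformed dual conic is again the inverse of the transformed conic, so the two rules are consistent with each other.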

A projective transformation can be computed from four point correspondences, with no three collinear on each plane. A general projective transformation can be decomposed into a chain of transformations $$H = H_S H_A H_P = \begin{bmatrix} sR & \mathbf{t} \\ \mathbf{0}^T & 1 \end{bmatrix}\begin{bmatrix} K & \mathbf{0} \\ \mathbf{0}^T & 1\end{bmatrix}\begin{bmatrix} I & \mathbf{0} \\ \mathbf{v}^T & v\end{bmatrix} = \begin{bmatrix}A & v\mathbf{t} \\ \mathbf{v}^T & v \end{bmatrix}$$ with $A = sRK + \mathbf{t}\mathbf{v}^T$, provided $v \neq 0$. Indeed, $$H_{S}H_{A}H_{P} =  \begin{bmatrix} sR & \mathbf{t} \\ \mathbf{0}^T & 1 \end{bmatrix}\begin{bmatrix} K & \mathbf{0} \\ \mathbf{v}^T & v \end{bmatrix} = \begin{bmatrix}sRK + \mathbf{t}\mathbf{v}^T & v\mathbf{t} \\ \mathbf{v}^T & v\end{bmatrix}$$ So, given a general $H = \begin{bmatrix}A & \mathbf{t}' \\ \mathbf{v}^T & v \end{bmatrix}$ with $v \neq 0$, we just have to pick our variables so that $\mathbf{t} = \mathbf{t}'/v$ and $sRK = A - \mathbf{t}\mathbf{v}^T$. Now $A - \mathbf{t}\mathbf{v}^T$ is a linear map $\mathbb{R}^2 \to \mathbb{R}^2$, so we need only consider decompositions of a $2 \times 2$ matrix into a scaled rotation $sR$ times an upper-triangular $K$.

## **Lecture 2: Rigid Body Motion and 3D Projection Geometry**

A motion of a rigid body preserves the distance between any pair of points on it as well as orientation.

 A rigid body motion is a (continuous) family of maps $g(t) : \mathbb{R}^3 \to \mathbb{R}^3$ such that $g(t) \in SE(3)$ for each $t$. We have $$SE(3) =  \left\{\begin{bmatrix} R & T \\ 0 & 1 \end{bmatrix} : R \in SO(3),\ T \in \mathbb{R}^3 \right\}$$ Write $g(t)(\mathbf{x}) = R(t)\mathbf{x} + T(t)$. Then $R(t)$ preserves both the norm $\lvert\lvert\cdot\rvert\rvert$ and the cross product $\cdot \times \cdot$. Also we can write any rotation in the form $R(t) = R_{z}(\gamma) R_{y}(\beta) R_{x}(\alpha)$ where $\alpha,\beta, \gamma$ are the *Euler angles*.
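A small numerical check of these properties, as a sketch assuming NumPy and the usual right-handed elementary rotations about the coordinate axes (the helper names are illustrative):

```python
import numpy as np

def rot_x(a):
    c, s = np.cos(a), np.sin(a)
    return np.array([[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]])

def rot_y(b):
    c, s = np.cos(b), np.sin(b)
    return np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])

def rot_z(g):
    c, s = np.cos(g), np.sin(g)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

# R = R_z(gamma) R_y(beta) R_x(alpha) with arbitrary Euler angles.
R = rot_z(0.3) @ rot_y(-0.7) @ rot_x(1.1)

u = np.array([1.0, 2.0, 3.0])
v = np.array([-0.5, 0.4, 2.0])
norm_before, norm_after = np.linalg.norm(u), np.linalg.norm(R @ u)
cross_before = R @ np.cross(u, v)        # rotate the cross product
cross_after = np.cross(R @ u, R @ v)     # cross product of the rotated vectors
```

Both the norm and the cross product agree before and after rotation, and $R^TR = I$ with $\det R = 1$.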

We can denote a plane in $\mathbb{P}^3$ by $\mathbf{\pi} = [\pi_1, \ldots, \pi_4]$ and a point $\mathbf{x} = [x_1, \ldots, x_4]$ lies on the plane if and only if $\mathbf{\pi} \cdot \mathbf{x} = 0$. Under a projective isomorphism $\mathbf{x}' = H \mathbf{x}$ we have $\mathbf{\pi}' = H^{-T}\mathbf{\pi}$.

Suppose that the points $\mathbf{x}_1, \mathbf{x}_2, \mathbf{x}_3$ are on the plane $\mathbf{\pi}$ so that $\begin{bmatrix} \mathbf{x}_1^T \\ \mathbf{x}_2^T\\\mathbf{x}_3^T \end{bmatrix}\mathbf{\pi} = \mathbf{0}$. If the three points are in *general position*, that is, the rank of the above $3 \times 4$ matrix is $3$, $\mathbf{\pi}$ is uniquely determined as the 1-dimensional null space of that matrix. Thus, three points determine a plane. If the matrix $[\mathbf{x}_1, \mathbf{x}_2, \mathbf{x}_3]^T$ has rank only 2, the three points are collinear and define a pencil of planes with the line as axis.  
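A minimal sketch of this null-space computation, assuming NumPy; the function name `plane_through` and the rank tolerance are illustrative choices.

```python
import numpy as np

def plane_through(x1, x2, x3):
    """Plane (4-vector, up to scale) through three homogeneous points."""
    M = np.vstack([x1, x2, x3])              # the 3x4 matrix of stacked points
    _, sing, vt = np.linalg.svd(M)
    if sing[-1] < 1e-10 * sing[0]:
        raise ValueError("rank < 3: the points are collinear (pencil of planes)")
    return vt[-1]                            # spans the 1-dimensional null space

pi = plane_through(np.array([1.0, 0.0, 0.0, 1.0]),
                   np.array([0.0, 1.0, 0.0, 1.0]),
                   np.array([0.0, 0.0, 1.0, 1.0]))
pi = pi / pi[0]   # the plane x + y + z - 1 = 0, i.e. [1, 1, 1, -1]
```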

If $\mathbf{x}$ and $\mathbf{y}$ are two points in $\mathbb{P}^3$, then the line between them is given by the row space of $[\mathbf{x}, \mathbf{y}]^T$. Indeed, a line is given by the zero set of $n-1$ equations in $\mathbb{P}^n$ and this corresponds to a plane in $\mathbb{R}^{n+1}$.   

The plane containing a specified line $\ell = \verb|span|(W)$ and a point $\mathbf{x}$ not on $\ell$ is the 1-dimensional null space of $M = \begin{bmatrix} W \\ \mathbf{x}^T\end{bmatrix}$. Note that this null space is 1-dimensional unless $\mathbf{x} \in \ell$.

We can also represent a line in $\mathbb{R}^3$ by its *Plücker line coordinates*. These are the six elements $\ell = \{\vec{d}, \vec{m}\}$ where $\vec{d} = \mathbf{y} - \mathbf{x}$ is the direction vector of the line and $\vec{m} = \mathbf{x} \times \mathbf{y}$ is the moment vector. The pair $\vec{d}, \vec{m}$ determines $\ell$ uniquely and is itself determined by $\ell$ up to a common scale factor, and so $[\vec{d}, \vec{m}] \in \mathbb{P}^5$ parametrizes lines in $\mathbb{R}^3$. If $\ell$ and $\ell'$ are the lines joining $\mathbf{a}$ and $\mathbf{b}$, and $\mathbf{a}'$ and $\mathbf{b}'$ respectively, then the lines intersect if and only if the four points are coplanar. Writing the four points as homogeneous 4-vectors, a necessary and sufficient condition is that $\det [\mathbf{a},\mathbf{b}, \mathbf{a'}, \mathbf{b}'] = 0$. 

More generally, for the line through $\mathbf{a} = [a_0, a_1, a_2, a_3]$ and $\mathbf{b} = [b_0,b_1,b_2,b_3]$ in $\mathbb{P}^3$, we define the Plücker coordinates to be $[p_{01}, p_{02}, p_{03}, p_{23}, p_{31}, p_{12}] \in \mathbb{P}^5$ where $p_{ij} = a_ib_j - a_jb_i$. Note that the last three components form $(a_1,a_2, a_3) \times (b_1,b_2,b_3)$. 

A quadric $Q$ is a symmetric $4 \times 4$ matrix and corresponds to the points $x$ such that $x^TQx = 0$. If $\pi$ is a plane then $\pi \cap Q$ is a conic since we can write $\pi = \{Mx : x \in \mathbb{R}^3\}$ so that $\pi \cap Q$ = $\{y \in \mathbb{R}^3 : y^T (M^TQM) y = 0\}$. This is a conic.

## **Lecture 3: Circular points and Absolute conic**

### **Identifying the Line at Infinity**

The line at infinity $\ell_\infty = [0,0,1]$ in $\mathbb{P}^2$ is invariant under a homography $H$ if and only if $H$ is an affinity. Indeed if $$H = \begin{bmatrix} A & \mathbf{t} \\ \mathbf{0}^T & 1\end{bmatrix}$$ then $$H^{-1} = \begin{bmatrix} A^{-1} & -A^{-1} \mathbf{t} \\ \mathbf{0}^T & 1\end{bmatrix}$$ So $$\ell_\infty' = H^{-T} \ell_\infty = \begin{bmatrix} A^{-T} & \mathbf{0} \\ -\mathbf{t}^TA^{-T} & 1\end{bmatrix}\begin{bmatrix} 0 \\ 0 \\1 \end{bmatrix} = \begin{bmatrix}0 \\ 0 \\ 1\end{bmatrix} = \ell_\infty$$ Conversely, for $H$ to preserve $[0,0,1]$, $H^{-1}$ should have $[0,0,1]$ on the bottom row (up to scale) and so $H$ must as well. Note that $\ell_\infty$ is not preserved pointwise. We will see that identifying the line at infinity $\ell_\infty$ allows recovery of affine properties (parallelism, ratio of lengths).

Given a homography of $\mathbb{P}^2$, the line at infinity is mapped to some line in the image plane. We can identify this line by finding the images of two points at infinity (by identifying the point of intersection of the images of corresponding parallel lines) and connecting them. We can perform an affine rectification that maps the image of the line at infinity back to the line at infinity. The resulting image will be an affine transformation of the original world plane.

So given the imaged line at infinity $\ell = [l_1, l_2, l_3]^T$ where $l_3 \neq 0$ we wish to find a homography $H_p$ such that $H_p^{-T}\ell = [0,0,1]^T$.  This is equivalent to $H_p^{T}[0,0,1]^T = \ell$. Thus we can choose $$H_p = \begin{bmatrix}1 & 0 & 0 \\ 0 & 1 &0 \\ l_1 & l_2 & l_3 \end{bmatrix}$$ Now $H_pH = H_A$ where $H$ is the homography that produced the image and $H_A$ is affine, so $H_p = H_AH^{-1}$. This means that we can also choose any affine transformation of $H_p$ as our choice. Another way to write $H_p$ is $$H_p = H_A \begin{bmatrix}1 & 0 & -\frac{l_1}{l_3} \\ 0 & 1 & -\frac{l_2}{l_3} \\ 0 & 0 & \frac{1}{l_3} \end{bmatrix}^{-T}$$

So if we want to remove projection distortion from an image we should 

1. Identify the image of the line at infinity from the intersection of two sets of images of parallel lines.
2. Compute $H_p = H_A \begin{bmatrix}1 & 0 & 0 \\ 0 & 1 & 0 \\ l_1 & l_2 & l_3 \end{bmatrix}$, where $\ell = [l_1, l_2, l_3]^T$ is the imaged line at infinity and $H_A$ is any affinity (for example the identity).
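The two steps can be sketched as follows, assuming NumPy. The distortion `H` is an arbitrary example used to synthesize imaged parallel lines, and `affine_rectification` is a hypothetical helper that takes $H_A$ to be the identity.

```python
import numpy as np

def affine_rectification(l):
    """H_p with last row l, so that H_p^{-T} l = [0, 0, 1] (taking H_A = I)."""
    l = np.asarray(l, dtype=float)
    if abs(l[2]) < 1e-12 * np.linalg.norm(l):
        raise ValueError("l_3 must be non-zero")
    return np.array([[1.0, 0.0, 0.0],
                     [0.0, 1.0, 0.0],
                     [l[0], l[1], l[2]]])

# Synthetic projective distortion (illustrative values only).
H = np.array([[1.0, 0.1, 2.0],
              [0.0, 1.2, 1.0],
              [0.002, 0.001, 1.0]])
H_invT = np.linalg.inv(H).T

# Step 1: image two pairs of parallel world lines and intersect each imaged pair.
pairs = [(np.array([1.0, 0.0, 0.0]), np.array([1.0, 0.0, -1.0])),
         (np.array([0.0, 1.0, 0.0]), np.array([0.0, 1.0, -1.0]))]
vanishing_points = [np.cross(H_invT @ a, H_invT @ b) for a, b in pairs]
l_img = np.cross(vanishing_points[0], vanishing_points[1])   # imaged line at infinity

# Step 2: build H_p and apply it after the distortion.
H_p = affine_rectification(l_img)
total = H_p @ H
total = total / total[2, 2]   # last row becomes [0, 0, 1]: an affinity
```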

### **Using Length Ratios to Identify Points at Infinity**

We can also use affine properties to determine points and the line at infinity.

 

 Suppose that we have three points $a, b, c$ on a line in 3D space and we identify their images $a', b', c'$. Suppose that we know the length ratio $\lvert ab\rvert : \lvert bc\rvert = \alpha : \beta$. Then using the length ratio in the image $\alpha' : \beta'$ we can identify a vanishing point.




 Write $a,b,c$ as $[0,1], [a,1],[a+b, 1] \in \mathbb{P}^1$. Similarly write $a',b',c'$ as $[0,1], [a',1], [a'+b',1]$. Then we can identify a projective transformation of $\mathbb{P}^1$ such that $a \to a', b \to b', c \to c'$, and the vanishing point is the image of the point at infinity $[1,0]$ under it. Indeed, given three distinct points $P,Q,R$ on $\mathbb{P}^1$ there is a unique projective transformation $\mathbb{P}^1 \to \mathbb{P}^1$ that sends the standard frame $[0,1], [1,0], [1,1]$ to $P,Q,R$. This can be easily seen as follows: if $[0,1] \to cP$ and $[1,0] \to dQ$, the matrix must be of the form $[dQ, cP]$ for unspecified $c,d$. If we want $[dQ, cP][1,1]^T = sR$ then we need $cP + dQ = sR$. Since $P,Q$ are linearly independent in $\mathbb{R}^2$, for any choice of $s \neq 0$ there is a unique choice of $c,d$, and both are non-zero because $R$ is distinct from $P$ and $Q$.
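A sketch of this construction, assuming NumPy: the $2 \times 2$ matrix is found as the null vector of three linear equations, and the vanishing point is the image of $[1,0]$. The function name and the sample coordinates are illustrative; the image positions were generated from the map $t \mapsto 2t / (0.1t + 1)$.

```python
import numpy as np

def homography_1d(src, dst):
    """2x2 H (up to scale) with H [s, 1]^T ~ [d, 1]^T for three scalar pairs."""
    rows = []
    for s, d in zip(src, dst):
        # d * (h21 s + h22) - (h11 s + h12) = 0, unknowns (h11, h12, h21, h22)
        rows.append([-s, -1.0, d * s, d])
    _, _, vt = np.linalg.svd(np.array(rows))
    return vt[-1].reshape(2, 2)

src = [0.0, 1.0, 2.0]                 # a, b, c on the world line, ratio 1 : 1
dst = [0.0, 2.0 / 1.1, 4.0 / 1.2]     # their positions along the imaged line
H = homography_1d(src, dst)

# Image of the point at infinity [1, 0] is the first column of H.
vanishing = H[0, 0] / H[1, 0]
```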


### **Circular Points and the Dual Conic $C_\infty^*$** 

Under any similarity transformation, there are always two points on $\ell_\infty$ that are fixed: $$I = \begin{pmatrix} 1 \\ i \\ 0 \end{pmatrix} \text{        and         } J = \begin{pmatrix}1 \\ -i \\ 0 \end{pmatrix}$$ These points are called *circular points* and they are a pair of complex conjugate points at infinity. Indeed, $$H_s I = \begin{bmatrix}s \cos\theta & -s\sin \theta & t_x \\ s \sin \theta & s\cos\theta & t_y \\ 0 & 0 & 1 \end{bmatrix}\begin{pmatrix} 1 \\ i\\ 0\end{pmatrix} = se^{-i\theta}\begin{pmatrix}1 \\ i \\ 0 \end{pmatrix} $$ holds, and a similar relation holds for $J$.

The converse is also true: if the circular points are fixed then the projectivity is a similarity transformation. Indeed, suppose that $H = [H_0 | H_1 | H_2]$ is a projectivity. Then $$H \begin{pmatrix} 1\\i \\ 0\end{pmatrix} = H_0 + iH_1 = \alpha \begin{pmatrix} 1 \\ i \\ 0\end{pmatrix}$$ Write $\alpha = se^{-i\theta}$. Then $$H_0 + iH_1 = \begin{pmatrix}s\cos\theta - i s \sin \theta \\ s \sin \theta + is \cos\theta \\ 0 \end{pmatrix} $$ Taking real and imaginary parts (the entries of $H$ are real), $H_0$ and $H_1$ are completely determined, and since $H$ is non-singular the bottom right entry of $H$ has to be non-zero as well. Thus, $H$ has to have the desired form.

These points are called *circular* because every circle intersects $\ell_\infty$ at these points. Indeed, if we write a circle in the form $(x - az)^2 + (y- bz)^2 = r^2z^2$, we get $x^2 + y^2 = 0$ on setting $z =0$, whose solutions are exactly $[1, \pm i, 0]$. 

The dual to the circular points is the conic $C_\infty^* = IJ^T + JI^T$ (we need both terms for $C_\infty^*$ to be symmetric). Thus, up to scale, $$C_\infty^* = \begin{bmatrix}1 & 0 & 0 \\ 0 & 1 & 0 \\ 0& 0 & 0 \end{bmatrix} $$ Note that $\ell^T C_\infty^* \ell = \ell^T I J^T \ell + \ell^TJ I^T \ell = 2(\ell^T I)(\ell^T J) = 0 \iff I \in \ell \text{ or } J \in \ell$. This is a degenerate line conic.

The dual conic $C_\infty^*$ is also fixed under similarity transformations and it can be checked that $H_s C_\infty^* H_s^T = s^2 C_\infty^*$, which is $C_\infty^*$ up to scale.


Let us look at some properties of $C_\infty^*$ in any projective frame. 

1. $C_\infty^*$ has four degrees of freedom since a general $3 \times 3$ homogeneous symmetric matrix has five degrees of freedom and $C_\infty^*$ has the additional constraint that $\det C_\infty^*  = 0$.
2. $C_\infty^* \ell_\infty = 0$ since $I^T \ell_\infty = J^T\ell_\infty = 0$. This continues to hold in any frame since $(HC_\infty^*H^T)(H^{-T} \ell_\infty) = HC_\infty^*\ell_\infty = 0$.

### **Angles on $\mathbb{P}^2$**

Once the conic $C_\infty^*$ is identified on the projective plane, we may measure euclidean angles between lines $\ell$ and $m$ by $$\cos \theta = \frac{\ell^T C_\infty^* m}{\sqrt{\left(\ell^TC_\infty^*\ell\right)\left(m^T C_\infty^* m\right)}}$$ This measure is *invariant under projective transformations*.


First note that if $C_\infty^* = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 0 \end{bmatrix}$ then $\ell^T C_\infty^* m = \ell_1 m_1 + \ell_2 m_2$, which reduces to the usual dot-product formula for the angle between the normals $(\ell_1, \ell_2)$ and $(m_1, m_2)$ of the two lines in a Euclidean frame.



 Now suppose we have a projective transformation $H$. Then $\ell$ transforms as $H^{-T}\ell$, $C_\infty^*$ transforms as $HC_\infty^* H^T$ and $m$ transforms as $H^{-T} m$. Then we have $$(H^{-T}\ell)^T HC_\infty^*H^T (H^{-T}m) = \ell^T C_\infty^* m$$ and similarly for the other terms. Thus, the expression is invariant under a projective transformation.

So we can identify the "true" angle between lines in an image if we know the image of $C_\infty^*$. For example, if $\ell^T C_\infty^* m = 0$ then $\ell$ and $m$ are orthogonal.
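The invariance can be illustrated numerically. This is a sketch assuming NumPy; the two lines and the homography `H` are arbitrary example values.

```python
import numpy as np

def angle_cos(l, m, C_star):
    """cos(theta) between lines l and m, measured with the dual conic C_star."""
    num = l @ C_star @ m
    return num / np.sqrt((l @ C_star @ l) * (m @ C_star @ m))

C_star = np.diag([1.0, 1.0, 0.0])
l = np.array([1.0, 0.0, -2.0])    # the line x = 2
m = np.array([1.0, 1.0, 3.0])     # the line x + y + 3 = 0, at 45 degrees to l

H = np.array([[1.0, 0.2, 3.0],
              [0.1, 0.9, -1.0],
              [0.001, 0.002, 1.0]])
H_invT = np.linalg.inv(H).T

cos_world = angle_cos(l, m, C_star)
# Measure the same angle in the distorted image, using the imaged dual conic.
cos_image = angle_cos(H_invT @ l, H_invT @ m, H @ C_star @ H.T)
```

Both values equal $\cos 45^\circ$, even though the imaged lines themselves no longer meet at that angle.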

### **Metric Rectification using $C_\infty^*$**

Once $C_\infty^*$ is identified, we can rectify projective distortions up to a similarity. Write $H = H_PH_AH_S$, using the same three kinds of factors as before but in the opposite order, where $H_P = \begin{bmatrix} I & \mathbf{0} \\ \mathbf{v}^T & 1\end{bmatrix}$ is an elation, $H_A = \begin{bmatrix} K & \mathbf{0} \\ \mathbf{0}^T & 1\end{bmatrix}$ is an affinity and $H_S$ is a similarity. This is possible if the origin $[0,0,1]$ is not sent to a point at infinity. In all the real world cases of consideration, this is a reasonable assumption to make, but we may otherwise recenter the origin so this is possible. Then since $H_S C_\infty^*H_S^T = s^2C_\infty^*$, which is $C_\infty^*$ up to scale, we have that the image of the conic is $$C_\infty^{*'} = H_PH_A C_\infty^* H_A^T H_P^T = \begin{bmatrix} KK^T & KK^T\mathbf{v} \\ \mathbf{v}^TKK^T & \mathbf{v}^TKK^T\mathbf{v}\end{bmatrix}$$ So the image of $C_\infty^*$ gives the projective component $\mathbf{v}$ and the affine component $K$, but not the similarity component.

Since $K$ is upper triangular with determinant $1$, we may write $K$ in the form $K = \begin{bmatrix}a & b \\ 0 & \frac{1}{a} \end{bmatrix}$ Then $$KK^T= \begin{bmatrix}a^2 + b^2 & \frac{b}{a} \\ \frac{b}{a} & \frac{1}{a^2} \end{bmatrix}$$ Given these four values, we can determine $a$ and $b$ and hence $K$.

Since we know $\frac{1}{a^2}$, we will know $b^2$ from the top left entry. Thus, we only need to determine the signs of $a$ and $b$. The other entry will tell us if $a$ and $b$ have the same or opposite signs. Since projective transformations are only determined up to a scalar, we may ignore the sign issue. 

Given $K$, we can easily compute $\mathbf{v}$: if $KK^T \mathbf{v} = \mathbf{w}$, where $\mathbf{w}$ is the top-right block of $C_\infty^{*'}$, then $\mathbf{v} = (KK^T)^{-1}\mathbf{w}$. This is possible since $K$ has determinant 1 and so $KK^T$ is invertible.
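The recovery of $K$ and $\mathbf{v}$ from the imaged dual conic can be sketched as below, assuming NumPy. The function name is hypothetical, the sign choice $a > 0$ is a convention, and the overall scale of the input is removed using $\det(KK^T) = 1$.

```python
import numpy as np

def decompose_dual_conic(C_img):
    """Split the imaged C*_inf (known up to scale) into K and v."""
    C_img = np.asarray(C_img, dtype=float)
    scale = np.sqrt(np.linalg.det(C_img[:2, :2]))   # det(K K^T) = 1 fixes |scale|
    if C_img[1, 1] < 0:
        scale = -scale                               # K K^T has a positive diagonal
    C = C_img / scale
    S, w = C[:2, :2], C[:2, 2]                       # S = K K^T, w = K K^T v
    a = 1.0 / np.sqrt(S[1, 1])                       # bottom-right entry is 1 / a^2
    b = S[0, 1] * a                                  # off-diagonal entry is b / a
    K = np.array([[a, b], [0.0, 1.0 / a]])
    v = np.linalg.solve(S, w)
    return K, v

# Synthetic check with made-up values of K and v and an arbitrary scale factor.
K_true = np.array([[2.0, 0.5], [0.0, 0.5]])
v_true = np.array([0.001, -0.002])
H_A = np.eye(3)
H_A[:2, :2] = K_true
H_P = np.eye(3)
H_P[2, :2] = v_true
C_img = -3.7 * H_P @ H_A @ np.diag([1.0, 1.0, 0.0]) @ H_A.T @ H_P.T
K, v = decompose_dual_conic(C_img)
```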

Given an image, we can first remove projective distortion by identifying the image of the line at infinity and ensuring it is sent to $[0,0,1]$. Now $C^{*'}_\infty = H_PH_A C_\infty^* H_A^T H_P^T$. Thus, $H_p^{-1}C_\infty^{*'}H_p^{-T} = H_AC_\infty^*H_A^T = C_{\infty}^{*''}$.

This is the conic we will observe after removing projective distortion.

We can compute $C_\infty^{*''}$ using two pairs of orthogonal lines. Suppose that $\ell'$, $m'$ are the images of lines $\ell$, $m$ that are orthogonal in the world plane. Then $$ \ell^{'T} C_\infty^{*''}m' = (\ell^T H_A^{-1})H_A C_\infty^* H_A^T (H_A^{-T}m) = \ell^T C_\infty^* m = 0$$ Writing $H_A = \begin{bmatrix} K & \mathbf{0} \\ \mathbf{0}^T & 1 \end{bmatrix}$ we have $$\ell'^T\begin{bmatrix} KK^T & \mathbf{0} \\ \mathbf{0}^T & 0\end{bmatrix}m' = 0$$ Thus, a pair of orthogonal lines gives one linear constraint equation on the coefficients of $KK^T$ (and hence on $K$). Since $KK^T$ is symmetric, it has only 3 distinct entries, defined up to scale, and so 2 pairs of orthogonal lines suffice.

We may also directly rectify an image by identifying $C_\infty^*$ on it. We will have $$C_\infty^{*'} = \begin{bmatrix} KK^T & KK^T\mathbf{v} \\ \mathbf{v}^TKK^T & \mathbf{v}^TKK^T\mathbf{v}\end{bmatrix}$$ To identify this matrix, we will use orthogonal relations of the form $\ell'^T C_\infty^{*'}m' = 0$. We require $5$ such constraints to identify the entries of the matrix.

### **The Plane at Infinity**

In $ \mathbb{P}^3$ the plane at infinity is $\pi_\infty = [0,0,0,1]$ and we say two planes are parallel if and only if their line of intersection is on $\pi_\infty$.

The plane at infinity $\pi_\infty$ is fixed under a projective transformation $H$ if and only if $H$ is an affinity. Indeed, $H^{-T}[0,0,0,1]^T = [0,0,0,x]^T$ means that the last row of $H$ should be of the form $[0,0,0,x]$ and hence $H$ has to be an affinity. In general, the planes fixed under a projectivity $H$ are the eigenvectors of $H^T$.

### **The Absolute Conic $\Omega_\infty$**

The absolute conic $\Omega_\infty$ is a (point) conic on $\pi_\infty$. In a metric frame, where $\pi_\infty = [0,0,0,1]$, it is the set $$\Big\{[X_1,X_2,X_3,0] : X_1^2 + X_2^2 + X_3^2 =  0\Big\}$$

 We can write the condition in the form $$\begin{pmatrix} X_1 & X_2 & X_3\end{pmatrix} \begin{bmatrix}1 & 0 & 0 \\ 0& 1 &0 \\ 0 & 0 &1 \end{bmatrix} \begin{pmatrix}X_1 \\ X_2 \\ X_3 \end{pmatrix} = 0$$ so that $\Omega_\infty$ is a conic of purely imaginary points on $\pi_\infty$. This is a representation of the five degrees of freedom required to specify metric properties in an affine coordinate frame.
 

**Theorem**. A transformation fixes the absolute conic $\Omega_\infty$ if and only if it is a similarity. 

*Proof*. Consider first an affinity $$H_A = \begin{bmatrix} A & \mathbf{t} \\ \mathbf{0}^T & 1 \end{bmatrix}$$ For a point at infinity $P = [x,y,z,0]^T$ we have $H_AP = [A[x,y,z]^T, 0]^T$ so we may focus on the action of $A$ on $[x,y,z]$. Since points transform by $A$, the point conic $\Omega_\infty = I$, now treated as a subset of the plane at infinity (a copy of $\mathbb{P}^2$), transforms to $A^{-T}IA^{-1}$. It is fixed if and only if $A^{-T}A^{-1} = \lambda I$ for some scalar $\lambda$, that is, $AA^T = \lambda^{-1} I$. This says that $A$ is a scaled orthogonal matrix, which is equivalent to $H_A$ being a similarity. 

If we have a projectivity $H$ fixing $\Omega_\infty$, it must fix $\pi_\infty$. Indeed, suppose that $P_1, P_2, P_3$ are (complex) points in $\Omega_\infty$ that are linearly independent (for example, take $[1,i,0,0], [1, -i,0,0]$, and $[1,0,i,0]$). Then $\pi_\infty^T P_j = 0$ for each $j$. Now $H P_j \in \Omega_\infty$ as well, so $\pi_\infty^T H P_j = 0$ for $j = 1,2,3$. Now $\{HP_1, HP_2, HP_3\}$ are linearly independent in $\mathbb{C}^4$ and so there is only one plane in $\mathbb{P}^3$ containing these. Both $\pi_\infty$ and its image $H^{-T}\pi_\infty$ contain them, so these must be equal. Thus this transformation must be an affinity and the above argument applies. $\blacksquare$

Here are some properties of the absolute conic $\Omega_\infty$.

1. Any circle intersects $\Omega_\infty$ at two points.  If this circle lies in a plane $\pi$ then $\pi$ will intersect $\pi_\infty$ in a line and this line will intersect $\Omega_\infty$ in two points. These are the *circular points* of $\pi$. (Will have to understand what a circle is first).
2. All spheres intersect $\pi_\infty$ in $\Omega_\infty$.

The angle between two lines not contained in the plane at infinity, with direction vectors (3-vectors) $v$ and $w$, is given as before by $$\cos \theta = \frac{v^T \Omega_\infty w}{\sqrt{\left(v^T \Omega_\infty v\right)\left(w^T\Omega_\infty w\right)}}$$ Here $v$ and $w$ are the intersections of the lines with $\pi_\infty$, written as 3-vectors by dropping the final zero coordinate. Indeed, a line is given by $\verb|row_space|\begin{bmatrix} w_1^T \\ w_2^T \end{bmatrix}$ where $w_1,w_2$ are linearly independent 4-vectors, and its point of intersection with $\pi_\infty$ is the combination of $w_1$ and $w_2$ whose last coordinate vanishes.

The absolute conic can be used to recover the *camera intrinsics* (calibration) and the absolute conic and the plane at infinity can be used to remove affine distortion so that metric properties can be measured.

### **The Absolute (Dual) Quadric**

The planes tangent to the absolute conic form the *absolute quadric*.

Algebraically, $Q_\infty^*$ is represented by the $4 \times 4$ matrix of rank 3 $$Q_\infty^* = \begin{bmatrix} 1 & 0 & 0 &0 \\ 0 &1 & 0 & 0 \\ 0 & 0  & 1 & 0 \\ 0& 0 & 0 &  0\end{bmatrix}$$ Thus a plane $\pi = [a,b,c,d]$ is tangent to $\Omega_\infty$ if and only if $\pi^T Q_\infty^* \pi = a^2 + b^2 + c^2 = 0$, which says that the line $[a,b,c]$ in which $\pi$ meets $\pi_\infty$ is tangent to $\Omega_\infty$.


The dual quadric $Q_\infty^*$ is a degenerate quadric and has 8 dof (rather than the usual 10 that a $4\times 4$ symmetric matrix has) since there is a scale factor and a zero determinant. The dual quadric is preserved by a projectivity if and only if the transformation is a similarity.

Also, we have $Q_\infty^* \pi_\infty = 0$. This clearly holds in the normal frame, and the property is preserved under any projective transformation.

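Given two planes $\pi_1$ and $\pi_2$ we may calculate the angle between them in a similar manner to the above, using the absolute dual quadric: $$\cos \theta = \frac{\pi_1^T Q_\infty^* \pi_2}{\sqrt{\left(\pi_1^T Q_\infty^* \pi_1\right)\left(\pi_2^T Q_\infty^* \pi_2\right)}}$$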

## **Lecture 4: Robust Homography Estimation**



 Suppose that $\mathbf{x}_1$ and $\mathbf{x}_2$ are the projections of 3D points $\mathbf{X}$ lying on a plane $\bm{\pi}$ onto two cameras $C_1$ and $C_2$. Then these coordinates are related by a *homography*.



### **The Four Point Algorithm**

 First note that the coordinate of $\mathbf{X}$ with respect to the optical center of the second camera $C_2$ is of the form $R\mathbf{X} + \mathbf{T}$ for some $R \in SO(3)$ and $\mathbf{T} \in \mathbb{R}^3$. Let $\mathbf{n}$ be the unit normal to $\bm{\pi}$ so that $\mathbf{n}^T \mathbf{X} = d$ is constant. Then $\frac{1}{d}\mathbf{n}^T\mathbf{X} = 1$ for any point $\mathbf{X}$ on $\bm{\pi}$.



 So we may write $$\mathbf{X}_2 = R\mathbf{X} + \mathbf{T}\frac{1}{d}\mathbf{n}^T\mathbf{X} = \left(R + \frac{1}{d}\mathbf{T}\mathbf{n}^T\right)\mathbf{X}$$ Now $\lambda_1\mathbf{x}_1 = \mathbf{X}$ and $\lambda_2\mathbf{x}_2 = \mathbf{X}_2$. So, $$\lambda \mathbf{x}_2 = \underbrace{\left(R + \frac{1}{d}\mathbf{T}\mathbf{n}^T\right)}_{H}\mathbf{x}_1$$
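As a quick sanity check of $H = R + \frac{1}{d}\mathbf{T}\mathbf{n}^T$, the sketch below (assuming NumPy and normalised image coordinates, with made-up values for the motion and the plane) projects a point on the plane into both cameras and compares:

```python
import numpy as np

def plane_homography(R, T, n, d):
    """H = R + T n^T / d for the plane n^T X = d (camera-1 coordinates)."""
    return R + np.outer(T, n) / d

angle = 0.1
R = np.array([[np.cos(angle), 0.0, np.sin(angle)],
              [0.0, 1.0, 0.0],
              [-np.sin(angle), 0.0, np.cos(angle)]])
T = np.array([0.5, 0.0, 0.1])
n, d = np.array([0.0, 0.0, 1.0]), 5.0

X = np.array([1.0, -2.0, 5.0])          # lies on the plane: n . X = 5 = d
x1 = X / X[2]                           # normalised image point in camera 1
X2 = R @ X + T
x2 = X2 / X2[2]                         # normalised image point in camera 2
y = plane_homography(R, T, n, d) @ x1   # equals x2 up to scale
```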

Can this homography matrix be singular? Assume that $Hx = 0$. Then $H(\lambda x) = HX = 0$. This means that $X_2 = 0$ and the origin of the second camera lies on the plane being considered. It is a reasonable assumption that we will largely consider planes where this is not the case.

If we have two views of a scene, we would like to recover the homography taking coordinates from one image to the other using point correspondences. We will see that four point correspondences suffice and so this method is called the *four point algorithm*. Since points are measured inexactly, we usually use more points to obtain a least-squares solution.

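We have $H\mathbf{x} = \mathbf{y}$ up to scale, so $\mathbf{y} \times H\mathbf{x} = \widehat{\mathbf{y}}H\mathbf{x} = 0$, where $\widehat{\mathbf{y}}$ is the skew-symmetric matrix of the cross product with $\mathbf{y}$. We may express this equation as $Eh = 0$ where $h$ is the 9-vector of entries of $H$ and $E$ is a $3 \times 9$ matrix of rank 2. Unless we are in a degenerate situation, repeating this process three more times gives 8 independent equations, which determine $H$ up to scale.

In real image measurements we stack these equations into a matrix $A$ and minimize $\lvert Ah \rvert$ subject to $\lvert h \rvert = 1$. The minimizer is the right singular vector corresponding to the smallest singular value of $A$.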

### **Degenerate Situations**

In the system $Ah = 0$ if the rank of $A$ drops below 8 then we have a *degenerate* situation. This occurs if three of the minimum four points are collinear, as in this case one of these points gives no additional information.

Data Normalization is an essential step in this algorithm and is not optional. Once we have our correspondences $x_i \leftrightarrow x'_i$, we:

* Normalize the points $\tilde{x}_i = T_{\text{norm}} x_i$ and $\tilde{x}'_i = T_{\text{norm}}' x_i'$.
* Apply the DLT algorithm to $\tilde{x}_i \leftrightarrow \tilde{x}'_i$.
* Denormalize the solution $H = T_{\text{norm}}^{'-1}\tilde{H}T_{\text{norm}}$.
  
Here $$T_{\text{norm}} = \begin{bmatrix}s & 0 & -sc_x\\  0 & s & -sc_y \\ 0 & 0 &1\end{bmatrix}$$ where $c$ is the centroid and $s = \frac{\sqrt{2}}{\bar{d}}$ where $\bar{d}$ is the average distance to the centroid.
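The normalise, solve, denormalise recipe can be sketched as follows, assuming NumPy. The function names are illustrative, and the two equations per correspondence are the two independent rows of $\mathbf{x}' \times H\mathbf{x} = 0$.

```python
import numpy as np

def normalization(pts):
    """Similarity sending the centroid to 0 and the mean distance to sqrt(2)."""
    c = pts.mean(axis=0)
    s = np.sqrt(2.0) / np.linalg.norm(pts - c, axis=1).mean()
    return np.array([[s, 0.0, -s * c[0]], [0.0, s, -s * c[1]], [0.0, 0.0, 1.0]])

def dlt_homography(src, dst):
    """Normalised DLT estimate of H (up to scale) with dst ~ H src."""
    src, dst = np.asarray(src, float), np.asarray(dst, float)
    T, Tp = normalization(src), normalization(dst)
    src_h = np.column_stack([src, np.ones(len(src))]) @ T.T
    dst_h = np.column_stack([dst, np.ones(len(dst))]) @ Tp.T
    rows = []
    for x, (u, v, w) in zip(src_h, dst_h):
        rows.append(np.concatenate([np.zeros(3), -w * x, v * x]))
        rows.append(np.concatenate([w * x, np.zeros(3), -u * x]))
    _, _, vt = np.linalg.svd(np.array(rows))
    H_tilde = vt[-1].reshape(3, 3)            # smallest right singular vector
    return np.linalg.inv(Tp) @ H_tilde @ T    # denormalise

# Synthetic check with a made-up homography and noise-free points.
H_true = np.array([[1.2, 0.1, 5.0], [-0.2, 0.9, 3.0], [0.001, 0.0005, 1.0]])
src = np.array([[0, 0], [100, 0], [100, 80], [0, 80], [50, 20], [30, 60]], float)
proj = np.column_stack([src, np.ones(len(src))]) @ H_true.T
dst = proj[:, :2] / proj[:, 2:]
H_est = dlt_homography(src, dst)
H_est = H_est / H_est[2, 2]   # fix the free scale for comparison
```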

### **Non-Linear Minimization**



The DLT algorithm minimizes the *residual error* $\lvert\lvert Ah\rvert\rvert$. Each correspondence $x_i \leftrightarrow x_i'$ contributes a partial error vector $\epsilon_i$ and the total error is $$\lvert\lvert\epsilon \rvert\rvert^2 = \sum_i \lvert\lvert \epsilon_i \rvert\rvert^2$$ This has no correspondence with actual properties of the image and so we may instead prefer more *geometric* distance functions.

The *transfer error* is given by $d(x_i', Hx_i)^2$ for a point correspondence $x_i' \leftrightarrow x_i$ and measures  the euclidean distance between the measured point $x_i'$ and the corresponding point $Hx_i$. This error is minimized over the estimated homography $H$. The *symmetric* transfer error considers both the forward and backward transformation $d(x_i, H^{-1}x_i')^2 + d(Hx_i, x_i')^2$.

An alternative method of quantifying error in each of the two images involves estimating a "correction" for each correspondence. One asks how much it is necessary to correct the measurements in each of the two images in order to obtain a perfectly matched set of image points. So we are seeking a homography $H$ and pairs of *perfectly* matched points $\widehat{x}_i \leftrightarrow \widehat{x}'_i$ that minimizes the total error $$\sum_{i=1}^{N} \left(d(x_i, \widehat{x}_i)^2 + d(\widehat{x}_i', x'_i)^2\right)$$ subject to $\widehat{x}_i' = H\widehat{x}_i$ for each $i$. Minimizing this cost function involves determining both $H$ and a set of subsidiary correspondences $\{\widehat{x}_i\}$ and $\{\widehat{x}_i'\}$.

Although this method leads to a more accurate solution, it makes the solution more computationally complex. So we prefer the *Sampson error* that gives a close approximation to the re-projection error with significantly reduced complexity. We will require some steps to describe the Sampson error.

Let $C_H(X) = 0$ denote the cost function $Ah = 0$ that is satisfied by the point $X = (x,y,x',y')^T$ for a given homography $H$. We further denote $\hat{X}$ as the desired point so that $C_H(\hat{X}) = 0$ where $\delta_X = \hat{X} - X$. We now use the Taylor expansion $$C_H(X + \delta_X) \approx C_H(X) + \frac{\partial C_H}{\partial X}\delta_X = 0$$

For a given homography $H$, any point correspondence $X = (x,y,x',y')^T$ lying in $\mathcal{V}_H$ will satisfy the equation $Ah = 0$. This can be reformulated and written as $C_H(X) = 0$ where $C_H(X)$ is a 2-vector. The components of $C_H$ are quadratic polynomials, so we approximate $C_H$ with $C_H(\widehat{X}) \approx C_H(X) + J_X (\widehat{X}-X)$. We will choose point correspondences $\widehat{X}$ so that $C_H(\widehat{X}) = 0$, which means $J_X (\widehat{X} - X) = -\epsilon$ where $\epsilon = C_H(X)$ is fixed (here we are attempting to define a loss for $H$, and so $H$ is fixed).



Our goal is to find the closest point in $\mathcal{V}_H$ to $X$, and we may approximate that problem with the easier constrained optimization problem of minimizing $\lvert \delta_X \rvert$ subject to $J_X \delta_X = -\epsilon$ (this $\widehat{X}$ will simultaneously be close to $X$ and be a point correspondence, up to a first order). 



We can solve this problem using Lagrange multipliers: we want to find a stationary point of $\delta_X^T\delta_X - 2\lambda^T\left(J \delta_X + \epsilon\right)$. Taking derivatives with respect to $\delta_X$, we get $2 \delta_X^T - 2\lambda^T J =0$ and so $\delta_X = J^T \lambda$. We already had $J\delta_X + \epsilon = 0$ and so $JJ^T \lambda = -\epsilon$. This can easily be solved as $\lambda= - (JJ^T)^{-1}\epsilon$ and so $\delta_X = - J^T(JJ^T)^{-1}\epsilon$. Thus, the final Sampson error is given by $$\lvert \delta_X \rvert^2 = \delta_X^T\delta_X = \epsilon^T (JJ^T)^{-1}\epsilon$$



The above calculations were for a single point pair, but these remarks easily extend to the case of many point correspondences by summing the individual errors $\sum_{i} \epsilon_i^T (J_iJ_i^T)^{-1}\epsilon_i$. To estimate $H$, we minimize this expression over all values of $H$. This is a simple minimization problem in which the set of variable parameters consists only of the entries of $H$, unlike the minimization problem involved in the re-projection error case.
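For a single correspondence the Sampson error can be written out explicitly. The sketch below assumes NumPy and takes $C_H(X)$ to be the two DLT equations with the third homogeneous coordinate set to 1; the Jacobian is taken with respect to $(x, y, x', y')$. The function name is illustrative.

```python
import numpy as np

def sampson_error(H, x, xp):
    """eps^T (J J^T)^{-1} eps for one correspondence (x, y) <-> (x', y')."""
    X = np.array([x[0], x[1], 1.0])
    h1, h2, h3 = H                      # rows of H
    u, v = xp
    s = h3 @ X
    # Algebraic residual C_H(X): the two independent rows of x' x (H x).
    eps = np.array([-(h2 @ X) + v * s, (h1 @ X) - u * s])
    # Jacobian of eps with respect to (x, y, x', y').
    J = np.array([
        [-h2[0] + v * h3[0], -h2[1] + v * h3[1], 0.0, s],
        [h1[0] - u * h3[0], h1[1] - u * h3[1], -s, 0.0],
    ])
    return eps @ np.linalg.solve(J @ J.T, eps)

# With H = I the constraint is linear, so the Sampson error is exact:
# moving both points to the midpoint costs 1^2 + 1^2 = 2.
err = sampson_error(np.eye(3), (0.0, 0.0), (2.0, 0.0))
```

Summing this quantity over all correspondences gives the cost that is then minimised over the entries of $H$.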

### **Iterative Minimization**



The Geometric and Sampson errors are usually minimized as the squared Mahalanobis distance $$\lvert X - f(P) \rvert^2_\Sigma = (X - f(P))^T \Sigma^{-1} (X - f(P))$$ where $X$ is a measurement vector with covariance matrix $\Sigma$, $P$ is a set of parameters to be optimized, and $f$ is a mapping function.

This is an unconstrained continuous optimization problem that can be solved with solvers like Gauss-Newton and Levenberg-Marquardt.

In the case of Sampson error, $X$ is a 4-vector made of the inhomogeneous coordinates $x_i, x_i'$ and the set of parameters to be optimized is the entries of $h$. Here we have $X-f(h) = \delta_X$.

### **Random Sample Consensus (RANSAC)**

Up to now we only assumed measurement noise in our correspondences, but we may also have outliers that can severely disturb our optimization. For example a patch of road at the left of the scene might be matched with a different patch at the other end of the scene that looks similar. We will have to remove outliers so our algorithms become more *robust*.

RANSAC aims to resolve this issue with the following general algorithm:

1. Randomly select a minimal subset of points. For example, 2 points when fitting a line to data, or 4 points when fitting a homography from two images.
2. Hypothesize a model with the chosen subset.
3. Compute the error function for each point in relation to the model and determine the number of *inliers* corresponding to a given threshold.
4. Repeat this process $N$ times and choose the best model with the largest consensus set $S$.

To determine the number of samples $N$ we can use the formula $$1 - p = (1 - w^s)^N$$ where $p$ is the (desired) probability that at least one random sample consists of only inliers, $w$ is the probability that any selected point is an inlier, $s$ is the size of the sample set, and $N$ is the number of trials. Thus we have $$N = \frac{\log \big(1-p\big)}{\log \big(1 - w^s\big)}$$ This formula demonstrates that we should set the sample size $s$ to be as small as possible to formulate a model.
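A one-line helper makes the effect of the sample size $s$ concrete (a sketch using only the standard library; the function name is made up):

```python
import math

def ransac_trials(p, w, s):
    """Smallest integer N with 1 - (1 - w^s)^N >= p."""
    return math.ceil(math.log(1 - p) / math.log(1 - w ** s))

trials_line = ransac_trials(0.99, 0.5, 2)        # line fitting, s = 2
trials_homography = ransac_trials(0.99, 0.5, 4)  # homography, s = 4
```

With $p = 0.99$ and $w = 0.5$ this gives 17 trials for $s = 2$ and 72 trials for $s = 4$, so the required number of samples grows quickly with the sample size.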

Often $w$ is unknown so we may choose $w = 0.5$ to assume the worst case, or we may decide $w$ adaptively. The *Adaptive* RANSAC algorithm is as follows. 

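```python
import math

def adaptive_ransac(choose_sample, count_inliers, num_points, s, p=0.99):
    """Adaptive RANSAC loop. choose_sample() draws a minimal sample of size s,
    count_inliers(sample) scores the model fitted to it (both user-supplied)."""
    N = math.inf            # required number of trials, tightened as we go
    sample_count = 0
    best_inliers = -1
    best_sample = None
    while N > sample_count:
        sample = choose_sample()
        n_inliers = count_inliers(sample)
        if n_inliers > best_inliers:
            best_inliers = n_inliers
            best_sample = sample
            # Estimate w from the best consensus set found so far.
            w = best_inliers / num_points
            if w >= 1.0:
                N = 0       # every point is an inlier, stop
            elif w > 0.0:
                N = math.log(1 - p) / math.log(1 - w**s)
        sample_count += 1
    return best_sample, best_inliers
```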
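To see the adaptive loop in action, here is a self-contained sketch that fits a line to synthetic data. Everything about the data is an assumption made for the demo: the inliers lie exactly on $y = 2x + 1$ (no measurement noise), the outliers are pushed at least one unit away vertically, and the threshold of $0.05$ is arbitrary. The line through two points is computed with the cross product from Lecture 1.

```python
import math
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data (assumed): 80 noise-free inliers on y = 2x + 1, 20 outliers.
x_in = rng.uniform(-5, 5, 80)
x_out = rng.uniform(-5, 5, 20)
offsets = rng.uniform(1, 8, 20) * rng.choice([-1, 1], 20)
points = np.vstack([
    np.column_stack([x_in, 2 * x_in + 1]),
    np.column_stack([x_out, 2 * x_out + 1 + offsets]),
])

def line_through(p, q):
    """Homogeneous line through two points, scaled so (a, b) is a unit normal."""
    l = np.cross([p[0], p[1], 1.0], [q[0], q[1], 1.0])
    return l / np.hypot(l[0], l[1])

def count_inliers(line, pts, threshold=0.05):
    """Number of points whose perpendicular distance to the line is below threshold."""
    d = np.abs(pts @ line[:2] + line[2])
    return int(np.sum(d < threshold))

p, s = 0.99, 2
N, sample_count = math.inf, 0
best_line, best_inliers = None, -1
while N > sample_count:
    i, j = rng.choice(len(points), size=s, replace=False)
    line = line_through(points[i], points[j])
    n_in = count_inliers(line, points)
    if n_in > best_inliers:
        best_line, best_inliers = line, n_in
        w = best_inliers / len(points)
        N = 0 if w >= 1 else math.log(1 - p) / math.log(1 - w**s)
    sample_count += 1

slope = -best_line[0] / best_line[1]
intercept = -best_line[2] / best_line[1]
print(best_inliers, slope, intercept, sample_count)
```

Once a sample consisting of two inliers is drawn, $w$ jumps to $0.8$ and the required number of trials drops to about $5$, so the loop stops shortly after.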

So the robust 2D homography computation algorithm is as follows:

1. **Interest points**. Detect keypoints in each image using a feature detector such as SIFT or SURF.
2. **Putative correspondences**. Match keypoints using descriptors.
3. **RANSAC robust estimation**. For N trials, determined adaptively:
   1. Select a random sample of 4 correspondences and compute the homography H.
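   2. Calculate the distance $d$ (e.g. the Sampson error) for each putative correspondence under $H$.
   3. Calculate the number of inliers consistent with $H$, i.e. the correspondences with $d$ below the threshold, and keep the $H$ with the most inliers.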
4. **Optimal estimation**. Re-estimate $H$ from all correspondences classified as inliers.

## **Lecture 5: Camera Models and Calibration**

### **The Camera Matrices**

The matrix $K = \begin{bmatrix} fm_x & s & o_x \\ 0 & fm_y & o_y \\ 0 & 0 & 1\end{bmatrix}$ is called the *intrinsic parameter matrix* or the *camera calibration matrix*, while $[R|t]$ is called the *extrinsic parameter matrix*. The parameters represent the following:
* $f$ is the focal length.
* $m_x$ and $m_y$ are the number of pixels per unit length in the $x$ and $y$ directions, and $m_y/m_x$ is the aspect ratio.
* $s$ is the skew factor, which is nonzero when the pixels are not rectangular.
* $(o_x, o_y)$ is the principal point, i.e. the image center in pixel coordinates.
  


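We may write $[R|t]$ as $[R|t] = R[I|R^{-1}t]$. Here $R^{-1}t = -O$, where $O$ is the camera center in world coordinates. The matrix $P = KR[I|-O]$ has rank 3, and its null space is spanned by the homogeneous camera center $\tilde{O} = [O^T, 1]^T$, since $P\tilde{O} = KR(O - O) = \vec{0}$. Geometrically, if $X = [x,y,z,1]^T$ is any point, all points $\lambda X + (1-\lambda) \tilde{O}$ on the line through $X$ and the camera center have the same image. Since $P(\lambda X + (1-\lambda)\tilde{O}) = \lambda PX + (1-\lambda)P\tilde{O}$, this can hold for every $X$ only if $P\tilde{O} = \vec{0}$.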

Given a general $3 \times 4$ matrix $P$ of rank $3$ whose first three columns are linearly independent, we have $P = [M | \mathbf{v}] = M[I|M^{-1}\mathbf{v}]$. Since $M$ is invertible, we can write $M = KR$ where $K$ is upper triangular and $R$ is orthogonal (write the QR-factorization of $M^{-1}$ as $M^{-1} = R^{-1}K^{-1}$), and so we have $P = KR[I|M^{-1}\mathbf{v}]$, the decomposition as above with camera center $O = -M^{-1}\mathbf{v}$. The factorization $M = KR$ is unique up to multiplication by a matrix of the form $\verb|diag|(\pm 1, \pm 1, \pm 1)$, so requiring the diagonal entries of $K$ to be positive makes it unique. Thus we may uniquely determine the camera parameters from the general finite camera matrix.
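The decomposition can be sketched with NumPy as follows. This is only an illustration under some assumptions: the function name `decompose_camera` and the test camera are made up, $K$ is normalized so that its bottom-right entry is $1$, and $P$ is negated when $\det M < 0$ (allowed because $P$ is only defined up to scale) so that $R$ comes out as a proper rotation.

```python
import numpy as np

def decompose_camera(P):
    """Split a finite camera P = [M | v] into K, R and the camera center O."""
    M, v = P[:, :3], P[:, 3]
    if np.linalg.det(M) < 0:                 # flip the overall sign so det(R) = +1
        M, v = -M, -v
    Q, U = np.linalg.qr(np.linalg.inv(M))    # M^{-1} = Q U, hence M = U^{-1} Q^T
    K, R = np.linalg.inv(U), Q.T
    D = np.diag(np.sign(np.diag(K)))         # M = (K D)(D R) with D = diag(+-1)
    K, R = K @ D, D @ R                      # now diag(K) > 0
    O = -np.linalg.solve(M, v)               # camera center O = -M^{-1} v
    return K / K[2, 2], R, O

# Hypothetical camera used to check the round trip.
K_true = np.array([[800.0, 2.0, 320.0], [0.0, 780.0, 240.0], [0.0, 0.0, 1.0]])
a = 0.3
R_true = np.array([[np.cos(a), 0.0, np.sin(a)],
                   [0.0, 1.0, 0.0],
                   [-np.sin(a), 0.0, np.cos(a)]])
t_true = np.array([0.1, -0.2, 2.0])
P = 3.0 * K_true @ np.column_stack([R_true, t_true])   # arbitrary overall scale

K, R, O = decompose_camera(P)
print(np.round(K, 6))
```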

If the first three columns are not linearly independent, then our camera is a *camera at infinity* with camera center $[d^T,0]^T$, where $d$ spans the null space of $M$ (the first three columns of $P$). In this case, we may determine the camera center by computing the null space of $P$.

### **The Camera Anatomy**

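If we write $P = [p_1 | p_2 | p_3 | p_4]$, then $p_1, p_2$ and $p_3$ are the *vanishing points of the world coordinate axes* $X$, $Y$ and $Z$, respectively. For example, $p_1 = P[1,0,0,0]^T$ is the image of the point at infinity in the $X$ direction. The column $p_4 = P[0,0,0,1]^T$ is the image of the world origin.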




The rows of the projective camera matrix are 4-vectors which may be interpreted geometrically as particular world planes. Write $p^1, p^2, p^3$ for the three rows of $P$.

1. **The Principal plane**. The principal plane is the plane through the camera center parallel to the image plane. It consists of the points $X$ imaged on the line at infinity of the image (i.e. $PX = [x,y,0]^T$). So a point $X$ lies in the principal plane if and only if $p^3 \cdot X = 0$, which means $p^3$ is the vector representing the principal plane, and its first three entries give the plane normal.
2. **The Axis planes**. Similarly, the plane $p^1$ is defined by the camera center $O$ and the line $x = 0$ in the image, while $p^2$ is defined by $O$ and the line $y = 0$ in the image. This is easy to see because $p^1 \cdot X = 0$ if and only if the $x$ coordinate of the image of $X$ is zero.

The **principal point** is the intersection of the image plane and the *principal/optical axis*, the line passing through the camera center $O$ whose direction vector is given by the first three entries of $p^3$.

The point $\widehat{p}^3 = [p_{31}, p_{32}, p_{33}, 0]^T$ is the point at infinity in the direction of the principal axis, and it projects onto the image as the principal point $x_0 = P\widehat{p}^3$. Write $P = [M|p_4]$. Then $x_0 = Mm^3$, where $m^{3T}$ is the third row of $M$.
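The following sketch reads these quantities off a camera matrix numerically. The intrinsics, rotation and translation are made-up values chosen for illustration, not anything from the lecture.

```python
import numpy as np

# Hypothetical camera P = K [R | t].
K = np.array([[800.0, 0.0, 320.0], [0.0, 800.0, 240.0], [0.0, 0.0, 1.0]])
a = 0.3
R = np.array([[np.cos(a), 0.0, np.sin(a)],
              [0.0, 1.0, 0.0],
              [-np.sin(a), 0.0, np.cos(a)]])
t = np.array([1.0, -0.5, 5.0])
P = K @ np.column_stack([R, t])

M, p4 = P[:, :3], P[:, 3]
m3 = M[2]                              # third row of M: normal of the principal plane

x0 = M @ m3                            # principal point, homogeneous
principal_point = x0[:2] / x0[2]

vanishing_x = P[:, 0]                  # image of the point at infinity along world X
origin_image = p4[:2] / p4[2]          # image of the world origin

O = np.append(-np.linalg.solve(M, p4), 1.0)      # homogeneous camera center
on_principal_plane = np.isclose(P[2] @ O, 0.0)   # p^3 . O = 0

print(principal_point, origin_image, on_principal_plane)
```

For a camera of the form $K[R|t]$ the recovered principal point is just $(o_x, o_y)$ from $K$, independent of the rotation.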

### **Action of the Projective Camera on Points**



The points $D = (d,0)^T$ on the plane at infinity represent vanishing points and are mapped to $Md$ in the image plane.

Suppose we are given an image with an observed point $x$. We cannot recover the 3D point uniquely: the pre-image of $x$ is the ray $X(\lambda) = P^+x + \lambda O$, where $O$ is the (homogeneous) camera center and $P^+ = P^T (PP^T)^{-1}$ is the pseudo-inverse of $P$. For a finite camera $P = [M|p_4]$, we can write this in a simpler way without needing to compute pseudo-inverses: $$X(\mu) = \mu \begin{pmatrix} M^{-1}x \\ 0 \end{pmatrix} + \begin{pmatrix} -M^{-1}p_4 \\ 1 \end{pmatrix} = \begin{pmatrix} M^{-1}(\mu x - p_4) \\ 1\end{pmatrix}$$

Suppose that $X = [X,Y,Z,T]^T$ is a 3D point and $P = [M|p_4]$ is a finite camera. Suppose that $PX = [xw, yw, w]^T$ where $w \neq 0$. Then the depth of $X$ in front of the principal plane is $$\verb|depth|(X; P) = \frac{\verb|sign|(\det M) w}{T \lvert\lvert m^3\rvert\rvert}$$ where $m^{3T}$ is the third row of $M$.
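A small sketch of the finite-camera back-projection and the depth formula, under the assumption of a made-up camera $K[R|t]$. For such a camera $\lVert m^3 \rVert = 1$ and $\det M > 0$, so the point $X(\mu)$ on the ray should have depth exactly $\mu$ when $x$ has last coordinate $1$. The function names are hypothetical.

```python
import numpy as np

def backproject(P, x, mu):
    """Point X(mu) on the ray through image point x, for a finite camera P = [M | p4]."""
    M, p4 = P[:, :3], P[:, 3]
    return np.append(np.linalg.solve(M, mu * x - p4), 1.0)

def depth(P, X):
    """Signed depth of the homogeneous point X in front of the principal plane."""
    M = P[:, :3]
    w = P[2] @ X
    return np.sign(np.linalg.det(M)) * w / (X[3] * np.linalg.norm(M[2]))

# Hypothetical camera.
K = np.array([[800.0, 0.0, 320.0], [0.0, 800.0, 240.0], [0.0, 0.0, 1.0]])
a = 0.3
R = np.array([[np.cos(a), 0.0, np.sin(a)],
              [0.0, 1.0, 0.0],
              [-np.sin(a), 0.0, np.cos(a)]])
t = np.array([0.1, -0.2, 2.0])
P = K @ np.column_stack([R, t])

x = np.array([400.0, 300.0, 1.0])
X = backproject(P, x, mu=4.0)
reprojected = P @ X
print(reprojected / reprojected[2], depth(P, X))
```

The depth is unchanged if $P$ is rescaled, e.g. `depth(-2.0 * P, X)` gives the same value, which is the point of the $\verb|sign|(\det M)$ and $\lVert m^3 \rVert$ factors.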

### **Affine Cameras**

The camera matrix of an affine camera has the form $$P_A = \begin{bmatrix} M_{2 \times 3} &  \mathbf{t} \\ \mathbf{0}^T & 1\end{bmatrix}$$ These are cameras with camera center lying on the plane at infinity. For such cameras, the plane at infinity $p^3 = [0,0,0,1]^T$ is the principal plane.

An affine camera matrix can be decomposed in the form $$P_A = \begin{bmatrix} 3 \times 3  \text{  affine}\end{bmatrix} \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} 4 \times 4 \text{  affine}\end{bmatrix} =\begin{bmatrix} K_{2\times 2} & \mathbf{x}_0 \\ \mathbf{0}^T & 1\end{bmatrix}\begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}\begin{bmatrix}R &\mathbf{t} \\ \mathbf{0}^T & 1 \end{bmatrix}$$ where $\mathbf{x}_0 = (p_x,p_y)$ is the principal point (conventionally set to 0). We may also write this in the form $$ P_A = \begin{bmatrix} \alpha_x & s &  \\  & \alpha_y & \\ & & 1  \end{bmatrix} \begin{bmatrix} \mathbf{r}^{1T} & t_1 \\ \mathbf{r}^{2T} & t_2 \\ \mathbf{0}^T & 1 \end{bmatrix}$$ This matrix has eight degrees of freedom.

In an affine camera, points on the plane at infinity are mapped to points at infinity in the image, and so parallel world lines are projected to parallel image lines.

There are several different types of affine cameras:

1. **Orthographic projection**. Here there is no change in scale, and so the camera calibration is the identity matrix. The camera center is located at infinity and the projection ignores depth altogether. In this case, the camera matrix has the form $$P_A = \begin{bmatrix} \mathbf{r}^{1T} & t_1 \\ \mathbf{r}^{2T} & t_2 \\ \mathbf{0}^T & 1 \end{bmatrix}$$ Here there are five degrees of freedom.
2. **Scaled orthographic projection**. Here a point in 3D space is first projected onto a reference plane by orthographic projection and then scaled isotropically by a factor $k$, which approximates perspective projection when all points have roughly the same depth. In this case $$ P_A = \begin{bmatrix} k &  &  \\  & k & \\ & & 1  \end{bmatrix} \begin{bmatrix} \mathbf{r}^{1T} & t_1 \\ \mathbf{r}^{2T} & t_2 \\ \mathbf{0}^T & 1 \end{bmatrix}$$ Here there are six degrees of freedom.
3. **Weak perspective projection**. This is similar to scaled orthographic projection except that the scale factors in the two image directions may differ, so the first factor becomes $\verb|diag|(\alpha_x, \alpha_y, 1)$. Here there are seven degrees of freedom.
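To illustrate that affine cameras keep parallel lines parallel, here is a minimal sketch. The helper `affine_camera` and all numeric values are assumptions made for the example. Setting `alpha_x = alpha_y = 1` gives the orthographic case, equal scales give scaled orthographic, and unequal scales give weak perspective.

```python
import numpy as np

def affine_camera(R, t, alpha_x=1.0, alpha_y=1.0, s=0.0):
    """3x4 affine camera from the first two rows of R, a 2D translation t, and scalings."""
    calib = np.array([[alpha_x, s, 0.0], [0.0, alpha_y, 0.0], [0.0, 0.0, 1.0]])
    ext = np.vstack([np.append(R[0], t[0]),
                     np.append(R[1], t[1]),
                     [0.0, 0.0, 0.0, 1.0]])
    return calib @ ext

a = 0.4
R = np.array([[np.cos(a), 0.0, np.sin(a)],
              [0.0, 1.0, 0.0],
              [-np.sin(a), 0.0, np.cos(a)]])
P_A = affine_camera(R, t=[0.5, -1.0], alpha_x=2.0, alpha_y=3.0)   # weak perspective

d = np.array([1.0, 2.0, 3.0, 0.0])     # common direction (a point at infinity)
A = np.array([0.0, 0.0, 0.0, 1.0])     # a point on the first line
B = np.array([4.0, -1.0, 2.0, 1.0])    # a point on a second, parallel line

dir_A = P_A @ (A + d) - P_A @ A        # image direction of the first line
dir_B = P_A @ (B + d) - P_A @ B        # image direction of the second line
image_of_d = P_A @ d                   # last coordinate 0: a point at infinity

print(dir_A, dir_B, image_of_d)
```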

### **Calibration of a Projective Camera**

A camera projection matrix $P$ has 11 degrees of freedom. A simple and widely used way of estimating its parameters is to image a checkerboard pattern.

The world coordinate system is placed at a corner of the checkerboard so that all the points lie on the plane $Z = 0$. Now we have the equation $\lambda x = P X$ which we may write as $$ \lambda \begin{bmatrix} x \\ y\\ 1\end{bmatrix} = K[R|t] \begin{bmatrix} X\\ Y \\ 0 \\ 1 \end{bmatrix}$$ where $R = [r_1|r_2|r_3]$. 

Since the third coordinate is zero, we may drop the third column of $[R|t]$ and obtain a homography $$\lambda \begin{bmatrix}x\\ y\\1 \end{bmatrix} = \underbrace{K \begin{bmatrix}r_1 & r_2 & t\end{bmatrix}}_{\rho H} \begin{bmatrix} X \\ Y \\ 1 \end{bmatrix}$$ Here $\lambda$ varies from point to point, while $\rho$ is a single unknown scale per view, since a homography can only be estimated up to scale. Write $\rho H = K\begin{bmatrix} r_1 & r_2 & t\end{bmatrix}$.

With $H = [h_1 | h_2 | h_3]$ we have $\rho [h_1 | h_2 | h_3 ] = K[r_1 | r_2 | t]$ and so $r_1 = \rho K^{-1}h_1$ and $r_2 = \rho K^{-1} h_2$. Since $r_1$ and $r_2$ are orthonormal ($r_1^Tr_2 = 0$ and $r_1^Tr_1 = r_2^Tr_2$), the scale $\rho$ cancels and we have $h_1^T K^{-T}K^{-1}h_2 = 0$ and $h_1^{T} K^{-T}K^{-1}h_1 = h_2^T K^{-T} K^{-1}h_2$.

Thus we have obtained equations independent of the camera extrinsics. Now $B = K^{-T}K^{-1}$ is a symmetric positive definite $3\times 3$ matrix. If we identify $B$, we may recover $K^{-1}$ and hence $K$ from it. Indeed, $K^{-T}$ is lower triangular and $K^{-1}$ is upper triangular, so we may use the *Cholesky decomposition* to compute these factors. The Cholesky decomposition is unique up to the signs of the columns, but since we require the diagonal of $K$ to be positive, we can uniquely retrieve $K$ from $B$. In practice $B$ is only known up to scale, which is fixed by normalizing $K$ so that its bottom-right entry is $1$.

So we have $h_1^TBh_2 = 0$ and $h_1^TBh_1 = h_2^T B h_2$. Since $B$ is symmetric, it has 6 distinct entries, and each view gives two linear constraints on them, which we stack as $Ab = 0$. Because $b$ is only determined up to scale, we need a minimum of three views (six equations) to solve for $B$. To account for noise, we can compute the SVD of $A$ and take the right singular vector corresponding to the smallest singular value as a least squares approximation for $b$.
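Here is a hedged sketch of this linear step on synthetic, noise-free data. The ordering $b = (B_{11}, B_{12}, B_{22}, B_{13}, B_{23}, B_{33})$, the function names and the test camera are my own choices. The homographies are built directly from a known $K$ with arbitrary scales, so the example only checks the algebra and is not a full calibration pipeline. With noisy homographies the estimated $B$ need not be positive definite, in which case `np.linalg.cholesky` raises an error.

```python
import numpy as np

def v(H, i, j):
    """Coefficient vector with h_i^T B h_j = v(H, i, j) @ b, for columns h_i, h_j of H."""
    hi, hj = H[:, i], H[:, j]
    return np.array([
        hi[0] * hj[0],
        hi[0] * hj[1] + hi[1] * hj[0],
        hi[1] * hj[1],
        hi[2] * hj[0] + hi[0] * hj[2],
        hi[2] * hj[1] + hi[1] * hj[2],
        hi[2] * hj[2],
    ])

def intrinsics_from_homographies(Hs):
    """Recover K from at least three plane-to-image homographies via B = K^{-T} K^{-1}."""
    A = np.array([row for H in Hs
                  for row in (v(H, 0, 1), v(H, 0, 0) - v(H, 1, 1))])
    b = np.linalg.svd(A)[2][-1]        # right singular vector of the smallest singular value
    B = np.array([[b[0], b[1], b[3]],
                  [b[1], b[2], b[4]],
                  [b[3], b[4], b[5]]])
    if B[0, 0] < 0:                    # b is only defined up to sign
        B = -B
    L = np.linalg.cholesky(B)          # B = L L^T with L proportional to K^{-T}
    K = np.linalg.inv(L.T)
    return K / K[2, 2]

def rot_x(a):
    c, s = np.cos(a), np.sin(a)
    return np.array([[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]])

def rot_y(a):
    c, s = np.cos(a), np.sin(a)
    return np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])

# Hypothetical intrinsics (in normalized units) and three views of the plane Z = 0.
K_true = np.array([[2.0, 0.05, 0.3], [0.0, 1.8, 0.2], [0.0, 0.0, 1.0]])
views = [(rot_x(0.3), [0.1, 0.2, 3.0]),
         (rot_y(-0.4), [-0.3, 0.1, 4.0]),
         (rot_x(0.2) @ rot_y(0.5), [0.2, -0.1, 5.0])]
scales = [1.0, -2.5, 0.7]              # each H is only known up to scale
Hs = [c * K_true @ np.column_stack([R[:, 0], R[:, 1], t])
      for c, (R, t) in zip(scales, views)]

K_est = intrinsics_from_homographies(Hs)
print(np.round(K_est, 6))
```

The three views must see the plane at genuinely different orientations. If the planes are parallel, the constraints from the extra views add no new information.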

 *I don't see why this guarantees that the approximate $B$ will be positive definite. I also don't see how we can identify the homography $H$ required for each view since the $\lambda$ value depends on each point. Maybe focusing on views with constant depth $\lambda$ would suffice, but most examples seem to involve pictures with sufficient perspective distortion*.

EDIT: The second issue is important and can be resolved, see the videos on DLT and Zhang's method by Cyrill Stachniss. Basically we write $[x,y,u]^T = H\tilde{X}$ with $\tilde{X} = [X,Y,1]^T$, and let $h^{iT}$ denote the $i$-th row of $H$ (a row, not a column as above), so $h^{3T}\tilde{X} = u$. We want to work with $[x/u, y/u, 1]$, which are the coordinates we observe, so we just write $x_o = h^{1T}\tilde{X}/h^{3T}\tilde{X}$ (where $x_o$ is the observed coordinate in the image) and so $h^{1T}\tilde{X} - x_o h^{3T}\tilde{X} = 0$, and similarly for $y_o$. So each correspondence gives two equations that are linear in the entries of $H$ and free of the $\lambda$ ambiguity from above.
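The two equations per correspondence can be stacked and solved with an SVD. The sketch below assumes exact correspondences generated from a made-up homography, and it skips the coordinate normalization that is usually recommended for real data. `dlt_homography` is a hypothetical helper name.

```python
import numpy as np

def dlt_homography(world_xy, image_xy):
    """Estimate H (up to scale) from at least 4 correspondences (X, Y) <-> (x, y)."""
    rows = []
    for (X, Y), (x, y) in zip(world_xy, image_xy):
        rows.append([X, Y, 1, 0, 0, 0, -x * X, -x * Y, -x])   # h1.X - x h3.X = 0
        rows.append([0, 0, 0, X, Y, 1, -y * X, -y * Y, -y])   # h2.X - y h3.X = 0
    h = np.linalg.svd(np.array(rows, dtype=float))[2][-1]
    return h.reshape(3, 3)

# Hypothetical homography and checkerboard corners.
H_true = np.array([[1.2, 0.1, 3.0], [-0.2, 0.9, 1.0], [0.01, 0.02, 1.0]])
world = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 1.0]])
proj = (H_true @ np.column_stack([world, np.ones(len(world))]).T).T
image = proj[:, :2] / proj[:, 2:]      # observed inhomogeneous coordinates

H_est = dlt_homography(world, image)
H_est = H_est / H_est[2, 2]            # fix the arbitrary scale for comparison
print(np.round(H_est, 6))
```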

### **Lens Distortion**

We model lens distortion with *radial distortion* $x_r = (1 + \kappa_1 r^2 + \kappa_2 r^4 + \kappa_5 r^6) x$ (and likewise for $y$) together with *tangential distortion*, which adds the offset $(2\kappa_3 xy + \kappa_4 (r^2 + 2x^2), \kappa_3 (r^2 + 2y^2) + 2\kappa_4 xy)$ to $(x,y)$. Here $(x,y)$ are normalized image coordinates, $r^2 = x^2 + y^2$, and $\kappa_i$ are parameters to be determined. 
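As a sketch of this model, the function below applies both distortions to normalized image points. The coefficient ordering $(\kappa_1, \ldots, \kappa_5)$ follows the formulas above, and the numeric values are arbitrary.

```python
import numpy as np

def distort(xy, kappa):
    """Apply radial and tangential distortion to normalized image points (n x 2 array)."""
    k1, k2, k3, k4, k5 = kappa
    x, y = xy[:, 0], xy[:, 1]
    r2 = x**2 + y**2
    radial = 1 + k1 * r2 + k2 * r2**2 + k5 * r2**3
    dx = 2 * k3 * x * y + k4 * (r2 + 2 * x**2)     # tangential offset in x
    dy = k3 * (r2 + 2 * y**2) + 2 * k4 * x * y     # tangential offset in y
    return np.column_stack([radial * x + dx, radial * y + dy])

pts = np.array([[0.0, 0.0], [0.5, 0.0], [0.3, -0.4]])
kappa = (-0.2, 0.05, 0.001, -0.002, 0.0)           # made-up coefficients
distorted = distort(pts, kappa)
print(distorted)
```

The point at the origin is left fixed, and the distortion grows with the distance $r$ from it.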

Once we have estimated the extrinsic and intrinsic parameters as above, we use them as the initial guess and refine them jointly with the lens distortion parameters by minimizing the total reprojection error $$\arg\min_{K,R,t, \kappa} \sum_{i=1}^{n} \sum_{j=1}^{m} \lvert\lvert x_{ij} - \pi(K, R_i, t_i, \kappa, X_j)\rvert\rvert^2$$ using Levenberg-Marquardt. Here 

* $n$ is the number of views
* $m$ is the number of 3D points
* $X_j$ is the $j^{\text{th}}$ 3D point and $x_{ij}$ is the 2D image point from the $i^{\text{th}}$ view corresponding to $X_j$
* $K$ is the camera intrinsic matrix and $(R_i,t_i)$ are the extrinsic parameters of the $i^{\text{th}}$ view.
* $\kappa$ is the vector of lens distortion parameters, and
* $\pi$ is the projection function including lens distortion.



## **Lecture 6: Single View Metrology**

A line in 3-space projects to a line in the image. The line and the camera center define a plane, and the line in the image is the intersection of this plane with the image plane. Conversely, the back-projection of an image line $\ell$ is this same plane: a point $X$ projects onto $\ell$ iff $\ell^T P X = X^T P^T \ell = 0$, so the pre-image is exactly the set of points on the plane $\pi = P^T\ell$.

Under a camera $P$, a conic $C$ back-projects to the cone $Q_{co} = P^TCP$. This is a degenerate quadric with the camera center in its null space. Indeed, an image point $x$ lies on $C$ iff $x^T C x = 0$, and so a point $X$ maps onto $C$ iff $(PX)^TC(PX) = X^T P^TCP X = 0$. Since $PO = 0$ for the camera center $O$, we also have $Q_{co}O = 0$.
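A quick numerical check of the cone, assuming a made-up camera and a circle of radius $50$ pixels as the image conic. It verifies that the camera center is in the null space of $Q_{co}$ and that a 3D point on the ray through a conic point lies on the cone.

```python
import numpy as np

# Hypothetical finite camera.
P = np.array([[800.0, 0.0, 320.0, 100.0],
              [0.0, 800.0, 240.0, -50.0],
              [0.0, 0.0, 1.0, 2.0]])

# Circle of radius 50 centred at (320, 240), written as x^T C x = 0.
cx, cy, rad = 320.0, 240.0, 50.0
C = np.array([[1.0, 0.0, -cx],
              [0.0, 1.0, -cy],
              [-cx, -cy, cx**2 + cy**2 - rad**2]])

Q_co = P.T @ C @ P                                   # back-projected cone

M, p4 = P[:, :3], P[:, 3]
center = np.append(-np.linalg.solve(M, p4), 1.0)     # homogeneous camera center

# A 3D point on the ray through the conic point (370, 240).
x_on_conic = np.array([370.0, 240.0, 1.0])
X = np.append(np.linalg.solve(M, 3.0 * x_on_conic - p4), 1.0)

print(Q_co @ center, X @ Q_co @ X)                   # both vanish up to rounding
```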

The image outline of a smooth surface $S$ results from surface points at which the imaging rays are tangent to the surface. Similarly, the lines tangent to the outline back-project to planes which are tangent to the surface.

The contour generator $\Gamma$ is the set of points $X$ on $S$ at which rays are tangent to the surface. The corresponding image *apparent contour* $\gamma$ is the set of points $x$ which are the image of $X$, i.e. $\gamma$ is the image of $\Gamma$. The apparent contour is also called the "outline" or "profile".

The contour generator depends only on the relative position of the camera center and surface and not on the image plane.

Under a camera matrix $P$, the outline of the quadric $Q$ is the conic $C$ whose dual is given by $C^* = PQ^*P^T$. Indeed, the lines $\ell$ tangent to the outline satisfy $\ell^T C^* \ell = 0$, and these lines back-project to planes $\pi = P^T\ell$ that are tangent to the quadric and hence satisfy $\pi^T Q^* \pi = 0$. This means that $\ell^T PQ^*P^T \ell = 0$.
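As a sanity check of $C^* = PQ^*P^T$, the sketch below images a sphere with the canonical camera $P = [I | 0]$. The setup is an assumption chosen so that the answer is known: a sphere of radius $r$ centred at distance $d$ on the optical axis should have a circular outline of radius $r/\sqrt{d^2 - r^2}$, because the tangent rays make an angle $\arcsin(r/d)$ with the axis.

```python
import numpy as np

d, r = 5.0, 1.0
c = np.array([0.0, 0.0, d])

# Sphere |X - c|^2 = r^2 written as a quadric X^T Q X = 0.
Q = np.block([[np.eye(3), -c[:, None]],
              [-c[None, :], np.array([[c @ c - r**2]])]])
Q_dual = np.linalg.inv(Q)              # dual quadric, up to scale

P = np.hstack([np.eye(3), np.zeros((3, 1))])
C_dual = P @ Q_dual @ P.T              # dual of the outline conic
C = np.linalg.inv(C_dual)              # outline conic, up to scale

rho = r / np.sqrt(d**2 - r**2)         # expected radius of the outline
theta = np.linspace(0.0, 2 * np.pi, 8)
outline = np.column_stack([rho * np.cos(theta), rho * np.sin(theta), np.ones_like(theta)])
residuals = np.einsum("ni,ij,nj->n", outline, C, outline)   # x^T C x for each point

print(np.round(C_dual, 6), np.abs(residuals).max())
```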

Any two images $I,I'$ with the same camera center are related by a homography. Indeed, write $P = KR[I | v]$ and $P' = K'R'[I | v]$, with the same $v$ because the camera centers coincide. Then $P'= K'R'(KR)^{-1}P$ and so we have $$x' = P'X = K'R' (KR)^{-1}(PX) = \underbrace{K'R'R^{-1}K^{-1}}_{H}x$$

Moving the image plane along the principal axis (increasing the focal length) corresponds to a simple magnification. In this case we will have $x' = K'K^{-1}x$ since $R'=R$. This is just magnification by $k = \frac{f'}{f}$ about the principal point.

If we have a pure rotation $R$ then we have $x' = KRK^{-1}x$ if we assume the original rotation is the identity matrix. This is called a *conjugate rotation* and has the same eigenvalues as the rotation matrix. Indeed, if $Rv = \lambda v$ we have $KRK^{-1}(Kv) = KRv = \lambda Kv$. The angle of rotation between views may be computed directly from the phase of the complex eigenvalues of $H$.
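A small sketch of recovering the rotation angle from the eigenvalues of a conjugate rotation. The intrinsics and the angle are made-up values, and the homography is multiplied by an arbitrary positive scale to mimic the fact that $H$ is only known up to scale. A negative scale would shift the phases by $\pi$, so the sign would have to be normalized first.

```python
import numpy as np

K = np.array([[800.0, 0.0, 320.0], [0.0, 800.0, 240.0], [0.0, 0.0, 1.0]])
angle = 0.25                                   # rotation about the y axis, in radians
R = np.array([[np.cos(angle), 0.0, np.sin(angle)],
              [0.0, 1.0, 0.0],
              [-np.sin(angle), 0.0, np.cos(angle)]])

H = 1.7 * K @ R @ np.linalg.inv(K)             # conjugate rotation, arbitrary positive scale

eigvals = np.linalg.eigvals(H)                 # scale * {1, e^{i angle}, e^{-i angle}}
recovered = np.max(np.abs(np.angle(eigvals)))  # phase of the complex pair

print(recovered)
```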