# Pixel alignment between RGB and depth cameras

In [5]:
# import image module
from IPython.display import Image
  
# get the image
Image(url="./images/Pic1.png")

Assume that we know the intrinsic parameters of both sensors and the extrinsic parameters relating the two sensors' coordinate frames. From the perspective (pinhole) camera projection model, we have:

$ \lambda \begin{bmatrix}
u \\
v \\
1 
\end{bmatrix} = \begin{bmatrix}
s_x & s_{\theta} & o_x \\
0 & s_y & o_y \\
0 & 0 & 1 \\
\end{bmatrix} \begin{bmatrix}
f & 0 & 0 \\
0 & f & 0 \\
0 & 0 & 1 \\
\end{bmatrix} \begin{bmatrix}
1 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 \\
0 & 0 & 1 & 0 \\
\end{bmatrix} \begin{bmatrix}
R & t \\
0 & 1 \\
\end{bmatrix} \begin{bmatrix}
X_0\\
Y_0 \\
Z_0 \\
1 \end{bmatrix}$

where: 

$ f $ : focal length of the camera in millimeters, 
$ o_x$: x-coordinate of the principal point in pixels, 
$ o_y$: y-coordinate of the principal point in pixels, 
$ s_x$ : scale factor in the x-direction that converts unit length to horizontal pixels, 
$ s_y$ : scale factor in the y-direction that converts unit length to vertical pixels,
$ s_{\theta}$: skew factor (often close to zero).
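The projection equation above can be evaluated numerically as a quick sanity check. The sketch below is only illustrative: every numeric value (focal length, pixel scaling, principal point, pose, and the 3D point) is an assumption chosen for readability, not calibration data, and the names `K_s`, `K_f`, `Pi_0`, and `g` are labels introduced here for the four matrices in the equation.

```python
import numpy as np

# Illustrative (assumed) parameters, not real calibration values.
f = 4.0                    # focal length in millimeters
s_x, s_y = 150.0, 150.0    # pixels per millimeter in x and y
s_theta = 0.0              # skew
o_x, o_y = 320.0, 240.0    # principal point in pixels

# Pixel scaling / principal point matrix and focal length matrix.
K_s = np.array([[s_x, s_theta, o_x],
                [0.0, s_y,     o_y],
                [0.0, 0.0,     1.0]])
K_f = np.diag([f, f, 1.0])

# Canonical projection [I | 0] that drops the homogeneous coordinate.
Pi_0 = np.hstack([np.eye(3), np.zeros((3, 1))])

# Extrinsics [R t; 0 1]: identity rotation and a small translation (assumed).
R = np.eye(3)
t = np.array([0.1, 0.0, 0.0])
g = np.eye(4)
g[:3, :3] = R
g[:3, 3] = t

# Homogeneous 3D point in the world frame (assumed).
X_0 = np.array([0.2, -0.1, 2.0, 1.0])

# lambda * [u, v, 1]^T = K_s K_f Pi_0 g X_0
x = K_s @ K_f @ Pi_0 @ g @ X_0
lam = x[2]                       # scale factor, equal to the depth Z in the camera frame
u, v = x[0] / lam, x[1] / lam    # pixel coordinates after dividing out lambda

print(u, v, lam)  # approximately 410.0 210.0 2.0
```

Dividing by the third component is what removes $\lambda$ and yields the pixel coordinates $(u, v)$.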

In the above equation, $\begin{bmatrix}
X_0\\
Y_0 \\
Z_0 \\
1 \end{bmatrix}$ is the homogeneous coordinate of a 3D point in the world frame. Transforming it to the camera frame via the extrinsic parameters gives: 

$
X_c = \begin{bmatrix}
R & t \\
0 & 1 \\
\end{bmatrix} \begin{bmatrix}
X_0\\
Y_0 \\
Z_0 \\
1 \end{bmatrix}$

In [6]:
# import image module
from IPython.display import Image
  
# get the image
Image(url="./images/Pic2.png")

In our case we assume $s_{\theta} = 0$ and define $f_x = f s_x$ and $f_y = f s_y$. Then we have:

$ \lambda \begin{bmatrix}
u_c \\
v_c \\
1 
\end{bmatrix} = \begin{bmatrix}
f_x & 0 & o_x \\
0 & f_y & o_y \\
0 & 0 & 1 \\
\end{bmatrix}_c \begin{bmatrix}
1 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 \\
0 & 0 & 1 & 0 \\
\end{bmatrix}  X_c$
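As a small check of this simplification, the sketch below builds the combined intrinsic matrix from $f_x = f s_x$ and $f_y = f s_y$ and confirms that it equals the product of the two separate matrices when the skew is zero. All numbers are assumed for illustration, and `K`, `K_s`, `K_f`, and `Pi_0` are names introduced here.

```python
import numpy as np

# Illustrative (assumed) values.
f = 4.0                    # focal length in millimeters
s_x, s_y = 150.0, 150.0    # pixels per millimeter
o_x, o_y = 320.0, 240.0    # principal point in pixels

# Combined focal lengths in pixels.
f_x, f_y = f * s_x, f * s_y

# Combined intrinsic matrix with zero skew.
K = np.array([[f_x, 0.0, o_x],
              [0.0, f_y, o_y],
              [0.0, 0.0, 1.0]])

# The two separate factors from the general model, with s_theta = 0.
K_s = np.array([[s_x, 0.0, o_x],
                [0.0, s_y, o_y],
                [0.0, 0.0, 1.0]])
K_f = np.diag([f, f, 1.0])
same = np.allclose(K_s @ K_f, K)   # True when the two forms agree

# Project a homogeneous point already expressed in the camera frame (assumed).
Pi_0 = np.hstack([np.eye(3), np.zeros((3, 1))])
X_c = np.array([0.3, -0.1, 2.0, 1.0])
x = K @ Pi_0 @ X_c
u_c, v_c = x[:2] / x[2]

print(same, u_c, v_c)
```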

In this scenario we take a pixel location in the RGB image and use the projection model to obtain the corresponding 3D ray, expressed in homogeneous coordinates in the RGB camera frame, $ X_c $.


Without loss of generality, we can set $ \lambda = 1 $, since homogeneous coordinates are defined only up to scale. 
We also assumed that the transformation matrix between the two sensors is known, so a homogeneous point in the depth camera frame, $X_d$, and the same point in the RGB camera frame, $X_c$, are related by:

$ 
X_c = T_{d2c}  X_d 
$
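A minimal sketch of this frame change is given below. The helper `make_transform` and the numeric pose (a 5 cm sideways offset with no rotation) are assumptions made only for illustration; in practice $T_{d2c}$ comes from the extrinsic calibration between the two sensors.

```python
import numpy as np

def make_transform(R, t):
    """Build a 4x4 homogeneous transform [R t; 0 1] (hypothetical helper)."""
    T = np.eye(4)
    T[:3, :3] = R
    T[:3, 3] = t
    return T

# Assumed extrinsics: no rotation, 5 cm offset along x.
T_d2c = make_transform(np.eye(3), np.array([0.05, 0.0, 0.0]))

# Homogeneous point in the depth camera frame (assumed).
X_d = np.array([0.25, -0.1, 2.0, 1.0])

# X_c = T_d2c X_d
X_c = T_d2c @ X_d

# The opposite direction uses the inverse of a rigid transform: [R^T, -R^T t; 0 1].
R, t = T_d2c[:3, :3], T_d2c[:3, 3]
T_c2d = make_transform(R.T, -R.T @ t)

print(X_c)          # approximately [0.3, -0.1, 2.0, 1.0]
print(T_c2d @ X_c)  # recovers X_d
```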



In [7]:
from IPython.display import Image
  
# get the image
Image(url="./images/Pic3.png")

Therefore, by substituting $X_c $ in the projection equation, we have:

$ \begin{bmatrix}
u_c \\
v_c \\
1 
\end{bmatrix} = \begin{bmatrix}
f_x & 0 & o_x \\
0 & f_y & o_y \\
0 & 0 & 1 \\
\end{bmatrix}_c \begin{bmatrix}
1 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 \\
0 & 0 & 1 & 0 \\
\end{bmatrix}  T_{d2c}  X_d  $

Applying the projection equation to the depth camera gives: 

$\lambda_{d} \begin{bmatrix}
u_d \\
v_d \\
1 
\end{bmatrix} = \begin{bmatrix}
f_x & 0 & o_x \\
0 & f_y & o_y \\
0 & 0 & 1 \\
\end{bmatrix}_d \begin{bmatrix}
1 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 \\
0 & 0 & 1 & 0 \\
\end{bmatrix} X_d  $

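$\Rightarrow (\begin{bmatrix}
f_x & 0 & o_x \\
0 & f_y & o_y \\
0 & 0 & 1 \\
\end{bmatrix}_d \begin{bmatrix}
1 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 \\
0 & 0 & 1 & 0 \\
\end{bmatrix})^{-1} \lambda_{d} \begin{bmatrix}
u_d \\
v_d \\
1 
\end{bmatrix} = 
 X_d  $

Strictly speaking, the $3 \times 4$ product in parentheses has no ordinary inverse. The notation $(\cdot)^{-1}$ here stands for back-projection: writing $K_d$ for the depth camera's $3 \times 3$ intrinsic matrix, and noting that the third row of the projection equation gives $\lambda_d = Z_d$ (the depth of the point along the optical axis, which the depth image typically provides at pixel $(u_d, v_d)$), we have $X_d = \begin{bmatrix} \lambda_d K_d^{-1} (u_d, v_d, 1)^T \\ 1 \end{bmatrix}$.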
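The back-projection step can be sketched as follows, reading the inverse as $K_d^{-1}$ applied to the pixel, scaled by the depth $\lambda_d$, with a trailing 1 appended to form the homogeneous point. The function name `backproject`, the intrinsic values, and the sample pixel and depth are all assumptions for illustration.

```python
import numpy as np

def backproject(K, u, v, depth):
    """Lift pixel (u, v) with known depth to a homogeneous 3D point in the
    camera frame: X = [depth * K^{-1} [u, v, 1]^T, 1] (hypothetical helper)."""
    # Solve K * ray = [u, v, 1]^T rather than forming the inverse explicitly.
    ray = np.linalg.solve(K, np.array([u, v, 1.0]))
    return np.append(depth * ray, 1.0)

# Assumed depth-camera intrinsics.
K_d = np.array([[500.0,   0.0, 320.0],
                [  0.0, 500.0, 240.0],
                [  0.0,   0.0,   1.0]])

# Assumed depth pixel and its measured depth in meters.
u_d, v_d, lambda_d = 382.5, 215.0, 2.0

X_d = backproject(K_d, u_d, v_d, lambda_d)
print(X_d)  # approximately [0.25, -0.1, 2.0, 1.0]

# Round trip: projecting X_d again should return the original pixel.
x = K_d @ X_d[:3]
u_back, v_back = x[:2] / x[2]
```

Using `np.linalg.solve` avoids computing $K_d^{-1}$ explicitly, which is a common numerical preference rather than a requirement.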

By substituting $ X_d $ we have: 

$ \begin{bmatrix}
u_c \\
v_c \\
1 
\end{bmatrix} = \begin{bmatrix}
f_x & 0 & o_x \\
0 & f_y & o_y \\
0 & 0 & 1 \\
\end{bmatrix}_c \begin{bmatrix}
1 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 \\
0 & 0 & 1 & 0 \\
\end{bmatrix}  T_{d2c} (\begin{bmatrix}
f_x & 0 & o_x \\
0 & f_y & o_y \\
0 & 0 & 1 \\
\end{bmatrix}_d \begin{bmatrix}
1 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 \\
0 & 0 & 1 & 0 \\
\end{bmatrix})^{-1} \lambda_{d} \begin{bmatrix}
u_d \\
v_d \\
1 
\end{bmatrix} $

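In this way we obtain a relation between a pixel location in the depth image and the corresponding pixel location in the RGB image. The right-hand side is a homogeneous vector, so its first two components must be divided by the third to obtain $(u_c, v_c)$.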
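Putting the steps together, the sketch below maps a single depth pixel to its RGB pixel: back-project with the depth intrinsics, change frames with $T_{d2c}$, project with the RGB intrinsics, and normalize. The function `depth_pixel_to_rgb` and all numeric values are hypothetical and only illustrate the chain of equations; it is not a reference implementation, and it ignores lens distortion and occlusion.

```python
import numpy as np

def depth_pixel_to_rgb(u_d, v_d, depth, K_d, K_c, T_d2c):
    """Map depth pixel (u_d, v_d) with known depth to RGB pixel (u_c, v_c).
    Hypothetical helper following the derivation in this notebook."""
    # 1. Back-project: X_d = [depth * K_d^{-1} [u_d, v_d, 1]^T, 1].
    ray = np.linalg.solve(K_d, np.array([u_d, v_d, 1.0]))
    X_d = np.append(depth * ray, 1.0)
    # 2. Change frames: X_c = T_d2c X_d.
    X_c = T_d2c @ X_d
    # 3. Project with the RGB intrinsics; X_c[:3] applies [I | 0].
    x = K_c @ X_c[:3]
    # 4. Divide by the third component to get pixel coordinates.
    return x[0] / x[2], x[1] / x[2]

# Assumed intrinsics for both cameras.
K_d = np.array([[500.0, 0.0, 320.0], [0.0, 500.0, 240.0], [0.0, 0.0, 1.0]])
K_c = np.array([[600.0, 0.0, 320.0], [0.0, 600.0, 240.0], [0.0, 0.0, 1.0]])

# Assumed extrinsics: no rotation, 5 cm offset along x.
T_d2c = np.eye(4)
T_d2c[:3, 3] = [0.05, 0.0, 0.0]

u_c, v_c = depth_pixel_to_rgb(382.5, 215.0, 2.0, K_d, K_c, T_d2c)
print(u_c, v_c)  # approximately 410.0 210.0
```

For a full depth image the same operations would normally be applied to all pixels at once with array operations, but the per-pixel form keeps the correspondence with the equations easy to follow.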