## Reading the input datasets

In [4]:
import json
import numpy as np
import cv2

jsonTrainFile = "Data/ship/transforms_train.json"

with open(jsonTrainFile, "r") as fp:
    jsonTrainData = json.load(fp)

print(f"Field of view in X direction - train: {jsonTrainData['camera_angle_x']}")
print(f"Number of frames - train: {len(jsonTrainData['frames'])}")

Field of view in X direction - train: 0.6911112070083618
Number of frames - train: 100


In [5]:
first_frame = jsonTrainData["frames"][0]

# transformation matrix and related image
transform_matix = np.array(first_frame["transform_matrix"])
file_name = first_frame["file_path"]

print(transform_matix)  # Camera-to-world transform matrix for this image.
print(file_name)  # File path of the corresponding image.
print(f"Rotation {first_frame['rotation']}")

[[-4.65905853e-02  1.75296515e-01 -9.83412623e-01 -3.96426296e+00]
 [-9.98914063e-01 -8.17604642e-03  4.58675772e-02  1.84898108e-01]
 [-9.31322575e-10  9.84481692e-01  1.75487086e-01  7.07411051e-01]
 [ 0.00000000e+00  0.00000000e+00  0.00000000e+00  1.00000000e+00]]
./train/r_0
Rotation 0.012566370614359171


In [3]:
img = cv2.imread('Data/ship/train/r_0.png')
cv2.imshow('image', img)
cv2.waitKey(0)

-1

---


## Rays in Computer Graphics

    r(t) = O + t.d

Here r(t) represents the point on the ray at parameter t.

- O: Origin vector of the ray (starting point)
- d: Direction unit vector of the ray
- t: Parameter for the ray propagation (like time)
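
As a minimal sketch of the ray equation, the snippet below evaluates r(t) = O + t * d for a few values of t. The function name `points_along_ray` and the sample origin, direction, and t values are illustrative assumptions, not part of the dataset.

```python
import numpy as np


def points_along_ray(origin, direction, t_values):
    """Evaluate r(t) = O + t * d for every t in t_values (hypothetical helper)."""
    origin = np.asarray(origin, dtype=float)
    direction = np.asarray(direction, dtype=float)
    # d is defined as a unit vector, so normalize whatever was passed in.
    direction = direction / np.linalg.norm(direction)
    t_values = np.asarray(t_values, dtype=float)
    # Broadcasting: (N, 1) * (3,) -> (N, 3), one point per t value.
    return origin + t_values[:, None] * direction


# Illustrative values only.
ray_origin = np.array([0.0, 0.0, 0.0])
ray_direction = np.array([0.0, 0.0, -2.0])  # normalized inside the helper
ray_points = points_along_ray(ray_origin, ray_direction, [0.0, 1.0, 2.5])
print(ray_points)
```

Because d has unit length, t measures the distance travelled from the origin along the ray.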



To process this further in code, we first need to find the camera-frame x, y, z coordinates that correspond to each image pixel. This is where some computer graphics background comes into play.

<center><image src="./imgs/image_plane.png" width="500px"/></center>



We can apply basic trigonometry to find the relationship between the pixel coordinates xi, yi and the camera coordinates Xc, Yc in terms of f, the focal length of the camera (a value we need to know).


<center><image src="./imgs/camera_coords.png" width="100"/></center>

In the above equation we assume the image origin is at the top-left corner, hence the ox, oy offsets. With this we can get the camera-frame coordinates of each pixel. Getting the ray origin and direction vectors is a bit more complex.
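
The pixel-to-camera relationship can be sketched as below. This is a rough illustration under several assumptions that are not stated in this notebook: a pinhole camera, the principal point (ox, oy) at the image centre, an 800 x 800 image, a focal length in pixels derived from `camera_angle_x` as f = 0.5 * W / tan(0.5 * camera_angle_x), and a camera that looks along its negative Z axis with Y pointing up. The helper name `pixel_to_camera_dirs` is hypothetical.

```python
import numpy as np


def pixel_to_camera_dirs(width, height, focal):
    """Return an (height, width, 3) array of camera-frame directions (hypothetical helper)."""
    # xi runs along columns and yi along rows, with the origin at the top-left corner.
    xi, yi = np.meshgrid(
        np.arange(width, dtype=float),
        np.arange(height, dtype=float),
        indexing="xy",
    )
    ox, oy = 0.5 * width, 0.5 * height  # assumed principal point: image centre
    x_c = (xi - ox) / focal
    y_c = -(yi - oy) / focal            # image rows grow downward, camera Y is assumed up
    z_c = -np.ones_like(x_c)            # assumed: the camera looks along -Z
    return np.stack([x_c, y_c, z_c], axis=-1)


camera_angle_x = 0.6911112070083618     # value printed from transforms_train.json
image_width, image_height = 800, 800    # assumption: not read from the image file here
focal_length = 0.5 * image_width / np.tan(0.5 * camera_angle_x)
camera_dirs = pixel_to_camera_dirs(image_width, image_height, focal_length)
print(focal_length, camera_dirs.shape)
```

If the actual images have a different size, or the figure above uses a different sign convention, the offsets and signs here would need to change accordingly.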

The theory behind this is fairly simple, and [this article](https://pyimagesearch.com/2021/11/10/computer-graphics-and-deep-learning-with-nerf-using-tensorflow-and-keras-part-1/) has a nice, simple explanation of it.

In short, a point P in the world coordinate system (as a homogeneous vector) is converted to the camera coordinate frame. That camera-frame vector can then be projected to image pixels with the previously mentioned formula; here we start from the pixels and work backward to get the world coordinates. The world-to-camera transformation is given by the `Camera Extrinsic Matrix`. The matrix stored with each frame in the dataset above is the inverse of it, so we can convert camera coordinates to world coordinates directly.

<center><image src="./imgs/coor_transform.png" width="500"/></center>

Here Xw denotes world coordinates, Xc denotes camera coordinates, and Cex denotes the camera extrinsic parameter matrix, which holds the transformations that convert world coordinates to camera coordinates. Using this formula we can compute the world coordinates of a pixel from its camera coordinates. Subtracting the ray origin from that world coordinate vector and normalizing the result gives the ray's unit direction vector.

The origin of the ray is, of course, the translation vector of the camera-to-world matrix, i.e. the camera position in world coordinates.
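
A small sketch of this step, using the `transform_matrix` values printed for the first frame above: the upper-left 3x3 block rotates a camera-frame direction into the world frame, and the last column gives the ray origin. The helper name `camera_ray_to_world` and the choice of (0, 0, -1) as the camera's viewing direction are assumptions made for illustration.

```python
import numpy as np

# Camera-to-world matrix of the first training frame, copied from the output above.
camera_to_world = np.array([
    [-4.65905853e-02,  1.75296515e-01, -9.83412623e-01, -3.96426296e+00],
    [-9.98914063e-01, -8.17604642e-03,  4.58675772e-02,  1.84898108e-01],
    [-9.31322575e-10,  9.84481692e-01,  1.75487086e-01,  7.07411051e-01],
    [ 0.00000000e+00,  0.00000000e+00,  0.00000000e+00,  1.00000000e+00],
])


def camera_ray_to_world(direction_cam, camera_to_world):
    """Return (origin, unit direction) of a ray in world coordinates (hypothetical helper)."""
    rotation = camera_to_world[:3, :3]     # rotation part of the transform
    translation = camera_to_world[:3, 3]   # camera position in world coordinates
    # Directions are only rotated; the translation applies to points, not directions.
    direction_world = rotation @ np.asarray(direction_cam, dtype=float)
    direction_world = direction_world / np.linalg.norm(direction_world)
    return translation, direction_world


# Assumed viewing direction of the camera in its own frame (-Z axis).
direction_cam = np.array([0.0, 0.0, -1.0])
ray_origin, ray_direction = camera_ray_to_world(direction_cam, camera_to_world)
print("origin:", ray_origin)
print("direction:", ray_direction)
```

Under the -Z assumption, the central ray of this frame points back toward the world origin, which is consistent with a camera placed around the object and aimed at it.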

Let's look at a basic example of the above concepts using our training data.
