# Assignment 1

GitHub repo for the assignment: https://github.com/brentonjackson/csc-4980/tree/master/Assignment1

I'll be using Python rather than MATLAB for the assignments in this class.

## Part A: Fundamentals

Go over camera calibration toolbox and calibrate camera.

It may be worth mentioning that, according to the [DepthAI documentation](https://docs.luxonis.com/projects/hardware/en/latest/pages/guides/calibration.html), the non-modular cameras are calibrated before shipment, so recalibration isn't needed.

However, as a learning exercise, I calibrated the camera by following the directions in this OpenCV tutorial: https://docs.opencv.org/4.x/dc/dbb/tutorial_py_calibration.html

Below is the Python code I used, followed by before and after images:

```python
#!/usr/bin/env python3

# Camera calibration for the OAK-D Lite camera (or any camera)


import numpy as np
import cv2 as cv
import glob

# 1. Get object points and image points
# Termination criteria for sub-pixel corner refinement
criteria = (cv.TERM_CRITERIA_EPS + cv.TERM_CRITERIA_MAX_ITER, 30, 0.001)
# Prepare object points for a 7x6 grid of inner corners: (0,0,0), (1,0,0), (2,0,0), ..., (6,5,0)
objp = np.zeros((6*7, 3), np.float32)
objp[:, :2] = np.mgrid[0:7, 0:6].T.reshape(-1, 2)
# Arrays to store object points and image points from all the images
objpoints = []  # 3D points in real-world space
imgpoints = []  # 2D points in the image plane
images = glob.glob('../../opencv-samples/left*.jpg')
for fname in images:
    img = cv.imread(fname)
    gray = cv.cvtColor(img, cv.COLOR_BGR2GRAY)
    # Find the chessboard corners
    ret, corners = cv.findChessboardCorners(gray, (7, 6), None)
    # If found, add object points and image points (after refining them)
    if ret:
        objpoints.append(objp)
        corners2 = cv.cornerSubPix(gray, corners, (11, 11), (-1, -1), criteria)
        imgpoints.append(corners2)
        # Draw and display the corners
        cv.drawChessboardCorners(img, (7, 6), corners2, ret)
        cv.imshow('img', img)
        cv.waitKey(500)
cv.destroyAllWindows()

if not objpoints:
    raise SystemExit("No chessboard corners found; check the image path and pattern size")

# 2. Calibrate camera (ret is the RMS reprojection error in pixels)
ret, mtx, dist, rvecs, tvecs = cv.calibrateCamera(objpoints, imgpoints, gray.shape[::-1], None, None)
print("RMS reprojection error:", ret)

# 3. Undistort an image
img = cv.imread('../../opencv-samples/left12.jpg')
h, w = img.shape[:2]
newcameramtx, roi = cv.getOptimalNewCameraMatrix(mtx, dist, (w, h), 1, (w, h))

# Undistort
dst = cv.undistort(img, mtx, dist, None, newcameramtx)
# Crop the image to the valid region of interest
x, y, w, h = roi
dst = dst[y:y+h, x:x+w]
cv.imwrite('calibresult.png', dst)
cv.imshow('calibrate result', dst)
cv.waitKey(500)
cv.destroyAllWindows()
```

Distorted image:

![Distorted image](https://github.com/brentonjackson/csc-4980/blob/master/opencv-samples/left12.jpg?raw=true)

Image after calibration and undistortion:

![Undistorted image](Part-1/calibresult.png)

The curved lines in the first image show a considerable amount of radial distortion, which is corrected in the second image.
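To make "radial distortion" concrete, here is a minimal sketch of the distortion model OpenCV uses (radial coefficients `k1`, `k2`, `k3` and tangential coefficients `p1`, `p2`), applied to a normalized image point, i.e. pixel coordinates with the principal point subtracted and divided by the focal length. The function name and coefficient values are hypothetical, chosen only for illustration. Because points are scaled by an amount that depends on their distance from the center, straight lines that don't pass through the center come out curved; `cv.undistort` removes this effect by resampling the image.

```python
def distort_normalized(x, y, k1=0.0, k2=0.0, p1=0.0, p2=0.0, k3=0.0):
    """Apply OpenCV's radial + tangential distortion model to a normalized point (x, y)."""
    r2 = x * x + y * y
    # Radial term: scales the point by a factor that grows (or shrinks) with radius
    radial = 1 + k1 * r2 + k2 * r2**2 + k3 * r2**3
    # Tangential terms model a lens that isn't perfectly parallel to the sensor
    x_d = x * radial + 2 * p1 * x * y + p2 * (r2 + 2 * x * x)
    y_d = y * radial + p1 * (r2 + 2 * y * y) + 2 * p2 * x * y
    return x_d, y_d


# Hypothetical barrel distortion (k1 < 0): the center stays put, points near the edge move inward
center = distort_normalized(0.0, 0.0, k1=-0.3)
edge = distort_normalized(0.5, 0.0, k1=-0.3)
print(center, edge)
```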

## Part B: MATLAB/Python Prototyping

Write a MATLAB/Python script to find the real world dimensions (e.g. diameter of a ball, side length of a cube) of an object using perspective projection equations. 

Validate using an experiment where you image an object using your camera from a specific distance (choose any distance but ensure you are able to measure it accurately) between the object and camera.

This part requires some background before implementing it in code.

## Perspective Projection

### Background

In this example, I use the pinhole camera model to understand perspective projection.

In Part A, we saw that camera calibration estimates a few things:
- **Extrinsic parameters** of the camera, i.e. the rotation and translation vectors that map a 3D point from the world (chessboard) coordinate system into the camera's coordinate system
- **Intrinsic parameters** of the camera, e.g. the focal lengths and the optical center (principal point), which together form the camera matrix. Visit [this link](https://ksimek.github.io/2013/08/13/intrinsic/) for a great breakdown of the intrinsic parameters
- **Distortion coefficients** (radial and tangential)
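As a minimal sketch of how these pieces fit together (ignoring distortion), the pinhole model moves a 3D world point into camera coordinates with the extrinsics `R` and `t`, divides by depth, and then applies the intrinsics from the camera matrix `K` to get pixels. All numbers below (focal lengths, principal point, identity rotation, a translation of 15 units along the optical axis) are hypothetical, not values from my calibration, and the helper names are my own.

```python
def matvec(m, v):
    """Multiply a 3x3 matrix (list of rows) by a 3-vector."""
    return [sum(m[i][j] * v[j] for j in range(3)) for i in range(3)]


def project(point_world, K, R, t):
    """Project a 3D world point to pixel coordinates (pinhole model, no distortion)."""
    # Extrinsics: rotate, then translate the point into camera coordinates
    Xc, Yc, Zc = [a + b for a, b in zip(matvec(R, point_world), t)]
    # Perspective division onto the normalized image plane (z = 1)
    x, y = Xc / Zc, Yc / Zc
    # Intrinsics: scale by the focal lengths and shift by the principal point (skew assumed 0)
    fx, fy, cx, cy = K[0][0], K[1][1], K[0][2], K[1][2]
    return fx * x + cx, fy * y + cy


# Hypothetical intrinsics (in pixels) and extrinsics
K = [[500.0, 0.0, 320.0],
     [0.0, 500.0, 240.0],
     [0.0, 0.0, 1.0]]
R = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
t = [0.0, 0.0, 15.0]

# A point 3 units to the right of the optical axis, 15 units from the camera
u, v = project([3.0, 0.0, 0.0], K, R, t)
print(u, v)
```

The point lands 100 px to the right of the principal point (500 px × 3/15), which is exactly the ratio Part B inverts to go from pixels back to real-world units.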

We used images of a chessboard to find the camera matrix values, since we knew the relative positions of the square corners on the board. By doing this, we were able to find the intrinsic parameters of the OAK-D Lite camera.

We can use those parameters (the camera matrix and the distortion coefficients) to undistort any image taken with the camera.

As explained below, we can also use one of these parameters, the focal length, to find real-world coordinates of our object from 2D image coordinates. That's the goal of using perspective projection equations.


### Use of Perspective Projection Equations

The plan for Part B is to:
1. Undistort our image
2. Find some desired dimension in our 2D image, like height or width
3. Use perspective projection equations to convert our desired dimension to 3D, real world values and units

We already know how to do step 1 from Part A, so we can modify the camera calibration script to accept any image, undistort it, and write the new image to disk.

For step 2, there are two options:
1. Calculate the 2D dimensions of the object(s) in our image using object outlines, bounding boxes, and calculated Euclidean distances of those bounding box sides
2. Allow the user to specify two points of interest and use those points to calculate the 2D dimensions of interest

I will opt for method 2, since it's very easy to test in the real world.
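With method 2, step 2 reduces to a Euclidean distance between two clicked pixels. A minimal standard-library sketch; the function name and point values are hypothetical. Note that my helper in `dist_between_pts.py` rounds this distance to the nearest whole pixel.

```python
import math


def pixel_distance(p1, p2):
    """Euclidean distance in pixels between two (x, y) image points."""
    return math.dist(p1, p2)  # requires Python 3.8+


# Hypothetical clicks on the top and bottom of an object
top, bottom = (412, 118), (415, 518)
d = pixel_distance(top, bottom)
print(round(d, 2))
```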

Step 3 is where we actually use the perspective projection equations, which I go over in the next section.

### Perspective Projection Equations

The perspective projection equations relate coordinates on the image plane to coordinates in the real world, and vice versa. They use similar triangles to set up a ratio, which is easiest to see in a diagram:

![Perspective Projection image](perspective_projection.png)

We can treat the middle plane as our camera and the image plane on the left as the computer screen our image is rendered on.

We want to convert the point P_i on the image plane to the point P_0 in the real world on the right. To do this, we use the similar triangles formed on either side of the camera along the optical axis.

From these similar triangles, we can see that **a real-world coordinate equals the ratio of the object's distance from the camera to the focal length, times the corresponding coordinate on the image plane**. Since we know the 2D image coordinates and the focal length of the camera (from our camera matrix), the only other quantity we need is the distance of the object from the camera.
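Written out, as a sketch consistent with the similar triangles in the figure (the symbols are my own labels): with focal length f in pixels, object distance Z along the optical axis, image-plane coordinates (x_i, y_i) of P_i measured from the principal point, and real-world coordinates (X_0, Y_0) of P_0, and ignoring the sign flip of the inverted image (which doesn't affect lengths):

$$
\frac{x_i}{f} = \frac{X_0}{Z}, \qquad \frac{y_i}{f} = \frac{Y_0}{Z}
\quad\Longrightarrow\quad
X_0 = \frac{Z}{f}\,x_i, \qquad Y_0 = \frac{Z}{f}\,y_i
$$

For two points at the same depth Z, subtracting their coordinates turns a pixel length d into a real-world length L in the same units as Z:

$$
L = \frac{Z}{f}\,d, \qquad d = \sqrt{(x_2 - x_1)^2 + (y_2 - y_1)^2}
$$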

We can give that distance to our program as a parameter (the script below falls back to 15 inches if none is given).
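The conversion itself is a one-liner. A minimal sketch with a hypothetical helper name and made-up numbers (not from my camera): with a focal length of 1000 px, an object 15 inches away whose image spans 350 px works out to 350 × 15 / 1000 = 5.25 inches.

```python
def pixels_to_real(length_px, distance, focal_px):
    """Convert an image-plane length (px) to a real-world length.

    Assumes the measured segment lies in a plane parallel to the image plane,
    `distance` away from the camera; the result is in the same units as `distance`.
    """
    if focal_px <= 0:
        raise ValueError("focal length must be positive")
    return length_px * distance / focal_px


# Hypothetical values: f = 1000 px, object 15 inches away, 350 px long in the image
print(pixels_to_real(350, 15, 1000))  # 5.25
```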

Below is the code that achieves this:

find_obj_dims.py

```python
#!/usr/bin/env python3

# Script to find real world dimensions of an object from image
# Author: Brenton Jackson
# Date: 11/25/22

# Inputs:
#   image: path to an image containing the object (ideally undistorted)
#   distance: real-world distance from the camera to the object
#
# Output:
#   real-world length corresponding to a distance selected in the 2D image


import argparse
import subprocess
from dist_between_pts import get_dims_2D
from calibrate_camera import newcameramtx as camera_matrix # intrinsic camera matrix from calibration (importing runs the calibration script)

# construct the argument parser and parse the arguments
ap = argparse.ArgumentParser()
ap.add_argument("-i", "--image", required=False, help="Path to the image")
ap.add_argument("-d", "--distance", type=float, required=False, help="Distance of object from camera (e.g. in inches)")
args = vars(ap.parse_args())

# 1. Grab image with object in it
# if no image argument specified, capture new image
if not args["image"]:
    subprocess.run(['python3', 'capture_img.py'])
imgName = args["image"] or "opencv_frame1.png"
dist = args["distance"] or 15 # distance from camera in inches
dist = float(dist)  # float so non-integer distances (e.g. 15.5) are allowed

# 2. Find height of object (in image) by allowing user to select two points on image
dist_image = get_dims_2D(imgName=imgName)


# 3. Use perspective projection equations to convert our desired dimension to 3D, real world values and units
# equation: real_dist = dist_image * dist_from_camera / focal_len
focal_len = (camera_matrix[0][0] + camera_matrix[1][1]) / 2 # avg of fx and fy
real_dist = dist_image * dist / focal_len

print("real world distance: ", real_dist, '(same units as --distance)')
```


This is the main script, and it follows the three steps above one-to-one.

To keep it clean, I moved some code into helper modules. Below is the imported code that implements step 2:

dist_between_pts.py

```python
#!/usr/bin/env python3

# Click two points on an image and calculate Euclidean distance
# Author: Brenton Jackson
# Date: 11/27/22

from math import sqrt
import cv2


#define global variables for mouse callback function capture_pts
clickOne = False
clickTwo = False
imgPts = []
clone = []
image = []
circleRadius = 3
circleColor = (0, 255, 0) # green
lineThickness = 4
lineColor = (0, 0, 255) # red




def capture_pts(event, x, y, flags, param):
    """ 
    Callback function that allows user to click two points in image to calculate distance

    @param event The event that took place (left mouse button pressed, left mouse button released, mouse movement, etc)
    @param x The x-coordinate of the event
    @param y The y-coordinate of the event
    @param flags Any relevant flags passed by OpenCV
    @param param Any extra parameters supplied by OpenCV

    return No return value. Modifies global imgPts array
    """

    # grab references to the global variables
    global clickOne, clickTwo, imgPts, clone, image

    # if the left mouse button was released during the first click,
    # record the (x, y) coordinates of the first point
    if event == cv2.EVENT_LBUTTONUP and clickOne:
        imgPts = [(x, y)]
        # draw circle around the region of interest
        cv2.circle(clone, (x, y), circleRadius, circleColor, -1)
        cv2.imshow("image", clone)

        k = cv2.waitKey(0)
        
        # reset points if r is pressed
        if k == ord("r"):
            refresh_image()

        if k%256 == 32:
            # SPACE pressed, break out
            clickOne = False
            clickTwo = True
            print("click second coordinate")
    
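    # otherwise, if the first point is confirmed, record the (x, y)
    # coordinates of the second point and mark the operation complete.
    # elif (rather than a second if) keeps the same mouse event that
    # set the first point from also being captured as the second point
    elif event == cv2.EVENT_LBUTTONUP and clickTwo: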
        # draw circle around the region of interest
        cv2.circle(clone, (x, y), circleRadius, circleColor, -1)
        cv2.imshow("image", clone)

        k = cv2.waitKey(0)
        
        # reset points if r is pressed
        if k == ord("r"):
            refresh_image()

        if k%256 == 32:
            # SPACE pressed, break out
            clickTwo = False
            imgPts.append((x, y))
            print(imgPts)
            print("captured both coordinates")

def refresh_image():
    """ 
    Function that resets the image so it's clear of points
    and resets the array of image points

    return No return value. Modifies global imgPts array,
    clickOne flag, clickTwo flag, and clone image variable
    """
    global imgPts, clickOne, clickTwo, clone, image
    clone = image.copy()
    imgPts = []
    clickOne = True
    clickTwo = False

def calculate_dist(points):
    """
    Function that takes two (x,y) coordinates and calculates
    the euclidean distance between them

    @param points An array containing two tuples
    """
    (x1, y1) = points[0]
    (x2, y2) = points[1]
    return round(sqrt((x1-x2)**2 + (y1-y2)**2))

def get_dims_2D(imgName):
    """
    Let the user select two points on an object in an image and
    return the distance between them

    @return Distance between points in pixels (rounded to the nearest pixel)

    @param imgName Name of desired image with object

    Instructions:

    Click a point on the image, then press SPACE to confirm it.
    Repeat for the second point.

    To reset the points, press 'R'.

    Once both points are confirmed, a line connecting them is drawn
    and the distance is calculated. Pressing 'C' before then exits
    early without a measurement.

    Press 'Q' to close the image window when done.

    """

    global imgPts, clickOne, clickTwo, clone, image

    # load the image, clone it, and setup the mouse callback function
    image = cv2.imread(imgName)
    clone = image.copy()
    cv2.namedWindow("image")
    cv2.setMouseCallback("image", capture_pts)
    
    # loop until we've confirmed our two points
    clickOne = True
    while True:
        # display the image and wait for a keypress
        cv2.imshow("image", clone)
        key = cv2.waitKey(1)
        
        # if the 'r' key is pressed, reset the points
        if key == ord("r"):
            refresh_image()
        # if the 'c' key is pressed, break from the loop
        elif key == ord("c"):
            break
        elif len(imgPts) >= 2:
            break    
    
    # if there are two reference points, then draw the line
    # between the points on the image clone
    dist = 0
    if len(imgPts) == 2:
        while True:
            cv2.line(clone, imgPts[0], imgPts[1], lineColor, lineThickness)
            cv2.imshow("Distance", clone)
            k = cv2.waitKey(1)
            if k == ord('q'):
                break
        dist = calculate_dist([imgPts[0], imgPts[1]])
    

    cv2.destroyAllWindows()
    print("dist: ", dist, "px")
    return dist
```

## Implementation Details

There are a few things to note about my implementation.

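To run the script, pass an image path (ideally an undistorted image) with `-i` and the object's distance from the camera with `-d`. If no image is given, the script runs `capture_img.py` to capture a new one, and if no distance is given, it defaults to 15 inches.

When you run it, an image window will pop up.
Click a point and press the SPACE bar to confirm it, then do the same for the second point.
After both points are confirmed, you will see the line segment between them, and the distance is calculated.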


For the focal length, I used the average of the two focal length values in the camera matrix (`fx` and `fy`, both in pixels).
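One caveat worth checking: `fx` and `fy` are in pixels, so they only apply at the resolution the calibration images were taken at. If the measured image is a resized version of that view, the camera matrix scales with it. A minimal sketch, assuming a pure resize (same field of view, no cropping) and ignoring the half-pixel offset in the principal point; the helper names and resolutions are hypothetical.

```python
def scale_intrinsics(K, calib_size, new_size):
    """Rescale a 3x3 camera matrix from calib_size=(w, h) to new_size=(w, h).

    Equivalent to diag(sx, sy, 1) @ K; assumes the new image is a pure resize
    of the calibration view (no cropping, same field of view).
    """
    sx = new_size[0] / calib_size[0]
    sy = new_size[1] / calib_size[1]
    return [
        [K[0][0] * sx, K[0][1] * sx, K[0][2] * sx],
        [K[1][0] * sy, K[1][1] * sy, K[1][2] * sy],
        [0.0, 0.0, 1.0],
    ]


def avg_focal(K):
    # Same choice as the script: average of fx and fy
    return (K[0][0] + K[1][1]) / 2


K_calib = [[500.0, 0.0, 320.0], [0.0, 500.0, 240.0], [0.0, 0.0, 1.0]]  # hypothetical, 640x480
K_scaled = scale_intrinsics(K_calib, (640, 480), (1280, 960))
print(avg_focal(K_scaled))
```

Using an unscaled focal length on a 2× larger image would make every measurement come out 2× too large, so it's worth confirming the two resolutions match.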

**Real-world dimensions computed with this script have a margin of error due to:**
- The accuracy of the distance provided
- How much distortion remains in the image
- How accurately you placed your two points in the image
- The angle of the camera relative to the object (the equations assume the measured dimension lies in a plane parallel to the image plane)
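To get a feel for the first and third bullets: because the estimate is linear in the distance, a relative error in the distance carries straight through to the result, and a click error of a few pixels adds an error proportional to distance / focal length. A minimal sketch with hypothetical numbers (f = 1000 px, 15 inches, 350 px measured), using the same formula as the script's step 3:

```python
def pixels_to_real(length_px, distance, focal_px):
    return length_px * distance / focal_px


f, Z, d = 1000.0, 15.0, 350.0  # hypothetical focal length (px), distance (in), length (px)
baseline = pixels_to_real(d, Z, f)

# Distance off by 1 inch (about 6.7% of 15 in): the result is off by the same relative amount
dist_err = pixels_to_real(d, Z + 1, f) / baseline - 1

# Each click off by 2 px in opposite directions (4 px total): 4 / 350, about 1.1%
click_err = pixels_to_real(d + 4, Z, f) / baseline - 1

print(f"baseline {baseline:.3f} in, distance error {dist_err:.1%}, click error {click_err:.1%}")
```

In this example, a rough distance estimate matters far more than slightly imprecise clicks.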


With that said, I ran the script on an image I had taken the day before, estimating the camera-to-mug distance from my usual sitting position and the mug's position in the picture.
I measured the mug's height at about 5.3125" (5 5/16"). The script reported a height of 5.38", a difference of roughly 1.3%.

### Demo

![Part B demo](./part2_demo.gif)


## Part C: Application Development

Set up your application to show an RGB stream from the mono camera and a depth map stream from the stereo camera simultaneously.

Is it feasible? 

What is the maximum frame rate and resolution achievable?


The code for the application I created to accomplish this task is at the end of this section.

**It is feasible.** Beyond showing the two streams simultaneously, much more can be done.

With DepthAI, you can show as many streams as you want, within the limits of the device and its connection to the host.

The entire API is centered around a pipeline that runs on the device, and you can add whatever nodes you want to that pipeline. In DepthAI, each camera is a node: there is a `ColorCamera` node for the RGB stream, and two `MonoCamera` nodes that feed a `StereoDepth` node for the depth map. Linking those nodes to `XLinkOut` output nodes sends their data to the host so it can be viewed.

The trade-off is additional latency, which is expected as more streams are computed and sent to the host.
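A rough way to see where the extra load comes from is to add up the raw data each stream sends to the host. A back-of-the-envelope sketch, assuming uncompressed frames at 30 FPS: NV12 video at 1.5 bytes per pixel (the `video` output is NV12, per the comment in my code), 8-bit mono frames, and 8-bit disparity. Actual formats depend on configuration (for example, subpixel disparity is 16-bit), and the helper name is hypothetical.

```python
def stream_mbps(width, height, bytes_per_px, fps):
    """Raw throughput of one uncompressed stream in megabits per second."""
    return width * height * bytes_per_px * fps * 8 / 1e6


rates = {
    "video (1080P NV12)": stream_mbps(1920, 1080, 1.5, 30),
    "left (480P mono)": stream_mbps(640, 480, 1, 30),
    "right (480P mono)": stream_mbps(640, 480, 1, 30),
    "disparity (480P, 8-bit)": stream_mbps(640, 480, 1, 30),
}
for name, mbps in rates.items():
    print(f"{name}: {mbps:.0f} Mbps")
print(f"total: {sum(rates.values()):.0f} Mbps")
```

Under these assumptions, the 1080P color stream alone accounts for most of the traffic, which fits the extra latency I noticed with the color camera.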

DepthAI has posted some performance benchmarks to go by as a [reference](https://docs.luxonis.com/projects/api/en/latest/tutorials/low-latency/).

I didn't measure the frame rates myself. The values below are what the API reported through the `getFps()` method on each camera node, so real-world results may vary. In my application, the **maximum resolution allowed for the mono cameras was 480P, with a maximum of 30 FPS**, and the streams ran smoothly.

For the color camera, the **maximum resolution was 4K**, although that limited the frame rate to 28.8 FPS. **At 1080P, 30 FPS was supported.** I noticed more latency here, but it was still acceptable.
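Since `getFps()` reports the rate a camera node is configured to produce frames at, rather than what actually reaches the host, one way to measure the delivered rate is to timestamp frames as they are dequeued. A minimal, library-agnostic sketch; the `FpsMeter` class is hypothetical. In the application loop, you would call `meter.tick()` right after each `video.get()`.

```python
import time
from collections import deque


class FpsMeter:
    """Estimate frames per second from the arrival times of the last `window` frames."""

    def __init__(self, window=30):
        self.stamps = deque(maxlen=window)

    def tick(self, now=None):
        # Record one frame arrival; uses time.monotonic() unless a timestamp is given
        self.stamps.append(time.monotonic() if now is None else now)

    def fps(self):
        if len(self.stamps) < 2:
            return 0.0
        elapsed = self.stamps[-1] - self.stamps[0]
        return (len(self.stamps) - 1) / elapsed if elapsed > 0 else 0.0


# Simulated arrivals every 1/30 s, standing in for frames from the device
meter = FpsMeter(window=30)
for i in range(60):
    meter.tick(now=i / 30)
print(round(meter.fps(), 1))
```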

```python
#!/usr/bin/env python3

# Script to show RGB stream and depth map stream
# from camera simultaneously
# Author: Brenton Jackson
# Date: 11/28/22

import cv2
import depthai as dai
import numpy as np


res = dai.MonoCameraProperties.SensorResolution.THE_480_P
median = dai.StereoDepthProperties.MedianFilter.KERNEL_7x7

def getDisparityFrame(frame):
    maxDisp = stereo.initialConfig.getMaxDisparity()
    disp = (frame * (255.0 / maxDisp)).astype(np.uint8)
    disp = cv2.applyColorMap(disp, cv2.COLORMAP_JET)

    return disp

# Create pipeline with RGB and stereo depth
pipeline = dai.Pipeline()

# Define source and output for RGB
camRgb = pipeline.create(dai.node.ColorCamera)
xoutVideo = pipeline.create(dai.node.XLinkOut)
xoutVideo.setStreamName("video")

# Define source and outputs for stereo depth
camLeft = pipeline.create(dai.node.MonoCamera)
camRight = pipeline.create(dai.node.MonoCamera)
stereo = pipeline.create(dai.node.StereoDepth)
xoutLeft = pipeline.create(dai.node.XLinkOut)
xoutRight = pipeline.create(dai.node.XLinkOut)
xoutDisparity = pipeline.create(dai.node.XLinkOut)


# RGB Properties
camRgb.setBoardSocket(dai.CameraBoardSocket.RGB)
camRgb.setResolution(dai.ColorCameraProperties.SensorResolution.THE_1080_P)
camRgb.setVideoSize(1920, 1080)

# Stereo Depth Properties
camLeft.setBoardSocket(dai.CameraBoardSocket.LEFT)
camRight.setBoardSocket(dai.CameraBoardSocket.RIGHT)
for monoCam in (camLeft, camRight):
    monoCam.setResolution(res)

stereo.setDefaultProfilePreset(dai.node.StereoDepth.PresetMode.HIGH_DENSITY)
stereo.initialConfig.setMedianFilter(median)  # KERNEL_7x7 default
stereo.setRectifyEdgeFillColor(0)  # Black, to better see the cutout

xoutLeft.setStreamName("left")
xoutRight.setStreamName("right")
xoutDisparity.setStreamName("disparity")


# Linking
camRgb.video.link(xoutVideo.input)


camLeft.out.link(stereo.left)
camRight.out.link(stereo.right)
stereo.syncedLeft.link(xoutLeft.input)
stereo.syncedRight.link(xoutRight.input)
stereo.disparity.link(xoutDisparity.input)

# set stream names for depth streams
streams = ["left", "right", "disparity"]



# Connect to device and start pipeline
with dai.Device(pipeline) as device:
    print("RGB Framerate: ", camRgb.getFps())
    print("Depth Framerate: ", camLeft.getFps())
    # queue for RGB
    video = device.getOutputQueue(name="video", maxSize=1, blocking=False)
    
    # queues for stereo depth streams
    qList = [device.getOutputQueue(stream, 8, blocking=False) for stream in streams]
    
    while True:
        # show RGB video
        videoIn = video.get()
        # Get BGR frame from NV12 encoded video frame to show with opencv
        # Visualizing the frame on slower hosts might have overhead
        cv2.imshow("video", videoIn.getCvFrame())
        if cv2.waitKey(1) == ord('q'):
            break
        
        # show depth stream video along with left and right mono cameras
        for q in qList:
            name = q.getName()
            frame = q.get().getCvFrame()
            # only "left", "right", and "disparity" are streamed, so only
            # the disparity map needs post-processing before display
            if name == "disparity":
                frame = getDisparityFrame(frame)

            cv2.imshow(name, frame)
        if cv2.waitKey(1) == ord("q"):
            break


```
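The disparity map can be turned into metric depth with the same similar-triangles reasoning as Part B: depth = f × B / d, where f is the focal length in pixels, B is the stereo baseline, and d is the disparity in pixels. A minimal sketch under stated assumptions: the 7.5 cm baseline is my understanding of the OAK-D Lite spec, and the 450 px focal length is a hypothetical value, so check both against your device's calibration. The helper name is also hypothetical.

```python
def disparity_to_depth(disparity_px, focal_px, baseline_cm):
    """Depth in the baseline's units; None where disparity is 0 (no valid match)."""
    if disparity_px <= 0:
        return None
    return focal_px * baseline_cm / disparity_px


FOCAL_PX = 450.0   # hypothetical focal length of the mono cameras at 640x480
BASELINE_CM = 7.5  # assumed OAK-D Lite stereo baseline

for d in (0, 15, 45, 95):
    print(d, disparity_to_depth(d, FOCAL_PX, BASELINE_CM))
```

Larger disparity means a closer object, so the maximum disparity (returned by `getMaxDisparity()` in the script; 95 with default settings, as far as I know) sets the minimum measurable depth, about 35.5 cm with these hypothetical values.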