### Question 1

1. Capture a 10 sec video footage using a camera of your choice. The footage should be taken
with the camera in hand and you need to pan the camera slightly from left-right or right-left
during the 10 sec duration. Pick any image frame from the 10 sec video footage. Pick a region of
interest corresponding to an object in the image. Crop this region from the image. Then use this
cropped region to compare with randomly picked 10 images in the dataset of 10 sec video
frames, to see if there is a match for the object in the scenes from the 10 images. For
comparison use sum of squared differences (SSD) or normalized correlation.

In [5]:
import cv2
import numpy as np

# Coordinates for the object region to crop
x, y, w, h = 700, 700, 200, 200  

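# Load the frame with the object of interest and the region to crop
object_frame = cv2.imread('IMG_2049.png')
if object_frame is None:  # cv2.imread returns None instead of raising on failure
    raise Exception("Could not load IMG_2049.png")
object_region = object_frame[y:y+h, x:x+w]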

# Function to compute Sum of Squared Differences
def compute_ssd(image1, image2):
    return np.sum((image1.astype("float") - image2.astype("float")) ** 2)

# Function to compute Normalized Correlation
def compute_normalized_correlation(image1, image2):
    image1 = image1.astype("float")
    image2 = image2.astype("float")
    product = np.mean((image1 - np.mean(image1)) * (image2 - np.mean(image2)))
    stds = np.std(image1) * np.std(image2)
    if stds == 0:
        return 0
    else:
        return product / stds

# List of paths to the randomly picked 10 images in the dataset
video_frames = [
    'E:\\GSU\\Course Work\\Computer Vision\\Assignment_3\\Random Images\\IMG_2050.png',
    'E:\\GSU\\Course Work\\Computer Vision\\Assignment_3\\Random Images\\IMG_2052.png',
    'E:\\GSU\\Course Work\\Computer Vision\\Assignment_3\\Random Images\\IMG_2054.png',
    'E:\\GSU\\Course Work\\Computer Vision\\Assignment_3\\Random Images\\IMG_2055.png',
    'E:\\GSU\\Course Work\\Computer Vision\\Assignment_3\\Random Images\\IMG_2056.png',
    'E:\\GSU\\Course Work\\Computer Vision\\Assignment_3\\Random Images\\IMG_2057.png',
    'E:\\GSU\\Course Work\\Computer Vision\\Assignment_3\\Random Images\\IMG_2059.png',
    'E:\\GSU\\Course Work\\Computer Vision\\Assignment_3\\Random Images\\IMG_2061.png',
    'E:\\GSU\\Course Work\\Computer Vision\\Assignment_3\\Random Images\\IMG_2062.png',
    'E:\\GSU\\Course Work\\Computer Vision\\Assignment_3\\Random Images\\IMG_2063.png'
]

matches_ssd = []
matches_ncorr = []
for frame_path in video_frames:
    frame = cv2.imread(frame_path)
    if frame is None:  # Stop early on a bad path rather than failing on the slice below
        raise Exception(f"Could not load {frame_path}")
    frame_region = frame[y:y+h, x:x+w]  # Crop the same region coordinates
    ssd = compute_ssd(object_region, frame_region)
    ncorr = compute_normalized_correlation(object_region, frame_region)
    matches_ssd.append((frame_path, ssd))
    matches_ncorr.append((frame_path, ncorr))

# Find the best matches based on SSD and normalized correlation
best_match_ssd = min(matches_ssd, key=lambda x: x[1])
best_match_ncorr = max(matches_ncorr, key=lambda x: x[1])

# Extract the file names and scores from the tuples for clearer printing
best_ssd_filename = best_match_ssd[0].split('\\')[-1]  # Get the file name only
best_ssd_score = best_match_ssd[1]

best_ncorr_filename = best_match_ncorr[0].split('\\')[-1]  # Get the file name only
best_ncorr_score = best_match_ncorr[1]

# Print the formatted results
print(f"Best match based on SSD:\nFile: {best_ssd_filename}\nSSD Score: {best_ssd_score:.2f}\n")
print(f"Best match based on Normalized Correlation:\nFile: {best_ncorr_filename}\nCorrelation Score: {best_ncorr_score:.4f}\n")



Best match based on SSD:
File: IMG_2057.png
SSD Score: 633363746.00

Best match based on Normalized Correlation:
File: IMG_2055.png
Correlation Score: 0.1655



In [None]:
# Displaying the images
cv2.imshow("Reference Region", object_region)
cv2.imshow("Best Match SSD", cv2.imread(best_match_ssd[0]))
cv2.imshow("Best Match Normalized Correlation", cv2.imread(best_match_ncorr[0]))

cv2.waitKey(0)  # Wait for a key press to close the images
cv2.destroyAllWindows()  # Destroy all the created windows

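The cell above crops a 200 × 200 region of interest (ROI) from one frame of the 10-second video (IMG_2049.png) and compares it with the window at the same pixel coordinates in each of 10 other frames, using both the Sum of Squared Differences (SSD) and Normalized Correlation. Because the camera pans during the clip, the object shifts between frames, so these scores measure how closely each frame's fixed window resembles the ROI; they do not locate the object within the frame.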

#### 1. Best Match Based on SSD:
- **File**: IMG_2057.png
- **SSD Score**: 633363746.00

The SSD metric sums the squared intensity differences between corresponding pixels, so a lower score means higher similarity. The score of 633363746.00 is the lowest of the 10 comparisons, which makes IMG_2057.png the best match under this metric. The absolute value is large because SSD is an unnormalized sum over every pixel and channel of the crop, and it grows with any difference in lighting, contrast, or the object's position within the window. Only the ranking across frames matters for picking the best match.
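To make the raw SSD easier to interpret, it can be divided by the number of values it was summed over. The sketch below does this for the score reported above, assuming the crop is a full 200 × 200 window with 3 colour channels (the default for `cv2.imread`). `ssd_to_rms` is a hypothetical helper and is not part of the notebook.

```python
import math

def ssd_to_rms(ssd, height, width, channels=3):
    """Convert a raw SSD into a mean squared difference and an RMS difference per value."""
    n_values = height * width * channels  # assumes the crop was not truncated at the image border
    mse = ssd / n_values
    return mse, math.sqrt(mse)

# Score reported for IMG_2057.png; the 200 x 200 x 3 crop size is an assumption
mse, rms = ssd_to_rms(633363746.0, 200, 200)
print(f"Mean squared difference per value: {mse:.1f}")
print(f"RMS difference per value: {rms:.1f}")
```

Under those assumptions the RMS difference is roughly 73 intensity levels on a 0 to 255 scale, which suggests that even the best SSD match is not a close one.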

#### 2. Best Match Based on Normalized Correlation:
- **File**: IMG_2055.png
- **Correlation Score**: 0.1655

Normalized Correlation measures the strength and direction of the linear relationship between corresponding pixel values in two images. Values range from -1 (perfect negative correlation) to 1 (perfect positive correlation), with 0 indicating no correlation. A score of 0.1655 is a weak positive correlation. It is still the highest of the 10 frames tested, which makes IMG_2055.png the best match under this metric, but a value this low means the compared window in that frame resembles the ROI only slightly. The two metrics also pick different frames (IMG_2057.png for SSD, IMG_2055.png for Normalized Correlation). This can happen because SSD is sensitive to overall brightness differences, while Normalized Correlation subtracts the mean and divides by the standard deviation of each region before comparing them.
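Since the camera pans, a sliding-window search is one way to look for the ROI anywhere in a frame instead of at fixed coordinates. The following is a minimal NumPy sketch of that idea using SSD on a small synthetic image; `ssd_search` and the synthetic data are illustrative assumptions, not the notebook's method. On real frames, OpenCV's `cv2.matchTemplate` with `cv2.TM_SQDIFF` (or `cv2.TM_CCOEFF_NORMED` for a normalized correlation score) performs the same kind of search far more efficiently.

```python
import numpy as np

def ssd_search(image, template):
    """Slide `template` over `image` and return (row, col, ssd) of the lowest-SSD window."""
    image = image.astype(np.float64)
    template = template.astype(np.float64)
    th, tw = template.shape[:2]
    ih, iw = image.shape[:2]
    best = (0, 0, np.inf)
    for r in range(ih - th + 1):
        for c in range(iw - tw + 1):
            window = image[r:r + th, c:c + tw]
            ssd = np.sum((window - template) ** 2)
            if ssd < best[2]:
                best = (r, c, ssd)
    return best

# Synthetic stand-in for two video frames: frame_b is frame_a shifted 7 columns to the right
rng = np.random.default_rng(0)
frame_a = rng.integers(0, 256, size=(40, 60), dtype=np.uint8)
template = frame_a[10:20, 25:35]             # ROI cropped from the first frame
frame_b = np.roll(frame_a, shift=7, axis=1)  # simulated horizontal pan

row, col, score = ssd_search(frame_b, template)
print(f"Best window at row={row}, col={col}, SSD={score:.0f}")
```

The brute-force double loop is only practical for small images; it is written this way to make the search explicit.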



### Question 3

3. Fix a marker on a wall or a flat vertical surface. From a distance D, keeping the camera
stationed static (not handheld and mounted on a tripod or placed on a flat surface), capture an
image such that the marker is registered. Then translate the camera by T units along the axis
parallel to the ground (horizontal) and then capture another image, with the marker being
registered. Compute D using disparity based depth estimation in stereo-vision theory. (Note: you
can pick any value for D and T. Keep in mind that T cannot be large as the marker may get out
of view. Of course this depends on D)

In [None]:
import cv2
import numpy as np

# Camera parameters 
focal_length_mm = 5.1  
sensor_width_mm = 7.0  
T = 0.15  # Horizontal translation in meters

image1 = cv2.imread('Q_3_1.jpg', cv2.IMREAD_GRAYSCALE)
image2 = cv2.imread('Q_3_2.jpg', cv2.IMREAD_GRAYSCALE)

# cv2.imread returns None on failure, so check before any further processing
if image1 is None or image2 is None:
    raise Exception("Images could not be loaded. Check the file paths.")

# Resize both images to the same working resolution
image1 = cv2.resize(image1, (1024, 768))
image2 = cv2.resize(image2, (1024, 768))

image_width_pixels = image1.shape[1]  # Image width in pixels

# Calculate the focal length in pixels
focal_length_pixels = (focal_length_mm / sensor_width_mm) * image_width_pixels

# Initialize SIFT detector
sift = cv2.SIFT_create()

# Detect keypoints and compute descriptors
kp1, des1 = sift.detectAndCompute(image1, None)
kp2, des2 = sift.detectAndCompute(image2, None)

# Create BFMatcher object
bf = cv2.BFMatcher(cv2.NORM_L2, crossCheck=True)

# Match descriptors
matches = bf.match(des1, des2)

# Sort them in the order of their distance
matches = sorted(matches, key=lambda x: x.distance)

# Collect all valid disparities
disparities = []
for match in matches:
    pt1 = kp1[match.queryIdx].pt
    pt2 = kp2[match.trainIdx].pt
    disparity = abs(pt1[0] - pt2[0])
    if disparity > 0:
        disparities.append(disparity)

# Calculate distance using the median disparity
if disparities:
    median_disparity = np.median(disparities)
    D = (focal_length_pixels * T) / median_disparity
    print(f"Estimated distance to the marker: {D:.2f} meters")
else:
    print("No valid matches were found or disparity was zero for all matches.")

# Display the first image with keypoints highlighted
img_keypoints = cv2.drawKeypoints(image1, [kp1[m.queryIdx] for m in matches], None, color=(255,0,0))
cv2.imshow("Key Points", img_keypoints)
cv2.waitKey(0)
cv2.destroyAllWindows()


Estimated distance to the marker: 0.39 meters
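
The estimate above comes from the stereo relation D = f · T / d, where f is the focal length in pixels, T is the baseline, and d is the disparity in pixels. This relation assumes a purely horizontal translation with parallel optical axes. The sketch below restates the calculation with the camera parameters from the cell above and back-computes the disparity implied by the rounded 0.39 m result; the helper names are hypothetical.

```python
def focal_length_px(focal_mm, sensor_width_mm, image_width_px):
    """Convert a focal length in millimetres to pixels for a given image width."""
    return focal_mm / sensor_width_mm * image_width_px

def depth_from_disparity(f_px, baseline_m, disparity_px):
    """Depth of a point seen by two parallel cameras separated by `baseline_m`."""
    if disparity_px <= 0:
        raise ValueError("disparity must be positive")
    return f_px * baseline_m / disparity_px

# Parameters taken from the notebook cell: 5.1 mm lens, 7.0 mm sensor, 1024 px width, T = 0.15 m
f_px = focal_length_px(5.1, 7.0, 1024)

# Disparity implied by the printed estimate (0.39 m is rounded, so this is approximate)
implied_disparity = f_px * 0.15 / 0.39
print(f"Focal length: {f_px:.1f} px, implied median disparity: {implied_disparity:.0f} px")
```

Because D is inversely proportional to d, an error in the median disparity translates into a proportionally similar error in the estimated distance.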


### Question 4

4. For the video (problem 1) you have taken, plot the optical flow vectors on each frame using
MATLAB’s optical flow codes. (i) treating every previous frame as a reference frame (ii) treating
every 11th frame as a reference frame (iii) treating every 31st frame as a reference frame.

Done in MATLAB.

### Question 5

5. Run the feature-based matching object detection on the images from problem (1).
MATLAB (not mandatory for this problem) Tutorial for feature-based matching object
detection is available here:
https://www.mathworks.com/help/vision/ug/object-detection-in-a-cluttered-scene-using-point-feature-matching.html

In [2]:
import cv2
import numpy as np

# Load the images: the object to detect and the scene to search
object_image = cv2.imread('Westin Building.jpg', cv2.IMREAD_GRAYSCALE)  # Load object image
scene_image = cv2.imread('Q5_Scene_1.png', cv2.IMREAD_GRAYSCALE)  # Load scene image

if object_image is None or scene_image is None:
    raise Exception("Could not load images")

# Initialize SIFT detector
sift = cv2.SIFT_create()

# Find keypoints and descriptors with SIFT
keypoints_obj, descriptors_obj = sift.detectAndCompute(object_image, None)
keypoints_scene, descriptors_scene = sift.detectAndCompute(scene_image, None)

# FLANN parameters and matcher setup
FLANN_INDEX_KDTREE = 1
index_params = dict(algorithm=FLANN_INDEX_KDTREE, trees=5)
search_params = dict(checks=50)  # or pass an empty dictionary
flann = cv2.FlannBasedMatcher(index_params, search_params)

# Matching descriptor using KNN algorithm
matches = flann.knnMatch(descriptors_obj, descriptors_scene, k=2)

# Store all good matches as per Lowe's ratio test
good_matches = []
for m, n in matches:
    if m.distance < 0.75 * n.distance:
        good_matches.append(m)

# Draw top matches
img_matches = cv2.drawMatches(object_image, keypoints_obj, scene_image, keypoints_scene, good_matches, None, flags=cv2.DrawMatchesFlags_NOT_DRAW_SINGLE_POINTS)

# Save the matches image
cv2.imwrite('Q_5_Good_Matches.jpg', img_matches)

# Compute Homography if enough matches are found (at least 4 point pairs are required)
MIN_MATCH_COUNT = 4
if len(good_matches) >= MIN_MATCH_COUNT:
    src_pts = np.float32([keypoints_obj[m.queryIdx].pt for m in good_matches]).reshape(-1,1,2)
    dst_pts = np.float32([keypoints_scene[m.trainIdx].pt for m in good_matches]).reshape(-1,1,2)

    H, mask = cv2.findHomography(src_pts, dst_pts, cv2.RANSAC, 5.0)
    if H is None:  # findHomography returns None when no valid model is found
        raise Exception("Homography could not be estimated from the matches")

    # Get the dimensions of the object image and map the corners to the scene
    h, w = object_image.shape
    pts = np.float32([[0, 0], [0, h-1], [w-1, h-1], [w-1, 0]]).reshape(-1,1,2)
    dst = cv2.perspectiveTransform(pts, H)

    # Draw bounding box in the scene image
    scene_image_polylines = cv2.polylines(scene_image, [np.int32(dst)], True, 255, 3, cv2.LINE_AA)

    # Save the detected object image
    cv2.imwrite('Q_5_Detected_Object.jpg', scene_image_polylines)
else:
    print("Not enough matches are found - {}/{}".format(len(good_matches), MIN_MATCH_COUNT))
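
The ratio test used above keeps a match only when its best neighbour is clearly closer than its second-best neighbour. The sketch below isolates that step and skips entries with fewer than two neighbours, which `knnMatch` can return in some cases. `Match` is a stand-in for `cv2.DMatch` so the example runs without images, and `ratio_test` is a hypothetical helper.

```python
from collections import namedtuple

# Stand-in for cv2.DMatch with the three fields the notebook uses
Match = namedtuple("Match", ["queryIdx", "trainIdx", "distance"])

def ratio_test(knn_matches, ratio=0.75):
    """Keep the best match of each pair when it beats the second best by `ratio`."""
    good = []
    for pair in knn_matches:
        if len(pair) < 2:  # not enough neighbours to compare, so skip
            continue
        best, second = pair[0], pair[1]
        if best.distance < ratio * second.distance:
            good.append(best)
    return good

demo = [
    (Match(0, 5, 10.0), Match(0, 9, 40.0)),  # distinctive: 10 < 0.75 * 40
    (Match(1, 2, 30.0), Match(1, 7, 32.0)),  # ambiguous: 30 >= 0.75 * 32
    (Match(2, 4, 12.0),),                    # only one neighbour returned
]
good = ratio_test(demo)
print([m.queryIdx for m in good])
```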


### Question 6

6. Refer to the Bag of Features example MATLAB source code provided in the classroom’s
classwork page. In your homework, pick an object category that would be commonly seen in any household (e.g. cutlery) and pick 5 object types (e.g. for cutlery pick spoon, fork, butter knife,
cutting knife, ladle). Present your performance evaluation.

Attempted in MATLAB.

### Question 7

7. Repeat the image capture experiment from problem (3), however, now also rotate (along the
ground plane) the camera 2 (right camera) towards camera 1 position, after translation by T.
Make sure the marker is within view. Note down the rotation angle. Run the tutorial provided for
uncalibrated stereo rectification in here:
https://www.mathworks.com/help/vision/ug/uncalibrated-stereo-image-rectification.html
(MATLAB is mandatory for this exercise). Exercise this tutorial for the image pairs you have
captured. You can make assumptions as necessary, however, justify them in your
answers/description. (Note: you can print out protractors from any online source and place your
cameras on that when running experiments:
http://www.ossmann.com/protractor/conventional-protractor.pdf).

Attempted in MATLAB.

### Question 8

8. Implement a real-time object tracker (two versions) that (i) uses a marker (e.g. QR code or
April tags), and (ii) does not use any marker and only relies on the object.

In [20]:
pip install pupil-apriltags


Collecting pupil-apriltags
  Downloading pupil_apriltags-1.0.4.post10-cp38-cp38-win_amd64.whl (2.1 MB)
Installing collected packages: pupil-apriltags
Successfully installed pupil-apriltags-1.0.4.post10
Note: you may need to restart the kernel to use updated packages.


#### Marker-Based Tracking Using QR code

In [3]:
import cv2

# Load the video captured with your phone
cap = cv2.VideoCapture('Q_8_Part_a.MOV')  # Replace with your video file path

# Create a QRCode detector object
qr_detector = cv2.QRCodeDetector()

while cap.isOpened():
    ret, frame = cap.read()
    if not ret:
        break

    # Detect QR Code in the frame
    data, bbox, _ = qr_detector.detectAndDecode(frame)

    # Check if there is a QR Code in the image
    if bbox is not None and len(bbox) > 0:
        # Convert float coordinates to integers and reshape the bbox array
        bbox = bbox[0].astype(int)

        # Loop to draw the bounding box
        n = len(bbox)
        for j in range(n):
            pt1 = tuple(bbox[j % n])
            pt2 = tuple(bbox[(j + 1) % n])
            cv2.line(frame, pt1, pt2, (255, 0, 0), 3)

        # Position for the decoded text
        cv2.putText(frame, data, (bbox[0][0], bbox[0][1] - 10), cv2.FONT_HERSHEY_SIMPLEX, 0.5, (0, 255, 0), 2)
        print("QR Code data:", data)
    else:
        print("QR Code not detected")

    cv2.imshow("Frame", frame)
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break

cap.release()
cv2.destroyAllWindows()


QR Code not detected
QR Code not detected
...
QR Code data: 
QR Code not detected
...
QR Code data: http://searchmobilecomputing.techtarget.com/definition/2D-barcode
QR Code data: http://searchmobilecomputing.techtarget.com/definition/2D-barcode
...

#### Markerless Tracking Using CSRT Tracker

In [1]:
import cv2

# Load the video captured with your phone
cap = cv2.VideoCapture('Q_8_Part_b.MOV')  # Replace with your video file path

# Read the first frame of the video
success, frame = cap.read()
if not success:
    cap.release()
    # Raise instead of calling exit(), which would shut down the notebook kernel
    raise Exception("Failed to read video")

# Let user select the bounding box
bbox = cv2.selectROI('Tracking', frame, True)

# Initialize CSRT Tracker
tracker = cv2.TrackerCSRT_create()
tracker.init(frame, bbox)

while cap.isOpened():
    success, frame = cap.read()
    if not success:
        break

    # Update tracker and get new bounding box
    success, bbox = tracker.update(frame)

    # Draw bounding box
    if success:
        p1 = (int(bbox[0]), int(bbox[1]))
        p2 = (int(bbox[0] + bbox[2]), int(bbox[1] + bbox[3]))
        cv2.rectangle(frame, p1, p2, (255,0,0), 2, 1)
        # Print bounding box coordinates to console
        print(f"Tracking object at: x={bbox[0]}, y={bbox[1]}, width={bbox[2]}, height={bbox[3]}")
    else:
        cv2.putText(frame, "Tracking failure detected", (100, 80), cv2.FONT_HERSHEY_SIMPLEX, 0.75, (0,0,255), 2)
        print("Tracking failure detected")

    cv2.imshow("Tracking", frame)
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break

cap.release()
cv2.destroyAllWindows()


Tracking object at: x=281, y=361, width=772, height=369
Tracking object at: x=266, y=361, width=787, height=376
Tracking object at: x=253, y=358, width=803, height=384
Tracking object at: x=259, y=374, width=772, height=369
Tracking object at: x=245, y=372, width=787, height=376
Tracking object at: x=239, y=375, width=787, height=376
Tracking object at: x=232, y=380, width=787, height=376
Tracking object at: x=219, y=381, width=803, height=384
Tracking object at: x=211, y=384, width=803, height=384
Tracking object at: x=203, y=388, width=803, height=384
Tracking object at: x=201, y=395, width=787, height=376
Tracking object at: x=195, y=397, width=787, height=376
Tracking object at: x=187, y=400, width=787, height=376
Tracking object at: x=180, y=403, width=787, height=376
Tracking object at: x=166, y=402, width=803, height=384
Tracking object at: x=159, y=403, width=803, height=384
Tracking object at: x=150, y=406, width=803, height=384
Tracking object at: x=137, y=406, width=819, hei

Tracking object at: x=-88, y=-258, width=1081, height=517
Tracking object at: x=-88, y=-253, width=1060, height=507
Tracking object at: x=-80, y=-253, width=1060, height=507
Tracking object at: x=-81, y=-240, width=1039, height=497
Tracking object at: x=-82, y=-237, width=1060, height=507
Tracking object at: x=-81, y=-239, width=1060, height=507
Tracking object at: x=-85, y=-230, width=1060, height=507
Tracking object at: x=-95, y=-234, width=1081, height=517
Tracking object at: x=-96, y=-231, width=1081, height=517
Tracking object at: x=-108, y=-239, width=1103, height=527
Tracking object at: x=-108, y=-227, width=1103, height=527
Tracking object at: x=-119, y=-230, width=1125, height=538
Tracking object at: x=-131, y=-213, width=1147, height=548
Tracking object at: x=-139, y=-206, width=1170, height=559
Tracking object at: x=-148, y=-211, width=1193, height=570
Tracking object at: x=-158, y=-223, width=1217, height=582
Tracking object at: x=-179, y=-221, width=1292, height=617
Tracki

Tracking object at: x=247, y=689, width=597, height=285
Tracking object at: x=248, y=686, width=597, height=285
Tracking object at: x=247, y=685, width=597, height=285
Tracking object at: x=248, y=683, width=597, height=285
Tracking object at: x=248, y=681, width=597, height=285
Tracking object at: x=249, y=679, width=597, height=285
Tracking object at: x=249, y=676, width=597, height=285
Tracking object at: x=249, y=674, width=597, height=285
Tracking object at: x=250, y=670, width=597, height=285
Tracking object at: x=252, y=668, width=597, height=285
Tracking object at: x=254, y=663, width=597, height=285
Tracking object at: x=255, y=661, width=597, height=285
Tracking object at: x=257, y=657, width=597, height=285
Tracking object at: x=258, y=654, width=597, height=285
Tracking object at: x=259, y=652, width=597, height=285
Tracking object at: x=261, y=648, width=597, height=285
Tracking object at: x=262, y=645, width=597, height=285
Tracking object at: x=264, y=643, width=597, hei

Tracking object at: x=191, y=469, width=742, height=355
Tracking object at: x=192, y=477, width=742, height=355
Tracking object at: x=193, y=487, width=742, height=355
Tracking object at: x=193, y=496, width=742, height=355
Tracking object at: x=200, y=510, width=727, height=348
Tracking object at: x=200, y=520, width=727, height=348
Tracking object at: x=200, y=529, width=727, height=348
Tracking object at: x=206, y=542, width=713, height=341
Tracking object at: x=206, y=553, width=713, height=341
Tracking object at: x=204, y=562, width=713, height=341
Tracking object at: x=203, y=571, width=713, height=341
Tracking object at: x=210, y=583, width=699, height=334
Tracking object at: x=209, y=592, width=699, height=334
Tracking object at: x=209, y=599, width=699, height=334
Tracking object at: x=208, y=606, width=699, height=334
Tracking object at: x=207, y=614, width=699, height=334
Tracking object at: x=206, y=621, width=699, height=334
Tracking object at: x=211, y=629, width=686, hei
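
Some of the boxes in the log above have negative coordinates, meaning the tracker's estimate extends past the frame border. If the box is later used to crop the frame with array slicing, negative indices would wrap around, so clipping it to the frame first is one option. The sketch below shows this; `clip_bbox` is a hypothetical helper, and the 1080 × 1920 frame size is an assumed value because the video's resolution is not stated.

```python
def clip_bbox(bbox, frame_width, frame_height):
    """Intersect an (x, y, w, h) box with the frame; return None if nothing is visible."""
    x, y, w, h = bbox
    x1 = max(0, int(x))
    y1 = max(0, int(y))
    x2 = min(frame_width, int(x + w))
    y2 = min(frame_height, int(y + h))
    if x2 <= x1 or y2 <= y1:
        return None
    return (x1, y1, x2 - x1, y2 - y1)

# Box taken from the tracker log; the frame size is an assumption
visible = clip_bbox((-88, -258, 1081, 517), frame_width=1080, frame_height=1920)
print(visible)
```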