# Assignment 4

GitHub repo for this assignment: https://github.com/brentonjackson/csc-4980/tree/master/Assignment4

I'll be using Python for the assignments in this class, as opposed to MATLAB.

## Part I

Implement an application using the stereo camera where it will recognize, track and
estimate dimensions of an object within 3m distance and inside field-of-view to the
camera. 

Please see the application code below:

```python
#!/usr/bin/env python3

# Script to recognize, track, and estimate the 
# real world dimensions of an object within 
# 3m of distance from the camera
# Author: Brenton Jackson
# Date: 11/28/22


from math import ceil
from pathlib import Path
import cv2
import depthai as dai
import numpy as np
import time
import argparse

labelMap = ["background", "aeroplane", "bicycle", "bird", "boat", "bottle", "bus", "car", "cat", "chair", "cow",
            "diningtable", "dog", "horse", "motorbike", "person", "pottedplant", "sheep", "sofa", "train", "tvmonitor"]

nnPathDefault = str((Path(__file__).parent / Path('../depthai-python/examples/models/mobilenet-ssd_openvino_2021.4_5shave.blob')).resolve().absolute())
parser = argparse.ArgumentParser()
parser.add_argument('nnPath', nargs='?', help="Path to mobilenet detection network blob", default=nnPathDefault)
parser.add_argument('-ff', '--full_frame', action="store_true", help="Perform tracking on full RGB frame", default=False)

args = parser.parse_args()

fullFrameTracking = args.full_frame

# Create pipeline
pipeline = dai.Pipeline()

# Define sources and outputs
camRgb = pipeline.create(dai.node.ColorCamera)
spatialDetectionNetwork = pipeline.create(dai.node.MobileNetSpatialDetectionNetwork)
monoLeft = pipeline.create(dai.node.MonoCamera)
monoRight = pipeline.create(dai.node.MonoCamera)
stereo = pipeline.create(dai.node.StereoDepth)
objectTracker = pipeline.create(dai.node.ObjectTracker)

xoutRgb = pipeline.create(dai.node.XLinkOut)
trackerOut = pipeline.create(dai.node.XLinkOut)

xoutRgb.setStreamName("preview")
trackerOut.setStreamName("tracklets")

# Properties
camRgb.setPreviewSize(300, 300)
camRgb.setResolution(dai.ColorCameraProperties.SensorResolution.THE_1080_P)
camRgb.setInterleaved(False)
camRgb.setColorOrder(dai.ColorCameraProperties.ColorOrder.BGR)

monoLeft.setResolution(dai.MonoCameraProperties.SensorResolution.THE_400_P)
monoLeft.setBoardSocket(dai.CameraBoardSocket.LEFT)
monoRight.setResolution(dai.MonoCameraProperties.SensorResolution.THE_400_P)
monoRight.setBoardSocket(dai.CameraBoardSocket.RIGHT)

# setting node configs
stereo.setDefaultProfilePreset(dai.node.StereoDepth.PresetMode.HIGH_DENSITY)
# Align depth map to the perspective of RGB camera, on which inference is done
stereo.setDepthAlign(dai.CameraBoardSocket.RGB)
stereo.setOutputSize(monoLeft.getResolutionWidth(), monoLeft.getResolutionHeight())

spatialDetectionNetwork.setBlobPath(args.nnPath)
spatialDetectionNetwork.setConfidenceThreshold(0.5)
spatialDetectionNetwork.input.setBlocking(False)
spatialDetectionNetwork.setBoundingBoxScaleFactor(0.5)
spatialDetectionNetwork.setDepthLowerThreshold(100)
spatialDetectionNetwork.setDepthUpperThreshold(5000)

objectTracker.setDetectionLabelsToTrack([5])  # track only bottles (labelMap index 5)
# possible tracking types: ZERO_TERM_COLOR_HISTOGRAM, ZERO_TERM_IMAGELESS, SHORT_TERM_IMAGELESS, SHORT_TERM_KCF
objectTracker.setTrackerType(dai.TrackerType.ZERO_TERM_COLOR_HISTOGRAM)
# take the smallest ID when new object is tracked, possible options: SMALLEST_ID, UNIQUE_ID
objectTracker.setTrackerIdAssignmentPolicy(dai.TrackerIdAssignmentPolicy.SMALLEST_ID)

# Linking
monoLeft.out.link(stereo.left)
monoRight.out.link(stereo.right)

camRgb.preview.link(spatialDetectionNetwork.input)
objectTracker.passthroughTrackerFrame.link(xoutRgb.input)
objectTracker.out.link(trackerOut.input)

if fullFrameTracking:
    camRgb.setPreviewKeepAspectRatio(False)
    camRgb.video.link(objectTracker.inputTrackerFrame)
    objectTracker.inputTrackerFrame.setBlocking(False)
    # do not block the pipeline if it's too slow on full frame
    objectTracker.inputTrackerFrame.setQueueSize(2)
else:
    spatialDetectionNetwork.passthrough.link(objectTracker.inputTrackerFrame)

spatialDetectionNetwork.passthrough.link(objectTracker.inputDetectionFrame)
spatialDetectionNetwork.out.link(objectTracker.inputDetections)
stereo.depth.link(spatialDetectionNetwork.inputDepth)

# Connect to device and start pipeline
with dai.Device(pipeline) as device:

    preview = device.getOutputQueue("preview", 4, False)
    tracklets = device.getOutputQueue("tracklets", 4, False)

    startTime = time.monotonic()
    counter = 0
    fps = 0
    color = (255, 255, 255)

    while(True):
        imgFrame = preview.get()
        track = tracklets.get()

        counter+=1
        current_time = time.monotonic()
        if (current_time - startTime) > 1 :
            fps = counter / (current_time - startTime)
            counter = 0
            startTime = current_time

        frame = imgFrame.getCvFrame()
        trackletsData = track.tracklets
        for t in trackletsData:
            if int(t.spatialCoordinates.z) > 3000:
                continue
            roi = t.roi.denormalize(frame.shape[1], frame.shape[0])
            x1 = int(roi.topLeft().x)
            y1 = int(roi.topLeft().y)
            x2 = int(roi.bottomRight().x)
            y2 = int(roi.bottomRight().y)

            width = abs(x2 - x1)
            height = abs(y2 - y1)
            focal_len = 457

            try:
                label = labelMap[t.label]
            except IndexError:
                label = t.label

            cv2.putText(frame, str(label), (x1 + 10, y1 + 20), cv2.FONT_HERSHEY_TRIPLEX, 0.5, 255)
            cv2.putText(frame, f"ID: {[t.id]}", (x1 + 10, y1 + 35), cv2.FONT_HERSHEY_TRIPLEX, 0.5, 255)
            cv2.putText(frame, t.status.name, (x1 + 10, y1 + 50), cv2.FONT_HERSHEY_TRIPLEX, 0.5, 255)
            cv2.rectangle(frame, (x1, y1), (x2, y2), color, 1)  # 1 px outline

            cv2.putText(frame, f"X: {int(t.spatialCoordinates.x)} mm", (x1 + 10, y1 + 65), cv2.FONT_HERSHEY_TRIPLEX, 0.5, 255)
            cv2.putText(frame, f"Y: {int(t.spatialCoordinates.y)} mm", (x1 + 10, y1 + 80), cv2.FONT_HERSHEY_TRIPLEX, 0.5, 255)
            cv2.putText(frame, f"Z: {int(t.spatialCoordinates.z)} mm", (x1 + 10, y1 + 95), cv2.FONT_HERSHEY_TRIPLEX, 0.5, 255)

            cv2.putText(frame, f"Width: {ceil(width * int(t.spatialCoordinates.z) / focal_len) } mm", (x2 - 100, y1 + 110), cv2.FONT_HERSHEY_TRIPLEX, 0.5, 255)
            cv2.putText(frame, f"Height: {ceil(height * int(t.spatialCoordinates.z) / focal_len)} mm", (x2 - 100, y1 + 125), cv2.FONT_HERSHEY_TRIPLEX, 0.5, 255)

        cv2.putText(frame, "NN fps: {:.2f}".format(fps), (2, frame.shape[0] - 4), cv2.FONT_HERSHEY_TRIPLEX, 0.4, color)

        cv2.imshow("tracker", frame)

        if cv2.waitKey(1) == ord('q'):
            break


```
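The width and height overlays in the script come from the pinhole camera relation: real size ≈ pixel size × depth / focal length (in pixels). Below is a minimal sketch of that calculation pulled out into a function. The name `estimate_size_mm` is hypothetical, and the default of 457 is simply the `focal_len` constant hard-coded in the script, not a calibrated value.

```python
from math import ceil


def estimate_size_mm(pixel_extent, depth_mm, focal_len_px=457):
    """Approximate real-world extent (mm) of an object that spans
    `pixel_extent` pixels at a distance of `depth_mm` from the camera."""
    if focal_len_px <= 0:
        raise ValueError("focal length must be positive")
    # Similar triangles: real_size / depth == pixel_size / focal_length
    return ceil(pixel_extent * depth_mm / focal_len_px)


# A bounding box 100 px wide at 1000 mm depth
print(estimate_size_mm(100, 1000))
```

With the script's constant, a 100 px wide box at 1000 mm works out to 219 mm. The estimate is only as good as the focal length, which should match the resolution of the frame the bounding box is measured in.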

This application tracks and measures the dimensions of bottles specifically.

However, any of the objects in the `labelMap` list at the top of the script can be tracked by changing the index passed to `setDetectionLabelsToTrack`.
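As a small sketch of how the label indices could be looked up by name instead of hard-coded, the helper below (the name `label_ids` is hypothetical) maps class names to their positions in `labelMap`:

```python
labelMap = ["background", "aeroplane", "bicycle", "bird", "boat", "bottle", "bus", "car", "cat", "chair", "cow",
            "diningtable", "dog", "horse", "motorbike", "person", "pottedplant", "sheep", "sofa", "train", "tvmonitor"]


def label_ids(names, label_map=labelMap):
    """Return the label_map index of each class name.
    Raises ValueError if a name is not in the list."""
    return [label_map.index(name) for name in names]


print(label_ids(["bottle", "person"]))  # [5, 15]
```

The resulting list could then be passed to `objectTracker.setDetectionLabelsToTrack(...)` in place of the literal `[5]`.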

*insert demo gif here

## Part II

Design an eco-friendly (try your best: as reusable as possible) “smart” business/visiting
card (actual hardware) and an associated computer vision application using the camera
provided. 

You can leverage depth information in your design.

Please see the application code below:

```python smart_card.py

#!/usr/bin/env python3

# Script to recognize and perform actions
# with smart card
# Author: Brenton Jackson
# Date: 11/28/22

# The smart card system is a way to get more
# insightful information about candidates in
# a short amount of time, while also not putting
# the burden of remembering every interaction
# on recruiters.
# It doesn't substitute face-to-face interaction,
# but it adds an additional human element to a
# candidate's resume.

from math import ceil
from pathlib import Path
import cv2
import depthai as dai
import time
import argparse
import subprocess

labelMap = ["background", "aeroplane", "bicycle", "bird", "boat", "bottle", "bus", "car", "cat", "chair", "cow",
            "diningtable", "dog", "horse", "motorbike", "person", "pottedplant", "sheep", "sofa", "train", "tvmonitor"]

nnPathDefault = str((Path(__file__).parent / Path('../../depthai-python/examples/models/mobilenet-ssd_openvino_2021.4_5shave.blob')).resolve().absolute())
parser = argparse.ArgumentParser()
parser.add_argument('nnPath', nargs='?', help="Path to mobilenet detection network blob", default=nnPathDefault)
args = parser.parse_args()


# Create pipeline
pipeline = dai.Pipeline()

# Define sources and outputs
camRgb = pipeline.create(dai.node.ColorCamera)
spatialDetectionNetwork = pipeline.create(dai.node.MobileNetSpatialDetectionNetwork)
monoLeft = pipeline.create(dai.node.MonoCamera)
monoRight = pipeline.create(dai.node.MonoCamera)
stereo = pipeline.create(dai.node.StereoDepth)
objectTracker = pipeline.create(dai.node.ObjectTracker)

xoutRgb = pipeline.create(dai.node.XLinkOut)
trackerOut = pipeline.create(dai.node.XLinkOut)

xoutRgb.setStreamName("preview")
trackerOut.setStreamName("tracklets")

# Properties
camRgb.setPreviewSize(300, 300)
camRgb.setResolution(dai.ColorCameraProperties.SensorResolution.THE_1080_P)
camRgb.setInterleaved(False)
camRgb.setColorOrder(dai.ColorCameraProperties.ColorOrder.BGR)

monoLeft.setResolution(dai.MonoCameraProperties.SensorResolution.THE_400_P)
monoLeft.setBoardSocket(dai.CameraBoardSocket.LEFT)
monoRight.setResolution(dai.MonoCameraProperties.SensorResolution.THE_400_P)
monoRight.setBoardSocket(dai.CameraBoardSocket.RIGHT)

# setting node configs
stereo.setDefaultProfilePreset(dai.node.StereoDepth.PresetMode.HIGH_DENSITY)
# Align depth map to the perspective of RGB camera, on which inference is done
stereo.setDepthAlign(dai.CameraBoardSocket.RGB)
stereo.setOutputSize(monoLeft.getResolutionWidth(), monoLeft.getResolutionHeight())

spatialDetectionNetwork.setBlobPath(args.nnPath)
spatialDetectionNetwork.setConfidenceThreshold(0.5)
spatialDetectionNetwork.input.setBlocking(False)
spatialDetectionNetwork.setBoundingBoxScaleFactor(0.5)
spatialDetectionNetwork.setDepthLowerThreshold(100)
spatialDetectionNetwork.setDepthUpperThreshold(5000)


objectTracker.setDetectionLabelsToTrack([5, 15])  # track bottle or person
# possible tracking types: ZERO_TERM_COLOR_HISTOGRAM, ZERO_TERM_IMAGELESS, SHORT_TERM_IMAGELESS, SHORT_TERM_KCF
objectTracker.setTrackerType(dai.TrackerType.ZERO_TERM_COLOR_HISTOGRAM)
# take the smallest ID when new object is tracked, possible options: SMALLEST_ID, UNIQUE_ID
objectTracker.setTrackerIdAssignmentPolicy(dai.TrackerIdAssignmentPolicy.SMALLEST_ID)

# Linking
monoLeft.out.link(stereo.left)
monoRight.out.link(stereo.right)

camRgb.preview.link(spatialDetectionNetwork.input)
objectTracker.passthroughTrackerFrame.link(xoutRgb.input)
objectTracker.out.link(trackerOut.input)


spatialDetectionNetwork.passthrough.link(objectTracker.inputTrackerFrame)
spatialDetectionNetwork.passthrough.link(objectTracker.inputDetectionFrame)
spatialDetectionNetwork.out.link(objectTracker.inputDetections)
stereo.depth.link(spatialDetectionNetwork.inputDepth)

# Connect to device and start pipeline
with dai.Device(pipeline) as device:

    preview = device.getOutputQueue("preview", 4, False)
    tracklets = device.getOutputQueue("tracklets", 4, False)

    startTime = time.monotonic()
    counter = 0
    fps = 0
    color = (255, 255, 255)

    frameNum = 0
    while(True):
        imgFrame = preview.get()
        track = tracklets.get()

        counter+=1
        current_time = time.monotonic()
        if (current_time - startTime) > 1 :
            fps = counter / (current_time - startTime)
            counter = 0
            startTime = current_time

        frame = imgFrame.getCvFrame()
        # make copy of frame to preserve original value without
        # object detection information overlayed
        frame_orig = frame.copy() 
        trackletsData = track.tracklets
        for t in trackletsData:            
            roi = t.roi.denormalize(frame.shape[1], frame.shape[0])
            x1 = int(roi.topLeft().x)
            y1 = int(roi.topLeft().y)
            x2 = int(roi.bottomRight().x)
            y2 = int(roi.bottomRight().y)

            width = abs(x2 - x1)
            height = abs(y2 - y1)
            focal_len = 457

            # if the card is detected and is close enough
            if int(t.spatialCoordinates.z) < 3000 and labelMap[t.label] == "bottle":
                frameNum += 1
                print(frameNum)
                if frameNum > 30:
                    # delay to let user stabilize the camera. upper limit on fps is 30,
                    # so min delay would be 1 second after first detecting object
                    # expected delay is ~2 seconds, since avg fps is ~15-25
        
                    filename = "capture{}".format(t.id)
                    # create video, using the -a flag to select the mp4v codec (for macOS)
                    # the device must be closed first so record_video.py can connect to it
                    device.close()
                    cv2.imwrite(filename + ".png", frame_orig)
                    cv2.destroyWindow("tracker") # close tracking window after capturing last frame
                    print('recording starting in 5 seconds...')
                    time.sleep(2)
                    # has a couple additional second delay to start subprocess and run it
                    # check=False so a non-zero exit code is reported below instead of raising
                    ret = subprocess.run(['python3', 'record_video.py', '-n', filename, '-a'], check=False)
                    if ret.returncode != 0:
                        # error
                        print(ret.returncode)
                        print('video was not captured for {}'.format(filename))
                    else:
                        print('saved video successfully')
                    # the device is already closed, so exit in either case
                    print('quitting...')
                    quit()
            
            try:
                label = labelMap[t.label]
            except IndexError:
                label = t.label

            cv2.putText(frame, str(label), (x1 + 10, y1 + 20), cv2.FONT_HERSHEY_TRIPLEX, 0.5, 255)
            cv2.putText(frame, f"ID: {[t.id]}", (x1 + 10, y1 + 35), cv2.FONT_HERSHEY_TRIPLEX, 0.5, 255)
            cv2.putText(frame, t.status.name, (x1 + 10, y1 + 50), cv2.FONT_HERSHEY_TRIPLEX, 0.5, 255)
            cv2.rectangle(frame, (x1, y1), (x2, y2), color, 1)  # 1 px outline

            cv2.putText(frame, f"X: {int(t.spatialCoordinates.x)} mm", (x1 + 10, y1 + 65), cv2.FONT_HERSHEY_TRIPLEX, 0.5, 255)
            cv2.putText(frame, f"Y: {int(t.spatialCoordinates.y)} mm", (x1 + 10, y1 + 80), cv2.FONT_HERSHEY_TRIPLEX, 0.5, 255)
            cv2.putText(frame, f"Z: {int(t.spatialCoordinates.z)} mm", (x1 + 10, y1 + 95), cv2.FONT_HERSHEY_TRIPLEX, 0.5, 255)

            cv2.putText(frame, f"Width: {ceil(width * int(t.spatialCoordinates.z) / focal_len) } mm", (x2 - 100, y1 + 110), cv2.FONT_HERSHEY_TRIPLEX, 0.5, 255)
            cv2.putText(frame, f"Height: {ceil(height * int(t.spatialCoordinates.z) / focal_len)} mm", (x2 - 100, y1 + 125), cv2.FONT_HERSHEY_TRIPLEX, 0.5, 255)

        cv2.putText(frame, "NN fps: {:.2f}".format(fps), (2, frame.shape[0] - 4), cv2.FONT_HERSHEY_TRIPLEX, 0.4, color)

        cv2.imshow("tracker", frame)

        if cv2.waitKey(1) == ord('q'):
            break


```

Here is the helper module to record video:

```python record_video.py
#!/usr/bin/env python3

# Script to record video for
# specified amount of time
# Author: Brenton Jackson
# Date: 12/1/22

import argparse
import cv2
import time
import depthai as dai
from RGBPipeline import RGBPipeline
import threading
import subprocess
import sounddevice as sd
import soundfile as sf
from scipy.io.wavfile import write
import os

# parse command line arguments
default_duration = 5
parser = argparse.ArgumentParser()
parser.add_argument('duration', nargs='?', help="Video length in seconds", default=default_duration)
parser.add_argument('-n', '--filename', nargs='?', help="Name of file to write to disk", default='test')
parser.add_argument('-a', '--apple', action="store_true", help="If on Apple device (mac), change video codec", default=False)
args = parser.parse_args()

# global variables
duration = int(args.duration)
recording = None
video_filename = ""
startTime = 0
elapsedTime = 0
frameCount = 0
expectedFps = 11.7 # may need to be fine-tuned for your computer, look at stdout for ***video fps***

def record_vid():
    global recording, video_filename, startTime, elapsedTime, frameCount, expectedFps
    rgb_pipeline = RGBPipeline()
    video_filename = args.filename + '.mp4'

    # Define the codec and create VideoWriter object
    fourcc = cv2.VideoWriter_fourcc('m', 'p', '4', 'v') if args.apple else cv2.VideoWriter_fourcc('M','J','P','G')
    out = cv2.VideoWriter(video_filename, fourcc, expectedFps, (1920, 1080))

    # Connect to device and start pipeline
    with dai.Device(rgb_pipeline.getPipeline()) as device:

        video = device.getOutputQueue(rgb_pipeline.getStreamName(), maxSize=1, blocking=False)
        startTime = time.monotonic()
        print('recording video frames')
        while recording == True or elapsedTime < duration:
            videoIn = video.get()
            vidFrame = videoIn.getCvFrame()
            frameCount += 1
            # Get BGR frame from NV12 encoded video frame to show with opencv
            # Visualizing the frame on slower hosts might have overhead
            
            # write the frame
            out.write(vidFrame)
            # cv2.imshow('frame', vidFrame)
            currTime = time.monotonic()
            elapsedTime = currTime - startTime
        print('video recording done')
        recording = False
        out.release()

def record_audio():
    global recording
    rate = 44100
    channels = 1
    audio_filename = args.filename + '.wav'
    # start stream, at the same time as video starts
    recording = True
    time.sleep(2) # delay to wait for video to start recording
    print('recording audio')
    myrecording = sd.rec(int(duration * rate), samplerate=rate, channels=channels)
    sd.wait()  # Wait until recording is finished
    recording = False
    print('audio recording done')
    write(audio_filename, rate, myrecording)  # Save as WAV file 
    
def start_AVrecording():
    # play 3 beeps
    data, fs = sf.read('beep.wav')
    for i in range(0, 2):
        sd.play(data, fs)
        sd.wait()
        time.sleep(0.9)
    sd.play(data, fs)
    # sd.wait()
    
    print('****  recording starting in 1s  ****')
    threading.Thread(target=record_vid).start()
    threading.Thread(target=record_audio).start()

def file_manager(filename=args.filename):
    # Processing of final files
    print('processing files')
    local_path = os.getcwd()
    # if os.path.exists(str(local_path) + "/" + filename + ".wav"):
    #     os.remove(str(local_path) + "/" + filename + ".wav")

    if os.path.exists(str(local_path) + "/" + filename + "2.mp4"):
        os.remove(str(local_path) + "/" + filename + "2.mp4")

    if os.path.exists(str(local_path) + "/" + filename + "_AV.mp4"):
        os.remove(str(local_path) + "/" + filename + "_AV.mp4")
    print('file processing complete')

def merge_files():
    # merge two files after they're done
    global video_filename, elapsedTime, frameCount, expectedFps
    recordedFps = frameCount/elapsedTime
    print('merging audio and video files')
    print("*********   video fps: " + str(recordedFps) + "   ***********")
    print("fps difference: " + str(abs(recordedFps - expectedFps)))
    if abs(recordedFps - expectedFps) >= 0.01:  # If the fps rate was higher/lower than expected, re-encode it to the expected
        print("Re-encoding")
        tempFilename = args.filename + "2.mp4"
        cmd = "ffmpeg -r " + str(recordedFps) + " -i " + video_filename + " -r " + str(expectedFps) + " -b:v 6000k -vcodec mpeg4 " + tempFilename
        subprocess.call(cmd, shell=True)

        print("Muxing")
        cmd = "ffmpeg -ac 1 -channel_layout stereo -i " + args.filename + ".wav -i " + tempFilename + " -b:v 6000k -vcodec mpeg4 " + args.filename + "_AV.mp4"
        retcode = subprocess.call(cmd, shell=True)

    else:
        print("Muxing")
        cmd = "ffmpeg -ac 1 -channel_layout stereo -i " + args.filename + ".wav -i " + video_filename + " -b:v 6000k -vcodec mpeg4 " + args.filename + "_AV.mp4"
        retcode = subprocess.call(cmd, shell=True)
    
    if not retcode:
        print("done merging the files")
    else:
        print("video and audio merge failed")


if __name__ == '__main__':
    # main control flow
    start_AVrecording()
    time.sleep(duration + 5) # 5s buffer to make sure everything is done
    file_manager()
    time.sleep(5) # time for os to handle files
    merge_files()

```

And here is the helper to the helper, RGBPipeline.py, which just saves some code by putting the DepthAI pipeline creation into its own class.

```python
import depthai as dai

class RGBPipeline:
    'Base class for DepthAI RGB camera pipeline'

    def __init__(self, streamName="video") -> None:
        self.streamName = streamName
        # Create pipeline
        self.pipeline = dai.Pipeline()

        # Define source and output
        camRgb = self.pipeline.create(dai.node.ColorCamera)
        xoutVideo = self.pipeline.create(dai.node.XLinkOut)
        xoutVideo.setStreamName(streamName)
        
        # Properties
        camRgb.setBoardSocket(dai.CameraBoardSocket.RGB)
        camRgb.setResolution(dai.ColorCameraProperties.SensorResolution.THE_1080_P)
        camRgb.setVideoSize(1920, 1080)

        xoutVideo.input.setBlocking(False)
        xoutVideo.input.setQueueSize(1)

        # Linking
        camRgb.video.link(xoutVideo.input)

    def getPipeline(self):
        return self.pipeline

    def getStreamName(self):
        return self.streamName
   
```

## Approach / Ideas

We approached this prompt from a number of angles.

One scenario that stood out was a career fair, seen from both the job seeker's and the employer's side. That yielded a few ideas:

On the employer end:
- Create an application that uses depth to trigger a snapshot of the business card, then lets the job seeker record a 20-30s elevator pitch. All of this would happen automatically, and the results would be saved either on the filesystem or in a database for use in a web app. The web app's main purpose would be to give recruiters an easy interface for organizing candidates.

That idea combined a few smaller ones:
- Using depth to save a picture of the card to a database when it gets close enough to the camera
- Using depth to trigger a prompt to record the voice of the candidate
- Using depth to trigger a prompt to record video of the candidate
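The depth trigger behind these ideas can be sketched as a small counter that fires only after the object has stayed in range for a number of frames, which mirrors the `frameNum > 30` delay in `smart_card.py`. The class name `DepthTrigger` is hypothetical. Two details are assumptions that differ from the script: the counter resets when the object leaves range, and a depth of 0 is treated as "no valid reading".

```python
class DepthTrigger:
    """Fires once an object has stayed within range for enough frames."""

    def __init__(self, max_depth_mm=3000, min_frames=30):
        self.max_depth_mm = max_depth_mm
        self.min_frames = min_frames
        self.frames_in_range = 0

    def update(self, depth_mm):
        """Feed one depth reading per frame; returns True when the trigger fires."""
        if 0 < depth_mm < self.max_depth_mm:
            self.frames_in_range += 1
        else:
            # Assumption: start over if the object leaves range or depth is invalid
            self.frames_in_range = 0
        return self.frames_in_range > self.min_frames


trigger = DepthTrigger(min_frames=3)
for z in [1200, 1150, 1100, 1080]:
    fired = trigger.update(z)
print(fired)  # True: the fourth in-range frame exceeds min_frames=3
```

In the tracking loop, `update` would be called with `t.spatialCoordinates.z` for the tracked label, and the snapshot and recording would start when it returns `True`.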

The intended setting for this system is a stationary stand beside or behind your booth, offered to candidates as an optional step.
A fixed setup guarantees enough power for all of the hardware, and a stationary camera is more reliable than a mobile one.

We also had ideas for a scenario in which the computer vision software runs on a cell phone, but decided to focus on the stationary booth setup instead.

## Implementation Details

To merge the audio and video signals, you need to install [ffmpeg](https://ffmpeg.org/) first.
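A quick way to confirm that ffmpeg is reachable before recording is to look for it on `PATH`. This is a minimal sketch using the standard library; the helper name `tool_available` is hypothetical and is not part of the scripts above.

```python
import shutil


def tool_available(name):
    """Return True if an executable called `name` is found on PATH."""
    return shutil.which(name) is not None


if not tool_available("ffmpeg"):
    print("ffmpeg not found on PATH; merging audio and video will fail")
```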

You'll also want to run `record_video.py` once to see the actual fps your camera and host achieve.

Then fine-tune the `expectedFps` value near the top of that script based on the `video fps` line in the log output.
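The tuning step boils down to comparing the measured frame rate against the expected one. The sketch below restates the check that `merge_files` performs; the function names are hypothetical, and the 0.01 tolerance is the threshold used in the script.

```python
def measured_fps(frame_count, elapsed_s):
    """Average frames per second over a recording."""
    if elapsed_s <= 0:
        raise ValueError("elapsed time must be positive")
    return frame_count / elapsed_s


def needs_reencode(recorded_fps, expected_fps, tolerance=0.01):
    """True when the recorded rate is far enough from the expected
    rate that the video should be re-encoded before muxing."""
    return abs(recorded_fps - expected_fps) >= tolerance


fps = measured_fps(frame_count=60, elapsed_s=5.0)
print(fps, needs_reencode(fps, expected_fps=11.7))
```

If the measured value is stable across a few runs, setting `expectedFps` to it avoids the extra re-encoding pass.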

## Limitations

Unfortunately, this prototype is not complete.

The most obvious area for improvement is the object recognition.

The prototype uses a bottle as a stand-in for the card; it needs to detect actual business cards or resumes.

In addition, it would be convenient to perform OCR on the card and use the candidate's name as the filename. My partner implemented the optical character recognition feature in his prototype.
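If OCR were added, the recognized name would need to be turned into a safe filename before being passed to `record_video.py`. This is a minimal sketch of that one step, assuming the OCR result arrives as a plain string. The helper `name_to_filename` and its fallback value are hypothetical.

```python
import re


def name_to_filename(name, fallback="capture"):
    """Convert free-form text (e.g. an OCR'd name) into a lowercase,
    underscore-separated stem that is safe to use as a filename."""
    stem = re.sub(r"[^A-Za-z0-9]+", "_", name).strip("_").lower()
    # Fall back to a generic stem if nothing usable was recognized
    return stem or fallback


print(name_to_filename("Jane Q. Doe"))  # jane_q_doe
```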

I did explore traditional computer vision methods, such as extracting the contours of the frame and drawing the edges to detect business cards, but that method did not work well in real time, when the operation ran on every frame. It just wasn't accurate enough.
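One way the contour approach might be made more selective is to keep only four-cornered contours whose aspect ratio is close to that of a business card. This is a sketch of that filter only, not what the prototype does. `looks_like_card` is a hypothetical helper, the 1.75 ratio assumes a 3.5 × 2 inch card, and the corner points are assumed to come from a polygon approximation of a contour (for example OpenCV's `cv2.approxPolyDP`).

```python
from math import dist


def looks_like_card(corners, target_ratio=1.75, tolerance=0.15):
    """corners: four (x, y) points listed in order around the quadrilateral.
    Returns True if its long/short side ratio is within `tolerance`
    (as a fraction) of `target_ratio`."""
    if len(corners) != 4:
        return False
    sides = [dist(corners[i], corners[(i + 1) % 4]) for i in range(4)]
    # Average opposite sides to soften mild perspective skew
    a = (sides[0] + sides[2]) / 2
    b = (sides[1] + sides[3]) / 2
    if min(a, b) == 0:
        return False
    ratio = max(a, b) / min(a, b)
    return abs(ratio - target_ratio) <= tolerance * target_ratio


print(looks_like_card([(0, 0), (350, 0), (350, 200), (0, 200)]))  # True
print(looks_like_card([(0, 0), (100, 0), (100, 100), (0, 100)]))  # False
```

A filter like this would not fix noisy contours on its own, but it discards most rectangular shapes that are clearly not cards.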

With better time management, I could have trained a new model that recognizes business cards.
Instead, I skipped that step and treated it as a solved problem for the purpose of finishing the proof of concept, which I believe we did.

Another improvement to the application would be to make it run continuously. In this implementation, everything closes after the video is recorded. This is due in part to the fact that I have to disconnect from the device in order to open it again for the video recording.

Now, I could wrap the entire program in a loop so that it runs continuously (Python has no `goto`). I'm sure there are smarter and more sophisticated ways to go about it.
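The loop idea can be sketched as below, assuming the detection and recording steps are each wrapped in a function that opens and closes the device itself. `run_sessions`, `detect`, and `record` are hypothetical names; here they are exercised with stand-in callables rather than the camera.

```python
def run_sessions(detect, record, max_sessions=None):
    """Alternate between detection and recording.
    detect() returns a capture name, or None when the user quits.
    record(name) records the clip for that capture.
    Returns the number of completed sessions."""
    completed = 0
    while max_sessions is None or completed < max_sessions:
        capture_name = detect()  # would open the tracking pipeline, then release the device
        if capture_name is None:
            break
        record(capture_name)  # would open the device again for video
        completed += 1
    return completed


# Stand-ins for the real steps: two captures, then the user quits
pending = iter(["capture0", "capture1", None])
recorded = []
print(run_sessions(lambda: next(pending), recorded.append))  # 2
```

Because each step would release the device before returning, the next iteration can connect to it again without restarting the program.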

*insert demo gif here