Ankit Dheendsa: Capstone - Sprint 1                                     
<br>
July 28th, 2023

This notebook handles all data collection for our custom data set of ASL (American Sign Language) hand signs, which we will use to train an ML model that automatically detects ASL and translates it to English text. It uses several libraries to capture standardized images of the hand in different positions and save them to a designated folder for training later on.

To begin this script, we first import all necessary libraries: cv2 (from OpenCV), cvzone, numpy and math.

The cv2 library provides a large number of functions and tools for computer vision tasks (we will be using our webcam to detect and capture image data). The cvzone library complements cv2 and makes OpenCV easier to work with for specific tasks, such as the hand detection used here.

All knowledge about OpenCV was obtained from https://docs.opencv.org/4.x/d6/d00/tutorial_py_root.html
<br>
All knowledge about specific cvzone functions was obtained from https://github.com/cvzone/cvzone
<br>
No code snippets were copied and pasted; only specific commands (syntax) were referenced.
<br>
NOTE: If this notebook causes errors upon running, please continue to run the demo with the file named "collect_data.py" within the same folder (code is the same). 


In [1]:
# We start off by importing the required packages (cvzone and mediapipe, which its hand tracking relies on, must be pip installed first)
import cv2
from cvzone.HandTrackingModule import HandDetector
import numpy as np
import math

Next, we define several variables. We start with a "capture" variable, which uses the cv2 library to open our webcam as the video source. We then create a "detector" variable, which detects when a hand is on the screen; for simplicity, we set it to detect only 1 hand at a time.

Afterwards, we create two placeholder variables (offset and imgSize) that will be used later on for the cropped box around the hand (explained further below).

Finally, we create a "folder" variable that holds our target folder's relative path, so that the images taken from the video capture are stored in a specific folder (currently "A", as that is the first letter we will capture data for), as well as a "counter" variable that keeps track of how many images we have captured.

In [5]:
# Create the capture variable using cv2; 0 is the webcam number (this opens the webcam so we can use it to create our custom data set)
capture = cv2.VideoCapture(0)
# Create the detector variable to detect hands (used for data collection of single-hand ASL) - note: maxHands=1 because we only want to track 1 hand
detector = HandDetector(maxHands=1)

# Placeholder variables that will be used with our live-feed cropping logic
offset = 30
imgSize = 300

# Folder where we would like to store the images (this changes every time we store a new image set, for example when capturing image data for the letter B)
folder = "Data/A"
# Counter that tells us how many images we have saved
counter = 0
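
One practical detail about the folder variable: cv2.imwrite does not create missing directories, and it typically returns False rather than raising an error when the target folder does not exist. A minimal sketch of a guard, assuming we are happy to create the folder on the fly (the helper name ensure_folder is hypothetical and not part of the original script):

```python
import os

def ensure_folder(path):
    """Create the target folder (and any missing parents) if it does not exist yet."""
    # exist_ok=True makes the call safe to repeat on every run
    os.makedirs(path, exist_ok=True)
    return path

# Hypothetical usage with the notebook's folder value:
# folder = ensure_folder("Data/A")
```

Calling this once before the capture loop would mean that a typo-free but not-yet-created path such as "Data/B" still receives its images.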

Moving forward, we need to write the logic that turns on the webcam to capture video, as well as the logic that creates a cropped box around the hand so that the captured images are all the same size, which makes for better training data (how this cropped box looks and behaves is best understood via a live demonstration or video recording).

We will start by creating an infinite while loop that will enable us to turn on the webcam indefinitely (until we close it) as well as continuously run the logic. 

Within the while loop, we create a cropped box inside an if statement that activates when a hand is detected via the .findHands() method. The crop is the hand's bounding box padded by the offset value on every side, so the hand sits in the focus/middle of the box. We then use imgSize to create a white box of 300x300 pixels. We create the white box with set dimensions so that all images are of the same size. This makes for better training data, as standardized image sizes will make training our ML model much easier and more efficient in the future. Once we create the white box, we overlay the cropped image onto it, so that the dynamic cropped image of the hand sits on top of the static white box.
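
One edge case in the padded crop is worth spelling out. When the hand is close to the top or left edge of the frame, y - offset or x - offset becomes negative, and NumPy interprets a negative slice start as counting from the end of the array, which can produce an empty crop that cv2.resize then rejects. A minimal sketch of clamping the crop to the frame, assuming the frame width and height are known (clamp_crop is a hypothetical helper, not part of the original script):

```python
def clamp_crop(x, y, w, h, offset, frame_w, frame_h):
    """Return (x1, y1, x2, y2) for the padded hand box, kept inside the frame."""
    x1 = max(0, x - offset)            # never start left of the frame
    y1 = max(0, y - offset)            # never start above the frame
    x2 = min(frame_w, x + w + offset)  # never end right of the frame
    y2 = min(frame_h, y + h + offset)  # never end below the frame
    return x1, y1, x2, y2

# Hand near the top-left corner of a 640x480 frame
print(clamp_crop(10, 20, 100, 100, 30, 640, 480))  # (0, 0, 140, 150)
```

The returned values could then be used as img[y1:y2, x1:x2] in place of the raw padded indices.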

We then create the logic that dynamically rescales the cropped image to fit the white box (so if the hand symbol is too tall or too wide, the script won't crash). We accomplish this using an if/else statement that checks whether the aspect ratio (height divided by width) is greater than 1: if it is, the crop is scaled so that its height fills the box and it is centred horizontally; otherwise, its width fills the box and it is centred vertically.
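
The arithmetic behind this check can be isolated from the image handling. The sketch below mirrors the scaling and centring calculation described above in a standalone function that works on plain numbers only (fit_to_square is a hypothetical name used for illustration; the notebook performs the same steps inline with cv2.resize):

```python
import math

def fit_to_square(w, h, img_size):
    """Scale a w x h box so its longer side equals img_size.

    Returns (new_w, new_h, x_gap, y_gap), where the gaps are the offsets
    that centre the resized image on the img_size x img_size canvas.
    """
    if h / w > 1:
        # Taller than wide: height fills the canvas, width is scaled to match
        k = img_size / h
        new_w = math.ceil(k * w)
        x_gap = math.ceil((img_size - new_w) / 2)
        return new_w, img_size, x_gap, 0
    # Wider than tall (or square): width fills the canvas
    k = img_size / w
    new_h = math.ceil(k * h)
    y_gap = math.ceil((img_size - new_h) / 2)
    return img_size, new_h, 0, y_gap

print(fit_to_square(100, 200, 300))  # (150, 300, 75, 0)
print(fit_to_square(200, 100, 300))  # (300, 150, 0, 75)
```

In the first call, a 100x200 hand box is scaled by 1.5 to 150x300 and shifted 75 pixels to the right, leaving equal white margins on both sides.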

We end off by using the .imshow() function to show the white box within this loop. The final result is a dynamically changing cropped box around the detected hand, layered on top of the static white box, which guarantees that a specific size is maintained across all images. Even if a hand symbol is fairly wide or tall, the cropped box readjusts its sizing to include the entirety of the hand, giving us clean and standardized training data. This also allows us to capture images of hand symbols in awkward positions and at differing distances from the webcam.

In [None]:
# Creating while loop to turn on webcam
while True:
    success, img = capture.read()
    hands, img = detector.findHands(img)
    
    # Here we are creating a cropped box that captures only the hand rather than the entire video image (which would also include the background etc.); this way the model is only looking at the hand
    if hands:
        hand = hands[0]
        x,y,w,h = hand['bbox']
        
        # Creating a white box to keep a standard image size so that it doesn't change with the height and width of the hand position
        imgWhite = np.ones((imgSize,imgSize,3),np.uint8)*255
        # Cropping the hand plus padding; max(0, ...) stops a negative start index from wrapping around and yielding an empty crop
        imgCrop = img[max(0, y-offset): y+h+offset, max(0, x-offset):x+w+offset]
        
        # Storing the shape of the cropped image (height, width, channels)
        imgCropShape = imgCrop.shape
        
        # Creating an aspect ratio (height / width) so we can write logic that
        # compensates when a detected hand symbol is too tall or too wide
        aspectRatio = h/w
        
        # Logic to adjust the cropped image when the hand symbol is taller than it is wide
        if aspectRatio >1:
            k = imgSize/h
            wCal = math.ceil(k*w)
            imgResize = cv2.resize(imgCrop, (wCal,imgSize))
            imgResizeShape = imgResize.shape
            wGap = math.ceil((imgSize-wCal)/2)
            imgWhite[:, wGap:wCal+wGap] = imgResize
        # Logic to adjust the cropped image when the hand symbol is wider than it is tall (or square)
        else:
            k = imgSize/w
            hCal = math.ceil(k*h)
            imgResize = cv2.resize(imgCrop, (imgSize,hCal))
            imgResizeShape = imgResize.shape
            hGap = math.ceil((imgSize-hCal)/2)
            imgWhite[hGap:hCal+hGap,:] = imgResize
        
        # Displaying the white box
        cv2.imshow("imgWhite" ,imgWhite)

Finally, we use the .imshow() method to display the full webcam image. We then set a variable that holds the result of the .waitKey() method, which waits 1ms for a key press. We use this variable in conjunction with an if statement that checks whether the letter "s" (lowercase, as the check is case sensitive) is being pressed; if so, we increase the counter by 1 (to keep track of how many images were taken) and write the captured image to the folder specified in the folder variable (above).

In [None]:
# Note: these lines are meant to run once per frame, i.e. inside the while loop above
# Displaying the full live feed
cv2.imshow("Image" ,img)
# Storing the key returned by waitKey in a variable
key = cv2.waitKey(1) #1ms delay
    
# Logic for capturing an image from the live feed and saving it to the specified folder
if key == ord("s"):
    counter += 1 
    cv2.imwrite(f'{folder}/Image_{counter}.jpg', imgWhite)
    print(f"Image {counter} captured")
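
Because counter starts at 0 on every run, a second capture session for the same letter would write Image_1.jpg again and overwrite the earlier file. One possible way around this, sketched below under the assumption that files follow the Image_<number>.jpg pattern used above, is to start the counter from the highest number already present in the folder (last_saved_index is a hypothetical helper, not part of the original script):

```python
import os

def last_saved_index(folder, prefix="Image_", ext=".jpg"):
    """Return the highest image number already saved in folder (0 if none)."""
    highest = 0
    if not os.path.isdir(folder):
        return highest
    for name in os.listdir(folder):
        if name.startswith(prefix) and name.endswith(ext):
            # Extract the part between the prefix and the extension
            number = name[len(prefix):-len(ext)]
            if number.isdigit():
                highest = max(highest, int(number))
    return highest

# Hypothetical usage before the capture loop:
# counter = last_saved_index(folder)
```

With this in place, the first image of a new session would be numbered one above the last image of the previous session.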