<font size="5">American Sign Language Recognition Using TensorFlow, Keras, and OpenCV</font>

Before diving into the code, we first have to import all the required libraries. I'll use TensorFlow, Keras, and OpenCV. We're also importing NumPy, which will help us reshape the image frame from the video.

In [1]:
import numpy as np
import cv2
import keras
from keras.preprocessing.image import ImageDataGenerator
import tensorflow as tf
print(cv2.__version__)

4.5.5


Now that that's done, we have to load our trained machine learning model into the notebook. Keras provides a function, keras.models.load_model, that makes this quite simple.

In [2]:
model = keras.models.load_model('model1.h5')

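Accumulated weight is useful if we're predicting on foreground masks (the hand separated from a running-average background) instead of on the images themselves. However, since we trained our model on images, it isn't needed here; the background and accumulated_weight variables below are defined but never used.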
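For reference, here is a minimal sketch of what an accumulated weight does in a background-subtraction setup. It is not part of this notebook's pipeline: the helper names `update_background` and `foreground_mask` and the threshold of 25 are assumptions made up for illustration. The update rule is the standard running average, the same kind of blend that `cv2.accumulateWeighted` performs.

```python
import numpy as np

def update_background(background, gray_frame, weight=0.5):
    """Blend a new grayscale frame into the running-average background."""
    frame = gray_frame.astype(np.float32)
    if background is None:
        # First frame: the background starts as a copy of the frame.
        return frame.copy()
    # Running average: new = (1 - weight) * old + weight * frame
    return (1.0 - weight) * background + weight * frame

def foreground_mask(background, gray_frame, threshold=25):
    """Mark pixels that differ from the background by more than `threshold`."""
    diff = np.abs(gray_frame.astype(np.float32) - background)
    return (diff > threshold).astype(np.uint8) * 255

# Synthetic demo: an empty scene, then a bright "hand" patch appears.
empty = np.zeros((4, 4), dtype=np.uint8)
background = None
for _ in range(3):
    background = update_background(background, empty)

with_hand = empty.copy()
with_hand[1:3, 1:3] = 200
mask = foreground_mask(background, with_hand)
```

A model trained on masks like `mask` would need this step at prediction time too, which is why it can be skipped for a model trained on plain images.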

The region of interest (ROI) is the part of the frame we want our model to see. I made only a small portion of the frame the region of interest so that my body isn't visible while I'm making the signs.

In [3]:
background = None
accumulated_weight = 0.5

ROI_top = 100
ROI_bottom = 300
ROI_left = 150
ROI_right = 350
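To make the ROI values concrete, here is a small sketch of how they are used to crop a frame. The 480x640 frame is a stand-in I'm assuming for illustration; the real size depends on the camera.

```python
import numpy as np

ROI_top, ROI_bottom = 100, 300
ROI_left, ROI_right = 150, 350

# Stand-in for a 480x640 BGR webcam frame (assumed size).
frame = np.zeros((480, 640, 3), dtype=np.uint8)

# NumPy images are indexed [row, column], i.e. [y, x],
# so the top/bottom bounds come first and left/right second.
roi = frame[ROI_top:ROI_bottom, ROI_left:ROI_right]
print(roi.shape)  # (200, 200, 3)
```

With these values the ROI is a 200x200 square. Note that the slice is a view into `frame`, so drawing on `roi` would also draw on the original frame.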

Now to the interesting part. We start video capture using OpenCV and read frames one at a time. Each frame is converted to grayscale, cropped to the region of interest, and resized to 64x64. NumPy comes in handy here for reshaping the result into a batch of one image.

We then call our model's predict function to predict the sign in the image.

In [38]:
from cv2 import VideoCapture

word_dict = {}

for n in range(0, 26):
    word_dict[n] = chr(97 + n)

word_dict[26] = 'space'
word_dict[27] = 'nothing'

print(word_dict)

cam = VideoCapture(0)
num_frames = 0


while True:
    ret, frame = cam.read()

    if not ret:
        print("failed to grab frame")
        break

    # flipping the frame horizontally so the preview behaves like a mirror
    frame = cv2.flip(frame, 1)

    frame_copy = frame.copy()

    # ROI from the frame
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    gray = gray[ROI_top:ROI_bottom, ROI_left:ROI_right]
    cv2.imshow("ROI", gray)

    thresholded = cv2.resize(gray, (64, 64))
    thresholded = cv2.cvtColor(thresholded, cv2.COLOR_GRAY2RGB)
    thresholded = cv2.GaussianBlur(thresholded, (5, 5), 0)
    thresholded = thresholded.reshape(1, 64, 64, 3)

    pred = model.predict(thresholded)
    label = word_dict[int(np.argmax(pred))]
    cv2.putText(frame_copy, label, (100, 100), cv2.FONT_HERSHEY_SIMPLEX, 3, (0, 0, 0), 2)
    cv2.imshow("Sign_Detection", frame_copy)
    # press 'q' to quit
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break

{0: 'a', 1: 'b', 2: 'c', 3: 'd', 4: 'e', 5: 'f', 6: 'g', 7: 'h', 8: 'i', 9: 'j', 10: 'k', 11: 'l', 12: 'm', 13: 'n', 14: 'o', 15: 'p', 16: 'q', 17: 'r', 18: 's', 19: 't', 20: 'u', 21: 'v', 22: 'w', 23: 'x', 24: 'y', 25: 'z', 26: 'space', 27: 'nothing'}
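The dictionary printed above maps the index of the largest model output to a label. Here is a rough sketch of that decoding step on its own, with an optional confidence cutoff. The `decode_prediction` helper and the `min_confidence` value are my own additions for illustration, and the sketch assumes the model outputs one row of 28 probabilities.

```python
import numpy as np

# Same mapping as word_dict: indices 0-25 are letters, then two extra classes.
labels = [chr(97 + n) for n in range(26)] + ['space', 'nothing']

def decode_prediction(pred, min_confidence=0.5):
    """Return the label of the most likely class, or 'nothing' if unsure."""
    probs = np.asarray(pred).reshape(-1)
    index = int(np.argmax(probs))
    if probs[index] < min_confidence:
        return 'nothing'
    return labels[index]

# Fake model output where class 2 is clearly the winner.
fake_pred = np.zeros((1, 28), dtype=np.float32)
fake_pred[0, 2] = 0.9
print(decode_prediction(fake_pred))  # c
```

A cutoff like this can stop the overlay from flickering between letters when no class clearly wins, at the cost of showing 'nothing' more often.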


Ta-da! It's done! There we have our first-ever (well, mine at least :) ) sign language recognition model.

Don't forget to release the camera and destroy the OpenCV windows once it's done!

In [39]:
cam.release()
cv2.destroyAllWindows()