### In this notebook, we'll go through the steps of creating a fun computer vision math game using MediaPipe, OpenCV, and scikit-learn

First, we'll import some important libraries

In [1]:
import numpy as np #For working with numpy arrays and functions
import cv2 #For working with images
import mediapipe as mp #For hand tracking
import time #For timing
import random #For generating random numbers
from sklearn.linear_model import LogisticRegression #For training a more advanced countingFingers version
import pandas as pd #For loading and working with csv files
import pickle #For saving the trained models to disk

#If you don't have these libraries installed, run "pip install -r requirements.txt"

Next, we'll import our HandDetector class

In [2]:
import HandtrackingModule as htm

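Next, we'll define the function that counts the number of fingers raised. For every finger except the thumb, it increments fingersRaised when the tip landmark is higher in the image (smaller y value) than the joint two landmarks below it. The thumb is compared along the x axis instead.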

In [3]:
def countFingersRaised(lmList): #Takes landmarks list and outputs the number of fingers raised
    tipIds = [4, 8, 12, 16, 20] #Tips of : [Thumb, Index, Middle, Ring, Pinky] 
    fingersRaised = 0
    if(len(lmList) > 0):
        for hand in range(len(lmList)):
            for tip in tipIds:
                if(tip == 4): #Thumb raises along both x and y axis so it has some special logic
                    if(lmList[hand][4][1] < lmList[hand][0][1]):
                        if(lmList[hand][4][1] < lmList[hand][3][1]):
                            fingersRaised += 1
                    
                    elif(lmList[hand][4][1] > lmList[hand][0][1]):
                        if(lmList[hand][4][1] > lmList[hand][3][1]):
                            fingersRaised += 1
                            
                elif(lmList[hand][tip][2] < lmList[hand][tip-2][2]): #Rest of Fingers
                    fingersRaised += 1
            
    return fingersRaised
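
To check the rule without a camera, here's a minimal sketch on synthetic landmarks. It assumes each landmark is a list `[id, x, y]` in image coordinates (y grows downward), which matches how `countFingersRaised` indexes `lmList`; the real output of `HandDetector.findLandmarks` isn't shown in this notebook. `isFingerUp` and `makeHand` are hypothetical helpers made up for this example.

```python
# Sketch: the non-thumb rule from countFingersRaised on synthetic data.
# Assumption: each landmark is [id, x, y] with y growing downward (image coordinates).

def isFingerUp(hand, tip):
    """A finger counts as raised when its tip has a smaller y than the joint two landmarks below it."""
    return hand[tip][2] < hand[tip - 2][2]

def makeHand(raisedTips):
    """Build 21 fake landmarks; tips in raisedTips sit above their joints, the others below."""
    hand = [[i, 100, 300] for i in range(21)]  # every landmark starts at y = 300
    for tip in (8, 12, 16, 20):
        hand[tip][2] = 200 if tip in raisedTips else 350
    return hand

hand = makeHand(raisedTips={8, 12})  # index and middle finger raised
raised = sum(isFingerUp(hand, tip) for tip in (8, 12, 16, 20))
print(raised)  # 2
```

Because the rule only compares y values, it stops working as soon as the hand is tilted sideways or points downward, which is the limitation revisited later in the notebook.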

Next, we'll define a function to start capturing frames from the camera and show us the image as well as the number of fingers raised

In [4]:
def captureAndDetect(countFingersRaised): #Captures camera footage using cv2 and shows the image after processing it
    pTime = 0
    cTime = 0
    cap = cv2.VideoCapture(0) #Create video object
    detector = htm.HandDetector() #Initialize hand detector
    try:
        while True: #Start camera footage
            success, img = cap.read()
            if not success: #Stop if the camera did not return a frame
                break
            img = cv2.flip(img, 1) #Flip horizontally so the image acts like a mirror
            lmList, img = detector.findLandmarks(img)
            fingersRaised = countFingersRaised(lmList)
            
            #Calculate frames per second
            cTime = time.time()
            fps = 1/max(cTime - pTime, 1e-6) #Guard against division by zero
            pTime = cTime
            
            cv2.putText(img, str(int(fps)), (10,70), cv2.FONT_HERSHEY_PLAIN, 3,(255,0,255),3) #display fps
            cv2.putText(img, str(fingersRaised), (560,70), cv2.FONT_HERSHEY_PLAIN, 4,(255,0,255),4) #display fingers raised
            
            cv2.imshow("Image", img)
            cv2.waitKey(1)
    except KeyboardInterrupt:
        pass #Interrupting the kernel is the intended way to stop
    finally:
        cap.release() #Release camera, even if something else went wrong
        cv2.destroyAllWindows()  # Close all OpenCV windows

Let's try calling the function

In [5]:
captureAndDetect(countFingersRaised) #Interrupt whenever you want

The algorithm seems to be working well ! It detects the fingers raised with great accuracy, as long as you raise your fingers with your palms upright and facing forward. We'll revisit the countFingersRaised() function later on to make it more accurate and to handle more cases.

Next, let's work on our math game !

In [6]:
def computerVisionMath(countFingersRaised, numQuestions):
    cap = cv2.VideoCapture(0)
    detector = htm.HandDetector()
    newRound = True
    rand1 = rand2 = res = 0
    framesRight = 0
    prevRand2 = 0
    round = 0
    
    try:
        while round <= numQuestions:
            success, img = cap.read()
            img = cv2.flip(img, 1)
            
            lmList, img = detector.findLandmarks(img)
            fingersRaised = countFingersRaised(lmList)
            
            if newRound:
                round += 1
                if(round > numQuestions):
                    continue
                rand1 = random.randint(1,10)
                while(rand2 == prevRand2):
                    rand2 = random.randint(1,10)
                prevRand2 = rand2
                res = rand1 * rand2
                print(f"Question {round}/{numQuestions} : {res} / {rand1} = ", end = "")
                newRound = False
            
            if(fingersRaised == rand2):
                framesRight += 1
                if(framesRight == 3): #The answer must be right for 3 consecutive frames -> Avoids accidental answers
                    print(fingersRaised)
                    newRound = True
                    framesRight = 0
            else:
                framesRight = 0
            
            cv2.imshow("Computer Vision Math", img)
            cv2.waitKey(1)
    except KeyboardInterrupt:
        pass #Stop early when the user interrupts the kernel
    finally:
        cap.release() #Release camera
        cv2.destroyAllWindows()  # Close all OpenCV windows
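
The three-frames-in-a-row check inside `computerVisionMath` is a small debounce. Here's one way to isolate it, as a sketch; `StableAnswer` is a hypothetical helper and not part of the notebook's modules.

```python
class StableAnswer:
    """Accept an answer only after it has been seen on `needed` consecutive frames."""

    def __init__(self, needed=3):
        self.needed = needed
        self.streak = 0

    def update(self, fingersRaised, target):
        """Feed one frame's count; returns True on the frame that completes the streak."""
        if fingersRaised == target:
            self.streak += 1
        else:
            self.streak = 0  # any wrong frame restarts the count
        if self.streak == self.needed:
            self.streak = 0  # reset for the next question
            return True
        return False

checker = StableAnswer(needed=3)
frames = [4, 4, 2, 4, 4, 4]  # simulated per-frame finger counts, the target is 4
results = [checker.update(count, target=4) for count in frames]
print(results)  # [False, False, False, False, False, True]
```

A larger `needed` value makes accidental answers less likely, at the cost of making the player hold the pose longer.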

Let's try out our math game !

In [7]:
computerVisionMath(countFingersRaised, 3) #First argument: the function used to count the fingers. Second argument: how many math questions to ask

Question 1/3 : 24 / 6 = 4
Question 2/3 : 30 / 3 = 10
Question 3/3 : 35 / 5 = 7


Works out amazingly well ! With that, we now have a fun computer vision math game.
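
Each question is built backwards: the game first picks the answer (`rand2`, between 1 and 10 so it fits on two hands) and a divisor (`rand1`), then shows their product as the dividend. A minimal sketch of that step on its own; `makeQuestion` is a hypothetical helper, and the seeded generator is only there to make the example repeatable.

```python
import random

def makeQuestion(prevAnswer, rng=random):
    """Return (dividend, divisor, answer) with the answer in 1..10 and different from prevAnswer."""
    divisor = rng.randint(1, 10)
    answer = rng.randint(1, 10)
    while answer == prevAnswer:  # avoid asking for the same answer twice in a row
        answer = rng.randint(1, 10)
    return divisor * answer, divisor, answer

rng = random.Random(0)  # seeded only so the example is repeatable
prev = 0
for _ in range(3):
    dividend, divisor, answer = makeQuestion(prev, rng)
    print(f"{dividend} / {divisor} = {answer}")
    prev = answer
```

Building the dividend from the answer guarantees that every division comes out to a whole number that can be shown with fingers.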

### Extras : Implementing a more advanced countFingersRaised algorithm

Our countFingersRaised function works well if the player keeps their palms upright and facing forward, but we can do better !
Instead of detecting raised fingers from the x and y positions of the landmarks, we can use a more robust feature : the angle formed at each joint by the vectors pointing to its two neighbouring landmarks. Unlike raw positions, these angles stay roughly the same when the hand is rotated or tilted.

Explanation :
When you raise a finger and draw vectors between the landmarks representing its joints, the angles formed between consecutive vectors are very obtuse (close to 180 degrees). When the finger is closed, those angles are noticeably smaller.

As such, we can train a logistic regression model which takes the angles formed between the intersecting vectors and outputs whether the finger is raised or not !
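
To make the angle idea concrete, here's a minimal sketch with NumPy. `jointAngle` is a hypothetical stand-in for the `construct_vector` and `angle_between_vectors` helpers used later from `MultiProcessLabelingModule`; their implementation isn't shown in this notebook, so their units and conventions may differ from this sketch.

```python
import numpy as np

def jointAngle(prevPoint, joint, nextPoint):
    """Angle in degrees at `joint` between the vectors pointing to its two neighbours."""
    v1 = np.asarray(prevPoint, dtype=float) - np.asarray(joint, dtype=float)
    v2 = np.asarray(nextPoint, dtype=float) - np.asarray(joint, dtype=float)
    cosine = np.dot(v1, v2) / (np.linalg.norm(v1) * np.linalg.norm(v2))
    # clip guards against values slightly outside [-1, 1] caused by rounding error
    return float(np.degrees(np.arccos(np.clip(cosine, -1.0, 1.0))))

# Straight finger: three collinear joints give a flat angle
straight = jointAngle([0, 0, 0], [0, 1, 0], [0, 2, 0])
# Curled finger: the next segment folds away at a right angle
bent = jointAngle([0, 0, 0], [0, 1, 0], [1, 1, 0])
print(round(straight), round(bent))  # 180 90
```

The angle comes from the dot product identity cos(theta) = (v1 . v2) / (|v1| |v2|), so it depends only on the shape of the finger and not on where the hand sits in the frame.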

#### Getting training labels

First of all, we'll need training data for our logistic regression algorithm. Where will we find the data ? We'll create our own, using a simple Python script !

##### Creating a MultiProcessing python script to get training data

We'll create a script that launches two processes to do the following :

Process 1 : Keeps the camera open and detects all of our hand landmarks

Process 2 : Awaits user input (1 or 0, standing for raised or not raised)

Once the user enters a label, process 1 starts saving the landmark data of each frame it captures. It then calculates the angles between the landmark vectors, stores the angles for one finger in a list, and appends the label to that list. Finally, it saves the list entries to a CSV file and waits for the user to enter a new label before repeating the process.

Since Jupyter notebooks don't support multiprocessing very well, we've saved all the needed code for this section in a Module called MultiProcessLabelingModule.
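
The saving step described above can be sketched as follows. This is not the code in `MultiProcessLabelingModule`, which isn't shown here: `buildRow` and `appendRows` are hypothetical helpers, and the row layout (three angles followed by the label) and the header names are assumptions based on how the CSV files are read back later in this notebook.

```python
import csv
import os
import tempfile

def buildRow(angles, label):
    """One training example: the finger's joint angles followed by the 0/1 label."""
    return list(angles) + [label]

def appendRows(path, rows, header=("angle1", "angle2", "angle3", "label")):
    """Append rows to a CSV file, writing the header only when the file is new or empty."""
    isNew = not os.path.exists(path) or os.path.getsize(path) == 0
    with open(path, "a", newline="") as f:
        writer = csv.writer(f)
        if isNew:
            writer.writerow(header)
        writer.writerows(rows)

# Simulated frames: two with the finger raised (label 1), one with it closed (label 0)
rows = [buildRow([172.0, 168.5, 175.1], 1),
        buildRow([170.2, 165.0, 171.9], 1),
        buildRow([95.3, 80.7, 110.4], 0)]
path = os.path.join(tempfile.mkdtemp(), "trainingDataExample.csv")
appendRows(path, rows)
with open(path) as f:
    print(f.read())
```

Appending rather than overwriting means several labeling sessions can add to the same file, which is handy when you want examples from different hand poses.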

In [16]:
#Import the module
import MultiProcessLabelingModule as mplm

We'll call startLabelingProcess() twice : once to label training data for the thumb, since it moves a little differently from the other fingers, and once to label training data for the rest of our fingers. Since the other fingers all behave in a similar way, we can collect training data on just one of them and apply the model's predictions to all the others. I'll choose the index finger as it's the easiest to work with.

In [21]:
#Do NOT enter any input before your camera starts
#Do NOT interrupt the process while it is running. To end it, enter a label other than 0 or 1
mplm.startLabelingProcess(4) #Thumb training data

In [22]:
#Do NOT enter any input before your camera starts
#Do NOT interrupt the process while it is running. To end it, enter a label other than 0 or 1
mplm.startLabelingProcess(8) #Index (and rest of fingers) training data

Let's load our datasets !

In [23]:
dataThumb = pd.read_csv("trainingDatathumb.csv")
Xthumb = dataThumb.iloc[:, :-1].values
Ythumb = dataThumb.iloc[:, -1].values

dataIndex = pd.read_csv("trainingDataindex.csv")
Xindex = dataIndex.iloc[:, :-1].values
Yindex = dataIndex.iloc[:, -1].values

Let's define and train our models, then save them to disk

In [24]:
modelThumb = LogisticRegression()
modelThumb.fit(Xthumb, Ythumb)
with open("modelThumb.pkl", "wb") as f:
    pickle.dump(modelThumb, f)

In [25]:
modelIndex = LogisticRegression()
modelIndex.fit(Xindex, Yindex)
with open("modelIndex.pkl", "wb") as f:
    pickle.dump(modelIndex, f)
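
Both models above are fit on all of the collected rows, so it's worth checking how they do on examples they haven't seen. Here's a minimal sketch of a held-out check. It runs on synthetic angle data so that it works without the CSV files; the cluster centres (around 170 degrees for raised, 90 degrees for closed) are made up for illustration and are not measurements from this project.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
# Synthetic stand-in for the CSV data: three joint angles per row (illustrative values only)
raised = rng.normal(loc=170, scale=8, size=(200, 3))
closed = rng.normal(loc=90, scale=20, size=(200, 3))
X = np.vstack([raised, closed])
y = np.array([1] * 200 + [0] * 200)

# Hold out 25% of the rows; stratify keeps the 0/1 balance the same in both splits
XTrain, XTest, yTrain, yTest = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0)

model = LogisticRegression(max_iter=1000)  # higher max_iter because the features are unscaled
model.fit(XTrain, yTrain)
accuracy = model.score(XTest, yTest)  # fraction of held-out rows classified correctly
print(f"Held-out accuracy: {accuracy:.2f}")
```

With the real data, `X` and `y` would be replaced by `Xthumb` and `Ythumb` (or `Xindex` and `Yindex`). Keep in mind that consecutive video frames are very similar, so a random split may still give an optimistic number.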

We'll now define a helper function which takes a landmark list and outputs a numpy array of angles for the specified finger.

In [26]:
def returnAngleArray(lmList, tip, hand):
    landmarksList = [lmList[hand][tip][1:4], lmList[hand][tip-1][1:4], lmList[hand][tip-2][1:4], lmList[hand][tip-3][1:4], lmList[hand][0][1:4]] #We'll get data from only one hand at a time
    angles = []
    for point in range(1,4):
        angles.append(mplm.angle_between_vectors(mplm.construct_vector(np.array(landmarksList[point]), np.array(landmarksList[point-1])),
                                                mplm.construct_vector(np.array(landmarksList[point]), np.array(landmarksList[point+1]))))
    return np.array(angles)


Finally, we'll define our advanced finger-counting algorithm

In [27]:
def advancedCountFingers(lmList): #Takes landmarks list and outputs the number of fingers raised
    tipIds = [4, 8, 12, 16, 20]
    fingersRaised = 0
    if(len(lmList) > 0):
        for hand in range(len(lmList)):
            for tip in tipIds:
                angles = returnAngleArray(lmList, tip, hand)
                angles = angles.reshape(1, -1)
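                if(tip == 4): #The thumb has its own model
                    prediction = modelThumb.predict(angles) #predict() returns the class label (0 or 1), not a probability
                    fingersRaised += 1 if prediction[0] == 1 else 0
                else: #All other fingers share the index finger model
                    prediction = modelIndex.predict(angles)
                    fingersRaised += 1 if prediction[0] == 1 else 0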
            
    return fingersRaised

Now let's try out our model !

In [14]:
captureAndDetect(advancedCountFingers) #Interrupt whenever you want

Our model works very well ! It detects the raised fingers almost perfectly, as long as mediapipe itself is tracking and labeling the landmarks well. Moreover, since it's a simple logistic regression model, it isn't computationally expensive, so we still get decent frame rates.

Now let's try out our math game !

In [None]:
computerVisionMath(advancedCountFingers, 10) #Interrupt whenever you want

Question 1/10 : 30 / 6 = 5
Question 2/10 : 8 / 2 = 4
Question 3/10 : 12 / 2 = 6
Question 4/10 : 21 / 3 = 7
Question 5/10 : 27 / 9 = 3
Question 6/10 : 14 / 7 = 2
Question 7/10 : 18 / 3 = 6
Question 8/10 : 2 / 2 = 1
Question 9/10 : 28 / 4 = 7
Question 10/10 : 8 / 8 = 1


Works perfectly !