## Advanced Control Methods | Assignment 4

### Introduction

The last assignment is devoted to applying the methods of classical Computer Vision to the problem of counting the fingers shown to the camera. Check the file *fingers.mov* that we will be working with.

There is no need to prove the importance of Computer Vision in robotics, so let us just briefly highlight the following.

- CV applied to autonomous robotics is a part of control in the broad sense of the word. Its role is to transform millions of numbers (pixel values) into a few (for example, the number of fingers) as fast as possible.

- Classical (NN-free) CV is a set of approaches that rely on handcrafted features. Computationally, these methods are often faster but less accurate, and closing this accuracy gap was the whole point of developing NNs. Another factor is that many classical algorithms can run on very low-power computers, SoCs, and microcontrollers, whereas NNs often cannot.

- All the methods used in this assignment can be applied to other CV problems. Mask processing and analysis is one of the foundations of rapid and effective prototyping of vision pipelines.

The first section introduces the code structure. Since we do not use any learning-based or auto-adjustment techniques here, the color filter that detects the hand has to be tuned manually; this is done in the second section of the notebook. In the third section, you are supposed to write the code that counts the fingers in each frame. The fourth section grades the solution.

### Methods

Below is a short list of sketched approaches to counting the fingers in a frame; feel free to use any of them. Do not limit yourself to these methods: anything (except NNs) will be accepted as long as it works. Of course, you could try NNs for fun, and you will probably get a solution that works better than the author's :)

**Skeletonization**

- mask extraction
- mask refinement: denoising, smoothing, keeping a single connected component
- skeletonization
- fingertip detection (via `cv2.filter2D`)
- fingertip filtering

**Convexity defects**

- mask extraction
- mask refinement: denoising, smoothing, keeping a single connected component
- finding the contour and its rough polygonal approximation
- finding convexity defects and processing them

**Morphology**

- mask extraction
- mask refinement: denoising, smoothing, keeping a single connected component
- top hat/black hat morphological operations

***

<h2 style="color:#A7BD3F;">Section 1: Preparations</h2>

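Please examine the code below. Essentially, it is a wrapper around frame processing: `processing_loop` reads frames from `source`, passes each of them to `process_frame`, and collects the returned values into a list. The subclasses below override `process_frame`.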

Press 'q' to stop the execution. Because of implementation details (the key is read with `cv2.waitKey`), this works only with the English keyboard layout, and an OpenCV window has to be in focus.

In [7]:
import numpy as np
import cv2

class FrameProcessor:
    def __init__(self):
        pass
    
    def processing_loop(self, source, lth, hth, max_frame_num = -1,\
                        alternative_source="", save_to_file=""):
        i = 0
        results = []

        # save_to_file is not used yet; a possible way to save the output:
        # out = cv2.VideoWriter('outpy.avi', cv2.VideoWriter_fourcc('M','J','P','G'),
        #                       30, (WINDX, WINDY))
        # out.write(canvas)
        # out.release()
        output_file = None
        
        while (True):
            retval, frame = source.read()

            if (not retval):
                print("Cannot read frame")
                
                if (alternative_source != ""):
                    # Reopen the source, e.g. restart the video from the beginning
                    print("Opening alternative source ", alternative_source)
                    source.release()
                    source = cv2.VideoCapture(alternative_source)

                    if (not source.isOpened()):
                        print("Cannot open alternative source, exiting loop")
                        break

                    continue
                
                else:
                    print("Exiting loop")
                    break

            result = self.process_frame(frame, lth, hth)
            
            results.append(result)

            # Wait 100 ms for a key press; this also refreshes the imshow windows
            key = cv2.waitKey(100) & 0xFF

            i += 1

            if (key == ord('q')):
                break
                        
            if (max_frame_num != -1 and i >= max_frame_num):
                break

        return results
    
    def process_frame(self, frame, lth, hth):
        # Placeholder, overridden by the subclasses below
        return 5

***

<h2 style="color:#A7BD3F;">Section 2: Color filter tuning</h2>

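Tune the color filter parameters. Note that the filtering is performed in the *HSV* color space; in OpenCV's 8-bit representation, H (hue) lies in 0..179, while S (saturation) and V (value) lie in 0..255. Once you are done, write the lower and upper bounds into *lth* and *hth*, respectively. These parameters will be used further in the notebook.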

In [12]:
#############################################
# YOUR DEFAULT PARAMETERS BELOW
#############################################

lth, hth = (0, 0, 0), (255, 255, 255)

#############################################
# YOUR DEFAULT PARAMETERS ABOVE
#############################################

class ColorFilterTuning(FrameProcessor):
    def __init__(self):
        super().__init__()
        
        cv2.namedWindow("color_filter_parameters")

        # Lower (*l) and upper (*h) bounds of the H, S and V channels,
        # initialized with the default lth and hth above
        # (H only reaches 179 in OpenCV, so H values above 179 act as 179)
        cv2.createTrackbar('hl', 'color_filter_parameters', lth[0], 255, self.nothing)
        cv2.createTrackbar('sl', 'color_filter_parameters', lth[1], 255, self.nothing)
        cv2.createTrackbar('vl', 'color_filter_parameters', lth[2], 255, self.nothing)
        cv2.createTrackbar('hh', 'color_filter_parameters', hth[0], 255, self.nothing)
        cv2.createTrackbar('sh', 'color_filter_parameters', hth[1], 255, self.nothing)
        cv2.createTrackbar('vh', 'color_filter_parameters', hth[2], 255, self.nothing)

    def nothing(self, inp):
        # Trackbar callback; the positions are polled in process_frame instead
        pass
    
    def process_frame(self, frame, lth, hth):
        # Frames from cv2.VideoCapture are in BGR order
        hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)
        
        low_th =  (cv2.getTrackbarPos('hl', 'color_filter_parameters'),
                   cv2.getTrackbarPos('sl', 'color_filter_parameters'),
                   cv2.getTrackbarPos('vl', 'color_filter_parameters'))
        
        high_th = (cv2.getTrackbarPos('hh', 'color_filter_parameters'),
                   cv2.getTrackbarPos('sh', 'color_filter_parameters'),
                   cv2.getTrackbarPos('vh', 'color_filter_parameters'))
        
        # Threshold the HSV image, not the BGR frame
        mask = cv2.inRange(hsv, low_th, high_th)
        
        cv2.imshow("frame", frame)
        cv2.imshow("mask", mask)
        
        # The pair returned for the last frame is used as the tuned (lth, hth)
        return (low_th, high_th)

In [13]:
!ls

asgn-4.ipynb fingers.mov


In [14]:
import numpy as np
import cv2

video_file = "fingers.mov"

cam = cv2.VideoCapture(video_file)

# Optionally, start from a later frame:
# frame_offset = 100
# cam.set(cv2.CAP_PROP_POS_FRAMES, frame_offset)

tuner = ColorFilterTuning()

colors = tuner.processing_loop(cam, None, None, max_frame_num = -1,\
            alternative_source=video_file)
lth, hth = colors[-1]

print("Color filter parameters: ", lth, hth)
cam.release()
cv2.waitKey(0)
cv2.destroyAllWindows()
cv2.waitKey(100)

Color filter parameters:  (95, 0, 0) (255, 255, 255)


-1

***

<h2 style="color:#A7BD3F;">Section 3: Fingers counting</h2>

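Implement the core finger-counting algorithm in the `process_frame` method of the class below. Don't forget to use the *lth* and *hth* parameters, which stand for the triplets of lower and upper HSV color bounds, respectively. The provided helpers may be useful: `filter_cc` keeps only the largest connected component (or removes the components smaller than `area_th`), and `fill_holes` fills the holes inside the mask.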

In [17]:
class FingersCounter(FrameProcessor):
    def __init__(self):
        super().__init__()

    def filter_cc(self, mask, area_th = -1):
        """Remove connected components from a binary mask.

        area_th == -1: keep only the largest component;
        otherwise:     remove the components with area below area_th.
        """
        # Keyword arguments are needed: in the Python signature the 2nd and 3rd
        # positional parameters are the optional output arrays (labels, stats)
        num_labels, labels, stats, _ = cv2.connectedComponentsWithStats(
            mask, connectivity=4, ltype=cv2.CV_32S)

        # Label 0 is the background, so fewer than 2 labels means no foreground
        if (num_labels < 2):
            return mask

        areas = stats[1:, cv2.CC_STAT_AREA]  # areas of labels 1..num_labels-1

        if (area_th == -1):
            keep = [1 + int(np.argmax(areas))]
        else:
            keep = [i + 1 for i, area in enumerate(areas) if area >= area_th]

        result = mask.copy()
        result[~np.isin(labels, keep)] = 0

        return result
    
    def fill_holes(self, img):
        """Fill the holes inside the foreground blobs of a binary mask."""
        h, w = img.shape

        # Pad with a 1-pixel background border so that (0, 0) is guaranteed
        # to belong to the outer background
        padded = np.zeros((h + 2, w + 2), np.uint8)
        padded[1:h + 1, 1:w + 1] = img

        # Binarize: pixels >= 35 become foreground (255), the rest background
        _, binary = cv2.threshold(padded, 34, 255, cv2.THRESH_BINARY)

        # Flood the outer background; floodFill needs a mask 2 px larger
        flooded = binary.copy()
        ff_mask = np.zeros((h + 4, w + 4), np.uint8)
        cv2.floodFill(flooded, ff_mask, (0, 0), 255)

        # Pixels the flood did not reach are holes; add them to the mask
        holes = cv2.bitwise_not(flooded)
        filled = binary | holes

        # Remove the padding
        return filled[1:h + 1, 1:w + 1]

    def process_frame(self, frame, lth, hth):
        #############################################
        # YOUR CODE BELOW
        #############################################

        # Placeholder: shows the input and returns a constant.
        # Replace it with your pipeline; stack the intermediate images
        # into stages_concat to visualize them
        stages_concat = frame
        fingers_num = 3

        #############################################
        # YOUR CODE ABOVE
        #############################################

        cv2.imshow("stages", stages_concat)

        return fingers_num

In [18]:
cam = cv2.VideoCapture("fingers.mov")

finger_counter = FingersCounter()

fingers_num = finger_counter.processing_loop(cam, lth, hth)

print(fingers_num)

cam.release()
cv2.waitKey(0)
cv2.destroyAllWindows()
cv2.waitKey(100)

Cannot read frame
Exiting loop
[3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3]


-1

***

<h2 style="color:#A7BD3F;">Section 4: Grading</h2>

The grading scheme is quite straightforward: an accuracy of *0.5* or more (at least half of the frames counted correctly) gives the full grade of *100* points, with linear interpolation downwards.
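Written as a formula, the rule implemented in the grading cell below is

```latex
\text{grade} = 100 \cdot \min\left(2 \cdot \frac{N_\text{correct}}{N_\text{total}},\ 1\right)
```

For example, the placeholder solution with 8 correct frames out of 96 gets $100 \cdot \min(2 \cdot 8/96,\ 1) \approx 16.7$, which the cell truncates to 16.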

Please execute the cell below to grade your solution. It compares the per-frame results your counting function produced on the pre-recorded video (`fingers_num` from Section 3) with the reference markup. To avoid confusion, only the unambiguous cases are counted in the grading.

In case of any questions, reach out to Ilya Osokin (@elijahmipt) or Georgy Malaniya (@OdinManiac) on Telegram.

In [19]:
reference_fingers_num = [5, 5, 1, 0, 0, 5, 5, 5, 0, 0, 0, 5, 5, 5, 5, 4, 3, 3,\
                         3, 3, 3, 3, 1, 1, 2, 2, 2, 2, 2, 1, 1, 1, 1, 1, 2, 2,\
                         2, 2, 2, 2, 3, 4, 4, 4, 4, 4, 4, 5, 5, 5, 5, 5, 5, 5,\
                         2, 2, 2, 2, 2, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2,\
                         2, 2, 1, 2, 2, 2, 1, 2, 2, 2, 1, 2, 1, 1, 1, 1, 1, 1,\
                         3, 4, 0, 0, 0, 1]

max_grade = 100

corr_num = 0

for r, s in zip(reference_fingers_num, fingers_num):
    if (r == s):
        corr_num += 1

acc = corr_num / len(reference_fingers_num)

#print("correct ", corr_num, " out of ", len(reference_fingers_num),
#      corr_num / len(reference_fingers_num))

grade = min(acc * 2, 1) * max_grade

print("Your grade is ", "\033[92m{}\033[0m".format(str(int(grade)) +\
        " out of " + str(max_grade) + "; " + str(corr_num) + " frames out of "
        + str(len(reference_fingers_num))))

Your grade is  16 out of 100; 8 frames out of 96
