In [1]:
%matplotlib widget

In [2]:
import ipywidgets as widgets
import cv2
import numpy as np
from ipywebrtc import CameraStream, ImageRecorder, VideoRecorder
from src.View import JupyterGUIFactory
from src.Controller import StateManager
from src.landmark_names import landmark_names
# from IPython.display import clear_output

In [3]:
conditions = ["open_palm",
             "open_dorsal",
             "fist_palm",
             "fist_dorsal",
             "three_fingers_palm",
             "three_fingers_dorsal"]

# Task 1 - Raw Videos

Record 6 videos of your hand (5+ seconds), each showing a different gesture (for examples, see the instructions for assignment 1):

1.	**open_palm**: The palm side is facing the camera, and the hand is open; the fingers are spread.
2.	**open_dorsal**: The dorsal side (back of the hand) is facing the camera, and the hand is open; the fingers are spread.
3.	**fist_palm**: The palm side is facing the camera, and the hand is closed into a fist.
4.	**fist_dorsal**: The dorsal side is facing the camera, and the hand is closed into a fist.
5.	**three_fingers_palm**: The palm side is facing the camera, and three fingers are spread out: thumb, index, and middle fingers. Ring and pinky fingers are folded in.
6.	**three_fingers_dorsal**: The dorsal side is facing the camera, and three fingers are spread out: thumb, index, and middle fingers. Ring and pinky fingers are folded in.


**Important:** While recording the videos, move your hand slowly around the image, move it closer to or further away from the camera, or apply mild rotations.

This is done to create more diverse training data. The hand can be anywhere on the screen, arbitrarily rotated, and at any distance from the camera (as long as it is still identifiable as a hand). The broader the set of examples, the more robust the trained model will be.

In [53]:
camera = CameraStream(constraints=
                      {'facing_mode': 'user',
                       'audio': False,
                       'video': { 'width': 640, 'height': 480 }
                       })
camera


In [42]:
recorder = VideoRecorder(stream=camera)
recorder


In [54]:
camera.close()

Once you are happy with the video, save it using the name of the condition (the bold word in the list above).

In [43]:
file_name="three_fingers_dorsal"
with open(file_name+".webm", 'wb') as out_file:
    out_file.write(recorder.video.value)

In [44]:
# Convert the recorded .webm file into the .mp4 format used by the annotation tool
input_video = cv2.VideoCapture(file_name + ".webm")

output_file_name = file_name + ".mp4"
backend = cv2.CAP_ANY
fourcc_code = cv2.VideoWriter_fourcc(*"H264")  # H264 encoder availability depends on the OpenCV build
fps = 24
frame_size = (640, 480)  # (width, height); must match the recorded frames
output_video = cv2.VideoWriter(output_file_name, backend, fourcc_code, fps, frame_size)

# Read frames until the stream ends; the first frame is kept as well
while True:
    ret, frame = input_video.read()
    if not ret:
        break
    output_video.write(frame)
input_video.release()
output_video.release()

**Optional:** If your face is visible in the video and you want the video to be anonymous, you may anonymize it using the face detection method presented in lab 1. For each detected face, either draw a solid box over that region of the image or blur the region using a Gaussian blur.

In [149]:
# place the code for loading, anonymizing, and saving the video here

Once you are done with Task 1, release the camera.

In [40]:
camera.close()

Save the video using the name of the condition (the bold word in the list above). Create a .zip file containing either all 6 original or all 6 anonymized videos. The name of the .zip file must be of the form `<last name>_<first name>_<student number>_videos.zip`, where you replace <last name>, <first name>, and <student number> with your last name, first name, and student number. Submit the .zip file on studentportalen.
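The packaging step can be done from the notebook with the standard-library `zipfile` module. This is a sketch under the assumption that the six videos sit in one folder and are named `<condition><ext>`; `zip_videos` and its parameters are hypothetical names, and a missing video raises `FileNotFoundError` rather than silently producing an incomplete archive.

```python
import zipfile
from pathlib import Path


def zip_videos(last_name, first_name, student_number, conditions,
               suffix="", ext=".mp4", folder="."):
    """Bundle one video per condition into <last>_<first>_<number>_videos<suffix>.zip."""
    folder = Path(folder)
    zip_path = folder / f"{last_name}_{first_name}_{student_number}_videos{suffix}.zip"
    with zipfile.ZipFile(zip_path, "w") as archive:
        for condition in conditions:
            video_path = folder / f"{condition}{ext}"
            # store the file without its directory prefix
            archive.write(video_path, arcname=video_path.name)
    return zip_path
```

Calling `zip_videos("<last name>", "<first name>", "<student number>", conditions)` covers this task; with `suffix="_annotated", ext="_annotated.mp4"` the same helper would also match the naming scheme of Task 3.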

**Optional:** Fill out the consent form to allow us to use the raw videos to create a benchmark dataset for hand gesture recognition and skeleton tracking from 2D video.

# Task 2 - Annotating the Videos

For this task you will not have to write any code; instead, this notebook provides an annotation tool so that you can work more efficiently. This ensures that everybody has the same format for their annotations and saves you some time in the data processing / cleaning phase.

It will still show you the challenges associated with this step: in most real ML projects, data cleaning and preparation are where you spend the majority of your time.

In [4]:
state = StateManager(conditions, video_format=".mp4")
gui = JupyterGUIFactory(state, conditions)
# below you will see a figure with the first frame of a condition you recorded
# this is NOT the GUI; you will execute the GUI in the next code block


**How to use this annotation tool:**
1. Use the "Video" drop-down menu to select and jump between videos (progress is preserved)
2. Use the "Frames" slider (timeline) to scrub through the video
3. Use the two blue arrow buttons at the bottom left to select the landmark you wish to place
4. Click on the image to place the landmark so that it matches the location in the graphic next to it. This does the following:
    0. (If you know the concept of keyframing for animation, this is what is happening)
    1. A marker for the selected landmark is placed at the clicked position for this frame
    2. The marker position becomes a supporting point for the trajectory of the landmark over time (keyframe)
    3. For frames that aren't keyframes, the marker position is calculated by interpolating linearly between the two nearest keyframes
5. Use the above steps to create markers for all visible landmarks in the first and last frame of the video
6. Use the timeline slider to slowly scrub through the timeline and observe how the position of each marker changes over time
7. If a marker's position doesn't match where it is supposed to be (on the respective joint of the hand), do the following:
    1. Click on the marker to select it
    2. Drag the marker to the correct position in the image
    3. Move the timeline slider back a bit to check whether the marker follows the hand well; if not, repeat these three steps and adjust further
8. Before moving to the next video, scrub through the timeline once more to visually inspect that all markers follow their intended locations.
9. Use this method to annotate all videos
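The keyframe interpolation in step 4 can be illustrated with `numpy.interp`. This is only a sketch of the idea, not the tool's code: it assumes keyframes are stored as a mapping from frame index to `(x, y)` and that frames outside the keyframed range keep the nearest keyframe's position (which is what `np.interp` does by default); the tool in `src.Controller` may handle those edge cases differently. `interpolate_landmark` is a hypothetical name.

```python
import numpy as np


def interpolate_landmark(keyframes, num_frames):
    """Return a (num_frames, 2) array of x/y positions for one landmark.

    keyframes maps frame index -> (x, y). Frames between two keyframes are
    interpolated linearly; frames before the first or after the last keyframe
    are held at that keyframe's position.
    """
    frames = sorted(keyframes)
    xs = [keyframes[f][0] for f in frames]
    ys = [keyframes[f][1] for f in frames]
    all_frames = np.arange(num_frames)
    return np.stack([np.interp(all_frames, frames, xs),
                     np.interp(all_frames, frames, ys)], axis=1)
```

With keyframes at frame 0 `(100, 200)` and frame 10 `(200, 100)`, frame 5 lands halfway at `(150, 150)`. This is why adding a corrective keyframe in step 7 only changes the frames between its two neighbouring keyframes.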
    

In [5]:
gui


**Important:** This step is crucial for good performance of your machine learning model. ML is not black magic, and your model can only ever be as good as the data you use for training. If you are sloppy here, your model will not perform well later. The saying "garbage in, garbage out" is very true.

Once the annotation has been completed for each video, you can generate a .csv file with the marker positions.

Run the two cells below to generate the file in your working directory.

In [18]:
last_name = "Miah"
first_name="Shafi"
student_number=9111131356

In [19]:
state.createCsv(last_name, first_name, student_number, conditions)

**Important:** Please modify the block above, so that the values match your last name, first name and student number. They will be used to generate the .csv file.

Upload the .csv file to studentportalen.

# Task 3 - Annotated Videos

Write code that reads the CSV file you just generated into a numpy array. (You will do the same later when you load the data to train a model.)

Then, load each of the 6 videos from task 1 and perform the following steps (pseudo code):
```
for each video:
    open a new_video file called <video name>_annotated
    for each frame in video:
        place a label into the top left corner that reads "annotated"
        get the corresponding landmark positions from the CSV file
        draw a circle at each (non-missing) landmark position
        for each pair of visible landmarks connected by the skeleton:
            draw a line between the two landmarks using cv2.line
        write the frame into the new_video
    close the new_video
```

Name the resulting videos `<video name>_annotated`, where `<video name>` is the name of the pose, e.g., `fist_palm_annotated` for the annotated video of the fist facing the camera palm side. Then, create a .zip file named `<last name>_<first name>_<student number>_videos_annotated.zip`, replacing the tags as you did in task 1, and upload it to studentportalen.
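For the first step (reading the CSV into numpy), a sketch that converts the coordinates to floats instead of keeping everything as strings. It assumes the layout that the solution cell below relies on: column 0 holds the video file name, column 1 the frame index, and x/y coordinate pairs start at column 4. Empty cells are mapped to NaN. `load_annotations` and `first_coord_col` are hypothetical names.

```python
import csv
import numpy as np


def load_annotations(path, first_coord_col=4):
    """Read the annotation CSV into numpy arrays.

    Returns (headers, names, frames, coords), where coords has shape
    (rows, landmarks, 2). Assumes x/y pairs from `first_coord_col` onward.
    """
    with open(path, newline="") as f:
        reader = csv.reader(f)
        headers = next(reader)
        rows = list(reader)
    names = np.array([row[0] for row in rows])
    frames = np.array([int(float(row[1])) for row in rows])
    coords = np.array([[float(v) if v.strip() else np.nan
                        for v in row[first_coord_col:]] for row in rows])
    coords = coords.reshape(len(rows), -1, 2)  # (rows, landmarks, xy)
    return headers, names, frames, coords
```

A single frame's landmarks can then be selected with a boolean mask, e.g. `coords[(names == "fist_palm.mp4") & (frames == 0)]`.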


In [23]:
import csv

# Read the annotation CSV into a numpy array of strings.
# Column 0 holds the video file name, column 1 the frame index,
# and columns 4-83 the x/y coordinates of 40 landmarks.
data_path = 'Miah_Shafi_9111131356_annotations.csv'
with open(data_path, 'r', newline='') as f:
    reader = csv.reader(f, delimiter=',')
    headers = next(reader)
    csv_data = np.array(list(reader))

# Landmark k is stored in columns 4 + 2k (x) and 5 + 2k (y).
# Landmarks 0-19 form the palm-side skeleton and 20-39 the dorsal-side skeleton.
# Each skeleton has a root point, 3 thumb points, and 4 points for each other finger.
def hand_edges(root):
    # root -> thumb chain
    edges = [(root, root + 1), (root + 1, root + 2), (root + 2, root + 3)]
    # root -> each of the four remaining finger chains
    for start in range(root + 4, root + 20, 4):
        edges += [(root, start), (start, start + 1), (start + 1, start + 2), (start + 2, start + 3)]
    return edges

skeleton = hand_edges(0) + hand_edges(20)

def is_visible(point):
    # a landmark with a 0 coordinate is treated as not annotated
    return point[0] != 0 and point[1] != 0

# Create the annotated videos
for video_name in conditions:
    input_video = cv2.VideoCapture(video_name + ".mp4")
    output_file_name = video_name + "_annotated.mp4"
    backend = cv2.CAP_ANY
    fourcc_code = cv2.VideoWriter_fourcc(*"H264")
    fps = 24
    frame_size = (640, 480)
    output_video = cv2.VideoWriter(output_file_name, backend, fourcc_code, fps, frame_size)
    video_rows = csv_data[csv_data[:, 0] == video_name + ".mp4"]

    counter = 0  # index of the current frame, starting with the first one
    while True:
        ret, frame = input_video.read()
        if not ret:
            break
        cv2.putText(frame, "annotated", (5, 50), cv2.FONT_HERSHEY_SIMPLEX, 2, (128, 0, 0), 2)

        data_frame = video_rows[video_rows[:, 1] == str(counter)]
        if len(data_frame) > 0:
            # 40 (x, y) pairs, truncated to integer pixel positions
            points = [(int(float(data_frame[0][c])), int(float(data_frame[0][c + 1])))
                      for c in range(4, 84, 2)]
            # draw a circle at each annotated landmark
            for point in points:
                if is_visible(point):
                    cv2.circle(frame, point, 3, (128, 0, 50), 5)
            # connect visible landmarks along the skeleton
            for a, b in skeleton:
                if is_visible(points[a]) and is_visible(points[b]):
                    cv2.line(frame, points[a], points[b], (128, 0, 10), 1)
        output_video.write(frame)
        counter += 1
    input_video.release()
    output_video.release()

