# Assignment 1

__Note__: *For general instructions about Assignment 1 as well as deadlines, please refer to the accompanying PDF document on Studium.*

### Setup

This notebook helps you complete Assignment 1. It contains three sections (one for each task of the assignment) that guide you through the process of completing it. First, import the required libraries:

In [1]:
%matplotlib widget

In [2]:
import cv2
import numpy as np
from ipywebrtc import CameraStream, VideoRecorder
from src.app import AnnotationApp
from src.landmark_names import landmark_names
from imageio import v3 as iio
from numpy.lib.stride_tricks import as_strided
import matplotlib.pyplot as plt
from src.skbot_replacement import linear_trajectory
from zipfile import ZipFile
from IPython.display import YouTubeVideo

In [3]:
gestures = [
    "ASL_letter_A",
    "ASL_letter_B",
    "ASL_letter_C",
    "ASL_letter_L",
    "ASL_letter_R",
    "ASL_letter_U",
]

### Metadata

This notebook generates the files you have to upload to complete Task 1 and Task 2. To ensure that these files are named correctly, update the variables below with the appropriate values:

In [4]:
# update the variables below to the correct values
last_name = "HO"
first_name = "VU"
student_number = "20010502-T018"
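
As a quick sanity check, the sketch below shows how these three values end up in the names of the generated files. The helper `submission_filenames` is hypothetical (it is not part of the provided `src` package), the patterns are copied from the snippets used later in this notebook, and the example values are made up.

```python
def submission_filenames(last_name, first_name, student_number):
    """Return the names of the files this notebook generates (hypothetical helper)."""
    prefix = f"{last_name}_{first_name}_{student_number}"
    return {
        "videos_zip": f"{prefix}_videos.zip",        # Task 1: anonymized videos
        "annotations_csv": f"{prefix}.csv",          # Task 2: annotations
        "annotated_zip": f"{prefix}_annotated.zip",  # Task 3: annotated videos
    }


# Made-up example values, only for illustration.
names = submission_filenames("Doe", "Jane", "12345678")
for kind, filename in names.items():
    print(f"{kind}: {filename}")
```

If one of the printed names looks wrong, fix the variables above before you continue, since every later step reuses them.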

# Task 1 - Anonymized Videos

### Recording the Gestures

Task 1 asks you to record a total of 6 videos of your hand (6 seconds per video), with each video showing a different gesture. The required gestures change from year to year, so please refer to the instruction document for Assignment 1 for the exact gestures that you will have to record.

Currently, the overarching theme is American Sign Language (ASL). Below you can find a visualization of the ASL alphabet:

![ASL Alphabet](https://t3.ftcdn.net/jpg/02/42/65/88/360_F_242658839_l38u5LVWSROrFMLQasxLJgl6sk8yAq9m.jpg)
(Source: Adobe Stock)

**Important:** While recording the videos, move your hand around slowly. To do so, pick a constant direction (in 3D) and move your hand slowly in that direction. This creates more diverse training data. The hand can be anywhere on the screen, arbitrarily rotated, and at any distance from the camera (as long as it is still clearly identifiable as a hand). The broader the set of examples, the more robust the trained model will be.

In [5]:
YouTubeVideo("VSu_Nkn7TzY")

_Example 1_: Place your hand on the left side of the image, show the gesture and start recording. Then - over the course of 6 seconds - slowly move your hand to the right side of the image.

In [6]:
YouTubeVideo("ZEdfzmMntHs")

_Example 2_: Place your hand at the bottom of the image, show the gesture and start recording. Then - over the course of 6 seconds - move your hand up towards the top of the image.

*__Note__: We will aggregate the videos from all assignments into a large dataset that you will use during the final project. For this you will anonymize the recorded videos in the next step by blurring your face. If you feel that this level of anonymization is insufficient, you can avoid recording your face or use another means of anonymization.*

In [5]:
camera = CameraStream(constraints=
                      {'facing_mode': 'user',
                       'audio': False,
                       'video': { 'width': 640, 'height': 480}
                       })
camera


In [6]:
recorder = VideoRecorder(stream=camera)
recorder


In [7]:
camera.close()

Once you are happy with the video, save it as an MP4 file. The filename should be the name of the gesture that has been recorded. For example, if you recorded the letter A, then the filename would be `ASL_letter_A.mp4`.

In [60]:
gesture_name = "ASL_letter_A"

In [61]:
with iio.imopen(recorder.video.value, "r", format="ffmpeg") as video_in:
    frames = video_in.read()

iio.imwrite(f"{gesture_name}.mp4", frames, fps=30)

**Please review the videos before you continue** and ensure that:

- The name of the file matches the gesture shown in the video
- The video is in MP4 format
- The gesture is shown for at least 6 seconds
- The hand showing the gesture is not occluded by any objects
- The hand showing the gesture does not occlude your face (if present)
- The video is anonymized

If your video does not satisfy the above, __we will reject it__ and you will have to record it again (and redo the annotations for that video).
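
Two of the checks above (file name and duration) can be automated. The following is a minimal sketch: both helper functions are hypothetical, and the duration check assumes the video is played back at 30 frames per second, which is the `fps` value used when the video is written in this notebook.

```python
from pathlib import Path


def long_enough(num_frames, fps=30, min_seconds=6):
    """True if a clip with num_frames frames at fps lasts at least min_seconds."""
    return num_frames / fps >= min_seconds


def name_matches(filename, gestures):
    """True if the file is an .mp4 whose name (without extension) is a known gesture."""
    path = Path(filename)
    return path.suffix == ".mp4" and path.stem in gestures


demo_gestures = ["ASL_letter_A", "ASL_letter_B"]  # shortened list for the demo
print(long_enough(180))   # 180 frames at 30 fps
print(long_enough(150))   # 150 frames at 30 fps
print(name_matches("ASL_letter_A.mp4", demo_gestures))
print(name_matches("letter_A.mov", demo_gestures))
```

In the notebook you could pass `len(frames)` as `num_frames`. Occlusion and anonymization still have to be checked by watching the video.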

### Anonymizing the Video

Run the snippet below. It uses the face detection method presented in Lab 1 to track your face and then applies a Gaussian blur to it.

In [8]:
for condition in gestures:
    frames = iio.imread(f"{condition}.mp4")
    face_cascade = cv2.CascadeClassifier('frontal_face_features.xml')

    # find the face in each frame (only keep exact matches)
    timestamps = list()
    face_coords = list()
    for idx, frame in enumerate(frames):
        faces = face_cascade.detectMultiScale(frame)
        if len(faces) == 1:
            timestamps.append(idx)
            face_coords.append(faces[0])
            
    if len(timestamps) == 0:
        # no faces detected
        iio.imwrite(f"{condition}_anonymous.mp4", frames, fps=30)
        continue
    
    # Track the faces across frames and smooth the track
    if timestamps[0] != 0:
        timestamps.insert(0, 0)
        face_coords.insert(0, face_coords[0])
    if timestamps[-1] != len(frames):
        timestamps.append(len(frames))
        face_coords.append(face_coords[-1])
    t_control = np.stack(timestamps)
    face_coords = np.stack(face_coords)  
    face_coords_interpolated = linear_trajectory(np.arange(len(frames)), face_coords, t_control=t_control)
    face_coords_interpolated = np.round(face_coords_interpolated).astype(int)
    
    old_shape = face_coords_interpolated.shape
    shape = (old_shape[0] - 5, old_shape[1], 5)
    old_strides = face_coords_interpolated.strides
    strides = (*old_strides, old_strides[0])    
    face_coords_smoothed = np.round(as_strided(face_coords_interpolated, shape, strides).mean(-1)).astype(int)

    # blur the tracked area
    # 1. extract the face ROI in each frame
    # 2. incrementally apply mild blurs to incrementally larger areas
    #    (this gives the final result a smoother look)
    for frame, pos in zip(frames, face_coords_smoothed): 
        for partition in range(3, 10):
            inner = pos.copy()
            inner[0] = pos[0] + 1/partition * pos[2]
            inner[1] = pos[1] + 1/partition * pos[3]
            inner[2] = (partition-2)/partition * pos[2]
            inner[3] = (partition-2)/partition * pos[3]

            (x, y, w, h) = inner
            roi = frame[y:y+h, x:x+w]   
            roi = cv2.GaussianBlur(roi,(21, 21), 50)
            frame[y:y+h, x:x+w] = roi

        x, y, w, h = pos
        roi = frame[y:y+h, x:x+w]   
        roi = cv2.GaussianBlur(roi,(11, 11), 50)
        frame[y:y+h, x:x+w] = roi

    # write the frames to disk
    num_frames = len(face_coords_smoothed)
    iio.imwrite(f"{condition}_anonymous.mp4", frames[:num_frames], fps=30)
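
The tracking step in the snippet above does two things: it fills in the frames without a detection, and it averages the box over neighboring frames. The sketch below reproduces this idea on toy data. It is not the notebook's implementation: it assumes that `linear_trajectory` interpolates linearly between control points (as its name suggests) and uses `np.interp` instead, and it uses `sliding_window_view` in place of `as_strided`. The box values are made up.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

# Toy detections: (x, y, w, h) boxes found only in frames 0, 4 and 8.
t_control = np.array([0, 4, 8])
boxes = np.array([[10, 20, 50, 50],
                  [18, 20, 50, 50],
                  [26, 28, 50, 50]], dtype=float)
num_frames = 9

# Fill in the frames without a detection, one box coordinate at a time.
t = np.arange(num_frames)
interpolated = np.stack(
    [np.interp(t, t_control, boxes[:, k]) for k in range(boxes.shape[1])],
    axis=1,
)

# Moving average over 5 consecutive frames.
# windows has shape (num_frames - 4, 4, 5): one window of 5 values per coordinate.
windows = sliding_window_view(interpolated, 5, axis=0)
smoothed = np.round(windows.mean(axis=-1)).astype(int)

print(interpolated[:3])
print(smoothed.shape)
```

Note that smoothing shortens the track: with a window of 5 frames, the last few frames have no smoothed box, which is why the snippet above writes only `frames[:num_frames]` to disk.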

Inspect the anonymized videos and make sure that you are happy with the result of the anonymization. Also check that the gesture is still clearly visible and not affected by the anonymization. If not, please re-record the video in question and repeat the anonymization process.
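
If the blur looks uneven, it helps to know which regions the anonymization snippet blurs. It visits a series of boxes centered inside the detected face box, from small to large, so the center of the face is blurred most often. The sketch below only computes these boxes. The helper `nested_rois` is hypothetical, the example box is made up, and `int()` is used to mimic the truncation that happens when the notebook assigns floats to an integer array.

```python
def nested_rois(pos, partitions=range(3, 10)):
    """Return the shrunken (x, y, w, h) boxes visited by the blur loop."""
    x, y, w, h = pos
    rois = []
    for p in partitions:
        rois.append((
            int(x + w / p),        # shift right by 1/p of the width
            int(y + h / p),        # shift down by 1/p of the height
            int((p - 2) / p * w),  # keep (p - 2)/p of the width
            int((p - 2) / p * h),  # keep (p - 2)/p of the height
        ))
    return rois


face_box = (100, 80, 90, 60)  # made-up detection: x, y, width, height
for roi in nested_rois(face_box):
    print(roi)
```

Every printed box lies inside `face_box`, and the boxes grow as the partition value increases.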

### Prepare Task 1 for submission

After you have generated all the videos, group them into a zip file that can be uploaded to Studium. For this, make sure you have updated the values of the variables defined in the **Metadata** section above. Then, run the snippet below to generate the ZIP file.

In [5]:
with ZipFile(f"{last_name}_{first_name}_{student_number}_videos.zip", "w") as zipy:
    for condition in gestures:
        zipy.write(f"{condition}_anonymous.mp4", f"{condition}.mp4")
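
The second argument of `ZipFile.write` sets the name of the file inside the archive, so the videos are stored without the `_anonymous` suffix. The sketch below demonstrates this with empty placeholder files in a temporary directory. The archive name and the shortened gesture list are made up for the demo.

```python
import tempfile
from pathlib import Path
from zipfile import ZipFile

demo_gestures = ["ASL_letter_A", "ASL_letter_B"]  # shortened list for the demo

with tempfile.TemporaryDirectory() as tmp:
    tmp = Path(tmp)
    # Stand-ins for the anonymized videos (empty placeholder files).
    for condition in demo_gestures:
        (tmp / f"{condition}_anonymous.mp4").write_bytes(b"")

    archive = tmp / "Doe_Jane_12345678_videos.zip"
    with ZipFile(archive, "w") as zipy:
        for condition in demo_gestures:
            # first argument: file on disk, second argument: name inside the archive
            zipy.write(tmp / f"{condition}_anonymous.mp4", f"{condition}.mp4")

    # Re-open the archive and list what it contains.
    with ZipFile(archive) as zipy:
        members = zipy.namelist()

print(members)
```

Listing the contents of your own ZIP file with `namelist()` in the same way is a quick check that all six gestures are included before you upload it.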

# Task 2 - Annotating the Videos

Annotate the videos that you have created in Task 1. This task will show you some of the challenges associated with "real world" ML projects, and it tends to be the most disliked step in an ML project. It is, however, one of the most crucial ones: the better the quality of your annotations, the better your resulting model tends to be. To help you along in this process, we provide you with an annotation tool.

**Important:** This step is crucial for good performance of your machine learning model. ML is not black magic, and your model can only ever be as good as the data you use for training. If you are sloppy here, you will not have high performance later. The saying "trash in, trash out" is very true.

#### How to annotate the gestures:

*__Note__: If the annotation is lacking in accuracy, we will ask you to re-annotate the video before you can pass Assignment 1.*

0. Only annotate <u>visible</u> joints.
1. Go to the first frame of the gesture.
2. Place the respective joint markers onto each joint according to the figure on the right.
3. Go to the last frame of the gesture.
4. Move the respective joint markers to their correct position in the last frame.
5. Go back to the first frame of the gesture.
6. Slowly move the "Frames" slider forward. Whenever you see that a marker deviates from the hand's joint, adjust it so that it follows the hand again.
7. After all videos are annotated, click the "Save CSV" button in the top left corner to create the CSV containing the annotations. It will be placed next to the notebook.
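
The steps above work because you only place markers on a few keyframes and correct them where they drift. The sketch below illustrates this idea under the assumption that marker positions between two keyframes are interpolated linearly. This is an assumption for illustration only and may differ from what the annotation tool does internally. The keyframe values are made up.

```python
import numpy as np

# Hypothetical keyframes for one joint: frame index -> (x, y) in pixels.
keyframes = {0: (100.0, 200.0), 60: (160.0, 200.0), 180: (400.0, 260.0)}

frames = np.array(sorted(keyframes))
xs = np.array([keyframes[f][0] for f in frames])
ys = np.array([keyframes[f][1] for f in frames])


def marker_position(frame):
    """Linearly interpolated (x, y) marker position at the given frame."""
    return float(np.interp(frame, frames, xs)), float(np.interp(frame, frames, ys))


print(marker_position(30))   # halfway between keyframes 0 and 60
print(marker_position(120))  # halfway between keyframes 60 and 180
```

Under this assumption, a hand that moves at constant speed in a constant direction needs very few corrections, which is one more reason to move your hand slowly and steadily while recording.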

#### How to use this annotation tool:

1. Use the "Video" drop-down menu to select and jump between gestures (progress is preserved).
2. Move the "Frames" slider at the bottom to select individual frames (progress is preserved).
3. Click on the figure on the right, use the blue arrows on the bottom, or use the "Joint" drop-down menu to select a joint to annotate.
4. Click on the image on the left to place the respective joint marker. This will automatically jump to the next joint.
5. The marker matching the currently selected joint will turn red and you can drag it around in the image.
6. Use the blue buttons with label "Keyframe" to jump to the next (or previous) frame that had the current joint marker placed on it explicitly.
7. Use the red "🗑️ Keyframe" button to delete the annotation of the current joint at the current frame. This will have no effect if the current frame is not a keyframe.
8. Use the red "Reset Joint" button to delete all keyframes for the current joint.

#### Example Video of an Annotation

In [64]:
YouTubeVideo("9UYYc_L3E34")

#### The Annotation Tool

In [6]:
%%capture
gui = AnnotationApp(gestures, f"{last_name}_{first_name}_{student_number}")

In [7]:
gui


### Information Sheet and Consent Form

Please review (and sign) the __Information Sheet and Consent Form__ document that was distributed with this assignment.


# Task 3 - Annotated Videos

Visualize the annotations (CSV from Task 2) on top of each gesture (video from Task 1). To do so, draw a circle onto each frame at the position of each visible joint. Then, draw lines that connect all pairs of visible joints that are adjacent according to the skeleton. This should result in a skeleton drawn onto the frame, similar to the skeleton you saw while annotating the videos. Finally, add the text "annotated" to the top left corner of the frame and save the frames of each gesture as a new video.

Here is the program/algorithm in pseudo-code:

```
for each video in gestures:
    for each frame in video:
        place a label into the top left corner that reads "annotated"
        get the corresponding joint positions from the CSV file
        draw a circle at each (non-missing) joint position
        for each pair of visible joints connected by the skeleton:
            draw a line between the two joints using cv2.line
    create a new video file called <gesture>_annotated.mp4 from the annotated frames
```
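
The least obvious step in the pseudo-code is finding the pairs of visible joints that the skeleton connects. Here is a minimal sketch of that step. It assumes joints are named `root` or `<finger>_<index>` with indices starting at 1, where joint 1 of each finger attaches to the root and joint n attaches to joint n-1 of the same finger. This naming scheme is an assumption, so check it against the joint names in your CSV. The helper `skeleton_edges` is hypothetical.

```python
def skeleton_edges(visible_joints):
    """Pairs of visible joints that are adjacent in the (assumed) hand skeleton."""
    visible = set(visible_joints)
    edges = []
    for joint in visible_joints:
        if joint == "root":
            continue  # the root has no parent
        finger, index = joint.rsplit("_", 1)
        index = int(index)
        parent = "root" if index == 1 else f"{finger}_{index - 1}"
        if parent in visible:  # only connect joints that are both visible
            edges.append((parent, joint))
    return edges


# index_2 is missing here, so index_3 stays unconnected.
joints = ["root", "thumb_1", "thumb_2", "index_1", "index_3"]
print(skeleton_edges(joints))
```

Each returned pair can then be drawn with `cv2.line` once you have looked up the pixel positions of the two joints.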


In [5]:
import cv2
from imageio import v3 as iio

In [13]:
import csv

font = cv2.FONT_HERSHEY_SIMPLEX
color1 = (255, 0, 0)      # joint markers and label
color2 = (255, 255, 255)  # skeleton lines

# read the annotations once and group the rows by (video, frame)
# columns used: row[1] = video, row[2] = frame, row[3] = joint, row[4] and row[5] = position
annotations = {}
with open(f"{last_name}_{first_name}_{student_number}.csv", newline='') as f:
    for row in csv.reader(f):
        if row[0] == "ID":
            continue  # skip the header row
        key = (row[1], int(float(row[2])))
        annotations.setdefault(key, []).append(row)

for name in gestures:
    frames = iio.imread(f"{name}.mp4")
    for idx, frame in enumerate(frames):
        # positions of the joints placed in this frame; a value of 0 in row[4]
        # is treated as a missing joint, and points are passed to cv2 as (row[5], row[4])
        points = {}
        for row in annotations.get((name, idx), []):
            if int(float(row[4])) != 0:
                points[row[3]] = (int(float(row[5])), int(float(row[4])))

        for point in points.values():
            cv2.circle(frame, point, 5, color1, -1)

        # connect each joint to its parent: joint 1 of a finger attaches to the
        # root, joint n attaches to joint n-1 of the same finger
        for joint, point in points.items():
            if joint == "root":
                continue
            finger, index = joint.split("_")
            index = int(float(index))
            parent = "root" if index == 1 else f"{finger}_{index - 1}"
            if parent in points:  # skip the line if the parent joint is not visible
                cv2.line(frame, points[parent], point, color2, 2)

        cv2.putText(frame, 'annotated', (10, 50), font,
                    1, color1, 1, cv2.LINE_AA)
    iio.imwrite(f"{name}_annotated.mp4", frames, fps=30)
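
Looking up the joints of a frame is easiest when the CSV is read once and grouped by video and frame. The sketch below shows this grouping on a tiny in-memory stand-in for the annotation file. The header names (`ID`, `video`, `frame`, `joint`, `x`, `y`) are assumptions, so adapt them to the header of your own CSV. The values are made up.

```python
import csv
import io
from collections import defaultdict

# Tiny stand-in for the annotation CSV (assumed column layout).
csv_text = """ID,video,frame,joint,x,y
0,ASL_letter_U,0,root,120.0,200.0
1,ASL_letter_U,0,index_1,130.5,180.0
2,ASL_letter_U,1,root,121.0,201.0
3,ASL_letter_A,0,root,50.0,60.0
"""

annotations = defaultdict(dict)  # (video, frame) -> {joint: (x, y)}
for row in csv.DictReader(io.StringIO(csv_text)):
    key = (row["video"], int(float(row["frame"])))
    annotations[key][row["joint"]] = (int(float(row["x"])), int(float(row["y"])))

print(annotations[("ASL_letter_U", 0)])
```

With the real file, replace `io.StringIO(csv_text)` with the opened CSV. Each frame then needs a single dictionary lookup instead of a full pass over the file.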


Once you are done, combine the annotated videos into a ZIP file using the snippet below.

In [14]:
with ZipFile(f"{last_name}_{first_name}_{student_number}_annotated.zip", "w") as zipy:
    for condition in gestures:
        zipy.write(f"{condition}_annotated.mp4")

## Upload to Studium

Once you are done, upload the following files to Studium:

1. The filled out and signed *Information Sheet and Consent Form*
2. The ZIP file with the anonymous videos (`<last name>_<first name>_<student number>_videos.zip`)
3. The CSV file containing the annotations (`<last name>_<first name>_<student number>.csv`)
4. The ZIP file with the annotated videos (`<last name>_<first name>_<student number>_annotated.zip`)