# Human Attention Annotation: Painting by Hand

## Sample Selection

Each annotator has up to 12 snippets to annotate, each 5 seconds long. These have been selected from videos that are:

1. In the test fold for the learning algorithm, and
2. In the unseen fold for the corresponding annotator.

The snippets have been chosen for each task (people, eggs, drums) and each variable (attending, participating), picking where possible two samples that the remaining annotators agreed on.

Because of annotator disagreements in the original ELAN phase, not every annotator will have to cover 12 samples in this painting phase. Just work through the examples marked with your initials.
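
As a rough illustration of the selection rules above (3 tasks × 2 variables × 2 samples gives the maximum of 12), here is a minimal sketch. The record fields (`fold`, `seen_by`, `agreed`), the sample data, and the helper `select_for` are hypothetical and are not taken from the actual selection script.

```python
from collections import defaultdict

# Hypothetical candidate records; field names and values are assumptions.
candidates = [
    {"snippet": "s01", "task": "people", "variable": "attending",
     "fold": "test", "seen_by": {"AB"}, "agreed": True},
    {"snippet": "s02", "task": "people", "variable": "attending",
     "fold": "test", "seen_by": set(), "agreed": False},
    {"snippet": "s03", "task": "people", "variable": "attending",
     "fold": "test", "seen_by": set(), "agreed": True},
    {"snippet": "s04", "task": "people", "variable": "attending",
     "fold": "test", "seen_by": set(), "agreed": True},
    {"snippet": "s05", "task": "eggs", "variable": "participating",
     "fold": "train", "seen_by": set(), "agreed": True},
    {"snippet": "s06", "task": "drums", "variable": "participating",
     "fold": "test", "seen_by": set(), "agreed": False},
]


def select_for(annotator, candidates, per_cell=2):
    """Pick up to `per_cell` snippets per (task, variable) for one annotator."""
    # Keep only test-fold snippets that this annotator has not seen before.
    eligible = [
        c for c in candidates
        if c["fold"] == "test" and annotator not in c["seen_by"]
    ]
    # Stable sort: snippets the other annotators agreed on come first.
    eligible.sort(key=lambda c: not c["agreed"])

    picked = defaultdict(list)
    for c in eligible:
        cell = (c["task"], c["variable"])
        if len(picked[cell]) < per_cell:
            picked[cell].append(c["snippet"])
    return dict(picked)


selection = select_for("AB", candidates)
```

In this toy data, annotator `AB` ends up with fewer than two snippets in some cells, which mirrors why not everyone reaches 12 samples.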

## Instructions

1. Select a sample marked with your initials from the drop-down menu.
2. Watch the 5s snippet (use the play button).
3. Classify the sample using the label buttons.
4. Paint on the video using your mouse to select the most important moments and image details for your decision (see the **Controls** section below).
5. **DON'T FORGET TO SAVE!**
6. Repeat until all videos are annotated.
7. Send the data to Marc, as usual.

## Controls

* **Primary button press:** Add attention to an area you deem important. Attention does not accumulate while the mouse is static, so move the mouse slightly to increase intensity.
* **Secondary button press:** Delete attention from an area. As with the primary button, you need to keep moving the mouse to keep erasing.
* **Mouse wheel:** Change the size of the "paint brush" (shown as the on-screen circle).

## Attention Target

The learning algorithms have been trained on *binarized* versions of our annotations: they only consider whether the child is participating at all vs. not participating or, analogously, whether the child is attending at all vs. not attending. From the fully trained networks, we use *attention* algorithms to mark different parts of the video as more or less relevant for the final decision.
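
The binarization described above can be sketched as follows. The ordinal scale used here (0 meaning "not at all", higher values meaning increasing degrees) is an assumption for illustration; the actual label encoding may differ.

```python
def binarize(level: int) -> int:
    """Collapse an ordinal annotation level to 0 (none) or 1 (any amount)."""
    return int(level > 0)


# Hypothetical annotation levels for four snippets.
levels = [0, 1, 2, 3]
binary_labels = [binarize(level) for level in levels]
```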

Our painted annotations are a human baseline to compare against the attention algorithm. Hence, we should mark any moments and/or frame regions that we, as humans with some expertise with this dataset, consider important for determining whether the child is attending (resp. participating) or not.
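
This notebook does not specify how the human and machine maps are compared. Purely as an illustration of what such a comparison could look like, the sketch below normalizes two attention maps (flattened to lists) and computes their histogram intersection. The function names and the choice of measure are assumptions, not the project's method.

```python
def normalize(values):
    """Scale non-negative attention values so that they sum to 1."""
    total = sum(values)
    if total == 0:
        return [0.0 for _ in values]
    return [v / total for v in values]


def overlap(human, machine):
    """Histogram intersection: 0 for disjoint maps, 1 for identical shapes."""
    h, m = normalize(human), normalize(machine)
    return sum(min(a, b) for a, b in zip(h, m))


# Toy maps over four pixels; the absolute scale does not matter after normalizing.
same_shape = overlap([1, 1, 0, 0], [2, 2, 0, 0])
partial = overlap([1, 1, 0, 0], [0, 1, 1, 0])
disjoint = overlap([1, 1, 0, 0], [0, 0, 1, 1])
```

Because both maps are normalized first, painting more thickly overall does not change the score; only where the attention is placed matters in this particular measure.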

In case of doubt, look at the sample information in the drop-down menu to determine which variable you should paint attention for.

In [1]:
%matplotlib widget

In [2]:
import os
from IPython.display import Video
from pathlib import Path
from local.attention_painting import Annotator
from local.navigation import get_repo_root

In [3]:
script_dir = Path(os.getcwd())
repo_root = get_repo_root(script_dir)
os.chdir(repo_root)

print(f"Repo Root  (absolute)         : {repo_root}")
print(f"Script Dir (relative to repo) : {script_dir.relative_to(repo_root)}")

Repo Root  (absolute)         : /home/marcfraile/Documents/PhD/self-study/infant-engagement
Script Dir (relative to repo) : scripts/human_attention


In [4]:
SNIPPET_DURATION: float = 5.0  # seconds

In [5]:
vid_root        = Path("data/processed/video/")
annotation_root = Path("data/processed/human_attention/")
snippet_file    = Path("data/processed/human_attention/candidate_snippets.csv")

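# Fail early with a readable message if the expected data layout is missing.
assert vid_root.is_dir(), f"Missing video directory: {vid_root}"
assert annotation_root.is_dir(), f"Missing annotation directory: {annotation_root}"
assert snippet_file.is_file(), f"Missing snippet list: {snippet_file}"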

## Machine Attention Example

Below you can see an example of a machine attention algorithm (*guided Grad-CAM*) targeting a positive "participating" example (which was correctly classified as positive). This gives us a rough reference for how much and how thickly to paint attention wherever we consider it relevant.

In [None]:
Video(url="machine_attention.mp4", width=160*3, height=160*3)

## Human Attention Example

Below you can see an example annotation that I painted. The target here was "attending". Following what we discussed in the ELAN annotation phase, I treated the fact that the child was participating (arm movement) as secondary evidence (painted with low intensity). The child's gaze towards the experimenter near the end was treated as strong evidence of attention.

For this, I used the default brush size, but feel free to adjust it to your liking. I also had some amount of painting on every frame, but it might make sense to leave most frames empty (more similar to the machine example above).

In [None]:
Video(url="human_attention.mp4", width=160*3, height=160*3)

## Annotation Tool

Time to paint!

In [6]:
annotator = Annotator(SNIPPET_DURATION, vid_root, annotation_root, snippet_file)
annotator.display()

VBox(children=(VBox(children=(Canvas(capture_scroll=True, footer_visible=False, header_visible=False, layout=L…