# Gestural Backchannel

In addition to verbal backchannels such as humming, we believe that certain gestures indicate the listener's attention to (and possibly their sentiment towards) the speech. Although these gestures may differ across cultures, almost all of our participants are native to the United States, so we only consider backchannel gestures in American culture. Specifically, we consider a participant to be giving a **gestural backchannel** at a given frame if the participant is either *nodding* or *shaking their head*.

Note that we do not distinguish between types of gestural backchannel. In particular, we do not treat nodding and headshaking as separate features, even though the detection code below stores them in separate columns. This is because we are primarily concerned with how frequently the listener responds to the speaker in *some* way, rather than *in what way* the listener responds. (The type of response may be a helpful feature in other tasks, such as sentiment analysis.) Thus, we track the frequency of backchannels without their types.
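A minimal sketch of this bookkeeping, assuming per-frame 0/1 indicators for nodding and headshaking of equal length. The helper name `backchannel_frequency` and the choice to report frequency as the fraction of frames are illustrative assumptions, not part of the original pipeline.

```python
import numpy as np


def backchannel_frequency(nodding, headshaking):
    """Fraction of frames in which a participant gives any gestural backchannel.

    Hypothetical helper: `nodding` and `headshaking` are equal-length sequences
    of 0/1 per-frame indicators. The two gestures are merged with a logical OR,
    so the type of backchannel is deliberately discarded.
    """
    nodding = np.asarray(nodding, dtype=bool)
    headshaking = np.asarray(headshaking, dtype=bool)
    if nodding.shape != headshaking.shape:
        raise ValueError("indicator sequences must have the same length")
    gestural_backchannel = nodding | headshaking
    if gestural_backchannel.size == 0:
        return 0.0
    return float(gestural_backchannel.mean())


# Frames 1-2 are nodding and frame 4 is headshaking: 3 of 6 frames are backchannels.
print(backchannel_frequency([0, 1, 1, 0, 0, 0], [0, 0, 0, 0, 1, 0]))  # 0.5
```

Merging with OR before counting means a frame flagged as both nodding and headshaking is counted once, which matches the decision to ignore the type of response.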

## Detecting Head Motions

Our collected raw data includes a triple $(\text{roll}, \text{pitch}, \text{yaw})$ of angles representing each participant's head direction. The roll angle captures tilting the head towards a shoulder, the pitch angle captures lowering and raising the head (as in a nodding motion), and the yaw angle captures turning the head to the left and right. We use these angles to detect nodding and headshaking.

We first need to define the head motions of interest in terms of the three head direction angles. Intuitively, in a nodding motion the pitch angle fluctuates before eventually stabilizing, and the same holds for the yaw angle in a headshaking motion. This is a common characterization of head motions in related work, and we adopt it here.

To detect head motions in our data, we adopt a hybrid of programmatic and manual detection of nodding and headshaking. For the programmatic portion, we employ (with modifications) the straightforward algorithms proposed in this paper. Although that paper deals with more primitive data (2D images of participants' faces rather than 3D head angles), we found that its approach is simple yet captures the essence of the motions well, so we adapted it to our data.

The algorithms for detecting nodding and headshaking are nearly identical except for the head angle used, so below we introduce only the headshaking detection algorithm. Replacing every occurrence of "yaw angle" in it with "pitch angle" yields the nodding detection algorithm.

## Detection Algorithm

Our headshaking detection algorithm, adapted with modifications from the one proposed in this paper, rests on the following intuitions:
- the start and the end of a headshaking motion must be "stable";
- the head must reach a leftmost angle and a rightmost angle during a headshaking motion;
- the difference between the leftmost angle and the rightmost angle must be somewhat "significant".

Following these intuitions, we now describe the full programmatic procedure more rigorously. (Footnote: Although our data contains three participants, we define the following terms and procedures for a single participant; the data is processed one participant at a time, for a total of three participants.)


**Definition**: The **neighboring frames** of a given frame are all frames that are at most two frames away from it. More precisely, let $n$ denote the total number of frames (Footnote: frame indices start at 0). For each integer $i$ with $2 \leq i \leq n - 3$, the **neighboring frames** of the $i$-th frame $F_i$ form the set $\{F_{i-2}, F_{i-1}, F_i, F_{i+1}, F_{i+2}\}$. For the remaining valid indices (that is, $i \in \{0, 1, n-2, n-1\}$), some elements of this set would have invalid indices; we simply discard them and keep the valid elements.
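The definition translates directly into index arithmetic. Below is a small sketch assuming 0-indexed frames as in the footnote; the function name `neighboring_indices` and the `radius` parameter (fixed to 2 by the definition) are illustrative assumptions.

```python
def neighboring_indices(i, n, radius=2):
    """Indices of the neighboring frames of frame i among n frames (0-indexed).

    Hypothetical helper: frames at most `radius` positions away from i are
    included, and indices outside 0..n-1 are discarded, as in the definition.
    """
    if not 0 <= i < n:
        raise IndexError(f"frame index {i} out of range for {n} frames")
    return list(range(max(0, i - radius), min(n - 1, i + radius) + 1))


print(neighboring_indices(5, 10))  # [3, 4, 5, 6, 7]
print(neighboring_indices(0, 10))  # [0, 1, 2]
print(neighboring_indices(9, 10))  # [7, 8, 9]
```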

**Definition**: A frame that is neither the first nor the last frame of the data is called **stable** if the difference between the maximum and the minimum *yaw* angle over its neighboring frames is at most $5$ degrees. A frame that is not stable is called **extreme** if its yaw angle is an extremum (maximum or minimum) over its neighboring frames, and **transient** if it is neither stable nor extreme. The first and last frames are assigned none of these three states, since one-sided neighbors alone do not contain sufficient information to define a state of motion.
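As a plain-Python sketch of this definition (independent of the notebook's DataFrame-based `assign_states`), the function below labels a list of angles. The name `assign_frame_states` and the synthetic yaw trace are illustrative assumptions; the default threshold of 5 degrees follows the text.

```python
def assign_frame_states(angles, threshold=5.0):
    """Label each frame "stable", "extreme", "transient", or None.

    Hypothetical helper: `angles` holds one head angle per frame (yaw for
    headshaking, pitch for nodding). Neighboring frames are those at most two
    frames away. Stability is checked first, so a stable frame is never
    labelled extreme. The first and last frames get None.
    """
    n = len(angles)
    states = [None] * n
    for i in range(1, n - 1):
        window = angles[max(0, i - 2):min(n, i + 3)]
        hi, lo = max(window), min(window)
        if hi - lo <= threshold:
            states[i] = "stable"
        elif angles[i] in (hi, lo):
            states[i] = "extreme"
        else:
            states[i] = "transient"
    return states


# Synthetic yaw trace: at rest, one left-right-left swing, at rest again.
yaw = [0, 0, 0, 0, 0, 10, -10, 10, 0, 0, 0, 0, 0]
states = assign_frame_states(yaw)
# Frames 1-2 and 10-11 are stable; frames 3, 5, 6, 7 and 9 are extreme.
print(states)
```

Frames 3 and 9 come out as extreme even though the head is at rest there, because a tie with the window minimum counts as an extremum; the definition does not rule this out.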

**Algorithm**: Given two consecutive stable frames (that is, with no stable frame between them), we determine that a headshaking motion has occurred between them if 1) there are at least two extreme frames between the two stable frames, and 2) every pair of adjacent extreme frames among them differs in yaw angle by more than $5$ degrees. In that case, all frames from the first stable frame to the second (inclusive) are marked as headshaking.
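A list-based sketch of the algorithm, assuming the per-frame states have already been computed as in the definition above. The function name `detect_motion` is hypothetical, and the example states are hard-coded from the synthetic yaw trace used for the state definition so that the block runs on its own.

```python
def detect_motion(states, angles, threshold=5.0):
    """Return a 0/1 flag per frame marking detected head motions.

    Hypothetical helper: for each pair of consecutive stable frames, a motion
    is detected when at least two extreme frames lie between them and every
    pair of adjacent extreme frames differs in angle by more than `threshold`.
    All frames from the first stable frame to the second (inclusive) are flagged.
    """
    flags = [0] * len(states)
    stable = [i for i, s in enumerate(states) if s == "stable"]
    for start, end in zip(stable, stable[1:]):
        extremes = [i for i in range(start, end + 1) if states[i] == "extreme"]
        if len(extremes) < 2:
            continue
        if all(abs(angles[a] - angles[b]) > threshold
               for a, b in zip(extremes, extremes[1:])):
            for i in range(start, end + 1):
                flags[i] = 1
    return flags


yaw = [0, 0, 0, 0, 0, 10, -10, 10, 0, 0, 0, 0, 0]
states = [None, "stable", "stable", "extreme", "transient", "extreme", "extreme",
          "extreme", "transient", "extreme", "stable", "stable", None]
print(detect_motion(states, yaw))  # [0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0]
```

Between stable frames 2 and 10 the extreme yaw values are 0, 10, -10, 10, 0, so every adjacent difference exceeds 5 degrees and frames 2 through 10 are flagged.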

Note that the "significant difference" threshold of 5 degrees defined above is an empirical value: after implementing the procedure and experimenting with different threshold values, we manually checked the original video footage to choose a value that characterizes the motions well, and manually fixed erroneously detected motions. (The code below sets both `headshaking_threshold` and `nodding_threshold` to 2 and uses the same value for the stability test.)
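The manual fixes can be applied on top of the programmatic output. A minimal sketch, assuming corrections are written down as inclusive, 0-indexed frame ranges; the function name `apply_manual_corrections` and the `(first_frame, last_frame, value)` format are hypothetical, not the format used in this project.

```python
def apply_manual_corrections(flags, corrections):
    """Overwrite detected motion flags with manually verified values.

    Hypothetical format: each correction is (first_frame, last_frame, value)
    with inclusive 0-indexed bounds; value 0 removes a false positive and
    value 1 adds a missed motion. Later corrections override earlier ones.
    The input list is not modified.
    """
    corrected = list(flags)
    for first, last, value in corrections:
        if value not in (0, 1):
            raise ValueError(f"correction value must be 0 or 1, got {value}")
        if not 0 <= first <= last < len(corrected):
            raise IndexError(f"invalid frame range {first}-{last}")
        corrected[first:last + 1] = [value] * (last - first + 1)
    return corrected


detected = [0, 1, 1, 1, 0, 0, 0, 0]
# Illustrative review: frames 1-3 were not a motion, frames 5-6 were.
print(apply_manual_corrections(detected, [(1, 3, 0), (5, 6, 1)]))
# [0, 0, 0, 0, 0, 1, 1, 0]
```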

In [2]:
import pandas as pd
import numpy as np

In [10]:
# all_date=["12-15-2021","01-28-2022","02-11-2022","03-04-2022","03-05-2022","03-11-2022","03-12-2022"]
# all_date=all_date[2:7]
# all_sessions=[['1','2','3','4'],['1','2','3','4','5','6'],['2','3'],['1','2','3'],['1','2'],['1','2','3'],['1','2','3']]
# all_sessions = all_sessions[2:7]

all_date=["02-11-2022","03-04-2022","03-05-2022","03-12-2022"]
all_sessions=[['2','3'],['1','2','3'],['1','2'],['1','2','3']]

headshaking_threshold = 2
nodding_threshold = 2

In [11]:
all_date

['02-11-2022', '03-04-2022', '03-05-2022', '03-12-2022']

In [12]:
all_sessions

[['2', '3'], ['1', '2', '3'], ['1', '2'], ['1', '2', '3']]

#### 1. Assigning Each Frame a State Based on the Neighboring Frames

In [13]:
# angle_name should be "yaw" for headshaking and "pitch" for nodding
# p_id is an integer 1, 2, 3 indicating the id of the participant
def assign_states(angle_name, p_id, state_name, threshold):
    """Label each frame of participant p_id as "stable", "extreme", or "transient".

    The labels are written to the column `state_name` of that participant's
    head-angle DataFrame (taken from the global `head_angles_for_three`).
    The first and last frames get no state (None).
    """
    df = head_angles_for_three[p_id - 1]
    angles = df[angle_name]

    # Neighboring frames: a centered window of 5 frames (i - 2 to i + 2),
    # truncated at both ends of the recording
    window = angles.rolling(window=5, center=True, min_periods=1)
    max_angle = window.max()
    min_angle = window.min()

    # Stability is checked first, so a stable frame is never labelled extreme
    is_stable = (max_angle - min_angle) <= threshold
    is_extreme = (angles == max_angle) | (angles == min_angle)
    states = pd.Series(
        np.select([is_stable, is_extreme], ["stable", "extreme"], default="transient"),
        index=df.index,
        dtype=object,
    )

    # Starting and ending frames have no states
    states.iloc[[0, -1]] = None
    df[state_name] = states

#### 2. Implementing the Algorithmic Procedure

In [14]:
def gestural_algorithm(angle_name, p_id, state_name, movement_name, threshold):
    """Flag head motions (1) between consecutive stable frames of participant p_id.

    Frames are addressed by position (0-based) rather than by the 1-based
    "frame" column, so the states that are read and the frames that are
    flagged line up exactly.
    """
    df = head_angles_for_three[p_id - 1]
    angles = df[angle_name].to_numpy()
    states = df[state_name].to_numpy()
    movement = np.zeros(len(df), dtype=int)

    # Positions of all stable frames, in order
    stable_positions = np.flatnonzero(states == "stable")

    # Given two consecutive stable frames...
    for start, end in zip(stable_positions[:-1], stable_positions[1:]):
        # ...find the extreme frames between them
        extreme_positions = start + np.flatnonzero(states[start:end + 1] == "extreme")
        if len(extreme_positions) < 2:
            continue

        # Each pair of adjacent extreme frames must differ by more than the threshold
        if np.all(np.abs(np.diff(angles[extreme_positions])) > threshold):
            # Then the motion has occurred between the two stable frames (inclusive)
            movement[start:end + 1] = 1

    df[movement_name] = movement

In [15]:
for idd, all_session in enumerate(all_sessions):
    date=all_date[idd]
    for session in all_session:
        # Load Data
        print(date)
        df_1 = pd.read_table("../../Turn taking and keeping/3people/" + date + "/Mocap/Separate/Session_" + session + "_PC_1_mocap_data.txt", delimiter = " ", header = None)
        df_2 = pd.read_table("../../Turn taking and keeping/3people/" + date + "/Mocap/Separate/Session_" + session + "_PC_2_mocap_data.txt", delimiter = " ", header = None)
        df_3 = pd.read_table("../../Turn taking and keeping/3people/" + date + "/Mocap/Separate/Session_" + session + "_PC_3_mocap_data.txt", delimiter = " ", header = None)

        # Remove the excess space which formed a column
        # There should be 1 + 3 * 7 columns in total, with the first column being index of frames, and the 7 groups of three being angles in x, y, and z directions.
        # The Head Angles
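        # .copy() makes each slice an independent DataFrame, so adding columns below
        # does not trigger pandas' SettingWithCopyWarning
        p1_head_angles = df_1.iloc[:, 30:33].copy()
        p2_head_angles = df_2.iloc[:, 30:33].copy()
        p3_head_angles = df_3.iloc[:, 30:33].copy()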

        # Get head angle data from the original df
        # Each participant's head angle df
        p1_head_angles.columns = ["roll", "pitch", "yaw"]
        p2_head_angles.columns = ["roll", "pitch", "yaw"]
        p3_head_angles.columns = ["roll", "pitch", "yaw"]
        # col_indices = [0, 30, 31, 32]
        # head_angles = df.iloc[:, col_indices]
        # head_angles.columns = ["frame", "roll", "pitch", "yaw"]
        # head_angles["frame"] = head_angles["frame"] - 1
        p1_head_angles["frame"] = p1_head_angles.index+1
        p2_head_angles["frame"] = p2_head_angles.index+1
        p3_head_angles["frame"] = p3_head_angles.index+1

        # Create a list for them for convenience
        head_angles_for_three = [p1_head_angles, p2_head_angles, p3_head_angles]
        for i in [1, 2, 3]:
            assign_states("yaw", i, "headshaking_state", headshaking_threshold)
            gestural_algorithm("yaw", i, "headshaking_state", "headshaking", headshaking_threshold)
            
        for i in [1, 2, 3]:
            assign_states("pitch", i, "nodding_state", nodding_threshold)
            gestural_algorithm("pitch", i, "nodding_state", "nodding", nodding_threshold)
            
        AudioAndGaze = pd.read_csv("../Training/AudioAndGaze/" + date + "_Session_" + session + "_audio_gaze.csv")

        for i in range(3):
            headshaking_col_name = "p" + str(i + 1) + "_headshaking"
            nodding_col_name = "p" + str(i + 1) + "_nodding"

            AudioAndGaze[headshaking_col_name] = head_angles_for_three[i]["headshaking"]
            AudioAndGaze[nodding_col_name] = head_angles_for_three[i]["nodding"]
            
        AudioAndGaze.to_csv("../Training/AudioGazeGBack/" + date + "_Session_" + session + "_audio_gaze_gback.csv", index = False)

02-11-2022
02-11-2022
03-04-2022
03-04-2022
03-04-2022
03-05-2022
03-05-2022
03-12-2022
03-12-2022
03-12-2022

In [35]:
# date = "01-28-2022"
# session = "4"

In [36]:
# # Load Data
# df_1 = pd.read_table("../../Turn taking and keeping/3people/" + date + "/Mocap/Separate/Session_" + session + "_PC_1_mocap_data.txt", delimiter = " ", header = None)
# df_2 = pd.read_table("../../Turn taking and keeping/3people/" + date + "/Mocap/Separate/Session_" + session + "_PC_2_mocap_data.txt", delimiter = " ", header = None)
# df_3 = pd.read_table("../../Turn taking and keeping/3people/" + date + "/Mocap/Separate/Session_" + session + "_PC_3_mocap_data.txt", delimiter = " ", header = None)

# # Remove the excess space which formed a column
# # There should be 1 + 3 * 7 columns in total, with the first column being index of frames, and the 7 groups of three being angles in x, y, and z directions.
# # The Head Angles
# p1_head_angles = df_1.iloc[:, 30:33]
# p2_head_angles = df_2.iloc[:, 30:33]
# p3_head_angles = df_3.iloc[:, 30:33]

# # Get head angle data from the original df
# # Each participant's head angle df
# p1_head_angles.columns = ["roll", "pitch", "yaw"]
# p2_head_angles.columns = ["roll", "pitch", "yaw"]
# p3_head_angles.columns = ["roll", "pitch", "yaw"]
# # col_indices = [0, 30, 31, 32]
# # head_angles = df.iloc[:, col_indices]
# # head_angles.columns = ["frame", "roll", "pitch", "yaw"]
# # head_angles["frame"] = head_angles["frame"] - 1
# p1_head_angles["frame"] = p1_head_angles.index+1
# p2_head_angles["frame"] = p2_head_angles.index+1
# p3_head_angles["frame"] = p3_head_angles.index+1

# # Create a list for them for convenience
# head_angles_for_three = [p1_head_angles, p2_head_angles, p3_head_angles]
# AudioAndGaze = pd.read_csv("../Training/AudioAndGaze/" + date + "_Session_" + session + "_audio_gaze.csv")

### Get Headshaking Data

In [37]:
# for i in [1, 2, 3]:
#     assign_states("yaw", i, "headshaking_state", headshaking_threshold)
#     gestural_algorithm("yaw", i, "headshaking_state", "headshaking", headshaking_threshold)

### Get Nodding Data

In [38]:
# for i in [1, 2, 3]:
#     assign_states("pitch", i, "nodding_state", nodding_threshold)
#     gestural_algorithm("pitch", i, "nodding_state", "nodding", nodding_threshold)

In [39]:
# p1_head_angles.index[p1_head_angles['nodding']==1].tolist()

### Attach to AudioAndGaze

In [40]:
# # AudioAndGaze = pd.read_csv("../Training/AudioAndGaze/" + date + "_Session_" + session + "_audio_gaze.csv")

# for i in range(3):
#     headshaking_col_name = "p" + str(i + 1) + "_headshaking"
#     nodding_col_name = "p" + str(i + 1) + "_nodding"

#     AudioAndGaze[headshaking_col_name] = head_angles_for_three[i]["headshaking"]
#     AudioAndGaze[nodding_col_name] = head_angles_for_three[i]["nodding"]

In [41]:
# AudioAndGaze.to_csv("../Training/AudioGazeGBack/" + date + "_Session_" + session + "_audio_gaze_gback.csv", index = False)