Install dependencies required for this notebook:

In [1]:
%pip install pandas

[0m[33mDEPRECATION: textract 1.6.5 has a non-standard dependency specifier extract-msg<=0.29.*. pip 24.0 will enforce this behaviour change. A possible replacement is to upgrade to a newer version of textract or contact the author to suggest that they release a version with a conforming dependency specifiers. Discussion can be found at https://github.com/pypa/pip/issues/12063[0m[33m
[0mNote: you may need to restart the kernel to use updated packages.


In [2]:
import pandas as pd

# import motion learning toolbox
import motion_learning_toolbox as mlt

Load data. The following methods expect positional columns to follow the pattern `<joint>_pos_<x/y/z>`, and orientational columns to follow the pattern `<joint>_rot_<x/y/z/w>` (order doesn't matter).

[`examples/data.csv`](examples/data.csv) contains raw data from an HTC Vive Pro setup; specifically, it is a sample from the [Who Is Alyx?](https://github.com/cschell/who-is-alyx) dataset.

In [3]:
time_unit = "ms" # if you have seconds, change to "s"
input_data = (pd.read_csv("examples/data.csv")
                # we cast the timestamp column into the timedelta type and set it as index; this is required for the resampling step.
                .assign(timestamp = lambda df: pd.to_timedelta(df["timestamp"], unit=time_unit)).set_index("timestamp"))
input_data.head()

timestamp,hmd_pos_x,hmd_pos_y,hmd_pos_z,hmd_rot_w,hmd_rot_x,hmd_rot_y,hmd_rot_z,left_controller_pos_x,left_controller_pos_y,left_controller_pos_z,...,left_controller_rot_x,left_controller_rot_y,left_controller_rot_z,right_controller_pos_x,right_controller_pos_y,right_controller_pos_z,right_controller_rot_w,right_controller_rot_x,right_controller_rot_y,right_controller_rot_z
0 days 00:00:00,2.9,161.9,-17.27,0.65,-0.23,0.7,0.2,-16.9,141.36,-17.43,...,0.09,0.23,-0.72,-31.63,136.43,-10.8,0.4,0.32,0.83,-0.24
0 days 00:00:00.012000,2.92,161.91,-17.28,0.65,-0.23,0.7,0.2,-17.39,141.31,-17.03,...,0.09,0.24,-0.71,-31.6,136.44,-10.85,0.4,0.32,0.83,-0.24
0 days 00:00:00.024000,2.95,161.92,-17.3,0.65,-0.23,0.7,0.2,-17.85,141.23,-16.62,...,0.08,0.24,-0.71,-31.57,136.46,-10.91,0.4,0.32,0.83,-0.24
0 days 00:00:00.036000,2.96,161.93,-17.31,0.65,-0.22,0.7,0.2,-18.19,141.13,-16.25,...,0.08,0.24,-0.71,-31.53,136.49,-10.98,0.4,0.32,0.83,-0.24
0 days 00:00:00.049000,2.97,161.95,-17.33,0.65,-0.22,0.7,0.2,-18.47,141.04,-15.92,...,0.08,0.25,-0.71,-31.49,136.54,-11.05,0.4,0.32,0.82,-0.25
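
The naming convention described above can be checked before any processing. The following is a minimal sketch, not part of the toolbox: the helper `find_joints` and the regular expression are hypothetical and only illustrate how column names of the form `<joint>_pos_<x/y/z>` and `<joint>_rot_<x/y/z/w>` could be grouped by joint.

```python
import re

# Hypothetical pattern for "<joint>_pos_<x/y/z>" and "<joint>_rot_<x/y/z/w>".
COLUMN_PATTERN = re.compile(r"^(?P<joint>.+)_(?P<kind>pos|rot)_(?P<axis>[xyzw])$")


def find_joints(columns):
    """Group tracking columns by joint name; columns that do not match are ignored."""
    joints = {}
    for name in columns:
        match = COLUMN_PATTERN.match(name)
        # Positions have no "w" component, so "<joint>_pos_w" is not a valid column.
        if match is None or (match["kind"] == "pos" and match["axis"] == "w"):
            continue
        joints.setdefault(match["joint"], set()).add(f'{match["kind"]}_{match["axis"]}')
    return joints


example_columns = ["hmd_pos_x", "hmd_rot_w", "left_controller_pos_z", "timestamp"]
joints = find_joints(example_columns)
print(sorted(joints))
```

A call such as `find_joints(input_data.columns)` would list the joints found in the loaded file, which can then be compared with the `joint_names` passed to the methods below.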


## Data Cleanup

### Resampling

It's a good idea to resample the tracking data to a constant framerate, as this ensures that input samples for machine learning models always represent the same time span. `resample` resamples position and rotation features in different ways:

- position features (that have `_pos_` in their column name) are interpolated linearly.
- rotation features (that have `_rot_` in their column name) are interpolated using spherical linear interpolation (Slerp).

In [4]:
resampled_data = mlt.resample(target_fps=5, data=input_data, joint_names=["hmd", "left_controller", "right_controller"])
resampled_data[:6]

timestamp,hmd_pos_x,hmd_pos_y,hmd_pos_z,left_controller_pos_x,left_controller_pos_y,left_controller_pos_z,right_controller_pos_x,right_controller_pos_y,right_controller_pos_z,hmd_rot_x,...,hmd_rot_z,hmd_rot_w,left_controller_rot_x,left_controller_rot_y,left_controller_rot_z,left_controller_rot_w,right_controller_rot_x,right_controller_rot_y,right_controller_rot_z,right_controller_rot_w
0 days 00:00:00,2.9,161.9,-17.27,-16.9,141.36,-17.43,-31.63,136.43,-10.8,-0.229382,...,0.199462,0.648252,0.089915,0.229782,-0.719317,0.649383,0.318585,0.826331,-0.238939,0.398232
0 days 00:00:00.200000,2.773636,162.138182,-17.634545,-17.608182,140.045455,-12.906364,-31.045455,137.581818,-12.326364,-0.220331,...,0.190286,0.650977,0.119848,0.317784,-0.660974,0.669151,0.299491,0.81861,-0.269542,0.409305
0 days 00:00:00.400000,2.007333,162.458667,-18.248667,-16.302667,128.166,-10.926667,-31.300667,139.890667,-15.73,-0.220071,...,0.180058,0.670215,0.094477,0.439165,-0.46446,0.763207,0.229803,0.78464,-0.359691,0.449614
0 days 00:00:00.600000,1.09,163.02,-18.59,-11.43,126.29,-11.67,-33.98,141.48,-22.22,-0.21021,...,0.17017,0.690691,0.159864,0.479593,-0.409652,0.759355,-0.04003,0.680511,-0.56042,0.470353
0 days 00:00:00.800000,1.011111,163.07,-18.877778,-12.276667,125.222222,-10.984444,-34.968889,142.71,-24.751111,-0.209082,...,0.170159,0.700654,0.140049,0.500175,-0.390137,0.760266,-0.109974,0.62985,-0.62985,0.441002
0 days 00:00:01,0.9275,162.9675,-19.2675,-12.4975,126.175,-11.1525,-34.745,143.15,-25.135,-0.210147,...,0.170119,0.700491,0.149835,0.499451,-0.389572,0.759165,-0.107543,0.620262,-0.64027,0.440186
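
To illustrate the two interpolation schemes, here is a minimal sketch for a single joint using NumPy and SciPy. It is not the toolbox's implementation; the function `resample_joint` and its arguments are assumptions made for this example, and quaternions are expected in SciPy's `x, y, z, w` order.

```python
import numpy as np
from scipy.spatial.transform import Rotation, Slerp


def resample_joint(times, positions, quaternions, target_times):
    """Resample one joint (hypothetical helper).

    times: (n,) ascending timestamps in seconds
    positions: (n, 3) positions
    quaternions: (n, 4) unit quaternions in x, y, z, w order
    target_times: timestamps inside [times[0], times[-1]]
    """
    times = np.asarray(times, dtype=float)
    positions = np.asarray(positions, dtype=float)

    # Positions: plain linear interpolation, one axis at a time.
    new_positions = np.column_stack(
        [np.interp(target_times, times, positions[:, axis]) for axis in range(3)]
    )

    # Rotations: spherical linear interpolation between neighbouring frames.
    slerp = Slerp(times, Rotation.from_quat(quaternions))
    new_quaternions = slerp(target_times).as_quat()
    return new_positions, new_quaternions


# Two frames one second apart: the joint moves 2 units along x and turns 90 degrees about y.
times = [0.0, 1.0]
positions = [[0.0, 0.0, 0.0], [2.0, 0.0, 0.0]]
quaternions = Rotation.from_euler("y", [0, 90], degrees=True).as_quat()

mid_position, mid_quaternion = resample_joint(times, positions, quaternions, [0.5])
print(mid_position, mid_quaternion)
```

Halfway between the two frames the position lies halfway along the path and the rotation is half of the full turn, which is what interpolating the raw quaternion components linearly would not guarantee.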


### Canonicalize Quaternions

A quaternion $q = a + b \mathbf{i} + c \mathbf{j} + d \mathbf{k}$ and its negation $-q = -a - b \mathbf{i} - c \mathbf{j} - d \mathbf{k}$ represent the same rotation in 3D space. This is sometimes called the "double cover" property of quaternions. Rotations are usually represented by unit quaternions, in which case $q$ and $-q$ are antipodal points on the 4D unit sphere that encode the same rotation.

The double-cover property can pose a challenge for machine learning models, because one rotation can appear as two different input vectors. You can use `canonicalize_quaternions` to remove this ambiguity. This method ensures that the scalar part (the "w" component) is always non-negative, so the same rotation always yields the same values for x, y, z, and w.

In [5]:
# for demonstration purposes, we just select the rotation columns of the hmd:
hmd_rot = input_data[["hmd_rot_x", "hmd_rot_y", "hmd_rot_z", "hmd_rot_w"]]

In [6]:
mlt.canonicalize_quaternions(hmd_rot, joint_names=["hmd"]).head()

timestamp,hmd_rot_x,hmd_rot_y,hmd_rot_z,hmd_rot_w
0 days 00:00:00,-0.229382,0.698118,0.199462,0.648252
0 days 00:00:00.012000,-0.229382,0.698118,0.199462,0.648252
0 days 00:00:00.024000,-0.229382,0.698118,0.199462,0.648252
0 days 00:00:00.036000,-0.219901,0.699685,0.19991,0.649708
0 days 00:00:00.049000,-0.219901,0.699685,0.19991,0.649708


This produces the same result even when we provide the negated rotation:

In [7]:
mlt.canonicalize_quaternions(-hmd_rot, joint_names=["hmd"]).head()

timestamp,hmd_rot_x,hmd_rot_y,hmd_rot_z,hmd_rot_w
0 days 00:00:00,-0.229382,0.698118,0.199462,0.648252
0 days 00:00:00.012000,-0.229382,0.698118,0.199462,0.648252
0 days 00:00:00.024000,-0.229382,0.698118,0.199462,0.648252
0 days 00:00:00.036000,-0.219901,0.699685,0.19991,0.649708
0 days 00:00:00.049000,-0.219901,0.699685,0.19991,0.649708
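
The idea behind the canonicalization can be sketched in a few lines of NumPy. This is an illustration under the assumption that quaternions are stored as rows in `x, y, z, w` order; the function name `canonicalize` is hypothetical and the toolbox may handle details (such as `w == 0`) differently.

```python
import numpy as np


def canonicalize(quaternions):
    """Flip the sign of every quaternion whose scalar part w is negative.

    quaternions: (n, 4) array in x, y, z, w order (assumed layout).
    """
    q = np.asarray(quaternions, dtype=float)
    flip = q[:, 3] < 0  # rows with a negative w component
    return np.where(flip[:, None], -q, q)


# First HMD rotation from the table above and its negation.
q = np.array([[-0.229382, 0.698118, 0.199462, 0.648252]])
print(canonicalize(q))
print(canonicalize(-q))
```

Both calls print the same row, because `-q` has a negative `w` and is flipped back to `q`.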


## Data Encoding

The core features of this library target the encoding of tracking data. Identifying users based on their motions usually starts with a raw stream of positional and rotational data, which we term scene-relative (SR) data. While SR data is informative, it includes information that can distort the learning objectives of identification models. For instance, SR data includes not just user-specific characteristics but also the user's arbitrary position in the VR scene, which does not contribute to user identity.

The following methods are based on our paper ["Comparison of Data Representations and Machine Learning Architectures for User Identification on Arbitrary Motion Sequences"](https://ieeexplore.ieee.org/document/10024474) (also available on [arXiv](https://arxiv.org/abs/2210.00527)); you can find more details about the exact methods and reasoning there.

### Body-Relative (BR) Encoding

The body-relative encoding transforms the coordinate system to a frame of reference attached to a specific joint of the user, typically their head/HMD, thereby filtering out some of the scene-specific noise.

> Position and orientation of the wrists are defined with respect to the head, which results in the wrists being positioned and orientated independently of the user's original scene position and orientation. The head's positional features and the rotation around the up axis become obsolete and are therefore removed, which only leaves one quaternion encoding the head's rotation around the horizontal axes. This has the effect that the data yields the same values for the same movement (e.g., waving), even if the user changes position or orientation within the scene in between.
> BR data consists of 18 features: (pos-x, pos-y, pos-z, rot-x, rot-y, rot-z, rot-w) $\times$  (wrist-left, wrist-right) + (rot-x, rot-y, rot-z, rot-w) $\times$ (head), all given with respect to the user's head as frame of reference. 

Make sure to specify the correct coordinate system. The one below should be correct for tracking data from Unity-like (left-handed) systems where the y-axis points up.


In [8]:
br_data = mlt.to_body_relative(resampled_data,
                                 target_joints=["left_controller", "right_controller"],
                                 reference_joint="hmd",
                                 coordinate_system={
                                    "forward": "x",
                                    "right": "z",
                                    "up": "y",
                                 })

br_data.head()

Unnamed: 0,left_controller_pos_x,left_controller_pos_y,left_controller_pos_z,left_controller_rot_w,left_controller_rot_x,left_controller_rot_y,left_controller_rot_z,right_controller_pos_x,right_controller_pos_y,right_controller_pos_z,right_controller_rot_w,right_controller_rot_x,right_controller_rot_y,right_controller_rot_z,hmd_rot_w,hmd_rot_x,hmd_rot_y,hmd_rot_z
0,0.917561,-20.54,19.779375,0.279674,-0.460463,0.629508,-0.559889,8.339203,-25.47,34.126815,-0.326165,0.045569,0.857338,-0.395616,-0.061157,-0.012896,0.950715,0.303702
1,5.851482,-22.092727,20.088165,0.228996,-0.397806,0.704493,-0.541296,7.175955,-24.556364,33.472577,-0.313401,0.010005,0.859902,-0.4028,-0.061921,-0.013181,0.954679,0.290827
2,7.409258,-34.292667,18.274865,0.227103,-0.262559,0.850749,-0.394604,2.677522,-22.568,33.295607,-0.23898,-0.092839,0.872182,-0.416615,-0.01312,-0.027618,0.958632,0.283
3,6.381241,-36.73,12.802991,0.216412,-0.168021,0.871662,-0.406375,-5.118304,-21.54,34.883875,-0.131261,-0.41666,0.81676,-0.37692,0.034629,-0.034027,0.962109,0.268307
4,6.953341,-37.847778,13.802927,0.214708,-0.163732,0.884353,-0.380805,-8.353072,-20.36,35.486372,-0.107198,-0.510073,0.761382,-0.385528,0.061673,-0.036805,0.961003,0.267048


Note that BR data no longer includes the position of the `reference_joint`, as it would always be (0, 0, 0).
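
The position part of this idea can be sketched as follows. This is not the code behind `to_body_relative`; it is a simplified illustration that assumes the y-axis points up, uses SciPy's `x, y, z, w` quaternion order, and covers only the wrist position (not the wrist or head orientation). The helper names are hypothetical. The head's rotation about the up axis is extracted with a swing-twist decomposition and then removed from the head-to-wrist offset.

```python
import numpy as np
from scipy.spatial.transform import Rotation

UP = np.array([0.0, 1.0, 0.0])  # assumption: the y-axis points up


def yaw_twist(head_quat):
    """Rotation of the head about UP only (twist part of a swing-twist decomposition).

    head_quat is in x, y, z, w order. The result is undefined (from_quat raises)
    in the degenerate case where the projected quaternion has zero norm.
    """
    x, y, z, w = head_quat
    axis_part = np.dot([x, y, z], UP) * UP  # keep only the vector part along UP
    return Rotation.from_quat([*axis_part, w])  # from_quat normalizes the input


def wrist_position_relative_to_head(head_pos, head_quat, wrist_pos):
    """Express the wrist position in a head-centred frame with the head's yaw removed."""
    offset = np.asarray(wrist_pos, dtype=float) - np.asarray(head_pos, dtype=float)
    return yaw_twist(head_quat).inv().apply(offset)


# The same pose, once as recorded and once after the user turned and walked elsewhere.
head_pos = np.array([0.0, 1.7, 0.0])
head_rot = Rotation.from_euler("yxz", [30, 10, 5], degrees=True)
wrist_pos = np.array([0.3, 1.2, 0.4])

turn = Rotation.from_euler("y", 75, degrees=True)  # turn about the up axis
shift = np.array([1.0, 0.0, 3.0])                  # walk to another spot

relative_before = wrist_position_relative_to_head(head_pos, head_rot.as_quat(), wrist_pos)
relative_after = wrist_position_relative_to_head(
    turn.apply(head_pos) + shift, (turn * head_rot).as_quat(), turn.apply(wrist_pos) + shift
)
print(relative_before, relative_after)
```

Both results agree, which is the property quoted above: the same movement yields the same values even if the user changes position or orientation within the scene.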

### Body-Relative Velocity (BRV) Data

BRV data is the derivative of BR data over time. It isolates the velocity component so that machine learning models focus on actual user motion. `to_velocity` computes velocities for position and rotation features in different ways:

- the velocity for position features (that have `_pos_` in their column name) is just the difference between the current and the previous frame
- for rotation features (that have `_rot_` in their column name) it computes the delta rotation between the current and the previous frame, which isn't the same as subtracting the raw values.

In [9]:
brv_data = mlt.to_velocity(br_data)

brv_data.head()

Unnamed: 0,delta_left_controller_pos_x,delta_left_controller_pos_y,delta_left_controller_pos_z,delta_right_controller_pos_x,delta_right_controller_pos_y,delta_right_controller_pos_z,delta_left_controller_rot_w,delta_left_controller_rot_x,delta_left_controller_rot_y,delta_left_controller_rot_z,delta_right_controller_rot_w,delta_right_controller_rot_x,delta_right_controller_rot_y,delta_right_controller_rot_z,delta_hmd_rot_w,delta_hmd_rot_x,delta_hmd_rot_y,delta_hmd_rot_z
0,,,,,,,0.279674,-0.460463,0.629508,-0.559889,-0.326165,0.045569,0.857338,-0.395616,-0.061157,-0.012896,0.950715,0.303702
1,4.933921,-1.552727,0.30879,-1.163249,0.913636,-0.654238,0.993769,-0.0595,0.079393,0.050796,0.999257,0.016163,-0.026176,-0.023214,0.999909,0.013451,0.000736,0.000799
2,1.557776,-12.199939,-1.8133,-4.498433,1.988364,-0.17697,0.969398,-0.152293,0.049679,0.186029,0.991771,0.03842,-0.109407,-0.054252,0.998667,0.010159,-0.042532,-0.027439
3,-1.028017,-2.437333,-5.471874,-7.795826,1.028,1.588269,0.995186,0.020425,0.054241,0.079027,0.939445,0.075855,-0.2193,-0.252185,0.998725,0.016473,-0.0436,-0.019368
4,0.5721,-1.117778,0.999937,-3.234768,1.18,0.602497,0.999582,-0.026804,0.001678,0.010712,0.993777,0.050191,-0.044007,-0.08917,0.999629,0.001738,-0.025269,-0.01001
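
The difference between the two cases can be sketched with plain NumPy. This is an illustration, not the toolbox's code: the helpers `quat_multiply` and `delta_rotation` are hypothetical, and the multiplication order (conjugate of the previous frame times the current frame) is an assumption, although it is consistent with the left controller values in the two tables above.

```python
import numpy as np


def quat_multiply(a, b):
    """Hamilton product of two quaternions given in w, x, y, z order."""
    aw, ax, ay, az = a
    bw, bx, by, bz = b
    return np.array([
        aw * bw - ax * bx - ay * by - az * bz,
        aw * bx + ax * bw + ay * bz - az * by,
        aw * by - ax * bz + ay * bw + az * bx,
        aw * bz + ax * by - ay * bx + az * bw,
    ])


def delta_rotation(previous, current):
    """Rotation that takes `previous` to `current` (unit quaternions, w, x, y, z order)."""
    pw, px, py, pz = previous
    conjugate = np.array([pw, -px, -py, -pz])  # inverse of a unit quaternion
    return quat_multiply(conjugate, current)


# Position velocity: difference between consecutive frames (left_controller_pos_x, rows 0 and 1).
pos_x = np.array([0.917561, 5.851482])
pos_velocity = np.diff(pos_x)

# Rotation velocity: delta rotation between consecutive frames (left controller, rows 0 and 1).
q_prev = np.array([0.279674, -0.460463, 0.629508, -0.559889])
q_curr = np.array([0.228996, -0.397806, 0.704493, -0.541296])
rot_velocity = delta_rotation(q_prev, q_curr)

print(pos_velocity, rot_velocity)
print(q_curr - q_prev)  # naive subtraction, generally not a unit quaternion
```

The delta rotation stays close to the identity quaternion (w near 1) for small movements, whereas the naive difference of the raw values has no such interpretation.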


### Scene-Relative Velocity (SRV) Data

If you use the raw tracking data as input for `to_velocity`, you get Scene-Relative Velocity data. Note that SRV is usually not a desirable input for identification models, as it still encodes session-specific information.

In [10]:
srv_data = mlt.to_velocity(resampled_data.reset_index())

srv_data.head()

Unnamed: 0,delta_hmd_pos_x,delta_hmd_pos_y,delta_hmd_pos_z,delta_left_controller_pos_x,delta_left_controller_pos_y,delta_left_controller_pos_z,delta_right_controller_pos_x,delta_right_controller_pos_y,delta_right_controller_pos_z,delta_hmd_rot_x,...,delta_hmd_rot_z,delta_hmd_rot_w,delta_left_controller_rot_x,delta_left_controller_rot_y,delta_left_controller_rot_z,delta_left_controller_rot_w,delta_right_controller_rot_x,delta_right_controller_rot_y,delta_right_controller_rot_z,delta_right_controller_rot_w
0,,,,,,,,,,-0.229382,...,0.199462,0.648252,0.089915,0.229782,-0.719317,0.649383,0.318585,0.826331,-0.238939,0.398232
1,-0.126364,0.238182,-0.364545,-0.708182,-1.314545,4.523636,0.584545,1.151818,-1.526364,0.013484,...,0.000499,0.999909,-0.059047,0.079382,0.051071,0.993783,0.016002,-0.026537,-0.022859,0.999258
2,-0.766303,0.320485,-0.614121,1.305515,-11.879455,1.979697,-0.255212,2.308848,-3.403636,0.008626,...,-0.013555,0.999642,-0.170928,0.058116,0.171056,0.968579,0.042357,-0.092684,-0.072908,0.992119
3,-0.917333,0.561333,-0.341333,4.872667,-1.876,-0.743333,-2.679333,1.589333,-6.49,0.015216,...,-0.006825,0.999582,0.007421,0.068094,0.064937,0.995536,0.068868,-0.206275,-0.270583,0.937812
4,-0.078889,0.05,-0.287778,-0.846667,-1.067778,0.685556,-0.988889,1.23,-2.531111,0.00117,...,-0.00306,0.999899,-0.032984,0.010195,0.002399,0.999401,0.041566,-0.040273,-0.098731,0.99343


### Body-Relative Acceleration (BRA) Data

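Use `to_acceleration` to compute acceleration values. Behind the scenes this method just calls `to_velocity` twice, so position and rotation features are handled in the same way as described for `to_velocity`, and the first two rows of the position columns are empty.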

In [13]:
bra_data = mlt.to_acceleration(br_data)

bra_data.head()

Unnamed: 0,delta_delta_left_controller_pos_x,delta_delta_left_controller_pos_y,delta_delta_left_controller_pos_z,delta_delta_right_controller_pos_x,delta_delta_right_controller_pos_y,delta_delta_right_controller_pos_z,delta_delta_left_controller_rot_w,delta_delta_left_controller_rot_x,delta_delta_left_controller_rot_y,delta_delta_left_controller_rot_z,delta_delta_right_controller_rot_w,delta_delta_right_controller_rot_x,delta_delta_right_controller_rot_y,delta_delta_right_controller_rot_z,delta_delta_hmd_rot_w,delta_delta_hmd_rot_x,delta_delta_hmd_rot_y,delta_delta_hmd_rot_z
0,,,,,,,0.279674,-0.460463,0.629508,-0.559889,-0.326165,0.045569,0.857338,-0.395616,-0.061157,-0.012896,0.950715,0.303702
1,,,,,,,0.326868,0.364525,-0.660084,0.569708,-0.338445,-0.020548,-0.842826,0.417943,-0.060382,0.011536,-0.954769,-0.290925
2,-3.376145,-10.647212,-2.12209,-3.335184,1.074727,0.477268,0.985812,-0.105911,-0.030928,0.126493,0.995779,0.023481,-0.08335,-0.030426,0.998659,-0.003289,-0.04364,-0.027655
3,-2.585793,9.762606,-3.658574,-3.297393,-0.960364,1.765239,0.979017,0.177525,-0.012693,-0.09925,0.972303,0.023444,-0.120287,-0.199016,0.999947,0.006677,-0.000809,0.007804
4,1.600117,1.319556,6.471811,4.561058,0.152,-0.985772,0.99516,-0.047539,-0.050211,-0.069822,0.969544,-0.036688,0.182487,0.159176,0.999679,-0.014678,0.018215,0.009704
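
For the position columns, applying the velocity step twice amounts to a second-order difference. The short pandas sketch below reproduces this for the first four `left_controller_pos_x` values of the BR data; it only illustrates the position part and is not the toolbox's implementation.

```python
import pandas as pd

# First four body-relative x positions of the left controller (from the BR table).
left_x = pd.Series([0.917561, 5.851482, 7.409258, 6.381241], name="left_controller_pos_x")

velocity = left_x.diff()        # first difference: one leading NaN
acceleration = velocity.diff()  # second difference: two leading NaNs

print(pd.DataFrame({"position": left_x, "velocity": velocity, "acceleration": acceleration}))
```

The two leading empty rows in the acceleration column correspond to the empty position cells in rows 0 and 1 of the output above.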
