## Group feature extraction

In [1]:
import movekit as mkit
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

In [2]:
path = "./datasets/fish-5-cleaned.csv"
data = mkit.read_data(path)
data = mkit.extract_features(data)
data.head()

Extracting all absolute features: 100%|██████████| 100.0/100 [00:01<00:00, 76.96it/s]


Unnamed: 0,time,animal_id,x,y,distance,direction,turning,average_speed,average_acceleration,stopped
0,1,312,405.29,417.76,0.0,"(0.0, 0.0)",0.0,0.115306,0.005298,1
1,1,511,369.99,428.78,0.0,"(0.0, 0.0)",0.0,0.012708,0.002823,1
2,1,607,390.33,405.89,0.0,"(0.0, 0.0)",0.0,0.045118,0.003392,1
3,1,811,445.15,411.94,0.0,"(0.0, 0.0)",0.0,0.23279,0.026729,1
4,1,905,366.06,451.76,0.0,"(0.0, 0.0)",0.0,0.067,0.001639,1


### Detecting outliers
`outlier_detection` flags outliers with a k-nearest-neighbors (KNN) approach. You can choose the features used for detection, the number of nearest neighbors considered, the distance metric, the method used to aggregate the neighbor distances, and the expected share of outliers (`contamination`).

In [3]:
# Detect outliers based on KNN. Available parameters:
# mkit.outlier_detection(dataset, features=["distance", "average_speed", "average_acceleration",
# "stopped", "turning"], contamination=0.01, n_neighbors=5, method="mean", metric="minkowski")
outs = mkit.outlier_detection(data)
# Show the first five rows flagged as outliers
outs[outs.loc[:,"outlier"] == 1].head()

Unnamed: 0,time,animal_id,outlier,x,y,distance,direction,turning,average_speed,average_acceleration,stopped
5,2,312,1,405.31,417.37,0.390512,"(0.02, -0.39)",0.0,0.122306,0.003547,1
8,2,811,1,445.48,412.26,0.459674,"(0.33, 0.32)",0.0,0.282267,0.028174,1
9,2,905,1,365.86,451.76,0.2,"(-0.2, 0.0)",0.0,0.071,0.000271,1
1098,220,811,1,408.85,468.18,0.325576,"(-0.06, -0.32)",0.707107,0.276132,0.014037,1
1263,253,811,1,404.67,460.6,0.014142,"(0.01, -0.01)",-0.514496,0.154391,0.025176,1


In [4]:
# Same function, different parameters
other_outs = mkit.outlier_detection(dataset=data, features=["average_speed", "average_acceleration"],
                                    contamination=0.05, n_neighbors=8, method="median", metric="euclidean")

# Show the first five rows flagged as outliers
other_outs[other_outs.loc[:,"outlier"] == 1].head()

Unnamed: 0,time,animal_id,outlier,x,y,distance,direction,turning,average_speed,average_acceleration,stopped
1522,305,607,1,179.53,417.02,3.268103,"(-3.26, -0.23)",0.999998,3.195286,-0.0681,0
1602,321,607,1,132.58,411.0,2.21199,"(-2.2, -0.23)",0.999645,2.088872,-0.217909,0
1607,322,607,1,130.81,410.81,1.780169,"(-1.77, -0.19)",0.999996,1.824672,-0.218531,0
1612,323,607,1,129.26,410.63,1.560417,"(-1.55, -0.18)",0.999962,1.561242,-0.214846,0
1617,324,607,1,127.95,410.58,1.310954,"(-1.31, -0.05)",0.997001,1.296308,-0.205183,0
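
To make the roles of `n_neighbors`, `method` and `contamination` concrete, here is a minimal NumPy sketch of distance-based KNN outlier scoring. It is not movekit's implementation: the helper name `knn_outlier_flags`, the fixed Euclidean metric and the toy data are all assumptions made for illustration.

```python
import numpy as np

def knn_outlier_flags(X, n_neighbors=5, method="mean", contamination=0.01):
    """Flag the `contamination` share of rows with the largest k-NN distance score.

    Hypothetical helper for illustration only, not part of movekit.
    """
    X = np.asarray(X, dtype=float)
    # Pairwise Euclidean distances between all rows.
    diffs = X[:, None, :] - X[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=-1))
    # Exclude each point's zero distance to itself.
    np.fill_diagonal(dists, np.inf)
    # Distances from every row to its k nearest neighbours.
    knn = np.sort(dists, axis=1)[:, :n_neighbors]
    # Aggregate the k distances into one outlier score per row.
    aggregate = {"mean": np.mean, "median": np.median}[method]
    scores = aggregate(knn, axis=1)
    # The highest-scoring rows, up to the requested share, are flagged.
    n_out = max(1, int(round(contamination * len(X))))
    flags = np.zeros(len(X), dtype=int)
    flags[np.argsort(scores)[-n_out:]] = 1
    return flags, scores

# Toy data: a tight cluster of 20 points plus one far-away point (index 20).
rng = np.random.default_rng(0)
points = np.vstack([rng.normal(0.0, 0.1, size=(20, 2)), [[5.0, 5.0]]])
flags, scores = knn_outlier_flags(points, n_neighbors=3, contamination=0.05)
```

A larger `n_neighbors` makes the score less sensitive to small, isolated groups of points, and `contamination` only sets how many of the highest-scoring rows end up flagged.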


### Group-level Analysis

Below we perform group-level analysis. This consists of:
- group-level averages,
- centroid and medoid computation,
- a dynamic time warping matrix,
- a clustering over time based on absolute features,
- the centroid direction,
- the heading difference of each animal with respect to the current centroid,
- the group polarization for each timestep.

#### Obtain group-level records for each point in time
Each record contains the total distance covered by the group, the mean speed, the mean acceleration, and the mean distance from the centroid at one timestamp. If the input does not contain centroid or feature data, these are calculated first and a warning is shown.

In [5]:
group_data = mkit.group_movement(data)
group_data.head()

Calculating centroid distances: 100%|██████████| 1000/1000 [00:05<00:00, 170.11it/s]


time,total_dist,mean_speed,mean_acceleration,mean_distance_centroid
1,0.0,0.094584,0.007976,29.4616
2,1.174908,0.108927,0.007736,29.585
3,1.025155,0.122863,0.007782,29.6914
4,0.91896,0.138007,0.007898,29.7782
5,0.830461,0.155004,0.008365,29.8518
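
The per-timestamp records can be approximated with a plain pandas `groupby`. This is only a sketch on hypothetical toy values. It assumes that `total_dist` is the sum of the individual distances and that the other columns are simple means, and it is not movekit's own code.

```python
import pandas as pd

# Hypothetical toy records: two animals observed at two time steps.
toy = pd.DataFrame({
    "time": [1, 1, 2, 2],
    "animal_id": [312, 511, 312, 511],
    "distance": [0.0, 0.0, 0.4, 0.6],
    "average_speed": [0.1, 0.3, 0.2, 0.4],
    "average_acceleration": [0.01, 0.03, 0.02, 0.02],
    "distance_to_centroid": [10.0, 12.0, 11.0, 13.0],
})

# One row per time step: summed distance and averaged remaining features.
group = toy.groupby("time").agg(
    total_dist=("distance", "sum"),
    mean_speed=("average_speed", "mean"),
    mean_acceleration=("average_acceleration", "mean"),
    mean_distance_centroid=("distance_to_centroid", "mean"),
)
```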


#### Obtain centroid, medoid and distance to centroid for each movement record

In [6]:
movement = mkit.centroid_medoid_computation(data, object_output = False)
movement.head()

Calculating centroid distances: 100%|██████████| 1000/1000 [00:04<00:00, 232.12it/s]


Unnamed: 0,time,animal_id,outlier,x,y,distance,direction,turning,average_speed,average_acceleration,stopped,x_centroid,y_centroid,medoid,distance_to_centroid
0,1,312,0,405.29,417.76,0.0,"(0.0, 0.0)",0.0,0.115306,0.005298,1,395.364,423.226,312,11.331
1,1,511,0,369.99,428.78,0.0,"(0.0, 0.0)",0.0,0.012708,0.002823,1,395.364,423.226,312,25.975
2,1,607,0,390.33,405.89,0.0,"(0.0, 0.0)",0.0,0.045118,0.003392,1,395.364,423.226,312,18.052
3,1,811,0,445.15,411.94,0.0,"(0.0, 0.0)",0.0,0.23279,0.026729,1,395.364,423.226,312,51.049
4,1,905,0,366.06,451.76,0.0,"(0.0, 0.0)",0.0,0.067,0.001639,1,395.364,423.226,312,40.901
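
The values in the table above can be reproduced by hand for the first time step. The sketch below assumes that the centroid is the mean position of all animals and that the medoid is the animal closest to that centroid. These assumptions match the printed numbers, but this is an illustration rather than movekit's implementation.

```python
import numpy as np

# Positions of the five fish at time step 1, taken from the table above.
animal_ids = np.array([312, 511, 607, 811, 905])
xy = np.array([
    [405.29, 417.76],
    [369.99, 428.78],
    [390.33, 405.89],
    [445.15, 411.94],
    [366.06, 451.76],
])

# Centroid: mean x and mean y over all animals at this time step.
centroid = xy.mean(axis=0)
# Euclidean distance of every animal to the centroid.
distance_to_centroid = np.linalg.norm(xy - centroid, axis=1)
# Medoid: the animal with the smallest distance to the centroid.
medoid = animal_ids[distance_to_centroid.argmin()]
```

Here `centroid` is approximately `(395.364, 423.226)` and `medoid` is animal `312`, in line with the `x_centroid`, `y_centroid` and `medoid` columns.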


#### Get the heading difference between the centroid's and each animal's direction
The heading difference is the cosine similarity of the two direction vectors, so it ranges from -1 to 1. A value of 1 indicates that the animal and the centroid move in the same direction, while -1 indicates that they move in opposite directions.

In [7]:
centroid_dir = mkit.compute_centroid_direction(data).sort_values(['time','animal_id'])
heading_diff = mkit.get_heading_difference(data)
heading_diff.head()


Calculating centroid distances: 100%|██████████| 1000/1000 [00:02<00:00, 444.64it/s]
Computing centroid direction: 100%|██████████| 100.0/100 [00:00<00:00, 846.21it/s]
Calculating centroid distances: 100%|██████████| 1000/1000 [00:04<00:00, 237.34it/s]
Calculating heading difference: 100%|██████████| 100.0/100 [00:01<00:00, 86.41it/s] 


Unnamed: 0,time,animal_id,outlier,x,y,distance,direction,turning,average_speed,average_acceleration,stopped,x_centroid,y_centroid,medoid,distance_to_centroid,centroid_direction,heading_difference
0,1,312,0,405.29,417.76,0.0,"(0.0, 0.0)",0.0,0.115306,0.005298,1,395.364,423.226,312,11.331,"(0.0, 0.0)",0.0
1,1,511,0,369.99,428.78,0.0,"(0.0, 0.0)",0.0,0.012708,0.002823,1,395.364,423.226,312,25.975,"(0.0, 0.0)",0.0
2,1,607,0,390.33,405.89,0.0,"(0.0, 0.0)",0.0,0.045118,0.003392,1,395.364,423.226,312,18.052,"(0.0, 0.0)",0.0
3,1,811,0,445.15,411.94,0.0,"(0.0, 0.0)",0.0,0.23279,0.026729,1,395.364,423.226,312,51.049,"(0.0, 0.0)",0.0
4,1,905,0,366.06,451.76,0.0,"(0.0, 0.0)",0.0,0.067,0.001639,1,395.364,423.226,312,40.901,"(0.0, 0.0)",0.0
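
As a small illustration of the cosine similarity behind `heading_difference`, the sketch below compares a few direction vectors. The helper name is hypothetical, and returning 0.0 for zero-length vectors is an assumption chosen to mirror the 0.0 entries at the first time step above.

```python
import numpy as np

def heading_difference(animal_dir, centroid_dir):
    """Cosine similarity of two 2D direction vectors (hypothetical helper)."""
    a = np.asarray(animal_dir, dtype=float)
    c = np.asarray(centroid_dir, dtype=float)
    norm = np.linalg.norm(a) * np.linalg.norm(c)
    # A zero vector has no heading; returning 0.0 in that case is an
    # assumption made for this sketch.
    if norm == 0:
        return 0.0
    return float(a @ c / norm)

same = heading_difference((1.0, 0.0), (2.0, 0.0))         # same direction
opposite = heading_difference((1.0, 0.0), (-3.0, 0.0))    # opposite direction
orthogonal = heading_difference((0.0, 1.0), (1.0, 0.0))   # perpendicular
stationary = heading_difference((0.0, 0.0), (1.0, 0.0))   # no movement
```

The vector lengths cancel out, so only the angle between the animal's direction and the centroid's direction affects the result.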


#### Obtain a matrix, based on dynamic time warping
Each animal ID appears in the row and column indices. The entries are the DTW distances between the animals' trajectories, so lower values indicate more similar trajectories.

In [8]:
# Compute dynamic time warping (DTW) between the trajectories of all animals. The lower the value for two animals, the more similar their trajectories are according to DTW.
# mkit.dtw_matrix(preprocessed_data, path=False, distance=euclidean)
# preprocessed_data: DataFrame containing the movement data.
# path: Boolean specifying whether the matrix of DTW paths (the warping path for every examined pair of sequences) is returned as well.
# distance: Distance measure to use. Default: "euclidean". Other examples are pdist or minkowski (all distances defined by the fastdtw package are possible).

mkit.dtw_matrix(data)

Calculating dynamic time warping: 100%|██████████| 5/5 [00:05<00:00,  1.09s/it]


Unnamed: 0,312,511,607,811,905
312,0.0,30843.085403,32859.600139,42461.524553,37916.447829
511,30843.085403,0.0,26931.014323,47116.708116,20967.960073
607,32859.600139,26931.014323,0.0,39859.787924,35711.718898
811,42461.524553,47116.708116,39859.787924,0.0,38379.806433
905,37916.447829,20967.960073,35711.718898,38379.806433,0.0
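
For intuition about what each matrix entry measures, here is a minimal sketch of the classic DTW recurrence for two 2D trajectories. It computes exact DTW with a Euclidean point distance. The comments above mention the fastdtw package, which approximates this quantity, so the function below is an illustrative stand-in and not movekit's code.

```python
import math

def dtw_distance(traj_a, traj_b):
    """Exact DTW cost between two sequences of (x, y) points (illustrative)."""
    n, m = len(traj_a), len(traj_b)
    # cost[i][j]: minimal accumulated cost of aligning the first i points
    # of traj_a with the first j points of traj_b.
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(traj_a[i - 1], traj_b[j - 1])
            # Extend the cheapest of the three allowed predecessor alignments.
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]

a = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
b = [(0.0, 0.0), (0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]  # same path, first point repeated
c = [(0.0, 1.0), (1.0, 1.0), (2.0, 1.0)]              # same path shifted by 1 in y

d_ab = dtw_distance(a, b)
d_ac = dtw_distance(a, c)
```

The warping absorbs the repeated point in `b`, so `d_ab` is zero even though the sequences differ in length, while the constant offset in `c` accumulates over all three aligned points.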
