# Example usage for "movekit"

In [1]:
import movekit

### Read in CSV file:

In [2]:
# Path to the CSV file (relative to the working directory here; an absolute path also works)-
path_to_file = "datasets/fish-5.csv"

In [3]:
# Read in CSV file using 'path_to_file' variable-
data = movekit.parse_csv(path_to_file)
print(data)

      time  animal_id       x       y
0        1        312  405.29  417.76
1        1        511  369.99  428.78
2        1        607  390.33  405.89
3        1        811  445.15  411.94
4        1        905  366.06  451.76
5        2        312  405.31  417.37
6        2        511  370.01  428.82
7        2        607  390.25  405.89
8        2        811  445.48  412.26
9        2        905  365.86  451.76
10       3        312  405.31  417.07
11       3        511  370.01  428.85
12       3        607  390.17  405.88
13       3        811  445.77  412.61
14       3        905  365.70  451.76
15       4        312  405.30  416.86
16       4        511  370.01  428.86
17       4        607  390.07  405.88
18       4        811  446.03  413.00
19       4        905  365.57  451.76
20       5        312  405.29  416.71
21       5        511  369.99  428.86
22       5        607  389.98  405.87
23       5        811  446.24  413.42
24       5        905  365.47  451.76
25       6  

### Preprocess CSV file:
- "data_preprocessing()" function takes as input the data read in with the "parse_csv()" function.
- The function returns the preprocessed data as a Pandas DataFrame. It also prints statistics about each preprocessing step it performs, so the user can review them.
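
The cleaning steps reported in the output below can be sketched with plain pandas. This is only an illustration under stated assumptions, not movekit's implementation: the toy data and the helper name `preprocess_sketch` are hypothetical, and "duplicate" is taken here to mean a fully identical row, which may differ from how the library defines it.

```python
import pandas as pd

# Hypothetical toy data with the same four columns as the example file.
raw = pd.DataFrame({
    "time": [1, 1, 2, 2, None, 3],
    "animal_id": [312, 511, 312, 312, 511, 511],
    "x": [405.29, 369.99, 405.31, 405.31, 370.01, 370.01],
    "y": [417.76, 428.78, 417.37, 417.37, 428.82, 428.85],
})


def preprocess_sketch(df):
    """Drop rows missing 'time' or 'animal_id', then drop identical rows."""
    # Assumption: a row without a time stamp or an ID cannot be repaired.
    cleaned = df.dropna(subset=["time", "animal_id"])
    # Assumption: duplicates are rows that match in every column.
    cleaned = cleaned.drop_duplicates()
    return cleaned.reset_index(drop=True)


cleaned = preprocess_sketch(raw)
print(cleaned.shape)
```

In the toy data, one row lacks a time stamp and one row is an exact repeat, so two of the six rows are removed.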

In [4]:
# To perform data preprocessing-
preprocessed_data = movekit.data_preprocessing(data)


The dimensions/shape of the raw data file is: (5000, 4)


Number of unique animals in raw data are: 5


Number of rows in data having missing values for 'time' attribute are = 0


Number of rows in data having missing values for 'animal_id' attribute are = 0


Rows having missing values for 'time' and 'animal_id' will be deleted.


Number of duplicate rows in data for 'x' & 'y' attributes are = 26


Duplicate rows for 'x' & 'y' attributes will be removed.



### Impute missing values:
- Missing values in the 'x' and 'y' columns are imputed using linear interpolation.
- 'linear_interpolation()' function takes the preprocessed data returned by 'data_preprocessing()' as its first argument and 'threshold' as its second argument. The threshold is the number of consecutive rows (presumably rows with missing values) below which data is kept and interpolated rather than deleted.

Example: If threshold = 20, a run of 20 or more such consecutive rows will be deleted.
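
To make the interpolation step concrete, here is a minimal sketch using pandas' own `DataFrame.interpolate`. It is an illustration only and does not reproduce movekit's code: the toy track is hypothetical, it covers a single animal, and the threshold logic is left out because its exact behaviour is not shown in this notebook.

```python
import numpy as np
import pandas as pd

# Hypothetical track of one animal with gaps in 'x' and 'y'.
track = pd.DataFrame({
    "time": [1, 2, 3, 4, 5],
    "animal_id": [312] * 5,
    "x": [405.0, np.nan, np.nan, 408.0, 409.0],
    "y": [417.0, 416.0, np.nan, 414.0, 413.0],
})

filled = track.copy()
# Linear interpolation treats the rows as equally spaced in time.
filled[["x", "y"]] = filled[["x", "y"]].interpolate(method="linear")
print(filled)
```

With several animals in one DataFrame, the interpolation would need to run per 'animal_id' so that values from one animal do not leak into another's gaps.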

In [5]:
# Perform linear interpolation-
linear_interpolated_data = movekit.linear_interpolation(preprocessed_data, 20)


Number of missing values in 'x' attribute = 0
Number of missing values in 'y' attribute = 0



### Group data according to 'animal_id' attribute:
- 'grouping_data()' function groups all rows by 'animal_id'.
- The input parameter is 'processed_data', which is the processed Pandas DataFrame.
- The function returns a dictionary in which each key is an animal_id and each value is the Pandas DataFrame for that 'animal_id'.
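
A dictionary of this shape can be built with a pandas `groupby`. The sketch below is an assumption about the general technique, not the library's code, and the toy records are hypothetical.

```python
import pandas as pd

# Hypothetical records for two animals.
records = pd.DataFrame({
    "time": [1, 1, 2, 2, 3],
    "animal_id": [312, 511, 312, 511, 312],
    "x": [405.29, 369.99, 405.31, 370.01, 405.31],
    "y": [417.76, 428.78, 417.37, 428.82, 417.07],
})

# One DataFrame per animal, keyed by animal_id.
grouped = {
    animal_id: frame.reset_index(drop=True)
    for animal_id, frame in records.groupby("animal_id")
}

for animal_id, frame in grouped.items():
    print(animal_id, frame.shape)
```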

In [6]:
# To group data according to 'animal_id' attribute-
data_grouped = movekit.grouping_data(preprocessed_data)

In [7]:
# Iterate over the dictionary (keys are animal_ids) and print the shape/dimension of each Pandas DataFrame-
for aid, frame in data_grouped.items():
    print("\nAnimal ID: {0} has the dimension/shape: {1}".format(aid, frame.shape))


Animal ID: 312 has the dimension/shape: (988, 8)

Animal ID: 511 has the dimension/shape: (988, 8)

Animal ID: 607 has the dimension/shape: (999, 8)

Animal ID: 811 has the dimension/shape: (999, 8)

Animal ID: 905 has the dimension/shape: (1000, 8)


### Calculate absolute features: metric distance, direction, avg_speed, avg_acceleration
- Calculate the metric distance and direction between two consecutive time frames/time stamps for each moving entity (animal).
- 'compute_average_speed()' function computes the average speed of an animal based on the fps (frames per second) parameter.
- Formula used: Average Speed = Total Distance Travelled / Total Time Taken
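
The distance, direction and average-speed calculations can be sketched in plain Python. This is a hedged illustration, not movekit's implementation and not an attempt to reproduce the output below: the helper `step_features` and the value `fps = 1` are hypothetical, frames are assumed to be evenly spaced, and `math.atan2` is swapped in for `atan(dy / dx)` so that a step with no change in 'x' does not divide by zero. Note that `atan2` returns angles in (-180, 180] degrees, whereas `atan` is limited to (-90, 90).

```python
import math

# First three positions of animal 312 from the table above.
points = [(405.29, 417.76), (405.31, 417.37), (405.31, 417.07)]


def step_features(p1, p2):
    """Return (distance, direction in degrees) between two positions."""
    (x1, y1), (x2, y2) = p1, p2
    distance = math.hypot(x2 - x1, y2 - y1)
    # atan2 handles x2 == x1, where atan((y2 - y1) / (x2 - x1)) would fail.
    direction = math.degrees(math.atan2(y2 - y1, x2 - x1))
    return distance, direction


steps = [step_features(a, b) for a, b in zip(points, points[1:])]

fps = 1  # hypothetical frame rate; one frame per time unit
total_distance = sum(distance for distance, _ in steps)
total_time = len(steps) / fps  # assumes evenly spaced frames
average_speed = total_distance / total_time

for distance, direction in steps:
    print(round(distance, 4), round(direction, 2))
print(round(average_speed, 4))
```

The second step has identical 'x' values, which is the case where the quotient form of the direction formula breaks down.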

In [8]:
data_features = movekit.compute_absolute_features(data_grouped)
print(data_features)


Computing Distance & Direction for Animal ID = 312



  direction = math.degrees(math.atan((y2 - y1) / (x2 - x1)))



Computing Distance & Direction for Animal ID = 511


Computing Distance & Direction for Animal ID = 607


Computing Distance & Direction for Animal ID = 811


Computing Distance & Direction for Animal ID = 905


Computing Average Speed for Animal ID = 312


Computing Average Speed for Animal ID = 511


Computing Average Speed for Animal ID = 607


Computing Average Speed for Animal ID = 811


Computing Average Speed for Animal ID = 905


Computing Average Speed for Animal ID = 312


Computing Average Speed for Animal ID = 511


Computing Average Speed for Animal ID = 607


Computing Average Speed for Animal ID = 811


Computing Average Speed for Animal ID = 905

      time  animal_id       x       y  Distance  Average_Speed  \
0        1        312  405.29  417.76  0.000000       0.000000   
1        2        312  405.31  417.37  0.300000       0.220190   
2        3        312  405.31  417.07  0.210238       0.154184   
3        4        312  405.30  416.86  0.150333       0.107438  

### Using "tsfresh" Python library:

In [10]:
# For extracting all time series related features, do-
extracted_features = movekit.time_series_analyis(data_features)

Feature Extraction: 100%|██████████| 10/10 [00:34<00:00,  3.07s/it]


In [13]:
# Print the extracted features and save them to disk as JSON-
print(extracted_features)
extracted_features.to_json("extracted_features_fish.json")

variable  Average_Acceleration__abs_energy  \
id                                           
312                               1.736188   
511                               2.204457   
607                               1.810427   
811                               1.207602   
905                               3.127378   

variable  Average_Acceleration__absolute_sum_of_changes  \
id                                                        
312                                           10.258680   
511                                           10.833218   
607                                           12.471187   
811                                           13.666506   
905                                           13.880315   

variable  Average_Acceleration__agg_autocorrelation__f_agg_"mean"__maxlag_40  \
id                                                                             
312                                                0.007281                    
511                    