# ICEL9 Data Analysis Workshop
Python version - Christoffer Roepstorff, collaboration with J.I.M Parmentier

### Skeleton of the assignment - part 1 (Looking at 1 file)
1. Declare the datapath and different needed information (e.g., horses identifiers, markers of interest) and load the file
2. Extract information (labels (marker names), markers' trajectories, sampling rate, measurement duration), create a time vector, extract the trajectories of interest
3. Visualise the data, see that some data have missing samples (Motion Capture)
4. Deal with missing samples, if fewer than 3 samples are missing
5. Extract the segment of interest (no missing data)
6. Explore the effect of mean removal and filtering
7. Stride split, based on LH hoof or LH fetlock (why LH? discuss it during the workshop)
8. Extract the parameters of interest
9. Save the parameters in a table (e.g., csv file)

## 1. Load data
Before any data is loaded, we need to import the packages we are going to use in this notebook.

In [69]:
import pandas as pd
import numpy as np
from pathlib import Path

To load the data, you first need to define a datapath. You can copy-paste the location of your data below (the relative path to the repo data folder should work out of the box). We will use our first imported library, pandas, to load the csv file of **horseA** at time point **T0**:

In [65]:
# Specify horse name and the time point 
horse_name = "horseA"
time_point = "T0"

# Create the file name variable; using Path gives file paths that work on any type of operating system (Windows, Mac or Linux)
# Note: `datapath` must already be defined as described above
file_name = Path(datapath) / f"{time_point}_{horse_name}_trot1.csv"

# Create a pandas dataframe, a dataframe holds tabular data
df = pd.read_csv(file_name)

# Print the data frame to show a cropped version of the table
display(df)


Unnamed: 0,Poll_x,Poll_y,Poll_z,Poll_unknown,T8_x,T8_y,T8_z,T8_unknown,T15_x,T15_y,...,Tarsus_L_unknown,Fetl_LH_x,Fetl_LH_y,Fetl_LH_z,Fetl_LH_unknown,Hoof_LH_x,Hoof_LH_y,Hoof_LH_z,Hoof_LH_unknown,FrameRate
0,-560.922363,233.692276,977.960938,0.495078,-195.458160,307.712860,780.524841,2.306027,57.767254,304.062531,...,0.655783,351.911530,213.371140,124.607330,0.777076,366.633636,208.959671,61.190224,1.165402,200
1,-558.775208,232.062744,978.336975,0.604814,-195.913132,306.634735,781.833496,1.818279,58.486240,303.271820,...,0.149920,344.841858,217.778244,124.699272,0.721445,358.236267,214.954132,60.818562,1.346640,200
2,-556.945679,230.355881,978.746765,0.775821,-196.926895,305.134888,782.886902,0.865028,58.954975,302.786652,...,0.907110,338.007874,222.316910,124.432686,0.702477,349.764160,220.625397,60.060818,1.154190,200
3,-553.725525,224.122391,978.069153,2.929161,-197.398544,303.622101,784.491760,1.078106,59.357685,302.991974,...,0.869310,331.179749,226.665207,124.176414,0.660892,341.338837,226.140579,58.915882,0.950357,200
4,-552.639099,223.441132,979.258057,3.044172,-197.905212,302.154510,786.777222,1.112610,59.720127,302.331299,...,0.955275,324.350555,230.890381,123.593819,0.556543,332.656433,231.378357,57.439835,0.831671,200
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
3995,-440.824341,226.126068,972.912292,1.760206,-63.359879,208.453323,794.693970,0.764555,202.952484,214.636948,...,1.163501,668.258972,99.918739,116.325233,1.649464,718.653564,90.989746,72.394737,1.978567,200
3996,-441.745270,225.849884,971.142273,2.112327,-63.992321,209.072723,793.198853,0.746758,202.626144,215.336670,...,1.418628,659.694519,100.616386,115.604866,1.683566,709.024536,90.713402,72.236969,1.829671,200
3997,-443.325378,225.408127,970.026123,1.982109,-64.456924,209.692566,791.991028,0.824410,202.525757,215.874710,...,1.478843,650.000488,100.346230,115.229919,2.536987,698.965271,92.284508,70.202408,1.889032,200
3998,-445.259003,225.209061,969.582520,1.904915,-65.013855,210.744949,790.916870,0.988608,202.469131,216.429947,...,1.253583,640.394958,102.187538,114.185860,2.556931,688.061340,92.561897,68.899651,1.689516,200
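Before moving on, it can help to check the size of the table and whether any column has missing samples. The sketch below uses a small made-up dataframe in place of the real file (the values are placeholders, not measurements) and assumes that missing samples are stored as `NaN`, which this notebook does not confirm.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the loaded dataframe; values are placeholders, not measurements
toy_df = pd.DataFrame({
    "Poll_z": [978.0, np.nan, 978.7, 978.1],   # one missing sample (assumed to be NaN)
    "T8_z": [780.5, 781.8, 782.9, 784.5],
    "FrameRate": [200, 200, 200, 200],
})

n_frames, n_columns = toy_df.shape            # rows = frames, columns = signals
missing_per_column = toy_df.isna().sum()      # number of NaN samples in each column
columns_with_gaps = missing_per_column[missing_per_column > 0].index.tolist()

print(n_frames, n_columns)
print(columns_with_gaps)
```

The same three lines should work on the real `df` if gaps are indeed encoded as `NaN`.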


## Extract the ICE
As you can see from the data frame displayed above, each column holds a time series of data for one motion capture marker along one axis. For example,

column 1: The *Poll* marker's *x* position, i.e., the movement along the x axis
column 2: The *Poll* marker's *y* position, i.e., the movement along the y axis
etc.
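Because the columns follow the pattern `<marker>_<axis>`, a single trajectory can be selected by building its column name. A minimal sketch, using a small made-up dataframe rather than the real file (`toy_df` and its values are placeholders for illustration only):

```python
import pandas as pd

# Toy stand-in for the real dataframe; values are placeholders, not measurements
toy_df = pd.DataFrame({
    "Poll_x": [0.0, 1.0, 2.0],
    "Poll_y": [10.0, 11.0, 12.0],
    "Poll_z": [100.0, 101.0, 102.0],
})

marker = "Poll"
ax = "z"
column_name = f"{marker}_{ax}"   # builds the string "Poll_z"
poll_z = toy_df[column_name]     # a pandas Series: Poll position along the z axis

print(column_name)
print(poll_z.to_numpy())
```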

We will work with three upper-body markers: Head ('Poll'), Withers ('T8') and the Pelvis ('TubSac'). You can define them below:

In [66]:
markers_upper_body = ['Poll','T8','TubSac']  # Define the upper body markers

Upper body markers:


['Poll', 'T8', 'TubSac']

For splitting the data into strides for the analysis, we will use the left hind hoof ('Hoof_LH') and the left hind fetlock ('Fetl_LH') markers.

In [67]:
markers_stride_split = ['Hoof_LH','Fetl_LH']  # Define stride split markers

Stride split markers:


['Hoof_LH', 'Fetl_LH']

We also need to define the movement axes that we have, and the frame rate:

In [71]:
axis = ["x", "y", "z"]  # Define the Cartesian axes

fs = df["FrameRate"][0]  # Save the frame rate; the '0' means we only take the value from the first frame of the data
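With the marker names and the axes defined, the columns of interest can be built by combining the two. The sketch below is one possible way to do this, not necessarily the approach used later in the workshop. It assumes every marker, including `TubSac`, follows the `<marker>_<axis>` naming seen in the table, and it uses a made-up dataframe (`toy_df`) in place of the real one.

```python
import pandas as pd

markers = ["Poll", "T8", "TubSac"]   # upper body markers
axes = ["x", "y", "z"]               # Cartesian axes

# One column name per marker/axis pair: "Poll_x", "Poll_y", "Poll_z", "T8_x", ...
columns_of_interest = [f"{m}_{a}" for m in markers for a in axes]

# Toy dataframe holding those nine columns plus one we do not need here
toy_df = pd.DataFrame({name: [0.0, 1.0] for name in columns_of_interest})
toy_df["FrameRate"] = 200

# Keep only the trajectories of interest
upper_body = toy_df[columns_of_interest]
print(upper_body.shape)
```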

Now we can create a time vector with the same length as our time series data (the number of rows in the dataframe):

In [74]:
time = np.linspace(0, (len(df)-1)/fs, len(df))
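The time vector starts at 0 s and advances by `1/fs` seconds per frame, so the last sample falls at `(N-1)/fs`. As a quick sanity check, the sketch below compares the `linspace` form with the equivalent "frame index divided by frame rate" form. The frame count `n_frames` is a hypothetical value chosen for illustration.

```python
import numpy as np

fs = 200          # frame rate in Hz, matching the FrameRate column shown earlier
n_frames = 4000   # hypothetical number of frames, for illustration only

time_linspace = np.linspace(0, (n_frames - 1) / fs, n_frames)
time_arange = np.arange(n_frames) / fs   # equivalent: frame index / frame rate

# Both forms should agree to floating-point precision
print(np.allclose(time_linspace, time_arange))
```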

## Plot the trajectories of interest
