# ICEL9 Data Analysis Workshop
Python version - Christoffer Roepstorff, collaboration with J.I.M Parmentier

## Skeleton of the assignment - part 2 (Looking at all files, T0)

- Declare the datapath and different needed information (e.g., horses identifiers, markers of interest) 
- In a loop, load the files and extract the parameters file by file as defined in the assignement part 1 (and some variations, for example in the stride splitting section)
- Remove outliers, based on impossible stride duration values
- Save the parameters in a table (e.g., csv file) for future analysis
- Compare the results between horses

## I. Load the files and declare variables of interest
We start by importing our required packages,

In [1]:
import numpy as np
import pandas as pd
import plotly.graph_objects as go
import scipy as sc
from missing_samples import InterpolationEnum
from pathlib import Path 

Next we declare horse names, marker labels, results variables etc

In [2]:
# Horses
horse_list = ['horseA','horseB','horseC','horseD','horseE','horseF','horseG',
              'horseH','horseJ','horseK','horseL','horseM','horseN','horseO']

# We will only look at time point T0
time_point = "T0"

# Markers
marker_list = ['Poll','T8','TubSac']
stride_split_marker = "Hoof_LH"

# Create a list that links markers to either fore- or hindlimb variables
marker_link = ["fore", "fore", "hind"]

# Data directory
data_directory = Path("../data")

## II. Loop over data
This time around all the things we did manually in assignment 1 we will attemt to make automatic here. Code and algorithms discussed have sometimes been implemented in reusable functions.

The general flow of operations should be the same as in assignment 1,
1. Load a file and extract information that is neccesary
2. Prepare data, fill gaps, filter etc
3. Split into strides
4. Extract variiables
5. Save results

In [3]:
def stride_split_hoof(signal: np.ndarray, frame_rate: float, stride_frequency: float) -> np.ndarray:
    """ Split the input signal into strides by finding acceleration peaks
    """
    
    
    # Calculate the difference i.e., the acceleration
    acceleration = np.gradient(np.gradient(signal, 1/frame_rate), 1/frame_rate)
    
    
    #  Use the stride frequency to decide minimum frame distance between strides, allow som variation
    distance = int(frame_rate/stride_frequency*0.8)
    
    # Use the findpeaks function with a distance and a prominence argument
    peak_indices, _ = sc.signal.find_peaks(acceleration, distance=distance, prominence=90)

    return peak_indices

In [4]:
def find_typical_stride_frequency(signal:np.ndarray, frame_rate: float) -> float:
    """This Function uses the FFT to extract the most common frequency from a signal
    """
    
    # First the signal has to be cleaned from non-finite data
    filled_signal,_ = InterpolationEnum.pchip.fill_missing(signal, max_gap_length=np.inf)
    
    # Get signal length
    L = len(filled_signal);

    # FFT
    signal_detrend = sc.signal.detrend(filled_signal)
    Y = np.fft.fft(signal_detrend - np.mean(signal_detrend));

    # Extract the amplitude spectrum ba taking the abnsolute values of the complex valued transform
    amp = 2*np.abs(Y)/L;

    # Create a vector with frequencies
    f = frame_rate*np.linspace(0,1,L);
    
    # Return the most prominent frequency
    max_frequency_index = np.argmax(amp[0:int(L/2)])
    
    return f[max_frequency_index]

In [5]:
def sos_bandpass_filter(signal: np.ndarray, frame_rate: float, low_cut: float, high_cut: float, 
                        order: int) -> np.ndarray:
    """ Convenience function for a bandpass filter
    """
    
    # Make the cutoffs reltive to nyqist frequency
    low_n = low_cut/frame_rate*2
    high_n = high_cut/frame_rate*2

    # Create filter
    sos = sc.signal.butter(order, [low_n, high_n], btype='band', output='sos')
    
    # Filter and return result
    return sc.signal.sosfiltfilt(sos, signal)

In [6]:
def filter_signal_with_nan(signal:np.ndarray, frame_rate: float, stride_frequency: float) -> np.ndarray:
    """ Filter the input signal by filling missing values and the replacing them. This is a bandpass filter
    that sets its cut off frequencies based on the input stride frequency
    """
    # Set filter paramtersm high and low cut-off frequencies are based on stride frequency
    low_cut = 0.7*stride_frequency
    high_cut = 5*stride_frequency
    order = 4
    
    # First the signal has to be cleaned from non-finite data
    filled_signal,_ = InterpolationEnum.pchip.fill_missing(signal, max_gap_length=np.inf)
    
    # Filter
    filtered_signal = sos_bandpass_filter(signal=filled_signal, frame_rate=frame_rate, low_cut=low_cut, high_cut=high_cut, order=order)
    
    # Put back nan values
    original_nan_mask = np.isnan(signal)
    filled_signal[original_nan_mask] = np.nan
    
    return filtered_signal

In [18]:
def extract_variables(signal: np.ndarray, stride_indices: np.ndarray, frame_rate:float, is_hindlimb: bool, marker: str):
    # Prepare output
    variables = {f"{marker}_stride_duration": [],
                 f"{marker}_min_diff": [],
                 f"{marker}_max_diff": []}

    # Loop over all peak indices except the last one (-1) as a stride happens between indices
    for stride_cnt in range(len(stride_indices) - 1):
        stride_start = stride_indices[stride_cnt]
        stride_end = stride_indices[stride_cnt + 1]
        stride_ts = signal[stride_start:stride_end]

        # Calculate stride duration
        stride_duration = (stride_end - stride_start) / frame_rate
        variables[f"{marker}_stride_duration"].append(stride_duration)

        # Find peaks and troughs/valleys
        stride_peaks, _ = sc.signal.find_peaks(stride_ts)
        stride_troughs, _ = sc.signal.find_peaks(-stride_ts)

        # Calculate min diff
        if len(stride_troughs) == 2:
            mindiff = stride_ts[stride_troughs[1]] - stride_ts[stride_troughs[0]]
        else:
            mindiff = np.nan

        # Calculate max diff
        if len(stride_peaks) == 2:
            maxdiff = stride_ts[stride_peaks[1]] - stride_ts[stride_peaks[0]]
        else:
            maxdiff = np.nan

        variables[f"{marker}_min_diff"].append(mindiff)
        variables[f"{marker}_max_diff"].append(maxdiff)

    return variables

In [21]:
def process_a_single_file(file_path: Path, markers: list[str], marker_link: list[str], stride_split_marker=str) -> tuple[pd.DataFrame, dict]:
    """ This is responsible for the entire processing of a single file
    
    """
    # Read the file
    df = pd.read_csv(file_path)
    
    # Extract frame rate
    frame_rate = df["FrameRate"][0]
    
    # Find the typical stride frequency of this particular measurement, use the stride split markers z - position
    stride_split_signal = df[f"{stride_split_marker}_z"].to_numpy()
    stride_frequency = find_typical_stride_frequency(signal=stride_split_signal, 
                                                     frame_rate=frame_rate)
    
    # Extract strides 
    stride_indices = stride_split_hoof(signal=stride_split_signal, 
                                       frame_rate=frame_rate, 
                                       stride_frequency=stride_frequency)
    
    
    # Filter and stride split marker data and store in dict
    filtered_markers = {marker: None for marker in markers}
    all_variables = {marker: None for marker in markers}
    for marker_cnt, marker in enumerate(markers):
        # Pick out marker signal
        marker_signal = df[f"{marker}_z"].to_numpy()
        
        # Filter using our filter with nan function
        filtered_signal = filter_signal_with_nan(signal=marker_signal, 
                                                 frame_rate=frame_rate, 
                                                 stride_frequency=stride_frequency)
        
        # Store filtered signal 
        filtered_markers[marker] = filtered_signal
        
        # Extract variables, make sure to distinguish between fore- and hindlimb variables
        is_hindlimb = False
        if marker_link[marker_cnt] == "hind":
            is_hindlimb = True
        
        variables = extract_variables(signal=filtered_signal, 
                                      stride_indices=stride_indices, 
                                      frame_rate=frame_rate,
                                      is_hindlimb=is_hindlimb,
                                      marker=marker)
        # save the variables 
        all_variables[marker] = variables
        
    # Make table out out variables
    flat_variable_dict = {}
    for key in all_variables:
        flat_variable_dict.update(all_variables[key])
        
    variables_df = pd.DataFrame.from_dict(flat_variable_dict)
    
    # Return variables and filtered data
    return variables_df, filtered_markers
        

In [23]:
# Get a list of all .csv files in our data directory, matching our specified `time_point`
csv_file_list = data_directory.glob("T0*.csv")

# Loop over the files and load them one by one
for file_path in csv_file_list:
    
    # Call our function for a single file processing
    variables_df, filtered_data = process_a_single_file(file_path=file_path, markers=marker_list, marker_link=marker_link, stride_split_marker=stride_split_marker)
    
    display(variables_df.describe())

Unnamed: 0,Poll_stride_duration,Poll_min_diff,Poll_max_diff,T8_stride_duration,T8_min_diff,T8_max_diff,TubSac_stride_duration,TubSac_min_diff,TubSac_max_diff
count,54.0,53.0,53.0,54.0,54.0,54.0,54.0,54.0,52.0
mean,0.542315,-6.378933,3.870161,0.542315,-1.65658,0.591147,0.542315,-1.273757,0.613386
std,0.013165,6.613497,6.683037,0.013165,4.063824,3.79206,0.013165,4.254131,4.230865
min,0.52,-21.622402,-8.588854,0.52,-13.485494,-6.325212,0.52,-13.17383,-6.86297
25%,0.53,-10.826315,-0.598794,0.53,-4.399605,-1.528513,0.53,-3.48362,-2.395309
50%,0.54,-5.935916,3.419196,0.54,-2.098306,-0.242396,0.54,-1.254263,0.28581
75%,0.55,-1.12811,7.318986,0.55,1.179191,1.95935,0.55,1.443284,3.319071
max,0.57,5.038275,19.295746,0.57,6.467223,11.441674,0.57,7.360082,10.934426


Unnamed: 0,Poll_stride_duration,Poll_min_diff,Poll_max_diff,T8_stride_duration,T8_min_diff,T8_max_diff,TubSac_stride_duration,TubSac_min_diff,TubSac_max_diff
count,56.0,56.0,37.0,56.0,56.0,56.0,56.0,56.0,56.0
mean,0.525893,2.52288,-4.055146,0.525893,2.422359,0.594038,0.525893,4.652669,-2.105886
std,0.007574,7.47089,6.96231,0.007574,2.07211,1.907543,0.007574,2.287632,3.001837
min,0.51,-13.000958,-19.326435,0.51,-2.720863,-2.554888,0.51,-0.159851,-9.279478
25%,0.52,-3.05749,-8.520407,0.52,0.830713,-0.640695,0.52,2.94417,-4.281296
50%,0.525,3.010804,-4.212623,0.525,2.459663,0.15658,0.525,4.279875,-2.430188
75%,0.53,7.19453,0.378086,0.53,3.956833,1.287972,0.53,6.708426,-0.409
max,0.54,22.861707,9.780412,0.54,7.486158,6.076871,0.54,8.998254,5.187915


Unnamed: 0,Poll_stride_duration,Poll_min_diff,Poll_max_diff,T8_stride_duration,T8_min_diff,T8_max_diff,TubSac_stride_duration,TubSac_min_diff,TubSac_max_diff
count,54.0,54.0,51.0,54.0,53.0,53.0,54.0,54.0,53.0
mean,0.549907,9.060865,-14.887556,0.549907,-2.345726,0.176027,0.549907,-5.782997,-5.248963
std,0.018921,13.21202,16.017346,0.018921,5.741774,3.875141,0.018921,4.426473,4.411609
min,0.5,-27.550637,-48.881346,0.5,-13.319375,-13.37663,0.5,-14.94326,-16.560265
25%,0.54,1.682701,-24.24489,0.54,-5.615158,-2.171638,0.54,-9.244971,-7.926032
50%,0.55,8.236288,-17.244105,0.55,-2.643984,0.775243,0.55,-5.406505,-4.676515
75%,0.56,15.190225,-11.044374,0.56,0.283793,3.125483,0.56,-2.676113,-2.42612
max,0.62,40.095019,42.494563,0.62,10.035605,10.829921,0.62,4.196296,3.164697


Unnamed: 0,Poll_stride_duration,Poll_min_diff,Poll_max_diff,T8_stride_duration,T8_min_diff,T8_max_diff,TubSac_stride_duration,TubSac_min_diff,TubSac_max_diff
count,53.0,53.0,53.0,53.0,53.0,53.0,53.0,53.0,53.0
mean,0.558868,2.730798,-14.977053,0.558868,-2.009456,-3.800509,0.558868,5.849084,-2.288174
std,0.00886,5.227549,5.753076,0.00886,3.026684,2.461441,0.00886,3.855137,3.548533
min,0.545,-10.326603,-32.158039,0.545,-10.17996,-10.14375,0.545,-3.371092,-12.451114
25%,0.555,-0.57958,-18.522437,0.555,-3.301616,-5.118523,0.555,2.718481,-4.55088
50%,0.56,2.347761,-15.178679,0.56,-1.530783,-4.141573,0.56,5.660151,-1.856406
75%,0.56,6.305469,-12.536511,0.56,-0.182686,-2.303767,0.56,8.215391,0.306427
max,0.59,12.790288,-2.613339,0.59,3.462348,2.780425,0.59,17.517037,4.29864


Unnamed: 0,Poll_stride_duration,Poll_min_diff,Poll_max_diff,T8_stride_duration,T8_min_diff,T8_max_diff,TubSac_stride_duration,TubSac_min_diff,TubSac_max_diff
count,55.0,55.0,52.0,55.0,53.0,54.0,55.0,55.0,52.0
mean,0.542,-3.585245,1.541067,0.542,-0.801315,-0.245786,0.542,-0.151741,1.292629
std,0.02229,6.339258,5.549542,0.02229,2.131225,2.283981,0.02229,3.604986,3.265403
min,0.44,-23.365733,-9.221751,0.44,-6.564951,-8.777982,0.44,-11.917981,-5.453801
25%,0.54,-6.240863,-2.659878,0.54,-1.6193,-0.866422,0.54,-2.004868,-0.54461
50%,0.54,-3.486149,1.155222,0.54,-0.62256,-0.277354,0.54,0.484517,0.602159
75%,0.55,0.678402,5.168738,0.55,0.656638,0.76416,0.55,2.011725,3.409698
max,0.605,8.370559,15.533272,0.605,2.517419,3.65556,0.605,8.488044,10.406549


Unnamed: 0,Poll_stride_duration,Poll_min_diff,Poll_max_diff,T8_stride_duration,T8_min_diff,T8_max_diff,TubSac_stride_duration,TubSac_min_diff,TubSac_max_diff
count,51.0,51.0,51.0,51.0,51.0,51.0,51.0,51.0,51.0
mean,0.583922,-4.688527,-1.465214,0.583922,-0.548334,1.251528,0.583922,4.644793,-5.034112
std,0.00783,8.542528,7.279462,0.00783,4.553752,2.34905,0.00783,5.401098,3.862768
min,0.57,-22.684266,-18.243762,0.57,-11.131689,-6.23596,0.57,-7.717717,-12.914535
25%,0.58,-9.870602,-5.583904,0.58,-3.388115,-0.35408,0.58,0.709316,-7.236713
50%,0.585,-5.703053,-1.865822,0.585,-0.622623,1.011562,0.585,5.45256,-5.697338
75%,0.59,1.141814,3.206346,0.59,2.559456,3.016649,0.59,8.745064,-3.179524
max,0.605,13.67067,15.848262,0.605,10.299564,5.80947,0.605,17.852077,7.663644


Unnamed: 0,Poll_stride_duration,Poll_min_diff,Poll_max_diff,T8_stride_duration,T8_min_diff,T8_max_diff,TubSac_stride_duration,TubSac_min_diff,TubSac_max_diff
count,35.0,32.0,34.0,35.0,33.0,35.0,35.0,33.0,33.0
mean,0.557286,1.817228,2.324218,0.557286,-0.968226,-0.242007,0.557286,-3.507756,-1.800634
std,0.041628,16.870025,10.971736,0.041628,5.604487,3.291318,0.041628,4.106097,5.69057
min,0.45,-71.551755,-37.179698,0.45,-14.284035,-9.000402,0.45,-13.696569,-19.476645
25%,0.545,-0.521979,-2.058047,0.545,-3.494159,-1.907727,0.545,-5.66778,-3.786693
50%,0.555,4.057923,3.699957,0.555,-1.353763,-0.426359,0.555,-4.128113,-1.424174
75%,0.565,10.454795,9.052607,0.565,3.373206,1.473222,0.565,-0.954798,1.112028
max,0.72,25.780935,17.126173,0.72,7.397917,7.413267,0.72,7.103353,8.695864


Unnamed: 0,Poll_stride_duration,Poll_min_diff,Poll_max_diff,T8_stride_duration,T8_min_diff,T8_max_diff,TubSac_stride_duration,TubSac_min_diff,TubSac_max_diff
count,51.0,39.0,40.0,51.0,42.0,41.0,51.0,42.0,46.0
mean,0.587843,1.380218,-6.156217,0.587843,-3.422432,-4.785524,0.587843,-1.547949,-2.32853
std,0.104203,11.750331,6.734411,0.104203,5.221583,3.289499,0.104203,4.166836,4.629436
min,0.45,-45.235899,-18.123308,0.45,-12.308749,-12.915709,0.45,-9.750695,-13.567024
25%,0.555,-2.183479,-10.859388,0.555,-6.004152,-6.97975,0.555,-4.623002,-5.500382
50%,0.56,4.461496,-5.997541,0.56,-4.706566,-5.52172,0.56,-1.936513,-1.235709
75%,0.57,8.404962,-2.953433,0.57,-1.264569,-3.087762,0.57,0.892388,1.09176
max,0.905,14.899485,13.096448,0.905,15.247062,4.432674,0.905,7.301823,7.123098


Unnamed: 0,Poll_stride_duration,Poll_min_diff,Poll_max_diff,T8_stride_duration,T8_min_diff,T8_max_diff,TubSac_stride_duration,TubSac_min_diff,TubSac_max_diff
count,56.0,56.0,56.0,56.0,56.0,56.0,56.0,56.0,56.0
mean,0.528125,6.710268,-10.598544,0.528125,2.460334,-2.661178,0.528125,3.18277,5.546066
std,0.009703,4.549687,5.761042,0.009703,3.775736,2.465975,0.009703,5.79029,2.618846
min,0.495,-7.361834,-32.365537,0.495,-5.067664,-7.525703,0.495,-9.363196,-2.108539
25%,0.52375,4.478703,-13.89654,0.52375,0.032606,-4.037097,0.52375,-0.995836,3.771977
50%,0.53,7.617904,-10.888809,0.53,2.861422,-2.913531,0.53,3.204078,5.810432
75%,0.535,9.33949,-7.911674,0.535,5.135219,-1.199074,0.535,5.812124,7.009691
max,0.545,19.275167,9.576109,0.545,10.159372,3.885161,0.545,18.043794,11.237688


Unnamed: 0,Poll_stride_duration,Poll_min_diff,Poll_max_diff,T8_stride_duration,T8_min_diff,T8_max_diff,TubSac_stride_duration,TubSac_min_diff,TubSac_max_diff
count,54.0,52.0,52.0,54.0,52.0,52.0,54.0,52.0,52.0
mean,0.544074,10.553992,-3.562993,0.544074,-7.747642,-7.348743,0.544074,1.193929,19.703993
std,0.048672,4.752784,5.360986,0.048672,2.626183,2.599852,0.048672,4.437689,3.323708
min,0.51,-0.7223,-14.391424,0.51,-13.506973,-11.498005,0.51,-12.310155,11.226827
25%,0.53,7.582617,-7.976977,0.53,-8.751148,-9.292627,0.53,-0.52156,17.363893
50%,0.535,10.528197,-2.896138,0.535,-7.841519,-7.711137,0.535,1.91083,19.685178
75%,0.54375,12.507882,-0.2209,0.54375,-6.381937,-5.270044,0.54375,4.070549,21.608411
max,0.795,21.617305,13.469628,0.795,-0.893669,-0.066513,0.795,9.643733,26.92976


Unnamed: 0,Poll_stride_duration,Poll_min_diff,Poll_max_diff,T8_stride_duration,T8_min_diff,T8_max_diff,TubSac_stride_duration,TubSac_min_diff,TubSac_max_diff
count,50.0,50.0,49.0,50.0,50.0,47.0,50.0,50.0,47.0
mean,0.5918,3.448941,-5.956422,0.5918,1.112977,2.538142,0.5918,-6.515063,7.18
std,0.01916,6.06892,5.659465,0.01916,3.59892,3.749256,0.01916,4.316186,4.974018
min,0.51,-9.070395,-19.075935,0.51,-7.567227,-4.821301,0.51,-14.708622,-4.838502
25%,0.585,-0.689171,-9.454551,0.585,-1.295646,0.856512,0.585,-9.273187,4.369702
50%,0.595,3.446446,-6.673555,0.595,1.175944,2.627952,0.595,-6.638906,6.401277
75%,0.60375,7.464029,-2.299948,0.60375,2.789389,5.820343,0.60375,-4.077451,11.136609
max,0.625,16.861245,10.173995,0.625,8.282022,9.769995,0.625,7.401836,20.426268


Unnamed: 0,Poll_stride_duration,Poll_min_diff,Poll_max_diff,T8_stride_duration,T8_min_diff,T8_max_diff,TubSac_stride_duration,TubSac_min_diff,TubSac_max_diff
count,56.0,56.0,31.0,56.0,56.0,56.0,56.0,56.0,56.0
mean,0.528482,-5.250543,-5.386433,0.528482,1.059646,1.021025,0.528482,2.210519,-4.527176
std,0.012133,8.754254,7.94425,0.012133,2.9134,3.012392,0.012133,4.027081,3.243197
min,0.5,-23.767459,-23.866958,0.5,-5.415848,-7.434202,0.5,-7.13487,-9.819652
25%,0.52,-11.124349,-9.837668,0.52,-0.740019,-0.874129,0.52,-0.145507,-6.760461
50%,0.525,-5.439383,-5.076157,0.525,0.90538,1.596067,0.525,2.469329,-4.696313
75%,0.535,2.044708,-0.764139,0.535,3.176491,2.761697,0.535,5.207167,-2.397142
max,0.56,11.623481,13.090144,0.56,7.338097,6.463736,0.56,10.533259,3.198226


Unnamed: 0,Poll_stride_duration,Poll_min_diff,Poll_max_diff,T8_stride_duration,T8_min_diff,T8_max_diff,TubSac_stride_duration,TubSac_min_diff,TubSac_max_diff
count,55.0,55.0,55.0,55.0,55.0,55.0,55.0,55.0,55.0
mean,0.542091,1.851155,4.775907,0.542091,5.513402,2.531271,0.542091,-0.081131,-5.077561
std,0.016574,7.113531,6.752956,0.016574,2.76905,2.680795,0.016574,4.700252,3.32505
min,0.465,-14.103419,-18.284561,0.465,-5.756497,-3.662044,0.465,-22.884008,-13.153893
25%,0.535,-2.136903,1.168033,0.535,4.345416,0.797272,0.535,-1.712527,-7.05924
50%,0.54,1.425537,5.479155,0.54,5.497096,2.558839,0.54,-0.602361,-5.30885
75%,0.55,7.36826,8.960231,0.55,7.107,4.101217,0.55,2.37561,-3.166093
max,0.585,16.218644,17.910648,0.585,12.359778,10.217137,0.585,9.529675,2.967807


Unnamed: 0,Poll_stride_duration,Poll_min_diff,Poll_max_diff,T8_stride_duration,T8_min_diff,T8_max_diff,TubSac_stride_duration,TubSac_min_diff,TubSac_max_diff
count,61.0,61.0,61.0,61.0,61.0,61.0,61.0,61.0,61.0
mean,0.483689,2.997227,-3.475304,0.483689,-0.413916,-0.836941,0.483689,-1.852205,1.133639
std,0.007124,2.956219,4.685772,0.007124,2.007787,1.78431,0.007124,2.310993,2.592416
min,0.465,-4.761032,-12.870366,0.465,-5.60027,-5.421671,0.465,-7.049084,-7.4294
25%,0.48,1.376116,-6.64632,0.48,-1.647304,-1.660044,0.48,-3.45031,-0.313928
50%,0.485,2.799421,-4.562229,0.485,-0.432126,-0.776018,0.485,-1.52767,0.981215
75%,0.485,4.898291,0.420912,0.485,0.967981,0.431177,0.485,-0.334031,3.315813
max,0.505,9.318198,6.147811,0.505,4.127235,2.69023,0.505,3.933106,5.46007


### Applying the stride split to all markers (putting it all together)
So far we only worked with the Poll marker to look at data. Now lets apply the data imputation, filter and stride split on all of our markers. Beacuse the filter will not work on non-finite data, we use the gapfill without thinking about the maximum fill length. Once the filtering is done we will put back the nan values. Once the data has been filtered we will apply the stride split to look at data stride by stride. 


In [10]:
# print(peak_indices)

# Loop over ALL markers
all_markers = []
all_markers.extend(markers_upper_body)
all_markers.extend(markers_stride_split)

# Prepare figure
fig = make_subplots(rows=len(all_markers), cols=1, subplot_titles=all_markers, vertical_spacing=0.04)

for mcnt, marker in enumerate(all_markers):
    # Get raw data from data frame
    raw = df[f"{marker}_z"][start_index:end_index]
    
    # Gapfill, but remember indices with nan
    nan_indices = np.isnan(raw)
    filled, _ = InterpolationEnum.pchip.fill_missing(raw, max_gap_length=np.inf)
     
    # Filter use our function, lets use cutoffs here that we know will work for most trotting horses
    filtered = sos_bandpass_filter(data=filled, frame_rate=fs, low_cut=1, high_cut=6, order=4)
    
    # Put back nans
    filtered[nan_indices] = np.nan
    
    #Use the stride split we made
    for pcnt, peak_ind in enumerate(range(len(peak_indices) - 1)):
        stride_start = peak_indices[peak_ind]
        stride_end = peak_indices[peak_ind + 1]
        stride_ts = filtered[stride_start:stride_end]
        
        fig.add_trace(go.Scatter(y=stride_ts,
                                 name=f"{marker} stride {pcnt}",
                                 line=go.scatter.Line(color=colors[mcnt])), row=mcnt + 1, col=1)
        fig.update_yaxes(title="Amplitude (mm)", row=mcnt + 1, col=1)

fig.update_xaxes(title="% of stride", row=len(all_markers), col=1)
fig.update_layout(height=1000)    
fig.show()

NameError: name 'markers_upper_body' is not defined