## Coding for biomechanics

By completing this notebook, you will...
1. Learn about a few important Python data structures
2. Learn how to define and use Python functions
3. Load, inspect, and visualize real biomechanics data
4. Learn the basics of debugging

## Package imports

You can find a Python package for just about anything, but you need to install it (e.g., with pip or conda) before you can import it. One big upside of Google Colab is that it has many popular data science, machine learning, and visualization libraries pre-installed.

In [None]:
# Often we import libraries at the start of our code, and we may give them a shorthand alias (e.g., import packagename as pckg)

import numpy as np # A versatile package for scientific computing
import matplotlib.pyplot as plt  # A package for plotting and visualization
import pandas as pd # A package for data loading and manipulation
import requests  # A package for downloading files from the internet

## 1. Python data structures (containers)

Python has many built-in data types, and packages add more of their own. You should know a few common ones and how to check the data type of an object with ```type()```.
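A quick sketch of checking types with the built-in ```type()``` and ```isinstance()``` functions (the variable names and values here are illustrative only):

```python
import numpy as np

# A few objects of different types (illustrative values)
values = [
    42,                    # int
    9.81,                  # float
    "S001",                # str
    [0.1, 0.2],            # list
    {"mass_kg": 72},       # dict
    np.array([0.1, 0.2]),  # numpy ndarray
]

# type() returns the class of an object; .__name__ gives its short name
for v in values:
    print(f"{v!r} -> {type(v).__name__}")

# isinstance() is handy inside functions to check an argument before using it
mass = 72
print(isinstance(mass, (int, float)))  # True
```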

**1.1 Lists**

Let's define a list, which is just an ordered container. Notice that you can add or delete items, or reorder the list. This container can be *indexed*, meaning we can grab the $i^{th}$ element by placing $[i]$ after a list object. **Note:** Python indexing begins at zero. MATLAB indexing begins at one. **This trips many people up!**
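Here is a short sketch of zero-based indexing, slicing, and modifying a list (the values are made up for illustration):

```python
# Illustrative list of marker X positions (made-up values)
x = [0.1, 0.15, 0.23]

print(x[0])    # first element (index 0) -> 0.1
print(x[-1])   # last element; negative indices count from the end -> 0.23
print(x[0:2])  # slice: indices 0 and 1, the stop index is excluded -> [0.1, 0.15]

x.append(0.31)  # add an item to the end
x.remove(0.15)  # delete the first item equal to 0.15
x.reverse()     # reorder the list in place
print(x)        # -> [0.31, 0.23, 0.1]
```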

The NumPy package provides a data type known as an ndarray. An ndarray is similar to a list, but all of its elements share a single data type (its ```dtype```), so you cannot freely mix data types within the same container. In exchange, ndarrays are much more computationally efficient than standard Python lists, and they are used extensively in scientific libraries like Pandas, SciPy, and scikit-learn.
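A minimal sketch contrasting an ndarray with a list (illustrative values): arithmetic on an ndarray is applied element by element, while the same operator on a list does something quite different, and NumPy converts mixed inputs to one common dtype.

```python
import numpy as np

pos_list = [0.1, 0.15, 0.23]
pos_arr = np.array(pos_list)  # convert the list to an ndarray

print(pos_arr.dtype)  # float64: every element shares one dtype

# Arithmetic on an ndarray is element-wise ("vectorized")...
print(pos_arr * 2)
# ...while * on a list repeats the list
print(pos_list * 2)   # [0.1, 0.15, 0.23, 0.1, 0.15, 0.23]

# Mixing types does not produce a mixed array: NumPy converts
# everything to a common dtype (here, Unicode strings)
mixed = np.array([1, "a"])
print(mixed.dtype.kind)  # U
```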

In [None]:
# Example: Marker positions over time (X coordinate only)
x_positions = [0.1, 0.15, 0.23, 0.31, 0.40]
print(f"First value: {x_positions[0]}")
print(f"Length: {len(x_positions)}")

# Check the type of x_positions
# YOUR CODE HERE

# Let's create an array of time values between 0 and 0.4 seconds, with the same length as x_positions using the numpy package
times = # YOUR CODE HERE

# Now let's "cast" our array to a list
time_list = # YOUR CODE HERE
print(f"Our list: {time_list}")

**1.2 Dictionaries**

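Now let's define a dictionary, which stores key-value pairs. Instead of looking values up by position, you retrieve them by their keys. (Since Python 3.7, dictionaries remember the order in which keys were inserted, but you should still think of them as lookup tables rather than sequences.)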
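A small sketch of the most common dictionary operations, using made-up trial metadata (the keys here are illustrative, not part of the dataset):

```python
# Illustrative trial metadata (made-up values)
trial = {"trial_id": "T01", "speed_m_s": 1.3}

print(trial["speed_m_s"])          # look up a value by key -> 1.3
print(trial.get("notes", "none"))  # .get() returns a default instead of raising KeyError -> none

trial["surface"] = "treadmill"     # add a new key-value pair

print(list(trial.keys()))    # ['trial_id', 'speed_m_s', 'surface']
print(list(trial.values()))  # ['T01', 1.3, 'treadmill']

# .items() yields (key, value) pairs
for key, value in trial.items():
    print(f"{key}: {value}")
```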

In [None]:
# Example: Subject metadata
subject_data = {
    "ID": "S001",
    "height_cm": 175,
    "mass_kg": 72,
    "condition": "normal_walking"
}
# Let's grab the mass of the subject
# YOUR CODE HERE

# Add our x_positions list to the subject dictionary
# YOUR CODE HERE

# Let's inspect the keys and values in the dictionary
# YOUR CODE HERE

**1.3 Pandas DataFrames and Series**

- The Pandas Series is a 1D *labeled* array (think of a single column of data with a column header tied to it)
- The Pandas DataFrame is a 2D table (think of this as an Excel sheet with a tabular, row-and-column format)

Pandas has built-in functions (e.g., ```pd.read_csv()``` and ```pd.read_excel()```) for loading ```csv```, ```txt```, and ```xlsx``` files into a DataFrame for further manipulation. We'll revisit this shortly.
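Loading a CSV is typically a single call to ```pd.read_csv()```. The sketch below uses a small, made-up CSV stored in a string (wrapped in ```io.StringIO``` so Pandas can read it like a file), so it runs without downloading anything:

```python
import io

import pandas as pd

# A tiny, made-up CSV held in a string so this runs without a file.
# With a real file you would pass its path instead, e.g. pd.read_csv("data.csv")
csv_text = """Time,Fx,Fz
0.0,5.0,100.0
0.1,-3.0,150.0
0.2,1.0,200.0
"""
df = pd.read_csv(io.StringIO(csv_text))

print(df.head())         # first rows (up to 5 by default)
print(df.shape)          # (rows, columns) -> (3, 3)
print(list(df.columns))  # column headers -> ['Time', 'Fx', 'Fz']
print(df.dtypes)         # data type of each column
```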

In [None]:
# Define a Pandas Series
force_z = pd.Series([100, 150, 200, 180], name="Fz")

# Check the datatype
# YOUR CODE HERE

# Define a Pandas DataFrame
times = [0,0.1,0.2,0.3]
forces = [100, 150, 200, 180]
data_df = pd.DataFrame({'Time':times, 'Fz':forces})

# Pull only the time column using the column header name
time_col_header = # YOUR CODE HERE

# Pull only the time column using slice indexing
time_col_slice = # YOUR CODE HERE

# Check datatype(s)
print(f"data type: {type(time_col_header)}")
print(f"data type: {type(time_col_slice)}")


## 2. Functions

You have already used functions that are built into Python (```type```, ```len```) and a function included in a package you imported (```np.linspace()```). But what if you need to define your own function? Let's build a simple plotting function using the ```matplotlib.pyplot``` package.

Key elements of a Python function:
1. **def keyword** - Tells Python that a function is being defined.
2. **Function Name** - A unique identifier used to call the function later in the code. Names must start with a letter or underscore and are case-sensitive.
3. **Parentheses ()** - Follow the function name and contain optional parameters.
4. **Parameters** - Placeholders for the data (arguments) that are passed into the function when it is called.
5. **Colon (:)** - Marks the end of the function header and signals the start of the function body.
6. **Optional return annotation (->)** - Hints at the expected return data type (e.g., ```-> float```); Python itself does not enforce it.
7. **Indented body** - The indented block of code that runs each time the function is called. A ```return``` statement sends a value back to the caller; without one, the function returns ```None```.
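Putting those pieces together, here is a minimal, hypothetical function (```bodyweight_newtons``` is made up for this sketch, not part of any library), with comments labeling each element:

```python
# def keyword, function name, parentheses with parameters (one with a default
# value), a return annotation, and the colon that ends the header
def bodyweight_newtons(mass_kg: float, g: float = 9.81) -> float:
    """Return body weight in newtons for a mass in kilograms.

    g defaults to 9.81 m/s^2, a common approximation of standard gravity.
    """
    weight = mass_kg * g  # the indented body runs each time the function is called
    return weight         # return sends the result back to the caller


print(bodyweight_newtons(72))               # uses the default g
print(bodyweight_newtons(72, g=9.80665))    # override g with the standard value
```

Default parameter values like ```g=9.81``` let callers skip arguments they rarely change, while still allowing an override by keyword.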

In [None]:
# YOUR CODE HERE

# Use our plotting function with our synthetic DataFrame from before
plot_data(data_df)

## 3. Let's practice with real data

**3.1 Define helper functions**

In [None]:
# Create a helper function to download a file from a URL and save it locally
def download_file(url, save_path) -> str:
    """
    Downloads a file from a given URL and saves it to a specified path.

    Args:
        url (str): The URL of the file to download.
        save_path (str): The local path where the file will be saved.

    Returns:
        save_path (str): The path where the file was saved.

    Raises:
        requests.exceptions.RequestException: If an error occurs during the download.
    """
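    # Stream the response so a large file is written in chunks rather than held in memory;
    # the timeout (in seconds) keeps the request from hanging indefinitely
    with requests.get(url, stream=True, timeout=30) as r:
        r.raise_for_status()  # Raise an HTTPError for 4xx/5xx responses
        with open(save_path, 'wb') as f:
            for chunk in r.iter_content(chunk_size=8192):
                f.write(chunk)
    return save_path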

# Create a helper function to plot multiple timeseries signals from the same dataframe
def comparison_plot(df: pd.DataFrame, col_names: list, co_plot: bool = True) -> tuple:
    '''
    Plot n signals either together on a single axis (co-plot) or as an nx1 stack of subplots, where n is the length of the col_names input arg.

    Args:
    --------
        df (DataFrame): DataFrame containing timeseries signals as columns, plus a column whose name contains "time"
        col_names (list): A list of strings that specify which columns of the df to plot
        co_plot (bool): An optional flag that switches between co-plotting (True, the default) and creating an nx1 subplot with a shared x-axis (False)

    Returns:
    ---------
        (fig,ax) pair

    Raises:
    ---------
        ValueError: If there is not a column that contains "time" (not case sensitive)
    '''
    
    # 1. Validate and Identify the Time Column
    # Search for a column name containing "time" (case-insensitive)
    time_col = next((col for col in df.columns if 'time' in col.lower()), None)
    
    if time_col is None:
        raise ValueError("Input DataFrame does not contain a column with 'time' in the name.")

    # 2. Determine Plot Layout
    num_vars = len(col_names)
    
    if co_plot:
        # Scenario A: Co-plot (All signals on one axis)
        fig, ax = plt.subplots(figsize=(12, 6))
        
        for col in col_names:
            ax.plot(df[time_col], df[col], label=col)
            
        ax.set_xlabel(time_col)
        ax.set_ylabel("Values")
        ax.set_title("Comparison Plot")
        ax.legend(loc='best')
        ax.grid(True, linestyle='--', alpha=0.6)
        
    else:
        # Scenario B: Subplots (nx1 grid with shared x-axis)
        fig, ax = plt.subplots(num_vars, 1, figsize=(12, 3 * num_vars), sharex=True)
        
        # With only one subplot, plt.subplots returns a single Axes object rather than an array, so wrap it in a list for consistent indexing.
        if num_vars == 1:
            ax = [ax]
            
        for i, col in enumerate(col_names):
            ax[i].plot(df[time_col], df[col], label=col)
            ax[i].set_ylabel(col)
            ax[i].grid(True, linestyle='--', alpha=0.6)
            ax[i].legend(loc='upper right')
            
        # Set x-label only on the bottom-most plot
        ax[-1].set_xlabel(time_col)
        
    plt.tight_layout()
    
    return fig, ax

**3.2 Load and visualize some real force plate data**

Suppose we are given six-degree-of-freedom (6 DoF) ground reaction force data ($F_x, F_y, F_z, M_x, M_y, M_z$) for someone walking, but we don't know how the coordinate system is oriented. Given the figure below showing the $F_v$ (superior/inferior), $F_{AP}$ (anterior/posterior), and $F_{ML}$ (medial/lateral) components, what do you conclude? Which Cartesian axis in the data corresponds to which anatomical direction?
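Alongside the figure, a quick numerical check can help. During walking, the vertical GRF is usually by far the largest component (on the order of body weight), while the AP and ML components are much smaller. The sketch below applies that heuristic to a *synthetic* DataFrame; the column names and signal shapes are assumptions made up for illustration, not the real dataset. The vertical force is deliberately placed in ```Fy``` to show that you cannot assume ```Fz``` is vertical.

```python
import numpy as np
import pandas as pd

# Synthetic single stance phase lasting 0.6 s (made-up shapes, NOT the real data)
t = np.linspace(0.0, 0.6, 61)
envelope = np.sin(np.pi * t / 0.6)  # rises from 0 to 1 and back to 0

df = pd.DataFrame({
    "Time": t,
    "Fx": 30 * np.sin(2 * np.pi * t / 0.6) * envelope,  # small; changes sign mid-stance
    "Fy": -700 * envelope,  # large and one-signed (sign depends on the plate's axis convention)
    "Fz": 15 * envelope,    # smallest component
})

# Compare the peak magnitude of each force channel
peak_abs = df[["Fx", "Fy", "Fz"]].abs().max()
print(peak_abs.sort_values(ascending=False))
print(f"Largest-magnitude force channel: {peak_abs.idxmax()}")
```

This heuristic only suggests which axis is vertical. Telling AP from ML still calls for the signal shapes in the figure; for example, the AP force typically switches sign between braking and propulsion.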

<img src="assets/tekscan_locomotion_grf.jpg" alt="Gait GRF Signals" width="600">

In [None]:
# Download example force plate data
fp_data_filepath = download_file("https://uofi.box.com/shared/static/jm3dapmslt2ylbbsi7p0yux3mvpngpib.csv", "force_plate_data.csv")

# Read force plate data using built-in pandas csv reader
# YOUR CODE HERE

# Inspect the first few rows of the data using .head() method for pd.DataFrame
# YOUR CODE HERE

In [None]:
# Define which columns we wish to plot
plot_cols = # YOUR CODE HERE

# Use the provided plotting function to visualize the force plate timeseries data
# YOUR CODE HERE

# If needed, adjust axis limits
# YOUR CODE HERE

## 4. Debugging


For this example, suppose we are told that the subject in the locomotion force plate data has a mass of 75 kg, and we want to find the peak value of the normalized superior/inferior GRF signal. Let's write a function to extract this from our DataFrame.
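Two normalization conventions are common, so it helps to be explicit about which one you want: dividing force by mass, $F/m$, gives units of N/kg, while dividing by body weight, $F/(m g)$, gives a dimensionless multiple of body weight (BW). A small sketch with a made-up peak force of 900 N (not taken from the dataset; ```normalize_force``` is a hypothetical helper):

```python
G = 9.81  # m/s^2, assumed value for gravitational acceleration


def normalize_force(force_n: float, mass_kg: float) -> tuple:
    """Return (force per kg in N/kg, force as a multiple of body weight)."""
    per_kg = force_n / mass_kg        # N/kg
    per_bw = force_n / (mass_kg * G)  # dimensionless (BW)
    return per_kg, per_bw


peak_fz = 900.0  # hypothetical peak vertical GRF in N
mass = 75.0      # subject mass in kg

per_kg, per_bw = normalize_force(peak_fz, mass)
print(f"{per_kg:.2f} N/kg")  # 12.00 N/kg
print(f"{per_bw:.2f} BW")    # 1.22 BW
```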

**4.1 Define function**

In [None]:
def peak_normalized_grf(df,subject_mass) -> float:
    '''
    Extract the peak normalized ground reaction force as a multiple of subject's bodyweight.
    
    Args:
    -------
        df (DataFrame) : Dataframe that should contain the superior/inferior GRF signal (in N)
        subject_mass (float): The subject's mass (in kg)

    Returns:
    -------
        peak_norm_grf (float) : The peak normalized GRF value (in N/kg)
 
    '''
    mass = 60 # Define the subject mass
    Fz = df[:,16] # Extract superior/inferior GRF signal
    Fz_max = Fz.max() # Find the maximum value

    return Fz_max/subject_mass

**4.2 Use the function**

Try running the cell below. Instead of running successfully, it stops and prints what is called a **Traceback**. Let's go over a few basic steps for debugging:
1. Read the Traceback from bottom to top. The last line names the error type and message; the lines above it show the chain of calls (with line numbers, if applicable) that led there.
2. Use the "print and inspect" method to check assumptions (what we think is happening vs. what is really happening)

**Note:** Generative AI tools can come in very handy here, especially when given the relevant files as context. However, a suggested fix may clear the error while creating other, hidden issues. Understanding what should be happening, and how everything works together, is very important!
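To illustrate the "print and inspect" idea on a separate toy problem (the DataFrame and the ```peak_force``` function are hypothetical, made up for this sketch), here a function assumes a column name that does not exist, and a few prints reveal the mismatch:

```python
import pandas as pd

# Toy data: the force column is named "FZ", not "Fz"
toy_df = pd.DataFrame({"Time": [0.0, 0.1, 0.2], "FZ": [100.0, 650.0, 700.0]})


def peak_force(df: pd.DataFrame, col: str = "Fz") -> float:
    """Return the maximum value of one column (assumes that column exists)."""
    return df[col].max()


try:
    peak_force(toy_df)
except KeyError as err:
    # A Traceback's last line would report this KeyError; now check assumptions
    print(f"KeyError: {err}")
    print(type(toy_df))          # is it really a DataFrame?
    print(toy_df.shape)          # how many rows and columns?
    print(list(toy_df.columns))  # what are the columns actually called?

# After inspecting, call the function with the column name that actually exists
print(peak_force(toy_df, col="FZ"))  # 700.0
```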

In [None]:
# Use the function we just defined
peak_grf = peak_normalized_grf(fp_data,75)

# Inspect the value we extracted
print(peak_grf)