In [1]:
import pandas as pd

In [None]:
# load the dataframe
df = pd.read_csv('./tractmeasures.csv')
df

# Building a laterality dataframe
Now that we have a better understanding of the data and what can be done with pandas, we can get to the fun part: computing the laterality index for each tract and building our laterality dataframe!

As a refresher, laterality is the concept that the properties of the two hemispheres differ. This follows from functional work that has found hemispheric dominance in certain cognitive abilities, such as language processing, comprehension, and production.

In white matter, this can be examined by looking at the difference between structural properties of white matter tracts between the two hemispheres.

We define the laterality index as the ratio of the difference between the hemispheres to the sum of the hemispheres. We can quantify this using the following formula:
    
```math
\text{laterality index} = \frac{\text{left} - \text{right}}{\text{left} + \text{right}}
```

where left and right correspond to the microstructural property of the left hemisphere tract and the right hemisphere tract, respectively.
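
As a quick sanity check on the formula, here is a minimal sketch with made-up numbers. The values are illustrative and not taken from the dataset, and `example_li` is a hypothetical helper name. A positive index means the left value is larger, a negative index means the right value is larger, and for non-negative measures the index stays between -1 and 1.

```python
def example_li(left, right):
    """Ratio of the hemispheric difference to the hemispheric sum."""
    return (left - right) / (left + right)

# Made-up values chosen so the arithmetic is easy to follow by hand.
left_dominant = example_li(0.75, 0.25)   # 0.50 / 1.00
right_dominant = example_li(0.25, 0.75)  # -0.50 / 1.00
symmetric = example_li(0.5, 0.5)         # 0.00 / 1.00

print(left_dominant, right_dominant, symmetric)
```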

So, now that we know how to define laterality index, we need to talk about some caveats to computing laterality index with this dataset:

### caveats
1. NaNs exist in this data because outlier removal was already performed on this dataset. Outliers were detected on a measure and structure basis, so the hemisphere data may be uneven, and we will have to handle this explicitly.
2. Some tracts cross hemispheres (the corpus callosum tracts): anterioFrontalCC, forcepsMajor, forcepsMinor, middleFrontalCC, and parietalCC. Fortunately, all the other tracts have a 'left' or 'right' prefix, which we can use to our advantage.
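
To make the first caveat concrete, here is a minimal sketch in plain Python. The subject IDs and values are hypothetical, and `drop_missing` is an illustrative helper, not part of the dataset or of pandas. It shows how dropping NaNs in one hemisphere leaves the two hemispheres with different subjects.

```python
import math

# Hypothetical FA values for one tract pair; not taken from the dataset.
left_values = {"sub-01": 0.50, "sub-02": math.nan, "sub-03": 0.40}
right_values = {"sub-01": 0.25, "sub-02": 0.45, "sub-03": 0.35}

def drop_missing(values):
    """Keep only subjects whose value is not NaN."""
    return {sub: val for sub, val in values.items() if not math.isnan(val)}

left_clean = drop_missing(left_values)
right_clean = drop_missing(right_values)

# After dropping NaNs the hemispheres no longer have the same subjects.
print(len(left_clean), len(right_clean))

# Only subjects present in both hemispheres can get a laterality index.
common_subjects = sorted(set(left_clean) & set(right_clean))
print(common_subjects)
```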

So, with this knowledge, here's the roadmap ahead of us.

First, we need to define functions that will do the following:

## What do we need to do
1. Separate the dataframe into left and right hemisphere dataframes
2. Identify when there are uneven numbers between the left and right hemispheres for a given measure (i.e. **caveat 1**), and reduce both down to only the common subjects.
3. Compute the laterality index between the left and right hemispheric measures and return it as a dataframe

First, let's define all the functions that we will need to build our laterality index dataframe! It is good practice to build code that you need to repeat into functions, and to define those at the very beginning.

## Define laterality index function
Let's take this one step at a time, starting with the easiest function.

Let's first define a function that will compute laterality index for us. This function will take in as input the following values: 'left' which is the value of the microstructural property from the left hemisphere tract, and 'right' which is the same but for the right hemisphere. These are a series of numerical values of length N where N is the number of rows (data points). This function should then return the laterality index as a series of the same length.

In [None]:
# define laterality functions
def laterality_index(left,right):
    
    # HINT: look back at the formula from above
    li = 
    
    return li

Great! We now have a function to define laterality index!

Now, let's move onto probably the most complicated function to write.

## Define function to reduce dataframes to same size
Because of **caveat 1**, it is likely that there will be uneven numbers of subjects between the left hemisphere and right hemisphere data for a given tract. In these situations, we need to identify the common subjects between the two hemispheres and set both the left and right hemispheres to have the same subjects and same size. 

For this, we can write a function! This function takes as inputs the dataframes corresponding to the left and right hemisphere data. It should then output dataframes for the left hemisphere and right hemisphere tracts that are equal in length and contain the same subjects and tracts. We will need to identify when the dataframes are of uneven length, and then subselect only the tracts and subjects that are common between the two hemispheres. We can then concatenate across all subjects to return a left and right dataframe of equal length.
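
One way to picture the goal is an inner join: only subject and tract combinations present in both hemispheres survive. The sketch below uses `DataFrame.merge` on toy data to illustrate the idea. It is a different technique from the loop-based approach the exercise asks for, and all values are made up.

```python
import pandas as pd

# Toy data: sub-02 is missing its right hemisphere tract.
left = pd.DataFrame({
    "subjectID": ["sub-01", "sub-02"],
    "stem": ["Arc", "Arc"],
    "fa": [0.50, 0.48],
})
right = pd.DataFrame({
    "subjectID": ["sub-01"],
    "stem": ["Arc"],
    "fa": [0.45],
})

# An inner merge keeps only (subjectID, stem) pairs found in both inputs.
paired = left.merge(right, on=["subjectID", "stem"], suffixes=("_left", "_right"))
print(paired)
```

The loop in the exercise reaches the same kind of result step by step, which makes each intermediate stage easier to inspect.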

In [None]:
# define function to grab the same participants and structures from both hemispheres (needs to be done because outliers were detected on a measure and structure basis, so hemisphere data may be uneven)
def reduce_dataframes_to_same(left, right):
    
    # setup blank dataframes. one for left hemisphere and one for right (see pd.DataFrame)
    final_df_left = 
    final_df_right = 
    
    # set a "stem name" for each tract (i.e. doesn't have the left or right prefix). (see list comprehension with pandas)
    left['stem'] = [ f.split('left')[1] for f in left['structureID'] ]
    right['stem'] = 

    # write a logical statement that compares the lengths of the left and right dataframes. if left is longer than right, set a new variable, 'subs',
    # to the unique subjectIDs found in the right hemisphere. else, set it to the unique subjectIDs from the left hemisphere (see pandas .unique())
    if len(left) > len(right):
        subs = 
    else:
        subs = 
        
    # loop through all the subjects to extract the data and build the final left and right dataframes. this may be slow but that's okay!
    for i in subs:
        # set a temporary dataframe for left and right for only the i subjects data (see pd.loc)
        tmp_left =  # subset data on subject basis: left hemisphere
        tmp_right = right.loc[right['subjectID'] == i] # subset data on subject basis: right hemisphere

        # generate lists of the tract stem names for both left and right hemispheres (see pandas tolist())
        tmp_left_stems = 
        tmp_right_stems = 
        
        # use list comprehension to identify the common stem names between left and right (see list comprehension)
        common_structures = 

        # subselect all the data corresponding to only the tracts that are found in the common structures list for both left and right hemispheres (see pandas .loc and pandas .isin)
        tmp_left_sub = tmp_left.loc[tmp_left['stem'].isin(common_structures)]
        tmp_right_sub = 

        # concatenate data to final dataframe for each hemisphere (see pd.concat)
        final_df_left =     
        final_df_right = 
    
    return final_df_left, final_df_right
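
The stem and common-structure steps can be tried out on plain lists first. This is a minimal sketch with hypothetical structure names that follow the 'left'/'right' prefix convention. The exact names in the dataset may differ.

```python
# Hypothetical structure names for one subject.
left_structures = ["leftArc", "leftSLF1", "leftIFOF"]
right_structures = ["rightArc", "rightIFOF"]

# str.split returns the pieces around the separator; index 1 is what follows the prefix.
left_stems = [name.split("left")[1] for name in left_structures]
right_stems = [name.split("right")[1] for name in right_structures]

# Keep only the stems that appear in both hemispheres.
common = [stem for stem in left_stems if stem in right_stems]
print(common)
```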

## Define function to build a laterality dataframe
The final function we need to write is one that actually creates the laterality dataframe. This function will first generate two dataframes, one for left hemisphere tracts and one for right, removing any missing data values along the way. It will then check whether the two dataframes are unequal in length. If they are, it runs the function we just defined above. If not, it only adds the stem name column, which is one of the steps in the function above. Then, it will grab the values for the inputted measure, compute the laterality index, and return the laterality index dataframe.
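
Combining several conditions in one `.loc` call is the key pandas idiom for the first step. Here is a minimal sketch on a toy table, where the column names mirror the ones used in this notebook but the rows are made up.

```python
import pandas as pd

# Toy table with one missing FA value.
data = pd.DataFrame({
    "subjectID": ["sub-01", "sub-01", "sub-02", "sub-02"],
    "hemisphere": ["left", "right", "left", "right"],
    "fa": [0.50, 0.45, float("nan"), 0.47],
})

# Each condition is wrapped in parentheses and joined with &;
# ~ negates the isna() mask so that only non-missing rows remain.
left_valid = data.loc[(data["hemisphere"] == "left") & (~data["fa"].isna())]
print(left_valid)
```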

In [5]:
# define function to build laterality dataframe
def laterality_dataframe(data, measure):

    # separate hemispheric data (see multiple conditional .loc statements with pandas)
    left = data.loc[(data["hemisphere"] == ) & (~data[measure].isna())]
    right = 

    # check to make sure left and right comparisons are same. if not, call the function from above. if so, add a stem name column to each
    if len(left) != len(right):
        left, right = # call function we created above for reducing the dataframes to the same subjects and tracts
    else:
        left['stem'] = # see how this was done in previous function
        right['stem'] = 

    # grab data for given measure (see pandas .values)
    left_data = left[measure].
    right_data = 
    
    # compute laterality index and make dataframe. HINT: you can just subselect the 'subjectID', 'classID', 'structureID' columns from the left dataframe.
    # (see multiple column indexing in pandas)
    li = (left_data, right_data)
    laterality = left[] # subselect the appropriate columns
    
    # update with 'structureID' name with the 'stem' name and add the laterality index values
    laterality["structureID"] = 
    laterality["laterality_index"] = 
    
    return laterality

Excellent! We've now defined all the functions we need!

We can now start building our dataframes!

But first, let's do a couple more things to the starting dataframe. We will first set a 'hemisphere' column where the values will either be 'left' if the string 'left' is present in the structureID, 'right' if the string 'right' is present, otherwise NA. Remember we used this column to split the tracts in our previous function. Following this, we will compute some age bins (for replication purposes) using bins of 0-20, 20-40, 40-60, 60-80, 80-100.
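
Before binning the real ages, it can help to see how `pd.cut` treats values on a bin edge. This is a minimal sketch with made-up ages. With the default settings the intervals are closed on the right, so an age of exactly 20 lands in the first bin.

```python
import pandas as pd

# Made-up ages, including one that sits exactly on a bin edge.
ages = pd.Series([5, 20, 21, 65])
bin_edges = [0, 20, 40, 60, 80, 100]

# By default pd.cut uses right-closed intervals, so 20 falls in (0, 20].
binned = pd.cut(ages, bins=bin_edges)
print(binned)
```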

## Add a hemisphere column and compute age bins

In [None]:
# set hemispheres. if 'left' in structureID, set value as 'left'. if 'right', set it as 'right'. if neither, set as 'NA' (see list comprehension and pandas columns)
df['hemisphere'] = [  if _ in f else _ if _ in f else 'NA' for f in df.structureID ] # make sure to replace the _ with appropriate things

# compute age bin. first set list of bins [0,20,40,60,80,100]. then see pd.cut to create a new column called 'age_bin'
bins = 
df['age_bin'] = pd.cut()

We are finally ready to generate our laterality dataframe!

## Generate laterality dataframe

In [None]:
# let's set a variable named measure to 'fa' to compute the laterality dataframe for the fa measure. we will then call our laterality_dataframe function
measure = 'fa'
laterality = laterality_dataframe()

# finally, let's merge subject-related demographic data back to laterality dataframe and then save it to a csv (see pd.merge). before saving, we will remove duplicates
# see pd.drop_duplicates
laterality = pd.merge(,df[['subjectID','classID','age','age_bin']],on=[])
laterality = 

# save as csv (see pandas .to_csv). make sure to not save the index. 
laterality.
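
The reason duplicates appear after the merge is that the demographic columns repeat on every tract row of the source dataframe. The sketch below reproduces this on toy data. The merge keys `subjectID` and `classID` are an assumption for illustration, all values are made up, and the CSV is written to an in-memory buffer rather than a file.

```python
import io
import pandas as pd

# Toy laterality table: one row per subject and tract stem.
lat = pd.DataFrame({
    "subjectID": ["sub-01", "sub-02"],
    "classID": ["groupA", "groupB"],
    "structureID": ["Arc", "Arc"],
    "laterality_index": [0.05, -0.02],
})

# Toy source table: demographics repeat on every tract row.
source = pd.DataFrame({
    "subjectID": ["sub-01", "sub-01", "sub-02", "sub-02"],
    "classID": ["groupA", "groupA", "groupB", "groupB"],
    "age": [25, 25, 61, 61],
})

# Merging on the shared keys (an assumption here) repeats each laterality row.
merged = pd.merge(lat, source, on=["subjectID", "classID"])
deduped = merged.drop_duplicates()
print(len(merged), len(deduped))

# index=False leaves the row index out of the CSV text.
buffer = io.StringIO()
deduped.to_csv(buffer, index=False)
print(buffer.getvalue().splitlines()[0])
```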

Now let's take a look at our final dataframe!

In [None]:
laterality