# Project Outline/Data Set Plan

### Statistical Analysis of How Density and Other Structural Brain Network Topologies Associate with Aspects of Value-Based Decision Making

## Background

The brain is a highly organized network of approximately 86 billion interconnected neurons (CITE). Input-output computations are performed in regions across the brain, and these regions are connected by bundles of axon fibers that communicate over long distances (CITE). Nearly all cognitive processes depend on fast and efficient communication that conveys information processed in one region to other regions throughout the brain. The cell bodies, dendrites, and axon terminals of these neurons together make up the “grey matter”, whereas the axons connecting the cell bodies to the axon terminals make up the “white matter”. Grey matter therefore comprises the brain regions, or nodes, that process information, and white matter axon bundles, or edges, are the connections along which information is sent between distinct brain regions. Together they form the wiring architecture of the human brain.

The exact nature of this wiring architecture, much like a circuit in a computer, shapes brain function and, in turn, cognition (McCulloch, 1944; Johansen-Berg, 2010; Hermundstad et al., 2014). The static organization of the brain's structural architecture is both modular and hierarchical, which supports both local operations and the global integration of segregated functions (Park & Friston, 2013). Measuring individual differences in the structural connectivity of brain networks and relating them to behavior would make it possible to explain the neural constraints on complex cognition (Verstynen 2015).

The structural connectivity of brain networks is measured with diffusion-weighted imaging (DWI), a technique that takes advantage of the diffusion properties of water molecules within the axons of myelinated white matter fascicles. Diffusion tensor imaging is one of the most popular DWI sampling schemes: it samples a few dozen non-collinear diffusion directions, which are used to calculate a tensor describing the average diffusion direction within each voxel (for review, see Vettel et al. 2017). DWI has been used in conjunction with graph theoretic measures of structural topology to understand the functional organization supported by structural networks (for review, see Bullmore & Sporns, 2009). There is growing evidence that many complex brain networks have small-world properties, characterized by dense local clustering among neighboring nodes, which forms modules, paired with short path lengths between nodes in different modules. This small-worldness supports the distributed nature of distinct brain areas while also showing how these modules are integrated into global brain networks (Bassett & Bullmore 2006). However, we still have a limited understanding of how the topological organization of structural connections in the brain predicts individual differences in complex cognitive abilities.

To gain insight into how the brain may be organized to carry out executive processes, we examined whether structural network organization explains differences in the executive abilities needed to complete complex cognitive tasks. Using DWI methods, we measured whether individual differences in white matter topology, as captured by graph theoretic measures, are associated with value-based decision making (payoff or sensitivity to frequency of reward). Value-based decision making is a complex task that draws on visual perception, attention, working memory, reinforcement learning, executive control, and other lower-order functions to synthesize decisions, and it therefore relies on efficient communication across global brain networks (Bechara et al., 1994). We hypothesize that individual differences in brain network topology predict differences in a complex, value-based decision task.


### Variables of Interest

**Subject ID:** Lab_ID


**Demographic Measures:**
 - **Sex:** (Sex) As reported by participants (options: male or female)
 - **Age:** (Age) As collected by the experimenters (in years)


**Binary and Baseline Structural Topology Measures:**
 - **Density:** (density_baseline) the fraction of present connections to all possible connections, calculated without taking connection weights into account (Rubinov & Sporns 2010)
 - **Clustering Coefficient Average:** (clustering_coeff_average.binary._baseline) the network average of the clustering coefficient, which for each node is the fraction of the node's neighbors that are also neighbors of each other (Rubinov & Sporns 2010)
 - **Transitivity:** (transitivity.binary._baseline) the ratio of triangles to triplets in the network; it can be used as an alternative to the clustering coefficient, although the two metrics are not identical (Rubinov & Sporns 2010)
 - **Network Characteristic Path Length:** (network_characteristic_path_length.binary._baseline) the average shortest path length in the network (Rubinov & Sporns 2010)
 - **Small Worldness:** (small.worldness.binary._baseline) dense local clustering or cliquishness of connections between neighboring nodes, combined with a short path length between any (distant) pair of nodes due to the existence of relatively few long-range connections (Bassett & Bullmore 2006)
 - **Global Efficiency:** (global_efficiency.binary._baseline) the average inverse shortest path length in the network (Rubinov & Sporns 2010)
 - **Local Efficiency:** (local_efficiency.binary._baseline) the global efficiency computed on node neighborhoods; it is related to the clustering coefficient (Rubinov & Sporns 2010)
 - **Assortativity Coefficient:** (assortativity_coefficient.binary._baseline) a correlation coefficient between the degrees of all nodes on two opposite ends of a link. A positive value indicates that nodes tend to link to other nodes with the same or a similar degree (Rubinov & Sporns 2010)
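
As a rough illustration of how several of the binary measures above can be computed, the sketch below uses base R on a small, made-up five-node adjacency matrix. The matrix and all variable names are assumptions for illustration only. They are not taken from the project data, and the project's own values may have been produced by different software. Local efficiency, small-worldness, and assortativity are omitted for brevity.

```r
# Hypothetical undirected, unweighted network with 5 nodes (not project data)
A <- matrix(0, 5, 5)
edges <- rbind(c(1, 2), c(1, 3), c(2, 3), c(3, 4), c(4, 5))
A[edges] <- 1
A[edges[, 2:1]] <- 1          # mirror the edges so that A is symmetric

n <- nrow(A)

# Density: present connections divided by all possible connections
net_density <- sum(A[upper.tri(A)]) / (n * (n - 1) / 2)

# Transitivity: 3 * (number of triangles) / (number of connected triplets)
deg <- rowSums(A)
triangles <- sum(diag(A %*% A %*% A)) / 6
triplets <- sum(deg * (deg - 1) / 2)
transitivity <- 3 * triangles / triplets

# All-pairs shortest path lengths via the Floyd-Warshall algorithm
D <- ifelse(A == 1, 1, Inf)
diag(D) <- 0
for (k in seq_len(n)) {
  D <- pmin(D, outer(D[, k], D[k, ], "+"))
}
off_diag <- D[row(D) != col(D)]

# Characteristic path length: average shortest path length
char_path_length <- mean(off_diag)

# Global efficiency: average inverse shortest path length
global_efficiency <- mean(1 / off_diag)

print(c(density = net_density,
        transitivity = transitivity,
        path_length = char_path_length,
        global_efficiency = global_efficiency))
```

For this toy network the density and transitivity are both 0.5 and the characteristic path length is 1.7. Note that the plain average of path lengths becomes infinite if the network is disconnected, whereas global efficiency remains finite because disconnected pairs contribute zero.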


**Baseline Iowa Gambling Task (IGT) Measures:**
 - **Payoff (P score):** (baseline_p) the participant’s total selections from the “advantageous” decks minus their total selections from the “disadvantageous” decks
 - **Sensitivity to Frequency of Reward (Q score):** (baseline_q) the participant’s total selections from decks with a high reward frequency minus their total selections from decks with a low reward frequency
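
The two scores can be sketched in R as follows. This is a minimal example that assumes the conventional four-deck IGT layout, in which decks C and D are advantageous and decks B and D are the high reward frequency decks. Those deck assignments, the function name `igt_scores`, and the example choices are assumptions for illustration and should be checked against the task version actually used.

```r
# Compute P and Q scores from a vector of deck selections.
# ASSUMPTION: decks C/D are advantageous and decks B/D have high reward
# frequency (conventional IGT layout); adjust if the task version differs.
igt_scores <- function(deck_choices,
                       advantageous = c("C", "D"),
                       high_frequency = c("B", "D")) {
  is_adv <- deck_choices %in% advantageous
  is_high <- deck_choices %in% high_frequency
  c(P = sum(is_adv) - sum(!is_adv),     # advantageous minus disadvantageous
    Q = sum(is_high) - sum(!is_high))   # high frequency minus low frequency
}

# Hypothetical participant with 100 selections: 15 A, 20 B, 30 C, 35 D
choices <- rep(c("A", "B", "C", "D"), times = c(15, 20, 30, 35))
scores <- igt_scores(choices)
print(scores)
```

For this hypothetical participant, P = (30 + 35) - (15 + 20) = 30 and Q = (20 + 35) - (15 + 30) = 10.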

### Hypotheses

- We hypothesize that individual differences in brain network topology predict differences in a complex, value-based decision task.
- Tentative: Which measures show the greatest differences between low and high P scores, as well as between low and high Q scores? (Slightly redundant)
- Tentative: Interaction between individual differences in brain network topology and sex when predicting differences in a complex, value-based decision task.
- Tentative: Interaction between individual differences in brain network topology and age when predicting differences in a complex, value-based decision task.
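
One way the tentative interaction hypotheses might be specified is with a linear model that includes a topology-by-sex or topology-by-age interaction term. The sketch below assumes an ordinary linear model (a Gaussian generalized linear model) and uses randomly generated stand-in data. Only the column names come from the variables of interest above; the data frame `toy`, the model objects, and every value in them are hypothetical.

```r
set.seed(42)

# Synthetic stand-in data (NOT project data); column names follow the
# variables of interest listed above
n <- 60
toy <- data.frame(
  density_baseline = rnorm(n, mean = 0.055, sd = 0.003),
  Sex = factor(sample(c(0, 1), n, replace = TRUE)),
  Age = round(runif(n, min = 35, max = 55))
)
toy$baseline_p <- round(rnorm(n, mean = 30, sd = 20))

# `a * b` in an R formula expands to a + b + a:b, so each model contains
# both main effects and their interaction
fit_sex <- lm(baseline_p ~ density_baseline * Sex, data = toy)
fit_age <- lm(baseline_p ~ density_baseline * Age, data = toy)

# The interaction rows are labelled "density_baseline:Sex1" and
# "density_baseline:Age" in the coefficient tables
print(coef(summary(fit_sex)))
print(coef(summary(fit_age)))
```

With real data, centering the continuous predictors before fitting usually makes the main-effect coefficients easier to interpret alongside the interaction term.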


### Data Organization

 - Description of Data Architecture
 
 - Completed Analyses:
    - Distribution Analysis
    - Correlation Matrices 
    - Principal Component Analyses 
    - Generalized Linear Model
    - Cross-Validation Analysis
 
 - New Possible Analyses:
 
    - **Bootstrapping**
    - **Prediction question - holdout test prediction** (cross validation)
    - Weight question
    - Permutation testing
    - Decision Trees
    - Model Selection (?)
    - Support Vector Regression (support vector machines are for categorical outcomes)
    - Come up with my own statistic
    - Come up with a neural network (spiking neural network) → Boltzmann machine with back propagation
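
Two of the proposed analyses, bootstrapping and permutation testing, can be sketched in base R as follows. The example estimates a percentile bootstrap confidence interval and a two-sided permutation p-value for the correlation between one topology measure and the P score. All data here are simulated and every variable name is an assumption; nothing below reflects actual project results.

```r
set.seed(1)

# Simulated stand-in data (NOT project data)
n <- 40
net_density <- rnorm(n, mean = 0.055, sd = 0.003)
p_score <- 20 + 2000 * (net_density - 0.055) + rnorm(n, sd = 15)

observed_r <- cor(net_density, p_score)

# Bootstrap: resample participants with replacement, keeping each
# participant's (density, P score) pair together
n_boot <- 2000
boot_r <- replicate(n_boot, {
  idx <- sample.int(n, replace = TRUE)
  cor(net_density[idx], p_score[idx])
})
boot_ci <- quantile(boot_r, probs = c(0.025, 0.975))  # 95% percentile interval

# Permutation test: shuffle P scores to break any association with density,
# building a null distribution for the correlation
n_perm <- 2000
perm_r <- replicate(n_perm, cor(net_density, sample(p_score)))
perm_p <- (sum(abs(perm_r) >= abs(observed_r)) + 1) / (n_perm + 1)

print(observed_r)
print(boot_ci)
print(perm_p)
```

The `+ 1` in the numerator and denominator of `perm_p` counts the observed statistic as one of the permutations, which keeps the estimated p-value from being exactly zero.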


### Data Cleansing

#### Data Auditing

Rough steps completed (more detail to be added later):

0. Make a new CSV file from the original data file

1. Only keep data that is relevant to my questions

2. Remove participants with missing data
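
The three steps above could look roughly like the following in R. This is a sketch under stated assumptions: the original data file is stood in for by a small made-up data frame written to a temporary file, and the paths, the `unrelated_measure` column, and the `keep` vector are hypothetical rather than the project's actual files or column selection.

```r
# Stand-in for the original data file (made-up values, NOT project data)
raw <- data.frame(
  Lab_ID = c(1, 2, 3),
  Sex = c(1, 0, 1),
  Age = c(40, 45, 50),
  density_baseline = c(0.050, NA, 0.056),
  unrelated_measure = c(7, 8, 9),
  baseline_p = c(10, 20, 30),
  baseline_q = c(5, -5, 15)
)
original_path <- tempfile(fileext = ".csv")
write.csv(raw, original_path, row.names = FALSE)

# Step 0: make a new CSV file so the original is never modified
working_path <- tempfile(fileext = ".csv")
file.copy(original_path, working_path)
dat <- read.csv(working_path)

# Step 1: keep only the columns relevant to the questions
keep <- c("Lab_ID", "Sex", "Age", "density_baseline",
          "baseline_p", "baseline_q")
dat <- dat[, keep]

# Step 2: remove participants with any missing value
dat <- dat[complete.cases(dat), ]

write.csv(dat, working_path, row.names = FALSE)
print(dat)
```

Recording how many participants are dropped in step 2, for example by comparing `nrow()` before and after, makes the exclusion easy to report later.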

#### Workflow Specification

No data anomalies present.

#### Workflow Execution and Post-processing Control

No automated processes made yet to assess data quality.

### Resulting Tidy-compliant Table:

```r
# Not the final tidy-compliant table
mergedWINData <- read.csv(file = "~/Desktop/coax/WINCode/mergedWINData.csv")
mergedWINData
```

| X | Lab_ID | density_baseline | clustering_coeff_average.binary._baseline | transitivity.binary._baseline | transitivity.weighted._baseline | network_characteristic_path_length.binary._baseline | network_characteristic_path_length.weighted._baseline | small.worldness.binary._baseline | small.worldness.weighted._baseline | ⋯ | Sex | Age | yrs_edu | Smoker | Handedness | base_bmi_measured | DXA | VO2mlkgmin | baseline_p | baseline_q |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 106 | 0.0502380 | 0.237317 | 0.01407030 | 0.0576544 | 3.72258 | 3.72258 | 0.0637507 | 0.00324428 | ⋯ | 1 | 46 | 20 | 0 | 0 | 27.98209 | 0.4450000 | 20.8 | 12 | 42 |
| 2 | 127 | 0.0544685 | 0.292529 | 0.01329590 | 0.0624257 | 3.45408 | 3.45408 | 0.0846908 | 0.00433501 | ⋯ | 0 | 45 | 16 | 0 | 0 | 31.63888 | 0.3238345 | 31.3 | 8 | 10 |
| 3 | 129 | 0.0565838 | 0.262644 | 0.04731490 | 0.1098690 | 3.45600 | 3.45600 | 0.0759963 | 0.00973699 | ⋯ | 1 | 52 | 16 | 0 | 0 | 29.36326 | 0.4492734 | 23.0 | 58 | 46 |
| 4 | 130 | 0.0528821 | 0.241766 | 0.05706050 | 0.1508740 | 4.01388 | 4.01388 | 0.0602323 | 0.00896556 | ⋯ | 1 | 51 | 16 | 0 | 0 | 35.26349 | 0.4948101 | 16.9 | 30 | 32 |
| 5 | 131 | 0.0560550 | 0.271800 | 0.02034770 | 0.1149910 | 3.42474 | 3.42474 | 0.0793635 | 0.00614783 | ⋯ | 0 | 43 | 19 | 0 | 0 | 28.72069 | 0.3286354 | 25.8 | 68 | 38 |
| 6 | 133 | 0.0608144 | 0.257770 | 0.02232990 | 0.1020630 | 3.60107 | 3.60107 | 0.0715815 | 0.00790056 | ⋯ | 0 | 51 | 12 | 0 | 0 | 30.70825 | 0.3109150 | 37.3 | 48 | -38 |
| 7 | 136 | 0.0549974 | 0.252375 | 0.05229420 | 0.1318120 | 3.38854 | 3.38854 | 0.0744791 | 0.00563007 | ⋯ | 1 | 39 | 18 | 0 | 0 | 33.44539 | 0.4359296 | 15.6 | 36 | 62 |
| 8 | 140 | 0.0592279 | 0.260411 | 0.02913240 | 0.0550979 | 3.30548 | 3.30548 | 0.0787814 | 0.00380613 | ⋯ | 1 | 53 | 18 | 0 | 0 | 28.61882 | 0.4190068 | 24.5 | 62 | 38 |
| 9 | 142 | 0.0565838 | 0.240678 | 0.05255050 | 0.1780750 | 3.80262 | 3.80262 | 0.0632926 | 0.01689790 | ⋯ | 1 | 47 | 16 | 0 | 0 | 32.78539 | 0.4621656 | 17.7 | 8 | 70 |
| 10 | 143 | 0.0555262 | 0.259412 | 0.02611450 | 0.0936502 | 3.23769 | 3.23769 | 0.0801225 | 0.00762512 | ⋯ | 1 | 52 | 12 | 1 | 0 | 28.78499 | 0.4298379 | 25.8 | 6 | -2 |


### Analysis

- show both data visualizations and summarize the results from your models

### Conclusions

- short (1 paragraph) conclusion with respect to the models you have proposed