# scona

scona is a tool to perform network analysis over correlation networks of brain regions. 
This tutorial walks through the basic functionality of scona, taking us from our inputs (a matrix of structural regional measures over subjects) to a report of local network measures for each brain region, and network-level comparisons to a cohort of random graphs with the same degree distribution. 

In [2]:
import numpy as np
import networkx as nx
import scona as scn
import scona.datasets as datasets

### Importing data

A scona analysis starts with four inputs.
* __regional_measures__
    A pandas DataFrame with subjects as rows. The columns should include structural measures for each brain region, as well as any subject-wise covariates. 
* __names__
    A list of names of the brain regions. This will be used to specify which columns of the __regional_measures__ matrix you want to correlate over.
* __covars__ _(optional)_ 
    A list of your covariates. This will be used to specify which columns of __regional_measures__ you wish to correct for. 
* __centroids__
    A list of tuples representing the cartesian coordinates of brain regions. This list should be in the same order as the list of brain regions so that coordinates are assigned to the correct regions. The coordinates are expected to obey the convention that the x=0 plane is the plane that separates the left and right hemispheres of the brain. 
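
As a rough illustration of how these four inputs fit together, the following sketch builds a toy dataset by hand. The region names, the covariate and all values are made up for illustration and are not part of the NSPN data. The checks are simply the consistency conditions described in the list above, and treating negative x as "left" is an assumption (it matches the `lh_` centroids in the sample data).

```python
import pandas as pd

# Hypothetical toy data: 3 subjects, 2 brain regions, 1 covariate.
regional_measures = pd.DataFrame({
    "lh_region_a": [2.5, 2.7, 2.6],
    "rh_region_a": [2.4, 2.8, 2.5],
    "age": [14.0, 18.5, 21.0],
})
names = ["lh_region_a", "rh_region_a"]   # columns to correlate over
covars = ["age"]                          # columns to correct for
# One (x, y, z) tuple per region, in the same order as `names`.
centroids = [(-30.0, 10.0, 5.0), (30.0, 10.0, 5.0)]

# Every region and covariate must be a column of the DataFrame,
# and there must be exactly one centroid per region.
assert set(names) <= set(regional_measures.columns)
assert set(covars) <= set(regional_measures.columns)
assert len(centroids) == len(names)

# With the x=0 plane separating the hemispheres, the sign of x gives the side.
# Assumption: negative x means left hemisphere.
hemisphere = {
    name: ("left" if xyz[0] < 0 else "right")
    for name, xyz in zip(names, centroids)
}
```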

In [3]:
# Read in sample data from the NSPN WhitakerVertes PNAS 2016 paper.
df, names, covars, centroids = datasets.NSPN_WhitakerVertes_PNAS2016.import_data()

In [4]:
df.head()

Unnamed: 0.1,Unnamed: 0,nspn_id,occ,centre,study_primary,age_scan,sex,male,age_bin,mri_centre,...,rh_supramarginal_part5,rh_supramarginal_part6,rh_supramarginal_part7,rh_frontalpole_part1,rh_temporalpole_part1,rh_transversetemporal_part1,rh_insula_part1,rh_insula_part2,rh_insula_part3,rh_insula_part4
0,0,10356,0,Cambridge,2K_Cohort,20.761,Female,0.0,4,WBIC,...,2.592,2.841,2.318,2.486,3.526,2.638,3.308,2.583,3.188,3.089
1,1,10702,0,Cambridge,2K_Cohort,16.055,Male,1.0,2,WBIC,...,3.448,3.283,2.74,3.225,4.044,3.04,3.867,2.943,3.478,3.609
2,2,10736,0,Cambridge,2K_Cohort,14.897,Female,0.0,1,WBIC,...,3.526,3.269,3.076,3.133,3.9,2.914,3.894,2.898,3.72,3.58
3,3,10778,0,Cambridge,2K_Cohort,20.022,Female,0.0,4,WBIC,...,2.83,2.917,2.647,2.796,3.401,3.045,3.138,2.739,2.833,3.349
4,4,10794,0,Cambridge,2K_Cohort,14.656,Female,0.0,1,WBIC,...,2.689,3.294,2.82,2.539,2.151,2.734,2.791,2.935,3.538,3.403


### Create a correlation matrix
We calculate residuals for the columns of `df` listed in `names`, correcting for the columns listed in `covars`.

In [5]:
df_res = scn.create_residuals_df(df, names, covars)

In [6]:
df_res

Unnamed: 0,lh_bankssts_part1,lh_bankssts_part2,lh_caudalanteriorcingulate_part1,lh_caudalmiddlefrontal_part1,lh_caudalmiddlefrontal_part2,lh_caudalmiddlefrontal_part3,lh_caudalmiddlefrontal_part4,lh_cuneus_part1,lh_cuneus_part2,lh_entorhinal_part1,...,rh_supramarginal_part5,rh_supramarginal_part6,rh_supramarginal_part7,rh_frontalpole_part1,rh_temporalpole_part1,rh_transversetemporal_part1,rh_insula_part1,rh_insula_part2,rh_insula_part3,rh_insula_part4
0,-0.018479,-0.039867,0.038789,-0.004891,0.042484,-0.005625,-0.259031,-0.181875,-0.207344,-0.125609,...,-0.447062,-0.133609,-0.40075,-0.459313,-0.149781,-0.108094,-0.340437,-0.170906,-0.202625,-0.426625
1,0.278521,0.351133,0.485789,0.697109,0.408484,0.445375,0.383969,0.315125,0.374656,0.334391,...,0.408937,0.308391,0.02125,0.279688,0.368219,0.293906,0.218562,0.189094,0.087375,0.093375
2,0.166521,0.078133,0.368789,0.412109,0.285484,0.187375,0.569969,-0.072875,-0.077344,0.368391,...,0.486938,0.294391,0.35725,0.187688,0.224219,0.167906,0.245562,0.144094,0.329375,0.064375
3,-0.088479,-0.252867,-0.401211,-0.362891,-0.044516,-0.154625,-0.163031,0.018125,-0.261344,-0.015609,...,-0.209062,-0.057609,-0.07175,-0.149312,-0.274781,0.298906,-0.510438,-0.014906,-0.557625,-0.166625
4,0.398521,0.133133,0.128789,-0.218891,-1.001516,-0.142625,-0.759031,-0.102875,-0.565344,-1.229609,...,-0.350063,0.319391,0.10125,-0.406312,-1.524781,-0.012094,-0.857437,0.181094,0.147375,-0.112625
5,0.008521,-0.093867,-0.245211,-0.131891,-0.078516,-0.110625,-0.149031,0.135125,-0.062344,0.149391,...,-0.123062,-0.084609,-0.14975,-0.137312,-0.247781,0.204906,0.307563,0.071094,0.256375,0.066375
6,-0.120479,0.130133,0.092789,0.047109,0.270484,0.017375,-0.044031,0.050125,0.013656,-0.220609,...,0.054937,0.017391,-0.01375,-0.090313,0.304219,-0.112094,-0.052437,-0.026906,-0.103625,0.015375
7,0.104521,0.040133,0.358789,0.174109,0.251484,0.068375,-0.024031,0.109125,0.080656,0.327391,...,0.401938,0.124391,0.23525,-0.252312,-0.220781,0.324906,-0.374438,0.004094,-0.039625,-0.151625
8,0.431521,0.034133,0.194789,0.103109,0.208484,-0.091625,0.203969,-0.116875,-0.005344,0.382391,...,0.077937,0.172391,0.29525,-0.106313,0.170219,0.057906,0.385562,0.013094,-0.071625,0.008375
9,0.189521,0.078133,0.096789,0.100109,0.014484,0.180375,0.116969,-0.073875,-0.195344,-0.329609,...,-0.223062,-0.032609,-0.19275,0.504688,-0.001781,0.118906,0.394563,-0.196906,-0.228625,0.032375
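
One common way to correct a regional measure for covariates is to fit an ordinary least-squares model and keep what the model does not explain. The sketch below illustrates that idea with NumPy on made-up data. `residuals_after_covars` is a hypothetical helper and is not claimed to match scona's `create_residuals_df` in detail.

```python
import numpy as np

def residuals_after_covars(y, covar_matrix):
    """Residuals of y after a least-squares fit on the covariates plus an intercept."""
    y = np.asarray(y, dtype=float)
    X = np.asarray(covar_matrix, dtype=float)
    if X.ndim == 1:
        X = X[:, None]          # a single covariate becomes one column
    # Append a column of ones so the model includes an intercept.
    design = np.column_stack([X, np.ones(len(y))])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return y - design @ coef

# Toy example: a measure that depends linearly on age, plus noise.
rng = np.random.default_rng(0)
age = rng.uniform(14, 24, size=50)
thickness = 3.0 - 0.02 * age + rng.normal(0, 0.1, size=50)
res = residuals_after_covars(thickness, age)
```

By construction, residuals from a fit with an intercept have zero mean and are uncorrelated with the covariate, so any remaining correlation between regions is not driven linearly by that covariate.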


Now we create a correlation matrix over the columns of `df_res`.

In [7]:
M = scn.create_corrmat(df_res, method='pearson')
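
To make the shape of the result concrete, here is a minimal sketch of a Pearson correlation matrix computed with NumPy on made-up residuals. The array `toy_res` is hypothetical, and this shows the general technique rather than scona's internals.

```python
import numpy as np

rng = np.random.default_rng(1)
# Hypothetical residuals: 40 subjects (rows) by 4 regions (columns).
toy_res = rng.normal(size=(40, 4))
toy_res[:, 1] += 0.8 * toy_res[:, 0]   # make regions 0 and 1 covary

# rowvar=False treats columns as variables, so entry (i, j) is the
# Pearson correlation between region i and region j across subjects.
toy_M = np.corrcoef(toy_res, rowvar=False)
```

The result is a square, symmetric matrix with one row and column per region and ones on the diagonal.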

## Create a weighted graph

A short sidenote on the `BrainNetwork` class: this is a very lightweight subclass of the [`networkx.Graph`](https://networkx.github.io/documentation/stable/reference/classes/graph.html) class. This means that any method you can use on a `networkx.Graph` object can also be used on a `BrainNetwork` object, although the reverse is not true. We have added various methods that keep track of measures that have already been calculated, which saves a lot of time, especially later on when one is dealing with 10^3 random graphs.  
All scona measures are implemented in such a way that they can be used on a regular `networkx.Graph` object. For example, instead of `G.threshold(10)` you can use `scn.threshold_graph(G, 10)`.  
You can also create a `BrainNetwork` from a `networkx.Graph` `G`, using `scn.BrainNetwork(network=G)`.

Initialise a weighted graph `G` from the correlation matrix `M`. The `parcellation` and `centroids` arguments are used to label nodes with names and coordinates respectively. 

In [8]:
G = scn.BrainNetwork(network=M, parcellation=names, centroids=centroids)
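
For readers who want to see what such a graph looks like in plain networkx, the following sketch builds a weighted graph from a small made-up correlation matrix and labels the nodes. The matrix, names and coordinates are hypothetical, and the attribute keys `name` and `centroids` are chosen to mirror the columns in the reports in this tutorial.

```python
import numpy as np
import networkx as nx

toy_M = np.array([[1.0, 0.6, 0.1],
                  [0.6, 1.0, 0.4],
                  [0.1, 0.4, 1.0]])
toy_names = ["region_a", "region_b", "region_c"]   # hypothetical names
toy_centroids = [(-30.0, 0.0, 10.0), (30.0, 0.0, 10.0), (-5.0, 20.0, 25.0)]

# Zero the diagonal so self-correlations do not become self-loops.
W = toy_M.copy()
np.fill_diagonal(W, 0.0)
toy_G = nx.from_numpy_array(W)   # edge attribute "weight" holds the correlation

# Label node i with the i-th name and the i-th coordinate tuple.
nx.set_node_attributes(toy_G, values=dict(enumerate(toy_names)), name="name")
nx.set_node_attributes(toy_G, values=dict(enumerate(toy_centroids)), name="centroids")
```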

### Threshold to create a binary graph

We threshold G at cost 10 to create a binary graph with 10% as many edges as the complete graph G. Ordinarily when thresholding one takes the 10% of edges with the highest weight. In our case, because we want the resulting graph to be connected, we calculate a minimum spanning tree first. If you want to omit this step, you can pass the argument `mst=False` to `threshold`.
The `threshold` method does not edit the graph in place; it returns a new graph, which we assign to `H`.

In [9]:
H = G.threshold(10)
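
The idea of thresholding at a given cost while seeding with a spanning tree can be sketched in plain networkx as follows. `threshold_with_mst` is a hypothetical helper written for illustration, under the assumption that the tree is built from the strongest edges (a maximum spanning tree on the weights). It is not claimed to reproduce scona's `threshold` exactly.

```python
import networkx as nx

def threshold_with_mst(G, cost):
    """Binary graph keeping about `cost` percent of all possible edges."""
    n = G.number_of_nodes()
    target = round(cost * n * (n - 1) / 200)   # cost% of n*(n-1)/2 edges
    H = nx.Graph()
    H.add_nodes_from(G.nodes(data=True))
    # Start from a spanning tree of the strongest edges so H is connected.
    tree = nx.maximum_spanning_tree(G, weight="weight")
    H.add_edges_from(tree.edges())
    # Then add the remaining edges from highest to lowest weight.
    ranked = sorted(G.edges(data="weight"), key=lambda e: e[2], reverse=True)
    for u, v, _ in ranked:
        if H.number_of_edges() >= target:
            break
        H.add_edge(u, v)   # no effect if the edge is already in the tree
    return H

# Toy weighted complete graph with arbitrary distinct weights.
toy = nx.complete_graph(10)
for i, (u, v) in enumerate(toy.edges()):
    toy[u][v]["weight"] = ((7 * i) % 45) / 45.0
toy_H = threshold_with_mst(toy, 40)
```

Note that in this sketch the spanning tree is always kept, so if the target edge count is smaller than the tree, the result has more edges than the cost implies.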

### Calculate nodal summary

`calculate_nodal_measures` will compute and record the following nodal measures:

* average_dist (if centroids available)
* total_dist (if centroids available)
* betweenness
* closeness
* clustering coefficient
* degree
* interhem (if centroids are available)
* interhem_proportion (if centroids are available)
* nodal partition
* participation coefficient under partition calculated above
* shortest_path_length
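
Several of the measures in this list have direct counterparts in networkx. The sketch below computes a few of them on a small built-in example graph (not brain data) to show what kind of per-node values are involved. scona's own definitions may differ in detail, for instance in normalisation.

```python
import networkx as nx

toy = nx.karate_club_graph()   # small built-in example graph

nodal = {
    "degree": dict(toy.degree()),
    "clustering": nx.clustering(toy),
    "betweenness": nx.betweenness_centrality(toy),
    "closeness": nx.closeness_centrality(toy),
}
# Mean shortest path length from each node to every other node.
nodal["shortest_path_length"] = {
    node: sum(dist.values()) / (toy.number_of_nodes() - 1)
    for node, dist in nx.shortest_path_length(toy)
}
```

Each entry of `nodal` is a dictionary keyed by node, which is the same shape that `nx.set_node_attributes` accepts.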

`report_nodal_measures` returns nodal attributes in a DataFrame. Let's try it now, before any measures have been calculated.

In [10]:
H.report_nodal_measures().head()

Unnamed: 0,name,centroids,x,y,z
0,lh_bankssts_part1,"[-56.40355, -40.152663, 1.708876]",-56.4036,-40.1527,1.70888
1,lh_bankssts_part2,"[-53.140506, -49.843038, 8.264557]",-53.1405,-49.843,8.26456
2,lh_caudalanteriorcingulate_part1,"[-5.001684, 20.645903, 25.733446]",-5.00168,20.6459,25.7334
3,lh_caudalmiddlefrontal_part1,"[-33.265925, 20.200202, 45.347826]",-33.2659,20.2002,45.3478
4,lh_caudalmiddlefrontal_part2,"[-31.958115, 2.146597, 51.26911]",-31.9581,2.1466,51.2691


Use `calculate_nodal_measures` to fill in the remaining nodal measures.

In [11]:
H.calculate_nodal_measures()

In [12]:
H.report_nodal_measures().head()

Unnamed: 0,name,centroids,betweenness,closeness,clustering,degree,module,participation_coefficient,shortest_path_length,x,y,z
0,lh_bankssts_part1,"[-56.40355, -40.152663, 1.708876]",0.00824713,0.495961,0.3358,47,0,0.717067,2.00974,-56.4036,-40.1527,1.70888
1,lh_bankssts_part2,"[-53.140506, -49.843038, 8.264557]",0.0124798,0.507438,0.278788,55,0,0.809587,1.96429,-53.1405,-49.843,8.26456
2,lh_caudalanteriorcingulate_part1,"[-5.001684, 20.645903, 25.733446]",0.0,0.336254,1.0,2,1,0.75,2.96429,-5.00168,20.6459,25.7334
3,lh_caudalmiddlefrontal_part1,"[-33.265925, 20.200202, 45.347826]",0.0120765,0.525685,0.383485,83,2,0.459864,1.8961,-33.2659,20.2002,45.3478
4,lh_caudalmiddlefrontal_part2,"[-31.958115, 2.146597, 51.26911]",0.0292617,0.549195,0.293617,95,2,0.688753,1.81494,-31.9581,2.1466,51.2691


We can also add measures as one might normally add nodal attributes to a networkx graph.

In [13]:
nx.set_node_attributes(H, name="hat", values={x: x**2 for x in H.nodes})

These show up in our DataFrame too.

In [14]:
H.report_nodal_measures(columns=['name', 'degree', 'hat']).head()

Unnamed: 0,name,degree,hat
0,lh_bankssts_part1,47,0
1,lh_bankssts_part2,55,1
2,lh_caudalanteriorcingulate_part1,2,4
3,lh_caudalmiddlefrontal_part1,83,9
4,lh_caudalmiddlefrontal_part2,95,16


### Calculate Global measures

In [15]:
H.calculate_global_measures()

{'average_clustering': 0.4498887255891581,
 'average_shortest_path_length': 2.376242649858285,
 'assortativity': 0.09076922258276784,
 'modularity': 0.3828553111606414,
 'efficiency': 0.47983958611582744}
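
Comparable global measures can be computed with plain networkx, as in the sketch below on a small built-in example graph. The partition used for modularity here comes from greedy modularity maximisation, which is a stand-in chosen for this example. scona may use a different community detection method, so the values are not expected to match scona's on the same graph.

```python
import networkx as nx
from networkx.algorithms import community

toy = nx.karate_club_graph()   # small built-in example graph

# Stand-in partition (assumption: not necessarily scona's method).
parts = community.greedy_modularity_communities(toy)

toy_global = {
    "average_clustering": nx.average_clustering(toy),
    "average_shortest_path_length": nx.average_shortest_path_length(toy),
    "assortativity": nx.degree_assortativity_coefficient(toy),
    # weight=None treats the graph as binary, like a thresholded graph.
    "modularity": community.modularity(toy, parts, weight=None),
    "efficiency": nx.global_efficiency(toy),
}
```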

In [16]:
H.rich_club();
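
The rich club coefficient at degree k is commonly defined as the edge density among the nodes whose degree is greater than k. The sketch below implements that definition directly. `rich_club_by_degree` is a hypothetical helper for illustration and is not scona's implementation.

```python
import networkx as nx

def rich_club_by_degree(G):
    """Density of the subgraph induced by nodes of degree > k, for each k."""
    coefficients = {}
    degrees = dict(G.degree())
    for k in range(max(degrees.values())):
        rich = [node for node, d in degrees.items() if d > k]
        if len(rich) < 2:
            break   # density is undefined for fewer than two nodes
        edges = G.subgraph(rich).number_of_edges()
        coefficients[k] = 2 * edges / (len(rich) * (len(rich) - 1))
    return coefficients

toy = nx.karate_club_graph()
toy_rc = rich_club_by_degree(toy)
```

At k = 0 every node with at least one edge is included, so for a graph with no isolated nodes the first coefficient equals the overall edge density.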

## Create a GraphBundle

The `GraphBundle` object is the scona way to handle across-network comparisons. What is it? Essentially it's a Python dictionary with `BrainNetwork` objects as values. 

In [17]:
brain_bundle = scn.GraphBundle([H], ['NSPN_cost=10'])

This creates a dictionary-like object with BrainNetwork `H` keyed by `'NSPN_cost=10'`

In [18]:
brain_bundle

{'NSPN_cost=10': <scona.classes.BrainNetwork at 0x7f9c66705208>}

Now add a series of random graphs created by edge swap randomisation of `H` (keyed by `'NSPN_cost=10'`).

In [19]:
# Note that 10 is not usually a sufficient number of random graphs to do meaningful analysis,
# it is used here for time considerations
brain_bundle.create_random_graphs('NSPN_cost=10', 10)

        Creating 10 random graphs - may take a little while
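
Edge swap randomisation rewires a graph while leaving every node's degree unchanged. A minimal sketch with networkx's `double_edge_swap` is shown below. The helper name, the number of swaps per edge and the retry budget are assumptions made for this example, not values taken from scona.

```python
import networkx as nx

def degree_preserving_random_graph(G, swaps_per_edge=10, seed=None):
    """Copy of G rewired by double edge swaps; every node keeps its degree."""
    R = G.copy()
    nswap = swaps_per_edge * R.number_of_edges()
    # Each swap replaces edges (u, v) and (x, y) with (u, x) and (v, y).
    nx.double_edge_swap(R, nswap=nswap, max_tries=100 * nswap, seed=seed)
    return R

toy = nx.karate_club_graph()
toy_R = degree_preserving_random_graph(toy, seed=0)
```

Plain `double_edge_swap` does not guarantee that the result stays connected. If connectedness matters, one option is to check `nx.is_connected` and retry, or to use `nx.connected_double_edge_swap`.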


In [20]:
brain_bundle

{'NSPN_cost=10': <scona.classes.BrainNetwork at 0x7f9c66705208>,
 'NSPN_cost=10_R0': <scona.classes.BrainNetwork at 0x7f9c60f85710>,
 'NSPN_cost=10_R1': <scona.classes.BrainNetwork at 0x7f9c64603d30>,
 'NSPN_cost=10_R2': <scona.classes.BrainNetwork at 0x7f9c646035c0>,
 'NSPN_cost=10_R3': <scona.classes.BrainNetwork at 0x7f9c666bac88>,
 'NSPN_cost=10_R4': <scona.classes.BrainNetwork at 0x7f9c666baac8>,
 'NSPN_cost=10_R5': <scona.classes.BrainNetwork at 0x7f9c666ba7b8>,
 'NSPN_cost=10_R6': <scona.classes.BrainNetwork at 0x7f9c61fe8588>,
 'NSPN_cost=10_R7': <scona.classes.BrainNetwork at 0x7f9c61fe8c50>,
 'NSPN_cost=10_R8': <scona.classes.BrainNetwork at 0x7f9c61fe8940>,
 'NSPN_cost=10_R9': <scona.classes.BrainNetwork at 0x7f9c61fe8860>}

### Report on a GraphBundle

The following method will calculate global measures (if they have not already been calculated) for all of the graphs in `brain_bundle` and report the results in a DataFrame. We can do the same for rich club coefficients below.

In [21]:
brain_bundle.report_global_measures()

Unnamed: 0,assortativity,average_clustering,average_shortest_path_length,efficiency,modularity
NSPN_cost=10,0.090769,0.449889,2.376243,0.47984,0.382855
NSPN_cost=10_R0,-0.088852,0.243845,2.078768,0.520413,0.127825
NSPN_cost=10_R1,-0.066267,0.225807,2.092094,0.518241,0.126067
NSPN_cost=10_R2,-0.103746,0.233042,2.077816,0.520524,0.127926
NSPN_cost=10_R3,-0.078599,0.230792,2.090148,0.518603,0.132649
NSPN_cost=10_R4,-0.061296,0.223642,2.087906,0.518872,0.131772
NSPN_cost=10_R5,-0.100853,0.229629,2.083146,0.519853,0.127226
NSPN_cost=10_R6,-0.069217,0.222723,2.089048,0.518659,0.126808
NSPN_cost=10_R7,-0.073193,0.226694,2.089852,0.518797,0.130764
NSPN_cost=10_R8,-0.100933,0.222562,2.088307,0.518941,0.124251
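
A report like this is typically used to compare the real network against the distribution over random graphs. The sketch below mimics that workflow with a plain dictionary of networkx graphs and a pandas DataFrame. The keys, the number of random graphs and the choice of measures are all illustrative assumptions.

```python
import networkx as nx
import pandas as pd

real = nx.karate_club_graph()   # small built-in example graph
bundle = {"real": real}
for i in range(5):
    R = real.copy()
    nswap = 10 * R.number_of_edges()
    nx.double_edge_swap(R, nswap=nswap, max_tries=100 * nswap, seed=i)
    bundle["real_R{}".format(i)] = R

# One row per graph, one column per measure.
report = pd.DataFrame.from_dict(
    {
        key: {
            "average_clustering": nx.average_clustering(g),
            "assortativity": nx.degree_assortativity_coefficient(g),
        }
        for key, g in bundle.items()
    },
    orient="index",
)

# Ratio of the real value to the mean over the random graphs.
random_rows = report.drop(index="real")
clustering_ratio = (
    report.loc["real", "average_clustering"]
    / random_rows["average_clustering"].mean()
)
```

As the comment in the tutorial cell notes for scona itself, a handful of random graphs is too few for meaningful analysis; the small number here only keeps the example fast.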


In [22]:
brain_bundle.report_rich_club()

Unnamed: 0,NSPN_cost=10,NSPN_cost=10_R0,NSPN_cost=10_R1,NSPN_cost=10_R2,NSPN_cost=10_R3,NSPN_cost=10_R4,NSPN_cost=10_R5,NSPN_cost=10_R6,NSPN_cost=10_R7,NSPN_cost=10_R8,NSPN_cost=10_R9
0,0.100004,0.100004,0.100004,0.100004,0.100004,0.100004,0.100004,0.100004,0.100004,0.100004,0.100004
1,0.103228,0.103228,0.103228,0.103228,0.103228,0.103228,0.103228,0.103228,0.103228,0.103228,0.103228
2,0.107244,0.107175,0.107175,0.107175,0.107175,0.107175,0.107175,0.107175,0.107175,0.107175,0.107175
3,0.112039,0.111920,0.111920,0.111920,0.111920,0.111920,0.111920,0.111920,0.111920,0.111920,0.111920
4,0.117842,0.117564,0.117564,0.117564,0.117564,0.117564,0.117564,0.117564,0.117564,0.117589,0.117589
5,0.122398,0.121950,0.121950,0.121950,0.121950,0.121950,0.121950,0.121976,0.121950,0.121976,0.121976
6,0.127975,0.127254,0.127226,0.127282,0.127254,0.127282,0.127226,0.127282,0.127226,0.127282,0.127254
7,0.131899,0.131121,0.131092,0.131179,0.131150,0.131150,0.131092,0.131150,0.131092,0.131179,0.131121
8,0.136820,0.135855,0.135855,0.135976,0.135885,0.135915,0.135855,0.135945,0.135885,0.135976,0.135976
9,0.141069,0.139877,0.139908,0.140003,0.139971,0.140003,0.139877,0.140034,0.139940,0.140034,0.140034
