The goal of this notebook is to test the modularity data analysis pipeline on a test case. 

In [1]:
from neuprint import Client
# the token is not stored in the notebook; when no token argument is passed,
# neuprint reads it from the NEUPRINT_APPLICATION_CREDENTIALS environment variable
c = Client('neuprint.janelia.org', dataset='hemibrain:v1.2.1')
c.fetch_version()

# import important stuff here
import numpy as np
import pandas as pd
import matplotlib

In [2]:
# testing on a neuron with few connections
test_neuron_Id = 676124666  # alternative bodyId: 1815929980

In [3]:
from neuprint import fetch_neurons
test_neuron_df, test_syns = fetch_neurons(test_neuron_Id)
test_neuron_df

Unnamed: 0,bodyId,instance,type,pre,post,downstream,upstream,mito,size,status,cropped,statusLabel,cellBodyFiber,somaRadius,somaLocation,roiInfo,notes,inputRois,outputRois
0,676124666,,,1,1,9,1,0,178283,,,,,,,"{'SNP(R)': {'pre': 1, 'post': 1, 'downstream':...",,"[SMP(R), SNP(R)]","[SMP(R), SNP(R)]"


In [4]:
test_syns

Unnamed: 0,bodyId,roi,pre,post,downstream,upstream,mito
0,676124666,NotPrimary,0,0,0,0,0
1,676124666,SMP(R),1,1,9,1,0
2,676124666,SNP(R),1,1,9,1,0


In [5]:
from neuprint import fetch_simple_connections

test_inputs = fetch_simple_connections(None, test_neuron_Id)
test_inputs

Unnamed: 0,bodyId_pre,bodyId_post,weight,type_pre,type_post,instance_pre,instance_post,conn_roiInfo
0,644421171,676124666,1,,,,,"{'SNP(R)': {'pre': 1, 'post': 1}, 'SMP(R)': {'..."


The test neuron has a single pre site (pre = 1), so all of its downstream neurons are contacted from that same pre site.

In [6]:
from neuprint import fetch_simple_connections

test_outputs = fetch_simple_connections(test_neuron_Id,None)
test_outputs

Unnamed: 0,bodyId_pre,bodyId_post,weight,type_pre,type_post,instance_pre,instance_post,conn_roiInfo
0,676124666,423101189,1,,oviIN,,oviIN_R,"{'SNP(R)': {'pre': 1, 'post': 1}, 'SMP(R)': {'..."
1,676124666,644761952,1,,LHPD2c5,,LHPD2c5_R,"{'SNP(R)': {'pre': 1, 'post': 1}, 'SMP(R)': {'..."
2,676124666,5813009620,1,,SMP109,,SMP109_R,"{'SNP(R)': {'pre': 1, 'post': 1}, 'SMP(R)': {'..."
3,676124666,5813056054,1,,SMP050,,SMP050_R,"{'SNP(R)': {'pre': 1, 'post': 1}, 'SMP(R)': {'..."


Now that I have seen exactly what this test neuron is connected to, the official analysis pipeline starts below.

I use the get_connectome function to get a dataframe with the neuron's directed connectome. I want connectome_to_undirected to be a function that can be called separately, taking the directed connectome dataframe and returning the undirected one. It is useful for these to be separate so that we can do other analyses with them besides modularity.

In [7]:
# obtain the connectome dataframe for the test neuron
from get_connectome import get_connectome

test_connectome = get_connectome(test_neuron_Id)
test_connectome

Unnamed: 0,bodyId_pre,bodyId_post,weight
0,423101189,644421171,9
1,423101189,644761952,59
2,423101189,5813009620,15
3,423101189,5813056054,78
4,644421171,423101189,51
5,644421171,644761952,3
6,644421171,676124666,1
7,644421171,5813009620,1
8,644421171,5813056054,1
9,644761952,423101189,20


In [8]:
# make the connectome undirected
from get_connectome import connectome_to_undirected

test_connectome_undirected = connectome_to_undirected(test_connectome)
test_connectome_undirected

Unnamed: 0,source,target,weight
0,423101189,644421171,60
1,423101189,644761952,79
2,423101189,5813009620,107
3,423101189,5813056054,80
4,644421171,644761952,3
5,644421171,676124666,1
6,644421171,5813009620,1
7,644421171,5813056054,1
8,644761952,5813009620,18
9,644761952,5813056054,5
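
Comparing the two tables, reciprocal edges appear to be merged by summing their weights: 423101189 to 644421171 (weight 9) and 644421171 to 423101189 (weight 51) become a single edge of weight 60. Below is a minimal sketch of that kind of symmetrization, assuming summing is the intended rule. It is not the code in get_connectome.py, and the name `to_undirected_sketch` is made up for illustration.

```python
import numpy as np
import pandas as pd

def to_undirected_sketch(directed):
    """Merge A->B and B->A into one row by summing weights (assumed rule)."""
    pre = directed['bodyId_pre'].to_numpy()
    post = directed['bodyId_post'].to_numpy()
    # Put the smaller bodyId first so both directions share the same key.
    undirected = pd.DataFrame({
        'source': np.minimum(pre, post),
        'target': np.maximum(pre, post),
        'weight': directed['weight'].to_numpy(),
    })
    return undirected.groupby(['source', 'target'], as_index=False)['weight'].sum()

# A few rows taken from the directed connectome shown above.
demo = pd.DataFrame({
    'bodyId_pre':  [423101189, 644421171, 423101189, 644761952, 644421171],
    'bodyId_post': [644421171, 423101189, 644761952, 423101189, 676124666],
    'weight':      [9, 51, 59, 20, 1],
})
demo_undirected = to_undirected_sketch(demo)
print(demo_undirected)
```

On these rows the sketch gives weights 60 and 79 for the two reciprocal pairs, which matches the first two rows of the undirected table.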


The undirected connectome dataframe is ready to be exported to a csv file. From there, format_edgelist.py is run from the terminal window to replace the bodyId node labels with sequential numbers, after which the modularity code should be able to run. I think that format_edgelist.py also makes the connectome undirected, so it is possible that the connectome_to_undirected step is redundant.

When exporting, either remove the header row or make sure to use the header flag when calling format_edgelist.py. Appropriate results were obtained with
`python format_edgelist.py test_connectome_undirected.txt --sep comma` in the terminal window using the exported file created below. test_connectome_undirected.txt gets overwritten with the sequentially numbered nodes. A copy of the original is saved as original_test_connectome_undirected.txt, along with a few other files (clean_, degree_, info_, and key_ files).

The exported file can be .txt or .csv. The files that are created from format_edgelist.py will inherit the same file type. I chose to export as .txt.
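
To make the renumbering step concrete, here is a rough sketch of mapping bodyIds to sequential integers. This is an illustration only and not the code in format_edgelist.py: the function name `renumber_nodes_sketch`, the sorted ordering, and the choice of starting at 0 are all assumptions.

```python
import pandas as pd

def renumber_nodes_sketch(edges, start=0):
    """Replace bodyIds with sequential integers and return the lookup key."""
    # Collect every node that appears on either end of an edge.
    node_ids = sorted(set(edges['source']) | set(edges['target']))
    # Assumed convention: number nodes in order of increasing bodyId.
    key = {body_id: new_id for new_id, body_id in enumerate(node_ids, start=start)}
    renumbered = edges.assign(
        source=edges['source'].map(key),
        target=edges['target'].map(key),
    )
    return renumbered, key

# A few rows taken from the undirected connectome above.
demo_edges = pd.DataFrame({
    'source': [423101189, 423101189, 644421171],
    'target': [644421171, 5813009620, 676124666],
    'weight': [60, 107, 1],
})
renumbered, key = renumber_nodes_sketch(demo_edges)
print(renumbered)
print(key)
```

Returning the key alongside the renumbered edges makes it possible to translate module assignments back to bodyIds later.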

In [34]:
# export the undirected connectome to a csv or txt file
test_connectome_undirected.to_csv('test_connectome_undirected.txt', index = False, header=False)
#test_connectome_undirected.to_csv('test_connectome_undirected.txt', index = False, header=False, sep=' ')
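
Since the header row and separator matter for the next step, a quick round-trip check of an exported file can be useful. The sketch below writes a small demo dataframe to a temporary file (the file name `demo_edgelist.txt` is made up) with the same `to_csv` options as above and reads it back.

```python
import tempfile
from pathlib import Path

import pandas as pd

demo_edges = pd.DataFrame({
    'source': [423101189, 423101189],
    'target': [644421171, 644761952],
    'weight': [60, 79],
})

with tempfile.TemporaryDirectory() as tmp:
    demo_path = Path(tmp) / 'demo_edgelist.txt'
    # Same options as the export cell: no index column, no header row.
    demo_edges.to_csv(demo_path, index=False, header=False)
    # The first line should already be data, separated by commas.
    first_line = demo_path.read_text().splitlines()[0]
    round_trip = pd.read_csv(demo_path, header=None,
                             names=['source', 'target', 'weight'])

print(first_line)
print(round_trip)
```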

After running format_edgelist.py in the terminal window, I then run `sh work.sh test_connectome_undirected.txt`, but I think this actually generates the same files that Alex generates with format_edgelist.py. The only problem with going directly to work.sh after saving test_connectome_undirected.txt is that work.sh doesn't renumber the nodes, which we need. So effectively, format_edgelist.py rolls the renumbering and the work.sh step into one.

Alternatively, we could run the functions from format_edgelist.py separately in this notebook. I had to change the separator to ' ' when exporting test_connectome_undirected.txt in order to avoid errors. In my attempt below, the nodes in the resulting new_test_connectome_undirected.txt had been renumbered, but the numbers were not sequential, which was odd. As far as I can tell, it would be better to run `format_edgelist(input)` in a notebook, where the input is a struct that contains the prefix, suffix, and file path. This would run everything the same way that running format_edgelist.py in the terminal does. I can play around with that if needed, but I'll move on since we have a working solution.

In [32]:
# alternative
from format_edgelist import read_graph
from format_edgelist import write_edges
from pathlib import Path

myfile = Path('test_connectome_undirected.txt')
nodes, degrees, edges = read_graph(myfile)

myoutfile = Path('new_test_connectome_undirected.txt')
write_edges(myoutfile, edges, nodes)
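
Another way to stay inside the notebook is to launch the same terminal command with `subprocess`. The sketch below only builds the command used above and wraps it in a helper; the helper names are made up, and using `sys.executable` in place of `python` is my own choice. It assumes format_edgelist.py sits in the working directory.

```python
import subprocess
import sys

def format_edgelist_command(edgelist_path, sep='comma'):
    """Build the command line shown above as an argument list."""
    return [sys.executable, 'format_edgelist.py', str(edgelist_path), '--sep', sep]

def run_format_edgelist(edgelist_path, sep='comma'):
    """Run format_edgelist.py; raises CalledProcessError if the script fails."""
    return subprocess.run(
        format_edgelist_command(edgelist_path, sep),
        check=True,           # fail loudly on a nonzero exit code
        capture_output=True,  # keep stdout/stderr for inspection
        text=True,
    )

cmd = format_edgelist_command('test_connectome_undirected.txt')
print(' '.join(cmd[1:]))
```

Calling `run_format_edgelist('test_connectome_undirected.txt')` would then overwrite the file exactly as the terminal command does, so it is worth keeping in mind that the comma-separated export is the one expected here.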

If I run `python format_edgelist.py test_connectome_undirected.txt --sep comma` in the terminal window and follow that with `clang main.c help.c rg.c -Xpreprocessor -fopenmp -lomp -lm` and `./a.out 2 5 2 12345 0 test_connectome_undirected.txt`, the results_ file is blank for this test. It is also blank when the `sh work.sh test_connectome_undirected.txt` step is included, so it could just be that the dataset is too small to get any meaningful results.

After running `sh work.sh test_connectome_undirected.txt`, the required files appear and we next compile the C code. I had to modify Pramesh's command to use clang because of compatibility issues between newer Macs and gcc and CMake.

`clang main.c help.c rg.c -Xpreprocessor -fopenmp -lomp -lm` to generate the a.out file.


`./a.out 2 5 2 12345 0 test_connectome_undirected.txt` generates results_test_connectome_undirected.txt and partition_test_connectome_undirected.txt, although in this test case the partition_ file didn't appear, perhaps because there are no partitions at chi=0 for this test connectome. The results_ file is also empty.
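
Since output files can be either missing or empty, a small check like the one below can report which is the case after each run. The helper name `describe_output` is made up, and the demo uses temporary stand-in files rather than the real outputs.

```python
import tempfile
from pathlib import Path

def describe_output(path):
    """Report whether an expected output file is missing, empty, or has content."""
    path = Path(path)
    if not path.exists():
        return 'missing'
    if path.stat().st_size == 0:
        return 'empty'
    return 'has content'

with tempfile.TemporaryDirectory() as tmp:
    # Stand-ins for the two outputs: one empty file, one that was never created.
    empty_file = Path(tmp) / 'results_demo.txt'
    empty_file.touch()
    demo_status = {
        'results_demo.txt': describe_output(empty_file),
        'partition_demo.txt': describe_output(Path(tmp) / 'partition_demo.txt'),
    }

print(demo_status)
```

In the notebook, `describe_output('results_test_connectome_undirected.txt')` and `describe_output('partition_test_connectome_undirected.txt')` would give the same kind of report for the real files.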