# Looking at subgraphs

This notebook showcases: 

* How to count (maximal) directed simplices on certain subgraphs. 
* How to make (simple) computations on neighborhoods.

To keep the computations light, we use the C. elegans connectome.

In [1]:
from connalysis.network import local, topology
from connalysis import randomization 
import pandas as pd
import numpy as np

In [2]:
# Load the connectome (see the "Loading_the_data" notebook)
from helpers import read_connectomes
data_dir = "data"  # Your chosen data directory
# Load the C. elegans connectome, keeping chemical synapses only
conn = read_connectomes.load_C_elegans_stages(data_dir).filter("type").eq("chemical").default(8)
# Flag and keep only neurons with at least one connection
conn.add_vertex_property('valid_cell', (topology.node_degree(conn.matrix) != 0))
conn = conn.index("valid_cell").isin(True)
adj = conn.matrix
# Drop stored zero-synapse entries so they are not treated as edges
adj.eliminate_zeros()

    Connections that are not present at a given stage, but at other stages will be represented as edges,
    but with a value of ``0`` synapses associated with them.  For structural analysis always use .eliminate_zeros
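The message refers to explicitly stored zeros. A scipy sparse matrix can hold entries whose value is 0, and they still count as stored entries, so any code that only looks at the sparsity pattern would treat them as edges. A minimal scipy-only illustration on a toy matrix (nothing here is specific to the connectome):

```python
import numpy as np
import scipy.sparse as sp

# Toy 3x3 adjacency in CSR form: three stored entries, one of which is an explicit 0
data = np.array([2, 0, 1])       # synapse counts; the middle entry has 0 synapses
indices = np.array([1, 2, 0])    # column of each stored entry
indptr = np.array([0, 2, 3, 3])  # row 0 -> entries 0..1, row 1 -> entry 2, row 2 -> none
toy = sp.csr_matrix((data, indices, indptr), shape=(3, 3))

print(toy.nnz)  # 3: the explicit zero is still a stored "edge"
toy.eliminate_zeros()
print(toy.nnz)  # 2: only entries with at least one synapse remain
```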




## Properties on neighborhoods 

For small graphs you can quickly obtain the values of a property across neighborhoods. See the example below for the in- and out-degree on neighborhoods centered on a few nodes.

Take a minute to think about which types of neighborhoods of a node you could consider, and how to obtain each type with the ``pre``, ``post`` and ``include_center`` arguments.
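One way to picture the options, assuming ``pre`` and ``post`` select the presynaptic (in-) and postsynaptic (out-) neighbors and ``include_center`` adds the node itself: the sketch below uses a hypothetical plain-numpy helper `neighborhood_nodes`, not the connalysis implementation, on a dense 0/1 matrix where `A[i, j] = 1` means an edge i → j. It also computes in- and out-degrees inside the resulting subgraph, which is the kind of per-neighborhood table shown in the output below.

```python
import numpy as np

def neighborhood_nodes(A, center, pre=True, post=True, include_center=True):
    """Hypothetical helper: sorted node indices of the neighborhood of `center`.

    A[i, j] != 0 means an edge i -> j.
    pre  -> add in-neighbors  (nodes i with i -> center)
    post -> add out-neighbors (nodes j with center -> j)
    """
    nodes = set()
    if pre:
        nodes.update(np.flatnonzero(A[:, center]).tolist())
    if post:
        nodes.update(np.flatnonzero(A[center, :]).tolist())
    if include_center:
        nodes.add(center)
    else:
        nodes.discard(center)  # a self-loop would otherwise pull the center back in
    return np.array(sorted(nodes), dtype=int)

# Toy graph: 0 -> 1, 2 -> 0, 1 -> 2, 3 -> 1
A = np.zeros((4, 4), dtype=int)
for i, j in [(0, 1), (2, 0), (1, 2), (3, 1)]:
    A[i, j] = 1

nbd = neighborhood_nodes(A, center=0)  # in- and out-neighbors plus the center
sub = A[np.ix_(nbd, nbd)]              # subgraph induced on the neighborhood
in_deg, out_deg = sub.sum(axis=0), sub.sum(axis=1)
print(nbd, in_deg, out_deg)
```

Switching off ``post`` or ``include_center`` changes which nodes enter the subgraph, and therefore the degrees computed inside it.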

In [3]:
local.property_at_neighborhoods(adj,topology.node_degree,
                                pre=True, post=True, include_center=True,
                                all_nodes=False,
                                centers=np.array([0,1,7]),
                                direction=("IN", "OUT"))

{0:       IN  OUT
 node         
 0     14   14
 1      1    5
 2      3   10
 3      3    4
 4      3    8
 5      3    3
 6     11    0
 7      6    0
 8      9    0
 9      7    0
 10     5    0
 11     4    3
 12     6    6
 13     1    3
 14     4    8
 15     2   10
 16     3    9
 17     3    7
 18     4    8
 19     6    7
 20     1    0
 21     1    0
 22     3    1
 23     3    3
 24     0    1
 25     3    2
 26     5    0
 27     3    5,
 1:       IN  OUT
 node         
 0     11   11
 1      1    5
 2      2    2
 3      1    5
 4      1    9
 5      8    0
 6     10    0
 7      4    0
 8      8    0
 9      4    0
 10     3    1
 11     0    7
 12     1    5
 13     1    6
 14     2    5
 15     2    6
 16     3    4
 17     5    6
 18     2    0
 19     2    0
 20     3    0
 21     1    3,
 7:       IN  OUT
 node         
 0      6   16
 1      8    5
 2      6    2
 3      5    4
 4      3   10
 5      2    5
 6      6    0
 7      5    0
 8      4    0
 9      3    0

In general, and especially if your graph is large, it is better to compute many properties at once: this way the neighborhoods are not recomputed for every property. You can do this directly as shown below.

We compute for all neighborhoods:

* the size
* the number of edges
* the number of reciprocal edges
* the number of simplices up to dimension 3
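The saving comes from extracting each neighborhood once and evaluating every function on the same submatrix. A minimal sketch of that pattern in plain numpy, assuming a dense 0/1 adjacency and neighborhoods made of a center plus its in- and out-neighbors; `neighborhood_properties` is a hypothetical helper, not the connalysis implementation, and simplex counts are omitted to keep it dependency-free.

```python
import numpy as np

def neighborhood_properties(A, funcs):
    """Hypothetical sketch: apply every function in `funcs` to each neighborhood.

    Each neighborhood (center + in- and out-neighbors) is extracted once,
    then all functions are evaluated on the same submatrix.
    Assumes A is an integer or boolean 0/1 matrix (needed for `|` and `&`).
    """
    results = {name: {} for name in funcs}
    for center in range(A.shape[0]):
        nbd = np.flatnonzero(A[:, center] | A[center, :])  # in- or out-neighbors
        nbd = np.union1d(nbd, [center])                    # include the center
        sub = A[np.ix_(nbd, nbd)]                          # extracted only once
        for name, f in funcs.items():
            results[name][center] = f(sub)
    return results

funcs = {
    "nodes": lambda x: x.shape[0],
    "edges": lambda x: int(np.count_nonzero(x)),            # edges, not synapse counts
    "rc_edges": lambda x: int(np.count_nonzero(x & x.T)),   # each reciprocal pair counts twice
}

# Toy graph: reciprocal pair 0 <-> 1, plus 1 -> 2
A = np.zeros((3, 3), dtype=int)
A[0, 1] = A[1, 0] = A[1, 2] = 1
props = neighborhood_properties(A, funcs)
```

Passing `props` to `pd.DataFrame` gives one row per center and one column per property.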


In [4]:
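func_config = {
    # Reciprocal part of the neighborhood matrix, summed
    'rc_edges': {'function': lambda x: topology.rc_submatrix(x).sum(),
                 'kwargs': {}},
    # Sum of all entries: since adj stores synapse counts, this is the total number
    # of synapses in the neighborhood (the 1_simplices column gives the number of edges)
    'edges': {'function': lambda x: x.sum(),
              'kwargs': {}},
    # Number of nodes in the neighborhood
    'nodes': {'function': lambda x: x.shape[0],
              'kwargs': {}},
    # Directed simplex counts up to dimension 3
    'simplex_counts': {'function': topology.simplex_counts,
                       'kwargs': {"threads": 8, "max_dim": 3}},
}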

out=local.properties_at_neighborhoods(adj, func_config, 
                                      pre=True, post=True, include_center=True, all_nodes=True)
# We format the output a bit to aid readability 
props=pd.concat([pd.DataFrame.from_dict(out).drop("simplex_counts", axis=1), 
             pd.DataFrame.from_dict(out['simplex_counts'], orient="index").fillna(0).rename(columns={i:f"{i}_simplices" for i in range(4)})],
             axis=1)
display(props)

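| center | rc_edges | edges | nodes | 0_simplices | 1_simplices | 2_simplices | 3_simplices |
|---|---|---|---|---|---|---|---|
| 0 | 104 | 428 | 28 | 28 | 117 | 193.0 | 125.0 |
| 1 | 17 | 280 | 22 | 22 | 75 | 86.0 | 38.0 |
| 2 | 71 | 295 | 30 | 30 | 116 | 150.0 | 80.0 |
| 3 | 138 | 407 | 32 | 32 | 126 | 192.0 | 126.0 |
| 4 | 149 | 458 | 19 | 19 | 98 | 232.0 | 266.0 |
| ... | ... | ... | ... | ... | ... | ... | ... |
| 214 | 7 | 32 | 9 | 9 | 15 | 8.0 | 1.0 |
| 215 | 7 | 53 | 12 | 12 | 20 | 12.0 | 3.0 |
| 216 | 0 | 6 | 3 | 3 | 3 | 1.0 | 0.0 |
| 217 | 0 | 6 | 4 | 4 | 6 | 4.0 | 1.0 |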


## Simplices across populations

We can compute the generalized k-degree in a single line. This is the number of k-simplices that map into a node (all of their nodes have an edge to it) for ``IN``, or that a node maps to (it has an edge to all of their nodes) for ``OUT``.
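To make the definition concrete, here is a brute-force sketch for tiny graphs. It assumes the definition in the text: a k-simplex is an ordered tuple (v0, ..., vk) of distinct nodes with an edge vi → vj for every i < j, and it counts towards the k-in-degree of a node v when every vi has an edge to v (k-out-degree: v has an edge to every vi). `simplices` and `k_degree` are hypothetical helpers, not connalysis functions, and enumerating permutations is only feasible for a handful of nodes.

```python
import itertools
import numpy as np

def simplices(A, k, nodes=None):
    """All directed k-simplices (ordered (k+1)-tuples) of the 0/1 adjacency A, using only `nodes`."""
    nodes = range(A.shape[0]) if nodes is None else nodes
    return [s for s in itertools.permutations(nodes, k + 1)
            if all(A[s[i], s[j]] for i in range(k + 1) for j in range(i + 1, k + 1))]

def k_degree(A, v, k, direction="IN"):
    """Number of k-simplices that map into v ("IN") or that v maps to ("OUT")."""
    if direction == "IN":
        partners = np.flatnonzero(A[:, v])  # nodes with an edge to v
    else:
        partners = np.flatnonzero(A[v, :])  # nodes that v has an edge to
    partners = [u for u in partners.tolist() if u != v]
    return len(simplices(A, k, partners))

# Toy graph: 0 -> 1 -> 2 -> 3 with every forward shortcut (i -> j for all i < j)
A = np.triu(np.ones((4, 4), dtype=int), k=1)
print([k_degree(A, 3, k) for k in range(3)])  # [3, 3, 1]
```

For k = 0 this reduces to the ordinary in-degree (node 3 has three in-neighbors); for k = 1 it counts the edges among those in-neighbors.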

In [5]:
topology.node_k_degree(adj, direction=('IN', 'OUT'),max_dim=-1)

| node | 1_out_degree | 1_in_degree | 2_out_degree | 2_in_degree | 3_out_degree | 3_in_degree | 4_out_degree | 4_in_degree | 5_out_degree | 5_in_degree |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 15.0 | 30.0 | 6.0 | 21.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 1 | 11.0 | 14.0 | 3.0 | 3.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 2 | 68.0 | 1.0 | 48.0 | 0.0 | 15.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 3 | 85.0 | 4.0 | 73.0 | 0.0 | 29.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 4 | 61.0 | 13.0 | 98.0 | 7.0 | 62.0 | 0.0 | 9.0 | 0.0 | 0.0 | 0.0 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 214 | 0.0 | 7.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 215 | 0.0 | 9.0 | 0.0 | 3.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 216 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 217 | 0.0 | 3.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |


However, you might be more interested in doing this between specific populations. That is, given two subsets of nodes A and B (representing two different subpopulations of neurons), one could ask: for a node b in B, how many k-simplices in A have all of their nodes mapping to b? For example:

* If k = 0, this is just the in-degree of b from A
* If k = 1, this is the number of edges within A whose two endpoints both map to b
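The same brute-force idea extends to two populations. The sketch below assumes the layout used later in this notebook: a square 0/1 matrix `adj_source` for the edges within A, and a non-square matrix `adj_cross` whose entry (a, b) is 1 when node a of A has an edge to node b of B. For each b it counts the k-simplices of A whose nodes all project to b. `cross_k_in_degree` and `count_simplices` are hypothetical helpers, not the connalysis implementation.

```python
import itertools
import numpy as np

def count_simplices(A, k, nodes):
    """Number of directed k-simplices of the 0/1 adjacency A using only `nodes`."""
    return sum(
        all(A[s[i], s[j]] for i in range(k + 1) for j in range(i + 1, k + 1))
        for s in itertools.permutations(nodes, k + 1)
    )

def cross_k_in_degree(adj_cross, adj_source, max_dim):
    """Array of shape (n_targets, max_dim + 1); entry [b, k] is the k-in-degree of b from A."""
    n_targets = adj_cross.shape[1]
    out = np.zeros((n_targets, max_dim + 1), dtype=int)
    for b in range(n_targets):
        sources = np.flatnonzero(adj_cross[:, b]).tolist()  # nodes of A that project to b
        for k in range(max_dim + 1):
            out[b, k] = count_simplices(adj_source, k, sources)
    return out

# A: nodes 0 -> 1 -> 2 with shortcut 0 -> 2; B: two target nodes
adj_source = np.array([[0, 1, 1],
                       [0, 0, 1],
                       [0, 0, 0]])
adj_cross = np.array([[1, 1],   # node 0 of A projects to both targets
                      [1, 1],   # node 1 of A projects to both targets
                      [1, 0]])  # node 2 of A projects only to target 0
deg = cross_k_in_degree(adj_cross, adj_source, max_dim=2)
print(deg)
```

Target 1 only receives input from nodes 0 and 1 of A, so it sees a single edge and no 2-simplex.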


We can do this for subpopulations of C. elegans defined by their ``Soma Region``. If the source subpopulation is too small, there is little chance of finding any simplices.

In [6]:
# gids of the neurons in each soma region
H_gids = conn.index("Soma Region").isin("H").gids
T_gids = conn.index("Soma Region").isin("T").gids

# Square matrix of edges within H
H_mat = conn.index("Soma Region").isin("H").matrix
# Non-square matrices of edges from H to all neurons and from H to T
from_H_mat = conn.submatrix(H_gids, sub_gids_post=conn.gids)
from_H_to_T_mat = conn.submatrix(H_gids, sub_gids_post=T_gids)

print("Shape of H submat and H to all submat")
print(H_mat.shape, from_H_mat.shape)
print("Shape of H submat and H to T submat")
print(H_mat.shape, from_H_to_T_mat.shape)

Shape of H submat and H to all submat
(155, 155) (155, 219)
Shape of H submat and H to T submat
(155, 155) (155, 16)


Given the number of neurons in the different regions, we choose "H" as the source population and either all neurons or the neurons in "T" as the target population.

We first need to extract the following matrices:
* The square matrix of "H",
* The non-square matrix of edges from H to all
* The non-square matrix of edges from H to T

This is easily done with ``conntility``
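For readers not using ``conntility``, the same three matrices can be sliced out of a full adjacency matrix with plain index arrays. A minimal sketch with scipy, assuming hypothetical index arrays `H_idx` and `T_idx` that hold the row/column positions (not gids) of each population, and a small deterministic toy matrix in place of the connectome:

```python
import numpy as np
import scipy.sparse as sp

n = 10
# Toy sparse adjacency standing in for the connectome (CSR supports fancy indexing)
adj_full = sp.csr_matrix((np.arange(n * n).reshape(n, n) % 3 == 0).astype(int))

H_idx = np.array([0, 1, 2, 3])  # hypothetical positions of the "H" neurons
T_idx = np.array([7, 8])        # hypothetical positions of the "T" neurons

H_square = adj_full[H_idx][:, H_idx]  # edges within H: (|H|, |H|)
H_to_all = adj_full[H_idx]            # edges from H to every neuron: (|H|, n)
H_to_T = adj_full[H_idx][:, T_idx]    # edges from H to T: (|H|, |T|)
print(H_square.shape, H_to_all.shape, H_to_T.shape)
```

Note that the "H to all" matrix, like `from_H_mat`, also contains the H neurons themselves among the targets.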

Then we can compute the k-in-degree from H to all neurons, or from H to T, with the following lines. Setting ``max_simplices=True`` counts maximal simplices instead. The ``threads`` and ``max_dim`` arguments are useful if your computation is expensive.
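A simplex is maximal when it is not a face of any simplex one dimension higher. A brute-force sketch on a toy graph, assuming a dense 0/1 adjacency; the helpers are hypothetical and only illustrate what counting maximal simplices means, not how connalysis computes it.

```python
import itertools
import numpy as np

def simplices_by_dim(A):
    """Dict dim -> list of directed simplices (ordered tuples) of the 0/1 adjacency A."""
    n, out, k = A.shape[0], {}, 0
    while True:
        found = [s for s in itertools.permutations(range(n), k + 1)
                 if all(A[s[i], s[j]] for i in range(k + 1) for j in range(i + 1, k + 1))]
        if not found:
            return out
        out[k] = found
        k += 1

def maximal_simplices_by_dim(A):
    """Keep only simplices that are not a face of a simplex one dimension higher."""
    simp = simplices_by_dim(A)
    maximal = {}
    for k, cells in simp.items():
        # Faces of (k+1)-simplices: drop one vertex, keep the order of the rest
        faces = {t[:i] + t[i + 1:] for t in simp.get(k + 1, []) for i in range(len(t))}
        maximal[k] = [s for s in cells if s not in faces]
    return maximal

# Toy graph: 0 -> 1 -> 2 with shortcut 0 -> 2, plus a dangling edge 2 -> 3
A = np.zeros((4, 4), dtype=int)
for i, j in [(0, 1), (1, 2), (0, 2), (2, 3)]:
    A[i, j] = 1

all_counts = {k: len(v) for k, v in simplices_by_dim(A).items()}
max_counts = {k: len(v) for k, v in maximal_simplices_by_dim(A).items()}
print(all_counts, max_counts)  # {0: 4, 1: 4, 2: 1} {0: 0, 1: 1, 2: 1}
```

Only the dangling edge 2 → 3 and the 2-simplex (0, 1, 2) are maximal; every other simplex is a face of one of them.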

In [7]:
df_H_to_all = topology.cross_col_k_in_degree(adj_cross = from_H_mat, 
                               adj_source = H_mat,
                               max_simplices=False,
                               threads=8,
                               max_dim=-1)
df_H_to_T = topology.cross_col_k_in_degree(adj_cross = from_H_to_T_mat, 
                               adj_source = H_mat,
                               max_simplices=False,
                               threads=8,
                               max_dim=-1)


In [8]:
print("H to all in-degree: ") 
display(df_H_to_all)
print("\n\nH to T in-degree: ") 
display(df_H_to_T)

H to all in-degree: 


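| target | dim 1 | dim 2 | dim 3 | dim 4 | dim 5 | dim 6 |
|---|---|---|---|---|---|---|
| 155 | 23 | 25 | 18 | 0 | 0 | 0 |
| 156 | 19 | 11 | 3 | 0 | 0 | 0 |
| 157 | 10 | 1 | 0 | 0 | 0 | 0 |
| 158 | 9 | 4 | 0 | 0 | 0 | 0 |
| 159 | 12 | 13 | 7 | 0 | 0 | 0 |
| ... | ... | ... | ... | ... | ... | ... |
| 369 | 19 | 5 | 1 | 0 | 0 | 0 |
| 370 | 19 | 3 | 0 | 0 | 0 | 0 |
| 371 | 4 | 1 | 0 | 0 | 0 | 0 |
| 372 | 10 | 3 | 1 | 0 | 0 | 0 |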




H to T in-degree: 


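| target | dim 1 | dim 2 | dim 3 |
|---|---|---|---|
| 155 | 3 | 1 | 0 |
| 156 | 6 | 1 | 0 |
| 157 | 18 | 12 | 6 |
| 158 | 9 | 2 | 0 |
| 159 | 3 | 1 | 0 |
| 160 | 3 | 0 | 0 |
| 161 | 11 | 10 | 2 |
| 162 | 10 | 0 | 0 |
| 163 | 10 | 0 | 0 |
| 164 | 6 | 0 | 0 |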
