# Manipulating data with an AdjacencyFrame
Examples from a maggot brain connectome dataset.

```{note}
Eventually this will be using neuropull itself to grab the data, instead of
pulling from my computer.
```

## Load the data


In [1]:
from pathlib import Path

import networkx as nx
import numpy as np
import pandas as pd

data_path = Path("neuropull/processing/raw_data/maggot/2022-09-25")

g = nx.read_edgelist(
    data_path / "G_edgelist.txt",
    delimiter=" ",
    data=[("weight", float)],
    create_using=nx.DiGraph,
    nodetype=int,
)
nodes = pd.read_csv(data_path / "meta_data.csv", index_col=0)
nodes = nodes[nodes.index.isin(g.nodes)]
adj = nx.to_pandas_adjacency(g, nodelist=nodes.index)

## Load the data into an adjacency frame
To load the data as an adjacency frame, we pass the adjacency matrix (as a pandas
DataFrame) followed by the node data for the source nodes (rows) and for the target
nodes (columns), each as a pandas DataFrame. Note that the data is not copied by
default, so any later modification of the underlying data would be reflected in the
adjacency frame. Here we pass copies to keep the frame independent of `adj` and `nodes`.

In [2]:

from neuropull.graph import AdjacencyFrame

af = AdjacencyFrame(adj.copy(), nodes.copy(), nodes.copy())
af

AdjacencyFrame with shape: (3549, 3549)
Source node features: 58
Target node features: 58
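
The no-copy behavior noted earlier can be illustrated without neuropull. The sketch below uses a hypothetical `ToyFrame` class (not part of any library) that simply keeps a reference to the array it is handed. It only demonstrates the aliasing effect and is not how `AdjacencyFrame` is implemented.

```python
import numpy as np


class ToyFrame:
    """Hypothetical stand-in that stores a reference to its input array."""

    def __init__(self, data):
        self.data = data  # no copy is made here


adj = np.zeros((3, 3))
shared = ToyFrame(adj)           # shares memory with `adj`
isolated = ToyFrame(adj.copy())  # owns an independent copy

adj[0, 1] = 5.0  # modify the original array after construction

print(shared.data[0, 1])    # the change is visible through the shared frame
print(isolated.data[0, 1])  # the copied frame is unaffected
```

Passing `.copy()` at construction time, as in the cell above, is the simple way to avoid this kind of coupling.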

Let's look at what this frame consists of. First is the adjacency matrix itself, available through the `data` attribute:

In [3]:
af.data

array([[0., 0., 0., ..., 0., 0., 0.],
       [0., 0., 0., ..., 0., 0., 0.],
       [0., 0., 0., ..., 0., 0., 0.],
       ...,
       [0., 0., 0., ..., 0., 0., 0.],
       [0., 0., 0., ..., 0., 0., 0.],
       [0., 0., 0., ..., 0., 0., 0.]])

This adjacency matrix is always indexed the same way as the node data. We can look
at this index (for the rows) with the `index` attribute.

In [4]:
af.index

Int64Index([ 7766016,  7790597, 18833414, 15564807, 17383431,  7782409,
            14434314, 15458316, 11018254, 11714576,
            ...
             8142831, 12787696, 15589362, 11067379,  2637812,  3882998,
            15638520, 19111930, 15679484, 16629757],
           dtype='int64', length=3549)

And similarly for the columns via the `columns` attribute.

In [5]:
af.columns

Int64Index([ 7766016,  7790597, 18833414, 15564807, 17383431,  7782409,
            14434314, 15458316, 11018254, 11714576,
            ...
             8142831, 12787696, 15589362, 11067379,  2637812,  3882998,
            15638520, 19111930, 15679484, 16629757],
           dtype='int64', length=3549)

Note that here these are the same, but be aware that this need not be the case.
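
As a rough illustration of how the row and column indices can differ, here is a pandas-only sketch on a made-up three-node matrix. The node IDs and weights are invented, and the code does not use the neuropull API.

```python
import pandas as pd

# Toy square adjacency matrix: rows are sources, columns are targets
node_ids = [101, 102, 103]
adj = pd.DataFrame(
    [[0, 1, 0], [0, 0, 1], [1, 0, 0]],
    index=node_ids,
    columns=node_ids,
)

# Keep two source nodes but all three target nodes
rect = adj.loc[[101, 103], :]

print(rect.shape)
print(rect.index.equals(rect.columns))
```

After the row selection, the matrix is rectangular and its row index no longer matches its column index.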

The other component of the frame is the node metadata, which is stored separately for
rows and columns (and may differ between them) in `row_objects` and `col_objects`.

In [6]:
print(af.row_objects)

                                                       name  neurons  \
7766016                                       H-shaped _a1l    False   
7790597                                          vchA/B a1l    False   
18833414                                 MN-R-Sens-B2-VM-23    False   
15564807                                AN-R-Sens-B1-AVa-22    False   
17383431                   BAlp_ant ascending dendrite left     True   
...                                                     ...      ...   
3882998   CP contra to SEZ left; MB1: incomplete neuron ...     True   
15638520                                 MN-R-Sens-B3-VM-06    False   
19111930                               MN13 ISN MN-VL2_a1l     False   
15679484                                 MN-L-Sens-B2-VM-08    False   
16629757                                         KC no pair     True   

          paper_clustered_neurons   left  right  center   sink  \
7766016                      True   True  False   False  False   
779

## Sorting

We can sort the adjacency matrix and metadata according to some attributes. For example,
alphabetically by name:

In [7]:
sorted_af = af.sort_values("name", axis="both")
print(sorted_af.row_objects.head().iloc[:, :5])

                                          name  neurons  \
7753261          (dda E3 or dda A)_a1l_JMpair2    False   
3486867   130729_SFO Cand 42a-42b_OSN_IN1 left     True   
5478818  130729_SFO Cand 42a-42b_OSN_IN1 right     True   
7527710                           13a ORN left    False   
4073353                          13a ORN right    False   

         paper_clustered_neurons   left  right  
7753261                    False   True  False  
3486867                     True   True  False  
5478818                     True  False   True  
7527710                     True   True  False  
4073353                     True  False   True  


... or by the number of incoming synapses onto the dendrite:

In [8]:
sorted_af = af.sort_values("dendrite_input", axis="both", ascending=False)
print(sorted_af.row_objects.head()[["name", "dendrite_input"]])

                    name  dendrite_input
8980589      MBE18 right          1481.0
16223537      MBE18 left          1282.0
8198238   broad T2 right          1201.0
5030808    keystone left          1129.0
6557581   broad T3 right          1127.0


In any case, the adjacency matrix is always sorted the same way as the metadata.

In [9]:
from graspologic.plot import adjplot

# adjplot(af.data, plot_type="scattermap", sizes=(1, 1))
# adjplot(sorted_af.data, plot_type="scattermap", sizes=(1, 1))
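
To make the "sorted the same way" guarantee concrete, the following pandas-only sketch sorts a toy node table and then reorders the adjacency matrix with the resulting index. The data are invented, and this is only an assumption about what such a sort has to accomplish, not the library's actual implementation.

```python
import pandas as pd

# Toy node metadata and a matching adjacency matrix
nodes = pd.DataFrame({"name": ["c", "a", "b"]}, index=[10, 20, 30])
adj = pd.DataFrame(
    [[0, 1, 0], [0, 0, 2], [3, 0, 0]],
    index=nodes.index,
    columns=nodes.index,
)

# Sort the metadata, then apply the same order to both matrix axes
order = nodes.sort_values("name").index
sorted_nodes = nodes.loc[order]
sorted_adj = adj.loc[order, order]

print(sorted_nodes)
print(sorted_adj)
```

Each edge keeps its endpoints: the weight-2 edge from node 20 to node 30 is still found at `sorted_adj.loc[20, 30]`, only its position in the matrix changes.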




## Filtering and subsetting

We can filter the adjacency matrix and metadata according to some attributes.
For example, we could select only the nodes in the left hemisphere.

In [10]:
query_af = af.query("hemisphere == 'L'", axis="both")
print(query_af)

AdjacencyFrame with shape: (1772, 1772)
Source node features: 58
Target node features: 58



Note that since the adjacency frame is just a thin wrapper around pandas DataFrames,
the query string can use the usual pandas expression syntax. For example, we can combine
conditions to select only the nodes that are in the left hemisphere and have a dendrite
input greater than some threshold.

In [11]:
query_af = af.query("hemisphere == 'L' and dendrite_input > 100", axis="both")
print(query_af)

AdjacencyFrame with shape: (608, 608)
Source node features: 58
Target node features: 58
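
For intuition, an induced-subgraph query like the ones above can be sketched in plain pandas: filter the node table, then use the surviving index on both axes of the matrix. The toy data and column values below are made up, and the code is an illustration rather than the neuropull implementation.

```python
import pandas as pd

# Toy node metadata with the two columns used in the query
nodes = pd.DataFrame(
    {
        "hemisphere": ["L", "R", "L", "L"],
        "dendrite_input": [150, 80, 120, 20],
    },
    index=[1, 2, 3, 4],
)
adj = pd.DataFrame(
    [[0, 1, 1, 0], [1, 0, 0, 1], [0, 0, 0, 1], [1, 1, 0, 0]],
    index=nodes.index,
    columns=nodes.index,
)

# Select nodes with a pandas query, then subset rows and columns alike
keep = nodes.query("hemisphere == 'L' and dendrite_input > 100").index
sub_nodes = nodes.loc[keep]
sub_adj = adj.loc[keep, keep]

print(sub_adj)
```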



We can also select a non-induced subgraph: that is, a set of edges that go from one
set of nodes to a potentially different set of target nodes. Here, we select the
connections from the left hemisphere to the right hemisphere.

In [12]:
query_af = af.query("hemisphere == 'L'", axis=0).query("hemisphere == 'R'", axis=1)
print(query_af.source_nodes["hemisphere"])
print(query_af.target_nodes["hemisphere"])

7766016     L
7790597     L
17383431    L
7782409     L
14434314    L
           ..
11067379    L
3882998     L
19111930    L
15679484    L
16629757    L
Name: hemisphere, Length: 1772, dtype: object
18833414    R
15564807    R
15458316    R
11714576    R
10133525    R
           ..
15564782    R
12787696    R
15589362    R
2637812     R
15638520    R
Name: hemisphere, Length: 1775, dtype: object
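
The same idea extends to the non-induced case: rows and columns are selected with different node sets, giving a rectangular block. This is a pandas-only sketch on invented data, not the neuropull API.

```python
import numpy as np
import pandas as pd

# Toy metadata: two left-hemisphere nodes and three right-hemisphere nodes
nodes = pd.DataFrame({"hemisphere": ["L", "R", "L", "R", "R"]}, index=[1, 2, 3, 4, 5])
adj = pd.DataFrame(
    np.arange(25).reshape(5, 5),
    index=nodes.index,
    columns=nodes.index,
)

# Sources come from the left hemisphere, targets from the right
left = nodes.query("hemisphere == 'L'").index
right = nodes.query("hemisphere == 'R'").index
left_to_right = adj.loc[left, right]

print(left_to_right)
```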


## Grouping
Rather than selecting groups of nodes (like, say, the left hemisphere) one at a time,
it is often convenient to split the entire network into groups based on some column.

First, let's remove the two nodes which are in the center (neither left nor right).

In [13]:
af = af.query("hemisphere != 'C'", axis="both")

Then, let's do a grouping operation.

In [14]:
groupby = af.groupby("hemisphere", axis="both")
groupby

<neuropull.graph.base_frame.FrameGroupBy at 0x13d1fc880>

Much like in pandas, this returns a GroupBy object, which can be iterated over.

In [15]:
for name, subframe in groupby:
    print(name)
    print(subframe)

('L', 'L')
AdjacencyFrame with shape: (1772, 1772)
Source node features: 58
Target node features: 58

('L', 'R')
AdjacencyFrame with shape: (1772, 1775)
Source node features: 58
Target node features: 58

('R', 'L')
AdjacencyFrame with shape: (1775, 1772)
Source node features: 58
Target node features: 58

('R', 'R')
AdjacencyFrame with shape: (1775, 1775)
Source node features: 58
Target node features: 58



We could, for instance, compute the number of possible edges in each group.

In [16]:


def possible_edges(frame):
    return frame.shape[0] * frame.shape[1]


for name, subframe in groupby:
    print(name)
    print("Possible edges:", possible_edges(subframe))
    print()

('L', 'L')
Possible edges: 3139984

('L', 'R')
Possible edges: 3145300

('R', 'L')
Possible edges: 3145300

('R', 'R')
Possible edges: 3150625



Just like in pandas, since this pattern is so common, we can use the `apply` method
of the FrameGroupBy object to apply a function to each group and collate the results.

In [17]:
groupby.apply(possible_edges)

         L        R
L  3139984  3145300
R  3145300  3150625
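
The apply-and-collate pattern can be approximated in plain pandas as follows. The helper `blockwise_apply` is a hypothetical name introduced here for illustration. It is a minimal sketch of the idea on toy data and makes no claim about how `FrameGroupBy.apply` works internally.

```python
import itertools

import numpy as np
import pandas as pd


def blockwise_apply(adj, labels, func):
    """Apply `func` to every (source group, target group) block of `adj`."""
    groups = sorted(labels.unique())
    results = {}
    for src, tgt in itertools.product(groups, repeat=2):
        rows = labels.index[(labels == src).to_numpy()]
        cols = labels.index[(labels == tgt).to_numpy()]
        block = adj.loc[rows, cols].to_numpy()
        results.setdefault(src, {})[tgt] = func(block)
    # Rows of the result are source groups, columns are target groups
    return pd.DataFrame.from_dict(results, orient="index")


node_ids = [1, 2, 3]
labels = pd.Series(["L", "L", "R"], index=node_ids)
adj = pd.DataFrame(
    [[0, 2, 1], [1, 0, 0], [0, 3, 0]],
    index=node_ids,
    columns=node_ids,
)

sizes = blockwise_apply(adj, labels, np.size)  # possible edges per block
sums = blockwise_apply(adj, labels, np.sum)    # total weight per block
print(sizes)
print(sums)
```

Here the function receives the raw block as a NumPy array, so any array reduction can be passed in.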


If we want to use a function which operates on the underlying adjacency matrix only,
we can just modify the function slightly.

In [18]:


def matrix_sum(frame):
    return np.sum(frame.data)


groupby.apply(matrix_sum)

          L         R
L  123720.0   58409.0
R   54783.0  136440.0


As a shorthand, we can pass the `data=True` flag to the `apply` function, and then a
function which operates directly on the adjacency matrix will work.

In [19]:

groupby.apply(np.sum, data=True)

          L         R
L  123720.0   58409.0
R   54783.0  136440.0


Another example is using this to compute the density of each block, that is, the
fraction of possible edges between each pair of groups that actually exist.

In [20]:


def density(data):
    return np.count_nonzero(data) / data.size


groupby.apply(density, data=True)

         L         R
L  0.01238  0.006056
R  0.00588  0.013209
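
A tiny worked case may help separate density from the weighted sum computed earlier. The block below is invented, and the `density` function is the same one defined in the cell above.

```python
import numpy as np


def density(data):
    # Fraction of entries that are nonzero, regardless of their weight
    return np.count_nonzero(data) / data.size


# Toy 2 x 3 block: two edges with weights 4 and 1
block = np.array([[0.0, 4.0, 0.0], [1.0, 0.0, 0.0]])

print(block.size)      # number of possible edges (rows times columns)
print(np.sum(block))   # total synaptic weight
print(density(block))  # fraction of possible edges that are present
```

The denominator is the same quantity as `possible_edges`, and edge weights play no role: only whether an entry is nonzero matters.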


In [21]:
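# Count how many nodes share each pair_id
pair_counts = af.source_nodes["pair_id"].value_counts()
# Despite its name, "pair_count" is a boolean flag: True when exactly two nodes share the pair_id
af.source_nodes["pair_count"] = af.source_nodes["pair_id"].map(pair_counts) == 2
af.target_nodes["pair_count"] = af.target_nodes["pair_id"].map(pair_counts) == 2
# Keep only the nodes that belong to a complete pair
pair_af = af.query("pair_count", axis="both")
# Order by hemisphere and then by pair, and use those two columns as the index
pair_af = pair_af.sort_values(["hemisphere", "pair_id"], axis="both")
pair_af = pair_af.set_index(["hemisphere", "pair_id"])
print(pair_af.source_nodes.iloc[:5, :5])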

                                                             name  neurons  \
hemisphere pair_id                                                           
L          2        BAmas12 contra left; Interneuron--35 in total     True   
           3                           DALd bushy left; DALd_l  2     True   
           4                                            DALd_l  3     True   
           5                                     BU ; DALd 4_left     True   
           6                             UNK brain ascending left     True   

                    paper_clustered_neurons  left  right  
hemisphere pair_id                                        
L          2                           True  True  False  
           3                           True  True  False  
           4                           True  True  False  
           5                           True  True  False  
           6                           True  True  False  
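
The pairing logic in this cell relies only on standard pandas operations, so it can be tried on a small node table. The sketch below uses invented IDs and a column named `is_paired` (a hypothetical name chosen here for clarity) in place of `pair_count`.

```python
import pandas as pd

# Toy node table: pairs 2 and 3 are complete, pair 4 has only a left member
nodes = pd.DataFrame(
    {
        "hemisphere": ["L", "R", "L", "R", "L"],
        "pair_id": [2, 2, 3, 3, 4],
    },
    index=[11, 12, 13, 14, 15],
)

# Flag nodes whose pair_id occurs exactly twice
pair_counts = nodes["pair_id"].value_counts()
nodes["is_paired"] = nodes["pair_id"].map(pair_counts) == 2

# Keep complete pairs, order them, and index by (hemisphere, pair_id)
paired = nodes[nodes["is_paired"]].sort_values(["hemisphere", "pair_id"])
paired = paired.set_index(["hemisphere", "pair_id"])

print(paired)
print(paired.loc["L"].index.equals(paired.loc["R"].index))
```

Because both hemispheres end up sorted by the same `pair_id` values, row *i* on the left corresponds to row *i* on the right.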


In [22]:
print(pair_af.source_nodes.loc["L"].iloc[:5, :5])

                                                  name  neurons  \
pair_id                                                           
2        BAmas12 contra left; Interneuron--35 in total     True   
3                           DALd bushy left; DALd_l  2     True   
4                                            DALd_l  3     True   
5                                     BU ; DALd 4_left     True   
6                             UNK brain ascending left     True   

         paper_clustered_neurons  left  right  
pair_id                                        
2                           True  True  False  
3                           True  True  False  
4                           True  True  False  
5                           True  True  False  
6                           True  True  False  


In [23]:
print(pair_af.source_nodes.loc["R"].iloc[:5, :5])

                                      name  neurons  paper_clustered_neurons  \
pair_id                                                                        
2                     BAmas12 contra right     True                     True   
3        DALd bushy right; DALd_r 5 Review     True                     True   
4                  DALd_r 10 10276162 - JL     True                     True   
5                        BU ; DALd 4_right     True                     True   
6                UNK brain ascending right     True                     True   

          left  right  
pair_id                
2        False   True  
3        False   True  
4        False   True  
5        False   True  
6        False   True  


In [24]:
for hemisphere, side_af in pair_af.groupby("hemisphere", axis="both"):
    print(hemisphere)
    print(side_af)

('L', 'L')
AdjacencyFrame with shape: (1634, 1634)
Source node features: 57
Target node features: 57

('L', 'R')
AdjacencyFrame with shape: (1634, 1634)
Source node features: 57
Target node features: 57

('R', 'L')
AdjacencyFrame with shape: (1634, 1634)
Source node features: 57
Target node features: 57

('R', 'R')
AdjacencyFrame with shape: (1634, 1634)
Source node features: 57
Target node features: 57

