# Edge queries

## Preamble
The code in this section assumes that you have already downloaded the circuit. If not, take a look at the [first notebook](./01_node_properties.ipynb) in the series.

In [1]:
import bluepysnap
import numpy as np

circuit_path = "sonata/circuit_sonata.json"
circuit = bluepysnap.Circuit(circuit_path)

## Differences between Node and Edge queries
Everything covered in the querying part of the previous [Node Sets and Querying notebook](./08_queries_and_nodesets.ipynb) also applies when querying edges, with one obvious exception: node sets can only be used to query nodes.

So the queries themselves work in the same fashion, but the way we call them is a bit different.

### Getting all nodes and their properties
First of all, to get all the nodes and their properties, one can do:

In [2]:
data = circuit.nodes.get()
for _, df in data:
    display(df.head())

population,node_ids,model_template,model_type
CorticoThalamic_projections,0,,virtual
CorticoThalamic_projections,1,,virtual
CorticoThalamic_projections,2,,virtual
CorticoThalamic_projections,3,,virtual
CorticoThalamic_projections,4,,virtual


population,node_ids,model_template,model_type
MedialLemniscus_projections,0,,virtual
MedialLemniscus_projections,1,,virtual
MedialLemniscus_projections,2,,virtual
MedialLemniscus_projections,3,,virtual
MedialLemniscus_projections,4,,virtual


population,node_ids,@dynamics:holding_current,@dynamics:threshold_current,etype,layer,model_template,model_type,morph_class,morphology,mtype,orientation_w,...,orientation_y,orientation_z,region,rotation_angle_xaxis,rotation_angle_yaxis,rotation_angle_zaxis,synapse_class,x,y,z
thalamus_neurons,0,-0.04527,0.08316,cAD_noscltb,Rt,hoc:cAD_noscltb,biophysical,RC,dend-04446-04462-X10187-Y13578_final_axon-0456...,Rt_RC,0.988265,...,0.152752,-0.0,mc0;Rt,-0.0,0.306704,-0.0,INH,175.0,575.0,225.0
thalamus_neurons,1,-0.033646,0.049149,cAD_noscltb,Rt,hoc:cAD_noscltb,biophysical,RC,dend-04901-04913-X12280-Y25667_final_axon-0444...,Rt_RC,0.780058,...,0.625707,-0.0,mc0;Rt,-0.0,1.352075,-0.0,INH,179.044281,593.194763,200.260788
thalamus_neurons,2,-0.03711,0.060735,cNAD_noscltb,Rt,hoc:cNAD_noscltb,biophysical,RC,dend-04446-04462-X10187-Y13578_final_axon-0453...,Rt_RC,0.948338,...,0.317262,-0.0,mc0;Rt,-0.0,0.645681,-0.0,INH,196.75148,563.684509,206.200989
thalamus_neurons,3,-0.02114,0.043437,cNAD_noscltb,Rt,hoc:cNAD_noscltb,biophysical,RC,dend-04392-04406-X11579-Y24237_final_axon-0490...,Rt_RC,0.990075,...,0.140538,-0.0,mc0;Rt,-0.0,0.282009,-0.0,INH,169.940216,579.091736,253.004227
thalamus_neurons,4,-0.042115,0.077446,cNAD_noscltb,Rt,hoc:cNAD_noscltb,biophysical,RC,dend-jy180406_C_idA_axon-04527-04540-X11773-Y2...,Rt_RC,0.971375,...,-0.237553,0.0,mc0;Rt,0.0,-0.479691,0.0,INH,156.274872,572.608337,235.78624
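
As the output shows, each population comes with its own set of properties. A minimal sketch for summarizing this is given below. It assumes that the first element of each pair yielded by `circuit.nodes.get()` (discarded as `_` in the cell above) is the population name; `columns_per_population` is a hypothetical helper, not part of SNAP.

```python
def columns_per_population(frames):
    """Map each population name to the property names of its DataFrame.

    `frames` is any iterable of (name, dataframe) pairs; the assumption is
    that `circuit.nodes.get()` yields such pairs.
    """
    return {name: list(df.columns) for name, df in frames}


# Intended usage (requires the circuit from the preamble):
# columns_per_population(circuit.nodes.get())
```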


### Getting all edges and their properties?
One cannot query all the edges and their properties the way one can with nodes:
```python
circuit.edges.get() # Would raise an exception
circuit.edges['MedialLemniscus_projections__thalamus_neurons__chemical'].get() # Would also raise an exception
```

Why the different behavior? The reason is very simple: the number of edges massively exceeds the number of nodes in the circuit:

In [3]:
n_edges = circuit.edges.size
n_nodes = circuit.nodes.size
print(f"# of nodes: {n_nodes}")
print(f"# of edges: {n_edges}")
print(f"There are roughly {n_edges // n_nodes} times more edges than nodes.")

# of nodes: 189208
# of edges: 63340787
There are roughly 334 times more edges than nodes.


Because of this, it's extremely easy to run out of memory. In fact, since `circuit.edges.ids` returns a `CircuitEdgeIds` object whose index consists of both the edge id and the population name, you can easily run out of memory even with:
```python
circuit.edges.ids() # This will very likely run out of memory
```
So, since the ids alone can exhaust the memory, fetching all the properties for all the edges is out of the question. With edges, one also needs to define which properties `get` returns; otherwise, only the ids are returned.
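
The memory argument can be made concrete with a rough calculation. The sketch below is an estimate only: it assumes 8 bytes per stored value and a hypothetical count of 20 numeric properties per edge (the real number depends on the circuit). It also ignores the population-name labels, which add further overhead.

```python
# Back-of-the-envelope memory estimate for edge ids and properties.
N_EDGES = 63_340_787   # edge count printed above
BYTES_PER_VALUE = 8    # assumption: 64-bit integers/floats
N_PROPERTIES = 20      # hypothetical number of numeric edge properties


def mib(n_bytes):
    """Convert a byte count to mebibytes."""
    return n_bytes / 2**20


ids_only = N_EDGES * BYTES_PER_VALUE
with_properties = ids_only * (1 + N_PROPERTIES)

print(f"ids only: {mib(ids_only):,.0f} MiB")
print(f"ids + {N_PROPERTIES} properties: {mib(with_properties):,.0f} MiB")
```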

### `Edges`/`EdgePopulation` `get` requires the query to be defined
```python
# These return ids instead of all properties.
circuit.edges.get(query)  # same as .ids(query)
circuit.edges['MedialLemniscus_projections__thalamus_neurons__chemical'].get(query) # same as .ids(query)
```
Let's try to query and show afferent center positions for edges having their afferent center position between XYZ coordinates `[450,450,450]` and `[460,460,460]`:

In [4]:
query = {
    'afferent_center_x': [450, 460],
    'afferent_center_y': [450, 460],
    'afferent_center_z': [450, 460],
}
properties = list(query)

# This query only returns results for one population
data = circuit.edges.get(query, properties)
for _, df in data:
    display(df.head())

# Let's query the same but using the edge population
edge_population = circuit.edges['thalamus_neurons__thalamus_neurons__chemical']
edge_population.get(query, properties).head()

population,edge_ids,afferent_center_x,afferent_center_y,afferent_center_z
thalamus_neurons__thalamus_neurons__chemical,2827859,454.395416,455.317352,453.267151
thalamus_neurons__thalamus_neurons__chemical,2842049,451.251099,459.769501,457.176605
thalamus_neurons__thalamus_neurons__chemical,2842050,453.582245,457.467682,456.495728
thalamus_neurons__thalamus_neurons__chemical,2872204,456.010925,454.797028,450.967194
thalamus_neurons__thalamus_neurons__chemical,3072076,451.312164,455.902893,452.170288


Unnamed: 0,afferent_center_x,afferent_center_y,afferent_center_z
2827859,454.395416,455.317352,453.267151
2842049,451.251099,459.769501,457.176605
2842050,453.582245,457.467682,456.495728
2872204,456.010925,454.797028,450.967194
3072076,451.312164,455.902893,452.170288
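
To spell out what the range query above selects, here is a toy re-implementation in plain Python. It is not SNAP's code: it assumes that a two-element list on a floating-point property is read as inclusive `[low, high]` bounds and that all conditions are combined with a logical AND, which is consistent with the output above. The third row is made up to show a rejected edge.

```python
# Toy stand-in for an edge table: edge id -> afferent center (x, y, z).
EDGE_POSITIONS = {
    2827859: (454.395416, 455.317352, 453.267151),  # row from the output above
    2842049: (451.251099, 459.769501, 457.176605),  # row from the output above
    9999999: (470.0, 455.0, 455.0),                 # hypothetical row, x too large
}
AXES = ('afferent_center_x', 'afferent_center_y', 'afferent_center_z')

query = {axis: [450, 460] for axis in AXES}


def in_range(position, query):
    """True if every coordinate lies within its [low, high] bounds."""
    return all(
        query[axis][0] <= value <= query[axis][1]
        for axis, value in zip(AXES, position)
    )


selected = [edge_id for edge_id, pos in EDGE_POSITIONS.items() if in_range(pos, query)]
print(selected)
```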


## Typical use cases

Now that we've covered the differences, let's get deeper into edge queries. In this section we'll cover some of the typical use cases.

We already know that we can get any of the edge properties with `get` and use any of the properties to filter which edges are returned, so we'll not cover that here. However, more often than not, that is not how we query edges. Most of the time, we want to find edges that connect certain nodes (or node sets), or find which nodes are connected to certain pre-synaptic (or post-synaptic) cells.

In the examples, we'll be using a single edge population, but they work the same with `circuit.edges`, too.

### Edges connecting cells with known ids

In these examples, we're demonstrating the various methods of finding edges connecting nodes with already resolved ids.

In [5]:
source_ids = [1]
target_ids = [27204]
properties = ['@source_node', '@target_node']

#### using `@source_node` and/or `@target_node`

If we have the source/target ids already resolved we can, again, query using the properties. In the following example we define both source and target node ids, but obviously you can just use one of them.

In [6]:
edge_population.get({'@source_node': source_ids, '@target_node': target_ids}, properties=properties)

Unnamed: 0,@source_node,@target_node
11570852,1,27204
11570853,1,27204
11570854,1,27204
11570855,1,27204


#### using `pathway_edges` / `pair_edges`

**Note:** `pathway_edges` and `pair_edges` are the same function, so the following applies to both of them.

We can get just the edge ids without specifying properties:

In [7]:
edge_population.pathway_edges(source_ids, target_ids)

array([11570852, 11570853, 11570854, 11570855])

or just as easily get the wanted properties by passing them as an argument:

In [8]:
edge_population.pathway_edges(source_ids, target_ids, properties=properties)

Unnamed: 0,@source_node,@target_node
11570852,1,27204
11570853,1,27204
11570854,1,27204
11570855,1,27204
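
The outputs above suggest that `get` with `@source_node`/`@target_node` and `pathway_edges` select the same edges. A small sketch for checking this programmatically follows; `pathway_matches_get` is a hypothetical helper that only uses the two calls shown in this section.

```python
def pathway_matches_get(edge_population, source_ids, target_ids):
    """Return True if `get` and `pathway_edges` select the same edge ids."""
    query = {'@source_node': source_ids, '@target_node': target_ids}
    via_get = edge_population.get(query, properties=['@source_node'])
    via_pathway = edge_population.pathway_edges(source_ids, target_ids)
    # `get` returns a table indexed by edge id; `pathway_edges` returns the ids.
    return sorted(via_get.index) == sorted(via_pathway)


# Intended usage (requires the circuit from the preamble):
# pathway_matches_get(edge_population, source_ids, target_ids)
```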


#### Getting edges based on either source or target nodes (but not both)

To get edges based on only the source nodes or only the target nodes, we can use `get` with just `@source_node` or `@target_node` defined, or use `pathway_edges`/`pair_edges` with the other argument set to `None`:

In [9]:
edges_with_source = edge_population.pathway_edges(source_ids, None) # get all edges with given source_ids
edges_with_target = edge_population.pathway_edges(None, target_ids) # get all edges with given target_ids

but SNAP also has dedicated functions for this:

In [10]:
afferent_edges = edge_population.afferent_edges(target_ids, properties=None) # note that these functions also...
efferent_edges = edge_population.efferent_edges(source_ids, properties=None) # ...can get the wanted properties
print(f"afferent edges == edges with target ids: {all(afferent_edges == edges_with_target)}")
print(f"efferent edges == edges with source ids: {all(efferent_edges == edges_with_source)}")

afferent edges == edges with target ids: True
efferent edges == edges with source ids: True
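
The relationship between these functions can be illustrated with a toy model. The sketch below is not SNAP's implementation; the edge table and the `toy_*` functions are made up to show that efferent edges are the edges leaving the given sources and afferent edges are the edges arriving at the given targets.

```python
# Toy edge table: edge id -> (source node id, target node id). Values are made up.
EDGES = {0: (1, 5), 1: (1, 7), 2: (3, 5), 3: (4, 7)}


def toy_pathway_edges(source_ids=None, target_ids=None):
    """Edge ids whose source/target are in the given lists (None = no filter)."""
    return [
        edge_id
        for edge_id, (source, target) in EDGES.items()
        if (source_ids is None or source in source_ids)
        and (target_ids is None or target in target_ids)
    ]


def toy_efferent_edges(source_ids):
    """Edges leaving the given source nodes."""
    return toy_pathway_edges(source_ids, None)


def toy_afferent_edges(target_ids):
    """Edges arriving at the given target nodes."""
    return toy_pathway_edges(None, target_ids)


print(toy_efferent_edges([1]), toy_afferent_edges([5]), toy_pathway_edges([1], [5]))
```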


### Finding source/target nodes based on known target/source ids

We already covered how to get the synapses/edges based on known source/target ids. Obviously we could use any of the previously covered functions and define `@source_node` or `@target_node` in the wanted properties to get the wanted nodes. However, again, SNAP has dedicated functions for it:

In [11]:
source_nodes = edge_population.afferent_nodes(target_ids) 
target_nodes = edge_population.efferent_nodes(source_ids)

These functions do not allow you to define `properties` to get the node properties within the same call, but we have an easy way to access the source and target populations to fetch the wanted properties:

In [12]:
display(edge_population.source.get(source_nodes, properties=['mtype', 'etype','layer']).head())
display(edge_population.target.get(target_nodes, properties=['mtype', 'etype','layer']).head())

node_ids,mtype,etype,layer
1,Rt_RC,cAD_noscltb,Rt
12,Rt_RC,cAD_noscltb,Rt
34,Rt_RC,cNAD_noscltb,Rt
99,Rt_RC,cNAD_noscltb,Rt
119,Rt_RC,cAD_noscltb,Rt


node_ids,mtype,etype,layer
5,Rt_RC,cAD_noscltb,Rt
421,Rt_RC,cAD_noscltb,Rt
561,Rt_RC,cAD_noscltb,Rt
1080,Rt_RC,cNAD_noscltb,Rt
1276,Rt_RC,cAD_noscltb,Rt
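
When the number of synapses per connection matters, and not only which nodes are connected, one option is to request `@source_node` through `afferent_edges` and count the occurrences. The sketch below assumes that passing `properties` as a list returns a table that can be indexed by property name, as in the outputs above; `synapses_per_source` is a hypothetical helper.

```python
from collections import Counter


def synapses_per_source(edge_population, target_ids):
    """Count how many edges each pre-synaptic node sends to `target_ids`."""
    table = edge_population.afferent_edges(target_ids, properties=['@source_node'])
    # One row per edge, so counting the source ids gives edges per source node.
    return Counter(int(node_id) for node_id in table['@source_node'])


# Intended usage (requires the circuit from the preamble):
# synapses_per_source(edge_population, target_ids).most_common(5)
```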


### Getting edges (or nodes) based on source/target node properties

Now that we're familiar with all the different functions 
* `afferent_edges` / `efferent_edges`
* `afferent_nodes` / `efferent_nodes`
* `pathway_edges` / `pair_edges`

let's continue with the tutorial.

With the `get` function, it would have been cumbersome if we had to first resolve the ids and then pass them to the function. Luckily, the ids are resolved internally and we can just pass the query to `get`. In general, this is what we'd like to do with the edges: simply find the synapses between certain regions, node sets, mtypes, etc.

For this exact reason, all of the above functions resolve the ids internally. That is, instead of a list of ids, we can pass queries to them:

In [13]:
print('Fetching connecting edges...')
# using a node set
display(edge_population.afferent_edges('mc2;VPL'))

# using an external node set
ext_node_set = bluepysnap.node_sets.NodeSets.from_dict({'ext_mc2;VPL': {'region': 'mc2;VPL'}})
display(edge_population.afferent_edges(ext_node_set['ext_mc2;VPL']))

# using a query
display(edge_population.afferent_edges({'region': 'mc2;VPL'}))

# just to demonstrate the queries with a node function
print("\nFetching source nodes...")

# using afferent_edges and properties
source_nodes = np.unique(edge_population.afferent_edges({'region': 'mc2;VPL'}, properties=['@source_node']))
display(source_nodes)

# using afferent_nodes
display(edge_population.afferent_nodes('mc2;VPL'))

Fetching connecting edges...


array([13588618, 13588619, 13588620, ..., 19297682, 19297683, 19297684])

array([13588618, 13588619, 13588620, ..., 19297682, 19297683, 19297684])

array([13588618, 13588619, 13588620, ..., 19297682, 19297683, 19297684])


Fetching source nodes...


array([    10,     25,     54, ..., 100762, 100763, 100764])

array([    10,     25,     54, ..., 100762, 100763, 100764])

So in short: each of these functions can have queries as parameters instead of the node ids.
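
For comparison, the explicit two-step alternative could look like the sketch below: resolve the node ids on the target population first, then pass them on. It assumes that the target node population exposes an `ids` method accepting the same kind of query; `afferent_edges_two_step` is a hypothetical helper, and the one-step call shown above is the more convenient form.

```python
def afferent_edges_two_step(edge_population, node_query):
    """Resolve target node ids explicitly, then fetch their afferent edges."""
    # Step 1: turn the query (dict, node set name, ...) into node ids.
    target_ids = edge_population.target.ids(node_query)
    # Step 2: fetch the edges that end at those nodes.
    return edge_population.afferent_edges(target_ids)


# Intended usage (requires the circuit from the preamble):
# afferent_edges_two_step(edge_population, {'region': 'mc2;VPL'})
```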

### Finding edges based on their type
Same as with the nodes, we can also query edges based on their `population_type`. 

Let's see how we can get the `chemical` synapses:

In [14]:
circuit.edges.get({'population_type':'chemical',
                   'edge_id': [*range(5)]}) # Since the number of edges is massive, we only query a fraction of all the `edge_id`s

CircuitEdgeIds([('CorticoThalamic_projections__thalamus_neurons__chemical', 0),
            ('CorticoThalamic_projections__thalamus_neurons__chemical', 1),
            ('CorticoThalamic_projections__thalamus_neurons__chemical', 2),
            ('CorticoThalamic_projections__thalamus_neurons__chemical', 3),
            ('CorticoThalamic_projections__thalamus_neurons__chemical', 4),
            ('MedialLemniscus_projections__thalamus_neurons__chemical', 0),
            ('MedialLemniscus_projections__thalamus_neurons__chemical', 1),
            ('MedialLemniscus_projections__thalamus_neurons__chemical', 2),
            ('MedialLemniscus_projections__thalamus_neurons__chemical', 3),
            ('MedialLemniscus_projections__thalamus_neurons__chemical', 4),
            (           'thalamus_neurons__thalamus_neurons__chemical', 0),
            (           'thalamus_neurons__thalamus_neurons__chemical', 1),
            (           'thalamus_neurons__thalamus_neurons__chemical', 2),
        

#### _"Cool beans... but... wait a minute! Why does it also return the projections?"_
Indeed. Projections are also chemical synapses and they share the same population type, so defining `chemical` as the type also returns the projections. 

#### _"I see. How do I get **only** the projections?"_
You can use the population names to query only projections, but there is also a way to get only projections without specifying the population names: take advantage of the fact that the source node type of projections is `virtual` and the target node type is `biophysical`:

In [15]:
circuit.edges.pathway_edges({'population_type': 'virtual'}, {'population_type': 'biophysical'})

CircuitEdgeIds([('CorticoThalamic_projections__thalamus_neurons__chemical', ...),
            ('CorticoThalamic_projections__thalamus_neurons__chemical', ...),
            ('CorticoThalamic_projections__thalamus_neurons__chemical', ...),
            ('CorticoThalamic_projections__thalamus_neurons__chemical', ...),
            ('CorticoThalamic_projections__thalamus_neurons__chemical', ...),
            ('CorticoThalamic_projections__thalamus_neurons__chemical', ...),
            ('CorticoThalamic_projections__thalamus_neurons__chemical', ...),
            ('CorticoThalamic_projections__thalamus_neurons__chemical', ...),
            ('CorticoThalamic_projections__thalamus_neurons__chemical', ...),
            ('CorticoThalamic_projections__thalamus_neurons__chemical', ...),
            ...
            ('MedialLemniscus_projections__thalamus_neurons__chemical', ...),
            ('MedialLemniscus_projections__thalamus_neurons__chemical', ...),
            ('MedialLemniscus_projections__t

In this case, the same can be achieved by only defining the source nodes:

In [16]:
circuit.edges.efferent_edges({'population_type': 'virtual'})

CircuitEdgeIds([('CorticoThalamic_projections__thalamus_neurons__chemical', ...),
            ('CorticoThalamic_projections__thalamus_neurons__chemical', ...),
            ('CorticoThalamic_projections__thalamus_neurons__chemical', ...),
            ('CorticoThalamic_projections__thalamus_neurons__chemical', ...),
            ('CorticoThalamic_projections__thalamus_neurons__chemical', ...),
            ('CorticoThalamic_projections__thalamus_neurons__chemical', ...),
            ('CorticoThalamic_projections__thalamus_neurons__chemical', ...),
            ('CorticoThalamic_projections__thalamus_neurons__chemical', ...),
            ('CorticoThalamic_projections__thalamus_neurons__chemical', ...),
            ('CorticoThalamic_projections__thalamus_neurons__chemical', ...),
            ...
            ('MedialLemniscus_projections__thalamus_neurons__chemical', ...),
            ('MedialLemniscus_projections__thalamus_neurons__chemical', ...),
            ('MedialLemniscus_projections__t

Please note, however, that `neuromodulatory` edges also use `virtual` source nodes, so if these types of edges are present in the circuit, they will also be returned by `efferent_edges`.
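
One way to tell the two cases apart is to group the edge populations that have virtual sources by their own population type. The sketch below is an assumption-heavy illustration: it presumes that `circuit.edges` offers a dict-like `items()` and that edge and node populations expose a `type` attribute, which should be verified against the installed SNAP version. `split_virtual_source_populations` is a hypothetical helper.

```python
def split_virtual_source_populations(edges):
    """Group edge populations with virtual sources by their population type.

    `edges` is any mapping-like object whose `items()` yields
    (name, edge_population) pairs; the `type` attributes are assumptions.
    """
    grouped = {}
    for name, population in edges.items():
        if population.source.type == 'virtual':
            grouped.setdefault(population.type, []).append(name)
    return grouped


# Intended usage (requires the circuit from the preamble):
# split_virtual_source_populations(circuit.edges)
```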

## Conclusion
In this notebook, we learned all the different queries related to edges: how to find connecting nodes/edges, how to query for the properties we're interested in, etc. We also learned about the differences between querying nodes and querying edges and the reasons behind the differences. 

In the next notebook, we'll cover a more memory-efficient way to go over edges: `iter_connections`.