# Edge queries
## Introduction
In this tutorial we cover edge queries and how to use them in different use cases.

## Preamble
The code in this section assumes that you have already downloaded the circuit. If not, take a look at the [first notebook](./01_node_properties.ipynb) in the series.

In [1]:
import bluepysnap
import pandas as pd
import numpy as np
from time import time

circuit_path = "sonata/circuit_sonata.json"
circuit = bluepysnap.Circuit(circuit_path)

## Differences between Node and Edge queries
Everything covered in the querying part of the previous [Node Sets and Querying notebook](./08_queries_and_nodesets.ipynb) also applies when querying edges, with one obvious exception: node sets can only be used to query nodes.

So the queries themselves work in the same fashion, but the way we run them is a bit different.

### Getting all nodes and their properties
First of all, to get all nodes and their properties, one can do:

In [2]:
data = circuit.nodes.get()
for _, df in data:
    display(df.head())

| population | node_ids | model_template | model_type |
|---|---|---|---|
| CorticoThalamic_projections | 0 | | virtual |
| CorticoThalamic_projections | 1 | | virtual |
| CorticoThalamic_projections | 2 | | virtual |
| CorticoThalamic_projections | 3 | | virtual |
| CorticoThalamic_projections | 4 | | virtual |


| population | node_ids | model_template | model_type |
|---|---|---|---|
| MedialLemniscus_projections | 0 | | virtual |
| MedialLemniscus_projections | 1 | | virtual |
| MedialLemniscus_projections | 2 | | virtual |
| MedialLemniscus_projections | 3 | | virtual |
| MedialLemniscus_projections | 4 | | virtual |


| population | node_ids | @dynamics:holding_current | @dynamics:threshold_current | etype | layer | model_template | model_type | morph_class | morphology | mtype | orientation_w | ... | orientation_y | orientation_z | region | rotation_angle_xaxis | rotation_angle_yaxis | rotation_angle_zaxis | synapse_class | x | y | z |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| thalamus_neurons | 0 | -0.04527 | 0.08316 | cAD_noscltb | Rt | hoc:cAD_noscltb | biophysical | RC | dend-04446-04462-X10187-Y13578_final_axon-0456... | Rt_RC | 0.988265 | ... | 0.152752 | -0.0 | mc0;Rt | -0.0 | 0.306704 | -0.0 | INH | 175.0 | 575.0 | 225.0 |
| thalamus_neurons | 1 | -0.033646 | 0.049149 | cAD_noscltb | Rt | hoc:cAD_noscltb | biophysical | RC | dend-04901-04913-X12280-Y25667_final_axon-0444... | Rt_RC | 0.780058 | ... | 0.625707 | -0.0 | mc0;Rt | -0.0 | 1.352075 | -0.0 | INH | 179.044281 | 593.194763 | 200.260788 |
| thalamus_neurons | 2 | -0.03711 | 0.060735 | cNAD_noscltb | Rt | hoc:cNAD_noscltb | biophysical | RC | dend-04446-04462-X10187-Y13578_final_axon-0453... | Rt_RC | 0.948338 | ... | 0.317262 | -0.0 | mc0;Rt | -0.0 | 0.645681 | -0.0 | INH | 196.75148 | 563.684509 | 206.200989 |
| thalamus_neurons | 3 | -0.02114 | 0.043437 | cNAD_noscltb | Rt | hoc:cNAD_noscltb | biophysical | RC | dend-04392-04406-X11579-Y24237_final_axon-0490... | Rt_RC | 0.990075 | ... | 0.140538 | -0.0 | mc0;Rt | -0.0 | 0.282009 | -0.0 | INH | 169.940216 | 579.091736 | 253.004227 |
| thalamus_neurons | 4 | -0.042115 | 0.077446 | cNAD_noscltb | Rt | hoc:cNAD_noscltb | biophysical | RC | dend-jy180406_C_idA_axon-04527-04540-X11773-Y2... | Rt_RC | 0.971375 | ... | -0.237553 | 0.0 | mc0;Rt | 0.0 | -0.479691 | 0.0 | INH | 156.274872 | 572.608337 | 235.78624 |


### Getting all edges and their properties?
Unlike with nodes, one cannot query all the edges and their properties at once:
```python
circuit.edges.get() # Would raise an exception
circuit.edges['MedialLemniscus_projections__thalamus_neurons__chemical'].get() # Would also raise an exception
```

Why the different behavior? The reason is very simple: the number of edges massively exceeds the number of nodes in the circuit:

In [3]:
n_edges = circuit.edges.size
n_nodes = circuit.nodes.size
print(f"# of nodes: {n_nodes}")
print(f"# of edges: {n_edges}")
print(f"There are roughly {n_edges // n_nodes} times more edges than nodes.")

# of nodes: 189208
# of edges: 63340787
There are roughly 334 times more edges than nodes.


Because of this, it's extremely easy to run out of memory. In fact, since `circuit.edges.ids` returns a `CircuitEdgeIds` object with indices consisting of both the edge id and the population name, you can easily run out of memory even with:
```python
circuit.edges.ids() # This will very likely run out of memory
```
So, since even the ids alone can exhaust the memory, fetching all the properties for all the edges is out of the question. With edges, one also needs to define which properties `get` returns; otherwise, only ids are returned.
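To get a feeling for the scale, here is a rough back-of-the-envelope estimate. It is only a sketch: the 8 bytes per value and the 20 numeric properties per edge are assumptions made for illustration, and the real footprint will be higher once the population-name level of the index and intermediate copies are counted.

```python
# Back-of-the-envelope memory estimate (illustrative assumptions only).
N_EDGES = 63_340_787   # edge count printed above
BYTES_PER_VALUE = 8    # assumption: one int64/float64 per stored value
N_PROPERTIES = 20      # hypothetical number of numeric properties per edge


def gib(n_bytes):
    """Convert a byte count to gibibytes."""
    return n_bytes / 2**30


ids_bytes = N_EDGES * BYTES_PER_VALUE        # raw edge ids only
props_bytes = ids_bytes * N_PROPERTIES       # one column per property

print(f"raw ids only: {gib(ids_bytes):.2f} GiB (lower bound)")
print(f"{N_PROPERTIES} numeric properties: {gib(props_bytes):.2f} GiB (lower bound)")
```

Both numbers are lower bounds for the raw values alone, which is why restricting both the query and the list of properties matters so much with edges.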

### `Edges`/`EdgePopulation` `get` requires the query to be defined
```python
# These return ids instead of all properties.
circuit.edges.get(query)  # same as .ids(query)
circuit.edges['MedialLemniscus_projections__thalamus_neurons__chemical'].get(query) # same as .ids(query)
```
Let's try to query and show afferent center positions for edges having their afferent center position between XYZ coordinates `[450,450,450]` and `[460,460,460]`:

In [4]:
query = {
    'afferent_center_x': [450, 460],
    'afferent_center_y': [450, 460],
    'afferent_center_z': [450, 460],
}
properties = list(query)

# This query only returns results for one population
data = circuit.edges.get(query, properties)
for _, df in data:
    display(df.head())

# Let's query the same but using the edge population
edge_population = circuit.edges['thalamus_neurons__thalamus_neurons__chemical']
edge_population.get(query, properties).head()

| population | edge_ids | afferent_center_x | afferent_center_y | afferent_center_z |
|---|---|---|---|---|
| thalamus_neurons__thalamus_neurons__chemical | 2827859 | 454.395416 | 455.317352 | 453.267151 |
| thalamus_neurons__thalamus_neurons__chemical | 2842049 | 451.251099 | 459.769501 | 457.176605 |
| thalamus_neurons__thalamus_neurons__chemical | 2842050 | 453.582245 | 457.467682 | 456.495728 |
| thalamus_neurons__thalamus_neurons__chemical | 2872204 | 456.010925 | 454.797028 | 450.967194 |
| thalamus_neurons__thalamus_neurons__chemical | 3072076 | 451.312164 | 455.902893 | 452.170288 |


| | afferent_center_x | afferent_center_y | afferent_center_z |
|---|---|---|---|
| 2827859 | 454.395416 | 455.317352 | 453.267151 |
| 2842049 | 451.251099 | 459.769501 | 457.176605 |
| 2842050 | 453.582245 | 457.467682 | 456.495728 |
| 2872204 | 456.010925 | 454.797028 | 450.967194 |
| 3072076 | 451.312164 | 455.902893 | 452.170288 |
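The `[450, 460]` values in the query act as a range on each coordinate. The following toy sketch mimics that behaviour in plain Python. It is not SNAP's implementation: the inclusive bounds are an assumption, the `columns` mapping and `matches` helper are made up for illustration, and the last row is a hypothetical edge added to show a non-match.

```python
# Toy illustration of range semantics; not SNAP's implementation.
# Rows are (edge_id, x, y, z). The first two mimic the output above,
# the last one is a made-up edge that falls outside the box.
rows = [
    (2827859, 454.395416, 455.317352, 453.267151),
    (2842049, 451.251099, 459.769501, 457.176605),
    (9999999, 470.0, 455.0, 455.0),  # hypothetical edge outside the box
]

# Same shape as the query dict used above: property -> [low, high]
query = {"x": [450, 460], "y": [450, 460], "z": [450, 460]}
columns = {"x": 1, "y": 2, "z": 3}  # position of each coordinate in a row


def matches(row, query):
    """True when every queried coordinate lies within [low, high] (assumed inclusive)."""
    return all(
        low <= row[columns[name]] <= high for name, (low, high) in query.items()
    )


selected = [row[0] for row in rows if matches(row, query)]
print(selected)
```

All three conditions have to hold at once, so the query selects the edges inside the box spanned by the two corner points.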


## Typical use cases

Now that we've covered the differences, let's get deeper into edge queries. In this section we'll cover some of the typical use cases.

We already know that we can get any of the edge properties with `get` and use any of the properties to filter which edges are returned, so we'll not cover that here. However, more often than not, that is not how we query edges. Most of the time, we want to find edges that connect certain nodes (or node sets), or find which nodes are connected to certain pre-synaptic (or post-synaptic) cells.

In the examples, we'll be using a single edge population, but they work the same with `circuit.edges`, too.

### Edges connecting cells with known ids

In these examples, we're demonstrating the various methods of finding edges connecting nodes with already resolved ids.

In [5]:
source_ids = [1]
target_ids = [27204]
properties = ['@source_node', '@target_node']

#### using `@source_node` and/or `@target_node`

If we have the source/target ids already resolved we can, again, query using the properties. In the following example we define both source and target node ids, but obviously you can just use one of them.

In [6]:
edge_population.get({'@source_node': source_ids, '@target_node': target_ids}, properties=properties)

| | @source_node | @target_node |
|---|---|---|
| 11570852 | 1 | 27204 |
| 11570853 | 1 | 27204 |
| 11570854 | 1 | 27204 |
| 11570855 | 1 | 27204 |


#### using `pathway_edges` / `pair_edges`

**Note:** `pathway_edges` and `pair_edges` are the same function, so the following applies to both of them.

We can get just the edge ids without specifying properties:

In [7]:
edge_population.pathway_edges(source_ids, target_ids)

array([11570852, 11570853, 11570854, 11570855])

or just as easily get the wanted properties by passing them as an argument:

In [8]:
edge_population.pathway_edges(source_ids, target_ids, properties=properties)

| | @source_node | @target_node |
|---|---|---|
| 11570852 | 1 | 27204 |
| 11570853 | 1 | 27204 |
| 11570854 | 1 | 27204 |
| 11570855 | 1 | 27204 |


#### Getting edges based on either source or target nodes (but not both)

To get edges based on only the source or only the target nodes, we can use `get` with `@source_node` / `@target_node` defined in the query, or even `pathway_edges`/`pair_edges`:

In [9]:
edges_with_source = edge_population.pathway_edges(source_ids, None) # get all edges with given source_ids
edges_with_target = edge_population.pathway_edges(None, target_ids) # get all edges with given target_ids

but SNAP also has dedicated functions for this:

In [10]:
afferent_edges = edge_population.afferent_edges(target_ids, properties=None) # note that these functions also...
efferent_edges = edge_population.efferent_edges(source_ids, properties=None) # ...can get the wanted properties
print(f"afferent edges == edges with target ids: {all(afferent_edges == edges_with_target)}")
print(f"efferent edges == edges with source ids: {all(efferent_edges == edges_with_source)}")

afferent edges == edges with target ids: True
efferent edges == edges with source ids: True
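The relationship between these functions can be sketched with a toy edge table in plain Python. The data below is made up, and the functions merely borrow the SNAP method names to show the idea. They are stand-ins, not the library's implementation.

```python
# Toy edge table: edge_id -> (source_node, target_node). Made-up data.
EDGES = {
    0: (1, 5),
    1: (1, 7),
    2: (2, 5),
    3: (3, 7),
    4: (1, 5),
}


def pathway_edges(source=None, target=None):
    """Edge ids whose source is in `source` and whose target is in `target`.

    `None` means "no constraint on this side".
    """
    return [
        edge_id
        for edge_id, (src, tgt) in EDGES.items()
        if (source is None or src in source) and (target is None or tgt in target)
    ]


def afferent_edges(target):
    """Incoming edges of the given target nodes."""
    return pathway_edges(None, target)


def efferent_edges(source):
    """Outgoing edges of the given source nodes."""
    return pathway_edges(source, None)


print(pathway_edges([1], [5]))  # edges from node 1 to node 5
print(afferent_edges([5]))      # all edges arriving at node 5
print(efferent_edges([1]))      # all edges leaving node 1
```

In this toy model, the afferent and efferent variants are simply the pathway lookup with one side left unconstrained, which mirrors the equality checked in the cell above.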


### Finding source/target nodes based on known target/source ids

We already covered how to get the synapses/edges based on known source/target ids. Obviously we could use any of the previously covered functions and define `@source_node` or `@target_node` in the wanted properties to get the wanted nodes. However, again, SNAP has dedicated functions for it:

In [11]:
source_nodes = edge_population.afferent_nodes(target_ids) 
target_nodes = edge_population.efferent_nodes(source_ids)

These functions do not allow you to define `properties` to get the node properties within the same call, but we have an easy way to access the source and target populations to fetch the wanted properties:

In [12]:
display(edge_population.source.get(source_nodes, properties=['mtype', 'etype','layer']).head())
display(edge_population.target.get(target_nodes, properties=['mtype', 'etype','layer']).head())

| node_ids | mtype | etype | layer |
|---|---|---|---|
| 1 | Rt_RC | cAD_noscltb | Rt |
| 12 | Rt_RC | cAD_noscltb | Rt |
| 34 | Rt_RC | cNAD_noscltb | Rt |
| 99 | Rt_RC | cNAD_noscltb | Rt |
| 119 | Rt_RC | cAD_noscltb | Rt |


| node_ids | mtype | etype | layer |
|---|---|---|---|
| 5 | Rt_RC | cAD_noscltb | Rt |
| 421 | Rt_RC | cAD_noscltb | Rt |
| 561 | Rt_RC | cAD_noscltb | Rt |
| 1080 | Rt_RC | cNAD_noscltb | Rt |
| 1276 | Rt_RC | cAD_noscltb | Rt |


### Getting edges (or nodes) based on source/target node properties

Now that we're familiar with all the different functions 
* `afferent_edges` / `efferent_edges`
* `afferent_nodes` / `efferent_nodes`
* `pathway_edges` / `pair_edges`

let's continue with the tutorial.

With the `get` function, it would have been cumbersome if we had to first resolve the ids and then pass them to the get function. Luckily, the ids were resolved internally and we can just pass the query to the `get` function. In general, this is what we'd like to do with the edges: just simply find the synapses between certain regions, node sets, mtypes, etc., right?

For this exact reason, all of the above functions resolve the ids internally. I.e., instead of a list of ids, we can pass them queries:

In [13]:
print('Fetching connecting edges...')
# using a node set
display(edge_population.afferent_edges('mc2;VPL'))

# using an external node set
ext_node_set = bluepysnap.node_sets.NodeSets.from_dict({'ext_mc2;VPL': {'region': 'mc2;VPL'}})
display(edge_population.afferent_edges(ext_node_set['ext_mc2;VPL']))

# using a query
display(edge_population.afferent_edges({'region': 'mc2;VPL'}))

# just to demonstrate the queries with a node function
print("\nFetching source nodes...")

# using afferent_edges and properties
source_nodes = np.unique(edge_population.afferent_edges({'region': 'mc2;VPL'}, properties=['@source_node']))
display(source_nodes)

# using afferent_nodes
display(edge_population.afferent_nodes('mc2;VPL'))

Fetching connecting edges...


array([13588618, 13588619, 13588620, ..., 19297682, 19297683, 19297684])

array([13588618, 13588619, 13588620, ..., 19297682, 19297683, 19297684])

array([13588618, 13588619, 13588620, ..., 19297682, 19297683, 19297684])


Fetching source nodes...


array([    10,     25,     54, ..., 100762, 100763, 100764])

array([    10,     25,     54, ..., 100762, 100763, 100764])

So in short: each of these functions can have queries as parameters instead of the node ids.
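The idea of "resolving ids internally" can be illustrated with a small toy model. Everything here (`NODES`, `EDGES`, `resolve_ids`) is hypothetical and only sketches the concept. SNAP's own resolution also handles node sets and more complex queries.

```python
# Made-up node properties and edges for illustration.
NODES = {10: {"region": "A"}, 11: {"region": "B"}, 12: {"region": "A"}}
EDGES = {0: (11, 10), 1: (10, 11), 2: (11, 12)}  # edge_id -> (source, target)


def resolve_ids(group):
    """Accept either explicit node ids or a property query dict; return node ids."""
    if isinstance(group, dict):
        return [
            node_id
            for node_id, props in NODES.items()
            if all(props.get(key) == value for key, value in group.items())
        ]
    return list(group)


def afferent_edges(target):
    """Incoming edges of the target nodes, given as ids or as a query."""
    target_ids = set(resolve_ids(target))
    return [edge_id for edge_id, (_, tgt) in EDGES.items() if tgt in target_ids]


by_query = afferent_edges({"region": "A"})
by_ids = afferent_edges([10, 12])
print(by_query, by_ids)
```

Because the query is turned into ids as the first step, the caller gets the same result either way and never has to handle the intermediate id list.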

## Iterating over connections with `iter_connections`
As mentioned before, due to the huge number of edges, we may run into memory issues. Therefore, it's highly recommended to use iterators instead of gathering all of the data at once. For this very reason, SNAP has `iter_connections`:
```python
edge_population.iter_connections(
    source,                   # the source nodes / query
    target,                   # the target nodes / query
    unique_node_ids=False,    # only use each source/target id once
    shuffle=False,            # shuffle the order of results
    return_edge_ids=False,    # return also the edge ids
    return_edge_count=False,  # return the edge count between the source-target pairs
)
# Returns a generator of tuples containing:
# (source_id, target_id)             : normally
# (source_id, target_id, edge_ids)   : if return_edge_ids=True
# (source_id, target_id, edge_count) : if return_edge_count=True
```
**NOTE:** `return_edge_ids` and `return_edge_count` are mutually exclusive options.

In a nutshell, `iter_connections` iterates through **all** of the existing connections (source-target pairs) from **any** of the **source nodes** to **any** of the **target nodes** and returns a generator yielding those source-target pairs.

Let's look at a few examples.

### Return value is a generator that we can iterate over
This is just to emphasize that we don't get the results of the function until we actually loop over it:

In [14]:
it = edge_population.iter_connections(source_ids, target_ids)
print(f"The result is not a tuple or a list but a {type(it)}")

The result is not a tuple or a list but a <class 'generator'>


Now, we could convert the result to a list using `list(it)` or `[*it]` but that kind of defeats the purpose of using generators and iterators. We'll just loop through them in the examples to not reinforce "bad habits".
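If you only want to peek at the first few results, `itertools.islice` lets you do that without exhausting the generator. The sketch below uses a hypothetical stand-in generator (`toy_connections`) rather than a real circuit, and prints a message whenever a pair is actually computed.

```python
from itertools import islice


def toy_connections(n):
    """Stand-in for a lazy connection iterator; yields (source, target) pairs."""
    for i in range(n):
        print(f"  computing pair {i}")  # shows when the work actually happens
        yield (i, i + 1)


it = toy_connections(1_000_000)    # nothing is computed yet
first_three = list(islice(it, 3))  # only three pairs are ever produced
print(first_three)
```

Only three "computing pair" messages appear, even though the generator could yield a million pairs. The remaining ones are never produced unless something keeps iterating.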

### No optional flags set
This example is just to demonstrate that without `return_edge_ids`/`return_edge_count`, we're merely getting the source and target node ids as output:

In [15]:
for _source_id, _target_id in edge_population.iter_connections(source_ids, target_ids):
    print(_source_id, '-', _target_id)

CircuitNodeId(population='thalamus_neurons', id=1) - CircuitNodeId(population='thalamus_neurons', id=27204)


### Returning edge ids

By setting `return_edge_ids=True`, we get the ids of the edges connecting each source-target pair:

In [16]:
for _source_id, _target_id, _edge_ids in edge_population.iter_connections(source_ids, target_ids, return_edge_ids=True):
    print(_source_id, '-', _target_id)
    print(f'\n{_edge_ids}')

CircuitNodeId(population='thalamus_neurons', id=1) - CircuitNodeId(population='thalamus_neurons', id=27204)

CircuitEdgeIds([('thalamus_neurons__thalamus_neurons__chemical', 11570852),
            ('thalamus_neurons__thalamus_neurons__chemical', 11570853),
            ('thalamus_neurons__thalamus_neurons__chemical', 11570854),
            ('thalamus_neurons__thalamus_neurons__chemical', 11570855)],
           names=['population', 'edge_ids'])


### Returning the number of connecting edges

By setting `return_edge_count=True`, we get the number of edges connecting each source-target pair. Based on the previous example, we should be getting four connecting edges:

In [17]:
for _source_id, _target_id, _edge_count in edge_population.iter_connections(source_ids, target_ids, return_edge_count=True):
    print(_source_id, '-', _target_id)
    print(f'Edge count: {_edge_count}')

CircuitNodeId(population='thalamus_neurons', id=1) - CircuitNodeId(population='thalamus_neurons', id=27204)
Edge count: 4


### Randomizing the output order

We can use `shuffle=True` to randomize the order of the results.

So let's see the non-randomized order of the connections between the first 10 nodes. For easier reading, let's only print the numeric part of the `CircuitNodeId`s:

In [18]:
it = enumerate(edge_population.iter_connections(range(10), range(10)))
for i, (_source_id, _target_id) in it:
    print(f'{i+1:2d}: source: {_source_id.id:2d} --- target: {_target_id.id:2d}')

 1: source:  2 --- target:  0
 2: source:  0 --- target:  4
 3: source:  5 --- target:  4
 4: source:  1 --- target:  5


Now by setting the `shuffle` flag in the call, we'll get the above source-target pairs in a different order:

In [19]:
np.random.seed(0) # Just to keep the results consistent in the notebook

it = enumerate(edge_population.iter_connections(range(10), range(10), shuffle=True))
for i, (_source_id, _target_id) in it:
    print(f'{i+1:2d}: source: {_source_id.id:2d} --- target: {_target_id.id:2d}')

 1: source:  5 --- target:  4
 2: source:  0 --- target:  4
 3: source:  1 --- target:  5
 4: source:  2 --- target:  0


### Using each node only once (at max.) as a source and as a target

Let's look at the connections between the first 15 node ids. For easier reading, again, let's only print the numeric part of the `CircuitNodeId`s:

In [20]:
it = enumerate(edge_population.iter_connections(range(15), range(15)))
for i, (_source_id, _target_id) in it:
    print(f'{"W"+str(i+1):3s}: source: {_source_id.id:2d} --- target: {_target_id.id:2d}')

W1 : source:  2 --- target:  0
W2 : source: 10 --- target:  0
W3 : source: 14 --- target:  0
W4 : source:  0 --- target:  4
W5 : source:  5 --- target:  4
W6 : source:  1 --- target:  5
W7 : source: 13 --- target:  5
W8 : source: 11 --- target:  6
W9 : source:  8 --- target: 10
W10: source:  2 --- target: 14
W11: source: 13 --- target: 14


As we can see, we have 11 different source-target pairs. Note that the indices are prefixed with `W` (stands for Without a flag) to distinguish them from the following ones. Let's see what happens when we set `unique_node_ids=True`:

In [21]:
it = enumerate(edge_population.iter_connections(range(15), range(15), unique_node_ids=True))
for i, (_source_id, _target_id) in it:
    print(f'{"U"+str(i+1):3s}: source: {_source_id.id:2d} --- target: {_target_id.id:2d}')

U1 : source:  2 --- target:  0
U2 : source:  0 --- target:  4
U3 : source:  1 --- target:  5
U4 : source: 11 --- target:  6
U5 : source:  8 --- target: 10
U6 : source: 13 --- target: 14


Cool, we've effectively lost 5 source-target pairs somewhere.

So what happened here? The indices were prefixed with `U` (unique nodes only) to distinguish them from the previous output. Let's go through the indices `W1`-`W11` of the previous example (without the `unique_node_ids` flag set) and compare them to the output above:
* `W1`: kept (`U1`)
* `W2`,`W3`: removed (id `0` used as a **target** in `W1`)
* `W4`: kept (`U2`)
* `W5`: removed (id `4` used as a **target** in `W4`)
* `W6`: kept (`U3`)
* `W7`: removed (id `5` used as a **target** in `W6`)
* `W8`: kept (`U4`)
* `W9`: kept (`U5`)
* `W10`: removed (id `2` used as a **source** in `W1`)
* `W11`: kept (`U6`)
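The walkthrough above is consistent with a simple greedy filter: a pair is kept only if its source has not yet been kept as a source and its target has not yet been kept as a target. The sketch below applies that rule to the `W1`-`W11` pairs. Note that `unique_pairs` is a hypothetical helper that reproduces the output shown here. It is not taken from SNAP's source code.

```python
def unique_pairs(pairs):
    """Greedy filter: keep a pair only if its source has not yet been kept as a
    source and its target has not yet been kept as a target."""
    used_sources, used_targets = set(), set()
    for source, target in pairs:
        if source in used_sources or target in used_targets:
            continue  # one of the two ids was already used in that role
        used_sources.add(source)
        used_targets.add(target)
        yield source, target


# The eleven (source, target) pairs W1..W11 printed above
w_pairs = [(2, 0), (10, 0), (14, 0), (0, 4), (5, 4), (1, 5),
           (13, 5), (11, 6), (8, 10), (2, 14), (13, 14)]

u_pairs = list(unique_pairs(w_pairs))
print(u_pairs)
```

Note that id `13` survives in `W11` even though it appeared in `W7`: `W7` was dropped, so `13` had not been used as a source before.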

### _"Please tell me the above also works with queries"_
What kind of a software you think we're running here, pal? 

Obviously, `iter_connections` can also be called with any of the accepted node queries. The ids will be resolved on the fly:

In [22]:
it = edge_population.iter_connections(
    'mc2;VPL',            # node set
    {'region': 'mc2;Rt'}, # dict query
)
for i, (_source_id, _target_id) in enumerate(it):
    if i == 10: # Let's only print first 10
        print(f'{i+1:2d}: ...')
        break
    print(f'{i+1:2d}: source: {_source_id.id:2d} --- target: {_target_id.id:2d}')

 1: source: 33550 --- target: 28603
 2: source: 33743 --- target: 28603
 3: source: 33794 --- target: 28603
 4: source: 33818 --- target: 28603
 5: source: 34043 --- target: 28603
 6: source: 34773 --- target: 28603
 7: source: 34942 --- target: 28603
 8: source: 35126 --- target: 28603
 9: source: 35169 --- target: 28603
10: source: 35579 --- target: 28603
11: ...


### Performance optimizations

Now that we understand how `iter_connections` works, what can we do with it? What is the magic therein? 

Well, it's not really about _what_ it can do but _how_ it does it. As mentioned before, the whole purpose of using iterators is to be memory efficient. Where it especially shines is in cases where you are handling a large number of nodes/edges and aren't necessarily interested in all of the data collected in the process.

Let's take a look at an example.

#### CASE: Synapses between node sets

To demonstrate the magic of `iter_connections`, let's have a simple, straightforward example. We want to count the number of synapses between two node sets.

Now, we're not interested in the individual edge ids, just the number of synapses between two node sets. Perhaps we'd also like some statistics on how many of them there are on average between each source-target node pair, what the deviation is, etc.

Let's first define a source and a target node set and a helper function for printing the stats:

In [23]:
source_node_set = 'mc2;Rt'
target_node_set = 'mc2;VPL'

def print_statistics(pair_syns):
    print(f"There is a total of {np.sum(pair_syns)} synapses from '{source_node_set}' to '{target_node_set}'")
    print("\nSynapses between source-target node pairs:")
    print(f"- avg: {np.mean(pair_syns):.2f}")
    print(f"- std: {np.std(pair_syns):.2f}")
    print(f"- min: {np.min(pair_syns)}")
    print(f"- max: {np.max(pair_syns)}")

print(f'Number of source nodes: {len(edge_population.source.ids(source_node_set))}')
print(f'Number of target nodes: {len(edge_population.target.ids(target_node_set))}')

Number of source nodes: 4909
Number of target nodes: 8999


and now let's get the synapses and print the statistics:

In [24]:
t0 = time()

it = edge_population.iter_connections(source_node_set, target_node_set, return_edge_count=True)
pairwise_syns = np.fromiter((count for _,__,count in it), dtype=int)
print_statistics(pairwise_syns)

print(f'\nRuntime: {time()-t0:.2f} seconds')

There is a total of 3245906 synapses from 'mc2;Rt' to 'mc2;VPL'

Synapses between source-target node pairs:
- avg: 4.86
- std: 4.23
- min: 1
- max: 95

Runtime: 12.79 seconds


This is how we'd achieve the same with a somewhat more memory-heavy approach:

In [25]:
t0 = time()
result = edge_population.pathway_edges(source_node_set,target_node_set, properties=['@source_node', '@target_node'])
print_statistics(result.value_counts().values)
print(f'\nRuntime: {time()-t0:.2f} seconds')

There is a total of 3245906 synapses from 'mc2;Rt' to 'mc2;VPL'

Synapses between source-target node pairs:
- avg: 4.86
- std: 4.23
- min: 1
- max: 95

Runtime: 5.21 seconds
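Conceptually, the `value_counts()` step counts how many times each (source, target) row occurs, i.e. the number of synapses per connected pair. Here is a minimal plain-Python sketch of the same idea with `collections.Counter`, using made-up rows instead of circuit data:

```python
from collections import Counter

# Made-up (source, target) rows, one row per synapse
rows = [(1, 5), (1, 5), (2, 5), (1, 7), (1, 5), (2, 5)]

pair_counts = Counter(rows)  # number of synapses per source-target pair
counts = list(pair_counts.values())

print(pair_counts)
print("total:", sum(counts), "min:", min(counts), "max:", max(counts))
```

Note that every row has to exist in memory before the counting starts, which is the property that becomes a problem as the data grows.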


#### _"Dude, you just told me that `iter_connections` is supposed to be awesome, why is it slower?"_

Well spotted. There are cases, in which `iter_connections` is actually outperformed (runtime-wise) by the memory-heavy options. Worry not, we're merely warming up here. Let's shift gears and introduce a significantly bigger target node set:

In [26]:
target_node_set = 'VPL_TC'
print(f'Number of source nodes: {len(edge_population.source.ids(source_node_set))}')
print(f'Number of target nodes: {len(edge_population.target.ids(target_node_set))}')

Number of source nodes: 4909
Number of target nodes: 64856


We bumped up the number of target nodes by roughly one order of magnitude. The source nodes were left intact. Now, let's see what happens to the runtimes. Let's first run the `iter_connections` version:

In [27]:
t0 = time()
it = edge_population.iter_connections(source_node_set, target_node_set, return_edge_count=True)
pairwise_syns = np.fromiter((count for _,__,count in it), dtype=int)
print_statistics(pairwise_syns)
print(f'\nRuntime: {time()-t0:.2f} seconds')

There is a total of 4706551 synapses from 'mc2;Rt' to 'VPL_TC'

Synapses between source-target node pairs:
- avg: 4.63
- std: 3.96
- min: 1
- max: 95

Runtime: 81.80 seconds


That took quite some time. Let's see how the previously faster `pathway_edges` implementation performs:

In [28]:
t0 = time()
result = edge_population.pathway_edges(source_node_set,target_node_set, properties=['@source_node', '@target_node'])
print_statistics(result.value_counts().values)
print(f'\nRuntime: {time()-t0:.2f} seconds')

There is a total of 4706551 synapses from 'mc2;Rt' to 'VPL_TC'

Synapses between source-target node pairs:
- avg: 4.63
- std: 3.96
- min: 1
- max: 95

Runtime: 132.81 seconds


There is a significant drop in performance in comparison to `iter_connections`, even though the number of edges/synapses wasn't _that_ much higher. Why the difference?

Without going into too many technicalities, it boils down to:
* `pathway_edges` needs to handle all the data at once
   * it needs to get all connecting edges and their source and target nodes
   * from the huge dataframe, it needs to find unique source-target pairs and count how many times they appear
      * `pandas.value_counts()` leads to creating another dataframe which consumes even more memory  
   * everything is kept in memory throughout the process
* `iter_connections` only needs to handle one source-target pair at a time
   * after required data from one iteration is collected, rest of the data can be discarded from memory
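To take the "one pair at a time" idea a step further, the statistics themselves can be accumulated in a single pass, so that not even the per-pair counts need to be stored. The sketch below uses Welford's running-variance method. `running_stats` is a hypothetical helper (the notebook itself collects the counts with `np.fromiter`), and the input is a made-up stream of counts.

```python
import math


def running_stats(counts):
    """Single-pass total/mean/std/min/max over an iterable (Welford's method)."""
    n, total, mean, m2 = 0, 0, 0.0, 0.0
    lo, hi = math.inf, -math.inf
    for x in counts:
        n += 1
        total += x
        delta = x - mean
        mean += delta / n
        m2 += delta * (x - mean)  # running sum of squared deviations
        lo, hi = min(lo, x), max(hi, x)
    # Population standard deviation, matching the default of np.std
    std = math.sqrt(m2 / n) if n else float("nan")
    return {"n": n, "total": total, "mean": mean, "std": std, "min": lo, "max": hi}


# Any generator works as input; here a made-up stream of per-pair edge counts
stats = running_stats(c for c in [4, 1, 7, 4])
print(stats)
```

With a real circuit, the input could be a generator expression over `iter_connections(..., return_edge_count=True)`, as in the cells above.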

You might wonder what would happen if we bumped up the number of source nodes. Let's see:

In [29]:
source_node_set = 'Rt_RC'
print(f'Number of source nodes: {len(edge_population.source.ids(source_node_set))}')
print(f'Number of target nodes: {len(edge_population.target.ids(target_node_set))}')

Number of source nodes: 35567
Number of target nodes: 64856


So by using `'Rt_RC'`, we bumped up the number of source nodes by roughly one order of magnitude.

Let's see what happens when we run it with `iter_connections`: 

In [30]:
t0 = time()
it = edge_population.iter_connections(source_node_set, target_node_set, return_edge_count=True)
pairwise_syns = np.fromiter((count for _,__,count in it), dtype=int)
print_statistics(pairwise_syns)
print(f'\nRuntime: {time()-t0:.2f} seconds')

There is a total of 29455533 synapses from 'Rt_RC' to 'VPL_TC'

Synapses between source-target node pairs:
- avg: 4.70
- std: 4.04
- min: 1
- max: 102

Runtime: 100.10 seconds


As we can see, `iter_connections` was still faster than the `pathway_edges` method was with the smaller source node set.

Surely, `pathway_edges` can't be that much worse, right? Boy, can it ever. Running the same with the `pathway_edges` method took **over 30 minutes**.

Obviously, we didn't include it in the notebook, but if you **truly** want to try it out yourself, feel free to do so. You have been warned.

#### Lessons learned
* If you know you'll be working with big sample sizes and big datasets, use `iter_connections`
  * if you are unsure, you can still use it, it's not _that_ much slower 
* If your code hangs seemingly forever on
  * a call to `pathway_edges`/`pair_edges`/`afferent_edges`/`efferent_edges`, you might want to check whether `iter_connections` solves your issue
  * a `pandas.DataFrame` operation, see if you can achieve the same with `iter_connections`. It might just save your day

## Conclusion
In this notebook, we learned all the different queries related to edges: how to find connecting nodes/edges, how to query for the properties we're interested in, etc. We also learned about the differences between querying nodes and querying edges and the reasons behind the differences. 

On top of that, we learned how and why to use the iterative approach (`iter_connections`) when working with bigger datasets, to avoid having our code hang or our compute nodes run out of memory.