### CS4423 - Networks
Angela Carnevale  
School of Mathematical and Statistical Sciences  
University of Galway

#### 6. Directed Networks

# Week 11, lecture 1:  More on the Bow-Tie components. Importance of nodes in directed networks


### The Bow-Tie Structure of the WWW

We have seen two types of connected components in a directed network:
* weakly connected components (WCC): essentially the connected components of the underlying undirected graph. They can be obtained as the equivalence classes of a suitable equivalence relation.
* strongly connected components (SCC): the equivalence classes of a different equivalence relation, defined in terms of directed paths.

It turns out that a directed graph with sufficiently many edges
typically has a **giant SCC**.

The remainder of the graph consists of four more sets of components of nodes, as follows:

1. IN: upstream components, the set of all components
$C$ with $C <$ SCC.

2. OUT: downstream components,
the set of all components $C$ with $C >$ SCC.

3. tendrils: the set of all components $C$ with either $C >$ IN and $C \not<$ OUT
or $C <$ OUT and $C \not>$ IN; <BR />
and tubes: components $C$ with $C >$ IN, $C <$ OUT but $C \not <$ SCC.

4. disconnected components.

Thus, in any directed graph with a distinguished SCC,
the WCC in which it is contained
necessarily has the following global bow-tie structure:

![bow tie](images/bowtie.png)

We've seen that the above structure has been detected in the WWW. 

Variations of BFS and DFS can be used to algorithmically compute the components of the Bow-Tie structure.


## Computing Bow-Tie Components

In [10]:
import networkx as nx
import matplotlib.pyplot as plt
opts = { "with_labels" : True, "node_color" : 'y' }

**Example.**  Let's start with a reasonably large random **directed graph**,
using the Erdős-Rényi $G(n, m)$ model:

In [11]:
n, m = 100, 120
G = nx.gnm_random_graph(n, m, directed=True) ## note we are using the directed version of the usual models

### Weakly Connected Components

The weakly connected components of a directed graph $G$ can be determined by BFS, as before,
counting as "neighbors" of a node $x$ **both** its _successors_ and its _predecessors_ in the graph.

A single component, the weakly connected component of node $x$, is found as follows.

In [3]:
def weak_component(G, x):
    nodes = {x}
    queue = [x]
    for y in queue:
        G.nodes[y]["seen"] = True
        for z in set(G.successors(y)) | set(G.predecessors(y)): ## preds+succs are the neighbours of a node
            if z not in nodes:
                nodes.add(z)
                queue.append(z)
    return nodes

The list of all weakly connected components is computed by looping over all the  nodes of `G`,
computing the components of "unseen" nodes and collecting them in a list.
The final result is sorted by decreasing length before it is returned.

In [12]:
def weak_components(G):
    
    # initialize
    wccs = []
    
    # find each node's wcc
    for x in G:
        if not G.nodes[x].get("seen"):
            wccs.append(weak_component(G, x))
            
    # clean up afterwards
    for x in G:
        del G.nodes[x]["seen"]
        
    # return sorted list of wccs
    return sorted(wccs, key=len, reverse=True)

In [13]:
wccs = weak_components(G)
len(wccs)

9

In [14]:
[len(c) for c in wccs]

[88, 3, 2, 2, 1, 1, 1, 1, 1]
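As a sanity check, the hand-written BFS can be compared with NetworkX's built-in `nx.weakly_connected_components`. The sketch below is self-contained: it builds its own small graph `H` (hypothetical example data, not the random graph `G` above) and uses a compact variant of the BFS, here called `weak_component_of`, that keeps track of visited nodes in a local set instead of node attributes.

```python
import networkx as nx

# Small hand-made directed graph (hypothetical example data).
H = nx.DiGraph([(0, 1), (2, 1), (2, 3), (4, 5)])
H.add_node(6)  # an isolated node forms its own component

def weak_component_of(G, x):
    """BFS from x, treating successors and predecessors alike."""
    nodes, queue = {x}, [x]
    for y in queue:
        for z in set(G.successors(y)) | set(G.predecessors(y)):
            if z not in nodes:
                nodes.add(z)
                queue.append(z)
    return nodes

# Collect the component of every node not yet covered.
seen, ours = set(), []
for x in H:
    if x not in seen:
        comp = weak_component_of(H, x)
        seen |= comp
        ours.append(comp)

builtin = list(nx.weakly_connected_components(H))
ours_sizes = sorted(map(len, ours), reverse=True)
builtin_sizes = sorted(map(len, builtin), reverse=True)
print(ours_sizes, builtin_sizes)
```

Both approaches should report the same components; only the order in which they are listed may differ.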

### Strongly Connected Components

**Strongly** connected components are efficiently found by DFS.
[Tarjan's Algorithm](https://en.wikipedia.org/wiki/Tarjan%27s_strongly_connected_components_algorithm) cleverly
uses recursion and an additional stack for this. 

The following function finds strongly connected components in a recursive fashion. 

In [15]:
def connect(G, stack, sccs, idx, x):
    G.nodes[x]["low"] = G.nodes[x]["idx"] = idx
    idx += 1
    stack.append(x)
    G.nodes[x]["stacked"] = True
    for y in G[x]:
        if "idx" not in  G.nodes[y]: ## if neighbour of x not yet seen, recursively call connect on it
            idx = connect(G, stack, sccs, idx, y)
            G.nodes[x]["low"] = min(G.nodes[x]["low"], G.nodes[y]["low"])
        elif G.nodes[y]["stacked"]: ## or if neighbour was seen but still on stack
            G.nodes[x]["low"] = min(G.nodes[x]["low"], G.nodes[y]["idx"])
                
    if G.nodes[x]["low"] == G.nodes[x]["idx"]: 
        scc = []
        while True:
            y = stack.pop()
            G.nodes[y]["stacked"] = False
            scc.append(y)
            if y == x:
                break
        sccs.append(scc)
            
    return idx


Similar to the case of the weakly connected components, the overall algorithm
uses a loop over all the nodes of `G` to find all strongly connected components.

In [16]:
def strong_components(G):
    
    # initialize
    idx = 0
    stack = []
    sccs = []
    
    # find each node's scc
    for x in G:
        if "idx" not in G.nodes[x]:
            idx = connect(G, stack, sccs, idx, x)

    # clean up afterwards
    for x in G:
        del G.nodes[x]["idx"]
        del G.nodes[x]["low"]
        del G.nodes[x]["stacked"]
    
    # return sorted list of sccs
    return sorted(sccs, key = len, reverse=True)

In [17]:
sccs = strong_components(G)
[len(c) for c in sccs[:10]] ## this list is eventually 1 so looking at a few entries suffices

[3, 1, 1, 1, 1, 1, 1, 1, 1, 1]

As the resulting list of components is sorted by length, in descending order,
`sccs[0]` is the **Giant SCC**.
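With only $m = 120$ edges on $n = 100$ nodes, the largest SCC above has just 3 nodes, so "giant" is a generous name for it. The following sketch suggests how the size of the largest SCC grows with the number of edges. It uses NetworkX's built-in `nx.strongly_connected_components` rather than the functions above, and the values of `n`, `ms` and `seed` are arbitrary choices made for illustration.

```python
import networkx as nx

n = 200
ms = [50, 200, 400, 800]   # increasing edge counts (arbitrary choice)
sizes = []
for m in ms:
    # directed G(n, m) graph; the seed only makes the run repeatable
    R = nx.gnm_random_graph(n, m, seed=1, directed=True)
    largest = max(nx.strongly_connected_components(R), key=len)
    sizes.append(len(largest))

for m, s in zip(ms, sizes):
    print(f"m = {m:4d}: largest SCC has {s} of {n} nodes")
```

One would expect the largest SCC to stay tiny while the mean out-degree $m/n$ is below 1, and to cover most of the nodes once it is well above 1. The exact numbers depend on the seed.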

### The OUT Components

The components reachable from the Giant can be found by BFS applied to any
vertex in the Giant. So here is a node representing the Giant:

In [18]:
rep = min(sccs[0])
rep

18

This variant of BFS considers (only) the `successors` of a node $x$ as its neighbors.

In [19]:
def outreach(G, x):
    """find all nodes that can be reached 
    from node x in the directed graph G"""
    nodes = {x}
    queue = [x]
    for y in queue:
        for z in G.successors(y):
            if z not in nodes:
                nodes.add(z)
                queue.append(z)
    return nodes

The resulting set of nodes consists of the Giant and the OUT components.

In [20]:
out = outreach(G, rep)
len(out)

9

### The IN Components

The components from which the Giant can be reached are found by BFS applied to any
vertex in the Giant, following the arrows in reverse.

This variant of BFS considers (only) the `predecessors` of a node $x$ as its neighbors.

In [21]:
def innreach(G, x):
    """find all nodes that can reach 
    node x in the directed graph G"""
    nodes = {x}
    queue = [x]
    for y in queue:
        for z in G.predecessors(y):
            if z not in nodes:
                nodes.add(z)
                queue.append(z)
    return nodes

The resulting set of nodes consists of the Giant and the IN components.

In [22]:
inn = innreach(G, rep)
len(inn)

24
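The two BFS variants are mirror images of each other: collecting the nodes that can reach `x` in a graph is the same as collecting the nodes reachable from `x` in the reversed graph. The sketch below checks this on a small hypothetical graph `H`, using a single helper `reach` (a name introduced here) that takes the neighbor function as a parameter. It also compares the results with NetworkX's `nx.descendants` and `nx.ancestors`, which return the same sets without `x` itself.

```python
import networkx as nx

def reach(G, x, neighbors):
    """BFS from x; `neighbors` is G.successors or G.predecessors."""
    nodes, queue = {x}, [x]
    for y in queue:
        for z in neighbors(y):
            if z not in nodes:
                nodes.add(z)
                queue.append(z)
    return nodes

# Hypothetical example: a -> b -> c -> b (so {b, c} is an SCC), and c -> d
H = nx.DiGraph([("a", "b"), ("b", "c"), ("c", "b"), ("c", "d")])

out_b = reach(H, "b", H.successors)        # nodes reachable from b
inn_b = reach(H, "b", H.predecessors)      # nodes that can reach b

R = H.reverse()                            # same nodes, all arrows flipped
inn_via_reverse = reach(R, "b", R.successors)

print(sorted(out_b), sorted(inn_b))
print(inn_b == inn_via_reverse)
print(out_b == nx.descendants(H, "b") | {"b"})
print(inn_b == nx.ancestors(H, "b") | {"b"})
```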

The Giant is the intersection of `inn` and `out`. 

In [23]:
giant = inn & out
len(giant)

3

Let's call the union of `inn` and `out` the **core** of the graph `G`.

In [24]:
core = inn | out
len(core)

30

And let's remove the Giant from `inn` and `out`:

In [25]:
inn1 = inn - out
out1 = out - inn
len(giant), len(inn1), len(out1)

(3, 21, 6)
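The remaining parts of the bow-tie (tendrils, tubes and disconnected components) can be found with the same BFS ideas, started from all nodes of IN or OUT at once. The following is a minimal sketch under stated assumptions, not a definitive implementation: the toy graph `H`, the multi-source helper `reach` and the names `from_in`, `to_out` and `other` are all introduced here for illustration. A node outside the core counts as part of a tube if it is both reachable from IN and able to reach OUT, and as part of a tendril if exactly one of the two holds.

```python
import networkx as nx

def reach(G, sources, neighbors):
    """Multi-source BFS; `neighbors` is G.successors or G.predecessors."""
    nodes, queue = set(sources), list(sources)
    for y in queue:
        for z in neighbors(y):
            if z not in nodes:
                nodes.add(z)
                queue.append(z)
    return nodes

# Hypothetical toy graph containing every kind of bow-tie part.
H = nx.DiGraph([
    (1, 2), (2, 3), (3, 1),   # giant SCC {1, 2, 3}
    (10, 1),                  # IN node 10
    (3, 20),                  # OUT node 20
    (10, 30),                 # tendril leaving IN
    (31, 20),                 # tendril entering OUT
    (10, 40), (40, 20),       # tube from IN to OUT, bypassing the SCC
    (50, 51),                 # a separate weakly connected component
])

rep = 1                                          # any node of the giant SCC
out = reach(H, [rep], H.successors)
inn = reach(H, [rep], H.predecessors)
giant, core = inn & out, inn | out
inn1, out1 = inn - out, out - inn

from_in = reach(H, inn1, H.successors) - core    # fed by IN, outside the core
to_out = reach(H, out1, H.predecessors) - core   # feeding OUT, outside the core
tubes = from_in & to_out
tendrils = (from_in | to_out) - tubes

# weakly connected component of the giant: follow arrows in both directions
both = lambda y: set(H.successors(y)) | set(H.predecessors(y))
wcc = reach(H, [rep], both)
disconnected = set(H) - wcc
other = wcc - core - tendrils - tubes            # in the WCC, matching neither definition

print(giant, inn1, out1, tendrils, tubes, disconnected, other)
```

In this toy graph `other` is empty. In general a node can belong to the same WCC as the Giant while being attached only to a tendril, which is why the sketch keeps a separate bucket for such nodes.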

## Code Corner

### `python`

* `setX & setY`, `setX | setY`, `setX - setY`: [[doc]](https://docs.python.org/3/library/stdtypes.html#set) set operations, intersection, union, difference


* `x in setY`, `x not in setY`: set membership


* `setX.issubset(setY)`, `setX <= setY`, `setX < setY`: subset relationship

* `sorted`: [[doc]](https://docs.python.org/3/library/functions.html#sorted)
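The operations listed above can be tried out on two small sets. The values are made up for illustration and merely mimic the roles of `inn` and `out` in the lecture.

```python
inn = {1, 2, 3, 10}
out = {1, 2, 3, 20}

giant = inn & out        # intersection: elements in both sets
core = inn | out         # union: elements in at least one set
inn1 = inn - out         # difference: elements of inn that are not in out

print(giant, core, inn1)
print(10 in inn1, 20 not in inn1)                       # membership tests
print(giant <= inn, giant < inn, giant.issubset(out))   # subset tests

# sorted accepts a key function and a reverse flag
by_size = sorted([inn1, giant, core], key=len, reverse=True)
print([len(s) for s in by_size])
```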