# Network Connectivity

This notebook imports and analyzes an internal email communication network between employees of a mid-sized manufacturing company.
Each node represents an employee, and each directed edge between two nodes represents an individual email. The left node represents the sender and the right node represents the recipient.

[NetworkX basics tutorial](http://pynetwork.readthedocs.io/en/latest/networkx_basics.html)

In [None]:
import networkx as nx

In [None]:
!find ../.. |grep -i email_network

#### Load the network as a directed multigraph

In [None]:
G = nx.read_edgelist('../../_data/email_network.txt', data=[('time', int)], create_using=nx.MultiDiGraph())
assert G.is_directed()
assert G.is_multigraph()
print(G)  # nx.info() was removed in NetworkX 3.0; print(G) gives a short summary in recent versions

#### Nodes and edges

In [None]:
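frequency = G.degree()            # per-node degree (in + out, counting parallel edges)
employees = G.number_of_nodes()   # one node per employee
emails = G.number_of_edges()      # one edge per email
employees, emails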

#### Connectivity

In the mathematical theory of directed graphs, a graph is said to be __strongly connected__ if every node is reachable from every other node.
 
* Part 1. __Strongly connected__
Assume that information in this company can only be exchanged through email. When an employee sends an email to another employee, a communication channel has been created, allowing the sender to provide information to the receiver, but not vice versa. Based on the emails sent in the data, is it possible for information to go from every employee to every other employee?
    
* Part 2. __Weakly connected__
Assume that a communication channel established by an email allows information to be exchanged both ways. Based on the emails sent in the data, is it possible for information to go from every employee to every other employee?

*The cell below returns a tuple of bools (part1, part2).*


In a directed graph, connectivity can be made symmetric in one of two ways:

Define `u` to be strongly connected to `v` if `u →* v` and `v →* u`.
That is, `u` and `v` are strongly connected if you can go from `u` to `v` and back again (not necessarily through the same nodes).
Strong connectivity is an equivalence relation, and its equivalence classes are called strongly connected components. `G` is strongly connected if it has exactly one strongly connected component, i.e. if every node is reachable from every other node.

Define `u` to be weakly connected to `v` if `u →* v` in the undirected graph obtained by ignoring edge orientation.

Intuitively, `u` is weakly connected to `v` if there is a path from `u` to `v` when you are allowed to traverse edges backwards. Weakly connected components are the equivalence classes of this relation, and a graph is weakly connected if it has exactly one such component.

__Weak connectivity is a "weaker" property than strong connectivity in the sense that if `u` is strongly connected to `v`, then `u` is also weakly connected to `v`; the converse does not necessarily hold.__
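The two definitions can be checked by hand on a small graph. The following is a minimal, library-free sketch; the adjacency dict `toy` and the helper names are made up for illustration and are not part of the email data or of NetworkX.

```python
from collections import deque

def all_nodes(adj):
    """Collect every node that appears as a source or a target."""
    return set(adj) | {v for targets in adj.values() for v in targets}

def reachable(adj, start):
    """Return the set of nodes reachable from `start` by following edges forward (BFS)."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in adj.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen

def is_strongly_connected(adj):
    """Every node must reach every other node along directed edges."""
    nodes = all_nodes(adj)
    return all(reachable(adj, n) == nodes for n in nodes)

def is_weakly_connected(adj):
    """Ignore edge orientation, then check that one node reaches all others."""
    nodes = all_nodes(adj)
    undirected = {n: set() for n in nodes}
    for u, targets in adj.items():
        for v in targets:
            undirected[u].add(v)
            undirected[v].add(u)
    start = next(iter(nodes))  # assumes a non-empty graph
    return reachable(undirected, start) == nodes

# Hypothetical toy graph: a cycle a -> b -> c -> a, plus an edge c -> d.
# Nothing leaves d, so d cannot reach the cycle.
toy = {"a": ["b"], "b": ["c"], "c": ["a", "d"], "d": []}

strong = is_strongly_connected(toy)
weak = is_weakly_connected(toy)
print(strong, weak)
```

Here `d` receives information but can never send any back, which is the situation Part 1 asks about; once direction is ignored, as in Part 2, all four nodes fall into a single component.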

In [None]:
# 1. Strongly connected?
strong = nx.is_strongly_connected(G)

# 2. Weakly connected?
weak = nx.is_weakly_connected(G)
strong, weak

#### Number of nodes in the largest (in terms of nodes) weakly connected component

In [None]:
no_weak_components = nx.number_weakly_connected_components(G)
# Components are not yielded in size order, so select the largest explicitly
no_weak_nodes = len(max(nx.weakly_connected_components(G), key=len))
no_weak_nodes
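The component generators yield node sets in no guaranteed size order, so index 0 is not a reliable way to reach the largest one. A small sketch with plain Python sets shows the difference; the `components` list is invented for illustration.

```python
# Hypothetical list of components, in the arbitrary order a generator might yield them
components = [{"x"}, {"a", "b", "c"}, {"p", "q"}]

first = components[0]               # whatever happens to come first
largest = max(components, key=len)  # the component with the most nodes

print(len(first), len(largest))
```

`max(..., key=len)` makes a single pass over the components, whereas sorting them all by size does more work than needed when only the largest is wanted.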

#### Number of nodes in the largest (in terms of nodes) strongly connected component

In [None]:
#     no_strong_components = nx.number_strongly_connected_components(G)

# Size of the largest strongly connected component
max_nodes_per_component = max(len(c) for c in nx.strongly_connected_components(G))

max_nodes_per_component

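#### Subgraph of nodes in the largest strongly connected component

In [None]:
# strongly_connected_component_subgraphs() was removed in NetworkX 2.4;
# build the subgraph from the largest component's node set instead.
largest_scc = max(nx.strongly_connected_components(G), key=len)
G_sc = G.subgraph(largest_scc).copy()
print(G_sc)  # nx.info() was removed in NetworkX 3.0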

#### Average distance between nodes in G_sc

In [None]:
nx.average_shortest_path_length(G_sc)

#### Largest possible distance between two employees in G_sc

In [None]:
nx.diameter(G_sc)

#### Set of nodes in G_sc with eccentricity equal to the diameter

In [None]:
set(nx.periphery(G_sc))

#### Set of node(s) in G_sc with eccentricity equal to the radius

In [None]:
set(nx.center(G_sc))
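The distance measures used above all derive from shortest-path lengths. As a rough, library-free sketch (the toy graph and helper name are hypothetical, and distances follow outgoing edges from each source), they can be computed with a breadth-first search:

```python
from collections import deque

def distances_from(adj, source):
    """Shortest-path length (in edges) from `source` to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nxt in adj[node]:
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return dist

# Hypothetical strongly connected toy graph: cycle a -> b -> c -> d -> a plus a shortcut a -> c
toy = {"a": ["b", "c"], "b": ["c"], "c": ["d"], "d": ["a"]}

# Eccentricity of a node: its largest distance to any other node
ecc = {n: max(distances_from(toy, n).values()) for n in toy}

diameter = max(ecc.values())                           # largest eccentricity
radius = min(ecc.values())                             # smallest eccentricity
periphery = {n for n, e in ecc.items() if e == diameter}
center = {n for n, e in ecc.items() if e == radius}

# Average distance over all ordered pairs of distinct nodes
n = len(toy)
total = sum(sum(distances_from(toy, s).values()) for s in toy)
avg_distance = total / (n * (n - 1))

print(ecc, diameter, radius, periphery, center, avg_distance)
```

These quantities are only defined when every node can reach every other node, which is why the notebook computes them on the strongly connected subgraph G_sc rather than on G.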

#### Most connected node by shortest path equal to graph diameter
Which node in G_sc is connected to the most other nodes by a shortest path of length equal to the diameter of G_sc?
How many nodes are connected to this node in that way?
*The cell below returns a tuple (name of node, number of nodes at that distance).*

In [None]:
from collections import Counter

peri_nodes = nx.periphery(G_sc)
diameter = nx.diameter(G_sc)

# For each periphery node, count the nodes whose shortest-path distance equals
# the diameter, then pick the node with the highest count (not the largest name).
node_11 = max(
    ((node, Counter(nx.shortest_path_length(G_sc, node).values())[diameter])
     for node in peri_nodes),
    key=lambda pair: pair[1])
node_11
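One detail matters when taking `max` over `(node, count)` tuples: tuples compare element by element, so without a `key` the node name decides the winner before the count is ever looked at. A small sketch with invented distance tables (not taken from the email data) illustrates this:

```python
from collections import Counter

diameter = 3  # assumed diameter for this toy example

# Hypothetical shortest-path lengths from two periphery nodes
dist_by_node = {
    "z": {"z": 0, "p": 1, "q": 3},
    "a": {"a": 0, "p": 3, "q": 3, "r": 3},
}

# (node, number of nodes at distance == diameter)
pairs = [(node, Counter(d.values())[diameter]) for node, d in dist_by_node.items()]

by_tuple = max(pairs)                           # compares node names first
by_count = max(pairs, key=lambda pair: pair[1])  # compares the counts

print(by_tuple, by_count)
```

Only the second form answers the question of which node has the most nodes at diameter distance.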

#### Node cut
Suppose you want to prevent communication from flowing from any node in the center of G_sc to the node found in the previous question. What is the smallest number of nodes you would need to remove from the graph? (You are not allowed to remove the node from the previous question or the center nodes.)

In [None]:
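center_nodes = nx.center(G_sc)

# Minimum set of nodes whose removal disconnects the target node from the center.
# Only the first center node is used as the source; if the center contains
# several nodes, each one would need to be checked.
no_cut_nodes = len(nx.minimum_node_cut(G_sc, center_nodes[0], node_11[0]))
no_cut_nodes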

#### Construct an undirected graph G_un using G_sc (you can ignore the attributes).

In [None]:
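assert G_sc.is_multigraph() and G_sc.is_directed()

# nx.Graph() is already undirected and simple: parallel and opposite-direction
# edges collapse into a single edge.
G_un = nx.Graph(G_sc)

# Use `not`/`and`: `~` on a bool is bitwise (~True == -2), so it is always truthy.
assert not G_un.is_multigraph() and not G_un.is_directed()
print(G_un)  # nx.info() was removed in NetworkX 3.0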
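Converting a directed multigraph to a simple undirected graph discards both edge direction and edge multiplicity. As a minimal sketch of that idea, using a made-up edge list rather than the email data, each directed edge can be reduced to an unordered pair:

```python
# Hypothetical directed multi-edges: a repeated edge a -> b, a reply b -> a, and b -> c
directed_edges = [("a", "b"), ("a", "b"), ("b", "a"), ("b", "c")]

# An unordered pair ignores direction; the set removes duplicates
undirected_edges = {frozenset(edge) for edge in directed_edges}

print(len(directed_edges), len(undirected_edges))
```

Four emails collapse into two undirected links here, so the edge count of G_un counts pairs of employees who exchanged at least one email, not individual emails.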

#### Transitivity and average clustering coefficient of graph G_un

In [None]:
transitivity = nx.transitivity(G_un)
avg_clustering = nx.average_clustering(G_un)

transitivity, avg_clustering
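The two measures weight nodes differently: transitivity is a single ratio over the whole graph (three times the number of triangles divided by the number of connected triples), while average clustering is the mean of per-node clustering coefficients. The following library-free sketch computes both on a hypothetical four-node graph; the helper and variable names are illustrative only.

```python
from itertools import combinations

# Hypothetical undirected graph: triangle a-b-c with a pendant node d attached to c
adj = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"c"},
}

def local_clustering(adj, node):
    """Fraction of pairs of neighbors of `node` that are themselves connected."""
    nbrs = adj[node]
    k = len(nbrs)
    if k < 2:
        return 0.0  # nodes with fewer than two neighbors count as zero
    links = sum(1 for u, v in combinations(nbrs, 2) if v in adj[u])
    return links / (k * (k - 1) / 2)

# Average clustering: mean of the local coefficients over all nodes
avg_clustering = sum(local_clustering(adj, n) for n in adj) / len(adj)

# Closed triples: each triangle is counted once per corner, i.e. three times
closed = sum(1 for n in adj for u, v in combinations(adj[n], 2) if v in adj[u])
# Connected triples: pairs of neighbors around every node
triples = sum(len(adj[n]) * (len(adj[n]) - 1) // 2 for n in adj)
transitivity = closed / triples

print(transitivity, avg_clustering)
```

In this toy graph the high-degree node `c` lowers the transitivity ratio, while the pendant node `d` pulls the average clustering down by contributing a zero.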