# 3 
# The Internet Network
## Introduction
- Network science has developed quickly in recent years because of the Internet.
- Internet: technically only the physical layer of PCs, computers, and servers connected by cables.
- World Wide Web, Wikipedia, Facebook, Twitter...
- [GGG (Giant Global Graph)](https://en.wikipedia.org/wiki/Giant_Global_Graph): a term used by Tim Berners-Lee for the next revolution, in which all the information produced and stored in various services will be <u>aggregated, categorised, and distributed in various formats according to the user’s need</u>.

## Data from CAIDA 
- *The Internet* is <u>the set of computers worldwide</u> connected by cables, servers, etc.; the term indicates a <u>physical framework</u>.
- “Looking for something on the Internet” means <u>browsing a web site</u> stored on one of those computers.
- The existence of this network is due first to <u>military research</u> (how to build a communication network able to keep working after the bombing and destruction of some of its parts), while <u>scientific needs</u> (how to share resources and information efficiently) appeared only later.
- Protocol suite (TCP/IP): fixed in 1982, able to operate independently of the available hardware.
- Because of the redundancy of its connections, the network is highly resistant to the deleterious effects of damage.
- So, no complete map of the Internet is available.
- <span style="font-family:Courier; font-size:1em;">Traceroute</span> : traces the path data takes from one computer to another.
- The “hops” reported by traceroute count the intermediate devices (like routers) through which data must pass between source and destination, rather than flowing directly over a single cable. Each device along the data path constitutes a hop, or in other words is a vertex in the graph. Therefore a hop count gives the distance between two nodes in the Internet network.
- Various projects have set up repositories of traceroute data; the most comprehensive repository is hosted by CAIDA (Center for Applied Internet Data Analysis).
- The best source of data for the Internet is the Center for Applied Internet Data Analysis (CAIDA), based at the University of California's San Diego Supercomputer Center.
- CAIDA is "a collaboration of different organisations in the commercial, government, and research sectors investigating practical and theoretical aspects of the Internet in order to:
    - provide macroscopic insights into Internet infrastructure, behavior, usage, and evolution,
    - foster a collaborative environment in which data can be acquired, analyzed, and (as appropriate) shared,
    - improve the integrity of the field of Internet science,
    - inform science, technology, and communications public policies."
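
The relation between traceroute paths, graph vertices, and hop counts described above can be sketched in a few lines. This is only an illustration: the device names and the two traces are made up, and the hop count is taken here as the number of links crossed on the shortest route.

```python
from collections import deque

# Hypothetical traceroute results: each list is the ordered sequence of
# devices crossed from a source to a destination (names are invented).
traces = [
    ["src", "r1", "r2", "r3", "dst"],
    ["src", "r1", "r4", "dst"],
]

def build_graph(paths):
    """Turn consecutive devices of every path into undirected edges."""
    adj = {}
    for path in paths:
        for a, b in zip(path, path[1:]):
            adj.setdefault(a, set()).add(b)
            adj.setdefault(b, set()).add(a)
    return adj

def hop_count(adj, source, target):
    """Breadth-first search; returns the hop distance or None if unreachable."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            return dist[node]
        for nb in adj[node]:
            if nb not in dist:
                dist[nb] = dist[node] + 1
                queue.append(nb)
    return None

adj = build_graph(traces)
print(hop_count(adj, "src", "dst"))  # shortest route is src-r1-r4-dst: 3 hops
```

Merging many such traces is how a partial map of the network is assembled; vertices that never appear in any trace simply stay invisible.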

<img src="http://cheswick.com/ches/map/gallery/aug98-ip.gif">
<center>A snapshot of the global Internet map, realised by the Internet Mapping Project (http://cheswick.com/ches/map/) in August 1998</center>
    
## Visualisation
- How to visually represent a graph
- The best visualisation is the one that reduces the number of crossing edges.
- As in almost all applications of complex networks, the graphs are rather sparse; it is therefore <u>not</u> particularly <u>efficient to keep the whole adjacency matrix in memory</u>, and a <u>better choice is to work with the list of edges</u>.
- Tools
    - [Pajek software](http://mrvar.fdv.uni-lj.si/pajek/)
    - [Gephi](https://gephi.org)
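
To make the point about sparsity concrete, the following sketch compares how many entries the two representations need. The graph (a ring of 1000 vertices) is a made-up example chosen only because its edge count is easy to check.

```python
# Hypothetical sparse graph: a ring of n vertices, hence exactly n edges.
n = 1000
edges = [(i, (i + 1) % n) for i in range(n)]

# Dense representation: one matrix entry per ordered pair of vertices.
adjacency_entries = n * n

# Sparse representation: two vertex labels per edge.
edge_list_entries = 2 * len(edges)

print(adjacency_entries, edge_list_entries)

# The edge list still supports neighbour look-ups once it is indexed.
neighbours = {}
for u, v in edges:
    neighbours.setdefault(u, []).append(v)
    neighbours.setdefault(v, []).append(u)
```

For this ring the matrix needs a million entries while the edge list needs two thousand, and the gap widens as the number of vertices grows at fixed average degree.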

In [None]:
%pylab inline

In [None]:
# Network from SVG with the best node positioning
import networkx as nx
from bs4 import BeautifulSoup

def Graph_from_SVG(stream):
    G=nx.Graph()
    attrs = {
        "line" :  ["x1","y1","x2","y2"]
    }
    
    # read the SVG source; the context manager closes the file afterwards
    with open(stream, "r") as op:
        xml = op.read()
    
    soup = BeautifulSoup(xml, 'lxml')
    
    count=0
    node_diz={}
    pos={}
    for attr in attrs.keys():
        tmps = soup.findAll(attr)
        for t in tmps:
            node1=(t['x1'],t['y1'])
            node2=(t['x2'],t['y2'])            
            if not node1 in node_diz:
                node_diz[node1]=str(count)
                pos[str(count)]=(float(node1[0]),float(node1[1]))
                count+=1
            if not node2 in node_diz:                
                node_diz[node2]=str(count)
                pos[str(count)]=(float(node2[0]),float(node2[1]))
                count+=1
            G.add_edge(node_diz[node1],node_diz[node2])
            
    #save the graph in an edge list format
    nx.write_edgelist(G, "./data/test_graph.dat",data=False)
    
    return G,pos

In [None]:
# Visualisation tools
# getting the network in the SVG format 
file="./data/test_graph.svg"
(G,pos)=Graph_from_SVG(file)

#plot the optimal node distribution
nx.draw(G, pos, node_size = 150, node_color='black')
# nx.draw_networkx(G, pos, node_size = 100, node_color='black')
#save the graph on a figure file
savefig("./data/test_network_best.png", dpi=200)

In [None]:
#plotting the basic network
G=nx.read_edgelist("./data/test_graph.dat")
# graphviz_pos=nx.graphviz_layout(G)
graphviz_pos=nx.nx_pydot.graphviz_layout(G)
nx.draw(G, graphviz_pos, node_size = 150, node_color='black')
# nx.draw_networkx(G, pos, node_size = 100, node_color='black')
#save the graph on a figure file
savefig("./data/test_network_graphviz.png", dpi=200)

## Importance or centrality
- The centrality of a vertex or edge is generally perceived as <u>a measure of the importance of this element</u> within the whole network.
<img src="./Fig.3.3.png">
<center><font size=-1>[Examples of A) degree centrality,B) closeness centrality,C) betweenness centrality, D) eigenvector centrality of the same graph.]</font></center>
- Reference: http://bab2min.tistory.com/554

### Degree centrality
- One “local” measure of centrality is to look for <u>the vertices with the largest degrees</u>.
- Being very well connected, they are probably visited often by anyone travelling on the graph.
- This quantity, called “**degree centrality**”, is local since it can be computed by checking only the vertex itself and, in most cases, it represents <u>a fast and reasonably accurate quantity</u> to describe the importance of vertices in a graph.
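
As a small illustration of why the measure is local, the degrees can be counted in a single pass over an edge list. The five-vertex graph below is invented for the example, and dividing by $n-1$ is one common normalisation convention rather than something prescribed by the text.

```python
from collections import Counter

# Hypothetical toy graph given as an edge list.
edges = [("A", "B"), ("A", "C"), ("A", "D"), ("B", "C"), ("D", "E")]

# Each edge contributes one unit to the degree of both endpoints.
deg = Counter()
for u, v in edges:
    deg[u] += 1
    deg[v] += 1

# Optional normalisation by the largest possible degree, n - 1.
n = len(deg)
deg_norm = {v: k / (n - 1) for v, k in deg.items()}

# Vertices ordered from most to least connected.
ranking = sorted(deg, key=deg.get, reverse=True)
print(ranking[0], deg[ranking[0]], deg_norm[ranking[0]])
```

No shortest paths or global information are needed, which is what makes the degree so cheap to compute compared with the measures that follow.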

In [None]:
# Degree sequence
# dict() gives a plain {node: degree} mapping whatever nx.degree returns
degree_centrality = dict(nx.degree(G))
print(degree_centrality)

In [None]:
# Original

l=[]
# copy into a plain dict so that missing nodes can be given a default value
res = dict(degree_centrality)

for n in G.nodes():
#     if not res.has_key(n):
    if not n in res:        
        res[n] = 0.0
    l.append(res[n])
    
nx.draw_networkx_edges(G, pos)
for n in G.nodes():
    list_nodes=[n]
    color = str((res[n]-min(l))/float((max(l)-min(l))))
    nx.draw_networkx_nodes(G, {n: pos[n]}, [n], 
                           node_size=100, 
                           node_color=color, cmap=plt.cm.gray)
#     print(color)
    
savefig("./data/degree_200.png", dpi=200)

In [None]:
# Modified by etc.
l=[]
colors=[]
# copy into a plain dict so that missing nodes can be given a default value
res = dict(degree_centrality)

for n in G.nodes():
    if not n in res:        
        res[n] = 0.0
    l.append(res[n])

# grey level in [0,1] for each node, passed to matplotlib as a string
lo, hi = min(l), max(l)
for n in G.nodes():
    colors.append(str((res[n]-lo)/float(hi-lo)))

nx.draw_networkx_edges(G, pos)
nx.draw_networkx_nodes(G, pos, G.nodes(), 
                       node_size=100, 
                       node_color=colors, cmap=plt.cm.Greys_r).set_edgecolor('k')
# nx.draw_networkx_labels(G, pos)

    
savefig("./data/degree_200.png", dpi=200)

### Closeness centrality
- A non-local definition of centrality is <b>based on the notion of distance</b>.
- The quantity is non-local because the distances from a vertex to all the others are needed, so the whole graph must be inspected to compute it.
- **The lower the distance from the other vertices the larger is the closeness**.

For vertex $i$, the closeness $c_i$ is defined as
$$c_i=\frac{1}{\sum_{j\neq i}d_{ij}}$$
For networks that are not strongly connected, a viable alternative is harmonic centrality:
$$c^h_i=\sum_{j\neq i} \frac{1}{d_{ij}}=\sum_{d_{ij} < \infty, j\neq i} \frac{1}{d_{ij}}$$

- To compute these centrality measures we need a function that computes all the distances from a root node. Use the [BFS algorithm](https://en.wikipedia.org/wiki/Breadth-first_search).
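
The two formulas can be checked by hand on a tiny graph. The sketch below is library-free and uses an invented graph (a path A-B-C-D plus an isolated vertex X); treating an infinite distance as giving closeness 0 is an assumption made here to follow the formula literally.

```python
from collections import deque

def bfs_distances(adj, root):
    """Hop distance from root to every vertex reachable from it."""
    dist = {root: 0}
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for nb in adj[node]:
            if nb not in dist:
                dist[nb] = dist[node] + 1
                queue.append(nb)
    return dist

def closeness(adj, i):
    """c_i = 1 / sum_j d_ij; an unreachable vertex makes the sum infinite."""
    dist = bfs_distances(adj, i)
    if len(dist) < len(adj):
        return 0.0
    total = sum(dist.values())
    return 1.0 / total if total > 0 else 0.0

def harmonic(adj, i):
    """c^h_i = sum of 1/d_ij over the vertices reachable from i."""
    dist = bfs_distances(adj, i)
    return sum(1.0 / d for node, d in dist.items() if node != i)

path = {"A": ["B"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C"]}
with_isolated = dict(path, X=[])

print(closeness(path, "B"))           # distances 1, 1, 2 -> 1/4
print(closeness(with_isolated, "B"))  # X is unreachable -> 0.0
print(harmonic(with_isolated, "B"))   # 1 + 1 + 1/2 = 2.5
```

The isolated vertex wipes out the closeness of every other vertex, while the harmonic variant simply ignores it, which is why it is the safer choice on disconnected graphs.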

In [None]:
# Distance function
from collections import deque

def node_distance(G, root_node):
    # breadth-first search from root_node; distances are kept in a local
    # dict so that no attribute has to be written on (or deleted from) G
    distance = {root_node: 0}
    queue = deque([root_node])
    
    while len(queue):
        working_node = queue.popleft()
        for n in G.neighbors(working_node):
            if n not in distance:
                distance[n] = distance[working_node] + 1
                queue.append(n)
    
    # one ((root, node), distance) entry per node reachable from the root
    list_distances = []
    for n in G.nodes():
        if n in distance:
            list_distances.append(((root_node, n), distance[n]))
        
    return list_distances

In [None]:
## Closeness
norm=0.0
diz_c={}
l_values=[]
colors=[]
n_others = G.number_of_nodes() - 1   # every node except the root itself

for n in G.nodes():
    l = node_distance(G, n)
    # average distance from n to the other nodes (the root contributes 0)
    ave_length=0
    for path in l:
        ave_length += float(path[1])/n_others
        
    norm += 1/ave_length
    diz_c[n] = 1/ave_length
    l_values.append(diz_c[n])

# grey level in [0,1] for each node, passed to matplotlib as a string
for n in G.nodes():
    colors.append(str((diz_c[n] - min(l_values)) / (max(l_values) - min(l_values))))
  
nx.draw_networkx_edges(G, pos)
nx.draw_networkx_nodes(G, pos, G.nodes(), 
                       node_size=100, 
                       node_color=colors, cmap=plt.cm.Greys_r).set_edgecolor('k')

savefig("./data/closeness_200.png", dpi=200)

### Betweenness centrality
- Another “non-local” way to <u>measure the importance of one vertex or edge is to check how often we visit it when walking on the network</u>.
    $$b(i)=\sum_{\substack{j,l = 1, n\\ i\neq j\neq l}}\frac{D_{jl}(i)}{D_{jl}}$$
- $D_{jl}$ is the total number of distinct shortest paths going from $j$ to $l$, and $D_{jl}(i)$ is the number of those paths that pass through $i$. The sum runs over all pairs with $i\neq j \neq l$.
- The larger the degree of a vertex, the larger is on average its betweenness; the two quantities are correlated and it is possible to connect the properties of the betweenness distribution to that of the degree distribution.
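
The ratio $D_{jl}(i)/D_{jl}$ can be evaluated directly on a small graph. The sketch below is a brute-force illustration on an invented five-vertex graph, not an efficient algorithm; it counts every unordered pair $\{j,l\}$ once, so summing over ordered pairs as in the formula would double each value.

```python
from collections import deque
from itertools import combinations

def bfs_counts(adj, root):
    """Return (dist, sigma): hop distance and number of shortest paths from root."""
    dist = {root: 0}
    sigma = {root: 1}
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for nb in adj[node]:
            if nb not in dist:
                dist[nb] = dist[node] + 1
                sigma[nb] = 0
                queue.append(nb)
            if dist[nb] == dist[node] + 1:
                # every shortest path to node extends to a shortest path to nb
                sigma[nb] += sigma[node]
    return dist, sigma

def betweenness(adj):
    info = {v: bfs_counts(adj, v) for v in adj}
    b = {v: 0.0 for v in adj}
    for j, l in combinations(adj, 2):
        dist_j, sigma_j = info[j]
        dist_l, sigma_l = info[l]
        if l not in dist_j:
            continue  # no path between j and l
        for i in adj:
            if i in (j, l) or i not in dist_j or i not in dist_l:
                continue
            # i lies on a shortest j-l path iff the two distances add up
            if dist_j[i] + dist_l[i] == dist_j[l]:
                b[i] += sigma_j[i] * sigma_l[i] / sigma_j[l]
    return b

# Hypothetical graph: a square A-B-D-C-A with a tail D-E.
adj = {"A": ["B", "C"], "B": ["A", "D"], "C": ["A", "D"],
       "D": ["B", "C", "E"], "E": ["D"]}
print(betweenness(adj))
# A: 0.5, B: 1.0, C: 1.0, D: 3.5, E: 0.0
```

Vertex D scores highest because every path to the tail E must cross it, while B and C share the A-D traffic half and half.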

In [None]:
## Betweenness
# explicit list so that nodes can be addressed by position below
list_of_nodes=list(G.nodes())
num_of_nodes=G.number_of_nodes()
bc={}

for i in range(num_of_nodes-1):
    for j in range(i+1, num_of_nodes):
        paths=nx.all_shortest_paths(G, source=list_of_nodes[i], target=list_of_nodes[j])
        count=0.0
        path_diz={}
        for p in paths:
            count+=1.0
            for n in p[1:-1]:
                if not n in path_diz:
                    path_diz[n]=0.0
                path_diz[n] += 1.0
                
        for n in path_diz.keys():
            path_diz[n]=path_diz[n]/count
            if not n in bc:
                bc[n] = 0.0
            bc[n] += path_diz[n]
                            
l=[]
colors=[]
res=bc
for n in G.nodes():
#     if not res.has_key(n):
    if not n in res:
        res[n]=0.0
    l.append(res[n])
    
for n in G.nodes():
    list_nodes=[n]
    colors.append(str((res[n]-min(l))/(max(l)-min(l))))
  
nx.draw_networkx_edges(G, pos)
nx.draw_networkx_nodes(G, pos, G.nodes(), 
                       node_size=100, 
                       node_color=colors, cmap=plt.cm.Greys_r).set_edgecolor('k')

savefig("./data/betweenness_200.png", dpi=200)

- Betweenness centrality is particularly <u>useful in the case of community detection</u>.
- It is a <u>measure of the “bridging” properties of one vertex/edge</u>, so that <u>edges with large betweenness are likely to bridge different communities</u>.
- If we remove them we can isolate the communities present in the graph.
- The idea is to recursively compute the betweenness of the various edges in the network and to remove those with the largest values. In this way, isolated communities emerge from the web of connections.
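
The removal procedure described above (the idea behind the Girvan-Newman method) can be sketched without any library. The example graph, two triangles joined by a single bridge, and the choice to stop at the first split are assumptions made for illustration; edge betweenness is computed by brute force over unordered pairs, which is only practical for very small graphs.

```python
from collections import deque
from itertools import combinations

def bfs_counts(adj, root):
    """Return (dist, sigma): hop distance and number of shortest paths from root."""
    dist, sigma, queue = {root: 0}, {root: 1}, deque([root])
    while queue:
        node = queue.popleft()
        for nb in adj[node]:
            if nb not in dist:
                dist[nb] = dist[node] + 1
                sigma[nb] = 0
                queue.append(nb)
            if dist[nb] == dist[node] + 1:
                sigma[nb] += sigma[node]
    return dist, sigma

def edge_betweenness(adj):
    info = {v: bfs_counts(adj, v) for v in adj}
    score = {(u, v): 0.0 for u in adj for v in adj[u] if u < v}
    for j, l in combinations(adj, 2):
        dist_j, sigma_j = info[j]
        dist_l, sigma_l = info[l]
        if l not in dist_j:
            continue  # j and l are in different components
        for (u, v) in score:
            # the edge may be crossed in either direction on the way j -> l
            for a, b in ((u, v), (v, u)):
                if (a in dist_j and b in dist_l
                        and dist_j[a] + 1 + dist_l[b] == dist_j[l]):
                    score[(u, v)] += sigma_j[a] * sigma_l[b] / sigma_j[l]
    return score

def connected_components(adj):
    seen, comps = set(), []
    for start in adj:
        if start not in seen:
            comp = set(bfs_counts(adj, start)[0])
            seen |= comp
            comps.append(comp)
    return comps

def split_once(adj):
    """Remove highest-betweenness edges until the graph falls apart."""
    adj = {v: set(nbs) for v, nbs in adj.items()}  # work on a copy
    removed = []
    while len(connected_components(adj)) == 1:
        score = edge_betweenness(adj)
        if not score:
            break  # no edges left to remove
        u, v = max(score, key=score.get)
        adj[u].discard(v)
        adj[v].discard(u)
        removed.append((u, v))
    return removed, connected_components(adj)

# Hypothetical graph: triangles {1,2,3} and {4,5,6} joined by the edge (3,4).
edge_list = [(1, 2), (2, 3), (1, 3), (3, 4), (4, 5), (5, 6), (4, 6)]
graph = {}
for u, v in edge_list:
    graph.setdefault(u, set()).add(v)
    graph.setdefault(v, set()).add(u)

removed, communities = split_once(graph)
print(removed, communities)
```

Here the bridge (3, 4) carries all nine shortest paths between the two triangles, so it is the first edge removed and the two triangles emerge as separate communities.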

### Eigenvector centrality
- Based on the spectral properties of the adjacency matrix $A$.
- The centrality of a vertex $i$ is defined as the average of the centralities of its neighbours (a sum over the neighbours rescaled by $1/\lambda$):
$$c_i=\frac{1}{\lambda}\sum_{j=1, N}a_{ij}c_j$$
In vector form the above equation reads
$$A\overrightarrow{c}=\lambda\overrightarrow{c}$$
- That is, the centrality is an eigenvector of the adjacency matrix $A$, where $\lambda$ is the corresponding eigenvalue.
- For the centrality to make physical sense the eigenvalue must be real, which is not guaranteed in general.
- To partly overcome these problems it is a good choice to take $\lambda$ as the largest (in absolute value) eigenvalue of matrix $A$.
- By [Perron-Frobenius theorem](https://en.wikipedia.org/wiki/Perron–Frobenius_theorem), <u>if $A$ is irreducible, or equivalently if the graph is (strongly) connected, then the eigenvector $\overrightarrow{c}$ is both unique and positive</u>.
- Use [Von Mises (power) iteration method](https://en.wikipedia.org/wiki/Power_iteration). The idea is to <u>start with a good approximation of the eigenvector related to the largest eigenvalue</u> (dominant eigenvector), or <u>directly from a random one</u>, and <u>iterate the vector coefficients according to the relation</u>
$$b_{k+1}=\frac{Ab_k}{\lvert\lvert{Ab_k}\rvert\rvert}$$
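
A minimal sketch of this iteration in plain Python is given below. The graph (a triangle 0-1-2 with a pendant vertex 3 attached to 2), the uniform starting vector, the tolerance, and the use of the Euclidean norm are all assumptions made for the example; on a bipartite graph the plain iteration may oscillate instead of converging.

```python
import math

def power_iteration(A, iterations=200, tol=1e-12):
    """Approximate the dominant eigenpair of a square matrix A (list of rows)."""
    n = len(A)
    b = [1.0] * n  # uniform starting vector
    for _ in range(iterations):
        Ab = [sum(A[i][j] * b[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in Ab))
        if norm == 0:
            raise ValueError("A maps the current vector to zero")
        new_b = [x / norm for x in Ab]
        converged = max(abs(x - y) for x, y in zip(new_b, b)) < tol
        b = new_b
        if converged:
            break
    # Rayleigh quotient b.(A b); b has unit length at this point
    Ab = [sum(A[i][j] * b[j] for j in range(n)) for i in range(n)]
    lam = sum(x * y for x, y in zip(b, Ab))
    return lam, b

# Adjacency matrix of the hypothetical graph: triangle 0-1-2 plus edge 2-3.
A = [
    [0, 1, 1, 0],
    [1, 0, 1, 0],
    [1, 1, 0, 1],
    [0, 0, 1, 0],
]
lam, c = power_iteration(A)
print(lam, c)
```

The returned vector has only positive entries, as expected for a connected graph, and vertex 2, which touches every other vertex, receives the largest centrality.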

In [None]:
## Eigenvector centrality
centrality = nx.eigenvector_centrality_numpy(G)

l=[]
colors=[]
res=centrality

for n in G.nodes():
#     if not res.has_key(n):
    if not n in res:
        res[n]=0.0
    l.append(res[n])
    
for n in G.nodes():
    list_nodes=[n]
    colors.append(str((res[n]-min(l))/(max(l)-min(l))))
    
nx.draw_networkx_edges(G, pos)
nx.draw_networkx_nodes(G, pos, G.nodes(), 
                       node_size=100, 
                       node_color=colors, cmap=plt.cm.Greys_r).set_edgecolor('k')

savefig("eigenvector_200.png", dpi=200)

## Robustness and resilience, giant component
- Robustness and resilience are concepts often <u>invoked in the field of critical infrastructure</u>, for example the Internet, water pipelines, and the electricity grid.
- The first quantity, **robustness**, is a more static property: it refers to how well a system can resist attacks or failures before being disrupted.
- The second quantity, **resilience**, is more dynamic and describes how a system can reshape itself to avoid being disrupted.
<img src="./Fig.3.4.png" width=300>
<center><font size=-1>[An example of a network with two components]</font></center>

In [None]:
## Generating the graph with two components
G_test = nx.Graph()
G_test.add_edges_from([('A', 'B'), ('A', 'C'), ('C', 'D'), ('C', 'E'),
                       ('D', 'F'), ('D', 'H'), ('D', 'G'), ('E', 'G'),
                       ('E', 'I')])
G_test.add_node('X')
nx.draw(G_test)
# nx.draw(G_test, with_labels=True)

savefig("components_200.png", dpi=200)

In [None]:
## Giant component through a breadth first search
def giant_component_size(G_input):
    components=[]
    # nodes not yet assigned to a component (G_input itself is not modified)
    node_list=list(G_input.nodes())
    visited=set()
    
    while len(node_list) != 0:
        root_node=node_list[0]
        component_list=[root_node]
        queue=[root_node]
        visited.add(root_node)
        while len(queue):
            working_node = queue.pop(0)
            for n in G_input.neighbors(working_node):
                if n not in visited:
                    visited.add(n)
                    queue.append(n)
                    component_list.append(n)
        components.append((len(component_list), component_list))
        # remove the nodes of the component just discovered
        for i in component_list:
            node_list.remove(i)
            
    # largest component first
    components.sort(reverse=True)
    
    GiantComponent = components[0][1]
    
    return GiantComponent, len(components)

(GCC, num_components) = giant_component_size(G_test)

print("Giant Connected Component:", GCC)
print("Number of components:", num_components)

In [None]:
## Breaking the GCC
import copy
import random

def breaking_graph(H, node_list):
    n_l = copy.deepcopy(node_list)
    #iterate deleting from the GCC while the graph comprises 
    # one component (num_components=1)
    num_components=1
    count=0
    
    while num_components == 1:
        count += 1
        #select at random an element in the node list 
        #node_to_delete=random.choice(H.nodes())
        #select a node according to the betweenness ranking 
        #(the last in the list)
        node_to_delete=n_l.pop()
        H.remove_node(node_to_delete)
        num_components=nx.number_connected_components(H)
    return count

(GCC, num_components)=giant_component_size(G_test)

# work on an independent copy so that nodes can be removed from it
G_GCC = G_test.subgraph(GCC).copy()

random_list=list(G_GCC.nodes())
random.shuffle(random_list)

c=breaking_graph(G_GCC, random_list)

print("num of iterations:", c)

graphviz_pos=nx.nx_pydot.graphviz_layout(G_GCC)

nx.draw(G_GCC, graphviz_pos, node_size=200, with_labels=True)

savefig("./data/broken_component_200.png", dpi=200)

In [None]:
## Breaking up the giant connected component randomly
G_AS = nx.read_edgelist("./data/AS-19971108.dat")
print("number of nodes:", G_AS.number_of_nodes(), 
      "number of edges:", G_AS.number_of_edges())

(GCC, num_components) = giant_component_size(G_AS)

n_iter=1000
count=0.0

for i in range(n_iter):
    # fresh modifiable copy of the giant component for every trial
    G_GCC = G_AS.subgraph(GCC).copy()
    random_list=list(G_GCC.nodes())
    random.shuffle(random_list)
    c=breaking_graph(G_GCC, random_list)
    count += c
    
# graphviz_pos=nx.nx_pydot.graphviz_layout(G_AS)
# nx.draw(G_AS, graphviz_pos, node_size=100, with_labels=True)
    
print("average iterations to break GCC:", count/n_iter)

In [None]:
## Breaking up the giant connected component with betweenness centrality

import operator

# modifiable copy of the giant component (nodes are removed from it below)
G_GCC = G_AS.subgraph(GCC).copy()

node_centrality = nx.betweenness_centrality(G_GCC, k=None, 
                                           normalized=True,
                                           weight=None,
                                           endpoints=False,
                                           seed=None)

sorted_bc = sorted(node_centrality.items(), key=operator.itemgetter(1))

node_ranking=[]
for e in sorted_bc:
    node_ranking.append(e[0])
    
c=breaking_graph(G_GCC, node_ranking)

# graphviz_pos=nx.nx_pydot.graphviz_layout(G_GCC)
# nx.draw(G_GCC, graphviz_pos, node_size=20)

print("num of iterations:", c)

- Reference: https://en.wikipedia.org/wiki/Autonomous_system_(Internet)