## Let's be investigators!

We have access to the communication network of a famous company involved in one of the largest financial scandals: the Enron email dataset.

The goal: analyse the network using all the measures and quantities we have seen so far. Which nodes are the most central? How do the different rankings compare?

In [1]:
# the data is in "data/email-Enron.txt"

In [1]:
# as usual we need to import the key libraries we need to store, analyse and plot the network
import networkx as nx
#import numpy as np
import matplotlib.pyplot as plt

%matplotlib inline

**1. Setup and Data Loading:**

In [2]:
# let's also get a network from the folder "data"
# (the with-statement closes the file for us)
with open("data/net1_edge_list.txt", "r") as a:
    for i in a:
        n = i.strip().split()
        print(n)

['1', '2']
['2', '3']
['3', '1']
['2', '4']
['1', '4']
['1', '5']
['2', '5']
['3', '5']
['5', '6']
['6', '7']
['6', '8']
['6', '9']
['7', '8']
['7', '9']
['9', '10']
['6', '11']
['11', '12']
['11', '13']
['11', '14']
['12', '15']
['13', '14']
['12', '13']


In [4]:
import networkx as nx
import matplotlib.pyplot as plt
%matplotlib inline

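# Load the Enron email dataset (one edge per line: "source target")
G = nx.Graph()
with open("data/email-Enron.txt", "r") as file:
    for line in file:
        # Skip blank lines and '#' comment/header lines, if the file has any
        if not line.strip() or line.startswith("#"):
            continue
        nodes = line.strip().split()
        G.add_edge(nodes[0], nodes[1])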

G

<networkx.classes.graph.Graph at 0x2cfbe7270a0>
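
NetworkX also has a built-in edge-list parser that skips `#` comment lines, which could replace the manual loop. The sketch below runs `nx.parse_edgelist` on a few made-up in-memory lines so that it works without the data file; the sample lines are illustrative and not taken from the dataset. For a file on disk, `nx.read_edgelist` accepts the same `comments` argument.

```python
import networkx as nx

# Hypothetical sample in the same "source target" format, with a header comment
sample_lines = [
    "# FromNodeId ToNodeId",
    "0 1",
    "1 0",
    "1 2",
    "2 3",
]

# Lines starting with '#' are skipped; node ids are kept as strings here
H = nx.parse_edgelist(sample_lines, comments="#", nodetype=str)

print(H.number_of_nodes(), H.number_of_edges())
```

Because `H` is an undirected `nx.Graph`, the lines `0 1` and `1 0` collapse into a single edge.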

**2. Plotting the Network:**

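We can visualize the network to get a first impression. If the file holds the full Enron network (tens of thousands of nodes), expect the spring layout to be slow and the node labels to overlap heavily.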

In [7]:
nx.draw(G, pos=nx.spring_layout(G), alpha=0.9, node_size=10, width=0.3, edge_color="Black", node_color="Red", with_labels=True, font_size=12)
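
A drawing is often easier to read when restricted to a small part of the network. One possible approach, sketched here, is to keep only the highest-degree nodes and the edges among them. The helper name `top_degree_subgraph` and the random stand-in graph are assumptions made for illustration, so the cell runs without the data file.

```python
import networkx as nx

def top_degree_subgraph(graph, n_nodes):
    """Return the subgraph induced by the n_nodes highest-degree nodes."""
    # graph.degree() yields (node, degree) pairs
    ranked = sorted(graph.degree(), key=lambda pair: pair[1], reverse=True)
    keep = [node for node, _ in ranked[:n_nodes]]
    return graph.subgraph(keep).copy()

# Stand-in graph so the sketch runs without the data file
demo = nx.barabasi_albert_graph(200, 2, seed=1)
core = top_degree_subgraph(demo, 30)
print(core.number_of_nodes(), core.number_of_edges())
```

With the notebook's graph this would become `nx.draw(top_degree_subgraph(G, 200), node_size=10, width=0.3)`; the cutoff of 200 is an arbitrary choice.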

**3. Degree Centrality:**

Calculate and rank nodes by degree, i.e. unnormalized degree centrality.

In [None]:
degree = sorted([[G.degree(n), n] for n in G.nodes()], reverse=True)
for i in range(10):
    print(f"Rank = {i+1}, Degree = {degree[i][0]}, Node id = {degree[i][1]}")

# Save to CSV
with open('data/Degree Centrality.csv', 'w') as f:
    f.write("Rank,Degree,Node id\n")
    for i in range(len(degree)):
        f.write(f"{i+1},{degree[i][0]},{degree[i][1]}\n")


**4. Closeness Centrality:**

Calculate and rank nodes by closeness centrality.

In [None]:
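# Compute closeness once; calling it inside the comprehension would redo the whole computation for every node
closeness_scores = nx.closeness_centrality(G)
closeness = sorted([[closeness_scores[n], n] for n in G.nodes()], reverse=True)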
for i in range(10):
    print(f"Rank = {i+1}, Closeness = {closeness[i][0]:.2f}, Node id = {closeness[i][1]}")

# Save to CSV
with open('data/Closeness Centrality.csv', 'w') as f:
    f.write("Rank,Closeness,Node id\n")
    for i in range(len(closeness)):
        f.write(f"{i+1},{closeness[i][0]:.2f},{closeness[i][1]}\n")
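
To make the closeness score concrete, the sketch below recomputes it by hand on a three-node path and compares it with NetworkX. For a connected graph the score of a node is (n - 1) divided by the sum of its shortest-path distances to all other nodes. The tiny path graph is a stand-in chosen for illustration.

```python
import networkx as nx

path = nx.path_graph(3)  # 0 - 1 - 2
closeness_demo = nx.closeness_centrality(path)

# Recompute by hand: (n - 1) / sum of shortest-path distances from the node
n = path.number_of_nodes()
by_hand = {}
for node in path:
    # dict of {target: distance}, including the node itself at distance 0
    distances = nx.shortest_path_length(path, source=node)
    by_hand[node] = (n - 1) / sum(distances.values())

print(closeness_demo)
print(by_hand)
```

The middle node reaches both others in one step, so it gets the maximum score of 1.0, while the end nodes score 2/3.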

**5. Betweenness Centrality:**

Calculate and rank nodes by betweenness centrality.

In [None]:
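# Compute betweenness once; calling it inside the comprehension would redo the whole computation for every node
betweenness_scores = nx.betweenness_centrality(G)
betweenness = sorted([[betweenness_scores[n], n] for n in G.nodes()], reverse=True)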
for i in range(10):
    print(f"Rank = {i+1}, Betweenness = {betweenness[i][0]:.2f}, Node id = {betweenness[i][1]}")

# Save to CSV
with open('data/Betweenness Centrality.csv', 'w') as f:
    f.write("Rank,Betweenness,Node id\n")
    for i in range(len(betweenness)):
        f.write(f"{i+1},{betweenness[i][0]:.2f},{betweenness[i][1]}\n")
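
Exact betweenness can take a long time on a large network. `nx.betweenness_centrality` accepts a `k` argument that estimates the scores from `k` sampled source nodes instead of all of them. The sketch below compares the exact and sampled versions on a small stand-in graph; the values `k=6` and `seed=42` are arbitrary choices for illustration, and a suitable `k` for the Enron network would have to be found by experiment.

```python
import networkx as nx

# Stand-in graph: two 5-cliques joined through a single bridge node (node 5)
demo = nx.barbell_graph(5, 1)

exact = nx.betweenness_centrality(demo)
# k = number of sampled source nodes; seed makes the sample reproducible
approx = nx.betweenness_centrality(demo, k=6, seed=42)

top_exact = max(exact, key=exact.get)
print("Exact top node:", top_exact)
print("Approximate score for that node:", round(approx[top_exact], 3))
```

The sampled scores are estimates, so the ranking they produce may differ from the exact one, especially lower down the list.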

**6. Eigenvector Centrality:**

Calculate and rank nodes by eigenvector centrality.

In [None]:
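# Compute eigenvector centrality once; calling it inside the comprehension would redo the whole computation for every node
eigenvector_scores = nx.eigenvector_centrality(G)
eigenvector = sorted([[eigenvector_scores[n], n] for n in G.nodes()], reverse=True)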
for i in range(10):
    print(f"Rank = {i+1}, Eigenvector Cent. = {eigenvector[i][0]:.2f}, Node id = {eigenvector[i][1]}")

# Save to CSV
with open('data/Eigenvector Centrality.csv', 'w') as f:
    f.write("Rank,Eigenvector,Node id\n")
    for i in range(len(eigenvector)):
        f.write(f"{i+1},{eigenvector[i][0]:.2f},{eigenvector[i][1]}\n")

By analyzing the Enron email dataset using these centrality measures, we can identify the most central individuals in the network. Comparing the ranks across different centralities can provide insights into the roles and importance of these nodes.

* Degree Centrality: Highlights nodes with the most direct connections.
* Closeness Centrality: Identifies nodes with a short average shortest-path distance to all other nodes.
* Betweenness Centrality: Finds nodes that lie on many shortest paths and so act as bridges in the network.
* Eigenvector Centrality: Scores a node highly when its neighbours are themselves well connected.
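
One simple way to compare the rankings is to measure how much the top-k sets of two centralities overlap. The sketch below uses the Jaccard index for this. The helper names `top_k` and `jaccard`, the choice of k = 5, and the karate club stand-in graph are all assumptions made so the cell runs without the data file; it is not the only way to compare ranks (a rank correlation would be another option).

```python
import networkx as nx

def top_k(scores, k):
    """Return the set of the k highest-scoring nodes."""
    return set(sorted(scores, key=scores.get, reverse=True)[:k])

def jaccard(a, b):
    """Overlap between two sets: size of intersection over size of union."""
    return len(a & b) / len(a | b)

# Stand-in graph so the sketch runs without the data file
demo = nx.karate_club_graph()

measures = {
    "degree": nx.degree_centrality(demo),
    "closeness": nx.closeness_centrality(demo),
    "betweenness": nx.betweenness_centrality(demo),
    "eigenvector": nx.eigenvector_centrality(demo, max_iter=1000),
}

k = 5
tops = {name: top_k(scores, k) for name, scores in measures.items()}
names = list(tops)
for i, first in enumerate(names):
    for second in names[i + 1:]:
        overlap = jaccard(tops[first], tops[second])
        print(f"{first} vs {second}: top-{k} overlap = {overlap:.2f}")
```

An overlap of 1.0 means two measures agree on the top-k nodes, while a value near 0 means they single out different nodes. To apply this to the notebook's network, replace `demo` with `G`.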