This network is `bipartite`. Project it into unipartite and find five communities with the `Girvan-Newman edge betweenness algorithm` (repeat for both node types, so you find a total of ten communities). What is the `NMI` with this partition?

In [1]:
import itertools
import pandas as pd
import networkx as nx
from sklearn.metrics import normalized_mutual_info_score

def five_comms_gn(G):
   gn = nx.algorithms.community.girvan_newman(G)
   for communities in itertools.islice(gn, 5):
      pass
   return communities

# Load the network
G = nx.read_edgelist("data.txt", nodetype = int)
ground_truth = pd.read_csv("nodes.txt", sep = "\t")

sets = nx.bipartite.sets(G)

G1 = nx.bipartite.projected_graph(G, sets[0])
G2 = nx.bipartite.projected_graph(G, sets[1])

comms1 = five_comms_gn(G1)
comms2 = five_comms_gn(G2)

communities = []
for i in range(len(comms1)):
   communities.extend([(n, i) for n in comms1[i]])

for i in range(len(comms2)):
   communities.extend([(n, i + len(comms1)) for n in comms2[i]])

communities = pd.DataFrame(data = communities, columns = ("node", "projcomm"))

df = ground_truth.merge(communities, on = "node")

print(normalized_mutual_info_score(df["truecomm"], df["projcomm"]))



0.385979788035656


Perform asynchronous label propagation directly on the bipartite structure of the network from Exercise 35.1. Calculate the `NMI with the ground truth`. Since `asynchronous label propagation` is randomized, take the average of ten runs. Do you get a higher NMI?

In [2]:
ground_truth = pd.read_csv("nodes.txt", sep = "\t")
nmis = []

In [3]:
for _ in range(10):
   comms = list(nx.algorithms.community.asyn_lpa_communities(G))
   communities = []
   for i in range(len(comms)):
      communities.extend([(n, i) for n in comms[i]])
   communities = pd.DataFrame(data = communities, columns = ("node", "projcomm"))
   df = ground_truth.merge(communities, on = "node")
   nmis.append(normalized_mutual_info_score(df["truecomm"], df["projcomm"]))
   
print(sum(nmis) / len(nmis))

0.9309751145932792
