# Randomized Greedy Algorithm

Reference paper: Michael Ovelgönne and Andreas Geyer-Schulz, *Cluster Cores and Modularity Maximization*, IEEE International Conference on Data Mining Workshops, 2010.

Let $G = (V, E)$ be an undirected, loop-free graph and $C = \{C_1, \ldots, C_p\}$ a non-overlapping clustering, i.e. a partition of the vertices of the graph into groups $C_i$ such that $\forall i, j : i \neq j \implies C_i \cap C_j = \emptyset$ and $\bigcup_i C_i = V$. We denote the adjacency matrix of $G$ by $M$ and the element of $M$ in the $i$-th row and $j$-th column by $m_{ij}$, where $m_{ij} = m_{ji} = 1$ if $(v_i, v_j) \in E$ and $m_{ij} = m_{ji} = 0$ otherwise. The modularity $Q$ of the clustering $C$ is

$$Q = \sum_i (e_{ii} - a_i^2)$$
with
$$e_{ij} = \frac{\sum_{v_x \in C_i}\sum_{v_y \in C_j} m_{xy}}{\sum_{v_x \in V}\sum_{v_y \in V} m_{xy}}$$
and
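$$a_i = \sum_j e_{ij}.$$

Because $M$ is symmetric, the denominator of $e_{ij}$ counts every edge twice and equals $2|E|$. Consequently, $a_i$ is the sum of the degrees of the vertices in $C_i$ divided by $2|E|$.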

Objective: Find a clustering $C$ that maximizes the modularity $Q$.
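To make the definition concrete, the following sketch evaluates $Q$ directly from an adjacency matrix. The helper name `modularity_from_matrix` and the toy graph (two triangles joined by one edge) are illustrative assumptions, not part of the reference paper.

```python
def modularity_from_matrix(M, clusters):
    """Q = sum_i (e_ii - a_i^2), computed straight from the adjacency matrix M."""
    total = sum(sum(row) for row in M)  # equals 2|E| for a symmetric 0/1 matrix
    p = len(clusters)
    # e[i][j]: fraction of adjacency entries that link cluster i to cluster j
    e = [[sum(M[x][y] for x in clusters[i] for y in clusters[j]) / total
          for j in range(p)] for i in range(p)]
    a = [sum(e[i]) for i in range(p)]
    return sum(e[i][i] - a[i] ** 2 for i in range(p))

# Hypothetical toy graph: triangles {0,1,2} and {3,4,5} joined by the edge (2,3).
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
M = [[0] * 6 for _ in range(6)]
for u, v in edges:
    M[u][v] = M[v][u] = 1

q_split = modularity_from_matrix(M, [[0, 1, 2], [3, 4, 5]])
q_single = modularity_from_matrix(M, [[0, 1, 2, 3, 4, 5]])
print(q_split, q_single)
```

For this toy graph the two-triangle split gives $Q = 2\,(6/14 - (7/14)^2) = 5/14 \approx 0.357$, while putting all vertices into one cluster gives $Q = 1 - 1^2 = 0$.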

In [2]:
import numpy as np
import networkx as nx

### Loading instances

In [4]:
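def load_graph(file_path):
    """Read a graph whose first line holds n and m and whose line i lists the neighbours of vertex i."""
    with open(file_path, 'r') as file:
        lines = file.readlines()

    # Header: number of vertices n and number of edges m (m is not used below)
    tokens = lines[0].split()
    n, m = int(tokens[0]), int(tokens[1])

    G = nx.Graph()
    # Vertices are numbered 1..n; adding them first keeps isolated vertices
    G.add_nodes_from(range(1, n + 1))

    # Line i (1-based) lists the neighbours of vertex i
    for i in range(1, n + 1):
        for j in map(int, lines[i].split()):
            G.add_edge(i, j)

    return G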
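The loader above implies a simple file layout: a header line with `n` and `m`, followed by one line per vertex listing its neighbours with 1-based ids. The sketch below writes a small hypothetical instance in that layout and parses it with the standard library only. The helper `read_adjacency` and the sample contents are assumptions made for illustration, and the sketch ignores anything the real instance files may contain beyond this layout (for example comment lines).

```python
import os
import tempfile

def read_adjacency(file_path):
    """Parse the assumed layout into {vertex: set_of_neighbours} (1-based ids)."""
    with open(file_path, 'r') as file:
        lines = file.readlines()
    n, m = (int(token) for token in lines[0].split()[:2])
    adjacency = {i: set() for i in range(1, n + 1)}
    for i in range(1, n + 1):
        for j in map(int, lines[i].split()):
            adjacency[i].add(j)
            adjacency[j].add(i)
    return n, m, adjacency

# Hypothetical 4-vertex instance: a triangle 1-2-3 plus the pendant edge 3-4.
sample = "4 4\n2 3\n1 3\n1 2 4\n3\n"
with tempfile.NamedTemporaryFile('w', suffix='.graph', delete=False) as tmp:
    tmp.write(sample)
    path = tmp.name

n, m, adjacency = read_adjacency(path)
os.remove(path)

# Each undirected edge appears in two neighbour lists, so the degree sum is 2*m.
degree_sum = sum(len(neighbours) for neighbours in adjacency.values())
print(n, m, adjacency, degree_sum)
```

Checking that the degree sum equals `2 * m` is a cheap sanity test for a parsed instance.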

In [5]:
file_paths = ['Instances/karate.graph', 'Instances/football.graph', 'Instances/netscience.graph', 'Instances/PGPgiantcompo.graph']
graphs = []

for file_path in file_paths:
    graphs.append(load_graph(file_path))

### Modularity function

In [6]:
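def number_of_connecting_edges(G, C, i, j):
    # Sum of m_xy over ordered pairs (x, y) with x in C[i] and y in C[j];
    # for i == j every edge inside the cluster is therefore counted twice.
    res = 0
    for u in C[i]:
        for v in C[j]:
            if G.has_edge(u, v):
                res += 1
    return res

def number_of_claster_edges(G, C, i):
    # Numerator of a_i: the row sum over all clusters j (the degree sum of C[i])
    res = 0
    number_of_clasters = len(C)
    for j in range(number_of_clasters):
        res += number_of_connecting_edges(G, C, i, j)
    return res

def fraction_of_connecting_edges(G, C, i, j):
    # e_ij; the denominator sums the whole adjacency matrix, i.e. 2|E|
    return number_of_connecting_edges(G, C, i, j) / (2 * G.number_of_edges())

def fraction_of_claster_edges(G, C, i):
    # a_i = sum_j e_ij
    return number_of_claster_edges(G, C, i) / (2 * G.number_of_edges())

def modularity(G, C):
    # Q = sum_i (e_ii - a_i^2): only a_i is squared, not the whole difference
    res = 0.0
    number_of_clasters = len(C)
    for i in range(number_of_clasters):
        res += fraction_of_connecting_edges(G, C, i, i) - fraction_of_claster_edges(G, C, i)**2
    return res

In [7]:
G = graphs[0]

# Singleton clustering: every e_ii is 0, so Q = -sum_i a_i^2 is negative
C = [[node] for node in G.nodes()]

modularity(G, C)

-0.0498 (rounded)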
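The nested `has_edge` loops above test every pair of vertices, which becomes slow on the larger instances. As a hedged alternative, the sketch below computes $Q = \sum_i (e_{ii} - a_i^2)$ in one pass over an edge list, using the facts that $e_{ii}$ is twice the number of intra-cluster edges divided by $2|E|$ and that $a_i$ is the degree sum of $C_i$ divided by $2|E|$. The function name `modularity_from_edges` and the toy edge list are assumptions, and the sketch assumes an edge list without loops or duplicates.

```python
def modularity_from_edges(edges, clusters):
    """Q = sum_i (e_ii - a_i^2) using a single pass over the edge list."""
    cluster_of = {v: i for i, members in enumerate(clusters) for v in members}
    two_m = 2 * len(edges)
    internal = [0] * len(clusters)    # 2 * (number of edges inside cluster i)
    degree_sum = [0] * len(clusters)  # sum of vertex degrees in cluster i
    for u, v in edges:
        cu, cv = cluster_of[u], cluster_of[v]
        degree_sum[cu] += 1
        degree_sum[cv] += 1
        if cu == cv:
            internal[cu] += 2  # counted as (u, v) and (v, u) in the matrix sum
    return sum(internal[i] / two_m - (degree_sum[i] / two_m) ** 2
               for i in range(len(clusters)))

# Hypothetical toy graph: triangles {1,2,3} and {4,5,6} joined by the edge (3,4).
edges = [(1, 2), (1, 3), (2, 3), (4, 5), (4, 6), (5, 6), (3, 4)]
q_split = modularity_from_edges(edges, [[1, 2, 3], [4, 5, 6]])
q_singletons = modularity_from_edges(edges, [[v] for v in range(1, 7)])
print(q_split, q_singletons)
```

On the toy graph the split yields $5/14$ and the singleton clustering yields $-34/196$, the negative sum of squared degree fractions.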

# Plain Greedy (PG) algorithm

In [40]:
#G.nodes()
#list(G.neighbors(5))

#Initialize: every vertex starts as its own cluster
two_m = 2 * G.number_of_edges()
e = []
a = []
for node in G.nodes():
    # Row of e for this singleton: one entry e_ij = 1/(2|E|) per neighbour
    row = [1 / two_m for _ in G.neighbors(node)]
    e.append(row)
    # a_i is the row sum, i.e. degree / (2|E|)
    a.append(sum(row))

#Just to check whether everything is okay.
print(len(G.nodes()))
print(len(a))
#a_i * 2|E| should recover the degree of vertex i
print([round(x * two_m) for x in a])

#Build Dendrogram (Greedy)


34
34
[16, 9, 10, 6, 3, 4, 4, 4, 5, 2, 3, 1, 2, 5, 2, 2, 2, 2, 2, 3, 2, 2, 2, 5, 3, 3, 2, 4, 3, 4, 4, 6, 12, 17]
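The dendrogram-building step is left open above. One way it could be sketched, assuming the usual plain greedy scheme of repeatedly joining the connected pair of clusters with the largest modularity gain, is shown below. From the definition of $Q$, joining $C_i$ and $C_j$ changes the modularity by $\Delta Q = 2\,(e_{ij} - a_i a_j)$. The function `plain_greedy`, its edge-list input, and the toy graph are illustrative assumptions rather than the paper's implementation, and the candidate search is deliberately naive.

```python
def plain_greedy(n, edges):
    """Greedy agglomeration on vertices 1..n; returns (best_Q, best_clusters)."""
    two_m = 2 * len(edges)
    clusters = {v: [v] for v in range(1, n + 1)}
    a = {v: 0.0 for v in clusters}
    e = {v: {} for v in clusters}  # e[i][j] for i != j, stored symmetrically
    for u, v in edges:
        a[u] += 1 / two_m
        a[v] += 1 / two_m
        e[u][v] = e[u].get(v, 0.0) + 1 / two_m
        e[v][u] = e[v].get(u, 0.0) + 1 / two_m
    q = -sum(x * x for x in a.values())  # singleton start: every e_ii is 0
    best_q, best = q, [list(c) for c in clusters.values()]
    while True:
        # Only connected pairs have e_ij > 0; evaluate the gain of each join.
        candidates = [(2 * (eij - a[i] * a[j]), i, j)
                      for i in e for j, eij in e[i].items() if i < j]
        if not candidates:
            break
        dq, i, j = max(candidates)
        # Merge cluster j into cluster i and update a and e accordingly.
        clusters[i].extend(clusters.pop(j))
        a[i] += a.pop(j)
        for k, ejk in e.pop(j).items():
            del e[k][j]
            if k != i:
                e[i][k] = e[i].get(k, 0.0) + ejk
                e[k][i] = e[i][k]
        q += dq
        if q > best_q:  # remember the best cut of the dendrogram so far
            best_q, best = q, [sorted(c) for c in clusters.values()]
    return best_q, best

# Hypothetical toy graph: triangles {1,2,3} and {4,5,6} joined by the edge (3,4).
toy_edges = [(1, 2), (1, 3), (2, 3), (4, 5), (4, 6), (5, 6), (3, 4)]
best_q, best_clusters = plain_greedy(6, toy_edges)
print(best_q, best_clusters)
```

Joins continue even when the gain is negative, so the full dendrogram is built and the clustering with the highest $Q$ seen along the way is returned. On the toy graph this recovers the two triangles with $Q = 5/14$.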


# Randomized Greedy (RG) algorithm