## Clustering on the event hypergraph
### Node similarity measure:
- topological overlap + event/sentence embeddings
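A minimal sketch of one way to combine the two signals, assuming the Ravasz-style topological overlap on an unweighted graph and cosine similarity between embeddings. The adjacency format, the mixing weight `alpha`, and the function names are illustrative assumptions, not part of the notebook.

```python
import math

def topological_overlap(adj, u, v):
    """Topological overlap of u and v; adj maps node -> set of neighbors."""
    a_uv = 1 if v in adj[u] else 0          # 1 if u and v are directly linked
    common = len(adj[u] & adj[v])           # shared neighbors
    denom = min(len(adj[u]), len(adj[v])) + 1 - a_uv
    return (common + a_uv) / denom

def cosine(x, y):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(x, y))
    norm = math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in y))
    return dot / norm if norm else 0.0

def node_similarity(adj, emb, u, v, alpha=0.5):
    """Convex combination of structural and embedding similarity (alpha is an assumption)."""
    return alpha * topological_overlap(adj, u, v) + (1 - alpha) * cosine(emb[u], emb[v])
```

With `alpha=1` the measure is purely structural, and with `alpha=0` it depends only on the embeddings.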
### Group similarity measure:
- average similarity of node pairs between cluster pairs
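The average over node pairs can be sketched as follows. The `sim` callback stands in for whatever node similarity is chosen and is an assumption of this example.

```python
from itertools import product

def cluster_similarity(cluster1, cluster2, sim):
    """Average of sim(u, v) over all pairs with u in cluster1 and v in cluster2."""
    pairs = list(product(cluster1, cluster2))
    if not pairs:
        return 0.0  # an empty cluster has no pairs to average over
    return sum(sim(u, v) for u, v in pairs) / len(pairs)
```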
### Procedure:
- Convert the hypergraph to a weighted ordinary graph (open question: or cluster on the hypergraph directly?)
- Assign each node to its own cluster and evaluate the similarity measure for all node pairs
- Merge the node pairs with the highest similarity into the same community
  - Open question: how many pairs to merge per step?
- Repeat, merging clusters in the same way, until no merge is available
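The procedure above can be sketched as a plain agglomerative loop. Two choices here are assumptions made only for illustration: a single best pair is merged per round, and "no merge is available" is interpreted as the best similarity falling below a `threshold`.

```python
from itertools import combinations

def agglomerate(nodes, sim, threshold=0.5):
    """Greedy average-linkage clustering; returns a list of sets of nodes."""
    clusters = [frozenset([n]) for n in nodes]  # every node starts alone

    def group_sim(c1, c2):
        # average similarity over all cross-cluster node pairs
        return sum(sim(u, v) for u in c1 for v in c2) / (len(c1) * len(c2))

    while len(clusters) > 1:
        # find the most similar pair of clusters
        (i, j), best = max(
            (((i, j), group_sim(clusters[i], clusters[j]))
             for i, j in combinations(range(len(clusters)), 2)),
            key=lambda item: item[1],
        )
        if best < threshold:
            break  # no remaining pair is similar enough to merge
        merged = clusters[i] | clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return [set(c) for c in clusters]
```

Recomputing every pairwise group similarity each round is quadratic in the number of clusters; caching the similarities would be the obvious refinement for larger graphs.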
### Dual of the hypergraph
- Link (hyperedge) clustering can be achieved by clustering the nodes of the hypergraph's dual
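To make the dual concrete, here is a small sketch on a plain dict representation (edge id -> set of nodes), which is an assumption of this example. In the dual, each original node becomes a hyperedge containing the ids of the original hyperedges incident to it, so clustering the dual's nodes groups the original hyperedges. HyperNetX also provides a `dual()` method on `Hypergraph` for the same purpose.

```python
def dual(edges):
    """Swap the roles of nodes and hyperedges.

    edges: dict mapping edge id -> set of nodes
    returns: dict mapping node -> set of edge ids containing that node
    """
    result = {}
    for edge_id, nodes in edges.items():
        for node in nodes:
            result.setdefault(node, set()).add(edge_id)
    return result
```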

In [11]:
import networkx as nx
import hypernetx as hnx
import numpy as np
import json
import hypernetx.algorithms.hypergraph_modularity as hmod
from cdlib.algorithms import ilouvain



In [None]:
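def common_neighbors(H, u, v):
    """Number of nodes that share a hyperedge with both u and v."""
    return len(set(H.neighbors(u)) & set(H.neighbors(v)))

def node_similarity(node1, node2):
    # placeholder: to be replaced by topological overlap + embedding similarity
    return 1

def cluster_similarity(cluster1, cluster2):
    """
    cluster1, cluster2: list of nodes
    """
    # placeholder: to be replaced by the average similarity of node pairs
    return 1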

In [None]:
# read network
with open('data/result/RAMS/biHgraph_dev/hgraph.json') as f:
    B = nx.node_link_graph(json.load(f))
H = hnx.Hypergraph.from_bipartite(B)
list(H.shape)

## Reduce the hypergraph to a two-section graph with the edge reweighting proposed in [1]


[1] Kumar T., Vaidyanathan S., Ananthapadmanabhan H., Parthasarathy S., and Ravindran B. "A New Measure of Modularity in Hypergraphs: Theoretical Insights and Implications for Effective Clustering". In: Cherifi H., Gaito S., Mendes J., Moro E., Rocha L. (eds.) Complex Networks and Their Applications VIII. COMPLEX NETWORKS 2019. Studies in Computational Intelligence, vol. 881. Springer, Cham.
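As a rough illustration of the reduction, the sketch below builds a two-section graph from a plain dict of hyperedges, giving every node pair inside a hyperedge of size k the weight w(e) / (k - 1) so that weighted node degrees are preserved. The dict-based input and the function name are assumptions of this example. The notebook itself relies on `hmod.two_section`.

```python
from collections import defaultdict
from itertools import combinations

def two_section(edges, weights=None):
    """Reduce a hypergraph to a weighted graph.

    edges: dict mapping edge id -> set of nodes
    weights: optional dict mapping edge id -> hyperedge weight (default 1.0)
    returns: dict mapping (u, v) with u < v -> accumulated pair weight
    """
    weights = weights or {}
    graph = defaultdict(float)
    for edge_id, nodes in edges.items():
        k = len(nodes)
        if k < 2:
            continue  # singleton hyperedges contribute no pairs
        w = weights.get(edge_id, 1.0) / (k - 1)
        for u, v in combinations(sorted(nodes), 2):
            graph[(u, v)] += w
    return dict(graph)
```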

In [None]:
def embedding_kumar(HG, node_embeddings=None, delta=0.01):
    """
    Compute a partition of the vertices in hypergraph HG as per Kumar's algorithm [1]_.
    The plan is to replace the standard Louvain step with a modified clustering
    algorithm that also considers node embeddings.

    Parameters
    ----------
    HG : Hypergraph

    node_embeddings : dict, optional
        maps each node to a dict of embedding dimensions,
        e.g. { node: { 0: d0, 1: d1, 2: d2, ... } }
        (not used yet; see the TODO below)

    delta : float, optional
        convergence stopping criterion

    Returns
    -------
    : list of sets
       A partition of the vertices in HG

    """
    # weights will be modified -- store initial weights
    W = {
        e: HG.edges[e].weight for e in HG.edges
    }  # uses edge id for reference instead of int
    # build graph
    G = hmod.two_section(HG)
    # apply clustering
    # TODO: use modified clustering algorithm (e.g. ilouvain with node_embeddings);
    # ilouvain returns a cdlib NodeClustering, which has no as_cover(), so the
    # igraph multilevel (Louvain) step is used here for now
    CG = G.community_multilevel(weights="weight")
    CH = []
    for comm in CG.as_cover():
        CH.append(set([G.vs[x]["name"] for x in comm]))
    # LOOP
    diff = 1
    ctr = 0
    while diff > delta:
        # re-weight
        diff = 0
        for e in HG.edges:
            edge = HG.edges[e]
            reweight = (
                sum([1 / (1 + HG.size(e, c)) for c in CH])
                * (HG.size(e) + len(CH))
                / HG.number_of_edges()
            )
            diff = max(diff, 0.5 * abs(edge.weight - reweight))
            edge.weight = 0.5 * edge.weight + 0.5 * reweight
        # re-run louvain
        # build graph
        G = hmod.two_section(HG)
        # apply clustering
        # TODO: use modified clustering algorithm
        CG = G.community_multilevel(weights="weight")
        CH = []
        for comm in CG.as_cover():
            CH.append(set([G.vs[x]["name"] for x in comm]))
        ctr += 1
        if ctr > 50:  # this process sometimes gets stuck -- set limit
            break
    G.vs["part"] = CG.membership
    for e in HG.edges:
        HG.edges[e].weight = W[e]
    return hmod.dict2part({v["name"]: v["part"] for v in G.vs})    
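The reweighting step inside the loop can be isolated as below, which may help when checking it by hand. This is a sketch on plain sets rather than HyperNetX objects: `len(edge_nodes & c)` plays the role of `HG.size(e, c)`, and the helper names are assumptions of this example.

```python
def reweight_edge(edge_nodes, communities, n_edges):
    """Target weight for one hyperedge given the current partition.

    edge_nodes: set of nodes in the hyperedge
    communities: list of sets of nodes (the current partition)
    n_edges: total number of hyperedges
    """
    overlap_term = sum(1 / (1 + len(edge_nodes & c)) for c in communities)
    return overlap_term * (len(edge_nodes) + len(communities)) / n_edges

def damped_update(old_weight, new_weight):
    """Move halfway from the old weight toward the target, as in the loop above."""
    return 0.5 * old_weight + 0.5 * new_weight
```

In a two-community example, a hyperedge lying entirely inside one community receives a larger target weight than one split across both, which is what pushes the next clustering round toward keeping it intact.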


In [None]:
partitions = embedding_kumar(H)

In [None]:
print(len(partitions))
for partition in partitions:
    print(partition)

## Use I-Louvain as implemented in CDLib for attributed node clustering
- The node attributes are the embeddings
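A small helper for turning embedding vectors into the nested-dict attribute format described in the `embedding_kumar` docstring (`{ node: { 0: d0, 1: d1, ... } }`). The helper name and the input format (node -> sequence of floats) are assumptions. Whether this matches the labeling format that `ilouvain` expects should be checked against the installed CDLib version.

```python
def to_labeling(embeddings):
    """Convert {node: [d0, d1, ...]} into {node: {0: d0, 1: d1, ...}}."""
    return {
        node: {i: float(x) for i, x in enumerate(vector)}
        for node, vector in embeddings.items()
    }
```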