# TODO change some things once they are ready: import of hnx in first cell, and data loading below.

In [1]:
import sys

import numpy as np

sys.path.append("..")
from hnx.communities.hy_mmsbm.model import HyMMSBM
from hnx.core.hypergraph import Hypergraph

np.random.seed(123)

# Training the *Hy-MMSBM* model

In this tutorial, we will show how to train the *Hy-MMSBM* model on a given dataset.

# TODO change data loading once hnx module is available

In [2]:
def line_to_hyperedge(line):
    # Parse one whitespace-separated line of node ids into a list of ints.
    # split() with no argument also discards the trailing newline.
    return [int(node) for node in line.split()]

# Load the Justice dataset.
# NOTE: weights.txt is read in step with the hyperedges, but the weights are
# not passed to the Hypergraph here.
with open("./_example_data/justice_data/hyperedges.txt", "r") as hye_file, \
        open("./_example_data/justice_data/weights.txt", "r") as weight_file:
    justice = Hypergraph([
        line_to_hyperedge(hye)
        for hye, _weight in zip(hye_file, weight_file)
    ])
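
The parsing step can be tried without the dataset files. The following is a minimal sketch that parses hyperedges from an in-memory text buffer. The helper name `parse_hyperedges` and the example content are hypothetical and are not taken from the Justice dataset or from the library.

```python
import io


def parse_hyperedges(lines):
    """Turn an iterable of whitespace-separated text lines into lists of int node ids."""
    hyperedges = []
    for line in lines:
        fields = line.split()  # no-argument split() also drops the trailing newline
        if fields:  # skip blank lines
            hyperedges.append([int(node) for node in fields])
    return hyperedges


# Hypothetical file content: one hyperedge per line, node ids separated by spaces.
example_file = io.StringIO("0 1 2\n3 4\n\n1 4 5 6\n")
example_hyperedges = parse_hyperedges(example_file)
print(example_hyperedges)
```

Each inner list is one hyperedge, which is the shape of input handed to `Hypergraph` in the cell above.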

## Model training

Training the model simply requires specifying the number $K$ of communities and whether the model needs to be assortative.

In [3]:
%%time

model = HyMMSBM(
    K=2,
    assortative=False,
)
model.fit(justice)

CPU times: user 265 ms, sys: 6.11 ms, total: 271 ms
Wall time: 284 ms


After inference, the parameters can be retrieved as attributes of the model.

In [4]:
model.u[:5]

array([[0.10145418, 1.79514088],
       [0.3400343 , 0.24403764],
       [0.04678807, 2.14195251],
       [0.06057807, 0.01772415],
       [0.27738008, 0.12582942]])
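
The rows of `u` are non-negative but not normalized. One common way to read them, shown here as a sketch rather than a feature of the library, is to rescale each row to proportions and to take the largest entry as a hard label. The array `u_head` is copied from the output above; the names `proportions` and `hard_labels` are illustrative.

```python
import numpy as np

# First five rows of model.u, copied from the printed output.
u_head = np.array([
    [0.10145418, 1.79514088],
    [0.3400343, 0.24403764],
    [0.04678807, 2.14195251],
    [0.06057807, 0.01772415],
    [0.27738008, 0.12582942],
])

# Rescale each row so that it sums to one; all-zero rows are left at zero.
row_sums = u_head.sum(axis=1, keepdims=True)
proportions = np.divide(
    u_head, row_sums, out=np.zeros_like(u_head), where=row_sums > 0
)

# Hard assignment: index of the community with the largest membership weight.
hard_labels = u_head.argmax(axis=1)

print(proportions.round(3))
print(hard_labels)
```

Normalizing discards the overall magnitude of each row, so it is best kept as a reading aid alongside the raw values.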

In [5]:
model.w

array([[ 5.72135364,  0.05072454],
       [ 0.05072454, 12.33828876]])
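
To see how `u` and `w` combine, the sketch below evaluates the pairwise bilinear score $\sum_{i<j \in e} u_i^\top w\, u_j$ for one hyperedge $e$. This assumes the form given in the Hy-MMSBM paper and leaves out the normalization the model applies per hyperedge size. It does not call the library, and the helper `pairwise_affinity` and the example hyperedge are hypothetical.

```python
import itertools

import numpy as np

# Values copied from the printed outputs of model.u[:5] and model.w.
u_head = np.array([
    [0.10145418, 1.79514088],
    [0.3400343, 0.24403764],
    [0.04678807, 2.14195251],
    [0.06057807, 0.01772415],
    [0.27738008, 0.12582942],
])
w = np.array([
    [5.72135364, 0.05072454],
    [0.05072454, 12.33828876],
])


def pairwise_affinity(u, w, hyperedge):
    """Sum of u_i^T w u_j over all unordered node pairs in the hyperedge."""
    return sum(
        u[i] @ w @ u[j] for i, j in itertools.combinations(hyperedge, 2)
    )


# Hypothetical hyperedge over three of the five nodes shown above.
hyperedge = [0, 2, 4]
score = pairwise_affinity(u_head, w, hyperedge)
print(score)
```

Nodes 0 and 2 both load heavily on the second community, whose diagonal entry in `w` is the largest, so their pair contributes most of the score.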

#### Additional training options

Other options can be specified:
- At model initialization, one can specify:
    - the maximum hyperedge size (otherwise inferred from the hypergraph once it is observed).
    - the priors for $w$ and $u$, given as rates of exponential distributions. Each can be a non-negative number (a rate of 0 corresponds to no prior), or a numpy array if a non-uniform prior is wanted.
- At inference time, one can specify the number of EM steps.
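
As a reminder of what a rate means here, the sketch below uses plain NumPy, independent of the library. An exponential distribution with rate $r$ has mean $1/r$ and log-density $\log r - r x$, so a larger rate pulls values toward zero and a rate of 0 leaves no penalty term. The helper `exponential_log_prior` is illustrative only.

```python
import numpy as np

rng = np.random.default_rng(0)


def exponential_log_prior(x, rate):
    """Log-density of an exponential prior, dropping the constant log(rate) term."""
    return -rate * np.asarray(x)


# Empirical check that the mean of an exponential with rate r is about 1 / r.
sample_means = {}
for rate in (1.0, 10.0):
    samples = rng.exponential(scale=1.0 / rate, size=100_000)
    sample_means[rate] = samples.mean()
    print(f"rate={rate:>4}: sample mean {sample_means[rate]:.3f}, theory {1.0 / rate:.3f}")

# Penalty on the same parameter value under different rates.
value = 2.0
print(exponential_log_prior(value, rate=0.0))   # rate 0: no penalty
print(exponential_log_prior(value, rate=10.0))  # rate 10: strong pull toward zero
```

Under this reading, `w_prior=10.` shrinks the affinities more strongly than `u_prior=1.` shrinks the memberships.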

For example:

In [6]:
model = HyMMSBM(
    K=2,
    assortative=True,
    max_hye_size=15,
    u_prior=1.,
    w_prior=10.,
)
model.fit(
    justice,
    n_iter=500,
)

As a final option, if either $w$ or $u$ is provided at initialization, it is treated as a fixed parameter and is not inferred. For example, one can fix the affinity matrix and infer only the community assignments $u$:

In [7]:
fixed_w = np.eye(2)

model = HyMMSBM(
    K=2,
    w=fixed_w,
    u_prior=0.,
)
model.fit(
    justice,
    n_iter=500,
)

The affinity matrix stays the same, while the community assignments are inferred as usual:

In [8]:
model.w is fixed_w

True
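
The check above uses `is`, which tests object identity rather than equality of values. The sketch below, in plain NumPy with no library calls, shows the difference and how to compare values if the array has been copied along the way. The variable names are illustrative.

```python
import numpy as np

fixed_w = np.eye(2)
same_object = fixed_w        # a second name bound to the same array
value_copy = fixed_w.copy()  # equal values, but a distinct array object

print(same_object is fixed_w)               # identity: same object
print(value_copy is fixed_w)                # identity: different object
print(np.array_equal(value_copy, fixed_w))  # value comparison
```

Because the model holds the very same object here, modifying `fixed_w` in place afterwards would also change `model.w`.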

In [9]:
model.u[:5]

array([[3.28212021e+000, 0.00000000e+000],
       [6.96656148e-001, 0.00000000e+000],
       [3.84578964e+000, 9.59825619e-182],
       [7.26881107e-002, 0.00000000e+000],
       [4.04444449e-001, 0.00000000e+000]])