In [2]:
using StatsBase
using Combinatorics
using Plots

include("jl/omega.jl")
include("jl/HSBM.jl")
include("jl/cut.jl")

# parameters

n = 100              # number of nodes
Z = rand(1:5, n)     # random group label in 1:5 for each node
ϑ = 1 .+ rand(n)     # node-level parameter vector, entries uniform on [1, 2)

# defining group intensity function Ω
μ = mean(ϑ)

kmax = 4
kmin = 1

fk = k -> (2 * μ * k)^(-k)
fp = harmonicMean

Ω_dict = Dict()

for k = kmin:kmax, p in partitions(k)
    Ω_dict[p] = fk(sum(p))*fp(p)
end

Ω = buildΩ(Ω_dict; by_size=true)

Ω (generic function with 1 method)

In [3]:
# sample from the HSBM with these parameters, restricting to hyperedges of size no more than kmax
H = sampleSBM(Z, ϑ, Ω; kmax=kmax, kmin = kmin)
countEdges(H)

35001

In [4]:
## Create a random clustering with k clusters
include("jl/cut.jl")
k = 10
c = rand(1:k,n)
term1 = first_term_eval(H,c,Ω)

-323940.5850808654

In [5]:
@time T1 = first_term_eval(H,c,Ω)
@time Hyp, w = hyperedge_formatting(H)
@time T2 = first_term_v2(Hyp,w,c,Ω)

T1 ≈ T2

  0.110945 seconds (874.00 k allocations: 59.648 MiB, 9.21% gc time)
  0.262420 seconds (392.78 k allocations: 19.712 MiB)
  0.129097 seconds (795.34 k allocations: 58.278 MiB, 15.53% gc time)


true

The `CutDiff` function computes the change in the objective score when node `I` is moved to cluster `J`. The code is still rough, but it is a step in the right direction: in the single timed run below it takes about 0.07 s, versus about 0.32 s for `NaiveCutDiff`, and both return the same value up to floating-point error. One thing that became clear while implementing this is that we need a new way to store hyperedges: given a query node, we must be able to quickly access every hyperedge that contains it.
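To make the storage idea concrete, here is a minimal standalone sketch of a node-to-hyperedge index and of how it lets a move be scored by revisiting only the hyperedges that contain the moved node. Everything in it is an assumption for illustration: the names `build_node_index`, `toy_naive_diff` and `toy_local_diff` are hypothetical, the toy objective (the number of hyperedges whose nodes do not all share a cluster) is not the Ω-based objective used in this notebook, and it says nothing about how `EdgeMap` or `CutDiff` in `jl/cut.jl` are actually implemented.

```julia
# Toy hypergraph: each hyperedge is a vector of node ids (made-up data).
toy_edges = [[1, 2], [2, 3, 4], [1, 4], [3, 5], [1, 2, 5]]
toy_n = 5

# Incidence index: index[v] lists the positions of the hyperedges containing node v.
function build_node_index(edges, n)
    index = [Int[] for _ in 1:n]
    for (e, nodes) in enumerate(edges)
        for v in nodes
            push!(index[v], e)
        end
    end
    return index
end

# Toy objective: a hyperedge is "cut" if its nodes do not all share one cluster.
toy_is_cut(nodes, c) = any(c[v] != c[nodes[1]] for v in nodes)
toy_num_cut(edges, c) = count(e -> toy_is_cut(e, c), edges)

# Naive change in objective: rescore every hyperedge before and after the move.
function toy_naive_diff(edges, c, i, j)
    c2 = copy(c)
    c2[i] = j
    return toy_num_cut(edges, c2) - toy_num_cut(edges, c)
end

# Local change in objective: only hyperedges containing node i can change,
# so rescore just those. The label c[i] is modified temporarily and restored.
function toy_local_diff(edges, index, c, i, j)
    incident = edges[index[i]]
    old = c[i]
    before = toy_num_cut(incident, c)
    c[i] = j
    after = toy_num_cut(incident, c)
    c[i] = old
    return after - before
end

toy_index = build_node_index(toy_edges, toy_n)
toy_c = [1, 1, 2, 2, 1]          # cluster label of each node

@show toy_index[2]                                        # hyperedges containing node 2
@show toy_naive_diff(toy_edges, toy_c, 2, 2)              # move node 2 to cluster 2
@show toy_local_diff(toy_edges, toy_index, toy_c, 2, 2)   # same value, fewer hyperedges visited
```

With an index like this, the cost of scoring a single move scales with the number of hyperedges incident to the moved node rather than with the total number of hyperedges, which is the access pattern described above.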

In [6]:
include("jl/cut.jl")
node2edges = EdgeMap(H)
I = rand(1:n)
J = mod1(c[I] + 1, k)   # next cluster label, wrapping around so J stays in 1:k
@time a = NaiveCutDiff(H,c,I,J,Ω)
@time b = CutDiff(Hyp,w,node2edges,c,I,J,Ω)
@show b, a
a ≈ b

  0.320767 seconds (1.81 M allocations: 122.384 MiB, 5.77% gc time)
  0.074287 seconds (89.83 k allocations: 5.866 MiB, 10.51% gc time)
(b, a) = (-23.88977823429559, -23.889778234646656)


true