In this notebook we generate a *dyadic* graph (every edge joins exactly two nodes) and then attempt to recover its clusters.

In [32]:
## Generate a graph
using StatsBase
using Combinatorics

include("jl/omega.jl")
include("jl/HSBM.jl")
include("jl/hypergraph_louvain.jl")
include("jl/inference.jl")

# parameters

n_ = 50
n = 2*n_
Z = vcat(repeat([1],n_), repeat([2], n_))
ϑ = dropdims(ones(1,n) + rand(1,n), dims = 1)

# defining group intensity function Ω
μ = mean(ϑ)


# because of the way the code is structured, we need to allow kmin = 1, 
# but we set Ω = 0 for all size-1 edges below. 

function ω(p,α)
    if sum(p) == 1
        return 0
    else
        return (5 .*μ*sum(p))^(-sum(p))*prod(p.^α)^(10/(sum(p)*α))
    end
end

α0 = 50

kmax = 2
kmin = 1  # size-1 edges must be allowed structurally, but ω gives them zero intensity

Ω = buildΩ(ω, α0, kmax)

Ω (generic function with 1 method)
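
To see how strongly this intensity function favours within-cluster edges, it can be evaluated by hand on the two possible size-2 configurations. The sketch below is a standalone copy of `ω` under two assumptions that are not confirmed by the cell above: that `p` is the vector of group sizes within an edge (`[2]` for both nodes in one cluster, `[1, 1]` for one node in each), and that `μ` is close to 1.5, the expected value of `1 + rand()`. The name `ω_demo` is hypothetical and only used for illustration.

```julia
# Hypothetical standalone copy of the notebook's ω; μ is passed in explicitly
# (assumed ≈ 1.5) so the snippet runs without the rest of the notebook.
function ω_demo(p, α; μ = 1.5)
    k = sum(p)                      # edge size
    k == 1 && return 0.0            # size-1 edges get zero intensity
    return (5 * μ * k)^(-k) * prod(p .^ α)^(10 / (k * α))
end

within  = ω_demo([2], 50)           # both nodes in the same cluster (assumed encoding)
between = ω_demo([1, 1], 50)        # one node in each cluster (assumed encoding)

# The prefactor (5μk)^(-k) is shared, so the ratio reduces to (2^50)^(10/100) = 2^5.
println("within/between intensity ratio ≈ ", within / between)
```

Under these assumptions the ratio is approximately 32 regardless of `μ`, because `μ` only enters through the shared prefactor.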

In [33]:
H = sampleSBM(Z, ϑ, Ω; α=α0, kmax=kmax, kmin = kmin)
# number of edges
l = length(H.E[2])
# proportion of edges in same cluster
c = mean([Z[e[1]] == Z[e[2]] for e in keys(H.E[2])]) 

println("The graph has $l edges and $(round(100c, digits=1)) % of them are within-cluster.")

The graph has 1249 edges and 96.1 % of them are within-cluster.
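
As a rough sanity check on the within-cluster percentage, one can compare it with a back-of-the-envelope expectation. The sketch below assumes that expected edge counts are proportional to the intensity of each configuration times the number of node pairs of that type, and that the degree parameters `ϑ` are independent of cluster membership so they cancel on average. Neither assumption is verified against `sampleSBM`; the variable names are illustrative only.

```julia
n_per = 50                               # nodes per cluster (n_ in the notebook)
α, k  = 50, 2                            # α0 and the edge size

# Ratio of within- to between-cluster intensity for size-2 edges;
# the prefactor of ω is identical in both cases and cancels.
ratio = (k^α)^(10 / (k * α))

within_pairs  = 2 * binomial(n_per, 2)   # pairs inside either cluster: 2450
between_pairs = n_per^2                  # pairs across the two clusters: 2500

expected_within = ratio * within_pairs / (ratio * within_pairs + between_pairs)
println("expected within-cluster share ≈ ", round(expected_within, digits = 3))
```

This gives roughly 0.969, which is in the same range as the 96.1 % observed in the sampled graph.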


In [35]:
# Encouragingly, this does tend to decrease. I don't think it has to decrease
# monotonically (still need to check), so heuristically this looks acceptable.

Ω̂      = estimateΩEmpirically(H, Z; min_val=0)

Z_ = copy(Z)

for i = 1:5
    Z_ = HyperLouvain(H,kmax,Ω̂;α=α0)
    Ω̂  = estimateΩEmpirically(H, Z_; min_val=0)  # re-estimate from the current Louvain partition
    println("The log-likelihood of the Louvain partition is $(round(logLikelihood(H, Z_, Ω̂;α=α0),digits=3)).")
end


Louvain Iteration 1
Louvain Iteration 2
Louvain Iteration 3
Louvain Iteration 4
The log-likelihood of the Louvain partition is -2805.719.

Louvain Iteration 1
Louvain Iteration 2
Louvain Iteration 3
Louvain Iteration 4
The log-likelihood of the Louvain partition is -2805.719.

Louvain Iteration 1
Louvain Iteration 2
Louvain Iteration 3
Louvain Iteration 4
The log-likelihood of the Louvain partition is -2805.719.

Louvain Iteration 1
Louvain Iteration 2
Louvain Iteration 3
Louvain Iteration 4
The log-likelihood of the Louvain partition is -2805.719.

Louvain Iteration 1
Louvain Iteration 2
Louvain Iteration 3
Louvain Iteration 4
The log-likelihood of the Louvain partition is -2805.719.


In [36]:
Zsing = collect(1:n)

# likelihoods with true parameters

println("The log-likelihood of the true partition is $(round(logLikelihood(H, Z, Ω, ϑ;α=α0),digits=3)).")
println("The log-likelihood of the singleton partition is $(round(logLikelihood(H, Zsing, Ω, ϑ;α=α0),digits=3)).")

The log-likelihood of the true partition is -2841.992.
The log-likelihood of the singleton partition is -6740.588.


In [37]:
Z_ == Z

true
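
The `Z_ == Z` check only succeeds when the recovered labels happen to use the same names as the true ones; a correct clustering with the labels 1 and 2 swapped would compare as `false`. A minimal sketch of a label-invariant comparison follows. `same_partition` is a hypothetical helper written for this illustration and is not part of the included `jl/` files.

```julia
# Two label vectors describe the same partition exactly when there is a
# one-to-one relabeling that maps one onto the other.
function same_partition(a::AbstractVector, b::AbstractVector)
    length(a) == length(b) || return false
    fwd = Dict{eltype(a), eltype(b)}()   # label in a -> label in b
    bwd = Dict{eltype(b), eltype(a)}()   # label in b -> label in a
    for (x, y) in zip(a, b)
        # get! stores the pairing on first sight and returns the stored one afterwards
        get!(fwd, x, y) == y || return false
        get!(bwd, y, x) == x || return false
    end
    return true
end

println(same_partition([1, 1, 2, 2], [2, 2, 1, 1]))   # same clusters, swapped labels
println(same_partition([1, 1, 2, 2], [1, 2, 1, 2]))   # different clusters
```

In the notebook this would be called as `same_partition(Z_, Z)`.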