# Quick Comparisons

In this notebook, we run a few quick comparisons between dyadic Louvain, single-stage hypergraph Louvain, and full hypergraph Louvain on the `congress-bills` data set. The highlight is that hypergraph Louvain, when initialized with reasonable parameters (obtained, e.g., from a warm start), can achieve higher polyadic modularity than dyadic Louvain. This is what we would ideally expect, but it is still nice to see it happen. As we would also expect, the supernode steps help.

The number of nodes is not large (1718), but there is a fairly large number of edges (56193, of sizes 2 through 10), which may explain the relatively slow computation time. We might have opportunities to do better here.

Finally, the partition we find has higher modularity than the true labels, which can be interpreted as either a bug or a feature.
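
For intuition about the dyadic baseline, here is a minimal sketch of a clique expansion, in which each hyperedge is replaced by all pairs of its nodes. It assumes unit weights per pair and uses a hypothetical helper `clique_expansion` on a toy edge list; the weighting actually used by `CliqueExpansionModularity` may differ.

```julia
# Clique expansion sketch: each hyperedge adds one unit of weight to every
# pair of its nodes. Unit weights are an assumption made for illustration.
function clique_expansion(edges)
    W = Dict{Tuple{Int,Int},Int}()
    for e in edges
        nodes = sort(collect(e))
        for i in 1:length(nodes)-1, j in i+1:length(nodes)
            pair = (nodes[i], nodes[j])
            W[pair] = get(W, pair, 0) + 1
        end
    end
    return W
end

toy_edges = [[1, 2, 3], [2, 3], [3, 4, 5, 6]]   # toy data, not from the data set
W = clique_expansion(toy_edges)
println(length(W), " distinct pairs, total weight ", sum(values(W)))
```

An edge of size k contributes k(k-1)/2 pairs, so large hyperedges dominate the expanded graph while the information that those pairs arose from a single interaction is lost.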

In [26]:
using Optim 
# using Plots
using StatsBase

using Pkg; Pkg.activate(".")
using HypergraphModularity

[32m[1m Activating[22m[39m environment at `~/hypergraph_modularities_code/Project.toml`


In [52]:
dataset = "congress-bills"
kmax_ = 10

H, Z = read_hypergraph_data(dataset,kmax_)

Z = convert(Array{Int16, 1}, Z) # type conversion for faster partitionize method

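H.E[1] = Dict() # discard edges of size 1

kmin = max(minimum(keys(H.E)), 2)
kmax = maximum(keys(H.E))

# initial parameters: entries 1:kmax are the exponents α[k] used in ω,
# entries kmax+1:2kmax scale the size-dependent normalization
α0 = vcat(repeat([0.0], kmax), 1:kmax)

n = length(H.D)

# intensity for an edge with partition vector p; k = sum(p) is the edge size
function ω(p, α)
    k = sum(p)
    return k/sum((p .* (1:length(p)).^α[k])) / n^(α[kmax+k]*k)
end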

Ω = partitionIntensityFunction(ω, kmax);
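
To see what `ω` rewards, the sketch below evaluates a stand-alone copy of it on a few partition vectors. It assumes that `p` lists the group sizes within an edge in decreasing order, and it uses toy values for `n` and `kmax`; `ω_sketch`, `α_flat`, and `α_pos` are hypothetical names introduced only for this illustration.

```julia
# Stand-alone copy of ω with n and kmax passed explicitly (toy values).
function ω_sketch(p, α; n, kmax)
    k = sum(p)   # edge size
    return k / sum(p .* (1:length(p)) .^ α[k]) / n^(α[kmax + k] * k)
end

kmax, n = 3, 10
α_flat = vcat(repeat([0.0], kmax), 1:kmax)   # same construction as α0
α_pos  = vcat(repeat([1.0], kmax), 1:kmax)   # hypothetical positive exponents

for p in ([3], [2, 1], [1, 1, 1])
    println(p, "  flat: ", ω_sketch(p, α_flat; n = n, kmax = kmax),
               "  positive: ", ω_sketch(p, α_pos; n = n, kmax = kmax))
end
```

With all exponents `α[k]` equal to zero, as in `α0`, the weighting term equals 1 and `ω` depends only on the edge size. With a positive exponent, an edge contained in a single group receives a larger value than one split across several groups, which is the behaviour the parameter fitting is expected to pick up when most edges are concentrated within groups.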

In [53]:
for k = kmin:kmax
    p = mean([length(partitionize(Z[e])) == 1 for e in keys(H.E[k])])
    println("k = $k: $(round(100*p, digits = 0)) % of $(length(keys(H.E[k]))) edges are within a single group.")
end

k = 2: 61.0 % of 13871 edges are within a single group.
k = 3: 46.0 % of 10156 edges are within a single group.
k = 4: 37.0 % of 7764 edges are within a single group.
k = 5: 32.0 % of 5780 edges are within a single group.
k = 6: 28.0 % of 4829 edges are within a single group.
k = 7: 26.0 % of 4090 edges are within a single group.
k = 8: 22.0 % of 3616 edges are within a single group.
k = 9: 21.0 % of 3250 edges are within a single group.
k = 10: 19.0 % of 2837 edges are within a single group.
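
The same statistic can be reproduced on toy data without the package. This sketch substitutes `length(unique(...)) == 1` for the `partitionize` check, which should agree whenever an edge lies in a single group; `Z_toy` and `E_toy` are made-up inputs.

```julia
using Statistics  # for mean

Z_toy = [1, 1, 1, 2, 2]                       # toy group label for each node
E_toy = [[1, 2], [1, 2, 3], [3, 4], [4, 5]]   # toy edges as lists of node ids

# An edge is "within a single group" when all of its nodes share one label.
within = [length(unique(Z_toy[e])) == 1 for e in E_toy]
println("$(round(100 * mean(within), digits = 0)) % of $(length(E_toy)) edges are within a single group.")
```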


In [54]:
sum([length(H.E[k]) for k in kmin:kmax])

56193

In [55]:
println("There are $(length(H.D)) nodes.")

There are 1718 nodes.


In [56]:
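# evaluate a quoted expression in global scope and time it;
# the result holds the returned value first and the elapsed seconds second
timeAlg(expr) = @timed eval(expr)

# the calls are quoted, so α̂ is looked up when each expression is evaluated,
# not when the dictionary is built
algDict = Dict(
    "Dyadic"                    => :(CliqueExpansionModularity(H)),
#     "Hypergraph (no supernode)" => :(HyperLouvain(H,kmax,Ω;α=α̂, verbose=false)),
    "Hypergraph (supernode)"    => :(SuperNodeLouvain(H,kmax,Ω;α=α̂, verbose=false))
)

# start from the initial parameters; they are refit after the dyadic run
α̂ = α0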

print(rpad("algorithm", 30))
print(rpad("Q", 15))
print(rpad("groups", 10))
println(rpad("time (s)", 10))
println(rpad("",  65, "-"))

Ẑ = zero(Z)

# for name in ["Dyadic", "Hypergraph (no supernode)", "Hypergraph (supernode)"]
for name in ["Dyadic", "Hypergraph (supernode)"]
    out = timeAlg(algDict[name])
    Ẑ = out[1]        # partition found by the algorithm
    elapsed = out[2]  # elapsed seconds (named to avoid shadowing Base.time)
    if name == "Dyadic"
        # fit α̂ to the dyadic partition; the hypergraph run that follows uses it
        α̂ = learnParameters(H, Ẑ, Ω, α0; n_iters = 100, amin = -10, amax = 10)
    end

    Q = modularity(H, Ẑ, Ω; α = α̂)

    print(rpad("$name", 30))
    print(rpad("$(round(Q, digits = 0))", 15))
    print(rpad("$(length(unique(Ẑ)))", 10))
    println(rpad("$elapsed", 10))
end

print(rpad("TRUE LABELS", 30))
Q = modularity(H, Z, Ω; α=α̂)
print(rpad("$(round(Q, digits=0))", 15))
print(rpad("$(length(unique(Z)))", 10))
println(rpad("NA", 10))

algorithm                     Q              groups    time (s)  
-----------------------------------------------------------------
Dyadic                        -2.71871e+06   6         7.661250316
Hypergraph (supernode)        -2.71843e+06   6         475.105061499
TRUE LABELS                   -2.890219e+06  2         NA        
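
One possible alternative to timing quoted expressions with `eval` is to store zero-argument closures and time the calls directly. The sketch below shows that pattern on stand-in workloads rather than the Louvain routines; `algs` and `results` are hypothetical names, and the `.value` and `.time` fields assume Julia 1.5 or later, where `@timed` returns a named tuple.

```julia
# Stand-in workloads; in the notebook these would wrap the Louvain calls.
algs = Dict(
    "sum of squares" => () -> sum(abs2, 1:10^6),
    "sorted length"  => () -> length(sort(rand(10^5))),
)

results = Dict{String,Any}()
for name in ["sum of squares", "sorted length"]
    out = @timed algs[name]()   # named tuple with fields value, time, ... (Julia ≥ 1.5)
    results[name] = (value = out.value, seconds = out.time)
    println(rpad(name, 20), rpad(string(out.value), 22), out.time)
end
```

Closures look up global variables when they are called, so a parameter that is updated between runs would still be picked up, as with the quoted expressions. Note that the first call to each closure also includes compilation time.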
