# Demo: refactored implementation of $\Omega$

This notebook gives an example of the new implementation of $\Omega$. So far, the aim has been to make as few alterations as possible to the existing codebase while giving the user considerably more flexibility in how $\Omega$ is defined.

## Math Assumptions

The primary mathematical assumption behind the new implementation is that $\Omega(z)$ can always be expressed as a function of the partition vector of $z$. This is the same assumption used throughout the previous version. What is new is that special cases can now be handled with significantly more performant code.
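To make this assumption concrete, the sketch below computes a partition vector and shows that relabeling the groups or reordering the nodes of $z$ leaves it unchanged, so any $\Omega$ defined through it is unchanged as well. The helper `partition_vector` is a hypothetical stand-in for the package's `partitionize()`, and we assume, rather than know, that partition vectors list group sizes in decreasing order.

```julia
# Hypothetical stand-in for partitionize(): the sizes of the groups that
# appear in z, sorted from largest to smallest (our assumption).
function partition_vector(z::AbstractVector)
    counts = Dict{eltype(z),Int}()
    for label in z
        counts[label] = get(counts, label, 0) + 1
    end
    return sort(collect(values(counts)); rev = true)
end

z           = [3, 1, 3, 2, 3]  # group labels of the nodes in one hyperedge
z_relabeled = [7, 7, 5, 7, 9]  # same grouping, relabeled and reordered

partition_vector(z)            # [3, 1, 1]
partition_vector(z_relabeled)  # [3, 1, 1]
```

The entries always sum to the edge size, which is why the examples below can recover $k$ as `sum(p)`.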

```julia
using Pkg; Pkg.activate(".")
using HypergraphModularity

using StatsBase
```

```
 Activating environment at `~/hypergraph_modularities_code/Project.toml`
┌ Info: Precompiling HypergraphModularity [0c934d27-dd44-49d7-950f-bd4be7819e54]
└ @ Base loading.jl:1260
  ** incremental compilation may be fatally broken for this module **
```


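```julia
n = 20                                          # number of nodes
Z = rand(1:5, n)                                # random group labels in 1:5
ϑ = dropdims(ones(1,n) + rand(1,n), dims = 1)   # degree parameters, uniform on [1, 2)
μ = mean(ϑ)
kmax = 4;                                       # largest edge size
```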

## The `IntensityFunction` struct

Now we're ready to construct $\Omega$. In the new idiom, $\Omega$ is an `IntensityFunction`, which contains four fields. 
Let's look at two examples, which are both implemented in `src/omega.jl`. 

```julia
function partitionIntensityFunction(ω, kmax)
    range      = partitionsUpTo(kmax)
    P          = partitionize
    aggregator = identity
    return IntensityFunction(ω, P, range, aggregator)
end
```

This `IntensityFunction` corresponds to the highly general `partitionize`-based framework we were using previously. Here are its fields: 

<dl>
    <dt> <code>ω</code> </dt> <dd> A parameterized function whose valid inputs are the elements of <code>range</code>. </dd>
    <dt> <code>P</code> </dt> <dd> A function that maps group label vectors (i.e., subvectors of $Z$) to <i>feature</i> vectors. <code>partitionize()</code> is an example, as is the polyadic $\delta$-function. </dd>
    <dt> <code>range</code> </dt> <dd> The set of all possible values of <code>P</code>, which is also assumed to be the domain of $\omega$. This nomenclature is perhaps confusing and may be revised.</dd>
    <dt> <code>aggregator</code> </dt> <dd> A function that maps partition vectors (of the kind returned by <code>partitionize()</code>) to elements of <code>range</code>. It is included for technical reasons related to computing the volume term in modularity; it would be desirable to deprecate it if possible. </dd>
</dl>

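So the code above creates an intensity function that operates on a subvector $z$ of $Z$ by first applying `partitionize()` to it and then applying $\omega$ to the resulting partition vector. Since partition vectors are already elements of `range`, no aggregation is needed, and `aggregator` is simply `identity`.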
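How the four fields fit together can be illustrated with a self-contained toy version. `ToyIntensity`, its call syntax, `toy_ω`, and the helpers `partition_vector` and `partitions_up_to` are all hypothetical stand-ins written for this illustration (for `IntensityFunction`, `partitionize()`, and `partitionsUpTo()` respectively); the package's actual definitions in `src/omega.jl` may differ. We also assume partition vectors are sorted in decreasing order.

```julia
# Hypothetical mirror of the four IntensityFunction fields (not the package's type).
struct ToyIntensity{F,G,R,A}
    ω::F           # parameterized function on elements of `range`
    P::G           # feature map applied to a label vector z
    range::R       # every value P can return
    aggregator::A  # maps partition vectors to elements of `range`
end

# Evaluating Ω on z: compute the feature P(z), then apply ω to it.
(Ω::ToyIntensity)(z, α) = Ω.ω(Ω.P(z), α)

# Hypothetical stand-in for partitionize(): group sizes, largest first.
function partition_vector(z)
    counts = Dict{eltype(z),Int}()
    for label in z
        counts[label] = get(counts, label, 0) + 1
    end
    return sort(collect(values(counts)); rev = true)
end

# Hypothetical stand-in for partitionsUpTo(): integer partitions of 1..kmax,
# each written in decreasing order.
function partitions_of(k, largest = k)
    k == 0 && return [Int[]]
    out = Vector{Vector{Int}}()
    for head in min(k, largest):-1:1
        for rest in partitions_of(k - head, head)
            push!(out, vcat(head, rest))
        end
    end
    return out
end
partitions_up_to(kmax) = reduce(vcat, [partitions_of(k) for k in 1:kmax])

toy_ω(p, α) = length(p)^(-α)   # hypothetical ω: penalizes edges split across groups

kmax  = 4
# The aggregator is `identity` because partition vectors already lie in `range`.
Ω_toy = ToyIntensity(toy_ω, partition_vector, partitions_up_to(kmax), identity)

Ω_toy([1, 1, 2], 1.0)   # partition [2, 1], so 2^(-1.0) = 0.5
```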

Alternatively, here is an intensity function corresponding to the all-or-nothing cut. In this case, the feature map $P$ computes two entries: whether all entries of $z$ are the same, and the length of $z$. The `range` lists all possible results (up to size `kmax`). The aggregator recovers the same feature from a partition vector $p$: all nodes share a group exactly when $p$ has a single entry, and the edge size is the sum of its entries. The specified function $\omega$ should then operate on these features.

```julia
function allOrNothingIntensityFunction(ω, kmax)
    range      = [(1.0*x, y) for x = 0:1 for y = 1:kmax]
    P          = z->(all(z[1] .== z), length(z))
    aggregator = p->(length(p) == 1, sum(p))
    return IntensityFunction(ω, P, range, aggregator)
end
```
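Our reading of the field descriptions is that `P` and `aggregator` should agree: applying `P` to $z$ should give the same feature as aggregating the partition vector of $z$. The check below copies the three definitions from the block above and uses a hypothetical `partition_vector` helper in place of `partitionize()`, assumed to return group sizes in decreasing order.

```julia
# Hypothetical stand-in for partitionize(): group sizes, largest first.
function partition_vector(z)
    counts = Dict{eltype(z),Int}()
    for label in z
        counts[label] = get(counts, label, 0) + 1
    end
    return sort(collect(values(counts)); rev = true)
end

kmax = 4
# Range, feature map, and aggregator copied from allOrNothingIntensityFunction.
range_aon  = [(1.0*x, y) for x = 0:1 for y = 1:kmax]
P          = z -> (all(z[1] .== z), length(z))
aggregator = p -> (length(p) == 1, sum(p))

# Both routes should give the same feature, and it should lie in the range.
for z in ([2, 2, 2], [1, 3, 1, 1], [4], [1, 2])
    f = P(z)
    @assert f == aggregator(partition_vector(z))
    @assert f in range_aon   # (true, 3) == (1.0, 3) because true == 1.0
end
```

The `Bool` produced by `P` compares equal to the `Float64` entries of `range`, so membership tests still succeed.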
OK, let's try it out:

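```julia
# Partition-based ω: α[1:kmax] shape the dependence on the partition,
# α[kmax+1:2kmax] scale the intensity for each edge size k.
function ω(p, α)
    k = sum(p)
    return sum(p)/sum((p .* (1:length(p)).^α[k])) / n^(α[kmax+k]*k)
end

α = vcat(repeat([5.0], kmax), 0.2*(1:kmax))
Ω = partitionIntensityFunction(ω, kmax);

typeof(Ω)
```

```
IntensityFunction
```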
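Written out as a formula, the ω in the cell above is as follows, where $p = (p_1, \dots, p_\ell)$ is a partition vector, $k = \sum_i p_i$ is the edge size, $n$ is the number of nodes, and $\alpha$ has $2k_{\max}$ entries:

$$
\omega(p, \alpha) = \frac{k}{\sum_{i=1}^{\ell} i^{\alpha_k}\, p_i} \cdot n^{-\alpha_{k_{\max}+k}\, k}.
$$

For an edge whose nodes all share one group, $p = (k)$ and the fraction equals 1, so $\omega = n^{-\alpha_{k_{\max}+k}\, k}$. The first $k_{\max}$ entries of $\alpha$ therefore control how the intensity depends on the way the edge is split, and the last $k_{\max}$ set its overall scale for each edge size.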

```julia
H = sampleSBM(Z, ϑ, Ω;α=α, kmax=kmax, kmin = 1)
```

```
hypergraph
  N: Array{Int64}((20,)) [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20]
  E: Dict{Int64,Dict}
  D: Array{Int64}((20,)) [7, 5, 5, 2, 7, 5, 4, 2, 8, 3, 11, 3, 5, 6, 8, 4, 3, 1, 3, 8]
```
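The printed `hypergraph` has fields `N` (the nodes), `E`, and `D`. A later cell reads edge sizes off `keys(H.E)`, so `E` is keyed by edge size. The toy below mimics that layout in plain Julia. The inner structure (each edge, a vector of node indices, mapped to its multiplicity) and the reading of `D` as multiplicity-weighted node degrees are our assumptions, not confirmed by this notebook, and `degrees` is a hypothetical helper.

```julia
# Hypothetical toy hypergraph mimicking the printed fields N, E, D.
# Assumption: E maps edge size k => Dict(edge => multiplicity).
N = collect(1:5)
E = Dict(
    2 => Dict([1, 2] => 1, [2, 3] => 2),
    3 => Dict([1, 4, 5] => 1),
)

# Degree of each node: number of edges containing it, counted with multiplicity.
function degrees(N, E)
    D = zeros(Int, length(N))
    for edges in values(E), (edge, mult) in edges
        for v in edge
            D[v] += mult
        end
    end
    return D
end

D    = degrees(N, E)     # [2, 3, 2, 1, 1]
kmax = maximum(keys(E))  # 3
kmin = minimum(keys(E))  # 2
```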


```julia
dataset = "contact-primary-school-classes"
kmax_ = 6

H, Z = read_hypergraph_data(dataset,kmax_)

# smallest and largest edge sizes actually present in H
kmax = maximum(keys(H.E))
kmin = minimum(keys(H.E))

n = length(H.D)

# full, partition-based intensity function
function ω_1(p, α)
    k = sum(p)
    return sum(p)/sum((p .* (1:length(p)).^α[k])) / n^(α[kmax+k]*k)
end

Ω_1 = partitionIntensityFunction(ω_1, kmax);

# sum of exterior degrees
function ω_2(p,α)
    k = p[2]
    δ = p[1]
    return ((1+δ)*n)^α[k] / (n^α[k + kmax])
end

Ω_2 = sumOfExteriorDegreesIntensityFunction(ω_2, kmax);

# all-or-nothing: p = (δ, k), with δ = 1 when all nodes share a group
function ω_3(p,α)
    k = p[2]
    δ = p[1]
    return ((1+δ)*n)^α[k] / (n^α[k + kmax])
end

Ω_3 = allOrNothingIntensityFunction(ω_3, kmax);
```
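In the all-or-nothing case the feature is $p = (\delta, k)$, with $\delta = 1$ when all nodes of the edge share a group and $\delta = 0$ otherwise, so `ω_3` takes only two values for each edge size:

$$
\omega(1, k) = \frac{(2n)^{\alpha_k}}{n^{\alpha_{k+k_{\max}}}}, \qquad
\omega(0, k) = \frac{n^{\alpha_k}}{n^{\alpha_{k+k_{\max}}}}, \qquad
\frac{\omega(1, k)}{\omega(0, k)} = 2^{\alpha_k}.
$$

Thus $\alpha_k > 0$ makes within-group edges of size $k$ more intense than cut edges of the same size, while $\alpha_{k+k_{\max}}$ rescales both values equally.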

# Quick Speed Tests

```julia
Z_dyadic = CliqueExpansionModularity(H);
```

```julia
α = zeros(2*kmax);
```

## General Partition-Based

```julia
α = learnParameters(H, Z_dyadic, Ω_1, α; n_iters = 100, amin = -10, amax = 10)
```

```
10-element Array{Float64,1}:
 -2.360679774997898
  2.867295705087045
  6.742418369760678
  7.6790213643603105
  7.336111719808482
  7.1070076127661705
  0.9781147969257838
  1.1851004710096498
  1.407190951502974
  1.5977988863173893
```

```julia
@time Z = HypergraphModularity.HyperLouvain(H, kmax, Ω_1; α=α, verbose = true);
println(length(unique(Z)))
```

```
Louvain Iteration 1
Louvain Iteration 2
Louvain Iteration 3
Louvain Iteration 4
  4.226131 seconds (53.82 M allocations: 2.474 GiB, 9.95% gc time)
10
```


## Sum of Exterior Degrees

```julia
α = zeros(2*kmax);
```

```julia
α = learnParameters(H, Z_dyadic, Ω_2, α; n_iters = 1000, amin = -50, amax = 50)
```

```
10-element Array{Float64,1}:
 -11.80339887498949
  -3.480788262951598
  -8.843804518544633
 -10.204497666444768
  -9.538412408080816
  11.80339887498948
  -1.9630259976190572
  -6.404009175325025
  -5.862050004054816
  -2.755809126649543
```

```julia
@time Z = HypergraphModularity.HyperLouvain(H, kmax, Ω_2; α=α, verbose = true);
println(length(unique(Z)))
```

```
Louvain Iteration 1
Louvain Iteration 2
Louvain Iteration 3
Louvain Iteration 4
  3.657383 seconds (52.69 M allocations: 2.030 GiB, 10.10% gc time)
9
```


## All-or-nothing

```julia
α = zeros(2*kmax);
```

```julia
α = learnParameters(H, Z_dyadic, Ω_3, α; n_iters = 1000, amin = -50, amax = 50)
```

```
10-element Array{Float64,1}:
 -11.80339887498949
   2.0525828740491034
   6.288909268980722
   8.348388042841721
   8.47997993783242
  11.80339887498948
   4.268014184364058
  10.636084335307212
  15.025848697532082
  17.565827200431848
```
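From the definition of `ω_3`, a within-group edge of size $k$ has $2^{\alpha_k}$ times the intensity of a cut edge of the same size, so the learned parameters can be read as intensity ratios. The snippet below simply copies the printed vector; `ratio` is a name introduced here for illustration.

```julia
# Learned all-or-nothing parameters, copied from the output above.
α = [-11.80339887498949, 2.0525828740491034, 6.288909268980722,
     8.348388042841721, 8.47997993783242, 11.80339887498948,
     4.268014184364058, 10.636084335307212, 15.025848697532082,
     17.565827200431848]
kmax = length(α) ÷ 2   # 5 for this dataset

# ω_3(1, k) / ω_3(0, k) = 2^α[k]: within-group vs. cut intensity for size k.
ratio = 2 .^ α[1:kmax]
```

For these values the ratio increases with edge size from $k = 2$ to $k = 5$.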

```julia
Z = HypergraphModularity.HyperLouvain(H, kmax, Ω_3; α=α, verbose = true);
println(length(unique(Z)))
```

```
Louvain Iteration 1
Louvain Iteration 2
Louvain Iteration 3
Louvain Iteration 4
7
```
