In [1]:
using StatsBase
using Combinatorics

In [2]:
Z = [1 1 2 2 3 3 4 4 4 5 5] # group partition
D = [3 4 2 5 6 4 3 2 5 2 2] # degree sequence

1×11 Array{Int64,2}:
 3  4  2  5  6  4  3  2  5  2  2

First thing we'll do: let's check the formula

$$\sum_{R \in [n]^\ell} \mathbb{I}(\#(\mathbf{z}_R) = p) \sigma(\theta_R) = \sum_{y: \#y = p}\prod_{a = 1}^{\ell}v_{y_a}\;,$$

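In this formula, $R$ ranges over $[n]^\ell$, where $\ell$ is the number of nodes per hyperedge and $n$ is the number of nodes. Here $\#(\cdot)$ is the partition type of a tuple: its label multiplicities sorted in decreasing order, which the code below computes with `countmap` and a sort. This is **different** from the current version of the notes, and matches the conversation we had over email.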
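To make the indicator $\mathbb{I}(\#(\mathbf{z}_R) = p)$ concrete, here is a minimal sketch of the partition type of a label tuple. The helper name `partition_type` is hypothetical (it is not part of this notebook), and it uses only Base Julia rather than `countmap`.

```julia
# Hypothetical helper (not part of the original notebook): the partition type
# of a label tuple, i.e. the label multiplicities sorted in decreasing order.
function partition_type(labels)
    counts = Dict{eltype(labels),Int}()
    for z in labels
        counts[z] = get(counts, z, 0) + 1   # count occurrences of each label
    end
    return sort!(collect(values(counts)), rev=true)
end

partition_type([4, 1, 4, 2])   # label 4 twice, labels 1 and 2 once each
partition_type((3, 3, 3))      # a single label repeated three times
```

A tuple contributes to the sum for `p = [2, 1, 1]` exactly when its labels have this type, regardless of which labels they are or the order in which they appear.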



In [3]:
# Naive evaluation of the sum (LHS)

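function evalSum(p, Z, D)
    n = length(Z)
    ℓ = sum(p)          # tuple length = total size of the partition p

    S = 0

    # all ordered ℓ-tuples of node indices, i.e. [n]^ℓ
    T = Iterators.product((1:n for i = 1:ℓ)...)

    for R in T
        idx = collect(R)
        # partition type of the tuple: label multiplicities, sorted decreasingly
        a = sort(collect(values(countmap(vec(Z[idx])))), rev=true)
        if a == p
            S += prod(D[idx])   # product of the degrees of the nodes in R
        end
    end
    return S
end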

# Faster evaluation (RHS)

function evalSum2(p, Z, D)
    n = length(Z)
    ℓ = sum(p)
    S = 0

    # V[s] = volume of group s: total degree of the nodes with label s
    V = [sum([(Z[i] == s)*D[i] for i = 1:n]) for s = 1:maximum(Z)]

    # all ordered ℓ-tuples of group labels
    P = Iterators.product((1:maximum(Z) for i = 1:ℓ)...)
    for p_ in P
        labels = collect(p_)
        # partition type of the label tuple
        a = sort(collect(values(countmap(labels))), rev=true)
        if a == p
            S += prod(V[labels])
        end
    end

    return S
end

evalSum2 (generic function with 1 method)

In [17]:
# test: check that these two functions give the same result on all partitions

ℓ = 4

for i = 1:ℓ
    for j = 1:i
        for p in partitions(i,j)
            println(evalSum(p, Z, D) == evalSum2(p, Z, D))
        end
    end
end

true
true
true
true
true
true
true
true
true
true
true


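OK, looks good! `evalSum2` is a lot faster than `evalSum`, although both are very slow: each enumerates every $\ell$-tuple ($n^\ell$ node tuples for `evalSum`, $k^\ell$ label tuples for `evalSum2`, where $k$ is the number of groups) and allocates new arrays in every iteration. Presumably some of this is due to inefficient coding on my part.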

In [54]:
p = [2, 1, 1]
@time evalSum(p, Z, D)
@time evalSum2(p, Z, D)

  0.770096 seconds (343.12 k allocations: 1.026 GiB, 22.87% gc time)
  0.029999 seconds (13.80 k allocations: 44.872 MiB, 24.96% gc time)


1175616
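As an illustration of where the allocations in `evalSum2` above could be trimmed, here is a sketch that reuses one counts buffer and computes the volumes in a single pass. The names `volumes` and `evalSum2_lowalloc` are hypothetical, the code uses only Base Julia, and it is still the same brute-force enumeration of label tuples, so it has not been benchmarked against the original.

```julia
Z = [1 1 2 2 3 3 4 4 4 5 5]   # group partition, as in the notebook
D = [3 4 2 5 6 4 3 2 5 2 2]   # degree sequence, as in the notebook

# Hypothetical helper: group volumes in one pass over the nodes.
function volumes(Z, D)
    V = zeros(Int, maximum(Z))
    for (z, d) in zip(Z, D)
        V[z] += d
    end
    return V
end

# Sketch of a lower-allocation variant of evalSum2 (same sum, same enumeration).
function evalSum2_lowalloc(p, Z, D)
    V = volumes(Z, D)
    k, ℓ = length(V), sum(p)
    target = sort(collect(p), rev=true)
    counts = zeros(Int, k)            # reused for every label tuple
    S = 0
    for y in Iterators.product(ntuple(_ -> 1:k, ℓ)...)
        fill!(counts, 0)
        for g in y
            counts[g] += 1            # multiplicity of each label in y
        end
        # partition type of y: nonzero multiplicities, sorted decreasingly
        if sort!(filter(c -> c > 0, counts), rev=true) == target
            S += prod(V[g] for g in y)
        end
    end
    return S
end

evalSum2_lowalloc([2, 1, 1], Z, D)   # should agree with evalSum2 for the same p
```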

Next, we'll try using a small lemma from the notes to evaluate all of the sums efficiently by recursion. I imagine this must be related to Austin's dynamic programming approach.
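The notes themselves are not reproduced here, so the following is a reading of the recurrence as it appears in the `evalSums` code in this notebook; the notation $M$, $N$, $\mu_r$ and $c_j$ is an assumption chosen to mirror the variable names. With $v_s$ the volume of group $s$ and $\mu_r = \sum_s v_s^r$, define for a partition $p = (p_1, \dots, p_k)$ of $\ell$

$$M(p) = \sum_{\substack{(s_1,\dots,s_k) \\ \text{pairwise distinct}}} \prod_{a=1}^{k} v_{s_a}^{p_a}\;.$$

Letting $s_k$ range freely and then subtracting the cases where it collides with some earlier $s_i$ gives

$$M(p_1,\dots,p_k) = \mu_{p_k}\, M(p_1,\dots,p_{k-1}) - \sum_{i=1}^{k-1} M(p_1,\dots,p_i + p_k,\dots,p_{k-1})\;, \qquad M(\emptyset) = 1\;,$$

and the code then appears to rescale to a sum over all label tuples of type $p$:

$$N(p) = M(p)\,\binom{\ell}{p_1,\dots,p_k} \Big/ \prod_j c_j!\;,$$

where $c_j$ is the number of parts of $p$ equal to $j$. For the example data, the volumes are $v = (7, 7, 10, 10, 4)$, so $\mu_1 = 38$, $\mu_2 = 314$ and $\mu_3 = 2750$. Then $M(2,1) = \mu_1 \mu_2 - \mu_3 = 38 \cdot 314 - 2750 = 9182$ and $N(2,1) = 9182 \cdot 3 = 27546$, while $M(1,1) = 38^2 - 314 = 1130$ and $N(1,1) = 1130 \cdot 2 / 2! = 1130$.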

In [55]:
"""
Utility function: second term in the recurrence in the notes
"""
function correctOvercounting(M, p)
    pk = p[end]
    S = 0
    for i = 1:length(p)-1
        p_ = p[1:(end-1)]     # slicing already returns a copy
        p_[i] += pk           # merge the last part into part i
        S += M[sort(p_, rev=true)]
    end
    return S
end

"""
Z: an Array of integer group labels
D: an Array of degrees
ℓ: the largest hyperedge size to compute
"""
function evalSums(Z, D, ℓ)

    V = [sum([D[i]*(Z[i] == j) for i in 1:length(Z)]) for j in 1:maximum(Z)] # group volumes
    μ = [sum(V.^i) for i = 1:ℓ]                                              # power sums of the volumes

    M = Dict()

    for i = 1:ℓ
        for j = 1:i # number of nonzero entries
            for p in partitions(i, j)
                # recurrence: peel off the last part, then correct for overcounting
                M[p] = μ[p[end]]*get(M, p[1:(end-1)], 1) - correctOvercounting(M,p)
            end
        end
    end

    # rescale each entry by multinomial(p...) divided by the product of
    # factorials of the multiplicities of equal parts
    N = Dict()
    for p in keys(M)
        factor = 1
        counts = values(countmap(p))
        for c in counts
            factor *= factorial(c)
        end

        N[p] = M[p] * multinomial(p...) ÷ factor
    end
    return N
end

evalSums (generic function with 2 methods)

This algorithm is MUCH faster: 

In [62]:
# need to run this block twice in order to avoid timing compile time
ℓ = 5
@time M = evalSums(Z, D, ℓ)

  0.000728 seconds (538 allocations: 764.250 KiB)


Dict{Any,Any} with 18 entries:
  [2, 2, 1]       => 23050800
  [1, 1, 1, 1]    => 346080
  [3, 2]          => 6288620
  [2, 2]          => 220614
  [3]             => 2750
  [1, 1]          => 1130
  [1, 1, 1]       => 24576
  [2]             => 314
  [2, 1, 1]       => 1175616
  [2, 1]          => 27546
  [4, 1]          => 3587830
  [4]             => 25058
  [1]             => 38
  [5]             => 234638
  [2, 1, 1, 1]    => 26997600
  [3, 1, 1]       => 16723680
  [1, 1, 1, 1, 1] => 2352000
  [3, 1]          => 317768
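The comment about running the timing cell twice reflects Julia's just-in-time compilation: the first call to a function includes compile time. One way to avoid re-running the cell is a warm-up call before timing, sketched below with Base Julia only. The helper name `timed_after_warmup` is hypothetical; the `@btime` macro from the BenchmarkTools.jl package is a more thorough alternative.

```julia
# Hypothetical helper: time f(args...) after one warm-up call,
# so that compilation is not included in the measurement.
function timed_after_warmup(f, args...)
    f(args...)                     # first call triggers compilation
    return @elapsed f(args...)     # second call measures run time in seconds
end

t = timed_after_warmup(sum, rand(1000))
```

In this notebook the same pattern would read `timed_after_warmup(evalSums, Z, D, ℓ)`.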

For comparison, let's implement `evalSums2` by just running `evalSum2` for each value of `p`: 

In [65]:
# comparison to using evalSum2 from above
function evalSums2(Z, D, ℓ)
    N = Dict()
    for i = 1:ℓ
        for j = 1:i # number of nonzero entries
            for p in partitions(i, j)
                N[p] = evalSum2(p, Z, D)
            end
        end
    end
    return(N)
end

evalSums2 (generic function with 1 method)

In [None]:
@time N = evalSums2(Z, D, ℓ)

OK, so the main thing these timings tell me is that I wrote `evalSum2` really inefficiently, but this still seems promising.
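As an independent sanity check on the recurrence used in `evalSums`, the sketch below compares it with a direct sum over tuples of pairwise distinct groups, before the multinomial rescaling. It is a self-contained Base Julia illustration, not the notebook's implementation: the names `M_rec` and `M_brute` are hypothetical, and the volumes are hard-coded from the example `Z` and `D`.

```julia
# Group volumes for the example data (sums of D within each group of Z)
V = [7, 7, 10, 10, 4]

# Power sums μ_r = Σ_s V[s]^r
μ(r) = sum(V .^ r)

# Recurrence as read off from evalSums:
# M(p) = μ(p_k) * M(p without its last part)
#        - Σ_i M(p with its last part merged into part i)
function M_rec(p::Vector{Int})
    isempty(p) && return 1
    head, pk = p[1:end-1], p[end]
    S = μ(pk) * M_rec(head)
    for i in eachindex(head)
        q = copy(head)
        q[i] += pk
        S -= M_rec(q)
    end
    return S
end

# Direct definition: sum over ordered tuples of pairwise distinct groups
function M_brute(p::Vector{Int})
    k = length(V)
    S = 0
    for g in Iterators.product(ntuple(_ -> 1:k, length(p))...)
        allunique(g) || continue
        S += prod(V[g[i]]^p[i] for i in eachindex(p))
    end
    return S
end

M_rec([2, 1, 1]) == M_brute([2, 1, 1])   # recurrence vs. definition
```

Multiplying `M_rec([2, 1, 1])` by `multinomial(2, 1, 1) ÷ factorial(2) = 6` gives the `[2, 1, 1]` entry reported by `evalSums`.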