This notebook measures how much the flow-based algorithm (localImprove) improves the conductance of clusters found by prn, and compares it with two heuristics (refineCut and dumb). All tests use graphs with one million vertices generated by the chimera function. The maximum component size that localImprove may return (its maxSize parameter) is set to 10,000. To keep the results consistent, we only count runs in which the set returned by prn has at least 30 vertices; runs with smaller sets are discarded.
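For reference, the `condTest` function below passes localImprove an `epsSigma` value equal to the ratio of the volume of the prn set $S$ to the volume of its complement. Writing $d_i$ for the (weighted) degree of vertex $i$, and assuming getVolume returns the sum of degrees over the given vertices, the quantity computed is:

$$
\operatorname{vol}(S) = \sum_{i \in S} d_i,
\qquad
\varepsilon_\sigma = \frac{\operatorname{vol}(S)}{\operatorname{vol}(V \setminus S)}
$$

The variable name `minEpsSigma` suggests this is intended as the smallest admissible value of the parameter; that reading is an assumption, not something this notebook checks.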

In [1]:
using Laplacians
using Statistics  # mean and median (a standard library since Julia 0.7)

In [2]:
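function condTest(minSize)
    # Fresh random graph with one million vertices for every test
    a = chimera(1000000)
    # Initial cluster computed by prn
    s = prn(a, [1,2,3], 0.5, 5);
    # Discard runs where prn returns a set that is too small
    if length(s) < minSize
        return -1,0,0,0
    end
    conds = compConductance(a, s)
    
    # Flow-based improvement, with epsSigma = vol(S) / vol(V \ S)
    # (the adjacency matrix is square, so a.n is the number of vertices)
    minEpsSigma = getVolume(a, s) / getVolume(a, setdiff(collect(1:a.n), s))
    cut, flow = localImprove(a, s, epsSigma = minEpsSigma, maxSize = 10000)
    condcut = compConductance(a, cut)
    impr_cut = (conds - condcut) / conds * 100
    
    # Heuristic improvement: refineCut
    heur_refcut = refineCut(a, s)
    condref = compConductance(a, heur_refcut)
    impr_ref = (conds - condref) / conds * 100
    
    # Heuristic improvement: dumb
    heur_dumb = dumb(a, s)
    conddumb = compConductance(a, heur_dumb)
    impr_dumb = (conds - conddumb) / conds * 100
    
    # Initial conductance, then (conductance, % improvement) for each method
    return conds, (condcut, impr_cut), (condref, impr_ref), (conddumb, impr_dumb)
end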

initial = []
flowbased = []
heurbased = []
dumbbased = []

print("Progress: ")
@time for i in 1:500
    x,y,z,t = condTest(30)
    if x == -1
        continue
    end
    
    print("*")
    
    push!(initial, x)
    push!(flowbased, y)
    push!(heurbased, z)
    push!(dumbbased, t)
end
print("\n")

Progress: *********************************************************************************************************************************************************************8723.849714 seconds (19.99 G allocations: 1.422 TB, 7.34% gc time)



Below are the mean and median conductance of the sets produced, in order, by prn (the initial cluster), by the flow-based improvement, and by the two heuristics (refineCut and dumb). Lower conductance is better.
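To make the quantity being compared concrete, here is a minimal sketch of the standard definition of conductance for an undirected graph stored as a symmetric sparse adjacency matrix: the weight of the edges leaving $S$ divided by the smaller of $\operatorname{vol}(S)$ and $\operatorname{vol}(V \setminus S)$. The helpers `degrees_sketch`, `volume_sketch`, `cut_sketch` and `conductance_sketch` are hypothetical illustrations, not Laplacians.jl functions; we assume, without having checked, that compConductance and getVolume follow this standard definition.

```julia
using SparseArrays

# Hypothetical helper: weighted degree of every vertex (column sums of `a`).
degrees_sketch(a::SparseMatrixCSC) = vec(Matrix(sum(a, dims = 1)))

# Hypothetical helper: vol(S), the total degree of the vertices in S.
volume_sketch(a::SparseMatrixCSC, s) = sum(degrees_sketch(a)[s])

# Hypothetical helper: cut(S), the total weight of edges with exactly one endpoint in S.
function cut_sketch(a::SparseMatrixCSC, s)
    inS = falses(size(a, 1))
    inS[s] .= true
    rows, cols, vals = findnz(a)
    total = zero(eltype(vals))
    for k in eachindex(vals)
        # Each undirected edge is stored twice; count only the (in S, not in S) copy
        if inS[rows[k]] && !inS[cols[k]]
            total += vals[k]
        end
    end
    return total
end

# Hypothetical helper: conductance = cut(S) / min(vol(S), vol(V \ S)); smaller is better.
function conductance_sketch(a::SparseMatrixCSC, s)
    vs = volume_sketch(a, s)
    vrest = sum(degrees_sketch(a)) - vs
    return cut_sketch(a, s) / min(vs, vrest)
end

# Path graph 1 - 2 - 3 - 4 with unit edge weights
a = sparse([1, 2, 2, 3, 3, 4], [2, 1, 3, 2, 4, 3], ones(6), 4, 4)
println(conductance_sketch(a, [1, 2]))  # prints 0.3333333333333333
```

For $S = \{1, 2\}$ only the edge 2 to 3 leaves the set, and both sides have volume 3, so the conductance is $1/3$.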

In [3]:
println(length(initial), " successful tests.")
# `initial` holds plain conductance values; the other arrays hold (conductance, improvement) tuples
println("Initial values (by prn): ", mean(initial), " ", median(initial))
println("Flow values: ", mean(map(x -> x[1], flowbased)), " ", median(map(x -> x[1], flowbased)))
println("refineCut values: ", mean(map(x -> x[1], heurbased)), " ", median(map(x -> x[1], heurbased)))
println("dumb values: ", mean(map(x -> x[1], dumbbased)), " ", median(map(x -> x[1], dumbbased)))

165 successful tests.
Initial values (by prn): 0.4965741795337874 0.4980392156862745
Flow values: 0.27282800388945777 0.2496068850723715
refineCut values: 0.4932065677748111 0.46730462519936206
dumb values: 0.4957761124476248 0.48394160583941603


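Below are the mean and median improvements achieved by the flow-based method and the two heuristics, in that order. The improvement of a test is the percentage reduction in conductance relative to the set returned by prn, computed separately for each test.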
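Writing $\phi$ for conductance, the per-test improvement computed in `condTest` is shown below. As a rough worked check, applying the same formula to the rounded mean conductances reported above for prn and for the flow method gives:

$$
\text{improvement} = \frac{\phi_{\text{prn}} - \phi_{\text{new}}}{\phi_{\text{prn}}} \times 100\%,
\qquad
\frac{0.4966 - 0.2728}{0.4966} \times 100\% \approx 45.1\%
$$

This is the improvement of the means, not the mean of the per-test improvements printed below, so the two need not coincide; here they happen to be close.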

In [5]:
println("Flow improvement: ", mean(map(x -> x[2], flowbased)), "% ", median(map(x -> x[2], flowbased)), "%")
println("refineCut improvement: ", mean(map(x -> x[2], heurbased)), "% ", median(map(x -> x[2], heurbased)), "%")
println("dumb improvement: ", mean(map(x -> x[2], dumbbased)), "% ", median(map(x -> x[2], dumbbased)), "%")

Flow improvement: 45.093647201815585% 50.058196890184256%
refineCut improvement: 0.7258661066646424% 6.426742117531586%
dumb improvement: 0.19371680672091846% 3.0093233637958043%
