The adjusted Rand index returned by randindex is unexpectedly wrong when n is large (around 100,000 or more). Here is an example with a comparison to an R implementation.
using Random
using Clustering
using RCall
Random.seed!(123);
n = 100_000;
a = rand(1:3,n);
b = rand(1:3,n);
randindex(a,b)[1]
only(R"library(mclust); adjustedRandIndex($a,$b)")
which gives (Clustering.jl first, then mclust in R)
0.2933142400616828
-1.5731751561282826e-6
Since a and b are independent random labelings, the adjusted Rand index should be close to 0. The wrong values start to appear around n=83,000 for me.
As a Julia comparison, my own implementation of the adjusted Rand index gives the same result as in R:
function ari(a,b)
    table = counts(a,b)
    acounts = sum(table,dims=1)
    bcounts = sum(table,dims=2)
    score = sum([x*(x-1)/2 for x in table])
    asum = sum([x*(x-1)/2 for x in acounts])
    bsum = sum([x*(x-1)/2 for x in bcounts])
    expected = asum*bsum/binomial(sum(table),2)
    total = (asum + bsum)/2
    if total == expected
        return 0
    else
        return (score-expected)/(total-expected)
    end
end;
ari(a,b)
-1.5731751561282826e-6
I use Clustering.jl 0.14.2, Julia 1.6.2.
wildart added a commit to wildart/Clustering.jl that referenced this issue on Dec 25, 2021
for very large clusterings the agreement/disagreement counts are
very large, so we have to switch to float when multiplying them
fixes #225
enhances #227
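The overflow that the commit message refers to can be sketched as follows. This is a minimal illustration, assuming the counts are held as Int64. The names a_sq and b_sq and their values are hypothetical stand-ins (sums of squared cluster sizes for three roughly equal clusters at n = 100,000) and are not taken from the Clustering.jl source.

```julia
# Hypothetical stand-ins, NOT taken from the Clustering.jl source:
# sums of squared cluster sizes for n = 100_000 points in 3 equal clusters.
n = 100_000
a_sq = 3 * (n ÷ 3)^2      # about n^2 / 3, fits comfortably in Int64
b_sq = a_sq

# Int64 multiplication wraps around silently when the result is too large.
wrapped = 2 * (a_sq * b_sq)

# Converting one operand to Float64 before multiplying avoids the wraparound.
exact = 2 * (float(a_sq) * b_sq)

# Base.Checked.checked_mul throws an OverflowError instead of wrapping,
# which makes the problem visible.
overflowed = try
    Base.Checked.checked_mul(a_sq, b_sq)
    false
catch err
    err isa OverflowError
end

println("Int64 product:   ", wrapped)
println("Float64 product: ", exact)
println("checked_mul overflowed: ", overflowed)
```

The ari function earlier in this issue avoids the problem because x*(x-1)/2 uses /, which returns a Float64 for integer arguments in Julia, so asum*bsum is already a floating-point product.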