Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Unusual behavior for HLL with register width 6 #10

Closed
Akninirith opened this issue Jul 15, 2014 · 5 comments
Closed

Unusual behavior for HLL with register width 6 #10

Akninirith opened this issue Jul 15, 2014 · 5 comments

Comments

@Akninirith
Copy link

There seems to be some kind of a bug with HLL execution; when one instantiates an HLL as having a regwidth of 6, the cardinality returned is consistently 0 at large sample sizes. The following test was constructed and run in FullHLLTest. Could this be something to do with cutoff measurements? Thanks!

/**
 * Test case for 6-bit cutoff problems
 */
@Test
public void sixBitSixteenKRegisterCutoffBugTest() {
    Long temp = System.currentTimeMillis();

    // this does NOT work
    final HLL brokenHll4 = new HLL(13, 6);
    {
        Random rand = new Random(temp);
        for(int i=0; i<16384*16384; i++) {
            long random = rand.nextLong();
            brokenHll4.addRaw(random);
        }

        final long cardinality = brokenHll4.cardinality();

        assertTrue(cardinality > 0); // this does NOT work
    }

    // this DOES work
    final HLL brokenHll5 = new HLL(13, 5);
    {
        Random rand = new Random(temp);
        for(int i=0; i<16384*16384; i++) {
            long random = rand.nextLong();
            brokenHll5.addRaw(random);
        }

        final long cardinality = brokenHll5.cardinality();

        assertTrue(cardinality > 0); // this DOES work
    }
}
@blinsay
Copy link
Contributor

blinsay commented Jul 16, 2014

Thanks for finding this!

It looks like HLLUtil#largeEstimator is returning NaN, which then turns into 0 when it gets cast to a long. The argument going into largeEstimator seems sane at first glance, but I'll have to take a deeper look later this week.

@blinsay
Copy link
Contributor

blinsay commented Jul 16, 2014

It doesn't look like this was a regression introduced in any of the recent changes to the estimator code. I checked out de5cc2d and it also has the same issue.

@ghost
Copy link

ghost commented Jul 16, 2014

We hit a long overflow because computing 2L for L > 63 isn't possible with Java's 63 bit signed integer. To see this, look here and note that for regWidth = 6, log2m = 13 we have:

(((1 << regWidth) - 1 - 1) + log2m) = 64 - 2 + 13 = 75

and 275 is greater than Long.MAX_VALUE.

I believe we can safely move TWO_TO_L to be a double[] since all calculations that use it end up converting it to a double anyway.

@ghost ghost closed this as completed in 3e596f8 Jul 16, 2014
@ghost
Copy link

ghost commented Jul 16, 2014

I'll push a new jar to Sonnatype in a bit.

@ghost
Copy link

ghost commented Jul 28, 2014

Sorry for the delay, but it should be in the central repo within a few hours.

This issue was closed.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants