Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix cardinality memory-usage considerations. #5452

Closed
wants to merge 1 commit into from

Conversation

jpountz
Copy link
Contributor

@jpountz jpountz commented Mar 18, 2014

Default precision was computed based on the number of MULTI_BUCKET parents
instead of PER_BUCKET.

The ordinals-based execution mode was almost always used although ordinals
might have non-negligible memory usage compared to the counters.

Default precision was computed based on the number of MULTI_BUCKET parents
instead of PER_BUCKET.

The ordinals-based execution mode was almost always used although ordinals
might have non-negligible memory usage compared to the counters.

Close elastic#5452
final long ordinalsMemoryUsage = OrdinalsCollector.memoryOverhead(maxOrd);
final long countsMemoryUsage = HyperLogLogPlusPlus.memoryUsage(precision);
// only use ordinals if they don't increase memory usage by more than 25%
if (ordinalsMemoryUsage < countsMemoryUsage / 4) {System.out.println("ORDS");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

forgot to remove the sys out?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I actually fixed it before pushing but forgot to commit. :-)

@markharwood
Copy link
Contributor

Nice - tested on my problem dataset/query and the memory use is significantly reduced.

@uboness
Copy link
Contributor

uboness commented Mar 19, 2014

LGTM (I can see the confusion with the MULTI_BUCKET vs PER_BUCKET... I'm open to other names that might be more descriptive... (though not that simple to find those names ;))

@jpountz jpountz closed this in ecdcc2d Mar 20, 2014
jpountz added a commit that referenced this pull request Mar 20, 2014
Default precision was computed based on the number of MULTI_BUCKET parents
instead of PER_BUCKET.

The ordinals-based execution mode was almost always used although ordinals
might have non-negligible memory usage compared to the counters.

Close #5452
@jpountz jpountz deleted the fix/cardinality_memory_usage branch March 20, 2014 09:03
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants