
Improve dictionary#indexOf routine with sparse index #12294

Closed
hqx871 wants to merge 1 commit into apache:master from hqx871:hqx871/dict-sparse-index



hqx871 commented on Mar 2, 2022

Use a sparse index to improve GenericIndexed#indexOf performance, especially for large dictionaries.

Description

The sorted GenericIndexed#indexOf uses binary search to find the index of a value. Every probe of that search calls ObjectStrategy#fromByteBuffer to decode a value from the ByteBuffer, which is expensive. I suggest building a sparse index for large dictionaries when they are loaded. This gives a 40+% improvement: in the benchmark below, the average time drops from 23.760 ms/op at indexGranularity 0 to 13.390 ms/op at indexGranularity 2048.
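To illustrate the idea, here is a minimal Java sketch of a two-level lookup, assuming the sparse index keeps every indexGranularity-th dictionary value decoded in memory. It is not the PR's SparseArrayIndexed code: EncodedDictionary, SparseIndexedDictionary and the decode counter are hypothetical stand-ins, and UTF-8 String decoding plays the role of ObjectStrategy#fromByteBuffer. The lookup first binary-searches the decoded samples (no decoding), then decodes only inside the single bucket between two adjacent samples.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

public class SparseIndexSketch {

  /** Hypothetical stand-in for a sorted dictionary of encoded values. */
  static final class EncodedDictionary {
    private final ByteBuffer[] values;
    long decodes = 0; // counts decode calls so the cost is visible

    EncodedDictionary(List<String> sortedUniqueValues) {
      values = new ByteBuffer[sortedUniqueValues.size()];
      for (int i = 0; i < values.length; i++) {
        values[i] = ByteBuffer.wrap(sortedUniqueValues.get(i).getBytes(StandardCharsets.UTF_8));
      }
    }

    int size() { return values.length; }

    /** Decodes value i; plays the role of ObjectStrategy#fromByteBuffer in this sketch. */
    String get(int i) {
      decodes++;
      return StandardCharsets.UTF_8.decode(values[i].duplicate()).toString();
    }

    /** Binary search over [lo, hi], decoding one value per probe; -(insertionPoint + 1) on a miss. */
    int binarySearch(String key, int lo, int hi) {
      while (lo <= hi) {
        int mid = (lo + hi) >>> 1;
        int cmp = get(mid).compareTo(key);
        if (cmp < 0) {
          lo = mid + 1;
        } else if (cmp > 0) {
          hi = mid - 1;
        } else {
          return mid;
        }
      }
      return -(lo + 1);
    }

    int indexOf(String key) { return binarySearch(key, 0, size() - 1); }
  }

  /** Hypothetical wrapper that keeps every granularity-th value decoded in memory. */
  static final class SparseIndexedDictionary {
    private final EncodedDictionary base;
    private final int granularity;
    private final String[] samples; // samples[b] equals base.get(b * granularity)

    SparseIndexedDictionary(EncodedDictionary base, int granularity) {
      this.base = base;
      this.granularity = granularity;
      this.samples = new String[(base.size() + granularity - 1) / granularity];
      for (int b = 0; b < samples.length; b++) {
        samples[b] = base.get(b * granularity); // decoded once, at load time
      }
    }

    int indexOf(String key) {
      // Step 1: binary search over the decoded samples (no decoding).
      int pos = Arrays.binarySearch(samples, key);
      if (pos >= 0) {
        return pos * granularity;
      }
      int bucket = -pos - 2; // index of the last sample smaller than key
      if (bucket < 0) {
        return -1; // key sorts before every value: insertion point 0
      }
      // Step 2: decode only inside the bucket between two adjacent samples.
      int lo = bucket * granularity + 1;
      int hi = Math.min(lo + granularity - 2, base.size() - 1);
      return base.binarySearch(key, lo, hi);
    }
  }

  public static void main(String[] args) {
    List<String> values = new ArrayList<>();
    for (int i = 0; i < 1_000_000; i += 2) {
      values.add(String.format(Locale.ROOT, "%08d", i)); // fixed width: string order matches numeric order
    }
    EncodedDictionary dict = new EncodedDictionary(values);
    SparseIndexedDictionary sparse = new SparseIndexedDictionary(dict, 1024);

    long before = dict.decodes;
    int plain = dict.indexOf("00123457");
    long plainDecodes = dict.decodes - before;
    before = dict.decodes;
    int viaSparse = sparse.indexOf("00123457");
    long sparseDecodes = dict.decodes - before;
    System.out.println(plain + " " + viaSparse + ", decodes: " + plainDecodes + " vs " + sparseDecodes);
  }
}
```

Both paths return the same index (or the same negative insertion point). For the missing key in main, the plain search over 500,000 values decodes 18 or 19 values, while the sparse path decodes at most 10, because it only searches one 1,023-value bucket. The price is holding about n / granularity decoded values in memory and decoding them once at load time.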

Benchmark                           (cardinality)  (indexGranularity)  (searchNum)  (sparseType)  (strLen)  Mode  Cnt   Score    Error  Units
DictionaryWrapperBenchmark.indexOf        1000000                   0        10000         array        16  avgt   10  23.760  ± 2.310  ms/op
DictionaryWrapperBenchmark.indexOf        1000000                 512        10000         array        16  avgt   10  17.015  ± 4.904  ms/op
DictionaryWrapperBenchmark.indexOf        1000000                1024        10000         array        16  avgt   10  14.533  ± 3.370  ms/op
DictionaryWrapperBenchmark.indexOf        1000000                2048        10000         array        16  avgt   10  13.390  ± 0.748  ms/op
DictionaryWrapperBenchmark.indexOf        1000000                4096        10000         array        16  avgt   10  14.672  ± 1.223  ms/op
DictionaryWrapperBenchmark.indexOf        1000000                8192        10000         array        16  avgt   10  15.090  ± 0.721  ms/op
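The headline figure can be checked against the table. The small Java program below treats the indexGranularity 0 row as the baseline and, as an assumption not stated in the PR, reads indexGranularity as the number of dictionary values covered by one sparse-index entry. It recomputes the relative reduction in ms/op for each granularity and the implied number of sparse entries for 1,000,000 values; BenchmarkMath and its methods are hypothetical helpers.

```java
public class BenchmarkMath {

  /** Percentage by which score is lower than baseline. */
  static double reductionPercent(double baseline, double score) {
    return 100.0 * (baseline - score) / baseline;
  }

  /** Sparse entries needed if one entry covers `granularity` values (assumed meaning). */
  static int sparseEntries(int cardinality, int granularity) {
    return (cardinality + granularity - 1) / granularity; // ceiling division
  }

  public static void main(String[] args) {
    int cardinality = 1_000_000;
    double baseline = 23.760; // ms/op at indexGranularity 0
    int[] granularities = {512, 1024, 2048, 4096, 8192};
    double[] scores = {17.015, 14.533, 13.390, 14.672, 15.090}; // ms/op from the table
    for (int i = 0; i < granularities.length; i++) {
      System.out.printf("indexGranularity %5d: %4.1f%% lower, %d sparse entries%n",
          granularities[i], reductionPercent(baseline, scores[i]),
          sparseEntries(cardinality, granularities[i]));
    }
  }
}
```

The reductions come out to about 28.4%, 38.8%, 43.6%, 38.2% and 36.5%, so the 40+% figure corresponds to indexGranularity 2048. The error column (up to ±4.904 ms/op at 512) is wide enough that the smaller differences between rows should be read with caution.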

Key changed/added classes in this PR
  • SparseArrayIndexed
  • GenericIndexed
  • DictionaryWrapStrategy

This PR has:

  • been self-reviewed.
  • added documentation for new or modified features or behaviors.
  • added Javadocs for most classes and all non-trivial methods. Linked related entities via Javadoc links.
  • added or updated version, license, or notice information in licenses.yaml
  • added comments explaining the "why" and the intent of the code wherever it would not be obvious to an unfamiliar reader.
  • added unit tests or modified existing tests to cover new code paths, ensuring the threshold for code coverage is met.
  • added integration tests.
  • been tested in a test Druid cluster.

hqx871 changed the title from "decorate large dictionary with sparse index" to "Improve dictionary#indexOf routine with sparse index" on Mar 2, 2022
hqx871 force-pushed the hqx871/dict-sparse-index branch 7 times, most recently from 31d3eb6 to 011368e (March 3, 2022 03:54)
hqx871 force-pushed the hqx871/dict-sparse-index branch from 011368e to 48c0389 (March 3, 2022 05:38)
hqx871 closed this on Jun 14, 2022


2 participants