
RoaringBitmap benchmark becomes slow between Druid 0.13 and Druid 0.18 #9920

@yinghao-wang

Description

Affected Version

Druid 0.13.0 ~ Druid 0.18.0

Pull down Druid 0.13 and Druid 0.18 from the community. After compiling, run each version's RoaringBitmap benchmark (command: `java -jar benchmarks.jar BitmapIterationBenchmark -p bitmapAlgo="roaring"`). Druid 0.13 finishes within ten minutes, while Druid 0.18 requires more than an hour, and the results are not as expected. There are two cases in which Druid 0.18 is much slower than Druid 0.13.

Druid 0.13 uses RoaringBitmap 0.5.18, and Druid 0.18 uses RoaringBitmap 0.8.11.
All data below comes from a Mac with 6 CPUs and 16 GB memory.
I also tried it on a server with 48 CPUs and 250 GB memory with similar results, indicating that the test results are not related to machine size.

Benchmark results for Druid 0.13:
[screenshot: JMH results, Druid 0.13]
Benchmark results for Druid 0.18:
[screenshot: JMH results, Druid 0.18]

[screenshot: JMH results, Druid 0.18, continued]

The benchmarks above were run many times with similar results each time.

Below is my troubleshooting process.
Phenomenon 1: the total benchmark time becomes longer.
The benchmark execution slowed down between 0.15 and 0.16 (I am not sure why). Running `-p bitmapAlgo="roaring" -p n=100 -p prob=0.1` alone, the benchmark time increased from 40 s (0.15) to 6 min (0.16).
[screenshot: benchmark timing, 0.15 vs 0.16]
Phenomenon 2: the average time of two specific cases becomes larger.
In the two intersectionAndIter cases (n = 100, prob = 0.1) and (n = 100, prob = 0.5), the score rose markedly between 0.13 and 0.14: from roughly 3–5 million (0.13) to 100–200 million (0.14).
Case 1 (n = 100, prob = 0.1):
[screenshot: JMH score, case 1]

Case 2 (n = 100, prob = 0.5):
[screenshot: JMH score, case 2]

Although phenomenon 1 slows down the overall benchmark execution, I don't think it reflects a real performance regression. What really matters is the score of the two cases in phenomenon 2, so below I try to analyze why the score of those two cases rises.
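For context on what the regressing cases measure: intersectionAndIter intersects n bitmaps and iterates over the result. The following is a minimal stdlib sketch of that shape using `java.util.BitSet` (the real benchmark uses RoaringBitmap and JMH; the method name and parameters here are my own illustration):

```java
import java.util.BitSet;
import java.util.Random;

public class IntersectAndIter {
    // Build n random bitmaps over `size` bits, where each bit is set with the
    // given probability, intersect them all, and count the surviving bits.
    static int intersectionCardinality(int n, double prob, int size, long seed) {
        Random rnd = new Random(seed);
        BitSet acc = new BitSet(size);
        acc.set(0, size); // start from the full set
        for (int i = 0; i < n; i++) {
            BitSet b = new BitSet(size);
            for (int j = 0; j < size; j++) {
                if (rnd.nextDouble() < prob) b.set(j);
            }
            acc.and(b);
        }
        // Iterate the intersection, as the benchmark's "AndIter" step does.
        int count = 0;
        for (int j = acc.nextSetBit(0); j >= 0; j = acc.nextSetBit(j + 1)) count++;
        return count;
    }

    public static void main(String[] args) {
        // With prob = 0.1 and n = 100, a bit survives with probability
        // 0.1^100, so the intersection is essentially always empty; the
        // benchmark's cost is dominated by the intersection work itself.
        System.out.println(intersectionCardinality(100, 0.1, 1 << 16, 42L));
    }
}
```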

Analyzing the score rise of phenomenon 2 from test data:
Based on the Druid 0.13 code, I swapped in different RoaringBitmap versions; the test results are recorded below:
[screenshot: benchmark results per RoaringBitmap version]

[screenshot: benchmark results per RoaringBitmap version, continued]

From the data in the table, after the community change #6764 was merged, the score of the above two scenarios rises significantly.
Analyzing the score rise from the flame graphs:
I modified the benchmark code to run only the two slower scenarios above (prob = 0.1 and 0.5), ran them multiple times on Druid 0.13 and Druid 0.14, and captured the flame graphs below. (Because the benchmark's fork is set to 1, each case forks a child process to run in, so the flame graphs cover both the main process and the child process of each specific scenario.)
Druid 0.13 benchmark main process: [flame graph]
Druid 0.13 child process, intersectionAndIter with prob = 0.1: [flame graph]
Druid 0.13 child process, intersectionAndIter with prob = 0.5: [flame graph]
Druid 0.14 benchmark main process: [flame graph]
Druid 0.14 child process, intersectionAndIter with prob = 0.1: [flame graph]
Druid 0.14 child process, intersectionAndIter with prob = 0.5: [flame graph]
Druid 0.18 child process, intersectionAndIter with prob = 0.1: [flame graph]
Druid 0.18 child process, intersectionAndIter with prob = 0.5: [flame graph]

The flame graphs above were captured multiple times; one capture of each was taken at random.
From them, it can be observed that different Druid versions use different RoaringBitmap containers for intersections: Druid 0.13 uses ArrayContainer or BitmapContainer, while Druid 0.14 uses RunContainer (since both the 0.14 and 0.18 flame graphs show RunContainer, I speculate that every version from 0.14 to 0.18 uses RunContainer).
I think RunContainer compresses poorly under this benchmark: looking at the code that constructs the fake data, the generated data is very sparse and the values are not consecutive.
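To make the container trade-off concrete, here is a rough sketch of the size comparison a RoaringBitmap-style library makes when deciding whether run-length encoding pays off for a 2^16-value chunk. The byte costs (2 bytes per array value, 8192 bytes per bitmap, 2 + 4 bytes per run) match my understanding of the serialized formats, but this is an illustration, not the library's actual code:

```java
public class ContainerChoice {
    static final int BITMAP_CONTAINER_BYTES = 8192; // 65536 bits / 8

    // Array container costs 2 bytes per value but is only used up to 4096
    // values; beyond that a fixed-size bitmap container is cheaper.
    static int arrayOrBitmapBytes(int cardinality) {
        return Math.min(2 * cardinality, BITMAP_CONTAINER_BYTES);
    }

    // Run container: 2-byte run count plus 4 bytes per run (start, length).
    static int runBytes(int numRuns) {
        return 2 + 4 * numRuns;
    }

    static String choose(int cardinality, int numRuns) {
        if (runBytes(numRuns) < arrayOrBitmapBytes(cardinality)) return "RunContainer";
        return cardinality <= 4096 ? "ArrayContainer" : "BitmapContainer";
    }

    public static void main(String[] args) {
        // 6000 consecutive values collapse into one run: run encoding wins.
        System.out.println(choose(6000, 1));
        // 6000 scattered values form ~6000 runs: the bitmap wins easily.
        System.out.println(choose(6000, 6000));
    }
}
```

The point for this benchmark: sparse, non-consecutive data produces almost one run per value, so a RunContainer chosen (or kept) for such data carries more overhead than an array or bitmap representation.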

Code that constructs the fake data:
[screenshot: benchmark setup code]
[screenshot: benchmark setup code, continued]

Observing the setup code, the initialized bitmaps are very sparse and the values are not consecutive. In this scenario I think RunContainer cannot match the compression efficiency of BitmapContainer or ArrayContainer, which may cause performance problems.
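A quick stdlib experiment supports this: if each bit in a 65536-bit chunk is set independently with probability 0.1 (mirroring the benchmark's prob-based fake data, though the exact generator here is my assumption), nearly every set bit starts its own run, so run encoding is much larger than array encoding:

```java
import java.util.Random;

public class SparseRuns {
    // Returns {cardinality, numRuns} for a random 65536-bit chunk where each
    // bit is set independently with the given probability.
    static int[] stats(long seed, double prob) {
        Random rnd = new Random(seed);
        boolean[] bits = new boolean[1 << 16];
        int cardinality = 0, runs = 0;
        for (int i = 0; i < bits.length; i++) {
            if (rnd.nextDouble() < prob) {
                bits[i] = true;
                cardinality++;
                if (i == 0 || !bits[i - 1]) runs++; // a new run starts here
            }
        }
        return new int[] { cardinality, runs };
    }

    public static void main(String[] args) {
        int[] r = stats(42L, 0.1);
        // With prob = 0.1, a set bit is followed by an unset bit ~90% of the
        // time, so runs ≈ 0.9 * cardinality, and run encoding (2 + 4 * runs
        // bytes) costs far more than array encoding (2 * cardinality bytes).
        System.out.println("cardinality=" + r[0] + ", runs=" + r[1]
                + ", runBytes=" + (2 + 4 * r[1]) + ", arrayBytes=" + (2 * r[0]));
    }
}
```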

Conclusion: the community change #6764 may be related to RoaringBitmap using RunContainer when the benchmark bitmaps are constructed. The impact of the #6764 change needs further observation and analysis. I hope others can help look into these problems; the above is just my personal analysis.
