
RoaringBitmap benchmark becomes slow between Druid 0.13 and Druid 0.18 #9920

@yinghao-wang

Description

Affected Version

Druid 0.13.0 ~ Druid 0.18.0

Pull down Druid 0.13 and Druid 0.18 from the community. After compiling, run each version's RoaringBitmap benchmark (command: `java -jar benchmarks.jar BitmapIterationBenchmark -p bitmapAlgo="roaring"`). Druid 0.13 finishes within ten minutes, while Druid 0.18 requires more than an hour, and the results are not as expected. There are two cases in which Druid 0.18 is much slower than Druid 0.13.

Druid 0.13 uses RoaringBitmap 0.5.18, and Druid 0.18 uses RoaringBitmap 0.8.11.
All data below comes from a Mac with 6 CPUs and 16 GB memory.
I also tried it on a server with 48 CPUs and 250 GB memory with similar results, indicating that the test results are not related to machine size.

Benchmark results for Druid 0.13:
[screenshot: JMH results, Druid 0.13]
Benchmark results for Druid 0.18:
[screenshot: JMH results, Druid 0.18]

[screenshot: JMH results, Druid 0.18, continued]

The benchmarks above were run many times with similar results each time.

Below is my troubleshooting process.
Phenomenon 1: the total benchmark time becomes longer.
The benchmark execution slowed down between 0.15 and 0.16 (I am not sure why). Running `-p bitmapAlgo="roaring" -p n=100 -p prob=0.1` alone, the benchmark time increased from 40 s (0.15) to 6 min (0.16).
[screenshot: benchmark timing, 0.15 vs 0.16]
Phenomenon 2: the average time of two specific cases becomes larger.
In the two intersectionAndIter cases (n = 100, prob = 0.1) and (n = 100, prob = 0.5), the score rose markedly between 0.13 and 0.14: from roughly 3–5 million (0.13) to 100–200 million (0.14).
Case 1 (n = 100, prob = 0.1):
[screenshot: JMH score, case 1]

Case 2 (n = 100, prob = 0.5):
[screenshot: JMH score, case 2]

Although phenomenon 1 slows down the overall benchmark execution, I don't think it reflects a real performance regression. What really matters is the score of the two cases in phenomenon 2, so below I try to analyze why the score of those two cases rises.
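For context on what the regressing cases measure: intersectionAndIter intersects n bitmaps and iterates over the result. The following is a minimal stdlib sketch of that shape using `java.util.BitSet` (the real benchmark uses RoaringBitmap and JMH; the method name and parameters here are my own illustration):

```java
import java.util.BitSet;
import java.util.Random;

public class IntersectAndIter {
    // Build n random bitmaps over `size` bits, where each bit is set with the
    // given probability, intersect them all, and count the surviving bits.
    static int intersectionCardinality(int n, double prob, int size, long seed) {
        Random rnd = new Random(seed);
        BitSet acc = new BitSet(size);
        acc.set(0, size); // start from the full set
        for (int i = 0; i < n; i++) {
            BitSet b = new BitSet(size);
            for (int j = 0; j < size; j++) {
                if (rnd.nextDouble() < prob) b.set(j);
            }
            acc.and(b);
        }
        // Iterate the intersection, as the benchmark's "AndIter" step does.
        int count = 0;
        for (int j = acc.nextSetBit(0); j >= 0; j = acc.nextSetBit(j + 1)) count++;
        return count;
    }

    public static void main(String[] args) {
        // With prob = 0.1 and n = 100, a bit survives with probability
        // 0.1^100, so the intersection is essentially always empty; the
        // benchmark's cost is dominated by the intersection work itself.
        System.out.println(intersectionCardinality(100, 0.1, 1 << 16, 42L));
    }
}
```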

Analyzing the score rise of phenomenon 2 from test data:
Based on the Druid 0.13 code, I swapped in different RoaringBitmap versions; the test results are recorded below:
[screenshot: benchmark results per RoaringBitmap version]

[screenshot: benchmark results per RoaringBitmap version, continued]

From the data in the table, after the community change #6764 was merged, the score of the above two scenarios rises significantly.
Analyzing the score rise from the flame graphs:
I modified the benchmark code to run only the two slower scenarios above (prob = 0.1 and 0.5), ran them multiple times on Druid 0.13 and Druid 0.14, and captured the flame graphs below. (Because the benchmark's fork is set to 1, each case forks a child process to run in, so the flame graphs cover both the main process and the child process of each specific scenario.)
Druid 0.13 benchmark main process: [flame graph]
Druid 0.13 child process, intersectionAndIter with prob = 0.1: [flame graph]
Druid 0.13 child process, intersectionAndIter with prob = 0.5: [flame graph]
Druid 0.14 benchmark main process: [flame graph]
Druid 0.14 child process, intersectionAndIter with prob = 0.1: [flame graph]
Druid 0.14 child process, intersectionAndIter with prob = 0.5: [flame graph]
Druid 0.18 child process, intersectionAndIter with prob = 0.1: [flame graph]
Druid 0.18 child process, intersectionAndIter with prob = 0.5: [flame graph]

The flame graphs above were captured multiple times; one capture of each was taken at random.
From them, it can be observed that different Druid versions use different RoaringBitmap containers for intersections: Druid 0.13 uses ArrayContainer or BitmapContainer, while Druid 0.14 uses RunContainer (since both the 0.14 and 0.18 flame graphs show RunContainer, I speculate that every version from 0.14 to 0.18 uses RunContainer).
I think RunContainer compresses poorly under this benchmark: looking at the code that constructs the fake data, the generated data is very sparse and the values are not consecutive.
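To make the container trade-off concrete, here is a rough sketch of the size comparison a RoaringBitmap-style library makes when deciding whether run-length encoding pays off for a 2^16-value chunk. The byte costs (2 bytes per array value, 8192 bytes per bitmap, 2 + 4 bytes per run) match my understanding of the serialized formats, but this is an illustration, not the library's actual code:

```java
public class ContainerChoice {
    static final int BITMAP_CONTAINER_BYTES = 8192; // 65536 bits / 8

    // Array container costs 2 bytes per value but is only used up to 4096
    // values; beyond that a fixed-size bitmap container is cheaper.
    static int arrayOrBitmapBytes(int cardinality) {
        return Math.min(2 * cardinality, BITMAP_CONTAINER_BYTES);
    }

    // Run container: 2-byte run count plus 4 bytes per run (start, length).
    static int runBytes(int numRuns) {
        return 2 + 4 * numRuns;
    }

    static String choose(int cardinality, int numRuns) {
        if (runBytes(numRuns) < arrayOrBitmapBytes(cardinality)) return "RunContainer";
        return cardinality <= 4096 ? "ArrayContainer" : "BitmapContainer";
    }

    public static void main(String[] args) {
        // 6000 consecutive values collapse into one run: run encoding wins.
        System.out.println(choose(6000, 1));
        // 6000 scattered values form ~6000 runs: the bitmap wins easily.
        System.out.println(choose(6000, 6000));
    }
}
```

The point for this benchmark: sparse, non-consecutive data produces almost one run per value, so a RunContainer chosen (or kept) for such data carries more overhead than an array or bitmap representation.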

Code that constructs the fake data:
[screenshot: benchmark setup code]
[screenshot: benchmark setup code, continued]

Observing the setup code, the initialized bitmaps are very sparse and the values are not consecutive. In this scenario I think RunContainer cannot match the compression efficiency of BitmapContainer or ArrayContainer, which may cause performance problems.
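A quick stdlib experiment supports this: if each bit in a 65536-bit chunk is set independently with probability 0.1 (mirroring the benchmark's prob-based fake data, though the exact generator here is my assumption), nearly every set bit starts its own run, so run encoding is much larger than array encoding:

```java
import java.util.Random;

public class SparseRuns {
    // Returns {cardinality, numRuns} for a random 65536-bit chunk where each
    // bit is set independently with the given probability.
    static int[] stats(long seed, double prob) {
        Random rnd = new Random(seed);
        boolean[] bits = new boolean[1 << 16];
        int cardinality = 0, runs = 0;
        for (int i = 0; i < bits.length; i++) {
            if (rnd.nextDouble() < prob) {
                bits[i] = true;
                cardinality++;
                if (i == 0 || !bits[i - 1]) runs++; // a new run starts here
            }
        }
        return new int[] { cardinality, runs };
    }

    public static void main(String[] args) {
        int[] r = stats(42L, 0.1);
        // With prob = 0.1, a set bit is followed by an unset bit ~90% of the
        // time, so runs ≈ 0.9 * cardinality, and run encoding (2 + 4 * runs
        // bytes) costs far more than array encoding (2 * cardinality bytes).
        System.out.println("cardinality=" + r[0] + ", runs=" + r[1]
                + ", runBytes=" + (2 + 4 * r[1]) + ", arrayBytes=" + (2 * r[0]));
    }
}
```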

Conclusion: the community change #6764 may be related to RoaringBitmap using RunContainer when the benchmark bitmaps are constructed. The impact of the #6764 change needs further observation and analysis. I hope others can help look into these problems; the above is just my personal analysis.
