IncrementalIndex Tests and Benchmarks Parametrization #10593
Conversation
Force-pushed from 4a93dd0 to 288ee9e (compare)
Thanks, Liran, for this PR.
It is a very big PR overall.
The text description helps a lot in reviewing the code.
However, some classes have very large changes, and it is not always easy to trace the reason for them; see my comments below.
Adding (many) lines of documentation within the code would greatly help the next reviewers read and approve it.
@@ -227,10 +228,10 @@ public void tearDown() throws IOException

 private IncrementalIndex makeIncIndex()
 {
-   return new IncrementalIndex.Builder()
+   return new OnheapIncrementalIndex.Builder()
Shouldn't this method take a parameter to decide which type of index to return?
Or is this the default builder?
Then maybe name it buildDefaultIncIndex, and the default should be some hard-coded value that can be changed over time.
I agree. But for the sake of reducing the diff size, I'd prefer to avoid this refactor.
 {
   FileUtils.deleteDirectory(tmpDir);
The diff here is very misleading: this line is part of a one-line tearDown method that was deleted.
Note that it was not deleted. It just moved below, to the teardown of the QueryableIndexState: qIndexesDir.delete();
 public static class IncrementalIndexState
 {
   @Param({"onheap", "offheap"})
   private String indexType;
Since there is now a new extension point for the incremental index, shouldn't the type be extendable as well?
Use an enum instead of a string, with names like DefaultOnHeap and OakOffHeap, so additional on/off-heap implementations can be added in the future.
The idea here is indeed to allow a future index to be tested with the same code.
Using an enum would force this enumeration to list all existing index types in the core Druid package, even though an index may exist only as an extension.
This way (using a string), the user can choose any indexType name on the command line without it having to be pre-defined in the code.
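The extension-friendly string approach can be sketched as a name-to-factory registry. This is a hypothetical illustration (Druid actually resolves the name through Jackson's polymorphic deserialization of AppendableIndexSpec); the class and method names below are made up for the sketch:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical sketch: extensions register their index factories by name,
// so a benchmark @Param string like "onheap" or "oak" needs no enum entry
// in the core package.
public class IndexTypeRegistry
{
  private static final Map<String, Supplier<Object>> FACTORIES = new HashMap<>();

  public static void register(String indexType, Supplier<Object> factory)
  {
    FACTORIES.put(indexType, factory);
  }

  public static Object create(String indexType)
  {
    Supplier<Object> factory = FACTORIES.get(indexType);
    if (factory == null) {
      throw new IllegalArgumentException("Unknown indexType: " + indexType);
    }
    return factory.get();
  }
}
```

An extension module would call register("oak", ...) at load time, and the benchmark would pass whatever string the user supplied on the command line.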
 @TearDown
 public void tearDown()
 {
   qIndex.close();
Is there no possibility of qIndex being null, e.g., if indexFile is empty?
I'm not sure. But if setup fails, this might be an issue. Added a null check just to be safe.
 public void setup2()
 {
   incIndex = makeIncIndex();
   incFloatIndex = makeIncIndex();
Is adding all rows into one index equivalent to having three indices?
Notice that the setup level was changed to per invocation, so for each benchmark, a new index is created.
There wasn't really a need for three different indices in the first place.
 {
   log.info("SETUP CALLED AT " + System.currentTimeMillis());

   ComplexMetrics.registerSerde("hyperUnique", new HyperUniquesSerde());

   executorService = Execs.multiThreaded(numProcessingThreads, "GroupByThreadPool[%d]");
From this point onward it is a bit hard to follow the reasoning for the changes. What part of the PR description does this relate to?
These changes are due to the scoping of the benchmark. This setup method is now only in charge of initializing everything common to benchmarking both the incremental index and the queryable index.
Anything specific to the incremental or queryable index was moved to its designated scope below.
-   bufferIndex = indexAndOffset[0];
-   bufferOffset = indexAndOffset[1];
-   aggBuffer = aggBuffers.get(bufferIndex).get();
+   ByteBuffer aggBuffer = aggBuffers.get(indexAndOffset[0]).get();
What is the reasoning for these changes? Please add documentation to explain.
Before this change, the code responsible for the aggregation ran after a new row was inserted into indexAndOffsets (see line 209 below). This means the new row was visible before any data was aggregated into it.
This does not correspond to the on-heap index behavior, which first aggregates the data and then inserts the row into the index.
According to IncrementalIndexIngestionTest.testMultithreadAddFacts(), the on-heap behavior is the correct one, so I changed it accordingly so the test passes for this index as well.
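The ordering issue described above can be illustrated with a generic safe-publication sketch (simplified, not the actual OffheapIncrementalIndex code): a row's aggregation state must be fully written before the row becomes visible to concurrent readers.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Simplified sketch of the ordering fix: aggregate first, publish second.
public class SafePublish
{
  private final ConcurrentMap<String, long[]> facts = new ConcurrentHashMap<>();

  // Buggy order (for contrast): publishing an empty slot first lets a
  // concurrent reader observe a row whose aggregate was not written yet.
  public void addRowUnsafe(String key, long value)
  {
    long[] slot = new long[1];
    facts.put(key, slot);   // row visible here...
    slot[0] = value;        // ...but aggregated only here
  }

  // Fixed order, matching the on-heap behavior: write the aggregate into
  // the slot, then insert the fully initialized row into the index.
  public void addRowSafe(String key, long value)
  {
    long[] slot = new long[1];
    slot[0] = value;        // aggregate first
    facts.put(key, slot);   // publish second
  }

  public Long get(String key)
  {
    long[] slot = facts.get(key);
    return slot == null ? null : slot[0];
  }
}
```

In a single thread both orders produce the same result; the difference only shows under concurrent readers, which is why a multithreaded test like testMultithreadAddFacts() is needed to catch it.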
Hmm, I feel that this fix to OffheapIncrementalIndex could be independent, in its own separate issue and PR, since it is a bug fix; that would make it easier to track in the future.
If this bug fix goes into a separate PR, then this PR will have a failing test.
@@ -321,10 +322,10 @@ private static QueryableIndex buildIndex(String storeDoubleAsFloat) throws IOException
   )
   .build();

 final IncrementalIndex index = new IncrementalIndex.Builder()
From here onward, the same 5-line change repeats for different tests.
This is true for tests that do not test the incremental-index implementation. For such tests, we use the on-heap implementation because it is the most stable.
Tests that exercise the index itself are now parametrized, so they have more modifications than these 5 lines.
@@ -268,7 +217,7 @@ private static MapBasedInputRow getLongRow(long timestamp, int dimensionCount)
 public void testCaseSensitivity() throws Exception
 {
   long timestamp = System.currentTimeMillis();
-   IncrementalIndex index = closerRule.closeLater(indexCreator.createIndex(DEFAULT_AGGREGATOR_FACTORIES));
+   IncrementalIndex<?> index = indexCreator.createIndex((Object) DEFAULT_AGGREGATOR_FACTORIES);
Are all changes from here due to the generic type?
Yes. To allow testing any new incremental index.
Thanks for addressing the questions and issues
LGTM
 import java.nio.ByteBuffer;

 /**
  * Since the off-heap incremental index is not yet supported in production ingestion, we define its spec here only
Please add more general documentation of the role this class will play once off-heap ingestion is supported.
Added.
Force-pushed from 6a05e86 to 559a9e8 (compare)
The changes in this PR would help people evaluate the oak extension.
Thanks for the PR. I've left some comments. If you have the results from your local run of the modified benchmarks, could you please post them here?
 }

 /**
  * Add rows form any generator to an index.
Typo: form -> from
Fixed
 public static AppendableIndexSpec parseIndexType(String indexType) throws JsonProcessingException
 {
   return JSON_MAPPER.readValue(
       String.format(Locale.ENGLISH, "{\"type\": \"%s\"}", indexType),
Can we use StringUtils.format here instead?
Fixed.
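For context: the reason Druid's forbidden-apis check flags the bare String.format overload is that it uses the JVM's default locale, which can change formatting output between environments. Druid's StringUtils.format pins the locale. A minimal stand-in for such a wrapper (an illustrative sketch, not Druid's actual implementation) is:

```java
import java.util.Locale;

public final class LocaleSafeFormat
{
  // Always formats with a fixed locale, so the output does not depend on
  // the JVM's default locale (e.g., digit grouping for "%d" or the
  // decimal separator for "%f").
  public static String format(String pattern, Object... args)
  {
    return String.format(Locale.ENGLISH, pattern, args);
  }
}
```

For example, LocaleSafeFormat.format("{\"type\": \"%s\"}", "onheap") produces {"type": "onheap"} regardless of the machine's locale settings.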
processing/src/test/java/org/apache/druid/segment/incremental/IncrementalIndexCreator.java (resolved review threads)
  * @param c a list of collections of parameters
  * @return the cartesian product of all parameters
  */
 public static List<Object[]> cartesianProduct(Collection<?>... c)
Does this method need public visibility?
No. Changed to private.
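For reference, a cartesian product over varargs collections can be built with a simple accumulating loop. This is a sketch consistent with the List&lt;Object[]&gt; signature quoted above, not necessarily the PR's actual implementation:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.List;

public final class Params
{
  // Returns every combination taking one element from each input collection.
  // E.g., cartesianProduct([a, b], [1, 2]) -> [a,1], [a,2], [b,1], [b,2].
  public static List<Object[]> cartesianProduct(Collection<?>... c)
  {
    List<Object[]> result = new ArrayList<>();
    result.add(new Object[0]); // start from the empty combination

    for (Collection<?> collection : c) {
      List<Object[]> next = new ArrayList<>(result.size() * collection.size());
      for (Object[] prefix : result) {
        for (Object item : collection) {
          // extend each existing combination with one element of this collection
          Object[] row = Arrays.copyOf(prefix, prefix.length + 1);
          row[prefix.length] = item;
          next.add(row);
        }
      }
      result = next;
    }
    return result;
  }
}
```

In a parametrized JUnit test, such a helper supplies the constructor-argument rows, e.g. the product of {"onheap", "offheap"} with the other test parameters.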
@@ -110,28 +119,28 @@ public void setup() throws IOException
   );

   incIndex = makeIncIndex();
   gen.addToIndex(incIndex, rowsPerSegment);
I see there are other usages of gen.nextRow() that haven't been replaced. Is the plan to replace them in a follow-up PR?
It wasn't planned, but I don't mind creating a follow-up PR for that or replacing everything in this PR.
What do you think is better?
Changing it in a follow-up PR sounds good to me.
OK. I'll publish a new PR for this once this PR is merged.
benchmarks/src/test/java/org/apache/druid/benchmark/indexing/IncrementalIndexReadBenchmark.java (resolved review thread)
Force-pushed from 2f2bbc3 to 3a4d039 (compare)
@a2l007 Thanks for your review. I made modifications accordingly. I have results for the benchmarks I changed, but they are not with the default parameters.
Force-pushed from a7f195c to 7f6752f (compare)
@a2l007 The benchmark run results are available here: https://pastebin.pl/view/raw/8a5559c7
LGTM apart from one minor comment.
Also, I'm curious whether you've noticed any improvement in the benchmark scores, even though the benchmark-related changes are mainly refactoring and parametrization?
-   @Param({"none", "moderate", "high"})
-   private String rollupOpportunity;
+   @Param({"0", "1000", "10000"})
+   private int rollupOpportunity;
Sorry I missed this earlier, but I feel that we should retain the textual values for rollupOpportunity as that is more user-friendly when reading the benchmark results.
OK. I rolled back this change.
Force-pushed from 129abee to 334dbdd (compare)
Thanks, @a2l007. Please let me know what other modifications are required to approve this PR. Regarding your question:
First, running the benchmarks is much faster due to fewer redundant setup/teardown procedure calls (achieved via scopes). The exceptions are as follows:
I feel that this should be called out in the release notes so that users running benchmarks are aware of this change before upgrading. Could you please add a short description to the PR description mentioning the changes a user should expect when running benchmarks after upgrading to this version? This would help the release manager add the blurb to the release notes.
Thanks for the PR!
@a2l007 I added a "User Experience Changes" section in the PR description.
Force-pushed from 334dbdd to 7527dde (compare)
Can we proceed to merge this PR?
I haven't studied the patch, but the description makes sense to me (thanks for the detailed description), @a2l007 has reviewed it, and it seems pretty low risk (test/benchmark only), so I would say yes. Thanks for the contribution!
Thanks!
* Remove redundant IncrementalIndex.Builder
* Parametrize incremental index tests and benchmarks
  - Reveal and fix a bug in OffheapIncrementalIndex
* Fix forbiddenapis error: Forbidden method invocation: java.lang.String#format(java.lang.String,java.lang.Object[]) [Uses default locale]
* Fix Intellij errors: declared exception is never thrown
* Add documentation and validate before closing objects on tearDown.
* Add documentation to OffheapIncrementalIndexTestSpec
* Doc corrections and minor changes.
* Add logging for generated rows.
* Refactor new tests/benchmarks.
* Improve IncrementalIndexCreator documentation
* Add required tests for DataGenerator
* Revert "rollupOpportunity" to be a string
Fixes #10494.
Description
Note: This PR only affects tests and benchmarks.
It would help developers evaluate incremental-index extensions, such as oak-incremental-index (#10001).
#10335 added a per-implementation incremental-index builder, but the parent class builder (IncrementalIndex.Builder) was not removed, to avoid 100+ line changes in the test code.
This PR removes IncrementalIndex.Builder and refactors all its usages (only in the test/benchmark code).
In addition, where needed, parametrization was added so that both builder implementations (on-heap and off-heap) are tested/benchmarked.
Add test cases for each index type
All tests that are relevant to the incremental index were modified. The modifications include parametrizing the tests over all incremental-index implementations: on-heap and off-heap. In addition, this PR includes a bug fix in OffheapIncrementalIndex that was found using these tests.
To support this, a new helper class was added: IncrementalIndexCreator. This class handles creating the appropriate index according to its name and closing it at the end of each test.
Add benchmark cases for each index type
An incremental-index parametrization (on-heap or off-heap) was added to all the benchmarks that are relevant to the incremental index.
In addition, some of these benchmarks were modified to resolve some issues that were encountered.
We list here the additional modifications we made to some of the benchmarks:
- … tearDown() procedure
- … tearDown() procedure
- … setup()/tearDown() methods so they would not affect the measurements of the results
- … (@State(Scope.Benchmark)) that allow us to test the incremental index without the overhead of the setup procedure of the queryable-index benchmark

In addition, to reduce code duplication, a few methods were added to DataGenerator:
- void addToIndex(IncrementalIndex<?> index, int numOfRows): adds rows from this generator to an existing index
- List<InputRow> toList(int numOfRows): adds rows from this generator to a new list
: adds rows from this generator to a new listUser Experience Changes
After this PR, the user should not expect changes in most benchmarks results.
However, some benchmarks behavior will change as follows:
Runtime
Expected change: the following benchmarks will run much faster due to the elimination of redundant setup/teardown procedure calls.
However, the benchmarks' reported results should not change.
FilteredAggregatorBenchmark, GroupByBenchmark, ScanBenchmark, SearchBenchmark, TimeseriesBenchmark, TopNBenchmark
Parametrization
Expected change: the following benchmarks will have additional parametrization options, hence they might take longer to run and produce more results.
- indexType parametrization (will also test the off-heap implementation): FilteredAggregatorBenchmark, IncrementalIndexRowTypeBenchmark, IncrementalIndexReadBenchmark, IndexIngestionBenchmark, IndexPersistBenchmark, GroupByBenchmark, ScanBenchmark, SearchBenchmark, TimeseriesBenchmark, TopNBenchmark
- descending query parametrization: FilteredAggregatorBenchmark, TimeseriesBenchmark
- rollupOpportunity ingestion parametrization: IndexIngestionBenchmark
Unified Benchmarks Behaviour
Expected change: these changes affect some of the benchmarks' reported results as follows:
- rowsPerSegment parametrization was added to IncrementalIndexRowTypeBenchmark. Before, the number of rows was not parametrized, and the benchmark reported the time per single-row insertion. Now it reports the total insertion time of all the rows, like the rest of the tests report.
- ScanBenchmark, SearchBenchmark, GroupByBenchmark: now using a fixed seed, so the results are reproducible but might be slightly different from what was previously produced with a random seed.
- IndexPersistBenchmark: this benchmark previously cleaned the temporary data folder inside the tested method, instead of in the teardown procedure. For large index sizes, this affected the benchmark result significantly. With this modification, the results will be different (shorter times), but they will better reflect the "persist" performance.

This PR has: