Road to vectorized/hybrid execution
Writing this down, mostly to sort out my thoughts, but also to share this for some further discussion.
Current state
Execution of queries in CrateDB is row based. The main interface to access rows is BatchIterator, which is usually consumed like this:
while (it.moveNext()) {
    // do something with it.currentElement()
}
if (it.allLoaded()) {
    // iterator is exhausted
} else {
    it.loadNextBatch().whenComplete((r, t) -> {
        // continue consumption
    });
}
moveNext() and currentElement() are used to iterate through the rows.
loadNextBatch() can be used to avoid blocking (network) IO or to off-load computation to a different thread.
BatchIterator implementations are usually generic with regard to the element type, but most of the time the element type will be Row.
This composes well. There are implementations that wrap a source to apply a limit, filter out records and so on.
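A minimal sketch of that composability, using simplified stand-ins for the real interfaces (the actual BatchIterator has additional methods; SimpleBatchIterator, InMemoryIterator, and LimitIterator are illustrative names, not CrateDB classes):

```java
import java.util.Iterator;
import java.util.List;
import java.util.concurrent.CompletableFuture;

// Simplified stand-in for CrateDB's BatchIterator, shown only to
// illustrate how wrappers compose.
interface SimpleBatchIterator<T> {
    boolean moveNext();
    T currentElement();
    boolean allLoaded();
    CompletableFuture<?> loadNextBatch();
}

// A fully materialized source: everything is already in memory.
final class InMemoryIterator<T> implements SimpleBatchIterator<T> {
    private final Iterator<T> it;
    private T current;

    InMemoryIterator(List<T> items) { this.it = items.iterator(); }

    @Override public boolean moveNext() {
        if (it.hasNext()) { current = it.next(); return true; }
        return false;
    }
    @Override public T currentElement() { return current; }
    @Override public boolean allLoaded() { return true; }
    @Override public CompletableFuture<?> loadNextBatch() {
        return CompletableFuture.failedFuture(new IllegalStateException("all loaded"));
    }
}

// A wrapper that applies a LIMIT by delegating to a source iterator.
final class LimitIterator<T> implements SimpleBatchIterator<T> {
    private final SimpleBatchIterator<T> source;
    private int remaining;

    LimitIterator(SimpleBatchIterator<T> source, int limit) {
        this.source = source;
        this.remaining = limit;
    }
    @Override public boolean moveNext() {
        if (remaining <= 0) return false;
        if (source.moveNext()) { remaining--; return true; }
        return false;
    }
    @Override public T currentElement() { return source.currentElement(); }
    @Override public boolean allLoaded() { return source.allLoaded(); }
    @Override public CompletableFuture<?> loadNextBatch() { return source.loadNextBatch(); }
}
```

A consumer only ever sees the outermost iterator; wrappers for filtering, offset, and so on stack the same way.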
The second piece of the puzzle of how execution works is expression evaluation, or how values are retrieved.
Focusing on the Lucene cases, at the root we have LuceneCollectorExpression implementations, which mostly wrap SortedNumericDocValues (or some other instance) in one way or another. This is encapsulated via an Input interface, which has a single value method. This is a source of excessive boxing - the PR #10046 gives an idea of how much.
The Input implementations are put behind an InputRow, so that the consumers of the BatchIterator can access values via row#get.
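To make the boxing concrete, here is a hedged sketch of the shapes involved (simplified versions of Input, Row, and InputRow; LongColumnInput is a hypothetical stand-in for a doc-values-backed expression). Every row.get call goes through value(), which returns an Object and therefore boxes the primitive long:

```java
import java.util.List;

// Simplified, hypothetical versions of the Input and Row interfaces.
interface Input<T> {
    T value();
}

interface Row {
    Object get(int index);
}

// Simulates an expression backed by numeric doc values: advancing to a
// document updates the current value; value() boxes it on every call.
final class LongColumnInput implements Input<Long> {
    private long current;
    void setCurrent(long v) { current = v; }
    @Override public Long value() { return current; }  // boxing happens here
}

// InputRow-style adapter: exposes a list of Inputs as a Row.
final class InputRow implements Row {
    private final List<Input<?>> inputs;
    InputRow(List<Input<?>> inputs) { this.inputs = inputs; }
    @Override public Object get(int index) { return inputs.get(index).value(); }
}
```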
Large intermediate data structures (HashMap for GROUP BY) have significant overhead and are created on the heap. That can lead to long GC pauses.
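A sketch of why that is: grouping primitive keys with a HashMap boxes every key and accumulator and allocates an entry object per group, all on the heap (HashMapGroupBy is illustrative, not CrateDB's implementation):

```java
import java.util.HashMap;
import java.util.Map;

// Row-based GROUP BY sketch: HashMap<Long, Long> boxes each key and value
// and allocates one entry object per group -- the heap overhead the text
// refers to. A columnar/primitive-keyed structure avoids most of this.
final class HashMapGroupBy {
    static Map<Long, Long> sumByKey(long[] keys, long[] values) {
        Map<Long, Long> groups = new HashMap<>();
        for (int i = 0; i < keys.length; i++) {
            groups.merge(keys[i], values[i], Long::sum);  // boxes key and value
        }
        return groups;
    }
}
```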
Future
The PR #10048 gives an idea of how much we pay for these abstractions. It shows a 100% performance improvement, and it is likely that the difference increases with a larger data set.
Some things to consider:
Vector based execution may not be better in all cases. Queries that can operate in a lazy fashion and have a small result set may benefit from being executed row based.
We can "bridge" from a block or vector based structure to a row based structure easily via a flatMap operation. Earlier benchmarks indicate that there is no significant cost in doing so. (In fact, the Row / Input / DocValues abstractions already act a bit like that.)
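Such a bridge can be sketched with java.util.stream, where each "block" carries a batch of column values and flatMap flattens them back into a row-at-a-time stream that existing row-based operators could consume (LongBlock and BlockToRowBridge are illustrative names):

```java
import java.util.Arrays;
import java.util.stream.Stream;

// A block holds a batch of values for one column.
final class LongBlock {
    final long[] values;
    LongBlock(long... values) { this.values = values; }
}

// flatMap bridges the block-based structure back to per-row values.
final class BlockToRowBridge {
    static Stream<Long> rows(Stream<LongBlock> blocks) {
        return blocks.flatMap(block -> Arrays.stream(block.values).boxed());
    }
}
```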
SortedNumericDocValues exposes the values as long, and they can be converted freely to any of the other numeric primitive types: short, int, float, double. This can avoid having to implement many type-specialized components.
This means:
We can incrementally switch to vectorized execution by starting at the root - focusing on the more expensive operations (GROUP BY, Aggregations), and plug in an adapter to use existing operator implementations
For some cases we may always end up using row mode. (Which raises the question of whether the mode to use should be a planner/optimization decision at some point.)
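A sketch of the type-agnostic idea behind keeping hot loops long-based, with conversions only at the edges (LongSum is an illustrative name; the double decoding via Double.longBitsToDouble mirrors, but is not identical to, Lucene's sortable-long encoding):

```java
// One long-based aggregation serves several numeric column types: values
// arrive as longs (as SortedNumericDocValues hands them out), the hot loop
// is type-agnostic, and only the result is converted per column type.
final class LongSum {
    static long sum(long[] values) {
        long acc = 0;
        for (long v : values) {
            acc += v;
        }
        return acc;
    }

    // Edge conversions: the same long payload reinterpreted per column type.
    static int asInt(long v) { return (int) v; }
    static short asShort(long v) { return (short) v; }
    static double decodeDouble(long bits) { return Double.longBitsToDouble(bits); }
}
```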
Constraints / Other notes
The new execution implementation should have the option to use off-heap memory.
There is no Java native way to do vectorized operations yet. We could wait for JEP 338: Vector API or go with JNI (QuestDB implements vectorization in C++)
The off-heap/on-heap support is messy. There is Unsafe, but we also already have netty's ByteBuf. There is also JEP 370: Foreign-Memory Access API which we could use since we're on JDK 14, but it's incubating. It looks like since there is no "common ground" all projects that built components working with off-heap structures implemented their own abstractions - making it difficult to re-use anything.
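As one portable option available today, a minimal off-heap vector can be built on a direct ByteBuffer. This is only a sketch of the idea, not a proposal for the abstraction itself:

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Minimal off-heap long vector backed by a direct ByteBuffer -- the storage
// lives outside the Java heap, so it doesn't contribute to GC pressure.
final class OffHeapLongVector {
    private final ByteBuffer buf;
    private final int size;

    OffHeapLongVector(int size) {
        this.size = size;
        this.buf = ByteBuffer.allocateDirect(size * Long.BYTES)
                             .order(ByteOrder.nativeOrder());
    }

    void set(int i, long value) { buf.putLong(i * Long.BYTES, value); }
    long get(int i) { return buf.getLong(i * Long.BYTES); }

    long sum() {
        long acc = 0;
        for (int i = 0; i < size; i++) {
            acc += get(i);
        }
        return acc;
    }
}
```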
We currently allow scalar operations in pretty much any operation, as the expression-evaluation framework takes care of it. It may be beneficial to limit the capabilities of each operator and only do scalar evaluations as part of an EvalProjection.
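Assuming such a projection works roughly like a per-column map step, the separation might look like this (ColumnEvalProjection is a hypothetical name: operators would only pass primitive columns through, and scalar functions would be applied in one dedicated pass):

```java
import java.util.function.LongUnaryOperator;

// Hypothetical sketch: scalar evaluation pulled out of the operators into
// one dedicated projection pass that maps a scalar function over a whole
// column of primitives, keeping the other operators scalar-free.
final class ColumnEvalProjection {
    private final LongUnaryOperator scalar;

    ColumnEvalProjection(LongUnaryOperator scalar) { this.scalar = scalar; }

    long[] apply(long[] column) {
        long[] out = new long[column.length];
        for (int i = 0; i < column.length; i++) {
            out[i] = scalar.applyAsLong(column[i]);
        }
        return out;
    }
}
```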
Could build this on top of https://arrow.apache.org/, and potentially re-use some of the execution engine parts (Acero?)
The Vector API is pretty stable now. I don't think they even changed anything since the 4th incubation so it'll probably be fully GA in JDK 21.
If possible it would be great to start building the vectorized execution engine now and gate it behind a feature flag. Early-adopter CrateDB users could enable vectorization, while everyone else keeps the existing row-based execution by default.