
DRAFT: Vectorized execution engine #10063

Open
mfussenegger opened this issue Jun 9, 2020 · 1 comment
mfussenegger commented Jun 9, 2020

Road to vectorized/hybrid execution

Writing this down, mostly to sort out my thoughts, but also to share this for some further discussion.

Current state

Query execution in CrateDB is row-based. The main interface for accessing rows is BatchIterator, which is usually consumed like this:

while (it.moveNext()) {
    // do something with it.currentElement()
}
if (it.allLoaded()) {
    // iterator is exhausted
} else {
    it.loadNextBatch().whenComplete((result, failure) -> {
        // continue consumption
    });
}
  • moveNext() and currentElement() are used to iterate through the rows.
  • loadNextBatch can be used to avoid blocking (network) IO or to off-load computation to a different thread.

BatchIterator implementations are usually generic over the element type, but most of the time the element type will be Row.

This composes well. There are implementations that wrap a source to apply a limit, filter out records and so on.

The second piece of the puzzle of how execution works is expression evaluation, i.e. how values are retrieved.
Focusing on the Lucene case, at the root we have LuceneCollectorExpression implementations, which mostly wrap SortedNumericDocValues (or some other doc-values instance) in one way or another. This is encapsulated via an Input interface, which has a single value() method. This is a source of excessive boxing - the PR #10046 gives an idea of how much.

The Input implementations are put behind an InputRow, so that consumers of the BatchIterator can access values via row#get.
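To make the boxing concrete, here is a minimal sketch, not actual CrateDB code: the class name and the setNextDocId method are made up for illustration, and only the single-method Input interface is taken from the description above.

import java.io.IOException;
import org.apache.lucene.index.SortedNumericDocValues;

// Minimal stand-in for the Input interface described above (a single value() method).
interface Input<T> {
    T value();
}

// Illustrative only: an Input-style wrapper around Lucene doc values,
// roughly what a LuceneCollectorExpression does.
final class LongColumnExpression implements Input<Long> {

    private final SortedNumericDocValues docValues;
    private long value;

    LongColumnExpression(SortedNumericDocValues docValues) {
        this.docValues = docValues;
    }

    // hypothetical per-document positioning
    void setNextDocId(int docId) throws IOException {
        if (docValues.advanceExact(docId)) {
            value = docValues.nextValue();
        }
    }

    @Override
    public Long value() {
        return value;   // the primitive long is boxed into a Long on every row access
    }
}

Every value() call boxes the primitive, once per row and per column, which is where the overhead measured in #10046 comes from.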

(Diagram: Input/Expression interaction)

(Diagram: CollectPhase/AggregationProjection sketch)

Problems

  • This approach is not ideal for how modern CPUs work. See MonetDB/X100: Hyper-Pipelining Query Execution.
  • Large intermediate data structures (HashMap for GROUP BY) have significant overhead and are created on the heap. That can lead to long GC pauses.

Future

The PR #10048 gives an idea of how much we pay for these abstractions: it shows a 100% performance improvement, and the difference is likely to grow with larger data sets.

Some things to consider:

  • Vector-based execution may not be better in all cases. Queries that can operate in a lazy fashion and have a small result set may benefit from being executed row-based.

  • We can "bridge" from a block- or vector-based structure to a row-based structure easily via a flatMap operation (see the sketch right after this list). Earlier benchmarks indicate that there is no significant cost in doing so. (In fact, the Row / Input / DocValues abstractions already act a bit like that.)

  • SortedNumericDocValues exposes the values as long, and they can be converted freely into any of the other numeric primitive types: short, integer, float, double. This can avoid having to implement many type-specialized components.
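As a rough illustration of the flatMap "bridge" mentioned above (the long[] blocks and the Object[] row shape are made up for the example, they are not CrateDB types):

import java.util.Arrays;
import java.util.List;
import java.util.stream.Stream;

final class BlockToRowBridge {

    // Illustrative only: flatten block/vector shaped columns into one Object[] row per value.
    static Stream<Object[]> rows(List<long[]> blocks) {
        return blocks.stream()
            .flatMap(block -> Arrays.stream(block).mapToObj(v -> new Object[] { v }));
    }

    public static void main(String[] args) {
        List<long[]> blocks = List.of(new long[] { 1, 2 }, new long[] { 3 });
        rows(blocks).forEach(row -> System.out.println(Arrays.toString(row)));
    }
}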

This means:

  • We can incrementally switch to vectorized execution by starting at the root - focusing on the more expensive operations (GROUP BY, aggregations) - and plug in an adapter to keep using the existing operator implementations. (A simplified row-mode vs. vector-mode sketch follows this list.)
  • For some cases we may always end up using row mode. (Which raises the question of whether the mode to use should become a planner/optimizer decision at some point.)
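To make the row-mode vs. vector-mode distinction concrete, here is a simplified sketch of the same SUM in both modes. This is not CrateDB code; a real operator would sit behind the BatchIterator/Input abstractions.

import java.util.List;

final class SumModes {

    // row mode: one (boxed) value per iteration step
    static long sumRows(Iterable<Long> rows) {
        long sum = 0;
        for (Long value : rows) {
            sum += value;
        }
        return sum;
    }

    // vector mode: a tight loop over a primitive column, no per-row boxing
    static long sumVector(long[] column) {
        long sum = 0;
        for (long value : column) {
            sum += value;
        }
        return sum;
    }

    public static void main(String[] args) {
        System.out.println(sumRows(List.of(1L, 2L, 3L)));      // 6
        System.out.println(sumVector(new long[] { 1, 2, 3 })); // 6
    }
}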

Constraints / Other notes

  • The new execution implementation should have the option to use off-heap memory.
  • There is no Java-native way to do vectorized (SIMD) operations yet. We could wait for JEP 338: Vector API or go with JNI (QuestDB implements vectorization in C++). A rough sketch of what the Vector API could look like follows this list.
  • The off-heap/on-heap support is messy. There is Unsafe, but we also already have netty's ByteBuf. There is also JEP 370: Foreign-Memory Access API, which we could use since we're on JDK 14, but it's still incubating. Since there is no "common ground", it looks like every project that built components working with off-heap structures implemented its own abstractions, which makes it difficult to re-use anything.
  • We currently allow scalar operations in pretty much any operation, as the expression-evaluation framework takes care of it. It may be beneficial to limit the capabilities of each operator and only do scalar evaluation as part of an EvalProjection.
  • Could build this on top of https://arrow.apache.org/, and potentially re-use some of the execution engine parts (Acero?)
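For reference, a rough sketch of what a SIMD sum over a long column could look like with the incubating JEP 338 Vector API (requires --add-modules jdk.incubator.vector; the API was still incubating and subject to change when this was written):

import jdk.incubator.vector.LongVector;
import jdk.incubator.vector.VectorOperators;
import jdk.incubator.vector.VectorSpecies;

final class SimdSum {

    private static final VectorSpecies<Long> SPECIES = LongVector.SPECIES_PREFERRED;

    // Illustrative only: sums a long column using SIMD lanes, with a scalar tail loop.
    static long sum(long[] column) {
        long sum = 0;
        int i = 0;
        int upperBound = SPECIES.loopBound(column.length);
        for (; i < upperBound; i += SPECIES.length()) {
            LongVector v = LongVector.fromArray(SPECIES, column, i);
            sum += v.reduceLanes(VectorOperators.ADD);
        }
        for (; i < column.length; i++) {
            sum += column[i];  // remaining elements that don't fill a full vector
        }
        return sum;
    }
}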

Resources

robd003 commented May 9, 2023

The Vector API is pretty stable now. I don't think they even changed anything since the 4th incubation, so it'll probably be fully GA in JDK 21.

If possible, it would be great to start building the vectorized execution engine now and gate it behind a feature flag. Early-adopter Crate users could enable vectorization, while everyone else sticks with the old path by default.
