Road to vectorized/hybrid execution
Writing this down, mostly to sort out my thoughts, but also to share this for some further discussion.
Current state
Execution of queries in CrateDB is row based. The main interface to access rows is BatchIterator, which is usually consumed like this:
while (it.moveNext()) {
    // do something with it.currentElement()
}
if (it.allLoaded()) {
    // iterator is exhausted
} else {
    it.loadNextBatch().whenComplete((r, t) -> {
        // continue consumption
    });
}
moveNext() and currentElement() are used to iterate through the rows.
loadNextBatch() can be used to avoid blocking (network) IO or to off-load computation to a different thread.
BatchIterator implementations are usually generic with regard to the element type, but most of the time the element type will be Row.
This composes well. There are implementations that wrap a source to apply a limit, filter out records and so on.
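A minimal sketch of that composability, using simplified stand-ins for the real interfaces (the actual BatchIterator has additional methods; SimpleBatchIterator, InMemoryIterator, and LimitIterator are illustrative names, not CrateDB classes):

```java
import java.util.Iterator;
import java.util.List;
import java.util.concurrent.CompletableFuture;

// Simplified stand-in for CrateDB's BatchIterator, shown only to
// illustrate how wrappers compose.
interface SimpleBatchIterator<T> {
    boolean moveNext();
    T currentElement();
    boolean allLoaded();
    CompletableFuture<?> loadNextBatch();
}

// A fully materialized source: everything is already in memory.
final class InMemoryIterator<T> implements SimpleBatchIterator<T> {
    private final Iterator<T> it;
    private T current;

    InMemoryIterator(List<T> items) { this.it = items.iterator(); }

    @Override public boolean moveNext() {
        if (it.hasNext()) { current = it.next(); return true; }
        return false;
    }
    @Override public T currentElement() { return current; }
    @Override public boolean allLoaded() { return true; }
    @Override public CompletableFuture<?> loadNextBatch() {
        return CompletableFuture.failedFuture(new IllegalStateException("all loaded"));
    }
}

// A wrapper that applies a LIMIT by delegating to a source iterator.
final class LimitIterator<T> implements SimpleBatchIterator<T> {
    private final SimpleBatchIterator<T> source;
    private int remaining;

    LimitIterator(SimpleBatchIterator<T> source, int limit) {
        this.source = source;
        this.remaining = limit;
    }
    @Override public boolean moveNext() {
        if (remaining <= 0) return false;
        if (source.moveNext()) { remaining--; return true; }
        return false;
    }
    @Override public T currentElement() { return source.currentElement(); }
    @Override public boolean allLoaded() { return source.allLoaded(); }
    @Override public CompletableFuture<?> loadNextBatch() { return source.loadNextBatch(); }
}
```

A consumer only ever sees the outermost iterator; wrappers for filtering, offset, and so on stack the same way.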
The second piece of the puzzle of how execution works is expression evaluation, or how values are retrieved.
Focusing on the Lucene cases, at the root we have LuceneCollectorExpression implementations, which mostly wrap SortedNumericDocValues (or some other instance) in one way or another. This is encapsulated via an Input interface, which has a single value method. This is a source of excessive boxing - the PR #10046 gives an idea of how much.
The Input implementations are put behind an InputRow, so that the consumers of the BatchIterator can access values via row#get.
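To make the boxing concrete, here is a hedged sketch of the shapes involved (simplified versions of Input, Row, and InputRow; LongColumnInput is a hypothetical stand-in for a doc-values-backed expression). Every row.get call goes through value(), which returns an Object and therefore boxes the primitive long:

```java
import java.util.List;

// Simplified, hypothetical versions of the Input and Row interfaces.
interface Input<T> {
    T value();
}

interface Row {
    Object get(int index);
}

// Simulates an expression backed by numeric doc values: advancing to a
// document updates the current value; value() boxes it on every call.
final class LongColumnInput implements Input<Long> {
    private long current;
    void setCurrent(long v) { current = v; }
    @Override public Long value() { return current; }  // boxing happens here
}

// InputRow-style adapter: exposes a list of Inputs as a Row.
final class InputRow implements Row {
    private final List<Input<?>> inputs;
    InputRow(List<Input<?>> inputs) { this.inputs = inputs; }
    @Override public Object get(int index) { return inputs.get(index).value(); }
}
```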
Large intermediate data structures (HashMap for GROUP BY) have significant overhead and are created on the heap. That can lead to long GC pauses.
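A sketch of why that is: grouping primitive keys with a HashMap boxes every key and accumulator and allocates an entry object per group, all on the heap (HashMapGroupBy is illustrative, not CrateDB's implementation):

```java
import java.util.HashMap;
import java.util.Map;

// Row-based GROUP BY sketch: HashMap<Long, Long> boxes each key and value
// and allocates one entry object per group -- the heap overhead the text
// refers to. A columnar/primitive-keyed structure avoids most of this.
final class HashMapGroupBy {
    static Map<Long, Long> sumByKey(long[] keys, long[] values) {
        Map<Long, Long> groups = new HashMap<>();
        for (int i = 0; i < keys.length; i++) {
            groups.merge(keys[i], values[i], Long::sum);  // boxes key and value
        }
        return groups;
    }
}
```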
Future
The PR #10048 gives an idea of how much we pay for these abstractions. It shows a 100% performance improvement, and it is likely that the difference increases with a larger data set.
Some things to consider:
Vector based execution may not be better in all cases. Queries that can operate in a lazy fashion and have a small result set may benefit from being executed row based.
We can "bridge" from a block or vector based structure to a row based structure easily via a flatMap operation. Earlier benchmarks indicate that there is no significant cost in doing so. (In fact, the Row / Input / DocValues abstractions already act a bit like that.)
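Such a bridge can be sketched with java.util.stream, where each "block" carries a batch of column values and flatMap flattens them back into a row-at-a-time stream that existing row-based operators could consume (LongBlock and BlockToRowBridge are illustrative names):

```java
import java.util.Arrays;
import java.util.stream.Stream;

// A block holds a batch of values for one column.
final class LongBlock {
    final long[] values;
    LongBlock(long... values) { this.values = values; }
}

// flatMap bridges the block-based structure back to per-row values.
final class BlockToRowBridge {
    static Stream<Long> rows(Stream<LongBlock> blocks) {
        return blocks.flatMap(block -> Arrays.stream(block.values).boxed());
    }
}
```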
SortedNumericDocValues exposes the values as long, and they can be converted freely to any of the other numeric primitive types: short, int, float, double. This can avoid having to implement many type-specialized components.
This means:
We can incrementally switch to vectorized execution by starting at the root - focusing on the more expensive operations (GROUP BY, Aggregations), and plug in an adapter to use existing operator implementations
For some cases we may always end up using row mode. (Which raises the question of whether the mode to use should be a planner/optimization decision at some point.)
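A sketch of the type-agnostic idea behind keeping hot loops long-based, with conversions only at the edges (LongSum is an illustrative name; the double decoding via Double.longBitsToDouble mirrors, but is not identical to, Lucene's sortable-long encoding):

```java
// One long-based aggregation serves several numeric column types: values
// arrive as longs (as SortedNumericDocValues hands them out), the hot loop
// is type-agnostic, and only the result is converted per column type.
final class LongSum {
    static long sum(long[] values) {
        long acc = 0;
        for (long v : values) {
            acc += v;
        }
        return acc;
    }

    // Edge conversions: the same long payload reinterpreted per column type.
    static int asInt(long v) { return (int) v; }
    static short asShort(long v) { return (short) v; }
    static double decodeDouble(long bits) { return Double.longBitsToDouble(bits); }
}
```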
Constraints / Other notes
The new execution implementation should have the option to use off-heap memory.
There is no Java native way to do vectorized operations yet. We could wait for JEP 338: Vector API or go with JNI (QuestDB implements vectorization in C++)
The off-heap/on-heap support is messy. There is Unsafe, but we also already have netty's ByteBuf. There is also JEP 370: Foreign-Memory Access API which we could use since we're on JDK 14, but it's incubating. It looks like since there is no "common ground" all projects that built components working with off-heap structures implemented their own abstractions - making it difficult to re-use anything.
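As one portable option available today, a minimal off-heap vector can be built on a direct ByteBuffer. This is only a sketch of the idea, not a proposal for the abstraction itself:

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Minimal off-heap long vector backed by a direct ByteBuffer -- the storage
// lives outside the Java heap, so it doesn't contribute to GC pressure.
final class OffHeapLongVector {
    private final ByteBuffer buf;
    private final int size;

    OffHeapLongVector(int size) {
        this.size = size;
        this.buf = ByteBuffer.allocateDirect(size * Long.BYTES)
                             .order(ByteOrder.nativeOrder());
    }

    void set(int i, long value) { buf.putLong(i * Long.BYTES, value); }
    long get(int i) { return buf.getLong(i * Long.BYTES); }

    long sum() {
        long acc = 0;
        for (int i = 0; i < size; i++) {
            acc += get(i);
        }
        return acc;
    }
}
```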
We currently allow scalar operations in pretty much any operation, as the expression-evaluation framework takes care of it. It may be beneficial to limit the capabilities of each operator and only do scalar evaluations as part of an EvalProjection.
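Assuming such a projection works roughly like a per-column map step, the separation might look like this (ColumnEvalProjection is a hypothetical name: operators would only pass primitive columns through, and scalar functions would be applied in one dedicated pass):

```java
import java.util.function.LongUnaryOperator;

// Hypothetical sketch: scalar evaluation pulled out of the operators into
// one dedicated projection pass that maps a scalar function over a whole
// column of primitives, keeping the other operators scalar-free.
final class ColumnEvalProjection {
    private final LongUnaryOperator scalar;

    ColumnEvalProjection(LongUnaryOperator scalar) { this.scalar = scalar; }

    long[] apply(long[] column) {
        long[] out = new long[column.length];
        for (int i = 0; i < column.length; i++) {
            out[i] = scalar.applyAsLong(column[i]);
        }
        return out;
    }
}
```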
Could build this on top of https://arrow.apache.org/, and potentially re-use some of the execution engine parts (Acero?)
The Vector API is pretty stable now. I don't think they even changed anything since the 4th incubation so it'll probably be fully GA in JDK 21.
If possible it would be great to start building the vectorized execution engine now and gate it behind a feature flag. Early-adopter CrateDB users could enable vectorization, while everyone else keeps the existing row-based execution by default.