rowid filter pushdown #15020

Merged · 12 commits · Dec 4, 2024

Conversation

@Maxxen (Member) commented Nov 28, 2024

This PR adds support for pushing down static and dynamic filters on the rowid column into table scans.

The rowid column is not a real stored column and doesn't have statistics, but since row-ids are sequential within a row group we can always trivially create fake statistics on-the-fly to feed to the table filters. We also create dummy statistics and a filter for each vector when doing a partial scan of a row group, in order to avoid materializing the row-ids and fully evaluating the filter expression when there are no matches.
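
The idea can be illustrated with a minimal Python sketch (not DuckDB's actual C++ implementation; the row-group size constant and the helper name are illustrative): because row-ids are sequential within a row group, min/max "statistics" for the rowid column can be synthesized as `[group_start, group_start + group_count - 1]` and used to prune entire row groups without materializing any row-ids.

```python
ROW_GROUP_SIZE = 122_880  # DuckDB's default row-group size

def matching_row_groups(total_rows, target_rowid):
    """Yield (start, count) for row groups whose synthesized rowid range
    could contain target_rowid; all other groups are pruned outright."""
    start = 0
    while start < total_rows:
        count = min(ROW_GROUP_SIZE, total_rows - start)
        lo, hi = start, start + count - 1  # fake stats, created on the fly
        if lo <= target_rowid <= hi:
            yield (start, count)
        start += count

# A point lookup like `WHERE rowid = 500` touches exactly one row group:
groups = list(matching_row_groups(total_rows=1_000_000, target_rowid=500))
```

This is what turns the point-lookup benchmark below from a full scan into a scan of a single row group.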

This opens up new possibilities for query-rewrites and optimizations.

Benchmarks:

Here are some microbenchmarks, all run on TPC-H sf=10

Static Filter Pushdown

This benchmark performs a point lookup, with and without the filter pushdown optimization enabled.

CALL dbgen(sf=10);
SELECT * FROM lineitem WHERE rowid = 500;

Results:

Without rowid pushdown:

benchmark/micro/pushdown/rowid_no_pushdown.benchmark
name    run     timing
benchmark/micro/pushdown/rowid_no_pushdown.benchmark    1       0.292741
benchmark/micro/pushdown/rowid_no_pushdown.benchmark    2       0.291047
benchmark/micro/pushdown/rowid_no_pushdown.benchmark    3       0.292847
benchmark/micro/pushdown/rowid_no_pushdown.benchmark    4       0.297395
benchmark/micro/pushdown/rowid_no_pushdown.benchmark    5       0.292097

With rowid pushdown:

benchmark/micro/pushdown/rowid_pushdown.benchmark   
name    run     timing
benchmark/micro/pushdown/rowid_pushdown.benchmark       1       0.000455
benchmark/micro/pushdown/rowid_pushdown.benchmark       2       0.000462
benchmark/micro/pushdown/rowid_pushdown.benchmark       3       0.000586
benchmark/micro/pushdown/rowid_pushdown.benchmark       4       0.000522
benchmark/micro/pushdown/rowid_pushdown.benchmark       5       0.000428

Dynamic Filter Pushdown

Here we're rewriting a Top-N style query to instead produce a dynamic filter on the rowid as a result of the comparison join produced by the IN clause. In the future we may want to rewrite this query automatically to benefit from this optimization when a lot of columns are returned.

-- The "naive" Top-N
SELECT * FROM lineitem ORDER BY l_orderkey DESC LIMIT 5;

-- Is equivalent to this rewritten query:
SELECT * FROM lineitem WHERE rowid IN (SELECT rowid FROM lineitem ORDER BY l_orderkey DESC LIMIT 5);

Results:

Naive Top-N

benchmark/micro/pushdown/rowid_no_pushdown_dynamic.benchmark    1       0.381389
benchmark/micro/pushdown/rowid_no_pushdown_dynamic.benchmark    2       0.372278
benchmark/micro/pushdown/rowid_no_pushdown_dynamic.benchmark    3       0.372439
benchmark/micro/pushdown/rowid_no_pushdown_dynamic.benchmark    4       0.379967
benchmark/micro/pushdown/rowid_no_pushdown_dynamic.benchmark    5       0.372103

Rewritten Top-N with manual row filtering (on this branch, with rowid pushdown):

benchmark/micro/pushdown/rowid_pushdown_dynamic.benchmark   
name    run     timing
benchmark/micro/pushdown/rowid_pushdown_dynamic.benchmark       1       0.101268
benchmark/micro/pushdown/rowid_pushdown_dynamic.benchmark       2       0.101460
benchmark/micro/pushdown/rowid_pushdown_dynamic.benchmark       3       0.101515
benchmark/micro/pushdown/rowid_pushdown_dynamic.benchmark       4       0.104626
benchmark/micro/pushdown/rowid_pushdown_dynamic.benchmark       5       0.101839

Rewritten Top-N with manual row filtering, on main branch (without rowid pushdown):

./build/release/benchmark/benchmark_runner benchmark/micro/pushdown/main_dynamic_rowid_pushdown.benchmark 
name    run     timing
benchmark/micro/pushdown/main_dynamic_rowid_pushdown.benchmark  1       0.414385
benchmark/micro/pushdown/main_dynamic_rowid_pushdown.benchmark  2       0.411004
benchmark/micro/pushdown/main_dynamic_rowid_pushdown.benchmark  3       0.403185
benchmark/micro/pushdown/main_dynamic_rowid_pushdown.benchmark  4       0.405434
benchmark/micro/pushdown/main_dynamic_rowid_pushdown.benchmark  5       0.409918

@Maxxen (Member, Author) commented Nov 28, 2024

I haven't benchmarked it specifically, but I'd also expect the duckdb_fts extension to benefit from this, as AFAIK it stores row-ids in the FTS index it creates and uses regular joins to do lookups.

@carlopi (Contributor) commented Nov 28, 2024

Amazing work!

I think that queries like:

SELECT * FROM lineitem LIMIT 12000 OFFSET 1000000;

once rewritten as

SELECT * FROM lineitem WHERE rowid >= 1000000 LIMIT 12000;

are significantly faster with this PR, allowing reliably fast access to arbitrary segments of a table.


This is also great for sampling (shown here with reservoir sampling, which has the biggest overhead, but other table sampling strategies could likewise benefit from being split into: first identify which rowids, then fetch).

SELECT * FROM lineitem USING SAMPLE 5% (reservoir);
--- Run Time (s): real 9.636 user 11.249101 sys 0.411607

Two-step example in SQL: first create the selection, then use the rowid pushdown:

CREATE OR REPLACE TABLE selection AS (SELECT * FROM range(1,(SELECT count(*) FROM lineitem)) USING SAMPLE 5% (reservoir ));
--- Run Time (s): real 1.313 user 1.289073 sys 0.233425
SELECT * FROM lineitem WHERE rowid IN (FROM selection);
--- Run Time (s): real 0.653 user 4.730228 sys 0.077837

Also note that the bottleneck then becomes creating the selection table; there are significantly faster ways to populate the selection (certainly in C++, possibly approximated in SQL like this):

CREATE OR REPLACE TABLE selection_alternative AS (SELECT DISTINCT (random()*(SELECT count(*) FROM lineitem))::INT FROM range (1, (SELECT count(*) FROM lineitem)//18)) LIMIT (SELECT count(*) FROM lineitem)//20;
--- Run Time (s): real 0.200 user 0.309626 sys 0.182240
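
The two-step idea above (sample rowids first, then fetch the wide rows) can be sketched in plain Python with a standard reservoir sample (Algorithm R) over the rowid range. The helper name and parameters are illustrative, not a DuckDB API:

```python
import random

def sample_rowids(total_rows, fraction, seed=42):
    """Phase 1: reservoir-sample `fraction` of the rowids,
    touching only integers -- no row data is read."""
    rng = random.Random(seed)
    k = int(total_rows * fraction)
    reservoir = list(range(k))          # fill with the first k rowids
    for rid in range(k, total_rows):
        j = rng.randrange(rid + 1)      # replace with decreasing probability
        if j < k:
            reservoir[j] = rid
    return sorted(reservoir)            # sorted => sequential fetch order

# Phase 2 would then be `SELECT * FROM lineitem WHERE rowid IN (...)`,
# which the pushdown turns into a handful of row-group scans.
sampled = sample_rowids(total_rows=10_000, fraction=0.05)
```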

@duckdb-draftbot duckdb-draftbot marked this pull request as draft December 2, 2024 10:06
@Mytherin (Collaborator) left a comment

Thanks! Looks great - some minor comments:

@Maxxen Maxxen marked this pull request as ready for review December 2, 2024 12:58
@Maxxen (Member, Author) commented Dec 2, 2024

@Mytherin Seems like a spurious python download failure, could you restart the failing job for the extension CI?

@Mytherin (Collaborator) left a comment

Thanks for the PR! LGTM - could we perhaps add a few tests where we use the rowid pushdown as an optimization, e.g.:

SELECT * FROM lineitem WHERE rowid IN (SELECT rowid FROM lineitem ORDER BY l_orderkey DESC LIMIT 5);
SELECT * FROM lineitem WHERE rowid IN (SELECT rowid FROM lineitem WHERE l_orderkey * l_suppkey = 5998800238368);

@arjenpdevries (Contributor) commented

I have a question about this very useful optimisation.

I think select * from ... where rowid in (select rowid from ...) can also be seen as a positional join.

How do these two ways of processing these "batch array lookup" requests compare?

@Mytherin (Collaborator) commented Dec 3, 2024

rowid can have gaps if there are deletes, e.g.:

D create table tbl(i int);
D insert into tbl values (1), (2), (3);
D delete from tbl where i=2;
D select i, rowid from tbl;
┌───────┬───────┐
│   i   │ rowid │
│ int32 │ int64 │
├───────┼───────┤
│     1 │     0 │
│     3 │     2 │
└───────┴───────┘

Because of that, it's not equivalent to a positional join when joining across different tables. The rowid should generally only be used within the same table.

rowid columns are also only present in DuckDB tables - we don't emit them for other data sources (yet?).

That being said - the rowid operations will be much faster than positional joins since they are much easier to parallelize.
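A small Python sketch of the distinction (hypothetical, mirroring the tbl example above): after a delete, remaining rows keep their original rowids, leaving a gap, while positional indexes stay dense.

```python
# Build tbl with values 1, 2, 3; rowids are assigned sequentially.
rows = [(rowid, i) for rowid, i in enumerate([1, 2, 3])]
# DELETE FROM tbl WHERE i = 2 -- rowids of surviving rows are unchanged.
rows = [(rid, i) for rid, i in rows if i != 2]

rowids = [rid for rid, _ in rows]     # stable identifiers, with a gap
positions = list(range(len(rows)))    # dense positional indexes

# rowids == [0, 2] but positions == [0, 1]: joining on rowid is not
# the same as a positional join once deletes have happened.
```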

@duckdb-draftbot duckdb-draftbot marked this pull request as draft December 3, 2024 13:19
@Alex-Monahan (Contributor) commented Dec 3, 2024

Wow, arbitrary limit/offset queries getting faster would be amazing! This will come in super handy for table sampling.
Tagging @hamilton !

@Maxxen Maxxen marked this pull request as ready for review December 3, 2024 14:33
@Maxxen (Member, Author) commented Dec 3, 2024

@Mytherin added some more tests, but python CI is still messing up

@carlopi (Contributor) commented Dec 3, 2024

> @Mytherin added some more tests, but python CI is still messing up

This is due to a problem with libxml, which is a postgres_scanner dependency. I am not sure what the proper solution would be. If this is frequent (the failure mode is a network request returning 503 at about 13-15 minutes in), we could remove azure from what's built in regular CI.

@Mytherin Mytherin merged commit 8382208 into duckdb:main Dec 4, 2024
42 of 43 checks passed
@Mytherin (Collaborator) commented Dec 4, 2024

Thanks! I think this is good to go

@hamilton (Contributor) commented Dec 5, 2024

Very excited about this pushdown!

@carlopi (Contributor) commented Dec 5, 2024

(I am not sure what I wrote above is completely correct about the rewrites being equivalent; there are some caveats that make rowid and LIMIT not the same in the presence of deletes/updates, so use with care. But I am also very excited!)

@FrancoisLepoutre commented

Hi all, this looks like a great, helpful addition! There are indeed data selection scenarios in which the rowid can help deliver subsets in a smarter way.

IMHO there are other data handling cases, in mouse-controlled UIs, in which grasping data via some sort of "current id pointer" with a firm, fast handle on the DuckDB rowid makes sense as well. In such scenarios, 0.292741 s is a no-go, but 0.000455 s certainly isn't. No need for an additional buffering layer at the app level.

krlmlr added a commit to duckdb/duckdb-r that referenced this pull request Dec 27, 2024
`rowid` filter pushdown (duckdb/duckdb#15020)
Update the `.clang-format` file to disable sorting includes (duckdb/duckdb#15131)
Implement struct projection pushdown for JSON reads (duckdb/duckdb#15116)
Avoid cleaning up past releases if we have not just uploaded a new one (duckdb/duckdb#15134)
Test now passes due to flexible quote on CSVs (duckdb/duckdb#15133)
Skip building azure extension due to problems installing libxml (duckdb/duckdb#15126)
Mytherin added a commit that referenced this pull request Jan 14, 2025
This PR adds the Late Materialization optimizer that enables late
materialization for certain queries - in particular top-n (`ORDER BY ..
LIMIT ..`), limit + offset, and sample queries. The optimization
piggy-backs off of the row-id filter pushdown introduced in
#15020 - and does the row-id
rewrites mentioned there automatically.

#### Rewrites

Here are some examples of rewrites:

###### Top-N
```sql
SELECT * FROM lineitem ORDER BY l_extendedprice DESC LIMIT 5;

-- becomes

SELECT * FROM lineitem WHERE rowid IN (
    SELECT rowid FROM lineitem ORDER BY l_extendedprice DESC LIMIT 5)
ORDER BY l_extendedprice DESC;

```

###### Limit + Offset
```sql
SELECT * FROM lineitem LIMIT 1000000 OFFSET 10000000;

-- becomes


SELECT * FROM lineitem WHERE rowid IN (
    SELECT rowid FROM lineitem LIMIT 1000000 OFFSET 10000000)
ORDER BY rowid;

```
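
The rewrites above can be sketched in plain Python (this is a rough illustration, not DuckDB internals; the function and table names are made up for the example). Phase 1 scans only the sort key to find the winning rowids; phase 2 fetches the full, wide rows for just those rowids, which is the `WHERE rowid IN (...)` part accelerated by the rowid filter pushdown:

```python
import heapq

def topn_late_materialized(table, sort_key, n):
    # Phase 1: top-n over (sort_key, rowid) only -- no wide columns touched.
    winners = set(heapq.nlargest(n, range(len(table)),
                                 key=lambda rid: table[rid][sort_key]))
    # Phase 2: fetch wide rows for the winning rowids. Rows come back in
    # rowid order, which is why the rewrite re-applies the outer ORDER BY.
    return [row for rid, row in enumerate(table) if rid in winners]

rows = [{"l_extendedprice": p, "payload": "x" * 100} for p in (3, 9, 1, 7, 5)]
top2 = topn_late_materialized(rows, "l_extendedprice", 2)
```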


#### Performance

```sql
SELECT * FROM lineitem ORDER BY l_extendedprice DESC LIMIT 5;
```

| v1.1.3 | main  |  new  |
|--------|-------|-------|
| 2.1s   | 0.07s | 0.01s |


```sql
SELECT * FROM lineitem ORDER BY l_orderkey DESC LIMIT 5;
```

| v1.1.3 | main |  new  |
|--------|------|-------|
| 10.2s  | 2.8s | 0.14s |


```sql
SELECT * FROM lineitem LIMIT 1000000 OFFSET 10000000;
```

| v1.1.3 | main  |  new  |
|--------|-------|-------|
| 0.46s  | 0.46s | 0.07s |
7 participants