Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow sorting to improve FixedSizeBinary filtering #11170

Open
samuelcolvin opened this issue Jun 28, 2024 · 6 comments
Open

Allow sorting to improve FixedSizeBinary filtering #11170

samuelcolvin opened this issue Jun 28, 2024 · 6 comments
Labels
bug Something isn't working

Comments

@samuelcolvin
Copy link
Contributor

samuelcolvin commented Jun 28, 2024

Describe the bug

Is it a bug? Is it a feature? No one has every really known 🤷 .

We have an ID field which is really a u128, because arrow doesn't support u128, I experimented with making it a DataType::FixedSizeBinary(16) instead of a string.

The performance differences when querying the column (just looking for a single row that matched a value) were quite surprising to me (updated, see below):

  • unsorted string: 600ms
  • unsorted FWB: 350ms
  • sorted string: 56ms
  • sorted FWB: 350ms

("FWB" == FixedSizeBinary(16))

I assume the point is that FixedSizeBinary doesn't support using the known sort order to improve filtering?

I assume (probably naively) that this shouldn't be too hard to add. Where would I start looking to add it?

To Reproduce

Example code: https://github.com/samuelcolvin/datafusion-id-experiment

Expected behavior

Ideally FixedSizeBinary look ups could benefit from known sort order in the same way that LargeUtf8 does.

Additional context

No response

@samuelcolvin samuelcolvin added the bug Something isn't working label Jun 28, 2024
@samuelcolvin
Copy link
Contributor Author

Well that's embarrassing 🤦, I forgot --release.

In release mode:

  • unsorted string: 82ms
  • unsorted FWB: 82ms
  • sorted string: 11ms
  • sorted FWB: 82ms

Kind of even more surprising - Surely FWB should be faster as you don't need to calculate offsets?

@samuelcolvin
Copy link
Contributor Author

Well it keeps getting weirder.

In release mode:

  • unsorted string: 82ms
  • unsorted FWB: 82ms
  • unsorted UInt64: 51ms
  • sorted string: 11ms
  • sorted FWB: 82ms
  • sorted UInt64: 39ms

(Trying UInt64 was the first step towards using a struct of two UInt64, but it seems unlike that would be as fast as a string right now)

This is all very confusing.

TL;DR; - @alamb if you were storing

A 16-byte array with at least one non-zero byte. ref

In parquet to query with datafusion, and wanted it to be fast long term, what would you use?

@samuelcolvin
Copy link
Contributor Author

Okay last comment here (for now), I'll stop talking to myself.

It seems that Decimal128 is the best option for our case (we can rewrite it to look like hex and be queried with hex):

Times (unsorted, sorted):

  • DataType::FixedSizeBinary(16) - (51, 57)
  • DataType::LargeUtf8 - (81, 10)
  • DataType::UInt64 - (52, 36)
  • DataType::Decimal128(38, 10) - (57, 7)

@alamb
Copy link
Contributor

alamb commented Jun 30, 2024

In parquet to query with datafusion, and wanted it to be fast long term, what would you use?

I would have recommended using FixedSizeBinary as you have done (and in fact I believe @hiltontj is doing something like this internall at InfluxData at the moment).

However I got broadly similar numbers to you in with the different types (and I agree Decimal128 looks quite good)

I checked out https://github.com/samuelcolvin/datafusion-id-experiment and got an explain plan with metrics (ran EXPLAIN ANALYZE {sql}):

FixedSizeBinary

select * from simple_fixed_sorted where id=arrow_cast(decode('57f16cbaf865bcd9adcc71c03200fd60', 'hex'),
 'FixedSizeBinary(16)')
+-------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Plan with Metrics | CoalesceBatchesExec: target_batch_size=8192, metrics=[output_rows=0, elapsed_compute=3.172µs]                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                         |
|                   |   FilterExec: id@0 = 87,241,108,186,248,101,188,217,173,204,113,192,50,0,253,96, metrics=[output_rows=0, elapsed_compute=2.207689ms]                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  |
|                   |     ParquetExec: file_groups={16 groups: [[Users/andrewlamb/Downloads/datafusion-id-experiment/simple_fixed.parquet:0..2375388], [Users/andrewlamb/Downloads/datafusion-id-experiment/simple_fixed.parquet:2375388..4750776], [Users/andrewlamb/Downloads/datafusion-id-experiment/simple_fixed.parquet:4750776..7126164], [Users/andrewlamb/Downloads/datafusion-id-experiment/simple_fixed.parquet:7126164..9501552], [Users/andrewlamb/Downloads/datafusion-id-experiment/simple_fixed.parquet:9501552..11876940], ...]}, projection=[id, name], predicate=id@0 = 87,241,108,186,248,101,188,217,173,204,113,192,50,0,253,96, pruning_predicate=CASE WHEN id_null_count@2 = id_row_count@3 THEN false ELSE id_min@0 <= 87,241,108,186,248,101,188,217,173,204,113,192,50,0,253,96 AND 87,241,108,186,248,101,188,217,173,204,113,192,50,0,253,96 <= id_max@1 END, required_guarantees=[id in (87,241,108,186,248,101,188,217,173,204,113,192,50,0,253,96)], metrics=[output_rows=1000000, elapsed_compute=16ns, bytes_scanned=38037954, file_open_errors=0, row_groups_matched_statistics=1, row_groups_pruned_statistics=0, num_predicate_creation_errors=0, file_scan_errors=0, predicate_evaluation_errors=0, row_groups_pruned_bloom_filter=0, page_index_rows_filtered=0, row_groups_matched_bloom_filter=0, pushdown_rows_filtered=0, time_elapsed_opening=24.008333ms, time_elapsed_scanning_total=68.525251ms, time_elapsed_processing=75.557418ms, pushdown_eval_time=32ns, time_elapsed_scanning_until_data=10.007876ms, page_index_eval_time=218.687µs] |

Decimal

select * from decimal where id=arrow_cast('5714204269946304998258834512.6198419457', 'Decimal128(38, 10)')
+-------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| plan_type         | plan                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      |
+-------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Plan with Metrics | CoalesceBatchesExec: target_batch_size=8192, metrics=[output_rows=0, elapsed_compute=2.492µs]                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                             |
|                   |   FilterExec: id@0 = Some(57142042699463049982588345126198419457),38,10, metrics=[output_rows=0, elapsed_compute=422.896µs]                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                               |
|                   |     ParquetExec: file_groups={16 groups: [[Users/andrewlamb/Downloads/datafusion-id-experiment/decimal.parquet:0..2376546], [Users/andrewlamb/Downloads/datafusion-id-experiment/decimal.parquet:2376546..4753092], [Users/andrewlamb/Downloads/datafusion-id-experiment/decimal.parquet:4753092..7129638], [Users/andrewlamb/Downloads/datafusion-id-experiment/decimal.parquet:7129638..9506184], [Users/andrewlamb/Downloads/datafusion-id-experiment/decimal.parquet:9506184..11882730], ...]}, projection=[id, name], predicate=id@0 = Some(57142042699463049982588345126198419457),38,10, pruning_predicate=CASE WHEN id_null_count@2 = id_row_count@3 THEN false ELSE id_min@0 <= Some(57142042699463049982588345126198419457),38,10 AND Some(57142042699463049982588345126198419457),38,10 <= id_max@1 END, required_guarantees=[id in (Some(57142042699463049982588345126198419457),38,10)], metrics=[output_rows=1000000, elapsed_compute=16ns, bytes_scanned=38054970, file_open_errors=0, row_groups_matched_statistics=1, row_groups_pruned_statistics=0, num_predicate_creation_errors=0, file_scan_errors=0, predicate_evaluation_errors=0, row_groups_pruned_bloom_filter=0, page_index_rows_filtered=0, row_groups_matched_bloom_filter=0, pushdown_rows_filtered=0, time_elapsed_opening=4.448333ms, time_elapsed_scanning_total=60.640037ms, time_elapsed_processing=56.689831ms, pushdown_eval_time=32ns, time_elapsed_scanning_until_data=7.254625ms, page_index_eval_time=17.975µs] |
|                   |

I don't really have great insight as to why Decimal was better -- it may be because it is stored inline as i128 values (rather than out of line).

@samuelcolvin
Copy link
Contributor Author

What's weird is the behaviour with a decimal 128 is better than a uint64 when sorted. Is that that a fundamental side affect of the type, or some missing logic/optimisation?

@alamb
Copy link
Contributor

alamb commented Jul 1, 2024

What's weird is the behaviour with a decimal 128 is better than a uint64 when sorted. Is that that a fundamental side affect of the type, or some missing logic/optimisation?

I suspect it is some missing optimization -- I don't know of any reason that fixed size binary would be less efficient than decimal.

I double checked that FixedSizeBinary is also stored inline

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

2 participants