New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support orc filter push down (file + stripe + rowgroup level) #55330
Conversation
This is an automated comment for commit ef4b5d5 with description of existing statuses. It's updated for the latest CI running ❌ Click here to open a full report in a separate page Successful checks
|
@alexey-milovidov can you review it, thanks! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cool, thanks for the SourceWithKeyCondition
refactoring!
Please add a test and remove the leftover debug logging (once you don't need it).
@al13n321 can you also review this related pr: ClickHouse/orc#11 |
Generating orc fileecho "select number as a, cast(number as String) as b from numbers(100000000) format ORC" | clickhouse-client > seq.orc First test was improved by 8.7x:Query: Without orc filter push down(set input_format_orc_filter_push_down = false)
With orc filter push down(set input_format_orc_filter_push_down = true)
Second test was improved by 117xQuery: Without orc filter push down(set input_format_orc_filter_push_down = false)
With orc filter push down(set input_format_orc_filter_push_down = true)
|
Lgtm, now just to figure out what to do with the orc PR (see comment there). |
Maybe it's just from removing (If that's correct, I'm surprised that a simple function call overhead is so significant compared to all the Field visitor stuff and the index stuff and KeyCondition etc.) |
The error is: |
@@ -312,6 +797,9 @@ Chunk NativeORCBlockInputFormat::generate() | |||
if (is_stopped) | |||
return {}; | |||
|
|||
/// TODO: figure out why reuse batch would cause asan fatals in https://s3.amazonaws.com/clickhouse-test-reports/55330/be39d23af2d7e27f5ec7f168947cf75aeaabf674/stateless_tests__asan__[4_4].html | |||
/// Not sure if it is a false positive case. Notice that reusing batch will speed up reading ORC by 1.15x. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't understand why asan fatals appear if batch
is reused. Maybe it is a false positive ? @al13n321
Failed ut https://s3.amazonaws.com/clickhouse-test-reports/55330/0f4a3c8b40f0eff543b0931a39776bbd5b3efcca/stateless_tests__tsan__s3_storage__[3_5].html seems not related to this pr. |
@al13n321 do you think it is ok to merge this pr now. |
Looks like a TSAN error in EDIT: A more complete report in https://s3.amazonaws.com/clickhouse-test-reports/55330/0f4a3c8b40f0eff543b0931a39776bbd5b3efcca/stateless_tests__tsan__s3_storage__[3_5]/run.log (EDIT 2: This has 2 TSAN errors, the first one seems unrelated, the second is KeyCondition) |
It is fixed. Let's wait for ci. |
The failed flaky test can't be reproduced in my local environment. |
This PR create a dangling commit in orc submodule. |
@taiyang-li, according the test result it works even better on AArch64, but I don't know why: https://s3.amazonaws.com/clickhouse-test-reports/58061/7f170575229e41ba73102d6d7fb3718d04c604b8/stateless_tests__aarch64_.html Quite soon, we will not allow any tests without AArch64: #58061 |
Interesting, I'll see. |
@alexey-milovidov sorry, I don't have any develop environment with AArch64. Could you add |
Changelog category (leave one):
Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):
Support orc filter push down (rowgroup level)
related pr of contrib/orc: ClickHouse/orc#11