fix: trim block metas pruned by runtime filter #14166

@dantengsky dantengsky commented Dec 27, 2023

I hereby agree to the terms of the CLA available at: https://docs.databend.com/dev/policies/cla/

Summary

  • When ReadNativeDataSource and ReadParquetDataSource run as async processors, they may incorrectly pass block metadata that was already pruned by the runtime filter to downstream processors.

    Addressed by commit b4abd56 of this PR, accompanied by the SQL logic test case in commit 28cd2d2.

  • The error message for de-serialization failures has been modified to include the location and column information.

  • A minor refactor that removes duplicated code in native|parquet_data_source.rs

  • Adds a new CI job, "standalone-minio", which runs the SQL logic test dir "query" with

    • mysql and http handlers
    • minio as backend storage

    This job is needed because the issue addressed in this PR occurs only when S3-style (non-blocking I/O) storage is used.

    Please see commit 26347a5.

Fixes #14165
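
The invariant behind the first fix (the number of parts and the number of data sources passed downstream must match) can be illustrated with a minimal sketch. This is not the code from this PR: `Part`, `DataSource`, `read_surviving_parts`, and the `keep`/`read` closures are hypothetical stand-ins, and the real readers are async. The sketch only shows that a part pruned by the filter is dropped together with its data, so the two lists stay aligned.

```rust
// Hypothetical stand-ins for illustration; not Databend's actual types.
#[derive(Debug, Clone, PartialEq)]
struct Part {
    location: String,
}

#[derive(Debug, PartialEq)]
struct DataSource {
    bytes: Vec<u8>,
}

/// Reads only the parts accepted by `keep` (standing in for the runtime
/// filter) and returns the surviving parts together with their data.
/// Both vectors always have the same length and the same order.
fn read_surviving_parts<F, R>(parts: Vec<Part>, keep: F, read: R) -> (Vec<Part>, Vec<DataSource>)
where
    F: Fn(&Part) -> bool,
    R: Fn(&Part) -> DataSource,
{
    let mut kept_parts = Vec::new();
    let mut sources = Vec::new();
    for part in parts {
        if !keep(&part) {
            // Pruned: drop the part itself, not only its data.
            continue;
        }
        sources.push(read(&part));
        kept_parts.push(part);
    }
    (kept_parts, sources)
}

fn main() {
    let parts = vec![
        Part { location: "a".to_string() },
        Part { location: "b".to_string() },
        Part { location: "c".to_string() },
    ];

    // Pretend the runtime filter prunes the part located at "b".
    let (kept, sources) = read_surviving_parts(
        parts,
        |p| p.location != "b",
        |p| DataSource { bytes: p.location.as_bytes().to_vec() },
    );

    // Downstream can now zip parts and data sources safely.
    assert_eq!(kept.len(), sources.len());
    for (part, source) in kept.iter().zip(sources.iter()) {
        println!("{} -> {} bytes", part.location, source.bytes.len());
    }
}
```

If the pruned part were left in the parts list while its data was skipped, a downstream consumer pairing the two lists by position would match the wrong metadata with the wrong data, which is the kind of mismatch this PR describes.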

Tests

  • Unit Test
  • Logic Test
  • Benchmark Test
  • No Test - Explain why

Type of change

  • Bug Fix (non-breaking change which fixes an issue)
  • New Feature (non-breaking change which adds functionality)
  • Breaking Change (fix or feature that could cause existing functionality not to work as expected)
  • Documentation Update
  • Refactoring
  • Performance Improvement
  • Other (please describe):

@github-actions github-actions bot added the pr-bugfix this PR patches a bug in codebase label Dec 27, 2023
@dantengsky dantengsky force-pushed the fix-trim-block-metas-pruned-by-runtime-filter branch from fc4aab7 to a972da6 Compare December 27, 2023 05:05
so that the number of PartInfoPtr and DataSource passed to
downstream will be equal to each other
@dantengsky dantengsky force-pushed the fix-trim-block-metas-pruned-by-runtime-filter branch from a972da6 to b4abd56 Compare December 27, 2023 05:56
@dantengsky dantengsky marked this pull request as ready for review December 27, 2023 10:48
run sql logic test dir "query" with
- mysql and http handlers
- minio as backend storage
@dantengsky dantengsky force-pushed the fix-trim-block-metas-pruned-by-runtime-filter branch from 01e6278 to 26347a5 Compare December 27, 2023 14:13
@dantengsky dantengsky added this pull request to the merge queue Dec 27, 2023
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Dec 27, 2023
@dantengsky dantengsky added this pull request to the merge queue Dec 27, 2023
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Dec 27, 2023
@BohuTANG BohuTANG merged commit 12a275d into datafuselabs:main Dec 28, 2023
68 of 70 checks passed
@dantengsky dantengsky added the ci-cloud Build docker image for cloud test label Dec 28, 2023