[fix](parquet) fix null pointer dereference in gen_filter_map when filter_all is true #63889

Open
Larborator wants to merge 1 commit into apache:master from Larborator:fix/parquet_column_reader

@Larborator
Contributor

What problem does this PR solve?

Issue Number: close #63887

Problem Summary:

Fix null pointer dereference (SIGSEGV) in ScalarColumnReader::gen_filter_map when reading nested type columns (Struct/Array/Map) from Parquet-based external tables.

When a predicate filters out all rows in a RowGroup, FilterMap is initialized with filter_all=true and _filter_map_data=nullptr. The _read_nested_column function checks only has_filter() before calling gen_filter_map, which dereferences filter_map_data() unconditionally and crashes.

Fix: check filter_all() before calling gen_filter_map. When filter_all is true, construct an all-zero nested filter map directly and propagate filter_all=true to it. This is logically equivalent to what gen_filter_map would produce from valid all-zero filter data: in both cases all data from the RowGroup is discarded.
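
To make the guard concrete, here is a minimal standalone sketch of the idea. It is not the code in this patch: `SketchFilterMap`, `expand_to_nested`, and the `values_per_row` parameter are hypothetical names used only for illustration, and the expansion is simplified to per-row value counts, whereas the real gen_filter_map works from Parquet repetition levels. The sketch assumes the caller has already checked has_filter().

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for the reader's FilterMap; not the actual Doris class.
class SketchFilterMap {
public:
    // Mirrors the init(data, total_rows, filter_all) call described in this PR.
    void init(const std::uint8_t* data, std::size_t num_rows, bool filter_all) {
        _has_filter = true;
        _data = data;
        _num_rows = num_rows;
        _filter_all = filter_all;
    }
    bool has_filter() const { return _has_filter; }
    bool filter_all() const { return _filter_all; }
    const std::uint8_t* filter_map_data() const { return _data; }
    std::size_t num_rows() const { return _num_rows; }

private:
    bool _has_filter = false;
    bool _filter_all = false;
    const std::uint8_t* _data = nullptr;
    std::size_t _num_rows = 0;
};

// Simplified stand-in for gen_filter_map: copy each row's keep/drop flag to
// every nested value of that row. values_per_row[i] is the number of nested
// values in row i (an assumption made for this sketch only).
std::vector<std::uint8_t> expand_to_nested(const SketchFilterMap& fm,
                                           const std::vector<std::size_t>& values_per_row) {
    std::size_t total = 0;
    for (std::size_t n : values_per_row) total += n;

    // The guard: with filter_all set there is no row-level data to read, so
    // build the all-zero nested map directly instead of dereferencing nullptr.
    if (fm.filter_all()) {
        return std::vector<std::uint8_t>(total, 0);
    }

    const std::uint8_t* data = fm.filter_map_data();
    assert(data != nullptr); // expected to hold once filter_all is handled above
    std::vector<std::uint8_t> nested;
    nested.reserve(total);
    for (std::size_t row = 0; row < values_per_row.size(); ++row) {
        // Append values_per_row[row] copies of this row's flag.
        nested.insert(nested.end(), values_per_row[row], data[row]);
    }
    return nested;
}
```

Under these assumptions, a map initialized with nullptr and filter_all=true yields the same nested result as one backed by a real all-zero buffer, which is the equivalence the fix relies on.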

Release note

Fix BE crash (SIGSEGV) when querying Parquet-based external tables (Paimon/Hive/Iceberg) with nested type columns under lazy read, if predicates filter out all rows in a RowGroup.

Check List (For Author)

  • Test
    • Regression test
    • Unit Test
    • Manual test (add detailed scripts or steps below)
    • No need to test or manual test. Explain why:
      • This is a refactor/code format and no logic has been changed.
      • Previous test can cover this change.
      • No code files have been changed.
      • Other reason

Manual test: Verified with a standalone program that simulates the gen_filter_map logic; it crashes without the fix and passes with it.

Unit test: Added test_filter_all_nullptr_nested_filter_map and test_all_zero_filter_nested_filter_map in parquet_common_test.cpp.

  • Behavior changed:

    • No.
    • Yes.
  • Does this need documentation?

    • No.
    • Yes.

Check List (For Reviewer who merge this PR)

  • Confirm the release note
  • Confirm test cases
  • Confirm document
  • Add branch pick label

…lter_all is true

When FilterMap is initialized with filter_all=true (via init(nullptr, total_rows, true)),
_has_filter is true but _filter_map_data is nullptr. The gen_filter_map function
dereferences filter_map_data() without a null check, causing SIGSEGV when reading
nested columns (e.g. struct fields) from Paimon external tables during lazy read
with predicates that filter out all rows.

Fix: skip gen_filter_map when filter_all() is true, and instead propagate the
filter_all semantics to the nested filter map directly.
@hello-stephen
Contributor

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@Larborator
Contributor Author

/review

Development

Successfully merging this pull request may close these issues.

[Bug](parquet) SIGSEGV in gen_filter_map when reading nested columns with filter_all=true
