[feature](fe) Add partition filter sql block rule#62196
[feature](fe) Add partition filter sql block rule#62196CalvinKirs wants to merge 6 commits intoapache:masterfrom
Conversation
|
Thank you for your contribution to Apache Doris. Please clearly describe your PR:
|
|
run buildall |
BE Regression && UT Coverage ReportIncrement line coverage Increment coverage report
|
|
run cloud_p0 |
BE Regression && UT Coverage ReportIncrement line coverage Increment coverage report
|
|
/review |
There was a problem hiding this comment.
Findings
fe/fe-core/src/main/java/org/apache/doris/datasource/hive/source/HiveScanNode.java:require_partition_filteris incorrect for Hive scans planned by the legacy planner. The new signal comes only fromselectedPartitions.hasPartitionPredicate, but this file explicitly documents thatselectedPartitionsis only populated in Nereids, so the legacy planner will always look like it has no partition predicate.fe/fe-core/src/main/java/org/apache/doris/qe/StmtExecutor.java:require_partition_filtercan still blockEXPLAINin the Nereids query path, because scan-based block-rule checking runs immediately after planning and beforehandleQueryStmt()returns early for explain statements.fe/fe-core/src/main/java/org/apache/doris/nereids/trees/plans/commands/AlterSqlBlockRuleCommand.java: malformed boolean input forrequire_partition_filteris silently treated asfalseduringALTER SQL_BLOCK_RULE, unlikeCREATE SQL_BLOCK_RULEwhich validates the property. That can unexpectedly disable the rule instead of rejecting bad input.
Critical Checkpoints
- Goal / correctness: Partially met. The new field is wired through rule metadata and positive planner paths, but the issues above mean the behavior still does not match the intended contract.
- Minimal / focused change: Yes.
- Concurrency: No new concurrency-sensitive logic found.
- Lifecycle / static initialization: No concerns found.
- Config changes: None.
- Compatibility / persistence: Applicable and okay. The new field is serialized and old persisted rules are defaulted to
falseingsonPostProcess(). - Parallel code paths: Not fully covered. Legacy Hive planning and EXPLAIN handling diverge from the intended behavior.
- Special condition checks: The new planner-mode / explain / boolean-validation conditions are not handled robustly enough.
- Test coverage: Positive coverage is good, but there is no coverage for legacy-planner Hive scans, malformed
ALTERbooleans, or a negativeEXPLAINcase forrequire_partition_filter. - Observability: No additional observability issues found.
- Transaction / persistence: Applicable and okay.
- Data writes / modifications: Not directly involved.
- FE-BE variable passing: Not applicable.
- Performance: No obvious new hot-path regression found.
- Other issues: None beyond the findings above.
| } | ||
|
|
||
| public boolean hasPartitionPredicate() { | ||
| return selectedPartitions != null && selectedPartitions.hasPartitionPredicate; |
There was a problem hiding this comment.
This only works for Nereids. The field comment directly above says selectedPartitions is only populated in Nereids and stays null in the legacy planner, so hasPartitionPredicate() is always false there. With enable_nereids_planner=false, a partitioned Hive query like ... WHERE pt1 = a will still hit missing partition filter even though the scan is pruned correctly.
| scanNode.getSelectedSplitNum(), | ||
| scanNode.getCardinality(), | ||
| isPartitionedTable, | ||
| hasPartitionPredicate, |
There was a problem hiding this comment.
This still runs before the Nereids EXPLAIN fast path. In executeByNereids, planner.plan(...) and checkBlockRulesByScan(planner) happen before handleQueryStmt() returns early for queryStmt.isExplain(), so a global require_partition_filter rule can reject EXPLAIN SELECT * FROM partitioned_tbl. That contradicts the PR description and the regression comments that say EXPLAIN must remain allowed.
| String cardinalityString = properties.get(SCANNED_CARDINALITY); | ||
| String requirePartitionFilterString = properties.get(REQUIRE_PARTITION_FILTER_PROPERTY); | ||
| this.requirePartitionFilter = StringUtils.isNotEmpty(requirePartitionFilterString) | ||
| ? Boolean.parseBoolean(requirePartitionFilterString) : null; |
There was a problem hiding this comment.
Boolean.parseBoolean() silently maps any non-true string to false. That means ALTER SQL_BLOCK_RULE ... PROPERTIES("require_partition_filter" = "not_a_bool") will clear the flag instead of rejecting invalid input. CREATE SQL_BLOCK_RULE already uses Util.getBooleanPropertyOrDefault(...) here, so ALTER should validate the new property the same way.
| **/ | ||
| public SqlBlockRule(String name, String sql, String sqlHash, Long partitionNum, Long tabletNum, Long cardinality, | ||
| Boolean global, Boolean enable) { | ||
| Boolean global, Boolean enable, Boolean requirePartitionFilter) { |
There was a problem hiding this comment.
add this parameter before Boolean global
| String cardinalityString = properties.get(SCANNED_CARDINALITY); | ||
| String requirePartitionFilterString = properties.get(REQUIRE_PARTITION_FILTER_PROPERTY); | ||
| this.requirePartitionFilter = StringUtils.isNotEmpty(requirePartitionFilterString) | ||
| ? Boolean.parseBoolean(requirePartitionFilterString) : null; |
There was a problem hiding this comment.
maybe need more clear error message when Boolean.parseBoolean(requirePartitionFilterString) throw exception
| .addColumn(new Column("Cardinality", ScalarType.createVarchar(20))) | ||
| .addColumn(new Column("Global", ScalarType.createVarchar(4))) | ||
| .addColumn(new Column("Enable", ScalarType.createVarchar(4))) | ||
| .addColumn(new Column("RequirePartitionFilter", ScalarType.createVarchar(5))) |
There was a problem hiding this comment.
why not use boolean type?
| if (partitionSlots == null) { | ||
| return false; | ||
| } | ||
| return !BooleanLiteral.TRUE.equals(PartitionPruneExpressionExtractor.extract( |
There was a problem hiding this comment.
PartitionPruner will call PartitionPruneExpressionExtractor.extract too. we should let pruner return the info to avoid call it twice for better perfermance
| * todo: should be pulled up to base class? | ||
| */ | ||
| private final boolean partitionPruned; | ||
| private final boolean hasPartitionPruningPredicate; |
There was a problem hiding this comment.
the attribute name should same as the one in LogicalFileScan, or better, you should add the attribute in their common super class
| private final long selectedIndexId; | ||
| private final ImmutableList<Long> selectedTabletIds; | ||
| private final ImmutableList<Long> selectedPartitionIds; | ||
| private final boolean hasPartitionPruningPredicate; |
| private final PartitionPruneV2ForShortCircuitPlan cachedPartitionPruner = | ||
| new PartitionPruneV2ForShortCircuitPlan(); | ||
|
|
||
| private boolean hasPartitionPruningPredicate = false; |
There was a problem hiding this comment.
add into ScanNode is better
|
|
||
| private PartitionColumnFilter createPartitionFilter(SlotDescriptor desc, List<Expr> conjuncts, | ||
| @VisibleForTesting | ||
| static boolean hasUsablePartitionPruningPredicate(List<Column> partitionColumns, TupleDescriptor tupleDescriptor, |
There was a problem hiding this comment.
useless function, remove it
| boolean hasPartitionPredicate = false; | ||
| if (scanNode instanceof OlapScanNode) { | ||
| OlapScanNode olapScanNode = (OlapScanNode) scanNode; | ||
| isPartitionedTable = olapScanNode.getOlapTable().isPartitionedTable(); |
There was a problem hiding this comment.
should add a interface isPartitionedTable on ScanNode
Issue Number: None
Related PR: None
Problem Summary: Add a scan-based SQL_BLOCK_RULE option that rejects partitioned internal-table and Hive-table scans without partition predicates when the rule enables require_partition_filter, and expose the new field through SHOW and information_schema with unit and regression coverage.
Add require_partition_filter to SQL_BLOCK_RULE for partitioned internal and Hive table scans.
- Test: FE unit tests and FE checkstyle
- Regression test not run in this session
- Behavior changed: Yes (configured SQL block rules can now reject partitioned scans without partition filters)
- Does this need documentation: No
### What problem does this PR solve? Issue Number: None Related PR: apache#62196 Problem Summary: Address morrySnow's review feedback for sql block rule partition-filter handling, scan-node partition predicate exposure, and repeated partition-prune extraction in Nereids. ### Release note None ### Check List (For Author) - Test: FE Unit Test - ./run-fe-ut.sh --run org.apache.doris.blockrule.SqlBlockRuleMgrTest,org.apache.doris.nereids.trees.plans.commands.SqlBlockRuleCommandTest,org.apache.doris.planner.OlapScanNodeTest,org.apache.doris.datasource.hive.source.HiveScanNodeTest,org.apache.doris.nereids.glue.translator.PhysicalPlanTranslatorTest - Behavior changed: Yes (stricter boolean property validation, unified partition-predicate exposure on scan nodes) - Does this need documentation: No
|
run buildall |
BE UT Coverage ReportIncrement line coverage Increment coverage report
|
BE Regression && UT Coverage ReportIncrement line coverage Increment coverage report
|
FE Regression Coverage ReportIncrement line coverage |
### What problem does this PR solve? Issue Number: None Related PR: apache#62196 Problem Summary: Fix SHOW SQL_BLOCK_RULE boolean output so JDBC clients can decode RequirePartitionFilter after the column type change to BOOLEAN. ### Release note None ### Check List (For Author) - Test: FE Unit Test - ./run-fe-ut.sh --run org.apache.doris.blockrule.SqlBlockRuleMgrTest - ./run-fe-ut.sh --run org.apache.doris.blockrule.SqlBlockRuleMgrTest,org.apache.doris.nereids.trees.plans.commands.SqlBlockRuleCommandTest - Behavior changed: Yes (SHOW SQL_BLOCK_RULE now returns 1/0 text for the BOOLEAN RequirePartitionFilter column) - Does this need documentation: No
|
run buildall |
FE UT Coverage ReportIncrement line coverage |
### What problem does this PR solve? Issue Number: None Related PR: apache#62196 Problem Summary: Update OptimizeGetAvailableMvsTest to mock the new PartitionPruner.pruneWithResult API and always clean up materialized views so failed assertions do not leak state into later tests. ### Release note None ### Check List (For Author) - Test: FE Unit Test - ./run-fe-ut.sh --run org.apache.doris.nereids.mv.OptimizeGetAvailableMvsTest - Behavior changed: No - Does this need documentation: No
|
run buildall |
FE Regression Coverage ReportIncrement line coverage |
BE Regression && UT Coverage ReportIncrement line coverage Increment coverage report
|
FE Regression Coverage ReportIncrement line coverage |
|
/review |
There was a problem hiding this comment.
Blocking findings:
StmtExecutor.checkBlockRulesByScan()now appliesrequire_partition_filterto everyFileScanNode, but only Hive/Hudi populatehasPartitionPredicate(). Partitioned Iceberg/Paimon/MaxCompute scans are rejected even with partition filters.PruneOlapScanPartitionstill marks explicitTABLET(...)scans as missing partition filters.PruneOlapScanPartitionstill marks explicitPARTITION(...)scans as missing partition filters.PartitionPruner.pruneWithResult()recordshasPartitionPredicatetoo early, so Nereids can satisfy the rule with predicates likep IS NOT NULLthat do not actually prune partitions.information_schema.sql_block_rule_statusadds a new FE/BE schema-table column without mixed-version compatibility, so old/new FE-BE combinations can fail withcol size not equalduring rolling upgrade.
Critical checkpoints:
- Goal and proof: Partially achieved. The PR wires the new rule, show output, schema-table output, and tests, but the blocking cases above mean it does not yet correctly enforce the intended semantics across supported query shapes and deployment modes. Current tests only cover happy paths.
- Small and focused: Mostly focused, but the
StmtExecutorchange unintentionally broadens enforcement from Hive-only to allFileScanNodeimplementations. - Concurrency: No new concurrency or lock-order issue identified.
- Lifecycle / static init: No special lifecycle or static-initialization issue identified.
- Configuration: No new config item added.
- Compatibility: Not safe for rolling upgrade because the schema-table RPC row width changed without backward-compatible handling.
- Parallel code paths: Not handled consistently. Legacy planner and Nereids now disagree on what counts as a usable partition predicate, and external scan implementations are not wired uniformly.
- Special conditions: The new
hasPartitionPredicatesignal is too broad for some Nereids predicates and too narrow for explicitPARTITION(...)/TABLET(...)restrictions. - Test coverage: Missing regression/UT coverage for Iceberg/Paimon/MaxCompute, explicit
PARTITION(...), explicitTABLET(...), and non-pruning predicates such asIS NOT NULL. The new tests look reasonable for happy paths but do not cover these blockers. - Persist / replay: The new
requirePartitionFilterfield itself appears correctly persisted and defaulted via Gson post-processing. - Observability: No additional observability requirement identified.
| scanNode.getSelectedPartitionNum(), | ||
| scanNode.getSelectedSplitNum(), | ||
| scanNode.getCardinality(), | ||
| scanNode.isPartitionedTable(), |
There was a problem hiding this comment.
This widens require_partition_filter from the old Hive-only path to every FileScanNode, but the new generic ScanNode.hasPartitionPredicate() flag is only populated for Hive/Hudi. IcebergScanNode, PaimonScanNode, and MaxComputeScanNode still leave it at the default false, while isPartitionedTable() is true for partitioned tables. With a rule like require_partition_filter=true, queries such as SELECT * FROM iceberg_pt WHERE ds = '2026-04-09' will now be rejected as missing partition filter, which both breaks valid queries and contradicts the PR's documented scope (Iceberg / other unwired external tables should not be covered yet).
| LogicalFilter filter, | ||
| MatchingContext ctx) { | ||
| Set<String> partitionColumnNameSet = Utils.execWithReturnVal(table::getPartitionColumnNames); | ||
| if (partitionColumnNameSet.isEmpty()) { |
There was a problem hiding this comment.
When the query is restricted only by TABLET(...), this branch feeds hasPartitionPredicate=false into prunePartitionByTabletIds(), so the final scan still looks like it has missing partition filter. With require_partition_filter=true, SELECT * FROM part_tbl TABLET(12345) gets blocked even though the scan is already narrower than any partition predicate. This query shape needs to satisfy the rule, or be exempted before the generic block-rule check.
| partitionSlots, filter.getPredicate(), idToPartitions, ctx.cascadesContext, | ||
| PartitionTableType.OLAP, sortedPartitionRanges); | ||
| return prunedPartitions; | ||
| } else if (!manuallySpecifiedPartitions.isEmpty()) { |
There was a problem hiding this comment.
This marks explicit PARTITION(...) selection as hasPartitionPredicate=false. That means SELECT * FROM part_tbl PARTITION(p20260409) is rejected as missing partition filter under require_partition_filter=true, even though the query already names the only partition(s) it is allowed to scan. The rule is supposed to prevent accidental full partition scans, so explicit partition selection should not be treated the same as an unrestricted scan.
| List<Slot> partitionSlots, | ||
| Expression partitionPredicate, | ||
| Map<K, PartitionItem> idToPartitions, CascadesContext cascadesContext, | ||
| PartitionTableType partitionTableType, Optional<SortedPartitionRanges<K>> sortedPartitionRanges) { |
There was a problem hiding this comment.
This flag is recorded before we know whether the extracted partition expression is actually usable for pruning. Any evaluable predicate on the partition slot will count here, even if it prunes nothing. For example, on a normal range-partitioned table WHERE p IS NOT NULL leaves hasPartitionPredicate=true, so require_partition_filter is satisfied even though the scan still reads every partition. The old OlapScanNode.hasPartitionPredicate() helper only treated predicates that createPartitionFilter() / createColumnRange() could really use for pruning as valid, so this creates a Nereids-only bypass.
| {"CARDINALITY", TYPE_BIGINT, sizeof(int64_t), true}, | ||
| {"GLOBAL", TYPE_BOOLEAN, sizeof(bool), true}, | ||
| {"ENABLE", TYPE_BOOLEAN, sizeof(bool), true}, | ||
| {"REQUIRE_PARTITION_FILTER", TYPE_BOOLEAN, sizeof(bool), true}, |
There was a problem hiding this comment.
Adding a new schema-table column here changes the FE<->BE row width from 12 cells to 13, but _get_sql_block_rule_status_block_from_fe() still hard-fails on any size mismatch. During a rolling upgrade, an old FE will still return 12 cells and a new FE will return 13 cells, so SELECT * FROM information_schema.sql_block_rule_status can fail with col size not equal as soon as FE/BE versions are mixed. We need a compatibility path here (for example, tolerate missing/extra trailing REQUIRE_PARTITION_FILTER and fill false for older peers) before extending this RPC schema.
| @@ -24,7 +24,6 @@ | |||
| import org.apache.doris.common.FeNameFormat; | |||
| import org.apache.doris.common.UserException; | |||
| import org.apache.doris.common.util.SqlBlockUtil; | |||
| import org.apache.doris.common.util.Util; | |||
| protected static boolean getBooleanPropertyOrDefault(Map<String, String> properties, | ||
| String propertyName, boolean defaultValue) throws AnalysisException { | ||
| if (!properties.containsKey(propertyName)) { | ||
| return defaultValue; | ||
| } | ||
| return parseBooleanProperty(properties.get(propertyName), propertyName); | ||
| } | ||
|
|
||
| protected static Boolean getOptionalBooleanProperty(Map<String, String> properties, | ||
| String propertyName) throws AnalysisException { | ||
| if (!properties.containsKey(propertyName)) { | ||
| return null; | ||
| } | ||
| return parseBooleanProperty(properties.get(propertyName), propertyName); | ||
| } | ||
|
|
||
| private static boolean parseBooleanProperty(String value, String propertyName) throws AnalysisException { | ||
| if ("true".equalsIgnoreCase(value)) { | ||
| return true; | ||
| } | ||
| if ("false".equalsIgnoreCase(value)) { | ||
| return false; | ||
| } | ||
| throw new AnalysisException(propertyName + " should be a boolean"); | ||
| } |
There was a problem hiding this comment.
these functions are general function that could be used for any properties, maybe we already has similar util functions, if not, we should add them into a general util helper class
| } | ||
| } | ||
| return false; | ||
| } |
There was a problem hiding this comment.
should not add logic code here, should be return hasPartitionPredicate directly
| } | ||
| } | ||
| return false; | ||
| } |
There was a problem hiding this comment.
should not add logic code here, should return hasPartitionPredicate directly
### What problem does this PR solve? Issue Number: None Related PR: apache#62196 Problem Summary: Fix the sql block rule follow-up review items, including partition-filter enforcement propagation, effective partition-pruning detection, rolling-upgrade compatibility for sql_block_rule_status, and prepared-statement boolean result handling. Also add regression and unit-test coverage for the new behavior. ### Release note None ### Check List (For Author) - Test: Unit Test - ./run-fe-ut.sh --run org.apache.doris.blockrule.SqlBlockRuleMgrTest,org.apache.doris.nereids.trees.plans.commands.SqlBlockRuleCommandTest,org.apache.doris.planner.OlapScanNodeTest,org.apache.doris.nereids.rules.rewrite.PartitionPrunerTest,org.apache.doris.nereids.glue.translator.PhysicalPlanTranslatorTest - ./run-fe-ut.sh --run org.apache.doris.qe.StmtExecutorTest,org.apache.doris.blockrule.SqlBlockRuleMgrTest,org.apache.doris.nereids.trees.plans.commands.SqlBlockRuleCommandTest - Test: Regression test / Added case, not run locally against a rebuilt FE service because no usable local service was available - Behavior changed: Yes (sql block rule partition-filter checks, SHOW output handling, and upgrade compatibility) - Does this need documentation: No
### What problem does this PR solve? Issue Number: None Related PR: apache#62196 Problem Summary: Fix the remaining java checkstyle and clang-format issues introduced in the sql block rule review follow-up changes. ### Release note None ### Check List (For Author) - Test: Java checkstyle and clang-format check - Unit Test / Manual test - Behavior changed: No - Does this need documentation: No
|
run buildall |
FE UT Coverage ReportIncrement line coverage |
BE Regression && UT Coverage ReportIncrement line coverage Increment coverage report
|
FE Regression Coverage ReportIncrement line coverage |
Today SQL_BLOCK_RULE can block SQL by regex/sqlHash or by scan scale thresholds such as
partition_num,tablet_num, andcardinality, but it cannot directly reject queries thatscan a partitioned table without using any partition filter.
This PR adds a new scan-based SQL block rule option:
require_partition_filter.When this option is enabled on a rule, Doris rejects scans on supported partitioned tables if
the query does not contain any usable partition predicate. This helps prevent accidental full
partition scans caused by missing partition filters.
The rule is intended for workloads where partition filters are mandatory for safety or cost
control.
User-visible behavior
Users can now create a SQL block rule like this:
CREATE SQL_BLOCK_RULE require_partition_filter_rule PROPERTIES( "require_partition_filter" = "true", "global" = "true", "enable" = "true" ); Or bind it to specific users: CREATE SQL_BLOCK_RULE require_partition_filter_rule PROPERTIES( "require_partition_filter" = "true", "global" = "false", "enable" = "true" ); SET PROPERTY FOR 'test_user' 'sql_block_rules' = 'require_partition_filter_rule'; If a supported partitioned table is scanned without any partition predicate, Doris returns an error like: sql hits sql block rule: require_partition_filter_rule, missing partition filter ### Scope This new rule currently applies to: - Partitioned internal tables - Partitioned Hive external tables This rule does not apply to: - Non-partitioned internal tables - Non-partitioned Hive tables - Iceberg tables - Other external table types not wired into this rule yet ### Rule semantics The new property is: - require_partition_filter = true|false Behavior: - The rule only takes effect when require_partition_filter=true - For supported partitioned tables, the scan is allowed if the query hits any partition column in partition pruning predicates - Filters on non-partition columns do not count - The rule applies to scan-producing statements, such as: - SELECT - INSERT INTO ... SELECT ... - EXPLAIN is not blocked Examples: Blocked: SELECT * FROM part_tbl; SELECT * FROM part_tbl WHERE non_partition_col = 1; INSERT INTO dst SELECT * FROM part_tbl; Allowed: SELECT * FROM part_tbl WHERE dt = '2026-04-09'; SELECT * FROM part_tbl WHERE dt = '2026-04-09' AND hh = '10'; INSERT INTO dst SELECT * FROM part_tbl WHERE dt = '2026-04-09'; ### Compatibility and validation require_partition_filter is treated as a scan-based SQL block condition. It can be used together with existing scan-based limits such as: - partition_num - tablet_num - cardinality It cannot be mixed with text-based block conditions such as: - sql - sqlHash