[improvement](fe) Add configurable return mode for insert publish timeout in ETL scenarios#63919
wenzhenghu wants to merge 12 commits into
run buildall
TPC-H: Total hot run time: 31315 ms |
TPC-DS: Total hot run time: 172577 ms |
FE UT Coverage Report: Increment line coverage
TPC-H: Total hot run time: 31761 ms |
TPC-DS: Total hot run time: 172440 ms |
### What problem does this PR solve?
Issue Number: N/A
Related PR: None
Problem Summary: The migrated insert publish-timeout change had targeted FE tests, but incremental coverage was still too low because several new branches were not exercised. In particular, OlapInsertExecutor.onFail() lacked direct coverage for committed and uncommitted failure cleanup paths, and SessionVariable still missed default-value, normalization, empty-input validation, and map restore branches for insert_visible_timeout_return_mode. This change expands the FE unit tests to cover those cases so the migrated logic is protected against regressions and the incremental coverage for the new code stays above the required threshold.
### Release note
None
### Check List (For Author)
- Test: FE unit test
- ./run-fe-ut.sh --run org.apache.doris.qe.SessionVariablesTest,org.apache.doris.nereids.trees.plans.commands.insert.OlapInsertExecutorTest
- ./run-fe-ut.sh --coverage --run org.apache.doris.qe.SessionVariablesTest,org.apache.doris.nereids.trees.plans.commands.insert.OlapInsertExecutorTest
- Behavior changed: No
- Does this need documentation: No
TPC-H: Total hot run time: 31585 ms |
TPC-DS: Total hot run time: 171226 ms |
FE UT Coverage Report: Increment line coverage
FE Regression Coverage Report: Increment line coverage
### What problem does this PR solve?
Issue Number: None
Related PR: None
Problem Summary: Add a regression case for the new session variable `insert_visible_timeout_return_mode`. The case injects the FE debug point `PublishVersionDaemon.stop_publish` so a normal OLAP insert can commit successfully but remain non-visible long enough to hit `insert_visible_timeout_ms`. It verifies that `committed` mode returns success, `error` mode returns the expected client error, and both rows become visible after publish resumes. The case uses the configured FE HTTP address for debug-point operations so it also works when an external regression target reports loopback host addresses in `SHOW FRONTENDS`.
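The timing this case relies on can be illustrated outside Doris. The following is a minimal, self-contained Java sketch, not the regression case itself: a latch stands in for the `PublishVersionDaemon.stop_publish` debug point, and the wait stands in for `insert_visible_timeout_ms`. The class names `PublishGate` and `SimulatedInsert`, the returned status strings, and the exception type are all hypothetical and chosen only for illustration.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Hypothetical stand-in for a paused publish step: visibility is blocked
// until resumePublish() is called.
final class PublishGate {
    private final CountDownLatch visible = new CountDownLatch(1);

    void resumePublish() {
        visible.countDown();
    }

    // Returns true if the data became visible within timeoutMs.
    boolean awaitVisible(long timeoutMs) {
        try {
            return visible.await(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }
}

// Hypothetical model of an insert that has already committed and now
// waits for visibility.
final class SimulatedInsert {
    static String run(PublishGate gate, long timeoutMs, boolean errorMode) {
        if (gate.awaitVisible(timeoutMs)) {
            return "VISIBLE";
        }
        if (errorMode) {
            // Assumption: error mode surfaces the timeout to the client.
            throw new IllegalStateException(
                    "transaction committed but not visible within " + timeoutMs + " ms");
        }
        // Committed mode: report success even though the rows are not visible yet.
        return "COMMITTED";
    }
}

final class PublishTimeoutDemo {
    public static void main(String[] args) {
        PublishGate gate = new PublishGate();
        System.out.println(SimulatedInsert.run(gate, 50, false));
        try {
            SimulatedInsert.run(gate, 50, true);
        } catch (IllegalStateException e) {
            System.out.println("error mode: " + e.getMessage());
        }
        gate.resumePublish();
        System.out.println(SimulatedInsert.run(gate, 50, true));
    }
}
```

In this sketch both calls made while the gate is closed leave the simulated transaction committed; only the value reported to the caller differs. Once the gate opens, later calls observe visibility regardless of mode, which mirrors the final check that both rows become visible after publish resumes.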
### Release note
None
### Check List (For Author)
- Test: Regression test
- Run `./run-regression-test.sh --run --conf /tmp/regression-conf-remote.groovy -d insert_p0 -s test_insert_visible_timeout_return_mode -genOut`
- Behavior changed: No
- Does this need documentation: No
FE UT Coverage Report: Increment line coverage
### What problem does this PR solve?
Issue Number: None
Related PR: apache#63919
Problem Summary: Review feedback on the insert publish-timeout return-mode change requested using an enum instead of raw string constants. This commit keeps the SQL-visible values unchanged (`committed` and `error`) but stores the session variable as a dedicated enum inside FE. It also extends the generic session-variable assignment and restore paths so enum-backed variables can still be set from SQL, forwarded variables, and JSON/map persistence. The existing insert timeout behavior is unchanged, while the implementation becomes type-safe and removes the previous string normalization cleanup logic.
### Release note
None
### Check List (For Author)
- Test: FE Unit Test
- Run `./run-fe-ut.sh --run org.apache.doris.qe.SessionVariablesTest,org.apache.doris.nereids.trees.plans.commands.insert.OlapInsertExecutorTest`
- Behavior changed: No
- Does this need documentation: No
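As a rough illustration of the enum-backed approach described in this commit, the sketch below maps the two SQL-visible values onto an enum and parses user input back into it. It is not the code from this PR: the enum name `InsertVisibleTimeoutReturnMode`, the `fromSql` and `sqlValue` methods, and the trimming and case-insensitive matching are assumptions made for the example.

```java
import java.util.Locale;

// Hypothetical enum; the actual type in the PR may be named and placed differently.
enum InsertVisibleTimeoutReturnMode {
    COMMITTED("committed"),
    ERROR("error");

    // The value as it would appear in SQL, e.g. in a SET statement.
    private final String sqlValue;

    InsertVisibleTimeoutReturnMode(String sqlValue) {
        this.sqlValue = sqlValue;
    }

    String sqlValue() {
        return sqlValue;
    }

    // Parses a user-supplied string. Trimming and case-insensitive matching
    // are assumptions of this sketch, not confirmed behavior.
    static InsertVisibleTimeoutReturnMode fromSql(String raw) {
        if (raw == null || raw.trim().isEmpty()) {
            throw new IllegalArgumentException(
                    "insert_visible_timeout_return_mode must not be empty");
        }
        String normalized = raw.trim().toLowerCase(Locale.ROOT);
        for (InsertVisibleTimeoutReturnMode mode : values()) {
            if (mode.sqlValue.equals(normalized)) {
                return mode;
            }
        }
        throw new IllegalArgumentException(
                "unsupported insert_visible_timeout_return_mode: " + raw);
    }
}

final class ReturnModeDemo {
    public static void main(String[] args) {
        // Round trip: SQL string -> enum -> SQL string.
        InsertVisibleTimeoutReturnMode mode = InsertVisibleTimeoutReturnMode.fromSql(" Error ");
        System.out.println(mode + " -> " + mode.sqlValue());
    }
}
```

Keeping the SQL string on the enum constant lets the persisted and forwarded form stay `committed` or `error` while callers compare enum constants instead of strings.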
TPC-H: Total hot run time: 28969 ms |
TPC-DS: Total hot run time: 172010 ms |
FE Regression Coverage Report: Increment line coverage
TPC-H: Total hot run time: 29293 ms |
TPC-DS: Total hot run time: 169577 ms |
TPC-H: Total hot run time: 29834 ms |
TPC-DS: Total hot run time: 171658 ms |
FE UT Coverage Report: Increment line coverage
FE Regression Coverage Report: Increment line coverage
### What problem does this PR solve?
Issue Number: close #xxx
Related PR: #xxx
Problem Summary:
Normal internal-table inserts currently treat publish timeout as a committed insert and return success with COMMITTED status. This behavior is acceptable when clients only care that the transaction has been committed and can tolerate delayed visibility, but it is unsafe for pipelines whose downstream steps depend on the inserted data already being visible.
A typical case is ETL workflows that first use CREATE TABLE AS SELECT to build a temporary table and then immediately read that table to populate a downstream result table. If the upstream transaction has been committed but is not yet VISIBLE, the downstream step may temporarily read no rows and silently write empty data into the final table, so the whole pipeline appears successful even though the result is incorrect.
Doris already returns an error in explicit transaction mode when a COMMIT statement times out before the transaction becomes visible. This change adds a compatible mode for the regular non-transactional internal-table insert path by introducing a session variable, insert_visible_timeout_return_mode, so users can choose whether publish timeout should keep returning COMMITTED or return ERR.
The implementation also keeps committed-side bookkeeping unchanged in error mode so finished load jobs, insert result metadata, and related accounting still reflect the real transaction state.
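The intended split between what the client sees and what is recorded internally can be sketched as follows. This is a simplified model under stated assumptions, not the logic in `OlapInsertExecutor`: the types `ReturnMode`, `TxnStatus`, `InsertOutcome`, and `PublishTimeoutPolicy` are hypothetical, and the single `loadJobFinished` flag stands in for all committed-side bookkeeping.

```java
// Hypothetical types for illustration only.
enum ReturnMode { COMMITTED, ERROR }

enum TxnStatus { VISIBLE, COMMITTED }

final class InsertOutcome {
    final boolean success;          // what the client is told
    final TxnStatus status;         // the real transaction state
    final boolean loadJobFinished;  // stand-in for committed-side bookkeeping

    InsertOutcome(boolean success, TxnStatus status, boolean loadJobFinished) {
        this.success = success;
        this.status = status;
        this.loadJobFinished = loadJobFinished;
    }
}

final class PublishTimeoutPolicy {
    // visibleBeforeTimeout: whether publish finished within the visibility timeout.
    static InsertOutcome resolve(boolean visibleBeforeTimeout, ReturnMode mode) {
        if (visibleBeforeTimeout) {
            return new InsertOutcome(true, TxnStatus.VISIBLE, true);
        }
        // Publish timed out: the transaction is committed in both modes, so the
        // bookkeeping is identical. Only the reported result depends on the mode.
        boolean reportSuccess = (mode == ReturnMode.COMMITTED);
        return new InsertOutcome(reportSuccess, TxnStatus.COMMITTED, true);
    }
}

final class PublishTimeoutPolicyDemo {
    public static void main(String[] args) {
        InsertOutcome outcome = PublishTimeoutPolicy.resolve(false, ReturnMode.ERROR);
        System.out.println("success=" + outcome.success
                + ", status=" + outcome.status
                + ", loadJobFinished=" + outcome.loadJobFinished);
    }
}
```

The point of the model is that error mode changes only the reported result. The recorded state stays COMMITTED, so a pipeline that stops on the error can retry its downstream read later without re-running the insert.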
### Release note
Add a session variable to control whether normal internal-table inserts return COMMITTED or ERR when publish visibility times out.
### Check List (For Author)
- Test:
- Behavior changed:
- Does this need documentation?
### Check List (For Reviewer who merge this PR)