[fix](insert) fix INSERT job statistics lost in show load after FE restart#62331
[fix](insert) fix INSERT job statistics lost in show load after FE restart#62331liaoxin01 merged 1 commit intoapache:masterfrom
Conversation
|
Thank you for your contribution to Apache Doris. Please clearly describe your PR:
|
|
run buildall |
c5cfec3 to
5ec7282
Compare
|
run buildall |
|
/review |
There was a problem hiding this comment.
Finding:
InsertLoadJobnow persists a snapshot ofloadStatistic, but the patch does not actually remove the!Config.enable_nereids_loadgate from the finished-job journaling path. In the current tree, bothfe/fe-core/src/main/java/org/apache/doris/nereids/trees/plans/commands/insert/OlapInsertExecutor.java:330andfe/fe-core/src/main/java/org/apache/doris/nereids/trees/plans/commands/insert/RemoteOlapInsertExecutor.java:307still skiprecordFinishedLoadJob(...)whenenable_nereids_load=true. In that configuration the newjobDetailsJsonfield is never populated or written to the edit log, soSHOW LOADwill still lose INSERT job statistics after FE restart. The new regression test also runs only with the default config, so it does not cover this live path.
Critical checkpoint conclusions:
- Goal of the task: Partially achieved. The persisted snapshot itself is correct, but the config-enabled Nereids insert path that must create/journal the finished
InsertLoadJobremains unfixed, so the stated goal is not met for all supported configurations. - Is the modification small/clear/focused: Yes.
- Concurrency: No new lock-order or thread-safety issue found in the added code;
LoadStatistic.toJson()is synchronized and no new shared mutable structure was introduced. - Lifecycle/static initialization: No special lifecycle or static-init issue found.
InsertLoadJobremains registered in Gson's runtime type adapter, so the new field can replay once the object is journaled. - Configuration items: No new config added, but existing
enable_nereids_loadhandling is still incorrect for this fix. - Compatibility: The added serialized field is backward-compatible.
- Parallel code paths: Not fully handled. The Nereids local and remote insert executors still have the old guard.
- Special conditional checks: The remaining
!Config.enable_nereids_loadcondition is no longer valid for the stated behavior. - Test coverage: Insufficient for the bug described in the PR because the regression test does not exercise
enable_nereids_load=true. - Observability: No additional observability needed for this change.
- Transaction/persistence: The new field is fine once persisted, but the missing finished-job journal write on the guarded path still causes restart-time metadata loss.
- Data writes/modifications: No table-data correctness issue introduced by the new serialization itself.
- FE/BE variable passing: Not applicable.
- Performance: No material concern.
- Other issues: None beyond the missing Nereids-path fix above.
Summary opinion: needs follow-up before merge because the main correctness issue is still reproducible when enable_nereids_load=true.
FE UT Coverage ReportIncrement line coverage |
|
PR approved by at least one committer and no changes requested. |
|
PR approved by anyone and no changes requested. |
|
/review |
There was a problem hiding this comment.
Found one blocking issue.
fe/fe-core/src/main/java/org/apache/doris/load/loadv2/InsertLoadJob.java/fe/fe-core/src/main/java/org/apache/doris/nereids/trees/plans/commands/insert/OlapInsertExecutor.java:330/.../RemoteOlapInsertExecutor.java:307: the newjobDetailsJsonsnapshot only helps if the finishedInsertLoadJobis actually persisted viaLoadManager.recordFinishedLoadJob()->EditLog.logCreateLoadJob(loadJob). This patch does not remove the existing!Config.enable_nereids_loadguard in either executor, even though the PR description says it does. Whenenable_nereids_load=true,AbstractInsertExecutoronly creates an in-memoryInsertLoadJoband adds it toLoadManager; it is neither journaled nor registered as a transaction callback. After an FE restart,SHOW LOADstill cannot recover this snapshot in that mode. The new regression test also does not cover that path becauseenable_nereids_loaddefaults tofalse.
Critical checkpoint conclusions:
- Goal of the task: preserve finished INSERT
JobDetailsacross FE restart. The code only accomplishes this on therecordFinishedLoadJob()path; it is not correct forenable_nereids_load=true, and the added test does not prove that mode. - Small, clear, focused change: yes. The patch is small and localized.
- Concurrency: no new concurrency pattern was introduced; existing locking in
LoadJobis unchanged. - Lifecycle / static initialization: no special lifecycle or static-init issue found.
- Configuration items: no new config was added, but the existing
enable_nereids_loadswitch still leaves a parallel path unfixed. - Compatibility: no storage-format or symbol compatibility issue found.
- Functionally parallel code paths: applicable, and not fully handled. Both local and remote Nereids insert executors still keep the old guard.
- Special conditional checks: the
!Config.enable_nereids_loadcondition is now the key incorrect gate and is not justified for persistedSHOW LOADstate. - Test coverage: insufficient. The new regression test is useful for the default path, but it misses
enable_nereids_load=true; no replay-focused FE unit test was added either. - Observability: adequate for this small change.
- Transaction / persistence: blocking issue present. The persisted snapshot depends on a journal write path that still does not run in Nereids-load mode.
- Data writes / modifications: no direct table-data correctness issue identified.
- FE-BE variable passing: not applicable here.
- Performance: no material performance concern found in the patch itself.
- Other issues: the PR description and actual code diverge on the claimed guard removal.
Overall opinion: REQUEST_CHANGES because the advertised fix is incomplete and still loses SHOW LOAD job details in the enable_nereids_load=true execution path.
enable_nereids_load is useless.
…start (#62331) ## Problem After a FE restart, `SHOW LOAD` for finished INSERT jobs shows all-zero `JobDetails` (`ScannedRows`, `LoadBytes`, etc.), even though the values were correct before the restart. ## Fix **`InsertLoadJob`** - Add `@SerializedName("jdj") private String jobDetailsJson` to capture a JSON snapshot of `loadStatistic` at the moment the job finishes. - In `setJobProperties()` (called when the job transitions to FINISHED/CANCELLED), snapshot `loadStatistic.toJson()` into `jobDetailsJson`. - Override `getJobDetailsJson()` to return the persisted snapshot when available, falling back to the live `loadStatistic` during execution. **`LoadJob`** - Extract `protected String getJobDetailsJson()` (defaults to `loadStatistic.toJson()`) so subclasses can override the stats source used by `getShowInfoUnderLock()`. **`OlapInsertExecutor`** - Remove the `!Config.enable_nereids_load` guard in `afterExec()` so that `recordFinishedLoadJob` is always called when `jobId != -1`. This ensures the job is persisted to the edit log and its statistics are captured regardless of the config value. ## Test Added `test_insert_statistic_after_fe_restart` docker regression test that: 1. Inserts rows via `INSERT INTO ... SELECT` 2. Reads and records `JobDetails` from `SHOW LOAD` 3. Restarts FE 4. Asserts `ScannedRows`, `LoadBytes`, `FileNumber`, `FileSize` are unchanged after restart
…start (#62331) ## Problem After a FE restart, `SHOW LOAD` for finished INSERT jobs shows all-zero `JobDetails` (`ScannedRows`, `LoadBytes`, etc.), even though the values were correct before the restart. ## Fix **`InsertLoadJob`** - Add `@SerializedName("jdj") private String jobDetailsJson` to capture a JSON snapshot of `loadStatistic` at the moment the job finishes. - In `setJobProperties()` (called when the job transitions to FINISHED/CANCELLED), snapshot `loadStatistic.toJson()` into `jobDetailsJson`. - Override `getJobDetailsJson()` to return the persisted snapshot when available, falling back to the live `loadStatistic` during execution. **`LoadJob`** - Extract `protected String getJobDetailsJson()` (defaults to `loadStatistic.toJson()`) so subclasses can override the stats source used by `getShowInfoUnderLock()`. **`OlapInsertExecutor`** - Remove the `!Config.enable_nereids_load` guard in `afterExec()` so that `recordFinishedLoadJob` is always called when `jobId != -1`. This ensures the job is persisted to the edit log and its statistics are captured regardless of the config value. ## Test Added `test_insert_statistic_after_fe_restart` docker regression test that: 1. Inserts rows via `INSERT INTO ... SELECT` 2. Reads and records `JobDetails` from `SHOW LOAD` 3. Restarts FE 4. Asserts `ScannedRows`, `LoadBytes`, `FileNumber`, `FileSize` are unchanged after restart
Problem
After a FE restart,
SHOW LOADfor finished INSERT jobs shows all-zeroJobDetails(ScannedRows,LoadBytes, etc.), even though the valueswere correct before the restart.
Fix
InsertLoadJob@SerializedName("jdj") private String jobDetailsJsonto capture aJSON snapshot of
loadStatisticat the moment the job finishes.setJobProperties()(called when the job transitions toFINISHED/CANCELLED), snapshot
loadStatistic.toJson()intojobDetailsJson.getJobDetailsJson()to return the persisted snapshot whenavailable, falling back to the live
loadStatisticduring execution.LoadJobprotected String getJobDetailsJson()(defaults toloadStatistic.toJson()) so subclasses can override the stats sourceused by
getShowInfoUnderLock().OlapInsertExecutor!Config.enable_nereids_loadguard inafterExec()so thatrecordFinishedLoadJobis always called whenjobId != -1. This ensuresthe job is persisted to the edit log and its statistics are captured
regardless of the config value.
Test
Added
test_insert_statistic_after_fe_restartdocker regression test that:INSERT INTO ... SELECTJobDetailsfromSHOW LOADScannedRows,LoadBytes,FileNumber,FileSizeareunchanged after restart