[HUDI-6967] Add clearJobStatus API in HoodieEngineContext #9899
wecharyu wants to merge 2 commits into apache:master
Conversation
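For context, the API under discussion can be sketched roughly as follows (simplified names, not the actual Hudi interfaces): an engine context carries a job group/description for the jobs it submits, and the new clearJobStatus() drops that label so later, unrelated jobs are not reported under a stale description.

```java
import java.util.Optional;

// Rough sketch (illustrative, not the real Hudi API) of the engine context
// this PR extends with clearJobStatus().
class SketchEngineContext {
    private Optional<String> jobStatus = Optional.empty();

    // Existing API: label subsequent jobs with an activity and description.
    public void setJobStatus(String activity, String description) {
        jobStatus = Optional.of(activity + ": " + description);
    }

    // New API under review: clear the label so later, unrelated jobs are
    // not reported under a stale description.
    public void clearJobStatus() {
        jobStatus = Optional.empty();
    }

    public Optional<String> getJobStatus() {
        return jobStatus;
    }
}
```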
    String keyField = hoodieTable.getMetaClient().getTableConfig().getRecordKeyFieldProp();
    ...
    List<Pair<String, HoodieBaseFile>> baseFilesForAllPartitions = HoodieIndexUtils.getLatestBaseFilesForAllPartitions(partitions, context, hoodieTable);
    context.clearJobStatus();
This shouldn't be added. Key range loading has not finished here.
The code after this point will not create a new job, so it seems OK to clear the job status here. WDYT?
    }
    ...
    // Now delete partially written files
    context.setJobStatus(this.getClass().getSimpleName(), "Delete all partially written files: " + config.getTableName());
This status will be overwritten in HoodieTable#deleteInvalidFilesByPartitions, so just delete it.
    // perform index look up to get existing location of records
    context.setJobStatus(this.getClass().getSimpleName(), "Tagging: " + table.getConfig().getTableName());
    taggedRecords = tag(dedupedRecords, context, table);
    context.clearJobStatus();
If lazy execution happens afterwards, the job status may not be properly populated. Have you verified all places that this won't happen?
Let me check all the lazy executions. For this one, the "Tagging xxx" status will also propagate to deduplicateRecords, but clearing here will not affect other jobs, so we retain this line.
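The lazy-execution concern can be illustrated with a toy model (a plain Java supplier standing in for a lazy Spark RDD; names are illustrative, not from Hudi): the job status is only read when the action finally fires, so clearing it before that point loses the label.

```java
import java.util.function.Supplier;

// Toy model of the lazy-execution pitfall: composing the "DAG" runs nothing,
// and the status is only observed when the deferred action executes.
class LazyStatusPitfall {
    static String jobStatus = null;
    static String statusSeenAtExecution = "unset";
    private static Supplier<Integer> pending;

    // Compose the "DAG": nothing runs yet; the status is read lazily.
    static void composeDag(int x) {
        pending = () -> {
            statusSeenAtExecution = jobStatus;  // read at execution time
            return x * 2;
        };
    }

    // The "action": only now does the deferred work (and the status read) run.
    static int runDag() {
        return pending.get();
    }
}
```

If clearJobStatus() runs between composeDag and runDag, the job executes with no label at all.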
        partitionFsPair.getRight().getLeft(), keyGenerator));
    } finally {
      context.clearJobStatus();
This method composes a DAG and is triggered by lazy execution.
Will remove the clearJobStatus calls on lazy execution paths in CommitActionExecutor, because it already clears the job status in a finally block.
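The pattern settled on here can be sketched as follows (hypothetical names, not the actual CommitActionExecutor): the executor sets the status once and clears it in a finally block, so methods that merely compose lazy work need no clearJobStatus() of their own.

```java
// Sketch of a single clearing point for the whole commit action
// (illustrative; not the real Hudi executor).
class ExecutorSketch {
    static String jobStatus = null;

    static void execute(String table, Runnable writeDag) {
        try {
            jobStatus = "CommitActionExecutor: " + table;
            writeDag.run();  // lazy stages all run under this one label
        } finally {
            jobStatus = null;  // single clearing point for the whole action
        }
    }
}
```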
      }
      return recordsAndPendingClusteringFileGroups.getLeft();
    } finally {
      context.clearJobStatus();
Could you check here for lazy execution too?
        };
      });
    } finally {
      engineContext.clearJobStatus();
This may be subject to lazy execution.
Remove this clearJobStatus.
      });
      });
    } finally {
      engineContext.clearJobStatus();
Check this one too for lazy execution.
        new HoodieJsonPayload(genericRecord.toString()));
      });
    } finally {
      context.clearJobStatus();
        executorOutputFs.getConf());
      }, parallelism);
    } finally {
      context.clearJobStatus();
This should be at the end of the method, correct? Since context.foreach also triggers Spark stages.
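The placement fix here can be sketched with a toy model (names illustrative, not from Hudi): an eager foreach runs its stages immediately, so the status must stay set until the method ends and be cleared in a trailing finally.

```java
import java.util.List;
import java.util.function.Consumer;

// Toy model: an eager context.foreach runs work now, under the current
// status label; the clear belongs in a finally at the end of the method.
class ForeachPlacement {
    static String jobStatus = null;

    // Eager foreach: work runs immediately, while the status is still set.
    static <T> void foreach(List<T> data, Consumer<T> fn) {
        data.forEach(fn);
    }

    static int exportFiles(List<String> files) {
        final int[] written = {0};
        try {
            jobStatus = "Exporting: my_table";
            foreach(files, f -> written[0]++);  // stages run under the label
            return written[0];
        } finally {
            jobStatus = null;  // cleared only after all stages completed
        }
    }
}
```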
    return Pair.of(schemaProvider, Pair.of(checkpointStr, records));
  } finally {
    hoodieSparkContext.clearJobStatus();

This one also seems to be subject to lazy execution.
Change Logs
Impact
Fix the incorrect job group and descriptions.
Risk level (write none, low, medium, or high below)
None.
Documentation Update
Describe any necessary documentation update if there is any new feature, config, or user-facing change. If so, create a Jira ticket, attach the ticket number here, and follow the instructions to make changes to the website.
Contributor's checklist