Better logging for MSQ worker task#13790
Conversation
| if (log.isDebugEnabled()) { | ||
| log.debug("Processing work order: %s", context.jsonMapper().writeValueAsString(kernel.getWorkOrder())); | ||
| } else { | ||
| log.info("Processing work order for stage[%d]", kernel.getStageDefinition().getStageNumber()); |
There was a problem hiding this comment.
Would recommend first emitting the info-level general comment. If debug is enabled, emit the work order as well. Else when searching for this message, one has to know what log level was set to know which message to search for.
extensions-core/multi-stage-query/src/main/java/org/apache/druid/msq/exec/WorkerImpl.java
Outdated
Show resolved
Hide resolved
| @Override | ||
| public void postWorkOrder(final WorkOrder workOrder) | ||
| { | ||
| log.info("Got work order for stage[%d]", workOrder.getStageNumber()); |
There was a problem hiding this comment.
Nit: better to use standard English spacing: a space between "stage" and the bracket: stage [%d].
The brackets are really only needed for items that can contain spaces, so we know what is the message and what is the value. But, this is a number. so state %d.
There was a problem hiding this comment.
In MSQ, in most places we are using this [] convention. I would hate to change it partially. Hence, if we decide to move away from using no brackets for numbers, then we should do it as part of another pr which touches all MSQ code.
extensions-core/multi-stage-query/src/main/java/org/apache/druid/msq/exec/WorkerImpl.java
Outdated
Show resolved
Hide resolved
| @Override | ||
| public ClusterByStatisticsSnapshot fetchStatisticsSnapshot(StageId stageId) | ||
| { | ||
| log.info("Fetching statistics for stage:[%d]", stageId.getStageNumber()); |
There was a problem hiding this comment.
As above. Inconsistent use of colon. No colon is needed here: stage %d.
Does the logger automagically fill in the task ID? If not, would be good to include that so we can find events for a specific task.
There was a problem hiding this comment.
Each worker can only have one taskID and each log is generally task log is generally pulled from the druid console.
In the case, people are using popular log agg tools, then they should add a dimension taskId to the "log line" via the constructs provided by the tool, "looking at file name in case of druid to figure out the taskId" is one such approach.
Hence I am not pushing taskId everywhere which just bloats the message.
| public ClusterByStatisticsSnapshot fetchStatisticsSnapshotForTimeChunk(StageId stageId, long timeChunk) | ||
| { | ||
| log.debug( | ||
| "Fetching statistics for stage:[%d] with timechunk:[%d]", |
There was a problem hiding this comment.
Edit as above stage %d, time chunk %d
| private void transitionTo(final WorkerStagePhase newPhase) | ||
| { | ||
| if (newPhase.canTransitionFrom(phase)) { | ||
| log.info( |
There was a problem hiding this comment.
Same comments as above.
|
@paul-rogers This PR is waiting on review/approval on you. Let me know if there are more things that are needed to be addressed. |
Adding more logs statements to the worker implementation which makes it easier to debug.
Key changed/added classes in this PR
WorkerImplWorkerStageKernelThis PR has: