Skip to content

Comments

Better logging for MSQ worker task#13790

Merged
cryptoe merged 4 commits intoapache:masterfrom
cryptoe:master
Feb 25, 2023
Merged

Better logging for MSQ worker task#13790
cryptoe merged 4 commits intoapache:masterfrom
cryptoe:master

Conversation

@cryptoe
Copy link
Contributor

@cryptoe cryptoe commented Feb 10, 2023

Adding more logs statements to the worker implementation which makes it easier to debug.


Key changed/added classes in this PR
  • WorkerImpl
  • WorkerStageKernel

This PR has:

  • been self-reviewed.
  • been tested in a test Druid cluster.

@cryptoe cryptoe changed the title Better logging to MSQ worker task Better logging for MSQ worker task Feb 10, 2023
@cryptoe cryptoe requested a review from rohangarg February 11, 2023 05:50
if (log.isDebugEnabled()) {
log.debug("Processing work order: %s", context.jsonMapper().writeValueAsString(kernel.getWorkOrder()));
} else {
log.info("Processing work order for stage[%d]", kernel.getStageDefinition().getStageNumber());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would recommend first emitting the info-level general comment. If debug is enabled, emit the work order as well. Else when searching for this message, one has to know what log level was set to know which message to search for.

@Override
public void postWorkOrder(final WorkOrder workOrder)
{
log.info("Got work order for stage[%d]", workOrder.getStageNumber());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: better to use standard English spacing: a space between "stage" and the bracket: stage [%d].

The brackets are really only needed for items that can contain spaces, so we know what is the message and what is the value. But, this is a number. so state %d.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In MSQ, in most places we are using this [] convention. I would hate to change it partially. Hence, if we decide to move away from using no brackets for numbers, then we should do it as part of another pr which touches all MSQ code.

@Override
public ClusterByStatisticsSnapshot fetchStatisticsSnapshot(StageId stageId)
{
log.info("Fetching statistics for stage:[%d]", stageId.getStageNumber());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As above. Inconsistent use of colon. No colon is needed here: stage %d.

Does the logger automagically fill in the task ID? If not, would be good to include that so we can find events for a specific task.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Each worker can only have one taskID and each log is generally task log is generally pulled from the druid console.
In the case, people are using popular log agg tools, then they should add a dimension taskId to the "log line" via the constructs provided by the tool, "looking at file name in case of druid to figure out the taskId" is one such approach.
Hence I am not pushing taskId everywhere which just bloats the message.

public ClusterByStatisticsSnapshot fetchStatisticsSnapshotForTimeChunk(StageId stageId, long timeChunk)
{
log.debug(
"Fetching statistics for stage:[%d] with timechunk:[%d]",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Edit as above stage %d, time chunk %d

private void transitionTo(final WorkerStagePhase newPhase)
{
if (newPhase.canTransitionFrom(phase)) {
log.info(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same comments as above.

@a2l007 a2l007 added the Area - MSQ For multi stage queries - https://github.com/apache/druid/issues/12262 label Feb 14, 2023
@cryptoe
Copy link
Contributor Author

cryptoe commented Feb 21, 2023

@paul-rogers This PR is waiting on review/approval on you. Let me know if there are more things that are needed to be addressed.

Copy link
Contributor

@paul-rogers paul-rogers left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@cryptoe cryptoe merged commit 6bb5eff into apache:master Feb 25, 2023
@clintropolis clintropolis added this to the 26.0 milestone Apr 10, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Area - MSQ For multi stage queries - https://github.com/apache/druid/issues/12262

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants