Skip to content

msq path to emit unparseable exceptions#13670

Closed
TSFenwick wants to merge 9 commits intoapache:masterfrom
TSFenwick:feature-emit-unparseable-events-parse-exception-handler-msq
Closed

msq path to emit unparseable exceptions#13670
TSFenwick wants to merge 9 commits intoapache:masterfrom
TSFenwick:feature-emit-unparseable-events-parse-exception-handler-msq

Conversation

@TSFenwick
Copy link
Contributor

@TSFenwick TSFenwick commented Jan 13, 2023

Emit UnparseableExceptions in realtime as a new event.

Description

The goal of this PR is to emit an unparseableEvent in real time that an admin of druid might be interested in storing elsewhere. The new event shouldn't be emitted by default and should be opted into.

Release note

For tips about how to write a good release note, see Release notes.


Key changed/added classes in this PR
  • ServiceLogEvent
  • ParseExceptionHandler
  • MSQCompositeWarningReporter
  • MSQFilteredEmitterReporter

This PR has:

  • been self-reviewed.
  • added documentation for new or modified features or behaviors.
  • a release note entry in the PR description.
  • added Javadocs for most classes and all non-trivial methods. Linked related entities via Javadoc links.
  • added or updated version, license, or notice information in licenses.yaml
  • added comments explaining the "why" and the intent of the code wherever would not be obvious for an unfamiliar reader.
  • added unit tests or modified existing tests to cover new code paths, ensuring the threshold for code coverage is met.
  • added integration tests.
  • been tested in a test Druid cluster.

Copy link
Contributor

@cryptoe cryptoe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left some comments.
Thanks for taking this up.
Please update the description whenever you have time.

return;
}
}
if (CannotParseExternalDataFault.CODE.equals(errorCode)) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is what I would do:
Rename MSQWarningReportPublisher to ExceptionPublisher.

Create a new CompositeExceptionPublisher.

One implementation can be of a FilteredEmitterExceptionPublisher where in the constructor you take the error code.

It would be cool to avoid This call
:MSQErrorReport msqFault = MSQErrorReport.fromException(workerId, host, stageNumber, e);

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:MSQErrorReport msqFault = MSQErrorReport.fromException(workerId, host, stageNumber, e);
will definitely avoid repeating this call

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How do i go about attaching a debugger in MSQ to see if these reports are being captured/emitted

Copy link
Contributor Author

@TSFenwick TSFenwick Jan 17, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure i follow

This is what I would do:
Rename MSQWarningReportPublisher to ExceptionPublisher.

Create a new CompositeExceptionPublisher.

One implementation can be of a FilteredEmitterExceptionPublisher where in the constructor you take the error code.

Are you saying create another class that implements the interface that i will rename.

The CompositeExceptionPublisher i don't understand?
FilteredEmitterExceptionPublisher i also don't understand

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How do i go about attaching a debugger in MSQ to see if these reports are being captured/emitted

MSQWarningsTest#testThrowExceptionWhenParseExceptionsExceedLimit should be a good one.

CompositeExceptionPublisher will implement the renamed interface ExceptionPublisher something like

class CompositeExceptionPublisher implements ExceptionPublisher{
List<ExceptionPublisher> delegates;

public CompositeExceptionPublisher(List<ExceptionPublisher> delegates){

this.delegates=delegates;
}

void publishException(int stageNumber, Throwable e){
for(ExceptionPublisher delegate:delegates){
delegate.publishException(stage,e);
}
}


  void close() throws IOException{
for(ExceptionPublisher delegate:delegates){
delegate.close();
}
}


}

Now you can create a new impl FilteredEmitterExceptionPublisher and pass it errorCodes in the constructor.

Then in WorkerImpl#runTask method create a CompositeExceptionPublisher with MSQWarningReportLimiterPublisher and FilteredEmitterExceptionPublisher and pass it along.

Maybe this strategy can work.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@cryptoe How does this look?
We have the Limiter publisher and we pass in the composite publisher that contains the normal publisher and the emitter publisher

@cryptoe cryptoe added the Area - MSQ For multi stage queries - https://github.com/apache/druid/issues/12262 label Jan 19, 2023
have a couple of todos as well like setting up the correct event to emit, and getting more relevant information
Copy link
Contributor

@LakshSingla LakshSingla left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How do i go about attaching a debugger in MSQ to see if these reports are being captured/emitted

There should be a class MSQWarningsTest which test a few conditions when warnings should be thrown. Also, to test if the metrics are emitted, a few ways would be to either create a mock emitter that captures the calls made to it or spy on the existing noop emmiter and then verifying the arguments.

import java.io.IOException;
import java.util.Set;

public class MSQFilteredEmitterWarningPublisher implements MSQWarningPublisher
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it might be appropriate to have separate classes:

  1. MSQFilteredPublisher which takes in a delegate and publishes the warning only if it passes through the filter
  2. MSQEmitterPublisher that emits the warning if publishException is called

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That makes sense, but this is the only consumer of filtering and emitting right now so does that make sense to do that level of abstraction?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this is necessary i'll leave this change to later as it is pretty straightforward

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Works for me 👍


Bouncer processorBouncer();

Emitter emitter();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can the emitter be nullable? If so, appropriate checks and annotations would be helpful

Copy link
Contributor Author

@TSFenwick TSFenwick Jan 24, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We are getting this from TaskToolBox and the emitter there isn't labeled as nullable

public MSQFilteredEmitterWarningPublisher(
final String workerId,
final String controllerTaskId,
final String taskId,
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add datasource

Integer.MAX_VALUE,
0
0,
null
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

change nulls to a noop parse exception handler if possible

* is trully published
*/
public class MSQWarningReportLimiterPublisher implements MSQWarningReportPublisher
public class MSQWarningReportLimiterPublisher implements MSQWarningPublisher
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

change class to MSQWarningLimiterPublisher

@@ -0,0 +1,38 @@
package org.apache.druid.indexing.common;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this the correct package for this?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This class seems avoidable. Looks like this is introduced cause of parseExceptionHandler cause you need the taskId, groupId, data source.
TaskID, groupId is part of the Task.java and seems to be repeated here.

What happens if task is operating on 2 data sources ?

So for the above reasons I think we should think of another way.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can a task operate on two datasources?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok i understand that it can now after chatting with some other people. Would it be clearer if i said TargetDataSource?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Something like parseExceptionMetadata ?

0
0,
null
);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

missing task metadata

Copy link
Contributor

@suneet-s suneet-s left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

High level design feedback:

  • I think a new ErrorLogger and set of associated classes should be added. Similar to RequestLogger, they can be backed by different implementations like logging to a file or emitting somewhere.
  • ParseExceptionHandler should accept an ErrorLogger in it's constructor. Any class that instantiates a ParseExceptionHandler should be able to get ErrorLogger from the injector
  • Update ParseExceptionHandler#handle to accept a Map<String, Object> metadata in addition to the parseException. This makes it clear that the caller is responsible for passing additional metadata about the exception. If the metadata is consistent, maybe the method should accept a POJO instead of a map.

If this approach works, it would probably be best to split this PR into 2 - one where you introduce the ErrorLogger and another where it is wired up to the ParseExceptionHandler

Copy link
Contributor

@cryptoe cryptoe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with suneet on the 1) and 2) points. That would be much cleaner. Regarding the third point since we will not have common metadata like taskId where ever the handle method is called, it may be better to just wire the common dimensions it in the constructor.

@@ -0,0 +1,38 @@
package org.apache.druid.indexing.common;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Something like parseExceptionMetadata ?

…vent

was overeager with module as i followed RequestLogger and QueryableModule for inspiration.
this should be scoped down more

also cleaned up checkstyle issues
@TSFenwick
Copy link
Contributor Author

@suneet-s i'll go down the path of creating error logger, but i think the name will need to be changed.

@github-actions
Copy link

github-actions bot commented Feb 1, 2024

This pull request has been marked as stale due to 60 days of inactivity.
It will be closed in 4 weeks if no further activity occurs. If you think
that's incorrect or this pull request should instead be reviewed, please simply
write any comment. Even if closed, you can still revive the PR at any time or
discuss it on the dev@druid.apache.org list.
Thank you for your contributions.

@github-actions github-actions bot added the stale label Feb 1, 2024
@github-actions
Copy link

github-actions bot commented Mar 1, 2024

This pull request/issue has been closed due to lack of activity. If you think that
is incorrect, or the pull request requires review, you can revive the PR at any time.

@github-actions github-actions bot closed this Mar 1, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Area - Metrics/Event Emitting Area - MSQ For multi stage queries - https://github.com/apache/druid/issues/12262 Area - Operations Ease of Use stale

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants