-
Notifications
You must be signed in to change notification settings - Fork 507
METRON-1218: Metron REST should return better error messages #779
Conversation
+1 this one bit me today and this fixes it; great job! |
fullMessage = cause.getMessage(); | ||
cause = cause.getCause(); | ||
} | ||
return fullMessage; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we also get the stacktrace? Maybe a call to ExceptionUtils.getStackTrace(ex)
appended?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe be a security risk to show stack in ui or rest errors? Could we control it by profile or something?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps a good alternative would be to show the error to the endpoint (the UI) and at least log the exception in the metron-rest.log
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ottobackwards the stack trace is not shown in the UI, only the root cause. If a user is not authenticated, they wouldn't be see this error anyways, they would get a 401.
@simonellistonball are you suggesting we move the logging call to the exception handler instead of the offending class? In this case it's very likely an error would get logged twice.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@cestella I initially tried to include the whole stack trace in the response but I couldn't find a good way to format it. Multi-line statements just don't work well in JSON so I settled on the root cause because it's usually the most important piece of information anyways. The whole stack trace is logged, just not returned in the JSON response. Any ideas on how to do this better?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, in the situation that I was in last night while debugging, I was using this patch and the exception was not logged in metron-rest. I was hunting the same bug as Simon. I ended up adjusting this to include the stack trace and that's how I eventually figured it out. I'm not entirely certain WHY the error logging wasn't showing up.
Since this has happened to 2 of us now, could you validate that when you try to use the alerts-ui and use a sensor that does not have the alert
nested template (any new sensor), you get an exception in the log? It should be something like:
Caused by: org.apache.metron.indexing.dao.search.InvalidSearchException: Could not execute search
at org.apache.metron.elasticsearch.dao.ElasticsearchDao.search(ElasticsearchDao.java:170)
at org.apache.metron.elasticsearch.dao.ElasticsearchMetaAlertDao.search(ElasticsearchMetaAlertDao.java:190)
at org.apache.metron.rest.service.impl.SearchServiceImpl.search(SearchServiceImpl.java:53)
... 88 more
Caused by: Failed to execute phase [query_fetch], all shards failed; shardFailures {[S6cJSn5nSmOpd7Lm2_OF9g][cowrie_index_2017.09.29.05][0]: RemoteTransportException[[node1][192.168.66.121:9300][indices:data/read/search[phase/query+fetch]]]; nested: SearchParseException[failed to parse search source [{\"from\":0,\"size\":25,\"query\":{\"constant_score\":{\"filter\":{\"bool\":{\"should\":[{\"query_string\":{\"query\":\"*\"}},{\"bool\":{\"must\":[{\"term\":{\"status\":\"active\"}},{\"nested\":{\"query\":{\"query_string\":{\"query\":\"*\"}},\"path\":\"alert\"}}]}}]}}}},\"_source\":{\"includes\":[],\"excludes\":[]},\"sort\":[{\"timestamp\":{\"order\":\"desc\"}}],\"track_scores\":true}]]; nested: QueryParsingException[[nested] failed to find nested object under path [alert]]; }
at org.elasticsearch.action.search.AbstractSearchAsyncAction.onFirstPhaseResult(AbstractSearchAsyncAction.java:206)
at org.elasticsearch.action.search.AbstractSearchAsyncAction$1.onFailure(AbstractSearchAsyncAction.java:152)
at org.elasticsearch.action.ActionListenerResponseHandler.handleException(ActionListenerResponseHandler.java:46)
at org.elasticsearch.transport.TransportService$DirectResponseChannel.processException(TransportService.java:855)
at org.elasticsearch.transport.TransportService$DirectResponseChannel.sendResponse(TransportService.java:833)
at org.elasticsearch.transport.TransportService$4.onFailure(TransportService.java:387)
at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:39)
If you can get it to work, then I'm +1. I absolutely couldn't get it to work with this PR, but I might've been doing something weird and I worry there's something weird with the logging level (which I think is at INFO) on metron-rest by default.
I still, personally, would rather see a stack trace in the error in the UI (since I'd prefer to get a stack trace from users for diagnosing issues and that's where they'll be looking). I take @ottobackwards 's point about security, but I still think it's valid and valuable. For the moment, I'd just append the multiline string and we can sort it out when debugging. Mind you, this would be in addition to in the rest logs.
I guess the net-net of it is that I want it logged and validated at LEAST one place and I'd be ok if it were 2 (one to the user and one to the system logs).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, yes, I realize that the stack trace shown is right in the path of your LOG.error
call, but I absolutely did not see anything in the log. I just want to make certain I'm doing something stupid and this case is covered by having you validate it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The log level is indeed INFO and it should not be. That regression was introduced when we added ${HBASE_HOME}/conf to the REST classpath because that directory contains a log4j.properties file with logging set to INFO. I'm planning on fixing that in a separate PR. I'm not sure that's the cause here though because I'm able to produce an error in the logs with the same type of error (InvalidSearchException).
Are you sure you have the code in this PR deployed properly? Your stack trace says the exception was thrown at line 170 but in this PR the exception is thrown at line 176.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm ok with having it the UI, but it must be configurable or governed by the production profile.
You can have the customer switch and reproduce to get the screen shot ( although, when would you NOT want the whole log in practice? ).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah it’s could be a bad merge. If you saw that in the log, then I’m ok with this. My +1 stands. Great job.
sorry, one small nit. |
|
||
private String getFullMessage(Throwable ex) { | ||
String fullMessage = ex.getMessage(); | ||
Throwable cause = ex.getCause(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a possible recursion point, where ex.getCause() == ex .
Can we use the commons ExceptionUtils.getRootCause()? If not we need to guard against this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I think this method is just ExceptionUtils.getRootCause.getMessage()
, isn't it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think so. The get cause recursion is an oldie but a goodie.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a blocker imo.
The latest commit replaces the getFullMessage call with ExceptionUtils.getStackTrace. Hopefully that addresses the recursion issue. As for security @ottobackwards, I'm not sure what the requirements are. |
@@ -35,7 +36,7 @@ | |||
@ResponseBody | |||
ResponseEntity<?> handleControllerException(HttpServletRequest request, Throwable ex) { | |||
HttpStatus status = getStatus(request); | |||
return new ResponseEntity<>(new RestError(status.value(), ex.getMessage(), getFullMessage(ex)), status); | |||
return new ResponseEntity<>(new RestError(status.value(), ex.getMessage(), ExceptionUtils.getStackTrace(ex)), status); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why change to stack trace from message?
ExceptionUtils.getRootCauseMessage() ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because @cestella requested it earlier.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok, but I thought you wanted to show the root cause to the user, and have the stack trace that casey wanted. Not just the stack trace. What is the user going to see now?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would prefer just root cause but I had the impression the full stack trace was wanted. Doesn't make that much difference to me so we just need to agree on what it should be.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, the stack trace is generally more helpful in debugging than just the root cause, so yeah, I kinda wanted the stack trace along with the root cause. I'm willing to be argued with, but as a dev, I'd like to know line numbers of failures along with why.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is not in logs that is the problem, it is if it is in the UI. I think that it would apply to rest too... but I am not sure on that
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's fine; I'm with it :) Thanks for the clarity, @ottobackwards
The most recent commit changes fullMessage to use ExceptionUtils.getRootCauseMessage. |
+1 |
+1, great job |
Contributor Comments
This PR improves the error handling in the REST app and adds logging to failed searches in the ElasticsearchDao. This can be tested in full dev by executing a search against REST with a value for "from" that is larger than the index. For example:
Before it would just return:
but after this change it should instead include the root cause:
There should also be the full stacktrace in /var/log/metron/metron-rest.log.
This change should improve all error messages that are returned from REST and are not limited to this case. I also added a unit test for the REST exception handler class.
Pull Request Checklist
Thank you for submitting a contribution to Apache Metron.
Please refer to our Development Guidelines for the complete guide to follow for contributions.
Please refer also to our Build Verification Guidelines for complete smoke testing guides.
In order to streamline the review of the contribution we ask you follow these guidelines and ask you to double check the following:
For all changes:
For code changes:
Have you included steps to reproduce the behavior or problem that is being changed or addressed?
Have you included steps or a guide to how the change may be verified and tested manually?
Have you ensured that the full suite of tests and checks have been executed in the root metron folder via:
Have you written or updated unit tests and or integration tests to verify your changes?
If adding new dependencies to the code, are these dependencies licensed in a way that is compatible for inclusion under ASF 2.0?
Have you verified the basic functionality of the build by building and running locally with Vagrant full-dev environment or the equivalent?
For documentation related changes:
Have you ensured that format looks appropriate for the output in which it is rendered by building and verifying the site-book? If not then run the following commands and the verify changes via
site-book/target/site/index.html
:Note:
Please ensure that once the PR is submitted, you check travis-ci for build issues and submit an update to your PR as soon as possible.
It is also recommended that travis-ci is set up for your personal repository such that your branches are built there before submitting a pull request.