Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Rest exception cause #8359

Closed

Conversation

tlloydthwaites
Copy link

References

Description

  • If X-DSPACE-REST-EXCEPTION-CAUSE header is present, append the root exception cause to the error message.

Instructions for Reviewers

  • Will have no effect unless the header is present in the request
  • If header is present and matches String "true", the root cause of the exception will be added to the message.
  • Exact message format is controlled with config property 'rest.exceptionCause.format', defaults to '%s [CAUSE: %s]'
  • Example: An exception has occurred [CAUSE: bad_dublin_core schema=a.fake.key. Metadata field does not exist!]

Checklist

This checklist provides a reminder of what we are going to look for when reviewing your PR. You need not complete this checklist prior to creating your PR (draft PRs are always welcome). If you are unsure about an item in the checklist, don't hesitate to ask. We're here to help!

  • My PR is small in size (e.g. less than 1,000 lines of code, not including comments & integration tests). Exceptions may be made if previously agreed upon.
  • My PR passes Checkstyle validation based on the Code Style Guide.
  • My PR includes Javadoc for all new (or modified) public methods and classes. It also includes Javadoc for large or complex private methods.
  • My PR passes all tests and includes new/updated Unit or Integration Tests based on the Code Testing Guide.
  • If my PR includes new, third-party dependencies (in any pom.xml), I've made sure their licenses align with the DSpace BSD License based on the Licensing of Contributions documentation.
  • If my PR modifies the REST API, I've linked to the REST Contract page (or open PR) related to this change.

@tlloydthwaites
Copy link
Author

sorry about the .vscode stuff! :)

@lgtm-com
Copy link

lgtm-com bot commented Jun 20, 2022

This pull request introduces 1 alert when merging b285507 into 3fa31f5 - view on LGTM.com

new alerts:

  • 1 for Information exposure through a stack trace

@lgtm-com
Copy link

lgtm-com bot commented Jun 21, 2022

This pull request introduces 2 alerts when merging 80e049f into 3fa31f5 - view on LGTM.com

new alerts:

  • 1 for Information exposure through a stack trace
  • 1 for Dereferenced variable may be null

@tdonohue
Copy link
Member

@tlloydthwaites : This PR is unlikely to be accepted as it opens DSpace up to "information exposure" security issues. See this "Information exposure through a stack trace" note from LGTM (on this PR): https://lgtm.com/rules/6780073/

Essentially, error stack traces / caused by clauses can provide information about the underlying system's behavior to attackers, allowing them to gather information on which attacks may be possible on your system.

Therefore, DSpace purposefully only provides generic error messages in responses, and never any information from the Java stacktrace itself. The full error stacktrace can always be found in the DSpace logs (dspace.log), and there's a configuration on the backend that allows you to optionally include the full stacktrace in REST responses (server.error.include-stacktrace = always) for debugging (or development) purposes.

If I'm misunderstanding this PR, let me know. But, I'd lean towards not merging this for the reasons given.

@tlloydthwaites
Copy link
Author

Thanks Tim,

Understand about information leakage, and that this is a bit of a desperate measure! At the moment however, because our dspace is hosted elsewhere, we are stuck.

The issue is that we don't have access to the logs, and our provider will not enable debug mode for performance reasons. So as REST consumers we have no way of knowing what is happening when we get an error. For example, sometimes file uploads work, sometimes they don't. We need some way to diagnose this without enabling debug/trace logging. Even a logsnap solution wouldn't help as the error log also does not really show the root problem, just general advice that an error occurred.

Another example: one time we updated our custom front-end tool and found that nothing was working - it took us a long time to determine that our metadata registry was missing a key. A cause message in the API response would have let us diagnose this straight away. (We would not display that cause to users, but store it in our own logs.)

Could it be managed with permissions? For example, only allow cause messages if the API consumer is logged in, and possibly check for a specific permission? (Either a user policy or a simple list of user ids in config). That way unauthenticated consumers will never see cause messages. I will mock that up and update the PR - just as a means of discussion.

Appreciate you getting back to me so quickly Tim, thank you!

Add configurable authorisation of cause reporting
@tlloydthwaites
Copy link
Author

Updated with some ideas on how to lock it down. A user must explicitly be allowed to receive cause messages, so this will have no effect unless explicitly enabled. Presumably, if a user is allowed to see the messages, then they are a known to the repo admins and it would be ok for them to see any potentially leaked information.

@lgtm-com
Copy link

lgtm-com bot commented Jun 22, 2022

This pull request introduces 1 alert when merging 60a6642 into 3fa31f5 - view on LGTM.com

new alerts:

  • 1 for Information exposure through a stack trace

@mwoodiupui
Copy link
Member

Could you, at least temporarily, add a log appender that sends log messages to a place from which you can read them? Those logs belong to you, are part of your critical operational data, and should be readily available to you.

I'm inclined to say that this patch addresses a local administrative issue and should not be in the stock code. You can of course run with the patch in your own shop.

I really hate to discourage sharing. Thank you for this patch.

@github-actions github-actions bot added the merge conflict PR has a merge conflict that needs resolution label Jun 22, 2022
@tlloydthwaites
Copy link
Author

tlloydthwaites commented Jun 22, 2022

Thanks for the input Mark. The main motivation is that we are currently getting ready for the 5-yearly government reporting in Aus, so the system is getting fairly heavy use, and I am having difficulty supporting people when there's an error.

Even if we could enable debug tracing in the logs, we would have to introduce downtime while restarting the server before trying to reproduce the error. With over 300 researchers using our front-end tool day to day, we would have to basically have to leave debug tracing on all the time, which as we learned is not very performant!

I understand the concerns regarding potential information leakage. By having to explicitly enable this feature for individual credentials, I feel this protects against potentially sensitive error messages getting into the wrong hands, as you would only enable it for an API consumer you were writing yourself. It would be a fairly esoteric setting, and with the required header I can't see anyone enabling it accidentally.

The only other solution I can think of would involve explicitly sending safe messages (ie wrap in SafeException or similar), which would obviously be a fairly onerous task.

@tdonohue
Copy link
Member

tdonohue commented Jun 22, 2022

@tlloydthwaites : I'm having a hard time understanding the use case here (and whether it would apply to others in some manner). Currently, we definitely assume that if you are developing a frontend for the DSpace backend then you must have some access to the logs of the backend. There are plenty of log sharing tools out there that can provide such access, even if your team cannot be given commandline access to the backend REST API (or you cannot enable the debug tracing).

Your configuration rest.causeAdvice.allowedUsers seems like it really wouldn't scale to more than a few users (I cannot imagine any site wanting to add 100s of users to this list), which begs the question why those few users cannot just be given log access.

Keep in mind, I'm not against a solution here that allows for some to see the errors via the REST API... but it'd have to be secure. Your solution seems secure, but it really doesn't scale, which begs the question of whether DSpace should even offer this or simply point users to log analysis/sharing tools instead (which would provide you with more controls over logs and be non-DSpace specific).

The only other option I can think of would be to allow authenticated Administrative Accounts to retrieve caused by errors from the REST API (perhaps only when a configuration flag is enabled). That's potentially more scalable, but again I'm not entirely convinced yet that this is a common need for other institutions.

@tlloydthwaites
Copy link
Author

tlloydthwaites commented Jun 23, 2022

Well it seems that I am a goose. I thought that server.error.include-stacktrace only affected what was sent to logs - I didn't twig that is also sent it to REST responses, and would not also be logged unless in debug mode.

Thanks for your patience everyone and sorry for wasting your time! :)

EDIT: I'm guessing this is not something we should enable in production though... so we are back to perhaps enabling it only for authorised/admin users.

@tdonohue
Copy link
Member

Flagging as "needs discussion". I'm still not entirely sure I understand the use cases here beyond being helpful for development. While we could give Administrators this capability, there are no tools in the UI to allow them to easily view the exception cause -- so it'd only make sense for Administrators who are also very technical (or developers themselves).

The core question is whether DSpace should solve this problem, or whether this is a use case that is very specific to a single institution. If others are interested in this feature, please speak up or provide additional use cases.

@tdonohue tdonohue added needs discussion Ticket or PR needs discussion before it can be moved forward. error handling How errors are handled and/or logged interface: REST API v7+ REST API for v7 and later (dspace-server-webapp module) labels Jun 27, 2022
@tlloydthwaites
Copy link
Author

The main case for this is for REST consumers other than Angular UI. We use it as part of a microservice orchestration piece, eg to display items on a researcher's website page, and to allow researchers to add/edit their items from a web front end.

In the latter case, when the API returns errors, we have no way to help. Rebooting the server to enable debug mode and look through logs is infeasible for us. As noted, even if we had access to server logs, they don't print cause messages either (they probably should - we don't need a stack trace, just the message), and our provider is very reluctant to enable debug logging for performance reasons.

So this is probably a fairly unusual scenario where we can't put the server in debug mode, but to me it seems a bit drastic having to do that to get an error message. If the REST API is mainly for serving Angular, then I can see that this change is not that useful.

@tdonohue
Copy link
Member

tdonohue commented Jun 29, 2022

@tlloydthwaites : It sounds like you've jumbled a few issues in that response... and I'd like to tease them apart, as one may be a bug while the other needs more discussion.

  1. Logs don't detail cause of exception by default -> This would be considered a BUG and it should be fixed. I was not aware of that, as I thought the logs always include the full exception stacktrace whenever the system returns a 5xx error. That said, I believe that logs don't include that stacktrace for some 4xx errors (like 404) as doing so would be very noisy in the logs (i.e. anytime a user hits an invalid URL, then a 404 would log a detailed stacktrace).
  2. Returning exception causes or stacktraces via the REST API -> This is the issue that needs more discussion, as it can be a security issue (as noted above).
    1. I'd be OK with returning more specific errors in specific scenarios. That seems like a completely reasonable request & something our REST API may not be doing well enough. So if there are specific scenarios where you are not getting enough information, maybe we should look more closely at those & see if we can provide a better error message back via the REST API. This would also benefit general users as they'd see a more specific error.
      • Sidenote: It's worth noting that, in some scenarios, the return code has a documented meaning in our Contract (so if you are aware of the documented meaning, then the return code may be all you need). So, as an example, if you attempt to create a Collection and get a 422 response, the meaning is always that the Parent Community doesn't exist: https://github.com/DSpace/RestContract/blob/main/collections.md#creating-a-collection (These are things the Angular UI just knows about, but another application may not)
    2. However, I'm objecting to the idea of blindly returning Java-level error messages or stacktraces for all exceptions in DSpace. This is where the concerns around "information disclosure" arise: https://lgtm.com/rules/6780073/
    3. As we discussed, one "workaround" is to perhaps let Administrators see these details Java-level errors at all times... but then Administrator accounts should be very tightly controlled. For example, if an external application was setup to always login as an Administrator (hopefully no one would do that though), then it would potentially result in the same "information disclosure" issues with DSpace. Therefore, this would need more careful thought to ensure we can implement & document it in a safe manner. This approach wouldn't help normal users though (like in the "i" bullet above), only Admins

If we can tease these apart, I think there are some areas of commonality here & areas we could quickly fix. I'm just not willing to jump straight to "blindly returning Java-level error messages or stacktraces" (even though I do realize that's the "quickest fix" for you, it's concerning behavior to anyone who is security minded).

Perhaps we should create a few different tickets here to tease apart the different issues you are seeing?

@tlloydthwaites
Copy link
Author

Thanks for your thoughts @tdonohue - I appreciate the consideration for the API in general. I agree, it is definitely a quick-fix and should probably not be a PR in its current form - more to stimulate discussion. We can apply this to a custom build but I would prefer to keep abreast with the main repo.

The general issue is that we need at least some feedback for errors to be sent in the REST response. In our case it doesn't matter if it's 4xx or 5xx. This is not so much for debugging dspace itself, but 'why did my item not submit'. I wouldn't display these messages to users, but some feedback is crucial for us to help diagnose issues such as why some attachments upload and some don't.

We don't technically need a cause message (which may be quite specific, such as JDBC problems), and we would never need a stack trace. A few examples that we have tracked down are that we have forgotten to migrate a meta key, or have sent an invalid owning collection for new items. The cause messages for those exceptions are very useful! However, other times we have some files that won't upload, and it is difficult to see why. Perhaps a specific exception type could be thrown that indicates it is intended/allowed to be reported. However, that would require a fair bit of work, which is why we reached for a quick-fix.

Regarding the logs, when I see errors in my local dev build, it is usually a WARN message with a 4xx HTTP code and 'An exception has occurred.' 5xx will include the stack trace, but it is the 4xx that are a mystery.

@tlloydthwaites
Copy link
Author

To answer your points:

  1. Agreed - logs don't need stack traces for 404s. However, just the cause message would probably be ok for all 4xx exceptions? It wouldn't fill the log and would be very useful. (Eg "An exception has occurred: ")

  2. i) This is really the solution. Specific messages for known problems should be fed back to the REST response where possible. The problem is, we don't know what specific situations will throw an exception. The message we get, 'An exception has occurred', is not really useful.

  3. i-sidenote: that is probably something I haven't looked into closely enough, however a message would still be nice.

  4. ii) I think the idea of returning causes arose because it was the easiest way to get some feedback coming from the API. It is probably really a symptom of point 2.i.

  5. iii) that is a major weakness of the current PR - it requires admin privileges to see error messages. I think this stems from using a cannon to kill a mosquito - sending java-level cause messages in the absence of controlled, sanitised error feedback.

@mwoodiupui
Copy link
Member

"An exception has occurred" is a useless and exasperating message. One should never see such a message. OTOH simply upchucking an exception's message or, worse, a stack trace, without interpretation, is often nearly as bad for the client.

I know that I struggle with the tendency to think of exception handling as a botheration rather than an opportunity to explain failure, but it is worth the struggle. The caller knows what it means for the callee to have thrown an exception in this context, and the developer should attempt to interpret the exception in terms of the caller's own purpose.

Now, there may be cases where LGTM is worrying about nothing. There is no point in trying to hide details of DSpace's methods, because the source is published. What we need to protect is local secrets and users' secrets. We should avoid returning stack traces to clients, not because they give away secrets, but because they are usually clutter. What the client needs to know is what it did wrong, or that the service failed for reasons that are interesting only to the service's own admin.s. "Internal error, please tell us what you did and when" is as helpful to the client as "you are not authorized for that operation", if it is an internal error (resource exhaustion, "impossible" internal state, etc).

Crafting a good error report is hard work that interrupts the flow of the developer's thoughts, but with practice it becomes an enjoyable skill as well as a valuable one.

@kshepherd
Copy link
Member

Just a note that I have also found the vague 4xx/5xx "An exception has occurred" errors frustrating! In development enabling request stack traces helps, but I think it'd be great to get the REST API sending informative, internally-consistent error messages that explain a specific problem without disclosing sensitive information (and in logs, too) even if they're not intended to be displayed to a user in the Angular UI

@tlloydthwaites
Copy link
Author

I'm closing this pull request, as even the root exception messages are not always helpful. As noted here, we are getting exceptions like this:

org.springframework.web.util.NestedServletException: Request processing failed; nested exception is javax.persistence.PersistenceException: org.hibernate.exception.SQLGrammarException: could not extract ResultSet

This is no more illuminating than 'an exception has occurred.'

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
error handling How errors are handled and/or logged interface: REST API v7+ REST API for v7 and later (dspace-server-webapp module) merge conflict PR has a merge conflict that needs resolution needs discussion Ticket or PR needs discussion before it can be moved forward.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

More detail for REST exceptions
5 participants