Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Enable aborting an Operation result, fix dead lock #173

Merged
merged 5 commits into from Jan 22, 2019
Merged

Enable aborting an Operation result, fix dead lock #173

merged 5 commits into from Jan 22, 2019

Conversation

niklas88
Copy link
Member

When an exception is thrown during Operation::computeResult() the
computing thread would leave the ResultTable in its unfinished state in
the subtree cache. Another thread waiting on it to be finished would
wait indefinitely.

We fix this by introducing the concept of aborting an Operation result.
When an exception is thrown in Operation::computeResult() we call the
newly added ResultTable::abort() to mark the result as aborted unlocking
waiting threads. Then we propagate the exception up the stack of
recursive Operation::getResult() calls.

To improve error output we propagate the exception with the original
details but a new QUERY_ABORTED error code which allows us to only print
the inner most Operation which caused the original error while all
Operations up the stack are silently aborted.

The aborted Operation result is left in the cache intentionally. It is
cleared during ResultTable::abort() and thus takes little memory and
will be pruned from the cache like any other result until which time it
may serve as a zero cost warning that a particular subtree fails.

When an exception is thrown during Operation::computeResult() the
computing thread would leave the ResultTable in its unfinished state in
the subtree cache. Another thread waiting on it to be finished would
wait indefinitely.

We fix this by introducing the concept of aborting an Operation result.
When an exception is thrown in Operation::computeResult() we call the
newly added ResultTable::abort() to mark the result as aborted unlocking
waiting threads. Then we propagate the exception up the stack of
recursive Operation::getResult() calls.

To improve error output we propagate the exception with the original
details but a new QUERY_ABORTED error code which allows us to only print
the inner most Operation which caused the original error while all
Operations up the stack are silently aborted.

The aborted Operation result is left in the cache intentionally. It is
cleared during ResultTable::abort() and thus takes little memory and
will be pruned from the cache like any other result until which time it
may serve as a zero cost warning that a particular subtree fails.
Copy link
Member

@floriankramer floriankramer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good overall, but I think we should catch all exceptions in the try block for computeResult to ensure that no unexpected exception type can introduce deadlocks.

computeResult(newResult.get());
try {
computeResult(newResult.get());
} catch (const ad_semsearch::Exception& e) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about non ad_semsearch exceptions (e.g. a standard library call that fails, or std::bad_alloc)? A second catch block that uses catch (...) to catch everything and then throw; to rethrow whatever it caught would help prevent deadlocks when an unexpected exception is thrown.

@@ -23,8 +23,9 @@ using std::string;
__PRETTY_FUNCTION__); \
} // NOLINT
// Rethrow an exception
#define AD_RETHROW(e) \
throw semsearch::Exception(e.getErrorCode(), e.getErrorDetails()) // NOLINT
#define AD_RETHROW(e) \
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there any point to an AD_RETRHOW given that throw; will rethrow the current exception?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're right it doesn't have a point and in fact it isn't even used

Copy link
Member

@joka921 joka921 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Requested type changes as discussed in person

@@ -20,7 +20,7 @@ using std::vector;

class ResultTable {
public:
enum Status { FINISHED = 0, OTHER = 1 };
enum Status { IN_PROGESS = 0, FINISHED = 1, ABORTED = 2 };
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think currently ResultTable::awaitFinished never returns in the aborted case.
I think the lambda in the condition_variable wait should check for any status != IN_PROGRESS. Then the outer calls can decide what to do on Aborted cases or possibly timeouts.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Argh you're absolutely right, totally missed that.

computeResult(newResult.get());
try {
computeResult(newResult.get());
} catch (const ad_semsearch::Exception& e) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ad:semsearch::Exception should inherit from std::exception and implement what().
ad_semsearch::AbortException should be a separate type.
Then you can write one catch block for AbortException (simply rethrow) and one for all other exceptions (print the what() message and throw and AbortException. In particular we also want to handle std::bad_alloc etc.

Also make ad_semsearch::Exception inherit from std::exception and use a
separate ad_semsearch::AbortException to propagae excetpions during
query abortion
@niklas88
Copy link
Member Author

@joka921, @floriankramer I've addressed your review comments. The change became a bit more involved than the last version but we should also now be catching stuff like std::bad_alloc correctly.

Copy link
Member

@floriankramer floriankramer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Apart from a non critical catch all block this looks good to me.

newResult->abort();
// Rethrow as QUERY_ABORTED allowing us to print the Operation
// only at innermost failure of a recursive call
throw ad_semsearch::Exception(ad_semsearch::Exception::QUERY_ABORTED,
e.getErrorDetails());
throw ad_semsearch::AbortException(e);
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I still think a catch all block in the form of catch(...) { would be a good idea here, as we might otherwise run into deadlocks when weird exceptions are being thrown (as pretty much anything could be used as an exception object, and exceptions do not have to inherit from std::exception). With our current code all exceptions should inherit from std::exception, so this is not a critical problem right now, but might help with otherwise unexpected behavior in the future.

Copy link
Member Author

@niklas88 niklas88 Jan 16, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, then I wouldn't have an object to pass to the AbortException constructor so I would need that to be an extra block. Let's see how that looks and then we can decide if we rather want to crash on these weird exceptions

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

According to cppreference all standard library exceptions inherit from std::exception so I'm not sure if this warrants uglier code. If there's an exception that isn't from the standard library or ad_semsearch::Exception something would be pretty fishy

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am fine with both solutions. We generally do not use third-party libraries in this part, so the catch(...) is not necessary for me. But it does not bother me too much, since it is very explicit and readable what happens here.

This would catch even those exceptions that don't inherit from
std::exception. Since those are pretty weird, I'm not 100% sure if this
is much better than just crashing
Copy link
Member

@joka921 joka921 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is one point I am not sure about (aborting queries recursively). The rest is just cosmetic.

LOG(ERROR) << e.getFullErrorMessage() << '\n';
return 1;
} catch (const std::exception& e) {
LOG(ERROR) << e.what() << std::endl;
}
return 0;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there any way that we legally reach this return 0 statement? We could remove it to suggest that it is never reached. In addition provably server.run() could get a [[noreturn]] attribute if it is indeed an infinite loop that can be only left by throwing. (Unrelated to your changes, I just see this now.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried marking it [[noreturn]] but the compiler thinks it does return because it uses thread::join() and the compiler can't see that the threads will only terminate with exceptions.

computeResult(newResult.get());
} catch (const ad_semsearch::AbortException& e) {
// AbortExceptions have already been printed simply rethrow to
// unwind the callstack until the whole query is aborted
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't we have to recursively set the newResult->abort() also in this recursive call?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes we absolutely do need to abort() here as well, missed that with the last change.

newResult->abort();
// Rethrow as QUERY_ABORTED allowing us to print the Operation
// only at innermost failure of a recursive call
throw ad_semsearch::Exception(ad_semsearch::Exception::QUERY_ABORTED,
e.getErrorDetails());
throw ad_semsearch::AbortException(e);
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am fine with both solutions. We generally do not use third-party libraries in this part, so the catch(...) is not necessary for me. But it does not bother me too much, since it is very explicit and readable what happens here.


//! Get error message pertaining to code
string getErrorMessage() const { return errorCodeAsString(_errorCode); }
string getErrorMessage() const noexcept {
return errorCodeAsString(_errorCode);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In theory this might throw (involves a std::string constructor). But this still might be intended, because in those rare cases we probably cannot save our software at all anymore.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes I think this would be a very weird error for which crashing might be the best thing to do.

@niklas88
Copy link
Member Author

@joka921 I've pushed a new version

Copy link
Member

@joka921 joka921 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, Fell free to merge (possibly rebase with my PR that you merged earlier today, if they share changed files.

@niklas88 niklas88 merged commit 5e598f6 into ad-freiburg:master Jan 22, 2019
@niklas88 niklas88 deleted the fix_compute_result_exceptions branch July 19, 2019 09:22
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants