Enable aborting an Operation result, fix dead lock #173

niklas88 · 2019-01-10T11:53:55Z

When an exception is thrown during Operation::computeResult() the
computing thread would leave the ResultTable in its unfinished state in
the subtree cache. Another thread waiting on it to be finished would
wait indefinitely.

We fix this by introducing the concept of aborting an Operation result.
When an exception is thrown in Operation::computeResult() we call the
newly added ResultTable::abort() to mark the result as aborted, which
unblocks waiting threads. Then we propagate the exception up the stack
of recursive Operation::getResult() calls.

To improve error output we propagate the exception with the original
details but a new QUERY_ABORTED error code. This allows us to print only
the innermost Operation, which caused the original error, while all
Operations further up the stack are aborted silently.

The aborted Operation result is intentionally left in the cache.
ResultTable::abort() clears it, so it takes little memory, and it will
be pruned from the cache like any other result. Until then it may serve
as a zero-cost warning that a particular subtree fails.
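
The propagation described above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: `SketchException`, `SketchResult`, `SketchOperation`, and the `log` vector (standing in for `LOG(ERROR)`) are hypothetical names, and the real `ResultTable::abort()` additionally has to wake waiting threads.

```cpp
#include <memory>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-ins for the real classes; only the control flow matters.
enum class ErrorCode { OTHER, QUERY_ABORTED };

struct SketchException : std::runtime_error {
  ErrorCode code;
  SketchException(ErrorCode c, const std::string& details)
      : std::runtime_error(details), code(c) {}
};

struct SketchResult {
  bool aborted = false;
  std::vector<int> rows;
  void abort() {
    rows.clear();    // drop partial data so the cached entry stays small
    aborted = true;  // the real class would also notify waiting threads here
  }
};

struct SketchOperation {
  std::string name;
  SketchOperation* child;         // nullptr marks the leaf operation
  std::vector<std::string>* log;  // stands in for LOG(ERROR)
  std::shared_ptr<SketchResult> result = std::make_shared<SketchResult>();

  SketchOperation(std::string n, SketchOperation* c,
                  std::vector<std::string>* l)
      : name(std::move(n)), child(c), log(l) {}

  void computeResult() {
    if (child != nullptr) {
      child->getResult();  // recursive call into the subtree
    } else {
      throw SketchException(ErrorCode::OTHER, "index file missing");
    }
  }

  std::shared_ptr<SketchResult> getResult() {
    try {
      computeResult();
    } catch (const SketchException& e) {
      result->abort();  // every level marks its own result as aborted
      if (e.code != ErrorCode::QUERY_ABORTED) {
        // Only the innermost failure carries the original code and is logged.
        log->push_back(name + ": " + e.what());
      }
      throw SketchException(ErrorCode::QUERY_ABORTED, e.what());
    }
    return result;
  }
};
```

With a chain of three such operations, only the leaf would add a log entry, while all three results end up aborted and the caller sees a single exception carrying `QUERY_ABORTED`.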

floriankramer

Looks good overall, but I think we should catch all exceptions in the try block for computeResult to ensure that no unexpected exception type can introduce deadlocks.

floriankramer · 2019-01-15T11:20:50Z

src/engine/Operation.h

-      computeResult(newResult.get());
+      try {
+        computeResult(newResult.get());
+      } catch (const ad_semsearch::Exception& e) {


What about exceptions that are not ad_semsearch exceptions (e.g. a standard library call that fails, or std::bad_alloc)? A second catch block that uses catch (...) to catch everything and then throw; to rethrow whatever it caught would help prevent deadlocks when an unexpected exception is thrown.
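
A minimal sketch of that suggestion, assuming a hypothetical `SketchResult` with an `abort()` method and a hypothetical `computeGuarded` wrapper in place of the real `Operation::getResult()`:

```cpp
#include <functional>

// Hypothetical stand-in for ResultTable; the real abort() also wakes waiters.
struct SketchResult {
  bool aborted = false;
  void abort() { aborted = true; }
};

// Runs `compute` and guarantees that `result` is marked as aborted if
// anything at all is thrown, whatever the type of the thrown object.
void computeGuarded(SketchResult& result,
                    const std::function<void()>& compute) {
  try {
    compute();
  } catch (...) {
    result.abort();
    throw;  // rethrows the exception currently being handled, type intact
  }
}
```

Because a bare `throw;` rethrows the original exception object, a caller further up can still catch `std::bad_alloc` or any other specific type.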

floriankramer · 2019-01-15T11:25:39Z

src/util/Exception.h

@@ -23,8 +23,9 @@ using std::string;
                                  __PRETTY_FUNCTION__);              \
  }  // NOLINT
 // Rethrow an exception
-#define AD_RETHROW(e) \
-  throw semsearch::Exception(e.getErrorCode(), e.getErrorDetails())  // NOLINT
+#define AD_RETHROW(e)                             \


Is there any point to an AD_RETHROW given that throw; will rethrow the current exception?

You're right, it doesn't have a point, and in fact it isn't even used.

joka921

Requested type changes as discussed in person

joka921 · 2019-01-15T10:58:14Z

src/engine/ResultTable.h

@@ -20,7 +20,7 @@ using std::vector;

 class ResultTable {
 public:
-  enum Status { FINISHED = 0, OTHER = 1 };
+  enum Status { IN_PROGRESS = 0, FINISHED = 1, ABORTED = 2 };


I think ResultTable::awaitFinished currently never returns in the aborted case.
The lambda in the condition_variable wait should check for any status != IN_PROGRESS. Then the outer calls can decide what to do in the aborted case or possibly on timeouts.

Argh, you're absolutely right, I totally missed that.
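
The wait condition suggested here could look roughly like the sketch below. `SketchResultTable` is a hypothetical reduction of `ResultTable`, and returning the status from `awaitFinished()` is an assumption made for illustration; the real method may have a different signature.

```cpp
#include <condition_variable>
#include <mutex>

class SketchResultTable {
 public:
  enum Status { IN_PROGRESS = 0, FINISHED = 1, ABORTED = 2 };

  void finish() { setStatus(FINISHED); }
  void abort() { setStatus(ABORTED); }

  // Blocks until the computing thread calls finish() or abort(), then
  // returns the final status so the caller can react to ABORTED.
  Status awaitFinished() const {
    std::unique_lock<std::mutex> lock(_mutex);
    // Waiting for "!= IN_PROGRESS" rather than "== FINISHED" is what lets
    // waiters wake up after an abort.
    _cond.wait(lock, [this] { return _status != IN_PROGRESS; });
    return _status;
  }

 private:
  void setStatus(Status s) {
    {
      std::lock_guard<std::mutex> lock(_mutex);
      _status = s;
    }
    _cond.notify_all();  // wake every thread blocked in awaitFinished()
  }

  mutable std::mutex _mutex;
  mutable std::condition_variable _cond;
  Status _status = IN_PROGRESS;
};
```

A predicate of `_status == FINISHED` would leave waiters blocked forever once `abort()` has been called, which is the problem pointed out in this comment.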

joka921 · 2019-01-15T11:02:55Z

src/engine/Operation.h

-      computeResult(newResult.get());
+      try {
+        computeResult(newResult.get());
+      } catch (const ad_semsearch::Exception& e) {


ad_semsearch::Exception should inherit from std::exception and implement what().
ad_semsearch::AbortException should be a separate type.
Then you can write one catch block for AbortException (simply rethrow) and one for all other exceptions (print the what() message and throw an AbortException). In particular, we also want to handle std::bad_alloc etc.

Also make ad_semsearch::Exception inherit from std::exception and use a separate ad_semsearch::AbortException to propagate exceptions during query abortion
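
The requested type layout might look like the following sketch. The namespace `sketch` and the member names are hypothetical, and the real `ad_semsearch::Exception` carries more state (error code, details) than shown here.

```cpp
#include <exception>
#include <string>
#include <utility>

namespace sketch {

// General error type: derives from std::exception and implements what().
class Exception : public std::exception {
 public:
  explicit Exception(std::string details) : _details(std::move(details)) {}
  const char* what() const noexcept override { return _details.c_str(); }

 private:
  std::string _details;
};

// Deliberately not derived from sketch::Exception, so a handler for
// sketch::Exception cannot swallow it by accident. Handlers for it must
// come before any handler for std::exception.
class AbortException : public std::exception {
 public:
  // Wraps any std::exception, keeping the original message.
  explicit AbortException(const std::exception& cause)
      : _what(cause.what()) {}
  const char* what() const noexcept override { return _what.c_str(); }

 private:
  std::string _what;
};

}  // namespace sketch
```

Since both types derive from `std::exception`, a single `catch (const std::exception&)` block covers them together with standard library errors such as `std::bad_alloc`.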

niklas88 · 2019-01-16T14:06:16Z

@joka921, @floriankramer I've addressed your review comments. The change became a bit more involved than the last version but we should also now be catching stuff like std::bad_alloc correctly.

floriankramer

Apart from a non-critical catch-all block, this looks good to me.

floriankramer · 2019-01-16T15:33:32Z

src/engine/Operation.h

        newResult->abort();
        // Rethrow as QUERY_ABORTED allowing us to print the Operation
        // only at innermost failure of a recursive call
-        throw ad_semsearch::Exception(ad_semsearch::Exception::QUERY_ABORTED,
-                                      e.getErrorDetails());
+        throw ad_semsearch::AbortException(e);
      }


I still think a catch all block in the form of catch(...) { would be a good idea here, as we might otherwise run into deadlocks when weird exceptions are being thrown (as pretty much anything could be used as an exception object, and exceptions do not have to inherit from std::exception). With our current code all exceptions should inherit from std::exception, so this is not a critical problem right now, but might help with otherwise unexpected behavior in the future.

Hmm, then I wouldn't have an object to pass to the AbortException constructor, so I would need that to be an extra block. Let's see how that looks, and then we can decide whether we would rather crash on these weird exceptions.

According to cppreference, all standard library exceptions inherit from std::exception, so I'm not sure this warrants uglier code. If there's an exception that is neither from the standard library nor an ad_semsearch::Exception, something would be pretty fishy.

I am fine with both solutions. We generally do not use third-party libraries in this part, so the catch(...) is not necessary for me. But it does not bother me too much, since it is very explicit and readable what happens here.

This would catch even those exceptions that don't inherit from std::exception. Since those are pretty weird, I'm not 100% sure if this is much better than just crashing
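
For reference, the three-block layout discussed in this thread could be sketched as follows. `SketchAbort`, `SketchResult`, and `guardedCompute` are hypothetical names, and the fixed message used in the catch-all block is an assumption, since that block has no exception object to take a message from.

```cpp
#include <exception>
#include <functional>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical stand-in for ad_semsearch::AbortException.
struct SketchAbort : std::exception {
  std::string msg;
  explicit SketchAbort(std::string m) : msg(std::move(m)) {}
  const char* what() const noexcept override { return msg.c_str(); }
};

// Hypothetical stand-in for ResultTable.
struct SketchResult {
  bool aborted = false;
  void abort() { aborted = true; }
};

void guardedCompute(SketchResult& result,
                    const std::function<void()>& compute) {
  try {
    compute();
  } catch (const SketchAbort&) {
    // Already reported further down; just unwind. This block must come
    // first because SketchAbort is itself a std::exception.
    result.abort();
    throw;
  } catch (const std::exception& e) {
    result.abort();
    throw SketchAbort(e.what());
  } catch (...) {
    // No object to inspect here, hence the fixed message.
    result.abort();
    throw SketchAbort("unexpected non-std::exception type");
  }
}
```

The order of the handlers matters: handlers are tried top to bottom, so the most specific type goes first and `catch (...)` must be last.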

joka921

There is one point I am not sure about (aborting queries recursively). The rest is just cosmetic.

joka921 · 2019-01-18T12:36:20Z

src/ServerMain.cpp

-    LOG(ERROR) << e.getFullErrorMessage() << '\n';
-    return 1;
+  } catch (const std::exception& e) {
+    LOG(ERROR) << e.what() << std::endl;
  }
  return 0;


Is there any way that we legally reach this return 0 statement? We could remove it to suggest that it is never reached. In addition, server.run() could probably get a [[noreturn]] attribute if it is indeed an infinite loop that can only be left by throwing. (Unrelated to your changes, I just noticed this now.)

I tried marking it [[noreturn]], but the compiler thinks it does return because it uses thread::join() and cannot see that the threads will only terminate with exceptions.

joka921 · 2019-01-18T12:39:24Z

src/engine/Operation.h

+        computeResult(newResult.get());
+      } catch (const ad_semsearch::AbortException& e) {
+        // AbortExceptions have already been printed simply rethrow to
+        // unwind the callstack until the whole query is aborted


Don't we also have to call newResult->abort() in this recursive call?

Yes, we absolutely do need to abort() here as well; I missed that with the last change.

joka921 · 2019-01-18T12:43:11Z

src/engine/Operation.h

        newResult->abort();
        // Rethrow as QUERY_ABORTED allowing us to print the Operation
        // only at innermost failure of a recursive call
-        throw ad_semsearch::Exception(ad_semsearch::Exception::QUERY_ABORTED,
-                                      e.getErrorDetails());
+        throw ad_semsearch::AbortException(e);
      }


I am fine with both solutions. We generally do not use third-party libraries in this part, so the catch(...) is not necessary for me. But it does not bother me too much, since it is very explicit and readable what happens here.

joka921 · 2019-01-18T12:56:27Z

src/util/Exception.h


  //! Get error message pertaining to code
-  string getErrorMessage() const { return errorCodeAsString(_errorCode); }
+  string getErrorMessage() const noexcept {
+    return errorCodeAsString(_errorCode);


In theory this might throw (involves a std::string constructor). But this still might be intended, because in those rare cases we probably cannot save our software at all anymore.

Yes, I think this would be a very weird error for which crashing might be the best thing to do.

niklas88 · 2019-01-21T16:03:55Z

@joka921 I've pushed a new version

joka921

LGTM, feel free to merge (possibly rebase onto my PR that you merged earlier today, if they share changed files).

niklas88 requested review from joka921 and floriankramer January 10, 2019 11:53

floriankramer suggested changes Jan 15, 2019

joka921 requested changes Jan 15, 2019

niklas88 added 2 commits January 15, 2019 22:24

Adapt awaitFinished() condition for abort

0fcf214

Handle std::exceptions, not just ad_semsearch::Ex…

c5c9b2b

Also make ad_semsearch::Exception inherit from std::exception and use a separate ad_semsearch::AbortException to propagate exceptions during query abortion

floriankramer approved these changes Jan 16, 2019

Add catch(..) all block for computeResult()

24506b3

This would catch even those exceptions that don't inherit from std::exception. Since those are pretty weird, I'm not 100% sure if this is much better than just crashing

joka921 requested changes Jan 18, 2019

Review comments, missing newResult->abort()

9d3f1d1

joka921 approved these changes Jan 21, 2019

niklas88 merged commit 5e598f6 into ad-freiburg:master Jan 22, 2019

niklas88 mentioned this pull request Jan 22, 2019

Fix use of ResultTable::isFinished() fixes build #177

Merged

niklas88 deleted the fix_compute_result_exceptions branch July 19, 2019 09:22
