Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

First draft of a timeout for operations #346

Merged
merged 17 commits into from Apr 25, 2021
Merged

Conversation

joka921
Copy link
Member

@joka921 joka921 commented Sep 25, 2020

  • Timeout can be specified via timeout=3 (3 seconds) in the http-get request.

  • Currently only checked between operations and inside the expensive dummy join and transitive path operations

- Every Operation that is executed and that may possibly take a long time regularly
  checks whether it still has time to continue and else throw a TimeoutException.

- This commit implements such checks for almost all operations.
- For sorting, there is no efficient way to implement a timeout without changing
  the implementation of the library function. We thus estimate the execution time
  before starting the operation, and only start it if the estimate is not much higher
  than the time budget.

- The Timeout currently can be set for each query the HTTP parameter "timeout", the
  value is a floating point value that specifies the number of seconds.
@joka921 joka921 force-pushed the f.Timeout branch 2 times, most recently from 33dc9e0 to 4c1b6a4 Compare April 10, 2021 13:54
Copy link
Member Author

@joka921 joka921 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We also need to make sure, that the other operations get a timeout support at some point.

  • In the transitive path the HashMaps take quite a long time to destroy, this leads to strange timeout behavior.
  • The Pattern trick should get a timeout functionality as part of Florian's extension.
  • With parallelization (not only of pattern trick) in mind, we should research the best pattern to cancel parallel computations in OpenMP etc.

src/util/Timer.h Outdated Show resolved Hide resolved
src/util/Timer.h Outdated
/// Factory function for a timer that never times out
static TimeoutTimer unlimited() { return TimeoutTimer(UnlimitedTag{}); }
/// Factory function for a timer that times out after a number of microseconds
static TimeoutTimer usecLimited(off_t usecs) { return TimeoutTimer(usecs); }
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fromMicroseconds

src/util/Timer.h Outdated Show resolved Hide resolved
src/util/Timer.h Outdated Show resolved Hide resolved
src/util/Timer.h Show resolved Hide resolved
src/engine/Operation.h Outdated Show resolved Hide resolved
src/engine/Operation.cpp Outdated Show resolved Hide resolved
src/engine/Join.cpp Outdated Show resolved Hide resolved
src/engine/Join.cpp Outdated Show resolved Hide resolved
src/engine/GroupBy.cpp Show resolved Hide resolved
Copy link
Member

@hannahbast hannahbast left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Partial code review in 1-1 with Johannes, will do the rest later today.

Thanks a lot for this valuable PR!

src/util/Timer.h Outdated
if (_isUnlimited) {
return false;
}
auto prevRunning = isRunning();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

isRunningSnapshot

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

// NOTE: we cannot get the current timer value without stopping the timer.

src/util/Timer.h Outdated
}
auto prevRunning = isRunning();
stop();
auto res = usecs() > _timeLimitInMicroseconds;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hasTimedOut

src/util/Timer.h Outdated

/// Did this timer already timeout
/// Can't be const because of the internals of the Timer class.
bool isTimeout() {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hasTimedOut

src/util/Timer.h Outdated
static_cast<double>(_timeLimitInMicroseconds) / (1000 * 1000);
throw TimeoutException(additionalMessage +
"A Timeout occured. The time limit was "s +
std::to_string(seconds) + "seconds"s);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed precision or integer milliseconds with thousand separator.

Comment on lines 111 to 112
// set a global timeout timer for all child operations.
// As soon as this runs out, the complete tree will fail.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

// Use the same timeout timer for all children of an operation (= query plan rooted at that operation). As soon as one child times out, the whole operation times out.

@@ -80,7 +92,14 @@ shared_ptr<const ResultTable> Operation::getResult(bool isRoot) {
// Passing the raw pointer here is ok as the result shared_ptr remains
// in scope
try {
if (_timeoutTimer->wlock()->isTimeout()) {
throw ad_utility::TimeoutException("uncaught Timeout before " +
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"Timeout in operation without timeout functionality, before "

@@ -18,7 +18,7 @@ Join::Join(QueryExecutionContext* qec, std::shared_ptr<QueryExecutionTree> t1,
size_t t2JoinCol, bool keepJoinColumn)
: Operation(qec) {
// Make sure subtrees are ordered so that identical queries can be identified.
if (t1.get()->asString() < t2.get()->asString()) {
if (t1 && t2 && t1.get()->asString() < t2.get()->asString()) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Throw exception if t1 or t2 is null

if (j >= b.size()) {
// The next i might still match
break;
}
}
++i;
checkTimeoutAfterNCalls();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this one is needed, but it doesn't harm either

if (left[i][leftSideCol] == last_elem) {
// We can repeat the last output
size_t num_new = last_result_end - last_result_begin;
size_t res_row = res.size();
res.resize(res.size() + num_new);
checkTimeoutAfterNCalls(num_new);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not num_new * res_width ?

I presume that res_width is never very large. But what about num_new?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Num new can in theory be very large (but this means, that the transitive path is easy to compute and mostly only copies.

@@ -522,6 +533,7 @@ void TransitivePath::computeTransitivePathRightBound(
res(res_row + j, c) = res(last_result_begin + j, c);
}
}
checkTimeoutAfterNCalls(num_new);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See above

Copy link
Member

@hannahbast hannahbast left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Finished review after 1-1 with Johannes earlier that day.

One general comment: There are various constants in the various operations concerning the maxCount that determines how often a (relatively expensive) timeout check is made. Here is a suggestion to make this a bit less ad hoc:

  1. Have a global constant NUM_OPERATIONS_BETWEEN_TIMEOUT_CHECKS with a reasonable value. No need to have this configurable from the command line for now, but it would be good to have one place in the code where this is defined.
  2. Have a constant NUM_OPERATIONS_HASHSET_LOOKUP that estimates the number of operations for a lookup in a hash set (used several times in the code in timeout critical sections)
  3. Any other such constants if needed. But maybe the two above are already enough.

Comment on lines 177 to 180

void GroupBy::processGroup(const GroupBy::Aggregate& a, size_t blockStart,
size_t blockEnd, const IdTableView<IN_WIDTH>& input,

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's with the two empty lines here?

switch (a._type) {
case ParsedQuery::AggregateType::AVG: {
float res = 0;
if (inputTypes[a._inCol] == ResultTable::ResultType::VERBATIM) {
if (a._distinct) {
for (size_t i = blockStart; i <= blockEnd; i++) {
checkTimeoutAfterNCalls();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a loop body with a HashSet.find. In another operation you used a lower value for numCalls then 32000 for that. There are a few more occurrences of this below.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are right, this would be much cleaner, I will adapt those constants.

for (size_t pos = 1; pos < input.size(); pos++) {
checkTimeoutAfterNCalls();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not call this with argument currentGroupBlock.size() here, given the loop below? Or do we usually break early from that loop?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good catch, you are right.

if (a._type == ParsedQuery::AggregateType::GROUP_CONCAT) {
delete static_cast<std::string*>(a._userData);

auto cleanup = [&]() {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Specify variables in capture explicitly when it'a a short list and/or a short lambda?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In general you are right. In this case I would argue that the default capture by reference is not problematic, because we do not pass the lambda outside, and only use it as a "copied code block", but with NEVER using default captures you agree with one of my favourite C++ authors, so I am totally convinced.

Comment on lines 802 to 806
} catch (...) {
cleanup();
throw;
}

cleanup();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you briefly explain which case is handled by this which was not properly handled before?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As soon as we had a groupConcat operation (which requires manual cleanup of the _userData) and the CALL_FIXED_SIZE_2 (basically a call to doGroupBy) would throw an exception, then the _userData memory would leak. I don't know if this was previously safe, but in the meantime we at least have Timeout and OutOfMemory exceptions, which might very well occur here and can lead to the memory leak.
As stated before: This type erasure - style _userData should become a type with a proper constructor.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(which I would do in a separate, small PR)

@@ -180,11 +188,20 @@ void Union::computeUnion(
// This if clause is only here to avoid creating the call to insert when
// it would not be possible to call the function due to not matching
// columns.
AD_CHECK(LEFT_WIDTH == OUT_WIDTH);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this of the kind "This should never happen?"

If yes, do we still need the if in the next line now that we have the AD_CHECK?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

YES, we do, because the next line is if constexpr (the direct copy/insert operations would not compile if the columns do not match).

Comment on lines 232 to 240
// always copy chunkSize results at once and then check for a timeout
size_t numChunks = right.size() / chunkSize;
for (size_t i = 0; i < numChunks; ++i) {
res.insert(res.end(), right.begin() + i * chunkSize,
right.begin() + (i + 1) * chunkSize);
checkTimeout();
}
res.insert(res.end(), right.begin() + numChunks * chunkSize,
right.end());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code duplication, in other places you have used a lambda to avoid this

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed

using ref = value_type&;
using const_ref = const value_type&;

value_type& operator[](const size_t pos) { return _data[pos]; }
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where do you need this and how can the compiler tell whether to use this operator or the one from the next line?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

&* These are overloads that only differ in the constness of the result AND of the method. The compiler chooses this method if this is not const, and else the second one. This is an absolute standard pattern for containers, and the magic behind the following behavior:

std::vector<int> a{3};
a[0] = 42; // perfectly fine, the non-const operator[] returns a non-const reference.
const std::vector<int> b{3};
b[0] = 43; // compiler error, b is const, the const overload of operator[] returns a const ref to which we may not write

I am not sure if this is needed in one of the other PRs, but in general it is good to have a complete interface (other methods of the Pattern class also have the const and non-const overloads, e.g. back.

if (p._meta.relationExists(key)) {
const FullRelationMetaData& rmd = p._meta.getRmd(key)._rmdPairs;
result->reserve(rmd.getNofElements() + 2);
result->resize(rmd.getNofElements());
p._file.read(result->data(), rmd.getNofElements() * 2 * sizeof(Id),
rmd._startFullIndex);
rmd._startFullIndex, std::move(timer));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, that gives the answer to my earlier question in IndexScan.cpp . I wonder though, why this function is here and not in IndexScan.cpp . Especially since it seems to be the only timeout-relevant function in class Index

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I came to the same conclusion above, but I think it is not relevant for the timeout PR

src/util/File.h Outdated
@@ -191,14 +192,22 @@ class File {
//! Read nofBytesToRead bytes from file starting at the given offset
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove extra space in comment

joka921 and others added 6 commits April 19, 2021 15:09
Still TODO:
- Global Constants for Operations until check.
- Verify impact of Timeout
- Proper Estimates for sorting.
It is tricky to test.

TODO: exclude the test from the make test command b.c. it takes too long and integrate the SortPerformanceEstimator into the ExecutionContext.
Copy link
Member

@hannahbast hannahbast left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks a lot!

src/engine/Operation.h Outdated Show resolved Hide resolved

for (size_t numColumns = 7; numColumns < 15; numColumns++) {
bool isFirst = true;
for (size_t i = 1000000; i < 100000000; i = static_cast<size_t>(i * 1.5)) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

1'000'000
100'000'000

I understood that a single one of these sorts takes seconds, which is fine. Minutes for all of them together is maybe a bit too long. How about testing fewer of them?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have added a random generator to only execute every 6th test case on average.
On my average laptop the tests then run for about 23 seconds which I consider acceptable

test/SortPerformanceEstimatorTest.cpp Outdated Show resolved Hide resolved
src/util/Random.h Outdated Show resolved Hide resolved
src/util/Random.h Outdated Show resolved Hide resolved
src/engine/Sort.cpp Outdated Show resolved Hide resolved
src/engine/Sort.cpp Outdated Show resolved Hide resolved
src/engine/OrderBy.cpp Outdated Show resolved Hide resolved
src/engine/OrderBy.cpp Outdated Show resolved Hide resolved
Comment on lines 120 to 123
template <int WIDTH, typename C,
typename = std::enable_if_t<
std::is_invocable_v<C, typename IdTableStatic<WIDTH>::row_type,
typename IdTableStatic<WIDTH>::row_type>>>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

// The effect of the third template argument is that if C does not have operator() with the specified arguments, then this template does not match (so that there is no confusion with the templated function above). This is an instance of SFINAE.

Johannes: maybe add return type bool as well

Copy link
Member

@hannahbast hannahbast left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for this, it's great to finally have a working timeout!

Comment on lines 98 to 99
SortPerformanceEstimator estimator(allocator);
QueryExecutionContext qec(index, engine, &cache, &pinnedSizes, allocator,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sortPerformanceEstimator

SORT_ESTIMATE_CANCELLATION_FACTOR >
remainingSecs) {
->getSortPerformanceEstimator()
.estimateSortTimeInSeconds(subRes->size(), subRes->width()) >
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since the function returns the time, it is better called estimatedSortTimeInSeconds

That also reads better in the code where it is used, like here

@@ -64,7 +64,7 @@ class QueryExecutionContext {
_pinnedSizes(pinnedSizes),
_allocator(std::move(allocator)),
_costFactors(),
_sortPerformanceEstimator(_allocator) {}
_sortPerformanceEstimator(sortPerformanceEstimator) {}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oops :-)

@@ -35,6 +36,7 @@ class Server {
cacheMaxSizeGBSingleEntry * (1ull << 30u) / sizeof(Id)),
_allocator(ad_utility::makeAllocationMemoryLeftThreadsafeObject(
maxMemGB * (1ull << 30u))),
_sortPerformanceEstimator(_allocator),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A bit confusing that we _sortPerformanceEstimator(sortPerformanceEstimator) above and _sortPerformanceEstimator(_allocator) here. Not sure what to do about that though. If you think it's fine, then it's fine with me, too

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Constructing a SortPerformanceEstimator is expensive (it involves sorting large inputs for a time measurement). So we want to construct the Estimator (at most) once in a program.
There is only one Server which stores an Estimator, and passes it via (cheap) copy to the QueryExecutionContext class. It is however crucial, that the QueryExecutionContexts don't always recreate the estimators, as the Server creates on QueryExecutionContext per query.
This is what I got wrong initially.

SORT_ESTIMATE_CANCELLATION_FACTOR >
remainingSecs) {
->getSortPerformanceEstimator()
.estimateSortTimeInSeconds(subRes->size(), subRes->width()) >
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

see above: estimated...

@@ -18,25 +18,26 @@ class SortPerformanceEstimator {
explicit SortPerformanceEstimator(
const ad_utility::AllocatorWithLimit<Id>& allocator);

// Create a random IdTable with specified number of rows and columns. Sort
// Create a random IdTable with the specified number of rows and columns. Sort
// this table and return the time in seconds that this sorting took.
static double measureSortingTimeInSeconds(
size_t numRows, size_t numColumns,
const ad_utility::AllocatorWithLimit<Id>& allocator);

// Return an Estimate for how long sorting an IdTable with the specified
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

// Compute and return an estimate ...

@@ -10,7 +10,7 @@ static const size_t STXXL_MEMORY_TO_USE = 1024L * 1024L * 1024L * 2L;
static const size_t STXXL_DISK_SIZE_INDEX_BUILDER = 1000 * 1000;
static const size_t STXXL_DISK_SIZE_INDEX_TEST = 10;

static constexpr size_t DEFAULT_MEM_FOR_QUERIES_IN_GB = 80;
static constexpr size_t DEFAULT_MEM_FOR_QUERIES_IN_GB = 1;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why so little?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can go for 4 here.
The problem is now, if this is too high, then the initial construction of the SortPerformanceEstimator will use that amount of money. So with a too high value here, bad things happen. Most notably, the end2end tests don't work properly. But I'll change it to 4 here (most people should have at least 8 Gigabytes on their machine) and explicitly scale down the memory for the end2end tests.

* A simple and fast Pseudo-Random-Number-Generator called Xoroshiro128+, for
* details see
* https://thompsonsed.co.uk/random-number-generators-for-c-performance-tested
* Limiting the range of the generated numbers is currently not supported
*/
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missing full stop + add:

Assumes that Int fits into 64 bits. If used with a type that does not fulfill this property, the template will not match (because of the std::enable_if) and there will be a compile-time error.

size_t _futureIndex = 0;
size_t _elementIndex = 0;
std::array<uint64_t, 4> _shuffleTable;
// The actual algorithm
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is this comment doing here, is it a leftover?

* more flexibility and stronger probabilistic guarantees
*/
template <typename Int, typename = std::enable_if_t<std::is_integral_v<Int>>>
class RandomIntGenerator {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I vote for SlowRandomIntGenerator in order to pique interest for anyone seeing this used in the code.

@joka921 joka921 merged commit e4dd294 into ad-freiburg:master Apr 25, 2021
@joka921 joka921 deleted the f.Timeout branch April 25, 2021 18:17
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants