Tweak in SortPerformanceEstimator (faster + log) #398
1. Right now, the estimation takes several minutes. Is that really necessary? The main culprits are the 100,000,000 tests, so I removed them. OK?
2. The log had two lines per sort ("Sorting ..." and "done"); it's now only one line.

TODO: It is confusing that the sort estimation is the very first thing one sees in the server log upon startup. The server initialization comes afterwards. I realize that there are reasons for this in the code. Nevertheless, is it possible to do this the other way round?
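To illustrate the second point, here is a minimal sketch of a measurement that produces one log line per sort instead of a "Sorting ..." line followed by a "done" line. The function name `measureSortAndFormatLogLine`, the message wording, and the use of `std::sort` on random 64-bit values are assumptions for illustration, not the actual code of `SortPerformanceEstimator`.

```cpp
#include <algorithm>
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <random>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper (not the project's API): sorts `numElements`
// pseudo-random values and returns ONE log line that contains both the
// sample size and the measured time.
std::string measureSortAndFormatLogLine(std::size_t numElements) {
  // Fill the sample with reproducible pseudo-random data.
  std::vector<std::uint64_t> data(numElements);
  std::mt19937_64 rng(42);
  for (auto& value : data) {
    value = rng();
  }

  // Time only the sort itself, not the data generation.
  const auto start = std::chrono::steady_clock::now();
  std::sort(data.begin(), data.end());
  const auto end = std::chrono::steady_clock::now();
  const auto ms =
      std::chrono::duration_cast<std::chrono::milliseconds>(end - start)
          .count();

  // A single line per sort, written after the measurement is complete.
  std::ostringstream line;
  line << "Sorted " << numElements << " elements in " << ms << " ms";
  return line.str();
}
```

The caller can pass the returned string to whatever logging facility is in use. Because the line is only built after the sort has finished, it cannot be split by other log output.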
@joka921 Did you have a chance to look at this, Johannes? It's a rather tiny PR. BTW, it would be good to adapt the array sizes to the input size. For small input collections, the start-up time is virtually zero without the sort performance estimation. Alternatively, one could make the sort performance estimation optional via a command line argument.
This allows us to limit the maximum sample size depending on the knowledge base size.
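A minimal sketch of how such a limit could be derived from the number of triples is shown below. The constant `kPercentageOfTriplesForSortEstimate`, its value of 5, the floor of 1000 elements, and both function names are assumptions for illustration; the actual names and values in `src/global/Constants.h` may differ.

```cpp
#include <algorithm>
#include <cstddef>
#include <iterator>
#include <vector>

// Hypothetical constant: name and value are assumptions, not taken from
// the project. At most this percentage of the triples is sorted at once.
constexpr std::size_t kPercentageOfTriplesForSortEstimate = 5;

// Upper bound on the number of elements in a single sample sort.
// Dividing before multiplying avoids overflow for very large inputs; the
// (assumed) `minimum` keeps tiny indexes from ending up with a bound of 0.
constexpr std::size_t maxNumberOfElementsToSort(std::size_t numTriples,
                                                std::size_t minimum = 1000) {
  return std::max(minimum,
                  numTriples / 100 * kPercentageOfTriplesForSortEstimate);
}

// Keep only the candidate sample sizes that are smaller than the bound,
// so that large samples are never measured for small indexes.
std::vector<std::size_t> selectSampleSizes(
    const std::vector<std::size_t>& candidates, std::size_t maxElements) {
  std::vector<std::size_t> selected;
  std::copy_if(candidates.begin(), candidates.end(),
               std::back_inserter(selected),
               [maxElements](std::size_t n) { return n < maxElements; });
  return selected;
}
```

With these assumed values, an index of 1,000,000 triples yields a bound of 50,000 elements, so candidate sizes of 1,000,000 or 100,000,000 would be skipped and only the smaller samples would be measured.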
Thanks a lot for this!
I have only two minor suggestions: a variable name change and a more precise comment. Address as you see fit.
/// Set up the sort estimates. This will take some time. Only samples that
/// can be allocated by the allocator and that have less than
/// `maxNumberOfElementsToSort` elements will actually be measured.
void createEstimatesExpensively(
How about `computeEstimatesExpensively`?
src/global/Constants.h
Outdated
@@ -114,6 +114,10 @@ static constexpr size_t NUM_OPERATIONS_HASHSET_LOOKUP = 32;
// than the remaining time, then the sort is canceled with a timeout exception
static constexpr double SORT_ESTIMATE_CANCELLATION_FACTOR = 3.0;

// When initializing a sort performance estimator, at most this percentage of
// the index size is being sorted at once.
of the index size -> of the number of triples in the index
Since I started this PR, I can't approve it. Please do, @joka921.
I addressed the two suggestions. I am waiting for the CI pipeline and will then merge it.