Tweak in SortPerformanceEstimator (faster + log) #398

hannahbast · 2021-05-08T10:09:33Z

1. Right now, the estimation takes several minutes. Is that really necessary? The main culprits are the 100,000,000 tests, so I removed them. OK?
2. The log had two lines per sort ("Sorting ..." and "done"); it's now only one line.

TODO: It is confusing that the sort estimation is the very first thing one sees in the server log upon startup. The server initialization comes afterwards. I realize that there are reasons for this in the code. Nevertheless, is it possible to do this the other way round?
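
For illustration, a minimal sketch of what "one log line per sort" could look like, assuming a plain `std::sort` over random 64-bit values. `measureSortSeconds` and `formatSortLogLine` are hypothetical names, not functions from the code base, and the wording of the log line is made up.

```cpp
#include <algorithm>
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <random>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper (not the PR's code): sort `numRows` random values and
// return the elapsed wall-clock time in seconds.
double measureSortSeconds(std::size_t numRows) {
  std::mt19937_64 rng(42);  // fixed seed, so the input is reproducible
  std::vector<std::uint64_t> data(numRows);
  for (auto& x : data) {
    x = rng();
  }
  const auto start = std::chrono::steady_clock::now();
  std::sort(data.begin(), data.end());
  const auto end = std::chrono::steady_clock::now();
  return std::chrono::duration<double>(end - start).count();
}

// Build the single log line for one measurement. It is emitted after the sort
// has finished, so no separate "done" line is needed.
std::string formatSortLogLine(std::size_t numRows, double seconds) {
  std::ostringstream os;
  os << "Sorted " << numRows << " rows in " << seconds << " s";
  return os.str();
}
```

A caller would write `formatSortLogLine(n, measureSortSeconds(n))` to the log once per sample size.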


hannahbast · 2021-05-10T22:15:28Z

@joka921 Did you have a chance to look at this, Johannes? It's a rather tiny PR.

BTW, it would be good to adapt the array sizes to the input size. For small input collections, the start-up time is virtually zero without the sort performance estimation. Alternatively, one could make the sort performance estimation optional via a command line argument.


hannahbast

Thanks a lot for this!

I have only two minor suggestions: a variable name change and a more precise comment. Address as you see fit.

hannahbast · 2021-05-28T17:47:55Z

src/engine/SortPerformanceEstimator.h

+  /// Set up the sort estimates. This will take some time. Only samples that
+  /// can be allocated by the allocator and that have fewer than
+  /// `maxNumberOfElementsToSort` elements will actually be measured.
+  void createEstimatesExpensively(


How about computeEstimatesExpensively?
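
A rough sketch of the two-phase idea described in the comment above, using the suggested method name. The class `SortEstimatorSketch`, its members, and the injected `MeasureFn` are assumptions for illustration only, not the actual `SortPerformanceEstimator`. The allocator check mentioned in the comment is left out.

```cpp
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

// Illustrative stand-in, not the actual class from the PR.
class SortEstimatorSketch {
 public:
  // Measures one sort of the given number of elements and returns seconds.
  using MeasureFn = std::function<double(std::size_t)>;

  // Phase 1 (cheap): only remember the candidate sample sizes.
  SortEstimatorSketch(std::vector<std::size_t> sampleSizes, MeasureFn measure)
      : sampleSizes_(std::move(sampleSizes)), measure_(std::move(measure)) {}

  // Phase 2 (expensive): measure only the samples that have fewer than
  // `maxNumberOfElementsToSort` elements; larger samples are skipped.
  void computeEstimatesExpensively(std::size_t maxNumberOfElementsToSort) {
    estimates_.clear();
    for (std::size_t n : sampleSizes_) {
      if (n >= maxNumberOfElementsToSort) {
        continue;
      }
      estimates_.emplace_back(n, measure_(n));
    }
    estimatesAreInitialized_ = true;
  }

  bool estimatesAreInitialized() const { return estimatesAreInitialized_; }

  // Pairs of (sample size, measured seconds) for the samples that were run.
  const std::vector<std::pair<std::size_t, double>>& estimates() const {
    return estimates_;
  }

 private:
  std::vector<std::size_t> sampleSizes_;
  MeasureFn measure_;
  std::vector<std::pair<std::size_t, double>> estimates_;
  bool estimatesAreInitialized_ = false;
};
```

Constructing the object stays cheap, and the caller decides when to pay for the measurements, for example once the size of the loaded index is known.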

hannahbast · 2021-05-28T17:50:10Z

src/global/Constants.h

@@ -114,6 +114,10 @@ static constexpr size_t NUM_OPERATIONS_HASHSET_LOOKUP = 32;
 // than the remaining time, then the sort is canceled with a timeout exception
 static constexpr double SORT_ESTIMATE_CANCELLATION_FACTOR = 3.0;

+// When initializing a sort performance estimator, at most this percentage of
+// the index size is being sorted at once.


of the index size -> of the number of triples in the index
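
To make the suggested wording concrete, here is a small sketch of how a percentage of the number of triples could be turned into an upper bound on the sample size. The constant name `kMaxSortPercentageSketch`, its value, and the function `maxElementsToSort` are assumptions; the real constant is not visible in the diff excerpt above.

```cpp
#include <cstddef>

// Hypothetical constant; the real name and value are not shown in the diff.
constexpr std::size_t kMaxSortPercentageSketch = 5;

// Upper bound on the number of elements to sort at once, given the number of
// triples in the index and a percentage between 0 and 100.
constexpr std::size_t maxElementsToSort(
    std::size_t numTriples,
    std::size_t percentage = kMaxSortPercentageSketch) {
  // Divide before multiplying to avoid overflow for very large indexes.
  // This rounds down, so indexes with fewer than 100 triples yield 0.
  return numTriples / 100 * percentage;
}
```

With these assumed values, an index of 1,000,000 triples gives a limit of 50,000 elements, so larger samples would not be measured for a small index.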

hannahbast · 2021-05-28T17:56:44Z

Since I started this PR, I can't approve it. Please do, @joka921.

joka921

I made the two suggested changes. I am waiting for the CI pipeline and will then merge it.

hannahbast requested a review from joka921 May 8, 2021 10:09

Two-phase initialization for the sortPerformanceEstimator.

2ab492e

This allows us to limit the maximum sample size depending on the knowledge base size.

hannahbast commented May 28, 2021


Two last changes from Hannah's review.

11f4bf0

joka921 approved these changes May 29, 2021


hannahbast merged commit 4308ec2 into master Jun 2, 2021

hannahbast deleted the qlever.sort-estimator-tweak branch September 29, 2021 21:52
