Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Tweak in SortPerformanceEstimator (faster + log) #398

Merged
merged 3 commits into from Jun 2, 2021

Conversation

hannahbast
Copy link
Member

  1. Right now, the estimation takes several minutes. Is that really
    necessary? The main culprits are the 100,000,000 tests, so I removed
    them. OK?

  2. The log had two lines per sort ("Sorting ..." and "done"), it's now
    only one line.

TODO: It is confusing that the sort estimation is the very first thing one
sees in the server log upon startup. The server initialization comes
afterwards. Is realize that there are reasons for this in the code.
Nevertheless, is it possible to do this the other way round?

1. Right now, the estimation takes several minutes. Is that really
necessary? The main culprits are the 100,000,000 tests, so I removed
them. OK?

2. The log had two lines per sort ("Sorting ..." and "done"), it's now
only one line.

TODO: It is confusing that the sort estimation is the very first thing one
sees in the server log upon startup. The server initialization comes
afterwards. Is realize that there are reasons for this in the code.
Nevertheless, is it possible to do this the other way round?
@hannahbast hannahbast requested a review from joka921 May 8, 2021 10:09
@hannahbast
Copy link
Member Author

@joka921 Did you have a chance to look at this, Johannes? It's a rather tiny PR.

BTW, it would be good to adapt the array sizes to the input size. For small input collections, the start-up time is virtually zero without the sort performance estimation. Alternatively, one could make the sort performance estimation optional via a command line argument.

This allows us, to limit the maximum sample size depending on the Knowledgebase size.
Copy link
Member Author

@hannahbast hannahbast left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks a lot for this!

I have only two minor suggestions: a variable name change and a more precise comment. Address as you see fit

/// Set up the sort estimates. This will take some time. Only samples, that
/// can be allocated by the allocator and that have less thatn
/// `maxNumberOfElementsToSort` elements will actually be measured.
void createEstimatesExpensively(
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about computeEstimatesExpensively ?

@@ -114,6 +114,10 @@ static constexpr size_t NUM_OPERATIONS_HASHSET_LOOKUP = 32;
// than the remaining time, then the sort is canceled with a timeout exception
static constexpr double SORT_ESTIMATE_CANCELLATION_FACTOR = 3.0;

// When initializing a sort performance estimator, at most this percentage of
// the index size is being sorted at once.
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

of the index size -> of the number of triples in the index

@hannahbast
Copy link
Member Author

Since I started this PR, I can't approve it, please do @joka921

Copy link
Member

@joka921 joka921 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I changed the two suggestions, I am waiting for the CI pipeline and am then going to merge it.

@hannahbast hannahbast merged commit 4308ec2 into master Jun 2, 2021
@hannahbast hannahbast deleted the qlever.sort-estimator-tweak branch September 29, 2021 21:52
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants