Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

F.refactor scans #214

Merged
merged 1 commit into from Mar 28, 2019
Merged

F.refactor scans #214

merged 1 commit into from Mar 28, 2019

Conversation

joka921
Copy link
Member

@joka921 joka921 commented Mar 21, 2019

Refactored the Scan operation.

Now we have everything in one place to reason about. This makes the code cleaner and more maintainable.

@niklas88 niklas88 self-requested a review March 22, 2019 09:26
Copy link
Member

@niklas88 niklas88 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm really glad you're tackling this it looks much cleaner already. I've added a few comments for the first change. I also propose we completely remove the -a flag. QLever now focuses more on being a general SPARQL engine and with your previous changes the cost isn't al that high. Also it's a source of frustration if you forget the flag.

@@ -599,4 +599,82 @@ class Index {
}
LOG(INFO) << "Done" << std::endl;
}

// _____________________________________________________________________________
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add a method comment. I know we haven't had one here previously but it would be great to see this improve. Also I'd propose to use the format Florian used in recent PRs instead of the //! we have in some parts.

const PermutationInfo& perm, const MetaData& meta,
ad_utility::File& file) const {
LOG(DEBUG) << "Performing " << perm._readableName << " scan of relation "
<< keyFirst << " with fixed subject: " << keySecond << "...\n";
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This isn't always the subject, right?

void scan(const string& key, IdTable* result, const MetaData& meta,
ad_utility::File& file, const string& name) const {
if constexpr (checkPredicateVariable) {
if (!file.isOpen()) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can't we just do the file.isOpen() check for every one of the permutations and just warn that it doesn't exist? This would be basically free and useful to know even the file was just deleted. On top of this, seeing as the focus of QLever has shifted more towards being a full featured SPARQL engine and that your changes have made it a lot cheaper, I would propose we just remove that -a switch completely.

LOG(DEBUG) << "Performing " << perm._readableName << " scan of relation "
<< keyFirst << " with fixed subject: " << keySecond << "...\n";
Id relId;
Id subjId;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These two are now also only sometimes relation and subject, right? So maybe they should be renamed


//_____________________________________________________________________________
template <class MetaData>
void scanImpl(Id key, IdTable* result, const MetaData& meta,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this should have a name that better distinguishes it from the scan() function. It's not really an …Impl as it isn't used to implement scan() right? Maybe scanFull() because it scans a full "relation"?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have a different suggestion: Everything is a scan. The overload is sufficient to disambiguate:

  • 1 or 2 keys
  • are the keys strings (have to be looked up in vocab) or Ids (this has already been done)

@ghost
Copy link

ghost commented Mar 22, 2019

DeepCode encountered a problem when analyzing this pull request. If you want to retry, create a comment: "Retry Deepcode".

@joka921
Copy link
Member Author

joka921 commented Mar 22, 2019

Retry Deepcode

Copy link
Member

@niklas88 niklas88 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wow, your changes remove twice as many lines as they add. I really like how much cleaner this looks now. I've added a few suggestions but nothing major.

@@ -272,26 +273,37 @@ void Join::computeResultForJoinWithFullScanDummy(ResultTable* result) const {
Join::ScanMethodType Join::getScanMethod(
std::shared_ptr<QueryExecutionTree> fullScanDummyTree) const {
ScanMethodType scanMethod;
using namespace std::placeholders;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this used? I only know of _1,… in this namespace but I can't see them anywhere

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was thinking about std::bind at some point but found another solution, it is removed now.

@@ -75,7 +75,8 @@ class Join : public Operation {

void computeResultForJoinWithFullScanDummy(ResultTable* result) const;

typedef void (Index::*ScanMethodType)(Id, IdTable*) const;
// typedef void (Index::*ScanMethodType)(Id, IdTable*) const;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove the comment

@@ -75,7 +75,8 @@ class Join : public Operation {

void computeResultForJoinWithFullScanDummy(ResultTable* result) const;

typedef void (Index::*ScanMethodType)(Id, IdTable*) const;
// typedef void (Index::*ScanMethodType)(Id, IdTable*) const;
using ScanMethodType = std::function<void(Id, IdTable*)>;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are we sure the std::function doesn't have an overhead here? I'd guess it specializes for a standard function pointer but I'm not sure. Then again the scans are a lot slower than that overhead

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You answered your question:
The std::function overhead is basically a virtual function call. It is called once per scan, and scans are on disk operations. So I think in this case it is perfectly fine. Other solutions seem much more hacky to me.

const MetaData Index::*metaPtr,
ad_utility::File Index::*filePtr)
array<unsigned short, 3> order, const MetaData& meta,
ad_utility::File& file)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We try not to use non-const references (its forbidden in the Google styleguide though in the past we have been less strict). I think copying an ad_utility::File and passing by value should work unless you aleady mmap()ed it at this point.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Check out my latest commits, I have moved the MetaData and the file completely into the permutation class.

src/index/Index.h Show resolved Hide resolved

// this works because the join operations execution Context never changes
// during its lifetime
auto genericLambda = [this](const auto& perm) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can this be const auto? also I would call it scanLambda. Also can't we use std::bind() here? Is that why you have a using namespace std::placeholders? Alternatively instead of capturing this we could capture a const auto& idx = this->_executionContext->getIndex() which would make the lambda simpler too.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed, it is indeed much more readable if we only capture what we need.

}
auto filename = string(onDiskBase + ".index" + _fileSuffix);
_file.open(filename, "r");
AD_CHECK(_file.isOpen());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would turn this into an if with a proper error message about missing the file and that this means the index is broken.
Since you have this here I think you should remove the isOpen() check from the scans of course that makes the point about missing that in the two parameter case moot.

@@ -287,38 +289,38 @@ TEST(IndexTest, createFromTsvTest) {
// Lhs info
// ASSERT_EQ(0u, *reinterpret_cast<Id*>(buf + bytesDone));
// bytesDone += sizeof(Id);
// ASSERT_EQ(index._posMeta.getRmd(7)._rmdBlocks->_startRhs,
// ASSERT_EQ(index.POS().metaData.getRmd(7)._rmdBlocks->_startRhs,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should remove the commented code outright, git will remember it in case we ever need it again

@joka921
Copy link
Member Author

joka921 commented Mar 26, 2019

Retry Deepcode

// bytesDone += sizeof(Id);
// ASSERT_EQ(bytesDone,
// index._PSO.metaData().getRmd(7)._rmdBlocks->_startRhs); ASSERT_EQ(0u,
// *reinterpret_cast<Id*>(buf + bytesDone)); bytesDone += sizeof(Id);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove commented section

Copy link
Member

@niklas88 niklas88 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I really really like these PRs that remove hundreds of lines of code while making the rest more maintainable.

@joka921
Copy link
Member Author

joka921 commented Mar 28, 2019

Retry Deepcode

- The templated Permutation class now owns the Permutations meta data and file
- The scan method now is templated on the permutations and takes it as an argument
- This removes a lot of duplicated code in the Index class.
- Idea for further improvements: The scan methods could also be inside the Permutations,
  this further reduces the complexity per file
@joka921 joka921 merged commit 9269d2a into ad-freiburg:master Mar 28, 2019
@joka921 joka921 deleted the f.refactorScans branch March 28, 2019 19:58
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants