Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feat optional #43

Merged
merged 23 commits into from Mar 15, 2018
Merged

Feat optional #43

merged 23 commits into from Mar 15, 2018

Conversation

floriankramer
Copy link
Member

Added support for OPTIONAL query parts. Fixes #22
I changed the query parsing to be able to recursively parse optional parts, storing them in a new GraphPattern struct. The QueryPlanner then adds optional parts to the optimiations dp table during the seeding phase. There is a new OptionalJoin Operation, that joins two ResultTables on any number of columns, adding new ID_NO_VALUE constants to represent fields empty due to optional sides.

@niklas88
Copy link
Member

Will start reviewing now, in the meantime can you rebase against master? This should pull in the .travis.yml and give us CI testing of these changes. I propose to do the rebase as follows:

git fetch upstream/master
git checkout feat_optional
git rebase upstream/master
git push --force

Note that for rebasing --force is both needed and ok.

Copy link
Member

@niklas88 niklas88 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for this work, I'm really impressed how fast you got a grasp of the code base. I don't really have the expertise to review this adequately to be honest. So added only some nitpicking comments.

Do I see this right, that we don't have any end-to-end tests that actually parse⇒plan⇒execute some queries? Maybe we could add those as an external script using SparqlEngineMain. I found some query lists that look like an attempt of this in misc/ but some look like they still use an older text search syntax so they are definitely not run

@@ -88,20 +123,34 @@ int main(int argc, char** argv) {
case 'a':
allPermutations = true;
break;
case 'h':
printUsage(argv[0]);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for cleaning this up!

cerr << "ERROR: No index specified, but an index is required." << endl;
}
if (port == -1) {
cerr << "ERROR: No port specified, but the port is required." << endl;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

os << "{\n";
for (size_t i = 0; i < _whereClauseTriples.size(); ++i) {
os << "\n\t" << _whereClauseTriples[i].asString();
if (i + 1 < _whereClauseTriples.size()) { os << ','; }
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you don't like the if in the loop I personally prefer the following pattern.

for (size_t i = 0; i < vec.size() - 1; i++) {
    out << vec[i] << ", ";
}
out << vec.back();

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I switched to your definitely nicer style in the latest commit, and added some proper indentation to the output.

@@ -121,7 +125,46 @@ void SparqlParser::parseWhere(const string& str, ParsedQuery& query) {
while (start < inner.size()) {
size_t k = start;
while (inner[k] == ' ' || inner[k] == '\t' || inner[k] == '\n') { ++k; }
if (inner[k] == 'F') {
if (inner[k] == 'O' || inner[k] == 'o') {
if (inner.substr(k, 8) == "OPTIONAL"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are we case insensitve or only all-upper or all-lower? Would maybe prefer a toLower() and then check

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently only all upper and all lower case are supported. The standard says most keywords are case insensitive (apart from the keyword 'a' which may only be lower case). It may be cleaner to change that in another pull request /. commit though.

jcls_a |= (1 << jc[0]);
jcls_b |= (1 << jc[1]);
}
// avoid compiler warnings
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

which warning exactly and how?

template<typename A, typename B, typename R, int K>
static void optionalJoin(const A& a, const B& b,
bool aOptional, bool bOptional,
const vector<array<size_t, 2>>& jcls,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you provide more desriptive names here? I have no idea what jcls stands for

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I replaced all mentions of jcl and jcls with joinColumn and joinColumns respectively.

for (size_t ib = 0; ib < b.size(); ib++) {
R res = newOptionalResult<R, K>()(resultSize);
createOptionalResult<A, B, R, true, false>(
static_cast<typename A::value_type*>(0), &b[ib],
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nullptr should work without a cast, right?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I originally relied on the compiler deducing the template arguments, thats why I had to cast the nullptrs to be pointers of the right type. Replaced them with actual nullptrs now.

@@ -60,11 +94,12 @@ int main(int argc, char** argv) {
bool interactive = false;
bool onDiskLiterals = false;
bool allPermutations = false;
bool optimizeOptionals = true;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hannah said this currently slows down queries? I assume this only happens if there is an OPTIONAL? Is this the right default at the moment?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The slowdown happens due to the query optimizer effectively having to optimize a query with another triple, representing the optional part. For queries with many triples (around 9 or more) the slowdown starts becoming noticeable. For queries with fewer triples the change in speed is minmal, and using the optimizer can of course lead to significantly faster execution time. If the default use case includes queries which, on average, have many triples changing the setting to false by default would be better.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are right, this case is actually more non-trivial than I thought.

}

// add sentinels
A& l1 = const_cast<A&>(a);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The comment should explain why it's ok to cast away the const here

return os.str();
}

// Used to generate all up to 125 combinations of left, right and result size.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we allowed C++14/17 constexpr with loops this could be done much nicer, right? If yes, then we probably should.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you are right. I'll have to look into that later today or tomorrow.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did a bit of googling for the uses of constexpr in c++ 14 and 17, as I've never used it extensively before.
As far as I could find out, there is no way, apart from the same recursion that is implemented with the templates right now, to create a constexpr loop. As the problem with finding the right template parameters is, that we need to transform a runtime value with a limited number of values into a compile time constant, I didn't find a nicer solution using constexpr, as any assignment of a constexpr would depend on a runtime value.
Was there a specific trick you were thinking of, or is there a feature of constexpr I didn't find, that would help with these problems?

@@ -389,25 +389,25 @@ class Engine {
static void createOptionalResult(const typename A::value_type* a,
const typename B::value_type* b,
size_t sizeA,
int jcls_a, int jcls_b,
const std::vector<size_t>& jclAToB,
int joinColumBitmap_a, int joinColumBitmap_b,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's "column" so use either "..Col" or "..Column"

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

@niklas88 niklas88 merged commit 124ae54 into ad-freiburg:master Mar 15, 2018
@floriankramer floriankramer deleted the feat_optional branch March 15, 2018 13:24
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants