New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Remove variable/column map redundancy (WIP) #281
Conversation
The theory behind caching aborted Operations was that they would fail again anyway. However, under memory pressure Operations may fail due to failing allocations and will succeed in the future once memory pressure drops, if we kept them in the cache we wouldn't try again. While we are at it, improve wording and use an extra abort() method on Operation.
this allows to mark elements as pinned, meaning they will not be dropped due to capacity constraints or clear() but only with an explicit erase (e.g. as used for aborted queries).
ASSERT_EQ(*cache["4"], "xxxx"); // not dropped | ||
ASSERT_EQ(*cache["3"], "xxx"); // not dropped | ||
cache.insert("6", "xxxxxx"); | ||
ASSERT_FALSE(cache["5"]); // oldest access driooed |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Typo in comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good find. This is in fact also in the branch I'd like to still finnish.
cache.insert("4", "xxxx"); | ||
ASSERT_EQ(*cache["pinned"], "bar"); // pinned still there | ||
ASSERT_FALSE(cache["1"]); // oldest already gone | ||
ASSERT_EQ(*cache["2"], "xx"); // not dropped because pinned not in capacity |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
misaligned comment
The Operation is owned so it should be a unique_ptr not shared_ptr. This will hopefully save quite a few memory barriers (for atomic ref counting) and is cleaner. We also prevent the copying of QueryExecutionTrees which was previously done in several places and may be expensive but is definitely unexpected.
They are actually kept in the order of the triple file as seen by the setVariableColumn() calls in QueryPlanner.cpp
The name always suggested that it carried context per query while in fact it held the SubtreeCache which should really live for many queries. Since it is indeed useful to have a per query context we move the SubtreeCache outside the QueryExecutionContext and make it truly per query.
This allows to pin all subtrees of a query to the cache and have that as a per query option. On the QLever server this is exposed as the "pinsubtrees" URL param. If set to true all subtrees in the query will be cached. This is especially useful for completion queries and the like.
In the entries we now only store a pointer to the key inside the hash map
This reverts commit 9209326.
Both currently keep separate _variableColumnMap hash maps. This commit tries to always init the one in the QueryExecutionTree from the _rootOperation we have anyway. Sadly it seems they aren't just redundant but diverge somewhere as this commit currently fails to pass the e2e tests.
Compared with the #282 PR this adds two commits that currently break things but would be nice to have if fixed. First (reverted) commit 4db3815 tries to change the Secondly the commit 905de53 as described above tries to only handle one set of variable column maps but it too is currently still buggy. Ideally it would also not make a copy of the (currently temporary) variable columns mapping generated by the |
This branch tries to reduce (in the thus named commit) and ultimately remove the redundant and confusing copy of the variable to column mapping in
QueryExecutionTree
vsOperation
mentioned in #280. The code already compiles and handles some of the e2e test queries but crashes on a later one. Since today is my last full day working on QLever and this is more like an aesthetic correction I want to get this out here if @floriankramer or @joka921 would like to tackle this if I can't find the time.