Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove variable/column map redundancy (WIP) #281

Closed
wants to merge 9 commits into from
Closed

Remove variable/column map redundancy (WIP) #281

wants to merge 9 commits into from

Conversation

niklas88
Copy link
Member

@niklas88 niklas88 commented Sep 6, 2019

This branch tries to reduce (in the thus named commit) and ultimately remove the redundant and confusing copy of the variable to column mapping in QueryExecutionTree vs Operation mentioned in #280. The code already compiles and handles some of the e2e test queries but crashes on a later one. Since today is my last full day working on QLever and this is more like an aesthetic correction I want to get this out here if @floriankramer or @joka921 would like to tackle this if I can't find the time.

The theory behind caching aborted Operations was that they would fail
again anyway. However, under memory pressure Operations may fail due to
failing allocations and will succeed in the future once memory pressure
drops, if we kept them in the cache we wouldn't try again.

While we are at it, improve wording and use an extra abort() method on
Operation.
this allows to mark elements as pinned, meaning they will not be dropped
due to capacity constraints or clear() but only with an explicit erase
(e.g. as used for aborted queries).
ASSERT_EQ(*cache["4"], "xxxx"); // not dropped
ASSERT_EQ(*cache["3"], "xxx"); // not dropped
cache.insert("6", "xxxxxx");
ASSERT_FALSE(cache["5"]); // oldest access driooed
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Typo in comment

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good find. This is in fact also in the branch I'd like to still finnish.

cache.insert("4", "xxxx");
ASSERT_EQ(*cache["pinned"], "bar"); // pinned still there
ASSERT_FALSE(cache["1"]); // oldest already gone
ASSERT_EQ(*cache["2"], "xx"); // not dropped because pinned not in capacity
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

misaligned comment

The Operation is owned so it should be a unique_ptr not shared_ptr. This
will hopefully save quite a few memory barriers (for atomic ref
counting) and is cleaner.

We also prevent the copying of QueryExecutionTrees which was previously
done in several places and may be expensive but is definitely
unexpected.
They are actually kept in the order of the triple file as seen by the
setVariableColumn() calls in QueryPlanner.cpp
The name always suggested that it carried context per query while in
fact it held the SubtreeCache which should really live for many queries.
Since it is indeed useful to have a per query context we move the
SubtreeCache outside the QueryExecutionContext and make it truly per
query.
This allows to pin all subtrees of a query to the cache and have that as
a per query option. On the QLever server this is exposed as the
"pinsubtrees" URL param. If set to true all subtrees in the query will
be cached. This is especially useful for completion queries and the
like.
In the entries we now only store a pointer to the key inside the hash
map
Both currently keep separate _variableColumnMap hash maps. This commit
tries to always init the one in the QueryExecutionTree from the
_rootOperation we have anyway. Sadly it seems they aren't just redundant
but diverge somewhere as this commit currently fails to pass the e2e
tests.
@niklas88
Copy link
Member Author

niklas88 commented Sep 6, 2019

Compared with the #282 PR this adds two commits that currently break things but would be nice to have if fixed. First (reverted) commit 4db3815 tries to change the LRUCache so that it only stores keys once, the tests pass and some queries work but I think somehow the iterators into the list get jumbled and this breaks e.g. the e2e tests.

Secondly the commit 905de53 as described above tries to only handle one set of variable column maps but it too is currently still buggy. Ideally it would also not make a copy of the (currently temporary) variable columns mapping generated by the Operation.

@niklas88 niklas88 closed this Oct 1, 2019
@niklas88 niklas88 deleted the var_col_redundancy branch October 1, 2019 15:52
@niklas88 niklas88 restored the var_col_redundancy branch October 1, 2019 15:53
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants