Implement CONSTRUCT query processing #528

Merged on Jan 1, 2022 (55 commits)
Changes from 47 commits
Commits
a3bf864
Initial changes to implement construct parser
RobinTF Dec 14, 2021
88b22ec
Fix types and temporary values
RobinTF Dec 14, 2021
c7d7fdc
Start adding tests
RobinTF Dec 14, 2021
75ce36c
Fix bad reference
RobinTF Dec 14, 2021
f25dea9
Enable usage of a keyword
RobinTF Dec 14, 2021
a9d435a
Use proper iri notation
RobinTF Dec 15, 2021
b8df565
Create blank nodes instead of variables
RobinTF Dec 15, 2021
6956b68
Add TODO
RobinTF Dec 15, 2021
577356d
Start implementing construct parsing
RobinTF Dec 16, 2021
554030e
Start implementing turtle format
RobinTF Dec 18, 2021
348075c
Fix casting issues and enhance visitor
RobinTF Dec 19, 2021
91bb21c
Use more complex types
RobinTF Dec 19, 2021
d5fbd0a
Unify variable substitution
RobinTF Dec 19, 2021
95626be
Clean up code a little
RobinTF Dec 19, 2021
9a2bb0e
Reformat code
RobinTF Dec 19, 2021
7a36a3e
Introduce co_return to avoid potential bugs
RobinTF Dec 21, 2021
726070c
Remove broken test
RobinTF Dec 21, 2021
b8a5078
Inline appendVector
RobinTF Dec 21, 2021
f1d4473
Inline and rename reversed nodes wrapper
RobinTF Dec 21, 2021
dbb076b
Rename variable
RobinTF Dec 21, 2021
841ba8e
Inline BlankNodeCreator into Visitor
RobinTF Dec 21, 2021
781eaf6
Refactor method order
RobinTF Dec 21, 2021
cf6100d
Use constructor over make_pair
RobinTF Dec 21, 2021
02f9658
Fix failing test case
RobinTF Dec 21, 2021
b94b9c1
Move Data Types into dedicated data directory
RobinTF Dec 21, 2021
e051c2d
Prefer std::variant over std::function
RobinTF Dec 21, 2021
6563fa2
Introduce Iri Type
RobinTF Dec 21, 2021
f02f537
Add comment
RobinTF Dec 21, 2021
de936a7
Fix syntax
RobinTF Dec 22, 2021
feda6fc
Properly handle turtle accept header
RobinTF Dec 22, 2021
c529c2d
Fix code style
RobinTF Dec 22, 2021
f09c003
Fix member name due to rebase
RobinTF Dec 23, 2021
793ab27
Remove redundant TODO
RobinTF Dec 23, 2021
10c48e8
Introduce basic rdf graph checking
RobinTF Dec 24, 2021
7eaf74d
Add newline at end of file
RobinTF Dec 24, 2021
11efd69
Unify _selectClause and _constructClause in variant
RobinTF Dec 26, 2021
505a766
Address PR comments
RobinTF Dec 26, 2021
2480a68
Format files
RobinTF Dec 26, 2021
635ea19
Introduce separator comments
RobinTF Dec 26, 2021
75c7d9e
Fix failing E2E tests
RobinTF Dec 27, 2021
f552aea
Use strict error strategy
RobinTF Dec 27, 2021
e644715
Fix formatting
RobinTF Dec 27, 2021
21d3807
Prefer constexpr
RobinTF Dec 30, 2021
5605f6c
Use helper functions for common type code
RobinTF Dec 30, 2021
ac1c969
Properly use references
RobinTF Dec 30, 2021
5b880c1
Address simple one-line fixes
RobinTF Dec 30, 2021
b9585bb
Use ctre
RobinTF Dec 30, 2021
54fafd4
Revert accidental CMakeLists change
RobinTF Dec 31, 2021
9c0b92b
Address more PR comments
RobinTF Dec 31, 2021
eb96bd9
Rename functions for clarity
RobinTF Dec 31, 2021
6385f13
Clearly separate non-textual use of variable name
RobinTF Dec 31, 2021
35323c4
Add invariant for Variable class
RobinTF Dec 31, 2021
e5065cb
Remove redundant test code
RobinTF Dec 31, 2021
58c02e6
Add trailing newline
RobinTF Dec 31, 2021
87fa97d
Potentially final commit for this PR
RobinTF Jan 1, 2022
16 changes: 14 additions & 2 deletions src/SparqlEngineMain.cpp
@@ -213,8 +213,20 @@ void processQuery(QueryExecutionContext& qec, const string& query) {
LOG(INFO) << "Execution Tree: " << qet.asString() << "ms\n";
size_t limit = pq._limit.value_or(MAX_NOF_ROWS_IN_RESULT);
size_t offset = pq._offset.value_or(0);
qet.writeResultToStream(cout, pq._selectClause._selectedVariables, limit,
offset);
ad_utility::stream_generator::stream_generator generator;
if (pq.hasSelectClause()) {
generator = qet.generateResults(pq.selectClause()._selectedVariables, limit,
offset);
} else if (pq.hasConstructClause()) {
generator = qet.writeRdfGraphTurtle(pq.constructClause(), limit, offset);
} else {
// Missing implementation
AD_CHECK(false);
}

while (generator.hasNext()) {
cout << generator.next();
}
t.stop();
std::cout << "\nDone. Time: " << t.usecs() / 1000.0 << " ms\n";
size_t numMatches = qet.getResult()->size();
40 changes: 29 additions & 11 deletions src/engine/QueryExecutionTree.cpp
@@ -86,17 +86,6 @@ void QueryExecutionTree::setVariableColumns(
_variableColumnMap = map;
}

// _____________________________________________________________________________
void QueryExecutionTree::writeResultToStream(std::ostream& out,
const vector<string>& selectVars,
size_t limit, size_t offset,
char sep) const {
auto generator = generateResults(selectVars, limit, offset, sep);
while (generator.hasNext()) {
out << generator.next();
}
}

// _____________________________________________________________________________
ad_utility::stream_generator::stream_generator
QueryExecutionTree::generateResults(const vector<string>& selectVars,
@@ -445,3 +434,32 @@ ad_utility::stream_generator::stream_generator QueryExecutionTree::writeTable(
}
LOG(DEBUG) << "Done creating readable result.\n";
}

// _____________________________________________________________________________
ad_utility::stream_generator::stream_generator
QueryExecutionTree::writeRdfGraphTurtle(
const std::vector<std::array<VarOrTerm, 3>>& constructTriples, size_t limit,
size_t offset) const {
// This may trigger computation (but does not have to).
shared_ptr<const ResultTable> res = getResult();

size_t upperBound = std::min<size_t>(offset + limit, res->_idTable.size());
auto variableColumns = getVariableColumns();
for (size_t i = offset; i < upperBound; i++) {
Context context{i, *res, variableColumns, _qec->getIndex()};
for (const auto& triple : constructTriples) {
auto subject = triple[0].toString(context, SUBJECT);
auto verb = triple[1].toString(context, VERB);
auto object = triple[2].toString(context, OBJECT);
if (!subject.has_value() || !verb.has_value() || !object.has_value()) {
continue;
}
co_yield subject.value();
co_yield ' ';
co_yield verb.value();
co_yield ' ';
co_yield object.value();
co_yield " .\n";
}
}
}
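The row window and the skip-on-missing-term behaviour of `writeRdfGraphTurtle` can be sketched without coroutines. This is only an illustration under stated assumptions, not the PR's code: `Triple` and `renderTurtle` are hypothetical names, each row is assumed to hold a single already-evaluated triple, and `std::optional<std::string>` stands in for whatever `VarOrTerm::toString` returns.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>
#include <vector>

// Hypothetical stand-in: subject, verb and object of one result row.
// A position is empty when it could not be rendered for that row.
using Triple = std::array<std::optional<std::string>, 3>;

// Render the rows in [offset, offset + limit) as Turtle-style lines.
std::string renderTurtle(const std::vector<Triple>& rows, std::size_t limit,
                         std::size_t offset) {
  std::string out;
  // Clamp the window to the available rows, as the diff does with std::min.
  const std::size_t upperBound = std::min(offset + limit, rows.size());
  for (std::size_t i = offset; i < upperBound; ++i) {
    const auto& [subject, verb, object] = rows[i];
    // A triple with any missing position is dropped rather than emitted.
    if (!subject || !verb || !object) {
      continue;
    }
    out += *subject;
    out += ' ';
    out += *verb;
    out += ' ';
    out += *object;
    out += " .\n";
  }
  return out;
}
```

An offset beyond the last row makes `upperBound` smaller than `offset`, so the loop body never runs and the result is empty.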
11 changes: 7 additions & 4 deletions src/engine/QueryExecutionTree.h
@@ -9,6 +9,8 @@
#include <unordered_map>
#include <unordered_set>

#include "../parser/data/Context.h"
#include "../parser/data/VarOrTerm.h"
#include "../util/Conversions.h"
#include "../util/HashSet.h"
#include "../util/streamable_generator.h"
@@ -103,10 +105,6 @@ class QueryExecutionTree {
const std::vector<string>& selectVariables,
const ResultTable& resultTable) const;

void writeResultToStream(std::ostream& out, const vector<string>& selectVars,
size_t limit = MAX_NOF_ROWS_IN_RESULT,
size_t offset = 0, char sep = '\t') const;

ad_utility::stream_generator::stream_generator generateResults(
const vector<string>& selectVars, size_t limit = MAX_NOF_ROWS_IN_RESULT,
size_t offset = 0, char sep = '\t') const;
@@ -195,6 +193,11 @@ class QueryExecutionTree {
bool& isRoot() noexcept { return _isRoot; }
[[nodiscard]] const bool& isRoot() const noexcept { return _isRoot; }

// Generate an RDF graph in turtle syntax for a CONSTRUCT query.
ad_utility::stream_generator::stream_generator writeRdfGraphTurtle(
const std::vector<std::array<VarOrTerm, 3>>& constructTriples,
size_t limit, size_t offset) const;

private:
QueryExecutionContext* _qec; // No ownership
ad_utility::HashMap<string, size_t> _variableColumnMap;
60 changes: 40 additions & 20 deletions src/engine/QueryPlanner.cpp
@@ -44,10 +44,10 @@ QueryExecutionTree QueryPlanner::createExecutionTree(ParsedQuery& pq) {
_enablePatternTrick && checkUsePatternTrick(&pq, &patternTrickTriple);

bool doGrouping = !pq._groupByVariables.empty() || usePatternTrick;
if (!doGrouping) {
if (!doGrouping && pq.hasSelectClause()) {
// if there is no group by statement, but an aggregate alias is used
// somewhere do grouping anyways.
for (const ParsedQuery::Alias& alias : pq._selectClause._aliases) {
for (const ParsedQuery::Alias& alias : pq.selectClause()._aliases) {
if (alias._expression.isAggregate({})) {
doGrouping = true;
break;
@@ -74,8 +74,11 @@ QueryExecutionTree QueryPlanner::createExecutionTree(ParsedQuery& pq) {
}

// DISTINCT
if (pq._selectClause._distinct) {
plans.emplace_back(getDistinctRow(pq, plans));
if (pq.hasSelectClause()) {
const auto& selectClause = pq.selectClause();
if (selectClause._distinct) {
plans.emplace_back(getDistinctRow(selectClause, plans));
}
}

// ORDER BY
@@ -440,12 +443,15 @@ bool QueryPlanner::checkUsePatternTrick(
// appear in a value clause?
// Check if the query has the right number of variables for aliases and
// group by.
if (pq->_groupByVariables.size() != 1 ||
pq->_selectClause._aliases.size() > 1) {
if (!pq->hasSelectClause()) {
return false;
}
const auto& selectClause = pq->selectClause();
if (pq->_groupByVariables.size() != 1 || selectClause._aliases.size() > 1) {
return false;
}

bool returns_counts = pq->_selectClause._aliases.size() == 1;
bool returns_counts = selectClause._aliases.size() == 1;

// These will only be set if the query returns the count of predicates
// The variable the COUNT alias counts
@@ -455,7 +461,7 @@ bool QueryPlanner::checkUsePatternTrick(

if (returns_counts) {
// There has to be a single count alias
const ParsedQuery::Alias& alias = pq->_selectClause._aliases.back();
const ParsedQuery::Alias& alias = selectClause._aliases.back();
auto countVariable =
alias._expression.getVariableForNonDistinctCountOrNullopt();
if (!countVariable.has_value()) {
@@ -492,7 +498,7 @@ bool QueryPlanner::checkUsePatternTrick(

// check that all selected variables are outputs of
// CountAvailablePredicates
for (const std::string& s : pq->_selectClause._selectedVariables) {
for (const std::string& s : selectClause._selectedVariables) {
if (s != t._o && s != count_var_name) {
usePatternTrick = false;
break;
@@ -556,8 +562,12 @@ bool QueryPlanner::checkUsePatternTrick(
graphsToProcess.push_back(&arg._child2);
} else if constexpr (std::is_same_v<
T, GraphPatternOperation::Subquery>) {
for (const std::string& v :
arg._subquery._selectClause._selectedVariables) {
if (!arg._subquery.hasSelectClause()) {
usePatternTrick = false;
return;
}
const auto& selectClause = arg._subquery.selectClause();
for (const auto& v : selectClause._selectedVariables) {
if (v == t._o) {
usePatternTrick = false;
break;
@@ -604,8 +614,12 @@ bool QueryPlanner::checkUsePatternTrick(
graphsToProcess.push_back(&arg._child2);
} else if constexpr (std::is_same_v<
T, GraphPatternOperation::Subquery>) {
for (const std::string& v :
arg._subquery._selectClause._selectedVariables) {
if (!arg._subquery.hasSelectClause()) {
usePatternTrick = false;
return;
}
const auto& selectClause = arg._subquery.selectClause();
for (const auto& v : selectClause._selectedVariables) {
if (v == t._o) {
usePatternTrick = false;
break;
@@ -802,7 +816,8 @@ bool QueryPlanner::checkUsePatternTrick(

// _____________________________________________________________________________
vector<QueryPlanner::SubtreePlan> QueryPlanner::getDistinctRow(
const ParsedQuery& pq, const vector<vector<SubtreePlan>>& dpTab) const {
const ParsedQuery::SelectClause& selectClause,
const vector<vector<SubtreePlan>>& dpTab) const {
const vector<SubtreePlan>& previous = dpTab[dpTab.size() - 1];
vector<SubtreePlan> added;
added.reserve(previous.size());
@@ -812,7 +827,7 @@ vector<QueryPlanner::SubtreePlan> QueryPlanner::getDistinctRow(
vector<size_t> keepIndices;
ad_utility::HashSet<size_t> indDone;
const auto& colMap = parent._qet->getVariableColumns();
for (const auto& var : pq._selectClause._selectedVariables) {
for (const auto& var : selectClause._selectedVariables) {
const auto it = colMap.find(var);
if (it != colMap.end()) {
auto ind = it->second;
@@ -884,6 +899,7 @@ vector<QueryPlanner::SubtreePlan> QueryPlanner::getDistinctRow(
vector<QueryPlanner::SubtreePlan> QueryPlanner::getPatternTrickRow(
const ParsedQuery& pq, const vector<vector<SubtreePlan>>& dpTab,
const SparqlTriple& patternTrickTriple) {
const auto& selectClause = pq.selectClause();
const vector<SubtreePlan>* previous = nullptr;
if (!dpTab.empty()) {
previous = &dpTab.back();
@@ -927,7 +943,7 @@ vector<QueryPlanner::SubtreePlan> QueryPlanner::getPatternTrickRow(
_qec, isSorted ? parent._qet : orderByPlan._qet, subjectColumn);

countPred->setVarNames(patternTrickTriple._o,
pq._selectClause._aliases[0]._outVarName);
selectClause._aliases[0]._outVarName);
QueryExecutionTree& tree = *patternTrickPlan._qet;
tree.setVariableColumns(countPred->getVariableColumns());
tree.setOperation(QueryExecutionTree::COUNT_AVAILABLE_PREDICATES,
@@ -941,7 +957,7 @@ vector<QueryPlanner::SubtreePlan> QueryPlanner::getPatternTrickRow(
std::make_shared<CountAvailablePredicates>(_qec, patternTrickTriple._s);

countPred->setVarNames(patternTrickTriple._o,
pq._selectClause._aliases[0]._outVarName);
selectClause._aliases[0]._outVarName);
QueryExecutionTree& tree = *patternTrickPlan._qet;
tree.setVariableColumns(countPred->getVariableColumns());
tree.setOperation(QueryExecutionTree::COUNT_AVAILABLE_PREDICATES,
@@ -952,9 +968,9 @@ vector<QueryPlanner::SubtreePlan> QueryPlanner::getPatternTrickRow(
SubtreePlan patternTrickPlan(_qec);
auto countPred = std::make_shared<CountAvailablePredicates>(_qec);

if (pq._selectClause._aliases.size() > 0) {
if (selectClause._aliases.size() > 0) {
countPred->setVarNames(patternTrickTriple._o,
pq._selectClause._aliases[0]._outVarName);
selectClause._aliases[0]._outVarName);
} else {
countPred->setVarNames(patternTrickTriple._o, generateUniqueVarName());
}
@@ -1003,8 +1019,12 @@ vector<QueryPlanner::SubtreePlan> QueryPlanner::getGroupByRow(
SubtreePlan groupByPlan(_qec);
groupByPlan._idsOfIncludedNodes = parent->_idsOfIncludedNodes;
groupByPlan._idsOfIncludedFilters = parent->_idsOfIncludedFilters;
std::vector<ParsedQuery::Alias> aliases;
if (pq.hasSelectClause()) {
aliases = pq.selectClause()._aliases;
}
auto groupBy = std::make_shared<GroupBy>(_qec, pq._groupByVariables,
pq._selectClause._aliases);
std::move(aliases));
QueryExecutionTree& groupByTree = *groupByPlan._qet;

// Then compute the sort columns
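The planner changes above all follow one pattern: test `hasSelectClause()` before touching `selectClause()`. The commit list mentions unifying `_selectClause` and `_constructClause` in a variant, but the accessor implementation is not part of this excerpt, so the following is a guess at its shape rather than the PR's code. `ParsedQuerySketch`, the simplified clause types, and `needsDistinct` are hypothetical.

```cpp
#include <array>
#include <cassert>
#include <string>
#include <utility>
#include <variant>
#include <vector>

// Hypothetical, simplified stand-ins for the clause types used in the diff.
struct SelectClause {
  std::vector<std::string> _selectedVariables;
  bool _distinct = false;
};
using ConstructClause = std::vector<std::array<std::string, 3>>;

class ParsedQuerySketch {
 public:
  explicit ParsedQuerySketch(std::variant<SelectClause, ConstructClause> clause)
      : _clause(std::move(clause)) {}

  bool hasSelectClause() const {
    return std::holds_alternative<SelectClause>(_clause);
  }
  bool hasConstructClause() const {
    return std::holds_alternative<ConstructClause>(_clause);
  }
  // std::get throws std::bad_variant_access when the other alternative is
  // active, which is why callers check has...Clause() first.
  const SelectClause& selectClause() const {
    return std::get<SelectClause>(_clause);
  }
  const ConstructClause& constructClause() const {
    return std::get<ConstructClause>(_clause);
  }

 private:
  std::variant<SelectClause, ConstructClause> _clause;
};

// Mirrors the guarded DISTINCT check: a CONSTRUCT query never asks for it.
bool needsDistinct(const ParsedQuerySketch& pq) {
  return pq.hasSelectClause() && pq.selectClause()._distinct;
}
```

Because exactly one alternative is active at a time, a query cannot carry both clauses by accident, which a pair of separate members would allow.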
3 changes: 2 additions & 1 deletion src/engine/QueryPlanner.h
@@ -275,7 +275,8 @@ class QueryPlanner {
const ParsedQuery& pq, const vector<vector<SubtreePlan>>& dpTab) const;

vector<SubtreePlan> getDistinctRow(
const ParsedQuery& pq, const vector<vector<SubtreePlan>>& dpTab) const;
const ParsedQuery::SelectClause& selectClause,
const vector<vector<SubtreePlan>>& dpTab) const;

vector<SubtreePlan> getPatternTrickRow(
const ParsedQuery& pq, const vector<vector<SubtreePlan>>& dpTab,
48 changes: 42 additions & 6 deletions src/engine/Server.cpp
@@ -163,13 +163,15 @@ json Server::composeResponseQleverJson(const ParsedQuery& query,
off_t compResultUsecs = requestTimer.usecs();
size_t resultSize = rt->size();

const auto& selectClause = query.selectClause();

nlohmann::json j;

j["query"] = query._originalString;
j["status"] = "OK";
j["resultsize"] = resultSize;
j["warnings"] = qet.collectWarnings();
j["selected"] = query._selectClause._selectedVariables;
j["selected"] = selectClause._selectedVariables;

j["runtimeInformation"] = RuntimeInformation::ordered_json(
qet.getRootOperation()->getRuntimeInfo());
@@ -178,9 +180,8 @@ json Server::composeResponseQleverJson(const ParsedQuery& query,
size_t limit = query._limit.value_or(MAX_NOF_ROWS_IN_RESULT);
size_t offset = query._offset.value_or(0);
requestTimer.cont();
j["res"] =
qet.writeResultAsQLeverJson(query._selectClause._selectedVariables,
std::min(limit, maxSend), offset);
j["res"] = qet.writeResultAsQLeverJson(selectClause._selectedVariables,
std::min(limit, maxSend), offset);
requestTimer.stop();
}

@@ -204,7 +205,7 @@ json Server::composeResponseSparqlJson(const ParsedQuery& query,
size_t limit = query._limit.value_or(MAX_NOF_ROWS_IN_RESULT);
size_t offset = query._offset.value_or(0);
requestTimer.cont();
j = qet.writeResultAsSparqlJson(query._selectClause._selectedVariables,
j = qet.writeResultAsSparqlJson(query.selectClause()._selectedVariables,
std::min(limit, maxSend), offset);
requestTimer.stop();
return j;
@@ -215,10 +216,19 @@ ad_utility::stream_generator::stream_generator Server::composeResponseSepValues(
const ParsedQuery& query, const QueryExecutionTree& qet, char sep) {
size_t limit = query._limit.value_or(MAX_NOF_ROWS_IN_RESULT);
size_t offset = query._offset.value_or(0);
return qet.generateResults(query._selectClause._selectedVariables, limit,
return qet.generateResults(query.selectClause()._selectedVariables, limit,
offset, sep);
}

// _____________________________________________________________________________

ad_utility::stream_generator::stream_generator Server::composeTurtleResponse(
const ParsedQuery& query, const QueryExecutionTree& qet) {
size_t limit = query._limit.value_or(MAX_NOF_ROWS_IN_RESULT);
size_t offset = query._offset.value_or(0);
return qet.writeRdfGraphTurtle(query.constructClause(), limit, offset);
}

// _____________________________________________________________________________
json Server::composeExceptionJson(const string& query, const std::exception& e,
ad_utility::Timer& requestTimer) {
@@ -349,6 +359,8 @@ boost::asio::awaitable<void> Server::processQuery(
mediaType = ad_utility::MediaType::qleverJson;
} else if (containsParam("action", "sparql_json_export")) {
mediaType = ad_utility::MediaType::sparqlJson;
} else if (containsParam("action", "turtle_export")) {
mediaType = ad_utility::MediaType::turtle;
}

std::string_view acceptHeader = request.base()[http::field::accept];
@@ -371,23 +383,47 @@ boost::asio::awaitable<void> Server::processQuery(
AD_CHECK(mediaType.has_value());
switch (mediaType.value()) {
case ad_utility::MediaType::csv: {
if (pq.hasConstructClause()) {
throw std::runtime_error{
"CONSTRUCT queries only support turtle syntax right now"};
}
auto responseGenerator = composeResponseSepValues(pq, qet, ',');
auto response = createOkResponse(std::move(responseGenerator), request,
ad_utility::MediaType::csv);
co_await send(std::move(response));
} break;
case ad_utility::MediaType::tsv: {
if (pq.hasConstructClause()) {
throw std::runtime_error{
"CONSTRUCT queries only support turtle syntax right now"};
}
auto responseGenerator = composeResponseSepValues(pq, qet, '\t');
auto response = createOkResponse(std::move(responseGenerator), request,
ad_utility::MediaType::tsv);
co_await send(std::move(response));
} break;
case ad_utility::MediaType::qleverJson: {
if (pq.hasConstructClause()) {
throw std::runtime_error{
"CONSTRUCT queries only support turtle syntax right now"};
}
// Normal case: JSON response
auto responseString =
composeResponseQleverJson(pq, qet, requestTimer, maxSend);
co_await sendJson(std::move(responseString));
} break;
case ad_utility::MediaType::turtle:
if (pq.hasConstructClause()) {
auto responseGenerator = composeTurtleResponse(pq, qet);
auto response =
createOkResponse(std::move(responseGenerator), request,
ad_utility::MediaType::turtle);
co_await send(std::move(response));
} else {
throw std::runtime_error{
"Turtle Syntax is only supported for CONSTRUCT queries"};
}
break;
case ad_utility::MediaType::sparqlJson: {
auto responseString =
composeResponseSparqlJson(pq, qet, requestTimer, maxSend);
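The media-type checks in `Server::processQuery` repeat the same guard in several `case` branches. One way to read them is as a single compatibility table between query kind and output format. The sketch below is an assumption-laden restatement, not code from the PR: `MediaType` here is a local mirror of the enumerators seen in the diff, and `QueryKind` and `checkMediaType` are hypothetical. The visible excerpt is cut off inside the `sparqlJson` case, so the sketch assumes no guard there.

```cpp
#include <cassert>
#include <optional>
#include <string>

// Local mirror of the enumerators that appear in the diff (assumption).
enum class MediaType { csv, tsv, qleverJson, sparqlJson, turtle };
// Hypothetical: the PR asks the parsed query via hasConstructClause() instead.
enum class QueryKind { select, construct };

// Returns an error message if the combination is rejected, std::nullopt if
// it is accepted.
std::optional<std::string> checkMediaType(QueryKind kind, MediaType type) {
  switch (type) {
    case MediaType::csv:
    case MediaType::tsv:
    case MediaType::qleverJson:
      if (kind == QueryKind::construct) {
        return std::string{
            "CONSTRUCT queries only support turtle syntax right now"};
      }
      return std::nullopt;
    case MediaType::turtle:
      if (kind != QueryKind::construct) {
        return std::string{
            "Turtle Syntax is only supported for CONSTRUCT queries"};
      }
      return std::nullopt;
    case MediaType::sparqlJson:
      // The excerpt ends inside this case; no guard is assumed here.
      return std::nullopt;
  }
  return std::nullopt;
}
```

Centralising the check like this would keep the error messages in one place, at the cost of separating them from the code that builds each response.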