Order by aggregate #76

Merged
merged 4 commits into from
Jul 30, 2018
11 changes: 5 additions & 6 deletions README.md
@@ -4,7 +4,7 @@
Status](https://travis-ci.org/ad-freiburg/QLever.svg?branch=master)](https://travis-ci.org/ad-freiburg/QLever)

QLever (pronounced "clever") is a query engine for efficient combined search on a knowledge base and a text corpus, in which named entities from the knowledge base have been identified.
The query language is SPARQL extended by three QLever-specific predicates `ql:contains-entity`, `ql:contains-word` and `ql:has-relation`. `ql:contains-entity` and `ql:contains-word` can express the occurrence of an entity or word (the object of the predicate) in a text record (the subject of the predicate). `ql:has-relation` can be used to efficiently count available predicates for a set of entities.
The query language is SPARQL extended by three QLever-specific predicates `ql:contains-entity`, `ql:contains-word` and `ql:has-predicate`. `ql:contains-entity` and `ql:contains-word` can express the occurrence of an entity or word (the object of the predicate) in a text record (the subject of the predicate). `ql:has-predicate` can be used to efficiently count available predicates for a set of entities.
Pure SPARQL is supported as well.

With this, it is possible to answer queries like the following one for astronauts who walked on the moon:
@@ -202,7 +202,7 @@ If you want support for SPARQL queries with predicate variables (perfectly norm

./IndexBuilderMain -i /path/to/myindex -n /path/to/input.nt -a -w

To generate a patterns file and include support for ql:has-relations:
To generate a patterns file and include support for ql:has-predicates:

./IndexBuilderMain -i /path/to/myindex -n /path/to/input.nt --patterns

@@ -347,19 +347,18 @@ Text / Knowledge-base data can be nested in queries. This allows queries like on
For now, each text-record variable is required to have a triple `ql:contains-word/entity WORD/URI`.
Pure connections to variables (e.g. "Books with a description that mentions a plant.") are planned for the future.

To obtain a list of available relations and their counts `ql:has-relation` can be used if the index was build with the `--patterns` option, and the server was started with the `--patterns` option:
To obtain a list of available predicates and their counts, `ql:has-predicate` can be used if the index was built with the `--patterns` option and the server was started with the `--patterns` option:

SELECT ?relations (COUNT(?relations) as ?count) WHERE {
?s <is-a> <Scientist> .
?t2 ql:contains-entity ?s .
?t2 ql:contains-word "manhattan project" .
?s ql:has-relation ?relations .
?s ql:has-predicate ?relations .
}
GROUP BY ?relations
ORDER BY DESC(?count)

As of yet using ql:has-relation in any other form of query (apart from adding more triples in the WHERE part) ist not supported.
In particular ql:has-relation can not be used as a normal predicate to add all available relations to the current solution.
`ql:has-predicate` can also be used as a normal predicate in an arbitrary query.

Group by is supported, but aggregate aliases may currently only be used within the SELECT part of the query:

42 changes: 37 additions & 5 deletions e2e/scientists_queries.yaml
@@ -81,6 +81,38 @@ queries:
- selected: ["?count", "?place"]
- contains_row: [280, "<New_York_City>"]
- order_numeric: {"dir": "DESC", "var": "?count"}
- query: scientists-order-by-aggregate-count
solutions:
- type: no-text
sparql: |
SELECT ?place (COUNT(?x) as ?count2) WHERE {
?x <is-a> <Scientist> .
?x <Place_of_birth> ?place .
}
GROUP BY ?place
ORDER BY DESC((COUNT(?x) as ?count))
Review comment (Member): Maybe we should also test another aggregate like AVG?

checks:
- num_cols: 2
# The query returns too many rows; the current limit is 4096
# - num_rows: 5295
- selected: ["?place", "?count2"]
- order_numeric: {"dir": "DESC", "var": "?count2"}
- query: scientists-order-by-aggregate-avg
solutions:
- type: no-text
sparql: |
SELECT ?profession (AVG(?height) as ?avg2) WHERE {
?x <is-a> <Scientist> .
?x <Profession> ?profession .
?x <Height> ?height .
}
GROUP BY ?profession
ORDER BY ASC((AVG(?height) as ?avg))
checks:
- num_cols: 2
- num_rows: 209
- selected: ["?profession", "?avg2"]
- order_numeric: {"dir": "ASC", "var": "?avg2"}
- query: group-by-profession-average-height
solutions:
- type: no-text
@@ -132,7 +164,7 @@ queries:
sparql: |
SELECT ?r (COUNT(?r) as ?count) WHERE {
?a <is-a> <Scientist> .
?a ql:has-relation ?r .
?a ql:has-predicate ?r .
}
GROUP BY ?r
ORDER BY DESC(?count)
@@ -142,26 +174,26 @@ queries:
- selected: ["?r", "?count"]
- contains_row: ["<Religion>", 1185]
- order_numeric: {"dir": "DESC", "var": "?count"}
- query : has-relation-full
- query : has-predicate-full
solutions:
- type: no-text
sparql: |
SELECT ?entity ?relation WHERE {
?entity ql:has-relation ?relation .
?entity ql:has-predicate ?relation .
}
checks:
# The number of rows is greater than the current limit of 4096.
# - num_rows: 168444
- num_cols: 2
- selected: ["?entity", "?relation"]
- contains_row: ["<Alan_Fersht>", "<Leader_of>"]
- query : has-relation-subquery-subject
- query : has-predicate-subquery-subject
solutions:
- type: no-text
sparql: |
SELECT ?entity ?r WHERE {
?entity <is-a> <Profession> .
?entity ql:has-relation ?r.
?entity ql:has-predicate ?r.
}
checks:
- num_rows: 760
2 changes: 1 addition & 1 deletion src/ServerMain.cpp
@@ -57,7 +57,7 @@ void printUsage(char* execName) {
cout << " " << std::setw(20) << "p, port" << std::setw(1) << " "
<< "The port on which to run the web interface." << endl;
cout << " " << std::setw(20) << "P, patterns" << std::setw(1) << " "
<< "Use relation patterns for fast ql:has-relation queries." << endl;
<< "Use predicate patterns to enable ql:has-predicate queries." << endl;
cout << " " << std::setw(20) << "t, text" << std::setw(1) << " "
<< "Enables the usage of text." << endl;
cout << " " << std::setw(20) << "j, worker-threads" << std::setw(1) << " "
2 changes: 1 addition & 1 deletion src/engine/CMakeLists.txt
@@ -26,7 +26,7 @@ add_library(engine
OptionalJoin.cpp OptionalJoin.h
CountAvailablePredicates.cpp CountAvailablePredicates.h
GroupBy.cpp GroupBy.h
HasRelationScan.cpp HasRelationScan.h
HasPredicateScan.cpp HasPredicateScan.h
)

target_link_libraries(engine index parser)
16 changes: 8 additions & 8 deletions src/engine/CountAvailablePredicates.cpp
@@ -92,8 +92,8 @@ void CountAvailablePredicates::computeResult(ResultTable* result) const {

const std::vector<PatternID>& hasPattern =
_executionContext->getIndex().getHasPattern();
const CompactStringVector<Id, Id>& hasRelation =
_executionContext->getIndex().getHasRelation();
const CompactStringVector<Id, Id>& hasPredicate =
_executionContext->getIndex().getHasPredicate();
const CompactStringVector<size_t, Id>& patterns =
_executionContext->getIndex().getPatterns();

@@ -103,33 +103,33 @@ void CountAvailablePredicates::computeResult(ResultTable* result) const {
Engine::computePatternTrick<vector<Id>>(
&subresult->_varSizeData,
static_cast<vector<array<Id, 2>>*>(result->_fixedSizeData), hasPattern,
hasRelation, patterns, _subjectColumnIndex);
hasPredicate, patterns, _subjectColumnIndex);
} else {
if (subresult->_nofColumns == 1) {
Engine::computePatternTrick<array<Id, 1>>(
static_cast<vector<array<Id, 1>>*>(subresult->_fixedSizeData),
static_cast<vector<array<Id, 2>>*>(result->_fixedSizeData),
hasPattern, hasRelation, patterns, _subjectColumnIndex);
hasPattern, hasPredicate, patterns, _subjectColumnIndex);
} else if (subresult->_nofColumns == 2) {
Engine::computePatternTrick<array<Id, 2>>(
static_cast<vector<array<Id, 2>>*>(subresult->_fixedSizeData),
static_cast<vector<array<Id, 2>>*>(result->_fixedSizeData),
hasPattern, hasRelation, patterns, _subjectColumnIndex);
hasPattern, hasPredicate, patterns, _subjectColumnIndex);
} else if (subresult->_nofColumns == 3) {
Engine::computePatternTrick<array<Id, 3>>(
static_cast<vector<array<Id, 3>>*>(subresult->_fixedSizeData),
static_cast<vector<array<Id, 2>>*>(result->_fixedSizeData),
hasPattern, hasRelation, patterns, _subjectColumnIndex);
hasPattern, hasPredicate, patterns, _subjectColumnIndex);
} else if (subresult->_nofColumns == 4) {
Engine::computePatternTrick<array<Id, 4>>(
static_cast<vector<array<Id, 4>>*>(subresult->_fixedSizeData),
static_cast<vector<array<Id, 2>>*>(result->_fixedSizeData),
hasPattern, hasRelation, patterns, _subjectColumnIndex);
hasPattern, hasPredicate, patterns, _subjectColumnIndex);
} else if (subresult->_nofColumns == 5) {
Engine::computePatternTrick<array<Id, 5>>(
static_cast<vector<array<Id, 5>>*>(subresult->_fixedSizeData),
static_cast<vector<array<Id, 2>>*>(result->_fixedSizeData),
hasPattern, hasRelation, patterns, _subjectColumnIndex);
hasPattern, hasPredicate, patterns, _subjectColumnIndex);
}
}
result->finish();
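
The `_nofColumns` chain in this hunk picks a template instantiation at run time: the table width is only known when the query runs, while `computePatternTrick` needs it as a compile-time row type. The following is a minimal sketch of that dispatch pattern, not QLever's code. `Table`, `extractColumn`, `extractColumnAs` and `extractColumnDispatch` are hypothetical names introduced for illustration, and the sketch extracts a column instead of counting predicates.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

using Id = std::uint64_t;

// Hypothetical stand-in for a result table: the width is a run-time value
// and the rows sit behind an untyped pointer.
struct Table {
  std::size_t nofColumns;
  const void* fixedSizeData;  // really a std::vector<std::array<Id, nofColumns>>
};

// The actual work, templated on the compile-time row width N.
template <std::size_t N>
std::vector<Id> extractColumn(const std::vector<std::array<Id, N>>& rows,
                              std::size_t column) {
  std::vector<Id> out;
  out.reserve(rows.size());
  for (const auto& row : rows) {
    out.push_back(row.at(column));  // at() throws if column >= N
  }
  return out;
}

// Casts the untyped pointer back to the row type that matches N.
template <std::size_t N>
std::vector<Id> extractColumnAs(const Table& table, std::size_t column) {
  const auto* rows =
      static_cast<const std::vector<std::array<Id, N>>*>(table.fixedSizeData);
  return extractColumn<N>(*rows, column);
}

// Run-time dispatch: one case per supported width, mirroring the if/else chain.
std::vector<Id> extractColumnDispatch(const Table& table, std::size_t column) {
  switch (table.nofColumns) {
    case 1: return extractColumnAs<1>(table, column);
    case 2: return extractColumnAs<2>(table, column);
    case 3: return extractColumnAs<3>(table, column);
    case 4: return extractColumnAs<4>(table, column);
    case 5: return extractColumnAs<5>(table, column);
    default: throw std::invalid_argument("unsupported column count");
  }
}
```

The cast is only safe when `nofColumns` matches the type the pointer was created from, which is why the width and the pointer travel together in one struct here.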
8 changes: 4 additions & 4 deletions src/engine/Engine.h
@@ -646,7 +646,7 @@ class Engine {
* @param result A table with two columns, one for predicate ids,
* one for counts
* @param hasPattern A mapping from entity ids to pattern ids (or NO_PATTERN)
* @param hasRelation A mapping from entity ids to sets of relations
* @param hasPredicate A mapping from entity ids to sets of relations
* @param patterns A mapping from pattern ids to patterns
* @param subjectColumn The column containing the entities for which the
* relations should be counted.
@@ -655,7 +655,7 @@ class Engine {
static void computePatternTrick(
const vector<A>* input, vector<array<Id, 2>>* result,
const vector<PatternID>& hasPattern,
const CompactStringVector<Id, Id>& hasRelation,
const CompactStringVector<Id, Id>& hasPredicate,
const CompactStringVector<size_t, Id>& patterns,
const size_t subjectColumn) {
ad_utility::HashMap<Id, size_t> predicateCounts;
@@ -672,11 +672,11 @@ class Engine {
if (subject < hasPattern.size() && hasPattern[subject] != NO_PATTERN) {
// The subject matches a pattern
patternCounts[hasPattern[subject]]++;
} else if (subject < hasRelation.size()) {
} else if (subject < hasPredicate.size()) {
// The subject does not match a pattern
size_t numPredicates;
Id* predicateData;
std::tie(predicateData, numPredicates) = hasRelation[subject];
std::tie(predicateData, numPredicates) = hasPredicate[subject];
if (numPredicates > 0) {
for (size_t i = 0; i < numPredicates; i++) {
auto it = predicateCounts.find(predicateData[i]);
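
The two branches visible in this hunk suggest how the counting works: a subject that matches a pattern only increments that pattern's counter, and a subject without a pattern has its predicates counted directly. The sketch below illustrates that idea under stated assumptions and is not the implementation in `Engine.h`. It uses plain `std::vector`s in place of `CompactStringVector`, assumes each subject appears once in the input, and assumes that pattern counts are expanded into predicate counts in a final pass (that step is outside the shown hunk). `countPredicates` and the `NO_PATTERN` value used here are hypothetical.

```cpp
#include <cstddef>
#include <cstdint>
#include <limits>
#include <unordered_map>
#include <vector>

using Id = std::uint64_t;
using PatternID = std::uint32_t;
// Assumed sentinel for "this entity has no pattern".
constexpr PatternID NO_PATTERN = std::numeric_limits<PatternID>::max();

// Counts how often each predicate occurs across the given subjects.
// hasPattern[s]   : pattern id of subject s, or NO_PATTERN
// hasPredicate[s] : predicates of subject s when it has no pattern
// patterns[p]     : predicates that make up pattern p
std::unordered_map<Id, std::size_t> countPredicates(
    const std::vector<Id>& subjects,
    const std::vector<PatternID>& hasPattern,
    const std::vector<std::vector<Id>>& hasPredicate,
    const std::vector<std::vector<Id>>& patterns) {
  std::unordered_map<Id, std::size_t> predicateCounts;
  std::unordered_map<PatternID, std::size_t> patternCounts;

  for (Id subject : subjects) {
    if (subject < hasPattern.size() && hasPattern[subject] != NO_PATTERN) {
      // Cheap path: one increment per subject, whatever the pattern's size.
      ++patternCounts[hasPattern[subject]];
    } else if (subject < hasPredicate.size()) {
      // Fallback: count every predicate of the subject directly.
      for (Id predicate : hasPredicate[subject]) {
        ++predicateCounts[predicate];
      }
    }
  }

  // Expand each pattern once and add its subject count to all its predicates.
  for (const auto& entry : patternCounts) {
    if (entry.first >= patterns.size()) continue;
    for (Id predicate : patterns[entry.first]) {
      predicateCounts[predicate] += entry.second;
    }
  }
  return predicateCounts;
}
```

The saving comes from the first branch: when many subjects share a pattern, the pattern's predicate list is walked once in the final loop rather than once per subject.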