New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Language Filter #71
Language Filter #71
Conversation
96bfdec
to
63f0d49
Compare
So I guess @Buchhold's concerns about the Scientist Collection not being ideal already have a point in that I'm not sure whether we can add a test query for language filtering on that collection. @floriankramer what do you think? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM though I'm not super happy about the case-insensitivity hacks. I think a ad_utility
function for case-insensitive string matching would make sense here.
@@ -1266,12 +1273,18 @@ void QueryPlanner::applyFiltersIfPossible( | |||
entityId = _qec->getIndex().getVocab().getValueIdForLT(compWith); | |||
} else if (filters[i]._type == SparqlFilter::LE) { | |||
entityId = _qec->getIndex().getVocab().getValueIdForLE(compWith); | |||
} else if (filters[i]._type == SparqlFilter::LANG_MATCHES) { | |||
entityId = std::numeric_limits<size_t>::max() - 1; | |||
} | |||
} | |||
std::shared_ptr<Operation> filter( | |||
new Filter(_qec, row[n]._qet, filters[i]._type, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use std::make_shared
, also why is the right hand side string not set in the constructor like the others?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Currently only the LANG_MATCHES filter requires an actual string as its right hand sight, all other types of filters work with an id. Due to that, to avoid adding an argument to the constructor that is not needed in most cases, the right hand side string is set separately right now.
A potentially better alternative solution would be adding a second constructor, that takes a string for the right hand side and then some assertions to assure that the right constructor is called for the right type of filter.
src/parser/ParsedQuery.cpp
Outdated
@@ -189,8 +189,11 @@ void ParsedQuery::parseAliases() { | |||
a._isAggregate = true; | |||
size_t pos = inner.find("as"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe instead of checking for both "AS" and "as", "DISTINCT" and "distinct" we could have an ad_utility
function(s) that does case-insensitive checks and replace the above ad_utility::startsWith()
as well as the string::find()s
below. I think we can't just lowercase inner
because that may contain variable names which are case sensitive. According to the spec SPARQL is suppposed to be case-insensitive by the way, see also #17. Long-term we really should write/generate a proper parser based on the grammar though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As a temporary solution I changed all the find
s and startsWith
s to run on a lowercase version of inner
. This preserves the spelling of variables but makes all keywords case insensitive.
A proper parser for SPARQL sounds like a very good idea though.
Changed the parsing to look for keywords in a lowercase version of the string.
63f0d49
to
2ea1f0c
Compare
This adds a naive implementation of a language filter. The filter only supports one language and is not very fast, but as the feature is apparently rather useful it might be worth including it in the master branch as a temporary solution (as a more efficient filter is probably going to require another index structure to support it, making it a larger change.)
The branch is based upon my current version of the grouping branch, and should only be merged once #69 has been approved and merged.