Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Language Filter #71

Merged
merged 3 commits into from Jul 8, 2018
Merged

Conversation

floriankramer
Copy link
Member

This adds a naive implementation of a language filter. The filter only supports one language and is not very fast, but as the feature is apparently rather useful it might be worth including it in the master branch as a temporary solution (as a more efficient filter is probably going to require another index structure to support it, making it a larger change.)
The branch is based upon my current version of the grouping branch, and should only be merged once #69 has been approved and merged.

@floriankramer floriankramer force-pushed the language_filter branch 3 times, most recently from 96bfdec to 63f0d49 Compare July 4, 2018 15:47
@niklas88
Copy link
Member

niklas88 commented Jul 5, 2018

So I guess @Buchhold's concerns about the Scientist Collection not being ideal already have a point in that I'm not sure whether we can add a test query for language filtering on that collection. @floriankramer what do you think?

Copy link
Member

@niklas88 niklas88 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM though I'm not super happy about the case-insensitivity hacks. I think a ad_utility function for case-insensitive string matching would make sense here.

@@ -1266,12 +1273,18 @@ void QueryPlanner::applyFiltersIfPossible(
entityId = _qec->getIndex().getVocab().getValueIdForLT(compWith);
} else if (filters[i]._type == SparqlFilter::LE) {
entityId = _qec->getIndex().getVocab().getValueIdForLE(compWith);
} else if (filters[i]._type == SparqlFilter::LANG_MATCHES) {
entityId = std::numeric_limits<size_t>::max() - 1;
}
}
std::shared_ptr<Operation> filter(
new Filter(_qec, row[n]._qet, filters[i]._type,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use std::make_shared, also why is the right hand side string not set in the constructor like the others?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently only the LANG_MATCHES filter requires an actual string as its right hand sight, all other types of filters work with an id. Due to that, to avoid adding an argument to the constructor that is not needed in most cases, the right hand side string is set separately right now.
A potentially better alternative solution would be adding a second constructor, that takes a string for the right hand side and then some assertions to assure that the right constructor is called for the right type of filter.

@@ -189,8 +189,11 @@ void ParsedQuery::parseAliases() {
a._isAggregate = true;
size_t pos = inner.find("as");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe instead of checking for both "AS" and "as", "DISTINCT" and "distinct" we could have an ad_utility function(s) that does case-insensitive checks and replace the above ad_utility::startsWith() as well as the string::find()s below. I think we can't just lowercase inner because that may contain variable names which are case sensitive. According to the spec SPARQL is suppposed to be case-insensitive by the way, see also #17. Long-term we really should write/generate a proper parser based on the grammar though.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As a temporary solution I changed all the finds and startsWiths to run on a lowercase version of inner. This preserves the spelling of variables but makes all keywords case insensitive.
A proper parser for SPARQL sounds like a very good idea though.

@niklas88 niklas88 merged commit b332455 into ad-freiburg:master Jul 8, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants