Add support SurroundQueryParser to jena-text #536
Not that it's a blocker for this PR but anytime I see this kind of code pattern in Jena (and Java in general) I think that we really should be using the |
@rvesse I agree, although for the near-term it's worth watching out for OSGi problems there. As JPMS becomes more widely used and useful, that concern will go away. |
I think it's a pity this PR merges the two methods into one, making the code a bit messier and more difficult to understand. Is that because SurroundQueryParser doesn't implement the normal QueryParser interface? Other than that, I don't have any objections to this PR, it even comes with unit tests. Docs have to be updated too after the merge. |
@osma Yes, the SurroundQueryParser doesn't implement org.apache.lucene.queryparser.classic.QueryParser, but all parsers can return an org.apache.lucene.search.Query. |
All - where are we on this PR? Should it be merged, things being as they are in Lucene, or are there specific changes to make? And documentation? |
jena-text/src/main/java/org/apache/jena/query/text/TextIndexLucene.java
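@afs Thanks for the reminder
@DamienFontaine Any chance to refactor parseQuery just a little bit? There is quite a lot of redundancy and repetition now.
For example
- combine QueryParser and AnalyzingQueryParser cases into one, since their code is (nowadays) identical
- handle the SurroundQueryParser case separate from the others since it's treated differently - for the others, avoid repeating the same boilerplate (e.g. qp.setAllowLeadingWildcard(true)) over and over
Another option would be to create a class that wraps SurroundQueryParser giving it a QueryParser interface, so that it can be handled just like the others (i.e. revert the merging of two methods). But that may be difficult in practice - I assume there's a reason why SurroundQueryParser isn't a QueryParser in Lucene already, although I haven't looked at it in detail.
If we can get the line count in TextIndexLucene down by a dozen or so, I'm happy. And of course we need some kind of promise to update the jena-text documentation accordingly after this gets merged. |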
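The wrapper idea mentioned above can be illustrated with a minimal adapter sketch. Everything here is a hypothetical stand-in: TextQueryParser, StaticOnlyParser, StaticParserAdapter and ClassicLikeParser are not Lucene or Jena types, and the "query" is just a String. The sketch only shows how a parser that exposes a static entry point could be hidden behind the same interface as the others, so the calling code no longer needs a special case.

```java
// Hypothetical common interface; in the real code the result would be a Lucene Query.
interface TextQueryParser {
    String parse(String queryString);
}

// Stand-in for a parser that only offers a static parse method
// (loosely modelled on how the surround parser is called in this thread).
final class StaticOnlyParser {
    private StaticOnlyParser() {}

    static String parse(String queryString, String field) {
        return field + ":surround(" + queryString + ")";
    }
}

// Adapter: gives the static-only parser the common interface.
final class StaticParserAdapter implements TextQueryParser {
    private final String field;

    StaticParserAdapter(String field) {
        this.field = field;
    }

    @Override
    public String parse(String queryString) {
        return StaticOnlyParser.parse(queryString, field);
    }
}

// Stand-in for the parsers that already share the interface.
final class ClassicLikeParser implements TextQueryParser {
    private final String field;

    ClassicLikeParser(String field) {
        this.field = field;
    }

    @Override
    public String parse(String queryString) {
        return field + ":classic(" + queryString + ")";
    }
}
```

With such an adapter the caller would hold a TextQueryParser and call parse on it regardless of the configured type. Whether this is practical with the real Lucene classes is left open, as the comment above notes.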
Thanks for patience w/ my lack of response - slammed is all I can say.
In any event, I agree with Osma’s comments. After the merge, then we can see what sort of interactions with other features, in particular, multi-language, occur. I’m not confident that they will play well together which would have to be clarified in the documentation.
If there are other query parsers that are desirable to add in the future it might be worth extending the assembler a bit to handle such extensions.
Regards,
Chris
… On Mar 22, 2019, at 11:26 AM, Osma Suominen ***@***.***> wrote:
@afs <https://github.com/afs> Thanks for the reminder
@DamienFontaine <https://github.com/DamienFontaine> Any chance to refactor parseQuery just a little bit? There is quite a lot of redundancy and repetition now.
For example
combine QueryParser and AnalyzingQueryParser cases into one, since their code is (nowadays) identical
handle the SurroundQueryParser case separate from the others since it's treated differently - for the others, avoid repeating the same boilerplate (e.g. qp.setAllowLeadingWildcard(true)) over and over
Another option would be to create a class that wraps SurroundQueryParser giving it a QueryParser interface, so that it can be handled just like the others (ie revert the merging of two methods). But that may be difficult in practice - I assume there's a reason why SurroundQueryParser isn't a QueryParser in Lucene already although I haven't looked at it in detail.
If we can get the line count in TextIndexLucene down by a dozen or so, I'm happy. And of course we need some kind of promise to update the jena-text documentation accordingly after this gets merged.
|
The switch seems not correct yet:
It should be more like:
otherwise the |
I still don't get the How about this:
    switch(queryParserType) {
        case "SurroundQueryParser":
            try {
                query = org.apache.lucene.queryparser.surround.parser.QueryParser.parse(queryString).makeLuceneQueryField(docDef.getPrimaryField(), new BasicQueryFactory());
            } catch(org.apache.lucene.queryparser.surround.parser.ParseException e) {
                throw new ParseException(e.getMessage());
            }
            return query;
        case "ComplexPhraseQueryParser":
            qp = new ComplexPhraseQueryParser(docDef.getPrimaryField(), analyzer);
            break;
        default:
            log.warn("Unknown query parser type '" + queryParserType + "'. Defaulting to standard QueryParser");
            // deliberate fall-through to the standard QueryParser
        case "AnalyzingQueryParser": // since Lucene 7 analyzing is done by QueryParser
        case "QueryParser":
            qp = new QueryParser(docDef.getPrimaryField(), analyzer);
    }
    qp.setAllowLeadingWildcard(true);
    query = qp.parse(queryString);
    return query;
|
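The snippet above relies on Java allowing a default label in the middle of a switch, with execution falling through into the case labels that follow it. A minimal, self-contained sketch of that control flow is shown here. FallThroughSketch, Kind and the warnings list are hypothetical stand-ins, not Jena or Lucene code, and warnings are collected in a list rather than logged so the behaviour is easy to inspect.

```java
import java.util.ArrayList;
import java.util.List;

final class FallThroughSketch {
    // Hypothetical stand-in for the parser implementations discussed in this thread.
    enum Kind { SURROUND, COMPLEX_PHRASE, STANDARD }

    // Collected instead of logged, purely for illustration.
    static final List<String> warnings = new ArrayList<>();

    static Kind select(String queryParserType) {
        switch (queryParserType) {
            case "SurroundQueryParser":
                return Kind.SURROUND;
            case "ComplexPhraseQueryParser":
                return Kind.COMPLEX_PHRASE;
            default:
                warnings.add("Unknown query parser type '" + queryParserType + "'");
                // no break: execution falls through into the labels below
            case "AnalyzingQueryParser":
            case "QueryParser":
                return Kind.STANDARD;
        }
    }

    public static void main(String[] args) {
        System.out.println(select("QueryParser") + " warnings=" + warnings.size());
        System.out.println(select("NoSuchParser") + " warnings=" + warnings.size());
    }
}
```

Known types reach the standard branch without a warning, while an unknown type records one warning and then takes the same branch.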
@osma cleaner, and I thought it was a good idea to alert users that they are using a deprecated query parser if they refer to
    private Query parseQuery(String queryString, Analyzer analyzer) throws ParseException {
        Query query = null;
        QueryParser qp = new QueryParser(docDef.getPrimaryField(), analyzer);
        switch(queryParserType) {
            case "SurroundQueryParser":
                try {
                    query = org.apache.lucene.queryparser.surround.parser.QueryParser.parse(queryString).makeLuceneQueryField(docDef.getPrimaryField(), new BasicQueryFactory());
                } catch(org.apache.lucene.queryparser.surround.parser.ParseException e) {
                    throw new ParseException(e.getMessage());
                }
                return query;
            case "ComplexPhraseQueryParser":
                qp = new ComplexPhraseQueryParser(docDef.getPrimaryField(), analyzer);
                break;
            case "AnalyzingQueryParser": // since Lucene 7 analyzing is done by QueryParser
                log.warn("Deprecated query parser type '" + queryParserType + "'. Defaulting to standard QueryParser");
                break;
            default:
                log.warn("Unknown query parser type '" + queryParserType + "'. Defaulting to standard QueryParser");
        }
        qp.setAllowLeadingWildcard(true);
        query = qp.parse(queryString);
        return query;
    }
I expect that most uses end up with the standard |
I'm okay with @xristy's version, although QueryParser is needlessly instantiated in some cases. But I'd choose clear code over premature optimization any day! |
I also had second thoughts about the needless instantiation and almost edited the previous as follows:
    private Query parseQuery(String queryString, Analyzer analyzer) throws ParseException {
        Query query = null;
        QueryParser qp = null;
        switch(queryParserType) {
            case "SurroundQueryParser":
                try {
                    query = org.apache.lucene.queryparser.surround.parser.QueryParser.parse(queryString).makeLuceneQueryField(docDef.getPrimaryField(), new BasicQueryFactory());
                } catch(org.apache.lucene.queryparser.surround.parser.ParseException e) {
                    throw new ParseException(e.getMessage());
                }
                return query;
            case "ComplexPhraseQueryParser":
                qp = new ComplexPhraseQueryParser(docDef.getPrimaryField(), analyzer);
                break;
            case "AnalyzingQueryParser": // since Lucene 7 analyzing is done by QueryParser
                log.warn("Deprecated query parser type 'AnalyzingQueryParser'. Defaulting to standard QueryParser");
                break;
            default:
                log.warn("Unknown query parser type '" + queryParserType + "'. Defaulting to standard QueryParser");
        }
        if (qp == null)
            qp = new QueryParser(docDef.getPrimaryField(), analyzer);
        qp.setAllowLeadingWildcard(true);
        query = qp.parse(queryString);
        return query;
    }
|
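The null check in the version above defers creating the standard parser until it is known that no other parser was selected. A small sketch of that pattern follows, using hypothetical stand-ins only: LazyDefaultSketch, newStandardParser and the String "parsers" are illustrative and not Jena or Lucene code. One further assumption is labelled in the code: as the snippet above is written, a configured type of "QueryParser" has no case of its own and would reach default, so the sketch adds an explicit case for it to keep the usual configuration free of warnings.

```java
final class LazyDefaultSketch {
    // Counts how often the stand-in standard parser is constructed.
    static int standardParsersCreated = 0;

    // Hypothetical factory standing in for "new QueryParser(field, analyzer)".
    static String newStandardParser() {
        standardParsersCreated++;
        return "standard";
    }

    static String choose(String queryParserType) {
        String parser = null;
        switch (queryParserType) {
            case "ComplexPhraseQueryParser":
                parser = "complexPhrase";
                break;
            case "QueryParser":          // assumption: explicit case, not in the thread's snippet
            case "AnalyzingQueryParser": // treated like the standard parser
                break;
            default:
                System.err.println("Unknown query parser type '" + queryParserType + "'");
                break;
        }
        // Only build the standard parser when nothing else was selected.
        if (parser == null) {
            parser = newStandardParser();
        }
        return parser;
    }
}
```

Choosing "ComplexPhraseQueryParser" leaves the counter untouched, which is the point of moving the instantiation after the switch.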
Either version is fine. |
Does everyone agree with @xristy's last version? |
+1 for last version |
Merged, thanks again for the contribution. If you have a chance to update the docs to mention this new capability, that would also be great: http://jena.apache.org/getting_involved/index.html#improving-the-website The easiest way to do this is to go to the relevant page of the docs and hit the "Improve this Page" link in the top corner; your edits will generate a patch that will be sent to us for review. |
@DamienFontaine Thanks, doc changes added and visible on the staging site: http://jena.staging.apache.org/documentation/query/text-query.html We typically only publish the site to production when a release happens, so this won't show up on the public site until the next release. |
PR to add SurroundQueryParser in jena-text.