Add support SurroundQueryParser to jena-text #536

DamienFontaine · 2019-02-21T13:53:31Z

PR to add SurroundQueryParser in jena-text.

rvesse · 2019-02-22T19:53:35Z

Not that it's a blocker for this PR, but any time I see this kind of code pattern in Jena (and Java in general) I think that we really should be using the ServiceLoader pattern. This would make stuff like this dynamically extensible, so we can have some basic defaults out of the box with an easy drop-in mechanism to discover user-provided extensions. And text search does seem to be an area where end users want to do a lot of customisation.
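For readers unfamiliar with the pattern, here is a minimal sketch of ServiceLoader-based discovery. It is not code from this PR or from Jena: the QueryParserProvider interface and the QueryParserRegistry class are hypothetical names, and parsing is replaced by a string stand-in so that the example runs without Lucene. A built-in default is registered first, then any providers found on the classpath are added.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.ServiceLoader;

/** Hypothetical extension point: one implementation per query parser type. */
interface QueryParserProvider {
    /** Name used in configuration, e.g. "SurroundQueryParser". */
    String name();

    /** Stand-in for real parsing: returns a textual description of the query. */
    String parse(String field, String queryString);
}

/** Hypothetical registry combining a built-in default with discovered providers. */
final class QueryParserRegistry {
    private final Map<String, QueryParserProvider> providers = new LinkedHashMap<>();

    QueryParserRegistry() {
        // Built-in default, always available.
        register(new QueryParserProvider() {
            public String name() { return "QueryParser"; }
            public String parse(String field, String q) { return field + ":" + q; }
        });
        // Providers declared in META-INF/services files on the classpath, if any.
        for (QueryParserProvider p : ServiceLoader.load(QueryParserProvider.class)) {
            register(p);
        }
    }

    void register(QueryParserProvider p) {
        providers.put(p.name(), p);
    }

    /** Falls back to the built-in default when the name is not registered. */
    QueryParserProvider lookup(String name) {
        return providers.getOrDefault(name, providers.get("QueryParser"));
    }
}
```

An extension jar would announce its implementation in a META-INF/services file named after the interface's fully qualified name. With no such file present, lookup simply falls back to the built-in default.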

ajs6f · 2019-02-25T13:37:39Z

@rvesse I agree, although for the near-term it's worth watching out for OSGi problems there. As JPMS becomes more widely used and useful, that concern will go away.

osma · 2019-02-25T16:21:09Z

I think it's a pity this PR merges the two methods into one, making the code a bit messier and more difficult to understand. Is that because SurroundQueryParser doesn't implement the normal QueryParser interface?

Other than that, I don't have any objections to this PR, it even comes with unit tests. Docs have to be updated too after the merge.

DamienFontaine · 2019-02-25T17:25:56Z

@osma Yes, the SurroundQueryParser doesn't implement org.apache.lucene.queryparser.classic.QueryParser, but all parsers can return an org.apache.lucene.search.Query.

afs · 2019-03-20T15:51:05Z

All - where are we on this PR? Should it be merged, things being as they are in Lucene, or are there specific changes to make?

And documentation?

jena-text/src/main/java/org/apache/jena/query/text/TextIndexLucene.java

afs · 2019-03-21T22:38:45Z

JENA-1690.

osma · 2019-03-22T16:26:17Z

@afs Thanks for the reminder

@DamienFontaine Any chance to refactor parseQuery just a little bit? There is quite a lot of redundancy and repetition now.

For example

- combine QueryParser and AnalyzingQueryParser cases into one, since their code is (nowadays) identical
- handle the SurroundQueryParser case separately from the others, since it's treated differently; for the others, avoid repeating the same boilerplate (e.g. qp.setAllowLeadingWildcard(true)) over and over

Another option would be to create a class that wraps SurroundQueryParser, giving it a QueryParser interface so that it can be handled just like the others (i.e. revert the merging of the two methods). But that may be difficult in practice. I assume there's a reason why SurroundQueryParser isn't a QueryParser in Lucene already, although I haven't looked at it in detail.

If we can get the line count in TextIndexLucene down by a dozen or so, I'm happy. And of course we need some kind of promise to update the jena-text documentation accordingly after this gets merged.

xristy · 2019-03-22T19:45:10Z

Thanks for your patience with my lack of response; slammed is all I can say. In any event, I agree with Osma’s comments. After the merge, we can see what sort of interactions occur with other features, in particular multi-language. I’m not confident that they will play well together, which would have to be clarified in the documentation. If there are other query parsers that would be desirable to add in the future, it might be worth extending the assembler a bit to handle such extensions. Regards, Chris


afs · 2019-03-30T11:37:27Z

Hi all - where are we on this PR?

If there are some approvals from @xristy / @osma / other Jena committers, I can do the merge.

xristy · 2019-03-30T15:46:10Z

The switch still doesn't seem correct:

        case "AnalyzingQueryParser":
            if (qp == null) qp = new QueryParser(docDef.getPrimaryField(), analyzer);
        case "QueryParser":
            if (qp == null) {
                log.warn("Deprecated query parser type '" + queryParserType + "'. Defaulting to standard QueryParser");
                qp = new QueryParser(docDef.getPrimaryField(), analyzer);
            }
        default:
            if(qp  == null) {
                log.warn("Unknown query parser type '" + queryParserType + "'. Defaulting to standard QueryParser") ;
                qp = new QueryParser(docDef.getPrimaryField(), analyzer);
            }
            qp.setAllowLeadingWildcard(true);
            query = qp.parse(queryString);
    }
    return query ;
}

It should be more like:

        case "AnalyzingQueryParser":
            if (qp == null) {
               log.warn("Deprecated query parser type '" + queryParserType + "'. Defaulting to standard QueryParser");
            }
        case "QueryParser":
            if (qp == null) {
                qp = new QueryParser(docDef.getPrimaryField(), analyzer);
            }
        default:
            if (qp  == null) {
                log.warn("Unknown query parser type '" + queryParserType + "'. Defaulting to standard QueryParser") ;
                qp = new QueryParser(docDef.getPrimaryField(), analyzer);
            }
            qp.setAllowLeadingWildcard(true);
            query = qp.parse(queryString);
    }
    return query ;
}

otherwise the log.warn fires, incorrectly, for the case "QueryParser", and qp is redundantly assigned for the case "AnalyzingQueryParser".
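The difference between the two variants can be checked with a small stand-in model. This sketch is not the Jena code: the parser is replaced by a plain string, log.warn by a list of messages, and qp is assumed to be null when these cases are entered. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

final class FallThroughSketch {
    /** Warnings produced by the first variant of the switch. */
    static List<String> warningsBefore(String type) {
        List<String> warnings = new ArrayList<>();
        String qp = null; // stand-in for the parser instance, assumed unset on entry
        switch (type) {
            case "AnalyzingQueryParser":
                if (qp == null) qp = "QueryParser";
                // falls through
            case "QueryParser":
                if (qp == null) {
                    warnings.add("Deprecated: " + type);
                    qp = "QueryParser";
                }
                // falls through
            default:
                if (qp == null) {
                    warnings.add("Unknown: " + type);
                    qp = "QueryParser";
                }
        }
        return warnings;
    }

    /** Warnings produced by the suggested variant of the switch. */
    static List<String> warningsAfter(String type) {
        List<String> warnings = new ArrayList<>();
        String qp = null; // stand-in for the parser instance, assumed unset on entry
        switch (type) {
            case "AnalyzingQueryParser":
                if (qp == null) warnings.add("Deprecated: " + type);
                // falls through
            case "QueryParser":
                if (qp == null) qp = "QueryParser";
                // falls through
            default:
                if (qp == null) {
                    warnings.add("Unknown: " + type);
                    qp = "QueryParser";
                }
        }
        return warnings;
    }
}
```

In this model the first variant records a deprecation warning for "QueryParser" and none for "AnalyzingQueryParser", while the suggested variant does the reverse. Both record the "Unknown" warning for any other type string.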

osma · 2019-04-01T13:00:46Z

I still don't get the switch block. The control flow seems just...messy, with all the if checks and sometimes implicitly falling through to the next case, sometimes not.

How about this:

        switch(queryParserType) {
            case "SurroundQueryParser":
                try {
                    query = org.apache.lucene.queryparser.surround.parser.QueryParser.parse(queryString).makeLuceneQueryField(docDef.getPrimaryField(), new BasicQueryFactory());
                } catch(org.apache.lucene.queryparser.surround.parser.ParseException e) {
                    throw new ParseException(e.getMessage());
                }
                return query;
            case "ComplexPhraseQueryParser":
                qp = new ComplexPhraseQueryParser(docDef.getPrimaryField(), analyzer);
                break;
            default:
                log.warn("Unknown query parser type '" + queryParserType + "'. Defaulting to standard QueryParser");
            case "AnalyzingQueryParser": // since Lucene 7 analyzing is done by QueryParser
            case "QueryParser":
                qp = new QueryParser(docDef.getPrimaryField(), analyzer);
        }
        qp.setAllowLeadingWildcard(true);
        query = qp.parse(queryString);
        return query ;

xristy · 2019-04-01T14:41:54Z

@osma That is cleaner, though I thought it was a good idea to alert users that they are using a deprecated query parser if they refer to AnalyzingQueryParser. How about:

    private Query parseQuery(String queryString, Analyzer analyzer) throws ParseException {
        Query query = null;
        QueryParser qp = new QueryParser(docDef.getPrimaryField(), analyzer);

        switch(queryParserType) {
            case "SurroundQueryParser":
                try {
                    query = org.apache.lucene.queryparser.surround.parser.QueryParser.parse(queryString).makeLuceneQueryField(docDef.getPrimaryField(), new BasicQueryFactory());
                } catch(org.apache.lucene.queryparser.surround.parser.ParseException e) {
                    throw new ParseException(e.getMessage());
                }
                return query;
            case "ComplexPhraseQueryParser":
                qp = new ComplexPhraseQueryParser(docDef.getPrimaryField(), analyzer);
                break;
            case "AnalyzingQueryParser": // since Lucene 7 analyzing is done by QueryParser
                log.warn("Deprecated query parser type '" + queryParserType + "'. Defaulting to standard QueryParser");
                break;
            default:
                log.warn("Unknown query parser type '" + queryParserType + "'. Defaulting to standard QueryParser");
        }

        qp.setAllowLeadingWildcard(true);
        query = qp.parse(queryString);
        return query ;
    }

I expect that most uses end up with the standard QueryParser.

osma · 2019-04-01T14:55:00Z

I'm okay with @xristy's version, although QueryParser is needlessly instantiated in some cases. But I'd choose clear code over premature optimization any day!

xristy · 2019-04-01T15:15:59Z

I also had second thoughts about the needless instantiation and almost edited the previous as follows:

    private Query parseQuery(String queryString, Analyzer analyzer) throws ParseException {
        Query query = null;
        QueryParser qp = null;

        switch(queryParserType) {
            case "SurroundQueryParser":
                try {
                    query = org.apache.lucene.queryparser.surround.parser.QueryParser.parse(queryString).makeLuceneQueryField(docDef.getPrimaryField(), new BasicQueryFactory());
                } catch(org.apache.lucene.queryparser.surround.parser.ParseException e) {
                    throw new ParseException(e.getMessage());
                }
                return query;
            case "ComplexPhraseQueryParser":
                qp = new ComplexPhraseQueryParser(docDef.getPrimaryField(), analyzer);
                break;
            case "AnalyzingQueryParser": // since Lucene 7 analyzing is done by QueryParser
                log.warn("Deprecated query parser type 'AnalyzingQueryParser'. Defaulting to standard QueryParser");
                break;
            default:
                log.warn("Unknown query parser type '" + queryParserType + "'. Defaulting to standard QueryParser");
        }

        if (qp == null) 
            qp = new QueryParser(docDef.getPrimaryField(), analyzer);
        qp.setAllowLeadingWildcard(true);
        query = qp.parse(queryString);
        return query ;
    }
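One thing to note about this version: the type string "QueryParser" has no case of its own, so it reaches default and logs the "Unknown query parser type" warning even though it names the standard parser. The sketch below shows one way to recognise it explicitly. It is a stand-in model only, with parser names as strings and warnings collected in a list instead of calling log.warn; the class name is hypothetical, and this is not necessarily how any later change in Jena handled it.

```java
import java.util.ArrayList;
import java.util.List;

final class ParserSelectionSketch {
    /** Warnings collected instead of being logged, so the behaviour can be inspected. */
    final List<String> warnings = new ArrayList<>();

    /** Returns the name of the parser that would be used for the configured type. */
    String select(String queryParserType) {
        switch (queryParserType) {
            case "SurroundQueryParser":
            case "ComplexPhraseQueryParser":
                return queryParserType;
            case "QueryParser":
                // Explicitly recognised, so no warning is recorded.
                break;
            case "AnalyzingQueryParser":
                warnings.add("Deprecated query parser type 'AnalyzingQueryParser'. Defaulting to standard QueryParser");
                break;
            default:
                warnings.add("Unknown query parser type '" + queryParserType + "'. Defaulting to standard QueryParser");
        }
        return "QueryParser";
    }
}
```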

osma · 2019-04-01T15:29:51Z

Either version is fine.

DamienFontaine · 2019-04-02T08:36:28Z

Does everyone agree with @xristy's last version?

xristy · 2019-04-02T13:55:38Z

+1 for last version

rvesse · 2019-04-04T08:52:52Z

Merged, thanks again for the contribution

If you have a chance to update the docs to mention this new capability, that would also be great - http://jena.apache.org/getting_involved/index.html#improving-the-website

The easiest way to do this is just to go to the relevant page of the docs and hit the "Improve this Page" link in the top corner; your edits will generate a patch that will be sent to us for review.

rvesse · 2019-04-04T17:10:44Z

@DamienFontaine Thanks, doc changes added and visible on the staging site - http://jena.staging.apache.org/documentation/query/text-query.html

We typically only publish the site to production when a release happens, so this won't show up on the public site until the next release.

Add support SurroundQueryParser to jena-text

c3e39f3

afs requested a review from xristy March 20, 2019 15:47

afs approved these changes Mar 21, 2019

Fix space indentation

399bac3

Refactor parseQuery function

58a66b4

Add AnalyzingQueryParser in parseQuery

36efc43

rvesse approved these changes Apr 2, 2019

Refactor ParseQuery function

713678f

rvesse merged commit 6254f27 into apache:master Apr 4, 2019

afs mentioned this pull request Apr 25, 2019

JENA-1706: Handle "QueryParser"; suppress logging WARN #560

Merged
