Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support SurroundQueryParser to jena-text #536

Merged
merged 5 commits into from Apr 4, 2019

Conversation

DamienFontaine
Copy link
Contributor

PR to add SurroundQueryParser in jena-text.

@rvesse
Copy link
Member

rvesse commented Feb 22, 2019

Pinging @osma @xristy

Not that it's a blocker for this PR but anytime I see this kind of code pattern in Jena (and Java in general) I think that we really should be using the ServiceLoader pattern. This would make stuff like this dynamically extensible so we can have some basic defaults out of the box with an easy drop-in mechanism to discover user provided extensions. And text search does seem to be an area where end users want to do a lot of customisation

@ajs6f
Copy link
Member

ajs6f commented Feb 25, 2019

@rvesse I agree, although for the near-term it's worth watching out for OSGi problems there. As JPMS becomes more widely used and useful, that concern will go away.

@osma
Copy link
Contributor

osma commented Feb 25, 2019

I think it's a pity this PR merges the two methods into one, making the code a bit messier and more difficult to understand. Is that because SurroundQueryParser doesn't implement the normal QueryParser interface?

Other than that, I don't have any objections to this PR, it even comes with unit tests. Docs have to be updated too after the merge.

@DamienFontaine
Copy link
Contributor Author

@osma Yes, the SurroundQueryParser doesn't implement org.apache.lucene.queryparser.classic.QueryParser but all parsers can return a org.apache.lucene.search.Query.

@afs afs requested a review from xristy March 20, 2019 15:47
@afs
Copy link
Member

afs commented Mar 20, 2019

All - where are we on this PR? Should it be merged, things being as they are in Lucene, or are there specific changes to make?

And documentation?

@afs
Copy link
Member

afs commented Mar 21, 2019

JENA-1690.

@osma
Copy link
Contributor

osma commented Mar 22, 2019

@afs Thanks for the reminder

@DamienFontaine Any chance to refactor parseQuery just a little bit? There is quite a lot of redundancy and repetition now.

For example

  • combine QueryParser and AnalyzingQueryParser cases into one, since their code is (nowadays) identical
  • handle the SurroundQueryParser case separate from the others since it's treated differently - for the others, avoid repeating the same boilerplate (e.g. qp.setAllowLeadingWildcard(true)) over and over

Another option would be to create a class that wraps SurroundQueryParser giving it a QueryParser interface, so that it can be handled just like the others (ie revert the merging of two methods). But that may be difficult in practice - I assume there's a reason why SurroundQueryParser isn't a QueryParser in Lucene already although I haven't looked at it in detail.

If we can get the line count in TextIndexLucene down by a dozen or so, I'm happy. And of course we need some kind of promise to update the jena-text documentation accordingly after this gets merged.

@xristy
Copy link
Contributor

xristy commented Mar 22, 2019 via email

@afs
Copy link
Member

afs commented Mar 30, 2019

Hi all - where are we on this PR?

If there are some approvals from @xristy / @osma / other Jena committers, I can go the merge.

@xristy
Copy link
Contributor

xristy commented Mar 30, 2019

The switch seems not correct yet:

        case "AnalyzingQueryParser":
            if (qp == null) qp = new QueryParser(docDef.getPrimaryField(), analyzer);
        case "QueryParser":
            if (qp == null) {
                log.warn("Deprecated query parser type '" + queryParserType + "'. Defaulting to standard QueryParser");
                qp = new QueryParser(docDef.getPrimaryField(), analyzer);
            }
        default:
            if(qp  == null) {
                log.warn("Unknown query parser type '" + queryParserType + "'. Defaulting to standard QueryParser") ;
                qp = new QueryParser(docDef.getPrimaryField(), analyzer);
            }
            qp.setAllowLeadingWildcard(true);
            query = qp.parse(queryString);
    }
    return query ;
}

It should be more like:

        case "AnalyzingQueryParser":
            if (qp == null) {
               log.warn("Deprecated query parser type '" + queryParserType + "'. Defaulting to standard QueryParser");
            }
        case "QueryParser":
            if (qp == null) {
                qp = new QueryParser(docDef.getPrimaryField(), analyzer);
            }
        default:
            if (qp  == null) {
                log.warn("Unknown query parser type '" + queryParserType + "'. Defaulting to standard QueryParser") ;
                qp = new QueryParser(docDef.getPrimaryField(), analyzer);
            }
            qp.setAllowLeadingWildcard(true);
            query = qp.parse(queryString);
    }
    return query ;
}

otherwise the log.warn fires, incorrectly, for the case "QueryParser" and qp is redundantly assigned for case "AnalyzingQueryParser"

@osma
Copy link
Contributor

osma commented Apr 1, 2019

I still don't get the switch block. The control flow seems just...messy, with all the if checks and sometimes implicitly falling through to the next case, sometimes not.

How about this:

        switch(queryParserType) {
            case "SurroundQueryParser":
                try {
                    query = org.apache.lucene.queryparser.surround.parser.QueryParser.parse(queryString).makeLuceneQueryField(docDef.getPrimaryField(), new BasicQueryFactory());
                } catch(org.apache.lucene.queryparser.surround.parser.ParseException e) {
                    throw new ParseException(e.getMessage());
                }
                return query;
            case "ComplexPhraseQueryParser":
                qp = new ComplexPhraseQueryParser(docDef.getPrimaryField(), analyzer);
                break;
            default:
                log.warn("Unknown query parser type '" + queryParserType + "'. Defaulting to standard QueryParser");
            case "AnalyzingQueryParser": // since Lucene 7 analyzing is done by QueryParser
            case "QueryParser":
                qp = new QueryParser(docDef.getPrimaryField(), analyzer);
            }
        }
        qp.setAllowLeadingWildcard(true);
        query = qp.parse(queryString);
        return query ;

@xristy
Copy link
Contributor

xristy commented Apr 1, 2019

@osma cleaner, and I thought it was a good idea to alert users that they are using a deprecated query parser if they refer to AnalyzingQueryParser. How about:

    private Query parseQuery(String queryString, Analyzer analyzer) throws ParseException {
        Query query = null;
        QueryParser qp = new QueryParser(docDef.getPrimaryField(), analyzer);

        switch(queryParserType) {
            case "SurroundQueryParser":
                try {
                    query = org.apache.lucene.queryparser.surround.parser.QueryParser.parse(queryString).makeLuceneQueryField(docDef.getPrimaryField(), new BasicQueryFactory());
                } catch(org.apache.lucene.queryparser.surround.parser.ParseException e) {
                    throw new ParseException(e.getMessage());
                }
                return query;
            case "ComplexPhraseQueryParser":
                qp = new ComplexPhraseQueryParser(docDef.getPrimaryField(), analyzer);
                break;
            case "AnalyzingQueryParser": // since Lucene 7 analyzing is done by QueryParser
                log.warn("Deprecated query parser type '" + queryParserType + "'. Defaulting to standard QueryParser");
                break;
            default:
                log.warn("Unknown query parser type '" + queryParserType + "'. Defaulting to standard QueryParser");
        }

        qp.setAllowLeadingWildcard(true);
        query = qp.parse(queryString);
        return query ;
    }

I expect that most uses end up with the standard QueryParser.

@osma
Copy link
Contributor

osma commented Apr 1, 2019

I'm okay with @xristy's version, although QueryParser is needlessly instantiated in some cases. But I'd choose clear code over premature optimization any day!

@xristy
Copy link
Contributor

xristy commented Apr 1, 2019

I also had second thoughts about the needless instantiation and almost edited the previous as follows:

    private Query parseQuery(String queryString, Analyzer analyzer) throws ParseException {
        Query query = null;
        QueryParser qp = null;

        switch(queryParserType) {
            case "SurroundQueryParser":
                try {
                    query = org.apache.lucene.queryparser.surround.parser.QueryParser.parse(queryString).makeLuceneQueryField(docDef.getPrimaryField(), new BasicQueryFactory());
                } catch(org.apache.lucene.queryparser.surround.parser.ParseException e) {
                    throw new ParseException(e.getMessage());
                }
                return query;
            case "ComplexPhraseQueryParser":
                qp = new ComplexPhraseQueryParser(docDef.getPrimaryField(), analyzer);
                break;
            case "AnalyzingQueryParser": // since Lucene 7 analyzing is done by QueryParser
                log.warn("Deprecated query parser type 'AnalyzingQueryParser'. Defaulting to standard QueryParser");
                break;
            default:
                log.warn("Unknown query parser type '" + queryParserType + "'. Defaulting to standard QueryParser");
        }

        if (qp == null) 
            qp = new QueryParser(docDef.getPrimaryField(), analyzer);
        qp.setAllowLeadingWildcard(true);
        query = qp.parse(queryString);
        return query ;
    }

@osma
Copy link
Contributor

osma commented Apr 1, 2019

Either version is fine.

@DamienFontaine
Copy link
Contributor Author

Is everyone agrees with the last @xristy version ?

@xristy
Copy link
Contributor

xristy commented Apr 2, 2019

+1 for last version

@rvesse rvesse merged commit 6254f27 into apache:master Apr 4, 2019
@rvesse
Copy link
Member

rvesse commented Apr 4, 2019

Merged, thanks again for the contribution

If you have chance to update the docs to mention this new capability that would also be great - http://jena.apache.org/getting_involved/index.html#improving-the-website

The easiest way to do this is just to go the relevant page of the docs and hit the "Improve this Page" link in the top corner and your edits will generate a patch that will be sent to us for review

@rvesse
Copy link
Member

rvesse commented Apr 4, 2019

@DamienFontaine Thanks, doc changes added and visible on the staging site - http://jena.staging.apache.org/documentation/query/text-query.html

We typically only publish the site to production when a release happens so this won't show up in the public site until the next release happens

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants