Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SOLR-16567: KnnQueryParser support for both pre-filters and post-filter #1245

Merged
merged 11 commits into from Jan 4, 2023
48 changes: 33 additions & 15 deletions solr/core/src/java/org/apache/solr/search/neural/KnnQParser.java
Expand Up @@ -17,6 +17,7 @@
package org.apache.solr.search.neural;

import java.util.List;
import java.util.stream.Collectors;
import org.apache.commons.lang3.StringUtils;
import org.apache.lucene.search.BooleanClause;
import org.apache.lucene.search.BooleanQuery;
Expand All @@ -28,6 +29,7 @@
import org.apache.solr.schema.DenseVectorField;
import org.apache.solr.schema.FieldType;
import org.apache.solr.schema.SchemaField;
import org.apache.solr.search.ExtendedQuery;
import org.apache.solr.search.QParser;
import org.apache.solr.search.QueryParsing;
import org.apache.solr.search.QueryUtils;
Expand Down Expand Up @@ -80,44 +82,60 @@ public Query parse() {
float[] parsedVectorToSearch = parseVector(vectorToSearch, denseVectorType.getDimension());

return denseVectorType.getKnnVectorQuery(
schemaField.getName(), parsedVectorToSearch, topK, getFilterQuery());
schemaField.getName(), parsedVectorToSearch, topK, buildPreFilter());
}

private Query getFilterQuery() throws SolrException {
if (!isFilter()) {
private Query buildPreFilter() throws SolrException {
dsmiley marked this conversation as resolved.
Show resolved Hide resolved
boolean isSubQuery = recurseCount != 0;
if (!isFilter() && !isSubQuery) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is isFilter being checked? Why is isSubQuery being checked? In my experience, use of these is extremely rare. isFilter... that would only be pertinent if the KNN query was specified in a filter query (fq), which is kind of surprising because AFAIK, KNN is for relevancy (scoring). Does this query need to operate differently or to make more optimal decisions if it's in a filter query (to only filter and not rank)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the isFilter:
I may want to run a Knn search as a filter, to then score the documents by any lexical query, it's probably unlikely, but possible.
The check avoid you end up with an infinite recursion with org.apache.solr.search.QueryUtils#parseFilterQueries.
But the check also avoid a KnnQuery in the filters to capture the other filter queries and use them as pre-filters, so it's not ideal anyway.
Tagging @eliaporciani if he has any better idea.
I'll think about it

For the "isSubQuery", I want to avoid ending up with the infinite recursion frange and similar filters may end up (the original issue in the ticket).

String[] filterQueries = req.getParams().getParams(CommonParams.FQ);
if (filterQueries != null && filterQueries.length != 0) {
List<Query> filters;
List<Query> preFilters;

try {
filters = QueryUtils.parseFilterQueries(req, true);
preFilters = acceptPreFiltersOnly(QueryUtils.parseFilterQueries(req, true));
} catch (SyntaxError e) {
throw new SolrException(SolrException.ErrorCode.SERVER_ERROR, e);
}

if (filters.size() == 1) {
return filters.get(0);
if (preFilters.size() == 0) {
return null;
} else if (preFilters.size() == 1) {
return preFilters.get(0);
} else {
BooleanQuery.Builder builder = new BooleanQuery.Builder();
for (Query query : preFilters) {
builder.add(query, BooleanClause.Occur.FILTER);
}
return builder.build();
}

BooleanQuery.Builder builder = new BooleanQuery.Builder();
for (Query query : filters) {
builder.add(query, BooleanClause.Occur.FILTER);
}

return builder.build();
}
}

return null;
}

private List<Query> acceptPreFiltersOnly(List<Query> filters) {
dsmiley marked this conversation as resolved.
Show resolved Hide resolved
return filters.stream()
.filter(
q -> {
if (q instanceof ExtendedQuery) {
ExtendedQuery eq = (ExtendedQuery) q;
return eq.getCost() == 0;
} else {
return true;
}
})
.collect(Collectors.toList());
}

/**
* Parses a String vector.
*
* @param value with format: [f1, f2, f3, f4...fn]
* @return a float array
*/
private static float[] parseVector(String value, int dimension) {

if (!value.startsWith("[") || !value.endsWith("]")) {
throw new SolrException(
SolrException.ErrorCode.BAD_REQUEST,
Expand Down
Expand Up @@ -335,6 +335,45 @@ public void knnQueryWithFilterQuery_shouldPerformKnnSearchInPreFilteredResults()
"//result/doc[4]/str[@name='id'][.='9']");
}

@Test
public void knnQueryWithCostlyFq_shouldPerformKnnSearchWithPostFilter() {
String vectorToSearch = "[1.0, 2.0, 3.0, 4.0]";

assertQ(
req(
CommonParams.Q,
"{!knn f=vector topK=10}" + vectorToSearch,
"fq",
"{!frange l=0.99}$q",
"fl",
"*,score"),
"//result[@numFound='5']",
"//result/doc[1]/str[@name='id'][.='1']",
"//result/doc[2]/str[@name='id'][.='4']",
"//result/doc[3]/str[@name='id'][.='2']",
"//result/doc[4]/str[@name='id'][.='10']",
"//result/doc[5]/str[@name='id'][.='3']");
}

@Test
public void knnQueryWithFilterQueries_shouldPerformKnnSearchWithPreFiltersAndPostFilters() {
String vectorToSearch = "[1.0, 2.0, 3.0, 4.0]";

assertQ(
req(
CommonParams.Q,
"{!knn f=vector topK=4}" + vectorToSearch,
"fq",
"id:(3 4 9 2)",
"fq",
"{!frange l=0.99}$q",
"fl",
"id"),
"//result[@numFound='2']",
"//result/doc[1]/str[@name='id'][.='4']",
"//result/doc[2]/str[@name='id'][.='2']");
}

@Test
public void knnQueryWithNegativeFilterQuery_shouldPerformKnnSearchInPreFilteredResults() {
String vectorToSearch = "[1.0, 2.0, 3.0, 4.0]";
Expand Down