alessandrobenedetti
Contributor

https://issues.apache.org/jira/browse/SOLR-15449

Description

When a field's text analysis is incompatible with the query text, mm is not fully respected:

sow=false
mm=100%
qf=text numeric_i
q=terminator 100
defType=edismax
"parsedquery_toString":
"+(((text:terminator text:100)~2) | (numeric_i:100)~1))"
A document containing only '100' in the numeric_i field is returned as a valid search result, even though it does not respect mm=100% (nothing in it matches 'terminator').
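The root cause is that a percentage mm is resolved against the number of optional clauses in each per-field query, so dropping a clause also shrinks the count that 100% refers to. The sketch below is a simplified model of the percentage rule as described in the Solr Reference Guide (percentages round down; a negative percentage states how many clauses may be missing). The class `MinShouldMatchModel` is hypothetical, it is not Solr's own implementation, and it ignores conditional specs such as `3<90%`.

```java
// Hypothetical, simplified model of a percentage-only mm spec ("100%", "75%", "-25%").
// Not Solr's implementation; conditional specs like "3<90%" are not handled.
final class MinShouldMatchModel {

    /** Number of optional clauses that must match for the given percentage spec. */
    static int required(int clauseCount, String percentSpec) {
        int pct = Integer.parseInt(percentSpec.trim().replace("%", ""));
        // Integer division truncates toward zero, so positive percentages round down.
        int calc = clauseCount * pct / 100;
        // A negative percentage describes how many clauses may be missing.
        int result = calc < 0 ? clauseCount + calc : calc;
        // Clamp to the range [0, clauseCount].
        return Math.max(0, Math.min(clauseCount, result));
    }

    public static void main(String[] args) {
        // text field: both "terminator" and "100" produce clauses.
        System.out.println(required(2, "100%")); // 2
        // numeric_i: "terminator" cannot be parsed as an int and is dropped, leaving 1 clause.
        System.out.println(required(1, "100%")); // 1
    }
}
```

With `numeric_i` reduced to a single clause, 100% of 1 is 1, so `numeric_i:100` alone satisfies mm, which is what the `~1` in the parsed query expresses.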

Solution

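Instead of silently dropping clauses that cannot be parsed for a field, a noMatch query is added in their place, so the per-field clause count that mm is applied to is preserved.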
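A toy model can make the fix concrete. In this hypothetical sketch (names are illustrative, not Solr's query classes), each field gets a boolean query over term strings, a `null` clause stands in for the noMatch query, mm=100% is applied per field, and the top-level dismax matches if any field's query matches. Analysis is ignored: the numeric field simply lacks a real clause for 'terminator'.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical toy model of the per-field dismax built when sow=false; not Solr's API.
final class PerFieldDismaxModel {

    /** One field's boolean query; a null clause models the noMatch placeholder. */
    static final class FieldQuery {
        final String field;
        final List<String> clauses;

        FieldQuery(String field, List<String> clauses) {
            this.field = field;
            this.clauses = clauses;
        }
    }

    /** mm=100% per field: every clause must match; a noMatch (null) clause never does. */
    static boolean matches(FieldQuery q, Map<String, Set<String>> doc) {
        Set<String> values = doc.getOrDefault(q.field, Set.of());
        long hits = q.clauses.stream()
                .filter(term -> term != null && values.contains(term))
                .count();
        return hits == q.clauses.size();
    }

    /** The top-level dismax matches if at least one field's query matches. */
    static boolean dismaxMatches(List<FieldQuery> fields, Map<String, Set<String>> doc) {
        return fields.stream().anyMatch(q -> matches(q, doc));
    }

    public static void main(String[] args) {
        Map<String, Set<String>> onlyNumeric = Map.of("numeric_i", Set.of("100"));

        // Before: the unparsable "terminator" clause is simply dropped for numeric_i.
        List<FieldQuery> before = List.of(
                new FieldQuery("text", Arrays.asList("terminator", "100")),
                new FieldQuery("numeric_i", Arrays.asList("100")));

        // After: a noMatch placeholder (null here) keeps numeric_i at 2 clauses.
        List<FieldQuery> after = List.of(
                new FieldQuery("text", Arrays.asList("terminator", "100")),
                new FieldQuery("numeric_i", Arrays.asList(null, "100")));

        System.out.println("before: " + dismaxMatches(before, onlyNumeric)); // before: true
        System.out.println("after: " + dismaxMatches(after, onlyNumeric));   // after: false
    }
}
```

Keeping a placeholder instead of dropping the clause is what keeps mm=100% meaning "both terms" for the numeric field too.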

Tests

Tests have been added to org.apache.solr.search.TestExtendedDismaxParser.

Checklist

Please review the following and check all that apply:

  • I have reviewed the guidelines for How to Contribute and my code conforms to the standards described there to the best of my ability.
  • I have created a Jira issue and added the issue ID to my pull request title.
  • I have given Solr maintainers access to contribute to my PR branch. (optional but recommended)
  • I have developed this patch against the main branch.
  • I have run ./gradlew check.
  • I have added tests for my changes.
  • I have added documentation for the Reference Guide.

@alessandrobenedetti
Contributor Author

@dsmiley @janhoy @madrob @munendrasn @romseygeek @sarowe, the issue with sow and mm found in #129 has been isolated here and is ready for an initial review.
Once it is approved, I'll add the CHANGES.txt entry and take care of the other bureaucracy.

Contributor

@dsmiley dsmiley left a comment

Other than formatting matters, I think this is good!

@@ -1776,6 +1764,36 @@ private static String getParsedQuery(SolrQueryRequest request) throws Exception
return (String) BaseTestHarness.evaluateXPath(resp, "//str[@name='parsedquery']/text()", XPathConstants.STRING);
}

public void testSplitOnWhitespace_shouldRespectMinimumShouldMatch(){
Contributor

The indentation here is inconsistent. Please configure your IDE so that this simply doesn't happen.

Contributor Author

I think I've fixed it. Formatting is one of the things I hate the most; in my opinion it shouldn't be the responsibility of individual contributors but should be enforced server-side to avoid inconsistencies (which I often find in Lucene/Solr and other open-source code as well).
I've never really explored a solution for that either, so I hope my changes are OK now :)

Contributor

https://issues.apache.org/jira/browse/SOLR-14920 -- I anticipate this will happen for 9.0 soon.

Comment on lines -420 to -423
// When sow=false, the per-field query structures differ (no "Terminator" query on integer field foo_i),
// so a dismax-per-field is constructed. As a result, mm=100% is applied per-field instead of per-term;
// since there is only one term (100) required in the foo_i field's dismax, the query can match docs that
// only have the 100 term in the foo_i field, and don't necessarily have "Terminator" in any field.
Contributor

@sarowe, the above comment (from dd171ff) describes behavior that seems counterintuitive, but it doesn't indicate whether that behavior might be considered desirable for some reason. This PR would change the behavior described; that looks like an improvement, but I wanted to call your attention to this comment specifically, to make sure you're aware of and supportive of the change.
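For contrast with the dismax-per-field structure described in the quoted source comment, here is a hypothetical sketch of the per-term shape, where each query term gets its own dismax across the `qf` fields and mm=100% counts terms rather than per-field clauses. It is a toy model (class and method names are illustrative, not Solr's API) that ignores text analysis entirely.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical model of the per-term query shape: one dismax per query term,
// with mm applied across the term clauses. Text analysis is ignored.
final class PerTermModel {

    /** A term's dismax matches if any query field of the document contains the term. */
    static boolean termMatches(String term, List<String> fields, Map<String, Set<String>> doc) {
        return fields.stream()
                .anyMatch(field -> doc.getOrDefault(field, Set.of()).contains(term));
    }

    /** mm=100% over terms: every term must match in at least one field. */
    static boolean matchesAllTerms(List<String> terms, List<String> fields,
                                   Map<String, Set<String>> doc) {
        return terms.stream().allMatch(term -> termMatches(term, fields, doc));
    }

    public static void main(String[] args) {
        List<String> qf = List.of("text", "numeric_i");
        List<String> q = List.of("terminator", "100");

        // Same document as in the issue: only numeric_i contains 100.
        Map<String, Set<String>> onlyNumeric = Map.of("numeric_i", Set.of("100"));
        System.out.println(matchesAllTerms(q, qf, onlyNumeric)); // false

        // A document with both terms in the text field still matches.
        Map<String, Set<String>> bothInText = Map.of("text", Set.of("terminator", "100"));
        System.out.println(matchesAllTerms(q, qf, bothInText)); // true
    }
}
```

Under the per-term shape, the document containing only `numeric_i:100` is rejected because 'terminator' matches no field; with the PR's noMatch clause, the per-field shape rejects that document as well.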

@alessandrobenedetti
Contributor Author

As soon as I see an approval, and potentially final feedback from @sarowe, I am happy to merge.

For cherry-picking, I was looking for the minor version branches but couldn't find them. Has anything changed after the split?
For example, I was looking for an 8.9.x or 8.10.x branch (will the latter exist?), but couldn't find either.

@janhoy
Contributor

janhoy commented Jun 3, 2021

The new apache/solr repo only contains the main branch; it will eventually contain branch_9x etc.
To backport to 8.x, you will need to port the change over to the https://github.com/apache/lucene-solr repo (see the README there), which is where branch_8x lives.
It's not that hard. What I have done is to add apache/solr as a remote to my apache/lucene-solr checkout, so I can do a git fetch and then cherry-pick the main-branch commit into my old repo.

@alessandrobenedetti
Contributor Author

@dsmiley I see that changes are still marked as requested; can you check?
I would like to proceed with the merge; I doubt we'll get any other feedback here.

Contributor

@dsmiley dsmiley left a comment

Thanks Alessandro!

@alessandrobenedetti alessandrobenedetti merged commit 9791057 into main Jun 9, 2021
@alessandrobenedetti alessandrobenedetti deleted the jira/solr-15449 branch June 9, 2021 11:02
bszabo97 pushed a commit to bszabo97/solr that referenced this pull request Jun 14, 2021
If we fail to delete files that belong to a commit point, then we will 
expose that deleted commit in the next calls of IndexDeletionPolicy#onCommit.
I think we should never expose those deleted commit points as 
some of their files might have been deleted already.
epugh pushed a commit to epugh/solr that referenced this pull request Oct 22, 2021

4 participants