Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

LUCENE-9537 #2097

Merged
merged 24 commits into from
Jan 29, 2021
Merged

LUCENE-9537 #2097

merged 24 commits into from
Jan 29, 2021

Conversation

cammiemw
Copy link
Contributor

Description

This pull request implements logic from our academic search engine Indri: http://www.lemurproject.org/indri.php. The functionality that is implemented is a smoothing score for search terms or subqueries that are not present in the document being scored. The smoothing score acts like an idf so that documents that do not have terms or subqueries that are more frequent in the index are not penalized as much as documents that do not have less frequent terms or subqueries. Additionally, Indri's dirichelet smoothing similarity has been added.

Solution

The smoothingScore method has been added to the Scorable interface and implemented in the abstract class Scorer. The classes IndriAndQuery, IndriAndWeight, and IndriAndScorer have been added to call the smoothingScore method on documents where the search term or subquery are not present. The class IndriDirichletSimilarity has been added for implementing Indri's equation for the Language Model with Dirichlet smoothing.

Tests

TestIndriAndQuery and TestIndriDirichletSmoothing have been added. I am happy to expand upon these tests and implement more tests.

Checklist

Please review the following and check all that apply:

  • I have reviewed the guidelines for How to Contribute and my code conforms to the standards described there to the best of my ability.
  • I have created a Jira issue and added the issue ID to my pull request title.
  • I have given Solr maintainers access to contribute to my PR branch. (optional but recommended)
  • I have developed this patch against the master branch.
  • I have run ./gradlew check.
  • I have added tests for my changes.
  • I have added documentation for the Ref Guide (for Solr changes only).

Copy link
Member

@mikemccand mikemccand left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @cammiemw! This looks very nice (think Borat's voice).

It makes me nervous that we are having to "fork" some complex low-level Lucene search time classes (Scorable, Scorer, BooleanQuery/Scorer, DisjunctionScorer, etc.). I think it might also be confusing to users to have to pick entirely different Query class when they just want to have this nice smoothing. But, maybe we can improve on that in followon improvements.

@@ -46,6 +46,11 @@ public float score() throws IOException {
return in.score();
}

@Override
public float smoothingScore(int docId) throws IOException {
return 0;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm should this be return in.smoothingScore(docId) instead?

@@ -59,6 +59,11 @@
@Override
public final float score() { return score; }

@Override
public float smoothingScore(int docId) throws IOException {
return 0;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, should we also cache the smoothingScore for this hit?

Or, if we will keep it at returning 0, couldn't we remove this impl and inherit the default from Scorable?

@@ -0,0 +1,22 @@
package org.apache.lucene.search;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you add the standard Apache copyright header here and in all the new classes?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done :-)

*/
public class IndriAndQuery extends IndriQuery {

public IndriAndQuery(List<BooleanClause> clauses) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In general, BooleanClause can hold any Lucene Query implementation, but it looks like we are only supporting TermQuery and other IndriAndQuery clauses (just from reading the javadoc)? If so, should we check/enforce this?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hello, any comment on this one?

//If the query exists in the document, score the document
//Otherwise, compute a smoothing score, which acts like an idf
//for subqueries/terms
if (docId == scorerDocId) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you factor out the 2nd two statements under each of the true and false clauses here? I.e., only the first line needs to be conditional?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done :-)


/**
* The Indri implemenation of a disjunction scorer which stores the subscorers
* for for the child queries. The score and smoothingScore methods use the list
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove extra for?

*/
abstract public class IndriScorer extends Scorer {

private float boost;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm is this (to store boost) the only reason to have a separate IndriScorer? If I remember right, Lucene used to apply boost similarly (every scorer kept track of it) but at one point we moved all boosting to a dedicated BoostQuery.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did this because I apply the boost in the scorer rather than in the similarity (such as in LMDirichletSimilarity), and I divide by the sum of the boosts. I originally did this to exactly match the Indri scores; however, this is not a huge priority. I don't have much issue with dropping this if it doesn't fit in the Lucene workflow well.

/**
* Returns the smoothing score of the current document matching the query.
* This score is used when the query/term does not appear in the document.
* This can return 0 or a smoothing score.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe link to Indri paper that describes/motivates this?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have added more detail and a paper the describes the motivation of the smoothing score. The description of how the smoothing score is used is at the bottom of page 11 in the paper. It is important to note that most of the explanation has to do with when the score is a product. Even though the IndriAndScorer does not use a product, the smoothing score is still helpful for acting like an idf. Additionally, there are many more Indri operators that I would like to add that do use a product.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great, thanks.

* engine (http://www.lemurproject.org/indri.php). Indri Dirichelet Smoothing!
* tf_E + mu*P(t|D) P(t|E)= ------------------------ documentLength + documentMu
* mu*P(t|C) + tf_D where P(t|D)= --------------------- doclen + mu
*/
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe, add a few words giving some intuition about what mu does? It looks like it roughly lets you tune how important document length is in the scoring?

Also, the formatting of the above equations looks like it got garbled? You will need to use html/javadoc markup to make the formatting survive future developers viewing in browser...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried adding more formatting and a description of mu. Let me know if you would like to see anything different or more. Thanks!

/** A Query that matches documents matching combinations of
* {@link TermQuery}s or other IndriAndQuerys.
*/
public class IndriAndQuery extends IndriQuery {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Normally in Lucene AND implies MUST, i.e. required clauses.

But this query is actually disjunctive, right? Documents will match even if they are missing some of the terms. Should we name it IndriOrQuery maybe? Or, IndriBooleanQuery?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree the naming is confusing. I have taken the naming schema as well as the logic from the original Indri search engine implementation. The issue with renaming it is that there is already IndriOrQuery, which I have created and hope to be able to add at a future time. I will continue to think about whether there is a better name for the IndriAndQuery though.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK let's keep the (somewhat confusing) naming for now. Naming is the hardest part!!

@cammiemw
Copy link
Contributor Author

cammiemw commented Dec 1, 2020

Thanks for your comments and quick response time @mikemccand! I think I have addressed and replied to each one.

An alternative to touching so many low-level lucene classes would be to create a separate interface with only the smoothingScore method. Then the only Lucene class that I would need to change would be TermScorer to implement that interface and add the smoothing score method. This might be less risky although perhaps not as clean.

As I mentioned in some of my comments, I am hoping that can be a base for suggesting to add more Indri content. I have several more query types and an Indri parser implemented already, but I thought it might make sense to start with a small piece of functionality so I could hopefully learn the code base and process.

Copy link
Member

@mikemccand mikemccand left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks great to me!

I left small comments -- I think if you add a default implementation you can remove all the return 0 methods in Scorable implementations.

I hope another Lucene developer, with more experience on the search side of Lucene, will also chime in ...

/** A Query that matches documents matching combinations of
* {@link TermQuery}s or other IndriAndQuerys.
*/
public class IndriAndQuery extends IndriQuery {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK let's keep the (somewhat confusing) naming for now. Naming is the hardest part!!

*/
public class IndriAndQuery extends IndriQuery {

public IndriAndQuery(List<BooleanClause> clauses) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hello, any comment on this one?

* toString, equals, getClauses, and iterator.
*
*/
public abstract class IndriQuery extends Query
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Random question: will IndriQuery take advantage of Block MAX Weak And optimization? The added smoothingScore must alter the optimization logic to find the min block score to skip to?

I think it's OK to defer BMW to followon improvements, as long as it is not kicking in incorrectly here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently, IndriQuery does not take advantage of Block MAX Weak and optimization. We iterate through all documents that that have a posting for at least one of the search terms. I would be interested in expanding the smoothing score functionality to more parts of lucene in the future.

/**
* Returns the smoothing score of the current document matching the query.
* This score is used when the query/term does not appear in the document.
* This can return 0 or a smoothing score.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great, thanks.

@@ -32,4 +33,9 @@ public int docID() {
public float score() {
return score;
}

@Override
public float smoothingScore(int docId) throws IOException {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, why not add a default method in the Scorable interface to return 0? Then we don't have to add this default method in all these subclasses?

Copy link
Contributor Author

@cammiemw cammiemw Jan 12, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that this makes more sense :-) I have added the default implementation of smoothingScore in Scorable and reverted my changes to add the smoothingScore method to all the unnecessary extending classes.

subs);
} else {
Scorer scorer = scorer(context);
int advanced = scorer.iterator().advance(doc);
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NULLPTR_DEREFERENCE: accessing memory that is the null pointer on line 117 indirectly during the call to IndriAndWeight.scorer(...).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, can scorer ever be null? Other Lucene queries can/do return null scorers to indicate that there are no matches for this query in this segment, and callers need to check for that. But maybe in this context it never happens?

@mikemccand
Copy link
Member

Thank you @cammiemw!

Looks like there are some small style problems with your PR -- see the above Gradle precommit for details. We now enforce consistent styling, and I think you can run gradle tidy to re-style automatically.

@cammiemw
Copy link
Contributor Author

Hi @mikemccand! I think the issues are fixed in my latest commit. Let me know if there is anything else that you think I need. Thanks!

Copy link
Member

@mikemccand mikemccand left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @cammiemw, this looks great now!

Could you add a CHANGES.txt entry too, briefly summarizing the new smoothing / Indri search-time capability?

@cammiemw
Copy link
Contributor Author

Thanks @mikemccand! I have added an entry in CHANGES.txt under API changes.

I am hoping to be able to add more of the Indri search engine functionality that build upon these smoothingScore changes in the near future! Once this PR is merged, I would love to start working on adding a few more Indri operators such as the Indri OR, Indri Weighted Sum, and Indri Weighted And, and add either an Indri Query Parser or add to an existing lucene query parser so that it is easier to use these operators. Do you think it makes sense to open a new jira ticket for these changes once this PR is merged? About how long does the process or merging a PR take?

Thank you!

@mikemccand
Copy link
Member

I have added an entry in CHANGES.txt under API changes.

Thank you!

I ran top-level ./gradlew test with this change but hit this Solr test failure, caused by this change, and hopefully easy to fix?:

org.apache.solr.ltr.feature.TestFeature > test suite's output saved to /l/trunk/solr/contrib/ltr/build/test-results/test/outputs/OUTPUT-org.apache.solr.ltr.feature.TestFeature.txt, copied below:
   >     java.lang.AssertionError: class org.apache.solr.ltr.feature.Feature$FeatureWeight$FilterFeatureScorer needs to override 'public float org.apache.lucene.search.Scorer.smoothingScore(int) throws java.io.IOException'
   >         at __randomizedtesting.SeedInfo.seed([933C5904DAF6843E:848500E5F410D49E]:0)
   >         at org.junit.Assert.fail(Assert.java:89)
   >         at org.apache.solr.ltr.feature.TestFeature.testFilterFeatureScorerOverridesScorerMethods(TestFeature.java:46)
   >         at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
   >         at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:64)
   >         at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
   >         at java.base/java.lang.reflect.Method.invoke(Method.java:564)
   >         at com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1754)
   >         at com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:942)
   >         at com.carrotsearch.randomizedtesting.RandomizedRunner$9.evaluate(RandomizedRunner.java:978)
   >         at com.carrotsearch.randomizedtesting.RandomizedRunner$10.evaluate(RandomizedRunner.java:992)
   >         at org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:44)
   >         at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
   >         at org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:45)
   >         at org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:60)
   >         at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:44)
   >         at org.junit.rules.RunRules.evaluate(RunRules.java:20)
   >         at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
   >         at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:370)
   >         at com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:819)
   >         at com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:470)
   >         at com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:951)
   >         at com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:836)
   >         at com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:887)
   >         at com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:898)
   >         at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
   >         at com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:57)
   >         at org.junit.rules.RunRules.evaluate(RunRules.java:20)
   >         at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
   >         at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
   >         at org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:38)
   >         at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
   >         at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
   >         at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
   >         at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
   >         at org.apache.lucene.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:51)
   >         at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:44)
   >         at org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:60)
   >         at org.apache.lucene.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:47)
   >         at org.junit.rules.RunRules.evaluate(RunRules.java:20)
   >         at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
   >         at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:370)
   >         at com.carrotsearch.randomizedtesting.ThreadLeakControl.lambda$forkTimeoutingTask$0(ThreadLeakControl.java:826)
   >         at java.base/java.lang.Thread.run(Thread.java:832)
  2> NOTE: reproduce with: gradlew test --tests TestFeature.testFilterFeatureScorerOverridesScorerMethods -Dtests.seed=933C5904DAF6843E -Dtests.slow=true -Dtests.badapples=true -Dtests.locale=bo-IN -Dtests.timezone=Etc/GMT -Dte\
sts.asserts=true -Dtests.file.encoding=UTF-8
  2> NOTE: test params are: codec=Asserting(Lucene90): {}, docValues:{}, maxPointsInLeafNode=1006, maxMBSortInHeap=6.621471242227271, sim=Asserting(RandomSimilarity(queryNorm=true): {}), locale=bo-IN, timezone=Etc/GMT
  2> NOTE: Linux 5.9.8-arch1-1 amd64/Oracle Corporation 15.0.1 (64-bit)/cpus=128,threads=1,free=249561968,total=270532608
  2> NOTE: All tests run in this JVM: [TestFeature]

@mikemccand
Copy link
Member

Hello @cammiemw, I ran top-level ./gradlew test on the latest diff here, yet still see the Solr TestFeature failure:

org.apache.solr.ltr.feature.TestFeature > test suite's output saved to /l/trunk/solr/contrib/ltr/build/test-results/test/outputs/OUTPUT-org.apache.solr.ltr.feature.TestFeature.txt, copied belo\
w:
   >     java.lang.AssertionError: class org.apache.solr.ltr.feature.Feature$FeatureWeight$FilterFeatureScorer needs to override 'public float org.apache.lucene.search.Scorer.smoothingScore(in\
t) throws java.io.IOException'
   >         at __randomizedtesting.SeedInfo.seed([E634884D6EF7B762:F18DD1AC4011E7C2]:0)
   >         at org.junit.Assert.fail(Assert.java:89)
   >         at org.apache.solr.ltr.feature.TestFeature.testFilterFeatureScorerOverridesScorerMethods(TestFeature.java:46)
   >         at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
   >         at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:64)
   >         at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
   >         at java.base/java.lang.reflect.Method.invoke(Method.java:564)
   >         at com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1754)
   >         at com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:942)
   >         at com.carrotsearch.randomizedtesting.RandomizedRunner$9.evaluate(RandomizedRunner.java:978)
   >         at com.carrotsearch.randomizedtesting.RandomizedRunner$10.evaluate(RandomizedRunner.java:992)
   >         at org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:44)
   >         at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
   >         at org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:45)
   >         at org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:60)
   >         at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:44)
   >         at org.junit.rules.RunRules.evaluate(RunRules.java:20)
   >         at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
   >         at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:370)
   >         at com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:819)
   >         at com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:470)
   >         at com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:951)
   >         at com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:836)
   >         at com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:887)
   >         at com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:898)
   >         at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
   >         at com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:57)
   >         at org.junit.rules.RunRules.evaluate(RunRules.java:20)
   >         at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
   >         at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
   >         at org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:38)
   >         at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
   >         at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
   >         at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
   >         at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
   >         at org.apache.lucene.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:51)
   >         at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:44)
   >         at org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:60)
   >         at org.apache.lucene.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:47)
   >         at org.junit.rules.RunRules.evaluate(RunRules.java:20)
   >         at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
   >         at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:370)
   >         at com.carrotsearch.randomizedtesting.ThreadLeakControl.lambda$forkTimeoutingTask$0(ThreadLeakControl.java:826)
   >         at java.base/java.lang.Thread.run(Thread.java:832)
  2> NOTE: reproduce with: gradlew test --tests TestFeature.testFilterFeatureScorerOverridesScorerMethods -Dtests.seed=E634884D6EF7B762 -Dtests.slow=true -Dtests.badapples=true -Dtests.locale=\
om-KE -Dtests.timezone=Pacific/Auckland -Dtests.asserts=true -Dtests.file.encoding=UTF-8
  2> NOTE: test params are: codec=Asserting(Lucene90): {}, docValues:{}, maxPointsInLeafNode=111, maxMBSortInHeap=6.800048371274569, sim=Asserting(RandomSimilarity(queryNorm=true): {}), locale\
=om-KE, timezone=Pacific/Auckland
  2> NOTE: Linux 5.9.8-arch1-1 amd64/Oracle Corporation 15.0.1 (64-bit)/cpus=128,threads=1,free=250395088,total=270532608
  2> NOTE: All tests run in this JVM: [TestFeature]

@cammiemw
Copy link
Contributor Author

Hi @mikemccand, apologies for the failed test! Unfortunately, I cannot replicate the error in my environments and gradle test as well as that specific test come back successful. I am going to keep digging into what is different in my local. For right now, I looked into the code and could see what was causing the error, so I pushed the change that I think fixes it. If that fails for you, I will need to get my local environment working so that I get the same results that you do. Thanks so much for your time!

@mikemccand
Copy link
Member

Thanks @cammiemw, that test now passes again!

I ran top-level tests multiple times and each time, a solr test failed for what looked like (not certain) unrelated to this great change. On the 4th try, all solr tests finally passed.

And ./gradlew precommit passes as well.

I will push soon! Thank you for persisting @cammiemw and sorry for the slow iterations here. I am excited that we are cross-fertilizing these two impactful open-source search engines!

@mikemccand mikemccand merged commit 9cc5c9b into apache:master Jan 29, 2021
@mikemccand
Copy link
Member

Thanks @cammiemw!

One thing I noticed after I pushed was the new Scorable.smothingScore(int docId) method: why does this take the int docId parameter? Shouldn't it take no parameters, like score(), and always return smoothing score for the "current" docId instead?

@cammiemw
Copy link
Contributor Author

Hi @mikemccand! The smoothingScore method requires a docId because the most important implementation of smoothingScore is in TermScorer, which uses the postingsEnum to get the docId. However, in the case of the smoothing score, there will be no posting for the term in the document. Let me know if you think a different way of getting the docId would work better.

Thanks so much for all your help and time throughout this process! I am very excited about this change and hope that I can continue to add more.

Comment on lines +36 to +38
// the FilterFeatureScorer may simply inherit Scorer's default implementation
if (scorerClassMethod.getName().equals("smoothingScore")) continue;

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This surprised me. Created https://issues.apache.org/jira/browse/SOLR-15958 and (draft) apache/solr#567 to consider further.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants