LUCENE-9537 #2097

cammiemw · 2020-11-25T15:03:18Z

Description

This pull request implements logic from our academic search engine Indri: http://www.lemurproject.org/indri.php. It adds a smoothing score for search terms or subqueries that are not present in the document being scored. The smoothing score acts like an idf: a document that is missing a term or subquery that is frequent in the index is penalized less than a document that is missing a less frequent one. Additionally, Indri's Dirichlet smoothing similarity has been added.
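
To make the idf-like behavior concrete, here is a minimal sketch. It assumes the standard Dirichlet-smoothed language model, in which a term that is absent from the document (tf = 0) is scored as log(mu * P(t|C) / (docLen + mu)). The class name, method names, and the values used for mu, docLen, and the collection probabilities are illustrative assumptions, not code from this pull request.

```java
/** Illustrative only: Dirichlet-smoothed log-probability of a term (standard LM formulation assumed). */
final class SmoothingScoreSketch {

  /** log((tf + mu * P(t|C)) / (docLen + mu)); with tf == 0 this is the smoothing score. */
  static double logProb(long tf, long docLen, double collectionProb, double mu) {
    return Math.log((tf + mu * collectionProb) / (docLen + mu));
  }

  /** Score for a term that does not occur in the document (tf = 0). */
  static double smoothingScore(long docLen, double collectionProb, double mu) {
    return logProb(0, docLen, collectionProb, mu);
  }

  public static void main(String[] args) {
    double mu = 2000;  // hypothetical smoothing parameter
    long docLen = 500; // hypothetical document length
    double common = smoothingScore(docLen, 0.01, mu);  // term that is frequent in the index
    double rare = smoothingScore(docLen, 0.00001, mu); // term that is rare in the index
    System.out.println("missing common term: " + common);
    System.out.println("missing rare term:   " + rare);
  }
}
```

Because the logarithm is monotonic, the document missing the common term always gets the higher (less negative) value, which is the idf-like effect described above.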

Solution

The smoothingScore method has been added to the Scorable interface and implemented in the abstract class Scorer. The classes IndriAndQuery, IndriAndWeight, and IndriAndScorer have been added to call the smoothingScore method on documents where the search term or subquery is not present. The class IndriDirichletSimilarity has been added to implement Indri's equation for the Language Model with Dirichlet smoothing.

Tests

TestIndriAndQuery and TestIndriDirichletSmoothing have been added. I am happy to expand upon these tests and implement more tests.

Checklist

Please review the following and check all that apply:

I have reviewed the guidelines for How to Contribute and my code conforms to the standards described there to the best of my ability.
I have created a Jira issue and added the issue ID to my pull request title.
I have given Solr maintainers access to contribute to my PR branch. (optional but recommended)
I have developed this patch against the master branch.
I have run ./gradlew check.
I have added tests for my changes.
I have added documentation for the Ref Guide (for Solr changes only).

mikemccand

Thank you @cammiemw! This looks very nice (think Borat's voice).

It makes me nervous that we are having to "fork" some complex low-level Lucene search-time classes (Scorable, Scorer, BooleanQuery/Scorer, DisjunctionScorer, etc.). I think it might also be confusing to users to have to pick an entirely different Query class when they just want to have this nice smoothing. But maybe we can improve on that in follow-on improvements.

mikemccand · 2020-11-29T14:54:37Z

lucene/core/src/java/org/apache/lucene/search/FilterScorable.java

@@ -46,6 +46,11 @@ public float score() throws IOException {
    return in.score();
  }

+  @Override
+  public float smoothingScore(int docId) throws IOException {
+    return 0;


Hmm should this be return in.smoothingScore(docId) instead?
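
A minimal sketch of the delegating variant suggested here. `SimpleScorable` and `DelegatingScorable` are hypothetical stand-ins so that the example compiles without Lucene on the classpath; they are not the actual `Scorable` and `FilterScorable` classes.

```java
import java.io.IOException;

/** Hypothetical stand-in for the scoring interface discussed above (not Lucene's Scorable). */
interface SimpleScorable {
  float score() throws IOException;

  float smoothingScore(int docId) throws IOException;
}

/** A filter wrapper that forwards both calls to the wrapped instance. */
class DelegatingScorable implements SimpleScorable {
  protected final SimpleScorable in;

  DelegatingScorable(SimpleScorable in) {
    this.in = in;
  }

  @Override
  public float score() throws IOException {
    return in.score();
  }

  @Override
  public float smoothingScore(int docId) throws IOException {
    // Delegate instead of returning 0, so wrapping does not silently drop the smoothing score.
    return in.smoothingScore(docId);
  }
}
```

With a hard-coded `return 0`, any scorer wrapped in such a filter would lose its smoothing score; delegation keeps the wrapper transparent.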

mikemccand · 2020-11-29T14:55:40Z

lucene/core/src/java/org/apache/lucene/search/CachingCollector.java

@@ -59,6 +59,11 @@
    @Override
    public final float score() { return score; }

+    @Override
+    public float smoothingScore(int docId) throws IOException {
+      return 0;


Hmm, should we also cache the smoothingScore for this hit?

Or, if we will keep it at returning 0, couldn't we remove this impl and inherit the default from Scorable?

mikemccand · 2020-11-29T14:56:05Z

lucene/core/src/java/org/apache/lucene/search/IndriAndQuery.java

@@ -0,0 +1,22 @@
+package org.apache.lucene.search;


Could you add the standard Apache copyright header here and in all the new classes?

mikemccand · 2020-11-29T14:57:30Z

lucene/core/src/java/org/apache/lucene/search/IndriAndQuery.java

+ */
+public class IndriAndQuery extends IndriQuery {
+
+  public IndriAndQuery(List<BooleanClause> clauses) {


In general, BooleanClause can hold any Lucene Query implementation, but it looks like we are only supporting TermQuery and other IndriAndQuery clauses (just from reading the javadoc)? If so, should we check/enforce this?

Hello, any comment on this one?

mikemccand · 2020-11-29T15:00:22Z

lucene/core/src/java/org/apache/lucene/search/IndriAndScorer.java

+        //If the query exists in the document, score the document
+        //Otherwise, compute a smoothing score, which acts like an idf
+        //for subqueries/terms
+        if (docId == scorerDocId) {


Can you factor out the 2nd two statements under each of the true and false clauses here? I.e., only the first line needs to be conditional?
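
The diff excerpt does not show the branch bodies, so the following is only a sketch of the suggested refactoring. It assumes that each branch computes a per-clause score, multiplies it by the clause boost, and adds it to a running total that is later divided by the sum of the boosts. `SubScorerSketch` and `IndriAndSketch` are hypothetical names.

```java
import java.util.List;

/** Hypothetical sub-scorer; docId() is the document it is currently positioned on. */
interface SubScorerSketch {
  int docId();

  double score();

  double smoothingScore(int docId);

  double boost();
}

final class IndriAndSketch {
  /** Boost-weighted average over all clauses; only the first statement in the loop is conditional. */
  static double score(int docId, List<SubScorerSketch> subScorers) {
    double total = 0;
    double boostSum = 0;
    for (SubScorerSketch sub : subScorers) {
      // Matching clause: real score. Non-matching clause: smoothing score.
      double clauseScore = (sub.docId() == docId) ? sub.score() : sub.smoothingScore(docId);
      // Shared by both cases, so written once.
      total += sub.boost() * clauseScore;
      boostSum += sub.boost();
    }
    return boostSum == 0 ? 0 : total / boostSum;
  }
}
```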

mikemccand · 2020-11-29T15:03:24Z

lucene/core/src/java/org/apache/lucene/search/IndriDisjunctionScorer.java

+
+/**
+ * The Indri implemenation of a disjunction scorer which stores the subscorers
+ * for for the child queries. The score and smoothingScore methods use the list


Remove extra for?

mikemccand · 2020-11-29T15:08:59Z

lucene/core/src/java/org/apache/lucene/search/IndriScorer.java

+ */
+abstract public class IndriScorer extends Scorer {
+
+	private float boost;


Hmm is this (to store boost) the only reason to have a separate IndriScorer? If I remember right, Lucene used to apply boost similarly (every scorer kept track of it) but at one point we moved all boosting to a dedicated BoostQuery.

I did this because I apply the boost in the scorer rather than in the similarity (such as in LMDirichletSimilarity), and I divide by the sum of the boosts. I originally did this to exactly match the Indri scores; however, this is not a huge priority. I don't have much issue with dropping this if it doesn't fit in the Lucene workflow well.

mikemccand · 2020-11-29T15:09:18Z

lucene/core/src/java/org/apache/lucene/search/Scorable.java

+  /**
+   * Returns the smoothing score of the current document matching the query.
+   * This score is used when the query/term does not appear in the document.
+   * This can return 0 or a smoothing score.


Maybe link to Indri paper that describes/motivates this?

I have added more detail and a paper that describes the motivation of the smoothing score. The description of how the smoothing score is used is at the bottom of page 11 in the paper. It is important to note that most of the explanation has to do with when the score is a product. Even though the IndriAndScorer does not use a product, the smoothing score is still helpful for acting like an idf. Additionally, there are many more Indri operators that I would like to add that do use a product.

Great, thanks.

mikemccand · 2020-11-29T15:14:30Z

lucene/core/src/java/org/apache/lucene/search/similarities/IndriDirichletSimilarity.java

+ * engine (http://www.lemurproject.org/indri.php). Indri Dirichelet Smoothing!
+ * tf_E + mu*P(t|D) P(t|E)= ------------------------ documentLength + documentMu
+ * mu*P(t|C) + tf_D where P(t|D)= --------------------- doclen + mu
+ */


Maybe, add a few words giving some intuition about what mu does? It looks like it roughly lets you tune how important document length is in the scoring?

Also, the formatting of the above equations looks like it got garbled? You will need to use html/javadoc markup to make the formatting survive future developers viewing in browser...

I tried adding more formatting and a description of mu. Let me know if you would like to see anything different or more. Thanks!
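
For readers of this thread, one plausible reading of the garbled javadoc in the diff excerpt above, treating each dashed line as a fraction bar. This is a reconstruction from the excerpt only, using its own symbol names, and not the final javadoc:

```latex
P(t \mid E) = \frac{tf_E + \mu \, P(t \mid D)}{\mathit{documentLength} + \mathit{documentMu}}
\qquad \text{where} \qquad
P(t \mid D) = \frac{\mu \, P(t \mid C) + tf_D}{\mathit{doclen} + \mu}
```

In the inner estimate, mu behaves like a pseudo-count of mu extra tokens drawn from the collection model P(t|C): when doclen is much smaller than mu the estimate stays close to P(t|C), and when doclen is much larger than mu it approaches tf_D / doclen.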

mikemccand · 2020-11-29T15:17:13Z

lucene/core/src/java/org/apache/lucene/search/IndriAndQuery.java

+/** A Query that matches documents matching combinations of 
+ * {@link TermQuery}s or other IndriAndQuerys.
+ */
+public class IndriAndQuery extends IndriQuery {


Normally in Lucene AND implies MUST, i.e. required clauses.

But this query is actually disjunctive, right? Documents will match even if they are missing some of the terms. Should we name it IndriOrQuery maybe? Or, IndriBooleanQuery?

I agree the naming is confusing. I have taken the naming schema as well as the logic from the original Indri search engine implementation. The issue with renaming it is that there is already IndriOrQuery, which I have created and hope to be able to add at a future time. I will continue to think about whether there is a better name for the IndriAndQuery though.

OK let's keep the (somewhat confusing) naming for now. Naming is the hardest part!!

cammiemw · 2020-12-01T11:16:20Z

Thanks for your comments and quick response time @mikemccand! I think I have addressed and replied to each one.

An alternative to touching so many low-level Lucene classes would be to create a separate interface with only the smoothingScore method. Then the only Lucene class that I would need to change would be TermScorer, which would implement that interface and add the smoothingScore method. This might be less risky, although perhaps not as clean.

As I mentioned in some of my comments, I am hoping that this can be a base for suggesting more Indri content. I have several more query types and an Indri parser implemented already, but I thought it might make sense to start with a small piece of functionality so I could learn the code base and process.

mikemccand

This looks great to me!

I left small comments -- I think if you add a default implementation you can remove all the return 0 methods in Scorable implementations.

I hope another Lucene developer, with more experience on the search side of Lucene, will also chime in ...

mikemccand · 2020-12-14T15:47:55Z

lucene/core/src/java/org/apache/lucene/search/IndriAndQuery.java

+/** A Query that matches documents matching combinations of 
+ * {@link TermQuery}s or other IndriAndQuerys.
+ */
+public class IndriAndQuery extends IndriQuery {


OK let's keep the (somewhat confusing) naming for now. Naming is the hardest part!!

mikemccand · 2020-12-14T15:48:14Z

lucene/core/src/java/org/apache/lucene/search/IndriAndQuery.java

+ */
+public class IndriAndQuery extends IndriQuery {
+
+  public IndriAndQuery(List<BooleanClause> clauses) {


Hello, any comment on this one?

mikemccand · 2020-12-14T15:50:48Z

lucene/core/src/java/org/apache/lucene/search/IndriQuery.java

+ * toString, equals, getClauses, and iterator.
+ *
+ */
+public abstract class IndriQuery extends Query


Random question: will IndriQuery take advantage of the Block-Max WAND (BMW) optimization? The added smoothingScore must alter the optimization logic to find the min block score to skip to?

I think it's OK to defer BMW to follow-on improvements, as long as it is not kicking in incorrectly here.

Currently, IndriQuery does not take advantage of the Block-Max WAND optimization. We iterate through all documents that have a posting for at least one of the search terms. I would be interested in expanding the smoothing score functionality to more parts of Lucene in the future.

mikemccand · 2020-12-14T15:54:26Z

lucene/core/src/java/org/apache/lucene/search/Scorable.java

+  /**
+   * Returns the smoothing score of the current document matching the query.
+   * This score is used when the query/term does not appear in the document.
+   * This can return 0 or a smoothing score.


Great, thanks.

mikemccand · 2020-12-14T15:56:28Z

lucene/core/src/java/org/apache/lucene/search/ScoreAndDoc.java

@@ -32,4 +33,9 @@ public int docID() {
  public float score() {
    return score;
  }
+
+  @Override
+  public float smoothingScore(int docId) throws IOException {


Hmm, why not add a default method in the Scorable interface to return 0? Then we don't have to add this default method in all these subclasses?

I agree that this makes more sense :-) I have added the default implementation of smoothingScore in Scorable and reverted my changes to add the smoothingScore method to all the unnecessary extending classes.
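
A minimal sketch of the default-method approach agreed on here. `ScorableSketch` and the two implementing classes are hypothetical stand-ins, not the real Lucene types, and the returned values are made up.

```java
import java.io.IOException;

/** Hypothetical stand-in interface showing the default-method approach. */
interface ScorableSketch {
  float score() throws IOException;

  /** Implementations without a smoothing model inherit this and return 0. */
  default float smoothingScore(int docId) throws IOException {
    return 0f;
  }
}

/** Needs no smoothingScore override: it inherits the default. */
class ConstantScoreSketch implements ScorableSketch {
  @Override
  public float score() {
    return 1f;
  }
}

/** Overrides the default with a (made-up) smoothing value. */
class SmoothedScoreSketch implements ScorableSketch {
  @Override
  public float score() {
    return 2f;
  }

  @Override
  public float smoothingScore(int docId) {
    return -3f;
  }
}
```

Only classes that actually have a smoothing model need to override the method; every other implementation stays untouched.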

sonatype-lift · 2021-01-12T04:22:05Z

lucene/core/src/java/org/apache/lucene/search/IndriAndWeight.java

+          subs);
+    } else {
+      Scorer scorer = scorer(context);
+      int advanced = scorer.iterator().advance(doc);


NULLPTR_DEREFERENCE: accessing memory that is the null pointer on line 117 indirectly during the call to IndriAndWeight.scorer(...).

Hmm, can scorer ever be null? Other Lucene queries can/do return null scorers to indicate that there are no matches for this query in this segment, and callers need to check for that. But maybe in this context it never happens?
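
A sketch of the defensive check being discussed, under the assumption that a null scorer means "no matches in this segment". The interfaces below are hypothetical stand-ins with simplified signatures, not Lucene's `Scorer` and `DocIdSetIterator`.

```java
/** Hypothetical stand-in for a doc-id iterator. */
interface DocIteratorSketch {
  /** Advances to the first doc >= target and returns it. */
  int advance(int target);
}

/** Hypothetical stand-in for a scorer that exposes its iterator. */
interface ScorerSketch {
  DocIteratorSketch iterator();
}

final class ExplainSketch {
  /** Returns true only when a scorer exists for the segment and lands exactly on doc. */
  static boolean matches(ScorerSketch scorerOrNull, int doc) {
    if (scorerOrNull == null) {
      // No scorer means nothing in this segment can match.
      return false;
    }
    return scorerOrNull.iterator().advance(doc) == doc;
  }
}
```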

mikemccand · 2021-01-12T14:04:52Z

Thank you @cammiemw!

Looks like there are some small style problems with your PR -- see the above Gradle precommit for details. We now enforce consistent styling, and I think you can run gradle tidy to re-style automatically.

cammiemw · 2021-01-13T01:00:10Z

Hi @mikemccand! I think the issues are fixed in my latest commit. Let me know if there is anything else that you think I need. Thanks!

mikemccand

Thank you @cammiemw, this looks great now!

Could you add a CHANGES.txt entry too, briefly summarizing the new smoothing / Indri search-time capability?

cammiemw · 2021-01-21T19:12:28Z

Thanks @mikemccand! I have added an entry in CHANGES.txt under API changes.

I am hoping to be able to add more of the Indri search engine functionality that builds upon these smoothingScore changes in the near future! Once this PR is merged, I would love to start working on adding a few more Indri operators such as the Indri OR, Indri Weighted Sum, and Indri Weighted And, and add either an Indri Query Parser or add to an existing Lucene query parser so that it is easier to use these operators. Do you think it makes sense to open a new Jira ticket for these changes once this PR is merged? About how long does the process of merging a PR take?

Thank you!

mikemccand · 2021-01-26T16:04:40Z

I have added an entry in CHANGES.txt under API changes.

Thank you!

I ran top-level ./gradlew test with this change but hit this Solr test failure, caused by this change, and hopefully easy to fix?:

org.apache.solr.ltr.feature.TestFeature > test suite's output saved to /l/trunk/solr/contrib/ltr/build/test-results/test/outputs/OUTPUT-org.apache.solr.ltr.feature.TestFeature.txt, copied below:
   >     java.lang.AssertionError: class org.apache.solr.ltr.feature.Feature$FeatureWeight$FilterFeatureScorer needs to override 'public float org.apache.lucene.search.Scorer.smoothingScore(int) throws java.io.IOException'
   >         at __randomizedtesting.SeedInfo.seed([933C5904DAF6843E:848500E5F410D49E]:0)
   >         at org.junit.Assert.fail(Assert.java:89)
   >         at org.apache.solr.ltr.feature.TestFeature.testFilterFeatureScorerOverridesScorerMethods(TestFeature.java:46)
   >         at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
   >         at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:64)
   >         at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
   >         at java.base/java.lang.reflect.Method.invoke(Method.java:564)
   >         at com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1754)
   >         at com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:942)
   >         at com.carrotsearch.randomizedtesting.RandomizedRunner$9.evaluate(RandomizedRunner.java:978)
   >         at com.carrotsearch.randomizedtesting.RandomizedRunner$10.evaluate(RandomizedRunner.java:992)
   >         at org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:44)
   >         at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
   >         at org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:45)
   >         at org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:60)
   >         at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:44)
   >         at org.junit.rules.RunRules.evaluate(RunRules.java:20)
   >         at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
   >         at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:370)
   >         at com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:819)
   >         at com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:470)
   >         at com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:951)
   >         at com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:836)
   >         at com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:887)
   >         at com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:898)
   >         at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
   >         at com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:57)
   >         at org.junit.rules.RunRules.evaluate(RunRules.java:20)
   >         at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
   >         at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
   >         at org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:38)
   >         at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
   >         at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
   >         at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
   >         at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
   >         at org.apache.lucene.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:51)
   >         at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:44)
   >         at org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:60)
   >         at org.apache.lucene.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:47)
   >         at org.junit.rules.RunRules.evaluate(RunRules.java:20)
   >         at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
   >         at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:370)
   >         at com.carrotsearch.randomizedtesting.ThreadLeakControl.lambda$forkTimeoutingTask$0(ThreadLeakControl.java:826)
   >         at java.base/java.lang.Thread.run(Thread.java:832)
  2> NOTE: reproduce with: gradlew test --tests TestFeature.testFilterFeatureScorerOverridesScorerMethods -Dtests.seed=933C5904DAF6843E -Dtests.slow=true -Dtests.badapples=true -Dtests.locale=bo-IN -Dtests.timezone=Etc/GMT -Dtests.asserts=true -Dtests.file.encoding=UTF-8
  2> NOTE: test params are: codec=Asserting(Lucene90): {}, docValues:{}, maxPointsInLeafNode=1006, maxMBSortInHeap=6.621471242227271, sim=Asserting(RandomSimilarity(queryNorm=true): {}), locale=bo-IN, timezone=Etc/GMT
  2> NOTE: Linux 5.9.8-arch1-1 amd64/Oracle Corporation 15.0.1 (64-bit)/cpus=128,threads=1,free=249561968,total=270532608
  2> NOTE: All tests run in this JVM: [TestFeature]

mikemccand · 2021-01-27T13:52:06Z

Hello @cammiemw, I ran top-level ./gradlew test on the latest diff here, yet still see the Solr TestFeature failure:

org.apache.solr.ltr.feature.TestFeature > test suite's output saved to /l/trunk/solr/contrib/ltr/build/test-results/test/outputs/OUTPUT-org.apache.solr.ltr.feature.TestFeature.txt, copied below:
   >     java.lang.AssertionError: class org.apache.solr.ltr.feature.Feature$FeatureWeight$FilterFeatureScorer needs to override 'public float org.apache.lucene.search.Scorer.smoothingScore(int) throws java.io.IOException'
   >         at __randomizedtesting.SeedInfo.seed([E634884D6EF7B762:F18DD1AC4011E7C2]:0)
   >         at org.junit.Assert.fail(Assert.java:89)
   >         at org.apache.solr.ltr.feature.TestFeature.testFilterFeatureScorerOverridesScorerMethods(TestFeature.java:46)
   >         at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
   >         at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:64)
   >         at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
   >         at java.base/java.lang.reflect.Method.invoke(Method.java:564)
   >         at com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1754)
   >         at com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:942)
   >         at com.carrotsearch.randomizedtesting.RandomizedRunner$9.evaluate(RandomizedRunner.java:978)
   >         at com.carrotsearch.randomizedtesting.RandomizedRunner$10.evaluate(RandomizedRunner.java:992)
   >         at org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:44)
   >         at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
   >         at org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:45)
   >         at org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:60)
   >         at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:44)
   >         at org.junit.rules.RunRules.evaluate(RunRules.java:20)
   >         at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
   >         at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:370)
   >         at com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:819)
   >         at com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:470)
   >         at com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:951)
   >         at com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:836)
   >         at com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:887)
   >         at com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:898)
   >         at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
   >         at com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:57)
   >         at org.junit.rules.RunRules.evaluate(RunRules.java:20)
   >         at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
   >         at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
   >         at org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:38)
   >         at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
   >         at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
   >         at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
   >         at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
   >         at org.apache.lucene.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:51)
   >         at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:44)
   >         at org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:60)
   >         at org.apache.lucene.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:47)
   >         at org.junit.rules.RunRules.evaluate(RunRules.java:20)
   >         at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
   >         at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:370)
   >         at com.carrotsearch.randomizedtesting.ThreadLeakControl.lambda$forkTimeoutingTask$0(ThreadLeakControl.java:826)
   >         at java.base/java.lang.Thread.run(Thread.java:832)
  2> NOTE: reproduce with: gradlew test --tests TestFeature.testFilterFeatureScorerOverridesScorerMethods -Dtests.seed=E634884D6EF7B762 -Dtests.slow=true -Dtests.badapples=true -Dtests.locale=om-KE -Dtests.timezone=Pacific/Auckland -Dtests.asserts=true -Dtests.file.encoding=UTF-8
  2> NOTE: test params are: codec=Asserting(Lucene90): {}, docValues:{}, maxPointsInLeafNode=111, maxMBSortInHeap=6.800048371274569, sim=Asserting(RandomSimilarity(queryNorm=true): {}), locale=om-KE, timezone=Pacific/Auckland
  2> NOTE: Linux 5.9.8-arch1-1 amd64/Oracle Corporation 15.0.1 (64-bit)/cpus=128,threads=1,free=250395088,total=270532608
  2> NOTE: All tests run in this JVM: [TestFeature]

cammiemw · 2021-01-28T02:25:37Z

Hi @mikemccand, apologies for the failed test! Unfortunately, I cannot replicate the error in my environments: gradle test as well as that specific test come back successful. I am going to keep digging into what is different in my local setup. For right now, I looked into the code and could see what was causing the error, so I pushed the change that I think fixes it. If that fails for you, I will need to get my local environment working so that I get the same results that you do. Thanks so much for your time!

mikemccand · 2021-01-29T19:45:24Z

Thanks @cammiemw, that test now passes again!

I ran top-level tests multiple times and each time a Solr test failed for reasons that looked (not certain) unrelated to this great change. On the 4th try, all Solr tests finally passed.

And ./gradlew precommit passes as well.

I will push soon! Thank you for persisting @cammiemw and sorry for the slow iterations here. I am excited that we are cross-fertilizing these two impactful open-source search engines!

mikemccand · 2021-01-29T21:50:52Z

Thanks @cammiemw!

One thing I noticed after I pushed was the new Scorable.smoothingScore(int docId) method: why does this take the int docId parameter? Shouldn't it take no parameters, like score(), and always return the smoothing score for the "current" docId instead?

cammiemw · 2021-01-29T22:03:03Z

Hi @mikemccand! The smoothingScore method requires a docId because the most important implementation of smoothingScore is in TermScorer, which uses the postingsEnum to get the docId. However, in the case of the smoothing score, there will be no posting for the term in the document. Let me know if you think a different way of getting the docId would work better.

Thanks so much for all your help and time throughout this process! I am very excited about this change and hope that I can continue to add more.
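
The reasoning about the docId parameter can be illustrated with a small sketch. It assumes a term scorer whose postings cursor only ever stops on documents that contain the term, so for a document without the term the cursor has already moved past it and cannot supply the docId. `TermScorerSketch` and its fields are hypothetical, and the Dirichlet formula is the standard one, not necessarily the exact one in this PR.

```java
/** Hypothetical term scorer over a sorted postings array; not Lucene's TermScorer. */
final class TermScorerSketch {
  private final int[] postingDocs; // docs that contain the term, ascending
  private final long[] docLengths; // length of every doc, indexed by docId
  private final double collectionProb;
  private final double mu;
  private int pos = 0;

  TermScorerSketch(int[] postingDocs, long[] docLengths, double collectionProb, double mu) {
    this.postingDocs = postingDocs;
    this.docLengths = docLengths;
    this.collectionProb = collectionProb;
    this.mu = mu;
  }

  /** Moves to the first posting >= target and returns that docId, or Integer.MAX_VALUE if exhausted. */
  int advance(int target) {
    while (pos < postingDocs.length && postingDocs[pos] < target) {
      pos++;
    }
    return docID();
  }

  /** The doc the postings cursor is on; never a doc that lacks the term. */
  int docID() {
    return pos < postingDocs.length ? postingDocs[pos] : Integer.MAX_VALUE;
  }

  /** The cursor is not positioned on docId (the term is absent there), so the caller must name the doc. */
  double smoothingScore(int docId) {
    return Math.log(mu * collectionProb / (docLengths[docId] + mu));
  }
}
```

For example, with postings on docs 2 and 5, `advance(3)` lands on doc 5, so a parameterless method reading the cursor would use the wrong document's length when scoring doc 3.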

cpoerschke · 2022-01-26T18:15:12Z

solr/contrib/ltr/src/test/org/apache/solr/ltr/feature/TestFeature.java

+        // the FilterFeatureScorer may simply inherit Scorer's default implementation
+        if (scorerClassMethod.getName().equals("smoothingScore")) continue;
+


This surprised me. Created https://issues.apache.org/jira/browse/SOLR-15958 and (draft) apache/solr#567 to consider further.

Cameron VandenBerg added 4 commits November 24, 2020 13:00

Added smoothing score

087d9d2

Indri smoothing implementation and formatting

8e93cee

Additional smoothing score fixes

3c75db2

Add smoothing score to additional classes

71a0bcf

mikemccand reviewed Nov 29, 2020

Cameron VandenBerg added 7 commits November 30, 2020 14:21

Merge remote-tracking branch 'upstream/master' into LUCENE-9537

22e00f2

Addressed first set of git pull request comments

8e784de

Added lucene copyright to test classes

f0ddef0

Fixed tabs in IndriScorer

0a89a86

Fixed tabs (again)

219bb77

Added smoothingScore to termScorer so that the IndriAndQuery can get smoothing scores for TermQueries that have a frequency of 0

54ae042

Fixed typo

b36cc34

mikemccand reviewed Dec 14, 2020

Cameron VandenBerg added 3 commits December 18, 2020 15:39

Merge remote-tracking branch 'upstream/master' into LUCENE-9537

4737298

Added smoothing score implementation to Scorable and removed it from extending classes

c625f5e

Merge remote-tracking branch 'upstream/master' into LUCENE-9537

c6d35dc

sonatype-lift bot reviewed Jan 12, 2021

Cameron VandenBerg added 2 commits January 12, 2021 16:39

Formatting fixes and IndriAndWeight explain NullPointer fix

88c9c35

Ran ./gradlew :lucene:core:spotlessApply

0c61859

mikemccand approved these changes Jan 19, 2021

Cameron VandenBerg added 3 commits January 20, 2021 15:00

Merge remote-tracking branch 'upstream/master' into LUCENE-9537

95e063c

Added entry to CHANGES.txt

e6ce673

Fixed CHANGES.txt formatting

b68cf43

Cameron VandenBerg added 2 commits January 26, 2021 12:16

Merge remote-tracking branch 'upstream/master' into LUCENE-9537

3bbb848

Removed smoothingScore method from Scorer to fix TestFeature Error

91ff311

Ran spotlessApply

1a5b954

Cameron VandenBerg added 2 commits January 27, 2021 17:20

Merge remote-tracking branch 'upstream/master' into LUCENE-9537

87b7b41

Fixed TestFeature smoothingScore implementation

d18c782

mikemccand merged commit 9cc5c9b into apache:master Jan 29, 2021

cpoerschke reviewed Jan 26, 2022

asfimport mentioned this pull request Jan 26, 2022

Add Indri Search Engine Functionality to Lucene [LUCENE-9537] apache/lucene#10577

Closed
