Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SOLR-16002: don't double-cache FilterQuery #624

Merged
merged 9 commits into from Mar 29, 2022
Merged

Conversation

magibney
Copy link
Contributor

See: SOLR-16002

Copy link
Contributor

@dsmiley dsmiley left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It appears this is obsoleted by the PR that made Filter go away -- see for yourself. I did and your test still passes without the change you did in SolrIndexSearcher :-). I didn't dig into why exactly though I would be curious if you could find out. I like your test -- you should add it nonetheless.

@magibney
Copy link
Contributor Author

@dsmiley I merged main into this branch (including PR #529), and the test still fails for me without the modifications in SolrIndexSearcher (I'm pretty sure about this).
Also addressed the refactoring comments; thanks for the review!

@dsmiley
Copy link
Contributor

dsmiley commented Feb 18, 2022

I'm confused at our differing experience. I just updated and added an assert false; immediately before the docCache = false; you added in SIS. The test still passes for me. A breakpoint doesn't make it there either. If you see a failure, then can you share the stacktrace for the assertion?

@magibney
Copy link
Contributor Author

That's definitely strange!

  1. on f29144a (PR branch current HEAD)
  2. run git diff apache-solr/main... -- solr/core/src/java/org/apache/solr/search/SolrIndexSearcher.java | git apply -R (reverts the changes this PR made to SolrIndexSearcher, wrt merge-base with main)
  3. gradlew :solr:core:test --tests "org.apache.solr.search.TestFiltersQueryCaching.testRecursiveFilter" -Ptests.jvms=4 -Ptests.jvmargs=-XX:TieredStopAtLevel=1 -Ptests.seed=8983C499E841DCAC -Ptests.file.encoding=ISO-8859-1:
   >     java.lang.AssertionError: expected:<1> but was:<2>
   >         at __randomizedtesting.SeedInfo.seed([8983C499E841DCAC:E754A78A6DD76190]:0)
   >         at org.junit.Assert.fail(Assert.java:89)
   >         at org.junit.Assert.failNotEquals(Assert.java:835)
   >         at org.junit.Assert.assertEquals(Assert.java:647)
   >         at org.junit.Assert.assertEquals(Assert.java:633)
   >         at org.apache.solr.search.TestFiltersQueryCaching.testRecursiveFilter(TestFiltersQueryCaching.java:87)
   >         at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
   >         at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
   >         at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
   >         at java.base/java.lang.reflect.Method.invoke(Method.java:566)
   >         at com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1754)
   >         at com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:942)
   >         at com.carrotsearch.randomizedtesting.RandomizedRunner$9.evaluate(RandomizedRunner.java:978)
   >         at com.carrotsearch.randomizedtesting.RandomizedRunner$10.evaluate(RandomizedRunner.java:992)
   >         at com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:57)
   >         at org.junit.rules.RunRules.evaluate(RunRules.java:20)
   >         at org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:44)
   >         at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
   >         at org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:45)
   >         at org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:60)
   >         at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:44)
   >         at org.junit.rules.RunRules.evaluate(RunRules.java:20)
...

@magibney
Copy link
Contributor Author

magibney commented Feb 18, 2022

immediately before the docCache = false; you added in SIS

docCache=false had to be added in two places in SIS ... just making sure, because if you only reverted one place, maybe the other code path is the relevant one in the context of the test?

Copy link
Contributor

@dsmiley dsmiley left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ahh; yes I overlooked that.

The additional special cases kind of has a smell to it that wants for a more elegant solution. I spent some time with the code closely now and I'm thinking that FilterQuery should extend WrappedQuery. WDYT?

I'd also prefer that FilterQuery.getCache is respected, which will happen if FilterQuery extends WrappedQuery but isn't respected with your PR as-is.

@dsmiley
Copy link
Contributor

dsmiley commented Feb 18, 2022

WrappedQuery would then be non-final, but it's constructor would be package-private to restrain abuse. Instead it could have a "wrap" static. For even greater ease-of-use, that wrap method could take the cache & cost values as parameters and only conditionally actually wrap if they differ from defaults. Just an idea.

@magibney
Copy link
Contributor Author

magibney commented Feb 18, 2022

Hmm... the danger with that is that although FilterQuery does wrap a query, the semantics of WrappedQuery are that the inner query should be accessible, and that at a certain point in the docset-creation-from-query code it is proper to "unwrap" the query and run the inner query.

The semantics of FilterQuery are different. If there's a way to provide access to the backing query, it can't/shouldn't be leveraged in the same way that it is for WrappedQuery, so the "instanceof" checks for WrappedQuery would not be appropriate to use I think.

What if FilterQuery (which already extends ExtendedQueryBase) simply overrode getCache() to be final and always return false? Then we'd have the "paradoxically" note in that overridden method, no special cases in SIS. EDIT: I took a crack at this approach with 2df8cf0, and I do think it's cleaner.

Copy link
Contributor

@dsmiley dsmiley left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was exploring that path too (what you did just now) and what I didn't like about it is that it blocks using FilterQuery to mean filter but not cache. But I suppose that's okay... whoever creates this thing could consider this scenario and simply boost a ConstantScoreQuery and get the same result.

@madrob
Copy link
Contributor

madrob commented Feb 18, 2022

what I didn't like about it is that it blocks using FilterQuery to mean filter but not cache.

Does this mean that local params fq={!cache=false}foo:bar would no longer work?

@dsmiley
Copy link
Contributor

dsmiley commented Feb 18, 2022

Does this mean that local params fq={!cache=false}foo:bar would no longer work?

No, it doesn't mean that. It's confusing but FilterQuery != fq param.

@dsmiley
Copy link
Contributor

dsmiley commented Feb 18, 2022

The fact that FilterQuery != fq was something I was toying with today -- maybe it should, which would be more intuitive.

@magibney
Copy link
Contributor Author

I think I follow? Yes, FilterQuery here is as in FilterQuery.java, which is really specific, almost always invoked via filter([backing_query]), when you want consult the filterCache for the backing query in a context where it would normally not consult the filterCache. This is particulary useful, e.g., for fine-grained control of how subclauses in queries are cached. That use-case/concept doesn't go away if there's a closer relationship between fq and FilterQuery ... if anything I could see renaming FilterQuery to be more clearly distinct from fq (in fact, it's hard to search Jira for issues related to FilterQuery because that term is indeed often used informally to refer to fq).

So, just to clarify, your comment above should no longer be a concern (i.e., fq={!cache=false}foo:bar would definitely still work as it currently does)? What this PR does is make it so that fq=filter(foo:bar) would only cache one query, not two. And, for good measure, fq={!cache=false}filter(foo:bar) would still hit the filterCache via the filter(...) wrapper.

@magibney
Copy link
Contributor Author

Sorry, I misunderstood what you meant in your comment above. But in any event I still think the concern is not warranted:

  1. fq=filter({!cache=false}foo:bar) would currently cache the FilterQuery, but not the inner query, so subsequent requests for fq=foo:bar would not be a cache hit.
  2. fq=filter(foo:bar) currently caches twice, this PR would cause to only cache foo:bar
  3. fq={!cache=false}filter(foo:bar), oddly, would currently only cache the inner query (behavior equivalent to what this PR introduces as the default for fq=filter(foo:bar)
  4. fq={!cache=true}filter(foo:bar) -- i.e., explicitly requesting that the FilterQuery per se be cached, would throw an IllegalArgumentException under the new PR, and on current main the behavior of this would be equivalent to fq=filter(foo:bar): double-caching. I mean, we could skip the IAE, but this is really really wrong -- I can't think of any circumstance where you'd actually want this.
  5. And yes, you'd be better off with (foo:bar)^=1 for an explicit ConstantScoreQuery; currently in order to support the use case you mention (using FilterQuery to filter but not cache), iiuc you'd have to do this: filter({!cache=false}foo:bar)!

@madrob
Copy link
Contributor

madrob commented Feb 18, 2022

So the use case we're trying to support is "fq=filter(foo:bar) OR foo:baz"?

@magibney
Copy link
Contributor Author

magibney commented Feb 18, 2022

Well "fq=filter(foo:bar) OR foo:baz" is already supported, and this PR wouldn't change that particular example; and yes this example illustrates one reasonable use of filter(...): to separately consult the filterCache for the "foo:bar" subclause, in addition to consulting filterCache (in this example) for the overall fq.

So appropriately, this example would result in two filterCache consultations. This PR addresses the case where fq and filter(...) are coterminous -- where they represent exactly the same fundamental query, but would currently be cached twice.

I guess the way I think about it, filter(...) gives you arbitrarily granular control over how subclauses get cached. The fq=filter(...) example is really straightforward, but is admittedly unlikely to come up in "real world" use. The use that initially motivated my interest in this is perhaps informative: the relatedness JSON facet function accepts a single query arg to define its "foreground set". For obvious reasons, I wanted to define the foreground set to be "the combination of main query and filters", but I found that doing so via the FiltersQParser ("{!filters param=$q param=$fq}") resulted in consulting the filterCache for the top-level combination of q&fq, not finding it (of course), and running all the underlying queries without consulting the cache! Having fixed FiltersQParser to generate FilterQuery instances for its individual subclauses (separate PR forthcoming), I found that each of the subclauses was getting cached twice: once as the clause's underlying query, and once as the clause's FIlterQuery wrapper. That behavior is what this PR fixes, and the behavior that the associated test is designed to evaluate.

@magibney
Copy link
Contributor Author

Thanks again David and Mike; I just wanted to call out one minor change (in 1b4d7d1): instead of throwing an IllegalArgumentException upon attempting FilterQuery.setCache(true), we now silently ignore this at the implementation level, considering that because of what FilterQuery is, the higher-level semantics of setCache(...) are actually better respected by a lenient approach. I've added a detailed comment in the setCache() method, and built out the test method to cover related cases.

I plan to commit tomorrow, pending further feedback/concerns.

@magibney magibney merged commit 534696c into apache:main Mar 29, 2022
magibney added a commit that referenced this pull request Mar 29, 2022
…query)` syntax (#624)

(cherry picked from commit 534696c)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
3 participants