Optimize IndexSearcher.count() to rewrite the query once#16001
Conversation
javanna
left a comment
There was a problem hiding this comment.
Change looks reasonable to me, left a small comment on the constant score query wrapping logic.
| // Use the already-rewritten query directly, avoiding a redundant rewrite in search(query, collector) | ||
| var collectorManager = new TotalHitCountCollectorManager(getSlices()); | ||
| var firstCollector = collectorManager.newCollector(); | ||
| final Weight weight = createWeight(query, firstCollector.scoreMode(), 1); |
There was a problem hiding this comment.
to be exactly the same as before, shouldn't the weight be created providing new ConstantScoreQuery(innerQuery) ? Not sure if there are edge cases around this, but it seems like the rewritten query may not be a constant score query, in which case we would still wrap it here with a constant score query? Am I reading this correctly?
There was a problem hiding this comment.
query has already been rewritten with ConstantScoreQuery, and thus calling createWeight(query,...) takes into consideration CSQ (basically benefitting from CSQ's rewrite optimizations that have already occurred). I don't think the code down here I'm replacing should reference innerQuery as that var exists soley to detect the 2-clause disjunction optimization.
We could probably get away with referencing innerQuery down here thus simply unwrapping the CSQ but I'd think there's not really a benefit, and it muddies the clarity of the structure of this method IMO.
There was a problem hiding this comment.
I suppose your inquiry indirectly challenges my opinion that this method has structure, as you don't see it. I recommend introducing a pair of { } to scope the innerQuery & rewriteTwoClauseDisjunctionWithTermsForCount check so that the code down here can't reference it.
I'd also like to comment that the inserted CSQ has some optimizations that rewrite will simplify.
There was a problem hiding this comment.
Thanks. It was not a matter of structure on my end, more that I assumed the existing logic was there for good reasons that I could not entirely explain, and changing it may cause issues. We do rewrite to the constant score query, and I wondered why we'd need to forcibly wrap again ahead of the weight creation / additional rewrite.
Looking at history, I see the change that introduced the duplicated rewrite was #13036 . That one did not touch the weight creation logic, but it probably should have in the first place, to remove the additional wrapping and prevent duplicated rewrite.
Add changelog for 10.5
(cherry picked from commit 038e167)
IndexSearcher.count was invoking Query.rewrite and then calling search() which would rewrite the query again. Rewriting the query can be expensive.
Credit to my colleague Niyant for noticing this when doing some JFR analysis.