Remove the construction of second bitmap in text index reader to improve performance #5199
This is a follow-up to the optimization implemented in PR #5177.
Since we now have a pre-built mapping of luceneDocId to pinotDocId, we can build the result bitmap directly with pinotDocIds. Earlier we had to build the result in two phases: (1) run the search query to get the luceneDocIds in a bitmap, then (2) iterate over this bitmap and build a second bitmap with the corresponding pinotDocIds. This PR removes the construction of that second bitmap.
Now in our Lucene collector callback, we can directly build the final bitmap.
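To illustrate the difference, here is a minimal sketch and not the actual Pinot code. It assumes `java.util.BitSet` as a stand-in for the real bitmap type, a plain `int[]` as the pre-built luceneDocId to pinotDocId mapping, and a hypothetical `HitCallback` interface in place of the Lucene collector callback. The class and method names are made up for this example.

```java
import java.util.BitSet;

public class DocIdCollectorSketch {
    // Hypothetical stand-in for the Lucene collector callback:
    // invoked once per matching luceneDocId.
    interface HitCallback {
        void collect(int luceneDocId);
    }

    // Old approach: collect luceneDocIds into a first bitmap, then
    // iterate over it to build a second bitmap of pinotDocIds.
    static BitSet twoPhase(int[] hits, int[] luceneToPinot) {
        BitSet luceneDocIds = new BitSet();
        for (int hit : hits) {
            luceneDocIds.set(hit);
        }
        BitSet pinotDocIds = new BitSet();
        for (int id = luceneDocIds.nextSetBit(0); id >= 0; id = luceneDocIds.nextSetBit(id + 1)) {
            pinotDocIds.set(luceneToPinot[id]);
        }
        return pinotDocIds;
    }

    // New approach: the callback translates each hit through the
    // pre-built mapping and writes straight into the final bitmap.
    static BitSet singlePhase(int[] hits, int[] luceneToPinot) {
        BitSet pinotDocIds = new BitSet();
        HitCallback callback = luceneDocId -> pinotDocIds.set(luceneToPinot[luceneDocId]);
        for (int hit : hits) {
            callback.collect(hit);
        }
        return pinotDocIds;
    }

    public static void main(String[] args) {
        int[] luceneToPinot = {3, 0, 4, 1, 2}; // index = luceneDocId, value = pinotDocId
        int[] hits = {0, 2, 4};                // luceneDocIds matched by the query
        System.out.println(twoPhase(hits, luceneToPinot).equals(singlePhase(hits, luceneToPinot)));
    }
}
```

Both methods produce the same set of pinotDocIds. The single-phase version skips the intermediate bitmap and the extra pass over it.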
This change, together with the previous PR, provides significant performance improvements.
Ran an increasing-QPS test on real data (a single segment with 10 million docs and a text index), ramping QPS from 1 to 40.
The optimizations implemented in this and the previous PR are not directly applicable to realtime (which has the exact same performance overhead), since we can't have a pre-built mapping there. Most likely we need to build a cache on the fly as queries are processed on the realtime Lucene index. A solution is in progress; a PR will follow soon.