I noticed that we create a new AllTokenStream for every indexed document to index the _all field, remapping any per-field boosts to payloads. But if AllEntries saw no boosts (it already has a boolean customBoost() method to check this), can't we skip wrapping with AllTokenStream entirely?
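To make the proposal concrete, here is a minimal, Lucene-free Java sketch. `Entries`, `Entry`, `Token`, and `analyze` are hypothetical stand-ins, not the real AllEntries/AllTokenStream API, and whitespace splitting stands in for the real analyzer. It assumes that `customBoost()` becomes true as soon as any entry has a boost other than 1.0. Boost payloads are attached only in that case; otherwise the wrapping step is skipped.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, Lucene-free model of the _all field; not the actual ES classes.
final class AllFieldSketch {

    /** One source field's text plus its boost (stand-in for an AllEntries entry). */
    static final class Entry {
        final String text;
        final float boost;

        Entry(String text, float boost) {
            this.text = text;
            this.boost = boost;
        }
    }

    /** Stand-in for AllEntries: collects entries and remembers whether any were boosted. */
    static final class Entries {
        final List<Entry> entries = new ArrayList<>();
        private boolean customBoost = false;

        void add(String text, float boost) {
            entries.add(new Entry(text, boost));
            if (boost != 1.0f) {
                customBoost = true; // at least one field carries a non-default boost
            }
        }

        boolean customBoost() {
            return customBoost;
        }
    }

    /** A token with an optional boost payload (null means no payload is written). */
    static final class Token {
        final String term;
        final Float payloadBoost;

        Token(String term, Float payloadBoost) {
            this.term = term;
            this.payloadBoost = payloadBoost;
        }
    }

    /**
     * Whitespace-tokenizes every entry. Boost payloads are attached only when some
     * entry was boosted; otherwise the payload-writing (wrapping) step is skipped.
     */
    static List<Token> analyze(Entries all) {
        boolean wrap = all.customBoost();
        List<Token> tokens = new ArrayList<>();
        for (Entry entry : all.entries) {
            for (String term : entry.text.split("\\s+")) {
                if (term.isEmpty()) {
                    continue;
                }
                tokens.add(new Token(term, wrap ? Float.valueOf(entry.boost) : null));
            }
        }
        return tokens;
    }
}
```

In the unboosted case every token comes out with no payload, which is exactly the work the proposal avoids.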
AllTokenStream.incrementToken is not cheap: for every token it does a binary search to look up the boost of the entry that token came from. Separately, I suspect this binary search isn't necessary at all: can't it just use the "current" entry's boost?
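The two lookup strategies can be compared in a small standalone sketch. It assumes each entry is identified by its start character offset in the concatenated _all text (strictly ascending, first entry at 0) and that tokens arrive in non-decreasing offset order; the class and method names are hypothetical. `boostBySearch` does a binary search per token, while `boostByCursor` only ever moves forward, so its cost is amortized O(1) per token.

```java
import java.util.Arrays;

// Hypothetical comparison of per-token boost lookups; not the actual ES implementation.
final class BoostLookupSketch {
    private final int[] starts;   // start offset of each entry, strictly ascending, starts[0] == 0
    private final float[] boosts; // boost of each entry
    private int current = 0;      // cursor used by the sequential lookup

    BoostLookupSketch(int[] starts, float[] boosts) {
        if (starts.length == 0 || starts.length != boosts.length || starts[0] != 0) {
            throw new IllegalArgumentException("need matching, non-empty arrays starting at 0");
        }
        this.starts = starts;
        this.boosts = boosts;
    }

    /** O(log n) per token: find the last entry whose start offset is <= offset. */
    float boostBySearch(int offset) {
        int idx = Arrays.binarySearch(starts, offset);
        if (idx < 0) {
            // Not an exact hit: -(insertionPoint) - 1 was returned, and the
            // containing entry is the one just before the insertion point.
            idx = -idx - 2;
        }
        return boosts[idx];
    }

    /** Amortized O(1): offsets only grow, so the "current" entry only moves forward. */
    float boostByCursor(int offset) {
        while (current + 1 < starts.length && starts[current + 1] <= offset) {
            current++;
        }
        return boosts[current];
    }
}
```

Both methods return the same boost for any in-order sequence of offsets; the cursor version simply exploits the fact that tokens are produced front to back.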
But stepping back, can't ES just add multiple instances of the _all field, rather than using a custom Reader implementation (AllEntries) and TokenFilter (AllTokenStream) that do the concatenation on the fly? When Lucene inverts a multi-valued field, it logically appends the values together.
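A simplified model of what "logically appends them together" means: when several values of the same field are inverted, positions continue from one value to the next, optionally separated by a gap (Lucene exposes this through `Analyzer.getPositionIncrementGap`). The sketch below is hypothetical and ignores offsets, real analysis, and boosts; it only shows how positions line up.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of position assignment across the values of a multi-valued field.
final class MultiValuedPositionsSketch {

    /** A term and the position it was indexed at. */
    static final class Posting {
        final String term;
        final int position;

        Posting(String term, int position) {
            this.term = term;
            this.position = position;
        }
    }

    /**
     * Inverts several values of one field as a single token sequence. Each token advances
     * the position by one; between values the position additionally jumps by the gap.
     */
    static List<Posting> invert(List<String> values, int positionIncrementGap) {
        List<Posting> postings = new ArrayList<>();
        int position = -1; // so that the first token lands at position 0
        boolean firstValue = true;
        for (String value : values) {
            if (!firstValue) {
                position += positionIncrementGap;
            }
            firstValue = false;
            for (String term : value.split("\\s+")) {
                if (term.isEmpty()) {
                    continue;
                }
                position += 1;
                postings.add(new Posting(term, position));
            }
        }
        return postings;
    }
}
```

With a gap of 0 the values behave like one concatenated string for phrase queries; a large gap keeps phrases from matching across value boundaries.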
I also noticed and fixed a possible bug in AllFieldMapper.queryTermToString: it would fail to return an AllTermQuery if the field was indexed with offsets (for the postings highlighter).
AllTokenStream, used to index the _all field, adds some overhead, but
it is not needed when no fields were boosted or when positions are
not indexed for the _all field.
Closes #6187
Closes #6219
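The condition in the commit message can be read as a simple predicate. Payloads are stored per position, so when positions are not indexed there is nowhere to put the boost. In this sketch, `IndexOptionsSketch` is a hypothetical enum that mirrors the ordering of Lucene's index options, and `needsBoostPayloads` is an illustrative name, not the method used in the actual change.

```java
// Hypothetical mirror of Lucene's index options, ordered from least to most detail.
enum IndexOptionsSketch {
    DOCS,
    DOCS_AND_FREQS,
    DOCS_AND_FREQS_AND_POSITIONS,
    DOCS_AND_FREQS_AND_POSITIONS_AND_OFFSETS
}

final class AllTokenStreamDecision {

    /**
     * Boost payloads are only worth writing when some field was boosted and positions
     * (which payloads are attached to) are actually indexed for _all.
     */
    static boolean needsBoostPayloads(boolean customBoost, IndexOptionsSketch options) {
        boolean positionsIndexed =
                options.compareTo(IndexOptionsSketch.DOCS_AND_FREQS_AND_POSITIONS) >= 0;
        return customBoost && positionsIndexed;
    }
}
```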
clintongormley changed the title from "Don't use AllTokenStream if no fields were boosted" to "Indexing: Don't use AllTokenStream if no fields were boosted" on Jul 16, 2014
clintongormley changed the title from "Indexing: Don't use AllTokenStream if no fields were boosted" to "Don't use AllTokenStream if no fields were boosted" on Jun 7, 2015