Per-field boosting of the _all field is broken unless very specific conditions are met #4315

jpountz · 2013-12-02T20:53:30Z

The _all field uses payloads in order to store per-field boosts in a single index field. However, the implementation relies on the token stream not eagerly consuming the input java.io.Reader (see AllEntries.read). In practice, boosting on the _all field therefore doesn't work under any of these circumstances:

- there is a char filter,
- the tokenizer is not the standard tokenizer,
- any token filter has read-ahead logic.
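To illustrate the failure mode, here is a simplified, hypothetical model in Java. It is not the actual AllEntries code: each field value is handed out by a reader that remembers which entry it is currently positioned on, and a token's boost is taken from that "current" entry. The names `EntriesReader`, `lazyBoosts` and `eagerBoosts` are made up for this sketch. A consumer that tokenizes each chunk as soon as it is read gets the right boosts, while one that buffers the whole input first (as a char filter might) tags every token with the boost of the last entry.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified model of the old _all boosting behaviour.
// NOT the Elasticsearch source; names are made up for illustration.
public class EagerReadDemo {

    // One field value contributing to _all, with its per-field boost.
    record Entry(String text, float boost) {}

    // Hands out one entry per read and remembers the entry it is positioned on.
    static final class EntriesReader {
        private final List<Entry> entries;
        private int current = -1;

        EntriesReader(List<Entry> entries) { this.entries = entries; }

        // Returns the next entry's text (plus a separator), or null when exhausted.
        String readNextEntry() {
            if (current + 1 >= entries.size()) return null;
            current++;
            return entries.get(current).text() + " ";
        }

        // Boost of the entry the reader is positioned on right now.
        float currentBoost() { return entries.get(current).boost(); }
    }

    static String[] tokenize(String text) { return text.trim().split("\\s+"); }

    // Lazy consumer: tokenizes each chunk right after reading it, so
    // currentBoost() still refers to the entry the tokens came from.
    static List<Float> lazyBoosts(EntriesReader reader) {
        List<Float> boosts = new ArrayList<>();
        String chunk;
        while ((chunk = reader.readNextEntry()) != null) {
            for (String ignored : tokenize(chunk)) boosts.add(reader.currentBoost());
        }
        return boosts;
    }

    // Eager consumer: buffers all input before tokenizing, so every token
    // sees the boost of the LAST entry.
    static List<Float> eagerBoosts(EntriesReader reader) {
        StringBuilder all = new StringBuilder();
        String chunk;
        while ((chunk = reader.readNextEntry()) != null) all.append(chunk);
        List<Float> boosts = new ArrayList<>();
        for (String ignored : tokenize(all.toString())) boosts.add(reader.currentBoost());
        return boosts;
    }

    public static void main(String[] args) {
        List<Entry> doc = List.of(new Entry("quick fox", 3.0f), new Entry("lazy dog", 1.0f));
        System.out.println(lazyBoosts(new EntriesReader(doc)));  // [3.0, 3.0, 1.0, 1.0]
        System.out.println(eagerBoosts(new EntriesReader(doc))); // [1.0, 1.0, 1.0, 1.0]
    }
}
```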


roytmana · 2013-12-02T22:37:12Z

Could you also consider a wider scope:

1. Per-field boost in multi_field (see "Index time boost in multi_field ignored?" #4108)
2. Infrastructure for boosting fragments of input text at index time. This would allow some sort of markup in the indexed JSON to supply boosts to fragments of text. A common use case is finding and boosting important fragments as part of indexing.

jpountz · 2013-12-03T11:02:53Z

@roytmana The two issues you are mentioning are actually quite tough to implement, so I would like to concentrate on just fixing boosting on the _all field for now.

roytmana · 2013-12-03T14:38:17Z

@jpountz isn't #1 quite similar to _all?
I understand _all is searched in a special way that takes the per-field boosts stored in the postings into account. Couldn't the same be done for multi-fields?

jpountz · 2013-12-03T14:48:32Z

@roytmana a similar method could be applied indeed. But I'm not fully happy with the way per-field boosting works for the _all field, so I would like us to consider improving it before applying the same logic to other places. In particular, it doesn't work with all queries (e.g. phrase queries) and is quite wasteful storage-wise (4 bytes per occurrence of a term whose field has a boost other than 1: I wouldn't be surprised if it sometimes almost doubles the size of the inverted index for the _all field).
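For a sense of where the "4 bytes per occurrence" comes from, here is a minimal sketch of a boost stored as a 4-byte float payload, plus the back-of-the-envelope overhead it implies. It uses only `java.nio.ByteBuffer`; `BoostPayloadCost`, `encodeBoost` and `payloadBytes` are hypothetical names, Lucene's own payload encoding may differ in detail, and the estimate ignores any compression the postings format applies.

```java
import java.nio.ByteBuffer;

// Illustrative sketch only: a per-field boost stored as a 4-byte float payload.
public class BoostPayloadCost {

    // Encode a float boost as 4 big-endian bytes (one payload per term occurrence).
    static byte[] encodeBoost(float boost) {
        return ByteBuffer.allocate(Float.BYTES).putFloat(boost).array();
    }

    // Decode the 4-byte payload back into the boost.
    static float decodeBoost(byte[] payload) {
        return ByteBuffer.wrap(payload).getFloat();
    }

    // Rough extra bytes if every occurrence of a term from a field whose boost
    // is not 1 carries a 4-byte payload (occurrences with boost 1 carry none).
    static long payloadBytes(long boostedOccurrences) {
        return boostedOccurrences * Float.BYTES;
    }

    public static void main(String[] args) {
        byte[] payload = encodeBoost(2.5f);
        System.out.println(payload.length);          // 4
        System.out.println(decodeBoost(payload));    // 2.5
        System.out.println(payloadBytes(1_000_000)); // 4000000
    }
}
```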

roytmana · 2013-12-03T14:57:35Z

@jpountz Great, thank you for the info. I just wanted to bring these two cases up so you could consider them as you work on the _all implementation. Hopefully multi_field will follow soon :-), and arbitrary snippet boosting after that.

_all boosting used to rely on the fact that the TokenStream doesn't eagerly consume the input java.io.Reader. This fixes the issue by using binary search in order to find the right boost given a token's start offset. Close elastic#4315
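A minimal sketch of the offset-based lookup described in the commit message, assuming each field value's start offset in the concatenated _all text is known and that values are joined by a fixed separator. `OffsetBoostLookup`, `fromValues` and `boostAt` are hypothetical names, not the Elasticsearch API. Because the boost depends only on the token's start offset, it no longer matters how eagerly the tokenizer or filters consumed the Reader.

```java
// Hypothetical sketch of finding a token's boost from its start offset.
// Not the Elasticsearch implementation; names are made up for illustration.
public class OffsetBoostLookup {

    private final int[] startOffsets; // start of each value in the _all text, ascending, first is 0
    private final float[] boosts;     // boosts[i] applies from startOffsets[i] onwards

    OffsetBoostLookup(int[] startOffsets, float[] boosts) {
        if (startOffsets.length == 0 || startOffsets.length != boosts.length) {
            throw new IllegalArgumentException("need one boost per entry, and at least one entry");
        }
        this.startOffsets = startOffsets.clone();
        this.boosts = boosts.clone();
    }

    // Assumes values are concatenated with the given separator after each one.
    static OffsetBoostLookup fromValues(String[] values, float[] boosts, String separator) {
        int[] starts = new int[values.length];
        int offset = 0;
        for (int i = 0; i < values.length; i++) {
            starts[i] = offset;
            offset += values[i].length() + separator.length();
        }
        return new OffsetBoostLookup(starts, boosts);
    }

    // Binary search for the last entry whose start offset is <= tokenStartOffset.
    float boostAt(int tokenStartOffset) {
        int lo = 0;
        int hi = startOffsets.length - 1;
        while (lo < hi) {
            int mid = (lo + hi + 1) >>> 1; // upper middle, so the loop always makes progress
            if (startOffsets[mid] <= tokenStartOffset) {
                lo = mid;
            } else {
                hi = mid - 1;
            }
        }
        return boosts[lo];
    }

    public static void main(String[] args) {
        // _all text: "quick fox lazy dog " -> "fox" starts at 6, "lazy" at 10.
        OffsetBoostLookup lookup = fromValues(
                new String[] {"quick fox", "lazy dog"}, new float[] {3.0f, 1.0f}, " ");
        System.out.println(lookup.boostAt(6));  // 3.0
        System.out.println(lookup.boostAt(10)); // 1.0
    }
}
```

Searching for the last start offset that is less than or equal to the token's offset, rather than for an exact match, is what lets a token in the middle of a value resolve to the entry that contains it.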

roytmana · 2013-12-05T17:48:50Z

@jpountz do you mind if I create another ticket with the expanded scope discussed in my first reply to your post? I feel the ability to boost individual text fragments, and particularly multi-fields, is a very powerful feature.
Or would you rather write it up yourself?


jpountz · 2013-12-05T23:03:55Z

@roytmana please open a ticket. I do think the ability to boost individual text fragments is very interesting!


ghost assigned jpountz Dec 3, 2013

jpountz mentioned this issue Dec 3, 2013

Fix _all boosting. #4326

Closed

jpountz closed this as completed in 309ee7d Dec 5, 2013
