Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Default position_offset_gap to 100 #12544

Closed
wants to merge 2 commits into from

Conversation

nik9000
Copy link
Member

@nik9000 nik9000 commented Jul 29, 2015

This is much more fiddly than it looks because of the way position_offset_gap
is applied in StringFieldMapper. Instead of setting the default to 10 its
simpler to make sure that all the analyzers default to 10 and that
StringFieldMapper doesn't override the default unless the user specifies
something different.

Closes #7268

@nik9000
Copy link
Member Author

nik9000 commented Jul 29, 2015

WIP because it needs tests and more testing. This is way way harder then it ought to be.

@nik9000
Copy link
Member Author

nik9000 commented Jul 29, 2015

Oh and it needs documentation changes and breaking changes documentation.

@rmuir
Copy link
Contributor

rmuir commented Jul 30, 2015

Which one is it? the position gap or the offset gap? Lucene has both, and they both have different meanings. Changing the former to this value makes sense, the latter will break many things.

@rmuir
Copy link
Contributor

rmuir commented Jul 30, 2015

And please, please, we have to rename this, because its totally meaningless. positions and offsets are separate things...

@nik9000
Copy link
Member Author

nik9000 commented Jul 30, 2015

I understand the rename. It's the positions.
On Jul 29, 2015 8:02 PM, "Robert Muir" notifications@github.com wrote:

And please, please, we have to rename this, because its totally
meaningless. positions and offsets are separate things...


Reply to this email directly or view it on GitHub
#12544 (comment)
.

@nik9000
Copy link
Member Author

nik9000 commented Aug 3, 2015

Does anyone have an opinion on what would be ok behavior from a backwards compatibility standpoint?

@nik9000
Copy link
Member Author

nik9000 commented Aug 3, 2015

Does anyone have an opinion on what would be ok behavior from a backwards compatibility standpoint?

Talked with @dakrone and we decided that the most correct thing as to make position_offset_gap be immutable once the index is created even if it isn't set. So indexes created before 2.0.0-beta1 will always have a default position_offset_gap of 0 regardless of which version of elasticsearch operates on them. Indexes created by versions of Elasticsearch on or after 2.0.0-beta1 will default to 10 like the issue says they should.

@nik9000 nik9000 added >feature >breaking review :Search/Mapping Index mappings, including merging and defining field types and removed WIP labels Aug 3, 2015
@nik9000
Copy link
Member Author

nik9000 commented Aug 3, 2015

I believe this is ready for review. @jpountz - its more fiddly than I expected at first so it might be worth a review from you or someone who's pretty experienced with mapping.

@mute - I think this is the right way to do #12538. Its much more complicated than it looked and certainly wasn't really low hanging fruit.

@nik9000
Copy link
Member Author

nik9000 commented Aug 11, 2015

I'd love this in 2.0.0 at some point. It'll need a review soon if it is going to.

@@ -150,7 +150,7 @@ limit to `32766 / 3 = 10922` since UTF-8 characters may occupy at most 3
bytes.

|`position_offset_gap` |Position increment gap between field instances
with the same field name. Defaults to 0.
with the same field name. Defaults to the default of the analyzer which is 10.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

10 seems so arbitrary, but I'm OK with it. I assume you didn't pick 1 because then a sloppy phrase query with slop=1 would incorrectly match...

But, can we explain why it's non-zero here? E.g. just add "... so that proximity queries like PhraseQuery do not incorrectly match across 2 values of a multi-valued field"?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah! I'll do that.

10 is totally arbitrary. Its much more than 1 which is too low. I'd be happy to change it to 1024 or some other magic number. I just want it much much more than 1.

@clintongormley
Copy link

I agree that 10 is small, and could quite easily overlap with a typical slop value. Personally i'd go for eg 100

@@ -155,6 +173,9 @@ public StringFieldMapper build(BuilderContext context) {
iterator.remove();
} else if (propName.equals("position_offset_gap")) {
builder.positionOffsetGap(XContentMapValues.nodeIntegerValue(propNode, -1));
if (builder.positionOffsetGap < 0) {
throw new MapperParsingException("positions_offset_gap less than 0 aren't allowed.");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know the exc we throw here effectively discards builder but I think it's less smelly to first assign to a local "untrusted" var first, e.g. int newPositionOffsetGap, and check whether that's < 0, before calling builder.positionOffsetGap(newPositionOffsetGap).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah - I thought about that. I'll make that change.

@nik9000
Copy link
Member Author

nik9000 commented Aug 11, 2015

Ok - I'll make these changes. I don't think I'll have time today because I'm in class and can't really concentrate. But tonight or tomorrow "morning".

analyzer = (NamedAnalyzer) analyzerF;
if (overridePositionOffsetGap >= 0 && analyzer.getPositionIncrementGap(analyzer.name()) != overridePositionOffsetGap) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe add some comments here explaining? I'm confused...

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah - this was confusing when I wrote it. Its funky and deserves more comments....

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok - added some comments above.

@mikemccand
Copy link
Contributor

Thanks @nik9000 this is a great change (so prox aware queries never match across 2 values of a multi-valued field), I just left some minor comments.

Can you open a follow-on issue to rename positionOffsetGap? Need not block this good change, but this is extremely confusing :) As @rmuir said, position gap and offset gap are wildly different things!

@nik9000
Copy link
Member Author

nik9000 commented Aug 11, 2015

Can you open a follow-on issue to rename positionOffsetGap? Need not block this good change, but this is extremely confusing :) As @rmuir said, position gap and offset gap are wildly different things!

#12562

@nik9000 nik9000 added v2.1.0 and removed v2.0.0 labels Aug 24, 2015
@nik9000
Copy link
Member Author

nik9000 commented Aug 24, 2015

Swapped 2.0 for 2.1 because we're too late in the 2.0 release cycle to get this merged there.

I'm going to pick this one back up in a few minutes and have another read through. I'll see if I can address that last open comment of @mikemccand and rebase. And I'll change all the 2.0s into 2.1s in the code.

@nik9000 nik9000 changed the title Mapping: Default position_offset_gap to 10 Mapping: Default position_offset_gap to 100 Aug 24, 2015
This is much more fiddly than you'd expect it to be because of the way
position_offset_gap is applied in StringFieldMapper. Instead of setting
the default to 100 its simpler to make sure that all the analyzers default
to 100 and that StringFieldMapper doesn't override the default unless the
user specifies something different. Unless the index was created before
2.1, in which case the old default of 0 has to take.

Also postition_offset_gaps less than 0 aren't allowed at all.

New tests test that:
1. the new default doesn't match phrases across values with reasonably low
slop (5)
2. the new default doest match phrases across values with reasonably high
slop (50)
3. you can override the value and phrases work as you'd expect
4. if you leave the value undefined in the mapping and define it on a
custom analyzer the the value from the custom analyzer shines through

Closes elastic#7268
@@ -0,0 +1,651 @@
[[mapping-core-types]]
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

grrr - how did I add a whole new file....

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah. I figured it out - rebase woes. Fixing.

My rebased added core-types back which was removed in the mean time. This
properly re-removes the file and moves the updated documentation to the
string field mapper docs.
@nik9000
Copy link
Member Author

nik9000 commented Aug 24, 2015

@mikemccand - would you mind having another look at this? Its now ready for review again based on 2.1.

@mikemccand
Copy link
Contributor

LGTM, thanks @nik9000!

@s1monw
Copy link
Contributor

s1monw commented Aug 25, 2015

@nik9000 this looks good I think this should go into 2.0 though

@nik9000
Copy link
Member Author

nik9000 commented Aug 25, 2015

@nik9000 this looks good I think this should go into 2.0 though

Ok - I'll merge to 2.1 now and backport it. I'm not super comfortable sticking it in 2.0 but I'll try because you want it in there.

@nik9000 nik9000 changed the title Mapping: Default position_offset_gap to 100 Default position_offset_gap to 100 Aug 25, 2015
@nik9000
Copy link
Member Author

nik9000 commented Aug 25, 2015

I merged this into master but github doesn't recognize it - probably because I merged it locally and pushed. That failed so I rebased onto elastic's master and squashed. At this point github's must have lost any connection to the patch.

I'll leave this open to work the backport to 2.0. I'll close it once I've merged there.

* values.
*/
public static final int POSITION_OFFSET_GAP = 100;
public static final int POSITION_OFFSET_GAP_PRE_2_1 = 0;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change is going to 2.0 branch right? Do we need to change this to 2_0?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sadly, yup. I had originally targeted 2.0, then gave up on making it in, then @s1monw said we should jam it in there anyway. So I got what I had reviewed merged as quickly as I could. It's looking like it'll be reasonably quick to merge to 2.0 so I'll do that and then go fix master so it makes sense.

@nik9000 nik9000 added the v2.0.0 label Aug 25, 2015
=== Mapping changes

==== position_offset_gap
The default `position_offset_grap` is now 100. Indexes created in Elasticsearch
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

position_offset_grap -> position_offset_gap

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok - Will fix this one right quick.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@nik9000
Copy link
Member Author

nik9000 commented Aug 25, 2015

And merged to 2.0. For my last trick on this pull request I'll fix master so that it has the same version range checks as 2.0. And then finally, finally, this is done.

@nik9000
Copy link
Member Author

nik9000 commented Aug 25, 2015

And merged to 2.0. For my last trick on this pull request I'll fix master so that it has the same version range checks as 2.0. And then finally, finally, this is done.

Scratch that, I'll send that as another pull request.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
>breaking :Search/Mapping Index mappings, including merging and defining field types v2.0.0-beta2
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants