Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Change numeric data types to use SORTED_NUMERIC docvalues type #6967

Closed
wants to merge 6 commits into from

Conversation

rmuir
Copy link
Contributor

@rmuir rmuir commented Jul 22, 2014

Change numeric data types to use SORTED_NUMERIC docvalues type instead of a custom encoding in BINARY.

In low level benchmarks this is 2x to 5x faster: its also optimized for the common case where fields actually only contain at most one value for each document.

Additionally SORTED_NUMERIC doesn't lose values if they appear more than once, so mathematical computations such as averages are correct.

instead of a custom encoding in BINARY.

In low level benchmarks this is 2x to 5x faster: its also optimized
for the common case where fields actually only contain at most one
value for each document.

Additionally SORTED_NUMERIC doesn't lose values if they appear more
than once, so mathematical computations such as averages are correct.
@s1monw
Copy link
Contributor

s1monw commented Jul 22, 2014

I did a review and it looks great. One thing that I really would want to see here is a BWC test that creates & uses the numeric variants with DV on a mixed version cluster and then upgrades the cluster and checks if we are still operating fine. One way of doing this is to simply add some sorting with doubles / longs to BasicBackwardsCompatibilityTest#testIndexRollingUpgrade as well as BasicBackwardsCompatibilityTest#testIndexAndSearch the dynamic index template should take care of randomly selecting docvalues there so simple sorting or faceting should do the job

@s1monw s1monw removed the review label Jul 22, 2014
s1monw added a commit to s1monw/elasticsearch that referenced this pull request Jul 22, 2014
Today we only do count searches to ensure sane results are returned
after upgrading etc. This change adds sorting to the picture asserting
on simple numeric sorting that uses field data etc. after upgrading.

Relates to elastic#6967
@@ -107,7 +109,13 @@ public Builder numericType(NumericType type) {
assert !numericType.isFloatingPoint();
return new NumericDVIndexFieldData(index, fieldNames, mapper.fieldDataType());
} else if (numericType != null) {
return new BinaryDVNumericIndexFieldData(index, fieldNames, numericType, mapper.fieldDataType());
Version version = indexSettings.getAsVersion(IndexMetaData.SETTING_VERSION_CREATED, org.elasticsearch.Version.CURRENT);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is the Version.indexCreated method that does the same with additional assertions

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks I will switch!

s1monw added a commit that referenced this pull request Jul 22, 2014
Today we only do count searches to ensure sane results are returned
after upgrading etc. This change adds sorting to the picture asserting
on simple numeric sorting that uses field data etc. after upgrading.

Relates to #6967
@rmuir
Copy link
Contributor Author

rmuir commented Jul 23, 2014

@jpountz spotted a horrible backwards break here, I'm too used to the lucene system. I will make sure back compat tests are failing with the current PR first and then update...

@rmuir rmuir added the review label Jul 23, 2014
@rmuir
Copy link
Contributor Author

rmuir commented Jul 23, 2014

I fixed the back compat: backwards tests pass now.

@s1monw
Copy link
Contributor

s1monw commented Jul 23, 2014

LGTM @jpountz can you take another look please

@jpountz
Copy link
Contributor

jpountz commented Jul 23, 2014

I just started. :)

@@ -189,6 +199,8 @@ protected NumberFieldMapper(Names names, int precisionStep, float boost, FieldTy
}
this.ignoreMalformed = ignoreMalformed;
this.coerce = coerce;
Version v = indexSettings == null ? Version.CURRENT : Version.indexCreated(indexSettings);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should open an issue about these null settings, that's worrying!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I opened #6993 as a followup!

@jpountz
Copy link
Contributor

jpountz commented Jul 23, 2014

LGTM

@jpountz jpountz closed this Jul 23, 2014
@jpountz jpountz reopened this Jul 23, 2014
@jpountz
Copy link
Contributor

jpountz commented Jul 23, 2014

sorry, closed by mistake. Just reopened

@rmuir rmuir closed this in 66825ac Jul 23, 2014
rmuir added a commit that referenced this pull request Jul 23, 2014
instead of a custom encoding in BINARY.

In low level benchmarks this is 2x to 5x faster: its also optimized
for the common case where fields actually only contain at most one
value for each document.

Additionally SORTED_NUMERIC doesn't lose values if they appear more
than once, so mathematical computations such as averages are correct.

Closes #6967
@jpountz jpountz removed the review label Jul 24, 2014
@clintongormley clintongormley changed the title Change numeric data types to use SORTED_NUMERIC docvalues type Internal: Change numeric data types to use SORTED_NUMERIC docvalues type Sep 8, 2014
@clintongormley clintongormley added the :Core/Infra/Core Core issues without another label label Jun 7, 2015
@clintongormley clintongormley changed the title Internal: Change numeric data types to use SORTED_NUMERIC docvalues type Change numeric data types to use SORTED_NUMERIC docvalues type Jun 7, 2015
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants