Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add a benchmark for the numeric facet builder and use sort.Sort in it (just like for the terms one) #134

Merged
merged 2 commits into from
Mar 6, 2015

Conversation

Shugyousha
Copy link
Contributor

I am not 100% sure that the benchmark is measuring the right thing so some checking would be advised. In its current form it adds 10/100/1000 ranges to the range facet builder and then updates it with a very limited set of 4 prefix coded values over and over again. Afterwards it measures how long the Result() function takes to return its result.

If the benchmark is indeed mimicking real world uses, using the slice-based Result()-implementation is a net win just like with the term facet builder. Here are the numbers (on my i7 under Linux).

Old (using a list):

PASS
BenchmarkNumericFacet10   100000             14304 ns/op
BenchmarkNumericFacet100           10000            147730 ns/op
BenchmarkNumericFacet1000            300           4351645 ns/op
...

New (using sort.Sort):

PASS
BenchmarkNumericFacet10   200000              8603 ns/op
BenchmarkNumericFacet100           20000             73970 ns/op
BenchmarkNumericFacet1000           2000            782526 ns/op
...

@mschoch
Copy link
Member

mschoch commented Jan 16, 2015

Sorry I've been slow to get around to this one. I think this is the right approach, though I'm not sure the tests do the right thing.

Do you remember why you manually created the nu.PrefixCoded values? Or what the significance was of the 4 values you chose here? Shugyousha@222770b#diff-6f28ca93dc2a04f0ba1465c0b1902f0eR14

@Shugyousha
Copy link
Contributor Author

On Fri, Jan 16, 2015 at 03:02:34PM -0800, Marty Schoch wrote:

Sorry I've been slow to get around to this one. I think this is the

No problem!

right approach, though I'm not sure the tests do the right thing.

What I gathered from reading the code is that you create a
NumericFacetBuilder (NFB) for a field and then create Ranges for it using
the AddRange method. Each range has a min and max value as well as a name.

To use this builder you have to call its Update() method. This method
takes a FieldTerm instance as an argument which maps a field name
to a string slice. The strings in the slice are then interpreted as
nu.PrefixCoded values and counted for a Range (previously added to the
builder) if the value falls between the min/max of the Range.

In my benchmark I add a number of ranges (10-10'000) each with a slightly
different name and min/max values. For each Range added I call Update()
on the NFB with the same 4 PrefixCoded values.

Considering that what this benchmark eventually measures is the time it
takes for the NFB to return the NumericRanges with the highest Counts
this may not be ideal. Updating the NFB with the 4 values for each Range
added means that we have only 40 values added through Update() when 10
Ranges are added, but 40'000 when we have 10'000 ranges. A more typical
use case may be to have 10'000 values with 10 ranges. Since I am only
benchmarking the Result() function though this should not matter as long
as we have a few Ranges with differing counts to sort and return from it.

Looking at the code again now, I see one potential issue. The count of
NumericRanges returned is always the same as the total of NumericRanges
that has been added to the NFB. This does not matter for my implementation
using sort.Sort since all Ranges are being sorted in Result() regardless
of whether only a subset or all of them are being returned. For the old
implementation using a List however, that means that the complete list
has to be built incrementally before being returned which is the worst
case for this implementation while at the same time probably not being
a very common use case.

It may be better to add a fixed amount of NumericRanges (1000 max)
and returning a variable subset of them (5,10,100,500 maybe?). What do
you think?

Do you remember why you manually created the nu.PrefixCoded values?

As far as I understood I just needed some arbitrary test data that is
prefix encoded. It does not really matter what those values are (but
see below).

Or what the significance was of the 4 values you chose here?
Shugyousha@222770b#diff-6f28ca93dc2a04f0ba1465c0b1902f0eR14

There is no real significance, I am afraid. I am just not too familiar
with the PrefixCoded format for numbers so I took the first 4 values
that are being used in prefix_coded_test.go. These have a shift value
of 0 which are the only ones being handled by the NFB.Update() method
(why that is I am not clear about though). If I added PrefixCoded values
that are not shifted by 0 we would have only Ranges in the NFB with a
Count of 0 which would skew the benchmark results for sorting them, or
putting them into a linked list (in the old implementation) respectively.

mschoch added a commit that referenced this pull request Mar 6, 2015
Add a benchmark for the numeric facet builder and use sort.Sort in it (just like for the terms one)
@mschoch mschoch merged commit eaccd74 into blevesearch:master Mar 6, 2015
mschoch added a commit that referenced this pull request Mar 14, 2016
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants