Add nth percentile calculation #137

Merged
merged 16 commits into from Oct 12, 2017
20 changes: 2 additions & 18 deletions lib/benchee/statistics.ex
@@ -5,6 +5,7 @@ defmodule Benchee.Statistics do
"""

alias Benchee.Statistics.Mode
alias Benchee.Statistics.Percentile

defstruct [:average, :ips, :std_dev, :std_dev_ratio, :std_dev_ips, :median,
:mode, :minimum, :maximum, :sample_size]
@@ -159,7 +160,7 @@ defmodule Benchee.Statistics do
deviation = standard_deviation(run_times, average, iterations)
standard_dev_ratio = deviation / average
standard_dev_ips = ips * standard_dev_ratio
median = compute_median(run_times, iterations)
median = Percentile.percentile(run_times, iterations, 50)
Collaborator:

Good call re-using this! I never would have put that together myself.

Collaborator (author):

Hahaha, I think you would have. I didn't realize until I kept seeing the same values for median and 50th percentile. And reading!

Member:

It's non-obvious, yeah, but this is how I imagined it. Thanks 😁
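
The reuse discussed above works because a 50th percentile with this interpolation scheme is the median by construction. A standalone sketch of that equivalence (the `MedianSketch` module name is hypothetical, and this mirrors rather than calls Benchee's code; it assumes at least two samples):

```elixir
# Standalone sketch: with rank = 0.5 * (n + 1), the 50th percentile lands
# exactly halfway between the two middle elements of an even-length list
# (and directly on the middle element of an odd-length one) - the median.
defmodule MedianSketch do
  def median_via_percentile(samples) do
    sorted = Enum.sort(samples)
    rank = 0.5 * (length(samples) + 1)  # e.g. n = 4 -> rank = 2.5
    index = trunc(rank)                 # 1-based rank of the lower neighbour
    [lower, upper | _] = Enum.drop(sorted, index - 1)
    lower + (rank - index) * (upper - lower)
  end
end

IO.inspect(MedianSketch.median_via_percentile([4, 1, 3, 2]))  # 2.5
IO.inspect(MedianSketch.median_via_percentile([3, 1, 2]))     # 2.0
```

For the odd-length case the interpolation weight `rank - index` is zero, so the middle element comes back unchanged (as a float).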

mode = Mode.mode(run_times)
minimum = Enum.min run_times
maximum = Enum.max run_times
@@ -189,21 +190,4 @@ defmodule Benchee.Statistics do
variance = total_variance / iterations
:math.sqrt variance
end

defp compute_median(run_times, iterations) do
# this is rather inefficient, as O(log(n) * n + n) - there are
# O(n) algorithms to do compute this should it get to be a problem.
sorted = Enum.sort(run_times)
middle = div(iterations, 2)

if Integer.is_odd(iterations) do
sorted |> Enum.at(middle) |> to_float
else
(Enum.at(sorted, middle) + Enum.at(sorted, middle - 1)) / 2
end
end

defp to_float(maybe_integer) do
:erlang.float maybe_integer
end
end
86 changes: 86 additions & 0 deletions lib/benchee/statistics/percentile.ex
@@ -0,0 +1,86 @@
defmodule Benchee.Statistics.Percentile do
@moduledoc false

@doc """
Calculates the value at the `percentile_number`-th percentile. Think of this as the
value below which `percentile_number` percent of the samples lie. For example,
if `Benchee.Statistics.Percentile.percentile(samples, 99)` == 123.45,
99% of samples are less than 123.45.
Member:

less than or equal to imo?

Collaborator (author):

Definitely open to improvements in the documentation. My understanding is that this value we are calculating is a "less than" number. Here's what Wikipedia says:

A percentile (or a centile) is a measure used in statistics indicating the value below which a given percentage of observations in a group of observations fall. For example, the 20th percentile is the value (or score) below which 20% of the observations may be found.

The term percentile and the related term percentile rank are often used in the reporting of scores from norm-referenced tests. For example, if a score is at the 86th percentile, where 86 is the percentile rank, it is equal to the value below which 86% of the observations may be found (carefully contrast with in the 86th percentile, which means the score is at or below the value of which 86% of the observations may be found - every score is in the 100th percentile).

Member:

imo it's or equal. E.g.

When we have 10 values 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 - the 80th percentile would be 8 and 8 is a member of that 80th percentile group so from my understanding it should be less than or equal

Collaborator (author):

However....

iex(2)> Percentile.percentile([1, 2, 3, 4, 5, 6, 7, 8, 9, 10], 80)
8.8

Member:

but.... but... I guess I have to take a look at the calculation again then and the involved values or the algorithm used to obtain them. I had some suspicions about indexes being off somewhere... for my understanding of 80th percentile (aka 80 percent of the data) this should most definitely return 8 - or where am I going wrong?

Member:

After cross-checking, this seems to be the right value (tm), but it still appears weird to me. I mean, 80% of my values are up to and including 8. In this particular case I would have understood if it were like 8.2 or something... 8.8 just plain seems too weird. Need to up my percentile knowledge, apparently.
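
The puzzling 8.8 falls straight out of the type-6 rank arithmetic; a standalone re-derivation in plain Elixir (mirroring, not calling, the module's clauses):

```elixir
# Why percentile(1..10, 80) comes out as 8.8 under type-6 interpolation:
#   rank  = (80 / 100) * (10 + 1) = 8.8
#   index = trunc(8.8) = 8          -> lower = 8, upper = 9
#   value = 8 + 0.8 * (9 - 8) = 8.8
samples = Enum.to_list(1..10)
rank = 80 / 100 * (length(samples) + 1)
index = trunc(rank)
[lower, upper | _] = Enum.drop(Enum.sort(samples), index - 1)
value = lower + (rank - index) * (upper - lower)
IO.inspect(Float.round(value, 4))  # 8.8
```

So the "or equal" intuition (8 as the answer) corresponds to a different interpolation type; type-6 deliberately interpolates 80% of the way from the 8th value toward the 9th.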


## Examples

iex> Benchee.Statistics.Percentile.percentile([5, 3, 4, 5, 1, 3, 1, 3], 8, 100)
Collaborator:

Thank you for the lovely doctests 🎉

Member:

🎉 🎉 🎉

5.0

iex> Benchee.Statistics.Percentile.percentile([5, 3, 4, 5, 1, 3, 1, 3], 8, 150)
5.0

iex> Benchee.Statistics.Percentile.percentile([5, 3, 4, 5, 1, 3, 1, 3], 8, 0)
1.0

iex> Benchee.Statistics.Percentile.percentile([5, 3, 4, 5, 1, 3, 1, 3], 8, -1)
1.0

iex> Benchee.Statistics.Percentile.percentile([5, 3, 4, 5, 1, 3, 1, 3], 50)
3.0

iex> Benchee.Statistics.Percentile.percentile([5, 3, 4, 5, 1, 3, 1, 3], 75)
4.75
"""
@spec percentile(list(number()), integer()) :: float()
def percentile(samples, percentile_number) do
percentile(samples, length(samples), percentile_number)
end

@spec percentile(list(number()), integer(), integer()) :: float()
def percentile(samples, number_of_samples, percentile_number) when percentile_number > 100 do
Collaborator:

First off, do we want percentile/3 to be public? It seems to me like the caller can in all cases use percentile/2 and be totally cool.

What would you think about failing in this case and the one on line 40, and giving a helpful error message? I can't imagine that a user would give us a percentage of more than 99 or less than 1 on purpose. I think it would be likely that these would be cases of typos, and that by coercing these error cases we might be giving users results they think are strange and hiding their typos when we should be telling them how to fix their typos and get the results they want.

Collaborator (author):

@devonestes so....the reason I have percentile/3 is because Statistics has both samples and iterations. I assumed that was to avoid calculating the length of the list over and over again (though I'm not certain about that). So—you're right, percentile/2 is really the only necessary interface, unless we want to be able to pass in number_of_samples (i.e. iterations) for efficiency. Have we.....benchmarked? 😁

Collaborator (author):

What would you think about failing in this case and the one on line 40, and giving a helpful error message?

Actually that was my first inclination, but then I backed it out. What do you imagine, throwing something like raise ArgumentError, "bad percentile value: #{bad_value}. Percentile must be greater than 0 and less than 100"?

Member:

For performance see #139 - with what is reasonable to expect to hit us (~1 million) it takes ~4ms on my machine. We might have multiple scenarios, so let's say * 10; that gives 40ms, which is ok I guess ;) For 10 million we'd be at about 300ms total according to this calculation. Not noticeable for the given scenario, user-wise.
However, that's not the real reason I reused iterations - it just feels bad to me when I have calculated a value in a top-level method and then just recalculate the value in a method directly called from that method. Such waste, could just pass it along.
When I eventually extract a library called statistex from benchee, I think I'd keep the /3 version because I'd like to give that possibility to people, as they might operate on bigger data and it might make a difference for them.

Raising an argument error sounds nice! 👍

Collaborator (author):

so, not sure where we stand with percentile/2 vs percentile/3. Speed-wise, not a big deal (thanks @PragTob). Actually, percentile/2 is not used here at all, I just included it because it felt like the best interface. And I think there is a difference between carrying along the precomputed length within a module, versus allowing it across modules. I mean, if you pass the wrong number there, you get wrong results. So, for that reason, I'm going to remove the percentile/3 version. It's trivial to add back again in the future.
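
The validation floated above might look something like the following sketch - a hypothetical `PercentileSketch` module combining the suggested `ArgumentError` with a `percentile/2`-only interface; one possible shape, not necessarily what this PR merged:

```elixir
# Sketch of the suggested validation: reject out-of-range percentiles
# loudly instead of silently clamping them to 0 or 100.
defmodule PercentileSketch do
  def percentile(_samples, percentile_number)
      when percentile_number <= 0 or percentile_number >= 100 do
    raise ArgumentError,
          "bad percentile value: #{percentile_number}. " <>
            "Percentile must be greater than 0 and less than 100"
  end

  def percentile(samples, percentile_number) do
    sorted = Enum.sort(samples)
    rank = percentile_number / 100 * (length(samples) + 1)
    index = trunc(rank)

    cond do
      index == 0 ->
        :erlang.float(hd(sorted))

      index >= length(sorted) ->
        :erlang.float(List.last(sorted))

      true ->
        [lower, upper | _] = Enum.drop(sorted, index - 1)
        lower + (rank - index) * (upper - lower)
    end
  end
end

IO.inspect(PercentileSketch.percentile([5, 3, 4, 5, 1, 3, 1, 3], 50))  # 3.0
IO.inspect(PercentileSketch.percentile([5, 3, 4, 5, 1, 3, 1, 3], 75))  # 4.75
```

With the guard in place, the `> 100` and `< 0` clamping clauses (and the `100`/`150`/`0`/`-1` doctests above) would become compile-time-unreachable and could be dropped.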

percentile(samples, number_of_samples, 100)
end

def percentile(samples, number_of_samples, percentile_number) when percentile_number < 0 do
percentile(samples, number_of_samples, 0)
end

def percentile(samples, number_of_samples, percentile_number) do
sorted = Enum.sort(samples)
rank = (percentile_number / 100) * max(0, number_of_samples + 1)
Member:

why number_of_samples + 1 - i.e. why the + 1, shouldn't we be good without it?

Collaborator (author):

@PragTob I had the same thought. But it's part of the formula. everything comes out wrong if you just use number_of_samples. Imagine you have a list of 10 items, and you're looking for the 50th percentile (median):

currently: rank = (50 / 100) * 11 = 5.5
without + 1: rank = (50 / 100) * 10 = 5

So, as-is, we look for the value halfway between the 5th and 6th elements (yay, median). If we remove the +1, then we look for the value of the 5th element (boo, not the median).

Member:

makes sense... for the odd case of 9, 1/2 * 9 would also be 4.5 but we are looking for 5. Thanks.

However.... dum dum dum - we are 0 indexed. So in my example above we should be looking at number 4 not number 5? But maybe that is redeemed through the drop... weird.

I think it's ok as the median tests I wrote back in the day + your tests are passing. One of these things where I trust the tests more than my brain :D
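
The `+ 1` question can be checked numerically; a standalone sketch of the rank arithmetic for both even and odd sample counts (plain Elixir, independent of the module):

```elixir
# Type-6 rank for the p-th percentile over n samples: rank = p/100 * (n + 1).
rank = fn n, p -> p / 100 * (n + 1) end

IO.inspect(rank.(10, 50))  # 5.5 -> halfway between the 5th and 6th values
IO.inspect(rank.(9, 50))   # 5.0 -> exactly the 5th (middle) value

# The 0-indexing worry is absorbed by Enum.drop/2: the 1-based rank is
# converted to a 0-based drop count via `index - 1`.
sorted = Enum.sort([10, 9, 8, 7, 6, 5, 4, 3, 2, 1])
index = trunc(rank.(10, 50))                        # 5
[lower, upper | _] = Enum.drop(sorted, index - 1)   # drops 4 -> [5, 6, ...]
IO.inspect(lower + (rank.(10, 50) - index) * (upper - lower))  # 5.5
```

Without the `+ 1`, the even case would give rank 5 and pick the 5th value instead of the median, matching the author's point above.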

percentile_value(sorted, rank)
end

defp percentile_value(sorted, rank) when trunc(rank) == 0 do
Collaborator:

I had never seen trunc before - very cool!

Member:

With our new error throwing ways we shouldn't need this method anymore :)

Member:

Same thing for the function version below but it won't let me comment :)

Collaborator (author):

this probably got buried in my initial comment, but I added this back as a private function mainly so we don't have to sort the list again while calculating multiple percentiles on the same list

Member:

I meant the specific == 0, but maybe we can't get rid of it :'(

Collaborator (author):

I'll try—it's actually a guard for the Enum.drop later, to make sure we have enough elements to make a match. But I'm not sure it adequately protects us. I need to write some more tests.

sorted
|> hd
|> to_float
end

defp percentile_value(sorted, rank) when trunc(rank) >= length(sorted) do
sorted
|> Enum.reverse
|> hd
|> to_float
end

defp percentile_value(sorted, rank) do
index = trunc(rank)
[lower_bound, upper_bound | _] = Enum.drop(sorted, index - 1)
Member:

at first I was pretty skeptical, but this seems like a great usage of drop - thanks! 👍

interpolation_value = interpolation_value(lower_bound, upper_bound, rank)
lower_bound + interpolation_value
end

# "Type 6" interpolation strategy. There are many ways to interpolate a value
# when the rank is not an integer (in other words, we don't exactly land on a
# particular sample). Of the 9 main strategies, (types 1-9), types 6, 7, and 8
# are generally acceptable and give similar results.
#
# For more information on interpolation strategies, see:
# - https://stat.ethz.ch/R-manual/R-devel/library/stats/html/quantile.html
# - http://www.itl.nist.gov/div898/handbook/prc/section2/prc262.htm
Member:

👍 information

defp interpolation_value(lower_bound, upper_bound, rank) do
interpolation_weight = rank - trunc(rank)
interpolation_weight * (upper_bound - lower_bound)
Member:

😍

end

defp to_float(maybe_integer) do
:erlang.float maybe_integer
end
end
26 changes: 26 additions & 0 deletions test/benchee/statistics/percentile_test.exs
@@ -0,0 +1,26 @@
defmodule Benchee.Statistics.PercentileTest do
use ExUnit.Case, async: true
alias Benchee.Statistics.Percentile
doctest Percentile

@nist_sample_data [
95.1772,
95.1567,
95.1937,
95.1959,
95.1442,
95.0610,
95.1591,
95.1195,
95.1065,
95.0925,
95.1990,
95.1682
]

@tag run: true
Member:

we don't need the tag, do we? :)

Collaborator (author):

😊

test "90th percentile" do
result = Percentile.percentile(@nist_sample_data, 90)
assert Float.round(result, 4) == 95.1981
end
end
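
The expected value in the test above can be reproduced by hand with the type-6 formula the module documents (standalone Elixir, independent of Benchee):

```elixir
# Hand-check of the NIST 90th-percentile test value:
#   n = 12, rank = 0.9 * 13 = 11.7, so interpolate 70% of the way
#   between the 11th and 12th sorted values.
nist = [95.1772, 95.1567, 95.1937, 95.1959, 95.1442, 95.0610,
        95.1591, 95.1195, 95.1065, 95.0925, 95.1990, 95.1682]
sorted = Enum.sort(nist)
rank = 90 / 100 * (length(nist) + 1)               # 11.7
index = trunc(rank)                                # 11
[lower, upper | _] = Enum.drop(sorted, index - 1)  # [95.1959, 95.1990]
value = lower + (rank - index) * (upper - lower)
IO.inspect(Float.round(value, 4))                  # 95.1981
```

That is 95.1959 + 0.7 * 0.0031 = 95.19807, which rounds to the 95.1981 the test asserts.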