Correctly handle samples with heavily clustered values (gh-11)

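If a sample contains many values clustered around a single value, its
quartiles can nearly coincide, shrinking the interquartile range toward
zero. The low and high fences then collapse onto the same point, which
throws off classifyOutliers.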

Before this change, we could easily end up classifying a value as
both a low and a high outlier, thereby counting it in more than one
bucket at a time (which should not happen). As a result, we would
sometimes report more outliers in a data set than it had sample values.

With this change, every outlier should be classified into a single
bucket. Our estimate-based weighted average method can still lead
to the wrong bucket being chosen, but at least there should be only
one bucket now!
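The double counting described above is easy to reproduce. Below is a minimal sketch, not Criterion's actual code, that applies the pre-fix predicates to one value when the quartiles coincide. `fences` and `oldBuckets` are hypothetical helpers, and the high fences `q3 + iqr * 1.5` and `q3 + iqr * 3` are an assumption, since those definitions fall outside the diff.

```haskell
-- Hypothetical helper: the four fences (loS, loM, hiM, hiS).
-- ASSUMPTION: hiM and hiS mirror loM and loS around q3; the diff does
-- not show those two definitions.
fences :: Double -> Double -> (Double, Double, Double, Double)
fences q1 q3 = (q1 - iqr * 3, q1 - iqr * 1.5, q3 + iqr * 1.5, q3 + iqr * 3)
  where
    iqr = q3 - q1

-- Bucket flags (lowSevere, lowMild, highMild, highSevere) for one value,
-- using the predicates as they were BEFORE this commit.
oldBuckets :: Double -> Double -> Double -> (Int, Int, Int, Int)
oldBuckets q1 q3 e =
  ( flag (e <= loS)
  , flag (e > loS && e <= loM)
  , flag (e >= hiM && e < hiS)
  , flag (e >= hiS)
  )
  where
    (loS, loM, hiM, hiS) = fences q1 q3
    flag b = if b then 1 else 0
```

In GHCi, `oldBuckets 5 5 5` evaluates to `(1,0,0,1)`. With `q1 = q3 = 5` every fence equals 5, so the value satisfies both `e <= loS` and `e >= hiS` and is counted as a low and a high severe outlier at once.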
commit ea5cd9edd3f634929aa83aa735d1c2d10bb29abf (1 parent: 53d3203)
Authored by @bos
Showing 1 changed file with 2 additions and 2 deletions.
  1. +2 −2 Criterion/Analysis.hs
Criterion/Analysis.hs (4 lines changed)
@@ -46,10 +46,10 @@ classifyOutliers :: Sample -> Outliers
classifyOutliers sa = U.foldl' ((. outlier) . mappend) mempty ssa
where outlier e = Outliers {
samplesSeen = 1
- , lowSevere = if e <= loS then 1 else 0
+ , lowSevere = if e <= loS && e < hiM then 1 else 0
, lowMild = if e > loS && e <= loM then 1 else 0
, highMild = if e >= hiM && e < hiS then 1 else 0
- , highSevere = if e >= hiS then 1 else 0
+ , highSevere = if e >= hiS && e > loM then 1 else 0
}
loS = q1 - (iqr * 3)
loM = q1 - (iqr * 1.5)
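For comparison, here is a self-contained sketch of the patched classification. It simplifies under stated assumptions: it folds over a plain list with `foldMap` instead of `U.foldl'` over an unboxed vector, takes `q1` and `q3` as arguments rather than estimating them from the sorted sample, and assumes the `Outliers` Semigroup/Monoid instances simply add field by field. `classifyWith` is a hypothetical name, and `hiM`/`hiS` are assumed definitions.

```haskell
-- Simplified stand-in for Criterion's Outliers record. Field names come
-- from the diff; the Semigroup/Monoid instances are ASSUMED, not copied.
data Outliers = Outliers
  { samplesSeen :: !Int
  , lowSevere   :: !Int
  , lowMild     :: !Int
  , highMild    :: !Int
  , highSevere  :: !Int
  } deriving (Eq, Show)

-- Combining two tallies adds their counts field by field.
instance Semigroup Outliers where
  Outliers a b c d e <> Outliers a' b' c' d' e' =
    Outliers (a + a') (b + b') (c + c') (d + d') (e + e')

instance Monoid Outliers where
  mempty = Outliers 0 0 0 0 0

-- Hypothetical classifier using the predicates from AFTER the fix.
-- foldMap stands in for the original strict U.foldl' over a vector.
classifyWith :: Double -> Double -> [Double] -> Outliers
classifyWith q1 q3 = foldMap outlier
  where
    outlier e = Outliers
      { samplesSeen = 1
      , lowSevere   = flag (e <= loS && e < hiM)  -- must also be below hiM
      , lowMild     = flag (e > loS && e <= loM)
      , highMild    = flag (e >= hiM && e < hiS)
      , highSevere  = flag (e >= hiS && e > loM)  -- must also be above loM
      }
    iqr = q3 - q1
    loS = q1 - iqr * 3
    loM = q1 - iqr * 1.5
    hiM = q3 + iqr * 1.5  -- assumed; not shown in the diff
    hiS = q3 + iqr * 3    -- assumed; not shown in the diff
    flag b = if b then 1 else 0
```

With `q1 = q3 = 5`, `classifyWith 5 5 [1, 5, 5, 9]` gives `Outliers {samplesSeen = 4, lowSevere = 1, lowMild = 0, highMild = 0, highSevere = 1}`. The clustered 5s are no longer counted as outliers at all, and each extreme value lands in exactly one bucket.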