-
Notifications
You must be signed in to change notification settings - Fork 10
/
Bench.hs
2037 lines (1787 loc) · 74.8 KB
/
Bench.hs
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
621
622
623
624
625
626
627
628
629
630
631
632
633
634
635
636
637
638
639
640
641
642
643
644
645
646
647
648
649
650
651
652
653
654
655
656
657
658
659
660
661
662
663
664
665
666
667
668
669
670
671
672
673
674
675
676
677
678
679
680
681
682
683
684
685
686
687
688
689
690
691
692
693
694
695
696
697
698
699
700
701
702
703
704
705
706
707
708
709
710
711
712
713
714
715
716
717
718
719
720
721
722
723
724
725
726
727
728
729
730
731
732
733
734
735
736
737
738
739
740
741
742
743
744
745
746
747
748
749
750
751
752
753
754
755
756
757
758
759
760
761
762
763
764
765
766
767
768
769
770
771
772
773
774
775
776
777
778
779
780
781
782
783
784
785
786
787
788
789
790
791
792
793
794
795
796
797
798
799
800
801
802
803
804
805
806
807
808
809
810
811
812
813
814
815
816
817
818
819
820
821
822
823
824
825
826
827
828
829
830
831
832
833
834
835
836
837
838
839
840
841
842
843
844
845
846
847
848
849
850
851
852
853
854
855
856
857
858
859
860
861
862
863
864
865
866
867
868
869
870
871
872
873
874
875
876
877
878
879
880
881
882
883
884
885
886
887
888
889
890
891
892
893
894
895
896
897
898
899
900
901
902
903
904
905
906
907
908
909
910
911
912
913
914
915
916
917
918
919
920
921
922
923
924
925
926
927
928
929
930
931
932
933
934
935
936
937
938
939
940
941
942
943
944
945
946
947
948
949
950
951
952
953
954
955
956
957
958
959
960
961
962
963
964
965
966
967
968
969
970
971
972
973
974
975
976
977
978
979
980
981
982
983
984
985
986
987
988
989
990
991
992
993
994
995
996
997
998
999
1000
{- |
Module: Test.Tasty.Bench
Copyright: (c) 2021 Andrew Lelechenko
Licence: MIT
Featherlight benchmark framework (only one file!) for performance
measurement with API
mimicking [@criterion@](http://hackage.haskell.org/package/criterion)
and [@gauge@](http://hackage.haskell.org/package/gauge).
A prominent feature is built-in comparison against previous runs
and between benchmarks.
=== How lightweight is it?
There is only one source file "Test.Tasty.Bench" and no non-boot
dependencies except [@tasty@](http://hackage.haskell.org/package/tasty). So
if you already depend on @tasty@ for a test suite, there is nothing else
to install.
Compare this to @criterion@ (10+ modules, 50+ dependencies) and @gauge@
(40+ modules, depends on @basement@ and @vector@). A build on a clean machine is up to 16x
faster than @criterion@ and up to 4x faster than @gauge@. A build without dependencies
is up to 6x faster than @criterion@ and up to 8x faster than @gauge@.
@tasty-bench@ is a native Haskell library and works everywhere, where GHC
does. We support a full range of architectures (@i386@, @amd64@, @armhf@,
@arm64@, @ppc64le@, @s390x@) and operating systems (Linux, Windows, macOS,
FreeBSD, OpenBSD, NetBSD), plus any GHC from 7.0 to 9.6.
=== How is it possible?
Our benchmarks are literally regular @tasty@ tests, so we can leverage
all existing machinery for command-line options, resource management,
structuring, listing and filtering benchmarks, running and reporting
results. It also means that @tasty-bench@ can be used in conjunction
with other @tasty@ ingredients.
Unlike @criterion@ and @gauge@ we use a very simple statistical model
described below. This is arguably a questionable choice, but it works
pretty well in practice. A rare developer is sufficiently well-versed in
probability theory to make sense and use of all numbers generated by
@criterion@.
=== How to switch?
<https://cabal.readthedocs.io/en/3.4/cabal-package.html#pkg-field-mixins Cabal mixins>
allow to taste @tasty-bench@ instead of @criterion@ or @gauge@ without
changing a single line of code:
> cabal-version: 2.0
>
> benchmark foo
> ...
> build-depends:
> tasty-bench
> mixins:
> tasty-bench (Test.Tasty.Bench as Criterion, Test.Tasty.Bench as Criterion.Main, Test.Tasty.Bench as Gauge, Test.Tasty.Bench as Gauge.Main)
This works vice versa as well: if you use @tasty-bench@, but at some
point need a more comprehensive statistical analysis, it is easy to
switch temporarily back to @criterion@.
=== How to write a benchmark?
Benchmarks are declared in a separate section of @cabal@ file:
> cabal-version: 2.0
> name: bench-fibo
> version: 0.0
> build-type: Simple
> synopsis: Example of a benchmark
>
> benchmark bench-fibo
> main-is: BenchFibo.hs
> type: exitcode-stdio-1.0
> build-depends: base, tasty-bench
> ghc-options: "-with-rtsopts=-A32m"
> if impl(ghc >= 8.6)
> ghc-options: -fproc-alignment=64
And here is @BenchFibo.hs@:
> import Test.Tasty.Bench
>
> fibo :: Int -> Integer
> fibo n = if n < 2 then toInteger n else fibo (n - 1) + fibo (n - 2)
>
> main :: IO ()
> main = defaultMain
> [ bgroup "fibonacci numbers"
> [ bench "fifth" $ nf fibo 5
> , bench "tenth" $ nf fibo 10
> , bench "twentieth" $ nf fibo 20
> ]
> ]
Since @tasty-bench@ provides an API compatible with @criterion@, one can
refer to
<http://www.serpentine.com/criterion/tutorial.html#how-to-write-a-benchmark-suite its documentation>
for more examples.
=== How to read results?
Running the example above (@cabal@ @bench@ or @stack@ @bench@) results in
the following output:
> All
> fibonacci numbers
> fifth: OK (2.13s)
> 63 ns ± 3.4 ns
> tenth: OK (1.71s)
> 809 ns ± 73 ns
> twentieth: OK (3.39s)
> 104 μs ± 4.9 μs
>
> All 3 tests passed (7.25s)
The output says that, for instance, the first benchmark was repeatedly
executed for 2.13 seconds (wall-clock time), its predicted mean CPU time was
63 nanoseconds and means of individual samples do not often diverge from it
further than ±3.4 nanoseconds (double standard deviation). Take standard
deviation numbers with a grain of salt; there are lies, damned lies, and
statistics.
=== Wall-clock time vs. CPU time
What time are we talking about?
Both @criterion@ and @gauge@ by default report wall-clock time, which is
affected by any other application which runs concurrently.
Ideally benchmarks are executed on a dedicated server without any other load,
but — let's face the truth — most developers run benchmarks
on a laptop with a hundred other services and a window manager, and
watch videos while waiting for benchmarks to finish. That's the cause
of a notorious "variance introduced by outliers: 88% (severely inflated)" warning.
To alleviate this issue @tasty-bench@ measures CPU time by 'getCPUTime'
instead of wall-clock time by default.
It does not provide a perfect isolation from other processes (e. g.,
if CPU cache is spoiled by others, populating data back from RAM
is your burden), but is a bit more stable.
Caveat: this means that for multithreaded algorithms
@tasty-bench@ reports total elapsed CPU time across all cores, while
@criterion@ and @gauge@ print maximum of core's wall-clock time.
It also means that by default @tasty-bench@ does not measure time spent out of process,
e. g., calls to other executables. To work around this limitation
use @--time-mode@ command-line option or set it locally via 'TimeMode' option.
=== Statistical model
Here is a procedure used by @tasty-bench@ to measure execution time:
1. Set \( n \leftarrow 1 \).
2. Measure execution time \( t_n \) of \( n \) iterations and execution time
\( t_{2n} \) of \( 2n \) iterations.
3. Find \( t \) which minimizes deviation of \( (nt, 2nt) \) from
\( (t_n, t_{2n}) \), namely \( t \leftarrow (t_n + 2t_{2n}) / 5n \).
4. If deviation is small enough (see @--stdev@ below)
or time is running out soon (see @--timeout@ below),
return \( t \) as a mean execution time.
5. Otherwise set \( n \leftarrow 2n \) and jump back to Step 2.
This is roughly similar to the linear regression approach which
@criterion@ takes, but we fit only two last points. This allows us to
simplify away all heavy-weight statistical analysis. More importantly,
earlier measurements, which are presumably shorter and noisier, do not
affect overall result. This is in contrast to @criterion@, which fits
all measurements and is biased to use more data points corresponding to
shorter runs (it employs \( n \leftarrow 1.05n \) progression).
Mean time and its deviation do not say much about the
distribution of individual timings. E. g., imagine a computation which
(according to a coarse system timer) takes either 0 ms or 1 ms with equal
probability. While one would be able to establish that its mean time is 0.5 ms
with a very small deviation, this does not imply that individual measurements
are anywhere near 0.5 ms. Even assuming an infinite precision of a system
timer, the distribution of individual times is not known to be
<https://en.wikipedia.org/wiki/Normal_distribution normal>.
Obligatory disclaimer: statistics is a tricky matter, there is no
one-size-fits-all approach. In the absence of a good theory simplistic
approaches are as (un)sound as obscure ones. Those who seek statistical
soundness should rather collect raw data and process it themselves using
a proper statistical toolbox. Data reported by @tasty-bench@ is only of
indicative and comparative significance.
=== Memory usage
Configuring RTS to collect GC statistics
(e. g., via @cabal@ @bench@ @--benchmark-options@ @\'+RTS@ @-T\'@ or
@stack@ @bench@ @--ba@ @\'+RTS@ @-T\'@) enables @tasty-bench@ to estimate and
report memory usage:
> All
> fibonacci numbers
> fifth: OK (2.13s)
> 63 ns ± 3.4 ns, 223 B allocated, 0 B copied, 2.0 MB peak memory
> tenth: OK (1.71s)
> 809 ns ± 73 ns, 2.3 KB allocated, 0 B copied, 4.0 MB peak memory
> twentieth: OK (3.39s)
> 104 μs ± 4.9 μs, 277 KB allocated, 59 B copied, 5.0 MB peak memory
>
> All 3 tests passed (7.25s)
This data is reported as per 'RTSStats' fields: 'allocated_bytes', 'copied_bytes'
and 'max_mem_in_use_bytes'.
=== Combining tests and benchmarks
When optimizing an existing function, it is important to check that its
observable behavior remains unchanged. One can rebuild both tests and
benchmarks after each change, but it would be more convenient to run
sanity checks within benchmark itself. Since our benchmarks are
compatible with @tasty@ tests, we can easily do so.
Imagine you come up with a faster function @myFibo@ to generate
Fibonacci numbers:
> import Test.Tasty.Bench
> import Test.Tasty.QuickCheck -- from tasty-quickcheck package
>
> fibo :: Int -> Integer
> fibo n = if n < 2 then toInteger n else fibo (n - 1) + fibo (n - 2)
>
> myFibo :: Int -> Integer
> myFibo n = if n < 3 then toInteger n else myFibo (n - 1) + myFibo (n - 2)
>
> main :: IO ()
> main = Test.Tasty.Bench.defaultMain -- not Test.Tasty.defaultMain
> [ bench "fibo 20" $ nf fibo 20
> , bench "myFibo 20" $ nf myFibo 20
> , testProperty "myFibo = fibo" $ \n -> fibo n === myFibo n
> ]
This outputs:
> All
> fibo 20: OK (3.02s)
> 104 μs ± 4.9 μs
> myFibo 20: OK (1.99s)
> 71 μs ± 5.3 μs
> myFibo = fibo: FAIL
> *** Failed! Falsified (after 5 tests and 1 shrink):
> 2
> 1 /= 2
> Use --quickcheck-replay=927711 to reproduce.
>
> 1 out of 3 tests failed (5.03s)
We see that @myFibo@ is indeed significantly faster than @fibo@, but
unfortunately does not do the same thing. One should probably look for
another way to speed up generation of Fibonacci numbers.
=== Troubleshooting
- If benchmarks take too long, set @--timeout@ to limit execution time
of individual benchmarks, and @tasty-bench@ will do its best to fit
into a given time frame. Without @--timeout@ we rerun benchmarks until
achieving a target precision set by @--stdev@, which in a noisy
environment of a modern laptop with GUI may take a lot of time.
While @criterion@ runs each benchmark at least for 5 seconds,
@tasty-bench@ is happy to conclude earlier, if it does not
compromise the quality of results. In our experiments @tasty-bench@
suites tend to finish earlier, even if some individual benchmarks
take longer than with @criterion@.
A common source of noisiness is garbage collection. Setting a larger
allocation area (/nursery/) is often a good idea, either via
@cabal@ @bench@ @--benchmark-options@ @\'+RTS@ @-A32m\'@ or
@stack@ @bench@ @--ba@ @\'+RTS@ @-A32m\'@. Alternatively bake it into @cabal@
file as @ghc-options:@ @\"-with-rtsopts=-A32m\"@.
For GHC ≥ 8.10 consider switching benchmarks to a non-moving garbage collector,
because it decreases GC pauses and corresponding noise: @+RTS@ @--nonmoving-gc@.
- Never compile benchmarks with @-fstatic-argument-transformation@, because it
breaks a trick we use to force GHC into reevaluation of the same function application
over and over again.
- If benchmark results look malformed like below, make sure that you
are invoking 'Test.Tasty.Bench.defaultMain' and not
'Test.Tasty.defaultMain' (the difference is 'consoleBenchReporter'
vs. 'consoleTestReporter'):
> All
> fibo 20: OK (1.46s)
> Response {respEstimate = Estimate {estMean = Measurement {measTime = 87496728, measAllocs = 0, measCopied = 0}, estStdev = 694487}, respIfSlower = FailIfSlower Infinity, respIfFaster = FailIfFaster Infinity}
- If benchmarks fail with an error message
> Unhandled resource. Probably a bug in the runner you're using.
or
> Unexpected state of the resource (NotCreated) in getResource. Report as a tasty bug.
this is likely caused by 'env' or 'envWithCleanup' affecting
benchmarks structure. You can use 'env' to read test data from 'IO',
but not to read benchmark names or affect their hierarchy in other
way. This is a fundamental restriction of @tasty@ to list and filter
benchmarks without launching missiles.
- If benchmarks fail with @Test dependencies form a loop@
or @Test dependencies have cycles@, this is likely
because of 'bcompare', which compares a benchmark with itself.
Locating a benchmark in a global environment may be tricky, please refer to
[@tasty@ documentation](https://github.com/UnkindPartition/tasty#patterns) for details
and consider using 'locateBenchmark'.
=== Isolating interfering benchmarks
One difficulty of benchmarking in Haskell is that it is hard to isolate
benchmarks so that they do not interfere. Changing the order of
benchmarks or skipping some of them has an effect on heap’s layout and
thus affects garbage collection. This issue is well attested in
<https://github.com/haskell/criterion/issues/166 both>
<https://github.com/haskell/criterion/issues/60 criterion> and
<https://github.com/vincenthz/hs-gauge/issues/2 gauge>.
Usually (but not always) skipping some benchmarks speeds up remaining
ones. That’s because once a benchmark allocated heap which for some
reason was not promptly released afterwards (e. g., it forced a
top-level thunk in an underlying library), all further benchmarks are
slowed down by garbage collector processing this additional amount of
live data over and over again.
There are several mitigation strategies. First of all, giving garbage
collector more breathing space by @+RTS@ @-A32m@ (or more) is often good
enough.
Further, avoid using top-level bindings to store large test data. Once
such thunks are forced, they remain allocated forever, which affects
detrimentally subsequent unrelated benchmarks. Treat them as external
data, supplied via 'env': instead of
> largeData :: String
> largeData = replicate 1000000 'a'
>
> main :: IO ()
> main = defaultMain
> [ bench "large" $ nf length largeData, ... ]
use
> import Control.DeepSeq (force)
> import Control.Exception (evaluate)
>
> main :: IO ()
> main = defaultMain
> [ env (evaluate (force (replicate 1000000 'a'))) $ \largeData ->
> bench "large" $ nf length largeData, ... ]
Finally, as an ultimate measure to reduce interference between
benchmarks, one can run each of them in a separate process. We do not
quite recommend this approach, but if you are desperate, here is how:
> cabal run -v0 all:benches -- -l | sed -e 's/[\"]/\\\\\\&/g' | while read -r name; do cabal run -v0 all:benches -- -p '$0 == "'"$name"'"'; done
This assumes that there is a single benchmark suite in the project
and that benchmark names do not contain newlines.
=== Comparison against baseline
One can compare benchmark results against an earlier run in an automatic way.
When using this feature, it's especially important to compile benchmarks with
@ghc-options:@ [@-fproc-alignment@](https://downloads.haskell.org/ghc/latest/docs/users_guide/debugging.html#ghc-flag--fproc-alignment)@=64@, otherwise results could be skewed by
intermittent changes in cache-line alignment.
Firstly, run @tasty-bench@ with
@--csv@ @FILE@ key to dump results to @FILE@ in CSV format
(it could be a good idea to set smaller @--stdev@, if possible):
> Name,Mean (ps),2*Stdev (ps)
> All.fibonacci numbers.fifth,48453,4060
> All.fibonacci numbers.tenth,637152,46744
> All.fibonacci numbers.twentieth,81369531,3342646
Now modify implementation and rerun benchmarks with @--baseline@ @FILE@
key. This produces a report as follows:
> All
> fibonacci numbers
> fifth: OK (0.44s)
> 53 ns ± 2.7 ns, 8% more than baseline
> tenth: OK (0.33s)
> 641 ns ± 59 ns, same as baseline
> twentieth: OK (0.36s)
> 77 μs ± 6.4 μs, 5% less than baseline
>
> All 3 tests passed (1.50s)
You can also fail benchmarks, which deviate too far from baseline, using
@--fail-if-slower@ and @--fail-if-faster@ options. For example, setting
both of them to 6 will fail the first benchmark above (because it is
more than 6% slower), but the last one still succeeds (even while it is
measurably faster than baseline, deviation is less than 6%). Consider
also using @--hide-successes@ to show only problematic benchmarks, or
even [@tasty-rerun@](http://hackage.haskell.org/package/tasty-rerun)
package to focus on rerunning failing items only.
If you wish to compare two CSV reports non-interactively, here is a handy @awk@ incantation:
> awk 'BEGIN{FS=",";OFS=",";print "Name,Old,New,Ratio"}FNR==1{trueNF=NF;next}NF<trueNF{print "Benchmark names should not contain newlines";exit 1}FNR==NR{oldTime=$(NF-trueNF+2);NF-=trueNF-1;a[$0]=oldTime;next}{newTime=$(NF-trueNF+2);NF-=trueNF-1;print $0,a[$0],newTime,newTime/a[$0];gs+=log(newTime/a[$0]);gc++}END{if(gc>0)print "Geometric mean,,",exp(gs/gc)}' old.csv new.csv
Note that columns in CSV report are different from what @criterion@ or @gauge@
would produce. If names do not contain commas, missing columns can be faked this way:
> awk 'BEGIN{FS=",";OFS=",";print "Name,Mean,MeanLB,MeanUB,Stddev,StddevLB,StddevUB"}NR==1{trueNF=NF;next}NF<trueNF{print $0;next}{mean=$(NF-trueNF+2);stddev=$(NF-trueNF+3);NF-=trueNF-1;print $0,mean/1e12,mean/1e12,mean/1e12,stddev/2e12,stddev/2e12,stddev/2e12}'
To fake @gauge@ in @--csvraw@ mode use
> awk 'BEGIN{FS=",";OFS=",";print "name,iters,time,cycles,cpuTime,utime,stime,maxrss,minflt,majflt,nvcsw,nivcsw,allocated,numGcs,bytesCopied,mutatorWallSeconds,mutatorCpuSeconds,gcWallSeconds,gcCpuSeconds"}NR==1{trueNF=NF;next}NF<trueNF{print $0;next}{mean=$(NF-trueNF+2);fourth=$(NF-trueNF+4);fifth=$(NF-trueNF+5);sixth=$(NF-trueNF+6);NF-=trueNF-1;print $0,1,mean/1e12,0,mean/1e12,mean/1e12,0,sixth+0,0,0,0,0,fourth+0,0,fifth+0,0,0,0,0}'
=== Comparison between benchmarks
You can also compare benchmarks to each other without any
external tools, all in the comfort of your terminal.
> import Test.Tasty.Bench
>
> fibo :: Int -> Integer
> fibo n = if n < 2 then toInteger n else fibo (n - 1) + fibo (n - 2)
>
> main :: IO ()
> main = defaultMain
> [ bgroup "fibonacci numbers"
> [ bcompare "tenth" $ bench "fifth" $ nf fibo 5
> , bench "tenth" $ nf fibo 10
> , bcompare "tenth" $ bench "twentieth" $ nf fibo 20
> ]
> ]
This produces a report, comparing mean times of @fifth@ and @twentieth@
to @tenth@:
> All
> fibonacci numbers
> fifth: OK (16.56s)
> 121 ns ± 2.6 ns, 0.08x
> tenth: OK (6.84s)
> 1.6 μs ± 31 ns
> twentieth: OK (6.96s)
> 203 μs ± 4.1 μs, 128.36x
To locate a baseline benchmark in a larger suite use 'locateBenchmark'.
One can leverage comparisons between benchmarks to implement portable performance
tests, expressing properties like "this algorithm must be at least twice faster
than that one" or "this operation should not be more than thrice slower than that".
This can be achieved with 'bcompareWithin', which takes an acceptable interval
of performance as an argument.
=== Plotting results
Users can dump results into CSV with @--csv@ @FILE@ and plot them using
@gnuplot@ or other software. But for convenience there is also a
built-in quick-and-dirty SVG plotting feature, which can be invoked by
passing @--svg@ @FILE@. Here is a sample of its output:
![Plotting](example.svg)
=== Build flags
Build flags are a brittle subject and users do not normally need to touch them.
* If you find yourself in an environment, where @tasty@ is not available and you
have access to boot packages only, you can still use @tasty-bench@! Just copy
@Test\/Tasty\/Bench.hs@ to your project (imagine it like a header-only C library).
It will provide you with functions to build 'Benchmarkable' and run them manually
via 'measureCpuTime'. This mode of operation can be also configured
by disabling Cabal flag @tasty@.
* If results are amiss or oscillate wildly and adjusting @--timeout@ and @--stdev@
does not help, you may be interested to investigate individual timings of
successive runs by enabling Cabal flag @debug@. This will pipe raw data into @stderr@.
=== Command-line options
Use @--help@ to list command-line options.
[@-p@, @--pattern@]:
This is a standard @tasty@ option, which allows filtering benchmarks
by a pattern or @awk@ expression. Please refer
to [@tasty@ documentation](https://github.com/UnkindPartition/tasty#patterns)
for details.
[@-t@, @--timeout@]:
This is a standard @tasty@ option, setting timeout for individual
benchmarks in seconds. Use it when benchmarks tend to take too long:
@tasty-bench@ will make an effort to report results (even if of
subpar quality) before timeout. Setting timeout too tight
(insufficient for at least three iterations) will result in a
benchmark failure. One can adjust it locally for a group
of benchmarks, e. g., 'localOption' ('mkTimeout' 100000000) for 100 seconds.
[@--stdev@]:
Target relative standard deviation of measurements in percents (5%
by default). Large values correspond to fast and loose benchmarks,
and small ones to long and precise.
It can also be adjusted locally for a group of benchmarks,
e. g., 'localOption' ('RelStDev' 0.02).
If benchmarking takes far too long, consider setting @--timeout@,
which will interrupt benchmarks,
potentially before reaching the target deviation.
[@--csv@]:
File to write results in CSV format.
[@--baseline@]:
File to read baseline results in CSV format (as produced by
@--csv@).
[@--fail-if-slower@, @--fail-if-faster@]:
Upper bounds of acceptable slow down \/ speed up in percents. If a
benchmark is unacceptably slower \/ faster than baseline (see
@--baseline@), it will be reported as failed. Can be used in
conjunction with a standard @tasty@ option @--hide-successes@ to
show only problematic benchmarks.
Both options can be adjusted locally for a group of benchmarks,
e. g., 'localOption' ('FailIfSlower' 0.10).
[@--svg@]:
File to plot results in SVG format.
[@--time-mode@]:
Whether to measure CPU time (@cpu@, default) or wall-clock time (@wall@).
[@+RTS@ @-T@]:
Estimate and report memory usage.
=== Custom command-line options
As usual with @tasty@, it is easy to extend benchmarks with custom command-line options.
Here is an example:
> import Data.Proxy
> import Test.Tasty.Bench
> import Test.Tasty.Ingredients.Basic
> import Test.Tasty.Options
> import Test.Tasty.Runners
>
> newtype RandomSeed = RandomSeed Int
>
> instance IsOption RandomSeed where
> defaultValue = RandomSeed 42
> parseValue = fmap RandomSeed . safeRead
> optionName = pure "seed"
> optionHelp = pure "Random seed used in benchmarks"
>
> main :: IO ()
> main = do
> let customOpts = [Option (Proxy :: Proxy RandomSeed)]
> ingredients = includingOptions customOpts : benchIngredients
> opts <- parseOptions ingredients benchmarks
> let RandomSeed seed = lookupOption opts
> defaultMainWithIngredients ingredients benchmarks
>
> benchmarks :: Benchmark
> benchmarks = bgroup "All" []
-}
{-# LANGUAGE BangPatterns #-}
{-# LANGUAGE CPP #-}
{-# LANGUAGE DeriveDataTypeable #-}
{-# LANGUAGE DeriveFunctor #-}
{-# LANGUAGE FlexibleContexts #-}
{-# LANGUAGE FlexibleInstances #-}
{-# LANGUAGE ScopedTypeVariables #-}
{-# LANGUAGE TupleSections #-}
module Test.Tasty.Bench
(
#ifdef MIN_VERSION_tasty
-- * Running 'Benchmark'
defaultMain
, Benchmark
, bench
, bgroup
, bcompare
, bcompareWithin
, env
, envWithCleanup
,
#endif
-- * Creating 'Benchmarkable'
Benchmarkable(..)
, nf
, whnf
, nfIO
, whnfIO
, nfAppIO
, whnfAppIO
, measureCpuTime
#ifdef MIN_VERSION_tasty
-- * Ingredients
, benchIngredients
, consoleBenchReporter
, csvReporter
, svgReporter
, RelStDev(..)
, FailIfSlower(..)
, FailIfFaster(..)
, CsvPath(..)
, BaselinePath(..)
, SvgPath(..)
, TimeMode(..)
-- * Utils
, locateBenchmark
, mapLeafBenchmarks
#else
, Timeout(..)
, RelStDev(..)
#endif
) where
import Prelude hiding (Int, Integer)
import qualified Prelude
import Control.Applicative
import Control.Arrow (first, second)
import Control.DeepSeq (NFData, force)
import Control.Exception (bracket, evaluate)
import Control.Monad (void, unless, guard, (>=>), when)
import Data.Data (Typeable)
import Data.Foldable (foldMap, traverse_)
import Data.Int (Int64)
import Data.IORef
import Data.List (intercalate, stripPrefix, isPrefixOf, genericLength, genericDrop, foldl1')
import Data.Maybe (fromMaybe)
import Data.Monoid (All(..), Any(..))
import Data.Proxy
import Data.Traversable (forM)
import Data.Word (Word64)
import GHC.Conc
#if MIN_VERSION_base(4,5,0)
import GHC.IO.Encoding
#endif
#if MIN_VERSION_base(4,6,0)
import GHC.Stats
#endif
import System.CPUTime
import System.Exit
import System.IO
import System.IO.Unsafe
import System.Mem
import Text.Printf
#ifdef DEBUG
import Debug.Trace
#endif
#ifdef MIN_VERSION_tasty
#if !MIN_VERSION_base(4,8,0)
import Data.Monoid (Monoid(..))
#endif
#if MIN_VERSION_base(4,9,0)
import Data.Semigroup (Semigroup(..))
#endif
#if MIN_VERSION_containers(0,5,0)
import qualified Data.IntMap.Strict as IM
#else
import qualified Data.IntMap as IM
#endif
import Data.IntMap (IntMap)
import Data.Sequence (Seq, (<|))
import qualified Data.Sequence as Seq
import qualified Data.Set as S
import Test.Tasty hiding (defaultMain)
import qualified Test.Tasty
import Test.Tasty.Ingredients
import Test.Tasty.Ingredients.ConsoleReporter
import Test.Tasty.Options
import Test.Tasty.Patterns.Eval (eval, asB, withFields)
import Test.Tasty.Patterns.Types (Expr (And, Field, IntLit, NF, StringLit, Sub))
import qualified Test.Tasty.Patterns.Types as Patterns
import Test.Tasty.Providers
import Test.Tasty.Runners
#endif
#if defined(mingw32_HOST_OS)
import Data.Word (Word32)
#endif
#if MIN_VERSION_ghc_prim(0,3,1)
import GHC.Types (SPEC(..))
#else
import GHC.Exts (SpecConstrAnnotation(..))
-- Fallback for @ghc-prim@ < 0.3.1, which lacks 'GHC.Types.SPEC':
-- a local stand-in annotated with 'ForceSpecConstr' so the SpecConstr
-- optimisation still fires where this type is threaded through loops.
data SPEC = SPEC | SPEC2
{-# ANN type SPEC ForceSpecConstr #-}
#endif
#ifndef MIN_VERSION_tasty
-- | Stand-in for tasty's @Timeout@, compiled only when the @tasty@
-- Cabal flag is disabled and this module is used standalone
-- (see the "Build flags" section of the module documentation).
data Timeout
  = Timeout
      Prelude.Integer -- ^ number of microseconds (e. g., 200000)
      String          -- ^ textual representation (e. g., @"0.2s"@)
  | NoTimeout         -- ^ run without any time limit
  deriving (Show)
#endif
-- | In addition to @--stdev@ command-line option,
-- one can adjust target relative standard deviation
-- for individual benchmarks and groups of benchmarks
-- using 'adjustOption' and 'localOption'.
--
-- E. g., set target relative standard deviation to 2% as follows:
--
-- > import Test.Tasty (localOption)
-- > localOption (RelStDev 0.02) (bgroup [...])
--
-- If you set 'RelStDev' to infinity,
-- a benchmark will be executed
-- only once and its standard deviation will be recorded as zero.
-- This is rather a blunt approach, but it might be a necessary evil
-- for extremely long benchmarks. If you wish to run all benchmarks
-- only once, use command-line option @--stdev@ @Infinity@.
--
-- @since 0.2
newtype RelStDev = RelStDev Double
  deriving (Show, Read, Typeable)
-- | Whether to measure CPU time or wall-clock time.
-- Normally 'CpuTime' is a better option (and default),
-- but consider switching to 'WallTime'
-- to measure multithreaded algorithms or time spent in external processes.
--
-- One can switch the default measurement mode globally
-- using @--time-mode@ command-line option,
-- but it is usually better to adjust the mode locally:
--
-- > import Test.Tasty (localOption)
-- > localOption WallTime (bgroup [...])
--
-- @since 0.3.2
data TimeMode
  = CpuTime
  -- ^ Measure CPU time.
#ifdef MIN_VERSION_tasty
  | WallTime
  -- ^ Measure wall-clock time.
  -- Available only when the @tasty@ Cabal flag is enabled.
#endif
  deriving (Typeable)
#ifdef MIN_VERSION_tasty
-- | Command-line handling for the @--stdev@ option.
instance IsOption RelStDev where
  -- Target 5% relative standard deviation unless overridden.
  defaultValue = RelStDev 0.05
  -- Accepts a positive number of percents; anything else is rejected.
  parseValue s = RelStDev <$> parsePositivePercents s
  optionName = pure "stdev"
  optionHelp = pure "Target relative standard deviation of measurements in percents (5 by default). Large values correspond to fast and loose benchmarks, and small ones to long and precise. If it takes far too long, consider setting --timeout, which will interrupt benchmarks, potentially before reaching the target deviation."
-- | In addition to @--fail-if-slower@ command-line option,
-- one can adjust an upper bound of acceptable slow down
-- in comparison to baseline for
-- individual benchmarks and groups of benchmarks
-- using 'adjustOption' and 'localOption'.
--
-- E. g., set upper bound of acceptable slow down to 10% as follows:
--
-- > import Test.Tasty (localOption)
-- > localOption (FailIfSlower 0.10) (bgroup [...])
--
-- @since 0.2
newtype FailIfSlower = FailIfSlower Double -- ^ acceptable slowdown versus baseline as a fraction (e. g., 0.10 for 10%)
  deriving (Show, Read, Typeable)
-- | Command-line handling for the @--fail-if-slower@ option.
instance IsOption FailIfSlower where
  -- Infinite bound: by default no slowdown is reported as failure.
  defaultValue = FailIfSlower (1.0 / 0.0)
  parseValue s = FailIfSlower <$> parsePositivePercents s
  optionName = pure "fail-if-slower"
  optionHelp = pure "Upper bound of acceptable slow down in percents. If a benchmark is unacceptably slower than baseline (see --baseline), it will be reported as failed."
-- | In addition to @--fail-if-faster@ command-line option,
-- one can adjust an upper bound of acceptable speed up
-- in comparison to baseline for
-- individual benchmarks and groups of benchmarks
-- using 'adjustOption' and 'localOption'.
--
-- E. g., set upper bound of acceptable speed up to 10% as follows:
--
-- > import Test.Tasty (localOption)
-- > localOption (FailIfFaster 0.10) (bgroup [...])
--
-- @since 0.2
newtype FailIfFaster = FailIfFaster Double -- ^ acceptable speedup versus baseline as a fraction (e. g., 0.10 for 10%)
  deriving (Show, Read, Typeable)
-- | Command-line handling for the @--fail-if-faster@ option.
instance IsOption FailIfFaster where
  -- Infinite bound: by default no speedup is reported as failure.
  defaultValue = FailIfFaster (1.0 / 0.0)
  parseValue s = FailIfFaster <$> parsePositivePercents s
  optionName = pure "fail-if-faster"
  optionHelp = pure "Upper bound of acceptable speed up in percents. If a benchmark is unacceptably faster than baseline (see --baseline), it will be reported as failed."
-- | Parse a strictly positive percentage (e. g., @"5"@) into a fraction
-- (0.05). Returns 'Nothing' for unreadable, zero or negative input.
parsePositivePercents :: String -> Maybe Double
parsePositivePercents str =
  case safeRead str of
    Just p | p > 0 -> Just (p / 100)
    _ -> Nothing
-- | Command-line handling for the @--time-mode@ option.
instance IsOption TimeMode where
  defaultValue = CpuTime
  -- Only the exact strings "cpu" and "wall" are recognised.
  parseValue v = case v of
    "cpu" -> Just CpuTime
    "wall" -> Just WallTime
    _ -> Nothing
  optionName = pure "time-mode"
  optionHelp = pure "Whether to measure CPU time (\"cpu\") or wall-clock time (\"wall\")"
#if MIN_VERSION_tasty(1,3,0)
  -- 'showDefaultValue' only exists in tasty >= 1.3, hence the guard.
  showDefaultValue m = Just $ case m of
    CpuTime -> "cpu"
    WallTime -> "wall"
#endif
#endif
-- | Something that can be benchmarked, produced by 'nf', 'whnf', 'nfIO', 'whnfIO',
-- 'nfAppIO', 'whnfAppIO' below.
--
-- Drop-in replacement for @Criterion.@'Criterion.Benchmarkable' and
-- @Gauge.@'Gauge.Benchmarkable'.
--
-- @since 0.1
newtype Benchmarkable =
  -- | @since 0.3
  Benchmarkable
  -- The wrapped action receives the desired number of iterations and is
  -- responsible for running the whole batch itself.
  { unBenchmarkable :: Word64 -> IO () -- ^ Run benchmark given number of times.
  } deriving (Typeable)
#ifdef MIN_VERSION_tasty
-- | 'defaultMain' forces 'setLocaleEncoding' to 'utf8', but users might
-- be running benchmarks outside of it (e. g., via 'defaultMainWithIngredients').
--
-- A locale whose 'textEncodingName' starts with @"UTF"@ is treated as
-- Unicode-capable; on Windows the console output code page must also be
-- 65001 (UTF-8).
supportsUnicode :: Bool
#if MIN_VERSION_base(4,5,0)
supportsUnicode = take 3 (textEncodingName enc) == "UTF"
#if defined(mingw32_HOST_OS)
  && unsafePerformIO getConsoleOutputCP == 65001
#endif
  where
    enc = unsafePerformIO getLocaleEncoding
#else
supportsUnicode = False
#endif
-- NOINLINE so the 'unsafePerformIO' locale lookup is performed at most
-- once and its result is shared.
{-# NOINLINE supportsUnicode #-}
-- | Micro sign for time units when the terminal can render Unicode,
-- plain @u@ otherwise.
mu :: Char
mu
  | supportsUnicode = 'μ'
  | otherwise = 'u'
-- | Plus-minus separator between mean and spread, with an ASCII fallback.
pm :: String
pm
  | supportsUnicode = " ± "
  | otherwise = " +-"
-- | Render a duration given in picoseconds, choosing a unit so that the
-- numeric part fits into 3 characters.
showPicos3 :: Word64 -> String
showPicos3 psWord = render (word64ToDouble psWord)
  where
    -- Thresholds are 995 * 10^k: the largest values that still round
    -- down within the chosen printf width.
    render ps
      | ps < 995 = printf "%3.0f ps" ps
      | ps < 995e1 = printf "%3.1f ns" (ps / 1e3)
      | ps < 995e3 = printf "%3.0f ns" (ps / 1e3)
      | ps < 995e4 = printf "%3.1f %cs" (ps / 1e6) mu
      | ps < 995e6 = printf "%3.0f %cs" (ps / 1e6) mu
      | ps < 995e7 = printf "%3.1f ms" (ps / 1e9)
      | ps < 995e9 = printf "%3.0f ms" (ps / 1e9)
      | otherwise = printf "%4.2f s" (ps / 1e12)
-- | Render a duration given in picoseconds, choosing a unit so that the
-- numeric part fits into 4 characters.
showPicos4 :: Word64 -> String
showPicos4 psWord = render (word64ToDouble psWord)
  where
    -- Thresholds are 995 * 10^k: the largest values that still round
    -- down within the chosen printf width.
    render ps
      | ps < 995 = printf "%3.0f ps" ps
      | ps < 995e1 = printf "%4.2f ns" (ps / 1e3)
      | ps < 995e2 = printf "%4.1f ns" (ps / 1e3)
      | ps < 995e3 = printf "%3.0f ns" (ps / 1e3)
      | ps < 995e4 = printf "%4.2f %cs" (ps / 1e6) mu
      | ps < 995e5 = printf "%4.1f %cs" (ps / 1e6) mu
      | ps < 995e6 = printf "%3.0f %cs" (ps / 1e6) mu
      | ps < 995e7 = printf "%4.2f ms" (ps / 1e9)
      | ps < 995e8 = printf "%4.1f ms" (ps / 1e9)
      | ps < 995e9 = printf "%3.0f ms" (ps / 1e9)
      | otherwise = printf "%4.3f s" (ps / 1e12)
-- | Show a byte count in human-readable form, using binary (1024-based)
-- units and fitting the numeric part into 3 characters.
-- The odd-looking cut-offs (10189, 1023488, ...) are presumably the
-- largest values whose rounded output still fits the printf width
-- (e. g., 10188 / 1024 prints as 9.9, while 10189 would round to 10.0)
-- — TODO confirm.
showBytes :: Word64 -> String
showBytes i
  | t < 1000 = printf "%3.0f B " t
  | t < 10189 = printf "%3.1f KB" (t / 1024)
  | t < 1023488 = printf "%3.0f KB" (t / 1024)
  | t < 10433332 = printf "%3.1f MB" (t / 1048576)
  | t < 1048051712 = printf "%3.0f MB" (t / 1048576)
  | t < 10683731149 = printf "%3.1f GB" (t / 1073741824)
  | t < 1073204953088 = printf "%3.0f GB" (t / 1073741824)
  | t < 10940140696372 = printf "%3.1f TB" (t / 1099511627776)
  | t < 1098961871962112 = printf "%3.0f TB" (t / 1099511627776)
  | t < 11202704073084108 = printf "%3.1f PB" (t / 1125899906842624)
  | t < 1125336956889202624 = printf "%3.0f PB" (t / 1125899906842624)
  | t < 11471568970838126592 = printf "%3.1f EB" (t / 1152921504606846976)
  | otherwise = printf "%3.0f EB" (t / 1152921504606846976)
  where
    t = word64ToDouble i
#endif
-- | Raw results of measuring one batch of benchmark runs.
data Measurement = Measurement
  { measTime :: !Word64 -- ^ time in picoseconds
  , measAllocs :: !Word64 -- ^ allocations in bytes
  , measCopied :: !Word64 -- ^ copied bytes
  , measMaxMem :: !Word64 -- ^ max memory in use
  } deriving (Show, Read)
-- | A measurement together with the estimated spread of its timing.
data Estimate = Estimate
  { estMean :: !Measurement -- ^ estimated mean measurement
  , estStdev :: !Word64 -- ^ stdev in picoseconds
  } deriving (Show, Read)
#ifdef MIN_VERSION_tasty
-- | A payload annotated with lower and upper acceptance bounds;
-- the bound comments suggest these carry the @--fail-if-faster@ /
-- @--fail-if-slower@ limits as ratios versus baseline.
data WithLoHi a = WithLoHi
  !a -- payload
  !Double -- lower bound (e. g., 0.9 for -10% speedup)
  !Double -- upper bound (e. g., 1.2 for +20% slowdown)
  deriving (Show, Read)
-- | Render an estimate as, e. g., @12.3 ms ± 456 μs@; the spread shown
-- is two standard deviations. When stdev is zero, padding keeps the
-- column width stable.
prettyEstimate :: Estimate -> String
prettyEstimate (Estimate m stdev) =
  showPicos4 (measTime m)
  ++ (if stdev == 0 then "         " else pm ++ showPicos3 (2 * stdev))
-- | Like 'prettyEstimate', but additionally reports GC statistics
-- (allocated, copied, peak memory) from the measurement.
prettyEstimateWithGC :: Estimate -> String
prettyEstimateWithGC (Estimate m stdev) =
  showPicos4 (measTime m)
  ++ (if stdev == 0 then ", " else pm ++ showPicos3 (2 * stdev) ++ ", ")
  ++ showBytes (measAllocs m) ++ " allocated, "
  ++ showBytes (measCopied m) ++ " copied, "
  ++ showBytes (measMaxMem m) ++ " peak memory"
-- | Render an estimate as two CSV fields: mean time and twice the
-- standard deviation, both in picoseconds.
csvEstimate :: Estimate -> String
csvEstimate (Estimate m stdev) =
  concat [show (measTime m), ",", show (2 * stdev)]
-- | Like 'csvEstimate', but with three extra CSV fields: allocated
-- bytes, copied bytes and peak memory in use.
csvEstimateWithGC :: Estimate -> String
csvEstimateWithGC (Estimate m stdev) = concat
  [ show (measTime m)
  , ",", show (2 * stdev)
  , ",", show (measAllocs m)
  , ",", show (measCopied m)
  , ",", show (measMaxMem m)
  ]
#endif
-- | Given a measurement of one run and a measurement of two consecutive
-- runs, estimate the cost of a single run (assuming cost is linear in
-- the number of runs) together with a standard deviation of the estimate.
predict
  :: Measurement -- ^ time for one run
  -> Measurement -- ^ time for two runs
  -> Estimate
predict (Measurement t1 a1 c1 m1) (Measurement t2 a2 c2 m2) = Estimate
  { estMean = Measurement t (fit a1 a2) (fit c1 c2) (max m1 m2)
  , estStdev = truncate (sqrt d :: Double)
  }
  where
    -- Least-squares fit of y = k * n through (1, x1) and (2, x2):
    -- k = (x1 + 2 * x2) / 5. Dividing each term by 5 separately
    -- presumably sidesteps Word64 overflow for huge inputs, at the
    -- cost of slightly different rounding — TODO confirm intent.
    fit x1 x2 = x1 `quot` 5 + 2 * (x2 `quot` 5)
    t = fit t1 t2
    sqr x = x * x
    -- Sum of squared residuals of the fit, used as the variance.
    d = sqr (word64ToDouble t1 - word64ToDouble t)
      + sqr (word64ToDouble t2 - 2 * word64ToDouble t)
-- | Like 'predict', but accounts for the finite precision of the clock:
-- each input time is perturbed by the timer precision in both directions
-- and the worst-case standard deviation of the two fits is taken.
predictPerturbed :: Measurement -> Measurement -> Estimate
predictPerturbed t1 t2 = Estimate
  { estMean = estMean (predict t1 t2)
  , estStdev = max
    (estStdev (predict (lo t1) (hi t2)))
    (estStdev (predict (hi t1) (lo t2)))
  }
  where
    -- Clock precision in picoseconds, but never less than 1 ms.
    prec = max (fromInteger cpuTimePrecision) 1000000000 -- 1 ms
    hi meas = meas { measTime = measTime meas + prec }
    -- NOTE(review): when measTime meas < prec this Word64 subtraction
    -- wraps around, yielding an enormous perturbed time and stdev;
    -- presumably benign (it just forces more iterations) — confirm.
    lo meas = meas { measTime = measTime meas - prec }
-- | Whether the RTS provides GC statistics (requires running with
-- @+RTS -T@); queried once via 'unsafePerformIO' using whichever
-- API the @base@ version offers.
hasGCStats :: Bool
#if MIN_VERSION_base(4,10,0)
hasGCStats = unsafePerformIO getRTSStatsEnabled
#elif MIN_VERSION_base(4,6,0)
hasGCStats = unsafePerformIO getGCStatsEnabled
#else
hasGCStats = False
#endif
-- | Read (allocated bytes, copied bytes, max memory in use) from the
-- RTS, or zeros when GC statistics are unavailable (see 'hasGCStats').
getAllocsAndCopied :: IO (Word64, Word64, Word64)
getAllocsAndCopied = do
  if not hasGCStats then pure (0, 0, 0) else
#if MIN_VERSION_base(4,10,0)
    (\s -> (allocated_bytes s, copied_bytes s, max_mem_in_use_bytes s)) <$> getRTSStats
#elif MIN_VERSION_base(4,6,0)
    -- Legacy GCStats API: fields are Int64 and peak memory is reported
    -- in megabytes, hence the conversions below.
    (\s -> (int64ToWord64 $ bytesAllocated s, int64ToWord64 $ bytesCopied s, int64ToWord64 $ peakMegabytesAllocated s * 1024 * 1024)) <$> getGCStats
#else
    pure (0, 0, 0)
#endif