approx-rand-test

Introduction

This Haskell package provides a module and utilities to perform paired and unpaired approximate randomization tests.

Approximate randomization tests rely on a simple premise: given a test statistic, if the null hypothesis (the samples do not differ) is true, we can randomly swap values between samples without an extreme impact on the test statistic. If the swaps do change the test statistic substantially, the null hypothesis must be rejected.

The test works by generating a given number of shuffled samples and computing the test statistic for each shuffle. Let r be the number of shuffles for which the test statistic is at least as high as the test statistic of the original samples, and N the total number of shuffles. For one-sided tests, the null hypothesis is rejected iff (r + 1) / (N + 1) < p, where p is the chosen significance level.
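The procedure described above can be sketched in a few lines of Haskell. This is a minimal illustration of the paired, one-sided variant and not the package's own implementation: the names `meanDiff`, `pairedStat`, `nextWord`, `shufflePairs` and `pValue` are hypothetical, the difference of means is an assumed test statistic, the input values are made-up toy numbers, and a tiny xorshift generator stands in for a proper random number library so that the sketch depends only on `base`.

```haskell
import Data.Bits (shiftL, shiftR, testBit, xor)
import Data.List (mapAccumL)
import Data.Word (Word64)

-- Assumed test statistic: difference between the sample means.
meanDiff :: [Double] -> [Double] -> Double
meanDiff xs ys = mean xs - mean ys
  where
    mean zs = sum zs / fromIntegral (length zs)

-- Test statistic of a list of paired observations.
pairedStat :: [(Double, Double)] -> Double
pairedStat ps = meanDiff xs ys
  where
    (xs, ys) = unzip ps

-- Minimal xorshift64 step (illustration only; the seed must be non-zero).
nextWord :: Word64 -> Word64
nextWord s0 = s3
  where
    s1 = s0 `xor` (s0 `shiftL` 13)
    s2 = s1 `xor` (s1 `shiftR` 7)
    s3 = s2 `xor` (s2 `shiftL` 17)

-- One paired shuffle: swap the two members of each pair on a coin flip.
shufflePairs :: Word64 -> [(Double, Double)] -> (Word64, [(Double, Double)])
shufflePairs = mapAccumL step
  where
    step s (x, y) =
      let s' = nextWord s
       in (s', if testBit s' 32 then (y, x) else (x, y))

-- One-sided estimate (r + 1) / (N + 1), where r counts the shuffles whose
-- statistic is at least as high as that of the original samples.
pValue :: Int -> Word64 -> [(Double, Double)] -> Double
pValue n seed pairs = fromIntegral (r + 1) / fromIntegral (n + 1)
  where
    observed = pairedStat pairs
    r = go n seed 0
    go :: Int -> Word64 -> Int -> Int
    go k s acc
      | k <= 0 = acc
      | otherwise =
          let (s', shuffled) = shufflePairs s pairs
              acc' = if pairedStat shuffled >= observed then acc + 1 else acc
           in acc' `seq` go (k - 1) s' acc'

main :: IO ()
main = do
  -- Made-up toy scores, for illustration only.
  let pairs = zip [0.61, 0.58, 0.70, 0.66, 0.59, 0.73]
                  [0.52, 0.55, 0.61, 0.60, 0.57, 0.64]
      p = pValue 1000 42 pairs
  putStrLn ("p-value estimate: " ++ show p)
  putStrLn (if p < 0.05 then "Significant" else "Not significant")
```

Because the count is incremented by one in both numerator and denominator, the estimate can never be zero. When the two samples are identical, every shuffle yields the same statistic as the original, so r = N and the estimate is exactly 1.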

The included command-line utilities can perform randomization tests and draw histograms of the test statistic for the randomized samples.

Installation

Install the necessary dependencies, and run:

cabal install

The command-line utilities can also draw histograms. By default, they use the diagrams backend, which can produce EPS and SVG files. The Cairo backend supports more file formats (such as PNG and PDF), but may be more difficult to install on some systems. To compile the package with Cairo support, use:

cabal install -fwithCairo

Usage

Documentation for the Haskell module can be read through Haddock. The package also provides two utilities: approx_rand_test and approx_rand_test_paired. The first is for unpaired tests, the second for paired tests. Both utilities provide nearly the same options.

The format for samples is simple: use one value per line. Three samples are provided in the examples directory. The three samples contain evaluation scores of fluency ranking components:

ngram.scores: Scores of an n-gram language model
fluency.scores: Scores of a feature-based fluency ranking model
reversible.scores: Scores of a reversible model (a model that can be used in parsing and generation)

We can now use the paired test utility to see that the evaluation scores of the n-gram language model and the feature-based fluency model differ significantly:

% approx_rand_test_paired -i 10000 -p 0.05 examples/ngram.scores examples/fluency.scores
Iterations: 10000
Sample size: 1621
Test statistic: -0.030646088066079557
Test type: TwoTailed
Test significance: 0.05
Tail significance: 0.025
Significant: 0.00009999000099990002

Here we generate 10,000 shuffled samples and test at a significance level of p = 0.05. Likewise, we can compare the scores of the feature-based fluency model and the reversible model:

% approx_rand_test_paired -i 10000 -p 0.05 examples/fluency.scores examples/reversible.scores 
Iterations: 10000
Sample size: 1621
Test statistic: 0.0032431465344367667
Test type: TwoTailed
Test significance: 0.05
Tail significance: 0.025
Not significant: 0.0273972602739726

In this case, the samples do not differ significantly.
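The values reported in the two runs above can be related back to the (r + 1) / (N + 1) estimate from the introduction. The following worked arithmetic assumes that the number printed on the last output line is that estimate and that, for a two-tailed test, it is compared against the tail significance (0.025) rather than the test significance (0.05). The values of r are inferred from the printed numbers; they are not shown by the utility itself.

```latex
\[
\frac{r + 1}{N + 1} = \frac{0 + 1}{10000 + 1} \approx 0.00009999 < 0.025
\qquad \text{(n-gram vs. fluency: significant)}
\]
\[
\frac{r + 1}{N + 1} = \frac{273 + 1}{10000 + 1} \approx 0.0274 > 0.025
\qquad \text{(fluency vs. reversible: not significant)}
\]
```

Under these assumptions, the second comparison narrowly misses the threshold, which also explains why a rerun with different shuffles can report a slightly different value.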

Both utilities can also draw the distribution of the test statistic over the randomized samples and show how it relates to the test statistic of the original samples:

% approx_rand_test_paired -h -i 10000 -p 0.05 examples/fluency.scores examples/reversible.scores 
Iterations: 10000
Sample size: 1621
Test type: TwoTailed
Test significance: 0.05
Tail significance: 0.025
Test statistic: 0.0032431465344367667
Not significant: 0.025997400259974

   -4.402e-3 | █
   -3.794e-3 | ████
   -3.187e-3 | ████████
   -2.579e-3 | ███████████████
   -1.972e-3 | █████████████████████████
   -1.364e-3 | ███████████████████████████████████
   -7.567e-4 | ███████████████████████████████████████████████
   -1.492e-4 | ███████████████████████████████████████████████████
    4.583e-4 | ██████████████████████████████████████████████████
    1.066e-3 | ███████████████████████████████████████
    1.673e-3 | ███████████████████████████████
    2.281e-3 | ██████████████████████
    2.888e-3 | ███████████
    3.496e-3 | ✣✣✣✣✣✣
    4.103e-3 | ██
    4.711e-3 | █

Or, if you prefer, you can create a chart in a format such as SVG:

% approx_rand_test_paired -w chart.svg -i 10000 -p 0.05 examples/fluency.scores examples/reversible.scores
