Failing tests #42

Open
ivan-m opened this Issue Nov 6, 2012 · 4 comments

@ivan-m
ivan-m commented Nov 6, 2012

Two possibly different things at work here:

  1. When running the tests with LANG=C (e.g. via a distribution's build system), the tests fail almost instantly:

    Running 1 test suites...
    Test suite tests: RUNNING...
    Tests for all distributions:
      Tests for: BetaDistribution:
        C.D.F. sanity: [OK, passed 100 tests]
        CDF limit at +tests: <stdout>: commitBuffer: invalid argument (invalid character)
    Test suite tests: FAIL
    Test suite logged to: dist/test/statistics-0.10.2.0-tests.log
    0 of 1 test suites (0 of 1 test cases) passed.
    

    Could this be from the infinity symbol?

  2. When running it with my usual UTF-8 based locale, the "Quantile is CDF inverse" tests all failed:

    Running 1 test suites...
    Test suite tests: RUNNING...
    Tests for all distributions:
      Tests for: BetaDistribution:
        C.D.F. sanity: [OK, passed 100 tests]
        CDF limit at +∞: [OK, passed 100 tests]
        CDF limit at -∞: [OK, passed 100 tests]
        CDF is nondecreasing: [OK, passed 100 tests]
        1-CDF is correct: [OK, passed 100 tests]
        PDF sanity: [OK, passed 100 tests]
        Quantile is CDF inverse: [Failed]
    Falsifiable with seed 7414117163583754365, after 69 tests. Reason: BD {bdAlpha = 8.406843987606653, bdBeta = 0.4269461043937311}
    1510.9852388084596
    Quantile     = 0.9999952255644082
    Probability  = 0.9852388084595987
    Probability' = 0.9852388084596589
    Error        = 6.0285110237146e-14
        quantile fails p<0||p>1: [OK, passed 100 tests]
      Tests for: CauchyDistribution:
        C.D.F. sanity: [OK, passed 100 tests]
        CDF limit at +∞: [OK, passed 100 tests]
        CDF limit at -∞: [OK, passed 100 tests]
        CDF is nondecreasing: [OK, passed 100 tests]
        1-CDF is correct: [OK, passed 100 tests]
        PDF sanity: [OK, passed 100 tests]
        Quantile is CDF inverse: [OK, passed 100 tests]
        quantile fails p<0||p>1: [OK, passed 100 tests]
      Tests for: ChiSquared:
        C.D.F. sanity: [OK, passed 100 tests]
        CDF limit at +∞: [OK, passed 100 tests]
        CDF limit at -∞: [OK, passed 100 tests]
        CDF is nondecreasing: [OK, passed 100 tests]
        1-CDF is correct: [OK, passed 100 tests]
        PDF sanity: [OK, passed 100 tests]
        Quantile is CDF inverse: [Failed]
    Falsifiable with seed 4208265792671652991, after 51 tests. Reason: ChiSquared 92
    144.62333347082813
    Quantile     = 95.64486955846601
    Probability  = 0.6233334708281291
    Probability' = 0.6233334708281137
    Error        = 1.532107773982716e-14
        quantile fails p<0||p>1: [OK, passed 100 tests]
      Tests for: ExponentialDistribution:
        C.D.F. sanity: [OK, passed 100 tests]
        CDF limit at +∞: [OK, passed 100 tests]
        CDF limit at -∞: [OK, passed 100 tests]
        CDF is nondecreasing: [OK, passed 100 tests]
        1-CDF is correct: [OK, passed 100 tests]
        PDF sanity: [OK, passed 100 tests]
        Quantile is CDF inverse: [OK, passed 100 tests]
        quantile fails p<0||p>1: [OK, passed 100 tests]
      Tests for: GammaDistribution:
        C.D.F. sanity: [OK, passed 100 tests]
        CDF limit at +∞: [OK, passed 100 tests]
        CDF limit at -∞: [OK, passed 100 tests]
        CDF is nondecreasing: [OK, passed 100 tests]
        1-CDF is correct: [OK, passed 100 tests]
        PDF sanity: [OK, passed 100 tests]
        Quantile is CDF inverse: [OK, passed 100 tests]
        quantile fails p<0||p>1: [OK, passed 100 tests]
      Tests for: NormalDistribution:
        C.D.F. sanity: [OK, passed 100 tests]
        CDF limit at +∞: [OK, passed 100 tests]
        CDF limit at -∞: [OK, passed 100 tests]
        CDF is nondecreasing: [OK, passed 100 tests]
        1-CDF is correct: [OK, passed 100 tests]
        PDF sanity: [OK, passed 100 tests]
        Quantile is CDF inverse: [Failed]
    Falsifiable with seed -8041988097717865771, after 37 tests. Reason: ND {mean = 2.3201475383520176, stdDev = 497.49045920814746, ndPdfDenom = 1247.0236514103028, ndCdfDenom = 703.5577545633812}
    124.31668308499577
    Quantile     = -234.97997317574743
    Probability  = 0.31668308499577336
    Probability' = 0.31668308499575337
    Error        = 1.9984014443252818e-14
        quantile fails p<0||p>1: [OK, passed 100 tests]
      Tests for: UniformDistribution:
        C.D.F. sanity: [OK, passed 100 tests]
        CDF limit at +∞: [OK, passed 100 tests]
        CDF limit at -∞: [OK, passed 100 tests]
        CDF is nondecreasing: [OK, passed 100 tests]
        1-CDF is correct: [OK, passed 100 tests]
        PDF sanity: [OK, passed 100 tests]
        Quantile is CDF inverse: [OK, passed 100 tests]
        quantile fails p<0||p>1: [OK, passed 100 tests]
      Tests for: StudentT:
        C.D.F. sanity: [OK, passed 100 tests]
        CDF limit at +∞: [OK, passed 100 tests]
        CDF limit at -∞: [OK, passed 100 tests]
        CDF is nondecreasing: [OK, passed 100 tests]
        1-CDF is correct: [OK, passed 100 tests]
        PDF sanity: [OK, passed 100 tests]
        Quantile is CDF inverse: [Failed]
    Falsifiable with seed -8496255333126878786, after 44 tests. Reason: StudentT {studentTndf = 205.34925970355764}
    28.50694788712507
    Quantile     = 1.7437873772404973e-2
    Probability  = 0.5069478871250688
    Probability' = 0.5069478871246982
    Error        = 3.7059244561987725e-13
        quantile fails p<0||p>1: [OK, passed 100 tests]
      Tests for: FDistribution:
        C.D.F. sanity: [OK, passed 100 tests]
        CDF limit at +∞: [Failed]
    Falsifiable with seed 5934505402744235409, after 90 tests. Reason: F {fDistributionNDF1 = 2.9744846725671688e16, fDistributionNDF2 = 3.974722746925967e16, _pdfFactor = 1.347450939929375e18}
    Last elements: [1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0]
        CDF limit at -∞: [OK, passed 100 tests]
        CDF is nondecreasing: [OK, passed 100 tests]
        1-CDF is correct: [OK, passed 100 tests]
        PDF sanity: [OK, passed 100 tests]
        Quantile is CDF inverse: [Failed]
    Falsifiable with seed 8937924613887352287, after 28 tests. Reason: F {fDistributionNDF1 = 60027.0, fDistributionNDF2 = 21539.0, _pdfFactor = 461225.35985645605}
    24.44359239191443
    Quantile     = 0.9984277157803625
    Probability  = 0.4435923919144287
    Probability' = 0.4435923919164872
    Error        = 2.058520021108734e-12
        quantile fails p<0||p>1: [OK, passed 100 tests]
      Tests for: BinomialDistribution:
        C.D.F. sanity: [OK, passed 100 tests]
        CDF limit at +∞: [OK, passed 100 tests]
        CDF limit at -∞: [OK, passed 100 tests]
        CDF is nondecreasing: [OK, passed 100 tests]
        1-CDF is correct: [OK, passed 100 tests]
        Prob. sanity: [OK, passed 100 tests]
        CDF is sum of prob.: [OK, passed 100 tests]
      Tests for: GeometricDistribution:
        C.D.F. sanity: [OK, passed 100 tests]
        CDF limit at +∞: [OK, passed 100 tests]
        CDF limit at -∞: [OK, passed 100 tests]
        CDF is nondecreasing: [OK, passed 100 tests]
        1-CDF is correct: [OK, passed 100 tests]
        Prob. sanity: [OK, passed 100 tests]
        CDF is sum of prob.: [OK, passed 100 tests]
      Tests for: HypergeometricDistribution:
        C.D.F. sanity: [OK, passed 100 tests]
        CDF limit at +∞: [OK, passed 100 tests]
        CDF limit at -∞: [OK, passed 100 tests]
        CDF is nondecreasing: [OK, passed 100 tests]
        1-CDF is correct: [OK, passed 100 tests]
        Prob. sanity: [OK, passed 100 tests]
        CDF is sum of prob.: [OK, passed 100 tests]
      Tests for: PoissonDistribution:
        C.D.F. sanity: [OK, passed 100 tests]
        CDF limit at +∞: [OK, passed 100 tests]
        CDF limit at -∞: [OK, passed 100 tests]
        CDF is nondecreasing: [OK, passed 100 tests]
        1-CDF is correct: [OK, passed 100 tests]
        Prob. sanity: [OK, passed 100 tests]
        CDF is sum of prob.: [OK, passed 100 tests]
      Unit tests:
        density (gammaDistr 150 1/150) 1 == 4.883311: [OK]
        density (studentT 0.3) 1.34 ≈ 0.0648215: [OK]
        density (studentT 1.0) 0.42 ≈ 0.27058: [OK]
        density (studentT 4.4) 0.33 ≈ 0.352994: [OK]
        cumulative (studentT 0.3) 3.34 ≈ 0.757146: [OK]
        cumulative (studentT 1.0) 0.42 ≈ 0.626569: [OK]
        cumulative (studentT 4.4) 0.33 ≈ 0.621739: [OK]
        density (fDistribution 1 3) 3.0 ≈ 0.05305164769729845 [got 0.0530516477147324]: [OK]
        density (fDistribution 2 2) 1.2 ≈ 0.206612 [got 0.20661157024793383]: [OK]
        density (fDistribution 10 12) 8.0 ≈ 0.0003856131792818928 [got 0.00038561318051585333]: [OK]
        cumulative (fDistribution 1 3) 3.0 ≈ 0.8183098861837906 [got 0.8183098861240833]: [OK]
        cumulative (fDistribution 2 2) 1.2 ≈ 0.545455 [got 0.5454545454545454]: [OK]
        cumulative (fDistribution 10 12) 8.0 ≈ 0.9993550986345141 [got 0.9993550986324504]: [OK]
    Nonparametric tests:
      Mann-Whitney: [OK]
      Mann-Whitney: [OK]
      Mann-Whitney: [OK]
      Mann-Whitney: [OK]
      Mann-Whitney: [OK]
      Mann-Whitney: [OK]
      Mann-Whitney U Critical Values, m=1: [OK]
      Mann-Whitney U Critical Values, m=2, p=0.025: [OK]
      Mann-Whitney U Critical Values, m=6, p=0.05: [OK]
      Mann-Whitney U Critical Values, m=20, p=0.025: [OK]
      Wilcoxon Sum: [OK]
      Wilcoxon Sum: [OK]
      Wilcoxon Paired 0: [OK]
      Wilcoxon Paired 1: [OK]
      Wilcoxon Paired 2: [OK]
      Wilcoxon Paired 3: [OK]
      Wilcoxon Paired 4: [OK]
      Wilcoxon Paired 5: [OK]
      Sig 16, 35: [OK]
      Sig 16, 36: [OK]
      Wilcoxon critical values, p=0.05: [OK]
      Wilcoxon critical values, p=0.025: [OK]
      Wilcoxon critical values, p=0.01: [OK]
      Wilcoxon critical values, p=0.005: [OK]
      K-S D statistics: [OK]
      K-S 2-sample statistics: [OK]
      K-S probability: [OK]
    fft:
      t_impulse: [OK, passed 100 tests]
      t_impulse_offset: [OK, passed 100 tests]
      ifft . fft = id: [OK, passed 100 tests]
      fft . ifft = id: [OK, passed 100 tests]
      idct . dct = id [up to scale]: [OK, passed 100 tests]
      dct . idct = id [up to scale]: [OK, passed 100 tests]
      DCT test for fromList [1.0,0.0]: [OK]
      DCT test for fromList [0.0,1.0]: [OK]
      DCT test for fromList [1.0,0.0,0.0,0.0]: [OK]
      DCT test for fromList [0.0,1.0,0.0,0.0]: [OK]
      DCT test for fromList [0.0,0.0,1.0,0.0]: [OK]
      DCT test for fromList [0.0,0.0,0.0,1.0]: [OK]
      IDCT test for fromList [1.0,0.0]: [OK]
      IDCT test for fromList [0.0,1.0]: [OK]
      IDCT test for fromList [1.0,0.0,0.0,0.0]: [OK]
      IDCT test for fromList [0.0,1.0,0.0,0.0]: [OK]
      IDCT test for fromList [0.0,0.0,1.0,0.0]: [OK]
      IDCT test for fromList [0.0,0.0,0.0,1.0]: [OK]
    S.Function:
      Sort is sort: [OK, passed 100 tests]
    KDE:
      integral(PDF) == 1: [OK, passed 100 tests]
    
             Properties    Test Cases   Total        
     Passed  102           52           154          
     Failed  6             0            6            
     Total   108           52           160          
    Test suite tests: FAIL
    Test suite logged to: dist/test/statistics-0.10.2.0-tests.log
    0 of 1 test suites (0 of 1 test cases) passed.
    
@Shimuuar
Collaborator
Shimuuar commented Nov 6, 2012

It looks like the Unicode characters do indeed cause problems. I'll replace them with ASCII.
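For reference, a minimal sketch of the two ways around the `commitBuffer: invalid argument (invalid character)` error, using only `base`. The helper name `asciiLabel` and the spelling "inf" are assumptions made for illustration, not what the test suite actually uses. The first option rewrites the label to ASCII, the second forces UTF-8 on stdout so the locale no longer matters.

```haskell
import System.IO (hSetEncoding, stdout, utf8)

-- Hypothetical helper: replace the infinity sign with an ASCII spelling,
-- so the label can be printed under any locale, including LANG=C.
asciiLabel :: String -> String
asciiLabel = concatMap replace
  where
    replace '∞' = "inf"
    replace c   = [c]

main :: IO ()
main = do
  -- Option 1: ASCII-only output, safe regardless of the handle encoding.
  putStrLn (asciiLabel "CDF limit at +∞")
  -- Option 2 (an alternative, not the fix chosen in this thread):
  -- force UTF-8 on stdout before printing non-ASCII text.
  hSetEncoding stdout utf8
  putStrLn "CDF limit at +∞"
```

Replacing the characters keeps the output readable in an ASCII-only build log, which is probably why it is the simpler fix for distribution build systems.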

The second problem is well known. Testing that two functions are inverses of each other is tricky in floating point: rounding errors cause a loss of precision during the round trip, so for some inputs cumulative . quantile is not id, and it's difficult (I'd say impossible) to draw the line between correct and incorrect code. QuickCheck is probably not the right tool. Plotting (λx → x - cumulative (quantile x)) turned out to be much more informative, but it's difficult to automate.
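To make the round-trip residual concrete, here is a small sketch that tabulates p - cumulative (quantile p) for an exponential distribution. The CDF and quantile are written out by hand so that only `base` is needed; the names `cumulativeExp`, `quantileExp` and `roundTripError`, and the rate 2.5, are assumptions for illustration and are not the statistics package's code.

```haskell
-- Exponential distribution with rate lambda, written out by hand
-- (illustrative only, not taken from the statistics package).
cumulativeExp :: Double -> Double -> Double
cumulativeExp lambda x
  | x <= 0    = 0
  | otherwise = 1 - exp (negate lambda * x)

quantileExp :: Double -> Double -> Double
quantileExp lambda p = negate (log (1 - p)) / lambda

-- Round-trip residual: p - cumulative (quantile p).
-- In exact arithmetic this is 0; in floating point it usually is not.
roundTripError :: Double -> Double -> Double
roundTripError lambda p = p - cumulativeExp lambda (quantileExp lambda p)

main :: IO ()
main = mapM_ report [0.001, 0.1, 0.5, 0.9, 0.999]
  where
    -- Print p and its residual, tab-separated, ready for plotting.
    report p = putStrLn (show p ++ "\t" ++ show (roundTripError 2.5 p))
```

The tab-separated output can be fed to any plotting tool, which is roughly the manual inspection described above.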

@ivan-m
ivan-m commented Nov 6, 2012

Maybe those tests should be removed then, or some kind of delta equality used rather than ==?

@Shimuuar
Collaborator
Shimuuar commented Nov 6, 2012

Of course they use approximate equality. Anyone who uses == with
floating point is asking for trouble and should be hit with a crowbar.
The problem is choosing the delta. Generally there are always some corners
in the parameter space where things begin to fall apart and precision is
lost, so a big delta is needed there. But choosing a big delta means that
a small discrepancy in the good region goes unnoticed.
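
A sketch of the trade-off, using the Probability / Probability' pairs from the failing ChiSquared and StudentT cases in the log above. The helpers `withinAbs` and `withinRel` and the deltas 1e-13 and 1e-12 are assumptions chosen for illustration; they are not the comparison or tolerance the test suite actually uses.

```haskell
-- Absolute tolerance: |a - b| <= delta.
withinAbs :: Double -> Double -> Double -> Bool
withinAbs delta a b = abs (a - b) <= delta

-- Relative tolerance: |a - b| <= eps * max |a| |b|.
withinRel :: Double -> Double -> Double -> Bool
withinRel eps a b = abs (a - b) <= eps * max (abs a) (abs b)

-- (Probability, Probability') pairs copied from the log above.
chiSquaredCase, studentTCase :: (Double, Double)
chiSquaredCase = (0.6233334708281291, 0.6233334708281137)
studentTCase   = (0.5069478871250688, 0.5069478871246982)

main :: IO ()
main = do
  -- A tight delta accepts the ChiSquared case but rejects StudentT.
  print (uncurry (withinAbs 1e-13) chiSquaredCase)
  print (uncurry (withinAbs 1e-13) studentTCase)
  -- A looser delta accepts both, and would also hide any real
  -- discrepancy smaller than 1e-12 in a well-behaved region.
  print (uncurry (withinAbs 1e-12) studentTCase)
```

No single delta separates "rounding noise" from "bug" across the whole parameter space, which is the difficulty described here.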

I don't like the idea of removing tests. Creating algorithms that work is
much better. I tried that, and it kind of worked, but it needs more
experimentation to show that I didn't miss something important.

The algorithm description is here:
http://sepulcarium.org/blog/posts/2012-07-19-rounding_effect_on_inverse.html

And the Haskell implementation is here:
https://github.com/Shimuuar/quickcheck-numeric

@Shimuuar Shimuuar added a commit to Shimuuar/statistics that referenced this issue Nov 28, 2012
@Shimuuar Shimuuar Remove non-ASCII character from messages from test suite.
They prevent tests from running on machines with locales
where they couldn't be printed.

(Reported at #42)
70cd06d
@Shimuuar
Collaborator

I removed the Unicode characters from the strings, but I could have missed some.
