stddev stability issue #657

Closed
sk1p opened this issue Feb 29, 2020 · 5 comments · Fixed by #672
Labels: bug (Something isn't working)


sk1p commented Feb 29, 2020

See the following CI failure: https://travis-ci.org/LiberTEM/LiberTEM/jobs/656632518

Not sure if this is fixed in #640.

sk1p added the bug label on Feb 29, 2020
sk1p commented Mar 1, 2020

[comment body not captured in this export]


uellue commented Mar 9, 2020

Lovely! </irony> Those are not small errors anymore. We should definitely watch whether this happens again with the new code. How can we make the error reproducible, i.e. capture the same state of the random number generator etc.?
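(Illustrative sketch, not from the thread: one standard way to make such failures replayable is to seed the generator in a pytest fixture, so a CI failure can be reproduced locally with the same data. The fixture and test names below are hypothetical.)

```python
import numpy as np
import pytest


@pytest.fixture
def rng():
    # Fixed seed: a failing CI run can then be replayed bit-for-bit
    return np.random.default_rng(seed=42)


def test_stddev_reproducible(rng):
    # hypothetical test body; the point is the seeded data generation
    data = rng.standard_normal((16, 16, 16)).astype(np.float32)
    assert np.isfinite(np.std(data))
```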

Or perhaps we could add code similar to

```python
import numpy as np
import pytest

# Imports added for context; MemoryDataSet and the _naive_mask_apply helper
# are assumed to come from LiberTEM and its test utilities.
from libertem.io.dataset.memory import MemoryDataSet
from utils import _naive_mask_apply


@pytest.mark.parametrize(
    'TYPE',
    ['JOB', 'UDF']
)
def test_numerics_fail(lt_ctx, TYPE):
    dtype = 'float32'
    # Highest expected detector resolution
    RESOLUTION = 4096
    # Highest expected detector dynamic range
    RANGE = 1e6
    # default value for all cells
    VAL = 1.1

    data = np.full((2, 1, RESOLUTION, RESOLUTION), VAL, dtype=np.float32)
    data[0, 0, 0, 0] += VAL * RANGE
    dataset = MemoryDataSet(
        data=data,
        tileshape=(2, RESOLUTION, RESOLUTION),
        num_partitions=1,
        sig_dims=2,
    )
    mask0 = np.ones((RESOLUTION, RESOLUTION), dtype=np.float64)
    analysis = lt_ctx.create_mask_analysis(
        dataset=dataset, factories=[lambda: mask0], mask_count=1, mask_dtype=dtype,
    )
    analysis.TYPE = TYPE
    results = lt_ctx.run(analysis)
    expected = np.array([[
        [VAL*RESOLUTION**2 + VAL*RANGE],
        [VAL*RESOLUTION**2]
    ]])
    naive = _naive_mask_apply([mask0], data)
    naive_32 = _naive_mask_apply([mask0.astype(dtype)], data)
    # The masks are float64, which means the calculation is performed with
    # high precision and the naive result should be correct
    assert np.allclose(expected, naive)
    # We make sure LiberTEM calculated this with the lower-precision dtype we set
    assert np.allclose(results.mask_0.raw_data, expected[0]) == np.allclose(naive_32, expected)
    # Confirm that the numerical precision is actually insufficient.
    # If this succeeds, we have to rethink the premise of this test.
    assert not np.allclose(results.mask_0.raw_data, expected[0])


@pytest.mark.parametrize(
    'TYPE',
    ['JOB', 'UDF']
)
def test_numerics_succeed(lt_ctx, TYPE):
    dtype = 'float64'
    # Highest expected detector resolution
    RESOLUTION = 4096
    # Highest expected detector dynamic range
    RANGE = 1e6
    # default value for all cells
    VAL = 1.1

    data = np.full((2, 1, RESOLUTION, RESOLUTION), VAL, dtype=np.float32)
    data[0, 0, 0, 0] += VAL * RANGE
    dataset = MemoryDataSet(
        data=data,
        tileshape=(2, RESOLUTION, RESOLUTION),
        num_partitions=1,
        sig_dims=2,
    )
    mask0 = np.ones((RESOLUTION, RESOLUTION), dtype=np.float32)
    analysis = lt_ctx.create_mask_analysis(
        dataset=dataset, factories=[lambda: mask0], mask_count=1, mask_dtype=dtype,
    )
    analysis.TYPE = TYPE
    results = lt_ctx.run(analysis)
    expected = np.array([[
        [VAL*RESOLUTION**2 + VAL*RANGE],
        [VAL*RESOLUTION**2]
    ]])
    naive = _naive_mask_apply([mask0.astype(dtype)], data.astype(dtype))
    assert np.allclose(expected, naive)
    assert np.allclose(expected[0], results.mask_0.raw_data)
```
to test standard deviation numerical stability with a test case that is sensitive to numerical errors and can be verified with a simple, stable calculation?
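(To make "sensitive to numerical errors, verifiable with a simple, stable calculation" concrete, here is a pure-numpy sketch added for illustration, not part of the original comment: a naive one-pass float32 variance collapses on data with a large common offset, while np.var in float64 serves as the stable reference.)

```python
import numpy as np

# Large common offset, small per-pixel variation: the true variance is
# ~1.0, but the raw values are ~1e6.
rng = np.random.default_rng(42)
x = (1e6 + rng.standard_normal(1024 * 1024)).astype(np.float32)

# Naive one-pass formula E[x^2] - E[x]^2 in float32: both terms are ~1e12
# and agree in nearly all of float32's ~7 significant digits, so the
# subtraction cancels catastrophically.
naive_var = np.mean(x * x) - np.mean(x) ** 2

# Simple, stable reference: two-pass algorithm in float64.
ref_var = np.var(x.astype(np.float64))

print(naive_var)  # garbage, possibly even negative
print(ref_var)    # ~1.0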


sk1p commented Mar 9, 2020

> Those are not small errors anymore

I didn't have time to check the error log yet - how large were the absolute errors? Maybe we should have better assertion output here that gives some statistics on the difference?

To reproduce, maybe try running it in a loop again?

> sensitive to numerical errors

I mean, one easy way is to just increase the dynamic range by a large margin, but then failure would be both guaranteed and expected. The hard part would be finding out how stable the algorithm should actually be.

I remember from the paper that they implemented Kahan summation in the minibatch part to increase stability, but I'm not sure how feasible that is here.
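(For reference, a minimal sketch of Kahan, i.e. compensated, summation, added for illustration and not LiberTEM code: a correction term carries the low-order bits that each float32 addition would otherwise drop.)

```python
import numpy as np

def kahan_sum_f32(values):
    total = np.float32(0)
    c = np.float32(0)  # compensation: low-order bits lost so far
    for v in values:
        y = np.float32(v) - c
        t = total + y
        c = (t - total) - y  # what got rounded away in t = total + y
        total = t
    return total

x = np.full(100_000, 1.1, dtype=np.float32)

naive = np.float32(0)
for v in x:
    naive += v  # plain float32 accumulation drifts noticeably

print(naive)                        # off by several counts
print(kahan_sum_f32(x))             # much closer to the reference
print(x.astype(np.float64).sum())   # float64 reference, ~110000.0024
```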


uellue commented Mar 9, 2020

> how large were the absolute errors?

For example 2.4994637e-01 vs 2.4994595e-01: an absolute difference of 4.2e-7, i.e. a relative deviation of about 1.7e-6, more than an order of magnitude beyond float32 machine epsilon (about 1.2e-7).

> The hard part would be finding out how stable the algorithm should actually be

What about "at least as good as numpy.var()", if we pose the task in such a way that the true result is known?

In fact, LiberTEM already performs a three-level hierarchical sum if tile and partition sizes are well chosen: first within the tile, then within the partition, and then a final aggregation.

I was already thinking about how to do a complete pairwise hierarchical sum without modifying the source buffer, using only s * log(N) memory (s: tile size, N: number of frames), in case this becomes an issue 😁.

Or just switch to float64 😄.
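(An illustrative sketch of that pairwise idea, my rendering rather than the original design: a binary-counter scheme keeps at most one partial sum per level alive, i.e. at most log2(N) buffers of tile size s, and never modifies the source tiles.)

```python
import numpy as np

def pairwise_tile_sum(tiles):
    """Sum a stream of equally-shaped tiles pairwise, binary-counter style.

    levels[k] holds the sum of 2**k tiles, so at most log2(N) partial
    sums of tile size s are alive at any time: s * log(N) memory.
    """
    levels = []  # levels[k] is either None or the sum of 2**k tiles
    for tile in tiles:
        carry = tile.astype(np.float32, copy=True)
        k = 0
        while k < len(levels) and levels[k] is not None:
            carry = carry + levels[k]  # combine two equal-sized partials
            levels[k] = None
            k += 1
        if k == len(levels):
            levels.append(None)
        levels[k] = carry
    # Fold the leftover partials (at most one per level)
    total = None
    for partial in levels:
        if partial is not None:
            total = partial if total is None else total + partial
    return total

# Example: 1000 frames of constant 1.1; pairwise combination in float32
# stays close to the exact result even without float64 accumulation.
tiles = (np.full((64, 64), 1.1, dtype=np.float32) for _ in range(1000))
print(pairwise_tile_sum(tiles)[0, 0])  # ~1100.0
```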


sk1p commented Mar 9, 2020

> What about "at least as good as numpy.var()", if we pose the task in such a way that the true result is known?

Well, that would mean actually decreasing the dynamic range of our test data, since the two-pass algorithm is AFAIK the most stable of the ones compared in the paper, right?
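(For context: per-partition variance aggregates are typically merged with the pairwise combination step from Chan et al., which is what makes a hierarchical scheme possible. The sketch below is the textbook formula, added for illustration, not necessarily LiberTEM's exact code.)

```python
import numpy as np

def merge_var_aggregates(n_a, mean_a, m2_a, n_b, mean_b, m2_b):
    """Combine two partial aggregates (count, mean, sum of squared
    deviations M2) per Chan et al.'s pairwise update.
    Population variance of the combined set is m2 / n."""
    n = n_a + n_b
    delta = mean_b - mean_a
    mean = mean_a + delta * n_b / n
    m2 = m2_a + m2_b + delta**2 * n_a * n_b / n
    return n, mean, m2

# Example: merge aggregates from two partitions of a dataset
x = np.random.default_rng(0).standard_normal(1000)
a, b = x[:400], x[400:]
n, mean, m2 = merge_var_aggregates(
    len(a), a.mean(), ((a - a.mean())**2).sum(),
    len(b), b.mean(), ((b - b.mean())**2).sum(),
)
print(m2 / n, np.var(x))  # should agree closely
```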

uellue added a commit to uellue/LiberTEM that referenced this issue on Mar 9, 2020:
"Counter stability issues with this algorithm" (Closes LiberTEM#657)

sk1p closed this as completed in #672 on Mar 9, 2020.

sk1p pushed a commit that referenced this issue on Mar 9, 2020:
"Counter stability issues with this algorithm" (Closes #657)

twentyse7en pushed a commit to twentyse7en/LiberTEM that referenced this issue on Mar 12, 2020:
"Counter stability issues with this algorithm" (Closes LiberTEM#657)