num_regression with storage in NPZ files? #71

tovrstra · 2021-09-03T09:00:47Z

It would be useful to store numerical results in NPZ files, because these files can contain arrays with arbitrary shapes. Also for large arrays, this format is more compact (binary + compression), which is useful when dealing with large arrays. The downside is obviously that NPZ files are not human-readable. Would this be a feature of interest for the project? Are there currently other ways of handling large (say 10^6 elements) and/or high-dimensional arrays?

nicoddemus · 2021-09-03T12:34:30Z

Hi @tovrstra!

Thanks for the suggestion.

Yeah I think an option to num_regression.check and dataframe_regression.check could be use to save arrays in NPZ format (or automatically if it detects that the current array wouldn't be saved in CSV, the current format).

We would love to review a PR in that direction. 👍

tarcisiofischer · 2021-09-03T13:45:50Z

I'm interested in this, specially for 2d arrays (e.g. For comparing scalar fields).
I'm curious to see the implementation details. Perhaps some questions to think about:

Could this be implemented keeping the dict input for num-regression or would it be only for a single nparray?
For example, would this code below be possible? (I believe no, but is there any alternative? Perhaps a new specific nparray_regression?)

            num_regression.check({
                'U_field': my_np_array_3d,
                'U_mean': my_np_array_2d,
                'U_center_mean': my_np_array_1d,
                'P': Pa_to_bar(P)[positions], // 1d np array smaller than the other one
            })

How would the error messages be presented for such dense matrices / multidimensional data? Perhaps sparse visualization? Or no message at all, just the file re-generation and the user makes a diff manually?

Perhaps I'm overthinking. Anything that improves the current implementation is actually very welcome, IMO!
Would love to hear more and/or see a PR too :) Thanks!

tovrstra · 2021-09-03T14:49:32Z

Thanks for the enthusiasm! Just a quick attempt to answer a few of the questions:

NPZ is a ZIP file of NPY files, so a dictionary of arrays (with different shapes and dtypes) can be stored in such a file. See NumPy's savez_compressed. You can pass in a dictionary whose keys are not proper Python variable names. E.g. following works (!):
```
import numpy as np
data = {"odd/name": np.array([1, 2]), "other.name": np.array([[3.0, 4.0, 5.0]])}
np.savez_compressed("somefile.npz", **data)
data2 = dict(np.load("somefile.npz"))
```
Error reporting can indeed be difficult. It depends on how sensitive the test should be. The simplest is to requires strict reproduction of the arrays, e.g. reporting the number of mismatched elements for each item in the dictionary, some examples of mismatches (old, new and error value for each). In some cases, one may prefer to have atol and rtol arguments for the check method, in analogy to the same arguments of NumPy's allclose method. This may not be perfect though, because the dictionary may contain different arrays with different precision requirements (or some not even containing floating point numbers).

tarcisiofischer · 2021-09-03T15:15:45Z

SGTM. About the error reporting, I think you can choose whatever you think its good enough for a first version, and perhaps we could discuss further in the PR :)

Fixes ESSS#71 This is a first attempt, including unit tests. Also some minor issues in DataFrameRegressionFixture were fixed, which was used as a starting point for this PR. The new fixture only uses NumPy, so also the error message makes no use for pandas for formatting the errors.

nicoddemus added the question Further information is requested label Sep 3, 2021

tovrstra mentioned this issue Sep 3, 2021

NDArraysRegressionFixture: regression on arrays with arbitrary shape. #72

Merged

tarcisiofischer closed this as completed in #72 Sep 15, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

num_regression with storage in NPZ files? #71

num_regression with storage in NPZ files? #71

tovrstra commented Sep 3, 2021

nicoddemus commented Sep 3, 2021 •

edited

tarcisiofischer commented Sep 3, 2021

tovrstra commented Sep 3, 2021

tarcisiofischer commented Sep 3, 2021

num_regression with storage in NPZ files? #71

num_regression with storage in NPZ files? #71

Comments

tovrstra commented Sep 3, 2021

nicoddemus commented Sep 3, 2021 • edited

tarcisiofischer commented Sep 3, 2021

tovrstra commented Sep 3, 2021

tarcisiofischer commented Sep 3, 2021

nicoddemus commented Sep 3, 2021 •

edited