unit test failures on Perlmutter #1963

marcelo-alvarez · 2023-01-11T00:12:02Z

Two unit tests fail in the main environment (desiconda/20220119-2.0.1) on Perlmutter:

source /global/common/software/desi/desi_environment.sh main
git clone https://github.com/desihub/desispec.git
cd desispec
python setup.py test 
...
======================================================================
ERROR: test_load_gdarc_lines (desispec.test.test_bootcalib.TestBoot)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/global/common/software/desi/perlmutter/desiconda/20220119-2.0.1/code/desispec/main/py/desispec/test/test_bootcalib.py", line 124, in test_load_gdarc_lines
    llist = desiboot.load_arcline_list(camera)
  File "/global/common/software/desi/perlmutter/desiconda/20220119-2.0.1/code/desispec/main/py/desispec/bootcalib.py", line 640, in load_arcline_list
    tbl = parse_nist(iline, vacuum=vacuum)
  File "/global/common/software/desi/perlmutter/desiconda/20220119-2.0.1/code/desispec/main/py/desispec/bootcalib.py", line 563, in parse_nist
    nist_tbl = Table.read(nist_file, format='ascii.fixed_width')
  File "/global/common/software/desi/perlmutter/desiconda/20220119-2.0.1/conda/lib/python3.9/site-packages/astropy/table/connect.py", line 62, in __call__
    out = self.registry.read(cls, *args, **kwargs)
  File "/global/common/software/desi/perlmutter/desiconda/20220119-2.0.1/conda/lib/python3.9/site-packages/astropy/io/registry/core.py", line 199, in read
    data = reader(*args, **kwargs)
  File "/global/common/software/desi/perlmutter/desiconda/20220119-2.0.1/conda/lib/python3.9/site-packages/astropy/io/ascii/connect.py", line 18, in io_read
    return read(filename, **kwargs)
  File "/global/common/software/desi/perlmutter/desiconda/20220119-2.0.1/conda/lib/python3.9/site-packages/astropy/io/ascii/ui.py", line 317, in read
    table = fileobj.read()
  File "/global/common/software/desi/perlmutter/desiconda/20220119-2.0.1/conda/lib/python3.9/encodings/ascii.py", line 26, in decode
    return codecs.ascii_decode(input, self.errors)[0]
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 613: ordinal not in range(128)

======================================================================
ERROR: test_parse_nist (desispec.test.test_bootcalib.TestBoot)
Test parsing of NIST arc line files.
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/global/common/software/desi/perlmutter/desiconda/20220119-2.0.1/code/desispec/main/py/desispec/test/test_bootcalib.py", line 114, in test_parse_nist
    tbl = desiboot.parse_nist('CdI')
  File "/global/common/software/desi/perlmutter/desiconda/20220119-2.0.1/code/desispec/main/py/desispec/bootcalib.py", line 563, in parse_nist
    nist_tbl = Table.read(nist_file, format='ascii.fixed_width')
  File "/global/common/software/desi/perlmutter/desiconda/20220119-2.0.1/conda/lib/python3.9/site-packages/astropy/table/connect.py", line 62, in __call__
    out = self.registry.read(cls, *args, **kwargs)
  File "/global/common/software/desi/perlmutter/desiconda/20220119-2.0.1/conda/lib/python3.9/site-packages/astropy/io/registry/core.py", line 199, in read
    data = reader(*args, **kwargs)
  File "/global/common/software/desi/perlmutter/desiconda/20220119-2.0.1/conda/lib/python3.9/site-packages/astropy/io/ascii/connect.py", line 18, in io_read
    return read(filename, **kwargs)
  File "/global/common/software/desi/perlmutter/desiconda/20220119-2.0.1/conda/lib/python3.9/site-packages/astropy/io/ascii/ui.py", line 317, in read
    table = fileobj.read()
  File "/global/common/software/desi/perlmutter/desiconda/20220119-2.0.1/conda/lib/python3.9/encodings/ascii.py", line 26, in decode
    return codecs.ascii_decode(input, self.errors)[0]
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 613: ordinal not in range(128)

@sbailey the same two errors also occur with a fresh desiconda install, so these failures should not block our updating desiconda, since desispec is working fine on Perlmutter in spite of these two failures. While not a high priority, we should get this fixed at some point.

The text was updated successfully, but these errors were encountered:

weaverba137 · 2023-01-11T00:18:21Z

Is there a specific reason you're using python setup.py test? You should try to run the tests as similar as possible to the GitHub Actions tests, and those use pytest.

marcelo-alvarez · 2023-01-11T01:08:40Z

@weaverba137 thanks for the suggestion. There is no specific reason other than for consistency with what is done in desitest. Did you determine that using pytest would solve these failures?

weaverba137 · 2023-01-11T03:15:12Z

If the tests are passing on GitHub Actions, then one should try to minimize the differences between that environment and perlmutter. Also, desitest is probably not what we want to be consistent with even in the near-term future.

sbailey · 2023-01-11T06:02:31Z

Curiously, 4 (not 2!) bootcalib tests fail with pytest on perlmutter when running the full test suite (they all boil down to hitting the same "UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 613: ordinal not in range(128)")

pytest py/desispec/test

But they pass if you run just the bootcalib test on its own:

pytest py/desispec/test/test_bootcalib.py

I don't have a good explanation for why or how that could matter.

Although the test failures are probably harmless, we should get them fixed ASAP so that we don't get used to failing unit tests, which eventually leads to not noticing a genuine failure.

marcelo-alvarez · 2023-03-22T23:56:25Z

@sbailey @weaverba137 I suspect this failure is related to #291 but you may be able to find a better solution. Do you want to take a try at resolving this? Thanks.

weaverba137 · 2023-03-23T00:16:52Z

The short version is that we might not need the elaborate locale manipulations any longer because perlmutter sets LANG=en_US.UTF-8. Something for tomorrow.

marcelo-alvarez · 2023-03-23T21:44:00Z

Sadly getting rid of the locale manipulations didn't fix it (although it would be great to get rid of them if no longer necessary, even cori sets LANG=en_US.UTF-8).

Something about test_binscripts.py changes the state such that the bootcalib unit test fails with it and succeeds when it is not included. Investigating further now.

marcelo-alvarez · 2023-03-23T23:24:25Z

Replacing the line

        device_smoothed = cupyx.scipy.ndimage.median_filter(device_flux, width, mode='constant')

with

        return scipy.ndimage.median_filter(flux, width, mode='constant')

results in success of desispec unit tests on Perlmutter.

@sbailey please have a look and let me know if you have any idea why calling cupyx.scipy.ndimage might cause this type of error downstream. I debugged with

        sys.stdout = open('/global/cfs/cdirs/desi/users/malvarez/testscripts/env_before','w'); print(os.environ)
        device_smoothed = cupyx.scipy.ndimage.median_filter(device_flux, width, mode='constant')
        sys.stdout = open('/global/cfs/cdirs/desi/users/malvarez/testscripts/env_after','w'); print(os.environ);

and verified that the environments are identical before and after this call, so there are no hidden changes to environment variables in the call to cupyx.scipy.ndimage.

marcelo-alvarez · 2023-03-23T23:31:35Z

see the bootcalib-test branch for the change that "fixes" the unit test failures @sbailey

sbailey · 2023-03-23T23:39:42Z

I have no idea. @dmargala or @craigwarner-ufastro do you have any ideas for why a call to cupyx.scipy.ndimage.median_filter could have a side effect of breaking the ability to parse a unicode string?

craigwarner-ufastro · 2023-03-24T05:13:19Z

Wow. That is really strange. I was able to reproduce the error and the success when replacing with scipy.ndimage.median_filter but I have no idea why or how cupyx.scipy.ndimage.median_filter could cause that side effect.

weaverba137 · 2023-03-24T15:31:54Z

Do any of the scripts tested in test_binscripts.py use desispec.bootcalib at all?

Is the same cupyx failure seen on main, or just the bootcalib-test branch?

dmargala · 2023-03-24T15:57:39Z

It seems like the underlying issue here is that there are non-ascii characters in a file that is parsed using an ascii codec. Is that correct?

From a quick and non-thorough look at the related issues referenced above, it seems like a work-around was put in place instead of a fix to the underlying problem. I'm not sure what or why but perhaps something in cupy is undo-ing that work-around?

source /global/common/software/desi/desi_environment.sh main
git clone https://github.com/desihub/desispec.git
cd desispec
pytest -x py/desispec/test # fails
sed -i s/Å/Ang/g py/desispec/data/arc_lines/*ascii # replace offending character
pytest -x py/desispec/test # passes

weaverba137 · 2023-03-24T17:08:44Z

Yes, the file contains non-ascii characters.

The key point here is that it is the test that is broken, not the actual code.

If you simply open Python and load the file using the function in desispec.bootcalib, it just works.

marcelo-alvarez · 2023-03-24T17:18:04Z

@weaverba137 I know the order is determined alphabetically, so maybe not feasible, but in the spirit of making the smallest change possible to keep the unit test in and not getting sidetracked, what do you think of somehow changing the order of unit tests so bootcalib comes first?

weaverba137 · 2023-03-24T17:23:50Z

@marcelo-alvarez, maybe I'm misunderstanding something here, but it seems like it's a bit too early to give up on fixing the problem.

marcelo-alvarez · 2023-03-24T17:53:11Z

That all depends on your definition of fixing the problem, but I'm happy to use yours :)

weaverba137 · 2023-03-24T17:56:23Z

OK, let's back up a step and let me ask: in detail, how did you identify that cupyx function as influencing the result?

sbailey · 2023-03-24T18:21:42Z

The key point here is that it is the test that is broken, not the actual code.

I don't think that's accurate. The test is checking whether desiboot.load_arcline_list(camera) and desiboot.parse_nist('CdI') work, and it has somewhat accidentally discovered that those commands don't work if cupy is imported. We could say that we don't care because we never need to use bootcalib with cupy (bootcalib is for determining the wavelength solution of a newly installed CCD and is run by hand, not part of the regular pipeline). But it is a code error that it is using an ascii reader for a non-ascii file and that there are some situations where apparently that doesn't work.

Reproducer outsider of the framework of the tests:

from desispec import bootcalib

#- Works
bootcalib.parse_nist('CdI')

#- bootcalib fails if cupy is imported and used
import cupy as cp
x = cp.arange(10)

#- Fails
bootcalib.parse_nist('CdI')

For further context, the files that is it trying to parse are downloaded directly from NIST, so it would be nice to be able to use them as-is and not require a manual cleaning step if we ever need to download an updated version. But that might be less painful that writing our own unicode-aware equivalent of Table.read(filename, format='ascii.fixed_width')

weaverba137 · 2023-03-24T18:59:02Z

@sbailey, thank you for the clarification. I'm still very interested to hear exactly how @marcelo-alvarez discovered that cupy is having this effect, that could provide additional clues.

marcelo-alvarez · 2023-03-24T19:00:55Z

Thanks for the context @sbailey. I think the solution here may depend on how likely it is that these particular calib files will ever be read after importing and using cupy/cupyx. Given that it hasn't happened in Iron or on nights daily processing was on Perlmutter, maybe it's not likely?

OK, let's back up a step and let me ask: in detail, how did you identify that cupyx function as influencing the result?

@weaverba137 I traversed all lines previous to the unit test failures in question systematically until I came upon the line that, if removed, would result in success.

dmargala · 2023-03-24T19:02:15Z

Using cupy does seem to have a side effect of changing the preferred encoding that astropy uses when it tries to load the file.

This works (no error):

> python -c 'from astropy.table import Table; Table.read("ArI_air.ascii", format="ascii.fixed_width")'

Adding a bit of cupy reproduces before that reproduces the error:

> python -c 'import cupy; cupy.arange(10); from astropy.table import Table; Table.read("ArI_air.ascii", format="ascii.fixed_width")'
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/tmp/conda/envs/test-ascii/lib/python3.10/site-packages/astropy/table/connect.py", line 62, in __call__
    out = self.registry.read(cls, *args, **kwargs)
  File "/tmp/conda/envs/test-ascii/lib/python3.10/site-packages/astropy/io/registry/core.py", line 219, in read
    data = reader(*args, **kwargs)
  File "/tmp/conda/envs/test-ascii/lib/python3.10/site-packages/astropy/io/ascii/connect.py", line 19, in io_read
    return read(filename, **kwargs)
  File "/tmp/conda/envs/test-ascii/lib/python3.10/site-packages/astropy/io/ascii/ui.py", line 350, in read
    table = fileobj.read()
  File "/tmp/conda/envs/test-ascii/lib/python3.10/encodings/ascii.py", line 26, in decode
    return codecs.ascii_decode(input, self.errors)[0]
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 675: ordinal not in range(128)

Comparing locale.getpreferredencoding() (this seems to be how astropy decides which encoding to use) before and after cupy usage:

> python -c 'import locale; print(locale.getpreferredencoding())'
UTF-8
> python -c 'import cupy; cupy.arange(10); import locale; print(locale.getpreferredencoding())'
ANSI_X3.4-1968

It looks like there is a special Python UTF-8 mode that you can use with either:

> python -X utf8 -c 'import cupy; cupy.arange(10); import locale; print(locale.getpreferredencoding())'
UTF-8
> PYTHONUTF8=1 python -c 'import cupy; cupy.arange(10); import locale; print(locale.getpreferredencoding())'
UTF-8

Forcing Python UTF-8 mode avoids the error:

> python -X utf8 -c 'import cupy; cupy.arange(10); from astropy.table import Table; Table.read("ArI_air.ascii", format="ascii.fixed_width")'

This actually seems more like a CUDA issue rather than cupy. Adding the "ANSI_X3.4-1968" to google search reveals similar occurrences of this issue in other libraries using CUDA besides cupy. This issue NVIDIA/cuda-python#29 has a reference to an nvidia bug report at https://developer.nvidia.com/nvidia_bug/3833924 but I am unable to view that.

marcelo-alvarez · 2023-03-24T19:12:38Z

@dmargala thanks for getting (closer) to the bottom of this. I can't test it myself right now, but maybe a more comprehensive forcing of the encoding in parse_nist could be the solution

weaverba137 · 2023-03-24T19:55:10Z

I'm planning to run some tests now. I'll work on the bootcalib-test branch.

marcelo-alvarez assigned marcelo-alvarez, sbailey and weaverba137 Mar 22, 2023

weaverba137 mentioned this issue Mar 24, 2023

Set encoding when reading NIST files #2029

Merged

marcelo-alvarez closed this as completed in #2029 Mar 24, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

unit test failures on Perlmutter #1963

unit test failures on Perlmutter #1963

marcelo-alvarez commented Jan 11, 2023

weaverba137 commented Jan 11, 2023

marcelo-alvarez commented Jan 11, 2023

weaverba137 commented Jan 11, 2023

sbailey commented Jan 11, 2023

marcelo-alvarez commented Mar 22, 2023

weaverba137 commented Mar 23, 2023

marcelo-alvarez commented Mar 23, 2023

marcelo-alvarez commented Mar 23, 2023

marcelo-alvarez commented Mar 23, 2023

sbailey commented Mar 23, 2023

craigwarner-ufastro commented Mar 24, 2023

weaverba137 commented Mar 24, 2023

dmargala commented Mar 24, 2023

weaverba137 commented Mar 24, 2023

marcelo-alvarez commented Mar 24, 2023

weaverba137 commented Mar 24, 2023

marcelo-alvarez commented Mar 24, 2023 •

edited

weaverba137 commented Mar 24, 2023

sbailey commented Mar 24, 2023

weaverba137 commented Mar 24, 2023

marcelo-alvarez commented Mar 24, 2023 •

edited

dmargala commented Mar 24, 2023

marcelo-alvarez commented Mar 24, 2023

weaverba137 commented Mar 24, 2023

unit test failures on Perlmutter #1963

unit test failures on Perlmutter #1963

Comments

marcelo-alvarez commented Jan 11, 2023

weaverba137 commented Jan 11, 2023

marcelo-alvarez commented Jan 11, 2023

weaverba137 commented Jan 11, 2023

sbailey commented Jan 11, 2023

marcelo-alvarez commented Mar 22, 2023

weaverba137 commented Mar 23, 2023

marcelo-alvarez commented Mar 23, 2023

marcelo-alvarez commented Mar 23, 2023

marcelo-alvarez commented Mar 23, 2023

sbailey commented Mar 23, 2023

craigwarner-ufastro commented Mar 24, 2023

weaverba137 commented Mar 24, 2023

dmargala commented Mar 24, 2023

weaverba137 commented Mar 24, 2023

marcelo-alvarez commented Mar 24, 2023

weaverba137 commented Mar 24, 2023

marcelo-alvarez commented Mar 24, 2023 • edited

weaverba137 commented Mar 24, 2023

sbailey commented Mar 24, 2023

weaverba137 commented Mar 24, 2023

marcelo-alvarez commented Mar 24, 2023 • edited

dmargala commented Mar 24, 2023

marcelo-alvarez commented Mar 24, 2023

weaverba137 commented Mar 24, 2023

marcelo-alvarez commented Mar 24, 2023 •

edited

marcelo-alvarez commented Mar 24, 2023 •

edited