Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Error reading Solar Dynamics Observatory, HMI continuum, FITS file with "nan" fields #10153

Closed
ndl303 opened this issue Apr 15, 2020 · 11 comments · Fixed by #10165
Closed

Error reading Solar Dynamics Observatory, HMI continuum, FITS file with "nan" fields #10153

ndl303 opened this issue Apr 15, 2020 · 11 comments · Fixed by #10165

Comments

@ndl303
Copy link

ndl303 commented Apr 15, 2020

Description

I am unable to read a FITS file with astropy.io.fits produced by the Solar Dynamics Observatory for the HMI continuum data product. I obtained the FITS files from the SDO JSOC but this requires users to register email and learn how to request data. I have the file temporarily available here if you want to replicate the problem.

I actually think this is not really a bug in astropy.io.fits but is due to the SDO HMI team using NaN values which is beyond the FITS standard (I think). Unfortunately this is a good idea and they obviously have FITS software that lets them create these files. From a pragmatic point of view it might make sense for astropy.io.fits to also support the option, which seems to be relatively straight forward, so it can keep up with other FITS software out there.

I have done some diagnostics outlined below and I have hacked my version of astropy to allow me to keep on working with the SDO HMI data.

Expected behavior

I hoped astropy.io.fits would read the SDO HMI continuum FITS file without issue. It consists of one 2-D image, 4096x4096 numbers.

The problem seems to be that the SDO HMI people have encoded "nan" values into some of the CARD values used in their FITS file. This is not supported by astropy.io.fits and may not be part of the FITS standard. Unfortunately it is used by the HMI instrument.

Other people have seen this problem and reported issues on Stack Overflow.

The sample code is straight-forward but must use SDO HMI data files. The example is shown below. The fits.open line works without issue but crashes on the line data = hdul[1].data.

    from astropy.io import fits
    
    fullname='hmi.Ic_720s.20170530_000000_TAI.3.continuum.fits'
    hdul =  fits.open(fullname)
    data =  hdul[1].data

Curiously the code does not always crash when using the PyCharm/Python debugger environment. I suspect the debugger is calling many @Property based values when it displays variable values and this is circumventing the normal internal caching and checking going on inside the FITS code. Definitely complicates debugging.

The code always crashes when it runs without a debugger. The following errors are output when executing the last line,

    WARNING: VerifyWarning: Verification reported errors: [astropy.io.fits.verify]
    WARNING: VerifyWarning: Card 'CRDER1' is not FITS standard (invalid value string: 'nan').  Fixed 'CRDER1' card to meet the FITS standard. [astropy.io.fits.verify]
    WARNING: VerifyWarning: Note: astropy.io.fits uses zero-based indexing.
    [astropy.io.fits.verify]
    [astropy.io.fits.verify]
    Traceback (most recent call last):
      File "C:\Users\nickl\Anaconda3\envs\altius\lib\site-packages\astropy\utils\decorators.py", line 744, in __get__ 
    val = self.fget(obj)
      File "C:\Users\nickl\Anaconda3\envs\altius\lib\site-packages\astropy\io\fits\hdu\compressed.py", line 1363, in data
    data = compression.decompress_hdu(self)
      File "C:\Users\nickl\Anaconda3\envs\altius\lib\site-packages\astropy\utils\decorators.py", line 744, in __get__
    val = self.fget(obj)
      File "C:\Users\nickl\Anaconda3\envs\altius\lib\site-packages\astropy\io\fits\hdu\compressed.py", line 1415, in compressed_data
    compressed_data = super().data
      File "C:\Users\nickl\Anaconda3\envs\altius\lib\site-packages\astropy\utils\decorators.py", line 744, in __get__
    val = self.fget(obj)
      File "C:\Users\nickl\Anaconda3\envs\altius\lib\site-packages\astropy\io\fits\hdu\table.py", line 399, in data
    data = self._get_tbdata()
      File "C:\Users\nickl\Anaconda3\envs\altius\lib\site-packages\astropy\io\fits\hdu\table.py", line 156, in _get_tbdata
    columns = self.columns
      File "C:\Users\nickl\Anaconda3\envs\altius\lib\site-packages\astropy\utils\decorators.py", line 744, in __get__
    val = self.fget(obj)
      File "C:\Users\nickl\Anaconda3\envs\altius\lib\site-packages\astropy\io\fits\hdu\table.py", line 395, in columns
    return self._columns_type(self)
      File "C:\Users\nickl\Anaconda3\envs\altius\lib\site-packages\astropy\io\fits\column.py", line 1383, in __init__
    self._init_from_table(input)
      File "C:\Users\nickl\Anaconda3\envs\altius\lib\site-packages\astropy\io\fits\column.py", line 1448, in _init_from_table
    for keyword, value in hdr.items():
      File "C:\Users\nickl\Anaconda3\envs\altius\lib\site-packages\astropy\io\fits\header.py", line 983, in items
    yield (card.keyword, card.value)
      File "C:\Users\nickl\Anaconda3\envs\altius\lib\site-packages\astropy\io\fits\card.py", line 285, in value
    value = self._value = self._parse_value()
      File "C:\Users\nickl\Anaconda3\envs\altius\lib\site-packages\astropy\io\fits\card.py", line 767, in _parse_value
    ".verify('fix').".format(self.keyword))
    astropy.io.fits.verify.VerifyError: Unparsable card (CRDER2), fix it first with .verify('fix').
    
    
    Process finished with exit code 1

System Details

Numpy 1.18.1
astropy 4.0.1.post1
Scipy 1.3.1

Diagnostics

I have debugged the code to the point that I have a temporary fix on my own machine. The problem is in file astropy.io.fits.card.py in function Card._parse_value(self).

Around line 764 there is a piece of code which is where the problem occurs.

        m = self._value_NFSC_RE.match(self._split()[1])

        if m is None:
            raise VerifyError("Unparsable card ({}), fix it first with "
                              ".verify('fix').".format(self.keyword))

The line m = self._value_NFSC_RE.match(self._split()[1]) is trying to ensure that a CARD value is valid and 'nan' is not an acceptable value. The match returns as None and the exception is thrown.

The quick fix I have implemented changes the regular expression so it passes the match.

  1. The first action is to change the regular expression around line 111 of card.py to include a new match group called nan. In my implementation I only match 'nan'. More general implementations might also match 'NaN' and 'NAN'.
    _value_NFSC_RE = re.compile(
        r'(?P<valu_field> *'
            r'(?P<valu>'
                r'\'(?P<strg>([ -~]+?|\'\'|) *?)\'(?=$|/| )|'
                r'(?P<bool>[FT])|'
                r'(?P<nan>nan)|'
                r'(?P<numr>' + _numr_NFSC + r')|'
                r'(?P<cplx>\( *'
                    r'(?P<real>' + _numr_NFSC + r') *, *'
                    r'(?P<imag>' + _numr_NFSC + r') *\))'
            r')? *)'
        r'(?P<comm_field>'
            r'(?P<sepr>/ *)'
            r'(?P<comm>(.|\n)*)'
        r')?$')
  1. The second action is to edit function Card._parse_value(self) around line 764 to manage this new match group (nan) and use the numpy.nan function to assign a value,
        m = self._value_NFSC_RE.match(self._split()[1])

        if m is None:
            raise VerifyError("Unparsable card ({}), fix it first with "
                              ".verify('fix').".format(self.keyword))

        if m.group('bool') is not None:
            value = m.group('bool') == 'T'
        if m.group('nan') is not None:
            value = np.nan
        elif m.group('strg') is not None:
            value = re.sub("''", "'", m.group('strg'))
        ...

With these fixes, the code on my machine is working. In practice a robust fix would check some of the other places where CARD values are parsed.

Please let me know if I can be of any assistance. Astropy is a great package and I would be more than happy to help with this problem.

@ndl303
Copy link
Author

ndl303 commented Apr 15, 2020

System Details

Windows-10-10.0.18362-SP0
Python 3.7.6 | packaged by conda-forge | (default, Mar 5 2020, 14:47:50) [MSC v.1916 64 bit (AMD64)]
Numpy 1.18.1
astropy 4.0.1.post1
Scipy 1.3.1

@ayshih
Copy link
Contributor

ayshih commented Apr 15, 2020

Is running hdul.verify('fix'), as suggested by the error message, not an acceptable fix for you?

Also, for working with SDO/HMI data, I recommend that you use SunPy, since it'll do additional stuff for you. (Under the hood, SunPy calls .verify('silentfix+warn').) Here's some example code for the file you provided:

>>> import sunpy.map
>>> hmi_map = sunpy.map.Map('hmi.Ic_720s.20170530_000000_TAI.3.continuum.fits')
>>> hmi_map.peek()

hmi

@pllim
Copy link
Member

pllim commented Apr 16, 2020

Yeah, if the maker of the data violates the standard, it is out of scope. Did you contact the observatory first?

@ndl303
Copy link
Author

ndl303 commented Apr 16, 2020

Thanks for the help,

The fix you suggest is acceptable. I did see the .verify("fix") suggestion in the error report but I guess the problem I had is that I did not know which object the .verify('fix') applied to. I tried it on a few objects, the header, the cards, the data without success. I must not have tried it on the hdul, my mistake. I read the astropy.io.fits documentation and did not find any documentation on the verify method, it does not exist on the astropy fits documentation. I did find documentation on function open with option output_verify which I tried with no success.

I have sent an email to the observatory at Stanford to let them know. I mostly use Earth science data which are netcdf based and I am not a FITS expert. I assumed that the code run by the Joint Science Operations Centre at Stanford for a major NASA mission like SDO, will have been thoroughly validated. I thought it much more likely that astropy.io.fits was a little bit behind the latest updates in FITS standards especially given that NaN is such a sensible idea in many situations.

Again thanks for the help and suggestions. I found and installed Sunpy yesterday but I have not looked at it yet, but it sounds very promising.

@ndl303
Copy link
Author

ndl303 commented Apr 16, 2020

Maybe a sensible option is to add some documentation to the astropy.io.fits.open function (or elsewhere) explaining the usage of .verify('fix') for cases where there are non-standard entries in the FITS file. A couple of lines in the documentation would have saved me a few hours of fiddling around.

@ayshih
Copy link
Contributor

ayshih commented Apr 16, 2020

It's not linked to particularly well, but there is a page devoted to the issue of verification: https://docs.astropy.org/en/stable/io/fits/usage/verification.html

I'll emphasize a few sentences from there:

Since FITS standard is a “loose” standard, there are many places the violation can occur and to enforce them all will be almost impossible. It is not uncommon for major observatories to generate data products which are not 100% FITS compliant. Some observatories have also developed their own nonstandard dialect and some of these are so prevalent that they have become de facto standards.

@ndl303
Copy link
Author

ndl303 commented Apr 16, 2020

Thanks for the link and I think your statement underlines the need for documentation on the .verify('fix') method near to the open function: you know there are many violations from the FITS standard and the code is expected to fail unless .verify('fix') is called at the correct place.

Let me know if I can be any of assistance. I would be happy to edit the FITS documentation as a novice contributor if you think that would be helpful.

@saimn
Copy link
Contributor

saimn commented Apr 16, 2020

Yes, the issue is known (e.g. #873), but it seems that since that nobody pushed for allowing NaNs in header cards in the standard. And sadly people use this kind of non-standard things, so I guess we could add an option to allow reading those NaNs, it would not be the first exception.

And as mentioned by @ayshih, .verify is documented in https://docs.astropy.org/en/stable/io/fits/usage/verification.html But it has its flaws, e.g. #887, partially because verifying FITS files is hard because of all the little non-standard things ;). Having a better verification mechanism has been a long time issue, but this requires a substantial amount if work.

@saimn
Copy link
Contributor

saimn commented Apr 16, 2020

Maybe a link to the verification page could be added at the end of the "Opening a FITS File" paragraph ? http://docs.astropy.org/en/latest/io/fits/#opening-a-fits-file

@bsipocz
Copy link
Member

bsipocz commented Apr 16, 2020

Let me know if I can be any of assistance. I would be happy to edit the FITS documentation as a novice contributor if you think that would be helpful.

@ndl303 - your contribution would be most useful.

@embray
Copy link
Member

embray commented Apr 19, 2020

Indeed, most of the verify('fix') system is a holdover from the original PyFITS that I've long wanted to improve (see e.g. #3668 though I think there are some even older issues that discuss this more generally, but I can't find them).

I definitely did improve it a lot back in the day, by having the reader enforce strict standards by default, but then there are several places where it can identify various common standard violations and in some cases propose fixes. This is done in a very ad-hoc manner though, and there's no good way to control the behavior (e.g. verify('fix') just fixes everything it can and there's no good way to control the process).

A smarter approach would be to define some specific verification phases (this is already defined implicitly in the code) such as verifying overall header structure, verifying card structure, verifying individual keywords and values, etc. And there could be a cleaner way to represent (e.g. in the form of exceptions) specific standards that are being violated (perhaps even with reference to the relevant chapter and verse of the FITS standard). This could allow users to register hooks for how to fix and/or ignore specific standard violations. So if you're working with data from an observatory that writes "nan" as card values, and Astropy doesn't otherwise support that case, you could register your own fix for InvalidCardValue("nan") which parses the value "nan" into float('nan') and allows the error to be ignored. This would also require an analogous function for output verification (output verification should normally not let you write nan as a header value, but you could register a custom card serializer for otherwise unsupported values).

I think this would all be relatively easy to do, and @ndl303 has already done good work pinpointing some of the relevant parts of the code. But it should definitely be handled with care.

Additionally, it would be nice to have a better UI for fixing violations (alluded to somewhat in #3668) including an interactive mode when working in a REPL, which would allow inspecting invalid data one at a time, seeing what the proposed fix(es) are, and possibly also allowing a manual override. Something like that (I'm thinking e.g. git add -p as an inspiration).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

6 participants