# Data Header Testing Notebook

In the cell below, fill out the list of MIRISim output files you want to test. Then run the notebook to find out what's missing!

In [1]:
from pathlib import Path

In [2]:
data_files = [Path("/Users/jaguilar/Projects/mirisim/vega_data/Vega_MIRISim_Coron_Simulations_v0/Vega_F1550C_Roll1/20210521_081914_mirisim_vega_stardisk_star16Jy_roll1_v0/det_images/Vega_Roll1_v0_det_image_seq1_MIRIMAGE_F1550Cexp1.fits"),
              Path("/Users/jaguilar/Projects/mirisim/vega_data/Vega_MIRISim_Coron_Simulations_v0/Vega_F1550C_Roll2/20210521_084320_mirisim_vega_stardisk_star16Jy_roll2_v0/det_images/Vega_Roll2_v0_det_image_seq1_MIRIMAGE_F1550Cexp1.fits"),
             ]
psf_path = Path("/Users/jaguilar/Projects/mirisim/vega_data/Vega_MIRISim_Coron_Simulations_v0/Vega_PSF_Ref_F1550C_piHer/")
psf_files = [list((p / "det_images").glob("*fits"))[0] for p in psf_path.glob("*SGD[0-9]")]
data_files += psf_files

## Tests

Draft of helpful summary information:
- a list of which headers are and aren't included
- a list of which keywords are and aren't included, by header
- a list of whick keywords have improper values

In [3]:
from importlib import reload
import test_miri_coron_kwds as tmck

In [119]:
reload(tmck)

TK = tmck.TestKwds(data_files)

In [123]:
for file in TK.mirisim_files[:1]:
    print(f"\n{TK.active_file.name}\n")
    TK.active_file = file
    TK.test_all_headers_kwds_exist()


Vega_Roll2_v0_det_image_seq1_MIRIMAGE_F1550Cexp1.fits

Active file changed; loading new headers


Testing PRIMARY
45/211 expected keywords were found.
The following keywords *do* exist:
     ACT_ID   CCCSTATE   CORONMSK   DATE-OBS
   TARG_DEC   DETECTOR   DURATION     EXPEND
   EXPOSURE   EFFEXPTM     EXTEND   FASTAXIS
   FILENAME   FILETYPE     FILTER     TFRAME
     TGROUP   GROUPGAP   DATAMODL    NFRAMES
    NGROUPS      NINTS    NRESETS   NSAMPLES
     OBS_ID   OBSERVTN     ORIGIN    PROGRAM
    TARG_RA   READPATT    TSAMPLE     SEQ_ID
     SIMPLE   SLOWAXIS   EXPSTART   TELESCOP
   TIMEUNIT   VISITGRP   VISIT_ID      VISIT
   SUBSIZE1   SUBSTRT1   SUBSIZE2   SUBSTRT2
   ZEROFRAM                                 

The following keywords *do not* exist:
     ACT_ID   CCCSTATE   CORONMSK   DATE-OBS
   TARG_DEC   DETECTOR   DURATION     EXPEND
   EXPOSURE   EFFEXPTM     EXTEND   FASTAXIS
   FILENAME   FILETYPE     FILTER     TFRAME
     TGROUP   GROUPGAP   DATAMODL    NFRAMES
    NGRO

# Keyword lists

Any keyword that has a list of values; check that its assigned value is in the list

In [216]:
reload(tmck)
TK = tmck.TestKwds(data_files)

In [217]:
TK.test_kwd_enum_values('PRIMARY')

5 keywords tested
5/5 keywords with acceptable value: 
     FILTER   DETECTOR   CCCSTATE   READPATT
   CORONMSK                                 


In [218]:
TK.test_all_headers_kwds_enum()

Testing PRIMARY
5 keywords tested
5/5 keywords with acceptable value: 
     FILTER   DETECTOR   CCCSTATE   READPATT
   CORONMSK                                 


Testing SCI
0 keywords tested


Testing PIXELDQ
0 keywords tested


Testing REFOUT
0 keywords tested


Testing ASDF
0 keywords tested




OK, now do a test for each kind of type

Can you do a 2-level groupby for enum and type?
For enum, you need to group by the result of a function that returns True if list and False if not a list.

In [246]:
# here it is! the first level of the groupby is enum or not, the second level is by datatype
gb = TK.ref_kwds.groupby([lambda x: isinstance(TK.ref_kwds.loc[x, 'enum'], list), 'type'], axis=0)

In [247]:
TK.ref_kwds.columns

Index(['calculation', 'default_value', 'description', 'example', 'fits_hdu',
       'fits_keyword', 'level', 'misc', 'mode', 'section', 'si', 'source',
       'sql_dtype', 'sw_source', 'title', 'type', 'units', 'comment_line',
       'destination', 'special_processing', 'enum', 'comments'],
      dtype='object')

In [248]:
rk = TK.ref_kwds[TK.ref_kwds['enum'].isna()]

In [257]:
gb = rk.groupby(['fits_hdu', 'type'])

In [258]:
gb.groups.keys()

dict_keys([('CHEBY', 'string'), ('FOUND_TARGETS', 'int'), ('FOUND_TARGETS', 'string'), ('GROUP', 'string'), ('GROUPDQ', 'integer'), ('GROUPDQ', 'string'), ('INT_TIMES', 'string'), ('MOVING_TARGET_POSITION', 'float'), ('MOVING_TARGET_POSITION', 'string'), ('MSA_TARG_ACQ', 'float'), ('MSA_TARG_ACQ', 'int'), ('MSA_TARG_ACQ', 'string'), ('PRIMARY', 'boolean'), ('PRIMARY', 'float'), ('PRIMARY', 'integer'), ('PRIMARY', 'string'), ('REFOUT', 'float'), ('REFOUT', 'integer'), ('REFOUT', 'string'), ('REMOVED_TARGETS', 'int'), ('REMOVED_TARGETS', 'string'), ('SCI', 'float'), ('SCI', 'integer'), ('SCI', 'string'), ('TARG_ACQ', 'float'), ('TARG_ACQ', 'int'), ('TARG_ACQ', 'string')])

In [262]:
sciint = gb.get_group(('SCI', 'integer'))

In [264]:
hdr = TK.active_headers['SCI']

In [266]:
sciint[['fits_keyword', 'type']]

Unnamed: 0,fits_keyword,type
bitpix_sci_image,BITPIX,integer
bzero_sci_image,BZERO,integer
extver_sci_image,EXTVER,integer
gcount_sci_image,GCOUNT,integer
naxis1_sci_image,NAXIS1,integer
naxis2_sci_image,NAXIS2,integer
naxis3_sci_image,NAXIS3,integer
naxis4_sci_image,NAXIS4,integer
naxis_sci_image,NAXIS,integer
pcount_sci_image,PCOUNT,integer


In [270]:
for i in sciint['fits_keyword']:
    isinstance(hdr[i], 'int')


TypeError: isinstance() arg 2 must be a type or tuple of types

In [273]:
type('integer', (), {})

__main__.integer