This notebook extracts all the keywords, along with their allowed values, that go into the MIRI coronagraphic data.

The keywords are defined here: https://archive.stsci.edu/jwst/keyword/latest/MIRICoronagraphKeywordsSchemaMetadata.html (see also https://archive.stsci.edu/jwst/keyword/latest/)

All the schemas have been downloaded to the folder ./schema_folder. If necessary, you can re-download them using the links above.

In [1]:
import yaml
import glob

from pathlib import Path

# Get the keywords

First, load the top-level schema for coronagraphic data, which contains *all* the keywords.

In [2]:
fname = Path("./schema_folder") / "top.miri.coron.schema.json"
# the schema files are JSON; this notebook reads them with PyYAML's safe loader
with open(fname, 'r') as file:
    top_schema = yaml.load(file, Loader=yaml.SafeLoader)

In [3]:
print(top_schema)

{'type': 'object', 'title': 'root', 'properties': {'meta': {'title': 'MIRI Coronagraph Keywords Schema Metadata', 'type': 'object', 'properties': {'standard': {'title': 'Standard parameters', 'type': 'object', 'properties': {'$ref': 'standard.schema.json'}}, 'basic': {'title': 'Basic parameters', 'type': 'object', 'properties': {'$ref': 'core.basic.schema.json'}}, 'coordinates': {'title': 'Information about the coordinates in the file', 'type': 'object', 'properties': {'$ref': 'core.coordinates.schema.json'}}, 'program': {'title': 'Programmatic information', 'type': 'object', 'properties': {'$ref': 'core.program.schema.json'}}, 'observation': {'title': 'Observation identifiers', 'type': 'object', 'properties': {'allOf': [{'$ref': 'core.observation.schema.json'}, {'$ref': 'science.observation.schema.json'}, {'$ref': 'core.coron.schema.json'}]}}, 'visit': {'title': 'Visit information', 'type': 'object', 'properties': {'$ref': 'core.visit.schema.json'}}, 'target': {'title': 'Target inform

It turns out this file mostly contains references to other files. We'll have to go down the rabbit hole to get to the files where the keywords are actually defined.

The schema works like this: `top_schema['properties']['meta']['properties']` is a dictionary in which each entry is a "section" of the file metadata (e.g. "Standard", "Coordinates", "Observation", "Program", "Aperture", etc.). Under each section, the `'properties'` entry holds references to other schema files, either directly or inside an `allOf` list. Each referenced file name is stored as the value of a `$ref` key.  
So all you have to do is traverse the JSON, find every `$ref` key, and return the value of `dict['$ref']`.

In [4]:
subschema_all = top_schema['properties']['meta']['properties']

In [5]:
# here's an example of a complicated case
subschema_all['instrument']

{'title': 'Instrument configuration information',
 'type': 'object',
 'properties': {'allOf': [{'$ref': 'core.instrument.schema.json'},
   {'$ref': 'science.instrument.schema.json'},
   {'$ref': 'miri.filter.schema.json'},
   {'$ref': 'miri.all.instrument.schema.json'},
   {'$ref': 'miri.coron.schema.json'}]}}

In [6]:
def ref_generator(schema_dict):
    """Traverse the nested schema and yield the value of every `$ref` key."""
    if not isinstance(schema_dict, dict):
        return
    for k, v in schema_dict.items():
        if k == '$ref':
            yield v
        elif isinstance(v, dict):
            yield from ref_generator(v)
        elif isinstance(v, list):
            # e.g. the entries of an `allOf` list
            for el in v:
                yield from ref_generator(el)
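
As a quick sanity check, the same traversal can be tried on a small hand-made dictionary. This is a minimal sketch: `iter_refs` and `toy_schema` are hypothetical names, and the file names in the toy dictionary are invented for illustration, not taken from the real schema.

```python
def iter_refs(node):
    """Yield the value of every `$ref` key in a nested dict/list structure."""
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "$ref":
                yield value
            else:
                yield from iter_refs(value)
    elif isinstance(node, list):
        for item in node:
            yield from iter_refs(item)


# Invented toy data that mimics the two layouts seen above:
# a direct `$ref`, and several `$ref` entries inside an `allOf` list.
toy_schema = {
    "observation": {
        "properties": {
            "allOf": [{"$ref": "a.schema.json"}, {"$ref": "b.schema.json"}]
        }
    },
    "visit": {"properties": {"$ref": "c.schema.json"}},
}

toy_refs = list(iter_refs(toy_schema))
print(toy_refs)
```

The references come back in the order in which they appear in the dictionary, so duplicates (if any) are kept.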

In [7]:
files = list(ref_generator(subschema_all))

Oh no! Some of these files also have their own internal file dependencies. Traverse them all!

In [8]:
sub_files = []
for file in files:
    with open(Path("./schema_folder/") / file) as ff:
        schema = yaml.load(ff, yaml.SafeLoader)
    # collect the references found inside each referenced file (one list per file)
    sub_files.append([i for i in ref_generator(schema) if i])

In [9]:
# flatten the list of lists into a single list of file names
# (unlike reduce, this also works if no file has any references)
sub_files = [ref for refs in sub_files for ref in refs]
files = files + sub_files
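
The loop above follows references one level down only. If the sub-files ever reference further files, a transitive walk with a "seen" set would pick those up too, without listing any file twice. The following is only a sketch under stated assumptions: `collect_schema_files` and `iter_refs` are hypothetical helpers, every `$ref` value is assumed to be a file name inside the schema folder, and the files are assumed to be strict JSON (swap `json.load` for the `yaml.load` call used in this notebook if they are not).

```python
import json
from pathlib import Path


def iter_refs(node):
    """Yield the value of every `$ref` key in a nested dict/list structure."""
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "$ref":
                yield value
            else:
                yield from iter_refs(value)
    elif isinstance(node, list):
        for item in node:
            yield from iter_refs(item)


def collect_schema_files(start_files, schema_dir):
    """Follow `$ref` links transitively, opening each file only once."""
    seen = set()
    queue = list(start_files)
    while queue:
        name = queue.pop()
        if name in seen:
            continue  # already visited; this also protects against cycles
        seen.add(name)
        with open(Path(schema_dir) / name) as ff:
            schema = json.load(ff)
        queue.extend(iter_refs(schema))
    return sorted(seen)
```

Called as `collect_schema_files(files, "./schema_folder")`, this would return a sorted list of unique file names, so the count may differ from the one printed below if the plain list contains duplicates.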

Finally, we have a list of all the files that store the keywords that go in the headers

In [10]:
print(f"There are {len(files)} schema files\n")
print('Schema files:\n\t'+'\n\t'.join(sorted(files)))


There are 43 schema files

Schema files:
	all.dither.schema.json
	cheby.extension.schema.json
	core.aperture.schema.json
	core.association.schema.json
	core.background.schema.json
	core.basic.schema.json
	core.cal_step.schema.json
	core.coordinates.schema.json
	core.coron.schema.json
	core.ephemeris.schema.json
	core.exposure.schema.json
	core.instrument.schema.json
	core.observation.schema.json
	core.photometry.schema.json
	core.program.schema.json
	core.ref_file.schema.json
	core.resample.schema.json
	core.sc_pointing.schema.json
	core.subarray.schema.json
	core.table.extensions.schema.json
	core.targacq.msa.found.schema.json
	core.targacq.msa.removed.schema.json
	core.targacq.msa.schema.json
	core.target.acquisition.schema.json
	core.target.schema.json
	core.time.schema.json
	core.velocity_aberration.schema.json
	core.visit.schema.json
	core.wcs.schema.json
	group.extension.schema.json
	guidestar.schema.json
	integration.extension.schema.json
	miri.all.instrument.schema.json
	miri.ba

The different files turn out to have different structures. However, I think I can still traverse the JSON as before, looking for a few keys and returning them along with their corresponding values. The keys are:
- fits_keyword : the keyword that goes in the FITS header
- enum : the allowed values
- type : the datatype of the value
- special_processing : not really sure what this is, but it seems useful
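
A traversal of that kind could look roughly like the sketch below. It is not what the cells that follow do (they merge the top-level entries instead); `iter_keywords`, `WANTED`, and the toy dictionary are hypothetical, and the toy entries are invented for illustration only.

```python
WANTED = ("fits_keyword", "enum", "type", "special_processing")


def iter_keywords(node, name=None):
    """Yield (name, info) for every nested dict that defines `fits_keyword`.

    `name` is the dictionary key under which the current node was found.
    `info` keeps only the keys listed in WANTED that are actually present.
    """
    if isinstance(node, dict):
        if "fits_keyword" in node:
            yield name, {k: node[k] for k in WANTED if k in node}
        for key, value in node.items():
            yield from iter_keywords(value, key)
    elif isinstance(node, list):
        for item in node:
            yield from iter_keywords(item, name)


# Invented toy data: one entry with a FITS keyword, one without.
toy = {
    "instrument": {
        "properties": {
            "filter": {
                "title": "toy filter entry",
                "fits_keyword": "FILTER",
                "type": "string",
                "enum": ["F1065C", "F1140C"],
            },
            "comment": {"type": "string"},
        }
    },
}

found = dict(iter_keywords(toy))
print(found)
```

Only the `filter` entry is returned here, because `comment` has no `fits_keyword`. Entries that share a name in different files would overwrite each other in `found`, just as they do with `dict.update`.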

In [11]:
files[0]

'standard.schema.json'

In [12]:
# `schema` is still the last file loaded in the loop above
a = set(schema.keys())
b = {'title', 'type', 'properties'}

In [13]:
a == b  # True only if both sets hold exactly the same keys

False

In [14]:
all_kwds = {}
for file in files:
    with open(Path("./schema_folder/") / file) as ff:
        schema = yaml.load(ff, yaml.SafeLoader)
    # merge the top-level entries of every file (later files overwrite duplicate keys)
    all_kwds.update(schema)

In [15]:
# keep only the entries that define a FITS keyword;
# the isinstance check skips top-level values that are not dictionaries
all_kwds = {k: v for k, v in all_kwds.items()
            if isinstance(v, dict) and 'fits_keyword' in v}
# for k, v in all_kwds.items():
#     print(k, v['fits_keyword'])

In [16]:
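# note: yaml.dump writes YAML, so despite the .json extension
# this file should be read back with yaml.load
with open("./all_coron_kwds.json", 'wt') as ff:
    yaml.dump(all_kwds, ff)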

Now I have all the keywords! Mwahahahahaha. Time to write tests.
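
One kind of test this table makes possible is checking header values against the allowed `enum` values. The sketch below is only an idea of what that might look like: `find_enum_violations` is a hypothetical helper, the header is assumed to behave like a plain dictionary of FITS keyword to value, and the toy entries are invented.

```python
def find_enum_violations(header, keywords):
    """Return (fits_keyword, value) pairs whose value is not in the allowed enum.

    `header` maps FITS keyword -> value; `keywords` has the same layout as
    `all_kwds` (name -> dict with at least a `fits_keyword` entry).
    Keywords without an `enum`, or absent from the header, are skipped.
    """
    violations = []
    for info in keywords.values():
        fits_key = info["fits_keyword"]
        allowed = info.get("enum")
        if allowed is None or fits_key not in header:
            continue
        if header[fits_key] not in allowed:
            violations.append((fits_key, header[fits_key]))
    return violations


# Invented toy data for illustration only.
toy_kwds = {
    "filter": {"fits_keyword": "FILTER", "enum": ["F1065C", "F1140C"]},
    "date": {"fits_keyword": "DATE", "type": "string"},
}
good_header = {"FILTER": "F1065C", "DATE": "2022-01-01"}
bad_header = {"FILTER": "F560W", "DATE": "2022-01-01"}

print(find_enum_violations(good_header, toy_kwds))
print(find_enum_violations(bad_header, toy_kwds))
```

Returning the list of violations, rather than raising on the first one, makes it easy to report every bad keyword in a file at once.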