Standardize header metadata. #10
Conversation
Force-pushed from 0486e10 to a4e72c8.
Metadata still missing: band or filter, exposure duration (grab it independently instead of computing end − start, if feasible), and some kind of processed/reduced flag (probably either "yes" or "unknown").
Added filter and exposure time. Added a test dataset to get an overview of where the functionality stands right now; coverage is about 50/50 of the dataset you compiled. Images in the dataset have been cropped for size, but the WCS values have been shifted to (approximately) maintain positional accuracy. I started gnawing at getting more instruments supported today, but nothing I would like to commit quite yet. I think our discussion was on point, and that whether we want to or not we will have to create a …
Force-pushed from 9d73c83 to 7ee1de1.
Ok, this is now getting much better; I think someone could take a look at what is in this code and not feel offended. Added header standardizer classes, essentially maps between what we want in our DB and what we can find in a particular instrument's header. Processors are essentially recipes for how a particular file should be processed. For example, a compressed archive needs to be uncompressed ("unarchived"), then each file needs to be processed; but if this particular file is a Rubin calexp, then the image is only in the 1st HDU; if it is a DECam NOAO image, then all HDUs have headers and data; if it is an SDSS…. To do:
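The standardizer idea described above can be sketched roughly as follows. This is an illustrative outline, not the PR's actual code: the class names, keyword names, and DB column names are assumptions made up for the example.

```python
from abc import ABC, abstractmethod


class HeaderStandardizer(ABC):
    """Maps one instrument's raw FITS header keywords to our DB columns."""

    @abstractmethod
    def standardize(self, header):
        """Return a dict of DB column name -> value for the given header."""


class DecamStandardizer(HeaderStandardizer):
    # Keyword and column names here are hypothetical placeholders.
    def standardize(self, header):
        return {
            "filter": header["FILTER"],
            "exptime": header["EXPTIME"],
            "obs_lon": header["OBS-LONG"],
        }
```

Each instrument then gets its own subclass, and the processor only needs to pick the right standardizer and call `standardize` on each header it encounters.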
Force-pushed from 196da71 to 3a02a54.
Force-pushed from 3a02a54 to 92781e0.
That was a marathon of a review! Impressive work, let me know if any of my comments are unclear. I trust you to merge when you think you've addressed them.
Resolved review thread (outdated): trail/upload/process_uploads/processors/single_extension_fits.py
Attempt to use astro_metadata_translator and Astropy WCS to standardize the FITS metadata we want to push to our DB.
Separate the image center pixel, image corner pixel, and radius (between the two) into separate values and store those. TODO: store the actual WCS as blob data and move on to astro_metadata_translator.
Fix the Django metadata model.
Prettify code. Allow for future expansion of processing to, potentially, compressed FITS archives. Create small and large thumbnails for the gallery. Add some extra practical functionality to TemporaryUploadedFile.
Fixes for PR review comments. Add a minimal test dataset. Add tests. Fix for naive timezone DB error. Fix for table migration error. Remove unnecessary (I think) migration directories from apps that don't do any database interaction.
Add Processors - classes that understand the layout of different FITS files and how they should be processed. Add Standardizers - classes that understand the different types of HDUs and can extract our database model keys from them. Add dataclasses - wider adoption unclear, but they do make life significantly easier when it comes to comparing and validating various collections of standardized keys. Fix the rest of the upload app to interface with the new functionality. Add tests. Add docstrings.
Fix the ugly test data yaml file. Replace the DECam test data with a newer one that will more accurately represent DECam data. Fix a slight dataclass issue that arose once we changed the format of the test dataset yaml file.
Move SingleExtensionFits and MultiExtensionFits to processors. Move primary header stuff to FitsProcessor as common pattern. Fix minor problems with priority values. Move FitsProcessor to UploadProcessor level.
Get rid of the long and short output format. Simplify dataclasses. Fix tests, fix comment.
Storing multi-ext FITS files is complicated because it's not trivial to unravel them (leaving potentially many images without trails) and it's not easy to store only the relevant data. As was discussed, we store the whole file and live with a little bit of non-user-friendliness. Fix the poor naming scheme of UploadProcessor attributes: what was `upload` is now `fileWrapper`, and what is `upload` now is the UploadInfo model. Add UploadInfo model, which can be used to track what IP uploaded which data and when. Reset migration history. Refactor the view and initial processing code, punting much of the functionality into the UploadProcessor. Replace the metadata_translator_name column with standardizer and processor name columns. Add a special naming convention for astro_metadata_translator classes; default the rest to their class names. Fix tests.
Merge standardized dataclasses functionality with database models. Add thumbnails class in preparation for the gallery upgrade to querying the database for images. Make processed thumbnail images grayscale instead of viridis. Minor refactoring of processors and standardizers: get rid of unused methods, transition from returning dictionaries to returning model classes. Fix tests.
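The center/corner/radius split mentioned in the commits above boils down to simple spherical geometry: once the WCS has been used to convert the center and corner pixels to sky coordinates, the "radius" is their on-sky angular separation. A minimal sketch of that last step, using the haversine formula (the function name and the example coordinates are illustrative, not taken from the PR):

```python
import math


def angular_separation_deg(ra1, dec1, ra2, dec2):
    """On-sky angular separation, in degrees, between two sky positions.

    Inputs are RA/Dec in degrees; uses the haversine formula, which is
    numerically stable for small separations such as image radii.
    """
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    sin_ddec = math.sin((dec2 - dec1) / 2.0)
    sin_dra = math.sin((ra2 - ra1) / 2.0)
    a = sin_ddec ** 2 + math.cos(dec1) * math.cos(dec2) * sin_dra ** 2
    return math.degrees(2.0 * math.asin(math.sqrt(a)))


# Illustrative values: image center at (10.0, 0.0), corner at (10.1, 0.1).
radius = angular_separation_deg(10.0, 0.0, 10.1, 0.1)
```

Storing center, corner, and this radius as separate columns keeps cone-search-style queries cheap without needing to deserialize the full WCS.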
Force-pushed from 92781e0 to 70fb3d6.
Ok, fixed the last comments and added tests for models. The remaining issues here were punted to other issues, in which we will tackle them. Mainly all is well, except: we don't have good error handling or data update/replace/remove logic, and the MOA-II WCS is a placeholder pending error handling, or getting the Astrometry.net solver working. Any last comments?
Understood. Let's get it merged so we can find fun new ways to break things 😄
This is a first attempt at taking the many varied FITS headers and standardizing a select number of header keywords that we can use in our header-metadata DB table.
The header keywords we standardize on here will set our table schema and our query tool interface but there are some outstanding issues with standardizing the WCSs.
If we cannot easily separate the WCS data into a standardized set of columns, querying that table will be much harder. It should still be possible, however, to save a subset of keywords (center pixel values and coordinates) and the whole WCS as a pickled blob.
I suspect this would not be as optimal as storing all of these values separately. Ergo, this draft PR, to give some insight into what the difficulties are.
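The "subset of columns plus pickled blob" idea can be sketched as below. This is an assumption-laden illustration, not the PR's implementation: a plain dict stands in for the full WCS, and the row/column names are made up for the example.

```python
import pickle


def make_db_row(wcs_mapping, center_ra, center_dec, corner_ra, corner_dec, radius):
    """Build a DB-row-like dict: queryable coordinate columns plus an opaque blob.

    `wcs_mapping` stands in for the full WCS (here just a dict of header
    cards); it is serialized and stored whole, so nothing is lost even
    though only the subset columns are directly queryable.
    """
    return {
        "center_ra": center_ra,
        "center_dec": center_dec,
        "corner_ra": corner_ra,
        "corner_dec": corner_dec,
        "radius": radius,
        "wcs_blob": pickle.dumps(wcs_mapping),
    }


row = make_db_row({"CRVAL1": 10.0, "CRVAL2": 0.0}, 10.0, 0.0, 10.1, 0.1, 0.14)
restored = pickle.loads(row["wcs_blob"])  # full WCS recoverable on demand
```

The trade-off is exactly the one noted above: the blob round-trips losslessly, but anything inside it is invisible to SQL, so any keyword we expect to filter on must also get its own column.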