Fixes and features needed for SELENE Kaguya TC data import #63

pkgw · 2021-10-13T18:58:07Z

These are a bunch of improvements needed to handle the Kaguya dataset.

- Add a simple U8 image mode
- Add support for reading JPEG2000 files in chunked format (using glymur)
- Add support for chunked plate-carree TOAST sampling
- Add `toasty transform u8-to-rgb`
- Add support for transforming into a different tile pyramid tree (e.g., NPY and JPG in different hierarchies)
- Remove unneeded "dispatcher" threads from multiprocessing operations
- Fix race conditions in large-scale multiprocessing jobs
- Don't create directories everywhere when cleaning up lockfiles
- Use a lockfile approach that works in multi-host (HPC) contexts
- Add proper support for the "planetary" TOAST coordinate system, which is rotated 180° in longitude
- Add some helpful TOAST Tile APIs
- Add support for filtering out subtrees when generating a TOAST hierarchy
- Add support for a filtered TOAST sampling operation
- Massively clean up the "transform" infrastructure for, e.g., float-to-RGB conversions

codecov · 2021-10-13T19:03:09Z

Codecov Report

Merging #63 (cd16179) into master (bf50952) will increase coverage by 1.65%.
The diff coverage is 64.12%.

❗ Current head cd16179 differs from pull request most recent head 381e53f. Consider uploading reports for the commit 381e53f to get more accurate results

@@            Coverage Diff             @@
##           master      #63      +/-   ##
==========================================
+ Coverage   73.11%   74.77%   +1.65%     
==========================================
  Files          21       22       +1     
  Lines        2738     3037     +299     
==========================================
+ Hits         2002     2271     +269     
- Misses        736      766      +30

| Impacted Files | Coverage Δ | |
|---|---|---|
| toasty/multi_wcs.py | `65.16% <8.33%> (+65.16%)` | ⬆️ |
| toasty/image.py | `75.81% <39.13%> (+1.35%)` | ⬆️ |
| toasty/cli.py | `83.03% <52.17%> (-2.36%)` | ⬇️ |
| toasty/transform.py | `41.90% <53.12%> (-12.51%)` | ⬇️ |
| toasty/samplers.py | `69.84% <54.43%> (-10.16%)` | ⬇️ |
| toasty/toast.py | `80.98% <65.85%> (-17.89%)` | ⬇️ |
| toasty/collection.py | `51.45% <66.66%> (-0.03%)` | ⬇️ |
| toasty/jpeg2000.py | `83.33% <83.33%> (ø)` | |
| toasty/multi_tan.py | `78.57% <91.66%> (+0.04%)` | ⬆️ |
| toasty/merge.py | `94.40% <94.44%> (ø)` | |

... and 9 more

Continue to review full report at Codecov.

Legend:
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 3bab4f7...381e53f.

We weren't correctly updating Images that were originally read in by PIL: we would write out the old PIL object rather than the updated array. I thought that getting a read-only NumPy array would prevent such issues, but apparently not. We may need a more generic API to ensure that such bugs don't recur, but there are some specific cases where it will always be correct to clear the PIL data.
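The bug described here is a stale-cache problem: the object that gets written out is not the one that was modified. A minimal sketch of the fix, assuming a hypothetical `CachedImage` wrapper (not toasty's actual `Image` class) in which any update to the array drops the cached PIL-style object:

```python
import numpy as np


class CachedImage:
    """Hypothetical wrapper: pixel data plus an optional cached source object
    (standing in for the PIL image the data were originally read from)."""

    def __init__(self, array, source=None):
        self._array = np.asarray(array)
        self._source = source

    def asarray(self):
        return self._array

    def update(self, new_array):
        # Modifying the pixels invalidates the cached source object, so the
        # stale copy can never be written out by mistake.
        self._array = np.asarray(new_array)
        self._source = None

    def data_to_save(self):
        # Reuse the cached source only while it is known to match the array.
        return self._source if self._source is not None else self._array
```

Clearing the cache on every update is the conservative choice; it trades an occasional re-encode for the guarantee that saved output always reflects the current array.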

My initial code was hardcoded for the U8 image format, but it should handle other formats too. Unfortunately, I haven't tested whether it still works for U8 images. This is a sign that the TOAST sampling code should be migrated to use an Image buffer in general, but I won't do that here.

pkgw added 20 commits October 5, 2021 15:34

toasty/pyramid.py: don't create every directory when removing lockfiles

7ea2894

toasty/toast.py: add create_single_tile()

811a64e

With a slight reordering of the built-in level 1 tiles so that we can recurse simply.
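Recursing through a tile pyramid relies on each tile's children being computable from its own indices. A minimal sketch of that quadtree index arithmetic, using hypothetical helper names and a `(level, x, y)` tuple convention; it does not model TOAST's special level-1 layout, which is what the reordering above addresses:

```python
from typing import List, Tuple

Tile = Tuple[int, int, int]  # (level, x, y)


def tile_children(tile: Tile) -> List[Tile]:
    """The four children of a tile: indices double at each level."""
    level, x, y = tile
    return [(level + 1, 2 * x + dx, 2 * y + dy) for dy in (0, 1) for dx in (0, 1)]


def tile_parent(tile: Tile) -> Tile:
    """The parent of a non-root tile: integer-halve the indices."""
    level, x, y = tile
    if level == 0:
        raise ValueError("the level-0 tile has no parent")
    return (level - 1, x // 2, y // 2)
```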

toasty/toast.py: add toast_tile_get_coords()

298c987

toasty/toast.py: add generate_tiles_filtered

5ce6af7

This will allow us to be a lot more efficient when doing chunked TOAST samplings.
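Filtering is what makes this efficient: if a tile fails the filter, its whole subtree can be skipped. A minimal sketch of pruned depth-first generation, assuming a hypothetical `iter_tiles_filtered` helper (not the signature of toasty's `generate_tiles_filtered`) and a filter that never keeps a child of a rejected parent:

```python
from typing import Callable, Iterator, Tuple

Tile = Tuple[int, int, int]  # (level, x, y)


def iter_tiles_filtered(
    depth: int, keep: Callable[[Tile], bool], tile: Tile = (0, 0, 0)
) -> Iterator[Tile]:
    """Depth-first, parent-before-children traversal down to `depth`,
    skipping the entire subtree of any tile rejected by `keep`."""
    if not keep(tile):
        return  # prune: descendants are never visited
    yield tile
    level, x, y = tile
    if level == depth:
        return
    for dy in (0, 1):
        for dx in (0, 1):
            yield from iter_tiles_filtered(depth, keep, (level + 1, 2 * x + dx, 2 * y + dy))
```

With an always-true filter this visits every tile, so the savings come entirely from how early the filter can reject a subtree.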

toasty/image.py: add a U8 mode

5b25821

This is for the Kaguya lunar dataset. We're a bit sloppy here by using 0 as the "mask" value, but it's a reasonable approach. This patch also includes a few fixups to handle the F64 mode in the same way as F32, where it wasn't added to all of the logic.
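Using 0 as the mask value means that "no data" and a genuinely black pixel look identical, which is the sloppiness acknowledged here. A minimal sketch of how such a convention can be applied when combining overlapping U8 data; the `merge_u8` name and the merge rule are illustrative assumptions:

```python
import numpy as np

MASK_VALUE = 0  # per the commit message: 0 is treated as "no data" in U8 mode


def merge_u8(dest: np.ndarray, src: np.ndarray) -> np.ndarray:
    """Copy `src` into `dest` (in place) wherever `src` holds real data.

    Hypothetical helper: a genuine 0-valued pixel in `src` is
    indistinguishable from a masked one and will not be copied.
    """
    if dest.dtype != np.uint8 or src.dtype != np.uint8:
        raise TypeError("expected uint8 arrays")
    valid = src != MASK_VALUE
    dest[valid] = src[valid]
    return dest
```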

toasty/jpeg2000.py: add basic support for reading chunked JPEG2000 files

337776c
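Chunked reading means walking a large raster in blocks small enough to fit in memory. A minimal, library-independent sketch of the chunk layout; the helper name and the chunk size are hypothetical:

```python
from typing import Iterator, Tuple


def iter_chunks(height: int, width: int, chunk: int) -> Iterator[Tuple[slice, slice]]:
    """Yield (row_slice, col_slice) pairs covering a height x width raster in
    row-major order; edge chunks are clipped to the image bounds."""
    if chunk <= 0:
        raise ValueError("chunk size must be positive")
    for r0 in range(0, height, chunk):
        for c0 in range(0, width, chunk):
            yield slice(r0, min(r0 + chunk, height)), slice(c0, min(c0 + chunk, width))
```

Each `(rows, cols)` pair could then be passed to a `glymur.Jp2k` object as `jp2[rows, cols]`, relying on glymur's NumPy-style region slicing so that only that region is decoded. How toasty itself sizes and orders its chunks is not described here.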

toasty/toasty.py: add count_tiles_matching_filter()

ce5fabe
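A count of matching tiles is useful, for example, as the total for a progress bar before a filtered run starts. A minimal sketch, assuming the same pruning rule as a filtered generator and a hypothetical `count_tiles_pruned` helper; for comparison, a complete pyramid with levels 0 through D has (4^(D+1) - 1)/3 tiles:

```python
from typing import Callable, Tuple

Tile = Tuple[int, int, int]  # (level, x, y)


def full_pyramid_tile_count(depth: int) -> int:
    """Number of tiles in a complete quadtree with levels 0..depth."""
    return (4 ** (depth + 1) - 1) // 3


def count_tiles_pruned(depth: int, keep: Callable[[Tile], bool], tile: Tile = (0, 0, 0)) -> int:
    """Count tiles passing `keep`, skipping the subtree of any rejected tile."""
    if not keep(tile):
        return 0
    level, x, y = tile
    if level == depth:
        return 1
    return 1 + sum(
        count_tiles_pruned(depth, keep, (level + 1, 2 * x + dx, 2 * y + dy))
        for dy in (0, 1)
        for dx in (0, 1)
    )
```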

toasty/samplers.py: add the chunked plate-carree sampler

c98f7a3
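A plate-carrée (equirectangular) image maps longitude and latitude linearly onto columns and rows, so finding the source pixel for a sample point is simple arithmetic. A minimal sketch, assuming a global image in degrees with longitude -180 at the left edge and latitude +90 at the top; the actual Kaguya layout and toasty's sampler internals may differ:

```python
import numpy as np


def lonlat_to_index(lon_deg, lat_deg, width: int, height: int):
    """Integer (row, col) of the pixel containing each lon/lat point."""
    lon = np.asarray(lon_deg, dtype=float)
    lat = np.asarray(lat_deg, dtype=float)
    # Wrap longitude into [-180, 180) so that lon = 180 lands on column 0.
    col = np.floor(((lon + 180.0) % 360.0) / 360.0 * width).astype(int)
    row = np.floor((90.0 - lat) / 180.0 * height).astype(int)
    # lat = -90 maps exactly to `height`; clamp it onto the last row.
    row = np.clip(row, 0, height - 1)
    return row, col
```

With a chunked image, `row // chunk_size` and `col // chunk_size` then identify which chunk holds each sample.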

toasty/toast.py: add support for the planetary TOAST coordinate system

0f88032

Fortunately, all we need to do is spin things by 180 degrees in longitude.
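A rotation by 180 degrees in longitude is a shift by π followed by wrapping back into range. A minimal sketch in radians, assuming a [-π, π) output range; where toasty applies this (to sample coordinates or to tile geometry) is not shown here:

```python
import numpy as np


def rotate_lon_180(lon_rad):
    """Rotate longitudes (radians) by 180 degrees, wrapping into [-pi, pi)."""
    lon = np.asarray(lon_rad, dtype=float)
    shifted = lon + np.pi
    # Standard wrap of an angle into [-pi, pi).
    return (shifted + np.pi) % (2.0 * np.pi) - np.pi
```

Because shifting by +π and by -π give the same result after wrapping, the function is its own inverse, so the same code converts in either direction.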

toasty/toast.py: add the filtered sampler approach

ad8052d

This can be used with a chunked image to do a TOAST tiling from something that's too large to fit in memory.
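For chunk-by-chunk sampling, the filter only needs to keep tiles that might overlap the current chunk. A minimal sketch using longitude/latitude bounding boxes, assuming a hypothetical `tile_box_of` callable that returns a conservative box for each tile and that no box crosses the longitude seam:

```python
from dataclasses import dataclass
from typing import Callable, Hashable


@dataclass(frozen=True)
class LonLatBox:
    """Closed longitude/latitude bounds in degrees (assumed not to cross the seam)."""
    lon_min: float
    lon_max: float
    lat_min: float
    lat_max: float

    def overlaps(self, other: "LonLatBox") -> bool:
        # Two closed intervals overlap iff each starts before the other ends.
        return (
            self.lon_min <= other.lon_max
            and other.lon_min <= self.lon_max
            and self.lat_min <= other.lat_max
            and other.lat_min <= self.lat_max
        )


def make_chunk_filter(
    chunk_box: LonLatBox, tile_box_of: Callable[[Hashable], LonLatBox]
) -> Callable[[Hashable], bool]:
    """Build a tile predicate that keeps tiles whose bounds touch the chunk."""
    def keep(tile: Hashable) -> bool:
        return tile_box_of(tile).overlaps(chunk_box)
    return keep
```

Over-covering is safe here: a tile kept unnecessarily only costs time, while a tile wrongly rejected would lose its whole subtree.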

toasty/transform.py: avoid race conditions in parallel transformation

4910752

toasty/transform.py: better naming for our functions that assume RGBA

a398b0b

toasty/transform.py: rework transform infrastructure and add u8-to-rgb mode

db7f610
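A U8-to-RGB transform can be as simple as replicating the single channel three times. A minimal sketch under that assumption; the actual `u8-to-rgb` transform may treat the 0 mask value or the output format differently:

```python
import numpy as np


def u8_to_rgb(data: np.ndarray) -> np.ndarray:
    """Replicate a 2-D uint8 (grayscale) array into an (H, W, 3) RGB array."""
    if data.dtype != np.uint8 or data.ndim != 2:
        raise ValueError("expected a 2-D uint8 array")
    # np.repeat copies, so the result is independent of the input buffer.
    return np.repeat(data[:, :, np.newaxis], 3, axis=2)
```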

toasty/transform.py: add ability to transform into a separate pyramid tree

aad5dc9

toasty/pyramid.py: need to use SoftFileLock in HPC context

fef5eeb
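The commit switches to filelock's `SoftFileLock`, which treats the mere existence of the lock file as the lock, rather than relying on OS-level file locks that may not behave consistently across hosts on a shared filesystem. A minimal stdlib sketch of that mechanism, with a hypothetical `soft_lock` helper (not filelock's implementation):

```python
import os
import time
from contextlib import contextmanager


@contextmanager
def soft_lock(lock_path: str, poll_interval: float = 0.05, timeout: float = 10.0):
    """Acquire by atomically creating `lock_path`; release by deleting it."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # O_EXCL makes creation fail if the file already exists.
            fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
            break
        except FileExistsError:
            if time.monotonic() > deadline:
                raise TimeoutError(f"could not acquire {lock_path}")
            time.sleep(poll_interval)
    try:
        yield
    finally:
        os.close(fd)
        os.remove(lock_path)
```

The trade-off is that a process that dies while holding the lock leaves the file behind, so stale lockfiles need separate cleanup.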

toasty/transform.py: fix dumb race condition in multiprocessing approach

c5bf5ab

I need to propagate this fix to other multiprocessing tasks too.

Fix up docs build for recent changes

396d44d

For some reason I can't get a reference to `toasty.image.SUPPORTED_FORMATS` to work as a Sphinx `:data:...` reference?

Get the test suite passing again

5c85c1f

... which actually introduces some dead code here. Oh well.

Tidy up "dispatcher"-based multiprocessors

95739b9

This turned out to be a bit of a silly idea, since the main thread can just do the dispatch work. Also make sure that we handle the done_event without race conditions.
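One way to let the main thread do the dispatching while avoiding a race between "the queue looks empty" and "all work has been submitted" is to send each worker an explicit stop sentinel after the last real item. This sketch swaps in sentinels for the done_event mentioned above and uses threads for brevity; the same structure carries over to `multiprocessing.Process` with a `multiprocessing.Queue`. The `run_parallel` name is hypothetical:

```python
import queue
import threading
from typing import Callable, Iterable, List, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def run_parallel(items: Iterable[T], work: Callable[[T], R], n_workers: int = 4) -> List[R]:
    """Main thread dispatches; each worker exits on its own stop sentinel."""
    tasks: "queue.Queue[object]" = queue.Queue()
    results: List[R] = []
    results_lock = threading.Lock()
    stop = object()  # unique sentinel that can never collide with a real item

    def worker() -> None:
        while True:
            item = tasks.get()
            if item is stop:
                return
            value = work(item)
            with results_lock:
                results.append(value)

    threads = [threading.Thread(target=worker) for _ in range(n_workers)]
    for t in threads:
        t.start()

    # No separate dispatcher thread: the main thread enqueues the work itself.
    for item in items:
        tasks.put(item)
    # Sentinels go in only after every real item, so FIFO order guarantees
    # that no worker stops while work is still queued.
    for _ in threads:
        tasks.put(stop)

    for t in threads:
        t.join()
    return results
```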

toasty/toast.py: finish exposing the filtered sampling functionality in a nice API

7293e68

pkgw added 5 commits October 13, 2021 15:13

Fix up exit logic in multi-TAN and multi-WCS tiling

f2343d6

Fix up some naming mistakes

d31cd73

toasty/tests/test_toast.py: add a chunk planetary JPEG2000 test

381e53f

pkgw merged commit db23ba9 into WorldWideTelescope:master Oct 14, 2021

pkgw deleted the kaguya branch October 14, 2021 13:32
