Merge pull request #41 from pkgw/pipeline-fixes

Some pipeline fixes for @astrodavid10
WorldWideTelescope · Dec 9, 2020 · f083551
2 parents cd1bf57 + 483b102

Showing 9 changed files with 119 additions and 34 deletions.
diff --git a/README.md b/README.md
@@ -78,7 +78,7 @@ and [PyPI](https://pypi.org/project/toasty/#history).
 - [pytest] to run the test suite
 - [PyYAML]
 - [tqdm]
-- [wwt_data_formats]
+- [wwt_data_formats] >= 0.7
 
 [astropy]: https://www.astropy.org/
 [azure-storage-blob]: https://github.com/Azure/azure-sdk-for-python/tree/master/sdk/storage/azure-storage-blob

diff --git a/docs/api/toasty.image.Image.rst b/docs/api/toasty.image.Image.rst
@@ -10,10 +10,12 @@ Image
 
    .. autosummary::
 
+      ~Image.default_format
       ~Image.dtype
       ~Image.height
       ~Image.mode
       ~Image.shape
+      ~Image.wcs
       ~Image.width
 
    .. rubric:: Methods Summary
@@ -32,10 +34,12 @@ Image
 
    .. rubric:: Attributes Documentation
 
+   .. autoattribute:: default_format
    .. autoattribute:: dtype
    .. autoattribute:: height
    .. autoattribute:: mode
    .. autoattribute:: shape
+   .. autoattribute:: wcs
    .. autoattribute:: width
 
    .. rubric:: Methods Documentation

diff --git a/docs/api/toasty.image.ImageMode.rst b/docs/api/toasty.image.ImageMode.rst
@@ -12,6 +12,7 @@ ImageMode
 
       ~ImageMode.F16x3
       ~ImageMode.F32
+      ~ImageMode.F64
       ~ImageMode.RGB
       ~ImageMode.RGBA
 
@@ -26,6 +27,7 @@ ImageMode
 
    .. autoattribute:: F16x3
    .. autoattribute:: F32
+   .. autoattribute:: F64
    .. autoattribute:: RGB
    .. autoattribute:: RGBA
 

diff --git a/docs/cli/pipeline-approve.rst b/docs/cli/pipeline-approve.rst
@@ -16,7 +16,10 @@ Usage
    toasty pipeline approve [--workdir=WORKDIR] {IMAGE-IDs...}
 
 The ``IMAGE-IDs`` argument specifies one or more images by their unique
-identifiers.
+identifiers. You can specify exact IDs or `glob patterns`_ as processed by the
+Python ``fnmatch`` module. See the examples below.
+
+.. _glob patterns: https://docs.python.org/3/library/fnmatch.html#module-fnmatch
 
 The ``WORKDIR`` argument optionally specifies the location of the pipeline
 workspace directory. The default is the current directory.
@@ -26,27 +29,46 @@ Example
 =======
 
 Before approving an image, it should be validated. First, check the astrometry
-with the help of ``wwtdatatool`` command:
+with the help of the ``wwtdatatool`` command. To check a group of images at once,
+it can be convenient to merge their individual WTML files into a temporary index:
 
 .. code-block:: shell
 
-   wwtdatatool serve processed/noao0201b/
-   [open up http://localhost:8080/index.wtml in the webclient, review]
+   wwtdatatool wtml merge processed/*/index_rel.wtml processed/index_rel.wtml
+   wwtdatatool preview processed/index_rel.wtml
+
+(Change the forward slashes to backslashes if you’re using Windows.) The first
+command merges the individual image WTMLs into a new file,
+``processed/index_rel.wtml``. The second command opens up this combined file in
+the WWT webclient, running an internal webserver to make the data available.
 
 Next, get a metadata report and check for any issues:
 
 .. code-block:: shell
 
    wwtdatatool wtml report processed/noao0201b/index_rel.wtml
 
-If everything is OK, the image may be approved:
+If everything is OK, you can mark the image as approved:
 
 .. code-block:: shell
 
    toasty pipeline approve noao0201b
 
+You can use `glob patterns`_ to match image names. For instance,
+
+.. code-block:: shell
+
+   toasty pipeline approve "vla*20" "?vlba"
+
+will match every processed image whose identifier begins with ``vla`` and ends
+with ``20``, as well as those whose identifiers are exactly five characters long
+and end with ``vlba``. You should generally enclose glob arguments in quotation
+marks, as shown above, to prevent your shell from expanding them before Toasty
+sees them.
+
 After approval of a batch of images, the next step is to :ref:`cli-pipeline-publish`.
 
+
 Notes
 =====
 

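To confirm which identifiers the approve example's patterns select, the following minimal sketch runs them through Python's `fnmatch` module, which the pipeline CLI in this commit also uses. The image IDs are invented for illustration only. It shows that `?` consumes exactly one character, so `?vlba` matches only five-character names.

```python
from fnmatch import fnmatch

# Hypothetical image IDs, invented for illustration only.
image_ids = ["vla-m87-2020", "vla20", "vlba", "xvlba", "xyvlba", "vla-2021"]

# The patterns used in the approve example.
patterns = ["vla*20", "?vlba"]


def matches_any(name, patterns):
    """Return True if `name` matches at least one fnmatch-style pattern."""
    return any(fnmatch(name, p) for p in patterns)


selected = [i for i in image_ids if matches_any(i, patterns)]
print(selected)  # ['vla-m87-2020', 'vla20', 'xvlba']
```

Note that `fnmatch.fnmatch()` normalizes case with `os.path.normcase()`, so matching is case-insensitive on Windows and case-sensitive elsewhere; `fnmatch.fnmatchcase()` is always case-sensitive.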
diff --git a/docs/cli/pipeline-fetch.rst b/docs/cli/pipeline-fetch.rst
@@ -16,7 +16,10 @@ Usage
    toasty pipeline fetch [--workdir=WORKDIR] {IMAGE-IDs...}
 
 The ``IMAGE-IDs`` argument specifies one or more images by their unique
-identifiers.
+identifiers. You can specify exact IDs or `glob patterns`_ as processed by the
+Python ``fnmatch`` module. See the examples below.
+
+.. _glob patterns: https://docs.python.org/3/library/fnmatch.html#module-fnmatch
 
 The ``WORKDIR`` argument optionally specifies the location of the pipeline
 workspace directory. The default is the current directory.
@@ -34,15 +37,37 @@ Fetch two images:
 After fetching, the next step is to :ref:`cli-pipeline-process-todos`.
 
 
+Example
+=======
+
+You can use `glob patterns`_ to match candidate names. For instance,
+
+.. code-block:: shell
+
+   toasty pipeline fetch "rubin-*" "soar?"
+
+will match every candidate whose name begins with ``rubin-``, as well as those
+whose names are exactly five characters long and start with ``soar``. You should
+generally enclose glob arguments in quotation marks, as shown above, to prevent
+your shell from expanding them before Toasty sees them.
+
+
 Notes
 =====
 
 Candidate names may be found by looking at the filenames contained in the
 ``candidates`` subdirectory of your workspace.
 
-For each candidate that is successfully fetched, a sub-subdirectory is created
-in the ``cache_todo`` subdirectory with a name corresponding to the unique
-candidate ID.
+During the fetch process, the candidates are analyzed. Some of them may be
+deemed “not actionable”; a common reason is that an image lacks sufficient
+astrometric information to be placed on the sky as WWT requires. Such
+candidates are discarded, and their information files are moved into the
+``rejects`` subdirectory.
+
+For each candidate that is successfully fetched and validated, a
+sub-subdirectory is created in the ``cache_todo`` subdirectory with a name
+corresponding to the unique candidate ID.
 
 
 See Also

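To check which candidate names the fetch example's patterns select, here is a small sketch with hypothetical names. It uses `fnmatchcase()` so the result does not depend on the platform, whereas Toasty itself calls `fnmatch()`. It shows that `rubin-*` requires the trailing hyphen and that `?` matches any single character, digits included.

```python
from fnmatch import fnmatchcase

# Hypothetical candidate names, invented for illustration only.
candidates = ["rubin-001", "rubin", "soar1", "soara", "soar", "soar-12"]


def select(names, patterns):
    """Keep the names that match any pattern (case-sensitive on every platform)."""
    return [n for n in names if any(fnmatchcase(n, p) for p in patterns)]


chosen = select(candidates, ["rubin-*", "soar?"])
print(chosen)  # ['rubin-001', 'soar1', 'soara']
```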
diff --git a/docs/pipeline.rst b/docs/pipeline.rst
@@ -39,13 +39,13 @@ command-line program.
 Configuration
 =============
 
-The root of the *destionation* data repository should contain a configuration
+The root of the *destination* data repository should contain a configuration
 file named ``toasty-pipeline-config.yaml``. Once a pipeline workflow is set up,
 you shouldn’t need to worry about this file. But to get a new pipeline going,
 you need to create it and then place it in your data destination.
 
-As implied, this file contains structured data in the `YAML
-<https://yaml.org/>`_ format. An example is:
+This file contains structured data in the `YAML <https://yaml.org/>`_ format. An
+example is:
 
 .. code-block:: YAML
 
@@ -72,7 +72,7 @@ Djangoplicity Data Source
 Currently, the only functional ``source_type`` is ``djangoplicity``, which
 downloads and parses an imagery feed from a website powered by the
 `Djangoplicity <https://github.com/djangoplicity/djangoplicity>`_ gallery
-system. An example is the `ESO Hubble gallery
+system. An example is the `ESA Hubble gallery
 <https://spacetelescope.org/images/>`_.
 
 When using the ``djangoplicity`` data source, the ``toasty-pipeline-config.yaml``

diff --git a/setup.py b/setup.py
@@ -78,7 +78,7 @@ def get_long_desc():
         'pillow>=7.0',
         'PyYAML>=5.0',
         'tqdm>=4.0',
-        'wwt_data_formats>=0.2.0',
+        'wwt_data_formats>=0.7.0',
     ],
 
     extras_require = {

diff --git a/toasty/pipeline/cli.py b/toasty/pipeline/cli.py
@@ -12,13 +12,43 @@
 '''.split()
 
 import argparse
+from fnmatch import fnmatch
+import glob
 import os.path
 import sys
 
 from ..cli import die, warn
 from . import NotActionableError
 
 
+def evaluate_imageid_args(searchdir, args):
+    """
+    Figure out which image IDs to process.
+    """
+
+    matched_ids = set()
+    globs_todo = set()
+
+    for arg in args:
+        if glob.has_magic(arg):
+            globs_todo.add(arg)
+        else:
+            # If an ID is explicitly (non-globbily) added, always add it to the
+            # list, without checking if it exists in `searchdir`. We could check
+            # for it in searchdir now, but we'll have to check later anyway, so
+            # we don't bother.
+            matched_ids.add(arg)
+
+    if len(globs_todo):
+        for filename in os.listdir(searchdir):
+            for g in globs_todo:
+                if fnmatch(filename, g):
+                    matched_ids.add(filename)
+                    break
+
+    return sorted(matched_ids)
+
+
 # The "approve" subcommand
 
 def approve_setup_parser(parser):
@@ -31,8 +61,8 @@ def approve_setup_parser(parser):
     parser.add_argument(
         'cand_ids',
         nargs = '+',
-        metavar = 'CAND-ID',
-        help = 'Name(s) of candidate(s) to approve and prepare for processing'
+        metavar = 'IMAGE-ID',
+        help = 'Name(s) of image(s) to approve for publication (globs accepted)'
     )
 
 
@@ -51,7 +81,7 @@ def approve_impl(settings):
     proc_dir = mgr._ensure_dir('processed')
     app_dir = mgr._ensure_dir('approved')
 
-    for cid in settings.cand_ids:
+    for cid in evaluate_imageid_args(proc_dir, settings.cand_ids):
         if not os.path.isdir(os.path.join(proc_dir, cid)):
             die(f'no such processed candidate ID {cid!r}')
 
@@ -90,7 +120,7 @@ def fetch_setup_parser(parser):
         'cand_ids',
         nargs = '+',
         metavar = 'CAND-ID',
-        help = 'Name(s) of candidate(s) to fetch and prepare for processing'
+        help = 'Name(s) of candidate(s) to fetch and prepare for processing (globs accepted)'
     )
 
 
@@ -102,25 +132,27 @@ def fetch_impl(settings):
     rej_dir = mgr._ensure_dir('rejects')
     src = mgr.get_image_source()
 
-    for cid in settings.cand_ids:
+    for cid in evaluate_imageid_args(cand_dir, settings.cand_ids):
+        # The nested try blocks ensure that cdata is closed before a
+        # NotActionableError is handled, so that the candidate file can be moved
+        # into rejects even on Windows, which generally refuses to rename open files.
         try:
-            cdata = open(os.path.join(cand_dir, cid), 'rb')
-        except FileNotFoundError:
-            die(f'no such candidate ID {cid!r}')
-
-        print(f'fetching {cid} ... ', end='')
-        sys.stdout.flush()
+            try:
+                cdata = open(os.path.join(cand_dir, cid), 'rb')
+            except FileNotFoundError:
+                die(f'no such candidate ID {cid!r}')
 
-        try:
-            cachedir = mgr._ensure_dir('cache_todo', cid)
-            src.fetch_candidate(cid, cdata, cachedir)
-            print('done')
+            try:
+                print(f'fetching {cid} ... ', end='')
+                sys.stdout.flush()
+                cachedir = mgr._ensure_dir('cache_todo', cid)
+                src.fetch_candidate(cid, cdata, cachedir)
+                print('done')
+            finally:
+                cdata.close()
         except NotActionableError:
             print('not usable')
             os.rename(os.path.join(cand_dir, cid), os.path.join(rej_dir, cid))
             os.rmdir(cachedir)
-        finally:
-            cdata.close()
 
 
 # The "init" subcommand

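The nested `try` blocks in `fetch_impl` exist so that the candidate file is closed before it is renamed into `rejects`. A `with` statement can give the same guarantee, because the file is closed as the exception leaves the block, before the outer `except` clause runs. The sketch below is a hypothetical restructuring, not Toasty's code: `NotActionableError` is a local stand-in, `fake_fetch` is an invented fetcher that rejects empty candidate files, and the directory names only mimic the pipeline workspace. Unlike `fetch_impl`, it lets a missing candidate raise `FileNotFoundError` instead of calling `die()`.

```python
import os
import tempfile


class NotActionableError(Exception):
    """Local stand-in for toasty.pipeline.NotActionableError."""


def fake_fetch(cid, cdata, cachedir):
    """Hypothetical fetcher: treats an empty candidate file as not actionable."""
    if not cdata.read():
        raise NotActionableError(cid)


def fetch_one(cand_dir, rej_dir, cache_root, cid):
    cand_path = os.path.join(cand_dir, cid)
    cachedir = os.path.join(cache_root, cid)
    try:
        # Leaving the `with` block, normally or via an exception, closes cdata.
        with open(cand_path, "rb") as cdata:
            os.makedirs(cachedir, exist_ok=True)
            fake_fetch(cid, cdata, cachedir)
    except NotActionableError:
        # cdata is already closed here, so the rename also works on Windows.
        os.rename(cand_path, os.path.join(rej_dir, cid))
        os.rmdir(cachedir)
        return "not usable"
    return "done"


with tempfile.TemporaryDirectory() as root:
    cand_dir, rej_dir, cache_root = [
        os.path.join(root, d) for d in ("candidates", "rejects", "cache_todo")
    ]
    for d in (cand_dir, rej_dir, cache_root):
        os.mkdir(d)

    # One usable (non-empty) candidate and one empty, "not actionable" one.
    with open(os.path.join(cand_dir, "good"), "wb") as f:
        f.write(b"payload")
    open(os.path.join(cand_dir, "bad"), "wb").close()

    results = {cid: fetch_one(cand_dir, rej_dir, cache_root, cid) for cid in ("good", "bad")}
    rejected = sorted(os.listdir(rej_dir))
    cached = sorted(os.listdir(cache_root))

print(results)  # {'good': 'done', 'bad': 'not usable'}
```

The trade-off is mostly readability: the explicit `try`/`finally` in the commit keeps the open, fetch, and close steps visible, while the `with` form makes the close implicit.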
diff --git a/toasty/tests/test_pipeline.py b/toasty/tests/test_pipeline.py
@@ -89,7 +89,7 @@ def test_workflow(self):
         args = [
             'pipeline', 'fetch',
             '--workdir', self.work_path('work'),
-            'fake_test1',
+            'fake_test1', '*nomatchisok*',
         ]
         cli.entrypoint(args)
 
@@ -102,7 +102,7 @@ def test_workflow(self):
         args = [
             'pipeline', 'approve',
             '--workdir', self.work_path('work'),
-            'fake_test1',
+            'fake_test1', 'fake_test?',
         ]
         cli.entrypoint(args)
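The updated tests pass an exact ID together with an overlapping glob, and a glob that matches nothing. The sketch below runs a lightly condensed copy of `evaluate_imageid_args` from this commit against a scratch directory of hypothetical IDs, to illustrate the resulting behavior: duplicates collapse, the output is sorted, unmatched globs contribute nothing, and explicit IDs pass through without being checked.

```python
import glob
import os
import tempfile
from fnmatch import fnmatch


def evaluate_imageid_args(searchdir, args):
    """Condensed copy of the helper in toasty/pipeline/cli.py, for experimentation."""
    matched_ids = set()
    globs_todo = set()

    for arg in args:
        if glob.has_magic(arg):
            globs_todo.add(arg)
        else:
            # Explicit IDs are passed through without checking `searchdir`.
            matched_ids.add(arg)

    if globs_todo:
        for filename in os.listdir(searchdir):
            if any(fnmatch(filename, g) for g in globs_todo):
                matched_ids.add(filename)

    return sorted(matched_ids)


with tempfile.TemporaryDirectory() as workdir:
    # Populate a scratch directory with hypothetical image IDs.
    for name in ("fake_test1", "fake_test2", "other"):
        os.mkdir(os.path.join(workdir, name))

    # Like the approve test: an exact ID plus a glob that overlaps it.
    approved = evaluate_imageid_args(workdir, ["fake_test1", "fake_test?"])
    # Like the fetch test: a glob with no matches silently contributes nothing.
    fetched = evaluate_imageid_args(workdir, ["fake_test1", "*nomatchisok*"])
    # An explicit ID is returned even if it does not exist on disk.
    missing = evaluate_imageid_args(workdir, ["does_not_exist"])

print(approved)  # ['fake_test1', 'fake_test2']
print(fetched)   # ['fake_test1']
print(missing)   # ['does_not_exist']
```

Because explicit IDs are not checked here, a typo in an exact ID is still reported later by the existing `die(...)` checks in `approve_impl` and `fetch_impl`.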