Merge pull request #41 from pkgw/pipeline-fixes
Some pipeline fixes for @astrodavid10
pkgw committed Dec 9, 2020
2 parents cd1bf57 + 483b102 commit f083551
Showing 9 changed files with 119 additions and 34 deletions.
2 changes: 1 addition & 1 deletion README.md
@@ -78,7 +78,7 @@ and [PyPI](https://pypi.org/project/toasty/#history).
- [pytest] to run the test suite
- [PyYAML]
- [tqdm]
- [wwt_data_formats]
- [wwt_data_formats] >= 0.7

[astropy]: https://www.astropy.org/
[azure-storage-blob]: https://github.com/Azure/azure-sdk-for-python/tree/master/sdk/storage/azure-storage-blob
4 changes: 4 additions & 0 deletions docs/api/toasty.image.Image.rst
@@ -10,10 +10,12 @@ Image

.. autosummary::

~Image.default_format
~Image.dtype
~Image.height
~Image.mode
~Image.shape
~Image.wcs
~Image.width

.. rubric:: Methods Summary
@@ -32,10 +34,12 @@ Image

.. rubric:: Attributes Documentation

.. autoattribute:: default_format
.. autoattribute:: dtype
.. autoattribute:: height
.. autoattribute:: mode
.. autoattribute:: shape
.. autoattribute:: wcs
.. autoattribute:: width

.. rubric:: Methods Documentation
2 changes: 2 additions & 0 deletions docs/api/toasty.image.ImageMode.rst
@@ -12,6 +12,7 @@ ImageMode

~ImageMode.F16x3
~ImageMode.F32
~ImageMode.F64
~ImageMode.RGB
~ImageMode.RGBA

@@ -26,6 +27,7 @@ ImageMode

.. autoattribute:: F16x3
.. autoattribute:: F32
.. autoattribute:: F64
.. autoattribute:: RGB
.. autoattribute:: RGBA

32 changes: 27 additions & 5 deletions docs/cli/pipeline-approve.rst
@@ -16,7 +16,10 @@ Usage
toasty pipeline approve [--workdir=WORKDIR] {IMAGE-IDs...}
The ``IMAGE-IDs`` argument specifies one or more images by their unique
identifiers.
identifiers. You can specify exact IDs, or `glob patterns`_ as processed by the
Python ``fnmatch`` module. See examples below.

.. _glob patterns: https://docs.python.org/3/library/fnmatch.html#module-fnmatch

The ``WORKDIR`` argument optionally specifies the location of the pipeline
workspace directory. The default is the current directory.
@@ -26,27 +29,46 @@ Example
=======

Before approving an image, it should be validated. First, check the astrometry
with the help of ``wwtdatatool`` command:
with the help of the ``wwtdatatool`` command. To check a group of images all at once,
it can be convenient to merge the individual image files into a temporary index:

.. code-block:: shell
wwtdatatool serve processed/noao0201b/
[open up http://localhost:8080/index.wtml in the webclient, review]
wwtdatatool wtml merge processed/*/index_rel.wtml processed/index_rel.wtml
wwtdatatool preview processed/index_rel.wtml
(Change the forward slashes to backslashes if you’re using Windows.) The first
command merges the individual image WTMLs into a new file,
``processed/index_rel.wtml``. The second command opens up this combined file in
the WWT webclient, running an internal webserver to make the data available.

Next, get a metadata report and check for any issues:

.. code-block:: shell

   wwtdatatool wtml report processed/noao0201b/index_rel.wtml

If everything is OK, the image may be approved:
If everything is OK, you can mark the image as approved:

.. code-block:: shell

   toasty pipeline approve noao0201b

You can use `glob patterns`_ to match image names. For instance,

.. code-block:: shell

   toasty pipeline approve "vla*20" "?vlba"

will match every processed image whose identifier begins with ``vla`` and ends
with ``20``, as well as those whose identifiers are exactly five characters long
and end with ``vlba``. You should generally enclose glob arguments in quotation
marks, as shown above, to prevent your shell from attempting to expand them
before Toasty gets a chance to.
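
Because the matching is done with Python's ``fnmatch``, you can check what a pattern selects before approving anything. A minimal sketch; the image IDs below are made up purely for illustration:

```python
from fnmatch import fnmatch

# Made-up image IDs, used only to show what each pattern selects.
ids = ["vla-cena-20", "vla20", "vla-cena-21", "xvlba", "vlba", "abvlba"]
patterns = ["vla*20", "?vlba"]

# An ID is selected if it matches at least one pattern.
selected = [i for i in ids if any(fnmatch(i, p) for p in patterns)]
print(selected)  # ['vla-cena-20', 'vla20', 'xvlba']
```

Since ``?`` stands for exactly one character, ``?vlba`` selects ``xvlba`` but neither ``vlba`` nor ``abvlba``, while ``*`` may match an empty string, so ``vla20`` matches ``vla*20``.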

After approval of a batch of images, the next step is to :ref:`cli-pipeline-publish`.


Notes
=====

33 changes: 29 additions & 4 deletions docs/cli/pipeline-fetch.rst
@@ -16,7 +16,10 @@ Usage
toasty pipeline fetch [--workdir=WORKDIR] {IMAGE-IDs...}
The ``IMAGE-IDs`` argument specifies one or more images by their unique
identifiers.
identifiers. You can specify exact IDs, or `glob patterns`_ as processed by the
Python ``fnmatch`` module. See examples below.

.. _glob patterns: https://docs.python.org/3/library/fnmatch.html#module-fnmatch

The ``WORKDIR`` argument optionally specifies the location of the pipeline
workspace directory. The default is the current directory.
@@ -34,15 +37,37 @@ Fetch two images:
After fetching, the next step is to :ref:`cli-pipeline-process-todos`.


Example
=======

You can use `glob patterns`_ to match candidate names. For instance,

.. code-block:: shell

   toasty pipeline fetch "rubin-*" "soar?"

will match every candidate whose name begins with ``rubin-``, as well as those
whose names are exactly five characters long and start with ``soar``. You
should generally enclose glob arguments in quotation marks, as shown above, to
prevent your shell from attempting to expand them before Toasty gets a chance
to.
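
To preview which candidates a pattern would select before fetching, you can apply the same ``fnmatch`` test to the filenames in the workspace's ``candidates`` subdirectory. A minimal sketch; ``preview_matches`` is a hypothetical helper (not part of Toasty), and the demo builds a throwaway workspace with made-up candidate names:

```python
import os
import tempfile
from fnmatch import fnmatch


def preview_matches(workdir, patterns):
    """Hypothetical helper: candidate names in WORKDIR/candidates matching any pattern."""
    cand_dir = os.path.join(workdir, "candidates")
    return sorted(
        name for name in os.listdir(cand_dir)
        if any(fnmatch(name, p) for p in patterns)
    )


# Demo against a temporary workspace containing empty candidate files.
with tempfile.TemporaryDirectory() as workdir:
    os.mkdir(os.path.join(workdir, "candidates"))
    for name in ["rubin-m83", "rubin-ngc253", "soar1", "soar12", "noao0201b"]:
        open(os.path.join(workdir, "candidates", name), "w").close()
    demo = preview_matches(workdir, ["rubin-*", "soar?"])

print(demo)  # ['rubin-m83', 'rubin-ngc253', 'soar1']
```

Pointing ``workdir`` at your real pipeline workspace lists the candidates that ``toasty pipeline fetch`` would receive for the same patterns.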


Notes
=====

Candidate names may be found by looking at the filenames contained in the
``candidates`` subdirectory of your workspace.

For each candidate that is successfully fetched, a sub-subdirectory is created
in the ``cache_todo`` subdirectory with a name corresponding to the unique
candidate ID.
During the fetch process, the candidates are analyzed. Some of them may be
deemed “not actionable” — a common reason being that an image may not have
sufficient astrometric information attached for it to be placed on the sky as
WWT requires. Such candidates will be discarded, with their information files
moved into the ``rejects`` subdirectory.

For each candidate that is successfully fetched and validated, a
sub-subdirectory is created in the ``cache_todo`` subdirectory with a name
corresponding to the unique candidate ID.


See Also
8 changes: 4 additions & 4 deletions docs/pipeline.rst
@@ -39,13 +39,13 @@ command-line program.
Configuration
=============

The root of the *destionation* data repository should contain a configuration
The root of the *destination* data repository should contain a configuration
file named ``toasty-pipeline-config.yaml``. Once a pipeline workflow is set up,
you shouldn’t need to worry about this file. But to get a new pipeline going,
you need to create it and then place it in your data destination.

As implied, this file contains structured data in the `YAML
<https://yaml.org/>`_ format. An example is:
This file contains structured data in the `YAML <https://yaml.org/>`_ format. An
example is:

.. code-block:: YAML
@@ -72,7 +72,7 @@ Djangoplicity Data Source
Currently, the only functional ``source_type`` is ``djangoplicity``, which
downloads and parses an imagery feed from a website powered by the
`Djangoplicity <https://github.com/djangoplicity/djangoplicity>`_ gallery
system. An example is the `ESO Hubble gallery
system. An example is the `ESA Hubble gallery
<https://spacetelescope.org/images/>`_.

When using the ``djangoplicity`` data source, the ``toasty-pipeline-config.yaml``
2 changes: 1 addition & 1 deletion setup.py
@@ -78,7 +78,7 @@ def get_long_desc():
'pillow>=7.0',
'PyYAML>=5.0',
'tqdm>=4.0',
'wwt_data_formats>=0.2.0',
'wwt_data_formats>=0.7.0',
],

extras_require = {
66 changes: 49 additions & 17 deletions toasty/pipeline/cli.py
@@ -12,13 +12,43 @@
'''.split()

import argparse
from fnmatch import fnmatch
import glob
import os.path
import sys

from ..cli import die, warn
from . import NotActionableError


def evaluate_imageid_args(searchdir, args):
"""
Figure out which image IDs to process.
"""

matched_ids = set()
globs_todo = set()

for arg in args:
if glob.has_magic(arg):
globs_todo.add(arg)
else:
# If an ID is explicitly (non-glob) added, always add it to the
# list, without checking if it exists in `searchdir`. We could check
# for it in searchdir now, but we'll have to check later anyway, so
# we don't bother.
matched_ids.add(arg)

if len(globs_todo):
for filename in os.listdir(searchdir):
for g in globs_todo:
if fnmatch(filename, g):
matched_ids.add(filename)
break

return sorted(matched_ids)


# The "approve" subcommand

def approve_setup_parser(parser):
@@ -31,8 +61,8 @@ def approve_setup_parser(parser):
parser.add_argument(
'cand_ids',
nargs = '+',
metavar = 'CAND-ID',
help = 'Name(s) of candidate(s) to approve and prepare for processing'
metavar = 'IMAGE-ID',
help = 'Name(s) of image(s) to approve for publication (globs accepted)'
)


@@ -51,7 +81,7 @@ def approve_impl(settings):
proc_dir = mgr._ensure_dir('processed')
app_dir = mgr._ensure_dir('approved')

for cid in settings.cand_ids:
for cid in evaluate_imageid_args(proc_dir, settings.cand_ids):
if not os.path.isdir(os.path.join(proc_dir, cid)):
die(f'no such processed candidate ID {cid!r}')

@@ -90,7 +120,7 @@ def fetch_setup_parser(parser):
'cand_ids',
nargs = '+',
metavar = 'CAND-ID',
help = 'Name(s) of candidate(s) to fetch and prepare for processing'
help = 'Name(s) of candidate(s) to fetch and prepare for processing (globs accepted)'
)


@@ -102,25 +132,27 @@ def fetch_impl(settings):
rej_dir = mgr._ensure_dir('rejects')
src = mgr.get_image_source()

for cid in settings.cand_ids:
for cid in evaluate_imageid_args(cand_dir, settings.cand_ids):
# Funky structure here is to try to ensure that cdata is closed in case
# a NotActionable happens, so that we can move the directory on Windows.
try:
cdata = open(os.path.join(cand_dir, cid), 'rb')
except FileNotFoundError:
die(f'no such candidate ID {cid!r}')

print(f'fetching {cid} ... ', end='')
sys.stdout.flush()
try:
cdata = open(os.path.join(cand_dir, cid), 'rb')
except FileNotFoundError:
die(f'no such candidate ID {cid!r}')

try:
cachedir = mgr._ensure_dir('cache_todo', cid)
src.fetch_candidate(cid, cdata, cachedir)
print('done')
try:
print(f'fetching {cid} ... ', end='')
sys.stdout.flush()
cachedir = mgr._ensure_dir('cache_todo', cid)
src.fetch_candidate(cid, cdata, cachedir)
print('done')
finally:
cdata.close()
except NotActionableError:
print('not usable')
os.rename(os.path.join(cand_dir, cid), os.path.join(rej_dir, cid))
os.rmdir(cachedir)
finally:
cdata.close()


# The "init" subcommand
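
The reworked ``fetch_impl`` loop nests a ``try``/``finally`` inside the ``try``/``except NotActionableError`` so that the candidate file is always closed before the handler moves it into ``rejects``; the code comment cites Windows, where an open file generally cannot be renamed. The sketch below isolates that pattern. ``NotActionableError`` and ``fetch_candidate`` here are local, hypothetical stand-ins rather than Toasty's own objects, and an empty candidate file is treated as "not actionable" purely for the demo:

```python
import os
import tempfile


class NotActionableError(Exception):
    """Hypothetical stand-in for Toasty's NotActionableError."""


def fetch_candidate(cid, cdata):
    """Hypothetical fetcher: rejects any candidate whose data is empty."""
    if not cdata.read():
        raise NotActionableError(cid)


def fetch_one(cand_dir, rej_dir, cid):
    path = os.path.join(cand_dir, cid)
    cdata = open(path, "rb")

    try:
        try:
            fetch_candidate(cid, cdata)
            return "done"
        finally:
            # Runs before the outer handler, so the file is already
            # closed by the time we try to move it.
            cdata.close()
    except NotActionableError:
        os.rename(path, os.path.join(rej_dir, cid))
        return "not usable"


with tempfile.TemporaryDirectory() as work:
    cand_dir = os.path.join(work, "candidates")
    rej_dir = os.path.join(work, "rejects")
    os.mkdir(cand_dir)
    os.mkdir(rej_dir)

    with open(os.path.join(cand_dir, "good"), "wb") as f:
        f.write(b"some data")
    open(os.path.join(cand_dir, "empty"), "wb").close()

    results = {cid: fetch_one(cand_dir, rej_dir, cid) for cid in ["good", "empty"]}
    rejected = os.listdir(rej_dir)

print(results)   # {'good': 'done', 'empty': 'not usable'}
print(rejected)  # ['empty']
```

Had the file been closed in an outer ``finally`` instead, as in the previous version of the loop, the ``os.rename`` call would run while the handle was still open.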
4 changes: 2 additions & 2 deletions toasty/tests/test_pipeline.py
@@ -89,7 +89,7 @@ def test_workflow(self):
args = [
'pipeline', 'fetch',
'--workdir', self.work_path('work'),
'fake_test1',
'fake_test1', '*nomatchisok*',
]
cli.entrypoint(args)

@@ -102,7 +102,7 @@ def test_workflow(self):
args = [
'pipeline', 'approve',
'--workdir', self.work_path('work'),
'fake_test1',
'fake_test1', 'fake_test?',
]
cli.entrypoint(args)

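
The updated tests pass ``'*nomatchisok*'`` to ``fetch`` and ``'fake_test?'`` to ``approve``, relying on two properties of ``evaluate_imageid_args``: a glob that matches nothing is silently dropped, and an ID selected both explicitly and by a glob is processed only once. The sketch below is a condensed, standalone copy of that helper (so it runs without Toasty installed), exercised against a temporary directory standing in for the workspace's ``candidates`` or ``processed`` directory:

```python
import glob
import os
import tempfile
from fnmatch import fnmatch


def evaluate_imageid_args(searchdir, args):
    """Condensed copy of the helper added in toasty/pipeline/cli.py."""
    matched_ids = set()
    globs_todo = set()

    for arg in args:
        if glob.has_magic(arg):
            globs_todo.add(arg)
        else:
            # Explicit IDs pass through unchecked; callers report missing ones.
            matched_ids.add(arg)

    if globs_todo:
        for filename in os.listdir(searchdir):
            if any(fnmatch(filename, g) for g in globs_todo):
                matched_ids.add(filename)

    # The set removes duplicates; sorting gives a stable processing order.
    return sorted(matched_ids)


with tempfile.TemporaryDirectory() as d:
    os.mkdir(os.path.join(d, "fake_test1"))
    fetch_ids = evaluate_imageid_args(d, ["fake_test1", "*nomatchisok*"])
    approve_ids = evaluate_imageid_args(d, ["fake_test1", "fake_test?"])
    missing_ids = evaluate_imageid_args(d, ["not_there"])

print(fetch_ids)    # ['fake_test1']
print(approve_ids)  # ['fake_test1']
print(missing_ids)  # ['not_there']
```

The last call shows why ``approve_impl`` and ``fetch_impl`` still check for each ID themselves: an explicit ID is returned even if nothing by that name exists.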