Merge pull request #671 from azavea/rde/feature/parallelize
Parallelization of commands (CHIP and PREDICT)
lewfish committed Mar 25, 2019
2 parents 1c869ce + 2c6e560 commit 091feb9
Showing 105 changed files with 2,088 additions and 942 deletions.
2 changes: 2 additions & 0 deletions .codecov.yml
@@ -2,3 +2,5 @@ coverage:
status:
project: off
patch: off
comment:
behavior: new
2 changes: 2 additions & 0 deletions .travis.yml
@@ -11,6 +11,8 @@ env:
- CLEAN_TRAVIS_TAG=${TRAVIS_TAG/[[:space:]]/}
- COMMIT=${CLEAN_TRAVIS_TAG:-${TRAVIS_COMMIT:0:7}}

if: (type = pull_request) OR (tag IS present) OR (branch = develop) OR (branch =~ /^feature.*/)

script:
- .travis/build
- .travis/test
3 changes: 3 additions & 0 deletions docs/CONTRIBUTING.rst
@@ -5,6 +5,9 @@ We are happy to take contributions! It is best to get in touch with the maintainers
about larger features or design changes *before* starting the work,
as it will make the process of accepting changes smoother.

Contributor License Agreement (CLA)
-----------------------------------

Everyone who contributes code to Raster Vision will be asked to sign the
Azavea CLA, which is based on the Apache CLA.

4 changes: 3 additions & 1 deletion docs/changelog.rst
@@ -16,11 +16,13 @@ Raster Vision 0.9.0
- Decrease semseg memory usage `#630 <https://github.com/azavea/raster-vision/pull/630>`_
- Add support for vector tiles in .mbtiles files `#601 <https://github.com/azavea/raster-vision/pull/601>`_
- Add support for getting labels from zxy vector tiles `#532 <https://github.com/azavea/raster-vision/pull/532>`_
- Remove custom ``__deepcopy__`` implementation from ``ConfigBuilder``s. `#567 <https://github.com/azavea/raster-vision/pull/567>`_
- Remove custom ``__deepcopy__`` implementation from ``ConfigBuilder``\s. `#567 <https://github.com/azavea/raster-vision/pull/567>`_
- Add ability to shift raster images by given numbers of meters. `#573 <https://github.com/azavea/raster-vision/pull/573>`_
- Add ability to generate GeoJSON segmentation predictions. `#575 <https://github.com/azavea/raster-vision/pull/575>`_
- Add ability to run the DeepLab eval script. `#653 <https://github.com/azavea/raster-vision/pull/653>`_
- Submit CPU-only stages to a CPU queue on AWS. `#668 <https://github.com/azavea/raster-vision/pull/668>`_
- Parallelize CHIP and PREDICT commands `#671 <https://github.com/azavea/raster-vision/pull/671>`_
- Refactor ``update_for_command`` to split out the IO reporting into ``report_io``. `#671 <https://github.com/azavea/raster-vision/pull/671>`_

Raster Vision 0.8
-----------------
17 changes: 16 additions & 1 deletion docs/cli.rst
@@ -25,8 +25,14 @@ Run is the main interface into running ``ExperimentSet`` workflows.

Some specific parameters to call out:

-\\-arg
~~~~~~~~~~~

Use ``-a`` to pass arguments into the experiment methods. Many of these methods take a ``root_uri``, which is where Raster Vision will store all the output of the experiment. If you forget to supply it, Raster Vision will remind you.

-\\-dry-run
~~~~~~~~~~~

Using the ``-n`` or ``--dry-run`` flag is useful to see what you're about to run before you run it. Combine this with the verbose flag for different levels of output:

.. code:: shell
@@ -35,8 +41,17 @@ Using the ``-n`` or ``--dry-run`` flag is useful to see what you're about to run
> rastervision -v run spacenet.chip_classification -a root_uri s3://example/ --dry-run
> rastervision -vv run spacenet.chip_classification -a root_uri s3://example/ --dry-run
-\\-skip-file-check
~~~~~~~~~~~~~~~~~~~

Use ``--skip-file-check`` or ``-x`` to skip checking whether files exist, which can take a long time for large experiments. The check is worth running the first time, but if you haven't changed anything about the experiment and are sure the files are there, it's often convenient to skip it.

.. _run split option:

-\\-splits
~~~~~~~~~~

Use ``-x`` to avoid checking if files exist, which can take a long time for large experiments. This is useful to do the first run, but if you haven't changed anything about the experiment and are sure the files are there, it's often nice to skip that step.
Use ``-s N`` or ``--splits N``, where ``N`` is the number of splits to create, to parallelize commands that can be split into parallelizable chunks. See :ref:`parallelizing commands` for more information.

.. _predict cli command:

47 changes: 47 additions & 0 deletions docs/codebase.rst
@@ -54,3 +54,50 @@ Global Registry
---------------

Another major design pattern of Raster Vision is the use of a global registry. The registry is what allows a single interface to construct all subclass builders through the static ``builder()`` method on the ``Config`` via a key, e.g. ``rv.RasterSourceConfig.builder(rv.GEOTIFF_SOURCE)``. The key is used to look up which ``ConfigBuilder``\s are registered inside the global registry, and the registry determines which builder the ``builder()`` call returns. More importantly, this enables Raster Vision to have a flexible system to create :ref:`plugins` out of anything that has a keyed ``ConfigBuilder``. The registry pattern goes beyond Configs and ConfigBuilders, though: this is also how internal classes and plugins are chosen for :ref:`default provider`, :ref:`experiment runner`, and :ref:`filesystem`.
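
The keyed lookup described above can be sketched in a few lines of plain Python. This is a toy illustration under stated assumptions, not Raster Vision's actual registry: the ``Registry`` class, its ``register`` and ``get_builder`` methods, the group name ``'RASTER_SOURCE'``, and the stand-in builder are all hypothetical names chosen for the example.

```python
# Toy registry sketch. Every class and method name here is hypothetical
# and only mirrors the pattern described in the text.
GEOTIFF_SOURCE = 'GEOTIFF_SOURCE'


class GeoTiffSourceConfigBuilder:
    """Stand-in for a keyed ConfigBuilder subclass."""

    def __init__(self):
        self.uri = None

    def with_uri(self, uri):
        self.uri = uri
        return self

    def build(self):
        # A dict stands in for the built Config object.
        return {'type': GEOTIFF_SOURCE, 'uri': self.uri}


class Registry:
    """Maps (group, key) pairs to builder classes."""

    def __init__(self):
        self._builders = {}

    def register(self, group, key, builder_class):
        self._builders[(group, key)] = builder_class

    def get_builder(self, group, key):
        if (group, key) not in self._builders:
            raise KeyError('No builder registered for {} / {}'.format(
                group, key))
        return self._builders[(group, key)]


registry = Registry()
registry.register('RASTER_SOURCE', GEOTIFF_SOURCE, GeoTiffSourceConfigBuilder)


class RasterSourceConfig:
    @staticmethod
    def builder(key):
        # The registry decides which builder class the key maps to.
        return registry.get_builder('RASTER_SOURCE', key)()


config = RasterSourceConfig.builder(GEOTIFF_SOURCE) \
    .with_uri('image.tif') \
    .build()
```

Because the caller only ever names a key, a plugin could register its own builder class under a new key without the calling code changing.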

.. _configuration topics:

Configuration Topics
--------------------

Configuration objects have a couple of methods that require some understanding if you'd like deeper
knowledge of how Raster Vision works - for example if you are creating plugins.

Implicit Configuration
^^^^^^^^^^^^^^^^^^^^^^

Configuration values can be set implicitly from other configuration. For example, if my backend
requires a ``model_uri`` to save a model to, and it is not set, the configuration may set
it to ``/opt/data/rv_root/train/experiment-name/model.hdf``. This was implicitly set by knowing the
root URI for the train command is ``/opt/data/rv_root/train/experiment-name``, which is set on the
experiment (by default constructed from the ``root_uri`` and ``experiment_id``).
The mechanism that allows this is that configurations
implement a method called ``update_for_command``, with the following signature:

.. autoclass:: rastervision.core.Config
:members: update_for_command

This method is called before running commands on an experiment, and gives the configuration a
chance to update any values it needs to based on the experiment and any other context it needs.
The context argument is, for example, the ``SceneConfig`` that the configuration
(e.g. a ``RasterSourceConfig``) is attached to. A parent configuration should set the context
whenever it calls ``update_for_command`` on a child configuration and the parent is part
of a collection of configurations (e.g., the collection of ``SceneConfig``\s in a ``DatasetConfig``).
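
As a rough sketch of the implicit ``model_uri`` example above, the following toy classes fill in an unset value from the experiment. The class names, the ``train_uri`` attribute, and the argument names of ``update_for_command`` are assumptions made for illustration; the generated API documentation is the authority on the real signature.

```python
import posixpath

TRAIN = 'TRAIN'


class ToyExperimentConfig:
    """Hypothetical stand-in for an experiment configuration."""

    def __init__(self, root_uri, experiment_id):
        # By default the train URI is built from root_uri and experiment_id.
        self.train_uri = posixpath.join(root_uri, 'train', experiment_id)


class ToyBackendConfig:
    """Hypothetical stand-in for a backend configuration."""

    def __init__(self, model_uri=None):
        self.model_uri = model_uri

    def update_for_command(self, command_type, experiment_config,
                           context=None):
        # Only fill in values that were not set explicitly.
        if command_type == TRAIN and self.model_uri is None:
            self.model_uri = posixpath.join(experiment_config.train_uri,
                                            'model.hdf')


experiment = ToyExperimentConfig('/opt/data/rv_root', 'experiment-name')

implicit = ToyBackendConfig()
implicit.update_for_command(TRAIN, experiment)

explicit = ToyBackendConfig(model_uri='/tmp/my-model.hdf')
explicit.update_for_command(TRAIN, experiment)
```

In this sketch an explicitly configured ``model_uri`` is left untouched, while an unset one is derived from the experiment's train URI.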

Reporting IO
^^^^^^^^^^^^

Raster Vision requires that configuration reports on its input and output files, which allows it to tie
together commands into a Directed Acyclic Graph of operations that the ``ExperimentRunner``\s can execute.
The way this reporting happens is through the ``report_io`` method on configuration:

.. autoclass:: rastervision.core.Config
:members: report_io

For each specific command, configuration should set any input files or directories onto the ``io_def`` through the ``add_input`` method, and set any output files or directories using the ``add_output`` method.

If a configuration does not correctly report its IO, commands may fail to run, or may be
rerun even though their output already exists and the ``--rerun`` flag is not used. This
is a common pitfall in plugin development, and care should be taken to ensure that IO is
reported properly. The ``--dry-run`` flag combined with the ``-v`` verbosity flag can be useful here
for checking that the reported IO is what is expected.
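
A minimal sketch of this reporting, using toy classes: ``ToyIoDef`` and ``ToyBackendConfig`` are hypothetical, and the ``report_io(command_type, io_def)`` argument order is an assumption. Only the ``add_input`` and ``add_output`` method names come from the text above.

```python
CHIP = 'CHIP'
TRAIN = 'TRAIN'


class ToyIoDef:
    """Hypothetical container for the inputs and outputs of one command."""

    def __init__(self):
        self.input_uris = set()
        self.output_uris = set()

    def add_input(self, uri):
        self.input_uris.add(uri)

    def add_output(self, uri):
        self.output_uris.add(uri)


class ToyBackendConfig:
    """Hypothetical configuration that reports IO per command."""

    def __init__(self, chip_uri, model_uri):
        self.chip_uri = chip_uri
        self.model_uri = model_uri

    def report_io(self, command_type, io_def):
        if command_type == CHIP:
            # CHIP writes the training chips.
            io_def.add_output(self.chip_uri)
        elif command_type == TRAIN:
            # TRAIN reads the chips and writes the model.
            io_def.add_input(self.chip_uri)
            io_def.add_output(self.model_uri)


backend = ToyBackendConfig('/opt/data/chip/exp',
                           '/opt/data/train/exp/model.hdf')
chip_io, train_io = ToyIoDef(), ToyIoDef()
backend.report_io(CHIP, chip_io)
backend.report_io(TRAIN, train_io)

# An output of one command that is an input of another implies an edge
# in the command graph.
train_depends_on_chip = bool(chip_io.output_uris & train_io.input_uris)
```

If ``report_io`` forgot the ``add_input`` call for TRAIN, the overlap would be empty and nothing would tie TRAIN to CHIP, which is the kind of mistake the dry run helps catch.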
4 changes: 4 additions & 0 deletions docs/commands.rst
@@ -16,6 +16,8 @@ ANALYZE

The ANALYZE command is used to analyze scenes that are part of an experiment and produce some output that can be consumed by later commands. Geospatial raster sources such as GeoTIFFs often contain 16- and 32-bit pixel color values, but many deep learning libraries expect 8-bit values. In order to perform this transformation, we need to know the distribution of pixel values. So one usage of the ANALYZE command is to compute statistics of the raster sources and save them to a JSON file which is later used by the StatsTransformer (one of the available :ref:`raster transformer`) to do the conversion.

.. _chip command:

CHIP
^^^^

@@ -26,6 +28,8 @@ TRAIN

The TRAIN command is used to train a model using the dataset generated by the CHIP command. The command is a thin wrapper around the train method in the backend that synchronizes files with the cloud, configures and calls the training routine provided by the associated third-party machine learning library, and sets up a log visualization server in some cases (e.g. Tensorboard). The output is a trained model that can be used to make predictions and fine-tune on another dataset.

.. _predict command:

PREDICT
^^^^^^^

2 changes: 1 addition & 1 deletion docs/index.rst
@@ -182,7 +182,7 @@ usage patterns.
codebase
plugins
qgis
contributing
CONTRIBUTING
release

API Reference
11 changes: 11 additions & 0 deletions docs/plugins.rst
@@ -26,6 +26,17 @@ you later refer to in your experiment configurations. For instance, if you devel
You'll need to implement the ``to_proto`` method on the ``Config`` and the ``from_proto`` method on the ``ConfigBuilder``. In the ``.proto`` files for the entity you are creating a plugin for, you'll see a ``google.protobuf.Struct custom_config`` section. This is the field in the protobuf that can handle arbitrary JSON, and should be used in plugins for configuration.

.. note::

Be sure to review the :ref:`configuration topics` and ensure you're implementing ``report_io`` and ``update_for_command`` properly in your configuration.

.. note::

A common pitfall is implementing the ``ConfigBuilder.from_proto`` and ``Config.to_proto`` methods
incorrectly. Look to other ``Config`` and ``ConfigBuilder`` implementations in the Raster Vision
codebase for examples of how to do this correctly, and use the ``custom_config`` in the protobufs
to set arbitrary configuration that is specific to your plugin implementation.
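
One way to guard against this pitfall is to check that a configuration survives a round trip through ``to_proto`` and ``from_proto``. The sketch below is a simplified illustration: a plain dict stands in for the protobuf message, and ``ToyPluginConfig``, ``ToyPluginConfigBuilder``, and the ``threshold`` option are hypothetical.

```python
class ToyPluginConfig:
    """Hypothetical plugin configuration with a single option."""

    def __init__(self, threshold):
        self.threshold = threshold

    def to_proto(self):
        # A plain dict stands in for the protobuf message; plugin-specific
        # values go under 'custom_config'.
        return {'custom_config': {'threshold': self.threshold}}


class ToyPluginConfigBuilder:
    """Hypothetical builder that can be restored from a message."""

    def __init__(self):
        self.threshold = 0.5

    def with_threshold(self, threshold):
        self.threshold = threshold
        return self

    def from_proto(self, msg):
        # Read back exactly the keys that to_proto wrote.
        custom = msg['custom_config']
        return self.with_threshold(custom['threshold'])

    def build(self):
        return ToyPluginConfig(self.threshold)


original = ToyPluginConfigBuilder().with_threshold(0.8).build()
restored = ToyPluginConfigBuilder().from_proto(original.to_proto()).build()
```

If ``from_proto`` skipped a key that ``to_proto`` writes, the restored configuration would silently fall back to the builder default, so comparing the two serialized forms is a cheap test to add for a plugin.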

Registering the Plugin
----------------------

24 changes: 24 additions & 0 deletions docs/runners.rst
@@ -76,3 +76,27 @@ includes plugin files.

.. note::
To run on AWS Batch, you'll need the proper setup. See :ref:`aws batch setup` for instructions.

.. _parallelizing commands:

Running Commands in Parallel
----------------------------

Raster Vision can run certain commands in parallel, such as the :ref:`chip command` and :ref:`predict command` commands. To do so, use the :ref:`run split option` option in the ``run`` command of the CLI.

Commands implement a ``split`` method that returns either the original command, if the command
cannot be split (e.g. ``TRAIN``), or a sequence of commands that divide the work into
a given number of groups. For instance, using ``--splits 5`` on a ``CHIP`` command over
50 training scenes and 25 validation scenes will result in 5 ``CHIP`` commands that can be run
in parallel, each of which creates chips for 15 scenes.
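
The arithmetic in that example can be checked with a small sketch. The helper ``split_into_groups`` is hypothetical, and how Raster Vision actually assigns scenes to each split is not shown here. The sketch only demonstrates dividing 75 scenes into 5 near-equal groups.

```python
def split_into_groups(items, num_splits):
    """Split items into at most num_splits contiguous, near-equal groups."""
    if num_splits < 1:
        raise ValueError('num_splits must be at least 1')
    base, extra = divmod(len(items), num_splits)
    groups, start = [], 0
    for i in range(num_splits):
        # The first `extra` groups take one additional item each.
        size = base + (1 if i < extra else 0)
        if size:
            groups.append(items[start:start + size])
        start += size
    return groups


scenes = (['train-{}'.format(i) for i in range(50)] +
          ['val-{}'.format(i) for i in range(25)])
groups = split_into_groups(scenes, 5)
group_sizes = [len(g) for g in groups]  # [15, 15, 15, 15, 15]
```

When the scene count does not divide evenly, this helper spreads the remainder over the first groups, and it returns fewer groups than requested if there are fewer items than splits.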

The command DAG that is given to the experiment runner is constructed such that each split command
can be run in parallel if the runner supports parallelization, and any command that depends on
the output of the split command will depend on each of the splits. In the above example, this means
that a ``TRAIN`` command, which was dependent on a single ``CHIP`` command pre-split, will be dependent on each of the
5 individual ``CHIP`` commands after the split.
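
The dependency rule can be sketched as a small graph. The node names and the helpers ``build_dependencies`` and ``ready_commands`` are hypothetical and only illustrate the rule that a downstream command waits on every split.

```python
def build_dependencies(num_chip_splits):
    """Return a mapping of command name to the commands it depends on."""
    chip_nodes = ['CHIP-{}'.format(i) for i in range(num_chip_splits)]
    # The split CHIP commands are independent of one another.
    deps = {node: [] for node in chip_nodes}
    # TRAIN depends on every one of the CHIP splits.
    deps['TRAIN'] = list(chip_nodes)
    return deps


def ready_commands(deps, finished):
    """Commands whose upstream commands have all finished."""
    return sorted(node for node, upstream in deps.items()
                  if node not in finished
                  and all(u in finished for u in upstream))


deps = build_dependencies(5)
at_start = ready_commands(deps, set())
after_four = ready_commands(deps, {'CHIP-0', 'CHIP-1', 'CHIP-2', 'CHIP-3'})
after_all = ready_commands(deps, set(deps) - {'TRAIN'})
```

All five ``CHIP`` splits are runnable at the start, while ``TRAIN`` only becomes runnable once the last split has finished.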

Each runner handles parallelization differently. For instance, the local runner will run all
of the splits simultaneously, so choose a number of splits that suits the number of CPUs available.
The AWS Batch runner will submit a job for each of the command splits, and the Batch Compute Environment will
dictate how many resources are available to run Batch jobs simultaneously.
59 changes: 33 additions & 26 deletions integration_tests/chip_classification_tests/experiment.py
@@ -10,12 +10,15 @@ def get_path(part):

img_path = get_path('scene/image.tif')
label_path = get_path('scene/labels.json')

img2_path = get_path('scene/image2.tif')
label2_path = get_path('scene/labels2.json')

backend_conf_path = get_path('configs/backend.config')

pretrained_model = (
'https://github.com/fchollet/'
'deep-learning-models/releases/download/v0.2/'
'resnet50_weights_tf_dim_ordering_tf_kernels_notop.h5')
'https://github.com/azavea/raster-vision-data/'
'releases/download/v0.0.7/chip-classification-test-weights.hdf5')

task = rv.TaskConfig.builder(rv.CHIP_CLASSIFICATION) \
.with_chip_size(200) \
@@ -38,31 +41,35 @@ def get_path(part):
replace_model=True) \
.build()

label_source = rv.LabelSourceConfig.builder(rv.CHIP_CLASSIFICATION_GEOJSON) \
.with_uri(label_path) \
.with_ioa_thresh(0.5) \
.with_use_intersection_over_cell(False) \
.with_pick_min_class_id(True) \
.with_background_class_id(3) \
.with_infer_cells(True) \
.build()

raster_source = rv.RasterSourceConfig.builder(rv.GEOTIFF_SOURCE) \
.with_uri(img_path) \
.with_channel_order([0, 1, 2]) \
.with_stats_transformer() \
.build()

scene = rv.SceneConfig.builder() \
.with_task(task) \
.with_id('cc_test') \
.with_raster_source(raster_source) \
.with_label_source(label_source) \
.build()
def make_scene(i_path, l_path):
    label_source = rv.LabelSourceConfig.builder(rv.CHIP_CLASSIFICATION) \
        .with_uri(l_path) \
        .with_ioa_thresh(0.5) \
        .with_use_intersection_over_cell(False) \
        .with_pick_min_class_id(True) \
        .with_background_class_id(3) \
        .with_infer_cells(True) \
        .build()

    raster_source = rv.RasterSourceConfig.builder(rv.GEOTIFF_SOURCE) \
        .with_uri(i_path) \
        .with_channel_order([0, 1, 2]) \
        .with_stats_transformer() \
        .build()

    return rv.SceneConfig.builder() \
        .with_task(task) \
        .with_id(os.path.basename(i_path)) \
        .with_raster_source(raster_source) \
        .with_label_source(label_source) \
        .build()

scene_1 = make_scene(img_path, label_path)
scene_2 = make_scene(img2_path, label2_path)

dataset = rv.DatasetConfig.builder() \
.with_train_scene(scene) \
.with_validation_scene(scene) \
.with_train_scenes([scene_1, scene_2]) \
.with_validation_scenes([scene_1, scene_2]) \
.build()

experiment = rv.ExperimentConfig.builder() \