Segmentation: Chips and Training #337

jamesmcclain · 2018-07-31T15:26:29Z

Overview

This pull request adds the ability to generate DeepLab-compatible TFRecords and the ability to train a DeepLab model using them. Preliminary to #321.

Checklist

Ran scripts/format_code and committed any changes
Documentation updated if needed
- Will do as part of [WiP] Segmentation #321
PR has a name that won't get you publicly shamed for vagueness
DocStrings
Remote operation
Ensure Tensorboard
Hex to class mapping, use colors in debug chips
Add Potsdam Georeferencing Script
Respond to comments

Notes

Optional. Ancillary topics, caveats, alternative strategies that didn't work out, anything else.

Testing Instructions

Step 1

Prepare the test data by using the contrib/cowc/transfer_georeference.py script. Typing

transfer_georeference.py  top_potsdam_2_10_RGBIR.tif top_potsdam_2_10_label_noBoundary.tif top_potsdam_2_10_label_georeferenced.tif

will apply the georeferencing information from top_potsdam_2_10_RGBIR.tif to the ungeoreferenced label file top_potsdam_2_10_label_noBoundary.tif to produce a new file top_potsdam_2_10_label_georeferenced.tif. The same must be done for 2_11.

Step 2

Use one of the two workflow configuration files: samples/workflow-configs/segmentation/deeplab-test.json or samples/workflow-configs/segmentation/deeplab-remote-test.json.

jamesmcclain · 2018-08-02T17:38:49Z

INFO:tensorflow:global step 10: loss = 3.0484 (3.197 sec/step)
INFO:tensorflow:global step 20: loss = 2.1668 (3.190 sec/step)
INFO:tensorflow:global step 30: loss = 1.0515 (3.623 sec/step)
INFO:tensorflow:global step 40: loss = 0.6545 (3.574 sec/step)
INFO:tensorflow:global step 50: loss = 0.6408 (3.289 sec/step)
INFO:tensorflow:global step 60: loss = 0.5511 (3.453 sec/step)
INFO:tensorflow:global step 70: loss = 0.2832 (3.401 sec/step)
INFO:tensorflow:global step 80: loss = 0.4439 (3.533 sec/step)
INFO:tensorflow:global step 90: loss = 0.9702 (3.548 sec/step)
INFO:tensorflow:global step 100: loss = 1.1956 (3.269 sec/step)
INFO:tensorflow:global step 110: loss = 0.4424 (3.513 sec/step)
INFO:tensorflow:global step 120: loss = 0.8231 (3.320 sec/step)
INFO:tensorflow:global step 130: loss = 0.7730 (3.300 sec/step)

jamesmcclain · 2018-08-02T19:18:47Z

lossyrob · 2018-08-03T17:37:18Z

src/rastervision/ml_backends/tf_deeplab.py

+                                      start_sync)
+
+
+def numpy_to_png(array: np.ndarray) -> str:


Perhaps these are broadly useful enough to pull into a utility package?

Okay, will-do.

lewfish · 2018-08-03T18:27:00Z

src/rastervision/protos/label_store.proto

@@ -37,9 +39,17 @@ message ClassificationGeoJSONFile {
    optional Options options = 2;
 }

+message SegmentationRasterFile {
+    optional RasterSource src = 1;


I think a comment for each of these fields would be helpful.

Maybe comments would resolve my confusion, but more generally, I don't understand the notion of source and destination in this PR. Does source correspond to where the ground truth comes from and destination where the predictions go? If so, you should use ground_truth_label_store for the ground truth and a separate prediction_label_store for the predictions to follow the convention in the rest of RV.

Okay, will-do.

Maybe comments would resolve my confusion, but more generally, I don't understand the notion of source and destination in this PR.

The nomenclature may be poorly chosen.

In the case of rasters, the "source" refers to the given label data (read from the filesystem or some remote location) and the "destination" is where internally generated labeling information is sent (written to the filesystem or some remote location).

In the case of classes, the "source" classes are those of the source rasters; the "destination" classes are those used internally (e.g. "cars" being class of 1 [from the configuration files] instead of class 0xffff00 [as they are in the given labels]).

The same two words are used in slightly different contexts to emphasize the connection between them: one translates from classes in the source raster to those in the destination raster by considering source classes versus destination classes.

Source classes changed from integers to hex numbers, comments added to .proto file.

lewfish · 2018-08-03T18:33:41Z

src/rastervision/samples/workflow-configs/segmentation/deeplab-test-remote.json

+                        }
+                    },
+                    "src_classes": [ 6, 1 ],
+                    "dst_classes": [ 1, 0 ]


The class_ids should start at 1.

I'm guessing these are indices into the channel dimension. It would be good to clarify that.

These are not indices, please see the (rewritten) response above. All of these will be well documented.

In this particular case what you see is 0xffff00 -> 110b -> 6 (0xffff00 is the labeling for a "car" in the source raster) being mapped to 1 (the classmap maps the class 1 to the word "car").

The class 0x0000ff -> 001 -> 1 is being mapped to zero because 0x0000ff is for buildings and we don't want to consider those.
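The translation described above can be sketched as follows. This is only an illustration, assuming each 8-bit channel is reduced to a single bit and the three bits are packed in RGB order. The helper names `pack_color` and `translate` are hypothetical and not taken from the PR, and the fallback to 0 for unlisted colors is an assumption.

```python
def pack_color(color: int) -> int:
    """Pack a 24-bit RGB color into 3 bits, one bit per channel (assumed scheme)."""
    r = (color >> 16) & 0xFF
    g = (color >> 8) & 0xFF
    b = color & 0xFF
    # Assumption: a channel contributes a 1 bit when it is at least half intensity.
    bits = [1 if c >= 128 else 0 for c in (r, g, b)]
    return (bits[0] << 2) | (bits[1] << 1) | bits[2]


# Source classes (packed colors) and destination classes, as in the sample config.
src_classes = [6, 1]
dst_classes = [1, 0]
src_to_dst = dict(zip(src_classes, dst_classes))


def translate(color: int) -> int:
    """Map a source label color to an internal class id.

    Colors that are not listed fall back to 0 (assumed "no label" id).
    """
    return src_to_dst.get(pack_color(color), 0)


car = translate(0xFFFF00)       # 0xffff00 -> 110b -> 6 -> class 1 ("car")
building = translate(0x0000FF)  # 0x0000ff -> 001b -> 1 -> class 0 (ignored)
```

With this scheme the car color ends up as class 1 and the building color as class 0, matching the explanation in the comment.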

lewfish · 2018-08-03T18:45:44Z

src/rastervision/label_stores/segmentation_raster_file.py

+            xmin = window.xmin
+            ymax = window.ymax
+            xmax = window.xmax
+            return np.zeros((self.channels, ymax - ymin, xmax - xmin))


Chips are stored with channels in the last dimension elsewhere in RV.

Also true of TF.

lewfish · 2018-08-03T18:54:56Z

src/rastervision/protos/train.proto

@@ -17,6 +17,25 @@ message TrainConfig {
        optional string export_py = 3 [default="/opt/tf-models/object_detection/export_inference_graph.py"];
    }

+    message SegmentationOptions {


It seems like these parameters should be in a backend config file which is referenced from the backend_config_uri field to be consistent with the rest of RV. This would involve moving the bulk of the fields in SegmentationOptions into a DeepLabBackendConfig proto or similar.

lewfish · 2018-08-03T18:56:34Z

src/rastervision/ml_backends/tf_deeplab.py

+        args.append('--save_summaries_secs={}'.format(
+            soptions.save_summaries_secs))
+        args.append('--save_summaries_images={}'.format(
+            soptions.save_summaries_images))


Building up the args seems like a good thing to extract to another function.
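One possible shape for such a helper, as a rough sketch: the function name `build_train_args` and the option values are hypothetical, and only the two flag names come from the diff above.

```python
from typing import Any, Dict, List


def build_train_args(options: Dict[str, Any]) -> List[str]:
    """Turn an options mapping into a list of '--flag=value' strings.

    Options whose value is None are skipped. Sorting the names keeps the
    resulting command line deterministic.
    """
    return ['--{}={}'.format(name, value)
            for name, value in sorted(options.items())
            if value is not None]


# Illustrative values only; the real defaults are not shown in this thread.
args = build_train_args({
    'save_summaries_secs': 600,
    'save_summaries_images': True,
})
```

Keeping the flag construction in one place would also make it straightforward to unit test without launching a training process.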

... and other miscellaneous changes.

lewfish

I got this to run locally, and will test remotely next.

lewfish · 2018-08-13T13:33:52Z

src/rastervision/ml_tasks/semantic_segmentation.py

+        chip_size = options.chip_size
+
+        windows = []
+        for i in range(100):  # XXX insensitive


I think this should be an option.

It is in my current branch.
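For illustration, the hard-coded 100 could become a parameter along these lines. This is a sketch under assumptions: the function `random_windows`, the `number_of_chips` parameter, and the window tuple layout are hypothetical and not the names used in the branch.

```python
import random
from typing import List, Optional, Tuple

# Hypothetical window layout: (ymin, xmin, ymax, xmax) in pixel coordinates.
Window = Tuple[int, int, int, int]


def random_windows(height: int, width: int, chip_size: int,
                   number_of_chips: int,
                   seed: Optional[int] = None) -> List[Window]:
    """Draw `number_of_chips` random square windows that fit inside the raster."""
    if chip_size > height or chip_size > width:
        raise ValueError('chip_size does not fit inside the raster extent')
    rng = random.Random(seed)
    windows = []
    for _ in range(number_of_chips):
        # randint is inclusive at both ends, so the window never leaves the raster.
        ymin = rng.randint(0, height - chip_size)
        xmin = rng.randint(0, width - chip_size)
        windows.append((ymin, xmin, ymin + chip_size, xmin + chip_size))
    return windows


# Example sizes are illustrative only.
windows = random_windows(6000, 6000, chip_size=513, number_of_chips=100, seed=0)
```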

lewfish · 2018-08-13T15:31:50Z

src/rastervision/contrib/cowc/transfer_georeference.py

@@ -0,0 +1,38 @@
+#!/usr/bin/env python
+
+import os


I had trouble running this in the Docker container, I think due to a Python version mismatch. But, I've realized there's no need to add georeferencing to the label files. We can just use ImageFile instead of GeoTiffFiles as the RasterSource to handle non-georeferenced imagery. Here is the relevant part of the workflow config I used to get it to work:

{
    "train_scenes": [
        {
            "id": "2-10",
            "raster_source": {
                "geotiff_files": {
                    "uris": [
                        "{raw}/isprs-potsdam/4_Ortho_RGBIR/top_potsdam_2_10_RGBIR.tif"
                    ]
                }
            },
            "ground_truth_label_store": {
                "segmentation_raster_file": {
                    "src": {
                        "image_file": {
                            "uri": "{raw}/isprs-potsdam/5_Labels_for_participants_no_Boundary/top_potsdam_2_10_label_noBoundary.tif"
                        }
                    },
                    "src_classes": [ "#ffff00", "#0000ff" ],
                    "dst_classes": [ 1, 0 ]
                }
            }
        }
    ],
    "test_scenes": [
        {
            "id": "2-11",
            "raster_source": {
                "geotiff_files": {
                    "uris": [
                        "{raw}/isprs-potsdam/4_Ortho_RGBIR/top_potsdam_2_11_RGBIR.tif"
                    ]
                }
            },
            "ground_truth_label_store": {
                "segmentation_raster_file": {
                    "src": {
                        "image_file": {
                            "uri": "{raw}/isprs-potsdam/5_Labels_for_participants_no_Boundary/top_potsdam_2_11_label_noBoundary.tif"
                        }
                    },
                    "src_classes": [ "#ffff00", "#0000ff" ],
                    "dst_classes": [ 1, 0 ]
                }
            }
        }
    ],

~~I am running in the container, as well. I am not sure how a version mis-match could occur.~~ Edit: I see now that you are talking about the georeferencing script. I'll address that.

Does image_file work when there is more than one label raster for the scene? If there is only one label raster per scene, does that imply that there is only one image raster per scene?

image_file does not work if there is more than one label raster for the scene. I'm guessing that case won't come up very frequently, but it's possible. For this dataset, it's not a problem.
Re: your second question, I think it's possible for there to be a single label raster, but multiple image rasters per scene.

Since this is going to be a "getting started" example, we should probably use image_file and avoid having to run the georeferencing script, although it could be useful for another application.

Here's what I got when I ran the script in the container after running update:
root@e820085104e2:/opt/src# ./rastervision/contrib/cowc/transfer_georeference.py \
    /opt/data/raw-data/isprs-potsdam/4_Ortho_RGBIR/top_potsdam_2_10_RGBIR.tif \
    /opt/data/raw-data/isprs-potsdam/5_Labels_for_participants_no_Boundary/top_potsdam_2_10_label_noBoundary.tif \
    /opt/data/raw-data/isprs-potsdam/labels/2_10.tif
Traceback (most recent call last):
  File "./rastervision/contrib/cowc/transfer_georeference.py", line 27, in <module>
    ul = re.search(ul_re, ullr)
  File "/usr/lib/python3.5/re.py", line 173, in search
    return _compile(pattern, flags).search(string)
TypeError: cannot use a string pattern on a bytes-like object
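This error is what Python 3 raises when a `str` regular expression is applied to `bytes`, which is what captured subprocess output usually is. A minimal sketch of the likely fix, assuming `ullr` holds bytes captured from an external command: the pattern and the sample text below are made up for illustration, since the script's real `ul_re` is not shown in this thread.

```python
import re

# Stand-in for bytes captured from a subprocess (for example via
# subprocess.check_output); the coordinate values are invented.
ullr = b'Upper Left  (  366676.000, 5808000.000)'

# Hypothetical pattern; the script's actual ul_re may differ.
ul_re = r'Upper Left\s+\(\s*([-\d.]+),\s*([-\d.]+)\)'

# re.search(ul_re, ullr) would raise the TypeError shown above, because the
# pattern is str and the input is bytes. Decoding first makes the types agree.
ul = re.search(ul_re, ullr.decode('utf-8'))
x, y = (float(v) for v in ul.groups())
```

An alternative with the same effect would be to compile the pattern as bytes (`rb'...'`), though decoding once near the subprocess call tends to keep the rest of the script in plain strings.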

lewfish · 2018-08-13T15:39:04Z

src/rastervision/label_stores/segmentation_raster_file.py

+        """Constructor.
+
+        Args:
+             src: A source of raster label data (either an object that


I think I understand the difference between src and dst now. I was confused because in the other LabelStores I was just using uri for both reading and writing (but had separate readable and writable fields). But I still think it's confusing that the term dst is being used in both dst, and dst_classes which seems like a different concept. Perhaps dst_classes should be called rv_classes since they are the class ids that RV is using internally, and they are used even when there is no dst specified.

lewfish · 2018-08-13T15:42:36Z

src/rastervision/label_stores/segmentation_raster_file.py

+        if isinstance(src, RasterSource):
+            self.src = src
+        elif isinstance(src, RasterSourceProto):
+            self.src = raster_source_builder.build(src)


I like how this is smart enough to handle different types. Seems like a good API design pattern.

lewfish · 2018-08-13T15:44:01Z

src/rastervision/ml_backends/tf_deeplab.py

+def make_debug_images(record_path: str,
+                      output_dir: str,
+                      class_map: ClassMap,
+                      p: float = 0.25) -> None:


Good idea to only save a random subset.
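The sampling idea behind the `p` parameter can be sketched like this, assuming each record is kept independently with probability `p`. The helper `sample_subset` and its `seed` argument are hypothetical and are not part of `make_debug_images`.

```python
import random
from typing import Iterable, Iterator, Optional, TypeVar

T = TypeVar('T')


def sample_subset(items: Iterable[T], p: float = 0.25,
                  seed: Optional[int] = None) -> Iterator[T]:
    """Yield each item independently with probability p."""
    rng = random.Random(seed)
    for item in items:
        # rng.random() is in [0.0, 1.0), so p=0 keeps nothing and p=1 keeps all.
        if rng.random() < p:
            yield item


# With p=0.25, roughly a quarter of the records get a debug image.
kept = list(sample_subset(range(1000), p=0.25, seed=0))
```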

lewfish · 2018-08-13T15:51:36Z

src/rastervision/ml_backends/tf_deeplab.py

+         file.
+
+    """
+    return join(base_uri, '{}-0.record'.format(split))


Is there any reason for the -0? Just wondering because the code for OD doesn't do that.

Yes, for some reason DeepLab wants files with names of the form *-[0-9]\+.record.
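A small sketch of that naming convention, with a regular expression standing in for the shell-style pattern above. The function name `get_record_uri`, the `shard` parameter, and the example path are hypothetical; only the `'{}-0.record'` format comes from the diff.

```python
import re
from os.path import basename, join

# Python spelling of the pattern *-[0-9]\+.record
RECORD_RE = re.compile(r'.*-[0-9]+\.record$')


def get_record_uri(base_uri: str, split: str, shard: int = 0) -> str:
    """Build a record path whose name ends in '-<shard>.record'."""
    return join(base_uri, '{}-{}.record'.format(split, shard))


uri = get_record_uri('/opt/data/chips', 'train')
name = basename(uri)
matches = RECORD_RE.match(name) is not None
```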

lewfish · 2018-08-13T15:53:40Z

src/rastervision/utils/files.py

@@ -24,6 +27,40 @@ class ProtobufParseException(Exception):
    pass


+def numpy_to_png(array: np.ndarray) -> str:


I think these two functions should be in utils.misc since they don't really deal with file handling.

lewfish · 2018-08-13T15:57:57Z

src/rastervision/ml_backends/tf_deeplab.py

+        backend_config_uri = get_local_path(options.backend_config_uri,
+                                            self.temp_dir)
+        with open(backend_config_uri) as f:
+            be_options = json.load(f)


Instead of just using raw JSON file for the backend options, I think we should use a protobuf file, like in the rest of RV. That way we can more easily document and validate the options.

lewfish · 2018-08-13T16:05:17Z

src/rastervision/samples/workflow-configs/segmentation/deeplab-test-remote.json

+                        }
+                    },
+                    "src_classes": [ "#ffff00", "#0000ff" ],
+                    "dst_classes": [ 1, 0 ]


Why are there two elements in dst_classes but only one in class_items? I think it has something to do with the fact that everything that's not in the class_map counts as background and is assigned the id 0. But if that's true, then I still don't understand why 0 needs to be mapped to a hex value at all. Perhaps there should be more documentation about converting classes to the background value.

#ffff00 is for cars, #0000ff is for buildings. #ffff00 is mapped to 1 (cars) and #0000ff is mapped to 0 (no label).

So we are training a model to distinguish between car and building. But, then shouldn't there be an item in class_items for building?

There is only one class under consideration: cars. Let me make some changes to clarify the situation.
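To make the mapping concrete, here is a minimal sketch of how hex `src_classes` could be translated to `dst_classes`. The helper names are hypothetical, and treating colors that are not listed as 0 is an assumption rather than something confirmed in this thread.

```python
from typing import Dict, List, Tuple

Color = Tuple[int, int, int]


def parse_hex(color: str) -> Color:
    """Convert a string such as '#ffff00' to an (r, g, b) tuple."""
    value = color.lstrip('#')
    return tuple(int(value[i:i + 2], 16) for i in (0, 2, 4))


def build_lookup(src_classes: List[str],
                 dst_classes: List[int]) -> Dict[Color, int]:
    """Pair each source color with its destination class id."""
    return {parse_hex(s): d for s, d in zip(src_classes, dst_classes)}


def translate_row(row: List[Color], lookup: Dict[Color, int]) -> List[int]:
    # Assumption: colors missing from the lookup fall back to 0 (no label).
    return [lookup.get(pixel, 0) for pixel in row]


lookup = build_lookup(['#ffff00', '#0000ff'], [1, 0])
row = [(255, 255, 0), (0, 0, 255), (255, 255, 255)]
labels = translate_row(row, lookup)
```

Under that assumption, listing `#0000ff` with destination 0 has the same effect as leaving it out, which may be the source of the confusion in the question above.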

lewfish · 2018-08-13T17:42:43Z

Do you have any idea why some of the images are zoomed out (with gray backgrounds) in Tensorboard? Also, after 2000 steps I thought I would see some labels popping up in the predictions.

lewfish · 2018-08-13T17:48:10Z

Do you have any idea why some of the images are zoomed out (with gray backgrounds) in Tensorboard? Also, after 2000 steps I thought I would see some labels popping up in the predictions.

If you think there's a bug we should probably just make a separate issue for it.

jamesmcclain · 2018-08-13T18:46:34Z

Do you have any idea why some of the images are zoomed out (with gray backgrounds) in Tensorboard?

I do not. Changing the chip size to 513x513 seemed to reduce this, but I still see some oddly scaled images. They might come from edges or corners, but I noticed that they seem to be smaller in scale, so I don't think that is the explanation.

Also, after 2000 steps I thought I would see some labels popping up in the predictions.

So did I, still working on it.

jamesmcclain · 2018-08-13T21:48:57Z

Comments addressed.

jamesmcclain · 2018-08-13T22:01:29Z

Do you have any idea why some of the images are zoomed out (with gray backgrounds) in Tensorboard?

Looks like augmentation.

lewfish

All the other changes look good to me!

lewfish · 2018-08-14T13:56:09Z

src/rastervision/samples/workflow-configs/segmentation/deeplab-test.json

+                            ]
+                        }
+                    },
+                    "raster_class_map": [


The raster_class_map is easier to understand. But, I still don't know why there is only an item for car, and not for building. Also, class_ids are expected to go from 1 to N for object detection and classification or things will break, and that should be fixed at some point. So I'm glad your implementation doesn't depend on it (since you've chosen 127 and 255 as ids), but that could be confusing, so maybe they should be changed to 1 and 2?

Isn't this sample only detecting cars? There are several classes inside the labels (building, low veg, trees, clutter, impervious surface). I'm a bit confused why we would include buildings but not all of these; I can see including just the target class, in this case cars.

The class_items contains items for cars and buildings, but the raster_class_map only has an item for car. The mismatch is what's confusing me. I think it makes sense to use a (consistent) subset though.

lewfish · 2018-08-14T13:56:59Z

src/rastervision/samples/workflow-configs/segmentation/deeplab-test.json

+                            ]
+                        }
+                    },
+                    "raster_class_map": [


Also, what do you think about switching from geotiff_files to image_file so your script doesn't need to be run, esp. for new users?

lewfish · 2018-08-14T14:25:39Z

src/rastervision/samples/workflow-configs/segmentation/deeplab-test.json

            }
        ]
    },
    "make_training_chips_options": {
        "segmentation_options": {
-            "empty_survival_probability": 0.2,
+            "empty_survival_probability": 0.1,


Can you switch to using image_file in this file too?

jamesmcclain · 2018-08-14T14:44:01Z

Are there any further changes requested?

lossyrob · 2018-08-14T14:49:02Z

Looks like the Travis CI failed because of code formatting? https://travis-ci.org/azavea/raster-vision/builds/415948141#L2128

lewfish · 2018-08-14T14:51:27Z

Once the build passes this is good to merge. I think you did an awesome job on this, especially considering the lack of documentation and being new to the project. Thanks!

jamesmcclain · 2018-08-14T14:56:36Z

Failed merge with working branch, trying to resolve.

CloudNiner added the review label Jul 31, 2018

jamesmcclain mentioned this pull request Jul 31, 2018

[WiP] Segmentation #321

Closed

20 tasks

jamesmcclain force-pushed the segmentation branch 3 times, most recently from 79ab472 to f86c3af Compare August 2, 2018 11:36

jamesmcclain force-pushed the segmentation branch from 101c91d to cc39673 Compare August 2, 2018 19:23

lossyrob reviewed Aug 3, 2018

lewfish reviewed Aug 3, 2018

James McClain added 13 commits August 6, 2018 11:24

Protobuf

1b56683

Workflow Configuration

50228d2

Render to TFRecords

73f281e

Lint, Format Code

001dc3f

Debug Images

8d522ca

Source-to-Destination Label Translation

d3cbf5e

Install deeplab

1371c14

Initial Training Support

e9d1fc8

More Training Parameters

8dc7dbc

Label Store Unit Tests

ebe87c9

DeepLab Backend Tests

8a95b18

PyDocs

f1a0163

Remote

edd6b64

jamesmcclain force-pushed the segmentation branch 3 times, most recently from 5617426 to 4c3f895 Compare August 7, 2018 22:01

James McClain added 2 commits August 8, 2018 16:20

Export and TensorBoard

4c005b9

Class Map Colors

e9edb58

Backend Configuration File

521df6d

... and other miscellaneous changes.

jamesmcclain force-pushed the segmentation branch from e8c8587 to 521df6d Compare August 9, 2018 17:13

jamesmcclain requested a review from lewfish August 9, 2018 17:19

Window Filtering

bc69e23

lewfish suggested changes Aug 13, 2018

Faster Debug Chips

14e062b

James McClain added 3 commits August 13, 2018 17:08

Review Comments

2f81754

Change Class Mapping Scheme

bedae75

ProtoBuf for Backend Config

c934ef7

lewfish suggested changes Aug 14, 2018

jamesmcclain force-pushed the segmentation branch from 9d0316a to ce178e0 Compare August 14, 2018 14:21

lewfish suggested changes Aug 14, 2018

jamesmcclain force-pushed the segmentation branch from ce178e0 to 4019b02 Compare August 14, 2018 14:35

lewfish approved these changes Aug 14, 2018

James McClain added 2 commits August 14, 2018 11:06

Greater Configurability

7e47b35

Clarify Configuration

825e900

jamesmcclain force-pushed the segmentation branch from 4019b02 to 825e900 Compare August 14, 2018 15:07

jamesmcclain merged commit f0b93ea into azavea:develop Aug 14, 2018

jamesmcclain deleted the segmentation branch August 14, 2018 15:20

CloudNiner removed the review label Aug 14, 2018

jamesmcclain restored the segmentation branch August 14, 2018 15:21
