Add augmentors to chip classification #851

lmbak · 2019-10-31T09:17:06Z

Overview

Adds the option to augment the training data by blurring, rotating, and/or adding snow. This uses the albumentations library.

Checklist

Updated docs/changelog.rst
Added needs-backport label if PR is bug fix that applies to previous minor release
Ran scripts/format_code and committed any changes
Documentation updated if needed
PR has a name that won't get you publicly shamed for vagueness

Notes

Still work in progress.

Testing Instructions

How to test this PR
Prefer bulleted description
Start after checking out this branch
Include any setup required, such as rebuilding the Docker image.
Include test case, and expected output if not captured by automated tests.

Closes #831

lmbak · 2019-10-31T09:20:20Z

I am currently running into an issue that I have not been able to solve yet; I will keep working on it.

File "/opt/conda/lib/python3.6/site-packages/click/core.py", line 722, in __call__
return self.main(*args, **kwargs)
File "/opt/conda/lib/python3.6/site-packages/click/core.py", line 697, in main
rv = self.invoke(ctx)
File "/opt/conda/lib/python3.6/site-packages/click/core.py", line 1066, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File "/opt/conda/lib/python3.6/site-packages/click/core.py", line 895, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/opt/conda/lib/python3.6/site-packages/click/core.py", line 535, in invoke
return callback(*args, **kwargs)
File "/opt/src/rastervision/cli/main.py", line 294, in run_command
rv.runner.CommandRunner.run(command_config_uri)
File "/opt/src/rastervision/runner/command_runner.py", line 11, in run
CommandRunner.run_from_proto(msg)
File "/opt/src/rastervision/runner/command_runner.py", line 17, in run_from_proto
command.run()
File "/opt/src/rastervision/command/train_command.py", line 21, in run
task.train(tmp_dir)
File "/opt/src/rastervision/task/task.py", line 137, in train
self.backend.train(tmp_dir)
File "/opt/src/rastervision/backend/pytorch_chip_classification.py", line 291, in train
opt, loss_fn, step_scheduler)
File "/opt/src/rastervision/backend/torch_utils/chip_classification/train.py", line 17, in train_epoch
for batch_ind, (x, y) in enumerate(bar):
File "/opt/conda/lib/python3.6/site-packages/click/_termui_impl.py", line 259, in __next__
rv = next(self.iter)
File "/opt/conda/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 819, in __next__
return self._process_data(data)
File "/opt/conda/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 846, in _process_data
data.reraise()
File "/opt/conda/lib/python3.6/site-packages/torch/_utils.py", line 369, in reraise
raise self.exc_type(msg)
KeyError: Caught KeyError in DataLoader worker process 0.
Original Traceback (most recent call last):
File "/opt/conda/lib/python3.6/site-packages/torch/utils/data/_utils/worker.py", line 178, in _worker_loop
data = fetcher.fetch(index)
File "/opt/conda/lib/python3.6/site-packages/torch/utils/data/_utils/fetch.py", line 44, in fetch
data = [self.dataset[idx] for idx in possibly_batched_index]
File "/opt/conda/lib/python3.6/site-packages/torch/utils/data/_utils/fetch.py", line 44, in <listcomp>
data = [self.dataset[idx] for idx in possibly_batched_index]
File "/opt/src/rastervision/backend/torch_utils/chip_classification/folder.py", line 159, in __getitem__
sample = self.transform(sample)
File "/opt/conda/lib/python3.6/site-packages/torchvision/transforms/transforms.py", line 61, in __call__
img = t(img)
File "/opt/conda/lib/python3.6/site-packages/albumentations/core/transforms_interface.py", line 87, in __call__
return self.apply_with_params(params, **kwargs)
File "/opt/conda/lib/python3.6/site-packages/albumentations/core/transforms_interface.py", line 94, in apply_with_params
params = self.update_params(params, **kwargs)
File "/opt/conda/lib/python3.6/site-packages/albumentations/core/transforms_interface.py", line 142, in update_params
params.update({"cols": kwargs["image"].shape[1], "rows": kwargs["image"].shape[0]})
KeyError: 'image'

lmbak · 2019-10-31T12:30:08Z

So far I have found the following.

Albumentations requires the keyword argument "image", but torchvision does not call the transforms with this keyword. Manually changing the line img = t(img) to img = t(image=img) solves this keyword issue.
This causes a new issue: the final ToTensor() transform uses the keyword "pic", not "image". Changing "pic" to "image" in both places solves this issue.
Albumentations requires images to be NumPy arrays, so I have prepended a self-made augmentor that converts the PIL image to a NumPy array, allowing albumentations to work with them.

So far we have:

(object): PIL image
(transform): convert to NumPy array with self-made augmentor
(object): numpy array
(transform): use albumentations
(object): dict
(transform): torchvision ToTensor(), which requires a PIL image or numpy array

The albumentations augmentation takes a NumPy array as input but returns a dict. Hence the final ToTensor() does not work, as it requires a PIL image or NumPy array.
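
The mismatch described above can be reproduced without either library. The sketch below is only a minimal stand-in, not the real torchvision or albumentations code: `PositionalCompose` and `KeywordTransform` are hypothetical classes that mimic the two calling conventions (positional chaining versus a keyword-only `image=` argument with a dict return value), and nested lists stand in for image arrays.

```python
class KeywordTransform:
    """Stand-in for an albumentations-style transform (hypothetical class).

    Mimics the convention seen in the traceback: the data is read from
    kwargs["image"] and the result is returned as a dict.
    """

    def __call__(self, *args, **kwargs):
        # Positional arguments are ignored, so a call like t(img) fails here.
        image = kwargs["image"]
        return {"image": image}


class PositionalCompose:
    """Stand-in for a positional Compose: each transform is called as t(img)."""

    def __init__(self, transforms):
        self.transforms = transforms

    def __call__(self, img):
        for t in self.transforms:
            img = t(img)
        return img


pipeline = PositionalCompose([KeywordTransform()])

# Positional call: the keyword-style transform cannot find "image".
try:
    pipeline([[0, 1], [2, 3]])
    error = None
except KeyError as exc:
    error = exc

# Keyword call: works, but the result is a dict rather than an array,
# so a following array-only transform would still need it unwrapped.
result = KeywordTransform()(image=[[0, 1], [2, 3]])
```

Both symptoms from the thread appear here: the positional call raises `KeyError: 'image'`, and the keyword call returns a dict that has to be unwrapped before the next transform can use it.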

Question @lewfish :
Considering that the calling of the transform objects is handled from within torchvision, and albumentations returns directly to torchvision, is there a way to solve this problem from within RV? Or should the change take place in the albumentations/torchvision libraries?

lewfish · 2019-10-31T15:25:41Z

I'm not sure if you saw this, but it might help: https://github.com/albu/albumentations/blob/master/notebooks/migrating_from_torchvision_to_albumentations.ipynb

One way to solve this is to modify the Dataset to return numpy arrays like you did. I think a cleaner way, although it might not work, is to write a transform that just converts from PIL to numpy and put that as the first transform in the list. (I think you'd also need a transform at the end of the list that converts from numpy to PIL or a PyTorch tensor). This latter option would allow you to take an off-the-shelf Dataset from torchvision and use it without modification with albumentations.
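
The second option could look roughly like the sketch below. It is an illustration under assumptions, not code from this PR: `KeywordAdapter`, `HorizontalFlip`, `to_rows`, and `run_chain` are hypothetical names, the augmentor is a stand-in for a keyword-style transform, and nested lists are used instead of PIL images or NumPy arrays so the example runs without dependencies. Whether the same idea works against the real torchvision and albumentations classes is not confirmed here.

```python
class KeywordAdapter:
    """Hypothetical wrapper: exposes a keyword-style transform as t(img) -> img."""

    def __init__(self, keyword_transform):
        self.keyword_transform = keyword_transform

    def __call__(self, img):
        # Pass the data by keyword and unwrap the dict result.
        return self.keyword_transform(image=img)["image"]


class HorizontalFlip:
    """Stand-in keyword-style augmentor (not the real albumentations class)."""

    def __call__(self, **kwargs):
        image = kwargs["image"]
        return {"image": [row[::-1] for row in image]}


def to_rows(img):
    """Stand-in for the 'PIL to numpy' step: copy the input into nested lists."""
    return [list(row) for row in img]


def run_chain(transforms, img):
    """Call each transform positionally, one after the other."""
    for t in transforms:
        img = t(img)
    return img


# Conversion first, then the wrapped augmentor; the dataset itself is untouched.
chain = [to_rows, KeywordAdapter(HorizontalFlip())]
flipped = run_chain(chain, ((1, 2, 3), (4, 5, 6)))
```

The point of the wrapper is that the dataset keeps calling every transform positionally, so an off-the-shelf dataset would not need to change.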

lmbak · 2019-11-04T10:43:46Z

I had not seen that link, thank you. Converting to a NumPy array at the start and back to PIL at the end does not work, so I'm going to override the ImageFolder class instead. I will continue working on it for now.
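
A rough sketch of what handling the augmentation inside the dataset could look like is given below. It is not the code from this PR: `AugmentedFolder` and `invert` are hypothetical names, the dataset holds in-memory (image, label) pairs instead of subclassing ImageFolder, and the augmentor is a stand-in that follows the keyword-in, dict-out convention.

```python
class AugmentedFolder:
    """Hypothetical dataset that applies a keyword-style augmentor itself."""

    def __init__(self, samples, augment=None):
        self.samples = samples  # list of (image, label) pairs
        self.augment = augment

    def __len__(self):
        return len(self.samples)

    def __getitem__(self, index):
        image, label = self.samples[index]
        if self.augment is not None:
            # Call with the keyword and unwrap the dict in one place.
            image = self.augment(image=image)["image"]
        return image, label


def invert(**kwargs):
    """Stand-in keyword-style augmentor: inverts 8-bit pixel values."""
    return {"image": [[255 - px for px in row] for row in kwargs["image"]]}


dataset = AugmentedFolder([([[0, 255]], 1)], augment=invert)
image, label = dataset[0]
```

Because the dataset controls the call, the keyword argument and the dict unwrapping live in one method and no library code has to be patched.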

lmbak · 2019-11-04T13:28:50Z

Still to do:

~~Add more augmentors~~
~~Allow user to specify the probability that an augmentor is applied~~ Default of 0.5 is good enough for now
~~Allow for multiple chains of different augmentors e.g. (flip AND blur) OR (rotate AND mirror)~~ Currently this already happens automatically because of the standard probability of 0.5

I'll finish those this week.
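
The claim that different combinations already occur with a probability of 0.5 per augmentor can be checked with a little arithmetic. The sketch below assumes that each augmentor is applied independently of the others; the augmentor names and the `subset_probabilities` helper are only illustrative.

```python
from itertools import product


def subset_probabilities(names, p=0.5):
    """Probability of each on/off combination, assuming independent augmentors."""
    table = {}
    for flags in product([False, True], repeat=len(names)):
        prob = 1.0
        for on in flags:
            prob *= p if on else 1.0 - p
        applied = tuple(name for name, on in zip(names, flags) if on)
        table[applied] = prob
    return table


table = subset_probabilities(["flip", "blur", "rotate"])
for applied, prob in table.items():
    print(applied or ("none",), prob)
```

With three augmentors at p = 0.5, each of the 2^3 = 8 combinations (including "none") has probability 0.125. The combinations arise by chance rather than as explicitly chosen groups, so this is not the same as configuring (flip AND blur) OR (rotate AND mirror) directly.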

Added ToNumpyArray transform for compatibility with Albumentations
Finished preliminary implementation of albumentations augmentors
Added more augmentors
Fixed the to protobuf for the augmentors
Fixed typo
Changed return value for the mock augmentor
Updated changelog.rst
Cleaning up of some imports and requirements
Cleaning up code

Added ToNumpyArray transform for compatibility with Albumentations
Finished preliminary implementation of albumentations augmentors
Added more augmentors
Fixed the to protobuf for the augmentors
Fixed typo
Changed return value for the mock augmentor
Updated changelog.rst
Cleaning up of some imports and requirements
Removed unneeded import
Removed another unneeded import
Added augmentor description
Added missing ']'
Style fixes

lmbak · 2019-11-18T12:00:21Z

@lewfish Good morning, I think this PR is nearing completion. However, the errors thrown in the integration test seem to occur in a part of the code unrelated to what I changed: the cv2.findContours() function returns 2 objects instead of 3. This seems strange to me, as I don't think I have changed anything related to that part of the code. I did find an SE question which seems to indicate that it has to do with the OpenCV version.
Edit: OK, logically it has to be something I did that causes the error, as it does not occur on other branches. I'll dive into it.

This leaves me with two questions:

What else needs to be done before this PR is ready?
~~Do you want me to try and fix the OpenCV error?~~
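
On the cv2.findContours() mismatch mentioned above: OpenCV 3.x returns three values (image, contours, hierarchy), while OpenCV 4.x returns two (contours, hierarchy). A common way to write code that tolerates both is to take the last two items of the result. The helper below is a hypothetical sketch of that idiom, exercised with fake tuples so that it runs without OpenCV installed.

```python
def unpack_contours(result):
    """Return (contours, hierarchy) from a cv2.findContours() result.

    OpenCV 3.x returns (image, contours, hierarchy); OpenCV 4.x returns
    (contours, hierarchy). The last two items are the same in both cases.
    """
    contours, hierarchy = result[-2:]
    return contours, hierarchy


# Fake return values standing in for real cv2 output.
opencv3_style = ("image", ["c1", "c2"], "hierarchy")
opencv4_style = (["c1", "c2"], "hierarchy")

from_v3 = unpack_contours(opencv3_style)
from_v4 = unpack_contours(opencv4_style)
```

In real code the helper would wrap the call itself, for example `unpack_contours(cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE))`, where `mask` is whatever binary image is being traced.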

lmbak · 2019-11-18T13:26:13Z

I have found the issue. albumentations requires opencv-python>=4.1.1, while mask-to-polygons requires opencv-python==3.4.*.
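
The two requirements cannot be met by the same installed version, which the small check below illustrates. It is a simplified sketch: the helper names are hypothetical, the version strings are illustrative rather than a list of actual releases, and the comparison uses plain integer tuples instead of full PEP 440 rules.

```python
def parse(version):
    """Turn a version string such as '4.1.1' into a tuple (4, 1, 1)."""
    return tuple(int(part) for part in version.split("."))


def satisfies_albumentations(version):
    """opencv-python>=4.1.1, as quoted in the thread."""
    return parse(version) >= (4, 1, 1)


def satisfies_mask_to_polygons(version):
    """opencv-python==3.4.*, as quoted in the thread."""
    return parse(version)[:2] == (3, 4)


# Illustrative candidates only.
candidates = ["3.4.0", "3.4.8", "4.0.0", "4.1.1", "4.1.2"]
compatible = [
    v for v in candidates
    if satisfies_albumentations(v) and satisfies_mask_to_polygons(v)
]
```

The list of compatible versions is empty: anything matching 3.4.* is below 4.1.1, so one of the two pins has to change.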

lewfish · 2019-11-18T18:01:15Z

I'm doing the following and then hopefully merging: running ./scripts/format_code (I'm not having the problem you described), pointing to the new mask-to-polygons commit you added, getting CI to pass, and manually testing. Unfortunately I don't see an easy way to unit test this right now.

lewfish · 2019-11-18T20:07:09Z

I'm closing this in favor of #859. I had to make a new PR because I am unable to simply add commits to this branch since I don't have permission.

lmbak changed the title ~~WIP: Add augmentors to chip classification~~ Add augmentors to chip classification Nov 18, 2019

lmbak mentioned this pull request Nov 18, 2019

Upgraded from OpenCV 3.4.* to 4.1.* azavea/mask-to-polygons#13

Merged

lewfish mentioned this pull request Nov 18, 2019

Add augmentors to chip classification #859

Merged

lewfish closed this Nov 18, 2019

lmbak deleted the lmb/augmentation branch November 25, 2019 10:52
