Data Generator #132

Merged
lamby merged 1 commit into drivendataorg:master from vessemer:data_generators on Sep 22, 2017


vessemer commented Sep 21, 2017

Description

Data augmentation is a very common stage in many machine learning approaches, especially in computer vision. Most of the solutions for #1, #2, and #3 (e.g. grt123, Therapixel, Daniel Hammack) share a similar augmentation pipeline. That pipeline mainly consists of Hounsfield scaling, lung segmentation (#120), CT re-orientation, and a batch of trivial yet resource-intensive spatial operations: zoom, rotation, shear, shift, flip, and combinations of them. A convenient way to handle the spatial operations is to build a generator that yields processed patches via affine transformations. A notable example of such a generator is the Keras ImageDataGenerator, which is accompanied by a thorough description. The main advantage of the Keras approach is that it builds a new transformation matrix from scratch for each patch but applies it only once, in contrast to the step-by-step application of scipy.ndimage perturbations (e.g. Daniel Hammack's code, where a rotation around the origin alone takes two affine transformation calls).

Unfortunately, the main disadvantage for us is that the Keras generator doesn't work with 3D data. So I have extended it to handle volumetric data such as CT scans. I've also kept it compatible with scipy.ndimage, so that the methods can be checked against trusted functions (i.e. scipy.ndimage.zoom, .rotate, etc.), which I use in the unit tests. The resulting speedup is about 2.5x compared with sequential calls to scipy.ndimage functions.
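To illustrate the "build one matrix, apply it once" idea, here is a minimal sketch of composing several 3D operations into a single homogeneous matrix about the volume center. It is not the code from this PR: the helper names (rotation_matrix, zoom_matrix, shift_matrix, compose_about_center) are hypothetical and only NumPy is assumed.

```python
import numpy as np


def rotation_matrix(angle_deg, axes=(1, 2)):
    """4x4 homogeneous rotation by angle_deg in the plane of two axes."""
    theta = np.deg2rad(angle_deg)
    c, s = np.cos(theta), np.sin(theta)
    a, b = axes
    m = np.eye(4)
    m[a, a], m[a, b] = c, -s
    m[b, a], m[b, b] = s, c
    return m


def zoom_matrix(factors):
    """4x4 homogeneous scaling; factors holds one value per axis."""
    return np.diag(list(factors) + [1.0])


def shift_matrix(offsets):
    """4x4 homogeneous translation by the given per-axis offsets."""
    m = np.eye(4)
    m[:3, 3] = offsets
    return m


def compose_about_center(shape, transforms):
    """Fold a list of 4x4 transforms into one matrix acting about the volume center."""
    center = (np.asarray(shape, dtype=float) - 1) / 2
    combined = np.eye(4)
    for t in transforms:
        combined = combined @ t
    # Move the center to the origin, transform, move back: still a single matrix.
    return shift_matrix(center) @ combined @ shift_matrix(-center)


matrix = compose_about_center(
    (8, 16, 16), [rotation_matrix(30.0), zoom_matrix((1.2, 1.2, 1.2))]
)
```

However many operations are listed, the volume would then be resampled once with the combined matrix rather than once per operation, which is where the saving over chained scipy.ndimage calls comes from.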

Reference to official issue

#137

Examples of the application:

Random Rotation

[Images: rand_rotation_b, rand_rotation_full_b]

Random Zoom

[Image: rand_zoom_b]

Random Shear

[Image: random_shear]

Random Shift

[Image: rand_shift_b]

Aggregated

[Images: one, two]
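For readers without the images, a rough sketch of how such randomly perturbed volumes could be produced with one resampling call per patch. This is an illustration under assumptions, not the generator added in this PR: random_affine_patches and its parameters are hypothetical, and it relies on scipy.ndimage.affine_transform, which treats the matrix as a map from output coordinates to input coordinates.

```python
import itertools

import numpy as np
from scipy import ndimage


def random_affine_patches(volume, rotation_range=15.0, zoom_range=0.1,
                          shift_range=2.0, seed=None):
    """Yield randomly rotated, zoomed, and shifted copies of a 3D volume, forever."""
    rng = np.random.default_rng(seed)
    center = (np.asarray(volume.shape, dtype=float) - 1) / 2
    while True:
        theta = np.deg2rad(rng.uniform(-rotation_range, rotation_range))
        c, s = np.cos(theta), np.sin(theta)
        # Rotation in the plane of the last two axes.
        rotation = np.array([[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]])
        zoom = np.diag(rng.uniform(1 - zoom_range, 1 + zoom_range, size=3))
        linear = rotation @ zoom
        shift = rng.uniform(-shift_range, shift_range, size=3)
        # Input coordinate = linear @ output coordinate + offset; this offset
        # keeps the volume center fixed before the random shift is added.
        offset = center - linear @ center + shift
        yield ndimage.affine_transform(volume, linear, offset=offset,
                                       order=1, mode="nearest")


volume = np.zeros((8, 16, 16))
volume[2:6, 4:12, 4:12] = 1.0
# Take a batch of four augmented copies from the endless generator.
batch = list(itertools.islice(random_affine_patches(volume, seed=0), 4))
```

Shear and flip would fit the same pattern as extra factors in the linear part.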

CLA

  • I have signed the CLA; if other committers are in the commit history, they have signed the CLA as well

@vessemer vessemer changed the title from "classification algorithm" to "Data Generators" Sep 21, 2017

@vessemer vessemer changed the title from "Data Generators" to "Data Generator" Sep 21, 2017

@@ -5,8 +5,8 @@
import numpy as np
import pytest

from ..preprocess import errors

lamby Sep 21, 2017

Contributor

How come?

@@ -82,7 +82,9 @@ def __init__(self, params=None):

def __call__(self, voxel_data, meta):
    if not isinstance(meta, load_ct.MetaData):
        raise ValueError('The meta should be an instance of %s.' % str(load_ct.MetaData))
    print(meta.__class__)

lamby Sep 21, 2017

Contributor

Whoops? :)

return list(reversed(self.meta.GetSpacing()))
if self.xyz_order:
    return list(reversed(self.meta.GetSpacing()))
else:

lamby Sep 21, 2017

Contributor

No need for an else; you've just called return on the previous line (and elsewhere).
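For illustration, the early-return form of a method like this could look as follows. This is a sketch only: ScanMeta is a hypothetical stand-in for the class under review, and the fallback return is an assumption, since the diff above does not show the else branch.

```python
class ScanMeta:
    """Hypothetical stand-in for the metadata wrapper touched by the diff."""

    def __init__(self, spacing, xyz_order=True):
        self._spacing = tuple(spacing)
        self.xyz_order = xyz_order

    @property
    def spacing(self):
        if self.xyz_order:
            return list(reversed(self._spacing))
        # No else needed: this line runs only when the branch above did not return.
        # Returning the spacing unreversed is an assumption about the elided branch.
        return list(self._spacing)
```

Dropping the else keeps the fallback path at the function's base indentation and avoids a nested block that can never be skipped.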

try:
    patches.append(next(patch))
except StopIteration:

lamby Sep 21, 2017

Contributor

flake might complain here, not sure...

vessemer Sep 21, 2017

Contributor

Looks ugly, agreed. I've replaced it with itertools.islice.
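A minimal sketch of that kind of replacement, assuming the loop collects a fixed-size batch from an iterator. The name take_patches and the sample data are hypothetical, not the code in this PR.

```python
import itertools


def take_patches(patch_iter, batch_size):
    """Return up to batch_size items; a short list means the iterator ran dry."""
    # islice stops quietly at exhaustion, so no try/except StopIteration is needed.
    return list(itertools.islice(patch_iter, batch_size))


patches = iter(range(5))
first = take_patches(patches, 3)   # [0, 1, 2]
second = take_patches(patches, 3)  # [3, 4]
third = take_patches(patches, 3)   # []
```

The caller can detect the end of the data by checking for a list shorter than batch_size instead of catching an exception.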

@vessemer vessemer force-pushed the vessemer:data_generators branch 4 times, most recently from cca7305 to acb62b6 Sep 21, 2017

@vessemer vessemer force-pushed the vessemer:data_generators branch from acb62b6 to abe8d2f Sep 21, 2017


vessemer commented Sep 21, 2017

Thanks for the review, @lamby. Everything has been rebased and squashed.

@lamby lamby merged commit 44d5016 into drivendataorg:master Sep 22, 2017

2 checks passed

concept-to-clinic/cla: @vessemer has signed the CLA.
continuous-integration/travis-ci/pr: The Travis CI build passed.

@reubano reubano referenced this pull request Sep 22, 2017

Closed

Lung Segmentation #120

0 of 3 tasks complete

reubano commented Sep 22, 2017

I'm having trouble figuring out what issue this addresses. Ideas @vessemer @lamby?

@vessemer vessemer deleted the vessemer:data_generators branch Sep 23, 2017


reubano commented Sep 26, 2017

Created #137 to accompany this

@vessemer vessemer referenced this pull request Jan 23, 2018

Open

Nodules augmentation #294

0 of 1 task complete