This repository has been archived by the owner. It is now read-only.

Data Generator #132

Merged
merged 1 commit into from Sep 22, 2017

Conversation

@vessemer
Contributor

@vessemer vessemer commented Sep 21, 2017

Description

Data augmentation is a common stage in many machine learning approaches, especially in computer vision. Most of the #1, #2, #3 solutions share a similar augmentation pipeline (e.g. grt123, Therapixel, Daniel Hammack, etc.). That pipeline mainly consists of Hounsfield scaling, lung segmentation (#120), CT re-orientation, and a batch of trivial yet resource-intensive spatial operations: zoom, rotation, shear, shift, flip, and combinations of them. A convenient solution for the latter is a generator that yields processed patches via affine transformations.

A notable example of such a generator is the Keras DataGenerator, which comes with a thorough description. The main advantage of the Keras approach is that, for each patch, it builds a new transformation matrix from scratch but applies it only once. This contrasts with the step-by-step application of scipy.ndimage perturbations, e.g. the code of Daniel Hammack, where a rotation around the origin alone required two affine transformation calls. Unfortunately, the main disadvantage for us is that the Keras DataGenerator doesn't work with 3D data.

So I have extended the DataGenerator to handle volumetric data such as CT scans. I've also made it compatible with scipy.ndimage, so that the methods can be checked against trusted functions (i.e. scipy.ndimage.zoom, .rotate, etc.), which I then used in the unit tests. The resulting speedup is about 2.5x compared with sequential calls of the scipy.ndimage functions.
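To illustrate the "build one matrix, apply it once" idea described above, here is a minimal sketch in NumPy. It is not the code from this PR: the helper names (`rotation_z`, `zoom`, `shift`, `about_center`), the volume shape, and the parameter values are all assumptions chosen for illustration. It composes a rotation, a zoom, and a shift into a single homogeneous 4x4 matrix acting about the volume center, then checks that the result matches applying the steps one at a time to a sample point.

```python
import numpy as np


def rotation_z(theta):
    """Homogeneous 4x4 rotation by theta radians about the first axis."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[1, 0, 0, 0],
                     [0, c, -s, 0],
                     [0, s, c, 0],
                     [0, 0, 0, 1]], dtype=float)


def zoom(factors):
    """Homogeneous 4x4 per-axis scaling."""
    return np.diag(list(factors) + [1.0])


def shift(offsets):
    """Homogeneous 4x4 translation."""
    m = np.eye(4)
    m[:3, 3] = offsets
    return m


def about_center(matrix, shape):
    """Make `matrix` act about the volume center instead of the origin."""
    center = (np.asarray(shape, dtype=float) - 1) / 2
    return shift(center) @ matrix @ shift(-center)


shape = (32, 64, 64)   # hypothetical (z, y, x) patch shape
theta = np.pi / 6      # hypothetical rotation angle

# Compose rotation, zoom and shift into ONE matrix.
combined = about_center(
    shift([0, 3, -2]) @ zoom([1.0, 1.2, 1.2]) @ rotation_z(theta), shape)

# Reference: apply the same steps one at a time to a sample point.
point = np.array([10.0, 20.0, 30.0, 1.0])
center = (np.asarray(shape, dtype=float) - 1) / 2
step = shift(-center) @ point
step = rotation_z(theta) @ step
step = zoom([1.0, 1.2, 1.2]) @ step
step = shift([0, 3, -2]) @ step
step = shift(center) @ step

assert np.allclose(combined @ point, step)
```

With a matrix like this, the linear part `combined[:3, :3]` and the translation `combined[:3, 3]` could be passed as `matrix` and `offset` to `scipy.ndimage.affine_transform`, so the volume is resampled once rather than once per operation. Note that `affine_transform` maps output coordinates to input coordinates, so the matrix passed in describes the inverse of the visual transformation.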

Reference to official issue

#137

The examples of the application:

Random Rotation

rand_rotation_b
rand_rotation_full_b

Random Zoom

rand_zoom_b

Random Shear

random_shear

Random Shift

rand_shift_b

Aggregated

one
two

CLA

  • I have signed the CLA; if other committers are in the commit history, they have signed the CLA as well
@vessemer vessemer changed the title classification algorithm Data Generators Sep 21, 2017
@vessemer vessemer changed the title Data Generators Data Generator Sep 21, 2017
@@ -5,8 +5,8 @@
import numpy as np
import pytest

from ..preprocess import errors

lamby Sep 21, 2017
Contributor

How come?

@@ -82,7 +82,9 @@ def __init__(self, params=None):

    def __call__(self, voxel_data, meta):
        if not isinstance(meta, load_ct.MetaData):
            raise ValueError('The meta should be an instance of %s.' % str(load_ct.MetaData))
        print(meta.__class__)

lamby Sep 21, 2017
Contributor

Whoops? :)

        return list(reversed(self.meta.GetSpacing()))
        if self.xyz_order:
            return list(reversed(self.meta.GetSpacing()))
        else:

lamby Sep 21, 2017
Contributor

No need for an else; you've just called return on the previous line. (and elsewhere)

        try:
            patches.append(next(patch))
        except StopIteration:

lamby Sep 21, 2017
Contributor

flake might complain here, not sure...

vessemer Sep 21, 2017
Author Contributor

Looks ugly, agreed. I've replaced it with itertools.islice.
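The revised code is not shown in this thread, so the following is only a sketch of how `itertools.islice` can replace a `try`/`except StopIteration` loop when collecting a batch from a generator. The names `patch_stream` and `take` are hypothetical stand-ins, not identifiers from the PR.

```python
from itertools import islice


def patch_stream(n):
    """Hypothetical stand-in for a patch generator; yields n dummy patches."""
    for i in range(n):
        yield "patch-%d" % i


def take(generator, batch_size):
    """Collect up to batch_size items; a short or empty list means exhaustion."""
    return list(islice(generator, batch_size))


stream = patch_stream(5)
first = take(stream, 3)    # a full batch
second = take(stream, 3)   # the remaining two items
third = take(stream, 3)    # empty: the generator is exhausted
```

Because `islice` stops quietly when the underlying generator runs out, the caller can test the length of the returned list instead of catching an exception.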

@vessemer vessemer force-pushed the vessemer:data_generators branch 4 times, most recently from cca7305 to acb62b6 Sep 21, 2017
@vessemer vessemer force-pushed the vessemer:data_generators branch from acb62b6 to abe8d2f Sep 21, 2017
@vessemer
Contributor Author

@vessemer vessemer commented Sep 21, 2017

Thanks for the review, @lamby; everything has been rebased and squashed.

@lamby lamby merged commit 44d5016 into drivendataorg:master Sep 22, 2017
2 checks passed
@drivendata
concept-to-clinic/cla @vessemer has signed the CLA.
continuous-integration/travis-ci/pr The Travis CI build passed
@reubano reubano mentioned this pull request Sep 22, 2017
0 of 3 tasks complete
@reubano
Contributor

@reubano reubano commented Sep 22, 2017

I'm having trouble figuring out what issue this addresses. Ideas @vessemer @lamby?

@vessemer vessemer deleted the vessemer:data_generators branch Sep 23, 2017
@reubano
Contributor

@reubano reubano commented Sep 26, 2017

Created #137 to accompany this

@vessemer vessemer mentioned this pull request Jan 23, 2018
0 of 1 task complete