
Refactor data generator #71

Merged: 6 commits into broadinstitute:master from features/data-generator on Aug 25, 2017

Conversation

mihaimorariu

This PR refactors the data generator code. The data generator now outputs ground truth boxes with classes and metadata (the shape and scale of the rescaled image).

@0x00b1
Contributor

0x00b1 commented Aug 17, 2017

Hi, @mihaimorariu! I appreciate the contribution! I'll review either this evening or tomorrow. Meanwhile, would you mind adding yourself to AUTHORS.rst?

@0x00b1
Contributor

0x00b1 commented Aug 18, 2017

@mihaimorariu Would you mind rebasing?

@mihaimorariu force-pushed the features/data-generator branch 4 times, most recently from bd05718 to dfd9837 on August 18, 2017 at 07:42
@mihaimorariu
Author

Hi, @0x00b1! Thank you!

I am new to this, but I would be glad to keep contributing. I have rebased and added myself to AUTHORS.rst, as requested.


current_index = (self.batch_index * batch_size) % n
def scale_shape(shape, min_size=224, max_size=224):
Contributor

Nice! We really needed this! 👍

You should add a test for this function to tests.preprocessing.object_detection.
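A minimal sketch of such a test, assuming the faster-rcnn style scaling suggested by min_size/max_size (the capping behavior asserted below is an assumption, not confirmed by the PR):

    # tests.preprocessing.object_detection (sketch; import path omitted)
    def test_scale_shape():
        shape = (480, 640, 3)

        scaled, scale = scale_shape(shape)

        # The returned shape is the input scaled, with the channels dimension
        # passed through (this follows the `return` statement shown above).
        assert scaled == (int(480 * scale), int(640 * scale), 3)

        # With the default min_size == max_size == 224, neither spatial
        # dimension should exceed 224 (assumed capping behavior).
        assert max(scaled[:2]) <= 224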


current_index = (self.batch_index * batch_size) % n
def scale_shape(shape, min_size=224, max_size=224):
Contributor

It’s a little wonky and unexpected for shape to include the channels dimension. I would expect this to work like shape parameters elsewhere (e.g. Keras’ ImageDataGenerator), where it’s a (width, height) pair.

Contributor

It also has the benefit of not needing separate “channels first” and “channels last” branches.

Author

Sure, I can change that. But I think it would be useful to agree on whether we are going to use (width, height) tuples from now on, or (height, width); I saw different usage in different parts of the code.

self.batch_index += 1
else:
current_batch_size = n - current_index
return (int(shape[0] * scale), int(shape[1] * scale), shape[2]), scale
Contributor

An implementation that might be clearer:

    rr, cc = shape

    rr *= scale
    cc *= scale

    return (int(rr), int(cc)), scale

self.batch_index += 1
else:
current_batch_size = n - current_index
return (int(shape[0] * scale), int(shape[1] * scale), shape[2]), scale
Contributor

If you drop the channels dimension, you’ll need to drop the channels dimension here too.


yield index_array[current_index:current_index + current_batch_size], current_index, current_batch_size
return skimage.transform.rescale(image, scale=scale, mode="reflect"), scale
Contributor

I don’t think this function is needed. I’d inline skimage.transform.rescale.
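Inlined at the call site, that would read something like:

    # Inlining skimage.transform.rescale directly (a sketch):
    import skimage.transform

    image = skimage.transform.rescale(image, scale=scale, mode="reflect")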

return self.next(*args, **kwargs)
class DictionaryIterator(keras.preprocessing.image.Iterator):
def __init__(self, data, classes, image_data_generator, image_shape, batch_size=1,
shuffle=True, seed=numpy.uint32(time.time() * 1000)):
Contributor

Likewise, the seed semantics should match, so the default should be seed=None.

self.data = data
self.classes = classes
self.image_data_generator = image_data_generator
self.image_shape = image_shape
Contributor

self.image_shape is unused.

self.classes = classes
self.image_data_generator = image_data_generator
self.image_shape = image_shape
self.target_shape, _ = scale_shape(image_shape)
Contributor

This shouldn’t be inside the constructor since the target_shape is dependent on the image.

Contributor

You’re missing the call to super().
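A sketch of the missing call, assuming the Keras 2-era Iterator.__init__(self, n, batch_size, shuffle, seed) signature:

    # Inside DictionaryIterator.__init__ (illustrative):
    super(DictionaryIterator, self).__init__(len(data), batch_size, shuffle, seed)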


return [image_batch, gt_boxes_batch, metadata], None
Contributor

You’re missing the y value: [boxes, labels]

Contributor

You’re also missing labels from x.

image = skimage.io.imread(pathname)
for batch_index, image_index in enumerate(selection):
path = self.data[image_index]["filename"]
image = skimage.io.imread(path, as_grey=(self.target_shape[2] == 1))
Contributor

You don’t need to specify as_grey.

@0x00b1
Contributor

0x00b1 commented Aug 18, 2017

Thanks, @mihaimorariu! This was great. I requested a few changes. Let me know if you need help!

def flow(self, dictionary, shuffle=True, seed=None):
return DictionaryIterator(dictionary, self, shuffle=shuffle, seed=seed)
def flow(self, data, classes, image_shape):
return DictionaryIterator(data, classes, keras.preprocessing.image.ImageDataGenerator(), image_shape)
Contributor

keras.preprocessing.image.ImageDataGenerator() should be self
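With that change applied, the method would read something like:

    def flow(self, data, classes, image_shape):
        # Pass this generator itself rather than constructing a fresh one.
        return DictionaryIterator(data, classes, self, image_shape)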


def flow(self, dictionary, shuffle=True, seed=None):
return DictionaryIterator(dictionary, self, shuffle=shuffle, seed=seed)
def flow(self, data, classes, image_shape):
Contributor

I recommend that we keep using dictionary rather than data. Is flow_from_dictionary clearer?


def flow(self, dictionary, shuffle=True, seed=None):
return DictionaryIterator(dictionary, self, shuffle=shuffle, seed=seed)
def flow(self, data, classes, image_shape):
Contributor

We should drop the image_shape parameter.

shape should be inferred for situations where images have variable dimensions.

Contributor

How will we handle differently sized images, w.r.t. the batch size? Say we have image A of size (500, 500, 3) and image B of size (1000, 1000, 3). Do we make a batch of size (2, 1000, 1000, 3)? Do we set the "unoccupied" data to 0 for image A?

Contributor

@hgaiser When shape inference works it’d be (None, None, 3). However, until it’s implemented, the model expects fixed dimensions so you’d need to crop or resize.

Contributor
@0x00b1 Aug 21, 2017

@hgaiser @mihaimorariu Maybe I misinterpreted the image_size parameter?

Should it be target_size?

Author

image_shape is the shape of the original image. target_shape is the shape of its rescaled version. So, if I understand correctly, shall we assume that image_shape is the same for all images, for now?

Contributor

@0x00b1 I get that, but generators work with numpy arrays, and numpy arrays don't work with shapes like (None, None, 3). How is a numpy array supposed to be created that contains any number of images of any shape?

If I remember correctly, image_size is the original image size, and this generator would force all images to be of that size (a check for this still needs to be added, which would also give a clearer error message; the alternative is to reshape all images to this size). In an ideal situation, image_size is not forced to one constant size. However, given the problem mentioned above, the most viable solution is to either force the batch size to 1 or force all images to be of the same size. A combination of the two is something I don't see a solution for. I'm very much open to suggestions on this part 👍
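One possible answer to the padding question above: zero-pad every image in the batch up to the largest size present (a sketch; pad_batch is illustrative, not from this PR):

    import numpy

    def pad_batch(images):
        # Pad each image with zeros up to the largest (rows, cols) in the
        # batch, so the "unoccupied" region of smaller images is 0.
        rows = max(image.shape[0] for image in images)
        cols = max(image.shape[1] for image in images)

        batch = numpy.zeros((len(images), rows, cols, images[0].shape[2]))

        for index, image in enumerate(images):
            batch[index, :image.shape[0], :image.shape[1]] = image

        return batch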

path = self.data[image_index]["filename"]
image = skimage.io.imread(path, as_grey=(self.target_shape[2] == 1))
image, scale = scale_image(image)
image = self.image_data_generator.random_transform(image)
Contributor

I would drop the random_transform and standardize calls for the moment.

Unfortunately, we can’t easily inherit from ImageDataGenerator since we’ll need to write augmentation and pre-processing functions that can manage images and bounding boxes together. We’ll also need to support masks in the near future.

Contributor

Just a note, supporting masks is no issue using the seed parameter in random_transform. The easiest solution for boxes would be to make a fake image where the bounding box is a square segmentation, augment it using random_transform and re-compute the bounding box. Not efficient, but easy.
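A rough sketch of that trick (transform_box and its arguments are illustrative, not from this PR):

    import numpy

    def transform_box(generator, box, image_shape, seed):
        # Rasterize the box as a filled square "segmentation" on a blank image.
        y1, x1, y2, x2 = box
        mask = numpy.zeros(image_shape[:2] + (1,))
        mask[y1:y2, x1:x2] = 1.0

        # Reusing the image's seed applies the same random transform to the mask.
        mask = generator.random_transform(mask, seed=seed)

        # Re-compute the axis-aligned bounding box from the transformed mask.
        rr, cc = numpy.nonzero(mask[..., 0] > 0.5)

        return rr.min(), cc.min(), rr.max() + 1, cc.max() + 1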

Contributor

@hgaiser 👍


boxes = numpy.asarray([[d[k] for k in ['y1', 'x1', 'y2', 'x2']] for d in ds])
gt_data = [b["y1"], b["x1"], b["y2"], b["x2"], self.classes[b["class"]]]
Contributor

I believe this should be ['y1', 'x1', 'y2', 'x2'] (i.e. without the class label).

Contributor

We should be yielding (my current understanding):

# image: (batches, width, height, channels)
# metadata: (batches, 3) where `3` corresponds to the values of `width`, `height`, and `scale`
# bounding_boxes: (boxes, 4) where `4` corresponds to `y1`, `x1`, `y2`, and `x2`
# labels: (boxes, classes)

yield (image, metadata, bounding_boxes, labels), (bounding_boxes, labels)

Contributor

(@jhung0 Is my understanding correct?)

Contributor

Yes, boxes should just have the 4 coordinates; labels has the class info.

Contributor

Ah, so we're deviating from py-faster-rcnn here? There, gt_boxes is a (?, 5) matrix containing both the boxes and the labels.

Contributor

Yes, the labels here are in one-hot form.

Contributor

It makes sense to split them; it's clearer. Why the one-hot form, if I may ask?

Contributor

@hgaiser It’s the standard for Keras. It does make some stuff easier (a notable exception is ImageNet’s 21,841 classes 😧).

Contributor

Alright then :)

I figured that with the sparse versions of the loss functions, for example sparse_categorical_crossentropy, it didn't matter much.
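For reference, a minimal example of the one-hot encoding discussed above (values are illustrative):

    import keras.utils
    import numpy

    # Three boxes with class indices 1, 3, and 2 out of num_classes = 4
    # (index 0 reserved for background), as (boxes, classes) one-hot rows.
    class_indices = numpy.array([1, 3, 2])

    labels = keras.utils.to_categorical(class_indices, num_classes=4)

    assert labels.shape == (3, 4)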


def __next__(self, *args, **kwargs):
return self.next(*args, **kwargs)
class DictionaryIterator(keras.preprocessing.image.Iterator):
Contributor
@0x00b1 Aug 18, 2017

Nice. 👍 Inheriting from keras.preprocessing.image.Iterator will be easier to maintain than continuing to rely on our implementation.

@mihaimorariu changed the title from "Refactor data generator." to "Refactor data generator" on Aug 22, 2017

pathname = self.dictionary[index]["filename"]
# Labels has num_classes + 1 elements, since 0 is reserved for background.
num_classes = len(self.classes.keys())
Contributor

Could just do len(self.classes).


return [image_batch, gt_boxes_batch, metadata], None
return [images, boxes, labels, self.scale], None
Contributor

Why have a None?

Contributor

Generators must return X, Y, but Y doesn't make sense in our case because the network generates Y. Therefore, Y can be ignored.

Contributor

Also, shouldn't metadata have image width/height information too?

Author

Do we need to include it, since that information can be obtained from the images themselves?

@@ -37,9 +37,14 @@ def __init__(self, dictionary, classes, generator, batch_size=1, shuffle=False,
# Compute and store the target image shape.
cols, rows, channels = dictionary[0]["shape"]
self.image_shape = (rows, cols, channels)

self.target_shape, self.scale = scale_size(self.image_shape[0:2])
Contributor

How would the min/max size for scale_size be set?

@0x00b1 merged commit c6afb5d into broadinstitute:master on Aug 25, 2017