Refactor data generator #71
Conversation
Hi, @mihaimorariu! I appreciate the contribution! I'll review either this evening or tomorrow. Meanwhile, would you mind adding yourself to AUTHORS.rst?
@mihaimorariu Would you mind rebasing?
Force-pushed from bd05718 to dfd9837
Hi, @0x00b1! Thank you! I am new to this, but I would be glad to contribute more. I have rebased and added myself to AUTHORS.rst, as requested.
current_index = (self.batch_index * batch_size) % n
def scale_shape(shape, min_size=224, max_size=224):
Nice! We really needed this! 👍
You should add a test for this function to tests.preprocessing.object_detection.
current_index = (self.batch_index * batch_size) % n
def scale_shape(shape, min_size=224, max_size=224):
It’s a little wonky and unexpected for shape to include the channels dimension. I would expect this to work like shape parameters elsewhere (e.g. Keras’ ImageDataGenerator), where it’s a (width, height) pair.
It also has the benefit of not needing separate “channels first” and “channels last” branches.
Sure, I can change that. But I think it would be useful to agree on whether we are going to use (width, height) tuples, from now on, or (height, width). I saw different usage in different parts of the code.
self.batch_index += 1
else:
    current_batch_size = n - current_index
return (int(shape[0] * scale), int(shape[1] * scale), shape[2]), scale
An implementation that might be clearer:

```python
rr, cc = shape
rr *= scale
cc *= scale
return (int(rr), int(cc)), scale
```
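For context, here is a minimal sketch of what the whole function might look like under an assumed contract. The min_size/max_size semantics below are a guess at the common Faster R-CNN convention (scale the shorter side up to min_size, capped so the longer side never exceeds max_size) and are not confirmed by this PR; the channels dimension is dropped as suggested above.

```python
# Hypothetical sketch of scale_shape; the min/max semantics are an
# assumption, not taken from this PR.
def scale_shape(shape, min_size=224, max_size=224):
    rr, cc = shape                      # (rows, cols), no channels
    scale = min_size / min(rr, cc)      # bring shorter side to min_size
    if max(rr, cc) * scale > max_size:  # cap so longer side fits max_size
        scale = max_size / max(rr, cc)
    return (int(rr * scale), int(cc * scale)), scale
```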
self.batch_index += 1
else:
    current_batch_size = n - current_index
return (int(shape[0] * scale), int(shape[1] * scale), shape[2]), scale
If you drop the channels dimension, you’ll need to drop the channels dimension here too.
yield index_array[current_index:current_index + current_batch_size], current_index, current_batch_size
return skimage.transform.rescale(image, scale=scale, mode="reflect"), scale
I don’t think this function is needed. I’d inline skimage.transform.rescale.
return self.next(*args, **kwargs)
class DictionaryIterator(keras.preprocessing.image.Iterator):
def __init__(self, data, classes, image_data_generator, image_shape, batch_size=1,
             shuffle=True, seed=numpy.uint32(time.time() * 1000)):
Likewise, the seed semantics should match, so it should be seed=None.
self.data = data
self.classes = classes
self.image_data_generator = image_data_generator
self.image_shape = image_shape
self.image_shape is unused.
self.classes = classes
self.image_data_generator = image_data_generator
self.image_shape = image_shape
self.target_shape, _ = scale_shape(image_shape)
This shouldn’t be inside the constructor, since target_shape depends on the image.
You’re missing the call to super().
return [image_batch, gt_boxes_batch, metadata], None
You’re missing the y value: [boxes, labels]
You’re also missing labels from x.
image = skimage.io.imread(pathname)
for batch_index, image_index in enumerate(selection):
    path = self.data[image_index]["filename"]
    image = skimage.io.imread(path, as_grey=(self.target_shape[2] == 1))
You don’t need to specify as_gray.
Thanks, @mihaimorariu! This was great. I requested a few changes. Let me know if you need help!
def flow(self, dictionary, shuffle=True, seed=None):
return DictionaryIterator(dictionary, self, shuffle=shuffle, seed=seed)
def flow(self, data, classes, image_shape):
return DictionaryIterator(data, classes, keras.preprocessing.image.ImageDataGenerator(), image_shape)
keras.preprocessing.image.ImageDataGenerator() should be self.
def flow(self, dictionary, shuffle=True, seed=None):
return DictionaryIterator(dictionary, self, shuffle=shuffle, seed=seed)
def flow(self, data, classes, image_shape):
I recommend that we keep using dictionary rather than data. Would flow_from_dictionary be clearer?
def flow(self, dictionary, shuffle=True, seed=None):
return DictionaryIterator(dictionary, self, shuffle=shuffle, seed=seed)
def flow(self, data, classes, image_shape):
We should drop the input_shape parameter. shape should be inferred for situations where images have variable dimensions.
How will we handle differently sized images, w.r.t. the batch size? Say we have image A of size (500, 500, 3) and image B of size (1000, 1000, 3). Do we make a batch of size (2, 1000, 1000, 3)? Do we set the "unoccupied" data to 0 for image A?
@hgaiser When shape inference works it’d be (None, None, 3). However, until it’s implemented, the model expects fixed dimensions, so you’d need to crop or resize.
@hgaiser @mihaimorariu Maybe I misinterpreted the image_size parameter? Should it be target_size?
image_shape is the shape of the original image. target_shape is the shape of its rescaled version. So, if I understand correctly, shall we assume that image_shape is the same for all images, for now?
@0x00b1 I get that, but generators work with numpy arrays, and numpy arrays don't work with shapes like (None, None, 3). How would a numpy array be created that contains any number of images of arbitrary shape?
If I remember correctly, image_size is the original image size and this generator would force all images to be of that size (some check for this still needs to be added, which would also give a clearer error message; the alternative is to reshape all images to this size). In an ideal situation, image_size is not forced to one constant size. However, given the problem mentioned above, the most viable solution is to either force the batch size to 1 or force all images to be the same size. A combination of the two is something I don't see a solution for. I'm very much open to suggestions on this part 👍
path = self.data[image_index]["filename"]
image = skimage.io.imread(path, as_grey=(self.target_shape[2] == 1))
image, scale = scale_image(image)
image = self.image_data_generator.random_transform(image)
I would drop the random_transform and standardize calls for the moment.
Unfortunately, we can’t easily inherit from ImageDataGenerator since we’ll need to write augmentation and pre-processing functions that can manage images and bounding boxes together. We’ll also need to support masks in the near future.
Just a note, supporting masks is no issue using the seed parameter in random_transform. The easiest solution for boxes would be to make a fake image where the bounding box is a square segmentation, augment it using random_transform, and re-compute the bounding box. Not efficient, but easy.
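The box-to-mask-and-back part of that trick can be sketched in plain numpy. box_to_mask and mask_to_box are hypothetical helper names (not in this PR), and a horizontal flip stands in for random_transform:

```python
import numpy

# Hypothetical helper: draw a (y1, x1, y2, x2) box as a filled
# rectangle in a boolean mask.
def box_to_mask(box, shape):
    y1, x1, y2, x2 = box
    mask = numpy.zeros(shape, dtype=bool)
    mask[y1:y2, x1:x2] = True
    return mask

# Hypothetical helper: recover the tightest (y1, x1, y2, x2) box
# around the True region of a mask (assumes a non-empty mask).
def mask_to_box(mask):
    rows = numpy.any(mask, axis=1)
    cols = numpy.any(mask, axis=0)
    y1, y2 = numpy.where(rows)[0][[0, -1]]
    x1, x2 = numpy.where(cols)[0][[0, -1]]
    return (y1, x1, y2 + 1, x2 + 1)
```

A geometric transform applied to the mask (e.g. numpy.fliplr as a stand-in) then yields the transformed box via mask_to_box.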
@hgaiser 👍
boxes = numpy.asarray([[d[k] for k in ['y1', 'x1', 'y2', 'x2']] for d in ds])
gt_data = [b["y1"], b["x1"], b["y2"], b["x2"], self.classes[b["class"]]]
I believe this should be ['y1', 'x1', 'y2', 'x2'] (i.e. without the class label).
We should be yielding (my current understanding):

```python
# image: (batches, width, height, channels)
# metadata: (batches, 3) where `3` corresponds to the values of `width`, `height`, and `scale`
# bounding_boxes: (boxes, 4) where `4` corresponds to `y1`, `x1`, `y2`, and `x2`
# labels: (boxes, classes)
yield (image, metadata, bounding_boxes, labels), (bounding_boxes, labels)
```
(@jhung0 Is my understanding correct?)
yes, boxes should just have the 4 coordinates. labels has the class info
Ah, so we're deviating from py-faster-rcnn here? There, gt_boxes is a matrix containing the boxes and labels as a (?, 5) matrix.
Yes, the labels here are in one-hot form.
It makes sense to split them; it's clearer. Why one-hot form, if I may ask?
@hgaiser It’s the standard for Keras. It does make some stuff easier (a notable exception is ImageNet’s 21,841 classes 😧).
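For illustration, the one-hot form discussed here (what keras.utils.to_categorical produces) can be sketched in plain numpy. one_hot is a hypothetical helper, not part of this PR; it assumes integer class indices, with 0 reserved for background as noted elsewhere in this thread:

```python
import numpy

# Hypothetical helper: one-hot encode integer class indices into a
# (boxes, classes) matrix with a single 1 per row.
def one_hot(labels, num_classes):
    encoded = numpy.zeros((len(labels), num_classes))
    encoded[numpy.arange(len(labels)), labels] = 1
    return encoded
```

The ImageNet caveat above is about exactly this matrix: with 21,841 classes each row is 21,841 floats wide, which is why sparse integer labels are sometimes preferred.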
Alright then :) Figured that with the sparse versions of the loss functions, for example sparse_categorical_crossentropy, it didn't matter much.
def __next__(self, *args, **kwargs):
return self.next(*args, **kwargs)
class DictionaryIterator(keras.preprocessing.image.Iterator):
Nice. 👍 Inheriting from keras.preprocessing.image.Iterator will be easier to maintain than continuing to rely on our own implementation.
Force-pushed from 32ea32a to 9cb32f6
pathname = self.dictionary[index]["filename"]
# Labels has num_classes + 1 elements, since 0 is reserved for background.
num_classes = len(self.classes.keys())
could just do len(self.classes)
return [image_batch, gt_boxes_batch, metadata], None
return [images, boxes, labels, self.scale], None
Why have a None?
Generators must return X, Y, but Y doesn't make sense in our case because the network generates Y. Therefore, Y can be ignored.
Also, shouldn't metadata have image width/height information too?
Do we need to include it, since that information can be obtained from images?
Force-pushed from 23591e8 to 09a7c15
@@ -37,9 +37,14 @@ def __init__(self, dictionary, classes, generator, batch_size=1, shuffle=False,
# Compute and store the target image shape.
cols, rows, channels = dictionary[0]["shape"]
self.image_shape = (rows, cols, channels)

self.target_shape, self.scale = scale_size(self.image_shape[0:2])
How would the min/max size for scale_size be set?
Force-pushed from 09a7c15 to c33bf09
This PR refactors the data generator code. The data generator now outputs ground-truth boxes with classes, along with metadata (the shape and scale of the rescaled image).