Update ImageUtils to support resizing one, three, or four channel images #94
Conversation
I think this is ready for a pass; there are some issues described in my review comments.
@@ -72,15 +90,40 @@ private[sparkdl] object ImageUtils {
image
}

/** Returns the number of channels in the passed-in buffered image. */
This method and getOCVType invert the mapping done in ImageSchema.scala.
val color = new Color(image.getRGB(w, h), image.getColorModel.hasAlpha)
channels match {
case 1 =>
decoded(offset) = color.getBlue.toByte
Note: Unfortunately this doesn't always yield the original blue byte written in spImageToBufferedImage. In general the setRGB and getRGB methods don't yield consistent results when working with grayscale BufferedImages (BufferedImages with image type TYPE_BYTE_GRAY); see this code snippet:
https://gist.github.com/smurching/d6e918a84c79155b616f9339b30fdfca
It makes sense to me that this happens: it doesn't really make sense to tell a grayscale image "set the pixel at 0, 0 to RGB (x, y, z)" and expect the RGB channel values to actually be (x, y, z) - if this were the case, the image might not be gray. I'm not sure how to work around it...
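To make the inconsistency concrete, here is a minimal standalone Java sketch (the class name GrayRoundTrip is mine, not code from this PR) showing that a setRGB/getRGB round trip on a TYPE_BYTE_GRAY image does not preserve an RGB triple:

```java
import java.awt.Color;
import java.awt.image.BufferedImage;

public class GrayRoundTrip {
    // Writes an RGB value into a 1x1 grayscale image via setRGB, then reads it back.
    public static int roundTrip(int r, int g, int b) {
        BufferedImage img = new BufferedImage(1, 1, BufferedImage.TYPE_BYTE_GRAY);
        img.setRGB(0, 0, new Color(r, g, b).getRGB());
        return img.getRGB(0, 0) & 0xFFFFFF; // drop the alpha byte
    }

    public static void main(String[] args) {
        int original = new Color(200, 10, 10).getRGB() & 0xFFFFFF;
        int recovered = roundTrip(200, 10, 10);
        // The grayscale image collapses (200, 10, 10) to a single intensity, so
        // the recovered value has equal R, G, B channels and cannot match the original.
        System.out.println(original == recovered); // prints "false"
    }
}
```
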
Actually, I think this can be fixed by stealing some code from ImageSchema that directly manipulates the image raster; will push an update...
val bufferedImage = ImageUtils.spImageToBufferedImage(spImage)
val testImage = ImageUtils.spImageFromBufferedImage(bufferedImage)
assert(spImage === testImage, "Image changed during conversion.")
for (channels <- Seq(3, 4)) {
Note: This test will fail for channels = 1; see my comment in spImageFromBufferedImage.
val testImage = ImageUtils.resizeImage(tgtHeight, tgtWidth, tgtChannels, biggerImage)
assert(testImage === smallerImage, "Resizing image did not produce expected smaller image.")
for (channels <- Seq(1, 3, 4)) {
val smallerImage = ImageUtilsSuite.getImageRow(getImagePath("small", channels))
Note: I cheated here and had the test compare the original image to a smaller image generated via the resizeImage code path (smallerImage is an image file generated by saving the result of calling resizeImage on biggerImage). @MrBago do you remember how you generated the original image pair for this test?
Do you mean you compare against the image resized using BufferedImage, or ImageUtils.resizeImage?
Just a suggestion: why don't we store only the original image, and compare against the image resized in Java using BufferedImage on the fly?
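A sketch of that suggestion in plain Java (the ReferenceResize helper is hypothetical, not code from this PR): resize the stored original on the fly with Graphics2D so the test can compare ImageUtils.resizeImage against it without checking in a second image file:

```java
import java.awt.Graphics2D;
import java.awt.image.BufferedImage;

public class ReferenceResize {
    // Hypothetical reference implementation: resize via plain Java 2D so a test
    // can compare ImageUtils.resizeImage output against it on the fly.
    public static BufferedImage resize(BufferedImage src, int tgtWidth, int tgtHeight) {
        BufferedImage dst = new BufferedImage(tgtWidth, tgtHeight, src.getType());
        Graphics2D g = dst.createGraphics();
        g.drawImage(src, 0, 0, tgtWidth, tgtHeight, null); // scales src into dst
        g.dispose();
        return dst;
    }

    public static void main(String[] args) {
        BufferedImage big = new BufferedImage(8, 8, BufferedImage.TYPE_3BYTE_BGR);
        BufferedImage small = resize(big, 4, 4);
        System.out.println(small.getWidth() + "x" + small.getHeight()); // prints "4x4"
    }
}
```
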
Codecov Report
@@ Coverage Diff @@
## master #94 +/- ##
==========================================
+ Coverage 82.44% 82.57% +0.13%
==========================================
Files 34 34
Lines 1937 1946 +9
Branches 36 40 +4
==========================================
+ Hits 1597 1607 +10
+ Misses 340 339 -1
Continue to review full report at Codecov.
val isGray = image.getColorModel.getColorSpace.getType == ColorSpace.TYPE_GRAY
val hasAlpha = image.getColorModel.hasAlpha
var offset = 0
// Grayscale images in Java require special handling to get the correct intensity
The logic here is basically the inverse of Spark's logic for reading grayscale images into a data array (this code sets the pixels of a grayscale image from a data array): https://github.com/apache/spark/blob/master/mllib/src/main/scala/org/apache/spark/ml/image/ImageSchema.scala#L159
For context, if we set grayscale image data using image.setRGB(w, h, rgbInt), calling image.getRGB(w, h) won't necessarily return the same result (see this gist). Hence we set grayscale image data directly in the raster here.
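As a minimal standalone Java illustration (the RasterWrite class is mine, not the project's Scala code), writing through the raster stores the byte verbatim, so the round trip is exact, unlike the setRGB path:

```java
import java.awt.image.BufferedImage;
import java.awt.image.WritableRaster;

public class RasterWrite {
    // Writes an intensity directly into a TYPE_BYTE_GRAY raster and reads it back;
    // the stored sample is exactly the value written, with no colorspace conversion.
    public static int roundTrip(int intensity) {
        BufferedImage img = new BufferedImage(1, 1, BufferedImage.TYPE_BYTE_GRAY);
        WritableRaster raster = img.getRaster();
        raster.setSample(0, 0, 0, intensity); // args: x, y, band, value
        return raster.getSample(0, 0, 0);
    }

    public static void main(String[] args) {
        System.out.println(roundTrip(200)); // prints "200"
    }
}
```
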
Nice work, thanks for the explanation! Did you consider making the 3/4-channel images also use the raster directly? The code would be more consistent that way.
Tests failed for Python 3.5 because curling Travis failed; retriggering the tests...
Nice work! I left a few comments.
- The cheating test should be fixed (if I understood the comment :)
- I think we should check that the image is one of the supported OpenCV types and throw a meaningful exception otherwise.
- I would consider using direct raster access for both grayscale and colored images.
require(false, s"`Channels` must be 1 or 3, got $channels.")
}
val image = channels match {
case 1 => new BufferedImage(width, height, BufferedImage.TYPE_BYTE_GRAY)
Technically there can be images with 2 channels. We should throw an UnsupportedOperationException or something like that and inform the user that resize does not work for 2-channel images.
Actually, thinking about it, we should probably check that the OpenCV type is one of the supported ones, e.g. one of CV_8UC{1,3,4}, so how about something like this:
val mode = ImageSchema.getMode(rowImage)
val image = mode match {
case ImageSchema.ocvTypes("CV_8UC1") => new BufferedImage(width, height, BufferedImage.TYPE_BYTE_GRAY)
....
case _ => throw new UnsupportedOperationException("Can not resize images with mode = " + mode)
}
image
}

/** Returns the number of channels in the passed-in buffered image. */
private def getNumChannels(img: BufferedImage): Int = {
Why not call image.getColorModel.getNumComponents?
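For reference, getNumComponents on the ColorModel does report the expected counts for the three BufferedImage types used by these utilities; a quick standalone Java check (class name NumChannels is mine):

```java
import java.awt.image.BufferedImage;

public class NumChannels {
    // Returns the number of color + alpha components for a given BufferedImage type.
    public static int numComponents(int imageType) {
        return new BufferedImage(1, 1, imageType).getColorModel().getNumComponents();
    }

    public static void main(String[] args) {
        System.out.println(numComponents(BufferedImage.TYPE_BYTE_GRAY));  // prints "1"
        System.out.println(numComponents(BufferedImage.TYPE_3BYTE_BGR));  // prints "3"
        System.out.println(numComponents(BufferedImage.TYPE_4BYTE_ABGR)); // prints "4"
    }
}
```
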
for (h <- 0 until height) {
for (w <- 0 until width) {
val color = new Color(image.getRGB(w, h), hasAlpha)
decoded(offset) = color.getBlue.toByte
Same as above; I wonder if it would be more consistent to use the raster directly for both grayscale and BGR(A).
val tgtImg = new BufferedImage(tgtWidth, tgtHeight, BufferedImage.TYPE_3BYTE_BGR)
val tgtImgType = tgtChannels match {
case 1 => BufferedImage.TYPE_BYTE_GRAY
case 3 => BufferedImage.TYPE_3BYTE_BGR
I would add a default case that throws a meaningful exception (case _ => throw ...).
Just pushed an update that hopefully addresses the ^comments :)
Addressed by adding a helper method that performs an image resize using as few spark-deep-learning utilities as possible (we still call into spImageFromBufferedImage) and comparing ImageUtils.resizeImage output to the result of the helper.
Done
Updated. Also, another clarifying comment: the test images 1_channels/00074201.png, 3_channels/00074201.png, and 4_channels/00074201.png are variants of the three-channel 00074201.jpg image. The alpha channel of 4_channels/00074201.png is set to random values between 122 and 255.
require(false, s"`Channels` must be 1 or 3, got $channels.")
}
val imageType = openCVModeToImageType.getOrElse(mode,
throw new UnsupportedOperationException("Cannot convert row image with " +
@tomasatdatabricks As of right now, this error can only occur if users try to pass an image row with an unsupported OpenCV mode to DeepImageFeaturizer. I wonder if the current error message is vague/unhelpful to the user in this case...maybe the image validation should be pushed into DeepImageFeaturizer (but it also seems useful to validate the row image here)?
Yes, I think it should definitely be here as well. I believe you should always try to throw an informative exception. DeepImageFeaturizer is just one of the possible users of this function; also, it does not have to have the same restrictions.
Just one nit: extra blank line. Otherwise LGTM.
}
Row(origin, height, width, channels, ImageSchema.ocvTypes("CV_8UC3"), decoded)
Row(origin, height, width, channels, getOCVType(image), decoded)
Nit: extra blank line
LGTM. Just one question on rasterizing.
val image = new BufferedImage(width, height, imageType)
var offset = 0
val raster = image.getRaster
// NOTE: This code assumes the raw image data in rowImage directly corresponds to the
Does this change the data compared to the previous way?
Good Q; it doesn't change the data layout compared to the previous way. The BufferedImage types supported by our utilities (TYPE_BYTE_GRAY, TYPE_3BYTE_BGR, TYPE_4BYTE_ABGR) have a fixed BGR(A) channel ordering, so this approach should also be consistent across Java versions (i.e. the internal raster representation of the BufferedImage data won't suddenly change on us; see the BufferedImage docs).
Ah, I was wondering more about the content: whether what we're doing manually produces the exact same result as Color & Color.getRGB. The tests look like they maintain the same resizing results as before the code change here, but I wanted to confirm, since doing anything different to the image can affect the results of applying models to it. (Part of it is, I'm not sure how reasonable the assumption stated here is.)
Ahh, I see. Actually the resize results may change when feeding grayscale images to DeepImageFeaturizer, but not for three-channel images (our DIF tests only run against three-channel images, so they don't catch the change).
The difference for grayscale images is that running
val b = rowImageData(offset)
val color = new Color(b, b, b)
image.setRGB(x, y, color.getRGB)
as the original code did doesn't necessarily set the grayscale pixel at (x, y) to b. This happens because Java takes a weighted combination of the (r, g, b) channels when setting the grayscale pixel value. With the raster approach, however, the pixel at (x, y) is directly set to b. Given that b was likely obtained by calling raster.getSample (which is how ImageSchema.readImages reads the data of a grayscale image), the raster approach seems OK to me.
That being said, I could see changes in model behavior being sufficiently undesirable that we wouldn't want to make this change.
Another thing: I just realized this PR no longer validates that the input images to DIF have either 1 or 3 channels; I'm assuming we should add this validation back? The practical implication is that, as it stands, this PR allows users to feed 4-channel images to DIF without hitting an error.
I think direct raster access is the right way; ImageSchema.decode does the same, and using it for the BGR(A) images makes the code more consistent.
Why do you want to restrict the number of channels in DIF?
@tomasatdatabricks mainly I just didn't want to inadvertently release support for four-channel inputs in DIF without adding unit tests :P. Although I suppose we already supported one-channel inputs without specifically testing DIF on one-channel images.
OK, the current logic looks fine to me. The most important thing is not to change the logic if possible so there is no breaking change for users; I doubt this is a problem at this point.
For the one- and four-channel input stuff: if we want any testing, it seems enough to have unit tests that make sure this resizing function returns the correct number of channels given targetChannels. Just checking the number of channels seems good enough; we could also check the content of the image if we want to catch any breaking changes in the future.
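A minimal sketch of such a test in plain Java (the ResizeChannelCheck class and its helper are hypothetical stand-ins for the Scala suite and resizeImage): resize into each target image type and assert on the channel count of the result:

```java
import java.awt.Graphics2D;
import java.awt.image.BufferedImage;

public class ResizeChannelCheck {
    // Hypothetical stand-in for resizeImage: scales src into an image of the
    // given type, then reports the channel count of the result.
    public static int resizedChannels(BufferedImage src, int tgtType, int w, int h) {
        BufferedImage dst = new BufferedImage(w, h, tgtType);
        Graphics2D g = dst.createGraphics();
        g.drawImage(src, 0, 0, w, h, null);
        g.dispose();
        return dst.getColorModel().getNumComponents();
    }

    public static void main(String[] args) {
        BufferedImage src = new BufferedImage(8, 8, BufferedImage.TYPE_3BYTE_BGR);
        // Expected channel counts for the 1-, 3-, and 4-channel target types:
        System.out.println(resizedChannels(src, BufferedImage.TYPE_BYTE_GRAY, 4, 4));  // prints "1"
        System.out.println(resizedChannels(src, BufferedImage.TYPE_3BYTE_BGR, 4, 4));  // prints "3"
        System.out.println(resizedChannels(src, BufferedImage.TYPE_4BYTE_ABGR, 4, 4)); // prints "4"
    }
}
```
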
SGTM. Should we add the unit test in this PR or in a follow-up? I'm happy to do either, it's a pretty small test. Thanks again for the reviews :)
@smurching I'm happy to merge this. You can add the tests in this PR if it's useful for your Databricks PR, but can delay to another PR if it's not.
@sueann thanks, let's go ahead & merge this and add the tests in a future PR :)
This PR:
- Updates ImageUtils (spImageToBufferedImage and spImageFromBufferedImage) to support one, three, and four channel images
- Adds a tgtChannels parameter to determine the number of channels in the output image instead of defaulting to three output channels