Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-22730][ML] Add ImageSchema support for all OpenCv types. #20168

Conversation

tomasatdatabricks
Copy link

What changes were proposed in this pull request?

Added functionality to handle all OpenCV modes to ImageSchema:

  1. updated toImage and toNDArray functions to handle non-uint8 based images.
  2. add information about individual OpenCv modes

How was this patch tested?

Added test for conversion between numpy arrays and images stored as all possible OpenCV modes.

@HyukjinKwon
Copy link
Member

ok to test

@HyukjinKwon
Copy link
Member

cc @jkbradley, @imatiach-msft, @MrBago and @thunterdb.

OpenCvType.undefinedType +: (ordinals zip types).map(x => OpenCvType(x._1, x._2._1, x._2._2))
}

val javaOcvTypes = ocvTypes.asJava
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm .. why did we remove the doc here?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

explicit type.

dataType=x.dataType(),
nptype=self._ocvToNumpyMap[x.dataType()])
for x in ocvTypeList]
return self._ocvTypes[:]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it for copy? I usually do list(self._ocvTypes) tho.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes it is for copy.

self._ocvTypesByName.keys())))
return self._ocvTypesByName[name]

def ocvTypeByMode(self, mode):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it meant to be public? Seems doc is missing and this one doesn't look consistent with Scala side?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why it is not consistent with Scala side?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because I am not seeing the method called ocvTypeByMode in ImageSchema.scala.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Spark's approach has been to keep python and scala APIs as close as possible.

We should either try and copy the scala api here (make an OpenCvType object with a get method and __call__ method), or drop the OpenCvType object from scala side and use the same method names as we're adding to the python side.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is not consistent because python can not overload methods based on type but I can rename the Scala side to match python. It does not make a big difference in this case.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we could make either API work for both languages but it's a bit unnatural. There's a tradeoff between doing the most natural and appropriate thing in each language and having matching APIs, Spark has chosen to prefer making the APIs match so let's do our best to do that.

@SparkQA
Copy link

SparkQA commented Jan 6, 2018

Test build #85742 has finished for PR 20168 at commit 70bae2f.

  • This patch fails PySpark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@HyukjinKwon
Copy link
Member

Let's fix the PR title to [SPARK-22730][ML] ... BTW.

@@ -201,8 +243,9 @@ def readImages(self, path, recursive=False, numPartitions=-1,
.. versionadded:: 2.3.0
"""

spark = SparkSession.builder.getOrCreate()
image_schema = spark._jvm.org.apache.spark.ml.image.ImageSchema
ctx = SparkContext.getOrCreate()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

minor comment: the change before was only two lines whereas this is 3 lines - I think the previous change was better, what is the advantage of the new code?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch. Looks like this was caused by auto-rebasing onto the latest changes. My original change was only to replace _active_context with getOrCreate() but it was clearly made obsolete in the meantime. I'll just remove it.

self.assertEqual(x, ImageSchema.ocvTypeByMode(x.mode))

def test_conversions(self):
ary_src = [[[1e7*random.random() for z in range(4)] for y in range(10)] for x in range(10)]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you pass a seed to the random generator (I believe the policy for spark is to always generate the same random numbers in tests in order to reduce flaky tests)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about something like:

s = np.random.RandomState(seed=987)
ary_src  = s.rand(4, 10, 10)

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a good point, will do.

@@ -1843,6 +1844,28 @@ def tearDown(self):

class ImageReaderTest(SparkSessionTestCase):

def test_ocv_types(self):
ocvList = ImageSchema.ocvTypes
self.assertEqual("Undefined", ocvList[0].name)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

confused, didn't you make this:
def name: String = "CV_" + dataType + "C" + nChannels
or "CV_-1C-1" instead of "Undefined" - how does the test pass?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this test is failing:

======================================================================
FAIL: test_ocv_types (pyspark.ml.tests.ImageReaderTest)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/jenkins/workspace/SparkPullRequestBuilder@2/python/pyspark/ml/tests.py", line 1849, in test_ocv_types
    self.assertEqual("Undefined", ocvList[0].name)
AssertionError: 'Undefined' != u'CV_N/AC-1'

----------------------------------------------------------------------

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, good catch. There should have been if else clause setting name to Undefined for mode == -1.

ocvTypes.find(x => x.mode == mode).getOrElse(
throw new IllegalArgumentException("Unknown open cv mode " + mode))
}
val undefinedType = OpenCvType(-1, "N/A", -1)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if I understand correctly the name for undefinedType will then be:
def name: String = "CV_" + dataType + "C" + nChannels
which is "CV_-1C-1"? But below in the tests you are checking for the name to be "Undefined"? If this is correct, can it be fixed by overriding the name to check if datatype is N/A or something similar and in that case using a name of "Undefined"?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add the type here, I think all public members need to have explicit type, even trivial members.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, the name method should have special case for mode == -1.

@@ -143,12 +174,12 @@ object ImageSchema {

val height = img.getHeight
val width = img.getWidth
val (nChannels, mode) = if (isGray) {
(1, ocvTypes("CV_8UC1"))
val (nChannels, mode: Int) = if (isGray) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

don't you need to add the other types here (eg floating point, etc)?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the code bellow we always call toBytes per channel so everything will be cast to a 1 byte per channel image type. I think this is fine for standard image types (ie, jpg, png, and the like).

Technically you can register image readers in java and use a BufferedImage of TYPE_CUSTOM for more complex images, but in practice I think we'll want to take a very different code path for these complex images so we'll want to introduce a new method if we add support to handle them.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The decode function still only handles unsigned bytes. The intention of this PR was only to add support for conversions between ImageSchema struct and numpy arrays and vice versa. Such use case occurs e.g. if you want to store images that have been preprocessed to be used by TF model (e.g. normalized).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah, sorry, I see, I was confused by the description, if we aren't supporting float images for reading that makes sense now

self.assertEqual(ocvType, ImageSchema.ocvTypeByMode(img.mode))
npary1 = ImageSchema.toNDArray(img)
np.testing.assert_array_equal(npary0, npary1)

def test_read_images(self):
data_path = 'data/mllib/images/kittens'
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not necessary for this PR, but it would be nice to add images with the new types to verify they can be read in with the new changes

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As per my comment above, this PR does not address reading images, only image schema <=> numpy arrays conversions.

@imatiach-msft
Copy link
Contributor

@tomasatdatabricks nice PR! I've added a few comments.

def ocvTypeByMode(self, mode):
if self._ocvTypesByMode is None:
self._ocvTypesByMode = {x.mode: x for x in self.ocvTypes}
return self._ocvTypesByMode[mode]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not add if mode not in self._ocvTypesByMode: check here ?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good catch, it should be there.

self._ocvTypesByName.keys())))
return self._ocvTypesByName[name]

def ocvTypeByMode(self, mode):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why it is not consistent with Scala side?

@@ -55,7 +72,7 @@ def imageSchema(self):
"""

if self._imageSchema is None:
ctx = SparkContext._active_spark_context
ctx = SparkContext.getOrCreate()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have you check every place to use getOrCreate ?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, everywhere in image schema code at least.

Copy link
Contributor

@MrBago MrBago left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tomasatdatabricks I left some comments. Can you take a look at the tests, there seem to be some failures. I can take another pass when you're ready.

self.assertEqual(x, ImageSchema.ocvTypeByMode(x.mode))

def test_conversions(self):
ary_src = [[[1e7*random.random() for z in range(4)] for y in range(10)] for x in range(10)]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about something like:

s = np.random.RandomState(seed=987)
ary_src  = s.rand(4, 10, 10)

@@ -143,12 +174,12 @@ object ImageSchema {

val height = img.getHeight
val width = img.getWidth
val (nChannels, mode) = if (isGray) {
(1, ocvTypes("CV_8UC1"))
val (nChannels, mode: Int) = if (isGray) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the code bellow we always call toBytes per channel so everything will be cast to a 1 byte per channel image type. I think this is fine for standard image types (ie, jpg, png, and the like).

Technically you can register image readers in java and use a BufferedImage of TYPE_CUSTOM for more complex images, but in practice I think we'll want to take a very different code path for these complex images so we'll want to introduce a new method if we add support to handle them.

OpenCvType.undefinedType +: (ordinals zip types).map(x => OpenCvType(x._1, x._2._1, x._2._2))
}

val javaOcvTypes = ocvTypes.asJava
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

explicit type.

ocvTypes.find(x => x.mode == mode).getOrElse(
throw new IllegalArgumentException("Unknown open cv mode " + mode))
}
val undefinedType = OpenCvType(-1, "N/A", -1)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add the type here, I think all public members need to have explicit type, even trivial members.

)
case class OpenCvType(mode: Int, dataType: String, nChannels: Int) {
def name: String = "CV_" + dataType + "C" + nChannels
override def toString: String = "OpenCvType(mode = " + mode + ", name = " + name + ")"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Spark uses scala "string interpolation" a lot of places, I think it's quite nice and very readable: s"OpenCvType(mode = $mode, name = $name)"

@@ -1843,6 +1844,28 @@ def tearDown(self):

class ImageReaderTest(SparkSessionTestCase):

def test_ocv_types(self):
ocvList = ImageSchema.ocvTypes
self.assertEqual("Undefined", ocvList[0].name)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this test is failing:

======================================================================
FAIL: test_ocv_types (pyspark.ml.tests.ImageReaderTest)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/jenkins/workspace/SparkPullRequestBuilder@2/python/pyspark/ml/tests.py", line 1849, in test_ocv_types
    self.assertEqual("Undefined", ocvList[0].name)
AssertionError: 'Undefined' != u'CV_N/AC-1'

----------------------------------------------------------------------

continue
x = [[ary_src[i][j][0:ocvType.nChannels]
for j in range(len(ary_src[0]))] for i in range(len(ary_src))]
npary0 = np.array(x).astype(ocvType.nptype)
Copy link
Contributor

@MrBago MrBago Jan 9, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If ary_src is an array, this becomes:

nparry0 = ary_src[..., :ocvType.nChannels].astype(ocvType.nptype)

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good point, I'll change that.

if dtype not in self._numpyToOcvMap:
raise ValueError(
"Unsupported array data type '%s', currently only supported formats are %s" %
(str(array.dtype), str(self._numpyToOcvMap.keys())))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"%s" will call __str__ automatically so you don't need to wrap in str.

raise ValueError(
"Can not find matching OpenCvFormat for type = '%s'; supported formats are = %s" %
(name, str(
self._ocvTypesByName.keys())))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

style: can we put this on the above line?

self._ocvTypesByName.keys())))
return self._ocvTypesByName[name]

def ocvTypeByMode(self, mode):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Spark's approach has been to keep python and scala APIs as close as possible.

We should either try and copy the scala api here (make an OpenCvType object with a get method and __call__ method), or drop the OpenCvType object from scala side and use the same method names as we're adding to the python side.

@tomasatdatabricks tomasatdatabricks changed the title SPARK-22730 Add ImageSchema support for non-integer image formats [SPARK-22730][ML] Add ImageSchema support for non-integer image formats Jan 9, 2018
@SparkQA
Copy link

SparkQA commented Jan 9, 2018

Test build #85878 has finished for PR 20168 at commit eee25ce.

  • This patch fails Python style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@tomasatdatabricks tomasatdatabricks force-pushed the tomas/ImageSchemaUpdate branch 2 times, most recently from d1b0ed2 to 48eddf1 Compare January 9, 2018 21:11
@SparkQA
Copy link

SparkQA commented Jan 9, 2018

Test build #85880 has finished for PR 20168 at commit 48eddf1.

  • This patch fails PySpark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Jan 9, 2018

Test build #85884 has finished for PR 20168 at commit 763c8a6.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Copy link
Contributor

@MrBago MrBago left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My only remaining concerns with this PR are the breaking changes it introduces, @jkbradley @imatiach-msft how do you feel about having breaking change in ImageSchema between 2.3 & 2.4?
ImageSchema is experimental and some of these changes could make maintenance easier long term.

@tomasatdatabricks do you mind documenting the breaking changes, we'll need this documentation anyways for the release notes eventually.

Otherwise LGTM!

img = ImageSchema.toImage(npary0)
self.assertEqual(ocvType, ImageSchema.ocvTypeByMode(img.mode))
npary1 = ImageSchema.toNDArray(img)
np.testing.assert_array_equal(npary0, npary1)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we also check nparry1.dtype = ocvType.nptype, numpy allows arrays of different types to be equal if their contents compare equal, eg [0, 1] == [0.0, 1.0].

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good point, I'll add the test.

@tomasatdatabricks
Copy link
Author

@MrBago Here is the description of the breaking changes. ImageSchema.ocvTypes and ImageSchema.javaOcvTypes changed types from Map[String,Int] to list of OpenCvType.

ImageSchema.ocvTypes: Map[String, Int] -> IndexedSeq[ImageSchema.OcvType].
ImageSchema.javaOcvTypes: java.util.Map[String,Int] -> java.util.List[ImageSchema.OcvType].

@SparkQA
Copy link

SparkQA commented Jan 12, 2018

Test build #86059 has finished for PR 20168 at commit 2401add.

  • This patch fails Python style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Jan 13, 2018

Test build #86061 has finished for PR 20168 at commit d2a864e.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

undefinedImageType -> -1,
"CV_8U" -> 0, "CV_8UC1" -> 0, "CV_8UC3" -> 16, "CV_8UC4" -> 24
)
val ocvTypes = {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we set the explicit type?

*/
val javaOcvTypes: java.util.Map[String, Int] = ocvTypes.asJava
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's set the explicit type here ..


/**
* (Java-specific) OpenCV type mapping supported
* (Java Specific) list of OpenCv types
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's keep as is (Java-specific).

@@ -71,9 +88,33 @@ def ocvTypes(self):
"""
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems we should fix the doc for :return:. Seems it's going to be a list now.

for x in ocvTypeList]
return self._ocvTypes[:]

def ocvTypeByName(self, name):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's write a doc and doctest too.

@SparkQA
Copy link

SparkQA commented Jan 16, 2018

Test build #86148 has finished for PR 20168 at commit 68a5a94.

  • This patch fails Python style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Jan 16, 2018

Test build #86149 has finished for PR 20168 at commit 9ec8cd3.

  • This patch fails Python style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@imatiach-msft
Copy link
Contributor

@MrBago @tomasatdatabricks I think the breaking changes are fine, the code was marked experimental and it is expected that the interfaces will change a lot initially based on early feedback. The PR looks good to me.

* OpenCv type representation
*
* @param mode ordinal for the type
* @param dataType open cv data type
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

small nitpick: I think we should always spell it as "OpenCV" to be consistent in the comments (unless you have any good objections)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

)
def ocvTypeByName(name: String): OpenCvType = {
ocvTypes.find(x => x.name == name).getOrElse(
throw new IllegalArgumentException("Unknown open cv type " + name))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same minor nitpick: "OpenCV" instead of "open cv", and in code below as well

@imatiach-msft
Copy link
Contributor

@MrBago @tomasatdatabricks the changes look good to me, I went through everything one more time, I'll sign off as soon as the python tests are fixed (it looks like there were some style issues in last commit) and all other dev comments are resolved, thanks!

* OpenCv type representation
*
* @param mode ordinal for the type
* @param dataType open cv data type
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

*/
val ocvTypes: IndexedSeq[OpenCvType] = {
val types =
for (nc <- Array(1, 2, 3, 4);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

numChannel

/**
* A Mapping of Type to Numbers in OpenCV
*
* C1 C2 C3 C4
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add a brief header for row/column.

@@ -37,20 +37,67 @@ import org.apache.spark.sql.types._
@Since("2.3.0")
object ImageSchema {

val undefinedImageType = "Undefined"
/**
* OpenCv type representation
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add a reference link for OpenCV data type? Like this one: https://docs.opencv.org/2.4/modules/core/doc/basic_structures.html

@@ -83,7 +83,8 @@ class ImageSchemaSuite extends SparkFunSuite with MLlibTestSparkContext {
val bytes20 = getData(row).slice(0, 20)

val (expectedMode, expectedBytes) = firstBytes20(filename)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since you use ocvTypeByName below to look up for it, it should be named as expectedType or expectedTypeName, other than expectedMode?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, good catch. The name is definitely misleading as it is now.


def ocvTypeByMode(self, mode):
"""
Return the supported OpenCvType with matching mode or raise error if there is no matching type.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OpenCvType -> OcvType?

Return the supported OpenCvType with matching mode or raise error if there is no matching type.

:param: int mode: OpenCv type mode; must be equal to mode of one of the supported types.
:return: OpenCvType with matching mode.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OpenCvType -> OcvType?

ocvType = self.ocvTypeByMode(image.mode)
if nChannels != ocvType.nChannels:
raise ValueError(
"Image has %d channels but OcvType '%s' expects %d channels." %
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Image has %d channels but its OcvType ...

return self._ocvTypes[:]


def ocvTypeByName(self, name):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

getOcvTypeByName or findOcvTypeByName?


def test_conversions(self):
s = np.random.RandomState(seed=987)
ary_src = s.rand(4, 10, 10)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ary_src -> array_src?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s.rand(4, 10, 10) -> s.rand(10, 10, 4)?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that was the intention, good catch.

@viirya
Copy link
Member

viirya commented Jan 16, 2018

Btw, I think this isn't only to add non-integer image formats. So the PR title may be changed too. Like "Add ImageSchema support for all OpenCV image types"?

@SparkQA
Copy link

SparkQA commented Jan 16, 2018

Test build #86156 has finished for PR 20168 at commit 896ccc2.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@viirya
Copy link
Member

viirya commented Jan 16, 2018

Overall looks good to me. Just some minor comments regarding with code comments and naming.

@tomasatdatabricks tomasatdatabricks changed the title [SPARK-22730][ML] Add ImageSchema support for non-integer image formats [SPARK-22730][ML] Add ImageSchema support for all OpenCv types. Jan 16, 2018
Added test for conversion between array and image struct for all ocv types.
…rn correct name for Undefined type. Removed OpenCvType object and renamed the methods to match python side. + few cosmetic changes.
@SparkQA
Copy link

SparkQA commented Jan 16, 2018

Test build #86207 has finished for PR 20168 at commit 5a632f5.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@HyukjinKwon
Copy link
Member

ok to test

@SparkQA
Copy link

SparkQA commented Jun 9, 2018

Test build #91608 has finished for PR 20168 at commit 5a632f5.

  • This patch passes all tests.
  • This patch does not merge cleanly.
  • This patch adds no public classes.

@imatiach-msft
Copy link
Contributor

@tomasatdatabricks it looks like there are some conflicts that need to be resolved, otherwise looks good to me, can you please update the PR?

@imatiach-msft
Copy link
Contributor

@tomasatdatabricks @MrBago @WeichenXu123 sorry, any updates on this PR? It has been a while.

@HyukjinKwon
Copy link
Member

@tomasatdatabricks, mind updating this? Lately I happened to take a look for this few times. I will try to review.

@AmplabJenkins
Copy link

Can one of the admins verify this patch?

@HyukjinKwon
Copy link
Member

Closing this due to author's inactivity. Feel free to open another PR or take this over if anyone is interested in this.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
9 participants