[SPARK-22730][ML] Add ImageSchema support for all OpenCv types. #20168

tomasatdatabricks · 2018-01-05T18:32:07Z

What changes were proposed in this pull request?

Added functionality to handle all OpenCV modes to ImageSchema:

Updated the toImage and toNDArray functions to handle non-uint8-based images.
Added information about the individual OpenCV modes.

How was this patch tested?

Added test for conversion between numpy arrays and images stored as all possible OpenCV modes.

HyukjinKwon · 2018-01-06T03:43:00Z

ok to test

HyukjinKwon · 2018-01-06T03:44:14Z

cc @jkbradley, @imatiach-msft, @MrBago and @thunterdb.

HyukjinKwon · 2018-01-06T03:49:28Z

mllib/src/main/scala/org/apache/spark/ml/image/ImageSchema.scala

+    OpenCvType.undefinedType +: (ordinals zip types).map(x => OpenCvType(x._1, x._2._1, x._2._2))
+  }
+
+  val javaOcvTypes = ocvTypes.asJava


Hm .. why did we remove the doc here?

explicit type.

HyukjinKwon · 2018-01-06T03:51:53Z

python/pyspark/ml/image.py

+                                            dataType=x.dataType(),
+                                            nptype=self._ocvToNumpyMap[x.dataType()])
+                              for x in ocvTypeList]
+        return self._ocvTypes[:]


Is it for copy? I usually do list(self._ocvTypes) tho.

yes it is for copy.
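
For readers comparing the two copy idioms mentioned above, here is a minimal sketch. The list contents are a hypothetical stand-in for the cached self._ocvTypes; both forms produce a shallow copy, so callers cannot mutate the cached list.

```python
# Hypothetical stand-in for the cached self._ocvTypes list.
cached_types = ["CV_8UC1", "CV_8UC3", "CV_8UC4"]

by_slice = cached_types[:]      # slice copy, as written in the patch
by_list = list(cached_types)    # constructor copy, as suggested in review

# Both are new list objects with equal contents.
assert by_slice == by_list == cached_types
assert by_slice is not cached_types and by_list is not cached_types

# Mutating a copy leaves the cached list untouched.
by_slice.append("CV_32FC1")
assert cached_types == ["CV_8UC1", "CV_8UC3", "CV_8UC4"]
```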

HyukjinKwon · 2018-01-06T03:54:42Z

python/pyspark/ml/image.py

+                    self._ocvTypesByName.keys())))
+        return self._ocvTypesByName[name]
+
+    def ocvTypeByMode(self, mode):


Is it meant to be public? Seems doc is missing and this one doesn't look consistent with Scala side?

Why is it not consistent with the Scala side?

Because I am not seeing the method called ocvTypeByMode in ImageSchema.scala.

Spark's approach has been to keep python and scala APIs as close as possible.

We should either try and copy the scala api here (make an OpenCvType object with a get method and __call__ method), or drop the OpenCvType object from scala side and use the same method names as we're adding to the python side.

It is not consistent because Python cannot overload methods based on argument type, but I can rename the Scala side to match Python. It does not make a big difference in this case.

I think we could make either API work for both languages, but it's a bit unnatural. There's a tradeoff between doing the most natural and appropriate thing in each language and having matching APIs. Spark has chosen to prefer making the APIs match, so let's do our best to do that.
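
To illustrate the overloading point, the sketch below shows one way Python could emulate a lookup that accepts either a mode or a name, using functools.singledispatch from the standard library. This is not what the PR does (the thread settles on separately named methods); the OcvType tuple, the _TYPES list, and the ocv_type function are hypothetical names used only for illustration.

```python
from collections import namedtuple
from functools import singledispatch

# Hypothetical, reduced type record; the PR's type carries more fields.
OcvType = namedtuple("OcvType", ["mode", "name"])
_TYPES = [OcvType(0, "CV_8UC1"), OcvType(16, "CV_8UC3"), OcvType(24, "CV_8UC4")]


@singledispatch
def ocv_type(key):
    # Fallback for argument types that have no registered lookup.
    raise TypeError("expected an int mode or a str name, got %r" % (key,))


@ocv_type.register(int)
def _(mode):
    for t in _TYPES:
        if t.mode == mode:
            return t
    raise ValueError("Unknown OpenCV mode %d" % mode)


@ocv_type.register(str)
def _(name):
    for t in _TYPES:
        if t.name == name:
            return t
    raise ValueError("Unknown OpenCV type %s" % name)
```

Calling ocv_type(16) and ocv_type("CV_8UC3") return the same record. Two explicitly named methods, as the thread eventually prefers, avoid this indirection and keep the Python and Scala APIs aligned.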

SparkQA · 2018-01-06T04:25:37Z

Test build #85742 has finished for PR 20168 at commit 70bae2f.

This patch fails PySpark unit tests.
This patch merges cleanly.
This patch adds no public classes.

HyukjinKwon · 2018-01-06T04:29:24Z

Let's fix the PR title to [SPARK-22730][ML] ... BTW.

imatiach-msft · 2018-01-08T21:06:58Z

python/pyspark/ml/image.py

@@ -201,8 +243,9 @@ def readImages(self, path, recursive=False, numPartitions=-1,
        .. versionadded:: 2.3.0
        """

-        spark = SparkSession.builder.getOrCreate()
-        image_schema = spark._jvm.org.apache.spark.ml.image.ImageSchema
+        ctx = SparkContext.getOrCreate()


minor comment: the change before was only two lines whereas this is 3 lines - I think the previous change was better, what is the advantage of the new code?

Good catch. Looks like this was caused by auto-rebasing onto the latest changes. My original change was only to replace _active_context with getOrCreate() but it was clearly made obsolete in the meantime. I'll just remove it.

imatiach-msft · 2018-01-08T21:08:33Z

python/pyspark/ml/tests.py

+            self.assertEqual(x, ImageSchema.ocvTypeByMode(x.mode))
+
+    def test_conversions(self):
+        ary_src = [[[1e7*random.random() for z in range(4)] for y in range(10)] for x in range(10)]


Can you pass a seed to the random generator? (I believe the policy for Spark is to always generate the same random numbers in tests, in order to reduce flaky tests.)

How about something like:

s = np.random.RandomState(seed=987)
ary_src = s.rand(4, 10, 10)

That's a good point, will do.
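
As a sketch of the reproducibility concern, the snippet below seeds a generator so that the nested-list test data is identical on every run. It uses the standard library's random.Random rather than the numpy RandomState suggested above; make_source is a hypothetical helper name, and the seed value is taken from the review comment.

```python
import random


def make_source(seed=987, height=10, width=10, channels=4):
    """Build height x width x channels test data from a seeded generator."""
    rng = random.Random(seed)  # private generator, independent of global state
    return [[[1e7 * rng.random() for _ in range(channels)]
             for _ in range(width)]
            for _ in range(height)]


first = make_source()
second = make_source()

# The same seed yields the same data, so a failure can be reproduced exactly.
assert first == second
```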

imatiach-msft · 2018-01-08T21:13:55Z

python/pyspark/ml/tests.py

@@ -1843,6 +1844,28 @@ def tearDown(self):

 class ImageReaderTest(SparkSessionTestCase):

+    def test_ocv_types(self):
+        ocvList = ImageSchema.ocvTypes
+        self.assertEqual("Undefined", ocvList[0].name)


I'm confused: didn't you define the name as
def name: String = "CV_" + dataType + "C" + nChannels
which gives "CV_N/AC-1" for the undefined type instead of "Undefined"? How does the test pass?

I think this test is failing:

======================================================================
FAIL: test_ocv_types (pyspark.ml.tests.ImageReaderTest)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/jenkins/workspace/SparkPullRequestBuilder@2/python/pyspark/ml/tests.py", line 1849, in test_ocv_types
    self.assertEqual("Undefined", ocvList[0].name)
AssertionError: 'Undefined' != u'CV_N/AC-1'
----------------------------------------------------------------------

Yes, good catch. There should have been an if/else clause setting the name to "Undefined" for mode == -1.

imatiach-msft · 2018-01-08T21:20:44Z

mllib/src/main/scala/org/apache/spark/ml/image/ImageSchema.scala

+      ocvTypes.find(x => x.mode == mode).getOrElse(
+        throw new IllegalArgumentException("Unknown open cv mode " + mode))
+    }
+    val undefinedType = OpenCvType(-1, "N/A", -1)


If I understand correctly, the name for undefinedType will then be computed by
def name: String = "CV_" + dataType + "C" + nChannels
which gives "CV_N/AC-1"? But in the tests below you check for the name to be "Undefined". If this is correct, can it be fixed by overriding name to check whether dataType is "N/A" (or something similar) and, in that case, returning "Undefined"?

Can you add the type here? I think all public members need an explicit type, even trivial ones.

Yes, the name method should have a special case for mode == -1.
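
The special case agreed on above could look roughly like the following Python sketch. The field names mirror the Scala case class quoted in the diff; the class itself is a hypothetical stand-in, not the code that was committed.

```python
from collections import namedtuple


class OcvType(namedtuple("OcvType", ["mode", "dataType", "nChannels"])):
    """Hypothetical Python mirror of the Scala OpenCvType case class."""
    __slots__ = ()

    @property
    def name(self):
        # Special case: the undefined type has no meaningful CV_* name.
        if self.mode == -1:
            return "Undefined"
        return "CV_%sC%d" % (self.dataType, self.nChannels)


undefined = OcvType(-1, "N/A", -1)
cv_8uc3 = OcvType(16, "8U", 3)

assert undefined.name == "Undefined"
assert cv_8uc3.name == "CV_8UC3"
```

Without the mode check, the undefined type would be named "CV_N/AC-1", which is the value reported in the failing test.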

imatiach-msft · 2018-01-08T21:22:19Z

mllib/src/main/scala/org/apache/spark/ml/image/ImageSchema.scala

@@ -143,12 +174,12 @@ object ImageSchema {

      val height = img.getHeight
      val width = img.getWidth
-      val (nChannels, mode) = if (isGray) {
-        (1, ocvTypes("CV_8UC1"))
+      val (nChannels, mode: Int) = if (isGray) {


don't you need to add the other types here (eg floating point, etc)?

In the code below we always call toBytes per channel, so everything will be cast to a 1-byte-per-channel image type. I think this is fine for standard image types (i.e., jpg, png, and the like).

Technically you can register image readers in java and use a BufferedImage of TYPE_CUSTOM for more complex images, but in practice I think we'll want to take a very different code path for these complex images so we'll want to introduce a new method if we add support to handle them.

The decode function still only handles unsigned bytes. The intention of this PR was only to add support for conversions between the ImageSchema struct and numpy arrays, in both directions. That use case occurs, for example, if you want to store images that have been preprocessed for use by a TF model (e.g. normalized).

Ah, sorry, I see. I was confused by the description; if we aren't supporting float images for reading, that makes sense now.
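
To make the scope concrete, here is a rough sketch of a round trip between a non-uint8 numpy array and an image-like record. It assumes numpy is installed. The dict layout and the to_image_struct and to_ndarray helpers are hypothetical simplifications; the real ImageSchema.toImage and ImageSchema.toNDArray operate on Spark Row objects.

```python
import numpy as np


def to_image_struct(array, mode):
    """Pack a height x width x channels array into a plain dict (hypothetical layout)."""
    height, width, n_channels = array.shape
    return {"height": height, "width": width, "nChannels": n_channels,
            "mode": mode, "data": array.tobytes()}


def to_ndarray(image, nptype):
    """Rebuild the array from the raw bytes using the dtype implied by the mode."""
    flat = np.frombuffer(image["data"], dtype=nptype)
    return flat.reshape(image["height"], image["width"], image["nChannels"])


# A small float32 "image", e.g. the output of a normalization step.
pixels = np.linspace(0.0, 1.0, 2 * 3 * 3, dtype=np.float32).reshape(2, 3, 3)
image = to_image_struct(pixels, mode=21)  # 21 is the OpenCV value for CV_32FC3
restored = to_ndarray(image, np.float32)

assert restored.dtype == np.float32
assert np.array_equal(pixels, restored)
```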

imatiach-msft · 2018-01-08T21:29:10Z

python/pyspark/ml/tests.py

+            self.assertEqual(ocvType, ImageSchema.ocvTypeByMode(img.mode))
+            npary1 = ImageSchema.toNDArray(img)
+            np.testing.assert_array_equal(npary0, npary1)
+
    def test_read_images(self):
        data_path = 'data/mllib/images/kittens'


not necessary for this PR, but it would be nice to add images with the new types to verify they can be read in with the new changes

As per my comment above, this PR does not address reading images, only image schema <=> numpy arrays conversions.

imatiach-msft · 2018-01-08T21:31:40Z

@tomasatdatabricks nice PR! I've added a few comments.

WeichenXu123 · 2018-01-08T21:46:55Z

python/pyspark/ml/image.py

+    def ocvTypeByMode(self, mode):
+        if self._ocvTypesByMode is None:
+            self._ocvTypesByMode = {x.mode: x for x in self.ocvTypes}
+        return self._ocvTypesByMode[mode]


Why not add an "if mode not in self._ocvTypesByMode:" check here?

good catch, it should be there.
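
A minimal sketch of the lookup with the membership check added is shown below. The OcvTypeRegistry class and its constructor are hypothetical scaffolding around the method from the diff, and the error message wording is an assumption modelled on the ocvTypeByName message.

```python
from collections import namedtuple

OcvType = namedtuple("OcvType", ["mode", "name"])  # hypothetical reduced record


class OcvTypeRegistry(object):
    """Hypothetical holder for the lazily built mode lookup."""

    def __init__(self, types):
        self._types = list(types)
        self._ocvTypesByMode = None

    def ocvTypeByMode(self, mode):
        # Build the lookup table once, on first use.
        if self._ocvTypesByMode is None:
            self._ocvTypesByMode = {x.mode: x for x in self._types}
        # Fail with a descriptive error instead of a bare KeyError.
        if mode not in self._ocvTypesByMode:
            raise ValueError(
                "Can not find matching OpenCvFormat for mode = %r; supported modes are = %s" %
                (mode, sorted(self._ocvTypesByMode)))
        return self._ocvTypesByMode[mode]


registry = OcvTypeRegistry([OcvType(0, "CV_8UC1"), OcvType(16, "CV_8UC3")])
assert registry.ocvTypeByMode(16).name == "CV_8UC3"
```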

WeichenXu123 · 2018-01-08T21:48:21Z

python/pyspark/ml/image.py

+                    self._ocvTypesByName.keys())))
+        return self._ocvTypesByName[name]
+
+    def ocvTypeByMode(self, mode):


Why is it not consistent with the Scala side?

WeichenXu123 · 2018-01-08T21:49:59Z

python/pyspark/ml/image.py

@@ -55,7 +72,7 @@ def imageSchema(self):
        """

        if self._imageSchema is None:
-            ctx = SparkContext._active_spark_context
+            ctx = SparkContext.getOrCreate()


Have you checked every place that should use getOrCreate?

Yes, everywhere in image schema code at least.

MrBago

@tomasatdatabricks I left some comments. Can you take a look at the tests, there seem to be some failures. I can take another pass when you're ready.

MrBago · 2018-01-08T21:49:04Z

python/pyspark/ml/tests.py

+            self.assertEqual(x, ImageSchema.ocvTypeByMode(x.mode))
+
+    def test_conversions(self):
+        ary_src = [[[1e7*random.random() for z in range(4)] for y in range(10)] for x in range(10)]


How about something like:

s = np.random.RandomState(seed=987)
ary_src = s.rand(4, 10, 10)

MrBago · 2018-01-08T22:01:04Z

mllib/src/main/scala/org/apache/spark/ml/image/ImageSchema.scala

@@ -143,12 +174,12 @@ object ImageSchema {

      val height = img.getHeight
      val width = img.getWidth
-      val (nChannels, mode) = if (isGray) {
-        (1, ocvTypes("CV_8UC1"))
+      val (nChannels, mode: Int) = if (isGray) {


In the code below we always call toBytes per channel, so everything will be cast to a 1-byte-per-channel image type. I think this is fine for standard image types (i.e., jpg, png, and the like).

Technically you can register image readers in java and use a BufferedImage of TYPE_CUSTOM for more complex images, but in practice I think we'll want to take a very different code path for these complex images so we'll want to introduce a new method if we add support to handle them.

MrBago · 2018-01-08T22:02:06Z

mllib/src/main/scala/org/apache/spark/ml/image/ImageSchema.scala

+    OpenCvType.undefinedType +: (ordinals zip types).map(x => OpenCvType(x._1, x._2._1, x._2._2))
+  }
+
+  val javaOcvTypes = ocvTypes.asJava


explicit type.

MrBago · 2018-01-08T22:02:19Z

mllib/src/main/scala/org/apache/spark/ml/image/ImageSchema.scala

+      ocvTypes.find(x => x.mode == mode).getOrElse(
+        throw new IllegalArgumentException("Unknown open cv mode " + mode))
+    }
+    val undefinedType = OpenCvType(-1, "N/A", -1)


Can you add the type here? I think all public members need an explicit type, even trivial ones.

MrBago · 2018-01-09T00:20:17Z

mllib/src/main/scala/org/apache/spark/ml/image/ImageSchema.scala

-  )
+  case class OpenCvType(mode: Int, dataType: String, nChannels: Int) {
+    def name: String = "CV_" + dataType + "C" + nChannels
+    override def toString: String = "OpenCvType(mode = " + mode + ", name = " + name + ")"


Spark uses Scala "string interpolation" in a lot of places. I think it's quite nice and very readable: s"OpenCvType(mode = $mode, name = $name)"

MrBago · 2018-01-09T00:32:05Z

python/pyspark/ml/tests.py

@@ -1843,6 +1844,28 @@ def tearDown(self):

 class ImageReaderTest(SparkSessionTestCase):

+    def test_ocv_types(self):
+        ocvList = ImageSchema.ocvTypes
+        self.assertEqual("Undefined", ocvList[0].name)


I think this test is failing:

======================================================================
FAIL: test_ocv_types (pyspark.ml.tests.ImageReaderTest)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/jenkins/workspace/SparkPullRequestBuilder@2/python/pyspark/ml/tests.py", line 1849, in test_ocv_types
    self.assertEqual("Undefined", ocvList[0].name)
AssertionError: 'Undefined' != u'CV_N/AC-1'
----------------------------------------------------------------------

MrBago · 2018-01-09T00:35:14Z

python/pyspark/ml/tests.py

+                continue
+            x = [[ary_src[i][j][0:ocvType.nChannels]
+                  for j in range(len(ary_src[0]))] for i in range(len(ary_src))]
+            npary0 = np.array(x).astype(ocvType.nptype)


If ary_src is a numpy array, this becomes:

npary0 = ary_src[..., :ocvType.nChannels].astype(ocvType.nptype)

good point, I'll change that.
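
The equivalence claimed above can be checked with a short sketch, assuming numpy is installed. Here n_channels and np.float32 are hypothetical stand-ins for ocvType.nChannels and ocvType.nptype, and the source array uses the height x width x channels layout that the thread later settles on.

```python
import numpy as np

rng = np.random.RandomState(seed=987)
ary_src = rng.rand(10, 10, 4)  # height x width x channels

n_channels = 3  # hypothetical stand-in for ocvType.nChannels

# Nested-list version, as in the original test.
x = [[ary_src[i][j][0:n_channels]
      for j in range(len(ary_src[0]))] for i in range(len(ary_src))]
from_lists = np.array(x).astype(np.float32)

# Slicing version: keep every row and column, take the first n_channels channels.
from_slice = ary_src[..., :n_channels].astype(np.float32)

assert from_slice.shape == (10, 10, 3)
assert np.array_equal(from_lists, from_slice)
```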

MrBago · 2018-01-09T00:37:29Z

python/pyspark/ml/image.py

+        if dtype not in self._numpyToOcvMap:
+            raise ValueError(
+                "Unsupported array data type '%s', currently only supported formats are %s" %
+                (str(array.dtype), str(self._numpyToOcvMap.keys())))


"%s" will call __str__ automatically so you don't need to wrap in str.

MrBago · 2018-01-09T00:40:23Z

python/pyspark/ml/image.py

+            raise ValueError(
+                "Can not find matching OpenCvFormat for type = '%s'; supported formats are = %s" %
+                (name, str(
+                    self._ocvTypesByName.keys())))


style: can we put this on the above line?

MrBago · 2018-01-09T01:28:11Z

python/pyspark/ml/image.py

+                    self._ocvTypesByName.keys())))
+        return self._ocvTypesByName[name]
+
+    def ocvTypeByMode(self, mode):


Spark's approach has been to keep python and scala APIs as close as possible.

We should either try and copy the scala api here (make an OpenCvType object with a get method and __call__ method), or drop the OpenCvType object from scala side and use the same method names as we're adding to the python side.

SparkQA · 2018-01-09T21:05:21Z

Test build #85878 has finished for PR 20168 at commit eee25ce.

This patch fails Python style tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2018-01-09T21:52:26Z

Test build #85880 has finished for PR 20168 at commit 48eddf1.

This patch fails PySpark unit tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2018-01-09T23:23:13Z

Test build #85884 has finished for PR 20168 at commit 763c8a6.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

MrBago

My only remaining concern with this PR is the breaking changes it introduces. @jkbradley @imatiach-msft, how do you feel about having a breaking change in ImageSchema between 2.3 and 2.4?
ImageSchema is experimental, and some of these changes could make maintenance easier in the long term.

@tomasatdatabricks, do you mind documenting the breaking changes? We'll need this documentation for the release notes eventually anyway.

Otherwise LGTM!

MrBago · 2018-01-12T21:24:05Z

python/pyspark/ml/tests.py

+            img = ImageSchema.toImage(npary0)
+            self.assertEqual(ocvType, ImageSchema.ocvTypeByMode(img.mode))
+            npary1 = ImageSchema.toNDArray(img)
+            np.testing.assert_array_equal(npary0, npary1)


Can we also check npary1.dtype == ocvType.nptype? numpy allows arrays of different types to compare equal if their contents are equal, e.g. [0, 1] == [0.0, 1.0].

good point, I'll add the test.
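
The concern is easy to reproduce. In the sketch below (numpy assumed installed), assert_array_equal accepts an int32 array and a float64 array with the same values, so a separate dtype assertion is needed. The assert_same_array helper is a hypothetical name, not part of the PR.

```python
import numpy as np

ints = np.array([0, 1], dtype=np.int32)
floats = np.array([0.0, 1.0], dtype=np.float64)

# Passes: the comparison is by value, so the differing dtypes go unnoticed.
np.testing.assert_array_equal(ints, floats)


def assert_same_array(expected, actual):
    """Hypothetical helper: require equal dtype as well as equal contents."""
    assert expected.dtype == actual.dtype, (expected.dtype, actual.dtype)
    np.testing.assert_array_equal(expected, actual)


# Same values and same dtype: accepted.
assert_same_array(ints, ints.copy())
```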

tomasatdatabricks · 2018-01-12T21:54:54Z

@MrBago Here is the description of the breaking changes. ImageSchema.ocvTypes and ImageSchema.javaOcvTypes changed type from a Map[String, Int] to a list of OpenCvType:

ImageSchema.ocvTypes: Map[String, Int] -> IndexedSeq[ImageSchema.OcvType].
ImageSchema.javaOcvTypes: java.util.Map[String, Int] -> java.util.List[ImageSchema.OcvType].
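
For code written against the old map, the name-to-mode mapping can be rebuilt from the new list. The sketch below uses a hypothetical reduced record and a hand-written list in place of ImageSchema.ocvTypes; the mode values are the ones from the map removed in the diff.

```python
from collections import namedtuple

# Hypothetical reduced record standing in for the new type objects.
OcvType = namedtuple("OcvType", ["name", "mode"])

# Stand-in for the new list-valued ImageSchema.ocvTypes.
ocv_types = [
    OcvType("Undefined", -1),
    OcvType("CV_8UC1", 0),
    OcvType("CV_8UC3", 16),
    OcvType("CV_8UC4", 24),
]

# Rebuild the old Map[String, Int] view from the list.
name_to_mode = {t.name: t.mode for t in ocv_types}

assert name_to_mode["CV_8UC3"] == 16
assert name_to_mode["Undefined"] == -1
```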

SparkQA · 2018-01-12T23:14:41Z

Test build #86059 has finished for PR 20168 at commit 2401add.

This patch fails Python style tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2018-01-13T00:37:47Z

Test build #86061 has finished for PR 20168 at commit d2a864e.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

HyukjinKwon · 2018-01-13T07:20:49Z

mllib/src/main/scala/org/apache/spark/ml/image/ImageSchema.scala

-    undefinedImageType -> -1,
-    "CV_8U" -> 0, "CV_8UC1" -> 0, "CV_8UC3" -> 16, "CV_8UC4" -> 24
-  )
+  val ocvTypes = {


Could we set the explicit type?

HyukjinKwon · 2018-01-13T07:21:03Z

mllib/src/main/scala/org/apache/spark/ml/image/ImageSchema.scala

   */
-  val javaOcvTypes: java.util.Map[String, Int] = ocvTypes.asJava


Let's set the explicit type here ..

HyukjinKwon · 2018-01-13T07:24:24Z

mllib/src/main/scala/org/apache/spark/ml/image/ImageSchema.scala


  /**
-   * (Java-specific) OpenCV type mapping supported
+   *  (Java Specific) list of OpenCv types


Let's keep as is (Java-specific).

HyukjinKwon · 2018-01-13T07:26:21Z

python/pyspark/ml/image.py

@@ -71,9 +88,33 @@ def ocvTypes(self):
        """


Seems we should fix the doc for :return:. Seems it's going to be a list now.

HyukjinKwon · 2018-01-13T07:29:03Z

python/pyspark/ml/image.py

+                              for x in ocvTypeList]
+        return self._ocvTypes[:]
+
+    def ocvTypeByName(self, name):


Let's write a doc and doctest too.

SparkQA · 2018-01-16T02:29:40Z

Test build #86148 has finished for PR 20168 at commit 68a5a94.

This patch fails Python style tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2018-01-16T02:39:46Z

Test build #86149 has finished for PR 20168 at commit 9ec8cd3.

This patch fails Python style tests.
This patch merges cleanly.
This patch adds no public classes.

imatiach-msft · 2018-01-16T03:54:03Z

@MrBago @tomasatdatabricks I think the breaking changes are fine: the code was marked experimental, and it is expected that the interfaces will change a lot initially based on early feedback. The PR looks good to me.

imatiach-msft · 2018-01-16T03:56:36Z

mllib/src/main/scala/org/apache/spark/ml/image/ImageSchema.scala

+   * OpenCv type representation
+   *
+   * @param mode ordinal for the type
+   * @param dataType open cv data type


small nitpick: I think we should always spell it as "OpenCV" to be consistent in the comments (unless you have any good objections)

imatiach-msft · 2018-01-16T03:58:03Z

mllib/src/main/scala/org/apache/spark/ml/image/ImageSchema.scala

-  )
+  def ocvTypeByName(name: String): OpenCvType = {
+    ocvTypes.find(x => x.name == name).getOrElse(
+      throw new IllegalArgumentException("Unknown open cv type " + name))


same minor nitpick: "OpenCV" instead of "open cv", and in code below as well

imatiach-msft · 2018-01-16T04:07:10Z

@MrBago @tomasatdatabricks the changes look good to me; I went through everything one more time. I'll sign off as soon as the Python tests are fixed (it looks like there were some style issues in the last commit) and all other dev comments are resolved. Thanks!

viirya · 2018-01-16T05:08:13Z

mllib/src/main/scala/org/apache/spark/ml/image/ImageSchema.scala

+   * OpenCv type representation
+   *
+   * @param mode ordinal for the type
+   * @param dataType open cv data type


viirya · 2018-01-16T05:13:38Z

mllib/src/main/scala/org/apache/spark/ml/image/ImageSchema.scala

+   */
+  val ocvTypes: IndexedSeq[OpenCvType] = {
+    val types =
+      for (nc <- Array(1, 2, 3, 4);


numChannel

viirya · 2018-01-16T05:23:56Z

mllib/src/main/scala/org/apache/spark/ml/image/ImageSchema.scala

+  /**
+   * A Mapping of Type to Numbers in OpenCV
+   *
+   *        C1 C2  C3  C4


Add a brief header for row/column.
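
As background for the table in this doc comment, OpenCV derives each type number from a depth index and a channel count: the depth occupies the low three bits and (channels - 1) is shifted left by three. The sketch below regenerates the table with row and column headers; the function and variable names are hypothetical, while the depth ordering follows OpenCV's definitions.

```python
# OpenCV depth names in index order (CV_8U = 0 ... CV_64F = 6).
DEPTHS = ["8U", "8S", "16U", "16S", "32S", "32F", "64F"]
CHANNELS = (1, 2, 3, 4)


def ocv_mode(depth_index, n_channels):
    """Type number for a depth index and channel count, as OpenCV computes it."""
    return depth_index + ((n_channels - 1) << 3)


# Name -> mode table, e.g. "CV_8UC3" -> 16.
table = {"CV_%sC%d" % (d, c): ocv_mode(i, c)
         for i, d in enumerate(DEPTHS) for c in CHANNELS}

# Print with a header row (channels) and a header column (depth).
print("depth  " + "".join("C%-4d" % c for c in CHANNELS))
for i, d in enumerate(DEPTHS):
    print("CV_%-4s" % d + "".join("%-5d" % ocv_mode(i, c) for c in CHANNELS))

assert table["CV_8UC1"] == 0 and table["CV_8UC3"] == 16 and table["CV_8UC4"] == 24
```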

viirya · 2018-01-16T05:27:40Z

mllib/src/main/scala/org/apache/spark/ml/image/ImageSchema.scala

@@ -37,20 +37,67 @@ import org.apache.spark.sql.types._
 @Since("2.3.0")
 object ImageSchema {

-  val undefinedImageType = "Undefined"
+  /**
+   * OpenCv type representation


Add a reference link for OpenCV data type? Like this one: https://docs.opencv.org/2.4/modules/core/doc/basic_structures.html

viirya · 2018-01-16T05:34:42Z

mllib/src/test/scala/org/apache/spark/ml/image/ImageSchemaSuite.scala

@@ -83,7 +83,8 @@ class ImageSchemaSuite extends SparkFunSuite with MLlibTestSparkContext {
        val bytes20 = getData(row).slice(0, 20)

        val (expectedMode, expectedBytes) = firstBytes20(filename)


Since you use ocvTypeByName below to look it up, shouldn't it be named expectedType or expectedTypeName rather than expectedMode?

Yes, good catch. The name is definitely misleading as it is now.

viirya · 2018-01-16T05:44:13Z

python/pyspark/ml/image.py

+
+    def ocvTypeByMode(self, mode):
+        """
+        Return the supported OpenCvType with matching mode or raise error if there is no matching type.


OpenCvType -> OcvType?

viirya · 2018-01-16T05:44:18Z

python/pyspark/ml/image.py

+        Return the supported OpenCvType with matching mode or raise error if there is no matching type.
+
+        :param: int mode: OpenCv type mode; must be equal to mode of one of the supported types.
+        :return: OpenCvType with matching mode.


OpenCvType -> OcvType?

viirya · 2018-01-16T05:46:26Z

python/pyspark/ml/image.py

+        ocvType = self.ocvTypeByMode(image.mode)
+        if nChannels != ocvType.nChannels:
+            raise ValueError(
+                "Image has %d channels but OcvType '%s' expects %d channels." %


Image has %d channels but its OcvType ...

viirya · 2018-01-16T05:50:06Z

python/pyspark/ml/image.py

+        return self._ocvTypes[:]
+
+
+    def ocvTypeByName(self, name):


getOcvTypeByName or findOcvTypeByName?

viirya · 2018-01-16T05:55:53Z

python/pyspark/ml/tests.py

+
+    def test_conversions(self):
+        s = np.random.RandomState(seed=987)
+        ary_src = s.rand(4, 10, 10)


ary_src -> array_src?

s.rand(4, 10, 10) -> s.rand(10, 10, 4)?

Yes, that was the intention, good catch.

viirya · 2018-01-16T06:14:08Z

Btw, I think this isn't only to add non-integer image formats. So the PR title may be changed too. Like "Add ImageSchema support for all OpenCV image types"?

SparkQA · 2018-01-16T07:02:33Z

Test build #86156 has finished for PR 20168 at commit 896ccc2.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

viirya · 2018-01-16T07:16:19Z

Overall looks good to me. Just some minor comments regarding code comments and naming.

SparkQA · 2018-01-16T23:45:41Z

Test build #86207 has finished for PR 20168 at commit 5a632f5.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

HyukjinKwon · 2018-06-09T09:00:16Z

ok to test

SparkQA · 2018-06-09T14:02:49Z

Test build #91608 has finished for PR 20168 at commit 5a632f5.

This patch passes all tests.
This patch does not merge cleanly.
This patch adds no public classes.

imatiach-msft · 2018-06-10T18:26:06Z

@tomasatdatabricks it looks like there are some conflicts that need to be resolved, otherwise looks good to me, can you please update the PR?

imatiach-msft · 2018-09-27T21:17:18Z

@tomasatdatabricks @MrBago @WeichenXu123 sorry, any updates on this PR? It has been a while.

HyukjinKwon · 2018-11-11T03:42:28Z

@tomasatdatabricks, mind updating this? I have happened to look at this a few times lately. I will try to review.

AmplabJenkins · 2019-09-16T18:23:33Z

Can one of the admins verify this patch?

HyukjinKwon · 2019-09-17T00:21:54Z

Closing this due to author's inactivity. Feel free to open another PR or take this over if anyone is interested in this.

HyukjinKwon reviewed Jan 6, 2018

imatiach-msft reviewed Jan 8, 2018

WeichenXu123 reviewed Jan 8, 2018

MrBago reviewed Jan 9, 2018

tomasatdatabricks changed the title ~~SPARK-22730 Add ImageSchema support for non-integer image formats~~ [SPARK-22730][ML] Add ImageSchema support for non-integer image formats Jan 9, 2018

tomasatdatabricks force-pushed the tomas/ImageSchemaUpdate branch from 70bae2f to eee25ce Compare January 9, 2018 20:58

tomasatdatabricks force-pushed the tomas/ImageSchemaUpdate branch 2 times, most recently from d1b0ed2 to 48eddf1 Compare January 9, 2018 21:11

tomasatdatabricks force-pushed the tomas/ImageSchemaUpdate branch from 48eddf1 to 763c8a6 Compare January 9, 2018 22:13

MrBago reviewed Jan 12, 2018

tomasatdatabricks force-pushed the tomas/ImageSchemaUpdate branch from 2401add to d2a864e Compare January 12, 2018 23:30

HyukjinKwon reviewed Jan 13, 2018

tomasatdatabricks force-pushed the tomas/ImageSchemaUpdate branch from d2a864e to 68a5a94 Compare January 16, 2018 02:25

tomasatdatabricks force-pushed the tomas/ImageSchemaUpdate branch from 68a5a94 to 9ec8cd3 Compare January 16, 2018 02:35

imatiach-msft reviewed Jan 16, 2018

tomasatdatabricks force-pushed the tomas/ImageSchemaUpdate branch from 9ec8cd3 to 896ccc2 Compare January 16, 2018 05:53

viirya reviewed Jan 16, 2018

tomasatdatabricks changed the title ~~[SPARK-22730][ML] Add ImageSchema support for non-integer image formats~~ [SPARK-22730][ML] Add ImageSchema support for all OpenCv types. Jan 16, 2018

tomasatdatabricks added 5 commits January 16, 2018 14:34

Added functionality for handling non-uint8-based images for ImageSchema. Added test for conversion between array and image struct for all ocv types.
32064ce

Addressed reviewers comments. Fixed name method on OpenCvType to return correct name for Undefined type. Removed OpenCvType object and renamed the methods to match python side. + few cosmetic changes.
53c4d76

Minor test fix - added type check to numpy array comparison
490454a

Adressed review comments: Added explicit types for ocvTypes and ocvTypesJava, fixed/added python comments.
31fef5e

Adressed reviw comments. Mostly update in comments, variable names.
5a632f5

tomasatdatabricks force-pushed the tomas/ImageSchemaUpdate branch from 896ccc2 to 5a632f5 Compare January 16, 2018 22:35

dongjoon-hyun added the ML label Jun 14, 2019

HyukjinKwon closed this Sep 17, 2019
