New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SPARK-22730][ML] Add ImageSchema support for all OpenCv types. #20168
[SPARK-22730][ML] Add ImageSchema support for all OpenCv types. #20168
Conversation
ok to test |
cc @jkbradley, @imatiach-msft, @MrBago and @thunterdb. |
OpenCvType.undefinedType +: (ordinals zip types).map(x => OpenCvType(x._1, x._2._1, x._2._2)) | ||
} | ||
|
||
val javaOcvTypes = ocvTypes.asJava |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm .. why did we remove the doc here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
explicit type.
dataType=x.dataType(), | ||
nptype=self._ocvToNumpyMap[x.dataType()]) | ||
for x in ocvTypeList] | ||
return self._ocvTypes[:] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it for copy? I usually do list(self._ocvTypes)
tho.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes it is for copy.
python/pyspark/ml/image.py
Outdated
self._ocvTypesByName.keys()))) | ||
return self._ocvTypesByName[name] | ||
|
||
def ocvTypeByMode(self, mode): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it meant to be public? Seems doc is missing and this one doesn't look consistent with Scala side?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why it is not consistent with Scala side?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because I am not seeing the method called ocvTypeByMode
in ImageSchema.scala
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Spark's approach has been to keep python and scala APIs as close as possible.
We should either try and copy the scala api here (make an OpenCvType
object with a get
method and __call__
method), or drop the OpenCvType
object from scala side and use the same method names as we're adding to the python side.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is not consistent because python can not overload methods based on type but I can rename the Scala side to match python. It does not make a big difference in this case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we could make either API work for both languages but it's a bit unnatural. There's a tradeoff between doing the most natural and appropriate thing in each language and having matching APIs, Spark has chosen to prefer making the APIs match so let's do our best to do that.
Test build #85742 has finished for PR 20168 at commit
|
Let's fix the PR title to |
python/pyspark/ml/image.py
Outdated
@@ -201,8 +243,9 @@ def readImages(self, path, recursive=False, numPartitions=-1, | |||
.. versionadded:: 2.3.0 | |||
""" | |||
|
|||
spark = SparkSession.builder.getOrCreate() | |||
image_schema = spark._jvm.org.apache.spark.ml.image.ImageSchema | |||
ctx = SparkContext.getOrCreate() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
minor comment: the change before was only two lines whereas this is 3 lines - I think the previous change was better, what is the advantage of the new code?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch. Looks like this was caused by auto-rebasing onto the latest changes. My original change was only to replace _active_context with getOrCreate() but it was clearly made obsolete in the meantime. I'll just remove it.
python/pyspark/ml/tests.py
Outdated
self.assertEqual(x, ImageSchema.ocvTypeByMode(x.mode)) | ||
|
||
def test_conversions(self): | ||
ary_src = [[[1e7*random.random() for z in range(4)] for y in range(10)] for x in range(10)] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you pass a seed to the random generator (I believe the policy for spark is to always generate the same random numbers in tests in order to reduce flaky tests)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about something like:
s = np.random.RandomState(seed=987)
ary_src = s.rand(4, 10, 10)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's a good point, will do.
@@ -1843,6 +1844,28 @@ def tearDown(self): | |||
|
|||
class ImageReaderTest(SparkSessionTestCase): | |||
|
|||
def test_ocv_types(self): | |||
ocvList = ImageSchema.ocvTypes | |||
self.assertEqual("Undefined", ocvList[0].name) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
confused, didn't you make this:
def name: String = "CV_" + dataType + "C" + nChannels
or "CV_-1C-1" instead of "Undefined" - how does the test pass?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this test is failing:
======================================================================
FAIL: test_ocv_types (pyspark.ml.tests.ImageReaderTest)
----------------------------------------------------------------------
Traceback (most recent call last):
File "/home/jenkins/workspace/SparkPullRequestBuilder@2/python/pyspark/ml/tests.py", line 1849, in test_ocv_types
self.assertEqual("Undefined", ocvList[0].name)
AssertionError: 'Undefined' != u'CV_N/AC-1'
----------------------------------------------------------------------
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, good catch. There should have been if else clause setting name to Undefined for mode == -1.
ocvTypes.find(x => x.mode == mode).getOrElse( | ||
throw new IllegalArgumentException("Unknown open cv mode " + mode)) | ||
} | ||
val undefinedType = OpenCvType(-1, "N/A", -1) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if I understand correctly the name for undefinedType will then be:
def name: String = "CV_" + dataType + "C" + nChannels
which is "CV_-1C-1"? But below in the tests you are checking for the name to be "Undefined"? If this is correct, can it be fixed by overriding the name to check if datatype is N/A or something similar and in that case using a name of "Undefined"?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you add the type here, I think all public members need to have explicit type, even trivial members.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, the name method should have special case for mode == -1.
@@ -143,12 +174,12 @@ object ImageSchema { | |||
|
|||
val height = img.getHeight | |||
val width = img.getWidth | |||
val (nChannels, mode) = if (isGray) { | |||
(1, ocvTypes("CV_8UC1")) | |||
val (nChannels, mode: Int) = if (isGray) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
don't you need to add the other types here (eg floating point, etc)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the code bellow we always call toBytes
per channel so everything will be cast to a 1 byte per channel image type. I think this is fine for standard image types (ie, jpg, png, and the like).
Technically you can register image readers in java and use a BufferedImage of TYPE_CUSTOM
for more complex images, but in practice I think we'll want to take a very different code path for these complex images so we'll want to introduce a new method if we add support to handle them.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The decode function still only handles unsigned bytes. The intention of this PR was only to add support for conversions between ImageSchema struct and numpy arrays and vice versa. Such use case occurs e.g. if you want to store images that have been preprocessed to be used by TF model (e.g. normalized).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ah, sorry, I see, I was confused by the description, if we aren't supporting float images for reading that makes sense now
python/pyspark/ml/tests.py
Outdated
self.assertEqual(ocvType, ImageSchema.ocvTypeByMode(img.mode)) | ||
npary1 = ImageSchema.toNDArray(img) | ||
np.testing.assert_array_equal(npary0, npary1) | ||
|
||
def test_read_images(self): | ||
data_path = 'data/mllib/images/kittens' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not necessary for this PR, but it would be nice to add images with the new types to verify they can be read in with the new changes
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As per my comment above, this PR does not address reading images, only image schema <=> numpy arrays conversions.
@tomasatdatabricks nice PR! I've added a few comments. |
python/pyspark/ml/image.py
Outdated
def ocvTypeByMode(self, mode): | ||
if self._ocvTypesByMode is None: | ||
self._ocvTypesByMode = {x.mode: x for x in self.ocvTypes} | ||
return self._ocvTypesByMode[mode] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not add if mode not in self._ocvTypesByMode:
check here ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good catch, it should be there.
python/pyspark/ml/image.py
Outdated
self._ocvTypesByName.keys()))) | ||
return self._ocvTypesByName[name] | ||
|
||
def ocvTypeByMode(self, mode): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why it is not consistent with Scala side?
@@ -55,7 +72,7 @@ def imageSchema(self): | |||
""" | |||
|
|||
if self._imageSchema is None: | |||
ctx = SparkContext._active_spark_context | |||
ctx = SparkContext.getOrCreate() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Have you check every place to use getOrCreate
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, everywhere in image schema code at least.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@tomasatdatabricks I left some comments. Can you take a look at the tests, there seem to be some failures. I can take another pass when you're ready.
python/pyspark/ml/tests.py
Outdated
self.assertEqual(x, ImageSchema.ocvTypeByMode(x.mode)) | ||
|
||
def test_conversions(self): | ||
ary_src = [[[1e7*random.random() for z in range(4)] for y in range(10)] for x in range(10)] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about something like:
s = np.random.RandomState(seed=987)
ary_src = s.rand(4, 10, 10)
@@ -143,12 +174,12 @@ object ImageSchema { | |||
|
|||
val height = img.getHeight | |||
val width = img.getWidth | |||
val (nChannels, mode) = if (isGray) { | |||
(1, ocvTypes("CV_8UC1")) | |||
val (nChannels, mode: Int) = if (isGray) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the code bellow we always call toBytes
per channel so everything will be cast to a 1 byte per channel image type. I think this is fine for standard image types (ie, jpg, png, and the like).
Technically you can register image readers in java and use a BufferedImage of TYPE_CUSTOM
for more complex images, but in practice I think we'll want to take a very different code path for these complex images so we'll want to introduce a new method if we add support to handle them.
OpenCvType.undefinedType +: (ordinals zip types).map(x => OpenCvType(x._1, x._2._1, x._2._2)) | ||
} | ||
|
||
val javaOcvTypes = ocvTypes.asJava |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
explicit type.
ocvTypes.find(x => x.mode == mode).getOrElse( | ||
throw new IllegalArgumentException("Unknown open cv mode " + mode)) | ||
} | ||
val undefinedType = OpenCvType(-1, "N/A", -1) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you add the type here, I think all public members need to have explicit type, even trivial members.
) | ||
case class OpenCvType(mode: Int, dataType: String, nChannels: Int) { | ||
def name: String = "CV_" + dataType + "C" + nChannels | ||
override def toString: String = "OpenCvType(mode = " + mode + ", name = " + name + ")" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Spark uses scala "string interpolation" a lot of places, I think it's quite nice and very readable: s"OpenCvType(mode = $mode, name = $name)"
@@ -1843,6 +1844,28 @@ def tearDown(self): | |||
|
|||
class ImageReaderTest(SparkSessionTestCase): | |||
|
|||
def test_ocv_types(self): | |||
ocvList = ImageSchema.ocvTypes | |||
self.assertEqual("Undefined", ocvList[0].name) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this test is failing:
======================================================================
FAIL: test_ocv_types (pyspark.ml.tests.ImageReaderTest)
----------------------------------------------------------------------
Traceback (most recent call last):
File "/home/jenkins/workspace/SparkPullRequestBuilder@2/python/pyspark/ml/tests.py", line 1849, in test_ocv_types
self.assertEqual("Undefined", ocvList[0].name)
AssertionError: 'Undefined' != u'CV_N/AC-1'
----------------------------------------------------------------------
python/pyspark/ml/tests.py
Outdated
continue | ||
x = [[ary_src[i][j][0:ocvType.nChannels] | ||
for j in range(len(ary_src[0]))] for i in range(len(ary_src))] | ||
npary0 = np.array(x).astype(ocvType.nptype) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If ary_src is an array, this becomes:
nparry0 = ary_src[..., :ocvType.nChannels].astype(ocvType.nptype)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good point, I'll change that.
if dtype not in self._numpyToOcvMap: | ||
raise ValueError( | ||
"Unsupported array data type '%s', currently only supported formats are %s" % | ||
(str(array.dtype), str(self._numpyToOcvMap.keys()))) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"%s" will call __str__
automatically so you don't need to wrap in str
.
python/pyspark/ml/image.py
Outdated
raise ValueError( | ||
"Can not find matching OpenCvFormat for type = '%s'; supported formats are = %s" % | ||
(name, str( | ||
self._ocvTypesByName.keys()))) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
style: can we put this on the above line?
python/pyspark/ml/image.py
Outdated
self._ocvTypesByName.keys()))) | ||
return self._ocvTypesByName[name] | ||
|
||
def ocvTypeByMode(self, mode): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Spark's approach has been to keep python and scala APIs as close as possible.
We should either try and copy the scala api here (make an OpenCvType
object with a get
method and __call__
method), or drop the OpenCvType
object from scala side and use the same method names as we're adding to the python side.
70bae2f
to
eee25ce
Compare
Test build #85878 has finished for PR 20168 at commit
|
d1b0ed2
to
48eddf1
Compare
Test build #85880 has finished for PR 20168 at commit
|
48eddf1
to
763c8a6
Compare
Test build #85884 has finished for PR 20168 at commit
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My only remaining concerns with this PR are the breaking changes it introduces, @jkbradley @imatiach-msft how do you feel about having breaking change in ImageSchema between 2.3 & 2.4?
ImageSchema is experimental and some of these changes could make maintenance easier long term.
@tomasatdatabricks do you mind documenting the breaking changes, we'll need this documentation anyways for the release notes eventually.
Otherwise LGTM!
python/pyspark/ml/tests.py
Outdated
img = ImageSchema.toImage(npary0) | ||
self.assertEqual(ocvType, ImageSchema.ocvTypeByMode(img.mode)) | ||
npary1 = ImageSchema.toNDArray(img) | ||
np.testing.assert_array_equal(npary0, npary1) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we also check nparry1.dtype = ocvType.nptype
, numpy allows arrays of different types to be equal if their contents compare equal, eg [0, 1] == [0.0, 1.0]
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good point, I'll add the test.
@MrBago Here is the description of the breaking changes. ImageSchema.ocvTypes and ImageSchema.javaOcvTypes changed types from Map[String,Int] to list of OpenCvType. ImageSchema.ocvTypes: Map[String, Int] -> IndexedSeq[ImageSchema.OcvType]. |
Test build #86059 has finished for PR 20168 at commit
|
2401add
to
d2a864e
Compare
Test build #86061 has finished for PR 20168 at commit
|
undefinedImageType -> -1, | ||
"CV_8U" -> 0, "CV_8UC1" -> 0, "CV_8UC3" -> 16, "CV_8UC4" -> 24 | ||
) | ||
val ocvTypes = { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we set the explicit type?
*/ | ||
val javaOcvTypes: java.util.Map[String, Int] = ocvTypes.asJava |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's set the explicit type here ..
|
||
/** | ||
* (Java-specific) OpenCV type mapping supported | ||
* (Java Specific) list of OpenCv types |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's keep as is (Java-specific)
.
@@ -71,9 +88,33 @@ def ocvTypes(self): | |||
""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems we should fix the doc for :return:
. Seems it's going to be a list now.
for x in ocvTypeList] | ||
return self._ocvTypes[:] | ||
|
||
def ocvTypeByName(self, name): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's write a doc and doctest too.
d2a864e
to
68a5a94
Compare
Test build #86148 has finished for PR 20168 at commit
|
68a5a94
to
9ec8cd3
Compare
Test build #86149 has finished for PR 20168 at commit
|
@MrBago @tomasatdatabricks I think the breaking changes are fine, the code was marked experimental and it is expected that the interfaces will change a lot initially based on early feedback. The PR looks good to me. |
* OpenCv type representation | ||
* | ||
* @param mode ordinal for the type | ||
* @param dataType open cv data type |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
small nitpick: I think we should always spell it as "OpenCV" to be consistent in the comments (unless you have any good objections)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1
) | ||
def ocvTypeByName(name: String): OpenCvType = { | ||
ocvTypes.find(x => x.name == name).getOrElse( | ||
throw new IllegalArgumentException("Unknown open cv type " + name)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same minor nitpick: "OpenCV" instead of "open cv", and in code below as well
@MrBago @tomasatdatabricks the changes look good to me, I went through everything one more time, I'll sign off as soon as the python tests are fixed (it looks like there were some style issues in last commit) and all other dev comments are resolved, thanks! |
9ec8cd3
to
896ccc2
Compare
* OpenCv type representation | ||
* | ||
* @param mode ordinal for the type | ||
* @param dataType open cv data type |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1
*/ | ||
val ocvTypes: IndexedSeq[OpenCvType] = { | ||
val types = | ||
for (nc <- Array(1, 2, 3, 4); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
numChannel
/** | ||
* A Mapping of Type to Numbers in OpenCV | ||
* | ||
* C1 C2 C3 C4 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add a brief header for row/column.
@@ -37,20 +37,67 @@ import org.apache.spark.sql.types._ | |||
@Since("2.3.0") | |||
object ImageSchema { | |||
|
|||
val undefinedImageType = "Undefined" | |||
/** | |||
* OpenCv type representation |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add a reference link for OpenCV data type? Like this one: https://docs.opencv.org/2.4/modules/core/doc/basic_structures.html
@@ -83,7 +83,8 @@ class ImageSchemaSuite extends SparkFunSuite with MLlibTestSparkContext { | |||
val bytes20 = getData(row).slice(0, 20) | |||
|
|||
val (expectedMode, expectedBytes) = firstBytes20(filename) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since you use ocvTypeByName
below to look up for it, it should be named as expectedType
or expectedTypeName
, other than expectedMode
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, good catch. The name is definitely misleading as it is now.
python/pyspark/ml/image.py
Outdated
|
||
def ocvTypeByMode(self, mode): | ||
""" | ||
Return the supported OpenCvType with matching mode or raise error if there is no matching type. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OpenCvType
-> OcvType
?
python/pyspark/ml/image.py
Outdated
Return the supported OpenCvType with matching mode or raise error if there is no matching type. | ||
|
||
:param: int mode: OpenCv type mode; must be equal to mode of one of the supported types. | ||
:return: OpenCvType with matching mode. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OpenCvType
-> OcvType
?
python/pyspark/ml/image.py
Outdated
ocvType = self.ocvTypeByMode(image.mode) | ||
if nChannels != ocvType.nChannels: | ||
raise ValueError( | ||
"Image has %d channels but OcvType '%s' expects %d channels." % |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Image has %d channels but its OcvType ...
python/pyspark/ml/image.py
Outdated
return self._ocvTypes[:] | ||
|
||
|
||
def ocvTypeByName(self, name): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
getOcvTypeByName
or findOcvTypeByName
?
python/pyspark/ml/tests.py
Outdated
|
||
def test_conversions(self): | ||
s = np.random.RandomState(seed=987) | ||
ary_src = s.rand(4, 10, 10) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ary_src -> array_src?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
s.rand(4, 10, 10) -> s.rand(10, 10, 4)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, that was the intention, good catch.
Btw, I think this isn't only to add non-integer image formats. So the PR title may be changed too. Like "Add ImageSchema support for all OpenCV image types"? |
Test build #86156 has finished for PR 20168 at commit
|
Overall looks good to me. Just some minor comments regarding with code comments and naming. |
Added test for conversion between array and image struct for all ocv types.
…rn correct name for Undefined type. Removed OpenCvType object and renamed the methods to match python side. + few cosmetic changes.
…pesJava, fixed/added python comments.
896ccc2
to
5a632f5
Compare
Test build #86207 has finished for PR 20168 at commit
|
ok to test |
Test build #91608 has finished for PR 20168 at commit
|
@tomasatdatabricks it looks like there are some conflicts that need to be resolved, otherwise looks good to me, can you please update the PR? |
@tomasatdatabricks @MrBago @WeichenXu123 sorry, any updates on this PR? It has been a while. |
@tomasatdatabricks, mind updating this? Lately I happened to take a look for this few times. I will try to review. |
Can one of the admins verify this patch? |
Closing this due to author's inactivity. Feel free to open another PR or take this over if anyone is interested in this. |
What changes were proposed in this pull request?
Added functionality to handle all OpenCV modes to ImageSchema:
How was this patch tested?
Added test for conversion between numpy arrays and images stored as all possible OpenCV modes.