Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-21866][ML][PYTHON][FOLLOWUP] Few cleanups and fix image test failure in Python 3.6.0 / NumPy 1.13.3 #19835

Closed
wants to merge 2 commits into from

Conversation

@HyukjinKwon
Copy link
Member

commented Nov 28, 2017

What changes were proposed in this pull request?

Image test seems failed in Python 3.6.0 / NumPy 1.13.3. I manually tested as below:

======================================================================
ERROR: test_read_images (pyspark.ml.tests.ImageReaderTest)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/.../spark/python/pyspark/ml/tests.py", line 1831, in test_read_images
    self.assertEqual(ImageSchema.toImage(array, origin=first_row[0]), first_row)
  File "/.../spark/python/pyspark/ml/image.py", line 149, in toImage
    data = bytearray(array.astype(dtype=np.uint8).ravel())
TypeError: only integer scalar arrays can be converted to a scalar index

----------------------------------------------------------------------
Ran 1 test in 7.606s

To be clear, I think the error seems from NumPy - https://github.com/numpy/numpy/blob/75b2d5d427afdb1392f2a0b2092e0767e4bab53d/numpy/core/src/multiarray/number.c#L947

For a smaller scope:

>>> import numpy as np
>>> bytearray(np.array([1]).astype(dtype=np.uint8))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: only integer scalar arrays can be converted to a scalar index

In Python 2.7 / NumPy 1.13.1, it prints:

bytearray(b'\x01')

So, here, I simply worked around it by converting it to bytes as below:

>>> bytearray(np.array([1]).astype(dtype=np.uint8).tobytes())
bytearray(b'\x01')

Also, while looking into it again, I realised few arguments could be quite confusing, for example, Row that needs some specific attributes and numpy.ndarray. I added few type checking and added some tests accordingly. So, it shows an error message as below:

TypeError: array argument should be numpy.ndarray; however, it got [<class 'str'>].

How was this patch tested?

Manually tested with ./python/run-tests.

And also:

PYSPARK_PYTHON=python3 SPARK_TESTING=1 bin/pyspark pyspark.ml.tests ImageReaderTest
@HyukjinKwon

This comment has been minimized.

Copy link
Member Author

commented Nov 28, 2017

cc @jkbradley and @imatiach-msft, could you maybe take a look and see if it makes sense?

@srowen
srowen approved these changes Nov 28, 2017
@SparkQA

This comment has been minimized.

Copy link

commented Nov 28, 2017

Test build #84267 has finished for PR 19835 at commit cbff0fc.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.
@HyukjinKwon HyukjinKwon force-pushed the HyukjinKwon:SPARK-21866-followup branch to 1581718 Nov 28, 2017
# Running `bytearray(numpy.array([1]))` fails in specific Python versions
# with a specific Numpy version, for example in Python 3.6.0 and NumPy 1.13.3.
# Here, it avoids it by converting it to bytes.
data = bytearray(array.astype(dtype=np.uint8).ravel().tobytes())

This comment has been minimized.

Copy link
@HyukjinKwon

HyukjinKwon Nov 28, 2017

Author Member

To be honest, I can't find the exact issue and changes about this issue yet. There are too many similar / related issues in NumPy and Python release notes, and it sounds even harder to find the relevant issue as the exception is from NumPy but the cause seems a different Python version (3.6.0), if I haven't missed something.

This comment has been minimized.

Copy link
@imatiach-msft

imatiach-msft Nov 29, 2017

Contributor

strange, but the comment explains the issue well and I think this is a good workaround

@SparkQA

This comment has been minimized.

Copy link

commented Nov 28, 2017

Test build #84269 has finished for PR 19835 at commit 1581718.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.
@HyukjinKwon HyukjinKwon changed the title [SPARK-21866][ML][PYTHON] Few cleanups and fix test case for Python 3.6.0 / NumPy 1.13.3 [SPARK-21866][ML][PYTHON][FOLLOWUP] Few cleanups and fix test case for image in Python 3.6.0 / NumPy 1.13.3 Nov 28, 2017
@SparkQA

This comment has been minimized.

Copy link

commented Nov 28, 2017

Test build #84270 has finished for PR 19835 at commit ce922a3.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.
@HyukjinKwon HyukjinKwon changed the title [SPARK-21866][ML][PYTHON][FOLLOWUP] Few cleanups and fix test case for image in Python 3.6.0 / NumPy 1.13.3 [SPARK-21866][ML][PYTHON][FOLLOWUP] Few cleanups and test case for image in Python 3.6.0 / NumPy 1.13.3 Nov 29, 2017
@@ -1836,6 +1836,24 @@ def test_read_images(self):
self.assertEqual(ImageSchema.imageFields, expected)
self.assertEqual(ImageSchema.undefinedImageType, "Undefined")

with QuietTest(self.sc):

This comment has been minimized.

Copy link
@imatiach-msft

imatiach-msft Nov 29, 2017

Contributor

nice tests!

@imatiach-msft

This comment has been minimized.

Copy link
Contributor

commented Nov 29, 2017

the changes look good to me, the extra verification logic for arguments is a great addition

@HyukjinKwon HyukjinKwon changed the title [SPARK-21866][ML][PYTHON][FOLLOWUP] Few cleanups and test case for image in Python 3.6.0 / NumPy 1.13.3 [SPARK-21866][ML][PYTHON][FOLLOWUP] Few cleanups an fix image test failure in Python 3.6.0 / NumPy 1.13.3 Nov 29, 2017
@HyukjinKwon HyukjinKwon changed the title [SPARK-21866][ML][PYTHON][FOLLOWUP] Few cleanups an fix image test failure in Python 3.6.0 / NumPy 1.13.3 [SPARK-21866][ML][PYTHON][FOLLOWUP] Few cleanups and fix image test failure in Python 3.6.0 / NumPy 1.13.3 Nov 29, 2017
@HyukjinKwon

This comment has been minimized.

Copy link
Member Author

commented Nov 30, 2017

Merged to master.

Thanks for reviewing this @srowen and @imatiach-msft.

@asfgit asfgit closed this in 92cfbee Nov 30, 2017
@HyukjinKwon HyukjinKwon deleted the HyukjinKwon:SPARK-21866-followup branch Jan 2, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
4 participants
You can’t perform that action at this time.