Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-21866][ML][PYTHON][FOLLOWUP] Few cleanups and fix image test failure in Python 3.6.0 / NumPy 1.13.3 #19835

Closed
wants to merge 2 commits into from

Conversation

HyukjinKwon
Copy link
Member

@HyukjinKwon HyukjinKwon commented Nov 28, 2017

What changes were proposed in this pull request?

Image test seems failed in Python 3.6.0 / NumPy 1.13.3. I manually tested as below:

======================================================================
ERROR: test_read_images (pyspark.ml.tests.ImageReaderTest)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/.../spark/python/pyspark/ml/tests.py", line 1831, in test_read_images
    self.assertEqual(ImageSchema.toImage(array, origin=first_row[0]), first_row)
  File "/.../spark/python/pyspark/ml/image.py", line 149, in toImage
    data = bytearray(array.astype(dtype=np.uint8).ravel())
TypeError: only integer scalar arrays can be converted to a scalar index

----------------------------------------------------------------------
Ran 1 test in 7.606s

To be clear, I think the error seems from NumPy - https://github.com/numpy/numpy/blob/75b2d5d427afdb1392f2a0b2092e0767e4bab53d/numpy/core/src/multiarray/number.c#L947

For a smaller scope:

>>> import numpy as np
>>> bytearray(np.array([1]).astype(dtype=np.uint8))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: only integer scalar arrays can be converted to a scalar index

In Python 2.7 / NumPy 1.13.1, it prints:

bytearray(b'\x01')

So, here, I simply worked around it by converting it to bytes as below:

>>> bytearray(np.array([1]).astype(dtype=np.uint8).tobytes())
bytearray(b'\x01')

Also, while looking into it again, I realised few arguments could be quite confusing, for example, Row that needs some specific attributes and numpy.ndarray. I added few type checking and added some tests accordingly. So, it shows an error message as below:

TypeError: array argument should be numpy.ndarray; however, it got [<class 'str'>].

How was this patch tested?

Manually tested with ./python/run-tests.

And also:

PYSPARK_PYTHON=python3 SPARK_TESTING=1 bin/pyspark pyspark.ml.tests ImageReaderTest

@HyukjinKwon
Copy link
Member Author

cc @jkbradley and @imatiach-msft, could you maybe take a look and see if it makes sense?

@SparkQA
Copy link

SparkQA commented Nov 28, 2017

Test build #84267 has finished for PR 19835 at commit cbff0fc.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

# Running `bytearray(numpy.array([1]))` fails in specific Python versions
# with a specific Numpy version, for example in Python 3.6.0 and NumPy 1.13.3.
# Here, it avoids it by converting it to bytes.
data = bytearray(array.astype(dtype=np.uint8).ravel().tobytes())
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To be honest, I can't find the exact issue and changes about this issue yet. There are too many similar / related issues in NumPy and Python release notes, and it sounds even harder to find the relevant issue as the exception is from NumPy but the cause seems a different Python version (3.6.0), if I haven't missed something.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

strange, but the comment explains the issue well and I think this is a good workaround

@SparkQA
Copy link

SparkQA commented Nov 28, 2017

Test build #84269 has finished for PR 19835 at commit 1581718.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@HyukjinKwon HyukjinKwon changed the title [SPARK-21866][ML][PYTHON] Few cleanups and fix test case for Python 3.6.0 / NumPy 1.13.3 [SPARK-21866][ML][PYTHON][FOLLOWUP] Few cleanups and fix test case for image in Python 3.6.0 / NumPy 1.13.3 Nov 28, 2017
@SparkQA
Copy link

SparkQA commented Nov 28, 2017

Test build #84270 has finished for PR 19835 at commit ce922a3.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@HyukjinKwon HyukjinKwon changed the title [SPARK-21866][ML][PYTHON][FOLLOWUP] Few cleanups and fix test case for image in Python 3.6.0 / NumPy 1.13.3 [SPARK-21866][ML][PYTHON][FOLLOWUP] Few cleanups and test case for image in Python 3.6.0 / NumPy 1.13.3 Nov 29, 2017
@@ -1836,6 +1836,24 @@ def test_read_images(self):
self.assertEqual(ImageSchema.imageFields, expected)
self.assertEqual(ImageSchema.undefinedImageType, "Undefined")

with QuietTest(self.sc):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice tests!

@imatiach-msft
Copy link
Contributor

the changes look good to me, the extra verification logic for arguments is a great addition

@HyukjinKwon HyukjinKwon changed the title [SPARK-21866][ML][PYTHON][FOLLOWUP] Few cleanups and test case for image in Python 3.6.0 / NumPy 1.13.3 [SPARK-21866][ML][PYTHON][FOLLOWUP] Few cleanups an fix image test failure in Python 3.6.0 / NumPy 1.13.3 Nov 29, 2017
@HyukjinKwon HyukjinKwon changed the title [SPARK-21866][ML][PYTHON][FOLLOWUP] Few cleanups an fix image test failure in Python 3.6.0 / NumPy 1.13.3 [SPARK-21866][ML][PYTHON][FOLLOWUP] Few cleanups and fix image test failure in Python 3.6.0 / NumPy 1.13.3 Nov 29, 2017
@HyukjinKwon
Copy link
Member Author

Merged to master.

Thanks for reviewing this @srowen and @imatiach-msft.

@asfgit asfgit closed this in 92cfbee Nov 30, 2017
@HyukjinKwon HyukjinKwon deleted the SPARK-21866-followup branch January 2, 2018 03:37
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants