[SPARK-23256][ML][PYTHON] Add columnSchema method to PySpark image reader #20475

Closed
wants to merge 2 commits into from

HyukjinKwon
Member

What changes were proposed in this pull request?

This PR proposes adding columnSchema on the Python side too.

>>> from pyspark.ml.image import ImageSchema
>>> ImageSchema.columnSchema.simpleString()
'struct<origin:string,height:int,width:int,nChannels:int,mode:int,data:binary>'
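For readers without a Spark installation at hand, the layout of that string can be reproduced with a minimal pure-Python sketch. It does not import PySpark; `IMAGE_FIELDS` and `simple_string` are hypothetical names introduced only for illustration, and the field names and types are copied from the output above.

```python
# Field names and Spark SQL type names, copied from the simpleString()
# output shown in this PR description. This list is an illustration,
# not the PySpark implementation.
IMAGE_FIELDS = [
    ("origin", "string"),
    ("height", "int"),
    ("width", "int"),
    ("nChannels", "int"),
    ("mode", "int"),
    ("data", "binary"),
]


def simple_string(fields):
    """Render (name, type) pairs in the struct<name:type,...> form."""
    return "struct<" + ",".join(f"{name}:{dtype}" for name, dtype in fields) + ">"


print(simple_string(IMAGE_FIELDS))
```

With PySpark available, a `StructType` built from `StructField`s of `StringType`, `IntegerType` and `BinaryType` in the same order should render the same simple string.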

How was this patch tested?

Manually tested, and a unit test was added in python/pyspark/ml/tests.py.

@HyukjinKwon
Member Author

@MrBago, @BryanCutler, @imatiach-msft, and @MLnick, could you take a look please?

:return: a :class:`StructType` for image column,
``struct<origin:string, height:int, width:int, nChannels:int, mode:int, data:binary>``.

.. versionadded:: 2.3.0
Member Author

I am fine with 2.4.0. Let me know.

Member

Since this came out of the 2.3 ML QA and it's mostly an improvement to the Python API, I think maybe 2.4 is best. But it is a new API, so maybe it's OK to include in 2.3.

Member Author

Yea, let me go with 2.4.0.

@SparkQA

SparkQA commented Feb 1, 2018

Test build #86932 has finished for PR 20475 at commit e180ade.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@imatiach-msft
Contributor

@HyukjinKwon looks like a great change to me, thank you for exposing the method in pyspark

Contributor

@imatiach-msft imatiach-msft left a comment

nice method!

Member

@dongjoon-hyun dongjoon-hyun left a comment

+1, LGTM.

Member

@BryanCutler BryanCutler left a comment

LGTM, thanks @HyukjinKwon!

@SparkQA

SparkQA commented Feb 2, 2018

Test build #86996 has finished for PR 20475 at commit ffa1b48.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@HyukjinKwon
Member Author

Thank you @imatiach-msft, @dongjoon-hyun, @felixcheung and @BryanCutler.

Merged to master only.

@asfgit asfgit closed this in 715047b Feb 4, 2018
@HyukjinKwon HyukjinKwon deleted the SPARK-23256 branch October 16, 2018 12:44