Skip to content

[SPARK-14605][ML][PYTHON] Changed Python to use unicode UIDs for spark.ml Identifiable#12368

Closed
jkbradley wants to merge 3 commits intoapache:masterfrom
jkbradley:python-uid-unicode
Closed

[SPARK-14605][ML][PYTHON] Changed Python to use unicode UIDs for spark.ml Identifiable#12368
jkbradley wants to merge 3 commits intoapache:masterfrom
jkbradley:python-uid-unicode

Conversation

@jkbradley
Copy link
Member

What changes were proposed in this pull request?

Python spark.ml Identifiable classes use UIDs of type str, but they should use unicode (in Python 2.x) to match Java. This could be a problem if someone created a class in Java with odd unicode characters, saved it, and loaded it in Python.

This PR: Use unicode everywhere in Python.

How was this patch tested?

Updated persistence unit test to check uid type

@jkbradley
Copy link
Member Author

CC: @thunterdb @yanboliang

@SparkQA
Copy link

SparkQA commented Apr 13, 2016

Test build #55739 has finished for PR 12368 at commit b4ac68e.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@yanboliang
Copy link
Contributor

@jkbradley this looks great, thanks!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

shouldn't you have this line in the file above?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep thanks

@thunterdb
Copy link
Contributor

@jkbradley sorry for the delay, I have one comment. Also, you have some merging issues.

@jkbradley jkbradley force-pushed the python-uid-unicode branch from b4ac68e to 5b5860a Compare April 14, 2016 19:43
@jkbradley
Copy link
Member Author

Thanks, updated!

@SparkQA
Copy link

SparkQA commented Apr 14, 2016

Test build #55841 has finished for PR 12368 at commit 5b5860a.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Apr 16, 2016

Test build #2794 has finished for PR 12368 at commit 5b5860a.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@jkbradley
Copy link
Member Author

I'll go ahead and merge this with master.
Thanks for reviewing it!

@asfgit asfgit closed this in 36da5e3 Apr 16, 2016
@jkbradley jkbradley deleted the python-uid-unicode branch April 18, 2016 19:41
lw-lin pushed a commit to lw-lin/spark that referenced this pull request Apr 20, 2016
…k.ml Identifiable

## What changes were proposed in this pull request?

Python spark.ml Identifiable classes use UIDs of type str, but they should use unicode (in Python 2.x) to match Java. This could be a problem if someone created a class in Java with odd unicode characters, saved it, and loaded it in Python.

This PR: Use unicode everywhere in Python.

## How was this patch tested?

Updated persistence unit test to check uid type

Author: Joseph K. Bradley <joseph@databricks.com>

Closes apache#12368 from jkbradley/python-uid-unicode.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants