[SPARK-4586][MLLIB] Python API for ML pipeline and parameters #4151
Test build #25935 has started for PR 4151 at commit
Test build #25935 has finished for PR 4151 at commit
Test PASSed.
Test build #26113 has started for PR 4151 at commit
Test build #26114 has started for PR 4151 at commit
Test build #26113 has finished for PR 4151 at commit
Test PASSed.
Test build #26114 has finished for PR 4151 at commit
Test PASSed.
def __init__(self):
    #: A unique id for the object. The default implementation
    #: concatenates the class name, "-", and 8 random hex chars.
    self.uid = type(self).__name__ + "-" + uuid.uuid4().hex[:8]
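A minimal, self-contained sketch of the uid scheme in this diff hunk. The base class `UidBase` and the subclass `Tokenizer` are hypothetical stand-ins, not the PR's actual class hierarchy. Because `type(self).__name__` is evaluated at runtime, every subclass automatically gets its own name as the uid prefix without overriding `__init__`.

```python
import uuid


class UidBase(object):
    """Hypothetical base class reproducing the uid scheme from the diff."""

    def __init__(self):
        #: A unique id for the object: the class name, "-", and 8 random hex chars.
        self.uid = type(self).__name__ + "-" + uuid.uuid4().hex[:8]


class Tokenizer(UidBase):
    """Hypothetical subclass; it inherits __init__, so its uid starts with "Tokenizer-"."""


a, b = Tokenizer(), Tokenizer()
print(a.uid, b.uid)  # two ids of the form "Tokenizer-" + 8 lowercase hex digits
```

Since `uuid.uuid4().hex` is lowercase hexadecimal, the suffix is always 8 characters from `0-9a-f`.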
`id(obj)` is the memory address of `obj` (in CPython); it should be used as part of the uid.
The memory address can be reused once an object is freed, so it may not be unique.
Among live objects, the `id()` (memory address) is unique, but the random part (uuid) may not be.
What if one object is dereferenced and a new object is created at the same address? I tried to keep the random part of the id short while maintaining a tiny collision rate. With 8 hex chars, each suffix is drawn uniformly from 16^8 = 2^32 values, i.e. more than 4 billion.
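To put the 8-hex-char choice in numbers: Python's documentation notes that two objects with non-overlapping lifetimes may have the same `id()` value, which is the concern raised above, whereas a random suffix only fails when two draws happen to match. The birthday bound estimates the chance that any two of n random suffixes collide as 1 - exp(-n(n-1)/(2N)) with N = 2^32. The sketch below (both function names are hypothetical) compares that approximation with the exact product. Since the class name is also part of the uid, only instances of the same class can collide.

```python
import math


def approx_collision_probability(n, bits=32):
    """Birthday-bound estimate that at least two of n uniform random
    `bits`-bit ids are equal: 1 - exp(-n(n-1) / (2N)), N = 2**bits."""
    space = 2 ** bits
    # -expm1(-x) == 1 - exp(-x), computed accurately for tiny x.
    return -math.expm1(-n * (n - 1) / (2.0 * space))


def exact_collision_probability(n, bits=32):
    """Exact value: 1 minus the chance that all n draws are distinct."""
    space = 2 ** bits
    p_distinct = 1.0
    for i in range(n):
        p_distinct *= (space - i) / float(space)
    return 1.0 - p_distinct


# 8 hex chars carry 4 bits each, so the suffix space is 16**8 == 2**32.
print(16 ** 8)                              # 4294967296
print(approx_collision_probability(1000))   # about 1.16e-4
print(approx_collision_probability(77163))  # about 0.5
```

Under this estimate, a thousand instances of one class collide with probability of roughly 1e-4, and the chance only reaches about one half around 77,000 instances.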
That makes sense, thanks!
Test build #26202 has finished for PR 4151 at commit
Test PASSed.
Test build #26238 has started for PR 4151 at commit
Test build #26238 has finished for PR 4151 at commit
Test PASSed.
refactor
Test build #26244 has started for PR 4151 at commit
@davies I merged your changes and moved
Test build #26246 has started for PR 4151 at commit
Test build #26244 has finished for PR 4151 at commit
Test PASSed.
Test build #26246 has finished for PR 4151 at commit
Test PASSed.
def __init__(self):
    super(HasMaxIter, self).__init__()
    #: param for max number of iterations
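The `super(HasMaxIter, self).__init__()` call matters because shared-param classes like this are typically combined as mixins. The sketch below assumes a hypothetical minimal `Param`/`Params` pair (not the PR's real classes), a hypothetical second mixin `HasRegParam`, and a hypothetical `ToyEstimator`, to show that cooperative `super()` calls let every mixin initialize its param through the MRO. The `#:` comment is the Sphinx autodoc convention for documenting an instance attribute.

```python
class Param(object):
    """Hypothetical minimal param: its owner, a name, and a doc string."""

    def __init__(self, parent, name, doc):
        self.parent, self.name, self.doc = parent, name, doc


class Params(object):
    """Hypothetical base holding explicitly set param values."""

    def __init__(self):
        super(Params, self).__init__()
        self.paramMap = {}  # Param -> value; Param objects hash by identity


class HasMaxIter(Params):
    def __init__(self):
        super(HasMaxIter, self).__init__()
        #: param for max number of iterations
        self.maxIter = Param(self, "maxIter", "max number of iterations")

    def setMaxIter(self, value):
        self.paramMap[self.maxIter] = value
        return self

    def getMaxIter(self):
        return self.paramMap[self.maxIter]


class HasRegParam(Params):
    def __init__(self):
        super(HasRegParam, self).__init__()
        #: param for regularization parameter
        self.regParam = Param(self, "regParam", "regularization parameter")

    def setRegParam(self, value):
        self.paramMap[self.regParam] = value
        return self

    def getRegParam(self):
        return self.paramMap[self.regParam]


class ToyEstimator(HasMaxIter, HasRegParam):
    """Hypothetical estimator; MRO: ToyEstimator, HasMaxIter, HasRegParam, Params."""


est = ToyEstimator().setMaxIter(10).setRegParam(0.01)
```

If `HasMaxIter.__init__` skipped the `super()` call, `HasRegParam.__init__` would never run and `est.regParam` would not exist.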
Minor: Does this appear in the generated doc? I did see that
No, but this is the official Sphinx way to document instance attributes.
But in pyspark.ml.rst, we disable the doc for members: `:undoc-members:`
@mengxr After removing `inherit_doc` from pipeline.py, I think it's OK to merge.
Great! Waiting for Jenkins ...
Test build #26266 has started for PR 4151 at commit
Test build #26266 has finished for PR 4151 at commit
Test PASSed.
Merged into master.
This PR adds a Python API for ML pipelines and parameters. The design doc can be found on the JIRA page. It includes transformers and an estimator to demonstrate the simple text classification example code.
TODO:
CC: @davies @jkbradley
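As a rough, pure-Python illustration of the pipeline pattern (not PySpark's implementation), the sketch below models a dataset as a list of dicts. A `Transformer` maps a dataset to a new one, an `Estimator` fits on a dataset and returns a fitted `Transformer`, and `Pipeline.fit` runs its stages in order, fitting each estimator on the output of the stages before it. The stage names `Tokenizer` and `HashingTF` are assumptions modeled on Spark's feature transformers; `NearestCentroid` is a hypothetical toy classifier standing in for a real one, and hashing uses `zlib.crc32` so feature indices are reproducible across runs.

```python
import zlib
from collections import defaultdict


class Transformer(object):
    def transform(self, rows):  # dataset (list of dict rows) -> new dataset
        raise NotImplementedError


class Estimator(object):
    def fit(self, rows):  # dataset -> fitted Transformer (a "model")
        raise NotImplementedError


class Tokenizer(Transformer):
    def __init__(self, inputCol, outputCol):
        self.inputCol, self.outputCol = inputCol, outputCol

    def transform(self, rows):
        # Lowercase and split on whitespace; input rows are copied, not mutated.
        return [dict(r, **{self.outputCol: r[self.inputCol].lower().split()})
                for r in rows]


class HashingTF(Transformer):
    def __init__(self, inputCol, outputCol, numFeatures=16):
        self.inputCol, self.outputCol = inputCol, outputCol
        self.numFeatures = numFeatures

    def transform(self, rows):
        out = []
        for r in rows:
            vec = [0.0] * self.numFeatures
            for word in r[self.inputCol]:
                # crc32 is stable across runs, unlike the randomized hash() of str.
                vec[zlib.crc32(word.encode("utf-8")) % self.numFeatures] += 1.0
            out.append(dict(r, **{self.outputCol: vec}))
        return out


class CentroidModel(Transformer):
    def __init__(self, centroids):
        self.centroids = centroids  # label -> mean feature vector

    def transform(self, rows):
        out = []
        for r in rows:
            # Predict the label whose centroid is closest in squared distance.
            label = min(self.centroids, key=lambda lbl: sum(
                (x - y) ** 2 for x, y in zip(r["features"], self.centroids[lbl])))
            out.append(dict(r, prediction=label))
        return out


class NearestCentroid(Estimator):
    """Hypothetical toy classifier standing in for a real one."""

    def fit(self, rows):
        by_label = defaultdict(list)
        for r in rows:
            by_label[r["label"]].append(r["features"])
        # Column-wise mean of each label's feature vectors.
        return CentroidModel({lbl: [sum(col) / len(vecs) for col in zip(*vecs)]
                              for lbl, vecs in by_label.items()})


class PipelineModel(Transformer):
    def __init__(self, stages):
        self.stages = stages

    def transform(self, rows):
        for stage in self.stages:
            rows = stage.transform(rows)
        return rows


class Pipeline(Estimator):
    def __init__(self, stages):
        self.stages = stages

    def fit(self, rows):
        # Fit each estimator on the output of the stages before it.
        fitted = []
        for stage in self.stages:
            if isinstance(stage, Estimator):
                stage = stage.fit(rows)
            rows = stage.transform(rows)
            fitted.append(stage)
        return PipelineModel(fitted)


training = [{"text": t, "label": y} for t, y in [
    ("a b c d e spark", 1.0), ("b d", 0.0),
    ("spark f g h", 1.0), ("hadoop mapreduce", 0.0)]]
model = Pipeline(stages=[Tokenizer("text", "words"), HashingTF("words", "features"),
                         NearestCentroid()]).fit(training)
predictions = model.transform([{"text": "spark i j k"}, {"text": "l m n"}])
```

The design choice worth noting is that a fitted pipeline is itself just a `Transformer`: transformers pass through `fit` unchanged, estimators are replaced by the models they produce, and the resulting `PipelineModel` replays the same stages on new data.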