-
Notifications
You must be signed in to change notification settings - Fork 28.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SPARK-7198] [MLLIB] VectorAssembler should output ML attributes #6452
Conversation
Test build #33642 has finished for PR 6452 at commit
|
// Schema transformation. | ||
val schema = dataset.schema | ||
lazy val first = dataset.first() | ||
val attrs = $(inputCols).flatMap { c => |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This does not remove "index" from attributes. Should it remove or fix that field? (It's sort of odd that attributes have an index field.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The indices gets reset in AttributeGroup
:
Let's re-visit the attribute API in 1.5:)
LGTM other than those 2 items. Side comment: Where should we document that a column of Vectors needs to have Vectors of the same length? This is a general rule, but some users may not realize it. |
I don't have a good place in mind. Maybe in the home page of |
Test build #33675 has finished for PR 6452 at commit
|
LGTM pending tests |
Oh I guess tests are done |
I'll go ahead and merge this into master and branch-1.4 |
`VectorAssembler` should carry over ML attributes. For unknown attributes, we assume numeric values. This PR handles the following cases: 1. DoubleType with ML attribute: carry over 2. DoubleType without ML attribute: numeric value 3. Scalar type: numeric value 4. VectorType with all ML attributes: carry over and update names 5. VectorType with number of ML attributes: assume all numeric 6. VectorType without ML attributes: check the first row and get the number of attributes jkbradley Author: Xiangrui Meng <meng@databricks.com> Closes #6452 from mengxr/SPARK-7198 and squashes the following commits: a9d2469 [Xiangrui Meng] add space facdb1f [Xiangrui Meng] VectorAssembler should output ML attributes (cherry picked from commit 7859ab6) Signed-off-by: Joseph K. Bradley <joseph@databricks.com>
`VectorAssembler` should carry over ML attributes. For unknown attributes, we assume numeric values. This PR handles the following cases: 1. DoubleType with ML attribute: carry over 2. DoubleType without ML attribute: numeric value 3. Scalar type: numeric value 4. VectorType with all ML attributes: carry over and update names 5. VectorType with number of ML attributes: assume all numeric 6. VectorType without ML attributes: check the first row and get the number of attributes jkbradley Author: Xiangrui Meng <meng@databricks.com> Closes apache#6452 from mengxr/SPARK-7198 and squashes the following commits: a9d2469 [Xiangrui Meng] add space facdb1f [Xiangrui Meng] VectorAssembler should output ML attributes
`VectorAssembler` should carry over ML attributes. For unknown attributes, we assume numeric values. This PR handles the following cases: 1. DoubleType with ML attribute: carry over 2. DoubleType without ML attribute: numeric value 3. Scalar type: numeric value 4. VectorType with all ML attributes: carry over and update names 5. VectorType with number of ML attributes: assume all numeric 6. VectorType without ML attributes: check the first row and get the number of attributes jkbradley Author: Xiangrui Meng <meng@databricks.com> Closes apache#6452 from mengxr/SPARK-7198 and squashes the following commits: a9d2469 [Xiangrui Meng] add space facdb1f [Xiangrui Meng] VectorAssembler should output ML attributes
} else { | ||
Some(attr.withName(c)) | ||
} | ||
case _: NumericType | BooleanType => |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I had a quick question regarding this, why do we not consider a 3.2 case where we have Scalar type with ML attributes? is it because there's no such thing?
VectorAssembler
should carry over ML attributes. For unknown attributes, we assume numeric values. This PR handles the following cases:@jkbradley