[SPARK-12346] [ML] Missing attribute names in GLM for vector-type features #10323

Closed
wants to merge 2 commits into from

5 participants
@ericl (Contributor) commented Dec 16, 2015

Currently `summary()` fails on a GLM model fitted over a vector feature that is missing ML attributes, because the output feature attributes then also have no names. We can avoid this by forcing `VectorAssembler` to generate suitable names when the input attributes have none.
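The failure and the fix can be sketched in plain Scala without Spark. This is a minimal illustration, not the code in this PR: `Attr`, `summaryRows`, and `assembleAttrs` are hypothetical stand-ins for ML attribute metadata, the GLM summary step, and `VectorAssembler`'s output-attribute logic. It assumes the summary reads each feature name with `Option.get` and that generated names follow a `<column>_<index>` scheme.

```scala
// Minimal sketch, not the PR's code: Attr, summaryRows and assembleAttrs are
// hypothetical stand-ins for Spark ML attributes, the GLM summary step and
// VectorAssembler's output-attribute logic.
final case class Attr(name: Option[String])

object GlmNamingSketch {
  // Pairs each coefficient with its feature name, as a summary table would.
  // Assumption: the name is read with Option.get, so a missing name throws
  // NoSuchElementException, which models the summary() failure.
  def summaryRows(coefficients: Seq[Double], attrs: Seq[Attr]): Seq[(String, Double)] = {
    require(coefficients.length == attrs.length, "need one attribute per coefficient")
    attrs.map(_.name.get).zip(coefficients)
  }

  // Assembler-side fix, assuming a "<column>_<index>" naming scheme:
  // existing names are prefixed with the column name, missing ones are made up.
  def assembleAttrs(colName: String, size: Int, attrs: Option[Seq[Attr]]): Seq[Attr] =
    attrs match {
      case Some(existing) =>
        existing.zipWithIndex.map { case (a, i) =>
          Attr(Some(a.name.fold(s"${colName}_$i")(n => s"${colName}_$n")))
        }
      case None =>
        // No attribute metadata at all: fall back to the vector size.
        Seq.tabulate(size)(i => Attr(Some(s"${colName}_$i")))
    }
}
```

For a two-element vector column `vec` with no metadata, `summaryRows` throws on `Seq(Attr(None), Attr(None))`, while `summaryRows(Seq(0.5, -1.25), assembleAttrs("vec", 2, None))` yields rows named `vec_0` and `vec_1`. Making up names at assembly time means every downstream consumer sees named attributes, instead of each consumer having to handle the missing-name case.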

cc @mengxr

```scala
test("vector attribute generation") {
  val formula = new RFormula().setFormula("id ~ vec")
  val original = sqlContext.createDataFrame(
    Seq((1, Vectors.dense(0.0, 1.0)), (2, Vectors.dense(1.0, 2.0)))
```

@yanboliang (Contributor) commented Dec 16, 2015

Should we support RFormula terms of vector type? I think that's illegal in R.

@ericl (Author, Contributor) commented Dec 16, 2015

I think it makes sense when using RFormula in an ML pipeline (not necessarily from R).
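To illustrate why a vector-typed term can be meaningful in a pipeline even if R disallows it, here is a hypothetical, heavily simplified expansion of a formula's right-hand side. Only `+` is handled (Spark's `RFormula` supports more operators), and `ColType`, `Scalar`, `VectorCol`, and `FormulaSketch` are illustrative names, not Spark APIs. A scalar column contributes one feature, while a vector column contributes one feature per element, named with the same assumed `<column>_<index>` scheme.

```scala
// Hypothetical column types for this sketch (not Spark's DataType classes).
sealed trait ColType
case object Scalar extends ColType
final case class VectorCol(size: Int) extends ColType

object FormulaSketch {
  // Splits "label ~ a + b" into the label and the list of terms.
  def parse(formula: String): (String, Seq[String]) = {
    val parts = formula.split("~").map(_.trim)
    require(parts.length == 2 && parts(0).nonEmpty && parts(1).nonEmpty, s"bad formula: $formula")
    (parts(0), parts(1).split("\\+").map(_.trim).filter(_.nonEmpty).toSeq)
  }

  // Expands each term into output feature names; vector terms fan out per element
  // (assumed "<column>_<index>" naming).
  def featureNames(formula: String, schema: Map[String, ColType]): Seq[String] = {
    val (_, terms) = parse(formula)
    terms.flatMap { term =>
      schema.get(term) match {
        case Some(Scalar)          => Seq(term)
        case Some(VectorCol(size)) => (0 until size).map(i => s"${term}_$i")
        case None                  => throw new IllegalArgumentException(s"unknown column: $term")
      }
    }
  }
}
```

With a schema where `vec` is a two-element vector, `featureNames("id ~ vec", Map("id" -> Scalar, "vec" -> VectorCol(2)))` returns `Seq("vec_0", "vec_1")`, so each element of the vector term gets its own named feature.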

@SparkQA commented Dec 16, 2015

Test build #47806 has finished for PR 10323 at commit 1c66cdd.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@thunterdb (Contributor) commented Jan 6, 2016

@ericl this looks great, thanks!

asfgit pushed a commit that referenced this pull request Jan 18, 2016

[SPARK-12346][ML] Missing attribute names in GLM for vector-type features

Currently `summary()` fails on a GLM model fitted over a vector feature missing ML attrs, since the output feature attrs will also have no name. We can avoid this situation by forcing `VectorAssembler` to make up suitable names when inputs are missing names.

cc mengxr

Author: Eric Liang <ekl@databricks.com>

Closes #10323 from ericl/spark-12346.

(cherry picked from commit 5e492e9)
Signed-off-by: Xiangrui Meng <meng@databricks.com>

@asfgit closed this in 5e492e9 Jan 18, 2016

@mengxr (Contributor) commented Jan 18, 2016

Merged into master and branch-1.6. Thanks! I created https://issues.apache.org/jira/browse/SPARK-12886 to track some follow-up tasks.
