[SPARK-11902] [ML] Unhandled case in VectorAssembler#transform #9885

Closed
wants to merge 2 commits into from

BenFradet
Contributor

The transform method of VectorAssembler has an unhandled case: an input column whose type is not one of the supported types (DoubleType, NumericType, BooleanType, or VectorUDT) falls through the pattern match.

So, if you try to transform a column of StringType, you get a cryptic "scala.MatchError: StringType".

This PR fixes this by throwing a SparkException when an input column has an unsupported type.
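
To illustrate the failure mode and the shape of the fix, here is a minimal sketch in plain Scala. It does not use Spark: `ColType`, its case objects, `UnsupportedColumnException`, and the message text are hypothetical stand-ins for Spark's data types and `SparkException`, chosen only so the example is self-contained.

```scala
import scala.util.Try

object CatchAllSketch {
  // Hypothetical stand-ins for Spark SQL data types (not Spark's real classes).
  trait ColType
  case object DoubleT extends ColType
  case object BooleanT extends ColType
  case object VectorT extends ColType
  case object StringT extends ColType

  // Hypothetical stand-in for SparkException.
  final class UnsupportedColumnException(msg: String) extends RuntimeException(msg)

  // Before: no default case, so an unlisted type raises scala.MatchError at runtime.
  def describeBefore(t: ColType): String = t match {
    case DoubleT  => "numeric attribute"
    case BooleanT => "binary attribute"
    case VectorT  => "attribute group"
  }

  // After: a catch-all case turns the failure into a descriptive exception.
  def describeAfter(t: ColType): String = t match {
    case DoubleT  => "numeric attribute"
    case BooleanT => "binary attribute"
    case VectorT  => "attribute group"
    case otherType =>
      throw new UnsupportedColumnException(s"Unsupported column type: $otherType")
  }

  def main(args: Array[String]): Unit = {
    // Both calls fail, but only the second names the problem in its own terms.
    println(Try(describeBefore(StringT)))
    println(Try(describeAfter(StringT)))
  }
}
```

The catch-all case is deliberately last, so it only fires for types that none of the supported cases matched.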

@@ -84,6 +84,8 @@ class VectorAssembler(override val uid: String)
val numAttrs = group.numAttributes.getOrElse(first.getAs[Vector](index).size)
Array.fill(numAttrs)(NumericAttribute.defaultAttr)
}
case otherType =>
Member

Seems OK to me. Nit: you can use s"... $otherType" interpolation.

Contributor Author

will do
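
For readers unfamiliar with the nit above, the following sketch compares string concatenation with the `s"..."` interpolator. The object name, the `StringT` value, and the message text are hypothetical; the actual message used in the patch is not shown in this conversation.

```scala
object InterpolationSketch {
  // Hypothetical stand-in for the unmatched data type.
  case object StringT

  // Concatenation: works, but gets noisy as the message grows.
  def concatMessage(otherType: Any): String =
    "Unsupported column type: " + otherType

  // Interpolation: the s prefix substitutes $otherType using its toString.
  def interpolatedMessage(otherType: Any): String =
    s"Unsupported column type: $otherType"

  def main(args: Array[String]): Unit = {
    println(concatMessage(StringT))
    println(interpolatedMessage(StringT))
  }
}
```

Both forms produce the same string; the interpolated version is simply the more idiomatic Scala style.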

@SparkQA

SparkQA commented Nov 21, 2015

Test build #46480 has finished for PR 9885 at commit 1dde108.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
      * abstract class Aggregator[-I, B, O] extends Serializable

@SparkQA

SparkQA commented Nov 21, 2015

Test build #46481 has finished for PR 9885 at commit 76943f8.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
      * abstract class Aggregator[-I, B, O] extends Serializable

asfgit pushed a commit that referenced this pull request Nov 23, 2015
The transform method of VectorAssembler has an unhandled case: an input column whose type is not one of the supported types (DoubleType, NumericType, BooleanType, or VectorUDT) falls through the pattern match.

So, if you try to transform a column of StringType, you get a cryptic "scala.MatchError: StringType".

This PR fixes this by throwing a SparkException when an input column has an unsupported type.

Author: BenFradet <benjamin.fradet@gmail.com>

Closes #9885 from BenFradet/SPARK-11902.

(cherry picked from commit 4be360d)
Signed-off-by: Xiangrui Meng <meng@databricks.com>
@mengxr
Contributor

mengxr commented Nov 23, 2015

Merged into master and branch-1.6. Thanks!

@asfgit asfgit closed this in 4be360d Nov 23, 2015