[SPARK-8051] [MLLIB] make StringIndexerModel silent if input column does not exist #6595

Closed
wants to merge 4 commits into from

Conversation

@mengxr
Contributor

mengxr commented Jun 2, 2015

This is just a workaround for a larger problem. Some pipeline stages may not apply during prediction (e.g. StringIndexerModel on a label column), and such stages should not complain about missing required columns. @jkbradley
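
The behavior described above can be sketched roughly as follows. This is a dependency-free toy illustration in plain Scala, not the code from this patch: `Table`, `ToyStringIndexerModel`, and the column names are hypothetical stand-ins for Spark's `DataFrame` and `StringIndexerModel`, and the real implementation may differ in detail.

```scala
// Toy stand-ins for illustration only; none of these are Spark classes.
object MissingColumnSketch {
  // A row maps column names to values.
  type Row = Map[String, Any]

  // A minimal "dataset": an ordered list of column names plus the rows.
  final case class Table(columns: Seq[String], rows: Seq[Row])

  // Hypothetical simplified model that maps string labels to indices.
  final class ToyStringIndexerModel(
      labels: Seq[String],
      inputCol: String,
      outputCol: String) {

    private val labelToIndex: Map[String, Double] =
      labels.zipWithIndex.map { case (label, i) => label -> i.toDouble }.toMap

    def transform(table: Table): Table = {
      if (!table.columns.contains(inputCol)) {
        // Input column is absent (e.g. unlabeled data at prediction time):
        // skip the stage and return the input unchanged instead of failing.
        table
      } else {
        // Input column is present: append the index column as usual.
        // An unseen label would throw NoSuchElementException in this toy.
        val indexed = table.rows.map { row =>
          row + (outputCol -> labelToIndex(row(inputCol).toString))
        }
        Table(table.columns :+ outputCol, indexed)
      }
    }
  }

  def main(args: Array[String]): Unit = {
    val model = new ToyStringIndexerModel(Seq("a", "b"), "label", "labelIndex")
    val labeled = Table(Seq("label", "x"), Seq(Map("label" -> "b", "x" -> 1.0)))
    val unlabeled = Table(Seq("x"), Seq(Map("x" -> 2.0)))

    println(model.transform(labeled).columns)          // includes "labelIndex"
    println(model.transform(unlabeled) == unlabeled)   // unchanged input
  }
}
```

The trade-off of this approach is that a misspelled input column name is silently ignored as well, which is one reason the change is described as a workaround.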

@SparkQA

SparkQA commented Jun 2, 2015

Test build #34023 has finished for PR 6595 at commit e112394.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@jkbradley
Member

As long as this is aimed at 1.4.1 (not 1.4.0), should we design a better fix? My suggestion would be to have PipelineStage include a Param specifying whether to use it during transform(). Stages could be used in transform() by default, but certain Transformers could override the default to skip during transform(). PipelineModel could read the Param and handle each stage accordingly.

If that's too big a change for 1.4.1, then this temp fix seems tolerable.

Note: We should document the behavior in the docs.
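
The alternative design suggested in this comment could look roughly like the sketch below. It is a toy, dependency-free Scala illustration under stated assumptions: `ToyStage`, `usedInTransform`, `ToyLabelIndexer`, `ToyScaler`, and `ToyPipelineModel` are hypothetical names that do not exist in Spark, and a plain method stands in for the proposed Param.

```scala
// Toy stand-ins for illustration only; none of these are Spark classes.
object SkipStageSketch {
  type Row = Map[String, Any]

  // Hypothetical stage interface. `usedInTransform` plays the role of the
  // suggested Param: stages are applied during transform by default.
  trait ToyStage {
    def usedInTransform: Boolean = true
    def transform(rows: Seq[Row]): Seq[Row]
  }

  // Operates on the label column, so it opts out of transform.
  final class ToyLabelIndexer(labels: Seq[String]) extends ToyStage {
    override def usedInTransform: Boolean = false

    def transform(rows: Seq[Row]): Seq[Row] =
      rows.map { row =>
        // Fails if "label" is missing, which is why skipping matters.
        row + ("labelIndex" -> labels.indexOf(row("label").toString).toDouble)
      }
  }

  // An ordinary feature stage that is always applied.
  final class ToyScaler(factor: Double) extends ToyStage {
    def transform(rows: Seq[Row]): Seq[Row] =
      rows.map { row =>
        row + ("scaled" -> (row("x").asInstanceOf[Double] * factor))
      }
  }

  // The pipeline model reads the flag and skips stages that opted out.
  final class ToyPipelineModel(stages: Seq[ToyStage]) {
    def transform(rows: Seq[Row]): Seq[Row] =
      stages.filter(_.usedInTransform).foldLeft(rows) { (current, stage) =>
        stage.transform(current)
      }
  }

  def main(args: Array[String]): Unit = {
    val pipeline = new ToyPipelineModel(
      Seq(new ToyLabelIndexer(Seq("a", "b")), new ToyScaler(2.0)))
    // Unlabeled input: the label indexer is skipped, the scaler still runs.
    println(pipeline.transform(Seq(Map("x" -> 1.5))))
  }
}
```

Compared with making a single model silent about a missing column, this puts the skip decision in the pipeline, so individual stages can keep strict schema checks.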

@SparkQA

SparkQA commented Jun 2, 2015

Test build #34025 has finished for PR 6595 at commit 8ee7c7e.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@mengxr
Contributor Author

mengxr commented Jun 3, 2015

I checked the transformers we have. This appears to be the only one that operates on target labels, and it currently blocks users from making predictions on data without labels. It would therefore be good to merge this fix into branch-1.4 before 1.4.1 is released.

@SparkQA

SparkQA commented Jun 3, 2015

Test build #34069 has finished for PR 6595 at commit b6a36b9.

  • This patch fails MiMa tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • class ElementwiseProduct(val scalingVec: Vector) extends VectorTransformer

@jkbradley
Member

LGTM pending tests

@jkbradley
Member

test this please

@SparkQA

SparkQA commented Jun 3, 2015

Test build #34106 has finished for PR 6595 at commit b6a36b9.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • class ElementwiseProduct(val scalingVec: Vector) extends VectorTransformer

@jkbradley
Member

Merging with master and branch-1.4

asfgit pushed a commit that referenced this pull request Jun 3, 2015
…oes not exist

This is just a workaround to a bigger problem. Some pipeline stages may not be effective during prediction, and they should not complain about missing required columns, e.g. `StringIndexerModel`. jkbradley

Author: Xiangrui Meng <meng@databricks.com>

Closes #6595 from mengxr/SPARK-8051 and squashes the following commits:

b6a36b9 [Xiangrui Meng] add doc
f143fd4 [Xiangrui Meng] Merge remote-tracking branch 'apache/master' into SPARK-8051
8ee7c7e [Xiangrui Meng] use SparkFunSuite
e112394 [Xiangrui Meng] make StringIndexerModel silent if input column does not exist

(cherry picked from commit 26c9d7a)
Signed-off-by: Joseph K. Bradley <joseph@databricks.com>
@asfgit asfgit closed this in 26c9d7a Jun 3, 2015
jeanlyn pushed a commit to jeanlyn/spark that referenced this pull request Jun 12, 2015
nemccarthy pushed a commit to nemccarthy/spark that referenced this pull request Jun 19, 2015