[SPARK-8051] [MLLIB] make StringIndexerModel silent if input column does not exist #6595
Test build #34023 has finished for PR 6595 at commit
As long as this is aimed at 1.4.1 (not 1.4.0), should we design a better fix? My suggestion would be to have PipelineStage include a Param specifying whether to use it during transform(). Stages could be used in transform() by default, but certain Transformers could override the default to skip during transform(). PipelineModel could read the Param and handle each stage accordingly. If that's too big a change for 1.4.1, then this temp fix seems tolerable. Note: we should document the behavior in the docs.
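A rough illustration of the proposed design, written as a pure-Python toy rather than Spark code. `Stage`, `use_in_transform`, `index_label`, and `predict` are hypothetical names, the toy `PipelineModel` is not Spark's class, and a list of dicts stands in for a DataFrame. Each stage carries a flag (standing in for the suggested Param) that defaults to true, and `transform` skips any stage whose flag is false.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Row = Dict[str, object]


@dataclass
class Stage:
    """Hypothetical fitted stage; `use_in_transform` plays the role of the proposed Param."""
    name: str
    fn: Callable[[List[Row]], List[Row]]
    use_in_transform: bool = True  # default: the stage runs during transform()


class PipelineModel:
    """Toy pipeline model that reads each stage's flag before applying it."""

    def __init__(self, stages: List[Stage]):
        self.stages = stages

    def transform(self, rows: List[Row]) -> List[Row]:
        for stage in self.stages:
            if not stage.use_in_transform:
                continue  # skip stages flagged as training-only
            rows = stage.fn(rows)
        return rows


def index_label(rows: List[Row]) -> List[Row]:
    # Needs the "label" key, so it fails on unlabeled prediction data.
    return [{**r, "labelIndex": 0.0 if r["label"] == "a" else 1.0} for r in rows]


def predict(rows: List[Row]) -> List[Row]:
    # Trivial stand-in for a fitted model: only looks at the feature "x".
    return [{**r, "prediction": 1.0 if r["x"] > 0 else 0.0} for r in rows]


model = PipelineModel([
    Stage("labelIndexer", index_label, use_in_transform=False),
    Stage("predictor", predict),
])
print(model.transform([{"x": 2.0}, {"x": -1.0}]))
# [{'x': 2.0, 'prediction': 1.0}, {'x': -1.0, 'prediction': 0.0}]
```

Here the label indexer is marked training-only, so unlabeled rows reach the predictor; without the flag, `index_label` would raise a `KeyError` on the missing `label` key.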
Test build #34025 has finished for PR 6595 at commit
I checked the transformers we have. Perhaps this is the only one that would operate on target labels, and it is blocking users from making predictions without labels. So it would be nice to merge this fix into branch-1.4 before 1.4.1 is out.
Test build #34069 has finished for PR 6595 at commit
LGTM pending tests
test this please
Test build #34106 has finished for PR 6595 at commit
Merging with master and branch-1.4
…oes not exist This is just a workaround to a bigger problem. Some pipeline stages may not be effective during prediction, and they should not complain about missing required columns, e.g. `StringIndexerModel`. jkbradley Author: Xiangrui Meng <meng@databricks.com> Closes #6595 from mengxr/SPARK-8051 and squashes the following commits: b6a36b9 [Xiangrui Meng] add doc f143fd4 [Xiangrui Meng] Merge remote-tracking branch 'apache/master' into SPARK-8051 8ee7c7e [Xiangrui Meng] use SparkFunSuite e112394 [Xiangrui Meng] make StringIndexerModel silent if input column does not exist (cherry picked from commit 26c9d7a) Signed-off-by: Joseph K. Bradley <joseph@databricks.com>
…oes not exist This is just a workaround to a bigger problem. Some pipeline stages may not be effective during prediction, and they should not complain about missing required columns, e.g. `StringIndexerModel`. jkbradley Author: Xiangrui Meng <meng@databricks.com> Closes apache#6595 from mengxr/SPARK-8051 and squashes the following commits: b6a36b9 [Xiangrui Meng] add doc f143fd4 [Xiangrui Meng] Merge remote-tracking branch 'apache/master' into SPARK-8051 8ee7c7e [Xiangrui Meng] use SparkFunSuite e112394 [Xiangrui Meng] make StringIndexerModel silent if input column does not exist
This is just a workaround to a bigger problem. Some pipeline stages may not be effective during prediction, and they should not complain about missing required columns, e.g. `StringIndexerModel`. @jkbradley
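A minimal sketch of the workaround's behavior, again as a self-contained Python toy rather than the actual Scala change. `ToyStringIndexerModel`, its method names, and the dict-of-columns "table" are assumptions for illustration, and the label order is passed in directly instead of being learned by a fit. On the assumption that the real fix has to cover schema validation as well as the data itself, both steps return their input unchanged when the input column is absent instead of raising.

```python
from typing import Dict, List

# Toy "DataFrame": column name -> list of values.
Table = Dict[str, List[object]]


class ToyStringIndexerModel:
    """Toy stand-in for a fitted string indexer (not Spark's class)."""

    def __init__(self, input_col: str, output_col: str, labels: List[str]):
        self.input_col = input_col
        self.output_col = output_col
        # Map each label (in the given order) to a numeric index.
        self.index = {label: float(i) for i, label in enumerate(labels)}

    def transform_schema(self, schema: List[str]) -> List[str]:
        if self.input_col not in schema:
            return list(schema)  # input column missing: leave the schema untouched
        return list(schema) + [self.output_col]

    def transform(self, table: Table) -> Table:
        if self.input_col not in table:
            # Input column missing (e.g. unlabeled prediction data): skip silently.
            return table
        out = dict(table)  # copy so the caller's table is not mutated
        out[self.output_col] = [self.index[v] for v in table[self.input_col]]
        return out


indexer = ToyStringIndexerModel("label", "labelIndex", ["a", "b"])
train = {"label": ["a", "b", "a"], "x": [1.0, 2.0, 3.0]}
test = {"x": [4.0, 5.0]}  # no label column at prediction time
print(indexer.transform(train)["labelIndex"])  # [0.0, 1.0, 0.0]
print(indexer.transform(test))  # {'x': [4.0, 5.0]}
```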