-
Notifications
You must be signed in to change notification settings - Fork 28.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SPARK-26966][ML] Update to JPMML 1.4.8 #23868
Conversation
@@ -44,12 +44,12 @@ private[mllib] object PMMLModelExportFactory { | |||
new GeneralizedLinearPMMLModelExport(lasso, "lasso regression") | |||
case svm: SVMModel => | |||
new BinaryClassificationPMMLModelExport( | |||
svm, "linear SVM", RegressionNormalizationMethodType.NONE, | |||
svm, "linear SVM", RegressionModel.NormalizationMethod.NONE, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
weird, maybe I missed something obvious, but why is it a RegressionModel normalization method for something that is a classification model
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's because the two classifiers it supports are logistic regression and SVM. The former makes sense but not so much SVM, but, it looks like only PMML 4.3 established an export format for it: http://dmg.org/pmml/v4-3/SupportVectorMachine.html . But now we're on PMML 4.3 with this change. That's probably worth doing, though my real purpose in this change is to get Java 9+ working. That can be handled separately.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
RegressionModel
is a top-level PMML model element for representing all "dot product"-type models; the real mining function type (regression vs classification) is specified using the functionName
attribute. So, RegressionModel@functionName="classification"
is how communicates that this model element is encoding a classifier-type function.
Based on my experience with implementing Apache Spark ML-to-PMML converters (https://github.com/jpmml/jpmml-sparkml), then many non-decision tree based Apache Spark ML model classes (eg. NaiveBayesClassifier
, LinearSVC
) are based on the "dot product" business logic, and therefore reducible to the RegressionModel
element. In other words, there is no point in using more complex PMML model elements such as NaiveBayesModel
or SupportVectorMachineModel
, when the simplest RegressionModel
element will be able to capture everything.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks as always @vruusmann for review! glad we're getting on a newer version of JPMML
JPMML-Model development branches 1.3.X and 1.4.X are generating PMML schema version 4.3 markup by default; the 1.5.X development branch will be started when the PMML schema version 4.4 is released. Perhaps this information will be useful for mapping PMML schema versions with Apache Spark development branches (2.X vs 3.X?). Your "native PMML integration" probably doesn't cover decision tree models yet. Anyway, there have been a couple of breaking JPMML-Model API changes in the latest 1.4.X versions (such as making the |
Test build #102616 has finished for PR 23868 at commit
|
Test build #4564 has started for PR 23868 at commit |
Test build #4567 has finished for PR 23868 at commit
|
Test build #4568 has finished for PR 23868 at commit
|
Test build #4569 has finished for PR 23868 at commit
|
Test build #4570 has started for PR 23868 at commit |
Test build #4572 has finished for PR 23868 at commit
|
Merged to master |
JPMML apparently only supports Java 9 in 1.4.2+. We are seeing text failures from JPMML relating to JAXB when running on Java 11. It's shaded and not a big change, so should be safe. Existing tests. Closes apache#23868 from srowen/SPARK-26966. Authored-by: Sean Owen <sean.owen@databricks.com> Signed-off-by: Sean Owen <sean.owen@databricks.com>
What changes were proposed in this pull request?
JPMML apparently only supports Java 9 in 1.4.2+. We are seeing text failures from JPMML relating to JAXB when running on Java 11. It's shaded and not a big change, so should be safe.
How was this patch tested?
Existing tests.