Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-26966][ML] Update to JPMML 1.4.8 #23868

Closed
wants to merge 1 commit into from

Conversation

srowen
Copy link
Member

@srowen srowen commented Feb 22, 2019

What changes were proposed in this pull request?

JPMML apparently only supports Java 9 in 1.4.2+. We are seeing text failures from JPMML relating to JAXB when running on Java 11. It's shaded and not a big change, so should be safe.

How was this patch tested?

Existing tests.

@srowen srowen self-assigned this Feb 22, 2019
@@ -44,12 +44,12 @@ private[mllib] object PMMLModelExportFactory {
new GeneralizedLinearPMMLModelExport(lasso, "lasso regression")
case svm: SVMModel =>
new BinaryClassificationPMMLModelExport(
svm, "linear SVM", RegressionNormalizationMethodType.NONE,
svm, "linear SVM", RegressionModel.NormalizationMethod.NONE,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

weird, maybe I missed something obvious, but why is it a RegressionModel normalization method for something that is a classification model

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's because the two classifiers it supports are logistic regression and SVM. The former makes sense but not so much SVM, but, it looks like only PMML 4.3 established an export format for it: http://dmg.org/pmml/v4-3/SupportVectorMachine.html . But now we're on PMML 4.3 with this change. That's probably worth doing, though my real purpose in this change is to get Java 9+ working. That can be handled separately.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

RegressionModel is a top-level PMML model element for representing all "dot product"-type models; the real mining function type (regression vs classification) is specified using the functionName attribute. So, RegressionModel@functionName="classification" is how communicates that this model element is encoding a classifier-type function.

Based on my experience with implementing Apache Spark ML-to-PMML converters (https://github.com/jpmml/jpmml-sparkml), then many non-decision tree based Apache Spark ML model classes (eg. NaiveBayesClassifier, LinearSVC) are based on the "dot product" business logic, and therefore reducible to the RegressionModel element. In other words, there is no point in using more complex PMML model elements such as NaiveBayesModel or SupportVectorMachineModel, when the simplest RegressionModel element will be able to capture everything.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks as always @vruusmann for review! glad we're getting on a newer version of JPMML

@vruusmann
Copy link
Contributor

JPMML-Model development branches 1.3.X and 1.4.X are generating PMML schema version 4.3 markup by default; the 1.5.X development branch will be started when the PMML schema version 4.4 is released. Perhaps this information will be useful for mapping PMML schema versions with Apache Spark development branches (2.X vs 3.X?).

Your "native PMML integration" probably doesn't cover decision tree models yet. Anyway, there have been a couple of breaking JPMML-Model API changes in the latest 1.4.X versions (such as making the org.dmg.pmml.tree.Node class polymorphic), which are all aimed at reducing the memory consumption of the PMML class model object. There's a JPMML-Model version 1.4.9 coming up in a couple weeks time, which keeps moving the needle in this area.

@SparkQA
Copy link

SparkQA commented Feb 22, 2019

Test build #102616 has finished for PR 23868 at commit 3ff554d.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Feb 22, 2019

Test build #4564 has started for PR 23868 at commit 3ff554d.

@SparkQA
Copy link

SparkQA commented Feb 23, 2019

Test build #4567 has finished for PR 23868 at commit 3ff554d.

  • This patch fails Spark unit tests.
  • This patch does not merge cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Feb 24, 2019

Test build #4568 has finished for PR 23868 at commit 3ff554d.

  • This patch fails SparkR unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Feb 24, 2019

Test build #4569 has finished for PR 23868 at commit 3ff554d.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Feb 24, 2019

Test build #4570 has started for PR 23868 at commit 3ff554d.

@SparkQA
Copy link

SparkQA commented Feb 25, 2019

Test build #4572 has finished for PR 23868 at commit 3ff554d.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@srowen
Copy link
Member Author

srowen commented Feb 25, 2019

Merged to master

@srowen srowen closed this in d252978 Feb 25, 2019
@srowen srowen deleted the SPARK-26966 branch February 27, 2019 14:35
sumwale pushed a commit to TIBCOSoftware/snappy-spark that referenced this pull request Jun 10, 2022
JPMML apparently only supports Java 9 in 1.4.2+. We are seeing text failures from JPMML relating to JAXB when running on Java 11. It's shaded and not a big change, so should be safe.

Existing tests.

Closes apache#23868 from srowen/SPARK-26966.

Authored-by: Sean Owen <sean.owen@databricks.com>
Signed-off-by: Sean Owen <sean.owen@databricks.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants