PMML4S-Spark is a PMML (Predictive Model Markup Language) scoring library for Spark, implemented as a SparkML Transformer.
PMML4S-Spark is the Spark wrapper of PMML4S; see the PMML4S project for details.
- Spark >= 2.0.0
libraryDependencies += "org.pmml4s" %% "pmml4s-spark" % pmml4sSparkVersion
<dependency>
<groupId>org.pmml4s</groupId>
<artifactId>pmml4s-spark_${scala.version}</artifactId>
<version>${pmml4s-spark.version}</version>
</dependency>
- Load the model in Scala.
import scala.io.Source
import org.pmml4s.model.Model
import org.pmml4s.spark.ScoreModel

// The main constructor accepts an object of org.pmml4s.model.Model
val model = ScoreModel(Model(Source.fromURL(new java.net.URL("http://dmg.org/pmml/pmml_examples/KNIME_PMML_4.1_Examples/single_iris_dectree.xml"))))
or
import org.pmml4s.spark.ScoreModel

// Load the model with one of the helper methods, which accept a pathname, a file object, a string, an array of bytes, or an input stream.
val model = ScoreModel.fromFile("single_iris_dectree.xml")
- Call transform(dataset) to run a batch score against an input dataset.

// The data is from http://dmg.org/pmml/pmml_examples/Iris.csv
val df = spark.read.
  format("csv").
  options(Map("header" -> "true", "inferSchema" -> "true")).
  load("Iris.csv")
val scoreDf = model.transform(df)

scala> scoreDf.show(5)
+------------+-----------+------------+-----------+-----------+---------------+-----------+-----------------------+---------------------------+--------------------------+-------+
|sepal_length|sepal_width|petal_length|petal_width|      class|predicted_class|probability|probability_Iris-setosa|probability_Iris-versicolor|probability_Iris-virginica|node_id|
+------------+-----------+------------+-----------+-----------+---------------+-----------+-----------------------+---------------------------+--------------------------+-------+
|         5.1|        3.5|         1.4|        0.2|Iris-setosa|    Iris-setosa|        1.0|                    1.0|                        0.0|                       0.0|      1|
|         4.9|        3.0|         1.4|        0.2|Iris-setosa|    Iris-setosa|        1.0|                    1.0|                        0.0|                       0.0|      1|
|         4.7|        3.2|         1.3|        0.2|Iris-setosa|    Iris-setosa|        1.0|                    1.0|                        0.0|                       0.0|      1|
|         4.6|        3.1|         1.5|        0.2|Iris-setosa|    Iris-setosa|        1.0|                    1.0|                        0.0|                       0.0|      1|
|         5.0|        3.6|         1.4|        0.2|Iris-setosa|    Iris-setosa|        1.0|                    1.0|                        0.0|                       0.0|      1|
+------------+-----------+------------+-----------+-----------+---------------+-----------+-----------------------+---------------------------+--------------------------+-------+
only showing top 5 rows
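The scored DataFrame keeps the input columns next to the model outputs, so a quick sanity check is to compare class with predicted_class. The sketch below is one way to do that in spark-shell. The helper name accuracy is hypothetical and not part of PMML4S-Spark, and the default column names are simply those in the output shown above.

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.col

// Hypothetical helper, not part of PMML4S-Spark: the fraction of rows whose
// predicted label equals the actual label. The default column names are the
// ones that appear in the show() output above.
def accuracy(scored: DataFrame,
             labelCol: String = "class",
             predictionCol: String = "predicted_class"): Double = {
  val total = scored.count()
  val correct = scored.filter(col(labelCol) === col(predictionCol)).count()
  // Guard against an empty input instead of dividing by zero.
  if (total == 0L) 0.0 else correct.toDouble / total
}
```

With the session above, this would be called as accuracy(scoreDf). Other models may emit different output column names, so pass labelCol and predictionCol explicitly in that case.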
- Load the model in Java.
import org.pmml4s.spark.ScoreModel;

// Load the model with one of the helper methods, which accept a pathname, a file object, a string, an array of bytes, or an input stream.
ScoreModel model = ScoreModel.fromFile("single_iris_dectree.xml");
- Call transform(dataset) to run a batch score against an input dataset.

import org.apache.spark.sql.Dataset;

// The data is from http://dmg.org/pmml/pmml_examples/Iris.csv
Dataset<?> df = spark.read().
    format("csv").
    option("header", "true").
    option("inferSchema", "true").
    load("Iris.csv");
Dataset<?> scoreDf = model.transform(df);
scoreDf.show(5);
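Because ScoreModel is described as a SparkML Transformer, it should also be usable as a stage in a Spark ML Pipeline alongside other transformers. The following Scala sketch rests on that assumption (that ScoreModel extends org.apache.spark.ml.Transformer) and on Iris.csv and single_iris_dectree.xml being present in the working directory. The application name is arbitrary.

```scala
import org.apache.spark.ml.{Pipeline, PipelineStage}
import org.apache.spark.sql.SparkSession
import org.pmml4s.spark.ScoreModel

// A local session for illustration; in spark-shell the existing `spark` can be used instead.
val spark = SparkSession.builder()
  .appName("pmml4s-pipeline-sketch")
  .master("local[*]")
  .getOrCreate()

val df = spark.read
  .format("csv")
  .options(Map("header" -> "true", "inferSchema" -> "true"))
  .load("Iris.csv")

val model = ScoreModel.fromFile("single_iris_dectree.xml")

// Assumption: ScoreModel is a Transformer and therefore a valid PipelineStage.
// Further stages (for example, feature preparation) could be placed before it.
val pipeline = new Pipeline().setStages(Array[PipelineStage](model))

// A pipeline made only of transformers has nothing to learn, so fit() just
// wraps the stages into a PipelineModel that can then score the data.
val scored = pipeline.fit(df).transform(df)
scored.select("class", "predicted_class").show(5)
```

For a single model, calling model.transform(df) directly is simpler. The pipeline form is mainly useful when scoring is one step among several.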
See the PyPMML-Spark project. PyPMML-Spark is a Python PMML scoring library for PySpark, implemented as a SparkML Transformer; it is effectively the Python API for PMML4S-Spark.
See the PMML4S project. PMML4S is a PMML scoring library for Scala. It provides Evaluator APIs for both Scala and Java.
See the PyPMML project. PyPMML is a Python PMML scoring library; it is effectively the Python API for PMML4S.
See the AI-Serving project. AI-Serving serves AI/ML models in the open standard formats PMML and ONNX through both HTTP (REST API) and gRPC endpoints.
If you have any questions about the PMML4S-Spark library, please open an issue on this repository.
Feedback and contributions to the project, no matter what kind, are always very welcome.
PMML4S-Spark is licensed under the Apache License 2.0 (APL 2.0).