PMML4S-Spark

PMML4S-Spark is a PMML (Predictive Model Markup Language) scoring library for Spark, exposed as a Spark ML Transformer.

Features

PMML4S-Spark is the Spark wrapper of PMML4S; see the PMML4S project for details on supported models and features.

Prerequisites

  • Spark >= 2.0.0

Installation

PMML4S-Spark is available from Maven Central.

Latest release: Maven Central

SBT users
libraryDependencies += "org.pmml4s" %% "pmml4s-spark" % pmml4sSparkVersion
Maven users
<dependency>
  <groupId>org.pmml4s</groupId>
  <artifactId>pmml4s-spark_${scala.version}</artifactId>
  <version>${pmml4s-spark.version}</version>
</dependency>
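
As a rough sketch of how the two snippets above relate: sbt's `%%` operator appends the project's Scala binary version to the artifact name, which corresponds to the `_${scala.version}` suffix in the Maven coordinates (so that property should hold a binary version such as `2.12`, not a full version like `2.12.18`). The version strings below are placeholders, and marking Spark as `Provided` is an assumption that suits jobs launched with `spark-submit`; adjust both to your setup.

```scala
// build.sbt (illustrative fragment; every version value is a placeholder)
val pmml4sSparkVersion = "<pmml4s-spark version>" // look up the latest release on Maven Central
val sparkVersion       = "<spark version>"        // must satisfy the Spark >= 2.0.0 prerequisite

libraryDependencies ++= Seq(
  // Assumption: Spark is supplied by the cluster at runtime, hence Provided.
  // spark-mllib is used because it contains the Spark ML Transformer API.
  "org.apache.spark" %% "spark-mllib" % sparkVersion % Provided,
  // %% expands to pmml4s-spark_<scala binary version>, e.g. pmml4s-spark_2.12
  "org.pmml4s" %% "pmml4s-spark" % pmml4sSparkVersion
)
```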

Use PMML for Spark in Scala

  1. Load model.

    import scala.io.Source
    import org.pmml4s.model.Model
    import org.pmml4s.spark.ScoreModel
    
    // The main constructor accepts an object of org.pmml4s.model.Model
    val model = ScoreModel(Model(Source.fromURL(new java.net.URL("http://dmg.org/pmml/pmml_examples/KNIME_PMML_4.1_Examples/single_iris_dectree.xml"))))

    or

    import org.pmml4s.spark.ScoreModel
    
    // Load the model with one of the helper methods, e.g. from a pathname, a file object, a string, an array of bytes, or an input stream.
    val model = ScoreModel.fromFile("single_iris_dectree.xml")
  2. Call transform(dataset) to run a batch score against an input dataset.

    // The data is from http://dmg.org/pmml/pmml_examples/Iris.csv
    val df = spark.read.
      format("csv").
      options(Map("header" -> "true", "inferSchema" -> "true")).
      load("Iris.csv")
    
    val scoreDf = model.transform(df)
    scala> scoreDf.show(5)
    +------------+-----------+------------+-----------+-----------+---------------+-----------+-----------------------+---------------------------+--------------------------+-------+
    |sepal_length|sepal_width|petal_length|petal_width|      class|predicted_class|probability|probability_Iris-setosa|probability_Iris-versicolor|probability_Iris-virginica|node_id|
    +------------+-----------+------------+-----------+-----------+---------------+-----------+-----------------------+---------------------------+--------------------------+-------+
    |         5.1|        3.5|         1.4|        0.2|Iris-setosa|    Iris-setosa|        1.0|                    1.0|                        0.0|                       0.0|      1|
    |         4.9|        3.0|         1.4|        0.2|Iris-setosa|    Iris-setosa|        1.0|                    1.0|                        0.0|                       0.0|      1|
    |         4.7|        3.2|         1.3|        0.2|Iris-setosa|    Iris-setosa|        1.0|                    1.0|                        0.0|                       0.0|      1|
    |         4.6|        3.1|         1.5|        0.2|Iris-setosa|    Iris-setosa|        1.0|                    1.0|                        0.0|                       0.0|      1|
    |         5.0|        3.6|         1.4|        0.2|Iris-setosa|    Iris-setosa|        1.0|                    1.0|                        0.0|                       0.0|      1|
    +------------+-----------+------------+-----------+-----------+---------------+-----------+-----------------------+---------------------------+--------------------------+-------+
    only showing top 5 rows
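
Because the scored result is an ordinary Spark DataFrame, it can be post-processed with the usual DataFrame API. The following is a minimal sketch, not part of the library: the object name `IrisScoringCheck` and the helper `accuracy` are hypothetical, and the column names `class` and `predicted_class` are taken from the sample output above. It assumes `single_iris_dectree.xml` and `Iris.csv` are in the working directory and that a local Spark master is acceptable.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.col
import org.pmml4s.spark.ScoreModel

// Hypothetical helper object for illustration only.
object IrisScoringCheck {

  /** Fraction of rows whose predicted label equals the actual label (0.0 for an empty input). */
  def accuracy(scored: DataFrame, actual: String, predicted: String): Double = {
    val total = scored.count()
    if (total == 0) 0.0
    else scored.filter(col(actual) === col(predicted)).count().toDouble / total
  }

  def main(args: Array[String]): Unit = {
    // Assumption: running locally; drop .master(...) when submitting to a cluster.
    val spark = SparkSession.builder()
      .appName("pmml4s-spark-iris-check")
      .master("local[*]")
      .getOrCreate()

    val model = ScoreModel.fromFile("single_iris_dectree.xml")
    val df = spark.read
      .options(Map("header" -> "true", "inferSchema" -> "true"))
      .csv("Iris.csv")

    val scoreDf = model.transform(df)

    // Keep only the columns needed downstream.
    scoreDf.select("class", "predicted_class", "probability").show(5)

    // Compare the model's predictions with the labels shipped in the CSV.
    val acc = accuracy(scoreDf, "class", "predicted_class")
    println(s"accuracy = $acc")

    spark.stop()
  }
}
```

Keeping the comparison in a small function that takes a DataFrame and two column names lets the same check be reused for any classification model whose output includes a predicted-label column.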

Use PMML for Spark in Java

  1. Load model.

    import org.pmml4s.spark.ScoreModel;
    
    // Load the model with one of the helper methods, e.g. from a pathname, a file object, a string, an array of bytes, or an input stream.
    ScoreModel model = ScoreModel.fromFile("single_iris_dectree.xml");
  2. Call transform(dataset) to run a batch score against an input dataset.

    import org.apache.spark.sql.Dataset;
    
    // The data is from http://dmg.org/pmml/pmml_examples/Iris.csv
    Dataset<?> df = spark.read().
       format("csv").
       option("header", "true").
       option("inferSchema", "true").
       load("Iris.csv");
    
    Dataset<?> scoreDf = model.transform(df);
    scoreDf.show(5);

Use PMML in PySpark

See the PyPMML-Spark project. PyPMML-Spark is a Python PMML scoring library for PySpark, exposed as a Spark ML Transformer; it is the Python API for PMML4S-Spark.

Use PMML in Scala or Java

See the PMML4S project. PMML4S is a PMML scoring library for Scala that provides both Scala and Java Evaluator APIs for PMML.

Use PMML in Python

See the PyPMML project. PyPMML is a Python PMML scoring library; it is the Python API for PMML4S.

Deploy PMML as REST API

See the AI-Serving project. AI-Serving serves AI/ML models in the open standard formats PMML and ONNX over both HTTP (REST API) and gRPC endpoints.

Deploy and Manage AI/ML models at scale

See the DaaS system that deploys AI/ML models in production at scale on Kubernetes.

Support

If you have any questions about the PMML4S-Spark library, please open an issue in this repository.

Feedback and contributions of any kind are always very welcome.

License

PMML4S-Spark is licensed under the Apache License 2.0 (APL 2.0).
