Commit e4675c2

keypointt authored and mengxr committed
[SPARK-13018][DOCS] Replace example code in mllib-pmml-model-export.md using include_example
https://issues.apache.org/jira/browse/SPARK-13018

The example code in the user guide is embedded in the markdown and hence is not easy to test. It would be nice to test it automatically. This JIRA is to discuss options to automate example code testing and see what we can do in Spark 1.6.

The goal is to move the actual example code to spark/examples and test compilation in Jenkins builds. Then, in the markdown, we can reference part of the code to show in the user guide. This requires adding a Jekyll tag that is similar to https://github.com/jekyll/jekyll/blob/master/lib/jekyll/tags/include.rb, e.g., called include_example.

`{% include_example scala/org/apache/spark/examples/mllib/PMMLModelExportExample.scala %}`

Jekyll will find `examples/src/main/scala/org/apache/spark/examples/mllib/PMMLModelExportExample.scala`, pick the code blocks marked "example", and use them to replace the code block in `{% highlight %}` in the markdown.

See more sub-tasks in parent ticket: https://issues.apache.org/jira/browse/SPARK-11337

Author: Xin Ren <iamshrek@126.com>

Closes #11126 from keypointt/SPARK-13018.
1 parent cbeb006 commit e4675c2

2 files changed: +62 -32 lines changed

docs/mllib-pmml-model-export.md

Lines changed: 3 additions & 32 deletions
@@ -45,41 +45,12 @@ The table below outlines the `spark.mllib` models that can be exported to PMML a
 <div data-lang="scala" markdown="1">
 To export a supported `model` (see table above) to PMML, simply call `model.toPMML`.

+As well as exporting the PMML model to a String (`model.toPMML` as in the example above), you can export the PMML model to other formats.
+
 Refer to the [`KMeans` Scala docs](api/scala/index.html#org.apache.spark.mllib.clustering.KMeans) and [`Vectors` Scala docs](api/scala/index.html#org.apache.spark.mllib.linalg.Vectors) for details on the API.

 Here a complete example of building a KMeansModel and print it out in PMML format:
-{% highlight scala %}
-import org.apache.spark.mllib.clustering.KMeans
-import org.apache.spark.mllib.linalg.Vectors
-
-// Load and parse the data
-val data = sc.textFile("data/mllib/kmeans_data.txt")
-val parsedData = data.map(s => Vectors.dense(s.split(' ').map(_.toDouble))).cache()
-
-// Cluster the data into two classes using KMeans
-val numClusters = 2
-val numIterations = 20
-val clusters = KMeans.train(parsedData, numClusters, numIterations)
-
-// Export to PMML
-println("PMML Model:\n" + clusters.toPMML)
-{% endhighlight %}
-
-As well as exporting the PMML model to a String (`model.toPMML` as in the example above), you can export the PMML model to other formats:
-
-{% highlight scala %}
-// Export the model to a String in PMML format
-clusters.toPMML
-
-// Export the model to a local file in PMML format
-clusters.toPMML("/tmp/kmeans.xml")
-
-// Export the model to a directory on a distributed file system in PMML format
-clusters.toPMML(sc,"/tmp/kmeans")
-
-// Export the model to the OutputStream in PMML format
-clusters.toPMML(System.out)
-{% endhighlight %}
+{% include_example scala/org/apache/spark/examples/mllib/PMMLModelExportExample.scala %}

 For unsupported models, either you will not find a `.toPMML` method or an `IllegalArgumentException` will be thrown.

examples/src/main/scala/org/apache/spark/examples/mllib/PMMLModelExportExample.scala

Lines changed: 59 additions & 0 deletions
@@ -0,0 +1,59 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+// scalastyle:off println
+package org.apache.spark.examples.mllib
+
+import org.apache.spark.{SparkConf, SparkContext}
+// $example on$
+import org.apache.spark.mllib.clustering.KMeans
+import org.apache.spark.mllib.linalg.Vectors
+// $example off$
+
+object PMMLModelExportExample {
+
+  def main(args: Array[String]): Unit = {
+    val conf = new SparkConf().setAppName("PMMLModelExportExample")
+    val sc = new SparkContext(conf)
+
+    // $example on$
+    // Load and parse the data
+    val data = sc.textFile("data/mllib/kmeans_data.txt")
+    val parsedData = data.map(s => Vectors.dense(s.split(' ').map(_.toDouble))).cache()
+
+    // Cluster the data into two classes using KMeans
+    val numClusters = 2
+    val numIterations = 20
+    val clusters = KMeans.train(parsedData, numClusters, numIterations)
+
+    // Export to PMML to a String in PMML format
+    println("PMML Model:\n" + clusters.toPMML)
+
+    // Export the model to a local file in PMML format
+    clusters.toPMML("/tmp/kmeans.xml")
+
+    // Export the model to a directory on a distributed file system in PMML format
+    clusters.toPMML(sc, "/tmp/kmeans")
+
+    // Export the model to the OutputStream in PMML format
+    clusters.toPMML(System.out)
+    // $example off$
+
+    sc.stop()
+  }
+}
+// scalastyle:on println
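
The commit message says that `include_example` picks the code blocks marked "example", and the new file marks them with `// $example on$` and `// $example off$` comments. The actual tag is a Jekyll plugin and is not part of this diff. What follows is only a rough Scala sketch of the selection idea. The names `ExampleExtractor`, `extractSegments`, `dedent`, and `render` are hypothetical, and stripping the shared indentation and joining segments with a blank line are assumptions about presentation, not confirmed behavior of the plugin.

```scala
object ExampleExtractor {
  // Marker strings as they appear in PMMLModelExportExample.scala in this commit.
  private val OnMarker = "$example on$"
  private val OffMarker = "$example off$"

  /** Collects the lines between each on/off marker pair; marker lines are dropped. */
  def extractSegments(source: String): Seq[Seq[String]] = {
    val segments = Seq.newBuilder[Seq[String]]
    var current: Option[Vector[String]] = None
    for (line <- source.split("\n", -1)) {
      if (line.contains(OnMarker)) {
        current = Some(Vector.empty)
      } else if (line.contains(OffMarker)) {
        current.foreach(seg => segments += seg)
        current = None
      } else {
        // Lines outside a marked region are ignored because current is None.
        current = current.map(_ :+ line)
      }
    }
    segments.result()
  }

  /** Assumption: strip the indentation shared by all non-blank lines of a segment. */
  def dedent(lines: Seq[String]): Seq[String] = {
    val indents = lines.filter(_.trim.nonEmpty).map(_.takeWhile(_ == ' ').length)
    val common = if (indents.isEmpty) 0 else indents.min
    lines.map(l => if (l.trim.isEmpty) "" else l.drop(common))
  }

  /** Assumption: join the dedented segments with one blank line between them. */
  def render(source: String): String =
    extractSegments(source).map(dedent).map(_.mkString("\n")).mkString("\n\n")

  def main(args: Array[String]): Unit = {
    // A shortened stand-in for the example file, not the full file from the diff.
    val source =
      """import org.apache.spark.{SparkConf, SparkContext}
        |// $example on$
        |import org.apache.spark.mllib.clustering.KMeans
        |// $example off$
        |
        |object Demo {
        |  def main(args: Array[String]): Unit = {
        |    // $example on$
        |    val numClusters = 2
        |    // $example off$
        |  }
        |}""".stripMargin
    println(render(source))
  }
}
```

Run on the shortened input, the sketch prints the `KMeans` import and the `val numClusters = 2` line, separated by a blank line. The `SparkConf`/`SparkContext` setup is left out, which matches the way the marked regions in the new file exclude that boilerplate.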
