
Bump version to v0.14

mhamilton723 committed Sep 23, 2018
1 parent 25f3601 commit 7eed833156f507357c8b43aefcf77ddfab0f2be9
Showing with 32 additions and 27 deletions.
  1. +23 −18
  2. +3 −3 docs/
  3. +4 −4 docs/
  4. +2 −2 docs/
@@ -5,15 +5,20 @@ Microsoft Machine Learning for Apache Spark
 <img title="Build Status" align="right"
 src="" />
-MMLSpark provides a number of deep learning and data science tools for [Apache
-Spark](, including seamless integration of
-Spark Machine Learning pipelines with [Microsoft Cognitive Toolkit
-(CNTK)]( and
-[OpenCV](, enabling you to quickly create powerful,
-highly-scalable predictive and analytical models for large image and text
-datasets.
-MMLSpark requires Scala 2.11, Spark 2.1+, and either Python 2.7 or Python 3.5+.
+MMLSpark is an ecosystem of tools aimed at expanding the distributed computing framework
+[Apache Spark]( in several new directions.
+MMLSpark adds a number of deep learning and data science tools to the Spark ecosystem,
+including seamless integration of Spark Machine Learning pipelines with [Microsoft Cognitive Toolkit
+(CNTK)](, [LightGBM]( and
+[OpenCV]( This enables powerful and highly-scalable predictive and analytical models
+for a variety of data sources.
+MMLSpark also brings new networking capabilities to the Spark ecosystem. With the HTTP on Spark project, users
+can embed **any** web service into their SparkML models. In this vein, MMLSpark provides easy-to-use
+SparkML transformers for a wide variety of [Microsoft Cognitive Services](
+For production-grade deployment, the Spark Serving project enables high-throughput,
+sub-millisecond-latency web services, backed by your Spark cluster.
+MMLSpark requires Scala 2.11, Spark 2.3+, and either Python 2.7 or Python 3.5+.
 See the API documentation [for
 Scala]( and [for
@@ -151,9 +156,9 @@ MMLSpark can be conveniently installed on existing Spark clusters via the
`--packages` option, examples:
-spark-shell --packages Azure:mmlspark:0.13
-pyspark --packages Azure:mmlspark:0.13
-spark-submit --packages Azure:mmlspark:0.13 MyApp.jar
+spark-shell --packages Azure:mmlspark:0.14
+pyspark --packages Azure:mmlspark:0.14
+spark-submit --packages Azure:mmlspark:0.14 MyApp.jar
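The `--packages` arguments above are standard Maven coordinates of the form `groupId:artifactId:version`. As a quick illustration (a plain-Python sketch for this page, not part of the README or of Spark), the components can be picked apart like so:

```python
# Maven coordinates, as used by --packages and spark.jars.packages,
# have the form groupId:artifactId:version (e.g. "Azure:mmlspark:0.14").
def parse_coordinate(coord):
    """Split a Maven coordinate string into its three components."""
    group_id, artifact_id, version = coord.split(":")
    return {"group": group_id, "artifact": artifact_id, "version": version}

print(parse_coordinate("Azure:mmlspark:0.14"))
# → {'group': 'Azure', 'artifact': 'mmlspark', 'version': '0.14'}
```

Bumping the release, as this commit does, only changes the final `version` component; the group and artifact stay the same.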
This can be used in other Spark contexts too; for example, you can use MMLSpark
@@ -168,14 +173,14 @@ cloud](, create a new [library from Maven
in your workspace.
-For the coordinates use: `Azure:mmlspark:0.13`. Ensure this library is
+For the coordinates use: `Azure:mmlspark:0.14`. Ensure this library is
attached to all clusters you create.
Finally, ensure that your Spark cluster has at least Spark 2.1 and Scala 2.11.
You can use MMLSpark in both your Scala and PySpark notebooks. To get started with our example notebooks, import the following Databricks archive:
### Docker
@@ -208,7 +213,7 @@ the above example, or from python:
import pyspark
spark = pyspark.sql.SparkSession.builder.appName("MyApp") \
-    .config("spark.jars.packages", "Azure:mmlspark:0.13") \
+    .config("spark.jars.packages", "Azure:mmlspark:0.14") \
     .getOrCreate()
import mmlspark
@@ -224,7 +229,7 @@ running script actions, see [this
The script action url is:
If you're using the Azure Portal to run the script action, go to `Script
actions` → `Submit new` in the `Overview` section of your cluster blade. In
@@ -240,7 +245,7 @@ your `build.sbt`:
resolvers += "MMLSpark Repo" at ""
-libraryDependencies += "" %% "mmlspark" % "0.13"
+libraryDependencies += "" %% "mmlspark" % "0.14"
### Building from source
@@ -314,4 +319,4 @@ PMML](
*Apache®, Apache Spark, and Spark® are either registered trademarks or
trademarks of the Apache Software Foundation in the United States and/or other
@@ -10,7 +10,7 @@ To install the current MMLSpark package for R use:
@@ -23,7 +23,7 @@ It will take some time to install all dependencies. Then, run:
config <- spark_config()
-config$sparklyr.defaultPackages <- "Azure:mmlspark:0.13"
+config$sparklyr.defaultPackages <- "Azure:mmlspark:0.14"
sc <- spark_connect(master = "local", config = config)
@@ -83,7 +83,7 @@ and then use spark_connect with method = "databricks":
sc <- spark_connect(method = "databricks")
@@ -29,7 +29,7 @@ You can now select one of the sample notebooks and run it, or create your own.
In the above, `microsoft/mmlspark` specifies the project and image name that you
want to run. There is another component implicit here which is the *tag* (=
version) that you want to use — specifying it explicitly looks like
-`microsoft/mmlspark:0.13` for the `0.13` tag.
+`microsoft/mmlspark:0.14` for the `0.14` tag.
Leaving `microsoft/mmlspark` by itself has an implicit `latest` tag, so it is
equivalent to `microsoft/mmlspark:latest`. The `latest` tag is identical to the
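The implicit-`latest` rule described above can be sketched in a few lines (a hypothetical helper for illustration only, not part of Docker or MMLSpark):

```python
def image_ref(name, tag=None):
    # Docker treats a bare image name as having the implicit tag "latest";
    # an explicit tag such as "0.14" pins a specific release instead.
    return "%s:%s" % (name, tag if tag is not None else "latest")

print(image_ref("microsoft/mmlspark"))          # → microsoft/mmlspark:latest
print(image_ref("microsoft/mmlspark", "0.14"))  # → microsoft/mmlspark:0.14
```

Pinning an explicit tag is the safer choice for reproducible setups, since `latest` moves with each release.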
@@ -47,7 +47,7 @@ that you will probably want to use can look as follows:
-p \
-v ~/myfiles:/notebooks/myfiles \
In this example, backslashes are used to break things up for readability; you
@@ -59,7 +59,7 @@ path and line breaks looks a little different:
-p `
-v C:\myfiles:/notebooks/myfiles `
Let's break this command and go over the meaning of each part:
@@ -143,7 +143,7 @@ Let's break this command and go over the meaning of each part:
-* **`microsoft/mmlspark:0.13`**
+* **`microsoft/mmlspark:0.14`**
Finally, this specifies an explicit version tag for the image that we want to
@@ -26,7 +26,7 @@ to check availability in your data center.
MMLSpark provides an Azure Resource Manager (ARM) template to create a
default setup that includes an HDInsight cluster and a GPU machine for
training. The template can be found here:
It has the following parameters that configure the HDI Spark cluster and
the associated GPU VM:
@@ -69,7 +69,7 @@ GPU VM setup template at experimentation time.
### 1. Deploy an ARM template within the [Azure Portal](
[Click here to open the above main
template]( in the Azure portal.
(If needed, you can click the **Edit template** button to view and edit the
