Update documentation for new build

mhamilton723 committed Jul 3, 2019
1 parent e9ef538 commit 637df9d34f508cd1c83542a69e922bc342b1fe0d
Showing with 105 additions and 192 deletions.
  1. +13 −11 README.md
  2. +85 −5 docs/developer-readme.md
  3. +0 −169 docs/gpu-setup.md
  4. +1 −1 docs/http.md
  5. +2 −2 docs/lightgbm.md
  6. +1 −1 docs/mmlspark-serving.md
  7. +3 −3 docs/your-first-model.md
@@ -249,10 +249,14 @@ your `build.sbt`:

### Building from source

You can also easily create your own build by cloning this repo and use the main
build script: `./runme`. Run it once to install the needed dependencies, and
again to do a build. See [this guide](docs/developer-readme.md) for more
information.

MMLSpark has recently transitioned to a new build infrastructure.
For detailed developer docs, please see the [Developer Readme](docs/developer-readme.md).

If you are an existing MMLSpark developer, you will need to reconfigure your
development setup. We now support platform-independent development and
integrate better with IntelliJ and SBT.
If you encounter issues, please reach out to our support email!

### R (Beta)

@@ -307,22 +311,20 @@ Issue](https://help.github.com/articles/creating-an-issue/).


## Other relevant projects

* [Microsoft Cognitive Toolkit](https://github.com/Microsoft/CNTK)

* [LightGBM](https://github.com/Microsoft/LightGBM)

* [DMTK: Microsoft Distributed Machine Learning Toolkit](https://github.com/Microsoft/DMTK)

* [Recommenders](https://github.com/Microsoft/Recommenders)

* [Azure Machine Learning
preview features](https://docs.microsoft.com/en-us/azure/machine-learning/preview)

* [JPMML-SparkML plugin for converting MMLSpark LightGBM models to
PMML](https://github.com/alipay/jpmml-sparkml-lightgbm)

* [Azure Machine Learning Studio](https://studio.azureml.net/)
* [Microsoft Cognitive Toolkit](https://github.com/Microsoft/CNTK)

* [Azure Machine Learning
preview features](https://docs.microsoft.com/en-us/azure/machine-learning/preview)


*Apache®, Apache Spark, and Spark® are either registered trademarks or
trademarks of the Apache Software Foundation in the United States and/or other
@@ -1,13 +1,93 @@
# MMLSpark Development
# MMLSpark Development Setup

1) [Install SBT](https://www.scala-sbt.org/1.x/docs/Setup.html)
- Make sure to download JDK 1.8 if you dont have it
- Make sure to download JDK 1.8 if you don't have it
2) Git Clone Repository
- `git clone https://github.com/Azure/mmlspark.git`
- `cd mmlspark`
- `git checkout build-refactor`
3) Run sbt to compile and grab datasets
- `sbt setup`
4) [Install IntelliJ](https://www.jetbrains.com/idea/download)
- Install Scala plugins during install
5) Configure IntelliJ
- OPEN the mmlspark directory
- click on build.sbt and import project (install scala/sbt plugins if needed)
- **OPEN** the mmlspark directory
- If the project does not automatically import, click on `build.sbt` and import project
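
The clone and build steps above can be sketched as a small Python script. This is a minimal sketch and not part of the repository: the names `setup_commands` and `run_setup` are hypothetical, and it assumes `git` and `sbt` are already on the `PATH`. Note that `git checkout` has to run inside the cloned `mmlspark` directory, so each command carries its own working directory.

```python
import subprocess
from pathlib import Path

REPO_URL = "https://github.com/Azure/mmlspark.git"
BRANCH = "build-refactor"


def setup_commands(workdir="."):
    """Return (cwd, argv) pairs for the clone, checkout, and sbt steps.

    Hypothetical helper: nothing is executed here, the list is only built.
    """
    base = Path(workdir)
    repo = base / "mmlspark"  # directory created by `git clone`
    return [
        (base, ["git", "clone", REPO_URL]),
        (repo, ["git", "checkout", BRANCH]),
        (repo, ["sbt", "setup"]),
    ]


def run_setup(workdir="."):
    """Run each command in order, stopping at the first failure."""
    for cwd, argv in setup_commands(workdir):
        subprocess.run(argv, cwd=cwd, check=True)
```

Calling `run_setup()` from an empty working directory would perform steps 2 and 3. The IntelliJ steps remain manual.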

# Publishing and Using Build Secrets

To use secrets in the build, you must be part of the mmlspark keyvault
and Azure subscription. If you are internal to Microsoft and would like to be
added, please reach out to `mmlspark-support@microsoft.com`.

# SBT Command Guide

## Scala build commands

### `compile`, `test:compile` and `it:compile`

Compiles the main, test, and integration test classes respectively

### `test`

Runs all mmlspark tests

### `scalastyle`

Runs scalastyle check

### `unidoc`

Generates documentation for scala sources

## Python Commands

### `createCondaEnv`

Creates a conda environment `mmlspark` from `environment.yaml` if it does not already exist.
This env is used for python testing. **Activate this env before using python build commands.**

### `cleanCondaEnv`

Removes `mmlspark` conda env

### `packagePython`

Compiles scala, runs python generation scripts, and creates a wheel

### `generatePythonDoc`

Generates documentation for generated python code

### `installPipPackage`

Installs generated python wheel into existing env

### `testPython`

Generates and runs python tests

## Environment + Publishing Commands

### `getDatasets`

Downloads all datasets used in tests to target folder

### `setup`

Combination of `compile`, `test:compile`, `it:compile`, `getDatasets`
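
As a rough illustration of what that combination amounts to, the sketch below builds an sbt batch-mode command line from the four listed tasks. It assumes that `setup` behaves like running them in the listed order, which the guide does not state explicitly. `sbt_argv` is a hypothetical helper, not something defined in the build.

```python
# Component tasks of `setup`, as listed in the command guide.
SETUP_TASKS = ["compile", "test:compile", "it:compile", "getDatasets"]


def sbt_argv(tasks):
    """Build an sbt batch-mode command line; sbt runs the tasks left to right."""
    return ["sbt", *tasks]


# Show the equivalent shell command without running it.
print(" ".join(sbt_argv(SETUP_TASKS)))
```

The same helper can be used to run a subset, for example only `compile` and `getDatasets`, when the test classes are not needed.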

### `package`

Packages the library into a jar

### `publishBlob`

Publishes Jar to mmlspark's azure blob based maven repo. (Requires Keys)

### `publishLocal`

Publishes library to local maven repo

### `publishDocs`

Publishes scala and python doc to mmlspark's build azure storage account. (Requires Keys)

This file was deleted.

@@ -24,7 +24,7 @@

```python
import mmlspark
from mmlspark import SimpleHTTPTransformer, JSONOutputParser
from mmlspark.io.http import SimpleHTTPTransformer, JSONOutputParser
from pyspark.sql.types import StructType, StringType
df = sc.parallelize([(x, ) for x in range(100)]).toDF("data")
@@ -30,7 +30,7 @@ many other machine learning tasks. LightGBM is part of Microsoft's
In PySpark, you can run the `LightGBMClassifier` via:

```python
from mmlspark import LightGBMClassifier
from mmlspark.lightgbm import LightGBMClassifier
model = LightGBMClassifier(learningRate=0.3,
numIterations=100,
numLeaves=31).fit(train)
@@ -40,7 +40,7 @@ Similarly, you can run the `LightGBMRegressor` by setting the
`application` and `alpha` parameters:

```python
from mmlspark import LightGBMRegressor
from mmlspark.lightgbm import LightGBMRegressor
model = LightGBMRegressor(application='quantile',
alpha=0.3,
learningRate=0.3,
@@ -57,7 +57,7 @@

```python
import mmlspark
from mmlspark import CNTKModel
from mmlspark.cntk import CNTKModel
import pyspark
from pyspark.sql.functions import udf, col
@@ -80,7 +80,7 @@ takes in training data and a base SparkML classifier, maps the data into the
format expected by the base classifier algorithm, and fits a model.

```python
from mmlspark.TrainClassifier import TrainClassifier
from mmlspark.train import TrainClassifier
from pyspark.ml.classification import LogisticRegression
model = TrainClassifier(model=LogisticRegression(), labelCol=" income").fit(train)
```
@@ -96,7 +96,7 @@ Finally, let's score the model against the test set, and use
precision, recall — from the scored data.

```python
from mmlspark.ComputeModelStatistics import ComputeModelStatistics
from mmlspark.train import ComputeModelStatistics
prediction = model.transform(test)
metrics = ComputeModelStatistics().transform(prediction)
metrics.select('accuracy').show()
@@ -107,7 +107,7 @@ package. For help on mmlspark classes and methods, you can use Python's help()
function, for example

```python
help(mmlspark.TrainClassifier)
help(mmlspark.train.TrainClassifier)
```

Next, view our other tutorials to learn how to
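
The doc hunks above all follow one pattern: classes that used to be imported from the top-level `mmlspark` package (or from a per-class module such as `mmlspark.TrainClassifier`) now live in submodules. A minimal sketch of a helper that rewrites such import lines is shown below. The mapping covers only the classes that appear in this commit, and `migrate_import` is a hypothetical name, not a tool shipped with MMLSpark.

```python
import re

# Class name -> new module, taken from the import changes in this commit.
NEW_MODULE = {
    "SimpleHTTPTransformer": "mmlspark.io.http",
    "JSONOutputParser": "mmlspark.io.http",
    "LightGBMClassifier": "mmlspark.lightgbm",
    "LightGBMRegressor": "mmlspark.lightgbm",
    "CNTKModel": "mmlspark.cntk",
    "TrainClassifier": "mmlspark.train",
    "ComputeModelStatistics": "mmlspark.train",
}

# Matches `from mmlspark import X, Y` and `from mmlspark.Foo import X`.
IMPORT_RE = re.compile(r"^from\s+mmlspark(?:\.\w+)*\s+import\s+(.+)$")


def migrate_import(line):
    """Return the rewritten import line(s); unknown lines pass through unchanged."""
    match = IMPORT_RE.match(line.strip())
    if not match:
        return [line]
    names = [name.strip() for name in match.group(1).split(",")]
    if any(name not in NEW_MODULE for name in names):
        return [line]  # a class this commit does not cover: leave it alone
    grouped = {}
    for name in names:
        grouped.setdefault(NEW_MODULE[name], []).append(name)
    return [f"from {mod} import {', '.join(ns)}" for mod, ns in grouped.items()]
```

Lines that are already in the new form map to themselves, so the helper can be applied repeatedly without changing the result.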
