[SYSTEMML-769] [WIP] Support for automatic detection of native BLAS and GPU backend

1. Support for automatic detection of BLAS (MKL and OpenBLAS) and GPU
backend.
2. Added native matmult and conv2d functions. If the native library is not available, we fall back to the Java implementation.
3. This will allow us to explore a distributed GPU solution.
Niketan Pansare committed Jan 6, 2017
1 parent 7a30925 commit e2d9a16
Showing 29 changed files with 558 additions and 173 deletions.
137 changes: 137 additions & 0 deletions docs/accelerator.md
@@ -0,0 +1,137 @@
---
layout: global
title: Using systemml-accelerator
description: Using systemml-accelerator
---
<!--
{% comment %}
Licensed to the Apache Software Foundation (ASF) under one or more
contributor license agreements. See the NOTICE file distributed with
this work for additional information regarding copyright ownership.
The ASF licenses this file to you under the Apache License, Version 2.0
(the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
{% endcomment %}
-->

* This will become a table of contents (this text will be scraped).
{:toc}

<br/>

## Introduction

The [systemml-accelerator](https://github.com/niketanpansare/systemml-accelerator) package bundles system-dependent libraries
to simplify deployment. It lets SystemML use native BLAS libraries as well as hardware accelerators (such as Nvidia GPUs).

If you are [installing SystemML using pip](https://apache.github.io/incubator-systemml/beginners-guide-python#install-systemml),
no additional action is required. If you intend to use SystemML in any other way, you must ensure that `systemml-accelerator.jar`
is available on the classpath.

## Using native BLAS

By default, SystemML implements all its matrix operations in Java, which simplifies deployment, especially in a distributed environment.
However, in some cases (such as deep learning), you may want to use native BLAS rather than SystemML's internal Java library.
The current version supports only a 64-bit JVM together with Intel MKL (recommended) or OpenBLAS; for any other setup, we fall back to SystemML's internal Java library.
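
For example, a dense matrix multiplication like the following minimal DML sketch (it mirrors the test snippet later in this guide) is the kind of operation that is routed to native BLAS when MKL or OpenBLAS is found:

```
X = matrix(0.1, rows=1000, cols=1000)
Y = matrix(0.2, rows=1000, cols=1000)
Z = X %*% Y
print(sum(Z))
```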

### Steps for installing Intel MKL

Download and install the [community version of Intel MKL](https://software.intel.com/sites/campaigns/nest/).
Intel requires you to register your email address first and then sends the download link, along with a license key, to that address.

<div class="codetabs">
<div data-lang="Linux" markdown="1">
```bash
# Extract the downloaded .tgz file and execute the bundled install.sh
# (<mkl-package> stands for the file name Intel sends you):
tar -xzf <mkl-package>.tgz
cd <mkl-package>
sudo ./install.sh
```
</div>
<div data-lang="Windows" markdown="1">
```bash
# Execute the downloaded .exe file and follow the guided setup.
```
</div>
</div>

### Steps for installing OpenBLAS

<div class="codetabs">
<div data-lang="Linux" markdown="1">
```bash
# 1. Install OpenBLAS via yum/apt-get:
# Fedora, CentOS
sudo yum install openblas
# Ubuntu (the apt package is named libopenblas; -dev provides the unversioned .so)
sudo apt-get install libopenblas-dev

# 2. If OpenBLAS is not picked up, double-check that you are using 64-bit Java
# and that libopenblas is visible to the loader:
ldconfig -p | grep libopenblas
# You may have to add an explicit link using the following command:
sudo ln -s /lib64/libopenblas.so.0 /usr/lib64/libopenblas.so
```
</div>
<div data-lang="Windows" markdown="1">
```bash
# Download the pre-built binaries or build from source (see the links below).
```
</div>
</div>
Links:

1. [Pre-built OpenBLAS binaries](https://sourceforge.net/projects/openblas/)
2. [OpenBLAS source](https://github.com/xianyi/OpenBLAS)

By default, SystemML searches first for Intel MKL and then for OpenBLAS to select the underlying BLAS.
If neither is found, we fall back to SystemML's internal Java library.
To explicitly select the underlying BLAS or to disable native BLAS, set
the environment variable `SYSTEMML_BLAS` to `mkl`, `openblas`, or `none`.
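
For example, a minimal sketch:

```bash
# Prefer OpenBLAS even when MKL is installed; use none to disable native BLAS entirely.
export SYSTEMML_BLAS=openblas
```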

## Using GPU

To exploit the GPU, SystemML requires that CUDA 8.0 and cuDNN 5.1 are installed on the machine;
if these libraries are not installed, we fall back to a non-GPU plan.
As with native BLAS, exploiting the GPU requires that `systemml-accelerator.jar` is available.
If you want to explicitly disable the GPU, set the environment variable `SYSTEMML_GPU` to `none` (default: `cuda`).
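
For example:

```bash
# Explicitly disable the GPU backend (the default is cuda).
export SYSTEMML_GPU=none
```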

To test whether BLAS and/or the GPU is enabled, follow the steps below:

<div class="codetabs">
<div data-lang="PySpark" markdown="1">
```python
from systemml import random
m1 = random.uniform(size=(1000,1000))
m2 = random.uniform(size=(1000,1000))
m3 = m1.dot(m2).toNumPy()
```
</div>
<div data-lang="Scala" markdown="1">
```bash
SYSTEMML_HOME=`python -c 'import imp; import os; print imp.find_module("systemml")[1]'`
SYSTEMML_JAR=`ls $SYSTEMML_HOME/systemml-java/systemml*incubating*.jar`
ACCELERATOR_JAR=`ls $SYSTEMML_HOME/systemml-java/systemml-accelerator.jar`
$SPARK_HOME/bin/spark-shell --jars $SYSTEMML_JAR,$ACCELERATOR_JAR
scala> import org.apache.sysml.api.mlcontext._
scala> import org.apache.sysml.api.mlcontext.ScriptFactory._
scala> val ml = new MLContext(sc)
scala> val script = dml("X = matrix(0.1, rows=1000, cols=1000); Y = matrix(0.2, rows=1000, cols=1000); Z = X %*% Y; print(sum(Z))")
scala> ml.execute(script)
```
</div>
</div>

The above script should output either

```bash
accelerator.BLASHelper: Found BLAS: (mkl/openblas)
```

or

```bash
accelerator.LibraryLoader: Unable to load (MKL/OpenBLAS)
```

Note: if `systemml-accelerator.jar` is not included via `--jars` (for `spark-shell` or `spark-submit`), we fall back to SystemML's internal Java library.
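
For example, a minimal `spark-submit` sketch that includes the accelerator jar (the jar paths and the script name `myscript.dml` here are assumptions; adjust them to your setup):

```bash
$SPARK_HOME/bin/spark-submit \
  --jars ./systemml-accelerator.jar \
  --class org.apache.sysml.api.DMLScript \
  SystemML.jar -f myscript.dml -stats
```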
53 changes: 8 additions & 45 deletions pom.xml
@@ -83,14 +83,6 @@
 			<enabled>true</enabled>
 		</releases>
 	</repository>
-	<repository>
-		<id>mavenized-jcuda-mvn-repo</id>
-		<url>https://raw.github.com/niketanpansare/mavenized-jcuda/mvn-repo/</url>
-		<snapshots>
-			<enabled>true</enabled>
-			<updatePolicy>always</updatePolicy>
-		</snapshots>
-	</repository>
 </repositories>

 <build>
@@ -1002,50 +994,21 @@

 	<dependencies>

-	<!-- For GPU backend
-		Use org.mystic:mavenized-jcuda until Alan puts org.jcuda:*
-	-->
 	<dependency>
-		<groupId>org.mystic</groupId>
-		<artifactId>mavenized-jcuda</artifactId>
-		<version>0.7.5b</version>
-		<type>jar</type>
-		<scope>provided</scope>
-		<exclusions>
+		<groupId>org.systemml</groupId>
+		<artifactId>accelerator</artifactId>
+		<version>0.0.1-SNAPSHOT</version>
+		<scope>system</scope>
+		<!-- Useful for pip install -->
+		<systemPath>${project.basedir}/src/test/config/local_jars/systemml-accelerator.jar</systemPath>
+		<exclusions>
 			<exclusion>
 				<groupId>*</groupId>
 				<artifactId>*</artifactId>
 			</exclusion>
 		</exclusions>
 	</dependency>
-	<!-- Since there is no mvn repo for jcuda
-	<dependency>
-		<groupId>org.jcuda</groupId>
-		<artifactId>jcuda</artifactId>
-		<version>0.7.5b</version>
-		<scope>provided</scope>
-	</dependency>
-	<dependency>
-		<groupId>org.jcuda</groupId>
-		<artifactId>jcublas</artifactId>
-		<version>0.7.5b</version>
-		<scope>provided</scope>
-	</dependency>
-	<dependency>
-		<groupId>org.jcuda</groupId>
-		<artifactId>jcusparse</artifactId>
-		<version>0.7.5b</version>
-		<scope>provided</scope>
-	</dependency>
-	<dependency>
-		<groupId>org.jcuda</groupId>
-		<artifactId>jcudnn</artifactId>
-		<version>0.7.5</version>
-		<scope>provided</scope>
-	</dependency>
-	-->
+	<!-- ************************* -->

 	<dependency>
 		<groupId>org.apache.spark</groupId>
 		<artifactId>spark-core_${scala.binary.version}</artifactId>
27 changes: 27 additions & 0 deletions scripts/perftest/microbenchmarks/matmult.dml
@@ -0,0 +1,27 @@
#-------------------------------------------------------------
#
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.
#
#-------------------------------------------------------------
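# Microbenchmark: repeated dense matrix multiplication.
# $1 = rows(X), $2 = cols(X) = rows(Y), $3 = cols(Y), $4 = number of iterations.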
X = matrix(0.1, rows=$1, cols=$2)
Y = matrix(0.2, rows=$2, cols=$3)
Z = matrix(0, rows=$1, cols=$3)
for(i in 1:$4) {
Z = Z + X %*% Y
}
print(as.scalar(Z[1,1]))
67 changes: 67 additions & 0 deletions scripts/perftest/microbenchmarks/runMatMultExperiments.sh
@@ -0,0 +1,67 @@
#!/bin/bash
#-------------------------------------------------------------
#
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.
#
#-------------------------------------------------------------

#-------------------------------------------------------------
export JAVA_HOME=... # 64-bit JVM
export SPARK_HOME=...

CONF="--master local[*] --executor-memory 5g"
invoke_systemml() {
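# Positional arguments (as used at the call sites below):
# $1 = rows(X), $2 = cols(X) = rows(Y), $3 = cols(Y),
# $4 = iterations, $5 = setup label written to time.txt, $6 = extra spark-submit flags (e.g., --jars)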
iter=$4
setup=$5
echo "Testing "$setup" with "$iter" iterations and using setup ["$1", "$2"] %*% ["$2", "$3"]"
tstart=$(date +%s.%N)
echo $JAVA_OPTS $OUTPUT_SYSTEMML_STATS
$SPARK_HOME/bin/spark-submit $CONF --class org.apache.sysml.api.DMLScript $6 SystemML.jar -f matmult.dml -stats -args $1 $2 $3 $4
ttime=$(echo "$(date +%s.%N) - $tstart" | bc)
echo $setup","$iter","$1","$2","$3","$ttime >> time.txt
}


rm -f time.txt # remove results from a previous run, if any
iter=1000
export SYSTEMML_GPU=none
echo "-------------------------"
for i in 1 10 100 1000 2000 5000 10000
do
for j in 1 10 100 1000 2000 5000 10000
do
for k in 1 10 100 1000 2000 5000 10000
do
# Intel MKL
export SYSTEMML_BLAS=mkl
invoke_systemml $i $j $k $iter IntelMKL "--jars ./systemml-accelerator.jar"

# OpenBLAS
export SYSTEMML_BLAS=openblas
invoke_systemml $i $j $k $iter OpenBLAS "--jars ./systemml-accelerator.jar"

# Java
invoke_systemml $i $j $k $iter Java ""

# GPU
export SYSTEMML_GPU=cuda
invoke_systemml $i $j $k $iter GPU "--jars ./systemml-accelerator.jar"
export SYSTEMML_GPU=none
done
done
done
35 changes: 14 additions & 21 deletions src/main/java/org/apache/sysml/api/DMLScript.java
@@ -66,6 +66,7 @@
 import org.apache.sysml.parser.ParseException;
 import org.apache.sysml.runtime.DMLRuntimeException;
 import org.apache.sysml.runtime.DMLScriptException;
+import org.apache.sysml.runtime.controlprogram.CPPUtil;
 import org.apache.sysml.runtime.controlprogram.Program;
 import org.apache.sysml.runtime.controlprogram.caching.CacheStatistics;
 import org.apache.sysml.runtime.controlprogram.caching.CacheableData;
@@ -129,6 +130,9 @@ public enum RUNTIME_PLATFORM {
 	public static boolean DISABLE_SPARSE = false;
 	public static boolean DISABLE_CACHING = false;
 	// ------------------------------------------------------------------------
+	// Native BLAS is enabled by default; we fall back to Java whenever the library is not available
+	// or the operation is not supported (e.g., sparse matrix multiplication).
+	public static final boolean ENABLE_NATIVE_BLAS = true;

 	// flag that indicates whether or not to suppress any prints to stdout
 	public static boolean _suppressPrint2Stdout = false;
@@ -148,8 +152,6 @@ public enum RUNTIME_PLATFORM {
 		//+ " -s: <filename> will be interpreted as a DML script string \n"
 		+ " -python: (optional) parses Python-like DML\n"
 		+ " -debug: (optional) run in debug mode\n"
-		+ " -gpu: <flags> (optional) use acceleration whenever possible. Current version only supports CUDA.\n"
-		+ "       Supported <flags> for this mode is force=(true|false)\n"
 		// Later add optional flags to indicate optimizations turned on or off. Currently they are turned off.
 		//+ " -debug: <flags> (optional) run in debug mode\n"
 		//+ "         Optional <flags> that is supported for this mode is optimize=(on|off)\n"
@@ -269,7 +271,7 @@ else if( args.length==1 && args[0].equalsIgnoreCase("-clean") ){

 		//parse arguments and set execution properties
 		RUNTIME_PLATFORM oldrtplatform = rtplatform; //keep old rtplatform
-		ExplainType oldexplain = EXPLAIN; //keep old explain 
+		ExplainType oldexplain = EXPLAIN; //keep old explain

 		// Reset global flags to avoid errors in test suite
 		ENABLE_DEBUG_MODE = false;
@@ -305,23 +307,6 @@ else if (args[i].equalsIgnoreCase("-config"))
 			else if( args[i].equalsIgnoreCase("-debug") ) {
 				ENABLE_DEBUG_MODE = true;
 			}
-			else if( args[i].equalsIgnoreCase("-gpu") ) {
-				USE_ACCELERATOR = true;
-				if( args.length > (i+1) && !args[i+1].startsWith("-") ) {
-					String flag = args[++i];
-					if(flag.startsWith("force=")) {
-						String [] flagOptions = flag.split("=");
-						if(flagOptions.length == 2)
-							FORCE_ACCELERATOR = Boolean.parseBoolean(flagOptions[1]);
-						else
-							throw new DMLRuntimeException("Unsupported \"force\" option for -gpu:" + flag);
-					}
-					else {
-						throw new DMLRuntimeException("Unsupported flag for -gpu:" + flag);
-					}
-				}
-				GPUContext.createGPUContext(); // Set GPU memory budget
-			}
 			else if( args[i].equalsIgnoreCase("-python") ) {
 				parsePyDML = true;
 			}
@@ -337,6 +322,10 @@ else if (args[i].startsWith("-args") || args[i].startsWith("-nvargs")) {
 			}
 		}

+		USE_ACCELERATOR = CPPUtil.isGPUAvailable();
+		if(USE_ACCELERATOR)
+			GPUContext.createGPUContext(); // Set GPU memory budget
+
 		//set log level
 		if (!ENABLE_DEBUG_MODE)
 			setLoggingProperties( conf );
@@ -383,6 +372,10 @@ else if (args[i].startsWith("-args") || args[i].startsWith("-nvargs")) {
 		return true;
 	}

+	public static boolean isNativeEnabled(int numThreads) {
+		return ENABLE_NATIVE_BLAS && CPPUtil.isLibraryLoaded() && (numThreads <= 0 || numThreads >= CPPUtil.maxNumThreads);
+	}
+
 	///////////////////////////////
 	// private internal utils (argument parsing)
 	////////
@@ -944,4 +937,4 @@ private static void cleanSystemMLWorkspace()
 			throw new DMLException("Failed to run SystemML workspace cleanup.", ex);
 		}
 	}
-}
+}
