[SYSTEMML-769] [WIP] Support for automatic detection of native BLAS and GPU backend

1. Support for automatic detection of BLAS (MKL and OpenBLAS) and GPU
backend.
2. Added native matmult and conv2d functions. If the native library is not available, we fall back to the Java implementation.
3. This will allow us to explore a distributed GPU solution.
Niketan Pansare committed Jan 6, 2017
1 parent 7a30925 commit e2d9a16
Showing 29 changed files with 558 additions and 173 deletions.
137 changes: 137 additions & 0 deletions docs/accelerator.md
@@ -0,0 +1,137 @@
---
layout: global
title: Using systemml-accelerator
description: Using systemml-accelerator
---
<!--
{% comment %}
Licensed to the Apache Software Foundation (ASF) under one or more
contributor license agreements. See the NOTICE file distributed with
this work for additional information regarding copyright ownership.
The ASF licenses this file to you under the Apache License, Version 2.0
(the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
{% endcomment %}
-->

* This will become a table of contents (this text will be scraped).
{:toc}

<br/>

## Introduction

The [systemml-accelerator](https://github.com/niketanpansare/systemml-accelerator) package bundles system-dependent libraries
to simplify deployment. It lets SystemML use native BLAS libraries as well as hardware accelerators (such as Nvidia GPUs).

If you are [installing SystemML using pip](https://apache.github.io/incubator-systemml/beginners-guide-python#install-systemml),
no additional action is required. If you intend to use SystemML in any other way, you must ensure that `systemml-accelerator.jar`
is available on the classpath.

## Using native BLAS

By default, SystemML implements all its matrix operations in Java, which simplifies deployment, especially in a distributed environment.
However, in some cases (such as deep learning), you may want to use native BLAS rather than SystemML's internal Java library.
The current version supports only a 64-bit JVM together with Intel MKL (recommended) or OpenBLAS; for any other setup, we fall back to SystemML's internal Java library.
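
For example, a dense matrix multiplication like the following minimal DML sketch (it mirrors the test snippet later in this guide) is the kind of operation that is routed to native BLAS when MKL or OpenBLAS is found:

```
X = matrix(0.1, rows=1000, cols=1000)
Y = matrix(0.2, rows=1000, cols=1000)
Z = X %*% Y
print(sum(Z))
```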

### Steps for installing Intel MKL

Download and install the [community version of Intel MKL](https://software.intel.com/sites/campaigns/nest/).
Intel requires you to register your email address first and then sends the download link, along with a license key, to that address.

<div class="codetabs">
<div data-lang="Linux" markdown="1">
```bash
# Extract the downloaded .tgz file and execute the bundled install.sh
# (<mkl-package> stands for the file name Intel sends you):
tar -xzf <mkl-package>.tgz
cd <mkl-package>
sudo ./install.sh
```
</div>
<div data-lang="Windows" markdown="1">
```bash
# Execute the downloaded .exe file and follow the guided setup.
```
</div>
</div>

### Steps for installing OpenBLAS

<div class="codetabs">
<div data-lang="Linux" markdown="1">
```bash
# 1. Install OpenBLAS via yum/apt-get:
# Fedora, CentOS
sudo yum install openblas
# Ubuntu (the apt package is named libopenblas; -dev provides the unversioned .so)
sudo apt-get install libopenblas-dev

# 2. If OpenBLAS is not picked up, double-check that you are using 64-bit Java
# and that libopenblas is visible to the loader:
ldconfig -p | grep libopenblas
# You may have to add an explicit link using the following command:
sudo ln -s /lib64/libopenblas.so.0 /usr/lib64/libopenblas.so
```
</div>
<div data-lang="Windows" markdown="1">
```bash
# Download the pre-built binaries or build from source (see the links below).
```
</div>
</div>
Links:

1. [Pre-built OpenBLAS binaries](https://sourceforge.net/projects/openblas/)
2. [OpenBLAS source](https://github.com/xianyi/OpenBLAS)

By default, SystemML searches first for Intel MKL and then for OpenBLAS to select the underlying BLAS.
If neither is found, we fall back to SystemML's internal Java library.
To explicitly select the underlying BLAS or to disable native BLAS, set
the environment variable `SYSTEMML_BLAS` to `mkl`, `openblas`, or `none`.
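
For example, a minimal sketch:

```bash
# Prefer OpenBLAS even when MKL is installed; use none to disable native BLAS entirely.
export SYSTEMML_BLAS=openblas
```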

## Using GPU

To exploit the GPU, SystemML requires that CUDA 8.0 and cuDNN 5.1 are installed on the machine;
if these libraries are not installed, we fall back to a non-GPU plan.
As with native BLAS, exploiting the GPU requires that `systemml-accelerator.jar` is available.
If you want to explicitly disable the GPU, set the environment variable `SYSTEMML_GPU` to `none` (default: `cuda`).
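
For example:

```bash
# Explicitly disable the GPU backend (the default is cuda).
export SYSTEMML_GPU=none
```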

To test whether BLAS and/or the GPU is enabled, follow the steps below:

<div class="codetabs">
<div data-lang="PySpark" markdown="1">
```python
from systemml import random
m1 = random.uniform(size=(1000,1000))
m2 = random.uniform(size=(1000,1000))
m3 = m1.dot(m2).toNumPy()
```
</div>
<div data-lang="Scala" markdown="1">
```bash
SYSTEMML_HOME=`python -c 'import imp; import os; print imp.find_module("systemml")[1]'`
SYSTEMML_JAR=`ls $SYSTEMML_HOME/systemml-java/systemml*incubating*.jar`
ACCELERATOR_JAR=`ls $SYSTEMML_HOME/systemml-java/systemml-accelerator.jar`
$SPARK_HOME/bin/spark-shell --jars $SYSTEMML_JAR,$ACCELERATOR_JAR
scala> import org.apache.sysml.api.mlcontext._
scala> import org.apache.sysml.api.mlcontext.ScriptFactory._
scala> val ml = new MLContext(sc)
scala> val script = dml("X = matrix(0.1, rows=1000, cols=1000); Y = matrix(0.2, rows=1000, cols=1000); Z = X %*% Y; print(sum(Z))")
scala> ml.execute(script)
```
</div>
</div>

The above script should output either

```bash
accelerator.BLASHelper: Found BLAS: (mkl/openblas)
```

or

```bash
accelerator.LibraryLoader: Unable to load (MKL/OpenBLAS)
```

Note: if `systemml-accelerator.jar` is not included via `--jars` (for `spark-shell` or `spark-submit`), we fall back to SystemML's internal Java library.
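
For example, a minimal `spark-submit` sketch that includes the accelerator jar (the jar paths and the script name `myscript.dml` here are assumptions; adjust them to your setup):

```bash
$SPARK_HOME/bin/spark-submit \
  --jars ./systemml-accelerator.jar \
  --class org.apache.sysml.api.DMLScript \
  SystemML.jar -f myscript.dml -stats
```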
53 changes: 8 additions & 45 deletions pom.xml
@@ -83,14 +83,6 @@
 			<enabled>true</enabled>
 		</releases>
 	</repository>
-	<repository>
-		<id>mavenized-jcuda-mvn-repo</id>
-		<url>https://raw.github.com/niketanpansare/mavenized-jcuda/mvn-repo/</url>
-		<snapshots>
-			<enabled>true</enabled>
-			<updatePolicy>always</updatePolicy>
-		</snapshots>
-	</repository>
 </repositories>

 <build>
@@ -1002,50 +994,21 @@

 	<dependencies>

-	<!-- For GPU backend
-		Use org.mystic:mavenized-jcuda until Alan puts org.jcuda:*
-	-->
 	<dependency>
-		<groupId>org.mystic</groupId>
-		<artifactId>mavenized-jcuda</artifactId>
-		<version>0.7.5b</version>
-		<type>jar</type>
-		<scope>provided</scope>
-		<exclusions>
+		<groupId>org.systemml</groupId>
+		<artifactId>accelerator</artifactId>
+		<version>0.0.1-SNAPSHOT</version>
+		<scope>system</scope>
+		<!-- Useful for pip install -->
+		<systemPath>${project.basedir}/src/test/config/local_jars/systemml-accelerator.jar</systemPath>
+		<exclusions>
 			<exclusion>
 				<groupId>*</groupId>
 				<artifactId>*</artifactId>
 			</exclusion>
 		</exclusions>
 	</dependency>
-	<!-- Since there is no mvn repo for jcuda
-	<dependency>
-		<groupId>org.jcuda</groupId>
-		<artifactId>jcuda</artifactId>
-		<version>0.7.5b</version>
-		<scope>provided</scope>
-	</dependency>
-	<dependency>
-		<groupId>org.jcuda</groupId>
-		<artifactId>jcublas</artifactId>
-		<version>0.7.5b</version>
-		<scope>provided</scope>
-	</dependency>
-	<dependency>
-		<groupId>org.jcuda</groupId>
-		<artifactId>jcusparse</artifactId>
-		<version>0.7.5b</version>
-		<scope>provided</scope>
-	</dependency>
-	<dependency>
-		<groupId>org.jcuda</groupId>
-		<artifactId>jcudnn</artifactId>
-		<version>0.7.5</version>
-		<scope>provided</scope>
-	</dependency>
-	-->
+	<!-- ************************* -->

 	<dependency>
 		<groupId>org.apache.spark</groupId>
 		<artifactId>spark-core_${scala.binary.version}</artifactId>
27 changes: 27 additions & 0 deletions scripts/perftest/microbenchmarks/matmult.dml
@@ -0,0 +1,27 @@
#-------------------------------------------------------------
#
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.
#
#-------------------------------------------------------------
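# Microbenchmark: repeated dense matrix multiplication.
# $1 = rows(X), $2 = cols(X) = rows(Y), $3 = cols(Y), $4 = number of iterations.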
X = matrix(0.1, rows=$1, cols=$2)
Y = matrix(0.2, rows=$2, cols=$3)
Z = matrix(0, rows=$1, cols=$3)
for(i in 1:$4) {
Z = Z + X %*% Y
}
print(as.scalar(Z[1,1]))
67 changes: 67 additions & 0 deletions scripts/perftest/microbenchmarks/runMatMultExperiments.sh
@@ -0,0 +1,67 @@
#!/bin/bash
#-------------------------------------------------------------
#
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.
#
#-------------------------------------------------------------

#-------------------------------------------------------------
export JAVA_HOME=... # 64-bit JVM
export SPARK_HOME=...

CONF="--master local[*] --executor-memory 5g"
invoke_systemml() {
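# Positional arguments (as used at the call sites below):
# $1 = rows(X), $2 = cols(X) = rows(Y), $3 = cols(Y),
# $4 = iterations, $5 = setup label written to time.txt, $6 = extra spark-submit flags (e.g., --jars)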
iter=$4
setup=$5
echo "Testing "$setup" with "$iter" iterations and using setup ["$1", "$2"] %*% ["$2", "$3"]"
tstart=$(date +%s.%N)
echo $JAVA_OPTS $OUTPUT_SYSTEMML_STATS
$SPARK_HOME/bin/spark-submit $CONF --class org.apache.sysml.api.DMLScript $6 SystemML.jar -f matmult.dml -stats -args $1 $2 $3 $4
ttime=$(echo "$(date +%s.%N) - $tstart" | bc)
echo $setup","$iter","$1","$2","$3","$ttime >> time.txt
}


rm -f time.txt # remove results from a previous run, if any
iter=1000
export SYSTEMML_GPU=none
echo "-------------------------"
for i in 1 10 100 1000 2000 5000 10000
do
for j in 1 10 100 1000 2000 5000 10000
do
for k in 1 10 100 1000 2000 5000 10000
do
# Intel MKL
export SYSTEMML_BLAS=mkl
invoke_systemml $i $j $k $iter IntelMKL "--jars ./systemml-accelerator.jar"

# OpenBLAS
export SYSTEMML_BLAS=openblas
invoke_systemml $i $j $k $iter OpenBLAS "--jars ./systemml-accelerator.jar"

# Java
invoke_systemml $i $j $k $iter Java ""

# GPU
export SYSTEMML_GPU=cuda
invoke_systemml $i $j $k $iter GPU "--jars ./systemml-accelerator.jar"
export SYSTEMML_GPU=none
done
done
done
35 changes: 14 additions & 21 deletions src/main/java/org/apache/sysml/api/DMLScript.java
@@ -66,6 +66,7 @@
 import org.apache.sysml.parser.ParseException;
 import org.apache.sysml.runtime.DMLRuntimeException;
 import org.apache.sysml.runtime.DMLScriptException;
+import org.apache.sysml.runtime.controlprogram.CPPUtil;
 import org.apache.sysml.runtime.controlprogram.Program;
 import org.apache.sysml.runtime.controlprogram.caching.CacheStatistics;
 import org.apache.sysml.runtime.controlprogram.caching.CacheableData;
@@ -129,6 +130,9 @@ public enum RUNTIME_PLATFORM {
 	public static boolean DISABLE_SPARSE = false;
 	public static boolean DISABLE_CACHING = false;
 	// ------------------------------------------------------------------------
+	// Native BLAS is enabled by default; we fall back to Java whenever the library is not available
+	// or the operation is not supported (e.g., sparse matrix multiplication).
+	public static final boolean ENABLE_NATIVE_BLAS = true;

 	// flag that indicates whether or not to suppress any prints to stdout
 	public static boolean _suppressPrint2Stdout = false;
@@ -148,8 +152,6 @@ public enum RUNTIME_PLATFORM {
 		//+ " -s: <filename> will be interpreted as a DML script string \n"
 		+ " -python: (optional) parses Python-like DML\n"
 		+ " -debug: (optional) run in debug mode\n"
-		+ " -gpu: <flags> (optional) use acceleration whenever possible. Current version only supports CUDA.\n"
-		+ "       Supported <flags> for this mode is force=(true|false)\n"
 		// Later add optional flags to indicate optimizations turned on or off. Currently they are turned off.
 		//+ " -debug: <flags> (optional) run in debug mode\n"
 		//+ "         Optional <flags> that is supported for this mode is optimize=(on|off)\n"
@@ -269,7 +271,7 @@ else if( args.length==1 && args[0].equalsIgnoreCase("-clean") ){

 		//parse arguments and set execution properties
 		RUNTIME_PLATFORM oldrtplatform = rtplatform; //keep old rtplatform
-		ExplainType oldexplain = EXPLAIN; //keep old explain 
+		ExplainType oldexplain = EXPLAIN; //keep old explain

 		// Reset global flags to avoid errors in test suite
 		ENABLE_DEBUG_MODE = false;
@@ -305,23 +307,6 @@ else if (args[i].equalsIgnoreCase("-config"))
 			else if( args[i].equalsIgnoreCase("-debug") ) {
 				ENABLE_DEBUG_MODE = true;
 			}
-			else if( args[i].equalsIgnoreCase("-gpu") ) {
-				USE_ACCELERATOR = true;
-				if( args.length > (i+1) && !args[i+1].startsWith("-") ) {
-					String flag = args[++i];
-					if(flag.startsWith("force=")) {
-						String [] flagOptions = flag.split("=");
-						if(flagOptions.length == 2)
-							FORCE_ACCELERATOR = Boolean.parseBoolean(flagOptions[1]);
-						else
-							throw new DMLRuntimeException("Unsupported \"force\" option for -gpu:" + flag);
-					}
-					else {
-						throw new DMLRuntimeException("Unsupported flag for -gpu:" + flag);
-					}
-				}
-				GPUContext.createGPUContext(); // Set GPU memory budget
-			}
 			else if( args[i].equalsIgnoreCase("-python") ) {
 				parsePyDML = true;
 			}
@@ -337,6 +322,10 @@ else if (args[i].startsWith("-args") || args[i].startsWith("-nvargs")) {
 			}
 		}

+		USE_ACCELERATOR = CPPUtil.isGPUAvailable();
+		if(USE_ACCELERATOR)
+			GPUContext.createGPUContext(); // Set GPU memory budget
+
 		//set log level
 		if (!ENABLE_DEBUG_MODE)
 			setLoggingProperties( conf );
@@ -383,6 +372,10 @@ else if (args[i].startsWith("-args") || args[i].startsWith("-nvargs")) {
 		return true;
 	}

+	public static boolean isNativeEnabled(int numThreads) {
+		return ENABLE_NATIVE_BLAS && CPPUtil.isLibraryLoaded() && (numThreads <= 0 || numThreads >= CPPUtil.maxNumThreads);
+	}
+
 	///////////////////////////////
 	// private internal utils (argument parsing)
 	////////
@@ -944,4 +937,4 @@ private static void cleanSystemMLWorkspace()
 			throw new DMLException("Failed to run SystemML workspace cleanup.", ex);
 		}
 	}
-}
+}
