This repository has been archived by the owner. It is now read-only.
The C++ engine that powers the scientific computing library ND4J - n-dimensional arrays for Java

Native operations for nd4j. Build using CMake.


  • GCC 4.9+
  • CUDA 8.0 or 9.0 (if desired)
  • CMake 3.8 (as of Nov 2017, in near future will require 3.9)

Additional build arguments

There are a few additional arguments you can pass to the build script:

 -a XXXXXXXX // shortcut for -march/-mtune, e.g. -a native
 -b release OR -b debug // enables/disables debug builds. release is the default
 -j XX // defines how many threads will be used to build binaries on your box, e.g. -j 8
 -cc XX // CUDA-only argument, builds binaries only for the target GPU architecture. Use this for fast builds

You can find the compute capability for your card on the NVIDIA website.

For example, a GTX 1080 has compute capability 6.1, for which you would use -cc 61 (note no decimal point).
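The conversion from a compute capability to the value passed to -cc is just a matter of dropping the decimal point. A minimal shell sketch of that step follows; the function name cc_value is hypothetical and is not part of the build script.

```shell
# Hypothetical helper (not part of libnd4j): turn a compute capability
# such as "6.1" into the value expected by -cc, i.e. "61".
cc_value() {
    printf '%s\n' "$1" | tr -d '.'
}

cc_value 6.1   # prints 61
```

The result can then be passed as the -cc argument in place of YOUR_DEVICE_ARCH.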

OS Specific Requirements


Download the NDK, extract it somewhere, and execute the following commands, replacing android-xxx with either android-arm or android-x86:

git clone
git clone
export ANDROID_NDK=/path/to/android-ndk/
cd libnd4j
bash -platform android-xxx
cd ../nd4j
mvn clean install -Djavacpp.platform=android-xxx -DskipTests -pl '!:nd4j-cuda-9.0,!:nd4j-cuda-9.0-platform,!:nd4j-tests'


Run ./ (Please ensure you have brew installed)

See macOSx10 CPU


Requirements depend on the distro. Ask in the earlyadopters channel for distro-specific details.

Ubuntu Linux 15.10

sudo dpkg -i cuda-repo-ubuntu1504-7-5-local_7.5-18_amd64.deb
sudo apt-get update
sudo apt-get install cuda
sudo apt-get install cmake
sudo apt-get install gcc-4.9
sudo apt-get install g++-4.9
sudo apt-get install git
git clone
cd libnd4j/
export LIBND4J_HOME=~/libnd4j/
sudo rm /usr/bin/gcc
sudo rm /usr/bin/g++
sudo ln -s /usr/bin/gcc-4.9 /usr/bin/gcc
sudo ln -s /usr/bin/g++-4.9 /usr/bin/g++
./ -c cuda -cc YOUR_DEVICE_ARCH

Ubuntu Linux 16.04

sudo apt install cmake
sudo apt install nvidia-cuda-dev nvidia-cuda-toolkit nvidia-361
./ -c cuda -cc YOUR_DEVICE_ARCH

The standard development headers are needed.

CentOS 6

yum install centos-release-scl-rh epel-release
yum install devtoolset-3-toolchain maven30 cmake3 git
scl enable devtoolset-3 maven30 bash
./ -c cuda -cc YOUR_DEVICE_ARCH



Setup for All OS

  1. Set the LIBND4J_HOME environment variable to the libnd4j folder you've cloned from Git.

    • Note: this is required for building nd4j as well.
  2. Set up the CPU build followed by the GPU build by running the following on the command line:

    • For standard builds:

      ./ -c cuda -cc YOUR_DEVICE_ARCH
    • For Debug builds:

      ./ blas -b debug
      ./ blas -c cuda -cc YOUR_DEVICE_ARCH -b debug
    • For release builds (default):

      ./ -c cuda -cc YOUR_DEVICE_ARCH
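
Since both libnd4j and nd4j builds rely on LIBND4J_HOME, it can help to verify the variable before starting a long build. The sketch below is only an illustration: the function name check_libnd4j_home is hypothetical, and it assumes that a CMakeLists.txt file sits at the top level of the libnd4j folder.

```shell
# Hypothetical pre-flight check (not part of libnd4j): succeed only if
# LIBND4J_HOME is set and points at a folder containing CMakeLists.txt
# (assumed to live at the top level of the libnd4j checkout).
check_libnd4j_home() {
    if [ -z "${LIBND4J_HOME:-}" ]; then
        echo "LIBND4J_HOME is not set" >&2
        return 1
    fi
    if [ ! -f "$LIBND4J_HOME/CMakeLists.txt" ]; then
        echo "no CMakeLists.txt under $LIBND4J_HOME" >&2
        return 1
    fi
    echo "LIBND4J_HOME looks OK: $LIBND4J_HOME"
}

# Usage:
#   export LIBND4J_HOME=/path/to/libnd4j
#   check_libnd4j_home
```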

OpenMP support

OpenMP 4.0+ should be used to compile libnd4j. This shouldn't be any trouble, since OpenMP 4.0 was released in 2013 and should be available on all major platforms.
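
One way to check which OpenMP version a compiler supports is to look at the _OPENMP macro it defines; the value 201307 corresponds to OpenMP 4.0. The sketch below is an illustration only: the function name openmp_at_least_4 is hypothetical, and the commented usage assumes gcc is on your PATH.

```shell
# Hypothetical helper (not part of libnd4j): succeed if the given _OPENMP
# macro value corresponds to OpenMP 4.0 (201307) or newer.
openmp_at_least_4() {
    [ "${1:-0}" -ge 201307 ]
}

# To read the value from GCC (assumes gcc is on PATH):
#   v=$(echo | gcc -fopenmp -dM -E - | awk '$2 == "_OPENMP" {print $3}')
#   openmp_at_least_4 "$v" && echo "OpenMP 4.0+ available"
```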

Linking with MKL

We can link with MKL either at build time, or at runtime with binaries initially linked with another BLAS implementation such as OpenBLAS. In either case, simply add the path containing (or mkl_rt.dll on Windows), say /path/to/intel64/lib/, to the LD_LIBRARY_PATH environment variable on Linux (or PATH on Windows), and build or run your Java application as usual. If you get an error message like undefined symbol: omp_get_num_procs, it probably means that, libiomp5.dylib, or libiomp5md.dll is not present on your system. In that case though, it is still possible to use the GNU version of OpenMP by setting these environment variables on Linux, for example:

export LD_PRELOAD=/usr/lib64/

Troubleshooting MKL

Sometimes the above steps are not enough. An additional step that may be needed is to set:

export LD_LIBRARY_PATH=/opt/intel/lib/intel64/:/opt/intel/mkl/lib/intel64

This ensures that MKL will be found first and linked to.
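
Note that a plain assignment replaces any value LD_LIBRARY_PATH already had. If you want to keep existing entries while still placing the MKL directories first, a small helper can prepend instead. This is a sketch under assumptions: the function name prepend_path is hypothetical, and the directory in the usage comment is the one from the example above.

```shell
# Hypothetical helper (not part of libnd4j): print a search path with
# directory $1 placed in front of the existing path $2. An empty $2 is
# handled separately to avoid a trailing ":", which the dynamic loader
# would treat as the current directory.
prepend_path() {
    if [ -z "${2:-}" ]; then
        printf '%s\n' "$1"
    else
        printf '%s\n' "$1:$2"
    fi
}

# Usage:
#   export LD_LIBRARY_PATH="$(prepend_path /opt/intel/mkl/lib/intel64 "$LD_LIBRARY_PATH")"
```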


On Ubuntu (14.04 or above) or CentOS (6 or above), this repository is also set up to create packages for your distribution. Let's assume you have built:

  • for the cpu, your command-line was ./ ...:
cd blasbuild/cpu
make package
  • for the gpu, your command-line was ./ -c cuda ...:
cd blasbuild/cuda
make package

Uploading package to Bintray

The package upload script is in packages. The upload command for an rpm built for cpu is:

./packages/ myAPIUser myAPIKey deeplearning4j blasbuild/cpu/libnd4j-0.8.0.fc7.3.1611.x86_64.rpm

The upload command for a deb package built for cuda is:

./packages/ myAPIUser myAPIKey deeplearning4j blasbuild/cuda/libnd4j-0.8.0.fc7.3.1611.x86_64.deb

Running tests

Tests are written with gtest and run using CMake. Tests are currently under tests_cpu/.

There are 2 directories for running tests:

1. libnd4j_tests: These are older legacy ops tests.
2. layers_tests: This covers the newer graph operations and ops associated with samediff.

We currently run the tests with CMake, typically from CLion.