Alchemist: an Apache Spark<->MPI interface

Alchemist is a framework for easily and efficiently calling MPI-based codes from Apache Spark.

[Figure: Platonic Alchemist architecture]

Supporting libraries that Alchemist uses:

  • Elemental -- used for distributing matrices between Alchemist processes and for distributed linear algebra
  • Eigen3 -- used for local matrix manipulations (its interface is more convenient than Elemental's)
  • Arpack-ng -- used for computing truncated SVDs
  • Arpackpp -- a convenient C++ interface to Arpack-ng

The remainder of this file gives instructions for installing and running Alchemist locally on a Mac running macOS 10.12. See also CORI_README.md for instructions on installing and running Alchemist on Cori, a NERSC supercomputer.

To run Alchemist in a fresh terminal:

cd $HOME/Documents/alchemist # or wherever you installed it
export ALPREFIX=$HOME/Documents/alchemist/bins # or whatever you used during install
export PATH=$PATH:$HOME/local/spark-2.1.1/bin # or wherever spark-bin is located
export TMPDIR=/tmp # avoid a Mac-specific issue with the length of the default TMPDIR path
make # will both build and run the test suite
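Because every fresh terminal needs the exports above, a small pre-flight check can catch a missing or mistyped ALPREFIX before make starts. The following is a minimal sketch only: check_prefix is a hypothetical helper that is not part of Alchemist, and it assumes nothing beyond ALPREFIX naming a directory that already exists.

```shell
# Hypothetical pre-flight check (not part of Alchemist): confirm that the
# install prefix passed as $1 is non-empty and names an existing directory.
check_prefix() {
    if [ -z "$1" ]; then
        echo "ALPREFIX is not set" >&2
        return 1
    fi
    if [ ! -d "$1" ]; then
        echo "ALPREFIX directory does not exist: $1" >&2
        return 1
    fi
    echo "ALPREFIX ok: $1"
}

# Example call, to be run before `make`:
#   check_prefix "$ALPREFIX" && command -v spark-submit
```

The example call also uses command -v to confirm that spark-submit, the launcher in Spark's bin directory, is reachable through PATH.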

Installation instructions (for running locally on macOS 10.12)

Note: these instructions need to be updated by adapting the setup/cori-bootstrap.sh script.

Install prerequisites

Assuming that the Xcode command line tools, Homebrew, and Spark have already been installed:

brew install gcc
brew install make --with-default-names
brew install cmake
brew install boost-mpi
brew install sbt
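Before moving on, it may help to confirm that the tools the later steps invoke are actually on PATH. This is a sketch under stated assumptions: missing_commands is a hypothetical helper, and the list of command names in the example call is inferred from the commands used in this file rather than from an official requirements list.

```shell
# Hypothetical helper: print the names of any commands in "$@" that cannot
# be found on PATH, separated by spaces (prints an empty line if none).
missing_commands() {
    missing=""
    for cmd in "$@"; do
        if ! command -v "$cmd" >/dev/null 2>&1; then
            missing="$missing $cmd"
        fi
    done
    # Strip the leading space before printing.
    printf '%s\n' "${missing# }"
}

# Example call (names taken from the build steps in this file):
#   missing_commands git curl unzip cmake make sbt gcc-7 g++-7 gfortran-7
```

An empty result means every listed command was found. Any name that is printed needs to be installed or added to PATH first.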

Clone the Alchemist repo and set ALPREFIX, the directory where the supporting libraries will be installed

cd $HOME/Documents # if you want to install Alchemist in $HOME/Documents/alchemist
git clone https://github.com/alexgittens/alchemist.git
cd alchemist
export ALPREFIX=$HOME/Documents/alchemist/bins

Install Elemental into ALPREFIX

git clone https://github.com/elemental/Elemental.git
cd Elemental
git checkout 0.87
mkdir build
cd build
CC=gcc-7 CXX=g++-7 FC=gfortran-7 cmake -DCMAKE_BUILD_TYPE=Release -DEL_IGNORE_OSX_GCC_ALIGNMENT_PROBLEM=ON -DCMAKE_INSTALL_PREFIX=$ALPREFIX ..
nice make -j8
make install
cd ../..
rm -rf Elemental

Install Eigen3 into ALPREFIX

curl -L -O http://bitbucket.org/eigen/eigen/get/3.3.4.zip
unzip 3.3.4.zip
rm 3.3.4.zip
cd eigen-eigen-5a0156e40feb # or whatever the tag is
mkdir build
cd build
cmake -DCMAKE_INSTALL_PREFIX=$ALPREFIX ..
make install
cd ../..
rm -rf eigen-eigen-5a0156e40feb
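The extracted directory name depends on the archive, which is why the steps above say "or whatever the tag is". One way to avoid hard-coding it is to read the top-level directory from the archive listing. The sketch below assumes the archive contains a single top-level directory; top_dir is a hypothetical helper that reads a file listing on standard input.

```shell
# Hypothetical helper: read an archive listing (one path per line) on stdin
# and print the first path component of the first entry.
top_dir() {
    head -n 1 | cut -d/ -f1
}

# Example call, using unzip's zipinfo mode to list names only:
#   EIGEN_DIR=$(unzip -Z1 3.3.4.zip | top_dir)
#   cd "$EIGEN_DIR"
```

Because the listing has to be read from the archive, the variable must be set before the rm 3.3.4.zip step.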

Install SPDLog into ALPREFIX

git clone https://github.com/gabime/spdlog.git
cd spdlog
mkdir build
cd build
cmake -DCMAKE_INSTALL_PREFIX=$ALPREFIX ..
make install
cd ../..
rm -rf spdlog

Install Arpack-ng into ALPREFIX

git clone https://github.com/opencollab/arpack-ng.git
cd arpack-ng
mkdir build
cd build
CC=gcc-7 FC=gfortran-7 cmake -DMPI=ON -DBUILD_SHARED_LIBS=ON -DCMAKE_INSTALL_PREFIX=$ALPREFIX ..
nice make -j8
make install
cd ../..
rm -rf arpack-ng

Install Arpackpp into ALPREFIX

git clone https://github.com/m-reuter/arpackpp.git
cd arpackpp
mkdir build
cd build
cmake -DCMAKE_INSTALL_PREFIX=$ALPREFIX ..
make install
cd ../..
rm -rf arpackpp
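Each library above follows the same out-of-source CMake pattern: create a build directory, configure with CMAKE_INSTALL_PREFIX pointing at ALPREFIX, then install. That pattern can be sketched as one function. cmake_install and its DRY_RUN switch are hypothetical and are not part of the Alchemist setup scripts. The sketch omits the separate nice make -j8 step, so make install performs the build itself.

```shell
# Hypothetical wrapper around the repeated build-and-install pattern.
# Usage: cmake_install <source-dir> <prefix> [extra cmake arguments...]
# With DRY_RUN=1 the commands are printed instead of being executed.
cmake_install() {
    src=$1
    prefix=$2
    shift 2
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "mkdir -p $src/build"
        echo "cd $src/build"
        echo "cmake -DCMAKE_INSTALL_PREFIX=$prefix $* .."
        echo "make install"
        return 0
    fi
    mkdir -p "$src/build" &&
    (
        # The subshell keeps the caller's working directory unchanged.
        cd "$src/build" &&
        cmake -DCMAKE_INSTALL_PREFIX="$prefix" "$@" .. &&
        make install
    )
}

# Example call for Arpack-ng, after exporting CC=gcc-7 and FC=gfortran-7:
#   cmake_install arpack-ng "$ALPREFIX" -DMPI=ON -DBUILD_SHARED_LIBS=ON
```

Setting DRY_RUN=1 first prints the commands that would run, which is a cheap way to check the arguments before starting a long build.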

To test

The testing process needs to be made less manual and brought more in line with standard practices; see, for example, the spark-perf project.