amplab/keystone


KeystoneML

The biggest, baddest pipelines around.

Example pipeline

Build the KeystoneML project

./sbt/sbt assembly
make # This builds the native libraries used in KeystoneML

Example: MNIST pipeline

# Get the data from S3
wget http://mnist-data.s3.amazonaws.com/train-mnist-dense-with-labels.data
wget http://mnist-data.s3.amazonaws.com/test-mnist-dense-with-labels.data

KEYSTONE_MEM=4g ./bin/run-pipeline.sh \
  keystoneml.pipelines.images.mnist.MnistRandomFFT \
  --trainLocation ./train-mnist-dense-with-labels.data \
  --testLocation ./test-mnist-dense-with-labels.data \
  --numFFTs 4 \
  --blockSize 2048

Running with spark-submit

To run KeystoneML pipelines on large datasets, you will need a Spark cluster. On the cluster, pipelines are launched with spark-submit.

To run KeystoneML with spark-submit, first export SPARK_HOME. You can then launch your pipeline with run-pipeline.sh, just as in the MNIST example above.

export SPARK_HOME=~/spark-1.3.1-bin-cdh4 # should match the version keystone is built with
KEYSTONE_MEM=4g ./bin/run-pipeline.sh \
  keystoneml.pipelines.images.mnist.MnistRandomFFT \
  --trainLocation ./train-mnist-dense-with-labels.data \
  --testLocation ./test-mnist-dense-with-labels.data \
  --numFFTs 4 \
  --blockSize 2048
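
Because the launch depends on SPARK_HOME being exported, it can help to verify the setting before starting a long-running job. The following is a minimal sketch, not part of KeystoneML: the function name check_spark_home is hypothetical, and it assumes the conventional Spark layout in which spark-submit lives under $SPARK_HOME/bin.

```shell
# Hypothetical pre-flight helper (not shipped with KeystoneML).
# Assumes the usual Spark layout: $SPARK_HOME/bin/spark-submit.
check_spark_home() {
  # Fail early if the variable was never exported or is empty.
  if [ -z "${SPARK_HOME:-}" ]; then
    echo "SPARK_HOME is not set" >&2
    return 1
  fi
  # Fail if the directory does not contain an executable spark-submit.
  if [ ! -x "$SPARK_HOME/bin/spark-submit" ]; then
    echo "spark-submit not found under $SPARK_HOME/bin" >&2
    return 1
  fi
  return 0
}
```

One way to use it is to chain it in front of the launch command, for example `check_spark_home && KEYSTONE_MEM=4g ./bin/run-pipeline.sh ...`, so the pipeline starts only when the check passes. The helper does not check that the Spark version matches the one KeystoneML was built with; that still has to be confirmed by hand.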

About

Simplifying robust end-to-end machine learning on Apache Spark.
