VariantSpark is a framework for applying Spark-based machine learning methods to whole-genome variant information.
VariantSpark Readme


A new VariantSpark version (Cursed Forest) is available:


  1. Download the launcher script, example.conf and variantspark-1.0.jar.
  2. Make sure the launcher script is executable (chmod +x).
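The two download steps above can be sketched as a small shell function that checks that all three files are present before running chmod +x. This is a hypothetical helper, not part of VariantSpark: the file names example.conf and variantspark-1.0.jar come from the list, but the launcher script's file name is not given here, so it is passed in as an argument.

```shell
# Hypothetical helper (not shipped with VariantSpark).
# $1: directory the files were downloaded to
# $2: file name of the launcher script (not stated in this README)
prepare_release() {
  dir="$1"
  launcher="$2"
  # Step 1: confirm every downloaded file is present
  for f in "$launcher" example.conf variantspark-1.0.jar; do
    if [ ! -f "$dir/$f" ]; then
      echo "missing: $dir/$f" >&2
      return 1
    fi
  done
  # Step 2: make the launcher script executable
  chmod +x "$dir/$launcher"
  echo "ready: $dir/$launcher"
}
```

For example, `prepare_release ~/variantspark my-launcher`, where `my-launcher` stands in for the real script name.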

Building From Source

If you have trouble running VariantSpark, you can build it yourself using Maven.

  1. Check out the repo
  2. cd VariantSpark/variantspark
  3. Open pom.xml (e.g. with vi) and make sure the dependency versions match those installed on your cluster.
  4. Run mvn package to build.
  5. If you built it locally, copy target/VCF-clusterer-0.0.1-SNAPSHOT.jar to your cluster.
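Steps 2, 4 and 5 can be combined into one shell function, sketched below under a few assumptions: the repository is already checked out, pom.xml has already been edited, and scp is used as one possible way to copy the jar (this README does not say how or where to copy it). The function name and the destination argument are hypothetical; the module directory and jar path are the ones given in the steps above.

```shell
# Hypothetical helper (not shipped with VariantSpark).
# $1: module directory (defaults to VariantSpark/variantspark)
# $2: optional scp destination, e.g. user@cluster-host:~/ (placeholder)
build_and_copy() {
  module_dir="${1:-VariantSpark/variantspark}"
  dest="$2"
  jar="$module_dir/target/VCF-clusterer-0.0.1-SNAPSHOT.jar"

  # Build in a subshell so the caller's working directory is unchanged
  (cd "$module_dir" && mvn package) || return 1

  # Stop if Maven did not produce the expected jar
  if [ ! -f "$jar" ]; then
    echo "build did not produce $jar" >&2
    return 1
  fi

  # Copy the jar to the cluster only when a destination is given
  if [ -n "$dest" ]; then
    scp "$jar" "$dest" || return 1
  fi
  echo "$jar"
}
```

Calling `build_and_copy VariantSpark/variantspark` with no destination only builds and prints the jar path, which is convenient when building directly on the cluster.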

Submit a Job

Once installed, use the launcher script to submit a job to your cluster. You need to specify a configuration file with -c; an example file is provided as example.conf. Submit a job using ./ -c example.conf.
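A job submission can be wrapped so that a missing launcher or configuration file is reported before anything is sent to the cluster. The sketch below assumes only what this section states: the launcher takes the configuration file through -c. The function name is hypothetical, and the launcher path is an argument because its file name is not given here.

```shell
# Hypothetical wrapper (not shipped with VariantSpark).
# $1: path to the launcher script
# $2: configuration file (defaults to example.conf)
submit_job() {
  launcher="$1"
  conf="${2:-example.conf}"
  if [ ! -x "$launcher" ]; then
    echo "launcher not found or not executable: $launcher" >&2
    return 1
  fi
  if [ ! -f "$conf" ]; then
    echo "configuration file not found: $conf" >&2
    return 1
  fi
  # Pass the configuration file with -c, as described above
  "$launcher" -c "$conf"
}
```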