gnocchi

Genotype-phenotype analysis using the ADAM genomics analysis platform. This project is a work in progress. Currently, it implements a simple case/control analysis using a chi-squared test.
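To make the idea of a case/control chi-squared test concrete, here is a minimal sketch of Pearson's chi-squared statistic on a 2x2 table of carrier counts. This is not gnocchi's implementation: the function names, the table layout, and the example counts are assumptions made purely for illustration.

```python
import math


def chi_squared_2x2(case_carriers, case_noncarriers,
                    control_carriers, control_noncarriers):
    """Pearson chi-squared statistic for a 2x2 table (no continuity correction).

    Hypothetical layout: rows are cases/controls, columns are
    carriers/non-carriers of the variant being tested.
    """
    a, b, c, d = (case_carriers, case_noncarriers,
                  control_carriers, control_noncarriers)
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    if 0 in row_totals or 0 in col_totals:
        raise ValueError("every row and column total must be non-zero")
    # Closed form of sum((observed - expected)^2 / expected) for a 2x2 table.
    return n * (a * d - b * c) ** 2 / (
        row_totals[0] * row_totals[1] * col_totals[0] * col_totals[1]
    )


def p_value_1df(chi2):
    """Upper-tail probability of a chi-squared distribution with 1 degree of freedom."""
    return math.erfc(math.sqrt(chi2 / 2.0))


# Made-up counts: 30 of 100 cases and 10 of 100 controls carry the variant.
statistic = chi_squared_2x2(30, 70, 10, 90)
print(statistic)               # 12.5
print(p_value_1df(statistic))  # well below 0.001
```

A small p-value here suggests that carrier status and case/control status are not independent in the sample.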

Build

To build, install Maven. Then run:

mvn package

Maven automatically downloads and installs all of the necessary dependencies. Occasionally, the build fails because Maven runs out of memory. You can work around this by setting the MAVEN_OPTS environment variable to -Xmx2g -XX:MaxPermSize=1g.

Run

To run, you'll need to install Spark. If you are just evaluating locally, you can use a prebuilt Spark distribution. If you'd like to use a cluster, refer to Spark's cluster overview.

Once Spark is installed, set the environment variable SPARK_HOME to point to the Spark installation root directory. Then, you can run gnocchi via ./bin/gnocchi-submit.

We include test data. You can run with the test data by running:

./bin/gnocchi-submit regressPhenotypes testData/sample.vcf testData/samplePhenotypes.csv testData/associations -saveAsText
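If you drive gnocchi from a script, a small wrapper can check that SPARK_HOME is set before assembling the command shown above. The following is only a sketch: the helper name `gnocchi_command` and its parameters are hypothetical, and the argument order simply mirrors the test-data command.

```python
import os


def gnocchi_command(vcf, phenotypes, output, save_as_text=True, env=None):
    """Build the argument list for ./bin/gnocchi-submit regressPhenotypes.

    Hypothetical helper: it only assembles the command and does not run it.
    """
    env = os.environ if env is None else env
    if not env.get("SPARK_HOME"):
        raise RuntimeError(
            "SPARK_HOME must point to the Spark installation root directory"
        )
    command = ["./bin/gnocchi-submit", "regressPhenotypes",
               vcf, phenotypes, output]
    if save_as_text:
        command.append("-saveAsText")
    return command


# Example with an assumed Spark location; pass the result to subprocess.run
# from the gnocchi checkout to actually launch the job.
print(gnocchi_command(
    "testData/sample.vcf",
    "testData/samplePhenotypes.csv",
    "testData/associations",
    env={"SPARK_HOME": "/opt/spark"},
))
```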

Phenotype Input

We accept phenotype inputs in a CSV format:

Sample,Phenotype,Has Phenotype
mySample,a phenotype,true

The Has Phenotype column is a boolean: either true or false. See the test data for more examples.
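As a sanity check before submitting a job, a phenotype file can be validated against the layout above. This sketch assumes that the header row matches the example exactly and that the boolean is written in lowercase; gnocchi's own parser may be more or less strict. The function name `parse_phenotypes` is hypothetical.

```python
import csv
import io

# Assumed header, copied from the example in this README.
EXPECTED_HEADER = ["Sample", "Phenotype", "Has Phenotype"]


def parse_phenotypes(text):
    """Parse phenotype CSV text into (sample, phenotype, has_phenotype) tuples."""
    reader = csv.reader(io.StringIO(text))
    header = next(reader, None)
    if header != EXPECTED_HEADER:
        raise ValueError(f"unexpected header: {header!r}")
    records = []
    for line_number, row in enumerate(reader, start=2):
        if not row:
            continue  # skip blank lines
        if len(row) != 3:
            raise ValueError(f"line {line_number}: expected 3 columns, got {len(row)}")
        sample, phenotype, flag = row
        if flag not in ("true", "false"):
            raise ValueError(f"line {line_number}: Has Phenotype must be true or false")
        records.append((sample, phenotype, flag == "true"))
    return records


example = "Sample,Phenotype,Has Phenotype\nmySample,a phenotype,true\n"
print(parse_phenotypes(example))  # [('mySample', 'a phenotype', True)]
```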

License

This project is released under an Apache 2.0 license.