Building ADAM from Source
=========================

You will need to have `Apache Maven <http://maven.apache.org/>`__
version 3.1.1 or later installed in order to build ADAM.

**Note:** The default configuration is for Hadoop 2.7.3. If building
against a different version of Hadoop, please pass
``-Dhadoop.version=<HADOOP_VERSION>`` to the Maven command. ADAM
will cross-build for both Spark 1.x and 2.x, but builds by default
against Spark 1.6.3. To build for Spark 2, run the
``./scripts/move_to_spark2.sh`` script.
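
For example, a Spark 2 build against a different Hadoop release might
be produced as follows; the Hadoop version number here is illustrative
(it is one of the versions our continuous integration builds against),
not a recommendation:

.. code:: bash

    # switch the build to Spark 2 artifacts, then package,
    # overriding the default Hadoop 2.7.3 dependency
    ./scripts/move_to_spark2.sh
    mvn clean package -DskipTests -Dhadoop.version=2.6.0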

.. code:: bash

    $ git clone https://github.com/bigdatagenomics/adam.git
    $ cd adam
    $ export MAVEN_OPTS="-Xmx512m -XX:MaxPermSize=128m"
    $ mvn clean package -DskipTests

Outputs

::

    ...
    [INFO] ------------------------------------------------------------------------
    [INFO] BUILD SUCCESS
    [INFO] ------------------------------------------------------------------------
    [INFO] Total time: 9.647s
    [INFO] Finished at: Thu May 23 15:50:42 PDT 2013
    [INFO] Final Memory: 19M/81M
    [INFO] ------------------------------------------------------------------------

You might want to take a peek at the ``scripts/jenkins-test`` script and
give it a run. It will fetch a mouse chromosome, encode it to ADAM reads
and pileups, run flagstat, etc. We use this script to test that ADAM is
working correctly.

Running ADAM
------------

ADAM is packaged as an
`überjar <https://maven.apache.org/plugins/maven-shade-plugin/>`__ and
includes all necessary dependencies, except for Apache Hadoop and Apache
Spark.

You might want to add the following to your ``.bashrc`` to make running
ADAM easier:

.. code:: bash

    alias adam-submit="${ADAM_HOME}/bin/adam-submit"
    alias adam-shell="${ADAM_HOME}/bin/adam-shell"

``$ADAM_HOME`` should be the path to where you have checked ADAM out on
your local filesystem. The first alias is used for submitting ADAM jobs,
while the second opens an interactive ADAM shell. These aliases call
scripts that wrap the ``spark-submit`` and ``spark-shell`` commands to
set up ADAM. You’ll need to have the Spark binaries on your system;
prebuilt binaries can be downloaded from the `Spark
website <http://spark.apache.org/downloads.html>`__. Our `continuous
integration setup <https://amplab.cs.berkeley.edu/jenkins/job/ADAM/>`__
builds ADAM against Spark versions 1.6.1 and 2.0.0, Scala versions 2.10
and 2.11, and Hadoop versions 2.3.0 and 2.6.0.
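
Putting this together, a complete ``.bashrc`` fragment might look like
the following sketch; the paths are hypothetical and should point at
your own ADAM checkout and Spark download:

.. code:: bash

    # hypothetical locations; adjust to your own ADAM checkout
    # and unpacked Spark distribution
    export ADAM_HOME="$HOME/adam"
    export SPARK_HOME="$HOME/spark-1.6.3-bin-hadoop2.6"
    export PATH="${SPARK_HOME}/bin:${PATH}"

    alias adam-submit="${ADAM_HOME}/bin/adam-submit"
    alias adam-shell="${ADAM_HOME}/bin/adam-shell"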

Once these aliases are in place, you can run ADAM by simply typing
``adam-submit`` at the command line, e.g.

.. code:: bash

    $ adam-submit

Building for Python
-------------------

To build and test `ADAM’s Python bindings <#python>`__, enable the
``python`` profile:

.. code:: bash

    mvn -Ppython package

This will enable the ``adam-python`` module as part of the ADAM build.
This module uses Maven to invoke a Makefile that builds a Python egg and
runs tests. To build this module, we require either an active
`Conda <https://conda.io/>`__ or
`virtualenv <https://virtualenv.pypa.io/en/stable/>`__ environment.

`To set up and activate a Conda
environment <https://conda.io/docs/using/envs.html>`__, run:

.. code:: bash

    conda create -n adam python=2.7 anaconda
    source activate adam

`To set up and activate a virtualenv
environment <https://virtualenv.pypa.io/en/stable/userguide/#usage>`__,
run:

.. code:: bash

    virtualenv adam
    . adam/bin/activate

Additionally, to run tests, the PySpark dependencies must be on the
Python module load path and the ADAM JARs must be built and provided to
PySpark. This can be done with the following bash commands:

.. code:: bash

    # add pyspark to the python path
    PY4J_ZIP="$(ls -1 "${SPARK_HOME}/python/lib" | grep py4j)"
    export PYTHONPATH=${SPARK_HOME}/python:${SPARK_HOME}/python/lib/${PY4J_ZIP}:${PYTHONPATH}

    # put adam jar on the pyspark path
    ASSEMBLY_DIR="${ADAM_HOME}/adam-assembly/target"
    ASSEMBLY_JAR="$(ls -1 "$ASSEMBLY_DIR" | grep "^adam[0-9A-Za-z\.\_-]*\.jar$" | grep -v -e javadoc -e sources || true)"
    export PYSPARK_SUBMIT_ARGS="--jars ${ASSEMBLY_DIR}/${ASSEMBLY_JAR} --driver-class-path ${ASSEMBLY_DIR}/${ASSEMBLY_JAR} pyspark-shell"
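
The two ``grep`` invocations above do the heavy lifting, so here is a
self-contained sketch of how they select files, using a throwaway
directory populated with hypothetical Spark and ADAM filenames (the
version numbers are made up for illustration):

.. code:: bash

    # mock layout standing in for ${SPARK_HOME}/python/lib and the
    # adam-assembly target directory; filenames are hypothetical
    MOCK="$(mktemp -d)"
    mkdir -p "${MOCK}/python/lib" "${MOCK}/target"
    touch "${MOCK}/python/lib/py4j-0.10.4-src.zip" \
          "${MOCK}/python/lib/pyspark.zip"
    touch "${MOCK}/target/adam-assembly-spark2_2.11-0.23.0.jar" \
          "${MOCK}/target/adam-assembly-spark2_2.11-0.23.0-javadoc.jar" \
          "${MOCK}/target/adam-assembly-spark2_2.11-0.23.0-sources.jar"

    # "grep py4j" picks out the py4j zip among Spark's python libraries
    PY4J_ZIP="$(ls -1 "${MOCK}/python/lib" | grep py4j)"

    # the first grep matches any adam*.jar; the second drops the
    # javadoc and sources jars, leaving only the assembly jar
    ASSEMBLY_JAR="$(ls -1 "${MOCK}/target" | grep "^adam[0-9A-Za-z\.\_-]*\.jar$" | grep -v -e javadoc -e sources || true)"

    echo "${PY4J_ZIP}"       # py4j-0.10.4-src.zip
    echo "${ASSEMBLY_JAR}"   # adam-assembly-spark2_2.11-0.23.0.jar
    rm -rf "${MOCK}"
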

This assumes that the `ADAM JARs have already been
built <#build-from-source>`__. Additionally, we require
`pytest <https://docs.pytest.org/en/latest/>`__ to be installed. The
adam-python makefile can install this dependency. Once you have an
active virtualenv or Conda environment, run:

.. code:: bash

    cd adam-python
    make prepare

Building for R
--------------

ADAM supports SparkR, for Spark 2.1.0 and onwards. To build and test
`ADAM’s R bindings <#r>`__, enable the ``r`` profile:

.. code:: bash

    mvn -Pr package

This will enable the ``adam-r`` module as part of the ADAM build. This
module uses Maven to invoke the ``R`` executable to build the
``bdg.adam`` package and run tests. Beyond having ``R`` installed, we
require you to have the ``SparkR`` package installed, and the ADAM JARs
must be built and provided to ``SparkR``. This can be done with the
following bash commands:

.. code:: bash

    # put adam jar on the SparkR path
    ASSEMBLY_DIR="${ADAM_HOME}/adam-assembly/target"
    ASSEMBLY_JAR="$(ls -1 "$ASSEMBLY_DIR" | grep "^adam[0-9A-Za-z\_\.-]*\.jar$" | grep -v javadoc | grep -v sources || true)"
    export SPARKR_SUBMIT_ARGS="--jars ${ASSEMBLY_DIR}/${ASSEMBLY_JAR} --driver-class-path ${ASSEMBLY_DIR}/${ASSEMBLY_JAR} sparkr-shell"

Note that the ``ASSEMBLY_DIR`` and ``ASSEMBLY_JAR`` lines are the same
as for the `Python build <#python-build>`__. As with the Python build,
this assumes that the `ADAM JARs have already been
built <#build-from-source>`__.