Add ability to build everything inside a docker container
nicolaasuni committed Jun 21, 2018
1 parent e795ca8 commit d05c86f
Showing 5 changed files with 132 additions and 7 deletions.
18 changes: 18 additions & 0 deletions Makefile
@@ -7,6 +7,15 @@
# List special make targets that are not associated with files
.PHONY: help c go javascript python r format clean

# CVS path (path to the parent dir containing the project)
CVSPATH=github.com/genomicsplc

# Project vendor
VENDOR=genomicsplc

# Project name
PROJECT=variantkey

# --- MAKE TARGETS ---

# Display general help about this command
@@ -21,6 +30,7 @@ help:
	@echo " make python : Build and test the Python version"
	@echo " make r : Build and test the R version"
	@echo " make clean : Remove any build artifact"
	@echo " make dbuild : Build everything inside a Docker container"
	@echo ""

all: clean c go javascript python r
@@ -53,3 +63,11 @@ clean:
	cd javascript && make clean
	cd python && make clean
	cd r && make clean

# Build everything inside a Docker container
dbuild:
	@mkdir -p target
	@rm -rf target/*
	@echo 0 > target/make.exit
	CVSPATH=$(CVSPATH) VENDOR=$(VENDOR) PROJECT=$(PROJECT) MAKETARGET='$(MAKETARGET)' ./dockerbuild.sh
	@exit `cat target/make.exit`
51 changes: 45 additions & 6 deletions README.md
@@ -19,14 +19,53 @@ The individual components of short variants (up to 11 bases between REF and ALT
This software library can be used to generate and reverse VariantKeys.


## Quick Start

This project includes a Makefile that allows you to test and build the project on a Linux-compatible system with simple commands.

To see all available options, from the project root type:

```
make help
```

To build all the VariantKey versions inside a Docker container (requires Docker):

```
make dbuild
```

An arbitrary make target can be executed inside a [Docker](https://www.docker.com/) container by specifying the `MAKETARGET` parameter:

```
MAKETARGET='build' make dbuild
```
The list of make targets can be obtained by typing `make`.


The base Docker building environment is defined in the following Dockerfile:

```
resources/Docker/Dockerfile.dev
```

To build and test only a specific language version, `cd` into the language directory and use the `make` command.
For example:

```
cd c
make test
```


## Human Genetic Variant Definition

In this context, the human genetic variant for a given genome assembly is defined as the set of four components compatible with the VCF format:

-* **CHROM** - chromosome: An identifier from the reference genome. It only has 26 valid values: autosomes from 1 to 22, the sex chromosomes X=23 and Y=24, mitochondria MT=25 and a symbol NA=0 to indicate missing data.
-* **POS** - position: The reference position in the chromosome, with the 1st nucleotide having position 0. The largest expected value is 247,199,718 to represent the last base pair of chromosome 1.
-* **REF** - reference allele: String containing a sequence of reference nucleotide letters. The value in the POS field refers to the position of the first nucleotide in the string.
-* **ALT** - alternate allele: Single alternate non-reference allele. String containing a sequence of nucleotide letters. Multiallelic variants must be decomposed into individual biallelic variants.
+* **`CHROM`** - chromosome: An identifier from the reference genome. It only has 26 valid values: autosomes from 1 to 22, the sex chromosomes X=23 and Y=24, mitochondria MT=25 and a symbol NA=0 to indicate missing data.
+* **`POS`** - position: The reference position in the chromosome, with the 1st nucleotide having position 0. The largest expected value is 247,199,718 to represent the last base pair of chromosome 1.
+* **`REF`** - reference allele: String containing a sequence of reference nucleotide letters. The value in the POS field refers to the position of the first nucleotide in the string.
+* **`ALT`** - alternate allele: Single alternate non-reference allele. String containing a sequence of nucleotide letters. Multiallelic variants must be decomposed into individual biallelic variants.
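
As an illustration of the `CHROM` definition above, the mapping from a chromosome identifier to its numeric code could be sketched in Python as follows. The function name `encode_chrom` and the tolerance for case and surrounding whitespace are assumptions made for this example and are not necessarily how the library behaves; numeric strings outside the range 1 to 22 are treated here as missing data.

```python
def encode_chrom(chrom: str) -> int:
    """Map a CHROM identifier to its numeric code (hypothetical helper).

    Autosomes "1".."22" map to 1..22, X to 23, Y to 24, MT to 25.
    Anything else is reported as 0 (NA, missing data).
    """
    name = chrom.strip().upper()  # assumption: ignore case and whitespace
    special = {"X": 23, "Y": 24, "MT": 25}
    if name in special:
        return special[name]
    # Accept only plain ASCII digits so that int() cannot fail.
    if name.isascii() and name.isdigit() and 1 <= int(name) <= 22:
        return int(name)
    return 0  # NA: unknown or missing chromosome


if __name__ == "__main__":
    for value in ("1", "22", "X", "Y", "MT", "NA"):
        print(value, encode_chrom(value))
```

Together with the NA code, the 22 autosomes, X, Y and MT give the 26 valid values listed above.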


## Variant Decomposition and Normalization
@@ -178,14 +217,14 @@ The VariantKey is composed of 3 sections arranged in 64 bit:
```
This section allows two different types of encoding:

-* Non-reversible encoding
+* **Non-reversible encoding**

If the total number of nucleotides between REF and ALT is more than 11, or if any of the alleles contains nucleotide letters other than the bases A, C, G and T, then the LSB (least significant bit) is set to 1 and the remaining 30 bits are filled with a hash value of the REF and ALT strings.
The hash value is calculated using a custom fast non-cryptographic algorithm based on MurmurHash3.
A lookup table is required to reverse the REF and ALT values.
In the normalized dbSNP VCF file GRCh37.p13.b150 only 0.365% (1229769 / 337162128) of the variants require this encoding. Amongst those, the maximum number of variants that share the same chromosome and position is 15. With 30 bits the probability of hash collision is approximately 10<sup>-7</sup> for 15 elements, 10<sup>-6</sup> for 46 and 10<sup>-5</sup> for 146.

-* Reversible encoding
+* **Reversible encoding**

If the total number of nucleotides between REF and ALT is 11 or less, and they only contain the base letters A, C, G and T, then the LSB is set to 0 and the remaining 30 bits are used as follows:
* bits 1-4 indicate the number of bases in REF - the capacity of this section is 2^4=16; the maximum expected value is 10 dec = 1010 bin;
2 changes: 1 addition & 1 deletion VERSION
@@ -1 +1 @@
-2.6.4
+2.6.5
58 changes: 58 additions & 0 deletions dockerbuild.sh
@@ -0,0 +1,58 @@
#!/bin/sh
#
# dockerbuild.sh
#
# Build the software inside a Docker container
#
# @author Nicola Asuni <info@tecnick.com>
# ------------------------------------------------------------------------------

# NOTES:
# This script requires Docker

# EXAMPLE USAGE:
# CVSPATH=project VENDOR=vendorname PROJECT=projectname MAKETARGET=all ./dockerbuild.sh

# Get vendor and project name
: ${CVSPATH:=project}
: ${VENDOR:=vendor}
: ${PROJECT:=project}

# make target to execute
: ${MAKETARGET:=all}

# Name of the base development Docker image
DOCKERDEV=${VENDOR}/dev_${PROJECT}

# Build the base environment and keep it cached locally
docker build --pull --tag ${DOCKERDEV} --file ./resources/Docker/Dockerfile.dev ./resources/Docker/

# Define the project root path
PRJPATH=/root/src/${CVSPATH}/${PROJECT}

# Generate a temporary Dockerfile to build and test the project
# NOTE: The exit status of the RUN command is stored to be returned later,
# so in case of error we can continue without interrupting this script.
cat > Dockerfile.test <<- EOM
FROM ${DOCKERDEV}
RUN mkdir -p ${PRJPATH}
ADD ./ ${PRJPATH}
WORKDIR ${PRJPATH}
RUN make ${MAKETARGET} || (echo \$? > target/make.exit)
EOM

# Define the temporary Docker image name
DOCKER_IMAGE_NAME=${VENDOR}/build_${PROJECT}

# Build the Docker image
docker build --no-cache --tag ${DOCKER_IMAGE_NAME} --file Dockerfile.test .

# Start a container using the newly created Docker image
CONTAINER_ID=$(docker run -d ${DOCKER_IMAGE_NAME})

# Copy all build/test artifacts back to the host
docker cp ${CONTAINER_ID}:"${PRJPATH}/target" ./

# Remove the temporary container and image
docker rm -f ${CONTAINER_ID} || true
docker rmi -f ${DOCKER_IMAGE_NAME} || true
10 changes: 10 additions & 0 deletions resources/Docker/Dockerfile.dev
@@ -0,0 +1,10 @@
# Dockerfile
#
# Linux development environment
#
# Extend the tecnickcom/alldev image defined in
# https://github.com/tecnickcom/alldev
# ------------------------------------------------------------------------------

FROM tecnickcom/alldev
MAINTAINER info@tecnick.com
