# Germline BrevDev Blueprint

NVIDIA Parabricks® is the only GPU-accelerated computational genomics toolkit that delivers fast and accurate analysis for sequencing centers, clinical teams, genomics researchers, and next-generation sequencing instrument developers. Parabricks provides GPU-accelerated versions of tools used every day by computational biologists and bioinformaticians—enabling significantly faster runtimes, workflow scalability, and lower compute costs.

The toolkit is compatible with common workflow languages and managers (WDL, Nextflow, Cromwell), so GPU- and CPU-powered tasks can be combined in a single workflow, and it supports cloud deployment on AWS, GCP, Terra, and DNAnexus.

[Workflow diagram]

## Dataset

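The sample data is downloaded from the Parabricks sample bucket as a single archive, `parabricks_sample.tar.gz`, and unpacked into `parabricks_sample/`.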

In [None]:
! wget -O parabricks_sample.tar.gz "https://s3.amazonaws.com/parabricks.sample/parabricks_sample.tar.gz"

! tar xzvf parabricks_sample.tar.gz

In [None]:
! tree parabricks_sample

data
└── output

1 directory, 0 files
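
Before running the pipeline it can help to confirm that the archive unpacked as expected. The sketch below checks for the four input files that the Alignment cell refers to; `check_inputs` is a hypothetical helper written for this notebook, not part of Parabricks.

```shell
# Hypothetical helper: report every path that is missing or empty.
# Returns non-zero if at least one path fails the check.
check_inputs() {
    missing=0
    for f in "$@"; do
        if [ ! -s "$f" ]; then
            echo "missing or empty: $f" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Paths copied from the Alignment cell in this notebook.
DATA_DIR="parabricks_sample/Data"
REF_DIR="parabricks_sample/Ref"

if check_inputs \
    "${REF_DIR}/Homo_sapiens_assembly38.fasta" \
    "${REF_DIR}/Homo_sapiens_assembly38.known_indels.vcf.gz" \
    "${DATA_DIR}/sample_1.fq.gz" \
    "${DATA_DIR}/sample_2.fq.gz"; then
    echo "all inputs present"
else
    echo "some inputs are missing; re-run the download cell" >&2
fi
```

The check uses `-s` rather than `-f` so that a truncated download, which leaves an empty file, is reported as well.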


Add an output directory

## Alignment

In [None]:
%%sh

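DATA_DIR="parabricks_sample/Data"
REF_DIR="parabricks_sample/Ref"
REF="${REF_DIR}/Homo_sapiens_assembly38.fasta"
KNOWN_SITES="${REF_DIR}/Homo_sapiens_assembly38.known_indels.vcf.gz"
FASTQ_1="${DATA_DIR}/sample_1.fq.gz"
FASTQ_2="${DATA_DIR}/sample_2.fq.gz"

# DOCKER_IMAGE must already be set in the notebook environment to the
# Parabricks container image; stop early with a clear message if it is not.
: "${DOCKER_IMAGE:?DOCKER_IMAGE is not set}"

# Mount the current directory at the same path inside the container and make
# it the working directory so the relative paths above resolve.
# fq2bam performs the alignment step on its own; pbrun germline would also
# require --out-variants, and variant calling is run separately below.
docker run --gpus all --rm \
    -v "$(pwd)":"$(pwd)" \
    -w "$(pwd)" \
    "${DOCKER_IMAGE}" pbrun fq2bam \
    --ref ${REF} \
    --in-fq ${FASTQ_1} ${FASTQ_2} \
    --knownSites ${KNOWN_SITES} \
    --out-bam ${DATA_DIR}/sample.bam \
    --out-recal-file ${DATA_DIR}/sample.recal.txt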


## Variant Calling

In [None]:
%%sh

DATA_DIR="parabricks_sample/Data"
REF_DIR="parabricks_sample/Ref"
REF="${REF_DIR}/Homo_sapiens_assembly38.fasta"

# DOCKER_IMAGE must already be set in the notebook environment.
: "${DOCKER_IMAGE:?DOCKER_IMAGE is not set}"

# Call variants from the BAM and recalibration table written by the
# alignment step. The working directory is set so relative paths resolve.
docker run --gpus all --rm \
    -v "$(pwd)":"$(pwd)" \
    -w "$(pwd)" \
    "${DOCKER_IMAGE}" pbrun haplotypecaller \
    --ref ${REF} \
    --in-bam ${DATA_DIR}/sample.bam \
    --in-recal-file ${DATA_DIR}/sample.recal.txt \
    --out-variants ${DATA_DIR}/sample.vcf
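
A quick sanity check on the resulting VCF is to count its records. The sketch below uses only `grep`, `awk`, and `sort`, and assumes the uncompressed VCF path written by the cell above; `count_variants` and `variants_per_contig` are hypothetical helper names.

```shell
# Count variant records, i.e. every line that is not a '#' header line.
count_variants() {
    grep -vc '^#' "$1"
}

# Tally records per contig (first VCF column), one "contig<TAB>count" per line.
variants_per_contig() {
    awk '!/^#/ { n[$1]++ } END { for (c in n) print c "\t" n[c] }' "$1" | sort
}

# Output path taken from the variant-calling cell in this notebook.
VCF="parabricks_sample/Data/sample.vcf"

if [ -f "$VCF" ]; then
    echo "records: $(count_variants "$VCF")"
    variants_per_contig "$VCF"
else
    echo "no VCF found at $VCF; run the variant-calling cell first" >&2
fi
```

An empty or header-only file here usually means the caller did not finish, which is worth knowing before moving on to the accuracy check.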

## Check Accuracy

In [None]:
%%sh

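# Fill in these paths (relative to /data) before running the cell.
EVAL_VCF=""     # VCF produced by the variant-calling step
TRUTH_VCF=""    # truth-set VCF to compare against
TRUTH_BED=""    # confident regions for the truth set
OUT_FILE="$(basename -s .vcf "$EVAL_VCF").output"

# hap.py takes the truth VCF first and the query VCF second.
# NOTE: the reference given to -r must be the same build used for alignment
# and calling. The cells above use Homo_sapiens_assembly38, while this path
# points at hg19, so adjust it to match.
/opt/hap.py/bin/hap.py \
    /data/${TRUTH_VCF} \
    /data/${EVAL_VCF} \
    -f /data/${TRUTH_BED} \
    -r /data/ref/ucsc.hg19.fasta \
    -o /data/${OUT_FILE} \
    --engine=vcfeval \
    --pass-only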
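
hap.py writes a summary table next to its other outputs, named after the `-o` prefix with a `.summary.csv` suffix. The sketch below pulls recall, precision, and F1 out of that file. It looks columns up by header name, and the names used (`Type`, `Filter`, `METRIC.Recall`, `METRIC.Precision`, `METRIC.F1_Score`) are an assumption about the summary layout that should be checked against the file itself. `summarize_happy` and the `SUMMARY` variable are hypothetical names.

```shell
# Print "Type Filter Recall Precision F1" for each row of a hap.py summary CSV.
# Columns are located by header name, so their order in the file does not matter.
summarize_happy() {
    awk -F, '
        NR == 1 {
            for (i = 1; i <= NF; i++) col[$i] = i
            next
        }
        {
            print $(col["Type"]), $(col["Filter"]), \
                  $(col["METRIC.Recall"]), $(col["METRIC.Precision"]), \
                  $(col["METRIC.F1_Score"])
        }
    ' "$1"
}

# SUMMARY is a hypothetical variable: set it to <output prefix>.summary.csv.
SUMMARY="${SUMMARY:-}"

if [ -n "$SUMMARY" ] && [ -f "$SUMMARY" ]; then
    summarize_happy "$SUMMARY"
else
    echo "set SUMMARY to the hap.py summary CSV to print the metrics" >&2
fi
```

If one of the assumed header names is absent, awk falls back to printing the whole row for that field, which makes a mismatch easy to spot.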

## Next Steps