# Germline Analysis Blueprint 

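This notebook shows how to run germline variant analysis on whole-exome sequencing (WES) data: reads are aligned with Parabricks fq2bam, and variants are then called with Parabricks DeepVariant.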

## Dataset

The dataset used in this lab is an NA12878 exome from the [NIH](https://ftp-trace.ncbi.nlm.nih.gov/ReferenceSamples/giab/data_indexes/NA12878/sequence.index.NA12878_Illumina_HiSeq_Exome_Garvan_fastq_09252015), sequenced on an Illumina HiSeq. The two FASTQ files and the reference files can be downloaded with the `download_data.sh` script.

In [None]:
! ./download_data.sh

Downloading and organizing the data takes around 15 minutes. Once the script finishes, list the `data` directory to confirm that the files are in place.

In [None]:
! ls data
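
Before starting the alignment, it can help to confirm that each input file exists and is non-empty. The following is a minimal sketch, not part of the Parabricks tooling: `check_inputs` is a hypothetical helper, and the paths are the ones used by the cells in this notebook, assuming `download_data.sh` places them under `data`.

```shell
# Hypothetical helper: report each file as ok or MISSING.
# Returns non-zero if any file is absent or empty.
check_inputs() {
    status=0
    for f in "$@"; do
        if [ -s "$f" ]; then
            echo "ok       $f"
        else
            echo "MISSING  $f"
            status=1
        fi
    done
    return $status
}

# Paths assumed from the alignment cell in this notebook.
check_inputs \
    data/ref/Homo_sapiens_assembly38.fasta \
    data/NIST7035_TAAGGCGA_L001_R1_001.fastq.gz \
    data/NIST7035_TAAGGCGA_L001_R2_001.fastq.gz \
    || echo "Some inputs are missing; consider re-running download_data.sh"
```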

## Alignment

In this step we align the paired FASTQ files to the reference with BWA, using the [Parabricks fq2bam](https://docs.nvidia.com/clara/parabricks/latest/documentation/tooldocs/man_fq2bam.html) tool.

In [None]:
%%sh

DOCKER_IMAGE="nvcr.io/nvidia/clara/clara-parabricks:4.4.0-1"

DATA_DIR="$PWD/data"

REF="ref/Homo_sapiens_assembly38.fasta"
FASTQ_1="NIST7035_TAAGGCGA_L001_R1_001.fastq.gz"
FASTQ_2="NIST7035_TAAGGCGA_L001_R2_001.fastq.gz"
OUT_BAM="NIST7035_TAAGGCGA_L001_R1_001.bam"

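# Mount the data directory at the same path inside the container and use it
# as the working directory, so the relative paths above resolve.
docker run --gpus all --rm \
    -v "${DATA_DIR}:${DATA_DIR}" \
    -w "${DATA_DIR}" \
    "${DOCKER_IMAGE}" pbrun fq2bam \
    --ref "${REF}" \
    --in-fq "${FASTQ_1}" "${FASTQ_2}" \
    --out-bam "${OUT_BAM}"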

The data folder now contains the generated BAM files.

In [None]:
! ls data | grep -F .bam
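
Listing the file shows that it exists, but not that it is a readable BAM. BAM files are BGZF-compressed (gzip-compatible) and their decompressed content begins with the bytes `BAM`, so a quick sanity check can be sketched as below. `is_bam` is a hypothetical helper, and it only inspects the first few bytes; it does not detect truncation or validate the records.

```shell
# Hypothetical helper: succeeds if the file decompresses with gzip
# and its first three decompressed bytes are "BAM".
is_bam() {
    magic=$(gzip -dc "$1" 2>/dev/null | head -c 3) || true
    [ "$magic" = "BAM" ]
}

# Output path assumed from the alignment cell in this notebook.
bam="data/NIST7035_TAAGGCGA_L001_R1_001.bam"
if is_bam "$bam"; then
    echo "$bam looks like a BAM file"
else
    echo "$bam is missing or does not look like a BAM file"
fi
```

If samtools happens to be installed, `samtools quickcheck` performs a more thorough check, including detection of a missing end-of-file block.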


## Variant Calling

In this step we call variants from the aligned reads with [Parabricks DeepVariant](https://docs.nvidia.com/clara/parabricks/latest/documentation/tooldocs/man_deepvariant.html). Because the input is exome data, we must include the `--use-wes-model` flag.

In [None]:
%%sh

DOCKER_IMAGE="nvcr.io/nvidia/clara/clara-parabricks:4.4.0-1"

DATA_DIR="$PWD/data"

REF="ref/Homo_sapiens_assembly38.fasta"
IN_BAM="NIST7035_TAAGGCGA_L001_R1_001.bam"
OUT_VCF="NIST7035_TAAGGCGA_L001_R1_001.vcf"

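# Mount the data directory at the same path inside the container and use it
# as the working directory, so the relative paths above resolve.
docker run --gpus all --rm \
    -v "${DATA_DIR}:${DATA_DIR}" \
    -w "${DATA_DIR}" \
    "${DOCKER_IMAGE}" pbrun deepvariant \
    --ref "${REF}" \
    --in-bam "${IN_BAM}" \
    --out-variants "${OUT_VCF}" \
    --use-wes-model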

The data folder now contains the generated VCF file.

In [None]:
! ls data | grep -F .vcf
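
To get a first impression of the calls, the records in the VCF can be tallied by their FILTER value. In the VCF format, header lines start with `#` and FILTER is the seventh tab-separated column. The sketch below assumes an uncompressed VCF at the path used in the cell above; `summarize_vcf` is a hypothetical helper, and the set of FILTER values that appears depends on the caller.

```shell
# Hypothetical helper: count non-header records per FILTER value (column 7).
summarize_vcf() {
    awk -F '\t' '!/^#/ { n[$7]++ } END { for (f in n) print f "\t" n[f] }' "$1" | sort
}

# Output path assumed from the variant calling cell in this notebook.
vcf="data/NIST7035_TAAGGCGA_L001_R1_001.vcf"
if [ -f "$vcf" ]; then
    summarize_vcf "$vcf"
else
    echo "$vcf not found"
fi
```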

## Next Steps

Try running germline analysis on your own data by replacing the FASTQ and reference file names in the cells above.
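
When switching to another sample, the variables in the cells above have to be kept consistent with each other. As a sketch, they can be derived from the R1 FASTQ name. `derive_names` is a hypothetical helper, and it assumes the naming convention used in this notebook: the mate file differs only in `_R1_` versus `_R2_`, and the outputs reuse the R1 name without `.fastq.gz`.

```shell
# Hypothetical helper: print the variable assignments for one sample,
# given the name of its R1 FASTQ file.
derive_names() {
    r1=$1
    stem=${r1%.fastq.gz}                                  # strip the extension
    r2=$(printf '%s\n' "$r1" | sed 's/_R1_/_R2_/')        # mate file name
    printf 'FASTQ_1=%s\nFASTQ_2=%s\nOUT_BAM=%s.bam\nOUT_VCF=%s.vcf\n' \
        "$r1" "$r2" "$stem" "$stem"
}

derive_names "NIST7035_TAAGGCGA_L001_R1_001.fastq.gz"
```

For the sample in this notebook, this reproduces the file names used in the alignment and variant calling cells.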