# Set up a working environment for running Clairvoyante

The `$HOME/prepare_data.sh` script installs samtools and the Python packages needed for a working environment to run Clairvoyante. It also downloads the trained models, the training dataset, and the testing data for variant calling from the DNAnexus project.

Note that this script runs `git clone` to fetch the latest version of Clairvoyante at run time. If the source code on GitHub is updated, the code in this notebook may need to be updated as well so that it still functions correctly.
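
Because the clone is not pinned to a release, it can help to record which commit was actually fetched. The following is a minimal sketch, assuming the repository was cloned into a `Clairvoyante` directory; `record_commit` is a hypothetical helper name and is not part of Clairvoyante.

```shell
# Hypothetical helper: print the commit hash of the checkout in a directory.
# Defaults to the "Clairvoyante" directory created by the git clone step.
record_commit() {
    local repo_dir="${1:-Clairvoyante}"
    # rev-parse HEAD prints the full 40-character hash of the current commit
    git -C "$repo_dir" rev-parse HEAD
}
```

Running `record_commit Clairvoyante` after the clone prints the hash, which can be saved alongside the results to help reproduce a run later.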


The content of `$HOME/prepare_data.sh`:
```
#!/bin/bash
sudo /anaconda2/bin/pip install intervaltree blosc --no-cache
sudo apt-get install -q samtools
wget -q https://bootstrap.pypa.io/get-pip.py
sudo -H pypy get-pip.py
sudo -H pypy -m pip install -q blosc --no-cache
sudo -H pypy -m pip install -q intervaltree --no-cache

export PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/snap/bin
export PYTHONPATH="/usr/share/dnanexus/lib/python2.7/site-packages:/usr/share/dnanexus/lib/python2.7/site-packages:/usr/share/dnanexus/lib/python2.7/site-packages:/usr/share/dnanexus/lib/python2.7/site-packages:"
source $HOME/environment
dx select $DX_PROJECT_CONTEXT_ID
git clone --depth=1 https://github.com/aquaskyline/Clairvoyante.git
cd Clairvoyante
dx download $DX_PROJECT_CONTEXT_ID:/cv_data/trainedModels.tbz
tar -jxf trainedModels.tbz

dx download $DX_PROJECT_CONTEXT_ID:/cv_data/training.tar.gz
tar -zxf training.tar.gz

dx download $DX_PROJECT_CONTEXT_ID:/cv_data/testingData.tar.gz
tar -zxf testingData.tar.gz
```
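
After the script finishes, the later cells expect the `trainedModels`, `training`, and `testingData` directories to exist inside the `Clairvoyante` checkout. A minimal sketch of a sanity check follows; `check_data_dirs` is a hypothetical helper name, and the directory names are taken from the paths used in the variant-calling cells.

```shell
# Hypothetical helper: verify that the extracted data directories are present.
# Returns 0 if all directories exist, 1 if any is missing.
check_data_dirs() {
    local root="${1:-Clairvoyante}"
    local missing=0
    local d
    for d in trainedModels training testingData; do
        if [ ! -d "$root/$d" ]; then
            echo "missing: $root/$d" >&2
            missing=1
        fi
    done
    return "$missing"
}
```

Calling `check_data_dirs Clairvoyante` reports each missing directory on standard error, so a failed download or extraction is noticed before any variant calling starts.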


The data in the DNAnexus project originally came from these URLs:
```
http://www.bio8.cs.hku.hk/trainedModels.tbz
https://www.dropbox.com/s/twxe6kyv6k3owz4/training.tar.gz
https://www.dropbox.com/s/0s26oru0hut9edc/testingData.tar.gz
```

## Execute `prepare_data.sh`

This may take a while.

In [None]:
%%bash
bash $HOME/prepare_data.sh

# Call variants

## Call variants at known variant sites using a BAM file and a trained model

In [None]:
%%bash
export PATH=/anaconda2/bin:$PATH
cd Clairvoyante/training
python ../clairvoyante/callVarBam.py \
       --chkpnt_fn ../trainedModels/fullv3-illumina-novoalign-hg001+hg002-hg38/learningRate1e-3.epoch500 \
       --bam_fn ../testingData/chr21/chr21.bam \
       --ref_fn ../testingData/chr21/chr21.fa \
       --call_fn tensor_can_chr21.vcf \
       --ctgName chr21
head -100 tensor_can_chr21.vcf
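
Since `head -100` shows only the start of the file, a record count gives a quick sense of how many calls were made. A minimal sketch, assuming the output follows the usual VCF convention that header lines start with `#`; `count_vcf_records` is a hypothetical helper name.

```shell
# Hypothetical helper: count the variant records in a VCF file.
# Every line that does not start with '#' is treated as one record.
count_vcf_records() {
    grep -vc '^#' "$1"
}
```

For example, `count_vcf_records tensor_can_chr21.vcf` run from `Clairvoyante/training` prints the number of non-header lines in the call file.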

## Call variants from the tensors of candidate variants and a trained model

In [None]:
%%bash
export PATH=/anaconda2/bin:$PATH
cd Clairvoyante/training
python ../clairvoyante/callVar.py \
       --chkpnt_fn ../trainedModels/fullv3-illumina-novoalign-hg001+hg002-hg38/learningRate1e-3.epoch500 \
       --tensor_fn tensor_can_chr21 \
       --call_fn tensor_can_chr21.vcf
head -100 tensor_can_chr21.vcf
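
To look only at the more confident calls, the records can be filtered on the QUAL column, which is the sixth tab-separated field in a VCF file. A minimal sketch, assuming the call file carries a numeric QUAL value for each record; `filter_vcf_by_qual` is a hypothetical helper name and the threshold is for illustration only.

```shell
# Hypothetical helper: keep header lines and records whose QUAL is at least
# the given threshold. Records with a missing QUAL (".") are dropped.
filter_vcf_by_qual() {
    local vcf="$1" min_qual="$2"
    awk -F'\t' -v min="$min_qual" '
        /^#/ { print; next }
        $6 != "." && $6 + 0 >= min + 0
    ' "$vcf"
}
```

For example, `filter_vcf_by_qual tensor_can_chr21.vcf 30 > filtered.vcf` writes the header plus the records with QUAL of 30 or more to a new file and leaves the original untouched.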