<a href="https://colab.research.google.com/github/DCEG-workshops/statgen_workshop_tutorial/blob/main/src/09_functionalGenomics.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Set up

Important: We need to mount the Google Drive folder that holds the data needed for this workshop. Please open this [link](https://colab.research.google.com/corgiredirector?site=https%3A%2F%2Fdrive.google.com%2Fdrive%2Ffolders%2F1rui3w4tok2Z7EhtMbz6PobeC_fDxTw7G%3Fusp%3Dsharing) with your Google account and find the "statgen_workshop" folder under "Shared with me". Then add a shortcut to the folder under "My Drive", as shown in the screenshot.

In [None]:
from google.colab import drive
drive.mount('/content/drive/')

Let's look at the input data directory

In [None]:
!ls /content/drive/MyDrive/statgen_workshop/data/workshop9

Set up the path variables

In [None]:
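import os

# Output directory for this workshop and the shared input data on Google Drive
analysis_dir = "/content/09_analysis/"
input_dir = "/content/drive/MyDrive/statgen_workshop/data/workshop9"

# Export both paths so the %%bash cells and the workshop scripts can read them
os.environ['analysis_dir'] = analysis_dir
os.environ['input_dir'] = input_dir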

Let's clone the tutorial repo

In [None]:
%%bash
git clone https://github.com/DCEG-workshops/statgen_workshop_tutorial.git

Let's install cutadapt

In [None]:
!pip install cutadapt

See if we can run cutadapt

In [None]:
!cutadapt

Let's install minimap2

In [None]:
%%bash
git clone https://github.com/lh3/minimap2
cd minimap2 && make

See if we can run minimap2

In [None]:
! ./minimap2/minimap2

Let's install samtools

In [None]:
%%bash
wget https://github.com/samtools/samtools/releases/download/1.17/samtools-1.17.tar.bz2 && \
	tar jxf samtools-1.17.tar.bz2 && \
	rm samtools-1.17.tar.bz2 && \
	cd samtools-1.17 && \
	./configure --prefix $(pwd) && \
	make

See if we can run samtools

In [None]:
! ./samtools-1.17/samtools

Add minimap2 and samtools executables to the PATH

In [None]:
os.environ['PATH'] += ":/content/samtools-1.17/:/content/minimap2"
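
Before moving on, it can help to confirm that all three tools now resolve through `PATH`. The following is a minimal sketch using Python's standard `shutil.which`; the helper name `missing_tools` is hypothetical and not part of the workshop repository.

```python
import shutil

def missing_tools(names):
    """Return the subset of `names` that cannot be found on PATH."""
    return [name for name in names if shutil.which(name) is None]

# Tools installed earlier in this notebook
required = ["cutadapt", "minimap2", "samtools"]
missing = missing_tools(required)
if missing:
    print("Not found on PATH:", ", ".join(missing))
else:
    print("All tools found on PATH.")
```

If a tool is reported as missing, re-run its install cell and the `PATH` cell above.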

# Extract regions from the assemblies

Take a look at script 1

In [None]:
%%bash
cat statgen_workshop_tutorial/src/09_functionalGenomics/script1_toRetrieveGenomeRegion_HPRC_DCEG_GSTM.sh

Run script 1

In [None]:
%%bash
bash statgen_workshop_tutorial/src/09_functionalGenomics/script1_toRetrieveGenomeRegion_HPRC_DCEG_GSTM.sh

Take a look at the files generated

In [None]:
%%bash
ls ${analysis_dir}/retrievedRegions/hg38_chr1_109655000_109742000/fasta_files/
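
Beyond listing the files, a quick count of sequence records per file shows whether the extraction produced non-empty FASTA output. This is a rough sketch that assumes the files in `fasta_files/` are plain-text (uncompressed) FASTA; the helper `count_fasta_records` is hypothetical and simply counts header lines starting with `>`.

```python
import glob
import os

def count_fasta_records(path):
    """Count sequence records in a FASTA file by counting '>' header lines."""
    # errors="replace" keeps the loop from failing on unexpected bytes
    with open(path, errors="replace") as handle:
        return sum(1 for line in handle if line.startswith(">"))

# Assumption: script 1 wrote plain-text FASTA files into this directory
fasta_dir = os.path.join(
    os.environ.get("analysis_dir", "/content/09_analysis/"),
    "retrievedRegions/hg38_chr1_109655000_109742000/fasta_files",
)
for fasta_path in sorted(glob.glob(os.path.join(fasta_dir, "*"))):
    if os.path.isfile(fasta_path):
        print(os.path.basename(fasta_path), count_fasta_records(fasta_path))
```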

# Aligning with hg38, generating BAM files for visualization in IGV

Take a look at script 2

In [None]:
!cat statgen_workshop_tutorial/src/09_functionalGenomics/script2_toAlignFastaToReference_HPRC_DCEG_GSTM.sh

Run script 2

In [None]:
%%bash
bash statgen_workshop_tutorial/src/09_functionalGenomics/script2_toAlignFastaToReference_HPRC_DCEG_GSTM.sh

Check that the BAM file was generated

In [None]:
! ls /content/09_analysis/retrievedRegions/hg38_chr1_109655000_109742000/bam_files/HPRC.cb.bam
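
IGV needs both the BAM file and its `.bai` index, so it is worth checking that both exist and are not empty. A minimal sketch follows; the helper `nonempty_file` is hypothetical, and the index path is assumed to be the BAM path plus `.bai`, as used in the IGV cell of this notebook.

```python
import os

def nonempty_file(path):
    """Return True if `path` is a regular file with a size greater than zero."""
    return os.path.isfile(path) and os.path.getsize(path) > 0

bam_path = os.path.join(
    os.environ.get("analysis_dir", "/content/09_analysis/"),
    "retrievedRegions/hg38_chr1_109655000_109742000/bam_files/HPRC.cb.bam",
)
# Check the alignment file and its index
for path in (bam_path, bam_path + ".bai"):
    status = "OK" if nonempty_file(path) else "missing or empty"
    print(f"{path}: {status}")
```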

Let's use IGV to visualize it. First, we will install igv-notebook

In [None]:
!pip install igv-notebook

Initialize igv-notebook and open a browser on the extracted region

In [None]:
import igv_notebook
igv_notebook.init()
igv_browser = igv_notebook.Browser(
    {
        "genome": "hg38",
        "locus": "chr1:109655000-109742000",
        "tracks": [{
            "name": "BAM",
            "path": "/content/09_analysis/retrievedRegions/hg38_chr1_109655000_109742000/bam_files/HPRC.cb.bam",
            "indexPath": "/content/09_analysis/retrievedRegions/hg38_chr1_109655000_109742000/bam_files/HPRC.cb.bam.bai",
            "format": "bam",
            "type": "alignment"
        }]
    }
)