<a href="https://colab.research.google.com/github/camgenomicmedicine/GMO7-Jupyter/blob/main/NGS.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

When you start a Colab session, you are starting a new virtual machine. If we want to use notebooks for reproducible research, we need to be able to set up all the programs and import all the data we need for our analysis each time we begin. This might seem like a lot of work, but it is worth doing because it makes our research reproducible. The cells below set up conda, install the programs we need, download the data, and run the analysis.

In [None]:
#@title Set up conda on notebook
%%capture
! wget https://repo.anaconda.com/miniconda/Miniconda3-py37_4.8.2-Linux-x86_64.sh 
! chmod +x Miniconda3-py37_4.8.2-Linux-x86_64.sh 
! bash ./Miniconda3-py37_4.8.2-Linux-x86_64.sh -b -f -p /usr/local 
import sys 
sys.path.append('/usr/local/lib/python3.7/site-packages/')


In [None]:
#@title Install the programs you need
%%capture
# FastQC is a program designed to spot potential problems in high throughput sequencing datasets. 
!conda install -c bioconda fastqc -y
# MultiQC aggregates and summarizes the QC data and alignment log data in a single report
!pip install multiqc
# Trimmomatic: A flexible read trimming tool for Illumina NGS data 
! conda install -c bioconda trimmomatic -y
# Kallisto is a program for quantifying abundances of transcripts from RNA-Seq data, or more generally of target sequences using high-throughput sequencing reads 
! conda install -c bioconda kallisto -y
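
Because `%%capture` hides the installer output, a failed installation can go unnoticed until a later cell breaks. The sketch below is one possible check, not part of the original notebook: it assumes the executables are named `fastqc`, `multiqc`, `trimmomatic` and `kallisto` (the names used in the cells that follow), and the helper name `find_missing_tools` is made up for illustration.

```python
import shutil

# Executable names as they are called later in this notebook (assumed).
REQUIRED_TOOLS = ["fastqc", "multiqc", "trimmomatic", "kallisto"]


def find_missing_tools(tools):
    """Return the tools that cannot be found on the PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


missing = find_missing_tools(REQUIRED_TOOLS)
if missing:
    print("Not found on PATH:", ", ".join(missing))
else:
    print("All required tools are available.")
```

If anything is reported as missing, rerunning the install cell without `%%capture` should show the underlying error message.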

In [None]:
# Get the link to all files from SRA Explorer
!curl -L ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRRXXX/006/SRR8668XXX/SRR8668XXX.fastq.gz -o SRR8668XXX_GSM3639XXX_skin_HS2_Homo_sapiens_RNA-Seq.fastq.gz
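
A download that stops early can leave a truncated file behind, so it may be worth checking each FASTQ file before running QC. The following is a minimal sketch under two assumptions: the file is gzip-compressed, and every record occupies exactly four lines. The function name `count_fastq_reads` and the tiny demo file are hypothetical and only there to make the example runnable.

```python
import gzip
import os
import tempfile


def count_fastq_reads(path):
    """Count records in a gzipped FASTQ file, assuming four lines per record.

    Reading to the end of the file also surfaces a truncated download,
    because gzip raises an error when the compressed stream ends early.
    """
    with gzip.open(path, "rt") as handle:
        n_lines = sum(1 for _ in handle)
    if n_lines % 4 != 0:
        raise ValueError(f"{path}: {n_lines} lines is not a multiple of 4")
    return n_lines // 4


# Demo on a tiny, made-up FASTQ file written to a temporary directory.
demo_records = "@read1\nACGT\n+\nIIII\n@read2\nTTGA\n+\nIIII\n"
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "demo.fastq.gz")
    with gzip.open(demo_path, "wt") as handle:
        handle.write(demo_records)
    demo_count = count_fastq_reads(demo_path)

print(demo_count)  # the demo file holds 2 records
```

In the notebook you would call `count_fastq_reads` on the file written by `curl` instead of the demo file.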

In [None]:
# Pre-alignment QC
!fastqc *.fastq.gz
# Trimming (R1.fastq and R2.fastq are placeholders for your paired-end read files)
!trimmomatic PE -phred33 R1.fastq R2.fastq R1_paired.fq.gz R1_unpaired.fq.gz R2_paired.fq.gz R2_unpaired.fq.gz ILLUMINACLIP:contams_forward_rev.fa:2:30:10 LEADING:3 TRAILING:3 SLIDINGWINDOW:4:15 MINLEN:36
# Pre-alignment MultiQC summary file
!multiqc .
# Pseudoalignment and quantification (expects an index built beforehand with `kallisto index`)
!kallisto quant -i index -o output pairA_1.fastq pairA_2.fastq
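
The Trimmomatic call above packs several settings into one line. As a rough illustration, the sketch below rebuilds the same command from named parameters so that each value is labelled. The function name `trimmomatic_pe_command` and its argument names are invented for this example; the parameter meanings follow the Trimmomatic manual (`ILLUMINACLIP:<adapters>:<seed mismatches>:<palindrome clip threshold>:<simple clip threshold>`, `SLIDINGWINDOW:<window size>:<required quality>`).

```python
def trimmomatic_pe_command(r1, r2, adapters,
                           out_prefix_1="R1", out_prefix_2="R2",
                           seed_mismatches=2, palindrome_threshold=30,
                           simple_threshold=10, leading=3, trailing=3,
                           window_size=4, window_quality=15, min_length=36):
    """Build a paired-end Trimmomatic command as a list of arguments."""
    return [
        "trimmomatic", "PE", "-phred33",
        r1, r2,
        # Paired and unpaired outputs for the forward reads, then the reverse reads.
        f"{out_prefix_1}_paired.fq.gz", f"{out_prefix_1}_unpaired.fq.gz",
        f"{out_prefix_2}_paired.fq.gz", f"{out_prefix_2}_unpaired.fq.gz",
        # Adapter clipping against the given FASTA file.
        f"ILLUMINACLIP:{adapters}:{seed_mismatches}:{palindrome_threshold}:{simple_threshold}",
        # Remove low-quality bases from the start and end of each read.
        f"LEADING:{leading}", f"TRAILING:{trailing}",
        # Cut once the average quality inside the window drops below the threshold.
        f"SLIDINGWINDOW:{window_size}:{window_quality}",
        # Drop reads shorter than this after trimming.
        f"MINLEN:{min_length}",
    ]


cmd = trimmomatic_pe_command("R1.fastq", "R2.fastq", "contams_forward_rev.fa")
print(" ".join(cmd))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)`, which raises an error if the step fails, whereas a failing `!` command lets the notebook carry on.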

In [None]:
# Create a zip file from the MultiQC folder so that it can be downloaded to your local drive (replace <zipFileName> and <directoryPath> with your own values)
!zip -r ./<zipFileName>.zip <directoryPath>
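
The same archiving step can be done from Python, which avoids typing the placeholders into a shell command. This is a sketch only: `zip_directory` is a hypothetical helper built on `shutil.make_archive`, and the folder and archive names in the demo are made up so that the example runs anywhere.

```python
import os
import shutil
import tempfile
import zipfile


def zip_directory(directory, archive_name):
    """Zip `directory` into `<archive_name>.zip` and return the archive path."""
    directory = os.path.abspath(directory)
    return shutil.make_archive(
        os.path.abspath(archive_name),          # archive path without ".zip"
        "zip",
        root_dir=os.path.dirname(directory),    # paths in the zip are relative to this
        base_dir=os.path.basename(directory),   # the folder that gets archived
    )


# Demo with a stand-in results folder inside a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    report_dir = os.path.join(tmp, "multiqc_data")
    os.makedirs(report_dir)
    with open(os.path.join(report_dir, "example.txt"), "w") as handle:
        handle.write("placeholder\n")
    archive_path = zip_directory(report_dir, os.path.join(tmp, "multiqc_results"))
    with zipfile.ZipFile(archive_path) as archive:
        archive_members = archive.namelist()

print(archive_members)
```

Inside Colab, the archive can then be sent to the browser with `from google.colab import files` followed by `files.download(archive_path)`; that module is only available in a Colab runtime.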