Genomics Pipeline

Overview

Genomic Pipeline is a software tool for analyzing DNA sequence data.

Our pipeline provides multiple algorithms for every step of sequencing analysis and lets users compare these alternative algorithms through their intermediate outputs. It starts from a raw FASTQ file (the text file produced by sequencing machines) and ends with VCF and TXT files (detailed files of the discovered SNPs/indels). It includes different types of mapping, pre-processing (such as GATK Best Practices), variant calling, and variant annotation algorithms. A brief workflow of the basic pipeline is shown in the figure below.

Getting Started

Prerequisites

UNIX/Linux
Python 3 or higher
Java
Picard
GenomeAnalysisToolkit (GATK 3.5-0)
VarScan.v2.3.9
Samtools
Annovar

Getting your clone

$ git clone https://github.com/MBaysanLab/GenomicPipeline

If you do not have git, you can download a zipped copy of Genomic Pipeline here.

Files

For tumor samples, each FASTQ file must be gzipped and named according to the convention SampleName_SampleNumber_Lane_Read_FlowCellIndex.fastq.gz (e.g. SN01_AACGTGA_L001_R1_001.fastq.gz).

For normal samples, each FASTQ file must be gzipped and named according to the convention SampleName_Germline_SampleNumber_Lane_Read_FlowCellIndex.fastq.gz (e.g. SN02_Germline_ACGGTGA_L001_R1_001.fastq.gz).
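
The two naming conventions above can be checked before a run. The following is a minimal sketch and not part of the pipeline itself: `FastqName` and `parse_fastq_name` are hypothetical names, and the pattern assumes that no individual field contains an underscore.

```python
import re
from typing import NamedTuple, Optional


class FastqName(NamedTuple):
    """Fields of a FASTQ file name (hypothetical helper type)."""
    sample_name: str
    is_germline: bool
    sample_number: str
    lane: str
    read: str
    flow_cell_index: str


# Assumption: fields are separated by "_" and never contain "_" themselves.
_FASTQ_PATTERN = re.compile(
    r"^(?P<sample_name>[^_]+)_"
    r"(?P<germline>Germline_)?"      # present only for normal samples
    r"(?P<sample_number>[^_]+)_"
    r"(?P<lane>[^_]+)_"
    r"(?P<read>[^_]+)_"
    r"(?P<flow_cell_index>[^_]+)\.fastq\.gz$"
)


def parse_fastq_name(filename: str) -> Optional[FastqName]:
    """Return the parsed fields, or None if the name breaks the convention."""
    match = _FASTQ_PATTERN.match(filename)
    if match is None:
        return None
    return FastqName(
        sample_name=match.group("sample_name"),
        is_germline=match.group("germline") is not None,
        sample_number=match.group("sample_number"),
        lane=match.group("lane"),
        read=match.group("read"),
        flow_cell_index=match.group("flow_cell_index"),
    )
```

A file that is not gzipped or that is missing a field returns `None`, so a misnamed input can be caught before mapping starts.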

Usage

After you install the required tools and add them to your system path, edit the paths.py file so that it points to the corresponding directories.
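
To confirm that command-line tools are reachable from the system path before editing the paths file, a small check along these lines may help. This is only a sketch: `missing_tools` is a hypothetical helper, and the executable names in `EXAMPLE_TOOLS` are assumptions about how the tools are installed on your machine. Jar-based tools such as Picard, GATK and VarScan are not covered by this check.

```python
import shutil
from typing import Iterable, List

# Assumed executable names; adjust to match your installation.
EXAMPLE_TOOLS = ["java", "samtools", "bwa", "bowtie2"]


def missing_tools(names: Iterable[str]) -> List[str]:
    """Return the names that cannot be found on the system path."""
    return [name for name in names if shutil.which(name) is None]
```

Calling `missing_tools(EXAMPLE_TOOLS)` returns an empty list when every listed executable is found.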

The pipeline is divided into two independent sub-pipelines:

The first covers mapping and pre-processing. For this sub-pipeline, you only need to edit the main function of run_pipeline_mapping.py.
- Then run the Python file in an IDE or use the command below:
```
python run_pipeline_mapping.py
```
The second covers variant calling and variant annotation. For this sub-pipeline, you only need to edit the main function of run_pipeline_variant_calling.py.
- Then run the Python file in an IDE or use the command below:
```
python run_pipeline_variant_calling.py
```
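
If you prefer to launch both stages from one place, the two commands above can be wrapped in a short driver. This is a sketch under assumptions: `build_command` and `run_all` are hypothetical helpers, and both scripts are assumed to be in the current working directory with their main functions already edited.

```python
import subprocess
import sys
from typing import List, Sequence

# Script names taken from the commands above; mapping runs first.
SCRIPTS = ["run_pipeline_mapping.py", "run_pipeline_variant_calling.py"]


def build_command(script: str) -> List[str]:
    """Build the command that runs a script with the current interpreter."""
    return [sys.executable, script]


def run_all(scripts: Sequence[str] = SCRIPTS) -> None:
    """Run each script in order, stopping at the first failure."""
    for script in scripts:
        # check=True raises CalledProcessError on a non-zero exit code.
        subprocess.run(build_command(script), check=True)
```

Calling `run_all()` runs mapping and then variant calling. Because of `check=True`, variant calling does not start if mapping fails.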

You can now:

- Map each sample and align it to the corresponding reference genome with the BWA and Bowtie2 algorithms.
- Run the pre-processing steps for variant calling with SAMtools and GATK Best Practices.
- Call variants on tumor and normal samples with VarScan and Mutect2 to find mutations and variants in the sample.
- Annotate variants with Annovar to find mutations in the corresponding genes, relate them to past studies, and get statistics for them.

Important Notes

When you start using the pipeline, map your germline/normal samples before the tumor samples, because variant calling needs the outputs of both the germline and the tumor files together.
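
One way to respect this order when working through a folder of inputs is to sort the file names so that germline samples come first. This is a minimal sketch: `order_for_mapping` is a hypothetical helper, and it relies only on the `_Germline_` token from the naming convention in the Files section.

```python
from typing import Iterable, List


def order_for_mapping(filenames: Iterable[str]) -> List[str]:
    """Return file names with germline/normal samples before tumor samples."""
    # False sorts before True, so names containing "_Germline_" come first.
    # sorted() is stable, so the original order is kept within each group.
    return sorted(filenames, key=lambda name: "_Germline_" not in name)
```

For example, passing the tumor file SN01_AACGTGA_L001_R1_001.fastq.gz followed by the normal file SN02_Germline_ACGGTGA_L001_R1_001.fastq.gz returns the normal file first.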

Contact

If you have questions or need help using the pipeline, you can contact us via e-mail.

