Snakemake for SNPs: A pipeline for calling SNPs and quantifying them in an unbiased manner

Snakemake for SNPs is a flexible and user-friendly SNP analysis workflow.

Snakemake for SNPs can be applied to both model and non-model organisms. It maps raw RNA-Seq reads to a reference genome, which can be downloaded from a public database or built by the user. It performs both Allele Specific Expression analysis for SNPs and detection of Differentially Expressed Genes (DEGs), and the two sets of results can then be crossed with each other. Using it requires basic Python programming skills. If you are a beginner at programming, just open the config file and adapt it to your experiments!

If you use our pipeline you need to cite us:

WARNING: adapt the citation to our link:

NOTE: This pipeline was developed on Linux and may not work correctly on other platforms.

Workflow

The usage of this workflow is described in the Snakemake Workflow Catalog.

Quick start

Clone the repository:

git clone https://github.com/AylaScientist/Snakemake_for_SNPs.git

Create the environment:

conda create -n pipeline python=3.7

Activate the environment:

conda activate pipeline

Installation

Install the packages including the bio tools:

pip install git+https://github.com/snakemake/snakemake

conda install -c bioconda snakemake-wrapper-utils

conda install -c bioconda trimmomatic=0.39

conda install -c bioconda fastqc=0.11.9

conda install -c bioconda star=2.7.10a

conda install -c bioconda htseq=0.11.3

conda install -c bioconda picard=2.26

conda install -c bioconda gatk4=4.2.5.0

conda install -c bioconda samtools=1.16

conda install -c bioconda bcftools=1.16

conda install -c bioconda vcftools=0.1.16

conda install -c bioconda htslib=1.16

conda install -c anaconda perl=5.26.2

conda install -c anaconda pandas

conda install -c anaconda scipy

conda install -c anaconda statsmodels

conda install -c anaconda seaborn

conda install -c conda-forge matplotlib

conda install -c conda-forge py-bgzip
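After installing the packages above, it can help to confirm that the command-line tools are visible in the active environment. The following is a minimal sketch, not part of the pipeline: the executable names in `EXPECTED_TOOLS` are assumptions about what each conda package installs (for example, `STAR` for the star package, `htseq-count` for htseq, `bgzip` and `tabix` for htslib), and `missing_tools` is a hypothetical helper name.

```python
import shutil

# Executable names are assumptions about what each conda package installs;
# adjust the list if your installation exposes different names.
EXPECTED_TOOLS = [
    "snakemake", "trimmomatic", "fastqc", "STAR", "htseq-count",
    "picard", "gatk", "samtools", "bcftools", "vcftools",
    "bgzip", "tabix", "perl",
]


def missing_tools(tools=EXPECTED_TOOLS, which=shutil.which):
    """Return the tools from `tools` that `which` cannot find on PATH."""
    return [tool for tool in tools if which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Not found on PATH:", ", ".join(missing))
    else:
        print("All expected tools were found.")
```

Run it with the pipeline environment activated. A tool reported as missing usually means that its conda install step failed or that a different environment is active.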

First test

Set the resources of your system in the config file:

gedit ~/Snakemake_for_SNPs/config/config_main.yaml

Now that the resources are adapted to your computer, perform a dry run of the pipeline with the example data to build the DAG of jobs:

cd ~/Snakemake_for_SNPs/workflow/

snakemake -n

If this step doesn't work, please contact me: ayla.bcn@gmail.com

Run the pipeline with the desired resources.

This example uses 4 threads with 4 GB of memory:

snakemake --use-conda --cores 4
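The dry run and the real run can also be chained from a small Python script, so that the pipeline only starts if the dry run succeeds. This is a sketch under assumptions: `snakemake_command` and `run_pipeline` are hypothetical helper names, and the only Snakemake options used are the ones shown above (`-n`, `--use-conda`, `--cores`).

```python
import subprocess


def snakemake_command(cores=None, dry_run=False, use_conda=False):
    """Build the snakemake argument list without running anything."""
    cmd = ["snakemake"]
    if dry_run:
        cmd.append("-n")
    if use_conda:
        cmd.append("--use-conda")
    if cores is not None:
        cmd += ["--cores", str(cores)]
    return cmd


def run_pipeline(cores, workdir="."):
    """Dry run first; start the real run only if the dry run succeeds."""
    # check=True raises CalledProcessError when snakemake exits with an error,
    # so the second call is never reached after a failed dry run.
    subprocess.run(snakemake_command(dry_run=True), cwd=workdir, check=True)
    subprocess.run(
        snakemake_command(cores=cores, use_conda=True), cwd=workdir, check=True
    )
```

Calling `run_pipeline(4, workdir="workflow")` from the repository root would then correspond to the two commands above.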

Set up configuration for your personal project

Customize the workflow to your needs in the following file:

./config/config_main.yaml

In this file you should also set the species and the databases used for gene, transcript, protein, GO function and KEGG annotation and for mining the data.

Modify the metafiles describing your data and the experiment:

config/Experimental_design.csv

config/Experimental_groups.csv

config/Sample_names.csv

config/Samples_MAE.csv

config/samples.csv

Please note that the columns in the file "Experimental_groups.csv" must be named "Group_1" and "Group_2" so that the Chi-square test can be applied.
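The column-name requirement can be checked before launching the pipeline. The sketch below is not part of the workflow: `missing_group_columns` is a hypothetical helper, and it assumes that the file is comma-separated with the column names in the first row.

```python
import csv
import io

REQUIRED_COLUMNS = ("Group_1", "Group_2")


def missing_group_columns(handle, required=REQUIRED_COLUMNS):
    """Return the required column names absent from the CSV header row."""
    # Assumption: comma-separated file whose first row holds the column names.
    header = next(csv.reader(handle), [])
    present = {name.strip() for name in header}
    return [name for name in required if name not in present]


# Small in-memory example; the sample values are made up for illustration.
example = io.StringIO("Sample,Group_1,Treatment\nA,x,y\n")
print(missing_group_columns(example))
```

For a real check, open config/Experimental_groups.csv with `open(path, newline="")` and pass the file handle to the function. An empty list means that both columns are present.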

For configuring your pseudogenomes

You need to choose two samples from different groups, preferably one sample from the control group and one sample from a treatment group. The SNPs from these samples will be used to construct the pseudogenomes. In the example, the codes of these two samples are GF6 and KS4. To create the pseudogenomes for your experiment, replace these codes with your own in the following files, including in the file names of the *colnames.csv files:

config/Pseudogenome_codes.csv

config/tbGF6_colnames.csv

config/tbKS4_colnames.csv
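Because the sample codes appear in the file names themselves, it is easy to forget to rename one of the files. The sketch below derives the expected file names from a list of codes, following the tb<code>_colnames.csv pattern of the example files. `colnames_files` and `rename_plan` are hypothetical helpers, and the codes AB1 and CD2 are made-up placeholders for your own samples.

```python
from pathlib import PurePosixPath


def colnames_files(sample_codes, config_dir="config"):
    """Return the expected colnames file path for each sample code."""
    return [
        str(PurePosixPath(config_dir) / f"tb{code}_colnames.csv")
        for code in sample_codes
    ]


def rename_plan(old_codes, new_codes, config_dir="config"):
    """Pair each example file with the name it should get for your samples."""
    return list(
        zip(colnames_files(old_codes, config_dir),
            colnames_files(new_codes, config_dir))
    )


# AB1 and CD2 are placeholder codes, not samples from the example data.
for old, new in rename_plan(["GF6", "KS4"], ["AB1", "CD2"]):
    print(old, "->", new)
```

The functions only compute names and do not touch any file. The contents of the files and of config/Pseudogenome_codes.csv still need to be edited by hand.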

Very important: ADD the genome or transcriptome of your species! The example uses the genome of the Nile tilapia, located in the genome folder at the root of the repository.

Evaluation

The pipeline for SNPs has been evaluated on 4 datasets, including 2 non-model organisms (Nile and Mozambique tilapias). WARNING: Put here the link to the article
