Repository for the annotation pipeline used at PGSB. Tested on plant genomes.
Most dependencies should be covered by the provided conda environment:
- install anaconda from anaconda.org
- install bioconda (see bioconda)
- clone repository
git clone https://github.com/PGSB-HMGU/plant.annot.git
- cd into plant.annot directory
cd plant.annot
- create plant.annot environment using the provided yaml file
conda env create --file=plant.annot.yaml
- activate environment
conda activate plant.annot
- download and install TransDecoder 3.0.1 from the TransDecoder project
- download the transposon database PTREP (hypothetical TREP protein sequences)
- download reference proteins from UniProt
- download reference proteins from closely related species
- protein sequences must include stop codons, written as '*'
- build hisat2 index
- build gmap index
- split large chromosomes into single files to speed up gth
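The index-building and chromosome-splitting steps above could look roughly like the following sketch. The file name genome.fa, the index base name genome_index, the GMAP directory and database names (gmap_index, genome_db), the output directory chromosomes and the helper split_fasta are placeholders chosen for illustration, not names used by the pipeline; check config.yaml for what it actually expects. The index commands are only called if hisat2-build and gmap_build are found on the PATH and the genome file exists.

```shell
#!/bin/sh
set -eu

# Placeholder names (assumptions, adjust to your data and config.yaml).
GENOME="${GENOME:-genome.fa}"
SPLIT_DIR="${SPLIT_DIR:-chromosomes}"

# Write every record of a multi-FASTA file ($1) to its own file in the
# directory $2, named after the first word of the header line
# (e.g. ">chr1 some description" -> chr1.fa).
# Assumes sequence names are unique and contain no "/" characters.
split_fasta() {
    mkdir -p "$2"
    awk -v dir="$2" '
        /^>/ {
            if (out != "") close(out)
            name = substr($1, 2)
            out = dir "/" name ".fa"
        }
        out != "" { print > out }
    ' "$1"
}

if [ -f "$GENOME" ]; then
    # HISAT2 index: hisat2-build <reference fasta> <index base name>
    if command -v hisat2-build >/dev/null 2>&1; then
        hisat2-build "$GENOME" genome_index
    fi
    # GMAP index: gmap_build -D <index directory> -d <database name> <fasta>
    if command -v gmap_build >/dev/null 2>&1; then
        gmap_build -D gmap_index -d genome_db "$GENOME"
    fi
    # One FASTA file per chromosome for gth
    split_fasta "$GENOME" "$SPLIT_DIR"
fi
```

Splitting per sequence keeps each gth job small; whether the pipeline expects exactly this file layout is not stated here, so treat the output naming as an assumption.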
- edit config.yaml
- define ISOseq data as described in config.yaml
- define reference proteins as described in config.yaml
- define RNAseq data as described in config.yaml
- review executable section in config.yaml
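Before the dry run, the requirement that reference protein sequences carry the '*' stop symbol can be checked with a short script. This is a minimal sketch and not part of the pipeline: the function name missing_stop and the file name reference_proteins.fa are placeholders, and it assumes the requirement means that every sequence ends with '*'.

```shell
#!/bin/sh
set -eu

# Print the header line of every FASTA record in file $1 whose
# sequence does not end with the stop symbol '*'.
missing_stop() {
    awk '
        function report() {
            if (header != "" && seq !~ /\*$/) print header
        }
        /^>/ { report(); header = $0; seq = ""; next }
        {
            sub(/\r$/, "")      # tolerate Windows line endings
            seq = seq $0        # join wrapped sequence lines
        }
        END { report() }
    ' "$1"
}

# Placeholder file name (assumption); point this at your own protein FASTA.
PROTEINS="${PROTEINS:-reference_proteins.fa}"
if [ -f "$PROTEINS" ]; then
    missing_stop "$PROTEINS"
fi
```

If the script prints nothing, every record ends with '*'. Any printed header names a sequence to fix or remove before it is listed in config.yaml.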
- perform a dry run
snakemake final_files -np
- run locally
snakemake final_files --cores cores
- run on a cluster
snakemake final_files --cluster "qsub"