Project: Transcriptome Annotation of Equus caballus
User Manual
03-713: Bioinformatics Data Integration Practicum
Team-2: Taylor Ayers, Tao Luo, Lilin Huang, Sarah Oladejo
Clone the repository and change into the new directory:
git clone https://github.com/luotao9728/annotation
cd annotation
The cloned directory contains the following files:
- start_pipeline.sh
- build_index.sh
- pipeline.sh
- annotation.yml
- README.md
Software requirements
- Make sure Anaconda3 is installed on your computer.
- Make sure your current working environment has the following packages:
- sickle-trim (trims Illumina short reads)
- LoRDEC (corrects long reads using short reads)
- HISAT2 (aligns short RNA-seq reads)
- minimap2 (aligns long RNA-seq reads)
- seqtk (converts between FASTA and FASTQ formats)
- SAMtools (sorts alignments and converts SAM to BAM)
- StringTie (transcript assembly and annotation)
- Alternatively, create a new conda environment containing all required packages from the provided annotation.yml file, then activate it (run these commands from the directory that contains annotation.yml):
conda env create -n annotation --file annotation.yml
conda activate annotation
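Before running the pipeline, it can help to confirm that every tool is on your PATH. The following is a minimal POSIX shell sketch, not part of the repository. The helper name missing_tools and the executable names (for example, sickle from sickle-trim, lordec-correct from LoRDEC, and hisat2-build from HISAT2) are assumptions that may need adjusting for your installation.

```shell
#!/bin/sh
# Sketch: report required tools that are not on PATH.
# The executable names below are assumptions; adjust them to match your install.
required_tools="sickle lordec-correct hisat2 hisat2-build minimap2 seqtk samtools stringtie"

# missing_tools (hypothetical helper): print each argument that is not found
# on PATH, one per line; return 1 if any is missing, otherwise 0.
missing_tools() {
    _status=0
    for _tool in "$@"; do
        if ! command -v "$_tool" >/dev/null 2>&1; then
            printf '%s\n' "$_tool"
            _status=1
        fi
    done
    return "$_status"
}

# Splitting $required_tools into separate names is intentional here.
# shellcheck disable=SC2086
if missing=$(missing_tools $required_tools); then
    echo "All required tools were found."
else
    echo "Missing tools:"
    printf '%s\n' "$missing"
fi
```

If any names are reported, activate the annotation environment (conda activate annotation) or install the missing packages before continuing.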
Requirements for input files
- Reference genome: FASTA (.fasta or .fna)
- Illumina paired-end RNA-seq reads (forward and reverse): FASTQ
- PacBio RNA-seq reads: FASTQ
- Reference annotation: GFF
- Keyword: a name for this annotation run
All input files should be placed in the cloned annotation directory.
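A quick sanity check of the input files can catch path or format mistakes before a long run. The sketch below is not part of the repository: check_input is a hypothetical helper, it assumes uncompressed files, and it only checks the first character of the file (> for FASTA, @ for FASTQ). Any other kind, such as gff, only gets a non-empty check.

```shell
#!/bin/sh
# check_input FILE KIND (hypothetical helper)
# KIND is fasta, fastq, or anything else (e.g. gff), which only gets a
# non-empty check. Assumes uncompressed files. Returns 1 on a problem.
check_input() {
    _file=$1
    _kind=$2
    if [ ! -s "$_file" ]; then
        printf '%s: missing or empty\n' "$_file" >&2
        return 1
    fi
    case $_kind in
        fasta) _expect='>' ;;
        fastq) _expect='@' ;;
        *) return 0 ;;
    esac
    # First character of the first line of the file.
    _first=$(head -n 1 "$_file" | cut -c 1)
    if [ "$_first" != "$_expect" ]; then
        printf '%s: expected %s data starting with "%s"\n' "$_file" "$_kind" "$_expect" >&2
        return 1
    fi
    return 0
}
```

For example, check_input genome.fna fasta or check_input reads_1.fastq fastq (placeholder file names) prints a message and returns a non-zero status if the file is missing, empty, or does not start as expected.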
Running the pipeline
- Make sure the conda environment with all required packages is activated.
- Download the input files into the cloned annotation directory.
- Run the following command and follow the prompts:
bash start_pipeline.sh
- When prompted, enter the names of your input files.
- Be patient: the annotation process may take a long time. Have a great day! :)
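Because a run can take a long time, you may want to record how long it took and whether it succeeded. The following is a minimal sketch using a hypothetical helper, run_timed, which is not part of the repository. It reports the exit status and elapsed seconds on standard error and assumes a date command that supports +%s, as GNU and BSD date do.

```shell
#!/bin/sh
# run_timed (hypothetical helper): run a command, then report its exit
# status and wall-clock duration in seconds on stderr. The command's own
# stdin and stdout are left untouched, so interactive prompts still work.
run_timed() {
    _start=$(date +%s)
    "$@"
    _status=$?
    _end=$(date +%s)
    printf 'Command exited with status %s after %s seconds\n' \
        "$_status" "$((_end - _start))" >&2
    return "$_status"
}
```

After pasting the function into your shell, start the pipeline with run_timed bash start_pipeline.sh; the prompts still work because the command inherits the terminal's standard input.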