workflow.md

1. Removing human, bacterial and host contamination using BBDuk:

% bbduk.sh in=read1.fastqsanger.fastq in2=read2.fastqsanger.fastq ref=hg19_main_mask_ribo_animal_allplant_allfungus.fa.gz,fusedEPmasked2.fa.gz,moll_masked.fa out=clean1.fq out2=clean2.fq outm=matched.fq k=31 -Xmx150g

Input -- 100424418 reads

Total Removed -- 2287450 reads

Result -- 98136968 reads
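The read counts above can be cross-checked directly against the FASTQ files. The following is a minimal sketch, assuming uncompressed FASTQ with exactly four lines per record and assuming that the figures reported by BBDuk count both mates together; the helper name `count_reads` is hypothetical and not part of the original workflow.

```shell
#!/usr/bin/env bash

# Count the records in an uncompressed FASTQ file.
# Assumption: every record spans exactly four lines (no wrapped sequences).
count_reads() {
    local lines
    lines=$(wc -l < "$1")
    echo $(( lines / 4 ))
}

# Hypothetical usage with the output files of step 1:
#   echo $(( $(count_reads clean1.fq) + $(count_reads clean2.fq) ))
```

If the two mate files give different counts, the pairing is broken and downstream paired-end tools are likely to fail or misbehave.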

2. Removing rRNA using BBDuk (part of the BBMap package):

% bbduk.sh in=clean1.fq in2=clean2.fq ref=tepm_finally/SILVA_132_SSURef_tax_silva.fasta,tepm_finally/SILVA_132_LSURef_tax_silva.fasta out=finally_cleaned1.fq out2=finally_cleaned2.fq outm=matched.fq k=31 -Xmx150g

Input -- 98136968 reads

Removed -- 2903390 reads (2.96%)

Result -- 95233578 reads

3. Removing adapters and unpaired reads using Trimmomatic:

% java -jar trimmomatic-0.36.jar PE finally_cleaned1.fq finally_cleaned2.fq clean1-P.fq clean1-U.fq clean2-P.fq clean2-U.fq TRAILING:20 LEADING:20 SLIDINGWINDOW:4:20 ILLUMINACLIP:TruSeq3-PE-2.fa:2:30:10 MINLEN:30

Input -- 95233578 reads

Removed -- 8947812 reads

Result -- 86285766 reads
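The read accounting across the three cleaning steps can be verified with a few lines of shell arithmetic. This is a small sketch that uses only the counts reported above; the helper name `pct` is hypothetical.

```shell
#!/usr/bin/env bash

# Print part/total as a percentage with two decimals.
pct() {
    awk -v part="$1" -v total="$2" 'BEGIN { printf "%.2f\n", 100 * part / total }'
}

pct 2287450 100424418     # removed in step 1
pct 2903390 98136968      # removed in step 2
pct 8947812 95233578      # removed in step 3
pct 86285766 100424418    # retained after all three steps

# Each result should equal the input minus the removed reads.
[ $(( 100424418 - 2287450 )) -eq 98136968 ] && echo "step 1 consistent"
[ $(( 98136968 - 2903390 )) -eq 95233578 ] && echo "step 2 consistent"
[ $(( 95233578 - 8947812 )) -eq 86285766 ] && echo "step 3 consistent"
```

The second call reproduces the 2.96% reported in step 2, and all three subtraction checks pass for the figures listed here.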

4. Assembly using rnaSPAdes:

% spades.py --rna -1 clean1-P.fq -2 clean2-P.fq -t 30 -m 300 -o ./assembly

5. Quality assessment using rnaQUAST:

% python rnaQUAST.py -c transcripts.fasta --busco_lineage metazoa_odb9 -o ../qualuty_assesment --gene_mark --blat

Short summary report

== BASIC TRANSCRIPTS METRICS ==

Transcripts -- 122671

Transcripts > 500 bp -- 25381

Transcripts > 1000 bp -- 13952

== BUSCO METRICS ==

Complete -- 80.982

Partial -- 8.793

== GeneMarkS-T METRICS ==

Predicted genes -- 19470
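The length-based transcript counts can be spot-checked independently of rnaQUAST. The sketch below counts FASTA sequences strictly longer than a threshold; the function name `count_longer_than` is hypothetical, and it is an assumption that rnaQUAST applies the same strict "greater than" cutoff.

```shell
#!/usr/bin/env bash

# Count sequences in a FASTA file that are longer than a given length (bp).
# Sequences wrapped over several lines are handled by summing line lengths.
count_longer_than() {
    awk -v min="$2" '
        /^>/ { if (seen && len > min) n++; len = 0; seen = 1; next }
             { len += length($0) }
        END  { if (seen && len > min) n++; print n + 0 }
    ' "$1"
}

# Hypothetical usage on the assembly output:
#   count_longer_than transcripts.fasta 500
#   count_longer_than transcripts.fasta 1000
```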