AnnaSyme/organelle-assembly

organelle-assembly

A script to assemble a plastid or mitochondrial genome from long and short reads.

To run:

conda activate bio
bash assembler.sh -b baits.fasta -g 160000 -s 40000000 R1.fasta R2.fasta nano.fq.gz

Inputs:

  • -b baits file, e.g. gene sequences of related species
  • -g expected genome size in bases (e.g. 160000)
  • -s target bases for Filtlong to keep, e.g. coverage (250) x genome size (160000) = 40000000
  • R1 and R2 Illumina reads, already trimmed and filtered (see the fastp.sh script)
  • Nanopore reads, raw
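
The -s value in the example command follows from the coverage arithmetic given for Filtlong. A minimal sketch, assuming a target coverage of 250 and the 160000-base genome size from the example (the variable names are illustrative and are not taken from assembler.sh):

```shell
# Illustrative values; adjust for your organelle and desired coverage.
coverage=250          # target long-read coverage
genome_size=160000    # expected genome size in bases (the -g value)

# Target bases for Filtlong = coverage x genome size (the -s value).
target_bases=$((coverage * genome_size))
echo "$target_bases"   # prints 40000000
```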

How it works (abridged):

  • Uses a baits file to extract nanopore organelle reads (e.g. mitochondrial or chloroplast) from all the sequencing reads (e.g. nuclear, mitochondrial, chloroplast)
  • Assembles these reads (Flye); polishes assembly (Racon)
  • Uses this assembly as the new baits file to re-extract nanopore organelle reads
  • Assembles (Flye); polishes assembly (Racon)
  • Uses this assembly as baits to extract Illumina organelle reads
  • Uses these reads to polish assembly (Pilon)
  • Further assemblies for comparison: Raven, Miniasm, Unicycler
  • Reports read and assembly stats
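
The read-baiting step can be pictured as mapping all Nanopore reads to the baits and keeping only the reads that align. The sketch below shows one possible way to do this with minimap2 and samtools. It is not copied from assembler.sh, and the function name, file names, and choice of flags are assumptions.

```shell
# Hypothetical helper: keep only the long reads that map to the baits.
# Usage: extract_baited_reads <baits.fasta> <reads.fq.gz> <out.fq>
extract_baited_reads() {
    # $1 = baits (used as the reference), $2 = all Nanopore reads,
    # $3 = output FASTQ of baited reads.
    # minimap2 -a writes SAM; -x map-ont selects the Nanopore preset.
    # samtools fastq -F 0x904 drops unmapped (0x4), secondary (0x100)
    # and supplementary (0x800) records, so each mapped read is written once.
    minimap2 -a -x map-ont "$1" "$2" | samtools fastq -F 0x904 - > "$3"
}
```

For example, `extract_baited_reads baits.fasta nano.fq.gz baited.fq` would write the baited reads to baited.fq. In the second round, the polished assembly would take the place of baits.fasta.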

Where are the results?

  • The results folder has assemblies, assembly graphs, extracted reads, read-mapping bam files, and read/assembly stats.
  • A record of the script run, including the screen output, is saved as logfile.txt
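
A common way to capture both normal output and error messages in a log file, while still showing them on screen, is to pipe through tee. This is a generic sketch and not necessarily how assembler.sh writes logfile.txt; the echoed messages are placeholders.

```shell
log="logfile.txt"

# Group the commands, merge stderr into stdout, and copy everything
# to the log file while still printing it to the screen.
{
    echo "Starting assembly run"
    echo "example message on stderr" >&2
} 2>&1 | tee "$log"
```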

Tools:

Installed with conda.

minimap2
samtools
filtlong
flye
racon
raven
fastp
rasusa
bwa
pilon
unicycler
miniasm
minipolish
mummer
seqkit

See package-list.txt for full details.
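
Before running the script, it can help to confirm that the tools are available in the active conda environment. A small sketch, assuming each tool's executable has the same name as in the list above (this may not hold for every package); the function name is illustrative:

```shell
# Print the name of each listed tool that is not found on PATH.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}

# Assumption: each executable is named after its package.
missing=$(missing_tools minimap2 samtools filtlong flye racon raven fastp \
    rasusa bwa pilon unicycler miniasm minipolish mummer seqkit)

if [ -n "$missing" ]; then
    # Unquoted on purpose so the names print on one line.
    echo "Missing tools:" $missing
else
    echo "All tools found"
fi
```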

Test data

Here are some links to test data.