aineniamh/squirrel


squirrel

Some QUIck Rearranging to Resolve Evolutionary Links

Generate a quick mpox alignment

squirrel <your-sequences.fasta>

Run reconstruction

squirrel <your-sequences.fasta> --run-phylo --outgroups outgroup_id1,outgroup_id2

Note: the sequence file you provide must have the specified outgroups in it, with the IDs matching those you provide. This pipeline can accept one or more outgroup IDs.
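
Before launching a run, it can be worth confirming that every outgroup ID is actually present in the input file. The following is a minimal sketch, not part of squirrel itself: the function names are hypothetical, and it assumes an ID is the first whitespace-delimited token of each FASTA header, which may differ from how squirrel matches IDs internally.

```python
def fasta_ids(fasta_text):
    """Return the set of record IDs found in FASTA-formatted text.

    Assumption: the ID is the first whitespace-delimited token after '>'.
    """
    ids = set()
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            header = line[1:].strip()
            if header:
                ids.add(header.split()[0])
    return ids


def missing_outgroups(fasta_text, outgroups):
    """Return the outgroup IDs (given as a comma-separated string) absent from the FASTA."""
    wanted = [o.strip() for o in outgroups.split(",") if o.strip()]
    present = fasta_ids(fasta_text)
    return [o for o in wanted if o not in present]


# Toy input: only one of the two requested outgroups is present.
example = ">KJ642617 outgroup\nACGT\n>sample_1\nACGT\n"
print(missing_outgroups(example, "KJ642617,KJ642615"))  # ['KJ642615']
```

An empty list means all requested outgroups were found under this matching rule.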

How it works - alignment

Squirrel maps each query genome in the input file against the NC_063383 reference genome using minimap2. It then trims each genome at reference position 190788 to mask out one of the ITR regions, and pads the end of the genome with N. It also masks (replaces with N) low-complexity or repetitive regions, defined here. Both masking steps can be switched off (--no-mask and --no-itr-mask). Using gofasta, the mapping file is then converted into a multiple sequence alignment.
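
The trimming, padding and masking steps can be illustrated with a short sketch. This is not squirrel's implementation: the function name and the example mask regions are hypothetical, and it assumes 1-based inclusive reference coordinates, with every position after the trim point replaced by N so that the sequence length is preserved.

```python
def mask_sequence(aligned_seq, trim_end=190788, mask_regions=()):
    """Mask a genome that is already in reference coordinates.

    trim_end     : 1-based position; everything after it becomes N (ITR mask + padding)
    mask_regions : iterable of 1-based inclusive (start, end) pairs to replace with N
                   (hypothetical values; the real regions are defined by squirrel)
    """
    seq = list(aligned_seq)
    # Replace everything past the trim point with N, keeping the length unchanged.
    for i in range(min(trim_end, len(seq)), len(seq)):
        seq[i] = "N"
    # Replace each repetitive or low-complexity region with N.
    for start, end in mask_regions:
        for i in range(max(start, 1) - 1, min(end, len(seq))):
            seq[i] = "N"
    return "".join(seq)


# Toy example with a 10-base "genome" and small coordinates.
print(mask_sequence("ACGTACGTAC", trim_end=8, mask_regions=[(2, 3)]))  # ANNTACGTNN
```

Because masked positions are replaced rather than deleted, every sequence stays the same length as the reference, which is what allows the records to be stacked into a multiple sequence alignment.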

By default, squirrel writes a single alignment FASTA file. Using the GenBank coordinates for NC_063383, it can also extract the aligned coding sequences (--extract-cds), either as separate records or as a concatenated alignment (--concatenate). This can facilitate codon-aware phylogenetic or sequence analysis.
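
As a rough illustration of coordinate-based extraction and concatenation, the sketch below slices coding sequences out of an aligned genome and joins them with an NNN separator, as described for --concatenate in the usage. The function names, gene names and coordinates are made up for the example, and the sketch handles forward-strand features only; reverse-strand genes would additionally need reverse-complementing.

```python
def extract_cds(aligned_seq, cds_coords):
    """Slice each coding sequence out of a genome in reference coordinates.

    cds_coords: dict mapping a gene name to a 1-based inclusive (start, end) pair.
    Forward-strand features only (a simplifying assumption of this sketch).
    """
    return {name: aligned_seq[start - 1:end] for name, (start, end) in cds_coords.items()}


def concatenate_cds(cds_by_name, separator="NNN"):
    """Join the coding sequences in dict order, separated by a three-base spacer."""
    return separator.join(cds_by_name.values())


# Toy genome with two hypothetical genes.
genome = "AAATGCCCTAAGGATGTTTTGA"
coords = {"geneA": (3, 11), "geneB": (14, 22)}

records = extract_cds(genome, coords)
print(records)                   # {'geneA': 'ATGCCCTAA', 'geneB': 'ATGTTTTGA'}
print(concatenate_cds(records))  # ATGCCCTAANNNATGTTTTGA
```

A three-base separator keeps the downstream genes in frame, so the concatenated alignment can still be read codon by codon.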

How it works - phylogeny & reconstruction

Squirrel also has an optional --run-phylo mode that takes the newly generated alignment and builds a maximum likelihood phylogeny using iqtree. It also runs iqtree for ancestral state reconstruction and parses the output files to produce two things: a branch-mapped summary of the SNPs that have occurred across the phylogeny, and a phylogeny figure with SNPs plotted along branches, coloured by whether or not they are consistent with APOBEC3 editing. An outgroup (or multiple outgroups) must be specified to ensure correct rooting for the ancestral state reconstruction.
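
To make the APOBEC3 colouring concrete, the sketch below classifies a single SNP by its dinucleotide context. It assumes the commonly described APOBEC3 signature, C to T in a TC context (TC to TT) and its reverse complement, G to A in a GA context (GA to AA). The function name is hypothetical and squirrel's own criteria may differ in detail.

```python
def is_apobec3_like(ancestral_seq, pos, derived_base):
    """Return True if the SNP at 0-based index `pos` matches TC->TT or GA->AA.

    ancestral_seq : reconstructed parent sequence for the branch
    derived_base  : base observed in the child at that position
    """
    ref = ancestral_seq[pos].upper()
    alt = derived_base.upper()
    if ref == "C" and alt == "T":
        # C->T counts only when the preceding base is T.
        return pos > 0 and ancestral_seq[pos - 1].upper() == "T"
    if ref == "G" and alt == "A":
        # G->A counts only when the following base is A (reverse-strand equivalent).
        return pos + 1 < len(ancestral_seq) and ancestral_seq[pos + 1].upper() == "A"
    return False


print(is_apobec3_like("ATCG", 2, "T"))  # True:  TC -> TT
print(is_apobec3_like("AGAT", 1, "A"))  # True:  GA -> AA
print(is_apobec3_like("ACCG", 2, "T"))  # False: C->T but preceded by C
```

The context is read from the ancestral sequence, which is why ancestral state reconstruction, and therefore correct rooting, matters for this classification.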

Recommended outgroups for phylogeny mode

Clade I

  • KJ642617,KJ642615,KJ642616

Clade IIb

  • KJ642617,KJ642615

Installation

  1. Clone this repository and cd squirrel
  2. conda env create -f environment.yml
  3. conda activate squirrel
  4. pip install .

Check the install worked

Type (in the squirrel environment):

squirrel -v

and you should see the installed version of squirrel.

Full usage

usage: squirrel <input> [options]

squirrel: Some QUIck Rearranging to Resolve Evolutionary Links

optional arguments:
  -h, --help            show this help message and exit

Input-Output options:
  input                 Input fasta file of sequences to analyse.
  -o OUTDIR, --outdir OUTDIR
                        Output directory. Default: current working directory
  --outfile OUTFILE     Optional output file name. Default: <input>.aln.fasta
  --tempdir TEMPDIR     Specify where you want the temp stuff to go. Default: $TMPDIR
  --no-temp             Output all intermediate files, for dev purposes.

Pipeline options:
  --no-mask             Skip masking of repetitive regions. Default: masks repeat regions
  --no-itr-mask         Skip masking of end ITR. Default: masks ITR
  --extract-cds         Extract coding sequences based on coordinates in the reference
  --concatenate         Concatenate coding sequences for each genome, separated by `NNN`. Default: write out as separate records
  -p, --run-phylo       Run phylogenetic reconstruction pipeline
  --outgroups OUTGROUPS
                        Specify which MPXV outgroup(s) in the alignment to use in the phylogeny. These will get pruned out from the final tree.

Misc options:
  -v, --version         show program's version number and exit
  --verbose             Print lots of stuff to screen
  -t THREADS, --threads THREADS
                        Number of threads
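
For scripted runs, the options above can be assembled into a command line programmatically. The helper below is a hypothetical convenience wrapper, not something shipped with squirrel; it uses only flags listed in the usage text and builds the argument list without executing anything. It refuses to enable phylogeny mode without outgroups, following the requirement described earlier.

```python
import shlex


def build_squirrel_command(fasta, outdir=None, extract_cds=False, concatenate=False,
                           run_phylo=False, outgroups=(), threads=None):
    """Build an argument list for squirrel using flags from the usage text."""
    cmd = ["squirrel", fasta]
    if outdir:
        cmd += ["--outdir", outdir]
    if extract_cds:
        cmd.append("--extract-cds")
    if concatenate:
        cmd.append("--concatenate")
    if run_phylo:
        if not outgroups:
            raise ValueError("--run-phylo needs at least one outgroup ID")
        cmd += ["--run-phylo", "--outgroups", ",".join(outgroups)]
    if threads is not None:
        cmd += ["--threads", str(threads)]
    return cmd


cmd = build_squirrel_command("seqs.fasta", run_phylo=True,
                             outgroups=["KJ642617", "KJ642615"], threads=4)
print(shlex.join(cmd))
# squirrel seqs.fasta --run-phylo --outgroups KJ642617,KJ642615 --threads 4
```

The resulting list can be passed to subprocess.run(cmd, check=True) from within the squirrel conda environment.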
