Some QUIck Rearranging to Resolve Evolutionary Links
squirrel <your-sequences.fasta>
squirrel <your-sequences.fasta> --run-phylo --outgroups outgroup_id1,outgroup_id2
Note: the sequence file you provide must have the specified outgroups in it, with the IDs matching those you provide. This pipeline can accept one or more outgroup IDs.
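Since the outgroup IDs have to be present in the input file, it can help to check this before starting a run. The following is a minimal sketch, not part of squirrel: the function names are hypothetical, and it assumes an ID is the first whitespace-delimited token of a FASTA header line, which may differ from how squirrel itself matches IDs.

```python
def fasta_ids(lines):
    """Yield the ID of each FASTA record (first token of the header; an assumption)."""
    for line in lines:
        if line.startswith(">"):
            header = line[1:].strip()
            if header:
                yield header.split()[0]


def missing_outgroups(fasta_lines, outgroups):
    """Return the outgroup IDs (comma-separated string) not found in the FASTA lines."""
    present = set(fasta_ids(fasta_lines))
    wanted = [og.strip() for og in outgroups.split(",") if og.strip()]
    return [og for og in wanted if og not in present]


# Example use with a file on disk (file name taken from the usage line above):
# with open("your-sequences.fasta") as handle:
#     print(missing_outgroups(handle, "outgroup_id1,outgroup_id2"))
```

An empty list means every requested outgroup was found in the file.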
Squirrel maps each query genome in the input file against the NC_063383 reference genome using minimap2. It then trims the alignment at position 190788, near the end of the genome, to mask out one of the ITR regions, and pads the end of the genome with N. It also masks (replaces with N) low-complexity or repetitive regions, defined here. Both kinds of masking are on by default and can be switched off with --no-itr-mask and --no-mask.
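The trimming and masking steps can be pictured with a short sketch. This is an illustration only, not squirrel's code: the function names are hypothetical, the position 190788 is treated as 1-based and inclusive (an assumption), and the repeat regions are passed in as plain coordinate pairs rather than read from squirrel's own definitions.

```python
ITR_TRIM_POSITION = 190788  # from the text; 1-based, inclusive is an assumption


def mask_itr(aligned_seq, trim_to=ITR_TRIM_POSITION):
    """Keep the first `trim_to` columns and pad the remainder with N (length preserved)."""
    kept = aligned_seq[:trim_to]
    return kept + "N" * (len(aligned_seq) - len(kept))


def mask_regions(aligned_seq, regions):
    """Replace each 1-based, inclusive (start, end) region with N."""
    seq = list(aligned_seq)
    for start, end in regions:
        # Clamp to the sequence bounds so out-of-range coordinates are ignored.
        for i in range(max(start, 1) - 1, min(end, len(seq))):
            seq[i] = "N"
    return "".join(seq)
```

Because both functions preserve sequence length, the masked sequences stay in the same coordinate system as the reference.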
Using gofasta, the map file is then converted into a multiple sequence alignment.
By default, squirrel creates a single alignment FASTA file. Using the GenBank coordinates for NC_063383, it can also extract the aligned coding sequences, either as separate records (--extract-cds) or as a concatenated alignment (--concatenate). This can facilitate codon-aware phylogenetic or sequence analysis.
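As a rough sketch of what coding sequence extraction involves, the example below slices an aligned sequence by coordinates and optionally joins the pieces with `NNN`, as the --concatenate help text describes. The function names, gene names and coordinates are hypothetical, coordinates are assumed to be 1-based and inclusive, and strand is ignored (features on the complement strand would need reverse-complementing, which this sketch does not do).

```python
def extract_cds(aligned_seq, cds_coords):
    """Return {name: subsequence} for (name, start, end) tuples, 1-based inclusive."""
    return {name: aligned_seq[start - 1:end] for name, start, end in cds_coords}


def concatenate_cds(aligned_seq, cds_coords, separator="NNN"):
    """Join the extracted coding sequences in the given order, separated by `separator`."""
    parts = [aligned_seq[start - 1:end] for _, start, end in cds_coords]
    return separator.join(parts)


# Toy example with made-up coordinates:
# extract_cds("AAACCCGGGTTT", [("geneA", 1, 3), ("geneB", 7, 9)])
```

Working on the alignment rather than the raw genomes means one set of reference coordinates applies to every sequence.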
Squirrel also has an optional --run-phylo mode that takes the newly generated alignment and builds a maximum likelihood phylogeny using iqtree. It also runs iqtree for ancestral state reconstruction and parses the output files to provide a branch-mapped summary of the SNPs that have occurred across the phylogeny, along with a phylogeny figure with SNPs plotted along branches and coloured by whether or not they are consistent with APOBEC3 editing. One or more outgroups must be specified to ensure correct rooting for the ancestral state reconstruction.
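To illustrate what "consistent with APOBEC3 editing" can mean for a single SNP, here is a minimal sketch. It assumes the commonly described dinucleotide signature, C-to-T in a TC context or, on the opposite strand, G-to-A in a GA context. That rule is a simplification stated here as an assumption, and the function name is hypothetical rather than squirrel's own implementation.

```python
def is_apobec3_like(ancestral_seq, pos, derived_base):
    """Return True if the SNP at 0-based `pos` matches TC->TT or GA->AA.

    `ancestral_seq` is the reconstructed ancestral sequence around the site and
    `derived_base` is the new base observed on the branch.
    """
    seq = ancestral_seq.upper()
    ref, alt = seq[pos], derived_base.upper()
    if ref == "C" and alt == "T":
        # C->T counts when the preceding base is T (TC context).
        return pos > 0 and seq[pos - 1] == "T"
    if ref == "G" and alt == "A":
        # G->A counts when the following base is A (GA context).
        return pos + 1 < len(seq) and seq[pos + 1] == "A"
    return False
```

The context is read from the ancestral sequence, which is why the rooting and ancestral state reconstruction matter for this classification.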
Clade I
- KJ642617,KJ642615,KJ642616
Clade IIb
- KJ642617,KJ642615
- Clone this repository and cd squirrel
- conda env create -f environment.yml
- conda activate squirrel
- pip install .
Type (in the squirrel environment):
squirrel -v
and you should see the version of squirrel.
usage: squirrel <input> [options]
squirrel: Some QUIck Rearranging to Resolve Evolutionary Links
optional arguments:
-h, --help show this help message and exit
Input-Output options:
input Input fasta file of sequences to analyse.
-o OUTDIR, --outdir OUTDIR
Output directory. Default: current working directory
--outfile OUTFILE Optional output file name. Default: <input>.aln.fasta
--tempdir TEMPDIR Specify where you want the temp stuff to go. Default: $TMPDIR
--no-temp Output all intermediate files, for dev purposes.
Pipeline options:
--no-mask Skip masking of repetitive regions. Default: masks repeat regions
--no-itr-mask Skip masking of end ITR. Default: masks ITR
--extract-cds Extract coding sequences based on coordinates in the reference
--concatenate Concatenate coding sequences for each genome, separated by `NNN`. Default: write out as separate records
-p, --run-phylo Run phylogenetic reconstruction pipeline
--outgroups OUTGROUPS
Specify which MPXV outgroup(s) in the alignment to use in the phylogeny. These will get pruned out from the final tree.
Misc options:
-v, --version show program's version number and exit
--verbose Print lots of stuff to screen
-t THREADS, --threads THREADS
Number of threads