Software code to reproduce the analyses in
Valles, S. M., Rivers, A. R. 2019. Nine new RNA viruses associated with the fire ant Solenopsis invicta from its native range. Submitted manuscript.
Four main types of analysis were run in this paper.
- Processing the raw metagenomic data into contigs.
- Primer walking and 5' and 3' RACE to close the genomes (not in this repository).
- Mapping reads to the completed genomes and plotting the distributions.
- Constructing maximum likelihood phylogenetic trees.
The metagenomics workflow followed these steps.
- Remove sequencing contaminants with BBduk
- Trim adapters with BBduk
- Mask repetitive sequences in the S. invicta genome with RepeatMasker
- Index the S. invicta genome using BBmap
- Remove S. invicta reads from the samples using BBsplit
- Combine all samples and assemble a combined metagenome with SPAdes
- Identify contigs by searching them against the NCBI nr database with Diamond
- Identify viral contigs with Megan using the Diamond output (not included in repo)
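As a rough illustration of the steps above, the sketch below prints one plausible command per step and only executes them when `RUN_PIPELINE=1` is set. It is not the repository's runall.sh: the file names, the `run` helper, the `RUN_PIPELINE` switch, the choice of interleaved paired-end input, and all parameter values are assumptions made for the example. The RepeatMasker and Megan steps are left out.

```shell
# Hypothetical names; substitute the real inputs.
SAMPLE=sample1                 # one library, interleaved paired-end reads (assumed)
HOST=sinvicta_masked.fa        # S. invicta genome already masked with RepeatMasker

# Print each command; set RUN_PIPELINE=1 to also execute it.
run() {
    printf '+ %s\n' "$1"
    if [ "${RUN_PIPELINE:-0}" = "1" ]; then
        eval "$1"
    fi
}

# Remove sequencing contaminants (contaminants.fa is a placeholder reference).
run "bbduk.sh in=$SAMPLE.fq.gz out=$SAMPLE.nocontam.fq.gz ref=contaminants.fa k=31 hdist=1"
# Trim adapters from the 3' end of the reads.
run "bbduk.sh in=$SAMPLE.nocontam.fq.gz out=$SAMPLE.trimmed.fq.gz ref=adapters.fa ktrim=r k=23 mink=11 hdist=1 tpe tbo"
# Split off host reads; reads matching no reference go to outu.
run "bbsplit.sh in=$SAMPLE.trimmed.fq.gz ref=$HOST basename=${SAMPLE}_%.fq.gz outu=$SAMPLE.nonhost.fq.gz"
# Pool the host-depleted reads from all samples and assemble them together.
run "cat *.nonhost.fq.gz > all.nonhost.fq.gz"
run "spades.py --meta --12 all.nonhost.fq.gz -o assembly"
# Search the contigs against nr; DAA output can be imported into Megan.
run "diamond blastx --db nr --query assembly/contigs.fasta --outfmt 100 --out contigs_vs_nr.daa"
```

Printing the commands first makes it easy to check paths and parameters before committing to the memory-intensive assembly and search steps.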
Note: many steps in the workflow were run using the SLURM scheduler. The runall.sh files omit the
sbatch command for job submission to make the workflow portable; however, some steps may not run without a large-memory machine.
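For readers who do have access to a cluster, one way to put job submission back is a small wrapper that submits a step script with `sbatch` when asked and otherwise runs it locally. This is a minimal sketch: `run_step` and the `USE_SLURM` switch are hypothetical names that do not appear in runall.sh, and resource requests such as memory would still need to be added to the step scripts or the `sbatch` call.

```shell
# Run a step script through SLURM when requested, otherwise run it locally.
# USE_SLURM is a hypothetical switch, not a variable defined in runall.sh.
run_step() {
    script=$1
    if [ "${USE_SLURM:-0}" = "1" ] && command -v sbatch >/dev/null 2>&1; then
        sbatch "$script"
    else
        bash "$script"
    fi
}

# Example (step.sh is a placeholder for one of the workflow scripts):
#   USE_SLURM=1 run_step step.sh
```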
The read mapping workflow followed these steps.
- Map all 8 trimmed libraries to the manually closed genomes with BBmap
- Combine the summary data from BBmap with a Python script
- Create Figure 1 using an R script
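To show the shape of the mapping and summary steps, the sketch below merges per-library BBmap coverage tables into one long table with a leading sample column. The repository does this with a Python script; awk is swapped in here purely for illustration. The `combine_covstats` function, the `*_covstats.txt` naming scheme, and the file names in the comments are assumptions.

```shell
# Each library is assumed to have been mapped with something like:
#   bbmap.sh in=lib1.trimmed.fq.gz ref=closed_genomes.fa nodisk covstats=lib1_covstats.txt

# Merge covstats tables: drop '#' header lines and prefix each row with
# the library name taken from the file name.
combine_covstats() {
    for f in "$@"; do
        sample=$(basename "$f" _covstats.txt)
        awk -v s="$sample" 'BEGIN { FS = OFS = "\t" } !/^#/ { print s, $0 }' "$f"
    done
}

# Example:
#   combine_covstats lib*_covstats.txt > all_covstats.tsv
```

A long table with one row per library and genome is convenient input for plotting in R.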
The phylogenetics workflow followed these steps.
- Align the polyproteins with Mafft
- Select phylogenetically informative regions with trimAl
- Create a maximum likelihood tree using RAxML
- Format and annotate the data using ETE Toolkit
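The alignment, trimming, and tree-building steps could be chained roughly as follows. This is a sketch under stated assumptions rather than the settings used in the paper: the file names, the `run` helper, the `RUN_PIPELINE` switch, the trimAl mode, the substitution model, the random seeds, and the number of bootstrap replicates are all placeholders. The ETE Toolkit annotation step is not shown.

```shell
# Print each command; set RUN_PIPELINE=1 to also execute it.
run() {
    printf '+ %s\n' "$1"
    if [ "${RUN_PIPELINE:-0}" = "1" ]; then
        eval "$1"
    fi
}

# Align the polyprotein sequences (MAFFT writes the alignment to stdout).
run "mafft --auto polyproteins.faa > polyproteins.aln.faa"
# Keep phylogenetically informative columns using trimAl's automated heuristic.
run "trimal -in polyproteins.aln.faa -out polyproteins.trim.faa -automated1"
# Rapid bootstrap analysis plus best-scoring ML tree search in one run.
run "raxmlHPC -f a -m PROTGAMMAAUTO -p 12345 -x 12345 -N 100 -s polyproteins.trim.faa -n polyprotein_tree"
```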
To facilitate reproducible research, we have created a conda environment containing the software necessary to reproduce the analyses.

    conda env create --name valles_rivers_2018 --file create_fire_ant_conda_env.yml
    source activate valles_rivers_2018