Sources of Error
Source of False Positives. If a known virus A is present at a high read-count, then sequencing error, biological artifacts and mis-mapping will cause a small fraction of reads to be assigned to related, but incorrect, sequences (B and C). The distance (in nt- or aa-substitutions) from the virus in the sequencing library may then fall in the "known range" relative to virus A, and in the "unknown range" relative to viruses B and C.
Often this spill-over falls well below the level of "noise", but in libraries with high viral read-counts (tens of thousands of reads), it may produce an appreciable signal for neighboring viruses.
The best way to mitigate this issue is to search for novel viruses at a higher level of the hierarchy. For instance, instead of asking "Find me a novel PCV2-related sequence", first ask "Find a novel Circovirus sequence", and then subset those results by asking "In which of those libraries is PCV2 the best-available match?"
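The two-step query can be sketched in Python as follows. This is only an illustration under stated assumptions: the `Hit` record, the library names, the read-weighted identity and the thresholds are all hypothetical and are not the Serratus schema or scoring. Library `LIB_A` carries abundant PCV2 with a small mis-assigned tail on PCV1; library `LIB_B` carries a genuinely divergent PCV2-like virus.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    """Hypothetical per-reference summary row (not the Serratus schema)."""
    library: str     # sequencing library identifier
    family: str      # family-level assignment
    reference: str   # reference sequence the reads were assigned to
    identity: float  # percent identity to that reference
    reads: int       # number of reads assigned


def family_summary(hits, family):
    """Step 1: pool every hit of one family per library.

    Returns, per library, the total reads, the read-weighted identity
    (an assumed summary statistic) and the reference with the most reads.
    """
    grouped = defaultdict(list)
    for hit in hits:
        if hit.family == family:
            grouped[hit.library].append(hit)

    summary = {}
    for library, rows in grouped.items():
        total = sum(r.reads for r in rows)
        if total == 0:
            continue  # nothing to summarize for this library
        weighted = sum(r.identity * r.reads for r in rows) / total
        top = max(rows, key=lambda r: r.reads)
        summary[library] = {
            "reads": total,
            "identity": weighted,
            "best_match": top.reference,
        }
    return summary


def novel_candidates(summary, best_match, max_identity=90.0, min_reads=100):
    """Step 2: keep divergent libraries whose best-available match is `best_match`."""
    return sorted(
        library
        for library, s in summary.items()
        if s["best_match"] == best_match
        and s["identity"] <= max_identity
        and s["reads"] >= min_reads
    )


hits = [
    Hit("LIB_A", "Circoviridae", "PCV2", 99.0, 50000),  # known virus, abundant
    Hit("LIB_A", "Circoviridae", "PCV1", 82.0, 40),     # mis-assigned tail
    Hit("LIB_B", "Circoviridae", "PCV2", 80.0, 300),    # divergent virus
]

summary = family_summary(hits, "Circoviridae")
candidates = novel_candidates(summary, "PCV2")
print(candidates)
```

Queried per reference, the 40 reads on PCV1 in `LIB_A` would look like a divergent hit. Pooled at the family level, they are outweighed by the 50,000 near-identical PCV2 reads, so only `LIB_B` remains as a candidate.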
Source of False Negatives. Alignment scatter occurs when a library sequence lies "between" the sequences of two operational taxonomic units (OTUs). When summary statistics are reported at the level of OTU/Family, this in effect "dilutes" divergent reads across categories, so a virus may be sufficiently abundant to warrant further investigation yet be reported as rare or incomplete. Chimeric sequences would be an interesting, but probably hard to detect, case.
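The dilution effect can be made concrete with a small Python sketch. The read counts, the OTU and family names and the `min_reads` reporting threshold are all hypothetical, chosen only to show how one virus with 115 reads can fall below a per-category cutoff when its reads scatter across two OTUs.

```python
from collections import Counter


def reads_per_group(assignments, level):
    """Sum reads per category at the given level ('otu' or 'family')."""
    counts = Counter()
    for row in assignments:
        counts[row[level]] += row["reads"]
    return counts


def passing(counts, min_reads):
    """Categories whose pooled read count reaches the reporting threshold."""
    return {name for name, n in counts.items() if n >= min_reads}


# One divergent virus whose reads scatter between two neighboring OTUs
# (hypothetical values).
assignments = [
    {"otu": "OTU_1", "family": "Fam_X", "reads": 60},
    {"otu": "OTU_2", "family": "Fam_X", "reads": 55},
]

min_reads = 100  # assumed cutoff for "worth investigating"

otu_pass = passing(reads_per_group(assignments, "otu"), min_reads)
family_pass = passing(reads_per_group(assignments, "family"), min_reads)

print("OTU level:", otu_pass)        # no OTU reaches the cutoff
print("Family level:", family_pass)  # the pooled family does
```

Pooling at the family level recovers the signal here only because both OTUs belong to the same family. When the scatter crosses family boundaries, the same dilution applies to the family-level summary.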