Pipeline for the identification of iSNVs from NGS data

alexarmerov/SARS-CoV-2

Introduction

SARS-CoV-2 is a new pathogen with devastating consequences globally. The convergence of new sequencing technologies and bioinformatics tools has made it possible to continuously monitor the diversity of this virus. However, this monitoring has focused on consensus sequence variants. Less attention has been given to variants that occur at low frequency within a host, called intra-host single-nucleotide variants (iSNVs). We developed a bioinformatics pipeline for the identification of iSNVs in next-generation sequencing (NGS) data.

The pipeline identifies synonymous and nonsynonymous iSNVs and their respective frequencies in a viral gene. iSNVs in coding sequences are characterized at the codon level: a codon is reported only if it is supported by reads that pass several quality filters. This approach has the advantage of taking into account the sequence context in which the variant emerges. The pipeline uses some of the scripts from the VirVarSeq toolkit for the codon-based identification of iSNVs.
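To make the codon-based idea concrete, here is a minimal Python sketch of how codon counts at one position could be turned into synonymous and nonsynonymous calls with frequencies. It is an illustration only, not the pipeline's or VirVarSeq's actual code: the function name `classify_codon_variants`, the `min_freq` threshold, and the example counts are all hypothetical.

```python
from collections import Counter

# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def classify_codon_variants(codon_counts, consensus_codon, min_freq=0.01):
    """Return (codon, amino_acid, frequency, kind) for each minor codon.

    codon_counts: mapping codon -> number of supporting (already filtered) reads.
    min_freq: hypothetical frequency cutoff, not a value taken from the pipeline.
    """
    total = sum(codon_counts.values())
    consensus_aa = CODON_TABLE[consensus_codon]
    calls = []
    for codon, count in codon_counts.items():
        # Skip the consensus codon and codons with ambiguous bases (e.g. N).
        if codon == consensus_codon or codon not in CODON_TABLE:
            continue
        freq = count / total
        if freq < min_freq:
            continue
        aa = CODON_TABLE[codon]
        kind = "synonymous" if aa == consensus_aa else "nonsynonymous"
        calls.append((codon, aa, freq, kind))
    # Most frequent minor codons first.
    return sorted(calls, key=lambda call: -call[2])


# Made-up counts for a single codon position.
counts = Counter({"GAT": 950, "GAC": 30, "GGT": 20})
calls = classify_codon_variants(counts, consensus_codon="GAT")
for codon, aa, freq, kind in calls:
    print(f"{codon} ({aa}) {freq:.3f} {kind}")
```

Working on whole codons rather than single bases is what lets the call distinguish a silent change (GAT to GAC, both aspartate) from an amino acid change (GAT to GGT, aspartate to glycine).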

The pipeline was developed to identify iSNVs in paired-end short Illumina reads. It has been used mainly on sequences obtained with the ARTIC amplicon-based sequencing system. We have also tested it on sequences obtained with the shotgun approach.

Pipeline

The pipeline was developed with Snakemake, which allows analysis scalability and reproducibility. The figure below describes the main steps of the pipeline. There are two main stages. In the first, a consensus sequence is obtained from the alignment of the reads to the reference sequence of SARS-CoV-2. In the second, the reads are realigned to that consensus sequence and the iSNVs are identified.
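The two stages can be pictured with a toy Python sketch. It assumes reads are already aligned without indels and given as hypothetical `(start, sequence)` pairs, so it skips the real alignment and realignment steps entirely; the function names and the `min_freq` cutoff are illustrative and do not come from the pipeline.

```python
from collections import Counter


def pileup(length, aligned_reads):
    """Count the bases observed at each position of the target sequence."""
    columns = [Counter() for _ in range(length)]
    for start, seq in aligned_reads:
        for offset, base in enumerate(seq):
            columns[start + offset][base] += 1
    return columns


def build_consensus(reference, aligned_reads):
    """Stage 1: majority base per position; keep the reference base where uncovered."""
    columns = pileup(len(reference), aligned_reads)
    return "".join(
        col.most_common(1)[0][0] if col else ref_base
        for col, ref_base in zip(columns, reference)
    )


def minor_variants(consensus, aligned_reads, min_freq=0.05):
    """Stage 2: report non-consensus bases at or above a hypothetical frequency cutoff."""
    variants = []
    columns = pileup(len(consensus), aligned_reads)
    for pos, (col, cons_base) in enumerate(zip(columns, consensus)):
        depth = sum(col.values())
        for base, count in col.items():
            if base != cons_base and count / depth >= min_freq:
                variants.append((pos, cons_base, base, count / depth))
    return variants


# Made-up data: the sample differs from the reference at position 3,
# and 2 of 10 reads carry a minor variant at position 6.
reference = "ACGTACGT"
reads = [(0, "ACGAACGT")] * 8 + [(0, "ACGAACTT")] * 2

consensus = build_consensus(reference, reads)
variants = minor_variants(consensus, reads)
print(consensus)
print(variants)
```

Calling variants against the sample's own consensus rather than the reference means that a fixed difference from the reference (position 3 here) is absorbed into the consensus, and only the within-sample minority base (position 6) is reported as an iSNV.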

Documentation

  1. Installation
  2. Pipeline configuration
  3. Running the pipeline
  4. Results
  5. Troubleshooting

Getting Help

If you identify a bug or any other problem with the pipeline, please open an issue.

Citations

This pipeline was developed for our analyses of SARS-CoV-2 iSNVs. If you use this pipeline, please cite the following publication:

Armero, A., Berthet, N., & Avarre, J. C. (2021). Intra-Host Diversity of SARS-Cov-2 Should Not Be Neglected: Case of the State of Victoria, Australia. Viruses, 13(1), 133. https://doi.org/10.3390/v13010133.
