GitHub - B-UMMI/ReMatCh: Reads mapping against target sequences, checking mapping and consensus sequences production

Rationale

ReMatCh was designed to map HTS reads onto a set of reference sequences in order to determine whether those sequences are present or absent in each sample, and to identify any variation relative to the reference. A sequence is called present or absent based on two criteria: the proportion of the reference sequence length covered by at least a predefined number of reads, and the sequence similarity. ReMatCh relies mainly on the strength of high read numbers to correctly identify two types of variants: SNPs and INDELs. However, when a position does not meet the criteria for being unambiguously called, ReMatCh designates it as a potential heterozygous position.

To correctly identify variants over the entire length of a target region, references containing additional sequences flanking the region of interest can be provided; these flanking sequences provide a scaffold for proper read mapping. Moreover, divergence between the allele in the genome of interest and the reference sequence can cause improper read mapping and, in turn, errors when calling a position. To avoid this, ReMatCh can be executed in double run mode, in which the consensus sequences resulting from the first run are used as reference sequences in a second run, thereby facilitating read mapping.

ReMatCh can use locally stored sequence data, but it can also interact directly with the ENA database and download the read files for sample/run accession numbers provided by the user, or all data associated with a given taxon name.

A ReMatCh module was designed to get the MLST sequence type from HTS reads. The reference sequences are either a provided curated MLST schema with flanking regions, or one allele for each MLST locus obtained from the PubMLST database (https://pubmlst.org). The consensus sequences produced by ReMatCh are then compared to the ones found in the PubMLST database for allele scoring and ST determination.

ReMatCh software dependencies are: Bowtie2 (Langmead and Salzberg 2012) for read mapping, Samtools (Li et al. 2009) for SAM/BAM manipulation and variant calling, and Bcftools (Li 2011) for consensus sequence production. These software dependencies are provided together with ReMatCh to facilitate the installation and guarantee that users have the correct versions. Besides the parallelization implemented within Bowtie2 and Samtools, ReMatCh assigns the variant analysis and coverage determination of one sequence to each available thread.
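The presence/absence criterion described above (proportion of the reference covered by a minimum number of reads, plus sequence similarity) can be illustrated with a minimal Python sketch. This is not ReMatCh's actual code: the function names, the threshold parameters (`min_depth`, `min_coverage`, `min_identity`) and their default values are hypothetical, and the identity calculation assumes the consensus is already aligned to the reference with equal length (no INDEL handling).

```python
from typing import Sequence


def coverage_fraction(depths: Sequence[int], min_depth: int) -> float:
    """Fraction of reference positions covered by at least min_depth reads."""
    if not depths:
        return 0.0
    covered = sum(1 for depth in depths if depth >= min_depth)
    return covered / len(depths)


def sequence_identity(reference: str, consensus: str) -> float:
    """Fraction of positions where the consensus matches the reference.

    Assumption: both sequences are aligned and have the same length.
    """
    if len(reference) != len(consensus):
        raise ValueError("reference and consensus must have the same length")
    if not reference:
        return 0.0
    matches = sum(1 for ref, con in zip(reference, consensus) if ref == con)
    return matches / len(reference)


def is_present(
    depths: Sequence[int],
    reference: str,
    consensus: str,
    min_depth: int = 5,          # hypothetical default
    min_coverage: float = 0.8,   # hypothetical default
    min_identity: float = 0.7,   # hypothetical default
) -> bool:
    """Call a sequence present only if both criteria are satisfied."""
    enough_coverage = coverage_fraction(depths, min_depth) >= min_coverage
    similar_enough = sequence_identity(reference, consensus) >= min_identity
    return enough_coverage and similar_enough


if __name__ == "__main__":
    # Per-position read depths for a 10 bp toy reference sequence.
    toy_depths = [12, 15, 14, 9, 8, 11, 13, 10, 7, 6]
    print(is_present(toy_depths, "ACGTACGTAC", "ACGTACGTAA"))
```

In this toy example every position reaches the minimum depth and nine of ten bases match, so the sequence would be called present under the assumed thresholds. In practice, the per-position depths would come from the read mapping step rather than a hand-written list.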

Table of Contents

Rationale

Dependencies

Installation

Input

Reference

Samples

Usage

Usage Examples

Running ReMatCh Beginner

Using local samples for provided reference file

Running ReMatCh Moderate

Using specific ENA sequencing data for provided reference file

Using ENA sequencing data of a given taxon for provided reference file

Running ReMatCh Advanced

MultiLocus Sequence Typing for local samples

MultiLocus Sequence Typing for ENA list of IDs or taxon

Outputs

Citation

Contact
