seq_typing

Determines which reference sequence is more likely to be present in a given sample

Rationale
Input requirements
Dependencies
- Install dependencies
Install seq_typing
Usage
Outputs
- seq_typing.py
- ecoli_stx_subtyping.py
Citation
Contact

Rationale

seq_typing is a tool that determines the type of a given sample using either a read mapping approach or a Blast search against a set of reference sequences.
For the read mapping approach, the sample's reads are mapped to the given reference sequences using Bowtie2, and the resulting alignments are parsed with Samtools and analysed via ReMatCh. Based on the length of the sequence covered and its depth of coverage, seq_typing returns the type associated with the reference sequence that is most likely to be present. The selected sequence is the one with the largest fraction of its length covered and the highest depth of coverage among those that pass the defined thresholds.
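
The selection rule described above can be sketched as follows. This is a minimal illustration, not seq_typing's actual code: the `ReferenceCoverage` record, the `select_type` function and the threshold values are all hypothetical, and the ordering (coverage first, depth as tie-breaker) is one assumed reading of the description.

```python
from typing import NamedTuple, Optional, Sequence


class ReferenceCoverage(NamedTuple):
    """Per-reference mapping summary (hypothetical structure)."""
    sequence: str       # reference sequence name
    seq_type: str       # type associated with the reference
    covered_pct: float  # percentage of the reference covered by reads
    depth: float        # mean depth of coverage


def select_type(
    candidates: Sequence[ReferenceCoverage],
    min_covered_pct: float = 60.0,  # assumed threshold, for illustration only
    min_depth: float = 2.0,         # assumed threshold, for illustration only
) -> Optional[ReferenceCoverage]:
    """Return the best-covered reference that passes both thresholds."""
    passing = [
        c for c in candidates
        if c.covered_pct >= min_covered_pct and c.depth >= min_depth
    ]
    if not passing:
        return None
    # Prefer the largest covered fraction; break ties with the depth.
    return max(passing, key=lambda c: (c.covered_pct, c.depth))


if __name__ == "__main__":
    h_candidates = [
        ReferenceCoverage("fliC_269_AY337465_H11", "H11", 99.4546693933197, 51.76490747087046),
        ReferenceCoverage("fliC_276_AY337472_H11", "H11", 98.84117246080436, 37.52551724137931),
    ]
    best = select_type(h_candidates)
    print(best.sequence if best else "no reference passed the thresholds")
```

With the two H-type rows taken from the example report in this README, the sketch picks `fliC_269_AY337465_H11`, which matches the row marked `selected`.
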
For the Blast approach (used when the input consists of sequence fasta files), the best Blast hit is retrieved for each DB sequence. The best hit is defined by the largest alignment length, highest similarity, lowest E-value and lowest number of gaps (criteria applied hierarchically, in the order listed here).
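
The hierarchical ranking of Blast hits can be expressed as a tuple sort key. The sketch below is illustrative only: the `BlastHit` record, its field names and the example hits are hypothetical and do not come from seq_typing's code.

```python
from typing import NamedTuple, Sequence


class BlastHit(NamedTuple):
    """One Blast hit against a DB sequence (hypothetical structure)."""
    query: str
    alignment_length: int
    identity_pct: float
    evalue: float
    gaps: int


def best_hit(hits: Sequence[BlastHit]) -> BlastHit:
    """Pick the best hit using the criteria in the order described above."""
    # Tuples compare element by element, so a later criterion is only
    # consulted when all earlier ones are tied. Values to be maximised
    # are negated so that min() can be used throughout.
    return min(
        hits,
        key=lambda h: (-h.alignment_length, -h.identity_pct, h.evalue, h.gaps),
    )


if __name__ == "__main__":
    hits = [
        BlastHit("contig_a", 1200, 99.0, 1e-50, 2),  # made-up example values
        BlastHit("contig_b", 1200, 99.5, 1e-40, 3),
        BlastHit("contig_c", 900, 100.0, 0.0, 0),
    ]
    print(best_hit(hits).query)
```

Here `contig_c` loses despite its perfect identity because alignment length is compared first, and `contig_b` beats `contig_a` on similarity before the E-value is ever considered.
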
In both cases, producing the reference sequence database requires manual curation and sequence type definition.

#sequence_type	reference_file	type	sequence	sequenced_covered	coverage_depth	sequence_identity	query	q_start	q_end	s_start	s_end	evalue
selected	O_type.fasta	O26	wzy_192_AF529080_O26	100.0	281.95405669599216	100.0	NA	NA	NA	NA	NA	NA
selected	H_type.fasta	H11	fliC_269_AY337465_H11	99.4546693933197	51.76490747087046	99.86291980808772	NA	NA	NA	NA	NA	NA
other_probable_type	O_type.fasta	O26	wzx_208_AF529080_O26	100.0	223.3072050673001	100.0	NA	NA	NA	NA	NA	NA
other_probable_type	H_type.fasta	H11	fliC_276_AY337472_H11	98.84117246080436	37.52551724137931	99.86206896551724	NA	NA	NA	NA	NA	NA

#sequence_type	reference_file	type	sequence	sequenced_covered	coverage_depth	sequence_identity	query	q_start	q_end	s_start	s_end	evalue
selected	1_GenotypesDENV_14-05-18.fasta	3-III	gb:EU529683	...	Subtype:3-III	Host:Human	seqTyping_3-III	100.0	1	99.223	NODE_1_length_10319_cov_2021.782660	138
other_probable_type	1_GenotypesDENV_14-05-18.fasta	1-V	gb:GQ868570	...	Subtype:1-V	Host:Human	seqTyping_1-V	100.0	1	99.479	NODE_2_length_10199_cov_229.028848	13
other_probable_type	1_GenotypesDENV_14-05-18.fasta	4-II	gb:GQ868585	...	Subtype:4-II	Host:Human	seqTyping_4-II	100.0	1	99.38	NODE_4_length_10182_cov_29.854132	13

#sequence_type	reference_file	type	sequence	sequenced_covered	coverage_depth	sequence_identity	query	q_start	q_end	s_start	s_end	evalue
selected	1_virulence_db.stx1_subtyping.fasta	stx1a	stx1A:15:AF461168:A:seqTyping_stx1a	100.0	65.37447257383967	100.0	NA	NA	NA	NA	NA	NA
selected	2_virulence_db.stx2_subtyping.fasta	stx2c	stx2B:15:AB071845:C:seqTyping_stx2c	100.0	19.377777777777776	100.0	NA	NA	NA	NA	NA	NA
other_probable_type	1_virulence_db.stx1_subtyping.fasta	stx1c	stx1B:11:AB071620:C:seqTyping_stx1c	100.0	21.64814814814815	99.25925925925925	NA	NA	NA	NA	NA	NA
other_probable_type	1_virulence_db.stx1_subtyping.fasta	stx1a	stx1B:14:AM230663:A:seqTyping_stx1a	100.0	45.06666666666667	100.0	NA	NA	NA	NA	NA	NA
other_probable_type	2_virulence_db.stx2_subtyping.fasta	stx2c	stx2B:10:EF441604:C:seqTyping_stx2c	100.0	17.2	99.25925925925925	NA	NA	NA	NA	NA	NA
other_probable_type	2_virulence_db.stx2_subtyping.fasta	stx2d	stx2B:11:FM998840:D:seqTyping_stx2d	100.0	9.996296296296297	99.62962962962963	NA	NA	NA	NA	NA	NA
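
Reports like the ones above are tab-separated with a header line that starts with `#`, so they can be read with Python's standard `csv` module. The following is a minimal sketch under that assumption; the `parse_report` helper is hypothetical and is not part of seq_typing, and it expects every row to have the same number of fields as the header.

```python
import csv
import io
from typing import Dict, List


def parse_report(text: str) -> List[Dict[str, str]]:
    """Parse a tab-separated report whose header line starts with '#'."""
    lines = text.splitlines()
    # Drop the leading '#' so the first column is named 'sequence_type'.
    header = lines[0].lstrip("#").split("\t")
    body = io.StringIO("\n".join(lines[1:]))
    return list(csv.DictReader(body, fieldnames=header, delimiter="\t"))


if __name__ == "__main__":
    # A few columns and rows copied from the stx subtyping example above.
    example = "\n".join([
        "\t".join(["#sequence_type", "reference_file", "type", "sequence"]),
        "\t".join(["selected", "1_virulence_db.stx1_subtyping.fasta", "stx1a",
                   "stx1A:15:AF461168:A:seqTyping_stx1a"]),
        "\t".join(["selected", "2_virulence_db.stx2_subtyping.fasta", "stx2c",
                   "stx2B:15:AB071845:C:seqTyping_stx2c"]),
        "\t".join(["other_probable_type", "1_virulence_db.stx1_subtyping.fasta", "stx1c",
                   "stx1B:11:AB071620:C:seqTyping_stx1c"]),
    ])
    rows = parse_report(example)
    selected = [r["type"] for r in rows if r["sequence_type"] == "selected"]
    print(selected)
```

Filtering on the first column separates the `selected` types from the `other_probable_type` candidates.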
