This tool offers multiple options for filtering variants (in VCF files, relative to M. tuberculosis H37Rv coordinates).
It currently has 5 main modes:
- Filter by region. Mask out variants in certain regions. Region lists are available as:
  - farhat_rlc: Refined Low Confidence regions from Marin et al
  - pe_ppe: PE/PPE genes from Fishbein et al 2015
  - tbprofiler: TBProfiler list of antibiotic resistance genes
  - mtbseq: MTBseq list of antibiotic resistance genes
  - uvp: UVP list of repetitive loci in the M. tuberculosis genome
- Filter by proximity to indels. Mask out variants within a certain distance (by default 5 bases) of an insertion or deletion site.
- Filter by percentage of alternate allele bases. Mask out variants where the alternate allele accounts for less than a minimum percentage (by default 90%) of the bases at the site.
- Filter by depth of reads at a variant site. Mask out variants with less than a minimum depth of coverage (by default 30) at the site.
- Filter all non-SNV variants. Mask out variants that are not single nucleotide variants.
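The two position-based filters (region masking and proximity to indels) can be sketched in a few lines of Python. This is an illustration only, not the tool's implementation: the function names, the 1-based inclusive coordinate convention, the exact extent of the indel window, and the sample data are all assumptions made for the example.

```python
from typing import List, Tuple

# Assumed convention for this sketch: 1-based, inclusive (start, end).
Region = Tuple[int, int]


def in_any_region(pos: int, regions: List[Region]) -> bool:
    """Return True if a 1-based position falls inside any of the regions."""
    return any(start <= pos <= end for start, end in regions)


def indel_windows(indels: List[Tuple[int, str, str]], window: int = 5) -> List[Region]:
    """Build masked windows around indels given as (pos, ref, alt).

    Assumption: the window reaches `window` bases upstream of the indel
    position and `window` bases downstream of the last reference base
    that the indel covers.
    """
    windows = []
    for pos, ref, alt in indels:
        if len(ref) != len(alt):  # an insertion or a deletion
            last_ref_base = pos + len(ref) - 1
            windows.append((max(1, pos - window), last_ref_base + window))
    return windows


# Hypothetical data, not taken from any real region list or sample.
masked_regions = [(100, 200), (1000, 1500)]
indels = [(500, "A", "ATG"), (800, "GCC", "G")]
snv_positions = [150, 497, 506, 803, 808, 2000]

windows = indel_windows(indels, window=5)
kept = [
    p for p in snv_positions
    if not in_any_region(p, masked_regions) and not in_any_region(p, windows)
]
print(windows)  # [(495, 505), (795, 807)]
print(kept)     # [506, 808, 2000]
```

In this toy data the SNV at 150 is removed by the region mask, and those at 497 and 803 are removed because they fall inside an indel window.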
Filtering by (SAM/BAM) mapping quality is omitted because that filtering is performed by the upstream workflow we (SANBI) currently use.
When filters are used together, their effects are cumulative (i.e. a variant is masked out if it is masked by any one of the filters).
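The "masked by any filter" rule can be illustrated with a minimal Python sketch. The `Variant` class, its fields, and the predicate names are hypothetical and exist only for this example; how the tool itself reads depth and alternate allele counts from a VCF record is not shown here.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Variant:
    """Hypothetical, simplified variant record for illustration."""
    pos: int
    ref: str
    alt: str
    depth: int      # total reads at the site
    alt_reads: int  # reads supporting the alternate allele


def low_depth(v: Variant, min_depth: int = 30) -> bool:
    """True if the site has less than the minimum depth of coverage."""
    return v.depth < min_depth


def low_alt_percentage(v: Variant, min_percentage: float = 90.0) -> bool:
    """True if the alternate allele is below the minimum percentage."""
    if v.depth == 0:
        return True
    return 100.0 * v.alt_reads / v.depth < min_percentage


def not_snv(v: Variant) -> bool:
    """True if the variant is not a single nucleotide variant."""
    return not (len(v.ref) == 1 and len(v.alt) == 1)


def apply_filters(variants: List[Variant],
                  filters: List[Callable[[Variant], bool]]) -> List[Variant]:
    """Keep a variant only if none of the active filters masks it."""
    return [v for v in variants if not any(f(v) for f in filters)]


variants = [
    Variant(100, "A", "G", depth=50, alt_reads=49),   # passes everything
    Variant(200, "C", "T", depth=20, alt_reads=20),   # low depth
    Variant(300, "G", "A", depth=40, alt_reads=30),   # 75% alternate allele
    Variant(400, "T", "TA", depth=60, alt_reads=60),  # insertion, not an SNV
]

all_filters = apply_filters(variants, [low_depth, low_alt_percentage, not_snv])
depth_only = apply_filters(variants, [low_depth])
print([v.pos for v in all_filters])  # [100]
print([v.pos for v in depth_only])   # [100, 300, 400]
```

Each extra filter can only remove more variants, never restore one that another filter has masked.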
The software is available via bioconda and can be installed with:
conda install tb_variant_filter
usage: tb_variant_filter [-h] [--region_filter REGION_FILTER]
[--close_to_indel_filter]
[--indel_window_size INDEL_WINDOW_SIZE]
[--min_percentage_alt_filter]
[--min_percentage_alt MIN_PERCENTAGE_ALT]
[--min_depth_filter] [--min_depth MIN_DEPTH]
[--snv_only_filter]
input_file [output_file]
Filter variants from a VCF file (relative to M. tuberculosis H37Rv)
positional arguments:
input_file VCF input file (relative to H37Rv)
output_file Output file (VCF format)
optional arguments:
-h, --help show this help message and exit
--region_filter REGION_FILTER, -R REGION_FILTER
--close_to_indel_filter, -I
Mask out single nucleotide variants that are too close
to indels
--indel_window_size INDEL_WINDOW_SIZE
Window around indel to mask out (mask this number of
bases upstream/downstream from the indel. Requires -I
option to selected)
--min_percentage_alt_filter, -P
Mask out variants with less than a given percentage
variant allele at this site
--min_percentage_alt MIN_PERCENTAGE_ALT
Variants with less than this percentage variants at a
site will be masked out
--min_depth_filter, -D
Mask out variants with less than a given depth of
reads
--min_depth MIN_DEPTH
Variants at sites with less than this depth of reads
will be masked out
--snv_only_filter Mask out variants that are not SNVs
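As an example of combining the options above, the following Python sketch assembles a command line that enables the indel, percentage, depth and SNV-only filters with non-default thresholds. The flags come from the help text; the threshold values and the file names `sample.vcf` and `sample.filtered.vcf` are hypothetical. The region filter is left out because the help text does not spell out the format of its value.

```python
import os
import shlex
import shutil
import subprocess

cmd = [
    "tb_variant_filter",
    "-I", "--indel_window_size", "10",   # mask variants within 10 bases of an indel
    "-P", "--min_percentage_alt", "80",  # require at least 80% alternate allele
    "-D", "--min_depth", "20",           # require a read depth of at least 20
    "--snv_only_filter",                 # drop anything that is not an SNV
    "sample.vcf",                        # hypothetical input file
    "sample.filtered.vcf",               # hypothetical output file
]

# Show the equivalent shell command.
command_line = shlex.join(cmd)
print(command_line)

# Only run the tool if it is installed and the example input exists.
if shutil.which("tb_variant_filter") and os.path.exists("sample.vcf"):
    subprocess.run(cmd, check=True)
```

Note that, per the help text, the threshold options only take effect when the matching filter switch (`-I`, `-P` or `-D`) is also given.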
To export a region list (from the list of possible region masks) in BED format, use the tb_region_list_to_bed command:
usage: tb_region_list_to_bed [-h] [--chromosome_name CHROMOSOME_NAME]
{farhat_rlc,mtbseq,pe_ppe,tbprofiler,uvp} [output_file]
Output region filter in BED format
positional arguments:
{farhat_rlc,mtbseq,pe_ppe,tbprofiler,uvp}
Name of region list
output_file File to write output to
optional arguments:
-h, --help show this help message and exit
--chromosome_name CHROMOSOME_NAME
Chromosome name to use in BED
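For readers unfamiliar with BED, the sketch below shows how a list of regions might be written in that format. BED uses 0-based, half-open coordinates, so a 1-based inclusive region needs its start shifted by one. The function name, the input convention, the name column, and the sample regions are assumptions for illustration; `NC_000962.3` is the RefSeq accession of H37Rv, but whether it is the tool's default chromosome name is not stated here.

```python
from typing import Iterable, List, Tuple


def regions_to_bed(regions: Iterable[Tuple[int, int, str]],
                   chromosome_name: str = "NC_000962.3") -> List[str]:
    """Convert 1-based inclusive (start, end, name) regions to BED lines.

    BED is 0-based and half-open, so the start is decremented by one and
    the end is left unchanged.
    """
    lines = []
    for start, end, name in regions:
        lines.append(f"{chromosome_name}\t{start - 1}\t{end}\t{name}")
    return lines


# Hypothetical regions, not taken from any of the real region lists.
example_regions = [(100, 200, "region_a"), (1000, 1500, "region_b")]

bed_lines = regions_to_bed(example_regions)
for line in bed_lines:
    print(line)

# A different chromosome name, as the --chromosome_name option allows.
renamed = regions_to_bed(example_regions, chromosome_name="Chromosome")
```

The chromosome name in the BED file must match the one used in the VCF file, otherwise downstream tools will not match the regions to any variants.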