tb_variant_filter

This tool provides several filters for variants in VCF files whose coordinates are relative to the M. tuberculosis H37Rv reference genome.

It currently has five main filtering modes:

  1. Filter by region. Masks out variants in certain regions. Region lists are available as:
    1. farhat_rlc: Refined Low Confidence regions from Marin et al
    2. pe_ppe: PE/PPE genes from Fishbein et al 2015
    3. tbprofiler: TBProfiler list of antibiotic resistance genes
    4. mtbseq: MTBseq list of antibiotic resistance genes
    5. uvp: UVP list of repetitive loci in the M. tuberculosis genome
  2. Filter by proximity to indels. Masks out variants within a certain distance (by default 5 bases) of an insertion or deletion site.
  3. Filter by percentage of alternate allele bases. Masks out variants where the alternate allele makes up less than a minimum percentage (by default 90%) of the bases at the site.
  4. Filter by depth of reads at a variant site. Masks out variants with less than a minimum depth of coverage (by default 30) at the site.
  5. Filter all non-SNV variants. Masks out variants that are not single nucleotide variants.
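
The indel proximity filter (mode 2) can be illustrated with a minimal Python sketch. This is not the tool's implementation: the `Variant` tuple, the `mask_near_indels` function and the exact window definition (the span of the indel's reference allele, extended by `window` bases on each side) are assumptions made for illustration only.

```python
from typing import List, NamedTuple


class Variant(NamedTuple):
    """Hypothetical minimal variant record (not the tool's own data type)."""
    pos: int  # 1-based position of the first reference base
    ref: str  # reference allele
    alt: str  # alternate allele


def is_indel(variant: Variant) -> bool:
    # Alleles of different lengths indicate an insertion or deletion.
    return len(variant.ref) != len(variant.alt)


def mask_near_indels(variants: List[Variant], window: int = 5) -> List[Variant]:
    """Drop non-indel variants that fall within `window` bases of an indel.

    Assumption: the masked interval is the indel's reference span
    extended by `window` bases upstream and downstream.
    """
    spans = [
        (v.pos - window, v.pos + len(v.ref) - 1 + window)
        for v in variants
        if is_indel(v)
    ]
    kept = []
    for v in variants:
        near = any(start <= v.pos <= end for start, end in spans)
        if near and not is_indel(v):
            continue  # masked: too close to an indel
        kept.append(v)
    return kept
```

In this sketch the indels themselves are kept and only the neighbouring non-indel variants are dropped, following the help text for the -I option.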

Filtering by (SAM/BAM) mapping quality was omitted because that filtering is already performed by the upstream workflow we (SANBI) currently use.

When filters are used together their effects are combined: a variant is masked out if any one of the filters masks it.
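
The "masked by any filter" rule can be sketched in Python as below. The dictionary keys (`alt_percent`, `depth`, `ref`, `alt`) and all function names are hypothetical and are not taken from the tool. The default thresholds are the ones quoted above (90% alternate allele, depth 30).

```python
from typing import Callable, Dict, List

Variant = Dict[str, object]  # hypothetical record, e.g. parsed from a VCF line


def low_alt_percentage(variant: Variant, min_percentage: float = 90.0) -> bool:
    # True when the alternate allele share is below the threshold.
    return float(variant["alt_percent"]) < min_percentage


def low_depth(variant: Variant, min_depth: int = 30) -> bool:
    # True when read depth at the site is below the threshold.
    return int(variant["depth"]) < min_depth


def not_snv(variant: Variant) -> bool:
    # True for anything other than a single-base substitution.
    return not (len(str(variant["ref"])) == 1 and len(str(variant["alt"])) == 1)


def apply_filters(
    variants: List[Variant],
    filters: List[Callable[[Variant], bool]],
) -> List[Variant]:
    """Keep only variants that no filter masks (union of the masks)."""
    return [v for v in variants if not any(f(v) for f in filters)]
```

Because the masks are combined with `any`, enabling an extra filter can only remove more variants and never restores one that another filter has masked.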

Installation

The software is available via bioconda and, once the bioconda channels are configured, can be installed with:

conda install tb_variant_filter

Usage

usage: tb_variant_filter [-h] [--region_filter REGION_FILTER]
                         [--close_to_indel_filter]
                         [--indel_window_size INDEL_WINDOW_SIZE]
                         [--min_percentage_alt_filter]
                         [--min_percentage_alt MIN_PERCENTAGE_ALT]
                         [--min_depth_filter] [--min_depth MIN_DEPTH]
                         [--snv_only_filter]
                         input_file [output_file]

Filter variants from a VCF file (relative to M. tuberculosis H37Rv)

positional arguments:
  input_file            VCF input file (relative to H37Rv)
  output_file           Output file (VCF format)

optional arguments:
  -h, --help            show this help message and exit
  --region_filter REGION_FILTER, -R REGION_FILTER
  --close_to_indel_filter, -I
                        Mask out single nucleotide variants that are too close
                        to indels
  --indel_window_size INDEL_WINDOW_SIZE
                        Window around indel to mask out (mask this number of
                        bases upstream/downstream from the indel). Requires
                        the -I option to be selected
  --min_percentage_alt_filter, -P
                        Mask out variants with less than a given percentage
                        variant allele at this site
  --min_percentage_alt MIN_PERCENTAGE_ALT
                        Variants with less than this percentage variants at a
                        site will be masked out
  --min_depth_filter, -D
                        Mask out variants with less than a given depth of
                        reads
  --min_depth MIN_DEPTH
                        Variants at sites with less than this depth of reads
                        will be masked out
  --snv_only_filter     Mask out variants that are not SNVs
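
As a hedged illustration of how the options above fit together, the following Python sketch assembles a command line using only flags from the help text. The `build_command` helper and the file names are hypothetical. Passing a region list name such as `pe_ppe` to `--region_filter` is an assumption, since the help text does not describe that argument.

```python
import shlex
from typing import List, Optional


def build_command(
    input_vcf: str,
    output_vcf: str,
    region: Optional[str] = None,
    indel_window: Optional[int] = None,
    min_percentage_alt: Optional[float] = None,
    min_depth: Optional[int] = None,
    snv_only: bool = False,
) -> List[str]:
    """Assemble an argument list for tb_variant_filter (hypothetical helper)."""
    cmd = ["tb_variant_filter"]
    if region is not None:
        # Assumption: the value is one of the region list names.
        cmd += ["--region_filter", region]
    if indel_window is not None:
        # --indel_window_size requires the indel filter to be enabled.
        cmd += ["--close_to_indel_filter", "--indel_window_size", str(indel_window)]
    if min_percentage_alt is not None:
        cmd += ["--min_percentage_alt_filter",
                "--min_percentage_alt", str(min_percentage_alt)]
    if min_depth is not None:
        cmd += ["--min_depth_filter", "--min_depth", str(min_depth)]
    if snv_only:
        cmd.append("--snv_only_filter")
    cmd += [input_vcf, output_vcf]
    return cmd


# Print a shell-quoted command for inspection; nothing is executed here.
print(shlex.join(build_command("in.vcf", "out.vcf", region="pe_ppe", min_depth=30)))
```

The resulting list could be passed to `subprocess.run(cmd, check=True)` on a system where the tool is installed.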

To export a region (from the list of possible region masks) in BED format, use the tb_region_list_to_bed command:

usage: tb_region_list_to_bed [-h] [--chromosome_name CHROMOSOME_NAME]
                             {farhat_rlc,mtbseq,pe_ppe,tbprofiler,uvp} [output_file]

Output region filter in BED format

positional arguments:
  {farhat_rlc,mtbseq,pe_ppe,tbprofiler,uvp}
                        Name of region list
  output_file           File to write output to

optional arguments:
  -h, --help            show this help message and exit
  --chromosome_name CHROMOSOME_NAME
                        Chromosome name to use in BED
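
For readers unfamiliar with BED, the sketch below shows how a list of regions might be written out in that format. It does not reproduce the tool's code. It assumes the regions are held as 1-based, inclusive `(start, end)` pairs, and it uses `NC_000962.3` (the RefSeq accession for H37Rv) as an illustrative chromosome name, since the tool's own default is not stated here. BED itself uses 0-based, half-open intervals, so only the start coordinate shifts.

```python
from typing import Iterable, Tuple


def regions_to_bed(
    regions: Iterable[Tuple[int, int]],
    chromosome_name: str = "NC_000962.3",  # illustrative value, an assumption
) -> str:
    """Render 1-based inclusive regions as tab-separated BED lines."""
    lines = []
    for start, end in sorted(regions):
        # 1-based inclusive [start, end] becomes 0-based half-open [start-1, end).
        lines.append(f"{chromosome_name}\t{start - 1}\t{end}\n")
    return "".join(lines)
```

The returned string can be written to a file as-is, and the chromosome name should match the contig name used in the VCF being filtered.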