ORE: Outlier-RV enrichment

ORE identifies outlier genes with more rare variants than expected by chance (and vice-versa). Paper in Bioinformatics.

A brief overview of ORE (outlier-RV enrichment) usage is provided here; visit the latest ORE documentation for more details. Confirm the following are installed:

Then, on the command line, install with:

pip install ore

Example run

ore --vcf test.vcf.gz \
    --bed test.bed.gz \
    --output ore_results \
    --distribution normal \
    --threshold 2 3 4 \
    --max_outliers_per_id 500 \
    --af_rare 0.05 0.01 1e-3 \
    --tss_dist 5000

Variants and gene expression are specified with --vcf (line 1) and --bed (line 2), respectively. The output prefix is provided with --output (line 3). In this example, the outlier specifications --distribution (line 4), --threshold (line 5), and --max_outliers_per_id (line 6) indicate that outliers are defined using a normal distribution, with z-scores more extreme than each of the thresholds 2, 3, and 4, and that samples with more than 500 outliers are excluded. The variant options --af_rare (line 7) and --tss_dist (line 8) specify that variants are defined as rare at several intra-cohort allele frequency cut-offs (≤ 0.05, ≤ 0.01, and ≤ 0.001), and that only variants within 5 kb of the transcription start site (TSS) are used.
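To make the outlier definition above concrete, here is a minimal Python sketch of z-score outlier calling with a per-sample cap. It is an illustration only, not ORE's implementation: the function name `call_outliers`, the input layout (a dict of gene to per-sample expression), and the use of the sample standard deviation are all assumptions.

```python
from statistics import mean, stdev


def call_outliers(expr, threshold=2.0, max_outliers_per_id=500):
    """Hypothetical helper, not part of ORE.

    expr maps gene -> {sample_id: expression value}.
    Returns (kept_outliers, excluded_samples), where kept_outliers is a set
    of (gene, sample_id) pairs.
    """
    outliers = set()
    for gene, per_sample in expr.items():
        values = list(per_sample.values())
        if len(values) < 2:
            continue  # a standard deviation needs at least two samples
        mu, sd = mean(values), stdev(values)
        if sd == 0:
            continue  # no variation, so no outliers for this gene
        for sample, x in per_sample.items():
            # "More extreme than" the threshold in either direction.
            if abs((x - mu) / sd) > threshold:
                outliers.add((gene, sample))

    # Count outliers per sample and drop samples that exceed the cap.
    counts = {}
    for _, sample in outliers:
        counts[sample] = counts.get(sample, 0) + 1
    excluded = {s for s, n in counts.items() if n > max_outliers_per_id}
    kept = {(g, s) for g, s in outliers if s not in excluded}
    return kept, excluded


if __name__ == "__main__":
    # Toy data: nine samples at 0 and one sample at 10 for a single gene.
    expr = {"geneA": {f"s{i}": 0.0 for i in range(9)}}
    expr["geneA"]["s9"] = 10.0
    for t in (2, 3, 4):  # mirrors --threshold 2 3 4
        print(t, call_outliers(expr, threshold=t))
```

Running the sketch once per threshold mirrors how a list such as `--threshold 2 3 4` describes several outlier definitions rather than one.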

Usage (visit the latest ORE documentation for more)

ore [-h] [--version] -v VCF -b BED [-o OUTPUT]
         [--outlier_output OUTLIER_OUTPUT] [--enrich_file ENRICH_FILE]
         [--extrema] [--distribution {normal,rank,custom}]
         [--threshold [THRESHOLD [THRESHOLD ...]]]
         [--max_outliers_per_id MAX_OUTLIERS_PER_ID]
         [--af_rare [AF_RARE [AF_RARE ...]]] [--af_vcf]
         [--intracohort_rare_ac INTRACOHORT_RARE_AC]
         [--af_min [AF_MIN [AF_MIN ...]]] [--gq GQ] [--dp DP]
         [--aar AAR AAR] [--tss_dist [TSS_DIST [TSS_DIST ...]]] [--upstream]
         [--downstream] [--annovar]
         [--variant_class {intronic,intergenic,exonic,UTR5,UTR3,splicing,upstream,ncRNA,ncRNA_exonic}]
         [--exon_class {nonsynonymous,intergenic,nonframeshift,frameshift,stopgain,stoploss}]
         [--refgene] [--ensgene] [--annovar_dir ANNOVAR_DIR]
         [--humandb_dir HUMANDB_DIR] [--processes PROCESSES] [--clean_run]
Required arguments:

-v VCF, --vcf VCF

Location of VCF file. Must be tabixed!

-b BED, --bed BED

Gene expression file location. Must be tabixed!
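Because both inputs must be tabixed, a quick pre-flight check can save a failed run. The sketch below assumes the index sits next to each file with the `.tbi` suffix that tabix writes by default; `missing_tabix_indexes` is a hypothetical helper, not an ORE function, and ORE's own check may differ.

```python
from pathlib import Path


def missing_tabix_indexes(*paths):
    """Return the input paths that have no '<path>.tbi' index beside them.

    Hypothetical helper: assumes the default tabix index suffix (.tbi).
    """
    return [str(p) for p in paths if not Path(str(p) + ".tbi").exists()]


if __name__ == "__main__":
    # File names taken from the example run above.
    missing = missing_tabix_indexes("test.vcf.gz", "test.bed.gz")
    if missing:
        print("No .tbi index found for:", ", ".join(missing))
```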

Optional file locations:
-o OUTPUT, --output OUTPUT

Output prefix (default is VCF prefix)

--outlier_output OUTLIER_OUTPUT

Outlier filename (default is VCF prefix)

--enrich_file ENRICH_FILE

Output file for enrichment odds ratios and p-values (default is VCF prefix)
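For readers unfamiliar with enrichment odds ratios, the following sketch shows how one odds ratio and p-value can be computed from a 2x2 table of outlier status against rare-variant status. This README does not state which test ORE uses, so the one-sided Fisher's exact test here is an assumption, as are the function name `enrichment` and the table layout.

```python
from math import comb


def enrichment(a, b, c, d):
    """Odds ratio and one-sided Fisher's exact p-value for a 2x2 table.

    Assumed layout (hypothetical, for illustration only):
        a = outliers with a rare variant      b = outliers without
        c = non-outliers with a rare variant  d = non-outliers without
    """
    odds_ratio = (a * d) / (b * c) if b * c else float("inf")

    # P(X >= a) under the hypergeometric null with fixed table margins.
    n_outliers = a + b
    n_with_rv = a + c
    n_total = a + b + c + d
    upper = min(n_outliers, n_with_rv)
    tail = sum(
        comb(n_outliers, k) * comb(n_total - n_outliers, n_with_rv - k)
        for k in range(a, upper + 1)
    )
    p_value = tail / comb(n_total, n_with_rv)
    return odds_ratio, p_value


if __name__ == "__main__":
    print(enrichment(3, 1, 1, 3))
```

An odds ratio above 1 means outliers carry rare variants more often than non-outliers do; the p-value indicates how surprising that excess would be by chance.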

Optional outlier arguments:

--extrema

Only the most extreme value is an outlier

--distribution DISTRIBUTION

Outlier distribution. Options: {normal,rank,custom}

--threshold THRESHOLD

Expression threshold for defining outliers. Must be greater than 0 for normal or (0,0.5) non-inclusive with rank. Ignored with custom

--max_outliers_per_id MAX_OUTLIERS_PER_ID

Maximum number of outliers per ID
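The threshold rules above can be written down as a small check. This is a sketch of the stated constraints only; `threshold_is_valid` is a hypothetical name, and ORE's own argument validation may behave differently.

```python
def threshold_is_valid(distribution, threshold):
    """Check a threshold against the rules stated in the help text.

    Hypothetical helper, not part of ORE.
    """
    if distribution == "normal":
        return threshold > 0  # any positive z-score cut-off
    if distribution == "rank":
        return 0 < threshold < 0.5  # open interval, bounds excluded
    if distribution == "custom":
        return True  # the threshold is ignored with custom
    raise ValueError(f"unknown distribution: {distribution!r}")


if __name__ == "__main__":
    for dist, thr in [("normal", 2), ("rank", 0.5), ("rank", 0.05)]:
        print(dist, thr, threshold_is_valid(dist, thr))
```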

Optional variant-related arguments:
--af_rare AF_RARE

AF cut-off below which a variant is considered rare (space separated list e.g., 0.1 0.05)

--af_vcf

Use the VCF AF field to define an allele as rare.

--intracohort_rare_ac INTRACOHORT_RARE_AC

Allele COUNT to be used instead of intra-cohort allele frequency (still uses af_rare for the population-level AF cut-off).

--af_min AF_MIN

Lower bound on AF cut-offs for --af_rare; must be the same length as --af_rare (e.g., with --af_rare 0.01 0.5 and --af_min 0 0.05, ORE will compare variants within [0,0.01] and [0.05,0.5] to other variants).

--gq GQ

Minimum genotype quality for each variant in each individual

--dp DP

Minimum depth per variant in each individual

--aar AAR

Alternate allelic ratio for heterozygous variants (provide two space-separated numbers between 0 and 1, e.g., 0.2 0.8)

--tss_dist TSS_DIST

Variants within this distance of the TSS are considered

--upstream

Only variants UPstream of TSS

--downstream

Only variants DOWNstream of TSS
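The pairing of --af_min with --af_rare is easiest to see with the example values from the help text. The sketch below treats each pair as a closed interval, as the bracket notation suggests; `af_in_intervals` is a hypothetical helper and does not reflect how ORE handles allele frequencies internally.

```python
def af_in_intervals(af, af_rare, af_min=None):
    """Report, for each (af_min[i], af_rare[i]) pair, whether af is inside.

    Hypothetical helper: intervals are treated as closed, [lower, upper].
    With no af_min, every lower bound defaults to 0.
    """
    if af_min is None:
        af_min = [0.0] * len(af_rare)
    if len(af_min) != len(af_rare):
        raise ValueError("af_min must be the same length as af_rare")
    return [lower <= af <= upper for lower, upper in zip(af_min, af_rare)]


if __name__ == "__main__":
    # Values from the help text: --af_rare 0.01 0.5 --af_min 0 0.05
    for af in (0.005, 0.03, 0.2):
        print(af, af_in_intervals(af, [0.01, 0.5], [0.0, 0.05]))
```

A variant at allele frequency 0.03 falls in neither interval, so under this reading it would belong to the "other variants" comparison group for both cut-off pairs.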

Optional arguments for using ANNOVAR:
--annovar

Use ANNOVAR to specify allele frequencies and functional class

--variant_class

Only variants in these classes will be considered. Options: {intronic,intergenic,exonic,UTR5,UTR3,splicing,upstream,ncRNA}

--exon_class

Only variants with these exonic impacts will be considered. Options: {nonsynonymous,intergenic,nonframeshift,frameshift,stopgain,stoploss}

--refgene

Filter on RefGene function.

--ensgene

Filter on ENSEMBL function.

--annovar_dir ANNOVAR_DIR

Directory of the table_annovar.pl script

--humandb_dir HUMANDB_DIR

Directory of ANNOVAR data (refGene, ensGene, and gnomad_genome)

optional arguments:

-h, --help

show this help message and exit

--version

show program's version number and exit

--processes PROCESSES

Number of CPU processes

--clean_run

Delete temporary files from the previous run

Felix Richter <felix.richter@icahn.mssm.edu>