Skip to content

Commit

Permalink
version 1.0.1
Browse files Browse the repository at this point in the history
  • Loading branch information
empiricalbayes authored and cran-robot committed Jan 26, 2024
0 parents commit 0e03033
Show file tree
Hide file tree
Showing 10 changed files with 4,873 additions and 0 deletions.
15 changes: 15 additions & 0 deletions DESCRIPTION
@@ -0,0 +1,15 @@
Package: SNVLFDR
Title: Empirical Bayes Single Nucleotide Variant Calling
Version: 1.0.1
Authors@R: c(person("Ali", "Karimnezhad", role = c("aut", "cre", "ctb"),
email = "ali.karimnezhad@gmail.com"))
Author: Ali Karimnezhad [aut, cre, ctb]
Maintainer: Ali Karimnezhad <ali.karimnezhad@gmail.com>
Description: Identifies single nucleotide variants in next-generation sequencing data by estimating their local false discovery rates. For more details, see Karimnezhad, A. and Perkins, T. J. (2024) <doi:10.1038/s41598-024-51958-z>.
Encoding: UTF-8
License: GPL (>= 3)
RoxygenNote: 7.2.3
NeedsCompilation: no
Packaged: 2024-01-24 16:32:08 UTC; alikarimnezhad
Repository: CRAN
Date/Publication: 2024-01-25 13:30:02 UTC
9 changes: 9 additions & 0 deletions MD5
@@ -0,0 +1,9 @@
dada0eda7b34d8b9bd5be5685d9930b5 *DESCRIPTION
158d41e1626c314614db731fd3b8d9c8 *NAMESPACE
07fac9673c049d7b71d7d14550ee489b *R/Source.R
aab1cbbcc25a49abb8025edcc369813e *README.md
bf689758e67890bc5237d84e05bb7567 *inst/extdata/bam_input.csv
0bac5d292ed444362d78b7c9b4a48f7c *inst/extdata/calls.vcf
85766890945f0af61eaffa34ea7688cf *inst/extdata/regions.bed
773dd98fef3d9691a2e883a524e96001 *man/get_LFDRs.Rd
c6b47fd680db7d09127c6b851c72bb49 *man/get_LFDRs_given_caller.Rd
6 changes: 6 additions & 0 deletions NAMESPACE
@@ -0,0 +1,6 @@
# Generated by roxygen2: do not edit by hand

export(get_LFDRs)
export(get_LFDRs_given_caller)
import(stats)
import(utils)
607 changes: 607 additions & 0 deletions R/Source.R

Large diffs are not rendered by default.

60 changes: 60 additions & 0 deletions README.md
@@ -0,0 +1,60 @@
# SNVLFDR
Empirical Bayes Single Nucleotide Variant Calling for Next Generation Sequencing Data

### LFDR-based Variant Calling
Identifies single nucleotide variants in next-generation sequencing data by estimating their local false discovery rates. For more details, see <https://doi.org/10.1038/s41598-024-51958-z>.


## Installation

The package can be installed from the GitHub repository
```r
devtools::install_github("empiricalbayes/SNVLFDR")
```

## Getting Started
You can load SNVLFDR as follows:

```r
library(SNVLFDR)
```

## How to run SNVLFDR to call variants on an example data.
```r
bam_input <- system.file("extdata", "bam_input.csv", package="SNVLFDR")
bedfile <- system.file("extdata", "regions.bed", package="SNVLFDR")
BQ.T=20
MQ.T=20
pi0.initial=0.95
AF.T=0.01
DP.T=10
LFDR.T=0.01
error=NULL
method='empirical'
epsilon=0.01
output=get_LFDRs(bam_input,bedfile,BQ.T,MQ.T,pi0.initial,AF.T,DP.T,LFDR.T,error,method,epsilon)

#Estimated LFDRs
output$estimated.LFDRs

#Estimated proportion on non-mutant sites
output$estimated.pi0

#Filtered Bam matrix that includes estimated LFDRs
output$filtered.bam


## How to run SNVLFDR to prioritize variants called by another variant caller
bam_path <- system.file("extdata", "bam_input.csv", package="SNVLFDR")
calls_path <- system.file("extdata", "calls.vcf", package="SNVLFDR")
output=get_LFDRs_given_caller(bam_input=bam_path,calls=calls_path,LFDR.T=0.01,error=NULL)

#Updated VCF file that includes estimated LFDRs
output$updated.vcf


```


## References
Karimnezhad, A and Perkins, T.J. (2024) Empirical Bayes single nucleotide variant-calling for next-generation sequencing data. Scientific Reports 14, 1550, <https://doi.org/10.1038/s41598-024-51958-z>
1,000 changes: 1,000 additions & 0 deletions inst/extdata/bam_input.csv

Large diffs are not rendered by default.

92 changes: 92 additions & 0 deletions inst/extdata/calls.vcf
@@ -0,0 +1,92 @@
##fileformat=VCFv4.2
##FILTER=<ID=base_qual,Description="alt median base quality">
##FILTER=<ID=clustered_events,Description="Clustered events observed in the tumor">
##FILTER=<ID=contamination,Description="contamination">
##FILTER=<ID=duplicate,Description="evidence for alt allele is overrepresented by apparent duplicates">
##FILTER=<ID=fragment,Description="abs(ref - alt) median fragment length">
##FILTER=<ID=germline,Description="Evidence indicates this site is germline, not somatic">
##FILTER=<ID=haplotype,Description="Variant near filtered variant on same haplotype.">
##FILTER=<ID=low_allele_frac,Description="Allele fraction is below specified threshold">
##FILTER=<ID=map_qual,Description="ref - alt median mapping quality">
##FILTER=<ID=multiallelic,Description="Site filtered because too many alt alleles pass tumor LOD">
##FILTER=<ID=n_ratio,Description="Ratio of N to alt exceeds specified ratio">
##FILTER=<ID=normal_artifact,Description="artifact_in_normal">
##FILTER=<ID=numt_chimera,Description="NuMT variant with too many ALT reads originally from autosome">
##FILTER=<ID=numt_novel,Description="Alt depth is below expected coverage of NuMT in autosome">
##FILTER=<ID=orientation,Description="orientation bias detected by the orientation bias mixture model">
##FILTER=<ID=panel_of_normals,Description="Blacklisted site in panel of normals">
##FILTER=<ID=position,Description="median distance of alt variants from end of reads">
##FILTER=<ID=slippage,Description="Site filtered due to contraction of short tandem repeat region">
##FILTER=<ID=strand_bias,Description="Evidence for alt allele comes from one read direction only">
##FILTER=<ID=strict_strand,Description="Evidence for alt allele is not represented in both directions">
##FILTER=<ID=weak_evidence,Description="Mutation does not meet likelihood threshold">
##FORMAT=<ID=AD,Number=R,Type=Integer,Description="Allelic depths for the ref and alt alleles in the order listed">
##FORMAT=<ID=AF,Number=A,Type=Float,Description="Allele fractions of alternate alleles in the tumor">
##FORMAT=<ID=DP,Number=1,Type=Integer,Description="Approximate read depth (reads with MQ=255 or with bad mates are filtered)">
##FORMAT=<ID=F1R2,Number=R,Type=Integer,Description="Count of reads in F1R2 pair orientation supporting each allele">
##FORMAT=<ID=F2R1,Number=R,Type=Integer,Description="Count of reads in F2R1 pair orientation supporting each allele">
##FORMAT=<ID=GQ,Number=1,Type=Integer,Description="Genotype Quality">
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
##FORMAT=<ID=PGT,Number=1,Type=String,Description="Physical phasing haplotype information, describing how the alternate alleles are phased in relation to one another">
##FORMAT=<ID=PID,Number=1,Type=String,Description="Physical phasing ID information, where each unique ID within a given sample (but not across samples) connects records within a phasing group">
##FORMAT=<ID=PL,Number=G,Type=Integer,Description="Normalized, Phred-scaled likelihoods for genotypes as defined in the VCF specification">
##FORMAT=<ID=PS,Number=1,Type=Integer,Description="Phasing set (typically the position of the first variant in the set)">
##FORMAT=<ID=SB,Number=4,Type=Integer,Description="Per-sample component statistics which comprise the Fisher's Exact Test to detect strand bias.">
##GATKCommandLine=<ID=FilterMutectCalls,CommandLine="FilterMutectCalls --output /global/online/ohri/ohri1/projects/gapp_ali/analysis/mutect2/TDNA1_filtered.vcf --variant /global/online/ohri/ohri1/projects/gapp_ali/analysis/mutect2/TDNA1_unfiltered.vcf --reference /global/online/ohri/ohri1/projects/gapp/rawdata/genomes/hg19_chrs.fa --threshold-strategy OPTIMAL_F_SCORE --f-score-beta 1.0 --false-discovery-rate 0.05 --initial-threshold 0.1 --mitochondria-mode false --max-events-in-region 2 --max-alt-allele-count 1 --unique-alt-read-count 0 --min-median-mapping-quality 30 --min-median-base-quality 20 --max-median-fragment-length-difference 10000 --min-median-read-position 1 --max-n-ratio Infinity --min-reads-per-strand 0 --autosomal-coverage 0.0 --max-numt-fraction 0.85 --min-allele-fraction 0.0 --contamination-estimate 0.0 --log-snv-prior -13.815510557964275 --log-indel-prior -16.11809565095832 --log-artifact-prior -2.302585092994046 --normal-p-value-threshold 0.001 --min-slippage-length 8 --pcr-slippage-rate 0.1 --distance-on-haplotype 100 --long-indel-length 5 --interval-set-rule UNION --interval-padding 0 --interval-exclusion-padding 0 --interval-merging-rule ALL --read-validation-stringency SILENT --seconds-between-progress-updates 10.0 --disable-sequence-dictionary-validation false --create-output-bam-index true --create-output-bam-md5 false --create-output-variant-index true --create-output-variant-md5 false --lenient false --add-output-sam-program-record true --add-output-vcf-command-line true --cloud-prefetch-buffer 40 --cloud-index-prefetch-buffer -1 --disable-bam-index-caching false --sites-only-vcf-output false --help false --version false --showHidden false --verbosity INFO --QUIET false --use-jdk-deflater false --use-jdk-inflater false --gcs-max-retries 20 --gcs-project-for-requester-pays --disable-tool-default-read-filters false",Version="4.1.2.0",Date="August 26, 2022 1:03:37 EDT PM">
##GATKCommandLine=<ID=Mutect2,CommandLine="Mutect2 --callable-depth 10 --minimum-allele-fraction 0.01 --output /global/online/ohri/ohri1/projects/gapp_ali/analysis/mutect2/TDNA1_unfiltered.vcf --intervals /global/online/ohri/ohri1/projects/gapp/rawdata/trusight170/TST170_DNA_target.bed --input /global/online/ohri/ohri1/projects/gapp_ali/analysis/bwa_rmdup/TDNA1_sorted_rmdups.bam --reference /global/online/ohri/ohri1/projects/gapp/rawdata/genomes/hg19_chrs.fa --f1r2-median-mq 50 --f1r2-min-bq 20 --f1r2-max-depth 200 --genotype-pon-sites false --genotype-germline-sites false --af-of-alleles-not-in-resource -1.0 --mitochondria-mode false --tumor-lod-to-emit 3.0 --initial-tumor-lod 2.0 --pcr-snv-qual 40 --pcr-indel-qual 40 --max-population-af 0.01 --downsampling-stride 1 --max-suspicious-reads-per-alignment-start 0 --normal-lod 2.2 --ignore-itr-artifacts false --gvcf-lod-band -2.5 --gvcf-lod-band -2.0 --gvcf-lod-band -1.5 --gvcf-lod-band -1.0 --gvcf-lod-band -0.5 --gvcf-lod-band 0.0 --gvcf-lod-band 0.5 --gvcf-lod-band 1.0 --genotype-filtered-alleles false --disable-adaptive-pruning false --dont-trim-active-regions false --max-disc-ar-extension 25 --max-gga-ar-extension 300 --padding-around-indels 150 --padding-around-snps 20 --kmer-size 10 --kmer-size 25 --dont-increase-kmer-sizes-for-cycles false --allow-non-unique-kmers-in-ref false --num-pruning-samples 1 --min-dangling-branch-length 4 --recover-all-dangling-branches false --max-num-haplotypes-in-population 128 --min-pruning 2 --adaptive-pruning-initial-error-rate 0.001 --pruning-lod-threshold 2.302585092994046 --max-unpruned-variants 100 --debug-assembly false --debug-graph-transformations false --capture-assembly-failure-bam false --error-correct-reads false --kmer-length-for-read-error-correction 25 --min-observations-for-kmer-to-be-solid 20 --likelihood-calculation-engine PairHMM --base-quality-score-threshold 18 --pair-hmm-gap-continuation-penalty 10 --pair-hmm-implementation FASTEST_AVAILABLE --pcr-indel-model CONSERVATIVE --phred-scaled-global-read-mismapping-rate 45 --native-pair-hmm-threads 4 --native-pair-hmm-use-double-precision false --bam-writer-type CALLED_HAPLOTYPES --dont-use-soft-clipped-bases false --min-base-quality-score 10 --smith-waterman JAVA --emit-ref-confidence NONE --max-mnp-distance 1 --min-assembly-region-size 50 --max-assembly-region-size 300 --assembly-region-padding 100 --max-reads-per-alignment-start 50 --active-probability-threshold 0.002 --max-prob-propagation-distance 50 --force-active false --interval-set-rule UNION --interval-padding 0 --interval-exclusion-padding 0 --interval-merging-rule ALL --read-validation-stringency SILENT --seconds-between-progress-updates 10.0 --disable-sequence-dictionary-validation false --create-output-bam-index true --create-output-bam-md5 false --create-output-variant-index true --create-output-variant-md5 false --lenient false --add-output-sam-program-record true --add-output-vcf-command-line true --cloud-prefetch-buffer 40 --cloud-index-prefetch-buffer -1 --disable-bam-index-caching false --sites-only-vcf-output false --help false --version false --showHidden false --verbosity INFO --QUIET false --use-jdk-deflater false --use-jdk-inflater false --gcs-max-retries 20 --gcs-project-for-requester-pays --disable-tool-default-read-filters false --max-read-length 2147483647 --min-read-length 30 --minimum-mapping-quality 20 --disable-tool-default-annotations false --enable-all-annotations false",Version="4.1.2.0",Date="August 26, 2022 12:18:06 EDT PM">
##INFO=<ID=CONTQ,Number=1,Type=Float,Description="Phred-scaled qualities that alt allele are not due to contamination">
##INFO=<ID=DP,Number=1,Type=Integer,Description="Approximate read depth; some reads may have been filtered">
##INFO=<ID=ECNT,Number=1,Type=Integer,Description="Number of events in this haplotype">
##INFO=<ID=GERMQ,Number=1,Type=Integer,Description="Phred-scaled quality that alt alleles are not germline variants">
##INFO=<ID=MBQ,Number=R,Type=Integer,Description="median base quality">
##INFO=<ID=MFRL,Number=R,Type=Integer,Description="median fragment length">
##INFO=<ID=MMQ,Number=R,Type=Integer,Description="median mapping quality">
##INFO=<ID=MPOS,Number=A,Type=Integer,Description="median distance from end of read">
##INFO=<ID=NALOD,Number=A,Type=Float,Description="Negative log 10 odds of artifact in normal with same allele fraction as tumor">
##INFO=<ID=NCount,Number=1,Type=Integer,Description="Count of N bases in the pileup">
##INFO=<ID=NLOD,Number=A,Type=Float,Description="Normal log 10 likelihood ratio of diploid het or hom alt genotypes">
##INFO=<ID=OCM,Number=1,Type=Integer,Description="Number of alt reads whose original alignment doesn't match the current contig.">
##INFO=<ID=PON,Number=0,Type=Flag,Description="site found in panel of normals">
##INFO=<ID=POPAF,Number=A,Type=Float,Description="negative log 10 population allele frequencies of alt alleles">
##INFO=<ID=ROQ,Number=1,Type=Float,Description="Phred-scaled qualities that alt allele are not due to read orientation artifact">
##INFO=<ID=RPA,Number=.,Type=Integer,Description="Number of times tandem repeat unit is repeated, for each allele (including reference)">
##INFO=<ID=RU,Number=1,Type=String,Description="Tandem repeat unit (bases)">
##INFO=<ID=SEQQ,Number=1,Type=Integer,Description="Phred-scaled quality that alt alleles are not sequencing errors">
##INFO=<ID=STR,Number=0,Type=Flag,Description="Variant is a short tandem repeat">
##INFO=<ID=STRANDQ,Number=1,Type=Integer,Description="Phred-scaled quality of strand bias artifact">
##INFO=<ID=STRQ,Number=1,Type=Integer,Description="Phred-scaled quality that alt alleles in STRs are not polymerase slippage errors">
##INFO=<ID=TLOD,Number=A,Type=Float,Description="Log 10 likelihood ratio score of variant existing versus not existing">
##INFO=<ID=UNIQ_ALT_READ_COUNT,Number=1,Type=Integer,Description="Number of ALT reads with unique start and mate end positions at a variant site">
##MutectVersion=2.2
##contig=<ID=chr1,length=249250621>
##contig=<ID=chr2,length=243199373>
##contig=<ID=chr3,length=198022430>
##contig=<ID=chr4,length=191154276>
##contig=<ID=chr5,length=180915260>
##contig=<ID=chr6,length=171115067>
##contig=<ID=chr7,length=159138663>
##contig=<ID=chr8,length=146364022>
##contig=<ID=chr9,length=141213431>
##contig=<ID=chr10,length=135534747>
##contig=<ID=chr11,length=135006516>
##contig=<ID=chr12,length=133851895>
##contig=<ID=chr13,length=115169878>
##contig=<ID=chr14,length=107349540>
##contig=<ID=chr15,length=102531392>
##contig=<ID=chr16,length=90354753>
##contig=<ID=chr17,length=81195210>
##contig=<ID=chr18,length=78077248>
##contig=<ID=chr19,length=59128983>
##contig=<ID=chr20,length=63025520>
##contig=<ID=chr21,length=48129895>
##contig=<ID=chr22,length=51304566>
##contig=<ID=chrX,length=155270560>
##contig=<ID=chrY,length=59373566>
##contig=<ID=chrM,length=16571>
##filtering_status=These calls have been filtered by FilterMutectCalls to label false positives with a list of failed filters and true positives with PASS.
##source=FilterMutectCalls
##source=Mutect2
##tumor_sample=SAMPLE
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT SAMPLE
chr1 11174415 . G A . PASS CONTQ=93;DP=1544;ECNT=1;GERMQ=72;MBQ=20,20;MFRL=127,133;MMQ=60,60;MPOS=25;POPAF=7.30;SEQQ=93;STRANDQ=93;TLOD=1536.87 GT:AD:AF:DP:F1R2:F2R1:SB 0/1:838,661:0.440:1499:366,297:465,356:409,429,299,362
chr1 11184544 . C T . PASS CONTQ=93;DP=1862;ECNT=1;GERMQ=93;MBQ=0,20;MFRL=0,137;MMQ=60,60;MPOS=24;POPAF=7.30;SEQQ=93;STRANDQ=93;TLOD=5303.00 GT:AD:AF:DP:F1R2:F2R1:SB 0/1:0,1794:0.999:1794:0,876:0,888:0,0,954,840

0 comments on commit 0e03033

Please sign in to comment.