ClinPrior

ClinPrior is an interactome-driven prioritization method that predicts the patient's disease causal variant based on the description of the phenotype in HPO terms. This prioritization is divided into two steps:

Gene prioritization: computation of a phenotypic metric by comparing the patient's phenotype with HPO-Gene associations from existing human disease databases (prior knowledge) and iterative propagation of this phenotypic score within of a multilayer network with physical and functional interactions.
Variant prioritization: filtering and calculation of a variant deleteriousness score from WES or WGS sequences in a vcf (variant calling format) file. The last step returns a ranked list with the best variants that can explain the patient's phenotype at the top.

If you find this code useful in your clinical genomics analysis, please consider citing: ClinPrior publication in Genome Medicine

ClinPrior pipeline

First, the algorithm calculates the phenotypic association metric for each gene in the phenotypic layer based on the patient’s phenotype and known HPO-gene associations. The multilayer network is built from different data resources. The phenotypic layer reports HPO-gene associations, the physical layer reports physical protein‒protein interactions (PPIs) and the functional layer provides coexpression, signalling or metabolic pathway, and protein domain associations. The method propagates the phenotypic metric in adjacent nodes of the network so that higher scores indicate a better phenotypic fit with the patient. Variants resulting from patient genomic sequencing are filtered by frequency, variant impact and mode of inheritance. With this method, new candidate genes not previously associated with disease can also be identified thanks to the propagation of the phenotypic metric through neighbourhood connections.

Diagnostic yield in a patient real-world cohort

The diagnostic yield in WES and WGS in a real-world cohort with 135 families affected by hereditary spastic paraplegia (HSP) and/or cerebellar ataxia (CA).

Installation

You can install the development version from GitHub with:

install.packages("devtools")
devtools::install_github("aschluter/ClinPrior")
library(ClinPrior)

Download the human dataset for assembly GRCh38

dir.create(paste(system.file("extdata", package = "ClinPrior"),"/assembly38",sep=""))
library(piggyback)
pb_download(repo = "aschluter/ClinPrior",
            tag = "v2.0-assemblyGRCh38",
            dest = system.file("extdata/assembly38", package = "ClinPrior"))
            
            
if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")

BiocManager::install("BSgenome.Hsapiens.UCSC.hg38")

Download the human dataset for assembly GRCh37

dir.create(paste(system.file("extdata", package = "ClinPrior"),"/assembly37",sep=""))
library(piggyback)
pb_download(repo = "aschluter/ClinPrior",
            tag = "v2.0-assemblyGRCh37",
            dest = system.file("extdata/assembly37", package = "ClinPrior"))
            
            
if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")

BiocManager::install("BSgenome.Hsapiens.UCSC.hg19")

Gene prioritization

Gene prioritization based on user-provided HPOs.

HP code	Description
HP:0004481	Progressive macrocephaly
HP:0002376	Developmental regression
HP:0001257	Spasticity
HP:0001250	Seizure
HP:0000238	Hydrocephalus
HP:0002922	Increased CSF protein
HP:0000365	Hearing impairment

HPOpatient = c("HP:0004481","HP:0002376","HP:0001257","HP:0001250","HP:0000238","HP:0002922","HP:0000365")

Y<-proteinScore(HPOpatient)

Score propagation

Propagation of the phenotypic score in the physical and functional interactomes.

ClinPriorGeneScore<-MatrixPropagation(Y,alpha=0.2)

head(ClinPriorGeneScore)

Symbol	geneID	PriorFunct	Symbol	geneID	PriorPhys
L2HGDH	79944	0.75624798	L2HGDH	79944	0.762632978
SUMF1	285362	0.673091841	SUMF1	285362	0.6754931
NDUFAF4	29078	0.587444509	ADGRG1	9289	0.672186086
NDUFA1	4694	0.578958636	NDUFAF4	29078	0.60198354
ADGRG1	9289	0.548488778	NDUFA1	4694	0.586792265
ASPA	443	0.513182187	ASPA	443	0.529834244

The final result provides an ordered list of genes that match the patient's HPOs. The PriorFunct and PriorPhys columns report the phenotypic scores after propagation through the functional and physical gene-gene interactions of the network.

Prioritization of genes from a file with the patient's HPOs

patientHPOsFile <- paste(system.file("extdata/example", package = "ClinPrior"),"HPOpatient.txt",sep="/")
HPOpatient <- unique(read.csv(patientHPOsFile, header = FALSE, sep = "\t")[, 1])
Y<-proteinScore(HPOpatient)
ClinPriorGeneScore<-MatrixPropagation(Y,alpha=0.2)

head(ClinPriorGeneScore)

Symbol	geneID	PriorFunct	Symbol	geneID	PriorPhys
14465	KCNQ2	3785	0.75577235	KCNQ2	3785
10110	METTL23	124512	0.06526948	METTL23	124512
6323	KCNQ3	3786	0.06023326	KCNQ3	3786
8082	QDPR	5860	0.05522760	QDPR	5860
7696	SLC1A4	6509	0.04717662	SLC1A4	6509
4269	EDC3	80153	0.04607937	EDC3	80153

Variant prioritization

Once we have the gene prioritization result (ClinPriorGeneScore in the previous section) that best fits our patient's phenotype, we prioritize the patient variants from a WES or WGS annotated vcf file with the Ensembl Variant Effect Predictor (VEP) (https://www.ensembl.org/info/docs/tools/vep/index.html). In the example below, we have created a synthetic vcf file with the pathogenic variant KCNQ2:Met546Thr.

library(vcfR)

#example for GRCh37 assembly
vcfFile = paste(system.file("extdata/example", package = "ClinPrior"),"HG001_GRCh37_1_22_v4.2.1_benchmark.vep01.KCNQ2Met546Thr.vcf.gz",sep="/")
variants <- read.vcfR(vcfFile)
variantsFiltered <- readVCF(sampleName = "HG001", variants=variants, assembly= "assembly37")
result = priorBestVariantVcfR(variants = variantsFiltered, sampleName = "HG001", GlobalPhenotypicScore = ClinPriorGeneScore, assembly= "assembly37")

head(result)

ClinPriorPosition	CHROM	POS	REF	ALT	genesList	clinvar	knownDisease	Consequence	cDNA	Protein
1	20	62044929	A	G	KCNQ2	pathogenic	AD	missense_variant	ENST00000359125.2:c.1637T>C	ENSP00000352035.2:p.Met546Thr

The KCNQ2 Met546Thr variant is in the first row.

VEP annotation

To annotate a vcf file, you will need the Ensembl Variant Effect Predictor (VEP) (https://www.ensembl.org/info/docs/tools/vep/index.html). This is the command line to annotate your vcf file using the VEP docker or command line tool. You will need to install VEP, CADD and MaxEntScan first.

vep --af --af_gnomade --af_gnomadg --appris --biotype --buffer_size 500 --cache --ccds --check_existing --database 0 --dir [PATH]/cache --dir_plugins [PATH]/VEP_plugins --distance 5000 --fasta_dir [PATH]/fasta --filter_common --force --fork 4 --hgvs --input_file [PATH]/vcf_test.vcf --mane --output_file [PATH]/output.vcf --pick_allele --polyphen b --pubmed --quiet --regulatory --safe --sift b --stats_text --symbol --transcript_version --tsl --var_synonyms --vcf

Alternatively, you can annotate your VCF using the VEP web interface. You should use the default checkboxes and additionally check the following: CCDS, Protein, HGVS, Variant synonyms, gnomAD (exomes) allele frequencies, gnomAD (genomes) allele frequencies, CADD, MaxEntScan, SpliceAI, Exclude common variants, and in Restrict results select "Show one selected consequence per variant allele".

Name		Name	Last commit message	Last commit date
Latest commit History 47 Commits
R		R
data		data
img		img
inst/extdata		inst/extdata
man		man
.Rbuildignore		.Rbuildignore
.gitignore		.gitignore
ClinPrior.Rproj		ClinPrior.Rproj
DESCRIPTION		DESCRIPTION
LICENSE		LICENSE
NAMESPACE		NAMESPACE
README.md		README.md

License

aschluter/ClinPrior

Folders and files

Latest commit

History

Repository files navigation

ClinPrior

ClinPrior pipeline

Diagnostic yield in a patient real-world cohort

Installation

Gene prioritization

Score propagation

Prioritization of genes from a file with the patient's HPOs

Variant prioritization

VEP annotation

About

Resources

License

Stars

Watchers

Forks

Languages