add clinvar_disease column #635

brentp · 2015-12-08T22:31:05Z

From clinvar, using CLNDBN as the disease name and GENEINFO as the genes, we can populate the clinvar_disease column with a | delimited list of diseases that match the current variant's top_impact.gene (and chrom).

currently, the clinvar associations are per-variant so we can't tell if we have found a variant that is in the same gene as one where there's a clinvar variant related to our phenotype.
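A rough sketch of how the per-gene list could be built from ClinVar INFO strings. This is not GEMINI's actual code: the helper names `parse_info` and `gene_diseases` are made up for illustration, the disease name `example_dysplasia` is invented sample data, and the delimiter handling (GENEINFO as `SYMBOL:ID` pairs joined by `|`, CLNDBN names split on `|` and `,`) is an assumption about the ClinVar VCF layout.

```python
from collections import defaultdict


def parse_info(info):
    """Turn a VCF INFO string into a dict; bare flags (e.g. PM) map to True."""
    fields = {}
    for item in info.split(";"):
        key, sep, value = item.partition("=")
        fields[key] = value if sep else True
    return fields


def gene_diseases(records):
    """Map (chrom, gene) -> '|'-joined disease names from (chrom, INFO) pairs."""
    diseases = defaultdict(set)
    for chrom, info in records:
        fields = parse_info(info)
        clndbn = fields.get("CLNDBN", ".")
        geneinfo = fields.get("GENEINFO", ".")
        # Skip records with no usable disease name or gene annotation.
        if not isinstance(clndbn, str) or clndbn == ".":
            continue
        if not isinstance(geneinfo, str) or geneinfo == ".":
            continue
        # Assumption: GENEINFO is SYMBOL:ID pairs joined by "|".
        genes = [g.split(":")[0] for g in geneinfo.split("|")]
        # Assumption: CLNDBN separates names with "|" and alleles with ",".
        names = [n for n in clndbn.replace(",", "|").split("|") if n and n != "."]
        for gene in genes:
            diseases[(chrom, gene)].update(names)
    return {key: "|".join(sorted(vals)) for key, vals in diseases.items()}


# Invented sample records, loosely shaped like ClinVar INFO fields.
records = [
    ("16", "GENEINFO=CREBBP:1387;PM;CLNDBN=not_specified"),
    ("16", "GENEINFO=CREBBP:1387;CLNDBN=."),
    ("16", "GENEINFO=CREBBP:1387;CLNDBN=example_dysplasia"),
]
by_gene = gene_diseases(records)
```

Each variant would then look up its own (chrom, gene) key in `by_gene`, so a variant with no ClinVar record of its own still picks up the diseases reported for other variants in the same gene.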

Since we're doing | delimited, queries would have to be like:

... AND clinvar_disease LIKE '%DYSPLASIA%'
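To make the shape of that query concrete, here is a self-contained sketch against an in-memory SQLite table. The table layout and rows are invented for illustration and are much smaller than a real GEMINI database; only the `clinvar_disease` column name comes from the proposal above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical, minimal stand-in for the variants table.
conn.execute(
    "CREATE TABLE variants (chrom TEXT, start INTEGER, gene TEXT, clinvar_disease TEXT)"
)
conn.executemany(
    "INSERT INTO variants VALUES (?, ?, ?, ?)",
    [
        ("chr1", 100, "GENE_A", "skeletal_dysplasia|short_stature"),
        ("chr2", 200, "GENE_B", "cardiomyopathy"),
        ("chr3", 300, "GENE_C", None),  # gene with no ClinVar disease
    ],
)

# Substring match inside the | delimited list. SQLite's LIKE is
# case-insensitive for ASCII by default, and NULL values never match.
rows = conn.execute(
    "SELECT chrom, start, gene FROM variants WHERE clinvar_disease LIKE ?",
    ("%DYSPLASIA%",),
).fetchall()
```

One trade-off of the delimited approach is that `LIKE '%...%'` matches any substring, so a short search term can also hit unrelated disease names that happen to contain it.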

thoughts @jxchong, @arq5x, @dgaston ?

This is related to #571 and it might be better to do a more comprehensive thing such as what is proposed there but this would be pretty simple and provide a fair bit of utility. We can't use omim by default due to the licensing/registration reqs.

jxchong · 2015-12-08T22:34:17Z

currently, the clinvar associations are per-gene so we can't tell if we have found a variant that in the same gene as one where there's a clinvar variant related to our phenotype.

Do you mean that clinvar associations are per variant?

brentp · 2015-12-08T22:40:56Z

yes. edited

jxchong · 2015-12-08T22:47:27Z

I can think of a few things:
this leads to possibly unexpected behavior where in_clinvar==0 but clinvar_disease==some_disease because the variant itself isn't in clinvar but other variants in the gene are listed as causing some_disease -- this would be different than the current situation

Alternatively would it be better to have one "disease_phenotype" column (mashed together with | delimiters) that is populated by clinvar but eventually optionally concatenate other gene-to-phenotype resources (OMIM, etc) onto the value?

Or a third option could be creating an additional clinvar_gene_phenotype column to hold this gene-wise phenotype information?
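A small sketch of the case described above, where a variant is not itself in ClinVar but its gene carries a phenotype. It assumes the separate-column option with `in_clinvar` alongside `clinvar_gene_phenotype`; the table and rows are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical, minimal stand-in for the variants table.
conn.execute(
    "CREATE TABLE variants ("
    "variant_id INTEGER, gene TEXT, in_clinvar INTEGER, clinvar_gene_phenotype TEXT)"
)
conn.executemany(
    "INSERT INTO variants VALUES (?, ?, ?, ?)",
    [
        (1, "GENE_A", 1, "example_dysplasia"),  # variant itself is in ClinVar
        (2, "GENE_A", 0, "example_dysplasia"),  # same gene, variant not in ClinVar
        (3, "GENE_B", 0, None),                 # nothing known for this gene
    ],
)

# Variants that only match through the gene-wise phenotype.
gene_only = conn.execute(
    "SELECT variant_id FROM variants "
    "WHERE in_clinvar = 0 AND clinvar_gene_phenotype LIKE ? "
    "ORDER BY variant_id",
    ("%dysplasia%",),
).fetchall()
```

Keeping the gene-wise value in its own column means `in_clinvar` keeps its per-variant meaning, and the two filters can be combined or used separately.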

brentp · 2015-12-08T23:11:43Z

I like "clinvar_gene_phenotype"

dgaston · 2015-12-09T19:07:15Z

Yes, I agree with @jxchong, I think a separate column as she proposes would work best. I am currently using GEMINI in a proposed clinical workflow and make use of the in_clinvar column for filtering for specific reported actionable variants for instance. All of our variants are in disease genes since they are targeted amplicon panels so the finer grained reported clinvar variant is needed, which also helps us weed out FFPE artifacts, sequencing errors, etc.

For my germline work I often have to do post-hoc filtering with gene lists to identify variants in genes I happen to know are related to the phenotype, but having it in a phenotype-specific column would let me do it directly in the database queries much more easily.

add clinvar_gene_phenotype column. closes #635.

brentp · 2015-12-09T19:30:44Z

thanks for the feedback, this is now in master and will be in 0.18.1 soon.

jxchong · 2015-12-19T00:22:51Z

FYI for a few variants it seems that clinvar_gene_phenotype is showing up with a '.' instead of 'None' in the query results.

chr16 3900824 3900825 C A CREBBP

brentp · 2015-12-19T01:25:37Z

shoot. that's a decomposed variant:

16  3900825 rs200673670 C   A   .   .   RS=200673670;RSPOS=3900825;dbSNPBuildID=137;SSR=0;SAO=0;VP=0x050060000a05000002100100;GENEINFO=CREBBP:1387;WGT=1;VC=SNV;PM;NSM;REF;ASP;LSD;CLNALLE=.;CLNHGVS=.;CLNSRC=.;CLNORIGIN=.;CLNSRCID=.;CLNSIG=.;CLNDSDB=.;CLNDSDBID=.;CLNDBN=.;CLNREVSTAT=.;CLNACC=.;OLD_MULTIALLELIC=16:3900825:C/A/T
16  3900825 rs200673670 C   T   .   .   RS=200673670;RSPOS=3900825;dbSNPBuildID=137;SSR=0;SAO=0;VP=0x050060000a05000002100100;GENEINFO=CREBBP:1387;WGT=1;VC=SNV;PM;NSM;REF;ASP;LSD;CLNALLE=1;CLNHGVS=NC_000016.9:g.3900825C>T;CLNSRC=.;CLNORIGIN=1;CLNSRCID=.;CLNSIG=1;CLNDSDB=MedGen;CLNDSDBID=CN169374;CLNDBN=not_specified;CLNREVSTAT=not;CLNACC=RCV000120599.1;OLD_MULTIALLELIC=16:3900825:C/A/T

looks like the decomposition isn't going as expected.
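Independent of fixing the decomposition itself, the '.' placeholder could be normalized before it reaches the column. The helper below is a hypothetical sketch, not the fix that went into GEMINI; `clean_clinvar_value` is an invented name, and it assumes '.' means "missing" both as a whole value and as an entry in a | delimited list.

```python
def clean_clinvar_value(value):
    """Return a '|'-joined string with '.' placeholders dropped, or None if empty."""
    if value is None:
        return None
    # Drop empty entries and the VCF missing-value marker.
    parts = [p for p in value.split("|") if p and p != "."]
    return "|".join(parts) if parts else None


# The C>A record above has CLNDBN=. while the C>T record has CLNDBN=not_specified.
from_a_allele = clean_clinvar_value(".")
from_t_allele = clean_clinvar_value("not_specified")
```

With this, the C>A allele would be stored as a missing value rather than a literal '.', which matches what the other variants show in query results.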

brentp added the version-0.18.1 label Dec 8, 2015

brentp closed this as completed in 6bf5f41 Dec 9, 2015

brentp added a commit that referenced this issue Dec 9, 2015

Merge pull request #636 from brentp/clinvar_gene_phenotype

fe7bf29

add clinvar_gene_phenotype column. closes #635.
