add clinvar_disease column #635

Closed
brentp opened this issue Dec 8, 2015 · 8 comments

Collaborator

brentp commented Dec 8, 2015

From clinvar, using CLNDBN as the disease name and GENEINFO as the genes, we can populate the clinvar_disease column with a |-delimited list of diseases that match the current variant's top_impact.gene (and chrom).

currently, the clinvar associations are per-variant, so we can't tell whether we have found a variant that is in the same gene as one where there's a clinvar variant related to our phenotype.
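The gene-level lookup described above could be sketched roughly as follows. This is a minimal illustration, not GEMINI's implementation: the function names `parse_info` and `gene_phenotypes` are hypothetical, the sample records are made up, and it assumes that GENEINFO holds `symbol:id` pairs separated by `|` and that CLNDBN separates names with `|` or `,`.

```python
import re
from collections import defaultdict


def parse_info(info):
    """Parse a VCF INFO string into a dict; bare flags map to True."""
    out = {}
    for field in info.split(";"):
        if not field:
            continue
        key, sep, value = field.partition("=")
        out[key] = value if sep else True
    return out


def gene_phenotypes(records):
    """Map (chrom, gene) to a |-delimited string of disease names.

    `records` is an iterable of (chrom, info_string) pairs taken from a
    ClinVar VCF. Records with a missing ('.') GENEINFO or CLNDBN are skipped.
    """
    by_gene = defaultdict(set)
    for chrom, info in records:
        fields = parse_info(info)
        geneinfo = fields.get("GENEINFO", ".")
        clndbn = fields.get("CLNDBN", ".")
        if geneinfo in (".", True) or clndbn in (".", True):
            continue
        # Assumed layout: GENE1:id1|GENE2:id2
        genes = [g.split(":")[0] for g in geneinfo.split("|")]
        # Assumed layout: names separated by '|' or ','
        diseases = [d for d in re.split(r"[|,]", clndbn) if d and d != "."]
        for gene in genes:
            by_gene[(chrom, gene)].update(diseases)
    return {key: "|".join(sorted(names)) for key, names in by_gene.items()}


# Made-up records, for illustration only.
example_records = [
    ("1", "PM;GENEINFO=GENE1:1|GENE2:2;CLNDBN=Disease_B|Disease_A,Disease_C"),
    ("1", "GENEINFO=GENE1:1;CLNDBN=."),
]
example_lookup = gene_phenotypes(example_records)
```

Each variant would then be annotated by looking up its own (chrom, gene) pair in the resulting dict, whether or not the variant itself appears in clinvar.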

Since we're doing | delimited, queries would have to be like:

... AND clinvar_disease LIKE '%DYSPLASIA%'
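To make the behavior of such a query concrete, here is a small sketch using Python's built-in sqlite3 module against a toy table. The table layout, the rows, and the helper name `variants_matching` are assumptions for illustration and do not reflect the actual GEMINI schema.

```python
import sqlite3

# Toy in-memory table; not the real GEMINI schema.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE variants (variant_id INTEGER, gene TEXT, clinvar_disease TEXT)"
)
conn.executemany(
    "INSERT INTO variants VALUES (?, ?, ?)",
    [
        (1, "GENE1", "Skeletal_dysplasia|Disease_B"),
        (2, "GENE2", "Disease_C"),
        (3, "GENE3", None),  # no disease annotation
    ],
)


def variants_matching(conn, term):
    """Return variant_ids whose |-delimited disease list contains `term`."""
    # Note: '%' and '_' inside `term` act as LIKE wildcards.
    pattern = "%" + term + "%"
    rows = conn.execute(
        "SELECT variant_id FROM variants "
        "WHERE clinvar_disease LIKE ? ORDER BY variant_id",
        (pattern,),
    )
    return [row[0] for row in rows]


dysplasia_ids = variants_matching(conn, "DYSPLASIA")
```

Two properties are worth keeping in mind with this approach: SQLite's LIKE is case-insensitive for ASCII characters by default, and rows where the column is NULL never match. Because this is a substring match, a search term will also hit longer disease names that merely contain it.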

thoughts @jxchong, @arq5x, @dgaston ?

This is related to #571, and it might be better to do something more comprehensive such as what is proposed there, but this would be pretty simple and provide a fair bit of utility. We can't use omim by default due to the licensing/registration reqs.

Contributor

jxchong commented Dec 8, 2015

currently, the clinvar associations are per-gene so we can't tell if we have found a variant that in the same gene as one where there's a clinvar variant related to our phenotype.

Do you mean that clinvar associations are per variant?

Collaborator Author

brentp commented Dec 8, 2015

yes. edited

Contributor

jxchong commented Dec 8, 2015

I can think of a few things:
this leads to possibly unexpected behavior where in_clinvar==0 but clinvar_disease==some_disease because the variant itself isn't in clinvar but other variants in the gene are listed as causing some_disease -- this would be different than the current situation

Alternatively, would it be better to have one "disease_phenotype" column (mashed together with | delimiters) that is populated by clinvar, with the option of eventually concatenating other gene-to-phenotype resources (OMIM, etc) onto the value?

Or a third option could be creating an additional clinvar_gene_phenotype column to hold this gene-wise phenotype information?

Collaborator Author

brentp commented Dec 8, 2015

I like "clinvar_gene_phenotype"

Contributor

dgaston commented Dec 9, 2015

Yes, I agree with @jxchong, I think a separate column as she proposes would work best. I am currently using GEMINI in a proposed clinical workflow and make use of the in_clinvar column for filtering for specific reported actionable variants for instance. All of our variants are in disease genes since they are targeted amplicon panels so the finer grained reported clinvar variant is needed, which also helps us weed out FFPE artifacts, sequencing errors, etc.

For my germline work I often have to do post-hoc filtering with gene lists to identify variants in genes I happen to know are related to the phenotype, but having it in a phenotype-specific column would let me do it directly in the database queries much more easily.

@brentp brentp closed this as completed in 6bf5f41 Dec 9, 2015
brentp added a commit that referenced this issue Dec 9, 2015
add clinvar_gene_phenotype column. closes #635.
Collaborator Author

brentp commented Dec 9, 2015

thanks for the feedback, this is now in master and will be in 0.18.1 soon.

Contributor

jxchong commented Dec 19, 2015

FYI for a few variants it seems that clinvar_gene_phenotype is showing up with a '.' instead of 'None' in the query results.

chr16 3900824 3900825 C A CREBBP

Collaborator Author

brentp commented Dec 19, 2015

shoot. that's a decomposed variant:

16  3900825 rs200673670 C   A   .   .   RS=200673670;RSPOS=3900825;dbSNPBuildID=137;SSR=0;SAO=0;VP=0x050060000a05000002100100;GENEINFO=CREBBP:1387;WGT=1;VC=SNV;PM;NSM;REF;ASP;LSD;CLNALLE=.;CLNHGVS=.;CLNSRC=.;CLNORIGIN=.;CLNSRCID=.;CLNSIG=.;CLNDSDB=.;CLNDSDBID=.;CLNDBN=.;CLNREVSTAT=.;CLNACC=.;OLD_MULTIALLELIC=16:3900825:C/A/T
16  3900825 rs200673670 C   T   .   .   RS=200673670;RSPOS=3900825;dbSNPBuildID=137;SSR=0;SAO=0;VP=0x050060000a05000002100100;GENEINFO=CREBBP:1387;WGT=1;VC=SNV;PM;NSM;REF;ASP;LSD;CLNALLE=1;CLNHGVS=NC_000016.9:g.3900825C>T;CLNSRC=.;CLNORIGIN=1;CLNSRCID=.;CLNSIG=1;CLNDSDB=MedGen;CLNDSDBID=CN169374;CLNDBN=not_specified;CLNREVSTAT=not;CLNACC=RCV000120599.1;OLD_MULTIALLELIC=16:3900825:C/A/T

looks like the decomposition isn't going as expected.
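As a rough illustration of why a '.' can leak through, the sketch below reads CLNDBN from abbreviated copies of the two INFO strings above and treats the VCF missing value '.' as None. The helper name `info_value` is hypothetical, the INFO strings are shortened to a few of the fields shown above, and this is only one possible way to handle the missing value, not necessarily what GEMINI does.

```python
MISSING = {".", ""}


def info_value(info, key):
    """Return INFO[key], or None if the key is absent or VCF-missing ('.')."""
    for field in info.split(";"):
        name, sep, value = field.partition("=")
        if name == key and sep:
            return None if value in MISSING else value
    return None


# Abbreviated from the two decomposed records above (A allele, then T allele).
a_allele = "GENEINFO=CREBBP:1387;CLNSIG=.;CLNDBN=.;OLD_MULTIALLELIC=16:3900825:C/A/T"
t_allele = (
    "GENEINFO=CREBBP:1387;CLNSIG=1;CLNDBN=not_specified;"
    "OLD_MULTIALLELIC=16:3900825:C/A/T"
)

a_disease = info_value(a_allele, "CLNDBN")
t_disease = info_value(t_allele, "CLNDBN")
```

Here the A allele still carries GENEINFO=CREBBP but has no disease name, so a gene-level lookup that does not filter '.' would record a literal '.' for that record.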
