add clinvar_disease column #635
Comments
Do you mean that clinvar associations are per variant?
yes. edited
I can think of a few things: Alternatively, would it be better to have one "disease_phenotype" column (values mashed together with | delimiters) that is populated by clinvar, with the option of eventually concatenating other gene-to-phenotype resources (OMIM, etc.) onto the value? Or a third option could be creating an additional clinvar_gene_phenotype column to hold this gene-wise phenotype information?
I like "clinvar_gene_phenotype"
Yes, I agree with @jxchong, I think a separate column as she proposes would work best. I am currently using GEMINI in a proposed clinical workflow and use the in_clinvar column to filter for specific reported actionable variants, for instance. All of our variants are in disease genes since they come from targeted amplicon panels, so the finer-grained reported clinvar variant is needed, which also helps us weed out FFPE artifacts, sequencing errors, etc. For my germline work I often have to do post-hoc filtering with gene lists to identify variants in genes I happen to know are related to the phenotype, but having it in a phenotype-specific column would let me do it directly in the database queries much more easily.
add clinvar_gene_phenotype column. closes #635.
thanks for the feedback, this is now in master and will be in 0.18.1 soon.
FYI for a few variants it seems that clinvar_gene_phenotype is showing up with a '.' instead of 'None' in the query results.
shoot. that's a decomposed variant:
looks like the decomposition isn't going as expected.
From clinvar, using CLNDBN as disease name and GENEINFO as the genes, we can populate the clinvar_disease column with a | delimited list of diseases that match the current variant's top_impact.gene (and chrom).
currently, the clinvar associations are per-variant, so we can't tell if we have found a variant that is in the same gene as one where there's a clinvar variant related to our phenotype.
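A rough sketch of that gene-wise lookup, not GEMINI's actual loader. It assumes GENEINFO holds `SYMBOL:ID` pairs separated by `|` and that CLNDBN holds disease names separated by `|` (and by `,` across alleles); the function names `parse_info`, `gene_diseases` and `clinvar_disease_value` are hypothetical.

```python
from collections import defaultdict


def parse_info(info):
    """Parse a VCF INFO string into a dict (flag-only keys map to True)."""
    fields = {}
    for item in info.split(";"):
        if not item:
            continue
        key, sep, value = item.partition("=")
        fields[key] = value if sep else True
    return fields


def gene_diseases(records):
    """Build {(chrom, gene): sorted disease names} from (chrom, INFO) pairs."""
    by_gene = defaultdict(set)
    for chrom, info in records:
        fields = parse_info(info)
        geneinfo = fields.get("GENEINFO")
        clndbn = fields.get("CLNDBN")
        if not isinstance(geneinfo, str) or not isinstance(clndbn, str):
            continue  # record lacks one of the two fields
        # Assumed layout: "SYMBOL:ID|SYMBOL2:ID2"
        genes = [g.split(":")[0] for g in geneinfo.split("|") if g]
        # Assumed layout: names split by "|" within an allele, "," across alleles
        diseases = [
            d
            for part in clndbn.split(",")
            for d in part.split("|")
            if d and d != "."
        ]
        for gene in genes:
            by_gene[(chrom, gene)].update(diseases)
    return {key: sorted(names) for key, names in by_gene.items()}


def clinvar_disease_value(chrom, gene, by_gene):
    """Return the | delimited column value, or None when nothing matches."""
    names = by_gene.get((chrom, gene))
    return "|".join(names) if names else None
```

The lookup is keyed on (chrom, gene) so that a variant picks up every disease reported for its top_impact.gene, not just the diseases reported for that exact variant.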
Since we're doing | delimited, queries would have to be like:
... AND clinvar_disease LIKE '%DYSPLASIA%'
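To illustrate how that kind of query behaves on a | delimited column, here is a small self-contained sketch against an in-memory SQLite table. The table layout, gene names and helper functions are made up for the example and are not GEMINI's schema. SQLite's LIKE is case-insensitive for ASCII by default, and wrapping the column in extra | delimiters is one way to match a whole entry rather than a substring.

```python
import sqlite3

# Toy stand-in for a variants table; the rows are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE variants ("
    "variant_id INTEGER PRIMARY KEY, gene TEXT, clinvar_disease TEXT)"
)
conn.executemany(
    "INSERT INTO variants (gene, clinvar_disease) VALUES (?, ?)",
    [
        ("GENE_A", "Skeletal_dysplasia|Short_stature"),
        ("GENE_B", "Cardiomyopathy"),
        ("GENE_C", None),
    ],
)


def substring_match(conn, term):
    """Genes whose disease list contains `term` anywhere (ASCII case-insensitive)."""
    rows = conn.execute(
        "SELECT gene FROM variants WHERE clinvar_disease LIKE ? ORDER BY gene",
        ("%" + term + "%",),
    )
    return [row[0] for row in rows]


def whole_entry_match(conn, entry):
    """Genes whose disease list has `entry` as one complete | delimited item.

    Note that `_` and `%` inside `entry` still act as LIKE wildcards.
    """
    rows = conn.execute(
        "SELECT gene FROM variants "
        "WHERE '|' || clinvar_disease || '|' LIKE ? ORDER BY gene",
        ("%|" + entry + "|%",),
    )
    return [row[0] for row in rows]
```

Rows with a NULL column value drop out of both queries, since LIKE against NULL is never true.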
thoughts @jxchong, @arq5x, @dgaston ?
This is related to #571 and it might be better to do a more comprehensive thing such as what is proposed there but this would be pretty simple and provide a fair bit of utility. We can't use omim by default due to the licensing/registration reqs.