Loading CPTAC data generates many "gene" records #40
Comments
cc'ing @pambot on this. PHOSPHOPROTEIN and SORBS1_pT72 are definitely hacks we should solve. Ideally, we should be able to pull out SORBS1_pT72 when querying SORBS1. Now we have the generic_entity table. Maybe we can do it now.
@jjgao thanks for the comments. I also think it would be nice to have `SORBS1_pT72` (and others) appear as options when filling in `SORBS1`. This could probably be done by tweaking the way the `gene_alias` table is currently used in the query page. The only problem is that you could get a long list of options for every gene you enter in the query box, so maybe we need a new feature for this scenario?
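To make the lookup idea above concrete, here is a minimal sketch of grouping phosphosite entries under their parent gene so that a query for `SORBS1` could surface `SORBS1_pT72` and similar entries. It is not the portal's implementation: the function names are hypothetical, and the naming pattern (`GENE_p<site>[_<site>...]`, with sites on S, T or Y residues) is an assumption inferred only from the two examples in this thread.

```python
import re
from collections import defaultdict

# Assumed site format, e.g. "T72" or "S89" (residue letter plus position).
# Restricting to S/T/Y is an assumption, not something the CPTAC files confirm.
SITE_RE = re.compile(r"^[STY]\d+$")


def parse_phosphosite_id(entity_id):
    """Split an ID such as 'SORBS1_pT82_S89' into ('SORBS1', ['T82', 'S89']).

    Returns None when the ID does not follow the assumed pattern.
    """
    gene, sep, rest = entity_id.partition("_p")
    if not sep or not gene:
        return None
    sites = rest.split("_")
    if not all(SITE_RE.match(site) for site in sites):
        return None
    return gene, sites


def index_by_gene(entity_ids):
    """Map each parent gene symbol to the phosphosite IDs that belong to it."""
    index = defaultdict(list)
    for entity_id in entity_ids:
        parsed = parse_phosphosite_id(entity_id)
        if parsed is not None:
            index[parsed[0]].append(entity_id)
    return dict(index)
```

With an index like this, the query page could look up `index_by_gene(ids).get("SORBS1", [])` and offer the matches on demand, rather than listing every alias for every gene up front.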
@jjgao will phosphoproteins move to the generic assay feature as well?
@pieterlukasse yes, that's the plan.
When loading the `brca` study with CPTAC mass spectrometry data, the portal generates a large number of new "gene" records to store the data reported for each separate isoform(?) in the CPTAC files (72,159 new records in the `gene` table!).
Here are some concerns:
Are names like `SORBS1_pT72` or `SORBS1_pT82_S89` encoding modifications to the canonical protein sequence known for gene `SORBS1`, rather than symbols of well-known isoforms? If so, we risk an explosion in the number of records in the `gene` table as each study finds new modifications.
Another question I had when looking at the data (see data sample below) is:
How is `SORBS1|SORBS1` made? Is this an aggregation of all the other `SORBS1|*` items? How is this aggregation done?
Data sample from file: