Loading CPTAC data generates many "gene" records #40

Closed
pieterlukasse opened this issue Apr 18, 2017 · 7 comments

@pieterlukasse
Member

pieterlukasse commented Apr 18, 2017

When loading the brca study with CPTAC mass spectrometry data, the portal generates a large number of new "gene" records to store the data reported for each separate isoform(?) in the CPTAC files (72,159 new records in the gene table!).

Here are some concerns:

  • The query page becomes slow when typing "PHOSPHOPROTEIN" in the Genes box: each new protein "gene" also gets this alias, so the resulting drop-down is very slow.
  • Depending on what each symbol means in the CPTAC data file, this solution might not scale. For example: do SORBS1_pT72 or SORBS1_pT82_S89 encode modifications to the canonical protein sequence known for gene SORBS1, rather than being symbols of well-known isoforms? If so, we risk an explosion in the number of records in the gene table as each study finds new modifications.
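
To make the second concern concrete, here is a minimal Python sketch of how such identifiers could be split into a parent gene plus a list of modification sites instead of being stored as separate "gene" records. It assumes, without confirmation from the CPTAC file specification, that the part before `|` is the gene symbol, that a row whose second part equals the gene symbol is the protein-level row, and that a `_p` suffix introduces one or more sites (residue letter S, T or Y plus position) separated by `_`. The names `ParsedId` and `parse_cptac_id` are hypothetical and not part of the portal code.

```python
import re
from typing import NamedTuple, Tuple

# Assumed site token: one residue letter (S, T or Y) followed by a position.
SITE_RE = re.compile(r"^[STY]\d+$")


class ParsedId(NamedTuple):
    gene: str
    sites: Tuple[str, ...]  # empty for the protein-level row


def parse_cptac_id(identifier: str) -> ParsedId:
    """Split an identifier such as 'SORBS1|SORBS1_pT82_S89' (assumed format)."""
    gene, _, entity = identifier.partition("|")
    if entity == gene:
        # e.g. 'SORBS1|SORBS1': no site suffix, so treat it as protein level.
        return ParsedId(gene, ())
    prefix = gene + "_p"
    if not entity.startswith(prefix):
        raise ValueError(f"unexpected identifier format: {identifier!r}")
    sites = tuple(entity[len(prefix):].split("_"))
    for site in sites:
        if not SITE_RE.match(site):
            raise ValueError(f"unexpected site token {site!r} in {identifier!r}")
    return ParsedId(gene, sites)


if __name__ == "__main__":
    for row_id in ("SORBS1|SORBS1", "SORBS1|SORBS1_pT72", "SORBS1|SORBS1_pT82_S89"):
        print(row_id, "->", parse_cptac_id(row_id))
```

If the symbols really are site-level modifications, a structure like this would keep one gene record per symbol and attach the sites as separate entities, so the gene table would not grow with every new study.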

Another question I had when looking at the data (see data sample below) is:

  • how is the entry SORBS1|SORBS1 made? Is this an aggregation of all the other SORBS1|* items? How is this aggregation done?

Data sample from file:

SORBS1|SORBS1   0.545571655184  1.31369690336   1.20131762167   1.1320980343    0.54739111875   1.19041192239   2.73163154855   0.948705044244  1.33867851356   2.12510951076   1.01727605533   1.3008073214
SORBS1|SORBS1_pT72      0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.319789927879  0.325725496261  0.453594164798  0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0
SORBS1|SORBS1_pS76      0.0     0.0     0.0     0.0     0.0     0.0     0.802830082054  1.10324826511   1.43253238093   0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0
SORBS1|SORBS1_pS77      0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0
SORBS1|SORBS1_pS78      0.0312601610742 0.247415810145  1.62051540467   1.11476716945   0.849155398463  2.08833175784   0.0     0.0     0.0     0.0     0.0     0.0     1.32989448951   0.754650122826  0.78
@pieterlukasse
Member Author

@jjgao @zheins @sheridancbio

@pieterlukasse
Member Author

Just for the record:

Here is an example of a "protein" vs. one of its isoforms(?):
image

Here is how the isoform(?) is displayed in the oncoprint, as "protein upregulation":
image

@jjgao
Member

jjgao commented Apr 18, 2017

cc'ing @pambot on this.

PHOSPHOPROTEIN and SORBS1_pT72 are definitely hacks we should fix. Ideally, we should be able to pull up SORBS1_pT72 when querying SORBS1. Now that we have the generic_entity table, maybe we can do that.

SORBS1|SORBS1 is the general protein level. It is detected independently of the phospho-levels. Theoretically, it should include all the phosphoproteins. @pambot please correct me if I am wrong.

@pieterlukasse
Member Author

@jjgao thanks for the comments. I also think it would be nice to have SORBS1_pT72 (and others) appear as options when typing SORBS1. This could probably be done by tweaking the way the gene_alias table is currently used in the query page. The only problem is that you could get a long list of options for every gene you type in the query box, so maybe we need a new feature for this scenario?
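
A rough Python sketch of that idea, only to illustrate the "long list" problem and one possible way around it: group the phospho entries under their parent gene and return a capped list plus a count of the hidden ones. It does not touch the real gene_alias or generic_entity tables; `build_index`, `suggest` and the `limit` parameter are hypothetical names, and the `GENE|ENTITY` identifier layout is assumed from the data sample.

```python
from collections import defaultdict


def build_index(entity_ids):
    """Map each gene symbol to its site-level entities (assumed 'GENE|ENTITY' ids)."""
    index = defaultdict(list)
    for entity_id in entity_ids:
        gene, _, entity = entity_id.partition("|")
        # Skip the protein-level row, where the entity equals the gene symbol.
        if entity and entity != gene:
            index[gene].append(entity)
    return index


def suggest(index, gene, limit=5):
    """Return up to `limit` entities for `gene`, plus how many were left out."""
    entities = sorted(index.get(gene, []))
    return entities[:limit], max(0, len(entities) - limit)


if __name__ == "__main__":
    ids = [
        "SORBS1|SORBS1",
        "SORBS1|SORBS1_pT72",
        "SORBS1|SORBS1_pS76",
        "SORBS1|SORBS1_pS77",
        "SORBS1|SORBS1_pS78",
    ]
    shown, hidden = suggest(build_index(ids), "SORBS1", limit=3)
    print(shown, f"(+{hidden} more)")
```

Capping the list keeps the drop-down short for the common case, while the hidden count tells the user that more site-level entries exist for that gene.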

@pieterlukasse
Member Author

@jjgao will phosphoproteins move to the generic assay feature as well?

@jjgao
Member

jjgao commented Jun 21, 2019

@pieterlukasse yes, that's the plan.

@jjgao
Member

jjgao commented Jun 21, 2019

https://github.com/cBioPortal/cbioportal/issues/6309

jjgao closed this as completed Jun 21, 2019