frontend selection of mutation genetic profiles may need improvements #1646

Closed
sheridancbio opened this issue Sep 7, 2016 · 11 comments
@sheridancbio
Contributor

sheridancbio commented Sep 7, 2016

Originally, this issue was a bug report about unpopulated mutation lists in the results page.

The bugs have been fixed, but it may still be ambiguous which genetic profile should be used for frontend visualization (such as the mutationmapper) when there is more than one mutation profile to choose from.


Below Here Is The Original Bug Report

Accessing views through the /beta deployment (rc?) shows unpopulated mutation lists:

(suspect malfunctioning mutation data servlets - oncoprint looks ok)

@ersinciftci @n1zea144

PatientView (accessed through sample list in study view):
http://www.cbioportal.org/beta/case.do?cancer_study_id=brca_tcga&case_id=TCGA-3C-AAAU

Local debugging shows some exceptions:
SEVERE: Servlet.service() for servlet [MutationsJSON] in context with path [/cbioportaltest] threw exception
java.lang.NumberFormatException: For input string: ""
at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
at java.lang.Integer.parseInt(Integer.java:592)
at java.lang.Integer.parseInt(Integer.java:615)
at org.mskcc.cbio.portal.servlet.MutationsJSON.getDrugs(MutationsJSON.java:425)
at org.mskcc.cbio.portal.servlet.MutationsJSON.processGetMutationsRequest(MutationsJSON.java:276)
at org.mskcc.cbio.portal.servlet.MutationsJSON.processRequest(MutationsJSON.java:126)
at org.mskcc.cbio.portal.servlet.MutationsJSON.doPost(MutationsJSON.java:786)
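
The trace above shows Integer.parseInt receiving an empty string inside getDrugs. Where that string comes from is not shown in this thread, so the following is only a sketch of a defensive parse. The class name SafeIds, the method parseIds, and the comma-separated input format are assumptions for illustration, not the portal's code.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of cBioPortal.
final class SafeIds {
    private SafeIds() {}

    // Parses a comma-separated list of integer ids (assumed format).
    // Blank tokens are skipped so that Integer.parseInt("") is never called.
    // Non-numeric tokens still raise NumberFormatException.
    static List<Integer> parseIds(String raw) {
        List<Integer> ids = new ArrayList<>();
        if (raw == null) {
            return ids;
        }
        for (String token : raw.split(",")) {
            String trimmed = token.trim();
            if (!trimmed.isEmpty()) {
                ids.add(Integer.parseInt(trimmed));
            }
        }
        return ids;
    }
}
```

With this shape, an empty request parameter yields an empty list rather than an exception.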

Through Query Page: (query study brca_tcga on genes TP53 BRCA1)
There are many mutations in this query, but the "Mutations" Tab is empty for both genes.

http://www.cbioportal.org/beta/index.do?cancer_study_list=brca_tcga&cancer_study_id=brca_tcga&genetic_profile_ids_PROFILE_MUTATION_EXTENDED=brca_tcga_mutations&genetic_profile_ids_PROFILE_COPY_NUMBER_ALTERATION=brca_tcga_gistic&Z_SCORE_THRESHOLD=2.0&RPPA_SCORE_THRESHOLD=2.0&data_priority=0&case_set_id=brca_tcga_cnaseq&case_ids=&patient_case_select=sample&gene_set_choice=user-defined-list&gene_list=TP53+BRCA1&clinical_param_selection=null&tab_index=tab_visualize&Action=Submit&show_samples=false&\

@n1zea144
Contributor

n1zea144 commented Sep 7, 2016

I'm seeing this in the logs:
2016-09-07 15:17:38 [ajp-bio-28009-exec-2905] ERROR org.mskcc.cbio.portal.util.MutationDataUtils - Could not parse OMA URL: Invalid host: [Not Available]
Is the OMA server still accessible? It may have been brought down after Chris's last day (last week).

@sheridancbio sheridancbio self-assigned this Sep 16, 2016
@sheridancbio
Contributor Author

I have continued looking into this. I have just queried genes TP53 and AKT1 across all provisional studies plus 3-4 additional high-sample studies from breast cancer. Out of 34 studies queried, 15 showed mutations on the oncoprint but failed to show them after opening the mutations tab, instead reporting (for example) "There are no TP53 mutations in the selected samples." Affected study IDs:
esca_tcga
paad_tcga
blca_tcga
stad_tcga
luad_tcga
sarc_tcga
brca_tcga_pub
lihc_tcga
brca_tcga
ucec_tcga
acc_tcga
acc_tcga
chol_tcga
thym_tcga
pcpg_tcga

@sheridancbio
Contributor Author

sheridancbio commented Sep 16, 2016

Following up on the comment from @n1zea144, I went to the public-portal-beta.log and found many errors similar to the one he found. Example:
2016-09-16 15:41:18 [ajp-bio-28009-exec-1462] ERROR org.mskcc.cbio.portal.util.MutationDataUtils - Could not parse OMA URL: Invalid host: [Not Available]
I'm not familiar with the OMA server, looking into it. Clearly this is a factor for some studies but not for others. (or maybe some mutations but not others)

@sheridancbio
Contributor Author

core/src/main/java/org/mskcc/cbio/portal/util/OmaLinkUtil.java
and
core/src/main/java/org/mskcc/cbio/portal/util/MutationDataUtils.java
are the relevant code. This does seem related to the clickable links to mutationassessor.org which come from our expanded MAF tables. Years ago the domain "getma.org" expired, and code was written to dynamically rewrite these links as "mutationassessor.org", which was still a valid (and equivalent) URL. But in the MAF files, the old hostname still persisted. Another patch apparently needs to be added to this part of the old API, or this problem needs to be masked. We hope to soon shift away from these embedded links to a previous release of mutationassessor anyway and use dynamic calls to the mutationassessor web API.
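
For readers unfamiliar with the hostname rewrite described above, here is a minimal sketch of the idea. It is not the code in OmaLinkUtil.java, which is not shown in this thread; the class name OmaHostRewrite and its behavior (exact host match, user info dropped, unparseable values returned unchanged) are assumptions.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical sketch; not the actual OmaLinkUtil implementation.
final class OmaHostRewrite {
    private static final String OLD_HOST = "getma.org";
    private static final String NEW_HOST = "mutationassessor.org";

    private OmaHostRewrite() {}

    // Swaps the expired host for the current one and keeps scheme, port,
    // path, query, and fragment as they were. Anything that does not parse
    // as a URI, or that points at another host, is returned unchanged.
    static String rewrite(String link) {
        if (link == null) {
            return null;
        }
        URI uri;
        try {
            uri = new URI(link);
        } catch (URISyntaxException e) {
            return link;
        }
        String host = uri.getHost();
        if (uri.getScheme() == null || host == null || !host.equalsIgnoreCase(OLD_HOST)) {
            return link;
        }
        StringBuilder sb = new StringBuilder();
        sb.append(uri.getScheme()).append("://").append(NEW_HOST);
        if (uri.getPort() != -1) {
            sb.append(':').append(uri.getPort());
        }
        if (uri.getRawPath() != null) {
            sb.append(uri.getRawPath());
        }
        if (uri.getRawQuery() != null) {
            sb.append('?').append(uri.getRawQuery());
        }
        if (uri.getRawFragment() != null) {
            sb.append('#').append(uri.getRawFragment());
        }
        return sb.toString();
    }
}
```

Note that a rewrite like this does nothing for values that were never URLs in the first place; those pass through untouched.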

@sheridancbio
Contributor Author

Ok, I think I have it ... the code in MutationDataUtils.java checks for the value "NA" and hardcodes an "NA" into the link response which is then not displayed. But I am guessing that we switched away from using "NA" in our database at some point ... I remember a discussion that there was a gene named "NA" (ENSG00000047597 ?) as one reason to switch away. So the "[Not Available]" string is actually a replacement for "NA" .. but it is slipping through this code because it was only looking for "NA" verbatim. This probably has nothing to do with the domain names used for mutationassessor, and the fix should be relatively easy.

@sheridancbio
Contributor Author

Confirmed: in the latest version of cgds_public on dashi, the link fields in mutation_event for mutation assessor contain the string "[Not Available]" for 101404 rows out of 2496234 total. In my local copy of cgds_public from a couple months ago, there are no such entries. Any query which constructs a link to omaRedirect.do? on one of these mutations will throw a MalformedURLException with the current codebase, so the error occurs whenever one of these 101404 mutations is present in the query results.
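
One way to keep a single bad row from failing the whole response is to treat an unparseable link as absent instead of letting the exception propagate. The sketch below is an assumption about how that could look, not the fix that was merged; the class name LinkParser is hypothetical, and it uses java.net.URI rather than whatever the portal code uses.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.Optional;

// Hypothetical helper, not part of cBioPortal.
final class LinkParser {
    private LinkParser() {}

    // Returns the parsed link, or Optional.empty() when the stored value is
    // not an absolute http(s) URI with a host (for example "[Not Available]").
    static Optional<URI> parse(String raw) {
        if (raw == null) {
            return Optional.empty();
        }
        try {
            URI uri = new URI(raw.trim());
            String scheme = uri.getScheme();
            boolean http = "http".equalsIgnoreCase(scheme) || "https".equalsIgnoreCase(scheme);
            return (http && uri.getHost() != null) ? Optional.of(uri) : Optional.empty();
        } catch (URISyntaxException e) {
            // Malformed values are reported as "no link" rather than an error.
            return Optional.empty();
        }
    }
}
```

A caller would then emit the link only when the Optional is present and fall back to the "no link" display otherwise.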

@sheridancbio
Contributor Author

sheridancbio commented Sep 16, 2016

One good example test case is mutation TP53 L194R .. this mutation event is marked "[Not Available]" in the mutation event record links, but is present in the following profiles:
brca_tcga_mutations
gbm_tcga_mutations
hnsc_tcga_mutations
lgg_tcga_mutations
lihc_tcga_mutations
luad_tcga_mutations
ov_tcga_mutations
sarc_tcga_mutations
thym_tcga_mutations

@sheridancbio
Contributor Author

Additional searching shows that there are three "non-link" values to handle from the current databases we are using: "NA", "[Not Available]", and ""

mysql> select link_pdb, count(1) from mutation_event where link_pdb not like '%pdb%' group by link_pdb;
+-----------------+----------+
| link_pdb        | count(1) |
+-----------------+----------+
|                 |      346 |
| NA              |  1606540 |
| [Not Available] |   101404 |
+-----------------+----------+
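
Given the three values in the table above, a check along these lines would cover all of them in one place. This is a sketch only: the class name LinkPlaceholders and the choice of exact, case-sensitive matching after trimming are assumptions, and this is not the code in MutationDataUtils.java.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper, not part of cBioPortal.
final class LinkPlaceholders {
    // The three "non-link" values found in mutation_event.
    private static final Set<String> MISSING =
            new HashSet<>(Arrays.asList("", "NA", "[Not Available]"));

    private LinkPlaceholders() {}

    // True when the stored value is null or one of the known placeholders.
    static boolean isMissing(String value) {
        return value == null || MISSING.contains(value.trim());
    }
}
```

Keeping the placeholders in one set means a future fourth value only needs to be added in one place.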

@sheridancbio
Contributor Author

This issue has been partially fixed now, via:
#1705
#1733

I am changing this issue from a Bug to an Enhancement now.
The remaining work is to select the correct mutation profile for studies where multiple mutation profiles might be loaded by the query page, and returned by the datamanager call to 'getMutationProfileIds'.

It is possible this never happens --- if the query results page only ever loads a single mutation profile, maybe we don't need to fix anything. But in general 'getMutationProfileIds' might return more than one mutation profile, and in DataProxyFactory.js we are setting servletParams.geneticProfiles = mutation_profile_ids[0];
which just takes the first element of the returned list.
All users of DataProxyFactory, and in particular mutationmapper, may need to get the correct profile Id. There is some logic in core which selects default profiles according to criteria. It should be understood and ported to the frontend.
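
To make the concern concrete, here is a sketch of choosing a profile by an explicit rule instead of taking index 0. It is written in Java to match the core code, although the actual change would be in the frontend JavaScript. The rule itself (prefer the ID named "<study>_mutations", otherwise fall back to the first element) is purely an assumption based on the profile IDs seen in this thread; the real default-selection criteria in core are not described here. The class name MutationProfilePicker is hypothetical.

```java
import java.util.List;
import java.util.Optional;

// Hypothetical sketch; the real selection criteria live in core and are not shown here.
final class MutationProfilePicker {
    private MutationProfilePicker() {}

    // Picks one mutation profile ID from the list returned for a study.
    // Assumed rule: prefer "<studyId>_mutations" if present, otherwise keep
    // the current behavior of using the first element.
    static Optional<String> pick(List<String> profileIds, String studyId) {
        if (profileIds == null || profileIds.isEmpty()) {
            return Optional.empty();
        }
        String preferred = studyId + "_mutations";
        for (String id : profileIds) {
            if (preferred.equals(id)) {
                return Optional.of(id);
            }
        }
        return Optional.of(profileIds.get(0));
    }
}
```

The point of the sketch is only that the choice becomes explicit and testable; whatever criteria core uses could replace the naming rule.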

@sheridancbio
Contributor Author

adding participants: @jjgao @adamabeshouse @onursumer

@sheridancbio sheridancbio changed the title from "mutation lists not populated (studyview / patientview / querypage)" to "frontend selection of mutation genetic profiles may need improvements" Sep 29, 2016
@sheridancbio
Contributor Author

I am closing this issue now because each study currently has only a single genetic profile of type MUTATION_EXTENDED. If in the future we have multiple mutation profiles per study, we may need to revisit this issue.
