hide fusion gene table in the summary view if no fusion data present … #6731

khzhu · 2019-10-23T17:21:45Z

hide the fusion gene table in the summary view if no fusion data present and fix the number of samples for fusion.

Fix #6709

Describe changes proposed in this pull request:

add a new fusion case list
update migration.sql script to back fill fusion case lists for studies with fusion data.
add a new flag fusionPresent to sample model
add new properties numberOfFusionProfiledSamples/numberOfFusionUnprofiledSamples in MolecularProfileSampleCount model
update sample-counts web service endpoint to include the number of Profiled/Un-profiled fusion Samples.

Checks

Runs on heroku
Has tests or has a separate issue that describes the types of test that should be created. If no test is included it should explicitly be mentioned in the PR why there is no test.
The commit log is comprehensible. It follows 7 rules of great commit messages. For most PRs a single commit should suffice, in some cases multiple topical commits can be useful. During review it is ok to see tiny commits (e.g. Fix reviewer comments), but right before the code gets merged to master or rc branch, any such commits should be squashed since they are useless to the other developers. Definitely avoid merge commits, use rebase instead.
Is this PR adding logic based on one or more clinical attributes? If yes, please make sure validation for this attribute is also present in the data validation / data loading layers (in backend repo) and documented in File-Formats Clinical data section!

Any screenshots or GIFs?

If this is a new visual feature please add a before/after screenshot or gif
here with e.g. GifGrabber.

Notify reviewers

Read our Pull request merging
policy. It can help to figure out who worked on the
file before you. Please use git blame <filename> to determine that
and notify them either through slack or by assigning them as a reviewer on the PR

zhx828

Where the logic to determine the study has fusion sample list?

khzhu · 2019-10-23T18:16:38Z

@zhx828 , in SampleServiceImpl processSample method.

khzhu · 2019-10-24T14:51:24Z

@alisman , here is the backend PR.

khzhu · 2019-10-24T14:57:19Z

Hi @sheridancbio , would you be able to review this PR. It will fix the problem reported by Tali. Thank you!

alisman · 2019-10-24T15:05:26Z

@kalletlak do frontned/backend need to be deployed together? I.E. will either break if they deployed separately?

zhx828 · 2019-10-24T19:31:48Z

@khzhu let me rephrase, where the logic(script) to populate the sample list if the study has the fusion data in the database?

kalletlak · 2019-10-28T14:11:59Z

@kalletlak do frontned/backend need to be deployed together? I.E. will either break if they deployed separately?

@alisman Yes

khzhu · 2019-10-29T17:37:11Z

Hi @zhx828 , I did not change any existing logic to populate the sample list other than added a new case list cases_fusion to list all samples that have fusion data.
The following line I added to the SampleServiceImpl class that will mark any samples as fusion present if the sample is listed in the cases_fusions list.

sample.setFusionPresent(fusionSampleIdsMap.get(sample.getCancerStudyIdentifier()).contains(sample.getStableId()));

Hope it answers your question.

zhx828 · 2019-10-29T18:54:55Z

@khzhu if this is the case, @sheridancbio could you point Kelsey to the pipeline code where the logic determines whether the fusion list should be populated? @khzhu could you help to make a patch to that code too?

khzhu · 2019-10-29T19:25:37Z

@zhx828 , will do.

sheridancbio · 2019-10-29T20:03:07Z

@zhx828 @khzhu I'm reading up on this now. From a first read-through it looks like you introduce a new standard sample list to hold all samples which underwent fusion profiling. Then the presence or absence of list elements determines whether this study has any fusion data to display or not.

Two very quick thoughts:

I think there are 4 current standard case lists : _all, _sequenced, _cna, _cnaseq ... The meaning of the case lists seem a little mixed already. "_sequenced" perhaps is all samples which have any genetic profiling performed at all. _cna perhaps means samples for which a detected cna alteration event is present. Adding _fusion would be "those samples which were characterized on a platform which can detect fusion events?" I'm just trying to understand the meaning of our standard case list files to see if this fits in with the other case lists.
If there is now going to be a file in the import directory such as staging_dir/case_lists/cases_fusion.txt I wonder if it will be a required file or not. If it is missing and we still have fusion events in data_SV.txt or data_mutations_extended.txt I wonder what we should do. Perhaps the validator script needs to be updated for cases like this. When deploying this code we will need to make adjustments to the msk importer code base to generate this _fusion case list ... and I suppose we will need to infer which samples were analyzed for fusions based on whether the mutation sequencing and analysis was a protocol which can detect/report fusions or not (as well as other factors). It may require some approximations.

Just to be clear ... it is possible in your minds that there can be samples on the _fusion case list for which no fusions were detected. (in other words .. profiled but nothing found) ? cc: @n1zea144

khzhu · 2019-10-29T20:24:33Z

hi @sheridancbio , thank for your inputs.
to answer your questions:

Adding _fusion would be "those samples which were characterized on a platform which can detect fusion events?"

yes, that is right.

If there is now going to be a file in the import directory such as staging_dir/case_lists/cases_fusion.txt I wonder if it will be a required file or not.

it will not be a required file.

If it is missing and we still have fusion events in data_SV.txt or data_mutations_extended.txt I wonder what we should do.

we will not add cases_fusion.txt file for both. cases_fusion.txt will be eventually deprecated once we started to use structural variants.

n1zea144 · 2019-10-30T13:43:59Z

@khzhu @sheridancbio

My two cents -
I would follow the same pattern that is used to display the mutation x cna plot, mutated genes & cna genes table, etc. (which I imagine is being done in this PR). Curious, if it is done by the presence or absence of a case list, how is the rendering of these charts determined on virtual cohort study summary pages?

jjgao · 2019-10-31T13:05:15Z

one minor comment: should we use cases_fusion or cases_sv? (assume we will use the same case list after switch to the sv implementation)

jjgao · 2019-10-31T13:30:24Z

db-scripts/src/main/resources/migration.sql

+REPLACE INTO sample_list(STABLE_ID, CATEGORY, CANCER_STUDY_ID, NAME, DESCRIPTION)
+    SELECT DISTINCT CONCAT(CANCER_STUDY_IDENTIFIER, '_fusion'), 'all_cases_with_mutation_data', `cancer_study`.CANCER_STUDY_ID, 'Samples with fusion data','Samples with fusion data'
+        FROM `mutation_event`
+            JOIN `mutation` ON `mutation`.MUTATION_EVENT_ID = `mutation_event`.MUTATION_EVENT_ID
+            JOIN `genetic_profile` ON `genetic_profile`.GENETIC_PROFILE_ID = `mutation`.GENETIC_PROFILE_ID
+            JOIN `cancer_study` ON `cancer_study`.CANCER_STUDY_ID = `genetic_profile`.CANCER_STUDY_ID
+        WHERE MUTATION_TYPE = 'fusion';
+REPLACE INTO sample_list_list(list_id, sample_id)
+    SELECT DISTINCT list_id, SAMPLE_ID
+        FROM `mutation_event`
+            JOIN `mutation` ON `mutation`.MUTATION_EVENT_ID = `mutation_event`.MUTATION_EVENT_ID
+            JOIN `genetic_profile` ON `genetic_profile`.GENETIC_PROFILE_ID = `mutation`.GENETIC_PROFILE_ID
+            JOIN `cancer_study` ON `cancer_study`.CANCER_STUDY_ID = `genetic_profile`.CANCER_STUDY_ID
+            JOIN `sample_list` ON `sample_list`.CANCER_STUDY_ID = `cancer_study`.CANCER_STUDY_ID AND `sample_list`.STABLE_ID LIKE '%fusion%'
+        WHERE MUTATION_TYPE = 'fusion';


My understanding of this code is that a default _fusion sample list will be created which is the list of samples that have fusion data. There is a caveat: some profiled samples may be fusion negative. For example, many MSK-IMPACT samples do not have fusions but they were all profiled.

This might be fine as the default one for migration, but we should generate proper _fusion case list for our studies.

Alternatively, I am wondering if we should just copy _sequenced to _fusion when there are fusion data?

@n1zea144 @sheridancbio thoughts?

@n1zea144 @SRodenburg , any inputs on this? thanks!

Hi @jjgao Rob and I were talking about this too. Using MSK-IMPACT/MSK-ARCHER as an example, I'm its not entirely accurate to assume that if a sample was sequenced for mutation calling, that it was also analyzed for fusions (although it is true that with IMPACT, there are some probes looking for canonical fusions). However, it seems excessive to have a separate list for cases that were checked for SVs, and then another case list for samples in which SVs were found, so carrying over the sequenced sample list to fusions may be a good compromise.

thanks, @n1zea144 !
but how do we know if there are fusions or not at the importing stage to decide if we should carry the sequenced sample list over to fusions?

khzhu · 2019-11-07T22:35:57Z

Alternatively, I am wondering if we should just copy _sequenced to _fusion when there are fusion data?

I tried JJ's alternative approach and tested on msk_impact_2017 dataset. And, it worked well.
should we go with this plan, @jjgao @n1zea144 @sheridancbio?

for existing datasets that have fusions, we backfill fusion sample case list using samples recorded in the mutation_event table with mutation_type = 'fusion'
for new dataset that has fusions, we will just copy _sequenced to _fusion before uploading.

jjgao · 2019-11-07T22:43:32Z

@khzhu I am not sure if I understand correctly, but I think we should:

for existing datasets that have fusions, we backfill fusion sample case list with the _sequenced case list.
for a new dataset that has fusions, if there is _fusions case list, use that; otherwise, copy _sequenced to _fusion before uploading.

This will ensure that our current database will work after migration, and our current data files will work. Also once we put _fusion case lists, they will also work.

BTW, should we call it _sv case list?

khzhu · 2019-11-07T22:52:03Z

for existing datasets that have fusions, we backfill fusion sample case list with the _sequenced case list.

will update migration sql statement

for a new dataset that has fusions, if there is _fusions case list, use that; otherwise, copy _sequenced to _fusion before uploading.

sounds good.

BTW, should we call it _sv case list?

go with _sv

khzhu · 2019-11-08T18:21:39Z

Hi @jjgao @n1zea144 @sheridancbio ,
Recap changes made based on inputs from you all.

create a new _sv case list and add a new all_cases_with_structural_variant_data case list category
for existing datasets that have fusions, backfill fusion sample case list from _sequenced case list.
for new dataset with fusions, if _sv case list supplied, use it; otherwise, create _sv case list from _sequenced case list before uploading.

may I please have your final review on the changes. Thanks!

khzhu · 2019-12-04T21:27:18Z

hi @jjgao @n1zea144 @sheridancbio @zhx828
I added one more sql statement to the migration script that would backfill genetic profiles for the existing fusion data. The fusion Gene table on the summary view will be displayed or hidden depending on if fusion was profiled or not. Details can be found in this PR
Would you please take a look and let me know your thoughts on this? thanks

…and fix fusion gene count

…s are stored in the mutation_event table and a new filter for fusion data and a new case list for sv data

jjgao · 2019-12-21T17:24:43Z

@khzhu @alisman @zhx828 what's left for this?

khzhu · 2019-12-21T20:06:47Z

Hi @jjgao , all work has been done and I am waiting for @n1zea144 or @sheridancbio to complete their reviews. Thanks!

inodb · 2019-12-23T17:46:13Z

I created a new PR rebased to release-3.2.0: #6945. Please do further review there

khzhu requested review from jjgao, zhx828 and sheridancbio October 23, 2019 17:21

jjgao removed their request for review October 23, 2019 17:27

zhx828 reviewed Oct 23, 2019

View reviewed changes

jjgao temporarily deployed to cbioportal-add-fusion-s-a7utpz October 23, 2019 18:51 Inactive

khzhu requested a review from alisman October 24, 2019 14:50

khzhu mentioned this pull request Oct 24, 2019

hide fusion gene table in the summary view if no fusion data present cBioPortal/cbioportal-frontend#2812

Closed

6 tasks

jjgao reviewed Oct 31, 2019

View reviewed changes

khzhu force-pushed the add-fusion-sample-count-6709 branch from 0f16cb3 to bcff7cb Compare November 7, 2019 22:45

jjgao temporarily deployed to cbioportal-add-fusion-s-a7utpz November 7, 2019 22:46 Inactive

khzhu force-pushed the add-fusion-sample-count-6709 branch from bcff7cb to 0ccf56a Compare November 8, 2019 18:21

jjgao temporarily deployed to cbioportal-add-fusion-s-a7utpz November 8, 2019 18:21 Inactive

khzhu force-pushed the add-fusion-sample-count-6709 branch from 044f1c1 to beb2d74 Compare December 4, 2019 18:17

jjgao temporarily deployed to cbioportal-add-fusion-s-a7utpz December 4, 2019 18:17 Inactive

khzhu force-pushed the add-fusion-sample-count-6709 branch from beb2d74 to 0c6b702 Compare December 5, 2019 02:10

jjgao temporarily deployed to cbioportal-add-fusion-s-a7utpz December 5, 2019 02:10 Inactive

khzhu force-pushed the add-fusion-sample-count-6709 branch from 0c6b702 to cdd9169 Compare December 10, 2019 01:55

jjgao temporarily deployed to cbioportal-add-fusion-s-a7utpz December 10, 2019 01:55 Inactive

zhx828 added the do not merge label Dec 17, 2019

zhx828 requested review from n1zea144, ao508 and sheridancbio December 17, 2019 21:44

alisman approved these changes Dec 17, 2019

View reviewed changes

khzhu added 5 commits December 17, 2019 17:55

hide fusion gene table in the summary view if no fusion data present …

96dbcf0

…and fix fusion gene count

check if fusion present only when fusion hashmap is not empty

ee493f2

resolving code review comments from Hongxin and Angelica

7354c20

resolving code review comments from Rob

ce76cc3

fixing end_to_end_tests_localdb IndexError

a1565e4

khzhu force-pushed the add-fusion-sample-count-6709 branch from cdd9169 to 289a918 Compare December 18, 2019 02:07

jjgao temporarily deployed to cbioportal-add-fusion-s-a7utpz December 18, 2019 02:07 Inactive

added migration sql statement to backfill genetic profiles for fusion…

22d6a70

…s are stored in the mutation_event table and a new filter for fusion data and a new case list for sv data

khzhu force-pushed the add-fusion-sample-count-6709 branch from 289a918 to 22d6a70 Compare December 18, 2019 02:16

jjgao temporarily deployed to cbioportal-add-fusion-s-a7utpz December 18, 2019 02:17 Inactive

fix whitespace samplelist

b019607

jjgao temporarily deployed to cbioportal-add-fusion-s-a7utpz December 23, 2019 16:17 Inactive

inodb mentioned this pull request Dec 23, 2019

Fixes fusion gene table (visibility + counts) #6945

Merged

inodb closed this Dec 23, 2019

jagnathan deleted the add-fusion-sample-count-6709 branch June 2, 2021 16:57

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

hide fusion gene table in the summary view if no fusion data present … #6731

hide fusion gene table in the summary view if no fusion data present … #6731

khzhu commented Oct 23, 2019 •

edited by SRodenburg

zhx828 left a comment

khzhu commented Oct 23, 2019

khzhu commented Oct 24, 2019

khzhu commented Oct 24, 2019

alisman commented Oct 24, 2019

zhx828 commented Oct 24, 2019

kalletlak commented Oct 28, 2019

khzhu commented Oct 29, 2019

zhx828 commented Oct 29, 2019

khzhu commented Oct 29, 2019

sheridancbio commented Oct 29, 2019

khzhu commented Oct 29, 2019

n1zea144 commented Oct 30, 2019

jjgao commented Oct 31, 2019

jjgao Oct 31, 2019

khzhu Nov 6, 2019

n1zea144 Nov 7, 2019 •

edited

khzhu Nov 7, 2019

khzhu commented Nov 7, 2019

jjgao commented Nov 7, 2019

khzhu commented Nov 7, 2019

khzhu commented Nov 8, 2019

khzhu commented Dec 4, 2019

jjgao commented Dec 21, 2019

khzhu commented Dec 21, 2019

inodb commented Dec 23, 2019

hide fusion gene table in the summary view if no fusion data present … #6731

hide fusion gene table in the summary view if no fusion data present … #6731

Conversation

khzhu commented Oct 23, 2019 • edited by SRodenburg

Checks

Any screenshots or GIFs?

Notify reviewers

zhx828 left a comment

Choose a reason for hiding this comment

khzhu commented Oct 23, 2019

khzhu commented Oct 24, 2019

khzhu commented Oct 24, 2019

alisman commented Oct 24, 2019

zhx828 commented Oct 24, 2019

kalletlak commented Oct 28, 2019

khzhu commented Oct 29, 2019

zhx828 commented Oct 29, 2019

khzhu commented Oct 29, 2019

sheridancbio commented Oct 29, 2019

khzhu commented Oct 29, 2019

n1zea144 commented Oct 30, 2019

jjgao commented Oct 31, 2019

jjgao Oct 31, 2019

Choose a reason for hiding this comment

khzhu Nov 6, 2019

Choose a reason for hiding this comment

n1zea144 Nov 7, 2019 • edited

Choose a reason for hiding this comment

khzhu Nov 7, 2019

Choose a reason for hiding this comment

khzhu commented Nov 7, 2019

jjgao commented Nov 7, 2019

khzhu commented Nov 7, 2019

khzhu commented Nov 8, 2019

khzhu commented Dec 4, 2019

jjgao commented Dec 21, 2019

khzhu commented Dec 21, 2019

inodb commented Dec 23, 2019

khzhu commented Oct 23, 2019 •

edited by SRodenburg

n1zea144 Nov 7, 2019 •

edited