Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

hide fusion gene table in the summary view if no fusion data present … #6731

Closed
wants to merge 7 commits into from

Conversation

khzhu
Copy link
Contributor

@khzhu khzhu commented Oct 23, 2019

hide the fusion gene table in the summary view if no fusion data present and fix the number of samples for fusion.

Fix #6709

Describe changes proposed in this pull request:

  • add a new fusion case list
  • update migration.sql script to back fill fusion case lists for studies with fusion data.
  • add a new flag fusionPresent to sample model
  • add new properties numberOfFusionProfiledSamples/numberOfFusionUnprofiledSamples in MolecularProfileSampleCount model
  • update sample-counts web service endpoint to include the number of Profiled/Un-profiled fusion Samples.

Checks

  • Runs on heroku
  • Has tests or has a separate issue that describes the types of test that should be created. If no test is included it should explicitly be mentioned in the PR why there is no test.
  • The commit log is comprehensible. It follows 7 rules of great commit messages. For most PRs a single commit should suffice, in some cases multiple topical commits can be useful. During review it is ok to see tiny commits (e.g. Fix reviewer comments), but right before the code gets merged to master or rc branch, any such commits should be squashed since they are useless to the other developers. Definitely avoid merge commits, use rebase instead.
  • Is this PR adding logic based on one or more clinical attributes? If yes, please make sure validation for this attribute is also present in the data validation / data loading layers (in backend repo) and documented in File-Formats Clinical data section!

Any screenshots or GIFs?

If this is a new visual feature please add a before/after screenshot or gif
here with e.g. GifGrabber.

Notify reviewers

Read our Pull request merging
policy
. It can help to figure out who worked on the
file before you. Please use git blame <filename> to determine that
and notify them either through slack or by assigning them as a reviewer on the PR

@jjgao jjgao removed their request for review October 23, 2019 17:27
Copy link
Member

@zhx828 zhx828 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where the logic to determine the study has fusion sample list?

@khzhu
Copy link
Contributor Author

khzhu commented Oct 23, 2019

@zhx828 , in SampleServiceImpl processSample method.

@jjgao jjgao temporarily deployed to cbioportal-add-fusion-s-a7utpz October 23, 2019 18:51 Inactive
@khzhu khzhu requested a review from alisman October 24, 2019 14:50
@khzhu
Copy link
Contributor Author

khzhu commented Oct 24, 2019

@alisman , here is the backend PR.

@khzhu
Copy link
Contributor Author

khzhu commented Oct 24, 2019

Hi @sheridancbio , would you be able to review this PR. It will fix the problem reported by Tali. Thank you!

@alisman
Copy link
Contributor

alisman commented Oct 24, 2019

@kalletlak do frontned/backend need to be deployed together? I.E. will either break if they deployed separately?

@zhx828
Copy link
Member

zhx828 commented Oct 24, 2019

@khzhu let me rephrase, where the logic(script) to populate the sample list if the study has the fusion data in the database?

@kalletlak
Copy link
Member

@kalletlak do frontned/backend need to be deployed together? I.E. will either break if they deployed separately?

@alisman Yes

@khzhu
Copy link
Contributor Author

khzhu commented Oct 29, 2019

Hi @zhx828 , I did not change any existing logic to populate the sample list other than added a new case list cases_fusion to list all samples that have fusion data.
The following line I added to the SampleServiceImpl class that will mark any samples as fusion present if the sample is listed in the cases_fusions list.

sample.setFusionPresent(fusionSampleIdsMap.get(sample.getCancerStudyIdentifier()).contains(sample.getStableId()));

Hope it answers your question.

@zhx828
Copy link
Member

zhx828 commented Oct 29, 2019

@khzhu if this is the case, @sheridancbio could you point Kelsey to the pipeline code where the logic determines whether the fusion list should be populated? @khzhu could you help to make a patch to that code too?

@khzhu
Copy link
Contributor Author

khzhu commented Oct 29, 2019

@zhx828 , will do.

@sheridancbio
Copy link
Contributor

@zhx828 @khzhu I'm reading up on this now. From a first read-through it looks like you introduce a new standard sample list to hold all samples which underwent fusion profiling. Then the presence or absence of list elements determines whether this study has any fusion data to display or not.

Two very quick thoughts:

  • I think there are 4 current standard case lists : _all, _sequenced, _cna, _cnaseq ... The meaning of the case lists seem a little mixed already. "_sequenced" perhaps is all samples which have any genetic profiling performed at all. _cna perhaps means samples for which a detected cna alteration event is present. Adding _fusion would be "those samples which were characterized on a platform which can detect fusion events?" I'm just trying to understand the meaning of our standard case list files to see if this fits in with the other case lists.
  • If there is now going to be a file in the import directory such as staging_dir/case_lists/cases_fusion.txt I wonder if it will be a required file or not. If it is missing and we still have fusion events in data_SV.txt or data_mutations_extended.txt I wonder what we should do. Perhaps the validator script needs to be updated for cases like this. When deploying this code we will need to make adjustments to the msk importer code base to generate this _fusion case list ... and I suppose we will need to infer which samples were analyzed for fusions based on whether the mutation sequencing and analysis was a protocol which can detect/report fusions or not (as well as other factors). It may require some approximations.

Just to be clear ... it is possible in your minds that there can be samples on the _fusion case list for which no fusions were detected. (in other words .. profiled but nothing found) ? cc: @n1zea144

@khzhu
Copy link
Contributor Author

khzhu commented Oct 29, 2019

hi @sheridancbio , thank for your inputs.
to answer your questions:

Adding _fusion would be "those samples which were characterized on a platform which can detect fusion events?"

yes, that is right.

If there is now going to be a file in the import directory such as staging_dir/case_lists/cases_fusion.txt I wonder if it will be a required file or not.

it will not be a required file.

If it is missing and we still have fusion events in data_SV.txt or data_mutations_extended.txt I wonder what we should do.

we will not add cases_fusion.txt file for both. cases_fusion.txt will be eventually deprecated once we started to use structural variants.

@n1zea144
Copy link
Contributor

@khzhu @sheridancbio

My two cents -
I would follow the same pattern that is used to display the mutation x cna plot, mutated genes & cna genes table, etc. (which I imagine is being done in this PR). Curious, if it is done by the presence or absence of a case list, how is the rendering of these charts determined on virtual cohort study summary pages?

@jjgao
Copy link
Member

jjgao commented Oct 31, 2019

one minor comment: should we use cases_fusion or cases_sv? (assume we will use the same case list after switch to the sv implementation)

Comment on lines 725 to 739
REPLACE INTO sample_list(STABLE_ID, CATEGORY, CANCER_STUDY_ID, NAME, DESCRIPTION)
SELECT DISTINCT CONCAT(CANCER_STUDY_IDENTIFIER, '_fusion'), 'all_cases_with_mutation_data', `cancer_study`.CANCER_STUDY_ID, 'Samples with fusion data','Samples with fusion data'
FROM `mutation_event`
JOIN `mutation` ON `mutation`.MUTATION_EVENT_ID = `mutation_event`.MUTATION_EVENT_ID
JOIN `genetic_profile` ON `genetic_profile`.GENETIC_PROFILE_ID = `mutation`.GENETIC_PROFILE_ID
JOIN `cancer_study` ON `cancer_study`.CANCER_STUDY_ID = `genetic_profile`.CANCER_STUDY_ID
WHERE MUTATION_TYPE = 'fusion';
REPLACE INTO sample_list_list(list_id, sample_id)
SELECT DISTINCT list_id, SAMPLE_ID
FROM `mutation_event`
JOIN `mutation` ON `mutation`.MUTATION_EVENT_ID = `mutation_event`.MUTATION_EVENT_ID
JOIN `genetic_profile` ON `genetic_profile`.GENETIC_PROFILE_ID = `mutation`.GENETIC_PROFILE_ID
JOIN `cancer_study` ON `cancer_study`.CANCER_STUDY_ID = `genetic_profile`.CANCER_STUDY_ID
JOIN `sample_list` ON `sample_list`.CANCER_STUDY_ID = `cancer_study`.CANCER_STUDY_ID AND `sample_list`.STABLE_ID LIKE '%fusion%'
WHERE MUTATION_TYPE = 'fusion';
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My understanding of this code is that a default _fusion sample list will be created which is the list of samples that have fusion data. There is a caveat: some profiled samples may be fusion negative. For example, many MSK-IMPACT samples do not have fusions but they were all profiled.

This might be fine as the default one for migration, but we should generate proper _fusion case list for our studies.

Alternatively, I am wondering if we should just copy _sequenced to _fusion when there are fusion data?

@n1zea144 @sheridancbio thoughts?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@n1zea144 @SRodenburg , any inputs on this? thanks!

Copy link
Contributor

@n1zea144 n1zea144 Nov 7, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @jjgao Rob and I were talking about this too. Using MSK-IMPACT/MSK-ARCHER as an example, I'm its not entirely accurate to assume that if a sample was sequenced for mutation calling, that it was also analyzed for fusions (although it is true that with IMPACT, there are some probes looking for canonical fusions). However, it seems excessive to have a separate list for cases that were checked for SVs, and then another case list for samples in which SVs were found, so carrying over the sequenced sample list to fusions may be a good compromise.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks, @n1zea144 !
but how do we know if there are fusions or not at the importing stage to decide if we should carry the sequenced sample list over to fusions?

@khzhu
Copy link
Contributor Author

khzhu commented Nov 7, 2019

Alternatively, I am wondering if we should just copy _sequenced to _fusion when there are fusion data?

I tried JJ's alternative approach and tested on msk_impact_2017 dataset. And, it worked well.
should we go with this plan, @jjgao @n1zea144 @sheridancbio?

  1. for existing datasets that have fusions, we backfill fusion sample case list using samples recorded in the mutation_event table with mutation_type = 'fusion'
  2. for new dataset that has fusions, we will just copy _sequenced to _fusion before uploading.

@jjgao
Copy link
Member

jjgao commented Nov 7, 2019

@khzhu I am not sure if I understand correctly, but I think we should:

  1. for existing datasets that have fusions, we backfill fusion sample case list with the _sequenced case list.
  2. for a new dataset that has fusions, if there is _fusions case list, use that; otherwise, copy _sequenced to _fusion before uploading.

This will ensure that our current database will work after migration, and our current data files will work. Also once we put _fusion case lists, they will also work.

BTW, should we call it _sv case list?

@khzhu khzhu force-pushed the add-fusion-sample-count-6709 branch from 0f16cb3 to bcff7cb Compare November 7, 2019 22:45
@jjgao jjgao temporarily deployed to cbioportal-add-fusion-s-a7utpz November 7, 2019 22:46 Inactive
@khzhu
Copy link
Contributor Author

khzhu commented Nov 7, 2019

for existing datasets that have fusions, we backfill fusion sample case list with the _sequenced case list.

will update migration sql statement

for a new dataset that has fusions, if there is _fusions case list, use that; otherwise, copy _sequenced to _fusion before uploading.

sounds good.

BTW, should we call it _sv case list?

go with _sv

@khzhu khzhu force-pushed the add-fusion-sample-count-6709 branch from bcff7cb to 0ccf56a Compare November 8, 2019 18:21
@jjgao jjgao temporarily deployed to cbioportal-add-fusion-s-a7utpz November 8, 2019 18:21 Inactive
@khzhu
Copy link
Contributor Author

khzhu commented Nov 8, 2019

Hi @jjgao @n1zea144 @sheridancbio ,
Recap changes made based on inputs from you all.

  • create a new _sv case list and add a new all_cases_with_structural_variant_data case list category
  • for existing datasets that have fusions, backfill fusion sample case list from _sequenced case list.
  • for new dataset with fusions, if _sv case list supplied, use it; otherwise, create _sv case list from _sequenced case list before uploading.

may I please have your final review on the changes. Thanks!

@khzhu khzhu force-pushed the add-fusion-sample-count-6709 branch from 044f1c1 to beb2d74 Compare December 4, 2019 18:17
@jjgao jjgao temporarily deployed to cbioportal-add-fusion-s-a7utpz December 4, 2019 18:17 Inactive
@khzhu
Copy link
Contributor Author

khzhu commented Dec 4, 2019

hi @jjgao @n1zea144 @sheridancbio @zhx828
I added one more sql statement to the migration script that would backfill genetic profiles for the existing fusion data. The fusion Gene table on the summary view will be displayed or hidden depending on if fusion was profiled or not. Details can be found in this PR
Would you please take a look and let me know your thoughts on this? thanks

@khzhu khzhu force-pushed the add-fusion-sample-count-6709 branch from beb2d74 to 0c6b702 Compare December 5, 2019 02:10
@jjgao jjgao temporarily deployed to cbioportal-add-fusion-s-a7utpz December 5, 2019 02:10 Inactive
@khzhu khzhu force-pushed the add-fusion-sample-count-6709 branch from 0c6b702 to cdd9169 Compare December 10, 2019 01:55
@jjgao jjgao temporarily deployed to cbioportal-add-fusion-s-a7utpz December 10, 2019 01:55 Inactive
@khzhu khzhu force-pushed the add-fusion-sample-count-6709 branch from cdd9169 to 289a918 Compare December 18, 2019 02:07
@jjgao jjgao temporarily deployed to cbioportal-add-fusion-s-a7utpz December 18, 2019 02:07 Inactive
…s are stored in the mutation_event table and a new filter for fusion data and a new case list for sv data
@khzhu khzhu force-pushed the add-fusion-sample-count-6709 branch from 289a918 to 22d6a70 Compare December 18, 2019 02:16
@jjgao jjgao temporarily deployed to cbioportal-add-fusion-s-a7utpz December 18, 2019 02:17 Inactive
@jjgao
Copy link
Member

jjgao commented Dec 21, 2019

@khzhu @alisman @zhx828 what's left for this?

@khzhu
Copy link
Contributor Author

khzhu commented Dec 21, 2019

Hi @jjgao , all work has been done and I am waiting for @n1zea144 or @sheridancbio to complete their reviews. Thanks!

@jjgao jjgao temporarily deployed to cbioportal-add-fusion-s-a7utpz December 23, 2019 16:17 Inactive
@inodb
Copy link
Member

inodb commented Dec 23, 2019

I created a new PR rebased to release-3.2.0: #6945. Please do further review there

@inodb inodb closed this Dec 23, 2019
@jagnathan jagnathan deleted the add-fusion-sample-count-6709 branch June 2, 2021 16:57
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

fusion tables
9 participants