New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
add a new all_cases_with_sv_data to the validation process #6860
add a new all_cases_with_sv_data to the validation process #6860
Conversation
I've approved, but I wonder if this fix alone will fix #6741. Have we imported a file with both fully specified structural variants (Site1 and Site2 exon number and transcript id provided), and also non-specified structural variants (Site1 and Site2 exon number and transcript id blank or NA)? Is it properly visualized? |
do you have any test data for the second use case @sheridancbio? thanks! |
@khzhu The file in data hub for for sv in msk-2017-impact should contain both types of data. Does it not? cc: @ritikakundra |
@ritikakundra @rmadupuri Can you make sure that you are using the same caselist name in the sv caselist for msk-2017-impact as @khzhu is using here? Also, can you confirm data_SV.txt file on datahub contains our full set of SV data (both sv and fusion record types - see @sheridancbio comment above). Thanks. |
Thanks, @n1zea144! |
@@ -52,6 +52,7 @@ | |||
ALL_CASES_WITH_MUTATION_AND_CNA_DATA("all_cases_with_mutation_and_cna_data"), | |||
ALL_CASES_WITH_MUTATION_AND_CNA_AND_MRNA_DATA("all_cases_with_mutation_and_cna_and_mrna_data"), | |||
ALL_CASES_WITH_GSVA_DATA("all_cases_with_gsva_data"), | |||
ALL_CASES_WITH_SV_DATA("all_cases_with_structural_variant_data"), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @khzhu, we are using all_cases_with_sv_data
as the category name instead of all_cases_with_structural_variant_data
.
@@ -4253,6 +4253,7 @@ def processCaseListDirectory(caseListDir, cancerStudyId, logger, | |||
'all_cases_with_mutation_and_cna_data', | |||
'all_cases_with_mutation_and_cna_and_mrna_data', | |||
'all_cases_with_gsva_data', | |||
'all_cases_with_structural_variant_data', |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same here, should be all_cases_with_sv_data
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
okay, will make corrections. thanks @rmadupuri!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done.
thanks, @ritikakundra! I will modify the import script so that those events without exome numbers not being rejected. |
Hi Kelsey. This is just a partial answer to your question. There are three categories of SV records during import. There are 4 key fields:
There are basically 3 categories of records:
In the (now hidden) documentation here: the fully specified records are considered valid "fusion" events The frontend fails to render the studies (and maybe crashes - I cannot recall the exact details) when a genetic profile contains any partially specified records ... so that is why the filter is in place to remove these events during import. We cannot remove that filter until we write some kind of code to either handle these properly in the frontend, or to "downgrade" these records to remain as unspecified records. We have basically postponed this work / decision. The other issue is that although the frontend can display genetic profiles made up entirely of unspecified records .. or fully specified records ... it cannot render a mixture and the visual display crashes when opening the fusions tab. If the data_SV.txt file from mskimpact_2017 study contains no unspecified records then for testing you can simply blank out (or make NA) the fields for some of the partially specified records which would demote them to become unspecified records and allow them to be imported. Then importing the profile will give a mixture of unspecified records and fully specified records ... and the frontend currently cannot render such a mixture. We are hoping the frontend can me massaged to render both of these categories at once. I'll need to respond with more details later. |
thanks, @sheridancbio , for detailed information! |
86cbcee
to
cd1b6af
Compare
Hi @sheridancbio , I've made changes to use all_cases_with_sv_data instead. The PR is ready to merge. We will have a new PR for
|
I am canceling the request for review from @ritikakundra because she was part of the conversation about all_cases_with_structural_variant_data v. all_cases_with_sv_data with @rmadupuri yesterday and also agreed. |
I just noticed that I failed to squash the two changesets into one while merging. Apologies. |
Fix #6741
Describe changes proposed in this pull request:
Checks
Any screenshots or GIFs?
If this is a new visual feature please add a before/after screenshot or gif
here with e.g. GifGrabber.
Notify reviewers
Read our Pull request merging
policy. It can help to figure out who worked on the
file before you. Please use
git blame <filename>
to determine thatand notify them either through slack or by assigning them as a reviewer on the PR