V23 manuscript run #1672
…the array list changes seen here are definitely correct
@@ -468,6 +468,9 @@
BS_49BQS7Z6 MKL1--ACO2 0.853867400303679 -0.761701539955673 NA NA no change no c
BS_49BQS7Z6 MN1--PATZ1 2.98989536306761 2.87742570123607 NA NA differentially expressed differentially expressed NA NA 22:27761050 22:31344642 other 1 9 medium MN1 NA PATZ1 NA CosmicCensus, Oncogene CosmicCensus, Oncogene, TranscriptionFactor NA NA NA ARRIBA 1 NA NA FALSE PT_HNZNZ635
BS_49BQS7Z6 MYH9--MLC1 1.60675751637715 1.69724397726497 NA NA no change no change NA NA 22:36335506 22:50083173 other 5 0 medium MYH9 NA MLC1 NA TumorSuppressorGene, CosmicCensus, Oncogene NA NA NA NA ARRIBA 1 NA NA FALSE PT_HNZNZ635
BS_49BQS7Z6 NPAS2--NPAS2 -0.402203274374439 -0.402203274374439 NA NA no change no change NA NA 2:100965766 2:100948235 frameshift 2 1 low NPAS2 NA NPAS2 NA TumorSuppressorGene, TranscriptionFactor TumorSuppressorGene, TranscriptionFactor NA NA NA ARRIBA 1 NA NA TRUE PT_HNZNZ635
BS_49BQS7Z6 RN7SL2--IGF2 NA 13.250457852608 NA NA NA differentially expressed NA NA 14:49862573 11:2139374 other 3 40 NA RN7SL2 NA IGF2 NA NA Oncogene NA NA NA STARFUSION 1 NA NA FALSE PT_HNZNZ635
Here's my best guess as to what's going on with `RN7SL2--IGF2`:
- In the base histologies file (what is used to generate this fusion file prior to release), the following broad histologies are represented when you review the 5 BS IDs with a `RN7SL2--IGF2` fusion: `Benign tumor`, `Ependymal tumor`, `Diffuse astrocytic and oligodendroglial tumor`, `Neuronal and mixed neuronal-glial tumor`, `Low-grade astrocytic tumor`
- In the `pbta-histologies.tsv` file in the release, the broad histologies are: `Ependymal tumor`, `Diffuse astrocytic and oligodendroglial tumor`, `Neuronal and mixed neuronal-glial tumor`, `Low-grade astrocytic tumor` (one biospecimen [`BS_DGTPWJ59`] went from `Benign tumor` to `Neuronal and mixed neuronal-glial tumor`)
- In `analyses/fusion_filtering/05-QC_putative_onco_fusion_distribution.Rmd`, fusions are filtered out if they are in more than 4 histologies:

OpenPBTA-analysis/analyses/fusion_filtering/05-QC_putative_onco_fusion_distribution.Rmd
Lines 184 to 201 in 61dc81b
### get fusions that are found in more than 4 histologies
```{r get fusions that are found in more than 4 histologies }
# count number of fusions in putative oncogene annotated fused gene are in more than N (countHistology) histologies
FusionInNhist<-fusion_calls %>%
  dplyr::select(FusionName,broad_histology) %>%
  unique() %>%
  group_by(FusionName) %>%
  tally(name="count")

FusionInNhist<-FusionInNhist[FusionInNhist$count>countHistology,]
FusionInNhist

# plot broad_histologies that have Fusions that are potential false positives and found in multiple histologies
multiHistFusion<-fusion_calls %>%
  dplyr::filter(FusionName %in% FusionInNhist$FusionName) %>%
  # we want to remove annots column since there is difference in arriba annotation (uniquely has duplication/translocation/deletion values) and StarFusion annotation which will be counted twice since they are not unique
  dplyr::select(-annots) %>%
  unique()

ggplot(multiHistFusion,aes(x=multiHistFusion$FusionName,fill=multiHistFusion$broad_histology))+geom_bar()+theme(axis.text.x = element_text(angle=90))+coord_flip()+ylab("count")+xlab("FusionName(Total)")
```
In summary: This fusion was found in more than 4 histologies in `pbta-histologies-base.tsv` and exactly 4 histologies in `pbta-histologies.tsv` due to the change in `broad_histology` value for one biospecimen.
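To make the threshold effect concrete, here is a minimal toy sketch of the reasoning above, not the module's actual code. It assumes `countHistology` is 4 (matching the "more than 4 histologies" description), and everything except `BS_DGTPWJ59`, the fusion name, and the histology labels is hypothetical: the `Sample` column name, the placeholder `BS_HYPOTHETICAL_*` IDs, which histology each placeholder carries, and the helper `flag_multi_histology()` are made up for illustration.

```r
library(dplyr)

# Assumption: the threshold is 4, per "more than 4 histologies" in the thread
countHistology <- 4

# Toy stand-in for fusion calls joined to the *base* histologies.
# Only BS_DGTPWJ59 is a real ID from this thread; the rest are placeholders.
base_calls <- data.frame(
  Sample = c("BS_DGTPWJ59", "BS_HYPOTHETICAL_1", "BS_HYPOTHETICAL_2",
             "BS_HYPOTHETICAL_3", "BS_HYPOTHETICAL_4"),
  FusionName = "RN7SL2--IGF2",
  broad_histology = c("Benign tumor",
                      "Ependymal tumor",
                      "Diffuse astrocytic and oligodendroglial tumor",
                      "Neuronal and mixed neuronal-glial tumor",
                      "Low-grade astrocytic tumor"),
  stringsAsFactors = FALSE
)

# Same calls with the *release* histologies: one biospecimen is relabeled
release_calls <- base_calls %>%
  mutate(broad_histology = if_else(
    Sample == "BS_DGTPWJ59",
    "Neuronal and mixed neuronal-glial tumor",
    broad_histology
  ))

# Hypothetical helper: fusions seen in more than n_hist distinct histologies
flag_multi_histology <- function(fusion_calls, n_hist) {
  fusion_calls %>%
    select(FusionName, broad_histology) %>%
    distinct() %>%
    count(FusionName, name = "count") %>%
    filter(count > n_hist)
}

flagged_base <- flag_multi_histology(base_calls, countHistology)
flagged_release <- flag_multi_histology(release_calls, countHistology)

# Drop flagged fusions, in the same spirit as the `!FusionName %in% ...` filter
kept_base <- base_calls %>%
  filter(!FusionName %in% flagged_base$FusionName)
kept_release <- release_calls %>%
  filter(!FusionName %in% flagged_release$FusionName)

nrow(kept_base)     # 5 distinct histologies > 4, so the fusion is removed
nrow(kept_release)  # 4 distinct histologies is not > 4, so the fusion is kept
```

With the base labels the fusion spans 5 distinct histologies and is dropped; with the release labels it spans exactly 4, clears the strict `>` comparison, and shows up as a new row in the diff.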
Noting for my own reference that this is the line that filters out >4 histologies:
OpenPBTA-analysis/analyses/fusion_filtering/05-QC_putative_onco_fusion_distribution.Rmd
Line 208 in 61dc81b
putative_driver_fusions<-fusion_calls %>% dplyr::filter(!FusionName %in% FusionInNhist$FusionName) %>%
I went through the spreadsheet and took a look at the highlighted rows. I believe the changes in All that to say – I don't believe anything here is concerning at this point.
Opening this up for "formal review" since things look mostly accounted for. Spelunk away!
I went through this fairly carefully yesterday (with particular emphasis on the highlighted modules), so I am going to go ahead and approve.
There are no code changes here that are part of CI, so this can be merged before checks.
I am opening this PR as a Draft as it is, well, a lot...and perhaps it needs to be split up.
This PR represents a local Big Run - `RUN_LOCAL=1 bash scripts/run-manuscript-analyses.sh`, and there are rather a lot of diffs. I have tried to catalog how I handled everything in this spreadsheet: https://docs.google.com/spreadsheets/d/1xRGlPYzfMcGnLBIMGTJuHJ5LyCyOSuIOKcp_GyGKMnU/edit?usp=sharing
Within this spreadsheet, I indicate for each file that has a diff whether I indeed committed it or just checked it out (so it won't appear in this PR as a diff!) because it was already reasonably up-to-date. I generally committed binary files even when there were no apparent diffs, since the file metadata will likely have changed and committing better reflects that the file was generated with V23.
I've highlighted a couple of rows in the spreadsheet that I think are definitely worth checking.
There is also one easy thing to review here! I updated to `V23` in `figures/generate-figures.sh`. The rest is less easy but I hope the spreadsheet will help, and/or a new strategy for merging all this in.
From this endeavor, I identified that at least these figure panels will need to be updated for resubmission, and associated figures recompiled -