This repository has been archived by the owner on Jun 21, 2023. It is now read-only.

V23 manuscript run #1672

Merged 19 commits on Feb 20, 2023

sjspielman
Member

@sjspielman sjspielman commented Feb 17, 2023

I am opening this PR as a Draft as it is, well, a lot...and perhaps it needs to be split up.

This PR represents a local Big Run - RUN_LOCAL=1 bash scripts/run-manuscript-analyses.sh, and there are rather a lot of diffs. I have tried to catalog how I handled everything in this spreadsheet: https://docs.google.com/spreadsheets/d/1xRGlPYzfMcGnLBIMGTJuHJ5LyCyOSuIOKcp_GyGKMnU/edit?usp=sharing
Within this spreadsheet, I indicate for each file with a diff whether I committed it or just checked it out (so it won't appear in this PR as a diff!) because it was already reasonably up-to-date. I generally committed binary files even when there were no apparent diffs, since their file metadata has likely changed and will now better reflect that the file was generated with V23.
I've highlighted a couple rows in the spreadsheet that I think are worth definitely checking.

There is also one easy thing to review here! I updated to V23 in figures/generate-figures.sh. The rest is less easy, but I hope the spreadsheet will help, and/or that we can find a new strategy for merging all this in.

From this endeavor, I identified that at least these figure panels will need to be updated for resubmission, and associated figures recompiled -

@sjspielman sjspielman marked this pull request as draft February 17, 2023 18:36
@@ -468,6 +468,9 @@ BS_49BQS7Z6 MKL1--ACO2 0.853867400303679 -0.761701539955673 NA NA no change no c
BS_49BQS7Z6 MN1--PATZ1 2.98989536306761 2.87742570123607 NA NA differentially expressed differentially expressed NA NA 22:27761050 22:31344642 other 1 9 medium MN1 NA PATZ1 NA CosmicCensus, Oncogene CosmicCensus, Oncogene, TranscriptionFactor NA NA NA ARRIBA 1 NA NA FALSE PT_HNZNZ635
BS_49BQS7Z6 MYH9--MLC1 1.60675751637715 1.69724397726497 NA NA no change no change NA NA 22:36335506 22:50083173 other 5 0 medium MYH9 NA MLC1 NA TumorSuppressorGene, CosmicCensus, Oncogene NA NA NA NA ARRIBA 1 NA NA FALSE PT_HNZNZ635
BS_49BQS7Z6 NPAS2--NPAS2 -0.402203274374439 -0.402203274374439 NA NA no change no change NA NA 2:100965766 2:100948235 frameshift 2 1 low NPAS2 NA NPAS2 NA TumorSuppressorGene, TranscriptionFactor TumorSuppressorGene, TranscriptionFactor NA NA NA ARRIBA 1 NA NA TRUE PT_HNZNZ635
BS_49BQS7Z6 RN7SL2--IGF2 NA 13.250457852608 NA NA NA differentially expressed NA NA 14:49862573 11:2139374 other 3 40 NA RN7SL2 NA IGF2 NA NA Oncogene NA NA NA STARFUSION 1 NA NA FALSE PT_HNZNZ635
Member

Here's my best guess as to what's going on with RN7SL2--IGF2:

  • In the base histologies file (what is used to generate this fusion file prior to release), the following broad histologies are represented when you review the 5 BS IDs with a RN7SL2--IGF2 fusion: Benign tumor, Ependymal tumor, Diffuse astrocytic and oligodendroglial tumor, Neuronal and mixed neuronal-glial tumor, Low-grade astrocytic tumor
  • In the pbta-histologies.tsv file in the release, the broad histologies are: Ependymal tumor, Diffuse astrocytic and oligodendroglial tumor, Neuronal and mixed neuronal-glial tumor, Low-grade astrocytic tumor (one biospecimen [BS_DGTPWJ59] went from Benign tumor to Neuronal and mixed neuronal-glial tumor)
  • In analyses/fusion_filtering/05-QC_putative_onco_fusion_distribution.Rmd, fusions are filtered out if they are in more than 4 histologies:
    ### get fusions that are found in more than 4 histologies
    ```{r get fusions that are found in more than 4 histologies }
    # count number of fusions in putative oncogene annotated fused gene are in more than N (countHistology) histologies
    FusionInNhist<-fusion_calls %>% dplyr::select(FusionName,broad_histology) %>% unique() %>% group_by(FusionName) %>% tally(name="count")
    FusionInNhist<-FusionInNhist[FusionInNhist$count>countHistology,]
    FusionInNhist
    # plot broad_histologies that have Fusions that are potential false positives and found in multiple histologies
    multiHistFusion<-fusion_calls %>% dplyr::filter(FusionName %in% FusionInNhist$FusionName) %>%
    # we want to remove annots column since there is difference in arriba annotation (uniquely has duplication/translocation/deletion values) and StarFusion annotation which will be counted twice since they are not unique
    dplyr::select(-annots) %>%
    unique()
    ggplot(multiHistFusion,aes(x=multiHistFusion$FusionName,fill=multiHistFusion$broad_histology))+geom_bar()+theme(axis.text.x = element_text(angle=90))+coord_flip()+ylab("count")+xlab("FusionName(Total)")
    ```

In summary: This fusion was found in more than 4 histologies (5) in pbta-histologies-base.tsv, so it was filtered out, but in exactly 4 histologies in pbta-histologies.tsv due to the change in broad_histology value for one biospecimen, so it no longer exceeds the threshold and is retained.
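
To make the threshold behavior concrete, here is a minimal base R sketch of the counting step. It is not the module's actual code: the specimen IDs other than BS_DGTPWJ59 are hypothetical placeholders, the pairing of the other specimens with histologies is arbitrary, and `count_histology <- 4` is assumed to match the `countHistology` value used in the notebook.

```r
# Toy reconstruction of the "more than N histologies" filter in base R.
# ASSUMPTION: count_histology mirrors countHistology in the Rmd and equals 4.
count_histology <- 4

# Specimen IDs other than BS_DGTPWJ59 are hypothetical placeholders.
base_calls <- data.frame(
  Sample = c("BS_DGTPWJ59", "BS_HYPO_1", "BS_HYPO_2", "BS_HYPO_3", "BS_HYPO_4"),
  FusionName = "RN7SL2--IGF2",
  broad_histology = c(
    "Benign tumor",
    "Ependymal tumor",
    "Diffuse astrocytic and oligodendroglial tumor",
    "Neuronal and mixed neuronal-glial tumor",
    "Low-grade astrocytic tumor"
  ),
  stringsAsFactors = FALSE
)

# In the release histologies file, one biospecimen changes broad_histology.
release_calls <- base_calls
release_calls$broad_histology[release_calls$Sample == "BS_DGTPWJ59"] <-
  "Neuronal and mixed neuronal-glial tumor"

# Return fusions seen in more than `threshold` distinct broad histologies.
fusions_over_threshold <- function(calls, threshold) {
  pairs <- unique(calls[, c("FusionName", "broad_histology")])
  counts <- table(pairs$FusionName)
  names(counts)[as.vector(counts) > threshold]
}

flagged_base <- fusions_over_threshold(base_calls, count_histology)
flagged_release <- fusions_over_threshold(release_calls, count_histology)

print(flagged_base)     # flagged with the base histologies (5 distinct values)
print(flagged_release)  # not flagged with the release histologies (4 distinct values)
```

Because the comparison is strictly greater than, a fusion present in exactly 4 broad histologies is not flagged, which is consistent with RN7SL2--IGF2 appearing in the output once the release histologies file is used.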

Member Author

Noting for my own reference that this is the line that filters out >4 histologies:

putative_driver_fusions<-fusion_calls %>% dplyr::filter(!FusionName %in% FusionInNhist$FusionName) %>%

@jaclyn-taroni
Member

jaclyn-taroni commented Feb 19, 2023

I went through the spreadsheet and took a look at the highlighted rows. I believe the changes in chromothripsis and interaction-plots are due to the changes in the independent specimen files. That is also likely true in the recurrently fused fusions files, as the stranded independent RNA-seq specimens file changed between v22 and v23. I have identified what I think is the cause of the addition of the RN7SL2--IGF2 lines in analyses/fusion_filtering/results/pbta-fusion-putative-oncogenic.tsv here: #1672 (comment)

All that to say – I don't believe anything here is concerning at this point.

@sjspielman sjspielman marked this pull request as ready for review February 20, 2023 15:21
@sjspielman
Member Author

Opening this up for "formal review" since things look mostly accounted for. Spelunk away!

Member

@jaclyn-taroni jaclyn-taroni left a comment

I went through this fairly carefully yesterday (with particular emphasis on the highlighted modules), so I am going to go ahead and approve.

@sjspielman
Member Author

There are no code changes here that are part of CI, so this can be merged before checks.

@sjspielman sjspielman merged commit 1fd1548 into AlexsLemonade:master Feb 20, 2023