removed local rearrangement filter for onco #567
Conversation
Hi @kgaonkar6, wanted to give you a heads up that I am out of the office until later this week. There will be a bit of a delay before I am able to review.
ok thanks for letting me know!
@kgaonkar6 what changes in this pull request speak to
Would we expect it to show up in
👍 Implementation matches what is described. I had one minor comment, but no need for me to re-review before merging.
A more general comment I have is that we have made a number of incremental changes to the fusion filtering to recover known fusions and I don't think we have a good picture, at least in what is in this repository, of what these changes mean for potentially letting in false positives. A comprehensive analysis of false positive rate seems difficult given the data at hand. However, it might be useful to reflect on the number of fusions included in the putative oncogenic list and how that changes given every decision.
putative_driver_annotated_fusions <- fusion_calls %>%
  dplyr::select(-Caller,-annots) %>%
  unique() %>%
Do you need this step? unique() gets called at the end of the chain.
I actually only updated the pbta-histology-file (edited Embryonal Tumor to Embryonal tumor) in my data file and ran the fusion_filtering. I'll double check, but yes, we didn't find any relevant fusions/fused genes after the edits to broad_histology.
That's a very good point! Around 4000 fusions have either gene annotated as an oncogene, and these now include the local rearrangements as well. Of these ~4000, around 1000 are self-fused (GeneA==GeneB) and found in multiple histologies, and we don't know whether these are false calls or true calls. The main issue, I think, is that apart from the known fusions it's hard to call any fusion a false call if either gene is annotated as an oncogene. I'm trying to make some distribution plots to review potential false calls; I'll post them here as I have them.
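As a rough illustration of the self-fusion tally described above, the sketch below counts self-fused genes (GeneA==GeneB) and flags those seen in more than one histology. The toy data frame and the column names Sample, Gene1A, Gene1B and broad_histology are assumptions made for this example, not the contents of pbta-fusion-putative-oncogenic.tsv.

```r
library(dplyr)

# Toy stand-in for the putative oncogenic fusion table; all values are invented.
putative_onco <- data.frame(
  Sample          = c("S1", "S2", "S3", "S4"),
  Gene1A          = c("X", "X", "Y", "Z"),
  Gene1B          = c("X", "X", "W", "Z"),
  broad_histology = c("H1", "H2", "H1", "H1"),
  stringsAsFactors = FALSE
)

# Keep self-fused calls and summarise how widely each one is observed
self_fused <- putative_onco %>%
  dplyr::filter(Gene1A == Gene1B) %>%
  dplyr::group_by(Gene1A) %>%
  dplyr::summarise(
    n_histologies = dplyr::n_distinct(broad_histology),
    n_samples     = dplyr::n_distinct(Sample),
    .groups = "drop"
  )

# Self-fused genes seen in more than one histology are candidates for review
multi_histology_self_fused <- dplyr::filter(self_fused, n_histologies > 1)
```

A table like multi_histology_self_fused only lists candidates for review; it does not say whether a call is true or false.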
@jaclyn-taroni @jharenza I've created some plots to identify fusions that are found in multiple histologies https://github.com/d3b-center/D3b-codes/blob/fusion_v15_QC/OpenPBTA_v15_release_QC/QC_putative_onco_fusion_dustribution.pdf
@kgaonkar6 I am going to merge this, but we can keep talking about the analysis you posted earlier today!
thanks!
Purpose/implementation Section
Fusion filtering steps that affect the putative_oncogene fusions need to be updated through this PR because of the following issues:
What scientific question is your analysis addressing?
Filtering fusions for putative oncogenic fusion list
What was your approach?
I moved this dplyr::filter(!grepl("LOCAL_REARRANGEMENT|LOCAL_INVERSION",annots)) filter to only be applied after gathering all putative oncogene fusions from annotations https://github.com/kgaonkar6/OpenPBTA-analysis/blob/ad1239e1104410115e2b6d0d87982a44c3df89e0/analyses/fusion_filtering/04-project-specific-filtering.Rmd#L145
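A minimal sketch of the reordering described above, on toy data: fusions annotated as oncogenic are gathered first with no local rearrangement filter, and the LOCAL_REARRANGEMENT|LOCAL_INVERSION filter is then applied only to the remaining calls. The toy data frame, the "Oncogene" annotation string, and the helper is_onco() are assumptions for illustration; the actual logic lives in 04-project-specific-filtering.Rmd.

```r
library(dplyr)

# Toy stand-in for QC + expression filtered calls; all values are invented.
fusion_calls <- data.frame(
  FusionName  = c("A--B", "C--D", "E--F", "G--H"),
  Gene1A_anno = c("Oncogene", NA, NA, NA),
  Gene1B_anno = c(NA, "Oncogene", NA, NA),
  annots      = c("LOCAL_REARRANGEMENT", "INTERCHROMOSOMAL",
                  "LOCAL_INVERSION", "INTERCHROMOSOMAL"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: grepl() returns FALSE for NA, so missing
# annotations are treated as "not an oncogene"
is_onco <- function(x) grepl("Oncogene", x)

# Step 1: gather putative oncogene fusions from annotation alone;
# local rearrangements are kept at this stage
putative_driver <- fusion_calls %>%
  dplyr::filter(is_onco(Gene1A_anno) | is_onco(Gene1B_anno))

# Step 2: only the remaining calls go through the local rearrangement filter
scavenge_candidates <- fusion_calls %>%
  dplyr::filter(!FusionName %in% putative_driver$FusionName) %>%
  dplyr::filter(!grepl("LOCAL_REARRANGEMENT|LOCAL_INVERSION", annots))
```

In this toy data, A--B stays in the putative driver set despite its LOCAL_REARRANGEMENT annotation, while E--F, which has no oncogene annotation, is dropped as a potential read-through.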
What GitHub issue does your pull request address?
#543
Directions for reviewers. Tell potential reviewers what kind of feedback you are soliciting.
Which areas should receive a particularly close look?
Is there anything that you want to discuss further?
Please review the filtering and implementation @jaclyn-taroni @jharenza
Putative Driver:
Filtering for general cancer-specific genes (after QC + expression_filtering)
Fusions with either gene annotated as onco by the 02 script in columns Gene1A_anno, Gene1B_anno, Gene2A_anno, Gene2B_anno
Scavenge back filtered fusions to add to putative oncogenic fusions (after QC + expression_filtering, removing LOCAL_REARRANGEMENT|LOCAL_INVERSION as potential read-throughs):
In-frame/frameshift fusions are called in at least 2 samples per histology OR
In-frame/frameshift fusions are called in at least 2 callers
AND
Remove filtered fusions found in more than 1 histology OR
Remove filtered fusions with genes that are multi-fused (more than 5 times in a sample)
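The scavenge-back rules above can be sketched on toy data as follows. This is one reading of the AND/OR wording: a fusion is kept if it passes either recurrence rule, then dropped if it meets either removal rule. The column names, caller labels and data are assumptions for illustration and are not taken from the module's code.

```r
library(dplyr)

# Toy calls assumed to have passed QC, expression filtering and the
# LOCAL_REARRANGEMENT|LOCAL_INVERSION filter. All values are invented.
base_calls <- data.frame(
  Sample          = c("S1", "S1", "S2", "S3", "S4", "S4", "S5", "S6"),
  FusionName      = c("A--B", "A--B", "C--D", "C--D", "E--F", "E--F", "E--F", "G--H"),
  Fusion_Type     = c("in-frame", "in-frame", "frameshift", "frameshift",
                      "in-frame", "in-frame", "in-frame", "other"),
  Caller          = c("STARFUSION", "ARRIBA", "ARRIBA", "ARRIBA",
                      "STARFUSION", "ARRIBA", "ARRIBA", "ARRIBA"),
  broad_histology = c("H1", "H1", "H1", "H1", "H1", "H1", "H2", "H1"),
  stringsAsFactors = FALSE
)
# Gene M is fused to six partners in sample S7, each seen by both callers
multi_calls <- data.frame(
  Sample          = "S7",
  FusionName      = rep(paste0("M--P", 1:6), times = 2),
  Fusion_Type     = "in-frame",
  Caller          = rep(c("STARFUSION", "ARRIBA"), each = 6),
  broad_histology = "H1",
  stringsAsFactors = FALSE
)
calls <- rbind(base_calls, multi_calls) %>%
  dplyr::mutate(Gene1A = sub("--.*", "", FusionName),
                Gene1B = sub(".*--", "", FusionName))

# Only in-frame/frameshift fusions are eligible
coding <- dplyr::filter(calls, Fusion_Type %in% c("in-frame", "frameshift"))

# Keep rule 1: called in at least 2 samples within a histology
by_histology <- coding %>%
  dplyr::group_by(FusionName, broad_histology) %>%
  dplyr::summarise(n_samples = dplyr::n_distinct(Sample), .groups = "drop") %>%
  dplyr::filter(n_samples >= 2)

# Keep rule 2: called by at least 2 callers in the same sample
by_caller <- coding %>%
  dplyr::group_by(FusionName, Sample) %>%
  dplyr::summarise(n_callers = dplyr::n_distinct(Caller), .groups = "drop") %>%
  dplyr::filter(n_callers >= 2)

keep <- base::unique(c(by_histology$FusionName, by_caller$FusionName))

# Remove rule 1: fusion found in more than 1 histology
multi_histology <- coding %>%
  dplyr::group_by(FusionName) %>%
  dplyr::summarise(n_hist = dplyr::n_distinct(broad_histology), .groups = "drop") %>%
  dplyr::filter(n_hist > 1)

# Remove rule 2: a gene fused more than 5 times within one sample
gene_long <- dplyr::bind_rows(
  dplyr::transmute(coding, Sample, FusionName, Gene = Gene1A),
  dplyr::transmute(coding, Sample, FusionName, Gene = Gene1B)
) %>%
  dplyr::distinct()

multi_fused <- gene_long %>%
  dplyr::group_by(Sample, Gene) %>%
  dplyr::summarise(n_fusions = dplyr::n_distinct(FusionName), .groups = "drop") %>%
  dplyr::filter(n_fusions > 5)

drop_multi_fused <- gene_long %>%
  dplyr::semi_join(multi_fused, by = c("Sample", "Gene")) %>%
  dplyr::pull(FusionName)

scavenged <- base::setdiff(keep, c(multi_histology$FusionName, drop_multi_fused))
```

Here A--B is scavenged through the two-caller rule and C--D through recurrence within a histology. E--F is dropped because it appears in two histologies, the M--P fusions are dropped because M is multi-fused in S7, and G--H never qualifies because it is neither in-frame nor frameshift.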
Is the analysis in a mature enough form that the resulting figure(s) and/or table(s) are ready for review?
Yes
Results
What types of results are included (e.g., table, figure)?
results/
FilteredFusion.tsv
pbta-fusion-putative-oncogenic.tsv
pbta-fusion-recurrent-fusion-byhistology.tsv
pbta-fusion-recurrent-fusion-bysample.tsv
pbta-fusion-recurrently-fused-genes-byhistology.tsv
pbta-fusion-recurrently-fused-genes-bysample.tsv
What is your summary of the results?
4606 fusions are now in pbta-fusion-putative-oncogenic.tsv
SCFD2 has been added to the recurrently fused genes because of the edited filtering.
This is the file uploaded as part of v15.
Reproducibility Checklist
Documentation Checklist
README and it is up to date.
analyses/README.md and the entry is up to date.