This repository has been archived by the owner on Jun 21, 2023. It is now read-only.

removed local rearrangement filter for onco #567

Merged: 6 commits, Mar 6, 2020

Collaborator

@kgaonkar6 kgaonkar6 commented Feb 28, 2020

Purpose/implementation Section

Fusion filtering steps that affect the putative_oncogene fusions need to be updated through this PR because of the following issues:

  1. Putative oncogenic fusions shouldn't be filtered by the "local rearrangement|local inversion" filter, because it removes some known fusions, e.g. CAPZA2--MET
  2. Also updated the clinical file to change Embryonal Tumor to Embryonal tumor

What scientific question is your analysis addressing?

Filtering fusions for putative oncogenic fusion list

What was your approach?

I moved the dplyr::filter(!grepl("LOCAL_REARRANGEMENT|LOCAL_INVERSION",annots)) step so that it is applied only after all putative oncogene fusions have been gathered from the annotations: https://github.com/kgaonkar6/OpenPBTA-analysis/blob/ad1239e1104410115e2b6d0d87982a44c3df89e0/analyses/fusion_filtering/04-project-specific-filtering.Rmd#L145
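
A minimal sketch of that reordering on a toy table, not the module's actual code. The gene names other than CAPZA2--MET, the "Oncogene" label, and the reduced set of annotation columns are assumptions made for illustration.

```r
library(dplyr)

# Toy fusion calls; values and the "Oncogene" label are hypothetical.
fusion_calls <- data.frame(
  FusionName  = c("CAPZA2--MET", "GENEA--GENEB", "GENEC--GENED"),
  annots      = c("LOCAL_REARRANGEMENT", "LOCAL_INVERSION", "INTERCHROMOSOMAL"),
  Gene1A_anno = c(NA_character_, NA_character_, NA_character_),
  Gene1B_anno = c("Oncogene", NA_character_, NA_character_),
  stringsAsFactors = FALSE
)

# Step 1: gather putative oncogenic fusions BEFORE any local-rearrangement filter.
putative_onco <- fusion_calls %>%
  filter(grepl("Oncogene", Gene1A_anno) | grepl("Oncogene", Gene1B_anno))

# Step 2: only the remaining fusions go through the read-through filter.
other_fusions <- fusion_calls %>%
  anti_join(putative_onco, by = "FusionName") %>%
  filter(!grepl("LOCAL_REARRANGEMENT|LOCAL_INVERSION", annots))
```

In this toy input, CAPZA2--MET is retained even though it is annotated as a local rearrangement, because the oncogene annotation is checked first.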

What GitHub issue does your pull request address?

#543

Directions for reviewers. Tell potential reviewers what kind of feedback you are soliciting.

Which areas should receive a particularly close look?

Is there anything that you want to discuss further?

Please review the filtering and implementation @jaclyn-taroni @jharenza

Putative Driver:
Filtering for general cancer-specific genes (after QC + expression filtering)
Fusions in which either gene is annotated as onco by the 02 script in columns Gene1A_anno, Gene1B_anno, Gene2A_anno, Gene2B_anno

Scavenge back filtered fusions to add to putative oncogenic fusions (after QC + expression filtering, removing LOCAL_REARRANGEMENT|LOCAL_INVERSION as potential read-throughs):
In-frame/frameshift fusion is called in at least 2 samples per histology OR
In-frame/frameshift fusion is called by at least 2 callers
AND
Remove filtered fusions found in more than 1 histology OR
Remove filtered fusions with genes that are multi-fused (more than 5 times in a sample)
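
The recurrence and histology rules above could be sketched roughly as follows. This is one reading of the criteria (keep if recurrent, then drop if seen in more than one histology), run on invented data. The column names Sample, Fusion_Type, and broad_histology, and all values, are assumptions, and the multi-fused-gene rule is left out for brevity.

```r
library(dplyr)

# Toy non-oncogene calls assumed to have passed QC, expression filtering,
# and the local-rearrangement filter. All values are hypothetical.
calls <- data.frame(
  Sample          = c("S1", "S2", "S3", "S3", "S4", "S4", "S5"),
  FusionName      = c("A--B", "A--B", "C--D", "C--D", "E--F", "E--F", "E--F"),
  Fusion_Type     = c("in-frame", "in-frame", "frameshift", "frameshift",
                      "in-frame", "in-frame", "in-frame"),
  Caller          = c("STARFUSION", "STARFUSION", "STARFUSION", "ARRIBA",
                      "STARFUSION", "ARRIBA", "ARRIBA"),
  broad_histology = c("H1", "H1", "H2", "H2", "H1", "H1", "H2"),
  stringsAsFactors = FALSE
)

recurrence <- calls %>%
  filter(Fusion_Type %in% c("in-frame", "frameshift")) %>%
  # samples carrying the fusion within each histology
  group_by(FusionName, broad_histology) %>%
  mutate(n_samples = n_distinct(Sample)) %>%
  # callers supporting the fusion within each sample
  group_by(FusionName, Sample) %>%
  mutate(n_callers = n_distinct(Caller)) %>%
  # histologies in which the fusion appears at all
  group_by(FusionName) %>%
  mutate(n_histologies = n_distinct(broad_histology)) %>%
  ungroup()

scavenged <- recurrence %>%
  filter(n_samples >= 2 | n_callers >= 2) %>%  # recurrence: samples OR callers
  filter(n_histologies <= 1) %>%               # drop fusions seen in >1 histology
  distinct(FusionName) %>%
  pull(FusionName)
```

Here A--B is kept for sample recurrence and C--D for caller support, while E--F is dropped because it appears in two histologies.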

Is the analysis in a mature enough form that the resulting figure(s) and/or table(s) are ready for review?

Yes

Results

What types of results are included (e.g., table, figure)?

results/
FilteredFusion.tsv
pbta-fusion-putative-oncogenic.tsv
pbta-fusion-recurrent-fusion-byhistology.tsv
pbta-fusion-recurrent-fusion-bysample.tsv
pbta-fusion-recurrently-fused-genes-byhistology.tsv
pbta-fusion-recurrently-fused-genes-bysample.tsv

What is your summary of the results?

4606 fusions are now in pbta-fusion-putative-oncogenic.tsv
SCFD2 has been added to the recurrently fused genes because of the edited filtering.
This is the file uploaded as part of v15

Reproducibility Checklist

  • The dependencies required to run the code in this pull request have been added to the project Dockerfile.
  • This analysis has been added to continuous integration.

Documentation Checklist

  • This analysis module has a README and it is up to date.
  • This analysis is recorded in the table in analyses/README.md and the entry is up to date.
  • The analytical code is documented and contains comments.

@jaclyn-taroni
Member

Hi @kgaonkar6, wanted to give you a heads up that I am out of the office until later this week. There will be a bit of a delay before I am able to review.

@kgaonkar6
Collaborator Author

ok thanks for letting me know!

@jaclyn-taroni
Member

@kgaonkar6 what changes in this pull request speak to

Also updated the clinical file to change Embryonal Tumor to Embryonal tumor

Would we expect it to show up in analyses/fusion_filtering/results/pbta-fusion-recurrently-fused-genes-byhistology.tsv but there were no relevant fusions?

Member

@jaclyn-taroni jaclyn-taroni left a comment

👍 Implementation matches what is described. I had one minor comment, but no need for me to re-review before merging.

A more general comment I have is that we have made a number of incremental changes to the fusion filtering to recover known fusions and I don't think we have a good picture, at least in what is in this repository, of what these changes mean for potentially letting in false positives. A comprehensive analysis of false positive rate seems difficult given the data at hand. However, it might be useful to reflect on the number of fusions included in the putative oncogenic list and how that changes given every decision.

putative_driver_annotated_fusions <- fusion_calls %>%
  dplyr::select(-Caller,-annots) %>%
  unique() %>%
Member

Do you need this step? unique() gets called at the end of the chain.
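
A small illustration of why the intermediate call may be redundant, assuming every step between the two unique() calls is a deterministic, row-wise transformation. The toy columns and the mutate() step are hypothetical stand-ins for the rest of the chain.

```r
library(dplyr)

# Toy calls: the same fusion reported by two callers. Values are hypothetical.
fusion_calls <- data.frame(
  FusionName = c("A--B", "A--B", "C--D"),
  Caller     = c("STARFUSION", "ARRIBA", "ARRIBA"),
  annots     = c("x", "x", "y"),
  stringsAsFactors = FALSE
)

# Chain with an intermediate unique() and a final unique().
with_mid <- fusion_calls %>%
  select(-Caller, -annots) %>%
  unique() %>%
  mutate(note = "putative driver") %>%  # stand-in for later row-wise steps
  unique()

# Same chain with only the final unique().
without_mid <- fusion_calls %>%
  select(-Caller, -annots) %>%
  mutate(note = "putative driver") %>%
  unique()

# Both chains keep the same set of fusions.
setequal(with_mid$FusionName, without_mid$FusionName)
```

If a later step depended on row counts (for example a grouped tally), removing the intermediate unique() would change the result, so the equivalence holds only under the stated assumption.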

@kgaonkar6
Collaborator Author

@kgaonkar6 what changes in this pull request speak to

Also updated the clinical file to change Embryonal Tumor to Embryonal tumor

Would we expect it to show up in analyses/fusion_filtering/results/pbta-fusion-recurrently-fused-genes-byhistology.tsv but there were no relevant fusions?

I actually only updated the pbta-histology-file (edited Embryonal Tumor to Embryonal tumor) in my data file and reran fusion_filtering. I'll double-check, but yes, we didn't find any relevant fusions/fused genes after the edits to broad_histology.

@kgaonkar6
Collaborator Author

kgaonkar6 commented Mar 5, 2020

👍 Implementation matches what is described. I had one minor comment, but no need for me to re-review before merging.

A more general comment I have is that we have made a number of incremental changes to the fusion filtering to recover known fusions and I don't think we have a good picture, at least in what is in this repository, of what these changes mean for potentially letting in false positives. A comprehensive analysis of false positive rate seems difficult given the data at hand. However, it might be useful to reflect on the number of fusions included in the putative oncogenic list and how that changes given every decision.

That's a very good point! About 4000 fusions have either gene annotated as an oncogene, and these now include the local rearrangements as well. Of these ~4000, about 1000 are self-fused (GeneA==GeneB) fusions found in multiple histologies, and we don't know whether these are false or true calls. I think the main issue is that, other than for the known fusions, it's hard to call any fusion a false call if either gene is annotated as an oncogene. I'm trying to make some distribution plots to review potential false calls; I'll post them here as I have them.
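
One way to track how the list size changes with each decision is to tally the calls under each filtering variant side by side. The sketch below uses invented data. The is_onco flag and the Gene1A/Gene1B columns are assumptions for illustration, and the counts have no relation to the real ~4000 figure.

```r
library(dplyr)

# Toy calls; all names and values other than CAPZA2--MET are hypothetical.
calls <- data.frame(
  Gene1A  = c("CAPZA2", "GENEA", "GENEB", "GENED"),
  Gene1B  = c("MET", "GENEA", "GENEC", "GENEE"),
  annots  = c("LOCAL_REARRANGEMENT", "LOCAL_REARRANGEMENT",
              "INTERCHROMOSOMAL", "LOCAL_INVERSION"),
  is_onco = c(TRUE, TRUE, TRUE, FALSE),
  stringsAsFactors = FALSE
)

local_pattern <- "LOCAL_REARRANGEMENT|LOCAL_INVERSION"

# Each entry is the call set under one filtering decision.
steps <- list(
  all_calls          = calls,
  onco_annotated     = filter(calls, is_onco),
  onco_without_local = filter(calls, is_onco, !grepl(local_pattern, annots)),
  onco_self_fused    = filter(calls, is_onco, Gene1A == Gene1B)
)

# One row per decision, with the number of calls that survive it.
counts <- data.frame(
  step      = names(steps),
  n_fusions = unname(vapply(steps, nrow, integer(1))),
  stringsAsFactors = FALSE
)
print(counts)
```

Comparing onco_annotated with onco_without_local shows how many calls the relaxed local-rearrangement filter lets back in.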

@kgaonkar6
Collaborator Author

kgaonkar6 commented Mar 6, 2020

@jaclyn-taroni @jharenza I've created some plots to identify fusions that are found in multiple histologies https://github.com/d3b-center/D3b-codes/blob/fusion_v15_QC/OpenPBTA_v15_release_QC/QC_putative_onco_fusion_dustribution.pdf
I have plotted

  1. the "other" fusions found in more than 4 histologies (we had removed this filter to include the known IGH-MYC fusion)
  2. the "local rearrangement" fusions found in more than 4 histologies (we had removed this filter to include the known CAPZA2--MET fusion)
  3. all oncogene-annotated gene fusions found in more than 4 histologies, to get an overall view of which fusions are potential false calls. 1184 fusions fall into this category.
    [Image: Rplot07]
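
The "found in more than 4 histologies" selection used for these plots can be sketched as below. This is not the code behind the linked PDF. The toy table and the histology labels are assumptions.

```r
library(dplyr)

# Toy putative oncogenic calls; fusion names and histologies are hypothetical.
putative_onco <- data.frame(
  FusionName      = c(rep("GENEA--GENEA", 5), rep("GENEB--GENEC", 2)),
  broad_histology = c("H1", "H2", "H3", "H4", "H5", "H1", "H1"),
  stringsAsFactors = FALSE
)

# Count distinct histologies per fusion and keep those seen in more than 4.
multi_histology <- putative_onco %>%
  group_by(FusionName) %>%
  summarise(n_histologies = n_distinct(broad_histology)) %>%
  filter(n_histologies > 4)
```

Only GENEA--GENEA passes here, since it is seen in five histologies. Such widely shared fusions are the ones flagged above as potential false calls.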

@jaclyn-taroni
Member

@kgaonkar6 I am going to merge this, but we can keep talking about the analysis you posted earlier today!

@jaclyn-taroni jaclyn-taroni merged commit cdcd377 into AlexsLemonade:master Mar 6, 2020
@kgaonkar6
Collaborator Author

thanks!
