This repository has been archived by the owner on Jun 21, 2023. It is now read-only.

Planned Analysis: Co-Occurence / Mutual Exclusivity #13

Closed
PichaiRaman opened this issue Jul 12, 2019 · 16 comments
Labels: cnv (Related to or requires CNV data); fusion (Related to or requires fusion data); in progress (Someone is working on this issue, but feel free to propose an alternative approach!); snv (Related to or requires SNV data)

Comments

@PichaiRaman
Contributor

Determine which genetic lesions (mutation, CNV, fusion) and/or pathways co-occur or are mutually exclusive across the PBTA. This could help associate lesions with pathways or identify potential synthetic lethality relationships.

@cgreene cgreene added the good first issue Good for newcomers label Jul 14, 2019
@cansavvy
Collaborator

cansavvy commented Jul 25, 2019

Would we be looking to have something like this figure for this analysis, but with the disease type labels? [Image: Screen Shot 2019-07-25 at 3 01 05 PM] This is from Rokita et al., 2019, doi: http://dx.doi.org/10.1101/566455.

@cansavvy
Collaborator

And after #19 is done, the molecular subtypes could also be added.

@cansavvy cansavvy self-assigned this Jul 25, 2019
@cansavvy
Collaborator

One of my initial questions is how we'd like to combine results that measure similar things.

For example, we have two sets of fusion results. Do we want to take only the fusions common to both files?

arriba.fusions.tsv
star-fusion.fusions.tsv

And if I understand correctly, these all have results that overlap:

strelka2.maf
manta-sv.maf
mutect2.maf

Note that I'm still digging into all these files and determining what's here, so after I do some initial analyses I may come back with a suggestion on how to handle this, but if someone knows this data better and has an idea of what we want to do first, then I'm of course open to those suggestions.

@jharenza
Collaborator

> One of my initial questions is how we'd like to combine results that measure similar things.
>
> For example, we have two sets of fusion results. Do we want to take only the fusions common to both files?
>
> arriba.fusions.tsv
> star-fusion.fusions.tsv
>
> And if I understand correctly, these all have results that overlap:
>
> strelka2.maf
> manta-sv.maf
> mutect2.maf
>
> Note that I'm still digging into all these files and determining what's here, so after I do some initial analyses I may come back with a suggestion on how to handle this, but if someone knows this data better and has an idea of what we want to do first, then I'm of course open to those suggestions.

We actually have created a high-confidence set of calls by integrating the two fusion callers - will add that progress to #10 soon! The same should be done with the SNVs and SVs (Lumpy SV data yet to come).

@cansavvy
Collaborator

Okay. So I should hold off on doing this until those integrated results come back?

@jharenza
Collaborator

> Okay. So I should hold off on doing this until those integrated results come back?

I think you can integrate the SNV results and do the mutual exclusivity/co-occurrence analysis on SNVs more immediately!

@cansavvy
Collaborator

Okay. How should I integrate those? Sounds like you guys have already done some work on this?

@jharenza
Collaborator

> Would we be looking to have something like this figure for this analysis, but with the disease type labels? [Image: Screen Shot 2019-07-25 at 3 01 05 PM] This is from Rokita et al., 2019, doi: http://dx.doi.org/10.1101/566455.

This figure is related to #6. For this issue, we would look for something similar to example-plot.pdf (created with maftools for another solid tumor dataset I had) for specific histologies, or another way of visualizing or tabulating statistical tests of these mutation relationships. First discover those relationships, then summarize them. This may depend on #19, but we can give it a go with broader histologies or start with one, for example medulloblastoma, high-grade glioma, or low-grade glioma.
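
One way to do the statistical testing mentioned above is a pairwise Fisher's exact test on per-gene mutation status, which is the approach maftools documents for its somaticInteractions function. The following is a minimal standard-library sketch, not the project's implementation; the gene names, sample IDs, and the `pairwise_interactions` helper are hypothetical.

```python
from itertools import combinations
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    total_tables = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability that x samples fall in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x) / total_tables

    observed = prob(a)
    lowest, highest = max(0, col1 - row2), min(row1, col1)
    # Sum over every table that is no more likely than the observed one.
    return sum(
        p for p in map(prob, range(lowest, highest + 1))
        if p <= observed * (1 + 1e-9)
    )


def pairwise_interactions(mutated, all_samples):
    """Test every gene pair; `mutated` maps gene -> set of mutated sample IDs."""
    results = []
    for gene1, gene2 in combinations(sorted(mutated), 2):
        s1, s2 = mutated[gene1] & all_samples, mutated[gene2] & all_samples
        both = len(s1 & s2)
        only1 = len(s1 - s2)
        only2 = len(s2 - s1)
        neither = len(all_samples) - both - only1 - only2
        # The sign of (both * neither - only1 * only2) gives the direction.
        if both * neither > only1 * only2:
            tendency = "co-occurrence"
        elif both * neither < only1 * only2:
            tendency = "mutual exclusivity"
        else:
            tendency = "none"
        results.append({
            "pair": (gene1, gene2),
            "counts": (both, only1, only2, neither),
            "tendency": tendency,
            "p_value": fisher_exact_two_sided(both, only1, only2, neither),
        })
    return results


# Hypothetical toy cohort of eight samples (not PBTA data).
samples = {f"S{i}" for i in range(1, 9)}
mutated = {
    "GENE_A": {"S1", "S2", "S3", "S4"},
    "GENE_B": {"S1", "S2", "S3", "S4"},
    "GENE_C": {"S5", "S6", "S7", "S8"},
}
interactions = pairwise_interactions(mutated, samples)
```

With many gene pairs, the resulting p-values would need a multiple-testing correction before being summarized per histology.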

@jharenza
Collaborator

> Okay. How should I integrate those? Sounds like you guys have already done some work on this?

We do not have this automated yet, so you can build the total merged set of mutations per sample from the two mutation callers, Mutect2 and Strelka2, and start with those. It may also be a good idea to investigate mutations called by only one algorithm as potential artifacts (some will be real), but this may constitute another issue and make this issue dependent on it. Thoughts, @cgreene?
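
A per-sample merge of the two callers could be sketched roughly as follows. This is only an illustration under assumptions: it keys each variant on the standard MAF columns `Tumor_Sample_Barcode`, `Chromosome`, `Start_Position`, `Reference_Allele`, and `Tumor_Seq_Allele2`, and the helper names (`maf_row`, `merge_calls`, `split_by_support`) and toy rows are hypothetical.

```python
from collections import defaultdict

# Standard MAF columns used here to identify a variant within a sample.
KEY_COLUMNS = (
    "Tumor_Sample_Barcode", "Chromosome", "Start_Position",
    "Reference_Allele", "Tumor_Seq_Allele2",
)


def variant_key(row):
    """Build a hashable identifier for one MAF row (a dict of column -> value)."""
    return tuple(str(row[col]) for col in KEY_COLUMNS)


def merge_calls(calls_by_caller):
    """Map each variant key to the set of callers that reported it."""
    seen = defaultdict(set)
    for caller, rows in calls_by_caller.items():
        for row in rows:
            seen[variant_key(row)].add(caller)
    return dict(seen)


def split_by_support(merged, n_callers):
    """Separate variants reported by every caller from caller-specific ones."""
    consensus = {k for k, callers in merged.items() if len(callers) == n_callers}
    single = {k: next(iter(c)) for k, c in merged.items() if len(c) == 1}
    return consensus, single


def maf_row(sample, chrom, pos, ref, alt):
    """Hypothetical helper that builds a minimal MAF-like row for the demo."""
    return dict(zip(KEY_COLUMNS, (sample, chrom, pos, ref, alt)))


# Toy calls, not real data: one shared variant and one unique to each caller.
calls = {
    "mutect2": [maf_row("S1", "chr1", 100, "A", "T"),
                maf_row("S1", "chr2", 200, "G", "C")],
    "strelka2": [maf_row("S1", "chr1", 100, "A", "T"),
                 maf_row("S2", "chr3", 300, "C", "A")],
}
merged = merge_calls(calls)  # total merge: every variant seen by any caller
consensus, caller_specific = split_by_support(merged, n_callers=len(calls))
```

Real MAF files could be read into such rows with `csv.DictReader(handle, delimiter="\t")` after skipping any leading `#` comment lines; the `caller_specific` mapping is the set to inspect for potential artifacts.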

@jharenza jharenza removed the good first issue Good for newcomers label Jul 28, 2019
@cansavvy
Collaborator

cansavvy commented Jul 29, 2019

Sounds like I should do some initial analyses to see how much overlap there is between Mutect2 and Strelka2, and then I'll report back and we can try to make some further decisions.
I will hold off for now on combining the fusion results since it sounds like you are working on this.

@jharenza
Collaborator

jharenza commented Jul 29, 2019

> Sounds like I should do some initial analyses to see how much overlap there is between Mutect2 and Strelka2, and then I'll report back and we can try to make some further decisions.
> I will hold off for now on combining the fusion results since it sounds like you are working on this.

Yes, sounds great. Strelka2 may find more lower-frequency variants, which can still be real, so we just have to assess whether these are real and potentially oncogenic and, if so, keep them!

@jharenza jharenza reopened this Jul 29, 2019
@cgreene
Collaborator

cgreene commented Jul 29, 2019

Maybe it'd be good to add a new issue for:

Evaluate concordance between Mutect2 and Strelka2 and decide on next steps.

This may be helpful to keep discussion within the issue focused on a single topic.

@jharenza
Collaborator

created #30

@jashapiro
Member

I'm going to start working on this, aiming to build a figure similar to the one @jharenza presented:
[Image: example-plot]

For now I will use the strelka2 data, but it should be flexible enough to feed in whatever final set of data we would be interested in.

I was planning to allow production of alternative plots for different VAF cutoffs, as well as grouping by gene, filtering by effect, mutation type, etc.
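
The VAF cutoff, effect filter, and gene grouping described above might look something like this sketch. It assumes MAF-style rows with the standard `t_alt_count`, `t_ref_count`, `Variant_Classification`, `Hugo_Symbol`, and `Tumor_Sample_Barcode` columns; the function names, the default cutoff, and the toy rows are hypothetical and not taken from the actual analysis code.

```python
from collections import defaultdict


def vaf(row):
    """Variant allele frequency from MAF read counts, or None if depth is zero."""
    alt, ref = int(row["t_alt_count"]), int(row["t_ref_count"])
    depth = alt + ref
    return alt / depth if depth else None


def samples_by_gene(rows, min_vaf=0.0, keep_classes=None):
    """Group mutated samples by gene after VAF and effect filtering."""
    grouped = defaultdict(set)
    for row in rows:
        frequency = vaf(row)
        if frequency is None or frequency < min_vaf:
            continue  # drop variants below the chosen VAF cutoff
        if keep_classes is not None and row["Variant_Classification"] not in keep_classes:
            continue  # drop effects that are not of interest
        grouped[row["Hugo_Symbol"]].add(row["Tumor_Sample_Barcode"])
    return dict(grouped)


# Toy rows, not real data.
rows = [
    {"Tumor_Sample_Barcode": "S1", "Hugo_Symbol": "GENE_A",
     "Variant_Classification": "Missense_Mutation",
     "t_alt_count": "30", "t_ref_count": "70"},
    {"Tumor_Sample_Barcode": "S2", "Hugo_Symbol": "GENE_A",
     "Variant_Classification": "Silent",
     "t_alt_count": "50", "t_ref_count": "50"},
    {"Tumor_Sample_Barcode": "S3", "Hugo_Symbol": "GENE_B",
     "Variant_Classification": "Missense_Mutation",
     "t_alt_count": "2", "t_ref_count": "98"},
]
unfiltered = samples_by_gene(rows)
filtered = samples_by_gene(
    rows, min_vaf=0.05,
    keep_classes={"Missense_Mutation", "Nonsense_Mutation"},
)
```

Keeping the cutoff and the allowed classes as parameters makes it easy to regenerate the same plot for several settings.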

@jashapiro jashapiro assigned jashapiro and unassigned cansavvy Oct 8, 2019
@jashapiro jashapiro added the in progress Someone is working on this issue, but feel free to propose an alternative approach! label Oct 9, 2019
@jaclyn-taroni jaclyn-taroni added cnv Related to or requires CNV data fusion Related to or requires fusion data snv Related to or requires SNV data labels Oct 26, 2019
@jashapiro
Member

I'm currently working on adding processing of CNV outputs for these analyses. The current plan is to make plots with SNV only, CNV only, and SNV + CNV. For initial analysis, I am taking each gene + loss or gain status (broadly interpreted) as the mutation unit for each sample. However, there are a few issues worth discussing before finalizing the analysis.
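
As a rough illustration of treating each gene plus gain or loss status as the mutation unit, the sketch below classifies gene-level copy numbers against a per-sample ploidy. The input tuple layout, the default ploidy of 2, the simple greater-than/less-than thresholds, and the `GENE:status` naming are all assumptions made for this example, not the pipeline's actual rules.

```python
from collections import defaultdict


def cnv_status(copy_number, ploidy=2):
    """Classify a gene-level copy number as 'gain', 'loss', or None (neutral)."""
    if copy_number > ploidy:
        return "gain"
    if copy_number < ploidy:
        return "loss"
    return None


def cnv_units(records, ploidy_by_sample=None):
    """Map each 'GENE:status' unit to the set of samples carrying it.

    `records` is an iterable of (sample_id, gene, copy_number) tuples.
    """
    ploidy_by_sample = ploidy_by_sample or {}
    units = defaultdict(set)
    for sample_id, gene, copy_number in records:
        status = cnv_status(copy_number, ploidy_by_sample.get(sample_id, 2))
        if status is not None:
            units[f"{gene}:{status}"].add(sample_id)
    return dict(units)


# Toy records, not real data; S5 is assumed to be triploid.
records = [
    ("S1", "GENE_A", 4),
    ("S2", "GENE_A", 1),
    ("S3", "GENE_A", 2),
    ("S4", "GENE_B", 0),
    ("S5", "GENE_A", 3),
]
units = cnv_units(records, ploidy_by_sample={"S5": 3})
```

Because gain and loss of the same gene become separate units, they can be tested for co-occurrence in the same way as SNV-mutated genes, and an SNV + CNV analysis can combine both kinds of unit in one mapping.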

@jaclyn-taroni
Member

Closing all planned analysis tickets in favor of opening new proposed analysis/updated analysis tickets as needed.
