Proposed Analysis: Generate hotspot and hot region lists #932

jharenza · 2021-02-03T20:20:30Z

What are the scientific goals of the analysis?

Generate hotspot and hot region files for use in #819.

What methods do you plan to use to accomplish the scientific goals?

After some discussion with David Wheeler (formerly BCM, now St. Jude), we have decided to derive a file of cancer hotspots (for oncogenes) and hot regions with which to recover potentially missed oncogenic DNA alterations. The rationale is that many oncogenes are recurrently mutated at the same nucleotide(s), whereas tumor suppressor genes (TSGs) are often inactivated by a non-recurrent mutation in a functional domain or by a truncating alteration. In addition, kinase genes may carry recurrent or non-recurrent alterations within the kinase domain that lead to activation of the kinase.

We will first search the OpenPBTA cohort for deleterious point mutations that recur (N>=2) in genes on the annotated oncogene and TSG lists from fusion_filtering here, using the following filters:

Filter IMPACT = 'HIGH|MODERATE'
Filter Variant_Classification = 'Missense_Mutation|Splice_Region|In_Frame_Del|Frame_Shift_Del|Splice_Site|Nonsense_Mutation|Nonstop_Mutation|In_Frame_Ins|Frame_Shift_Ins'

This will tell us whether we are missing any pediatric brain-specific hotspots from the cancer hotspots v2 file found here. We will inspect any such sites and potentially add them to the list of hotspots.
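A minimal sketch of the recurrence search, assuming each caller's MAF has already been parsed into dicts keyed by standard vcf2maf column names (Hugo_Symbol, Chromosome, Start_Position, Reference_Allele, Tumor_Seq_Allele2, Tumor_Sample_Barcode, Variant_Classification, IMPACT). The helper name find_recurrent_sites is hypothetical, and "recurrent" is read here as "seen in at least N distinct samples":

```python
from collections import defaultdict

# Filters taken from the proposal above.
KEEP_IMPACT = {"HIGH", "MODERATE"}
KEEP_CLASSIFICATION = {
    "Missense_Mutation", "Splice_Region", "In_Frame_Del", "Frame_Shift_Del",
    "Splice_Site", "Nonsense_Mutation", "Nonstop_Mutation", "In_Frame_Ins",
    "Frame_Shift_Ins",
}


def find_recurrent_sites(maf_rows, genes, min_samples=2):
    """Return {site: sorted sample list} for deleterious sites in `genes`
    that are seen in at least `min_samples` distinct samples.

    Hypothetical helper; `genes` would be the oncogene + TSG list.
    """
    samples_by_site = defaultdict(set)
    for row in maf_rows:
        if row["IMPACT"] not in KEEP_IMPACT:
            continue
        if row["Variant_Classification"] not in KEEP_CLASSIFICATION:
            continue
        if row["Hugo_Symbol"] not in genes:
            continue
        # A site is the gene plus its genomic position and alleles.
        site = (row["Hugo_Symbol"], row["Chromosome"], int(row["Start_Position"]),
                row["Reference_Allele"], row["Tumor_Seq_Allele2"])
        # A set means one sample reported by several callers counts once.
        samples_by_site[site].add(row["Tumor_Sample_Barcode"])
    return {site: sorted(samples) for site, samples in samples_by_site.items()
            if len(samples) >= min_samples}
```

Because samples are collected in a set, N>=2 means two different samples rather than two callers agreeing on one sample; the rows would ideally be restricted to independent specimens first.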

For hot regions, we will create a BED file of all kinase domain coordinates in which to scavenge back deleterious alterations.

For TSGs, we will include the gene as the "region" and scavenge back deleterious alterations that occur within these genes. We may either add these genes to the region list or simply handle this as a step in #819.
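A minimal sketch of the hot-region check, assuming the kinase-domain BED file has been read as lines (BED intervals are 0-based, half-open, while MAF Start_Position is 1-based) and that TSGs are matched by gene symbol, following the "gene as the region" idea above. The helper names are hypothetical, and "deleterious" is simplified here to the VEP IMPACT filter:

```python
KEEP_IMPACT = {"HIGH", "MODERATE"}


def parse_bed(lines):
    """Parse BED lines into (chrom, start, end, name) tuples, skipping headers."""
    regions = []
    for line in lines:
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        fields = line.rstrip("\n").split("\t")
        name = fields[3] if len(fields) > 3 else ""
        regions.append((fields[0], int(fields[1]), int(fields[2]), name))
    return regions


def in_kinase_domain(chrom, pos_1based, regions):
    """True if a 1-based MAF position falls in any 0-based half-open BED interval.

    Only the start position is checked; indels spanning a boundary are not handled.
    """
    pos0 = pos_1based - 1
    return any(c == chrom and start <= pos0 < end for c, start, end, _ in regions)


def in_hot_region(row, kinase_regions, tsg_genes):
    """A deleterious variant is in a hot region if it hits a TSG or a kinase domain."""
    if row["IMPACT"] not in KEEP_IMPACT:
        return False
    if row["Hugo_Symbol"] in tsg_genes:
        return True  # the whole TSG gene body acts as the region
    return in_kinase_domain(row["Chromosome"], int(row["Start_Position"]), kinase_regions)
```

The linear scan keeps the sketch short; with thousands of domains, an interval tree or sorted intervals per chromosome would be the usual choice.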

What input data are required for this analysis?

pbta-snv-lancet.vep.maf.gz
pbta-snv-mutect2.vep.maf.gz
pbta-snv-strelka2.vep.maf.gz
pbta-snv-vardict.vep.maf.gz
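A minimal sketch for loading the four per-caller MAFs, assuming they sit together in one directory and keep the "#version" comment line that vcf2maf writes above the header. The data_dir default and the helper names are hypothetical:

```python
import csv
import gzip

CALLERS = ("lancet", "mutect2", "strelka2", "vardict")


def read_maf(handle):
    """Parse MAF text lines into dicts, skipping '#' comment lines such as '#version'."""
    data_lines = (line for line in handle if not line.startswith("#"))
    return list(csv.DictReader(data_lines, delimiter="\t"))


def load_all_callers(data_dir="."):
    """Read pbta-snv-<caller>.vep.maf.gz for each caller into {caller: rows}."""
    rows_by_caller = {}
    for caller in CALLERS:
        path = f"{data_dir}/pbta-snv-{caller}.vep.maf.gz"
        # "rt" opens the gzip stream in text mode so csv can read it.
        with gzip.open(path, "rt") as handle:
            rows_by_caller[caller] = read_maf(handle)
    return rows_by_caller
```

Keeping the rows grouped by caller preserves the information needed later to count how many callers support each variant.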

How long do you expect is needed to complete the analysis? Will it be a multi-step analysis?

1 week, yes

Who will complete the analysis (please add a GitHub handle here if relevant)?

@kgaonkar6

What relevant scientific literature relates to this analysis?

Chang et al., 2016 and 2017

kgaonkar6 · 2021-02-09T16:50:53Z

In addition to the oncogenes and TSGs from genereferencelist.tsv, we will also add the brain gene list: https://chopri.box.com/s/tqfq8whojsgg6htoz4begrgmpqyy3adt.

kgaonkar6 · 2021-02-19T21:48:45Z

Per today's meeting, adding a filter for independent samples when checking for recurrence:
Use independent-specimens.wgswxs.primary-plus.tsv, which includes primary tumors when available and otherwise selects other tumor_descriptor types.
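independent-specimens.wgswxs.primary-plus.tsv is a precomputed file; the sketch below only illustrates the selection rule described above (one specimen per participant, preferring a primary tumor) and then restricts MAF rows to the selected specimens. The keys participant, biospecimen and tumor_descriptor, the "Primary Tumor" label, and the lowest-ID tie-break are assumptions, not the logic used to build that file:

```python
def select_independent(specimens):
    """Pick one biospecimen per participant, preferring primary tumors.

    `specimens` is a list of dicts with hypothetical keys
    participant, biospecimen and tumor_descriptor.
    """
    by_participant = {}
    for spec in specimens:
        by_participant.setdefault(spec["participant"], []).append(spec)
    chosen = set()
    for specs in by_participant.values():
        primary = [s for s in specs if s["tumor_descriptor"] == "Primary Tumor"]
        # Fall back to any other tumor_descriptor when no primary tumor exists.
        pool = primary if primary else specs
        # Deterministic tie-break (assumption): lowest biospecimen ID wins.
        chosen.add(min(s["biospecimen"] for s in pool))
    return chosen


def restrict_to_independent(maf_rows, chosen):
    """Keep only MAF rows whose Tumor_Sample_Barcode is an independent specimen."""
    return [row for row in maf_rows if row["Tumor_Sample_Barcode"] in chosen]
```

Restricting before counting recurrence keeps multiple tumors from the same patient from inflating N.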

kgaonkar6 · 2021-02-22T17:25:03Z

Just wanted to clarify for the kinase domain (the same seems to apply to TSG gene-body mutations): do we want to output only sites that are in neither the COSMIC Census nor the MSKCC hotspots, so that these novel sites can be added to the hotspot list?

jharenza · 2021-02-22T18:31:38Z

For these "regions", I do not envision adding them to the hotspot list, because I expect the mutations there to be largely unique; rather, we should capture them in any sample. For instance, if we see a non-canonical kinase domain mutation in sample X that is called by only 2 of 3 callers, we should scavenge it back. Does that make sense?
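A minimal sketch of the "only in 2/3 callers" case, assuming per-caller MAF rows grouped as {caller: rows} and a region predicate such as a kinase-domain or TSG check. The variant key, the helper names, and the rule shown (scavenge a region variant supported by at least one caller but fewer than the consensus count, 3 here to match the example) are one hypothetical reading of the comment above:

```python
from collections import defaultdict


def variant_key(row):
    """Identify a call by sample, position and alleles (hypothetical key)."""
    return (row["Tumor_Sample_Barcode"], row["Chromosome"], int(row["Start_Position"]),
            row["Reference_Allele"], row["Tumor_Seq_Allele2"])


def callers_per_variant(rows_by_caller):
    """Map each variant key to the set of callers that reported it."""
    support = defaultdict(set)
    for caller, rows in rows_by_caller.items():
        for row in rows:
            support[variant_key(row)].add(caller)
    return support


def scavenge_candidates(rows_by_caller, in_region, consensus=3, min_support=1):
    """Return sorted keys of region variants that missed consensus
    but were reported by at least `min_support` callers."""
    support = callers_per_variant(rows_by_caller)
    region_keys = {variant_key(row)
                   for rows in rows_by_caller.values()
                   for row in rows if in_region(row)}
    return sorted(k for k in region_keys if min_support <= len(support[k]) < consensus)
```

Variants that already reach consensus are excluded because they are kept by the standard calls; only the sub-consensus calls inside a region need scavenging.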

jharenza · 2021-02-22T18:32:15Z

Also noting here that @adamcresnick suggests we add a list of developmental genes to the brain-goi list. I do not have this collated yet, so I am marking it as a future to-do.

kgaonkar6 · 2021-02-22T18:36:16Z

But do we want to scavenge back any mutation in a domain region, or only novel ones (sites not overlapping MSKCC hotspots, in genes not in the COSMIC Census)? It seems to me that we would already use the MSKCC hotspots and the COSMIC Census gene list to scavenge back those mutations.

jharenza · 2021-02-22T18:40:59Z

Correct, you can make it a novel set. My point was that I don't think we need to make a list; rather, we need a region in which we look for deleterious mutations that, if not captured in previous steps (i.e., MSKCC/COSMIC), we would scavenge.
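The ordering described above can be sketched as a tiered check: a variant already captured by the MSKCC hotspot or COSMIC Census step is attributed to that step, and only otherwise does the region check apply. The hotspot format (gene plus amino-acid position), parsing the position from the HGVSp_Short column, and the tier labels are all assumptions for illustration:

```python
import re

# First amino-acid position in HGVSp_Short, e.g. "p.V600E" -> 600.
AA_POS = re.compile(r"p\.\D*?(\d+)")


def protein_position(hgvsp_short):
    """Return the first amino-acid position in an HGVSp_Short string, or None."""
    match = AA_POS.match(hgvsp_short or "")
    return int(match.group(1)) if match else None


def capture_tier(row, mskcc_hotspots, census_genes, in_region):
    """Return which step captures a deleterious variant, or None.

    mskcc_hotspots: set of (gene, amino-acid position) pairs (assumed format).
    census_genes: set of COSMIC Cancer Gene Census gene symbols.
    in_region: predicate for the kinase-domain / TSG region check.
    """
    gene = row["Hugo_Symbol"]
    if (gene, protein_position(row.get("HGVSp_Short"))) in mskcc_hotspots:
        return "mskcc_hotspot"
    if gene in census_genes:
        return "cosmic_census_gene"
    if in_region(row):
        return "region"  # reached only when the earlier steps missed it
    return None
```

Evaluating the tiers in order means the "region" label marks exactly the novel set discussed above, without building a separate list of sites.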

jharenza · 2021-03-01T18:57:19Z

I am going to close this issue in favor of simplifying to the original issue, #819. Determining novel recurrent hotspots may be beyond the scope of this manuscript and may require bench validation, since most of the VAFs are very low.

jharenza added proposed analysis and in progress labels Feb 3, 2021

kgaonkar6 mentioned this issue Feb 9, 2021

#932 Part1 : Recurrence snv/indels in OpenPBTA using all calls from strelka2,mutect2,vardict and lancet #938

Closed

kgaonkar6 mentioned this issue Feb 11, 2021

#932 Part N Kinase domain overlap as hotspot #940

Closed

kgaonkar6 mentioned this issue Feb 18, 2021

Update setup db #946

Merged

This was referenced Feb 22, 2021

Part 1 #819 Combine snv per caller and filter to scavenge hotspots #947

Closed

Part 2 #932 Checking recurrence in combined snv calls #948

Closed

jharenza closed this as completed Mar 1, 2021
