This repository has been archived by the owner on Jun 21, 2023. It is now read-only.

Proposed Analysis: Generate hotspot and hot region lists #932

Closed
jharenza opened this issue Feb 3, 2021 · 8 comments
Labels
in progress Someone is working on this issue, but feel free to propose an alternative approach! proposed analysis

Comments

@jharenza
Collaborator

jharenza commented Feb 3, 2021

What are the scientific goals of the analysis?

Generate a hotspot and hot region file for #819.

What methods do you plan to use to accomplish the scientific goals?

After some discussion with David Wheeler (formerly BCM, now St. Jude), we have decided to derive a cancer hotspot (oncogene) and hot region file with which to recover potentially missed oncogenic DNA alterations. The rationale is that many oncogenes are mutated recurrently at the same nucleotide(s), whereas tumor suppressor genes (TSGs) frequently have a non-recurrent mutation in a functional domain or a truncating alteration, leading to inactivation of the gene. In addition, kinase genes often have recurrent or non-recurrent alterations within the kinase domain, leading to activation of the kinase.

We will first search for deleterious recurrent point mutations (N>=2 within the OpenPBTA cohort) in the annotated oncogene and TSG lists from fusion_filtering here, using these filters:

Filter IMPACT = 'HIGH|MODERATE'
Filter Variant_Classification = 'Missense_Mutation|Splice_Region|In_Frame_Del|Frame_Shift_Del|Splice_Site|Nonsense_Mutation|Nonstop_Mutation|In_Frame_Ins|Frame_Shift_Ins'

The goal is to determine whether we are missing any pediatric brain-specific hotspots from the cancer hotspots v2 file found here. We will inspect those and potentially add them to the list of hotspots.
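The recurrence search described above could be sketched roughly as follows. This is only an illustration, not the analysis code: it assumes MAF rows have already been read into dictionaries keyed by standard MAF column names (`Hugo_Symbol`, `Chromosome`, `Start_Position`, `Tumor_Sample_Barcode`, `IMPACT`, `Variant_Classification`), it uses the standard MAF spelling `Frame_Shift_Ins`, and it treats a "site" as gene plus chromosome plus start position, which is an assumption. The function name `recurrent_sites` is hypothetical.

```python
import re
from collections import defaultdict

# Filters from the proposal, applied as whole-value regex matches.
IMPACT_RE = re.compile(r"HIGH|MODERATE")
CLASS_RE = re.compile(
    r"Missense_Mutation|Splice_Region|In_Frame_Del|Frame_Shift_Del|Splice_Site|"
    r"Nonsense_Mutation|Nonstop_Mutation|In_Frame_Ins|Frame_Shift_Ins"
)


def recurrent_sites(rows, genes_of_interest, min_samples=2):
    """Count distinct samples per site and keep sites seen in >= min_samples.

    rows: iterable of dicts with MAF column names (assumed already parsed).
    genes_of_interest: set of gene symbols (e.g. oncogene/TSG lists).
    Returns {(gene, chromosome, start_position): n_samples}.
    """
    samples_by_site = defaultdict(set)
    for row in rows:
        if row["Hugo_Symbol"] not in genes_of_interest:
            continue
        if not IMPACT_RE.fullmatch(row["IMPACT"]):
            continue
        if not CLASS_RE.fullmatch(row["Variant_Classification"]):
            continue
        # Assumption: a site is defined by gene + chromosome + start position.
        site = (row["Hugo_Symbol"], row["Chromosome"], row["Start_Position"])
        samples_by_site[site].add(row["Tumor_Sample_Barcode"])
    return {
        site: len(samples)
        for site, samples in samples_by_site.items()
        if len(samples) >= min_samples
    }
```

Counting distinct sample barcodes rather than rows keeps a site from looking recurrent just because one sample appears several times in the input.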

For hot regions, we will create a BED file of all kinase domain coordinates in which to scavenge back deleterious alterations.
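As a rough sketch of how such a BED file might be used, the snippet below parses BED lines and checks whether a variant position falls inside any region. The helper names `load_bed` and `regions_hit` are hypothetical, and the only format facts relied on are that BED intervals are 0-based and half-open while MAF `Start_Position` values are 1-based.

```python
from collections import defaultdict


def load_bed(lines):
    """Parse BED lines into {chrom: [(start, end, name), ...]}.

    BED coordinates are 0-based, half-open: [start, end).
    """
    regions = defaultdict(list)
    for line in lines:
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        chrom, start, end, *rest = line.rstrip("\n").split("\t")
        name = rest[0] if rest else ""
        regions[chrom].append((int(start), int(end), name))
    return dict(regions)


def regions_hit(regions, chrom, pos_1based):
    """Return names of regions containing a 1-based position (as in a MAF)."""
    pos0 = pos_1based - 1  # convert to 0-based to compare against BED
    return [
        name
        for start, end, name in regions.get(chrom, [])
        if start <= pos0 < end
    ]
```

A linear scan per chromosome is enough for a list of kinase domains; an interval tree would only matter for much larger region sets.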

For TSGs, we will include the gene as the "region" and scavenge back deleterious alterations that occur within these genes. We may either add these to the region list or just add this as a step in #819.

What input data are required for this analysis?

pbta-snv-lancet.vep.maf.gz
pbta-snv-mutect2.vep.maf.gz
pbta-snv-strelka2.vep.maf.gz
pbta-snv-vardict.vep.maf.gz

How long do you expect is needed to complete the analysis? Will it be a multi-step analysis?

1 week, yes

Who will complete the analysis (please add a GitHub handle here if relevant)?

@kgaonkar6

What relevant scientific literature relates to this analysis?

Chang et al., 2016 and 2017

@jharenza added the "proposed analysis" and "in progress" labels Feb 3, 2021
@kgaonkar6
Collaborator

In addition to the oncogenes and TSGs from genereferencelist.tsv, we will also add the brain gene list: https://chopri.box.com/s/tqfq8whojsgg6htoz4begrgmpqyy3adt

@kgaonkar6
Collaborator

From today's meeting: adding a filter for independent samples when checking for recurrence.
Use independent-specimens.wgswxs.primary-plus.tsv to include primary tumors when available; if not, select other tumor_descriptor types.
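A minimal sketch of that filter, assuming the independent-specimens file is tab-separated with a biospecimen ID column named `Kids_First_Biospecimen_ID` and that those IDs match `Tumor_Sample_Barcode` in the MAF (both are assumptions to check against the actual files). The function names are hypothetical.

```python
import csv


def independent_ids(tsv_handle, id_column="Kids_First_Biospecimen_ID"):
    """Read the independent-specimens TSV and return the set of specimen IDs.

    id_column is an assumed column name; adjust it to the real header.
    """
    reader = csv.DictReader(tsv_handle, delimiter="\t")
    return {row[id_column] for row in reader}


def keep_independent(maf_rows, ids):
    """Keep only MAF rows whose tumor sample is in the independent set."""
    return [row for row in maf_rows if row["Tumor_Sample_Barcode"] in ids]
```

Applying this before counting recurrence means each participant contributes at most one specimen, so a site is not counted twice for the same patient.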

@kgaonkar6
Collaborator

Just wanted to clarify: for the kinase domain (it seems the same also applies to TSG gene body region mutations), do we want to retain only sites that are in neither Cosmic Census nor the MSKCC hotspots as outputs, so that these novel sites can be added to the hotspot list?

@jharenza
Collaborator Author

> Just wanted to clarify: for the kinase domain (it seems the same also applies to TSG gene body region mutations), do we want to retain only sites that are in neither Cosmic Census nor the MSKCC hotspots as outputs, so that these novel sites can be added to the hotspot list?

For these "regions", I do not envision them being added to the hotspot list because I think there will be a lot of uniqueness here, but rather we capture those for any sample. For instance, if we see a non-canonical kinase domain mutation in sample X, but it is only in 2/3 callers, we should scavenge that back. Does that make sense?
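The scavenging idea in this reply might look something like the sketch below. It assumes each caller's output has been reduced to a set of variant keys `(sample, chrom, pos, ref, alt)`, that "already captured" means every caller reported the variant, and that region membership is supplied as a predicate. None of these choices come from the issue itself, and `caller_support` and `scavenge` are hypothetical names.

```python
from collections import defaultdict


def caller_support(calls_by_caller):
    """Map each variant key to the set of callers that reported it."""
    support = defaultdict(set)
    for caller, keys in calls_by_caller.items():
        for key in keys:
            support[key].add(caller)
    return support


def scavenge(calls_by_caller, in_region, min_callers=2):
    """Return variants inside a region with partial caller support.

    A variant is scavenged when at least min_callers, but not all callers,
    reported it (assumption: full agreement is already kept upstream).
    in_region: function taking a variant key and returning True/False.
    """
    n_callers = len(calls_by_caller)
    support = caller_support(calls_by_caller)
    return sorted(
        key
        for key, callers in support.items()
        if min_callers <= len(callers) < n_callers and in_region(key)
    )
```

With three callers and `min_callers=2`, this returns exactly the "2/3 callers" case described above for variants that fall in a kinase domain or TSG region.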

@jharenza
Collaborator Author

Also adding here that @adamcresnick suggests we add to the brain-goi list a list of developmental genes. I do not have this collated, so marking this as a future to-do.

@kgaonkar6
Collaborator

But do we want to keep any mutation in the domain region to scavenge back, or only scavenge back novel ones (sites not overlapping MSKCC and genes not in the Cosmic Census gene list)? It seems to me we would already use the MSKCC and Cosmic Census gene lists as hotspots to scavenge back any mutations.

@jharenza
Collaborator Author

> But do we want to keep any mutation in the domain region to scavenge back, or only scavenge back novel ones (sites not overlapping MSKCC and genes not in the Cosmic Census gene list)? It seems to me we would already use the MSKCC and Cosmic Census gene lists as hotspots to scavenge back any mutations.

Correct, you can make it a novel set, but my point was that I don't think we need to make a list. Rather, we define a region in which we look for deleterious mutations and scavenge any that were not captured in previous steps (i.e., MSKCC/Cosmic).

@jharenza
Collaborator Author

jharenza commented Mar 1, 2021

I am going to close this issue in favor of simplifying to the original issue, #819. Determining novel recurrent hotspots may be beyond the scope of this manuscript and may require some bench validation, since most of the VAFs are very low.

@jharenza closed this as completed Mar 1, 2021