This repository has been archived by the owner on Jun 21, 2023. It is now read-only.

Proposed Analysis: Molecularly subtype high-grade glioma tumors #249

Closed
jharenza opened this issue Nov 8, 2019 · 8 comments
Labels
  • cnv: Related to or requires CNV data
  • in progress: Someone is working on this issue, but feel free to propose an alternative approach!
  • molecular subtyping: Related to molecular subtyping of tumors
  • proposed analysis
  • snv: Related to or requires SNV data
  • transcriptomic: Related to or requires transcriptomic data

Comments

@jharenza
Collaborator

jharenza commented Nov 8, 2019

Scientific goals

What are the scientific goals of the analysis?
Subtype high-grade gliomas (high-grade astrocytic tumors, HGAT) according to their defining histone lesion.

Note: historically, the initiator methionine (amino acid 1) was not counted in histone protein nomenclature, so the mutations commonly reported as K27 and G34 correspond to K28 and G35 here. See:
ActaNeuropathol.2018Leske.pdf
Our data map to K28 and G35, consistent with COSMIC and PedcBioPortal.

  • Fun fact: this was also the case with BRAF V600E (it was historically called V599E, but the community adopted the current nomenclature).
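The numbering offset described in the note can be captured in a small helper. This is only a minimal sketch: the function name `legacy_to_current` is hypothetical, and it assumes the mutation is written as a one-letter reference residue, a position, and a one-letter alternate residue (e.g. K27M).

```python
import re


def legacy_to_current(mutation: str) -> str:
    """Shift a legacy histone mutation label by +1 to count the initiator methionine."""
    match = re.fullmatch(r"([A-Z])(\d+)([A-Z])", mutation)
    if match is None:
        raise ValueError(f"unrecognized mutation label: {mutation!r}")
    ref, position, alt = match.groups()
    return f"{ref}{int(position) + 1}{alt}"


print(legacy_to_current("K27M"))  # K28M
print(legacy_to_current("G34R"))  # G35R
```

A helper like this is mainly useful when cross-referencing literature that uses the legacy K27/G34 labels against calls that use the K28/G35 positions.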

Proposed methods

What methods do you plan to use to accomplish the scientific goals?

  1. Review of mutation, copy number, and expression data.
  2. Render results in tabular form in a notebook.

Any sample that harbors H3F3A K28M, H3F3A G35R/V, or HIST1H3B K28M should be classified as a high-grade glioma if it is not already, as this is a defining lesion. E.g., three former PNET tumors were re-classified as such.
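One way to screen every sample for these defining lesions, regardless of its current histology, is sketched below. It assumes the mutation calls are available as rows with the standard MAF fields `Tumor_Sample_Barcode`, `Hugo_Symbol`, and `HGVSp_Short`; whether the consensus file uses exactly these columns is an assumption, and the function name and the barcodes in the example are placeholders.

```python
# Defining lesions from the paragraph above, written as (gene, HGVSp_Short) pairs.
DEFINING_LESIONS = {
    ("H3F3A", "p.K28M"),
    ("H3F3A", "p.G35R"),
    ("H3F3A", "p.G35V"),
    ("HIST1H3B", "p.K28M"),
}


def samples_with_defining_lesion(maf_rows):
    """Return {sample barcode: sorted list of defining lesions found in that sample}."""
    hits = {}
    for row in maf_rows:
        key = (row["Hugo_Symbol"], row["HGVSp_Short"])
        if key in DEFINING_LESIONS:
            hits.setdefault(row["Tumor_Sample_Barcode"], set()).add(" ".join(key))
    return {sample: sorted(lesions) for sample, lesions in hits.items()}


# Placeholder rows for illustration only (not real data).
example_rows = [
    {"Tumor_Sample_Barcode": "BS_AAAAAAAA", "Hugo_Symbol": "H3F3A", "HGVSp_Short": "p.K28M"},
    {"Tumor_Sample_Barcode": "BS_BBBBBBBB", "Hugo_Symbol": "TP53", "HGVSp_Short": "p.R175H"},
]
print(samples_with_defining_lesion(example_rows))
```

Samples returned here that are not annotated as high-grade glioma in the histologies file would be the candidates for re-classification.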

H3 K28 mutant

  • These tumors contain H3F3A K28M or HIST1H3B K28M mutations
  • Co-occurring lesions include: ACVR1, TP53, ATRX mutations; PDGFRA amplification; PTEN loss
  • Mutually exclusive lesions: FGFR1 mutations/fusions (thalamic); IDH1 mutations; BRAF V600E (low-grade gliomas)
  • Average age of 9 years
  • Majority should be midline localized

H3 G35 mutant

  • These tumors contain H3F3A G35R/V mutations
  • Co-occurring lesions include: ATRX/DAXX, TP53, SETD2 mutations; NTRK fusions
  • Mutually exclusive lesions: IDH1 mutations
  • Average age of 20 years

IDH mutant

  • These tumors contain IDH1 R132H mutations
  • Co-occurring lesions include: TP53 mutations; TP73-AS1 promoter methylation and downregulation
  • High expression of FOXG1 and OLIG2
  • Mutually exclusive lesions: chr7 gain and chr10 loss

H3.3 and IDH wildtype

  • High-grade gliomas lacking H3F3A and IDH mutations
  • Defining lesions: MYCN and PDGFRA amplification; TP53 and TERT mutations

1p/19q co-deleted oligodendrogliomas

  • Co-deletion of chr 1p and 19q (LOH, loss of heterozygosity of both), which arises from an unbalanced translocation, noted here as t(1p;19q)
  • Nearly all have a co-occurring IDH1 (R132) or IDH2 (R172) mutation
  • Other co-occurring lesions include: TERT promoter, CIC, and FUBP1 mutations
  • Mutually exclusive of TP53 and ATRX mutation
  • Typically occurs in adult tumors
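The subtype definitions above could be turned into a simple rule-based call, sketched below. The function name, the input shape (a set of `(gene, protein_change)` tuples per sample plus a co-deletion flag), and especially the order in which the rules are checked are assumptions made for illustration; the co-occurring and mutually exclusive lesions are not used here and would be reported alongside the call rather than decide it.

```python
def assign_subtype(mutations, codeleted_1p19q=False):
    """Assign a subtype label from defining lesions only.

    mutations: set of (gene, protein_change) tuples for one sample,
               e.g. {("H3F3A", "K28M"), ("TP53", "R175H")}.
    codeleted_1p19q: True if both chr 1p and chr 19q are lost.
    """

    def has(gene, *changes):
        return any((gene, change) in mutations for change in changes)

    # Assumed priority: histone lesions first, then co-deletion, then IDH.
    if has("H3F3A", "K28M") or has("HIST1H3B", "K28M"):
        return "H3 K28 mutant"
    if has("H3F3A", "G35R", "G35V"):
        return "H3 G35 mutant"
    if codeleted_1p19q:
        return "1p/19q co-deleted oligodendroglioma"
    if has("IDH1", "R132H"):
        return "IDH mutant"
    return "H3.3 and IDH wildtype"


print(assign_subtype({("HIST1H3B", "K28M"), ("ACVR1", "G328V")}))  # H3 K28 mutant
print(assign_subtype({("TP53", "R175H")}))  # H3.3 and IDH wildtype
```

Checking the co-deletion before the IDH rule keeps co-deleted tumors, which nearly always carry an IDH mutation, out of the plain IDH mutant group.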

Required input data

What input data will you use for this analysis?
Histologies file (contains brain regions classified for high-grade gliomas as midline, hemispheric, mixed, or other), SNVs, copy number, and RNA expression data.

Proposed timeline

What is the timeline for the analysis?
1 week

Relevant literature

If there is relevant scientific literature, put links to those items here.
Link to The 2016 World Health Organization Classification of Tumors of the Central Nervous System: a summary
Link to Clinical features, diagnosis, and pathology of IDH-mutant, 1p/19q-codeleted oligodendrogliomas

@jharenza added the proposed analysis and ticket in progress (still working on this ticket - may change) labels Nov 8, 2019
@jharenza
Collaborator Author

jharenza commented Nov 8, 2019

@adamcresnick, @awaanders, @jainpayal022 - do these look like reasonable ways to subtype the high-grade gliomas by lesion?

@jharenza changed the title from "Proposed Analysis: Subtype high-grade glioma tumors" to "Proposed Analysis: Molecularly subtype high-grade glioma tumors" Nov 8, 2019
@jharenza removed the ticket in progress (still working on this ticket - may change) label Nov 8, 2019
@jharenza
Collaborator Author

jharenza commented Nov 8, 2019

Got confirmation from @awaanders that these look good. Thanks!

@jaclyn-taroni added the cnv, snv, transcriptomic, and molecular subtyping labels Nov 10, 2019
@jaclyn-taroni
Member

Any sample that harbors H3F3A K28M, H3F3A G35R/V, or HIST1H3B K28M, if not classified as a high-grade glioma, should be, as this is a defining lesion. Eg: three former PNET tumors were re-classified as such.

For this ticket, the first step should be looking at these lesions in all samples.

@cbethell
Contributor

I am planning to work on this ticket.
I have drafted a general work plan to produce the following information in tabular format:

| Kids_First_Participant_ID | Kids_First_Biospecimen_ID | age at diagnosis (days) | reported gender | glioma brain region | FOXG1 expression z-score | ... | ... | Co-occurring lesions | Focal CN status | Chromosomal gain/loss |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| PT_XXXXXXXX | BS_XXXXXXXX | 800 | Female | midline | 4.235 | ... | ... | ACVR1, TP53 | loss | chr7 gain |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |

Notes on columns

  • age at diagnosis (days), reported gender, glioma brain region can all be obtained from the pbta-histologies.tsv file.
  • For any of the genes mentioned in terms of overexpression, a column that contains the z-scores for samples should be included.
    • To obtain z-scores, I would expect an analyst to filter an expression matrix to only high-grade glioma samples, log2(x + 1)-transform it, and then z-score the rows.
  • The co-occurring lesions can be determined using the data/pbta-snv-consensus-mutation.maf.tsv.gz file.
  • For focal copy number alterations, the files in analyses/focal-cn-file-preparation/results can be used. See also: SMARCB1 deletions in ATRT with current SEG to gene mapping #217
  • To my knowledge, the structural variant data is not yet in an easily consumable format analogous to the focal CN files above.
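The z-score step in the notes above can be sketched as follows. This assumes the expression matrix has already been filtered to the samples of interest and is held as a nested dict of `{gene: {sample: value}}`. The function name is hypothetical, and using the sample standard deviation (n - 1 denominator) is an assumption rather than something specified in this ticket.

```python
import math
import statistics


def zscore_rows(expression):
    """log2(x + 1)-transform each gene (row), then z-score it across samples.

    expression: {gene: {sample: expression value}}, at least 2 samples per gene.
    Returns the same nested shape holding z-scores.
    """
    result = {}
    for gene, by_sample in expression.items():
        logged = {sample: math.log2(value + 1) for sample, value in by_sample.items()}
        mean = statistics.mean(logged.values())
        sd = statistics.stdev(logged.values())  # sample SD, n - 1 denominator
        result[gene] = {
            # A gene with no variation gets 0.0 rather than a division by zero.
            sample: (x - mean) / sd if sd > 0 else 0.0
            for sample, x in logged.items()
        }
    return result


# Placeholder values for illustration only.
scores = zscore_rows({"FOXG1": {"BS_1": 0.0, "BS_2": 1.0, "BS_3": 3.0}})
print(scores["FOXG1"])
```

The resulting per-gene z-scores would populate columns such as "FOXG1 expression z-score" in the table.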

Note: As mentioned in the previous comment, the first step will be to look at the lesions in all samples. I will be joining the files mentioned above using the sample_id variable from the histologies file.

@jharenza
Collaborator Author

@cbethell - thanks for working on this! I don't think we need translocations; however, there are deletions and duplications in that manta file that may be useful. This file should be annotated for SV type and has a field for gene annotation, so it should be a bit more straightforward to use than the CNV files. Regarding reported_gender, I would suggest using germline_sex_estimate instead. While these estimates are annotated only for the Normal WGS files, we should be able to use them at the patient level.
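A rough sketch of carrying germline_sex_estimate from the Normal rows over to a participant's tumor rows is below. It assumes the histologies file has been read into a list of dicts and that it has a `sample_type` column with the values "Normal" and "Tumor"; that column name and its values, like the function names, are assumptions for illustration.

```python
def sex_estimate_by_participant(histology_rows):
    """Collect the non-empty germline_sex_estimate for each participant."""
    estimates = {}
    for row in histology_rows:
        estimate = row.get("germline_sex_estimate")
        if estimate:  # assumed to be filled in only on Normal WGS rows
            estimates[row["Kids_First_Participant_ID"]] = estimate
    return estimates


def annotate_tumors(histology_rows):
    """Return tumor rows with the participant-level estimate filled in (None if absent)."""
    by_participant = sex_estimate_by_participant(histology_rows)
    return [
        {**row, "germline_sex_estimate": by_participant.get(row["Kids_First_Participant_ID"])}
        for row in histology_rows
        if row.get("sample_type") == "Tumor"
    ]
```

Participants without a Normal WGS sample end up with `None`, so those rows would need to fall back to another field or stay unannotated.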

@jaclyn-taroni
Member

Summarizing ongoing efforts and what tasks remain - initially mentioned in #420 (comment). Essentially, each data type requires a number of decisions that are best discussed and documented one data type at a time. Therefore, we've split up the work of cleaning and presenting relevant data in the following way:

  1. Subsetting files to tumor samples and using Kids_First_Biospecimen_ID to filter rather than sample_id (allows us to drop derived cell line samples) - PR 5 of n - Molecular Subtyping - HGG (Revise Subset Files) #426
  2. Cleaning copy number data - specifically, annotated CNVkit data and GISTIC broad_values_by_arm.txt data - PR 6 of n - Molecular Subtyping - HGG (Cleaning CNV data)  #427
  3. Cleaning mutation data - specifically, consensus mutation data for the 12 genes mentioned above - PR 7 of n - Molecular Subtyping - HGG (Cleaning mutation data) #428
  4. Cleaning fusion data - NTRK and FGFR1 fusions from pbta-fusion-putative-oncogenic.tsv file - PR 8 of n - Molecular Subtyping - HGG (Cleaning fusion data) #429
  5. Cleaning gene expression data - specifically FOXG1, OLIG2, and, in the absence of methylation data, TP73-AS1 for both stranded and poly-A data - needs to be addressed
  6. Joining all DNA-seq data (copy number, mutation) with relevant clinical data by Kids_First_Biospecimen_ID and removing samples with ambiguous sample_id that cannot be mapped to RNA-seq data - needs to be addressed, relies on PR 6 of n - Molecular Subtyping - HGG (Cleaning CNV data)  #427, PR 7 of n - Molecular Subtyping - HGG (Cleaning mutation data) #428
  7. Joining RNA-seq data (fusion, gene expression) to the DNA-seq/clinical table where possible to produce a final table - needs to be addressed, depends on everything above
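The two joining steps (6 and 7) might look roughly like the sketch below, using plain lists of dicts. The helper names are hypothetical, the example rows are placeholders, and treating a sample_id that appears on more than one DNA-seq row as "ambiguous" is one possible reading of step 6, not a confirmed definition.

```python
from collections import Counter


def left_join(left_rows, right_rows, key):
    """Left join two lists of dicts on `key` (assumes `key` is unique in right_rows)."""
    right_index = {row[key]: row for row in right_rows}
    return [{**row, **right_index.get(row[key], {})} for row in left_rows]


def drop_ambiguous(rows, key="sample_id"):
    """Drop rows whose `key` value occurs more than once (assumed meaning of ambiguous)."""
    counts = Counter(row[key] for row in rows)
    return [row for row in rows if counts[row[key]] == 1]


# Placeholder rows for illustration only.
clinical = [
    {"Kids_First_Biospecimen_ID": "BS_1", "sample_id": "S1"},
    {"Kids_First_Biospecimen_ID": "BS_2", "sample_id": "S2"},
    {"Kids_First_Biospecimen_ID": "BS_3", "sample_id": "S2"},
]
mutations = [{"Kids_First_Biospecimen_ID": "BS_1", "H3F3A": "K28M"}]
rna = [{"sample_id": "S1", "FOXG1_zscore": 4.235}]

# Step 6: join DNA-seq data to clinical data by biospecimen ID, then drop ambiguous IDs.
dna_table = drop_ambiguous(left_join(clinical, mutations, "Kids_First_Biospecimen_ID"))
# Step 7: attach RNA-seq derived columns by sample_id where a match exists.
final_table = left_join(dna_table, rna, "sample_id")
print(final_table)
```

Because both joins are left joins, DNA-seq samples without matching RNA-seq data stay in the table with the RNA-derived columns simply missing.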

The translocation t(1p;19q) mentioned above has not been directly addressed with the Manta data; for the moment, we use the chromosome arm info from GISTIC.
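As a stand-in for the translocation, a 1p/19q co-deletion call from arm-level values could be sketched as below. It assumes the values for one sample have been pulled from the by-arm table into a dict keyed by arm name and that negative values indicate loss. The function name, the arm labels "1p" and "19q", and the threshold of 0.0 are assumptions for illustration.

```python
def is_1p19q_codeleted(arm_values, loss_threshold=0.0):
    """Return True if both 1p and 19q fall below the (assumed) loss threshold.

    arm_values: {arm name: arm-level copy number value} for a single sample.
    An arm missing from the dict is treated as not lost.
    """
    return all(arm_values.get(arm, 0.0) < loss_threshold for arm in ("1p", "19q"))


print(is_1p19q_codeleted({"1p": -0.8, "19q": -0.7, "7p": 0.5}))  # True
print(is_1p19q_codeleted({"1p": -0.8, "19q": 0.0}))  # False
```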

@jaclyn-taroni
Member

molecular-subtyping-HGG will need to be rerun with annotated focal consensus calls (#186) and new GISTIC calls (#453).

@jaclyn-taroni
Member

I am going to close this - what is left of what has been discussed is tracked in #499 and #509 as well as the open PR #466.
