broadinstitute/FusionInspectorPaper

Supplemental Code and Data for our FusionInspector Paper

Included here are the supplemental code and data used for analyses and generating figures in our paper:

"Targeted in silico characterization of fusion transcripts in tumor and normal tissues via FusionInspector" (2023) by Brian J. Haas, Alexander Dobin, Mahmoud Ghandi, Anne Van Arsdale, Timothy Tickle, James T. Robinson, Riaz Gillani, Simon Kasif, and Aviv Regev

This work is organized into the following stages:

  • (A) Initial STAR-Fusion scan of TCGA and GTEx for fusion transcripts: TCGA (10,133 samples) and GTEx (8375 samples) are first analyzed using STAR-Fusion and recurrent fusion transcripts are identified across tumor and normal samples.

  • (B) FusionInspector exploration of recurrent fusions: FusionInspector is used to further examine a subset of samples, including 628 TCGA and 530 GTEx samples identified as having occurrences of recurrent fusions of interest. FusionInspector captures sequence and expression features for each of these fusion instances among samples, and we use those attributes to generate clusters of fusions having similar features. Interestingly, we find a cluster (C4) of fusions that is heavily enriched for known cancer fusions (found in the COSMIC fusion database). We also find smaller clusters that have features consistent with low levels of cis- or trans-splicing from highly expressed fusion partners, or with likely biological or experimental artifacts.

  • (C) Targeted screening of select C4 fusions: We selected a subset of 231 fusion gene pairs with at least one occurrence in C4, at least three occurrences overall, and with at least 30% of those occurrences found in clusters containing other COSMIC fusions. This was further supplemented with 5 additional COSMIC fusions not found in C4, giving a total of 236 fusions. Using FusionInspector in its screening modality, we targeted 2764 TCGA and 1009 GTEx samples expected to have occurrences of these 236 fusions, and each sample was screened with this exact same set of 236 fusion genes given as a panel. This yielded thousands of newly characterized occurrences of these fusions via FusionInspector. To predict which of the fusion clusters from (B) individual fusion isoforms correspond to, we built and trained a random forest classifier using the fusion features and cluster labels from (B) and applied this classifier to fusions identified here in (C), allowing us to classify individual instances as COSMIC-like, artifact-like, or belonging to another category. From these predictions, we were able to glean more insights into the general characteristics of these 236 fusion pairs, allowing us to better prioritize newly identified fusions as COSMIC-like or to discount others as more enriched in artifact-like features. We identify examples of understudied fusions that may deserve more attention in future studies of fusion transcripts in tumor and normal tissues.

  • (D) Application of FusionInspector to all 1366 TARGET pediatric cancer RNA-seq samples (1233 participants): STAR-Fusion was first used to identify candidate fusions, and FusionInspector was run subsequently to validate in silico and further characterize the fusion transcripts. Here we find again that COSMIC fusions (albeit including fusions specific to pediatric cancer samples) are enriched within a single cluster whose attributes match those of the earlier-defined C4 fusion cluster. We further investigate recurrent fusions represented within that cluster and identify additional novel fusion candidates potentially relevant to pediatric cancers.
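
The selection rule described in (C) (at least one occurrence in C4, at least three occurrences overall, and at least 30% of occurrences in COSMIC-containing clusters) can be illustrated with a small filter. This is only a minimal sketch under stated assumptions, not the code used in the paper: the function name `select_candidate_fusions`, the `(fusion, cluster)` tuple input, the cluster labels other than C4, and the placeholder fusion names are all hypothetical, and which clusters count as "containing other COSMIC fusions" is supplied by the caller.

```python
from collections import defaultdict


def select_candidate_fusions(occurrences, cosmic_clusters, target_cluster="C4",
                             min_total=3, min_cosmic_fraction=0.30):
    """Return fusion names that pass the three selection criteria.

    occurrences     -- iterable of (fusion_name, cluster_label) pairs,
                       one pair per fusion instance (hypothetical format)
    cosmic_clusters -- set of cluster labels assumed to contain COSMIC fusions
    """
    # Group the cluster label of every occurrence by fusion name.
    clusters_by_fusion = defaultdict(list)
    for fusion, cluster in occurrences:
        clusters_by_fusion[fusion].append(cluster)

    selected = []
    for fusion, clusters in clusters_by_fusion.items():
        total = len(clusters)
        in_target = clusters.count(target_cluster)
        in_cosmic = sum(1 for c in clusters if c in cosmic_clusters)
        if (in_target >= 1
                and total >= min_total
                and in_cosmic / total >= min_cosmic_fraction):
            selected.append(fusion)
    return sorted(selected)


# Toy data with placeholder fusion names (not real results).
example_occurrences = [
    ("GENEA--GENEB", "C4"), ("GENEA--GENEB", "C4"), ("GENEA--GENEB", "C1"),
    ("GENEC--GENED", "C4"), ("GENEC--GENED", "C1"),
    ("GENEE--GENEF", "C1"), ("GENEE--GENEF", "C1"), ("GENEE--GENEF", "C1"),
    ("GENEG--GENEH", "C4"), ("GENEG--GENEH", "C2"),
    ("GENEG--GENEH", "C2"), ("GENEG--GENEH", "C2"),
]
example_selected = select_candidate_fusions(example_occurrences,
                                            cosmic_clusters={"C4"})
print(example_selected)
```

In this toy input, only `GENEA--GENEB` passes: `GENEC--GENED` has fewer than three occurrences, `GENEE--GENEF` never appears in C4, and `GENEG--GENEH` has only 25% of its occurrences in a COSMIC-containing cluster.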

Supplemental Data

Supplemental Code
