Skip to content
No description, website, or topics provided.
Branch: master
Clone or download
Fetching latest commit…
Cannot retrieve the latest commit at this time.
Type Name Latest commit message Commit time
Failed to load latest commit information.
images upreg Sep 15, 2019
networks versions with no spaces in names Oct 4, 2019
resources presentation Sep 23, 2019
LICENSE Initial commit Sep 14, 2019 added parentheses Oct 7, 2019
logo2.png logo2 Sep 15, 2019
upreg.png upreg Sep 15, 2019



Inferring regulators and pathways involved in NF1 and NF2 tumors originating from Schwann cells using gene expression data


  • Neurofibromatosis Type 1 (NF1) and Neurofibromatosis Type 2 (NF2) present as specific tumor types arising from Schwann cells. Using RNA-Seq data to perform a differential expression analysis we identified signaling pathways and upstream regulators enriched for specific tumor types.

  • Using this information we then further developed tumor-specific gene/pathway networks to prioritize potentially significant biology.

  • We then mapped the significantly activated upstream regulators to drug-target data to identify compounds that may have tumor-specific activity in these diseases.

  • This work provides potential insight into the biology of specific NF1 and NF2 tumor types, prioritizes novel drug targets for further development and establishes an analytic that can be applied to further unravel the biology of other tumor types arising from different cells of origin in NF1 and NF2 patients.


Tumor development in NF1 and NF2 patients can take a number of different clinical courses based on the cell of origin (COO) for the tumor for example:

  • Plexiform Neurofibroma (PNF), Cutaneous Neurofibroma (CNF), Malignant Peripheral Nerve Sheath Tumor (MPNST) and Schwanoma arising from Schwann cells in the peripheral nervous system (PNS)
  • Glioma and Meningioma arising from the glial precursors in the central nervous system (CNS)

Due to the different COO for these tumors, the underlying biology is likely to be different. This makes therapeutic compound discovery to effectively target these tumors exceptionally difficult.

A better understanding of the biology of the different tumor types will lead to better approaches for identifying rational and effective treatment options.



We use RNA-Seq data provided by the organizers excluding the 48 new samples that were added on 09/13/19. In addition we also used two independent sets of RNA-Seq data obtained from normal Schwann cells (SRP212780 and SRP094118).

Data processing and visualization

Data from all 3 resources were merged, omitting genes that were not present in all samples. The final data set contained 17,644 genes. RNA-Seq counts K were normalized, i.e. multiplied by a sample-specific factor s to account for differences in read depths, using the median-of-ratios method. Counts where subsequently log2-transformed (after adding a constant of 0.1 in order to handle zero counts). We then performed standard Principal Componant Analysis (PCA) retaining only the 10 top components as input into t-Distributed Stochastic Neighbor Embedding (t-SNE) (default parameter settings).

Differential gene expression analysis

Differential expression analysis between selected clusters of samples (tumor vs. normal) was performed using DESeq2.

Ingenuity Knowledge Base

The Ingenuity Knowledge Base (IKB) (QIAGEN) is a large, structured collection of curated findings from the biomedical literature. IKB content is represented as a network with nodes (genes, drugs and other molecules, biological functions, diseases, and pathways) and edges (representing prior experimental observations).

Upstream Regulator Analysis

Upstream Regulator Analysis (URA) (Krämer et al. Bioinformatics. 2014 Feb 15;30(4):523-30. PMID: 24336805) based on the IKB is used to infer activation or inhibition of regulators potentially causing observed gene expression changes.


Machine learning-augmented Pathway Analysis

Machine Learning-augmented Pathway Analysis (MLPA) is used to infer weighted and signed pathway-gene associations. Its method is based on content-driven vector space embedding of genes, with gene feature vectors being used for subsequent training for pathway prediction.

Software availability and reproducibility

PCA, t-SNE, as well as data preprocessing was performed in a jupyter notebook running Python 2.7 with libraries pandas, sklearn, numpy and matplotlib. PCA and t-SNE was also run independently using OmicSoft Array Studio (QIAGEN) with similar results. DESeq2 is publicly available (R) but we used it as part of a pipeline in OmicSoft Array Studio. URA and access to the IKB is commercially available in Ingenuity Pathway Analysis (IPA) (QIAGEN). MLPA is in active development and has not yet been made publicly available. The jupyter notebook and all necessary input data is being provided as a gzipped tar bundle on github.


Our approach involved the following steps:

  1. Cluster RNA-Seq expression data using PCA and t-SNE
  2. Identify the biology that differentiates the tumor types
  3. Create tumor type specific networks to visualize the differences
  4. Identify drugs that target molecules in networks

Step 1: Cluster RNA-Seq expression data using PCA and t-SNE

Initially, the combination of RNA-Seq expression data and two sources of normal Schwann cells was clustered using PCA and t-SNE. first tsne! This first attempt highlighted a few issues:

  • Some specimens separate into clear tumor type clusters, but others do not.
  • There is ambiguity around which samples represent true ‘normal’ tissue of origin.
  • There is potential for significant separation to be based on batch effect or technical artificats.

We noted that the tumor types descended from different ancestor cells (, which could lead to problems with comparing their data, so we choose to focus on tumors originating from Schwann cells. tumor type taxonomy!

After reducing to descendants of Schwann cells, all tumor types and normal cells separate into distinct clusters. second tsne! Note that some types appear to subdivide further, so for our purposes, we split MPNST into two sub-groups. Also, both sets of normal Schwann cells clustered together, which gives confidence that the clustering is based on real biology rather than just experimental methods.

Step 2: Identify the biology that differentiates the tumor types

Five differential expression datasets were created, each of which compared normal Schwann cells to one of the tumor types, such as the following compared against the MPNST1 cluster: volcano!

Analysis of these datasets in IPA shows significant pathway and regulator differentiation between NF1 and NF2 tumors, but NF1 types appear relatively similar. 5 tumor comparison!

To further differentiate between NF1 types, we created three more differential expression datasets matching cutaneous nf against mpnst1, mpnst2, and neurofibroma each. 3 tumor comparison! Now differences between the NF1 types appear.

Step 3: Create tumor type specific networks to visualize the differences

MLPA was then used to generate tumor-context networks inferred from content. Upstream regulators are connected to pathways for different tumors, and the resulting networks depict the differences between the tumor types. networks! Signaling through these pathways potentially explains gene expression changes.

Step 4: Identify drugs that target molecules in networks

By combining the regulator results above with the Harmonized Drug Screening Data, drug targets can be matched with tumors and their effect can be predicted. drugs! The table above shows significant upstream regulators identified in different tumor types that are also targets in available drug screening data.

  • All upstream regulators are predicted to be activating except those shown in parentheses
  • Genes in bold are known to function in regulation of chromatin
  • Coloring is to highlight tumor type specificity of the upstream regulator

SMARCA4 is a particularly interesting target for NF1.


  • NF1 and NF2 are clearly differentiated.
  • This approach can discriminate between tumor types by identifying drivers and signaling cascades.
  • Content-driven Machine Learning can generate tumor-specific networks involving inferred gene-pathway associations.
  • This approach identified potential therapeutic targets, including the following SMARCA4 hypothesis:

SMARCA4 could be a potential drug target for NF1

SMARCA4, commonly thought of as a tumor suppressor, is emerging as a potential driver of tumorigenesis. The gene has the potential to drive CTNNB1, which is another NF1 upstream regulator in the results above. Compounds that target SMARCA4 are already available, so in this context, SMARCA4 is a potential drug target. smarca4!

Additional Questions:

1. What additional data would you like to have?

  • Data from uniform sources
  • Counts at transcript level
  • More samples of normal tissue data
  • Single cell data

2. What are the next rational steps?

  • Perform same analysis on glioma tumor types by comparing to their origin cells
  • Compare proposed networks to additional known research
  • Consider suggested drugs, such as those targeting SMARCA4

3. What additional tools or pipelines will be needed for those steps?

  • Normal tissue data for glioma tumor types
  • Techniques such as mean-variants modeling (e.g. voom) could be used to compare against DESeq2
  • Time :-)

4. What skills would additional collaborators ideally have?

  • Experts in NF


Data preparation

jupyter notebook for NF RNA-Seq t-SNE analysis

Differential expression datasets

Differential expression datasets between normal Schwann cells and 5 tumor types

Differential expression datasets between Cutaneous NF and other NF1 tumor types

Results of normal vs. 5 tumor types

Significant upstream regulators for 5 tumors vs. normal

Heatmap of upstream regulators for 5 tumors vs. normal

Significant pathways for 5 tumors vs. normal

Heatmap of pathways for 5 tumors vs. normal

Results of Cutaneous NF vs. other NF1 tumor types

Significant upstream regulators for NF1 comparison

Heatmap of upstream regulators for NF1 comparison

Significant pathways for NF1 comparison

Heatmap of pathways for NF1 comparison

Regulator-pathway networks

Networks depict key regulators and pathways significant to each tumor type






Matching drugs against regulators



Hackathon Outcome session, Tuesday, September 24, 2019

You can’t perform that action at this time.