<a href="https://colab.research.google.com/github/cytoscape/cytoscape-automation/blob/master/for-scripters/R/colab/jupyter-bridge-rcy3.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **Jupyter Bridge and RCy3**


You can open this notebook in Google Colab directly from GitHub (File -> Open notebook -> GitHub).

Alternatively, you can download this notebook and upload it to Google Colab (File -> Open notebook -> Upload).

<font color='red'> You do not need to run the Installation and Getting started sections if you come from the basic Jupyter Bridge and RCy3 tutorial, since you have already installed the required packages and built the connection. </font>

## **Installation**

In [None]:
if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
BiocManager::install("RCy3")
library(RCy3)
library(RColorBrewer)

Loading required package: usethis

Downloading GitHub repo cytoscape/RCy3@HEAD



R.methodsS3  (NA -> 1.8.1   ) [CRAN]
R.oo         (NA -> 1.24.0  ) [CRAN]
signal       (NA -> 0.7-6   ) [CRAN]
plyr         (NA -> 1.8.6   ) [CRAN]
XML          (NA -> 3.99-0.6) [CRAN]
R.utils      (NA -> 2.10.1  ) [CRAN]
png          (NA -> 0.1-7   ) [CRAN]
matrixStats  (NA -> 0.58.0  ) [CRAN]
BiocGenerics (NA -> 0.36.0  ) [CRAN]
uchardet     (NA -> 1.1.0   ) [CRAN]
dplR         (NA -> 1.7.2   ) [CRAN]
graph        (NA -> 1.68.0  ) [CRAN]
igraph       (NA -> 1.2.6   ) [CRAN]
RJSONIO      (NA -> 1.3-1.4 ) [CRAN]


Installing 14 packages: R.methodsS3, R.oo, signal, plyr, XML, R.utils, png, matrixStats, BiocGenerics, uchardet, dplR, graph, igraph, RJSONIO

Installing packages into ‘/usr/local/lib/R/site-library’
(as ‘lib’ is unspecified)



✔  checking for file ‘/tmp/RtmpQV73Ms/remotes39b9a99e7/cytoscape-RCy3-b1c0087/DESCRIPTION’
─  preparing ‘RCy3’:
✔  checking DESCRIPTION meta-information
─  checking for LF line-endings in source and make files and shell scripts
─  checking for empty or unneeded directories
─  building ‘RCy3_2.11.6.tar.gz’
   


Installing package into ‘/usr/local/lib/R/site-library’
(as ‘lib’ is unspecified)



## **Getting started**


In [None]:
browserClientJs <- getBrowserClientJs()
IRdisplay::display_javascript(browserClientJs)

In [None]:
cytoscapeVersionInfo()

In [None]:
cytoscapePing()

You are connected to Cytoscape!



# **Differentially Expressed Genes Network Analysis**

## **Prerequisites**

If you haven’t already, install the [*STRINGApp*](http://apps.cytoscape.org/apps/stringapp) and [*filetransferApp*](https://apps.cytoscape.org/apps/filetransfer).

## **Background**

Ovarian serous cystadenocarcinoma is a type of epithelial ovarian cancer which accounts for ~90% of all ovarian cancers. The data used in this protocol are from [The Cancer Genome Atlas](https://www.cancer.gov/about-nci/organization/ccg/research/structural-genomics/tcga) (TCGA), in which multiple subtypes of serous cystadenocarcinoma were identified and characterized by mRNA expression.

We will focus on the differential gene expression between two subtypes, Mesenchymal and Immunoreactive.

For convenience, the data has already been analyzed and pre-filtered, using log fold change value and adjusted p-value.
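The pre-filtering step itself is not part of this notebook. As a rough sketch of what such a filter might look like, the following uses a small invented data frame with the column names that appear later in this tutorial (Gene, logFC, FDR.adjusted.Pvalue). The cutoff values are hypothetical and are not the ones used to prepare the tutorial files.

```r
# Toy differential expression table; all values are invented for illustration.
de <- data.frame(
  Gene = c(101, 102, 103, 104),
  logFC = c(2.3, -1.8, 0.2, 1.1),
  FDR.adjusted.Pvalue = c(0.001, 0.01, 0.50, 0.20)
)

# Hypothetical cutoffs (assumptions, not the thresholds used for the tutorial data).
fc.cutoff <- 1
fdr.cutoff <- 0.05

# Keep rows that pass both the fold change and the adjusted p-value cutoff.
significant <- de[abs(de$logFC) >= fc.cutoff &
                  de$FDR.adjusted.Pvalue <= fdr.cutoff, ]

# Split the significant genes by direction of change.
up <- significant[significant$logFC > 0, ]
down <- significant[significant$logFC < 0, ]
print(up$Gene)
print(down$Gene)
```

Splitting by the sign of logFC is what yields separate up-regulated and down-regulated gene lists like the ones loaded in the examples.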




## **Network Retrieval**

Many public databases and multiple Cytoscape apps allow you to retrieve a network or pathway relevant to your data. For this workflow, we will use the STRING app. Some other options include:

*   [WikiPathways](https://www.wikipathways.org/index.php/WikiPathways)
*   [NDEx](http://www.ndexbio.org/#/)
*   [GeneMANIA](https://genemania.org/)






## **Retrieve Networks from STRING**

To identify a relevant network, we will query the STRING database in two different ways:

*   Query STRING protein with the list of differentially expressed genes.
*   Query STRING disease for a keyword: ovarian cancer.

The two examples are split into two separate workflows below.



## **Example 1: STRING Protein Query Up-regulated Genes**

Load the file containing the data for up-regulated genes, TCGA-Ovarian-MesenvsImmuno-data-up.csv:

In [None]:
de.genes.up <- read.table("https://raw.githubusercontent.com/cytoscape/cytoscape-tutorials/gh-pages/protocols/data/TCGA-Ovarian-MesenvsImmuno-data-up.csv", header = TRUE, sep = "\t", quote="\"", stringsAsFactors = FALSE)

In [None]:
string.cmd = paste('string protein query query="', paste(de.genes.up$Gene, collapse = '\n'), '" cutoff=0.4  species="Homo sapiens"', sep = "")
commandsRun(string.cmd)

The resulting network will load automatically and contains up-regulated genes recognized by STRING, and interactions between them with an evidence score of 0.4 or greater.
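To see what is actually sent to the STRING app, it can help to build the command text on its own and print it. The sketch below uses three placeholder identifiers in place of the Gene column of de.genes.up; it only assembles the string and does not contact Cytoscape.

```r
# Placeholder identifiers standing in for de.genes.up$Gene (assumption).
genes <- c("672", "675", "7157")

# One identifier per line inside the quoted query argument.
query <- paste(genes, collapse = "\n")

# Same command layout as in the tutorial cell above.
string.cmd <- paste0('string protein query query="', query,
                     '" cutoff=0.4 species="Homo sapiens"')
cat(string.cmd)
```

Printing the command before passing it to commandsRun is a simple way to check quoting and spacing when a query fails.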

The network consists of one large connected component, several smaller networks, and some unconnected nodes. We will select only the connected nodes to work with for the rest of this tutorial, by creating a subnetwork based on all edges:

In [None]:
createSubnetwork(edges='all', subnetwork.name='String de genes up')
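The effect of building a subnetwork from all edges can be illustrated without Cytoscape. In this minimal sketch, a toy node list and edge table (both invented) show that only nodes appearing as an edge endpoint are kept, so unconnected nodes drop out.

```r
# Toy network: five nodes, two edges (invented for illustration).
nodes <- c("A", "B", "C", "D", "E")
edges <- data.frame(source = c("A", "B"),
                    target = c("B", "C"),
                    stringsAsFactors = FALSE)

# Nodes that take part in at least one edge.
connected <- sort(unique(c(edges$source, edges$target)))

# Nodes with no edges would be left out of an edge-based subnetwork.
isolated <- setdiff(nodes, connected)
print(connected)
print(isolated)
```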

## **Data Integration**

Next, we will import log fold changes and p-values from our TCGA dataset to create a visualization. Since the STRING network is a protein-protein network, it is annotated with protein identifiers (UniProt and Ensembl protein), as well as HGNC gene symbols. Our data from TCGA has NCBI Gene identifiers (formerly Entrez), so before importing the data we are going to use the ID Mapper functionality in Cytoscape to map the network to NCBI Gene.

In [None]:
mapped.cols <- mapTableColumn('display name', 'Human', 'HGNC', 'Entrez Gene')
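Conceptually, identifier mapping is a lookup from one naming scheme to another. The following sketch imitates that idea with a tiny hand-made lookup vector for three well-known genes; it is not how the ID Mapper is implemented, and the unknown symbol is included only to show that unmapped names come back as NA.

```r
# Hand-made lookup: HGNC symbol -> NCBI Gene (Entrez) identifier.
lookup <- c(TP53 = "7157", BRCA1 = "672", BRCA2 = "675")

# Symbols to translate; the last one is deliberately not in the lookup.
symbols <- c("BRCA1", "TP53", "NOT_A_GENE")

# Name-based indexing returns NA for symbols without a match.
entrez <- unname(lookup[symbols])
print(data.frame(symbol = symbols, entrez = entrez))
```

Nodes whose symbol cannot be mapped end up without an identifier in the new column, so they will not receive any data in the import step.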

We can now import the differential gene expression data and integrate it with the network (node) table in Cytoscape. For importing the data we will use the following mapping:

*   The key column for the network is Entrez Gene, which is the column we just added.

*   The key column for the data (de.genes.full) is Gene.




In [None]:
de.genes.full <- read.table("https://raw.githubusercontent.com/cytoscape/cytoscape-tutorials/gh-pages/protocols/data/TCGA-Ovarian-MesenvsImmuno_data.csv", header = TRUE, sep = ",", quote="\"", stringsAsFactors = FALSE)

loadTableData(de.genes.full,data.key.column="Gene",table.key.column="Entrez Gene")

You will notice two new columns (logFC and FDR.adjusted.Pvalue) in the Node Table.




In [None]:
tail(getTableColumnNames('node'))
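The key-based import can be pictured as a join between the node table and the data frame. The sketch below uses base R merge on two invented tables to show the idea; it is an analogy under that assumption, not a description of what loadTableData does internally. In this sketch the keys are converted to character so that both sides have the same type.

```r
# Toy node table with an Entrez Gene column (values invented).
node.table <- data.frame(name = c("n1", "n2", "n3"),
                         `Entrez Gene` = c("672", "7157", "999"),
                         check.names = FALSE,
                         stringsAsFactors = FALSE)

# Toy expression data keyed by Gene (values invented).
de <- data.frame(Gene = c(672, 7157, 675),
                 logFC = c(-1.2, 0.8, 2.0),
                 FDR.adjusted.Pvalue = c(0.01, 0.03, 0.001))

# Make both key columns character before joining.
de$Gene <- as.character(de$Gene)

# Keep every node; nodes without matching data get NA in the new columns.
joined <- merge(node.table, de,
                by.x = "Entrez Gene", by.y = "Gene", all.x = TRUE)
print(joined)
```

Rows in the data that match no node (Gene 675 here) are simply not used, while nodes without data keep missing values, which is why the later min and max calls use na.rm.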

## **Visualization**

Next, we will create a visualization of the imported data on the network.

In [None]:
setVisualStyle(style.name="default")
setNodeShapeDefault(new.shape="ELLIPSE", style.name = "default")
lockNodeDimensions(new.state=TRUE, style.name = "default")
setNodeSizeDefault(new.size=50, style.name = "default")
setNodeColorDefault(new.color="#D3D3D3", style.name = "default")
setNodeBorderWidthDefault(new.width=2, style.name = "default")
setNodeBorderColorDefault(new.color="#616060", style.name = "default")
setNodeLabelMapping(table.column="display name",style.name = "default")
setNodeFontSizeDefault(new.size=14, style.name = "default")

Before we create a mapping for node color representing the range of fold changes, we need the min and max of the logFC column:

In [None]:
logFC.table.up <- getTableColumns('node', 'logFC')

In [None]:
logFC.up.min <- min(logFC.table.up, na.rm = TRUE)
logFC.up.max <- max(logFC.table.up, na.rm = TRUE)
logFC.up.center <- logFC.up.min + (logFC.up.max - logFC.up.min)/2
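The three breakpoints can be checked on a small example. This sketch uses an invented one-column data frame, shaped like a single-column node table result, and shows that missing values are skipped and that the center is the midpoint of the observed range rather than zero or the median.

```r
# Toy logFC column with one missing value (values invented).
logFC.table <- data.frame(logFC = c(0.6, NA, 1.4, 3.0),
                          row.names = c("n1", "n2", "n3", "n4"))

# min() and max() accept a numeric data frame; na.rm drops the NA.
lo <- min(logFC.table, na.rm = TRUE)
hi <- max(logFC.table, na.rm = TRUE)

# Midpoint of the range, computed as in the tutorial cell.
center <- lo + (hi - lo) / 2
print(c(min = lo, center = center, max = hi))
```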

In [None]:
copyVisualStyle(from.style = "default", to.style = "de genes up")
setVisualStyle(style.name="de genes up")

data.values = c(logFC.up.min, logFC.up.center, logFC.up.max)
node.colors <- c(brewer.pal(length(data.values), "YlOrRd"))
setNodeColorMapping('logFC', data.values, node.colors, style.name="de genes up")
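A continuous color mapping assigns a color to each of the three data values and blends between them for values in between. The following base R sketch illustrates that idea, assuming linear interpolation in RGB space; the control values and the three hex colors are hard-coded stand-ins, and color.for is a hypothetical helper written for this illustration.

```r
# Hard-coded stand-ins for the three breakpoints and their colors (assumptions).
control.values <- c(0.6, 1.8, 3.0)
control.colors <- c("#FFEDA0", "#FEB24C", "#F03B20")

# colorRamp() interpolates between evenly spaced colors on [0, 1].
ramp <- colorRamp(control.colors)

# Hypothetical helper: rescale a value to [0, 1], then look up its color.
color.for <- function(x) {
  pos <- (x - min(control.values)) / diff(range(control.values))
  rgb.values <- round(ramp(pos))
  rgb(rgb.values[, 1], rgb.values[, 2], rgb.values[, 3],
      maxColorValue = 255)
}

print(color.for(c(0.6, 1.2, 1.8, 3.0)))
```

Because the middle control value is the exact midpoint of the range, the evenly spaced ramp lines up with the three breakpoints. With unevenly spaced breakpoints, each segment would need to be interpolated separately.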

Finally, apply a force-directed layout to the network:



In [None]:
layoutNetwork(paste('force-directed',
              'defaultSpringCoefficient=0.00003',
              'defaultSpringLength=50',
              'defaultNodeMass=4',
              sep=' '))
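The layout call receives a single text argument: the layout name followed by space-separated option=value pairs. As a small sketch, the same string can be assembled from a named vector, which makes the options easier to edit. The values are kept as text here so that they appear exactly as written (R would otherwise print 0.00003 in scientific notation).

```r
# Layout options as a named character vector (same values as in the tutorial).
layout.options <- c(defaultSpringCoefficient = "0.00003",
                    defaultSpringLength = "50",
                    defaultNodeMass = "4")

# Join name=value pairs with spaces and prepend the layout name.
layout.cmd <- paste("force-directed",
                    paste(names(layout.options), layout.options,
                          sep = "=", collapse = " "))
print(layout.cmd)
```

The resulting string is identical to the one built by the paste call in the tutorial cell and could be passed to layoutNetwork in the same way.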

## **Enrichment Analysis Options**

Next, we are going to perform enrichment analysis using the STRING app.



## **STRING Enrichment**
The STRING app has built-in enrichment analysis functionality, which includes enrichment for GO Process, GO Component, GO Function, InterPro, KEGG Pathways, and PFAM.

First, we will run the enrichment on the whole network, against the genome:

In [None]:
string.cmd = 'string retrieve enrichment allNetSpecies="Homo sapiens", background=genome  selectedNodesOnly="false"'
commandsRun(string.cmd)
string.cmd = 'string show enrichment'
commandsRun(string.cmd)

When the enrichment analysis is complete, a new tab titled STRING Enrichment will open in the Table Panel.

The STRING app includes several options for filtering and displaying the enrichment results. The features are all available at the top of the STRING Enrichment tab.

We are going to filter the table to only show GO Process:

In [None]:
string.cmd = 'string filter enrichment categories="GO Process", overlapCutoff = "0.5", removeOverlapping = "true"'
commandsRun(string.cmd)
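To give an intuition for what removing overlapping terms with a cutoff of 0.5 might mean, here is a minimal sketch of a greedy redundancy filter. It assumes that overlap between two terms is measured by the Jaccard index of their gene sets and that terms arrive sorted from most to least significant. Both points are assumptions made for illustration; the exact rule used by the STRING app is not stated in this notebook. The terms and genes are invented.

```r
# Toy enrichment result: term name -> genes annotated to it (invented),
# assumed to be ordered from most to least significant.
terms <- list(
  "term A" = c("g1", "g2", "g3", "g4"),
  "term B" = c("g1", "g2", "g3"),
  "term C" = c("g5", "g6")
)

# Jaccard index: shared genes divided by all genes in either set.
jaccard <- function(a, b) length(intersect(a, b)) / length(union(a, b))

overlap.cutoff <- 0.5
kept <- character(0)
for (term in names(terms)) {
  # A term is redundant if it overlaps too much with any term already kept.
  redundant <- any(vapply(kept, function(k) {
    jaccard(terms[[k]], terms[[term]]) > overlap.cutoff
  }, logical(1)))
  if (!redundant) kept <- c(kept, term)
}
print(kept)
```

In this toy input, term B shares three of four genes with term A and is dropped, while term C has no overlap and is kept.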

Next, we will add a split donut chart to the nodes representing the top terms:



In [None]:
string.cmd = 'string show charts'
commandsRun(string.cmd)

## **STRING Protein Query: Down-regulated genes**
We are going to repeat the network search, data integration, visualization and enrichment analysis for the set of down-regulated genes by using the first column of [TCGA-Ovarian-MesenvsImmuno-data-down.csv](https://cytoscape.github.io/cytoscape-tutorials/protocols/data/TCGA-Ovarian-MesenvsImmuno-data-down.csv):

In [None]:
de.genes.down <- read.table("https://cytoscape.github.io/cytoscape-tutorials/protocols/data/TCGA-Ovarian-MesenvsImmuno-data-down.csv", header = TRUE, sep = "\t", quote="\"", stringsAsFactors = FALSE)
string.cmd = paste('string protein query query="', paste(de.genes.down$Gene, collapse = '\n'), '" cutoff=0.4  species="Homo sapiens"', sep = "")
commandsRun(string.cmd)

## **Subnetwork**
Let’s select only the connected nodes to work with for the rest of this tutorial, by creating a subnetwork based on all edges:

In [None]:
createSubnetwork(edges='all', subnetwork.name='String de genes down')

## **Data integration**
Again, the identifiers in the network need to be mapped to Entrez Gene (NCBI Gene):

In [None]:
mapped.cols <- mapTableColumn('display name', 'Human', 'HGNC', 'Entrez Gene')

We can now import the data:



In [None]:
loadTableData(de.genes.full,data.key.column="Gene",table.key.column="Entrez Gene")

## **Visualization**
Next, we can create a visualization. Note that the default style has been altered in the previous example, so we can simply switch to default to get started:

In [None]:
setVisualStyle(style.name="default")

The node fill color has to be redefined for down-regulated genes:



In [None]:
logFC.table.down <- getTableColumns('node', 'logFC')

In [None]:
logFC.dn.min <- min(logFC.table.down, na.rm = TRUE)
logFC.dn.max <- max(logFC.table.down, na.rm = TRUE)
logFC.dn.center <- logFC.dn.min + (logFC.dn.max - logFC.dn.min)/2

In [None]:
copyVisualStyle(from.style = "default", to.style = "de genes down")
setVisualStyle(style.name="de genes down")

data.values = c(logFC.dn.min, logFC.dn.center, logFC.dn.max)
node.colors <- c(brewer.pal(length(data.values), "Blues"))
setNodeColorMapping('logFC', data.values, node.colors, style.name="de genes down")

Apply a force-directed layout.



In [None]:
layoutNetwork(paste('force-directed',
              'defaultSpringCoefficient=0.00003',
              'defaultSpringLength=50',
              'defaultNodeMass=4',
              sep=' '))

## **STRING Enrichment**
Now we can perform STRING Enrichment analysis on the resulting network:

In [None]:
string.cmd = 'string retrieve enrichment allNetSpecies="Homo sapiens", background=genome  selectedNodesOnly="false"'
commandsRun(string.cmd)
string.cmd = 'string show enrichment'
commandsRun(string.cmd)

Filter the analysis results for non-redundant GO Process terms only.



In [None]:
string.cmd = 'string filter enrichment categories="GO Process", overlapCutoff = "0.5", removeOverlapping = "true"'
commandsRun(string.cmd)

In [None]:
string.cmd = 'string show charts'
commandsRun(string.cmd)

## **Example 2: STRING Disease Query**
So far, we have queried the STRING database with a set of genes we knew were differentially expressed. Next, we will query the STRING disease database to retrieve a network of genes associated with ovarian cancer, which will be completely independent of our dataset.

In [None]:
string.cmd = 'string disease query disease="ovarian cancer" cutoff="0.95"'
commandsRun(string.cmd)

This will bring in the top 100 (default) ovarian cancer associated genes connected with a confidence score greater than 0.95. Again, let's extract the connected nodes:

In [None]:
createSubnetwork(edges='all', subnetwork.name='String ovarian sub')

## **Data integration**
Next we will import differential gene expression data from our TCGA dataset to create a visualization. Just like the previous example, we will need to do some identifier mapping to match the data to the network.

In [None]:
mapped.cols <- mapTableColumn("display name",'Human','HGNC','Entrez Gene')

Here we set Human as species, HGNC as Map from, and Entrez Gene as To.

We can now import the data frame with the full data (de.genes.full, already loaded in Example 1 above) into the node table in Cytoscape:

In [None]:
loadTableData(de.genes.full, data.key.column = "Gene", table = "node", table.key.column = "Entrez Gene")

## **Visualization**
Again, we can create a visualization:

In [None]:
setVisualStyle(style.name="default")

Next, we need the min and max of the logFC column:



In [None]:
logFC.table.ovarian <- getTableColumns('node', 'logFC')

In [None]:
logFC.ov.min <- min(logFC.table.ovarian, na.rm = TRUE)
logFC.ov.max <- max(logFC.table.ovarian, na.rm = TRUE)
logFC.ov.center <- logFC.ov.min + (logFC.ov.max - logFC.ov.min)/2

Let’s create the mapping:



In [None]:
copyVisualStyle(from.style = "default", to.style = "ovarian")
setVisualStyle(style.name="ovarian")

data.values = c(logFC.ov.min, logFC.ov.center, logFC.ov.max)
node.colors <- c(brewer.pal(length(data.values), "RdBu"))
setNodeColorMapping('logFC', data.values, node.colors, style.name="ovarian")

Apply a force-directed layout.



In [None]:
layoutNetwork(paste('force-directed',
              'defaultSpringCoefficient=0.00003',
              'defaultSpringLength=50',
              'defaultNodeMass=4',
              sep=' '))

TCGA found several genes that were commonly mutated in ovarian cancer, so-called “cancer drivers”. We can add information about these genes to the network visualization by changing the visual style of these nodes. Three of the most important drivers are TP53, BRCA1 and BRCA2. We will add a thicker, colored border for these genes in the network.

Select all three driver genes by:

In [None]:
selectNodes(c("TP53", "BRCA1", "BRCA2"), by.col = "display name")

$nodes
[1] 7633 7683 7672

$edges
list()
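The numbers listed under $nodes are SUIDs, the internal identifiers Cytoscape assigns to nodes. As a rough sketch of what selecting by a column means, the following looks up matching rows in an invented node table; the SUID values are made up and will differ from those in a real session.

```r
# Toy node table with SUIDs and display names (values invented).
node.table <- data.frame(SUID = c(101, 102, 103, 104),
                         `display name` = c("TP53", "EGFR", "BRCA1", "BRCA2"),
                         check.names = FALSE,
                         stringsAsFactors = FALSE)

drivers <- c("TP53", "BRCA1", "BRCA2")

# SUIDs of the rows whose display name is one of the driver genes.
selected <- node.table$SUID[node.table$`display name` %in% drivers]
print(selected)
```

A name that is absent from the table simply produces no match, so it is worth checking that the number of selected nodes equals the number of names requested.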


Add a style bypass for node Border Width (5) and node Border Paint (bright pink):



In [None]:
setNodeBorderWidthBypass(getSelectedNodes(), 5)
setNodeBorderColorBypass(getSelectedNodes(), '#FF007F')

## **Exporting Networks**
RCy3 over Jupyter Bridge does not currently support importing or exporting files.

Please use a local Cytoscape installation to import and export files.