# Protocol 1 - Find enriched pathways using gene lists from R

-----------------

## Set up environment

Gene lists can be built from any set of thresholds or parameters. Commonly, a gene list consists of the set of significantly differential genes (for example, all genes with p-value < 0.05), but it can also consist of only up-regulated or only down-regulated genes or, as in our use case, mesenchymal-specific and immunoreactive-specific genes. The interpretation of enrichment results obtained from a gene list depends heavily on the thresholds or parameters used to define the list.
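As a minimal sketch of how thresholds translate into gene lists, the snippet below filters a made-up differential expression table. The gene names, the `logFC` and `pvalue` column names, and the 0.05 cutoff are illustrative assumptions, not data from this protocol.

```r
# Toy differential expression table; all values are invented for illustration.
de_table <- data.frame(
  gene   = c("GENE_A", "GENE_B", "GENE_C", "GENE_D", "GENE_E"),
  logFC  = c(2.1, -1.7, 0.3, 1.2, -0.4),
  pvalue = c(0.001, 0.020, 0.600, 0.040, 0.300),
  stringsAsFactors = FALSE
)

p_cutoff <- 0.05  # assumed significance threshold

# All significant genes, regardless of direction
significant_genes <- de_table$gene[de_table$pvalue < p_cutoff]

# Only up-regulated or only down-regulated significant genes
up_genes   <- de_table$gene[de_table$pvalue < p_cutoff & de_table$logFC > 0]
down_genes <- de_table$gene[de_table$pvalue < p_cutoff & de_table$logFC < 0]

print(significant_genes)
print(up_genes)
print(down_genes)
```

Each of the three vectors would give a different enrichment result, which is why the definition of the list should be recorded alongside the results.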

In [15]:
## 1. Load required packages into R

In [53]:
library("gProfileR");

In [17]:
## 2. Set working directory

In [18]:
setwd("./data")

ERROR: Error in setwd("./data"): cannot change working directory
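An error like the one above usually means that `./data` does not exist relative to the current working directory (or is not accessible). A defensive sketch is shown here; `set_data_dir` is a hypothetical helper name introduced only for illustration.

```r
# Hypothetical helper: change into a directory only if it exists,
# and report the current directory otherwise.
set_data_dir <- function(path) {
  if (!dir.exists(path)) {
    message("Directory not found: ", path,
            " (current directory is ", getwd(), ")")
    return(invisible(FALSE))
  }
  setwd(path)
  invisible(TRUE)
}

# Example call with the relative path used in this protocol
set_data_dir("./data")
```

If the message is printed, check `getwd()` and adjust the path before continuing, since the later `read.table` call expects the data files in the working directory.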


## Create list of genes you wish to test for enrichment

As an example, we create a list containing all genes found to be significant in the RNA-seq mesenchymal samples (using the data generated in Supplementary Protocol 2 - Mesenchymal genes).

In [19]:
mesenchymal_genes <- read.table( "mesenvsimmuno_mesenonly_RNAseq_gprofiler.txt", header = FALSE, 
                                sep = "\t", quote="\"",  stringsAsFactors = FALSE)
mesenchymal_genes <- as.vector(t(mesenchymal_genes))

In [20]:
head(mesenchymal_genes)
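Because the g:Profiler call in this protocol uses `ordered_query`, the order of the gene vector matters, so any clean-up should preserve it. The sketch below is one possible way to do that on a toy vector; `clean_gene_list` is a hypothetical helper and the input values are invented.

```r
# Toy gene vector standing in for mesenchymal_genes; values are invented.
raw_genes <- c("BMP2", " RUNX2", "", "DLX5", "BMP2", NA, "SMAD9 ")

# Hypothetical helper: tidy a gene vector without changing its order.
clean_gene_list <- function(genes) {
  genes <- trimws(genes)                        # strip stray whitespace
  genes <- genes[!is.na(genes) & genes != ""]   # drop missing and empty entries
  genes[!duplicated(genes)]                     # keep first occurrence only
}

cleaned <- clean_gene_list(raw_genes)
print(cleaned)
```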

## Run g:Profiler

In [21]:
mesenchymal_gprofiler_results <- gprofiler(mesenchymal_genes, significant = TRUE, ordered_query = TRUE,
                                           exclude_iea = TRUE, max_set_size = 500,
                                           correction_method = "fdr",
                                           src_filter = c("GO:BP", "KEGG", "REAC"))

In [22]:
head(mesenchymal_gprofiler_results)

| | query.number | significant | p.value | term.size | query.size | overlap.size | recall | precision | term.id | domain | subgraph.number | term.name | relative.depth | intersection |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 1 | True | 0.0356 | 102 | 757 | 10 | 0.013 | 0.098 | GO:0050890 | BP | 229 | cognition | 1 | FYN,HOXA1,SOBP,DLG4,PLK2,NLGN4X,NTRK2,SHROOM4,CLDN5,PJA2 |
| 2 | 1 | True | 0.0441 | 2 | 103 | 1 | 0.01 | 0.5 | GO:0086100 | BP | 154 | endothelin receptor signaling pathway | 1 | EDNRA |
| 3 | 1 | True | 0.018 | 6 | 339 | 2 | 0.006 | 0.333 | GO:0070141 | BP | 191 | response to UV-A | 1 | CCND1,MME |
| 4 | 1 | True | 0.00813 | 23 | 454 | 4 | 0.009 | 0.174 | GO:1901522 | BP | 259 | positive regulation of transcription from RNA polymerase II promoter involved in cellular response to chemical stimulus | 1 | DLX5,SMAD9,RUNX2,BMP2 |
| 5 | 1 | True | 0.0268 | 7 | 17 | 1 | 0.059 | 0.143 | GO:0032493 | BP | 208 | response to bacterial lipoprotein | 1 | SSC5D |
| 6 | 1 | True | 0.0382 | 10 | 17 | 1 | 0.059 | 0.1 | GO:0032490 | BP | 208 | detection of molecule of bacterial origin | 1 | SSC5D |
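In the six rows shown above, the `recall` column agrees with `overlap.size / query.size` and the `precision` column with `overlap.size / term.size`, each rounded to three decimals. The check below copies the numeric columns from that output; it only describes these rows and is not a statement about how the package defines the columns in general.

```r
# Numeric columns copied from the six rows printed above
shown <- data.frame(
  term.size    = c(102, 2, 6, 23, 7, 10),
  query.size   = c(757, 103, 339, 454, 17, 17),
  overlap.size = c(10, 1, 2, 4, 1, 1),
  recall       = c(0.013, 0.010, 0.006, 0.009, 0.059, 0.059),
  precision    = c(0.098, 0.500, 0.333, 0.174, 0.143, 0.100)
)

# Recompute both ratios and compare with the reported columns
recomputed_recall    <- round(shown$overlap.size / shown$query.size, 3)
recomputed_precision <- round(shown$overlap.size / shown$term.size, 3)

print(all.equal(recomputed_recall, shown$recall))
print(all.equal(recomputed_precision, shown$precision))
```

Note that `query.size` differs from row to row, which is what one would expect when `ordered_query` is enabled and enrichment is evaluated on differently sized top portions of the ranked list.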


## Filter results

Some of the web interface parameters are not tuneable from the R package (for example, minimum term size or minimum overlap).  Filter the returned results to apply the same thresholds as applied using the web interface.

Exclude all terms with size < 3 or overlap < 2, so that only terms with at least 3 genes and an overlap of at least 2 genes are kept.

In [23]:
mesenchymal_gprofiler_results <- mesenchymal_gprofiler_results[which(mesenchymal_gprofiler_results[,'term.size'] >= 3
                                        & mesenchymal_gprofiler_results[,'overlap.size'] >= 2 ),]
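To make the effect of the filter concrete, the sketch below applies the same two thresholds to a toy table built from the term IDs and sizes in the example output. The variable names `min_term_size` and `min_overlap` are introduced here for readability and are not part of gProfileR.

```r
# Toy results table using term IDs and sizes from the example output
toy_results <- data.frame(
  term.id      = c("GO:0050890", "GO:0086100", "GO:0070141",
                   "GO:1901522", "GO:0032493", "GO:0032490"),
  term.size    = c(102, 2, 6, 23, 7, 10),
  overlap.size = c(10, 1, 2, 4, 1, 1),
  stringsAsFactors = FALSE
)

min_term_size <- 3  # thresholds named for readability (assumed names)
min_overlap   <- 2

# A term is kept only if it passes both thresholds
keep <- toy_results$term.size >= min_term_size &
        toy_results$overlap.size >= min_overlap
filtered <- toy_results[keep, , drop = FALSE]

print(filtered$term.id)
print(nrow(toy_results) - nrow(filtered))  # number of terms removed
```

Three of the six example terms are removed: one because the term itself has only 2 genes, and two because the overlap with the query is a single gene.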


## Create Enrichment Map generic results file

The Enrichment Map generic results file is the input file used to create an Enrichment Map. At a minimum, it contains the term ID, term name, p-value, q-value, phenotype, and list of genes.

In [24]:
# gProfileR returns corrected p-values only.  Set p-value to corrected p-value
mesenchymal_em_results <- cbind(mesenchymal_gprofiler_results[,c("term.id","term.name","p.value","p.value")], 1,
                                mesenchymal_gprofiler_results[,"intersection"])
colnames(mesenchymal_em_results) <- c("Name","Description", "pvalue","qvalue","phenotype","genes")

write.table(mesenchymal_em_results, "gprofiler_results_mesenonly_ordered_computedinR.txt",
            col.names = TRUE, sep = "\t", row.names = FALSE, quote = FALSE)
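A quick way to sanity-check the generic results format is to write a small table and read it back. The sketch below does this with two rows taken from the example output and a temporary file; the `toy_` names and the temporary path are assumptions made for illustration only.

```r
# Toy stand-in for the filtered g:Profiler results (two rows from the example output)
toy_results <- data.frame(
  term.id      = c("GO:0050890", "GO:0070141"),
  term.name    = c("cognition", "response to UV-A"),
  p.value      = c(0.0356, 0.018),
  intersection = c("FYN,HOXA1,SOBP,DLG4,PLK2,NLGN4X,NTRK2,SHROOM4,CLDN5,PJA2",
                   "CCND1,MME"),
  stringsAsFactors = FALSE
)

# Same six columns as the generic results file built in this protocol
toy_em <- data.frame(
  Name        = toy_results$term.id,
  Description = toy_results$term.name,
  pvalue      = toy_results$p.value,
  qvalue      = toy_results$p.value,  # corrected p-value reused for both columns
  phenotype   = 1,
  genes       = toy_results$intersection,
  stringsAsFactors = FALSE
)

out_file <- tempfile(fileext = ".txt")  # temporary path, for illustration only
write.table(toy_em, out_file, sep = "\t", row.names = FALSE,
            col.names = TRUE, quote = FALSE)

# Read the file back to confirm the header and row count
check <- read.delim(out_file, stringsAsFactors = FALSE)
print(colnames(check))
print(nrow(check))
```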


## Create Enrichment map directly from R

Create the Enrichment Map through the CyREST interface. Make sure you launch Cytoscape with `-R 1234` to enable REST functionality.

**Launch Cytoscape**

On **Windows** open a command window and run:

    cd "C:\Program Files\Cytoscape_v3.3.0"
    cytoscape.bat -R 1234

On **Mac** open a terminal window and run:

    /Applications/Cytoscape_v3.3.0/cytoscape.sh -R 1234


><span style="color:red">**The code below cannot be run from within the Docker container of this tutorial unless you have mapped localhost to the IP address of the computer you are running the Docker image from. Add the following to the docker run command (substituting your own IP address): --add-host="localhost:192.168.0.10".**</span>
* If you have Cytoscape and R running on your computer, you can run it directly from R. The R code essentially constructs a URL and then calls it to create the network in Cytoscape.
* It requires two additional libraries in R:
  * install.packages(c('RJSONIO','httr'))
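To illustrate what "constructing a URL" means here, the sketch below assembles a query URL by hand using only base R. `build_query_url` is a hypothetical helper, and the two parameters are placeholders; in the actual cells, `httr::GET` builds and encodes the query string itself from the list passed as `query`.

```r
# Hypothetical helper: append URL-encoded name=value pairs to a base URL.
build_query_url <- function(base, params) {
  encoded <- vapply(params,
                    function(v) URLencode(as.character(v), reserved = TRUE),
                    character(1))
  paste0(base, "?", paste(names(params), encoded, sep = "=", collapse = "&"))
}

# Placeholder parameters, for illustration only
example_url <- build_query_url(
  "http://localhost:1234/v1/commands/enrichmentmap/build",
  list(analysisType = "generic", pvalue = "1.0")
)
print(example_url)
```

Characters such as spaces in file paths are percent-encoded in the resulting URL, which is one reason to let a library handle the encoding rather than pasting strings together.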

In [25]:
library(RJSONIO)

library(httr)
# Basic settings
port.number = 1234
base.url = paste("http://localhost:", toString(port.number), "/v1", sep="")

print(base.url)

version.url = paste(base.url, "version", sep="/")
cytoscape.version = GET(version.url)
cy.version = fromJSON(rawToChar(cytoscape.version$content))
print(cy.version)


[1] "http://localhost:1234/v1"
      apiVersion cytoscapeVersion 
            "v1"          "3.3.0" 


<span style="color:blue">**Specify the path to your data directory by changing the variable "path_to_file".**</span><br> 
<span style="color:red">**On Windows, use / instead of the usual \ when specifying the path!**</span> 
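A small sketch of the path handling described above, using base R only. `to_forward_slashes` is a hypothetical helper and the Windows path is a made-up example.

```r
# Hypothetical helper: convert backslashes to forward slashes in a path string.
to_forward_slashes <- function(path) {
  gsub("\\", "/", path, fixed = TRUE)
}

# In R source code, a literal backslash must be written as "\\"
windows_path <- "C:\\Users\\someone\\notebooks\\data"  # made-up example path
path_to_file <- to_forward_slashes(windows_path)
print(path_to_file)

# file.path() joins components with "/" on every platform
enr_file <- file.path(path_to_file,
                      "gprofiler_results_mesenonly_ordered_computedinR.txt")
print(enr_file)
```

Using `file.path()` also avoids depending on whether `path_to_file` ends with a trailing slash, which matters when the path and file name are joined with `paste(..., sep = "")`.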

In [54]:

# to create an Enrichment map we need to specify
# analysisType = generic
# 
enrichmentmap.url <- paste(base.url, "commands","enrichmentmap","build", sep="/") 

#path_to_file="/Users/risserlin/Dropbox (Bader Lab)/Ruth Isserlin's files/Enrichment_Analyses/Jupyter_Notebooks/notebooks/data"
#on Windows change the \ to / so that the pathname is interpreted correctly.
path_to_file="C:/Users/zaphod/Ruth_dropbox/Dropbox (Bader Lab)/Ruth Isserlin's files/Enrichment_Analyses/Jupyter_Notebooks/notebooks/data/"

enr_file = paste(path_to_file,"gprofiler_results_mesenonly_ordered_computedinR.txt",sep="")

em_params <- list(analysisType = "generic",enrichmentsDataset1 = enr_file,pvalue="1.0",qvalue="0.00001",
                  #expressionDataset1 = exp_file, 
                  similaritycutoff="0.25",coeffecients="JACCARD")

response <- GET(url=enrichmentmap.url, query=em_params)

In [51]:
#get the url used to generate network.  
response$url


In [52]:
#get the content returned. If the call was successful, the message should be "finished"
content(response, "text", encoding = "ISO-8859-1")

**Go to your open instance of Cytoscape to see your results.**<BR>
The network should look similar to the figure below.

<img src="figures/gprofiler_example_network_forjupyter.png">