Merge branch 'master' of github.com:Bioconductor/BiocWorkshops

Bioconductor · Jul 22, 2018 · ddb6a05 · ddb6a05
2 parents dad707a + e005739
commit ddb6a05
Show file tree

Hide file tree

Showing 12 changed files with 400 additions and 224 deletions.
diff --git a/102_Lawrence_GenomicRanges.Rmd b/102_Lawrence_GenomicRanges.Rmd
@@ -1,6 +1,6 @@
 # 102: Solving common bioinformatic challenges using GenomicRanges
 
-## Instructor(s) name(s) and contact information
+## Instructor name and contact information
 
 * Michael Lawrence (michafla@gene.com)
 

diff --git a/103_Waldron_PublicData.Rmd b/103_Waldron_PublicData.Rmd
diff --git a/200_Law_RNAseq123.Rmd b/200_Law_RNAseq123.Rmd
@@ -89,10 +89,12 @@ Count data for these samples can be downloaded from the Gene Expression Omnibus
 
 
 ```{r setup2, eval=TRUE}
-library(limma)
-library(Glimma)
-library(edgeR)
-library(Mus.musculus)
+suppressPackageStartupMessages({
+  library(limma)
+  library(Glimma)
+  library(edgeR)
+  library(Mus.musculus)
+})
 ```
 
 ## Data packaging

diff --git a/201_Love_DESeq2.Rmd b/201_Love_DESeq2.Rmd
@@ -6,10 +6,6 @@ Authors:
     Wolfgang Huber^[EMBL Heidelberg, Germany]
 Last modified: 25 June, 2018.
 
-<!-- 
-  bookdown::render_book("201_Love_DESeq2.Rmd", "bookdown::gitbook") 
--->
-
 ## Overview
 
 ### Description

diff --git a/202_Das_SingleCellRNASeq.Rmd b/202_Das_SingleCellRNASeq.Rmd
@@ -102,18 +102,18 @@ The following packages are needed.
 
 
 ```{r packages}
-# Bioconductor
-library(BiocParallel)
-library(SingleCellExperiment)
-library(clusterExperiment)
-library(scone)
-library(zinbwave)
-library(slingshot)
-
-# CRAN
-library(gam)
-library(RColorBrewer)
-
+suppressPackageStartupMessages({
+  # Bioconductor
+  library(BiocParallel)
+  library(SingleCellExperiment)
+  library(clusterExperiment)
+  library(scone)
+  library(zinbwave)
+  library(slingshot)
+  # CRAN
+  library(gam)
+  library(RColorBrewer)
+})
 set.seed(20)
 ```
 
@@ -138,7 +138,7 @@ fletcher
 
 Throughout the workshop, we use the class `SingleCellExperiment` to keep track of the counts and their associated metadata within a single object. 
 
-```{r sce_schema, echo=FALSE, out.width="90%", fig.cap="Schematic view of the SingleCellExperiment class."}
+```{r sceschema, echo=FALSE, out.width="90%", fig.cap="Schematic view of the SingleCellExperiment class."}
 knitr::include_graphics("202_Das_SingleCellRNASeq/SingleCellExperiment.png")
 ```
 

diff --git a/210_Geistlinger_Omics.Rmd b/210_Geistlinger_Omics.Rmd
@@ -40,7 +40,6 @@ Execution of example code and hands-on practice
 
 * [EnrichmentBrowser](http://bioconductor.org/packages/EnrichmentBrowser) 
 * [regioneR](http://bioconductor.org/packages/regioneR)
-
 * [airway](http://bioconductor.org/packages/airway)
 * [ALL](http://bioconductor.org/packages/ALL)
 * [hgu95av2.db](http://bioconductor.org/packages/hgu95av2.db)
@@ -59,14 +58,16 @@ Execution of example code and hands-on practice
 
 ## Goals and objectives
 
-Theory
+Theory:
+
 * Gene sets, pathways & regulatory networks
 * Resources
 * Gene set analysis vs. gene set enrichment analysis
 * Underlying null: competitive vs. self-contained
 * Generations: ora, fcs & topology-based
 
 Practice:
+
 * Data types: microarray vs. RNA-seq
 * Differential expression analysis
 * Defining gene sets according to GO and KEGG
@@ -75,8 +76,6 @@ Practice:
 * Network-based enrichment analysis
 * Genomic region enrichment analysis
 
-## Workshop
-
 ## Where does it all come from?
 
 Test whether known biological functions or processes are over-represented

diff --git a/230_Isserlin_RCy3_intro.Rmd b/230_Isserlin_RCy3_intro.Rmd
@@ -12,7 +12,7 @@ suppressPackageStartupMessages({
 
 ##Overview
 
-### Instructor(s) name(s) and contact information
+### Instructor names and contact information
 
 * Ruth Isserlin - ruth dot isserlin (at) utoronto (dot) ca
 * Brendan Innes - brendan (dot) innes (at) mail (dot) utoronto (dot) ca

diff --git a/240_Lee_Plyranges.Rmd b/240_Lee_Plyranges.Rmd
@@ -1,6 +1,6 @@
 # 240: Fluent genomic data analysis with plyranges
 
-## Instructor(s) name(s) and contact information
+## Instructor names and contact information
 
 * Stuart Lee (lee.s@wehi.edu.au)
 * Michael Lawrence (michafla@gmail.com)

diff --git a/250_Cooley_DECIPHER.Rmd b/250_Cooley_DECIPHER.Rmd
@@ -3,10 +3,6 @@
 Authors:
     Nicholas Cooley^[University of Pittsburgh]
 Last Modified: 18 July, 2018
-
-<!--
-  bookdown::render_book("", "bookdown::gitbook")
--->
 
 ## Overview
 
@@ -216,27 +212,27 @@ There are multiple ways to view objects of class `synteny`. It has default view
 
 Plotting genomes 2 and 3, which are very syntenic, provides a good example of this. Grey uncolored space in the subject genome indicates a lack of syntenic blocks in that region. In some places, the ramp reverses, indicating an inversion, while in others, blocks are reordered from the default color bar, indicating possible rearrangements.
 
-```{r plot default}
+```{r plotdefault}
 plot(SyntenyObject[2:3, 2:3])
 ```
 
 Because visualizing large numbers of syntenic hits in genome sized sequences can be non-trivial, additional commands can be passed to plot in order to modify coloring schemes. `frequency` causes bars representing genomes to be colored by **how syntenic** that particular region (as determined by cumulative nucleotide position) of the genome is, indiscriminate of where that hit is in a corresponding genome. Once again genomes 2 and 3 in our set are good examples of this. They are highly syntenic, which this option shows well, but the rearrangements present are no longer shown.
 
-```{r plot frequency}
+```{r plotfrequency}
 plot(SyntenyObject[2:3, 2:3],
      "frequency")
 ```
 
 Because visualizing large numbers of syntenic hits in genome sized sequences can be non-trivial, additional commands can be passed to plot in order to modify coloring schemes. `neighbor` tries to provide each syntenic hit with a unique color, and additionally draws linear connections between them to facilitate better visualization. This can be a useful alternative to the default plotting, if many rearrangements are present in a way that you want to view.
 
-```{r plot neighbors}
+```{r plotneighbors}
 plot(SyntenyObject[2:3, 2:3],
      "neighbor")
 ```
 
 Plotting is a useful tool in analzying sequence data, however sometimes plots of genome sized sequence data can be a bit overwhelming. Plotting our entire database, 8 sequences, though good at representing the complexity present here in this workshop, can be daunting to make sense of quickly if you are unused to looking at figures of this type frequently.
 
-```{r plot neighbors 2}
+```{r plotneighbors2}
 plot(SyntenyObject,
      "neighbor")
 ```

diff --git a/260_Safikhani_Pharmacogenomics.Rmd b/260_Safikhani_Pharmacogenomics.Rmd
@@ -64,10 +64,10 @@ An example for a 45-minute workshop:
 * Assess whether known biomarkers are reproduced within these datasets 
 * Predict new biomarkers by applying different machine learning methods
 
-# Abstract}
+## Abstract
 This course will focus on the challenges encountered when applying machine learning techniques in complex, high dimensional biological data. In particular, we will focus on biomarker discovery from pharmacogenomic data, which consists of developing predictors of response of cancer cell lines to chemical compounds based on their genomic features. From a methodological viewpoint, biomarker discovery is strongly linked to variable selection, through methods such as Supervised Learning with sparsity inducing norms (e.g., ElasticNet) or techniques accounting for the complex correlation structure of biological features (e.g., mRMR). Yet, the main focus of this talk will be on sound use of such methods in a pharmacogenomics context, their validation and correct interpretation of the produced results. We will discuss how to assess the quality of both the input and output data. We will illustrate the importance of unified analytical platforms, data and code sharing in bioinformatics and biomedical research, as the data generation process becomes increasingly complex and requires high level of replication to achieve robust results. This is particularly relevant as our portfolio of machine learning techniques is ever enlarging, with its set of hyperparameters that can be tuning in a multitude of ways, increasing the risk of overfitting when developing multivariate predictors of drug response.
 
-# Introduction
+## Introduction
 
 Pharmacogenomics holds much potential to aid in discovering drug response
 biomarkers and developing novel targeted therapies, leading to development of
@@ -89,23 +89,25 @@ intuitive interface and, in combination with unified curation, simplifies
 analyses between multiple datasets.
 
 
-Load *PharamacoGx* into your current workspace:
+Load `r BiocStyle::Biocpkg("PharmacoGx")` into your current workspace:
 ```{r loadlib, eval=TRUE, results='hide'}
-library(PharmacoGx, verbose=FALSE)
-library(mCI, verbose=FALSE)
-library(PharmacoGxML, verbose=FALSE)
-library(Biobase, verbose=FALSE)
+suppressPackageStartupMessages({
+  library(PharmacoGx, verbose=FALSE)
+  library(mCI, verbose=FALSE)
+  library(PharmacoGxML, verbose=FALSE)
+  library(Biobase, verbose=FALSE)
+})
 ```
 
-## Downloading PharmacoSet objects}
+### Downloading PharmacoSet objects
 We have made the PharmacoSet objects of the curated datasets available for download using functions provided in the package. A table of available PharmacoSet objects can be obtained by using the *availablePSets* function. Any of the PharmacoSets in the table can then be downloaded by calling *downloadPSet*, which saves the datasets into a directory of the users choice, and returns the data into the R session. 
 ```{r download_psets, eval=TRUE, results='hide'}
   availablePSets(saveDir=file.path(".", "Safikhani_Pharmacogenomics"))
   GDSC <- downloadPSet("GDSC", saveDir=file.path(".", "Safikhani_Pharmacogenomics")) 
   CCLE <- downloadPSet("CCLE", saveDir=file.path(".", "Safikhani_Pharmacogenomics"))
 ```
 
-# Reproducibility}
+## Reproducibility
 *PharmacoGx* can be used to process pharmacogenomic datasets. First we want to check the heterogenity of cell lines in one of the available psets, CCLE.
 
 ```{r pie_chart, fig.cap="Tissue of origin of cell lines in CCLE study"}
@@ -120,7 +122,7 @@ pie(table(CCLE@cell[,"tissueid"]),
     cex=0.8)
 ```
 
-## Plotting Drug-Dose Response Data
+### Plotting Drug-Dose Response Data
 
 Drug-Dose response data included in the PharmacoSet objects can be conviniently plotted using the *drugDoseResponseCurve* function. Given a list of PharmacoSets, a drug name and a cell name, it will plot the drug dose response curves for the given cell-drug combination in each dataset, allowing direct comparisons of data between datasets. 
 
@@ -148,7 +150,7 @@ drugDoseResponseCurve(drug="lapatinib", cellline=cells[3],
                       legends.label="auc_published")
 ```
 
-## Pharmacological profiles}
+### Pharmacological profiles
 In pharmacogenomic studies, cancer cell lines were also tested for their response to increasing concentrations of various compounds, and form this the IC50 and AAC were computed. These pharmacological profiles are available for all the psets in *PharmacoGx*.
 
 ```{r ccle_auc, fig.cap="Cells response to drugs in CCLE"}
@@ -166,7 +168,7 @@ ggplot(melted_data, aes(x=Var1,y=value)) +
 #     col="gray", main="")
 ```
 
-# Replication 
+## Replication 
 In this section we will investigate the consistency between the GDSC and CCLE datasets. In both CCLE and GDSC, the transcriptome of cells was profiled using an Affymatrix microarray chip. Cells were also tested for their response to increasing concentrations of various compounds, and form this the IC50 and AUC were computed. However, the cell and drugs names used between the two datasets were not consistent. Furthermore, two different microarray platforms were used. However, *PharmacoGx* allows us to overcome these differences to do a comparative study between these two datasets. 
 
 GDSC was profiled using the hgu133a platform, while CCLE was profiled with the expanded hgu133plus2 platform. While in this case the hgu133a is almost a strict subset of hgu133plus2 platform, the expression information in *PharmacoSet* objects is summarized by Ensemble Gene Ids, allowing datasets with different platforms to be directly compared. The probe to gene mapping is done using the BrainArray customCDF for each platform \cite{sabatti_thresholding_2002}.
@@ -199,7 +201,7 @@ for (i in 1:nrow(cases)) {
 }
 ```
 
-## Consistency of pharmacological profiles}
+### Consistency of pharmacological profiles
 ```{r sensitivity_scatter_plots, fig.height=5, fig.width=10, fig.cap="Concordance of AAC values"}
   
   ##AAC scatter plot 
@@ -257,7 +259,7 @@ legend("topright",
          bty="n")
 ```
 
-## consistency assessment improved by Modified Concordance Index}
+### consistency assessment improved by Modified Concordance Index
 To better assess the concordance of multiple pharmacogenomic studies we introduced the modified concordance index (mCI). Recognizing that the noise in the drug screening assays is high and may yield to inaccurate sensitive-based ranking of cell lines with close AAC values, the mCI only considers cell line pairs with drug sensitivity (AAC) difference greater than $\delta$ .
 
 ```{r mci, eval=TRUE, results='hide'}
@@ -273,7 +275,7 @@ text(mp, par("usr")[3], labels=as.vector(rbind(drugs, rep("", 15))), srt=45, adj
 abline(h=.7, lty=2)
 ```
 
-## Known Biomarkers 
+### Known Biomarkers 
 The association between molecular features and response to a given drug is modelled using a linear regression model adjusted for tissue source: 
 $$Y = \beta_{0} + \beta_{i}G_i + \beta_{t}T + \beta_{b}B$$
 where $Y$ denotes the drug sensitivity variable, $G_i$, $T$ and $B$ denote the expression of gene $i$, the tissue source and the experimental batch respectively, and $\beta$s are the regression coefficients. The strength of gene-drug association is quantified by $\beta_i$, above and beyond the relationship between drug sensitivity and tissue source. The variables $Y$ and $G$ are scaled (standard deviation equals to 1) to estimate standardized coefficients from the linear model. Significance of the gene-drug association is estimated by the statistical significance of $\beta_i$ (two-sided t test). P-values are then corrected for multiple testing using the false discovery rate (FDR) approach.
@@ -355,13 +357,14 @@ Some of the widely used multivariate machine learning methods such as elastic ne
 
 
 ```{r machine_learning, results='hide'} 
-library(mRMRe, verbose=FALSE)
-library(Biobase, verbose=FALSE)
-library(Hmisc, verbose=FALSE)
-library(glmnet, verbose=FALSE)
-library(caret, verbose=FALSE)
-library(randomForest, verbose=FALSE)
-
+suppressPackageStartupMessages({
+  library(mRMRe, verbose=FALSE)
+  library(Biobase, verbose=FALSE)
+  library(Hmisc, verbose=FALSE)
+  library(glmnet, verbose=FALSE)
+  library(caret, verbose=FALSE)
+  library(randomForest, verbose=FALSE)
+})
 ##Preparing trainig dataset
 train_expr <- t(exprs(summarizeMolecularProfiles(GDSC, mDataType="rna", fill.missing=FALSE, verbose=FALSE)))
 aac <- summarizeSensitivityProfiles(GDSC, sensitivity.measure="auc_recomputed", drug="lapatinib", fill.missing=FALSE, verbose=FALSE)
@@ -394,16 +397,19 @@ for(method in c("ridge", "lasso", "random_forest", "svm")){
 }
 ```
 
-## Bonus: Using the Connectivity Map for drug repurposing
+### Bonus: Using the Connectivity Map for drug repurposing
 
 We show here how to use *PharmacoGx* for linking drug perturbation signatures inferred from CMAP to independent signatures of HDAC inhibitors published in Glaser et al. (2003). We therefore sought to reproduce the HDAC analysis in Lamb et al. (2006) using the latest version of CMAP that can be downloaded using downloadPSet. The connectivityScore function enables the computation of the connectivity scores between the 14-gene HDAC signature from (Glaser et al., 2003) and over 1000 CMAP drugs. This analysis results in the four HDAC inhibitors in CMAP being ranked at the top of the drug list (Fig. 2), therefore concurring with the original CMAP analysis (Lamb et al., 2006).
 
-```{r runCMAP, message=FALSE}
+```{r runCMAP, message=FALSE, warning=FALSE}
 ## download and process the HDAC signature
 mydir <- "1132939s"
 downloader::download(paste("http://www.sciencemag.org/content/suppl/2006/09/29/313.5795.1929.DC1/", mydir, ".zip", sep=""), destfile=paste(mydir,".zip",sep=""))
 unzip(paste(mydir,".zip", sep=""))
 
+library(hgu133a.db)
+library(PharmacoGx)
+
 HDAC_up <- gdata::read.xls(paste(mydir, paste(mydir, "sigS1.xls", sep="_"),sep="/"), sheet=1, header=FALSE, as.is=TRUE)
 HDAC_down <- gdata::read.xls(paste(mydir, paste(mydir, "sigS1.xls", sep="_"),sep="/"), sheet=2, header=FALSE, as.is=TRUE)
 HDAC <- as.data.frame(matrix(NA, nrow=nrow(HDAC_down)+nrow(HDAC_up), ncol=2))
@@ -425,15 +431,15 @@ dimnames(drug.perturbation)[[1]] <- gsub("_at", "", dimnames(drug.perturbation)[
 message("Be aware that computing sensitivity will take some time...")
 cl <- parallel::makeCluster(2)
 res <- parApply(drug.perturbation[ , , c("tstat", "fdr")], 2, function(x, HDAC){ 
-	return(connectivityScore(x=x, y=HDAC, method="gsea", nperm=10))
+	return(PharmacoGx::connectivityScore(x=x, y=HDAC, method="gsea", nperm=100))
 }, cl=cl, HDAC=HDAC)
 stopCluster(cl)
 rownames(res) <- c("Connectivity", "P Value")
 res <- t(res)
 
 
 res <- apply(drug.perturbation[ , , c("tstat", "fdr")], 2, function(x, HDAC){ 
-	return(connectivityScore(x=x, y=HDAC, method="gsea", nperm=100))
+	return(PharmacoGx::connectivityScore(x=x, y=HDAC, method="gsea", nperm=100))
 }, HDAC=HDAC)
 rownames(res) <- c("Connectivity", "P Value")
 res <- t(res)
@@ -445,7 +451,7 @@ HDAC_ranks <- which(rownames(res) %in% HDAC_inhibitors)
 ```
 
 
-# Session Info
+## Session Info
 
 This document was generated with the following R version and packages loaded:
 ```{r sessionInfo}

diff --git a/510_Turaga_MaintainBioc.Rmd b/510_Turaga_MaintainBioc.Rmd
@@ -471,3 +471,5 @@ There is plenty of infrastructure not covered in this short workshop,
 ## Acknowledgements
 
 The [Bioconductor core team](https://www.bioconductor.org/about/core-team/).
+
+# Bibliography
diff --git a/DESCRIPTION b/DESCRIPTION
@@ -49,6 +49,7 @@ Imports:
   HDF5Array,
   HMP16SData,
   hgu95av2.db,
+  hgu133a.db,
   Hmisc,
   Homo.sapiens,
   htmltools,
@@ -123,7 +124,7 @@ Remotes:
   seandavi/SRAdbV2,
   jmacdon/Bioc2018Anno,
   seandavi/BiocPkgTools,
-  npcooley/FindHomology (>= 0.0.0.9001),
+  npcooley/FindHomology,
   drisso/fletcher2017data,
   bhklab/mCI,
   bhklab/PharmacoGxML