GeneExpressionWorkshopRMed2022

This repo contains materials for the Gene Expression Analysis in R workshop for the R/Medicine Conference 2022.

Authors

Piru Perampalam
Siavash Ghaffari

Setup

Before attending the workshop, please clone this repo and make sure you have a working installation of R with the following packages:

# From Cran
install.packages("ggplot2")
install.packages("RColorBrewer")
install.packages("circlize")

# From Bioconductor (requires BiocManager)
BiocManager::install("DESeq2")
BiocManager::install("apeglm")
BiocManager::install("AnnotationDbi")
BiocManager::install("org.Hs.eg.db")

# From GitHub (requires devtools)
devtools::install_github("jokergoo/ComplexHeatmap")
devtools::install_github("kevinblighe/EnhancedVolcano")

Introduction

This workshop will cover standard bulk RNA-seq analysis using R. For demonstration purposes, we will be using some unpublished prostate cancer data. For time limitations, we will begin will pre-aligned read counts. That is, raw read files (FASTQ files) have already been aligned to the human reference genome to obtain a read counts table. We will load these read counts into R and use DESeq2 to perform differential expression analysis.

Our experimental data was obtained by performing a drug screen on advanced-stage metastatic prostate cancer biopsies. In brief, tumor biopsies were obtained from five patients and dissociated into cell clumps. Patient tumor cells were seeded into plates and treated with either DMSO or an LSD1 inhibitor (LSD1i or SP2509) for 24 hours. Following this, cells were collected and total RNA was extracted for sequencing on the Illumina HiSeq platform. In addition, clinical attributes from the five patients were collected at the time of biopsy.

You will find read counts in the CSV file data/ProstateCancer_DMSO_SP2509_LSD1i_readcounts.csv. Each column pertains to one sample and each row pertains to a unique transcript. Each patient has two samples: one that is DMSO-treated, and one that is treated with the drug (LDS1i, SP2509).

We also require some sample metadata (also referred to in DESeq2 as colData). This metadata is stored in the CSV file data/ProstateCancer_sampleInfo.csv. This file contains condition information (DMSO or SP2509) and replicate information (replicates 1 to 5 for patients 1 through 5). It also contains some clinical features:

mutationCount: total number of mutations detected in the tumor biopsies by whole-genome sequencing
diagnosisAge: age of the patient at the time of initial diagnosis (note: not the age at which biopsy was taken)
psa: PSA levels at the time of biopsy
stage: prostate cancer stage grouping based on the AJCC (American Joint Committee on Cancer) TNM system.
tmb: tumor mutational burden (nonsynonymous mutations only)

Goals and outline of steps

The goals of this workshop are as follows:

Load input data and do some simple data preparation
Create DESeq2 data set (dds)
Perform differential expression
Perform some additional statistical operations
Filter the data
Make a PCA plot
Make a sample clustering plot
Make a heatmap to show top differentially expressed genes
Make a volcano plot to look at log fold changes and p values
Look at some individual gene expression data in relation to clinical attributes

After completing this workshop, you should feel comfortable analyzing bulk RNA-seq data in R!

Additional notes

We left out FASTQ alignments in this workshop due to the time it would take to perform this step. This is usually the most time consuming and resource intensive (CPU and memory) step in bulk RNA-seq analysis. If you are interested in learning about this step, please see a tutorial I have previously written: NGS alignments using STAR

Name		Name	Last commit message	Last commit date
Latest commit History 20 Commits
data		data
output		output
.DS_Store		.DS_Store
ExperimentalDesign.pdf		ExperimentalDesign.pdf
ExperimentalDesign.pptx		ExperimentalDesign.pptx
R workshop.pptx		R workshop.pptx
README.md		README.md
analysis.R		analysis.R
analysis_withpathway.R		analysis_withpathway.R

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

GeneExpressionWorkshopRMed2022

Authors

Setup

Introduction

Goals and outline of steps

Additional notes

About

Releases

Packages

Contributors 3

Languages

developerpiru/GeneExpressionWorkshopRMed2022

Folders and files

Latest commit

History

Repository files navigation

GeneExpressionWorkshopRMed2022

Authors

Setup

Introduction

Goals and outline of steps

Additional notes

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Contributors 3

Languages

Packages