KEMRI single-cell genomics workshop - September 2024

In this project you will analyze data from a monkey model of Ebola virus infection.

The paper is "Single-Cell Profiling of Ebola Virus Disease In Vivo Reveals Viral and Host Dynamics"
(https://www.cell.com/cell/fulltext/S0092-8674(20)31308-8)

The questions include:
- What is the quality of the cells? Do you need to filter cells?
- Is there any batch effect between monkeys?
- What are the main cell types present in the sample?
- Where can we detect Ebola transcripts?
- Do you see any differences between timepoints?

In [None]:
# Define a shell-call helper and a package loader, then install packages
shell_call <- function(command, ...) {
  result <- system(command, intern = TRUE, ...)
  cat(paste0(result, collapse = "\n"))
}

loadPackages <- function(pkgs) {
  # Quiet wrapper around require() that suppresses warnings and startup messages
  myrequire <- function(...) {
    suppressWarnings(suppressMessages(suppressPackageStartupMessages(require(...))))
  }
  # Use the quiet wrapper (the original defined it but called require() directly)
  ok <- sapply(pkgs, myrequire, character.only = TRUE, quietly = TRUE)
  if (!all(ok)) {
    message("There are missing packages: ", paste(pkgs[!ok], collapse = ", "))
  }
}

## Setup R2U
download.file("https://github.com/eddelbuettel/r2u/raw/master/inst/scripts/add_cranapt_jammy.sh",
              "add_cranapt_jammy.sh")
Sys.chmod("add_cranapt_jammy.sh", "0755")
shell_call("./add_cranapt_jammy.sh")
bspm::enable()
options(bspm.version.check=FALSE)
shell_call("rm add_cranapt_jammy.sh")

In [None]:
## Install and load Seurat
install.packages('Seurat')
library(Seurat)

In [None]:
# Download counts data and metadata
shell_call("wget -q --output-document counts.csv.gz https://www.dropbox.com/scl/fi/s8vr5pab2vzfqzwsbwywu/counts.csv.gz?rlkey=0dkg0abzumzxt8q8ce2ndf9r3&dl=0")
shell_call("wget -q --output-document metadata.csv https://www.dropbox.com/scl/fi/vre2dr5dpbc7v4kzlbzum/metadata.csv?rlkey=msph88xn2w7zmz2zzdk53rk32&dl=0")

In [None]:
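## Now let's load the data
# read.csv() uses "," as separator and "." as decimal mark;
# read.csv2() would treat "," as the decimal mark instead
matrix <- read.csv(file = "counts.csv.gz", row.names = 1)
matrix[1:5, 1:5]
metadata <- read.csv(file = "metadata.csv", row.names = 1, header = TRUE)
head(metadata)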

In [None]:
## Let's create the Seurat object
# Keep only the cells present in both the count matrix and the metadata, in the same order
shared.cells <- intersect(colnames(matrix), rownames(metadata))
seuratObject <- CreateSeuratObject(counts = matrix[, shared.cells],
                                   meta.data = metadata[shared.cells, ],
                                   project = "Project4")
seuratObject

In [None]:
# Mitochondrial genes in the monkey annotation (Ensembl IDs and gene symbols), used below to compute percent.mt
mito.genes<-rownames(seuratObject)[rownames(seuratObject) %in% c('ENSMMUG00000028704','ENSMMUG00000028703','ENSMMUG00000028702','ENSMMUG00000028701','ENSMMUG00000028700','ND1',
  'ENSMMUG00000028698','ENSMMUG00000028697','ENSMMUG00000028696','ND2','ENSMMUG00000028694','ENSMMUG00000028693','ENSMMUG00000028692','ENSMMUG00000028691',
  'ENSMMUG00000028690','COX1','ENSMMUG00000028688','ENSMMUG00000028687','COX2','ENSMMUG00000028685','ATP8','ATP6','COX3','ENSMMUG00000028681','ND3','ENSMMUG00000028679','ND4L',
  'ND4','ENSMMUG00000028676','ENSMMUG00000028675','ENSMMUG00000028674','ND5','ND6','ENSMMUG00000028671','CYTB','ENSMMUG00000028669','ENSMMUG00000028668')]

In [None]:
seuratObject[["percent.mt"]] <- PercentageFeatureSet(seuratObject, features = mito.genes)

In [None]:
## How many cells do we have in each monkey? In each condition (days post infection, DPI)?
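One way to approach the counting question is base R's `table()`. The sketch below runs on a toy data frame standing in for `metadata`. The column names `animal` and `DPI` are assumptions made for illustration, so check `colnames(metadata)` for the real ones.

```r
# Toy metadata standing in for `metadata`; the column names `animal` and `DPI`
# are assumptions, not the real column names of the workshop data.
toy.meta <- data.frame(
  animal = c("NHP1", "NHP1", "NHP2", "NHP2", "NHP2", "NHP3"),
  DPI    = c("D0", "D0", "D3", "D3", "D6", "D6"),
  stringsAsFactors = FALSE
)

# Number of cells per monkey
cells.per.animal <- table(toy.meta$animal)

# Number of cells per monkey and time point (rows: monkey, columns: DPI)
cells.per.animal.dpi <- table(toy.meta$animal, toy.meta$DPI)

print(cells.per.animal)
print(cells.per.animal.dpi)
```

On the real object the same idea would be `table(seuratObject$animal, seuratObject$DPI)`, with the actual metadata column names substituted.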

In [None]:
## Let's do QC. Use the practicals from the previous days and
# check the Seurat tutorial: https://satijalab.org/seurat/articles/pbmc3k_tutorial.html
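The filtering logic behind QC can be sketched in base R on a toy table of per-cell metrics. The thresholds below (200, 2500, 10) are illustrative assumptions only. Pick the real cutoffs after looking at the distributions of `nFeature_RNA`, `nCount_RNA` and `percent.mt` in this dataset.

```r
# Toy per-cell QC metrics; values are made up for illustration
toy.qc <- data.frame(
  nFeature_RNA = c(150, 900, 1200, 3000, 800),
  percent.mt   = c(2, 4, 25, 3, 8),
  row.names    = paste0("cell", 1:5)
)

# Illustrative thresholds (assumptions, not recommendations for this dataset)
min.features <- 200
max.features <- 2500
max.mt       <- 10

# A cell is kept only if it passes all three criteria
keep <- toy.qc$nFeature_RNA > min.features &
  toy.qc$nFeature_RNA < max.features &
  toy.qc$percent.mt < max.mt

filtered <- toy.qc[keep, ]
print(filtered)
```

With a Seurat object the equivalent filter is written as `subset(seuratObject, subset = nFeature_RNA > 200 & nFeature_RNA < 2500 & percent.mt < 10)`, again with thresholds chosen from your own QC plots.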

In [None]:
## Once we have done QC and filtering, let's do normalization, scaling, PCA, clustering and UMAP
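To make the normalization step less of a black box, here is a base R sketch of log-normalization as I understand Seurat's default `NormalizeData()` method ("LogNormalize"): counts are divided by the total counts of each cell, multiplied by a scale factor of 10000, and transformed with `log1p()`. The toy matrix is made up for illustration.

```r
# Toy count matrix: genes in rows, cells in columns (made-up values)
toy.counts <- matrix(
  c(1, 0, 3,
    4, 2, 0,
    5, 8, 7),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("geneA", "geneB", "geneC"), c("cell1", "cell2", "cell3"))
)

scale.factor <- 10000

# Total counts per cell (library size)
lib.size <- colSums(toy.counts)

# Divide each column by its library size, rescale, then apply log(1 + x)
toy.norm <- log1p(sweep(toy.counts, 2, lib.size, "/") * scale.factor)

print(round(toy.norm, 3))
```

In the actual analysis the Seurat functions do this for you. The usual order is `NormalizeData()`, `FindVariableFeatures()`, `ScaleData()`, `RunPCA()`, `FindNeighbors()`, `FindClusters()` and `RunUMAP()`, as in the tutorial linked above.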

In [None]:
## Identify clusters! Hint: check for immune cells and intestinal cells, for example B cells,
# dendritic cells, epithelial cells, erythrocytes, macrophages,
# hematopoietic stem and progenitor cells, monocytes, neutrophils, NK cells and T cells.
# Use websites like https://singlecell.broadinstitute.org/single_cell and
# https://panglaodb.se/ (if you can't connect, use an online proxy), as well as a protein tissue atlas.

In [None]:
## How many cells do we have in every condition and sample? Is there any batch effect? Do we need to integrate?
# Tip: use the practicals and the Seurat tutorial https://satijalab.org/seurat/articles/integration_introduction.html

In [None]:
## How many cells are there per cluster / cell type?

In [None]:
## Where can you identify infected cells? Check the Ebola genes:
## "EBOV-GENOME" "EBOV-GP"     "EBOV-L"      "EBOV-NP"     "EBOV-VP24"   "EBOV-VP30"   "EBOV-VP35"   "EBOV-VP40"
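A simple starting point is to add up the counts of the Ebola features in each cell and flag cells with any viral reads. The sketch below uses a made-up toy matrix. The cutoff of "more than zero reads" is an assumption for illustration, since a few viral reads can also come from ambient RNA and a stricter cutoff may be appropriate.

```r
ebov.genes <- c("EBOV-GENOME", "EBOV-GP", "EBOV-L", "EBOV-NP",
                "EBOV-VP24", "EBOV-VP30", "EBOV-VP35", "EBOV-VP40")

# Toy count matrix: the eight viral features plus one host gene, four cells
toy.ebov <- matrix(
  0, nrow = length(ebov.genes) + 1, ncol = 4,
  dimnames = list(c(ebov.genes, "STAT1"), paste0("cell", 1:4))
)
toy.ebov["EBOV-NP", "cell2"] <- 3
toy.ebov["EBOV-GP", "cell2"] <- 1
toy.ebov["EBOV-L",  "cell4"] <- 2
toy.ebov["STAT1", ] <- c(5, 1, 0, 2)

# Only use viral features that are actually present in the matrix
present <- intersect(ebov.genes, rownames(toy.ebov))

# Total viral counts per cell
ebov.total <- colSums(toy.ebov[present, , drop = FALSE])

# Illustrative call: any viral read marks the cell as infected
infected <- ebov.total > 0

print(ebov.total)
print(names(infected)[infected])
```

On the Seurat object, `PercentageFeatureSet(seuratObject, features = present)` gives the percentage of viral counts per cell in the same way `percent.mt` was computed earlier, and the result can be stored as a metadata column and shown on the UMAP.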

In [None]:
## How do cytokines and interferon-stimulated genes (ISGs) change across time?
## Check genes like "STAT1", "ISG15", "MX1", among others!
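To compare expression across time points, one option is to average each gene within each DPI group. The sketch below does this with base R's `aggregate()` on a toy table of normalized expression values. The values and the column name `DPI` are assumptions made for illustration.

```r
# Toy normalized expression per cell, with the time point of each cell
toy.expr <- data.frame(
  DPI   = c("D0", "D0", "D3", "D3", "D6", "D6"),
  STAT1 = c(0.2, 0.4, 1.0, 1.4, 2.0, 2.4),
  ISG15 = c(0.0, 0.2, 1.5, 1.7, 3.0, 3.2),
  MX1   = c(0.1, 0.1, 0.8, 1.0, 1.9, 2.1)
)

# Mean expression of each gene within each time point
mean.by.dpi <- aggregate(cbind(STAT1, ISG15, MX1) ~ DPI, data = toy.expr, FUN = mean)

print(mean.by.dpi)
```

For a visual comparison on the real data, `VlnPlot(seuratObject, features = c("STAT1", "ISG15", "MX1"), group.by = "DPI")` shows the per-cell distributions, where `"DPI"` stands for whatever the time point column is called in your metadata.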