# Perform MAST Differential Expression Tests

In this notebook, we retrieve our CD4 and CD8 T cells and our cell type labels, then perform MAST differential gene expression tests. Within each cell type, each drug treatment at each timepoint is compared to the DMSO-only control from the same timepoint.

To balance cell counts, we'll group cells by cell type and by treatment (together with the matching DMSO control), then randomly sample every sample in the group down to the minimum number of cells found across that group's samples. For example, to test CD4 Naive T cells under Bortezomib treatment, we'll count the CD4 Naive cells in the Bortezomib and DMSO samples at 4, 24, and 72 hours, then randomly sample each of those 6 samples down to the smallest of the 6 counts.
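The balancing idea can be illustrated with a small base R sketch. The cell counts below are invented for illustration, and `toy_counts`, `toy_labels`, and `balanced` are hypothetical names that do not appear elsewhere in this notebook; the notebook's own sampling code may differ.

```r
# Toy data only: invented cell counts for one cell type in six samples
# (bortezomib and DMSO at 4, 24, and 72 hours).
set.seed(42)  # fixed seed so the random draw is reproducible

toy_counts <- data.frame(
  treatment = rep(c("bortezomib", "dmso"), each = 3),
  timepoint = rep(c(4, 24, 72), times = 2),
  n_cells   = c(12, 15, 9, 10, 20, 7)
)

# Expand to one row per cell, with a made-up barcode for each cell
toy_labels <- toy_counts[rep(seq_len(nrow(toy_counts)), toy_counts$n_cells),
                         c("treatment", "timepoint")]
rownames(toy_labels) <- NULL
toy_labels$barcodes <- sprintf("cell_%03d", seq_len(nrow(toy_labels)))

# Smallest cell count across all six samples
n_sample <- min(toy_counts$n_cells)

# Split cells by sample, then draw n_sample cells from each without replacement
by_sample <- split(toy_labels,
                   list(toy_labels$treatment, toy_labels$timepoint),
                   drop = TRUE)
balanced <- do.call(rbind, lapply(by_sample, function(df) {
  df[sample(nrow(df), n_sample), ]
}))

# Every treatment/timepoint combination now has the same number of cells
table(balanced$treatment, balanced$timepoint)
```

Sampling without replacement down to the smallest sample keeps every comparison at equal size, at the cost of discarding cells from the larger samples.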

We'll then perform comparisons between treatment and control at each timepoint (e.g. CD4 Naive w/Bortezomib @ 4 hr vs. CD4 Naive w/DMSO @ 4 hr).

For MAST, we also need to include the Cellular Detection Rate (CDR), the fraction of genes detected in each cell, as a covariate to control for differences in gene detection between cells.
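As a rough illustration of the CDR, the sketch below computes it from a small invented count matrix using base R only. The matrix and the names `counts`, `cdr`, and `cdr_scaled` are assumptions for this example. Centering and scaling the value before modeling follows the general approach of the MAST vignette, and may not match how this notebook computes it.

```r
# Toy count matrix (invented values): 5 genes (rows) x 4 cells (columns).
# matrix() fills column by column, so each group of 5 values is one cell.
counts <- matrix(
  c(0, 3, 0, 1, 0,
    2, 0, 0, 0, 0,
    1, 1, 4, 0, 2,
    0, 0, 0, 0, 7),
  nrow = 5,
  dimnames = list(paste0("gene", 1:5), paste0("cell", 1:4))
)

# Number of genes with at least one count in each cell
n_genes_detected <- colSums(counts > 0)

# CDR: fraction of all genes that are detected in each cell
cdr <- n_genes_detected / nrow(counts)

# Centered and scaled CDR, suitable for use as a model covariate
cdr_scaled <- as.numeric(scale(cdr))

cdr
```

In a real analysis, the counts would come from the Seurat object's raw count matrix rather than a hand-built matrix.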

## Load packages

hise: The Human Immune System Explorer R SDK package  
purrr: Functional programming tools  
dplyr: Dataframe handling functions  
ggplot2: Plotting functions  
Seurat: Single-cell genomics methods  
MAST: Single-cell differential expression tests

In [1]:
quiet_library <- function(...) { suppressPackageStartupMessages(library(...)) }
quiet_library(hise)
quiet_library(purrr)
quiet_library(dplyr)
quiet_library(ggplot2)
quiet_library(Seurat)
quiet_library(MAST)

## Retrieve files

Now, we'll use the HISE SDK package to retrieve the Seurat objects and cell type labels based on file UUIDs. The files will be placed in the `cache/` subdirectory by default.

In [2]:
file_uuids <- list(
    "7bdac6ef-e5e5-4150-b4f3-9c1a1e250334", # CD4 T cell Seurat object
    "46438bc4-cde6-4ae6-b349-9c513dd9d16f", # CD8 T cell Seurat object
    "a89342dc-be78-4856-aa2c-fe8dbef2d5ad", # CD4 type labels
    "c2b5cf2a-b83a-4eee-bda1-66a0e2ff45d2"  # CD8 type labels
)

In [3]:
fres <- cacheFiles(file_uuids)

[1] "Initiating file download for filtered_cd4_te_seurat.rds"
[1] "Download successful."
[1] "Initiating file download for filtered_cd8_te_seurat.rds"
[1] "Download successful."
[1] "Initiating file download for cd4_cell_type_labels.csv"
[1] "Download successful."
[1] "Initiating file download for cd8_cell_type_labels.csv"
[1] "Download successful."


## Select cells


In [4]:
cd4_labels <- read.csv("cache/a89342dc-be78-4856-aa2c-fe8dbef2d5ad/cd4_cell_type_labels.csv")
cd8_labels <- read.csv("cache/c2b5cf2a-b83a-4eee-bda1-66a0e2ff45d2/cd8_cell_type_labels.csv")

In [5]:
all_labels <- rbind(cd4_labels, cd8_labels)

In [6]:
head(all_labels)

,barcodes,treatment,timepoint,predicted.celltype.l1.score,predicted.celltype.l1,predicted.celltype.l2.score,predicted.celltype.l2,predicted.celltype.l3.score,predicted.celltype.l3,aifi_cell_type
,<chr>,<chr>,<int>,<dbl>,<chr>,<dbl>,<chr>,<dbl>,<chr>,<chr>
1,2da9d348fb8111eda35df29f570c0793,dmso,24,1,CD4 T,0.7379073,CD4 Naive,0.7379073,CD4 Naive,t_cd4_naive
2,2daec6d2fb8111eda35df29f570c0793,dmso,24,1,CD4 T,1.0,CD4 Naive,1.0,CD4 Naive,t_cd4_naive
3,2db119d2fb8111eda35df29f570c0793,dmso,24,1,CD4 T,0.6491493,CD4 TCM,0.4892181,CD4 TCM_1,t_cd4_cm
4,2db582c4fb8111eda35df29f570c0793,dmso,24,1,CD4 T,0.8972198,CD4 Naive,0.8972198,CD4 Naive,t_cd4_naive
5,2db6727efb8111eda35df29f570c0793,dmso,24,1,CD4 T,0.3939763,CD4 TCM,0.2974696,CD4 Naive,t_cd4_naive
6,2dc35a20fb8111eda35df29f570c0793,dmso,24,1,CD4 T,0.6306972,CD4 Naive,0.6306972,CD4 Naive,t_cd4_naive


Exclude untreated cells, which we won't use for our treatment comparisons:

In [19]:
all_labels <- all_labels %>%
  filter(treatment != "untreated")

Get counts of each cell type for each sample:

In [20]:
count_summary <- all_labels %>%
  group_by(treatment, timepoint, aifi_cell_type) %>%
  summarise(n_cells = n(),
            .groups = "keep") %>%
  ungroup()

Add a column with the DMSO count for each cell type and timepoint, then drop the DMSO rows:

In [25]:
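count_summary <- count_summary %>%
  group_by(aifi_cell_type, timepoint) %>%
  # Look up the DMSO count within each cell type and timepoint
  mutate(n_dmso = n_cells[treatment == "dmso"]) %>%
  ungroup() %>%
  # Keep only drug treatment rows; the DMSO counts now live in n_dmso
  filter(treatment != "dmso")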
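The lookup `n_cells[treatment == "dmso"]` relies on each cell type and timepoint group containing exactly one DMSO row. A quick way to verify that assumption before adding the column is sketched below in base R. The `toy_summary` table and its counts are invented for illustration, and `group_key`, `n_dmso_rows`, and `missing_dmso` are hypothetical names.

```r
# Toy stand-in for count_summary before the DMSO column is added.
# Counts are invented; the 72 hr group deliberately has no DMSO row.
toy_summary <- data.frame(
  treatment      = c("bortezomib", "dmso", "bortezomib", "dmso", "bortezomib"),
  timepoint      = c(4, 4, 24, 24, 72),
  aifi_cell_type = "t_cd4_treg",
  n_cells        = c(80, 60, 70, 100, 5)
)

# One key per cell type and timepoint group
group_key <- paste(toy_summary$aifi_cell_type, toy_summary$timepoint, sep = " @ ")

# Count the DMSO rows in each group
n_dmso_rows <- tapply(toy_summary$treatment == "dmso", group_key, sum)

# Groups that do not have exactly one DMSO row need attention before the lookup
missing_dmso <- names(n_dmso_rows)[as.vector(n_dmso_rows != 1)]
missing_dmso
```

Groups flagged this way could be removed or handled separately before the DMSO counts are attached.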

Regroup by treatment and cell type, then take the minimum of the treatment and DMSO counts across all timepoints as the number of cells to sample:

In [26]:
type_minimums <- count_summary %>%
  group_by(treatment, aifi_cell_type) %>%
  mutate(n_sample = min(c(n_cells, n_dmso)))

In [28]:
type_minimums %>%
  arrange(treatment, aifi_cell_type, timepoint)

treatment,timepoint,aifi_cell_type,n_cells,n_dmso,n_sample
<chr>,<int>,<chr>,<int>,<int>,<int>
bortezomib,4,t_cd4_cm,3481,2619,467
bortezomib,24,t_cd4_cm,3936,5180,467
bortezomib,72,t_cd4_cm,1399,467,467
bortezomib,4,t_cd4_naive,5975,4393,1341
bortezomib,24,t_cd4_naive,7016,9328,1341
bortezomib,72,t_cd4_naive,2504,1341,1341
bortezomib,4,t_cd4_treg,802,587,587
bortezomib,24,t_cd4_treg,716,1034,587
bortezomib,4,t_cd8_memory,1153,883,166
bortezomib,24,t_cd8_memory,1374,1770,166
