# Clonotype Statistical Enrichment Analysis

Here we identify therapy-related clonotypes, i.e., clonotypes that are differentially expanded after therapy.

### Env Setup

In [None]:
# Load project configuration
setwd("/scratch_isilon/groups/singlecell/gdeuner/SERPENTINE_TCR")
options(repr.matrix.max.rows=100, repr.matrix.max.cols=100)
options(warn = -1)
source("code/helper/Config.R", echo = FALSE)

In [None]:
# Import plotting helper functions
source("/scratch_isilon/groups/singlecell/gdeuner/SERPENTINE_TCR/code/helper/Plotting_Functions.R", echo = FALSE)

In [None]:
# Define figures path
fig_dir <- "/scratch_isilon/groups/singlecell/gdeuner/SERPENTINE_TCR/out/figs/TCR_Fig_Jan/tumor_DE"

### Load Tumor 10x Processed TCR Data with matched GEX Profiles (wide)

In [None]:
# Read data (wide)
data <- qread(file = file.path(root_dir, "out", "data", "SERP_TCR-GEX_wide_11-2025_v2.qs"))

### Prepare data

In [None]:
# Keep lung and liver metastases from patients with matched SCR and C02 samples
patients_keep <- c("P01", "P02", "P03", "P10", "P14", "P17", "P20", "P26", "P29", "P31", "P33", "P34", "P35")
data <- data %>%
    filter(
        patient %in% patients_keep,
        met_loc %in% c("Lung", "Liver") 
    )
dim(data)
head(data,3)

## Log2 Fold-Change-Based Approach

Post-treatment enriched clonotypes were defined as those with a log2 fold change (LogFC) ≥ 2 in normalized clonal size between the pre-treatment (SCR) and post-treatment (C02) samples, i.e., at least a four-fold expansion. For de novo clonotypes, which are not detected at baseline, a direct fold change cannot be computed because the baseline size is zero. Instead, we used the median normalized clonal size of non-persistent clones from the same patient as the baseline. This median serves as a reference for typical, non-tumor-specific clonal expansion, so a de novo clonotype passes the threshold only if its post-treatment size is at least four times that of a typical bystander clone. De novo clonotypes detected as singlets at C02 were additionally excluded. Clonotypes meeting these criteria were classified as enriched and retained for downstream functional analyses, as they are more likely to have expanded in response to therapy than to reflect baseline clonal activity.
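Written as a formula, with $s_c^{\mathrm{SCR}}$ and $s_c^{\mathrm{C02}}$ denoting the normalized clonal size of clonotype $c$ at each time point and $\tilde{s}_p^{\mathrm{SCR}}$ the patient-specific baseline median for patient $p$ (notation introduced here for illustration only), the rule implemented in the cells below reads:

$$
\mathrm{LogFC}_c =
\begin{cases}
\log_2\left(\dfrac{s_c^{\mathrm{C02}}}{s_c^{\mathrm{SCR}}}\right) & \text{if } c \text{ is pre-existing},\\[6pt]
\log_2\left(\dfrac{s_c^{\mathrm{C02}}}{\tilde{s}_p^{\mathrm{SCR}}}\right) & \text{if } c \text{ is de novo},
\end{cases}
\qquad
c \text{ enriched} \iff \mathrm{LogFC}_c \ge 2 \ (\text{and, if de novo, } c \text{ is not a singlet at C02}).
$$

For example, a pre-existing clone at a normalized size of 0.001 at SCR must reach at least 0.004 at C02 to be called enriched.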

In [None]:
data <- data %>%
    # Patient-specific baseline: median of the nonzero SCR (T0) normalized clone sizes
    group_by(patient) %>%
    mutate(median_norm_cloneSize_T0 = median(norm_cloneSize_T0[norm_cloneSize_T0 > 0], na.rm = TRUE)) %>%
    ungroup() %>%
    # Log2 fold change: pre-existing clones vs. their own SCR size,
    # de novo clones vs. the patient-specific baseline median
    mutate(
        LogFC = case_when(
            presence_status == "Pre-existing" ~ log2(norm_cloneSize_T1 / norm_cloneSize_T0),
            presence_status == "De Novo"      ~ log2(norm_cloneSize_T1 / median_norm_cloneSize_T0),
            TRUE                              ~ NA_real_
        ),
        # Absolute change in normalized clonal size (C02 - SCR)
        Delta = norm_cloneSize_T1 - norm_cloneSize_T0
    )
    
head(data)

In [None]:
# Assign enrichment status: LogFC >= 2; de novo clones must also not be singlets at C02
data <- data %>%
    mutate(enriched = case_when(
        presence_status == "Pre-existing" & LogFC >= 2 ~ TRUE,
        presence_status == "De Novo" & !(cloneClass_T1 %in% "Singlet") & LogFC >= 2 ~ TRUE,
        TRUE ~ FALSE
    ))
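To make the rule concrete, the sketch below applies the same baseline-median, LogFC and enrichment logic to a handful of hypothetical clonotypes. All values are invented toy numbers, not study data, and the labels "Lost", "Small" and the patient IDs "PA"/"PB" are placeholders assumed for illustration; only "Pre-existing", "De Novo" and "Singlet" are taken from the cells above.

```r
# Minimal sketch on HYPOTHETICAL toy values (not study data).
# "Lost", "Small" and the patient IDs are placeholder labels.
library(dplyr)

toy <- tibble(
    patient           = c("PA", "PA", "PA", "PA", "PB", "PB"),
    presence_status   = c("Pre-existing", "Pre-existing", "De Novo", "Lost",
                          "Pre-existing", "De Novo"),
    norm_cloneSize_T0 = c(0.001, 0.004, 0,     0.002, 0.0002, 0),
    norm_cloneSize_T1 = c(0.008, 0.004, 0.010, 0,     0.0004, 0.001),
    cloneClass_T1     = c("Small", "Small", "Small", NA, "Small", "Singlet")
)

toy_scored <- toy %>%
    group_by(patient) %>%
    # Patient-specific median of the nonzero SCR (T0) clone sizes
    mutate(median_T0 = median(norm_cloneSize_T0[norm_cloneSize_T0 > 0])) %>%
    ungroup() %>%
    mutate(
        # Pre-existing: own SCR size as baseline; de novo: patient median
        LogFC = case_when(
            presence_status == "Pre-existing" ~ log2(norm_cloneSize_T1 / norm_cloneSize_T0),
            presence_status == "De Novo"      ~ log2(norm_cloneSize_T1 / median_T0),
            TRUE                              ~ NA_real_
        ),
        # Enriched: LogFC >= 2; de novo clones must also not be singlets at C02
        enriched = case_when(
            presence_status == "Pre-existing" & LogFC >= 2 ~ TRUE,
            presence_status == "De Novo" & !(cloneClass_T1 %in% "Singlet") & LogFC >= 2 ~ TRUE,
            TRUE ~ FALSE
        )
    )

print(select(toy_scored, patient, presence_status, LogFC, enriched))
```

In this toy set, the 8-fold pre-existing clone and the PA de novo clone (5 times the PA median of 0.002) are flagged, while the PB de novo clone is excluded as a singlet even though it also reaches 5 times its patient median.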

In [None]:
# Number of enriched (TRUE) vs. non-enriched (FALSE) clonotypes
table(data$enriched)

In [None]:
# Enriched clonotypes by presence status (pre-existing vs. de novo)
table(filter(data, enriched)$presence_status)

In [None]:
# Scatter plot of per-clonotype clonal proportions (C02 vs. SCR), colored by enrichment status
options(repr.plot.width = 5, repr.plot.height = 5)
geom_params = list(shape = 21, alpha = 0.5, stroke = 1)
ggplot(data, aes(x = log(norm_cloneSize_T1+1e-4), y = log(norm_cloneSize_T0+1e-4), fill = enriched, size = 3)) +
            ggrastr::rasterize(do.call(geom_point, geom_params)) +
            labs(
                x = expression(log("Clonal Proportions C02")),
                y = expression(log("Clonal Proportions SCR")),
                title = "Post-ICI Enriched Tumor Clones",
                fill = ""
            ) +
            theme_linedraw(base_size = 15) +
            guides(size = "none") + 
            theme(
                legend.position = "none",
                panel.grid = element_blank(),
                panel.border = element_rect(color = "black", linewidth = 1.5),
                #axis.title = element_text(size = 10, hjust = 0.5),
                plot.title = element_text(hjust = 0.5),
                axis.title.y = element_text(margin = margin(t = 0, r = 15, b = 0, l = 0)),
                axis.title.x = element_text(margin = margin(t = 10, r = 0, b = 0, l = 0))
            ) +
            xlim(log(1e-4),0) + ylim(log(1e-4),0) + 
            geom_abline(slope = 1, intercept = 0, color = "black") +
            geom_hline(yintercept = log(0.00014), color = "lightgrey", linewidth = 1) + 
            geom_vline(xintercept = log(0.00014), color = "lightgrey", linewidth = 1) +
            scale_fill_manual(values = c("FALSE" = "lightgray", "TRUE" = "#319DB2"))
ggsave(filename = file.path(fig_dir, "ClonalScatter_Enriched_LogFC_rasterized.pdf"), plot = last_plot(), dpi = 300, width = 5, height = 5)

In [None]:
# Subset columns for transfer to long data
enr_data <- data %>% 
    select(clonotype_id, patient, enriched, LogFC)

In [None]:
# Read per-cell TCR-GEX data (long format)
data_long <- qread(file = file.path(root_dir, "out", "data", "SERP_TCR-GEX_11-2025_v2.qs"))

In [None]:
# Annotate each cell with its clonotype's enrichment status and LogFC (matched on patient and clonotype_id)
data_long <- data_long %>%
    left_join(enr_data, by = c("patient", "clonotype_id"))
head(data_long)
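The join above propagates clonotype-level results to individual cells. A minimal sketch on hypothetical barcodes and clonotype IDs (all invented for illustration) shows the two properties this step relies on: joining on both `patient` and `clonotype_id` keeps clonotypes that share an ID in different patients apart (assuming IDs are only unique within a patient), and cells whose clonotype has no match receive `NA`, so `filter(enriched == TRUE)` drops them.

```r
# Minimal sketch on HYPOTHETICAL toy data (not study data).
library(dplyr)

# Clonotype-level flags: one row per patient x clonotype
enr_toy <- tibble(
    clonotype_id = c("clonotype1", "clonotype1", "clonotype2"),
    patient      = c("PA", "PB", "PA"),
    enriched     = c(TRUE, FALSE, FALSE),
    LogFC        = c(3, 1, 0)
)

# Per-cell table: one row per barcode
cells_toy <- tibble(
    barcode      = c("AAAC-1", "AAAG-1", "CCCT-1", "GGGA-1"),
    patient      = c("PA", "PA", "PB", "PA"),
    clonotype_id = c("clonotype1", "clonotype1", "clonotype1", "clonotype3")
)

# Same ID "clonotype1" resolves differently in PA and PB; clonotype3 gets NA
cells_annot <- cells_toy %>%
    left_join(enr_toy, by = c("patient", "clonotype_id"))

# NA and FALSE rows are both dropped here
enriched_cells <- cells_annot %>%
    filter(enriched == TRUE)

print(cells_annot)
print(enriched_cells)
```

If the clonotype-level table ever contained duplicate `patient`/`clonotype_id` pairs, `left_join` would duplicate the matching cells, so comparing `nrow()` before and after the join is a cheap sanity check.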

In [None]:
# Keep only cells belonging to enriched clonotypes
data_save <- data_long %>%
    filter(enriched == TRUE) %>%
    select(barcode, clonotype_id, patient, enriched, LogFC, presence_status)
head(data_save)

In [None]:
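# Save enriched cells as a comma-separated file (write.table defaults to space-separated output)
write.csv(data_save, file = file.path(root_dir, "out", "data", "enriched_cells.csv"), row.names = FALSE)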