# GSEA of MAST DEGs against Reactome Pathways

Similar to the analysis for Hallmark Pathways, we'll also perform enrichment tests against Reactome Pathways, which are much more mechanism-focused. These pathways are also hierarchically related, which is important for interpretation of enrichment results. We'll assemble both the gene sets and their relationships for later visualization so that we don't misinterpret related gene sets.

## Load packages

hise: The Human Immune System Explorer R SDK package  
purrr: Functional programming tools  
dplyr: Dataframe handling functions  
tibble: modern data.frame structures  
fgsea: Fast Gene Set Enrichment Analysis  

In [1]:
quiet_library <- function(...) { suppressPackageStartupMessages(library(...)) }
quiet_library(hise)
quiet_library(purrr)
quiet_library(dplyr)
quiet_library(tibble)
quiet_library(fgsea)

## Download Reactome gene sets

We'll obtain the Reactome pathways and relationships provided by reactome.org

In [2]:
# Pathway identifiers, names, and species
download.file(
    "https://reactome.org/download/current/ReactomePathways.txt", 
    "ReactomePathways.txt"
)
# Pathway gene sets
download.file(
    "https://reactome.org/download/current/ReactomePathways.gmt.zip", 
    "ReactomePathways.gmt.zip"
)
system("unzip ReactomePathways.gmt.zip")
# Pathway relationships
download.file(
    "https://reactome.org/download/current/ReactomePathwaysRelation.txt",
    "ReactomePathwaysRelation.txt"
)

## Read, filter, and structure gene sets and relationships

### Read and convert .gmt

The gene set GMT file contains one gene set per line, with the set name and id followed by the list of genes.

We'll read these lines, split on tabs, and then use the split data to build a tibble in which each row has a name, id, and gene list.

In [3]:
sets <- readLines("ReactomePathways.gmt")
sets <- strsplit(sets, split = "\t")

In [4]:
sets <- map_dfr(
    sets,
    function(v) {
        tibble(
            name = map_chr(sets, 1),
            id = map_chr(sets, 2),
            genes = lapply(sets, function(x) {x[-c(1,2)]})
        )
})

In [5]:
sets <- sets %>%
  select(id, genes) %>%
  unique()

In [6]:
nrow(sets)

### Read set IDs and filter for human pathways

Next, we'll read the ReactomePathways file, and use the 3rd column to filter for gene sets from *Homo sapiens*.

In [7]:
pw <- readLines("ReactomePathways.txt")
pw <- strsplit(pw, split = "\t")

In [8]:
pw <- map(pw, as.list)
pw <- map(pw, function(l) { names(l) <- c("id", "name", "species"); l })
pw <- map_dfr(pw, as.data.frame)

In [9]:
head(pw)

Unnamed: 0_level_0,id,name,species
Unnamed: 0_level_1,<chr>,<chr>,<chr>
1,R-BTA-73843,5-Phosphoribose 1-diphosphate biosynthesis,Bos taurus
2,R-BTA-1971475,A tetrasaccharide linker sequence is required for GAG synthesis,Bos taurus
3,R-BTA-1369062,ABC transporters in lipid homeostasis,Bos taurus
4,R-BTA-382556,ABC-family proteins mediated transport,Bos taurus
5,R-BTA-9033807,ABO blood group biosynthesis,Bos taurus
6,R-BTA-418592,ADP signalling through P2Y purinoceptor 1,Bos taurus


In [10]:
pw <- pw %>%
  filter(species == "Homo sapiens") %>%
  select(id, name)

In [11]:
nrow(pw)

### Load relationships

The last piece required is the relationships between pathways, which we'll structure as a data.frame.

In [12]:
links <- read.table("ReactomePathwaysRelation.txt", sep = "\t")
names(links) <- c("from", "to")
links <- links %>%
  filter(from %in% pw$id)

In [13]:
nrow(links)

### Identify root nodes

To find the major pathway root nodes and links, we'll select pathways that link *to* other nodes, but don't have links *from* any parent nodes.

In [14]:
root <- pw %>%
  filter(id %in% links$from & !id %in% links$to) %>%
  left_join(sets)
names(root) <- c("root_id", "root_name", "root_genes")
root_links <- links %>%
  filter(from %in% root$root_id)

[1m[22mJoining with `by = join_by(id)`


In [15]:
root$root_name

In [16]:
nrow(root)

In [17]:
length(unique(root$root_id))

## Identify sub-pathways

For our analysis, we'll use sub-pathways that are up to 4 levels below the Root nodes. For each level, we assemble the gene set and keep track of parent gene sets.

## Level 1

Just below the top nodes

In [18]:
root_split <- split(root_links, root_links$from)
l1 <- map2_dfr(
    root_split, names(root_split),
    function(link_df, parent_id) {
        parent_pw <- root %>%
          filter(root_id == parent_id)
        l1_pw <- pw %>%
          filter(id %in% link_df$to)
        names(l1_pw) <- c("l1_id", "l1_name")
        l1_pw <- l1_pw %>%
          mutate(root_id = parent_id) %>%
          left_join(root, by = "root_id")
        l1_pw
    }
)

In [19]:
l1 <- l1 %>%
  left_join(sets, by = c("l1_id" = "id"))

In [20]:
names(l1)[length(l1)] <- "l1_genes"

In [21]:
l1 <- l1 %>%
  filter(map_int(l1_genes, length) > 10)

Remove double parentage

In [22]:
n_links <- nrow(l1)
n_links

In [23]:
n_targets <- length(unique(l1$l1_id))
n_targets

In [24]:
l1 <- l1 %>%
  group_by(l1_id) %>%
  slice(1) %>%
  ungroup()

In [25]:
l1_links <- links %>%
  filter(from %in% l1$l1_id)

Filter available pathways to prevent double nesting at lower levels

In [26]:
filtered_pw <- pw %>%
  filter(!id %in% l1$l1_id)

## Level 2
Children of Level 1 nodes

In [27]:
l1_split <- split(l1_links, l1_links$from)
l2 <- map2_dfr(
    l1_split, names(l1_split),
    function(link_df, parent_id) {
        parent_pw <- l1 %>%
          filter(l1_id == parent_id)
        l2_pw <- filtered_pw %>%
          filter(id %in% link_df$to)
        names(l2_pw) <- c("l2_id", "l2_name")
        l2_pw <- l2_pw %>%
          mutate(l1_id = parent_id) %>%
          left_join(parent_pw, by = "l1_id")
        l2_pw
    }
)

In [28]:
l2 <- l2 %>%
  left_join(sets, by = c("l2_id" = "id"))

In [29]:
names(l2)[length(l2)] <- "l2_genes"

In [30]:
l2 <- l2 %>%
  filter(map_int(l2_genes, length) > 10)

In [31]:
n_links <- nrow(l2)
n_links

In [32]:
n_targets <- length(unique(l2$l2_id))
n_targets

In [33]:
l2 <- l2 %>%
  group_by(l2_id) %>%
  slice(1) %>%
  ungroup()

In [34]:
l2_links <- links %>%
  filter(from %in% l2$l2_id)

In [35]:
filtered_pw <- filtered_pw %>%
  filter(!id %in% l2$l2_id)

## Level 3
Children of Level 2 nodes

In [36]:
l2_split <- split(l2_links, l2_links$from)
l3 <- map2_dfr(
    l2_split, names(l2_split),
    function(link_df, parent_id) {
        parent_pw <- l2 %>%
          filter(l2_id == parent_id)
        l3_pw <- filtered_pw %>%
          filter(id %in% link_df$to)
        names(l3_pw) <- c("l3_id", "l3_name")
        l3_pw <- l3_pw %>%
          mutate(l2_id = parent_id) %>%
          left_join(parent_pw, by = "l2_id")
        l3_pw
    }
)

In [37]:
l3 <- l3 %>%
  left_join(sets, by = c("l3_id" = "id"))
names(l3)[length(l3)] <- "l3_genes"

In [38]:
l3 <- l3 %>%
  filter(map_int(l3_genes, length) > 10)

In [39]:
n_links <- nrow(l3)
n_links

In [40]:
n_targets <- length(unique(l3$l3_id))
n_targets

In [41]:
l3 <- l3 %>%
  group_by(l3_id) %>%
  slice(1) %>%
  ungroup()

In [42]:
l3_links <- links %>%
  filter(from %in% l3$l3_id)

In [43]:
filtered_pw <- filtered_pw %>%
  filter(!id %in% l3$l3_id)

## Level 4
Children of Level 3 nodes

In [44]:
l3_split <- split(l3_links, l3_links$from)
l4 <- map2_dfr(
    l3_split, names(l3_split),
    function(link_df, parent_id) {
        parent_pw <- l3 %>%
          filter(l3_id == parent_id)
        l4_pw <- filtered_pw %>%
          filter(id %in% link_df$to)
        names(l4_pw) <- c("l4_id", "l4_name")
        l4_pw <- l4_pw %>%
          mutate(l3_id = parent_id) %>%
          left_join(parent_pw, by = "l3_id")
        l4_pw
    }
)

In [45]:
l4 <- l4 %>%
  left_join(sets, by = c("l4_id" = "id"))
names(l4)[length(l4)] <- "l4_genes"

In [46]:
l4 <- l4 %>%
  filter(map_int(l4_genes, length) > 10)

In [47]:
n_links <- nrow(l4)
n_links

In [48]:
n_targets <- length(unique(l4$l4_id))
n_targets

In [49]:
l4 <- l4 %>%
  group_by(l4_id) %>%
  slice(1) %>%
  ungroup()

In [50]:
nrow(l4)

## Assemble and output gene sets

Now that we've built out the gene sets from each level, we'll assemble these all in a list of gene sets that can be used for GSEA analysis.

We'll save the list of gene sets along with the information from each level, which will be used later for visualization of the GSEA results.

In [51]:
set_list <- c(
    root$genes,
    l1$l1_genes,
    l2$l2_genes,
    l3$l3_genes,
    l4$l4_genes
)

names(set_list) <- c(
    root$id,
    l1$l1_id,
    l2$l2_id,
    l3$l3_id,
    l4$l4_id
)

all_sets <- list(
    root_tb = root,
    l1_tb = l1,
    l2_tb = l2,
    l3_tb = l3,
    l4_tb = l4,
    set_list = set_list
)

In [52]:
dir.create("output")

“'output' already exists”


In [53]:
reactome_out_file <- paste0("output/reactome_gene_sets_", Sys.Date(), ".rds")
saveRDS(
    all_sets, 
    reactome_out_file
)

In [54]:
reactome_df <- data.frame(
    pathway = names(all_sets$set_list),
    n_pathway_genes = map_int(all_sets$set_list, length),
    pathway_genes = map_chr(all_sets$set_list, paste, collapse = ";")
)

In [55]:
nrow(reactome_df)

In [56]:
reactome_df <- unique(reactome_df)

## Retrieve files

Now, we'll use the HISE SDK package to retrieve the MAST DEG results file based on its UUID. This will be placed in the `cache/` subdirectory by default.

In [57]:
file_uuid <- list(
    "fc83b89f-fd26-43b8-ac91-29c539703a45"
)

In [58]:
fres <- cacheFiles(file_uuid)

### Prepare DEG lists

To rank genes, we'll convert nomP to -log10(nomP), and incorporate the direction of differential expression by multiplying by the direction of effect size (sign(logFC)).

In [59]:
all_deg <- read.csv("cache/fc83b89f-fd26-43b8-ac91-29c539703a45/all_mast_deg_2023-09-06.csv")
all_deg$treat_time_type <- paste0(
    all_deg$fg, "_", 
    all_deg$timepoint, "_", 
    all_deg$aifi_cell_type)

Prior to ranking, we'll need to resolve missing `logFC` values. These can occur if one of the groups used for DEG analysis had no expression of the gene.

In [60]:
all_deg %>%
  filter(is.na(logFC)) %>%
  head()

Unnamed: 0_level_0,aifi_cell_type,timepoint,fg,bg,n_sample,gene,coef_C,coef_D,logFC,nomP,adjP,treat_time_type
Unnamed: 0_level_1,<chr>,<int>,<chr>,<chr>,<int>,<chr>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<chr>
1,t_cd4_em,4,bortezomib,dmso,180,TFDP1,,3.196997,,3.817725e-05,0.1481468,bortezomib_4_t_cd4_em
2,t_cd4_treg,4,bortezomib,dmso,78,ABCA3,,-2.453613,,0.01337154,0.9999222,bortezomib_4_t_cd4_treg
3,t_cd4_treg,4,bortezomib,dmso,78,AC005070.3,,-2.259963,,0.03110707,0.9999222,bortezomib_4_t_cd4_treg
4,t_cd4_treg,4,bortezomib,dmso,78,AC006504.5,,-2.470421,,0.01243915,0.9999222,bortezomib_4_t_cd4_treg
5,t_cd4_treg,4,bortezomib,dmso,78,AC007686.3,,-2.265379,,0.03041017,0.9999222,bortezomib_4_t_cd4_treg
6,t_cd4_treg,4,bortezomib,dmso,78,AC010754.1,,-2.272969,,0.02952072,0.9999222,bortezomib_4_t_cd4_treg


When this occurs, we can use the sign of `coef_D` to determine the direction of expression change, rather than using the missing `logFC` value.

In [61]:
all_deg <- all_deg %>%
  mutate(direction = ifelse(
      is.na(logFC),
      sign(coef_D), # if missing logFC, use coef_D
      sign(logFC) # otherwise, use logFC
  ))

We also need to avoid nomP values of 0. These will cause NA values due to log transformation. We'll convert these to `1e-300` so that they have a non-zero value.

In [62]:
all_deg <- all_deg %>%
  mutate(nomP = ifelse(
      nomP == 0,
      1e-300, # if zero, change to 1e-300
      nomP # otherwise, keep the value
  ))

In [63]:
deg_list <- split(all_deg, all_deg$treat_time_type)

In [64]:
deg_list <- map(
    deg_list,
    function(deg) {
        deg %>%
          mutate(rank_val = -log10(nomP) * direction) %>%
          arrange(desc(rank_val))
    }
)

In [65]:
rank_list <- map(
    deg_list,
    function(deg) {
        v <- deg$rank_val
        names(v) <- deg$gene
        v
    }
)

## Run GSEA

In [66]:
parallel_param <- BiocParallel::MulticoreParam(
    workers = 4, 
    progressbar = FALSE
)

In [67]:
fgsea_res <- map(
    rank_list,
    function(ranks) {
        fgsea(
            pathways = all_sets$set_list,
            stats    = ranks,
            minSize  = 10,
            maxSize  = 1000,
            BPPARAM  = parallel_param
        )
    }
)

“There are ties in the preranked stats (0.06% of the list).
The order of those tied genes will be arbitrary, which may produce unexpected results.”


### Format results

In [68]:
deg_meta <- map(
    deg_list,
    function(deg) {
        list(
            fg = deg$fg[1],
            bg = deg$bg[1],
            timepoint = deg$timepoint[1],
            aifi_cell_type = deg$aifi_cell_type[1]
        )
    }
)

In [69]:
formatted_fgsea_res <- map2_dfr(
    fgsea_res,
    deg_meta,
    function(res, meta) {
        res %>%
          mutate(
              leadingEdge = map_chr(leadingEdge, paste, collapse = ";"),
              fg = meta$fg,
              bg = meta$bg,
              timepoint = meta$timepoint,
              aifi_cell_type = meta$aifi_cell_type
          ) %>%
          left_join(reactome_df, by = "pathway") %>%
          rename(id = pathway,
                 nomP = pval,
                 adjP = padj,
                 n_leadingEdge = size) %>%
          left_join(pw, by = "id") %>%
          rename(pathway = name) %>%
          select(fg, bg, timepoint, aifi_cell_type,
                 id, pathway, NES, nomP, adjP, 
                 n_leadingEdge, n_pathway_genes,
                 leadingEdge, pathway_genes) %>%
          arrange(desc(NES))

    }
)

In [70]:
tail(formatted_fgsea_res)

fg,bg,timepoint,aifi_cell_type,id,pathway,NES,nomP,adjP,n_leadingEdge,n_pathway_genes,leadingEdge,pathway_genes
<chr>,<chr>,<int>,<chr>,<chr>,<chr>,<dbl>,<dbl>,<dbl>,<int>,<int>,<chr>,<chr>
lenalidomide,dmso,72,t_cd8_naive,R-HSA-72613,Eukaryotic Translation Initiation,-2.166728,2.154851e-08,3.462127e-06,114,124,RPL10;RPLP1;RPLP0;RPSA;RPS3;EIF3G;EIF3L;RPS28;RPL28;RPL23A;RPL7A;RPL29;EIF4A1;RPS5;RPS27;RPL17;EIF5B;EIF3H;RPS16;EIF1AX;EIF3J;RPLP2;RPS2;RPS7;RPL18A;EIF2S2;RPL35;EIF4G1;RPS4X;EIF3B;RPL22L1;RPL22;RPL14;RPS11;RPS21;RPL24;EIF4H;RPS23;RPL19;RPS9;EIF2B4;RPS6;RPL35A;RPS26;RPL27;RPL39;RPS29;RPS24;EIF4A2;EIF2B1;RPL36;RPL36A;RPL8;RPL37A;RPL10A;RPS14;EIF3A;EIF2B5;RPS13;EIF4EBP1;RPL27A;RPL4;RPL31;RPL15,18S rRNA;28S rRNA;5.8S rRNA;5S rRNA;EIF1AX;EIF2B1;EIF2B2;EIF2B3;EIF2B4;EIF2B5;EIF2S1;EIF2S2;EIF2S3;EIF3A;EIF3B;EIF3C;EIF3D;EIF3E;EIF3F;EIF3G;EIF3H;EIF3I;EIF3J;EIF3K;EIF3L;EIF3M;EIF4A1;EIF4A2;EIF4B;EIF4E;EIF4EBP1;EIF4G1;EIF4H;EIF5;EIF5B;FAU;PABPC1;RPL10;RPL10A;RPL10L;RPL11;RPL12;RPL13;RPL13A;RPL14;RPL15;RPL17;RPL18;RPL18A;RPL19;RPL21;RPL22;RPL22L1;RPL23;RPL23A;RPL24;RPL26;RPL26L1;RPL27;RPL27A;RPL28;RPL29;RPL3;RPL30;RPL31;RPL32;RPL34;RPL35;RPL35A;RPL36;RPL36A;RPL36AL;RPL37;RPL37A;RPL38;RPL39;RPL39L;RPL3L;RPL4;RPL41;RPL5;RPL6;RPL7;RPL7A;RPL8;RPL9;RPLP0;RPLP1;RPLP2;RPS10;RPS11;RPS12;RPS13;RPS14;RPS15;RPS15A;RPS16;RPS17;RPS18;RPS19;RPS2;RPS20;RPS21;RPS23;RPS24;RPS25;RPS26;RPS27;RPS27A;RPS27L;RPS28;RPS29;RPS3;RPS3A;RPS4X;RPS4Y1;RPS4Y2;RPS5;RPS6;RPS7;RPS8;RPS9;RPSA;UBA52
lenalidomide,dmso,72,t_cd8_naive,R-HSA-72737,Cap-dependent Translation Initiation,-2.166728,2.154851e-08,3.462127e-06,114,124,RPL10;RPLP1;RPLP0;RPSA;RPS3;EIF3G;EIF3L;RPS28;RPL28;RPL23A;RPL7A;RPL29;EIF4A1;RPS5;RPS27;RPL17;EIF5B;EIF3H;RPS16;EIF1AX;EIF3J;RPLP2;RPS2;RPS7;RPL18A;EIF2S2;RPL35;EIF4G1;RPS4X;EIF3B;RPL22L1;RPL22;RPL14;RPS11;RPS21;RPL24;EIF4H;RPS23;RPL19;RPS9;EIF2B4;RPS6;RPL35A;RPS26;RPL27;RPL39;RPS29;RPS24;EIF4A2;EIF2B1;RPL36;RPL36A;RPL8;RPL37A;RPL10A;RPS14;EIF3A;EIF2B5;RPS13;EIF4EBP1;RPL27A;RPL4;RPL31;RPL15,18S rRNA;28S rRNA;5.8S rRNA;5S rRNA;EIF1AX;EIF2B1;EIF2B2;EIF2B3;EIF2B4;EIF2B5;EIF2S1;EIF2S2;EIF2S3;EIF3A;EIF3B;EIF3C;EIF3D;EIF3E;EIF3F;EIF3G;EIF3H;EIF3I;EIF3J;EIF3K;EIF3L;EIF3M;EIF4A1;EIF4A2;EIF4B;EIF4E;EIF4EBP1;EIF4G1;EIF4H;EIF5;EIF5B;FAU;PABPC1;RPL10;RPL10A;RPL10L;RPL11;RPL12;RPL13;RPL13A;RPL14;RPL15;RPL17;RPL18;RPL18A;RPL19;RPL21;RPL22;RPL22L1;RPL23;RPL23A;RPL24;RPL26;RPL26L1;RPL27;RPL27A;RPL28;RPL29;RPL3;RPL30;RPL31;RPL32;RPL34;RPL35;RPL35A;RPL36;RPL36A;RPL36AL;RPL37;RPL37A;RPL38;RPL39;RPL39L;RPL3L;RPL4;RPL41;RPL5;RPL6;RPL7;RPL7A;RPL8;RPL9;RPLP0;RPLP1;RPLP2;RPS10;RPS11;RPS12;RPS13;RPS14;RPS15;RPS15A;RPS16;RPS17;RPS18;RPS19;RPS2;RPS20;RPS21;RPS23;RPS24;RPS25;RPS26;RPS27;RPS27A;RPS27L;RPS28;RPS29;RPS3;RPS3A;RPS4X;RPS4Y1;RPS4Y2;RPS5;RPS6;RPS7;RPS8;RPS9;RPSA;UBA52
lenalidomide,dmso,72,t_cd8_naive,R-HSA-72689,Formation of a pool of free 40S subunits,-2.176472,3.952889e-08,5.443693e-06,96,106,RPL10;RPLP1;RPLP0;RPSA;RPS3;EIF3G;EIF3L;RPS28;RPL28;RPL23A;RPL7A;RPL29;RPS5;RPS27;RPL17;EIF3H;RPS16;EIF1AX;EIF3J;RPLP2;RPS2;RPS7;RPL18A;RPL35;RPS4X;EIF3B;RPL22L1;RPL22;RPL14;RPS11;RPS21;RPL24;RPS23;RPL19;RPS9;RPS6;RPL35A;RPS26;RPL27;RPL39;RPS29;RPS24,18S rRNA;28S rRNA;5.8S rRNA;5S rRNA;EIF1AX;EIF3A;EIF3B;EIF3C;EIF3D;EIF3E;EIF3F;EIF3G;EIF3H;EIF3I;EIF3J;EIF3K;EIF3L;EIF3M;FAU;RPL10;RPL10A;RPL10L;RPL11;RPL12;RPL13;RPL13A;RPL14;RPL15;RPL17;RPL18;RPL18A;RPL19;RPL21;RPL22;RPL22L1;RPL23;RPL23A;RPL24;RPL26;RPL26L1;RPL27;RPL27A;RPL28;RPL29;RPL3;RPL30;RPL31;RPL32;RPL34;RPL35;RPL35A;RPL36;RPL36A;RPL36AL;RPL37;RPL37A;RPL38;RPL39;RPL39L;RPL3L;RPL4;RPL41;RPL5;RPL6;RPL7;RPL7A;RPL8;RPL9;RPLP0;RPLP1;RPLP2;RPS10;RPS11;RPS12;RPS13;RPS14;RPS15;RPS15A;RPS16;RPS17;RPS18;RPS19;RPS2;RPS20;RPS21;RPS23;RPS24;RPS25;RPS26;RPS27;RPS27A;RPS27L;RPS28;RPS29;RPS3;RPS3A;RPS4X;RPS4Y1;RPS4Y2;RPS5;RPS6;RPS7;RPS8;RPS9;RPSA;UBA52
lenalidomide,dmso,72,t_cd8_naive,R-HSA-156842,Eukaryotic Translation Elongation,-2.178397,1.156868e-07,1.394026e-05,87,99,RPL10;RPLP1;EEF1A1;RPLP0;RPSA;RPS3;EEF1B2;RPS28;RPL28;RPL23A;RPL7A;RPL29;RPS5;RPS27;RPL17;RPS16;RPLP2;RPS2;RPS7;RPL18A;RPL35;RPS4X;RPL22L1;RPL22;RPL14;RPS11;RPS21;RPL24;RPS23;RPL19;RPS9;RPS6;RPL35A;RPS26;RPL27;RPL39;RPS29;RPS24,18S rRNA;28S rRNA;5.8S rRNA;5S rRNA;EEF1A1;EEF1A1P5;EEF1A2;EEF1B2;EEF1D;EEF1G;EEF2;FAU;RPL10;RPL10A;RPL10L;RPL11;RPL12;RPL13;RPL13A;RPL14;RPL15;RPL17;RPL18;RPL18A;RPL19;RPL21;RPL22;RPL22L1;RPL23;RPL23A;RPL24;RPL26;RPL26L1;RPL27;RPL27A;RPL28;RPL29;RPL3;RPL30;RPL31;RPL32;RPL34;RPL35;RPL35A;RPL36;RPL36A;RPL36AL;RPL37;RPL37A;RPL38;RPL39;RPL39L;RPL3L;RPL4;RPL41;RPL5;RPL6;RPL7;RPL7A;RPL8;RPL9;RPLP0;RPLP1;RPLP2;RPS10;RPS11;RPS12;RPS13;RPS14;RPS15;RPS15A;RPS16;RPS17;RPS18;RPS19;RPS2;RPS20;RPS21;RPS23;RPS24;RPS25;RPS26;RPS27;RPS27A;RPS27L;RPS28;RPS29;RPS3;RPS3A;RPS4X;RPS4Y1;RPS4Y2;RPS5;RPS6;RPS7;RPS8;RPS9;RPSA;UBA52
lenalidomide,dmso,72,t_cd8_naive,R-HSA-156827,L13a-mediated translational silencing of Ceruloplasmin expression,-2.20801,8.437858e-09,2.033524e-06,106,116,RPL10;RPLP1;RPLP0;RPSA;RPS3;EIF3G;EIF3L;RPS28;RPL28;RPL23A;RPL7A;RPL29;EIF4A1;RPS5;RPS27;RPL17;EIF3H;RPS16;EIF1AX;EIF3J;RPLP2;RPS2;RPS7;RPL18A;EIF2S2;RPL35;EIF4G1;RPS4X;EIF3B;RPL22L1;RPL22;RPL14;RPS11;RPS21;RPL24;EIF4H;RPS23;RPL19;RPS9;RPS6;RPL35A;RPS26;RPL27;RPL39;RPS29;RPS24;EIF4A2,18S rRNA;28S rRNA;5.8S rRNA;5S rRNA;EIF1AX;EIF2S1;EIF2S2;EIF2S3;EIF3A;EIF3B;EIF3C;EIF3D;EIF3E;EIF3F;EIF3G;EIF3H;EIF3I;EIF3J;EIF3K;EIF3L;EIF3M;EIF4A1;EIF4A2;EIF4B;EIF4E;EIF4G1;EIF4H;FAU;PABPC1;RPL10;RPL10A;RPL10L;RPL11;RPL12;RPL13;RPL13A;RPL14;RPL15;RPL17;RPL18;RPL18A;RPL19;RPL21;RPL22;RPL22L1;RPL23;RPL23A;RPL24;RPL26;RPL26L1;RPL27;RPL27A;RPL28;RPL29;RPL3;RPL30;RPL31;RPL32;RPL34;RPL35;RPL35A;RPL36;RPL36A;RPL36AL;RPL37;RPL37A;RPL38;RPL39;RPL39L;RPL3L;RPL4;RPL41;RPL5;RPL6;RPL7;RPL7A;RPL8;RPL9;RPLP0;RPLP1;RPLP2;RPS10;RPS11;RPS12;RPS13;RPS14;RPS15;RPS15A;RPS16;RPS17;RPS18;RPS19;RPS2;RPS20;RPS21;RPS23;RPS24;RPS25;RPS26;RPS27;RPS27A;RPS27L;RPS28;RPS29;RPS3;RPS3A;RPS4X;RPS4Y1;RPS4Y2;RPS5;RPS6;RPS7;RPS8;RPS9;RPSA;UBA52
lenalidomide,dmso,72,t_cd8_naive,R-HSA-72706,GTP hydrolysis and joining of the 60S ribosomal subunit,-2.240697,4.1023e-09,1.631962e-06,107,117,RPL10;RPLP1;RPLP0;RPSA;RPS3;EIF3G;EIF3L;RPS28;RPL28;RPL23A;RPL7A;RPL29;EIF4A1;RPS5;RPS27;RPL17;EIF5B;EIF3H;RPS16;EIF1AX;EIF3J;RPLP2;RPS2;RPS7;RPL18A;EIF2S2;RPL35;EIF4G1;RPS4X;EIF3B;RPL22L1;RPL22;RPL14;RPS11;RPS21;RPL24;EIF4H;RPS23;RPL19;RPS9;RPS6;RPL35A;RPS26;RPL27;RPL39;RPS29;RPS24;EIF4A2,18S rRNA;28S rRNA;5.8S rRNA;5S rRNA;EIF1AX;EIF2S1;EIF2S2;EIF2S3;EIF3A;EIF3B;EIF3C;EIF3D;EIF3E;EIF3F;EIF3G;EIF3H;EIF3I;EIF3J;EIF3K;EIF3L;EIF3M;EIF4A1;EIF4A2;EIF4B;EIF4E;EIF4G1;EIF4H;EIF5;EIF5B;FAU;RPL10;RPL10A;RPL10L;RPL11;RPL12;RPL13;RPL13A;RPL14;RPL15;RPL17;RPL18;RPL18A;RPL19;RPL21;RPL22;RPL22L1;RPL23;RPL23A;RPL24;RPL26;RPL26L1;RPL27;RPL27A;RPL28;RPL29;RPL3;RPL30;RPL31;RPL32;RPL34;RPL35;RPL35A;RPL36;RPL36A;RPL36AL;RPL37;RPL37A;RPL38;RPL39;RPL39L;RPL3L;RPL4;RPL41;RPL5;RPL6;RPL7;RPL7A;RPL8;RPL9;RPLP0;RPLP1;RPLP2;RPS10;RPS11;RPS12;RPS13;RPS14;RPS15;RPS15A;RPS16;RPS17;RPS18;RPS19;RPS2;RPS20;RPS21;RPS23;RPS24;RPS25;RPS26;RPS27;RPS27A;RPS27L;RPS28;RPS29;RPS3;RPS3A;RPS4X;RPS4Y1;RPS4Y2;RPS5;RPS6;RPS7;RPS8;RPS9;RPSA;UBA52


## Write output file

Write the metadata as a .csv for later use. We remove `row.names` and set `quote = FALSE` to simplify the outputs and increase compatibility with other tools.

In [71]:
gsea_out_file <- paste0("output/all_reactome_gsea_res_", Sys.Date(), ".tsv")
write.table(
    formatted_fgsea_res,
    gsea_out_file,
    sep = "\t",
    row.names = FALSE,
    quote = FALSE
)

## Store results in HISE

Finally, we store the output file in our Collaboration Space for later retrieval and use. We need to provide the UUID for our Collaboration Space (aka `studySpaceId`), as well as a title for this step in our analysis process.

The hise function `uploadFiles()` also requires the FileIDs from the original fileset for reference, which we used above when the DEG results were retrieved (`file_uuid`)

In [72]:
study_space_uuid <- "40df6403-29f0-4b45-ab7d-f46d420c422e"
title <- paste("VRd TEA-seq Reactome GSEA Analysis", Sys.Date())

In [73]:
out_list <- as.list(c(reactome_out_file, gsea_out_file))

In [74]:
uploadFiles(
    files = out_list,
    studySpaceId = study_space_uuid,
    title = title,
    inputFileIds = file_uuid,
    store = "project",
    doPrompt = FALSE
)

[1] "Cannot determine the current notebook."
[1] "1) /home/jupyter/repro-vrd-tea-seq/03-gsea-analysis/02-R_mast-deg_reactome_gsea.ipynb"
[1] "2) /home/jupyter/repro-vrd-tea-seq/figures/Supp-Fig-04_bortezomib_reactome.ipynb"
[1] "3) /home/jupyter/repro-vrd-tea-seq-deg-app/03_UploadVisualization.ipynb"


Please select (1-3)  1


In [75]:
sessionInfo()

R version 4.3.1 (2023-06-16)
Platform: x86_64-conda-linux-gnu (64-bit)
Running under: Ubuntu 20.04.6 LTS

Matrix products: default
BLAS/LAPACK: /opt/conda/lib/libopenblasp-r0.3.24.so;  LAPACK version 3.11.0

locale:
 [1] LC_CTYPE=C.UTF-8       LC_NUMERIC=C           LC_TIME=C.UTF-8       
 [4] LC_COLLATE=C.UTF-8     LC_MONETARY=C.UTF-8    LC_MESSAGES=C.UTF-8   
 [7] LC_PAPER=C.UTF-8       LC_NAME=C              LC_ADDRESS=C          
[10] LC_TELEPHONE=C         LC_MEASUREMENT=C.UTF-8 LC_IDENTIFICATION=C   

time zone: Etc/UTC
tzcode source: system (glibc)

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] fgsea_1.28.0 tibble_3.2.1 dplyr_1.1.3  purrr_1.0.2  hise_2.16.0 

loaded via a namespace (and not attached):
 [1] Matrix_1.6-1.1      gtable_0.3.4        jsonlite_1.8.7     
 [4] compiler_4.3.1      crayon_1.5.2        tidyselect_1.2.0   
 [7] Rcpp_1.0.11         IRdisplay_1.1       bitops_1.0-7       
[10] 