In [1]:
options(dplyr.summarise.inform = FALSE)
library(tidyverse)
library(data.table)
library( readxl )

── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.4     ✔ readr     2.1.4
✔ forcats   1.0.0     ✔ stringr   1.5.1
✔ ggplot2   3.4.4     ✔ tibble    3.2.1
✔ lubridate 1.9.3     ✔ tidyr     1.3.1
✔ purrr     1.0.2     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

Attaching package:

In [2]:
source(paste0(dirname(getwd()),'/map.r'))

In [3]:
ISO_DIR <- paste0(I_DIR, 'isofox/data_isofox/')
patients <- list.files(ISO_DIR)

#### 0 - Collect and process adjusted TPMs from files

In [4]:
iso <- fread(paste0(TMP_DIR, "isofox_adj_tmp.csv"))

In [5]:
iso_base <- 
log(data.frame(
  t(iso |> 
    select(-GeneId) |>
    column_to_rownames("GeneName"))
) + 1)
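
The cell above transposes the genes-by-samples table to samples-by-genes and applies log(x + 1). A minimal sketch of that step on a made-up table follows; the gene names and values are illustrative only. It also shows one thing worth checking: `data.frame()` rewrites column names that are not syntactic R names unless `check.names = FALSE` is set, so a gene symbol containing a hyphen would no longer match the same symbol in a gene set. Whether the real data contains such names is an assumption, not something the notebook shows.

```r
# Toy genes-by-samples table; gene names and values are made up for illustration.
toy <- data.frame(
  sampleA = c(0, 9, 99),
  sampleB = c(1, 0, 3),
  row.names = c("TP53", "HLA-A", "MYC")
)

# Transpose to samples-by-genes, then apply log(x + 1).
# check.names = FALSE keeps "HLA-A" unchanged.
toy_log <- log(data.frame(t(toy), check.names = FALSE) + 1)

# Default behaviour, shown for comparison: "HLA-A" becomes "HLA.A".
toy_mangled <- log(data.frame(t(toy)) + 1)

print(colnames(toy_log))
print(colnames(toy_mangled))
```

If the names do get rewritten, `any_of()` silently skips the affected genes later on, so comparing `colnames(iso_base)` against `iso$GeneName` is a cheap sanity check.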

#### 1 - Add in Gene Sets

In [6]:
gene_sets <- readRDS(paste0(REF_DIR, 'gene_sets.Rds'))

- Add MP sets

In [7]:
cell_types <- c("B_cells", "Endothelial", "Epithelial", "Fibroblasts", "Macrophages","CD4", "CD8", "Malignant")

In [8]:
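# Read one sheet per cell type and append its columns to gene_sets,
# naming each new set "mp_<cell type>_<column name>".
for(i in cell_types){
    tmp <- read_excel(paste0(REF_DIR, "41586_2023_6130_MOESM14_ESM.xlsx"), sheet = i)
    names(tmp) <- paste0("mp_", i, "_", names(tmp))
    gene_sets <- c(gene_sets, as.list(tmp))
}
# Save the combined list so it can be reused without re-reading the workbook.
saveRDS(gene_sets, paste0(REF_DIR, 'gene_sets_full.Rds'))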

In [9]:
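# Score one gene set: per-sample mean of the log-transformed expression
# over the genes in set i that are present as columns of df.
# Returns a two-column data frame: sampleId and a column named after the set.
computer <- function( i, df ) {
  tmp <- data.frame( apply(df %>% select(any_of(gene_sets[[i]])),1,mean) )
  colnames(tmp) <- i
  tmp %>% rownames_to_column("sampleId")
}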
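
To make the scoring step concrete, here is a minimal base R sketch on toy data. `score_set`, `toy_expr` and `toy_sets` are hypothetical names used only for this illustration. It uses `rowMeans()` in place of `apply(..., 1, mean)`, which is usually faster on wide tables, and `intersect()` in the role that `any_of()` plays above (genes absent from the expression table are ignored).

```r
# Hypothetical toy inputs: 2 samples x 3 genes, and one named gene set.
toy_expr <- data.frame(
  G1 = c(1, 3), G2 = c(2, 4), G3 = c(10, 20),
  row.names = c("s1", "s2")
)
toy_sets <- list(set_a = c("G1", "G2", "NOT_MEASURED"))

# score_set is an illustrative helper, not part of the notebook.
score_set <- function(set_name, df, sets) {
  # Keep only the genes of the set that exist as columns.
  genes <- intersect(sets[[set_name]], colnames(df))
  out <- data.frame(
    sampleId = rownames(df),
    score = unname(rowMeans(df[, genes, drop = FALSE])),
    stringsAsFactors = FALSE
  )
  colnames(out)[2] <- set_name
  out
}

toy_score <- score_set("set_a", toy_expr, toy_sets)
print(toy_score)
```

Here `set_a` is the mean of G1 and G2 for each sample; G3 is not in the set and `NOT_MEASURED` is not in the table, so neither contributes.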

In [10]:
computed_sets <- list()
system.time(
for( i in names(gene_sets)){ 
  computed_sets[[i]] <- computer(i, iso_base)
})

   user  system elapsed 
 31.900   0.098  32.077 

In [11]:
gene_sets_base <- computed_sets %>% reduce(inner_join, by = "sampleId")
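
The join above folds the per-set tables into one wide table keyed on `sampleId`. A rough base R equivalent on toy tables is sketched below, with `Reduce()` and `merge()` standing in for `reduce()` and `inner_join()`; the table contents are made up. Because the join is an inner join, a sample missing from any one table is dropped from the result.

```r
# Toy per-set score tables; s3 is deliberately missing from the second one.
toy_tables <- list(
  set_a = data.frame(sampleId = c("s1", "s2", "s3"), set_a = c(1.5, 3.5, 0.2)),
  set_b = data.frame(sampleId = c("s1", "s2"), set_b = c(10, 20))
)

# merge() with default arguments keeps only keys present in both inputs.
toy_wide <- Reduce(function(x, y) merge(x, y, by = "sampleId"), toy_tables)
print(toy_wide)
```

In the notebook every table is built from the same `iso_base` rows, so no samples should be lost; comparing `nrow(gene_sets_base)` with `nrow(iso_base)` would confirm that.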

#### 2 - Write the output files

- Isofox gene expression

In [12]:
isofox_ready <- iso_base
colnames(isofox_ready) <- paste0("rna_", colnames(iso_base))
isofox_ready <- isofox_ready %>% rownames_to_column("sampleId")

In [13]:
fwrite( isofox_ready, paste0(READY_DIR, "isofox_genes_ready.csv"))

- Gene set scores

In [14]:
gene_sets_ready <- gene_sets_base
colnames(gene_sets_ready) <- c("sampleId", paste0("rna_geneset_", colnames(gene_sets_base)[-1]))

In [15]:
fwrite( gene_sets_ready, paste0(READY_DIR, "isofox_genesets_ready.csv"))