In [1]:
quiet_library <- function(...) { suppressPackageStartupMessages(library(...)) }
quiet_library(dplyr)
quiet_library(hise)
quiet_library(purrr)

In [2]:
BR1_rna_desc <- getFileDescriptors(
    fileType = "scRNA-seq-labeled", 
    filter = list(cohort.cohortGuid = "BR1"))
BR2_rna_desc <- getFileDescriptors(
    fileType = "scRNA-seq-labeled", 
    filter = list(cohort.cohortGuid = "BR2"))

In [3]:
BR1_rna_desc <- fileDescToDataframe(BR1_rna_desc)
BR2_rna_desc <- fileDescToDataframe(BR2_rna_desc)

In [5]:
meta_data <- rbind(BR1_rna_desc, BR2_rna_desc)

## Filter batches and subjects

We exclude the following from the metadata:  
Batches starting with "EXP": these are non-pipeline batches used for experimental method testing.  
Batch B004: this very early batch has technical problems, and its samples have been re-run in later batches.  
Subjects BR2007, BR2049, and BR1034: these subjects have non-healthy characteristics that make them outliers in our healthy cohorts.

In [6]:
meta_data <- meta_data %>%
  filter(!grepl("^EXP", file.batchID)) %>%
  filter(file.batchID != "B004") %>%
  filter(!subject.subjectGuid %in% c("BR2007", "BR2049", "BR1034"))

Remove duplicate samples, keeping only the entry from the latest batch for each sample kit

In [7]:
meta_data <- meta_data %>%
  group_by(sample.sampleKitGuid) %>%
  arrange(desc(file.batchID)) %>%
  slice(1) %>%
  ungroup()
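
As a quick illustration of the de-duplication step above, here is a minimal sketch on a toy data frame. The kit and batch values are hypothetical, not taken from HISE. Note that `desc(file.batchID)` sorts batch IDs as strings, so this approach assumes batch IDs are zero-padded to the same width (for example "B010" and "B025").

```r
library(dplyr)

# Hypothetical toy metadata: kit KT001 was run in two batches.
toy <- data.frame(
  sample.sampleKitGuid = c("KT001", "KT001", "KT002"),
  file.batchID = c("B010", "B025", "B012"),
  stringsAsFactors = FALSE
)

# Same pattern as the cell above: sort by batch (descending),
# then keep the first row within each sample kit.
latest <- toy %>%
  group_by(sample.sampleKitGuid) %>%
  arrange(desc(file.batchID)) %>%
  slice(1) %>%
  ungroup()

# Sanity check: each kit should now appear exactly once.
stopifnot(!any(duplicated(latest$sample.sampleKitGuid)))
```

The same `stopifnot()` check can be run on `meta_data` after the real step to confirm that no sample kit is left with more than one file.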

Add a pbmc_sample_id column based on the file name, and align the visit names

In [8]:
meta_data <- meta_data %>%
  mutate(pbmc_sample_id = gsub("_", "", paste0("PB0", substr(sub(".*PB0", "", file.name), 1, 8)))) %>%
  mutate(sample.visitName = ifelse(sample.visitName == "Other - Non-Flu",
                                  sample.visitDetails,
                                  sample.visitName))
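
To make the string handling in the `pbmc_sample_id` step easier to follow, here is a minimal sketch that walks through it on a single file name. The file name is hypothetical and real HISE file names may be structured differently; the only assumption is that the name contains "PB0" followed by the rest of the sample ID.

```r
# Hypothetical file name, used only for illustration.
file_name <- "example_PB00123-01_labeled.h5"

# 1. Drop everything up to and including the last "PB0".
tail_part <- sub(".*PB0", "", file_name)

# 2. Keep the next 8 characters, which may include a trailing "_".
id_part <- substr(tail_part, 1, 8)

# 3. Restore the "PB0" prefix and remove any underscores.
pbmc_sample_id <- gsub("_", "", paste0("PB0", id_part))

print(pbmc_sample_id)
```

Because `.*` is greedy, the pattern anchors on the last occurrence of "PB0" in the name, which matters if a file name contains that string more than once.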

Select samples from every visit other than Flu Year 1 Day 0; these are the samples we need to label

In [9]:
meta_data_non_Y1D0 <- meta_data %>% 
  filter(sample.visitName != 'Flu Year 1 Day 0')

In [10]:
table(meta_data_non_Y1D0$cohort.cohortGuid)


BR1 BR2 
371 405 

In [15]:
head(meta_data$sample.drawDate)

In [33]:
write.csv(meta_data_non_Y1D0, paste0("hise_meta_data_", Sys.Date(), "_nonY1D0.csv"))

## To Be Updated

We need to replace the calculations below with HISE-native calculation of days relative to COVID-19 vaccine doses, CMV status, and BMI.

BMI can be calculated using Height and Weight in labs.  
CMV status is stored in HISE in labs.  
COVID-19 vaccination dates, expressed in days relative to the first visit, need to be added to HISE.
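
As a rough sketch of what two of these calculations could look like, the example below computes BMI from height and weight and expresses dates as day offsets. All column names, units (centimeters, kilograms), and values are assumptions made for illustration; they are not the actual HISE lab fields, and the real implementation may differ.

```r
# Hypothetical lab values; column names and units are assumptions.
labs <- data.frame(
  subject = c("S1", "S2"),
  height_cm = c(170, 160),
  weight_kg = c(65, 80)
)

# BMI = weight (kg) / height (m)^2
labs$bmi <- labs$weight_kg / (labs$height_cm / 100)^2

# Hypothetical dates for one subject.
first_visit_date <- as.Date("2021-01-10")
dose1_date <- as.Date("2021-03-01")
draw_date <- as.Date("2021-03-15")

# Vaccine dose in days relative to the first visit.
dose1_day <- as.numeric(dose1_date - first_visit_date)

# Sample draw in days relative to the vaccine dose.
days_since_dose1 <- as.numeric(draw_date - dose1_date)
```

Subtracting two `Date` objects returns a `difftime` in days, so `as.numeric()` is used to get a plain number that can be stored alongside the other metadata columns.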

In [16]:
sessionInfo()

R version 4.3.2 (2023-10-31)
Platform: x86_64-conda-linux-gnu (64-bit)
Running under: Ubuntu 20.04.6 LTS

Matrix products: default
BLAS/LAPACK: /opt/conda/lib/libopenblasp-r0.3.25.so;  LAPACK version 3.11.0

locale:
 [1] LC_CTYPE=C.UTF-8       LC_NUMERIC=C           LC_TIME=C.UTF-8       
 [4] LC_COLLATE=C.UTF-8     LC_MONETARY=C.UTF-8    LC_MESSAGES=C.UTF-8   
 [7] LC_PAPER=C.UTF-8       LC_NAME=C              LC_ADDRESS=C          
[10] LC_TELEPHONE=C         LC_MEASUREMENT=C.UTF-8 LC_IDENTIFICATION=C   

time zone: Etc/UTC
tzcode source: system (glibc)

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] purrr_1.0.2 hise_2.16.0 dplyr_1.1.4

loaded via a namespace (and not attached):
 [1] crayon_1.5.2     vctrs_0.6.5      httr_1.4.7       cli_3.6.2       
 [5] rlang_1.1.3      stringi_1.8.3    generics_0.1.3   assertthat_0.2.1
 [9] jsonlite_1.8.8   glue_1.7.0       RCurl_1.98-1.14  plyr_1.8.9      
[13] htmlt