# Retrieve RNA and ADT Metadata

To begin our analysis, we'll retrieve the .h5 files produced by our TEA-seq QC and demultiplexing pipeline, which contain RNA and ADT data along with cell metadata. We'll then extract the cell metadata for use in cell filtering and QC plots.

## Setup

Install BarMixer if not present. BarMixer is an R package that is part of the BarWare tools for barcoded scRNA-seq data, and has helper functions for easily reading cell metadata from our .h5 files.

BarMixer repository: https://github.com/AllenInstitute/BarMixer  
BarWare paper: [Swanson, et al., BMC Bioinformatics (2022)](https://bmcbioinformatics.biomedcentral.com/articles/10.1186/s12859-022-04620-2)

In [1]:
# Install BarMixer from GitHub only if it is not already available
if(!requireNamespace("BarMixer", quietly = TRUE)) {
    devtools::install_github(
        "AllenInstitute/BarMixer",
        upgrade = "never"
    )
}

## Load packages

hise: The Human Immune System Explorer R SDK package  
BarMixer: .h5 file handling  
purrr: Functional programming tools  


In [2]:
library(hise)
library(BarMixer)
library(purrr)

Loading required package: data.table

Loading required package: Matrix

Loading required package: rhdf5


Attaching package: ‘BarMixer’


The following objects are masked from ‘package:rhdf5’:

    h5dump, h5ls



Attaching package: ‘purrr’


The following object is masked from ‘package:data.table’:

    transpose




## Retrieve files

Now, we'll use the HISE SDK package to retrieve the TEA-seq .h5 file outputs based on their file UUIDs. These will be placed in the `cache/` subdirectory by default.

In [3]:
input_file_uuids <- list(
    "0a4dcc97-bc5a-4aa6-a3d3-16cf612502ce", 
    "0af11a1e-c721-40e1-9455-032416ab1aa1", 
    "1981a346-0a61-4360-a11c-2e846af8aa52", 
    "1be65739-445a-41fe-89a4-52fbbba2000d", 
    "5ed5b8a3-1a79-40fb-b47f-c38dda0f012c", 
    "636e9db3-1870-4691-832c-814588039474", 
    "63a493fb-f7cb-4544-bd3a-c55fc9c26426", 
    "7d9548ce-8468-45d6-9489-fa638e421934", 
    "d3a284cf-b101-44d4-9f88-d3683b371fa6", 
    "d5ddec3e-c699-4f04-89fa-34c9d3cfeec1", 
    "de3962ef-1b9f-4e16-a660-6d839d52e432", 
    "f9a1b436-a5cf-4950-99ba-add2c5ac0707"
)
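Before requesting files, it can help to confirm that every entry is a well-formed, unique UUID, since a typo here would only surface later as a failed retrieval. The following is a minimal sketch in base R, not part of the HISE SDK; `uuid_pattern` and `example_uuids` are names introduced here for illustration, and the three IDs are copied from the list above.

```r
# Lowercase 8-4-4-4-12 hexadecimal UUID layout (an assumption about the ID format,
# based on the IDs shown above)
uuid_pattern <- "^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$"

# A few of the file UUIDs from the list above
example_uuids <- list(
    "0a4dcc97-bc5a-4aa6-a3d3-16cf612502ce",
    "0af11a1e-c721-40e1-9455-032416ab1aa1",
    "1981a346-0a61-4360-a11c-2e846af8aa52"
)

# Flatten the list to a character vector for vectorized checks
uuid_vec <- unlist(example_uuids)

# TRUE for each entry that matches the expected layout
is_well_formed <- grepl(uuid_pattern, uuid_vec)

# TRUE for any entry that repeats an earlier one
is_duplicate <- duplicated(uuid_vec)

stopifnot(all(is_well_formed), !any(is_duplicate))
```

The same checks could be run on `input_file_uuids` directly before calling `cacheFiles()`.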

In [4]:
fres <- hise::cacheFiles(
    input_file_uuids
)

## Assemble metadata

Here, we list each of the files in `cache/` and read cell metadata using the BarMixer function `read_h5_cell_meta()`. purrr's `map_dfr()` handles iteration over the files, and assembles a single table with metadata for all cells by row concatenation.

In [5]:
h5_files <- list.files(
    "cache/",
    pattern = "\\.h5$",
    recursive = TRUE,
    full.names = TRUE)
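Note that the `pattern` argument of `list.files()` is a regular expression rather than a shell glob, so an unescaped `.` matches any character. The sketch below uses made-up file names to show the difference between an unescaped and an escaped dot; it only illustrates the matching behavior and does not touch the `cache/` directory.

```r
# Hypothetical file names, for illustration only
candidates <- c("sample.h5", "sample_h5", "notes.h5ad", "xh5")

# Unescaped dot: any single character followed by "h5" at the end of the name
loose_match <- grepl(".h5$", candidates)

# Escaped dot: a literal "." followed by "h5" at the end of the name
strict_match <- grepl("\\.h5$", candidates)

# Names kept by each pattern
candidates[loose_match]
candidates[strict_match]
```

Only the escaped form restricts the match to names that end in a literal `.h5` extension.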

In [6]:
all_metadata <- map_dfr(
    h5_files,
    read_h5_cell_meta
)
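To make the row-concatenation step concrete, here is a small sketch with a toy reader in place of `read_h5_cell_meta()`. The function `read_toy_meta()` and the well names are hypothetical. It uses `map()` followed by `list_rbind()`, which purrr 1.0.0 introduced as the recommended replacement for the superseded `map_dfr()`; the result is the same kind of stacked table.

```r
library(purrr)

# Hypothetical stand-in for read_h5_cell_meta():
# returns a small data.frame of per-cell metadata for one "file"
read_toy_meta <- function(well_id) {
    data.frame(
        barcodes = paste0(well_id, "_cell", 1:2),
        well_id = well_id,
        n_umis = c(100L, 200L)
    )
}

# Made-up inputs that play the role of the .h5 file paths
wells <- c("W1", "W2", "W3")

# map() returns one data.frame per input; list_rbind() stacks them by row
toy_metadata <- list_rbind(map(wells, read_toy_meta))

nrow(toy_metadata)
```

Each input contributes its rows in order, so cells from the same file stay together in the combined table.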

In [7]:
head(all_metadata)

| | barcodes | adt_qc_flag | adt_umis | batch_id | cell_name | chip_id | hto_barcode | hto_category | n_genes | n_mito_umis | ⋯ | n_umis | original_barcodes | pbmc_sample_id | pool_id | rna_cell_uuid | seurat_pbmc_type | seurat_pbmc_type_score | umap_1 | umap_2 | well_id |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| | `<chr>` | `<chr>` | `<int>` | `<chr>` | `<chr>` | `<chr>` | `<chr>` | `<chr>` | `<int>` | `<int>` | ⋯ | `<int>` | `<chr>` | `<chr>` | `<chr>` | `<chr>` | `<chr>` | `<dbl>` | `<dbl>` | `<dbl>` | `<chr>` |
| 1 | 2da9d348fb8111eda35df29f570c0793 | Good | 1746 | EXP-00454 | jovial_jockeyish_urus | EXP-00454-P1C1 | TGTCTTTCCTGCCAG | singlet | 2157 | 363 | ⋯ | 4974 | AAACAGCCATAGTCAT | PC02184-044 | EXP-00454-P1 | 2582f1a6fb8911edb940c6bd9515220e | CD4 Memory | 0.6268495 | -1.6470808 | -8.41892 | EXP-00454-P1C1W1 |
| 2 | 2daec6d2fb8111eda35df29f570c0793 | Good | 1506 | EXP-00454 | possessive_sirenic_esok | EXP-00454-P1C1 | TGTCTTTCCTGCCAG | singlet | 2134 | 161 | ⋯ | 4511 | AAACCAACAGCTCATA | PC02184-044 | EXP-00454-P1 | 2582f462fb8911edb940c6bd9515220e | CD4 Memory | 0.6597796 | -0.816071 | -8.692452 | EXP-00454-P1C1W1 |
| 3 | 2db119d2fb8111eda35df29f570c0793 | Good | 1661 | EXP-00454 | petaline_lawabiding_snowyowl | EXP-00454-P1C1 | TGTCTTTCCTGCCAG | singlet | 2091 | 302 | ⋯ | 4289 | AAACCGAAGGAAGCAC | PC02184-044 | EXP-00454-P1 | 2582f624fb8911edb940c6bd9515220e | CD4 Memory | 0.8507561 | -4.9873077 | 2.534072 | EXP-00454-P1C1W1 |
| 4 | 2db4ad86fb8111eda35df29f570c0793 | Good | 1866 | EXP-00454 | vixenish_ardent_seahorse | EXP-00454-P1C1 | TGTCTTTCCTGCCAG | singlet | 2033 | 308 | ⋯ | 3931 | AAACCGCGTTAAGCGC | PC02184-044 | EXP-00454-P1 | 2582f7aafb8911edb940c6bd9515220e | CD4 Memory | 0.6342125 | -4.7517462 | 5.226646 | EXP-00454-P1C1W1 |
| 5 | 2db582c4fb8111eda35df29f570c0793 | Good | 1400 | EXP-00454 | stimulated_maroon_jerboa | EXP-00454-P1C1 | TGTCTTTCCTGCCAG | singlet | 1560 | 179 | ⋯ | 2904 | AAACCGCGTTGGCCGA | PC02184-044 | EXP-00454-P1 | 2582f840fb8911edb940c6bd9515220e | CD4 Memory | 0.6119409 | 0.8885885 | -6.354211 | EXP-00454-P1C1W1 |
| 6 | 2db5b3acfb8111eda35df29f570c0793 | Good | 2220 | EXP-00454 | antigorite_erect_earthworm | EXP-00454-P1C1 | TGTCTTTCCTGCCAG | singlet | 2053 | 213 | ⋯ | 4329 | AAACCGGCAACTAGCC | PC02184-044 | EXP-00454-P1 | 2582f94efb8911edb940c6bd9515220e | CD4 Memory | 0.4161609 | 3.541462 | -8.356009 | EXP-00454-P1C1W1 |
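Once the combined table is assembled, a couple of quick sanity checks can catch problems before filtering. The sketch below runs on a toy table that borrows the column names `barcodes`, `well_id`, and `hto_category` from the output above; all values are made up, and the `"doublet"` category is an assumed example rather than a value seen in this dataset.

```r
# Toy metadata with made-up values; column names follow the table above
toy_metadata <- data.frame(
    barcodes = c("bc1", "bc2", "bc3", "bc4"),
    well_id = c("W1", "W1", "W2", "W2"),
    hto_category = c("singlet", "singlet", "doublet", "singlet")
)

# Cell barcodes should remain unique after concatenating all wells
n_duplicated <- sum(duplicated(toy_metadata$barcodes))

# Number of cells per well, split by HTO category
cells_per_well <- table(toy_metadata$well_id, toy_metadata$hto_category)

n_duplicated
cells_per_well
```

Applied to `all_metadata`, the same two calls would show whether any well contributes unexpectedly few cells or whether barcodes collide across files.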


## Write output file

Write the metadata as a .csv for later use. We set `row.names = FALSE` to omit row names and `quote = FALSE` to leave text fields unquoted, which simplifies the output and increases compatibility with other tools.

In [8]:
# Create the output directory only if it does not already exist
if(!dir.exists("output")) {
    dir.create("output")
}


In [9]:
write.csv(
    all_metadata,
    "output/vrd_te_rna_adt_cell_metadata.csv",
    row.names = FALSE,
    quote = FALSE
)
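One caveat with `quote = FALSE` is that a text field containing a comma would split into extra columns when the file is read back. The following sketch, on a toy table with made-up values, checks for commas before writing and then confirms that the file round-trips through `read.csv()`. The path comes from `tempfile()`, so nothing is written to `output/`.

```r
# Toy table with made-up values; column names follow the metadata above
toy <- data.frame(
    barcodes = c("a1", "b2"),
    seurat_pbmc_type = c("CD4 Memory", "CD8 Naive"),
    n_umis = c(4974L, 4511L)
)

# Flag character columns that contain a comma, which would break unquoted CSV
has_comma <- vapply(
    toy,
    function(x) is.character(x) && any(grepl(",", x, fixed = TRUE)),
    logical(1)
)
stopifnot(!any(has_comma))

# Write without row names or quotes, then read the file back
out_path <- tempfile(fileext = ".csv")
write.csv(toy, out_path, row.names = FALSE, quote = FALSE)
round_trip <- read.csv(out_path)

# TRUE if the table survives the round trip unchanged
isTRUE(all.equal(toy, round_trip))
```

If the comma check fails for real data, dropping `quote = FALSE` is the simplest remedy.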

## Store results in HISE

Finally, we store the output file in our Collaboration Space for later retrieval and use. We need to provide the UUID for our Collaboration Space (aka `studySpaceId`), as well as a title for this step in our analysis process.

The hise function `uploadFiles()` also requires the file IDs of the original input files for reference. We assembled these above, before retrieving the files, as `input_file_uuids`.

In [10]:
study_space_uuid <- "40df6403-29f0-4b45-ab7d-f46d420c422e"
title <- "VRd TEA-seq unfiltered TE cell metadata"

In [11]:
out_files <- list.files(
    "output",
    full.names = TRUE
)
out_list <- as.list(out_files)

In [12]:
uploadFiles(
    files = out_list,
    studySpaceId = study_space_uuid,
    title = title,
    inputFileIds = input_file_uuids,
    store = "project",
    doPrompt = FALSE
)

In [13]:
sessionInfo()

R version 4.2.2 Patched (2022-11-10 r83330)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 20.04.5 LTS

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/liblapack.so.3

locale:
 [1] LC_CTYPE=C.UTF-8       LC_NUMERIC=C           LC_TIME=C.UTF-8       
 [4] LC_COLLATE=C.UTF-8     LC_MONETARY=C.UTF-8    LC_MESSAGES=C.UTF-8   
 [7] LC_PAPER=C.UTF-8       LC_NAME=C              LC_ADDRESS=C          
[10] LC_TELEPHONE=C         LC_MEASUREMENT=C.UTF-8 LC_IDENTIFICATION=C   

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] purrr_1.0.1       BarMixer_1.0.1    rhdf5_2.42.0      Matrix_1.5-3     
[5] data.table_1.14.8 hise_2.16.0      

loaded via a namespace (and not attached):
 [1] pillar_1.9.0        compiler_4.2.2      base64enc_0.1-3    
 [4] rhdf5filters_1.10.0 bitops_1.0-7        tools_4.2.2        
 [7] di