# Retrieve ATAC Metadata

To begin our analysis, we'll retrieve the .arrow files produced by our TEA-seq QC and demultiplexing pipeline, which contain ATAC data and metadata. We'll then extract the per-cell metadata for use in cell filtering and QC plots.

## Load packages

hise: The Human Immune System Explorer R SDK package  
ArchR: .arrow file handling  
purrr: Functional programming tools  


In [1]:
quiet_library <- function(...) { suppressPackageStartupMessages(library(...)) }
quiet_library(hise)
quiet_library(ArchR)
quiet_library(purrr)


                                                   / |
                                                 /    \
            .                                  /      |.
            \\\                              /        |.
              \\\                          /           `|.
                \\\                      /              |.
                  \                    /                |\
                  \\#####\           /                  ||
                ==###########>      /                   ||
                 \\##==......\    /                     ||
            ______ =       =|__ /__                     ||      \\\
       \               '        ##_______ _____ ,--,__,=##,__   ///
        ,    __==    ___,-,__,--'#'  ==='      `-'    | ##,-/
        -,____,---'       \\####\\________________,--\\_##,/
           ___      .______        ______  __    __  .______      
          /   \     |   _  \      /      ||  |  |  | |   _  \     
         /  ^  \    |  |_) 

## Retrieve files

Now, we'll use the HISE SDK package to retrieve the TEA-seq .arrow file outputs from our project store, using the file names listed in `sample_meta.csv`. These will be placed in the `cache/` subdirectory by default.

In [2]:
sample_meta <- read.csv("sample_meta.csv")
project_store <- "PedvsSenior"

In [3]:
file_res <- map(
    sample_meta$arrow_file,
    function(file_name) {
        downloadFileFromProjectStore(
            storeName = project_store,
            file_name
        )
    }
)
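
Before assembling the project, it can be worth confirming that every expected file is actually on disk. The following is a minimal sketch rather than part of the original pipeline: `find_missing_files()` is a hypothetical helper (not a hise or ArchR function), and the demonstration uses temporary placeholder files instead of real .arrow files.

```r
# Hypothetical helper (not part of hise or ArchR): return the subset of
# paths that do not exist on disk.
find_missing_files <- function(paths) {
    paths[!file.exists(paths)]
}

# Self-contained demonstration with placeholder files standing in for .arrow files.
demo_dir <- tempfile("arrow_demo_")
dir.create(demo_dir)
present <- file.path(demo_dir, "sample_a.arrow")
absent <- file.path(demo_dir, "sample_b.arrow")
invisible(file.create(present))

# Only the file that was never created should be reported.
missing_demo <- find_missing_files(c(present, absent))
```

In the notebook itself, `find_missing_files(sample_meta$arrow_file)` would be expected to return a zero-length vector, assuming the paths in `arrow_file` point to where the downloads were written.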

## Assemble metadata

Here, we take the .arrow file paths listed in `sample_meta` and assemble an ArchR Project, from which we read cell metadata using the ArchR function `getCellColData()`.

In [4]:
nrow(sample_meta)

In [5]:
arrow_files <- sample_meta$arrow_file

In [6]:
addArchRGenome("hg38")

Setting default genome to Hg38.



In [7]:
proj <- ArchRProject(
    ArrowFiles = arrow_files,
    copyArrows = FALSE
)

Using GeneAnnotation set by addArchRGenome(Hg38)!

Using GeneAnnotation set by addArchRGenome(Hg38)!

Validating Arrows...

Getting SampleNames...

1 
2 
3 
4 
5 
6 
7 
8 


Getting Cell Metadata...

1 
2 
3 
4 
5 
6 
7 
8 


Merging Cell Metadata...

Initializing ArchRProject...



In [8]:
all_metadata <- getCellColData(proj)
all_metadata <- as.data.frame(all_metadata)

In [9]:
all_metadata$archr_name <- rownames(all_metadata)

In [10]:
head(all_metadata)

| | Sample | TSSEnrichment | ReadsInTSS | ReadsInPromoter | ReadsInBlacklist | PromoterRatio | PassQC | NucleosomeRatio | nMultiFrags | nMonoFrags | nFrags | nDiFrags | BlacklistRatio | archr_name |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| | `<chr>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<chr>` |
| B065-P1_PB00173-02#b3be5d82e40911ebb24842010a19c839 | B065-P1_PB00173-02 | 7.921 | 9032 | 11417 | 1379 | 0.1308659 | 1 | 1.2015242 | 10729 | 19814 | 43621 | 13078 | 0.015806607 | B065-P1_PB00173-02#b3be5d82e40911ebb24842010a19c839 |
| B065-P1_PB00173-02#28ba8e86e40911eb903842010a19c839 | B065-P1_PB00173-02 | 12.844 | 10501 | 11525 | 453 | 0.1673589 | 1 | 2.0668923 | 8178 | 11227 | 34432 | 15027 | 0.006578183 | B065-P1_PB00173-02#28ba8e86e40911eb903842010a19c839 |
| B065-P1_PB00173-02#e4ad0042e40911eb8d8342010a19c839 | B065-P1_PB00173-02 | 24.075 | 32826 | 30262 | 527 | 0.484378 | 1 | 0.9529853 | 7024 | 15995 | 31238 | 8219 | 0.008435239 | B065-P1_PB00173-02#e4ad0042e40911eb8d8342010a19c839 |
| B065-P1_PB00173-02#9a1ee2d0e40811eba89d42010a19c839 | B065-P1_PB00173-02 | 21.471 | 27075 | 25088 | 469 | 0.4306509 | 1 | 1.0644978 | 7082 | 14109 | 29128 | 7937 | 0.008050673 | B065-P1_PB00173-02#9a1ee2d0e40811eba89d42010a19c839 |
| B065-P1_PB00173-02#9cb18a10e40911ebab5242010a19c839 | B065-P1_PB00173-02 | 25.097 | 29898 | 28581 | 473 | 0.4923005 | 1 | 0.9477957 | 6705 | 14903 | 29028 | 7420 | 0.008147306 | B065-P1_PB00173-02#9cb18a10e40911ebab5242010a19c839 |
| B065-P1_PB00173-02#880184a8e40911ebba0842010a19c839 | B065-P1_PB00173-02 | 22.03 | 27947 | 26145 | 476 | 0.4543164 | 1 | 1.0240574 | 6863 | 14216 | 28774 | 7695 | 0.008271356 | B065-P1_PB00173-02#880184a8e40911ebba0842010a19c839 |
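
Since this metadata feeds cell filtering and QC plots, a per-sample summary is a natural first look. The sketch below is illustrative only and not part of the original notebook: `qc_demo` is a small stand-in for `all_metadata`, built from the `TSSEnrichment` and `nFrags` values of the six rows shown in the `head()` output, and the choice of the median as the summary statistic is an assumption.

```r
# Stand-in for all_metadata, using values copied from the six rows of head() output.
qc_demo <- data.frame(
    Sample = rep("B065-P1_PB00173-02", 6),
    TSSEnrichment = c(7.921, 12.844, 24.075, 21.471, 25.097, 22.03),
    nFrags = c(43621, 34432, 31238, 29128, 29028, 28774)
)

# Median of each QC metric within each sample.
qc_summary <- aggregate(
    cbind(TSSEnrichment, nFrags) ~ Sample,
    data = qc_demo,
    FUN = median
)

# Number of cells contributing to each sample's summary.
qc_summary$n_cells <- as.vector(table(qc_demo$Sample)[qc_summary$Sample])
```

Applied to the full table, the same `aggregate()` call with `data = all_metadata` would give one row per sample, which makes outlier wells easy to spot before any filtering thresholds are chosen.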


## Write output file

Write the metadata as a .csv for later use. We set `row.names = FALSE` and `quote = FALSE` to simplify the output and increase compatibility with other tools.

In [11]:
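# Create the output directory only if it does not already exist,
# which avoids the "'output' already exists" warning on re-runs
if (!dir.exists("output")) {
    dir.create("output")
}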


In [12]:
write.csv(
    all_metadata,
    "output/atac_cell_metadata.csv",
    row.names = FALSE,
    quote = FALSE
)
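
Because `quote = FALSE` leaves fields unprotected, any value containing a comma would shift columns on re-import. One inexpensive guard is to read the file back and compare it to what was written. The sketch below is a self-contained illustration, not part of the original notebook: `demo_meta` is a two-row stand-in for `all_metadata`, and it writes to a temporary file rather than `output/atac_cell_metadata.csv`.

```r
# Two-row stand-in for all_metadata, using cell names from the table above.
demo_meta <- data.frame(
    Sample = c("B065-P1_PB00173-02", "B065-P1_PB00173-02"),
    nFrags = c(43621, 34432),
    archr_name = c(
        "B065-P1_PB00173-02#b3be5d82e40911ebb24842010a19c839",
        "B065-P1_PB00173-02#28ba8e86e40911eb903842010a19c839"
    )
)

demo_path <- tempfile(fileext = ".csv")
write.csv(demo_meta, demo_path, row.names = FALSE, quote = FALSE)

# read.csv() defaults to comment.char = "", so the "#" in ArchR cell names
# is read as data rather than as the start of a comment.
reread <- read.csv(demo_path)

# The round trip should preserve shape, column names, and cell names.
round_trip_ok <- identical(dim(reread), dim(demo_meta)) &&
    identical(names(reread), names(demo_meta)) &&
    identical(reread$archr_name, demo_meta$archr_name)
```

If `round_trip_ok` were `FALSE` for the real file, switching back to the default `quote = TRUE` would be the simplest remedy.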

## Store results in HISE

Finally, we store the output file in our Collaboration Space for later retrieval and use. We need to provide the UUID for our Collaboration Space (aka `studySpaceId`), as well as a title for this step in our analysis process.

The hise function `uploadFiles()` also requires the FileIDs from the original fileset for reference, which we assemble below from the `arrow_uuid` column of `sample_meta` (`in_list`).

In [13]:
study_space_uuid <- "4743c203-6af9-469c-b71d-0f66e3518820"
title <- "TEA-seq unfiltered A cell metadata"

In [14]:
search_id <- ids::adjective_animal()
search_id

In [15]:
in_list <- as.list(sample_meta$arrow_uuid)

In [16]:
in_list

In [17]:
out_list <- list("output/atac_cell_metadata.csv")

In [18]:
out_list
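
Before handing these lists to an upload step, a quick sanity check on the identifiers can catch malformed entries early. This is a hedged sketch rather than anything hise provides: `is_uuid()` is a hypothetical helper, and it assumes the values in `arrow_uuid` follow the same 8-4-4-4-12 hexadecimal layout as the `study_space_uuid` shown above.

```r
# Hypothetical helper (not a hise function): TRUE for strings in the
# 8-4-4-4-12 hexadecimal UUID layout, FALSE otherwise.
is_uuid <- function(x) {
    grepl(
        "^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$",
        x,
        ignore.case = TRUE
    )
}

# Demonstration: the study space UUID from above passes, a malformed string does not.
study_space_demo <- "4743c203-6af9-469c-b71d-0f66e3518820"
uuid_checks <- is_uuid(c(study_space_demo, "not-a-uuid"))
```

In the notebook, `all(is_uuid(unlist(in_list)))` and `all(file.exists(unlist(out_list)))` would both be expected to return `TRUE` before uploading.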

In [19]:
sessionInfo()

R version 4.3.2 (2023-10-31)
Platform: x86_64-conda-linux-gnu (64-bit)
Running under: Ubuntu 20.04.6 LTS

Matrix products: default
BLAS/LAPACK: /opt/conda/lib/libopenblasp-r0.3.25.so;  LAPACK version 3.11.0

locale:
 [1] LC_CTYPE=C.UTF-8       LC_NUMERIC=C           LC_TIME=C.UTF-8       
 [4] LC_COLLATE=C.UTF-8     LC_MONETARY=C.UTF-8    LC_MESSAGES=C.UTF-8   
 [7] LC_PAPER=C.UTF-8       LC_NAME=C              LC_ADDRESS=C          
[10] LC_TELEPHONE=C         LC_MEASUREMENT=C.UTF-8 LC_IDENTIFICATION=C   

time zone: Etc/UTC
tzcode source: system (glibc)

attached base packages:
[1] stats4    grid      stats     graphics  grDevices utils     datasets 
[8] methods   base     

other attached packages:
 [1] BSgenome.Hsapiens.UCSC.hg38_1.4.5 BSgenome_1.70.1                  
 [3] rtracklayer_1.62.0                BiocIO_1.12.0                    
 [5] Biostrings_2.70.1                 XVector_0.42.0                   
 [7] purrr_1.0.2                       rhdf5_2.46.1                   