# Retrieve ATAC Metadata

To begin our analysis, we'll retrieve the .arrow files, which contain the ATAC data and metadata produced by our TEA-seq QC and demultiplexing pipeline. We'll then extract the per-cell metadata to use for cell filtering and QC plots.

## Load packages

hise: The Human Immune System Explorer R SDK package  
ArchR: .arrow file handling  
purrr: Functional programming tools  


In [1]:
library(hise)
library(ArchR)
library(purrr)


                                                   / |
                                                 /    \
            .                                  /      |.
            \\\                              /        |.
              \\\                          /           `|.
                \\\                      /              |.
                  \                    /                |\
                  \\#####\           /                  ||
                ==###########>      /                   ||
                 \\##==......\    /                     ||
            ______ =       =|__ /__                     ||      \\\
       \               '        ##_______ _____ ,--,__,=##,__   ///
        ,    __==    ___,-,__,--'#'  ==='      `-'    | ##,-/
        -,____,---'       \\####\\________________,--\\_##,/
           ___      .______        ______  __    __  .______      
          /   \     |   _  \      /      ||  |  |  | |   _  \     
         /  ^  \    |  |_) 

## Retrieve files

Now, we'll use the HISE SDK package to retrieve the TEA-seq .arrow file outputs based on their file UUIDs. These will be placed in the `cache/` subdirectory by default.

In [2]:
input_file_uuids <- list(
    "052b769d-cbdf-41f6-8fe8-0d34564b442a",
    "16e0c562-5d36-431f-bb27-b443aabc7077",
    "30167a93-70a8-4c38-b615-2252dabe417e",
    "57dde81e-bdaa-4138-add6-2551968672f4",
    "6853fb68-fa85-43d5-9071-c5e42667a75e",
    "6d8185bf-8a35-492d-a6ab-5783006b3b8e",
    "8c2a93be-de53-4d6f-ae2e-ab40a356edc3",
    "9ab975f8-7763-4892-96a7-a7438ecc9470",
    "a56fd2ba-a055-4ff8-9ab1-69df113bc032",
    "ad2f347d-e961-4a70-b294-4df83c12355d",
    "c299a55a-3325-4eb3-ba8e-c8ceccafaa8c",
    "ff8fe67e-cfe0-482a-bad1-aa189390a1c0"
)
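
Before requesting files, it can help to confirm that every ID is a well-formed, unique UUID, since a typo would otherwise only surface during download. The following is a minimal base-R sketch; `is_uuid()` is a hypothetical helper written for this illustration and is not part of the hise package.

```r
# Hypothetical helper (not part of hise): TRUE for strings in 8-4-4-4-12 hex form
is_uuid <- function(x) {
    grepl(
        "^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$",
        x,
        ignore.case = TRUE
    )
}

# Two IDs copied from the list above, plus a deliberately malformed one
example_ids <- list(
    "052b769d-cbdf-41f6-8fe8-0d34564b442a",
    "16e0c562-5d36-431f-bb27-b443aabc7077",
    "not-a-uuid"
)

id_vec <- unlist(example_ids)
valid <- is_uuid(id_vec)                     # one logical value per ID
bad_ids <- id_vec[!valid]                    # IDs that fail the format check
has_duplicates <- anyDuplicated(id_vec) > 0  # TRUE if any ID is repeated
```

The same checks could be applied to `input_file_uuids` by passing `unlist(input_file_uuids)` to `is_uuid()`.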

In [3]:
fres <- hise::cacheFiles(
    input_file_uuids
)

[1] "Initiating file download for EXP-00454-P1_PC02184-038_archr.arrow"
[1] "Download successful."
[1] "Initiating file download for EXP-00454-P1_PC02184-040_archr.arrow"
[1] "Download successful."
[1] "Initiating file download for EXP-00454-P1_PC02184-041_archr.arrow"
[1] "Download successful."
[1] "Initiating file download for EXP-00454-P1_PC02184-039_archr.arrow"
[1] "Download successful."
[1] "Initiating file download for EXP-00454-P1_PC02184-045_archr.arrow"
[1] "Download successful."
[1] "Initiating file download for EXP-00454-P1_PC02184-046_archr.arrow"
[1] "Download successful."
[1] "Initiating file download for EXP-00454-P1_PC02184-048_archr.arrow"
[1] "Download successful."
[1] "Initiating file download for EXP-00454-P1_PC02184-044_archr.arrow"
[1] "Download successful."
[1] "Initiating file download for EXP-00454-P1_PC02184-043_archr.arrow"
[1] "Download successful."
[1] "Initiating file download for EXP-00454-P1_PC02184-049_archr.arrow"
[1] "Download successful."
[1] "Initi

## Assemble metadata

Here, we list each of the files in `cache/` and assemble an ArchR Project to read cell metadata using the ArchR function `getCellColData()`.

In [4]:
arrow_files <- list.files(
    "cache/",
    pattern = "\\.arrow$", # escape the dot so only the .arrow extension matches
    recursive = TRUE,
    full.names = TRUE)
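
The `pattern` argument of `list.files()` is a regular expression, so an unescaped `.` matches any character rather than a literal dot. The short base-R sketch below uses hypothetical file names to show the difference between a loose and a strict extension pattern.

```r
# Hypothetical file names, for illustration only
candidates <- c(
    "sample_archr.arrow",
    "sample_archr.arrow.tmp",
    "notes_on_marrow"
)

loose  <- grepl(".arrow$", candidates)    # "." matches any single character
strict <- grepl("\\.arrow$", candidates)  # "\\." matches only a literal dot

# Names that only the loose pattern accepts
false_positives <- candidates[loose & !strict]
```

In a cache directory that holds only .arrow files the two patterns return the same result. The strict form matters only if other files land in the same location.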

In [9]:
addArchRGenome("hg38")

Setting default genome to Hg38.



In [10]:
proj <- ArchRProject(
    ArrowFiles = arrow_files,
    copyArrows = FALSE
)

Using GeneAnnotation set by addArchRGenome(Hg38)!

Using GeneAnnotation set by addArchRGenome(Hg38)!

Validating Arrows...

Getting SampleNames...

1 
2 
3 
4 
5 
6 
7 
8 
9 
10 
11 
12 


Getting Cell Metadata...

1 
2 
3 
4 
5 
6 
7 
8 
9 
10 
11 
12 


Merging Cell Metadata...

Initializing ArchRProject...



In [11]:
all_metadata <- getCellColData(proj)
all_metadata <- as.data.frame(all_metadata)

In [13]:
all_metadata$archr_name <- rownames(all_metadata)

In [14]:
head(all_metadata)

| | Sample | well_id | TSSEnrichment | tss_frac | tss_count | singlet | ReadsInTSS | ReadsInPromoter | ReadsInBlacklist | PromoterRatio | ⋯ | DoubletScore | DoubletEnrichment | chip_id | cell_name | BlacklistRatio | batch_id | barcodes | altius_frac | altius_count | archr_name |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| | `<chr>` | `<chr>` | `<dbl>` | `<dbl>` | `<int>` | `<lgl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | ⋯ | `<dbl>` | `<dbl>` | `<chr>` | `<chr>` | `<dbl>` | `<chr>` | `<chr>` | `<dbl>` | `<int>` | `<chr>` |
| EXP-00454-P1_PC02184-038#3bca8a6cfb8111eda35df29f570c0793 | EXP-00454-P1_PC02184-038 | EXP-00454-AP1C1W1 | 5.055 | 0.200114 | 15797 | FALSE | 13762 | 18845 | 1198 | 0.1193931 | ⋯ | 45.214929 | 3.846154 | EXP-00454-AP1C1 | cement_fanciful_waxwing | 0.007589965 | EXP | 3bca8a6cfb8111eda35df29f570c0793 | 0.5805168 | 45826 | EXP-00454-P1_PC02184-038#3bca8a6cfb8111eda35df29f570c0793 |
| EXP-00454-P1_PC02184-038#12ffc4aefb8011ed8e8a7625f34dfd6b | EXP-00454-P1_PC02184-038 | EXP-00454-AP1C2W5 | 5.047 | 0.2078985 | 12013 | FALSE | 9590 | 14194 | 803 | 0.1228514 | ⋯ | 48.403124 | 3.866667 | EXP-00454-AP1C2 | furtive_droughty_ambushbug | 0.006950094 | EXP | 12ffc4aefb8011ed8e8a7625f34dfd6b | 0.5877334 | 33961 | EXP-00454-P1_PC02184-038#12ffc4aefb8011ed8e8a7625f34dfd6b |
| EXP-00454-P1_PC02184-038#8b8240ccfb8111ed99031af2246d8fa2 | EXP-00454-P1_PC02184-038 | EXP-00454-AP1C1W4 | 9.091 | 0.2597108 | 14315 | TRUE | 17065 | 18682 | 1059 | 0.1695066 | ⋯ | 0.0 | 0.6 | EXP-00454-AP1C1 | unlegal_indefinite_hornedtoad | 0.00960858 | EXP | 8b8240ccfb8111ed99031af2246d8fa2 | 0.6442787 | 35512 | EXP-00454-P1_PC02184-038#8b8240ccfb8111ed99031af2246d8fa2 |
| EXP-00454-P1_PC02184-038#43f9d7d4fb7b11ed933dc6f18f77c1a7 | EXP-00454-P1_PC02184-038 | EXP-00454-AP1C2W7 | 5.915 | 0.2159906 | 9366 | FALSE | 8364 | 11216 | 547 | 0.1293477 | ⋯ | 169.508656 | 7.3 | EXP-00454-AP1C2 | military_biggish_ling | 0.006308239 | EXP | 43f9d7d4fb7b11ed933dc6f18f77c1a7 | 0.5943546 | 25773 | EXP-00454-P1_PC02184-038#43f9d7d4fb7b11ed933dc6f18f77c1a7 |
| EXP-00454-P1_PC02184-038#df220c84fb8711edbbcffe1e90e63656 | EXP-00454-P1_PC02184-038 | EXP-00454-AP1C2W1 | 7.109 | 0.2302855 | 9631 | TRUE | 9743 | 12190 | 637 | 0.1457646 | ⋯ | 0.0 | 1.057143 | EXP-00454-AP1C2 | groundable_tearful_stilt | 0.007617066 | EXP | df220c84fb8711edbbcffe1e90e63656 | 0.5966477 | 24953 | EXP-00454-P1_PC02184-038#df220c84fb8711edbbcffe1e90e63656 |
| EXP-00454-P1_PC02184-038#c6524daafb8111edb32712c42668f295 | EXP-00454-P1_PC02184-038 | EXP-00454-AP1C2W6 | 10.955 | 0.3260506 | 12461 | FALSE | 15175 | 16612 | 507 | 0.2174289 | ⋯ | 3.423388 | 1.909091 | EXP-00454-AP1C2 | measled_strained_taruca | 0.006635952 | EXP | c6524daafb8111edb32712c42668f295 | 0.6789994 | 25950 | EXP-00454-P1_PC02184-038#c6524daafb8111edb32712c42668f295 |
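
In the rows shown above, each `archr_name` is the `Sample` value and the `barcodes` value joined by `#`. The base-R sketch below splits two of those names back into their parts. It assumes the same `Sample#barcode` structure holds for every row, which the `head()` output alone does not confirm.

```r
# Two row names copied from the head() output above
archr_name <- c(
    "EXP-00454-P1_PC02184-038#3bca8a6cfb8111eda35df29f570c0793",
    "EXP-00454-P1_PC02184-038#12ffc4aefb8011ed8e8a7625f34dfd6b"
)

# Split on the literal "#" separator (fixed = TRUE avoids regex interpretation)
parts <- strsplit(archr_name, "#", fixed = TRUE)

# Take the first and second piece of each split name
sample_from_name  <- vapply(parts, `[`, character(1), 1)
barcode_from_name <- vapply(parts, `[`, character(1), 2)
```

On the full table, comparing `sample_from_name` with `all_metadata$Sample` and `barcode_from_name` with `all_metadata$barcodes` would be one way to test that assumption.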


## Write output file

Write the metadata to a .csv file for later use. We set `row.names = FALSE` and `quote = FALSE` to simplify the output and improve compatibility with other tools.

In [15]:
# Create the output directory only if it does not already exist
if (!dir.exists("output")) dir.create("output")

In [16]:
write.csv(
    all_metadata,
    "output/vrd_atac_cell_metadata.csv",
    row.names = FALSE,
    quote = FALSE
)
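
Writing with `quote = FALSE` is only safe when no text field contains a comma or a double quote, because such a field would shift columns when the file is read back. The base-R sketch below checks for that and then round-trips a small table through a temporary file. The `toy` data frame and the `unsafe_columns()` helper are hypothetical and exist only for this illustration.

```r
# Toy stand-in for all_metadata; values are illustrative, not real results
toy <- data.frame(
    Sample = c("sample_a", "sample_b"),
    cell_name = c("first_cell", "second,cell"),  # note the embedded comma
    TSSEnrichment = c(5.1, 9.2),
    stringsAsFactors = FALSE
)

# Hypothetical helper: names of character columns containing "," or '"'
unsafe_columns <- function(df) {
    is_chr <- vapply(df, is.character, logical(1))
    has_special <- vapply(
        df[is_chr],
        function(col) any(grepl("[,\"]", col)),
        logical(1)
    )
    names(has_special)[has_special]
}

flagged <- unsafe_columns(toy)

# Round-trip the remaining columns through a temporary file
safe <- toy[setdiff(names(toy), flagged)]
tmp <- tempfile(fileext = ".csv")
write.csv(safe, tmp, row.names = FALSE, quote = FALSE)
round_trip <- read.csv(tmp, stringsAsFactors = FALSE)
```

Because `archr_name` contains `#`, the file should be read with `read.csv()`, which does not treat `#` as a comment character by default, or with `read.table(..., comment.char = "")`.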

## Store results in HISE

Finally, we store the output file in our Collaboration Space for later retrieval and use. We need to provide the UUID for our Collaboration Space (aka `studySpaceId`), as well as a title for this step in our analysis process.

The hise function `uploadFiles()` also requires the file IDs of the original input files for reference. We assembled these above as `input_file_uuids` when retrieving the files.

In [17]:
study_space_uuid <- "40df6403-29f0-4b45-ab7d-f46d420c422e"
title <- "VRd TEA-seq unfiltered A cell metadata"

In [18]:
out_files <- list.files(
    "output",
    full.names = TRUE
)
out_list <- as.list(out_files)

In [19]:
uploadFiles(
    files = out_list,
    studySpaceId = study_space_uuid,
    title = title,
    inputFileIds = input_file_uuids,
    store = "project",
    doPrompt = FALSE
)

In [20]:
sessionInfo()

R version 4.2.2 Patched (2022-11-10 r83330)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 20.04.5 LTS

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/liblapack.so.3

locale:
 [1] LC_CTYPE=C.UTF-8       LC_NUMERIC=C           LC_TIME=C.UTF-8       
 [4] LC_COLLATE=C.UTF-8     LC_MONETARY=C.UTF-8    LC_MESSAGES=C.UTF-8   
 [7] LC_PAPER=C.UTF-8       LC_NAME=C              LC_ADDRESS=C          
[10] LC_TELEPHONE=C         LC_MEASUREMENT=C.UTF-8 LC_IDENTIFICATION=C   

attached base packages:
[1] stats4    grid      stats     graphics  grDevices utils     datasets 
[8] methods   base     

other attached packages:
 [1] BSgenome.Hsapiens.UCSC.hg38_1.4.5 BSgenome_1.66.3                  
 [3] rtracklayer_1.58.0                Biostrings_2.66.0                
 [5] XVector_0.38.0                    purrr_1.0.1                      
 [7] rhdf5_2.42.0                      SummarizedExper