# Call, annotate, and analyze peaks for each cell type

To analyze the scATAC-seq data, we'll need to call peaks and identify TF motifs in each peak. To facilitate downstream analyses, a single set of peaks will be called across all cells of each cell type. This gives us one consistent set of features per cell type to use for differential testing.

Once we have peaks, we'll analyze them from three angles:
1. Identify differentially accessible sites induced by drug treatment at each time point.
2. Calculate enrichment of motif annotations in differentially accessible peaks.
3. Compute peak-to-gene correlations using the paired scRNA-seq data from our TEA-seq experiments.

Along the way, we'll save each of the stages of analysis for downstream work and visualization:
- Peaks and annotations
- Differential peaks and annotation enrichments
- Peak-to-gene correlations

# Setup

To call peaks, we'll need to install MACS2.

In [1]:
system("pip install --upgrade --force-reinstall MACS2")
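Before calling peaks, it can help to confirm that the MACS2 executable is visible from R. The following is a minimal sketch using base R's `Sys.which()`. The helper name `has_executable` is hypothetical, and the executable name `macs2` is an assumption about how pip installs it in this environment.

```r
# Hypothetical helper: TRUE if an executable with this name is on the PATH.
has_executable <- function(cmd) {
    nzchar(unname(Sys.which(cmd)))
}

# "macs2" is an assumed executable name; adjust if your installation differs.
if (!has_executable("macs2")) {
    message("macs2 was not found on the PATH; peak calling may fail.")
}
```

ArchR performs its own lookup when peaks are called (the "Searching For MACS2.." message further down), so this check is only an early warning.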

We'll also need the JASPAR2020 package.

In [15]:
BiocManager::install("JASPAR2020", update = FALSE, ask = FALSE, force = TRUE)

'getOption("repos")' replaces Bioconductor standard repositories, see
'help("repositories", package = "BiocManager")' for details.
Replacement repositories:
    CRAN: https://cran.r-project.org

Bioconductor version 3.17 (BiocManager 1.30.22), R 4.3.1 (2023-06-16)

Installing package(s) 'JASPAR2020'

Updating HTML index of packages in '.Library'

Making 'packages.html' ...
 done



## Load packages

hise: The Human Immune System Explorer R SDK package  
dplyr: Dataframe handling functions  
ArchR: scATAC-seq analysis  
BSgenome.Hsapiens.UCSC.hg38: hg38 genome sequences used by ArchR  
purrr: Functional programming tools  


In [2]:
quiet_library <- function(...) { suppressPackageStartupMessages(library(...)) }
quiet_library(hise)
quiet_library(dplyr)
quiet_library(ArchR)
quiet_library(BSgenome.Hsapiens.UCSC.hg38)
quiet_library(purrr)


                                                   / |
                                                 /    \
            .                                  /      |.
            \\\                              /        |.
              \\\                          /           `|.
                \\\                      /              |.
                  \                    /                |\
                  \\#####\           /                  ||
                ==###########>      /                   ||
                 \\##==......\    /                     ||
            ______ =       =|__ /__                     ||      \\\
       \               '        ##_______ _____ ,--,__,=##,__   ///
        ,    __==    ___,-,__,--'#'  ==='      `-'    | ##,-/
        -,____,---'       \\####\\________________,--\\_##,/
           ___      .______        ______  __    __  .______      
          /   \     |   _  \      /      ||  |  |  | |   _  \     
         /  ^  \    |  |_) 

## Retrieve files

Now, we'll use the HISE SDK package to retrieve the TEA-seq ArchR Projects based on their file UUIDs. These will be placed in the `cache/` subdirectory by default.

In [3]:
atac_file_uuids <- list(
    "11235534-d09d-4c57-a648-20fb13317eab",
    "2d1a00ca-f1f6-41c1-9691-f37916fad00c",
    "365045be-e8a6-4a4d-9fe1-b31b7593799a",
    "403e1064-34ea-4992-8752-6d1ddb9fb614",
    "49d66578-bc16-4840-871c-25de96456f83",
    "f1a32e62-d2e9-4052-b971-dd960d605d70"
)

In [4]:
fres <- hise::cacheFiles(
    atac_file_uuids
)

[1] "Initiating file download for vrdtea_ArchR-t_cd4_cm_2023-10-02.tar"
[1] "Download successful."
[1] "Initiating file download for vrdtea_ArchR-t_cd4_em_2023-10-02.tar"
[1] "Download successful."
[1] "Initiating file download for vrdtea_ArchR-t_cd4_treg_2023-10-02.tar"
[1] "Download successful."
[1] "Initiating file download for vrdtea_ArchR-t_cd4_naive_2023-10-02.tar"
[1] "Download successful."
[1] "Initiating file download for vrdtea_ArchR-t_cd8_naive_2023-10-02.tar"
[1] "Download successful."
[1] "Initiating file download for vrdtea_ArchR-t_cd8_memory_2023-10-02.tar"
[1] "Download successful."


In [5]:
atac_tar_files <- map(fres, "filePath")

In [6]:
walk(
    atac_tar_files,
    function(tf) {
        # Quote the path so spaces or special characters don't break the shell command
        command <- paste("tar -xf", shQuote(tf))
        system(command)
    }
)

Note: extracting the archives places the ArchR Projects in the `output` directory, in folders named after their original filenames.
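If a system `tar` is not available, base R's `utils::untar()` can do the same job. The sketch below builds a small throwaway archive in a temporary directory so that it is self-contained; the folder and file names in it are placeholders, not files from this analysis.

```r
# Build a small throwaway archive so the example is self-contained.
work_dir <- tempfile("untar-demo-")
dir.create(file.path(work_dir, "output", "demo_project"), recursive = TRUE)
writeLines("placeholder", file.path(work_dir, "output", "demo_project", "file.txt"))

# Archive the "output" folder using relative paths, as the project archives appear to.
old_wd <- setwd(work_dir)
tar_file <- file.path(work_dir, "demo_project.tar")
utils::tar(tar_file, files = "output")
setwd(old_wd)

# Extract into a chosen directory rather than the current working directory.
extract_dir <- tempfile("extracted-")
utils::untar(tar_file, exdir = extract_dir)
extracted <- list.files(extract_dir, recursive = TRUE)
extracted
```

The `exdir` argument makes the destination explicit, which avoids relying on the working directory at the time of extraction.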

In [7]:
type_paths <- list.files(
    "output",
    full.names = TRUE
)

In [8]:
cell_types <- sub(".+-(.+)_20.+", "\\1", type_paths)
cell_types
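To see what the regular expression above extracts, here is a small sketch on two example paths. The paths are assumptions: they suppose each archive extracts to `output/` under its archive name without the `.tar` extension.

```r
# Assumed example paths, based on the archive names in the download log.
example_paths <- c(
    "output/vrdtea_ArchR-t_cd4_cm_2023-10-02",
    "output/vrdtea_ArchR-t_cd8_memory_2023-10-02"
)

# Keep the text between the last "-" before the cell type and the "_20" that starts the date.
example_types <- sub(".+-(.+)_20.+", "\\1", example_paths)
example_types
# [1] "t_cd4_cm"     "t_cd8_memory"
```

The pattern relies on the date beginning with `20`, so it would need adjusting for folder names that follow a different convention.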

In [9]:
type_proj <- map(
    type_paths,
    loadArchRProject,
    showLogo = FALSE
)
names(type_proj) <- cell_types

Successfully loaded ArchRProject!

Successfully loaded ArchRProject!

Successfully loaded ArchRProject!

Successfully loaded ArchRProject!

Successfully loaded ArchRProject!

Successfully loaded ArchRProject!



# Run analysis per cell type

## Peak calling and annotation

In [10]:
addArchRVerbose(verbose = FALSE)
addArchRThreads(14)
addArchRGenome("hg38")

Setting addArchRVerbose = FALSE

Setting default number of Parallel threads to 14.

Setting default genome to Hg38.



In [19]:
?TFBSTools::getMatrixSet

getMatrixSet {TFBSTools}, R Documentation

Arguments:
x: a character vector of length 1 for the path of JASPAR SQLite file, a SQLiteConnection object, or a JASPAR2014 object.
opts: a search options list. See more details below.


In [20]:
# Attach JASPAR2020 so the `JASPAR2020` database object is available
library(JASPAR2020)

PWM <- TFBSTools::getMatrixSet(
    JASPAR2020,
    opts = list(
        collection = "CORE",
        species = "9606", # taxid for Homo sapiens
        all_versions = FALSE,
        matrixtype = "PWM"
    )
)

In [26]:
motif_names <- map_chr(
    PWM@listData,
    function(motif) {
        paste0(motif@name, "_", motif@ID)
    })
names(PWM) <- motif_names
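Combining the motif name with its matrix ID keeps the labels unique even when several matrices share a TF name. A minimal sketch with placeholder values (the names and IDs below are hypothetical, not taken from the `PWM` object):

```r
# Hypothetical TF names and matrix IDs; two matrices share the same TF name.
toy_names <- c("TFA", "TFA", "TFB")
toy_ids <- c("MA9001.1", "MA9002.1", "MA9003.1")

# Same construction as above: <name>_<ID>
toy_labels <- paste0(toy_names, "_", toy_ids)

anyDuplicated(toy_names) > 0   # TRUE: names alone collide
anyDuplicated(toy_labels) > 0  # FALSE: combined labels are unique
```

A similar `anyDuplicated(motif_names)` check on the real labels is a cheap way to confirm that no two motifs ended up with the same name.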

In [27]:
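type_proj <- map(
    type_proj,
    function(proj) {
        # Build per-sample pseudo-bulk coverage for peak calling
        message("Adding coverage")
        proj <- addGroupCoverages(proj, groupBy = "Sample", force = TRUE)
        # Call peaks with MACS2 and merge them into one reproducible peak set
        message("Adding peak set")
        proj <- addReproduciblePeakSet(proj, groupBy = "Sample", force = TRUE)
        # Count insertions per cell in the new peak set
        message("Adding peak matrix")
        proj <- addPeakMatrix(proj, force = TRUE)
        # Match JASPAR motifs against each peak
        message("Adding Peak Annotations")
        proj <- addMotifAnnotations(
            proj, 
            motifPWMs = PWM,
            name = "Motif",
            force = TRUE
        )
        # Save to disk; the saved project is reloaded and returned
        proj <- saveArchRProject(proj)
        proj
    }
)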

Adding coverage

Adding peak set

Searching For MACS2..

Found with $path!



                                            Group nCells nCellsUsed nReplicates
EXP-00454-P1_PC02184-049 EXP-00454-P1_PC02184-049   4614        540           2
EXP-00454-P1_PC02184-048 EXP-00454-P1_PC02184-048   2679        540           2
EXP-00454-P1_PC02184-047 EXP-00454-P1_PC02184-047   3562        540           2
EXP-00454-P1_PC02184-046 EXP-00454-P1_PC02184-046   1372        540           2
EXP-00454-P1_PC02184-045 EXP-00454-P1_PC02184-045   7124        540           2
EXP-00454-P1_PC02184-044 EXP-00454-P1_PC02184-044   5262        540           2
EXP-00454-P1_PC02184-043 EXP-00454-P1_PC02184-043   4371        540           2
EXP-00454-P1_PC02184-042 EXP-00454-P1_PC02184-042   2742        540           2
EXP-00454-P1_PC02184-041 EXP-00454-P1_PC02184-041    981        540           2
EXP-00454-P1_PC02184-040 EXP-00454-P1_PC02184-040    648        540           2
EXP-00454-P1_PC02184-039 EXP-00454-P1_PC02184-039    685        540           2
EXP-00454-P1_PC02184-038 EXP-00454-P1_PC

Converged after 5 iterations!

Adding peak matrix

Adding Peak Annotations

Saving ArchRProject...

Loading ArchRProject...

Successfully loaded ArchRProject!



                                            Group nCells nCellsUsed nReplicates
EXP-00454-P1_PC02184-049 EXP-00454-P1_PC02184-049   1481        540           2
EXP-00454-P1_PC02184-048 EXP-00454-P1_PC02184-048    819        540           2
EXP-00454-P1_PC02184-047 EXP-00454-P1_PC02184-047   1214        540           2
EXP-00454-P1_PC02184-046 EXP-00454-P1_PC02184-046    479        479           2
EXP-00454-P1_PC02184-045 EXP-00454-P1_PC02184-045   2097        540           2
EXP-00454-P1_PC02184-044 EXP-00454-P1_PC02184-044   1812        540           2
EXP-00454-P1_PC02184-043 EXP-00454-P1_PC02184-043   1419        540           2
EXP-00454-P1_PC02184-042 EXP-00454-P1_PC02184-042    894        540           2
EXP-00454-P1_PC02184-041 EXP-00454-P1_PC02184-041    401        401           2
EXP-00454-P1_PC02184-040 EXP-00454-P1_PC02184-040    180        180           2
EXP-00454-P1_PC02184-039 EXP-00454-P1_PC02184-039    657        540           2
EXP-00454-P1_PC02184-038 EXP-00454-P1_PC

Converged after 6 iterations!

Adding peak matrix

Adding Peak Annotations

Saving ArchRProject...

Loading ArchRProject...

Successfully loaded ArchRProject!



                                            Group nCells nCellsUsed nReplicates
EXP-00454-P1_PC02184-049 EXP-00454-P1_PC02184-049   6876        540           2
EXP-00454-P1_PC02184-048 EXP-00454-P1_PC02184-048   4420        540           2
EXP-00454-P1_PC02184-047 EXP-00454-P1_PC02184-047   5953        540           2
EXP-00454-P1_PC02184-046 EXP-00454-P1_PC02184-046   2296        540           2
EXP-00454-P1_PC02184-045 EXP-00454-P1_PC02184-045  10717        540           2
EXP-00454-P1_PC02184-044 EXP-00454-P1_PC02184-044   9299        540           2
EXP-00454-P1_PC02184-043 EXP-00454-P1_PC02184-043   6974        540           2
EXP-00454-P1_PC02184-042 EXP-00454-P1_PC02184-042   4733        540           2
EXP-00454-P1_PC02184-041 EXP-00454-P1_PC02184-041   1739        540           2
EXP-00454-P1_PC02184-040 EXP-00454-P1_PC02184-040   1352        540           2
EXP-00454-P1_PC02184-039 EXP-00454-P1_PC02184-039   2483        540           2
EXP-00454-P1_PC02184-038 EXP-00454-P1_PC

Converged after 5 iterations!

Adding peak matrix

Adding Peak Annotations

Saving ArchRProject...

Loading ArchRProject...

Successfully loaded ArchRProject!



                                            Group nCells nCellsUsed nReplicates
EXP-00454-P1_PC02184-049 EXP-00454-P1_PC02184-049   1121        540           2
EXP-00454-P1_PC02184-048 EXP-00454-P1_PC02184-048    567        540           2
EXP-00454-P1_PC02184-047 EXP-00454-P1_PC02184-047    774        540           2
EXP-00454-P1_PC02184-046 EXP-00454-P1_PC02184-046    275        275           2
EXP-00454-P1_PC02184-045 EXP-00454-P1_PC02184-045   1271        540           2
EXP-00454-P1_PC02184-044 EXP-00454-P1_PC02184-044   1022        540           2
EXP-00454-P1_PC02184-043 EXP-00454-P1_PC02184-043    712        540           2
EXP-00454-P1_PC02184-042 EXP-00454-P1_PC02184-042    516        516           2
EXP-00454-P1_PC02184-041 EXP-00454-P1_PC02184-041    210        210           2
EXP-00454-P1_PC02184-040 EXP-00454-P1_PC02184-040    126        126           2
EXP-00454-P1_PC02184-039 EXP-00454-P1_PC02184-039     78         65           2
EXP-00454-P1_PC02184-038 EXP-00454-P1_PC

Converged after 5 iterations!

Adding peak matrix

Adding Peak Annotations

Saving ArchRProject...

Loading ArchRProject...

Successfully loaded ArchRProject!



                                            Group nCells nCellsUsed nReplicates
EXP-00454-P1_PC02184-049 EXP-00454-P1_PC02184-049   2261        540           2
EXP-00454-P1_PC02184-048 EXP-00454-P1_PC02184-048    886        540           2
EXP-00454-P1_PC02184-047 EXP-00454-P1_PC02184-047   1231        540           2
EXP-00454-P1_PC02184-046 EXP-00454-P1_PC02184-046    460        460           2
EXP-00454-P1_PC02184-045 EXP-00454-P1_PC02184-045   2472        540           2
EXP-00454-P1_PC02184-044 EXP-00454-P1_PC02184-044   1793        540           2
EXP-00454-P1_PC02184-043 EXP-00454-P1_PC02184-043   1371        540           2
EXP-00454-P1_PC02184-042 EXP-00454-P1_PC02184-042    832        540           2
EXP-00454-P1_PC02184-041 EXP-00454-P1_PC02184-041    328        328           2
EXP-00454-P1_PC02184-040 EXP-00454-P1_PC02184-040    163        163           2
EXP-00454-P1_PC02184-039 EXP-00454-P1_PC02184-039    200        200           2
EXP-00454-P1_PC02184-038 EXP-00454-P1_PC

Converged after 5 iterations!

Adding peak matrix

Adding Peak Annotations

Saving ArchRProject...

Loading ArchRProject...

Successfully loaded ArchRProject!



                                            Group nCells nCellsUsed nReplicates
EXP-00454-P1_PC02184-049 EXP-00454-P1_PC02184-049   2456        540           2
EXP-00454-P1_PC02184-048 EXP-00454-P1_PC02184-048   1448        540           2
EXP-00454-P1_PC02184-047 EXP-00454-P1_PC02184-047   1945        540           2
EXP-00454-P1_PC02184-046 EXP-00454-P1_PC02184-046    856        540           2
EXP-00454-P1_PC02184-045 EXP-00454-P1_PC02184-045   3672        540           2
EXP-00454-P1_PC02184-044 EXP-00454-P1_PC02184-044   2989        540           2
EXP-00454-P1_PC02184-043 EXP-00454-P1_PC02184-043   2292        540           2
EXP-00454-P1_PC02184-042 EXP-00454-P1_PC02184-042   1786        540           2
EXP-00454-P1_PC02184-041 EXP-00454-P1_PC02184-041    601        540           2
EXP-00454-P1_PC02184-040 EXP-00454-P1_PC02184-040    451        451           2
EXP-00454-P1_PC02184-039 EXP-00454-P1_PC02184-039    755        540           2
EXP-00454-P1_PC02184-038 EXP-00454-P1_PC

Converged after 5 iterations!

Adding peak matrix

Adding Peak Annotations

Saving ArchRProject...

Loading ArchRProject...

Successfully loaded ArchRProject!



Save the new motif annotation match matrices for downstream use.

In [28]:
dir.create("output")

“'output' already exists”


In [29]:
type_motif_mat <- map(
    type_proj,
    function(proj) {
        anno <- getPeakAnnotation(proj)
        matches <- readRDS(anno$Matches)
        assays(matches)$matches
    }
)
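Each element of `type_motif_mat` should be a peak-by-motif match matrix. The sketch below uses a toy stand-in to show two quick summaries, assuming the real matrices are sparse logical matrices with peaks in rows and motifs in columns (worth confirming with `class()` and `dim()`). The peak and motif names are placeholders.

```r
library(Matrix)

# Toy stand-in for a match matrix: 4 peaks x 3 motifs, TRUE where a motif hits a peak.
toy_matches <- sparseMatrix(
    i = c(1, 2, 2, 4),
    j = c(1, 1, 3, 3),
    x = rep(TRUE, 4),
    dims = c(4, 3),
    dimnames = list(paste0("peak", 1:4), c("motifA", "motifB", "motifC"))
)

# Number of peaks containing each motif, and number of motifs found in each peak.
peaks_per_motif <- colSums(toy_matches)
motifs_per_peak <- rowSums(toy_matches)
peaks_per_motif
```

Summaries like these are a quick sanity check before the matrices are saved, for example to spot motifs that match no peaks at all.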

In [30]:
motif_mat_output_files <- paste0(
    "output/peak-JASPAR-matches-",
    cell_types,
    "_", Sys.Date(),
    ".rds")

In [32]:
walk2(
    type_motif_mat,
    motif_mat_output_files,
    saveRDS
)
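The naming and saving steps above can be sketched end to end on toy data. This version uses base R's `Map()` in place of `walk2()` so it has no package dependencies, and writes to a temporary directory; the matrices and the two cell type labels are placeholders.

```r
# Placeholder cell types and small matrices standing in for the match matrices.
toy_types <- c("t_cd4_cm", "t_cd8_naive")
toy_mats <- list(matrix(1:4, nrow = 2), matrix(5:8, nrow = 2))

# Write into a temporary directory rather than the real output folder.
out_dir <- tempfile("rds-demo-")
dir.create(out_dir)

# Same naming scheme as above: prefix, cell type, date, extension.
toy_files <- file.path(
    out_dir,
    paste0("peak-JASPAR-matches-", toy_types, "_", Sys.Date(), ".rds")
)

# Save each matrix to its matching file, then read everything back.
invisible(Map(saveRDS, toy_mats, toy_files))
reloaded <- lapply(toy_files, readRDS)
identical(reloaded, toy_mats)
```

Because `Sys.Date()` is part of the file name, re-running the notebook on a later day creates new files instead of overwriting the earlier ones.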

### Store results in HISE

Collect the JASPAR match matrices saved above and upload them to HISE, recording the original ArchR Project files as their inputs.

In [33]:
study_space_uuid <- "40df6403-29f0-4b45-ab7d-f46d420c422e"
title <- paste("VRd TEA-seq T Cell JASPAR Annotations", Sys.Date())

In [34]:
out_list <- as.list(
    list.files(
        "output", 
        pattern = "peak-JASPAR-matches",
        full.names = TRUE
))

In [35]:
uploadFiles(
    files = out_list,
    studySpaceId = study_space_uuid,
    title = title,
    inputFileIds = atac_file_uuids,
    store = "project",
    doPrompt = FALSE
)

[1] "Authorization token invalid or expired."
[1] "Retrying..."


# Session Info

In [36]:
sessionInfo()

R version 4.3.1 (2023-06-16)
Platform: x86_64-conda-linux-gnu (64-bit)
Running under: Ubuntu 20.04.6 LTS

Matrix products: default
BLAS/LAPACK: /opt/conda/lib/libopenblasp-r0.3.23.so;  LAPACK version 3.11.0

Random number generation:
 RNG:     L'Ecuyer-CMRG 
 Normal:  Inversion 
 Sample:  Rejection 
 
locale:
 [1] LC_CTYPE=C.UTF-8       LC_NUMERIC=C           LC_TIME=C.UTF-8       
 [4] LC_COLLATE=C.UTF-8     LC_MONETARY=C.UTF-8    LC_MESSAGES=C.UTF-8   
 [7] LC_PAPER=C.UTF-8       LC_NAME=C              LC_ADDRESS=C          
[10] LC_TELEPHONE=C         LC_MEASUREMENT=C.UTF-8 LC_IDENTIFICATION=C   

time zone: Etc/UTC
tzcode source: system (glibc)

attached base packages:
 [1] parallel  stats4    grid      stats     graphics  grDevices utils    
 [8] datasets  methods   base     

other attached packages:
 [1] JASPAR2020_0.99.10                motifmatchr_1.22.0               
 [3] purrr_1.0.2                       BSgenome.Hsapiens.UCSC.hg38_1.4.5
 [5] BSgenome_1.68.0                