# Call, annotate, and analyze peaks for each cell type

To analyze the scATAC-seq data, we'll need to call peaks and identify TF motifs in each peak. To facilitate downstream analyses, a single set of peaks will be called for all cells obtained for each cell type. This will give us a single set of features to use for differential testing for each type.

Once we have peaks, we'll analyze them from 3 angles: 
1. Identify differentially accessible sites induced by drug treatment at each time point.
2. Calculate enrichment of motif annotations in differentially-acessible peaks.
3. Compute Peak-to-gene correlations using the paired scRNA-seq data from our TEA-seq experiments.

Along the way, we'll save each of the stages of analysis for downstream work and visualization:
- Peaks and annotations
- Differential peaks and annotation enrichments
- Peak-to-gene correlations

# Setup

To call peaks, we'll need to install MACS2

In [None]:
system("pip install --upgrade --force-reinstall MACS2")

## Load packages

hise: The Human Immune System Explorer R SDK package  
dplyr: Dataframe handling functions   
ArchR: scATAC-seq analysis  
purrr: Functional programming tools  


In [7]:
quiet_library <- function(...) { suppressPackageStartupMessages(library(...)) }
quiet_library(hise)
quiet_library(dplyr)
quiet_library(ArchR)
quiet_library(BSgenome.Hsapiens.UCSC.hg38)
quiet_library(purrr)

## Retrieve files

Now, we'll use the HISE SDK package to retrieve the TEA-seq ArchR Projects based on their file UUIDs. These will be placed in the `cache/` subdirectory by default.

In [140]:
atac_file_uuids <- list(
    "11235534-d09d-4c57-a648-20fb13317eab",
    "2d1a00ca-f1f6-41c1-9691-f37916fad00c",
    "365045be-e8a6-4a4d-9fe1-b31b7593799a",
    "403e1064-34ea-4992-8752-6d1ddb9fb614",
    "49d66578-bc16-4840-871c-25de96456f83",
    "f1a32e62-d2e9-4052-b971-dd960d605d70"
)

In [3]:
fres <- hise::cacheFiles(
    atac_file_uuids
)

[1] "Initiating file download for vrdtea_ArchR-t_cd4_cm_2023-10-02.tar"
[1] "Download successful."
[1] "Initiating file download for vrdtea_ArchR-t_cd4_em_2023-10-02.tar"
[1] "Download successful."
[1] "Initiating file download for vrdtea_ArchR-t_cd4_treg_2023-10-02.tar"
[1] "Download successful."
[1] "Initiating file download for vrdtea_ArchR-t_cd4_naive_2023-10-02.tar"
[1] "Download successful."
[1] "Initiating file download for vrdtea_ArchR-t_cd8_naive_2023-10-02.tar"
[1] "Download successful."
[1] "Initiating file download for vrdtea_ArchR-t_cd8_memory_2023-10-02.tar"
[1] "Download successful."


In [4]:
atac_tar_files <- map(fres, "filePath")

In [6]:
walk(
    atac_tar_files,
    function(tf) {
        command <- paste("tar -xf", tf)
        system(command)
    }
)

Note: un-tar-ing the files moves them to the `output` directory, based on their original filenames.

In [2]:
type_paths <- list.files(
    "output",
    full.names = TRUE
)

In [3]:
cell_types <- sub(".+-(.+)_20.+", "\\1", type_paths)
cell_types

In [4]:
type_proj <- map(
    type_paths,
    loadArchRProject,
    showLogo = FALSE
)
names(type_proj) <- cell_types

Successfully loaded ArchRProject!

Successfully loaded ArchRProject!

Successfully loaded ArchRProject!

Successfully loaded ArchRProject!

Successfully loaded ArchRProject!

Successfully loaded ArchRProject!



# Run analysis per cell type

## Peak calling and annotation

In [5]:
addArchRVerbose(verbose = FALSE)
addArchRThreads(14)
addArchRGenome("hg38")

Setting addArchRVerbose = FALSE

Setting default number of Parallel threads to 14.

Setting default genome to Hg38.



In [8]:
type_proj <- map(
    type_proj,
    function(proj) {
        message("Adding coverage")
        proj <- addGroupCoverages(proj, groupBy = "Sample", force = TRUE)
        message("Adding peak set")
        proj <- addReproduciblePeakSet(proj, groupBy = "Sample", force = TRUE)
        message("Adding peak matrix")
        proj <- addPeakMatrix(proj, force = TRUE)
        message("Adding Peak Annotations")
        proj <- addMotifAnnotations(
            proj, 
            motifSet = "cisbp", 
            name = "Motif",
            force = TRUE
        )
        proj <- saveArchRProject(proj)
    }
)

Adding coverage

Adding peak set

Searching For MACS2..

Found with $path!



                                            Group nCells nCellsUsed nReplicates
EXP-00454-P1_PC02184-049 EXP-00454-P1_PC02184-049   4614        540           2
EXP-00454-P1_PC02184-048 EXP-00454-P1_PC02184-048   2679        540           2
EXP-00454-P1_PC02184-047 EXP-00454-P1_PC02184-047   3562        540           2
EXP-00454-P1_PC02184-046 EXP-00454-P1_PC02184-046   1372        540           2
EXP-00454-P1_PC02184-045 EXP-00454-P1_PC02184-045   7124        540           2
EXP-00454-P1_PC02184-044 EXP-00454-P1_PC02184-044   5262        540           2
EXP-00454-P1_PC02184-043 EXP-00454-P1_PC02184-043   4371        540           2
EXP-00454-P1_PC02184-042 EXP-00454-P1_PC02184-042   2742        540           2
EXP-00454-P1_PC02184-041 EXP-00454-P1_PC02184-041    981        540           2
EXP-00454-P1_PC02184-040 EXP-00454-P1_PC02184-040    648        540           2
EXP-00454-P1_PC02184-039 EXP-00454-P1_PC02184-039    685        540           2
EXP-00454-P1_PC02184-038 EXP-00454-P1_PC

Converged after 5 iterations!

Adding peak matrix

Adding Peak Annotations

Using version 2 motifs!

Saving ArchRProject...

Loading ArchRProject...

Successfully loaded ArchRProject!


                                                   / |
                                                 /    \
            .                                  /      |.
            \\\                              /        |.
              \\\                          /           `|.
                \\\                      /              |.
                  \                    /                |\
                  \\#####\           /                  ||
                ==###########>      /                   ||
                 \\##==......\    /                     ||
            ______ =       =|__ /__                     ||      \\\
       \               '        ##_______ _____ ,--,__,=##,__   ///
        ,    __==    ___,-,__,--'#'  ==='      `-'    | ##,-/
        -,____,---'       \\####\\___

                                            Group nCells nCellsUsed nReplicates
EXP-00454-P1_PC02184-049 EXP-00454-P1_PC02184-049   1481        540           2
EXP-00454-P1_PC02184-048 EXP-00454-P1_PC02184-048    819        540           2
EXP-00454-P1_PC02184-047 EXP-00454-P1_PC02184-047   1214        540           2
EXP-00454-P1_PC02184-046 EXP-00454-P1_PC02184-046    479        479           2
EXP-00454-P1_PC02184-045 EXP-00454-P1_PC02184-045   2097        540           2
EXP-00454-P1_PC02184-044 EXP-00454-P1_PC02184-044   1812        540           2
EXP-00454-P1_PC02184-043 EXP-00454-P1_PC02184-043   1419        540           2
EXP-00454-P1_PC02184-042 EXP-00454-P1_PC02184-042    894        540           2
EXP-00454-P1_PC02184-041 EXP-00454-P1_PC02184-041    401        401           2
EXP-00454-P1_PC02184-040 EXP-00454-P1_PC02184-040    180        180           2
EXP-00454-P1_PC02184-039 EXP-00454-P1_PC02184-039    657        540           2
EXP-00454-P1_PC02184-038 EXP-00454-P1_PC

Converged after 6 iterations!

Adding peak matrix

Adding Peak Annotations

Using version 2 motifs!

Saving ArchRProject...

Loading ArchRProject...

Successfully loaded ArchRProject!


                                                   / |
                                                 /    \
            .                                  /      |.
            \\\                              /        |.
              \\\                          /           `|.
                \\\                      /              |.
                  \                    /                |\
                  \\#####\           /                  ||
                ==###########>      /                   ||
                 \\##==......\    /                     ||
            ______ =       =|__ /__                     ||      \\\
       \               '        ##_______ _____ ,--,__,=##,__   ///
        ,    __==    ___,-,__,--'#'  ==='      `-'    | ##,-/
        -,____,---'       \\####\\___

                                            Group nCells nCellsUsed nReplicates
EXP-00454-P1_PC02184-049 EXP-00454-P1_PC02184-049   6876        540           2
EXP-00454-P1_PC02184-048 EXP-00454-P1_PC02184-048   4420        540           2
EXP-00454-P1_PC02184-047 EXP-00454-P1_PC02184-047   5953        540           2
EXP-00454-P1_PC02184-046 EXP-00454-P1_PC02184-046   2296        540           2
EXP-00454-P1_PC02184-045 EXP-00454-P1_PC02184-045  10717        540           2
EXP-00454-P1_PC02184-044 EXP-00454-P1_PC02184-044   9299        540           2
EXP-00454-P1_PC02184-043 EXP-00454-P1_PC02184-043   6974        540           2
EXP-00454-P1_PC02184-042 EXP-00454-P1_PC02184-042   4733        540           2
EXP-00454-P1_PC02184-041 EXP-00454-P1_PC02184-041   1739        540           2
EXP-00454-P1_PC02184-040 EXP-00454-P1_PC02184-040   1352        540           2
EXP-00454-P1_PC02184-039 EXP-00454-P1_PC02184-039   2483        540           2
EXP-00454-P1_PC02184-038 EXP-00454-P1_PC

Converged after 5 iterations!

Adding peak matrix

Adding Peak Annotations

Using version 2 motifs!

Saving ArchRProject...

Loading ArchRProject...

Successfully loaded ArchRProject!


                                                   / |
                                                 /    \
            .                                  /      |.
            \\\                              /        |.
              \\\                          /           `|.
                \\\                      /              |.
                  \                    /                |\
                  \\#####\           /                  ||
                ==###########>      /                   ||
                 \\##==......\    /                     ||
            ______ =       =|__ /__                     ||      \\\
       \               '        ##_______ _____ ,--,__,=##,__   ///
        ,    __==    ___,-,__,--'#'  ==='      `-'    | ##,-/
        -,____,---'       \\####\\___

                                            Group nCells nCellsUsed nReplicates
EXP-00454-P1_PC02184-049 EXP-00454-P1_PC02184-049   1121        540           2
EXP-00454-P1_PC02184-048 EXP-00454-P1_PC02184-048    567        540           2
EXP-00454-P1_PC02184-047 EXP-00454-P1_PC02184-047    774        540           2
EXP-00454-P1_PC02184-046 EXP-00454-P1_PC02184-046    275        275           2
EXP-00454-P1_PC02184-045 EXP-00454-P1_PC02184-045   1271        540           2
EXP-00454-P1_PC02184-044 EXP-00454-P1_PC02184-044   1022        540           2
EXP-00454-P1_PC02184-043 EXP-00454-P1_PC02184-043    712        540           2
EXP-00454-P1_PC02184-042 EXP-00454-P1_PC02184-042    516        516           2
EXP-00454-P1_PC02184-041 EXP-00454-P1_PC02184-041    210        210           2
EXP-00454-P1_PC02184-040 EXP-00454-P1_PC02184-040    126        126           2
EXP-00454-P1_PC02184-039 EXP-00454-P1_PC02184-039     78         65           2
EXP-00454-P1_PC02184-038 EXP-00454-P1_PC

Converged after 5 iterations!

Adding peak matrix

Adding Peak Annotations

Using version 2 motifs!

Saving ArchRProject...

Loading ArchRProject...

Successfully loaded ArchRProject!


                                                   / |
                                                 /    \
            .                                  /      |.
            \\\                              /        |.
              \\\                          /           `|.
                \\\                      /              |.
                  \                    /                |\
                  \\#####\           /                  ||
                ==###########>      /                   ||
                 \\##==......\    /                     ||
            ______ =       =|__ /__                     ||      \\\
       \               '        ##_______ _____ ,--,__,=##,__   ///
        ,    __==    ___,-,__,--'#'  ==='      `-'    | ##,-/
        -,____,---'       \\####\\___

                                            Group nCells nCellsUsed nReplicates
EXP-00454-P1_PC02184-049 EXP-00454-P1_PC02184-049   2261        540           2
EXP-00454-P1_PC02184-048 EXP-00454-P1_PC02184-048    886        540           2
EXP-00454-P1_PC02184-047 EXP-00454-P1_PC02184-047   1231        540           2
EXP-00454-P1_PC02184-046 EXP-00454-P1_PC02184-046    460        460           2
EXP-00454-P1_PC02184-045 EXP-00454-P1_PC02184-045   2472        540           2
EXP-00454-P1_PC02184-044 EXP-00454-P1_PC02184-044   1793        540           2
EXP-00454-P1_PC02184-043 EXP-00454-P1_PC02184-043   1371        540           2
EXP-00454-P1_PC02184-042 EXP-00454-P1_PC02184-042    832        540           2
EXP-00454-P1_PC02184-041 EXP-00454-P1_PC02184-041    328        328           2
EXP-00454-P1_PC02184-040 EXP-00454-P1_PC02184-040    163        163           2
EXP-00454-P1_PC02184-039 EXP-00454-P1_PC02184-039    200        200           2
EXP-00454-P1_PC02184-038 EXP-00454-P1_PC

Converged after 5 iterations!

Adding peak matrix

Adding Peak Annotations

Using version 2 motifs!

Saving ArchRProject...

Loading ArchRProject...

Successfully loaded ArchRProject!


                                                   / |
                                                 /    \
            .                                  /      |.
            \\\                              /        |.
              \\\                          /           `|.
                \\\                      /              |.
                  \                    /                |\
                  \\#####\           /                  ||
                ==###########>      /                   ||
                 \\##==......\    /                     ||
            ______ =       =|__ /__                     ||      \\\
       \               '        ##_______ _____ ,--,__,=##,__   ///
        ,    __==    ___,-,__,--'#'  ==='      `-'    | ##,-/
        -,____,---'       \\####\\___

                                            Group nCells nCellsUsed nReplicates
EXP-00454-P1_PC02184-049 EXP-00454-P1_PC02184-049   2456        540           2
EXP-00454-P1_PC02184-048 EXP-00454-P1_PC02184-048   1448        540           2
EXP-00454-P1_PC02184-047 EXP-00454-P1_PC02184-047   1945        540           2
EXP-00454-P1_PC02184-046 EXP-00454-P1_PC02184-046    856        540           2
EXP-00454-P1_PC02184-045 EXP-00454-P1_PC02184-045   3672        540           2
EXP-00454-P1_PC02184-044 EXP-00454-P1_PC02184-044   2989        540           2
EXP-00454-P1_PC02184-043 EXP-00454-P1_PC02184-043   2292        540           2
EXP-00454-P1_PC02184-042 EXP-00454-P1_PC02184-042   1786        540           2
EXP-00454-P1_PC02184-041 EXP-00454-P1_PC02184-041    601        540           2
EXP-00454-P1_PC02184-040 EXP-00454-P1_PC02184-040    451        451           2
EXP-00454-P1_PC02184-039 EXP-00454-P1_PC02184-039    755        540           2
EXP-00454-P1_PC02184-038 EXP-00454-P1_PC

Converged after 5 iterations!

Adding peak matrix

Adding Peak Annotations

Using version 2 motifs!

Saving ArchRProject...

Loading ArchRProject...

Successfully loaded ArchRProject!


                                                   / |
                                                 /    \
            .                                  /      |.
            \\\                              /        |.
              \\\                          /           `|.
                \\\                      /              |.
                  \                    /                |\
                  \\#####\           /                  ||
                ==###########>      /                   ||
                 \\##==......\    /                     ||
            ______ =       =|__ /__                     ||      \\\
       \               '        ##_______ _____ ,--,__,=##,__   ///
        ,    __==    ___,-,__,--'#'  ==='      `-'    | ##,-/
        -,____,---'       \\####\\___

Extract and output peaks and annotations matrix for downstream use

In [40]:
type_peak_gr <- map(
    type_proj,
    getPeakSet
)

In [47]:
dir.create("output")

“'output' already exists”


In [48]:
peak_output_files <- paste0(
    "output/peak_GRanges-",
    cell_types,
    "_", Sys.Date(),
    ".rds")

In [50]:
walk2(
    type_peak_gr,
    peak_output_files,
    saveRDS
)

In [45]:
type_motif_mat <- map(
    type_proj,
    function(proj) {
        anno <- getPeakAnnotation(proj)
        matches <- readRDS(anno$Matches)
        assays(matches)$matches
    }
)

In [51]:
motif_mat_output_files <- paste0(
    "output/peak_motif_matches-",
    cell_types,
    "_", Sys.Date(),
    ".rds")

In [52]:
walk2(
    type_motif_mat,
    motif_mat_output_files,
    saveRDS
)

### Upload to HISE

In [23]:
study_space_uuid <- "40df6403-29f0-4b45-ab7d-f46d420c422e"
title <- paste("VRd TEA-seq T Cell Type Peaks and Motifs", Sys.Date())

In [24]:
out_list <- as.list(c(peak_output_files, motif_mat_output_files))

In [27]:
uploadFiles(
    files = out_list,
    studySpaceId = study_space_uuid,
    title = title,
    inputFileIds = atac_file_uuids,
    store = "project",
    doPrompt = FALSE
)

## Differential peak accessibility

To test for DAPs, we'll need to include the treatment and timepoint metadata for each sample in the ArchR projects:

In [9]:
sample_manifest <- read.csv("../common/EXP00454_TEAseq_sample_manifest.csv")

In [10]:
sample_meta <- sample_manifest %>%
  select(pbmc_sample_id, treatment, timepoint) %>%
  mutate(Sample = paste0("EXP-00454-P1_", pbmc_sample_id),
         treat_time = paste0(treatment, "_", timepoint)) %>%
  select(-pbmc_sample_id)
head(sample_meta)

Unnamed: 0_level_0,treatment,timepoint,Sample,treat_time
Unnamed: 0_level_1,<chr>,<int>,<chr>,<chr>
1,lenalidomide,72,EXP-00454-P1_PC02184-038,lenalidomide_72
2,bortezomib,72,EXP-00454-P1_PC02184-039,bortezomib_72
3,dmso,72,EXP-00454-P1_PC02184-040,dmso_72
4,dexamethasone,24,EXP-00454-P1_PC02184-041,dexamethasone_24
5,lenalidomide,24,EXP-00454-P1_PC02184-042,lenalidomide_24
6,bortezomib,24,EXP-00454-P1_PC02184-043,bortezomib_24


In [11]:
type_proj <- map(
    type_proj,
    function(proj) {
        proj_meta <- as.data.frame(getCellColData(proj))
        cell_names <- rownames(proj_meta)

        proj_meta <- proj_meta %>%
          select(Sample) %>%
          left_join(sample_meta)

        addCellColData(proj, proj_meta$treat_time, "treat_time", cell_names, force = TRUE)
        
    }
)

[1m[22mJoining with `by = join_by(Sample)`
[1m[22mJoining with `by = join_by(Sample)`
[1m[22mJoining with `by = join_by(Sample)`
[1m[22mJoining with `by = join_by(Sample)`
[1m[22mJoining with `by = join_by(Sample)`
[1m[22mJoining with `by = join_by(Sample)`


Next, we can define the foreground and background conditions for the DAP tests:

In [12]:
fg_treat_times <- c("bortezomib_4", "lenalidomide_4", "dexamethasone_4",
                    "bortezomib_24", "lenalidomide_24", "dexamethasone_24",
                    "bortezomib_72", "lenalidomide_72")
bg_treat_times <- c(rep("dmso_4", 3),
                    rep("dmso_24", 3),
                    rep("dmso_72", 2))

And we'll define this helper function for running the tests and retrieving the results.

Note that the cutoff of FDR < 2 is there to retrieve results for all peaks regardless of FDR, since all FDR values will be 1 or lower.

In [13]:
run_dap_test <- function(fg, bg, proj) {

    message(paste(fg, "vs", bg))
    
    suppressMessages(
        getMarkerFeatures(
            proj,
            useMatrix = "PeakMatrix",
            groupBy = "treat_time",
            useGroups = fg,
            bgdGroups = bg
        )
    )
    
}

In [14]:
table(getCellColData(type_proj[[1]])$treat_time)


   bortezomib_24     bortezomib_4    bortezomib_72 dexamethasone_24 
            4371             3562              685              981 
 dexamethasone_4          dmso_24           dmso_4          dmso_72 
            7124             5262             2679              648 
 lenalidomide_24   lenalidomide_4  lenalidomide_72      untreated_0 
            2742             1372             1583             4614 

In [15]:
all_dap_results <- map(
    type_proj,
    function(proj) {
        ct <- getCellColData(proj)$aifi_cell_type[1]
        message(ct)
        
        dap_results <- map2(
            fg_treat_times,
            bg_treat_times,
            run_dap_test,
            proj = proj
        )
        
        dap_results

    }
)

t_cd4_cm

bortezomib_4 vs dmso_4

lenalidomide_4 vs dmso_4

dexamethasone_4 vs dmso_4

bortezomib_24 vs dmso_24

lenalidomide_24 vs dmso_24

dexamethasone_24 vs dmso_24

bortezomib_72 vs dmso_72

lenalidomide_72 vs dmso_72

t_cd4_em

bortezomib_4 vs dmso_4

lenalidomide_4 vs dmso_4

dexamethasone_4 vs dmso_4

bortezomib_24 vs dmso_24

lenalidomide_24 vs dmso_24

dexamethasone_24 vs dmso_24

bortezomib_72 vs dmso_72

lenalidomide_72 vs dmso_72

t_cd4_naive

bortezomib_4 vs dmso_4

lenalidomide_4 vs dmso_4

dexamethasone_4 vs dmso_4

bortezomib_24 vs dmso_24

lenalidomide_24 vs dmso_24

dexamethasone_24 vs dmso_24

bortezomib_72 vs dmso_72

lenalidomide_72 vs dmso_72

t_cd4_treg

bortezomib_4 vs dmso_4

lenalidomide_4 vs dmso_4

dexamethasone_4 vs dmso_4

bortezomib_24 vs dmso_24

lenalidomide_24 vs dmso_24

dexamethasone_24 vs dmso_24

bortezomib_72 vs dmso_72

lenalidomide_72 vs dmso_72

t_cd8_memory

bortezomib_4 vs dmso_4

lenalidomide_4 vs dmso_4

dexamethasone_4 vs dmso_4

bortezom

Extract DAP results to data.frame

In [16]:
all_dap_df <- map2_dfr(
    all_dap_results,
    cell_types,
    function(dap, ct) {
        dap_df <- pmap_dfr(
            list(fg = fg_treat_times,
                 bg = bg_treat_times,
                 res = dap),
            function(fg, bg, res) {
                group_marker_res <- getMarkers(
                    res, 
                    cutOff = "FDR < 2")
                
                group_marker_df <- as.data.frame(group_marker_res[[1]])
                group_marker_df$fg <- fg
                group_marker_df$bg <- bg

                group_marker_df
            }
        )

        dap_df$aifi_cell_type <- ct
        dap_df
    }
)

In [62]:
all_dap_df <- all_dap_df %>%
  dplyr::rename(logFC = Log2FC,
                adjP = FDR) %>%
  select(aifi_cell_type, fg, bg, seqnames, start, end, logFC, adjP, MeanDiff, idx)

In [73]:
dap_output_file <- paste0(
    "output/all_archr_dap_",
    Sys.Date(),
    ".csv")

In [74]:
write.csv(all_dap_df,
          dap_output_file,
          quote = FALSE,
          row.names = FALSE)

## Motif annotation enrichment

For each set of differentially accessible peaks, we'll calculate differentially enriched motifs.

To account for the differences in cell count that affect sensitivity of DAP results, we'll use the top 500 peaks from each comparison to identify DEMs.

We'll also separate peaks with increased accessibility from peaks with decreased accessibility in the DAP results.

The helper function, below, does the bulk of this work:

In [17]:
top_dap_directional_dem <- function(fg, bg, dap, proj, top_n = 500) {

    message(paste(fg, "vs", bg))

    group_marker_res <- getMarkers(
        dap, 
        cutOff = "FDR < 2")
    
    peak_df <- as.data.frame(group_marker_res[[1]])
    names(peak_df) <- names(assays(dap))
    
    up_peaks <- peak_df %>%
      filter(Log2FC > 0) %>%
      arrange(Pval) %>%
      head(top_n)
    up_cut <- up_peaks$Pval[top_n]
    
    up_enriched_motifs <- suppressMessages(
        peakAnnoEnrichment(
            seMarker = dap,
            ArchRProj = proj,
            peakAnnotation = "Motif",
            cutOff = paste("Pval < ", up_cut,"& Log2FC > 0")
    ))
    
    up_res <- as.data.frame(as.list(assays(up_enriched_motifs)))
    names(up_res) <- names(assays(up_enriched_motifs))
    up_res$direction <- "up"
    
    dn_peaks <- peak_df %>%
      filter(Log2FC < 0) %>%
      arrange(Pval) %>%
      head(top_n)
    dn_cut <- dn_peaks$Pval[top_n]
    
    dn_enriched_motifs <- suppressMessages(
        peakAnnoEnrichment(
            seMarker = dap,
            ArchRProj = proj,
            peakAnnotation = "Motif",
            cutOff = paste("Pval < ", dn_cut, "& Log2FC < 0")
    ))
    
    dn_res <- as.data.frame(as.list(assays(dn_enriched_motifs)))
    names(dn_res) <- names(assays(dn_enriched_motifs))
    dn_res$direction <- "dn"
    
    res <- rbind(up_res, dn_res)
    res$fg <- fg
    res$bg <- bg

    res
}

Here, we'll iterate over each cell type (the first `map_dfr` call), and each comparison (the `pmap_dfr` call):

In [18]:
all_dem_res <- map2_dfr(
    type_proj,
    all_dap_results,
    function(proj, type_dap) {
        ct <- getCellColData(proj)$aifi_cell_type[1]
        message(ct)
         
        dem_res <- pmap_dfr(
            list(fg = fg_treat_times,
                 bg = bg_treat_times,
                 dap = type_dap),
            top_dap_directional_dem,
            proj = proj,
            top_n = 500
        )

        dem_res$aifi_cell_type <- ct
        dem_res
    }
)

t_cd4_cm

bortezomib_4 vs dmso_4

lenalidomide_4 vs dmso_4

dexamethasone_4 vs dmso_4

bortezomib_24 vs dmso_24

lenalidomide_24 vs dmso_24

dexamethasone_24 vs dmso_24

bortezomib_72 vs dmso_72

lenalidomide_72 vs dmso_72

t_cd4_em

bortezomib_4 vs dmso_4

lenalidomide_4 vs dmso_4

dexamethasone_4 vs dmso_4

bortezomib_24 vs dmso_24

lenalidomide_24 vs dmso_24

dexamethasone_24 vs dmso_24

bortezomib_72 vs dmso_72

lenalidomide_72 vs dmso_72

t_cd4_naive

bortezomib_4 vs dmso_4

lenalidomide_4 vs dmso_4

dexamethasone_4 vs dmso_4

bortezomib_24 vs dmso_24

lenalidomide_24 vs dmso_24

dexamethasone_24 vs dmso_24

bortezomib_72 vs dmso_72

lenalidomide_72 vs dmso_72

t_cd4_treg

bortezomib_4 vs dmso_4

lenalidomide_4 vs dmso_4

dexamethasone_4 vs dmso_4

bortezomib_24 vs dmso_24

lenalidomide_24 vs dmso_24

dexamethasone_24 vs dmso_24

bortezomib_72 vs dmso_72

lenalidomide_72 vs dmso_72

t_cd8_memory

bortezomib_4 vs dmso_4

lenalidomide_4 vs dmso_4

dexamethasone_4 vs dmso_4

bortezom

In [69]:
all_dem_res <- all_dem_res %>%
  mutate(
      nomP = 10^(-mlog10p),
      adjP = 10^(-mlog10Padj),
      tf_gene = sub("_.+", "", feature)
  ) %>%
  select(aifi_cell_type, fg, bg, direction,
         feature, tf_gene, Enrichment, nomP, adjP,
         everything())

In [70]:
head(all_dem_res)

Unnamed: 0_level_0,aifi_cell_type,fg,bg,direction,feature,tf_gene,Enrichment,nomP,adjP,mlog10Padj,mlog10p,BackgroundProporition,nBackground,BackgroundFrequency,CompareProportion,nCompare,CompareFrequency
Unnamed: 0_level_1,<chr>,<chr>,<chr>,<chr>,<chr>,<chr>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<int>,<int>,<dbl>,<int>,<int>
TFAP2B_1...1,t_cd4_cm,bortezomib_4,dmso_4,up,TFAP2B_1,TFAP2B,1.0086266,0.270832014,1.0,0.0,0.5673,0.06808505,72101,4909,0.06867239,35997,2472
TFAP2D_2...2,t_cd4_cm,bortezomib_4,dmso_4,up,TFAP2D_2,TFAP2D,1.0233564,0.002498619,0.02248757,1.648057,2.6023,0.16830557,72101,12135,0.17223658,35997,6200
TFAP2C_3...3,t_cd4_cm,bortezomib_4,dmso_4,up,TFAP2C_3,TFAP2C,1.0222279,0.007474807,0.06727327,1.172157,2.1264,0.14397859,72101,10381,0.14717893,35997,5298
TFAP2E_4...4,t_cd4_cm,bortezomib_4,dmso_4,up,TFAP2E_4,TFAP2E,0.9998059,0.51156403,1.0,0.0,0.2911,0.04959709,72101,3576,0.04958747,35997,1785
TFAP2A_5...5,t_cd4_cm,bortezomib_4,dmso_4,up,TFAP2A_5,TFAP2A,1.0139218,0.135987816,1.0,0.0,0.8665,0.0815384,72101,5879,0.08267356,35997,2976
ARID3A_6...6,t_cd4_cm,bortezomib_4,dmso_4,up,ARID3A_6,ARID3A,0.9822014,0.99907939,1.0,0.0,0.0004,0.29386555,72101,21188,0.28863516,35997,10390


In [75]:
dem_output_file <- paste0(
    "output/all_archr_dem_",
    Sys.Date(),
    ".csv")

In [76]:
write.csv(all_dem_res,
          dem_output_file,
          quote = FALSE,
          row.names = FALSE)

### Store differentials and enrichments in HISE

In [23]:
study_space_uuid <- "40df6403-29f0-4b45-ab7d-f46d420c422e"
title <- paste("VRd TEA-seq T Cell Type DAP and DEM", Sys.Date())

In [24]:
out_list <- as.list(c(dap_output_file, dem_output_file))

In [27]:
uploadFiles(
    files = out_list,
    studySpaceId = study_space_uuid,
    title = title,
    inputFileIds = atac_file_uuids,
    store = "project",
    doPrompt = FALSE
)

## Peak-to-Gene correlation

Finally, we'll test peak-to-gene correlation by integrating the scRNA-seq data from TEA-seq. This will be matched at the single-cell level to the cells in the ArchR Projects. First, we'll need to retrieve the scRNA-seq matrices from HISE:

In [141]:
scrna_file_ids <- list(
    "7bdac6ef-e5e5-4150-b4f3-9c1a1e250334", # CD4 data
    "46438bc4-cde6-4ae6-b349-9c513dd9d16f" # CD8 data
)

In [21]:
scrna_file_res <- hise::cacheFiles(
    scrna_file_ids
)

[1] "Initiating file download for filtered_cd4_te_seurat.rds"
[1] "Download successful."
[1] "Initiating file download for filtered_cd8_te_seurat.rds"
[1] "Download successful."


In [22]:
scrna_files <- map(scrna_file_res, "filePath")

In [23]:
so_list <- map(scrna_files, readRDS)

In [24]:
count_mat <- cbind(
    so_list[[1]][["RNA"]]@counts,
    so_list[[2]][["RNA"]]@counts
)

Loading required package: SeuratObject

Loading required package: sp

The legacy packages maptools, rgdal, and rgeos, underpinning the sp package,
which was just loaded, will retire in October 2023.
Please refer to R-spatial evolution reports for details, especially
https://r-spatial.org/r/2023/05/15/evolution4.html.
It may be desirable to make the sf package available;
package maintainers should consider adding sf to Suggests:.
The sp package is now running under evolution status 2
     (status 2 uses the sf package in place of rgdal)


Attaching package: ‘sp’


The following object is masked from ‘package:IRanges’:

    %over%



Attaching package: ‘SeuratObject’


The following object is masked from ‘package:SummarizedExperiment’:

    Assays




Before integration, we'll need to filter for genes that are in the gene annotations for the ArchR Projects.

We'll then be able to use the counts and the gene GRanges to build the SummarizedExperiment objects that ArchR requires.

In [25]:
gene_gr <- type_proj[[1]]@geneAnnotation$genes
gene_gr <- gene_gr[gene_gr$symbol %in% rownames(count_mat)]
count_mat <- count_mat[gene_gr$symbol,]

In [26]:
str(count_mat)

Formal class 'dgCMatrix' [package "Matrix"] with 6 slots
  ..@ i       : int [1:238272702] 7 50 53 68 90 106 109 134 142 143 ...
  ..@ p       : int [1:147415] 0 2010 4007 5983 7463 9226 10851 12304 14641 16258 ...
  ..@ Dim     : int [1:2] 21406 147414
  ..@ Dimnames:List of 2
  .. ..$ : chr [1:21406] "OR4F5" "FAM87B" "LINC01128" "LINC00115" ...
  .. ..$ : chr [1:147414] "2da9d348fb8111eda35df29f570c0793" "2daec6d2fb8111eda35df29f570c0793" "2db119d2fb8111eda35df29f570c0793" "2db582c4fb8111eda35df29f570c0793" ...
  ..@ x       : num [1:238272702] 1 1 3 1 1 1 1 1 3 1 ...
  ..@ factors : list()


For each cell type, we'll select the cells for that type using barcodes, then match the cell names to those used by ArchR so that the data are compatible.

In [27]:
type_proj <- map(
    type_proj,
    function(proj) {
        proj_meta <- as.data.frame(getCellColData(proj))
        rna_mat <- count_mat[,proj_meta$barcodes]
        colnames(rna_mat) <- rownames(proj_meta)

        serna <- SummarizedExperiment(
            assays = SimpleList(counts = rna_mat),
            rowRanges = gene_gr
        )
        
        addGeneExpressionMatrix(
            proj,
            serna
        )
    })

2023-10-05 20:18:36.47052 : SuppressedMessaged due to getArchRVerbose() is FALSE!

2023-10-05 20:18:36.545285 : SuppressedMessaged due to getArchRVerbose() is FALSE!

2023-10-05 20:20:48.105922 : SuppressedMessaged due to getArchRVerbose() is FALSE!

2023-10-05 20:20:48.143577 : SuppressedMessaged due to getArchRVerbose() is FALSE!

2023-10-05 20:22:54.911468 : SuppressedMessaged due to getArchRVerbose() is FALSE!

2023-10-05 20:22:55.04188 : SuppressedMessaged due to getArchRVerbose() is FALSE!

2023-10-05 20:25:11.941557 : SuppressedMessaged due to getArchRVerbose() is FALSE!

2023-10-05 20:25:11.970226 : SuppressedMessaged due to getArchRVerbose() is FALSE!

2023-10-05 20:27:20.67783 : SuppressedMessaged due to getArchRVerbose() is FALSE!

2023-10-05 20:27:20.717445 : SuppressedMessaged due to getArchRVerbose() is FALSE!

2023-10-05 20:29:27.942837 : SuppressedMessaged due to getArchRVerbose() is FALSE!

2023-10-05 20:29:27.994991 : SuppressedMessaged due to getArchRVerbose() is FAL

Now that we have gene expression in the object, we can compute peak-to-gene correlations. This step requires a reduced dimensionality projection to find neighborhoods of cells, so we'll also compute LSI for each cell type.

In [None]:
type_proj <- map(
    type_proj,
    addIterativeLSI
)

Checking Inputs...

Checking Inputs...



In [37]:
type_proj <- map(
    type_proj,
    addPeak2GeneLinks,
    useMatrix = "GeneExpressionMatrix"
)

Filtering 1 dims correlated > 0.75 to log10(depth + 1)



We'll use this helper function to extract the Peak2Gene results using an absolute correlation cutoff. This is more permissive than the methods built in to ArchR, as it will allow us to retrieve negative correlations as well as positive correlations. These can be difficult to interpret, but let's keep them at this point.

It's also nice to have the position of the peak, gene symbol, and distance from peak to gene for thinking about the results. This function will also collate that information from the peak and gene GRanges objects.

In [134]:
get_p2g_links <- function(proj, abs_cutoff = 0.1) {
    p2g <- metadata(proj@peakSet)$Peak2GeneLinks
    peaks <- metadata(p2g)$peakSet
    genes <- metadata(p2g)$geneSet
    p2g_df <- as.data.frame(p2g)
    p2g_df <- p2g_df %>%
      filter(abs(Correlation) > 0.1)
    p2g_df <- p2g_df %>%
      mutate(gene = as.character(genes$name[idxRNA]),
             seqnames = as.character(seqnames(peaks)[idxATAC]),
             start = start(peaks)[idxATAC],
             end = end(peaks)[idxATAC]) %>%
      mutate(
         distance = pmin(
             abs(start(peaks)[idxATAC] - start(genes)[idxRNA]),
             abs(end(peaks)[idxATAC] - start(genes)[idxRNA])
         )
      )
}

In [135]:
type_p2g <- map2(
    type_proj,
    cell_types,
    function(proj, ct) {
        p2g_df <- get_p2g_links(
            proj,
            abs_cutoff = 0.1
        )
        p2g_df$aifi_cell_type <- ct
        p2g_df
    }
)

In [137]:
p2g_output_files <- paste0(
    "output/peak_to_gene-",
    cell_types,
    "_", Sys.Date(),
    ".rds")

In [138]:
walk2(
    type_p2g,
    p2g_output_files,
    write.csv,
    quote = FALSE,
    row.names = FALSE
)

### Store Peak2Gene results in HISE

In [23]:
study_space_uuid <- "40df6403-29f0-4b45-ab7d-f46d420c422e"
title <- paste("VRd TEA-seq T Cell Type Peak2Gene", Sys.Date())

In [142]:
out_list <- as.list(p2g_output_files)
input_ids <- c(atac_file_uuids, scrna_file_ids)

In [27]:
uploadFiles(
    files = out_list,
    studySpaceId = study_space_uuid,
    title = title,
    inputFileIds = input_ids,
    store = "project",
    doPrompt = FALSE
)

# Session Info

In [29]:
sessionInfo()

R version 4.3.1 (2023-06-16)
Platform: x86_64-conda-linux-gnu (64-bit)
Running under: Ubuntu 20.04.6 LTS

Matrix products: default
BLAS/LAPACK: /opt/conda/lib/libopenblasp-r0.3.23.so;  LAPACK version 3.11.0

Random number generation:
 RNG:     L'Ecuyer-CMRG 
 Normal:  Inversion 
 Sample:  Rejection 
 
locale:
 [1] LC_CTYPE=C.UTF-8       LC_NUMERIC=C           LC_TIME=C.UTF-8       
 [4] LC_COLLATE=C.UTF-8     LC_MONETARY=C.UTF-8    LC_MESSAGES=C.UTF-8   
 [7] LC_PAPER=C.UTF-8       LC_NAME=C              LC_ADDRESS=C          
[10] LC_TELEPHONE=C         LC_MEASUREMENT=C.UTF-8 LC_IDENTIFICATION=C   

time zone: Etc/UTC
tzcode source: system (glibc)

attached base packages:
 [1] parallel  stats4    grid      stats     graphics  grDevices utils    
 [8] datasets  methods   base     

other attached packages:
 [1] uwot_0.1.16                       Seurat_4.3.0.1                   
 [3] SeuratObject_4.1.3                sp_2.0-0                         
 [5] presto_1.0.0                   