# ArchR multi-sample recipe step 4 -- (optionally) prepare some visualizations of the data
**Author**: Adam Klie (last modified: 11/06/2023)<br>
***
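**Description**: This script loads an existing ArchR project, adds cell-type annotations, builds pseudo-bulk replicates, calls peaks, adds and exports a peak matrix, writes per-group bigWig coverage tracks, and saves the project.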

In [2]:
# Load libraries
suppressMessages(library(Seurat))
suppressMessages(library(ArchR))
suppressMessages(library(parallel))
suppressMessages(library(tidyverse))
suppressMessages(library(BSgenome.Hsapiens.UCSC.hg38))
suppressMessages(library(rtracklayer))
suppressMessages(library(GenomicRanges))

In [7]:
# Params
archr_proj_path = "/cellar/users/aklie/data/datasets/igvf_sc-islet_10X-Multiome/annotation/previous/2024_01_23/timecourse/A2_control/archr"
threads = 4
seed = 1234

In [8]:
# Set the seed and thread count, then move to the project directory
set.seed(seed)
addArchRThreads(threads)
setwd(archr_proj_path)

Setting default number of Parallel threads to 4.



The precompiled version of the hg38 genome in ArchR uses BSgenome.Hsapiens.UCSC.hg38, TxDb.Hsapiens.UCSC.hg38.knownGene, org.Hs.eg.db, and a blacklist that was merged using ArchR::mergeGR() from the hg38 v2 blacklist regions and from mitochondrial regions that show high mappability to the hg38 nuclear genome from Caleb Lareau and Jason Buenrostro. To set a global genome default to the precompiled hg38 genome:

In [9]:
# Add annotation
addArchRGenome("hg38")

Setting default genome to Hg38.



# Load the ArchR project

In [35]:
# Load the ArchR project
proj = loadArchRProject(path = "./")
proj

Successfully loaded ArchRProject!



class: ArchRProject 
outputDirectory: /cellar/users/aklie/data/datasets/igvf_sc-islet_10X-Multiome/annotation/previous/2024_01_23/timecourse/A2_control/archr 
samples(4): dm35a dm25a dm45a dm0b
sampleColData names(1): ArrowFiles
cellColData names(23): Sample TSSEnrichment ... ReadsInPeaks FRIP
numberOfCells(1): 16307
medianTSS(1): 12.402
medianFrags(1): 18614

# Add annotations

In [36]:
annotations_path <- "/cellar/users/aklie/data/datasets/igvf_sc-islet_10X-Multiome/annotation/annotations.txt"

In [37]:
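# Read the tab-separated annotations (no header); the first column holds the cell IDs
annotations = read.csv(annotations_path, row.names = 1, sep = "\t", header = FALSE)
cellids = rownames(annotations)

# Keep only the cells that have an annotation
matched_ids = intersect(cellids, rownames(proj@cellColData))
idxSample <- BiocGenerics::which(proj$cellNames %in% matched_ids)
proj <- proj[idxSample,]

# Reorder the labels to the project's cell order and store them as metadata
annotations = annotations[proj$cellNames, ]
proj$annotation <- annotations

# Column added above; the later cells group by "rna_annotation", which must already be in cellColData
group_by = "annotation"
proj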

                          V2
                          <chr>
dm45a#TTTAACCTCTGCAAGT-1  SC.EC
dm45a#TCCGGAATCCACCTTA-1  SC.beta
dm45a#GAAAGGCTCATTAGGC-1  SC.alpha
dm45a#ATTTGCAAGATTGAGG-1  SC.alpha
dm45a#TCCAGGTCAAGTGTTT-1  SC.EC
dm45a#GTCCTCCCAATAAGCA-1  SC.beta
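
The matching step above can be illustrated on toy data. This is a minimal sketch, assuming a one-column annotation table keyed by cell barcode; the barcodes and labels are made up, and `match()` is swapped in for row-name indexing so that any unmatched cell would show up as `NA` instead of passing unnoticed.

```r
# Toy stand-ins for proj$cellNames and the annotation table (hypothetical values)
cell_names <- c("s1#AAAC-1", "s1#AAAG-1", "s2#AAAT-1", "s2#AACA-1")
annotations <- data.frame(
    V2 = c("SC.beta", "SC.alpha", "SC.EC"),
    row.names = c("s2#AAAT-1", "s1#AAAC-1", "s9#GGGG-1")
)

# Keep only cells that have an annotation, preserving the project's cell order
keep <- cell_names %in% rownames(annotations)
kept_cells <- cell_names[keep]

# Look up the label of each kept cell by barcode
labels <- annotations$V2[match(kept_cells, rownames(annotations))]
names(labels) <- kept_cells

# Sanity checks: one label per kept cell and none missing
stopifnot(length(labels) == length(kept_cells), !anyNA(labels))
print(labels)
```

A check like the final `stopifnot()` is cheap to run before assigning labels to a project, since a misaligned vector would otherwise attach the wrong cell type to a barcode without raising an error.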


# Make pseudo-bulk replicates

In [8]:
proj <- addGroupCoverages(ArchRProj = proj, groupBy = "rna_annotation")

ArchR logging to : ArchRLogs/ArchR-addGroupCoverages-34d001703bb734-Date-2023-11-14_Time-15-43-15.95358.log
If there is an issue, please report to github with logFile!



other (1 of 5) : CellGroups N = 5

SC.alpha (2 of 5) : CellGroups N = 5

SC.beta (3 of 5) : CellGroups N = 5

SC.delta (4 of 5) : CellGroups N = 2

SC.EC (5 of 5) : CellGroups N = 5

2023-11-14 15:43:19.386711 : Creating Coverage Files!, 0.057 mins elapsed.

2023-11-14 15:43:19.388862 : Batch Execution w/ safelapply!, 0.057 mins elapsed.

2023-11-14 15:48:15.2697 : Adding Kmer Bias to Coverage Files!, 4.989 mins elapsed.

Completed Kmer Bias Calculation

Adding Kmer Bias (1 of 22)

Adding Kmer Bias (2 of 22)

Adding Kmer Bias (3 of 22)

Adding Kmer Bias (4 of 22)

Adding Kmer Bias (5 of 22)

Adding Kmer Bias (6 of 22)

Adding Kmer Bias (7 of 22)

Adding Kmer Bias (8 of 22)

Adding Kmer Bias (9 of 22)

Adding Kmer Bias (10 of 22)

Adding Kmer Bias (11 of 22)

Adding Kmer Bias (12 of 22)

Adding Kmer Bias (13 of 22)

Adding Kmer Bias (14 of 22)

Adding Kmer Bias (15 of 22)

Adding Kmer Bias (16 of 22)

Adding Kmer Bias (17 of 22)

Adding Kmer Bias (18 of 22)

Adding Kmer Bias (19 of 22)


# Calling peaks

In [10]:
proj <- addReproduciblePeakSet(
    ArchRProj = proj, 
    groupBy = "rna_annotation", 
    pathToMacs2 = "/cellar/users/aklie/opt/miniconda3/envs/chrombpnet/bin/macs2"
)

ArchR logging to : ArchRLogs/ArchR-addReproduciblePeakSet-34d00128622efe-Date-2023-11-14_Time-15-55-30.381218.log
If there is an issue, please report to github with logFile!

Calling Peaks with Macs2

2023-11-14 15:55:32.674798 : Peak Calling Parameters!, 0.038 mins elapsed.



            Group nCells nCellsUsed nReplicates nMin nMax maxPeaks
other       other    803        733           5   87  245   150000
SC.alpha SC.alpha   3727       2087           5  304  500   150000
SC.beta   SC.beta   5021       2500           5  500  500   150000
SC.delta SC.delta    186        186           2   42  144    93000
SC.EC       SC.EC   4945       2500           5  500  500   150000


2023-11-14 15:55:33.500394 : Batching Peak Calls!, 0.052 mins elapsed.

2023-11-14 15:55:33.637315 : Batch Execution w/ safelapply!, 0 mins elapsed.

2023-11-14 16:05:07.484685 : Identifying Reproducible Peaks!, 9.618 mins elapsed.

2023-11-14 16:06:40.639117 : Creating Union Peak Set!, 11.171 mins elapsed.

Converged after 9 iterations!

Plotting Ggplot!

2023-11-14 16:08:06.851951 : Finished Creating Union Peak Set (212166)!, 12.608 mins elapsed.
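
The `maxPeaks` column in the parameter table above can be reproduced with a little arithmetic. This is a sketch, assuming the `addReproduciblePeakSet()` defaults of `peaksPerCell = 500` and `maxPeaks = 150000` (neither is set in the call above) and assuming the budget scales with the cells used per group; the table is equally consistent with scaling by `nCells`, so treat that choice as a guess.

```r
# Cells used per group, copied from the nCellsUsed column of the table above
n_cells_used <- c(other = 733, SC.alpha = 2087, SC.beta = 2500,
                  SC.delta = 186, SC.EC = 2500)

# Assumed ArchR defaults (not set explicitly in the call above)
peaks_per_cell <- 500
max_peaks_cap <- 150000

# Per-group peak budget: scale with cell count, then cap
max_peaks <- pmin(n_cells_used * peaks_per_cell, max_peaks_cap)
print(max_peaks)
```

Under these assumptions only SC.delta falls below the cap (186 x 500 = 93000), which matches the table: small groups contribute proportionally fewer peaks to the union set.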



In [31]:
peaks_dir <- file.path(archr_proj_path, "PeakCalls", "SplitPeaks")
if (!dir.exists(peaks_dir)) {
    # recursive = TRUE also creates any missing parent directories
    dir.create(peaks_dir, recursive = TRUE)
}

In [13]:
peakset <- getPeakSet(proj)

In [35]:
export.bed(peakset, con=file.path(peaks_dir, "consensus_peaks.bed"))

In [36]:
# Split the granges object by the names
peakset_list <- split(peakset, names(peakset))

In [37]:
# Export each GRanges in the list to its own BED file
for (i in seq_along(peakset_list)) {
    gr <- peakset_list[[i]]
    cell_type <- names(peakset_list)[i]
    export.bed(gr, con=file.path(peaks_dir, paste0(cell_type, ".bed")))
}
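
The split-and-export pattern above can be tried without Bioconductor. This is a base-R sketch on a made-up peak table, where the hypothetical `group` column plays the role of `names(peakset)`. It writes to a temporary directory and does the coordinate shift by hand, since BED starts are 0-based while GRanges starts are 1-based (`export.bed()` handles that conversion itself).

```r
# Toy peak table (hypothetical coordinates); "group" plays the role of names(peakset)
peaks <- data.frame(
    chr = c("chr1", "chr1", "chr2", "chr3"),
    start = c(1001L, 5001L, 2001L, 3001L),   # 1-based, inclusive (GRanges convention)
    end = c(1501L, 5501L, 2501L, 3501L),
    group = c("SC.beta", "SC.alpha", "SC.beta", "SC.EC"),
    stringsAsFactors = FALSE
)

out_dir <- file.path(tempdir(), "SplitPeaks")
dir.create(out_dir, recursive = TRUE, showWarnings = FALSE)

# One BED file per group; BED starts are 0-based, so subtract 1 from start
peaks_by_group <- split(peaks, peaks$group)
for (cell_type in names(peaks_by_group)) {
    bed <- peaks_by_group[[cell_type]]
    bed <- data.frame(chr = bed$chr, start = bed$start - 1L, end = bed$end)
    write.table(bed, file.path(out_dir, paste0(cell_type, ".bed")),
                sep = "\t", quote = FALSE, row.names = FALSE, col.names = FALSE)
}
print(list.files(out_dir))
```

Iterating over `names(peaks_by_group)` avoids index bookkeeping and is safe when the list is empty, unlike `1:length(x)`.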

# Adding a peak matrix

In [11]:
proj <- addPeakMatrix(proj)

ArchR logging to : ArchRLogs/ArchR-addPeakMatrix-34d0011afbb789-Date-2023-11-14_Time-16-09-04.625874.log
If there is an issue, please report to github with logFile!

2023-11-14 16:09:06.769429 : Batch Execution w/ safelapply!, 0 mins elapsed.

ArchR logging successful to : ArchRLogs/ArchR-addPeakMatrix-34d0011afbb789-Date-2023-11-14_Time-16-09-04.625874.log



In [23]:
getAvailableMatrices(proj)

In [24]:
mtx <- getMatrixFromProject(
    proj,
    useMatrix = "PeakMatrix"
)

ArchR logging to : ArchRLogs/ArchR-getMatrixFromProject-34d001675737ca-Date-2023-11-14_Time-16-21-44.179487.log
If there is an issue, please report to github with logFile!

2023-11-14 16:22:22.78913 : Organizing colData, 0.644 mins elapsed.

2023-11-14 16:22:22.921831 : Organizing rowData, 0.646 mins elapsed.

2023-11-14 16:22:22.959622 : Organizing rowRanges, 0.646 mins elapsed.

2023-11-14 16:22:23.05748 : Organizing Assays (1 of 1), 0.648 mins elapsed.

2023-11-14 16:22:28.472186 : Constructing SummarizedExperiment, 0.738 mins elapsed.

2023-11-14 16:22:32.849986 : Finished Matrix Creation, 0.811 mins elapsed.



In [38]:
dim(mtx)

In [70]:
mtx_dir <- file.path(archr_proj_path, "Matrices", "PeakMatrix")
if (!dir.exists(mtx_dir)) {
    # recursive = TRUE also creates the parent "Matrices" directory if it is missing
    dir.create(mtx_dir, recursive = TRUE)
}

In [71]:
# Load Matrix, which provides writeMM() for saving the peak matrix
library(Matrix)

In [72]:
# Build region labels (chr:start-end) from the matrix's own row ranges,
# so the labels follow the row order of the PeakMatrix
peak_ranges <- SummarizedExperiment::rowRanges(mtx)
regions <- paste0(
    seqnames(peak_ranges),
    ":",
    start(peak_ranges),
    "-",
    end(peak_ranges)
)

In [73]:
# Take the cell barcodes from the matrix columns, so they follow the column order
cells <- colnames(mtx)

In [74]:
# Extract the sparse count matrix from the SummarizedExperiment.
# Run this cell only once: afterwards mtx is a dgCMatrix and no longer has assays.
mtx <- SummarizedExperiment::assay(mtx)

In [75]:
rownames(mtx) <- regions
colnames(mtx) <- cells

In [78]:
# Write out mtx.mtx, features.tsv, and barcodes.tsv
writeMM(mtx, file.path(mtx_dir, "mtx.mtx"))
write.table(rownames(mtx), file.path(mtx_dir, "features.tsv"), row.names = FALSE, col.names = FALSE, quote = FALSE)
write.table(colnames(mtx), file.path(mtx_dir, "barcodes.tsv"), row.names = FALSE, col.names = FALSE, quote = FALSE)

NULL
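
The three-file layout written above can be checked with a round trip. This is a small sketch on a made-up 3 x 3 sparse matrix, written to a temporary directory; the region and barcode names are hypothetical. Matrix Market files store only the values, so the row and column names have to travel in the side files and be reattached on load.

```r
library(Matrix)

# Toy peaks-by-cells count matrix (hypothetical region and barcode names)
counts <- sparseMatrix(
    i = c(1, 3, 2, 3), j = c(1, 1, 2, 3), x = c(2, 1, 4, 1),
    dims = c(3, 3),
    dimnames = list(
        c("chr1:1000-1500", "chr1:5000-5500", "chr2:2000-2500"),
        c("s1#AAAC-1", "s1#AAAG-1", "s2#AAAT-1")
    )
)

out_dir <- file.path(tempdir(), "PeakMatrix")
dir.create(out_dir, recursive = TRUE, showWarnings = FALSE)

# Matrix Market stores only the values, so the names go to side files
invisible(writeMM(counts, file.path(out_dir, "mtx.mtx")))
writeLines(rownames(counts), file.path(out_dir, "features.tsv"))
writeLines(colnames(counts), file.path(out_dir, "barcodes.tsv"))

# Read everything back and reattach the names
restored <- as(readMM(file.path(out_dir, "mtx.mtx")), "CsparseMatrix")
rownames(restored) <- readLines(file.path(out_dir, "features.tsv"))
colnames(restored) <- readLines(file.path(out_dir, "barcodes.tsv"))

# The round trip should preserve both the values and the labels
stopifnot(
    all(as.matrix(restored) == as.matrix(counts)),
    identical(rownames(restored), rownames(counts)),
    identical(colnames(restored), colnames(counts))
)
```

Because the labels are stored separately, the order of `features.tsv` and `barcodes.tsv` must match the matrix rows and columns exactly; a round-trip check of this kind on the real output is one way to confirm that.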

# Saving group bw coverages

In [11]:
getGroupBW(
  ArchRProj = proj,
  groupBy = "rna_annotation",
  normMethod = "ReadsInTSS",
  tileSize = 100,
  maxCells = 1000,
  ceiling = 4,
  verbose = TRUE,
  threads = getArchRThreads(),
  logFile = createLogFile("getGroupBW")
)

ArchR logging to : ArchRLogs/ArchR-getGroupBW-3cc51d3d733bf4-Date-2024-01-12_Time-08-55-39.603901.log
If there is an issue, please report to github with logFile!

2024-01-12 08:55:58.038486 : other (1 of 5) : Creating BigWig for Group, 0.307 mins elapsed.

2024-01-12 08:56:54.065108 : SC.alpha (2 of 5) : Creating BigWig for Group, 1.241 mins elapsed.

2024-01-12 08:57:50.526167 : SC.beta (3 of 5) : Creating BigWig for Group, 2.182 mins elapsed.

2024-01-12 08:58:49.872113 : SC.delta (4 of 5) : Creating BigWig for Group, 3.171 mins elapsed.

2024-01-12 08:59:33.630328 : SC.EC (5 of 5) : Creating BigWig for Group, 3.9 mins elapsed.

ArchR logging successful to : ArchRLogs/ArchR-getGroupBW-3cc51d3d733bf4-Date-2024-01-12_Time-08-55-39.603901.log



# Saving ArchR project

In [None]:
# Save the project with the new annotations, peak set, and peak matrix
message("Saving ArchR project")
saveArchRProject(
  ArchRProj = proj,
  outputDirectory = "./"
)