In [1]:
suppressWarnings(library(ArchR))
suppressWarnings(library(dplyr))
suppressWarnings(library(BSgenome.Hsapiens.UCSC.hg38))


                                                   / |
                                                 /    \
            .                                  /      |.
            \\\                              /        |.
              \\\                          /           `|.
                \\\                      /              |.
                  \                    /                |\
                  \\#####\           /                  ||
                ==###########>      /                   ||
                 \\##==......\    /                     ||
            ______ =       =|__ /__                     ||      \\\
       \               '        ##_______ _____ ,--,__,=##,__   ///
        ,    __==    ___,-,__,--'#'  ==='      `-'    | ##,-/
        -,____,---'       \\####\\________________,--\\_##,/
           ___      .______        ______  __    __  .______      
          /   \     |   _  \      /      ||  |  |  | |   _  \     
         /  ^  \    |  |_) 

In [2]:
setwd('~/workspace/Mida_collab/')

In [3]:
addArchRGenome('hg38')
addArchRThreads(threads = 3)

Setting default genome to Hg38.

Setting default number of Parallel threads to 3.



### ArchRProject Construction

In [4]:
input_file <- c(
  Adjacent = 'sample_data/ATAC_raw/Adjacent/fragments.tsv.gz',
  Distant  = 'sample_data/ATAC_raw/Distant/fragments.tsv.gz',
  Sulcus   = 'sample_data/ATAC_raw/Sulcus/fragments.tsv.gz'
)

In [5]:
ArrowFiles <- createArrowFiles(
  inputFiles = input_file,
  sampleNames = names(input_file),
  minTSS = 4,
  minFrags = 1000, 
  addTileMat = TRUE,
  addGeneScoreMat = TRUE
)

Using GeneAnnotation set by addArchRGenome(Hg38)!

Using GeneAnnotation set by addArchRGenome(Hg38)!

ArchR logging to : ArchRLogs/ArchR-createArrows-203d8334713f2-Date-2025-09-23_Time-03-33-21.502017.log
If there is an issue, please report to github with logFile!

Cleaning Temporary Files

subThreading Disabled since ArchRLocking is TRUE see `addArchRLocking`

2025-09-23 03:33:21.610821 : Batch Execution w/ safelapply!, 0 mins elapsed.

ArchR logging successful to : ArchRLogs/ArchR-createArrows-203d8334713f2-Date-2025-09-23_Time-03-33-21.502017.log
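The `minTSS = 4` and `minFrags = 1000` arguments above set the per-cell quality thresholds applied while the Arrow files are built. As a rough illustration only, the same kind of filter can be written on a toy table in base R. The barcodes and values below are made up, and treating both bounds as inclusive is an assumption of this sketch rather than a statement about ArchR internals.

```r
# Toy per-cell QC table (hypothetical values, not taken from this dataset)
qc <- data.frame(
  barcode = c("AAAC", "AAAG", "AAAT", "AACA"),
  TSSEnrichment = c(11.2, 3.1, 6.5, 8.0),
  nFrags = c(37100, 5200, 800, 1500),
  stringsAsFactors = FALSE
)

min_tss <- 4       # same value as minTSS above
min_frags <- 1000  # same value as minFrags above

# Keep cells that meet both thresholds (inclusive bounds assumed)
passed <- subset(qc, TSSEnrichment >= min_tss & nFrags >= min_frags)
print(passed$barcode)
```

In this toy table, `AAAG` fails the TSS threshold and `AAAT` fails the fragment threshold, so only the other two barcodes remain.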



In [6]:
# Arrow files are referenced by name, relative to the working directory
ArrowFiles <- c('Adjacent.arrow','Sulcus.arrow','Distant.arrow')

In [7]:
crfd <- ArchRProject(ArrowFiles = ArrowFiles,outputDirectory = 'CorticalFolding/',copyArrows = FALSE)

Using GeneAnnotation set by addArchRGenome(Hg38)!

Using GeneAnnotation set by addArchRGenome(Hg38)!

Validating Arrows...

Getting SampleNames...



Getting Cell Metadata...



Merging Cell Metadata...

Initializing ArchRProject...



### Adding AMULET Doublet Information

In [8]:
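# Add AMULET doublet calls and probabilities to the cell metadata
doublet <- lapply(c('Adjacent','Distant','Sulcus'), function(sample){
    # Barcodes that AMULET flagged as multiplets for this sample
    doublet_ids <- read.table(paste0('QC_ATAC/AMULET/',sample,'/MultipletCellIds_01.txt'), sep='\t')
    # Per-cell multiplet probabilities, keyed by the ArchR cell name (Sample#barcode)
    prob <- read.table(paste0('QC_ATAC/AMULET/',sample,'/MultipletProbabilities.txt'), header=TRUE, sep='\t') %>%
            magrittr::set_rownames(paste0(sample,'#',.$cell_id)) %>%
            mutate(Doublet = cell_id %in% doublet_ids$V1)
    return(prob)
}) %>%
do.call(rbind,.)
# Match rows to the project's cell order; cells absent from the AMULET output get NA
crfd@cellColData[,c('doublet_pval','doublet_qval','doublet')] <- doublet[getCellNames(crfd),c('p.value','q.value','Doublet')]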
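The final assignment relies on ArchR cell names having the form `Sample#barcode`, which is why the row names of the AMULET table are built as `sample#cell_id`. A small base-R sketch of that lookup, using made-up barcodes and q-values, shows that a cell with no matching AMULET row comes back as `NA` instead of raising an error, so it may be worth counting the misses:

```r
# Hypothetical AMULET-style table for one sample (values are made up)
sample <- "Adjacent"
prob <- data.frame(
  cell_id = c("AAAC-1", "AAAG-1"),
  q.value = c(0.90, 0.001),
  stringsAsFactors = FALSE
)
rownames(prob) <- paste0(sample, "#", prob$cell_id)

# Cell names in project order; the last one has no AMULET entry
cell_names <- c("Adjacent#AAAG-1", "Adjacent#AAAC-1", "Adjacent#TTTT-1")

# Row-name indexing reorders rows to match cell_names and yields NA for misses
matched <- prob[cell_names, "q.value"]
print(matched)

# Number of cells without doublet information
n_missing <- sum(is.na(matched))
print(n_missing)
```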

In [9]:
saveArchRProject(crfd,'CorticalFolding/')

Copying Arrow Files...

Copying Arrow Files (1 of 3)

Copying Arrow Files (2 of 3)

Copying Arrow Files (3 of 3)

Saving ArchRProject...

Loading ArchRProject...

Successfully loaded ArchRProject!



class: ArchRProject 
outputDirectory: /data/gaojie/Mida_collab/CorticalFolding 
samples(3): Adjacent Sulcus Distant
sampleColData names(1): ArrowFiles
cellColData names(16): Sample TSSEnrichment ... doublet_qval doublet
numberOfCells(1): 20981
medianTSS(1): 11.156
medianFrags(1): 37100

### Dimensional Reduction and Batch Correction

In [10]:
# Add iterative LSI
crfd <- addIterativeLSI(
    ArchRProj = crfd,
    useMatrix = "TileMatrix", 
    name = "IterativeLSI", 
    iterations = 2, 
    clusterParams = list( 
        resolution = c(0.2), 
        sampleCells = 10000, 
        n.start = 10
    ), 
    varFeatures = 25000, 
    dimsToUse = 1:30
)

Checking Inputs...

ArchR logging to : ArchRLogs/ArchR-addIterativeLSI-203d85a897fb8-Date-2025-09-23_Time-03-33-43.489631.log
If there is an issue, please report to github with logFile!

2025-09-23 03:33:43.725943 : Computing Total Across All Features, 0.002 mins elapsed.

2025-09-23 03:33:46.790663 : Computing Top Features, 0.053 mins elapsed.

###########
2025-09-23 03:33:47.695949 : Running LSI (1 of 2) on Top Features, 0.068 mins elapsed.
###########

2025-09-23 03:33:47.751713 : Sampling Cells (N = 10002) for Estimated LSI, 0.069 mins elapsed.

2025-09-23 03:33:47.754319 : Creating Sampled Partial Matrix, 0.069 mins elapsed.

2025-09-23 03:34:18.835885 : Computing Estimated LSI (projectAll = FALSE), 0.587 mins elapsed.

2025-09-23 03:34:37.312382 : Identifying Clusters, 0.895 mins elapsed.

2025-09-23 03:34:48.37703 : Identified 7 Clusters, 1.079 mins elapsed.

2025-09-23 03:34:48.3907 : Saving LSI Iteration, 1.079 mins elapsed.


In [11]:
crfd <- addHarmony(
    ArchRProj = crfd,
    reducedDims = "IterativeLSI",
    name = "Harmony",
    groupBy = "Sample",
    theta = 1,
    lambda = 0.1,
    sigma = 0.06,
    force = TRUE
)

Transposing data matrix

Initializing state using k-means centroids initialization

Harmony 1/10

Harmony 2/10

Harmony 3/10

Harmony converged after 3 iterations



In [12]:
crfd <- addUMAP(
    ArchRProj = crfd, 
    reducedDims = "Harmony", 
    name = "Harmony_UMAP", 
    nNeighbors = 30, 
    minDist = 0.5, 
    metric = "cosine",
    force = TRUE
)

03:37:08 UMAP embedding parameters a = 0.583 b = 1.334

03:37:08 Read 20981 rows and found 30 numeric columns

03:37:08 Using Annoy for neighbor search, n_neighbors = 30

03:37:08 Building Annoy index with metric = cosine, n_trees = 50

0%   10   20   30   40   50   60   70   80   90   100%
[----|----|----|----|----|----|----|----|----|----|
**************************************************|

03:37:10 Writing NN index file to temp file /tmp/RtmpKSuo8O/file203d8766dfb56

03:37:10 Searching Annoy index using 192 threads, search_k = 3000

03:37:11 Annoy recall = 100%

03:37:12 Commencing smooth kNN distance calibration using 192 threads with target n_neighbors = 30

03:37:13 Initializing from normalized Laplacian + noise (using RSpectra)

03:37:14 Commencing optimization for 200 epochs, with 986874 positive edges

03:37:14 Using rng type: pcg

03:37:23 Optimization finished

03:37:23 Creating temp model dir /tmp/RtmpKSuo8O/dir203d825fb

In [13]:
# Add clusters at resolution = 0.4
# LSI-based clustering is used here only for a brief first-pass analysis; it will be replaced by labels transferred from the multi-omics integration results
crfd <- addClusters(
    input = crfd,
    reducedDims = "IterativeLSI",
    method = "Seurat",
    name = "Clusters",
    resolution = 0.4
)

ArchR logging to : ArchRLogs/ArchR-addClusters-203d86fe65ab4-Date-2025-09-23_Time-03-37-24.895383.log
If there is an issue, please report to github with logFile!

2025-09-23 03:37:25.169341 : Running Seurats FindClusters (Stuart et al. Cell 2019), 0.001 mins elapsed.

Computing nearest neighbor graph

Computing SNN



Modularity Optimizer version 1.3.0 by Ludo Waltman and Nees Jan van Eck

Number of nodes: 20981
Number of edges: 789628

Running Louvain algorithm...
Maximum modularity in 10 random starts: 0.9209
Number of communities: 13
Elapsed time: 2 seconds


2025-09-23 03:37:41.252696 : Testing Biased Clusters, 0.269 mins elapsed.

2025-09-23 03:37:41.304736 : Testing Outlier Clusters, 0.27 mins elapsed.

2025-09-23 03:37:41.308644 : Assigning Cluster Names to 13 Clusters, 0.27 mins elapsed.

2025-09-23 03:37:41.369655 : Finished addClusters, 0.271 mins elapsed.



In [14]:
saveArchRProject(crfd,outputDirectory = 'CorticalFolding')

Copying Arrow Files...

Copying Arrow Files (1 of 3)

Copying Arrow Files (2 of 3)

Copying Arrow Files (3 of 3)

Saving ArchRProject...

Loading ArchRProject...

Successfully loaded ArchRProject!



class: ArchRProject 
outputDirectory: /data/gaojie/Mida_collab/CorticalFolding 
samples(3): Adjacent Sulcus Distant
sampleColData names(1): ArrowFiles
cellColData names(17): Sample TSSEnrichment ... doublet Clusters
numberOfCells(1): 20981
medianTSS(1): 11.156
medianFrags(1): 37100

### Peak Calling

In [15]:
crfd <- addGroupCoverages(ArchRProj = crfd, groupBy = "Clusters")

ArchR logging to : ArchRLogs/ArchR-addGroupCoverages-203d85c6a366f-Date-2025-09-23_Time-03-38-02.467088.log
If there is an issue, please report to github with logFile!

C1 (1 of 13) : CellGroups N = 3

C2 (2 of 13) : CellGroups N = 2

C3 (3 of 13) : CellGroups N = 2

C4 (4 of 13) : CellGroups N = 2

C5 (5 of 13) : CellGroups N = 2

C6 (6 of 13) : CellGroups N = 2

C7 (7 of 13) : CellGroups N = 2

C8 (8 of 13) : CellGroups N = 2

C9 (9 of 13) : CellGroups N = 3

C10 (10 of 13) : CellGroups N = 2

C11 (11 of 13) : CellGroups N = 3

C12 (12 of 13) : CellGroups N = 2

C13 (13 of 13) : CellGroups N = 2

2025-09-23 03:38:05.272827 : Further Sampled 2 Groups above the Max Fragments!, 0.047 mins elapsed.

2025-09-23 03:38:05.530276 : Creating Coverage Files!, 0.051 mins elapsed.

2025-09-23 03:38:05.532965 : Batch Execution w/ safelapply!, 0.051 mins elapsed.

2025-09-23 03:38:05.603368 : Group C1._.Sulcus (1 of 29) : Creating Group Coverage File : C1._.Sulcus.insertions.coverage.h5, 0.052 min

R_zmq_msg_send errno: 4 strerror: Interrupted system call

Completed Kmer Bias Calculation

Adding Kmer Bias (1 of 29)

Adding Kmer Bias (2 of 29)

Adding Kmer Bias (3 of 29)

Adding Kmer Bias (4 of 29)

Adding Kmer Bias (5 of 29)

Adding Kmer Bias (6 of 29)

Adding Kmer Bias (7 of 29)

Adding Kmer Bias (8 of 29)

Adding Kmer Bias (9 of 29)

Adding Kmer Bias (10 of 29)

Adding Kmer Bias (11 of 29)

Adding Kmer Bias (12 of 29)

Adding Kmer Bias (13 of 29)

Adding Kmer Bias (14 of 29)

Adding Kmer Bias (15 of 29)

Adding Kmer Bias (16 of 29)

Adding Kmer Bias (17 of 29)

Adding Kmer Bias (18 of 29)

Adding Kmer Bias (19 of 29)

Adding Kmer Bias (20 of 29)

Adding Kmer Bias (21 of 29)

Adding Kmer Bias (22 of 29)

Adding Kmer Bias (23 of 29)

Adding Kmer Bias (24 of 29)

Adding Kmer Bias (25 of 29)

Adding Kmer Bias (26 of 29)

Adding Kmer Bias (27 of 29)

Adding Kmer Bias (28 of 29)

Adding Kmer Bias (29 of 29)

2025-09-23 04:00:51.97439 : Finished Creation of Coverage Files!, 22.825 mins elapsed.

ArchR logging successful to : ArchRLogs/ArchR-a
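The per-cluster `CellGroups N` values in the log above appear to be the number of pseudo-bulk replicates built for each cluster, and their sum should match the total number of coverage files that the log counts through ("1 of 29"). A quick check in R, with the counts transcribed by hand from that log:

```r
# CellGroups N per cluster, transcribed from the addGroupCoverages log
cell_groups <- c(C1 = 3, C2 = 2, C3 = 2, C4 = 2, C5 = 2, C6 = 2, C7 = 2,
                 C8 = 2, C9 = 3, C10 = 2, C11 = 3, C12 = 2, C13 = 2)

# Total number of pseudo-bulk groups across all 13 clusters
total_groups <- sum(cell_groups)
print(total_groups)  # 29
```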

In [16]:
pathToMacs2 <- '/home/gaojie/workspace/miniforge3/envs/Samtools/bin/macs2'  # Replace with your own MACS2 path

In [17]:
crfd <- addReproduciblePeakSet(
    ArchRProj = crfd, 
    groupBy = "Clusters", 
    pathToMacs2 = pathToMacs2
)

ArchR logging to : ArchRLogs/ArchR-addReproduciblePeakSet-203d86930ad23-Date-2025-09-23_Time-04-00-52.21445.log
If there is an issue, please report to github with logFile!

Calling Peaks with Macs2

2025-09-23 04:00:52.415505 : Peak Calling Parameters!, 0.003 mins elapsed.



    Group nCells nCellsUsed nReplicates nMin nMax maxPeaks
C1     C1   2138       1500           3  500  500   150000
C2     C2    248        248           2   52  196   124000
C3     C3   1731       1000           2  500  500   150000
C4     C4    779        540           2   40  500   150000
C5     C5   1748        540           2   40  500   150000
C6     C6   2673       1000           2  500  500   150000
C7     C7   6187       1000           2  500  500   150000
C8     C8     69         62           2   40   40    31000
C9     C9    265        265           3   67  113   132500
C10   C10    167        159           2   66   93    79500
C11   C11   1837       1443           3  443  500   150000
C12   C12   1551       1000           2  500  500   150000
C13   C13   1588        574           2   74  500   150000


2025-09-23 04:00:52.42426 : Batching Peak Calls!, 0.003 mins elapsed.

2025-09-23 04:00:52.443687 : Batch Execution w/ safelapply!, 0 mins elapsed.



R_zmq_msg_send errno: 4 strerror: Interrupted system call


2025-09-23 04:14:29.357748 : Identifying Reproducible Peaks!, 13.619 mins elapsed.

[1] "/data/gaojie/Mida_collab/CorticalFolding/PeakCalls/Clusters/C8-reproduciblePeaks.gr.rds"
[1] "/data/gaojie/Mida_collab/CorticalFolding/PeakCalls/Clusters/C6-reproduciblePeaks.gr.rds"
[1] "/data/gaojie/Mida_collab/CorticalFolding/PeakCalls/Clusters/C7-reproduciblePeaks.gr.rds"


2025-09-23 04:14:54.717417 : Creating Union Peak Set!, 14.042 mins elapsed.

Converged after 10 iterations!

Plotting Ggplot!

2025-09-23 04:15:02.416877 : Finished Creating Union Peak Set (360809)!, 14.17 mins elapsed.
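The `maxPeaks` column of the Peak Calling Parameters table in the log above is consistent with an allowance of 500 peaks per cell used, capped at 150000 peaks per group. These two numbers look like the `peaksPerCell` and `maxPeaks` settings of `addReproduciblePeakSet`, but that attribution is an assumption here, since the call above does not set them explicitly. The arithmetic can be checked against the table directly:

```r
# nCellsUsed and maxPeaks transcribed from the Peak Calling Parameters table
n_cells_used <- c(C1 = 1500, C2 = 248, C3 = 1000, C4 = 540, C5 = 540,
                  C6 = 1000, C7 = 1000, C8 = 62, C9 = 265, C10 = 159,
                  C11 = 1443, C12 = 1000, C13 = 574)
reported_max <- c(150000, 124000, 150000, 150000, 150000, 150000, 150000,
                  31000, 132500, 79500, 150000, 150000, 150000)

peaks_per_cell <- 500  # assumed per-cell allowance
peak_cap <- 150000     # assumed upper bound per group

# Per-group limit: cells used times the allowance, never above the cap
expected_max <- pmin(n_cells_used * peaks_per_cell, peak_cap)
print(all(unname(expected_max) == reported_max))  # TRUE
```

Small clusters such as C8 (62 cells used) therefore have a much lower ceiling (31000 peaks) than the large clusters that hit the cap.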



In [18]:
crfd <- addPeakMatrix(crfd)

ArchR logging to : ArchRLogs/ArchR-addPeakMatrix-203d83ab3ab7d-Date-2025-09-23_Time-04-15-02.428556.log
If there is an issue, please report to github with logFile!

2025-09-23 04:15:02.566239 : Batch Execution w/ safelapply!, 0 mins elapsed.

ArchR logging successful to : ArchRLogs/ArchR-addPeakMatrix-203d83ab3ab7d-Date-2025-09-23_Time-04-15-02.428556.log



In [19]:
saveArchRProject(crfd,outputDirectory = 'CorticalFolding')

Copying Arrow Files...

Copying Arrow Files (1 of 3)

Copying Arrow Files (2 of 3)

Copying Arrow Files (3 of 3)

Saving ArchRProject...

Loading ArchRProject...

Successfully loaded ArchRProject!



class: ArchRProject 
outputDirectory: /data/gaojie/Mida_collab/CorticalFolding 
samples(3): Adjacent Sulcus Distant
sampleColData names(1): ArrowFiles
cellColData names(19): Sample TSSEnrichment ... ReadsInPeaks FRIP
numberOfCells(1): 20981
medianTSS(1): 11.156
medianFrags(1): 37100