Ref:
https://www.archrproject.com/bookdown/creating-an-archrproject-1.html

In [1]:
library(ArchR)
library(tidyverse)
library(BSgenome.Hsapiens.UCSC.hg38)



In [2]:
setwd('/nfs/team205/heart/anndata_objects/8regions/ArchR')
getwd()

In [3]:
# Before creating or loading a project, set the ArchR genome and the default number of threads for parallelization.
addArchRGenome("hg38")

Setting default genome to Hg38.



In [4]:
# Set the default number of parallel threads to 10
addArchRThreads(threads = 10) 

Setting default number of Parallel threads to 10.



## Read in ArchR project

In [5]:
archr_project_path = '/nfs/team205/heart/anndata_objects/8regions/ArchR/project_output'
proj = loadArchRProject(path = archr_project_path, showLogo = FALSE)
proj

Successfully loaded ArchRProject!


           ___      .______        ______  __    __  .______      
          /   \     |   _  \      /      ||  |  |  | |   _  \     
         /  ^  \    |  |_)  |    |  ,----'|  |__|  | |  |_)  |    
        /  /_\  \   |      /     |  |     |   __   | |      /     
       /  _____  \  |  |\  \\___ |  `----.|  |  |  | |  |\  \\___.
      /__/     \__\ | _| `._____| \______||__|  |__| | _| `._____|
    



class: ArchRProject 
outputDirectory: /nfs/team205/heart/anndata_objects/8regions/ArchR/project_output 
samples(47): HCAHeart9508627_HCAHeart9508819
  HCAHeart9508628_HCAHeart9508820 ...
  HCAHeartST13180618_HCAHeartST13177115
  HCAHeartST13180619_HCAHeartST13177116
sampleColData names(1): ArrowFiles
cellColData names(45): Sample TSSEnrichment ... cell_state Clusters
numberOfCells(1): 144762
medianTSS(1): 8.677
medianFrags(1): 9588

In [6]:
table(proj$cell_state)


            aCM1             aCM2             aCM3             aCM4 
            3443             7420             1021             1993 
           Adip1            Adip2            Adip3  AVN_bundle_cell 
             896              423               55               26 
      AVN_P_cell                B         B_plasma          CD14+Mo 
              91              238              107              467 
         CD16+Mo        CD4+T_act      CD4+T_naive      CD8+T_cytox 
             964              659              683               94 
        CD8+T_em         CD8+T_te      CD8+T_trans               DC 
             475              283              608              189 
         EC1_cap    EC10_CMC-like          EC2_cap          EC3_cap 
            2763             2064              623              783 
      EC4_immune          EC5_art          EC6_ven  EC7_endocardial 
            1571             2980             2482             2838 
          EC8_ln              FB1

In [7]:
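# Keep only cell states with at least 10 cells, and drop unclassified cells
cells = names(which(table(proj$cell_state) >= 10))
cells = cells[cells != 'unclassified']
cells

# Subset the project to cells belonging to the retained cell states
idxSample <- BiocGenerics::which(proj$cell_state %in% cells)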
cellsSample <- proj$cellNames[idxSample]
proj = proj[cellsSample, ]
proj


class: ArchRProject 
outputDirectory: /nfs/team205/heart/anndata_objects/8regions/ArchR/project_output 
samples(47): HCAHeart9508627_HCAHeart9508819
  HCAHeart9508628_HCAHeart9508820 ...
  HCAHeartST13180618_HCAHeartST13177115
  HCAHeartST13180619_HCAHeartST13177116
sampleColData names(1): ArrowFiles
cellColData names(45): Sample TSSEnrichment ... cell_state Clusters
numberOfCells(1): 139835
medianTSS(1): 8.699
medianFrags(1): 9459
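The filtering rule above can be checked in isolation on a toy label vector. This is a minimal base-R sketch; `filter_cell_states()` and the example labels are hypothetical stand-ins for `proj$cell_state`, not part of ArchR.

```r
# Hypothetical helper: TRUE for cells whose label occurs at least
# `min_cells` times and is not listed in `drop`
filter_cell_states <- function(labels, min_cells = 10, drop = "unclassified") {
  counts <- table(labels)
  keep <- names(counts)[counts >= min_cells]
  keep <- setdiff(keep, drop)
  labels %in% keep
}

# Toy stand-in for proj$cell_state (made-up labels and counts)
cell_state <- c(rep("aCM1", 12), rep("Adip3", 10),
                rep("rare_type", 4), rep("unclassified", 30))

mask <- filter_cell_states(cell_state)
table(cell_state[mask])
```

Applied to the real project, `proj[proj$cellNames[filter_cell_states(proj$cell_state)], ]` should select the same cells as the subsetting cell above, since `cellNames` and `cell_state` share the same cell order.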

## Create Pseudo-bulk Replicates

In [10]:
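# Put every cell under a single dummy sample label, so ArchR treats all cells as one sample when building pseudo-bulk replicates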
proj$combined_Sample = "1"

In [11]:
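proj = addGroupCoverages(
    ArchRProj = proj,
    groupBy = "cell_state",
    sampleLabels = "combined_Sample",
    minReplicates = 3,
    maxReplicates = 3,
    minCells = 40,
    maxCells = 500,
    force = TRUE  # overwrite any existing group coverages
)
# maxReplicates caps the number of pseudo-bulk replicates per group (normally chosen with the number of samples in mind)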

ArchR logging to : ArchRLogs/ArchR-addGroupCoverages-af8cf7cb3-Date-2022-12-14_Time-11-04-08.log
If there is an issue, please report to github with logFile!

aCM1 (1 of 60) : CellGroups N = 2

aCM2 (2 of 60) : CellGroups N = 2

aCM3 (3 of 60) : CellGroups N = 2

aCM4 (4 of 60) : CellGroups N = 2

Adip1 (5 of 60) : CellGroups N = 2

Adip2 (6 of 60) : CellGroups N = 2

Adip3 (7 of 60) : CellGroups N = 3

AVN_bundle_cell (8 of 60) : CellGroups N = 3

AVN_P_cell (9 of 60) : CellGroups N = 3

B (10 of 60) : CellGroups N = 2

B_plasma (11 of 60) : CellGroups N = 3

CD4+T_act (12 of 60) : CellGroups N = 2

CD4+T_naive (13 of 60) : CellGroups N = 2

CD8+T_cytox (14 of 60) : CellGroups N = 3

CD8+T_em (15 of 60) : CellGroups N = 2

CD8+T_te (16 of 60) : CellGroups N = 2

CD8+T_trans (17 of 60) : CellGroups N = 2

CD14+Mo (18 of 60) : CellGroups N = 2

CD16+Mo (19 of 60) : CellGroups N = 2

DC (20 of 60) : CellGroups N = 2

EC1_cap (21 of 60) : CellGroups N = 2

EC2_cap (22 of 60) : CellGroups N
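To make the idea of pseudo-bulk replicates concrete, here is a simplified sketch that shuffles one group's cells, deals them into disjoint replicates and caps each replicate at `max_cells`. It assumes plain random splitting and is an illustration only: `make_pseudo_replicates()` is hypothetical and does not reproduce ArchR's own sampling (the log above shows ArchR returning 2 cell groups for many states even with `minReplicates = 3`).

```r
# Hypothetical, simplified illustration (NOT ArchR's algorithm):
# shuffle a group's cells, deal them round-robin into n_rep disjoint
# replicates, cap each replicate at max_cells and drop replicates
# with fewer than min_cells cells.
make_pseudo_replicates <- function(cells, n_rep = 3, min_cells = 40,
                                   max_cells = 500, seed = 1) {
  set.seed(seed)  # note: resets the global RNG state, for reproducibility
  shuffled <- sample(cells)
  rep_id <- rep_len(seq_len(n_rep), length(shuffled))
  reps <- split(shuffled, rep_id)
  reps <- lapply(reps, head, n = max_cells)
  reps[vapply(reps, length, integer(1)) >= min_cells]
}

# Toy cell barcodes for one large and one small group
large_group <- paste0("cellA_", seq_len(3443))
small_group <- paste0("cellB_", seq_len(55))

lengths(make_pseudo_replicates(large_group))                  # three replicates of 500
lengths(make_pseudo_replicates(small_group))                  # none reach 40 cells
lengths(make_pseudo_replicates(small_group, min_cells = 15))  # three small replicates
```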

## Calling peaks

In [None]:
# Make sure MACS2 is up to date. Run this in a terminal, not in the R session:
# pip install --upgrade --force-reinstall MACS2

In [12]:
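start_time <- Sys.time()

# Call peaks with MACS2 on the pseudo-bulk replicates of each cell state and
# keep only peaks called in at least 2 replicates (reproducibility = "2")
proj <- addReproduciblePeakSet(
    ArchRProj = proj,
    groupBy = "cell_state",
    threads = 10,
    reproducibility = "2"
)

end_time <- Sys.time()
end_time - start_time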

Searching For MACS2..

Found with $path!

ArchR logging to : ArchRLogs/ArchR-addReproduciblePeakSet-afd9009b7-Date-2022-12-14_Time-11-59-16.log
If there is an issue, please report to github with logFile!

Calling Peaks with Macs2

2022-12-14 11:59:17 : Peak Calling Parameters!, 0.027 mins elapsed.



                            Group nCells nCellsUsed nReplicates nMin nMax
aCM1                         aCM1   3443        580           2   80  500
aCM2                         aCM2   7420        580           2   80  500
aCM3                         aCM3   1021        580           2   80  500
aCM4                         aCM4   1993        580           2   80  500
Adip1                       Adip1    896        580           2   80  500
Adip2                       Adip2    423        423           2   80  343
Adip3                       Adip3     55         55           3   40   40
AVN_bundle_cell   AVN_bundle_cell     26         26           3   15   22
AVN_P_cell             AVN_P_cell     91         79           3   40   40
B                               B    238        238           2   80  158
B_plasma                 B_plasma    107         84           3   40   40
CD4+T_act               CD4+T_act    659        580           2   80  500
CD4+T_naive           CD4+T_naive    6

2022-12-14 11:59:17 : Batching Peak Calls!, 0.027 mins elapsed.

2022-12-14 11:59:17 : Batch Execution w/ safelapply!, 0 mins elapsed.

2022-12-14 12:13:32 : Identifying Reproducible Peaks!, 14.267 mins elapsed.

2022-12-14 12:14:41 : Creating Union Peak Set!, 15.417 mins elapsed.

Converged after 12 iterations!

Plotting Ggplot!

2022-12-14 12:14:57 : Finished Creating Union Peak Set (429828)!, 15.698 mins elapsed.



Time difference of 15.70521 mins
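`reproducibility = "2"` asks ArchR to keep only peaks called in at least two pseudo-bulk replicates of a group. The sketch below illustrates that rule on toy intervals in base R. `reproducible_peaks()` and the data are hypothetical, and ArchR's actual procedure (fixed-width, summit-centred peaks and iterative overlap removal) is more involved.

```r
# Hypothetical sketch of a reproducibility filter (not ArchR's code):
# keep a candidate peak only if it overlaps a peak call in at least
# `min_reps` replicates. Intervals are closed [start, end].
overlaps_any <- function(chr, start, end, calls) {
  any(calls$chr == chr & calls$start <= end & calls$end >= start)
}

reproducible_peaks <- function(candidates, replicate_calls, min_reps = 2) {
  n_support <- vapply(seq_len(nrow(candidates)), function(i) {
    sum(vapply(replicate_calls, function(calls) {
      overlaps_any(candidates$chr[i], candidates$start[i],
                   candidates$end[i], calls)
    }, logical(1)))
  }, integer(1))
  candidates[n_support >= min_reps, , drop = FALSE]
}

# Toy peak calls from three pseudo-bulk replicates of one group
rep_calls <- list(
  data.frame(chr = c("chr1", "chr1"), start = c(100, 5000), end = c(600, 5500),
             stringsAsFactors = FALSE),
  data.frame(chr = c("chr1", "chr2"), start = c(300, 100), end = c(800, 600),
             stringsAsFactors = FALSE),
  data.frame(chr = "chr1", start = 9000, end = 9500, stringsAsFactors = FALSE)
)
candidates <- data.frame(chr = c("chr1", "chr1", "chr2", "chr1"),
                         start = c(101, 5001, 101, 9001),
                         end = c(601, 5501, 601, 9501),
                         stringsAsFactors = FALSE)

reproducible_peaks(candidates, rep_calls)  # only chr1:101-601 is supported by 2 replicates
```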

## Add Peak matrix

In [13]:
proj = addPeakMatrix(proj)

ArchR logging to : ArchRLogs/ArchR-addPeakMatrix-af418dfe32-Date-2022-12-14_Time-12-14-57.log
If there is an issue, please report to github with logFile!

2022-12-14 12:14:58 : Batch Execution w/ safelapply!, 0 mins elapsed.

ArchR logging successful to : ArchRLogs/ArchR-addPeakMatrix-af418dfe32-Date-2022-12-14_Time-12-14-57.log
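`addPeakMatrix()` adds a peak-by-cell count matrix to the Arrow files. As a rough illustration, assuming each fragment contributes two Tn5 insertions (its two ends), the toy function below counts the insertions falling inside each peak for each cell. `peak_count_matrix()` and the data are hypothetical, and the sketch ignores any further refinements ArchR applies.

```r
# Hypothetical toy version of a peak-by-cell count matrix (not ArchR's code).
# Each fragment contributes two Tn5 insertions, at its start and end; an
# insertion is counted for every peak on the same chromosome that contains it.
peak_count_matrix <- function(fragments, peaks) {
  cells <- sort(unique(as.character(fragments$cell)))
  mat <- matrix(0L, nrow = nrow(peaks), ncol = length(cells),
                dimnames = list(peaks$name, cells))
  for (i in seq_len(nrow(fragments))) {
    cell <- as.character(fragments$cell[i])
    for (pos in c(fragments$start[i], fragments$end[i])) {
      hit <- which(peaks$chr == fragments$chr[i] &
                   peaks$start <= pos & peaks$end >= pos)
      mat[hit, cell] <- mat[hit, cell] + 1L
    }
  }
  mat
}

peaks <- data.frame(name = c("p1", "p2"), chr = "chr1",
                    start = c(100, 1000), end = c(600, 1500),
                    stringsAsFactors = FALSE)
fragments <- data.frame(cell = c("c1", "c1", "c2"), chr = "chr1",
                        start = c(150, 550, 1200), end = c(400, 1100, 1300),
                        stringsAsFactors = FALSE)

peak_count_matrix(fragments, peaks)
# p1: c1 = 3, c2 = 0; p2: c1 = 1, c2 = 2
```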



In [14]:
saveArchRProject(ArchRProj = proj, outputDirectory = "project_output", load = FALSE)

Saving ArchRProject...

